TechLine Consulting

GCP Site Reliability Engineer

United States

Full-time

Remote

Senior Level

$160K/yr - $200K/yr

TechLine Consulting is seeking a Senior Site Reliability Engineer with deep experience operating production systems on Google Cloud Platform (GCP). The role involves owning CI/CD pipelines, operating and scaling AI infrastructure, and building internal platforms to improve developer velocity and production safety.

Hiring Manager

Meagan Palermo

Responsibilities

Own the reliability, uptime, and scalability of production services running on GCP by designing and implementing self-healing, fault-tolerant architectures using managed cloud services where appropriate

Architect and operate GCP-centric CI/CD pipelines, ensuring safe, repeatable, and reversible deployments across environments using GitLab CI and infrastructure-as-code

Build and maintain GCP infrastructure using Terraform and CloudFormation, with a focus on consistency, security, cost efficiency, and long-term maintainability

Partner closely with application engineers to improve the developer experience on GCP, making the preferred deployment paths, tooling, and patterns the easiest and most reliable options

Design and manage specialized GCP pipelines supporting AI workloads and human-in-the-loop systems, including compute orchestration, data pipelines, database operations, performance tuning, and compliance considerations

Lead production incident response for GCP-hosted systems, conducting post-mortems that emphasize root cause analysis and code-driven prevention over manual process changes

Implement comprehensive observability across GCP services by building metrics, logging, dashboards, and alerting using tools such as Prometheus, Grafana, and Datadog

Influence GCP architecture decisions, balancing reliability, performance, cost, and developer productivity as the platform scales

Qualifications

Google Cloud Platform (GCP), CI/CD, Infrastructure as Code, Python, Terraform, Observability, Java, GitLab CI, Prometheus, Grafana, Datadog

Required

Deep experience operating production systems on Google Cloud Platform (GCP)

Experience designing and operating high-availability, cloud-native systems on GCP

Track record of increasing deployment frequency through automation

Track record of proactively reducing operational risk through strong observability and infrastructure-as-code practices

Preferred

Experience with Python and Java

Familiarity with Prometheus, Grafana, and Datadog

Company

TechLine Consulting

At TechLine Consulting, people are our purpose and precision is our practice in technology and engineering.

Founded in 2025

Pembroke Pines, Florida, US

2-10 employees

https://www.techline-consulting.com

Funding

Current Stage

Early Stage

Company data provided by Crunchbase