
Jobs via Dice · 10 hours ago

SRE Engineer

Dice, a career destination for tech experts, is seeking an SRE Engineer for SLK America Inc. The role drives the reliability and performance of critical platforms and collaborates with engineering and operations teams to enhance the software delivery lifecycle.

Computer Software

Responsibilities

Drive the reliability, availability, and performance of mission-critical customer-facing and internal platforms
Design, build, and maintain highly available, scalable, and secure infrastructure in a regulated financial services environment
Partner with application engineering, architecture, and operations teams to embed reliability and observability into the full software delivery lifecycle
Implement automation to reduce toil, including infrastructure provisioning, deployments, monitoring, and incident response
Define, measure, and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for key services
Lead or participate in incident response, root-cause analysis, and post-incident reviews, with a strong focus on prevention and continuous improvement
Champion best practices in reliability, performance engineering, capacity planning, and change management
Collaborate closely with security, risk, and compliance teams to ensure infrastructure and services adhere to regulatory and internal control requirements
Design and implement cloud and on-prem infrastructure leveraging containers and orchestration platforms (e.g., Kubernetes/OpenShift)
Use infrastructure-as-code tools (e.g., Terraform, CloudFormation) for repeatable, auditable environment provisioning
Build and maintain CI/CD pipelines for automated, reliable application and infrastructure deployments
Implement comprehensive observability (logging, metrics, tracing, dashboards, and alerting) using industry-standard tools
Optimize systems for cost, performance, resilience, and operational simplicity

Qualifications

Site Reliability Engineering, Infrastructure as Code, Cloud Platforms, Container Orchestration, CI/CD Pipelines, Scripting Languages, Monitoring Solutions, Networking, Linux Systems, Communication Skills, Collaboration Skills

Required

Minimum of 7 years of experience
5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles
Strong experience with at least one major cloud platform and container orchestration technologies
Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash) for automation and tooling
Hands-on experience with infrastructure-as-code, configuration management, and CI/CD tooling
Solid understanding of networking, Linux systems, and distributed systems fundamentals
Experience implementing and operating monitoring, alerting, and logging solutions at scale
Excellent communication and collaboration skills with a track record of partnering across engineering, product, and operations

Preferred

Prior experience in a regulated industry (e.g., financial services) or in an environment with strong risk and security controls

Benefits

Health
Retirement
Paid time off

Company

Jobs via Dice

Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.

Funding

Current Stage: Early Stage
Company data provided by Crunchbase