Jobs via Dice · 10 hours ago
SRE Engineer
Dice, the leading career destination for tech experts, is seeking an SRE Engineer on behalf of SLK America Inc. The role drives the reliability and performance of critical platforms and works with engineering and operations teams to improve the software delivery lifecycle.
Computer Software
Responsibilities
Drive the reliability, availability, and performance of mission-critical customer-facing and internal platforms
Design, build, and maintain highly available, scalable, and secure infrastructure in a regulated financial services environment
Partner with application engineering, architecture, and operations teams to embed reliability and observability into the full software delivery lifecycle
Implement automation to reduce toil, including infrastructure provisioning, deployments, monitoring, and incident response
Define, measure, and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for key services
Lead or participate in incident response, root-cause analysis, and post-incident reviews, with a strong focus on prevention and continuous improvement
Champion best practices in reliability, performance engineering, capacity planning, and change management
Collaborate closely with security, risk, and compliance teams to ensure infrastructure and services adhere to regulatory and internal control requirements
Design and implement cloud and on-prem infrastructure leveraging containers and orchestration platforms (e.g., Kubernetes/OpenShift)
Use infrastructure-as-code tools (e.g., Terraform, CloudFormation) for repeatable, auditable environment provisioning
Build and maintain CI/CD pipelines for automated, reliable application and infrastructure deployments
Implement comprehensive observability (logging, metrics, tracing, dashboards, and alerting) using industry-standard tools
Optimize systems for cost, performance, resilience, and operational simplicity
Qualifications
Required
Minimum 7+ years of experience
5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles
Strong experience with at least one major cloud platform and container orchestration technologies
Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash) for automation and tooling
Hands-on experience with infrastructure-as-code, configuration management, and CI/CD tooling
Solid understanding of networking, Linux systems, and distributed systems fundamentals
Experience implementing and operating monitoring, alerting, and logging solutions at scale
Excellent communication and collaboration skills with a track record of partnering across engineering, product, and operations
Preferred
Prior experience in a regulated industry (e.g., financial services) or in an environment with strong risk and security controls
Benefits
Health
Retirement
Paid time off
Company
Jobs via Dice
Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase