DevOps / Site Reliability Engineer jobs in United States

TWO95 International, Inc · 17 hours ago

DevOps / Site Reliability Engineer

TWO95 International, Inc is seeking a Lead Site Reliability Engineer to oversee the uptime and reliability of their cloud infrastructure and kiosks platform. The role involves leading incident responses, automating infrastructure tasks, and collaborating with various teams to ensure operational readiness.

CRM, Human Resources, Information Technology, Staffing Agency
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Own uptime, SLAs, and overall reliability of cloud infrastructure and kiosks platform
Lead incident response, root-cause analysis, and drive actionable postmortems
Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team
Maintain and improve monitoring, alerting, and observability (Grafana, Prometheus, New Relic, etc.)
Manage, operate, and recommend improvements to monitoring systems
Execute and continuously improve disaster recovery and business continuity plans
Partner with platform engineering, QA, and development teams to ensure operational readiness
Establish and maintain runbooks, operational standards, and reliability best practices
Provide leadership, mentorship, and clear communication during both normal operations and incidents
Optimize cloud and Kubernetes environments for reliability, performance, and scalability

Qualifications

Cloud infrastructure, Kubernetes, Incident response, Infrastructure as Code (IaC), Monitoring tools, Leadership, Communication

Required

Own uptime, SLAs, and overall reliability of cloud infrastructure and kiosks platform
Lead incident response, root-cause analysis, and drive actionable postmortems
Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team
Maintain and improve monitoring, alerting, and observability (Grafana, Prometheus, New Relic, etc.)
Manage, operate, and recommend improvements to monitoring systems
Execute and continuously improve disaster recovery and business continuity plans
Partner with platform engineering, QA, and development teams to ensure operational readiness
Establish and maintain runbooks, operational standards, and reliability best practices
Provide leadership, mentorship, and clear communication during both normal operations and incidents
Optimize cloud and Kubernetes environments for reliability, performance, and scalability

Company

TWO95 International, Inc

Two95 International is a staffing and recruiting company offering contingent staffing and managed outsourced services.

H1B Sponsorship

TWO95 International, Inc has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (4)
2023 (5)
2022 (3)
2021 (11)
2020 (7)

Funding

Current Stage
Growth Stage

Leadership Team

Shanker Koladi
Founder
Company data provided by Crunchbase