Prometheus Group
Site Reliability Engineer
Prometheus Group is a leading global provider of comprehensive enterprise asset management software solutions. The Site Reliability Engineer is responsible for ensuring the availability and performance of hosted customer sites, managing infrastructure, and improving operational efficiency through automation and proactive problem-solving.
Responsibilities
Work as part of a response team to resolve reported issues
Proactively identify problems and gaps in the deployed applications and infrastructure, and develop measures to prevent disruptions
Develop and deliver tools that continuously enhance monitoring capabilities
Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
Identify ways to resolve common issues by developing and deploying automation that handles tasks commonly requiring human intervention
Work closely with development and DevOps teams to ensure that platforms are designed with "operability" and "observability" in mind
Function well in a fast-paced, rapidly changing environment
Participate actively in managing the Kubernetes cluster lifecycle
Qualifications
Required
Bachelor's degree in computer science, IT, software engineering, or a related field
3+ years of working experience as a software developer, AWS cloud engineer, or AWS infrastructure engineer
3+ years of hands-on experience managing Kubernetes clusters and Docker containers
3+ years of hands-on experience managing and troubleshooting Linux servers
2+ years of automation experience in Terraform, Python, or Ansible
2+ years of experience configuring AWS RDS databases (PostgreSQL, MS SQL, and Oracle)
2+ years of experience configuring and managing Azure SQL
Strong critical thinking skills
Strong troubleshooting experience involving Kubernetes clusters, Docker containers, and Linux
Demonstrable experience working with remote monitoring and logging tools, including but not limited to Dynatrace, Grafana, and Pingdom
Preferred
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Developers, IT Operations, and Engineers
Ability to work well in high-pressure situations
Strong inter-team and intra-team collaboration experience
Knowledge of data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and related topics
Certified Kubernetes Administrator (CKA) or related certification a plus
Benefits
Employee base HSA plan, dental, life and short-term disability coverage 100% paid for by Prometheus Group
HSA & FSA plan options
Retirement Savings with Generous Company Match & Immediate Vesting
Gym membership to O2 Fitness
Casual dress attire
Half-Day Fridays
Generous Paid Time Off
Company Outings, Trips & Activities
Paid Parental Leave
Company
Prometheus Group
Prometheus Platform is a suite of integrated solutions deployed on mobile/desktop that natively extend and enhance your EAM, ERP, or CMMS.
H1B Sponsorship
Prometheus Group has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (4)
2023 (2)
2022 (1)
Funding
Current Stage: Late Stage
Total Funding: unknown
Key Investors: Advent International, TA Associates
2024-06-05: Private Equity
2019-05-30: Acquired
2013-07-15: Private Equity
Company data provided by Crunchbase