Senior Site Reliability Engineer jobs in United States

Hard Rock Digital · 11 hours ago

Senior Site Reliability Engineer

Hard Rock Digital is focused on becoming the best online sportsbook, casino, and social gaming company in the world. They are looking for a skilled Senior Site Reliability Engineer (SRE) to maintain and improve the reliability, scalability, and performance of their Java-based application, and to manage and monitor applications and infrastructure using the Grafana stack.

Computer Software
H1B Sponsor Likely

Responsibilities

Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment
Troubleshoot and resolve complex issues in production and non-production environments
Participate in both pre- and post-deployment performance testing and monitoring efforts to improve application performance
Optimize Java application performance, ensuring efficient resource utilization and scaling
Deploy and manage the Grafana stack (Grafana, Prometheus, Loki) to provide real-time monitoring, logging, and alerting
Implement and refine observability strategies to enhance application and infrastructure visibility
Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance
Support the operations team’s incident response efforts, conduct post-mortems, and identify root causes of issues to prevent recurrence
Document and share lessons learned from incidents, contributing to a culture of continuous improvement
Work closely with developers, architects, and other engineers to design and implement solutions that improve application reliability
Collaborate closely with DevOps and NOC teams to support the application platform
Communicate SRE practices and principles to technical and non-technical stakeholders
Provide feedback and insights on application performance, potential improvements, and observability metrics

Qualification

Site Reliability Engineering
Kubernetes management
Grafana stack expertise
Java application optimization
Cloud Platform expertise
Infrastructure as Code
Scripting abilities
CI/CD pipelines
Incident response
Troubleshooting skills
Mentoring
Communication skills

Required

Degree in computer science or a related field, or equivalent work experience
5+ years in SRE, DevOps, or similar Infrastructure roles
Experience managing large-scale, high-availability production systems
Track record of incident response and post-mortem processes
Experience with capacity planning and performance optimization
3+ years hands-on experience managing production Kubernetes clusters
Deep understanding of k8s architecture, networking, storage, and security
Experience with cluster scaling (Karpenter), upgrades, and multi-cluster management
Proficiency with kubectl, Helm, and Kubernetes operators
Container orchestration and troubleshooting expertise
Advanced expertise with the Grafana stack for dashboards, alerting, and visualization
Hands-on experience with Grafana Alloy for telemetry data collection
Proficiency in PromQL
Experience with Loki for log aggregation and analysis
Experience building comprehensive monitoring and alerting strategies
Hands-on experience managing Java-based applications in large-scale, distributed environments, with a focus on JVM tuning and application optimization
Cloud Platform expertise (AWS, GCP, or Azure)
Familiarity with infrastructure as code (IaC) tools such as Terraform/Terragrunt or Ansible
ArgoCD proficiency for GitOps workflows and continuous deployment
Strong scripting abilities in Bash, Python, or Go
Experience with CI/CD pipelines and automation tools
Experience with configuration management and deployment automation
Strong troubleshooting skills, with a proactive approach to diagnosing and resolving performance bottlenecks
Proven experience managing on-call rotations, incident response, and root cause analysis
Ability to mentor junior team members
Strong communication skills (both written and verbal), positive attitude, and ability to receive constructive feedback

Benefits

Competitive pay and benefits
Flexible vacation allowance
A hybrid / remote working environment
Startup culture backed by a secure, global brand

Company

Hard Rock Digital

Hard Rock Digital is building the future of online sports betting and interactive gaming.

H1B Sponsorship

Hard Rock Digital has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (3)
2024 (4)
2022 (5)
2021 (1)

Funding

Current Stage
Late Stage

Leadership Team

Marlon Goldstein
Executive Managing Director & CEO
Earl Mitchell
SVP Predictive Analytics
Company data provided by Crunchbase