Strive Gaming
DevOps / Infrastructure Engineer
Strive Gaming is seeking a hands-on DevOps / Infrastructure Engineer who is passionate about observability. The role involves designing, building, and maintaining the observability platform, troubleshooting incidents, and collaborating with teams to improve system reliability and performance.
Information Services
Responsibilities
Design, build, and maintain our observability platform—metrics, logs, traces, and everything in between
Get hands-on with infrastructure: deploy services, troubleshoot incidents, and fix things when they break (because they will)
Instrument applications and services to capture meaningful telemetry data that drives real insights
Build dashboards and alerting systems that teams actually use—not just noise generators
Dive into production issues, correlate data across systems, and lead root cause analysis
Champion observability best practices across engineering teams and help developers instrument their own code
Automate everything you can: infrastructure provisioning, deployment pipelines, and operational runbooks
Work closely with SRE and development teams to improve system reliability and performance
Evaluate and integrate new observability tools and technologies as the landscape evolves
Qualifications
Required
3+ years of experience in DevOps, Infrastructure, or SRE roles—with real production battle scars
Deep hands-on experience with observability tools: Prometheus, Grafana, Datadog, New Relic, Splunk, ELK stack, Jaeger, or similar
Strong proficiency with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
Solid scripting and automation skills (Python, Bash, Go, or similar)
Experience with containerisation and orchestration (Docker, Kubernetes)
Understanding of distributed systems, microservices architectures, and the unique observability challenges they present
Familiarity with CI/CD pipelines and GitOps workflows
Excellent troubleshooting skills—you're the person who doesn't give up until you've found the root cause
Preferred
Experience with OpenTelemetry and vendor-agnostic instrumentation strategies
Background in building custom exporters, collectors, or integrations
Familiarity with chaos engineering and resilience testing practices
Experience with FinOps and cloud cost optimisation
Contributions to open-source observability projects
Benefits
Competitive salary and equity package
Flexible working arrangements
Learning and development budget
Modern tech stack and the autonomy to make a real impact
A team that values doing things properly over just doing things quickly