Associate Principal, Site Reliability Engineering jobs in United States

The Options Clearing Corporation (OCC) · 6 hours ago

Associate Principal, Site Reliability Engineering

The Options Clearing Corporation (OCC) is the world's largest equity derivatives clearing organization, dedicated to promoting stability and market integrity. They are seeking an Associate Principal in Site Reliability Engineering to enhance system reliability and developer productivity through automation and to provide guidance in cloud technologies and application monitoring.

Finance · Financial Services · Property Management
No H1B

Responsibilities

Collaborate with development, operations and infrastructure teams to ensure availability of services, and to work through implementation issues
Develop automation for incident response and to prevent problem recurrence
Create and enhance runbooks to respond to service outages or degradations
Assess the production readiness of services
Define and track operational metrics for production performance, reliability, scalability and availability
Architect, develop and maintain shared services and tools to improve reliability and reduce toil across the organization
Contribute to the team’s continuous improvement through research, retrospectives, discussion groups and code reviews
Influence timelines and expectations among the team
Share knowledge by guiding and mentoring junior members and by preparing stories for the sprint backlog

Qualifications

Large-scale distributed systems, Public cloud environments, AIOps, Predictive analysis, Programming/scripting languages, Container orchestration systems, Agile / Scrum methodology, Continuous Integration, Continuous Delivery, Chaos Engineering tools, Analytical problem-solving, Documentation skills, Self-starter, Team player, Communication skills

Required

Experience with maintaining and troubleshooting large-scale distributed systems
Experience with Agile / Scrum methodology
Able to succeed in a fast-paced environment with frequent changes
Comfortable communicating with both technical and non-technical audiences
Strong documentation skills
Analytical problem-solving approach
Self-starter – takes the initiative to research, learn and deliver. Anticipates the play
Team player – humble, collaborative, and focused on making sure the entire team succeeds
Experience managing infrastructure in public cloud environments like AWS (preferred), Azure or GCP
Experience with AIOps and predictive analysis for anomaly detection and system capacity forecasting, using monitoring and alerting tools like Splunk, AppDynamics, Datadog, Stackdriver, Sysdig, Prometheus or Grafana
Programming/scripting experience in languages like Java, Bash, Python or Go
Experience with distributed messaging systems like Kafka, RabbitMQ, or ActiveMQ
Experience with container orchestration systems like Kubernetes, Mesos, Docker Swarm or Rancher
Experience using Continuous Integration and Continuous Delivery (CI/CD) tools like Jenkins, Travis, Harness, AppVeyor, CodeBuild or CodePipeline
Familiarity with leveraging large language models (LLMs) to automate and optimize SRE workflows. This may include using AI-powered tools to perform tasks such as writing scripts, summarizing incident reports, analyzing data, or even creating and maintaining AI workloads
Basic exposure to Chaos Engineering tools like Gremlin, Chaos Monkey, or Harness Chaos Engineering, or to cloud-native fault injection services like AWS FIS
Bachelor's or Master's degree in Computer Science, Information Systems or another related field, or equivalent work experience
Minimum of 4 years of experience in Site Reliability Engineering / DevOps

Benefits

A hybrid work environment, with up to 2 days per week of remote work
Tuition Reimbursement to support your continued education
Student Loan Repayment Assistance
Technology Stipend allowing you to use the device of your choice to connect to our network while working remotely
Generous PTO and Parental leave
401k Employer Match
Competitive health benefits including medical, dental and vision

Company

The Options Clearing Corporation (OCC)

The Options Clearing Corporation (OCC) is the world's largest equity derivatives clearing organization.

Funding

Current Stage
Late Stage

Leadership Team

David Hoag
Chief Information Officer
Kristen Baldwin
Chief Information Officer
Company data provided by Crunchbase