Senior Principal Site Reliability Engineer | Oracle Health Federal Operations Team jobs in United States

Oracle · 1 month ago

Senior Principal Site Reliability Engineer | Oracle Health Federal Operations Team

Oracle is a technology leader that’s changing how the world does business, and they are seeking a Senior Principal Site Reliability Engineer to join their Oracle Health Federal Operations Team. This role involves defining and deploying key services focused on architecture, production operations, and performance management while ensuring reliability and performance across multiple cross-functional teams.

Data Governance · Data Management · Enterprise Software · Information Technology · SaaS · Software
No H1B · Security Clearance Required

Responsibilities

Own the full service lifecycle: design, implementation, deployment, on-call, and continuous improvement—maintaining high code and reliability standards
Define and meet service-level objectives (availability, latency, durability) while reducing toil through automation, observability, and self-healing mechanisms
Lead architecture, analysis, design, implementation, and production operations for Core System Framework solutions, with strong documentation and runbooks
Create and maintain clear, version-controlled documentation—architectural diagrams, SOPs, runbooks, and incident playbooks—to ensure repeatable operations, auditability, and fast onboarding
Design, write, and deploy software that improves the availability, scalability, and efficiency of platform services
Develop designs, architectures, standards, and methods for large-scale distributed systems
Build automation to prevent problem recurrence; drive real-time monitoring, alerting, and self-healing into production systems
Conduct capacity planning and demand forecasting; perform software performance analysis, system tuning, and optimization
Contribute to and support platform services across architecture, provisioning, configuration, deployment, and ongoing operations
Partner with distributed teams to prototype and launch new platform services
Stay current on emerging technologies and introduce innovations that improve reliability, security, and developer productivity
Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence
Set and raise engineering standards across multiple teams; model best practices in reliability, security, and automation
Collaborate closely with storage, networking, observability, and security teams to deliver platform features and secure-by-default designs
Participate in an on-call rotation; lead incident response, postmortems, and follow-through on corrective actions to drive continuous improvement

Qualifications

Site Reliability Engineering · DevOps · Distributed Systems · Automation · Performance Management · Incident Response · Mentoring · Collaboration · Documentation

Required

Experience in defining and deploying key services with deep focus on architecture, production operations, capacity planning, performance management, deployment, and release engineering
Ability to own the full service lifecycle: design, implementation, deployment, on-call, and continuous improvement
Experience in defining and meeting service-level objectives (availability, latency, durability) while reducing toil through automation, observability, and self-healing mechanisms
Strong documentation skills including creating and maintaining clear, version-controlled documentation—architectural diagrams, SOPs, runbooks, and incident playbooks
Experience in designing, writing, and deploying software that improves the availability, scalability, and efficiency of platform services
Ability to develop designs, architectures, standards, and methods for large-scale distributed systems
Experience in building automation to prevent problem recurrence; driving real-time monitoring, alerting, and self-healing into production systems
Experience in conducting capacity planning and demand forecasting, and in performing software performance analysis, system tuning, and optimization
Experience in contributing to and supporting platform services across architecture, provisioning, configuration, deployment, and ongoing operations
Ability to partner with distributed teams to prototype and launch new platform services
Ability to stay current on emerging technologies and introduce innovations that improve reliability, security, and developer productivity
Experience in mentoring and guiding engineers in distributed systems design, high-scale data processing, and operational excellence
Experience in setting and raising engineering standards across multiple teams, and in modeling best practices in reliability, security, and automation
Experience in collaborating closely with storage, networking, observability, and security teams to deliver platform features and secure-by-default designs
Ability to participate in an on-call rotation and to lead incident response, postmortems, and follow-through on corrective actions to drive continuous improvement

Company

Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.

Funding

Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital

2025-09-24 · Post-IPO Debt · $18B
2025-02-03 · Post-IPO Debt · $7.75B
1986-03-12 · IPO

Leadership Team

Esteban Rubens
Healthcare Field CTO
Gerard Warrens
Field CTO, Business Strategy and Transformative Technologies
Company data provided by Crunchbase