Toyota Financial Services Corporation
Site Reliability Engineer, Lead - Consumer Lending Domain
Toyota Financial Services Corporation is seeking a Lead Site Reliability Engineer to join its new SRE team for application domains. The role focuses on ensuring the reliability, performance, and availability of applications by collaborating with other teams and implementing automation and observability improvements.
Financial Services
Responsibilities
Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment
Work with observability engineers to enable actionable insights into application and infrastructure health and performance
Foster a collaborative team culture and support professional development
Ensure scalable, repeatable code deployments through CI/CD pipelines using GitHub & Harness, and repeatable infrastructure deployments through infrastructure as code (IaC) using Terraform
Build automation and operational runbooks primarily using Python scripting
Manage container orchestration platforms and related cloud-native services
Drive reliability improvements through Service Level Objectives (SLOs), error budgets and Service Level Agreements (SLAs) aligned with business goals
Design & implement observability improvements using Dynatrace & CloudWatch
Lead major incident response, coordinate with stakeholders through resolution, and drive problem management to prevent recurrence
Conduct blameless post-incident reviews and drive continuous improvement
Collaborate cross-functionally to embed SRE principles into application design and operation so that applications meet their reliability goals
Participate in architectural reviews, providing input on reliability and scalability
Mentor, guide & provide technical direction to colleagues & SREs on the team, including design decisions & tradeoffs
Qualifications
Required
Experience with DevOps tools like GitHub, Harness & Dynatrace
Experience building self-healing systems and automated remediation workflows
10+ years of experience in Site Reliability Engineering, DevOps, or related field
Demonstrated problem-solving ability and command of key SRE/DevOps concepts & tools, with a proven track record of achieving high system reliability and performance
Strong experience with Terraform for AWS IaC
Proficiency in scripting and automation with Python, and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
Deep knowledge of container orchestration (Kubernetes/EKS)
Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes)
Effective communication skills, with the ability to convey complex technical concepts to diverse audiences
Preferred
AWS certifications (DevOps Engineer, Solutions Architect, etc.)
Familiarity with GitOps, secrets management, and infrastructure monitoring best practices
Benefits
A work environment built on teamwork, flexibility, and respect.
Professional growth and development programs to help advance your career, as well as tuition reimbursement.
Team Member Vehicle Purchase Discount
Toyota Team Member Lease Vehicle Program (if applicable)
Comprehensive health care and wellness plans for your entire family.
Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
Paid holidays and paid time off.
Referral services related to prenatal services, adoption, childcare, schools, and more.
Flexible spending accounts.
Relocation assistance (if applicable).
Company
Toyota Financial Services Corporation
Toyota Financial Services Corporation is made up of affiliates in more than 35 countries/locations.