Lead Site Reliability Engineer - Applications/Domains jobs in United States

Toyota Financial Services Corporation · 2 months ago

Lead Site Reliability Engineer - Applications/Domains

Toyota Financial Services is a key part of Toyota, focused on providing innovative financial solutions. They are seeking a Lead Site Reliability Engineer to ensure the reliability, performance, and availability of applications within various domains, working closely with development teams and managing automation to enhance system efficiency.

Financial Services
No H1B

Responsibilities

Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment
Work with observability engineers to enable actionable insights into application and infrastructure health and performance
Foster a collaborative team culture and support professional development
Ensure scalable, repeatable code deployments with CI/CD pipelines using GitHub and Harness, and repeatable infrastructure deployments with infrastructure as code (IaC) using Terraform
Build automation and operational runbooks primarily using Python scripting
Manage container orchestration platforms and related cloud-native services
Drive reliability improvements through Service Level Objectives (SLOs), error budgets and Service Level Agreements (SLAs) aligned with business goals
Design & implement observability improvements using Dynatrace & CloudWatch
Lead major incident response, coordinate with stakeholders on resolution, and drive problem management to prevent recurrence
Conduct blameless post-incident reviews and drive continuous improvement
Collaborate cross-functionally to embed SRE principles into application design and operation so that applications meet reliability goals
Participate in architectural reviews, providing input on reliability and scalability

Qualifications

Site Reliability Engineering, DevOps tools, Infrastructure as Code, Container orchestration, Python, AWS, GitHub, Terraform, Dynatrace, Effective communication

Required

Experience with DevOps tools like GitHub, Harness & Dynatrace
Experience building self-healing systems and automated remediation workflows
5+ years of experience in Site Reliability Engineering, DevOps, or related field
Demonstrated problem-solving experience and command of key SRE/DevOps concepts and tools, with a proven track record of achieving high system reliability and performance
Strong experience with Terraform for AWS IaC
Proficiency in scripting and automation with Python, and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
Deep knowledge of container orchestration (Kubernetes/EKS)
Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes)
Effective communication skills, with the ability to convey complex technical concepts to diverse audiences

Preferred

AWS certifications (DevOps Engineer, Solutions Architect, etc.)
Familiarity with GitOps, secrets management, and infrastructure monitoring best practices

Benefits

A work environment built on teamwork, flexibility, and respect
Professional growth and development programs to help advance your career, as well as tuition reimbursement
Team Member Vehicle Purchase Discount
Toyota Team Member Lease Vehicle Program (if applicable)
Comprehensive health care and wellness plans for your entire family
Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute
Paid holidays and paid time off
Referral services related to prenatal services, adoption, childcare, schools and more
Tax Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
Relocation assistance (if applicable)

Company

Toyota Financial Services Corporation

Toyota Financial Services Corporation is made up of affiliates in more than 35 countries/locations.

Funding

Current Stage: Growth Stage

Leadership Team

Brajesh Kumar, Chief Technology Officer
Kris Pritchard, Vice President & Chief Risk Officer
Company data provided by Crunchbase