Toyota North America · 2 days ago
Lead Site Reliability Engineer - Applications/Domains
Toyota North America, a leading company in the mobility sector, is seeking a Lead Site Reliability Engineer to improve the reliability and performance of its applications. The role involves collaborating with various teams to improve system health, automate processes, and ensure high availability of services.
Manufacturing
Responsibilities
Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment
Work with observability engineers to enable actionable insights into application and infrastructure health and performance
Foster a collaborative team culture and support professional development
Ensure scalable, repeatable code deployments with CI/CD pipelines using GitHub & Harness, and repeatable infrastructure deployments with infrastructure as code (IaC) using Terraform
Build automation and operational runbooks primarily using Python scripting
Manage container orchestration platforms and related cloud-native services
Drive reliability improvements through Service Level Objectives (SLOs), error budgets and Service Level Agreements (SLAs) aligned with business goals
Design & implement observability improvements using Dynatrace & CloudWatch
Lead major incident response, coordinate with stakeholders on resolution, and drive problem management to prevent recurrence
Conduct blameless post-incident reviews and drive continuous improvement
Collaborate cross-functionally to embed SRE principles into application design and operation so that services meet their reliability goals
Participate in architectural reviews, providing input on reliability and scalability
Qualifications
Required
Experience with DevOps tools like GitHub, Harness & Dynatrace
Experience building self-healing systems and automated remediation workflows
5+ years of experience in Site Reliability Engineering, DevOps, or related field
Demonstrated problem-solving ability and command of key SRE/DevOps concepts & tools, with a proven track record of achieving high system reliability and performance
Strong experience with Terraform for AWS IaC
Proficiency in scripting and automation with Python, and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
Deep knowledge of container orchestration (Kubernetes/EKS)
Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes)
Effective communication skills, with the ability to convey complex technical concepts to diverse audiences
Preferred
AWS certifications (DevOps Engineer, Solutions Architect, etc.)
Familiarity with GitOps, secrets management, and infrastructure monitoring best practices
Benefits
A work environment built on teamwork, flexibility, and respect
Professional growth and development programs to help advance your career, as well as tuition reimbursement
Team Member Vehicle Purchase Discount
Toyota Team Member Lease Vehicle Program (if applicable)
Comprehensive health care and wellness plans for your entire family
Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute
Paid holidays and paid time off
Referral services related to prenatal services, adoption, childcare, schools and more
Tax Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
Relocation assistance (if applicable)
Company
Toyota North America
At Toyota, we’re known for making some of the highest quality vehicles on the road. But there is more to our story.
Funding
Current Stage
Late Stage
Total Funding
$4.5M
Key Investors
ARPA-E
2024-12-18 · Grant · $4.5M
Company data provided by crunchbase