
Ford Pro · 2 hours ago

Senior Site Reliability Engineer

Ford Pro is focused on leveraging advanced technology to redefine the transportation landscape. As a Senior Site Reliability Engineer, you will ensure the reliability, performance, and scalability of the Ford Service Reservation Platform, utilizing SRE practices, advanced observability, and automation to enhance customer experience.

Automotive · Fleet Management · Software
H1B Sponsored

Responsibilities

Lead the implementation and continuous evolution of Site Reliability Engineering (SRE) practices to ensure exceptionally high availability, performance, and scalability for the Ford Service Reservation Platform and its applications
Define, implement, and rigorously maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for key services, directly aligning reliability goals with critical business and customer outcomes
Generate regular SLO and error budget reports, collaborating closely with engineering teams to strategically prioritize reliability work, incident follow-ups, and targeted technical debt reduction efforts
Lead weekly status and reliability reviews, effectively communicating risks, performance trends, and improvement opportunities to key stakeholders in engineering and product
Champion data-driven decision-making, leveraging observability insights to significantly improve incident response, reduce Mean Time to Resolution (MTTR), and enhance the overall customer experience
Own, evolve, and optimize comprehensive observability solutions, primarily utilizing Dynatrace for full-stack visibility, Real User Monitoring (RUM), synthetic monitoring, and infrastructure monitoring across critical user journeys of the Ford Service Reservation Platform
Design and implement robust Google Cloud Platform (GCP) observability patterns for logs, metrics, alerts, and dashboards specifically tailored for the Ford Service Reservation Platform and its associated applications
Leverage Dynatrace and GCP log analytics insights to proactively drive incident reduction, facilitate efficient root cause analysis, and foster continuous performance improvements across all Ford Service Reservation services
Develop and deploy infrastructure as code using Terraform to provision and manage GCP resources, including networking, load balancing, and monitoring artifacts
Configure and maintain essential DevSecOps tools such as SonarQube, FOSSA, Cycode, and 42Crunch to ensure code quality and security
Build reusable, scalable Terraform modules to automate the provisioning of GCP monitoring artifacts, including log-based metrics, alerting policies, uptime checks, and comprehensive dashboards
Develop and maintain robust CI/CD pipelines utilizing Tekton PAC and/or GitHub Actions for application code deployment, automated operational tasks (e.g., instance management, cache invalidation, and data backups), and infrastructure changes
Manage GitHub repositories for application code, automation scripts, and configuration management
Establish and continually refine Incident Management and Problem Management processes, coordinating effectively with application teams for rapid resolution and thorough root cause analysis of issues
Identify systemic and application-specific issues through detailed analysis of observability data and collaborate proactively with development teams to prioritize feature requests and defect resolutions that enhance reliability

Qualifications

Google Cloud Platform · Site Reliability Engineering · Infrastructure as Code · Dynatrace · Terraform · CI/CD Pipelines · Incident Management · Python · Data-Driven Mindset · Communication

Required

Bachelor's degree in Computer Science, Computer Engineering, Systems Engineering or equivalent combination of relevant education and experience
7+ years of experience in Software Engineering, DevOps, or Systems Administration
5+ years of dedicated experience in a Site Reliability Engineering (SRE) or Platform Engineering role
2+ years of experience leading technical initiatives or mentoring junior engineers in an SRE context

Preferred

Master's degree in Computer Science, Computer Engineering, Systems Engineering or related field
Google Professional Cloud Architect or Google Professional Cloud DevOps Engineer certification
Dynatrace Professional Certification
Terraform Associate Certification
Prior experience working on high-traffic reservation systems, e-commerce platforms, or automotive service applications

Company

Ford Pro

Ford Pro is a productivity accelerator designed to drive the business forward, delivering solutions to commercial customers of all sizes.

Funding

Current Stage
Late Stage

Leadership Team

Andrew Frick
Interim Head
Company data provided by Crunchbase