General Motors · 2 hours ago
Senior Site Reliability Engineer/Developer – Vehicle Security Platforms
General Motors is a leading automotive company committed to a vision of Zero Crashes, Zero Emissions, and Zero Congestion. They are seeking a Senior Site Reliability Engineer/Developer to ensure the reliability, scalability, and performance of their software systems, focusing on vehicle security platforms through automation, incident response, and collaboration with development teams.
Automotive · Electric Vehicle · Information Services · Manufacturing · Transportation
Responsibilities
Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents
Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring
Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems
Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future
Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation
Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems
Implement and evolve secure, highly available, and globally distributed systems powering GM’s vehicle security platforms
Own reliability roadmaps, establishing frameworks and strategies for system hardening, high availability, disaster recovery, and operational scalability
Develop automation-first solutions to eliminate operational toil, with advanced use of languages such as Python, Go, and Java
Lead incident response, driving systematic elimination of failure modes through blameless postmortems, PRRs, and cross-team preventative initiatives
Drive observability strategies with best-in-class practices for metrics, logging, and distributed tracing, using Prometheus, Datadog, or similar stacks
Partner with engineering, platform, and security teams to design for reliability from inception, influencing architecture reviews and CI/CD best practices
Lead optimization, capacity planning, and performance-tuning strategies for large-scale, security-critical platforms
Introduce modern SRE practices such as chaos engineering, resilience testing, and progressive delivery to validate support teams and evolve system safety, along with SLOs, SLIs, and SLAs
Mentor engineers across disciplines on SRE, platform resilience, secure operational practices, and architectural trade-offs
Evaluate and adopt technologies (open-source, enterprise, homegrown) for security and reliability at scale
Influence product strategy in partnership with engineering leads, ensuring operational reliability is prioritized alongside customer and business outcomes
Qualifications
Required
5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure/platform roles supporting secure, scalable systems
Proven expertise in designing and scaling cloud infrastructure (Azure) and container orchestration systems (Kubernetes, Docker)
Demonstrated mastery of infrastructure-as-code frameworks (Terraform, Helm, CloudFormation, etc.)
Proficiency in Python and one JVM language (Java or Kotlin), and working knowledge of Go
Deep architectural understanding of distributed systems, networking, system design, and large-scale security practices
Track record of architecting and running zero-downtime systems in production
Experience with modern monitoring and reliability tooling and frameworks (Prometheus, Datadog, OpenTelemetry, etc.)
Experience leading incident response, uptime SLO/SLA management, and operational excellence initiatives across multiple teams
Capable of influencing architecture and product strategy while maintaining a hands-on approach to systems reliability
Exceptional communication skills, able to present complex trade-offs and foster alignment across executive, product, and engineering stakeholders
Preferred
BS/MS/PhD in Computer Science, Engineering, or equivalent industry experience
Deep understanding of encryption technologies, secure data handling practices, and identity management
Experience designing and operating IoT or automotive-focused architectures with rigorous availability and safety requirements
Direct experience in chaos engineering, game-day testing, disaster recovery orchestration, and production load testing
Ability to grow and mentor engineers into leaders in their domain, building SRE teams that can operate independently at scale
Demonstrated success in defining and executing reliability strategies with measurable business impact
Strong product mindset with the ability to balance engineering excellence with speed and business priorities
Benefits
Relocation benefits
Company
General Motors
General Motors is an automotive company that designs, produces, markets, and distributes vehicles and vehicle parts.
Funding
Current Stage: Public Company
Total Funding: $8.51B
Key Investors: US Department of Energy
2025-05-05 · Post-IPO Debt · $2B
2024-10-31 · Grant · $8M
2024-07-11 · Grant · $500M
Company data provided by Crunchbase