xAI · 2 weeks ago
Site Reliability Engineer - xAI Technical Operations
xAI is dedicated to creating AI systems that aid humanity in understanding the universe. They are seeking Site Reliability Engineers to ensure the availability and reliability of their infrastructure and core services, focusing on incident management and performance metrics in distributed environments.
Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning
Responsibilities
Setting technical strategy and roadmap for infrastructure availability
Automating monitoring, alerting, and troubleshooting for high-availability services, while working with legacy systems to scale, improve, or deprecate
Owning incident response, problem management, and conducting thorough RCAs to prevent recurrence and drive continuous improvement
Analyzing performance metrics and service health to identify, resolve, and mitigate bottlenecks or failures in distributed environments
Ensuring security, scalability, and resilience of production infrastructure supporting AI workloads
Qualifications
Required
A minimum of 5 years of software, systems or reliability engineering experience
Experience managing services in distributed, internet-scale *nix environments, including on-prem and cloud (e.g., AWS, GCP)
Development experience in Python, Scala, Java, C, or C++
Demonstrable knowledge of TCP/IP, HTTP, networking, and systems programming (e.g., bash and shell tools)
Familiarity with containerization and orchestration tools (e.g., Kubernetes, Docker, Mesos) and systems management (e.g., Puppet, Chef, Ansible)
Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience)
Preferred
Experience in on-call rotations and incident response in high-stakes environments
Experience with AI/ML infrastructure, large-scale GPU clusters
Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting
Comfortable with deployment, support, monitoring, administration, and troubleshooting across on-prem, cloud and hybrid infrastructures
Proven understanding of systems and application design, including operational trade-offs
Benefits
Equity
Comprehensive medical, vision, and dental coverage
Access to a 401(k) retirement plan
Short & long-term disability insurance
Life insurance
Various other discounts and perks
Company
xAI
xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities.
H1B Sponsorship
xAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (1)
Funding
Current Stage: Late Stage
Total Funding: $42.73B
Key Investors: Neptune Digital Assets, SpaceX, Morgan Stanley
2026-01-06 · Series E · $20B
2025-12-11 · Secondary Market · $0.3M
2025-07-13 · Corporate Round · $5.32B
Company data provided by Crunchbase