
xAI · 11 hours ago

Software Engineer - Reliability

xAI is a company focused on creating AI systems that understand the universe and aid humanity. They are seeking a Software Engineer to join their SuperComputing team, responsible for ensuring the reliability, scalability, and performance of their HPC infrastructure.

Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design, implement, and maintain robust, scalable infrastructure for supercomputing environments
Monitor and optimize system performance, ensuring high availability and minimal downtime
Develop automation tools and scripts to streamline operations and improve system reliability
Troubleshoot complex issues across distributed systems, networks, and storage solutions
Collaborate with AI researchers and engineers to support compute-intensive workloads
Implement security best practices to protect sensitive data and infrastructure
Contribute to capacity planning and disaster recovery strategies
Participate in an on-call rotation to ensure 24/7 system reliability

Qualifications

Site Reliability Engineering, Linux Administration, Containerization, Cloud Platforms, Distributed Systems, HPC Environments, Infrastructure as Code, Problem-Solving Skills, Communication Skills, Collaborative Mindset

Required

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
3+ years of experience in site reliability engineering, DevOps, or systems engineering
Proficiency in Linux system administration and scripting (e.g., Python, Bash)
Experience with containerization (e.g., Docker, Kubernetes) and cloud platforms (e.g., AWS, GCP, Azure)
Strong understanding of networking, distributed systems, and storage technologies
Familiarity with HPC environments, GPU clusters, or large-scale data processing
Excellent problem-solving skills and ability to work in a fast-paced, dynamic environment
Strong communication skills and a collaborative mindset

Preferred

Experience with Infrastructure as Code (e.g., Terraform, Ansible) or monitoring tools (e.g., Prometheus, Grafana)

Benefits

Equity
Comprehensive medical, vision, and dental coverage
Access to a 401(k) retirement plan
Short & long-term disability insurance
Life insurance
Various other discounts and perks

Company

xAI

xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities.

H1B Sponsorship

xAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor.)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Late Stage
Total Funding: $42.73B
Key Investors: Neptune Digital Assets, SpaceX, Morgan Stanley
2026-01-06 · Series E · $20B
2025-12-11 · Secondary Market · $0.3M
2025-07-13 · Corporate Round · $5.32B

Leadership Team

Toby Pohlen
Founding Member
Company data provided by Crunchbase