Principal Staff Engineer – AI Infrastructure - AI/ML Leader jobs in United States
This job has closed.

Andiamo · 2 days ago

Principal Staff Engineer – AI Infrastructure - AI/ML Leader

Andiamo is a globally recognized staffing and consulting firm specializing in placing top technology professionals. They are seeking a Principal Staff Engineer to lead the architecture and development of next-generation AI infrastructure, focusing on large-scale distributed systems and machine learning.

Consulting · Human Resources · Information Technology · Staffing Agency
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Design & Scale AI Infrastructure: Architect and build distributed training, inference, and data pipelines that support large-scale AI workloads across GPUs and heterogeneous environments
Lead Cloud-Native Innovation: Drive adoption of Kubernetes, Docker, and modern orchestration frameworks to optimize model deployment, resource allocation, and cluster utilization
Optimize Performance at Scale: Develop high-throughput, low-latency services and memory-efficient systems to support petabyte-scale data and massive model sizes
Advance Observability & Reliability: Implement monitoring, tracing, and fault-tolerance strategies to ensure resilient AI systems in production
Collaborate with Research & Product: Partner with ML scientists, product engineers, and platform teams to design infrastructure that accelerates experimentation and model iteration
Mentor & Inspire: Support the technical growth of senior engineers, fostering a culture of excellence, innovation, and ownership
Shape Technical Strategy: Define long-term roadmaps for AI infrastructure, balancing near-term delivery with foundational investments in scalability, efficiency, and reliability

Qualifications

AI/ML Infrastructure Knowledge · Distributed Systems · Programming Mastery · Modern Infrastructure Skills · Systems Design Expertise · Leadership & Influence · Product Mindset

Required

10+ years in distributed systems, large-scale infrastructure, or platform engineering, with experience supporting AI/ML workloads strongly preferred
Deep expertise in Java, Python, or C++, with proven ability to build performant and reliable systems
Familiarity with ML frameworks (TensorFlow, PyTorch, JAX), distributed training strategies, GPU scheduling, and data pipeline optimization
Hands-on experience with Kubernetes, Docker, CI/CD pipelines, cloud platforms (AWS/GCP/Azure), and observability tools (Prometheus, Grafana, Datadog)
Strong foundation in algorithms, concurrency, and systems architecture for high-scale, fault-tolerant environments
Demonstrated success driving cross-functional initiatives, mentoring senior engineers, and setting engineering-wide standards
Ability to balance technical rigor with usability and speed, ensuring infrastructure empowers rapid iteration and impactful outcomes

Company

The Talent Partners for the AI Revolution.

H1B Sponsorship

Andiamo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships: 2022 (2), 2021 (1)

Funding

Current Stage: Growth Stage

Leadership Team

Patrick McAdams, CEO & Co-Founder
Steven Kottler, CFO
Company data provided by Crunchbase