Senior Infrastructure Engineer - AI/ML jobs in United States

OpenTeams · 1 day ago

Senior Infrastructure Engineer - AI/ML

OpenTeams is dedicated to unlocking human potential through empowering AI solutions. They are seeking a Senior Infrastructure Engineer to design and implement cloud-native infrastructure for AI/ML workflows, ensuring scalability and observability while collaborating with clients and ML engineers.

Information Technology · Open Source · Software

Responsibilities

Significantly contribute to the evolution of Nebari (https://nebari.dev) and design reusable, modular infrastructure components that can be composed into bespoke Kubernetes-based platforms for sovereign AI deployments
Develop composable MLOps components and infrastructure patterns supporting model training, serving, monitoring, and CI/CD pipelines that organizations can own and operate
Design and implement observability, monitoring, and cost optimization strategies for large-scale AI/ML workloads on client-owned Kubernetes infrastructure
Collaborate with ML engineers to optimize infrastructure for training ML models, quantizing and packaging open weight LLMs, computer vision workloads, and other AI applications in sovereign environments
Contribute to open-source MLOps tooling and Kubernetes ecosystem projects that enable data sovereignty
Work with clients to deploy, configure, and optimize their sovereign AI infrastructure
Collaborate with a fully remote distributed team using asynchronous communication methods

Qualifications

Kubernetes · Infrastructure-as-Code · Cloud platforms · Python · MLOps pipelines · Monitoring tools · CI/CD practices · Technical leadership · Collaboration · Feedback skills

Required

4+ years of hands-on infrastructure/platform/DevOps experience with production systems
Strong understanding of infrastructure engineering principles: scalability, reliability, observability, and automation
Solid experience with Kubernetes in production environments, including troubleshooting and optimization
Proficiency with Infrastructure-as-Code tooling (Terraform, Helm, or similar) for managing complex deployments
Experience with at least one major cloud platform (AWS, Azure, GCP) including networking, security, and compute services
Strong programming skills, particularly in Python and/or Go, with ability to write maintainable infrastructure code
Experience contributing to technical initiatives or mentoring junior team members
Understanding of CI/CD practices, GitOps workflows, and infrastructure automation principles
Comfortable working independently and in distributed teams
Ability to provide and constructively receive feedback
Available to collaborate during hours that overlap with the US Central time zone

Preferred

MLOps pipelines and ML infrastructure (model training, serving, monitoring)
Multiple cloud platforms and their AI/ML services
On-premises deployment and hybrid cloud environments
ML/AI ecosystem tools (PyTorch, TensorFlow, scikit-learn, etc.)
Monitoring and observability tools (Prometheus, Grafana, distributed tracing)
Data sovereignty, privacy, and security requirements for enterprise AI
GPU infrastructure and model serving frameworks (KServe, vLLM, LLM-D)
ML workflow orchestration tools (Kubeflow, MLflow, Airflow, Prefect)
Service mesh technologies (Istio, Linkerd) and advanced Kubernetes networking
Open-source contributions to Kubernetes, MLOps, or AI infrastructure projects
Cost optimization and resource management for ML workloads
Air-gapped or highly secure deployment environments

Benefits

Medical, Dental & Vision – 100% paid for employees, 75% for dependents
401(k) Match – Up to 5% with full vesting after 2 years
Unlimited PTO – With a required minimum of 15 days off annually
Fully Remote Setup – Includes up to $3,000 equipment reimbursement
Continuous Education – Includes up to $500 reimbursement
Disability & Life Insurance – 100% employer-paid
HSA & FSA Options – With monthly HSA contributions from OpenTeams

Company

OpenTeams

OpenTeams is the global leader in Open SaaS AI, machine learning, and data science. We’re here to unshackle the world from black-box SaaS.

Funding

Current Stage: Growth Stage
Total Funding: $0.1M
Key Investors: Sputnik ATX
2019-08-12 · Pre Seed · $0.1M

Leadership Team

Matt Harward
President
Company data provided by Crunchbase