ML Infrastructure Engineer jobs in United States

Harnham · 8 hours ago

ML Infrastructure Engineer

Harnham is seeking an ML Infrastructure Engineer to build and scale production systems for cutting-edge generative AI models. The role involves architecting scalable inference pipelines, optimizing model deployment, and ensuring reliable performance of 3D and multimodal generation systems at scale.

Analytics · Consulting · Information Technology · Marketing
H1B Sponsor Likely
Hiring Manager
Gabriella Varela

Responsibilities

Design and deploy high-performance backend systems for serving generative models in production
Build and optimize GPU-based inference services with focus on latency, throughput, and cost efficiency
Implement model optimization techniques including quantization, pruning, and distillation
Develop robust APIs and microservices for model serving using FastAPI, Flask, or gRPC
Manage cloud infrastructure and CI/CD pipelines for continuous model deployment
Scale distributed inference systems to handle high-concurrency workloads with request batching
Collaborate with ML researchers to productionize diffusion models, transformers, and multimodal pipelines
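To make the batching and concurrency responsibilities above concrete, here is a minimal sketch of dynamic request batching for an inference service, using only the Python standard library. All names (`Batcher`, `fake_model`, the batch-size and wait limits) are hypothetical stand-ins, not part of this role's actual stack: incoming requests are queued and served together so the model runs one batched forward pass instead of many single-item passes.

```python
import asyncio

MAX_BATCH = 8          # assumed batch-size limit
MAX_WAIT_S = 0.01      # assumed max time a request waits for batch-mates

async def fake_model(batch):
    """Stand-in for a batched GPU forward pass (hypothetical)."""
    await asyncio.sleep(0.005)          # simulated inference latency
    return [x * 2 for x in batch]       # dummy "prediction"

class Batcher:
    def __init__(self):
        self.queue = asyncio.Queue()

    async def infer(self, x):
        # Each caller gets a future that resolves when its batch is served.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self):
        while True:
            # Block for the first request, then collect more until the
            # batch is full or the wait deadline expires.
            items = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + MAX_WAIT_S
            while len(items) < MAX_BATCH:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [x for x, _ in items]
            outputs = await fake_model(inputs)
            for (_, fut), out in zip(items, outputs):
                fut.set_result(out)

async def main():
    batcher = Batcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.infer(i) for i in range(5)))
    worker.cancel()
    return results

print(asyncio.run(main()))  # five concurrent requests served in one batched call
```

In production this pattern is typically handled by a serving framework (e.g. Triton's dynamic batcher), but the trade-off it encodes is the same one the role describes: a small added wait per request in exchange for much higher GPU throughput.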

Qualifications

Generative AI Models · Python programming · Cloud platforms · Model optimization frameworks · Backend infrastructure systems · Containerization · Distributed systems · Real-time inference systems · Graphics rendering · Open-source contributions · Cost optimization strategies

Required

Hands-on experience with diffusion models and transformer-based architectures
Background in multimodal pipelines combining image and 3D generation
Familiarity with 3D generation or computer graphics pipelines (meshes, textures, multi-view data)
Strong track record building backend and infrastructure systems in production environments
Expert-level Python programming with production-grade API design
Deep experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling, request batching, and latency/throughput optimization
Proficiency with cloud platforms: AWS (SageMaker, EC2, EKS), GCP, or equivalent
Experience with containerization (Docker), orchestration, and CI/CD pipelines
Hands-on work with model optimization frameworks: ONNX Runtime, TensorRT, FSDP, DeepSpeed
Knowledge of distributed systems and scalable inference frameworks (Ray, Triton, TorchServe)
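As a toy illustration of the model-optimization experience listed above, here is symmetric per-tensor int8 weight quantization sketched in plain Python. This is a teaching simplification with made-up numbers, not how ONNX Runtime or TensorRT implement it internally: each weight is approximated as an 8-bit integer times a shared scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # each q lies in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.99]           # illustrative weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q, scale)   # int8 values plus one float scale per tensor
```

Real frameworks add per-channel scales, zero-points for asymmetric ranges, and calibration over activation statistics, but the storage win is already visible here: four floats become four bytes plus one scale.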

Preferred

Experience with real-time inference systems or streaming pipelines
Background in graphics rendering or game engine technologies
Contributions to open-source ML infrastructure projects
Understanding of cost optimization strategies for GPU compute
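One simple lens on the GPU cost-optimization point above: a back-of-envelope estimate of serving cost per million requests from throughput and an hourly GPU price. All figures are illustrative, not vendor pricing.

```python
def cost_per_million(requests_per_sec, gpu_hourly_usd, num_gpus=1):
    """Back-of-envelope serving cost (USD) per 1M requests."""
    requests_per_hour = requests_per_sec * 3600
    return (gpu_hourly_usd * num_gpus) / requests_per_hour * 1_000_000

# e.g. one hypothetical GPU at $2.50/hr sustaining 50 req/s:
print(round(cost_per_million(50, 2.50), 2))  # dollars per million requests
```

The formula makes the levers explicit: doubling throughput (via batching or quantization) halves cost per request, which is why the optimization skills above translate directly into infrastructure spend.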

Company

Harnham has actively chosen to focus on Data and Analytics.

H1B Sponsorship

Harnham has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship — highlights the field most similar to this job. Trends of Total Sponsorships: 2024 — 1.]

Funding

Current Stage
Growth Stage
Total Funding
unknown
Key Investors
BGF Ventures
2022-05-01 · Seed

Leadership Team

David Farmer
Chief Executive Officer
Stephen Lawrence
CFO
Company data provided by Crunchbase