Harnham · 18 hours ago
ML Infrastructure Engineer
Harnham is seeking an ML Infrastructure Engineer to build and scale production systems for cutting-edge generative AI models. The role involves architecting scalable inference pipelines, optimizing model deployment, and ensuring reliable performance of 3D and multimodal generation systems at scale.
Responsibilities
Design and deploy high-performance backend systems for serving generative models in production
Build and optimize GPU-based inference services with a focus on latency, throughput, and cost efficiency
Implement model optimization techniques including quantization, pruning, and distillation
Develop robust APIs and microservices for model serving using FastAPI, Flask, or gRPC
Manage cloud infrastructure and CI/CD pipelines for continuous model deployment
Scale distributed inference systems to handle high-concurrency workloads with request batching
Collaborate with ML researchers to productionize diffusion models, transformers, and multimodal pipelines
Qualifications
Required
Hands-on experience with diffusion models and transformer-based architectures
Background in multimodal pipelines combining image and 3D generation
Familiarity with 3D generation or computer graphics pipelines (meshes, textures, multi-view data)
Strong track record building backend and infrastructure systems in production environments
Expert-level Python programming with production-grade API design
Deep experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling, request batching, and latency/throughput optimization
Proficiency with cloud platforms: AWS (SageMaker, EC2, EKS), GCP, or equivalent
Experience with containerization (Docker), orchestration, and CI/CD pipelines
Hands-on work with model optimization and large-model scaling frameworks: ONNX Runtime, TensorRT, FSDP, DeepSpeed
Knowledge of distributed systems and scalable inference frameworks (Ray, Triton, TorchServe)
Preferred
Experience with real-time inference systems or streaming pipelines
Background in graphics rendering or game engine technologies
Contributions to open-source ML infrastructure projects
Understanding of cost optimization strategies for GPU compute
Company
Harnham
Harnham specializes in Data and Analytics.
H-1B Sponsorship
Harnham has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2024 (1)
Funding
Current Stage: Growth Stage
Total Funding: unknown
Key Investors: BGF Ventures
2022-05-01: Seed
Company data provided by Crunchbase