ML Infrastructure Engineer jobs in United States

Harnham · 8 hours ago

ML Infrastructure Engineer

Harnham is seeking an ML Infrastructure Engineer to build and scale production systems for cutting-edge generative AI models. The role involves architecting scalable inference pipelines, optimizing model deployment, and ensuring reliable performance of 3D and multimodal generation systems at scale.

Analytics · Consulting · Information Technology · Marketing
H1B Sponsor Likely
Hiring Manager
Gabriella Varela

Responsibilities

Design and deploy high-performance backend systems for serving generative models in production
Build and optimize GPU-based inference services with focus on latency, throughput, and cost efficiency
Implement model optimization techniques including quantization, pruning, and distillation
Develop robust APIs and microservices for model serving using FastAPI, Flask, or gRPC
Manage cloud infrastructure and CI/CD pipelines for continuous model deployment
Scale distributed inference systems to handle high-concurrency workloads with request batching
Collaborate with ML researchers to productionize diffusion models, transformers, and multimodal pipelines
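To make the batching and concurrency responsibilities above concrete, here is a minimal sketch of dynamic request batching for an inference service, using only the Python standard library. All names (`Batcher`, `fake_model`, the batch-size and wait limits) are hypothetical stand-ins, not part of this role's actual stack: incoming requests are queued and served together so the model runs one batched forward pass instead of many single-item passes.

```python
import asyncio

MAX_BATCH = 8          # assumed batch-size limit
MAX_WAIT_S = 0.01      # assumed max time a request waits for batch-mates

async def fake_model(batch):
    """Stand-in for a batched GPU forward pass (hypothetical)."""
    await asyncio.sleep(0.005)          # simulated inference latency
    return [x * 2 for x in batch]       # dummy "prediction"

class Batcher:
    def __init__(self):
        self.queue = asyncio.Queue()

    async def infer(self, x):
        # Each caller gets a future that resolves when its batch is served.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self):
        while True:
            # Block for the first request, then collect more until the
            # batch is full or the wait deadline expires.
            items = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + MAX_WAIT_S
            while len(items) < MAX_BATCH:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [x for x, _ in items]
            outputs = await fake_model(inputs)
            for (_, fut), out in zip(items, outputs):
                fut.set_result(out)

async def main():
    batcher = Batcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.infer(i) for i in range(5)))
    worker.cancel()
    return results

print(asyncio.run(main()))  # five concurrent requests served in one batched call
```

In production this pattern is typically handled by a serving framework (e.g. Triton's dynamic batcher), but the trade-off it encodes is the same one the role describes: a small added wait per request in exchange for much higher GPU throughput.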

Qualifications

Generative AI Models · Python programming · Cloud platforms · Model optimization frameworks · Backend infrastructure systems · Containerization · Distributed systems · Real-time inference systems · Graphics rendering · Open-source contributions · Cost optimization strategies

Required

Hands-on experience with diffusion models and transformer-based architectures
Background in multimodal pipelines combining image and 3D generation
Familiarity with 3D generation or computer graphics pipelines (meshes, textures, multi-view data)
Strong track record building backend and infrastructure systems in production environments
Expert-level Python programming with production-grade API design
Deep experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling, request batching, and latency/throughput optimization
Proficiency with cloud platforms: AWS (SageMaker, EC2, EKS), GCP, or equivalent
Experience with containerization (Docker), orchestration, and CI/CD pipelines
Hands-on work with model optimization frameworks: ONNX Runtime, TensorRT, FSDP, DeepSpeed
Knowledge of distributed systems and scalable inference frameworks (Ray, Triton, TorchServe)
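As a toy illustration of the model-optimization experience listed above, here is symmetric per-tensor int8 weight quantization sketched in plain Python. This is a teaching simplification with made-up numbers, not how ONNX Runtime or TensorRT implement it internally: each weight is approximated as an 8-bit integer times a shared scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # each q lies in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.99]           # illustrative weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q, scale)   # int8 values plus one float scale per tensor
```

Real frameworks add per-channel scales, zero-points for asymmetric ranges, and calibration over activation statistics, but the storage win is already visible here: four floats become four bytes plus one scale.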

Preferred

Experience with real-time inference systems or streaming pipelines
Background in graphics rendering or game engine technologies
Contributions to open-source ML infrastructure projects
Understanding of cost optimization strategies for GPU compute
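One simple lens on the GPU cost-optimization point above: a back-of-envelope estimate of serving cost per million requests from throughput and an hourly GPU price. All figures are illustrative, not vendor pricing.

```python
def cost_per_million(requests_per_sec, gpu_hourly_usd, num_gpus=1):
    """Back-of-envelope serving cost (USD) per 1M requests."""
    requests_per_hour = requests_per_sec * 3600
    return (gpu_hourly_usd * num_gpus) / requests_per_hour * 1_000_000

# e.g. one hypothetical GPU at $2.50/hr sustaining 50 req/s:
print(round(cost_per_million(50, 2.50), 2))  # dollars per million requests
```

The formula makes the levers explicit: doubling throughput (via batching or quantization) halves cost per request, which is why the optimization skills above translate directly into infrastructure spend.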

Company

Harnham has actively chosen to focus on Data and Analytics.

H1B Sponsorship

Harnham has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship — highlights the field most similar to this job. Trends of Total Sponsorships: 2024 — 1.]

Funding

Current Stage
Growth Stage
Total Funding
unknown
Key Investors
BGF Ventures
2022-05-01 · Seed

Leadership Team

David Farmer
Chief Executive Officer
Stephen Lawrence
CFO
Company data provided by Crunchbase