Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA

Enigma · 7 hours ago

Enigma, a recruitment firm specializing in Generative AI, is seeking a Machine Learning Engineer to optimize and productize research models into efficient services. The role involves scaling training across nodes/GPUs, implementing model-efficiency techniques, and maintaining model-serving systems.

Staffing & Recruiting
Work & Life Balance
H1B Sponsor Likely
Hiring Manager: Tom Goldberg

Responsibilities

Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost)
Scale training across nodes/GPUs (DDP/FSDP/ZeRO, pipeline/tensor parallelism) and own throughput/time-to-train using profiling and optimization
Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache, Flash Attention) for training and inference without materially degrading quality
Build and maintain model-serving systems (vLLM/Triton/TGI/ONNX/TensorRT/AITemplate) with batching, streaming, caching, and memory management
Integrate with vector/feature stores and data pipelines (FAISS/Milvus/Pinecone/pgvector; Parquet/Delta) as needed for production
Define and track performance and cost KPIs; run continuous improvement loops and capacity planning
Partner with ML Ops on CI/CD, telemetry/observability, model registries; partner with Scientists on reproducible handoffs and evaluations

Qualifications

Python, PyTorch, Distributed Training, Optimization, GPU, SQL/NoSQL, Model Serving, Deep Learning Frameworks, Performance Profiling, Collaboration Skills

Required

Bachelor's degree in Computer Science, Electrical/Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience)
Strong systems/ML engineering with exposure to distributed training and inference optimization
3–5 years in ML/AI engineering roles owning training and/or serving in production at scale
Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements
Experience collaborating across Research, Platform/Infra, Data, and Product functions
Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow
Exposure to large model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); distributed training experience a plus
Optimization: experience profiling and optimizing code execution and model inference, including quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, and Flash Attention
Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers
Data & storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores
Write performant, maintainable code
Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation

Company

Enigma

Here at Enigma, we specialize in Generative AI recruitment, specifically focused on Machine Learning and Software Engineering disciplines.

H1B Sponsorship

Enigma has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships (by year)
2025: 1
2024: 3
2023: 3
2022: 3
2021: 5
2020: 3

Funding

Current Stage: Early Stage
Company data provided by Crunchbase