Enigma · 7 hours ago

Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA

San Jose, CA

Full-time

Hybrid

Mid, Senior Level

3+ years exp

Enigma, a recruitment firm specializing in machine learning and software engineering roles, is seeking a Machine Learning Engineer to optimize research models and turn them into efficient production services. The role involves scaling training across nodes and GPUs, implementing model-efficiency techniques, and maintaining model-serving systems.

Staffing & Recruiting

Work & Life Balance

H1B Sponsor Likely

Hiring Manager

Tom Goldberg

Responsibilities

Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost)

Scale training across nodes/GPUs (DDP/FSDP/ZeRO, pipeline/tensor parallelism) and own throughput/time-to-train using profiling and optimization

Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache optimization, FlashAttention) for training and inference without materially degrading quality

Build and maintain model-serving systems (vLLM/Triton/TGI/ONNX/TensorRT/AITemplate) with batching, streaming, caching, and memory management

Integrate with vector/feature stores and data pipelines (FAISS/Milvus/Pinecone/pgvector; Parquet/Delta) as needed for production

Define and track performance and cost KPIs; run continuous improvement loops and capacity planning

Partner with MLOps on CI/CD, telemetry/observability, and model registries; partner with Scientists on reproducible handoffs and evaluations

Qualifications

Python, PyTorch, Distributed Training, Optimization, GPU, SQL/NoSQL, Model Serving, Deep Learning Frameworks, Performance Profiling, Collaboration Skills

Required

Bachelor's degree in Computer Science, Electrical/Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience)

Strong systems/ML engineering skills, with exposure to distributed training and inference optimization

3–5 years in ML/AI engineering roles owning training and/or serving in production at scale

Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements

Experience collaborating across Research, Platform/Infra, Data, and Product functions

Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow

Exposure to large-model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); hands-on distributed training experience is a plus

Optimization: experience profiling and optimizing code execution and model inference, including quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, and FlashAttention

Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers

Data & storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores

Ability to write performant, maintainable code

Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation

Company

Enigma

Here at Enigma, we specialize in Generative AI recruitment, specifically focused on Machine Learning and Software Engineering disciplines.

London, England, GB

2-10 employees

https://www.enigma-rec.ai

H1B Sponsorship

Enigma has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships (by year): 2025: 1; 2024: 3; 2023: 3; 2022: 3; 2021: 5; 2020: 3

Funding

Current Stage

Early Stage

Company data provided by Crunchbase