Senior MLOps Engineer jobs in the United States

Grid Dynamics · 11 hours ago

Senior MLOps Engineer

Grid Dynamics is a digital-native technology services provider that accelerates growth and strengthens competitive advantage for Fortune 1000 companies. They are seeking a Senior MLOps Engineer to design and implement GPU optimization strategies, develop distributed training pipelines, and manage ML infrastructure across multi-cloud environments, while keeping up with product and industry developments.

Big Data, Cloud Computing, Mobile Apps, Outsourcing, Software Engineering
No H1B

Responsibilities

Design and implement GPU optimization strategies to maximize utilization and reduce latency for ML workloads
Develop and maintain distributed training pipelines using the Ray framework for large-scale model development
Manage and optimize ML infrastructure across multi-cloud environments, with a focus on cost-efficiency and scalability
Build monitoring and profiling tools for GPU performance analysis and resource allocation optimization
Collaborate with data scientists and ML engineers to streamline model training, inference, and deployment processes
Implement best practices for workload orchestration, fault tolerance, and auto-scaling in cloud environments
Stay current with GPU architectures, ML frameworks, and cloud technologies to drive continuous infrastructure improvements

Qualifications

ML infrastructure experience, GPU optimization, Ray framework, AWS/EKS, CUDA programming, Deep learning frameworks, Kubernetes, Terraform, Analytical skills, Problem-solving skills, Collaboration skills

Required

5+ years of ML infrastructure experience with 3+ years focused on GPU optimization
Hands-on experience with AWS/EKS for ML workloads in production environments
Proven expertise with the Ray framework (Ray Train, Ray Tune, Ray Serve) for distributed ML computing
Bachelor's or Master's degree in Computer Science, Engineering, or a related field
Proficiency with deep learning frameworks (TensorFlow, PyTorch, JAX) and performance tuning
Experience with Kubernetes, Terraform, and infrastructure-as-code practices
Strong analytical and problem-solving skills for diagnosing complex performance bottlenecks
Ability to collaborate effectively with data science, engineering, and DevOps teams

Preferred

Strong CUDA programming skills for GPU performance optimization (experience with cuDNN and TensorRT is a plus)

Benefits

Medical insurance
Vision
Dental

Company

Grid Dynamics

Grid Dynamics is a provider of enterprise-level digital transformation solutions.

Funding

Current Stage: Public Company
Total Funding: $124.5M
Key Investors: Benhamou Global Ventures
2024-11-12 · Post-IPO Equity · $108M
2020-03-06 · IPO
2019-05-31 · Series Unknown · $15M

Leadership Team

Ilya Katsov
CTO, Americas Region
Rahul Bindlish
Vice President Strategic Sales, Emerging Business & Partnerships
Company data provided by Crunchbase