KLA · 1 day ago
AI Ops Engineer
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. They are seeking a highly skilled Senior AI Ops Engineer to architect and deliver automation for scalable model development, focusing on end-to-end experiment management and model fine-tuning pipelines. The role involves implementing standards for experiment tracking and building automated pipelines for model training and evaluation.
Electronics · Information Technology · Manufacturing
Responsibilities
Implement and operate experiment tracking, lineage, and reproducibility standards (datasets, code, configs, artifacts, metrics) using MLflow/W&B or equivalents
Build CI/CD for ML: tests (unit/integration), packaging, reproducibility checks, policy gates, automated deployment and rollback strategies
Design workflow orchestration for large-scale ML jobs (scheduled runs, triggered retrains, parameter sweeps, gated releases) using tools such as Airflow/Kubeflow/Argo or equivalents
Architect, build, and own automated pipelines for model training, fine-tuning (e.g., PEFT/LoRA), evaluation, and promotion across environments (dev → staging → production)
Establish standardized training “recipes” (configs, templates, golden paths) to reduce time-to-first-experiment and improve consistency across teams
Enable and optimize distributed GPU training (throughput, reliability, and cost), including checkpointing, mixed precision, fault tolerance, and spot/preemptible handling where applicable
Develop evaluation harnesses and automated benchmark suites (quality, safety, latency, and cost) with clear, repeatable reporting to compare runs and releases
Qualifications
Required
Strong proficiency in Python and experience building robust automation frameworks and production-grade services for ML workloads
Hands-on experience with experiment tracking and model lifecycle tooling (e.g., MLflow, Weights & Biases) and reproducible ML workflows
Practical experience fine-tuning modern deep learning models (e.g., Transformers) and familiarity with parameter-efficient approaches (LoRA/PEFT)
Working knowledge of RLHF concepts and pipelines (preference data, reward models, policy optimization) and how to operationalize human-in-the-loop workflows
Experience with containerization (Docker), orchestration (Kubernetes), and operating GPU workloads reliably at scale
Experience with CI/CD, version control (Git), and Infrastructure-as-Code (Terraform/Bicep or equivalent)
Excellent problem-solving skills across distributed systems (training jobs, pipelines, compute infrastructure) and strong communication to partner with research and engineering teams
Bachelor's degree in Computer Science, Software Engineering, or related field
5+ years of experience in MLOps/Platform Engineering/DevOps/ML Engineering (or demonstrated equivalent impact), including owning production systems and leading cross-team initiatives
Preferred
Prior experience in a similar industry and/or operating ML platforms with stringent IP/security requirements is a plus
Benefits
Medical
Dental
Vision
Life
Other voluntary benefits
401(k) including company matching
Employee stock purchase program (ESPP)
Student debt assistance
Tuition reimbursement program
Development and career growth opportunities and programs
Financial planning benefits
Wellness benefits including an employee assistance program (EAP)
Paid time off
Paid company holidays
Family care and bonding leave
Company
KLA
KLA creates tools and services that promote innovation in the electronics industry.
H1B Sponsorship
KLA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship (legend: job fields similar to this job)]
Trends of Total Sponsorships
2025 (343)
2024 (218)
2023 (191)
2022 (277)
2021 (200)
2020 (226)
Funding
Current Stage
Late Stage
Recent News
news.com.au — Australia’s leading news site for latest headlines
2025-08-01
Company data provided by Crunchbase