
Prime Solutions Group, Inc. · 1 month ago

Senior DevOps Engineer (AI/ML Ops)

Prime Solutions Group (PSG), Inc. is seeking a Senior DevOps Engineer (AI/ML Ops) to lead the development of secure, scalable, and automated ML platforms powering mission-critical AI/ML programs. In this high-impact role, you will architect and operate end-to-end ML pipelines across classified and unclassified environments, enabling next-generation AI capabilities for defense and advanced sensing systems.
Information Services, Information Technology, Security, Software
No H1B, Security Clearance Required, U.S. Citizen Only

Responsibilities

Design, build, and maintain ML-focused CI/CD pipelines with automated testing, security checks, and model validation gates
Architect and implement data ingestion, ETL/ELT, and feature engineering pipelines using modern data engineering frameworks
Lead development of training, evaluation, and retraining workflows with experiment tracking and model registry integration
Containerize and deploy ML models (REST/gRPC microservices, batch jobs, and streaming inference) using Docker and Kubernetes across cloud and on-prem environments
Implement Infrastructure-as-Code (IaC) using Terraform, Ansible, or similar tools for provisioning compute, storage, networking, and GPU resources
Integrate data quality checks, drift detection, and model performance monitoring into production ML systems
Ensure ML workloads comply with NIST, RMF, FedRAMP, and PSG security baselines (image scanning, SBOMs, secrets management, hardening)
Partner with data scientists and software engineers to move models from experimentation to production, including packaging, dependency management, and optimization
Monitor ML infrastructure using Prometheus/Grafana, ELK/EFK, or similar observability stacks; lead incident root-cause analysis
Independently lead projects, influence architecture decisions, and navigate tool selection for enterprise ML platforms
Integrate ML-specific security and quality testing into workflows (SAST/DAST, container security scanning, policy-as-code)
Develop technical documentation, runbooks, diagrams, and risk assessments for ML platforms
Mentor junior staff and provide guidance on architecture, pipelines, code quality, and operational best practices
Participate in architecture reviews, compliance assessments, and configuration management processes

Qualifications

MLOps, DevSecOps, CI/CD pipelines, Docker, Kubernetes, Python, Terraform, Ansible, AWS/Azure/GCP, Data engineering, Scripting skills, Communication skills, Documentation skills, Mentoring

Required

U.S. Citizenship (required)
Active Top-Secret Clearance (or higher)
Bachelor's degree in Computer Science, Engineering, Data Science, Mathematics, or related field
4–6+ years of experience in at least one of the following: MLOps / ML platform engineering; DevOps / DevSecOps / SRE for ML workloads; data engineering with production ML workflows; applied ML in production environments
Strong experience with secure CI/CD pipelines and IaC (GitLab CI, Jenkins, GitHub Actions, Terraform, Ansible)
Hands-on expertise with Docker, Kubernetes, and at least one major cloud provider (AWS/Azure/GCP), including GPU/HPC support
Strong understanding of the full ML lifecycle (data → features → training → validation → deployment → monitoring → retraining)
Proficiency with Python and standard ML/data libraries (NumPy, pandas, scikit-learn, PyTorch, TensorFlow)
Strong scripting skills (Python, Bash, PowerShell) for automation
Familiarity with RMF, STIGs, DISA, and secure ML deployment practices
Ability to lead projects, make architecture decisions, and mentor technical staff
Excellent communication and documentation skills

Preferred

Master's degree in a related field
Active Security Clearance above minimum requirements (SCI, CI Poly)
Industry certifications: AWS ML Specialty, AWS DevOps, CKA/CKS, etc.
Experience with MLflow, Weights & Biases, SageMaker, or similar registries/experiment tracking tools; orchestration frameworks (Airflow, Kubeflow, Prefect, Dagster); and feature stores and data validation tools (Great Expectations, Feast)
Experience with Zero Trust, SBOMs, and secure software supply chain principles
Familiarity with NIST 800-53, FedRAMP, and ISO 27001 as they relate to ML/AI systems
Kubernetes security expertise (RBAC, network policies, hardened images)
Background supporting defense, intelligence, or other high-assurance environments

Benefits

Competitive compensation & benefits
Professional development & tuition assistance
Collaborative, mission-driven culture
A small-company environment where innovation moves fast
Direct impact on high-visibility government programs leveraging advanced AI/ML

Company

Prime Solutions Group, Inc.

Prime Solutions Group, Inc. (PSG) provides engineering services and software data processing products for remote sensing systems.

Funding

Current Stage: Growth Stage
Company data provided by Crunchbase