Artificial Intelligence | Machine Learning Engineer (AI/ML Operations) jobs in United States
This job has closed.

Integrated Resources, Inc (IRI) · 1 day ago

Artificial Intelligence | Machine Learning Engineer (AI/ML Operations)

Integrated Resources, Inc (IRI) is seeking an AI/ML Engineer with a strong focus on operations, analytics, and platform support. This role will manage the operational health of AI/ML models and multi-cloud AI platforms while collaborating with engineering and infrastructure teams.
Biotechnology · Health Care · Human Resources · Recruiting · Staffing Agency
H1B Sponsor Likely

Responsibilities

Manage operational workflows for model deployments, updates, and versioning across GCP, Azure, and AWS
Monitor model performance metrics including latency, throughput, error rates, token usage, and inference quality
Track model drift, accuracy degradation, and performance anomalies; escalate issues to engineering teams
Support knowledge base operations including vector embedding pipelines, chunk quality, and refresh cycles
Maintain model inventory and technical documentation across multi-cloud environments
Coordinate model evaluation cycles with Responsible AI and Core Engineering teams
Monitor AI agent health, performance, and reliability (AutoGen-based agents, MCP servers)
Track agent execution metrics such as task completion rates, tool call success/failure, latency, and error patterns
Support agent deployment and configuration management workflows
Document agent behaviors, known issues, and operational runbooks
Coordinate with engineering teams on agent updates, testing, and rollouts
Monitor MCP server availability, connection health, and integration status
Track and analyze AI/ML cloud spend across GCP (Vertex AI), Azure (OpenAI), and AWS (Bedrock)
Build cost dashboards by model, application team, use case, and environment
Monitor token consumption, inference costs, and embedding/storage costs
Identify cost optimization opportunities (model selection, caching, batching, rightsizing)
Provide cost allocation reports for chargeback/showback
Forecast spend trends and flag budget anomalies
Partner with Infrastructure and Finance teams on AI cost governance
Build and maintain dashboards for platform performance, model health, agent metrics, and operational KPIs
Create executive and stakeholder reports on platform adoption, usage trends, and cost allocation
Develop Responsible AI dashboards tracking hallucination rates, accuracy metrics, guardrail triggers, and safety incidents
Monitor API gateway traffic patterns and consumption trends
Provide regular reporting to product and engineering leadership
Support release management processes with pre- and post-deployment validation checks
Track release health metrics for models, agents, and platform components
Maintain release documentation, runbooks, and operational playbooks
Coordinate with QA, Performance Engineering, and Infrastructure teams during releases
Monitor guardrail effectiveness and escalate anomalies to Responsible AI teams
Track hallucination detection, content safety triggers, and accuracy trends
Support LLM red-teaming efforts by collecting and organizing evaluation data
Maintain audit logs and compliance documentation for AI governance
Act as the operational point of contact for application teams using AI APIs
Coordinate with Security teams on audits and compliance reporting
Partner with Infrastructure teams on capacity planning and resource utilization
Support performance engineering with load test analysis and documentation

Qualifications

MLOps · Cloud cost management · Dashboarding tools · GCP · Python · SQL · Monitoring tools · AI/ML concepts · Analytical skills · Communication skills

Required

2–4 years of experience in an operations, analytics, or technical operations role (MLOps, AIOps, DataOps, Platform Ops, or similar)
Understanding of AI/ML concepts: models, inference, embeddings, vector databases, LLMs, tokens, and prompts
Experience with cloud cost management and FinOps practices
Strong proficiency with dashboarding/visualization tools (Looker, Tableau, Grafana, or similar)
Working knowledge of GCP (required); familiarity with Azure and AWS is a plus
Experience with SQL and basic Python for analysis or scripting
Experience with monitoring/observability tools (Datadog, Prometheus, Grafana, Cloud Monitoring, etc.)
Understanding of APIs and API gateways; ability to read logs and analyze traffic
Strong analytical, troubleshooting, and communication skills
Bachelor's degree in Computer Science, BIS, MIS, Electrical Engineering, Mechanical Engineering, or a related field

Preferred

Hands-on experience with LLM platforms such as Vertex AI, Azure OpenAI, or AWS Bedrock
Familiarity with AI agents and agentic frameworks (AutoGen, LangChain, etc.)
Exposure to MCP (Model Context Protocol) or agent-tool integration patterns
Experience with vector databases and RAG operations
Understanding of the MLOps lifecycle: model registry, versioning, deployment, A/B testing
Experience with APIGEE or similar API management platforms
Familiarity with Responsible AI metrics (hallucination, bias, content safety, guardrails)
FinOps certification or formal cloud cost management experience
Experience supporting enterprise AI platforms with multiple application teams
Familiarity with ML pipeline tools (Kubeflow, MLflow, Vertex AI Pipelines)
Exposure to prompt management and evaluation frameworks
ITIL or similar operational process framework experience
Experience creating runbooks and operational documentation

Company

Integrated Resources, Inc (IRI)


H1B Sponsorship

Integrated Resources, Inc (IRI) has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (7)
2024 (5)
2023 (13)
2022 (9)
2021 (25)
2020 (39)

Funding

Current Stage: Late Stage
Company data provided by Crunchbase