AI Platforms Leader Enterprise AI Platforms jobs in United States

Qualcomm · 3 hours ago

AI Platforms Leader Enterprise AI Platforms

Qualcomm is a leading technology company seeking an experienced AI Platforms Leader to own the strategy, architecture, and operation of its end-to-end AI Platform. The role involves leading a high-caliber engineering team to deliver reliable infrastructure for AI/ML applications while keeping the platform aligned with business needs and cost efficiency.

Artificial Intelligence (AI), Generative AI, Software, Telecommunications, Wireless
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Own the AI Platform strategy & roadmap
Define the multi‑year vision for a multi‑tenant, hybrid (on‑prem + cloud) AI platform, aligned to business needs, developer productivity, and cost efficiency
Establish clear platform SLAs/SLOs, reliability goals, and security/compliance guardrails
Run GPU-based compute at scale
Operate and optimize on‑prem GPU clusters (e.g., Kubernetes + GPU operator and/or Slurm), including capacity planning, scheduling, partitioning, NCCL, and high‑throughput storage/networking
Drive GPU utilization efficiency, right‑sizing, and cost transparency across training and inference workloads
Deliver MLOps & LLMOps as a product
Provide golden paths for data prep, training/fine‑tuning, model registry, lineage, governance, evaluation, red‑teaming, and safe deployment (batch, online, streaming)
Implement CI/CD for models, prompts, and agents; automate evaluations and rollout/rollback with canaries, A/B, and shadow deployments
Agentic AI, A2A, and MCP ecosystem
Lead the design and operation of agentic orchestration (A2A patterns), tool integration, and MCP (Model Context Protocol) servers to securely expose enterprise tools and data
Standardize agent capability schemas, guardrails, observability, and policy enforcement
Cloud AI/ML platforms
Leverage AWS/Azure AI services for training and inference (e.g., Bedrock/SageMaker/EKS; Azure AI Studio/Azure ML/AKS/Azure OpenAI) with robust networking, identity, secrets, and cost controls
Establish multi‑cloud patterns for portability, resilience, and vendor risk management
Platform engineering & DevOps excellence
Own core platform services: identity/RBAC, secrets, service meshes, observability (logs/metrics/traces), data access controls, vector stores, feature stores, and model gateways (e.g., KServe/Triton/vLLM)
Use GitOps/IaC (Terraform/Bicep/Helm) and secure software supply chain practices (SBOMs, image signing, policy as code)
Operational leadership
Lead a ~10‑engineer global team (platform, SRE, MLOps/LLMOps) with global collaboration, 24×7 readiness, and a healthy on‑call rotation
Drive incident response, post‑mortems, and continuous improvement. Partner with Security, Legal, and Compliance for model/data governance
Stakeholder & vendor management
Partner with product, data, and application teams to enable high‑impact AI use cases
Manage strategic vendors (e.g., cloud, GPU, enterprise AI tooling) and negotiate licenses/SOWs aligned to roadmap and budget

Qualifications

AI/ML platform architecture, GPU cluster operations, MLOps & LLMOps, Cloud services (AWS/GCP/Azure), DevOps/Platform Engineering, Agentic AI & MCP, Operational excellence, Global collaboration, Programming languages, Security & governance, Leadership, Communication skills

Required

15+ years overall engineering/technology experience, including ~10 years building and operating large‑scale platforms (AI/ML, data, or high‑performance computing)
Leadership: Proven experience leading a team of ~10 engineers for 5+ years, across platform/SRE/MLOps/LLMOps, with coaching, hiring, performance management, and clear execution rhythms
GPU cluster expertise: Hands‑on operations for on‑prem GPU clusters (Kubernetes + GPU operator and/or Slurm), scheduling, capacity planning, performance tuning, and reliability
MLOps & LLMOps: Strong experience with model lifecycle (data → training → registry → deployment), model/agent evaluation, safety/guardrails, and observability
Cloud (AWS/GCP/Azure): Deep experience with AI/ML services and managed Kubernetes (EKS/AKS/GKE), networking, security, identity, and cost management
DevOps/Platform Engineering: CI/CD, GitOps, IaC (Terraform/Bicep/Helm), containerization (Docker), Kubernetes, and secure SDLC practices
Agentic AI & MCP: Solid understanding of agent orchestration, A2A patterns, tool abstractions, and operating MCP servers in production
Operational excellence: Demonstrated success running AI or computing clusters with SLOs, on‑call, incident management, and post‑mortems
Global collaboration: Experience leading a distributed engineering team across time zones
Education: Bachelor's degree in Engineering, Computer Science, or related field
Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 8+ years of Software Engineering or related work experience, or
Master's degree in Engineering, Information Systems, Computer Science, or related field and 7+ years of Software Engineering or related work experience, or
PhD in Engineering, Information Systems, Computer Science, or related field and 6+ years of Software Engineering or related work experience
4+ years of work experience with a programming language such as C, C++, Java, or Python

Preferred

Master's or PhD in CS/EE/Math or related field
Experience with:
Training & inference stacks: PyTorch, CUDA/cuDNN, Triton Inference Server, vLLM, KServe, Ray, Slurm
Data & storage: High‑throughput storage (e.g., Lustre, BeeGFS, Ceph), vector databases (e.g., FAISS, Milvus, Pinecone, Azure AI Search), feature stores (e.g., Feast)
MLOps toolchain: MLflow/Vertex/Azure ML/SageMaker registries, Airflow/Argo, Weights & Biases, LangSmith, Prompt/version management
Security & governance: OIDC/RBAC, policy as code (OPA), secrets management (AWS Secrets Manager/Azure Key Vault), model governance/risk controls, privacy/PII safeguards
Agentic frameworks: Semantic Kernel, LangChain, CrewAI, AutoGen (or equivalents) and experience integrating enterprise tools via MCP
Proven track record shipping platform capabilities that enable multiple product teams (self‑service, docs, SDKs, templates, golden paths)
Strong communication with executives and technical leaders; clear metrics, dashboards, and business value storytelling

Benefits

Competitive annual discretionary bonus program
Opportunity for annual RSU grants
Highly competitive benefits package

Company

Qualcomm

Qualcomm designs wireless technologies and semiconductors that power connectivity, communication, and smart devices.

H1B Sponsorship

Qualcomm has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships (year: total sponsorships)
2025: 2,013
2024: 1,910
2023: 3,216
2022: 2,885
2021: 2,104
2020: 1,181

Funding

Current Stage: Public Company
Total Funding: $3.5M
1991-12-20: IPO
1988-01-01: Undisclosed · $3.5M

Leadership Team

Cristiano Amon, President & CEO
Isaac Eteminan, CEO
Company data provided by Crunchbase