Cadence · 1 day ago

AI Senior Staff Systems Engineer (R51191)

San Jose, CA

Full-time

Onsite

Senior Level, Lead/Staff

$136K/yr - $254K/yr

10+ years exp

Cadence is seeking a highly skilled and experienced AI Systems Engineer to join their team. This senior individual contributor role will lead the development, operations, and support of their entire AI infrastructure, focusing on architecting and optimizing high-performance GPU clusters and advanced AI models.

Aerospace, Electronic Design Automation (EDA), Hardware, Mobile, Semiconductor, Software

Growth Opportunities

H1B Sponsor Likely

Responsibilities

AI Infrastructure Architecture & Strategy: Lead the design and implementation of our next-generation AI infrastructure to support our Agentic AI initiatives. You will define the technical strategy for our on-premises GPU clusters, storage solutions, and networking to ensure optimal performance, scalability, and reliability for all our AI workloads

Cloud AI Service Integration: Support and secure the use of public cloud AI services, including Azure OpenAI services and Google Cloud Platform (GCP) services like Gemini. This includes managing secure access, monitoring usage, and tracking billing to ensure cost-effectiveness. You will also provide hands-on support for compute, GPUs, and AI services on both GCP and Azure

Hands-on GPU Cluster Management: Take a leadership role in the configuration, installation, and optimization of GPU server clusters. This includes advanced troubleshooting of hardware and software, performance tuning, and implementing best practices for cluster utilization and resource management. You will be an expert in administering job schedulers like LSF in a production environment, including integration with Docker for containerized job submission

Full-Stack AI Tech Stack Development & Operations: Architect and deploy a robust and scalable AI tech stack. You will be responsible for the end-to-end operational lifecycle, including setting up and managing deep learning frameworks (PyTorch, TensorFlow), containerization with Docker and Kubernetes, and implementing CI/CD pipelines for AI model development

Advanced LLM Deployment & Optimization: Lead the deployment, serving, and optimization of Large Language Models (LLMs). You will be an expert in techniques such as model quantization, distillation, and using high-performance serving frameworks (e.g., vLLM, TGI, TensorRT-LLM) to maximize inference throughput and minimize latency

Agentic AI Workflow & Service Engineering: Architect and build production-grade Agentic AI workflows and services. You will be responsible for the technical design and implementation of systems that integrate LLMs with external tools, APIs, and databases, and will mentor other engineers on building robust and scalable AI agent applications

Automation & Monitoring: Develop and maintain automation scripts using languages like Python, Bash, or Perl to streamline system maintenance, deployment, and reporting. Implement and manage monitoring solutions for system health, job statuses, GPU utilization, and container performance to proactively identify and resolve issues

AI Systems Support & Mentorship: Act as the final escalation point for the most complex technical issues related to our AI infrastructure. You will also serve as a technical leader and mentor to other engineers, providing guidance on best practices in AI systems engineering, performance tuning, and operational excellence

Security and Compliance: Develop and implement security best practices for our AI systems and data, ensuring compliance with relevant regulations and protecting our intellectual property

Qualifications

NVIDIA GPU architecture, AI infrastructure management, Cloud AI services, Deep learning frameworks, Docker, Linux system administration, Scripting languages, Problem-solving skills, Communication skills, Mentorship

Required

10+ years of experience in a senior technical role, with at least 5 years focused on building and operating high-performance computing or AI infrastructure. Proven track record as a Principal or Senior Staff Engineer

Expert-level knowledge of NVIDIA GPU architecture and technologies like CUDA and cuDNN. Extensive experience with multi-GPU and multi-node training and inference

Proven experience with public cloud AI services, specifically managing access, usage, and billing for Azure OpenAI and Google Cloud Platform (GCP) services

Extensive hands-on experience with Docker: image management, container orchestration, and troubleshooting

Proficiency in scripting languages such as Python, Bash, or Perl

Deep expertise in Linux system administration (RHEL preferred), including networking, storage, and performance tuning

Familiarity with user authentication and integration using systems like LDAP or Active Directory

Strong problem-solving and communication skills with the ability to work in a multi-platform, cross-functional, and geographically distributed team

Preferred

Understanding of AI job profiling and tuning (memory, GPU, I/O)

Experience administering LSF clusters in a production or research environment. Familiarity with other job schedulers like Slurm is a plus

Experience with LSF Docker integration and job submission using container images

Experience with macOS/Apple silicon system administration tasks and troubleshooting

Benefits

Paid vacation and paid holidays

401(k) plan with employer match

Employee stock purchase plan

A variety of medical, dental and vision plan options

And more.

Company

Cadence

Glassdoor: 4.3

Cadence is a market leader in AI and digital twins, pioneering the application of computational software to accelerate innovation in the engineering design of silicon to systems.

Founded in 1988

San Jose, California, USA

10,001+ employees

https://www.cadence.com

H1B Sponsorship

Cadence has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown)

Trends of Total Sponsorships

2025 (306)

2024 (221)

2023 (282)

2022 (330)

2021 (233)

2020 (209)

Funding

Current Stage

Public Company

Total Funding

Unknown

IPO: 1998-02-20

Leadership Team

Paul Cunningham

Senior Vice President and General Manager

Tom Beckley

Senior Vice President, Custom IC & PCB Group

Recent News

MarketScreener

Cadence Design Systems to integrate CUDA X technology into chip design software - Nvidia CEO at CES

2026-01-06

EE Times

Cadence’s Conformal AI Studio Redefines Logic Verification for the AI-Driven SoC Era

2025-12-30

eeNews Europe

The 2025 deals reshaping the semiconductor industry

2025-12-30

Company data provided by Crunchbase