Graphics Processing Unit (GPU) Engineer jobs in United States

Leidos · 3 months ago

Graphics Processing Unit (GPU) Engineer

Leidos delivers innovative solutions through its diverse and talented workforce. The company is seeking a highly skilled Graphics Processing Unit (GPU) Engineer to design, develop, and optimize GPU clusters that power enterprise AI for mission customers, including smooth integration with Linux-based systems.

Computer · Government · Information Services · Information Technology · National Security · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Design, configure, and maintain GPU clusters
Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power efficiency, and feature requirements
Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems
Optimize GPU drivers for compatibility, reliability, and performance
Provide regular maintenance and updates
Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers
Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments
Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt
Maintain technical documentation, architectural specifications, and Linux best practices
Support ATO (Authority to Operate) and ensure compliance with federal security standards

Qualifications

GPU Cluster Engineering · Linux expertise · NVIDIA GPU management · Performance Optimization · Kubernetes management · Scripting Bash · Scripting Python · Compliance knowledge · Problem-solving · Collaboration · Technical documentation

Required

Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree
10+ years of relevant systems engineering experience
Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s)
Knowledge of enterprise server components (storage/network controllers, HBA, SSDs)
Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle Linux, and Rocky Linux)
Excellent problem-solving skills and the ability to collaborate within a team
Candidates must, at a minimum, meet DoD 8570.11 IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification is also acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP)
Active TS/SCI clearance with polygraph required, or active TS/SCI clearance and willingness to obtain and maintain a polygraph
U.S. citizenship is required due to the nature of the government contracts we support

Preferred

Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow)
Familiarity with GPU virtualization and cloud computing
Experience with Prometheus/Grafana for monitoring
Knowledge of distributed resource scheduling systems (Slurm (preferred), LSF, etc.)

Company

Leidos is a Fortune 500® innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Funding

Current Stage: Public Company
Total Funding: unknown
2025-02-20: Post-IPO Debt
2013-09-17: IPO

Leadership Team

James Carlini · Chief Technology Officer
Theodore Tanner · Chief Technology Officer
Company data provided by Crunchbase