HPC/AI Platform Engineering jobs in United States

Eli Lilly and Company · 4 months ago

HPC/AI Platform Engineering

Eli Lilly and Company is a global healthcare leader headquartered in Indianapolis, Indiana, dedicated to improving lives through innovative medicines. The company is seeking an expert in HPC and AI platform engineering to drive the engineering and operations of advanced Linux platforms supporting AI and HPC workloads, while optimizing infrastructure for AI/ML applications.

Biotechnology · Health Care · Medical · Pharmaceutical
H1B Sponsor Likely

Responsibilities

Driving the engineering and operations of advanced Linux platforms supporting AI and HPC workloads
Managing Nvidia DGX systems using Mission Control, Base Command and Run:AI
Optimizing Spectrum X networking and WEKA storage for AI/ML applications
Leading the strategy, engineering, and development of advanced Linux computing capabilities for AI/ML
Consulting with the senior Linux platform engineer on global Linux strategy for on-premises private cloud and public IaaS Linux services

Qualifications

Linux system administration · Nvidia DGX server management · AI/ML workloads · Spectrum X networking · Weka Storage integration · Scripting skills · Containerization · High Performance Computing · Infrastructure as Code · Distributed training workloads · Soft skills

Required

Expertise in Linux system administration, HPC environments, and Nvidia DGX server management
Experience with Spectrum X networking and parallel file systems is essential
Strong scripting skills and familiarity with containerization and automation tools are highly valued
6+ years of demonstrated experience in AI/ML and HPC workloads and infrastructure
Hands-on experience in using or operating High Performance Computing (HPC) grade infrastructure
In-depth knowledge of accelerated computing (e.g., GPU), storage (e.g., Weka), scheduling and orchestration (e.g., Slurm, Kubernetes, LSF), high-speed networking (e.g., Ultra-Ethernet, RoCE), and container technologies (Docker)
Passion for continual learning and keeping abreast of new technologies and effective approaches in the AI/ML infrastructure field
Expertise in running and optimizing large-scale distributed training workloads using PyTorch (DDP, FSDP), NeMo, or JAX
Deep understanding of AI/ML workflows, encompassing data processing, model training, and inference pipelines
Some proficiency in at least one scripting language such as Bash, Python, or equivalent
Bachelor's degree in Computer Science, Information Technology, or a related technical field
7+ years of experience as a Linux OS/Platform Engineer
Demonstrated experience leading a large-scale global infrastructure project

Benefits

Company-sponsored 401(k)
Pension
Vacation benefits
Medical, dental, vision and prescription drug benefits
Flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
Life insurance and death benefits
Certain time off and leave of absence benefits
Well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)

Company

Eli Lilly and Company

We're a medicine company turning science into healing to make life better for people around the world.

H1B Sponsorship

Eli Lilly and Company has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (514)
2024 (236)
2023 (167)
2022 (133)
2021 (57)
2020 (52)

Funding

Current Stage
Public Company
Total Funding
$6.5M
2024-02-12 · Post-IPO Debt · $6.5M
1978-01-13 · IPO

Leadership Team

David Ricks
Chair, CEO
Lucas Montarce
Executive Vice President and Chief Financial Officer
Company data provided by Crunchbase