Tenstorrent · 13 hours ago
Staff Engineer, HPC Infrastructure
Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. They are seeking a Staff HPC Engineer who will design and maintain automated bare-metal provisioning pipelines and ensure the performance and reliability of RHEL/Ubuntu systems as compute demands scale.
AI Infrastructure · Application Specific Integrated Circuit (ASIC) · Artificial Intelligence (AI) · Electronics · Machine Learning · Semiconductor
Responsibilities
Design and maintain automated bare-metal provisioning pipelines that deploy hundreds of compute nodes globally with consistent configurations
Implement infrastructure-as-code practices using Ansible to manage large-scale OS configuration across diverse hardware platforms
Own the lifecycle management of RHEL and Ubuntu systems—from initial deployment through patching, upgrades, and performance tuning
Build automation and tooling to streamline provisioning, patching, and system updates as the compute environment scales
Troubleshoot OS-level issues, optimize kernel parameters, and resolve system performance bottlenecks that impact EDA workflows
Work directly with hardware design teams to standardize system configurations, toolchains, and development environments
Deploy and lifecycle manage systems across Tenstorrent's global engineering sites, ensuring consistency and reliability
Qualifications
Required
Deep experience with IBM Spectrum LSF or similar workload managers
Strong background in commercial HPC storage platforms such as Pure Storage FlashBlade, Weka, or NetApp
Hands-on experience with container technologies (Docker, Singularity, Podman)
Solid Linux system administration skills
Understanding of HPC networking, storage architectures, and job scheduling
Ability to diagnose and resolve complex infrastructure issues independently
Comfortable working in a startup environment with rapidly changing requirements
Preferred
Experience supporting EDA tools and hardware design workflows in production HPC environments
Hands-on expertise with commercial HPC storage platforms (Pure Storage, Weka, NetApp) and workload managers (LSF, Slurm)
Container technologies (Docker, Singularity, Podman) for reproducible compute environments at scale
Advanced provisioning techniques (PXE boot, kickstart, cloud-init) and modern infrastructure automation patterns
Cluster monitoring and observability tools (Prometheus, Grafana) for managing thousands of compute nodes
Security hardening and compliance frameworks for multi-tenant semiconductor design environments
Integration of open-source and commercial tools to improve provisioning efficiency and reliability
Work in a deeply technical environment solving infrastructure challenges that directly impact chip design velocity
Benefits
Highly competitive compensation package and benefits
Company
Tenstorrent
Tenstorrent develops AI hardware and software solutions for data processing and machine learning applications.
Funding
Current Stage: Late Stage
Total Funding: $1.03B
Key Investors: Fidelity, EPIQ Capital Group, Eclipse Ventures
2024-12-02 · Series D · $693M
2023-08-02 · Series Unknown · $100M
2021-05-20 · Series C · $200M
Company data provided by Crunchbase