Tenstorrent · 2 weeks ago
Staff Engineer, HPC Systems Software
Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. We are seeking an HPC Systems Engineer to architect and maintain the operating system foundation that powers our global hardware design infrastructure.
AI Infrastructure · Application Specific Integrated Circuit (ASIC) · Artificial Intelligence (AI) · Electronics · Machine Learning · Semiconductor
Responsibilities
Design and maintain automated OS deployment pipelines for bare-metal HPC clusters globally
Manage large-scale configuration management using Ansible to ensure consistency across compute infrastructure
Deploy RHEL and Ubuntu systems and manage their lifecycle across diverse hardware platforms
Implement infrastructure-as-code for repeatable, version-controlled system configurations
Troubleshoot OS-level issues, optimize kernel parameters, and resolve system performance bottlenecks
Collaborate with hardware design teams to standardize system configurations, toolchains, and development environments
Build automation and tooling to streamline provisioning, patching, and system updates at scale
Qualifications
Required
Experienced in RHEL and Ubuntu administration in HPC or large-scale compute environments
Highly skilled in Ansible for automation and configuration management across hundreds of nodes
Proficient with bare-metal provisioning systems (MAAS, Foreman, Cobbler, Warewulf, or similar)
Deep understanding of Linux system internals, networking, kernel tuning, and performance troubleshooting
Familiar with HPC cluster architecture, workflows, and infrastructure-as-code practices
Capable of diagnosing and resolving complex infrastructure issues independently in fast-paced environments
Preferred
Hands-on experience with IBM Spectrum LSF or similar HPC workload managers
Integration with commercial HPC storage platforms (Pure Storage, Weka, NetApp, DDN, Vast Data)
Deep exposure to EDA tools and hardware design workflows in semiconductor development
Container technologies (Docker, Singularity, Podman) for reproducible compute environments
Cluster monitoring and observability at scale using Prometheus, Grafana, and custom tooling
Advanced provisioning techniques including PXE boot, kickstart, cloud-init, and BMC/IPMI integration
Security hardening and compliance frameworks for multi-tenant HPC environments
Python and bash scripting for production-level infrastructure automation
Benefits
Highly competitive compensation package and benefits
Company
Tenstorrent
Tenstorrent develops AI hardware and software solutions for data processing and machine learning applications.
Funding
Current Stage
Late Stage
Total Funding
$1.03B
Key Investors
Fidelity, EPIQ Capital Group, Eclipse Ventures
2024-12-02 · Series D · $693M
2023-08-02 · Series Unknown · $100M
2021-05-20 · Series C · $200M
Company data provided by crunchbase