DataCrunch · 1 month ago
Senior / Principal Site Reliability Engineer
DataCrunch is building a fully featured European AI cloud focused on providing low-cost access to intelligence. They are seeking a Senior or Principal Site Reliability Engineer to strengthen and scale their HPC and cloud infrastructure, ensuring systems remain reliable, observable, and highly performant.
Artificial Intelligence (AI) · Information Technology · Machine Learning · Software
Responsibilities
Ensure the reliability, scalability, and performance of HPC and cloud systems
Build and maintain automation, observability, and monitoring frameworks for compute clusters
Collaborate with ML, data, and infrastructure teams to deliver high-availability systems
Develop and enhance CI/CD pipelines, deployment workflows, and on-call processes
Participate in architecture design and long-term infrastructure strategy discussions
Participate in a 24/7 on-call rotation, with at least one full on-call week per month
Qualifications
Required
7+ years in SRE, DevOps, or Infrastructure Engineering, preferably in HPC or large-scale distributed systems
Linux expertise (Ubuntu or Debian preferred)
Strong experience with scripting and automation (Python, Go, Bash)
Proven experience with cloud platforms (AWS, GCP, Azure, or modern HPC providers such as CoreWeave, Lambda, or Nebius)
Deep understanding of networking (DNS/TCP) and infrastructure-as-code tools (Terraform, Ansible)
Experience managing Slurm-based HPC GPU clusters, diagnosing performance issues, and designing efficient HPC jobs
Benefits
Generous cash and equity compensation, along with fringe benefits (e.g., healthcare, lunch, and wellbeing)
Company
DataCrunch
DataCrunch.io is a new cloud service provider whose main focus is providing its own infrastructure for machine learning.
Funding
Current Stage: Growth Stage
Total Funding: $78.56M
Key Investors: byFounders
2025-09-08 · Series A · $64.47M
2025-09-08 · Debt Financing
2024-10-21 · Seed · $7.6M
Company data provided by Crunchbase.