CentML · 3 hours ago
Senior Software Engineer, Infrastructure
Artificial Intelligence (AI) · Enterprise Software
Responsibilities
Design and lead the development of the CentML platform's deployment infrastructure, which manages the hardware resources needed to deploy ML training and inference applications.
Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads so that the cluster's hardware resources are used efficiently.
Communicate with our product teams and define new features and goals for improving the CentML platform.
Qualifications
Required
4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform)
Experience deploying and managing cloud infrastructure on AWS, GCP, or Azure
Strong coding skills in languages like Python, Java, Go, and/or C/C++
Preferred
A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O
Past experience building GPU clusters for large-scale ML training and inference is desirable
Knowledge of GPU architecture and NVIDIA GPU virtualization technologies is highly desirable
Benefits
An open and inclusive culture and work environment
Free lunch twice a week
Full health and dental benefits
Parental Leave top-up for 6 months
Gym membership
Continuous education budget
Generous vacation - we're not saying unlimited, but if you need extra time to recharge, just ask
Employee stock options
Best-in-class medical and dental benefits
Professional development budget
Flexible vacation time to promote a healthy work-life blend
Company
CentML
CentML accelerates machine learning workloads by maximizing training and inference efficiency, which reduces compute costs.
Funding
Current Stage: Early Stage
Total Funding: $30.31M
Key Investors: Gradient, Radical Ventures
2023-10-25 · Seed · $26.81M
2022-06-30 · Pre Seed · $3.5M
Company data provided by Crunchbase