Senior Software Development Engineer in Test (SDET) - AI Cluster jobs in United States

Cerebras · 1 week ago

Senior Software Development Engineer in Test (SDET) - AI Cluster

Cerebras Systems builds the world's largest AI chip, providing significant advancements in AI compute power. The Senior Software Development Engineer in Test will innovate and execute tests on cutting-edge AI infrastructure, ensuring reliability and security for large-scale deployments.
AI Infrastructure, Artificial Intelligence (AI), Computer, Hardware, RISC, Semiconductor, Software
Growth Opportunities

Responsibilities

You will be hired to innovate and execute tests on cutting-edge AI infrastructure. Be a thinker: define optimized test strategies and methodologies
Cerebras is growing and innovating at a rapid pace, and so are the ML community and AI models. Be a quick learner, adapt to new technologies, and bring your expertise. We are looking to hire a team with a diverse skill set
Develop a deep understanding of how large-scale distributed ML training and inference work. Build a strong understanding of how to break these large distributed-systems challenges into smaller components that can be unit tested
Automation-first approach: in large-scale deployments, automation drives efficiency and scalability. Aim for 100% automated tests covering all cluster features in the areas of high availability, failure scenarios, performance, stress, and security
Champion cluster security, reliability for 99.9999% uptime, and ease of use through observability
Test all components of the AI cluster, including but not limited to cluster software involving Kubernetes, Prometheus, and Grafana, and cluster hardware components such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights through the MemoryX interconnect

Qualification

Python, Golang, C/C++, Debugging tools, Cloud technologies, Kubernetes, Machine Learning, Operating systems, Datacenter hardware, Soft skills

Required

Bachelor's or master's degree in computer science, electrical engineering, AI, data science, or a related field
5+ years of experience testing in an area such as enterprise software, distributed systems, or datacenter hardware and software
Strong coding skills in a programming language such as Python, Golang, or C/C++
Strong skills in debugging issues across large distributed systems, hardware, and software
Experience with debugging tools such as pdb, gdb, strace, and network monitors
Strong understanding of operating system internals such as memory management, file system operation, security, and performance
Strong understanding of datacenter layout and the performance characteristics of devices such as servers, memory, BIOS, PCIe, networking, and storage

Preferred

Experience with cloud technologies such as AWS, Kubernetes, and Docker
Experience with monitoring tools such as Grafana and Prometheus is a huge plus
Understanding of and experience with ML model training and inference is a huge plus
Understanding of ML hardware accelerators such as GPUs and custom accelerator ASICs is a huge plus

Company

Cerebras

Cerebras Systems delivers the world's fastest AI inference. We are powering the future of generative AI.

Funding

Current Stage
Late Stage
Total Funding
$2.82B
Key Investors
Tiger Global Management, Atreides Management, Fidelity, Alpha Wave Ventures
2026-02-04 · Series H · $1B
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B

Leadership Team

Andrew Feldman
Founder and CEO
Bob Komin
Chief Financial Officer
Company data provided by Crunchbase