Senior Software Development Engineer in Test (SDET) - AI Cluster jobs in United States

Cerebras · 1 week ago

Senior Software Development Engineer in Test (SDET) - AI Cluster

Cerebras Systems builds the world's largest AI chip, offering innovative solutions for AI compute power. The Senior Software Development Engineer in Test (SDET) will be responsible for innovating and executing tests on cutting-edge AI infrastructure, ensuring high reliability and performance across large deployments.
AI Infrastructure, Artificial Intelligence (AI), Computer, Hardware, RISC, Semiconductor, Software
Growth Opportunities

Responsibilities

You will be hired to innovate and execute tests on cutting-edge AI infrastructure
Be a thinker: define optimized test strategies and methodologies
Cerebras is growing and innovating at a rapid pace, and so are the ML community and AI models
Be a quick learner, adapt to new technologies, and bring your expertise
We are looking to hire a team with a diverse skill set
Deep understanding of how large-scale distributed ML training and inference works
Build a strong understanding of how to break large distributed-systems challenges into smaller components that can be unit tested
Take an automate-first approach: in large-scale deployments, automation drives efficiency and scalability
Aim for 100% automated tests covering all cluster features in the areas of high availability, failure scenarios, performance, stress, and security
Champion cluster security, reliability (99.9999% uptime), and ease of use through observability
Test all components of the AI cluster, including but not limited to cluster software involving Kubernetes, Prometheus, and Grafana
Test cluster hardware components such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights over the MemoryX interconnect

Qualification

Distributed systems, Automated testing, Debugging tools, Cloud technologies, Programming languages, Operating systems internals, Datacenter layout, Machine learning, Soft skills

Required

Bachelor's or master's degree in computer science, electrical engineering, AI, data science, or a related field
5+ years of experience testing in at least one area such as enterprise software, distributed systems, or datacenter hardware and software
Strong coding skills in at least one programming language such as Python, Go, or C/C++
Strong skills in debugging issues across large distributed systems, hardware, and software. Experience with debugging tools like pdb, gdb, strace, and network monitors
Strong understanding of operating system internals such as memory management, file systems, security, and performance
Strong understanding of datacenter layout and the performance characteristics of devices such as servers, memory, BIOS, PCIe, networking, and storage
Experience with cloud technologies like AWS, Kubernetes, and Docker

Preferred

Experience with monitoring tools like Grafana and Prometheus is a huge plus
Understanding and experience of ML model training and inference is a huge plus
Understanding of ML hardware accelerators like GPUs and custom accelerator ASICs is a huge plus

Company

Cerebras

Cerebras Systems offers the world's fastest AI inference. We are powering the future of generative AI.

Funding

Current Stage: Late Stage
Total Funding: $2.82B
Key Investors: Tiger Global Management, Atreides Management, Fidelity, Alpha Wave Ventures
2026-02-04 · Series H · $1B
2025-12-03 · Secondary Market
2025-09-30 · Series G · $1.1B

Leadership Team

Andrew Feldman, Founder and CEO
Bob Komin, Chief Financial Officer
Company data provided by crunchbase