
AMD · 9 hours ago

Senior Libraries CI Test Engineer

AMD is a company focused on building great products that accelerate next-generation computing experiences. The Senior Libraries CI Test Engineer will validate communication libraries for high-performance computing and machine learning workloads, designing comprehensive testing strategies to ensure quality and performance.

Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design, develop, and execute comprehensive test plans, test cases, and test scripts (functional, performance, stress, and regression) for AMD's RCCL (an open-source, GPU-accelerated communication collective middleware) and related technologies
Validate networking features for multi-GPU and multi-node communication libraries, focusing on reliability, throughput, and latency
Establish and maintain automated test frameworks using languages like Python to support continuous integration and enforce quality gates
Benchmark and profile the libraries on single-GPU, multi-GPU, and clustered systems to verify performance optimizations and identify regressions
Isolate, report, and track defects with clear, detailed, and reproducible steps, collaborating closely with development engineers to expedite resolution
Deploy the libraries on large clusters and participate in debugging complex, system-level issues that span different layers of the software stack: GPU kernel drivers, NIC drivers, etc.
Contribute to high-quality test documentation and participate in reviews of design and architectural specifications to ensure testability

Qualifications

Software testing · Test automation · Python · Linux/UNIX · Networking technologies · Collective communication libraries · Agile/Scrum methodologies · Attention to detail · Communication skills · Collaboration

Required

You are accustomed to working in a dynamic, geographically distributed agile team, where partnership and collaboration are paramount
You possess excellent written and verbal communication skills, meticulous attention to detail, and the ability to express your work in a clear, cohesive fashion
You are results-oriented and accustomed to tight deadlines and changing priorities
You are constantly thinking of ways to break software and ensure optimal performance and defect-free execution across various hardware configurations
B.Sc. or B.Eng. degree in Computer Science, Software Engineering, Electrical Engineering, or equivalent

Preferred

Strong background in software testing and quality assurance methodologies, including test automation, performance testing, and system-level validation
Proficiency in developing test scripts and automation frameworks using Python and Shell scripting
Experience with Linux/UNIX environments and cluster-computing concepts
Familiarity with network technologies relevant to HPC, such as RoCE (RDMA over Converged Ethernet), Libfabric, and InfiniBand
In-depth knowledge of best practices in software quality assurance, including testing types, regression analysis, defect tracking (e.g., JIRA), and version control (e.g., Git)
Experience with collective communication libraries like MPI, RCCL, or SHMEM
Understanding of the software development lifecycle (SDLC) and experience working within Agile/Scrum methodologies
Advanced degrees, such as M.Sc., M.Eng., or Ph.D., are a plus

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

H1B Sponsorship

AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI · Daniel Loeb
2025-10-06 · Post-IPO Equity
2023-03-02 · Post-IPO Equity
2021-06-29 · Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by Crunchbase