Senior Cluster Performance Engineer jobs in United States

AMD · 21 hours ago

Senior Cluster Performance Engineer

AMD is committed to building innovative products that enhance computing experiences across various domains. The Senior Cluster Performance Engineer will focus on optimizing GPU clusters, working on performance tuning, benchmarking, and collaborating with cross-functional teams to enhance overall system performance.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
H1B Sponsor Likely

Responsibilities

NIC & Performance Optimization: Collaborate with hardware and software teams to enhance the overall performance of GPU clusters, focusing on aspects such as RDMA throughput, latency, and collective communications
Benchmarking and Analysis: Develop and execute comprehensive benchmarking strategies to assess baseline performance, analyze bottlenecks, and identify areas for improvement within GPU cluster environments
Scalability Testing: Evaluate the scalability of GPU clusters by conducting thorough testing under various workloads, ensuring optimal performance across different cluster sizes, configurations, and networking technologies (InfiniBand and RoCE)
Performance Profiling: Utilize profiling tools and methodologies to analyze and identify performance bottlenecks, providing actionable insights for improvement
Performance Tuning: Implement optimization strategies, including but not limited to protocol enhancements, load balancing techniques, and parallel processing optimizations
Documentation: Create detailed documentation of performance analysis, tuning efforts, and outcomes, providing clear and concise reports for internal teams and stakeholders
Collaboration: Work closely with cross-functional teams, including hardware engineers, software developers, and system architects, to integrate performance improvements into the GPU cluster architecture
Continuous Learning: Stay current with the latest developments in GPU architectures, parallel processing, and emerging technologies to drive continuous improvement in GPU cluster performance

Qualifications

GPU architectures, Performance tuning, RDMA network drivers, Parallel computing, Scripting languages, Performance analysis tools, Linux kernel networking, Machine learning, HPC system design, Problem-solving skills, Collaboration skills, Documentation skills

Required

Bachelor's or Master's degree in computer science, or equivalent experience

Preferred

Proven experience in optimizing the performance of GPU clusters
Understanding of RDMA network drivers
Strong understanding of GPU architectures, parallel computing concepts, and network protocols
Proficiency in scripting languages (e.g., Python, Bash) for automation and performance analysis
Experience with system level performance analysis tools and methodologies for GPU clusters
Analytical mindset with excellent problem-solving and debugging skills
Familiarity with cluster management tools and systems
Excellent communication and collaboration skills for effective teamwork
RDMA network configuration, troubleshooting and performance tuning
Linux kernel networking expertise
Machine learning and/or HPC system design

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

H1B Sponsorship

AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by Crunchbase