AMD · 14 hours ago
Senior Cluster Performance Engineer
AMD is committed to building innovative products that enhance computing experiences across various domains. The Senior Cluster Performance Engineer will focus on optimizing GPU clusters, working on performance tuning, benchmarking, and collaborating with cross-functional teams to enhance overall system performance.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Responsibilities
NIC & Performance Optimization: Collaborate with hardware and software teams to enhance the overall performance of GPU clusters, focusing on aspects such as RDMA throughput, latency, and collective communications
Benchmarking and Analysis: Develop and execute comprehensive benchmarking strategies to assess baseline performance, analyze bottlenecks, and identify areas for improvement within GPU cluster environments
Scalability Testing: Evaluate the scalability of GPU clusters by conducting thorough testing under various workloads, ensuring optimal performance across different cluster sizes, configurations, and networking technologies (InfiniBand and RoCE)
Performance Profiling: Utilize profiling tools and methodologies to analyze and identify performance bottlenecks, providing actionable insights for improvement
Performance Tuning: Implement optimization strategies, including but not limited to protocol enhancements, load balancing techniques, and parallel processing optimizations
Documentation: Create detailed documentation of performance analysis, tuning efforts, and outcomes, providing clear and concise reports for internal teams and stakeholders
Collaboration: Work closely with cross-functional teams, including hardware engineers, software developers, and system architects, to integrate performance improvements into the GPU cluster architecture
Continuous Learning: Stay current with the latest developments in GPU architectures, parallel processing, and emerging technologies to drive continuous improvement in GPU cluster performance
Qualifications
Required
Bachelor's or Master's degree in computer science, or equivalent experience
Preferred
Proven experience in optimizing the performance of GPU clusters
Understanding of RDMA network drivers
Strong understanding of GPU architectures, parallel computing concepts, and network protocols
Proficiency in scripting languages (e.g., Python, Bash) for automation and performance analysis
Experience with system level performance analysis tools and methodologies for GPU clusters
Analytical mindset with excellent problem-solving and debugging skills
Familiarity with cluster management tools and systems
Excellent communication and collaboration skills for effective teamwork
RDMA network configuration, troubleshooting and performance tuning
Linux kernel networking expertise
Machine learning and/or HPC system design
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase