Hewlett Packard Enterprise · 5 hours ago
AI/HPC Networking Software Engineer
Data Center · Enterprise Software
Responsibilities
Engage with GPU/CPU vendors, customers, AI ISVs, and open-source software communities to validate, tune, and enable high-performance AI applications on the Slingshot Ethernet fabric.
Work on partner engagements for the leading communication libraries, middleware, and frameworks used in AI development today (NCCL, RCCL, UCX, oneCCL, PyTorch, etc.).
Design, implement, and maintain system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems. Work with all the leading architectures and vendors in the AI and data center markets: Nvidia, AMD, and Intel.
Work with OEM, ODM, and VAR channel vendors to bring Slingshot to a broader set of customers. Validate and tune the applications driving those engagements.
Develop and own HPE product usage support, upstreaming and community engagements, and internal testing and infrastructure.
Work with cross-disciplinary teams to understand business requirements and align software direction to meet those needs.
Qualifications
Required
Bachelor’s/master's degree in computer science, engineering, or related field
3+ years of relevant experience with software development and/or architecture in the Data Center, university, government lab, or AI-centric environments.
Familiarity with AI/ML networking software development with an emphasis on performance analysis, tuning, and deployment in a scale-out compute cluster environment
Ability to participate in and own pieces of the product release pipeline, up to and including package integration and support.
Understanding of networking architecture and communications, including Ethernet and InfiniBand networking technologies
Understanding of computer architecture and familiarity with the fundamentals of GPU architecture. Experience with Nvidia and AMD GPU infrastructure and software stacks.
Programming and debugging skills in C, C++, and/or Python. Ability to understand how applications and industry middleware/libraries work in Slingshot-enabled systems, and to identify strategies for making these applications meet customer expectations.
Knowledge of user-space networking and the OFI libfabric software interfaces and APIs.
Preferred
Cloud Architectures
Cross Domain Knowledge
Design Thinking
Development Fundamentals
DevOps
Distributed Computing
Microservices Fluency
Full Stack Development
Security-First Mindset
Solutions Design
Testing & Automation
User Experience (UX)
Benefits
Health & Wellbeing
Personal & Professional Development
Diversity, Inclusion & Belonging
Company
Hewlett Packard Enterprise
Hewlett Packard Enterprise is an edge-to-cloud company that uses comprehensive solutions to accelerate business outcomes.
Funding
Current Stage: Public Company
Total Funding: $1.35B
2024-09-10 · Post-IPO Equity · $1.35B
2015-11-02 · IPO
Company data provided by Crunchbase