Rivos Inc. · 6 hours ago
DL Communications Collectives SW Engineer
Hardware · Industrial Manufacturing
H1B Sponsor Likely
Responsibilities
Build up the communication components of an AI software stack
Port AI software to run on a new hardware platform
Profile and tune communications within AI applications
Design, develop, and optimize communication collectives (e.g., AllReduce, AllGather, Broadcast, ReduceScatter) for large-scale distributed computing and machine learning frameworks.
Implement and optimize communication algorithms (ring, tree, butterfly, etc.) tailored for our architectures and multi-node clusters.
Ensure low-latency, high-bandwidth communication across multi-GPU setups, supporting interconnects such as PCIe and InfiniBand.
Collaborate with hardware engineers and other software teams to optimize performance.
Implement fault tolerance and scalability mechanisms in distributed systems to handle large-scale workloads.
Write unit tests and benchmark tools to validate the performance and correctness of collective operations.
Stay current with advancements in hardware and networking technologies to continuously improve the library's performance.
Qualifications
Required
Strong understanding of GPU architectures and platforms (NVIDIA CUDA, AMD ROCm) and experience in GPU programming (CUDA, HIP, or similar)
Proficiency in designing and implementing parallel and distributed algorithms, particularly communication collectives
Experience with network interconnects (NVLink, PCIe, InfiniBand, RDMA) and understanding of their performance implications
Hands-on experience with communication collectives libraries like UCC, NCCL, or MPI
Strong knowledge of concurrency, synchronization, and memory consistency models in multi-threaded and distributed environments
Experience with profiling and optimizing low-level performance (memory bandwidth, latency, throughput) on GPU architectures
Familiarity with deep learning frameworks (TensorFlow, PyTorch, etc.) and their use of communication collectives
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment
Network driver experience recommended
Excellent problem-solving, written, and verbal communication skills
Strong organizational skills and a high degree of self-motivation
Ability to work well in a team and be productive under aggressive schedules
Bachelor’s, Master’s, or PhD in Computer Engineering, Software Engineering or Computer Science
Preferred
Experience with NumPy, PyTorch, TensorFlow or JAX
Experience with Rust
Experience with CUDA, OpenCL, OpenGL, or SYCL
Coursework or experience with Machine Learning algorithms
Company
Rivos Inc.
Rivos is a high-performance RISC-V system startup targeting integrated system solutions for the enterprise.
H1B Sponsorship
Rivos Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (24)
2022 (45)
2021 (9)
Funding
Current Stage
Growth Stage
Total Funding
$250M
Key Investors
Matrix Capital Management
2024-04-16 · Series A · $250M
Company data provided by Crunchbase