DL Communications Collectives SW Engineer @ Rivos Inc. | Jobright.ai
DL Communications Collectives SW Engineer jobs in Portland, OR

Rivos Inc. · 6 hours ago

DL Communications Collectives SW Engineer

Hardware · Industrial Manufacturing
H1B Sponsor Likely

Responsibilities

Build up the communication components of an AI software stack
Port AI software to run on a new hardware platform
Profile and tune communications within AI applications
Design, develop, and optimize communication collectives (e.g., AllReduce, AllGather, Broadcast, ReduceScatter) for large-scale distributed computing and machine learning frameworks.
Implement and optimize communication algorithms (ring, tree, butterfly, etc.) tailored for our architectures and multi-node clusters.
Ensure low-latency, high-bandwidth communication across multi-GPU setups, supporting interconnects such as PCIe and InfiniBand.
Collaborate with hardware engineers and other software teams to optimize performance.
Implement fault tolerance and scalability mechanisms in distributed systems to handle large-scale workloads.
Write unit tests and benchmark tools to validate the performance and correctness of collective operations.
Stay current with advancements in hardware and networking technologies to continuously improve the library's performance.

Qualification

GPU architectures · GPU programming · Communication collectives · Network interconnects · Profiling · Deep learning frameworks · Concurrency · Network driver experience · NumPy · Rust · OpenCL · OpenGL · SYCL · Machine Learning algorithms

Required

Strong understanding of GPU architectures (CUDA, AMD ROCm) and experience in GPU programming (CUDA, HIP, or similar)
Proficiency in designing and implementing parallel and distributed algorithms, particularly communication collectives
Experience with network interconnects (NVLink, PCIe, InfiniBand, RDMA) and understanding of their performance implications
Hands-on experience with communication collectives libraries like UCC, NCCL, or MPI
Strong knowledge of concurrency, synchronization, and memory consistency models in multi-threaded and distributed environments
Experience with profiling and optimizing low-level performance (memory bandwidth, latency, throughput) on GPU architectures
Familiarity with deep learning frameworks (TensorFlow, PyTorch, etc.) and their use of communication collectives
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment
Network driver experience recommended
Excellent problem-solving, written, and verbal communication skills
Strong organizational skills and high self-motivation
Ability to work well in a team and be productive under aggressive schedules
Bachelor’s, Master’s, or PhD in Computer Engineering, Software Engineering or Computer Science

Preferred

Experience with NumPy, PyTorch, TensorFlow or JAX
Experience with Rust
Experience with CUDA, OpenCL, OpenGL, or SYCL
Coursework or experience with Machine Learning algorithms

Company

Rivos Inc.

Rivos is a high-performance RISC-V system startup targeting integrated system solutions for the enterprise.

H1B Sponsorship

Rivos Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total sponsorships by year:
2023: 24
2022: 45
2021: 9

Funding

Current Stage: Growth Stage
Total Funding: $250M
Key Investors: Matrix Capital Management
2024-04-16 · Series A · $250M

Leadership Team

Mark Hayter
Founder and Chief Strategy Officer

Company data provided by Crunchbase