Software Engineer- AI/ML, AWS Neuron Distributed Training jobs in United States

Amazon Web Services (AWS) · 1 week ago

Software Engineer- AI/ML, AWS Neuron Distributed Training

Amazon Web Services (AWS) is seeking a Software Development Engineer II to join the Annapurna Labs team, focusing on building and maintaining complex products that enhance customer experiences. The role involves developing and optimizing large-scale machine learning model training solutions on AWS Trainium and collaborating with various engineering teams to deliver high-performance systems.

Agentic AI, Consulting, DevOps, Information Technology, Software, Web Development
H1B Sponsor Likely

Responsibilities

You will design, implement, and optimize distributed training solutions for large-scale ML models running on Trainium instances
A significant part of your work will involve extending and optimizing popular distributed training frameworks including FSDP (Fully-Sharded Data Parallel), torchtitan and Hugging Face libraries for the Neuron ecosystem
You will profile, analyze, and tune end-to-end training models and pipelines to achieve optimal performance on Trainium hardware
You will partner with hardware, compiler, and runtime teams to influence system design and unlock new capabilities
Additionally, you will work directly with AWS solution architects and customers to deploy and optimize training workloads at scale

Qualifications

Deep Learning Algorithms, PyTorch, Distributed Training, Software Development Life Cycle, Programming Languages, Design Patterns, Computer Vision, Code Reviews, Source Control Management, Testing, Operations

Required

3+ years of non-internship professional software development experience
3+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience programming with at least one software programming language
Experience developing and implementing deep learning algorithms, particularly with respect to computer vision algorithms

Preferred

3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Bachelor's degree in computer science or equivalent
Previous software engineering expertise with PyTorch/JAX/TensorFlow, distributed libraries and frameworks, and end-to-end model training. The group presents many opportunities for optimizing and scaling large deep learning models on the Trainium architecture

Benefits

Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
401(k) matching
Paid time off
Parental leave

Company

Amazon Web Services (AWS)

Launched in 2006, Amazon Web Services (AWS) began exposing key infrastructure services to businesses in the form of web services -- now widely known as cloud computing.

H1B Sponsorship

Amazon Web Services (AWS) has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (22803)
2024 (21175)
2023 (19057)
2022 (24088)
2021 (12233)
2020 (14881)

Funding

Current Stage: Late Stage
Total Funding: unknown
Key Investors: BIRD Foundation (2025-01-22, Grant)

Leadership Team

Matt Garman · Chief Executive Officer
Anand Desikan · CTO, CXO Advisor, and Enterprise Technologist
Company data provided by Crunchbase