Software Engineer II - AI Infrastructure (Scheduler) - CoreAI jobs in United States

Microsoft · 2 weeks ago

Software Engineer II - AI Infrastructure (Scheduler) - CoreAI

Microsoft is a leading technology company that builds the end-to-end Azure AI stack. They are seeking a Software Engineer II to focus on the Scheduler subsystem of their AI Infrastructure, responsible for managing GPU and NPU capacity and ensuring high service reliability and efficiency in AI workloads.

Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Work on the design and development of the core AI Infrastructure distributed and in-cluster services that support large scale AI training and inferencing
Develop, test, and maintain control plane services written in C#, hosted on Service Fabric or Kubernetes (AKS) clusters
Enhance systems and applications to ensure high stability, efficiency, and maintainability, along with low latency and tight cloud security
Provide operational support and DRI (on-call) responsibilities for the service
Develop and foster a deep understanding of the machine learning concepts, use cases, and relevant services used by our customers
Collaborate closely with service engineers, product managers, and internal applied research and data science teams within Microsoft to build better solutions together
Investigate use of tools and cloud services and prototype solutions for problems in our control plane space
Embody our culture and values

Qualifications

C#, Distributed systems, Cloud services, Machine learning, OOP proficiency, Service reliability, Performance engineering, Kubernetes, Data analytics, Technical communication

Required

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, Java, Scala, Rust, Go, TypeScript; OR equivalent experience
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred

OOP proficiency and practical familiarity with common code design patterns
2+ years of experience with service development in a distributed environment, in a dev-ops role, including concurrency management and stateful resource management
Master's degree in Computer Science or a related technical field
Hands-on experience with public cloud services at the IaaS level
Advanced knowledge of C# and .NET
Proficiency with use of complex data structures and algorithms, preferably in the setting of a resource allocator/scheduler, workflow/execution orchestration engine, database engine, or similar
Significant experience with unit testing and writing testable code
Technical communication skills: verbal and written
First-hand experience with building large-scale, multi-tenant global services with high availability
Experience with building and operating 'stateful' and critical control plane services; handling challenges with data size and data partitioning; related use of a NoSQL cloud database
Experience with mapping complex object models to relational and non-relational datastores
Dev-ops experience with microservices architecture in a complex infrastructure and operational environment
Service reliability and fundamentals engineering; instrumentation for KPIs or performance analysis; demonstrated service and code quality mindset
Performance engineering: work on scalability, profiling; CPU, memory and I/O use optimization techniques
Applied knowledge of Kubernetes: service model, workload packaging and deployment, programmatic extensibility (CRDs, operators); or equivalent knowledge of Service Fabric
Server-side Windows programming and performance engineering
Data analytics skills, in particular with Kusto
Experience working in a geo-distributed team

Benefits

Certain roles may be eligible for benefits and other compensation.

Company

Microsoft

Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

H1B Sponsorship

Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)

Funding

Current Stage
Public Company
Total Funding
$1M
Key Investors
Technology Venture Investors
2022-12-09 · Post-IPO Equity
1986-03-13 · IPO
1981-09-01 · Series Unknown · $1M

Leadership Team

Satya Nadella
Chairman and CEO
Vukani Mngxati
Chief Executive Officer - Microsoft South Africa
Company data provided by Crunchbase