Microsoft · 20 hours ago

Service Engineer II

Microsoft is seeking a customer-obsessed Service Engineer II to join their Engineering Operations team. This role focuses on enhancing customer experience across Azure services by managing live-site incidents, driving customer reliability, and collaborating with various teams to ensure service excellence.

Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Lead and manage high-severity incidents across Azure services, serving as the single point of accountability to ensure rapid detection, triage, resolution, and customer communication
Act as the central authority during live site incidents, driving real-time decision-making and coordination across Engineering, Support, PM, Communications, and Field teams
Contribute to the design of vNext architecture for cloud infrastructure services, based on customer and first-party engagements
Engage in major production triage efforts and work with different teams to identify the root cause of highly impactful or complex issues as required; identify product gaps and work with Product teams to close them
Partner closely with software developers, product managers, architects, and infrastructure teams to drive delivery of sustainable and reusable design solution patterns, ensuring that non-functional production support requirements are adopted early in migration and deployment
Promote a customer-first culture by prioritizing availability, reliability, and platform trust in every response
Participate in the on-call rotation
Analyze customer-impacting signals from telemetry, support cases, and feedback to identify root causes, drive incident reviews (RCAs/PIRs), and implement preventative service improvements
Drive continuous improvement of the Azure platform by incorporating learnings from live site events and customer feedback, ensuring improved reliability, observability, and supportability
Collaborate closely with Engineering and Product teams to influence and implement service resiliency enhancements, auto-remediation tools, and customer-centric mitigation strategies
Identify and advocate for customer self-service capabilities, improved documentation, and scalable solutions that empower customers to resolve common issues independently
Design and drive adoption of incident response playbooks, mitigation levers, and operational frameworks aligned to real-world support scenarios and strategic customer needs
Contribute to the design of next-generation architecture for cloud infrastructure services with a focus on reliability and strategic customer support outcomes
Build and maintain cross-functional partnerships, ensuring alignment across engineering, business, and support organizations
Be data-driven and results-focused, using metrics to evaluate incident response effectiveness and platform health
Bring an engineering mindset to operational challenges, balancing agility, scalability, and technical excellence
Exhibit strong cross-team collaboration, engineering mindset, and results-oriented execution under pressure

Qualifications

Cloud operations, Incident management, AI-driven solutions, Software engineering, Azure Core Services, Automation languages, ITIL certification, Crisis management, Analytical skills, Communication skills, Collaboration, Problem-solving

Required

Bachelor's degree in Computer Science, Information Technology, Data Science, Cybersecurity, or a related field AND 2+ years of technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls
OR equivalent hands-on experience
Proven experience in cloud operations, incident and crisis management, or large-scale systems engineering, ideally within platforms such as Azure, AWS, or GCP
Demonstrated experience in 24×7×365 enterprise environments, managing mission-critical services
Demonstrated experience implementing AI-driven solutions and automation, with proficiency in one or more programming/automation languages (e.g., C, C++, C#, Java, JavaScript, Python) or equivalent expertise
ITIL, SRE, or other industry-recognized technical and operational certification

Preferred

Master's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls
OR Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 5+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls
OR equivalent experience
1+ year(s) technical experience working with large-scale cloud or distributed systems
3+ years of demonstrated experience in an incident management or crisis management role for critical, high-severity incidents in high-availability, distributed environments
Experience with Service Engineering principles and practices, with exceptional command-and-control communication skills: able to drive clarity and direction with customers, internal Microsoft stakeholders, and third-party vendors during ambiguity and chaos
Demonstrated ability to make quick, strategic decisions in high-pressure situations, with strong analytical skills, team leadership, and collaboration with peer teams and internal engineering partners
Strong knowledge of Windows or Linux platforms and developer tools, with the ability to diagnose cloud computing platform issues, identify patterns, and implement AI-driven approaches for overall platform stability and reliability
Deep understanding of cloud architecture patterns, High Availability, Disaster Recovery, Business Continuity, Performance Tuning for service platform services
Familiarity with monitoring and observability tools (e.g., Azure Monitor, Watch Dog, Grafana, Prometheus, Datadog, Splunk, New Relic)
Exposure to chaos engineering, fault injection, or high availability architecture
AI/ML experience (beginner to intermediate):
Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes
Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting
An understanding of the challenges and risks associated with AI/ML systems in a production environment
Certifications: Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect)
Certifications in ITIL, SRE, or other relevant frameworks

Benefits

Certain roles may be eligible for benefits and other compensation.

Company

Microsoft

Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

H1B Sponsorship

Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)

Funding

Current Stage: Public Company
Total Funding: $1M
Key Investors: Technology Venture Investors
2022-12-09: Post-IPO Equity
1986-03-13: IPO
1981-09-01: Series Unknown · $1M

Leadership Team

Satya Nadella
Chairman and CEO
Vukani Mngxati
Chief Executive Officer - Microsoft South Africa
Company data provided by Crunchbase