Microsoft · 7 hours ago
Site Reliability Engineer II
Microsoft is a leading technology company focused on empowering every person and organization on the planet. As a Site Reliability Engineer II, you will drive automation and incident response to ensure service reliability and performance while collaborating with cross-functional teams to enhance operational excellence.
Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Responsibilities
Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within SLA timelines, and driving post-incident learnings
Develop, enhance, and maintain automation for deployment, operations, and incident mitigation to improve service reliability and reduce manual intervention
Instrument services for observability, collect and analyze telemetry and health metrics, and use data-driven insights to guide reliability and performance improvements
Collaborate closely with engineering partners and stakeholders to align goals, share operational insights, and deliver user-centric solutions
Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
Ensure compliance with security, privacy, and accessibility standards throughout service onboarding and operations
Stay current with industry trends and internal tools to continuously improve reliability, performance, and observability at scale
Qualifications
Required
Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 1+ years technical experience in software engineering, network engineering, or systems administration
OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Preferred
Hands-on experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
Proficiency in at least one programming or scripting language (e.g., C#, Java, Python, or PowerShell)
Solid analytical and problem-solving skills, with experience using telemetry and data to drive operational decisions
Strong communication and collaboration skills, with a track record of working effectively across teams
Experience with observability and monitoring tools, and implementing MELT (Metrics, Events, Logs, Traces) patterns
Experience in automating root cause analysis and mitigation of incidents
Familiarity with compliance processes and standards in cloud environments
Company
Microsoft
Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.
H1B Sponsorship
Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)
Funding
Current Stage: Public Company
Total Funding: $1M
Key Investors: Technology Venture Investors
2022-12-09: Post-IPO Equity
1986-03-13: IPO
1981-09-01: Series Unknown · $1M
Company data provided by Crunchbase