Site Reliability Engineer - CTJ - Poly jobs in United States
cer-icon
Apply on Employer Site
company-logo

Microsoft · 5 hours ago

Site Reliability Engineer - CTJ - Poly

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. The Site Reliability Engineer will leverage technical expertise in large scale distributed systems to improve the reliability, performance, and efficiency of services, while developing automation tools and participating in incident response to enhance operational effectiveness.

Application Performance ManagementArtificial Intelligence (AI)Business DevelopmentData ManagementDevOpsInformation ServicesInformation TechnologyManagement Information SystemsNetwork SecuritySoftware
check
Growth Opportunities
badNo H1BnoteSecurity Clearance RequirednoteU.S. Citizen Onlynote

Responsibilities

Independently creates, tests, and deploys changes through a safe deployment process (SDP) to enhance code quality and improve the observability, security, reliability and operability of one or more platforms, systems, or products operating at scale
Leverages technical expertise in cloud technologies and specific products, as well as objective insights drawn from analyses of production telemetry data to suggest changes or add-ons to product features or the automation to improve product components or features supported by their team
Engages with product engineering teams by participating code/design reviews, regular meetings, on-call rotations and incident responses throughout product development and operations cycles. Utilizes technical knowledge of systems/platforms and insights drawn from product engineering teams, security best practices, artificial intelligence (AI)/machine learning (ML), and telemetry analyses to suggest potential improvements in code base and designs across components and features of one or more products
Independently writes code or scripts that automate the performance of scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale
Develops alerts and instrumentation across components and features to monitor product capacity, related security risk, and resource demands and analyze telemetry data using existing capacity planning models. Draws insights from analyses of capacity and resource data to optimize component and feature code to manage resources and capacity across limited range of use conditions and system parameters
Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, security, reliability, performance, and/or efficiency of components and features, leveraging the artificial intelligence (AI) and machine learning (ML) capabilities. Proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams
Utilizes insights from performance and resource monitoring tools to identify whether there is a need to optimize the efficiency of component and feature code, or if changes to compute resources are required. Models the predicted effect of changes to code and/or compute resources across components or features to document the efficacy of proposed solutions. Proposes changes and drives implementation of solutions to identified performance and resource challenges
Embody our culture and values

Qualification

Cloud technologiesDistributed systemsAutomation scriptingSQLPostgreSQLIncident responseTelemetry analysisOn-call responsibilitiesCollaborationProblem-solving

Required

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination
This position requires successful verification of the stated security clearance to meet federal government customer requirements
You will be asked to provide clearance verification information prior to an offer of employment
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
To meet this legal requirement, citizenship will be verified via a valid passport, or other approved documents, or verified US government Clearance

Preferred

Experience working on large-scale distributed services with on-call responsibilities
Ability to build and influence broadly towards common goals and priorities
Experience with distributed database systems such as SQL and PostgreSQL

Company

Microsoft

company-logo
Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

Funding

Current Stage
Public Company
Total Funding
$1M
Key Investors
Technology Venture Investors
2022-12-09Post Ipo Equity
1986-03-13IPO
1981-09-01Series Unknown· $1M

Leadership Team

leader-logo
Satya Nadella
Chairman and CEO
linkedin
leader-logo
Vukani Mngxati
Chief Executive Officer - Microsft South Africa
linkedin
Company data provided by crunchbase