Senior Site Reliability Engineer jobs in United States

McAfee · 5 hours ago

Senior Site Reliability Engineer

McAfee is a leader in personal security for consumers, focused on protecting people in an always-online world. The Senior Site Reliability Engineer will be responsible for maintaining service levels, working with various teams to support business needs, and ensuring the availability and reliability of mission-critical services.

Consumer Electronics · Enterprise Software · Information Technology · Network Security
H1B Sponsor Likely

Responsibilities

Proactively monitor the mission-critical production environment and respond quickly to deviations from expected trends or to emerging issues
Troubleshoot and debug issues, and escalate them with proper analysis to the relevant teams to ensure maximum availability
Troubleshoot problems in real time, working with DevOps/Engineering and internal support representatives to deliver maximum customer satisfaction
Detect and triage all operational incidents and requests
Work extensively to help reduce Mean Time to Restore (MTTR) and Mean Time to Detect (MTTD)
Work across Engineering and Support teams to ensure we meet our goals for service reliability, availability, and efficiency
Ensure security events and alerts are addressed in a timely manner
Own the availability and performance of mission-critical services, including automation to prevent problem recurrence and automated responses to all non-exceptional service conditions
Help maintain and improve service operations by following established processes and procedures and by periodically updating SOPs and documentation on Confluence
Create and manage day-to-day processes, including Change Management, Incident Management, and Problem Management
Support automation initiatives to reduce Mean Time to Restore (MTTR) and Mean Time to Detect (MTTD)
Help track Key Performance Indicators (KPIs) to support operational performance and service reliability
Participate in incident retrospectives and assist in managing the incident lifecycle
Plan and deploy patches and product enhancements to our environments
Engage in readiness reviews before changes or deployments reach production environments
Support product engineering teams on SRE-related activities to establish optimal SLAs for all predefined activities and provide a high-quality customer experience
Provide detailed summaries of all high-priority issues to stakeholders, ensuring the accuracy of the data provided
Participate early in the SDLC to ensure reliability is built in from the beginning, and create plans for successful implementations and launches and for a smooth transition into the SRE team
Produce accurate root cause analyses of production issues and help provide long-term solutions to fix them
Continually evaluate and adopt the latest industry technologies to optimize costs and streamline processes
Communicate effectively and present team progress to leadership
Lead by example technically and establish credibility through high-quality technical execution
Mentor and coach other SRE team members

Qualifications

SRE experience · Cloud operations · Monitoring tools · CI/CD tools · Container technologies · AWS knowledge · Incident Management · Change Management · Problem Management · Soft skills

Required

4 to 5+ years of software development and/or technical operations experience, and experience running large-scale applications
Prior experience in SRE / DevOps, Infrastructure Engineering, and Systems Engineering required
Experience defining and implementing monitoring for highly resilient and reliable applications
Experience maintaining and operating production systems (> 99.95% SLA) in the cloud
Able to monitor, debug, and perform root cause analysis (RCA) for any service failure
Exceptional communication skills that cross both team and geographical boundaries
Advanced knowledge and skills within a specific technical or professional discipline with understanding of the impact of work on other areas of the organization
Enjoy working with a large variety of services and technologies
Experience with monitoring, logging, APM, and related tools (e.g., Grafana, CloudWatch)
Experience with CI/CD tools (e.g., Git, Jenkins, Harness)
Experience with container technologies: Kubernetes, Docker
Experience with both Windows and Linux operating systems
Strong knowledge of AWS cloud service offerings covering serverless and containerized workloads
Works well in a fast-paced, high-growth environment
Ability to work some non-standard hours to support a global team and initiatives

Preferred

ITIL, HDI, AWS, or other cloud certifications are a plus

Benefits

Bonus Program
Pension and Retirement Plans
Medical, Dental and Vision Coverage
Paid Time Off
Paid Parental Leave
Support for Community Involvement

Company

McAfee is an online security company that provides virus alerts and analysis on malware, network security threats, and web vulnerabilities.

H1B Sponsorship

McAfee has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Total Sponsorships by Year
2025 (28)
2024 (24)
2023 (12)
2022 (26)
2021 (46)
2020 (84)

Funding

Current Stage: Public Company
Total Funding: Unknown
2022-03-01: Private Equity
2022-03-01: Debt Financing
2021-11-08: Acquired

Leadership Team

Craig Boundy
Chief Executive Officer
Steve Grobman
Executive Vice President and Chief Technology Officer
Company data provided by Crunchbase