Senior Site Reliability Engineer jobs in United States

TP-Link Systems Inc. · 1 day ago

Senior Site Reliability Engineer

TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, committed to delivering innovative products that enhance people's lives through better connectivity. They are seeking a Senior Site Reliability Engineer to ensure the security, reliability, scalability, and operational excellence of their cloud platform.

Electronics, Hardware, Health Care, Internet, Software
No H1B

Responsibilities

Serve as technical SME for implementing and operating Microservices on Kubernetes cloud-based platforms
Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform
Perform load tests and chaos tests to ensure the scalability and reliability of microservices
Build Observability for Microservices and cloud platforms like AWS, OCI, Azure, and GCP
Write and execute disaster recovery plans in collaboration with the Development and DevOps teams
Analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, JVM pre-warming, etc.
Write and maintain scripts for automation using languages like Python, Go, or Bash
Define and maintain the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business
Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures
Ensure adherence to security and compliance standards, including ISO 27001, SOC 2, and GDPR
Lead incident response efforts to troubleshoot and resolve production issues quickly
Perform post-incident analysis to identify root causes and potential workarounds/solutions
Assist with product/technology selection, including implementation of POCs
Be fluid and open to change and evolving processes and tools
Help to mentor and train less senior members of the team
Participate in the on-call rotation and provide support after work hours and on weekends
Other duties as assigned

Qualifications

Site Reliability Engineering, Cloud Operations, Microservices, Kubernetes, Programming Languages, Cloud Security, Technical Documentation, Analytical Skills, Incident Response, Compliance Implementation, Disaster Recovery, Problem-Solving, Mentoring, Team Collaboration, Adaptability

Required

Bachelor's degree in Computer Science, Information Technology, or a related field
5+ years of experience as a Site Reliability Engineer
Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell
Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices
Strong knowledge of security technologies, including identity and access management, network security, application security, and data protection
Strong problem-solving and analytical skills, with the ability to work independently and as part of a team
Experience in developing and maintaining technical documentation and implementing compliance requirements

Preferred

Expert-level cloud certifications, such as AWS Solutions Architect Professional, Azure Solutions Architect Expert, and GCP Professional Cloud Architect
Experience with container orchestration technologies (e.g., Kubernetes)

Benefits

Free snacks and drinks, and provided lunch on Fridays
Fully paid medical, dental, and vision insurance (partial coverage for dependents)
Contributions to 401k funds
Bi-annual reviews and annual pay increases
Health and wellness benefits, including free gym membership
Quarterly team-building events

Company

TP-Link Systems Inc.

Headquartered in the United States, TP-Link Systems Inc.

Funding

Current Stage
Growth Stage

Leadership Team

Ben Allcock
Vice President – B2B UK & Ireland
Company data provided by Crunchbase