TP-Link · 1 day ago

Site Reliability Engineer

TP-Link is a global provider of reliable networking devices and smart home products, and they are seeking a passionate and experienced Site Reliability Engineer to join their team. The role involves ensuring the security, reliability, scalability, and operational excellence of their cloud platform while collaborating with various teams to enhance service delivery.

Consumer Electronics
No H1B

Responsibilities

Assist in implementing and operating Microservices on Kubernetes cloud-based platforms
Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform
Conduct Load Tests and Chaos Tests to ensure the scalability and reliability of microservices
Build observability for Microservices and cloud platforms like AWS, OCI, Azure, and GCP
Contribute to writing and executing disaster recovery plans in collaboration with the Development and DevOps teams
Help analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, and JVM pre-warming
Write and maintain scripts for automation using languages like Python, Go, or Bash
Assist in defining and maintaining the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business
Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures
Ensure adherence to security and compliance standards, including ISO27001, SOC2, and GDPR
Participate in incident response efforts to troubleshoot and resolve production issues quickly
Conduct post-incident analysis to identify root causes and potential workarounds/solutions
Contribute to product/technology selection, including implementation of POCs
Be adaptable to change and evolving processes and tools
Participate in mentoring and training less senior members of the team
Be part of the on-call rotation and provide support after work hours and on weekends
Other duties as assigned

Qualifications

Site Reliability Engineering, Cloud Operations, Kubernetes, Programming Languages, Cloud Security, Technical Documentation, Incident Response, Problem-Solving, Team Collaboration, Adaptability, Mentoring

Required

Bachelor's degree in Computer Science, Information Technology, or a related field
1-3 years of experience as a Site Reliability Engineer or in a related role
Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell
Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices
Basic knowledge of security technologies, including identity and access management, network security, application security, and data protection
Strong problem-solving and analytical skills, with the ability to work independently and as part of a team
Experience in developing and maintaining technical documentation and implementing compliance requirements

Preferred

Relevant cloud certifications, such as AWS Solutions Architect, Azure Solutions Architect Expert, or GCP Professional Cloud Architect
Experience with container orchestration technologies (e.g., Kubernetes)

Benefits

Free snacks and drinks, plus lunch provided on Fridays
Fully paid medical, dental, and vision insurance (partial coverage for dependents)
Contributions to 401k funds
Bi-annual reviews and annual pay increases
Health and wellness benefits, including free gym membership
Quarterly team-building events

Company

Headquartered in the United States, TP-Link is a global provider of reliable networking devices and smart home products, consistently ranked as the world’s top provider of Wi-Fi devices.

Funding

Current Stage: Late Stage