TP-Link Systems Inc. · 1 day ago
Senior Site Reliability Engineer
TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, committed to delivering innovative products that enhance people's lives through better connectivity. They are seeking a Senior Site Reliability Engineer to ensure the security, reliability, scalability, and operational excellence of their cloud platform.
Electronics, Hardware, Health Care, Internet, Software
Responsibilities
Serve as the technical SME for implementing and operating microservices on Kubernetes-based cloud platforms
Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform
Perform load tests and chaos tests to ensure the scalability and reliability of microservices
Build Observability for Microservices and cloud platforms like AWS, OCI, Azure, and GCP
Write and execute disaster recovery plans in collaboration with the development and DevOps teams
Analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, JVM pre-warming, etc.
Write and maintain scripts for automation using languages like Python, Go, or Bash
Define and maintain the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business
Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures
Guarantee adherence to security and compliance standards, including ISO27001, SOC2, and GDPR
Lead incident response efforts to troubleshoot and resolve production issues quickly
Perform post-incident analysis to identify root causes and potential workarounds/solutions
Assist with product/technology selection, including implementation of POCs
Be fluid and open to change and evolving processes and tools
Help to mentor and train less senior members of the team
Participate in the on-call rotation and provide support after work hours and on weekends
Other duties as assigned
Qualifications
Required
Bachelor's degree in Computer Science, Information Technology, or a related field
5+ years of experience as a Site Reliability Engineer
Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell
Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices
Strong knowledge of security technologies, including Identity and access management, Network security, Application security, and Data protection
Strong problem-solving and analytical skills, with the ability to work independently and as part of a team
Experience in developing and maintaining technical documentation and implementing compliance requirements
Preferred
Expert-level cloud certifications, such as AWS Solutions Architect Professional, Azure Solutions Architect Expert, and GCP Professional Cloud Architect
Experience with container orchestration technologies (e.g., Kubernetes)
Benefits
Free snacks and drinks, plus lunch provided on Fridays
Fully paid medical, dental, and vision insurance (partial coverage for dependents)
Contributions to 401k funds
Bi-annual reviews and annual pay increases
Health and wellness benefits, including free gym membership
Quarterly team-building events
Company
TP-Link Systems Inc.
Headquartered in the United States, TP-Link Systems Inc.
Funding
Current Stage
Growth Stage
Leadership Team
Ben Allcock
Vice President – B2B UK & Ireland
Company data provided by crunchbase