Alibaba Cloud · 2 weeks ago
Cloud Infrastructure – Site Reliability Engineer (SRE)
Alibaba Cloud is responsible for creating a stable and user-friendly messaging platform. The Site Reliability Engineer (SRE) will oversee the stability maintenance and performance tuning of cloud middleware, manage the lifecycle of containerized middleware, and lead incident response efforts.
Responsibilities
Oversee stability maintenance, performance tuning, and high-availability architecture design for cloud middleware, including messaging middleware (Kafka/RocketMQ)
Manage the containerized middleware lifecycle on Kubernetes clusters: implement deployments, auto-scaling, version upgrades, and resource optimization in K8s environments
Lead the troubleshooting of middleware-related incidents (e.g., message backlog, service registration failures) through log analysis, distributed tracing, and monitoring systems
Develop diagnostic tools using Java/Go to resolve production issues, performance bottlenecks, and compatibility challenges
Build Python/Go/Shell automation tools to standardize middleware deployment, monitoring, and disaster recovery workflows
Implement chaos engineering experiments, capacity planning strategies, and failover mechanisms to enhance system resilience
Qualifications
Required
Over 2 years of experience in distributed systems reliability engineering
Familiar with high-availability architecture design
Proficient in at least one of Python, Go, or Java
Experience with cluster management, message reliability assurance, and performance optimization for Kafka/RocketMQ
Hands-on experience deploying middleware on Kubernetes
Ability to convert operations experience into automated solutions
Familiarity with various message middleware, e.g., Kafka and RocketMQ
Strong scripting skills in Shell/Python
Experience with Infrastructure as Code (IaC) tools (Terraform preferred)
Preferred
Familiar with core SRE practices (incident review, error budgeting, chaos engineering)
Experienced in building automated risk control systems
Hands-on experience deploying middleware on Kubernetes (Helm/Operator preferred)
Company
Alibaba Cloud
Alibaba Cloud develops cloud computing and data management services. It is a sub-organization of Alibaba Group.
H1B Sponsorship
Alibaba Cloud has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (18)
2024 (14)
2023 (2)
2022 (1)
Funding
Current Stage
Late Stage
Total Funding
$1.2B
Key Investors
Alibaba Group
2015-07-29 · Series B · $1B
2012-09-20 · Series A · $200M
Company data provided by Crunchbase