Cloud Infrastructure – Site Reliability Engineer (SRE)-Sunnyvale jobs in United States

Alibaba Cloud · 2 days ago

Cloud Infrastructure – Site Reliability Engineer (SRE)-Sunnyvale

Alibaba Cloud is responsible for innovative messaging products and is seeking a Site Reliability Engineer to oversee the stability and performance of cloud middleware systems. The role involves managing the lifecycle of containerized middleware on Kubernetes, leading incident responses, and developing automation tools to enhance operational efficiency.

Cloud Data Services · Cloud Management · Data Center · Data Management · Foundational AI · Software
H1B Sponsor Likely

Responsibilities

Oversee stability maintenance, performance tuning, and high-availability architecture design for cloud middleware, including messaging middleware (Kafka/RocketMQ)
Manage the containerized middleware lifecycle on Kubernetes clusters: implement deployments, auto-scaling, version upgrades, and resource optimization in K8s environments
Lead the troubleshooting of middleware-related incidents (e.g., message backlog, service registration failures) through log analysis, distributed tracing, and monitoring systems
Develop diagnostic tools using Java/Go to resolve production issues, performance bottlenecks, and compatibility challenges
Build Python/Go/Shell automation tools to standardize middleware deployment, monitoring, and disaster recovery workflows
Implement chaos engineering experiments, capacity planning strategies, and failover mechanisms to enhance system resilience

Qualifications

Kubernetes · Python · Go · Java · Kafka/RocketMQ · Terraform · Shell scripting · Distributed systems · Incident response · Automation · SRE practices

Required

Over 2 years of experience in distributed systems reliability engineering
Familiar with high-availability architecture design
Proficient in at least one of Python, Go, or Java
Experience with cluster management, message reliability assurance, and performance optimization for Kafka/RocketMQ
Hands-on experience deploying middleware on Kubernetes
Ability to convert operations experience into automated solutions
Familiarity with various messaging middleware, e.g., Kafka and RocketMQ
Strong scripting skills in Shell/Python
Experience with Infrastructure as Code (IaC) tools (Terraform preferred)

Preferred

Familiar with core SRE practices (incident review, error budgeting, chaos engineering)
Experienced in building automated risk control systems
Hands-on experience deploying middleware on Kubernetes (Helm/Operator preferred)

Company

Alibaba Cloud

Alibaba Cloud develops cloud computing and data management services. It is a sub-organization of Alibaba Group.

H1B Sponsorship

Alibaba Cloud has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (18)
2024 (14)
2023 (2)
2022 (1)

Funding

Current Stage: Late Stage
Total Funding: $1.2B
Key Investors: Alibaba Group
2015-07-29 · Series B · $1B
2012-09-20 · Series A · $200M
Company data provided by Crunchbase