Alibaba Cloud · 1 week ago
Senior Network Engineer - Sunnyvale
Alibaba Cloud is a leading cloud service provider, and they are seeking a Senior Network Engineer to enhance their operations and maintenance quality. The role involves developing and implementing stability solutions, establishing monitoring mechanisms, and optimizing resource allocation to ensure business availability and performance.
Cloud Data Services, Cloud Management, Data Center, Data Management, Foundational AI, Software
Responsibilities
Observability Link Construction for Operations and Maintenance
Have a global perspective on stability, capable of developing and implementing stability solutions
Pre-event: Establish and continually optimize monitoring mechanisms for application operations and maintenance; develop and maintain corresponding monitoring platforms/tools
During the event: Establish and continuously optimize warning mechanisms for application operations and maintenance, ensuring that faults can be quickly discovered, located, and addressed
Post-event: Quickly analyze, diagnose, and locate problems, and collaborate with relevant personnel to resolve them; establish and improve the rapid service recovery mechanism to reduce business impact; ensure stable business operations by identifying and eliminating potential risks through stability governance projects and architectural optimizations
Stability Operations and Maintenance Platform Construction
Design, develop, and maintain reliable operations and maintenance platforms and tools, such as inspection systems, water-level (resource utilization) systems, delivery systems, and cost management systems, to address the delivery, performance, stability, and cost issues that production systems encounter, ensuring business availability and improving performance and efficiency
Responsible for data-driven analysis of operations and maintenance quality; analyze and study daily operations and maintenance metrics, issues, and risks to establish models and provide optimization suggestions for operations and maintenance
Application Operations and Maintenance Standard Construction
Establish standardized operations and maintenance process specifications (such as change standards, protection plans, and cloud product configuration standards) so that operations and maintenance work is consistent and compliant, thereby enhancing stability
Develop and implement emergency response specifications and standards for application operations and maintenance faults
Develop and implement alarm handling specifications and standards for application operations and maintenance, as well as Service Level Agreements (SLA)
Resource Optimization
Based on business requirements, handle budget preparation, capacity planning, and readiness, and coordinate with development teams to forecast consumption of resources such as storage and compute
Analyze business demands and, while ensuring stability, take water levels, specifications, and billing rules into account; review whether the resource estimates in technical solutions are reasonable, and collaborate with development to reduce resource costs
Security Assurance Construction
24/7 emergency response, daily monitoring alerts, and emergency handling, continuously identifying and rectifying existing issues
Responsible for operations and maintenance support during major events (such as National Day, Spring Festival, New Year's Day, and significant activities)
Develop and drill emergency plans, respond to emergencies, and handle faults
Establish a problem/fault record repository, conduct targeted analysis of the repository, and enhance and optimize the emergency plan repository and standard process repository
Architecture Upgrade
Responsible for system architecture upgrades, such as kernel upgrades, cross-data-center service migration, and containerization
Responsible for the design and implementation of disaster recovery architecture, such as local disaster recovery and multi-active geographically distributed setups
Qualifications
Required
Fluent in Chinese, able to clearly articulate technical issues and solutions
Over 3 years of experience in operations and maintenance in related fields such as applications, networks, and containerization
Working proficiency in architecture design, performance optimization, and stability optimization
Capable of applying intelligent and automated operations and maintenance platforms and tools, designing and utilizing complex workflows and daily operational templates, quickly identifying, locating, and resolving relatively complex faults, thereby improving operational efficiency
Able to summarize and consolidate issues discovered in daily operations and maintenance into operational experience, and apply this knowledge to enhance capabilities within the operations and maintenance platform
Proficient in protocols such as TCP/IP, DNS, and HTTP, with the ability to perform preliminary analysis of network traffic and troubleshoot network issues
Familiar with at least one cloud service platform (such as AWS, Alibaba Cloud, Azure, etc.) and its related mainstream products (such as Flink, MaxCompute, Log Service, RDS, Redis, etc.), able to preliminarily troubleshoot and resolve basic issues related to the use of corresponding cloud products
Preferred
Familiarity with DPDK (Data Plane Development Kit) and experience in enhancing network processing performance
Some software development ability, to help advance automation of operations and maintenance
Strong business understanding, capable of independently handling complex issues with real case examples
Sound independent judgment on business issues, able to skillfully use processes and tools to identify risks and formulate solutions
A degree of influence within the business line and recognition from surrounding teams
Company
Alibaba Cloud
Alibaba Cloud develops cloud computing and data management services. It is a sub-organization of Alibaba Group.
H1B Sponsorship
Alibaba Cloud has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (18)
2024 (14)
2023 (2)
2022 (1)
Funding
Current Stage: Late Stage
Total Funding: $1.2B
Key Investors: Alibaba Group
2015-07-29 · Series B · $1B
2012-09-20 · Series A · $200M
Company data provided by Crunchbase