Coupang Fulfillment Services · 2 months ago
Sr. Staff Site Reliability Engineer
Coupang is a leading e-commerce company seeking a Senior Staff Site Reliability Engineer to join their Site Reliability Engineering team. The role is crucial for ensuring the health, monitoring, and scalability of customer-facing services while collaborating closely with product development teams to maintain high reliability standards.
Delivery, Logistics, Warehousing
Responsibilities
Serve as the primary point of contact responsible for the platform reliability, health, and performance of all Coupang customer-facing services
Gain deep knowledge of Coupang's application workflows and dependencies
Define and track key performance indicators (KPIs) and service-level objectives (SLOs) related to system availability, performance, and reliability
Build a world-class incident management process and automation, including fast incident remediation, incident operational reviews, and retrospectives
Develop and implement best practices for creating, scaling, and maintaining effective monitoring, alerting, and telemetry systems
Build automation to execute regular disaster recovery testing, chaos testing, and load testing to stay ahead of the expected growth of Coupang services
Work closely with product development teams to ensure the products are designed with scale and operability in mind
Build the right guardrails and automation for deploying production changes while holding the reliability bar
Participate in a 24x7 rotation for production issue escalations and function well in a fast-paced environment
Communicate effectively with people at all levels of the organization
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related technical field
8+ years of industry experience building and operating large-scale distributed systems
Preferred
Prior experience working with AI/ML, large-scale web-based Java architectures, and JVM configuration
Professional certifications in cloud platforms, monitoring tools, or related technologies
Previous experience working on large-scale GPU/cloud infrastructure platforms
SLO/SLA management and implementation experience
Deep UNIX/Linux systems knowledge and administration background
Demonstrated programming skills in one or more of: Python, Java, Golang, Ruby
Strong problem-solving and analytical skills spanning systems, network (TCP/IP) and code, with a focus on data-driven decision-making
Experience with cloud-based GPU infrastructure, including AWS, Azure, or Google Cloud Platform
Strong understanding of DevOps and SRE practices, including continuous integration, continuous delivery, and infrastructure as code (IaC)
Experience with containerization and orchestration technologies, such as Docker and Kubernetes
Excellent communication and collaboration skills, with the ability to work with teams across distinct functions and technical domains
Knowledge of the OpenTelemetry observability ecosystem, including metrics, logging, and tracing, and of tools such as Prometheus, Grafana, Elastic Stack, Datadog, or New Relic
Benefits
Medical/Dental/Vision/Life, AD&D insurance
Flexible Spending Accounts (FSA) & Health Savings Account (HSA)
Long-term/Short-term Disability
Employee Assistance Program (EAP)
401(k) Plan with Company Match
18-21 days of Paid Time Off (PTO) per year, based on tenure
12 Public Holidays
Paid Parental Leave
Pre-tax commuter benefits
MTV - [Free] Electric Car Charging Station
Company
Coupang Fulfillment Services
Coupang Fulfillment Services provides warehousing, delivery, and delivery logistics services.
Funding
Current Stage
Late Stage
Recent News
2025-10-13
Company data provided by Crunchbase