Site Reliability Engineer II @ SHEGLAM | Jobright.ai
Site Reliability Engineer II jobs in United States

SHEGLAM · 2 days ago

Site Reliability Engineer II

Cosmetics

Responsibilities

Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production system.
Supervise capacity & utilization and work closely with cross-functional teams to orchestrate scale up/down of the services.
Own and operate critical open-source services like Elasticsearch, Kafka, RabbitMQ, Redis.
Build tools and design processes that help improve observability and system resiliency of the platform.
Triage site availability incidents and proactively work towards reducing MTTR for customer-impacting incidents.
Partner with service owners to implement service level metrics and service level objectives that act as service level health indicators.
Establish design patterns for monitoring, benchmarking and deploying new features for the backend services.
Develop and maintain technical documentation, network diagrams, runbooks, and procedures.
Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices.
Respond to production incidents, leveraging experience in software development, systems engineering, and networking to proactively prevent recurring issues.
Provide relief and sustainable resolution to issues within our infrastructure.
Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
Join a culture of intolerance of manual activity, which results in a highly automated environment delivering scalable solutions.
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.

Qualifications

Site Reliability Engineering, Cloud environments, Full-stack debugging, Observability tools, Elasticsearch, Kafka, Redis, Python, GoLang, Docker, Kubernetes, SQL databases, NoSQL databases, Linux systems, Big data technologies

Required

Bachelor's degree in Computer Science or Information Systems or equivalent technical discipline.
3+ years of working experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments.
Systematic problem-solving approach, combined with a sense of ownership and drive.
Full-stack debugging and performance optimization ability, including knowledge of Cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, SQL and NoSQL databases.
Track record of monitoring and analyzing system performance and isolating issues or bottlenecks that could impact reliability, performance, and scalability.
Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
Experience in any of the scripting/programming languages such as Python, GoLang, etc.
Familiarity with container technologies such as Docker, Kubernetes, Mesos, etc.
Strong verbal and written communication skills; able to work effectively with geographically remote teams.
Experience with one or more OSS technologies like Elasticsearch, Kafka and Redis.
Proficient with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions.

Preferred

Experience operating and maintaining big data components (Hadoop/YARN/HBase/Hive/Spark, etc.).
Solid understanding of Linux systems.

Benefits

Healthcare (medical, dental, vision, prescription drugs)
Health Savings Account with Employer Funding
Flexible Spending Accounts (Healthcare and Dependent care)
Company-Paid Basic Life/AD&D insurance
Company-Paid Short-Term and Long-Term Disability
Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
Employee Assistance Program
Business Travel Accident Insurance
401(k) Savings Plan with discretionary company match and access to a financial advisor
Vacation, paid holidays, floating holiday and sick days
Employee discounts
Free weekly catered lunch
Dog-friendly office (available at select locations)
Free gym access (available at select locations)
Free swag giveaways
Annual Holiday Party
Invitations to pop-ups and other company events
Complimentary daily office snacks and beverages

Company

SHEGLAM

Founded in 2019, SHEGLAM has become a strong force in the global beauty market.

Funding

Current Stage
Growth Stage
Company data provided by crunchbase