Site Reliability Engineer, Cloud Infrastructure - USDS jobs in United States

TikTok · 1 day ago

Site Reliability Engineer, Cloud Infrastructure - USDS

TikTok is the leading destination for short-form mobile video, and they are seeking a Site Reliability Engineer to ensure the seamless operation of their US physical infrastructure. The role involves driving infrastructure automation, collaborating on service lifecycle management, and ensuring service reliability and performance while adhering to compliance standards.

Content Creators, Content Discovery, Media and Entertainment, Social Media, Video
No H1B

Responsibilities

Drive infrastructure automation and tooling: Design, develop, and maintain solutions for efficient operation, optimization, and comprehensive monitoring of global infrastructure, minimizing manual intervention
Collaborate on service lifecycle management: Partner with engineering teams to design, deploy, operate, and continuously improve robust and scalable systems and services, from inception to refinement
Ensure service reliability and performance: Proactively monitor system health, conduct performance testing, and manage incidents to maximize uptime, availability, and adherence to defined SLAs/SLOs
Execute core SRE practices: Perform on-call duties and production operations, including change management, capacity planning, and disaster recovery, while contributing to documentation and process improvements across teams

Qualifications

Python, Linux, Network architecture, Cloud services, Monitoring tools, Automation, Kubernetes, Database modeling, Disaster Recovery

Required

Proficient in one or more programming languages (e.g., Python, Go, Java, C++)
Strong understanding of Linux operating systems and open-source technologies
Experience in network architecture and troubleshooting, database modeling, cloud systems, and large-scale distributed systems
Knowledge of monitoring tools and methodologies (such as Prometheus and Grafana), AIOps, APM, and disaster recovery
Experience in designing, analyzing, and building automation and tools for large-scale systems
Experience in building solutions with AWS, GCP, Azure, and other cloud services

Preferred

Expertise in any of these tech stacks: Kubernetes, Elasticsearch, ClickHouse, message queues, OpenTSDB, service mesh, MySQL, Redis, etc.
Master's degree in Computer Science, Engineering, or a related field

Benefits

Medical, dental, and vision insurance
401(k) savings plan with company match
Paid parental leave
Short-term and long-term disability coverage
Life insurance
Wellbeing benefits
10 paid holidays per year
10 paid sick days per year
17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure)

Company

TikTok is a short-form video entertainment app and social network platform. It is a sub-organization of ByteDance.

Funding

Current Stage
Late Stage

Leadership Team

N Ali Mohamed
CEO
Blake Chandlee
VP Global Business Solutions
Company data provided by Crunchbase