Site Reliability Engineer, Compute - USDS jobs in United States

TikTok · 13 hours ago

Site Reliability Engineer, Compute - USDS

TikTok, the leading destination for short-form mobile video, is seeking a Site Reliability Engineer to join its U.S. Data Security (USDS) division. The role involves managing large-scale systems, ensuring system scalability, and collaborating with software engineering teams to improve operational efficiency.

Content Creators, Content Discovery, Media and Entertainment, Social Media, Video
No H1B

Responsibilities

Develop and maintain automation procedures to maximize system efficiency and minimize human intervention
Work closely with software engineering teams to design, deploy, and operate system components so that systems are functionally robust
Ensure system scalability to handle growth in web traffic and data
Implement monitoring tools and set up metrics to keep track of system health and performance
Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues
Conduct performance tests to find and address system bottlenecks
Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs)
Practice sustainable user support, incident response, and blameless postmortems

Qualifications

Site Reliability Engineering, Automation procedures, Programming languages, Large-scale systems, Network architecture, Database modeling, Cloud systems, Linux operating systems, Monitoring tools, Container orchestration, Problem-solving skills, Strategic thinking, Communication skills, Collaboration

Required

Bachelor's degree in Computer Science, Information Technology, or a related field with 3+ years of experience
Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role
Passion for operational excellence through methodical automation and engineering processes, using programming languages such as Go, Python, or other languages
Experience in network architecture, database modeling, cloud systems and large-scale distributed systems
Strong understanding of Linux operating systems and open-source technologies
Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems
Exceptional communication skills and the ability to effectively collaborate with cross-functional teams

Preferred

Knowledge of monitoring tools and methodologies (such as Prometheus, Grafana)
Experience with containers and container orchestration platforms such as Docker, Kubernetes or equivalent

Benefits

Day-one access to medical, dental, and vision insurance
A 401(k) savings plan with company match
Paid parental leave
Short-term and long-term disability coverage
Life insurance
Wellbeing benefits
10 paid holidays per year
10 paid sick days per year
17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure)

Company

TikTok is a short-form video entertainment app and social network platform. It is a sub-organization of ByteDance.

Funding

Current Stage: Late Stage

Leadership Team

N Ali Mohamed
CEO
Blake Chandlee
VP Global Business Solutions
Company data provided by Crunchbase