ByteDance
Production System Engineer - San Jose
ByteDance is a rapidly growing tech company that inspires creativity and enriches life through its innovative products. The Production Systems Engineer will enhance the stability and scalability of data center operations, contributing to the lifecycle of server fleets and ensuring reliable infrastructure services.
Content · Data Mining · Foundational AI · Internet · Social Media
Responsibilities
Operation: As a Production Systems Engineer, your mission is to help improve the stability, efficiency, effectiveness, and scalability of our data center and server operations, platforms, and services worldwide
Lifecycle Enhancement: Participate in and improve the entire lifecycle of the server fleet, from system design and introduction consultation through launch reviews, deployment, operation, and retirement
Automation: Develop and deploy tools and solutions that improve the automation, reliability, scalability, and operability of servers in the data center
Monitoring: Develop and deploy tools and solutions that improve the availability, latency, and overall health of data center infrastructure, servers, and networks
Disaster Recovery: Troubleshoot and resolve complex technical issues in a fast-paced environment. Conduct high-level root-cause analysis of service interruptions and establish preventive measures. Practice sustainable incident response and postmortems
Cross-team Collaboration: Collaborate with stakeholders such as infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and internal customers to understand overarching business objectives. You will also have the opportunity to design and implement innovative solutions for our Core IDCs and CDN/Edge
On-call: Participate in on-call support that spans regions and incident response teams to address critical issues in the production environment
Qualifications
Required
Education: Bachelor's degree in Computer Science, Electronic Engineering, relevant technical field, or equivalent practical experience
Experience in at least one of the areas below:
Server Operations: Demonstrated proficiency in Linux system administration. In-depth understanding of Linux kernels, drivers, and modules. Able to write Bash and Python scripts that automate routine system operations, including system configuration, performance tuning, and security management in a Linux environment. In-depth understanding of server hardware and the ability to troubleshoot and diagnose it. Experience participating in the planning, delivery, and operation of large-scale data centers in different countries
Tooling Adaptation, Deployment, and Maintenance: Proficient in customizing operations and maintenance tools to meet the specific requirements of new server hardware. Competent in managing the entire software tool lifecycle, from deployment to ongoing maintenance, including server performance monitoring, resource provisioning, timely fault management, and repairs that keep new server hardware running smoothly. Experience developing and maintaining hardware, network, or service monitoring software for more than 10,000 servers
Communication: Experience in managing and coordinating teams in the global context
Preferred
3 years of work experience in a related field
Data Center: Intermediate-level expertise is preferred. We are looking for individuals proficient in work ranging from OS installations and break-fix operations to major projects such as planning and operations across the entire infrastructure lifecycle, as well as new design-build or retrofit projects for existing systems
Proficiency in operating and maintaining GPU servers is strongly preferred
Full Stack Software Development: We are actively seeking individuals proficient in full stack software development. Ideal candidates will have the following preferred skills:
Be able to create and integrate RESTful APIs, including using Flask for Python back-end development to build robust API endpoints
Have a deep understanding of JavaScript and be able to use it, along with Node.js, for both front-end and back-end development
Demonstrate proficiency in SQL for efficient database management, including designing database schemas, writing queries, and ensuring data integrity; be familiar with Redis
Have experience using Ansible for configuration management, application deployment, and task execution
Benefits
Medical, dental, and vision insurance
401(k) savings plan with company match
Paid parental leave
Short-term and long-term disability coverage
Life insurance
Wellbeing benefits
10 paid holidays per year
10 paid sick days per year
17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure)
Company
ByteDance
ByteDance is a technology company that develops content creation platforms and services.
H-1B Sponsorship
ByteDance has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (1350)
2024 (1123)
2023 (775)
2022 (487)
2021 (417)
2020 (245)
Funding
Current Stage: Late Stage
Total Funding: $9.8B
Key Investors: Capital Today, G42, Tiger Global Management
2025-11-20 · Secondary Market · $300M
2024-07-25 · Secondary Market
2023-03-14 · Secondary Market · $100M
Company data provided by Crunchbase