Meta · 2 weeks ago
SiteOps Data Center Production Operations Engineer
Meta is seeking a forward-thinking, experienced engineer to join the Production Operations team within its data centers, which are crucial to its rapidly scaling infrastructure. The role involves supporting platform health, resolving technical issues, collaborating on projects, and optimizing data center operations.
Computer Software
Responsibilities
Support platform health by resolving and closing tickets while addressing the underlying issue (i.e., the root cause), including, but not limited to, remote troubleshooting and physical inspection of services in data halls
Participate in root cause analysis of highly technical issues within the data center, ranging from automated tooling to hardware failures and network issues
Collaborate with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation
Serve as point of contact for the introduction of new platforms and hardware to the site, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production
Use tools and data analysis effectively to identify issues, communicate appropriately with all stakeholders, and manage or escalate as needed
Identify corrective actions for hardware issues, working with internal teams and vendors
Influence future design changes to ensure ease of serviceability
Solve systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution
Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs
Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements
Support and train team members to evaluate and identify better ways to resolve issues, and define updates to tools and processes
Provide engineering support and be a go-to technical resource for the team, leadership, and cross-functional teams in operating and maintaining data center servers
Maintain and update documentation, such as procedures, runbooks, and guides
Build cross functional relationships and influence policies and procedures that improve global data center operations
Participate in 24/7 on-call rotation
Travel up to 15% of the time
Qualifications
Required
BS, BA or BEng in technical field or commensurate experience
5+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
Intermediate-level understanding of Linux (or equivalent OS) in a complex IT environment with the capacity to triage, debug, and troubleshoot server issues
Hands-on experience and knowledge of server hardware and components, including storage
Intermediate-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
Experience managing technical issues and driving to the root cause
Experience participating in technical projects related to areas such as process improvement, technology, and/or automation
Capacity to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
Intermediate-level knowledge of technologies such as HTTP, DNS, RAID, and DHCP
Experience in providing technical guidance to external vendors
Experience debugging, modifying, and developing code in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
Experience using data and metrics to drive decisions
Preferred
Experience in fostering growth in others, and driving influence across all organizational levels
Experience in a large-scale data center environment
Six Sigma knowledge/certification
Experience with large-scale AI implementations
Benefits
Bonus
Equity
Benefits
Company
Meta
Meta's mission is to build the future of human connection and the technology that makes it possible.
Funding
Current Stage
Late Stage