Nebius · 20 hours ago
IT Support Manager (Tulsa, OK)
Nebius is leading a new era in cloud computing to serve the global AI economy. As an IT Support Manager, you will lead the front-line IT support and operations function responsible for maintaining the health, availability, and performance of data center IT infrastructure.
AI Infrastructure · Cloud Infrastructure · GPU · IaaS · PaaS
Responsibilities
Lead daily IT operations across data center environments, ensuring high availability and SLA adherence
Own incident management, including triage, escalation, coordination, and communication
Drive root cause analysis (RCA) and follow-through on corrective and preventive actions
Ensure operational readiness for GPU-dense infrastructure, including power, cooling, and hardware health monitoring
Manage, schedule, and develop IT support engineers operating in shift-based, 24/7 environments
Define and track KPIs, SLAs, and service quality metrics
Provide hands-on guidance during complex troubleshooting scenarios
Maintain consistent operational standards through runbooks, SOPs, and playbooks
Oversee diagnosis and resolution of issues related to servers, GPU systems, networking equipment, and cabling
Manage hardware lifecycle activities, including installations, upgrades, swaps, and decommissioning
Coordinate RMAs, spare parts, inventory accuracy, and asset tracking
Execute approved changes and maintenance activities with minimal risk
Identify recurring issues and drive process improvements to reduce incidents and MTTR
Ensure adherence to ITIL / ITSM operational processes
Act as the operational interface to vendors, OEMs, and colocation providers for day-to-day support issues
Support audits, compliance checks, and operational controls related to asset handling and access
Ensure secure handling, storage, and decommissioning of IT assets
Qualifications
Required
3–5+ years of experience in IT support or data center operations, including people management
Strong hands-on experience with server hardware, including exposure to GPU-based systems
Solid understanding of data center operations, networking basics, and structured cabling
Experience leading incident response and operational troubleshooting
Working knowledge of ITIL / ITSM frameworks
Comfortable working with Linux systems and basic command-line tools
Strong organizational skills and ability to prioritize in high-pressure environments
Clear, concise communication skills for technical and non-technical stakeholders
Preferred
Experience in Neocloud, hyperscale, or AI/HPC environments
Prior ownership of 24/7 support operations
ITIL certification
Familiarity with GPU health monitoring, firmware, or platform tooling
Experience working with colocation facilities
Benefits
Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.
Company
Nebius
The Nebius AI Cloud provides full-stack infrastructure for AI developers and practitioners across startups, enterprises, and science institutes. It lets them build and deploy generative AI applications and rapidly deliver scientific breakthroughs by training and running ML models in a secure, high-performance, and cost-optimized cloud environment.
Funding
Current Stage: Late Stage
Total Funding: $1.04B
2025-06-04 · Debt Financing · $1B
2025-05-15 · Grant · $45M
2024-12-02 · Seed
Company data provided by Crunchbase