Cirrascale Cloud Services
Network Operations Center Technician
Cirrascale Cloud Services provides high-performance cloud infrastructure for deep learning and AI workloads. As a Network Operations Center Technician, you will maintain the integrity and performance of GPU-based data centers, focusing on advanced technical operations, troubleshooting, and mentorship.
Responsibilities
Respond to alerts and incidents for systems, jobs, and GPU cluster failures
Troubleshoot and repair servers, GPU clusters, and network equipment at global datacenter locations
Collaborate with NOC I and NOC II technicians to resolve tickets
Lead resolution efforts for complex and critical incidents and upgrades, escalating to the Supervisor as needed
Assist customers with ticket triage and advanced troubleshooting using Jira (Atlassian)
Create, optimize, and maintain procedures, runbooks, and automation scripts to support NOC efficiency
Help the NOC Supervisor tune and maintain customer dashboards and runbooks
Monitor system performance, support capacity planning, and analyze GPU cluster utilization
Collaborate with Development Engineering to refine alerting and monitoring tools
Document incidents, alerts, system updates, and configurations in alignment with NOC standards
Serve as the sole trainer for all new NOC employees, providing structured onboarding while remaining under Supervisor guidance
Develop and maintain a 5-day training SOP, broken down by day, covering hands-on practice, SOP/script reviews, shadowing, and reverse shadowing
Focus training across all NOC roles (I, II, III) to ensure readiness
Evaluate new hires and sign off at the end of the training week, reporting outcomes to the Supervisor
Standardize training to ensure consistency, freeing other team members from ad hoc onboarding tasks
Support ongoing mentorship and coaching under the direction of the Supervisor
Work closely with the Supervisor and NOC Manager to execute operational priorities and maintain team workflow
Participate in shift handovers and on-call rotations as needed, escalating issues to the Supervisor when appropriate
Support process improvements, SOP updates, and documentation initiatives driven by the Supervisor or NOC Manager
Qualifications
Required
4-6 years of experience in NOC, HPC (high-performance computing), AI infrastructure, cloud systems, or a related field
Strong scripting skills (Python, Bash, or similar), with GPU monitoring experience
Advanced troubleshooting experience in HPC datacenter networking and GPU clusters
Excellent analytical, problem-solving, and organizational skills
Strong written and verbal communication skills; customer-facing experience is critical
Experience with SuperMicro, Lenovo, and Dell servers strongly recommended
Familiarity with Jira ticketing, Microsoft 365 Suite, Slack, and Microsoft Teams
Preferred
Remote and hands-on experience with Ubuntu Linux 22.04 and 24.04
Certifications: Advanced Linux, Kubernetes (CKA/CKAD), Docker, or AI/ML certifications preferred
Understanding of RMAs, logistics, shipping, and receiving is a plus
Mentorship and training leadership (NOC Technician III only): serve as a key mentor by training and coaching NOC Technician I and II staff, providing day-to-day guidance, knowledge transfer, and performance support. Requires prior experience mentoring and training junior peers in a technical operations environment.
Benefits
401(k) with company match.
Health, dental, and vision insurance.
Paid time off (PTO).
Opportunities for professional development and growth.
Company
Cirrascale Cloud Services
Cirrascale Cloud Services is a premier provider of public and private dedicated GPU and IPU cloud solutions that enable deep learning.
Funding
Current Stage: Growth Stage
Total Funding: $59.05M
Key Investors: The Carlyle Group
2017-05-08: Acquired
2014-12-30: Series Unknown · $2M
2012-04-13: Debt Financing · $5.38M
Company data provided by crunchbase