Nebius · 1 day ago
Support Engineer L2
Responsibilities
Diagnose and resolve technical issues efficiently, focusing on Linux, networking and Kubernetes environments.
Troubleshoot software, network and storage issues, documenting solutions for future reference.
Apply Linux skills to manage OS-level issues, utilize basic networking knowledge, support Kubernetes environments and use Python/Bash scripting for automation.
Understand data storage concepts for diagnosing storage-related issues.
Provide timely updates to customers, communicate complex issues clearly and escalate unresolved issues as needed.
Create and update technical documentation and mentor L1 support staff on recurring issues.
Qualifications
Required
5+ years in technical support with Linux and networking experience.
Mid-level Linux, basic networking, Kubernetes, Python/Bash scripting and data storage knowledge.
An understanding of how GPUs accelerate ML workloads.
The ability to assist with resource provisioning, scaling, and integration within ML workflows.
Familiarity with CUDA, Tensor Cores, and distributed training across multiple GPUs.
The ability to troubleshoot memory errors, driver/library mismatches, and GPU utilization bottlenecks.
The ability to debug common errors during model training (e.g., OOM errors, version compatibility issues).
Knowledge of Docker (for packaging ML workflows) and Kubernetes (for scaling and managing GPU workloads in cloud environments).
Preferred
Bachelor’s degree in Computer Science, Information Technology or a related field.
Company
Nebius
Cloud platform specifically designed to train AI models
Funding
Current Stage: Public Company
Total Funding: $700M
2024-12-02: Post-IPO Equity · $700M
2024-10-21: IPO
Recent News
High-Performance Computing News Analysis | insideHPC
2024-12-04
Company data provided by Crunchbase