Company

Nebius · 2 days ago

Senior Support Engineer L2

United States

Full-time

Remote

Senior Level

$144K/yr - $180K/yr

7+ years exp

Cloud Infrastructure · GPU

Growth Opportunities

Hiring Manager

Daniella Kim

Responsibilities

Diagnose and resolve escalated issues with high proficiency in Linux, networking, Kubernetes and data storage, minimizing downtime.

Lead complex troubleshooting efforts and document solutions for use across teams.

Apply advanced Linux skills for efficient OS management and problem resolution.

Utilize in-depth networking knowledge to troubleshoot and optimize network configurations.

Manage containerized applications within Kubernetes environments, handling complex deployments and ensuring service continuity.

Use advanced Python and Bash scripting to automate tasks, streamline workflows, and improve team efficiency.

Demonstrate deep understanding of data storage concepts to diagnose storage issues and optimize data management practices.

Lead, mentor, and develop a support team of 5+ engineers, sharing technical knowledge and best practices.

Collaborate with internal teams and provide guidance to L1 support to enhance overall service quality.

Foster a supportive team environment, promote continuous learning and drive efficiency.

Ensure clear, professional updates to customers, explaining complex issues in a user-friendly way.

Oversee escalations to higher-level support or engineering teams, ensuring adherence to escalation protocols.

Create, update and oversee technical documentation, troubleshooting guides and knowledge base articles.

Identify recurring issues, recommend improvements, and implement best practices to enhance service reliability and team efficiency.

Qualifications

Linux · Kubernetes · Python · Bash · Data Storage · Technical Leadership · Documentation

Required

7+ years in technical support with advanced skills in Linux and networking; experience managing and mentoring a support team of 5+ engineers.

Advanced expertise in Linux administration and troubleshooting.

Strong networking knowledge, including protocols, IP configurations and diagnostics.

Knowledge of Docker (for packaging ML workflows) and Kubernetes (for scaling and managing GPU workloads in cloud environments).

Proficient in Python and Bash for complex automation and task management.

In-depth understanding of data storage principles, types and management.

An understanding of how GPUs accelerate ML workloads.

The ability to assist with resource provisioning, scaling, and integration within ML workflows.

Familiarity with CUDA, Tensor Cores, and distributed training across multiple GPUs.

The ability to troubleshoot memory errors, driver/library mismatches, and GPU utilization bottlenecks.

The ability to debug common errors during model training (e.g., OOM errors, version compatibility issues).

Preferred

Bachelor’s degree in Computer Science, Information Technology, or a related field.

Company

Nebius

Cloud platform specifically designed to train AI models

Founded in 2022

Amsterdam, Noord-Holland, NLD

201-500 employees

https://nebius.ai

Funding

Current Stage

Public Company

Total Funding

$700M

2024-12-02 · Post-IPO Equity · $700M

2024-10-21 · IPO

Recent News

High-Performance Computing News Analysis | insideHPC

Nebius Announces $700M Equity Financing for AI Infrastructure Rollout

2024-12-04

Intellinews

Nebius aims to become a leading AI infrastructure provider

2024-10-24

TechStartups

Nebius, AI Spin-Off from Russia’s Yandex, Surges 5.6% in IPO Debut

2024-10-22

Company data provided by Crunchbase
