fal · 5 hours ago

Sr. Linux System Administrator

United States

Full-time

Remote

Senior Level, Lead/Staff

8+ years exp

fal is hiring a Sr. Linux System Administrator to maintain the health, security, and performance of its Linux systems at scale. The role is responsible for managing the bare-metal and OS-level foundation of fal's GPU cloud, ensuring performance and security across its fleet of servers.

Artificial Intelligence (AI), Software, Information Technology, AI Infrastructure, Developer Platform, Machine Learning

H1B Sponsored

Responsibilities

Own the full lifecycle of our bare-metal GPU server fleet: provisioning, imaging, configuration management, patching, and decommissioning across multiple data centers and providers

Build and maintain our server automation stack using Ansible, Terraform, and custom tooling to manage OS configuration, kernel parameters, driver versions, and firmware updates at scale

Tune Linux systems for AI workloads: kernel parameters, NUMA topology, CPU pinning, hugepages, I/O schedulers, and GPU driver stack optimization (NVIDIA drivers, CUDA, container runtimes)

Manage and optimize distributed and local storage systems supporting model weights, checkpoints, and ephemeral scratch: NVMe arrays, NFS, parallel file systems, and object storage

Implement and enforce OS-level security: hardening baselines, SELinux/AppArmor policies, SSH key management, vulnerability scanning, and compliance automation

Own system observability: deploy and maintain node-level metrics collection, log aggregation, and alerting using Prometheus, node_exporter, Loki, and Grafana

Collaborate with the Compute platform team to ensure smooth integration between our infrastructure layer (K8s, Nomad, FluxCD) and the underlying Linux hosts

Qualifications

Linux administration, Kernel tuning, Configuration management, Storage technologies, NVIDIA GPU software, Python scripting, Bash scripting, Communication, Self-starter mindset

Required

8+ years of experience administering Linux systems at scale, ideally in GPU cloud, HPC, or large bare-metal environments

Deep expertise in Linux internals: systemd, kernel tuning (sysctl, cgroups, namespaces), boot process, package management, and performance profiling (perf, bpftrace, sar)

Strong experience with configuration management and infrastructure-as-code: Ansible, Terraform, cloud-init, PXE/iPXE, and custom imaging pipelines

Solid understanding of storage technologies: LVM, RAID, NVMe, NFS, Lustre or GPFS, and Linux I/O stack tuning

Familiarity with the NVIDIA GPU software stack: drivers, CUDA toolkit, nvidia-smi, MIG, and container runtimes (nvidia-container-toolkit)

Proficiency in Python and Bash scripting for automation, monitoring, and fleet management tooling

Excellent communication and a self-starter mindset: you take ownership and constantly seek improvement

Preferred

Experience operating Kubernetes on bare metal (kubeadm, Kubespray) and managing GPU scheduling in K8s (device plugins, MIG slicing)

Hands-on experience with BMC/IPMI/Redfish for out-of-band server management and firmware lifecycle automation

Familiarity with fleet-scale observability: Prometheus federation, Thanos, or VictoriaMetrics for multi-cluster monitoring

Contributions to open-source infrastructure tooling or Linux distributions

Experience with compliance frameworks relevant to cloud providers (SOC 2, ISO 27001)

Benefits

Competitive salary and equity

Health, dental, and vision insurance (US)

Regular team events and offsites

Company

fal

Fal is a generative media platform that helps developers create applications using AI models.

Founded in 2021

San Francisco, California, USA

51-200 employees

https://www.fal.ai

Funding

Current Stage

Late Stage

Total Funding

$337M

Key Investors

Sequoia Capital, Meritech Capital Partners, Andreessen Horowitz, Notable Capital

2025-12-09 · Series D · $140M

2025-07-31 · Series C · $125M

2025-02-12 · Series B · $49M

Leadership Team

Burkay Gur

Co-Founder

Gorkem Yurtseven

Co-Founder

Recent News

Nordic 9

Fal ai raised a $9M seed round led by Andreessen Horowitz.

2026-02-04

Sourcery

FOMO: All the deals you missed in December

2026-01-07

PYMNTS.com

From One-Person Companies to Generative Media, AI Funding Spans the Stack

2025-12-16

Company data provided by Crunchbase