DigitalOcean · 8 hours ago
Staff Engineer (Fleet Performance)
Cloud Computing · DevOps
Growth Opportunities · H1B Sponsor Likely
Responsibilities
Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
Collaborate with hardware engineering teams and vendors to continuously validate GPU fabric performance
Engage with the open-source Linux community to advance virtualization technologies and integrate them into our fleet
Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
Work cross-functionally to harness new performance capabilities from evolving hardware architectures
Enhance test frameworks, harnesses, and pipelines to ensure robust performance validation
Investigate and resolve virtual machine downtime and performance issues in our production environment
Participate in on-call rotations as needed to support system reliability
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
Extensive knowledge of Linux kernel, hypervisors, and open-source operating systems
7+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPCC, MLPerf, and NCCL
5+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications, and services
Strong proficiency in Go, Python, and/or Ruby
Deep understanding of kernel performance aspects, including scheduling, context switching, and hardware acceleration
Expertise in distributed systems performance, including tracing and debugging methodologies
Knowledge of GPU technology, GPU fabrics, and programming for multi-GPU workloads
Demonstrated ability to solve complex problems at scale
Strong security mindset with a proactive approach to implementing best practices
Excellent cross-team collaboration and communication skills
Leadership experience in skills development and mentorship
Professional-level written and spoken English with strong presentation abilities
Preferred
Experience with observability platforms such as Splunk, Prometheus, Grafana, Elastic, or Dynatrace
Proficiency in the C programming language
Proficiency in compiler-level performance optimization techniques
Experience with Chef, AWX, and/or Kubernetes
Familiarity with x86_64 and/or ARM architectures
Successful history of upstreaming Linux kernel patches
In-depth knowledge of at least one Linux subsystem (CPU scheduling, memory management, file system, I/O, etc.)
Experience in developing and deploying ML-based solutions for anomaly detection and dynamic load balancing
Benefits
Reimbursement for relevant conferences, training, and education
Access to LinkedIn Learning's 10,000+ courses
One-time work from home stipend
Wellness allowance
Flexible time off policy
Equity compensation for eligible employees
Equity grants upon hire
Option to participate in our Employee Stock Purchase Program
Company
DigitalOcean
DigitalOcean provides a cloud platform to deploy, manage, and scale applications of any size.
H1B Sponsorship
DigitalOcean has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The figures below are provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (3)
2022 (19)
2021 (19)
2020 (10)
Funding
Current Stage: Public Company
Total Funding: $491.28M
Key Investors: Global Secure Invest, Access Industries, KeyBanc Capital Markets
2021-09-13 · Post-IPO Equity · $34.91M
2021-03-23 · IPO · amount not listed
2021-01-01 · Series Unknown · amount not listed
Company data provided by Crunchbase