Infrastructure Operations Engineer jobs in United States

Voltage Park · 3 weeks ago

Infrastructure Operations Engineer

Voltage Park is your enterprise AI factory, offering scalable compute power and bare metal AI infrastructure. We are seeking a highly skilled Infrastructure Operations Engineer to ensure the stability and performance of our compute, storage, and platform infrastructure, supporting AI/ML training and HPC workloads at scale.

AI Infrastructure, Cloud Computing, Machine Learning
No H1B

Responsibilities

At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features
Deploy updates and improvements to support both Voltage Park’s internal and end-customer use cases
Collaborate with colleagues in Infrastructure Engineering, Network Operations, Customer Success, and the Software and Platform Development teams
Participate in the on-call rotation, which is evenly distributed across all team members in a primary/secondary pattern: you serve as primary, then move to the secondary position

Qualifications

Linux, AWS, Kubernetes, Terraform, Ansible, Network storage management, Monitoring systems, Python, Networking fundamentals, Communication, Problem-solving, Documentation

Required

8+ years working with Linux as a server/hosting platform; extra points for Ubuntu experience
5+ years of experience with AWS
2+ years of experience with Kubernetes and strong container fundamentals
2+ years of experience with Terraform and Ansible
2+ years of experience with network-attached storage management (via NFS, Ceph, or other protocols); extra points for experience with VAST storage systems
Experience working in a Slack-first, asynchronous remote work environment
Experience with monitoring systems (Prometheus, ELK stack)
Familiarity with the GitOps workflow
Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs
Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand
Experience building and delivering complex systems
Effective at navigating tradeoffs between design, risk, cost, and outcomes
Comfortable with navigating ambiguity
Strong written and oral communication

Preferred

Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware
Experience with GPU servers, either on bare metal or under virtualization
Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls, and Juniper Networks devices
Experience with VAST storage systems

Company

Voltage Park

Voltage Park provides infrastructure for machine learning.

Funding

Current Stage: Growth Stage
Total Funding: $500M
2023-10-30 · Undisclosed · $500M

Leadership Team

Eric Park, Chief Executive Officer
Mike Xia, Chief Product Officer
Company data provided by Crunchbase