
Oak Ridge National Laboratory · 1 day ago

Kubernetes Platform Engineer

Oak Ridge National Laboratory is seeking a Kubernetes Platform Engineer to improve the security, performance, and reliability of its computing infrastructure. The role involves operating, implementing, and maintaining on-premises Kubernetes clusters, focusing on scalability and reliability while collaborating with cybersecurity and development teams.

Advanced Materials · Clean Energy · Energy · Energy Management · Manufacturing · Nuclear · Renewable Energy
No H1B · U.S. Citizen Only

Responsibilities

Assist in day-to-day operations for on-premises and cloud-based Kubernetes clusters
Develop and integrate critical components for networking, CI/CD tooling, OS management, service mesh, and Kubernetes operators—excluding observability, which is handled by a dedicated SRE sub-team
Build test environments to evaluate tooling based on performance, feature set, and maintainability, especially for components that must work reliably with on-premises hardware and OS requirements
Work on upgrades, security hardening, monitoring integration, and scalability of all cluster infrastructure
Participate in an on-call rotation
Write and maintain infrastructure and deployment code using tools such as ArgoCD (GitOps), Puppet (OS management), Go, Python, Bash, and GitLab CI
Support the use and understanding of in-house Kubernetes operators and serve as a secondary maintainer for those controllers
Write and maintain automation for infrastructure provisioning and Kubernetes cluster lifecycle management – most of our K8s clusters run on bare metal
Write and maintain automation for configuration of host OS and Kubernetes tooling
Understand the principles of enterprise data centers and hardware management
Partner closely with internal cybersecurity and development teams to ensure the platform meets security, compliance, and usability expectations
Participate in cross-functional projects related to platform enhancements and cluster lifecycle automation
Represent the Platforms team to vendors, internal collaborators, and partners

Qualifications

Kubernetes · Linux Sysadmin · Infrastructure as Code · CI/CD tooling · Scripting · Automation · Data center hardware · Team collaboration · Interpersonal skills · Problem-solving

Required

BS degree in a scientific field and 5+ years of relevant experience, or equivalent experience
At least three years of experience with Kubernetes cluster administration
At least three years of experience with Linux system administration
Knowledge of data center hardware and infrastructure management (IPMI, OpenManage, etc.)
Experience with Infrastructure-as-Code tooling such as Terraform, Helm, and Puppet
Experience with CI/CD tooling and GitOps
Experience with code review and familiarity with tools like Git, GitHub, and GitLab
This position requires the ability to obtain and maintain an HSPD-12 PIV badge
For employment at Oak Ridge National Laboratory (ORNL), a Real ID compliant form of identification will be required
Additionally, ORNL is subject to Department of Energy (DOE) access restrictions. All employees must also be able to obtain and maintain a federal Personal Identity Verification (PIV) card as mandated by Homeland Security Presidential Directive 12 (HSPD-12) and Department of Energy (DOE) Order 473.1A, which requires a favorable post-employment background investigation
To obtain this credential, new employees must successfully complete and pass a Federal Tier 1 background check investigation. This investigation includes a declaration of illegal drug activities, including use, supply, possession, or manufacture within the last year. This includes marijuana and cannabis derivatives, which are still considered illegal under federal law, regardless of state laws
If you have not resided in the U.S. for three consecutive years, you are not eligible for the PIV credential and instead will need to obtain a favorable Local Site Specific Only (LSSO) risk determination to maintain employment
Once you meet the three-year residency requirement, you will be required to obtain a PIV credential to maintain employment

Preferred

Excellent interpersonal/communications skills, and the ability to work as part of a team
Experience with managing image registries such as Quay or Harbor
Solid understanding of networked computing environment concepts
Strong working knowledge of Unix systems fundamentals and common network protocols
Ability to develop and maintain programs and scripts that aid in the operation and automation of tasks using various shell and scripting languages (primarily Bash, Python, and Go)
Ability to identify requirements and to define, plan, and implement requisite solutions
Experience using tools such as Prometheus, Nagios, and Grafana to monitor systems and metrics and to create dashboards
Experience designing and implementing highly available systems/services
Experience with Site Reliability Engineering for Kubernetes infrastructure and application deployments

Benefits

Prescription Drug Plan
Dental Plan
Vision Plan
401(k) Retirement Plan
Contributory Pension Plan
Life Insurance
Disability Benefits
Generous Vacation and Holidays
Parental Leave
Legal Insurance with Identity Theft Protection
Employee Assistance Plan
Flexible Spending Accounts
Health Savings Accounts
Wellness Programs
Educational Assistance
Relocation Assistance
Employee Discounts

Company

Oak Ridge National Laboratory

Oak Ridge National Laboratory holds a range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.

Funding

Current Stage: Late Stage
Total Funding: $9.8M
Key Investors: US Department of Energy
2023-09-21 · Grant · $4.8M
2023-07-27 · Grant
2022-03-14 · Grant · $5M

Leadership Team

Arjun Shankar
Division Director, National Center for Computational Sciences, Oak Ridge National Laboratory
Brett Ellis
Division Director - Research Computing Support
Company data provided by Crunchbase