Kubernetes System Architect jobs in United States

Penguin Solutions · 2 weeks ago

Kubernetes System Architect

Penguin Solutions specializes in software products for managing large computational systems. The Kubernetes System Architect will focus on Kubernetes and container orchestration technologies, working closely with engineering teams to design and implement robust integrations that extend the platform's capabilities for AI and HPC environments.

Artificial Intelligence (AI), Cloud Computing, Enterprise Software

Responsibilities

Define and architect Kubernetes integration strategies within the ICE ClusterWare platform to enable containerized workloads and hybrid cluster orchestration
Design scalable, secure, and resilient Kubernetes-based infrastructure for HPC and AI compute environments
Develop architectural blueprints for cluster lifecycle management, service discovery, and workload scheduling across on-premise and hybrid infrastructures
Evaluate emerging CNCF ecosystem technologies (e.g., operators, CRDs, service meshes, observability stacks) and guide adoption strategies
Provide technical leadership in Kubernetes administration, troubleshooting, and performance optimization
Define best practices for all aspects of Kubernetes cluster configuration, scaling, and upgrade strategies
Collaborate with software engineering teams to integrate Kubernetes APIs and services into ICE ClusterWare’s management and monitoring subsystems
Enable seamless integration of Kubernetes with existing cluster management workflows, job schedulers, and monitoring frameworks
Administer and maintain Kubernetes clusters, including cluster creation, upgrades, node management, and scaling
Drive consistency in configuration, security, and policy enforcement across multi-cluster deployments
Implement observability and reliability frameworks for monitoring, logging, and alerting using Kubernetes-native tools such as Prometheus, Grafana, and OpenTelemetry
Manage and optimize cluster networking, including CNI plugin configuration (e.g., Calico, Cilium), ingress controllers, and service meshes
Configure and maintain persistent storage solutions in Kubernetes using dynamic provisioning, CSI drivers, and storage classes
Manage authentication, authorization, and access control through RBAC, service accounts, and integration with external identity providers
Serve as the internal Kubernetes subject matter expert and mentor for engineering peers
Partner with automation teams to ensure system reliability through automation and Infrastructure-as-Code methodologies
Partner with software engineers to guide Kubernetes-aware feature design and API development
Work alongside Product Architects and Product Managers to align architectural decisions with product roadmap and customer use cases

Qualifications

Kubernetes, Linux-based environments, Infrastructure as Code, Scripting languages, Kubernetes certifications, Microservices architectures, Communication skills, Team collaboration

Required

Bachelor's degree in Computer Science, Software Engineering, Systems Engineering, or a related technical field—or equivalent experience
Minimum 7–10 years of experience in software or systems engineering, with at least 4 years of hands-on Kubernetes cluster administration and architecture experience
Deep understanding of Kubernetes control plane, networking, security, and storage subsystems
Proven experience designing and operating multi-node, multi-cluster Kubernetes environments in production
Strong familiarity with Linux-based environments and cluster management systems
Understanding of microservices architectures, container runtime interfaces, and cloud-native design principles
Experience with Infrastructure as Code (e.g., Terraform, Ansible, or equivalent) and automation frameworks
Ability to translate system-level requirements into practical, scalable Kubernetes solutions
Proficiency in at least one scripting or programming language (e.g., Python, Go, or Bash)
Excellent communication skills, capable of conveying complex infrastructure concepts to software development teams
Self-motivated and capable of working independently while maintaining strong team collaboration

Preferred

Experience with HPC and AI cluster workloads in Kubernetes environments
Knowledge of GPU scheduling, device plugins, and high-performance networking within Kubernetes
Familiarity with Helm and other deployment automation tools
Experience with various Kubernetes distributions and vendor platforms (e.g., Red Hat OpenShift, Rancher RKE2, Canonical MicroK8s, VMware Tanzu, or similar enterprise-managed Kubernetes solutions)
Kubernetes certifications (CKA, CKAD, or CKS) highly valued

Benefits

Medical, dental, and vision benefits
401(k) savings plan
Paid Time Off
Life Insurance
Employee Assistance Plan

Company

Penguin Solutions

At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.

Funding

Current Stage: Late Stage
Total Funding: $19.39M
Key Investors: vSpring Capital
2018-06-11: Acquired
2011-04-20: Series D · $1M
2009-11-09: Series Unknown · $1.5M

Leadership Team

Phillip Pokorny, Chief Technology Officer
Alex Lin, Sr. Technical Product Manager
Company data provided by Crunchbase