Penguin Solutions · 2 weeks ago
Kubernetes System Architect
Penguin Solutions is a company that specializes in software products for managing large computational systems. The Kubernetes System Architect will focus on Kubernetes and container orchestration technologies, working closely with engineering teams to design and implement robust integrations that enhance capabilities for AI and HPC environments.
Artificial Intelligence (AI) · Cloud Computing · Enterprise Software
Responsibilities
Define and architect Kubernetes integration strategies within the ICE ClusterWare platform to enable containerized workloads and hybrid cluster orchestration
Design scalable, secure, and resilient Kubernetes-based infrastructure for HPC and AI compute environments
Develop architectural blueprints for cluster lifecycle management, service discovery, and workload scheduling across on-premise and hybrid infrastructures
Evaluate emerging CNCF ecosystem technologies (e.g., operators, CRDs, service meshes, observability stacks) and guide adoption strategies
Provide technical leadership in Kubernetes administration, troubleshooting, and performance optimization
Define best practices for all aspects of Kubernetes cluster configuration, scaling, and upgrade strategies
Collaborate with software engineering teams to integrate Kubernetes APIs and services into ICE ClusterWare’s management and monitoring subsystems
Enable seamless integration of Kubernetes with existing cluster management workflows, job schedulers, and monitoring frameworks
Administer and maintain Kubernetes clusters, including cluster creation, upgrades, node management, and scaling
Drive consistency in configuration, security, and policy enforcement across multi-cluster deployments
Implement observability and reliability frameworks for monitoring, logging, and alerting using Kubernetes-native tools such as Prometheus, Grafana, and OpenTelemetry
Manage and optimize cluster networking, including CNI plugin configuration (e.g., Calico, Cilium), ingress controllers, and service meshes
Configure and maintain persistent storage solutions in Kubernetes using dynamic provisioning, CSI drivers, and storage classes
Manage authentication, authorization, and access control through RBAC, service accounts, and integration with external identity providers
Serve as the internal Kubernetes subject matter expert and mentor for engineering peers
Partner with automation teams to ensure system reliability through automation and Infrastructure-as-Code methodologies
Partner with software engineers to guide Kubernetes-aware feature design and API development
Work alongside Product Architects and Product Managers to align architectural decisions with product roadmap and customer use cases
Qualifications
Required
Bachelor's degree in Computer Science, Software Engineering, Systems Engineering, or a related technical field—or equivalent experience
Minimum 7–10 years of experience in software or systems engineering, with at least 4 years of hands-on Kubernetes cluster administration and architecture experience
Deep understanding of Kubernetes control plane, networking, security, and storage subsystems
Proven experience designing and operating multi-node, multi-cluster Kubernetes environments in production
Strong familiarity with Linux-based environments and cluster management systems
Understanding of microservices architectures, container runtime interfaces, and cloud-native design principles
Experience with Infrastructure as Code (e.g., Terraform, Ansible, or equivalent) and automation frameworks
Ability to translate system-level requirements into practical, scalable Kubernetes solutions
Proficiency in at least one scripting or programming language (e.g., Python, Go, or Bash)
Excellent communication skills, capable of conveying complex infrastructure concepts to software development teams
Self-motivated and capable of working independently while maintaining strong team collaboration
Preferred
Experience with HPC and AI cluster workloads in Kubernetes environments
Knowledge of GPU scheduling, device plugins, and high-performance networking within Kubernetes
Familiarity with Helm and other deployment automation tools
Experience with various Kubernetes distributions and vendor platforms (e.g., Red Hat OpenShift, Rancher RKE2, Canonical MicroK8s, VMware Tanzu, or similar enterprise-managed Kubernetes solutions)
Kubernetes certifications (CKA, CKAD, or CKS) highly valued
Benefits
Medical, dental, and vision benefits
401(k) savings plan
Paid Time Off
Life Insurance
Employee Assistance Plan
Company
Penguin Solutions
At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.
Funding
Current Stage: Late Stage
Total Funding: $19.39M
Key Investors: vSpring Capital
2018-06-11 · Acquired
2011-04-20 · Series D · $1M
2009-11-09 · Series Unknown · $1.5M
Recent News
Inside HPC & AI News | High-Performance Computing & Artificial Intelligence
2025-07-30
Company data provided by Crunchbase