Techgene Solutions · 15 hours ago
Senior Solution Architect
Techgene Solutions is seeking a high-performing Senior Solution Architect to lead the convergence of traditional High-Performance Computing (HPC) environments with modern cloud-native architectures. The architect will design, integrate, and optimize large-scale, containerized, hybrid HPC environments while ensuring compliance with ITAR standards.
Responsibilities
Architect end-to-end hybrid cloud solutions integrating Mirantis Container Cloud with dedicated HPC clusters
Balance performance, elasticity, and compliance requirements across on-prem and cloud environments
Produce architecture documentation that adheres to ITAR export-control standards and review practices
Design and implement HPC job scheduling strategies using Slurm, Volcano, LAVA, or similar technologies
Support deterministic resource allocation for AI/ML analytics, physics simulations, and scientific workloads
Ensure schedulers meet ITAR-restricted workload isolation and audit requirements
Apply best practices for high-performance containerization: multi-stage builds, minimal base images, and resource tuning (CPU, GPU, Memory)
Implement strategies to minimize overhead, ensure stability, and eliminate noisy-neighbor issues
Architect and operate an enterprise-grade ELK Stack (Elasticsearch, Logstash, Kibana) tuned for HPC-scale environments
Manage Index Lifecycle Management (ILM) for massive log throughput while preserving traceability for compliance audits
Build IaC-driven automation pipelines using Terraform, Ansible, and GitOps workflows
Automate deployment of Mirantis Kubernetes Engine (MKE) and integrated HPC schedulers within an ITAR-secured environment
Implement robust CI/CD workflows using Jenkins, GitLab CI, Argo Workflows, or similar tools
Ensure pipelines comply with ITAR policies, including artifact access control, secure registries, and encrypted transport
Architect integration between Kubernetes and traditional HPC schedulers
Enable advanced workloads requiring high-speed interconnects such as InfiniBand, RDMA, or GPU-accelerated clusters
Qualifications
Required
Legally authorized to access and handle U.S. export-controlled technical data
Expertise in Docker Runtime, Mirantis Kubernetes Engine (MKE), and Lens Desktop management
Deep experience designing containerized workloads for HPC environments
Hands-on experience with Slurm, PBS, or Kubernetes-native batch schedulers such as Volcano
Knowledge of hierarchical priority queues, gang scheduling, and resource fairness algorithms
Strong understanding of Logstash pipeline performance optimization, Elasticsearch sharding strategies, and Kibana visualization design
Experience with NVIDIA Enroot/Pyxis or equivalent technologies supporting near bare-metal container performance
Experience implementing secure registry solutions, TLS encryption, RBAC, and identity-driven access controls
Demonstrated experience supporting compliance frameworks including ITAR, NIST 800-53, or similar
10+ years in systems architecture or engineering roles
5+ years in HPC, Cloud Infrastructure, or enterprise-scale DevOps environments
Understanding of MPI (Message Passing Interface), GPU compute workloads, low-latency networks, and distributed parallel frameworks
Experience with AWS HPC environments (EKS, AWS Batch, FSx for Lustre, EC2 GPU-accelerated instances)
Preferred
Certified Kubernetes Administrator (CKA)
Mirantis Kubernetes certifications
Relevant security/compliance certifications