GenBio AI · 5 hours ago
High Performance Computing (HPC) Engineer
GenBio AI is a newly established start-up headquartered in Silicon Valley, dedicated to transforming biology and medicine through Generative AI. They are seeking a High Performance Computing (HPC) Engineer to design, deploy, and maintain high-performance GPU clusters while implementing distributed computing techniques for large deep learning models.
Responsibilities
GPU Cluster Management: Design, deploy, and maintain high-performance GPU clusters, ensuring their stability, reliability, and scalability. Monitor and manage cluster resources to maximize utilization and efficiency.
Distributed/Parallel Training: Implement distributed computing techniques to enable parallel training of large deep learning models across multiple GPUs and nodes. Optimize data distribution and synchronization to achieve faster convergence and reduced training times.
Performance Optimization: Fine-tune GPU clusters and deep learning frameworks to achieve optimal performance for specific workloads. Identify and resolve performance bottlenecks through profiling and system analysis.
Deep Learning Framework Integration: Collaborate with data scientists and machine learning engineers to integrate distributed training capabilities into GenBio AI’s model development and deployment frameworks.
Scalability and Resource Management: Ensure that the GPU clusters can scale effectively to handle increasing computational demands. Develop resource management strategies to prioritize and allocate computing resources based on project requirements.
Troubleshooting and Support: Troubleshoot and resolve issues related to GPU clusters, distributed training, and performance anomalies. Provide technical support to users and resolve technical challenges efficiently.
Documentation: Create and maintain documentation related to GPU cluster configuration, distributed training workflows, and best practices to ensure knowledge sharing and seamless onboarding of new team members.
Qualifications
Required
Master's or Ph.D. degree in computer science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning
2+ years of proven experience managing GPU clusters, including installation, configuration, and optimization
Strong expertise in distributed deep learning and parallel training techniques
Proficiency in popular deep learning frameworks such as PyTorch, Megatron-LM, and DeepSpeed
Programming skills in Python and experience with GPU-accelerated libraries (e.g., CUDA, cuDNN)
Knowledge of performance profiling and optimization tools for HPC and deep learning
Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes)
Strong background in distributed systems, cloud computing (AWS, GCP), and containerization (Docker, Kubernetes)
Company
GenBio AI
GenBio AI creates AI-driven models to simulate and predict biological systems at multiple scales.
H1B Sponsorship
GenBio AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (3)
2024 (1)
Funding
Current Stage
Early Stage
Recent News
2025-11-14
Company data provided by Crunchbase