Quantum World Technologies Inc. · 18 hours ago
HPC
Responsibilities
Design and maintain HPC clusters.
Build CI / CD pipelines. Perform DevOps operations and IaaS provisioning (Terraform).
Write an LSF esub so that the GPU memory calculation bypasses IBM’s standard calculation method, allowing the pipeline management system to rely on this internally built metric.
Investigate and analyze verbal and written requests for infrastructure management.
Excellent teamwork and communication abilities
Maintain high standards for documentation and deliverables.
Self-motivated and self-managing, with strong organizational skills
Ability to work with tight deadlines and multiple competing priorities
Ability to optimize the application for performance
Interact with development teams to develop a strong understanding of the project and testing objectives.
Participate in troubleshooting of issues with different teams to drive towards root cause identification and resolution
Documentation skills to track the development and implementations
Effective communication skills: regularly achieve consensus with peers and provide clear status updates.
Qualification
Required
B.Tech, MS, or PhD degree in Computer Science or a similar field.
10-14 years of strong hands-on experience in high-performance computing.
Should have knowledge of the following:
Host-level parallelism (e.g., thread / LWP vs process / HWP, OpenMP vs MPI, GPU vs CPU parallelism)
Advanced LSF job submission (e.g., jobs vs arrays, bacct vs bhist, memory requests vs limits)
Shared file system backend types, differences, and advantages (e.g., local vs networked storage, shared FS types like NFS / SMB / GPFS / Lustre)
LSF admin experience (e.g., mbatchd vs sbatchd, elim vs esub, deploying an LSF cluster from scratch)
Backend Lustre storage (e.g., architecture such as MDS / MDT, OSS / OST, MGS / MGT)
Good knowledge of Python programming and object-oriented programming (desirable but not mandatory).
Good knowledge of GitHub Actions, Jenkins, Ansible, and CI / CD processes.
Good communication skills and ability to work independently
Expertise in understanding and analyzing requirements
Proficiency with modern development tools, such as Git
Suggest enhancements or changes needed to keep up with modern security and development best practices
Preferred
Good to have Google Cloud knowledge.
Good to have Cloud understanding.