TestingXperts · 4 days ago
HPC Infra Engineer
DevOps, Information Technology
H1B Sponsor Likely
Responsibilities
Design and maintain HPC clusters.
Build CI/CD pipelines and perform DevOps operations, including IaaS provisioning with Terraform.
Write an LSF esub so that GPU memory calculation bypasses IBM's standard calculation method, allowing the pipeline management system to rely on this internally built metric.
Investigate and analyze verbal and written requests for infrastructure management.
Excellent teamwork and communication abilities.
Maintain high standards for documentation and deliverables.
Self-motivated and self-managing, with strong organizational skills.
Ability to work with tight deadlines and multiple competing priorities.
Ability to optimize applications for performance.
Interact with development teams to develop a strong understanding of the project and testing objectives.
Participate in troubleshooting with other teams to drive toward root-cause identification and resolution.
Documentation skills to track development and implementations.
Effective communication skills: regularly reach consensus with peers and provide clear status updates.
Qualification
Required
B.Tech, MS, or PhD degree in Computer Science or a similar field.
10-14 years of strong hands-on experience in high-performance computing.
Knowledge of host-level parallelism (e.g., threads / LWP vs processes / HWP, OpenMP vs MPI, GPU vs CPU parallelism).
Advanced LSF job submission (e.g., jobs vs arrays, bacct vs bhist, memory requests vs limits).
Knowledge of shared file system backend types, differences, and advantages (e.g., local vs networked storage, shared FS types like NFS / SMB / GPFS / Lustre).
LSF admin experience (e.g., mbatchd vs sbatchd, elim vs esub, deploying an LSF cluster from scratch).
Backend Lustre storage knowledge (e.g., architecture such as MDS / MDT, OSS / OST, MGS / MGT).
Good knowledge of Python programming and object-oriented programming.
Good knowledge of GitHub Actions, Jenkins, Ansible, and CI/CD processes.
Good communication skills and ability to work independently.
Expertise in understanding and analyzing requirements.
Proficiency with modern development tools, like Git.
Ability to suggest enhancements or changes needed to keep up with modern security and development best practices.
Preferred
Good to have Google Cloud knowledge.
Good to have Cloud understanding.
Company
TestingXperts
Next Gen QA & Software Testing Company
H1B Sponsorship
TestingXperts has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2020 (1)
Funding
Current Stage
Late Stage
Recent News
2023-12-21
Canadian News Wire
2023-07-12
Company data provided by Crunchbase