ABSC (Absolute Business Solutions Corp.) · 2 months ago
Systems Engineer - Graphics Processing Unit (GPU)
Absolute Business Solutions Corp (ABSC) is a community of innovators and professionals supporting critical missions in technology and defense. They are seeking a highly skilled Systems Engineer specializing in GPUs to design and optimize GPU clusters for enterprise AI applications, collaborating with multidisciplinary teams to meet performance and compliance standards.
Consulting · Information Technology · Professional Services
Responsibilities
GPU Cluster Engineering: Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize cluster architectures, ensuring they meet performance, power efficiency, and feature requirements
Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates
Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers
Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt
Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards
Qualifications
Required
Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree
10+ years of relevant systems engineering experience
Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s)
Knowledge of enterprise server components (storage/network controllers, HBA, SSDs)
Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle Linux, and Rocky Linux)
Excellent problem-solving skills and the ability to collaborate within a team
Candidate must, at a minimum, meet DoD 8570.01-M IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification would also be acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP)
Security Clearance Required: TS/SCI (eligibility) with the willingness to obtain and maintain a CI polygraph
Preferred
Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow)
Familiarity with GPU virtualization and cloud computing
Experience with Prometheus/Grafana for monitoring
Knowledge of distributed resource scheduling systems such as Slurm (preferred) or LSF
Benefits
4 weeks of PTO plus 11 Federal Holidays
Retirement Planning – 401k Fully Vested with Match
Tuition Assistance Program – Annual contributions to help you pay down your loans
Annual Health and Wellness Allowance – buy an Apple Watch, a treadmill, or hit the gym on us
Career Development – Annual Funds to spend on Education and Training
Volunteer Time Off – Annually, all employees can spend 8 hours directly supporting a charity of choice
Charitable Match – ABSC matches an employee’s donation to a qualifying charity
Paid Parental Leave – Employees receive 3 weeks of paid parental leave at 100% pay
Referral Program – We pay for internal and external referrals!
LOV Awards – Earn bonus awards throughout the year from our Living Our Values awards program
Company
ABSC (Absolute Business Solutions Corp.)
ABSC is a technology and services company that combines the agility of a small business with proven processes refined over more than two decades in business supporting public sector clients in the Intelligence, Defense, Health, and Safety areas.