Catalyst Operations & Analytics · 1 month ago
AI Infrastructure & Full-Stack Engineer (PT/Retainer — FT Flex) - TS/SCI Full Scope
Catalyst Operations & Analytics is seeking a highly skilled engineer to support the design, deployment, and optimization of its on-premises AI infrastructure. The role involves designing AI stacks, managing Docker environments, and optimizing model-inference pipelines for AI applications.
Analytics, Consulting, Cyber Security, Training
Responsibilities
Designing and maintaining on-prem AI stacks — GPU servers, local clusters, NAS storage
Building and managing Docker/Docker Compose environments
Optimizing model-inference pipelines for speed and reliability
Developing backend services and APIs for AI applications
Automating system setup and maintenance with Bash, Python, or PowerShell
Managing GPU drivers, CUDA, and dependency stacks
Implementing logging, metrics, and fault-tolerant distributed systems
Integrating AI systems with local networks (DNS, SSL/TLS, reverse proxy, firewall, auth)
Maintaining clear documentation and deployment procedures
Qualifications
Required
Strong proficiency in Python, JavaScript, and full-stack development
Proven experience running GPU-accelerated workloads in Linux environments
Deep knowledge of Docker, GPU runtime management, and multi-container orchestration
Linux server administration, security hardening, and user-permission management
Networking fundamentals (VLANs, NAT, DNS, reverse proxying)
System performance tuning (CPU/GPU/RAM)
Ability to read and debug code without an IDE or internet access
Must hold an active TS/SCI Full Scope clearance