SPECTRAFORCE
Principal Site Reliability Engineer
SPECTRAFORCE is seeking a Principal Site Reliability Engineer to ensure the availability, reliability, and performance of the client's Demo Platform AI services and OpenShift Virtualization infrastructure. The role involves managing complex cloud systems, implementing Model as a Service infrastructure, and collaborating with engineering and marketing teams to align requirements with functional capabilities.
Responsibilities
Design, develop, and implement robust, scalable, and secure IT infrastructure solutions, aligned with business objectives and industry best practices
Implement automation and DevOps processes to improve the cloud life cycle, including infrastructure and application uptime, availability, right-sizing and time-to-market
Collaborate with teammates and project stakeholders to meet timelines, goals, and SLAs
Design and implement a Model as a Service platform utilizing Red Hat AI products and GPU-enabled Intel hardware systems
Perform architectural planning, deployment, and management of OpenShift Container Platform environments
Architect and optimize virtualization solutions using KVM/QEMU, including advanced capabilities offered by OpenShift Virtualization (KubeVirt)
Design and implement advanced network architectures, particularly Software-Defined Networking (SDN) and Open Virtual Network (OVN), ensuring high performance and reliability
Develop comprehensive storage strategies, including the design and administration of physical storage solutions and distributed storage systems like Ceph / OpenShift Data Foundation (ODF)
Oversee the administration and automation of bare-metal infrastructure, ensuring optimal performance and resource utilization
Drive automation initiatives using Ansible and Red Hat Advanced Cluster Management for Kubernetes (ACM) for infrastructure provisioning, configuration management, and operational tasks
Establish and optimize CI/CD pipelines for infrastructure and platform deployments, promoting agile and efficient delivery
Provide technical leadership, mentorship, and guidance to engineering teams on architectural patterns and best practices
Evaluate new technologies and trends, recommending solutions that enhance our IT landscape and provide competitive advantages
Collaborate cross-functionally with development, operations, and business teams to gather requirements and translate them into architectural designs
Create and maintain detailed architectural documentation, including design specifications, diagrams, and operational guides
Contribute to performance testing and tuning, quality assurance (QA), and ticket and incident management
Qualifications
Required
8+ years of progressive experience in IT architecture, with a significant focus on infrastructure design and implementation
5+ years of experience with public cloud, virtualization, and Linux technologies, specifically KVM/QEMU, and a strong understanding of OpenShift Virtualization (KubeVirt)
5+ years of experience with Red Hat OpenShift Container Platform or Kubernetes including cluster operations, networking, storage integration, and security
3+ years of experience with automation frameworks and tools like Ansible or Terraform
Hands-on experience with bare-metal administration, including hardware provisioning, firmware management, and operating system deployment
Solid understanding and practical experience with CI/CD methodologies and tools for automated deployments
Strong problem-solving abilities, analytical skills, and a strategic mindset
Excellent communication, presentation, and interpersonal skills, capable of articulating complex technical concepts to diverse audiences
Preferred
Experience with AI/ML technologies and recent developments, including OpenShift AI, inference systems, and technologies like vLLM
Proven experience with enterprise-grade storage solutions and Software-Defined Storage technologies, especially Ceph and ODF
Advanced knowledge of Software-Defined Networking (SDN) principles and practical experience with Open Virtual Network (OVN)
Extensive experience with Red Hat Enterprise Linux (RHEL) administration and design
Experience with AppDev automation and pipelines, including technologies like Jenkins, Tekton, ArgoCD, etc.
Experience with networking technologies, including VLANs, routing protocols, IPAM solutions, and load balancers
Experience in designing and delivering implementations using various public and private cloud infrastructure technologies and providers
Experience in Python development for automation, scripting, and tool development
Experience in Go development for building high-performance applications or infrastructure components
Relevant certifications (e.g., Red Hat Certified Architect, Kubernetes certifications, industry cloud certifications)
Company
SPECTRAFORCE
Welcome to SPECTRAFORCE, your gateway to NEWJOBPHORIA™! Established in 2004, SPECTRAFORCE is now one of the largest and fastest growing U.S.
H1B Sponsorship
SPECTRAFORCE has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (3)
2024 (6)
2023 (1)
2022 (6)
2021 (8)
2020 (7)
Funding
Current Stage: Late Stage
Company data provided by Crunchbase