Site Reliability Engineer - Infrastructure 4872 jobs in United States

Tier4 Group · 13 hours ago

Site Reliability Engineer - Infrastructure 4872

Tier4 Group is seeking a Site Reliability Engineer to ensure their applications and services are reliable and high-performing. The role involves improving monitoring, documentation, and alerting systems, as well as collaborating with various teams to build secure and scalable systems while minimizing downtime.

CRM, Information Technology, Software, Virtual Reality

Responsibilities

Design, build, and maintain secure, compliant infrastructure using Terraform and Ansible
Automate provisioning and management of servers, storage, networks, Kubernetes clusters, and other systems in both cloud and on-prem environments
Develop tools and processes for automated deployment, configuration, monitoring, and alerting across applications and services
Collaborate with cross-functional teams to plan and implement scalable, reliable cloud and data center solutions
Lead and participate in incident response, on-call rotations, and post-incident reviews to minimize downtime and improve system resilience
Monitor system performance and availability using SLAs, SLOs, and SLIs; proactively troubleshoot and resolve issues that impact reliability, performance, or security
Create and manage disaster recovery and business continuity plans to ensure high availability of critical systems
Analyze and continuously improve the efficiency, scalability, and performance of our infrastructure and services
Stay current with emerging technologies and industry trends; recommend and evaluate new tools or practices to enhance our platform
Share technical expertise and mentor team members to help grow internal capabilities

Qualifications

Terraform, Ansible, Google Cloud Platform, Kubernetes, Docker, Python, Bash, VMware, Networking technologies, Communication skills, Problem-solving skills, Adaptability

Required

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
Proven experience as a Site Reliability Engineer or Systems Engineer
Strong proficiency in Terraform and Ansible for infrastructure automation
In-depth knowledge of Google Cloud Platform services (compute, networking, storage, Kubernetes, and security)
Hands-on experience with Docker, Kubernetes, or other container and orchestration tools
Proficiency in scripting with Python, Bash, or similar languages
Solid understanding of VMware virtualization and enterprise storage systems such as Pure Storage
Experience working with networking technologies, including VLANs, VPNs, and routing protocols
Strong grasp of IT infrastructure and operations principles, including systems integration, automation, and best practices
Excellent communication and interpersonal skills, with the ability to collaborate effectively with technical teams, stakeholders, and vendors
Ability to manage multiple tasks and priorities under pressure, with strong adaptability and problem-solving skills
Willingness to continuously learn and adapt to emerging technologies and changing business needs

Preferred

Terraform Associate certification is a plus
GCP certification is a plus
Relevant certifications such as ITIL, PMP, CISSP, or GCP Cloud Architect are a plus

Company

Tier4 Group

Tier4 Group is a women-owned and diversity-certified technology Talent, Professional Services, Advisory, and Information Security firm with a national reach.

Funding

Current Stage: Growth Stage

Leadership Team

Betsy Robinson
Founder + CEO
Jake Sherrill
Partner
Company data provided by Crunchbase