Irvine Technology Corporation · 1 day ago
Site Reliability Engineer
Irvine Technology Corporation is seeking a Site Reliability Engineer to join its team in a full-time, permanent position. The SRE will design, implement, and maintain resilient systems through automation while ensuring the reliability, scalability, performance, and availability of critical systems.
Responsibilities
Monitor and support production systems; serve as first-line response for incidents, outages, and field escalations
Triage operational issues, resolve where possible, and escalate confirmed defects to core engineering with clear context
Lead and participate in incident response, root cause analysis, and post-incident remediation
Design, build, and maintain reliable, scalable, and secure systems and services
Define and track SLIs, SLOs, and error budgets to measure and improve system reliability
Develop and maintain monitoring, alerting, and observability tooling
Automate infrastructure provisioning, configuration management, and operational workflows
Partner with engineering teams to improve system stability, deployment practices, and performance
Identify and mitigate operational risks, including capacity, availability, and security concerns
Maintain and continuously improve runbooks, documentation, and operational standards
Participate in on-call rotations to support production environments
Apply systems thinking to proactively improve reliability, resilience, and overall service quality
Perform structured analysis to identify root causes and implement sustainable solutions
Respond to incidents with sound judgment, maintaining focus and effectiveness under pressure
Identify opportunities for automation and process improvement to reduce manual effort and operational risk
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
7+ years of experience in Site Reliability Engineering, Systems Engineering, DevOps, Platform, or Infrastructure Engineering
Strong hands-on experience with Linux/Unix systems and networking fundamentals
Proficiency in at least one programming or scripting language (e.g., Python, Go, Java, or Bash)
Experience operating production systems, including distributed systems, in cloud environments (AWS, Azure, and/or GCP)
Hands-on experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
Experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation, Pulumi)
Solid understanding of CI/CD pipelines, deployment strategies, and release management
Working knowledge of monitoring, logging, observability, and incident management best practices
Strong troubleshooting skills with the ability to assess issues, exercise sound judgment, and resolve problems effectively
Company
Irvine Technology Corporation
Irvine Technology Corporation is a staffing and recruiting company providing IT solutions and staffing services.