HARAMAIN SYSTEMS INC. · 11 hours ago

Site Reliability Engineer; Full time role.

HARAMAIN SYSTEMS INC. is seeking a Site Reliability Engineer to enhance the reliability and performance of their global video surveillance platform. The role involves collaborating with senior engineers to improve infrastructure, support deployments, and automate processes in a high-scale, cloud-native environment.

Consulting, Software, Staffing Agency
No H1B

Responsibilities

Build and maintain reliable, automated infrastructure across private cloud environments
Participate in incident response, assisting with communication, troubleshooting, and follow-up actions
Contribute to efforts that reduce recurring issues and improve service availability and recovery
Apply and support best practices for observability, incident management, and production readiness
Collaborate on improvements to Infrastructure as Code and CI/CD tooling
Work with product and application teams to help define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Advocate for automation and efficiency in day-to-day operations
Contribute to reliability-focused projects and offer insights during architecture discussions when needed
Participate in the on-call rotation and help identify opportunities to improve its effectiveness

Qualifications

Site Reliability Engineering, Linux systems management, Kubernetes, Scripting (Python/Bash), Automation, Incident response, Observability tooling, Networking fundamentals, System performance tuning, Technical leadership interest

Required

2+ years of experience as a Site Reliability Engineer (or related role)
Strong experience managing Linux systems in production environments
Good working knowledge of Kubernetes or other container orchestration systems
Solid scripting abilities in languages such as Python or Bash; familiarity with Golang is a plus
Experience contributing to automation that reduces operational toil and improves reliability
Hands-on experience participating in incident response and contributing to root-cause analysis
Familiarity with observability tooling such as Prometheus/VictoriaMetrics and Grafana for metrics, alerting, and basic SLO/error-budget usage
Understanding of networking fundamentals and security best practices
Ability to identify reliability issues and assist in implementing scalable improvements
Experience participating in or improving on-call and alerting systems

Preferred

Exposure to system performance tuning or capacity planning
Experience collaborating in cross-functional architecture or design discussions
Interest in growing toward technical leadership or mentorship over time

Company

HARAMAIN SYSTEMS INC.

Funding

Current Stage: Growth Stage
Company data provided by Crunchbase