Senior Site Reliability Engineer (SRE) jobs in United States

VARITE INC · 2 months ago

Senior Site Reliability Engineer (SRE)

VARITE is looking for a qualified Senior Site Reliability Engineer (SRE) for one of its clients located in Phoenix, AZ. The role involves providing senior-level SRE support, ensuring system reliability, and developing automation scripts primarily using Java, while managing cloud infrastructure on Azure and deploying workloads on Kubernetes.

Information Technology & Services
Growth Opportunities
No H1B · U.S. Citizen Only

Responsibilities

Provide senior-level SRE support, ensuring system reliability, availability, and operational excellence across all environments
Develop and maintain services and automation scripts using Java as the primary programming language
Build, deploy, and optimize workloads running on Kubernetes clusters (including multi-cluster and federated deployments)
Manage and enhance cloud infrastructure leveraging Azure services and best practices
Work with Linux/Unix systems and develop automation using BASH shell scripting
Build automation and tooling using Python or Go
Design, implement, and maintain CI/CD pipelines using GitLab CI/CD and Jenkins
Support application streaming, event processing, and analytics using Kafka Stream Generator, ksqlDB, and Spark Streams
Work with service mesh technologies, including Istio and Anthos Service Mesh
Utilize VMware and other virtualization platforms for environment provisioning
Provide robust incident support, root-cause analysis, and production issue resolution
Implement eBPF-based observability and performance troubleshooting where applicable
Develop and enhance monitoring and alerting systems using Splunk, Prometheus, Datadog, and Kiali
Configure and manage load balancing with NGINX Controller and Seesaw
Use Terraform for infrastructure-as-code and Docker for containerization
Manage Kubernetes storage using Portworx
Automate repetitive operational tasks and contribute to platform stability and efficiency
Provide support across all US time zones, including rotational shifts, weekends, and occasional 24/7 escalations

Qualification

Java, Kubernetes, Azure, Incident response, Python, BASH scripting, Docker, Terraform, Monitoring tools, Go, VMware, Kafka, eBPF, Load balancing, Functional languages

Required

14+ years of experience required
U.S. citizens and green card holders only, due to the nature of the project
Extensive experience in incident response, troubleshooting, performance engineering, and service reliability
Ability to automate manual operational tasks
Strong understanding of monitoring, alerting, and observability practices
Java (Proficient) – Must be hands-on in building, supporting, and optimizing Java-based systems and microservices
Kubernetes (Hands-on) – Deployment, autoscaling, federation, ingress, storage, service mesh, and cluster operations
Azure (Highly Proficient) – Strong experience across Azure compute, networking, storage, DevOps, and security features
Knowledge of Linux/Unix internals and BASH scripting
Strong experience with Python or Go
VMware and virtualization technologies
Kafka ecosystem tools: Kafka Stream Generator, ksqlDB, Spark Streams
Experience with Istio/Anthos Service Mesh
Familiarity with eBPF for low-level observability
Monitoring tools: Splunk, Prometheus, Datadog, Kiali
Load balancing with Nginx Controller and Seesaw
Docker and Terraform expertise
Experience working with Portworx for Kubernetes storage

Preferred

Proficiency in functional or logic programming languages: Prolog, Haskell, OCaml

Company

VARITE INC

VARITE has a definite spirit.

Funding

Current Stage
Late Stage

Leadership Team

Adarsh Katyal
President & CEO
Sue Patel Arora
Vice President Of Strategic Partnerships
Company data provided by Crunchbase