VBeyond Corporation · 21 hours ago

Site Reliability Engineer - Fulltime Only

Jersey City, NJ

Full-time

Onsite

Mid, Senior Level

4+ years exp

VBeyond Corporation is seeking a Site Reliability Engineer focused on observability, Kubernetes, and cloud infrastructure. The role involves ownership of the observability stack, building reliable monitoring pipelines, and improving cluster reliability through automation and performance tuning.

Consulting, CRM, Delivery, Human Resources, Information Technology

Hiring Manager

Ekta Singh

Responsibilities

SRE role focused on observability, Kubernetes, and cloud infrastructure (AWS/GCP/EKS)

Ownership of observability stack: Prometheus, Grafana, OpenTelemetry, ELK/Loki/Splunk, Jaeger, Alertmanager, SLOs

Build and maintain reliable monitoring pipelines for metrics, logs, tracing, dashboards, and alerts

Develop Terraform modules for observability infrastructure, Kubernetes components, and cluster add-ons

Improve cluster reliability through automation, performance tuning, capacity planning, and remediation

Implement AI-assisted diagnostics for anomaly detection, alert tuning, and noise reduction

Collaborate with Platform Engineering on Istio/service mesh telemetry and platform health

Lead SLO reporting, incident management, and root cause analysis

Qualifications

Kubernetes, Terraform, Observability tools, AWS, GCP, Automation (Python/Go), CI/CD, Cloud networking, Incident management, Root cause analysis

Required