VySystems
Site Reliability Engineer
VySystems is a technology consulting, solutions, and managed services company. It is seeking a Site Reliability Engineer with deep expertise in distributed systems and production operations to strengthen its reliability and automation practices.
Apps, Consulting, Digital Marketing, Information Technology, Infrastructure, IT Infrastructure, IT Management, Web Development
Qualification
Required
Experience in Site Reliability Engineering, Production Engineering, or equivalent roles
Deep expertise in distributed systems, resilience engineering, and large‑scale production operations
Strong proficiency with observability stacks (metrics, logs, and traces) and with tools and practices such as Splunk, ELK, New Relic, synthetic monitoring, and APM
Advanced experience with service‑level objectives (SLOs), service‑level indicators (SLIs), error budgets, and reliability governance
Expertise in Kubernetes, container orchestration, and workload reliability patterns
Strong skills in incident management, on‑call response, war‑room leadership, and root cause analysis (RCA) methodologies
Proven ability to engineer automation/self‑healing systems (auto‑remediation, failure‑mode detection)
Strong scripting/automation skills in Python, Bash, or similar languages
Solid understanding of traffic distribution, load balancing, session handling, and failure isolation
Expert debugging and performance troubleshooting across the full stack (network, compute, services)
Experience with AWS (EKS/ECS, SQS/SNS, S3, CloudFront, etc.)
Preferred
Experience implementing AIOps, alert correlation, noise reduction, or automated RCA frameworks
Background in building paved paths, golden templates, or policy‑as‑code reliability gates
Experience with reverse proxy troubleshooting, including rate limits, session affinity, and routing logic
Prior experience in high‑throughput government or regulated environments
Performance/load testing experience (designing tests, analyzing throughput, identifying bottlenecks)
Strong understanding of release reliability, risk recording, and continuous deployment safeguards
Familiarity with monitoring‑as‑code or dashboards‑as‑code practices
Hands‑on experience with infrastructure‑as‑code (Terraform preferred)
Company
VySystems
Vy Systems is part of vy.ventures and provides technology consulting, solutions, and managed services. It has served clients across many countries since 2002.
Funding
Current Stage: Late Stage
Leadership Team
Ramesh Santhanam
Founder and CSO
Company data provided by Crunchbase