Cognizant · 3 hours ago
Site Reliability Engineer (SRE)
Cognizant is seeking a Site Reliability Engineer (SRE) to design and implement advanced observability solutions for edge computing environments. The role involves collaborating with engineering and platform teams to ensure high availability, reliability, and performance across distributed systems.
Consulting · Industrial Automation · Information Technology · Software · Software Engineering
Responsibilities
Design and implement observability frameworks for edge environments, including monitoring, logging, tracing, and metrics collection
Define and maintain SLIs, SLOs, and business KPIs to improve system reliability across edge and centralized infrastructure
Build and optimize dashboards, visualizations, and alerting systems for real-time insights and rapid incident response
Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing
Collaborate with engineering teams to embed observability best practices into applications and infrastructure
Drive proactive issue detection and resolution, reducing MTTD and MTTR across distributed systems
Lead incident postmortems and implement observability-driven improvements to prevent recurrence
Develop automation scripts and tools to enhance observability pipelines, addressing edge-specific challenges like bandwidth and connectivity
Qualifications
Required
3–5 years of experience in service reliability/operations for large-scale, high-performance applications in hybrid environments (on-prem and cloud)
Strong scripting and automation skills for building dashboards and managing application performance
Proficiency in programming languages such as Go, Python, Java, or Rust
Hands-on experience with databases (Oracle, SQL Server, Redis, ClickHouse, Postgres, MongoDB, or time-series DBs)
2+ years of experience transitioning platforms to cloud and containerization (GCP, AWS, Rancher, or similar)
Experience maintaining containerized applications in GKE/RKE/AKS environments
Expertise in implementing cloud observability using OpenTelemetry (OTEL) for monitoring and distributed tracing
Knowledge of networking protocols (TCP/IP, HTTP, DNS) and troubleshooting in high-pressure scenarios
Candidate must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future
Preferred
Experience managing application availability for 24x7 high-availability platforms
Familiarity with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace
Hands-on experience with CI/CD tools, Rally, and Confluence
Knowledge of in-memory caching solutions (Redis preferred)
Strong debugging skills across integrated technical platforms and API gateways
Exposure to GCS, Cloud SQL, Spanner, Firestore, and enterprise-level infrastructure operations
Experience with HashiCorp Vault, Vertex AI, Gen AI, and BigQuery
Benefits
Medical/Dental/Vision/Life Insurance
Paid holidays plus Paid Time Off
401(k) plan and contributions
Long-term/Short-term Disability
Paid Parental Leave
Employee Stock Purchase Plan
Company
Cognizant
Cognizant is a professional services company that helps clients alter their business, operating, and technology models for the digital era.
Funding
Current Stage
Public Company
Total Funding
$0.24M
Key Investors
Summit Financial Wealth Advisors
2025-03-08 · Post-IPO Equity
2016-11-18 · Post-IPO Equity · $0.24M
1998-06-19 · IPO
Recent News
Hindu Business Line
2026-01-24
Company data provided by Crunchbase