Virtasant · 2 hours ago
Principal Site Reliability Engineer - Operations
Virtasant is a fast-growing global consultancy transforming technology services. They are seeking a Principal-Level Site Reliability Engineer (Operations) to provide operational support for a major global client, ensuring system reliability and improving operational workflows.
Information Technology & Services
Responsibilities
Monitor dashboards, alerts, and system health in real time
Respond to incidents quickly and decisively, driving issues to resolution
Perform root-cause analysis and contribute to post-incident reviews
Troubleshoot complex system and infrastructure issues across distributed environments
Maintain and improve runbooks, playbooks, and operational documentation
Support and enhance the observability tooling used for metrics, logs, and alerting
Work cross-functionally with engineering teams to escalate system-level issues when required
Run routine operational checks to ensure platform stability
Tune alerts, update dashboards, and ensure monitoring accuracy
Identify recurring operational issues and recommend improvements
Implement small automation and scripting solutions to improve operational workflows
Keep services running smoothly through proactive maintenance
Partner with Engineering, SRE, and Product teams to ensure transparent communication during incidents
Provide clear, concise updates and documentation for operational work
Participate in shift patterns or rotational incident coverage depending on client needs
Qualifications
Required
Must be based in the San Francisco Bay Area, with the ability to be on site 2 days per week in Santa Clara
5–10+ years in SRE, Production Operations, or Infrastructure Engineering roles
Strong hands-on experience troubleshooting distributed systems in production
Proficiency in Linux fundamentals, including process management, networking, storage, and diagnostics
Solid understanding of cloud-native architectures, containers, and modern infrastructure tooling
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, Datadog)
Experience with incident management workflows
Experience with root-cause analysis / postmortems
Experience with CI/CD operational processes
Strong Linux debugging and performance troubleshooting skills
Familiarity with Kubernetes, containers, or cloud-native runtime environments
Ability to write or modify scripts (Python, Bash, or similar) for operational automation
Hands-on experience with logs, metrics, traces, and alert lifecycle management
Calm, structured decision-making under pressure
Excellent communication: clear, concise, and reliable
Strong attention to detail and consistency in documentation
A proactive, ownership-driven mindset for reliability and operations
Company
Virtasant
Virtasant is a global team of cloud experts building the next generation of cloud solutions.
H1B Sponsorship
Virtasant has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional sponsorship data is provided below for reference (data from the US Department of Labor).
Total sponsorships by year: 2021 (1)
Funding
Current Stage
Growth Stage
Recent News
2025-07-02
Company data provided by Crunchbase