
Virtasant · 2 hours ago

Principal Site Reliability Engineer - Operations

Virtasant is a fast-growing global consultancy transforming technology services. They are seeking a Principal-Level Site Reliability Engineer (Operations) to provide operational support for a major global client, ensuring system reliability and improving operational workflows.

Information Technology & Services
H1B Sponsor Likely

Responsibilities

Monitor dashboards, alerts, and system health in real time
Respond to incidents quickly and decisively, driving issues to resolution
Perform root-cause analysis and contribute to post-incident reviews
Troubleshoot complex system and infrastructure issues across distributed environments
Maintain and improve runbooks, playbooks, and operational documentation
Support and enhance the observability tooling used for metrics, logs, and alerting
Work cross-functionally with engineering teams to escalate system-level issues when required
Run routine operational checks to ensure platform stability
Tune alerts, update dashboards, and ensure monitoring accuracy
Identify recurring operational issues and recommend improvements
Implement small automation and scripting solutions to improve operational workflows
Keep services running smoothly through proactive maintenance
Partner with Engineering, SRE, and Product teams to ensure transparent communication during incidents
Provide clear, concise updates and documentation for operational work
Participate in shift patterns or rotational incident coverage depending on client needs

Qualifications

SRE experience
Troubleshooting distributed systems
Linux fundamentals
Monitoring tools
Incident management workflows
Root-cause analysis
CI/CD processes
Kubernetes
Python scripting
Bash scripting
Proactive mindset
Communication skills
Attention to detail

Required

Must be based in the San Francisco Bay Area, with the ability to be on site 2 days per week in Santa Clara
5–10+ years in SRE, Production Operations, or Infrastructure Engineering roles
Strong hands-on experience troubleshooting distributed systems in production
Proficiency in Linux fundamentals, including process management, networking, storage, and diagnostics
Solid understanding of cloud-native architectures, containers, and modern infrastructure tooling
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, Datadog)
Experience with incident management workflows
Experience with root-cause analysis / postmortems
Experience with CI/CD operational processes
Strong Linux debugging and performance troubleshooting skills
Familiarity with Kubernetes, containers, or cloud-native runtime environments
Ability to write or modify scripts (Python, Bash, or similar) for operational automation
Hands-on experience with logs, metrics, traces, and alert lifecycle management
Calm, structured decision-making under pressure
Excellent communication: clear, concise, and reliable
Strong attention to detail and consistency in documentation
A proactive, ownership-driven mindset for reliability and operations

Company

Virtasant

Virtasant is a global team of cloud experts building the next generation of cloud solutions.

H1B Sponsorship

Virtasant has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data from the US Department of Labor.)
Chart: Distribution of Different Job Fields Receiving Sponsorship (legend: represents job field similar to this job)
Trends of Total Sponsorships: 2021 (1)

Funding

Current Stage: Growth Stage

Leadership Team

Michael Kearns
Founder and CEO
Company data provided by Crunchbase