Staff Site Reliability Engineer jobs in United States

Sonar · 2 months ago

Staff Site Reliability Engineer

Sonar is a company dedicated to preventing code quality and security issues while enhancing developer productivity with AI assistants. The Staff Site Reliability Engineer will play a crucial role in automating and enhancing the software development lifecycle, focusing on infrastructure monitoring, automation, and incident response.

Cyber Security · Developer Tools · Open Source · Software
No H1B

Responsibilities

System Health Monitoring, Alert Triaging, and Error Budget Management: Dedicate time to monitoring critical security infrastructure (e.g., identity platforms, firewalls, compliance systems) and core infrastructure components. Focus on using and maintaining dashboards tied to Service Level Objectives (SLOs), triaging high-severity alerts, and analyzing the current Error Budget burn rate to guide prioritization for the rest of the day
Infrastructure as Code (IaC) and Policy as Code Development: Spend the largest portion of time writing, reviewing, and testing code (e.g., Python, Go, Terraform, or proprietary tools) to automate the deployment, configuration, and security hardening of infrastructure. This involves treating infrastructure and security policies as software to ensure consistency and prevent configuration drift
Toil Elimination and Automation of Operational Tasks: Identify, scope, and implement automated solutions for manual, repetitive, and time-consuming tasks (toil) related to security patching, compliance checks, certificate rotations, or infrastructure maintenance. The goal is to continuously reduce the operational workload for the team
Security Pipeline and Observability Maintenance: Maintain and enhance the DevSecOps security tools integrated into the CI/CD pipelines (e.g., static analysis, vulnerability scanning, security configuration checks). Ensure the end-to-end logging, metrics, and tracing (observability) systems for both infrastructure and security tools are robust, accurate, and provide immediate diagnostic capability during incidents
Incident Response Engineering and Post-Mortem Action: Participate in the on-call rotation and actively engage in engineering solutions derived from post-mortems. This means turning incident root causes into preventative measures implemented via code, improving runbooks into automated actions, and reducing Mean Time To Resolution (MTTR) for future incidents

Qualifications

Infrastructure as Code · Cloud Provider Experience · Service Level Objectives · Observability Platforms · Networking Concepts · Security Automation · Identity and Access Management · Incident Management · Python · Go · Terraform · Ansible · Puppet

Required

Deep IaC Expertise: Professional experience provisioning and managing complex infrastructure using tools like Terraform or CloudFormation (AWS), or similar tools like Ansible or Puppet for configuration management
Cloud/Platform Experience: Hands-on experience with a major cloud provider (AWS, GCP, Azure) or managing large-scale internal/private cloud infrastructure
SLO/SLI Implementation: Practical experience defining, measuring, and reporting on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services
Logging/Metrics/Tracing Stacks: Proven experience with modern observability platforms (e.g., Prometheus/Grafana, ELK/EFK stack, proprietary systems, or vendor solutions like Datadog/Splunk) for proactive issue identification
Networking: Strong understanding of core networking concepts (TCP/IP, DNS, Load Balancing, Firewalls, Proxies) sufficient to debug complex service connectivity and latency issues
Automation of Security Controls: Experience implementing security best practices via code, such as automated vulnerability scanning, configuration hardening, secret management (e.g., HashiCorp Vault), and key rotation
Identity and Access Management (IAM): Practical experience managing large-scale IAM systems (e.g., implementing least-privilege policies, single sign-on)
Incident Management: Experience running or significantly contributing to post-incident reviews (post-mortems) and prioritizing resulting engineering work (error budget management)

Benefits

Flexible, comprehensive employee benefits package.
You will receive 23 days of PTO per calendar year (on a pro-rated basis depending on your employment start date), with additional time provided for sickness, life events and holidays.
We offer an exciting 401(k) plan that has a 4% match, fully vested on day one of participation.
Generous discretionary Company Growth Bonus, paid annually.
Fully paid parking in the heart of downtown Austin, Texas.
Monthly catered events and team events.

Company

Sonar provides open-source and commercial code analyzers to help developers manage code quality.

Funding

Current Stage: Late Stage
Total Funding: $457M
Key Investors: Insight Partners
2022-04-26 · Series B · $412M
2016-11-29 · Series Unknown · $45M

Leadership Team

Nathan Jones
VP, Public Sector
Lynne Doherty
President Field Operations
Company data provided by Crunchbase