Entrust · 5 hours ago
Senior Site Reliability Engineer
Entrust is an industry leader in identity-centric security solutions, serving over 150 countries with cutting-edge technologies. The Senior Site Reliability Engineer will ensure the reliability, availability, and performance of the SaaS platform while managing cloud environments and applications and resolving issues.
Enterprise Software · Fraud Detection · Information Technology · Internet · Security · Software
Responsibilities
Own SLOs/SLIs for availability (99.9%), latency, error rate, and quality of service across microservices
Design/operate end‑to‑end observability: metrics, logs, traces, synthetic checks, real‑user monitoring (RUM)
Instrument services (Windows services, APIs, background jobs) with structured logs and trace context
Build health probes and SLA monitors for critical transactions and cross-service dependencies
Monitor system issues using various metrics, such as uptime, latency, error rate, throughput, and availability
Deploy and maintain monitoring and on-call tools, e.g., Splunk On-Call, Prometheus, Datadog, etc.
Lead incident response (triage, comms, coordination, real-time mitigation) and conduct blameless postmortems with actionable follow-ups
Maintain and continuously improve runbooks, escalation paths, on-call rotations, and paging policies
Implement MTTA/MTTR reduction programs
Stand up war room protocols and ensure stakeholder updates during incidents
Forecast compute, storage, and network needs; track headroom against growth and peak patterns
Conduct performance profiling and bottleneck analyses (CPU, memory, I/O, thread pools, connection pools)
Optimize resource allocation on VMware (DRS, affinity rules, reservations) and Windows VM tuning (kernel, TCP stack, NICs)
Validate scaling strategies (horizontal vs. vertical) and implement auto-scaling where supported
Standardize gold images, configuration baselines, and desired state for Windows Server (PowerShell DSC or equivalent)
Manage patching (OS, middleware, runtime) with maintenance windows aligned to error budgets
Ensure backup, snapshot, and restore strategies meet RPO/RTO; regularly test restores
Maintain secure baselines (CIS benchmarks for Windows/VMware), vulnerability management, and patch cadence
Support compliance audits (PCI-CP, PCI-DSS, SOC 2/ISO 27001), produce evidence (configs, logs, access reviews), and remediate gaps
Automate provisioning (VM templates, DSC/Ansible for Windows, Terraform for VMware) and configuration drift detection/correction
Build runbooks to reduce toil (deploy, scale, rollback, etc.)
Create reliability guardrails (pre‑flight checks, change freeze rules, policy controls) as code
Continuously refactor scripts/runbooks into idempotent automation
Collaborate with development teams and other stakeholders to identify potential risks, such as security vulnerabilities, performance bottlenecks, deployment issues, or configuration errors
Implement various risk mitigation strategies, such as patching, backup, redundancy, encryption, or testing
Collaborate with product and other teams to understand user needs, expectations, and satisfaction
Coach engineers on SRE principles, incident handling, and reliability-centric design
Lead knowledge sharing, runbook quality, and postmortem culture (blameless, action-oriented)
Provide after-hours support for production issues on a rotational basis with other team members to ensure system availability 24/7/365
Qualifications
Required
Bachelor's degree in Computer Science, Software Engineering, or equivalent combination of education and experience
5+ years of related experience as a Software Engineer, DevOps Engineer, Site Reliability Engineer, or in a similar role
Extensive experience working with enterprise-level microservices applications, including deployment and maintenance of the applications in distributed environments
Demonstrated hands-on experience and expertise with DevOps tooling (Ansible, Terraform, Jenkins, Octopus Deploy, etc.), networks, and network security; high-level managerial skills
In-depth hands-on experience with on-prem and cloud compute, storage, and networking solutions (VMware, NetApp, Azure, AWS, etc.)
Benefits
Comprehensive health and well-being programs which include medical, vision, dental
A generous 401(k) matching contribution
Life and disability insurance
Mental health coaching
Virtual fitness programs
Paid personal time off plus 12 paid holidays
Parental leave
Education reimbursement
Company
Entrust
Entrust offers identity-based security software and services.
H1B Sponsorship
Entrust has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (3)
2023 (3)
2022 (3)
2021 (6)
Funding
Current Stage: Public Company
Total Funding: unknown
2013-12-17: Acquired
1998-08-18: IPO
Company data provided by Crunchbase