Site Reliability Engineer (Onsite/Hybrid) jobs in United States

Phenom · 1 week ago

Site Reliability Engineer (Onsite/Hybrid)

Phenom is an AI-powered talent experience platform that is redefining the HR tech space. As a Site Reliability Engineer (SRE), you will ensure the reliability, performance, and operational excellence of the platform while collaborating closely with Engineering, Product, and Platform teams.

Artificial Intelligence (AI) · Human Resources · Machine Learning · Recruiting · SaaS · Software
Growth Opportunities
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Ensure high availability and performance across Phenom’s cloud-native infrastructure, services, and databases, with ownership or working knowledge of SLIs/SLOs/SLAs, error budgets, and reporting
Integrate security into CI/CD, scan for vulnerabilities, and handle incidents involving security breaches
Apply auto-scaling, load balancing, and capacity planning methodologies
Use GitOps workflows for automated infrastructure and configuration management
Collaborate with platform engineering on developer enablement and toolchain automation
Implement and support infrastructure, application, and database changes following governance policies and ServiceNow-based Change workflows
Serve as a key technical responder during Major Incidents, collaborating with cross-functional teams to rapidly restore service, communicate status, and drive post-incident actions
Contribute to Root Cause Analysis (RCA) efforts and provide technical input for corrective and preventive actions (CAPAs)
Actively support and execute production deployments, ensuring readiness, rollback planning, and validation during releases and patches, including database schema/version changes

Qualifications

CloudOps · DevOps · SRE · Kubernetes · AWS · Python · CI/CD · Linux Administration · Change Management · Soft Skills

Required

5+ years of experience in Cloud Ops/DevOps/SRE/Software engineering with hands-on responsibility for production systems
Proficient in one or more programming/scripting languages (e.g., Python, JavaScript/TypeScript, Java)
Hands-on expertise in cloud compute, networking, and storage
Tooling expertise in Kubernetes, ArgoCD, Helm, and Linkerd/Istio/Nginx
Experience with public cloud platforms (AWS, GCP, or Azure)
Experience with Kafka, Redis, MongoDB, and relational databases (e.g., PostgreSQL, MySQL, or Aurora)
Strong understanding of production Change Management processes and use of ServiceNow for change execution and tracking
Proven experience supporting and executing production deployments in structured release environments, including database updates
Familiarity with observability tooling and best practices for monitoring and diagnostics of both applications and databases
Experience with CI/CD, container orchestration, and Infrastructure as Code
Solid Linux system administration and troubleshooting skills
US 'Secret' clearance will be required (Must be a US Citizen)

Preferred

Familiarity with SaaS platforms
Prior experience handling federal requirements such as FIPS, FedRAMP, and FISMA
Experience participating in release readiness reviews, Go/No-Go meetings, and Early Life Support (ELS)
Exposure to structured RCA methodologies and configuration item tracking in a CMDB
Understanding of ITSM practices and service lifecycle principles
Prior experience as a Database Reliability Engineer (DBRE) or supporting mission-critical databases at scale

Company

Phenom is an HR technology company that uses AI to help businesses hire, develop, and retain employees.

Funding

Current Stage: Late Stage
Total Funding: $161.42M
Key Investors: B Capital, WestBridge Capital, AVP
2021-04-07 · Series D · $100M
2020-01-16 · Series C · $30M
2018-05-24 · Series B · $22M

Leadership Team

Mahe Bayireddi
Chief Executive Officer & Co-Founder
Brad Goldoor
Chief Employee Experience Officer & Co-Founder
Company data provided by Crunchbase