BlinkRx · 10 hours ago

Staff Site Reliability Engineer

United States

Full-time

Remote

Senior Level, Lead/Staff

7+ years of experience

Blink Health is a fast-growing healthcare technology company that builds products to make prescriptions accessible and affordable to everybody. The Staff Site Reliability Engineer will establish best practices, define observability strategies, and drive initiatives to enhance system reliability and performance within the organization.

Apps, E-Commerce, Health Care, Online Portals, Pharmaceutical

Responsibilities

Establish and evolve SRE best practices across the organization, including reliability principles, error budgets, incident response, postmortems, and operational readiness standards

Define and drive observability strategy for system health, performance, and reliability, including SLIs/SLOs, alerting quality, dashboards, and service health indicators

Design and implement software-driven solutions within the infrastructure domain, automating manual processes and eliminating operational complexity and toil

Act as a technical leader and force multiplier, helping set priorities and influencing decision-making across core cloud infrastructure, reliability tooling, and platform architecture

Take ownership of large, ambiguous initiatives, driving them from concept to delivery while aligning stakeholders across engineering, security, and product

Combine deep knowledge of software development, infrastructure, and security to improve platform resilience, scalability, performance, and compliance

Proactively identify systemic risks and reliability gaps, recommending and leading platform upgrades and architectural improvements before they become incidents

Partner with engineering teams to improve developer workflows, tooling, and operational maturity, increasing productivity while reducing cognitive load

Provide technical mentorship, architecture guidance, and high-quality design and code reviews for engineers across infrastructure and product teams

Lead by example in documentation and knowledge sharing, ensuring systems and processes are well-understood and not dependent on individual ownership

Participate in and help mature incident response, escalation practices, and post-incident learning across the organization

Qualifications

Site Reliability Engineering, Cloud Platforms (AWS), Kubernetes, Python, Infrastructure as Code, Agile Environment, Technical Mentorship, Documentation

Required

Bachelor's or Master's degree in Computer Science or equivalent practical experience

7+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles, with demonstrated impact at scale

Expert-level, methodical troubleshooting across the entire stack, from application to kernel to network

Strong command-line proficiency and deep expertise in Linux systems and operating system fundamentals

Advanced understanding of networking concepts including load balancing, proxies, DNS, TCP/IP, NAT, and service-to-service communication

Experience working across multiple languages (e.g., Python, Go, Bash), plus familiarity with troubleshooting application stacks such as React or similar

Strong track record of automating repetitive and complex operational work to reduce toil and increase reliability

Ability to design and build internal tools (Python or Go) that standardize and scale engineering practices

Comfortable operating in an agile environment, with disciplined testing and quality practices

Deep experience with cloud platforms (AWS preferred, GCP/Azure acceptable), particularly managed services and production-grade architectures

Strong expertise in Kubernetes and container orchestration (EKS, Helm), including lifecycle management and operational best practices

Proven experience designing and implementing observability systems, including metrics, logging, tracing, dashboards, and alerting

Deep understanding of container technologies, security scanning, secrets management, dynamic configuration, and microservices architectures

Familiarity with service meshes and advanced traffic management concepts

Experience designing and maintaining company-wide IaC codebases using tools such as Terraform, Pulumi, CloudFormation, or Ansible

Ability to think holistically about infrastructure design, cost, reliability, security, and long-term maintainability

Company

BlinkRx

Glassdoor: 3.0

BlinkRx is a prescription access platform that connects patients to branded medications, ensuring transparent pricing and home delivery.

Founded in 2014

New York, New York, USA

501-1000 employees

https://blinkhealth.com

Funding

Current Stage

Late Stage

Total Funding

$315M

Key Investors

1789 Capital, SuRo Capital, 8VC

2024-11-16 · Series D · $140M

2020-10-27 · Series Unknown · $10M

2017-04-12 · Series B · $90M

Leadership Team

Geoffrey Chaiken

Founder & CEO

Matthew Chaiken

Co-Founder

Recent News

PharmiWeb

NeuroVice Prescription Device for Seizure Oral Injury Protection Now Available at BlinkRx

2026-01-20

MarketScreener

Blink Health : Announces $165 Million in Financing to Combat High Drug Prices

2026-01-06

thefly.com

Scienture, BlinkRx announce strategic collaboration to expand access to Arbli

2025-12-11

Company data provided by Crunchbase