QBench · 13 hours ago

Senior DevOps Engineer

United States

Full-time

Remote

Mid, Senior Level

$100K/yr - $118K/yr

5+ years exp

QBench is a fast-growing, fully remote SaaS company that powers modern laboratory operations through its cloud-based LIMS platform. The Senior DevOps Engineer will ensure the reliability, availability, and performance of production systems, applying software engineering and systems thinking to infrastructure and operations with a focus on production stability and incident reduction.

Enterprise Software · Software · Management Information Systems

Culture & Values

No H1B

Responsibilities

Own the reliability, availability, and performance of production systems

Lead incident response for production issues, including coordination, mitigation, and communication

Conduct blameless postmortems and ensure follow-up actions are completed

Proactively identify reliability risks and eliminate single points of failure

Define reliability standards and operational practices that enable teams to ship safely

Design, provision, and manage AWS infrastructure using Infrastructure as Code (IaC) tools such as AWS CDK and AWS SAM (and/or Terraform)

Maintain clean, version-controlled, and auditable infrastructure across all environments

Manage and improve core AWS services, including ECS Fargate, Elastic Beanstalk, AWS Lambda, RDS Aurora MySQL, S3, VPC, IAM, and networking components

Lead infrastructure modernization initiatives, including re-architecting services for reliability and scalability

Own and continuously improve CI/CD pipelines to ensure fast, reliable, and secure builds, tests, and deployments

Standardize deployment workflows across services and environments (dev/staging/prod), including safe rollout strategies and automated rollback where appropriate

Implement quality and security gates in the delivery pipeline (unit/integration tests, linting, dependency scanning, IaC checks)

Improve deployment observability by integrating release markers, build metadata, and change tracking into monitoring and alerting

Partner with Engineering to reduce friction in the delivery process and improve developer velocity while maintaining reliability standards

Perform capacity planning and load analysis to ensure systems scale predictably

Optimize AWS resource usage for performance and cost efficiency

Tune and optimize MongoDB Atlas and RDS Aurora MySQL for throughput, latency, and reliability

Identify and remediate performance bottlenecks across the stack, including application-level configuration and runtime tuning

Lead and execute infrastructure and platform migrations, including migrating workloads from Elastic Beanstalk to ECS Fargate and migrating AWS resources to more resilient architectures

Perform server, runtime, and dependency upgrades, including Python version upgrades, library and dependency updates, and server platform upgrades (EC2, RDS, etc.)

Design and execute changes using safe rollout strategies, monitoring, and rollback plans

Build and maintain strong observability using metrics, logs, dashboards, and alerts

Ensure alerting is actionable and aligned with user-impacting symptoms

Reduce alert fatigue by continuously tuning thresholds and signals

Identify and eliminate sources of operational toil through automation

Perform operational tasks supporting SOC 2 Security and Availability controls

Manage secrets and credentials using AWS Secrets Manager and AWS Systems Manager Parameter Store

Rotate credentials and enforce least-privilege access across infrastructure

Partner with Operations to support audits, evidence collection, and remediation

Ensure production changes follow auditable and repeatable processes

Automate manual and repetitive operational tasks

Improve deployment safety and reliability through tooling and process improvements

Document runbooks, operational procedures, and reliability standards

Act as a DevOps and infrastructure advisor to Engineering teams during design and implementation

Qualifications

AWS · Infrastructure as Code · MySQL / Aurora · Incident response · SOC 2 · Service reliability metrics · Observability platforms · Python · Documentation skills

Required

5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles

Must be located in the U.S.

Strong hands-on experience with AWS, including ECS, Lambda, RDS, VPC, and IAM

Experience defining and operating service reliability metrics (SLIs/SLOs) and using them to drive operational improvements

Experience using Infrastructure as Code (AWS CDK, AWS SAM, Terraform, or similar)

Experience managing and optimizing MySQL / Aurora in production

Strong incident response and postmortem experience

Solid understanding of cloud security and secrets management

Experience operating systems in SOC 2–aligned environments

Excellent written communication and documentation skills

Comfortable operating independently in a remote, async-first company

Must be a legal resident of the United States and reside in the United States

Preferred

Experience migrating workloads from Elastic Beanstalk to ECS Fargate

Experience building or operating observability platforms

Familiarity with SOC 2 tooling (Drata, Vanta, Secureframe, or similar)

Experience supporting high-availability SaaS platforms

Background in Python application infrastructure

Benefits

Fully remote, US-based role

High ownership and autonomy

Direct impact on production reliability and customer trust

Competitive compensation and benefits

Opportunity to shape DevOps practices as the company scales

Company

QBench

QBench is a cloud-based Laboratory Information Management System (LIMS) that enables laboratories to streamline their operations.

Founded in 2015

Newark, Delaware, USA

11-50 employees

https://qbench.com

Funding

Current Stage

Early Stage

Leadership Team

Nicholas Evans

CEO

Trevor Ewen

COO

Recent News

EIN Presswire

The Top Laboratory Information Management Software According to the FeaturedCustomers 2025 Customer Success Report

2025-06-24

Company data provided by Crunchbase