Autheo · 1 day ago

Senior Site Reliability Engineer / Cloud Engineer

United States

Part-time

Remote

Senior Level

7+ years exp

Autheo bridges Web3 blockchain technology with Web2 integration and is offering this role as a part-time Equity Cofounder position. The Senior Site Reliability Engineer / Cloud Engineer will design, build, and operate reliable cloud infrastructure for production blockchain services and Web3 applications, with a focus on high uptime and performance.

Computer Software

Responsibilities

Architect, deploy, and operate highly available AWS infrastructure optimized for blockchain workloads

Implement Infrastructure as Code (IaC) using Terraform for repeatable, auditable provisioning

Manage production container platforms (EKS, ECS, Kubernetes, Docker, ECR)

Operate and optimize EC2, S3, EBS/FSx, Lambda, and related services

Design VPCs, VPNs, subnets, security groups, routing, load balancers, and network isolation

Implement IAM, KMS, and Secrets Manager for identity, encryption, and key management

Apply scaling techniques for RPC endpoints (load balancing, caching, throttling) and manage public/private peer connectivity

Support and troubleshoot Amazon Linux, Oracle Linux, and Windows Server environments

Deploy, operate, and maintain blockchain nodes (full/archive/light clients) and RPC endpoints on EVM-compatible chains (Ethereum, Polygon, BNB Chain, etc.)

Optimize node performance, storage, networking, and containerization using Docker/Kubernetes

Monitor and troubleshoot blockchain health metrics (block height, peer count, sync status, logs, memory, throughput)

Support on-chain/off-chain interactions, transactions, gas fees, signing, wallets, smart contract invocations, and state queries

Troubleshoot blockchain errors (transaction failures, RPC timeouts, indexing lag, sync divergence)

Work with API gateways and middleware services (Infura, Alchemy, QuickNode, or equivalents)

Implement indexing for event logs, state, and transactions using tools like The Graph, ETL pipelines, custom services, or database-backed explorers

Implement Terraform, Helm, and GitOps workflows for infrastructure lifecycle management

Enforce resilient, automated, scalable design patterns and collaborate on faster, higher-quality deployments

Own availability, latency, performance, capacity, and SLOs/SLIs/SLAs with observability-driven insights

Lead on-call rotations, incident response for S1/S2 events, post-incident reviews, and preventive initiatives

Reduce operational toil through automation; build and own CI/CD pipelines (Jenkins, GitHub Actions), including Terraform validation, Docker builds, and Helm deployments

Instrument blockchain workloads for metrics, logs, traces, predictive signals, and anomaly detection using Datadog, Prometheus, Grafana, ELK, CloudWatch, OpenTelemetry, Wazuh

Build automated alerting, anomaly detection, diagnostics, and end-to-end observability strategies

Implement AIOps for event correlation, anomaly detection, predictive diagnostics, automated remediation, and self-healing (using AWS SageMaker, Bedrock, and other AI tools)

Drive security threat detection/prioritization, capacity planning, forecasting, cost control, and reporting

Enforce cloud security best practices, vulnerability remediation pipelines, and compliance guardrails (SOC2, PCI, ISO27000)

Manage cryptographic materials, KMS/HSM, and wallet abstractions (HD, custodial/non-custodial, multisig)

Qualifications

AWS, Blockchain infrastructure, Infrastructure as Code, Container orchestration, AIOps, CI/CD, Scripting, Observability systems, Reliability mindset, Collaboration, Mentoring

Required

7+ years in Cloud, SRE, Systems, or DevOps Engineering roles

5+ years operating production workloads on AWS

3+ years supporting blockchain infrastructure, nodes, Web3 applications, DeFi, etc.

Strong hands-on experience with AWS services (EC2, EKS, ECS, S3, RDS/Aurora, VPC/VPN, Route53, ALB/NLB, KMS, IAM, Secrets Manager, Lambda, EventBridge, CloudWatch, ECR)

Production experience with containers & Kubernetes

Proficiency with IaC (Terraform, Helm, AWS CDK) and automation/scripting (Python, Bash, or Go preferred)

Working experience with CI/CD (GitHub Actions, Jenkins, Argo, etc.)

Demonstrated experience with observability systems (Datadog, Prometheus, OpenTelemetry, ELK, CloudWatch, Wazuh)

Practical exposure to AIOps concepts (event correlation, predictive diagnostics, anomaly detection, automated response)

Experience supporting 24×7 on-call rotation for production services

Strong understanding of distributed systems, reliability patterns, and fault tolerance

Experience participating in major incident response and post-incident reviews

Preferred

AWS Certifications (Solutions Architect, DevOps Engineer, SysOps Administrator)

Deep experience with blockchain, Web3, or decentralized system operations

Proven SRE methodology experience, including automation, CI/CD, and IaC development

Experience in compliance-driven environments (SOC2, PCI, ISO27000)

Benefits

Equity in Launch Legends

Equity in Autheo

Token allocations in the Autheo blockchain

Company

Autheo

Autheo is a full stack Layer-0 Web3 Operating System designed to unify the modern digital stack into a single evolving network.

51-200 employees

https://autheo.com/

Funding

Current Stage

Growth Stage

Company data provided by Crunchbase