Autheo · 1 day ago
Senior Site Reliability Engineer / Cloud Engineer
Autheo works on bridging Web3 blockchain technology with Web2 systems and is offering a part-time Equity Cofounder role. The Senior Site Reliability Engineer / Cloud Engineer will design, build, and operate reliable cloud infrastructure for blockchain production services and Web3 applications, with a focus on high uptime and performance.
Computer Software
Responsibilities
Architect, deploy, and operate highly available AWS infrastructure optimized for blockchain workloads
Implement Infrastructure as Code (IaC) using Terraform for repeatable, auditable provisioning
Manage production container platforms (EKS, ECS, Kubernetes, Docker, ECR)
Operate and optimize EC2, S3, EBS/FSx, Lambda, and related services
Design VPCs, VPNs, subnets, security groups, routing, load balancers, and network isolation
Implement IAM, KMS, and Secrets Manager for identity, encryption, and key management
Apply scaling techniques for RPC endpoints (load balancing, caching, throttling) and manage public/private peer connectivity
Support and troubleshoot Amazon Linux, Oracle Linux, and Windows Server environments
Deploy, operate, and maintain blockchain nodes (full/archive/light clients) and RPC endpoints on EVM-compatible chains (Ethereum, Polygon, BNB Chain, etc.)
Optimize node performance, storage, networking, and containerization using Docker/Kubernetes
Monitor and troubleshoot blockchain health metrics (block height, peer count, sync status, logs, memory, throughput)
Support on-chain/off-chain interactions, transactions, gas fees, signing, wallets, smart contract invocations, and state queries
Troubleshoot blockchain errors (transaction failures, RPC timeouts, indexing lag, sync divergence)
Work with API gateways and middleware services (Infura, Alchemy, QuickNode equivalents)
Implement indexing for event logs, state, and transactions using tools like The Graph, ETL pipelines, custom services, or database-backed explorers
Implement Terraform, Helm, and GitOps workflows for infrastructure lifecycle management
Enforce resilient, automated, scalable design patterns and collaborate on faster, higher-quality deployments
Own availability, latency, performance, capacity, and SLOs/SLIs/SLAs, guided by observability-driven insights
Lead on-call rotations, incident response for S1/S2 events, post-incident reviews, and preventive initiatives
Reduce operational toil through automation; build and own CI/CD pipelines (Jenkins, GitHub Actions), including Terraform validation, Docker builds, and Helm deployments
Instrument blockchain workloads for metrics, logs, traces, predictive signals, and anomaly detection using Datadog, Prometheus, Grafana, ELK, CloudWatch, OpenTelemetry, Wazuh
Build automated alerting, anomaly detection, diagnostics, and end-to-end observability strategies
Implement AIOps for event correlation, anomaly detection, predictive diagnostics, automated remediation, and self-healing (using AWS SageMaker, Bedrock, and other AI tools)
Drive security threat detection/prioritization, capacity planning, forecasting, cost control, and reporting
Enforce cloud security best practices, vulnerability remediation pipelines, and compliance guardrails (SOC 2, PCI, ISO 27000)
Manage cryptographic materials, KMS/HSM, and wallet abstractions (HD, custodial/non-custodial, multisig)
Qualifications
Required
7+ years in Cloud, SRE, Systems, or DevOps Engineering roles
5+ years operating production workloads on AWS
3+ years supporting blockchain infrastructure, nodes, Web3 applications, DeFi, etc.
Strong hands-on experience with AWS services (EC2, EKS, ECS, S3, RDS/Aurora, VPC/VPN, Route53, ALB/NLB, KMS, IAM, Secrets Manager, Lambda, EventBridge, CloudWatch, ECR)
Production experience with containers & Kubernetes
Proficiency with IaC (Terraform, Helm, AWS CDK) and automation/scripting (Python, Bash, or Go preferred)
Working experience with CI/CD (GitHub Actions, Jenkins, Argo, etc.)
Demonstrated experience with observability systems (Datadog, Prometheus, OpenTelemetry, ELK, CloudWatch, Wazuh)
Practical exposure to AIOps concepts (event correlation, predictive diagnostics, anomaly detection, automated response)
Experience supporting a 24×7 on-call rotation for production services
Strong understanding of distributed systems, reliability patterns, and fault tolerance
Experience participating in major incident response and post-incident reviews
Preferred
AWS Certifications (Solutions Architect, DevOps Engineer, SysOps Administrator)
Deep experience with blockchain, Web3, or decentralized system operations
Proven SRE methodology experience, including automation, CI/CD, and IaC development
Experience in compliance-driven environments (SOC 2, PCI, ISO 27000)
Benefits
Equity in Launch Legends
Equity in Autheo
Token allocations in the Autheo blockchain
Company
Autheo
Autheo is a full stack Layer-0 Web3 Operating System designed to unify the modern digital stack into a single evolving network.
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase