Berkley Hunt · 3 hours ago
Site Reliability Engineer
Berkley Hunt is a fast-growing, VC-backed Series C company building globally distributed cloud infrastructure. They are seeking a Senior Site Reliability Engineer to take ownership of the systems that ensure platform reliability, scalability, and operational excellence.
Responsibilities
Design, build, and maintain highly available infrastructure using IaC (Terraform preferred)
Manage Kubernetes clusters and streamline deployments with Helm
Operate and troubleshoot Linux-based systems, ensuring security and stability
Build automation tools in Python or Go to improve efficiency and reduce manual work
Implement robust observability practices: monitoring, logging, and alerting
Participate in production support and a 24/7 on-call rotation
Collaborate across globally distributed engineering teams to shape architectural decisions
Qualifications
Required
5+ years of experience with cloud-native infrastructure, IaC, and Linux administration
Deep understanding of Kubernetes, Helm, and containerized environments
Hands-on experience designing scalable cloud solutions on GCP and AWS
Experience implementing observability solutions and automating operational workflows
Strong troubleshooting skills in networking, systems, and distributed environments
A proactive, ownership-driven mindset: you build it, you run it
Excellent communication skills and ability to collaborate across time zones
Preferred
Experience with Terraform for IaC