bedrock · 21 hours ago
Cloud - Site Reliability Engineer
Bedrock Robotics is a pioneering company focused on bringing advanced autonomy to the built world. They are seeking an experienced Site Reliability Engineer to own and evolve their cloud infrastructure, ensuring scalability, operational excellence, and system reliability.
Construction · Real Estate · Software
Responsibilities
Design, build, and operate highly scalable, reliable systems used by all Bedrock engineering teams
Take full ownership of Bedrock’s cloud infrastructure (AWS, GCP, Azure), ensuring best-in-class security, performance, and cost efficiency
Design, implement, and maintain Bedrock’s end-to-end observability stack (including monitoring, logging, and tracing)
Develop and implement best practices for system reliability, security, on-call rotation, and effective incident response
Continuously identify and implement improvements to enhance system performance and optimize cloud resource consumption
Qualifications
Required
A deep passion for building and maintaining reliable, fault-tolerant distributed systems
Strong proficiency in major cloud platforms (such as AWS, GCP, or Azure) and Infrastructure as Code (IaC) tools like Terraform
Proven experience with container technologies and orchestration platforms, particularly Kubernetes
Hands-on experience with observability tools and techniques (e.g., Datadog, Prometheus, Splunk)
Strong understanding of distributed systems, networking concepts, database technologies, and compute infrastructure
Strong understanding of, and experience implementing, security best practices in cloud environments
Ability to work in a fast-paced, high-growth environment, deal effectively with ambiguity, and take decisive ownership of challenging problems