Elastic · 8 hours ago
Platform - Site Reliability Engineer II
Analytics, Cloud Computing
Responsibilities
Lead technical initiatives aimed at improving the reliability of the global Elastic infrastructure, taking an engineering approach to the prevention, detection, and timely mitigation of issues.
Contribute to SRE engineering through auto-remediation and systems engineering work that further reduces the human intervention needed in processes and operational tasks.
Develop and maintain software, tooling, and automation to support the ever-growing scaling demands of this global infrastructure.
Champion an environment focused on collaboration, operational excellence, and uplifting others.
Respond to major incidents, correcting and improving systems to prevent incidents and grow at scale. Participate in a weekly on-call rotation, using a follow-the-sun model.
Qualifications
Required
A well-rounded view of and true appreciation for reliability, born of real-world experience operating production services.
Examples of using software engineering practices and SRE principles to solve operational problems.
A background in software engineering and the confidence to collaborate with engineers to identify and resolve issues.
Outstanding interpersonal skills and the ability to build strong relationships through inclusive communication.
Examples of working in distributed teams or working remotely.
Preferred
You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
You have built or managed a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.
You have written non-trivial programs in Go.
You have worked with containerized services (such as Docker).
You have experience in system administration with professional skills in Linux on distributed systems at scale.
You have designed, implemented or diagnosed and resolved issues with the Elastic Stack.
You have demonstrable experience in leading and improving alerting and major incident management standard processes, and in using metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impacts to share with others at varying levels of the organization.
You are experienced in contributing to a self-organizing and collaborative team environment.
You have mentored, coached, and grown team members to bring out the best in them.
Benefits
Health coverage for you and your family in many locations
Generous number of vacation days each year
We match up to $2000 (or local currency equivalent) for financial donations and service
Up to 40 hours each year to use toward volunteer projects you love
Minimum of 16 weeks of parental leave
Company
Elastic
Elastic builds software to make data usable in real time and at scale for search, logging, security, and analytics use cases.
Funding
Current Stage: Public Company
Total Funding: $162M
Key Investors: New Enterprise Associates, Index Ventures, Benchmark
2018-10-04: IPO
2018-05-03: Secondary Market
2016-07-01: Series D, $58M
Company data provided by Crunchbase