Staff Site Reliability Engineer @ Character.AI | Jobright.ai
91 applicants
Character.AI · 2 days ago

Staff Site Reliability Engineer

Apps · Artificial Intelligence (AI)
H1B Sponsorship

Responsibilities

Maintain production services and keep them operational.
Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of the service.
Develop, implement, and maintain automation tools and processes to prevent and mitigate service disruptions.
Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.
Establish and support SLAs and SLOs for the site.
Provide system monitoring and incident alerts.
Participate in on-call rotations to provide support for critical incidents and outages.
Develop plans for site reliability and disaster recovery.

Qualifications

Python, Golang, SQL, Linux, CI/CD, Kubernetes, Terraform, GCP, Incident Management, GPU clusters, HPC environments, Monitoring tools, Logging tools, Prometheus, Grafana, Consumer product, Hypergrowth, Hands-on

Required

5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale
Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang
Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site/application within a large multi-node infrastructure and a growing user base.
Experience working with multiple cloud computing platforms such as GCP is also a must
Demonstrated ability to troubleshoot technical issues and challenges successfully and reliably across a range of platforms and systems
Experience with incident management and event postmortems

Preferred

Familiarity with GPU clusters and/or HPC environments is preferred
Experience with monitoring and logging tools such as Prometheus and Grafana
Hands-on experience scaling a consumer product from early days into hypergrowth

Company

Character.AI

Character.AI provides open-ended conversational applications in which users create characters and converse with them.

H1B Sponsorship

Character.AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (5)
2022 (2)
2021 (1)

Funding

Current Stage: Early Stage
Total Funding: $150M
Key Investors: Andreessen Horowitz
2023-03-23 · Series A · $150M
2023-01-01 · Seed · Undisclosed

Leadership Team

Daniel De Freitas
Founder/President
Company data provided by Crunchbase