Crisis Text Line · 5 hours ago
Staff Site Reliability Engineer
Information Technology · Messaging
H1B Sponsor Likely · U.S. Citizen Only
Responsibilities
Help lead and mentor a team of 5 SREs, fostering a collaborative and innovative work environment.
Work closely with the 3 staff in TechOps/Security to enforce security best practices across the infrastructure and development processes.
Design, implement, and maintain our highly available and scalable AWS infrastructure that powers our service.
Collaborate with developers to optimize application performance and reliability.
Develop and maintain monitoring, logging, and alerting systems to ensure system health and performance.
Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
Respond to and resolve incidents, minimizing downtime and ensuring quick recovery.
Support and encourage a diversity of backgrounds, voices, and perspectives on the engineering team.
Proactively communicate expectations, progress, and issues to engineers, product managers, and other colleagues with clarity and kindness, delivering and receiving feedback respectfully.
Spread knowledge, provide mentorship, and promote technical best practices.
Learn both independently and from your colleagues, stretch yourself, and grow as an engineer and teammate.
Write and review high-quality, easy-to-read, and testable code that follows best practices.
Manage time successfully by focusing on priorities, delivering on deadlines, and asking for help when stuck.
Provide engineering input and estimate work during both refinement and architecture design.
Participate in retrospectives and post-mortems to improve our processes and operations.
Conduct regular security audits and vulnerability assessments, addressing any identified issues.
Stay up-to-date with industry trends and emerging technologies, recommending and implementing improvements as needed.
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related field
Proven experience as a Staff SRE or in a similar SRE role, with experience in observability and a strong focus on infrastructure and DevOps in a software delivery capacity.
Experience maintaining the reliability of online SaaS/PaaS on a 24/7 schedule.
Proficiency in AWS and infrastructure as code (e.g., Terraform, CloudFormation).
Strong scripting and automation skills (e.g., Python) and in-depth knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
Proven experience in implementing CI/CD pipelines and tools (GitHub Actions) and observability tools (Datadog).
A commitment to ethical practices, data privacy, and security.
Solid understanding of network protocols, security principles, and best practices.
Excellent problem-solving skills and the ability to work under pressure.
Ability to learn quickly and manage your time successfully by focusing on priorities, delivering on deadlines, and asking for help when needed.
Strong communication skills, with the ability to collaborate effectively with cross-functional teams.
Demonstrates an understanding of essential computer science principles and how to apply them to solve problems, including basic data structures, control structures, and functions.
Preferred
Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Experience implementing Failure Injection / Chaos Engineering practices.
Cloud Solution Architect certifications or completed training (e.g., AWS Cloud Practitioner Essentials and/or AWS Certified Solutions Architect - Associate), or the GCP or Azure equivalents.
Strong experience with AWS Solution Architecture across Next.js, Go, PHP APIs, GraphQL, Databricks, and AI/ML workloads.
Knowledge of compliance and regulatory standards (e.g., GDPR, HIPAA, ISO 27001, SOC 2).
Experience in a non-profit or mission-driven organization.
Benefits
20 paid holidays, including federal holidays such as Juneteenth and Labor Day, Election Day, a holiday break from December 24 through January 1, 2 renewal days, and 2 floating holidays
Flexible paid time off, including 15 vacation days, 3 personal days, and 7 sick days
Medical, dental, and vision benefits for the staff member and family at no cost to the employee
403(b) retirement plan (the nonprofit equivalent of a 401(k)): 3% contribution by Crisis Text Line to support building financial wellness, regardless of personal contribution
12 weeks paid parental leave (after 6 months of employment)
Student loan repayment (after 2 years of continuous full time service)
Family support through a virtual childcare platform
Stipends/Allowances: Mental health (Monthly), Internet Service (Monthly), Professional Development (Annual), Wellness (Annual), Home office setup (One time/First year)
Company
Crisis Text Line
Crisis Text Line is free, 24/7 emotional support for those in crisis.
H1B Sponsorship
Crisis Text Line has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is shown below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2022 (3)
2021 (1)
Funding
Current Stage: Growth Stage
Total Funding: $30.8M
2016-06-15 · Series B · $23.8M
2015-10-08 · Series Unknown · $7M
Company data provided by Crunchbase