Charles Schwab · 16 hours ago
Principal Architect, Site Reliability Engineering
Charles Schwab is a financial services company that empowers employees to impact their careers. The Principal Architect, Site Reliability Engineering will be responsible for establishing a foundational SRE practice and partnering with various teams to ensure the reliability of mission-critical services.
Financial Services
Responsibilities
Evangelize SRE mindset and practices across the Schwab Technology Solutions organization
Partner with support, development, and business stakeholders to develop, measure, and leverage service level objectives
Design and develop solutions to eliminate toil and manual effort from day-to-day support responsibilities
Identify and implement improvements to logging, metrics, and tracing telemetry, as well as triaging capabilities, across a diverse technology stack
Lead complex triage and postmortem activities for critical issues and drive prioritization/resolution of remediation items
Perform chaos engineering experiments to improve application resilience to known and unknown failures
Document reliability guidance and best practices, and advocate for and drive their adoption
Foster a culture of learning through coaching, mentoring, and knowledge sharing around reliability practices, processes, and tools
Develop tools, frameworks, and instrumentation to validate and increase release success for applications
Qualifications
Required
Minimum 5 years in an SRE role, with at least 3 years in an architect or technical leadership position
At least 3 years of experience designing and implementing highly scalable and fault-tolerant systems
In-depth knowledge of resilience patterns (e.g., circuit breakers, timeouts, retries) and how to design and implement them
In-depth knowledge of CI/CD processes and tools to ensure software is delivered safely using known deployment strategies (e.g., blue/green, canary deployments, feature toggles)
Authored technical postmortems (at least weekly) with root cause analyses and documented action items that resulted in measurable resiliency improvements
Contributed to the SLO strategy for at least 5 teams, ensuring alignment with business and client objectives
Three or more years hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk), with a proven track record of setting up dashboards and alerts
Experience using the latest AI solutions to reduce repetitive operational toil
Led or participated in cross-functional SRE-focused initiatives that included key stakeholders from both technical and business units
Participated in resilience or chaos engineering exercises, with documentation showing a reduction in unplanned downtime
Presented findings or led training sessions to share SRE practices, enhancing team performance or adoption rates for reliability engineering methods
Mentored a group of junior SRE engineers or teams in SRE best practices, with improvements in incident resolution speed and reliability metrics
Authored and maintained comprehensive SRE documentation for critical systems or workflows, including incident response guides, runbooks, operational playbooks, SLO implementation, and observability
Benefits
Bonus or incentive opportunities
Company
Charles Schwab
We have plans for every turn you take.
H1B Sponsorship
Charles Schwab has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (579)
2024 (468)
2023 (455)
2022 (705)
2021 (483)
2020 (282)
Funding
Current Stage
Late Stage
Recent News
2025-10-04
Company data provided by Crunchbase