Principal Engineer, Operational Excellence & Resilience (Remote) jobs in United States

CrowdStrike · 5 days ago

Principal Engineer, Operational Excellence & Resilience (Remote)

CrowdStrike is a global leader in cybersecurity, dedicated to stopping breaches with its advanced AI-native platform. The Technology Resilience Principal Engineer will lead the Technology Resilience function, driving the strategy and execution of resilience practices across CrowdStrike's technology stack to ensure service reliability and rapid recovery capabilities.

Artificial Intelligence (AI), Cloud Data Services, Cloud Security, Cyber Security, Network Security
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Cross-Organizational Coordination: Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
Enterprise Standards & Governance: Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
Technology Resilience Strategy: Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
Disaster Recovery Leadership: Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
Product Performance and Scalability: Partner to define and implement resilience standards, including feature flagging, release, testing, multi-tenancy frameworks, and scalability frameworks to manage growth
Risk Oversight & Metrics: Provide technical oversight and aggregation of technology resilience risks across the enterprise, establishing and monitoring key performance indicators including system uptime
Resilience Engineering Leadership: Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
Shared Tooling Strategy: Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
Stakeholder Engagement: Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
Crisis Leadership: Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
Innovation & Continuous Improvement: Drive innovation in resilience practices, identifying emerging technologies and methodologies to advance CrowdStrike's competitive resilience advantage
Mentorship & Knowledge Transfer: Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices

Qualifications

Technology resilience, Disaster recovery, Chaos engineering, Cloud-native environments, Infrastructure redundancy, Application resilience, Metrics & analytics, Advanced certifications, Communication, Problem solving, Collaboration, Mentorship

Required

10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO objectives, and deployment success metrics
Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
Advanced analytical and conceptual thinking abilities, with a proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience
Ability to provide leadership support during crisis events, including nights and weekends when required

Preferred

Experience leading technology resilience functions in high-growth, cloud-native technology companies
Advanced knowledge of chaos engineering tools and practices (Chaos Monkey, Litmus, Gremlin, etc.)
Experience with modern resilience patterns including circuit breakers, bulkheads, and progressive delivery
Background spanning infrastructure operations, site reliability engineering, and product engineering
Experience with observability and monitoring platforms supporting resilience objectives
Advanced data analytics and visualization experience for resilience metrics and reporting
Deep knowledge of compliance frameworks (ISO 27001, ISO 22301, SOC 2, NIST, FedRAMP) and their intersection with technical resilience
Experience scaling resilience programs and building high-performing resilience engineering teams

Benefits

Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe
Health insurance
401k
Paid time off

Company

CrowdStrike

CrowdStrike is a cybersecurity technology firm that provides cloud-delivered protection for cloud workloads, identity, and data.

H1B Sponsorship

CrowdStrike has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (116)
2024 (62)
2023 (91)
2022 (60)
2021 (49)
2020 (22)

Funding

Current Stage
Public Company
Total Funding
$1.24B
Key Investors
ARK Investment Management, Accel, CapitalG
2022-12-01: Post-IPO Equity · $4.6M
2021-01-12: Post-IPO Debt · $750M
2019-06-12: IPO

Leadership Team

George Kurtz
President / CEO & Founder
Zeki Turedi
Field CTO Europe
Company data provided by Crunchbase