MX · 7 hours ago
Senior Director - Cloud Platform Engineering
MX is an award-winning company that is helping create meaningful and lasting change in the financial industry. They are seeking a Senior Director to oversee all site, infrastructure, and cloud platform services, ensuring production reliability and leading large-scale migrations to AWS.
Banking · Finance · Financial Services · FinTech
Responsibilities
Personally own and execute the end-to-end data center exit and AWS migration, from discovery and planning through cutover, stabilization, and full decommissioning
Define migration waves, readiness gates, and cutover plans with explicit transition into steady-state ownership, avoiding temporary or parallel operating models
Own architectural decisions across AWS networking, compute, storage, security, and observability, ensuring designs are operable, supportable, and resilient post-migration
Establish and own the post-migration operating model for cloud infrastructure and platforms, explicitly tied to outcomes: clearly defined SLIs, SLOs, and error budgets for all Tier-1 and Tier-2 services; accountable owners for SLO attainment across SRE, platform, and product teams; on-call and escalation models that provide durable time-zone coverage; ongoing cost-efficiency and optimization; incident response, change management, and release practices aligned to reliability targets; and a post-migration roadmap for platforms
Hold teams accountable for post-migration reliability metrics, including SLO compliance and error budget burn; Sev 0 / Sev 1 incident frequency and customer impact; and MTTR and incident recurrence rates
Ensure migration execution does not introduce long-term operational debt, and that workloads transition cleanly into measured, observable, and well-owned cloud operations
Lead physical and logical data center decommissioning only after post-migration SLOs are consistently met and incident KPIs have stabilized
Own the vision, roadmap, and execution for the company’s cloud platform, ensuring it supports both migration needs and long-term, steady-state operations on AWS
Own core platform capabilities and tooling strategy such as Kubernetes (EKS), CI/CD pipelines, infrastructure-as-code, identity and access management, secrets management, observability, and disaster recovery
Deliver self-service, opinionated platform services that improve developer productivity while meeting security and reliability standards
Modernize legacy systems and architect for multi-tenant SaaS: enable secure and efficient scaling across tenants in AWS, with attention to cost, compliance, and observability
Drive platform standardization to reduce fragmentation, operational toil, and cognitive load for product engineering teams
Partner closely with application and product engineering to ensure the platform accelerates delivery while maintaining reliability and compliance
Own and evolve the end-to-end incident management lifecycle for infrastructure and platform services, grounded in SRE principles of reliability, learning, and automation
Define and enforce SLIs, SLOs, and error budgets for platform and infrastructure services, using them to guide operational decisions, release risk, and incident response
Operate on a clear severity framework (Sev 0/1/2) with explicit ownership, escalation paths, and decision rights
Lead the transition from incident response as heroics to incident prevention by design, embedding reliability, AI, capacity planning, and failure-mode analysis into platform roadmaps and change processes
Serve as the executive escalation owner for Sev 0 and Sev 1 incidents, personally leading response, trade-off decisions, and executive communications when required, while delegating incident command to empowered leaders to ensure sustained coverage
Hold clear decision authority under pressure, including the ability to unilaterally halt or roll back changes, trigger failovers, traffic shifts, and disaster recovery actions, reallocate engineering resources in demanding situations, and make go/no-go cutover decisions to protect customers and data, escalating to executive leadership when actions materially impact regulatory posture, contractual commitments, or significant financial exposure
Build and maintain a US-based SRE and incident leadership bench, with multiple leaders capable of acting as Incident Commander, owning executive updates, and coordinating cross-functional response
Lead through error budgets and reliability signals to drive blameless postmortems, root-cause analysis, and prioritization of systemic fixes over short-term feature velocity
Own the systematic reduction of operational toil and capacity tax across infrastructure and platform teams, with clear accountability for ensuring reactive work declines as systems mature
Hold teams accountable to measurable toil and resilience KPIs, such as percentage of engineer time spent on reactive work, on-call interrupt frequency, manual intervention rates, and incident recurrence
Influence readiness through game days, chaos testing, and migration-specific drills, validating both technical resilience and delegation models under pressure
Ensure incident management tooling, observability (metrics, logs, traces), and documentation are standardized, well-owned, and continuously improved
Partner with product, engineering, security, enterprise architecture, and finance to shape cloud migration and platform decisions that directly impact cost-to-serve, unit economics, and operational overhead, ensuring infrastructure choices scale sustainably with business growth
Drive architectural and platform standards that reduce total cost of ownership, including infrastructure spend, support burden, reliability overhead, and on-call load
Embed FinOps and Reliability signals (utilization, reliability cost, incident-driven spend, operational toil) into platform roadmaps and migration sequencing, making trade-offs explicit between performance, resilience, speed, and cost
Translate infrastructure and platform choices into clear business outcomes such as per-customer cost, per-transaction cost, and support effort, enabling executives to make informed investment and prioritization decisions
Act as a trusted advisor on infrastructure and cloud strategy, challenging assumptions and translating complex technical risks into clear business impact, options, and trade-offs to enable informed decision-making under pressure
Build and delegate clear ownership and accountability for cloud migration timelines, risks, and outcomes
Establish clear governance, readiness reviews, and success metrics for migration and platform initiatives
Partner with and guide steering committees, technical working groups, and cross-organizational readiness forums
Own the design, scale, and effectiveness of the Cloud Platform Engineering organization, including SRE, cloud infrastructure, and platform engineering teams across geographies
Build and lead a strong leadership bench, developing senior managers, principal engineers, and architects who can operate independently at scale
Clearly define delegation, decision rights, and escalation paths so that critical incidents, migrations, and operational responsibilities are owned at the right level
Drive organizational clarity across charters, roles, responsibilities, and decision rights to reduce friction and increase delivery velocity
Actively recruit, retain, and develop top-tier infrastructure, SRE, and platform talent, including succession planning for critical roles
Establish a culture of engineering excellence, reliability, and continuous improvement, grounded in data, post-incident learning, and blameless accountability
Lead change management during periods of transformation, including data center exit, cloud migration, and operating model shifts
Foster strong partnerships with product, application engineering, security, and business leaders, ensuring platform teams are seen as strategic enablers rather than service providers
Champion diversity of thought, inclusive leadership, and high team engagement across a growing, global organization
Qualifications
Required
15+ years of experience in infrastructure, cloud, SRE, or platform engineering
7+ years leading large engineering organizations (managers of managers or equivalent)
Direct, hands-on leadership of at least one full data center exit and AWS migration, including decommissioning of on-premises infrastructure
Deep technical expertise in AWS, including VPC networking, EC2, EKS/Kubernetes, RDS/Aurora, S3, IAM, and observability tooling
Strong experience operating highly available, distributed systems using SRE principles
Proven ability to lead complex, high-risk infrastructure transformations in production environments
Expertise in FinOps and cloud cost optimization practices
Demonstrated ability to drive standards and adoption across distributed engineering teams without relying on reporting lines
Skilled at operating as a front-line executive leader during critical situations, including migrations, upgrades, DR, incidents, and major production events
Benefits
Company-paid meals
Massage therapists
A sports simulator
Gym
Mother’s lounge
Meditation room
Company
MX
MX Technologies, Inc. is a leader in actionable intelligence, enabling financial providers and consumers to do more with financial data.
H1B Sponsorship
MX has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data Powered by US Department of Labor)
Trends of Total Sponsorships
2025 (10)
2024 (4)
2023 (4)
2022 (11)
2021 (11)
2020 (6)
Funding
Current Stage: Late Stage
Total Funding: $450M
Key Investors: TPG, Battery Ventures, USAA
2021-01-13 · Series C · $300M
2019-06-25 · Series B · $100M
2015-04-30 · Series A · $30M
Company data provided by Crunchbase