Traversal · 1 week ago
Principal Platform Architect
Traversal is an AI Site Reliability Engineer for the enterprise, trusted by major companies to manage complex production incidents. The Principal Platform Architect will set the technical direction for critical systems, working cross-functionally to design resilient, observable infrastructure that improves production incident management.
Artificial Intelligence (AI) · Software · Software Engineering
Responsibilities
Architecture & System Design: Lead the design of scalable, resilient infrastructure systems to power AI-driven root cause analysis and observability workflows
Architect for scale and complexity: Define and evolve the long-term architecture strategy for infrastructure and observability systems—ensuring they scale with growing AI workloads, customer complexity, and team size
Drive cross-functional technical alignment: Act as a key partner to product and engineering leadership—aligning on priorities, shaping the roadmap, and driving clarity across teams
Lead through ambiguity and depth: Tackle high-leverage, unscoped problems—bringing structure, clarity, and executable plans to ambiguous technical challenges
Set engineering standards: Establish and evangelize best practices across reliability, system design, code quality, and observability—raising the bar for engineering across the org
Influence organizational process: Partner with leadership to improve how technical decisions are made, how teams collaborate, and how we scale culture alongside systems
Mentor deeply and broadly: Uplevel Staff and Senior engineers across the company, not just on your immediate team—through pairing, feedback, and technical guidance
Shape team composition and hiring bar: Work with recruiting and leadership to define role expectations, calibrate interviews, and evaluate candidates for high-impact roles
Be a technical multiplier: Serve as a sounding board for critical architectural decisions, unlock velocity by unblocking teams, and help connect long-term vision with day-to-day execution
Qualifications
Required
10+ years of experience in backend, infrastructure, or platform engineering, with a strong emphasis on large-scale data systems
Proven expertise in designing, scaling, and operating high-throughput distributed systems for real-time or near real-time data processing
Demonstrated success owning complex infrastructure architecture end-to-end—from initial design through to deployment and long-term maintenance
Deep understanding of data pipeline design patterns (streaming and batch), storage systems, and consistency/performance tradeoffs at scale
Hands-on experience with technologies like Kafka, Flink, Spark, Postgres, S3, and modern observability stacks
Experience architecting for multi-tenant, hybrid, or on-prem environments
Strong systems thinking and debugging skills across infrastructure, networking, and data layers
Excellent communication and collaboration skills with a track record of driving alignment across technical and non-technical stakeholders
Comfortable working in high-velocity, ambiguous startup environments, with a bias toward action and pragmatism
Preferred
Experience making software systems observable using logs, metrics, and traces
Familiarity with Python-based ecosystems
Background in infrastructure for ML/AI or LLM-powered products
Experience provisioning and managing infrastructure using IaC tools (Terraform, Pulumi)
Contributions to open source or infrastructure tooling
Benefits
Health insurance
Startup equity
Flexible time off
Plenty of in-office snacks
Company
Traversal
Traversal is building the AI SRE for the enterprise.
Funding
Current Stage: Early Stage
Total Funding: $48M
Key Investors: Sequoia Capital, Kleiner Perkins
2025-06-20 · Seed
2025-06-18 · Series A · $48M
Company data provided by Crunchbase