Blitzy · 2 days ago
Principal Engineer (Backend / Platform)
Blitzy is a rapidly scaling Generative AI start-up based in Cambridge, MA. They are looking for a Principal Engineer to take full ownership of critical production-grade systems, delivering high-leverage features that enhance customer outcomes and engineering velocity.
Artificial Intelligence (AI) · Software
Responsibilities
You independently own mission-critical production systems end-to-end
You ship high-leverage features that materially improve reliability, performance, and customer value
You raise the technical bar through working code and outcomes, not reviews alone
You identify and resolve the hardest technical bottlenecks limiting quality or velocity
You make technical decisions with long-term impact and stand behind them in production
You are a self-directed learner requiring minimal supervision
You are dedicated to getting things done right without making technical compromises
You demonstrate strong technical expertise across the full stack: frontend, backend, infrastructure, and AI systems
You materially increase the effectiveness of the engineering organization through example and judgment
End-to-end ownership of production-grade backend systems used by enterprise customers
Correctness, scalability, performance, and operational reliability in real environments
Delivery and stewardship of high-impact features, not just incremental improvements
Building and improving Blitzy using Blitzy, holding the platform to real-world standards
Designing, operating, and evolving LLM validation loops to ensure correctness, consistency, and durability in production
Setting engineering quality and durability expectations across teams without direct reports
Qualifications
Required
Strong, hands-on experience across the full technology stack
Deep expertise in Python (primary language)
Experience with backend frameworks and microservices architectures
Experience with API-based services (REST, gRPC)
Knowledge of Node.js and JavaScript (supporting languages)
Expertise in GCP (mandatory) plus at least one of AWS or Azure
Expert-level Kubernetes experience for container orchestration in production environments
Experience with Terraform for infrastructure as code
Experience with production systems: monitoring, reliability, scalability, and operational tooling
Strong expertise across both SQL and NoSQL databases
Experience with SQL: PostgreSQL, MySQL, or similar relational databases
Experience with NoSQL: MongoDB, Cassandra, DynamoDB, or similar
Experience with graph databases (e.g., Neo4j) for modeling and searching complex relationships
Experience with vector databases / embeddings infrastructure for semantic search and retrieval
Experience building and operating LLM-powered systems in production
Experience designing and implementing LLM validation loops (evaluation, feedback, regression testing, and failure analysis)
Experience with LangSmith (or equivalent tooling) for tracing, evaluation, and debugging
Familiarity with specific models or vendors (e.g., OpenAI, Anthropic) is a plus
Strong understanding of frontend frameworks and architecture
Ability to evaluate and contribute to frontend technical decisions
Demonstrated technical expertise across the full stack — you can debug, design, and ship across frontend, backend, infrastructure, and AI layers
Core understanding of how large-scale enterprise systems are designed, integrated, deployed, and operated
Experience working within — or modernizing — complex, long-lived production environments
Preferred
Prior experience operating at Principal Engineer / Staff+ scope at a highly technical company
Deep understanding of how enterprise software systems are built and evolved at scale, including tradeoffs around reliability, change management, and long-term maintainability
Clear ownership of complex systems that you designed, built, and supported in production
Hands-on ownership of LLM validation loops used to ensure correctness and durability in real environments
Experience designing systems that combine graph, vector, and traditional data models to support reasoning over complex domains
Proven ability to deliver outsized impact through the right features, not just more code
Track record of being a self-learner who requires minimal supervision and gets things done without compromising on quality
Benefits
Paid medical, dental, and vision insurance for you and your dependents
4% 401(k) match
Flexible vacation days, sick days, and work-from-home days
Technology equipment and/or allowance (hardware, software, reading materials, etc.)
Unlimited snacks, fizzy water, coffee, espresso, and whatever else you need to perform at your best