Axiomatic_AI · 4 hours ago

Senior Platform Engineer

United States

Full-time

Remote

Senior Level

7+ years exp

Axiomatic AI is building a new class of AI systems designed to reason with the rigor of the scientific method. As a Senior Platform Engineer, you will own the reliability, deployment, and operational excellence of our AI platform, focusing on infrastructure, CI/CD, and operations.

Computer Software

Responsibilities

Lead deployment strategies and CI/CD pipelines across multiple environments

Architect and maintain multi-cloud infrastructure (Azure, AWS, GCP) and on-premise deployments

Own infrastructure as code using Terraform to automate provisioning and configuration

Build comprehensive observability systems: monitoring, metrics, logging, and alerting

Implement security controls, compliance frameworks, and data governance policies

Develop automation tools, APIs, and scripts (Python) to improve operational efficiency

Ensure system reliability, performance, and scalability

Drive incident response, postmortems, and continuous improvement

Troubleshoot infrastructure and application issues across multiple environments

Design and implement deployment pipelines for multi-environment releases (dev, staging, production)

Own the full deployment lifecycle: build, test, release, and rollback strategies

Implement blue-green deployments, canary releases, and progressive rollouts

Build automated deployment tooling and workflows

Ensure zero-downtime deployments and rollback capabilities

Optimize build and deployment performance

Manage artifact repositories and container registries

Design and operate multi-cloud infrastructure across Azure, AWS, and GCP

Architect and deploy on-premise solutions for enterprise customers (Linux-based)

Manage Kubernetes clusters, container orchestration, and networking

Implement disaster recovery, backup strategies, and business continuity

Optimize cloud costs and resource utilization

Define and track SLIs, SLOs, and error budgets for critical services

Write and maintain Terraform modules for infrastructure provisioning

Implement GitOps workflows for infrastructure changes

Automate infrastructure scaling, updates, and operations

Ensure reproducible and version-controlled infrastructure

Design comprehensive monitoring, logging, and alerting (Prometheus, Grafana, Datadog, or similar)

Build dashboards for system health, performance, and business metrics

Implement distributed tracing for microservices

Conduct capacity planning and performance analysis

Drive reliability improvements through data-driven insights

Implement security best practices: identity management, secrets management, network policies

Work towards or maintain security certifications (SOC 2, ISO 27001, or similar)

Conduct security audits and vulnerability remediation

Implement data governance policies for AI pipelines and user data

Ensure compliance with data privacy regulations (GDPR, CCPA)

Write automation scripts and tools in Python for operational tasks

Build internal tooling for deployments, monitoring, and incident response

Develop runbooks, automation, and self-healing systems

Create APIs for infrastructure operations when needed

Maintain high code quality and testing standards for tooling

Participate in on-call rotation and lead incident response

Conduct blameless postmortems and drive action items

Build and maintain incident response playbooks

Improve system resilience and failure modes

Partner with engineering teams on deployment strategies and architecture

Work with security team on compliance and governance

Mentor engineers on operational best practices

Document systems, procedures, and runbooks

Qualifications

CI/CD pipelines, Multi-cloud infrastructure, Terraform, Security controls, Observability systems, Python, Kubernetes, Linux administration, GitOps practices, Fluent in English, Problem-solving skills, Collaboration, Documentation

Required

7+ years of experience in Platform Engineering, Site Reliability Engineering, DevOps, or Infrastructure Engineering roles

Deployment expert: Deep experience with CI/CD pipelines, release strategies, and production deployments at scale

Multi-cloud expertise: Hands-on experience with Azure and AWS required (GCP is a plus)

On-premise deployment experience: Linux system administration, bare-metal provisioning, networking

Terraform expert: Deep experience writing and maintaining infrastructure as code

Observability systems: Proven track record building monitoring, alerting, and metrics platforms

Security mindset: Experience implementing security controls and best practices. Security certification preferred (CISSP, CEH, AWS/Azure Security Specialty, or similar)

Data governance: Understanding of data privacy, residency requirements, and governance frameworks

Backend/scripting skills: Python (preferred) or Go for automation, tooling, and operational scripts

Kubernetes and container orchestration in production

Strong Linux/Unix administration and scripting (Bash, Python)

CI/CD platforms: GitHub Actions, GitLab CI, Jenkins, or similar

Version control and GitOps practices

Strong problem-solving and debugging skills

Fluent in English (Spanish is a plus)

Preferred

Python proficiency for automation and internal tooling

Experience with cloud AI platforms (Vertex AI, Azure ML, AWS SageMaker)

Service mesh experience (Istio, Linkerd) or API gateways

Experience with GPU workloads and ML infrastructure

FinOps and cloud cost optimization

Compliance frameworks experience (SOC 2, ISO 27001, HIPAA, FedRAMP)

Database operations: PostgreSQL, Redis administration

Experience with FastAPI or similar frameworks for internal tools

Contributions to open-source infrastructure projects

Background in hardware or semiconductor industries

Company

Axiomatic_AI

Axiomatic_AI is preparing to launch with the aim of accelerating R&D through "Automated Interpretable Reasoning" (AIR), a verifiably truthful AI model built for reasoning in science and engineering.

Founded in 2024

Boston, MA, US

11-50 employees

https://www.axiomatic-ai.com/

Funding

Current Stage

Early Stage

Company data provided by Crunchbase