Multiscale AI
Sr. Platform Reliability Engineer
Multiscale AI provides advanced AI solutions for Semiconductor Manufacturing Process Optimization (SMPO). They are seeking a Senior Platform Reliability Engineer to design, build, and operate their next-gen infrastructure stack, focusing on standardizing deployments and enhancing observability.
Artificial Intelligence (AI) · Information Technology · Machine Learning
Responsibilities
Design and evolve infrastructure across Docker, Kubernetes, and VMs spanning Azure, GCP, AWS, and on-premises environments
Build and maintain Infrastructure as Code with Terraform; enforce repo standards (branch protection, conventional commits, PR reviews)
Own end-to-end CI/CD using Azure DevOps and GitHub Actions. Design secure, efficient container build pipelines (multi-stage builds, base image hardening, image scanning, etc.)
Manage container registries with lifecycle policies, vulnerability scanning, and image promotion workflows
Implement GitOps using ArgoCD to keep environments declarative, reproducible, and drift‑free
Establish and manage SLOs/SLIs, alerting, incident response, and post‑incident reviews that drive fixes and automation
Operate and improve observability: Prometheus and Grafana for metrics and dashboards; Loki or ELK for logs; Alertmanager with Microsoft Teams integration for alerting
Lead incident response, post-mortems, and drive automation to reduce MTTR and prevent recurrence
Strengthen security and compliance across the stack: secrets management, RBAC and network policies, and policy enforcement
Manage certificate lifecycle with certificate managers and Let's Encrypt; enforce mTLS where applicable
Conduct regular security reviews, vulnerability assessments, and compliance audits
Drive backup and disaster recovery for core services and data (e.g., database backups, cluster recovery playbooks) to reduce MTTR and improve resilience
Optimize cost and performance with capacity planning, right‑sizing, and usage visibility, in partnership with engineering and finance
Partner with platform developers and the delivery team to ensure smooth operations across internal environments and customer deployments
Qualifications
Required
6+ years in SRE/DevOps or infrastructure engineering with production ownership
Strong Linux ops experience (Ubuntu 22.04/24.04, RHEL 9.x) and solid networking fundamentals (TCP/IP, DNS, ingress, load balancing, TLS/SSL)
Deep experience with Kubernetes (managed or self-managed), Docker, and VM-based deployments in at least one major cloud (Azure, GCP, or AWS)
Proficiency with Terraform and Git-centric workflows (Bitbucket and/or GitHub)
Practical experience with CI/CD (Azure DevOps, GitHub Actions) and container security best practices
Hands-on experience with observability stacks (Prometheus, Grafana, Loki and/or ELK) and actionable alerting
Comfortable with incident management, SLO/SLI thinking, and joining an on-call rotation
Automation skills in one or more languages (Go, Python, Bash)
Preferred
GitOps (ArgoCD), policy enforcement (OPA Gatekeeper), secrets management (Sealed Secrets), and certificate management (Cert-Manager)
Container registry operations and image hygiene (Harbor, Trivy, multi-stage builds)
Long-term metrics storage (Thanos/Cortex)
Distributed tracing with OpenTelemetry pipelines
Experience with hybrid/on-prem environments and private connectivity patterns
Background supporting data-heavy or AI/ML platforms
Benefits
Competitive salary and equity options
Comprehensive health, dental, and vision coverage
Flexible paid time off
Opportunity to work with a cutting-edge team in AI and semiconductor technology
Company
Multiscale AI
Multiscale AI empowers semiconductor manufacturers to accelerate innovation with advanced AI solutions for Semiconductor Manufacturing Process Optimization (SMPO).
H1B Sponsorship
Multiscale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total sponsorships by year
2025: 7
2024: 3
Funding
Current Stage: Early Stage
Total Funding: $22.78M
Key Investors: Alumni Ventures, Micron Ventures
2024-09-17 · Series A · $11.09M
2022-10-04 · Seed · $7.42M
2022-03-22 · Convertible Note · $1.05M
Company data provided by Crunchbase.