Shyld AI · 1 day ago
Senior Infrastructure Engineer
Shyld AI builds safety-critical robotics and perception systems deployed on real devices and is seeking a Senior Infrastructure Engineer to own its cloud and edge infrastructure foundations. The role covers CI/CD, deployments, device provisioning, runtime reliability, and SOC 2 compliance.
Artificial Intelligence (AI)
Responsibilities
Own and operate cloud infrastructure: compute, networking, storage, messaging, CI runners
Standardize environments with infrastructure-as-code, runbooks, and safer deploy practices
Build and maintain CI/CD and release pipelines for containerized services and device components
Manage deployments and runtime reliability (startup, recovery, watchdogs, rollbacks, staged rollouts)
Create and maintain integration test infrastructure (service-to-service and end-to-end CI)
Build device provisioning and automated setup for edge deployments
Own observability across backend and device fleet: logging, metrics, dashboards, alerting
Lead or strongly contribute to SOC 2 (Type I / Type II) readiness and ongoing compliance:
  Implement and maintain controls (access, change management, logging, incident response, vendor risk, encryption)
  Build auditable workflows and automation for evidence collection
  Ensure traceability for changes (approvals, release notes, rollbacks, audit trails)
Build and maintain secure device firmware deployment processes, including:
  Firmware/code signing (adding signatures, managing keys/certificates securely)
  Release integrity verification, staged rollouts, versioning, rollbacks, and auditability
  Collaboration with embedded/robotics teams to ensure safe and reliable update strategies
Implement secrets and authentication management (secure distribution, rotation, service auth)
Maintain strong access control and identity practices across cloud + edge (IAM/RBAC, OAuth/OIDC/JWT, mTLS as applicable)
Write monitoring SQL for operational health checks, anomaly detection, and reporting/dashboards
Develop automation and services in Python for operational workflows, observability, and tooling
Build and maintain internal/external APIs to support deployment orchestration, telemetry pipelines, and integrations
Qualifications
Required
4+ years in DevOps / SRE / Platform / Infrastructure Engineering with production ownership
Strong Linux, networking, and debugging skills across distributed systems
Deep Docker/container experience and CI/CD ownership
Cloud infrastructure experience (AWS/GCP/Azure), including IAM, networking, storage, compute
Observability experience (logs/metrics/tracing), dashboards, and alerting
Secrets management experience (Vault / cloud secret managers / KMS) and secure rotation practices
Authentication and identity knowledge: IAM/RBAC, OAuth/OIDC/JWT, mTLS
Experience building and maintaining integration test pipelines (service-to-service and end-to-end CI)
Proven ability to support SOC 2 compliance in engineering practice (controls, evidence, audit readiness, change management)
Experience delivering secure firmware/device updates, including signing and release integrity
Preferred
Edge/IoT/robotics production experience (ROS2 a plus)
Infrastructure-as-code with Terraform/Pulumi
Device identity/attestation and secure update pipelines (supply chain integrity, signed artifacts)
HIL/simulation testing; MQTT/EMQX/Kafka/NATS
SRE practices: SLIs/SLOs, incident response, postmortems, error budgets