Senior Software Platform Engineer – Observability jobs in United States

Arrive AI · 2 weeks ago

Senior Software Platform Engineer – Observability

Arrive AI is transforming the future of the Autonomous Last Mile™ by creating innovative logistics solutions. The company is seeking a highly skilled Senior Software Platform Engineer with expertise in cloud-native architectures and observability to build and monitor complex service-based architectures on Microsoft Azure.

Computer Software

Responsibilities

Architect, deploy, and maintain large-scale, service-oriented platforms on Azure
Design and manage containerized microservices using Docker and Kubernetes, emphasizing scalability and resilience
Implement event-driven architectures using message brokers, Pub/Sub, and streaming services (Kafka, RabbitMQ, Azure Service Bus)
Build and evolve observability frameworks leveraging OpenTelemetry, Prometheus, and Grafana to unify logs, metrics, and traces across cloud, IoT, and embedded environments
Develop custom telemetry and monitoring solutions for embedded systems, robotics, and machine-to-machine (M2M) networks
Integrate feature flagging and real-time SLA/SLO tracking into observability pipelines to ensure measurable reliability and controlled rollouts
Establish and refine CI/CD pipelines and automated testing to support continuous delivery and deployment safety
Collaborate cross-functionally with software, AI/ML, and robotics teams to embed observability, monitoring, and feedback loops at every layer of the stack
Drive incident response automation and lead post-incident analysis to improve system reliability over time
Mentor engineers on DevOps best practices, telemetry instrumentation, and data-driven platform operations

Qualifications

Cloud-native architectures, Autonomous system reliability, Microsoft Azure, Container orchestration, Observability, OpenTelemetry, Event-driven design, Docker, Kubernetes, Microservice architectures, Message brokers, Streaming services, CI/CD automation, Python, Bash, PowerShell, Feature Flagging, Incident response, DevOps best practices

Required

7+ years in software engineering with a strong focus on cloud platform engineering and DevOps
Expert understanding of Microsoft Azure cloud services, including compute, networking, and monitoring
Deep experience with containerization (Docker) and orchestration (Kubernetes) at scale
Proven success implementing microservice architectures, service meshes, and event-driven systems
Strong knowledge of message brokers and streaming pipelines (Kafka, RabbitMQ, Azure Service Bus)
Hands-on experience with OpenTelemetry, Prometheus, Grafana, and SLO/SLA instrumentation
Proficiency with CI/CD automation, scripting (Python, Bash, PowerShell), and infrastructure observability
Solid grounding in scaling strategies, distributed system reliability, and feature-flag-driven release practices

Preferred

Experience developing custom OpenTelemetry instrumentation for IoT, M2M, or robotic systems
Familiarity with edge observability, telemetry aggregation, and real-time diagnostics for autonomous devices
Background in IoT networking, M2M communication, and protocols such as MQTT
Familiarity with infrastructure as code tools (Terraform, Bicep, Pulumi, or ARM templates)
Experience with observability stacks (Prometheus, Grafana, Azure Monitor, ELK/EFK)
Hands-on experience with NVIDIA NIM microservices and deploying workloads on NVIDIA DGX Cloud
Prior leadership in incident response, feature flag governance, and SRE/SLO management

Benefits

High equity incentive

Company

Arrive AI

Arrive AI is a pioneer in mailbox-as-a-service (MaaS), providing secure, seamless delivery and pickup infrastructure for the last inch of the autonomous last mile.

Funding

Current Stage: Early Stage

Leadership Team

Todd Pepmeier, Chief Financial Officer
Mark Hamm, COO