Arrive AI · 2 weeks ago
Senior Software Platform Engineer – Observability
Arrive AI is transforming the future of the Autonomous Last Mile™ by creating innovative logistics solutions. They are seeking a highly skilled Senior Software Platform Engineer with expertise in cloud-native architectures and observability to build and monitor complex service-based architectures on Microsoft Azure.
Computer Software
Responsibilities
Architect, deploy, and maintain large-scale, service-oriented platforms on Azure
Design and manage containerized microservices using Docker and Kubernetes, emphasizing scalability and resilience
Implement event-driven architectures using message brokers, Pub/Sub, and streaming services (Kafka, RabbitMQ, Azure Service Bus)
Build and evolve observability frameworks leveraging OpenTelemetry, Prometheus, and Grafana to unify logs, metrics, and traces across cloud, IoT, and embedded environments
Develop custom telemetry and monitoring solutions for embedded systems, robotics, and machine-to-machine (M2M) networks
Integrate feature flagging and real-time SLA/SLO tracking into observability pipelines to ensure measurable reliability and controlled rollouts
Establish and refine CI/CD pipelines and automated testing to support continuous delivery and deployment safety
Collaborate cross-functionally with software, AI/ML, and robotics teams to embed observability, monitoring, and feedback loops at every layer of the stack
Drive incident response automation and lead post-incident analysis to improve system reliability over time
Mentor engineers on DevOps best practices, telemetry instrumentation, and data-driven platform operations
Qualifications
Required
7+ years in software engineering with a strong focus on cloud platform engineering and DevOps
Expert understanding of Microsoft Azure cloud services, including compute, networking, and monitoring
Deep experience with containerization (Docker) and orchestration (Kubernetes) at scale
Proven success implementing microservice architectures, service meshes, and event-driven systems
Strong knowledge of message brokers and streaming pipelines (Kafka, RabbitMQ, Azure Service Bus)
Hands-on experience with OpenTelemetry, Prometheus, Grafana, and SLO/SLA instrumentation
Proficiency with CI/CD automation, scripting (Python, Bash, PowerShell), and infrastructure observability
Solid grounding in scaling strategies, distributed system reliability, and feature-flag-driven release practices
Preferred
Experience developing custom OpenTelemetry instrumentation for IoT, M2M, or robotic systems
Familiarity with edge observability, telemetry aggregation, and real-time diagnostics for autonomous devices
Background in IoT networking, M2M communication, and protocols such as MQTT
Familiarity with infrastructure as code tools (Terraform, Bicep, Pulumi, or ARM templates)
Experience with observability stacks (Prometheus, Grafana, Azure Monitor, ELK/EFK)
Hands-on experience with NVIDIA NIM microservices and deploying workloads on NVIDIA DGX Cloud
Prior leadership in incident response, feature flag governance, and SRE/SLO management
Benefits
High equity incentive
Company
Arrive AI
Arrive AI is a pioneer in mailbox-as-a-service (MaaS), providing secure, seamless delivery and pickup infrastructure for the last inch of the autonomous last mile.