Infrastructure Engineer (Hybrid Cloud & Platform) jobs in United States

Aldea · 1 month ago

Infrastructure Engineer (Hybrid Cloud & Platform)

Aldea is a multi-modal foundational AI company reimagining the scaling laws of intelligence. They are seeking an Infrastructure Engineer to bridge the gap between complex hybrid infrastructure and developer velocity, focusing on architecting a unified platform across AWS and Bare Metal Kubernetes.

Artificial Intelligence (AI) · Software · Speech Recognition
H1B Sponsor Likely

Responsibilities

Hybrid Infrastructure & Bare Metal (AWS + K8s)
Unified IaC Strategy: Architect and maintain the Terraform codebase for both AWS services (EKS, RDS, VPC) and Bare Metal clusters. You will treat physical infrastructure as mutable software, using tools like Cluster API, Metal3, or Tinkerbell to manage hardware lifecycles
Bare Metal Mastery: Manage multiple production clusters on bare metal with clear separation of environments. You will solve complex challenges including networking (BGP, ECMP), load balancing (MetalLB/Kube-VIP), and storage orchestration (CSI/Rook-Ceph) for stateful workloads
Observability & AI Monitoring
Full-Stack Visibility: Contribute to building our observability stack (Prometheus, Grafana, ELK/Loki) to monitor both EKS and bare metal clusters
AI/GPU Telemetry: Build specialized dashboards for AI workloads. You will track GPU metrics, CPU saturation, and memory pressure to ensure efficient resource utilization
CI/CD & Release Architecture
CI/CD at Scale: Architect resilient, multi-region pipelines using GitHub Actions, and automate CI/CD for applications using ArgoCD. You will build and manage a fleet of self-hosted runners to control costs and accelerate feedback loops
Secure Release Engineering: Implement end-to-end workflows: Docker image build → Helm chart release → deployment (GitHub Actions + ArgoCD). You will apply semantic versioning, manage artifacts in centralized registries, and integrate vulnerability scanning
Leadership & Collaboration
Technical Direction: Lead design reviews and drive platform roadmaps that balance reliability, cost, and developer productivity
Cross-Functional Partnership: Partner with product, security, and application teams to translate business needs into robust platform capabilities

Qualifications

AWS · Kubernetes · Terraform · GitHub Actions · ArgoCD · Linux/Bash · Python · Networking · Storage Management · Observability · Leadership · Collaboration

Required

Experience: Infrastructure, DevOps, or SRE roles, with primary ownership of production systems in AWS and Bare Metal Kubernetes
Technical Arsenal: Expert fluency in Terraform, Linux/Bash or Python scripting, GitHub Actions, and ArgoCD
Bare Metal & K8s: Proven experience operating Kubernetes in production, including hybrid setups (EKS + On-Prem). You understand networking (CNI, BGP), storage (CSI), and cluster lifecycle management
Observability Depth: You have moved beyond 'out-of-the-box' dashboards. You understand high-cardinality metrics, log retention strategies, and how to debug distributed systems
Platform Mindset: You don't just build servers; you build products for developers

Preferred

Experience with OpenTelemetry (OTEL) for unified tracing
Understanding of eBPF
Experience configuring NVIDIA DCGM for GPU monitoring and handling AI training/inference workloads

Company

Aldea

Aldea builds AI voice and language technology with speech-to-text, text-to-speech, and conversational interfaces.

H1B Sponsorship

Aldea has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (2)
2022 (1)
2021 (1)
2020 (1)

Funding

Current Stage
Early Stage
Company data provided by Crunchbase