Undisclosed
Senior Infrastructure Engineer
Our client is a fast-growing technology company operating large-scale, cloud-native platforms that support millions of users and mission-critical workloads. They are seeking a Senior Infrastructure Engineer to design, build, and operate highly reliable, secure, and scalable infrastructure. The role involves working closely with software, data, ML, product, security, and finance teams to keep the platform fast, resilient, and cost-efficient as it grows.
Responsibilities
Design, deploy, and operate cloud infrastructure on AWS, GCP, or Azure using infrastructure-as-code
Build and maintain scalable compute, storage, and networking architectures to support high-traffic applications
Own core infrastructure components, including Kubernetes clusters (EKS/GKE/AKS); load balancers, CDNs, and ingress controllers; and virtual networks, IAM, and secrets management
Ensure infrastructure is highly available, fault-tolerant, and designed for growth
Develop and maintain infrastructure using Terraform, CloudFormation, or Pulumi
Build reusable modules and patterns for consistent, repeatable deployments
Automate provisioning, scaling, patching, and configuration management
Reduce manual operational work through scripting (Python, Bash, Go)
Implement observability solutions using tools such as Datadog, Prometheus, Grafana, CloudWatch, or New Relic
Define and track SLIs, SLOs, and error budgets
Participate in on-call rotations and lead incident response for infrastructure-related issues
Conduct post-incident reviews and drive long-term reliability improvements
Partner with application engineers to improve developer velocity through better tooling and workflows
Build and maintain CI/CD pipelines using GitHub Actions, Jenkins, CircleCI, or GitLab CI
Support containerized build and deployment pipelines for microservices
Enable safe, fast deployments with automated testing, rollbacks, and canary releases
Implement cloud security best practices including least-privilege IAM, network segmentation, and encryption
Partner with Security teams to support compliance requirements (e.g., SOC 2, ISO 27001, PCI DSS, and HIPAA, as applicable)
Manage secrets, keys, and certificates using dedicated secrets-management tools such as Vault, AWS Secrets Manager, or KMS
Proactively identify and remediate infrastructure vulnerabilities
Monitor and optimize cloud spend through cost analysis, capacity planning, and resource rightsizing
Implement autoscaling strategies to balance performance and cost efficiency
Partner with Finance and Engineering to forecast infrastructure needs and budgets
Work closely with Software Engineers, Data Engineers, ML teams, and Product to support new initiatives
Review architecture designs and provide infrastructure guidance early in the development lifecycle
Mentor junior infrastructure engineers and contribute to documentation and best practices
Advocate for operational excellence and shared ownership across teams
Qualifications
Required
6–10+ years of experience in infrastructure, platform, or site reliability engineering roles
Strong experience operating production systems in AWS, GCP, or Azure
Deep knowledge of containerization and orchestration (Docker, Kubernetes)
Proficiency with infrastructure-as-code and automation tools (e.g., Terraform, CloudFormation, or Pulumi)
Experience supporting high-availability, distributed systems
Strong troubleshooting skills across networking, compute, and application layers
Excellent communication skills and a collaborative mindset
Preferred
Experience supporting data platforms, streaming systems, or ML workloads
Familiarity with service meshes (Istio, Linkerd)
Experience with multi-region or global infrastructure deployments
Exposure to zero-trust security models
Prior experience in fast-growing startups or scale-ups