The Value Maximizer · 10 hours ago
Senior AWS Cloud/Infrastructure Engineer/Architect
The Value Maximizer is seeking a Senior AWS Cloud/Infrastructure Engineer/Architect to design, implement, and operate large-scale platforms on AWS. This hands-on role involves partnering with various teams to build scalable foundations and enable high-velocity delivery while focusing on event-driven architectures and observability.
Information Technology & Services
Responsibilities
Design and implement cloud-native architectures on AWS using services such as VPC, EC2, EKS, S3, RDS/Aurora, IAM, CloudWatch, and KMS, following Well-Architected and security best practices
Lead the design and operation of event-driven systems using Amazon MSK (Managed Streaming for Apache Kafka) and/or managed streaming frameworks (e.g., Kinesis/Kafka-based MSF), including topic design, partitioning, consumer groups, schema evolution, and back-pressure handling
Architect and manage caching layers and in-memory data stores (e.g., Amazon ElastiCache for Redis/Memcached or similar) to improve performance, reduce latency, and offload downstream databases
Implement and support data lakehouse patterns using Apache Iceberg or similar table formats on object storage (e.g., S3), including table design, partitioning, schema evolution, and performance optimization for analytical and near-real-time workloads
Design, provision, and operate Kubernetes clusters on Amazon EKS, including node groups, autoscaling, networking, ingress, service mesh (where applicable), secrets management, and multi-environment separation
Implement full-stack observability using OpenTelemetry (traces, metrics, logs), integrating with centralized telemetry backends, defining SLOs/SLIs, and enabling deep visibility into distributed, event-driven workloads
Build and maintain Infrastructure-as-Code (IaC) using tools such as Terraform and/or AWS CloudFormation, enforcing reusable modules, environment parity, and Git-based workflows
Establish and enhance CI/CD pipelines for infrastructure and application deployments on AWS/EKS/MSK, including automated testing, security scans, canary/blue-green releases, and rollback strategies
Ensure platform security, compliance, and governance, including IAM roles and policies, network segmentation, encryption in transit/at rest, secrets management, and audit logging
Monitor and optimize cost, performance, and resilience of AWS environments; drive capacity planning, rightsizing, and architectural improvements for high availability and disaster recovery
Troubleshoot complex production incidents across EKS, MSK, event pipelines, caching tiers, and data platforms, driving root cause analysis and long-term remediation
Mentor engineers, champion engineering best practices, and collaborate with architects and product teams to align platform roadmaps with business goals
Qualifications
Required
10+ years of hands-on experience in cloud engineering, infrastructure engineering, or platform/SRE roles, with at least 5 years focused primarily on AWS
Strong expertise with core AWS services: VPC, IAM, EC2, EKS/ECS, S3, RDS/Aurora, CloudWatch/CloudTrail, KMS, and networking (subnets, routing, security groups, NACLs, load balancers)
Proven production experience with Amazon MSK or equivalent Kafka-based managed streaming platforms (MSF), including cluster operations, capacity planning, security, and observability
Practical experience with event-driven and streaming architectures (e.g., Kafka/Kinesis + consumers, stream processing, CQRS, pub/sub patterns) in mission-critical systems
Hands-on experience with caching data stores and distributed caches (e.g., Redis, Memcached, ElastiCache), including eviction strategies, key design, and cache-aside/write-through patterns
Experience implementing or operating data lake or lakehouse solutions on S3 or similar, using Apache Iceberg or comparable table formats (e.g., Delta Lake, Hudi), and integrating with analytics/processing engines
Strong Kubernetes and EKS background, including cluster lifecycle management, Helm or similar packaging, autoscaling, network policies, and container security baselines
Deep understanding of observability, distributed tracing, and telemetry; hands-on with OpenTelemetry SDKs/collectors and integration into logging/metrics/tracing backends
Proficiency with IaC tools such as Terraform and/or CloudFormation, plus strong Git and DevOps practices around code review, branching, and automated testing
Solid scripting or programming skills (e.g., Python, Bash, Go, or similar) for automation, tooling, and glue code around AWS, MSK, EKS, and observability stacks
Strong knowledge of security, networking, and compliance in cloud environments, including least-privilege IAM, network isolation, certificate management, and secrets rotation
Excellent communication and stakeholder management skills, with experience collaborating in cross-functional teams and mentoring junior and mid-level engineers
Preferred
Experience with service meshes (e.g., Istio, Linkerd) on EKS for traffic management, mTLS, and advanced observability
Exposure to big-data/analytics ecosystems around Iceberg or similar (e.g., Spark, Flink, Trino, Athena, Glue, EMR) and streaming ETL pipelines
Hands-on experience with additional managed streaming services (e.g., Amazon Kinesis, Azure Event Hubs, GCP Pub/Sub) in multi-cloud or hybrid environments
AWS certifications such as AWS Certified Solutions Architect - Professional, DevOps Engineer - Professional, or specialty certifications in Security or Advanced Networking
Prior experience in SRE, platform engineering, or reliability-focused roles with strong emphasis on SLOs, error budgets, and incident management
Company
The Value Maximizer
At The Value Maximizer, we empower businesses to unlock their full potential through cutting-edge AI-based platforms.
Funding
Current Stage
Early Stage