ZebraEdge · 22 hours ago
Cloud Platform Engineer
ZebraEdge, Inc. is seeking a Cloud Platform Engineer to work on the SNAP Payment Error Rate (CAP) Reduction project. The role involves monitoring database performance, developing AI/ML solutions on AWS, and automating operational tasks to ensure efficient data processing and analytics workflows.
Analytics, Consulting, Management Consulting, Marketing, Product Management, Project Management, Quality Assurance
Responsibilities
Monitor database and system performance using CloudWatch metrics, alarms, and logs; troubleshoot proactively
Develop, deploy, and optimize AI/ML solutions using AWS AI services including SageMaker and Bedrock, supporting model training, inference, and integration into production systems
Automate operational tasks using AWS Lambda, Systems Manager (SSM), and Infrastructure-as-Code tools such as CloudFormation or Terraform
Design, build, and maintain scalable, fault-tolerant data processing and analytics workflows on AWS using services such as API Gateway, S3, EC2, RDS, Lambda, Glue, Athena, DynamoDB, EMR, Kinesis, DataSync
Design and integrate agentic AI systems, including LLM-based agents, multi-agent workflows, and autonomous orchestration pipelines using frameworks such as LangChain and LangGraph
Implement ETL/ELT pipelines and data architectures that support machine learning, analytics, and intelligent agent-based applications
Support CI/CD pipelines for AI models and data workflows using Jenkins and container-based platforms such as ECS, EKS, or Kubernetes
Apply security best practices across AI and data platforms, including IAM least-privilege access, encryption, audit logging, and compliance controls
Maintain technical documentation for AI architectures, data pipelines, infrastructure configurations, and operational runbooks
Qualifications
Required
Minimum 7 years of hands-on AWS experience: EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda
Minimum 7 years of experience in Linux/Unix administration and automation scripting (Bash, Shell, Python)
Minimum 7 years of experience with Infrastructure as Code (IaC) and automation tools, including CloudFormation, Terraform, and Ansible, for provisioning and maintaining
Minimum 7 years of experience with AWS networking: VPC, subnets, NACLs, security groups, Route 53, and multi-AZ architectures
Minimum 5 years of experience with CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models into production, monitoring autonomous workflows, and supporting MLOps using Kubernetes, ECS, or EKS
Minimum 4 years of experience architecting, building, and maintaining scalable data processing workflows using AWS managed services and Python (including PySpark); strong understanding of data architecture and ETL/ELT patterns
Minimum 4 years of experience working with AWS AI/ML services such as SageMaker, Bedrock, and vector databases (OpenSearch)
Strong understanding of machine learning algorithms, NLP concepts, and deep learning frameworks such as TensorFlow, PyTorch, or Hugging Face
Preferred
AWS (EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda): 7 years
Linux/Unix administration and automation scripting: 7 years
Infrastructure as Code (IaC) and automation tools: 7 years
AWS networking (VPC, subnets, NACLs, security groups): 7 years
CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models, and supporting MLOps using Kubernetes, ECS: 7 years
AWS managed services and Python (including PySpark); understanding of data architecture and ETL/ELT patterns: 4 years
AWS AI/ML services such as SageMaker, Bedrock: 4 years
Machine learning algorithms, NLP concepts, and frameworks such as TensorFlow, PyTorch, or Hugging Face: 4 years
Company
ZebraEdge
ZebraEdge offers product management, quality assurance, marketing analysis, management consulting, and agile transformation services.