Jobs via Dice · 1 day ago
AI-Ops Engineer
Dice is the leading career destination for tech experts at every stage of their careers. Its client is seeking an AI-Ops Engineer to evolve traditional DevOps into AI-Ops. The role uses AI and machine learning to automate IT operations, ensuring high availability and operational efficiency across global online programs.
Computer Software
Responsibilities
Implement AIOps solutions that use ML algorithms to automate performance monitoring, workload scheduling, and infrastructure management
Build anomaly detection systems that identify infrastructure issues before they impact users
Develop automated root cause analysis capabilities using ML to correlate events and filter noise from critical alerts
Create predictive maintenance workflows that analyze historical patterns to proactively mitigate issues
Design and implement automated remediation scripts that respond to incidents without human intervention
Architect comprehensive observability platforms that aggregate data from disparate sources into unified dashboards
Implement intelligent alerting systems using NLP and ML to reduce alert fatigue and surface actionable insights
Build real-time analytics dashboards for coordinated diagnosis across teams
Deploy application performance monitoring (APM) solutions integrated with AI-driven analytics
Ensure end-to-end visibility across cloud infrastructure, applications, and AI/ML workloads
Design, build, and maintain scalable, secure AWS infrastructure using Infrastructure as Code (CloudFormation, Terraform, or CDK)
Implement and manage containerized environments using Docker, AWS ECS, Fargate, and Kubernetes (EKS)
Build CI/CD pipelines for continuous delivery, integrating AI-powered code quality and deployment optimization
Manage cloud automation and optimization to improve cost-efficiency and resource utilization
Ensure compliance with Stanford standards and regulatory requirements (FERPA, GDPR) for secure data handling and governance
Partner with cross-functional teams to implement domain-agnostic AIOps solutions across the organization
Use Git-based version control and code review best practices as part of a collaborative, agile workflow
Document operational procedures, runbooks, and AIOps workflows for team knowledge sharing
Continuously evaluate and adopt emerging AIOps tools, AWS services, and AI-driven automation technologies
Contribute to building an AI-first operational culture that prioritizes automation and predictive capabilities
Qualifications
Required
At least one AWS Associate-level certification
Bachelor's degree in Computer Science, DevOps, Cloud Engineering, or a related field
3+ years of experience in DevOps, SRE, or Cloud Engineering roles
2+ years of hands-on experience with AWS infrastructure (EC2, ECS, Lambda, S3, IAM, VPC)
Familiarity with ML/AI concepts and their application to operational automation
Languages: Python (required)
AIOps & Monitoring: CloudWatch, X-Ray, Prometheus, Grafana, Datadog, or Splunk with ML capabilities
Infrastructure as Code: AWS CloudFormation, Terraform, or AWS CDK
Containers & Orchestration: Docker, AWS ECS/Fargate, Kubernetes (EKS)
AWS Services: Lambda, EC2, S3, API Gateway, EventBridge, CloudWatch, IAM, VPC, CodePipeline, SageMaker
CI/CD Tools: GitHub Actions, AWS CodePipeline, Jenkins, or GitLab CI
Data & Analytics: Experience with log aggregation, metrics analysis, and event correlation platforms
Excellent problem-solving, debugging, and root cause analysis skills
Demonstrated ability to learn rapidly, adapt to new technologies, and continuously improve
Strong communication skills with ability to collaborate across technical and non-technical teams
Commitment to reliability, security, and operational excellence
Ability to thrive in a fast-paced, evolving environment and to proactively seek opportunities to embed intelligence into systems and processes
Preferred
Master's degree in Computer Science, DevOps, Cloud Engineering, or a related field
AWS certification (Solutions Architect, SysOps Administrator, or DevOps Engineer); Professional-level certification a plus
Additional languages: Bash, Go, or TypeScript
Strong understanding of AIOps principles: using AI to enhance, not just support, IT operations
Passion for automation and eliminating manual, repetitive operational tasks
Company
Jobs via Dice
Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase