Crowe · 5 days ago
AI DevOps and Cloud Infrastructure Engineer
Crowe is a leading public accounting, consulting, and technology firm in the United States, known for its commitment to innovation and client service. The AI DevOps and Cloud Infrastructure Engineer will design and maintain scalable cloud environments for AI and machine learning systems, collaborating with various teams to ensure optimal performance and reliability of AI workloads.
Accounting · Advice · Consulting · Finance · Financial Services · Information Technology · Professional Services · Tax Consulting
Responsibilities
Architecting and maintaining cloud infrastructure for AI model training, inference services, and distributed compute workloads
Implementing infrastructure-as-code (IaC) to automate provisioning, configuration, scaling, and lifecycle management of cloud resources
Designing and operating CI/CD pipelines for automated model training, testing, and deployment of AI-enabled applications
Optimizing Kubernetes clusters, GPU utilization, and compute scaling strategies to balance performance, reliability, and cost
Integrating AI models, inference endpoints, and data pipelines into cloud-native platforms
Developing monitoring, logging, alerting, and observability solutions using modern telemetry and tracing tools
Troubleshooting issues across networking, containers, compute, storage, and model-serving layers
Leading performance benchmarking, load testing, and reliability validation for AI systems
Documenting infrastructure architectures, operational runbooks, and engineering standards
Supporting automation for dataset ingestion, model versioning, artifact management, and ML testing
Ensuring compliance with cloud security, identity management, encryption, and responsible AI guidelines
Partnering with security teams to implement secure networking, IAM policies, and secrets management
Providing technical mentorship, design reviews, and cloud best-practice guidance to junior engineers
Evaluating new cloud services, platform capabilities, and AI infrastructure tooling for adoption
Qualifications
Required
4+ years of experience in DevOps, cloud engineering, platform engineering, or infrastructure engineering
Strong proficiency with Kubernetes, Docker, and cloud orchestration platforms
Deep experience with CI/CD systems and deployment automation
Demonstrated ability to debug distributed systems and cloud networking issues
Proficiency in Python, Bash, or other automation/scripting languages
Strong communication skills and ability to collaborate across engineering and security teams
Willingness to travel occasionally for cross-functional planning and collaboration
Preferred
Bachelor's degree in Computer Science, Cloud Engineering, Information Systems, or a related technical field, or equivalent experience
Master's degree in a technical discipline
Experience enabling ML or AI workloads at scale in production environments
Cloud and platform certifications, including Azure (AZ-900, AZ-104, AZ-305, AZ-700, AI-102) or equivalent AWS/GCP certifications
Advanced experience with AWS (e.g., EKS, EC2, IAM, Lambda, SageMaker) and/or Azure (e.g., AKS, VMSS, Azure ML)
Experience with GPU orchestration and scaling strategies for AI workloads
Expertise with Terraform or other infrastructure-as-code frameworks
Hands-on experience with observability stacks such as Prometheus, Grafana, CloudWatch, and OpenTelemetry
Experience deploying and operating generative AI workloads, including LLM inference autoscaling and RAG architectures
Familiarity with vector database hosting (e.g., Pinecone, Weaviate, FAISS) and model-serving frameworks (e.g., Hugging Face TGI, vLLM, custom inference containers)
Experience building CI/CD pipelines for LLM fine-tuning workflows (e.g., LoRA, QLoRA, PEFT) and monitoring generative AI performance metrics such as latency, throughput, and hallucination rates
Benefits
Unlimited PTO
Flexible remote work policy
Comprehensive total rewards package
Company
Crowe
Crowe LLP is a public accounting, consulting, and technology firm.
Funding
Current Stage: Late Stage
Total Funding: unknown
Acquired: 2023-08-29
Company data provided by Crunchbase