This job has closed.

Talent Space, Inc. · 1 week ago

Sr DevOps Engineer (AI Platform)

Talent Space, Inc. is seeking a Sr. DevOps Engineer (AI Platform) for a contract-to-hire opportunity in Thousand Oaks, CA. The role involves designing and operating scalable infrastructure on AWS, integrating AI services, and optimizing CI/CD pipelines for .NET applications while collaborating with development teams and mentoring junior engineers.

Consulting · Information Services · Information Technology · Recruiting · Software Engineering
H1B Sponsor Likely

Responsibilities

Design, build and operate scalable, highly available infrastructure on AWS
Architect and support integrated Windows and Linux environments across cloud platforms
Create and maintain infrastructure-as-code using AWS CloudFormation/CDK and Terraform/OpenTofu
Develop and manage configuration management for Windows and Linux servers using Chef
Design, implement, and optimize GitLab CI/CD pipelines for .NET applications
Integrate and support AI services, including orchestration with AWS Bedrock, Google Agentspace, and other generative AI frameworks, enabling secure and efficient platform consumption
Enable AI/ML workflows by building and optimizing infrastructure pipelines for large-scale training, inference, and deployment across AWS and GCP
Automate the AI model lifecycle (including training, deployment, and monitoring) through CI/CD pipelines to ensure reproducibility and seamless developer integration
Partner with AI engineering teams to deliver scalable environments, standardized APIs, and platform-level infrastructure that accelerates AI adoption
Implement observability, security, data privacy, and cost-optimization strategies tailored to AI workloads, including monitoring and dynamic scaling for inference services
Define, implement, and enforce security best practices across infrastructure and deployment processes
Collaborate closely with development teams to understand requirements and provide platform engineering expertise
Troubleshoot and resolve infrastructure and deployment issues across environments
Implement and manage monitoring and logging solutions to ensure visibility and proactive issue detection
Contribute clearly and concisely to platform engineering standards, documentation, and best practices
Stay current with emerging trends in cloud computing, platform engineering, AI infrastructure, and security
Mentor and support junior engineers, fostering technical growth and best practices

Qualifications

AWS · Infrastructure-as-code · CI/CD pipelines · AI Services · Windows server administration · Linux server administration · Cloud security best practices · Monitoring solutions · Scripting skills · Containerization technologies · Communication skills · Problem-solving skills · Collaboration skills

Required

Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent practical experience
5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE) roles
At least 1 year of hands-on experience working with AI Services and large language models (LLMs)
Extensive practical experience designing and operating solutions on Amazon Web Services (AWS)
Strong knowledge of Windows and Linux server administration and their integration with cloud environments
Demonstrated expertise with infrastructure-as-code, particularly AWS CDK and Terraform
Proven ability to design and implement CI/CD pipelines using GitLab CI/CD
Experience deploying, managing, and supporting .NET applications in cloud-based environments
In-depth understanding of cloud security best practices and their application within infrastructure and CI/CD pipelines
Solid grasp of cloud networking fundamentals, including TCP/IP, DNS, load balancing, and firewall concepts
Experience implementing and operating monitoring and logging solutions such as New Relic and Amazon CloudWatch
Strong scripting and automation skills using tools such as PowerShell, Python, Ruby, or Bash
Excellent analytical, problem-solving, and troubleshooting capabilities
Strong written and verbal communication skills with a collaborative mindset
Familiarity with containerization and orchestration technologies such as Docker and Kubernetes is a plus

Preferred

Strong proficiency in PowerShell and Python scripting for automation and system management
In-depth experience with AWS EC2 features and services, including Auto Scaling and warm pools
Solid understanding of Windows Server build and image creation processes, using tools such as Chocolatey for package management and Packer for AMI or image generation

Company

Talent Space, Inc.

Your trusted partner in IT staffing with a focus on technology, innovation and diversity.

H1B Sponsorship

Talent Space, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
Distribution of Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2021 (2)
2020 (4)

Funding

Current Stage: Growth Stage
Company data provided by Crunchbase