GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD jobs in United States

UPS · 3 months ago


UPS is a Fortune Global 500 organization seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage cloud infrastructure for Generative AI applications. The role involves leveraging Google Cloud Platform and containerization technologies to deliver secure, scalable, and high-performance AI solutions while ensuring compliance and optimizing costs.

Logistics · Messaging · Transportation
H1B Sponsor Likely

Responsibilities

Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP
Deploy and manage containerized workloads using Docker and Kubernetes (GKE)
Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models
Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference
Ensure business continuity through backup, disaster recovery, and multi-region deployments
Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager
Adopt GitOps practices (Flux) for infrastructure lifecycle management
Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications
Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime
Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management
Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with regulatory and security frameworks (HIPAA, SOX, GDPR, FedRAMP)
Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms
Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies)
Define KPIs to monitor system health, performance, and adoption across AI workloads
Optimize cloud costs for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring
Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment
Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure
Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration

Qualifications

Google Cloud Platform (GCP) · Terraform · Docker · Kubernetes (GKE) · Python · Bash · CI/CD tools · IBM Watsonx · GenAI experience · Problem-solving skills · Communication skills

Required

Bachelor's or master's degree in Computer Science, Software Engineering, or a related field
5+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering
Experience with GenAI use cases (chatbots, content generation, code assistants, etc.)
Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI
Experience with IBM Watsonx for AI application deployment and management
Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale
Proficiency in Python, Bash, or other relevant scripting languages
Strong understanding of cloud networking, IAM, and security best practices
Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager)
Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka)
Excellent problem-solving, debugging, and communication skills

Preferred

Experience in MLOps practices for model deployment, monitoring, and retraining
Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem)
Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (Evidently AI, Fiddler)
Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed)
Contributions to open-source projects in infrastructure, MLOps, or GenAI
Experience managing infrastructure in regulated industries
Google Cloud Certified - Professional Cloud Architect
Google Cloud Certified - Professional Machine Learning Engineer
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
IBM Certified Watsonx Generative AI Engineer - Associate
IBM Certified Solution Architect - Cloud Pak for Data
Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies

Company

Operating in more than 200 countries and territories, we’re committed to moving our world forward by delivering what matters.

H1B Sponsorship

UPS has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (1)
2021 (5)
2020 (9)

Funding

Current Stage: Public Company
Total Funding: Unknown
Key Investors: Innovate UK
2025-07-28: Grant
1999-11-10: IPO

Leadership Team

Brian Dykes
EVP & Chief Financial Officer at UPS

Joel Stenson
Senior Vice President Global Operations Technology / Enterprise Data & Analytics and Gen AI
Company data provided by Crunchbase