Bayside Solutions · 2 days ago
Infrastructure Data Engineer (Kubernetes-Golang)
Bayside Solutions is looking for a highly skilled Senior Kubernetes & Data Infrastructure Engineer with deep expertise in distributed systems and cloud platforms. The role involves operational support for Kubernetes clusters, development of large-scale data systems, and collaboration with engineering teams to enhance platform resilience and efficiency.
Information Technology · Staffing Agency · Telecommunications · Virtual Reality
Responsibilities
Own daily operational support for on-premise Kubernetes clusters, ensuring reliability, availability, scalability, and performance
Develop and maintain a large-scale batch orchestration platform running 100,000+ jobs per day
Build migration tooling to facilitate the transfer of job configurations and workloads to new data platforms
Design and implement distributed data systems for high availability, resilience, and performance
Write efficient, high-performance Golang code for automation, tooling, and platform services
Create operational scripts and tooling (Bash, Python) for automation, observability, and infrastructure workflows
Implement and improve CI/CD pipelines and DevOps standards across Kubernetes environments
Set up, manage, and optimize Prometheus, Grafana, and monitoring/alerting pipelines for full-stack observability
Troubleshoot distributed systems and Kubernetes workloads in production environments
Participate in system design interviews, Kubernetes design reviews, and cross-team technical discussions
Collaborate with platform, SRE, and data engineering teams to enhance platform resilience and operational efficiency
Qualifications
Required
Experience with hybrid cloud and on-prem Kubernetes architectures
Familiarity with service mesh (Istio, Linkerd) and advanced Kubernetes networking
Exposure to data engineering workflows or batch processing frameworks
Experience with GitOps tooling (ArgoCD, Helm, Kustomize)
Knowledge of infrastructure security, RBAC, certificates, and cluster hardening
Previous experience supporting critical production systems at a very large scale
Strong analytical, debugging, and distributed system troubleshooting skills
Excellent communication skills and ability to work across engineering teams
High ownership mindset with a focus on reliability, sustainability, and operational excellence
Ability to work in a fast-paced, high-impact environment
Preferred
Kubernetes (on-prem and hybrid cloud)
Distributed Systems Engineering
Data Infrastructure Engineering
Large-Scale Batch Processing
Kubernetes Operations and Reliability
Golang Software Development
API and Platform Services Development
Infrastructure Automation
Migration Tooling Development
CI/CD Pipelines
DevOps Practices
GitOps (ArgoCD, Helm, Kustomize)
Kubernetes Networking
Service Mesh (Istio, Linkerd)
Observability and Monitoring
Prometheus
Grafana
Alerting Pipelines
Production Troubleshooting
High Availability and Resilience Design
Performance Optimization
Bash Scripting
Python Scripting
Infrastructure Security
RBAC and Certificates
Cluster Hardening
Kubernetes Networking and Security
SRE Collaboration
Cross-Team Technical Collaboration
System Design and Architecture Reviews
Cloud Platforms
Hybrid Infrastructure
Operational Excellence