Principal Site Reliability Engineer @ Lumen Technologies | Jobright.ai

Lumen Technologies · 10 hours ago

Principal Site Reliability Engineer

Big Data · Information Services
Actively Hiring

Responsibilities

Design and manage Kubernetes clusters (AWS EKS) with a focus on networking, scalability, security, and reliability.
Troubleshoot complex, cross-system issues involving Kubernetes, databases, networking, and cloud infrastructure.
Implement and maintain guardrails to ensure consistent and secure operation of Kubernetes workloads.
Architect, build, and maintain highly available, fault-tolerant systems using AWS services.
Use Terraform to define infrastructure as code, enabling scalable, repeatable, and secure deployments.
Automate provisioning, configuration, and updates for cloud infrastructure with a focus on GitOps principles using ArgoCD and GitHub Actions.
Set up and enforce guardrails for databases, infrastructure, and applications, ensuring consistency and adherence to best practices.
Implement robust application and infrastructure monitoring using tools like Prometheus, Grafana, and potentially Datadog.
Ensure proactive alerting and predictive monitoring to detect issues before they impact users.
Design and implement deployment strategies like blue-green deployments, canary releases, and feature-flag-based rollouts.
Develop and maintain CI/CD pipelines to streamline application delivery, testing, and deployment.
Partner with development teams to embed reliability and security best practices into the application lifecycle.
Drive a culture of operational excellence, ensuring teams build for reliability, scalability, and security from the ground up.
Conduct post-incident reviews to identify root causes and prevent future incidents.
Implement practices like chaos engineering to test and enhance system resilience.
Design and manage secure networking solutions, including AWS VPCs, Kubernetes networking, and firewalls.
Ensure compliance with security best practices and industry standards.

Qualifications

Kubernetes, AWS EKS, Terraform, ArgoCD, GitHub Actions, Prometheus, Grafana, Python, Go, Bash, AWS services, Database systems, Cloud cost optimization, CI/CD pipelines, Chaos engineering, HashiCorp Vault, GCP, Azure, ELK, Loki, Graylog

Required

Deep hands-on experience managing Kubernetes clusters (AWS EKS or similar) with a focus on networking, scaling, and security.
Strong troubleshooting skills across Kubernetes workloads, infrastructure, and networking.
Expertise in Terraform for infrastructure as code.
Proven experience with ArgoCD and GitHub Actions for GitOps workflows and CI/CD pipelines.
Proficiency in Prometheus, Grafana, and incident management workflows.
Experience implementing application-level monitoring and tracing to identify performance bottlenecks.
Demonstrated ability to set up guardrails for databases, Kubernetes clusters, and applications to ensure reliable and secure operations.
Advanced knowledge of AWS services, including EKS, EC2, CloudWatch, Route53, Aurora, and S3.
Familiarity with auto-scaling, load balancing, and cloud cost optimization.
Strong proficiency in Python, Go, or Bash for scripting and automation tasks.
Proven ability to troubleshoot complex, distributed systems across cloud infrastructure, databases, and networking.

Preferred

Experience with other cloud platforms such as GCP or Azure.
Familiarity with logging and observability tools like ELK, Loki, or Graylog.
Exposure to chaos engineering and resilience testing.
Knowledge of HashiCorp Vault, SOPS, and secrets management best practices.
Expertise in database systems, including setup, scaling, and optimization.

Benefits

Health
Life
Voluntary Lifestyle benefits

Company

Lumen Technologies

Lumen delivers the most secure platform for applications and data to help businesses, government, and communities deliver amazing experiences.

Funding

Current Stage: Public Company
Total Funding: $10.4M
2023-05-22: Post-IPO Equity
2020-01-31: Post-IPO Debt
2018-06-21: Post-IPO Equity, $2.4M

Leadership Team

Jeff Storey, President & CEO
Kate Johnson, CEO & President
Company data provided by Crunchbase