McKesson · 14 hours ago

Site Reliability Engineer

Columbus, OH

Full-time

Hybrid

Entry, Mid Level

$84K/yr - $140K/yr

2+ years exp

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. They are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of critical healthcare technology systems through automation, monitoring, and proactive problem-solving.

Biopharma, Biotechnology, Health Care, Information Technology, Pharmaceutical

No H1B

Responsibilities

Design, implement, and maintain robust and scalable infrastructure and applications to ensure high availability, performance, and disaster recovery capabilities

Develop and implement automation scripts, tools, and processes to streamline operational tasks, reduce manual effort, and improve efficiency across the software development lifecycle

Establish and maintain comprehensive monitoring, alerting, and logging systems to proactively identify and diagnose issues, understand system behavior, and track key performance indicators

Participate in on-call rotations, respond to and resolve critical incidents, and conduct thorough post-mortems to identify root causes and implement preventative measures

Collaborate with development teams to analyze system capacity, forecast future needs, and optimize resource utilization to support business growth

Work closely with software engineers, product managers, and other SREs to promote a culture of reliability, share best practices, and contribute to continuous improvement

Create and maintain clear and concise documentation for systems, processes, and incident runbooks

Contribute to the implementation and enforcement of security best practices within our infrastructure and applications

Qualifications

Cloud Platforms, Containerization & Orchestration, Monitoring & Alerting Tools, Programming Skills, CI/CD, Operating Systems, Networking, Problem-Solving, Communication

Required

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)

2+ years of experience in a Site Reliability Engineering, DevOps, or closely related software engineering role

Strong proficiency in at least one scripting language (e.g., Python, Go, Ruby, Bash) for automation and tool development

Hands-on experience with cloud computing platforms (e.g., AWS, Azure, GCP); AWS experience is highly preferred

Experience with container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes)

Familiarity with Continuous Integration and Continuous Delivery (CI/CD) pipelines and tools

Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, Splunk)

Strong understanding of Linux/Unix operating systems

Fundamental understanding of networking concepts (TCP/IP, DNS, HTTP, Load Balancing)

Excellent analytical and problem-solving skills with a proactive approach to identifying and resolving complex technical issues

Strong verbal and written communication skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences