BNY · 3 months ago
Vice President, Site Reliability Engineer
BNY is a leading global financial services company at the heart of the global financial system. The role of Vice President, Site Reliability Engineer involves driving reliability and performance, automating infrastructure and operations, and collaborating with cross-functional teams to build resilient services.
Financial Services
Responsibilities
Drive reliability and performance by defining SLOs/SLIs, improving observability, and proactively identifying and addressing system bottlenecks across cloud environments
Automate infrastructure and operations using Terraform, Kubernetes, and CI/CD tools to eliminate toil and enable scalable, fault-tolerant deployments
Collaborate cross-functionally with product, infrastructure, and DevOps teams to reduce incidents, build resilient services, and ensure architectural clarity
Lead incident management by participating in on-call rotations, conducting postmortems, and implementing automated recovery to minimize downtime
Build and maintain monitoring systems with tools like Prometheus, Grafana, AppDynamics, and Splunk to support real-time alerting and root cause analysis
Develop platform tooling and pipelines for container orchestration, third-party integrations, and cloud-native operations to improve system efficiency and reliability
Maintain and improve live services by measuring and monitoring latency and overall system health, working closely with tech support and operations teams
Define and use KPIs to understand service performance and identify corrective actions
Create, manage, and use dashboards for continuous monitoring and health checks of applications and underlying infrastructure
Design and implement solutions to customer friction points and improve the entire lifecycle of services from inception through sustainment
Assist in creating and maintaining automation to improve reliability and velocity in addressing issues during regular maintenance tasks
Mentor engineers and champion SRE best practices, embedding a reliability-first culture and ensuring technical excellence across engineering teams
Qualifications
Required
Bachelor's degree in computer science or a related discipline, or equivalent work experience
5-8 years of related experience; experience in the securities or financial services industry is a plus
Strong expertise in cloud infrastructure (Azure, AWS, or GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Helm)
Proficiency in observability and monitoring tools such as Prometheus, Grafana, AppDynamics, Datadog, Splunk, and experience with incident response and on-call support
Solid programming and scripting skills in languages like Python, Go, or Java, with a focus on automation, tooling, and system integration
Deep understanding of SRE principles, including SLAs, SLOs, error budgets, postmortems, and reliability-focused system design
Familiarity with automated testing, DevSecOps practices, CI/CD methods, performance engineering, and security controls
Strong collaboration and communication skills, with experience working in Agile environments and partnering with cross-functional engineering, product, and operations teams
Proven track record in technical engineering, with coding experience beyond simple scripts
Preferred
Advanced degree
Benefits
Generous paid leave
Paid volunteer time
Company
BNY
We help make money work for the world — managing it, moving it and keeping it safe.
Funding
Current Stage
Late Stage