Lead, IT Service Operations jobs in United States

S&P Global · 10 hours ago

Lead, IT Service Operations

S&P Global is a leading provider of essential intelligence and data solutions. The company is seeking a Lead, IT Service Operations to oversee technical application support and cloud infrastructure management, ensuring high-quality service delivery and continuous improvement of operational processes.

Analytics, Business Intelligence, Credit, Enterprise Software, Finance, Financial Services, Information Services, Market Research
Culture & Values
No H1B

Responsibilities

Act as a strategic technology partner to Architecture, Engineering, Business Systems, and Global Service Delivery (L1/L2/L3), ensuring enterprise-grade, resilient, and scalable IT services aligned to business outcomes
Establish and lead a collaborative service excellence culture, driving standardized, repeatable, and cost-efficient operational processes with a strong focus on quality, reliability, and continuous improvement
Own and govern the Major Incident Management lifecycle, from fault detection and triage through resolution, executive communication, post-incident reviews, and sustainable Root Cause remediation
Lead service performance reviews with business and technology stakeholders, identifying systemic improvement opportunities, operational risks, and reliability enhancements
Provide overall accountability for people leadership, including talent strategy, recruitment, onboarding, performance management, career development, and succession planning for Service Management and SRE teams
Define and evolve enterprise-level observability and reliability frameworks, covering metrics, logs, traces, SLIs/SLOs, and error budgets across hybrid and cloud platforms
Own Disaster Recovery, resiliency strategy, and operational readiness, ensuring regular testing, executive assurance, and continuous enhancement of recovery capabilities
Serve as a senior technical leader and mentor, guiding SREs, DevOps, and engineering teams while driving adoption of best practices across reliability engineering and operations
Provide end-to-end ownership of Incident, Problem, Change, and Business Continuity processes, ensuring predictable, high-quality service delivery to internal and external customers
Operate as the primary escalation authority for complex, high-impact production issues, coordinating across engineering, cloud, security, and vendor teams
Partner closely with Product, Architecture, and Delivery teams to ensure operational readiness for releases, embedding reliability, supportability, and resilience early in the design lifecycle
Drive continuous improvement initiatives across monitoring, alerting, reporting, automation, and operational maturity
Embed AI/ML-driven operations (AIOps) to enhance anomaly detection, predictive alerting, intelligent noise reduction, and proactive incident prevention
Influence and support technology governance, risk management, compliance, and audit activities related to service reliability
Ensure 24x7 proactive monitoring and management of business-critical platforms, restoring service rapidly and minimizing customer impact
Define and enforce incident severity models, ensuring accurate impact assessment, prioritization, and stakeholder communication
Maintain end-to-end ownership of incidents, including those requiring third-line engineering or formal change execution
Provide clear, consistent, and executive-level communication during incidents, outages, and service degradation
Oversee application support spanning infrastructure, data remediation, user queries, education, and deep-dive incident investigations
Drive observability across events, alerts, batch jobs, capacity planning, and performance KPIs, translating insights into actionable change
Collaborate with functional and technical teams to ensure future deliverables (functional and non-functional) are operationally viable
Champion knowledge management, ensuring high-quality runbooks, SOPs, and operational documentation in Confluence
Deliver against SLA, OLA, and SLO commitments, with transparent reporting and corrective actions
Leverage AIOps and reliability analytics to identify trends, systemic risks, and optimization opportunities at scale

Qualifications

Incident Management, Cloud Infrastructure Management, Service Management, AIOps, Disaster Recovery, Observability Platforms, Database Expertise, Linux, Windows, Continuous Improvement, Technical Leadership, Collaboration, Decision Making, Communication

Required

Bachelor's or Master's degree in Computer Science, Engineering, or related discipline
Ideally 10-12+ years of progressive experience in SRE, DevOps, Platform Engineering, or Technology Operations, including leadership responsibility
Proven experience designing and operating high-availability, disaster-recovery, and incident response capabilities across AWS, Azure, or GCP
Strong understanding of ITIL-aligned Service Management processes and enterprise operational governance
Deep expertise with observability platforms such as Splunk, CloudWatch, Prometheus, Grafana, Datadog, or equivalent
Strong database expertise (Oracle / PostgreSQL), including advanced SQL tuning, performance optimization, and operational troubleshooting
Demonstrated experience leading post-incident reviews and driving preventative engineering outcomes
Excellent decision-making and leadership capabilities under high-pressure, executive-visible incidents
Strong knowledge of Linux and Windows operating systems, automation, and scripting (Python preferred)
Solid understanding of SDLC, Agile methodologies, defect triage, and engineering collaboration models
Prior experience in Financial Services and/or S&P Global technology platforms is highly desirable

Benefits

Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.

Company

S&P Global

S&P Global is a market intelligence company that provides financial information and data analytics services.

Funding

Current Stage
Public Company
Total Funding
$1.75B
2025-12-01 · Post-IPO Debt · $1B
2023-09-07 · Post-IPO Debt · $750M
2016-04-28 · IPO

Leadership Team

Martina Cheung
President and CEO
Rick Goldberg
Division Chief Financial Officer
Company data provided by Crunchbase