Senior Principal Site Reliability Engineer | Oracle Health Federal Operations Team jobs in United States

NetSuite · 1 month ago

Senior Principal Site Reliability Engineer | Oracle Health Federal Operations Team

NetSuite is a technology leader that’s changing how the world does business. They are seeking a Senior Principal Site Reliability Engineer to define and deploy key services, focusing on architecture, production operations, and automation to enhance the reliability and performance of Oracle Health's platform.

Cloud Computing · Computer · CRM · iOS · SaaS · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Own the full service lifecycle: design, implementation, deployment, on-call, and continuous improvement—maintaining high code and reliability standards
Define and meet service-level objectives (availability, latency, durability) while reducing toil through automation, observability, and self-healing mechanisms
Lead architecture, analysis, design, implementation, and production operations for Core System Framework solutions, with strong documentation and runbooks
Create and maintain clear, version-controlled documentation—architectural diagrams, SOPs, runbooks, and incident playbooks—to ensure repeatable operations, auditability, and fast onboarding
Design, write, and deploy software that improves the availability, scalability, and efficiency of platform services
Develop designs, architectures, standards, and methods for large-scale distributed systems
Build automation to prevent problem recurrence; drive real-time monitoring, alerting, and self-healing into production systems
Conduct capacity planning and demand forecasting; perform software performance analysis, system tuning, and optimization
Contribute to and support platform services across architecture, provisioning, configuration, deployment, and ongoing operations
Partner with distributed teams to prototype and launch new platform services
Stay current on emerging technologies and introduce innovations that improve reliability, security, and developer productivity
Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence
Set and raise engineering standards across multiple teams; model best practices in reliability, security, and automation
Collaborate closely with storage, networking, observability, and security teams to deliver platform features and secure-by-default designs
Participate in an on-call rotation; lead incident response, postmortems, and follow-through on corrective actions to drive continuous improvement

Qualification

Site Reliability Engineering · DevOps Practices · Distributed Systems Design · Automation · Performance Management · Incident Response · Capacity Planning · Mentoring · Collaboration

Required

Applicants must be able to read, write, and speak English.
Security clearance required: Yes

Company

NetSuite

NetSuite is a cloud computing company dedicated to delivering business applications over the internet.

Funding

Current Stage: Public Company
Total Funding: $157.79M
Key Investors: Meritech Capital Partners, Tako Ventures, StarVest Partners
2016-07-28 · Acquired
2007-12-20 · IPO
2007-02-05 · Secondary Market · $17.87M

Leadership Team

Brian Chess
SVP Technology and AI
Eli Johnson
Vice President, Global Sales Productivity

Company data provided by Crunchbase