Oracle · 1 hour ago
Principal Site Reliability Engineer
Oracle Cloud is a comprehensive suite of cloud services designed to help organizations build, deploy, and manage workloads securely at scale. The Principal Site Reliability Engineer will focus on the availability, performance, and operational excellence of Fusion SRE Middleware, emphasizing automation and optimization of operations across multiple production environments.
Data Governance, Data Management, Enterprise Software, Information Technology, SaaS, Software
Responsibilities
Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas
Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services
Own the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance
Serve as the authority for end-to-end performance and operability
Partner with development teams in defining and implementing improvements in service architecture
Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio
Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack
Demonstrate clear understanding of automation and orchestration principles
Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs)
Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations
Understand and explain the effect of product architecture decisions on distributed systems
Demonstrate professional curiosity and a desire to develop a deep understanding of services and technologies
Automation: Develop and optimize operations through AI-powered automation
Apply machine learning and orchestration principles to every possible opportunity, reducing manual intervention and technical debt
Enhance operational outcomes with scalable, AI-driven automation solutions that anticipate issues and optimize system performance proactively
Middleware Technology Expert: Lead L3 WebLogic Administration, managing server lifecycle, configuring and deploying applications, and monitoring server and application resources
Leverage AI-driven monitoring tools to proactively detect and resolve issues across application and infrastructure layers, ensuring efficient and automated troubleshooting
Service Ownership: Act as a Service Owner for Fusion Apps customers, sharing full-stack ownership of critical services in partnership with Service Development and Operations
Utilize AI-based analytics to predict potential service disruptions and optimize service delivery to improve customer satisfaction and minimize downtime
Technical Expertise: Provide deep technical guidance and serve as the ultimate escalation point for complex issues not documented in SOPs
Participate in major incident management as a subject matter expert, leveraging your understanding of service topologies, AI-driven insights, and dependencies to troubleshoot and resolve issues quickly and effectively
Ownership Scope: Understand end-to-end configuration, dependencies, and behavioral characteristics of production services
Use AI-powered telemetry and monitoring systems to ensure mission-critical delivery with a focus on system health, security, resiliency, scale, and performance
Service Requirements: Provide strategic direction and prioritization to Product Management and Service Development teams, guiding the addition of AI-enhanced capabilities to Oracle SaaS/ERP services
Act as an escalation point for undocumented or critical issues, leveraging AI tools to aid in faster resolution and proactive service improvements
Qualifications
Required
Bachelor's degree in Computer Science or a related field, or equivalent experience
8+ years of overall experience in the IT industry
6+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering
6+ years of hands-on automation experience using Python or Unix shell scripting
Excellent proficiency in Oracle Database, SQL, PL/SQL, and performance tuning
Hands-on expertise with Oracle WebLogic Server
Strong background in WebLogic performance tuning and monitoring
Proven expertise in designing and implementing solutions for telemetry, monitoring, scalability, performance, and reliability at both platform and application layers
Correlate WebLogic/JVM metrics (heap, GC, threads, connection pools) with Oracle Database performance indicators
Perform JVM heap sizing, garbage collection tuning, and thread analysis
Analyze database, middleware, and application metrics to resolve performance bottlenecks
Administration experience with web servers such as OHS (Oracle HTTP Server) or Apache
Deep understanding of performance concepts (response time, throughput, resource utilization)
Perform capacity planning and scalability analysis based on workload growth and usage patterns
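As an illustration of the metric correlation described in the requirements above, here is a minimal Python sketch. It assumes two hypothetical, time-aligned series (WebLogic connection pool usage and an Oracle Database wait-time indicator) and computes their Pearson correlation. The series names and sample values are invented for illustration and do not come from any Oracle tool or API.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-minute samples (illustrative values only).
active_connections = [12, 15, 21, 30, 42, 55, 61, 58]           # pool usage
db_wait_ms = [3.1, 3.4, 4.0, 5.2, 7.9, 11.5, 13.0, 12.2]       # DB wait time

r = pearson(active_connections, db_wait_ms)
print(f"correlation between pool usage and DB wait time: {r:.2f}")
```

A coefficient close to 1 would suggest that the two indicators rise together, which is a hint for where to look next rather than proof of a causal link.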
Preferred
Experience with Fusion Apps functional flows
Java programming experience and an understanding of structured SQL statements
Knowledge of Oracle Business Intelligence Enterprise Edition (OBIEE) and Oracle Service-Oriented Architecture (SOA)
Benefits
Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
Company
Oracle
Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.
H1B Sponsorship
Oracle has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted segment represents job fields similar to this job]
Trends of Total Sponsorships
2025 (1271)
2024 (846)
2023 (995)
2022 (1192)
2021 (985)
2020 (755)
Funding
Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital
2025-09-24: Post-IPO Debt · $18B
2025-02-03: Post-IPO Debt · $7.75B
1986-03-12: IPO
Company data provided by crunchbase