Oracle · 4 hours ago
Site Reliability Developer 3
Oracle is a world leader in cloud solutions, committed to innovation and inclusivity. They are seeking a skilled Site Reliability Engineer to design, build, operate, and automate services for traditional IT infrastructure, ensuring the reliability, scalability, and efficiency of their systems.
Data Governance, Data Management, Enterprise Software, Information Technology, SaaS, Software
Responsibilities
Design and manage distributed Unix-based systems, particularly Oracle Linux
Implement auto-scaling and self-healing infrastructure to ensure uptime and durability
Tune system internals, including kernel parameters, networking, and filesystems, for high performance
Maintain timely OS patching and compliance posture across environments
Integrate systems with enterprise identity services such as Active Directory, LDAP, and Kerberos
Design, implement, and manage distributed storage solutions using technologies like GlusterFS
Ensure data reliability and availability through replication strategies and geo-replication
Monitor and optimize storage performance, addressing bottlenecks and ensuring scalability
Collaborate with development teams to understand storage requirements and provide appropriate solutions
Develop and maintain infrastructure automation using Ansible and Terraform
Automate deployment pipelines, service configurations, and patch management
Develop scripts and services in Python and Bash to enhance infrastructure delivery workflows
Extend APIs and platform automation to drive efficiency and repeatability
Develop observability stacks using tools like Prometheus, Grafana, and other open-source telemetry tools
Create dashboards and SLO/SLI-based alerts for real-time monitoring of production systems
Participate in a global 24/7 on-call rotation, leading responses for high-severity incidents
Conduct post-incident analysis (RCA) and drive remediations that improve long-term reliability
Partner with development teams to embed reliability in deployment pipelines
Help define system architecture standards and maintain robust platform documentation
Mentor engineers in Unix performance, observability, and debugging practices
Champion a culture of automation, resilience, and continuous improvement
Qualifications
Required
US Government TS/SCI with Polygraph
U.S. Citizenship required for Federal Government customer
Bachelor's or Master's degree in Computer Science or related engineering field
5+ years of experience in software development/IT operations
5+ years in SRE, Infrastructure, or Systems Engineering roles managing production services
Deep expertise with Unix/Linux systems, particularly Oracle Linux
Experience in kernel tuning, performance profiling, and debugging complex system issues
Proficiency in Python and Bash scripting
Strong grasp of Infrastructure as Code tools like Ansible and Terraform
Experience running hybrid infrastructure (on-premises) with VMware, containers, and Kubernetes
Hands-on experience with monitoring, telemetry, and observability stacks
Expertise in distributed storage systems, particularly GlusterFS
Familiarity with storage protocols like NFS, SMB, iSCSI, or NVMe-oF
Excellent problem-solving skills; ability to multi-task and prioritize
Ability to work independently; works well under pressure
Strong communication and collaboration skills with the ability to engage and influence
Self-motivated, and able and willing to help where help is needed
Able to build relationships, be culturally sensitive, align on goals, and learn quickly
Motivated to work with geographically distributed teams
Preferred
Experience with virtualization and container technologies (e.g., Docker, Kubernetes)
Experience with continuous integration platforms such as Jenkins
Experience with monitoring and alerting technologies (e.g., Prometheus, Grafana)
Experience with PostgreSQL; understanding of replication, failover, backups
Familiarity with other distributed storage systems like Ceph or MinIO
Benefits
Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
Company
Oracle
Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.
Funding
Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital
2025-09-24 · Post-IPO Debt · $18B
2025-02-03 · Post-IPO Debt · $7.75B
1986-03-12 · IPO
Company data provided by Crunchbase