Oracle · 4 days ago
Principal Site Reliability Developer
Data Governance, Data Management
No H1B, U.S. Citizen Only
Responsibilities
Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas.
Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
Responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance.
Authority for end-to-end performance and operability.
Partner with development teams in defining and implementing improvements in service architecture.
Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack.
Demonstrate clear understanding of automation and orchestration principles.
Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
Understand and explain the effect of product architecture decisions on distributed systems.
Professional curiosity and a desire to develop a deep understanding of services and technologies.
Qualifications
Required
US Citizen
In-depth expertise in Linux
Ability to resolve issues with applications running on the Linux operating system (SELinux experience is a plus)
Ability to resolve complex application data flow scenarios and issues as they arise
Ability to diagnose and resolve complex system and cloud infrastructure issues.
Familiarity with advanced monitoring tools (Nagios, Prometheus, CloudWatch).
Comprehensive understanding of networking protocols (TCP/IP, DNS, DHCP).
Experience with advanced network configurations, load balancers, proxies, and firewall rules.
Extensive knowledge of web application infrastructure and cloud automation tools
Experience with high availability configurations and data replication.
Comprehensive knowledge of containerization tools such as Docker and Kubernetes
Willingness to excel in a fast-paced, challenging environment
Ability to implement solutions from broadly stated US Government requirements
Strong expertise in security principles, best practices, and compliance requirements
Proficiency in cloud architecture, design patterns, and Infrastructure as Code (IaC) tools like Terraform
Mastery of scripting languages (e.g., Python, Bash, PowerShell).
Extensive experience with automation tools such as Puppet or equivalent; Ansible, Chef, or Shepherd; PowerShell (for Windows hosts); sh/bash scripting for Linux hosts; and Jenkins pipelines in Groovy or equivalent
Ability to design, implement, and manage robust CI/CD pipelines
Knowledge of source code management tools and concepts
Strong ability to work collaboratively with cross-functional teams and lead technical initiatives
Ability to align technical solutions with business goals and objectives
A BS or MS in Computer Science, or equivalent.
A minimum of 12 years of experience running large-scale, customer-facing web services.
Preferred
US FedRAMP requirements and Security Technical Implementation Guides (STIGs)
NIST 800-53 Control Families
SELinux operating system
System monitoring and alerting (Nagios, Grafana)
Database as a Service (ATP)
Infrastructure as a Service (IaaS)
Infrastructure as code (Terraform, Chef, Ansible)
Benefits
Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
Company
Oracle
Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: Sequoia Capital
1986-03-12: IPO
1983-01-01: Series Unknown
Company data provided by Crunchbase