Oracle · 3 days ago
Principal Software Developer - AI/ML
Oracle, a world leader in cloud solutions, is seeking a Principal Software Developer with deep expertise in AI/ML system design to drive reliability and automation across Oracle’s global cloud network. The role involves designing and delivering AI-powered systems for predictive incident detection and automated remediation, while mentoring engineers and driving best practices in applied AI and MLOps.
Data Governance · Data Management · Enterprise Software · Information Technology · SaaS · Software
Responsibilities
Design and build distributed AI/ML services that enable anomaly detection, event correlation, root cause analysis (RCA) prediction, and operational insights across OCI infrastructure
Develop, train, and deploy models for classification, clustering, forecasting, and LLM-based reasoning. Own the full lifecycle — from feature engineering and training to evaluation, deployment, and continuous improvement
Build data ingestion and processing pipelines leveraging Kafka, Spark, Flink, or OCI Data Flow to handle petabyte-scale telemetry and operational data
Embed model outputs into observability, automation, and workflow systems to enable closed-loop, self-healing operations
Apply LLMs, retrieval-augmented generation (RAG), and knowledge graph techniques to enhance incident triage, RCA automation, and intelligent alert summarization
Ensure AI services are fault-tolerant, performant, and secure. Implement model-monitoring and feedback loops for drift detection and accuracy improvement
Partner with Data Science, NRE, GNOC, and Platform teams to translate operational challenges into scalable AI solutions. Mentor engineers and drive best practices in applied AI, MLOps, and distributed system design
Qualifications
Required
8+ years of software development experience, with 3+ years focused on applied AI/ML systems
Proficiency in Python, Go, or Java, and deep understanding of ML frameworks such as PyTorch, TensorFlow, or scikit-learn
Strong knowledge of data engineering tools and architectures (Kafka, Spark, Flink, Airflow, or similar)
Proven experience deploying and operating ML models in production using MLflow, Kubeflow, or OCI Data Science
Understanding of MLOps principles — versioning, retraining pipelines, drift detection, and model observability
Background in distributed systems or cloud infrastructure (compute, networking, storage, or observability)
Hands-on experience with containerized and microservice architectures (Kubernetes, Docker)
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field
Preferred
Experience applying LLMs or generative AI for operational intelligence (incident summarization, recommendation systems, or RCA reasoning)
Familiarity with AIOps/observability tools and telemetry pipelines in large-scale environments
Knowledge of hyperscale networking, HPC, or GPU infrastructure
Expertise in designing data feedback systems that improve AI model performance through continuous learning
Demonstrated ability to influence technical direction across teams and lead complex cross-functional projects
Benefits
Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
Company
Oracle
Oracle is an integrated cloud application and platform services provider that sells a range of enterprise information technology solutions.
H1B Sponsorship
Oracle has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total Sponsorships by Year
2025 (1271)
2024 (846)
2023 (995)
2022 (1192)
2021 (985)
2020 (755)
Funding
Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital
2025-09-24 · Post-IPO Debt · $18B
2025-02-03 · Post-IPO Debt · $7.75B
1986-03-12 · IPO
Recent News
The Motley Fool · 2026-01-11
Hindu Business Line · 2026-01-11
Company data provided by Crunchbase