Principal Software Developer - AI/ML jobs in United States

NetSuite · 3 months ago

Principal Software Developer - AI/ML

NetSuite is part of Oracle, a world leader in cloud solutions, and is seeking a Principal Software Developer with expertise in AI/ML system design. The role involves designing and delivering AI-powered systems for predictive incident detection, automated remediation, and root-cause analysis across Oracle’s global cloud network.

Cloud Computing · Computer · CRM · iOS · SaaS · Software

Responsibilities

Design and build distributed AI/ML services that enable anomaly detection, event correlation, RCA prediction, and operational insights across OCI infrastructure
Develop, train, and deploy models for classification, clustering, forecasting, and LLM-based reasoning. Own the full lifecycle — from feature engineering and training to evaluation, deployment, and continuous improvement
Build data ingestion and processing pipelines leveraging Kafka, Spark, Flink, or OCI Data Flow to handle petabyte-scale telemetry and operational data
Embed model outputs into observability, automation, and workflow systems to enable closed-loop, self-healing operations
Apply LLMs, retrieval-augmented generation (RAG), and knowledge graph techniques to enhance incident triage, RCA automation, and intelligent alert summarization
Ensure AI services are fault-tolerant, performant, and secure. Implement model-monitoring and feedback loops for drift detection and accuracy improvement
Partner with Data Science, NRE, GNOC, and Platform teams to translate operational challenges into scalable AI solutions
Mentor engineers and drive best practices in applied AI, MLOps, and distributed system design

Qualifications

AI/ML system design · Python · ML frameworks · Data engineering tools · MLOps principles · Distributed systems · Containerized architectures · Go · Java · Kubernetes · Docker · English

Required

8+ years of software development experience, with 3+ years focused on applied AI/ML systems
Proficiency in Python, Go, or Java, and a deep understanding of ML frameworks such as PyTorch, TensorFlow, or scikit-learn
Strong knowledge of data engineering tools and architectures (Kafka, Spark, Flink, Airflow, or similar)
Proven experience deploying and operating ML models in production using MLflow, Kubeflow, or OCI Data Science
Understanding of MLOps principles — versioning, retraining pipelines, drift detection, and model observability
Background in distributed systems or cloud infrastructure (compute, networking, storage, or observability)
Hands-on experience with containerized and microservice architectures (Kubernetes, Docker)
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field

Preferred

Experience applying LLMs or generative AI for operational intelligence (incident summarization, recommendation systems, or RCA reasoning)
Familiarity with AIOps / observability tools and telemetry pipelines in large-scale environments
Knowledge of hyperscale networking, HPC, or GPU infrastructure
Expertise in designing data feedback systems that improve AI model performance through continuous learning
Demonstrated ability to influence technical direction across teams and lead complex cross-functional projects

Benefits

Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

Company

NetSuite

NetSuite is a cloud computing company dedicated to delivering business applications over the internet.

Funding

Current Stage: Public Company
Total Funding: $157.79M
Key Investors: Meritech Capital Partners · Tako Ventures · StarVest Partners
2016-07-28 · Acquired
2007-12-20 · IPO
2007-02-05 · Secondary Market · $17.87M

Leadership Team

Brian Chess
SVP Technology and AI
Eli Johnson
Vice President, Global Sales Productivity
Company data provided by Crunchbase