Oracle · 2 days ago

Data Engineer - Dallas, TX

4835 Lyndon B Johnson Fwy, Suite 540, Dallas, TX 75244, US

Full-time

Onsite

Mid Level

$38K/yr - $133K/yr

4+ years exp

Oracle is seeking a Data Engineer to build and scale the data infrastructure powering its Agentic AI products. The role involves designing data pipelines and architectures that enable autonomous agents to access and reason over large datasets.

Data Governance, Data Management, Enterprise Software, Information Technology, SaaS, Software

H1B Sponsor Likely

Responsibilities

Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption

Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval

Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the 'recall' and 'precision' of the agent's knowledge retrieval

Develop automated 'Data Cleaning' workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets

Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks

Build low-latency data streams (using Kafka or Flink) to provide agents with 'fresh' data, enabling them to act on real-time market or operational changes

Construct 'Gold Datasets' and versioned data snapshots to help the team benchmark agent performance over time

Qualifications

Data Engineering, Python, Vector Database, Data Tooling, Cloud Infrastructure, Search Knowledge, Data-Centric AI

Required

4+ years in Data Engineering, with at least 1 year focusing on data for LLMs or AI/ML applications

Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration

Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks)

Hands-on experience with at least one major Vector Database and knowledge of similarity search algorithms and metrics (e.g., HNSW indexing, cosine similarity)

Familiarity with hybrid search techniques (combining semantic search with traditional keyword search like Elasticsearch/BM25)

Proficiency in managing data workloads on AWS, Azure, or GCP

Preferred

Experience with LlamaIndex or LangChain for data ingestion

Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points

Familiarity with 'Data-Centric AI' principles—prioritizing data quality over model size

Benefits

Medical, vision, and dental benefits

401k retirement plan

Variable pay/incentives

Paid time off

Paid holidays

Company

Oracle

Glassdoor: 3.8

Oracle is an integrated cloud applications and platform services company that sells a range of enterprise information technology solutions.

Founded in 1977

Austin, Texas, USA

10001+ employees

https://www.oracle.com/

H1B Sponsorship

Oracle has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (1271)

2024 (846)

2023 (995)

2022 (1192)

2021 (985)

2020 (755)

Funding

Current Stage

Public Company

Total Funding

$25.75B

Key Investors

Sequoia Capital

2025-09-24: Post-IPO Debt · $18B

2025-02-03: Post-IPO Debt · $7.75B

1986-03-12: IPO

Leadership Team

Esteban Rubens

Healthcare Field CTO

Gerard Warrens

Field CTO, Business Strategy and Transformative Technologies

Recent News

Oracle highlights IntellIA as a leading Uruguayan Health AI startup

2026-01-10

Investing.com

Less than zero: Paramount reaffirms Warner Bros offer, dumps on cable spinoff

2026-01-09

Business Insider

Why Paramount is now saying the TV networks it wants to buy from WBD are worth $0.00 per share

2026-01-09

Company data provided by Crunchbase