Oracle · 2 days ago
Data Engineer - Dallas, TX
Oracle is seeking a Data Engineer to build and scale the data infrastructure powering its Agentic AI products. The role involves designing data pipelines and architectures that enable autonomous agents to access and reason over large datasets.
Data Governance, Data Management, Enterprise Software, Information Technology, SaaS, Software
Responsibilities
Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption
Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval
Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the 'recall' and 'precision' of the agent's knowledge retrieval
Develop automated 'Data Cleaning' workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets
Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks
Build low-latency data streams (using Kafka or Flink) to provide agents with 'fresh' data, enabling them to act on real-time market or operational changes
Construct 'Gold Datasets' and versioned data snapshots to help the team benchmark agent performance over time
Qualifications
Required
4+ years in Data Engineering, with at least 1 year focusing on data for LLMs or AI/ML applications
Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration
Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks)
Hands-on experience with at least one major Vector Database and knowledge of similarity search algorithms and metrics (e.g., HNSW, cosine similarity)
Familiarity with hybrid search techniques (combining semantic search with traditional keyword search like Elasticsearch/BM25)
Proficiency in managing data workloads on AWS, Azure, or GCP
Preferred
Experience with LlamaIndex or LangChain for data ingestion
Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points
Familiarity with 'Data-Centric AI' principles—prioritizing data quality over model size
Benefits
Medical, vision, and dental benefits
401k retirement plan
Variable pay/incentives
Paid time off
Paid holidays
Company
Oracle
Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.
H1B Sponsorship
Oracle has a track record of offering H1B sponsorships. Please note that this does not
guarantee sponsorship for this specific role. Additional information is presented below
for your reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (1271)
2024 (846)
2023 (995)
2022 (1192)
2021 (985)
2020 (755)
Funding
Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital
2025-09-24 · Post-IPO Debt · $18B
2025-02-03 · Post-IPO Debt · $7.75B
1986-03-12 · IPO
Company data provided by Crunchbase