
Oracle · 2 days ago

Data Engineer - Dallas, TX

Oracle is seeking a Data Engineer to build and scale the data infrastructure powering its agentic AI products. The role involves designing data pipelines and architectures that let autonomous agents access and reason over large datasets.

Data Governance · Data Management · Enterprise Software · Information Technology · SaaS · Software
H1B Sponsor Likely

Responsibilities

Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption
Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval
Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the 'recall' and 'precision' of the agent's knowledge retrieval
Develop automated 'Data Cleaning' workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets
Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks
Build low-latency data streams (using Kafka or Flink) to provide agents with 'fresh' data, enabling them to act on real-time market or operational changes
Construct 'Gold Datasets' and versioned data snapshots to help the team benchmark agent performance over time

Qualifications

Data Engineering · Python · Vector Database · Data Tooling · Cloud Infrastructure · Search Knowledge · Data-Centric AI

Required

4+ years in Data Engineering, with at least 1 year focusing on data for LLMs or AI/ML applications
Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration
Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks)
Hands-on experience with at least one major Vector Database and knowledge of similarity search algorithms and metrics (e.g., HNSW indexing, cosine similarity)
Familiarity with hybrid search techniques (combining semantic search with traditional keyword search like Elasticsearch/BM25)
Proficiency in managing data workloads on AWS, Azure, or GCP

Preferred

Experience with LlamaIndex or LangChain for data ingestion
Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points
Familiarity with 'Data-Centric AI' principles—prioritizing data quality over model size

Benefits

Medical, vision, and dental benefits
401k retirement plan
Variable pay/incentives
Paid time off
Paid holidays

Company

Oracle is an integrated cloud application and platform services company that sells a range of enterprise information technology solutions.

H1B Sponsorship

Oracle has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data provided by the US Department of Labor)
Total Sponsorships by Year
2025: 1,271
2024: 846
2023: 995
2022: 1,192
2021: 985
2020: 755

Funding

Current Stage: Public Company
Total Funding: $25.75B
Key Investors: Sequoia Capital
2025-09-24 · Post-IPO Debt · $18B
2025-02-03 · Post-IPO Debt · $7.75B
1986-03-12 · IPO

Leadership Team

Esteban Rubens
Healthcare Field CTO

Gerard Warrens
Field CTO, Business Strategy and Transformative Technologies

Company data provided by Crunchbase