Photon · 1 day ago
Data Engineer - Dallas, TX
Photon is seeking a Data Engineer to build and scale the data infrastructure for their Agentic AI products. The role involves designing data pipelines, optimizing vector databases, and ensuring data quality for AI applications.
E-Commerce, Information Technology, Mobile Apps, Web Design, Web Development
Responsibilities
Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption
Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval
Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the "recall" and "precision" of the agent's knowledge retrieval
Develop automated "Data Cleaning" workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets
Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks
Build low-latency data streams (using Kafka or Flink) to provide agents with "fresh" data, enabling them to act on real-time market or operational changes
Construct "Gold Datasets" and versioned data snapshots to help the team benchmark agent performance over time
Qualifications
Required
4+ years in Data Engineering, with at least 1 year focusing on data for LLMs or AI/ML applications
Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration
Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks)
Hands-on experience with at least one major Vector Database and knowledge of similarity search indexes and metrics (e.g., HNSW, cosine similarity)
Familiarity with hybrid search techniques (combining semantic search with traditional keyword search like Elasticsearch/BM25)
Proficiency in managing data workloads on AWS, Azure, or GCP
Preferred
Experience with LlamaIndex or LangChain for data ingestion
Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points
Familiarity with 'Data-Centric AI' principles—prioritizing data quality over model size
Benefits
Medical, vision, and dental benefits
401k retirement plan
Variable pay/incentives
Paid time off
Paid holidays
Company
Photon
Photon is a technology corporation that provides Strategy Consulting, Creative Design, and Technology Services to global enterprises.
H1B Sponsorship
Photon has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown; the highlighted segment represents job fields similar to this job)
Trends of Total Sponsorships
2025 (233)
2024 (168)
2023 (236)
2022 (184)
2021 (157)
2020 (249)
Funding
Current Stage
Late Stage
Recent News
Bangkok Post
2024-05-20
2024-04-02
Company data provided by Crunchbase