Tiger Analytics · 4 months ago

Gen AI Data Engineer

Tiger Analytics is a fast-growing advanced analytics consulting firm seeking experienced Machine Learning Engineers with generative AI expertise. The role involves building and maintaining data platforms and pipelines, using cloud technologies, and collaborating with teams to deliver analytics solutions.

Advertising · Analytics · Big Data · Consulting · Machine Learning · News
H1B Sponsor Likely

Qualifications

Python · Data Pipelines · Cloud Platforms · Data Warehousing · Machine Learning · Apache Airflow · GitHub · Big Data Technologies · Data Visualization · Generative AI · CI/CD · Communication · Collaboration · Problem-Solving · Adaptability · Team Collaboration

Required

Programming Languages: Proficiency in Python, SQL, and PySpark
Data Warehousing: Experience with Snowflake, NoSQL databases, and Neo4j
Data Pipelines: Proficiency with Apache Airflow
Cloud Platforms: Familiarity with AWS (S3, RDS, Lambda, AWS Batch, SageMaker Processing jobs, CloudFormation, etc.) or GCP (Vertex AI RAG, data pipelines, BigQuery, GKE)
Operating Systems: Experience with Linux
Batch/Realtime Pipelines: Experience in building and deploying various pipelines
Version Control: Experience with GitHub
Development Tools: Proficiency with VS Code
Engineering Practices: Skills in testing, deployment automation, DevOps/SysOps
Communication: Strong presentation and communication skills
Collaboration: Experience working with onshore/offshore teams
Industry Experience: 8+ years of experience in data engineering, platform engineering, or related fields, with deep expertise in designing and building distributed data systems and large-scale data warehouses
Data Platforms: Proven track record of architecting data platforms capable of processing petabytes of data and supporting real-time and batch ingestion processes
Data Pipelines: Strong experience in building robust data pipelines for document ingestion, indexing, and retrieval to support scalable RAG solutions, plus proficiency in information retrieval systems and vector search technologies (e.g., FAISS, Pinecone, Elasticsearch, Milvus); a minimal retrieval sketch follows this list
Graph Algorithms: Experience with graphs/graph algorithms, LLMs, optimization algorithms, relational databases, and diverse data formats
Data Infrastructure: Proficient in infrastructure and architecture for optimal extraction, transformation, and loading of data from various data sources
Data Curation: Hands-on experience in curating and collecting data from a variety of traditional and non-traditional sources
Ontologies: Experience in building ontologies in the knowledge retrieval space, schema-level constructs (including higher-level classes, punning, and property inheritance), and openCypher
Integration: Experience in integrating external databases, APIs, and knowledge graphs into RAG systems to improve contextualization and response generation
Experimentation: Experience conducting experiments to evaluate the effectiveness of RAG workflows, analyzing results, and iterating to achieve optimal performance
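
As a rough illustration of the document ingestion-and-retrieval pattern behind the RAG requirements above, here is a minimal Python sketch using FAISS, one of the vector search libraries named in the list. The `embed()` helper and the sample documents are hypothetical placeholders, not details from the posting; a real pipeline would call an actual embedding model.

```python
import numpy as np
import faiss  # vector search library named in the requirements

DIM = 384  # embedding width; depends on the chosen embedding model (assumption)

def embed(texts):
    # Hypothetical embedding stand-in: replace with a real model call.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), DIM), dtype=np.float32)

# Ingestion: embed documents and add them to a flat L2 index.
docs = ["quarterly revenue report", "onboarding checklist", "incident postmortem"]
index = faiss.IndexFlatL2(DIM)
index.add(embed(docs))

# Retrieval: embed the query and fetch the k nearest documents,
# which would then feed the generation step of a RAG workflow.
distances, ids = index.search(embed(["revenue numbers"]), 2)
for i in ids[0]:
    print(docs[i])
```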

Preferred

Big Data Technologies: Experience with Hadoop and Spark
Data Visualization: Proficiency with Streamlit and dashboards
APIs: Experience in building and maintaining internal APIs
Machine Learning: Basic understanding of ML concepts
Generative AI: Familiarity with generative AI tools and techniques
Knowledge Graphs: Experience with knowledge graph creation and retrieval
Vector Databases: Proficiency in managing vector databases
Data Persistence: Ability to develop and maintain multiple forms of data persistence and retrieval (RDBMS, vector databases, object storage buckets, graph databases, knowledge graphs, etc.)
Cloud Technologies: Experience with AWS, especially SageMaker, Lambda, OpenSearch
Automation Tools: Experience with Airflow DAGs, AutoSys, and cron jobs (a minimal DAG sketch follows this list)
Unstructured Data Management: Experience in managing data in unstructured forms (audio, video, image, text, etc.)
CI/CD: Expertise in continuous integration and deployment using Jenkins and GitHub Actions
Infrastructure as Code: Advanced skills in Terraform and CloudFormation
Containerization: Knowledge of Docker and Kubernetes
Monitoring and Optimization: Proven ability to monitor system performance, reliability, and security, and optimize them as needed
Security Best Practices: In-depth understanding of security best practices in cloud environments
Scalability: Experience in designing and managing scalable infrastructure
Disaster Recovery: Knowledge of disaster recovery and business continuity planning
Problem-Solving: Excellent analytical and problem-solving abilities
Adaptability: Ability to stay up-to-date with the latest industry trends and adapt to new technologies and methodologies
Team Collaboration: Proven ability to work well in a team environment and contribute to a positive, collaborative culture
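
To make the scheduling items above concrete, here is a minimal Apache Airflow DAG sketch: a daily two-step pipeline. The `dag_id`, schedule, and task bodies are illustrative assumptions, not details from the posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull raw records from a source system.
    pass

def load():
    # Placeholder: write transformed records to the warehouse.
    pass

# A hypothetical daily pipeline; name and schedule are assumptions.
with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow's cron-style scheduling
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```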

Benefits

This position offers an excellent opportunity for significant career development in a fast-growing and challenging entrepreneurial environment with a high degree of individual responsibility.

Company

Tiger Analytics

Tiger Analytics offers data analytics and predictive modeling solutions for retail, social media, and online advertising sectors.

H1B Sponsorship

Tiger Analytics has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the data below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2020: 64 · 2021: 123 · 2022: 176 · 2023: 93 · 2024: 158 · 2025: 259

Funding

Current Stage: Late Stage

Leadership Team

Mahesh Kumar
Founder and CEO
Company data provided by Crunchbase.