Tiger Analytics · 4 months ago

Gen AI Data Engineer

Tiger Analytics is a fast-growing advanced analytics consulting firm seeking experienced Machine Learning Engineers with generative AI expertise. The role involves building and maintaining data platforms and pipelines, using cloud technologies, and collaborating with teams to deliver analytics solutions.

Advertising · Analytics · Big Data · Consulting · Machine Learning · News
H1B Sponsor Likely

Qualifications

Python · Data Pipelines · Cloud Platforms · Data Warehousing · Machine Learning · Apache Airflow · GitHub · Big Data Technologies · Data Visualization · Generative AI · CI/CD · Communication · Collaboration · Problem-Solving · Adaptability · Team Collaboration

Required

Programming Languages: Proficiency in Python, SQL, and PySpark
Data Warehousing: Experience with Snowflake, NoSQL databases, and Neo4j
Data Pipelines: Proficiency with Apache Airflow
Cloud Platforms: Familiarity with AWS (S3, RDS, Lambda, AWS Batch, SageMaker Processing jobs, CloudFormation, etc.) or GCP (Vertex AI RAG, data pipelines, BigQuery, GKE)
Operating Systems: Experience with Linux
Batch/Realtime Pipelines: Experience in building and deploying various pipelines
Version Control: Experience with GitHub
Development Tools: Proficiency with VS Code
Engineering Practices: Skills in testing, deployment automation, DevOps/SysOps
Communication: Strong presentation and communication skills
Collaboration: Experience working with onshore/offshore teams
Industry Experience: 8+ years of experience in data engineering, platform engineering, or related fields, with deep expertise in designing and building distributed data systems and large-scale data warehouses
Data Platforms: Proven track record of architecting data platforms capable of processing petabytes of data and supporting real-time and batch ingestion processes
Data Pipelines: Strong experience in building robust data pipelines for document ingestion, indexing, and retrieval to support scalable RAG solutions, plus proficiency in information retrieval systems and vector search technologies (e.g., FAISS, Pinecone, Elasticsearch, Milvus); a minimal retrieval sketch follows this list
Graph Algorithms: Experience with graphs/graph algorithms, LLMs, optimization algorithms, relational databases, and diverse data formats
Data Infrastructure: Proficient in infrastructure and architecture for optimal extraction, transformation, and loading of data from various data sources
Data Curation: Hands-on experience in curating and collecting data from a variety of traditional and non-traditional sources
Ontologies: Experience in building ontologies in the knowledge retrieval space, schema-level constructs (including higher-level classes, punning, and property inheritance), and openCypher
Integration: Experience in integrating external databases, APIs, and knowledge graphs into RAG systems to improve contextualization and response generation
Experimentation: Experience conducting experiments to evaluate the effectiveness of RAG workflows, analyzing results, and iterating to achieve optimal performance
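
As a rough illustration of the document ingestion-and-retrieval pattern behind the RAG requirements above, here is a minimal Python sketch using FAISS, one of the vector search libraries named in the list. The `embed()` helper and the sample documents are hypothetical placeholders, not details from the posting; a real pipeline would call an actual embedding model.

```python
import numpy as np
import faiss  # vector search library named in the requirements

DIM = 384  # embedding width; depends on the chosen embedding model (assumption)

def embed(texts):
    # Hypothetical embedding stand-in: replace with a real model call.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), DIM), dtype=np.float32)

# Ingestion: embed documents and add them to a flat L2 index.
docs = ["quarterly revenue report", "onboarding checklist", "incident postmortem"]
index = faiss.IndexFlatL2(DIM)
index.add(embed(docs))

# Retrieval: embed the query and fetch the k nearest documents,
# which would then feed the generation step of a RAG workflow.
distances, ids = index.search(embed(["revenue numbers"]), 2)
for i in ids[0]:
    print(docs[i])
```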

Preferred

Big Data Technologies: Experience with Hadoop and Spark
Data Visualization: Proficiency with Streamlit and dashboards
APIs: Experience in building and maintaining internal APIs
Machine Learning: Basic understanding of ML concepts
Generative AI: Familiarity with generative AI tools and techniques
Knowledge Graphs: Experience with knowledge graph creation and retrieval
Vector Databases: Proficiency in managing vector databases
Data Persistence: Ability to develop and maintain multiple forms of data persistence and retrieval (RDBMS, vector databases, object storage buckets, graph databases, knowledge graphs, etc.)
Cloud Technologies: Experience with AWS, especially SageMaker, Lambda, OpenSearch
Automation Tools: Experience with Airflow DAGs, AutoSys, and cron jobs (a minimal DAG sketch follows this list)
Unstructured Data Management: Experience in managing data in unstructured forms (audio, video, image, text, etc.)
CI/CD: Expertise in continuous integration and deployment using Jenkins and GitHub Actions
Infrastructure as Code: Advanced skills in Terraform and CloudFormation
Containerization: Knowledge of Docker and Kubernetes
Monitoring and Optimization: Proven ability to monitor system performance, reliability, and security, and optimize them as needed
Security Best Practices: In-depth understanding of security best practices in cloud environments
Scalability: Experience in designing and managing scalable infrastructure
Disaster Recovery: Knowledge of disaster recovery and business continuity planning
Problem-Solving: Excellent analytical and problem-solving abilities
Adaptability: Ability to stay up-to-date with the latest industry trends and adapt to new technologies and methodologies
Team Collaboration: Proven ability to work well in a team environment and contribute to a positive, collaborative culture
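
To make the scheduling items above concrete, here is a minimal Apache Airflow DAG sketch: a daily two-step pipeline. The `dag_id`, schedule, and task bodies are illustrative assumptions, not details from the posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull raw records from a source system.
    pass

def load():
    # Placeholder: write transformed records to the warehouse.
    pass

# A hypothetical daily pipeline; name and schedule are assumptions.
with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow's cron-style scheduling
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```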

Benefits

This position offers an excellent opportunity for significant career development in a fast-growing and challenging entrepreneurial environment with a high degree of individual responsibility.

Company

Tiger Analytics

Tiger Analytics offers data analytics and predictive modeling solutions for retail, social media, and online advertising sectors.

H1B Sponsorship

Tiger Analytics has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the data below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2020: 64 · 2021: 123 · 2022: 176 · 2023: 93 · 2024: 158 · 2025: 259

Funding

Current Stage: Late Stage

Leadership Team

Mahesh Kumar
Founder and CEO
Company data provided by Crunchbase.