Abaka AI · 2 months ago
Data Engineer
Abaka AI is a company focused on data engineering and artificial intelligence solutions. The company is seeking a Data Engineer to collaborate with clients on data requirements, develop scalable data pipelines, and address technical challenges in multimodal data processing.
Data Collection and Labeling · Machine Learning · Natural Language Processing
Responsibilities
Collaborate closely with foundation model clients to understand their data requirements; coordinate internal teams to develop tailored delivery plans and ensure on-time, high-quality data delivery (e.g., meeting format, precision, and volume expectations)
Lead the development of mid- to long-term plans for the data engineering function. Build scalable, end-to-end pipelines for multimodal data (text, image, audio, video, 3D point cloud, etc.) including data sourcing, cleaning, annotation, QA, storage, and iterative optimization for training, fine-tuning, and evaluation
Drive solutions to core technical challenges in multimodal data processing, such as cross-modal alignment (e.g., image-text semantic matching), large-scale data cleaning (e.g., deduplication, denoising, format normalization), annotation efficiency, and data encryption/security
Work cross-functionally with algorithm, product, and business teams: for example, providing feedback to model teams on data bottlenecks, helping refine internal tooling and services, or supporting client-facing teams with technical documentation and pre-sales support
Assess and optimize the cost structure of data processing operations, including headcount, infrastructure, and tooling—striking a balance between quality, efficiency, and scalability
Qualifications
Required
Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands-on experience in large-scale data systems
1+ years of experience in data engineering or data operations; leadership experience is highly valued. Prior involvement in LLM or multimodal dataset preparation is a strong plus
Deep understanding of end-to-end multimodal data workflows, with hands-on experience in at least two modalities (e.g., text, images, audio, video)
Proficient in designing technical architectures for large-scale data pipelines (e.g., distributed processing, automation frameworks). Familiarity with data privacy and security best practices (e.g., access control, data anonymization)
Strong execution and team management skills—able to translate high-level objectives into actionable plans and drive team outcomes
Excellent communication and cross-functional collaboration skills—able to clearly convey technical and operational requirements, resolve conflicts, and manage stakeholder expectations
High sense of ownership and resilience—comfortable working in a fast-paced, evolving AI landscape and capable of navigating urgent delivery timelines
Company
Abaka AI
Abaka AI is a leading AI company committed to becoming the data partner of the artificial intelligence industry.
H1B Sponsorship
Abaka AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (2)
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase