Acceler8 Talent ยท 2 days ago
Software Engineer
Responsibilities
Designing and implementing multimodal web crawlers for scraping and indexing petabytes of data.
Creating large-scale data processing pipelines using tools such as Ray, Apache Spark, Apache Flink, and Google BigQuery.
Scaling deduplication techniques across different modalities and applying heuristic and model-based methods for parsing and filtering crawled data.
Identifying new data sources for integration into pre/post-training datasets.
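As a purely illustrative aside, the exact-duplicate portion of the deduplication work described above can be sketched as a content-hashing pass over crawled documents. This is a minimal single-process sketch, not the team's actual method; all names here are hypothetical, and a production pipeline would shard this across Ray or Spark workers and add near-duplicate detection (e.g. MinHash) on top.

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies hash alike.
    return " ".join(text.lower().split())

def content_key(text: str) -> str:
    # SHA-256 of the normalized text serves as an exact-duplicate key.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

def deduplicate(docs: list[str]) -> list[str]:
    # Keep the first document seen for each content key, drop the rest.
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        key = content_key(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

docs = ["Hello  World", "hello world", "Different page"]
print(deduplicate(docs))  # ['Hello  World', 'Different page']
```

In a distributed setting the `seen` set would typically become a hash-partitioned shuffle key rather than in-process state, which is where engines like Spark or Flink come in.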
Qualifications
Required
Proficiency in distributed computing and parallel processing techniques.
Attention to detail, reliability, and rigorous testing to ensure data quality and integrity.
Experience in designing and maintaining high-performance, scalable data architectures.
Ability to develop and operate an LLM data pipeline from web scraping to data loading.
Benefits
Significant equity
401(k) plan with 6% salary matching
Comprehensive health, dental, and vision insurance coverage
Unlimited paid time off policy
Option to work remotely or in-person
Visa sponsorship and relocation stipend available