Data Engineer @ Sand Technologies | Jobright.ai
Data Engineer jobs in United States
29 applicants

Sand Technologies · 8 hours ago

Data Engineer

IT Services and IT Consulting


Responsibilities

Design, implement, and maintain scalable data pipelines for ingesting, processing, and transforming large volumes of data from various sources using tools such as Databricks, Python, and PySpark (a rough illustrative sketch follows this list).
Design and optimize data models and schemas for efficient storage, retrieval, and analysis of structured and unstructured data.
Develop and automate ETL workflows to extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses.
Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics.
Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging cloud-native services for data storage, processing, and analytics.
Implement data quality checks, validation processes, and data governance policies to ensure accuracy, consistency, and compliance with regulations.
Monitor data pipelines and infrastructure performance, identify bottlenecks and optimize for scalability, reliability, and cost-efficiency. Troubleshoot and fix data-related issues.
Build and maintain basic CI/CD pipelines, commit code to version control and deploy data solutions.
Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand requirements, define data architectures, and deliver data-driven solutions.
Create and maintain technical documentation, including data architecture diagrams, ETL workflows, and system documentation, to facilitate understanding and maintainability of data solutions.
Continuously learn and apply best practices in data engineering and cloud computing.
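
To illustrate the kind of pipeline work described above, the following is a minimal, hypothetical PySpark batch sketch: it ingests raw JSON events, applies basic cleaning, and writes a partitioned Delta table. All paths, column names, and table names are invented for illustration and are not taken from the posting; the sketch assumes a Spark environment with Delta Lake available (e.g. Databricks).

    # Hypothetical sketch only: a minimal PySpark batch pipeline.
    # Paths, columns, and table names are illustrative assumptions.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events_etl").getOrCreate()

    # Ingest raw JSON events from cloud object storage (hypothetical path)
    raw = spark.read.json("s3://example-bucket/raw/events/")

    cleaned = (
        raw.dropDuplicates(["event_id"])                        # de-duplicate on a business key
           .withColumn("event_ts", F.to_timestamp("event_ts"))  # normalize timestamp type
           .withColumn("event_date", F.to_date("event_ts"))     # derive a partition column
           .filter(F.col("event_id").isNotNull())               # basic data quality gate
    )

    # Load the result into a lakehouse table (Delta format, partitioned by date)
    (cleaned.write
            .format("delta")
            .mode("append")
            .partitionBy("event_date")
            .saveAsTable("analytics.events_cleaned"))

In practice, a job like this would typically be scheduled and monitored by an orchestrator such as Apache Airflow or a Databricks job, in line with the responsibilities listed above.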

Qualifications


Data Engineering, Python, Big Data Technologies, ETL Processes, SQL, Databricks, Apache Spark, Kafka, Flink, Cloud Platforms, Data Modeling, CI/CD Pipelines, Git, Data Governance, Apache Airflow, Informatica, Talend, AWS, Azure, GCP, Java, Scala

Required

Proven experience as a Data Engineer, or in a similar role, with hands-on experience building and optimizing data pipelines and infrastructure.
Proven experience working with Big Data and tools used to process Big Data.
Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues.
Solid understanding of data engineering principles and practices.
Excellent communication and collaboration skills to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders.
Ability to adapt to new technologies, tools, and methodologies in a dynamic and fast-paced environment.
Ability to write clean, scalable, robust code using Python or similar programming languages.

Preferred

Proficiency in programming languages such as Python, Java, Scala, or SQL for data manipulation and scripting.
Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling.
Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink.
Experience in using modern data architectures, such as lakehouse.
Experience with CI/CD pipelines and version control systems like Git.
Knowledge of ETL tools and technologies such as Apache Airflow, Informatica, or Talend.
Knowledge of data governance and best practices in data management.
Familiarity with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions.
SQL (for database management and querying)
Apache Spark (for distributed data processing)
Apache Spark Streaming, Kafka, or similar (for real-time data streaming; see the streaming sketch after this list)
Experience using data tools in at least one cloud service: AWS, Azure, or GCP (e.g. S3, EMR, Redshift, Glue, Azure Data Factory, Databricks, BigQuery, Dataflow, Dataproc)
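
To illustrate the real-time streaming item above, the following is a rough, hypothetical sketch of Spark Structured Streaming reading a Kafka topic and appending parsed events to a Delta table. The broker address, topic name, schema, and paths are illustrative assumptions, and the code assumes the Kafka connector and Delta Lake are available in the runtime (as on Databricks); it is not taken from the posting.

    # Hypothetical sketch only: Kafka -> Spark Structured Streaming -> Delta table.
    # Brokers, topic, schema, and paths are illustrative assumptions.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("events_stream").getOrCreate()

    # Assumed event schema for JSON messages on the topic
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_ts", TimestampType()),
        StructField("payload", StringType()),
    ])

    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker-1:9092")  # hypothetical broker
             .option("subscribe", "events")                       # hypothetical topic
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    # Append parsed events to a Delta table with a checkpoint for fault tolerance
    query = (
        stream.writeStream.format("delta")
              .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
              .outputMode("append")
              .trigger(processingTime="1 minute")
              .toTable("analytics.events_streaming")
    )
    query.awaitTermination()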

Company

Sand Technologies

Sand Technologies implements AI and digital transformation projects for leading organizations and governments around the world.

Funding

Current Stage: Late Stage
Company data provided by Crunchbase.