myGwork - LGBTQ+ Business Community · 2 days ago
Data Engineer
Responsibilities
Design, develop, and maintain scalable data pipelines using PySpark on Databricks, adhering to best practices and emphasizing software engineering principles.
Implement and optimize stream processing workflows using Kafka for real-time data ingestion and processing.
Utilize Parquet and Avro-formatted data files for efficient storage and retrieval, ensuring data schema compatibility and evolution.
Leverage the Databricks platform on AWS to build and manage data processing workflows and analytics, while adhering to development lifecycle standards.
Harness the power of Databricks Delta Lake and Parquet files for data warehousing, query optimization, and data versioning.
Collaborate closely with data analysts and scientists to understand their requirements and provide reliable and timely data solutions.
Implement robust testing methodologies, including unit testing, integration testing, and end-to-end testing, utilizing Python packages such as pytest.
Contribute to the PySpark/Python ecosystem by creating reusable components, maintaining internal PyPI packages, and evaluating other widely used Python packages.
Monitor data pipelines, identify and resolve issues, and ensure data integrity and quality.
Stay up-to-date with the latest trends and technologies in data engineering, software development, and testing practices, and actively share knowledge with the team.
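The testing responsibility above can be illustrated with a minimal, hypothetical sketch: a pure transformation function factored out of a PySpark job so it can be unit tested with pytest. The function name, event schema, and field names are all invented for illustration and are not from this posting.

```python
# Illustrative sketch only -- not code from the posting. The function
# name, event schema, and field names are all hypothetical.
from datetime import datetime, timezone


def normalize_event(event: dict) -> dict:
    """Normalize one raw Kafka event before it is written to Delta/Parquet.

    Hypothetical contract: 'id' -> string key, 'ts' epoch milliseconds ->
    ISO-8601 UTC timestamp, 'amount' string -> float.
    """
    return {
        "event_id": str(event["id"]),
        "ts": datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc).isoformat(),
        "amount": float(event["amount"]),
    }


def test_normalize_event():
    # The kind of pytest unit test the responsibilities call for:
    # exercise the pure logic without a Spark session or Kafka broker.
    raw = {"id": 42, "ts": 1_700_000_000_000, "amount": "19.99"}
    out = normalize_event(raw)
    assert out["event_id"] == "42"
    assert out["amount"] == 19.99
    assert out["ts"].endswith("+00:00")
```

Keeping row-level logic in plain functions like this, and calling them from the PySpark job (e.g. inside a UDF or a mapped partition), is one common way to make pipelines unit-testable with pytest while reserving integration tests for the Spark/Kafka layers.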
Qualifications
Required
Bachelor's or Master's degree in Computer Science or a related field.
Minimum 5 years of real-world Data Engineering experience working on large-scale data projects.
Strong proficiency in PySpark, Python, and shell scripting, with a focus on software engineering best practices and a deep understanding of the development lifecycle.
Experience working with workflow management tools such as Airflow.
Experience with stream processing technologies, preferably Kafka.
Familiarity with Avro data serialization format and its usage in data engineering workflows.
Expertise in using Databricks platform on AWS for data processing and analytics.
Solid understanding of data warehousing concepts and experience with Delta Lake and Parquet files.
Proficiency in SQL and experience with relational databases.
Strong testing skills, with experience in implementing and executing unit tests, integration tests, and end-to-end tests using Python packages such as pytest.
Familiarity with the Python ecosystem, including PyPI packages and their integration into data engineering workflows.
Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.
Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
Working experience with Databricks and PySpark.
Proficiency in writing complex SQL queries.
Working experience with cloud platforms such as AWS or Azure (preferably AWS).
Working experience with Airflow.
Experience working with very large datasets.
Preferred
Experience working with reporting tools such as Tableau.
Past experience working on Machine Learning projects.
Past experience working in finance.
Benefits
Medical care
Insurance
Savings plans
Flexible Work Programs
Development programs
Educational support
Paid volunteer days
Matching gift programs
Employee networks
Company
myGwork - LGBTQ+ Business Community
myGwork is the largest global platform for the LGBTQ+ business community.
Funding
Current Stage: Early Stage
Total Funding: $4.77M
Key Investors: 24 Haymarket, Innovate UK
2023-08-17: Series Unknown · $1.66M
2023-08-17: Grant · Undisclosed
2021-12-07: Series A · $2.12M
Company data provided by Crunchbase.