Definitive Healthcare · 5 hours ago
Senior Big Data Engineer
Analytics · Artificial Intelligence (AI)
H1B Sponsor Likely
Responsibilities
Build and maintain scalable data pipelines using Python, Spark, and Databricks.
Implement data workflows and ETL processes using Apache Airflow.
Integrate data from various sources (AWS, GCP, on-premises) into a unified data warehouse.
Handle a variety of data formats such as CSV, text, XML, Parquet, and Delta.
Ensure data quality and integrity through effective data cleansing and curation practices.
Manage and optimize data storage solutions, ensuring high availability and performance.
Automate observability of data and workloads.
Implement and manage Unity Catalog for metadata management.
Ensure data governance policies are followed, including data security, privacy, and compliance.
Develop and maintain data documentation and data dictionaries.
Automate data observability across pipelines.
Optimize Spark jobs for performance and efficiency.
Investigate and resolve performance bottlenecks in Spark applications.
Utilize JVM tuning techniques to improve application performance.
Implement and manage the Medallion architecture for the data maturity lifecycle.
Ensure data is appropriately processed and categorized at different stages (bronze, silver, gold) to maximize its usability and value.
Work closely with data scientists, analysts, and other stakeholders to understand data needs and deliver solutions.
Implement CI/CD pipelines to automate deployment and testing of data infrastructure.
Stay up to date with the latest industry trends and technologies to continuously improve data engineering practices.
Qualifications
Required
Hands-on Python or Scala programming.
Strong experience with Apache Spark and Databricks.
Hands-on experience with Apache Airflow or similar workflow orchestration tools.
Strong fundamentals in data modeling and processing for large-scale data volumes.
Knowledge of data cleansing and curation techniques.
Familiarity with Unity Catalog or other metadata management tools.
Understanding of data governance principles and best practices.
Experience with cloud platforms (AWS and GCP).
Strong understanding of normalization and denormalization.
Proficiency in CI/CD tools and practices (e.g., Jenkins, GitLab CI).
Experience with JVM tuning and Spark job performance investigation.
Experience with the Medallion architecture for the data maturity lifecycle.
Familiarity with containerization.
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills.
Ability to work independently and as part of a team.
Detail-oriented with a focus on delivering high-quality work.
Preferred
Certification in cloud platforms (AWS Certified Data Analytics, Google Cloud Professional Data Engineer, etc.).
Familiarity with SQL and NoSQL databases.
Experience in a similar role within a fast-paced, data-driven environment.
Benefits
Competitive benefits package, including great healthcare benefits and a 401(k) match.
Company
Definitive Healthcare
Definitive Healthcare aims to transform data, analytics and expertise into healthcare commercial intelligence.
H1B Sponsorship
Definitive Healthcare has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The figures below are provided for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2023: 7
2022: 13
2021: 25
2020: 9
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: 22C Capital
2021-09-15: IPO
2019-10-02: Private Equity
2015-03-02: Private Equity
Company data provided by Crunchbase