This job has closed.

BioSpace · 1 week ago

Principal Data Engineer

Thousand Oaks, CA

Full-time

Hybrid

Senior Level

5+ years exp

Responsibilities

Lead the architecture, design, prototyping, build, testing, implementation, and DevOps of Coverage & Pricing's Deal Modeling, Forecasting, and Analytics Products on GCO's AWS Data Lake, Databricks, Anaplan, and Tableau platforms.

Provide strategic leadership in designing and implementing Anaplan system architecture to support business objectives.

Architect, design, and implement scalable and efficient data pipelines and systems.

Develop and maintain data warehousing solutions to support analytics and reporting.

Ensure data infrastructure is robust, secure, and optimized for performance.

Provide technical guidance and mentorship to junior data engineers.

Lead architectural reviews and ensure adherence to best practices in data engineering.

Collaborate with product owners, data scientists, analysts, and business stakeholders to understand data needs and deliver solutions.

Develop and maintain scalable Extract, Transform, and Load (ETL) pipelines utilizing various technologies.

Ensure data quality, consistency, and reliability through effective data governance practices.

Monitor and optimize data systems for performance, scalability, and security.

Stay updated with the latest industry trends and advancements in software architecture and development.

Identify opportunities for process improvements and drive initiatives to enhance the efficiency of the development lifecycle.

Qualifications

Required (any one of the following)

Doctorate degree and 2 years of experience in Computer Science, Data Engineering, or a related field

Master's degree and 4 years of experience in Computer Science, Data Engineering, or a related field

Bachelor's degree and 6 years of experience in Computer Science, Data Engineering, or a related field

Associate degree and 10 years of experience in Computer Science, Data Engineering, or a related field

High school diploma / GED and 12 years of experience in Computer Science, Data Engineering, or a related field

Preferred

Five plus years of experience in data engineering, with a focus on large-scale data systems

Three plus years of proven experience in designing and implementing complex data pipelines and ETL processes

Four plus years of experience in data warehousing, data modeling, and database design using Databricks and the AWS/Azure stack

Proficiency in SQL, Python, R, and other relevant programming languages

Strong experience with big data technologies (e.g., Hadoop, Spark, Kafka)

Expertise in cloud data platforms (e.g., AWS, Azure, Google Cloud)

Solid experience in data visualization tools (e.g., Tableau, Power BI)

Excellent problem-solving skills and attention to detail

Strong communication and collaboration skills

Ability to work effectively in a fast-paced, Agile development environment

Experience working in Agile teams & DevOps frameworks (Jenkins, JIRA, GitHub) and designing CI/CD processes

Certification in Agile or Scaled Agile Framework (SAFe)

Understanding of containerization and orchestration tools (e.g., Docker, Kubernetes)

Experience with machine learning and data analytics frameworks

Previous experience designing and building large-scale data platforms and systems including AI solutions

Familiarity with the machine learning life cycle, including feature stores, MLflow, model registries, model deployment, model serving, and model monitoring

Certification in Anaplan

Familiarity with PySpark and other data processing libraries, machine learning frameworks (such as TensorFlow, Keras, or PyTorch), and other machine learning libraries

Ability to quickly assimilate knowledge and learn platforms, languages, tools, and technologies

Benefits

Comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts.

A discretionary annual bonus program or, for field sales representatives, a sales-based incentive plan

Stock-based long-term incentives

Award-winning time-off plans and bi-annual company-wide shutdowns

Flexible work models, including remote work arrangements, where possible