
GEICO · 3 months ago

Senior Staff Engineer – Data Lakehouse Platform

GEICO is a leading insurance company that values innovation and aims to exceed customer expectations. They are seeking a Senior Staff Engineer to build high-performance data infrastructure and lead the implementation of a core Data Lakehouse for various business verticals. The role involves designing scalable components, leading architecture sessions, and mentoring other engineers.

Auto Insurance · Financial Services · Government · Insurance · Internet · Mobile

Note: No H1B sponsorship

Responsibilities

Scope, design, and build scalable, resilient Data Lakehouse components
Lead architecture sessions and reviews with peers and leadership
Spearhead new software evaluations and innovate with new tooling
Design and lead the development and implementation of compute-efficiency projects such as the Smart Spark Auto-Tuning feature
Drive performance regression testing, benchmarking, and continuous performance profiling
Accountable for the quality, usability, and performance of the solutions
Determine and support resource requirements, evaluate operational processes, measure outcomes to ensure desired results, demonstrate adaptability, and sponsor continuous learning
Collaborate with customers, team members, and other engineering teams to solve our toughest problems
Be a role model and mentor, helping to coach and strengthen the technical expertise and know-how of our engineering community
Consistently share best practices and improve processes within and across teams
Share your passion for staying on top of the latest open-source projects, experimenting with and learning new technologies, participating in internal and external OSS technology communities, and mentoring other members of the engineering community

Qualifications

Spark internals · Tuning Spark jobs · Automated optimization systems · Scala · Apache Spark · Cloud computing · Open-source table formats · Distributed systems · Container technology · DevOps concepts · Java · Python · Performance regression testing · Mentoring · Collaboration

Required

Deep knowledge of Spark internals, including Catalyst, Tungsten, AQE, CBO, scheduling, shuffle management, and memory tuning
Proven experience in tuning and optimizing Spark jobs on Hyper-Scale Spark Compute Platforms
Mastery of Spark configuration parameters, resource tuning, partitioning strategies, and job execution behaviors
Experience building automated optimization systems – from config auto-tuners to feedback loops and adaptive pipelines
Strong software engineering skills in Scala, Java, and Python
Ability to build tooling to surface meaningful performance insights at scale
Deep understanding of auto-scaling and cost-efficiency strategies in cloud-based Spark environments
Exemplary ability to design and develop, perform experiments, and influence engineering direction and product roadmap
Advanced experience developing new and enhancing existing open-source based Data Lakehouse platform components
Experience cultivating relationships with and contributing to open-source software projects
Experience with open-source table formats (Apache Iceberg, Delta, Hudi or equivalent)
Advanced experience with open-source compute engines (Apache Spark, Apache Flink, Trino/Presto, or equivalent)
Experience with cloud computing (AWS, Microsoft Azure, Google Cloud, Hybrid Cloud, or equivalent)
Expertise in developing distributed systems that are scalable, resilient, and highly available
Experience with container technologies such as Docker, and with Kubernetes platform development
Experience with continuous delivery and infrastructure as code
In-depth knowledge of DevOps concepts and cloud architecture
Experience in Azure Network (Subscription, Security zoning, etc.) or equivalent
10+ years of professional experience in data software development, programming languages, and big data technologies
8+ years of experience with architecture and design
6+ years of experience with distributed systems, with at least 3 years focused on Apache Spark
6+ years of experience in open-source frameworks
4+ years of experience with AWS, GCP, Azure, or another cloud service
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field such as physics or mathematics
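The "automated optimization systems – from config auto-tuners to feedback loops" requirement above can be illustrated with a minimal sketch. This is a hypothetical heuristic, not GEICO's actual tuner: it nudges a Spark job's `spark.sql.shuffle.partitions` setting toward a target per-task shuffle size, damping each adjustment to avoid oscillation between runs.

```python
# Minimal sketch of a Spark config auto-tuning feedback loop (hypothetical
# heuristic; target size and damping strategy are illustrative assumptions).

TARGET_TASK_BYTES = 128 * 1024 * 1024  # aim for ~128 MiB of shuffle data per task


def tune_shuffle_partitions(current_partitions: int, shuffle_bytes: int) -> int:
    """Suggest a partition count so each task handles ~TARGET_TASK_BYTES."""
    suggested = max(1, round(shuffle_bytes / TARGET_TASK_BYTES))
    # Damp the change: move halfway toward the suggestion to avoid oscillation.
    return max(1, (current_partitions + suggested) // 2)


def feedback_loop(initial_partitions: int, observed_shuffle_bytes: list) -> int:
    """Feed each run's observed shuffle volume back into the next run's config."""
    partitions = initial_partitions
    for shuffle_bytes in observed_shuffle_bytes:
        partitions = tune_shuffle_partitions(partitions, shuffle_bytes)
    return partitions


# Example: a job that shuffles ~10 GiB per run, starting from 200 partitions,
# converges toward 80 partitions (10 GiB / 128 MiB) over repeated runs.
final = feedback_loop(200, [10 * 1024**3] * 10)
```

A production system would read these metrics from the Spark event log or listener API and write the suggestion back into job configuration, but the feedback shape is the same.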

Preferred

Active or past Apache Spark Committer (or significant code contributions to OSS Apache Spark)
Experience with ML-based optimization techniques (e.g., reinforcement learning, Bayesian tuning, predictive models)
Contributions to other big data/open-source projects (e.g., Delta Lake, Iceberg, Flink, Presto, Trino)
Background in designing performance regression frameworks and benchmarking suites
Deep understanding of Spark accelerators (Spark RAPIDS, Apache Gluten, Apache Comet, Apache Auron, etc.); committer status in one or more of these projects is a plus
Skilled in documenting methodologies and producing publication-style papers, whitepapers, and internal research briefs

Benefits

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being.
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance.
Access to additional benefits like mental healthcare as well as fertility and adoption assistance.
Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year.

Company

GEICO, Government Employees Insurance Company, has been providing affordable auto insurance since 1936. It is a sub-organization of Berkshire Hathaway.

Funding

Current Stage: Late Stage
Total Funding: Unknown
Acquired: 1996-01-01

Leadership Team

Todd Combs
Chairman, President, and Chief Executive Officer
Clayton Johnson
Sr. Director of Product Management
Company data provided by crunchbase