Senior Staff Engineer – Data Lakehouse Platform
GEICO is a leading insurance company that offers quality coverage and innovative solutions to its customers. It is seeking a Senior Staff Engineer to build high-performance data infrastructure and lead the implementation of a core Data Lakehouse, driving the company's transformation into a tech organization focused on engineering excellence.
Auto Insurance · Financial Services · Government · Insurance · Internet · Mobile
Responsibilities
Scope, design, and build scalable, resilient Data Lakehouse components
Lead architecture sessions and reviews with peers and leadership
Spearhead new software evaluations and innovate with new tooling
Design and lead the development and implementation of compute-efficiency projects such as the Smart Spark Auto-Tuning feature
Drive performance regression testing, benchmarking, and continuous performance profiling
Be accountable for the quality, usability, and performance of the solutions
Determine and support resource requirements, evaluate operational processes, measure outcomes to ensure desired results, demonstrate adaptability, and sponsor continuous learning
Collaborate with customers, team members, and other engineering teams to solve our toughest problems
Be a role model and mentor, helping to coach and strengthen the technical expertise and know-how of our engineering community
Consistently share best practices and improve processes within and across teams
Share your passion for staying on top of the latest open-source projects, experimenting with and learning new technologies, participating in internal and external OSS technology communities, and mentoring other members of the engineering community
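A "Smart Spark Auto-Tuning" feature of the kind described above typically closes a feedback loop over job metrics: observe a run, derive config overrides, apply them on the next run. Purely as an illustrative sketch (the heuristic, function names, and 128 MiB target below are hypothetical examples, not GEICO's implementation; the Spark property names are standard Apache Spark settings):

```python
# Hypothetical sketch of a config auto-tuner feedback loop: given the shuffle
# volume observed on a previous run, recommend a partition count so that each
# shuffle partition lands near a target size.

TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # assumed target: ~128 MiB per partition

def recommend_shuffle_partitions(observed_shuffle_bytes: int,
                                 min_partitions: int = 200,
                                 max_partitions: int = 20_000) -> int:
    """Suggest a value for spark.sql.shuffle.partitions from observed metrics."""
    ideal = observed_shuffle_bytes // TARGET_PARTITION_BYTES + 1
    return max(min_partitions, min(max_partitions, ideal))

def next_run_conf(observed_shuffle_bytes: int) -> dict:
    """Build the config overrides to apply on the job's next run."""
    return {
        "spark.sql.shuffle.partitions": str(recommend_shuffle_partitions(observed_shuffle_bytes)),
        # AQE can further coalesce small partitions at runtime
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
    }

print(next_run_conf(512 * 1024**3))  # e.g. a job that shuffled ~512 GiB
```

A production version would draw `observed_shuffle_bytes` from the Spark event log or history server rather than taking it as an argument, and would tune far more than one parameter.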
Qualifications
Required
Deep knowledge of Spark internals, including Catalyst, Tungsten, AQE, CBO, scheduling, shuffle management, and memory tuning
Proven experience in tuning and optimizing Spark jobs on Hyper-Scale Spark Compute Platforms
Mastery of Spark configuration parameters, resource tuning, partitioning strategies, and job execution behaviors
Experience building automated optimization systems – from config auto-tuners to feedback loops and adaptive pipelines
Strong software engineering skills in Scala, Java, and Python
Ability to build tooling to surface meaningful performance insights at scale
Deep understanding of auto-scaling and cost-efficiency strategies in cloud-based Spark environments
Exemplary ability to design and develop, perform experiments, and influence engineering direction and product roadmap
Advanced experience developing new and enhancing existing open-source based Data Lakehouse platform components
Experience cultivating relationships with and contributing to open-source software projects
Experience with open-source table formats (Apache Iceberg, Delta, Hudi or equivalent)
Advanced experience with open-source compute engines (Apache Spark, Apache Flink, Trino/Presto, or equivalent)
Experience with cloud computing (AWS, Microsoft Azure, Google Cloud, Hybrid Cloud, or equivalent)
Expertise in developing distributed systems that are scalable, resilient, and highly available
Experience in container technology like Docker and Kubernetes platform development
Experience with continuous delivery and infrastructure as code
In-depth knowledge of DevOps concepts and cloud architecture
Experience in Azure Network (Subscription, Security zoning, etc.) or equivalent
10+ years of professional experience in data software development, programming languages, and big data technologies
8+ years of experience with architecture and design
6+ years of experience with distributed systems, with at least 3 years focused on Apache Spark
6+ years of experience in open-source frameworks
4+ years of experience with AWS, GCP, Azure, or another cloud service
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field such as physics or mathematics
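The configuration-level mastery asked for above (AQE, CBO, shuffle management, memory tuning) commonly surfaces in settings like the following. This is an illustrative spark-defaults.conf fragment using standard Apache Spark properties; the numeric values are examples only and are heavily workload-dependent, not recommendations:

```properties
# Adaptive Query Execution (AQE): re-optimize plans using runtime statistics
spark.sql.adaptive.enabled                     true
spark.sql.adaptive.coalescePartitions.enabled  true
spark.sql.adaptive.skewJoin.enabled            true

# Cost-Based Optimizer (CBO): use table/column statistics for join planning
spark.sql.cbo.enabled                          true

# Shuffle and memory tuning (example values only)
spark.sql.shuffle.partitions                   2000
spark.executor.memory                          16g
spark.executor.memoryOverhead                  4g
spark.memory.fraction                          0.6
```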
Preferred
Active or past Apache Spark Committer (or significant code contributions to OSS Apache Spark)
Experience with ML-based optimization techniques (e.g., reinforcement learning, Bayesian tuning, predictive models)
Contributions to other big data/open-source projects (e.g., Delta Lake, Iceberg, Flink, Presto, Trino)
Background in designing performance regression frameworks and benchmarking suites
Deep understanding of Spark accelerators (Spark RAPIDS, Apache Gluten, Apache Comet, Apache Auron, etc.); committer status in one or more of these projects is a plus
Skilled in documenting methodologies and producing publication-style papers, whitepapers, and internal research briefs
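The ML-based optimization techniques listed above (Bayesian tuning, reinforcement learning) are sample-efficient replacements for the naive baseline below: exhaustively benchmarking a parameter grid and keeping the cheapest configuration. This toy sketch uses a made-up cost model in place of real benchmark runs, purely to illustrate the search loop such techniques improve on:

```python
# Toy illustration of automated Spark-parameter search: brute-force search over
# a small grid with a synthetic cost model. Bayesian or RL tuners replace this
# exhaustive loop with sample-efficient search; the cost model is invented for
# the example.
from itertools import product

def synthetic_runtime(shuffle_partitions: int, executor_cores: int) -> float:
    """Stand-in for a real benchmark run; returns a fake runtime in seconds."""
    return 3600 / executor_cores + abs(shuffle_partitions - 2000) * 0.05

def grid_search(partition_grid, core_grid):
    """Return the (partitions, cores) pair minimizing the synthetic runtime."""
    return min(product(partition_grid, core_grid),
               key=lambda cfg: synthetic_runtime(*cfg))

best = grid_search([500, 1000, 2000, 4000], [2, 4, 8])
print(best)  # -> (2000, 8)
```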
Benefits
Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being.
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance.
Access to additional benefits like mental healthcare as well as fertility and adoption assistance.
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year.
Company
GEICO
GEICO, Government Employees Insurance Company, has been providing affordable auto insurance since 1936. It is a sub-organization of Berkshire Hathaway.
Funding
Current Stage: Late Stage; Total Funding: unknown; Acquired: 1996-01-01
Company data provided by crunchbase