Archetype AI
Staff Backend Software Engineer: Distributed Data
Archetype AI is building an AI platform that turns real-world data into actionable insight. The Staff Backend Software Engineer will own data processing and analysis across edge devices and the platform, build high-performance data pipelines, and ensure reliable processing in constrained environments.
Artificial Intelligence (AI) · Information Technology · Software
Responsibilities
Analyze raw data using Python for statistical analysis, visualization, and exploratory techniques to understand quality, patterns, and anomalies
Prepare datasets for AI workflows: cleaning, normalization, imputation, filtering, resampling, and validation
Execute iterative preprocessing cycles: refine transformations, evaluate results, compare against baselines, retain improvements
Build tooling for data validation, quality monitoring, and automated preprocessing
Generate clear reports and visualizations that communicate findings to technical and non-technical stakeholders
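The preparation cycle described above (cleaning, imputation, normalization, validation) can be sketched in Python with the Pandas/NumPy stack the role calls for; the column names, interpolation strategy, and z-score normalization here are illustrative assumptions, not specifics from the posting.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Illustrative cleaning/imputation/normalization pass over one
    time-series column. Column names and choices are assumptions."""
    out = df.copy()
    # Cleaning: drop rows whose timestamp failed to parse
    out = out.dropna(subset=["timestamp"])
    # Imputation: fill sensor gaps by linear interpolation
    out[value_col] = out[value_col].interpolate(limit_direction="both")
    # Normalization: z-score the signal so models see a stable scale
    mean, std = out[value_col].mean(), out[value_col].std()
    out[value_col] = (out[value_col] - mean) / (std if std else 1.0)
    # Validation: no NaNs may survive preprocessing
    assert not out[value_col].isna().any()
    return out
```

In an iterative workflow, a pass like this would be compared against a baseline (e.g. dropping gap rows instead of interpolating) and kept only if downstream metrics improve.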
Build and optimize data processing software in C++ that runs on small, resource-constrained Linux devices
Ensure pipelines meet real-time performance requirements: low latency, bounded memory, reliable throughput
Integrate sensor inputs and manage data flow on-device: ingestion, buffering, local processing, and transmission
Work within device constraints: limited CPU, memory, storage, and intermittent connectivity
Contribute to device deployment, configuration, and operational tooling
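The bounded-memory buffering mentioned above can be illustrated with a fixed-capacity ring buffer. The role's device software is C++; this Python sketch only shows the shape of the idea, and the capacity, drop policy, and `SampleBuffer` name are invented for illustration.

```python
from collections import deque

class SampleBuffer:
    """Fixed-capacity ring buffer: when full, the oldest sample is
    evicted, so memory stays bounded under bursty sensor ingestion."""
    def __init__(self, capacity: int = 256):
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # count of samples evicted while full

    def push(self, sample) -> None:
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1  # oldest entry is about to be overwritten
        self._buf.append(sample)

    def drain(self) -> list:
        """Hand the buffered window to local processing or transmission."""
        out = list(self._buf)
        self._buf.clear()
        return out
```

Tracking `dropped` matters on intermittently connected devices: it turns silent data loss into an observable quality signal.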
Partner with Solutions Engineers to assess customer data assets and deployment requirements
Translate customer data challenges into reusable pipeline components and analysis workflows
Design and develop scalable, efficient, and reliable data processing systems that handle large volumes of data
Collaborate with data engineers, data scientists, product managers, and other cross-functional teams to build data processing systems that meet the needs of the business and its users
Write high-quality, maintainable code that is efficient, scalable, and reliable, in languages such as Java, Python, and Scala
Work with distributed computing frameworks such as Apache Spark, Hadoop, and Flink
Design and implement data storage systems, including NoSQL databases, columnar storage, and data warehousing
Contribute to the data infrastructure (pipelines, warehouses, and data lakes) so that data scientists can focus on high-level tasks while the infrastructure handles the heavy lifting
Participate in code reviews and keep the codebase maintainable, efficient, and scalable
Stay current with technologies and trends in data processing and infrastructure
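The distributed-processing work above centers on the map/shuffle/reduce pattern that frameworks like Spark and Hadoop implement at cluster scale. A minimal single-process sketch of that pattern follows; the event data and keying function are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(records, map_fn):
    # Map: each record emits zero or more (key, value) pairs
    return chain.from_iterable(map_fn(r) for r in records)

def shuffle_phase(pairs):
    # Shuffle: group values by key (the network exchange on a real cluster)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Reduce: fold each key's values down to one result
    return {key: reduce_fn(values) for key, values in groups.items()}

# Illustrative use: total bytes received per sensor
events = [("cam-1", 10), ("mic-2", 4), ("cam-1", 6)]
totals = reduce_phase(
    shuffle_phase(map_phase(events, lambda e: [(e[0], e[1])])),
    sum,
)
```

A framework adds partitioning, fault tolerance, and spill-to-disk around these same three phases, which is why associative, commutative reduce functions (like `sum`) parallelize cleanly.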
Qualifications
Required
7+ years in data engineering, data analysis, or related technical roles with hands-on data processing focus
Deep experience with time-series data (video a plus): ingestion, preprocessing, feature extraction, quality assessment
Proven ability to apply diverse analytical techniques: statistical analysis, signal processing, visualization, anomaly detection
Experience with iterative data workflows: hypothesis, transformation, evaluation, refinement
Comfortable building and running software on Linux devices, and familiar with system-level concerns (resource usage, process management, I/O)
Experience with real-time or streaming data processing under latency and throughput constraints
Familiarity with data preparation for ML: dataset formatting, labeling workflows, train/eval splits, data validation
C++ (production development): Strong proficiency building production data pipelines and device software. Experience with modern C++, memory management, multithreading, and performance optimization
Python (analysis & prototyping): Strong proficiency for data exploration, statistical analysis, visualization, and rapid prototyping. Experience with NumPy, Pandas, Matplotlib, and Jupyter notebooks
Proven expertise in Linux system architecture and performance, including process design, I/O strategies, and diagnosing complex production issues
Debugging & profiling: Strong skills diagnosing performance issues, memory problems, and data pipeline failures in both C++ and Python
Clear, structured written communication, including customer-facing documentation of findings, processes, and technical decisions
Proven ability to present complex analytical and technical results directly to customers, translating them into concrete, actionable insights for technical teams and business stakeholders
Preferred
Background in signal processing, control systems, or physics-based data analysis
Experience with embedding-space analysis or other AI/ML diagnostic techniques
Prior work optimizing data pipelines for resource-constrained environments
Background in solutions engineering or customer-facing technical work
Company
Archetype AI
Archetype AI develops Physical AI agents that harness real-world sensor data to enhance decision-making and automate processes.
Funding
Current Stage: Early Stage
Total Funding: $48M
Key Investors: Hitachi Ventures, IAG Capital Partners, Comcast NBCUniversal LIFT Labs, Venrock
2025-11-20 · Series A · $35M
2024-10-20 · Non-Equity Assistance
2024-04-05 · Seed · $13M
Company data provided by Crunchbase