Archetype AI
Staff Backend Software Engineer: Distributed Data
Archetype AI is building an AI platform that turns real-world data into actionable insight. The Staff Backend Software Engineer will own data processing and analysis across edge devices and the platform, build high-performance data pipelines, and ensure reliable processing in constrained environments.
Artificial Intelligence (AI) · Information Technology · Software
Responsibilities
Analyze raw data using Python for statistical analysis, visualization, and exploratory techniques to understand quality, patterns, and anomalies
Prepare datasets for AI workflows: cleaning, normalization, imputation, filtering, resampling, and validation
Execute iterative preprocessing cycles: refine transformations, evaluate results, compare against baselines, retain improvements
Build tooling for data validation, quality monitoring, and automated preprocessing
Generate clear reports and visualizations that communicate findings to technical and non-technical stakeholders
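The preparation cycle described above (cleaning, imputation, normalization, validation) can be sketched in Python with the Pandas/NumPy stack the role calls for; the column names, interpolation strategy, and z-score normalization here are illustrative assumptions, not specifics from the posting.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Illustrative cleaning/imputation/normalization pass over one
    time-series column. Column names and choices are assumptions."""
    out = df.copy()
    # Cleaning: drop rows whose timestamp failed to parse
    out = out.dropna(subset=["timestamp"])
    # Imputation: fill sensor gaps by linear interpolation
    out[value_col] = out[value_col].interpolate(limit_direction="both")
    # Normalization: z-score the signal so models see a stable scale
    mean, std = out[value_col].mean(), out[value_col].std()
    out[value_col] = (out[value_col] - mean) / (std if std else 1.0)
    # Validation: no NaNs may survive preprocessing
    assert not out[value_col].isna().any()
    return out
```

In an iterative workflow, a pass like this would be compared against a baseline (e.g. dropping gap rows instead of interpolating) and kept only if downstream metrics improve.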
Build and optimize data processing software in C++ that runs on small, resource-constrained Linux devices
Ensure pipelines meet real-time performance requirements: low latency, bounded memory, reliable throughput
Integrate sensor inputs and manage data flow on-device: ingestion, buffering, local processing, and transmission
Work within device constraints: limited CPU, memory, storage, and intermittent connectivity
Contribute to device deployment, configuration, and operational tooling
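The bounded-memory buffering mentioned above can be illustrated with a fixed-capacity ring buffer. The role's device software is C++; this Python sketch only shows the shape of the idea, and the capacity, drop policy, and `SampleBuffer` name are invented for illustration.

```python
from collections import deque

class SampleBuffer:
    """Fixed-capacity ring buffer: when full, the oldest sample is
    evicted, so memory stays bounded under bursty sensor ingestion."""
    def __init__(self, capacity: int = 256):
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # count of samples evicted while full

    def push(self, sample) -> None:
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1  # oldest entry is about to be overwritten
        self._buf.append(sample)

    def drain(self) -> list:
        """Hand the buffered window to local processing or transmission."""
        out = list(self._buf)
        self._buf.clear()
        return out
```

Tracking `dropped` matters on intermittently connected devices: it turns silent data loss into an observable quality signal.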
Partner with Solutions Engineers to assess customer data assets and deployment requirements
Translate customer data challenges into reusable pipeline components and analysis workflows
Design and develop scalable, efficient, and reliable data processing systems that handle large volumes of data
Collaborate with data engineers, data scientists, product managers, and other cross-functional teams to build data processing systems that meet the needs of the business and its users
Write high-quality, maintainable code that is efficient, scalable, and reliable, in languages such as Java, Python, and Scala
Work with distributed computing frameworks such as Apache Spark, Hadoop, and Flink
Design and implement data storage systems, including NoSQL databases, columnar storage, and data warehousing
Contribute to the data infrastructure (pipelines, warehouses, and data lakes) so that data scientists can focus on high-level tasks while the infrastructure handles the heavy lifting
Participate in code reviews and keep the codebase maintainable, efficient, and scalable
Stay current with technologies and trends in data processing and infrastructure
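The distributed-processing work above centers on the map/shuffle/reduce pattern that frameworks like Spark and Hadoop implement at cluster scale. A minimal single-process sketch of that pattern follows; the event data and keying function are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(records, map_fn):
    # Map: each record emits zero or more (key, value) pairs
    return chain.from_iterable(map_fn(r) for r in records)

def shuffle_phase(pairs):
    # Shuffle: group values by key (the network exchange on a real cluster)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Reduce: fold each key's values down to one result
    return {key: reduce_fn(values) for key, values in groups.items()}

# Illustrative use: total bytes received per sensor
events = [("cam-1", 10), ("mic-2", 4), ("cam-1", 6)]
totals = reduce_phase(
    shuffle_phase(map_phase(events, lambda e: [(e[0], e[1])])),
    sum,
)
```

A framework adds partitioning, fault tolerance, and spill-to-disk around these same three phases, which is why associative, commutative reduce functions (like `sum`) parallelize cleanly.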
Qualifications
Required
7+ years in data engineering, data analysis, or related technical roles with hands-on data processing focus
Deep experience with time-series data (video a plus): ingestion, preprocessing, feature extraction, quality assessment
Proven ability to apply diverse analytical techniques: statistical analysis, signal processing, visualization, anomaly detection
Experience with iterative data workflows: hypothesis, transformation, evaluation, refinement
Comfortable building and running software on Linux devices, and familiar with system-level concerns (resource usage, process management, I/O)
Experience with real-time or streaming data processing under latency and throughput constraints
Familiarity with data preparation for ML: dataset formatting, labeling workflows, train/eval splits, data validation
C++ (production development): Strong proficiency building production data pipelines and device software. Experience with modern C++, memory management, multithreading, and performance optimization
Python (analysis & prototyping): Strong proficiency for data exploration, statistical analysis, visualization, and rapid prototyping. Experience with NumPy, Pandas, Matplotlib, and Jupyter notebooks
Proven expertise in Linux system architecture and performance, including process design, I/O strategies, and diagnosing complex production issues
Debugging & profiling: Strong skills diagnosing performance issues, memory problems, and data pipeline failures in both C++ and Python
Clear, structured written communication, including customer-facing documentation of findings, processes, and technical decisions
Proven ability to present complex analytical and technical results directly to customers, translating them into concrete, actionable insights for technical teams and business stakeholders
Preferred
Background in signal processing, control systems, or physics-based data analysis
Experience with embedding-space analysis or other AI/ML diagnostic techniques
Prior work optimizing data pipelines for resource-constrained environments
Background in solutions engineering or customer-facing technical work
Company
Archetype AI
Archetype AI develops Physical AI agents that harness real-world sensor data to enhance decision-making and automate processes.
Funding
Current Stage: Early Stage
Total Funding: $48M
Key Investors: Hitachi Ventures, IAG Capital Partners, Comcast NBCUniversal LIFT Labs, Venrock
2025-11-20 · Series A · $35M
2024-10-20 · Non-Equity Assistance
2024-04-05 · Seed · $13M
Company data provided by Crunchbase