Distributed Spectrum · 5 months ago
Machine Learning Infra Engineer
Distributed Spectrum is a company focused on creating systems for radio spectrum intelligence. The Machine Learning Infra Engineer will design and build core infrastructure to enable fast, scalable model training and facilitate data access for researchers.
Analytics · SaaS · Sensor
Responsibilities
Design and build core infrastructure from scratch using technologies that you actually want to use
Scale our distributed data storage and write Python APIs that make loading 30GB datasets feel instantaneous
Set up the orchestration for model training on GPU clusters, versioning, and artifact deployment
Explore creative ways to combine relational and vector-based search queries, enabling researchers to discover the most relevant data for any modeling task
Qualifications
Required
Experience designing and implementing data infrastructure from scratch, including databases, cloud storage, and cloud compute
Experience managing a production-grade Python codebase that was used by other people
Experience with AWS, including AWS networking, S3, SageMaker, RDS, ECS, Lambda, and related infra-as-code tools
Experience designing database schemas, metadata states, and software abstractions that promote clarity and generalize well to new situations
Experience working directly with researchers and using infrastructure that supports experiment tracking, model versioning, and artifact deployment, such as MLflow or similar
You know how to deal with larger-than-memory data inexpensively, without setting up a cluster
You can write clearly
Extremely collaborative attitude and interest in helping define large areas of our engineering roadmap
Preferred
Understanding of database internals (indexes, query optimizers) and data storage formats, and the ability to apply that understanding to practical design decisions
Experience writing production Rust or C++
Experience with modern DataFrame libraries and database systems, including Polars, Ibis, DuckDB, or similar
Experience with maintaining a versioned Python package, related CI/CD best practices, and the Python packaging ecosystem
Experience with event streaming data systems like ZeroMQ, Kafka, Flink, or similar
Experience with orchestration frameworks like Airflow, Prefect, Dagster, or similar
Experience dealing with role-based access to AWS and permissioning
Experience running distributed jobs using Spark, Ray, Dask, or similar
Benefits
Above-market salary, equity, and benefits package
Early Series A equity
Excellent health, dental, and vision coverage
401(k) match - up to 4% of your salary
Unlimited PTO
Daily office lunches in NYC
Company
Distributed Spectrum
Radio signals are everywhere - we find the ones that matter.
Funding
Current Stage: Early Stage
Total Funding: $25.23M
Key Investors: National Science Foundation
2025-03-19 · Series A · $25M
2024-03-01 · Seed
2022-06-01 · Seed
Company data provided by Crunchbase