Main Sequence · 1 month ago

Machine Learning Engineer - Audio Specialist | Breaker

Austin, TX

Full-time

Onsite

Mid, Senior Level

Breaker is a startup focused on redefining how humans interact with robots through advanced AI technology. The Machine Learning Engineer - Audio Specialist will build audio understanding models from scratch and own the entire audio ML pipeline, including data collection, training, and deployment.

Venture Capital & Private Equity

Responsibilities

Evaluate and implement state-of-the-art architectures, making informed decisions about model selection based on audio quality constraints and deployment requirements

Own metrics such as word error rate (WER), establishing baselines and demonstrating measurable improvements over time

Build and maintain infrastructure for model training, including experiment tracking, performance monitoring, and version control

Design data collection campaigns and field testing protocols to capture representative training data across varying environmental conditions

Establish audio quality requirements and provide input on hardware selection for optimal model performance

Deploy and optimize models for NVIDIA Jetson platforms, ensuring real-time performance within compute and latency constraints

Conduct hands-on field testing in varied environments (outdoor, windy conditions, different communication systems) to validate model performance

Stay current with rapidly evolving speech recognition and multimodal model research, evaluating new approaches for potential integration

Qualifications

Audio ML models, Deep audio processing, Python for ML, Model deployment, Data collection strategies, Edge deployment, Passion for audio, Team collaboration, Problem-solving

Required

Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, Audio Engineering, or a related field

Proven track record designing, training, and shipping audio ML models end-to-end (e.g. speech-to-text, speech-to-speech), including dataset creation, training pipelines, evaluation, and deployment in real-world applications

Deep understanding of how audio is represented and modeled for ML, including audio DSP and frequency-domain processing (e.g. STFT, mel/spectrogram transforms) and how these choices affect model performance

Expert-level Python for ML development, including building training loops, data/input pipelines, and experiment tracking

Hands-on experience deploying, quantizing, and optimizing models for production environments

Open to field work and travel for data capture campaigns and system validation testing

Preferred

Background in audio product companies or audio-focused ML applications (microphone manufacturers, audio processing products, speech recognition systems)

Personal passion for audio (e.g., sound engineering background, audio enthusiast with technical depth)

Experience with data annotation workflows and managing labeling processes

Experience with edge deployment or resource-constrained environments

Familiarity with ARM deployment or NVIDIA Jetson platforms

Exposure to multimodal models or bridging speech and language model systems

Data pipeline engineering experience for managing large-scale training datasets

Proficiency with ML infrastructure tools (e.g., Weights & Biases, ClearML, or similar)

Experience with ROS/ROS2 development and integrating AI with robotic systems

Benefits

Generous equity packages mean when Breaker wins, you win.

Company

Main Sequence

We are Australia’s deep tech investment fund tackling the world’s biggest challenges by turning today’s scientific discoveries into tomorrow’s industries.

Founded in 2016