webAI · 9 hours ago
On-Device Machine Learning Engineer
webAI is pioneering the future of artificial intelligence by establishing a distributed AI infrastructure dedicated to personalized AI. They are seeking an On-Device Machine Learning Engineer to design, optimize, and manage models running locally on consumer devices, focusing on performance and user experience.
Computer Software
Responsibilities
Convert, optimize, and deploy models to run efficiently on-device using Core ML and/or MLX
Implement quantization strategies (e.g., 8-bit / 4-bit where applicable), compression, pruning, distillation, and other techniques to meet performance targets
Profile and improve model execution across compute backends (CPU/GPU/Neural Engine where relevant), and reduce memory footprint
Build and optimize local retrieval pipelines (embeddings, indexing, caching, ranking) that work offline and under tight resource constraints
Implement local memory systems (short/long-term) with careful attention to privacy, durability, and performance
Collaborate with product/design to translate 'memory' behavior into concrete technical architectures and measurable quality targets
Own the on-device model lifecycle: packaging, versioning, updates, rollback strategies, on-device A/B testing approaches, telemetry, and quality monitoring
Build robust evaluation and regression suites that reflect real device constraints and user workflows
Ensure models degrade gracefully (low-power mode, thermals, backgrounding, OS interruptions)
Treat battery, thermal, and latency as first-class product requirements: instrument, benchmark, and optimize continuously
Design inference pipelines and scheduling strategies that respect app responsiveness, animations, and UI smoothness
Partner with platform engineers to integrate ML into production apps with clean APIs and stable runtime behavior
Qualifications
Required
Strong experience shipping ML features into production, ideally including mobile / edge / consumer devices
Hands-on proficiency with Core ML and/or MLX, and the practical realities of running models locally
Solid understanding of quantization and optimization techniques for inference (accuracy/perf tradeoffs, calibration, benchmarking)
Experience building or operating retrieval systems (embedding generation, vector search/indexing, caching strategies)—especially under resource constraints
Fluency in performance engineering: profiling, latency breakdowns, memory analysis, and tuning on real devices
Strong software engineering fundamentals: maintainable code, testing, CI, and debugging across complex systems
Preferred
Experience with on-device LLMs, multimodal models, or real-time interactive ML features
Familiarity with Metal / GPU compute, or performance tuning of ML workloads on Apple platforms
Experience designing privacy-preserving personalization and memory (local-first data handling, encryption, retention policies)
Experience building developer tooling for model packaging, benchmarking, and release management
Prior work on offline-first architectures, edge inference, or battery/thermal-aware scheduling
Benefits
Competitive salary and performance-based incentives
Comprehensive health, dental, and vision benefits package
401k Match (US-based only)
$200/month Health and Wellness Stipend
$400/year Continuing Education Credit
$500/year Function Health subscription (US-based only)
Free parking for in-office employees
Unlimited Approved PTO
Parental Leave for Eligible Employees
Supplemental Life Insurance
Company
webAI
webAI is designed to streamline the training, deployment, and execution of AI models by offering a unified execution layer for AI that seamlessly integrates cloud-based services and local devices.
H1B Sponsorship
webAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (9)
2024 (2)
Funding
Current Stage
Growth Stage
Recent News
Google Patent
2024-04-16
Company data provided by Crunchbase