Quince · 5 days ago
Senior ML Infrastructure Engineer - MLOps
Retail
Actively Hiring
Responsibilities
Design, Build, and Maintain ML Pipelines: Develop and optimize end-to-end machine learning pipelines, including data ingestion, model training, validation, deployment, and monitoring.
Implement Continuous Integration/Continuous Deployment (CI/CD) for ML Models: Establish robust CI/CD processes to automate the testing, deployment, and monitoring of machine learning models in production environments.
Build and Own Production Infrastructure for Serving ML Models: Design, deploy, and maintain the production infrastructure necessary for real-time and batch serving of machine learning models, ensuring high availability, scalability, and reliability.
Build and Own the Feature Store: Design, implement, and manage the feature store to ensure efficient and scalable storage, retrieval, and versioning of features used in machine learning models, enabling consistent and reusable feature engineering across teams.
Collaborate with Data Scientists and Engineers: Work closely with data scientists, data engineers, and software engineers to ensure seamless integration of ML models into production systems, aligning models with business goals.
Monitor and Optimize Model Performance: Implement monitoring solutions to track the performance of ML models in production, identifying and addressing any issues such as data drift, model degradation, or system bottlenecks.
Ensure Scalability and Reliability: Design and implement scalable and reliable ML infrastructure, leveraging cloud platforms and containerization and orchestration tools such as Docker and Kubernetes.
Automate Data and Model Management: Develop automated solutions for version control, model registry, and experiment tracking to manage the lifecycle of ML models efficiently.
Optimize Resource Utilization: Manage and optimize the use of computational resources, such as GPUs and cloud instances, to balance performance with cost-effectiveness.
Conduct Root Cause Analysis and Troubleshooting: Diagnose and resolve issues in ML pipelines, including debugging data, code, and model performance problems.
Document Processes and Systems: Create and maintain comprehensive documentation of ML pipelines, deployment processes, and operational workflows to ensure knowledge sharing and continuity.
Qualifications
Required
Bachelor's degree in computer science, engineering, or a related field
5+ years of experience in ML Infrastructure or ML engineering
Hands-on expertise in building and maintaining ML pipelines
Hands-on expertise in building and managing scalable ML production infrastructure
Hands-on expertise in AWS or other major cloud services
Strong knowledge of CI/CD practices for ML models
Familiarity with DevOps principles and tools
Familiarity with TensorFlow, PyTorch, or similar frameworks
Proficient in Python and Java (or Scala)
Excellent communication skills
Move fast, be a team player, and be kind
Company
Quince
Quince is an affordable luxury brand that sells high-quality fashion and home goods at radically low prices, direct from the factory floor.
Funding
Current Stage
Growth Stage
Recent News
Business Insider
2024-03-16
2024-02-22
Silicon Valley Journals
2023-12-04