CereCore · 1 week ago
Senior ML Reliability Engineer
Consulting · Information Services
Responsibilities
Tool Development and Management: Build, manage, and maintain tools for system reliability, including dashboards, logging systems, and pager systems.
Infrastructure Maintenance: Help maintain and enhance CI/CD pipelines, logging infrastructure, and other operational systems crucial for MLOps.
Monorepo Management: Keep the monorepo up-to-date with the latest dependency and security updates, ensuring a secure and efficient development environment.
Vendor Collaboration: Assist in implementing and maintaining infrastructure and systems managed by external vendor teams.
Incident Management: Lead and participate in incident management processes, including troubleshooting, root cause analysis, and implementing corrective measures to prevent future occurrences.
Qualifications
Required
Hands-on technical engineering, architecture, and development experience with Google Cloud Platform Vertex AI MLOps, live and in production at scale.
AI/ML Knowledge: Solid understanding of AI/ML principles and technologies.
System Monitoring and Tools: Experience with system monitoring tools and observability. Knowledge of GCP, Vertex AI, or other cloud platforms is highly beneficial.
Programming and Scripting: Proficiency in programming languages such as Python and scripting for automation.
Problem-Solving Skills: Strong analytical and problem-solving skills, with the ability to work under pressure.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
5+ years of experience in the technology field.
Proven experience in a reliability engineering role, preferably with a focus on AI/ML systems.
Experience in incident management and performance optimization.
Excellent communication and teamwork skills.
Company
CereCore
CereCore has implemented EHR systems in more than 300 facilities and offers staffing and remote support services for major EHR acute.
Funding
Current Stage
Late Stage
Company data provided by crunchbase