Galent · 8 hours ago
Site Reliability Engineer (SRE) – ML Platform
Qualification
Required
6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
Good understanding of Apache Solr
Proficient with Linux administration
Knowledge of ML models and LLMs
Ability to understand tools used by data scientists and experience with software development and test automation
Ability to design and implement cloud solutions and to build MLOps pipelines on AWS
Experience working with cloud computing and database systems
Experience building custom integrations between cloud-based systems using APIs
Experience developing and maintaining ML systems built with open-source tools
Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, as well as with Docker and Kubernetes
Experience developing containers and working with Kubernetes in cloud computing environments
Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
Ability to translate business needs to technical requirements
Strong understanding of software testing, benchmarking, and continuous integration
Exposure to machine learning methodology and best practices
Good communication skills and ability to work in a team
Company
Galent
Galent is an AI-native digital engineering firm at the forefront of the AI revolution, dedicated to delivering unified, enterprise-ready AI solutions that transform businesses and industries.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase