griddable.io
ML Platform Engineer
Griddable.io is seeking a highly skilled ML Platform Engineer to play a pivotal role in developing its ML/AI platform. The role involves building, maintaining, and scaling the core infrastructure, platform services, and CI/CD pipelines that support machine learning initiatives and product launches.
Analytics, Big Data, Cloud Data Services, Data Integration, Information Technology, SaaS, Software
Responsibilities
Design, implement, and manage secure and scalable cloud infrastructure (primarily AWS), including networking, permissions management, data management, and Kubernetes
Develop and maintain core ML platform components such as the Model Registry, permissions services for project access, and tooling for default SageMaker setup and deployments
Build and optimize CI/CD pipelines in GitHub Actions for efficient, secure code deployment, Docker image and package builds, and security scanning
Ensure robust and secure connectivity for the platform, including ingress (public and VPN), egress, and domain management (Route 53). Manage the service mesh (Istio) for traffic routing and secure trust between microservices
Implement and manage essential tooling to enhance developer productivity and platform security, including secrets management, package/dependency management, testing frameworks, developer self-service tools, automation scripts/bots, and observability integrations
Contribute to establishing monitoring and alerting solutions (e.g., Grafana, PagerDuty) and integrate security scanning to ensure platform health and security
Participate in security reviews and ensure all platform components adhere to security best practices and compliance requirements
Work closely with cross-functional teams, including ML engineers, data scientists, and product managers, to deliver robust and high-performance solutions
Create and maintain comprehensive documentation for infrastructure, services, workflows, and user guides
Qualifications
Required
Proven experience as a Platform Engineer, Software Engineer, or ML Infrastructure Engineer
Strong software engineering skills, particularly with Python, for building scalable tools, automation scripts, and platform components
Strong expertise in cloud platforms, particularly AWS (IAM, EKS, S3, SageMaker, etc.)
Extensive experience with CI/CD tools, especially GitHub Actions and ArgoCD
Proficiency in infrastructure-as-code principles and tools (e.g., Terraform)
Experience with containerization technologies (Docker) and orchestration (Kubernetes)
Understanding of networking concepts within cloud environments and of service mesh technologies (e.g., Istio)
Experience with MLOps concepts and tools
Knowledge of Airflow or other workflow orchestration tools
Experience with monitoring and alerting systems (Grafana, PagerDuty)
Familiarity with Okta or similar identity and access management systems
Experience with tenant and project onboarding processes in a multi-tenant environment
Familiarity with security best practices and conducting security reviews
Ability to manage multiple priorities and dependencies effectively
Excellent problem-solving and communication skills
Preferred
Exposure to A/B testing and experimentation platforms
Experience with the Salesforce ecosystem
Experience building and evaluating agents
Experience with agent memory, MCP servers, etc.
Experience with databases for unstructured data (e.g., vector or graph databases) and RAG pipelines
Experience working with modern data platforms and real-time processing frameworks, including cloud data warehouses (e.g., Snowflake) and streaming technologies (e.g., Kafka, Flink)
Experience with feature stores such as Feast
Company
griddable.io
Griddable.io is a San Jose, CA-based SaaS startup that closed its Series A funding round in 2017 with investment from August Capital, Artiman Ventures, and Carsten Thoma, founding CEO of Hybris (acquired by SAP).