Tabby | تابي · 2 hours ago
Senior ML/Data Ops Engineer II
Tabby is a financial technology company that reshapes how people shop, earn, and save. They are seeking a Senior ML/Data Ops Engineer II to manage model serving, optimize data pipelines, and ensure infrastructure reliability for their innovative payment solutions.
Artificial Intelligence (AI), Billing, Finance, Financial Services, FinTech, Payments
Responsibilities
Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and SGLang to minimize latency and maximize hardware efficiency
Hands-on experience deploying and optimizing large-scale open-weights models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants
Advanced optimization and security hardening of Docker specifically for GPU environments
Managing model weights and orchestration within Kubernetes (GKE) environments
Designing and maintaining high-throughput CDC (Change Data Capture) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL
Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging
Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability
Strong Linux systems expertise including internals, networking, and performance tuning for large-scale distributed systems
Experience with Istio service mesh to manage microservices communication and traffic
Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible
Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress
Implementing pipelines as code within GitLab CI, managing runners, caching, and security scanning
Infrastructure as Code with Terraform and Terragrunt
Proficiency in Python/Bash for building custom automation and AI Agent tooling
Conducting rigorous load testing for GenAI applications, focusing on metrics like TTFT, TPS, and RPS
Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking
Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines
Strong ownership mindset: balancing speed, reliability, and cost
Comfortable working cross-functionally with developers, security, and compliance
Excellent sense of responsibility and accountability
Qualifications
Required
English B2 or higher
Preferred
Experience working in PCI DSS, SOC 2, or other regulated compliance environments
Benefits
Full-time B2B contract
Fully remote setup, work from anywhere in Europe
Up to 20% tax allowance
22 paid leave days annually
Stock options (ESOP) in a fast-scaling, pre-IPO company
Flexi benefits you can use for wellness, travel, or learning
Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.
Company
Tabby | تابي
Tabby is a financial technology company that helps millions of people in the Middle East to stay in control of their spending and make the most out of their money.
Funding
Current Stage: Late Stage
Total Funding: $1.85B
Key Investors: Hassana Investment Company (HIC), JP Morgan, Wellington Management
2025-10-27: Secondary Market
2025-02-12: Series E · $160M
2023-12-21: Series D · $50M
Company data provided by Crunchbase