Balyasny Asset Management L.P. · 1 day ago
Senior Workflow Orchestration Engineer (Airflow & Scheduling Platforms)
Balyasny Asset Management L.P. is seeking a seasoned engineer to design, operate, and scale its workflow orchestration platform, with a focus on Apache Airflow. The role involves owning the Airflow control plane and developer experience while partnering with data, analytics, and ML teams to deliver fast, reliable pipelines.
Financial Services
Responsibilities
Architect, deploy, and operate production-grade Airflow (self-managed, or managed services such as Amazon MWAA, Google Cloud Composer, or Astronomer), including upgrades, capacity planning, high availability (HA), and performance tuning
Run Airflow on Kubernetes using Helm and GitOps; configure executors (KubernetesExecutor, CeleryExecutor on Kubernetes, or the hybrid CeleryKubernetesExecutor), autoscaling (e.g., KEDA), resource quotas, PodDisruptionBudgets (PDBs), and rolling update strategies
Build and maintain automation infrastructure: Terraform/Helm modules, GitOps (Argo CD/Flux), and CI/CD pipelines (GitHub Actions/GitLab/Jenkins) for environment creation, upgrades, and zero- or low-downtime rollouts
Standardize the developer experience: DAG repo templates, shared operator/hook libraries, connection/secrets management, package and constraints-file management, CODEOWNERS, linting (ruff/flake8), unit testing with pytest, and pre-commit checks
Implement observability: metrics (StatsD/Prometheus), dashboards (Grafana), structured logs (ELK/OpenSearch), tracing (OpenTelemetry), SLA/latency tracking, alerting (PagerDuty/Opsgenie/Slack), and automated remediation
Drive reliability: pools/queues/concurrency policies, retries/backoff, idempotency patterns, deferrable operators/sensors, backfills, datasets and cross-DAG dependencies, runbooks, and incident response/postmortems
Secure the platform: SSO/OIDC, RBAC, least-privilege connections, network policies, TLS, secrets management (Vault/Secrets Manager/Kubernetes Secrets), audit logging, and compliance automation/policy-as-code
Manage platform components: metadata DB (Postgres/MySQL), Celery brokers/backends (Redis/RabbitMQ), provider packages, and controlled plugin lifecycle; plan and execute Airflow 2.x upgrades/migrations
Integrate data quality and lineage: Great Expectations/dbt tests, OpenLineage/Marquez; enforce quality gates in CI/CD and at runtime
Orchestrate across the data/ML ecosystem: Snowflake/BigQuery/Redshift, Databricks/Spark/EMR/Dataproc, dbt Core/Cloud, object storage (S3/GCS/ADLS), and both event-driven and batch workloads
Evaluate and, where appropriate, operate complementary schedulers (Prefect, Dagster, Argo Workflows, Kubernetes CronJobs, AWS Step Functions) and lead migrations from legacy orchestrators
Partner closely with platform, data, and ML teams; provide enablement, documentation, and self-service tooling. Mentor engineers and contribute to the platform roadmap and standards
Qualifications
Required
5–8+ years building/operating data or platform systems; 3+ years running Airflow in production at scale (hundreds to thousands of DAGs and high task throughput)
Deep Airflow expertise: DAG design and testing, idempotency, deferrable operators/sensors, dynamic task mapping, task groups, datasets, pools/queues, SLAs, retries/backfills, cross-DAG dependencies
Strong Kubernetes experience running Airflow and supporting services: Helm, autoscaling, node/pod tuning, topology spread, network policies, PDBs, and blue/green or canary strategies
Automation-first mindset: Terraform, Helm, GitOps (Argo CD/Flux), and CI/CD for platform lifecycle; policy-as-code (OPA/Gatekeeper/Conftest) for DAG, connection, and secrets changes
Proficiency in Python for authoring operators/hooks/utilities; solid Bash scripting skills; familiarity with Go or Java is a plus
Observability and SRE practices: Prometheus/Grafana/StatsD, centralized logging, alert design, capacity/throughput modeling, performance tuning
Data platform experience with at least one major cloud (AWS/Azure/GCP) and systems like Snowflake/BigQuery/Redshift, Databricks/Spark, EMR/Dataproc; strong grasp of IAM, VPC networking, and storage (S3/GCS/ADLS)
Security/compliance: SSO/OIDC, RBAC, secrets management (Vault/Secrets Manager), auditing, least-privilege connection management, and change control
Proven incident leadership, runbook creation, and platform roadmap execution; excellent cross-functional communication
Preferred
Experience operating alternative orchestrators (Prefect 2.x, Dagster, Argo Workflows, AWS Step Functions) and leading migrations to/from Airflow
OpenLineage/Marquez adoption; Great Expectations or other data quality frameworks; data contracts
dbt Core/Cloud orchestration patterns (state management, artifacts, slim CI)
Cost optimization and capacity planning for schedulers and workers; spot instance strategies
Multi-region HA/DR for Airflow metadata DB; backup/restore and disaster drills
Building internal developer platforms/portals (e.g., Backstage) for self-service pipelines
Contributions to Apache Airflow or provider packages; familiarity with recent Airflow Improvement Proposals (AIPs) and Airflow 2.7+ features
Company
Balyasny Asset Management L.P.
Balyasny Asset Management (BAM) is a diversified global investment firm founded in 2001 by Dmitry Balyasny, Scott Schroeder, and Taylor O'Malley.
H-1B Sponsorship
Balyasny Asset Management L.P. has a track record of offering H-1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total H-1B Sponsorships by Year
2025 (107)
2024 (85)
2023 (39)
2022 (44)
2021 (28)
2020 (18)
Funding
Current Stage: Late Stage
Company data provided by Crunchbase