We explore how a third-party logistics platform built its entire data orchestration layer on Airflow, and what that makes possible for developer teams and merchant-facing products alike.
Filip Kunčar, Platform Director at ShipMonk Product Development, discusses migrating from a closed-source tool to Airflow, orchestrating dbt with both Cosmos and the BashOperator, and using Airflow to power customer-facing data delivery.
Key Takeaways:
00:00 Introduction.
01:07 ShipMonk is a third-party logistics company guaranteeing two-day delivery across the US. The data platform team's mission is to lower cognitive load for developers working with data.
05:13 ShipMonk migrated to Airflow in 2022, moving away from a closed-source, UI-based tool. The move was driven by the need for a code-first approach, open-source extensibility and broad cloud provider support.
10:02 The team uses Cosmos where developer-facing visibility and lineage matter, and the BashOperator for internal pipelines where runtime performance matters.
12:20 Switching from Cosmos to the BashOperator for a frequently running pipeline reduced runtime from over 15 minutes to three minutes.
13:14 Because the full dbt chain runs inside Airflow, a configurable downstream DAG can deliver processed data directly to each merchant's preferred destination, with secrets management and SLA tracking already handled.
15:03 Per-team alerting is attached to each DAG by owner and severity, so teams can react to SLA breaches immediately.
18:09 ShipMonk uses Airflow in three ways for AI: authoring DAGs faster with skills, orchestrating AI workloads in Lambda and containers, and using Astronomer's skills repo to simplify Airflow version upgrades.
Resources Mentioned:
Filip Kunčar
https://www.linkedin.com/in/filipkuncar/
ShipMonk Product Development
https://www.linkedin.com/company/shipmonk-product-development/
ShipMonk | Website
http://www.shipmonk.com
Astronomer Cosmos
http://www.astronomer.io/cosmos
Astronomer AI Skills Repo
http://www.github.com/astronomer/airflow-llm-providers-demo
Datadog
http://www.datadoghq.com
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning