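I evaluated several open-source distributed task scheduling systems (perhaps "workflow manager" is the more fitting term?) and eventually chose Airflow.
Some Airflow links, collected here for quick reference: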
The official tutorial is fairly complete: https://airflow.apache.org/
Airflow concise guide: http://morefreeze.github.io/2016/12/airflow.html
Airflow advanced topics: http://morefreeze.github.io/2017/02/airflow-advance.html
HA: http://site.clairvoyantsoft.com/making-apache-airflow-highly-available/
Airflow cluster: http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
https://www.slideshare.net/RobertSanders49/airflow-clustering-and-high-availability/
Comparison with other task scheduling systems: https://www.bizety.com/2017/06/05/open-source-data-pipeline-luigi-vs-azkaban-vs-oozie-vs-airflow/
https://github.com/meirwah/awesome-workflow-engines
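Apart from the scheduler, every other service can be made highly available through configuration alone. I drew an HA deployment diagram; in it, the scheduler failover controller is the one component that has to be developed in-house.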
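The controller itself is not described here, so the following is only a minimal hypothetical sketch of the idea, assuming the active scheduler writes a periodic heartbeat timestamp that the controller can read. `FailoverController`, the node names and the in-memory heartbeat store are illustrative stand-ins, not part of Airflow. A real controller would also need to stop the old scheduler before starting one on the standby host (so two schedulers never run at once); that step is only marked by a comment.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailoverController:
    """Hypothetical heartbeat-based failover controller for a single scheduler."""

    # Hosts that may run the scheduler, in order of preference (hypothetical names).
    nodes: List[str]
    # Seconds without a heartbeat before the active scheduler is considered dead.
    timeout: float
    # Injectable clock so the logic can be tested without real waiting.
    clock: Callable[[], float] = time.monotonic
    # Last heartbeat per node; stands in for a shared store such as the metadata DB.
    heartbeats: Dict[str, float] = field(default_factory=dict)
    active: str = ""

    def __post_init__(self) -> None:
        # The first node starts as the active scheduler with a fresh heartbeat.
        self.active = self.nodes[0]
        self.heartbeats[self.active] = self.clock()

    def record_heartbeat(self, node: str) -> None:
        """Called periodically by (or on behalf of) a running scheduler."""
        self.heartbeats[node] = self.clock()

    def check(self) -> str:
        """Promote the next node if the active scheduler's heartbeat is stale."""
        last = self.heartbeats.get(self.active, float("-inf"))
        if self.clock() - last > self.timeout:
            idx = self.nodes.index(self.active)
            standby = self.nodes[(idx + 1) % len(self.nodes)]
            # A real controller would stop the old scheduler here, then start
            # a scheduler process on the standby host.
            self.active = standby
            # Give the newly promoted scheduler a full timeout window to report in.
            self.heartbeats[standby] = self.clock()
        return self.active


if __name__ == "__main__":
    now = [0.0]
    ctl = FailoverController(["sched-a", "sched-b"], timeout=30, clock=lambda: now[0])
    now[0] = 10.0
    ctl.record_heartbeat("sched-a")
    for t in (35.0, 45.0):
        now[0] = t
        print(t, ctl.check())
```

Injecting the clock keeps the promotion rule (stale heartbeat means promote the next node) testable without sleeping; in a deployment, `check()` would run in a loop on a host separate from the schedulers.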