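  • [Original] Installing Airflow on CentOS 7

    This guide installs Airflow inside a Python virtual environment. The steps are the same without one; I use a virtual environment only to avoid disturbing the system environment.

    Install the Python virtual environment tool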

    pip install virtualenv

    Set environment variables

    sudo vi /etc/profile

    Append the following to the end of the file

    export PYTHON_HOME=/usr/local/python3

    export PATH=$PATH:$PYTHON_HOME/bin

    source /etc/profile

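    Create a folder to hold the virtual environment

    mkdir -p /softwares/pyenv_for_airflow

    cd /softwares/pyenv_for_airflow/

    Create the Python virtual environment (on virtualenv 20 and later, omit --no-site-packages: the flag was removed and isolation is the default)

    virtualenv --no-site-packages airflow_env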

    Grant execute permission

    chmod +x -R *

    Activate the virtual environment

    cd airflow_env/bin

    source ./activate
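After activation it can be useful to confirm that the interpreter in use really belongs to the virtual environment, since every later pip install depends on it. The following is a minimal sketch, not part of the original steps; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and virtualenv 20+
    # instead make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Run it with the python found on PATH right after source ./activate; the printed interpreter path should point into airflow_env.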

    Install dependencies

    yum -y install gcc zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

    yum -y install python-devel mysql-devel

    yum -y install python3-devel

    yum -y install cyrus-sasl cyrus-sasl-devel cyrus-sasl-lib

    pip install paramiko

    pip install pymysql

    pip install sqlalchemy

    Add the Airflow environment variables to the end of /etc/profile

    vi /etc/profile

    export AIRFLOW_HOME=/softwares/airflow

    export SLUGIFY_USES_TEXT_UNIDECODE=yes

    # Take effect immediately

    source /etc/profile

    Install Airflow; the [all] extra installs every optional component

    pip install apache-airflow[all]

    Initialize the database

    cd /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin

    ./airflow initdb

    Inspect the generated files

    cd /softwares/airflow/
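The check on the generated files can also be scripted. This is a rough sketch under an assumption: the names in EXPECTED are what an Airflow 1.10.x initdb with the default SQLite backend typically leaves in AIRFLOW_HOME, and they are not listed in the original article. The helper missing_files is hypothetical.

```python
import os

# Assumed file names: what an Airflow 1.10.x initdb with the default SQLite
# backend typically leaves in AIRFLOW_HOME. Adjust for your version.
EXPECTED = ("airflow.cfg", "airflow.db", "unittests.cfg")


def missing_files(home, expected=EXPECTED):
    """Return the expected names that do not exist under `home`."""
    return [name for name in expected if not os.path.exists(os.path.join(home, name))]


if __name__ == "__main__":
    # Falls back to the AIRFLOW_HOME value used in this guide.
    home = os.environ.get("AIRFLOW_HOME", "/softwares/airflow")
    print("missing:", missing_files(home))
```

An empty list means every expected file is present.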

    Create the MySQL backend database

    create database airflow_db default charset utf8 collate utf8_general_ci;

    create user 'airflow'@'%' identified by 'airflow_db';

    create user 'airflow'@'localhost' identified by 'airflow_db';

    grant all on airflow_db.* to 'airflow'@'%';

    flush privileges;

    ----------------------------------------- Alternative: utf8mb4 character set -----------------------------------------

    create database airflow_db default charset utf8mb4 collate utf8mb4_unicode_ci;

    create user 'airflow'@'%' identified by 'airflow_db';

    create user 'airflow'@'localhost' identified by 'airflow_db';

    grant all on airflow_db.* to 'airflow'@'%';

    flush privileges;
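With the database and user in place, Airflow needs a SQLAlchemy connection URL for them. The sketch below shows one way to assemble such a URL so that special characters in the password are percent-encoded. The helper mysql_url is hypothetical, and the mysql+pymysql driver prefix is an assumption based on pymysql having been installed earlier; the configuration shown later in this article uses the plain mysql:// prefix.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, database, port=3306, driver="pymysql"):
    """Build a SQLAlchemy URL, percent-encoding the user name and password."""
    # Assumption: "mysql+pymysql" selects the PyMySQL driver; pass driver=None
    # to get the plain "mysql://" prefix instead.
    scheme = "mysql+" + driver if driver else "mysql"
    return "{}://{}:{}@{}:{}/{}".format(
        scheme, quote_plus(user), quote_plus(password), host, port, database
    )


if __name__ == "__main__":
    # User, password and database come from the SQL above; the host name
    # airflow.mn01 is the one used in this article's airflow.cfg.
    print(mysql_url("airflow", "airflow_db", "airflow.mn01", "airflow_db"))
```

Encoding matters mainly when the password contains characters such as @ or /, which would otherwise be read as part of the URL structure.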

    Configure Airflow to use the LocalExecutor and the MySQL database

    vi airflow/airflow.cfg

    executor = LocalExecutor

    sql_alchemy_conn = mysql://root:123456@airflow.mn01:3306/airflow_db

    [webserver]

    base_url = http://airflow.mn01:8085

    web_server_port = 8085

    Time zone

    default_timezone = Asia/Shanghai
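Before restarting anything, the edited values can be read back to catch typos. The following is a minimal sketch using only the standard library; it parses an in-memory sample rather than the real file. The [core] section header for executor, sql_alchemy_conn and default_timezone is an assumption about the Airflow 1.10 layout, and the function summarize is hypothetical.

```python
import configparser
from urllib.parse import urlsplit

# Sample mirroring the settings edited above; section names are assumed.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
sql_alchemy_conn = mysql://root:123456@airflow.mn01:3306/airflow_db
default_timezone = Asia/Shanghai

[webserver]
base_url = http://airflow.mn01:8085
web_server_port = 8085
"""


def summarize(cfg_text):
    """Parse airflow.cfg-style text and return the settings edited above."""
    # interpolation=None keeps literal % characters (common in log formats)
    # from being treated as configparser interpolation.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    conn = urlsplit(parser.get("core", "sql_alchemy_conn"))
    return {
        "executor": parser.get("core", "executor"),
        "db_host": conn.hostname,
        "db_port": conn.port,
        "db_name": conn.path.lstrip("/"),
        "timezone": parser.get("core", "default_timezone"),
        "web_port": parser.getint("webserver", "web_server_port"),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_CFG).items():
        print(key, "=", value)
```

To check the real file, read its text from $AIRFLOW_HOME/airflow.cfg and pass that to summarize instead of SAMPLE_CFG.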

    Three more files need to be modified

    #1. Change the time shown in the top-right corner of the webserver page:

    vi ${PYTHON_HOME}/lib/python3.7/site-packages/airflow/www/templates/admin/master.html

    var UTCseconds = (x.getTime() + x.getTimezoneOffset()*60*1000);

            $("#clock").clock({

    "dateFormat":"Y-m-d ",

    "timeFormat":"H:i:s %UTC%",

    "timestamp":UTCseconds

            }).click(function(){

    alert('{{ hostname }}');

            });

    Change to:

    var UTCseconds = x.getTime();

            $("#clock").clock({

    "dateFormat":"Y-m-d ",

    "timeFormat":"H:i:s",

    "timestamp":UTCseconds

            }).click(function(){

    alert('{{ hostname }}');

            });

    #2. Modify airflow/utils/timezone.py

    # Add the following below the line utc = pendulum.timezone('UTC') (line 27)

    from airflow import configuration as conf
    try:
        tz = conf.get("core", "default_timezone")
        if tz == "system":
            utc = pendulum.local_timezone()
        else:
            utc = pendulum.timezone(tz)
    except Exception:
        pass

    # Modify the utcnow() function (at line 69)

    #d = dt.datetime.utcnow()

    d = dt.datetime.now()
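The swap from utcnow() to now() matters because utcnow() returns a naive UTC value, while now() returns the server's local wall-clock time. The sketch below illustrates the resulting eight-hour shift with the standard library only. It uses a fixed UTC+8 offset as a stand-in for Asia/Shanghai, whereas the patched Airflow code relies on pendulum; the helper to_shanghai is hypothetical.

```python
import datetime as dt

# Fixed UTC+8 offset standing in for Asia/Shanghai (no DST is observed there);
# the patched Airflow code uses pendulum time zones instead.
SHANGHAI = dt.timezone(dt.timedelta(hours=8), "Asia/Shanghai")


def to_shanghai(naive_utc):
    """Interpret a naive datetime as UTC and convert it to UTC+8."""
    return naive_utc.replace(tzinfo=dt.timezone.utc).astimezone(SHANGHAI)


if __name__ == "__main__":
    stamp = dt.datetime(2019, 11, 11, 16, 30, 0)  # naive value, read as UTC
    print("UTC:     ", stamp.isoformat())
    print("Shanghai:", to_shanghai(stamp).isoformat())
```

A late-afternoon UTC timestamp lands on the next calendar day in Shanghai, which is why schedules look shifted when the UI and the scheduler disagree about the time zone.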

    #3. Modify airflow/utils/sqlalchemy.py

    # Add the following below the line utc = pendulum.timezone('UTC') (line 37)

    from airflow import configuration as conf
    try:
        tz = conf.get("core", "default_timezone")
        if tz == "system":
            utc = pendulum.local_timezone()
        else:
            utc = pendulum.timezone(tz)
    except Exception:
        pass

    Re-initialize the database

    ./airflow initdb

    Start the service

    cd /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin

    ./airflow webserver -D

    Possible errors

    Error 1: startup may fail with FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn', meaning gunicorn cannot be found.

    When airflow webserver starts, it calls subprocess.Popen to create a child process. The webserver runs on gunicorn, launched with these arguments:

    ['gunicorn', '-w', '4', '-k', 'sync', '-t', '120', '-b', '0.0.0.0:8080', '-n', 'airflow-webserver', '-p', '/home/admin/airflow/airflow-webserver.pid', '-c', 'airflow.www.gunicorn_config', '--access-logfile', '-', '--error-logfile', '-', 'airflow.www.app:cached_app()']

    The gunicorn launch fails because the command cannot be found in PATH.

    Create a symlink for gunicorn (adjust the source path to wherever gunicorn is installed)

    ln -fs /home/admin/python3.6/bin/gunicorn/bin/gunicorn /bin/gunicorn

    Alternatively, add /usr/local/python3/bin to PATH: export PATH=$PATH:/usr/local/python3/bin

    # Take effect immediately

    source /etc/profile
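Whether the fix worked can be checked without starting the webserver, because the failure comes down to a PATH lookup for the name gunicorn. The sketch below performs roughly the same lookup with shutil.which; the helper find_command and its extra_dirs parameter are hypothetical.

```python
import os
import shutil


def find_command(name, extra_dirs=()):
    """Look `name` up on PATH, optionally searching extra directories too."""
    search_path = os.pathsep.join([os.environ.get("PATH", ""), *extra_dirs])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    print("on PATH:", find_command("gunicorn"))
    # Directory taken from the export line above; shows what adding it would change.
    print("with /usr/local/python3/bin:", find_command("gunicorn", ["/usr/local/python3/bin"]))
```

A result of None on the first line means the webserver would still fail with the same FileNotFoundError.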

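    Error 2: the service may fail to start; check the err log.

    The usual cause is an error saying a pid already exists. In that case, delete the airflow-webserver-monitor.pid file in the airflow directory.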
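Deleting a pid file is only safe when the process it names is gone. The following sketch, which assumes a POSIX system, removes a pid file only if it is stale. Both function names are hypothetical, and unreadable file content is treated as stale by choice.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_if_stale(pid_file):
    """Delete pid_file when it does not name a live process; return True if deleted."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return False
    except ValueError:
        pid = None  # unreadable content is treated as stale
    if pid is not None and pid > 0 and pid_is_running(pid):
        return False
    os.remove(pid_file)
    return True


if __name__ == "__main__":
    # Path built from the AIRFLOW_HOME used in this guide.
    print(remove_if_stale("/softwares/airflow/airflow-webserver-monitor.pid"))
```

If the function returns False while the file still exists, a webserver is probably still running and should be stopped rather than having its pid file removed.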

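    Start the other services (the worker and flower are only needed with the CeleryExecutor)

    ./airflow scheduler -D

    ./airflow worker -D

    # Start flower

    ./airflow flower -D

    The default port is 5555; enter "http://hostip:5555" in the browser address bar to open flower and monitor the Celery message queue.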
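Since the services are started as daemons with -D, a quick way to see whether they actually came up is to probe their ports. This is a minimal sketch using a plain TCP connection; port_is_open is a hypothetical helper, 8085 is the web_server_port set in airflow.cfg in this guide, and 5555 is the flower default mentioned above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("webserver", 8085), ("flower", 5555)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(name, port, state)
```

A closed port right after startup is not necessarily a failure, because the daemons can take a few seconds to bind; retry before checking the logs.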

    Set up the services to start at boot

    #1. Create the startup shell script

    cd /softwares/

    mkdir shellscripts

    cd shellscripts/

    touch startairflow.sh

    vi startairflow.sh

    #!/bin/bash
    # chkconfig: 2345 10 90
    # description: airflow boot-time startup script

    # A leftover pid file makes startup fail, so check for pid files before
    # starting the services and delete any that exist
    airflow_path="/softwares/airflow/"
    airflow_webserver_monitor_name="airflow-webserver-monitor.pid"
    airflow_webserver_pid_name="airflow-webserver.pid"
    airflow_scheduler_pid_name="airflow-scheduler.pid"
    airflow_worker_pid_name="airflow-worker.pid"

    if [ -d "$airflow_path" ]; then
        echo "$airflow_path exists"
        cd "$airflow_path"
        if [ -f "$airflow_webserver_monitor_name" ]; then
            echo "$airflow_webserver_monitor_name exists, deleting it"
            rm -f "$airflow_webserver_monitor_name"
        fi
        if [ -f "$airflow_webserver_pid_name" ]; then
            echo "$airflow_webserver_pid_name exists, deleting it"
            rm -f "$airflow_webserver_pid_name"
        fi
        if [ -f "$airflow_scheduler_pid_name" ]; then
            echo "$airflow_scheduler_pid_name exists, deleting it"
            rm -f "$airflow_scheduler_pid_name"
        fi
        if [ -f "$airflow_worker_pid_name" ]; then
            echo "$airflow_worker_pid_name exists, deleting it"
            rm -f "$airflow_worker_pid_name"
        fi
    fi

    # Change into the Python virtual environment
    cd /softwares/pyenv_for_airflow/airflow_env/bin

    # Activate the virtual environment
    source ./activate

    # Start the Airflow services
    /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow webserver -D
    /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow scheduler -D

    # LocalExecutor mode does not need a worker
    #/softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow worker -D

    #2. Copy the bash script to init.d

    sudo cp startairflow.sh /etc/init.d/startairflow

    #3. Register it as a startup service

    # Add execute permission

    cd /etc/init.d/

    sudo chmod +x startairflow

    # Enable automatic startup

    sudo chkconfig startairflow on

    # Check that it was added; runlevels 2345 showing "on" means it is set up correctly

    chkconfig --list

    Add the airflow command to the PATH system variable so that you no longer need to change into the airflow bin directory to run it

    sudo vi /etc/profile

    # Append the following to the end of the file

    export AIRFLOW_CLI_HOME=/usr/local/python3/lib/python3.7/site-packages/airflow/

    export PATH=$PATH:$AIRFLOW_CLI_HOME/bin

    # Take effect immediately

    source /etc/profile


  • Original article: https://www.cnblogs.com/xiongnanbin/p/11836366.html