  • AirFlow: Common Problems Roundup

    Troubleshooting notes for common Airflow problems:


    1. How to unpause a large number of DAGs in bulk

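    A few DAGs can be started with the airflow unpause dag_id command or by clicking the on/off toggle in the web UI. When there are many DAGs, however, unpausing them one at a time is tedious. DAG metadata is stored in the database, so you can unpause DAGs in bulk by updating the database directly. If MySQL is used as sql_alchemy_conn, log into the Airflow database and set the is_paused column of the dag table to 0 for the DAGs you want to start.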

    Example: update dag set is_paused = 0 where dag_id like 'benchmark%';
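    The same bulk update can be run from Python. This is a minimal sketch that assumes a DB-API connection to the metadata database. The demo uses an in-memory SQLite table as a stand-in for the real dag table, and that table has only the two columns the update touches. The helper name unpause_matching is hypothetical. With a MySQL driver such as PyMySQL, the placeholder is %s rather than ?.

```python
import sqlite3


def unpause_matching(conn, pattern):
    """Set is_paused = 0 for every DAG whose dag_id matches a SQL LIKE pattern.

    Returns the number of updated rows. The pattern is passed as a bound
    parameter, so the driver handles quoting.
    """
    cur = conn.execute("UPDATE dag SET is_paused = 0 WHERE dag_id LIKE ?", (pattern,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in for the Airflow metadata DB (hypothetical minimal schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [("benchmark_01", 1), ("benchmark_02", 1), ("etl_daily", 1)],
    )
    print(unpause_matching(conn, "benchmark%"))  # 2
```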


    2. The Airflow scheduler hangs (appears frozen) after executing one task

    This usually happens when the scheduler has generated tasks but cannot publish them to the broker, and the logs contain no error message.

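    A likely cause is that the library needed to connect to the broker is not installed:
    If Redis is the broker, run pip install 'apache-airflow[redis]'
    If RabbitMQ is the broker, run pip install 'apache-airflow[rabbitmq]'
    Also check that the scheduler node can actually reach the broker (e.g. RabbitMQ).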
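    Broker reachability from the scheduler node can be checked with a short script. This is a minimal sketch that opens a plain TCP connection to the host and port in the broker URL, such as the broker_url value from the [celery] section of airflow.cfg. It does not check credentials or the vhost. The helper names and the default-port table are assumptions made for illustration.

```python
import socket
from urllib.parse import urlsplit

# Ports assumed when the broker URL omits one (standard AMQP / Redis ports).
DEFAULT_PORTS = {"amqp": 5672, "pyamqp": 5672, "redis": 6379}


def broker_address(broker_url):
    """Extract (host, port) from a URL such as amqp://guest:guest@localhost:5672//."""
    parts = urlsplit(broker_url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


def broker_reachable(broker_url, timeout=3.0):
    """Return True if a TCP connection to the broker's host and port succeeds."""
    host, port = broker_address(broker_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    Run it on the scheduler host, for example broker_reachable("amqp://guest:guest@rabbit-host:5672//") with your own URL. If it returns False, the problem is the network or firewall, not the Python dependencies.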


    3. The Airflow scheduler runs slowly when there are many DAG files

    By default the Airflow scheduler runs two threads. This can be increased in airflow.cfg:

    [scheduler]
    # The scheduler can run multiple threads in parallel to schedule dags.
    # This defines how many threads will run.
    # Default is 2; raised to 100 here
    max_threads = 100
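    To see which value a given airflow.cfg actually sets, you can read it with Python's configparser. This is a sketch. It uses RawConfigParser because airflow.cfg contains %-style log format strings that would otherwise trigger interpolation. In Airflow 1.x this setting controls the parallelism of DAG file parsing, so the value is worth sizing against the scheduler host's CPU cores. The suggested_threads cap below is a hypothetical heuristic and not an Airflow rule.

```python
import configparser
import os


def scheduler_threads(cfg_text, default=2):
    """Return [scheduler] max_threads from airflow.cfg contents, or `default` if unset."""
    parser = configparser.RawConfigParser()  # no %-interpolation
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "max_threads", fallback=default)


def suggested_threads(requested):
    """Hypothetical heuristic: cap the requested value at the number of CPU cores."""
    return max(1, min(requested, os.cpu_count() or 1))
```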
    

    4. Changing the Airflow log level

    $ vi airflow.cfg
    
    [core]
    #logging_level = INFO
    logging_level = WARNING
    

    NOTSET < DEBUG < INFO < WARNING < ERROR < CRITICAL

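    If the log level is set to INFO, messages below INFO are suppressed and messages at INFO or above are output. In other words, the higher the level, the less detailed the log. Airflow's default logging_level is INFO, which is the commented-out line above. WARNING is the default only for Python's root logger.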

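    Note: if logging_level is set to WARNING or higher, the change affects command-line output as well as the log files, and only messages at or above that level are printed. So if command-line output looks incomplete and no errors are logged, the log level is probably set too high.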
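    Airflow logs through Python's standard logging module, so the threshold rule above can be checked in plain Python. This sketch logs one message at each level to a private logger and returns the level names that get through. It is not Airflow's own logging configuration, and emitted_levels is a hypothetical helper.

```python
import io
import logging


def emitted_levels(threshold):
    """Log one message per standard level and return the names that pass `threshold`."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{threshold}")
    logger.propagate = False  # keep demo output away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.handlers = [handler]
    logger.setLevel(threshold)
    for level in (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL):
        logger.log(level, "message")
    return stream.getvalue().split()


print(emitted_levels(logging.INFO))     # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
print(emitted_levels(logging.WARNING))  # ['WARNING', 'ERROR', 'CRITICAL']
```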


    5. AirFlow: jinja2.exceptions.TemplateNotFound

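    This pitfall comes from Airflow's use of Jinja2 as its template engine. BashOperator treats a bash_command string that ends in .sh (or .bash) as the path of a template file and tries to load it through Jinja, which fails if the file is not on the template search path. When calling a script directly, append a space to the end of the command: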

    • As explained in the Stack Overflow answer referenced below: you need to add a space after the script name when you call a bash script directly in the bash_command attribute of BashOperator. Airflow tries to apply a Jinja template to it, and that fails.
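    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    t2 = BashOperator(
        task_id='sleep',
        # bash_command="/home/batcher/test.sh",  # fails with "Jinja template not found"
        bash_command="/home/batcher/test.sh ",   # works (note the trailing space)
        dag=dag)  # `dag` is the DAG object defined earlier in the file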
    

    References:

    https://stackoverflow.com/questions/42147514/templatenotfound-error-when-running-simple-airflow-bashoperator

    https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
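    The reason the trailing space helps can be shown with a simplified illustration. This is not Airflow's actual source. Airflow checks whether a templated string field ends with one of the operator's template_ext suffixes, and for BashOperator those are .sh and .bash. If it does, the string is treated as a template file name instead of an inline command. The helper name treated_as_template_file is hypothetical.

```python
# Suffixes BashOperator treats as template files (its template_ext).
BASH_TEMPLATE_EXT = (".sh", ".bash")


def treated_as_template_file(bash_command, template_ext=BASH_TEMPLATE_EXT):
    """Return True if the command string would be loaded as a Jinja template file."""
    return bash_command.endswith(template_ext)


print(treated_as_template_file("/home/batcher/test.sh"))   # True  -> TemplateNotFound
print(treated_as_template_file("/home/batcher/test.sh "))  # False -> run as a command
```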


    6. AirFlow: Task is not able to be run

    After running for a while, a task suddenly stops being executed, and the worker log shows:

    [2018-05-25 17:22:05,068] {jobs.py:2508} INFO - Task is not able to be run
    

    Check the task's execution log:

    cat /home/py/airflow-home/logs/testBashOperator/print_date/2018-05-25T00:00:00/6.log
    ...
    [2018-05-25 17:22:05,067] {models.py:1190} INFO - Dependencies not met for <TaskInstance: testBashOperator.print_date 2018-05-25 00:00:00 [success]>, 
    dependency 'Task Instance State' FAILED: Task is in the 'success' state which is not a valid state for execution. The task must be cleared in order to be run.
    

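    The message shows that the 'Task Instance State' dependency check failed. The task is already in the success state, which is not a valid state for execution, so it must be cleared before it can run again. There are two ways to handle this: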

    • Run the task with airflow run and ignore all dependencies (-A):

      $ airflow run -A dag_id task_id execution_date
      
    • Clear the task state with airflow clear (here -u also clears upstream tasks):

      $ airflow clear -u testBashOperator
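    To clear tasks from a script, you can build the command line in Python. This is a minimal sketch that assumes the Airflow 1.x CLI, where clear takes -t (task regex), -u (upstream) and -c (skip the confirmation prompt). Airflow 2 renamed the command to airflow tasks clear. The helper names are hypothetical.

```python
import subprocess


def clear_command(dag_id, task_regex=None, upstream=False, no_confirm=True):
    """Build argv for an Airflow 1.x `airflow clear` call."""
    argv = ["airflow", "clear"]
    if task_regex:
        argv += ["-t", task_regex]  # only clear tasks whose id matches this regex
    if upstream:
        argv.append("-u")           # also clear upstream tasks
    if no_confirm:
        argv.append("-c")           # do not ask for interactive confirmation
    argv.append(dag_id)
    return argv


def clear_tasks(dag_id, **kwargs):
    """Run the clear command; raises CalledProcessError if airflow exits non-zero."""
    subprocess.run(clear_command(dag_id, **kwargs), check=True)
```

    For example, clear_tasks("testBashOperator", task_regex="print_date") clears just the print_date task without prompting.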
      

    7. CELERY: PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue 'celery@xxxx.celery.pidbox' in vhost ''

    After upgrading to Celery 4.x, running tasks with RabbitMQ as the broker throws the following exception:

    [2018-06-29 09:32:14,622: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, "PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue 'celery@PQSZ-L01395.celery.pidbox' in vhost '/': received the value '10000' of type 'signedint' but current is none", (50, 10), 'Queue.declare')
    Traceback (most recent call last):
      File "c:\programdata\anaconda3\lib\site-packages\celery\worker\worker.py", line 205, in start
        self.blueprint.start(self)
    .......
      File "c:\programdata\anaconda3\lib\site-packages\amqp\channel.py", line 277, in _on_close
        reply_code, reply_text, (class_id, method_id), ChannelError,
    amqp.exceptions.PreconditionFailed: Queue.declare: (406) PRECONDITION_FAILED - inequivalent arg 'x-expires' for queue 'celery@PQSZ-L01395.celery.pidbox' in vhost '/': received the value '10000' of type 'signedint' but current is none
    

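    This error is generally caused by a mismatch between the queue arguments the client (Celery) declares and the arguments of the queue that already exists on the RabbitMQ server. Making them consistent fixes it.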

    In this case the error mentions x-expires, which corresponds to Celery's control_queue_expires setting. Adding control_queue_expires = None to the configuration file is enough to fix it.

    Celery 3.x has neither of these settings. In 4.x they must match the existing queue, otherwise the exception above is thrown.

    The two RabbitMQ arguments I ran into map to Celery settings as follows:

    RabbitMQ argument    Celery 4.x setting
    x-expires            control_queue_expires
    x-message-ttl        control_queue_ttl
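    The '10000' in the error comes from a unit difference. The Celery settings are in seconds, and the RabbitMQ queue arguments are in milliseconds. Celery 4.x documents a default of 10 seconds for control_queue_expires and 300 seconds for control_queue_ttl. The sketch below is a simplified illustration of that mapping, not kombu's actual code, and control_queue_arguments is a hypothetical helper. Setting both to None declares the queue without these arguments, which matches a queue that was created without them.

```python
# Celery 4.x setting -> RabbitMQ queue argument (per the table above).
SETTING_TO_QUEUE_ARG = {
    "control_queue_expires": "x-expires",
    "control_queue_ttl": "x-message-ttl",
}


def control_queue_arguments(settings):
    """Map Celery control-queue settings (seconds) to RabbitMQ arguments (ms), skipping None."""
    args = {}
    for setting, arg in SETTING_TO_QUEUE_ARG.items():
        value = settings.get(setting)
        if value is not None:
            args[arg] = int(value * 1000)
    return args


print(control_queue_arguments({"control_queue_expires": 10.0, "control_queue_ttl": 300.0}))
# {'x-expires': 10000, 'x-message-ttl': 300000}
print(control_queue_arguments({"control_queue_expires": None, "control_queue_ttl": None}))
# {}
```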

    8. CELERY: The AMQP result backend is scheduled for deprecation in version 4.0 and removal in version v5.0. Please use RPC backend or a persistent backend

    After upgrading Celery to 4.x, running it emits the following deprecation warning:

    /anaconda/anaconda3/lib/python3.6/site-packages/celery/backends/amqp.py:67: CPendingDeprecationWarning: 
        The AMQP result backend is scheduled for deprecation in     version 4.0 and removal in version v5.0.     Please use RPC backend or a persistent backend.
      alternative='Please use RPC backend or a persistent backend.')
    

    Cause:
    In Celery 4.0 the way to configure result_backend with RabbitMQ changed.
    Previously it used the same AMQP URL as the broker:
    result_backend = 'amqp://guest:guest@localhost:5672//'
    Now the equivalent is the RPC backend:
    result_backend = 'rpc://'
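    This migration can be caught before startup with a small check on the configured value. This is a hypothetical helper. It only inspects the URL scheme and flags the deprecated amqp:// result backend.

```python
from urllib.parse import urlsplit


def result_backend_warning(result_backend):
    """Return a hint if result_backend uses the deprecated amqp:// backend, else None."""
    if urlsplit(result_backend).scheme == "amqp":
        return "amqp result backend is deprecated in Celery 4.x; use 'rpc://' instead"
    return None


print(result_backend_warning("amqp://guest:guest@localhost:5672//"))
print(result_backend_warning("rpc://"))  # None
```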

    Reference:
    http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-event_queue_prefix


    9. CELERY: ValueError('not enough values to unpack (expected 3, got 0)',)

    Running Celery 4.x on Windows throws the following error:

    [2018-07-02 10:54:17,516: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)',)
    Traceback (most recent call last):
    	......
        tasks, accept, hostname = _loc
    ValueError: not enough values to unpack (expected 3, got 0)
    
    

    Celery 4.x does not currently support Windows. For debugging purposes, you can replace Celery's worker pool implementation to get it running on Windows:

    pip install eventlet
    
    celery -A <module> worker -l info -P eventlet
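    If the same launcher script is used on Windows and Linux, you can switch the pool by platform. This is a minimal sketch that only builds the command line shown above. The module name passed as app_module stands in for the <module> placeholder, and worker_argv is a hypothetical helper.

```python
import sys


def worker_argv(app_module, platform=sys.platform):
    """Build a `celery worker` command line, using the eventlet pool on Windows."""
    argv = ["celery", "-A", app_module, "worker", "-l", "info"]
    if platform.startswith("win"):
        # Debugging workaround for Celery 4.x on Windows; requires `pip install eventlet`.
        argv += ["-P", "eventlet"]
    return argv
```

    Pass the result to subprocess.run(worker_argv("your_module")) to start the worker.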
    

    References:

    https://stackoverflow.com/questions/45744992/celery-raises-valueerror-not-enough-values-to-unpack

    https://blog.csdn.net/qq_30242609/article/details/79047660


    10. Airflow: ERROR - 'DisabledBackend' object has no attribute '_get_task_meta_for'

    Airflow throws the following exception while running:

    Traceback (most recent call last):
      File "/anaconda/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 83, in sync
    ......
        return self._maybe_set_cache(self.backend.get_task_meta(self.id))
      File "/anaconda/anaconda3/lib/python3.6/site-packages/celery/backends/base.py", line 307, in get_task_meta
        meta = self._get_task_meta_for(task_id)
    AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
    [2018-07-04 10:52:14,746] {celery_executor.py:101} ERROR - Error syncing the celery executor, ignoring it:
    [2018-07-04 10:52:14,746] {celery_executor.py:102} ERROR - 'DisabledBackend' object has no attribute '_get_task_meta_for'
    

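    There are two possible causes for this error:

    1. The CELERY_RESULT_BACKEND setting is missing or misconfigured;
    2. The Celery version is too old. For example, Airflow 1.9.0 requires Celery 4.x, so check the Celery version and make sure the two are compatible.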
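    The version check in cause 2 can be scripted. This is a minimal sketch that assumes the requirement is expressed only as a minimum major version, which is 4 for Airflow 1.9.0 as stated above. The helper names are hypothetical. It reads celery.__version__ when Celery is importable.

```python
def celery_major(version):
    """Return the major version number from a string like '4.1.1'."""
    return int(version.split(".")[0])


def celery_is_compatible(version, required_major=4):
    """True if the Celery major version is at least `required_major`."""
    return celery_major(version) >= required_major


def installed_celery_version():
    """Return the installed Celery version string, or None if Celery is not installed."""
    try:
        import celery
    except ImportError:
        return None
    return celery.__version__
```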

    11. airflow.exceptions.AirflowException dag_id could not be found xxxx. Either the dag did not exist or it failed to parse

    Check the worker log airflow-worker.err:

    airflow.exceptions.AirflowException: dag_id could not be found: bmhttp. Either the dag did not exist or it failed to parse.
    [2018-07-31 17:37:34,191: ERROR/ForkPoolWorker-6] Task airflow.executors.celery_executor.execute_command[181c78d0-242c-4265-aabe-11d04887f44a] raised unexpected: AirflowException('Celery command failed',)
    Traceback (most recent call last):
      File "/anaconda/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
        subprocess.check_call(command, shell=True)
      File "/anaconda/anaconda3/lib/python3.6/subprocess.py", line 291, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'airflow run bmhttp get_op1 2018-07-26T06:28:00 --local -sd /home/ignite/airflow/dags/BenchMark01.py' returned non-zero exit status 1.
    

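    The Command in the exception log shows that when the scheduler generates a task message, it also specifies the path of the DAG script to execute (via the -sd parameter). This means the DAG files must sit at the same path on the scheduler node and on the worker nodes, otherwise the error above occurs.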
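    On a worker you can check whether the file the scheduler refers to actually exists. This is a minimal sketch that extracts the -sd path from the command string in the log and tests that path locally. The helper names are hypothetical, and the command is the one from the log above.

```python
import os
import shlex


def dag_file_from_command(command):
    """Return the DAG file path passed via -sd in an `airflow run` command, or None."""
    args = shlex.split(command)
    if "-sd" in args:
        idx = args.index("-sd")
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


def dag_file_present(command):
    """True if the -sd DAG file named by the scheduler exists on this (worker) node."""
    path = dag_file_from_command(command)
    return path is not None and os.path.isfile(path)


cmd = ("airflow run bmhttp get_op1 2018-07-26T06:28:00 --local "
       "-sd /home/ignite/airflow/dags/BenchMark01.py")
print(dag_file_from_command(cmd))  # /home/ignite/airflow/dags/BenchMark01.py
```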

    Reference:

    https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found


    12. Airflow REST API call returns "Airflow 404 = lots of circles"

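    This error occurs because the URL does not include the origin parameter, which is used for the redirect. For example, a working call to Airflow's /run endpoint looks like this: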

    http://localhost:8080/admin/airflow/run?dag_id=example_hello_world_dag&task_id=sleep_task&execution_date=20180807&ignore_all_deps=true&origin=/admin
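    Building the URL programmatically makes it harder to forget origin. This is a minimal sketch with urllib.parse.urlencode. It reproduces the example URL above, keeping "/" unescaped so the result matches it exactly. The helper name run_task_url is hypothetical.

```python
from urllib.parse import urlencode


def run_task_url(base, dag_id, task_id, execution_date, origin="/admin"):
    """Build the /admin/airflow/run URL, always including the origin redirect parameter."""
    query = urlencode(
        {
            "dag_id": dag_id,
            "task_id": task_id,
            "execution_date": execution_date,
            "ignore_all_deps": "true",
            "origin": origin,  # missing origin is what triggers the 404 page
        },
        safe="/",
    )
    return f"{base}/admin/airflow/run?{query}"
```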

  • Original article: https://www.cnblogs.com/cord/p/9397584.html