  • Apache Dolphin Scheduler

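    Apache DolphinScheduler (DS for short) is a distributed, decentralized, easily extensible visual DAG workflow scheduling system. It consists of a Web UI and several services, and depends on PostgreSQL and Zookeeper. Its own service modules are api, alert, master, and worker (a logger service runs inside the worker). For deployment details, see: Deploying Dolphin Scheduler with Docker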

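    The project ships an official docker-compose.yml in its docker/docker-swarm/ directory. This article uses v1.3.8 as the example and walks through the contents of that docker-compose.yml. This version of the Compose file is based on the apache/dolphinscheduler:1.3.8 Docker image. For how the DS Docker image is built, see my earlier post: Apache Dolphin Scheduler - Dockerfile Explained. The main configuration changes and the startup flow are encapsulated in the Dockerfile.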

    Docker Compose

    version: "3.1"
    
    services:
      
      # PostgreSQL
      dolphinscheduler-postgresql:
        image: postgres:11.12
        environment:
          # Set the time zone
          TZ: Asia/Shanghai
          # PostgreSQL settings
          POSTGRES_USER: root
          POSTGRES_PASSWORD: root
          POSTGRES_DB: dolphinscheduler
        # Data volume
        volumes:
        - dolphinscheduler-postgresql:/var/lib/postgresql/data
        # Restart policy: restart the container when it exits, unless it was explicitly stopped
        restart: unless-stopped
        # Network configuration
        networks:
        - dolphinscheduler
    
      # Zookeeper
      dolphinscheduler-zookeeper:
        image: zookeeper:3.6.3
        environment:
          TZ: Asia/Shanghai
          # Zookeeper settings
          ZOO_DATA_LOG_DIR: /data
          ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
        volumes:
        - dolphinscheduler-zookeeper:/data
        restart: unless-stopped
        networks:
        - dolphinscheduler
    
      # DS service modules
      dolphinscheduler-api:
        image: apache/dolphinscheduler:1.3.8
        command: api-server
        ports:
        - 12345:12345
        environment:
          TZ: Asia/Shanghai
        # Load environment variables from an external file
        env_file: config.env.sh
        # Health check
        healthcheck:
          test: ["CMD", "/root/checkpoint.sh", "ApiApplicationServer"]
          interval: 30s
          timeout: 5s
          retries: 3
        # Depends on PostgreSQL and Zookeeper
        depends_on:
        - dolphinscheduler-postgresql
        - dolphinscheduler-zookeeper
        volumes:
        - dolphinscheduler-logs:/opt/dolphinscheduler/logs
        - dolphinscheduler-shared-local:/opt/soft
        - dolphinscheduler-resource-local:/dolphinscheduler
        restart: unless-stopped
        networks:
        - dolphinscheduler
    
      dolphinscheduler-alert:
        image: apache/dolphinscheduler:1.3.8
        command: alert-server
        environment:
          TZ: Asia/Shanghai
        env_file: config.env.sh
        healthcheck:
          test: ["CMD", "/root/checkpoint.sh", "AlertServer"]
          interval: 30s
          timeout: 5s
          retries: 3
        depends_on:
        - dolphinscheduler-postgresql
        volumes:
        - dolphinscheduler-logs:/opt/dolphinscheduler/logs
        restart: unless-stopped
        networks:
        - dolphinscheduler
    
      dolphinscheduler-master:
        image: apache/dolphinscheduler:1.3.8
        command: master-server
        environment:
          TZ: Asia/Shanghai
        env_file: config.env.sh
        healthcheck:
          test: ["CMD", "/root/checkpoint.sh", "MasterServer"]
          interval: 30s
          timeout: 5s
          retries: 3
        depends_on:
        - dolphinscheduler-postgresql
        - dolphinscheduler-zookeeper
        volumes:
        - dolphinscheduler-logs:/opt/dolphinscheduler/logs
        - dolphinscheduler-shared-local:/opt/soft
        restart: unless-stopped
        networks:
        - dolphinscheduler
    
      dolphinscheduler-worker:
        image: apache/dolphinscheduler:1.3.8
        command: worker-server
        environment:
          TZ: Asia/Shanghai
        env_file: config.env.sh
        healthcheck:
          test: ["CMD", "/root/checkpoint.sh", "WorkerServer"]
          interval: 30s
          timeout: 5s
          retries: 3
        depends_on:
        - dolphinscheduler-postgresql
        - dolphinscheduler-zookeeper
        volumes:
        - dolphinscheduler-worker-data:/tmp/dolphinscheduler
        - dolphinscheduler-logs:/opt/dolphinscheduler/logs
        - dolphinscheduler-shared-local:/opt/soft
        - dolphinscheduler-resource-local:/dolphinscheduler
        restart: unless-stopped
        networks:
        - dolphinscheduler
    # Declare the networks used
    networks:
      dolphinscheduler:
        driver: bridge
    
    # Declare the volumes used
    volumes:
      dolphinscheduler-postgresql:
      dolphinscheduler-zookeeper:
      dolphinscheduler-worker-data:
      dolphinscheduler-logs:
      dolphinscheduler-shared-local:
      dolphinscheduler-resource-local:
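
    The depends_on entries in the file above only control start order: in this short list form, Compose starts the listed dependencies first but does not wait for them to become ready. As a rough illustration, the sketch below derives one valid start order from those entries with Python's standard graphlib module (Python 3.9+). The dictionary is transcribed by hand from the YAML, and the names DEPENDS_ON and start_order are illustrative only.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, copied from the depends_on lists above.
DEPENDS_ON = {
    "dolphinscheduler-postgresql": [],
    "dolphinscheduler-zookeeper": [],
    "dolphinscheduler-api": ["dolphinscheduler-postgresql", "dolphinscheduler-zookeeper"],
    "dolphinscheduler-alert": ["dolphinscheduler-postgresql"],
    "dolphinscheduler-master": ["dolphinscheduler-postgresql", "dolphinscheduler-zookeeper"],
    "dolphinscheduler-worker": ["dolphinscheduler-postgresql", "dolphinscheduler-zookeeper"],
}


def start_order(depends_on):
    """Return one order in which every service comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


for name in start_order(DEPENDS_ON):
    print(name)
```

    Several orders are valid; the only guarantee is that PostgreSQL and Zookeeper appear before the DS services that list them.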
    

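    Every service defines the TZ environment variable, which sets the container time zone to Asia/Shanghai. The restart policy of every service is unless-stopped: the container is restarted whenever it exits, unless it has been explicitly stopped.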

    The networks and volumes used by the Compose file are declared at the end of the yml.

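    All services share a single network, dolphinscheduler, whose driver is set to bridge. bridge is already the default driver; it is used when applications are deployed in separate containers that need to communicate with each other.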

    Each DS service module imports the same standalone environment variable file, config.env.sh, through env_file.

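    healthcheck defines the health check: it calls checkpoint.sh inside the container with the service name as its argument to check whether that Java process exists. The interval between two health checks is 30s and the timeout is 5s; a check that runs longer than the timeout counts as failed. retries is set to 3, so after 3 consecutive failures the container status is reported as unhealthy.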
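
    The retries rule can be illustrated with a small model. This is a simplified sketch of the status transitions only, not Docker's implementation: it ignores start_period and timing, and the function name health_states is made up for illustration.

```python
def health_states(results, retries=3):
    """Yield the health status after each probe result.

    results: iterable of booleans, True when the probe succeeded
             (exit code 0 within the timeout), False otherwise.
    """
    status = "starting"   # status before any probe has succeeded
    failures = 0          # current run of consecutive failures
    for ok in results:
        if ok:
            failures = 0
            status = "healthy"
        else:
            failures += 1
            if failures >= retries:
                status = "unhealthy"
        yield status


# One success, two failures, a success that resets the streak,
# then three failures in a row.
probes = [True, False, False, True, False, False, False]
print(list(health_states(probes)))
```

    In this model a single success resets the failure count, so only an unbroken run of 3 failed checks marks the container as unhealthy.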

    PostgreSQL: POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB define the PostgreSQL user name, the password, and a database named dolphinscheduler, respectively.

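    Zookeeper: the environment variable ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons adds these four four-letter commands to the whitelist so that they can be used. Commands outside the whitelist are rejected with a message such as: stat is not executed because it is not in the whitelist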
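
    A four-letter command is sent as plain text over the ZooKeeper client port, and the server replies and closes the connection. The following is a minimal sketch of such a client using only the standard socket module. The helper names exchange and four_letter_word are hypothetical, and port 2181 is taken from ZOOKEEPER_QUORUM in config.env.sh.

```python
import socket


def exchange(sock, command):
    """Send one four-letter command on a connected socket and read until EOF."""
    sock.sendall(command.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server closes the connection after replying
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def four_letter_word(command, host, port=2181, timeout=5.0):
    """Connect to a ZooKeeper server, send the command, return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, command)
```

    The Compose file above does not publish port 2181 to the host, so this would have to run from a container attached to the dolphinscheduler network, for example four_letter_word("ruok", "dolphinscheduler-zookeeper"), which a healthy server answers with imok.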

    Configuration file (config.env.sh)

    #============================================================================
    # Database
    #============================================================================
    # postgresql
    DATABASE_TYPE=postgresql
    DATABASE_DRIVER=org.postgresql.Driver
    DATABASE_HOST=dolphinscheduler-postgresql
    DATABASE_PORT=5432
    DATABASE_USERNAME=root
    DATABASE_PASSWORD=root
    DATABASE_DATABASE=dolphinscheduler
    DATABASE_PARAMS=characterEncoding=utf8
    # mysql
    # DATABASE_TYPE=mysql
    # DATABASE_DRIVER=com.mysql.jdbc.Driver
    # DATABASE_HOST=dolphinscheduler-mysql
    # DATABASE_PORT=3306
    # DATABASE_USERNAME=root
    # DATABASE_PASSWORD=root
    # DATABASE_DATABASE=dolphinscheduler
    # DATABASE_PARAMS=useUnicode=true&characterEncoding=UTF-8
    
    #============================================================================
    # ZooKeeper
    #============================================================================
    ZOOKEEPER_QUORUM=dolphinscheduler-zookeeper:2181
    ZOOKEEPER_ROOT=/dolphinscheduler
    
    #============================================================================
    # Common
    #============================================================================
    # common opts
    DOLPHINSCHEDULER_OPTS=
    # common env
    DATA_BASEDIR_PATH=/tmp/dolphinscheduler
    RESOURCE_STORAGE_TYPE=HDFS
    RESOURCE_UPLOAD_PATH=/dolphinscheduler
    FS_DEFAULT_FS=file:///
    FS_S3A_ENDPOINT=s3.xxx.amazonaws.com
    FS_S3A_ACCESS_KEY=xxxxxxx
    FS_S3A_SECRET_KEY=xxxxxxx
    HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE=false
    JAVA_SECURITY_KRB5_CONF_PATH=/opt/krb5.conf
    LOGIN_USER_KEYTAB_USERNAME=hdfs@HADOOP.COM
    LOGIN_USER_KEYTAB_PATH=/opt/hdfs.keytab
    KERBEROS_EXPIRE_TIME=2
    HDFS_ROOT_USER=hdfs
    RESOURCE_MANAGER_HTTPADDRESS_PORT=8088
    YARN_RESOURCEMANAGER_HA_RM_IDS=
    YARN_APPLICATION_STATUS_ADDRESS=http://ds1:8088/ws/v1/cluster/apps/%s
    # skywalking
    SKYWALKING_ENABLE=false
    SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
    SW_GRPC_LOG_SERVER_HOST=127.0.0.1
    SW_GRPC_LOG_SERVER_PORT=11800
    # dolphinscheduler env
    HADOOP_HOME=/opt/soft/hadoop
    HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
    SPARK_HOME1=/opt/soft/spark1
    SPARK_HOME2=/opt/soft/spark2
    PYTHON_HOME=/usr/bin/python
    JAVA_HOME=/usr/local/openjdk-8
    HIVE_HOME=/opt/soft/hive
    FLINK_HOME=/opt/soft/flink
    DATAX_HOME=/opt/soft/datax
    
    #============================================================================
    # Master Server
    #============================================================================
    MASTER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
    MASTER_EXEC_THREADS=100
    MASTER_EXEC_TASK_NUM=20
    MASTER_DISPATCH_TASK_NUM=3
    MASTER_HOST_SELECTOR=LowerWeight
    MASTER_HEARTBEAT_INTERVAL=10
    MASTER_TASK_COMMIT_RETRYTIMES=5
    MASTER_TASK_COMMIT_INTERVAL=1000
    MASTER_MAX_CPULOAD_AVG=-1
    MASTER_RESERVED_MEMORY=0.3
    
    #============================================================================
    # Worker Server
    #============================================================================
    WORKER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
    WORKER_EXEC_THREADS=100
    WORKER_HEARTBEAT_INTERVAL=10
    WORKER_MAX_CPULOAD_AVG=-1
    WORKER_RESERVED_MEMORY=0.3
    WORKER_GROUPS=default
    
    #============================================================================
    # Alert Server
    #============================================================================
    ALERT_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m
    # xls file
    XLS_FILE_PATH=/tmp/xls
    # mail
    MAIL_SERVER_HOST=
    MAIL_SERVER_PORT=
    MAIL_SENDER=
    MAIL_USER=
    MAIL_PASSWD=
    MAIL_SMTP_STARTTLS_ENABLE=true
    MAIL_SMTP_SSL_ENABLE=false
    MAIL_SMTP_SSL_TRUST=
    # wechat
    ENTERPRISE_WECHAT_ENABLE=false
    ENTERPRISE_WECHAT_CORP_ID=
    ENTERPRISE_WECHAT_SECRET=
    ENTERPRISE_WECHAT_AGENT_ID=
    ENTERPRISE_WECHAT_USERS=
    
    #============================================================================
    # Api Server
    #============================================================================
    API_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m
    
    #============================================================================
    # Logger Server
    #============================================================================
    LOGGER_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m
    

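    config.env.sh defines the configuration values used. They are passed into the containers through env_file and override the default configuration inside the container.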
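
    Despite the .sh suffix, a file referenced by env_file is read by Compose as KEY=VALUE lines rather than executed as a shell script. The sketch below shows the basic reading rules (blank lines and # comments skipped, split on the first =) on a few lines taken from the file above. It is a simplified illustration, not Compose's actual parser, and does not model quoting or variable interpolation; parse_env_file and SAMPLE are illustrative names.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # a bare name without '=' is ignored in this sketch
        env[key.strip()] = value
    return env


SAMPLE = """\
# postgresql
DATABASE_TYPE=postgresql
DATABASE_PARAMS=characterEncoding=utf8
MASTER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
MAIL_SERVER_HOST=
"""

for key, value in parse_env_file(SAMPLE).items():
    print(f"{key} -> {value!r}")
```

    Splitting on the first = matters for entries such as DATABASE_PARAMS, whose value contains another =, and an entry such as MAIL_SERVER_HOST= yields an empty string.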


  • Original article: https://www.cnblogs.com/aaronlinv/p/15309275.html