  • Installing DolphinScheduler on Linux

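    Download the installation package

    Download it from the official site: https://dolphinscheduler.apache.org/zh-cn/download/download.html

    Refer to the official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/1.3.2/user_doc/cluster-deployment.html

    I downloaded version 1.3.2: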
    apache-dolphinscheduler-incubating-1.3.2-dolphinscheduler-bin.tar.gz

    Base environment

    OS version:        CentOS 6.5
    Regular user:      hadoop
    Home directory:    /hadoop
    JDK 1.8:           /hadoop/app/jdk1.8.0_281
    MySQL 5.7.27:      /hadoop/app/mysql
    zookeeper-3.5.6:   /hadoop/app/zookeeper-3.5.6
    hadoop-2.7.7:      /hadoop/app/hadoop-2.7.7

    Host IPs and hostnames
    192.168.100.10 bigdata01
    192.168.100.11 bigdata02
    192.168.100.12 bigdata03

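    Configure passwordless sudo

    As root, configure passwordless sudo for the hadoop user on every machine:

    vi /etc/sudoers

    Add:

    hadoop ALL=(ALL) NOPASSWD: ALL

    Comment out:
    # Defaults    requiretty

    Or:

    echo 'hadoop  ALL=(ALL)  NOPASSWD: ALL' >> /etc/sudoers
    sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

    Notes:
    DolphinScheduler runs jobs for multiple users by switching Linux users with sudo -u {linux-user}, so the deployment user needs sudo privileges, and they must be passwordless.
    If /etc/sudoers contains the line "Defaults requiretty", it must be commented out.
    If you use resource upload, the deployment user also needs read and write permissions on `HDFS or MinIO`.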
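
    The requiretty edit can be tried on a scratch copy before the real file is touched. The following is a minimal sketch, not part of DolphinScheduler; the helper name comment_requiretty and the scratch path in the usage comment are assumptions made for illustration.

```shell
# Hypothetical helper: comment out an active "Defaults requiretty" line
# in the file given as $1. Intended for a scratch copy, not /etc/sudoers itself.
comment_requiretty() {
    tmp=$(mktemp) || return 1
    # Match optional leading whitespace, "Defaults", whitespace, "requiretty",
    # and prefix the match with '#'. Lines that are already commented do not match.
    if sed 's/^[[:space:]]*Defaults[[:space:]][[:space:]]*requiretty/#&/' "$1" > "$tmp"; then
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Example usage (as root):
#   cp /etc/sudoers /tmp/sudoers.new
#   comment_requiretty /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new   # syntax check before installing the copy
```

    Unlike a plain append or an unanchored sed, this can be run repeatedly without stacking extra '#' characters on the line.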

    Configure hostnames

    As root, add the hostname entries on every machine:

    vi /etc/hosts

    192.168.100.10 bigdata01
    192.168.100.11 bigdata02
    192.168.100.12 bigdata03
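
    When the same entries have to be added on several machines, a small idempotent helper avoids duplicate lines. This is only a sketch; add_host_entry is a hypothetical name, and it treats any existing whole-word occurrence of the hostname in the file as "already present".

```shell
# Hypothetical helper: append "IP NAME" to the hosts file $1 unless NAME
# already appears in it as a whole word. Run as root when $1 is /etc/hosts.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -F: fixed string, -w: whole-word match, -q: quiet
    if grep -qwF -- "$name" "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example usage (as root):
#   add_host_entry /etc/hosts 192.168.100.10 bigdata01
#   add_host_entry /etc/hosts 192.168.100.11 bigdata02
#   add_host_entry /etc/hosts 192.168.100.12 bigdata03
```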

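    Configure passwordless SSH

    As the hadoop user, configure passwordless SSH on all three machines:

    ssh-keygen -t rsa -m PEM

    Press Enter at every prompt to accept the defaults. This generates the public key file (id_rsa.pub) and the private key file (id_rsa) in the .ssh directory under the current user's home directory.

    Distribute the public key: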

    ssh-copy-id 192.168.100.10
    ssh-copy-id 192.168.100.11
    ssh-copy-id 192.168.100.12

    Note: once this is set up correctly, ssh bigdata01 no longer asks for a password.
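
    One way to confirm this for every node at once is a non-interactive login test. The sketch below assumes a hypothetical helper name, check_passwordless_ssh; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical helper: report which of the given hosts accept key-based login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example usage (as the hadoop user, on each machine):
#   check_passwordless_ssh bigdata01 bigdata02 bigdata03
```

    A host that has never been contacted under that name also fails in batch mode, because its host key cannot be confirmed interactively; connecting once by hand resolves that.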

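    Configure the Java environment

    The hadoop user already has the JDK installed at /hadoop/app/jdk1.8.0_281.
    Symlink its java binary to /bin/java.

    An OpenJDK symlink already exists there, so it has to be replaced with root privileges: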

    sudo ln -snf /hadoop/app/jdk1.8.0_281/bin/java /bin/java
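
    To make sure the link really points at the intended JDK afterwards, the replace and the check can be combined. A minimal sketch; relink is a hypothetical helper name.

```shell
# Hypothetical helper: point the link path $2 at target $1, then confirm it.
# -s: symbolic link, -n: treat an existing link to a directory as a plain file,
# -f: replace an existing link (such as the OpenJDK one).
relink() {
    ln -snf "$1" "$2" && [ "$(readlink "$2")" = "$1" ]
}

# Example usage (as root):
#   relink /hadoop/app/jdk1.8.0_281/bin/java /bin/java && /bin/java -version
```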

    Initialize the database

    CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'%' IDENTIFIED BY '123';
    GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'localhost' IDENTIFIED BY '123';
    flush privileges;

    Add the mysql-connector-java driver jar

    Download mysql-connector-java-5.1.49.jar and add it manually to the lib directory.
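
    A quick check that the jar actually landed in lib can save a failed schema run later. This sketch assumes a hypothetical helper name, has_mysql_driver, and that the jar keeps its mysql-connector-java-<version>.jar file name.

```shell
# Hypothetical helper: succeed if a MySQL connector jar exists in the
# lib directory given as $1.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java-*.jar; do
        # If the glob matched nothing, $jar is the literal pattern and -f fails.
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Example usage, from the unpacked DolphinScheduler directory:
#   has_mysql_driver lib || echo "MySQL driver jar is missing from lib"
```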

    Edit the configuration file conf/datasource.properties:

    vi conf/datasource.properties

    spring.datasource.driver-class-name=com.mysql.jdbc.Driver
    spring.datasource.url=jdbc:mysql://bigdata01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
    spring.datasource.username=root
    spring.datasource.password=123
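
    Before the schema script is run, it can help to read the values back from the file and compare them with the database settings. A minimal sketch; get_prop is a hypothetical helper and only handles simple key=value lines.

```shell
# Hypothetical helper: print the value of key $2 from the properties file $1.
get_prop() {
    # Compare the text before the first '=' with the key exactly, then strip
    # "key=" so values that contain '=' or '&' (such as the JDBC URL) stay intact.
    awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Example usage:
#   get_prop conf/datasource.properties spring.datasource.url
#   get_prop conf/datasource.properties spring.datasource.username
```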


    Run the script that creates the tables and imports the base data

    sh script/create-dolphinscheduler.sh

    Configure the runtime parameters

    vi conf/env/dolphinscheduler_env.sh

    export HADOOP_HOME=/hadoop/app/hadoop-2.7.7
    export HADOOP_CONF_DIR=/hadoop/app/hadoop-2.7.7/etc/hadoop
    #export SPARK_HOME1=/opt/soft/spark1
    #export SPARK_HOME2=/opt/soft/spark2
    #export PYTHON_HOME=/opt/soft/python

    export JAVA_HOME=/hadoop/app/jdk1.8.0_281

    #export HIVE_HOME=/opt/soft/hive
    #export FLINK_HOME=/opt/soft/flink
    #export DATAX_HOME=/opt/soft/datax/bin/datax.py

    export PATH=$JAVA_HOME/bin:$PATH
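
    A path left at its template value is an easy mistake in this file, so it may be worth checking that each exported directory exists on every node. The sketch below uses a hypothetical helper, check_dirs, that takes NAME=PATH pairs.

```shell
# Hypothetical helper: print every NAME whose PATH is not an existing directory.
# Arguments are NAME=PATH pairs; returns non-zero if any directory is missing.
check_dirs() {
    missing=0
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        path=${pair#*=}    # text after the first '='
        if [ ! -d "$path" ]; then
            echo "missing: $name ($path)"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage, after sourcing the env file:
#   . conf/env/dolphinscheduler_env.sh
#   check_dirs "JAVA_HOME=$JAVA_HOME" "HADOOP_HOME=$HADOOP_HOME" "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```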

    Edit the parameters in the one-click deployment configuration file conf/config/install_config.conf

    vi conf/config/install_config.conf
    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    
    
    # NOTICE: if a value below contains any of the special characters `.*[]^${}+?|()@#&`, escape it; for example, write `[` as `\[`
    # Set this to mysql or postgresql
    dbtype="mysql"
    
    # db config
    # Database host and port
    dbhost="bigdata01:3306"
    
    # Database username; must match the user granted access above
    username="root"
    
    # Database name
    dbname="dolphinscheduler"
    
    # Database password; escape any special characters. Must match the password set above
    password="123"
    
    # ZooKeeper addresses
    zkQuorum="bigdata01:2181,bigdata02:2181,bigdata03:2181"
    
    # Directory to install DolphinScheduler into, e.g. /opt/soft/dolphinscheduler; it must differ from the current directory
    installPath="/hadoop/app/ds"
    
    # User to deploy as; use the hadoop user configured for passwordless sudo above
    # Note: the deployment user needs sudo privileges and permission to operate HDFS. If HDFS is enabled, the root directory must be created manually
    deployUser="hadoop"
    
    # Mail configuration, using QQ Mail as an example
    # Mail protocol
    mailProtocol="SMTP"
    
    # Mail server host
    mailServerHost="smtp.qq.com"
    
    # Mail server port
    # Note: different protocols and encryption methods use different ports; when SSL/TLS is enabled, make sure the port is correct.
    mailServerPort="25"
    
    # mailSender and mailUser can be set to the same value
    # Sender
    mailSender="xxx@qq.com"
    
    # Sending user
    mailUser="xxx@qq.com"
    
    # Mail password
    # Note: this is the mail service authorization code, not the mailbox login password.
    mailPassword="xxx"
    
    # Set to true if the mailbox uses TLS, otherwise false
    starttlsEnable="true"
    
    # Set to true if the mailbox uses SSL, otherwise false
    # Note: starttlsEnable and sslEnable must not both be true.
    sslEnable="false"
    
    # Note: mail server host to trust; same value as mailServerHost above
    sslTrust="smtp.qq.com"
    
    # Where resource files used by jobs (SQL files and so on) are uploaded: HDFS, S3, or NONE
    # To use the local file system on a single machine, set this to HDFS, because HDFS supports the local file system;
    # choose NONE if you do not need resource upload. Using the local file system does not require a Hadoop deployment
    resourceStorageType="HDFS"
    
    # To store uploaded resources on Hadoop when the cluster's NameNode has HA enabled,
    # copy Hadoop's core-site.xml and hdfs-site.xml into the conf directory under the install path (/hadoop/app/ds/conf) and set defaultFS to the NameNode cluster name;
    # if the NameNode is not HA, replace mycluster with the NameNode's IP or hostname
    # If resourceStorageType is S3, write the S3 address, for example: s3a://dolphinscheduler
    # Note: for S3, be sure to create the root directory /dolphinscheduler
    defaultFS="hdfs://nn1:8020"
    
    
    # if resourceStorageType is S3, the following three configuration is required, otherwise please ignore
    s3Endpoint="http://192.168.xx.xx:9010"
    s3AccessKey="xxxxxxxxxx"
    s3SecretKey="xxxxxxxxxx"
    
    # If YARN is not used, keep the defaults below;
    # if the ResourceManager is HA, list the active and standby ResourceManager IPs or hostnames, e.g. "192.168.xx.xx,192.168.xx.xx";
    # for a single ResourceManager, set yarnHaIps=""
    yarnHaIps="bigdata01,bigdata02"
    
    # If the ResourceManager is HA or YARN is not used, leave this setting as is;
    # for a single ResourceManager, replace yarnIp1 with the actual ResourceManager hostname or IP
    # singleYarnIp="yarnIp1"
    
    # Root path for uploaded resources; HDFS and S3 are supported. Because HDFS also supports the local file system, make sure the local directory exists and is readable and writable in that case
    # The resource files are stored under this path; make sure the directory exists on HDFS with read and write permissions. /dolphinscheduler is recommended
    resourceUploadPath="/hadoop/data/dolphinscheduler"
    
    # User with permission to create resourceUploadPath,
    # that is, permission to create directories under the HDFS/S3 root path
    # Note: if kerberos is enabled, please config hdfsRootUser=
    hdfsRootUser="hadoop"
    
    # kerberos config
    # whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
    kerberosStartUp="false"
    # kdc krb5 config file path
    krb5ConfPath="$installPath/conf/krb5.conf"
    # keytab username
    keytabUserName="hdfs-mycluster@ESZ.COM"
    # username keytab path
    keytabPath="$installPath/conf/hdfs.headless.keytab"
    
    
    # api server port
    apiServerPort="12345"
    
    # Hosts to deploy DolphinScheduler services on; use localhost for the local machine
    # Note: list the hostnames to install on. For a pseudo-distributed setup, write the single hostname
    ips="bigdata01,bigdata02,bigdata03"
    
    # SSH port, default 22
    # Note: if the SSH port is not the default, change it here
    sshPort="22"
    
    # Hosts that run the master service
    masters="bigdata01,bigdata02"
    
    # Hosts that run the worker service, each with the worker group it belongs to; "default" in the example below is the group name
    workers="bigdata01:default,bigdata02:default,bigdata03:default"
    
    # Host that runs the alert service
    alertServer="bigdata01"
    
    # Host that runs the backend API service
    apiServers="bigdata01"
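
    Every host named in masters and workers should also be listed in ips, and a typo here only shows up once the installer starts copying files. The check below is a sketch under two assumptions: hosts_subset is a hypothetical helper, and the config file can be sourced as shell, which its key="value" syntax and the $installPath references suggest.

```shell
# Hypothetical helper: succeed if every host in the comma-separated list $2
# also appears in the comma-separated list $1. A ":group" suffix, as used in
# the workers setting, is ignored.
hosts_subset() {
    all=",$1,"
    old_ifs=$IFS
    IFS=,
    for entry in $2; do
        host=${entry%%:*}   # drop an optional ":group" suffix
        case "$all" in
            *",$host,"*) ;;  # host is listed in $1
            *)
                IFS=$old_ifs
                echo "not in ips: $host"
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
}

# Example usage, from the unpacked DolphinScheduler directory:
#   . conf/config/install_config.conf
#   hosts_subset "$ips" "$masters" && hosts_subset "$ips" "$workers"
```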

    Run the one-click installation

    sh install.sh

    Note:
    Copy Hadoop's core-site.xml and hdfs-site.xml into /hadoop/app/ds/conf:

    cp /hadoop/app/hadoop-2.7.7/etc/hadoop/core-site.xml /hadoop/app/ds/conf
    cp /hadoop/app/hadoop-2.7.7/etc/hadoop/hdfs-site.xml /hadoop/app/ds/conf

    Restart the services

    /hadoop/app/ds/bin/stop_all.sh
    /hadoop/app/ds/bin/start_all.sh

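    Process overview

        MasterServer splits DAGs into tasks and monitors task status.
        WorkerServer/LoggerServer submits and executes tasks and updates task status. LoggerServer lets the REST API fetch logs over RPC.
        ApiServer provides the REST API that the UI calls.
        AlertServer provides the alerting service.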
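
    After a restart, the JDK's jps tool lists the running Java processes, which makes it possible to check each node for the services it is supposed to host. This is a sketch only: missing_services is a hypothetical helper, and the names passed to it are assumed to match what jps prints on your installation, so compare them with real jps output first.

```shell
# Hypothetical helper: read `jps` output on stdin and report every expected
# process name (given as arguments) that is not listed.
missing_services() {
    listing=$(cat)
    result=0
    for svc in "$@"; do
        # jps prints "<pid> <main class name>" per line; compare the second field.
        if ! printf '%s\n' "$listing" | awk -v s="$svc" '$2 == s { found = 1 } END { exit !found }'; then
            echo "not running: $svc"
            result=1
        fi
    done
    return "$result"
}

# Example usage on a node that hosts a master and a worker:
#   jps | missing_services MasterServer WorkerServer LoggerServer
```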

    Open the web UI

    Initial account and password: admin/dolphinscheduler123

    http://192.168.100.10:12345/dolphinscheduler
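
    The same address can be probed from the command line before opening a browser. A minimal sketch; http_status is a hypothetical helper, and which status code the page returns is not stated in the original, so treat any 2xx or 3xx answer as a sign that the API server is responding.

```shell
# Hypothetical helper: print the HTTP status code returned for the URL $1.
# -s: silent, -o /dev/null: discard the body, -m 10: give up after 10 seconds,
# -w '%{http_code}': print only the status code.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Example usage:
#   http_status http://192.168.100.10:12345/dolphinscheduler
```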

    OK, the installation is complete!

  • Original article: https://www.cnblogs.com/simple-li/p/14705460.html