Installing distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (CDH 5.14.2 offline tarball install)

    Tags: Hadoop

     

    Host environment

    Basic configuration:

    Nodes: 5
    OS: CentOS Linux release 7.5.1804 (Core)
    Memory: 8GB

    Configuration used for this walkthrough:

    Nodes: 5
    OS: CentOS Linux release 7.5.1804 (Core)
    Memory: 16GB

    Note: in real production, size the memory to the workload; if you are only building VMs in VMware, 1-2GB per host is enough.

    Software environment

    Software and versions:

    jdk        jdk-8u172-linux-x64
    hadoop     hadoop-2.6.0-cdh5.14.2
    zookeeper  zookeeper-3.4.5-cdh5.14.2
    hbase      hbase-1.2.0-cdh5.14.2
    hive       hive-1.1.0-cdh5.14.2

    Note: all of the CDH5 software can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/

    Host roles

    The roles of the 5 nodes are planned as follows:

    Host                  CDHNode1         CDHNode2         CDHNode3         CDHNode4         CDHNode5
    IP                    192.168.223.201  192.168.223.202  192.168.223.203  192.168.223.204  192.168.223.205
    namenode              yes              yes              no               no               no
    datanode              no               no               yes              yes              yes
    resourcemanager       yes              yes              no               no               no
    journalnode           yes              yes              yes              yes              yes
    zookeeper             yes              yes              yes              no               no
    hmaster (hbase)       yes              yes              no               no               no
    regionserver (hbase)  no               no               yes              yes              yes
    hive (hiveserver2)    no               no               yes              yes              yes

    Note: keep an odd number of JournalNodes and ZooKeeper servers, and no fewer than 3 if you need high availability (a majority quorum must stay up). The details are left for a later post.

    Pre-install host setup

    1. Disable SELinux on all nodes
    sed -i 's/^SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config 
    setenforce 0
    
    2. Disable the firewall (firewalld or iptables) on all nodes
    systemctl disable firewalld;  
    systemctl stop firewalld;
    systemctl disable iptables;  
    systemctl stop iptables;
    
    3. Enable ntpdate time sync on all nodes
    echo "*/5 * * * * /usr/sbin/ntpdate asia.pool.ntp.org | logger -t NTP" >> /var/spool/cron/root
    
    4. Set the locale and timezone on all nodes
    echo 'export TZ=Asia/Shanghai' >> /etc/profile
    echo 'export LANG=en_US.UTF-8' >> /etc/profile
    . /etc/profile
    
    5. Add the hadoop user on all nodes
    useradd -m hadoop
    echo '123456' | passwd --stdin hadoop
    # set PS1
    su - hadoop
    echo 'export PS1="\u@\h:$PWD>"' >> ~/.bash_profile
    echo "alias mv='mv -i'
    alias rm='rm -i'" >> ~/.bash_profile
    . ~/.bash_profile
    
    6. Set up passwordless login between the hadoop users. First generate a key pair on CDHNode1:
    su - hadoop
    ssh-keygen -t rsa	# just keep pressing Enter to generate the hadoop user's public and private keys
    cd .ssh
    vi id_rsa.pub  # remove the trailing hostname hadoop@CDHNode1 from the public key
    cat id_rsa.pub > authorized_keys
    chmod 600 authorized_keys
    

    Zip up the .ssh directory:

    su - hadoop
    zip -r ssh.zip .ssh
    

    Then distribute ssh.zip to the hadoop home directory on CDHNode2-5 and unzip it there to complete the passwordless login setup.
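    A minimal loop for that distribution (an illustrative sketch: it assumes unzip is installed on every node, and each connection still prompts for the hadoop password since the keys are not in place yet):

    for h in CDHNode2 CDHNode3 CDHNode4 CDHNode5; do
        scp ssh.zip ${h}:/home/hadoop/
        ssh ${h} "cd /home/hadoop && unzip -o ssh.zip"
    done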

    7. Tune the kernel parameters and limits such as max open files and max processes. Sensible values differ from host to host, so no specific recipe is given here; but if this Hadoop environment is for real production, they must be tuned, since the linux defaults can leave the cluster performing badly. 
    8. On the datanode nodes (CDHNode3-5), mount a 15G data disk at /chunk1, and grant the mounted directory to the hadoop user.

    Note: everything above is done as root. The operating system environment is now ready; the actual installation starts below, and unless noted otherwise all later steps are done as the hadoop user.
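    One more root-level prerequisite worth stating explicitly: every node must be able to resolve the hostnames CDHNode1-5. If no DNS covers them, a sketch of the /etc/hosts entries (added as root on every node, matching the IP plan above):

    cat >> /etc/hosts << EOF
    192.168.223.201 CDHNode1
    192.168.223.202 CDHNode2
    192.168.223.203 CDHNode3
    192.168.223.204 CDHNode4
    192.168.223.205 CDHNode5
    EOF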

    Installing JDK 1.8

    All nodes need it and the steps are identical. Unpack jdk-8u172-linux-x64.tar.gz (the tarball extracts to jdk1.8.0_172):

    tar zxvf jdk-8u172-linux-x64.tar.gz
    mkdir -p /home/hadoop/app
    mv jdk1.8.0_172 /home/hadoop/app/jdk
    rm -f jdk-8u172-linux-x64.tar.gz
    

    Configure the environment variables: vi ~/.bash_profile and add:

    #java
    export JAVA_HOME=/home/hadoop/app/jdk
    export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
    export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
    

    Load the environment variables:

    . ~/.bash_profile
    

    Verify the install with java -version:

    java version "1.8.0_172"
    Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
    

    Output like the above means the installation succeeded.

    Installing ZooKeeper

    Install on CDHNode1 first.

    Unpack zookeeper-3.4.5-cdh5.14.2.tar.gz:

    tar zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz
    mv zookeeper-3.4.5-cdh5.14.2 /home/hadoop/app/zookeeper
    rm -f zookeeper-3.4.5-cdh5.14.2.tar.gz
    

    Set the environment variables: vi ~/.bash_profile and add:

    #zk
    export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    

    Load the environment variables:

    . ~/.bash_profile
    

    Create the config file: vi /home/hadoop/app/zookeeper/conf/zoo.cfg and add:

    # The number of milliseconds of each tick  
    tickTime=2000
    # The number of ticks that the initial  
    # synchronization phase can take  
    initLimit=10
    # The number of ticks that can pass between  
    # sending a request and getting an acknowledgement  
    syncLimit=5
    # the directory where the snapshot is stored.  
    # do not use /tmp for storage, /tmp here is just  
    # example sakes.  
    # data directory and log directory  
    dataDir=/home/hadoop/data/zookeeper/zkdata
    dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
    # the port at which the clients will connect  
    clientPort=2181
    # server.<id>=<host>:<port for peer sync and communication>:<leader election port>  
    server.1=CDHNode1:2888:3888
    server.2=CDHNode2:2888:3888
    server.3=CDHNode3:2888:3888
    # when nodes change, just add or remove the matching line here (every node's config must be updated), then start the new node or stop the removed one
    # administrator guide before turning on autopurge.  
    #  
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance  
    #  
    # The number of snapshots to retain in dataDir  
    #autopurge.snapRetainCount=3  
    # Purge task interval in hours  
    # Set to "0" to disable auto purge feature  
    #autopurge.purgeInterval=1
    

    Create the required directories:

    mkdir -p /home/hadoop/data/zookeeper/zkdata
    mkdir -p /home/hadoop/data/zookeeper/zkdatalog
    mkdir -p /home/hadoop/app/zookeeper/logs
    

    Add the myid file: vim /home/hadoop/data/zookeeper/zkdata/myid, containing:

    1
    

    Note: this number comes from the 1 after server. in the zoo.cfg line server.1=CDHNode1:2888:3888, so CDHNode2 gets 2 and CDHNode3 gets 3.

    Configure the log directory: vim /home/hadoop/app/zookeeper/libexec/zkEnv.sh and change these settings to:

    ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
    

    Note: /home/hadoop/app/zookeeper/libexec/zkEnv.sh and /home/hadoop/app/zookeeper/bin/zkEnv.sh have identical contents. The startup script /home/hadoop/app/zookeeper/bin/zkServer.sh reads /home/hadoop/app/zookeeper/libexec/zkEnv.sh first, and only falls back to /home/hadoop/app/zookeeper/bin/zkEnv.sh when the former does not exist.

    vim /home/hadoop/app/zookeeper/conf/log4j.properties and change these settings to:

    zookeeper.root.logger=INFO, ROLLINGFILE
    zookeeper.log.dir=/home/hadoop/app/zookeeper/logs
    log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
    

    Copy zookeeper to CDHNode2-3:

    scp ~/.bash_profile CDHNode2:/home/hadoop
    scp ~/.bash_profile CDHNode3:/home/hadoop
    scp -pr /home/hadoop/app/zookeeper CDHNode2:/home/hadoop/app
    scp -pr /home/hadoop/app/zookeeper CDHNode3:/home/hadoop/app
    ssh CDHNode2 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
    ssh CDHNode2 "echo 2 > /home/hadoop/data/zookeeper/zkdata/myid"
    ssh CDHNode3 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
    ssh CDHNode3 "echo 3 > /home/hadoop/data/zookeeper/zkdata/myid"
    

    Start zookeeper on all 3 nodes:

    /home/hadoop/app/zookeeper/bin/zkServer.sh start
    

    Check each node's status:

    /home/hadoop/app/zookeeper/bin/zkServer.sh status
    

    If one node is the leader and the other 2 are followers, ZooKeeper is installed correctly.
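    To check all three nodes in one go from CDHNode1 (a sketch that relies on the passwordless ssh set up earlier; the profile is sourced so java is found in the non-interactive shell):

    for h in CDHNode1 CDHNode2 CDHNode3; do
        echo "== $h =="
        ssh $h "source ~/.bash_profile; /home/hadoop/app/zookeeper/bin/zkServer.sh status"
    done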

    Check the processes:

    jps
    

    The QuorumPeerMain process is zookeeper.

    Stop zookeeper with:

    /home/hadoop/app/zookeeper/bin/zkServer.sh stop
    

    Installing Hadoop

    Install on CDHNode1 first, then copy to the other nodes. Unpack hadoop-2.6.0-cdh5.14.2.tar.gz:

    tar zxvf hadoop-2.6.0-cdh5.14.2.tar.gz
    mv hadoop-2.6.0-cdh5.14.2 /home/hadoop/app/hadoop
    rm -f hadoop-2.6.0-cdh5.14.2.tar.gz
    

    Set the environment variables: vi ~/.bash_profile and add:

    #hadoop
    HADOOP_HOME=/home/hadoop/app/hadoop
    PATH=$HADOOP_HOME/bin:$PATH
    export HADOOP_HOME PATH
    

    Load the environment variables:

    . ~/.bash_profile
    

    Configuring HDFS

    Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh and set:

    export JAVA_HOME=/home/hadoop/app/jdk
    

    Edit /home/hadoop/app/hadoop/etc/hadoop/core-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    	<property>
    		<name>fs.defaultFS</name>
    		<value>hdfs://cluster1</value>
    	</property>
    	<!-- the default HDFS path; the nameservice is named cluster1 -->
    	<property>
    		<name>hadoop.tmp.dir</name>
    		<value>/home/hadoop/data/tmp</value>
    	</property>
    	<!-- hadoop's temp directory; separate multiple directories with commas; the data directory must be created by hand -->
    	<property>
    		<name>ha.zookeeper.quorum</name>
    		<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
    	</property>
    	<!-- let ZooKeeper manage HDFS HA -->
    </configuration>
    

    Edit /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    	<property>
    		<name>dfs.replication</name>
    		<value>3</value>
    	</property>
    	<!-- block replication factor: 3 -->
    	<property>
    		<name>dfs.name.dir</name>
    		<value>/home/hadoop/data/hdfs/name</value>
    	</property>
    	<!-- metadata directories, comma-separated -->
    	<property>
    		<name>dfs.data.dir</name>
    		<value>/chunk1</value>
    	</property>
    	<!-- data directories, comma-separated -->
    	<property>
    		<name>dfs.permissions</name>
    		<value>false</value>
    	</property>
    	<property>
    		<name>dfs.permissions.enabled</name>
    		<value>false</value>
    	</property>
    	<!-- permissions checking disabled (false) -->
    	<property>
    		<name>dfs.nameservices</name>
    		<value>cluster1</value>
    	</property>
    	<!-- the nameservice; must correspond to the value of fs.defaultFS. With NameNode HA there are two namenodes, and cluster1 is the single entry point exposed to clients -->
    	<property>
    		<name>dfs.ha.namenodes.cluster1</name>
    		<value>CDHNode1,CDHNode2</value>
    	</property>
    	<!-- the namenodes under nameservice cluster1; these are logical names, pick anything as long as they don't collide -->
    	<property>
    		<name>dfs.namenode.rpc-address.cluster1.CDHNode1</name>
    		<value>CDHNode1:9000</value>
    	</property>
    	<!-- CDHNode1 rpc address -->
    	<property>
    		<name>dfs.namenode.http-address.cluster1.CDHNode1</name>
    		<value>CDHNode1:50070</value>
    	</property>
    	<!-- CDHNode1 http address -->
    	<property>
    		<name>dfs.namenode.rpc-address.cluster1.CDHNode2</name>
    		<value>CDHNode2:9000</value>
    	</property>
    	<!-- CDHNode2 rpc address -->
    	<property>
    		<name>dfs.namenode.http-address.cluster1.CDHNode2</name>
    		<value>CDHNode2:50070</value>
    	</property>
    	<!-- CDHNode2 http address -->
    	<property>
    		<name>dfs.ha.automatic-failover.enabled</name>
    		<value>true</value>
    	</property>
    	<!-- enable automatic failover -->
    	<property>
    		<name>dfs.namenode.shared.edits.dir</name>
    		<value>qjournal://CDHNode1:8485;CDHNode2:8485;CDHNode3:8485;CDHNode4:8485;CDHNode5:8485/cluster1</value>
    	</property>
    	<!-- the journalnode quorum -->
    	<property>
    		<name>dfs.client.failover.proxy.provider.cluster1</name>
    		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    	</property>
    	<!-- the class responsible for failover when cluster1's active namenode fails -->
    	<property>
    		<name>dfs.journalnode.edits.dir</name>
    		<value>/home/hadoop/data/journaldata/jn</value>
    	</property>
    	<!-- the local disk path where each JournalNode stores the shared namenode edits -->
    	<property>
    		<name>dfs.ha.fencing.methods</name>
    		<value>shell(/bin/true)</value>
    	</property>
    	<property>
    	<name>dfs.ha.fencing.ssh.private-key-files</name>
    	<value>/home/hadoop/.ssh/id_rsa</value>
    	</property>
    	<property>
    	<name>dfs.ha.fencing.ssh.connect-timeout</name>
    	<value>10000</value>
    	</property>
    	<!-- default fencing (split-brain) settings -->
    	<property>
    		<name>dfs.namenode.handler.count</name>
    		<value>100</value>
    	</property>
    </configuration>
    
    

    Edit /home/hadoop/app/hadoop/etc/hadoop/slaves:

    CDHNode3
    CDHNode4
    CDHNode5
    

    Configuring YARN

    Edit /home/hadoop/app/hadoop/etc/hadoop/mapred-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    	<property>
    		<name>mapreduce.framework.name</name>
    		<value>yarn</value>
    	</property>
    	<!-- run mapreduce on Yarn; this is a difference from hadoop1 -->
    </configuration>
    

    Edit /home/hadoop/app/hadoop/etc/hadoop/yarn-site.xml:

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Site specific YARN configuration properties -->
    <configuration>
         <property>
    		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
    		<value>2000</value>
         </property>
     <!-- ResourceManager connect retry interval -->
         <property>
    		<name>yarn.resourcemanager.ha.enabled</name>
    		<value>true</value>
         </property>
     <!-- enable HA -->
         <property>
    		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    		<value>true</value>
         </property>
     <!-- enable automatic failover -->
         <property>
    		<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    		<value>true</value>
         </property>
         <property>
    		<name>yarn.resourcemanager.cluster-id</name>
    		<value>yarn-rm-cluster</value>
         </property>
     <!-- name the yarn cluster yarn-rm-cluster -->
         <property>
    		<name>yarn.resourcemanager.ha.rm-ids</name>
    		<value>rm1,rm2</value>
         </property>
     <!-- name the ResourceManagers rm1 and rm2 -->
         <property>
    		<name>yarn.resourcemanager.hostname.rm1</name>
    		<value>CDHNode1</value>
         </property>
     <!-- hostname of ResourceManager rm1 -->
         <property>
    		<name>yarn.resourcemanager.hostname.rm2</name>
    		<value>CDHNode2</value>
         </property>
     <!-- hostname of ResourceManager rm2 -->
         <property>
    		<name>yarn.resourcemanager.recovery.enabled</name>
    		<value>true</value>
         </property>
     <!-- enable resourcemanager state recovery -->
         <property>
    		<name>yarn.resourcemanager.zk.state-store.address</name>
    		<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
         </property>
     <!-- ZooKeeper addresses -->
         <property>
    		<name>yarn.resourcemanager.zk-address</name>
    		<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
         </property>
     <!-- ZooKeeper addresses -->
         <property>
    		<name>yarn.resourcemanager.address.rm1</name>
    		<value>CDHNode1:8032</value>
         </property>
     <!-- rm1 port -->
         <property>
    		<name>yarn.resourcemanager.scheduler.address.rm1</name>
    		<value>CDHNode1:8034</value>
         </property>
     <!-- rm1 scheduler port -->
         <property>
    		<name>yarn.resourcemanager.webapp.address.rm1</name>
    		<value>CDHNode1:8088</value>
         </property>
     <!-- rm1 webapp port -->
         <property>
    		<name>yarn.resourcemanager.address.rm2</name>
         <value>CDHNode2:8032</value>
         </property>
     <!-- rm2 port -->
         <property>
    		<name>yarn.resourcemanager.scheduler.address.rm2</name>
    		<value>CDHNode2:8034</value>
         </property>
     <!-- rm2 scheduler port -->
         <property>
    		<name>yarn.resourcemanager.webapp.address.rm2</name>
    		<value>CDHNode2:8088</value>
         </property>
     <!-- rm2 webapp port -->
         <property>
    		<name>yarn.nodemanager.aux-services</name>
    		<value>mapreduce_shuffle</value>
         </property>
         <property>
    		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
         </property>
     <!-- shuffle service required to run MapReduce -->
    </configuration>
    

    Create the required directories:

    mkdir -p /home/hadoop/data/tmp
    mkdir -p /home/hadoop/data/hdfs/name
    mkdir -p /home/hadoop/data/journaldata/jn
    mkdir -p /home/hadoop/data/pid
    touch /home/hadoop/app/hadoop/etc/hadoop/excludes
    

    Copy hadoop to CDHNode2-5:

    scp ~/.bash_profile CDHNode2:/home/hadoop
    scp ~/.bash_profile CDHNode3:/home/hadoop
    scp ~/.bash_profile CDHNode4:/home/hadoop
    scp ~/.bash_profile CDHNode5:/home/hadoop
    
    scp -pr /home/hadoop/app/hadoop CDHNode2:/home/hadoop/app
    scp -pr /home/hadoop/app/hadoop CDHNode3:/home/hadoop/app
    scp -pr /home/hadoop/app/hadoop CDHNode4:/home/hadoop/app
    scp -pr /home/hadoop/app/hadoop CDHNode5:/home/hadoop/app
    
    ssh CDHNode2 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
    
    ssh CDHNode3 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
    
    ssh CDHNode4 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
    
    ssh CDHNode5 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
    

    Cluster initialization

    Start zookeeper on CDHNode1-3:

    /home/hadoop/app/zookeeper/bin/zkServer.sh start
    

    Start the journalnode on CDHNode1-5:

    /home/hadoop/app/hadoop/sbin/hadoop-daemon.sh start journalnode
    

    Run jps; if JournalNode is listed, it started correctly.

    First run the formatting steps on the primary node, CDHNode1:

    /home/hadoop/app/hadoop/bin/hdfs namenode -format	# format the namenode  
    /home/hadoop/app/hadoop/bin/hdfs zkfc -formatZK		# format the HA state in ZooKeeper
    /home/hadoop/app/hadoop/bin/hdfs namenode	# start the namenode in the foreground
    

    Note: after the last command the process sits waiting in the foreground; only once the next step has finished on CDHNode2 do you press Ctrl+C to stop this namenode process.

    On CDHNode2, sync the namenode metadata:

    /home/hadoop/app/hadoop/bin/hdfs namenode -bootstrapStandby	# sync metadata between the primary and standby namenodes
    

    When the sync is done, press Ctrl+C on CDHNode1 to stop the namenode process.

    Then stop the journalnode on all nodes:

    /home/hadoop/app/hadoop/sbin/hadoop-daemon.sh stop journalnode
    

    Starting HDFS

    If everything above went fine, all HDFS-related processes can be started from any host in the cluster with the one-shot script; running it on the primary namenode is generally recommended:

    /home/hadoop/app/hadoop/sbin/start-dfs.sh
    

    Note: start-dfs.sh works by logging in to each node over passwordless ssh and starting the processes there, so it can also hit the first-connection ssh confirmation prompt; watch out for it.
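    To avoid those first-connection prompts, you can pre-load the host keys into known_hosts (a sketch, run once as hadoop on the node you start the scripts from):

    ssh-keyscan -t rsa CDHNode1 CDHNode2 CDHNode3 CDHNode4 CDHNode5 >> ~/.ssh/known_hosts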

    If you see this warning when starting HDFS: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable — it is only a WARN and execution is unaffected. To cure it for good: download hadoop-2.6.0+cdh5.14.2+2748-1.cdh5.14.2.p0.11.el7.x86_64.rpm, unpack the rpm with 7-Zip on Windows, take every file under usr/lib/hadoop/lib/native, upload them to /home/hadoop/app/hadoop/lib/native on all nodes, then run on all nodes:

    cd /home/hadoop/app/hadoop/lib/native
    rm -f libhadoop.so
    rm -f libnativetask.so
    rm -f libsnappy.so
    rm -f libsnappy.so.1
    cp libhadoop.so.1.0.0 libhadoop.so
    cp libnativetask.so.1.0.0 libnativetask.so
    cp libsnappy.so.1.1.4 libsnappy.so
    cp libsnappy.so.1.1.4 libsnappy.so.1
    

    Start HDFS again and the WARN no longer appears.

    Check the hdfs namenodes via the web UI:

    http://CDHNode1:50070 http://CDHNode2:50070

    Check the hdfs datanodes via the web UI:

    http://cdhnode3:50075 http://cdhnode4:50075 http://cdhnode5:50075

    Test an upload to HDFS. Create a local test.txt with vi test.txt:

    hadoop CDH
    hello world
    CDH hadoop
    

    hdfs dfs -mkdir /test            # create a directory on hdfs
    hdfs dfs -put test.txt /test     # upload a file to hdfs
    hdfs dfs -ls /test               # check that test.txt was uploaded

    If the commands above run without problems, hdfs is configured correctly.

    Starting YARN

    First, on CDHNode1:

    /home/hadoop/app/hadoop/sbin/start-yarn.sh
    

    Then, on CDHNode2:

    /home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
    

    Check the yarn resourcemanagers via the web UI:

    http://CDHNode1:8088 http://CDHNode2:8088

    Check the yarn nodemanagers via the web UI:

    http://cdhnode3:8042/node http://cdhnode4:8042/node http://cdhnode5:8042/node

    Check the ResourceManager states:

    /home/hadoop/app/hadoop/bin/yarn rmadmin -getServiceState rm1
    /home/hadoop/app/hadoop/bin/yarn rmadmin -getServiceState rm2
    

    active is the primary, standby is the backup.
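    The NameNode states can be checked the same way; the IDs here are the logical names CDHNode1 and CDHNode2 defined in dfs.ha.namenodes.cluster1:

    /home/hadoop/app/hadoop/bin/hdfs haadmin -getServiceState CDHNode1
    /home/hadoop/app/hadoop/bin/hdfs haadmin -getServiceState CDHNode2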

    Wordcount example test:

    hadoop jar /home/hadoop/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/test.txt /test/out
    

    If this runs without exceptions, YARN is installed correctly.
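    To look at the word counts themselves (with default settings the single reducer writes one part-r-00000 file):

    hdfs dfs -ls /test/out
    hdfs dfs -cat /test/out/part-r-00000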

    Cluster start/stop order

    Start

    Start zookeeper on CDHNode1-3:

    /home/hadoop/app/zookeeper/bin/zkServer.sh start
    

    Start HDFS:

    /home/hadoop/app/hadoop/sbin/start-dfs.sh
    

    Start YARN:

    # first, on CDHNode1:
    /home/hadoop/app/hadoop/sbin/start-yarn.sh
    
    # then, on CDHNode2:
    /home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager  
    
    Stop

    Stop YARN:

    # first, on CDHNode2:
    /home/hadoop/app/hadoop/sbin/yarn-daemon.sh stop resourcemanager
    
    # then, on CDHNode1:
    /home/hadoop/app/hadoop/sbin/stop-yarn.sh
    

    Stop HDFS:

    /home/hadoop/app/hadoop/sbin/stop-dfs.sh
    

    Stop zookeeper on CDHNode1-3:

    /home/hadoop/app/zookeeper/bin/zkServer.sh stop
    

    Installing HBase

    Install on CDHNode1 first.

    Unpack hbase-1.2.0-cdh5.14.2.tar.gz:

    tar zxvf hbase-1.2.0-cdh5.14.2.tar.gz
    mv hbase-1.2.0-cdh5.14.2 /home/hadoop/app/hbase
    rm -f hbase-1.2.0-cdh5.14.2.tar.gz
    

    Set the environment variables: vi ~/.bash_profile and add:

    #hbase
    export HBASE_HOME=/home/hadoop/app/hbase
    export PATH=$PATH:$HBASE_HOME/bin
    

    Load the environment variables:

    . ~/.bash_profile
    

    Edit the config: vi /home/hadoop/app/hbase/conf/hbase-env.sh and change:

    export JAVA_HOME=/home/hadoop/app/jdk
    export HBASE_MANAGES_ZK=false   # do not use hbase's bundled zookeeper
    

    Create the config file: vi /home/hadoop/app/hbase/conf/hbase-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
    /**
     *
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    -->
    <configuration>
    	<property>
    		<name>hbase.rootdir</name>
    		<value>hdfs://cluster1/hbase</value>
    	</property>
    	<!-- this hdfs URI must match the value of fs.defaultFS in hadoop's core-site.xml -->
    	<property>
    		<name>hbase.cluster.distributed</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hbase.zookeeper.quorum</name>
    		<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
    	</property>
    	<!-- zookeeper quorum -->
    	<property>
    		<name>hbase.zookeeper.property.dataDir</name>
    		<value>/home/hadoop/data/hbase/zookeeper</value>
    	</property>
    	<property>
    		<name>hbase.tmp.dir</name>
    		<value>/home/hadoop/data/hbase/tmp</value>
    	</property>
    	<property>
    		<name>dfs.replication</name>
    		<value>3</value>
    	</property>
    </configuration>
    

    Add the regionserver slaves: vi /home/hadoop/app/hbase/conf/regionservers and set it to:

    CDHNode3
    CDHNode4
    CDHNode5
    

    Copy hadoop's hdfs-site.xml and core-site.xml into $HBASE_HOME/conf:

    cp /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml /home/hadoop/app/hbase/conf
    cp /home/hadoop/app/hadoop/etc/hadoop/core-site.xml /home/hadoop/app/hbase/conf
    

    Create the required directories:

    mkdir -p /home/hadoop/data/hbase/zookeeper
    mkdir -p /home/hadoop/data/hbase/tmp
    

    Copy hbase to CDHNode2-5:

    scp ~/.bash_profile CDHNode2:/home/hadoop
    scp ~/.bash_profile CDHNode3:/home/hadoop
    scp ~/.bash_profile CDHNode4:/home/hadoop
    scp ~/.bash_profile CDHNode5:/home/hadoop
    
    scp -pr /home/hadoop/app/hbase CDHNode2:/home/hadoop/app
    scp -pr /home/hadoop/app/hbase CDHNode3:/home/hadoop/app
    scp -pr /home/hadoop/app/hbase CDHNode4:/home/hadoop/app
    scp -pr /home/hadoop/app/hbase CDHNode5:/home/hadoop/app
    
    ssh CDHNode2 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
    ssh CDHNode3 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
    ssh CDHNode4 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
    ssh CDHNode5 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
    

    Start hbase:

    /home/hadoop/app/hbase/bin/start-hbase.sh
    

    With JDK 8 or later, you will see these warnings:

    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    

    Fix: on all nodes, vi /home/hadoop/app/hbase/conf/hbase-env.sh and comment out these lines:

    # Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
    export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
    

    Once started, CDHNode1 gains an HMaster process and the three nodes CDHNode3-5 gain HRegionServer processes (as configured in the regionservers file).

    Start the standby HMaster on CDHNode2:

    /home/hadoop/app/hbase/bin/hbase-daemon.sh start master
    

    Note: to start a single regionserver on its own, use a command like:

    /home/hadoop/app/hbase/bin/hbase-daemon.sh start regionserver
    

    Check the HMasters:

    http://cdhnode1:60010/master-status http://cdhnode2:60010/master-status

    You can see that CDHNode2 is the standby HMaster.

    Check the HRegionServers:

    http://cdhnode3:60030/rs-status http://cdhnode4:60030/rs-status http://cdhnode5:60030/rs-status

    Verify:

    hbase shell
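    A quick smoke test inside the shell (the table and column family names here are arbitrary examples):

    create 'smoke_test', 'cf'
    put 'smoke_test', 'row1', 'cf:msg', 'hello hbase'
    scan 'smoke_test'
    disable 'smoke_test'
    drop 'smoke_test'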

    Stopping hbase: first stop the HMaster on CDHNode2:

    /home/hadoop/app/hbase/bin/hbase-daemon.sh stop master
    

    Then stop all the other hbase processes:

    /home/hadoop/app/hbase/bin/stop-hbase.sh
    

    Installing Hive

    First install mysql on CDHNode5; for a walkthrough see my post "Centos7.5安装mysql 8.0.11".

    Create the hive database and user in mysql:

    mysql > create database hivedb character set latin1 collate latin1_bin;    # note: hivedb's character set must be latin1
    
    # mysql > grant all privileges on hivedb.* to 'hive'@'%' identified by 'hive'; # usable before mysql 8.0; on 8.0 it errors, so use the statements below instead
    
    mysql > create user 'hive'@'%' identified by 'hive';
    mysql > grant all privileges on hivedb.* to 'hive'@'%';
    mysql > flush privileges;
    

    On CDHNode3, unpack hive-1.1.0-cdh5.14.2.tar.gz:

    tar zxvf hive-1.1.0-cdh5.14.2.tar.gz
    mv hive-1.1.0-cdh5.14.2 /home/hadoop/app/hive
    rm -f hive-1.1.0-cdh5.14.2.tar.gz
    

    Download the mysql connector driver and copy it into /home/hadoop/app/hive/lib. Download: mysql-connector-java-8.0.11

    tar zxvf mysql-connector-java-8.0.11.tar.gz
    cp mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar /home/hadoop/app/hive/lib/
    rm -f mysql-connector-java-8.0.11.tar.gz
    
    • mysql here is version 8.0.11, so the matching mysql-connector-java 8.0.11 was downloaded; with another mysql version, download the corresponding driver.

    Set the environment variables: vi ~/.bash_profile and add:

    #hive
    export HIVE_HOME=/home/hadoop/app/hive
    export PATH=$PATH:$HIVE_HOME/bin
    

    Load the environment variables:

    . ~/.bash_profile
    

    Go into /home/hadoop/app/hive/conf and create hive-env.sh:

    cp hive-env.sh.template hive-env.sh
    

    Edit vi /home/hadoop/app/hive/conf/hive-env.sh and change:

    export HADOOP_HEAPSIZE=1024
    HADOOP_HOME=/home/hadoop/app/hadoop
    export HIVE_CONF_DIR=/home/hadoop/app/hive/conf
    export HIVE_AUX_JARS_PATH=/home/hadoop/app/hive/lib
    

    Create the hive config file: vi /home/hadoop/app/hive/conf/hive-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <configuration>
    	<property>
    		<name>hive.exec.scratchdir</name>
    		<value>hdfs://cluster1/hive/scratchdir</value>
    		<description>HDFS path used to store the execution plans and intermediate output of the map/reduce stages.</description>
    	</property>
    	
    	<property>
    		<name>hive.metastore.warehouse.dir</name>
    		<value>hdfs://cluster1/hive/warehouse</value>
    		<description>HDFS path used to store hive data files</description>
    	</property>
    
    	<!-- log directory settings -->
    	<property>
    		<name>hive.querylog.location</name>
    		<value>/home/hadoop/app/hive/logs</value>
    	</property>
    	<property>
    		<name>hive.downloaded.resources.dir</name>
    		<value>/home/hadoop/data/hive/local/${hive.session.id}_resources</value>
    	</property>
    	<property>
    		<name>hive.server2.logging.operation.log.location</name>
    		<value>/home/hadoop/app/hive/logs/operation_logs</value>
    	</property>
    
    	<!-- mysql connection info for the metastore -->
    	<property>
    		<name>javax.jdo.option.ConnectionURL</name>       
    		<!--<value>jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=UTF-8&amp;createDatabaseIfNotExist=true</value>-->
    		<value>jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=latin1&amp;createDatabaseIfNotExist=true</value>
    		<description>Sets the connection encoding; note that &amp; is how & is written in xml.</description>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionDriverName</name>
    		<value>com.mysql.jdbc.Driver</value>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionUserName</name>
    		<value>hive</value>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionPassword</name>
    		<value>hive</value>
    	</property>
    
    	<!-- enable hive delete/update operations -->
    	<property>
    		<name>hive.support.concurrency</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hive.enforce.bucketing</name>
    		<value>false</value>
    	</property>
    	<property>
    		<name>hive.exec.dynamic.partition.mode</name>
    		<value>nonstrict</value>
    	</property>
    	<property>
    		<name>hive.txn.manager</name>
    		<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    	</property>
    	<property>
    		<name>hive.compactor.initiator.on</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hive.compactor.worker.threads</name>
    		<!--<value>1</value>-->
    		<value>5</value>
    	</property>
    	<property>
    		<name>hive.in.test</name>
    		<value>true</value>
    	</property>
    	<property>  
    		<name>hive.auto.convert.join.noconditionaltask.size</name>  
    		<value>10000000</value>  
    	</property> 
    	
    	<!-- hwi settings -->
    	<property>
    		<name>hive.hwi.listen.host</name>
    		<value>CDHNode3</value>
    		<description>hwi listen address; differs on each node</description>
    	</property>
    	<property>
    		<name>hive.hwi.listen.port</name>
    		<value>9999</value>
    		<description>listen port</description>
    	</property>
    	<property>
    		<name>hive.hwi.war.file</name>
    		<value>lib/hive-hwi-1.2.2.war</value>
    		<description>location of the war file; must not be an absolute path.</description>
    	</property>
    	
    <!-- hiveserver2 settings -->
    	<property>
    		<name>hive.server2.support.dynamic.service.discovery</name>  
    		<value>true</value>  
    	</property>  
    	<property>  
    		<name>hive.server2.zookeeper.namespace</name>  
    		<value>hiveserver2</value>
    		<description>zookeeper namespace</description>
    	</property>  
    	<property>  
    		<name>hive.zookeeper.quorum</name>  
    		<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>  
    	</property>   
    	<property>  
    		<name>hive.zookeeper.client.port</name>  
    		<value>2181</value>  
    	</property>
    	<property>  
    		<name>hive.server2.thrift.bind.host</name>  
    		<value>CDHNode3</value>
    		<description>hiveserver2 listen address; differs on each node</description>
    	</property>
    	<property>  
    		<name>hive.server2.thrift.port</name>  
    		<value>10001</value>
    		<description>all HiveServer2 instances must use the same port</description>		
    	</property>
    </configuration>
    
    • The hdfs URIs in hive.exec.scratchdir and hive.metastore.warehouse.dir in hive-site.xml must match the value of fs.defaultFS in hadoop's core-site.xml, i.e. hdfs://cluster1

    Create the required directories:

    mkdir -p /home/hadoop/data/hive/local
    mkdir -p /home/hadoop/app/hive/logs
    

    Configure log4j output: go into /home/hadoop/app/hive/conf and create hive-exec-log4j.properties and hive-log4j.properties:

    cp hive-exec-log4j.properties.template hive-exec-log4j.properties
    cp hive-log4j.properties.template hive-log4j.properties
    

    Edit hive-exec-log4j.properties and hive-log4j.properties, changing these settings (in both files):

    hive.log.dir=/home/hadoop/app/hive/logs
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    

    Initialize the mysql metastore:

    hadoop@CDHNode3:/home/hadoop>schematool -initSchema -dbType mysql
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Metastore connection URL:        jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=latin1&createDatabaseIfNotExist=true
    Metastore Connection Driver :    com.mysql.jdbc.Driver
    Metastore connection User:       hive
    Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
    Fri Jun 29 11:37:02 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    Starting metastore schema initialization to 1.1.0-cdh5.14.2
    Initialization script hive-schema-1.1.0.mysql.sql
    Fri Jun 29 11:37:03 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    Initialization script completed
    Fri Jun 29 11:37:05 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    schemaTool completed
    
    • This step is mandatory. If skipped, hive still starts normally, but data operations may later hang, because the uninitialized mysql metadata database causes hive's metadata reads and writes to pile up behind a metadata lock.
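    To double-check that the schema landed, list a few of the metastore tables straight from mysql (a sketch; run wherever the mysql client can reach CDHNode5):

    mysql -h CDHNode5 -uhive -phive hivedb -e "show tables;" | head    # expect tables such as DBS, TBLS, VERSION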

    Copy hive to CDHNode4-5:

    scp ~/.bash_profile CDHNode4:/home/hadoop
    scp ~/.bash_profile CDHNode5:/home/hadoop
    scp -pr /home/hadoop/app/hive CDHNode4:/home/hadoop/app
    scp -pr /home/hadoop/app/hive CDHNode5:/home/hadoop/app
    ssh CDHNode4 "mkdir -p /home/hadoop/data/hive/local;"
    ssh CDHNode5 "mkdir -p /home/hadoop/data/hive/local;"
    

    Note: after copying, edit hive-site.xml on CDHNode4-5 and change the hwi and hiveserver2 listen addresses to the local hostname.
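    A sketch of making that change remotely with sed (it assumes the file keeps the exact <value>CDHNode3</value> formatting shown above; only values that are exactly CDHNode3 match, so the zookeeper quorum line is left untouched):

    ssh CDHNode4 "sed -i 's#<value>CDHNode3</value>#<value>CDHNode4</value>#g' /home/hadoop/app/hive/conf/hive-site.xml"
    ssh CDHNode5 "sed -i 's#<value>CDHNode3</value>#<value>CDHNode5</value>#g' /home/hadoop/app/hive/conf/hive-site.xml"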

    Three ways to start Hive

    1. hive command-line mode, used for queries from the linux command line; the query syntax is broadly similar to mysql's
    hive
    

    Basic operations:

    hive> show databases;
    OK
    default
    Time taken: 0.08 seconds, Fetched: 1 row(s)
    
    hive> create database hive;
    OK
    Time taken: 0.18 seconds
    
    hive> show databases;
    OK
    default
    hive
    
    hive> use hive;
    OK
    Time taken: 0.089 seconds
    
    hive> create table test(id int,name string);
    OK
    Time taken: 0.331 seconds
    
    hive> show tables;
    OK
    test
    Time taken: 0.082 seconds, Fetched: 1 row(s)
    
    hive> insert into test values (1,'hello hive');
    Query ID = hadoop_20180628105757_e64fc58b-37f6-4087-a823-738d5d933454
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1530153811198_0001, Tracking URL = http://CDHNode1:8088/proxy/application_1530153811198_0001/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1530153811198_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2018-06-28 10:57:38,636 Stage-1 map = 0%,  reduce = 0%
    2018-06-28 10:57:53,893 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.27 sec
    MapReduce Total cumulative CPU time: 1 seconds 270 msec
    Ended Job = job_1530153811198_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://cluster1/hive/warehouse/hive.db/test/.hive-staging_hive_2018-06-28_10-57-11_240_3145632387075179354-1/-ext-10000
    Loading data to table hive.test
    Table hive.test stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 1.27 sec   HDFS Read: 3706 HDFS Write: 78 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 270 msec
    OK
    Time taken: 44.177 seconds
    
    hive> select * from test;
    OK
    1       hello hive
    Time taken: 0.146 seconds, Fetched: 1 row(s)
    

    Note: hive's default configuration does not support update and delete; they fail with:

    FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    

    The fix is to add the relevant settings to hive-site.xml; see /home/hadoop/app/hive/conf/hive-site.xml above for the exact values.

    Restart hive and run a delete again; it still fails:

    FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table hive.test that does not use an AcidOutputFormat or is not bucketed
    

    So the table targeted by delete must use an AcidOutputFormat and be bucketed. Searching around confirms it: currently only ORCFileformat supports AcidOutputFormat, and on top of that the table must be created with ('transactional' = 'true'). Quite a lot of hoops.

    Recreate the table accordingly:

    hive> create table test(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
    hive> insert into table test values (1,'row1'),(2,'row2'),(3,'row3');
    hive> delete from test where id = 1;
    hive> delete from test where name = 'row2';
    hive> update test set name = 'Raj' where id = 3;
    

    Now the delete and update statements run normally.

    2. hive web UI mode: bin/hive --service hwi & (the & runs it in the background). It exposes hive in a browser at 127.0.0.1:9999/hwi, which is of limited practical use. Startup needs the hive-hwi-*.war archive, and the CDH build of Hive does not ship it. To install it anyway: download the hive source from the official site (apache-hive-1.2.2-src.tar.gz; no 1.1.0 source was available, so 1.2.2 it is), then unpack and build the war:
    tar zxvf apache-hive-1.2.2-src.tar.gz
    cd apache-hive-1.2.2-src/hwi/web
    jar -cvf hive-hwi-1.2.2.war *
    cp hive-hwi-1.2.2.war /home/hadoop/app/hive/lib
    cp /home/hadoop/app/jdk/lib/tools.jar /home/hadoop/app/hive/lib/
    

    Start bin/hive --service hwi & again and the UI is reachable in a browser:

    http://CDHNode3:9999/hwi

    3. hive remote service mode (default port 10000), started with bin/hive --service hiveserver & or bin/hive --service hiveserver2 & (the & runs it in the background). This is the mode for accessing hive through jdbc-style drivers from java, python, and other programs, i.e. the one developers need most; -p sets the port, which can also be changed in the config file.

    The difference between hiveserver and hiveserver2: both allow remote clients written in many programming languages such as java and python to submit requests to hive and fetch results without starting the CLI (hiveserver itself has not been supported since hive 0.15, but it is still worth describing). Both are Thrift-based, yet HiveServer is the one sometimes called the Thrift server while HiveServer2 is not. Why was HiveServer2 needed when HiveServer already existed? Because HiveServer cannot handle concurrent requests from more than one client. This is a limitation of the Thrift interface HiveServer uses and cannot be fixed by changing HiveServer's code, so in Hive 0.11.0 the HiveServer code was rewritten as HiveServer2, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and provides better support for open-API clients such as JDBC and ODBC.

    HiveServer version   Connection URL               Driver class
    HiveServer2          jdbc:hive2://<host>:<port>   org.apache.hive.jdbc.HiveDriver
    HiveServer1          jdbc:hive://<host>:<port>    org.apache.hadoop.hive.jdbc.HiveDriver
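    For example, filling the HiveServer2 template in for this cluster, a direct connection to a single instance (bypassing the zookeeper discovery used further below) would be:

    beeline -u "jdbc:hive2://CDHNode3:10001"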

    hiveserver2 configuration: hiveserver2 can be managed through the hive-site.xml config file; the relevant settings are:

    hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.
    hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.
    hive.server2.thrift.port – TCP listen port, default 10000.
    hive.server2.thrift.bind.host – TCP bind host, default localhost.
    

    In hive-site.xml the settings take this form:

    	<property>
    		<name>hive.server2.thrift.port</name>
    		<value>10000</value>
    		<description>listen port</description>
    	</property>
    

    Starting hiveserver2. For production use of Hive, HiveServer2 is strongly recommended as the serving layer, for many reasons:

    1. No Hadoop or Hive client needs to be deployed on the application side;
    2. Unlike hive-cli, HiveServer2 does not expose HDFS and the Metastore directly to users;
    3. It has authentication, and supports custom authorization checks;
    4. Combined with zookeeper it gets HA, solving concurrency and load balancing on the application side;
    5. Via JDBC, any language can conveniently exchange data with it;
    6. From 2.0 on, HiveServer2 ships a WEB UI.

    Start hiveserver2 on each of CDHNode3-5:

    nohup hiveserver2 > /home/hadoop/app/hive/logs/hiveserver2.log 2>&1 &
    

    Start the zk cli and check the registered hiveserver2 instances:

    /home/hadoop/app/zookeeper/bin/zkCli.sh -server CDHNode1:2181,CDHNode2:2181,CDHNode3:2181
    [zk: CDHNode1:2181,CDHNode2:2181,CDHNode3:2181(CONNECTED) 1] ls  /hiveserver2
    [serverUri=CDHNode4:10001;version=1.1.0-cdh5.14.2;sequence=0000000001, serverUri=CDHNode3:10001;version=1.1.0-cdh5.14.2;sequence=0000000002,serverUri=CDHNode5:10001;version=1.1.0-cdh5.14.2;sequence=0000000003]
    [zk: CDHNode1:2181,CDHNode2:2181,CDHNode3:2181(CONNECTED) 2] 
    
    • All 3 hosts' hiveserver2 instances are registered.

    Verify hiveserver2 with beeline:

    hadoop@CDHNode3:/home/hadoop/app/hive/conf>beeline
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Beeline version 1.1.0-cdh5.14.2 by Apache Hive
    beeline> !connect jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
    scan complete in 1ms
    Connecting to jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
    Enter username for jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: 
    Enter password for jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: 
    18/06/29 11:14:14 [main]: INFO jdbc.HiveConnection: Connected to CDHNode4:10001
    Connected to: Apache Hive (version 1.1.0-cdh5.14.2)
    Driver: Hive JDBC (version 1.1.0-cdh5.14.2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C> 
    0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C> create database leffss;
    INFO  : Compiling command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33): create database leffss
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33); Time taken: 0.023 seconds
    INFO  : Executing command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33): create database leffss
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33); Time taken: 0.12 seconds
    INFO  : OK
    No rows affected (0.163 seconds)
    0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C> show databases;
    INFO  : Compiling command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd): show databases
    INFO  : Semantic Analysis Completed
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
    INFO  : Completed compiling command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd); Time taken: 0.025 seconds
    INFO  : Executing command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd): show databases
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd); Time taken: 0.046 seconds
    INFO  : OK
    +----------------+--+
    | database_name  |
    +----------------+--+
    | default        |
    | leffss         |
    +----------------+--+
    2 rows selected (0.117 seconds)
    

    hiveserver2 account authentication has not been enabled here, so username and password were simply left empty; data created this way ends up with the following permissions on hadoop hdfs:

    hadoop@CDHNode1:/home/hadoop>hadoop fs -ls /hive/warehouse
    Found 2 items
    drwx-wx-wx   - anonymous supergroup          0 2018-06-29 11:15 /hive/warehouse/leffss.db
    

    Used like this, HiveServer2 is very dangerous, because anyone can operate on Hive and the HDFS data as a superuser. Enabling hive user authentication will be covered in a follow-up.

    Stopping hiveserver2: find the hiveserver2 process id and kill it, for example as shown below.
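    A sketch of that:

    ps -ef | grep HiveServer2 | grep -v grep    # note the pid of the java (RunJar) process
    kill <pid>                                  # replace <pid> with the number found above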
