    Hive/HBase/Sqoop Installation Tutorial

    HIVE INSTALL

    1. Download the installation package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
    2. Upload it to the target directory on Linux (here /app) and extract it:

    mkdir hive
    mv apache-hive-2.3.3-bin.tar.gz hive
    cd hive
    tar -zxvf apache-hive-2.3.3-bin.tar.gz
    mv apache-hive-2.3.3-bin apache-hive-2.3.3

    ### Installation directory: /app/hive/apache-hive-2.3.3


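    3. Configure the environment variables
    sudo vi /etc/profile
    Add the following:

    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    export PATH=$PATH:$HIVE_HOME/bin

    :wq # save and exit
    source /etc/profile # apply the changes to the current shell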
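If you repeat the `vi` step, it is easy to end up with duplicate `export` lines in /etc/profile. The sketch below shows an idempotent alternative and assumes bash. `append_once` is a hypothetical helper, not part of Hive or Hadoop:

```shell
#!/usr/bin/env bash
# append_once FILE LINE
# Appends LINE to FILE only if FILE does not already contain exactly that line.
# Hypothetical helper for illustration. For /etc/profile, run it from a root shell
# (e.g. after "sudo -i"), because sudo cannot call a shell function directly.
append_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -x: whole-line match, -F: fixed string (no regex), --: LINE may start with "-"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH and $HIVE_HOME unexpanded in the file):
# append_once /etc/profile 'export HIVE_HOME=/app/hive/apache-hive-2.3.3'
# append_once /etc/profile 'export PATH=$PATH:$HIVE_HOME/bin'
# source /etc/profile
```

The same helper also works for the HBase, Sqoop, Hadoop and Oozie `export` lines later in this guide.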


    4. Edit the Hive configuration files:
    hive-env.sh (edit the existing entries and add any that are missing):

    cd /app/hive/apache-hive-2.3.3/conf
    cp hive-env.sh.template hive-env.sh
    ### Add the following to the file (remove the leading # and change the paths to your own directories)
    export HADOOP_HEAPSIZE=1024
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 # Hadoop installation directory
    export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
    export JAVA_HOME=/app/lib/jdk
    

      

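    Create the HDFS directories. This requires Hadoop to be installed and HDFS to be running. Use absolute paths: `hdfs dfs` resolves a relative path against the user's HDFS home directory (/user/<username>), not against the local directory you are in, so a relative path would not match the paths set in hive-site.xml.

    cd /app/hive/apache-hive-2.3.3
    mkdir hive_site_dir
    hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/warehouse
    hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/tmp
    hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/log
    hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/warehouse
    hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/tmp
    hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/log
    Create a local temporary directory:
    cd /app/hive/apache-hive-2.3.3
    mkdir tmp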
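`hdfs dfs` resolves a relative path such as `warehouse` against the current user's HDFS home directory (`/user/<username>`), not against the local directory you `cd` into. The directories therefore need absolute paths to match `hive.metastore.warehouse.dir` and `hive.exec.scratchdir` in hive-site.xml. The sketch below does this in a loop and assumes bash and a running HDFS. `create_hive_hdfs_dirs` and the `HDFS_CMD` override are hypothetical names; set `HDFS_CMD=echo` for a dry run.

```shell
#!/usr/bin/env bash
# create_hive_hdfs_dirs [BASE]
# Creates warehouse/, tmp/ and log/ under BASE on HDFS (absolute paths only)
# and opens their permissions, mirroring the manual commands above.
# HDFS_CMD defaults to "hdfs"; HDFS_CMD=echo just prints the commands.
create_hive_hdfs_dirs() {
  local base="${1:-/app/hive/apache-hive-2.3.3/hive_site_dir}"
  local hdfs_cmd="${HDFS_CMD:-hdfs}" d
  case "$base" in
    /*) ;;  # absolute path: OK
    *) echo "BASE must be an absolute HDFS path: $base" >&2; return 1 ;;
  esac
  for d in warehouse tmp log; do
    "$hdfs_cmd" dfs -mkdir -p "$base/$d" || return 1
    "$hdfs_cmd" dfs -chmod -R 777 "$base/$d" || return 1
  done
}

# Example: create_hive_hdfs_dirs /app/hive/apache-hive-2.3.3/hive_site_dir
```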
    

      

    Configure hive-site.xml (edit the existing entries):
    cd /app/hive/apache-hive-2.3.3/conf
    cp hive-default.xml.template  hive-site.xml
    vi hive-site.xml
    >> Configure the metastore database connection: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName

    <!--mysql database connection setting -->
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>szprd</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>szprd</value>
    </property>
    

      

    >> Configure the HDFS and local directories

    <property>
    <name>hive.exec.scratchdir</name>
    <!--<value>/tmp/hive</value>-->
    <value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
    </property>
    
    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
    </property>
    
    <property>
    <name>hive.exec.local.scratchdir</name>
    <!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
    <value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
    </property>
    
    <property>
    <name>hive.downloaded.resources.dir</name>
    <!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
    <value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
    </property>
    
    <property>
    <name>hive.querylog.location</name>
    <!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
    <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
    <description>Location of Hive run time structured log file</description>
    </property>
    
    
    <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
    schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
    proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
    </property>

    After editing the configuration file, save and exit with :wq
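An unescaped `&` or a mistyped closing tag in hive-site.xml causes Hive to fail when it loads its configuration. A quick well-formedness check before starting Hive catches this kind of error. The sketch below assumes `python3` is available and uses its standard-library XML parser. `check_xml` is a hypothetical name; `xmllint --noout FILE` does the same job if libxml2 is installed.

```shell
#!/usr/bin/env bash
# check_xml FILE...
# Reports whether each FILE is well-formed XML, using Python's standard-library
# parser (assumes python3 is on PATH). Returns non-zero if any file fails.
check_xml() {
  local f rc=0
  for f in "$@"; do
    if python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$f" 2>/dev/null; then
      echo "OK: $f"
    else
      echo "NOT WELL-FORMED: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_xml /app/hive/apache-hive-2.3.3/conf/hive-site.xml
```

To see the parser's line and column for an error, remove `2>/dev/null`.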

    5. Download a suitable version of the MySQL JDBC driver (Connector/J) and copy it into the lib directory of the Hive installation:
    https://dev.mysql.com/downloads/connector/j/

    6. Initialize the metastore database. You must run this before starting Hive. If it fails, check that the database connection settings are correct:

    cd /app/hive/apache-hive-2.3.3/bin
    ./schematool -initSchema -dbType mysql
    

      

    7. Start Hive
    hive     # with the environment variables set in /etc/profile, this can be run from any directory


    8. Start Hive with log output printed to the console in real time (run in the bin directory of the Hive installation):

    ./hive -hiveconf hive.root.logger=DEBUG,console


    HBASE INSTALL


    1. Download the HBase package: http://hbase.apache.org/downloads.html


    2. Extract: tar -zxvf  hbase-1.2.6.1-bin.tar.gz


    3. Set the environment variables (append them to the end of the file):
    vi /etc/profile

    #HBase Setting
    export HBASE_HOME=/app/hbase/hbase-1.2.6.1
    export PATH=$PATH:$HBASE_HOME/bin
    

      

    4. Edit the configuration file hbase-env.sh:

    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids # create this directory first if it does not exist
    export JAVA_HOME=/app/lib/jdk # JDK installation directory
    

     

    Edit the configuration file hbase-site.xml
    and add the following properties inside the <configuration> element:

    <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.1.202:9000/hbase</value>
    </property>
    
    
    <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/vc/dev/MQ/ZK/zookeeper-3.4.12</value>
    </property>
    
    
    <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
    </property>
    
    
    <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    </property>
    
    <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
    Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by arootdir with the 'file://' scheme, but be mindful of the NOTE below.
    WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures.If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
    </description>
    </property>
    

      

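    5. Start ZooKeeper
    In the bin directory of the ZooKeeper installation, run: ./zkServer.sh start
    Then start the client: ./zkCli.sh
    Once connected, enter: create /hbase hbase

    6. Start HBase
    In the HBase bin directory, run: ./start-hbase.sh
    ./hbase shell  # once this starts, you can run HBase shell commands
    list  # no error means the installation works

    7. HBase web UI: http://10.28.85.149:16010/master-status   # use your server's IP; the port is 16010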


    #Sqoop install
    1. Download the installation package: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/


    2. Extract: tar -zxvf  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

    Rename the directory: mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0


    3. Set the environment variables in /etc/profile:

    #Sqoop Setting
    export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
    export PATH=$PATH:$SQOOP_HOME/bin
    

      

    4. Copy the MySQL JDBC driver into the lib directory of the Sqoop installation:

    https://dev.mysql.com/downloads/connector/j/

    5. Edit the configuration file in the conf directory of the Sqoop installation:
    vi sqoop-env.sh

    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
    
    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
    
    #set the path to where bin/hbase is available
    export HBASE_HOME=/app/hbase/hbase-1.2.6.1
    
    #Set the path to where bin/hive is available
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    
    #Set the path for where zookeper config dir is
    export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
    

      

    6. Run:

    sqoop help  # list the available Sqoop commands

    sqoop version # show the Sqoop version

     PS:

    To stop HBase, run stop-hbase.sh. If it reports a pid-related error, see this blog post: https://blog.csdn.net/xiao_jun_0820/article/details/35222699

    Hadoop installation guide: http://note.youdao.com/noteshare?id=0cae2da671de0f7175376abb8e705406

    ZooKeeper installation guide: http://note.youdao.com/noteshare?id=33e37b0967da40660920f755ba2c03f0

    # Hadoop pseudo-distributed installation
    # Prerequisite: JDK installed
    
    # Download Hadoop 2.7.7
    ```
    cd /home/vc/dev/hadoop
    
    wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz 
    ```
    # Extract
    
    ```
     tar -zxvf hadoop-2.7.7.tar.gz 
    ```
    
    ## Configure the Hadoop environment variables by appending the following to /etc/profile
    
    ```
    # hadoop home setting 
    
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.7
    export HADOOP_INSTALL=${HADOOP_HOME}
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
    
    ```
    
    ## Edit <Hadoop install dir>/etc/hadoop/hadoop-env.sh
    
    ```
    # The java implementation to use.
    export JAVA_HOME=/home/vc/dev/jdk/jdk1.8.0_161
    ```
    ### <Hadoop install dir>/etc/hadoop/core-site.xml
    
    
    ```
    <configuration>
    
        <!-- Where Hadoop stores files generated at runtime: the base directory Hadoop uses for its data files. -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/vc/dev/hadoop/hadoop-2.7.7/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
        <!-- Address of the HDFS master (NameNode); this sets the default file system. -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.1.202:9000</value>
        </property>
    </configuration>
                        
    ```
     
    ### Configure HDFS: etc/hadoop/hdfs-site.xml
    
    ```
    <configuration>
            <!-- NameNode storage directory -->
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/name</value>
            </property>
            <!-- HDFS replication factor -->
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
            <!-- DataNode storage directory -->
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/data</value>
            </property>
    
    </configuration>
    ```
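    ### Set up passwordless SSH login for pseudo-distributed mode. Passwordless login between Hadoop cluster nodes must work; otherwise you will run into many problems.
    
    On a single node, test it with `ssh localhost`. If you cannot log in without a password, run: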
    
    ```
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
    chmod 0600 ~/.ssh/authorized_keys
    
    ```
    ### In pseudo-distributed mode you do not need to edit /etc/hosts. In fully distributed mode, map every hostname to its IP address there.
    
    
    # Starting Hadoop in pseudo-distributed mode
    ## Start HDFS
    ```
    # Format HDFS before the first start; answer Y to every Y/N prompt
    bin/hdfs namenode -format
    # Start the NameNode and DataNode daemons, i.e. start HDFS as a single-node Hadoop cluster
    sbin/start-dfs.sh
    ```
    After startup, you can browse NameNode information in the web UI (http://<host>:50070 by default in Hadoop 2.x):
    ![](http://one17356s.bkt.clouddn.com/18-8-24/97813052.jpg)
    
    ```
    # Create a directory on HDFS with the hadoop command
    hadoop fs -mkdir /test
    # or with this command
     hdfs dfs -mkdir /user
     
    # Upload a file
    hdfs dfs -put <local-file> /test
    
    ```
    ![](http://one17356s.bkt.clouddn.com/18-8-24/33727958.jpg)
    
    ## Stop HDFS
    
    ```
    ./sbin/stop-dfs.sh
    
    ```
    ## Configure YARN
    ### etc/hadoop/mapred-site.xml (Hadoop 2.7.7 ships only mapred-site.xml.template; copy it to mapred-site.xml first)
    
    ```
    <configuration>
    
    
     <!-- Run MapReduce (MR) on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
    ```
    ### etc/hadoop/yarn-site.xml
    ```
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
      <!-- Reducers fetch map output through the mapreduce_shuffle auxiliary service -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
      <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
    ```
    
    
    
    
    ![](http://one17356s.bkt.clouddn.com/18-8-24/53993777.jpg)
    ![](http://one17356s.bkt.clouddn.com/18-8-24/28989509.jpg)
    
    
    ## Start and stop YARN
    
    ```
    ./sbin/start-yarn.sh
    ./sbin/stop-yarn.sh
    
    ```
    
    ## Check the cluster status
    
    ```
    ./bin/hdfs dfsadmin -report
    ```
    # Testing in pseudo-distributed mode
    
    ```
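    # Create a directory on the server
     mkdir ~/input
     # Enter it and copy the Hadoop configuration files into it as sample data
     cd ~/input
     cp /app/hadoop/hadoop-2.7.7/etc/hadoop/*.xml ./
     # Upload the files in ~/input to /one on HDFS (create /one first, since several files are copied into it)
     hdfs dfs -mkdir -p /one
     hdfs dfs -put ./* /one
     # Check the files uploaded to HDFS
     hdfs dfs -ls /one
     # Run the example jar; the output directory /output must NOT exist on HDFS yet, otherwise the job fails
     hadoop jar /app/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep /one /output 'dfs[a-z.]+'
     # Copy the result directory from HDFS into ~/input on the server
    hdfs dfs -get /output
    # View the result
    cat output/*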
    
    ```
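The `grep` example job counts how often each match of the regular expression `dfs[a-z.]+` appears in the input files. You can approximate the same count locally with standard tools to sanity-check the numbers in `/output`. The sketch below assumes GNU grep; `local_grep_count` is a hypothetical name. The Hadoop job writes `count<TAB>match` lines, so compare the counts, not the formatting.

```shell
#!/usr/bin/env bash
# local_grep_count PATTERN FILE...
# Local approximation of the "grep" example job: extract every match of the
# extended regular expression PATTERN from FILE..., count each distinct match
# and sort by count, highest first.
local_grep_count() {
  local pattern="$1"
  shift
  # -o: print only the matched text, -h: omit file names, -E: extended regex
  grep -ohE -- "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Example (run in ~/input): local_grep_count 'dfs[a-z.]+' ./*.xml
```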
    --- 
    
    # ZooKeeper installation
    # Download, extract and install ZooKeeper (zookeeper-3.4.9.tar.gz)
    # Set the environment variables
    ![](http://one17356s.bkt.clouddn.com/17-11-2/30838835.jpg)
    # Edit the configuration file (it is in $ZOOKEEPER_HOME/conf/; rename zoo_sample.cfg to zoo.cfg)
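    Configuration notes:
    - tickTime: the heartbeat interval between ZooKeeper servers, or between a client and a server; a heartbeat is sent every tickTime.
    - dataDir: the directory where ZooKeeper stores its data. By default, ZooKeeper also writes its transaction log files here.
    - clientPort: the port clients use to connect to ZooKeeper; ZooKeeper listens on this port for client requests.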
    
    ![](http://one17356s.bkt.clouddn.com/18-7-8/79348236.jpg)
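    4.1 Standalone mode
    - After downloading the ZooKeeper package, extract it to a suitable directory and enter its conf subdirectory. Create the configuration file from the template with `cp zoo_sample.cfg zoo.cfg`, then set the following parameters.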
    - tickTime=2000 
    - dataDir=/home/vc/dev/MQ/ZK/data
    - dataLogDir=/home/vc/dev/MQ/ZK/log
    - clientPort=2181 
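    ## What each parameter means
    
    - tickTime: the basic time unit (tick) used by ZooKeeper, in milliseconds.
    - dataDir: the data directory; any directory will do.
    - dataLogDir: the log directory; also any directory. If it is not set, the value of dataDir is used.
    - clientPort: the port on which client connections are accepted.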
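You can also write the four standalone-mode parameters above in one step. The sketch below assumes bash and write access to the target directories. `write_zoo_cfg` is a hypothetical helper; the example paths are the ones used above.

```shell
#!/usr/bin/env bash
# write_zoo_cfg CONF_DIR DATA_DIR LOG_DIR [CLIENT_PORT]
# Writes a standalone-mode zoo.cfg with tickTime, dataDir, dataLogDir and
# clientPort, and creates the data and log directories if they are missing.
write_zoo_cfg() {
  local conf_dir="$1" data_dir="$2" log_dir="$3" port="${4:-2181}"
  mkdir -p "$conf_dir" "$data_dir" "$log_dir" || return 1
  printf '%s\n' \
    '# basic time unit in milliseconds' \
    'tickTime=2000' \
    "dataDir=$data_dir" \
    "dataLogDir=$log_dir" \
    "clientPort=$port" > "$conf_dir/zoo.cfg"
}

# Example:
# write_zoo_cfg "$ZOOKEEPER_HOME/conf" /home/vc/dev/MQ/ZK/data /home/vc/dev/MQ/ZK/log
```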
    
    
    # Start ZooKeeper
    `/dev/Zk/zookeeper-3.4.9/bin$ ./zkServer.sh start`
    ![](http://one17356s.bkt.clouddn.com/17-11-2/76638495.jpg)
    
    # Check that it is running
    Use: `netstat -antp | grep 2181`
    ![](http://one17356s.bkt.clouddn.com/17-11-2/15616237.jpg)
    
    # Connect to the ZooKeeper service with zkCli.sh
    
    ```
     ./zkCli.sh -server localhost:2181   # connect to the local ZooKeeper service
     history                             # list the commands executed so far
     quit                                # disconnect the client from the ZooKeeper server
     
    ```
    ![](http://one17356s.bkt.clouddn.com/18-8-27/4122129.jpg)
    
    # Stop the ZooKeeper service
    `./zkServer.sh stop`
    
    ---
    
    
    
    # [HIVE SQOOP HBASE installation blog post:](https://www.cnblogs.com/DFX339/p/9550213.html)
    
    # HIVE-INSTALL
    - Download the installation package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
    - Upload it to the target directory on Linux and extract it:  
    	
    ```
    mkdir hive    
    mv apache-hive-2.3.3-bin.tar.gz  hive
    cd hive
    tar -zxvf apache-hive-2.3.3-bin.tar.gz
    mv apache-hive-2.3.3-bin apache-hive-2.3.3
    ### Installation directory: /app/hive/apache-hive-2.3.3
    ```
    
    - Configure the environment variables:
    
    ```
    sudo  vi /etc/profile
    # Add the following:
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    export PATH=$PATH:$HIVE_HOME/bin
    :wq    # save and exit
    source /etc/profile    # apply the changes to the current shell
    ```
    
    - Edit the Hive configuration files:
    	- hive-env.sh (edit the existing entries and add any that are missing):	
    	
        ```
        cd /app/hive/apache-hive-2.3.3/conf
        cp hive-env.sh.template   hive-env.sh
        # Add the following to the file (remove the leading # and change the paths to your own directories)
        export HADOOP_HEAPSIZE=1024
        export HADOOP_HOME=/app/hadoop/hadoop-2.7.7   # Hadoop installation directory
        export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
        export HIVE_HOME=/app/hive/apache-hive-2.3.3
        export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
        export JAVA_HOME=/app/lib/jdk
        ```
    
    	
    - Create the HDFS directories:
    
        ```
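        cd /app/hive/apache-hive-2.3.3
        mkdir hive_site_dir
        # Requires Hadoop to be installed and HDFS running. Use absolute paths:
        # relative HDFS paths resolve against /user/<username>, not the local directory.
        hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/warehouse
        hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/tmp
        hdfs dfs -mkdir -p /app/hive/apache-hive-2.3.3/hive_site_dir/log
        hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/warehouse
        hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/tmp
        hdfs dfs -chmod -R 777 /app/hive/apache-hive-2.3.3/hive_site_dir/log
        # Create a local temporary directory:
        cd  /app/hive/apache-hive-2.3.3
        mkdir  tmp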
        ```
    
    	
    - Configure hive-site.xml (edit the existing entries):	
    
        ```
        cd /app/hive/apache-hive-2.3.3/conf
        cp hive-default.xml.template   hive-site.xml 
        vi hive-site.xml
        ```
    
    	- Configure the metastore database connection: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName
    	
        ```
        <!--mysql database connection setting -->
        <property>
        	<name>javax.jdo.option.ConnectionDriverName</name>
        	<value>com.mysql.jdbc.Driver</value>
        </property>
        
        <property>
        	<name>javax.jdo.option.ConnectionURL</name>
        	<value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
        </property>
        
        <property>
        	<name>javax.jdo.option.ConnectionUserName</name>
        	<value>szprd</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>szprd</value>
        </property>
        ```
    
    
    - Configure the HDFS and local directories
    
        ```
        <property>
        		<name>hive.exec.scratchdir</name>
        		<!--<value>/tmp/hive</value>-->
        		<value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
        		<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
               </property>
        	
               <property>
                     <name>hive.metastore.warehouse.dir</name>
                     <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
               </property>
        	
        	<property>
        		<name>hive.exec.local.scratchdir</name>
        		<!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
        		<value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
        		<description>Local scratch space for Hive jobs</description>
        	</property>
          
             <property>
                <name>hive.downloaded.resources.dir</name>
                <!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
        	<value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
                <description>Temporary local directory for added resources in the remote file system.</description>
             </property>
          
             <property>
                 <name>hive.querylog.location</name>
                 <!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
        	 <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
                 <description>Location of Hive run time structured log file</description>
             </property>
          
          
          <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
            <description>
              Enforce metastore schema version consistency.
              True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
                    schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                    proper metastore schema migration. (Default)
              False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
            </description>
          </property>
        ```
    
      
    **After editing hive-site.xml, save and exit with :wq**
    	
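    - Download a suitable version of the MySQL JDBC driver (Connector/J) and put it in the lib directory of the Hive installation:
    	https://dev.mysql.com/downloads/connector/j/
    	
    - Initialize the metastore database. You must run this before starting Hive. If it fails, check that the database connection settings are correct: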
    	
        ```
        cd  /app/hive/apache-hive-2.3.3/bin
        ./schematool -initSchema -dbType mysql
        ```
    
    	 
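    - Start Hive
    
    	`hive` (with the environment variables set in /etc/profile, this can be run from any directory)
    	
    - Start Hive with log output printed to the console in real time (run in the bin directory of the Hive installation): `./hive -hiveconf hive.root.logger=DEBUG,console`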
    
    --- 	 
    	 
    	 
    # HBASE INSTALL
    - [Download the HBase package](http://hbase.apache.org/downloads.html)
    
    
    - Extract: `tar -zxvf  hbase-1.2.6.1-bin.tar.gz`
    
    - Set the environment variables (append them to the end of the file):
    
        ```
        vi  /etc/profile
        #HBase Setting
        export HBASE_HOME=/app/hbase/hbase-1.2.6.1
        export PATH=$PATH:$HBASE_HOME/bin
        ```
    
    - Edit the configuration file `hbase-env.sh`:
        
        ```
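        # Defaults to true (HBase manages its own ZooKeeper); false means an external ZooKeeper is used
        export HBASE_MANAGES_ZK=false
        export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids   # create this directory first if it does not exist
        export JAVA_HOME=/app/lib/jdk   # JDK installation directory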
        ```
    
     
    - Edit the configuration file `hbase-site.xml`
     and add the following properties inside the configuration element:
    
    ```
    <configuration>
    <!-- Number of data replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    
    <!-- HBase root directory in HDFS -->
    <property>
       <name>hbase.rootdir</name>
       <value>hdfs://10.28.85.149:9000/hbase</value>
    </property>
    
    <!-- ZooKeeper client port; must match the port ZooKeeper listens on -->
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
      </property>
    <!-- Must match the dataDir setting in the ZooKeeper configuration file -->
    <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/app/zookeeper/data</value>
    </property>
    
    <!-- Root znode for HBase in ZooKeeper -->
    <property>
            <name>zookeeper.znode.parent</name>
            <value>/hbase</value>
    </property>
    
    <!-- Whether HBase runs in distributed (cluster) mode -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
    <!-- Set this property to false if you use the local file system (LocalFileSystem) -->
     <property>
             <name>hbase.unsafe.stream.capability.enforce</name>
             <value>true</value>
            <description>
                    Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by arootdir with the 'file://' scheme, but be mindful of the NOTE below.
                     WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures.If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
             </description>
    </property>
    </configuration>
    ```
    
    
    - Start ZooKeeper
    In the bin directory of the ZooKeeper installation, run `./zkServer.sh start`
    
    Then start the client: `./zkCli.sh`
    
    Once connected, enter: `create /hbase hbase`
    
    - Start HBase
    
    In the HBase bin directory, run `./start-hbase.sh`
    		
    ```
    ./hbase shell   # once this starts, you can run HBase shell commands
    list  # lists all tables in HBase; no error means the installation works
    ```
    
    					
    - HBase web UI: http://10.28.85.149:16010/master-status   (use your server's IP; the port is 16010)
    		
    --- 
    
    
    
    # SQOOP INSTALL
    
    - [Download the installation package](https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/)
    
    
    - Extract: `tar -zxvf  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz`
    
        Rename the directory: `mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0`
    		
    - Set the environment variables (in /etc/profile):
    
        ```
        #Sqoop Setting
        export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
        export PATH=$PATH:$SQOOP_HOME/bin
        ```
    
    
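    - Copy the MySQL JDBC driver into the lib directory of the Sqoop installation
        Download: https://dev.mysql.com/downloads/connector/j/
    
    - Edit the configuration file in the conf directory of the Sqoop installation: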
    
    ```
    vi sqoop-env.sh
    
    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
    
    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
    
    #set the path to where bin/hbase is available
    export HBASE_HOME=/app/hbase/hbase-1.2.6.1
    
    #Set the path to where bin/hive is available
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    
    #Set the path for where zookeper config dir is
    export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
    ```
    
    
    - Test the Sqoop installation
    	- sqoop help  # shows the available Sqoop commands
    	
    	- Test the database connection by listing all databases visible with these credentials:
    	
        ```
        sqoop list-databases \
        --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
        --username root \
        --password Abcd1234
        ```
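Passing `--password` on the command line leaves the password in the shell history and in the process list. Sqoop 1.4.x can prompt for the password with `-P` or read it from a file with `--password-file`. Sqoop uses the entire file content as the password, including any trailing newline, so the file must be written without one. The sketch below creates such a file and assumes bash. `make_password_file` is a hypothetical helper, and the file location is illustrative.

```shell
#!/usr/bin/env bash
# make_password_file FILE PASSWORD
# Writes PASSWORD to FILE with no trailing newline and makes it owner-read-only (400).
make_password_file() {
  local file="$1" password="$2"
  rm -f -- "$file" || return 1
  # umask 077 in a subshell: the file is created private without changing the caller's umask
  ( umask 077 && printf '%s' "$password" > "$file" ) || return 1
  chmod 400 "$file"
}

# Example (the file location is illustrative):
# make_password_file "$HOME/.mysql.password" 'Abcd1234'
# sqoop list-databases \
#   --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
#   --username root \
#   --password-file "file://$HOME/.mysql.password"
```

The `file://` prefix is meant to point Sqoop at the local file system. Without a scheme, the path is typically resolved on the default file system (HDFS). This is an assumption; verify it against your Sqoop version.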
    
    
    
    --- 
    
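    # Oozie installation
    # Based on the oozie-4.0.0-cdh5.3.6.tar.gz release
    Prerequisites:
    - a working MySQL database
    - an installed Hadoop cluster
    - The `oozie-server` directory in the built Oozie package is itself a Tomcat environment, so you do not need to install Tomcat separately.
    ## Installation
    - Download the built package: `wget http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6.tar.gz`
    - Extract it to the target directory with `tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz`; the directory used here is `/app/oozie`
    - Set the global environment variables: `sudo vim /etc/profile`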
    ```
    
    #oozie setting
    export OOZIE_HOME=/app/oozie/oozie-4.0.0-cdh5.3.6
    export PATH=$PATH:$OOZIE_HOME/bin
    ```
    
    - Set the environment variables in `<Oozie install dir>/conf/oozie-env.sh`.
    The port of the Oozie web console is also set here:
    `OOZIE_HTTP_PORT` sets the port the Oozie web service listens on (default 11000)
    ```
    
    export OOZIE_CONF=${OOZIE_HOME}/conf
    export OOZIE_DATA=${OOZIE_HOME}/data
    export OOZIE_LOG=${OOZIE_HOME}/logs
    export CATALINA_BASE=${OOZIE_HOME}/oozie-server
    export CATALINA_HOME=${OOZIE_HOME}/oozie-server
    ```
    
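    - Create a libext directory in the Oozie root directory (`mkdir libext`) and move the third-party jars that Oozie depends on into it.
        
        - Add the downloaded ExtJS 2.2 archive to libext: `cp ext-2.2.zip libext/`
        - Add the Hadoop jars: inside libext, run `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/*.jar ./` and `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/lib/*.jar ./`
        - Add the JDBC driver for the MySQL database that stores the metadata (`mysql-connector-java-5.1.41.jar`)
    
    - Configure the Oozie proxy user in Hadoop's core-site.xml:
    replace xxx with the name of the user that submits Oozie jobs.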
        - hadoop.proxyuser.**xxx**.hosts
        
        - hadoop.proxyuser.**xxx**.groups
    ```
    <!-- oozie    -->
    <property>
        <name>hadoop.proxyuser.imodule.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.imodule.groups</name>
        <value>*</value>
      </property>
    ```
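For a user other than `imodule`, you can generate the two properties by user name. The sketch below assumes bash. `proxyuser_xml` is a hypothetical helper that only prints XML; you still have to paste the output inside `<configuration>` in etc/hadoop/core-site.xml.

```shell
#!/usr/bin/env bash
# proxyuser_xml USER [HOSTS] [GROUPS]
# Prints the two core-site.xml properties that let USER (the account that
# submits Oozie jobs) act on behalf of other users. "*" means any host / any group.
proxyuser_xml() {
  local user="$1" hosts="${2:-*}" groups="${3:-*}"
  printf '<property>\n  <name>hadoop.proxyuser.%s.hosts</name>\n  <value>%s</value>\n</property>\n' "$user" "$hosts"
  printf '<property>\n  <name>hadoop.proxyuser.%s.groups</name>\n  <value>%s</value>\n</property>\n' "$user" "$groups"
}

# Example: proxyuser_xml "$(whoami)"
```

After changing core-site.xml, restart HDFS and YARN, or reload the proxy-user settings with `hdfs dfsadmin -refreshSuperUserGroupsConfiguration` and `yarn rmadmin -refreshSuperUserGroupsConfiguration`.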
    
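    - Set up Oozie's shared jar directory (sharelib) on HDFS.
    
    The default NameNode port is 8020. I changed it to 9000, so adjust the `-fs` URI to match your own setting:
    
    One problem I ran into: the NameNode was in safe mode, which has to be turned off first: `hdfs dfsadmin -safemode leave`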
    
    ```
     oozie-setup.sh sharelib create -fs hdfs://10.28.85.149:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
    ```
    - Create the Oozie WAR file
    
    First put the Hadoop jars, the MySQL driver and the ExtJS archive into libext, then run `oozie-setup.sh prepare-war` to build the WAR.
    
    
    - `<Oozie install dir>/conf/oozie-site.xml`
    
    
    Set `oozie.service.HadoopAccessorService.hadoop.configurations` to the path of the local Hadoop configuration directory:
    ```
     <configuration>
    <property>
            <name>oozie.services</name>
            <value>
            org.apache.oozie.service.JobsConcurrencyService,
                org.apache.oozie.service.SchedulerService,
                org.apache.oozie.service.InstrumentationService,
                org.apache.oozie.service.MemoryLocksService,
                org.apache.oozie.service.CallableQueueService,
                org.apache.oozie.service.UUIDService,
                org.apache.oozie.service.ELService,
                org.apache.oozie.service.AuthorizationService,
                org.apache.oozie.service.UserGroupInformationService,
                org.apache.oozie.service.HadoopAccessorService,
                org.apache.oozie.service.URIHandlerService,
                org.apache.oozie.service.DagXLogInfoService,
                org.apache.oozie.service.SchemaService,
                org.apache.oozie.service.LiteWorkflowAppService,
                org.apache.oozie.service.JPAService,
                org.apache.oozie.service.StoreService,
                org.apache.oozie.service.CoordinatorStoreService,
                org.apache.oozie.service.SLAStoreService,
                org.apache.oozie.service.DBLiteWorkflowStoreService,
                org.apache.oozie.service.CallbackService,
                org.apache.oozie.service.ActionService,
                org.apache.oozie.service.ShareLibService,
                org.apache.oozie.service.ActionCheckerService,
                org.apache.oozie.service.RecoveryService,
                org.apache.oozie.service.PurgeService,
                org.apache.oozie.service.CoordinatorEngineService,
                org.apache.oozie.service.BundleEngineService,
                org.apache.oozie.service.DagEngineService,
                org.apache.oozie.service.CoordMaterializeTriggerService,
                org.apache.oozie.service.StatusTransitService,
                org.apache.oozie.service.PauseTransitService,
                org.apache.oozie.service.GroupsService,
                org.apache.oozie.service.ProxyUserService,
                org.apache.oozie.service.XLogStreamingService,
                org.apache.oozie.service.JvmPauseMonitorService
            </value>
        </property>
        <!-- Hadoop configuration directory (etc/hadoop) -->
        <property>
            <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
            <value>*=/app/hadoop/hadoop-2.7.7/etc/hadoop</value>
        </property>
        <property>
            <name>oozie.service.JPAService.create.db.schema</name>
            <value>true</value>
        </property>
    
        <property>
            <name>oozie.service.JPAService.jdbc.driver</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <name>oozie.service.JPAService.jdbc.url</name>
            <value>jdbc:mysql://10.28.85.148:3306/ooize?createDatabaseIfNotExist=true</value>
        </property>
    
        <property>
            <name>oozie.service.JPAService.jdbc.username</name>
            <value>root</value>
        </property>
        
        <property>
            <name>oozie.service.JPAService.jdbc.password</name>
            <value>Abcd1234</value>
        </property>
    
    </configuration>
    ```
    
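    - Start the Oozie service and check that the installation is complete:
    `oozied.sh run` or `oozied.sh start` (the former runs in the foreground, the latter in the background)
    - Stop the Oozie service: `oozied.sh stop`
    - Check the Oozie status from the command line (`oozie admin -oozie http://10.28.85.149:11000/oozie -status`); it should return `System mode: NORMAL`
    - Then use the shareliblist command to inspect the sharelib: `oozie admin -shareliblist -oozie http://localhost:11000/oozie`
    - Web console: `http://10.28.85.149:11000/oozie/`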
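`oozied.sh start` may return before the embedded Tomcat has finished deploying Oozie, so an immediate status check can fail for a short while. The sketch below polls the status command from the list above until it reports `NORMAL`. It assumes bash and the `oozie` client on `PATH`. `wait_for_oozie`, `OOZIE_CMD` and `WAIT_SECS` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# wait_for_oozie [URL] [TRIES]
# Polls "oozie admin -oozie URL -status" until it prints "System mode: NORMAL".
# OOZIE_CMD (default: oozie) and WAIT_SECS (default: 2) are hypothetical
# overrides for this sketch, e.g. to point at a different oozie client binary.
wait_for_oozie() {
  local url="${1:-http://localhost:11000/oozie}" tries="${2:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    if "${OOZIE_CMD:-oozie}" admin -oozie "$url" -status 2>/dev/null \
        | grep -q 'System mode: NORMAL'; then
      echo "Oozie is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_SECS:-2}"
  done
  echo "Oozie did not report NORMAL after $tries attempts" >&2
  return 1
}

# Example: oozied.sh start && wait_for_oozie http://10.28.85.149:11000/oozie
```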
    
    **A problem I ran into**
    
    ```
    Sep 03, 2018 4:36:47 PM org.apache.catalina.core.StandardWrapperValve invoke
    SEVERE: Servlet.service() for servlet jsp threw exception
    java.lang.NullPointerException
            at org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
            at org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
            at org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
            at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
            at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
            at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
            at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
            at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
            at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:154)
            at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
            at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
            at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:159)
            at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
            at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
            at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
            at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
            at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
            at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
            at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
            at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
            at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
            at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
            at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
            at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
            at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
            at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
            at java.lang.Thread.run(Thread.java:745)
    ```
    The cause: `servlet-api.jar` and `jsp-api.jar` exist both in the webapp's `WEB-INF/lib` directory and in Tomcat's `lib` directory.
    The same jars are present in both `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib` and `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/lib`, and they conflict. `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server` is the Tomcat environment of oozie-server, and its `lib` directory holds the jars Tomcat runs with.
    
    Fix: delete these three files from `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib`: servlet-api-2.5-6.1.14.jar, servlet-api-2.5.jar, jsp-api-2.1.jar.
    
    After that, Oozie starts normally.
    ![](http://one17356s.bkt.clouddn.com/18-9-3/48205608.jpg)
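Deleting the jars works, but moving them aside keeps the change reversible. The sketch below assumes bash. `quarantine_servlet_jars` is a hypothetical helper, the jar names are the three listed above, and the default backup location is an arbitrary choice. Restart Oozie afterwards (`oozied.sh stop`, then `oozied.sh start`). Re-running `oozie-setup.sh prepare-war` rebuilds the WAR and may bring the jars back, in which case you will need to repeat this step.

```shell
#!/usr/bin/env bash
# quarantine_servlet_jars WEBINF_LIB [BACKUP_DIR]
# Moves the servlet/JSP API jars that clash with the copies in oozie-server/lib
# out of the webapp's WEB-INF/lib into BACKUP_DIR instead of deleting them.
quarantine_servlet_jars() {
  local lib="$1" backup="${2:-$HOME/oozie-removed-jars}" jar moved=0
  [ -d "$lib" ] || { echo "not a directory: $lib" >&2; return 1; }
  mkdir -p "$backup" || return 1
  for jar in servlet-api-2.5-6.1.14.jar servlet-api-2.5.jar jsp-api-2.1.jar; do
    if [ -f "$lib/$jar" ]; then
      mv -- "$lib/$jar" "$backup/" || return 1
      moved=$((moved + 1))
    fi
  done
  echo "moved $moved jar(s) to $backup"
}

# Example:
# quarantine_servlet_jars /app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib
```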
    
    ---		
    
    # Pig installation
    # Prerequisites
    ### Hadoop 2.7.7 installed
    ### JDK 1.7+
    # Installation
    ```
    tar -xzvf pig-0.17.0.tar.gz
    # Pig setting
    export PIG_HOME=/app/pig/pig-0.17.0
    export PATH=$PATH:$PIG_HOME/bin
    ```
    # Testing
    ```
    # local mode
    pig -x local
    # MapReduce mode
    pig -x mapreduce
    ```
    ![](http://one17356s.bkt.clouddn.com/18-8-28/13040171.jpg)
    ---
  • Original article: https://www.cnblogs.com/DFX339/p/9550213.html