  Hive Learning Path (Part 2): Installing Hive


    Downloading Hive

    Download mirror: http://mirrors.hust.edu.cn/apache/

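    Pick a suitable Hive version to download. In the stable-2 folder you will find the stable 2.x release, which at the time of writing was 2.3.3 (apache-hive-2.3.3-bin.tar.gz).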

    Installing Hive

    1. I use MySQL as Hive's metastore database, so install MySQL first.

    MySQL installation walkthrough: http://www.cnblogs.com/qingyunzong/p/8294876.html

    2. Upload the Hive installation package to the server

    3. Extract the package

    [hadoop@hadoop3 ~]$ tar -zxvf apache-hive-2.3.3-bin.tar.gz -C apps/

    4. Edit the configuration files

    The configuration files are in apache-hive-2.3.3-bin/conf:

    [hadoop@hadoop3 apps]$ cd apache-hive-2.3.3-bin/
    [hadoop@hadoop3 apache-hive-2.3.3-bin]$ ls
    bin  binary-package-licenses  conf  examples  hcatalog  jdbc  lib  LICENSE  NOTICE  RELEASE_NOTES.txt  scripts
    [hadoop@hadoop3 apache-hive-2.3.3-bin]$ cd conf/
    [hadoop@hadoop3 conf]$ ls
    beeline-log4j2.properties.template    ivysettings.xml
    hive-default.xml.template             llap-cli-log4j2.properties.template
    hive-env.sh.template                  llap-daemon-log4j2.properties.template
    hive-exec-log4j2.properties.template  parquet-logging.properties
    hive-log4j2.properties.template
    [hadoop@hadoop3 conf]$ pwd
    /home/hadoop/apps/apache-hive-2.3.3-bin/conf
    [hadoop@hadoop3 conf]$ 

    Create hive-site.xml and add the following content:

    [hadoop@hadoop3 conf]$ touch hive-site.xml
    [hadoop@hadoop3 conf]$ vi hive-site.xml 
    <configuration>
            <property>
                    <name>javax.jdo.option.ConnectionURL</name>
                    <value>jdbc:mysql://hadoop1:3306/hivedb?createDatabaseIfNotExist=true</value>
                    <description>JDBC connect string for a JDBC metastore</description>
                    <!-- If MySQL and Hive run on the same server node, change hadoop1 to localhost -->
            </property>
            <property>
                    <name>javax.jdo.option.ConnectionDriverName</name>
                    <value>com.mysql.jdbc.Driver</value>
                    <description>Driver class name for a JDBC metastore</description>
            </property>
            <property>
                    <name>javax.jdo.option.ConnectionUserName</name>
                    <value>root</value>
                    <description>username to use against metastore database</description>
            </property>
            <property>
                    <name>javax.jdo.option.ConnectionPassword</name>
                    <value>root</value>
                    <description>password to use against metastore database</description>
            </property>
    </configuration>
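    As an alternative to typing the file in vi, here is a minimal Python sketch that produces the same four metastore properties using only the standard library (xml.etree.ElementTree to build the tree, xml.dom.minidom to indent it). The helper name build_hive_site is hypothetical, and the values are the example ones from above (host hadoop1, user and password root); substitute your own.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Example values from the hive-site.xml above; change host, user and password for your cluster.
METASTORE_PROPS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://hadoop1:3306/hivedb?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
}


def build_hive_site(props):
    """Return hive-site.xml text with one <property> element per (name, value) pair."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # characters such as & are escaped automatically
    raw = ET.tostring(root, encoding="unicode")
    # Re-parse with minidom only to get human-readable indentation.
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(build_hive_site(METASTORE_PROPS))
```

    Saved as, say, gen_hive_site.py (a placeholder name), running python3 gen_hive_site.py > hive-site.xml inside the conf directory would write the file.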

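    The following setting is optional (place it inside <configuration>); it specifies the HDFS directory where the Hive data warehouse stores its data: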

            <property>
                    <name>hive.metastore.warehouse.dir</name>
                    <value>/hive/warehouse</value>
                    <description>hive default warehouse, if necessary, change it</description>
            </property>    
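    To see which warehouse directory will actually be used, a small sketch like the following reads a property out of hive-site.xml and falls back to Hive's built-in default of /user/hive/warehouse when the optional property is absent. The function names are hypothetical, and the sketch only looks at hive-site.xml; it ignores overrides passed on the command line.

```python
import xml.etree.ElementTree as ET

# Hive's built-in default for hive.metastore.warehouse.dir.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def get_property(xml_text, name, default=None):
    """Return the <value> of property `name` in hive-site.xml text, or `default`."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return default


def warehouse_dir(xml_text):
    """Warehouse directory in effect: the configured value, else Hive's default."""
    return get_property(xml_text, "hive.metastore.warehouse.dir", DEFAULT_WAREHOUSE_DIR)
```

    Without the optional property the result is /user/hive/warehouse, which matches the Location that desc formatted reports in the usage section (hdfs://myha01/user/hive/warehouse/myhive.db/student).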

    5. Be sure to add the MySQL JDBC driver (mysql-connector-java-5.1.40-bin.jar): put this jar in the lib directory under the Hive root directory.
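    A quick way to confirm the driver is in place is to look for the jar under $HIVE_HOME/lib. The helper below is hypothetical, and its glob pattern assumes the jar keeps the mysql-connector-java-*.jar naming used above.

```python
import glob
import os


def find_mysql_driver(hive_home):
    """Return the MySQL Connector/J jars found in <hive_home>/lib (empty list if none)."""
    pattern = os.path.join(hive_home, "lib", "mysql-connector-java-*.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Falls back to the install path used in this article if HIVE_HOME is not set yet.
    home = os.environ.get("HIVE_HOME", "/home/hadoop/apps/apache-hive-2.3.3-bin")
    jars = find_mysql_driver(home)
    print(jars if jars else "no MySQL driver jar in " + os.path.join(home, "lib"))
```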

     

    6. With the installation done, configure the environment variables

    [hadoop@hadoop3 lib]$ vi ~/.bashrc 
    #Hive
    export HIVE_HOME=/home/hadoop/apps/apache-hive-2.3.3-bin
    export PATH=$PATH:$HIVE_HOME/bin

    Apply the changes to the current shell immediately:

    [hadoop@hadoop3 lib]$ source ~/.bashrc 
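    After sourcing ~/.bashrc, the variables can be sanity-checked from any program started in that shell. This hypothetical Python helper reports whether HIVE_HOME is set and whether $HIVE_HOME/bin is on PATH, and shutil.which shows which hive executable would actually run. It only sees the environment of the shell that launched it, so run it after source ~/.bashrc (or from a new login shell).

```python
import os
import shutil


def check_hive_env(env=None):
    """Return a list of problems with the Hive environment variables (empty if OK)."""
    env = os.environ if env is None else env
    problems = []
    hive_home = env.get("HIVE_HOME")
    if not hive_home:
        problems.append("HIVE_HOME is not set")
        return problems
    hive_bin = os.path.join(hive_home, "bin")
    # Compare against each PATH entry exactly, as export PATH=$PATH:$HIVE_HOME/bin adds it.
    if hive_bin not in env.get("PATH", "").split(os.pathsep):
        problems.append(hive_bin + " is not on PATH")
    return problems


if __name__ == "__main__":
    print(check_hive_env() or "HIVE_HOME and PATH look fine")
    print("hive resolves to:", shutil.which("hive"))
```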

    7. Verify the Hive installation

    [hadoop@hadoop3 ~]$ hive --help
    Usage ./hive <parameters> --service serviceName <service parameters>
    Service List: beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version 
    Parameters parsed:
      --auxpath : Auxiliary jars 
      --config : Hive configuration directory
      --service : Starts specific service/component. cli is default
    Parameters used:
      HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
      HIVE_OPT : Hive options
    For help on a particular service:
      ./hive --service serviceName --help
    Debug help:  ./hive --debug --help
    [hadoop@hadoop3 ~]$ 

    8. Initialize the metastore database

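      Note: with Hive versions earlier than 2.x you can skip this step, because Hive initializes the metastore automatically the first time it starts. At that point, however, it does not create all of the metastore tables; the rest are created gradually as Hive is used, so it is still best to initialize explicitly. With Hive 2.x you must initialize the metastore database manually, using: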

    [hadoop@hadoop3 ~]$ schematool -dbType mysql -initSchema
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:     jdbc:mysql://hadoop1:3306/hivedb?createDatabaseIfNotExist=true
    Metastore Connection Driver :     com.mysql.jdbc.Driver
    Metastore connection User:     root
    Starting metastore schema initialization to 2.3.0
    Initialization script hive-schema-2.3.0.mysql.sql
    Initialization script completed
    schemaTool completed
    [hadoop@hadoop3 ~]$ 
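    When scripting the installation, success can be checked programmatically. The sketch below (hypothetical helper names) runs schematool -dbType mysql -initSchema through subprocess and looks for two lines from the output above: the "Starting metastore schema initialization to ..." line, which carries the schema version, and the final "schemaTool completed". Matching on these strings is an assumption based on this Hive version's output; the process exit code is the more robust signal, so both are checked.

```python
import re
import subprocess


def parse_schematool_output(output):
    """Return (completed, schema_version) from `schematool -initSchema` console output."""
    match = re.search(r"Starting metastore schema initialization to (\S+)", output)
    version = match.group(1) if match else None
    return "schemaTool completed" in output, version


def init_metastore(db_type="mysql"):
    """Run schematool (must be on PATH) and return (ok, schema_version)."""
    result = subprocess.run(
        ["schematool", "-dbType", db_type, "-initSchema"],
        capture_output=True, text=True,
    )
    # SLF4J warnings go to stderr; search both streams so the check
    # does not depend on which stream schematool uses for its messages.
    completed, version = parse_schematool_output(result.stdout + result.stderr)
    return result.returncode == 0 and completed, version
```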

    9. Start the Hive client

    hive --service cli is equivalent to running plain hive:

    [hadoop@hadoop3 ~]$ hive --service cli
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> 


    Basic usage

    Suppose we have a file student.txt that we want to store in Hive. Its contents look like this (id, name, sex, age, department):

    95002,刘晨,女,19,IS
    95017,王风娟,女,18,IS
    95018,王一,女,19,IS
    95013,冯伟,男,21,CS
    95014,王小丽,女,19,CS
    95019,邢小丽,女,19,IS
    95020,赵钱,男,21,IS
    95003,王敏,女,22,MA
    95004,张立,男,19,IS
    95012,孙花,女,20,CS
    95010,孔小涛,男,19,CS
    95005,刘刚,男,18,MA
    95006,孙庆,男,23,CS
    95007,易思玲,女,19,MA
    95008,李娜,女,18,CS
    95021,周二,男,17,MA
    95022,郑明,男,20,MA
    95001,李勇,男,20,CS
    95011,包小柏,男,18,MA
    95009,梦圆圆,女,18,MA
    95015,王君,男,18,MA

    1. Create a database named myhive

    hive> create database myhive;
    OK
    Time taken: 7.847 seconds
    hive> 

    2. Switch to the new database myhive

    hive> use myhive;
    OK
    Time taken: 0.047 seconds
    hive> 

    3. Check which database is currently in use

    hive> select current_database();
    OK
    myhive
    Time taken: 0.728 seconds, Fetched: 1 row(s)
    hive> 

    4. Create a student table in the myhive database

    hive> create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
    OK
    Time taken: 0.718 seconds
    hive> 
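    ROW FORMAT DELIMITED FIELDS TERMINATED BY "," means each line of the underlying text file is split on commas and the pieces are matched to the columns in order; the data is only interpreted when it is read. The following Python sketch mimics that mapping for the student table. It approximates the behavior of Hive's default LazySimpleSerDe and is not Hive code: missing trailing fields and values that cannot be parsed as int become None (Hive displays them as NULL), and extra fields are ignored. The names STUDENT_SCHEMA and parse_delimited are hypothetical.

```python
# Column names and types from the CREATE TABLE statement above.
STUDENT_SCHEMA = [("id", int), ("name", str), ("sex", str), ("age", int), ("department", str)]


def parse_delimited(line, schema, delim=","):
    """Split one text line into a {column: value} dict, matching fields to columns in order."""
    fields = line.rstrip("\n").split(delim)
    row = {}
    for i, (col, cast) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing trailing field -> NULL
        try:
            row[col] = cast(raw) if raw is not None else None
        except ValueError:
            row[col] = None  # e.g. "abc" in an int column -> NULL
    return row
```

    For example, applying parse_delimited to each line of /home/hadoop/student.txt (opened with encoding="utf-8") yields one dict per student, in the same column order that select * prints.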

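    5. Load data into the table (LOCAL INPATH refers to a file on the local filesystem of the machine running the Hive client; Hive copies it into the table's directory)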

    hive> load data local inpath "/home/hadoop/student.txt" into table student;
    Loading data to table myhive.student
    OK
    Time taken: 1.854 seconds
    hive> 
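    Because student is a managed table, the copy of student.txt lands in the table's directory under the warehouse. A sketch of how that default directory is derived, assuming the default warehouse /user/hive/warehouse (the helper name is hypothetical):

```python
import posixpath

# Hive's default warehouse directory (hive.metastore.warehouse.dir).
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def managed_table_location(db, table, warehouse=DEFAULT_WAREHOUSE_DIR):
    """Default HDFS directory of a managed table: <warehouse>/<db>.db/<table>.

    Tables in the built-in 'default' database sit directly under the warehouse dir.
    Hive stores database and table names in lower case.
    """
    db, table = db.lower(), table.lower()
    if db == "default":
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, db + ".db", table)
```

    For myhive.student this gives /user/hive/warehouse/myhive.db/student, the same path that desc formatted reports later (prefixed with the HDFS nameservice hdfs://myha01); hdfs dfs -ls on that path lists the loaded file.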

    6. Query the data

    hive> select * from student;
    OK
    95002    刘晨    女    19    IS
    95017    王风娟    女    18    IS
    95018    王一    女    19    IS
    95013    冯伟    男    21    CS
    95014    王小丽    女    19    CS
    95019    邢小丽    女    19    IS
    95020    赵钱    男    21    IS
    95003    王敏    女    22    MA
    95004    张立    男    19    IS
    95012    孙花    女    20    CS
    95010    孔小涛    男    19    CS
    95005    刘刚    男    18    MA
    95006    孙庆    男    23    CS
    95007    易思玲    女    19    MA
    95008    李娜    女    18    CS
    95021    周二    男    17    MA
    95022    郑明    男    20    MA
    95001    李勇    男    20    CS
    95011    包小柏    男    18    MA
    95009    梦圆圆    女    18    MA
    95015    王君    男    18    MA
    Time taken: 2.455 seconds, Fetched: 21 row(s)
    hive> 
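    The statements in steps 1 to 6 can also be run non-interactively, because hive -f executes a file of HiveQL statements. Below is a minimal Python sketch that assembles them into a script and runs it. The helper names are hypothetical, and the DDL is changed to use IF NOT EXISTS so that re-running does not fail on objects that already exist. LOAD DATA ... INTO TABLE appends, so each run adds another copy of the file's rows.

```python
import os
import subprocess
import tempfile

# Steps 1-6 from this section; IF NOT EXISTS makes the DDL safe to re-run.
STATEMENTS = [
    "create database if not exists myhive",
    "use myhive",
    "create table if not exists student(id int, name string, sex string, age int, "
    'department string) row format delimited fields terminated by ","',
    'load data local inpath "/home/hadoop/student.txt" into table student',
    "select * from student",
]


def build_script(statements):
    """Join statements into HiveQL text, one per line, each ending with ';'."""
    return "".join(s.strip().rstrip(";") + ";\n" for s in statements)


def run_hive_script(script):
    """Write the script to a temporary .hql file and execute it with `hive -f`."""
    with tempfile.NamedTemporaryFile("w", suffix=".hql", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        return subprocess.run(["hive", "-f", path], check=True)
    finally:
        os.remove(path)  # clean up the temporary script
```

    On a node where hive is on PATH, run_hive_script(build_script(STATEMENTS)) performs the whole session in one go.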

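    7. View the table structure (desc lists the columns; desc extended and desc formatted add the detailed table information such as the HDFS location, SerDe and input/output formats, with formatted being the more readable of the two)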

    hive> desc student;
    OK
    id                      int                                         
    name                    string                                      
    sex                     string                                      
    age                     int                                         
    department              string                                      
    Time taken: 0.102 seconds, Fetched: 5 row(s)
    hive> 
    hive> desc extended student;
    OK
    id                      int                                         
    name                    string                                      
    sex                     string                                      
    age                     int                                         
    department              string                                      
              
    Detailed Table Information    Table(tableName:student, dbName:myhive, owner:hadoop, createTime:1522750487, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:sex, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:department, type:string, comment:null)], location:hdfs://myha01/user/hive/warehouse/myhive.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1522750695, totalSize=523, numRows=0, rawDataSize=0, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)    
    Time taken: 0.127 seconds, Fetched: 7 row(s)
    hive> 
    hive> desc formatted student;
    OK
    # col_name                data_type               comment             
              
    id                      int                                         
    name                    string                                      
    sex                     string                                      
    age                     int                                         
    department              string                                      
              
    # Detailed Table Information          
    Database:               myhive                   
    Owner:                  hadoop                   
    CreateTime:             Tue Apr 03 18:14:47 CST 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://myha01/user/hive/warehouse/myhive.db/student     
    Table Type:             MANAGED_TABLE            
    Table Parameters:          
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               523                 
        transient_lastDdlTime    1522750695          
              
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:          
        field.delim             ,                   
        serialization.format    ,                   
    Time taken: 0.13 seconds, Fetched: 34 row(s)
    hive> 
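    If you need the column list programmatically, for example to compare it with the CREATE TABLE statement, the plain desc output is easy to parse: each column line is the name followed by the type. A hypothetical Python helper, intended for plain desc output only (the extended and formatted variants add lines this sketch would misread):

```python
def parse_desc(output):
    """Return [(column, type), ...] from the console output of `desc <table>`."""
    columns = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the prompt, the OK marker and the timing line.
        if not stripped or stripped.startswith(("hive>", "OK", "Time taken")):
            continue
        parts = stripped.split()
        if len(parts) >= 2:
            columns.append((parts[0], parts[1]))  # name, type; a third field would be the comment
    return columns
```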
  Original article: https://www.cnblogs.com/makeryan/p/11358953.html