zoukankan      html  css  js  c++  java
  • Hive的安装及配置


    title: Hive的安装及配置
    summary: 关键词:Hive ubuntu 安装和配置 Derby MySQL PostgreSQL 数据库连接
    date: 2019-5-19 13:25
    urlname: 2019051903
    author: foochane
    img: /medias/featureimages/19.jpg
    categories: 大数据
    tags:

    • hive
    • 大数据

    本文作者:foochane 
    本文链接:https://foochane.cn/article/2019051903.html

    1 安装说明

    在安装hive之前,需要安装hadoop集群环境,如果没有可以查看:Hadoop分布式集群的搭建

    1.1 用到的软件

    软件 版本 下载地址
    linux Ubuntu Server 18.04.2 LTS https://www.ubuntu.com/download/server
    hadoop hadoop-2.7.1 http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
    java jdk-8u211-linux-x64 https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
    hive hive-2.3.5 http://mirror.bit.edu.cn/apache/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
    mysql-connector-java mysql-connector-java-5.1.45.jar 命令行安装
    postgresql-jdbc4 postgresql-jdbc4.jar 命令行安装

    1.2 节点安排

    名称 ip hostname
    主节点 192.168.233.200 Master
    子节点1 192.168.233.201 Slave01
    子节点2 192.168.233.202 Slave02

    1.3 说明

    注意:本文的hiveMySQLPostgreSQL均只安装在Master节点上,实际生产环境中,需根据实际情况调整

    Hive默认元数据保存在内嵌的 Derby 数据库中,这是最简单的一种存储方式,使用derby存储方式时,运行hive会在当前目录生成一个derby文件和一个metastore_db目录。Derby 数据库中,只能允许一个会话连接,只适合简单的测试,实际生产环境中不适用。 为了支持多用户会话,则需要一个独立的元数据库,使用 MySQL 或者PostgreSQL作为元数据库,Hive 内部对 MySQLPostgreSQL提供了很好的支持。

    本文将逐一介绍hive连接DerbyPostgreSQLMySQL这三种数据库数据库的安装和配置。

    2 hive连接Derby

    2.1 解压

    $ tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /usr/local/bigdata & cd /usr/local/bigdata
    $ mv apache-hive-2.3.5-bin hive-2.3.5
    $ sudo chown -R hadoop:hadoop hive #之前bigdata目录已经修改过权限了
    

    2.2 修改配置文件

    要修改的文件在/usr/local/hive-2.3.5/conf目录下,需要修改hive-site.xmlhive-env.shhive-log4j2.properties这3个文件。

    先把.template文件复制一份出来,然后进行修改。

    $ cd /usr/local/hive-2.3.5/conf
    $ cp hive-default.xml.template hive-site.xml
    $ cp hive-env.sh.template hive-env.sh
    $ cp hive-log4j.properties.template hive-log4j.properties
    

    2.2.1 hive-site.xml(Derby)

    配置Derby只需要修改javax.jdo.option.ConnectionURL指定metastore_db的存储位置即可
    具体修改如下:

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    

    2.2.2 hive-env .sh

    添加:

    export HADOOP_HOME=/usr/local/bigdata/hadoop-2.7.1
    export HIVE_CONF_DIR=/usr/local/bigdata/hive-2.3.5/conf
    

    2.2.3 hive-log4j2.properties

    日志配置可以先默认,暂时不修改什么。

    2.3 配置环境变量

    ~/.bashrc文件中添加如下内容,执行source ~/.bashrc使其生效。

    export HIVE_HOME=/usr/local/bigdata/hive-2.3.5
    export PATH=$PATH:/usr/local/bigdata/hive-2.3.5/bin
    

    2.4 为hive创建数据仓库存储目录

    注意先启动hadoop集群

    $ hadoop fs -mkdir -p /user/hive/warehouse
    $ hadoop fs -mkdir -p /tmp
    $ hadoop fs -chmod g+w /user/hive/warehouse
    $ hadoop fs -chmod g+w /tmp
    

    2.4 启动hive

    初始化元数据数据库

    $ schematool -initSchema -dbType derby
    

    成功初始化应该出现如下内容:

    $ schematool -initSchema -dbType derby
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:        jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Starting metastore schema initialization to 2.3.0
    Initialization script hive-schema-2.3.0.derby.sql
    Initialization script completed
    schemaTool completed
    

    启动hive

    $ hive
    
    

    如果成功运行将出现如下内容:

    $ hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive>
        >
    

    创建表

    create table t1(
         id      int
        ,name    string
        ,hobby   array<string>
        ,add     map<String,string>
        )
        row format delimited
        fields terminated by ','
        collection items terminated by '-'
        map keys terminated by ':'
        ;
    
    hive>
        >
        >
        > show databases;
    OK
    default
    Time taken: 22.279 seconds, Fetched: 1 row(s)
    hive> create table t1(
        >     id      int
        >    ,name    string
        >    ,hobby   array<string>
        >    ,add     map<String,string>
        > )
        > row format delimited
        > fields terminated by ','
        > collection items terminated by '-'
        > map keys terminated by ':'
        > ;
    OK
    Time taken: 1.791 seconds
    hive>
    

    至此,以Derby做元数据库的hive连接方式就配置完成了。

    下面介绍如何将hive连接到PostgreSQLMySQL

    3 PostgreSQL的安装

    3.1 安装

    执行如下命令:

    $ sudo apt install postgresql postgresql-contrib
    

    安装完成后默认会有一个postgres的用户,且没有密码,作为管理员

    3.2 启动PostgreSQL

    $ sudo systemctl enable postgresql
    $ sudo systemctl start postgresql
    

    3.3 登录

    hadoop@Master:~$ sudo -i -u postgres
    postgres@Master:~$ psql
    psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
    Type "help" for help.
    
    postgres=# help
    You are using psql, the command-line interface to PostgreSQL.
    Type:  copyright for distribution terms
           h for help with SQL commands
           ? for help with psql commands
           g or terminate with semicolon to execute query
           q to quit
    postgres=#
    

    4 hive连接PostgreSQL

    4.1 安装PostgreSQL-JDBC驱动

    $ sudo apt-get install libpostgresql-jdbc-java
    $ ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/bigdata/hive-2.3.5/lib
    

    4.2 修改pg_hba.conf文件

    修改 /etc/postgresql/10/main/pg_hba.conf文件

    # Database administrative login by Unix domain socket
    #local   all             postgres                                peer
    local   all             postgres                                trust
    
    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    
    # "local" is for Unix domain socket connections only
    #local   all             all                                     peer
    local   all             all                                     trust
    # IPv4 local connections:
    #host    all             all             127.0.0.1/32            md5
    host    all             all             127.0.0.1/32            trust
    # IPv6 local connections:
    #host    all             all             ::1/128                 md5
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    #local   replication     all                                     peer
    #local   replication     all                                     peer
    #local   replication     all                                     peer
    local   replication     all                                     trust
    host    replication     all             127.0.0.1/32            trust
    host    replication     all             ::1/128                 trust
    

    4.3 在PostpreSQL中创建数据库和用户

    先创建一个名为hiveuser的用户,密码:123456

    然后创建一个名为metastore的数据库:

    $ sudo -u postgres psql 
    
    postgres=# CREATE USER hiveuser WITH PASSWORD '123456';
    postgres=# CREATE DATABASE metastore;
    

    测试用户和数据库是否能登录

    $ psql -h localhost -U hiveuser -d pymetastore
    

    登录成功说明配置完成

    hadoop@Master:~$  psql -h localhost -U hiveuser -d metastore
    Password for user hive:
    psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
    SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
    Type "help" for help.
    
    pymetastore=>
    

    4.5 修改hive-site.xml(PostgreSQL)

    之前配置的是以Derby做元数据库,现在同样也是修改hive-site.xml文件。
    首先在开头添加如下内容:

      <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp/hive/java</value>
      </property>
     <property>
        <name>system:user.name</name>
        <value>${user.name}</value>
     </property>
    

    然后修改如下属性:

    name value description
    javax.jdo.option.ConnectionURL jdbc:postgresql://localhost/metastore 指定连接的数据库(之前创建的)
    javax.jdo.option.ConnectionDriverName org.postgresql.Driver 数据库驱动
    javax.jdo.option.ConnectionUserName hiveuser 用户名(之前创建的)
    javax.jdo.option.ConnectionPassword 123456 用户名密码

    具体如下:

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://localhost/metastore</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
        <description>Username to use against metastore database</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
      </property>
    
    

    4.6 启动Hive

    先运行schematool进行初始化:

    schematool -dbType postgres -initSchema
    

    然后执行$ hive 启动hive。

    创建表格进行测试

    hadoop@Master:~$ hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
    Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/bigdata/hadoop-2.7.7/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive>
        > show databases;
    OK
    default
    Time taken: 12.294 seconds, Fetched: 1 row(s)
    hive> create table t1(
        >     id      int
        >    ,name    string
        >    ,hobby   array<string>
        >    ,add     map<String,string>
        > )
        > row format delimited
        > fields terminated by ','
        > collection items terminated by '-'
        > map keys terminated by ':'
        > ;
    OK
    Time taken: 1.239 seconds
    hive> Connection reset by 192.168.233.200 port 22
    
    

    查看是否创建成功:

    $ psql -h localhost -U hiveuser -d metastore
    psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
    SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
    Type "help" for help.
    
    metastore=> SELECT * from "TBLS";
     TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME |   TBL_TYPE    | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED
    --------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------
          1 |  1560074934 |     1 |                0 | hadoop |         0 |     1 | t1       | MANAGED_TABLE |                    |                    | f
    (1 row)
    
    

    5 MySQL安装

    5.1 安装

    $ sudo apt install mysql-server
    

    5.2 设置MySQL的root用户密码

    如果没有设置密码的话,设置密码。

    这里密码设置为hadoop

    $ mysql -u root -p
    

    6 Hive连接MySQL

    6.1 在MySQL中为Hive新建数据库

    用来存放Hive的元数据。

    与Hive配置文件hive-site.xml中的 mysql://localhost:3306/metastore 对应

    #建立数据库和用户
    mysql> create database if not exists metastore;
    mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY '123456';
    
    #设置远程登录的权限
    mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hiveuser'@'localhost';
    mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';
    
    #刷新配置
    mysql> FLUSH PRIVILEGES;
    mysql> quit;
    

    6.2 安装MySQL-JDBC驱动

    $ sudo apt-get install libmysql-java
    $ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/bigdata/hive-2.3.5/lib
    

    6.3 修改修改hive-site.xml(MySQL)

    首先在开头添加如下内容:

      <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp/hive/java</value>
      </property>
     <property>
        <name>system:user.name</name>
        <value>${user.name}</value>
     </property>
    

    然后修改如下属性:

    name value description
    javax.jdo.option.ConnectionURL jdbc:mysql://localhost:3306/metastore?useSSL=true 指定连接的数据库(之前创建的)
    javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver 数据库驱动
    javax.jdo.option.ConnectionUserName hiveuser 用户名(之前创建的)
    javax.jdo.option.ConnectionPassword 123456 用户名密码

    具体如下:

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/metastore?useSSL=true</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
        <description>Username to use against metastore database</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
      </property>
    
    

    6.4 启动hive

    先初始化

    schematool -dbType mysql -initSchema
    

    和前面一样,执行

    $ hive
    

    7 问题总结

    问题1

    初始化derby时报如下错误,提示没有hive-exec-*.jar

    hadoop@Master:~$ schematool -initSchema -dbType derby
    Missing Hive Execution Jar: /usr/local/biddata/hive-2.3.5/lib/hive-exec-*.jar
    
    解决:

    检查该目录下是否确实不存在hive-exec-2.35.jar,如果不存在,下载一个放到该目录下。
    下载地址:https://mvnrepository.com/artifact/org.apache.hive/hive-exec/2.3.5
    如果存在,那一定是环境变量配置有问题,查看HIVE_HOME$HIVE_HOME/bin是否配置正确。

    问题2

    报错:

    Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.na
            at org.apache.hadoop.fs.Path.initialize(Path.java:205)
            at org.apache.hadoop.fs.Path.<init>(Path.java:171)
            at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
            at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
            at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
            at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
            at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
            at java.net.URI.checkPath(URI.java:1823)
            at java.net.URI.<init>(URI.java:745)
            at org.apache.hadoop.fs.Path.initialize(Path.java:202)
            ... 12 more
    
    解决

    hive-site.xml文件开头加入如下配置:

    <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp/hive/java</value>
      </property>
    <property>
        <name>system:user.name</name>
        <value>${user.name}</value>
    </property>
    

    问题3

    执行$ schematool -dbType postgres -initSchema时报错

    hadoop@Master:~$ schematool -dbType postgres -initSchema
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:        jdbc:postgresql://localhost/pymetastore
    Metastore Connection Driver :    org.postgresql.Driver
    Metastore connection User:       hive
    Starting metastore schema initialization to 2.3.0
    Initialization script hive-schema-2.3.0.postgres.sql
    Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    

    另外也会有这个错:

    Error: ERROR: relation "txns" already exists (state=42P07,code=0)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    

    这个问题,我尝试了很久也没有找到原因,网上有说是hive版本的原因,我换了hive-1.2.1hive-1.2.2 等低版本的hive,依然时候有这个问题。
    最后是重新创建用户和数据库就没有这个问题了,感觉是数据库有冲突。

    问题4

    Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    
    解决:

    注意使用MySQL存储元数据的时候,使用root用户有可能权限不够,会报错。另外,$ schematool -dbType postgres -initSchema执行一次就好了。

  • 相关阅读:
    java环境变量配置
    线性表基本操作(没实现)
    请求路径问题(视频学习)
    SpringMVC_关于<url-pattern>
    SpringMVC_第一个程序
    Spring与Web
    Spring与MyBatis整合上_Mapper动态代理方式
    Spring_Spring与DAO_Spring的事务管理
    Spring_Spring与DAO_Spring的Jdbc模板
    Spring_Spring与AOP_AspectJ基于XML的实现
  • 原文地址:https://www.cnblogs.com/foochane/p/11016003.html
Copyright © 2011-2022 走看看