zoukankan      html  css  js  c++  java
  • 大数据学习(8)Hive基础

    什么是Hive

    Hive是一个基于HDFS的查询引擎。我们日常中的需求如果都自己去写MapReduce来实现的话会很费劲的,Hive把日常用到的MapReduce功能,比如排序、分组等功能进行了抽象,对外提供类似于普通数据库的查询服务。

    它只是封装MapReduce计算,但它本质并不是数据库服务,不适合作为联机服务。通常用于数据仓库的离线计算中。
    在Hive中已经明确说明,不建议使用MapReduce了,而推荐使用Spark。

    安装

    tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /usr/local/

    hive-site.xml:

    <configuration>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>user name for connecting to mysql server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password for connecting to mysql server</description>
    </property>
    </configuration>

    拷贝MySQL驱动jar到hive的lib目录中。

    先启动HDFS:

    start-dfs.sh

    再启动hive

    如果报错:

    Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
    [ERROR] Terminal initialization failed; falling back to unsupported
    java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.TerminalFactory.create(TerminalFactory.java:101)
    at jline.TerminalFactory.get(TerminalFactory.java:158)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    
    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:230)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

    需要把hive下lib目录中的jline jar文件替换到hadoop中yarn目录中:

    rm /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar 
    cp /usr/local/apache-hive-1.2.2-bin/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/

    启动之后会自动创建数据库,登录数据库中可以查看到一些元信息:

    mysql> use hive_metastore;
    mysql> use hive_metastore ;
    mysql> select * from dbs;
    +-------+-----------------------+------------------------------------------+---------+------------+------------+
    | DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
    +-------+-----------------------+------------------------------------------+---------+------------+------------+
    | 1 | Default Hive database | hdfs://centos01:9000/user/hive/warehouse | default | public | ROLE |
    +-------+-----------------------+------------------------------------------+---------+------------+------------+
    1 row in set (0.00 sec)

    Hive基本操作

    DDL
    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    [(col_name data_type [COMMENT col_comment], ...)]
    [COMMENT table_comment]
    [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
    [CLUSTERED BY (col_name, col_name, ...)
    [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION hdfs_path]
    参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

    Thrift

    启动hiveserver2服务

    ./hiveserver2

    使用beeline连接到这个服务上:

    [root@centos01 bin]# ./beeline 
    Beeline version 1.2.2 by Apache Hive
    beeline> !connect jdbc:hive2://localhost:10000
    Connecting to jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: root
    Enter password for jdbc:hive2://localhost:10000: ****
    Connected to: Apache Hive (version 1.2.2)
    Driver: Hive JDBC (version 1.2.2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://localhost:10000> show databases;
    +----------------+--+
    | database_name |
    +----------------+--+
    | default |
    +----------------+--+
    1 row selected (3.666 seconds)
    0: jdbc:hive2://localhost:10000>
  • 相关阅读:
    问题——虚拟机连接,查本地DNS,查软件位置,payload生成,检测注册表变化
    nmap命令解释
    SMB扫描,SMTP扫描
    操作系统识别,SNMP扫描
    服务扫描——查询banner信息,服务识别
    nmap之扫描端口(附加hping3隐藏扫描)
    scapy简单用法——四层发现
    转载 界面组装器模式
    设计模式=外观模式
    如何进行自动化测试和手工测试
  • 原文地址:https://www.cnblogs.com/at0x7c00/p/8098787.html
Copyright © 2011-2022 走看看