  • [Gandalf] Deploying Hive 0.13.1 on Hadoop 2.2.0 + Oracle 10g: A Detailed Guide

    Environment:
    hadoop2.2.0
    hive0.13.1
    Ubuntu 14.04 LTS
    java version "1.7.0_60"
    Oracle10g

    ***Reposting is welcome. Please credit the source.***
    http://blog.csdn.net/u010967382/article/details/38709751

    Download the installation package from the following address:
    http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz

    Extract the package on the server to:
    /home/fulong/Hive/apache-hive-0.13.1-bin

    Edit the environment variables and add the following:
    export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin
    export PATH=$HIVE_HOME/bin:$PATH
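
For the two export lines to persist across sessions they need to live in a shell startup file such as ~/.bashrc. The post does not say which file was edited, so that choice is an assumption. A minimal sketch that appends each line only when it is not already present, so re-running it does not duplicate entries; `append_once` is a hypothetical helper name, and the demo writes to a temporary file rather than a real profile:

```shell
#!/usr/bin/env bash
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo target (assumption); point this at ~/.bashrc or similar on a real machine.
profile="$(mktemp)"

append_once "$profile" 'export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin'
append_once "$profile" 'export PATH=$HIVE_HOME/bin:$PATH'
# Running the same call again changes nothing.
append_once "$profile" 'export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin'

cat "$profile"
```

After editing a real startup file, open a new shell or `source` the file so the variables take effect.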

    In the conf directory, copy the template configuration files and rename the copies:
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
    hive-default.xml.template  hive-exec-log4j.properties.template
    hive-env.sh.template       hive-log4j.properties.template
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-env.sh.template hive-env.sh
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-default.xml.template hive-site.xml
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
    hive-default.xml.template  hive-env.sh.template                 hive-log4j.properties.template
    hive-env.sh                hive-exec-log4j.properties.template  hive-site.xml
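
A plain `cp` overwrites the target, which would silently discard an already edited hive-site.xml if the step is repeated. The following is a small sketch of a copy that skips existing files; `copy_if_missing` is a hypothetical helper, and a temporary directory stands in for $HIVE_HOME/conf:

```shell
#!/usr/bin/env bash
# copy_if_missing SRC DST: copy SRC to DST only when DST does not exist yet.
copy_if_missing() {
  if [ -e "$2" ]; then
    echo "keeping existing $2"
  else
    cp "$1" "$2" && echo "created $2"
  fi
}

conf_dir="$(mktemp -d)"   # stand-in for $HIVE_HOME/conf (assumption)
echo "template" > "$conf_dir/hive-env.sh.template"
echo "template" > "$conf_dir/hive-default.xml.template"

copy_if_missing "$conf_dir/hive-env.sh.template"      "$conf_dir/hive-env.sh"
copy_if_missing "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml"

# Simulate a later edit, then repeat the copy: the edited file is left alone.
echo "edited" > "$conf_dir/hive-site.xml"
copy_if_missing "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml"
cat "$conf_dir/hive-site.xml"
```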

    Edit the following entries in hive-env.sh to specify the Hadoop root directory and Hive's conf and lib directories:
    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/home/fulong/Hadoop/hadoop-2.2.0

    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/home/fulong/Hive/apache-hive-0.13.1-bin/conf

    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    export HIVE_AUX_JARS_PATH=/home/fulong/Hive/apache-hive-0.13.1-bin/lib

    Edit the following Oracle connection parameters in hive-site.xml:
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:oracle:thin:@192.168.0.138:1521:orcl</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>oracle.jdbc.driver.OracleDriver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hivefbi</value>
      <description>password to use against metastore database</description>
    </property>
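
Hive falls back to its built-in defaults for any property that is missing from hive-site.xml, so the file only has to contain the values that differ. As a sketch under that assumption, the script below writes a hive-site.xml holding just the four metastore properties shown above. The output path is a temporary file for illustration, the variable names are made up, and the host, port, SID, and credentials are the ones used in this post:

```shell
#!/usr/bin/env bash
out="$(mktemp)"          # on a real install: $HIVE_HOME/conf/hive-site.xml
db_host=192.168.0.138    # Oracle host from this post
db_port=1521
db_sid=orcl

cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- thin-driver URL form: jdbc:oracle:thin:@host:port:SID -->
    <value>jdbc:oracle:thin:@${db_host}:${db_port}:${db_sid}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.driver.OracleDriver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivefbi</value>
  </property>
</configuration>
EOF

cat "$out"
```

Keeping the file this small makes it easier to see which settings were changed from the defaults.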


    Configure log4j
    Create a log4j directory under $HIVE_HOME to hold the log files.
    Copy the template and rename the copy:
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-log4j.properties.template hive-log4j.properties

    Change the directory where logs are stored:
    hive.log.dir=/home/fulong/Hive/apache-hive-0.13.1-bin/log4j
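
The directory creation and the property edit can be scripted together. The sketch below uses a temporary directory as a stand-in for $HIVE_HOME, and the three sample lines in the properties file are illustrative placeholders rather than the full template:

```shell
#!/usr/bin/env bash
hive_home="$(mktemp -d)"                  # stand-in for $HIVE_HOME (assumption)
mkdir -p "$hive_home/conf" "$hive_home/log4j"

props="$hive_home/conf/hive-log4j.properties"
# Illustrative sample contents; the real file comes from the template.
printf '%s\n' \
  'hive.log.threshold=ALL' \
  'hive.log.dir=${java.io.tmpdir}/${user.name}' \
  'hive.log.file=hive.log' > "$props"

# Rewrite only the hive.log.dir line; "|" is the sed delimiter because the path contains "/".
sed "s|^hive\.log\.dir=.*|hive.log.dir=$hive_home/log4j|" "$props" > "$props.tmp" \
  && mv "$props.tmp" "$props"

grep '^hive.log.dir=' "$props"
```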

    Copy the Oracle JDBC jar
    Copy the Oracle JDBC jar that matches your database into $HIVE_HOME/lib.
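
The post does not name the jar. Oracle's driver jars are commonly called ojdbc14.jar, ojdbc5.jar, or ojdbc6.jar depending on the release, so the pattern `ojdbc*.jar` below is an assumption. This is a minimal sketch of a check that the driver is in place before starting Hive; `has_oracle_jdbc` is a hypothetical helper and a temporary directory stands in for $HIVE_HOME/lib:

```shell
#!/usr/bin/env bash
# has_oracle_jdbc DIR: succeed if DIR contains at least one ojdbc*.jar.
has_oracle_jdbc() {
  for jar in "$1"/ojdbc*.jar; do
    # If the glob matches nothing, the literal pattern fails the -e test.
    [ -e "$jar" ] && return 0
  done
  return 1
}

lib_dir="$(mktemp -d)"            # stand-in for $HIVE_HOME/lib (assumption)
has_oracle_jdbc "$lib_dir" || echo "no Oracle JDBC jar in $lib_dir yet"

touch "$lib_dir/ojdbc6.jar"       # empty placeholder file, not a real driver
has_oracle_jdbc "$lib_dir" && echo "Oracle JDBC jar found"
```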

    Start Hive
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin$ hive
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
    14/08/20 17:14:05 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

    Logging initialized using configuration in file:/home/fulong/Hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
    Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fulong/Hadoop/hadoop-2.2.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
    hive>

    Verification
    The plan is to create a table that stores the user search logs downloaded from Sogou Labs.

    Data download address:
    http://www.sogou.com/labs/dl/q.html

    First, create the table:
    hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

    At this point an error is reported:
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : ORA-01754: a table may contain only one column of type LONG

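    Solution:
    Open the hive-metastore jar in ${HIVE_HOME}/lib (hive-metastore-0.13.1.jar for this release) with an archive tool and locate the file named package.jdo. Open that file and find the following content.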
    <field name="viewOriginalText" default-fetch-group="false">
            <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
    </field>
    <field name="viewExpandedText" default-fetch-group="false">
            <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
    </field>
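    Both columns, VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT, have the type LONGVARCHAR, which maps to LONG in Oracle. Oracle allows at most one LONG column per table, so creating a table with two of them fails with ORA-01754.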


    Following the recommendation on the official Hive site, change the jdbc-type of both columns to CLOB. The modified content looks like this:
    <field name="viewOriginalText" default-fetch-group="false">
            <column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>
    </field>
    <field name="viewExpandedText" default-fetch-group="false">
            <column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>
    </field>
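
The same change can be made without an interactive archive tool. The sketch below shows the text substitution on a sample fragment, restricted to the two view-text columns so that other LONGVARCHAR columns are left alone. `patch_jdo` is a hypothetical helper, the OTHER_TEXT field is invented for the demo, and the commented `jar` commands outline how the surrounding extract and update steps might look on a real install:

```shell
#!/usr/bin/env bash
# patch_jdo FILE: switch only the two view-text columns from LONGVARCHAR to CLOB.
patch_jdo() {
  sed -E '/name="VIEW_(ORIGINAL|EXPANDED)_TEXT"/ s/jdbc-type="LONGVARCHAR"/jdbc-type="CLOB"/' \
    "$1" > "$1.new" && mv "$1.new" "$1"
}

work="$(mktemp -d)"
# Sample fragment standing in for the real package.jdo; OTHER_TEXT is made up.
cat > "$work/package.jdo" <<'EOF'
<field name="viewOriginalText" default-fetch-group="false">
  <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
  <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
<field name="otherText">
  <column name="OTHER_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
EOF

patch_jdo "$work/package.jdo"
cat "$work/package.jdo"

# On a real install the surrounding steps would be roughly (back up the jar first):
#   cd "$work" && jar xf "$HIVE_HOME"/lib/hive-metastore-*.jar package.jdo
#   patch_jdo package.jdo
#   jar uf "$HIVE_HOME"/lib/hive-metastore-*.jar package.jdo
```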

    After making the change, restart Hive.


    Run the create table command again. This time the table is created successfully:
    hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
    OK
    Time taken: 0.986 seconds

    Load the local data into the table:
    hive> load data local inpath '/home/fulong/Downloads/SogouQ.reduced' overwrite into table searchlog;
    Copying data from file:/home/fulong/Downloads/SogouQ.reduced
    Copying file: file:/home/fulong/Downloads/SogouQ.reduced
    Loading data to table default.searchlog
    rmr: DEPRECATED: Please use 'rm -r' instead.
    Deleted hdfs://fulonghadoop/user/hive/warehouse/searchlog
    Table default.searchlog stats: [numFiles=1, numRows=0, totalSize=152006060, rawDataSize=0]
    OK
    Time taken: 25.705 seconds
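
LOAD DATA moves the file into the warehouse without parsing it, so a mismatch between the file and the table's delimiters only shows up later as NULL columns in queries. A quick sanity check is to count the fields per line in the local file. The sketch below uses awk and assumes tab is the field delimiter; `field_histogram` is a hypothetical helper and the sample rows are invented, not taken from the Sogou data set:

```shell
#!/usr/bin/env bash
# field_histogram FILE: print "<number of lines> <fields per line>" for a tab-delimited file.
field_histogram() {
  awk -F'\t' '{ n[NF]++ } END { for (k in n) print n[k], k }' "$1"
}

sample="$(mktemp)"
# Two well-formed six-field rows and one short row (made-up data).
printf '00:00:01\tu1\tquery one\t1\t2\thttp://example.com/a\n' >  "$sample"
printf '00:00:02\tu2\tquery two\t3\t1\thttp://example.com/b\n' >> "$sample"
printf '00:00:03\tu3\tbroken row\n'                            >> "$sample"

field_histogram "$sample"
```

If most lines of the real file do not report the number of columns declared in the table, the delimiter in the table definition is worth revisiting before running queries.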

    List all tables:
    hive> show tables;
    OK
    searchlog
    Time taken: 0.139 seconds, Fetched: 1 row(s)

    Count the rows:
    hive> select count(*) from searchlog;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1407233914535_0001, Tracking URL = http://FBI003:8088/proxy/application_1407233914535_0001/
    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1407233914535_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2014-08-20 18:03:17,667 Stage-1 map = 0%,  reduce = 0%
    2014-08-20 18:04:05,426 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.46 sec
    2014-08-20 18:04:27,317 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.74 sec
    MapReduce Total cumulative CPU time: 4 seconds 740 msec
    Ended Job = job_1407233914535_0001
    MapReduce Jobs Launched:
    Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.74 sec   HDFS Read: 152010455 HDFS Write: 8 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 740 msec
    OK
    1724264
    Time taken: 103.154 seconds, Fetched: 1 row(s)







  • Original address: https://www.cnblogs.com/lcchuguo/p/4679189.html