  • 011-HQL Intermediate 1 - Hive quick queries: three ways to use a Fetch task instead of a MapReduce job

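    If you select a few columns from a table, Hive (with the default settings of the version used here) launches a MapReduce job to do the work, as shown below: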

    hive> SELECT id, money FROM m limit 10;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
    Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
    Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job  -kill job_1384246387966_0229
    hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
    2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    MapReduce Total cumulative CPU time: 1 seconds 260 msec
    Ended Job = job_1384246387966_0229
    MapReduce Jobs Launched:
    Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 260 msec
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 16.802 seconds, Fetched: 10 row(s)

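    Launching a MapReduce job carries real overhead: the query above took 16.802 seconds to return 10 rows. Starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, do not have to start a MapReduce job; Hive can read the data directly with a Fetch task. (In Hive 0.14.0 and later the default value of hive.fetch.task.conversion is already more.) The feature can be enabled in any of the following ways: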

    Method 1:

    hive> set hive.fetch.task.conversion=more;
    hive> SELECT id, money FROM m limit 10;
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 0.138 seconds, Fetched: 10 row(s)

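    The statement set hive.fetch.task.conversion=more; above enables Fetch-task conversion, so the simple column query no longer launches a MapReduce job, and the same query returns in 0.138 seconds.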
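
    One way to confirm which path a query will take is to inspect its EXPLAIN plan. The sketch below is a hypothetical helper (the name plan_is_fetch_only is not part of Hive) that reads EXPLAIN output on standard input and reports whether it mentions a MapReduce stage. It assumes the MapReduce execution engine, where such stages are labelled "Map Reduce" in the plan text; other engines or Hive versions may word the plan differently.

```shell
# Hypothetical helper: succeeds (exit status 0) when the EXPLAIN text on
# stdin contains no "Map Reduce" stage, i.e. the query is expected to run
# as a plain Fetch task. Assumes the MapReduce engine's plan wording.
plan_is_fetch_only() {
  ! grep -q 'Map Reduce'
}

# Example usage (needs a working hive CLI and the table m from above):
#   hive -e 'set hive.fetch.task.conversion=more;
#            EXPLAIN SELECT id, money FROM m LIMIT 10;' \
#     | plan_is_fetch_only && echo "fetch only" || echo "needs MapReduce"
```

    Checking the plan first avoids waiting for a job to start just to find out that a query was not eligible for conversion.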

    Method 2:

    bin/hive --hiveconf hive.fetch.task.conversion=more
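
    The same flag combines with the CLI's -e option to run a single query non-interactively, which is convenient in scripts. A minimal sketch, assuming the hive launcher is on PATH; the wrapper name run_with_fetch and the HIVE_BIN variable are illustrative and not part of Hive:

```shell
# Path to the Hive CLI; set HIVE_BIN beforehand if hive is not on PATH
# (HIVE_BIN is a name chosen for this sketch, not a Hive setting).
HIVE_BIN="${HIVE_BIN:-hive}"

# Run one query with Fetch-task conversion enabled for this invocation only.
run_with_fetch() {
  "$HIVE_BIN" --hiveconf hive.fetch.task.conversion=more -e "$1"
}

# Example usage (table m is the one queried above):
#   run_with_fetch 'SELECT id, money FROM m LIMIT 10;'
```

    Because the setting is passed on the command line, it applies only to that hive process and leaves other sessions untouched.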

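    Method 3:
    Both methods above enable the Fetch task, but only temporarily: Method 1 lasts for the current session and Method 2 for that single CLI invocation. To keep the feature enabled permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml: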

    <property>
      <name>hive.fetch.task.conversion</name>
      <value>more</value>
      <description>
        Some select queries can be converted to a single FETCH task,
        minimizing latency. Currently the query should be single
        sourced, not having any subquery, and should not have
        any aggregations or distincts (which incur RS),
        lateral views and joins.
        1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
        2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
      </description>
    </property>

    With this in place, Fetch-task conversion stays enabled for every new Hive session.
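
    To check what a given hive-site.xml actually sets, a small shell function can pull the value out of the file. This is a rough sketch rather than a real XML parser: the function name hive_site_value is hypothetical, and it assumes each <name> and <value> element sits on its own line, as in the property block above.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML config file.
#   $1 = path to the config file, $2 = property name
# Assumes <name>...</name> and <value>...</value> are each on one line.
hive_site_value() {
  awk -v key="$2" '
    # Remember that the wanted property name has been seen.
    index($0, "<name>" key "</name>") { found = 1; next }
    # The first <value> line after it holds the setting.
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example usage:
#   hive_site_value "${HIVE_HOME}/conf/hive-site.xml" hive.fetch.task.conversion
```

    Inside a running session, typing set hive.fetch.task.conversion; at the hive> prompt prints the value currently in effect, which also reflects any override from Method 1 or Method 2.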

  • Original article: https://www.cnblogs.com/bjlhx/p/6946267.html