zoukankan      html  css  js  c++  java
  • Hive常见问题

    1.HQL是否区分大小写  不区分

    hive> select AGE from default.studeNT;  --不区分大小写,即使是表中字段

    2.查看创建表过程  show create table

    hive> show create table default.student; --查看创建表过程的命令
    OK
    createtab_stmt
    CREATE  TABLE `default.student`(
      `age` int, 
      `name` string)
    ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '	'   --行内分隔符
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://hadoop01:9000/user/hive/warehouse/student'
    TBLPROPERTIES (
      'numFiles'='1', 
      'last_modified_by'='root', 
      'last_modified_time'='1439795658', 
      'COLUMN_STATS_ACCURATE'='true', 
      'transient_lastDdlTime'='1439795895', 
      'numRows'='0', 
      'totalSize'='74', 
      'rawDataSize'='0')
    Time taken: 0.063 seconds, Fetched: 20 row(s)

    3.hive是否支持跨数据库查询?  支持

    hive> select * from default.logs; --和普通sql一样,数据库名.数据表名

    4.hive是否可以打印列名?  set hive.cli.print.header=true;

    hive> select * from default.student limit 5;
    OK
    1       zhangsan   --显示结果不包含列名
    2       lisi
    3       wangwu
    4       zhaoba
    5       liliuy
    Time taken: 0.047 seconds, Fetched: 5 row(s)
    hive> set hive.cli.print.header=true;       --设置true,即显示列名
    hive> select * from default.student limit 5;
    OK
    student.age     student.name  --显示结果包含列名
    1       zhangsan
    2       lisi
    3       wangwu
    4       zhaoba
    5       liliuy
    Time taken: 0.05 seconds, Fetched: 5 row(s)

    5.hive是否可以设置reduce的个数  set mapred.reduce.tasks=5;

    hive> set mapred.reduce.tasks;                       
    mapred.reduce.tasks=-1 --默认reduce个数为-1
    hive> insert overwrite local directory '/root/hive/a' --只能产生一个结果文件
        > row format delimited fields terminated by '	'
        > select * from test.t081901                     
        > distribute by length(country)                  
        > sort by te;  
    hive> set mapred.reduce.tasks=4;      --设置4个reduce               
    hive> insert overwrite local directory '/root/hive/a' --产生4个结果文件
        > row format delimited fields terminated by '	' 
        > select * from test.t081901                     
        > distribute by length(country)     --和普通的MR一样,当【reduce个数>partition个数】,只有两个(等于分区数)结果文件中有数据。             
        > sort by te; 

    6.hive是否支持 exist 或 in 查询吗?  支持

    hive> select * from default.student where student.age in(3,5,29);
    OK
    student.age     student.name
    3       wangwu
    5       liliuy
    Time taken: 10.217 seconds, Fetched: 2 row(s)

    7.hive是否可以查询hive表的集群文件位置?  desc formatted T_NAME

     INPUT__FILE__NAME字段可以查询出Table在Hdfs上的location

    hive> desc formatted default.student; --desc formatted <T_Name>
    OK
    col_name        data_type       comment
    # col_name              data_type               comment             
                     
    age                     int                                         
    name                    string                                      
                     
    # Detailed Table Information             
    Database:               default       --所属数据库           
    Owner:                  root             --Owner        
    CreateTime:             Mon Aug 17 14:24:29 SGT 2015     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://hadoop01:9000/user/hive/warehouse/student         --HDFS位置
    Table Type:             MANAGED_TABLE            --内部表
    Table Parameters:                
            COLUMN_STATS_ACCURATE   true                
            last_modified_by        root                
            last_modified_time      1439795658          
            numFiles                1                   
            numRows                 0                   
            rawDataSize             0                   
            totalSize               74                  
            transient_lastDdlTime   1439795895          
                     
    # Storage Information            
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:             
            field.delim             	                  
            serialization.format    	                  
    Time taken: 0.078 seconds, Fetched: 35 row(s)
    hive>  select name,age,INPUT__FILE__NAME --其中INPUT__FILE__NAME之间是两个下划线
        >  from default.student;
    OK
    name    age     input__file__name
    zhangsan        1       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    lisi    2       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    wangwu  3       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    zhaoba  4       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    liliuy  5       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    jiujiu  9       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    shiba   18      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    xiyita  28      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    NULL    NULL    hdfs://hadoop01:9000/user/hive/warehouse/student/student
    Time taken: 9.902 seconds, Fetched: 9 row(s)

    8.hive是否支持本地执行模式

    • 支持,当数据量小的时候,本地执行比提交到hadoop集群执行效率提升很大(大批量小作业,尽量采用本地执行模式)
    • set hive.exec.mode.local.auto=true;(默认false)
    • 满足如下条件才能真正使用本地模式:
    • 1、job的数据数据大小必须小于参数hive.exec.mode.local.auto.inputbytes.max(默认值128M)
    • 2、job的map处理的文件数小于参数hive.exec.mode.local.auto.input.files.max(默认值4)
    • 3、job的reduce数必须小于等于1
    • 用参数hive.mapred.local.mem来设置local mode下mapper和reducer task jvm heap size(在本地模式的最大内存量,以字节为单位,0为不限制)
    hive> set hive.exec.mode.local.auto=true; --启用本地模式
    hive>  select name,age,INPUT__FILE__NAME 
        >  from default.student;             
    Automatically selecting local only mode for query
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there is no reduce operator
    Execution log at: /tmp/root/root_20150819122121_4a47f38a-3214-4990-a164-ce5b8a89b88e.log
    Job running in-process (local Hadoop)  --本地模式的Hadoop
    Hadoop job information for null: number of mappers: 0; number of reducers: 0  --0个mapper和0个reducer
    2015-08-19 12:21:51,828 null map = 100%,  reduce = 0%
    Ended Job = job_local531808121_0001 --从jobId也可以看出是Local_Job
    Execution completed successfully
    MapredLocal task succeeded  --本地MR任务完成
    OK
    name    age     input__file__name
    zhangsan        1       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    lisi    2       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    wangwu  3       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    zhaoba  4       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    liliuy  5       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    jiujiu  9       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    shiba   18      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    xiyita  28      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    NULL    NULL    hdfs://hadoop01:9000/user/hive/warehouse/student/student
    Time taken: 5.216 seconds, Fetched: 9 row(s)
    hive>  select name,age,INPUT__FILE__NAME  --采用分布式执行MR任务
        >  from default.student;            
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there is no reduce operator
    Starting Job = job_1438754688191_0137,                                   --分布式MR任务的JobId
    Tracking URL = http://hadoop01:8088/proxy/application_1438754688191_0137/
    Kill Command = /hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1438754688191_0137 --kill hadoop job的命令
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 --map任务1个,reduce任务0个
    2015-08-19 12:17:18,571 Stage-1 map = 0%,  reduce = 0%
    2015-08-19 12:17:22,757 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.55 sec
    MapReduce Total cumulative CPU time: 550 msec
    Ended Job = job_1438754688191_0137
    MapReduce Jobs Launched: 
    Job 0: Map: 1   Cumulative CPU: 0.55 sec                --1个map任务
    HDFS Read: 284 HDFS Write: 592                            --HDFS读取284字节,HDFS写出592字节
    SUCCESS Total MapReduce CPU Time Spent: 550 msec
    OK
    name    age     input__file__name
    zhangsan        1       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    lisi    2       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    wangwu  3       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    zhaoba  4       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    liliuy  5       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    jiujiu  9       hdfs://hadoop01:9000/user/hive/warehouse/student/student
    shiba   18      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    xiyita  28      hdfs://hadoop01:9000/user/hive/warehouse/student/student
    NULL    NULL    hdfs://hadoop01:9000/user/hive/warehouse/student/student
    Time taken: 9.902 seconds, Fetched: 9 row(s)

    9.hive是否支持自动Map-Join?

    支持,设置hive.auto.convert.join=true,根据输入表的文件大小,自动将reduce端common join转为mapjoin,从而加快大表关联小表的速度。(建议开启

  • 相关阅读:
    Android:在eclipse中快速多行注释的方法
    DB2中若何除去SELECT输入的头信息
    哄骗DB2look重新设立建立优化器访谒操持(1)
    利用DB2look 从头创建优化器访问经营(2)
    运用DB2look重新建立优化器会晤企图(5)
    运用DB2look重新建立优化器访问摒挡(9)
    实例理睬IBM DB2的数据复制、迁移设置装备摆设
    DB2数据库优化的几条根底战略
    使用DB2look从头树立优化器拜候经营(6)
    哄骗DB2look重新创立优化器访谒企图(7)
  • 原文地址:https://www.cnblogs.com/skyl/p/4741913.html
Copyright © 2011-2022 走看看