zoukankan      html  css  js  c++  java
  • Hive重写表数据丢失风险记录

    若在Hive中执行INSERT OVERWRITE重写同一个表的数据时,有可能会造成数据丢失。

    如 INSERT OVERWRITE TABLE table_name SELECT * FROM table_name


    一、新建一张分区表

    create table test_chj_cols (id string, name string, age string) partitioned by (ds string) stored as textfile;
    

    二、插入一条记录

    insert into test_chj_cols partition (ds='20181224') values ('1','chj','18');
    

    三、确认表数据及结构

    > select * from test_chj_cols;
    OK
    test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.ds
    1       chj     18      20181224
     
    
    > desc formatted test_chj_cols partition (ds='20181224');
    OK
    col_name        data_type       comment
    # col_name              data_type               comment             
                     
    id                      string                                      
    name                    string                                      
    age                     string                                      
                     
    # Partition Information          
    # col_name              data_type               comment             
                     
    ds                      string                                      
                     
    # Detailed Partition Information                 
    Partition Value:        [20181224]               
    Database:               hduser05db               
    Table:                  test_chj_cols            
    CreateTime:             Mon Dec 24 19:35:28 CST 2018     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols/ds=20181224     
    Partition Parameters:            
            COLUMN_STATS_ACCURATE   true                
            numFiles                1                   
            numRows                 1                   
            rawDataSize             8                   
            totalSize               17                  
            transient_lastDdlTime   1545651329          
                     
    # Storage Information            
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:             
            serialization.format    1                   
    Time taken: 0.099 seconds, Fetched: 37 row(s)
    

    四、在表中间新增字段

    alter table test_chj_cols replace columns (id string, name string, money string, age string);
     
    
    > desc formatted test_chj_cols;
    OK
    col_name        data_type       comment
    # col_name              data_type               comment             
                     
    id                      string                                      
    name                    string                                      
    money                   string                                      
    age                     string                                      
                     
    # Partition Information          
    # col_name              data_type               comment             
                     
    ds                      string                                      
                     
    # Detailed Table Information             
    Database:               hduser05db               
    Owner:                  hadoop                   
    CreateTime:             Mon Dec 24 19:34:46 CST 2018     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols         
    Table Type:             MANAGED_TABLE            
    Table Parameters:                
            last_modified_by        hadoop              
            last_modified_time      1545651722          
            transient_lastDdlTime   1545651722          
                     
    # Storage Information            
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:             
            serialization.format    1                   
    Time taken: 0.051 seconds, Fetched: 36 row(s)
    

    五、重写数据

    insert overwrite table test_chj_cols partition (ds='20181224') select id,name,age,name from

    test_chj_cols;
    

    六、age字段数据丢失

    > select * from test_chj_cols;
    OK
    test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.money     test_chj_cols.ds
    1       chj     NULL    NULL    20181224
    
  • 相关阅读:
    HTML5画图板PhoneGap移植
    简易WCF负载均衡方案
    坑爹的“类型初始值设定项引发异常”
    关于for循环中变量定义的位置
    有道云笔记open api sdk for .net α版
    SL/WPF仿WIN8进度条
    HTML5做个画图板
    Google搜索命令“site:”运算符高级使用技巧:
    ubuntu系统托盘不显示ibus输入法图标(语言栏输入法)的解决方法:
    DOS中如何删除文件夹~
  • 原文地址:https://www.cnblogs.com/qiaoyihang/p/10229511.html
Copyright © 2011-2022 走看看