  • Hive multi-character delimiter support

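    1. Problem description

    How can a data file whose fields are separated by a multi-character delimiter be loaded into a Hive table? The sample data is as follows:

      The field delimiter is "@#$"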

    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value

    How can the sample data above be loaded into the Hive table multi_delimiter_test? The table structure is as follows:

    Field name    Field type
    s1            string
    s2            string
    s3            string

    2. Hive multi-delimiter support

    Hive 0.14 and later support multi-character field delimiters through MultiDelimitSerDe; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
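To illustrate why a dedicated SerDe is needed, the following shell sketch splits one of the sample records in two ways: on the single character "@", and on the full three-character sequence "@#$". The function names split_single and split_multi are hypothetical helpers invented for this illustration, and awk only stands in for the splitting logic; it is not how Hive parses rows internally. The assumption being illustrated is that a delimiter limited to one character leaves the remaining delimiter characters inside the field values.

```shell
#!/bin/sh
# One sample record from the article.
record='test1@#$test1name@#$test2value'

# Hypothetical helper: split on the single character "@" only,
# which is what a one-character delimiter would see.
split_single() {
    printf '%s\n' "$1" | awk -F'@' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Hypothetical helper: split on the literal three-character sequence "@#$".
# "[$]" keeps awk from treating "$" as a regex end-of-line anchor.
split_multi() {
    printf '%s\n' "$1" | awk -F'@#[$]' '{ for (i = 1; i <= NF; i++) print $i }'
}

echo "Single-character split:"
split_single "$record"

echo "Multi-character split:"
split_multi "$record"
```

With the single-character split, the second and third fields keep a leading "#$"; with the multi-character split, the three fields come out clean, which is the result the table definition in this article aims for.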

    3. Implementation steps

    1. Prepare the multi-delimiter file and upload it to the target HDFS directory

    [ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dat
    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value
    [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
    [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    Found 1 items
    -rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat
    [ec2-user@ip-172-31-8-141 ~]$
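The preparation step can be sketched as a small script. It recreates the three sample records in a scratch directory (the use of mktemp and the variable names are assumptions made for this sketch), checks that the file is 93 bytes as in the directory listing, and then uses the hdfs dfs form that the deprecation warning recommends. The upload runs only if an hdfs client is found on the PATH, and the -mkdir -p call assumes the target directory may not exist yet.

```shell
#!/bin/sh
# Recreate the sample file in a scratch directory (illustrative location).
workdir=$(mktemp -d)
datafile="$workdir/multi_delimiter_test.dat"

# Single quotes keep the shell from interpreting "$" inside the delimiter.
printf '%s\n' \
    'test1@#$test1name@#$test2value' \
    'test2@#$test2name@#$test2value' \
    'test3@#$test3name@#$test4value' > "$datafile"

# Three 30-character records plus one newline each should give 93 bytes.
bytes=$(wc -c < "$datafile" | tr -d ' ')
echo "$datafile: $bytes bytes"

# Upload only when an hdfs client is available; otherwise keep the local file.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /fayson/multi_delimiter_test
    hdfs dfs -put -f "$datafile" /fayson/multi_delimiter_test
    hdfs dfs -ls /fayson/multi_delimiter_test
fi
```

A size other than 93 bytes usually points to stray blank lines or trailing spaces in the file, which would show up as extra or malformed rows once the table is queried.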

    2. Create a table on top of the prepared multi-delimiter file

    CREATE EXTERNAL TABLE multi_delimiter_test(
      s1 string,
      s2 string,
      s3 string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    WITH SERDEPROPERTIES ("field.delim"="@#$")
    STORED AS TEXTFILE
    LOCATION '/fayson/multi_delimiter_test';

    3. Test

    > select * from multi_delimiter_test;
    +--------------------------+--------------------------+--------------------------+--+
    |  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
    +--------------------------+--------------------------+--------------------------+--+
    | test1                    | test1name                | test2value               |
    | test2                    | test2name                | test2value               |
    | test3                    | test3name                | test4value               |
    +--------------------------+--------------------------+--------------------------+--+


  • Original article: https://www.cnblogs.com/LIAOBO/p/13752039.html