zoukankan      html  css  js  c++  java
  • hive多分隔符支持

    1.问题描述

    如何将多个字符作为字段分割符的数据文件加载到Hive表中,事例数据如下:

      字段分隔符为“@#$”

    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value

    如何将上述事例数据加载到Hive表(multi_delimiter_test)中,表结构如下:

    字段名

    字段类型

    s1 string
    s2 string
    s3 string

    2.Hive多分隔符支持

    Hive在0.14及以后版本支持字段的多分隔符,参考https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe

    3.实现步骤

    1.准备多分隔符文件并装载到HDFS对应目录

    [ec2-user@ip-172-31-8-141  ~]$ cat multi_delimiter_test.dat
    
    test1@#$test1name@#$test2value
    
    test2@#$test2name@#$test2value
    
    test3@#$test3name@#$test4value  
    
    [ec2-user@ip-172-31-8-141  ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
    
    [ec2-user@ip-172-31-8-141  ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
    
    DEPRECATED: Use of this  script to execute hdfs command is deprecated.
    
    Instead use the hdfs  command for it.
    
    
    Found 1 items
    
    -rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24  /fayson/multi_delimiter_test/multi_delimiter_test.dat
    
    [ec2-user@ip-172-31-8-141  ~]$

    2.基于准备好的多分隔符文件建表

    create  external table multi_delimiter_test(
    
    s1 string,
    
    s2 string,
    
    s3 string)
    
    ROW FORMAT  SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH  SERDEPROPERTIES ("field.delim"="@#$")
    
    stored as  textfile location '/fayson/multi_delimiter_test';

    3.测试

    >  select * from multi_delimiter_test;
    +--------------------------+--------------------------+--------------------------+--+
    |  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
    +--------------------------+--------------------------+--------------------------+--+
    | test1                    | test1name                | test2value               |
    | test2                    | test2name                | test2value               |
    | test3                    | test3name                | test4value               |
    +--------------------------+--------------------------+--------------------------+--+

    字段名

    字段类型

    s1

    String

    s2

    String

  • 相关阅读:
    【基于mini2440开发板的交叉编译环境及内核树配置.
    linux 模块编译步骤(原)
    鸟哥的linux私房菜
    ios消息机制
    初学者必学文档:Objective-C语法入门(1)
    oc基础知识
    ios 人魔七七
    Usaco Open09 Gold
    USACO JAN 2012 Bronze
    USACO·2012·Feb Bronze
  • 原文地址:https://www.cnblogs.com/LIAOBO/p/13752039.html
Copyright © 2011-2022 走看看