zoukankan      html  css  js  c++  java
  • Hive面试题(3):如何用sqoop将hive中分区表的分区字段导入到MySQL中

    问题分析:

    1. hive中分区表其底层就是HDFS中的多个目录下的单个文件,hive导出数据本质是将HDFS中的文件导出
    2. hive中的分区表,因为分区字段(静态分区)不在文件中,所以在sqoop导出的时候,无法将分区字段进行直接导出

    思路:在hive中创建一个临时表,将分区表复制过去后分区字段转换为普通字段,然后再用sqoop将tmp表导出即实现需求
    步凑如下:

    1.创建目标表(分区表)

    hive> CREATE TABLE `dept_partition`(                     
    `deptno` int,                                    
    `dname` string,                                  
    `loc` string)                                    
    PARTITIONED BY (`month` string) row format delimited fields terminated by '	';

    1.1查看表结构

    hive> show create table dept_partition;
    +----------------------------------------------------+--+
    |                   createtab_stmt                   |
    +----------------------------------------------------+--+
    | CREATE TABLE `dept_partition`(                     |
    |   `deptno` int,                                    |
    |   `dname` string,                                  |
    |   `loc` string)                                    |
    | PARTITIONED BY (                                   |
    |   `month` string)

    2.导入数据

    hive> load data inpath '/user/hive/hive_db/data/dept.txt' into table dept_partition;
    10    ACCOUNTING    1700
    20    RESEARCH    1800
    30    SALES    1900
    40    OPERATIONS    1700

    3.查询表dept_partition

    hive> select * from dept_partition;
    +------------------------+-----------------------+---------------------+-----------------------+--+
    | dept_partition.deptno  | dept_partition.dname  | dept_partition.loc  | dept_partition.month  |
    +------------------------+-----------------------+---------------------+-----------------------+--+
    | 10                     | ACCOUNTING            | 1700                | 2019-10-19            |
    | 20                     | RESEARCH              | 1800                | 2019-10-19            |
    | 30                     | SALES                 | 1900                | 2019-10-19            |
    | 40                     | OPERATIONS            | 1700                | 2019-10-19            |
    | 10                     | ACCOUNTING            | 1700                | 2019-10-20            |
    | 20                     | RESEARCH              | 1800                | 2019-10-20            |
    | 30                     | SALES                 | 1900                | 2019-10-20            |
    | 40                     | OPERATIONS            | 1700                | 2019-10-20            |
    +------------------------+-----------------------+---------------------+-----------------------+--+

    4.创建临时表 tmp_dept_partition

    hive> create table tmp_dept_partition as select * from dept_partition;

    5.查询临时表

    hive> select * from tmp_dept_partition;
    +----------------------------+---------------------------+-------------------------+---------------------------+--+
    | tmp_dept_partition.deptno  | tmp_dept_partition.dname  | tmp_dept_partition.loc  | tmp_dept_partition.month  |
    +----------------------------+---------------------------+-------------------------+---------------------------+--+
    | 10                         | ACCOUNTING                | 1700                    | 2019-10-19                |
    | 20                         | RESEARCH                  | 1800                    | 2019-10-19                |
    | 30                         | SALES                     | 1900                    | 2019-10-19                |
    | 40                         | OPERATIONS                | 1700                    | 2019-10-19                |
    | 10                         | ACCOUNTING                | 1700                    | 2019-10-20                |
    | 20                         | RESEARCH                  | 1800                    | 2019-10-20                |
    | 30                         | SALES                     | 1900                    | 2019-10-20                |
    | 40                         | OPERATIONS                | 1700                    | 2019-10-20                |
    +----------------------------+---------------------------+-------------------------+---------------------------+--+

    6.查看表结构(这个时候分区表已经转换为非分区表了)

    hive> show create table tmp_dept_partition;
    +----------------------------------------------------+--+
    |                   createtab_stmt                   |
    +----------------------------------------------------+--+
    | CREATE TABLE `tmp_dept_partition`(                 |
    |   `deptno` int,                                    |
    |   `dname` string,                                  |
    |   `loc` string,                                    |
    |   `month` string)

    7.MySQL中建表 dept_partition

    mysql> drop table if exists dept_partition;
    create table dept_partition(
    `deptno` int,        
    `dname` varchar(20),      
    `loc` varchar(20),               
    `month` varchar(50))

    8.使用sqoop导入到MySQL

    bin/sqoop export 
    --connect jdbc:mysql://hadoop01:3306/partitionTb 
    --username root 
    --password 123456 
    --table dept_partition 
    --num-mappers 1 
    --export-dir /user/hive/warehouse/hive_db.db/tmp_dept_partition 
    --input-fields-terminated-by "01"

    9.Mysql查询验证是否成功导出

    mysql> select * from dept_partition;
    +--------+------------+------+------------+
    | deptno | dname      | loc  | month      |
    +--------+------------+------+------------+
    |     10 | ACCOUNTING | 1700 | 2019-10-19 |
    |     20 | RESEARCH   | 1800 | 2019-10-19 |
    |     30 | SALES      | 1900 | 2019-10-19 |
    |     40 | OPERATIONS | 1700 | 2019-10-19 |
    |     10 | ACCOUNTING | 1700 | 2019-10-20 |
    |     20 | RESEARCH   | 1800 | 2019-10-20 |
    |     30 | SALES      | 1900 | 2019-10-20 |
    |     40 | OPERATIONS | 1700 | 2019-10-20 |
    +--------+------------+------+------------+ 

    本文来自博客园,作者:秋华,转载请注明原文链接:https://www.cnblogs.com/qiu-hua/p/14169511.html

  • 相关阅读:
    java.lang.OutOfMemoryError: Java heap space解决方法
    深入理解SQL的四种连接-左外连接、右外连接、内连接、全连接
    CSS中.和#区别
    斯坦福数据挖掘之LSH的应用
    N个元素的集合划分成互斥的两个子集的数目
    JDBC小结
    初识Java反射机制
    关于Java中重载的若干问题
    吐槽
    Tomcat
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/14169511.html
Copyright © 2011-2022 走看看