  • [Sqoop1] Sqoop1 in Practice: Importing with sqoop import

    This post demonstrates full imports and incremental imports with Sqoop1. The command involved is sqoop import.

    I. Preparation

    1 Download and install the MySQL sample database


    2 Sqoop1 version
    [hadoop@strong ~]$ sqoop version
    18/06/26 15:02:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    Sqoop 1.4.7
    git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
    Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
    II. Sqoop1 import walkthrough

    1 View the help for the import command
    [hadoop@strong ~]$ sqoop help import
    2 Create an HDFS directory
    [hadoop@strong ~]$ hdfs dfs -mkdir /user/sqoop1
    [hadoop@strong ~]$ hdfs dfs -ls /user/
    Found 4 items
    drwxr-xr-x   - hadoop supergroup          0 2018-06-11 13:00 /user/hadoop
    drwxrwxrwx   - hadoop supergroup          0 2018-06-19 13:52 /user/hive
    drwxr-xr-x   - hadoop supergroup          0 2018-06-20 12:04 /user/hive1
    drwxr-xr-x   - hadoop supergroup          0 2018-06-26 14:54 /user/sqoop1
    3 List the MySQL databases
    [hadoop@strong ~]$ sqoop list-databases --connect jdbc:mysql://strong.hadoop.com:3306 --username root --password root
    18/06/26 15:02:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    18/06/26 15:02:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    18/06/26 15:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    Tue Jun 26 15:02:05 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    mysql
    information_schema
    performance_schema
    sys
    hive
    sakila
    Note: the database used throughout this post is sakila.
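Both warnings in the log output above name their own remedy: -P instead of a password on the command line, and an explicit useSSL setting on the JDBC URL. A minimal sketch of the same kind of call with both applied, assuming an unencrypted connection is acceptable for this demo server; the function name list_sakila_tables is made up for illustration:

```shell
# Hypothetical wrapper (the name is illustrative, not part of Sqoop).
list_sakila_tables() {
  # useSSL=false disables SSL explicitly, which is one of the two options
  # the driver warning offers; this assumes a demo setup without SSL.
  # -P makes Sqoop prompt for the password on the console instead of
  # reading it from the command line.
  sqoop list-tables \
    --connect 'jdbc:mysql://strong.hadoop.com:3306/sakila?useSSL=false' \
    --username root \
    -P
}
```

Call list_sakila_tables on a host where Sqoop is installed; the URL is single-quoted so the shell does not treat the ? as a glob character.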

    4 List the tables in sakila
    [hadoop@strong ~]$ sqoop list-tables --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root
    18/06/26 15:04:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    18/06/26 15:04:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    18/06/26 15:04:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    Tue Jun 26 15:04:02 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    actor
    actor_info
    address
    category
    city
    country
    customer
    customer_list
    film
    film_actor
    film_category
    film_list
    film_text
    inventory
    language
    nicer_but_slower_film_list
    payment
    rental
    sales_by_film_category
    sales_by_store
    staff
    staff_list
    store
    5 Import the city table (no destination specified)
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city
    
    [hadoop@strong ~]$ hdfs dfs -ls /user/hadoop/city
    Found 5 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 15:12 /user/hadoop/city/_SUCCESS
    -rw-r--r--   1 hadoop supergroup       5690 2018-06-26 15:12 /user/hadoop/city/part-m-00000
    -rw-r--r--   1 hadoop supergroup       5704 2018-06-26 15:12 /user/hadoop/city/part-m-00001
    -rw-r--r--   1 hadoop supergroup       5733 2018-06-26 15:12 /user/hadoop/city/part-m-00002
    -rw-r--r--   1 hadoop supergroup       5830 2018-06-26 15:12 /user/hadoop/city/part-m-00003
    [hadoop@strong ~]$ hdfs dfs -cat /user/hadoop/city/part-m-00000
    A Corua (La Corua),1,87,2006-02-15 04:45:25.0
    Abha,2,82,2006-02-15 04:45:25.0
    Abu Dhabi,3,101,2006-02-15 04:45:25.0
    Acua,4,60,2006-02-15 04:45:25.0
    Adana,5,97,2006-02-15 04:45:25.0
    Addis Abeba,6,31,2006-02-15 04:45:25.0
    Aden,7,107,2006-02-15 04:45:25.0
    Adoni,8,44,2006-02-15 04:45:25.0
    Ahmadnagar,9,44,2006-02-15 04:45:25.0
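    ---------------remaining rows omitted-----------------
    Note: the output files are comma-delimited by default. With no destination specified, the data lands in a directory named after the table under the user's HDFS home directory, here /user/hadoop/city.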
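Comma-delimited part files can be consumed with ordinary text tools. A small sketch using two rows copied from the output above; the field positions follow this particular output, and a value that itself contains a comma would shift the fields:

```shell
# Two sample rows copied from the import output shown in this post.
rows='Abha,2,82,2006-02-15 04:45:25.0
Abu Dhabi,3,101,2006-02-15 04:45:25.0'

# Split each row on commas and print the second field, then the first.
parsed=$(printf '%s\n' "$rows" | awk -F',' '{ print $2 ": " $1 }')
printf '%s\n' "$parsed"
```

On a cluster, the same awk filter could read from hdfs dfs -cat /user/hadoop/city/part-m-00000 instead of the inline sample.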

    6 Import the city table (destination specified)
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
    Found 5 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 15:25 /user/sqoop1/city/_SUCCESS
    -rw-r--r--   1 hadoop supergroup       5690 2018-06-26 15:25 /user/sqoop1/city/part-m-00000
    -rw-r--r--   1 hadoop supergroup       5704 2018-06-26 15:25 /user/sqoop1/city/part-m-00001
    -rw-r--r--   1 hadoop supergroup       5733 2018-06-26 15:25 /user/sqoop1/city/part-m-00002
    -rw-r--r--   1 hadoop supergroup       5830 2018-06-26 15:25 /user/sqoop1/city/part-m-00003
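    Note: running the same command again fails with an error saying the output directory already exists; delete the existing directory first.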
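Instead of removing the directory by hand before every rerun, Sqoop's --delete-target-dir option can do it as part of the import. A hedged sketch, not taken from the original post: it uses --target-dir, which names the final directory itself, whereas --warehouse-dir names the parent. The function name reimport_city is illustrative.

```shell
# Hypothetical wrapper for a rerunnable import of the city table.
reimport_city() {
  # --delete-target-dir removes /user/sqoop1/city, if it exists,
  # before the import writes its output there.
  sqoop import \
    --connect jdbc:mysql://strong.hadoop.com:3306/sakila \
    --username root --password root \
    --table city \
    --target-dir /user/sqoop1/city \
    --delete-target-dir
}
```

Because the option deletes data without asking, it suits full reloads like this one rather than incremental imports.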

    7 Control import parallelism
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1 -m 1
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 15:33 /user/sqoop1/city/_SUCCESS
    -rw-r--r--   1 hadoop supergroup      22957 2018-06-26 15:33 /user/sqoop1/city/part-m-00000
    Note: each map task writes one output file, so the number of part files equals the degree of parallelism set with -m.
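With more than one map task, Sqoop divides the value range of a splitting column (by default the table's primary key) among the tasks. The sketch below is a simplified illustration of that idea, not Sqoop's actual splitter. It reproduces the example given in the Sqoop user guide (minimum 0, maximum 1000, four tasks), not values measured from the city table, and the function name split_ranges is made up.

```shell
# split_ranges MIN MAX TASKS: print one range predicate per map task.
split_ranges() {
  min=$1; max=$2; tasks=$3
  size=$(( (max - min) / tasks ))      # width of each range
  i=0
  while [ "$i" -lt "$tasks" ]; do
    lo=$(( min + i * size ))
    if [ "$i" -eq $(( tasks - 1 )) ]; then
      hi=$(( max + 1 ))                # last range must include max itself
    else
      hi=$(( lo + size ))
    fi
    echo "id >= $lo AND id < $hi"
    i=$(( i + 1 ))
  done
}

split_ranges 0 1000 4
```

Each printed predicate corresponds to the rows one map task would read, which is why each task ends up with its own part file.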

    8 Specify the field delimiter
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1 -m 1 --fields-terminated-by '\t'
    Note: fields are now separated by tabs.
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 15:42 /user/sqoop1/city/_SUCCESS
    -rw-r--r--   1 hadoop supergroup      22957 2018-06-26 15:42 /user/sqoop1/city/part-m-00000
    [hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/city/part-m-00000
    A Corua (La Corua)	1	87	2006-02-15 04:45:25.0
    Abha	2	82	2006-02-15 04:45:25.0
    Abu Dhabi	3	101	2006-02-15 04:45:25.0
    Acua	4	60	2006-02-15 04:45:25.0
    Adana	5	97	2006-02-15 04:45:25.0
    Addis Abeba	6	31	2006-02-15 04:45:25.0
    Aden	7	107	2006-02-15 04:45:25.0
    Adoni	8	44	2006-02-15 04:45:25.0
    Ahmadnagar	9	44	2006-02-15 04:45:25.0
    Akishima	10	50	2006-02-15 04:45:25.0
    Akron	11	103	2006-02-15 04:45:25.0
    -------------remaining rows omitted----------------------
    9 Import a subset of the data

    1) Import a subset with --where
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --where 'country_id>100' --warehouse-dir /user/sqoop1 -m 1
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 15:59 /user/sqoop1/city/_SUCCESS
    -rw-r--r--   1 hadoop supergroup       2646 2018-06-26 15:59 /user/sqoop1/city/part-m-00000
    2) Import a subset with --query
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --query 'select * from city where country_id>100 and $CONDITIONS' --target-dir /user/sqoop1 -m 1
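In a --query import, Sqoop substitutes its own predicate for the literal token $CONDITIONS, so the token has to reach Sqoop unexpanded by the shell. The sketch below only builds strings and shows why the single quotes matter; the variable names are illustrative:

```shell
# Single quotes: the shell leaves $CONDITIONS untouched.
single='select * from city where country_id>100 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped to get the same string.
double="select * from city where country_id>100 and \$CONDITIONS"

# Unescaped inside double quotes, the shell expands the variable itself.
# CONDITIONS is set to an empty string here to mimic a typical shell.
CONDITIONS=''
broken="select * from city where country_id>100 and $CONDITIONS"

printf '%s\n' "$single" "$double" "$broken"
```

The first two lines printed are identical and keep the token; the third ends right after "and", which would be an invalid query for Sqoop.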
    3) Import selected columns
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --query 'select city,country_id from city where country_id>100 and $CONDITIONS' --target-dir /user/sqoop1 -m 1
    [hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/part-m-00000
    Abu Dhabi,101
    al-Ayn,101
    Sharja,101
    Bradford,102
    Dundee,102
    London,102
    -----------remaining rows omitted------------------
    4) Import a subset with --columns
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --columns "city,country_id" --where 'country_id>100' --target-dir /user/sqoop1 -m 1
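    10 Incremental import

    1) Overview

    An incremental import takes three arguments:
    • --check-column (col): the column Sqoop examines to decide which rows count as new; it cannot be a character type;
    • --incremental (mode): how new rows are detected; the two modes are append and lastmodified;
    • --last-value (value): the maximum value of the check column from the previous import.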
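The effect of these three arguments in append mode can be imitated locally: keep only the rows whose check column exceeds the last value. The sketch below is a plain-text simulation with awk, not a Sqoop call, using rows from the language table in the format this post's imports produce, with language_id as the check column:

```shell
# Rows in the format produced by the imports in this post.
rows='6,2006-02-15 05:02:19.0,German
7,2018-06-26 16:51:34.0,Chinese
8,2018-06-26 16:51:34.0,Guangdonghua'

# Value that would be passed as --last-value.
last_value=6

# Keep rows whose first field (language_id) is numerically greater.
new_rows=$(printf '%s\n' "$rows" | awk -F',' -v last="$last_value" '$1 + 0 > last + 0')
printf '%s\n' "$new_rows"
```

Only the rows with language_id 7 and 8 pass the filter, which is the set an append import with --last-value 6 is expected to pick up.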
    2) Import in append mode
    --First import
    [hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1
    --Check the imported data
    [hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-m-00000
    1,2006-02-15 05:02:19.0,English
    2,2006-02-15 05:02:19.0,Italian
    3,2006-02-15 05:02:19.0,Japanese
    4,2006-02-15 05:02:19.0,Mandarin
    5,2006-02-15 05:02:19.0,French
    6,2006-02-15 05:02:19.0,German
    --Second import (two rows, language_id 7 and 8, were added in MySQL beforehand, as the output below shows)
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1 --check-column language_id --incremental append --last-value 6
    --Check the imported data
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/language/
    Found 3 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 16:48 /user/sqoop1/language/_SUCCESS
    -rw-r--r--   1 hadoop supergroup        192 2018-06-26 16:48 /user/sqoop1/language/part-m-00000
    -rw-r--r--   1 hadoop supergroup         69 2018-06-26 16:52 /user/sqoop1/language/part-m-00001
    [hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-m-00001
    7,2018-06-26 16:51:34.0,Chinese
    8,2018-06-26 16:51:34.0,Guangdonghua
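Tracking --last-value by hand gets tedious across repeated runs. Sqoop's saved jobs (sqoop job) can store an incremental import definition and update the stored last value after each run. The sketch below is an assumption-laden illustration, not part of the original post: the job name language_append and both function names are made up, and depending on the metastore configuration Sqoop may prompt for the password when the job is executed.

```shell
# Hypothetical wrapper: define the incremental import once as a saved job.
create_language_job() {
  sqoop job --create language_append -- import \
    --connect jdbc:mysql://strong.hadoop.com:3306/sakila \
    --username root --password root \
    --table language \
    --warehouse-dir /user/sqoop1 -m 1 \
    --check-column language_id \
    --incremental append \
    --last-value 6
}

# Hypothetical wrapper: run the saved job; repeat this for each new batch.
run_language_job() {
  sqoop job --exec language_append
}
```

Call create_language_job once, then run_language_job whenever new rows need to be pulled.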
    3) Import in lastmodified mode
    --Modify a record in MySQL
    mysql> update language set name='GD' where language_id=8;
    Query OK, 1 row affected (0.14 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    mysql> select *from language;
    +-------------+----------+---------------------+
    | language_id | name     | last_update         |
    +-------------+----------+---------------------+
    |           1 | English  | 2006-02-15 05:02:19 |
    |           2 | Italian  | 2006-02-15 05:02:19 |
    |           3 | Japanese | 2006-02-15 05:02:19 |
    |           4 | Mandarin | 2006-02-15 05:02:19 |
    |           5 | French   | 2006-02-15 05:02:19 |
    |           6 | German   | 2006-02-15 05:02:19 |
    |           7 | Chinese  | 2018-06-26 16:51:34 |
    |           8 | GD       | 2018-06-26 17:54:54 |
    +-------------+----------+---------------------+
    8 rows in set (0.00 sec)
    --Run the incremental import, using --merge-key
    [hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1 --check-column last_update --incremental lastmodified --last-value '2018-06-26 16:51:34.0' --merge-key language_id
    --Check the generated files
    [hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/language/
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-06-26 18:02 /user/sqoop1/language/_SUCCESS
    -rw-r--r--   1 hadoop supergroup        251 2018-06-26 18:02 /user/sqoop1/language/part-r-00000
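    Note: the file is now named part-r-00000, with r instead of m, which shows that a reduce task ran to merge the data.
    --Check the generated data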
    [hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-r-00000
    1,2006-02-15 05:02:19.0,English
    2,2006-02-15 05:02:19.0,Italian
    3,2006-02-15 05:02:19.0,Japanese
    4,2006-02-15 05:02:19.0,Mandarin
    5,2006-02-15 05:02:19.0,French
    6,2006-02-15 05:02:19.0,German
    7,2018-06-26 16:51:34.0,Chinese
    8,2018-06-26 17:54:54.0,GD
    Note: the record with id 8 now reads GD, matching the update made in MySQL.
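What --merge-key achieves can be imitated with text tools: for every key, the newest version of the row replaces the older one. The sketch below is a local simulation with awk and sort, not Sqoop's merge job. It uses the rows from this post, with the updated row for language_id 8 as the delta:

```shell
# Data already in HDFS before the lastmodified import.
base='1,2006-02-15 05:02:19.0,English
2,2006-02-15 05:02:19.0,Italian
3,2006-02-15 05:02:19.0,Japanese
4,2006-02-15 05:02:19.0,Mandarin
5,2006-02-15 05:02:19.0,French
6,2006-02-15 05:02:19.0,German
7,2018-06-26 16:51:34.0,Chinese
8,2018-06-26 16:51:34.0,Guangdonghua'

# Row changed in MySQL since the previous import.
delta='8,2018-06-26 17:54:54.0,GD'

# Later lines overwrite earlier ones with the same key (field 1),
# then the result is sorted numerically by that key.
merged=$(printf '%s\n%s\n' "$base" "$delta" \
  | awk -F',' '{ row[$1] = $0 } END { for (k in row) print row[k] }' \
  | sort -t',' -k1,1n)
printf '%s\n' "$merged"
```

The result has one row per language_id, with the row for 8 showing GD, just like the merged part-r-00000 file above.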

  • Original article: https://www.cnblogs.com/alen-liu-sz/p/12975633.html