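This article demonstrates full and incremental imports with Sqoop1, using the sqoop import command.
I Preparation
1 Download and install the MySQL sample database
2 Sqoop1 version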
[hadoop@strong ~]$ sqoop version
18/06/26 15:02:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
II Sqoop1 import demonstration
1 View help for the import command
[hadoop@strong ~]$ sqoop help import
2 Create an HDFS directory
[hadoop@strong ~]$ hdfs dfs -mkdir /user/sqoop1
[hadoop@strong ~]$ hdfs dfs -ls /user/
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2018-06-11 13:00 /user/hadoop
drwxrwxrwx - hadoop supergroup 0 2018-06-19 13:52 /user/hive
drwxr-xr-x - hadoop supergroup 0 2018-06-20 12:04 /user/hive1
drwxr-xr-x - hadoop supergroup 0 2018-06-26 14:54 /user/sqoop1
3 List the MySQL databases
[hadoop@strong ~]$ sqoop list-databases --connect jdbc:mysql://strong.hadoop.com:3306 --username root --password root
18/06/26 15:02:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/06/26 15:02:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/06/26 15:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Tue Jun 26 15:02:05 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
mysql
information_schema
performance_schema
sys
hive
sakila
Note: the database used in this demonstration is sakila.
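The log above warns that passing the password on the command line is insecure, and the MySQL driver warning suggests setting useSSL explicitly. A minimal sketch of one way to address both, assuming a local password file (the helper name make_password_file and the path $HOME/.sqoop.pw are hypothetical, and the file:// prefix for a local path is an assumption to verify on your installation):

```shell
#!/usr/bin/env bash
# Write a password to a file with no trailing newline and owner-only access.
# Sqoop uses the entire file content as the password, so a stray newline
# added by an editor would become part of it.
make_password_file() {
    local password="$1" path="$2"
    rm -f "$path"
    ( umask 077 && printf '%s' "$password" > "$path" )
    chmod 400 "$path"
}

# Hypothetical usage on the demo host:
#   make_password_file 'root' "$HOME/.sqoop.pw"
#   sqoop list-tables \
#     --connect 'jdbc:mysql://strong.hadoop.com:3306/sakila?useSSL=false' \
#     --username root --password-file "file://$HOME/.sqoop.pw"
```

The -P option mentioned in the warning is the interactive alternative: Sqoop prompts for the password instead of reading it from the command line.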
4 List the tables in sakila
[hadoop@strong ~]$ sqoop list-tables --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root
18/06/26 15:04:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/06/26 15:04:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/06/26 15:04:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Tue Jun 26 15:04:02 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
actor
actor_info
address
category
city
country
customer
customer_list
film
film_actor
film_category
film_list
film_text
inventory
language
nicer_but_slower_film_list
payment
rental
sales_by_film_category
sales_by_store
staff
staff_list
store
5 Import the city table (no destination specified)
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city
[hadoop@strong ~]$ hdfs dfs -ls /user/hadoop/city
Found 5 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 15:12 /user/hadoop/city/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 5690 2018-06-26 15:12 /user/hadoop/city/part-m-00000
-rw-r--r-- 1 hadoop supergroup 5704 2018-06-26 15:12 /user/hadoop/city/part-m-00001
-rw-r--r-- 1 hadoop supergroup 5733 2018-06-26 15:12 /user/hadoop/city/part-m-00002
-rw-r--r-- 1 hadoop supergroup 5830 2018-06-26 15:12 /user/hadoop/city/part-m-00003
[hadoop@strong ~]$ hdfs dfs -cat /user/hadoop/city/part-m-00000
A Coruña (La Coruña),1,87,2006-02-15 04:45:25.0
Abha,2,82,2006-02-15 04:45:25.0
Abu Dhabi,3,101,2006-02-15 04:45:25.0
Acuña,4,60,2006-02-15 04:45:25.0
Adana,5,97,2006-02-15 04:45:25.0
Addis Abeba,6,31,2006-02-15 04:45:25.0
Aden,7,107,2006-02-15 04:45:25.0
Adoni,8,44,2006-02-15 04:45:25.0
Ahmadnagar,9,44,2006-02-15 04:45:25.0
--------------- remaining data omitted -----------------
Note: by default the output is a comma-delimited text file.
6 Import the city table (destination specified)
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
Found 5 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 15:25 /user/sqoop1/city/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 5690 2018-06-26 15:25 /user/sqoop1/city/part-m-00000
-rw-r--r-- 1 hadoop supergroup 5704 2018-06-26 15:25 /user/sqoop1/city/part-m-00001
-rw-r--r-- 1 hadoop supergroup 5733 2018-06-26 15:25 /user/sqoop1/city/part-m-00002
-rw-r--r-- 1 hadoop supergroup 5830 2018-06-26 15:25 /user/sqoop1/city/part-m-00003
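Note: running the same command again fails with an error that the output directory already exists; delete the existing directory first.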
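The cleanup before a repeated import can be wrapped in a small helper. The following is a sketch only; the function name remove_hdfs_dir_if_exists is hypothetical, and it relies on the standard hdfs dfs -test -d and hdfs dfs -rm -r commands:

```shell
#!/usr/bin/env bash
# Remove an HDFS directory only if it exists, so a repeated import does not
# fail on an existing output directory.
remove_hdfs_dir_if_exists() {
    local dir="$1"
    if hdfs dfs -test -d "$dir"; then
        hdfs dfs -rm -r "$dir"
    fi
}

# Usage on the demo cluster:
#   remove_hdfs_dir_if_exists /user/sqoop1/city
#   sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila \
#     --username root --password root --table city --warehouse-dir /user/sqoop1
```

Sqoop 1.4.x also lists a --delete-target-dir option in sqoop help import, which may make the separate removal step unnecessary; check its behavior on your version before relying on it.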
7 Control import parallelism
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1 -m 1
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 15:33 /user/sqoop1/city/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 22957 2018-06-26 15:33 /user/sqoop1/city/part-m-00000
Note: the number of output files equals the degree of parallelism, that is, the number of map tasks set with -m.
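Sqoop uses four map tasks by default, which is why the imports in steps 5 and 6 produced four part files while -m 1 produces one. A quick way to confirm the count from a directory listing is sketched below (the function name count_map_outputs is hypothetical):

```shell
#!/usr/bin/env bash
# Count map-output files (part-m-NNNNN) in an `hdfs dfs -ls` listing on stdin.
# The "Found N items" header and the _SUCCESS marker are not counted.
count_map_outputs() {
    grep -c '/part-m-[0-9]*$'
}

# Usage on the demo cluster:
#   hdfs dfs -ls /user/sqoop1/city | count_map_outputs
```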
8 Specify the field delimiter
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --warehouse-dir /user/sqoop1 -m 1 --fields-terminated-by '\t'
Note: fields are separated by tab characters.
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 15:42 /user/sqoop1/city/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 22957 2018-06-26 15:42 /user/sqoop1/city/part-m-00000
[hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/city/part-m-00000
A Coruña (La Coruña) 1 87 2006-02-15 04:45:25.0
Abha 2 82 2006-02-15 04:45:25.0
Abu Dhabi 3 101 2006-02-15 04:45:25.0
Acuña 4 60 2006-02-15 04:45:25.0
Adana 5 97 2006-02-15 04:45:25.0
Addis Abeba 6 31 2006-02-15 04:45:25.0
Aden 7 107 2006-02-15 04:45:25.0
Adoni 8 44 2006-02-15 04:45:25.0
Ahmadnagar 9 44 2006-02-15 04:45:25.0
Akishima 10 50 2006-02-15 04:45:25.0
Akron 11 103 2006-02-15 04:45:25.0
------------- remaining data omitted ----------------------
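A tab delimiter is convenient here because city names such as Abu Dhabi contain spaces, so splitting on whitespace would break them apart. A minimal sketch of reading the tab-delimited output with awk, assuming the column order shown above (name, id, country id, timestamp); the function name city_and_country is hypothetical:

```shell
#!/usr/bin/env bash
# Print the city name (field 1) and country id (field 3) from tab-separated
# rows on stdin, joined with a "|" so embedded spaces stay visible.
city_and_country() {
    awk -F'\t' '{ print $1 "|" $3 }'
}

# Usage on the demo cluster:
#   hdfs dfs -cat /user/sqoop1/city/part-m-00000 | city_and_country
```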
9 Import a subset of the data
1) Import a subset of rows with --where
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1/city
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --where 'country_id>100' --warehouse-dir /user/sqoop1 -m 1
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/city
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 15:59 /user/sqoop1/city/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 2646 2018-06-26 15:59 /user/sqoop1/city/part-m-00000
2) Import a subset of rows with --query
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --query 'select * from city where country_id>100 and $CONDITIONS' --target-dir /user/sqoop1 -m 1
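The $CONDITIONS token in a --query import must reach Sqoop literally, because Sqoop substitutes its own predicate for it at run time. The quoting therefore matters. The sketch below (variable names are illustrative) shows two equivalent ways to write the query in the shell:

```shell
#!/usr/bin/env bash
# Single quotes: the shell passes $CONDITIONS through unchanged.
query_single='select * from city where country_id>100 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped; otherwise the shell would
# expand $CONDITIONS itself (usually to nothing) before Sqoop sees the query.
query_double="select * from city where country_id>100 and \$CONDITIONS"

# Both variables now hold the same text.
printf '%s\n' "$query_single" "$query_double"
```

Either form can then be passed as --query "$query_single" together with --target-dir, as in the command above.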
3) Import selected columns
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --query 'select city,country_id from city where country_id>100 and $CONDITIONS' --target-dir /user/sqoop1 -m 1
[hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/part-m-00000
Abu Dhabi,101
al-Ayn,101
Sharja,101
Bradford,102
Dundee,102
London,102
----------- remaining data omitted ------------------
4) Import a subset of the data with --columns
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table city --columns "city,country_id" --where 'country_id>100' --target-dir /user/sqoop1 -m 1
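10 Incremental import
1) Overview
Incremental import uses three arguments:
- --check-column (col): the column examined to decide which rows are new; it must not be a character type;
- --incremental (mode): the incremental import mode, either append or lastmodified;
- --last-value (value): the maximum value of the check column from the previous import.
2) Import in append mode
-- First import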
[hadoop@strong ~]$ hdfs dfs -rm -R /user/sqoop1
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1
-- View the imported data
[hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-m-00000
1,2006-02-15 05:02:19.0,English
2,2006-02-15 05:02:19.0,Italian
3,2006-02-15 05:02:19.0,Japanese
4,2006-02-15 05:02:19.0,Mandarin
5,2006-02-15 05:02:19.0,French
6,2006-02-15 05:02:19.0,German
-- Second import (run after two new rows, language_id 7 and 8, were added in MySQL)
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1 --check-column language_id --incremental append --last-value 6
-- View the imported data
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/language/
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 16:48 /user/sqoop1/language/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 192 2018-06-26 16:48 /user/sqoop1/language/part-m-00000
-rw-r--r-- 1 hadoop supergroup 69 2018-06-26 16:52 /user/sqoop1/language/part-m-00001
[hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-m-00001
7,2018-06-26 16:51:34.0,Chinese
8,2018-06-26 16:51:34.0,Guangdonghua
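The value to pass as --last-value on the next append run is the largest check-column value already imported, which is 8 after the run above. One way to read it back from the imported files is sketched below, assuming language_id is the first comma-separated field as in this output (the function name max_first_field is hypothetical):

```shell
#!/usr/bin/env bash
# Print the largest numeric value of the first comma-separated field on stdin.
max_first_field() {
    cut -d, -f1 | sort -n | tail -n 1
}

# Usage on the demo cluster (the glob is quoted so that HDFS expands it):
#   hdfs dfs -cat '/user/sqoop1/language/part-m-*' | max_first_field
```

Sqoop also prints the value to use for the next run at the end of an incremental import, and a saved job created with sqoop job can track it between runs, so this manual lookup is mainly useful as a cross-check.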
3) Import in lastmodified mode
-- Modify a record in MySQL
mysql> update language set name='GD' where language_id=8;
Query OK, 1 row affected (0.14 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select *from language;
+-------------+----------+---------------------+
| language_id | name | last_update |
+-------------+----------+---------------------+
| 1 | English | 2006-02-15 05:02:19 |
| 2 | Italian | 2006-02-15 05:02:19 |
| 3 | Japanese | 2006-02-15 05:02:19 |
| 4 | Mandarin | 2006-02-15 05:02:19 |
| 5 | French | 2006-02-15 05:02:19 |
| 6 | German | 2006-02-15 05:02:19 |
| 7 | Chinese | 2018-06-26 16:51:34 |
| 8 | GD | 2018-06-26 17:54:54 |
+-------------+----------+---------------------+
8 rows in set (0.00 sec)
-- Run the incremental import with --merge-key
[hadoop@strong ~]$ sqoop import --connect jdbc:mysql://strong.hadoop.com:3306/sakila --username root --password root --table language --warehouse-dir /user/sqoop1 -m 1 --check-column last_update --incremental lastmodified --last-value '2018-06-26 16:51:34.0' --merge-key language_id
-- View the generated files
[hadoop@strong ~]$ hdfs dfs -ls /user/sqoop1/language/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-06-26 18:02 /user/sqoop1/language/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 251 2018-06-26 18:02 /user/sqoop1/language/part-r-00000
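Note: the file name now contains r (part-r-00000) instead of m, which indicates that a reduce task ran to merge the records.
-- View the generated data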
[hadoop@strong ~]$ hdfs dfs -cat /user/sqoop1/language/part-r-00000
1,2006-02-15 05:02:19.0,English
2,2006-02-15 05:02:19.0,Italian
3,2006-02-15 05:02:19.0,Japanese
4,2006-02-15 05:02:19.0,Mandarin
5,2006-02-15 05:02:19.0,French
6,2006-02-15 05:02:19.0,German
7,2018-06-26 16:51:34.0,Chinese
8,2018-06-26 17:54:54.0,GD
Note: the record with id 8 now reads GD, consistent with the update made in MySQL.
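The effect of --merge-key can be pictured as "for each key, keep the most recent record". The sketch below reproduces that rule locally with awk for illustration only; it is not how Sqoop implements the merge (Sqoop runs it as a MapReduce job), and the function name merge_by_key is hypothetical:

```shell
#!/usr/bin/env bash
# Merge comma-separated records from stdin by their first field (the key).
# When a key appears more than once, the record read last replaces the
# earlier one. Output is sorted numerically by key.
merge_by_key() {
    awk -F, '{ latest[$1] = $0 } END { for (k in latest) print latest[k] }' |
        sort -t, -k1,1n
}

# Usage idea: feed the old rows first, then the newly imported rows:
#   cat old_rows.csv new_rows.csv | merge_by_key
```

Applied to the data above, the older row for language_id 8 (Guangdonghua) is replaced by the newer one (GD), matching the merged part-r-00000 file.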