The Hadoop Ecosystem - Deploying Sqoop and Basic Usage
Author: Yin Zhengjie
Copyright notice: original work, reproduction without permission is prohibited; violators will be held legally responsible.
Sqoop (pronounced: skup) is an open-source tool mainly used to move data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can also export data from HDFS back into a relational database.
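To make the two directions concrete, here is a minimal sketch of what an import and an export look like once Sqoop is installed (the connection string, table names and paths are illustrative placeholders, not commands from this post):

# Pull a MySQL table into an HDFS directory (illustrative values).
sqoop import --connect jdbc:mysql://dbhost/testdb --username root -P --table users --target-dir /user/demo/users -m 1
# Push an HDFS directory back into a MySQL table (illustrative values).
sqoop export --connect jdbc:mysql://dbhost/testdb --username root -P --table users_copy --export-dir /user/demo/users -m 1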
The Sqoop project started in 2009, originally as a third-party module of Hadoop. Later, to let users deploy it quickly and to let developers iterate faster, Sqoop became an independent Apache project. For details see: http://sqoop.apache.org/
Note: in this post Sqoop is deployed on top of a high-availability cluster. For the HA cluster deployment, see: https://www.cnblogs.com/yinzhengjie/p/9154265.html.
I. Deploying the Sqoop tool
1>. Download Sqoop (download from: http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/; the latest version is recommended - as of 2018-06-14 that is 1.4.7.)
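If you prefer to fetch the tarball from the command line instead of a browser, a sketch (assuming the mirror above still serves 1.4.7 and that the staging directory matches the prompts below):

# Download Sqoop 1.4.7 into the staging directory; the directory path is an assumption for illustration.
cd ~/data
wget http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz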
2>. Extract the archive and create a symbolic link
[yinzhengjie@s101 data]$ tar zxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /soft/
[yinzhengjie@s101 data]$ ln -s /soft/sqoop-1.4.7.bin__hadoop-2.6.0/ /soft/sqoop
[yinzhengjie@s101 data]$
3>. Configure the environment variables and apply them
[yinzhengjie@s101 ~]$ sudo vi /etc/profile
[sudo] password for yinzhengjie:
[yinzhengjie@s101 ~]$ tail -3 /etc/profile
#ADD SQOOP
SQOOP_HOME=/soft/sqoop
PATH=$PATH:$SQOOP_HOME/bin
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ source /etc/profile
[yinzhengjie@s101 ~]$
4>. Create the sqoop-env.sh configuration file
[yinzhengjie@s101 ~]$ cp /soft/sqoop/conf/sqoop-env-template.sh /soft/sqoop/conf/sqoop-env.sh
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ more /soft/sqoop/conf/sqoop-env.sh | grep -v ^# | grep -v ^$
export HADOOP_COMMON_HOME=/soft/hadoop
export HADOOP_MAPRED_HOME=/soft/hadoop
export HBASE_HOME=/soft/hbase
export HIVE_HOME=/soft/hive
export ZOOCFGDIR=/soft/zk/conf
[yinzhengjie@s101 ~]$
5>. Place the MySQL JDBC driver in sqoop/lib
[yinzhengjie@s101 ~]$ cp /soft/hive/lib/mysql-connector-java-5.1.41.jar /soft/sqoop/lib/
[yinzhengjie@s101 ~]$
6>. Verify the installation with sqoop version
[yinzhengjie@s101 ~]$ sqoop version
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 00:30:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
[yinzhengjie@s101 ~]$
II. Basic usage
1>. Connect to a MySQL database from the sqoop command line
[yinzhengjie@s101 ~]$ sqoop list-databases --connect jdbc:mysql://s101 --username root -P
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 00:33:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/06/14 00:33:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
hive
mysql
performance_schema
[yinzhengjie@s101 ~]$
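The -P flag prompts for the password interactively. For scripted runs Sqoop also accepts --password-file (see the import help below); a sketch, assuming /user/yinzhengjie/mysql.pwd on HDFS as the password file path:

# Store the MySQL password in a restricted file on HDFS and reference it instead of -P.
echo -n 'yinzhengjie' > mysql.pwd
hdfs dfs -put mysql.pwd /user/yinzhengjie/mysql.pwd
hdfs dfs -chmod 400 /user/yinzhengjie/mysql.pwd
sqoop list-databases --connect jdbc:mysql://s101 --username root --password-file /user/yinzhengjie/mysql.pwd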
2>. View the sqoop help
[yinzhengjie@s101 ~]$ sqoop help
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 01:50:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ sqoop import --help
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 01:51:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect <jdbc-uri>                                      Specify JDBC connect string
  --connection-manager <class-name>                         Specify connection manager class name
  --connection-param-file <properties-file>                 Specify connection parameters file
  --driver <class-name>                                     Manually specify JDBC driver class to use
  --hadoop-home <hdir>                                      Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home <dir>                                Override $HADOOP_MAPRED_HOME_ARG
  --help                                                    Print usage instructions
  --metadata-transaction-isolation-level <isolationlevel>   Defines the transaction isolation level for metadata queries. For more details check java.sql.Connection javadoc or the JDBC specification
  --oracle-escaping-disabled <boolean>                      Disable the escaping mechanism of the Oracle/OraOop connection managers
  -P                                                        Read password from console
  --password <password>                                     Set authentication password
  --password-alias <password-alias>                         Credential provider password alias
  --password-file <password-file>                           Set authentication password file path
  --relaxed-isolation                                       Use read-uncommitted isolation for imports
  --skip-dist-cache                                         Skip copying jars to distributed cache
  --temporary-rootdir <rootdir>                             Defines the temporary root directory for the import
  --throw-on-error                                          Rethrow a RuntimeException on error occurred during the job
  --username <username>                                     Set authentication username
  --verbose                                                 Print more information while working

Import control arguments:
  --append                                Imports data in append mode
  --as-avrodatafile                       Imports data to Avro data files
  --as-parquetfile                        Imports data to Parquet files
  --as-sequencefile                       Imports data to SequenceFiles
  --as-textfile                           Imports data as plain text (default)
  --autoreset-to-one-mapper               Reset the number of mappers to one mapper if no split key available
  --boundary-query <statement>            Set boundary query for retrieving max and min value of the primary key
  --columns <col,col,col...>              Columns to import from table
  --compression-codec <codec>             Compression codec to use for import
  --delete-target-dir                     Imports data in delete mode
  --direct                                Use direct import fast path
  --direct-split-size <n>                 Split the input stream every 'n' bytes when importing in direct mode
  -e,--query <statement>                  Import results of SQL 'statement'
  --fetch-size <n>                        Set number 'n' of rows to fetch from the database when more rows are needed
  --inline-lob-limit <n>                  Set the maximum size for an inline LOB
  -m,--num-mappers <n>                    Use 'n' map tasks to import in parallel
  --mapreduce-job-name <name>             Set name for generated mapreduce job
  --merge-key <column>                    Key column to use to join results
  --split-by <column-name>                Column of the table used to split work units
  --split-limit <size>                    Upper limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
  --table <table-name>                    Table to read
  --target-dir <dir>                      HDFS plain table destination
  --validate                              Validate the copy using the configured validator
  --validation-failurehandler <class>     Fully qualified class name for ValidationFailureHandler
  --validation-threshold <class>          Fully qualified class name for ValidationThreshold
  --validator <validator>                 Fully qualified class name for the Validator
  --warehouse-dir <dir>                   HDFS parent for table destination
  --where <where clause>                  WHERE clause to use during import
  -z,--compress                           Enable compression

Incremental import arguments:
  --check-column <column>        Source column to check for incremental change
  --incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
  --last-value <value>           Last imported value in the incremental check column

Output line formatting arguments:
  --enclosed-by <char>               Sets a required field enclosing character
  --escaped-by <char>                Sets the escape character
  --fields-terminated-by <char>      Sets the field separator character
  --lines-terminated-by <char>       Sets the end-of-line character
  --mysql-delimiters                 Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
  --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
  --input-enclosed-by <char>               Sets a required field encloser
  --input-escaped-by <char>                Sets the input escape character
  --input-fields-terminated-by <char>      Sets the input field separator
  --input-lines-terminated-by <char>       Sets the input end-of-line char
  --input-optionally-enclosed-by <char>    Sets a field enclosing character

Hive arguments:
  --create-hive-table                          Fail if the target hive table exists
  --external-table-dir <hdfs path>             Sets where the external table is in HDFS
  --hive-database <database-name>              Sets the database name to use when importing to hive
  --hive-delims-replacement <arg>              Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
  --hive-drop-import-delims                    Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
  --hive-home <dir>                            Override $HIVE_HOME
  --hive-import                                Import tables into Hive (Uses Hive's default delimiters if none are set.)
  --hive-overwrite                             Overwrite existing data in the Hive table
  --hive-partition-key <partition-key>         Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>     Sets the partition value to use when importing to hive
  --hive-table <table-name>                    Sets the table name to use when importing to hive
  --map-column-hive <arg>                      Override mapping for specific column to hive types.

HBase arguments:
  --column-family <family>    Sets the target column family for the import
  --hbase-bulkload            Enables HBase bulk loading
  --hbase-create-table        If specified, create missing HBase tables
  --hbase-row-key <col>       Specifies which input column to use as the row key
  --hbase-table <table>       Import to <table> in HBase

HCatalog arguments:
  --hcatalog-database <arg>                      HCatalog database name
  --hcatalog-home <hdir>                         Override $HCAT_HOME
  --hcatalog-partition-keys <partition-key>      Sets the partition keys to use when importing to hive
  --hcatalog-partition-values <partition-value>  Sets the partition values to use when importing to hive
  --hcatalog-table <arg>                         HCatalog table name
  --hive-home <dir>                              Override $HIVE_HOME
  --hive-partition-key <partition-key>           Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>       Sets the partition value to use when importing to hive
  --map-column-hive <arg>                        Override mapping for specific column to hive types.

HCatalog import specific options:
  --create-hcatalog-table             Create HCatalog before import
  --drop-and-create-hcatalog-table    Drop and Create HCatalog before import
  --hcatalog-storage-stanza <arg>     HCatalog storage stanza for table creation

Accumulo arguments:
  --accumulo-batch-size <size>           Batch size in bytes
  --accumulo-column-family <family>      Sets the target column family for the import
  --accumulo-create-table                If specified, create missing Accumulo tables
  --accumulo-instance <instance>         Accumulo instance name.
  --accumulo-max-latency <latency>       Max write latency in milliseconds
  --accumulo-password <password>         Accumulo password.
  --accumulo-row-key <col>               Specifies which input column to use as the row key
  --accumulo-table <table>               Import to <table> in Accumulo
  --accumulo-user <user>                 Accumulo user name.
  --accumulo-visibility <vis>            Visibility token to be applied to all rows imported
  --accumulo-zookeepers <zookeepers>     Comma-separated list of zookeepers (host:port)

Code generation arguments:
  --bindir <dir>                             Output directory for compiled objects
  --class-name <name>                        Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
  --escape-mapping-column-names <boolean>    Disable special characters escaping in column names
  --input-null-non-string <null-str>         Input null non-string representation
  --input-null-string <null-str>             Input null string representation
  --jar-file <file>                          Disable code generation; use specified jar
  --map-column-java <arg>                    Override mapping for specific columns to java types
  --null-non-string <null-str>               Null non-string representation
  --null-string <null-str>                   Null string representation
  --outdir <dir>                             Output directory for generated code
  --package-name <name>                      Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
  -conf <configuration file>                      specify an application configuration file
  -D <property=value>                             use value for given property
  -fs <local|namenode:port>                       specify a namenode
  -jt <local|resourcemanager:port>                specify a ResourceManager
  -files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
  -libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
  -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.
[yinzhengjie@s101 ~]$
3>. List tables with sqoop
[yinzhengjie@s101 ~]$ sqoop list-tables --connect jdbc:mysql://s101/yinzhengjie --username root -P
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 01:56:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/06/14 01:56:23 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Classmate
word
[yinzhengjie@s101 ~]$
4>. List databases with Sqoop
[yinzhengjie@s101 ~]$ sqoop list-databases --connect jdbc:mysql://s101 --username root -P
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 02:05:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/06/14 02:05:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
hive
mysql
performance_schema
yinzhengjie
[yinzhengjie@s101 ~]$
III. Importing data into HDFS with Sqoop (HDFS, YARN, MySQL and the other related services must be running)
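If those services are not already up, a sketch for starting them (the script names assume a standard Hadoop 2.x installation; the MySQL service name varies by distribution, and an HA cluster may use its own start scripts):

# Start HDFS and YARN from the active master node.
start-dfs.sh
start-yarn.sh
# Start MySQL on the database host (service name is an assumption; it may be mysql instead of mysqld).
sudo systemctl start mysqld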
1>. Grant privileges in MySQL (every node that may run a map task connects to MySQL directly, so root must be allowed to connect from each host in the cluster, s101 through s105)
mysql> grant all PRIVILEGES on *.* to root@'s101' identified by 'yinzhengjie';
Query OK, 0 rows affected (0.31 sec)

mysql> grant all PRIVILEGES on *.* to root@'s102' identified by 'yinzhengjie';
Query OK, 0 rows affected (0.02 sec)

mysql> grant all PRIVILEGES on *.* to root@'s103' identified by 'yinzhengjie';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all PRIVILEGES on *.* to root@'s104' identified by 'yinzhengjie';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all PRIVILEGES on *.* to root@'s105' identified by 'yinzhengjie';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)

mysql>
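As a quick sanity check (not part of the original post), you can confirm that the grants took effect and that a worker node can now connect; this assumes the mysql client is installed on the worker hosts:

# Run on a worker node such as s105 to verify connectivity with the password granted above.
mysql -h s101 -u root -pyinzhengjie -e "SELECT host, user FROM mysql.user WHERE user='root';"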
2>. Import the database data into HDFS
[yinzhengjie@s101 ~]$ sqoop import --connect jdbc:mysql://s101/yinzhengjie --username root -P --table word --fields-terminated-by ' ' --target-dir /wc -m 1
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 02:16:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/06/14 02:16:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/06/14 02:16:03 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/06/14 02:16:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `word` AS t LIMIT 1
18/06/14 02:16:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `word` AS t LIMIT 1
18/06/14 02:16:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/hadoop
Note: /tmp/sqoop-yinzhengjie/compile/506dbf41a3a9165eebe93e9d2ec30818/word.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/06/14 02:16:05 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-yinzhengjie/compile/506dbf41a3a9165eebe93e9d2ec30818/word.jar
18/06/14 02:16:05 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/06/14 02:16:05 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/06/14 02:16:05 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/06/14 02:16:05 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/06/14 02:16:05 INFO mapreduce.ImportJobBase: Beginning import of word
18/06/14 02:16:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/06/14 02:16:06 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/06/14 02:16:14 INFO db.DBInputFormat: Using read commited transaction isolation
18/06/14 02:16:14 INFO mapreduce.JobSubmitter: number of splits:1
18/06/14 02:16:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528967628934_0002
18/06/14 02:16:15 INFO impl.YarnClientImpl: Submitted application application_1528967628934_0002
18/06/14 02:16:15 INFO mapreduce.Job: The url to track the job: http://s101:8088/proxy/application_1528967628934_0002/
18/06/14 02:16:15 INFO mapreduce.Job: Running job: job_1528967628934_0002
18/06/14 02:16:22 INFO mapreduce.Job: Job job_1528967628934_0002 running in uber mode : false
18/06/14 02:16:22 INFO mapreduce.Job: map 0% reduce 0%
18/06/14 02:16:27 INFO mapreduce.Job: Task Id : attempt_1528967628934_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: null, message from server: "Host 's105' is not allowed to connect to this MySQL server"
    at org.apache.sqoop.mapreduce.db.DBInputFormat.setDbConf(DBInputFormat.java:170)
    at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:161)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.sql.SQLException: null, message from server: "Host 's105' is not allowed to connect to this MySQL server"
    at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:223)
    at org.apache.sqoop.mapreduce.db.DBInputFormat.setDbConf(DBInputFormat.java:168)
    ... 10 more
Caused by: java.sql.SQLException: null, message from server: "Host 's105' is not allowed to connect to this MySQL server"
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1040)
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2205)
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2236)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2035)
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:790)
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:400)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:330)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:247)
    at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
    at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:216)
    ... 11 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
18/06/14 02:16:35 INFO mapreduce.Job: map 100% reduce 0%
18/06/14 02:16:36 INFO mapreduce.Job: Job job_1528967628934_0002 completed successfully
18/06/14 02:16:36 INFO mapreduce.Job: Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=140325
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=74
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Failed map tasks=1
        Launched map tasks=2
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=8181
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8181
        Total vcore-milliseconds taken by all map tasks=8181
        Total megabyte-milliseconds taken by all map tasks=8377344
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=80
        CPU time spent (ms)=1120
        Physical memory (bytes) snapshot=104509440
        Virtual memory (bytes) snapshot=2086359040
        Total committed heap usage (bytes)=19701760
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=74
18/06/14 02:16:36 INFO mapreduce.ImportJobBase: Transferred 74 bytes in 29.3085 seconds (2.5249 bytes/sec)
18/06/14 02:16:36 INFO mapreduce.ImportJobBase: Retrieved 4 records.
[yinzhengjie@s101 ~]$
Note that the first map attempt failed because host s105 was not yet allowed to connect to MySQL; YARN retried the attempt on another node and the job still completed successfully. We can now check the result in HDFS:
[yinzhengjie@s101 ~]$ hdfs dfs -cat /wc/part-m-00000
1 hello world
2 yinzhengjie hadoop
2 yinzhengjie hive
2 yinzhengjie hbase
[yinzhengjie@s101 ~]$
3>. View the data in the HDFS web UI
4>. Other common parameters
--table                         // MySQL table to import
-m                              // number of mappers
--target-dir                    // target directory in HDFS
--fields-terminated-by          // field (column) separator
--lines-terminated-by           // line (row) separator
--append                        // append to existing data
--as-avrodatafile               // store the output as Avro data files
--as-parquetfile                // store the output as Parquet files
--as-sequencefile               // store the output as SequenceFiles
--as-textfile                   // store the output as plain text (default)
--columns <col,col,col...>      // MySQL columns to import
--compression-codec <codec>     // compression codec to use
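A sketch that combines several of these flags into one import (the column list, target directory and file format are illustrative assumptions, not taken from the post):

# Hedged example: import two assumed columns of the word table as tab-separated text into /wc2.
sqoop import \
  --connect jdbc:mysql://s101/yinzhengjie \
  --username root -P \
  --table word \
  --columns id,word \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --as-textfile \
  --target-dir /wc2 \
  --delete-target-dir \
  -m 1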
IV. Importing MySQL data into Hive with sqoop (HDFS, YARN, MySQL and the other related services must be running; Hive does not need to be started manually, because Sqoop launches it itself during the import)
1>. Modify sqoop-env.sh
[yinzhengjie@s101 ~]$ tail -2 /soft/sqoop/conf/sqoop-env.sh
#ADD BY YINZHENGJIE
export HIVE_CONF_DIR=/soft/hive/conf
[yinzhengjie@s101 ~]$
2>. Edit the environment variables (adding the Hive libraries to HADOOP_CLASSPATH lets the Hive import step that Sqoop launches find the Hive classes)
[yinzhengjie@s101 ~]$ sudo vi /etc/profile
[sudo] password for yinzhengjie:
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ tail -2 /etc/profile
#ADD sqool import hive
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ source /etc/profile
[yinzhengjie@s101 ~]$
3>. Silence the security-related exception messages (optional; leaving them alone does not affect the test results)
4>. Import the data into Hive
0: jdbc:hive2://s101:10000> show tables;
+---------------+--+
|   tab_name    |
+---------------+--+
| pv            |
| user_orc      |
| user_parquet  |
| user_rc       |
| user_seq      |
| user_text     |
| users         |
+---------------+--+
7 rows selected (0.061 seconds)
0: jdbc:hive2://s101:10000>
[yinzhengjie@s101 ~]$ sqoop import --connect jdbc:mysql://s101/yinzhengjie --username root -P --table word --fields-terminated-by ' ' --hive-import --create-hive-table --hive-database yinzhengjie --hive-table wc -m 1
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/06/14 03:00:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/06/14 03:00:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/06/14 03:00:39 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/06/14 03:00:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `word` AS t LIMIT 1
18/06/14 03:00:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `word` AS t LIMIT 1
18/06/14 03:00:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/hadoop
Note: /tmp/sqoop-yinzhengjie/compile/a904d79d3e86841540489a5459400e8b/word.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/06/14 03:00:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-yinzhengjie/compile/a904d79d3e86841540489a5459400e8b/word.jar
18/06/14 03:00:43 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/06/14 03:00:43 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/06/14 03:00:43 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/06/14 03:00:43 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/06/14 03:00:43 INFO mapreduce.ImportJobBase: Beginning import of word
18/06/14 03:00:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/06/14 03:00:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/06/14 03:00:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/06/14 03:00:57 INFO mapreduce.JobSubmitter: number of splits:1
18/06/14 03:00:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528967628934_0005
18/06/14 03:00:59 INFO impl.YarnClientImpl: Submitted application application_1528967628934_0005
18/06/14 03:00:59 INFO mapreduce.Job: The url to track the job: http://s101:8088/proxy/application_1528967628934_0005/
18/06/14 03:00:59 INFO mapreduce.Job: Running job: job_1528967628934_0005
18/06/14 03:01:18 INFO mapreduce.Job: Job job_1528967628934_0005 running in uber mode : false
18/06/14 03:01:18 INFO mapreduce.Job: map 0% reduce 0%
18/06/14 03:01:41 INFO mapreduce.Job: map 100% reduce 0%
18/06/14 03:01:42 INFO mapreduce.Job: Job job_1528967628934_0005 completed successfully
18/06/14 03:01:43 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=140344
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=74
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=19876
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=19876
        Total vcore-milliseconds taken by all map tasks=19876
        Total megabyte-milliseconds taken by all map tasks=20353024
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=82
        CPU time spent (ms)=1120
        Physical memory (bytes) snapshot=89550848
        Virtual memory (bytes) snapshot=2086518784
        Total committed heap usage (bytes)=18808832
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=74
18/06/14 03:01:43 INFO mapreduce.ImportJobBase: Transferred 74 bytes in 58.4206 seconds (1.2667 bytes/sec)
18/06/14 03:01:43 INFO mapreduce.ImportJobBase: Retrieved 4 records.
18/06/14 03:01:43 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table word
18/06/14 03:01:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `word` AS t LIMIT 1
18/06/14 03:01:44 INFO hive.HiveImport: Loading uploaded data into Hive
18/06/14 03:01:45 INFO conf.HiveConf: Found configuration file file:/soft/hive/conf/hive-site.xml
Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
18/06/14 03:01:48 INFO SessionState: Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
18/06/14 03:01:51 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
18/06/14 03:01:57 INFO metastore.ObjectStore: ObjectStore, initialize called
18/06/14 03:01:57 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/06/14 03:01:57 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/06/14 03:02:00 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/06/14 03:02:04 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
18/06/14 03:02:04 INFO metastore.ObjectStore: Initialized ObjectStore
18/06/14 03:02:05 INFO metastore.HiveMetaStore: Added admin role in metastore
18/06/14 03:02:05 INFO metastore.HiveMetaStore: Added public role in metastore
18/06/14 03:02:05 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/06/14 03:02:05 INFO metastore.HiveMetaStore: 0: get_all_functions
18/06/14 03:02:05 INFO HiveMetaStore.audit: ugi=yinzhengjie ip=unknown-ip-addr cmd=get_all_functions
18/06/14 03:02:05 INFO metadata.Hive: Registering function parsejson cn.org.yinzhengjie.udf.ParseJson
18/06/14 03:02:06 WARN metadata.Hive: Failed to register persistent function parsejson:cn.org.yinzhengjie.udf.ParseJson. Ignore and continue.
18/06/14 03:02:06 INFO metadata.Hive: Registering function parsejson cn.org.yinzhengjie.udf.MyUDTF
18/06/14 03:02:06 WARN metadata.Hive: Failed to register persistent function parsejson:cn.org.yinzhengjie.udf.MyUDTF. Ignore and continue.
18/06/14 03:02:06 INFO metadata.Hive: Registering function todate cn.org.yinzhengjie.udf.MyUDTF
18/06/14 03:02:06 WARN metadata.Hive: Failed to register persistent function todate:cn.org.yinzhengjie.udf.MyUDTF. Ignore and continue.
18/06/14 03:02:06 INFO session.SessionState: Created HDFS directory: /tmp/hive/yinzhengjie/ab78aeaa-274a-4ed6-bff0-ffa488a2c8df
18/06/14 03:02:07 INFO session.SessionState: Created local directory: /home/yinzhengjie/yinzhengjie/ab78aeaa-274a-4ed6-bff0-ffa488a2c8df
18/06/14 03:02:07 INFO session.SessionState: Created HDFS directory: /tmp/hive/yinzhengjie/ab78aeaa-274a-4ed6-bff0-ffa488a2c8df/_tmp_space.db
18/06/14 03:02:07 INFO conf.HiveConf: Using the default value passed in for log id: ab78aeaa-274a-4ed6-bff0-ffa488a2c8df
18/06/14 03:02:07 INFO session.SessionState: Updating thread name to ab78aeaa-274a-4ed6-bff0-ffa488a2c8df main
18/06/14 03:02:07 INFO conf.HiveConf: Using the default value passed in for log id: ab78aeaa-274a-4ed6-bff0-ffa488a2c8df
18/06/14 03:02:07 INFO ql.Driver: Compiling command(queryId=yinzhengjie_20180614030207_d95714d9-84da-405e-97b6-d9f36436e2f2): CREATE TABLE `yinzhengjie`.`wc` ( `id` INT, `string` STRING) COMMENT 'Imported by sqoop on 2018/06/14 03:01:43' ROW FORMAT DELIMITED FIELDS TERMINATED BY '
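Once the import finishes, the new Hive table can be checked from beeline; a minimal sketch, assuming the yinzhengjie database and wc table from the command above and the HiveServer2 address used earlier in this post:

# Query the table Sqoop created in Hive (database/table names follow the sqoop command above).
beeline -u jdbc:hive2://s101:10000 -e "select * from yinzhengjie.wc;"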