  sqoop-1.4.6 Installation and Usage

    I. Installation
    1. Download sqoop-1.4.6-bin.tar.gz and unpack it.
    2. Edit conf/sqoop-env.sh and set the following variables:
    export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.3
    export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.3
    export HBASE_HOME=/usr/local/hbase-1.1.3
    export HIVE_HOME=/usr/local/hive-2.0.0
    #export ZOOCFGDIR=
    Alternatively, set the same variables in the user's environment.
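    To check that Sqoop starts and picks up these settings, you can run the built-in version and help tools (the install path below is an assumption; adjust it to wherever the tarball was unpacked):
    $ cd /usr/local/sqoop-1.4.6
    $ bin/sqoop version
    $ bin/sqoop help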

    II. Sqoop Usage
    Sqoop performs its work through the tools under bin/.
    1. Connecting to a database
    Parameters:
    Argument                               Description
    --connect <jdbc-uri>                   Specify JDBC connect string
    --connection-manager <class-name>      Specify connection manager class to use
    --driver <class-name>                  Manually specify JDBC driver class to use
    --hadoop-mapred-home <dir>             Override $HADOOP_MAPRED_HOME
    --help                                 Print usage instructions
    --password-file                        Set path for a file containing the authentication password
    -P                                     Read password from console
    --password <password>                  Set authentication password
    --username <username>                  Set authentication username
    --verbose                              Print more information while working
    --connection-param-file <filename>     Optional properties file that provides connection parameters
    --relaxed-isolation                    Set connection transaction isolation to read uncommitted for the mappers
    $ sqoop import --connect jdbc:mysql://database.example.com/employees
    The hostname in --connect must not be localhost; otherwise every node would query the database on its own machine.
    The secure way to authenticate is to store the database password in a file under /home/${user} and give it 400 permissions, for example:
    $ sqoop import --connect jdbc:mysql://database.example.com/employees \
        --username venkatesh --password-file ${user.home}/.password
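    A minimal sketch of creating that password file (the password value is a placeholder); echo -n avoids a trailing newline, which Sqoop would otherwise treat as part of the password. Note that Sqoop resolves the path against the configured default filesystem, so on some clusters the file may need to be copied to HDFS:
    $ echo -n "secret" > $HOME/.password
    $ chmod 400 $HOME/.password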

    2. Importing data into HDFS
    Parameters:
    --append                         Append data to an existing dataset in HDFS
    --as-avrodatafile                Imports data to Avro Data Files
    --as-sequencefile                Imports data to SequenceFiles
    --as-textfile                    Imports data as plain text (default)
    --as-parquetfile                 Imports data to Parquet Files
    --boundary-query <statement>     Boundary query to use for creating splits (default: select min(<split-by>), max(<split-by>) from <table name>)
    --columns <col,col,col…>         Columns to import from the table, e.g. --columns "name,employee_id,jobtitle"
    --delete-target-dir              Delete the import target directory if it exists
    --direct                         Use the direct connector if one exists for the database
    --fetch-size <n>                 Number of entries to read from the database at once
    --inline-lob-limit <n>           Set the maximum size for an inline LOB
    -m,--num-mappers <n>             Use n map tasks to import in parallel
    -e,--query <statement>           Import the results of statement
    --split-by <column-name>         Column of the table used to split work units (balances the load across mappers on that column). Cannot be used with the --autoreset-to-one-mapper option
    --autoreset-to-one-mapper        Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with the --split-by <col> option
    --table <table-name>             Table to read
    --target-dir <dir>               HDFS destination dir
    --warehouse-dir <dir>            HDFS parent for table destination
    --where <where clause>           WHERE clause to use during import
    -z,--compress                    Enable compression
    --compression-codec <c>          Use Hadoop codec (default gzip)
    --null-string <null-string>      The string to be written for a null value for string columns
    --null-non-string <null-string>  The string to be written for a null value for non-string columns
    Examples:
    1. bin/sqoop list-databases --connect jdbc:mysql://yangxw:3306/mysql --username root --password root
    2. bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table customers --target-dir /mysql_hadoop
    3. $ sqoop import \
        --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
        --split-by a.id --target-dir /user/foo/joinresults
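    A further sketch combining several of the options above on the same customers table (the column names customerNumber, customerName and country are assumptions about the source schema):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table customers --columns "customerNumber,customerName,country" \
        --where "country = 'USA'" \
        --split-by customerNumber -m 4 \
        --delete-target-dir --target-dir /mysql_hadoop/customers_usa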
    Other parameters:
    Controlling column types:
    $ sqoop import ... --map-column-java id=String,value=Integer
    Incremental import:
    Use the append or lastmodified mode (see the sketch below). http://blog.csdn.net/ryantotti/article/details/14226635
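    A minimal sketch of an append-mode incremental import, assuming the orders table has a monotonically increasing orderNumber column (both the check column and the last value are placeholders):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table orders --target-dir /mysql_hadoop/orders \
        --incremental append --check-column orderNumber --last-value 10100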


    Large objects (BLOB, CLOB):
    LOBs up to 16 MB are stored inline with the rest of the data; objects larger than 16 MB are materialized in files under the _lobs subdirectory of the import target, in a format different from regular row data, and each stored object can hold up to 2^63 bytes.
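    The 16 MB inline threshold can be changed with --inline-lob-limit (in bytes); a hedged sketch raising it to 32 MB (the productlines table is only an illustration):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table productlines --target-dir /mysql_hadoop/productlines \
        --inline-lob-limit 33554432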

    3. Importing into Hive
    A Hive import proceeds in two steps:
    dbms -> hdfs -> hive (LOAD DATA INPATH)
    Parameters:
    --hive-home <dir>              Override $HIVE_HOME
    --hive-import                  Import tables into Hive (uses Hive's default delimiters if none are set)
    --hive-overwrite               Overwrite existing data in the Hive table
    --create-hive-table            If set, the job will fail if the target Hive table already exists. By default this property is false
    --hive-table <table-name>      Sets the table name to use when importing to Hive
    --hive-drop-import-delims      Drops \n, \r, and \01 from string fields when importing to Hive
    --hive-delims-replacement      Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
    --hive-partition-key           Name of the Hive field the partitions are sharded on
    --hive-partition-value <v>     String value that serves as the partition key for the data imported into Hive in this job
    --map-column-hive <map>        Override the default mapping from SQL type to Hive type for configured columns
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table products --hive-import --create-hive-table

    If the source data is compressed, the Hive import may not be able to split the work (no parallelism); data compressed with the lzop codec, however, is splittable and can be imported in parallel.
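    A hedged variant that names the target Hive table explicitly, overwrites any existing data, and strips Hive's delimiter characters from string fields (same source table as above):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table products --hive-import --hive-table products \
        --hive-overwrite --hive-drop-import-delims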

    4. Importing into HBase
    Relevant parameters:
    --column-family <family>       Sets the target column family for the import
    --hbase-create-table           If specified, create missing HBase tables
    --hbase-row-key <col>          Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key attributes
    --hbase-table <table-name>     Specifies an HBase table to use as the target instead of HDFS
    --hbase-bulkload               Enables bulk loading
        Sqoop writes each row into HBase with a put operation. By default the split key is used as the row key; if no split key is defined, the primary key is tried. If the source table has a composite key, --hbase-row-key must be set to that composite key. If the target table or column family does not exist in HBase, the job fails; adding --hbase-create-table fixes this. If --hbase-create-table is not used, --column-family must be set, and all output columns are placed in that single column family.
        Each field is serialized to its string representation (as if it were being imported to HDFS in text mode) and written to HBase as UTF-8 bytes; rows in which every column except the row key is null are skipped. To reduce the load on HBase, bulk loading (--hbase-bulkload) can be used instead; see the sketch at the end of this section.
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders4 --hbase-table orders --column-family orders --hbase-create-table
    This failed with the following error: the HBase table could not be created:
    16/03/24 18:30:23 INFO mapreduce.HBaseImportJob: Creating missing HBase table orders
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
        at org.apache.sqoop.mapreduce.HBaseImportJob.jobSetup(HBaseImportJob.java:222)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    This is probably caused by a version incompatibility between Hadoop and HBase: http://www.aboutyun.com/thread-12236-1-1.html
    The workaround is to create the HBase table first:
    hbase(main):002:0> create 'orders','CF1'
    0 row(s) in 1.6730 seconds
    => Hbase::Table - orders
    Then run the import again:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders5 --hbase-table orders --column-family CF1
    This time the import succeeded.
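    As noted above, bulk loading can reduce the load on HBase; a hedged sketch of the same import with --hbase-bulkload (the target directory name is a placeholder):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table orders --target-dir /mysql_hadoop/orders_bulk \
        --hbase-table orders --column-family CF1 --hbase-bulkload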


    5. $CONDITIONS when importing from a database into Hadoop
        A free-form query passed with --query must contain the literal token $CONDITIONS, which Sqoop replaces with its own split conditions. If the query is wrapped in double quotes, $CONDITIONS must be preceded by a backslash (\$CONDITIONS) so that the shell does not treat it as a shell variable.
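    For example (the query mirrors the join example above; the table and column names are placeholders):
    $ sqoop import \
        --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --query "SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE \$CONDITIONS" \
        --split-by a.id --target-dir /user/foo/joinresults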

