  sqoop-1.4.6 Installation and Usage

    I. Installation
    1. Download sqoop-1.4.6-bin.tar.gz and unpack it.
    2. Edit conf/sqoop-env.sh and set the following variables:
    export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.3
    export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.3
    export HBASE_HOME=/usr/local/hbase-1.1.3
    export HIVE_HOME=/usr/local/hive-2.0.0
    #export ZOOCFGDIR=
    Alternatively, set the same variables in the user's environment.
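    To check that Sqoop starts and picks up these settings, you can run the built-in version and help tools (the install path below is an assumption; adjust it to wherever the tarball was unpacked):
    $ cd /usr/local/sqoop-1.4.6
    $ bin/sqoop version
    $ bin/sqoop help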

    II. Sqoop Usage
    Sqoop performs its work through the tools under bin/.
    1. Connecting to a database
    Parameters:
    Argument                               Description
    --connect <jdbc-uri>                   Specify JDBC connect string
    --connection-manager <class-name>      Specify connection manager class to use
    --driver <class-name>                  Manually specify JDBC driver class to use
    --hadoop-mapred-home <dir>             Override $HADOOP_MAPRED_HOME
    --help                                 Print usage instructions
    --password-file                        Set path for a file containing the authentication password
    -P                                     Read password from console
    --password <password>                  Set authentication password
    --username <username>                  Set authentication username
    --verbose                              Print more information while working
    --connection-param-file <filename>     Optional properties file that provides connection parameters
    --relaxed-isolation                    Set connection transaction isolation to read uncommitted for the mappers
    $ sqoop import --connect jdbc:mysql://database.example.com/employees
    The hostname in --connect must not be localhost; otherwise every node would query the database on its own machine.
    The secure way to authenticate is to store the database password in a file under /home/${user} and give it 400 permissions, for example:
    $ sqoop import --connect jdbc:mysql://database.example.com/employees \
        --username venkatesh --password-file ${user.home}/.password
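    A minimal sketch of creating that password file (the password value is a placeholder); echo -n avoids a trailing newline, which Sqoop would otherwise treat as part of the password. Note that Sqoop resolves the path against the configured default filesystem, so on some clusters the file may need to be copied to HDFS:
    $ echo -n "secret" > $HOME/.password
    $ chmod 400 $HOME/.password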

    2. Importing data into HDFS
    Parameters:
    --append                         Append data to an existing dataset in HDFS
    --as-avrodatafile                Imports data to Avro Data Files
    --as-sequencefile                Imports data to SequenceFiles
    --as-textfile                    Imports data as plain text (default)
    --as-parquetfile                 Imports data to Parquet Files
    --boundary-query <statement>     Boundary query to use for creating splits (default: select min(<split-by>), max(<split-by>) from <table name>)
    --columns <col,col,col…>         Columns to import from the table, e.g. --columns "name,employee_id,jobtitle"
    --delete-target-dir              Delete the import target directory if it exists
    --direct                         Use the direct connector if one exists for the database
    --fetch-size <n>                 Number of entries to read from the database at once
    --inline-lob-limit <n>           Set the maximum size for an inline LOB
    -m,--num-mappers <n>             Use n map tasks to import in parallel
    -e,--query <statement>           Import the results of statement
    --split-by <column-name>         Column of the table used to split work units (balances the load across mappers on that column). Cannot be used with the --autoreset-to-one-mapper option
    --autoreset-to-one-mapper        Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with the --split-by <col> option
    --table <table-name>             Table to read
    --target-dir <dir>               HDFS destination dir
    --warehouse-dir <dir>            HDFS parent for table destination
    --where <where clause>           WHERE clause to use during import
    -z,--compress                    Enable compression
    --compression-codec <c>          Use Hadoop codec (default gzip)
    --null-string <null-string>      The string to be written for a null value for string columns
    --null-non-string <null-string>  The string to be written for a null value for non-string columns
    Examples:
    1. bin/sqoop list-databases --connect jdbc:mysql://yangxw:3306/mysql --username root --password root
    2. bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table customers --target-dir /mysql_hadoop
    3. $ sqoop import \
        --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
        --split-by a.id --target-dir /user/foo/joinresults
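    A further sketch combining several of the options above on the same customers table (the column names customerNumber, customerName and country are assumptions about the source schema):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table customers --columns "customerNumber,customerName,country" \
        --where "country = 'USA'" \
        --split-by customerNumber -m 4 \
        --delete-target-dir --target-dir /mysql_hadoop/customers_usa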
    Other parameters:
    Controlling column types:
    $ sqoop import ... --map-column-java id=String,value=Integer
    Incremental import:
    Use the append or lastmodified mode (see the sketch below). http://blog.csdn.net/ryantotti/article/details/14226635
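    A minimal sketch of an append-mode incremental import, assuming the orders table has a monotonically increasing orderNumber column (both the check column and the last value are placeholders):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table orders --target-dir /mysql_hadoop/orders \
        --incremental append --check-column orderNumber --last-value 10100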


    Large objects (BLOB, CLOB):
    LOBs up to 16 MB are stored inline with the rest of the data; objects larger than 16 MB are materialized in files under the _lobs subdirectory of the import target, in a format different from regular row data, and each stored object can hold up to 2^63 bytes.
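    The 16 MB inline threshold can be changed with --inline-lob-limit (in bytes); a hedged sketch raising it to 32 MB (the productlines table is only an illustration):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table productlines --target-dir /mysql_hadoop/productlines \
        --inline-lob-limit 33554432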

    3. Importing into Hive
    A Hive import proceeds in two steps:
    dbms -> hdfs -> hive (LOAD DATA INPATH)
    Parameters:
    --hive-home <dir>              Override $HIVE_HOME
    --hive-import                  Import tables into Hive (uses Hive's default delimiters if none are set)
    --hive-overwrite               Overwrite existing data in the Hive table
    --create-hive-table            If set, the job will fail if the target Hive table already exists. By default this property is false
    --hive-table <table-name>      Sets the table name to use when importing to Hive
    --hive-drop-import-delims      Drops \n, \r, and \01 from string fields when importing to Hive
    --hive-delims-replacement      Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
    --hive-partition-key           Name of the Hive field the partitions are sharded on
    --hive-partition-value <v>     String value that serves as the partition key for the data imported into Hive in this job
    --map-column-hive <map>        Override the default mapping from SQL type to Hive type for configured columns
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table products --hive-import --create-hive-table

    If the source data is compressed, the Hive import may not be able to split the work (no parallelism); data compressed with the lzop codec, however, is splittable and can be imported in parallel.
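    A hedged variant that names the target Hive table explicitly, overwrites any existing data, and strips Hive's delimiter characters from string fields (same source table as above):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table products --hive-import --hive-table products \
        --hive-overwrite --hive-drop-import-delims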

    4. Importing into HBase
    Relevant parameters:
    --column-family <family>       Sets the target column family for the import
    --hbase-create-table           If specified, create missing HBase tables
    --hbase-row-key <col>          Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key attributes
    --hbase-table <table-name>     Specifies an HBase table to use as the target instead of HDFS
    --hbase-bulkload               Enables bulk loading
        Sqoop writes each row into HBase with a put operation. By default the split key is used as the row key; if no split key is defined, the primary key is tried. If the source table has a composite key, --hbase-row-key must be set to that composite key. If the target table or column family does not exist in HBase, the job fails; adding --hbase-create-table fixes this. If --hbase-create-table is not used, --column-family must be set, and all output columns are placed in that single column family.
        Each field is serialized to its string representation (as if it were being imported to HDFS in text mode) and written to HBase as UTF-8 bytes; rows in which every column except the row key is null are skipped. To reduce the load on HBase, bulk loading (--hbase-bulkload) can be used instead; see the sketch at the end of this section.
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders4 --hbase-table orders --column-family orders --hbase-create-table
    This failed with the following error: the HBase table could not be created:
    16/03/24 18:30:23 INFO mapreduce.HBaseImportJob: Creating missing HBase table orders
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
        at org.apache.sqoop.mapreduce.HBaseImportJob.jobSetup(HBaseImportJob.java:222)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    This is probably caused by a version incompatibility between Hadoop and HBase: http://www.aboutyun.com/thread-12236-1-1.html
    The workaround is to create the HBase table first:
    hbase(main):002:0> create 'orders','CF1'
    0 row(s) in 1.6730 seconds
    => Hbase::Table - orders
    Then run the import again:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders5 --hbase-table orders --column-family CF1
    This time the import succeeded.
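    As noted above, bulk loading can reduce the load on HBase; a hedged sketch of the same import with --hbase-bulkload (the target directory name is a placeholder):
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --table orders --target-dir /mysql_hadoop/orders_bulk \
        --hbase-table orders --column-family CF1 --hbase-bulkload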


    5. $CONDITIONS when importing from a database into Hadoop
        A free-form query passed with --query must contain the literal token $CONDITIONS, which Sqoop replaces with its own split conditions. If the query is wrapped in double quotes, $CONDITIONS must be preceded by a backslash (\$CONDITIONS) so that the shell does not treat it as a shell variable.
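    For example (the query mirrors the join example above; the table and column names are placeholders):
    $ sqoop import \
        --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root \
        --query "SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE \$CONDITIONS" \
        --split-by a.id --target-dir /user/foo/joinresults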

