zoukankan      html  css  js  c++  java
  • Sqoop安装与应用过程



     

    1.  参考说明

    参考文档:

     

    http://sqoop.apache.org/

     

    http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

     

    2.  安装环境说明

    2.1.  环境说明

     

    CentOS7.4+ Hadoop2.7.5的伪分布式环境

     

    主机名

    NameNode

    SecondaryNameNode

    DataNodes

    centoshadoop.smartmap.com

    192.168.1.80

    192.168.1.80

    192.168.1.80

     

     

     

     

     

    Hadoop的安装目录为:/opt/hadoop/hadoop-2.7.5

     

    3.  安装

    Sqoop由两部分组成:客户端(client)和服务端(server)。需要在集群的其中某个节点上安装server,该节点的服务端可以作为其他Sqoop客户端的入口点。在服务端的节点上必须安装有Hadoop。客户端可以安装在任意数量的机子上。在装有客户端的机子上不需要安装Hadoop

    3.1.  Sqoop下载

    http://sqoop.apache.org/

     

    3.2.  Sqoop解压

     

    将下载的sqoop-1.99.7-bin-hadoop200.tar.gz解压到/opt/hadoop/sqoop-1.99.7目录下

     

    4.  配置

    4.1.  修改Hadoop配置

    4.1.1.  修改core-site.xml文件

    [root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml

     

    <property>

           <name>hadoop.proxyuser.sqoop2.hosts</name>

           <value>*</value>

    </property>

    <property>

           <name>hadoop.proxyuser.sqoop2.groups</name>

           <value>*</value>

    </property>

    <property>

           <name>hadoop.proxyuser.root.hosts</name>

           <value>*</value>

    </property>

    <property>

           <name>hadoop.proxyuser.root.groups</name>

           <value>*</value>

    </property>

     

    4.1.2.  修改container-executor.cfg文件

    [root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg

     

    allowed.system.users=sqoop2,hadoop,root

     

    4.1.3.  修改yarn-site.xml文件

     

    [root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml

     

    4.2.  创建第三方数据库驱动目录

     

    [root@centoshadoop sqoop-1.99.7]# mkdir /opt/hadoop/sqoop-1.99.7/jdbcDriver

     

    [root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx /opt/hadoop/sqoop-1.99.7/jdbcDriver

     

    4.3.  上传驱动到存放JDBC驱动的目录下

    上传软件包mysql-connector-java-5.1.46.jar/opt/hadoop/sqoop-1.99.7/jdbcDriver目录下

     

    4.4.  修改profile文件

    vi /etc/profile

     

    # export JAVA_HOME=/opt/java/jdk1.7.0_45

    export JAVA_HOME=/opt/java/jdk1.8.0_171

    # export CLASSPATH=.:$JAVA_HOME/lib

    export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5

    export HADOOP_INSTALL=$HADOOP_HOME

    export HADOOP_MAPRED_HOME=$HADOOP_HOME

    export HADOOP_COMMON_HOME=$HADOOP_HOME

    export HADOOP_HDFS_HOME=$HADOOP_HOME

    export YARN_HOME=$HADOOP_HOME

    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

     

    export PIG_HOME=/opt/hadoop/pig-0.17.0

    export HIVE_HOME=/opt/hadoop/hive-2.3.3

     

    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    export PATH=$PATH:$PIG_HOME/bin

    export PATH=$PATH:$HIVE_HOME/bin

     

    export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7

    export PATH=$PATH:$SQOOP_HOME/bin

    export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib

    export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/

     

    export CLASSPATH=.:$JAVA_HOME/lib:$HIVE_HOME/lib

    export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native

    export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH

    export HADOOP_USER_NAME=hadoop

     

    4.5.  JDK升级为1.8版本

     

    JDK切换成1.8的版本,并修改所有与JAVA_HOME相关的变量

     

    4.6.  修改Sqoop的配置文件

     

    [root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties

     

    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/

     

     

    [root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh

     

    export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common

    export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs

    export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce

    export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn

     

    5.  启动Hadoop

     

    5.1.  启动YARNHDFS

    cd /opt/hadoop/hadoop-2.7.5/sbin

     

    start-all.sh

     

    5.2.  启动historyserver

     

    cd /opt/hadoop/hadoop-2.7.5/sbin

     

    mr-jobhistory-daemon.sh start historyserver

     

    6.  验证Sqoop安装与配置

     

    [root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify

    Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

    Sqoop home directory: /opt/hadoop/sqoop-1.99.7

    Sqoop tool executor:

            Version: 1.99.7

            Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

            Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine

    Running tool: class org.apache.sqoop.tools.tool.VerifyTool

    1   [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.

    6   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

    SLF4J: Class path contains multiple SLF4J bindings.

    SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

    Verification was successful.

    Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

    7.  初始化SqoopRepository

    [root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade

    Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

    Sqoop home directory: /opt/hadoop/sqoop-1.99.7

    Sqoop tool executor:

            Version: 1.99.7

            Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

            Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine

    Running tool: class org.apache.sqoop.tools.tool.UpgradeTool

    0   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

    SLF4J: Class path contains multiple SLF4J bindings.

    SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

    Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

    [root@centoshadoop sqoop-1.99.7]#

     

     

    8.  应用Sqoop工具

    8.1.  服务端

    8.1.1.  启动与关闭sqoop2-server

    [root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

    [root@centoshadoop bin]# sqoop2-server start

    Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

    Sqoop home directory: /opt/hadoop/sqoop-1.99.7

    Starting the Sqoop2 server...

     

    [root@centoshadoop bin]# sqoop2-server stop

    Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

    Sqoop home directory: /opt/hadoop/sqoop-1.99.7

    Stopping the Sqoop2 server...

    Sqoop2 server stopped.

     

    [root@centoshadoop bin]# jps

    29780 NameNode

    30085 SecondaryNameNode

    30247 ResourceManager

    31767 SqoopJettyServer

    29913 DataNode

    32107 Jps

    30364 NodeManager

    29245 JobHistoryServer

    [root@centoshadoop bin]#

     

    8.2.  客户端

    8.2.1.  启动运行Sqoop的交互式Shell环境

     

    [root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

    [root@centoshadoop bin]# sqoop2-shell

    sqoop:000> :exit

     

    8.2.2.  设置错误显示

    sqoop:000> set option --name verbose --value true

    Verbose option was changed to true

     

    8.2.3.  连接Sqoop服务端

     

    sqoop:000> set server -host 192.168.1.80

    Server is set successfully

     

    # set server --host ubuntu02 --port 12000 --webapp sqoop

     

    sqoop:000> show version --all

    client version:

     Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

     Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016

    server version:

     Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

     Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016

    API versions:

      [v1]

     

    8.2.4.  创建MySQLLink对象

     

    sqoop:000> show connector

    +------------------------+---------+------------------------------------------------------------+----------------------+

    |          Name          | Version |                           Class                            | Supported Directions |

    +------------------------+---------+------------------------------------------------------------+----------------------+

    | generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |

    | kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |

    | oracle-jdbc-connector  | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |

    | ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |

    | hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |

    | kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |

    | sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |

    +------------------------+---------+------------------------------------------------------------+----------------------+

    sqoop:000>

     

    sqoop:000> create link -c generic-jdbc-connector

    Creating link for connector with name generic-jdbc-connector

    Please fill following values to create new link object

    Name: mysql-link

     

    Database connection

     

    Driver class: com.mysql.jdbc.Driver

    Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false

    Username: root

    Password: ***

    Fetch Size:

    Connection Properties:

    There are currently 0 values in the map:

    entry#

    SQL Dialect

     

    Identifier enclose:       # 空格

    New link was successfully created with validation status OK and name mysql-link

    sqoop:000>

     

    8.2.5.  创建HDFSLink对象

     

    sqoop:000> create link -connector hdfs-connector

    Creating link for connector with name hdfs-connector

    Please fill following values to create new link object

    Name: hdfs-link

     

    HDFS cluster

     

    URI: hdfs://192.168.1.80:9000

    Conf directory:

    Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop

    There are currently 0 values in the map:

    entry#

    New link was successfully created with validation status OK and name hdfs-link

    sqoop:000>

     

    8.2.6.  显示Link对象

     

    sqoop:000> show link --all

    2 link(s) to show:

    link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)

    Using Connector generic-jdbc-connector with name {1}

     Database connection

       Driver class: com.mysql.jdbc.Driver

       Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false

       Username: root

       Password:

       Fetch Size:

       Connection Properties:

         protocol = tcp

      SQL Dialect

       Identifier enclose: `

    link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)

    Using Connector hdfs-connector with name {1}

      HDFS cluster

       URI: hdfs://192.168.1.80:9000

       Conf directory:

       Additional configs::

    sqoop:000>

     

     

    8.2.7.  创建MySQL导数据到HDFSJob

     

    8.2.7.1. 准备数据

     

    create database sqooptest character set utf8;

     

    CREATE TABLE topic (

             id int(11) NOT NULL AUTO_INCREMENT,

             name varchar(50) DEFAULT NULL,

             address varchar(100) DEFAULT NULL,

             time datetime DEFAULT NULL,

             content varchar(500) DEFAULT NULL,

             PRIMARY KEY(id)

    )ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;

     

    INSERT INTO topic(name, address, time, content) values('11', '111', '2018-05-01 08:00:00', 'hello world');

     

    INSERT INTO topic(name, address, time, content) values('11', '111', '2018-05-01 08:00:00', NULL);

     

    8.2.7.2. 创建任务(MySQLàHDFS

     

    sqoop:000> create job -f mysql-link -t hdfs-link

    Creating job for links with from name mysql-link and to name hdfs-link

    Please fill following values to create new job object

    Name:  mysql2hdfs

     

    Database source

     

    Schema name: sqooptest   #必填,数据库名称

    Table name: topic  #必填,表名

    SQL statement:

    Column names:

    There are currently 0 values in the list:

    element#

    Partition column: id

    Partition column nullable:

    Boundary query:

     

    Incremental read

     

    Check column:

    Last value:

     

    Target configuration

     

    Override null value:

    Null value:

    File format:

      0 : TEXT_FILE

      1 : SEQUENCE_FILE

      2 : PARQUET_FILE

    Choose: 0

    Compression codec:

      0 : NONE

      1 : DEFAULT

      2 : DEFLATE

      3 : GZIP

      4 : BZIP2

      5 : LZO

      6 : LZ4

      7 : SNAPPY

      8 : CUSTOM

    Choose: 0

    Custom codec:

    Output directory:/sqoop/sqooptest/mysqltable/    #必填

    Append mode:

     

    Throttling resources

     

    Extractors:

    Loaders:

     

    Classpath configuration

     

    Extra mapper jars:

    There are currently 0 values in the list:

    element#

    New job was successfully created with validation status OK  and name  mysql2hdfs

    sqoop:000>

     

    8.2.7.3. 启动Job

     

    [root@centoshadoop sbin]# hadoop fs –mkdir -p /sqoop/sqooptest/mysqltable

    [root@centoshadoop sbin]# hadoop fs –chmod –R 777 /sqoop

     

    sqoop:000> show job

    +----+-------------------------+-------------------------------------+----------------------------+---------+

    | Id |          Name           |           From Connector            |        To Connector        | Enabled |

    +----+-------------------------+-------------------------------------+----------------------------+---------+

    | 1  |  mysql2hdfs | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |

    +----+-------------------------+-------------------------------------+----------------------------+---------+

    sqoop:000>

     

    sqoop:000> set option --name verbose --value true

    Verbose option was changed to true

    sqoop:000> start job -name mysql2hdfs

    Submission details

    Job Name: mysql2hdfs

    Server URL: http://localhost:12000/sqoop/

    Created by: root

    Creation date: 2018-05-24 08:57:14 CST

    Lastly updated by: root

    External ID: job_1527123411152_0002

            http://centoshadoop:8088/proxy/application_1527123411152_0002/

    Source Connector schema: Schema{name=  sqooptest  .  topic  ,columns=[

            FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},

            Text{name=name,nullable=true,type=TEXT,charSize=null},

            Text{name=address,nullable=true,type=TEXT,charSize=null},

            Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},

            Text{name=content,nullable=true,type=TEXT,charSize=null}]}

    2018-05-24 08:57:14 CST: BOOTING  - Progress is not available

    sqoop:000>

     

    8.2.7.4. 查看结果

     

    [root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable

    Found 1 items

    -rw-r--r--   1 root supergroup         97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

    [root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

    3,'11','111','2018-05-01 08:00:00.000','hello world'

    4,'11','111','2018-05-01 08:00:00.000',NULL

    [root@centoshadoop sbin]#

     

     

     

     

  • 相关阅读:
    从干将莫邪的故事说起--java比较操作注意要点
    我又不是你的谁--java instanceof操作符用法揭秘
    色即是空,空即是色---java有关null的几件小事
    大头儿子和小头爸爸的战斗--java字符和字符串
    你的环境有问题吧?--byte数组转字符串的疑惑
    两小无猜的爱恨情仇--java =+和+=揭秘
    java程序猿如何练习java版的易筋经?
    孙悟空的七十二变是那般?--java类型的七十二变揭秘
    leetcode 341. Flatten Nested List Iterator
    leetcode 44. Wildcard Matching
  • 原文地址:https://www.cnblogs.com/gispathfinder/p/9081085.html
Copyright © 2011-2022 走看看