zoukankan      html  css  js  c++  java
  • sqoop2-1.99.5-cdh5.5.4.tar.gz的部署搭建

      不多说,直接上干货!

       

      首先,说下博主我,为什么,好端端的Sqoop1用的好好的,然后又安装和学习Sqoop2?

      因为,在Cloudera Hue里的Sqoop,是需要Sqoop2。

    HUE配置文件hue.ini 的sqoop模块详解(图文详解)(分HA集群)

    Sqoop2安装简介

      sqoop2的安装分为server端和client端。
      server端:负责与hadoop集群通信进行数据的迁移,client端负责与用户和server交互。
      client端:不用安装, 只需要将其安装包解压到集群中任何的机器上去,然后对其进行解压即可, 无需其他额外的配置。

       Sqoop分client和server,server安装在Hadoop或Spark集群中的某个节点上,这个节点充当要连接sqoop的入口节点,

      client端不需要安装hadoop。

      本博文是个入门,即只在bigdatamaster上安装server端。当然你也可以假设认为server和client都在bigdatamaster上哈。

      对于Sqoop和hive这样的组件,我一般都是安装在master节点,即在本博客里是bigdatamaster。

    http://archive.cloudera.com/cdh5/cdh/5/sqoop2-1.99.5-cdh5.5.4.tar.gz

     

    [hadoop@bigdatamaster app]$ cd sqoop
    [hadoop@bigdatamaster sqoop]$ pwd
    /home/hadoop/app/sqoop
    [hadoop@bigdatamaster sqoop]$ ll
    total 188
    drwxr-xr-x  2 hadoop hadoop  4096 Apr 26  2016 bin
    -rw-r--r--  1 hadoop hadoop 51764 Apr 26  2016 CHANGELOG.txt
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 client
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 cloudera
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 common
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 common-test
    drwxr-xr-x  9 hadoop hadoop  4096 Apr 26  2016 connector
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 core
    drwxr-xr-x  2 hadoop hadoop  4096 Apr 26  2016 dev-support
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 dist
    drwxr-xr-x  8 hadoop hadoop  4096 Apr 26  2016 docs
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 execution
    -rw-r--r--  1 hadoop hadoop 17253 Apr 26  2016 LICENSE.txt
    -rw-r--r--  1 hadoop hadoop   166 Apr 26  2016 NOTICE.txt
    -rw-r--r--  1 hadoop hadoop 30245 Apr 26  2016 pom.xml
    -rw-r--r--  1 hadoop hadoop  1610 Apr 26  2016 README.txt
    drwxr-xr-x  5 hadoop hadoop  4096 Apr 26  2016 repository
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 security
    drwxr-xr-x 10 hadoop hadoop  4096 Apr 26  2016 server
    drwxr-xr-x  4 hadoop hadoop  4096 Apr 26  2016 shell
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 submission
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 test
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 tomcat
    drwxr-xr-x  3 hadoop hadoop  4096 Apr 26  2016 tools
    [hadoop@bigdatamaster sqoop]$ 

     

       配置环境变量

    [hadoop@bigdatamaster sqoop]$ su root
    Password: 
    [root@bigdatamaster sqoop]# vim /etc/profile

     

    #sqoop1
    #export SQOOP_HOME=/home/hadoop/app/sqoop
    #export PATH=$PATH:$SQOOP_HOME/bin
    
    
    #sqoop2
    export SQOOP_HOME=/home/hadoop/app/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin
    export CATALINA_BASE=/home/hadoop/app/sqoop/server
    export LOGDIR=$SQOOP_HOME/logs/

    对于Sqoop2的server端安装配置(说明)

    • 解压软件包到一个目录下。(我一般是放在/home/hadoop/app下)
    • 修改环境变
      export SQOOP_HOME=/home/hadoop/app/sqoop  (因为我是用的软链接)
      export PATH=$PATH:$SQOOP_HOME/bin
      export CATALINA_BASE=$SQOOP_HOME/server
      export LOGDIR=$SQOOP_HOME/logs/
    • 生效环境变量的配置
    • source 配置文件名
    • 修改sqoop配置:
      vi server/conf/sqoop.properties
      将org.apache.sqoop.submission.engine.mapreduce.configuration.directory后面hadoop的位置修改为自己安装的hadoop配置文件位置,我的为:/home/hadoop/app/hadoop/

    • 修改sqoop读取hadoop的jar包的路径 vi /sqoop/server/conf/catalina.properties
      将common.loader行后的/usr/lib/hadoop/lib/.jar改成自己的hadoop jar 包目录,我的为:
      /home/hadoop/app/hadoop/share/hadoop/common/.jar,
      /home/hadoop/app/hadoop/share/hadoop/common/lib/.jar,
      /home/hadoop/app/hadoop/share/hadoop/hdfs/.jar,
      /home/hadoop/app/hadoop/share/hadoop/hdfs/lib/.jar,
      /home/hadoop/app/hadoop/share/hadoop/mapreduce/.jar,
      /home/hadoop/app/hadoop/share/hadoop/mapreduce/lib/.jar,
      /home/hadoop/app/hadoop/share/hadoop/tools/.jar,
      /home/hadoop/app/hadoop/share/hadoop/tools/lib/.jar,
      /home/hadoop/app/hadoop/share/hadoop/yarn/.jar,
      /home/hadoop/app/hadoopshare/hadoop/yarn/lib/*.jar

    注意: 在修改common.loader的过程中, 不能换行

      本步骤的另外的一种方法是: 直接将上诉的包 拷贝到$SQOOP_HOME/server/lib文件夹内部- 将mysql的连接jar包拷贝的$SQOOP_HOME/lib文件夹中(lib文件夹需要自己创建)到此sqoop就基本配置完成可以直接运行.

    下面是,对sqoop2的配置文件进行配置

      1、修改$SQOOP_HOME/server/conf/catalina.properties文件中的common.loader属性,在其后增加(写到一行)

    $HADOOP_HOME/share/hadoop/common/*.jar,
    $HADOOP_HOME/share/hadoop/common/lib/*.jar,
    $HADOOP_HOME/share/hadoop/yarn/*.jar,
    $HADOOP_HOME/share/hadoop/hdfs/*.jar,
    $HADOOP_HOME,/share/hadoop/mapreduce/*.jar  

    [hadoop@bigdatamaster conf]$ pwd
    /home/hadoop/app/sqoop/server/conf
    [hadoop@bigdatamaster conf]$ ll
    total 216
    -rw-r--r-- 1 hadoop hadoop  10572 May  8  2015 catalina.policy
    -rw-r--r-- 1 hadoop hadoop   4767 Apr 26  2016 catalina.properties
    -rw-r--r-- 1 hadoop hadoop   1395 May  8  2015 context.xml
    -rw-r--r-- 1 hadoop hadoop   3290 May  8  2015 logging.properties
    -rw-r--r-- 1 hadoop hadoop   6528 Apr 26  2016 server.xml
    -rw-r--r-- 1 hadoop hadoop   1586 Apr 26  2016 sqoop_bootstrap.properties
    -rw-r--r-- 1 hadoop hadoop   7483 Apr 26  2016 sqoop.properties
    -rw-r--r-- 1 hadoop hadoop   1530 May  8  2015 tomcat-users.xml
    -rw-r--r-- 1 hadoop hadoop 164089 May  8  2015 web.xml
    [hadoop@bigdatamaster conf]$ vim catalina.properties 

     

    $HADOOP_HOME/share/hadoop/common/*.jar,$HADOOP_HOME/share/hadoop/common/lib/*.jar,$HADOOP_HOME/share/hadoop/yarn/*.jar,$HADOOP_HOME/share/hadoop/hdfs/*.jar,$HADOOP_HOME,/share/hadoop/mapreduce/*.jar

      2、修改$SQOOP_HOME/server/conf/sqoop.properties文件org.apache.sqoop.submission.engine.mapreduce.configuration.directory属性,指向本机hadoop配置目录。

     

    [hadoop@bigdatamaster conf]$ pwd
    /home/hadoop/app/sqoop/server/conf
    [hadoop@bigdatamaster conf]$ ll
    total 216
    -rw-r--r-- 1 hadoop hadoop  10572 May  8  2015 catalina.policy
    -rw-r--r-- 1 hadoop hadoop   4966 May  7 18:28 catalina.properties
    -rw-r--r-- 1 hadoop hadoop   1395 May  8  2015 context.xml
    -rw-r--r-- 1 hadoop hadoop   3290 May  8  2015 logging.properties
    -rw-r--r-- 1 hadoop hadoop   6528 Apr 26  2016 server.xml
    -rw-r--r-- 1 hadoop hadoop   1586 Apr 26  2016 sqoop_bootstrap.properties
    -rw-r--r-- 1 hadoop hadoop   7483 Apr 26  2016 sqoop.properties
    -rw-r--r-- 1 hadoop hadoop   1530 May  8  2015 tomcat-users.xml
    -rw-r--r-- 1 hadoop hadoop 164089 May  8  2015 web.xml
    [hadoop@bigdatamaster conf]$ vim sqoop.properties 

    # Hadoop configuration directory
    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/app/hadoop/conf/

       3、$SQOOP_HOME/server/conf/server.xml是Tomcat的配置文件,端口什么的可以在这个文件设置

      这里,我暂时不设置。

       4、复制mysql-connector-java-5.1.21.jar到$SQOOP_HOME/server/lib/下

    [hadoop@bigdatamaster lib]$ pwd
    /home/hadoop/app/sqoop2-1.99.5-cdh5.5.4/server/lib
    [hadoop@bigdatamaster lib]$ ls
    annotations-api.jar  catalina-ha.jar  catalina-tribes.jar  el-api.jar     jasper.jar   servlet-api.jar                   tomcat-coyote.jar  tomcat-i18n-es.jar  tomcat-i18n-ja.jar
    catalina-ant.jar     catalina.jar     ecj-4.3.1.jar        jasper-el.jar  jsp-api.jar  sqoop-tomcat-1.99.5-cdh5.5.4.jar  tomcat-dbcp.jar    tomcat-i18n-fr.jar
    [hadoop@bigdatamaster lib]$ rz
    
    [hadoop@bigdatamaster lib]$ ls
    annotations-api.jar  catalina.jar         el-api.jar     jsp-api.jar                      sqoop-tomcat-1.99.5-cdh5.5.4.jar  tomcat-i18n-es.jar
    catalina-ant.jar     catalina-tribes.jar  jasper-el.jar  mysql-connector-java-5.1.21.jar  tomcat-coyote.jar                 tomcat-i18n-fr.jar
    catalina-ha.jar      ecj-4.3.1.jar        jasper.jar     servlet-api.jar                  tomcat-dbcp.jar                   tomcat-i18n-ja.jar
    [hadoop@bigdatamaster lib]$ 

    Sqoop2全部配置好之后,按照如下的顺序来

      1、启动Sqoop2的server(我这里做个最简单的,在bigdatamaster上)。启动sqoop服务

    $SQOOP_HOME/bin/sqoop.sh server start  

     

    [hadoop@bigdatamaster hadoop]$ cd $SQOOP_HOME
    [hadoop@bigdatamaster sqoop]$ pwd
    /home/hadoop/app/sqoop
    [hadoop@bigdatamaster sqoop]$ $SQOOP_HOME/bin/sqoop.sh server start
    Sqoop home directory: /home/hadoop/app/sqoop
    Setting SQOOP_HTTP_PORT:     12000
    Setting SQOOP_ADMIN_PORT:     12001
    Using   CATALINA_OPTS:       
    Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
    Using CATALINA_BASE:   /home/hadoop/app/sqoop/server
    Using CATALINA_HOME:   /home/hadoop/app/sqoop/server
    Using CATALINA_TMPDIR: /home/hadoop/app/sqoop/server/temp
    Using JRE_HOME:        /home/hadoop/app/jdk
    Using CLASSPATH:       /home/hadoop/app/sqoop/server/bin/bootstrap.jar
    [hadoop@bigdatamaster sqoop]$ jps
    2200 QuorumPeerMain
    2339 NameNode
    2973 Bootstrap
    2984 Jps
    2495 SecondaryNameNode
    2669 ResourceManager
    [hadoop@bigdatamaster sqoop]$ 

      jsp命令看到Bootstrap进程。

      2、启动Sqoop2的client(我这里做个最简单的,在bigdatamaster上),进入客户端交互模式。进入sqoop控制台  

    $SQOOP_HOME/bin/sqoop.sh client  

     

    [hadoop@bigdatamaster sqoop]$ pwd
    /home/hadoop/app/sqoop
    [hadoop@bigdatamaster sqoop]$ $SQOOP_HOME/bin/sqoop.sh client
    Sqoop home directory: /home/hadoop/app/sqoop
    May 07, 2017 8:26:47 PM java.util.prefs.FileSystemPreferences$1 run
    INFO: Created user preferences directory.
    Sqoop Shell: Type 'help' or 'h' for help.
    
    sqoop:000> 

      3、在Sqoop2的client连接Sqoop2的server 。 连接服务器  

    sqoop:000> set server --host bigdatamaster  --port 12000  --webapp sqoop

    sqoop:000> set server --host localhost  --port 12000  --webapp sqoop
    sqoop:000> show version --all     

      当看到show version -all正确的显示 就说明了Sqoop2的client连接上了Sqoop2的服务器。

      show version --all            显示服务器、客户端的版本信息,如果server显示错误,

        重启一下 server./sqoop.sh server stop  

     

    [hadoop@bigdatamaster sqoop]$ pwd
    /home/hadoop/app/sqoop
    [hadoop@bigdatamaster sqoop]$ $SQOOP_HOME/bin/sqoop.sh client
    Sqoop home directory: /home/hadoop/app/sqoop
    May 07, 2017 8:26:47 PM java.util.prefs.FileSystemPreferences$1 run
    INFO: Created user preferences directory.
    Sqoop Shell: Type 'help' or 'h' for help.
    
    sqoop:000> set server --host bigdatamaster  --port 12000  --webapp sqoop
    Server is set successfully
    sqoop:000> show version --all
    client version:
      Sqoop 1.99.5-cdh5.5.4 source revision 9e81b8b90b040997e8d6080e9bf43c7723bb382f 
      Compiled by jenkins on Mon Apr 25 11:14:57 PDT 2016
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Exception has occurred during processing command 
    Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0000:An unknown error has occurred
    sqoop:000>  
    sqoop:000> show connector --all   查看连接器  
    sqoop:000> show connection --all  查看连接 
    sqoop:000> show connection --xid 1  查看id为1的连接  
    sqoop:000> create connection --cid 1  创建id为1的连接 
    <pre name="code" class="java">Creating connection for connector with id 1    
    Please fill following values to create new connection object    
    Name: mysql  --输入名称    
        
    Connection configuration    
        
    JDBC Driver Class: com.mysql.jdbc.Driver  --输入    
    JDBC Connection String: jdbc:mysql://bigdatamaster:3306/sqoop --输入    
    Username: root --输入    
    Password: ****** --输入    
    JDBC Connection Properties:     
    There are currently 0 values in the map:    
    entry#     
        
    Security related configuration options    
        
    Max connections: 20 --输入    
    New connection was successfully created with validation status FINE and persistent id 1  
    sqoop:000> create job --xid 1 --type import  
    Creating job for connection with id 1    
    Please fill following values to create new job object    
    Name: mysql_job    
        
    Database configuration    
        
    Schema name:     
    Table name: userinfo  要全量导出一张表,请填写表名,table name 和 table sql statement不能同时配置  
    Table SQL statement:   如果填写格式必须为 select * from userinfo where ${CONDITIONS}  
    Table column names:     
    Partition column name: id  使用哪个字段来填充过滤条件 userid  
    Nulls in partition column:    
    Boundary query:  如果选择sql方式,这里要写一个查询语句,返回值需为整形,sqoop运行job时,会自动填充${CONDITIONS} 这个占位符,如:select 0,3 from userinfo  
      
        
    Output configuration    
        
    Storage type:     
      0 : HDFS    
    Choose: 0    
    Output format:     
      0 : TEXT_FILE    
      1 : SEQUENCE_FILE    
    Choose: 1    
    Compression format:     
      0 : NONE    
      1 : DEFAULT    
      2 : DEFLATE    
      3 : GZIP    
      4 : BZIP2    
      5 : LZO    
      6 : LZ4    
      7 : SNAPPY    
    Choose: 0    
    Output directory: /home/hadoop/out    
        
    Throttling resources    
        
    Extractors:     
    Loaders:     
    New job was successfully created with validation status FINE  and persistent id 1   
    sqoop:000> start job --jid 1     启动  
    Submission details    
    Job ID: 1    
    Server URL: http://localhost:12000/sqoop/  
    <span style="font-family: Consolas, 'Courier New', Courier, mono, serif; line-height: 18px;">hadoop fs -ls /mysql/out </span>  

      更详细,请见

    http://sqoop.apache.org/docs/1.99.5/Sqoop5MinutesDemo.html

  • 相关阅读:
    jupyter同时使用python2、3
    [python之ipython] jupyter notebook在云端服务器上开启,本地访问
    Transformer的PyTorch实现--转载
    二叉树中和为某一值的路径
    hadoop初识笔记
    大数据初识笔记
    mysql快速入门笔记
    118.Java反射-工厂模式
    117.Java观察者设计模式
    116.Java对象的拷贝
  • 原文地址:https://www.cnblogs.com/zlslch/p/6821763.html
Copyright © 2011-2022 走看看