zoukankan      html  css  js  c++  java
  • Sqoop笔记

    复制代码

    Sqoop笔记
    1.下载
    https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
    本次下载:https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    解压到/home/hadoop/sqoop
    拷贝sqoop-1.4.7.jar到/home/hadoop/sqoop/lib目录下
    
    2.配置conf/sqoop-env.sh
    export HADOOP_COMMON_HOME=/home/hadoop/hadoop
    export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
    export HIVE_HOME=/home/hadoop/hive
    
    3.配置环境变量~/.bashrc
    验证sqoop
    sqoop version
    
    4.执行命名报错,主要是确实一些tar
    sqoop/lib
    sqoop-1.4.7.jar
    mysql-connector-java-5.1.39-bin.jar
    commons-lang3-3.3.2.jar
    avro-tools-1.8.1.jar 
    下载地址:https://maven.aliyun.com/mvn/search
    
    5.将mysql数据库中的表数据导入到HDFS指定目录
    bin/sqoop import 
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
    --username root 
    --password root 
    --target-dir /sqoop/post 
    --table post --m 1
    
    6.导入表数据子集
    sqoop import 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --where "user_id>1" 
        --target-dir /sqoop/post2 
        --table post 
        --m 1
    
    7.增量导入
    sqoop import  
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --target-dir /sqoop/post 
        --table post 
        --m 1 
        --incremental append 
        --check-column id 
        --last-value 9
    
    hadoop fs -cat /sqoop/post/part-m-*
    
    8.hdfs导出数据到mysql
    sqoop export 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test?characterEncoding=UTF-8 
        --username root 
        --password root 
        --table student 
        --m 1 
        --export-dir /user/hive/warehouse/jstudent
    
    characterEncoding=UTF-8解决导入到mysql的数据中文乱码问题。
    
    
    9.作业(job)环境配置
    配置mysql为metastore
    修改配置文件sqoop-site.xml,以下配置默认是被注释掉的,现在将它都打开:
      <property>
        <name>sqoop.metastore.client.autoconnect.url</name>
        <value>jdbc:mysql://192.168.15.45:3307/sqoop?createDatabaseIfNotExist=true</value>
        <description>The connect string to use when connecting to a
          job-management metastore. If unspecified, uses ~/.sqoop/.
          You can specify a different path here.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.client.autoconnect.username</name>
        <value>root</value>
        <description>The username to bind to the metastore.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.client.autoconnect.password</name>
        <value>root</value>
        <description>The password to bind to the metastore.
        </description>
      </property>
    <property>
        <name>sqoop.metastore.client.record.password</name>
        <value>true</value>
        <description>If true, allow saved passwords in the metastore.
        </description>
      </property>
      <property>
        <name>sqoop.jobbase.serialize.sqoopoptions</name>
        <value>true</value>
        <description>If true, then all options will be serialized into job.xml
        </description>
      </property>
      <property>
        <name>sqoop.metastore.server.location</name>
        <value>/tmp/sqoop-metastore/shared.db</value>
        <description>Path to the shared metastore database files.
        If this is not set, it will be placed in ~/.sqoop/.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.server.port</name>
        <value>16000</value>
        <description>Port that this metastore should listen on.
        </description>
      </property>
    
    10.启动metastore
    sqoop metastore &
    
    11.运行sqoop job --list 报错
    java.io.IOException: Invalid metadata version
    解决办法:
    进入mysql的sqoop数据库中,自动生成了SQOOP_ROOT表
    在表中插入:
    INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0');
    再次运行就不会报错
    
    12.创建job过程又报错,缺少jar包
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-beanutils-1.7.0.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-collections-3.2.2.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-lang-2.6.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-logging-1.2.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/json-20070829.jar /home/hadoop/sqoop/lib/
    
    
    13.创建import的job作业
    sqoop job --create myjob1 
        -- import 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --target-dir /sqoop/zhilian 
        --table zhiliancookies 
        --m 1
    
    14.执行作业
    sqoop job --exec myjob1
    
    15.代码定制
    sqoop codegen --connect jdbc:mysql://192.168.15.45:3307/a_spider_test --username root --password root --table post

     转自:https://www.cnblogs.com/zhangkaipc/p/11890858.html

  • 相关阅读:
    Hive on Spark
    Mongodb添加副本及修改优先级
    RabbitMQ与Spring集成
    最简单的图文教程,几步完成Git的公私钥配置
    Idea Ant 打开发包
    Spring Web 项目Junit测试报错问题
    阿里云maven仓库地址,速度提升100倍
    Spring boot 学习
    JAVA开发常用工具包
    从无到有搭建SSM框架
  • 原文地址:https://www.cnblogs.com/javalinux/p/14923428.html
Copyright © 2011-2022 走看看