zoukankan      html  css  js  c++  java
  • Sqoop笔记

    1.下载
    https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
    本次下载:https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    解压到/home/hadoop/sqoop
    拷贝sqoop-1.4.7.jar到/home/hadoop/sqoop/lib目录下
    
    2.配置conf/sqoop-env.sh
    export HADOOP_COMMON_HOME=/home/hadoop/hadoop
    export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
    export HIVE_HOME=/home/hadoop/hive
    
    3.配置环境变量~/.bashrc
    验证sqoop
    sqoop version
    
    4.执行命名报错,主要是确实一些tar
    sqoop/lib
    sqoop-1.4.7.jar
    mysql-connector-java-5.1.39-bin.jar
    commons-lang3-3.3.2.jar
    avro-tools-1.8.1.jar 
    下载地址:https://maven.aliyun.com/mvn/search
    
    5.将mysql数据库中的表数据导入到HDFS指定目录
    bin/sqoop import 
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
    --username root 
    --password root 
    --target-dir /sqoop/post 
    --table post --m 1
    
    6.导入表数据子集
    sqoop import 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --where "user_id>1" 
        --target-dir /sqoop/post2 
        --table post 
        --m 1
    
    7.增量导入
    sqoop import  
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --target-dir /sqoop/post 
        --table post 
        --m 1 
        --incremental append 
        --check-column id 
        --last-value 9
    
    hadoop fs -cat /sqoop/post/part-m-*
    
    8.hdfs导出数据到mysql
    sqoop export 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test?characterEncoding=UTF-8 
        --username root 
        --password root 
        --table student 
        --m 1 
        --export-dir /user/hive/warehouse/jstudent
    
    characterEncoding=UTF-8解决导入到mysql的数据中文乱码问题。
    
    
    9.作业(job)环境配置
    配置mysql为metastore
    修改配置文件sqoop-site.xml,以下配置默认是被注释掉的,现在将它都打开:
      <property>
        <name>sqoop.metastore.client.autoconnect.url</name>
        <value>jdbc:mysql://192.168.15.45:3307/sqoop?createDatabaseIfNotExist=true</value>
        <description>The connect string to use when connecting to a
          job-management metastore. If unspecified, uses ~/.sqoop/.
          You can specify a different path here.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.client.autoconnect.username</name>
        <value>root</value>
        <description>The username to bind to the metastore.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.client.autoconnect.password</name>
        <value>root</value>
        <description>The password to bind to the metastore.
        </description>
      </property>
    <property>
        <name>sqoop.metastore.client.record.password</name>
        <value>true</value>
        <description>If true, allow saved passwords in the metastore.
        </description>
      </property>
      <property>
        <name>sqoop.jobbase.serialize.sqoopoptions</name>
        <value>true</value>
        <description>If true, then all options will be serialized into job.xml
        </description>
      </property>
      <property>
        <name>sqoop.metastore.server.location</name>
        <value>/tmp/sqoop-metastore/shared.db</value>
        <description>Path to the shared metastore database files.
        If this is not set, it will be placed in ~/.sqoop/.
        </description>
      </property>
      <property>
        <name>sqoop.metastore.server.port</name>
        <value>16000</value>
        <description>Port that this metastore should listen on.
        </description>
      </property>
    
    10.启动metastore
    sqoop metastore &
    
    11.运行sqoop job --list 报错
    java.io.IOException: Invalid metadata version
    解决办法:
    进入mysql的sqoop数据库中,自动生成了SQOOP_ROOT表
    在表中插入:
    INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0');
    再次运行就不会报错
    
    12.创建job过程又报错,缺少jar包
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-beanutils-1.7.0.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-collections-3.2.2.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-lang-2.6.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-logging-1.2.jar /home/hadoop/sqoop/lib/
    cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/json-20070829.jar /home/hadoop/sqoop/lib/
    
    
    13.创建import的job作业
    sqoop job --create myjob1 
        -- import 
        --connect jdbc:mysql://192.168.15.45:3307/a_spider_test 
        --username root 
        --password root 
        --target-dir /sqoop/zhilian 
        --table zhiliancookies 
        --m 1
    
    14.执行作业
    sqoop job --exec myjob1
    
    15.代码定制
    sqoop codegen --connect jdbc:mysql://192.168.15.45:3307/a_spider_test --username root --password root --table post
  • 相关阅读:
    FZU 2098 刻苦的小芳(卡特兰数,动态规划)
    卡特兰数总结
    FZU 1064 教授的测试(卡特兰数,递归)
    HDU 4745 Two Rabbits(区间DP,最长非连续回文子串)
    Java 第十一届 蓝桥杯 省模拟赛 正整数的摆动序列
    Java 第十一届 蓝桥杯 省模拟赛 反倍数
    Java 第十一届 蓝桥杯 省模拟赛 反倍数
    Java 第十一届 蓝桥杯 省模拟赛 反倍数
    Java 第十一届 蓝桥杯 省模拟赛 凯撒密码加密
    Java 第十一届 蓝桥杯 省模拟赛 凯撒密码加密
  • 原文地址:https://www.cnblogs.com/javalinux/p/14990418.html
Copyright © 2011-2022 走看看