  • Sqoop: MySQL/Hive/HDFS import and export operations

    Preface: set up the environment. Here we use the CDH builds of Hadoop + Hive + Sqoop, plus MySQL.

    Download hadoop-2.5.0-cdh5.3.6.tar.gz
       hive-0.13.1-cdh5.3.6.tar.gz
       sqoop-1.4.5-cdh5.3.6.tar.gz
    Configure Hadoop
       *.env (3 files: hadoop-env.sh, mapred-env.sh, yarn-env.sh) -- set the JDK path
       core-site.xml
        fs.defaultFS
        hadoop.tmp.dir
       hdfs-site.xml
        dfs.replication
       mapred-site.xml
        mapreduce.framework.name -- yarn
        mapreduce.jobhistory.address  # 10020
        mapreduce.jobhistory.webapp.address  # 19888
       yarn-site.xml
        yarn.resourcemanager.hostname
        yarn.nodemanager.aux-services -- mapreduce_shuffle
        yarn.log-aggregation-enable -- true
        yarn.log-aggregation.retain-seconds -- 108600
       slaves
        host addresses
    PS: format the NameNode, then start HDFS and YARN
       $ bin/hdfs dfs -mkdir /tmp
       $ bin/hdfs dfs -mkdir -p /user/hive/warehouse
       $ bin/hdfs dfs -chmod g+w /tmp
       $ bin/hdfs dfs -chmod g+w /user/hive/warehouse
    Configure Hive
       hive-env.sh
        HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf
       hive-log4j.properties
        hive.log.threshold=ALL
        hive.root.logger=INFO,DRFA
        hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
        hive.log.file=hive.log
       hive-site.xml  # deploy MySQL beforehand
        javax.jdo.option.ConnectionURL -- jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
        javax.jdo.option.ConnectionDriverName -- com.mysql.jdbc.Driver
        javax.jdo.option.ConnectionUserName
        javax.jdo.option.ConnectionPassword
        hive.cli.print.header -- true
        hive.cli.print.current.db -- true
        hive.fetch.task.conversion -- more
    PS: in the hive directory
       $ mkdir logs
       Put the prepared MySQL JDBC driver jar into lib
       Start: $ bin/hive
    Configure Sqoop
       sqoop-env.sh
        export HADOOP_COMMON_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HADOOP_MAPRED_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HIVE_HOME=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6
       Put the prepared MySQL JDBC driver jar into lib
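    Before moving on, a quick smoke test helps confirm the stack is wired together (a sketch; it assumes the daemons run on this machine and that MySQL listens on 10.0.0.108 as in the examples below):

       $ jps                                  # expect NameNode, DataNode, ResourceManager, NodeManager
       $ bin/hdfs dfs -ls /user/hive/warehouse
       $ bin/sqoop version
       # verify the JDBC driver and credentials work end to end
       $ bin/sqoop list-databases --connect jdbc:mysql://10.0.0.108:3306/ --username root --password root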

    I. Prepare the data

    # Create a database and table in my MySQL instance and insert a few rows
      mysql> create database if not exists student default character set utf8 collate utf8_general_ci;
      mysql> use student;
      mysql> create table if not exists stu_info( id int(10) primary key not null auto_increment, name varchar(20) not null) default character set utf8 collate utf8_general_ci;
      mysql> insert into stu_info(name) values("李建");
      mysql> insert into stu_info(name) values("张明");
      mysql> insert into stu_info(name) values("赵兴");
      mysql> insert into stu_info(name) values("陈琦");
      mysql> insert into stu_info(name) values("刘铭");
      mysql> select id,name from stu_info;
      +----+--------+
      | id | name   |
      +----+--------+
      |  1 | 李建   |
      |  2 | 张明   |
      |  3 | 赵兴   |
      |  4 | 陈琦   |
      |  5 | 刘铭   |
      +----+--------+
      5 rows in set (0.00 sec)
    

    II. Use Sqoop to import the MySQL table into HDFS

    bin/sqoop import \
    --connect jdbc:mysql://10.0.0.108:3306/student \
    --username root \
    --password root \
    --table stu_info \
    --target-dir /student \
    --num-mappers 1 \
    --fields-terminated-by '\t'
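    A quick way to confirm the import (a sketch; with a single mapper Sqoop normally writes one output file named part-m-00000):

    $ bin/hdfs dfs -ls /student
    $ bin/hdfs dfs -cat /student/part-m-00000   # should print the five rows, tab-separated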
    

    III. Use Sqoop to import the MySQL table into Hive

    Approach 1:
    1. Create the database and table in Hive:
        create database if not exists student;
        create table if not exists stu_info(id int, name string) row format delimited fields terminated by '\t';
    2. bin/sqoop import \
        --connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/student \
        --username root --password root \
        --table stu_info \
        --delete-target-dir \
        --target-dir /user/hive/warehouse/student.db/stu_info \
        --hive-import \
        --hive-database student \
        --hive-table stu_info \
        --hive-overwrite \
        --num-mappers 1 \
        --fields-terminated-by '\t'
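    To confirm Approach 1 worked, a quick check from the Hive CLI (assuming the job above completed without errors):

        hive> use student;
        hive> select id, name from stu_info;   -- should return the same five rows as MySQL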
    Approach 2:
    1. Use sqoop create-hive-table, but the target database must already exist; otherwise the table ends up under the default database path.
    2. bin/sqoop create-hive-table \
        --connect jdbc:mysql://10.0.0.108:3306/student \
        --username root --password root \
        --table stu_info \
        --hive-table student.stu_info
    3. bin/sqoop import \
        --connect jdbc:mysql://10.0.0.108:3306/student \
        --username root --password root \
        --table stu_info \
        --hive-import \
        --hive-database student \
        --hive-table stu_info \
        --hive-overwrite \
        --num-mappers 1 \
        --fields-terminated-by '\t' \
        --delete-target-dir \
        --target-dir /user/hive/warehouse/student.db/stu_info
    4. Querying the table in Hive now returns nothing but NULLs,
        yet the files on HDFS look fine, so Hive simply cannot parse the rows: the problem is the field delimiter.
        Use --fields-terminated-by '\001' instead.  # \001 is Ctrl+A, Hive's default delimiter; Sqoop writes ',' by default
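    One way to pin down delimiter problems like this is to dump the raw bytes of the imported file (a debugging sketch; GNU cat -A renders Ctrl+A as ^A and a tab as ^I):

        $ bin/hdfs dfs -cat /user/hive/warehouse/student.db/stu_info/part-m-00000 | cat -A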
    

    IV. Export data from HDFS or Hive to a MySQL table

    1. Prepare the database and table in MySQL.
    2. I simply reuse the student database:
        create table if not exists stu_info_export like stu_info;
    3. Match the delimiter actually used by the HDFS/Hive data:
        bin/sqoop export \
        --connect jdbc:mysql://10.0.0.108:3306/student \
        --username root --password root \
        --table stu_info_export \
        --export-dir /user/hive/warehouse/student.db/stu_info \
        --num-mappers 1 \
        --input-fields-terminated-by '\001'
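    Back in MySQL, verify that the rows arrived (the count should match the five rows in stu_info):

        mysql> select count(*) from stu_info_export;
        mysql> select id, name from stu_info_export;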
    

    V. sqoop --options-file

    For production-grade incremental migration, use --options-file together with a shell script. Compare:
    -- $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
    -- $ sqoop --options-file /users/homer/work/import.txt --table TEST
    Note the file format: it starts with the import/export command itself, followed by one option or value per line, e.g.:
    import
       --connect
       jdbc:mysql://localhost/db
       --username
       foo
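    Putting it together, a hypothetical options file and the call that uses it (the path /tmp/import_stu.txt and target dir /student_opt are illustrative, not from the original setup):

       # /tmp/import_stu.txt -- one option or value per line
       import
       --connect
       jdbc:mysql://10.0.0.108:3306/student
       --username
       root
       --password
       root

       $ bin/sqoop --options-file /tmp/import_stu.txt --table stu_info --target-dir /student_opt --num-mappers 1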

    VI. Using sqoop job

    $ bin/sqoop job --delete <job_id>
    $ bin/sqoop job --list 
    $ bin/sqoop job --show <job_id>
    $ bin/sqoop job --exec <job_id>
    $ bin/sqoop job --create job_id -- <job-info>
    $ bin/sqoop job --create stu_info -- \
     import \
    --connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/sqoop \
    --username root \
    --password root \
    --table tohdfs \
    --target-dir /sqoop \
    --num-mappers 1 \
    --fields-terminated-by '\t' \
    --check-column id \
    --incremental append \
    --last-value 11
    PS: incremental import (conflicts with --delete-target-dir)
        --check-column id
        --incremental append/lastmodified (lastmodified tracks timestamp changes)
        --last-value 11
    Also:
        --columns field1,field2,field3
        --query <sql>  # must include $CONDITIONS; cannot be combined with --table
        --where <condition>  # no $CONDITIONS needed
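    For instance, a free-form --query import has to carry the $CONDITIONS token (single-quote it so the shell does not expand it) and, with more than one mapper, a --split-by column. A sketch against the tables above (the /student_query target dir is illustrative):

        bin/sqoop import \
        --connect jdbc:mysql://10.0.0.108:3306/student \
        --username root --password root \
        --query 'select id, name from stu_info where id > 2 and $CONDITIONS' \
        --target-dir /student_query \
        --num-mappers 1 \
        --fields-terminated-by '\t'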