1. What is Sqoop: Sqoop is an open-source auxiliary tool for Hadoop from the Apache organization.
2. What it does: transfers data between HDFS/Hive and relational databases (RDB).
# Setting up the Sqoop development environment
## CDH Hadoop and Hive

1. Environment preparation: MAC address, firewall, IP, hostname mapping, SELinux, JDK.
2. Install Hadoop
   1. Unpack the distribution.
   2. Edit hadoop-env.sh, then the site files:

      core-site.xml:

      ```xml
      <property>
          <name>fs.default.name</name>
          <value>hdfs://sqoop:8020</value>
      </property>
      <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/install/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
      </property>
      ```

      hdfs-site.xml:

      ```xml
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
      <property>
          <name>dfs.permissions.enabled</name>
          <value>false</value>
      </property>
      ```

      yarn-site.xml:

      ```xml
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
      ```

      mapred-site.xml:

      ```xml
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
      ```

      In the slaves file, change the entry to sqoop.
   3. Unpack cdh5.3.6-snappy-lib-natirve.tar.gz and copy its lib/native/* into /opt/install/hadoop-2.5.0-cdh5.3.6/lib/native:

      ```shell
      cp * /opt/install/hadoop-2.5.0-cdh5.3.6/lib/native/
      ```

   4. Format the NameNode:

      ```shell
      bin/hdfs namenode -format
      ```

   5. Start the daemons:

      ```shell
      sbin/hadoop-daemon.sh start namenode
      sbin/yarn-daemon.sh start resourcemanager
      sbin/yarn-daemon.sh start nodemanager
      sbin/hadoop-daemon.sh start datanode
      ```

      If jps output looks like the following, startup succeeded:

      ```
      [root@sqoop hadoop-2.5.0-cdh5.3.6]# jps
      1609 ResourceManager
      2021 Jps
      1858 NodeManager
      1982 DataNode
      1485 NameNode
      ```

3. Install Hive
   1. Unpack the distribution.
   2. Edit the configuration files:

      hive-env.sh:

      ```shell
      # Set HADOOP_HOME to point to a specific hadoop install directory
      HADOOP_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
      # Hive Configuration Directory can be controlled by:
      export HIVE_CONF_DIR=/opt/install/hive-0.13.1-cdh5.3.6/conf
      ```

      hive-site.xml (metastore settings):

      ```xml
      <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://hadoop5.lhcedu.com:3306/cdhmetastore?createDatabaseIfNotExist=true</value>
      </property>
      <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
      </property>
      <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>123456</value>
      </property>
      <property>
          <name>hive.cli.print.header</name>
          <value>true</value>
      </property>
      <property>
          <name>hive.cli.print.current.db</name>
          <value>true</value>
      </property>
      ```

   3. Create two directories on HDFS: /tmp and the Hive warehouse directory /user/hive/warehouse (see the sketch after this list).
   4. Copy the MySQL JDBC driver jar into hive_home/lib.
   5. Start Hive:

      ```shell
      bin/hive
      ```
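A minimal sketch of step 3 above, run from the Hadoop install directory; /tmp and /user/hive/warehouse are Hive's standard defaults, and making them group-writable is the usual recommendation:

```shell
# create the two directories Hive expects
bin/hdfs dfs -mkdir -p /tmp
bin/hdfs dfs -mkdir -p /user/hive/warehouse
# make them group-writable so Hive can use them
bin/hdfs dfs -chmod g+w /tmp
bin/hdfs dfs -chmod g+w /user/hive/warehouse
```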
# Installing Sqoop

1. Unpack Sqoop.
2. Edit the configuration in sqoop_home/conf:

   ```shell
   cp sqoop-env-template.sh sqoop-env.sh
   ```

   In conf/sqoop-env.sh set:

   ```shell
   export HADOOP_COMMON_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
   export HADOOP_MAPRED_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
   export HIVE_HOME=/opt/install/hive-0.13.1-cdh5.3.6
   ```

3. Copy the MySQL connector jar into sqoop_home/lib.
4. Test that Sqoop works:

   ```shell
   bin/sqoop list-databases --connect jdbc:mysql://sqoop:3306 --username root --password 123456
   ```
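For step 3, a hedged example; the connector jar name and version here are assumptions, so substitute whatever mysql-connector jar your environment actually ships:

```shell
# jar version is illustrative; adjust to the file you have
cp mysql-connector-java-5.1.27-bin.jar /opt/install/sqoop-1.4.5-cdh5.3.6/lib/
```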
# Splitting a command across lines

The same command can be written with one option per line, continued with `\`:

```shell
bin/sqoop list-databases \
  --connect jdbc:mysql://sqoop:3306 \
  --username root \
  --password 123456
```
# Import (from MySQL into HDFS)
Create a test database and table in MySQL:

```sql
create database sqoop;
use sqoop;
create table mysql_user(
    id int primary key,
    name varchar(12)
);
insert into mysql_user values (1,'lhc1');
insert into mysql_user values (2,'lhc2');
insert into mysql_user values (3,'lhc3');
insert into mysql_user values (4,'lhc4');
insert into mysql_user values (5,'lhc5');
```
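A quick sanity check before importing; this assumes the mysql client is on the PATH and uses the root password from the setup above:

```shell
# should print the five rows inserted above
mysql -uroot -p123456 -e "select * from sqoop.mysql_user;"
```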
## Basic import
```shell
bin/sqoop import --connect jdbc:mysql://sqoop:3306/sqoop --username root --password 123456 --table mysql_user
```

- Default HDFS target directory: /user/root/mysql_user
- By default these 5 rows become 5 splits and 5 map tasks; there is no reduce phase.
- Columns are separated by `,` by default, e.g. `1,lhc1`.
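To inspect what landed on HDFS (a sketch; the exact part-file names depend on the run):

```shell
bin/hdfs dfs -ls /user/root/mysql_user
bin/hdfs dfs -cat /user/root/mysql_user/part-m-*
```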
## Specifying the HDFS target directory

```shell
bin/sqoop import --connect jdbc:mysql://sqoop:3306/sqoop --username root --password 123456 --table mysql_user --target-dir /sqoop
```
## Deleting an existing HDFS target directory

```shell
bin/sqoop import --connect jdbc:mysql://sqoop:3306/sqoop --username root --password 123456 --table mysql_user --target-dir /sqoop --delete-target-dir
```
## Changing the default number of map tasks

```shell
bin/sqoop import --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table mysql_user --target-dir /sqoop --delete-target-dir --num-mappers 1
```
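With --num-mappers 1 the whole table goes through a single map task, so the target directory should hold exactly one part file, which you can confirm with:

```shell
# expect a single part-m-00000 file
bin/hdfs dfs -ls /sqoop
```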
## Changing the default field delimiter

```shell
bin/sqoop import --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table mysql_user --target-dir /sqoop --delete-target-dir --num-mappers 1 --fields-terminated-by ' '
```
## Fast (direct) import

```shell
bin/sqoop import --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table mysql_user --target-dir /sqoop --delete-target-dir --num-mappers 1 --fields-terminated-by ' ' --direct
```

Fast (direct) mode requires Sqoop and MySQL to be installed on the same machine. In a cluster where Sqoop and MySQL are not co-located, the mysql/bin directory must be copied to every node.
## Incremental import

The incremental import options:

- `--check-column <column>`: source column to check for new data (here: `id`)
- `--last-value <value>`: last value already imported (here: `5`)
- `--incremental <import-type>`: type of incremental import, either `append` or `lastmodified`

Note: `--delete-target-dir` cannot be combined with an incremental import.
```shell
bin/sqoop import \
  --connect jdbc:mysql://hadoop6:3306/sqoop \
  --username root \
  --password 123456 \
  --table mysql_user \
  --target-dir /sqoop \
  --num-mappers 1 \
  --fields-terminated-by ' ' \
  --direct \
  --check-column id \
  --last-value 5 \
  --incremental append
```
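To see the incremental behavior, one way (a sketch using the test table above) is to add rows past the last imported id and re-run the command; only the new rows should be appended to /sqoop:

```shell
# add rows with id > 5, then re-run the incremental import above
mysql -uroot -p123456 -e "insert into sqoop.mysql_user values (6,'lhc6'),(7,'lhc7');"
# afterwards /sqoop should contain the two new rows in a new part file
bin/hdfs dfs -cat /sqoop/part-m-*
```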
# Export (from HDFS to the database)
First make sure the target table exists in the database:

```sql
create table to_mysql(
    id int primary key,
    name varchar(12)
);
```

```shell
bin/sqoop export --connect jdbc:mysql://sqoop:3306/sqoop --username root --password 123456 --table to_mysql --export-dir /sqoop --num-mappers 1 --input-fields-terminated-by ','
```
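After the export finishes, the rows can be checked back in MySQL (again assuming the mysql client and the password from earlier):

```shell
mysql -uroot -p123456 -e "select * from sqoop.to_mysql;"
```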
# Hive Import
The key options are `--hive-import`, `--hive-database`, and `--hive-table`:

```shell
bin/sqoop import --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table mysql_user --delete-target-dir --hive-import --hive-database lhc_140 --hive-table t_user --num-mappers 1 --fields-terminated-by ' '
```
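A quick check that the table arrived, run from the Hive install directory (`hive -e` executes a single statement from the shell):

```shell
bin/hive -e "select * from lhc_140.t_user;"
```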
# Hive Export
```shell
bin/sqoop export --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table hive_mysql --export-dir /user/hive/warehouse/lhc_140.db/t_user --num-mappers 1 --input-fields-terminated-by ' '
```
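As with the HDFS export, the target table hive_mysql must already exist in MySQL before running this. A sketch, assuming the same two-column schema as mysql_user:

```shell
# schema is an assumption; match it to the Hive table being exported
mysql -uroot -p123456 -e "create table sqoop.hive_mysql(id int primary key, name varchar(12));"
```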
# Scripting Sqoop

The command can be stored in a standalone file.
Create the target table first:

```sql
create table filetomysql(
    id int,
    name varchar(12)
);
```

1. Create a Sqoop options file (a plain file), e.g. sqoop.file. In an options file, each option and each value goes on its own line:

   ```
   export
   --connect
   jdbc:mysql://hadoop6:3306/sqoop
   --username
   root
   --password
   123456
   --table
   filetomysql
   --export-dir
   /sqoop
   --num-mappers
   1
   --input-fields-terminated-by
   ' '
   ```

2. Execute the file:

   ```shell
   bin/sqoop --options-file /root/sqoop.file
   ```
# Defining Sqoop jobs
1. Create a job:

   ```shell
   bin/sqoop job --create test_job1 -- export --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password 123456 --table filetomysql --export-dir /sqoop --num-mappers 1 --input-fields-terminated-by ' '
   ```

2. Run the job:

   ```shell
   bin/sqoop job --exec test_job1
   ```

3. Problem: every run prompts for the password, which gets in the way of automation, so store the password in a file and point the job at it:

   ```shell
   echo -n "123456" >> /root/password
   bin/sqoop job --create test_job2 -- export --connect jdbc:mysql://hadoop6:3306/sqoop --username root --password-file file:///root/password --table filetomysql --export-dir /sqoop --num-mappers 1 --input-fields-terminated-by ' '
   bin/sqoop job --exec test_job2
   ```
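The job tool also has housekeeping subcommands that are handy alongside the above:

```shell
bin/sqoop job --list              # list all stored jobs
bin/sqoop job --show test_job2    # show a job's saved parameters
bin/sqoop job --delete test_job1  # remove a stored job
```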
# Scheduled tasks with cron
1. Install cron:

   ```shell
   yum -y install vixie-cron
   ```

2. Edit the crontab with `crontab -e`. The field format is `minute hour day month week command`:

   ```
   */5 * * * * command                                                          # every 5 minutes
   */1 * * * * echo 'suns' >> /root/sunshuai                                    # every minute
   */1 * * * * /opt/install/sqoop-1.4.5-cdh5.3.6/bin/sqoop job --exec test_job2
   0 0 * * * /opt/install/sqoop-1.4.5-cdh5.3.6/bin/sqoop job --exec test_job2   # once a day at midnight
   ```

3. Start/stop the service and check it is running:

   ```shell
   /sbin/service crond start
   /sbin/service crond stop
   ps -ef | grep cron
   ```
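To confirm what cron will actually run, list the current user's table and watch the cron log; the log path below is the typical CentOS default and may differ on other systems:

```shell
crontab -l
tail -f /var/log/cron   # log path is an assumption; typical on CentOS
```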