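Hadoop Course
2.1 Initial Setup
The platform has already configured the initial environment; this section explains how that setup is done.
1. Change the hostname (master node as the example)
[ec2-user@ip-172-31-32-47 ~]$ sudo vi /etc/hostname
# Delete the existing contents and put master on the first line as the new hostname.
# Reboot the virtual machine for the change to take effect
[ec2-user@ip-172-31-32-47 ~]$ sudo reboot
2. Edit the hosts mapping (master node as the example)
# Look up the IP address of every node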
[ec2-user@master ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 172.31.32.47 netmask 255.255.240.0 broadcast 172.31.47.255
inet6 fe80::8b2:80ff:fe01:e5c2 prefixlen 64 scopeid 0x20<link>
ether 0a:b2:80:01:e5:c2 txqueuelen 1000 (Ethernet)
RX packets 3461 bytes 687720 (671.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3262 bytes 544011 (531.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[ec2-user@slave1 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 172.31.36.81 netmask 255.255.240.0 broadcast 172.31.47.255
inet6 fe80::87d:36ff:fe72:bc0c prefixlen 64 scopeid 0x20<link>
ether 0a:7d:36:72:bc:0c txqueuelen 1000 (Ethernet)
RX packets 2195 bytes 543199 (530.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2178 bytes 361053 (352.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[ec2-user@slave2 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 172.31.46.142 netmask 255.255.240.0 broadcast 172.31.47.255
inet6 fe80::850:68ff:fe8c:6c5e prefixlen 64 scopeid 0x20<link>
ether 0a:50:68:8c:6c:5e txqueuelen 1000 (Ethernet)
RX packets 2284 bytes 547630 (534.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2241 bytes 375782 (366.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# Add one line per node to the hosts file, in the form: IP hostname
[ec2-user@master ~]$ sudo vi /etc/hosts
# Check the result. Note: the hosts file must be edited on every node
[ec2-user@master ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost6 localhost6.localdomain6
172.31.32.47 master
172.31.36.81 slave1
172.31.46.142 slave2
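Because the same three mappings have to be added on every node, the edit can also be scripted. The following is a minimal sketch, not part of the original procedure: add_host_entry and HOSTS_FILE are hypothetical names, and the demo writes to a scratch file rather than to /etc/hosts.

```shell
#!/usr/bin/env bash
# add_host_entry (hypothetical helper): append "IP hostname" to a hosts-style
# file only if that hostname is not mapped in the file yet.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # -w matches the hostname as a whole word, so "slave1" does not match "slave10"
    if ! grep -qw -- "$name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demo against a scratch file (assumption: the real target would be /etc/hosts)
HOSTS_FILE="$(mktemp)"
add_host_entry "$HOSTS_FILE" 172.31.32.47 master
add_host_entry "$HOSTS_FILE" 172.31.36.81 slave1
add_host_entry "$HOSTS_FILE" 172.31.46.142 slave2
add_host_entry "$HOSTS_FILE" 172.31.32.47 master   # repeated call changes nothing
cat "$HOSTS_FILE"
```

On a real node the function would have to run in a root shell, since appending to /etc/hosts needs write access to that file.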
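2.2 Installing the Java Environment
Hadoop is written in Java, so every node needs a Java installation. The JDK (Java Development Kit) is the software development kit for the Java language. It contains the Java runtime environment (the JVM plus the Java class libraries) and the Java development tools.
1. Extract JDK 1.8
# Extract the JDK to the target directory
[ec2-user@master ~]$ sudo tar -zxvf hadoop/jdk-8u144-linux-x64.tar.gz -C /usr/local/src/
# Check that the extracted JDK directory is in the target directory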
[ec2-user@master ~]$ ls /usr/local/src/
jdk1.8.0_144
2. Rename it to jdk
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
jdk1.8.0_144
[ec2-user@master src]$ sudo mv jdk1.8.0_144/ jdk
[ec2-user@master src]$ ls
jdk
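3. Add environment variables (all nodes; master as the example)
[ec2-user@master src]$ sudo vi /etc/profile
# Append the following lines to the end of the file
export JAVA_HOME=/usr/local/src/jdk
export PATH=$PATH:$JAVA_HOME/bin
# Reload the environment variables
[ec2-user@master src]$ source /etc/profile
4. Check the JDK version to verify the installation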
[ec2-user@master src]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
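Besides running java -version, it can help to confirm that JAVA_HOME really points at a JDK directory, because later steps reuse that variable. A minimal sketch follows; check_java_home is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# check_java_home (hypothetical helper): succeeds only if <dir>/bin/java
# exists as a regular file and is executable.
check_java_home() {
    local dir="$1"
    [ -n "$dir" ] && [ -f "$dir/bin/java" ] && [ -x "$dir/bin/java" ]
}

# Report on the JAVA_HOME of the current shell
if check_java_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME OK: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/java" >&2
fi
```

If the check fails right after editing /etc/profile, the usual cause is that the file has not been reloaded in the current shell.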
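5. Change ownership (all nodes; master as the example)
The exercises run as a regular user, but /usr/local/src/ is owned by root. Unless the ownership is changed, distributing files to the other nodes fails with a permission error.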
[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root root 6 Apr 9 2019 bin
drwxr-xr-x 2 root root 6 Apr 9 2019 etc
drwxr-xr-x 2 root root 6 Apr 9 2019 games
drwxr-xr-x 2 root root 6 Apr 9 2019 include
drwxr-xr-x 2 root root 6 Apr 9 2019 lib
drwxr-xr-x 2 root root 6 Apr 9 2019 lib64
drwxr-xr-x 2 root root 6 Apr 9 2019 libexec
drwxr-xr-x 2 root root 6 Apr 9 2019 sbin
drwxr-xr-x 5 root root 49 Mar 4 20:51 share
drwxr-xr-x 4 root root 31 Mar 19 06:54 src
# Set the owner and group of /usr/local/src/ and everything under it to ec2-user
[ec2-user@master ~]$ sudo chown -R ec2-user:ec2-user /usr/local/src/
# Check the owner and group of /usr/local/src/ again
[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root root 6 Apr 9 2019 bin
drwxr-xr-x 2 root root 6 Apr 9 2019 etc
drwxr-xr-x 2 root root 6 Apr 9 2019 games
drwxr-xr-x 2 root root 6 Apr 9 2019 include
drwxr-xr-x 2 root root 6 Apr 9 2019 lib
drwxr-xr-x 2 root root 6 Apr 9 2019 lib64
drwxr-xr-x 2 root root 6 Apr 9 2019 libexec
drwxr-xr-x 2 root root 6 Apr 9 2019 sbin
drwxr-xr-x 5 root root 49 Mar 4 20:51 share
drwxr-xr-x 4 ec2-user ec2-user 31 Mar 19 06:54 src
6. Distribute to the other nodes
[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave1:/usr/local/src/
[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave2:/usr/local/src/
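With more worker nodes, typing one scp line per host becomes tedious. The sketch below only prints the commands it would run, so the list can be reviewed first. print_scp_commands is a hypothetical helper name; the host names are the ones used in this course.

```shell
#!/usr/bin/env bash
# print_scp_commands (hypothetical helper): print one "scp -r" command per
# target host. Nothing is copied; the output is meant to be reviewed.
print_scp_commands() {
    local src="$1" dest="$2"
    shift 2
    local host
    for host in "$@"; do
        printf 'scp -r %s %s:%s\n' "$src" "$host" "$dest"
    done
}

print_scp_commands /usr/local/src/jdk/ /usr/local/src/ slave1 slave2
```

The printed lines match the two commands above. Piping the output to sh would execute them, assuming the current user can already log in to each host over SSH.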
2.3 Installing the Hadoop Cluster
1. Extract
[ec2-user@master src]$ tar -zxvf /home/ec2-user/hadoop/hadoop-2.9.1.tar.gz -C /usr/local/src/
[ec2-user@master src]$ ls
hadoop-2.9.1 jdk
2. Rename it to hadoop
[ec2-user@master src]$ pwd
/usr/local/src
[ec2-user@master src]$ mv hadoop-2.9.1/ hadoop
[ec2-user@master src]$ ls
hadoop jdk
3. Add environment variables (all nodes; master as the example)
[ec2-user@master ~]$ sudo vi /etc/profile
# Append the following lines to the end of the file
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=/usr/local/src/hadoop/lib/*
# Reload the environment variables
[ec2-user@master ~]$ source /etc/profile
4. Edit the core-site.xml configuration file
[ec2-user@master ~]$ cd /usr/local/src/hadoop/etc/hadoop/
[ec2-user@master hadoop]$ vi core-site.xml
Add the following inside the <configuration></configuration> element:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/src/hadoop/tmp</value>
</property>
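After editing a configuration file by hand, it is useful to read a property back and confirm the value that was saved. The following is a minimal sketch under a stated assumption: get_prop is a hypothetical helper that expects one tag per line, as in the snippets of this section, and is not a general XML parser.

```shell
#!/usr/bin/env bash
# get_prop (hypothetical helper): print the <value> that follows
# <name>KEY</name> in a Hadoop-style XML file with one tag per line.
get_prop() {
    local file="$1" key="$2"
    awk -v key="$key" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
    ' "$file"
}

# Demo on a scratch copy of the core-site.xml content shown above
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
EOF
get_prop "$conf" fs.defaultFS
```

On a node where Hadoop is installed and on the PATH, hdfs getconf -confKey fs.defaultFS reports the value that Hadoop itself resolves, which is the more authoritative check.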
5. Edit the hdfs-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hdfs-site.xml
Add the following inside the <configuration></configuration> element:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host and port of the secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/src/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/src/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
6. Edit the yarn-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi yarn-site.xml
Add the following inside the <configuration></configuration> element:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
7. Edit the mapred-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
[ec2-user@master hadoop]$ vi mapred-site.xml
Add the following inside the <configuration></configuration> element:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
8. Edit the hadoop-env.sh configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hadoop-env.sh
Set the JDK path:
export JAVA_HOME=/usr/local/src/jdk
Note: adjust this to the path on your own system.
9. Edit the slaves configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi slaves
[ec2-user@master hadoop]$ cat slaves
slave1
slave2
10. Distribute to the other nodes
[ec2-user@master hadoop]$ cd /usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave1:/usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave2:/usr/local/src/
11. Format the NameNode (run on the NameNode host)
[ec2-user@master src]$ hdfs namenode -format
12. Start the Hadoop cluster
# Start the Hadoop cluster from the NameNode host
[ec2-user@master src]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (172.31.32.47)' can't be established.
ECDSA key fingerprint is SHA256:Tueyo4xR8lsxmdA11GlXAO3w44n6T75dYHe9flk8Y70.
ECDSA key fingerprint is MD5:22:9b:6d:f2:f3:11:a2:6d:4d:dd:ec:25:56:3b:2d:b2.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.31.32.47' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-namenode-master.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-secondarynamenode-slave1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave2.out
# Check the running processes with jps
[ec2-user@master src]$ jps
31522 Jps
31256 ResourceManager
30973 NameNode
[ec2-user@master src]$ ssh slave1
Last login: Fri Mar 19 06:15:47 2021 from 219.153.251.37
       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave1 ~]$ jps
29424 DataNode
29635 NodeManager
29544 SecondaryNameNode
29789 Jps
[ec2-user@slave1 ~]$ ssh slave2
Last login: Fri Mar 19 06:15:57 2021 from 219.153.251.37
       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave2 ~]$ jps
29633 Jps
29479 NodeManager
29354 DataNode
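Reading jps output by eye on three nodes is easy to get wrong. The sketch below compares a jps listing against the daemons expected on a node and prints whatever is absent. missing_daemons is a hypothetical helper name; the sample input is the master listing from above.

```shell
#!/usr/bin/env bash
# missing_daemons (hypothetical helper): read jps-style output on stdin
# ("<pid> <ClassName>" per line) and print each expected daemon that is absent.
missing_daemons() {
    local listing name
    listing="$(cat)"
    for name in "$@"; do
        # Compare the second column exactly, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Sample: the master listing. Prints DataNode, because in this layout the
# master is not listed in the slaves file and therefore runs no DataNode.
printf '31522 Jps\n31256 ResourceManager\n30973 NameNode\n' |
    missing_daemons NameNode ResourceManager DataNode
```

On a live node the input would come from jps directly, for example jps | missing_daemons NameNode ResourceManager on master; empty output means every expected daemon is running.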
13. Check the status of the Hadoop cluster
[ec2-user@master ~]$ hdfs dfsadmin -report
Configured Capacity: 17154662400 (15.98 GB)
Present Capacity: 11389693952 (10.61 GB)
DFS Remaining: 11389685760 (10.61 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 172.31.36.81:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882510848 (2.68 GB)
DFS Remaining: 5694816256 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021
Name: 172.31.46.142:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882457600 (2.68 GB)
DFS Remaining: 5694869504 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021
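The key figure in this report is the number of live DataNodes, which should equal the number of hosts in the slaves file (two here). A minimal sketch for extracting it follows; live_datanodes is a hypothetical helper name, and the sample lines are copied from the report above.

```shell
#!/usr/bin/env bash
# live_datanodes (hypothetical helper): read a dfsadmin report on stdin and
# print the number inside the "Live datanodes (N):" line.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Sample lines taken from the report above
printf 'Pending deletion blocks: 0\nLive datanodes (2):\nName: 172.31.36.81:50010 (slave1)\n' |
    live_datanodes
```

On the cluster the same filter could be fed with hdfs dfsadmin -report | live_datanodes. A smaller number than expected usually means a DataNode failed to start or cannot reach the NameNode.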
2.4 Installing Hive
1. Install MySQL
Hive stores its metadata in a database, so MySQL is installed first.
1) Download the MySQL repository package
[ec2-user@master ~]$ sudo wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
2) Install the MySQL repository
[ec2-user@master ~]$ sudo yum localinstall mysql57-community-release-el7-8.noarch.rpm
3) Check that the MySQL repository was installed
[ec2-user@master ~]$ sudo yum repolist enabled | grep "mysql.*.community.*"
mysql-connectors-community/x86_64 MySQL Connectors Community 146+39
mysql-tools-community/x86_64 MySQL Tools Community 123
mysql57-community/x86_64 MySQL 5.7 Community Server 484
4) Install MySQL
[ec2-user@master ~]$ sudo yum install mysql-community-server
5) Start the MySQL service and check its status
[ec2-user@master ~]$ sudo systemctl start mysqld
[ec2-user@master ~]$ sudo systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-19 07:56:43 UTC; 1s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 31978 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 31927 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 31981 (mysqld)
CGroup: /system.slice/mysqld.service
└─31981 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Mar 19 07:56:39 master systemd[1]: Starting MySQL Server...
Mar 19 07:56:43 master systemd[1]: Started MySQL Server.
6) Look up the initial MySQL password
[ec2-user@master ~]$ sudo grep "password" /var/log/mysqld.log
2021-03-19T07:56:41.030922Z 1 [Note] A temporary password is generated for root@localhost: v=OKXu0laSo;
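The temporary password is everything after "root@localhost: " on that log line, including any punctuation at the end. To avoid copy errors, it can be extracted with a small filter. This is a minimal sketch; extract_temp_password is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# extract_temp_password (hypothetical helper): read mysqld.log lines on stdin
# and print the temporary root password from the last matching line.
extract_temp_password() {
    sed -n 's/.*A temporary password is generated for root@localhost: //p' | tail -n 1
}

# Sample line copied from the log output above
echo '2021-03-19T07:56:41.030922Z 1 [Note] A temporary password is generated for root@localhost: v=OKXu0laSo;' |
    extract_temp_password
```

On the server the input would come from the log itself, for example sudo cat /var/log/mysqld.log | extract_temp_password.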
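7) Change the MySQL login password
Copy the initial password found in the previous step, paste it at the password prompt, and press Enter to reach the MySQL command line.
[ec2-user@master ~]$ sudo mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.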
Your MySQL connection id is 9
Server version: 5.7.33
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Change the password, setting the MySQL login password to 1234:
mysql> set password for 'root'@'localhost'=password('1234');
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
As the error shows, a new password that is too simple is rejected.
The password policy has to be relaxed first:
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)
Set the password again:
mysql> set password for 'root'@'localhost'=password('1234');
Query OK, 0 rows affected, 1 warning (0.00 sec)
8) Enable remote login
Exit MySQL first, then log in again with the new password.
[ec2-user@master ~]$ mysql -uroot -p1234
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.33 MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Create the user:
mysql> create user 'root'@'172.%.%.%' identified by '1234';
Query OK, 0 rows affected (0.00 sec)
Allow remote connections:
mysql> grant all privileges on *.* to 'root'@'172.%.%.%' with grant option;
Query OK, 0 rows affected (0.00 sec)
Reload the privileges:
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
At this point MySQL is installed.
2. Extract Hive to the target directory
[ec2-user@master ~]$ tar -zxvf hadoop/apache-hive-1.1.0-bin.tar.gz -C /usr/local/src/
3. Rename
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
apache-hive-1.1.0-bin hadoop jdk
[ec2-user@master src]$ mv apache-hive-1.1.0-bin/ hive
[ec2-user@master src]$ ls
hadoop hive jdk
4. Add environment variables
[ec2-user@master src]$ sudo vi /etc/profile
# Append the following lines to the end of the file
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/src/hive/lib/*
# Reload the environment variables
[ec2-user@master src]$ source /etc/profile
5. Edit the hive-site.xml configuration file
[ec2-user@master src]$ cd hive/conf/
# Create the hive-site.xml file
[ec2-user@master conf]$ touch hive-site.xml
[ec2-user@master conf]$ vi hive-site.xml
Add the following to hive-site.xml:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1234</value>
</property>
</configuration>
Note: change the MySQL password to the one you set.
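Inside an XML file a literal & in a value has to be written as &amp;. This matters for JDBC URLs that carry more than one query parameter, because an unescaped & makes the file unparseable. The sketch below escapes a value before it is pasted into hive-site.xml; xml_escape is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# xml_escape (hypothetical helper): escape the characters that are special
# in XML text content.
xml_escape() {
    # & is replaced first; otherwise the & of the other entities would be escaped again
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# The JDBC URL used for the Hive metastore in this section
xml_escape 'jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
echo
```

Hive reads the value back with the entity decoded, so log output shows the URL with a plain & again.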
6. Edit the hive-env.sh configuration file
[ec2-user@master conf]$ pwd
/usr/local/src/hive/conf
[ec2-user@master conf]$ cp hive-env.sh.template hive-env.sh
[ec2-user@master conf]$ vi hive-env.sh
# Add the following settings
export HADOOP_HOME=/usr/local/src/hadoop
export HIVE_CONF_DIR=/usr/local/src/hive/conf
7. Add the MySQL connector
Copy the MySQL driver into Hive's lib directory.
[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $HIVE_HOME/lib
[ec2-user@master conf]$ ls $HIVE_HOME/lib/mysql-connector-java-5.1.44-bin.jar
/usr/local/src/hive/lib/mysql-connector-java-5.1.44-bin.jar
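8. Start the Hadoop cluster (Hive stores its data in the HDFS distributed file system)
Skip this step if Hadoop is already running.
[ec2-user@master conf]$ start-all.sh
9. Initialize the Hive database in MySQL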
[ec2-user@master conf]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 1.1.0
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed
10. Start Hive and test it
[ec2-user@master conf]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-1.1.0.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.587 seconds, Fetched: 1 row(s)
At this point Hive is installed.
2.5 Installing Sqoop
1. Extract
[ec2-user@master ~]$ tar -zxvf hadoop/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
2. Rename it to sqoop
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
hadoop hive jdk sqoop-1.4.7.bin__hadoop-2.6.0
[ec2-user@master src]$ mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop
[ec2-user@master src]$ ls
hadoop hive jdk sqoop
3. Add environment variables
[ec2-user@master src]$ sudo vi /etc/profile
# Add the following lines
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# Reload the environment variables
[ec2-user@master src]$ source /etc/profile
4. Edit the sqoop-env.sh configuration file
[ec2-user@master src]$ cd sqoop/conf/
[ec2-user@master conf]$ mv sqoop-env-template.sh sqoop-env.sh
[ec2-user@master conf]$ vi sqoop-env.sh
Edit the following settings to match your own environment:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive
5. Copy the MySQL driver into Sqoop's lib directory
[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib
[ec2-user@master conf]$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.44-bin.jar
/usr/local/src/sqoop/lib/mysql-connector-java-5.1.44-bin.jar
6. Verify that Sqoop is configured correctly
[ec2-user@master conf]$ sqoop help
Warning: /usr/local/src/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/src/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/03/19 08:53:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
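The help output only shows that the sqoop launcher runs. A further check, not part of the original steps, is the list-databases command from the list above, which also exercises the MySQL driver copied in step 5. The sketch below only prints the command line. sqoop_list_databases_cmd is a hypothetical helper name, and port 3306 is assumed to match the hive-site.xml setting.

```shell
#!/usr/bin/env bash
# sqoop_list_databases_cmd (hypothetical helper): print a sqoop list-databases
# command line for a MySQL server. Nothing is executed here.
sqoop_list_databases_cmd() {
    local host="$1" user="$2"
    # -P makes sqoop prompt for the password instead of taking it as an argument
    printf 'sqoop list-databases --connect jdbc:mysql://%s:3306/ --username %s -P\n' "$host" "$user"
}

sqoop_list_databases_cmd master root
```

Running the printed command on master and entering the MySQL password should list the server's databases, including the hive database created by schematool.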