1. Extract apache-hive-1.2.1-bin.tar.gz into the /opt/module/ directory
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/module/
2. Rename the extracted apache-hive-1.2.1-bin/ directory to hive
[root@hadoop01 module]# mv apache-hive-1.2.1-bin/ hive
3. In the /opt/module/hive/conf directory, rename hive-env.sh.template to hive-env.sh
mv hive-env.sh.template hive-env.sh
4. Configure the hive-env.sh file:
export HADOOP_HOME=/software/hadoop-2.7.1
export HIVE_CONF_DIR=/opt/module/hive/conf
5. HDFS and YARN must be running:
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh
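Before continuing, a quick sanity check with jps; which daemons appear on which host depends on your cluster layout, so the comment below is only indicative:
[atguigu@hadoop102 hadoop-2.7.2]$ jps
# expect NameNode/DataNode processes after start-dfs.sh,
# and ResourceManager/NodeManager after start-yarn.sh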
6. Create the /tmp and /user/hive/warehouse directories on HDFS and make them group-writable
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir -p /user/hive/warehouse
7. Change the directory permissions:
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
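To confirm the permission change, list just the two directory entries (the -d flag of the Hadoop 2.x ls shows the directory itself rather than its contents); both should now show a group-writable mode such as drwxrwxr-x:
bin/hadoop fs -ls -d /tmp /user/hive/warehouse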
8. Basic Hive operations
Start hive:
[atguigu@hadoop102 hive]$ bin/hive
List the databases:
hive> show databases;
Switch to the default database:
hive> use default;
Show the tables in the default database:
hive> show tables;
Create a table:
hive> create table student(id int, name string);
Show the tables in the database again:
hive> show tables;
Describe the table's structure:
hive> desc student;
Insert data into the table. Each insert runs as a full MapReduce job, which is why it is slow:
hive> insert into student values(1000,"ss");
hive> insert into student values(100,"ss");
Query ID = root_20200426160909_831fc215-2543-45f7-93a2-4867f3320f56
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1587886069426_0001, Tracking URL = http://hadoop02:8088/proxy/application_1587886069426_0001/
Kill Command = /software/hadoop-2.7.1/bin/hadoop job -kill job_1587886069426_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-04-26 16:09:16,861 Stage-1 map = 0%, reduce = 0%
2020-04-26 16:09:23,129 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.43 sec
MapReduce Total cumulative CPU time: 1 seconds 430 msec
Ended Job = job_1587886069426_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:9000/user/hive/warehouse/student/.hive-staging_hive_2020-04-26_16-09-09_979_3268714695468457370-1/-ext-10000
Loading data to table default.student
Table default.student stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.43 sec   HDFS Read: 3639 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 430 msec
OK
Time taken: 14.413 seconds
Query the table's data:
hive> select * from student;
Exit hive:
hive> quit;
Example: importing a local file into Hive
Load the data in the local file /opt/module/data/student.txt into Hive's student(id int, name string) table.
1. Data preparation
Prepare the data under /opt/module/data. First create the data directory under /opt/module/:
[atguigu@hadoop102 module]$ mkdir data
Create a student.txt file in the /opt/module/data/ directory and add the data:
[atguigu@hadoop102 data]$ touch student.txt
[atguigu@hadoop102 data]$ vi student.txt
1001 zhangshan
1002 lishi
1003 zhaoliu
Note: the fields are separated by a tab character.
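Since invisible tabs are easy to get wrong in vi, here is an equivalent way to create the file with guaranteed literal tab characters (same path and data as above):
printf '1001\tzhangshan\n1002\tlishi\n1003\tzhaoliu\n' > /opt/module/data/student.txt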
2. Hands-on Hive operations
(1) Start hive:
[atguigu@hadoop102 hive]$ bin/hive
(2) List the databases:
hive> show databases;
(3) Use the default database:
hive> use default;
(4) Show the tables in the default database:
hive> show tables;
(5) Drop the previously created student table:
hive> drop table student;
(6) Create the student table, declaring '\t' as the field delimiter:
hive> create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
(7) Load the /opt/module/data/student.txt file into the student table:
hive> load data local inpath '/opt/module/data/student.txt' into table student;
(8) Query the result in Hive:
hive> select * from student;
OK
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.266 seconds, Fetched: 3 row(s)
Create another txt file:
[root@hadoop01 data]# cp student.txt student1.txt
[root@hadoop01 data]# vim student1.txt
After adding three more rows (1004 to 1006, as seen in the query below) in vim, add the file with the hadoop fs command. This assumes the stu table already exists with the same schema; its warehouse directory is /user/hive/warehouse/stu:
[root@hadoop01 data]# hadoop fs -put student1.txt /user/hive/warehouse/stu
Check with select:
hive> select * from stu;
OK
1001	zhangshan
1002	lishi
1003	zhaoliu
1004	zhangshan33
1005	lishi4444
1006	zhaoliu5555
Time taken: 0.043 seconds, Fetched: 6 row(s)
In other words,
load data local inpath '/opt/module/data/student.txt' into table student;
and
hadoop fs -put student1.txt /user/hive/warehouse/stu
do the same thing: both add data files to a table. The insert statement only adds a few rows at a time, so it is not recommended.
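Both approaches can be confirmed from inside hive: the dfs command there mirrors hadoop fs, and the table directory now holds the files added by either method:
hive> dfs -ls /user/hive/warehouse/stu;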
Create one more file and upload it to the HDFS root directory:
cp student1.txt student2.txt
hadoop fs -put student2.txt /
Then load it. This load data inpath (without local) is effectively a move: student2.txt disappears from the HDFS root directory and ends up in the table directory.
hive> load data inpath '/student2.txt' into table stu;
Loading data to table default.stu
Table default.stu stats: [numFiles=3, totalSize=137]
Check:
hive> select * from stu;
OK
1001	zhangshan
1002	lishi
1003	zhaoliu
1004	zhangshan33
1005	lishi4444
1006	zhaoliu5555
1004	zhangshan33
1005	lishi4444
1006	zhaoliu5555
Time taken: 0.31 seconds, Fetched: 9 row(s)
Because the Metastore is stored by default in the bundled Derby database, only one hive connection can be open at a time; storing the Metastore in MySQL is recommended instead.
2.4 MySQL installation
[root@hadoop01 data]# rpm -qa |grep mysql
Uninstall the old packages:
[root@hadoop102 Desktop]# rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
(--nodeps removes the package while ignoring dependency checks)
Install the server. The packages on hand:
[root@hadoop01 mysql]# ll
total 198612
-rw-r--r--. 1 root root  25381952 Apr 26 17:20 mysql-community-client-5.7.26-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 173541272 Apr 26 17:20 mysql-community-server-5.7.26-1.el7.x86_64.rpm
-rw-r--r--. 1 root root   4452049 Apr 24 15:46 mysql-connector-java-5.1.47.tar.gz
Install:
rpm -ivh mysql-community-server-5.7.26-1.el7.x86_64.rpm --nodeps --force
Check the temporary password:
cat /root/.mysql_secret
Alternatively, skip the password check:
vim /etc/my.cnf
Append skip-grant-tables at the end of /etc/my.cnf (this disables privilege checks), save the file, and restart MySQL.
Then log in without a password:
mysql -uroot -p
Setting the password (the important part):
You must select the mysql database with use mysql first, otherwise the update fails, as shown here:
MySQL [(none)]> update user set authentication_string=passworD("root") where user='root';
ERROR 1046 (3D000): No database selected
After use mysql, it succeeds:
MySQL [mysql]> update user set password=passworD("root") where user='root';
Query OK, 4 rows affected, 1 warning (0.01 sec)
flush privileges;
Then edit /etc/my.cnf again and remove the skip-grant-tables line (restart MySQL for the change to take effect).
Install the client:
[root@hadoop01 mysql]# rpm -ivh mysql-community-client-5.7.26-1.el7.x86_64.rpm --nodeps --force
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hue                |
| metastore          |
| mysql              |
| nav_as             |
| nav_ms             |
| oozie              |
| performance_schema |
| sentry             |
+--------------------+
9 rows in set (0.00 sec)
mysql> select user,host,password from user;
+------+-----------+-------------------------------------------+
| user | host      | password                                  |
+------+-----------+-------------------------------------------+
| root | localhost | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |
| root | hadoop01  | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |
| root | 127.0.0.1 | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |
| root | ::1       | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |
| hive | %         | *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19 |
+------+-----------+-------------------------------------------+
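The listing above also shows a hive user. For reference, a minimal sketch of how such a user could be created; the password and grant scope here are assumptions, not taken from this walkthrough:
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';   -- hypothetical password
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;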
Modify the user table, changing root's Host value to %:
mysql>update user set host='%' where host='localhost';
Delete the root user's other Host entries:
delete from user where Host='hadoop01';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
mysql> flush privileges;
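After the cleanup, only the '%' rows should remain. A quick check:
mysql> select user,host from user;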
Install the MySQL Connector/J driver:
tar -zxvf mysql-connector-java-5.1.47.tar.gz
Copy mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/:
[root@hadoop102 mysql-connector-java-5.1.27]# cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib
(Note: the tarball listed above is version 5.1.47; whichever version you extracted, copy its jar into hive's lib directory.)
Configuring the Metastore to use MySQL
1. Create a hive-site.xml in the /opt/module/hive/conf directory:
[atguigu@hadoop102 conf]$ touch hive-site.xml
[atguigu@hadoop102 conf]$ vi hive-site.xml
2. Configure the parameters according to the official documentation and copy them into hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
3. After configuring, if hive fails to start, try restarting the virtual machine. (After rebooting, don't forget to start the hadoop cluster.)
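Optionally, the metastore schema can be initialized up front with the schematool shipped in hive's bin directory (it appears in the ls output later in these notes); without this step, Hive 1.2.1 auto-creates the schema on first start:
bin/schematool -dbType mysql -initSchema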
2.5.3 Testing Hive from multiple windows
1. First log in to MySQL:
[atguigu@hadoop102 mysql-libs]$ mysql -uroot -proot
List the databases:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hue |
| metastore |
| mysql |
| nav_as |
| nav_ms |
| oozie |
| performance_schema |
| sentry |
+--------------------+
9 rows in set (0.00 sec)
2. Next, open several windows and start hive in each:
[atguigu@hadoop102 hive]$ bin/hive
3. After hive starts, return to the MySQL window and list the databases again: a metastore database has been added.
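The metastore database now holds Hive's metadata tables; for example, registered table names can be read straight from MySQL (DBS and TBLS are standard metastore tables):
mysql> use metastore;
mysql> select TBL_NAME from TBLS;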
2.6 Hive JDBC access
2.6.1 Start the hiveserver2 service
[atguigu@hadoop102 hive]$ bin/hiveserver2
Start beeline:
[root@hadoop01 hive]# cd bin/
[root@hadoop01 bin]# ls
beeline  derby.log  ext  hive  hive-config.sh  hiveserver2  metastore_db  metatool  schematool
[root@hadoop01 bin]# beeline
Beeline version 1.2.1 by Apache Hive
beeline>
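From the beeline prompt, connect to hiveserver2 over JDBC. Port 10000 is hiveserver2's default; with hiveserver2's default NONE authentication the username is not actually verified:
beeline> !connect jdbc:hive2://hadoop01:10000
beeline then prompts for a username and password, after which regular HiveQL (show databases; and so on) works from the JDBC session.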
2.9.1 Configuring the Hive data warehouse location
1) The original default location of the data warehouse is /user/hive/warehouse on HDFS.
2) No folder is created under the warehouse directory for the default database itself; a table in the default database gets its folder directly under the warehouse directory.
3) To change the default warehouse location, copy the following configuration from hive-default.xml.template into hive-site.xml:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
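If the value is changed, for example to a hypothetical /user/hive/warehouse2, remember to create the directory and make it group-writable, as in steps 6 and 7 at the top of these notes:
bin/hadoop fs -mkdir -p /user/hive/warehouse2
bin/hadoop fs -chmod g+w /user/hive/warehouse2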
2.9.3 Configuring Hive's run logs
1. By default, hive's log is written to /tmp/atguigu/hive.log (the directory under /tmp is named after the current user)
2. Change hive's log location to /opt/module/hive/logs
(1) Rename /opt/module/hive/conf/hive-log4j.properties.template to hive-log4j.properties
[atguigu@hadoop102 conf]$ pwd
/opt/module/hive/conf
[atguigu@hadoop102 conf]$ mv hive-log4j.properties.template hive-log4j.properties
(2) Change the log location in the hive-log4j.properties file:
hive.log.dir=/opt/module/hive/logs
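After restarting hive, the new location can be verified by tailing the log:
tail -f /opt/module/hive/logs/hive.log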
Viewing warehouse directories and tables from inside hive:
hive> dfs -ls /user/hive/warehouse/
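Besides dfs, the hive CLI can also run local shell commands with a leading !; for example, to list the local data directory used earlier:
hive> !ls /opt/module/data;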