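HiveQL is a SQL dialect, but it lacks the row-, column-, or query-level lock support that UPDATE- and INSERT-style operations normally rely on. Hadoop files are usually write-once (with limited support for appending). Because Hadoop and Hive are both multi-user systems, locking and coordination are still very useful, and all locks have to be coordinated by a separate system.
Hive includes a lock facility that uses Apache ZooKeeper for locking. ZooKeeper provides highly reliable distributed coordination, and it is transparent to Hive users.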
zookeeper ['zukipɚ]
A ZooKeeper pseudo-cluster (several instances on a single host) can be installed and configured as follows:
Download
wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
Extract
tar -zxf zookeeper-3.4.12.tar.gz -C /root/
[root@host zookeeper-3.4.12]# pwd
/root/zookeeper-3.4.12
Create a folder named serverlist with three subfolders:
[root@host serverlist]# pwd
/root/zookeeper-3.4.12/serverlist
[root@host serverlist]# ls
server1 server2 server3
In each subfolder of serverlist, create two folders, data and logs:
[root@host server1]# pwd
/root/zookeeper-3.4.12/serverlist/server1
[root@host server1]# ls
data logs
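The listings above only show the resulting layout. A minimal sketch of how it could be created in one step follows. ZK_HOME is a variable introduced here for illustration; it falls back to a scratch directory so the snippet can be tried without touching a real installation.

```shell
# ZK_HOME is an assumption of this sketch. Export it as /root/zookeeper-3.4.12
# to build the real layout; otherwise a throwaway directory is used.
ZK_HOME="${ZK_HOME:-$(mktemp -d)}"

for i in 1 2 3; do
  # -p also creates serverlist/ and serverN/ if they do not exist yet
  mkdir -p "$ZK_HOME/serverlist/server$i/data" "$ZK_HOME/serverlist/server$i/logs"
done

ls "$ZK_HOME/serverlist"
```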
Configuration
Make four copies of zoo_sample.cfg: zoo.cfg, zoo1.cfg, zoo2.cfg, and zoo3.cfg.
zoo1.cfg is as follows:
[root@host conf]# cat zoo1.cfg
tickTime=2000
dataDir=/root/zookeeper-3.4.12/serverlist/server1/data
dataLogDir=/root/zookeeper-3.4.12/serverlist/server1/logs
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.53.122:2888:3888
server.2=192.168.53.122:4888:5888
server.3=192.168.53.122:6888:7888
zoo2.cfg is as follows:
[root@host conf]# cat zoo2.cfg
tickTime=2000
dataDir=/root/zookeeper-3.4.12/serverlist/server2/data
dataLogDir=/root/zookeeper-3.4.12/serverlist/server2/logs
clientPort=2182
initLimit=5
syncLimit=2
server.1=192.168.53.122:2888:3888
server.2=192.168.53.122:4888:5888
server.3=192.168.53.122:6888:7888
zoo3.cfg is as follows:
[root@host conf]# cat zoo3.cfg
tickTime=2000
dataDir=/root/zookeeper-3.4.12/serverlist/server3/data
dataLogDir=/root/zookeeper-3.4.12/serverlist/server3/logs
clientPort=2183
initLimit=5
syncLimit=2
server.1=192.168.53.122:2888:3888
server.2=192.168.53.122:4888:5888
server.3=192.168.53.122:6888:7888
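The three files differ only in the instance number inside dataDir and dataLogDir and in clientPort, so they could also be generated in a loop. This is a sketch under assumptions: CONF_DIR is a name made up here and defaults to a scratch directory, and every instance gets the same full list of three server.N entries.

```shell
# CONF_DIR is an assumption of this sketch; point it at the real conf/
# directory to write the files in place.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
ZK_HOME=/root/zookeeper-3.4.12
HOST=192.168.53.122

for i in 1 2 3; do
  # Unquoted EOF so that $i, $ZK_HOME, $HOST and $((...)) are expanded.
  cat > "$CONF_DIR/zoo$i.cfg" <<EOF
tickTime=2000
dataDir=$ZK_HOME/serverlist/server$i/data
dataLogDir=$ZK_HOME/serverlist/server$i/logs
clientPort=$((2180 + i))
initLimit=5
syncLimit=2
server.1=$HOST:2888:3888
server.2=$HOST:4888:5888
server.3=$HOST:6888:7888
EOF
done

ls "$CONF_DIR"
```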
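Notes:
tickTime: the basic time unit, in milliseconds. It is the heartbeat interval between ZooKeeper servers and between clients and servers.
dataDir: where snapshots of the in-memory database are stored, that is, the directory in which ZooKeeper keeps its data. By default ZooKeeper also writes its transaction log to this directory; in the files above, dataLogDir moves the transaction log to a separate directory.
clientPort: the port that clients use to connect to the ZooKeeper server. ZooKeeper listens on this port and accepts client requests.
initLimit: the maximum number of ticks a Follower may take to connect to the Leader and finish its initial synchronization (the "client" here is a Follower server in the ensemble, not a user client). If the Leader has received nothing back after that many ticks, the connection is treated as failed. With the values above, the limit is 5*2000 ms = 10 s.
syncLimit: the maximum number of ticks allowed for a request and its reply between the Leader and a Follower. With the values above, the limit is 2*2000 ms = 4 s.
server.A=B:C:D: A is a number identifying the server; B is the server's IP address; C is the port this server uses to exchange information with the Leader; D is the port the servers use to talk to each other during an election, which is needed to choose a new Leader if the current one fails. In a pseudo-cluster B is the same for every instance, so each ZooKeeper instance must be given different port numbers.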
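Both limits are counted in ticks, so the actual time windows follow from the configured values (tickTime=2000, initLimit=5, syncLimit=2). The arithmetic can be sketched as below; cfg_value is a helper made up for this example, and the settings are copied into a scratch file so nothing real is read or changed.

```shell
# Throwaway copy of the timing settings used in zoo1.cfg to zoo3.cfg.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
tickTime=2000
initLimit=5
syncLimit=2
EOF

# cfg_value FILE KEY: print the value of KEY in FILE (hypothetical helper).
cfg_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

tick=$(cfg_value "$cfg" tickTime)
init_ms=$(( tick * $(cfg_value "$cfg" initLimit) ))
sync_ms=$(( tick * $(cfg_value "$cfg" syncLimit) ))

echo "initLimit window: ${init_ms} ms"   # 5 ticks of 2000 ms
echo "syncLimit window: ${sync_ms} ms"   # 2 ticks of 2000 ms
```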
[root@host serverlist]# echo 1 > server1/data/myid
[root@host serverlist]# echo 2 > server2/data/myid
[root@host serverlist]# echo 3 > server3/data/myid
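Each myid file has to contain the same number as the A in that instance's server.A entry. A small sketch that writes the files and checks them again; SERVERLIST is a name introduced here and defaults to a scratch directory rather than the real serverlist folder.

```shell
# SERVERLIST is an assumption of this sketch; set it to
# /root/zookeeper-3.4.12/serverlist to work on the real directories.
SERVERLIST="${SERVERLIST:-$(mktemp -d)}"

for i in 1 2 3; do
  mkdir -p "$SERVERLIST/server$i/data"
  echo "$i" > "$SERVERLIST/server$i/data/myid"
done

# Each file should hold exactly the id used in the matching server.N line.
for i in 1 2 3; do
  id=$(cat "$SERVERLIST/server$i/data/myid")
  if [ "$id" = "$i" ]; then
    echo "server$i: myid=$id ok"
  else
    echo "server$i: myid=$id does not match server.$i" >&2
  fi
done
```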
Start
[root@host serverlist]# ../bin/zkServer.sh start zoo1.cfg
[root@host serverlist]# ../bin/zkServer.sh start zoo2.cfg
[root@host serverlist]# ../bin/zkServer.sh start zoo3.cfg
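Check the status (which instance becomes the leader varies from run to run)
Leader: handles client write requests
Follower: handles client read requests and takes part in leader election
Observer: a special kind of Follower that can serve client read requests but does not take part in elections. An Observer does not vote on writes; it only syncs data from the Leader, so adding Observers increases read capacity.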
[root@host serverlist]# ../bin/zkServer.sh status zoo1.cfg
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.12/bin/../conf/zoo1.cfg
Mode: follower
[root@host serverlist]# ../bin/zkServer.sh status zoo2.cfg
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.12/bin/../conf/zoo2.cfg
Mode: leader
[root@host serverlist]# ../bin/zkServer.sh status zoo3.cfg
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.12/bin/../conf/zoo3.cfg
Mode: follower
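A healthy three-instance ensemble should report exactly one leader and two followers. The check can be sketched as a small filter over the status output; count_modes is a helper made up for this example, and it is fed the Mode lines shown above so that it runs without a live ensemble.

```shell
# count_modes (hypothetical helper): read zkServer.sh status output on stdin
# and print "<leaders> <followers>".
count_modes() {
  awk '/^Mode: leader/ {l++} /^Mode: follower/ {f++} END {printf "%d %d\n", l, f}'
}

# Sample input copied from the status output above.
result=$(printf '%s\n' 'Mode: follower' 'Mode: leader' 'Mode: follower' | count_modes)
echo "$result"

# Against the running instances (assumes the same working directory as above):
#   for i in 1 2 3; do ../bin/zkServer.sh status zoo$i.cfg; done | count_modes
```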
Connection test:
[root@host serverlist]# ../bin/zkCli.sh -server 192.168.53.122:2181
Connecting to 192.168.53.122:2181
2018-05-09 17:27:52,673 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2018-05-09 17:27:52,675 [myid:] - INFO [main:Environment@100] - Client environment:host.name=host
2018-05-09 17:27:52,675 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_101
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_101/jre
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/root/zookeeper-3.4.12/bin/../build/classes:/root/zookeeper-3.4.12/bin/../build/lib/*.jar:/root/zookeeper-3.4.12/bin/../lib/slf4j-log4j12-1.7.25.jar:/root/zookeeper-3.4.12/bin/../lib/slf4j-api-1.7.25.jar:/root/zookeeper-3.4.12/bin/../lib/netty-3.10.6.Final.jar:/root/zookeeper-3.4.12/bin/../lib/log4j-1.2.17.jar:/root/zookeeper-3.4.12/bin/../lib/jline-0.9.94.jar:/root/zookeeper-3.4.12/bin/../lib/audience-annotations-0.5.0.jar:/root/zookeeper-3.4.12/bin/../zookeeper-3.4.12.jar:/root/zookeeper-3.4.12/bin/../src/java/lib/*.jar:/root/zookeeper-3.4.12/bin/../conf:
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/root/hadoop/hadoop-2.7.4/lib/native/::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2018-05-09 17:27:52,677 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2018-05-09 17:27:52,678 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/root/zookeeper-3.4.12/serverlist
2018-05-09 17:27:52,679 [myid:] - INFO [main:ZooKeeper@441] - Initiating client connection, connectString=192.168.53.122:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
Welcome to ZooKeeper!
2018-05-09 17:27:52,695 [myid:] - INFO [main-SendThread(192.168.53.122:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server 192.168.53.122/192.168.53.122:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2018-05-09 17:27:52,760 [myid:] - INFO [main-SendThread(192.168.53.122:2181):ClientCnxn$SendThread@878] - Socket connection established to 192.168.53.122/192.168.53.122:2181, initiating session
[zk: 192.168.53.122:2181(CONNECTING) 0] 2018-05-09 17:27:52,810 [myid:] - INFO [main-SendThread(192.168.53.122:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 192.168.53.122/192.168.53.122:2181, sessionid = 0x100877925070001, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.53.122:2181(CONNECTED) 0]
[zk: 192.168.53.122:2181(CONNECTED) 0]
[root@host serverlist]# ../bin/zkCli.sh -server 192.168.53.122:2182
Connecting to 192.168.53.122:2182
2018-05-09 17:28:53,602 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2018-05-09 17:28:53,605 [myid:] - INFO [main:Environment@100] - Client environment:host.name=host
2018-05-09 17:28:53,605 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_101
2018-05-09 17:28:53,607 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2018-05-09 17:28:53,607 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_101/jre
2018-05-09 17:28:53,607 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/root/zookeeper-3.4.12/bin/../build/classes:/root/zookeeper-3.4.12/bin/../build/lib/*.jar:/root/zookeeper-3.4.12/bin/../lib/slf4j-log4j12-1.7.25.jar:/root/zookeeper-3.4.12/bin/../lib/slf4j-api-1.7.25.jar:/root/zookeeper-3.4.12/bin/../lib/netty-3.10.6.Final.jar:/root/zookeeper-3.4.12/bin/../lib/log4j-1.2.17.jar:/root/zookeeper-3.4.12/bin/../lib/jline-0.9.94.jar:/root/zookeeper-3.4.12/bin/../lib/audience-annotations-0.5.0.jar:/root/zookeeper-3.4.12/bin/../zookeeper-3.4.12.jar:/root/zookeeper-3.4.12/bin/../src/java/lib/*.jar:/root/zookeeper-3.4.12/bin/../conf:
2018-05-09 17:28:53,607 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/root/hadoop/hadoop-2.7.4/lib/native/::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2018-05-09 17:28:53,608 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/root/zookeeper-3.4.12/serverlist
2018-05-09 17:28:53,609 [myid:] - INFO [main:ZooKeeper@441] - Initiating client connection, connectString=192.168.53.122:2182 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
Welcome to ZooKeeper!
2018-05-09 17:28:53,625 [myid:] - INFO [main-SendThread(192.168.53.122:2182):ClientCnxn$SendThread@1028] - Opening socket connection to server 192.168.53.122/192.168.53.122:2182. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2018-05-09 17:28:53,687 [myid:] - INFO [main-SendThread(192.168.53.122:2182):ClientCnxn$SendThread@878] - Socket connection established to 192.168.53.122/192.168.53.122:2182, initiating session
[zk: 192.168.53.122:2182(CONNECTING) 0] 2018-05-09 17:28:53,736 [myid:] - INFO [main-SendThread(192.168.53.122:2182):ClientCnxn$SendThread@1302] - Session establishment complete on server 192.168.53.122/192.168.53.122:2182, sessionid = 0x200877925380002, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.53.122:2182(CONNECTED) 0]
[zk: 192.168.53.122:2182(CONNECTED) 0]
[zk: 192.168.53.122:2182(CONNECTED) 0]
Configure Hive by editing $HIVE_HOME/conf/hive-site.xml:
<property>
<name>hive.support.concurrency</name>
<value>true</value>
<description>
Whether Hive supports concurrency control or not.
A ZooKeeper instance must be up and running when using zookeeper Hive lock manager
</description>
</property>
Because this is a pseudo-distributed setup on a single host, one ZooKeeper server is enough here:
<property>
<name>hive.zookeeper.quorum</name>
<value>192.168.53.122</value>
<description>
List of ZooKeeper servers to talk to. This is needed for:
1. Read/write locks - when hive.lock.manager is set to
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
2. When HiveServer2 supports service discovery via Zookeeper.
3. For delegation token storage if zookeeper store is used, if
hive.cluster.delegation.token.store.zookeeper.connectString is not set
4. LLAP daemon registry service
</description>
</property>
Once these properties are set, Hive automatically acquires locks for certain queries.
hive> show locks;
OK
Time taken: 0.006 seconds
hive> show locks tb_cust extended;
OK
tab_name mode
Time taken: 0.023 seconds
hive> show locks tb_cust;
OK
tab_name mode
Time taken: 0.022 seconds
hive> show locks tb_cust partition(city='beijing');
OK
tab_name mode
Time taken: 0.145 seconds
hive> show locks tb_cust partition(city='beijing') extended;
OK
tab_name mode
Time taken: 0.078 seconds
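Hive provides two types of locks, and both are enabled automatically once concurrency support is turned on. Reading a table takes a shared lock, and multiple concurrent shared locks are allowed.
Operations that modify data take an exclusive lock, which blocks not only other modifications to the table but also queries from other processes.
Once an operation holds an exclusive lock on a table or partition, no other job can run concurrently against that table or partition.
For a partitioned table, taking an exclusive lock on a partition also requires a shared lock on the table itself, to prevent incompatible changes.
LOAD, like INSERT, triggers an exclusive lock.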
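The shared/exclusive rules described here can be written down as a small compatibility check. This is only an illustration of the rules as stated in the text, not Hive's lock manager; can_acquire and the mode name NONE are made up for the sketch.

```shell
# can_acquire HELD REQUESTED: succeed if a lock of mode REQUESTED may be taken
# while a lock of mode HELD (NONE, SHARED or EXCLUSIVE) is already in place.
can_acquire() {
  case "$1:$2" in
    NONE:SHARED|NONE:EXCLUSIVE|SHARED:SHARED) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the full compatibility table.
for held in NONE SHARED EXCLUSIVE; do
  for req in SHARED EXCLUSIVE; do
    if can_acquire "$held" "$req"; then verdict=granted; else verdict=waits; fi
    echo "held=$held requested=$req -> $verdict"
  done
done
```

Only the SHARED/SHARED pair is granted while another lock is held; every combination involving EXCLUSIVE has to wait.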
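Explicit locks and exclusive locks
After a table or partition is locked exclusively, other processes wait; once it is unlocked, they continue.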
hive> lock table tb_cust exclusive; -- lock the whole table
OK
Time taken: 0.123 seconds
hive> show locks tb_cust;
OK
tab_name mode
gamedw@tb_cust EXCLUSIVE
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> unlock table tb_cust;
OK
Time taken: 0.081 seconds
hive> show locks tb_cust;
OK
tab_name mode
Time taken: 0.019 seconds
hive> lock table tb_cust partition(city='beijing') exclusive; -- lock a single partition: the other partitions stay unlocked and can still be used, but operations on the whole table are blocked
OK
Time taken: 0.184 seconds
hive> show locks tb_cust;
OK
Time taken: 0.023 seconds
hive> show locks tb_cust partition(city='beijing');
OK
gamedw@tb_cust@city=beijing EXCLUSIVE
Time taken: 0.106 seconds, Fetched: 1 row(s)
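In the show locks output, the locked object appears as database@table, with @partition appended for a partition lock. A sketch of splitting such a line into its parts, for example when scripting around the output; the format is inferred from the two lines shown in this post, and parse_lock_line is a helper made up for the example.

```shell
# parse_lock_line NAME MODE (hypothetical helper): split "db@table[@partition]"
# and print the parts; "-" stands for "no partition".
parse_lock_line() {
  name=$1
  mode=$2
  db=${name%%@*}       # text before the first @
  rest=${name#*@}      # text after the first @
  case "$rest" in
    *@*) tbl=${rest%%@*}; part=${rest#*@} ;;
    *)   tbl=$rest; part=- ;;
  esac
  printf 'db=%s table=%s partition=%s mode=%s\n' "$db" "$tbl" "$part" "$mode"
}

table_lock=$(parse_lock_line 'gamedw@tb_cust' EXCLUSIVE)
part_lock=$(parse_lock_line 'gamedw@tb_cust@city=beijing' EXCLUSIVE)
echo "$table_lock"
echo "$part_lock"
```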