zoukankan      html  css  js  c++  java
  • 深入浅出zookeeper

    我的博客均是原创,转载请注名出处

    前言

    zookeeper 在分布式系统中主要提供了consensus support, 即一致性协议实现。提起一致性协议,首先想到底是大名顶顶的paxos(E.g. google's chubby), 更新的是raft(e.g. etcd), zookeeper是在paxos 之后,raft之前,它的协议名为ZAB(zookeeper atomic broadcast),笔者觉得它是汲取了paxos之精华。当然17年好像出了几篇更新的paper,因为笔者还没读,不敢妄加揣测consensus protocol的future trend,不过zookeeper因为在hadoop eco-system里的广泛应用,作为一个经典的存在,还是值得聊一聊

    安装

    wget http://www.gtlib.gatech.edu/pub/apache/zookeeper/stable/zookeeper-3.4.10.tar.gz

    export ZK_HOME=$YOUR_FAV_DIR

    cp conf/zoo_sample.cfg conf/zoo.cfg

    修改一下dataDir即可,tickTime is in milliseconds, minimum session timeout is 2 * tickTime, client port 默认2181

    make sure java installed and JAVA_HOME env variable exists. following command could use to find java home:

    $(dirname $(dirname $(readlink -f $(which javac))))

    运行

    zkServer.sh start

    (output looks like Using config: .../zoo.cfg Starting zookeeper ... STARTED)

    zookeeper pid could be got from following command:

    ps -ef | grep zookeeper | grep -v grep | awk '{print $2}'

    Or run:

    jps

    listed as QuorumPeerMain

    用java-based shell 连接:

    zkCli.sh -server localhost:2181

    停止运行

    zkServer.sh stop

    设置多点cluster, called zookeeper ensemble, the minimum recommended number is 3, 5 is most common in production env.

    server.1=zoo1:3000:4000
    server.2=zoo2:3000:4000
    server.3=zoo3:3000:4000

    为了测试的需要,也可以单机上设置多点(好多分布式数据库都是这么玩的,不然你让开发者怎么搞):

    server.1=localhost:2345:3345
    server.2=localhost:2346:3346
    server.3=localhost:2347:3347

    当然单机上略微麻烦点,需要搞3个cfg, 比如zoo1.cfg, zoo2.cfg, zoo3.cfg, 同时3个myid,  zoo1/myid, zoo2/myid, zoo3/myid,在每个cfg中dataDir都指向相应的zoo1, zoo2, zoo3,同时clientPort为2181,2182,2183, 启动的时候:

    zkCli.sh -server localhost:2181, localhost:2182, localhost:2183

     Zookeeper 总揽

    Detail: http://zookeeper.apache.org/doc/r3.4.10/zookeeperOver.html

    Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heart beats. If the TCP connection to the server breaks, the client will connect to a different server.

    Zookeeper 数据模型

    如果把zookeeper的数据模型看作是文件系统,那么zode就是其中的一个节点,有四种znode:

    • persistent
    • ephemeral
    • persistent_sequential
    • ephemeral_sequential

    persistent znode 的lifetime是知道它们被显示delete, 任何有权限删除的client都可以删除,它主要适用于存储highly available and accessible数据,比如configuration data

    ephemeral zonde visible by all clients(受ACL限制), 一旦客户session退出,就会被删除, 当然delete也可以删除,它可以适用于membership service,节点加入,删除,一目了然。

    Zookeeper watch

    zookeeper client could choose to watch a znode and get notification when there is change, E.g. clientA sets watch on /group, now if clientA does getChildren on /group, [clientA]. Now if clientB joins /group at this time, then clientA will be notified, and client A could getChildren again, now [clientA, clientB] will be returned.

    • the watch is ordered in FIFO and dispatched in order
    • watch notification delivered to client before other changes happen, although once client received, there could be other things happen there
    • order of the watch events is ordered in order seen by zookeeper

     另外session timeout是有zookeeper服务器端检测的,所以一个当一个client和一台服务器断开时,watch会被trigger,但是client如果重新加入了另一台同cluster的服务器,由于session ID,所以client的信息会重新被找回(只要不超过expire time)

    Kazoo

    zookeeper 支持java,C,python(社区支持)还有其它语言(社区支持),这里以Kazoo这个python client binding举例:

    watcher:

    import signal
    from kazoo.client import KazooClient
    from kazoo.recipe.watchers import ChildrenWatch
    
    zoo_path = '/MyPath'
    zk = KazooClient(hosts='localhost:2181')
    zk.start()
    zk.ensure_path(zoo_path)
    @zk.ChildrenWatch(zoo_path)
    def child_watch_func(children):
        print "List of Children %s" % children
    while True:
        signal.pause()

    output looks like

    /usr/bin/python child_watch.py
    List of Children []
    List of Children [u'child1']
    List of Children [u'child1', u'child2']
    List of Children [u'child1', u'child3', u'child2']
    List of Children [u'child1', u'child2']
    List of Children [u'child1']
    List of Children []
    On another terminal: WatchedEvent state:SyncConnected type:None path:null [zk: localhost:2181(CONNECTED) 0] create /MyPath/child1 "" Created /MyPath/child1 [zk: localhost:2181(CONNECTED) 1] create /MyPath/child2 "" Created /MyPath/child2 [zk: localhost:2181(CONNECTED) 2] create /MyPath/child3 "" Created /MyPath/child3 [zk: localhost:2181(CONNECTED) 3] delete /MyPath/child3 [zk: localhost:2181(CONNECTED) 4] delete /MyPath/child2 [zk: localhost:2181(CONNECTED) 5] delete /MyPath/child1

    Leader election:

    from kazoo.client import KazooClient
    import time
    import uuid
    
    import logging
    logging.basicConfig()
    
    my_id = uuid.uuid4()
    
    def leader_func():
        print "I am leader : %s" % my_id
        while True:
            print "%s is still working" % my_id
            time.sleep(3)
    
    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()
    
    election = zk.Election("/electionpath")
    
    election.run(leader_func)
    
    zk.stop()

    output looks like:

    $ /usr/bin/python leader_election.py 
    I am leader : 961165dc-c6f7-4858-a315-0be40fca1a47
    961165dc-c6f7-4858-a315-0be40fca1a47 is still working
    961165dc-c6f7-4858-a315-0be40fca1a47 is still working
    961165dc-c6f7-4858-a315-0be40fca1a47 is still working
    961165dc-c6f7-4858-a315-0be40fca1a47 is still working
    961165dc-c6f7-4858-a315-0be40fca1a47 is still working
    
    ctrl-C
    
    Then another work will become leader

    Zookeeper 应用

    Barrier 

    single barrier: to start with, a znode named say "/zk_barrier" is created, clients all watch it, once it not exists, clients do something.

    double barrier: block until N number of clients join, then do something, block until N number of clients leave, then do something

    Queue

    A znode say "queue-znode" is designated to hold a queue, producer adds item to the queue by creating child znode under queue-znode with sequential flag, consumer retrieve the item by getting and deleting a child from queue-znode

    Lock

    distributed lock, assuming there is a "/lock_node", then each client try creat EPHEMERAL_SEQUENTIAL node, and call get children, if the node someone created is the lowest number, then it's the owner of lock, the next number 需要watch 它之前紧挨着的number,如果那个number突然消失,get_all,重复这个过程

    Leader election

    和Lock道理差不多,最小的当leader

    Group membership

    /membership node, getChildren("/membership", true), get with watch, client 刚来时会注册一个ephemeral znode,这样大家都会得到通知,同时getChildren可以告诉当前的membership情况。

    2 Phase commit

    Service Discovery

    比如hosts在/services下面注册服务通过创立ephemeral node,如果是caching可以是/services/caching,如果是storage可以是/services/storage,当clients访问系统的时候,register watch在/services path上,这样新的host加入或离开就会被client知道

  • 相关阅读:
    从C#到TypeScript
    从C#到TypeScript
    从C#到TypeScript
    从C#到TypeScript
    UWP开源项目 LLQNotifier 页面间通信利器(移植EventBus)
    .net源码分析
    读读日报小布版 计划与反馈
    Pyinstaller使用
    Pyspider安装使用
    元器件资料查看网站
  • 原文地址:https://www.cnblogs.com/huashao1985/p/8328171.html
Copyright © 2011-2022 走看看