
Consistent Hashing

The Consistent Hashing Algorithm

Use Case

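Suppose we have 100 Redis data servers. When a piece of data with key 101 arrives, the server that stores it is chosen with the formula hash(i) % 100. Assuming hash(i) = i, the data lands on server 1. Now one more server is added and the formula becomes hash(i) % 101. A request for key 101 is then routed to server 0, but the data actually lives on server 1.

So at this point a large share of the data becomes unreachable.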
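
The scale of the problem can be checked with a few lines of Python. This is a minimal sketch that assumes hash(i) = i, as in the example above; the range of 10,000 integer keys and the helper name `server_for` are illustrative assumptions.

```python
# Assumption: hash(i) = i, as in the example above.
# The 10,000 integer keys are illustrative only.
def server_for(key, num_servers):
    """Pick a server with plain modulo hashing."""
    return key % num_servers

keys = range(10000)

# Count the keys whose server changes when the cluster grows from 100 to 101.
moved = sum(1 for k in keys if server_for(k, 100) != server_for(k, 101))

print("key 101: server %d -> server %d" % (server_for(101, 100), server_for(101, 101)))
print("%d of %d keys map to a different server" % (moved, len(keys)))
```

In this setup only the keys 0 to 99 keep their server; the other 9,900 are looked up on a server that does not hold them.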

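Suppose, then, that a server has been added. With persistent storage, the cluster can rehash all the data, migrate it, and recover. But that means every time a server is added or removed the cluster has to do a large amount of communication and data migration, which is very expensive. With a pure cache, the cached entries are effectively all invalidated. So what can we do?

The key problem, as we can see, is this: when the number of servers changes, old data must still resolve, under the old mapping, to the server where it actually lives, while new data is placed according to the new mapping.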

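As shown in the figure, we have four servers A, B, C and D placed on a ring that covers 0 to 2^32. For example, keys in 0 to 2^30 are stored on A, keys in 2^30 + 1 to 2^31 on B, and C and D split the rest evenly in the same way. The hash space is also 0 to 2^32: incoming data is hashed and taken modulo 2^32 to get a value K1, and K1's position on the ring determines which server it goes to. In the figure, K1 is assigned to server B. Now suppose server B fails.

If B fails and the data is persisted, recovery only requires migrating B's data to C. Data that hashes to A and D does not change at all. Likewise, when a server is added, only part of one server's data has to be migrated to the new server.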
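
The ring lookup and the effect of losing B can be sketched as follows. This is an illustrative sketch, not the implementation of any particular system: each server is represented by the upper end of its range as described above, and the helper names `ring_position` and `lookup`, the use of the first 4 bytes of an MD5 digest as the ring position, and the sample keys are all assumptions.

```python
import bisect
import hashlib

def ring_position(key):
    """Map a key to a point in [0, 2**32) (assumption: first 4 bytes of MD5)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def lookup(ring, key):
    """ring is a sorted list of (range_end, server); return the owner of key."""
    points = [point for point, _ in ring]
    idx = bisect.bisect_left(points, ring_position(key))
    return ring[idx % len(ring)][1]  # wrap around past the last point

# A owns 0..2^30, B owns 2^30+1..2^31, C and D split the rest evenly.
ring = [(2**30, "A"), (2**31, "B"), (3 * 2**30, "C"), (2**32, "D")]
ring_without_b = [(point, server) for point, server in ring if server != "B"]

keys = ["user:%d" % i for i in range(10000)]  # illustrative keys
before = {k: lookup(ring, k) for k in keys}
after = {k: lookup(ring_without_b, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print("%d of %d keys changed owner" % (len(moved), len(keys)))
print(set((before[k], after[k]) for k in moved))
```

Only keys that were on B change owner, and they all move to C, the next server on the ring; keys on A and D are untouched.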

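Consistent hashing reduces the changes to the key-to-server mapping; it does not cause the global reshuffle that hash(i) % N does.

There is another benefit. Suppose we use the UID as the hash range (the 2^32 above), and some UIDs are accessed very frequently and are concentrated on server B, so that B's load is far higher than the other servers'. This is the hot-spot problem. In that case we can add servers inside the UID range that B covers to take pressure off B.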
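
Adding a server inside B's range can be sketched in the same style. This is a sketch under assumptions: the new server E, its position in the middle of B's range, the helper names, the MD5-based ring position, and the sample UIDs are all hypothetical.

```python
import bisect
import hashlib

def ring_position(key):
    """Map a key to a point in [0, 2**32) (assumption: first 4 bytes of MD5)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def lookup(ring, key):
    """ring is a sorted list of (range_end, server); return the owner of key."""
    points = [point for point, _ in ring]
    idx = bisect.bisect_left(points, ring_position(key))
    return ring[idx % len(ring)][1]

ring = [(2**30, "A"), (2**31, "B"), (3 * 2**30, "C"), (2**32, "D")]

# Hypothetical new server E, placed in the middle of B's range (2^30, 2^31].
ring_with_e = sorted(ring + [(2**30 + 2**29, "E")])

uids = ["uid:%d" % i for i in range(10000)]  # illustrative UIDs
moves = set()
moved_count = 0
for uid in uids:
    old_owner, new_owner = lookup(ring, uid), lookup(ring_with_e, uid)
    if old_owner != new_owner:
        moves.add((old_owner, new_owner))
        moved_count += 1

print(moved_count, moves)
```

The only moves are from B to E: E takes over the lower half of B's range, and A, C and D are not involved.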

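There is actually a better solution: virtual nodes.

So far, real servers were used as the nodes placed on the 2^32 ring. Suppose there are only four servers (as in the figure above) and A holds hot data. A goes down, recovery migrates A's data to B, and B now has to carry both A's and B's data. B cannot cope and goes down too, and then C and D follow. The result is an avalanche effect.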

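Why the avalanche happens:
Without hot data, each machine carries a load of M/2 (where M is the maximum load a machine can handle), which is not a problem. But once A goes down because of its hot data and A's data is migrated to B, B's load becomes M (not even counting the hot-data traffic), so B is bound to fail. C then has to carry at least 1.5M, and so on until every server is down.
So the reason all the servers fail together is that when one machine goes down, its entire load is handed to a single machine, which triggers the avalanche.

If, after A fails, its data were spread evenly over B, C and D, each machine would only take an extra M/6, and none of them would go down (ignoring hot data).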
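
The arithmetic above can be written out explicitly. This is a small sketch in units of M, using the same assumptions as the text (four servers, each normally at M/2, hot-data traffic ignored); the variable names are illustrative.

```python
from fractions import Fraction

M = Fraction(1)   # maximum load one machine can handle (unit: M)
base = M / 2      # normal load per machine, as assumed in the text

# Without virtual nodes: a failed server's whole load goes to one successor.
b_after_a_fails = base + base              # B carries A + B
c_after_b_fails = base + b_after_a_fails   # C carries A + B + C

# With an even spread: A's load is split over the three survivors.
each_survivor = base + base / 3            # M/2 + M/6

print(b_after_a_fails, c_after_b_fails, each_survivor)
```

B ends up at M and C at 1.5M in the first case, while in the second case each survivor sits at 2M/3, below its limit.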

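This is where virtual nodes come in, as in the figure:

The ring is divided into 8 segments, and server A stores segments A1 and A2, and likewise for the other servers.
Now if server A fails, its traffic goes to C2 and D1, that is, to servers C and D, instead of all going to B as before.

The main point of virtual nodes is that when a server fails, its load is diverted to several different servers.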
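
The effect of virtual nodes can be sketched as follows. This is an illustrative sketch, not the layout from the figure: it assumes 100 virtual nodes per server, placed by hashing hypothetical labels such as "A#0", and the helper names and sample keys are also assumptions.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(key):
    """Map a key to a point in [0, 2**32) (assumption: first 4 bytes of MD5)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def build_ring(servers, vnodes_per_server):
    """Return a sorted list of (point, server), one entry per virtual node."""
    return sorted(
        (ring_position("%s#%d" % (server, n)), server)
        for server in servers
        for n in range(vnodes_per_server)
    )

def lookup(ring, key):
    """Return the server of the first virtual node at or after the key."""
    # (pos,) sorts before any (pos, server), so this finds the first point >= pos.
    idx = bisect.bisect_left(ring, (ring_position(key),))
    return ring[idx % len(ring)][1]  # wrap around past the last point

full_ring = build_ring(["A", "B", "C", "D"], 100)
ring_without_a = [(point, server) for point, server in full_ring if server != "A"]

keys = ["key:%d" % i for i in range(10000)]  # illustrative keys

# Where do A's keys go once A is removed?
takeover = Counter(lookup(ring_without_a, k) for k in keys if lookup(full_ring, k) == "A")

# Keys that were not on A must keep their owner.
others_unchanged = all(
    lookup(full_ring, k) == lookup(ring_without_a, k)
    for k in keys if lookup(full_ring, k) != "A"
)

print(takeover, others_unchanged)
```

With many virtual nodes per server, A's keys are shared among B, C and D rather than landing on a single neighbor, and keys owned by the other servers do not move.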

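Summary: consistent hashing (the scheme used in DHTs) limits the range of keys affected when servers are added or removed. This addresses the rehashing problem and, with it, load balancing in a distributed environment. If there is hot data, adding nodes splits the hot range and shifts load to other servers, bringing the cluster back to a balanced state.

Tair's load balancing uses consistent hashing.
Consistent hashing is widely used in distributed environments; wherever load balancing for distributed storage is involved, it is a good candidate solution.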

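Python's hash_ring module implements consistent hashing.

Installing hash_ring

Download hash_ring from https://pypi.python.org/pypi/hash_ring#downloads

Find the file hash_ring.py. In a Python 2.x environment it can be used as is.

In a Python 3.x environment, hash_ring.py needs to be modified as follows: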

# -*- coding: utf-8 -*-
"""
    hash_ring
    ~~~~~~~~~~~~~~
    Implements consistent hashing that can be used when
    the number of server nodes can increase or decrease (like in memcached).

    Consistent hashing is a scheme that provides a hash table functionality
    in a way that the adding or removing of one slot
    does not significantly change the mapping of keys to slots.

    More information about consistent hashing can be read in these articles:

        "Web Caching with Consistent Hashing":
            http://www8.org/w8-papers/2a-webserver/caching/paper2.html

        "Consistent hashing and random trees:
        Distributed caching protocols for relieving hot spots on the World Wide Web (1997)":
            http://citeseerx.ist.psu.edu/legacymapper?did=38148


    Example of usage::

        memcache_servers = ['192.168.0.246:11212',
                            '192.168.0.247:11212',
                            '192.168.0.249:11212']

        ring = HashRing(memcache_servers)
        server = ring.get_node('my_key')

    :copyright: 2008 by Amir Salihefendic.
    :license: BSD
"""

import math
import sys
from bisect import bisect

if sys.version_info >= (2, 5):
    import hashlib
    md5_constructor = hashlib.md5
else:
    import md5
    md5_constructor = md5.new

class HashRing(object):

    def __init__(self, nodes=None, weights=None):
        """`nodes` is a list of objects that have a proper __str__ representation.
        `weights` is dictionary that sets weights to the nodes.  The default
        weight is that all nodes are equal.
        """
        self.ring = dict()
        self._sorted_keys = []

        # Fall back to an empty list so that HashRing() builds an empty ring
        # instead of failing when iterating over None.
        self.nodes = nodes or []

        if not weights:
            weights = {}
        self.weights = weights

        self._generate_circle()

    def _generate_circle(self):
        """Generates the circle.
        """
        total_weight = 0
        for node in self.nodes:
            total_weight += self.weights.get(node, 1)

        for node in self.nodes:
            weight = 1

            if node in self.weights:
                weight = self.weights.get(node)

            # Number of digests per node, scaled by the node's weight.
            factor = math.floor((40*len(self.nodes)*weight) / total_weight)

            for j in range(0, int(factor)):
                b_key = self._hash_digest( '%s-%s' % (node, j) )

                # Each digest yields 3 points on the ring (virtual nodes).
                for i in range(0, 3):
                    key = self._hash_val(b_key, lambda x: x+i*4)
                    self.ring[key] = node
                    self._sorted_keys.append(key)

        self._sorted_keys.sort()

    def get_node(self, string_key):
        """Given a string key a corresponding node in the hash ring is returned.

        If the hash ring is empty, `None` is returned.
        """
        pos = self.get_node_pos(string_key)
        if pos is None:
            return None
        return self.ring[ self._sorted_keys[pos] ]

    def get_node_pos(self, string_key):
        """Given a string key, the position of the corresponding node
        in the sorted ring keys is returned.

        If the hash ring is empty, `None` is returned.
        """
        if not self.ring:
            return None

        key = self.gen_key(string_key)

        nodes = self._sorted_keys
        pos = bisect(nodes, key)

        # Past the last point: wrap around to the start of the ring.
        if pos == len(nodes):
            return 0
        else:
            return pos

    def iterate_nodes(self, string_key, distinct=True):
        """Given a string key it returns the nodes as a generator that can hold the key.

        The generator iterates one time through the ring
        starting at the correct position.

        if `distinct` is set, then the nodes returned will be unique,
        i.e. no virtual copies will be returned.
        """
        if not self.ring:
            yield None, None
            return

        returned_values = set()
        def distinct_filter(value):
            if str(value) not in returned_values:
                returned_values.add(str(value))
                return value

        pos = self.get_node_pos(string_key)
        for key in self._sorted_keys[pos:]:
            val = distinct_filter(self.ring[key])
            if val:
                yield val

        for i, key in enumerate(self._sorted_keys):
            if i < pos:
                val = distinct_filter(self.ring[key])
                if val:
                    yield val

    def gen_key(self, key):
        """Given a string key it returns a long value,
        this long value represents a place on the hash ring.

        md5 is currently used because it mixes well.
        """
        b_key = self._hash_digest(key)
        return self._hash_val(b_key, lambda x: x)

    def _hash_val(self, b_key, entry_fn):
        return (( b_key[entry_fn(3)] << 24)
                |(b_key[entry_fn(2)] << 16)
                |(b_key[entry_fn(1)] << 8)
                | b_key[entry_fn(0)] )

    def _hash_digest(self, key):
        m = md5_constructor()
        m.update(bytes(key, encoding='utf-8'))
        # m.digest() returns a str in Python 2.7 and bytes in Python 3.x,
        # so the Python 2 form below is replaced by list(), which yields ints.
        # return map(ord, m.digest())
        return list(m.digest())



if __name__ == '__main__':
    memcache_servers = ['192.168.0.246:11212',
                        '192.168.0.247:11212',
                        '192.168.0.249:11212']

    ring = HashRing(memcache_servers)
    server = ring.get_node('my_key')
    print(server)

hash_ring for Python 3.5


Original article: https://www.cnblogs.com/chenice/p/6875714.html