Rendezvous Hashing vs Consistent Hashing

Rendezvous Hashing

Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.

Rendezvous hashing solves the distributed hash table problem: How can a set of clients, given an object O, agree on where in a set of n sites (servers, say) to place O? Each client is to select a site independently, but all clients must end up picking the same site. This is non-trivial if we add a minimal disruption constraint, and require that only objects mapping to a removed site may be reassigned to other sites.

The basic idea is to give each site S_j a score (a weight) for each object O_i, and assign the object to the highest-scoring site. All clients first agree on a hash function h(). For object O_i, the site S_j is defined to have weight w_i,j = h(O_i, S_j). HRW assigns O_i to the site S_m whose weight w_i,m is the largest. Since h() is agreed upon, each client can independently compute the weights w_i,1, w_i,2, ..., w_i,n and pick the largest. If the goal is distributed k-agreement, the clients can independently pick the sites with the k largest hash values.
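The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `weight`, `hrw_site`, and `hrw_top_k` are hypothetical, and SHA-256 over the string `"object|site"` is only one assumed choice for the agreed hash function h().

```python
import hashlib


def weight(obj: str, site: str) -> int:
    """Stand-in for h(O_i, S_j): hash the (object, site) pair to a 64-bit integer."""
    digest = hashlib.sha256(f"{obj}|{site}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_site(obj: str, sites: list[str]) -> str:
    """Return the site with the largest weight for obj (the k = 1 case)."""
    # The site name breaks the (very unlikely) tie, so every client agrees.
    return max(sites, key=lambda s: (weight(obj, s), s))


def hrw_top_k(obj: str, sites: list[str], k: int) -> list[str]:
    """Return the k sites with the largest weights, best first."""
    return sorted(sites, key=lambda s: (weight(obj, s), s), reverse=True)[:k]


sites = ["S1", "S2", "S3", "S4"]
print(hrw_site("object-42", sites))        # the single agreed site
print(hrw_top_k("object-42", sites, k=2))  # distributed 2-agreement
```

Because the result depends only on the object identifier and the set of site identifiers, two clients that hold the site list in a different order still pick the same site.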

If a site S is removed, only the objects that mapped to S are remapped to different sites; if S is added, only the objects that now map to S move. This satisfies the minimal disruption constraint above. The HRW assignment can be computed independently by any client, since it depends only on the identifiers for the set of sites S_1, S_2, ..., S_n and the object being assigned.
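The minimal disruption claim can be checked empirically. The sketch below assumes the same hypothetical SHA-256 based weight function as a stand-in for h(); the site and object names are made up for the example.

```python
import hashlib


def weight(obj: str, site: str) -> int:
    """Stand-in for h(O_i, S_j), assumed here to be SHA-256 based."""
    digest = hashlib.sha256(f"{obj}|{site}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_site(obj: str, sites: list[str]) -> str:
    """Highest-weight site for obj; the site name breaks ties."""
    return max(sites, key=lambda s: (weight(obj, s), s))


sites = [f"site-{j}" for j in range(5)]
objects = [f"object-{i}" for i in range(1000)]

# Assignment with all five sites present.
before = {o: hrw_site(o, sites) for o in objects}

# Remove one site and recompute every assignment from scratch.
removed = "site-2"
remaining = [s for s in sites if s != removed]
after = {o: hrw_site(o, remaining) for o in objects}

# Only objects that lived on the removed site should have moved.
moved = [o for o in objects if before[o] != after[o]]
assert all(before[o] == removed for o in moved)
print(f"{len(moved)} of {len(objects)} objects were remapped")
```

The assertion holds for any hash function: if an object's highest-weight site is still present, removing another site cannot change which remaining site has the highest weight.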

HRW easily accommodates different capacities among sites. If site S_k has twice the capacity of the other sites, we simply represent S_k twice in the list, say, as S_k,1 and S_k,2. Clearly, twice as many objects will now map to S_k as to the other sites.
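As a rough illustration of representing a site several times in the list, the sketch below expands a hypothetical capacity table into entries such as `B,1` and `B,2` and maps the winning entry back to its physical site. The site names, the capacities, and the SHA-256 based weight function are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def weight(obj: str, site: str) -> int:
    """Stand-in for h(O_i, S_j), assumed here to be SHA-256 based."""
    digest = hashlib.sha256(f"{obj}|{site}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical capacities: site B can hold twice as much as A or C.
capacities = {"A": 1, "B": 2, "C": 1}

# Represent each site once per unit of capacity: (A,1), (B,1), (B,2), (C,1).
entries = [(site, r) for site, cap in capacities.items() for r in range(1, cap + 1)]


def hrw_weighted(obj: str) -> str:
    """Pick the highest-weight entry and return the physical site behind it."""
    best = max(entries, key=lambda e: (weight(obj, f"{e[0]},{e[1]}"), e))
    return best[0]


counts = Counter(hrw_weighted(f"object-{i}") for i in range(6000))
print(counts)  # B should receive roughly twice as many objects as A or C
```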

Properties

Under rendezvous hashing, clients handle site failures by picking the site that yields the next largest weight. Remapping is required only for objects currently mapped to the failed site, and as proved in [1][2], disruption is minimal. Rendezvous hashing has the following properties:
1. Low overhead: The hash function used is efficient, so overhead at the clients is very low.
2. Load balancing: Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites.
   Site capacity: Sites with different capacities can be represented in the site list with multiplicity in proportion to capacity. A site with twice the capacity of the other sites will be represented twice in the list, while every other site is represented once.
3. High hit rate: Since all clients agree on placing an object O into the same site S_O, each fetch or placement of O into S_O yields the maximum utility in terms of hit rate. The object O will always be found unless it is evicted by some replacement algorithm at S_O.
4. Minimal disruption: When a site fails, only the objects mapped to that site need to be remapped. Disruption is at the minimal possible level.
5. Distributed k-agreement: Clients can reach distributed agreement on k sites simply by selecting the top k sites in the ordering.
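The failure handling described above (fall back to the site with the next largest weight) can be sketched as a walk down the ranked site list. The `is_alive` callback is a hypothetical health check, and the SHA-256 based weight function is again only an assumed stand-in for h().

```python
import hashlib
from typing import Callable


def weight(obj: str, site: str) -> int:
    """Stand-in for h(O_i, S_j), assumed here to be SHA-256 based."""
    digest = hashlib.sha256(f"{obj}|{site}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def ranked_sites(obj: str, sites: list[str]) -> list[str]:
    """All sites ordered by decreasing weight for obj."""
    return sorted(sites, key=lambda s: (weight(obj, s), s), reverse=True)


def pick_live_site(obj: str, sites: list[str], is_alive: Callable[[str], bool]) -> str:
    """Return the highest-ranked site that the (hypothetical) health check accepts."""
    for site in ranked_sites(obj, sites):
        if is_alive(site):
            return site
    raise RuntimeError("no live site available")


sites = ["S1", "S2", "S3", "S4"]
primary = pick_live_site("object-42", sites, lambda s: True)
# Simulate a failure of the primary: the client falls back to the runner-up.
fallback = pick_live_site("object-42", sites, lambda s: s != primary)
print(primary, fallback)
```

The first k entries of the same ranking are the k sites chosen under distributed k-agreement.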
Comparison with Consistent Hashing
1. Rendezvous hashing is much simpler to understand and code.
2. Rendezvous hashing provides a very even distribution of keys across nodes, even while nodes are being added or removed. Consistent hashing can fail to provide an even distribution for small clusters (though this can be fixed to a large extent by using many virtual replicas for each node). This is the biggest advantage of Rendezvous hashing over consistent hashing.
3. A consistent hashing lookup is typically done in $O(\log N)$ time.
4. Consistent hashing requires just one hash computation per key, whereas Rendezvous hashing requires $O(N)$ hash computations per key.
5. Consistent hashing requires some fixed memory to work well (mapping nodes to virtual nodes and hashes for all the virtual nodes) whereas Rendezvous hashing doesn't require storing any additional data.
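6. Rendezvous hashing can naturally provide $k$ sites for each key (the distributed $k$-agreement property listed earlier), which is useful for replication.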
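For contrast with points 3 to 5, here is a minimal consistent hashing ring with virtual nodes. It is a sketch under stated assumptions, not the implementation used by any particular system: the class name `HashRing`, the default of 100 virtual nodes per node, and the SHA-256 based hash are all illustrative choices.

```python
import bisect
import hashlib


def h(key: str) -> int:
    """Assumed hash function: first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring that stores vnodes points per node."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # This sorted table is the extra memory consistent hashing needs.
        self._points = sorted(
            (h(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # One hash computation, then a binary search over N * vnodes points.
        idx = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("object-42"))
```

A lookup hashes the key once and binary-searches the stored points, whereas rendezvous hashing stores nothing beyond the site list but computes one hash per site for every key.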
In a nutshell, use Rendezvous hashing if:
- Your clusters are very small.
- Your clusters are very large (say thousands of nodes) and you need to keep your memory footprint low.
- You want to support replication, but don't want to implement a slightly modified consistent hashing algorithm yourself.

Apache Ignite uses Rendezvous Hashing to distribute cache data uniformly in the computing grid. Cassandra uses Consistent Hashing for replication and high availability.

Original article: https://www.cnblogs.com/codingforum/p/10316442.html