zoukankan      html  css  js  c++  java
  • HDKV: HighDimensional Similarity Query in KeyValue Stores

    文章集中于key-value store

    Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

    Stable distributions

    The hash function [8] h_{\mathbf{a},b} (\boldsymbol{\upsilon}) : 
\mathcal{R}^d
\to \mathcal{N}  maps a d dimensional vector \boldsymbol{\upsilon} onto a set of integers. Each hash function in the family is indexed by a choice of random \mathbf{a} and b where \mathbf{a} is a d dimensional vector with entries chosen independently from a stable distribution and b is a real number chosen uniformly from the range [0,r]. For a fixed \mathbf{a},b the hash function h_{\mathbf{a},b} is given by h_{\mathbf{a},b} (\boldsymbol{\upsilon}) = \left \lfloor
\frac{\mathbf{a}\cdot \boldsymbol{\upsilon}+b}{r} \right \rfloor .

    Other construction methods for hash functions have been proposed to better fit the data. [9] In particular k-means hash functions are better in practice than projection-based hash functions, but without any theoretical guarantee.

    The key idea of locality-sensitive hash (LSH) is to hash the points using several hash functions so as to ensure that, for each function, the probability of
    collision is much higher for objects which are close to each other than for those which are far apart. Then, one can determine near neighbors by hashing the
    query point and retrieving elements stored in buckets containing that point.

  • 相关阅读:
    2017年第八蓝桥杯C/C++ A组国赛 —— 第二题:生命游戏
    451. 根据字符出现频率排序
    剑指 Offer 40. 最小的k个数
    list使用详解
    STL---priority_queue
    1046. 最后一块石头的重量
    739. 每日温度
    921. 使括号有效的最少添加
    STL----stack
    173. 二叉搜索树迭代器
  • 原文地址:https://www.cnblogs.com/zhangzhang/p/2355143.html
Copyright © 2011-2022 走看看