  • A Summary of Intersection Algorithms for Inverted Lists

    http://bbs.sjtu.edu.cn/bbstcon,board,Algorithm,reid,1225812893.html

    I went through these algorithms and summarized them as follows:
    1.1 SvS and Swapping SvS
    Algorithm 1 Pseudo-code for SvS
    SvS(set, k)
    1: Sort the sets by size (|set[0]| ≤ |set[1]| ≤ . . . ≤ |set[k]|).
    2: Let the smallest set s[0] be the candidate answer set.
    3: for each set s[i], i = 1. . . k do initialize l[i] = 0.
    4: for each set s[i], i = 1. . . k do
    5:  for each element e in the candidate answer set do
    6:    search for e in s[i] in the range l[i] to |s[i]|,
    7:    and update l[i] to the last position probed in the previous step.
    8:    if e was not found then
    9:      remove e from candidate answer set,
    10:      and advance e to the next element in the answer set.
    11:    end if
    12:  end for
    13: end for
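    This is a commonly used algorithm. It first picks the two shortest sets and
    checks, one element at a time, whether each element of the first set occurs
    in the second. The Swapping_SvS variant considered by Demaine differs
    slightly: after each comparison it takes whichever set contains fewer
    elements and compares that one against the next set. This works better when,
    after the first and second sets have been compared, the second set turns out
    to be the smaller one, but experiments show that this situation is uncommon.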
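    The SvS pseudo-code above can be sketched in Python roughly as follows. This
    is a minimal sketch, not a reference implementation: the function name
    `svs` is hypothetical, each input is assumed to be a sorted list of distinct
    integers, and the search step uses the standard library's binary search
    (`bisect_left`) in place of whatever search routine a real system would use.

```python
from bisect import bisect_left


def svs(sets):
    """Intersect sorted lists of distinct integers, smallest list first."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)      # step 1: sort the sets by size
    candidates = list(ordered[0])        # step 2: smallest set is the candidate answer
    for other in ordered[1:]:
        lo = 0                           # l[i]: last position probed in `other`
        kept = []
        for e in candidates:
            # Search only in other[lo:], since candidates are visited in order.
            lo = bisect_left(other, e, lo)
            if lo == len(other):
                break                    # remaining candidates exceed all of `other`
            if other[lo] == e:
                kept.append(e)           # e survives; otherwise it is dropped
        candidates = kept
    return candidates


print(svs([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9], [0, 3, 9, 10, 12]]))  # prints [3, 9]
```

    Carrying `lo` forward between candidates is what keeps each pass from
    rescanning the part of the larger list that has already been ruled out.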
    
    1.2 Small Adaptive
    Algorithm 2 Pseudo-code for Small_Adaptive
    Small_Adaptive(set, k)
    1: while no set is empty do
    2:   Sort the sets by increasing number of remaining elements.
    3:   Pick an eliminator e = set[0][0] from the smallest set.
    4:   elimset ← 1.
    5:   repeat
    6:     search for e in set[elimset].
    7:     increment elimset;
    8:   until elimset = k or e is not found in set[elimset]
    9:   if elimset = k then
    10:     add e to answer.
    11:   end if
    12: end while
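    This is a hybrid algorithm that combines the strengths of SvS and Adaptive.
    Its distinguishing feature is that it sorts the sets by the number of
    elements that have not been examined yet, then compares the set with the
    fewest unexamined elements against the one with the second fewest. Once it
    finds an element common to both, it searches for that element in the
    remaining sets. The algorithm stops as soon as any set is exhausted.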
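    A rough Python sketch of Small_Adaptive is given here to make the
    re-sorting step concrete. The name `small_adaptive` and the `[list, start]`
    bookkeeping are assumptions made for illustration; the inputs are assumed to
    be sorted lists of distinct integers, and binary search stands in for the
    search step.

```python
from bisect import bisect_left


def small_adaptive(sets):
    """Intersect sorted lists of distinct integers, re-sorting by remaining size."""
    if not sets:
        return []
    # Each entry is [list, start]; elements before `start` are already eliminated.
    state = [[s, 0] for s in sets]
    answer = []
    while all(start < len(s) for s, start in state):     # no set is empty
        # Fewest remaining elements first.
        state.sort(key=lambda entry: len(entry[0]) - entry[1])
        smallest = state[0]
        e = smallest[0][smallest[1]]                      # eliminator
        smallest[1] += 1
        found_everywhere = True
        for entry in state[1:]:
            s, start = entry
            pos = bisect_left(s, e, start)
            if pos < len(s) and s[pos] == e:
                entry[1] = pos + 1                        # consume e in this set
            else:
                entry[1] = pos                            # drop everything smaller than e
                found_everywhere = False
                break
        if found_everywhere:
            answer.append(e)
    return answer


print(small_adaptive([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9], [0, 3, 9, 10, 12]]))  # prints [3, 9]
```

    Because a failed search still discards every element smaller than the
    eliminator, the relative sizes of the sets can change from one round to the
    next, which is why the sort is repeated inside the loop.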
    
    1.3 Sequential and Random Sequential
    Algorithm 3 Pseudo-code for Sequential
    Sequential(set, k)
    1: Choose an eliminator e = set[0][0], in the set elimset ← 0.
    2: Consider the first set, i ← 1.
    3: while the eliminator e ≠ ∞ do
    4:   search in set[i] for e
    5:   if the search found e then
    6:     increase the occurrence counter.
    7:     if the value of occurrence counter is k then output e end if
    8:   end if
    9:   if the value of the occurrence counter is k, or e was not found then
        /* the counter has reached k, or e was not found */
    10:     update the eliminator to e ← set[i][succ(e)].
        /* set e to the next value in the current set */
    11:   end if
    12:   Consider the next set in cyclic order i ← i + 1 mod k.
         /* move to the next set cyclically */
    13: end while
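    Introduced by Barbay and Kenyon, this algorithm is well suited to instances
    whose difficulty is measured by their non-deterministic complexity. Every
    lookup inside a set uses a fast search (e.g. galloping search).
    RSequential differs from Sequential only in how the next set is chosen:
    Sequential takes the next set in cyclic order, while RSequential picks a
    set at random.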
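    The cyclic variant can be sketched in Python as follows. This is an
    illustrative sketch under a few assumptions: the name `sequential` is
    hypothetical, the inputs are sorted lists of distinct integers, binary
    search replaces the fast search, and the occurrence counter is reset to 1
    whenever a new eliminator is chosen (the pseudo-code leaves the reset
    implicit).

```python
from bisect import bisect_left


def sequential(sets):
    """Intersect sorted lists of distinct integers by cycling through them."""
    k = len(sets)
    if k == 0 or any(len(s) == 0 for s in sets):
        return []
    if k == 1:
        return list(sets[0])
    pos = [0] * k            # per-set lower bound for the next search
    answer = []
    e = sets[0][0]           # eliminator taken from the first set
    pos[0] = 1
    count = 1                # number of sets known to contain e
    i = 1
    while True:
        s = sets[i]
        p = bisect_left(s, e, pos[i])
        found = p < len(s) and s[p] == e
        if found:
            count += 1
            pos[i] = p + 1
            if count == k:
                answer.append(e)
        else:
            pos[i] = p
        if count == k or not found:
            # New eliminator: the successor of e in the current set.
            if pos[i] == len(s):
                return answer            # successor would be "infinity"
            e = s[pos[i]]
            pos[i] += 1
            count = 1
        i = (i + 1) % k                  # next set in cyclic order


print(sequential([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9], [0, 3, 9, 10, 12]]))  # prints [3, 9]
```

    Since a new eliminator always comes from the set that was just searched,
    the next k - 1 steps of the cycle visit exactly the other sets, so a counter
    value of k means that e occurs in every set.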
    
    1.4 Baeza-Yates and Baeza-Yates Sorted 
    Algorithm 4 Pseudo-code for BaezaYates
    BaezaYates(set, k)
    1: Sort the sets by size (|set[0]| ≤ |set[1]| ≤ . . . ≤ |set[k]|).
    2: Let the smallest set set[0] be the candidate answer set.
    3: for each set set[i], i = 1. . . k do
    4:   candidate ← BYintersect(candidate, set[i], 0, |candidate| − 1, 0,|set[i]| − 1)
    5:   sort the candidate set.
    6: end for
    
    BYintersect(setA, setB, minA, maxA, minB, maxB)
    1: if setA or setB is empty then return ∅ end if.
    2: Let m = (minA + maxA)/2 and let medianA be the element at setA[m].
    3: Search for medianA in setB.
    4: if medianA was found then
    5:   add medianA to result.
    6: end if
    7: Let r be the insertion rank of medianA in setB.
    8: Solve the intersection recursively on both sides of r and m in each set.
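    This method was proposed by Baeza-Yates (co-author of the well-known book
    Modern Information Retrieval). It is based on divide and conquer: take the
    median element of the shorter set and search for it in the longer set. This
    splits both the shorter and the longer set into two parts, and the search
    then recurses into each pair of parts. Note that the intersection produced
    from two sets in this way is unordered, so it has to be sorted before it is
    intersected with the third set.
    Baeza-Yates Sorted improves on this by storing the common elements in order:
    the median element of a segment is stored only after the matches found in
    the first half of that segment have been stored. This saves the cost of
    re-sorting the intersection afterwards, at the price of more bookkeeping
    during the recursion.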
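    The two variants can be illustrated with one Python sketch. The names
    `by_intersect` and `baeza_yates` and the `in_order` flag are assumptions
    made for this example; the inputs are assumed to be sorted lists of distinct
    integers. As a simplification, the median is always taken from the first
    argument, rather than from whichever remaining segment is shorter.

```python
from bisect import bisect_left


def by_intersect(a, b, min_a, max_a, min_b, max_b, result, in_order=False):
    """Append the common elements of a[min_a..max_a] and b[min_b..max_b] to result."""
    if min_a > max_a or min_b > max_b:
        return                                        # one side is empty
    m = (min_a + max_a) // 2
    median = a[m]
    r = bisect_left(b, median, min_b, max_b + 1)      # insertion rank of median in b
    found = r <= max_b and b[r] == median
    if found and not in_order:
        result.append(median)                         # plain variant: emit immediately
    by_intersect(a, b, min_a, m - 1, min_b, r - 1, result, in_order)
    if found and in_order:
        result.append(median)                         # sorted variant: emit after left half
    by_intersect(a, b, m + 1, max_a, r + 1 if found else r, max_b, result, in_order)


def baeza_yates(sets, in_order=False):
    """Intersect sorted lists of distinct integers pairwise, smallest first."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)
    candidate = list(ordered[0])
    for other in ordered[1:]:
        result = []
        by_intersect(candidate, other, 0, len(candidate) - 1,
                     0, len(other) - 1, result, in_order)
        # The plain variant needs a sort before the next round.
        candidate = result if in_order else sorted(result)
    return candidate


unordered = []
by_intersect([1, 2, 3], [1, 2, 3], 0, 2, 0, 2, unordered)
print(unordered)  # prints [2, 1, 3]
print(baeza_yates([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9], [0, 3, 9, 10, 12]], in_order=True))  # prints [3, 9]
```

    The only difference between the two modes is where the median is appended
    relative to the recursive calls, which is the extra bookkeeping that lets
    the sorted variant skip the sort.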
    
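    1.5 Summary
    All of the algorithms above have linear time complexity in the worst case.
    BaezaYates, So_BaezaYates, Small_Adaptive and SvS have a clear advantage
    when the sets differ in size. Small_Adaptive is the only one that gains a
    further advantage when the set sizes change dynamically as the algorithm
    removes elements from the sets. Sequential and RSequential are insensitive
    to the set sizes.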
    
    
  • Original article: https://www.cnblogs.com/bonelee/p/6593655.html