  • Find the K largest numbers in an array

    Recently I have been studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort: on each of K passes, select the largest remaining number in the array. The pseudocode below shows the K-smallest variant, which returns the k-th smallest element; reversing the comparison gives the K largest. The complexity is O(kn).

     function select(list[1..n], k)
         for i from 1 to k
             minIndex = i
             minValue = list[i]
             for j from i+1 to n
                 if list[j] < minValue
                     minIndex = j
                     minValue = list[j]
             swap list[i] and list[minIndex]
         return list[k]
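
    The pseudocode above maps almost line for line onto C++. The following is a minimal sketch, not the implementation used later in this post: the name SelectKthSmallest is hypothetical, k is 1-based as in the pseudocode, and the caller is assumed to pass 1 <= k <= list.size().

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the k-th smallest value (k is 1-based, as in
// the pseudocode). Assumes 1 <= k <= list.size(). The list is taken by value
// so the caller's data is left untouched.
template <typename T>
T SelectKthSmallest(std::vector<T> list, std::size_t k)
{
    for (std::size_t i = 0; i < k; ++i) {
        // Find the smallest element in list[i..n-1].
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < list.size(); ++j) {
            if (list[j] < list[minIndex])
                minIndex = j;
        }
        // Move it into position i; list[0..i] is now sorted ascending.
        std::swap(list[i], list[minIndex]);
    }
    return list[k - 1];
}
```

    For example, calling SelectKthSmallest(std::vector<int>{5, 1, 4, 2, 3}, 2) returns 2, the second smallest value.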

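    The C++ implementation, which selects the K largest values and also reports their positions in the input, is:
    #include <algorithm>
    #include <numeric>
    #include <vector>

    // Returns the K largest values in descending order. vecIndex receives the
    // position of each returned value in vecInput.
    template<typename T>
    std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
    {
        // Asking for more items than exist returns all of them, sorted.
        K = std::min(K, vecInput.size());

        std::vector<T> vecLocal(vecInput);
        // vecOrigin[i] is the original position of the value now at vecLocal[i].
        // Without it, earlier swaps would make the reported indices wrong.
        std::vector<int> vecOrigin(vecInput.size());
        std::iota(vecOrigin.begin(), vecOrigin.end(), 0);

        std::vector<T> vecResult;
        vecIndex.clear();
        for (size_t k = 0; k < K; ++k)
        {
            // Find the largest value in vecLocal[k..n-1].
            size_t maxIndex = k;
            for (size_t i = k + 1; i < vecLocal.size(); ++i) {
                if (vecLocal[i] > vecLocal[maxIndex])
                    maxIndex = i;
            }
            if (maxIndex != k) {
                std::swap(vecLocal[maxIndex], vecLocal[k]);
                std::swap(vecOrigin[maxIndex], vecOrigin[k]);
            }
            vecResult.push_back(vecLocal[k]);
            vecIndex.push_back(vecOrigin[k]);
        }
        return vecResult;
    }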

    When N is very large, for example N > 200,000, and K is larger than 20, the above algorithm becomes time-consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:

    1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)

    2) For each element, after the kth element (arr[k] to arr[n-1]), compare it with root of MH.
    ……a) If the element is greater than the root then make it root and call heapify for MH
    ……b) Else ignore it.
    Step 2 is O((n-k)*logk) in the worst case, because each replacement of the root costs one O(logk) heapify.

    3) Finally, MH has k largest elements and root of the MH is the kth largest element.

    Time Complexity: O(k + (n-k)Logk) without sorted output. If sorted output is needed then O(k + (n-k)Logk + kLogk).
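
    The three steps above can be sketched compactly with std::priority_queue from the standard library. This is an illustration only, under a few assumptions: the function name LargestKWithMinHeap is hypothetical, the element type is fixed to int, and the result comes back in ascending order because a min heap pops its smallest element first. Unlike the implementation in this post, it does not track the original indices.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the k largest values of arr in ascending order.
std::vector<int> LargestKWithMinHeap(const std::vector<int> &arr, std::size_t k)
{
    if (k == 0 || arr.empty())
        return {};
    if (k > arr.size())
        k = arr.size();

    // Step 1: build a min heap from the first k elements.
    // std::greater makes the smallest element the top of the queue.
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        minHeap(arr.begin(), arr.begin() + k);

    // Step 2: an element larger than the current minimum replaces it.
    for (std::size_t i = k; i < arr.size(); ++i) {
        if (arr[i] > minHeap.top()) {
            minHeap.pop();
            minHeap.push(arr[i]);
        }
    }

    // Step 3: the heap now holds the k largest values; its top is the
    // k-th largest. Popping everything yields them in ascending order.
    std::vector<int> result;
    result.reserve(k);
    while (!minHeap.empty()) {
        result.push_back(minHeap.top());
        minHeap.pop();
    }
    return result;
}
```

    For the input {3, 9, 1, 7, 5, 8} with k = 3, the heap evolves from {1, 3, 9} to {3, 7, 9}, {5, 7, 9} and finally {7, 8, 9}.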

    The C++ implementation of the method is as below:

    #include <algorithm>
    #include <utility>
    #include <vector>

    // Sift the node at index i down until the subtree rooted at i satisfies
    // the min-heap property. n is the size of the heap. vecIndex is permuted
    // together with vecInput so that each value keeps its original position.
    template<typename T>
    void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
    {
        int smallestIndex = i;  // Initialize smallest as root
        int l = 2 * i + 1;  // left = 2*i + 1
        int r = 2 * i + 2;  // right = 2*i + 2

        // If left child is smaller than root
        if (l < n && vecInput[l] < vecInput[smallestIndex])
            smallestIndex = l;

        // If right child is smaller than smallest so far
        if (r < n && vecInput[r] < vecInput[smallestIndex])
            smallestIndex = r;

        // If smallest is not root
        if (smallestIndex != i)
        {
            std::swap(vecInput[i], vecInput[smallestIndex]);
            std::swap(vecIndex[i], vecIndex[smallestIndex]);

            // Recursively heapify the affected sub-tree
            heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
        }
    }

    // Returns the K largest values in descending order. vecIndex receives the
    // position of each returned value in vecInput.
    template<typename T>
    std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
    {
        vecIndex.clear();
        // Asking for more items than exist returns all of them, sorted.
        K = std::min(K, vecInput.size());
        if (K == 0)
            return std::vector<T>();

        const int heapSize = static_cast<int>(K);
        // Seed the heap with the first K values and remember their positions.
        std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
        for (int i = 0; i < heapSize; ++i) vecIndex.push_back(i);

        // Build the min heap bottom-up in O(K).
        for (int K1 = heapSize / 2 - 1; K1 >= 0; --K1)
            heapifyMinToRoot(vecResult, heapSize, K1, vecIndex);

        for (size_t i = K; i < vecInput.size(); ++i) {
            if (vecInput[i] > vecResult[0]) {
                vecResult[0] = vecInput[i];
                vecIndex[0] = static_cast<int>(i);

                // Only the root changed, so a single sift-down from the root
                // restores the heap in O(log K). Rebuilding the whole heap
                // here would cost O(K) per replacement.
                heapifyMinToRoot(vecResult, heapSize, 0, vecIndex);
            }
        }
        // Heap sort: move the current minimum to the end of the shrinking
        // heap, which leaves the values in descending order.
        for (int k = heapSize - 1; k >= 0; --k)
        {
            std::swap(vecResult[k], vecResult[0]);
            std::swap(vecIndex[k], vecIndex[0]);

            heapifyMinToRoot(vecResult, k, 0, vecIndex);
        }

        return vecResult;
    }

    Here is the code to test these two methods.

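    #include <chrono>
    #include <iostream>
    #include <random>
    #include <vector>

    void SelectionAlgorithmBenchMark()
    {
        const int N = 200000;
        const int K = 20;
        std::vector<int> vecInput;

        // Fixed seed so that every run benchmarks the same input.
        std::minstd_rand0 generator(1000);
        for (int i = 0; i < N; ++i)
        {
            int nValue = generator();
            vecInput.push_back(nValue);
        }

        // CStopWatch, used originally, is a custom timer class that is not
        // shown here; std::chrono::steady_clock is used in its place.
        typedef std::chrono::steady_clock Clock;
        typedef std::chrono::duration<double, std::milli> Milliseconds;

        std::vector<int> vecResult, vecIndex;
        Clock::time_point start = Clock::now();
        vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
        std::cout << "Standard algorithm SelectLargestKItem takes " << Milliseconds(Clock::now() - start).count() << " ms" << std::endl;
        for (int k = 0; k < K; ++k)
        {
            std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
        }
        std::cout << std::endl;

        start = Clock::now();
        vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
        std::cout << "Heap algorithm SelectLargestKItemHeap takes " << Milliseconds(Clock::now() - start).count() << " ms" << std::endl;
        for (int k = 0; k < K; ++k)
        {
            std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
        }
    }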
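
    The benchmark only prints the results, so agreement between the two methods has to be checked by eye. One possible cross-check, which is not part of the original test, is to compare the returned values against the standard library. The sketch below assumes int elements, and the name ReferenceLargestK is hypothetical; it uses std::partial_sort_copy with std::greater to produce the K largest values in descending order, the same order both functions above return.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the K largest values of vecInput in
// descending order, computed by the standard library.
std::vector<int> ReferenceLargestK(const std::vector<int> &vecInput, std::size_t K)
{
    // The output range size decides how many elements are kept.
    std::vector<int> vecResult(std::min(K, vecInput.size()));
    std::partial_sort_copy(vecInput.begin(), vecInput.end(),
                           vecResult.begin(), vecResult.end(),
                           std::greater<int>());
    return vecResult;
}
```

    Comparing its return value with vecResult using operator== would flag any disagreement in the selected values. It does not check the reported indices.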

    With N = 200000 and K = 20, the first method took 353 ms and the second took 31 ms, a difference of more than 10 times.

  • Original article: https://www.cnblogs.com/shengguang/p/6110158.html