zoukankan      html  css  js  c++  java
  • *Top K Frequent Elements

    Given a non-empty array of integers, return the k most frequent elements.

    For example,
    Given [1,1,1,2,2,3] and k = 2, return [1,2].

    Note: 

      • You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
      • Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

    Solution 1: HashMap + Heap

    This solution is very straight forward, it stores all the elements in a HashMap that maps elements to their frequencies, then inserts all the map entry to a max heap so as to get the ones with the highest frequency. The time complexity is O(nlogn) and space complexity is O(n).

    public List<Integer> topKFrequent(int[] nums, int k) {
        Map<Integer, Integer> map = new HashMap<>();
        for(int n : nums){
            map.put(n, map.getOrDefault(n,0) + 1);
        }
    
        PriorityQueue<Map.Entry<Integer, Integer>> maxHeap =
                         new PriorityQueue<>((a,b) -> (b.getValue() - a.getValue()));
        for(Map.Entry<Integer,Integer> entry : map.entrySet()){
            maxHeap.add(entry);
        }
    
        List<Integer> res = new ArrayList<>();
        while(res.size()<k){
            Map.Entry<Integer, Integer> entry = maxHeap.poll();
            res.add(entry.getKey());
        }
        return res;
    }

    Solution 2: Bucket Sort

    Since sorting takes O(nlogn), can we accelerate the sorting process by using extra space? The answer is yes, bucket sort.

    bucketSort(arr[], n)
    1) Create n empty buckets (Or lists).
    2) Do following for every array element arr[i].
    .......a) Insert arr[i] into bucket[n*array[i]]
    3) Sort individual buckets using insertion sort.
    4) Concatenate all sorted buckets.

    public List<Integer> topKFrequent(int[] nums, int k) {
        List<Integer>[] bucket = new List[nums.length + 1];
        Map<Integer, Integer> frequencyMap = new HashMap<>();
        for (int n : nums) {
            frequencyMap.put(n, frequencyMap.getOrDefault(n, 0) + 1);
        }
    
        for (int key : frequencyMap.keySet()) {
            int frequency = frequencyMap.get(key);
            if (bucket[frequency] == null) {
                bucket[frequency] = new ArrayList<>();
            }
            bucket[frequency].add(key);
        }
    
        List<Integer> res = new ArrayList<>();
        for (int pos = bucket.length - 1; pos >= 0 && res.size() < k; pos--) {
            if (bucket[pos] != null) {
                res.addAll(bucket[pos]);
            }
        }
        return res;
    }

    Top K Frequent Elements in a Data Stream (System Design)

    If we replace the input array with a data stream, it becomes harder for a single machine to store everything in a single HashMap, it’s also impossible to perform sorting for such large data set. Let’s see what can we do if we can use multi machines to handle this work.

    Solution 1: Multi HashMap + heap

    We can split the data into n shardings and move a record to a instance respectively. Instance0 only handles the data with its hash code equals hash(item) % n == 0Instance1 only handles the data with hash(item) % n == 1, etc.

    For each instance, we have a HashMap and Heap to store the items as in the first solution for the leetcode question.

    In order to retrieve the global top k elements, we only need to gather the heaps from all the instances and run a merge sort, it’s very similar to the leetcode question: 23. Merge k Sorted Lists.

    Even though we distribute the work load to multiple machines to reduce the data size to 1/n, it’s still not the optimal way to handle large data set in daily work.

    See more details at: 

    https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

    reference: https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

  • 相关阅读:
    基于C/S模式的程序更新下载
    拓展方法
    主从级GridView
    using 的用法
    五大数据类型
    redis的主从复制原理及步骤
    redis的发布订阅
    redis的持久化
    redis的事务
    配置文件详解
  • 原文地址:https://www.cnblogs.com/hygeia/p/5759512.html
Copyright © 2011-2022 走看看