zoukankan      html  css  js  c++  java
  • [LeetCode] 1093. Statistics from a Large Sample 大样本统计


    You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample.

    Calculate the following statistics:

    • minimum: The minimum element in the sample.
    • maximum: The maximum element in the sample.
    • mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.
    • median:
      • If the sample has an odd number of elements, then the median is the middle element once the sample is sorted.
      • If the sample has an even number of elements, then the median is the average of the two middle elements once the sample is sorted.
    • mode: The number that appears the most in the sample. It is guaranteed to be unique.

    Return the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within 10-5 of the actual answer will be accepted.

    Example 1:

    Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    Output: [1.00000,3.00000,2.37500,2.50000,3.00000]
    Explanation: The sample represented by count is [1,2,2,2,3,3,3,3].
    The minimum and maximum are 1 and 3 respectively.
    The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375.
    Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5.
    The mode is 3 as it appears the most in the sample.
    

    Example 2:

    Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    Output: [1.00000,4.00000,2.18182,2.00000,1.00000]
    Explanation: The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4].
    The minimum and maximum are 1 and 4 respectively.
    The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182).
    Since the size of the sample is odd, the median is the middle element 2.
    The mode is 1 as it appears the most in the sample.
    

    Constraints:

    • count.length == 256
    • 0 <= count[i] <= 109
    • 1 <= sum(count) <= 109
    • The mode of the sample that count represents is unique.

    这道题说是有很多在0到 255 中的整数,由于重复的数字太多了,所以这里采用的是统计每个数字出现的个数的方式,用数组 count 来表示,其中 count[i] 表示数字i出现的次数。现在让统计原始数组中的最大值,最小值,平均值,中位数,和众数。这里面的最大最小值很好求,最小值就是 count 数组中第一个不为0的位置,最大值就是 count 数组中最后一个不为0的位置。最小值 mn 初始化为 256,在遍历 count 数组的过程中,遇到不为0的数字时,若此时 mn 为 256,则更新为坐标i。最大值 mx 直接每次更新为值不为0的坐标i即可。平均值也好求,只要求出所有的数字之和,跟数字的个数相除就行了,注意由于数字之和可能很大,需要用 double 来表示。众数也不难求,只要找出 count 数组中的最大值,则其坐标就是众数。比较难就是中位数了,由于数组的个数可奇可偶,中位数的求法不同,这里为了统一,采用一个小 trick,比如数组 1,2,3 和 1,2,3,4,可以用坐标为 (n-1)/2 和 n/2 的两个数字求平均值得到,对于长度为奇数的数组,这两个坐标表示的是相同的数字。这里由于是统计数组,所以要找的两个位置是 (cnt+1)/2 和 (cnt+2)/2,其中 cnt 是所有数字的个数。再次遍历 count 数组,使用 cur 来累计当前经过的数字个数,若 cur 小于 first,且 cur 加上 count[i] 大于等于 first,说明当前数字i即为所求,加上其的一半到 median。同理,若 cur 小于 second,cur 加上 count[i] 大于等于 second,说明当前数字i即为所求,加上其的一半到 median 即可,参见代码如下:


    class Solution {
    public:
        vector<double> sampleStats(vector<int>& count) {
            double mn = 256, mx = 0, mean = 0, median = 0, sum = 0;
            int cnt = 0, mode = 0;
            for (int i = 0; i < count.size(); ++i) {
                if (count[i] == 0) continue;
                if (mn == 256) mn = i;
                mx = i;
                sum += (double)i * count[i];
                cnt += count[i];
                if (count[i] > count[mode]) mode = i;
            }
            mean = sum / cnt;
            int first = (cnt + 1) / 2, second = (cnt + 2) / 2, cur = 0;
            for (int i = 0; i < count.size(); ++i) {
                if (cur < first && cur + count[i] >= first) median += i / 2.0;
                if (cur < second && cur + count[i] >= second) median += i / 2.0;
                cur += count[i];
            }
            return {mn, mx, sum / cnt, median, (double)mode};
        }
    };
    

    Github 同步地址:

    https://github.com/grandyang/leetcode/issues/1093


    参考资料:

    https://leetcode.com/problems/statistics-from-a-large-sample/

    https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317626/Python-Solution

    https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317857/Java-Simple-2-pass-code-w-comments-and-explanation.


    LeetCode All in One 题目讲解汇总(持续更新中...)

  • 相关阅读:
    第二节. SignalR开篇以及如何指定传输协议
    第一节:.Net版基于WebSocket的聊天室样例
    Maven常用dependency记录
    Linux学习笔记
    Maven 学习笔记
    SQL脚本去重分组统计
    版本号定义
    C# Random循环生成随机数重复问题解决方案
    C#通过Oracle.ManagedDataAccess无法访问Oralce
    Java注解的使用,类似于C#的Attribute
  • 原文地址:https://www.cnblogs.com/grandyang/p/14606594.html
Copyright © 2011-2022 走看看