  • statistics of Python

    statistics

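    The statistics module supports plain int and float values, as well as the Decimal and Fraction wrapper types.

    All input values should be of the same type.

    The statistical functions fall into two groups:

    (1) Averages and measures of central location -- mean and median.

    (2) Measures of spread -- variance and standard deviation.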

    https://docs.python.org/3.7/library/statistics.html

    This module provides functions for calculating mathematical statistics of numeric (Real-valued) data.

    Note

    Unless explicitly noted otherwise, these functions support int, float, decimal.Decimal and fractions.Fraction. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Mixed types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use map() to ensure a consistent result, e.g. map(float, input_data).
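
As a minimal sketch of the note above, the snippet below calls mean() on each supported type and then normalizes a mixed list with map(float, ...). The sample values and variable names are made up for illustration.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Each list holds a single numeric type, so the result keeps that type.
frac_mean = mean([Fraction(1, 2), Fraction(3, 4)])
dec_mean = mean([Decimal("0.5"), Decimal("0.75")])
print(frac_mean)  # 5/8
print(dec_mean)   # 0.625

# Hypothetical mixed-type input: convert everything to float first,
# as the documentation suggests, so the behaviour is well defined.
mixed = [1, 2.5, Fraction(1, 2)]
uniform = list(map(float, mixed))
uniform_mean = mean(uniform)
print(uniform_mean)
```

Converting to float trades the exactness of Fraction and Decimal for consistency, so it fits best when the inputs are already approximate.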

    Averages and measures of central location

    These functions calculate an average or typical value from a population or sample.

    mean()

    Arithmetic mean (“average”) of data.

    harmonic_mean()

    Harmonic mean of data.

    median()

    Median (middle value) of data.

    median_low()

    Low median of data.

    median_high()

    High median of data.

    median_grouped()

    Median, or 50th percentile, of grouped data.

    mode()

    Mode (most common value) of discrete data.
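
The functions listed above can be tried on a small data set. This is only an illustrative sketch: the lists below are invented, and the first one was chosen to have a single most common value, because Python 3.7's mode() raises StatisticsError when there is a tie.

```python
from statistics import (harmonic_mean, mean, median, median_grouped,
                        median_high, median_low, mode)

# Hypothetical data with an even number of items and one clear mode.
data = [1, 2, 2, 3, 4, 8]

results = {
    'mean': mean(data),                # sum divided by count
    'median': median(data),            # average of the two middle values
    'median_low': median_low(data),    # lower of the two middle values
    'median_high': median_high(data),  # higher of the two middle values
    'mode': mode(data),                # most common value
}
for name, value in results.items():
    print('{:12s}: {}'.format(name, value))

# The harmonic mean suits rates, e.g. two legs driven at 40 and 60 km/h.
avg_speed = harmonic_mean([40, 60])
print('harmonic    :', avg_speed)

# median_grouped treats each value as the midpoint of a class interval.
grouped = median_grouped([52, 52, 53, 54])
print('grouped     :', grouped)
```

Unlike median(), median_low() and median_high() always return a value that actually occurs in the data.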

    Measures of spread

    These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.

    pstdev()

    Population standard deviation of data.

    pvariance()

    Population variance of data.

    stdev()

    Sample standard deviation of data.

    variance()

    Sample variance of data.
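
A short sketch of the four spread functions on one invented data set. The values are chosen so that the population figures come out as round numbers; the sample figures are larger because they divide by n - 1 instead of n.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # 32 / 8
pop_sd = pstdev(data)       # square root of the population variance
samp_var = variance(data)   # 32 / 7
samp_sd = stdev(data)       # square root of the sample variance

print('pvariance:', pop_var)
print('pstdev   :', pop_sd)
print('variance :', samp_var)
print('stdev    :', samp_sd)
```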

    DEMO

    https://pymotw.com/3/statistics/index.html

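    Variance and standard deviation measure how dispersed the data is.

    The smaller the value, the more tightly the data clusters around the mean.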

    Statistics uses two values to express how dispersed a set of values is relative to the mean. The variance is the average of the squared differences between each value and the mean, and the standard deviation is the square root of the variance (which is useful because taking the square root allows the standard deviation to be expressed in the same units as the input data). Large values for variance or standard deviation indicate that a set of data is dispersed, while small values indicate that the data is clustered closer to the mean.
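
To make the definition concrete, the sketch below computes the population variance and standard deviation by hand and compares them with pvariance() and pstdev(). The data values and variable names are illustrative only, not part of the original demo.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical measurements.
data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]

n = len(data)
mu = sum(data) / n                              # arithmetic mean
squared_diffs = [(x - mu) ** 2 for x in data]   # squared distance from the mean
manual_variance = sum(squared_diffs) / n        # average squared distance
manual_stdev = math.sqrt(manual_variance)       # back in the units of the data

print('manual variance:', manual_variance)
print('manual stdev   :', manual_stdev)
print('matches module :',
      math.isclose(manual_variance, pvariance(data)),
      math.isclose(manual_stdev, pstdev(data)))
```

The hand-rolled version uses plain float arithmetic; the module takes more care with rounding, so for real data it is usually the safer choice.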

    from statistics import mean, pstdev, pvariance, stdev, variance
    import subprocess
    
    
    def get_line_lengths():
        # Count the lines of every example file with wc (needs a POSIX shell).
        cmd = 'wc -l ../[a-z]*/*.py'
        out = subprocess.check_output(
            cmd, shell=True).decode('utf-8')
        for line in out.splitlines():
            parts = line.split()
            # The last line of wc output is the grand total; stop there.
            if parts[1].strip().lower() == 'total':
                break
            nlines = int(parts[0].strip())
            if not nlines:
                continue  # skip empty files
            yield (nlines, parts[1].strip())
    
    
    data = list(get_line_lengths())
    
    # The population is every file; the sample is every second file.
    lengths = [d[0] for d in data]
    sample = lengths[::2]
    
    print('Basic statistics:')
    print('  count     : {:3d}'.format(len(lengths)))
    print('  min       : {:6.2f}'.format(min(lengths)))
    print('  max       : {:6.2f}'.format(max(lengths)))
    print('  mean      : {:6.2f}'.format(mean(lengths)))
    
    print('\nPopulation variance:')
    print('  pstdev    : {:6.2f}'.format(pstdev(lengths)))
    print('  pvariance : {:6.2f}'.format(pvariance(lengths)))
    
    print('\nEstimated variance for sample:')
    print('  count     : {:3d}'.format(len(sample)))
    print('  stdev     : {:6.2f}'.format(stdev(sample)))
    print('  variance  : {:6.2f}'.format(variance(sample)))

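    Variance and standard deviation can be computed in either sample mode or population mode.

    stdev -- treats the data as a sample.

    pstdev -- treats the data as the entire population.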

    Python includes two sets of functions for computing variance and standard deviation, depending on whether the data set represents the entire population or a sample of the population. This example uses wc to count the number of lines in the input files for all of the example programs. It then uses pvariance() and pstdev() to compute the variance and standard deviation for the entire population, and variance() and stdev() to compute the sample variance and standard deviation for a subset made up of the length of every second file found.

    $ python3 statistics_variance.py
    
    Basic statistics:
      count     : 1282
      min       :   4.00
      max       : 228.00
      mean      :  27.79
    
    Population variance:
      pstdev    :  17.86
      pvariance : 319.04
    
    Estimated variance for sample:
      count     : 641
      stdev     :  16.94
      variance  : 286.99
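
The demo above depends on wc and on a particular directory layout. As a portable sketch of the same idea, the snippet below swaps in a hard-coded list of hypothetical line counts, takes every second value as the sample, and checks how the two kinds of variance relate: both use the same sum of squared deviations, but the sample version divides by n - 1 rather than n.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical line counts standing in for the wc output.
lengths = [12, 30, 7, 45, 22, 18, 60, 9, 27, 33]
sample = lengths[::2]  # every second value, as in the demo

print('population count:', len(lengths))
print('  pstdev    : {:6.2f}'.format(pstdev(lengths)))
print('  pvariance : {:6.2f}'.format(pvariance(lengths)))

print('sample count    :', len(sample))
print('  stdev     : {:6.2f}'.format(stdev(sample)))
print('  variance  : {:6.2f}'.format(variance(sample)))

# On the same data, the sample variance is the population variance
# scaled by n / (n - 1).
n = len(sample)
rescaled = pvariance(sample) * n / (n - 1)
print('relation holds  :', math.isclose(variance(sample), rescaled))
```

Use the p-prefixed functions only when the data really is the whole population; for a subset, the n - 1 divisor gives an unbiased estimate of the population variance.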
    
  • Original article: https://www.cnblogs.com/lightsong/p/13994422.html