  • Xiaomage's Classroom - Statistics - Standard Error

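    One of the examples in the central limit theorem installment of Xiaomage's Classroom - Statistics used the concept of standard error, and some readers were unclear about it. This separate installment explains the standard error, in the hope of giving everyone an intuitive understanding.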

    Standard error

    The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution. If the parameter or the statistic is the mean, it is called the standard error of the mean (SEM).

    The sampling distribution of a population mean is generated by repeated sampling and recording of the means obtained. This forms a distribution of different means, and this distribution has its own mean and variance. Mathematically, the variance of the sampling distribution obtained is equal to the variance of the population divided by the sample size. This is because as the sample size increases, sample means cluster more closely around the population mean.

    Therefore, the relationship between the standard error and the standard deviation is such that, for a given sample size, the standard error equals the standard deviation divided by the square root of the sample size. In other words, the standard error of the mean is a measure of the dispersion of sample means around the population mean.

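    The standard error usually refers to the standard error of some statistic (typically an estimate of a distribution parameter, for example an estimate of the parameter \(\mu\) of a normal distribution); that is, the standard deviation of that statistic's sampling distribution.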

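    Draw samples of size n from the population repeatedly. The means of the individual samples form a distribution that has its own expectation and variance. Mathematically, the variance of this sampling distribution equals the population variance divided by the sample size. As the sample size grows, the sample means cluster more and more closely around the population mean. Hence the relationship between the standard deviation and the standard error: for a given sample size n, the standard error equals the standard deviation divided by the square root of the sample size. In other words, the standard error of the sample mean measures how far sample means are dispersed around the population mean.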

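    We know that the variance measures how far a random variable is dispersed around its expectation;

    and the standard error of the sample mean measures how far sample means are dispersed around the population mean;

    so if we treat the sample mean as a random variable \(\overline{X}\), the standard error is simply the standard deviation of \(\overline{X}\). Put more generally, the standard error is the standard deviation of a sampling distribution.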
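
    To see where the division by the sample size comes from, here is a short derivation. It assumes that the observations \(X_1, \dots, X_n\) are independent and that each has the population variance \(\sigma^2\); without independence the middle step does not hold.

    \[ \operatorname{Var}(\overline{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]

    Taking the square root gives \(\sigma_{\bar x} = \sigma/\sqrt{n}\), the standard error of the mean.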

    Population

    The standard error of the mean (SEM) can be expressed as:

    \[ \sigma_{\bar x} = \frac{\sigma}{\sqrt{n}} \]

    where

    σ is the standard deviation of the population.
    n is the size (number of observations) of the sample.
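
    As a worked example of this formula, consider a fair six-sided die, whose population standard deviation can be computed exactly. The sketch below is only an illustration: the helper name `sem_from_sigma` and the sample sizes 10, 30 and 100 are chosen here for the example.

```python
import math

def sem_from_sigma(sigma, n):
    """Standard error of the mean for a known population standard deviation."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Fair die: faces 1..6, each with probability 1/6.
faces = range(1, 7)
mu = sum(faces) / 6                                        # population mean, 3.5
sigma = math.sqrt(sum((x - mu) ** 2 for x in faces) / 6)   # population SD, sqrt(35/12)

for n in (10, 30, 100):
    print("n={:>3}  SEM={:.4f}".format(n, sem_from_sigma(sigma, n)))
```

    For this die the standard error comes out at roughly 0.54, 0.31 and 0.17 for the three sample sizes: multiplying the sample size by 10 only divides the standard error by about 3.16, because n enters through its square root.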

    Estimate

    Since the population standard deviation is seldom known, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size (assuming statistical independence of the values in the sample).

    \[ \sigma_{\bar x} \approx \frac{s}{\sqrt{n}} \]

    where

    s is the sample standard deviation (i.e., the sample-based estimate of the standard deviation of the population), and
    n is the size (number of observations) of the sample.
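
    In practice this estimate is computed from one observed sample. A minimal sketch, assuming NumPy is available: the function name `estimate_sem` and the ten die rolls are made up for this illustration. Note `ddof=1`, which makes `np.std` divide by n - 1 so that it returns the sample standard deviation s rather than the population formula.

```python
import numpy as np

def estimate_sem(sample):
    """Estimate the standard error of the mean from a single sample."""
    sample = np.asarray(sample, dtype=float)
    if sample.size < 2:
        raise ValueError("need at least two observations")
    s = np.std(sample, ddof=1)          # sample standard deviation (n - 1 denominator)
    return s / np.sqrt(sample.size)

# Made-up illustrative data: ten die rolls.
rolls = [3, 6, 1, 4, 4, 2, 5, 6, 1, 3]
print("sample mean:", np.mean(rolls))
print("estimated SEM:", estimate_sem(rolls))
```

    Because s is itself computed from the sample, the result is only an estimate of \(\sigma/\sqrt{n}\) and will vary from one sample to the next.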

    Code example

    #!/usr/bin/env python3
    #-*- coding:utf-8 -*-
    #############################################
    #File Name: standard_error.py
    #Brief:  An intuitive demonstration that the standard error formula holds
    #Author: frank
    #Email: frank0903@aliyun.com
    #Created Time:2018-08-09 20:29:10
    #Blog: http://www.cnblogs.com/black-mamba
    #Github: https://github.com/xiaomagejunfu0903/statistic_notes 
    #############################################
    import random
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Population size (kept under its own name so plt.hist's return value cannot overwrite it)
    population_size=10000
    
    #list_population=list(np.random.normal(size=population_size))
    # Simulated fair die rolls: integers from 1 to 6 (high is exclusive)
    list_population = list(np.random.randint(low=1,high=7,size=population_size))
    #print("list_population:{},len:{}".format(list_population,len(list_population)))
    
    # Population mean
    mean_population=np.mean(list_population)
    print("mean_population: %.6f"%mean_population)
    
    # Population standard deviation (ddof=0 because this is the whole population)
    sigma=np.std(list_population,ddof=0)
    print("standard deviation of population:{}".format(sigma))
    
    # Plot the population, with a normal curve of the same mean and SD drawn for reference
    plt.figure(1)
    counts,bins,patches = plt.hist(list_population,bins='auto',density=True)
    y_population = ((1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * (1 / sigma * (bins - mean_population))**2))
    plt.plot(bins, y_population, 'r--')
    plt.title('population distribution')
    text_comment = "$\\mu={:.4f}, \\sigma={:.4f}$".format(mean_population,sigma)
    plt.text(1, .5, text_comment,{'color':'r','fontsize':15})
    
    # Sampling distribution
    # Estimate the standard error of the mean (SEM) by repeated sampling
    def get_SEM(list_population, sample_size, sampling_times, list_sampling_mean):
        # Draw sampling_times samples of size sample_size (without replacement within
        # each sample) and append each sample's mean to list_sampling_mean
        for i in range(sampling_times):
            samples=random.sample(list_population,sample_size)
            #print("samples:{}".format(samples))
            sampling_mean = np.mean(samples)
            #print("sampling mean:{}".format(sampling_mean))
            list_sampling_mean.append(sampling_mean)
        print("size of list_sampling_mean:{}".format(len(list_sampling_mean)))
        # The empirical SEM is the standard deviation of the recorded sample means
        sampling_sd = np.std(list_sampling_mean,ddof=0)
        print("standard deviation of the sampling mean:{}".format(sampling_sd))
        return sampling_sd
        
    # Sample size
    sample_size = 10
    # Number of repeated samples
    sampling_times = 1000
    # List of sample means, filled in by get_SEM
    list_sampling_mean = []
    
    print("theoretical standard error:{}".format(sigma/np.sqrt(sample_size)))
    
    sampling_sd = get_SEM(list_population, sample_size, sampling_times, list_sampling_mean)
    
    # Plot the sample means, with a normal curve of the same mean and SD drawn for reference
    plt.figure(2)
    counts,bins,patches = plt.hist(list_sampling_mean,bins='auto',density=True)
    y_sampling = ((1 / (np.sqrt(2 * np.pi) * sampling_sd)) * np.exp(-0.5 * (1 / sampling_sd * (bins - np.mean(list_sampling_mean)))**2))
    plt.plot(bins, y_sampling, 'r--')
    plt.title('sampling distribution of the sample mean')
    text_comment = "empirical $\\mu={0:.4f}, \\sigma={1:.4f}$".format(np.mean(list_sampling_mean),sampling_sd)
    plt.text(2.0, 0.4, text_comment,{'color':'r','fontsize':15})
    text_comment = "theoretical standard error of the mean:{:.4f}".format(sigma/np.sqrt(sample_size))
    plt.text(2.0, 0.8, text_comment,{'color':'m','fontsize':15})
    
    plt.show()
    

    [Figure: std_err_10_1000]

    [Figure: std_err_30_1000]

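    The results above show that the variance of the sampling distribution equals the population variance divided by the sample size. As the sample size increases, the standard error becomes smaller, which means the sample means cluster more tightly around the population mean. Increasing the number of repeated samples does not shrink the standard error itself; it only brings the empirical value closer to the theoretical value \(\sigma/\sqrt{n}\).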
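
    To compare several sample sizes in a single run without plotting, the experiment can be condensed as follows. This is a sketch under a few assumptions that differ from the listing above: it uses NumPy's `default_rng` with an arbitrary fixed seed, draws samples with replacement, and the helper name `empirical_sem` as well as the sample sizes and the 5000 repetitions are chosen here for illustration.

```python
import numpy as np

def empirical_sem(population, sample_size, sampling_times, rng):
    """Standard deviation of the means of repeated samples drawn with replacement."""
    samples = rng.choice(population, size=(sampling_times, sample_size), replace=True)
    return samples.mean(axis=1).std(ddof=0)

rng = np.random.default_rng(0)                 # fixed seed, arbitrary, for repeatability
population = rng.integers(1, 7, size=10000)    # simulated die rolls, 1..6
sigma = population.std(ddof=0)                 # population standard deviation

for sample_size in (10, 30, 100):
    theoretical = sigma / np.sqrt(sample_size)
    empirical = empirical_sem(population, sample_size, 5000, rng)
    print("n={:>3}  theoretical={:.4f}  empirical={:.4f}".format(
        sample_size, theoretical, empirical))
```

    The two columns should agree closely for each sample size, and both should fall as the sample size rises.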

  • Original article: https://www.cnblogs.com/black-mamba/p/9451708.html