  • Basic Concept of Probability Distributions 5: Hypergeometric Distribution

    PMF

    Suppose that a sample of size $n$ is to be chosen randomly (without replacement) from an urn containing $N$ balls, of which $m$ are white and $N-m$ are black. If we let $X$ denote the number of white balls selected, then $$f(x; N, m, n) = \Pr(X = x) = {{m\choose x}{N-m\choose n-x}\over {N\choose n}}$$ for $x = 0, 1, 2, \cdots, n$.
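
    As a quick numerical check, the PMF can be coded directly from the formula and compared with R's built-in `dhyper`. This is only a sketch: the helper name `hyper_pmf` and the parameter values are illustrative assumptions, not part of the original text. Note that `dhyper(x, m, n, k)` expects the number of black balls and the sample size as its third and fourth arguments, so the call passes `N - m` and `n`.

```r
# PMF written directly from the formula (hyper_pmf is a hypothetical helper name)
hyper_pmf <- function(x, N, m, n) {
  choose(m, x) * choose(N - m, n - x) / choose(N, n)
}

# Illustrative parameters, not taken from the text
N <- 20; m <- 8; n <- 5
x <- 0:n

p_formula <- hyper_pmf(x, N, m, n)
p_builtin <- dhyper(x, m, N - m, n)   # dhyper(x, white, black, draws)

all.equal(p_formula, p_builtin)   # should be TRUE
sum(p_formula)                    # the probabilities should add up to 1
```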

    Proof:

    That these probabilities sum to 1 is essentially Vandermonde's identity: $${m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}$$ where $m$, $n$, $k$, $r\in \mathbb{N}_0$ (here $m$ and $n$ are generic non-negative integers, not the urn parameters). Because
    $$
    \begin{align*}
    \sum_{r=0}^{m+n}{m+n\choose r}x^r &= (1+x)^{m+n} \quad\quad\quad\quad \mbox{(binomial theorem)}\\
    &= (1+x)^m(1+x)^n\\
    &= \left(\sum_{i=0}^{m}{m\choose i}x^{i}\right)\left(\sum_{j=0}^{n}{n\choose j}x^{j}\right)\\
    &= \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r \quad\quad\mbox{(product of two polynomials)}
    \end{align*}
    $$
    The last step uses the general product of two polynomials (with the convention $a_i = 0$ for $i > m$ and $b_j = 0$ for $j > n$):
    $$
    \begin{eqnarray*}
    \left(\sum_{i=0}^{m}a_i x^i\right)\left(\sum_{j=0}^{n}b_j x^j\right) &=& \left(a_0+a_1x+\cdots + a_mx^m\right)\left(b_0+b_1x+\cdots + b_nx^n\right)\\
    &=& a_0b_0 + a_0b_1x +a_1b_0x + a_0b_2x^2 + a_1b_1x^2 + a_2b_0x^2 +\\
    & &\cdots + a_mb_nx^{m+n}\\
    &=& \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}a_{k}b_{r-k}\right)x^{r}
    \end{eqnarray*}
    $$
    Hence, comparing the coefficients of $x^r$ on both sides,
    $$
    \begin{eqnarray*}
    & &\sum_{r=0}^{m+n}{m+n\choose r}x^r = \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r\\
    &\implies& {m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}\\
    &\implies& \sum_{k=0}^{r}{{m\choose k}{n\choose r-k}\over {m+n\choose r}} = 1
    \end{eqnarray*}
    $$
    Replacing $(m, n, r, k)$ in the identity by $(m, N-m, n, x)$ gives $\sum_{x=0}^{n} f(x; N, m, n) = 1$.
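
    Vandermonde's identity is easy to spot-check numerically. The sketch below uses arbitrary illustrative values for $m$, $n$ and $r$ (assumptions, not values from the text) and relies on the fact that R's `choose(m, k)` returns 0 when $k > m$.

```r
# Illustrative values; m and n are the generic integers of the identity
m <- 6; n <- 9; r <- 7
k <- 0:r

lhs <- choose(m + n, r)
rhs <- sum(choose(m, k) * choose(n, r - k))   # terms with k > m contribute 0

all.equal(lhs, rhs)   # should be TRUE
```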

    Mean

    The expected value is $$\mu = E[X] = {nm\over N}$$

    Proof:

    For $k \geq 1$,
    $$
    \begin{eqnarray*}
    E[X^k] &=& \sum_{x=0}^{n}x^kf(x; N, m, n)\\
    &=& \sum_{x=0}^{n}x^k{{m\choose x}{N-m\choose n-x}\over {N\choose n}}\\
    &=& {nm\over N}\sum_{x=1}^{n} x^{k-1} {{m-1 \choose x-1}{N-m\choose n-x}\over {N-1 \choose n-1}}\\
    & & (\mbox{identities: } x{m\choose x} = m{m-1\choose x-1},\ n{N\choose n} = N{N-1\choose n-1})\\
    &=& {nm\over N}\sum_{y=0}^{n-1} (y+1)^{k-1} {{m-1 \choose y}{(N-1) - (m - 1)\choose (n-1)-y}\over {N-1 \choose n-1}}\quad\quad(\mbox{setting } y=x-1)\\
    &=& {nm\over N}E\left[(Y+1)^{k-1}\right] \quad\quad (\mbox{since } Y\sim f(y; N-1, m-1, n-1))
    \end{eqnarray*}
    $$
    The $x = 0$ term vanishes, so the sum may start at $x = 1$. Hence, setting $k=1$ we have $$E[X] = {nm\over N}$$ Note that this has the same form as the mean of the binomial distribution, $\mu = np$, with $p = {m\over N}$.
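
    The recursion between the moments of $X$ and those of $Y$ can be checked numerically. The following is a minimal sketch with illustrative parameter values (assumptions, not from the text); $Y$ is taken to be hypergeometric with one white ball and one draw fewer, which leaves the number of black balls at $N - m$.

```r
N <- 20; m <- 8; n <- 5                  # illustrative values
x  <- 0:n
px <- dhyper(x, m, N - m, n)             # X: m white, N - m black, n draws

y  <- 0:(n - 1)
py <- dhyper(y, m - 1, N - m, n - 1)     # Y: m - 1 white, N - m black, n - 1 draws

k <- 3
lhs <- sum(x^k * px)                            # E[X^k]
rhs <- n * m / N * sum((y + 1)^(k - 1) * py)    # (nm/N) * E[(Y + 1)^(k - 1)]
all.equal(lhs, rhs)                             # should be TRUE

mean_direct <- sum(x * px)                      # the case k = 1
all.equal(mean_direct, n * m / N)               # should be TRUE
```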

    Variance

    The variance is $$\sigma^2 = \mbox{Var}(X) = np(1-p)\left(1 - {n-1 \over N-1}\right)$$ where $p = {m\over N}$.

    Proof:

    $$
    \begin{align*}
    E[X^2] &= {nm\over N}E[Y+1] \quad\quad (\mbox{setting } k=2)\\
    &= {nm\over N}\left(E[Y] + 1\right)\\
    &= {nm\over N}\left[{(n-1) (m-1) \over N-1}+1\right]
    \end{align*}
    $$
    Hence the variance is
    $$
    \begin{align*}
    \mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\
    &= {mn\over N}\left[{(n-1) (m-1) \over N-1}+1 - {nm\over N}\right]\\
    &= np \left[ (n-1) \cdot {pN-1\over N-1}+1-np\right] \quad\quad (\mbox{setting } p={m\over N})\\
    &= np\left[(n-1)\cdot {p(N-1) + p -1 \over N-1} + 1 -np\right]\\
    &= np\left[(n-1)p + (n-1)\cdot{p-1 \over N-1} + 1-np\right]\\
    &= np\left[1-p - (1-p)\cdot {n-1\over N-1}\right] \\
    &= np(1-p)\left(1 - {n-1 \over N-1}\right)
    \end{align*}
    $$
    Note that the factor $1 - {n-1\over N-1}$ is approximately equal to 1 when $N$ is sufficiently large relative to $n$ (i.e. ${n-1\over N-1} \rightarrow 0$ as $N \rightarrow +\infty$ with $n$ fixed). In that case the variance is approximately that of the binomial distribution, $\sigma^2 = np(1-p)$, where $p = {m\over N}$.
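
    The variance formula and the behaviour of the correction factor can be illustrated numerically. This is a sketch under assumed parameter values (none of them come from the text): the variance is computed once from the PMF and once from the closed form, and the factor $1 - {n-1\over N-1}$ is then evaluated for increasing $N$ with $n$ held fixed.

```r
N <- 20; m <- 8; n <- 5                  # illustrative values
p  <- m / N
x  <- 0:n
px <- dhyper(x, m, N - m, n)

var_direct  <- sum(x^2 * px) - sum(x * px)^2                 # E[X^2] - E[X]^2
var_formula <- n * p * (1 - p) * (1 - (n - 1) / (N - 1))     # closed form
all.equal(var_direct, var_formula)                           # should be TRUE

# Correction factor for growing N (n fixed): it increases towards 1
N_big  <- c(20, 200, 2000, 20000)
factor <- 1 - (n - 1) / (N_big - 1)
factor
```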

    Examples

    1. In a lotto game, seven balls are drawn randomly from an urn containing 37 balls numbered from 0 to 36. Calculate the probability $P$ of having exactly $k$ balls with an even number for $k=0, 1, \cdots, 7$.

    Solution:

    Of the 37 balls, 19 carry an even number (0, 2, ..., 36) and 18 carry an odd number, so $$P(X = k) = {{19\choose k}{18\choose 7-k}\over {37 \choose 7}}$$

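    # Exact hypergeometric probabilities of k even-numbered balls, k = 0, ..., 7
    p = NA; k = 0:7
    for (i in k){
      # 19 even and 18 odd balls; 7 balls drawn out of 37
      p[i+1] = round(choose(19, i) * choose(18, 7-i) /
                     choose(37, 7), 3)
    }
    p
    # [1] 0.003 0.034 0.142 0.288 0.307 0.173 0.047 0.005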
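
    The same probabilities can also be obtained without an explicit loop, using R's built-in `dhyper`. This is an alternative sketch rather than the method used above; the argument order is `dhyper(x, white, black, draws)`, with "white" standing for the 19 even-numbered balls.

```r
k <- 0:7
# 19 even-numbered balls, 18 odd-numbered balls, 7 balls drawn
p <- round(dhyper(k, 19, 18, 7), 3)
p   # should match the values computed with choose() above
```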

    2. Determine the same probabilities as in the previous problem, this time using the normal approximation.

    Solution:

    The mean is $$\mu = {nm\over N} = {7 \times 19\over 37} = 3.594595$$ and the standard deviation is $$\sigma = \sqrt{{nm\over N}\left(1-{m\over N}\right)\left(1 - {n-1\over N-1}\right)} = \sqrt{{7 \times 19\over 37}\left(1 - {19\over 37}\right) \left(1 - {7-1\over 37-1}\right)} = 1.207174$$ The probabilities under the normal approximation, obtained by evaluating the normal density at each $k$, are

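    # Normal approximation: evaluate the N(mu, s^2) density at k = 0, ..., 7
    p = NA; k = 0:7
    mu = 7 * 19 / 37                                   # mean nm/N
    s = sqrt(7 * 19 / 37 * (1 - 19/37) * (1 - 6/36))   # standard deviation
    for (i in k){
      p[i+1] = round(dnorm(i, mu, s), 3)
    }
    p
    # [1] 0.004 0.033 0.138 0.293 0.312 0.168 0.045 0.006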
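
    To see how close the approximation is, the exact and approximate values can be placed side by side. The sketch below also adds a continuity-corrected variant, $P(k - 0.5 < Z \leq k + 0.5)$ computed with `pnorm`; that variant is a common alternative and is not part of the original solution. The variable names are illustrative.

```r
k  <- 0:7
mu <- 7 * 19 / 37
s  <- sqrt(mu * (1 - 19/37) * (1 - 6/36))

exact          <- dhyper(k, 19, 18, 7)                           # exact probabilities
approx_density <- dnorm(k, mu, s)                                # density at k, as above
approx_cc      <- pnorm(k + 0.5, mu, s) - pnorm(k - 0.5, mu, s)  # continuity correction

round(cbind(k, exact, approx_density, approx_cc), 3)
max(abs(exact - approx_density))   # largest absolute error of the density version
```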

    Reference

    1. Ross, S. (2010). A First Course in Probability (8th Edition). Chapter 4. Pearson. ISBN: 978-0-13-603313-4.
    2. Brink, D. (2010). Essentials of Statistics: Exercises. Chapter 11. ISBN: 978-87-7681-409-0.


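    Author: Zhao Yin (赵胤)
    Source: http://www.cnblogs.com/zhaoyin/
    The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be given in a prominent position on the page; otherwise the right to pursue legal liability is reserved.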

  • Original article: https://www.cnblogs.com/zhaoyin/p/4206519.html