Study notes for Discrete Probability Distribution
The Basics of Probability
- Probability measures the uncertainty of an event: a fact whose occurrence is uncertain.
- Sample space refers to the set of all possible outcomes, denoted as Ω.
- Some properties:
- Sum rule (marginalization): p(A) = Σ_B p(A, B)
- Union bound: p(A ∪ B) ≤ p(A) + p(B), and more generally p(∪_i A_i) ≤ Σ_i p(A_i)
- Conditional probability: p(A|B) = p(A, B) / p(B), for p(B) > 0. To emphasize that p(A) is unconditional, p(A) is called the "marginal probability", and p(B, A) is called the "joint probability"; the identity p(A, B) = p(B|A) p(A) is called the "multiplication rule" or "factorization rule".
- Total probability theorem: p(B) = p(B|A)p(A) + p(B|~A)p(~A)
- Bayes' Theorem: p(A|B) = p(B|A) p(A) / p(B)
Bayes' Theorem can be regarded as a rule to update a prior probability p(A) into a posterior probability p(A|B), taking into account the occurrence of the evidence/event B.
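As a small numerical sketch of this prior-to-posterior update (the test sensitivity, false-positive rate, and prevalence below are made-up illustrative values, not from any real study):

```python
# Hypothetical numbers: a diagnostic test applied to a rare condition.
p_A = 0.01                 # prior p(A): prevalence of the condition
p_B_given_A = 0.99         # p(B|A): positive test given the condition
p_B_given_notA = 0.05      # p(B|~A): false-positive rate

# Total probability theorem gives the marginal p(B).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' Theorem updates the prior p(A) into the posterior p(A|B).
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))
```

Even with a highly sensitive test, the posterior stays well below 1 because the prior is small: the evidence B raises p(A) from 0.01 to roughly 0.17 here.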
- Conditional independence: two events A and B, with p(A)>0 and p(B)>0, are conditionally independent given C if p(A, B|C) = p(A|C) p(B|C).
- Probability mass function (p.m.f) of a random variable X is a function p(x) = Pr[X = x], with p(x) ≥ 0 and Σ_x p(x) = 1.
- Joint probability mass function of X and Y is a function p(x, y) = Pr[X = x, Y = y].
- Cumulative distribution function (c.d.f) of a random variable X is a function F(x) = Pr[X ≤ x].
- The c.d.f describes the probability of the interval (-∞, x], whereas the p.m.f describes the probability of a single value.
- Expectation: the expectation of a discrete random variable X is E[X] = Σ_x x Pr[X = x]
- linearity: E[aX+bY]=aE[X]+bE[Y]
- if X and Y are independent: E[XY]=E[X]*E[Y]
- Markov's inequality: let X be a nonnegative random variable with E[X] < ∞; then for all a > 0, Pr[X ≥ a] ≤ E[X] / a
- Variance: the variance of a random variable X is Var[X] = E[(X - E[X])²] = E[X²] - (E[X])², where σ = √Var[X] is called the standard deviation of the random variable X.
- Var[aX] = a²Var[X]
- if X and Y are independent, Var[X+Y]=Var[X]+Var[Y]
- Chebyshev's inequality: let X be a random variable with finite variance; then for all a > 0, Pr[|X - E[X]| ≥ a] ≤ Var[X] / a²
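Both tail bounds can be checked empirically by simulation. The sketch below (coin-flip setup and sample sizes chosen arbitrarily for illustration) compares empirical tail probabilities against the Markov and Chebyshev bounds:

```python
import random

random.seed(0)
n_samples = 100_000

# X = number of heads in 10 fair coin flips: E[X] = 5, Var[X] = 2.5.
samples = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(n_samples)]

mean = sum(samples) / n_samples
var = sum((x - mean) ** 2 for x in samples) / n_samples

# Markov: Pr[X >= 8] <= E[X] / 8 (valid because X is nonnegative).
markov_tail = sum(x >= 8 for x in samples) / n_samples
print(markov_tail, mean / 8)

# Chebyshev: Pr[|X - E[X]| >= 3] <= Var[X] / 3^2.
cheb_tail = sum(abs(x - mean) >= 3 for x in samples) / n_samples
print(cheb_tail, var / 9)
```

Both empirical tails stay below their bounds; Markov's is typically the looser of the two, since it uses only the mean.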
Bernoulli Distribution
- A (single) Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, "success" and "failure", or "yes" and "no". Examples of Bernoulli trials include: flipping a coin, a political opinion poll, etc.
- The Bernoulli distribution is a discrete probability distribution of a single discrete random variable X, which takes value 1 with success probability p: Pr(X=1)=p, and value 0 with failure probability Pr(X=0)=q=1-p. More formally, the Bernoulli distribution is summarized as follows:
- notation: Bern(p), where 0<p<1 is the probability of success.
- support: X={0, 1}
- p.m.f: Pr[X=0]=q=1-p, Pr[X=1]=p
- mean: E[X]=p
- variance: Var[X]=p(1-p)
- It is a special case of the Binomial distribution B(n, p): the Bernoulli distribution is B(1, p).
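The mean p and variance p(1-p) can be verified by simulating many independent Bernoulli draws (p = 0.3 and the sample size below are arbitrary choices for illustration):

```python
import random

random.seed(42)
p = 0.3            # success probability (arbitrary illustrative value)
n_trials = 200_000

# Simulate n_trials independent Bernoulli(p) draws.
draws = [1 if random.random() < p else 0 for _ in range(n_trials)]

# Sample moments should be close to E[X] = p and Var[X] = p(1-p) = 0.21.
sample_mean = sum(draws) / n_trials
sample_var = sum((x - sample_mean) ** 2 for x in draws) / n_trials
print(round(sample_mean, 3), round(sample_var, 3))
```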
Binomial Distribution
- The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent Bernoulli trials with success probability p, denoted as X ~ B(n, p).
- The Binomial distribution is often used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one.
- The Binomial distribution is summarized as follows:
- notation: B(n, p), where n is the number of trials and p is the success probability in each trial
- support: k = {0, 1, ..., n} the number of successes
- p.m.f: Pr[X=k] = C(n, k) p^k (1-p)^(n-k), where C(n, k) = n! / (k!(n-k)!) is the binomial coefficient
- mean: np
- variance: np(1-p)
- If n is large enough, the skew of the distribution is not too great. In this case, a reasonable approximation to B(n, p) is given by the normal distribution N(np, np(1-p)); this is useful because for large n the p.m.f of the Binomial distribution becomes difficult to compute directly.
- One rule to determine whether such an approximation is reasonable, i.e. whether n is large enough, is that both np and n(1-p) must be greater than 5. If both are greater than 15, the approximation should be good.
- A second rule is that for n > 5, the normal approximation is adequate if: (1/√n) |√((1-p)/p) - √(p/(1-p))| < 0.3
- Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of the mean is within the range of possible values, that is, if: 0 < np - 3√(np(1-p)) and np + 3√(np(1-p)) < n
- To improve the accuracy of the approximation, we usually use a correction factor to take into account that the binomial random variable is discrete while the normal random variable is continuous. In particular, the basic idea is to treat the discrete value k as the continuous interval from k-0.5 to k+0.5.
- In addition, the Poisson distribution can be used to approximate the Binomial distribution when n is very large. A rule of thumb states that the Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p is smaller than or equal to 0.05, and an excellent approximation if n >= 100 and np <= 10: B(n, p) ≈ Poisson(np)
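Both approximations can be checked numerically against the exact binomial p.m.f. The sketch below uses only the standard library; the parameter values are arbitrary illustrations, with the first case in the normal-approximation regime and the second in the rare-event (Poisson) regime:

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial p.m.f: C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_cdf(x, mu, sigma):
    """c.d.f of N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def poisson_pmf(lam, k):
    """Poisson p.m.f with rate lam = n*p."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Normal approximation with continuity correction:
# Pr[X = k] ~ Phi(k + 0.5) - Phi(k - 0.5), treating k as the interval [k-0.5, k+0.5].
n, p, k = 100, 0.4, 40
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
exact = binom_pmf(n, k, p)
normal_approx = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
print(round(exact, 4), round(normal_approx, 4))

# Poisson approximation in the rare-event regime (n >= 20, p <= 0.05).
n2, p2, k2 = 100, 0.02, 3
print(round(binom_pmf(n2, k2, p2), 4), round(poisson_pmf(n2 * p2, k2), 4))
```

In both regimes the approximation agrees with the exact p.m.f to about three decimal places, at a fraction of the cost of evaluating the binomial coefficient for large n.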
Poisson Distribution
- Poisson distribution: Let X be a discrete random variable taking values in the set of nonnegative integers {0, 1, 2, ...}