  • Likelihood & Loglikelihood

    One of the most fundamental concepts of modern statistics is that of likelihood. In each of the discrete random variables we have considered thus far, the distribution depends on one or more parameters that are, in most statistical applications, unknown. In the Poisson distribution, the parameter is λ. In the binomial, the parameter of interest is p (since n is typically fixed and known).

    Likelihood is a tool for summarizing the data’s evidence about unknown parameters. Let us denote the unknown parameter(s) of a distribution generically by θ. Since the probability distribution depends on θ, we can make this dependence explicit by writing f(x) as f(x ; θ). For example, in the Bernoulli distribution the parameter is θ = π, and the distribution is

     f(x; π) = π^x (1 − π)^(1 − x),    x = 0, 1    (2)

    Once a value of X has been observed, we can plug this observed value x into f(x ; π ) and obtain a function of π only. For example, if we observe X = 1, then plugging x = 1 into (2) gives the function π . If we observe X = 0, the function becomes 1 − π .

    Whatever function of the parameter we get when we plug the observed data x into f(x ; θ), we call that function the likelihood function.

    We write the likelihood function as L(θ; x) = ∏_{i=1}^{n} f(X_i; θ), or sometimes just L(θ). Algebraically, the likelihood L(θ ; x) is just the same as the distribution f(x ; θ), but its meaning is quite different because it is regarded as a function of θ rather than a function of x. Consequently, a graph of the likelihood usually looks very different from a graph of the probability distribution.

    For example, suppose that X has a Bernoulli distribution with unknown parameter π. We can graph the probability distribution for any fixed value of π. For example, if π = .5 we get this:

    [Plot: Bernoulli probability distribution f(x; π = .5), with spikes at x = 0 and x = 1]

    Now suppose that we observe a value of X, say X = 1. Plugging x = 1 into the distribution π^x (1 − π)^(1 − x) gives the likelihood function L(π ; x) = π, which looks like this:

    [Plot: likelihood L(π ; x = 1) = π, a straight line rising from 0 at π = 0 to 1 at π = 1]

    For discrete random variables, a graph of the probability distribution f(x ; θ) has spikes at specific values of x, whereas a graph of the likelihood L(θ ; x) is a continuous curve (e.g. a line) over the parameter space, the domain of possible values for θ.
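
    Neither figure is reproduced above, but a minimal R sketch along the following lines draws both pictures; π = .5 and x = 1 are the values used in the text, while the plotting details are only illustrative:

```r
# Minimal sketch: contrast the Bernoulli probability distribution for a
# fixed parameter with the likelihood function for a fixed observation.

# Probability distribution f(x; pi) with pi = 0.5: spikes at x = 0 and x = 1
pi0 <- 0.5
plot(c(0, 1), dbinom(c(0, 1), size = 1, prob = pi0), type = "h",
     lwd = 3, xlab = "x", ylab = "f(x; 0.5)",
     main = "Bernoulli distribution, pi = 0.5")

# Likelihood L(pi; x = 1) = pi: a continuous curve over the parameter space
pi.grid <- seq(0, 1, length.out = 200)
plot(pi.grid, pi.grid^1 * (1 - pi.grid)^0, type = "l",
     xlab = expression(pi), ylab = "L(pi; x = 1)",
     main = "Bernoulli likelihood after observing x = 1")
```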

    L(θ ; x) summarizes the evidence about θ contained in the event X = x. L(θ ; x) is high for values of θ that make X = x more likely, and small for values of θ that make X = x unlikely. In the Bernoulli example, observing X = 1 gives some (albeit weak) evidence that π is nearer to 1 than to 0, so the likelihood for x = 1 rises as π moves from 0 to 1.

    For example, if we observe x from Bin(n, π), the likelihood function is

    L(π | x) = [n! / ((n − x)! x!)] π^x (1 − π)^(n − x).


    Any multiplicative constant that does not depend on θ is irrelevant and may be discarded; thus,

    L(π | x) ∝ π^x (1 − π)^(n − x).
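
    As a quick illustration (a minimal R sketch with assumed values n = 10 and x = 3, not taken from the original notes), dropping the binomial coefficient rescales the likelihood but does not change where it peaks:

```r
# The full binomial likelihood and the kernel pi^x (1 - pi)^(n - x) differ
# only by the constant choose(n, x), so they peak at the same value of pi.
n <- 10; x <- 3                                      # assumed example values
pi.grid <- seq(0.001, 0.999, length.out = 999)

full   <- dbinom(x, size = n, prob = pi.grid)        # n!/((n-x)! x!) pi^x (1-pi)^(n-x)
kernel <- pi.grid^x * (1 - pi.grid)^(n - x)          # constant dropped

pi.grid[which.max(full)]                             # both maximized near x/n = 0.3
pi.grid[which.max(kernel)]
max(full) / max(kernel)                              # equals choose(n, x) = 120
```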

    Loglikelihood

    In most cases, for various reasons (most often computational convenience), we work with the loglikelihood

    l(θ | x) = log L(θ | x)



    which is defined up to an arbitrary additive constant. 

    For example, the binomial loglikelihood is 

    l(π | x) = x log π + (n − x) log(1 − π).
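
    To spell out the step: taking the log of the full binomial likelihood above gives

    l(π | x) = log n! − log (n − x)! − log x! + x log π + (n − x) log(1 − π),

    and the first three terms do not involve π, so they are exactly the kind of arbitrary additive constant that can be dropped, leaving the expression above.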

    In many problems of interest, we will derive our loglikelihood from a sample rather than from a single observation. If we observe an independent sample x_1, x_2, ..., x_n from a distribution f(x | θ), then the overall likelihood is the product of the individual likelihoods:

    L(θ | x) = ∏_{i=1}^{n} f(x_i | θ) = ∏_{i=1}^{n} L(θ | x_i)



    and the loglikelihood is: 

    l(θ | x) = log ∏_{i=1}^{n} f(x_i | θ) = ∑_{i=1}^{n} log f(x_i | θ) = ∑_{i=1}^{n} l(θ | x_i).
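
    For example, a minimal R sketch with a made-up Bernoulli sample computes the sample loglikelihood exactly this way, as a sum of per-observation contributions:

```r
# Log-likelihood of an i.i.d. Bernoulli sample as a sum over observations.
x <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)                # hypothetical sample, n = 10

loglik <- function(p, x) {
  sum(dbinom(x, size = 1, prob = p, log = TRUE))    # sum_i log f(x_i | p)
}

p.grid <- seq(0.01, 0.99, by = 0.01)
ll <- sapply(p.grid, loglik, x = x)

plot(p.grid, ll, type = "l", xlab = expression(pi),
     ylab = "l(pi | x)", main = "Bernoulli sample loglikelihood")
p.grid[which.max(ll)]                               # close to the sample mean, 0.7
```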



    Binomial loglikelihood examples:  
    Plot of the binomial loglikelihood function when n = 5 and we observe x = 0, x = 1, and x = 2 (see the lec1fig.R code on ANGEL for how to produce these figures):
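
    The lec1fig.R file itself is not reproduced here, but a minimal R sketch along the following lines draws comparable figures directly from the loglikelihood formula above:

```r
# Binomial loglikelihood l(pi | x) = x log(pi) + (n - x) log(1 - pi)
# for n = 5 and x = 0, 1, 2, each drawn over the parameter space (0, 1).
n <- 5
pi.grid <- seq(0.001, 0.999, length.out = 999)

for (x in 0:2) {
  ll <- x * log(pi.grid) + (n - x) * log(1 - pi.grid)
  plot(pi.grid, ll, type = "l", xlab = expression(pi),
       ylab = "loglikelihood",
       main = paste("Binomial loglikelihood, n = 5, x =", x))
}
```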

    In regular problems, as the total sample size n grows, the loglikelihood function does two things:

    • it becomes more sharply peaked around its maximum, and
    • its shape becomes nearly quadratic (i.e. a  parabola, if there is a single parameter).

    This is important since tests such as the Wald test, based on z = statistic / SE(statistic), work well only if the loglikelihood is approximately quadratic. For example, the loglikelihood for a normal-mean problem is exactly quadratic. As the sample size grows, the inference comes to resemble the normal-mean problem. This is true even for discrete data. The extent to which normal-theory approximations work for discrete data does not depend on how closely the distribution of responses resembles a normal curve, but on how closely the loglikelihood resembles a quadratic function.
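
    A minimal R sketch (with an assumed observed proportion x/n = 0.3 at n = 10 and n = 100) illustrates the point: the quadratic approximation built from the curvature at the maximum tracks the binomial loglikelihood much more closely for the larger sample:

```r
# Binomial loglikelihood versus its quadratic approximation around the
# maximum. The approximation l(pi.hat) - 0.5 * (pi - pi.hat)^2 * n / (pi.hat * (1 - pi.hat))
# uses the curvature (observed information) of the loglikelihood at pi.hat = x/n.
pi.grid <- seq(0.05, 0.7, length.out = 500)

for (n in c(10, 100)) {
  x <- 0.3 * n                                       # assumed proportion x/n = 0.3
  pi.hat <- x / n
  ll    <- x * log(pi.grid) + (n - x) * log(1 - pi.grid)
  llmax <- x * log(pi.hat) + (n - x) * log(1 - pi.hat)
  quad  <- llmax - 0.5 * n / (pi.hat * (1 - pi.hat)) * (pi.grid - pi.hat)^2
  plot(pi.grid, ll - llmax, type = "l", ylim = c(-10, 0),
       xlab = expression(pi), ylab = "l(pi) - l(pi.hat)",
       main = paste("n =", n, ": loglikelihood (solid) vs quadratic (dashed)"))
  lines(pi.grid, quad - llmax, lty = 2)
}
```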

    Transformations may help us improve the shape of the loglikelihood. More on this in Section 1.6 on Alternative Parametrizations. Next we will see how we use the likelihood, that is, the corresponding loglikelihood, to estimate the most likely value of the unknown parameter of interest.

    from: https://onlinecourses.science.psu.edu/stat504/node/27

  • Original post: https://www.cnblogs.com/GarfieldEr007/p/5232140.html