  • Likelihood & LogLikelihood

    One of the most fundamental concepts of modern statistics is that of likelihood. In each of the discrete random variables we have considered thus far, the distribution depends on one or more parameters that are, in most statistical applications, unknown. In the Poisson distribution, the parameter is λ. In the binomial, the parameter of interest is p (since n is typically fixed and known).

    Likelihood is a tool for summarizing the data’s evidence about unknown parameters. Let us denote the unknown parameter(s) of a distribution generically by θ. Since the probability distribution depends on θ, we can make this dependence explicit by writing f(x) as f(x ; θ). For example, in the Bernoulli distribution the parameter is θ = π, and the distribution is

     $$f(x;\pi) = \pi^x (1-\pi)^{1-x}, \qquad x = 0, 1 \qquad (2)$$

    Once a value of X has been observed, we can plug this observed value x into f(x ; π ) and obtain a function of π only. For example, if we observe X = 1, then plugging x = 1 into (2) gives the function π . If we observe X = 0, the function becomes 1 − π .

    Whatever function of the parameter we get when we plug the observed data x into f(x ; θ), we call that function the likelihood function.
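
    To make this concrete, here is a minimal Python sketch (the function name and values are illustrative, not from the course notes): it codes the Bernoulli probability in (2) and, once x is fixed at an observed value, evaluates the same expression as a function of π only, i.e. the likelihood.

        import numpy as np

        def bernoulli_pmf(x, pi):
            """f(x; pi) = pi**x * (1 - pi)**(1 - x) for x in {0, 1}."""
            return pi**x * (1 - pi)**(1 - x)

        # Once X = x is observed, the same expression is a function of pi only:
        # the likelihood L(pi; x).
        pi_grid = np.linspace(0.01, 0.99, 5)
        print(bernoulli_pmf(1, pi_grid))  # L(pi; x = 1) = pi
        print(bernoulli_pmf(0, pi_grid))  # L(pi; x = 0) = 1 - pi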

    We write the likelihood function as $L(\theta ; x) = \prod_{i=1}^{n} f(X_i ; \theta)$, or sometimes just L(θ). Algebraically, the likelihood L(θ ; x) is just the same as the distribution f(x ; θ), but its meaning is quite different because it is regarded as a function of θ rather than a function of x. Consequently, a graph of the likelihood usually looks very different from a graph of the probability distribution.

    For example, suppose that X has a Bernoulli distribution with unknown parameter π. We can graph the probability distribution for any fixed value of π. For example, if π = .5 we get this:

    [Figure: probability distribution of the Bernoulli(π = .5) random variable, with spikes at x = 0 and x = 1]

    Now suppose that we observe a value of X, say X = 1. Plugging x = 1 into the distribution $\pi^x (1-\pi)^{1-x}$ gives the likelihood function L(π ; x) = π, which looks like this:

    [Figure: likelihood function L(π ; x) = π for x = 1, a straight line rising from 0 at π = 0 to 1 at π = 1]

    For discrete random variables, a graph of the probability distribution f(x ; θ) has spikes at specific values of x, whereas a graph of the likelihood L(θ ; x) is a continuous curve (e.g. a line) over the parameter space, the domain of possible values for θ.

    L(θ ; x) summarizes the evidence about θ contained in the event X = x. L(θ ; x) is high for values of θ that make X = x more likely, and low for values of θ that make X = x unlikely. In the Bernoulli example, observing X = 1 gives some (albeit weak) evidence that π is nearer to 1 than to 0, so the likelihood for x = 1 rises as π moves from 0 to 1.

    For example, if we observe x from Bin(n, π), the likelihood function is

    $$L(\pi \mid x) = \frac{n!}{(n-x)!\,x!}\,\pi^x (1-\pi)^{n-x}.$$


    Any multiplicative constant that does not depend on θ is irrelevant and may be discarded; thus,

    $$L(\pi \mid x) \propto \pi^x (1-\pi)^{n-x}.$$
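
    As an illustrative (not course-provided) Python check of why the constant can be dropped: the full binomial likelihood and its kernel differ only by the binomial coefficient, a factor that does not involve π, so both curves have the same shape and peak at the same value.

        import numpy as np
        from math import comb

        n, x = 10, 3  # hypothetical observed data

        pi_grid = np.linspace(0.01, 0.99, 99)
        full_lik = comb(n, x) * pi_grid**x * (1 - pi_grid)**(n - x)
        kernel = pi_grid**x * (1 - pi_grid)**(n - x)

        # The ratio is the constant n! / ((n - x)! x!) for every value of pi.
        print(np.allclose(full_lik / kernel, comb(n, x)))  # True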

    Loglikelihood

    In most cases, often for reasons of computational convenience, we work with the loglikelihood

    $$\ell(\theta \mid x) = \log L(\theta \mid x),$$



    which is defined up to an arbitrary additive constant. 

    For example, the binomial loglikelihood is 

    $$\ell(\pi \mid x) = x \log \pi + (n - x)\log(1 - \pi).$$
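
    A small Python check (with made-up n and x) that this is just the log of the likelihood with the additive constant log C(n, x) dropped:

        import numpy as np
        from math import comb, log

        n, x = 10, 3  # hypothetical observed data
        pi_grid = np.linspace(0.01, 0.99, 99)

        loglik = x * np.log(pi_grid) + (n - x) * np.log(1 - pi_grid)
        log_full = np.log(comb(n, x) * pi_grid**x * (1 - pi_grid)**(n - x))

        # The two differ by the constant log C(n, x), which carries no information about pi.
        print(np.allclose(log_full - loglik, log(comb(n, x))))  # True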

    In many problems of interest, we will derive our loglikelihood from a sample rather than from a single observation. If we observe an independent sample $x_1, x_2, \ldots, x_n$ from a distribution f(x | θ), then the overall likelihood is the product of the individual likelihoods:

    $$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta) = \prod_{i=1}^{n} L(\theta \mid x_i)$$



    and the loglikelihood is: 

    $$\ell(\theta \mid x) = \log \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta) = \sum_{i=1}^{n} \ell(\theta \mid x_i).$$
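
    As a brief sketch of this additivity (using a made-up Bernoulli sample), summing the per-observation loglikelihoods reproduces the binomial form x log π + (n − x) log(1 − π), with x the number of successes:

        import numpy as np

        sample = np.array([1, 0, 0, 1, 1])  # hypothetical i.i.d. Bernoulli observations
        pi_grid = np.linspace(0.01, 0.99, 99)

        # Sum of the individual loglikelihoods l(pi | x_i)
        loglik_sum = sum(xi * np.log(pi_grid) + (1 - xi) * np.log(1 - pi_grid)
                         for xi in sample)

        # Binomial form with x = number of successes and n = sample size
        n, x = len(sample), sample.sum()
        loglik_binom = x * np.log(pi_grid) + (n - x) * np.log(1 - pi_grid)

        print(np.allclose(loglik_sum, loglik_binom))  # True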



    Binomial loglikelihood examples:  
    Plot of the binomial loglikelihood function when n = 5 and we observe x = 0, x = 1, and x = 2 (see the lec1fig.R code on ANGEL for how to produce these figures).
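
    Those R figures are not reproduced here; the following is a rough Python analogue under the same values n = 5 and x = 0, 1, 2:

        import numpy as np
        import matplotlib.pyplot as plt

        n = 5
        pi = np.linspace(0.001, 0.999, 500)

        for x in (0, 1, 2):
            loglik = x * np.log(pi) + (n - x) * np.log(1 - pi)
            plt.plot(pi, loglik, label=f"x = {x}")

        plt.xlabel("pi")
        plt.ylabel("binomial loglikelihood (n = 5)")
        plt.legend()
        plt.show()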

    In regular problems, as the total sample size n grows, the loglikelihood function does two things:

    • it becomes more sharply peaked around its maximum, and
    • its shape becomes nearly quadratic (i.e. a parabola, if there is a single parameter).

    This is important since tests such as the Wald test, based on z = statistic / (SE of statistic), only work if the loglikelihood is well approximated by a quadratic function. For example, the loglikelihood for a normal-mean problem is exactly quadratic. As the sample size grows, the inference comes to resemble the normal-mean problem. This is true even for discrete data. The extent to which normal-theory approximations work for discrete data does not depend on how closely the distribution of responses resembles a normal curve, but on how closely the loglikelihood resembles a quadratic function.
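
    A numerical sketch of this effect (with made-up binomial counts that hold the observed proportion at 0.3): within a couple of standard errors of the peak, the gap between the binomial loglikelihood and the matching quadratic curve shrinks as n grows.

        import numpy as np

        def binom_loglik(pi, n, x):
            return x * np.log(pi) + (n - x) * np.log(1 - pi)

        for n in (10, 100, 1000):
            x = int(0.3 * n)                      # keep the observed proportion fixed
            pi_hat = x / n                        # peak of the loglikelihood
            se = np.sqrt(pi_hat * (1 - pi_hat) / n)
            pi = np.linspace(pi_hat - 2 * se, pi_hat + 2 * se, 201)
            ll = binom_loglik(pi, n, x) - binom_loglik(pi_hat, n, x)
            quad = -((pi - pi_hat) ** 2) / (2 * se**2)   # quadratic with the same peak and curvature
            print(n, np.max(np.abs(ll - quad)))   # the maximum gap shrinks with n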

    Transformations may help us to improve the shape of the loglikelihood. More on this in Section 1.6 on Alternative Parametrizations. Next we will see how we use the likelihood, that is, the corresponding loglikelihood, to estimate the most likely value of the unknown parameter of interest.

    from: https://onlinecourses.science.psu.edu/stat504/node/27
