  • An Introduction to Conditional Random Fields

    1. Structured prediction methods are essentially a combination of classification and graphical modeling.

    2. They combine the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features.

    3. The input x is divided into feature vectors {x_0, x_1, ..., x_T}. Each x_s contains information about the word at position s: its identity, orthographic features such as prefixes and suffixes, membership in domain-specific lexicons, and information from semantic databases such as WordNet (a feature-extraction sketch follows this list).

    4. CRFs are essentially a way of combining the advantages of discriminative classification and graphical modeling, combining the ability to compactly model multivariate outputs y with the ability to leverage a large number of input features x for prediction.

    5. The difference between generative models and CRFs is thus exactly analogous to the difference between the naive Bayes and logistic regression classifiers. Indeed, the multinomial logistic regression model can be seen as the simplest kind of CRF, in which there is only one output variable (its explicit form is written out after this list).

    6. The insight of the graphical modeling perspective is that a distribution over very many variables can often be represented as a product of local functions that each depend on a much smaller subset of variables. This factorization turns out to have a close connection to certain conditional independence relationships among the variables; both types of information are easily summarized by a graph. Indeed, this relationship between factorization, conditional independence, and graph structure comprises much of the power of the graphical modeling framework: the conditional independence viewpoint is most useful for designing models, and the factorization viewpoint is most useful for designing inference algorithms (a toy factorization is given after this list).

    7. The principal advantage of discriminative modeling is that it is better suited to including rich, overlapping features.

    8. In principle, it may not be clear why these approaches should be so different, because we can always convert between the two methods using Bayes rule. For example, in the naive Bayes model, it is easy to convert the joint p(y)p(x|y) into a conditional distribution p(y|x). Indeed, this conditional has the same form as the logistic regression model (2.9). And if we managed to obtain a “true” generative model for the data, that is, a distribution p∗(y, x) = p∗(y)p∗(x|y) from which the data were actually sampled, then we could simply compute the true p∗(y|x), which is exactly the target of the discriminative approach. But it is precisely because we never have the true distribution that the two approaches differ in practice. Estimating p(y)p(x|y) first and then computing the resulting p(y|x) (the generative approach) yields a different estimate than estimating p(y|x) directly. In other words, generative and discriminative models share the aim of estimating p(y|x), but they get there in different ways (the naive Bayes conversion is sketched after this list).
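
On note 3: below is a minimal sketch, in Python, of building the per-position feature vector x_s for a token sequence. It is illustrative only; CITY_LEXICON is a hypothetical stand-in for a domain-specific lexicon, and a real extractor would also consult WordNet and use richer orthographic tests.

```python
# Per-token feature extraction for sequence labeling (note 3).
# CITY_LEXICON is a hypothetical stand-in for a domain-specific lexicon.
CITY_LEXICON = {"london", "paris", "beijing"}

def token_features(tokens, s):
    """Build the feature vector x_s for the word at position s."""
    w = tokens[s]
    return {
        "identity=" + w.lower(): 1.0,      # word identity
        "prefix3=" + w[:3].lower(): 1.0,   # orthographic: 3-char prefix
        "suffix3=" + w[-3:].lower(): 1.0,  # orthographic: 3-char suffix
        "is_capitalized": float(w[0].isupper()),
        "has_digit": float(any(c.isdigit() for c in w)),
        "in_city_lexicon": float(w.lower() in CITY_LEXICON),
    }

tokens = ["Smith", "lives", "in", "London"]
x = [token_features(tokens, s) for s in range(len(tokens))]
print(x[3])  # features for "London"; in_city_lexicon comes out 1.0
```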
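
On note 5: the multinomial logistic regression model, written so the one-output-CRF reading is explicit (this is the shape of the conditional the notes cite as (2.9), up to notation). A single normalized exponential of weighted input features is a CRF whose graph has exactly one output node:

$$
p(y \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Bigl\{\theta_y + \sum_{j} \theta_{y,j}\, x_j\Bigr\},
\qquad
Z(\mathbf{x}) = \sum_{y'} \exp\Bigl\{\theta_{y'} + \sum_{j} \theta_{y',j}\, x_j\Bigr\}.
$$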
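
On note 6: a toy instance of the factorization. A distribution over three variables that factorizes into pairwise local functions along a chain,

$$
p(y_1, y_2, y_3) = \frac{1}{Z}\, \Psi_{12}(y_1, y_2)\, \Psi_{23}(y_2, y_3),
\qquad
Z = \sum_{y_1, y_2, y_3} \Psi_{12}(y_1, y_2)\, \Psi_{23}(y_2, y_3).
$$

Any distribution of this form satisfies $y_1 \perp y_3 \mid y_2$, and the chain graph $y_1 - y_2 - y_3$ summarizes both facts: its edges carry the local functions, and graph separation reads off the conditional independence.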
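
On note 8: for concreteness, here is the Bayes-rule conversion for a naive Bayes model with binary features $x_j \in \{0,1\}$ (a standard derivation, not taken from the post). The conditional comes out in exactly the logistic-regression form above:

$$
p(y \mid \mathbf{x})
= \frac{p(y)\prod_j p(x_j \mid y)}{\sum_{y'} p(y')\prod_j p(x_j \mid y')}
= \frac{\exp\bigl\{\theta_y + \sum_j \theta_{y,j}\, x_j\bigr\}}{\sum_{y'} \exp\bigl\{\theta_{y'} + \sum_j \theta_{y',j}\, x_j\bigr\}},
$$

with $\theta_y = \log p(y) + \sum_j \log p(x_j = 0 \mid y)$ and $\theta_{y,j} = \log \frac{p(x_j = 1 \mid y)}{p(x_j = 0 \mid y)}$. The generative route fixes these $\theta$'s from the estimated $p(y)$ and $p(x_j \mid y)$; the discriminative route fits them by maximizing the conditional likelihood directly. On finite data the two estimates differ, which is the practical difference the note describes.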

  • Original post: https://www.cnblogs.com/kevinGaoblog/p/3880687.html