  • Natural Language Processing with Python study notes (15): 2.7 Further Reading

    Please credit the source when reposting, "一块努力的牛皮糖": http://www.cnblogs.com/yuxc/

    I am new to this, so if any part of the translation is inaccurate, please point it out. Many thanks.

    2.7 Further Reading

     

    Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howto, and documented extensively in the online API documentation.

    Significant sources of published corpora are the Linguistic Data Consortium (LDC) and the European Language Resources Agency (ELRA). Hundreds of annotated text and speech corpora are available in dozens of languages. Non-commercial licenses permit the data to be used in teaching and research. For some corpora, commercial licenses are also available (but for a higher fee).

     

    These and many other language resources have been documented using OLAC Metadata, and can be searched via the OLAC home page at http://www.language-archives.org/. Corpora List (see http://gandalf.aksis.uib.no/corpora/sub.html) is a mailing list for discussions about corpora, and you can find resources by searching the list archives or posting to the list. The most complete inventory of the world’s languages is Ethnologue, http://www.ethnologue.com/. Of 7,000 languages, only a few dozen have substantial digital resources suitable for use in NLP.

     

    This chapter has touched on the field of Corpus Linguistics. Other useful books in this area include (Biber, Conrad, & Reppen, 1998), (McEnery, 2006), (Meyer, 2002), (Sampson & McCarthy, 2005), and (Scott & Tribble, 2006). Further readings in quantitative data analysis in linguistics are: (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986).

    The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/. For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006).
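
    To make the idea of a WordNet similarity measure concrete, here is a minimal sketch of a simple path-based score, 1 / (d + 1), where d is the number of is-a edges on the shortest path between two concepts. It is similar in spirit to the path_similarity method in NLTK's WordNet interface, but the taxonomy below is a hypothetical toy example, not real WordNet data, and the helper names are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent). This is NOT real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
    "whale": "mammal",
    "mammal": "animal",
}


def neighbors(node):
    """Yield the nodes one is-a edge away from `node`, in either direction."""
    if node in HYPERNYM:
        yield HYPERNYM[node]
    for child, parent in HYPERNYM.items():
        if parent == node:
            yield child


def shortest_path_length(a, b):
    """Breadth-first search; return the edge count, or None if unreachable."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Return 1 / (shortest path length + 1), or None if no path exists."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1.0 / (dist + 1)


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "wolf"), ("dog", "cat")]:
        print(pair, path_similarity(*pair))
```

    In this toy graph, a concept compared with itself scores 1.0, and the score falls as the two concepts move further apart in the hierarchy. The measures compared in the literature are more refined than this; the sketch only shows the basic shape of a path-based score.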

    Other topics touched on in this chapter were phonetics and lexical semantics, and we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008).

  • Original article: https://www.cnblogs.com/yuxc/p/2129038.html