  • Python Natural Language Processing Study Notes (15): 2.7 Further Reading

    Please credit the source when reposting: "一块努力的牛皮糖", http://www.cnblogs.com/yuxc/

    I am new to this, so please point out anything I have translated poorly. Many thanks.

    2.7 Further Reading

     

    Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howto , and documented extensively in the online API documentation.

    Significant sources of published corpora are the Linguistic Data Consortium (LDC) and the European Language Resources Agency (ELRA). Hundreds of annotated text and speech corpora are available in dozens of languages. Non-commercial licenses permit the data to be used in teaching and research. For some corpora, commercial licenses are also available (but for a higher fee).

     

    These and many other language resources have been documented using OLAC Metadata, and can be searched via the OLAC home page at http://www.language-archives.org/ . Corpora List (see http://gandalf.aksis.uib.no/corpora/sub.html ) is a mailing list for discussions about corpora, and you can find resources by searching the list archives or posting to the list. The most complete inventory of the world’s languages is Ethnologue, http://www.ethnologue.com/ . Of 7,000 languages, only a few dozen have substantial digital resources suitable for use in NLP.

     

    This chapter has touched on the field of Corpus Linguistics. Other useful books in this area include (Biber, Conrad, & Reppen, 1998), (McEnery, 2006), (Meyer, 2002), (Sampson & McCarthy, 2005), and (Scott & Tribble, 2006). Further readings in quantitative data analysis in linguistics are: (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986).

    The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/ . For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006).
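
    As a rough illustration of what a WordNet similarity measure computes, the sketch below implements a path-based score, 1 / (shortest path length + 1), over a tiny hand-made is-a hierarchy. The taxonomy, the HYPERNYM table, and the function names are all assumptions made up for this example; they are not WordNet data or the NLTK API, and the measures compared in (Budanitsky & Hirst, 2006) are more varied than this one.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent). This is NOT real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
    "whale": "mammal",
}


def neighbours(node):
    """Yield the nodes one is-a edge away from `node` (its parent and children)."""
    if node in HYPERNYM:
        yield HYPERNYM[node]
    for child, parent in HYPERNYM.items():
        if parent == node:
            yield child


def shortest_path_length(a, b):
    """Breadth-first search for the number of edges between a and b."""
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def path_similarity(a, b):
    """Path-based similarity: 1 / (shortest path length + 1), or None if unconnected."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1.0 / (dist + 1)


if __name__ == "__main__":
    for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "whale")]:
        print(pair, path_similarity(*pair))
```

    In this toy hierarchy "dog" and "wolf" share a direct parent, so they score higher than "dog" and "cat", which only meet at "carnivore". A score of 1.0 means the two nodes are identical.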

    Other topics touched on in this chapter were phonetics and lexical semantics, and we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008).

  • Original article: https://www.cnblogs.com/yuxc/p/2129038.html