  • Hunspell: Introduction and Trial Run

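    1. Introduction

      Hunspell is a spell checker designed for languages with rich morphology and complex compound words; it was originally written for Hungarian.

      Hunspell is free software, released under the GPL, LGPL and MPL tri-license.

      Hunspell has interfaces and wrappers for the major platforms and programming languages. It is based on MySpell and is backward compatible with MySpell dictionaries. MySpell uses single-byte character encodings, whereas Hunspell can also use dictionaries encoded in Unicode UTF-8.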

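    2. The following applications use Hunspell as their spell checker:

      Mac OS X 10.6 and later

      Eclipse, via Hunspell4Eclipse

      Google Chrome, a web browser developed by Google

      Evernote, a note-taking application

      LibreOffice and OpenOffice.org, open-source office suites

      Mozilla Firefox, Thunderbird and SeaMonkey

      Opera, a cross-platform web browser

      Scribus, a desktop publishing application

      Vim, a text editor

      WPS Office, an office suite developed in China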

    3. Testing Hunspell's features with a Docker image:

      3.1 List the available dictionaries

    [root@host-10-0-251-159 hunspell]# docker run --rm tmaier/hunspell -D
    SEARCH PATH:
    .::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/root/.openoffice.org/3/user/wordbook:/root/.openoffice.org2/user/wordbook:/root/.openoffice.org2.0/user/w/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/shhare/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
    AVAILABLE DICTIONARIES (path is not mandatory for -d option):
    /usr/share/hunspell/en_CA
    /usr/share/hunspell/de_DE_comb
    /usr/share/hunspell/en_ZA
    /usr/share/hunspell/en_US
    /usr/share/hunspell/en_GB
    /usr/share/hunspell/en_AU
    /usr/share/hunspell/de_CH
    /usr/share/hunspell/de_DE_neu
    /usr/share/hunspell/en_NZ
    /usr/share/hunspell/de_AT
    /usr/share/hunspell/default
    LOADED DICTIONARY:
    /usr/share/hunspell/default.aff
    /usr/share/hunspell/default.dic
    Hunspell 1.6.2
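
      The dictionary names accepted by the -d option are the last path component of each entry under AVAILABLE DICTIONARIES. As a rough sketch (not part of Hunspell itself), the listing could be reduced to those names in Python. The function name available_dictionaries and the SAMPLE text, a shortened copy of the output above, are assumptions made for illustration.

```python
from __future__ import annotations

# Shortened copy of the `-D` output shown above (search path line omitted).
SAMPLE = """\
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_US
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
Hunspell 1.6.2
"""


def available_dictionaries(output: str) -> list[str]:
    """Return the names listed between the two section headers."""
    names = []
    in_section = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("AVAILABLE DICTIONARIES"):
            in_section = True
        elif line.startswith("LOADED DICTIONARY"):
            in_section = False
        elif in_section and line:
            # Keep only the last path component, e.g. "en_US".
            names.append(line.rsplit("/", 1)[-1])
    return names


if __name__ == "__main__":
    # Join with commas, the form the -d option expects for several dictionaries.
    print(",".join(available_dictionaries(SAMPLE)))
```

      The comma-joined result has the same shape as the value passed to -d in the commands that follow.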

      3.2 View the help text

    [root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  -h
    Usage: hunspell [OPTION]... [FILE]...
    Check spelling of each FILE. Without FILE, check standard input.
     
      -1        check only first field in lines (delimiter = tabulator)
      -a        Ispell's pipe interface
      --check-url   check URLs, e-mail addresses and directory paths
      --check-apostrophe    check Unicode typographic apostrophe
      -d d[,d2,...] use d (d2 etc.) dictionaries
      -D        show available dictionaries
      -G        print only correct words or lines
      -h, --help    display this help and exit
      -H        HTML input file format
      -i enc    input encoding
      -l        print misspelled words (prints only the misspelled words)
      -L        print lines with misspelled words (prints the lines that contain misspelled words)
      -m        analyze the words of the input text
      -n        nroff/troff input file format
      -O        OpenDocument (ODF or Flat ODF) input file format
      -p dict   set dict custom dictionary
      -r        warn of the potential mistakes (rare words)
      -P password   set password for encrypted dictionaries
      -s        stem the words of the input text
      -S        suffix words of the input text
      -t        TeX/LaTeX input file format
      -v, --version print version number
      -vv       print Ispell compatible version number
      -w        print misspelled words (= lines) from one word/line input.
      -X        XML input file format
     
    Example: hunspell -d en_US file.txt    # interactive spelling
             hunspell -i utf-8 file.txt    # check UTF-8 encoded file
             hunspell -l *.odt             # print misspelled words of ODF files
     
             # Quick fix of ODF documents by personal dictionary creation
     
             # 1 Make a reduced list from misspelled and unknown words:
     
             hunspell -l *.odt | sort | uniq >words
     
             # 2 Delete misspelled words of the file by a text editor.
             # 3 Use this personal dictionary to fix the deleted words:
     
             hunspell -p words *.odt
     
    Bug reports: http://hunspell.github.io/
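
      Step 1 of the personal-dictionary recipe in the help text pipes the misspelled words through sort and uniq. A minimal Python sketch of that reduction is given below, assuming the words arrive one per line as -l prints them. The helper name reduced_word_list is hypothetical, and the sample words are taken from the report in section 3.3.

```python
def reduced_word_list(lines):
    """Mimic `sort | uniq`: drop blank lines, de-duplicate, then sort."""
    words = {line.strip() for line in lines if line.strip()}
    # Python sorts by code point, which may differ from a locale-aware `sort`.
    return sorted(words)


if __name__ == "__main__":
    # Words as `hunspell -l` might print them, one per line.
    flagged = ["Hmm\n", "Lele\n", "ve\n", "Lele\n", "Hmm\n", "vomeronasal\n"]
    for word in reduced_word_list(flagged):
        print(word)
```

      The resulting list still has to be reviewed by hand (step 2 of the recipe) before it is used with -p, because it contains genuine typos as well as correct but unknown words.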

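      3.3 Check the spelling of a document (showing the line number of each misspelled word and a suggested replacement). Source text: test1.TXT (link: https://pan.baidu.com/s/17JRmtnebLblVsMG05CIm-w password: l3q9)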

    [root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  test1.TXT
    test1.TXT:7: Locate: rans | Try: rand
    test1.TXT:15: Locate: wew | Try: woo
    test1.TXT:23: Locate: Sevenn | Try: Severn
    test1.TXT:27: Locate: cannt | Try: canny
    test1.TXT:203: Locate: Hmm | Try: Mm
    test1.TXT:211: Locate: Lele | Try: Lee
    test1.TXT:215: Locate: Lele | Try: Lee
    test1.TXT:243: Locate: Lele | Try: Lee
    test1.TXT:247: Locate: Lele | Try: Lee
    test1.TXT:284: Locate: Hmm | Try: Mm
    test1.TXT:292: Locate: Hmm | Try: Mm
    test1.TXT:468: Locate: ve | Try: be
    test1.TXT:500: Locate: ve | Try: be
    test1.TXT:516: Locate: ve | Try: be
    test1.TXT:564: Locate: Hmm | Try: Mm
    test1.TXT:644: Locate: ve | Try: be
    test1.TXT:776: Locate: hasn | Try: has
    test1.TXT:921: Locate: isn | Try: sin
    test1.TXT:945: Locate: ve | Try: be
    test1.TXT:953: Locate: ve | Try: be
    test1.TXT:989: Locate: Hmm | Try: Mm
    test1.TXT:1005: Locate: Hmm | Try: Mm
    test1.TXT:1085: Locate: wasn | Try: wans
    test1.TXT:1129: Locate: isn | Try: sin
    test1.TXT:1145: Locate: isn | Try: sin
    test1.TXT:1173: Locate: vomeronasal | Try: astronomer
    test1.TXT:1213: Locate: didn | Try: did
    test1.TXT:1289: Locate: ve | Try: be
    test1.TXT:1329: Locate: weren | Try: were
    test1.TXT:1349: Locate: wasn | Try: wans
    test1.TXT:1425: Locate: wouldn | Try: would
    test1.TXT:1425: Locate: weren | Try: were
    test1.TXT:1470: Locate: ve | Try: be
    test1.TXT:1495: Locate: ve | Try: be
    test1.TXT:1803: Locate: cefepime | Try: timepiece
    test1.TXT:1807: Locate: amikacin | Try: Kamikaze
    test1.TXT:1819: Locate: Mmm | Try: Mm
    test1.TXT:1839: Locate: kuai | Try: Kauai
    test1.TXT:1895: Locate: ve | Try: be
    test1.TXT:1903: Locate: isn | Try: sin
    test1.TXT:2012: Locate: ve | Try: be
    test1.TXT:2096: Locate: aren | Try: earn
    test1.TXT:2116: Locate: shouldn | Try: should
    test1.TXT:2168: Locate: whould | Try: would
    test1.TXT:2232: Locate: Hmm | Try: Mm
    test1.TXT:2800: Locate: Hmm | Try: Mm
    test1.TXT:2820: Locate: Hmm | Try: Mm
    test1.TXT:2930: Locate: ve | Try: be
    test1.TXT:2993: Locate: Hmm | Try: Mm
    test1.TXT:2997: Locate: Hmm | Try: Mm
    test1.TXT:3076: Locate: Uhh | Try: Shh
    test1.TXT:3331: Locate: Chh | Try: Ch
    test1.TXT:3376: Locate: Hmm | Try: Mm
    test1.TXT:3412: Locate: isn | Try: sin
    test1.TXT:3436: Locate: ve | Try: be
    test1.TXT:3448: Locate: exfoliator | Try: defoliator
    test1.TXT:3518: Locate: didn | Try: did
    test1.TXT:3531: Locate: didn | Try: did
    test1.TXT:3652: Locate: Hmm | Try: Mm
    test1.TXT:3696: Locate: ve | Try: be
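
      A report this long is easier to read once it is grouped by word. The sketch below parses lines of the form "file:line: Locate: word | Try: suggestion" and counts how often each word was flagged. The regular expression is inferred from the output above rather than from any documented format, and parse_report and count_by_word are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern inferred from the report lines above; an assumption, not a spec.
LINE_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): Locate: (?P<word>\S+) \| Try: (?P<suggestion>.+)$"
)

# A few lines copied from the report above.
SAMPLE = """\
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:211: Locate: Lele | Try: Lee
test1.TXT:215: Locate: Lele | Try: Lee
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
"""


def parse_report(text):
    """Return (file, line_number, word, suggestion) tuples for matching lines."""
    records = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            records.append(
                (match["file"], int(match["line"]), match["word"], match["suggestion"])
            )
    return records


def count_by_word(records):
    """Count how many times each word was flagged."""
    return Counter(word for _, _, word, _ in records)


if __name__ == "__main__":
    for word, count in sorted(count_by_word(parse_report(SAMPLE)).items()):
        print(word, count)
```

      Grouping the full report this way makes repeated entries stand out, for example interjections such as Hmm and fragments such as ve, isn and didn that look like parts of contractions.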
  • Original article: https://www.cnblogs.com/zhenyuyaodidiao/p/9288469.html