  • Elasticsearch: custom dictionaries for the Chinese analyzer plugin es-ik

      So where do the custom dictionaries live?

       This part is very important!

    [hadoop@HadoopMaster custom]$ pwd
    /home/hadoop/app/elasticsearch-2.4.3/plugins/ik/config/custom
    [hadoop@HadoopMaster custom]$ ll
    total 5252
    -rw-r--r--. 1 hadoop hadoop 156 Dec 14 10:34 ext_stopword.dic
    -rw-r--r--. 1 hadoop hadoop 130 Dec 14 10:34 mydict.dic
    -rw-r--r--. 1 hadoop hadoop 63188 Dec 14 10:34 single_word.dic
    -rw-r--r--. 1 hadoop hadoop 63188 Dec 14 10:34 single_word_full.dic
    -rw-r--r--. 1 hadoop hadoop 10855 Dec 14 10:34 single_word_low_freq.dic
    -rw-r--r--. 1 hadoop hadoop 5225922 Dec 14 10:34 sougou.dic
    [hadoop@HadoopMaster custom]$
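Files placed in custom/ are not picked up just by being there. In the IK versions we are familiar with, a dictionary is loaded only if IKAnalyzer.cfg.xml (one level up, in plugins/ik/config) lists it. The minimal Python sketch below, which is not part of the IK plugin, resolves those references to full paths so you can check that every listed file exists. It assumes the Java-properties XML layout that IK's config uses, with semicolon-separated paths under the `ext_dict` and `ext_stopwords` keys, relative to the config directory. The helper name `ik_dict_files` is hypothetical; verify the key names against your own file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Keys assumed from IK's IKAnalyzer.cfg.xml; verify against your own file.
DICT_KEYS = ("ext_dict", "ext_stopwords")


def ik_dict_files(cfg_path):
    """Map each dictionary key to the .dic paths it references.

    Hypothetical helper: expects <entry key="...">a.dic;b.dic</entry>
    elements whose paths are relative to the folder holding the config file.
    XML comments (e.g. commented-out remote dictionary entries) are skipped
    by ElementTree's default parser, so they never show up here.
    """
    cfg_path = Path(cfg_path)
    root = ET.parse(cfg_path).getroot()
    result = {key: [] for key in DICT_KEYS}
    for entry in root.iter("entry"):
        key = entry.get("key")
        if key not in result or not entry.text:
            continue
        # Several dictionaries can share one key, separated by ';'.
        for item in entry.text.split(";"):
            item = item.strip()
            if item:
                result[key].append(cfg_path.parent / item)
    return result
```

For example, calling `ik_dict_files("/home/hadoop/app/elasticsearch-2.4.3/plugins/ik/config/IKAnalyzer.cfg.xml")` and checking `p.exists()` on every returned path quickly reveals a misspelled or missing file name.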

    [hadoop@HadoopMaster elasticsearch-2.4.3]$ ll
    total 56
    drwxrwxr-x. 2 hadoop hadoop 4096 Feb 22 01:37 bin
    drwxrwxr-x. 3 hadoop hadoop 4096 Feb 22 18:46 config
    drwxrwxr-x. 3 hadoop hadoop 4096 Feb 22 06:05 data
    drwxrwxr-x. 2 hadoop hadoop 4096 Feb 22 01:37 lib
    -rw-rw-r--. 1 hadoop hadoop 11358 Aug 24 2016 LICENSE.txt
    drwxrwxr-x. 2 hadoop hadoop 4096 Feb 25 05:15 logs
    drwxrwxr-x. 5 hadoop hadoop 4096 Dec 8 00:41 modules
    -rw-rw-r--. 1 hadoop hadoop 150 Aug 24 2016 NOTICE.txt
    drwxrwxr-x. 5 hadoop hadoop 4096 Feb 25 06:31 plugins
    -rw-rw-r--. 1 hadoop hadoop 8700 Aug 24 2016 README.textile
    [hadoop@HadoopMaster elasticsearch-2.4.3]$ cd plugins/
    [hadoop@HadoopMaster plugins]$ ll
    total 12
    drwxrwxr-x. 5 hadoop hadoop 4096 Feb 22 05:28 head
    drwxrwxr-x. 3 hadoop hadoop 4096 Feb 25 06:32 ik
    drwxrwxr-x. 8 hadoop hadoop 4096 Feb 22 05:34 kopf
    [hadoop@HadoopMaster plugins]$ cd ik/
    [hadoop@HadoopMaster ik]$ ll
    total 5828
    -rw-r--r--. 1 hadoop hadoop 263965 Dec 1 2015 commons-codec-1.9.jar
    -rw-r--r--. 1 hadoop hadoop 61829 Dec 1 2015 commons-logging-1.2.jar
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 1 12:46 config
    -rw-r--r--. 1 hadoop hadoop 55998 Jan 1 13:27 elasticsearch-analysis-ik-1.10.3.jar
    -rw-r--r--. 1 hadoop hadoop 4505518 Jan 15 08:59 elasticsearch-analysis-ik-1.10.3.zip
    -rw-r--r--. 1 hadoop hadoop 736658 Jan 1 13:26 httpclient-4.5.2.jar
    -rw-r--r--. 1 hadoop hadoop 326724 Jan 1 13:07 httpcore-4.4.4.jar
    -rw-r--r--. 1 hadoop hadoop 2667 Jan 1 13:27 plugin-descriptor.properties
    [hadoop@HadoopMaster ik]$ cd config/
    [hadoop@HadoopMaster config]$ ll
    total 3016
    drwxr-xr-x. 2 hadoop hadoop 4096 Jan 1 12:46 custom
    -rw-r--r--. 1 hadoop hadoop 697 Dec 14 10:34 IKAnalyzer.cfg.xml
    -rw-r--r--. 1 hadoop hadoop 3058510 Dec 14 10:34 main.dic
    -rw-r--r--. 1 hadoop hadoop 123 Dec 14 10:34 preposition.dic
    -rw-r--r--. 1 hadoop hadoop 1824 Dec 14 10:34 quantifier.dic
    -rw-r--r--. 1 hadoop hadoop 164 Dec 14 10:34 stopword.dic
    -rw-r--r--. 1 hadoop hadoop 192 Dec 14 10:34 suffix.dic
    -rw-r--r--. 1 hadoop hadoop 752 Dec 14 10:34 surname.dic
    [hadoop@HadoopMaster config]$ cd custom/
    [hadoop@HadoopMaster custom]$ ll
    total 5252
    -rw-r--r--. 1 hadoop hadoop 156 Dec 14 10:34 ext_stopword.dic
    -rw-r--r--. 1 hadoop hadoop 130 Dec 14 10:34 mydict.dic
    -rw-r--r--. 1 hadoop hadoop 63188 Dec 14 10:34 single_word.dic
    -rw-r--r--. 1 hadoop hadoop 63188 Dec 14 10:34 single_word_full.dic
    -rw-r--r--. 1 hadoop hadoop 10855 Dec 14 10:34 single_word_low_freq.dic
    -rw-r--r--. 1 hadoop hadoop 5225922 Dec 14 10:34 sougou.dic

    [hadoop@HadoopMaster custom]$ cat ext_stopword.dic
    也
    了
    仍
    从
    以
    使
    则
    却
    又
    及
    对
    就
    并
    很
    或
    把
    是
    的
    着
    给
    而
    被
    让
    在
    还
    比
    等
    当
    与
    于
    但[hadoop@HadoopMaster custom]$
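The cat output shows the .dic format: one word per line, and the last entry 但 has no trailing newline, which is why the shell prompt is printed right after it. If you want to inspect or validate such a file from a script, a minimal Python reader might look like this. The name `load_dic` is hypothetical, and decoding with `utf-8-sig` is a defensive choice (an assumption, not an IK requirement) so that a byte-order mark added by a Windows editor does not end up glued to the first word.

```python
from pathlib import Path


def load_dic(path):
    """Return the entries of an IK-style .dic file, in file order.

    - 'utf-8-sig' drops a leading byte-order mark if one is present.
    - splitlines() copes with LF or CRLF endings and a missing final newline.
    - Surrounding whitespace and blank lines are discarded.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A typical use is `stopwords = set(load_dic("custom/ext_stopword.dic"))`, which gives fast membership checks when comparing the file against a word list.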

    If you're interested, have a look at these English stopword lists:

    http://www.ranks.nl/stopwords

    Chinese stopword lists are worth a look as well.
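If you borrow words from such lists, they have to end up in a stopword file that IKAnalyzer.cfg.xml points to, such as custom/ext_stopword.dic. The Python sketch below (a hypothetical helper `merge_stopwords`, not part of IK) appends only words that are not already present and keeps the existing order. As far as we know, IK reads local dictionary files when the plugin loads, so expect to restart the Elasticsearch node before new entries take effect; check the documentation for your plugin version.

```python
from pathlib import Path


def merge_stopwords(dic_path, new_words):
    """Append unseen words to a one-entry-per-line .dic file.

    Existing entries keep their order. New words that are blank, already in
    the file, or repeated in new_words are skipped. The file is rewritten as
    UTF-8 without a BOM, one entry per line, ending with a newline.
    Returns the merged list of entries.
    """
    path = Path(dic_path)
    merged = []
    if path.exists():
        text = path.read_text(encoding="utf-8-sig")
        merged = [w.strip() for w in text.splitlines() if w.strip()]
    seen = set(merged)
    for word in new_words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

Rewriting the whole file, rather than appending in place, avoids the missing-final-newline problem seen above: appending directly to a file that ends in 但 without a newline would fuse the next word onto it.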

  • Original article: https://www.cnblogs.com/jpfss/p/10711811.html