  • Elasticsearch pinyin + IK analysis, and pinyin analysis with Spring Data Elasticsearch

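    Custom analyzers in Elasticsearch

    Installing the pinyin and IK analyzer plugins

      Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin/releases

      IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases

      If you download the source code, you need to build the package with Maven.

      If you download a prebuilt archive, unzip it directly into the plugins folder under the Elasticsearch installation directory. The extracted folder may be renamed.

    1. Configuring the analyzer in Elasticsearch

    Create the index and add the settings: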

    PUT myindex
    {
      "settings": {
        "index":{
          "analysis":{
            "analyzer":{
              "ik_pinyin_analyzer":{
                "type":"custom",
                "tokenizer":"ik_smart",
                "filter":"pinyin_filter"
              }
            },
            "filter":{
              "pinyin_filter":{
                "type":"pinyin",
                "keep_separate_first_letter": false,
                "keep_full_pinyin": true,
                "keep_original": false,
                "limit_first_letter_length": 10,
                "lowercase": true,
                "remove_duplicated_term": true
              }
            }
          }
        }
      }
    }

    Add the fields by setting the mapping:

    PUT myindex/_mapping/users
    {
      "properties": {
        "uname":{
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart",
          "fields": {
            "my_pinyin":{
              "type": "text",
              "analyzer": "ik_pinyin_analyzer",
              "search_analyzer": "ik_pinyin_analyzer"
            }
          }
        },
        "age":{
          "type": "integer"
        }
      }
    }
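
    With the settings and mapping in place, the tokens produced by ik_pinyin_analyzer can be inspected through the _analyze API of the index. The following is a minimal sketch that only builds the request and does not send it. The base URL http://localhost:9200 and the sample text are assumptions, and the escape helper handles only quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: builds (but does not send) a POST /<index>/_analyze request.
final class AnalyzeRequestSketch {

    // Minimal JSON string escaping: backslash and double quote only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Request body: which analyzer to run and on which text.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    static HttpRequest analyzeRequest(String baseUrl, String index, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Assumption: a local node listening on the default HTTP port.
        HttpRequest request =
                analyzeRequest("http://localhost:9200", "myindex", "ik_pinyin_analyzer", "刘德华");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(analyzeBody("ik_pinyin_analyzer", "刘德华"));
    }
}
```

    To actually run the check, the request can be passed to java.net.http.HttpClient (Java 11 or later); the response lists the tokens the analyzer emitted.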

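    2. Configuring the analyzer with Spring Data Elasticsearch

    Create the entity class:

    import java.util.List;

    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;
    import org.springframework.data.elasticsearch.annotations.Mapping;
    import org.springframework.data.elasticsearch.annotations.Setting;

    @Mapping(mappingPath = "elasticsearch_mapping.json") // mapping file
    @Setting(settingPath = "elasticsearch_setting.json") // settings file
    @Document(indexName = "myindex", type = "users")
    public class User {
        @Id
        private Integer id;

        // Declaring the analyzer on the field itself had no effect:
        // @Field(type = FieldType.keyword, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
        private String name1;

        // FieldType.keyword is the spelling in the version used here;
        // newer releases spell this constant FieldType.Keyword.
        @Field(type = FieldType.keyword)
        private String userName;

        @Field(type = FieldType.Nested)
        private List<Product> products; // Product class not shown
    }

    Create the file elasticsearch_mapping.json under resources: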
    {
      "properties": {
        "uname": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart",
          "fields": {
            "my_pinyin": {
              "type": "text",
              "analyzer": "ik_pinyin_analyzer",
              "search_analyzer": "ik_pinyin_analyzer"
            }
          }
        },
        "age": {
          "type": "integer"
        }
      }
    }
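
    This mapping indexes uname twice: once with ik_smart under the name uname, and once with ik_pinyin_analyzer under the sub-field uname.my_pinyin. A query therefore has to name the form it wants to match. The sketch below shows one way to pick the field name to put into a match query. The ASCII-letter heuristic and the helper fieldFor are hypothetical and not part of the original setup.

```java
// Sketch: choose which of the two indexed forms of "uname" to search.
final class SearchFieldSketch {

    static final String CHINESE_FIELD = "uname";          // analyzed with ik_smart
    static final String PINYIN_FIELD = "uname.my_pinyin"; // analyzed with ik_pinyin_analyzer

    // Hypothetical heuristic: input made only of ASCII letters is treated as pinyin.
    static String fieldFor(String input) {
        boolean asciiLetters = !input.isEmpty()
                && input.chars().allMatch(c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
        return asciiLetters ? PINYIN_FIELD : CHINESE_FIELD;
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("liudehua")); // prints uname.my_pinyin
        System.out.println(fieldFor("刘德华"));     // prints uname
    }
}
```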
    Create the file elasticsearch_setting.json under resources:

    {
      "index": {
        "analysis": {
          "analyzer": {
            "ik_pinyin_analyzer": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "filter": "pinyin_filter"
            }
          },
          "filter": {
            "pinyin_filter": {
              "type": "pinyin",
              // true: emit the first-letter abbreviation
              "keep_first_letter": true,
              // false: do not emit each first letter as a separate term
              "keep_separate_first_letter": false,
              // true: emit the full pinyin
              "keep_full_pinyin": true,
              "keep_original": false,
              // maximum length of the first-letter result
              "limit_first_letter_length": 10,
              // lowercase non-Chinese letters
              "lowercase": true,
              // duplicated terms are removed
              "remove_duplicated_term": true
            }
          }
        }
      }
    }
     
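    • ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」.
    • ik_smart: splits the text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.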
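
    The difference between the two granularities can be illustrated with a toy dictionary matcher. This is not IK's actual algorithm: allWords simply lists every dictionary word found at every position, and longestWords uses forward maximum matching as a stand-in for the coarse mode. The dictionary is assumed to contain exactly the words from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration of fine vs. coarse segmentation; not the IK implementation.
final class GranularitySketch {

    // Finest granularity: every dictionary word at every start position, longest first.
    static List<String> allWords(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = text.length(); end > start; end--) {
                String candidate = text.substring(start, end);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    // Coarse granularity, approximated by forward maximum matching: take the longest
    // dictionary word at the current position, then continue right after it.
    static List<String> longestWords(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = text.length();
            while (end > start + 1 && !dict.contains(text.substring(start, end))) {
                end--;
            }
            out.add(text.substring(start, end)); // falls back to a single character
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
                "人民", "人", "民", "共和国", "共和", "和", "国国", "国歌");
        String text = "中华人民共和国国歌";
        System.out.println(allWords(text, dict));     // the 13 terms listed for ik_max_word
        System.out.println(longestWords(text, dict)); // [中华人民共和国, 国歌]
    }
}
```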

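    After the application started, the index had been created but the analyzers were not applied to it.

    Once the entity class is in place, the following call has to be added so that the created index actually uses the analyzers: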

    elasticsearchTemplate.putMapping(User.class);
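
    The order of operations matters here: the index settings that define ik_pinyin_analyzer have to exist before a mapping that refers to that analyzer is put. Below is a minimal sketch of that startup order. IndexOps is a hypothetical stand-in that mirrors the indexExists, createIndex and putMapping methods of ElasticsearchTemplate so that the example runs without Spring, and RecordingOps is a fake used only to show the call sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the startup order: create the index (with its settings), then put the mapping.
final class IndexInitSketch {

    // Hypothetical stand-in for the three ElasticsearchTemplate methods used below.
    interface IndexOps {
        boolean indexExists(Class<?> entity);
        boolean createIndex(Class<?> entity);
        boolean putMapping(Class<?> entity);
    }

    static void ensureIndex(IndexOps ops, Class<?> entity) {
        if (!ops.indexExists(entity)) {
            ops.createIndex(entity); // the analyzer definitions must be in place first
        }
        ops.putMapping(entity);      // then the mapping that refers to them
    }

    // Fake implementation that records which calls were made, in order.
    static final class RecordingOps implements IndexOps {
        final List<String> calls = new ArrayList<>();
        private boolean exists;

        RecordingOps(boolean exists) {
            this.exists = exists;
        }

        public boolean indexExists(Class<?> entity) {
            calls.add("indexExists");
            return exists;
        }

        public boolean createIndex(Class<?> entity) {
            calls.add("createIndex");
            exists = true;
            return true;
        }

        public boolean putMapping(Class<?> entity) {
            calls.add("putMapping");
            return true;
        }
    }

    public static void main(String[] args) {
        RecordingOps ops = new RecordingOps(false);
        ensureIndex(ops, Object.class); // Object.class stands in for User.class
        System.out.println(ops.calls);  // [indexExists, createIndex, putMapping]
    }
}
```

    In the real application the same sequence would be run against elasticsearchTemplate with User.class, for example from a startup hook.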
  • Original post: https://www.cnblogs.com/double-yuan/p/9742567.html