  • Elasticsearch analyzers

    1. The default analyzer

    standard

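    standard tokenizer: splits text on word boundaries
    standard token filter: does nothing
    lowercase token filter: converts all letters to lowercase
    stop token filter (disabled by default): removes stop words such as "a", "the", and "it"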
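
    The pipeline above can be approximated outside Elasticsearch. The following is a minimal Python sketch, not the real implementation: the function names are made up for illustration, and the regex `\w+` is only a rough stand-in for the word-boundary rules the standard tokenizer applies. Each token carries offsets and a position, loosely modelled on what `_analyze` reports.

```python
import re

def standard_tokenize(text):
    # Rough approximation of the standard tokenizer: one token per run of
    # word characters, with character offsets and a running position.
    return [
        {
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": i,
        }
        for i, m in enumerate(re.finditer(r"\w+", text))
    ]

def lowercase_filter(tokens):
    # lowercase token filter: normalise every token to lowercase.
    return [{**t, "token": t["token"].lower()} for t in tokens]

def standard_analyze(text):
    # The stop filter is disabled by default, so it is left out here.
    return lowercase_filter(standard_tokenize(text))

tokens = standard_analyze("A Dog is in THE House")
print([t["token"] for t in tokens])
```

    Because the stop filter is off, words such as "a" and "the" survive; only the case changes.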

    2. Changing analyzer settings

    Enable the English stop-word token filter

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "es_std": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
      }
    }

    GET /my_index/_analyze
    {
      "analyzer": "standard",
      "text": "a dog is in the house"
    }

    GET /my_index/_analyze
    {
      "analyzer": "es_std",
      "text": "a dog is in the house"
    }
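
    To see why the two requests should differ, here is a small Python sketch of the stop-word step. It is an approximation under stated assumptions: `analyze` is a hypothetical helper, `\w+` stands in for the standard tokenizer, and the stop-word set is assumed to match the default English list that `_english_` refers to.

```python
import re

# Assumed to match the default English stop-word list behind "_english_".
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})

def analyze(text, stopwords=frozenset()):
    # Tokenize and lowercase, then drop any token found in the stop-word set.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]

text = "a dog is in the house"
standard_tokens = analyze(text)                     # like "analyzer": "standard"
es_std_tokens = analyze(text, ENGLISH_STOP_WORDS)   # like "analyzer": "es_std"

print(standard_tokens)
print(es_std_tokens)
```

    Under these assumptions the standard analyzer keeps all six words, while es_std keeps only "dog" and "house".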

    3. Building your own custom analyzer

    Note that these analysis settings are supplied at index creation. If my_index still exists from step 2, delete it first (DELETE /my_index); otherwise the PUT below fails because the index already exists.

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "&_to_and": {
              "type": "mapping",
              "mappings": ["&=> and"]
            }
          },
          "filter": {
            "my_stopwords": {
              "type": "stop",
              "stopwords": ["the", "a"]
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "char_filter": ["html_strip", "&_to_and"],
              "tokenizer": "standard",
              "filter": ["lowercase", "my_stopwords"]
            }
          }
        }
      }
    }

    GET /my_index/_analyze
    {
      "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
      "analyzer": "my_analyzer"
    }
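
    The order of the stages in my_analyzer (char filters, then tokenizer, then token filters) can be traced with a short Python sketch. This is only an illustration: the function names are hypothetical, the tag-stripping regex is far cruder than the real html_strip char filter, `\w+` approximates the standard tokenizer, and the mapping step assumes the whitespace around "and" in the rule is trimmed.

```python
import re

def html_strip(text):
    # Crude stand-in for the html_strip char filter: drop anything tag-shaped.
    return re.sub(r"<[^>]+>", "", text)

def amp_to_and(text):
    # Mirrors the "&_to_and" mapping rule, assuming the replacement is
    # applied as "and" with no surrounding whitespace.
    return text.replace("&", "and")

def my_analyzer(text, stopwords=("the", "a")):
    # 1. Char filters run on the raw string, in the configured order.
    for char_filter in (html_strip, amp_to_and):
        text = char_filter(text)
    # 2. The tokenizer splits the filtered string into tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters run in order: lowercase, then the custom stop list.
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in stopwords]

result = my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!")
print(result)
```

    With these simplifications the sketch yields tomandjerry, are, friend, in, house, haha: the tag is stripped, "&" becomes "and", "HAHA" is lowercased, and only "the" and "a" are dropped because the custom stop list contains nothing else.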

    PUT /my_index/_mapping/my_type
    {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }

  • Original article: https://www.cnblogs.com/qinjf/p/8546440.html