  • Tokenizing text with analyzers

    
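    #Simple Analyzer - splits on non-letter characters (symbols are dropped) and lowercases
    #Stop Analyzer - lowercases and removes stop words (the, a, is)
    #Whitespace Analyzer - splits on whitespace, does not lowercase
    #Keyword Analyzer - no tokenization; the whole input is emitted as a single token
    #Pattern Analyzer - splits on a regular expression, default \W+ (non-word characters)
    #Language - analyzers for more than 30 common languages
    #Sample text: 2 running Quick brown-foxes leap over lazy dogs in the summer evening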
    
    #Compare the output of the different analyzers
    #standard
    GET _analyze
    {
      "analyzer": "standard",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    #simple
    GET _analyze
    {
      "analyzer": "simple",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    
    #stop
    GET _analyze
    {
      "analyzer": "stop",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    
    #whitespace
    GET _analyze
    {
      "analyzer": "whitespace",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    #keyword
    GET _analyze
    {
      "analyzer": "keyword",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    #pattern
    GET _analyze
    {
      "analyzer": "pattern",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    
    #english
    GET _analyze
    {
      "analyzer": "english",
      "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    }
    
    
    #icu_analyzer on Chinese text (provided by the analysis-icu plugin, which must be installed)
    POST _analyze
    {
      "analyzer": "icu_analyzer",
      "text": "他说的确实在理"
    }
    
    
    #standard on the same Chinese text, for comparison
    POST _analyze
    {
      "analyzer": "standard",
      "text": "他说的确实在理"
    }
    
    
    #icu_analyzer on a second Chinese sentence
    POST _analyze
    {
      "analyzer": "icu_analyzer",
      "text": "这个苹果不大好吃"
    }
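
The splitting rules listed in the comments at the top of the requests above can be approximated in a few lines of Python, which makes the differences between the analyzers easier to see without a running cluster. This is only a rough sketch, not how Elasticsearch or Lucene implements them: the function names are hypothetical, the stop-word set is a small illustrative subset rather than the real English list, and Python's `\W` is Unicode-aware while the Java regex default is not, so results can differ on non-ASCII input.

```python
import re

# Illustrative subset only (assumption); the real English stop-word list is longer.
STOP_WORDS = {"a", "an", "the", "is", "in"}


def whitespace_analyzer(text):
    """Split on whitespace; keep case and punctuation."""
    return text.split()


def simple_analyzer(text):
    """Keep runs of letters only (digits and symbols act as separators), then lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text):
    """Same as simple_analyzer, then drop stop words."""
    return [token for token in simple_analyzer(text) if token not in STOP_WORDS]


def keyword_analyzer(text):
    """No tokenization: the whole input is one token."""
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    """Split on a regular expression (default: non-word characters), then lowercase."""
    return [token.lower() for token in re.split(pattern, text) if token]


if __name__ == "__main__":
    sample = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
    for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer,
                     keyword_analyzer, pattern_analyzer):
        print(analyzer.__name__, analyzer(sample))
```

Running the script prints one token list per analyzer for the sample sentence. Note how the leading `2` survives the pattern rule but not the simple rule, and how only the whitespace rule keeps `brown-foxes` and the trailing period intact. The standard, english and icu_analyzer cases are left out because they depend on Unicode segmentation, stemming and dictionaries that a short regex sketch cannot reproduce.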
    
    
  • Original article: https://www.cnblogs.com/sanduzxcvbnm/p/14506581.html