  • HTML Strip Char Filter

    The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).

    Example output

    POST _analyze
    {
      "tokenizer":      "keyword", 
      "char_filter":  [ "html_strip" ],
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

    The keyword tokenizer returns a single term.

    The above example returns the term:

    [ \nI'm so happy!\n ]

    The same example with the standard tokenizer would return the following terms:

    [ I'm, so, happy ]
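
    For comparison, the request behind the standard tokenizer result would presumably differ from the first example only in the tokenizer value. This is a sketch derived from that example, not a request quoted from the original page:

    POST _analyze
    {
      "tokenizer": "standard",
      "char_filter": [ "html_strip" ],
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

    Character filters run before the tokenizer, so the standard tokenizer receives the already-stripped text and splits it on word boundaries, dropping the "!".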

    Configuration

    The html_strip character filter accepts the following parameter:

    escaped_tags

    An array of HTML tags which should not be stripped from the original text.

    Example configuration

    In this example, we configure the html_strip character filter to leave <b> tags in place:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "keyword",
              "char_filter": ["my_char_filter"]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "html_strip",
              "escaped_tags": ["b"]
            }
          }
        }
      }
    }
    
    POST my_index/_analyze
    {
      "analyzer": "my_analyzer",
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

    The above example produces the following term:

    [ \nI'm so <b>happy</b>!\n ]
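
    To try escaped_tags without creating an index, the character filter can probably be defined inline in the _analyze request instead. The following is a minimal sketch, assuming an Elasticsearch version whose _analyze API accepts inline custom character filter definitions; it does not come from the original page:

    POST _analyze
    {
      "tokenizer": "keyword",
      "char_filter": [
        {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      ],
      "text": "<p>I&apos;m so <b>happy</b>!</p>"
    }

    Because the settings match my_char_filter, this request should produce the same term as the index-based example. The inline form suits quick experiments, while the named filter in the index settings is what an analyzer used at index time would reference.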


    Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html#analysis-htmlstrip-charfilter
  • Original post: https://www.cnblogs.com/a-du/p/7278302.html