
    Elasticsearch Tutorial

    Concepts

    Mapping concepts across SQL and Elasticsearch

    While SQL and Elasticsearch have different terms for the way the data is organized, essentially their purpose is the same.

    SQL      | Elasticsearch | Description
    -------- | ------------- | -----------
    column   | field         | In both cases, at the lowest level, data is stored in named entries of a variety of data types, each containing one value (an Elasticsearch field can also hold several values of the same type, i.e. an array).
    row      | document      | Columns and fields do not exist by themselves; they are part of a row or a document.
    table    | index         | The target against which queries, whether in SQL or Elasticsearch, are executed.
    database | cluster       | In SQL, catalog and database are used interchangeably and represent a set of schemas, that is, a number of tables. In Elasticsearch, the set of available indices is grouped in a cluster.
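    To make the correspondence concrete, here is a minimal sketch that turns the SQL statement SELECT * FROM forward WHERE uuid = '1000' into the index name and request body of an Elasticsearch search. The helper sql_equality_to_search is hypothetical and only handles a single equality condition; it builds the request but does not contact a cluster.

```python
def sql_equality_to_search(table, column, value):
    """Translate SELECT * FROM <table> WHERE <column> = <value> (hypothetical helper).

    table  -> index: the target the search runs against
    column -> field
    Each row SQL would return corresponds to a document (a hit) in the response.
    WHERE is a yes/no condition, so the exact-value term clause goes in a
    bool filter, which does not compute relevance scores.
    """
    index = table
    body = {"query": {"bool": {"filter": [{"term": {column: value}}]}}}
    return index, body


# SELECT * FROM forward WHERE uuid = '1000'
index, body = sql_equality_to_search("forward", "uuid", "1000")
print(index)  # forward
print(body)   # {'query': {'bool': {'filter': [{'term': {'uuid': '1000'}}]}}}
```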

    Field data types

    Common types

    Type     | Description
    -------- | -----------
    binary   | Binary value encoded as a Base64 string.
    boolean  | true and false values.
    Keywords | The keyword family, including keyword, constant_keyword, and wildcard.
    Numbers  | Numeric types, such as long and double, used to express amounts.
    Dates    | Date types, including date and date_nanos.
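    As a sketch of how these families appear in an explicit mapping, the dictionary below declares one field of each kind. The field names (attachment, is_listed, price and so on) are hypothetical; the type names are real Elasticsearch field types. The small helper only inspects the Python dictionary and does not talk to a cluster.

```python
# A hypothetical explicit mapping with one field from each common type family
mapping = {
    "mappings": {
        "properties": {
            "attachment": {"type": "binary"},           # Base64-encoded bytes
            "is_listed": {"type": "boolean"},           # true / false
            "uuid": {"type": "keyword"},                # exact value, not analyzed
            "exchange": {"type": "constant_keyword"},   # same value in every document
            "shares": {"type": "long"},                 # 64-bit integer
            "price": {"type": "double"},                # 64-bit floating point
            "listed_at": {"type": "date"},              # millisecond precision
            "traded_at": {"type": "date_nanos"},        # nanosecond precision
        }
    }
}


def field_types(m):
    """Return {field_name: type} for the top-level properties of a mapping."""
    return {name: spec["type"] for name, spec in m["mappings"]["properties"].items()}


print(field_types(mapping)["price"])  # double
```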

    Mapping

    Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

    Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.

    Dynamic mapping

    Dynamic mapping allows you to experiment with and explore data when you’re just getting started. Elasticsearch adds new fields automatically, just by indexing a document.
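    The rules Elasticsearch applies by default when it meets a new field can be approximated as below. This is a simplified sketch, not Elasticsearch's code: it ignores date detection (strings that look like dates become date fields in the real thing) and numeric detection, and guess_field_type and dynamic_mapping are hypothetical helpers.

```python
def guess_field_type(value):
    """Approximate Elasticsearch's default dynamic mapping for one JSON value.

    Simplified: date detection and numeric detection are not modelled.
    Returns None when no field would be added (JSON null, empty array).
    """
    if value is None:
        return None                      # null adds no field
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become analyzed text plus an exact-match keyword sub-field
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"type": "object", "properties": dynamic_mapping(value)}
    if isinstance(value, list):
        # An array takes the type of its first non-null element
        for item in value:
            if item is not None:
                return guess_field_type(item)
        return None
    raise TypeError(f"not a JSON value: {value!r}")


def dynamic_mapping(document):
    """Build a properties dict for a document, skipping fields with no type."""
    props = {}
    for name, value in document.items():
        field_type = guess_field_type(value)
        if field_type is not None:
            props[name] = field_type
    return props


doc = {"uuid": "1000", "views": 42, "listed": True, "score": 9.5, "note": None}
print({k: v["type"] for k, v in dynamic_mapping(doc).items()})
# {'uuid': 'text', 'views': 'long', 'listed': 'boolean', 'score': 'float'}
```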

    Explicit mapping

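    Explicit mapping allows you to precisely choose how to define the mapping definition. For example, the following mapping declares uuid as a keyword field, title and main_body as text fields, and turns off indexing for main_body: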

    {
      "mappings": {
        "properties": {
          "uuid": {
            "type": "keyword"
          },
          "title": {
            "type": "text"
          },
          "main_body": {
            "type": "text",
            "index": false
          }
        }
      }
    }
    

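    The field type "keyword" means the value is not analyzed: it is indexed as a single exact term, so it is usually searched with a term query.

    The field type "text" means the value is analyzed (split into individual terms by an analyzer) before indexing, so it is usually searched with a match query, which analyzes the query string the same way.

    The setting "index": false means the field is not indexed, so it cannot be searched; its value is still kept in _source and returned with the document.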
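    The difference between keyword and text can be simulated in a few lines. This sketch is only an approximation for English input: simple_analyze lowercases and splits on word characters, roughly what the standard analyzer does for English, whereas the real analyzer follows Unicode text segmentation rules. All helper names are hypothetical.

```python
import re


def keyword_terms(value):
    # keyword: the whole value is indexed as one exact term, unchanged
    return [value]


def simple_analyze(value):
    # text: rough stand-in for the standard analyzer on English input,
    # split into word tokens and lowercase them
    return re.findall(r"\w+", value.lower())


def term_matches(indexed_terms, term):
    # term query: the search value is NOT analyzed; it must equal a stored term
    return term in indexed_terms


def match_matches(indexed_terms, query_text):
    # match query (default OR operator): the query text is analyzed the same
    # way, and any resulting term that is present is enough
    return any(t in indexed_terms for t in simple_analyze(query_text))


title = "Bank of China Listed"
print(term_matches(keyword_terms(title), "Bank of China Listed"))   # True
print(term_matches(simple_analyze(title), "Bank of China Listed"))  # False
print(match_matches(simple_analyze(title), "CHINA bank"))           # True
```

    This is why a term query against a text field often finds nothing: the stored terms are lowercased fragments, not the original string.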

    Query and filter context

    Relevance scores

    By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query.

    Query context

    In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.

    Filter context

    In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple yes or no; no scores are calculated.
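    Both contexts usually appear together in a bool query: clauses under must are scored (query context), while clauses under filter only include or exclude documents (filter context) and are eligible for caching. A minimal sketch that builds such a request body for the forward index used later in this tutorial; build_search_body is a hypothetical helper.

```python
def build_search_body(title_text, exact_filters):
    """Combine a scored full-text clause with unscored exact-value filters.

    title_text    -> match clause in query context (contributes to _score)
    exact_filters -> {field: value} pairs as term clauses in filter context
                     (yes/no only, no scoring)
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title_text}}],
                "filter": [{"term": {f: v}} for f, v in exact_filters.items()],
            }
        }
    }


body = build_search_body("中国银行", {"uuid": "1000"})
# With the elasticsearch-py 7.x client this could be sent as:
# es.search(index="forward", body=body)
```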

    Query DSL

    Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries.

    Leaf query clauses

    Query type | Description
    ---------- | -----------
    match      | Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
    term       | Returns documents that contain an exact term in a provided field.
    range      | Returns documents that contain terms within a provided range.
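    For reference, here is one request body per leaf query type, written as Python dictionaries against the fields of the forward index used later in this tutorial. The date field in the range example (listed_at) and all values are hypothetical; gte/lt are real range query bounds (gte and lte inclusive, gt and lt exclusive).

```python
# match: the query text is analyzed, then documents containing its terms match
match_body = {"query": {"match": {"title": "中国银行"}}}

# term: no analysis; the exact value must be stored (typically on keyword fields)
term_body = {"query": {"term": {"uuid": "1000"}}}

# range: gte/lte are inclusive bounds, gt/lt are exclusive
range_body = {
    "query": {
        "range": {
            "listed_at": {          # hypothetical date field
                "gte": "2021-01-01",
                "lt": "2022-01-01",
            }
        }
    }
}


def query_type(body):
    """Return the name of the single top-level query clause in a body."""
    (name,) = body["query"].keys()
    return name


print([query_type(b) for b in (match_body, term_body, range_body)])
# ['match', 'term', 'range']
```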

    Compound query clauses

    Query type     | Description
    -------------- | -----------
    bool           | A query that matches documents matching boolean combinations of other queries. It is built using one or more boolean clauses, each clause with a typed occurrence.
    dis_max        | Returns documents matching one or more wrapped queries, called query clauses or clauses. If a returned document matches multiple query clauses, the dis_max query assigns the document the highest relevance score from any matching clause, plus a tie breaking increment for any additional matching subqueries.
    constant_score | Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.
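    The two less familiar compound queries look like this. The bodies below use the real tie_breaker parameter (a value between 0 and 1 that sets the tie breaking increment) and boost parameter. The hypothetical helper dis_max_score only reproduces the documented combination rule on made-up clause scores to show the arithmetic; it does not compute the clause scores themselves.

```python
dis_max_body = {
    "query": {
        "dis_max": {
            "queries": [
                {"match": {"title": "中国银行"}},
                {"match": {"main_body": "中国银行"}},
            ],
            "tie_breaker": 0.3,
        }
    }
}

constant_score_body = {
    "query": {
        "constant_score": {
            "filter": {"term": {"uuid": "1000"}},
            "boost": 1.2,  # every matching document gets _score == 1.2
        }
    }
}


def dis_max_score(clause_scores, tie_breaker):
    """Combine the scores of the matching clauses the way dis_max does:
    highest score + tie_breaker * (sum of the other matching scores).
    An empty list means no clause matched (the document is not returned)."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


print(dis_max_score([2.0, 1.0], 0.3))  # 2.3
```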

    Allow expensive queries

    Certain types of queries generally execute slowly because of the way they are implemented, which can affect the stability of the cluster. Elasticsearch rejects them when the cluster setting search.allow_expensive_queries is set to false (it defaults to true). They include:

    Query type         | Description
    ------------------ | -----------
    script query       | Filters documents based on a provided script. The script query is typically used in a filter context.
    fuzzy query        | Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
    regexp query       | Returns documents that contain terms matching a regular expression.
    prefix query       | Returns documents that contain a specific prefix in a provided field.
    wildcard query     | Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder: ? matches any single character and * matches zero or more characters.
    range query        | Returns documents that contain terms within a provided range. Only range queries on text and keyword fields count as expensive.
    joining queries    | Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive, so Elasticsearch offers the nested, has_child and has_parent queries instead.
    geo_shape query    | Filters documents indexed using the geo_shape or geo_point type.
    script_score query | Uses a script to provide a custom score for returned documents. It is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.
    percolate query    | Matches queries stored in an index. The percolate query itself contains the document that is used as the query to match against the stored queries.
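    The similarity measure behind the fuzzy query is easy to sketch. Below is the classic dynamic-programming Levenshtein distance (insertions, deletions, substitutions). Note that Elasticsearch's fuzzy query by default also counts swapping two adjacent characters as a single edit (fuzzy_transpositions is true), which plain Levenshtein does not, and that with fuzziness "AUTO" the number of allowed edits depends on term length. The helpers are hypothetical; fuzzy_body uses the real fuzziness parameter with a made-up search term.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))           # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness "AUTO" with its default thresholds (3 and 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


fuzzy_body = {"query": {"fuzzy": {"title": {"value": "bank", "fuzziness": "AUTO"}}}}

print(levenshtein("kitten", "sitting"))                       # 3
print(levenshtein("bnak", "bank") <= auto_fuzziness("bank"))  # False
```

    The last line is False only because plain Levenshtein needs two substitutions for "bnak"; with transpositions enabled, Elasticsearch treats it as one edit and would match.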

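    Python 3 Elasticsearch in Action

    The examples below assume the elasticsearch-py 7.x client and a node on localhost:9200: in 7.x, Elasticsearch() with no arguments connects to localhost:9200, and API methods accept the request body through the body= argument.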

    Index

    create

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        body = {
          "mappings": {
            "properties": {
              "uuid": {
                "type": "keyword"
              },
              "title": {
                "type": "text"
              },
              "main_body": {
                "type": "text"
              }
            }
          }
        }
        ret = es.indices.create(index="forward", body=body)
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
        
    

    delete

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        ret = es.indices.delete(index="forward")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
        
    

    update

    Update mapping API

    Adds new fields to an existing data stream or index. You can also use this API to change the search settings of existing fields.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        body = {
          "properties": {
            "uuid": {
              "type": "keyword"
            },
            "title": {
              "type": "text"
            },
            "main_body": {
              "type": "text"
            },
            "publish_date": {
              "type": "keyword"
            }
          }
        }
        ret = es.indices.put_mapping(index="forward", body=body)
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
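    The put mapping API can add fields, but it cannot change the type of a field that already exists; Elasticsearch rejects such a request, and the usual workaround is to create a new index with the new mapping and copy the data over with the reindex API. The hypothetical helper below compares two properties dictionaries locally, so a type change can be spotted before the request is sent.

```python
def mapping_conflicts(current_props, new_props):
    """Return (changed, added): fields whose type would change (put_mapping
    rejects these) and fields that are new (put_mapping can add these)."""
    changed, added = [], []
    for name, spec in new_props.items():
        if name not in current_props:
            added.append(name)
        elif current_props[name].get("type") != spec.get("type"):
            changed.append(name)
    return changed, added


current = {"uuid": {"type": "keyword"}, "title": {"type": "text"}, "main_body": {"type": "text"}}

update = dict(current, publish_date={"type": "keyword"})
print(mapping_conflicts(current, update))      # ([], ['publish_date'])

bad_update = dict(current, uuid={"type": "long"})
print(mapping_conflicts(current, bad_update))  # (['uuid'], [])
```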
        
    
    Reindex API

    Copies documents from a source to a destination.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        body = {
          "source": {
            "index": "forward"
          },
          "dest": {
            "index": "document"
          }
        }
        ret = es.reindex(body=body)
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
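    The reindex body can also narrow what is copied: a query under source copies only matching documents, and op_type set to create under dest makes documents whose id already exists in the destination produce version conflicts instead of being overwritten. A minimal sketch that builds such a body; the helper name is hypothetical, and the result would be passed to the same es.reindex call used above.

```python
def build_reindex_body(source_index, dest_index, query=None, only_create=False):
    """Build a reindex request body (hypothetical helper).

    query       -> optional query; only matching source documents are copied
    only_create -> op_type "create": ids already in dest cause version
                   conflicts instead of being overwritten
    """
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if query is not None:
        body["source"]["query"] = query
    if only_create:
        body["dest"]["op_type"] = "create"
    return body


body = build_reindex_body("forward", "document", query={"term": {"uuid": "1000"}})
# es.reindex(body=body)  # elasticsearch-py 7.x style, as in the example above
```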
    
    

    get

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        ret = es.indices.get(index="forward")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
    
    

    Document

    create

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
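        # title: "Bank of China successfully listed on the Hong Kong Stock Exchange"
        # main_body: the same sentence, continuing "..., becoming the first
        # bank from mainland China listed on an international market."
        body = {
          "uuid": "1000",
          "title": "中国银行在港交所上市挂牌成功",
          "main_body": "中国银行在港交所上市挂牌成功,成为中国大陆首家在国际市场上市的银行。"
        }
        es = Elasticsearch()
        # No id is passed, so Elasticsearch generates one and returns it in ret["_id"]
        ret = es.index(index="forward", body=body)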
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
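    Indexing documents one at a time with es.index costs one request per document. For many documents, elasticsearch-py provides elasticsearch.helpers.bulk, which takes an iterable of action dictionaries. The sketch below only builds those actions; it assumes you want to reuse each document's uuid as its _id instead of an auto-generated one, which is a design choice the original code does not make. The second document and the helper name are hypothetical.

```python
def to_bulk_actions(index, docs, id_field="uuid"):
    """Yield index actions in the format elasticsearch.helpers.bulk accepts."""
    for doc in docs:
        yield {
            "_op_type": "index",   # create the document or replace it by id
            "_index": index,
            "_id": doc[id_field],  # reuse uuid so re-running does not duplicate
            "_source": doc,
        }


docs = [
    {"uuid": "1000", "title": "中国银行在港交所上市挂牌成功"},
    {"uuid": "1001", "title": "中国石油"},  # hypothetical second document
]
actions = list(to_bulk_actions("forward", docs))
# Sending them requires a running cluster:
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch(), actions)
```

    Because the ids are derived from the data, running the script twice overwrites the same two documents instead of creating duplicates.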
        
    

    delete

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        # Delete by _id: an auto-generated id returned by the create example;
        # use the one from your own run
        ret = es.delete(index="forward", id="WRemuHkBd6vf16HuHzHq")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
    
    

    update

    To fully replace an existing document, use the index API with the document's id; the index API creates or updates a document in an index.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        # Index API with an existing id: the stored document is replaced as a whole
        body = {
          "uuid": "1000",
          "title": "<<中国银行在港交所上市挂牌成功>>",
          "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
        }
        ret = es.index(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
    
    

    Updates a document with a script or partial document.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
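        es = Elasticsearch()
        # Update API with a partial document: the fields under "doc" are merged
        # into the stored document; other fields keep their current values
        body = {
          "doc": {
            "title": "<<中国银行在港交所上市挂牌成功>>",
            "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
          }
        }
        ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")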
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
    
    

    Updates a document using the specified script.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
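        es = Elasticsearch()
        # Assumes the stored document already has a numeric "counter" field;
        # if it is missing, the Painless script fails with a null pointer error
        body = {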
          "script" : {
            "source": "ctx._source.counter += params.count",
            "lang": "painless",
            "params" : {
              "count" : 4
            }
          }
        }
        ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
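    If some documents may not have a counter field yet, the script can check for null first, and an upsert document covers the case where the document itself does not exist (it is indexed as-is and the script is not run). A minimal sketch of such a request body with the same params as above; the helper name is hypothetical, while script, lang, params and upsert are standard update API fields.

```python
def build_counter_update(count):
    """Update body that increments counter, initialising it when missing."""
    return {
        "script": {
            "source": (
                "if (ctx._source.counter == null) { ctx._source.counter = params.count; } "
                "else { ctx._source.counter += params.count; }"
            ),
            "lang": "painless",
            "params": {"count": count},
        },
        # Indexed as a new document if the id does not exist yet
        "upsert": {"counter": count},
    }


body = build_counter_update(4)
# es.update(index="forward", id="WRemuHkBd6vf16HuHzHq", body=body)
```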
    
    

    get

    Returns a document.

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
        es = Elasticsearch()
        ret = es.get(index="forward", id="WRemuHkBd6vf16HuHzHq")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
    
    

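    search

    match_phrase query: enables character-based Boolean retrieval of Chinese text, that is, exact matching and exact querying of Chinese phrases. With the default standard analyzer, Chinese text is indexed one character per term, and match_phrase requires all of the query's terms to appear next to each other and in order, so it only matches documents containing the exact phrase.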

    # encoding=utf-8
    
    from elasticsearch import Elasticsearch
    from pprint import pprint
    
    
    def main():
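        es = Elasticsearch()
        # "中国石油" is PetroChina. A Python dict cannot hold two "match_phrase"
        # keys (the second silently overwrites the first), so both clauses go in
        # a bool query: a document matches if the phrase occurs in either field.
        body = {
          "query": {
            "bool": {
              "should": [
                {"match_phrase": {"title": "中国石油"}},
                {"match_phrase": {"main_body": "中国石油"}}
              ]
            }
          }
        }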
        ret = es.search(body=body, index="forward")
        pprint(ret)
    
    
    if __name__ == '__main__':
        main()
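    Why match_phrase gives exact Chinese matching can be simulated without a cluster. Assuming the default standard analyzer, which indexes each Chinese character as its own term, a match query matches if any character of the query occurs, while match_phrase requires all the characters to appear consecutively and in order. The hypothetical cjk_tokens helper approximates the tokenizer by keeping letters and digits one character at a time, which is only a fair approximation for Chinese text.

```python
def cjk_tokens(text):
    """Approximate the standard analyzer on Chinese text: one term per
    character, dropping punctuation and whitespace."""
    return [ch for ch in text if ch.isalnum()]


def match_any(doc_text, query_text):
    # match (default OR operator): any query term present is enough
    doc_terms = set(cjk_tokens(doc_text))
    return any(t in doc_terms for t in cjk_tokens(query_text))


def match_phrase(doc_text, query_text):
    # match_phrase: the query terms must appear consecutively, in order
    doc, q = cjk_tokens(doc_text), cjk_tokens(query_text)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


title = "中国银行在港交所上市挂牌成功"
print(match_any(title, "中国石油"))     # True  (shares 中 and 国)
print(match_phrase(title, "中国石油"))  # False (the phrase does not occur)
```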
    
    

    Multi-match query: the multi_match query builds on the match query to allow multi-field queries.

    {
      "query": {
        "multi_match" : {
          "query":    "中国石油",
          "fields": [ "title", "main_body" ]
        }
      }
    }
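    multi_match also accepts per-field boosts with the field^boost syntax and a type that decides how matches in different fields are combined: best_fields (the default) uses the score of the best-matching field, and phrase runs a match_phrase query on each field. A minimal sketch that builds such a body; the helper is hypothetical and the boost of 2 for title is an arbitrary illustration.

```python
def build_multi_match(text, field_boosts, match_type="best_fields"):
    """Build a multi_match body; field_boosts maps field name -> boost."""
    fields = [f"{name}^{boost}" if boost != 1 else name
              for name, boost in field_boosts.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": match_type,
            }
        }
    }


body = build_multi_match("中国石油", {"title": 2, "main_body": 1}, match_type="phrase")
print(body["query"]["multi_match"]["fields"])  # ['title^2', 'main_body']
```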
    

    Highlighting: highlighters return snippets from one or more fields of each search hit, with the matched terms wrapped in pre_tags and post_tags.

    {
        "query" : {
            "match": { "title": "中国石油" }
        },
        "highlight" : {
            "pre_tags" : ["<tag1>"],
            "post_tags" : ["</tag1>"],
            "fields" : {
                "title" : {}
            }
        }
    }
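    In the search response, each hit that has highlights carries a highlight object mapping a field name to a list of fragments. A small sketch for collecting them from the dictionary es.search returns; the helper is hypothetical, and the sample response below is made up and trimmed to the keys the helper reads.

```python
def collect_highlights(response, field):
    """Return (doc_id, fragments) for each hit that has highlights on field."""
    results = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field)
        if fragments:
            results.append((hit["_id"], fragments))
    return results


# Made-up, trimmed response in the shape Elasticsearch returns
sample = {
    "hits": {
        "hits": [
            {"_id": "a1", "highlight": {"title": ["<tag1>中</tag1><tag1>国</tag1>银行"]}},
            {"_id": "b2"},  # a hit without highlights
        ]
    }
}
print(collect_highlights(sample, "title"))
```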
    
    Wisdom cries aloud in the street; she raises her voice in the public squares.
  • Original article: https://www.cnblogs.com/fengyubo/p/14827681.html