zoukankan      html  css  js  c++  java
  • Elasticsearch(全文搜索)

    前言

    收集大量的日志信息之后,把这些日志存放在哪里?才能对其日志内容进行搜素呢?MySQL?

    如果MySQL里存储了1000W条这样的数据,每条记录的details字段有128个字。

    用户想要查询details字段包含“ajax”这个关键词的记录。

    MySQL执行

    select * from logtable where details like "%ajax%";

    每次执行这条SQL语句,都需要逐一查询logtable中每条记录,最头痛的是找到这条记录之后,每次还要对这条记录中details字段里的文本内容进行全文扫描。

    判断这个当前记录中的details字段是的内容否包含 “ajax”?有可能会查询 10000w*128次.

    如果用户想搜素 “ajax”拼错了拼成了“ajxa”,这个sql无法搜素到用户想要的信息。因为不支持尝试把用户输入的错别字"ajxa"拆分开使用‘a‘,‘j‘,‘x‘,'a' 去尽可能多的匹配我想要的信息

    所以想要支持搜素details字段的Text内容的情况下,把海量的日志信息存在MySQL中是不太合理的。

    Elasticsearch简介

    1.倒排索引

    倒排索引是一种索引数据结构:从文本数据内容中提取出不重复的单词进行分词,每1个单词对应1个ID对单词进行区分,还对应1个该单词在那些文档中出现的列表 把这些信息组建成索引。

    倒排索引还记录了该单词在文档中出现位置、频率(次数/TF)用于快速定位文档和对搜素结果进行排序。

    (出现在文档1,<11位置>频率1次)
    (1,<11>,1),(2,<7>,1),(3,<3,9>,2)

    2.全文检索

    全文检索:把用户输入的关键词也进行分词,利用倒排索引,快速锁定关键词出现在那些文档。

    说白了就是根据value查询key(根据文档中内容关键字,找到该该关键字所在的文档的)而非根据key查询value。

    3.Lucene

    Lucene是apache软件基金会4 jakarta项目组的一个java子项目,是一个开放源代码的全文检索引擎JAR包。帮助我我们实现了以上的需求。

    lucene实现倒排索引之后,那么海量的数据如何分布式存储?如何高可用?集群节点之间如何管理?这是Elasticsearch实现的功能。

    常说的ELK是Elasticsearch(全文搜素)+Logstash(内容收集)+Kibana(内容展示)三大开源框架首字母大写简称。

    本文主要简单的介绍Elaticsearch,Elasticsearch是一个基于Lucene的分布式、高性能、可伸缩的搜素和分析系统,它提供了RESTful web API。

    Elasticsearch安装 

    官网下载

    ES的版本和Kibana的版本必须一致,官网下载比较慢,还好有好心人

    系统配置

    vm.max_map_count = 655360
    vim /etc/sysctl.conf

    权限
     chown -R elsearch:elsearch /data/elastic-search
    

     安全

    /etc/security/limits.conf
    [root@zhanggen config]# java -version
    openjdk version "1.8.0_102"
    OpenJDK Runtime Environment (build 1.8.0_102-b14)
    OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
    
    [root@zhanggen config]# cat /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 131072
    * soft nproc 4096
    * hard nproc 4096
    

      

    es配置文件

    # ======================== Elasticsearch Configuration =========================
    #
    # NOTE: Elasticsearch comes with reasonable defaults for most settings.
    #       Before you set out to tweak and tune the configuration, make sure you
    #       understand what are you trying to accomplish and the consequences.
    #
    # The primary way of configuring a node is via this file. This template lists
    # the most important settings you may want to configure for a production cluster.
    #
    # Please consult the documentation for further information on configuration options:
    # https://www.elastic.co/guide/en/elasticsearch/reference/index.html
    #
    # ---------------------------------- Cluster -----------------------------------
    #
    # Use a descriptive name for your cluster:
    #
    cluster.name: my-application
    #
    # ------------------------------------ Node ------------------------------------
    #
    # Use a descriptive name for the node:
    #
    node.name: node-1
    #
    # Add custom attributes to the node:
    #
    #node.attr.rack: r1
    #
    # ----------------------------------- Paths ------------------------------------
    #
    # Path to directory where to store the data (separate multiple locations by comma):
    #
    path.data: /data/elastic-search/data/
    #
    # Path to log files:
    #
    path.logs: /data/elastic-search/log/
    #
    #
    # ----------------------------------- Memory -----------------------------------
    #
    # Lock the memory on startup:
    #
    #bootstrap.memory_lock: false
    
    #
    # Make sure that the heap size is set to about half the memory available
    # on the system and that the owner of the process is allowed to use this
    # limit.
    #
    # Elasticsearch performs poorly when the system is swapping the memory.
    #
    # ---------------------------------- Network -----------------------------------
    #
    # Set the bind address to a specific IP (IPv4 or IPv6):
    #
    network.host: 0.0.0.0
    #
    # Set a custom port for HTTP:
    #
    http.port: 9200
    #
    # For more information, consult the network module documentation.
    #
    # --------------------------------- Discovery ----------------------------------
    #
    # Pass an initial list of hosts to perform discovery when this node is started:
    # The default list of hosts is ["127.0.0.1", "[::1]"]
    #
    #discovery.seed_hosts: ["host1", "host2"]
    #
    # Bootstrap the cluster using an initial set of master-eligible nodes:
    #
    cluster.initial_master_nodes: ["node-1"]
    #
    # For more information, consult the discovery and cluster formation module documentation.
    #
    # ---------------------------------- Gateway -----------------------------------
    #
    # Block initial recovery after a full cluster restart until N nodes are started:
    #
    #gateway.recover_after_nodes: 3
    #
    # For more information, consult the gateway module documentation.
    #
    # ---------------------------------- Various -----------------------------------
    #
    # Require explicit names when deleting indices:
    #
    #action.destructive_requires_name: true
    /elasticsearch-7.3.2/config/elasticsearch.yml

    启动

    [elsearch@zhanggen /]$ ./elasticsearch-7.3.2/bin/elasticsearch

    访问

    Elasticsearch使用

    关于Elasticsearch的使用都是基于RESTful风格的API进行的

    1.查看健康状态

    http://192.168.56.135:9200/_cat/health?v

    epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
    1590655194 08:39:54  my-application green        

    2.创建索引

    {
        "acknowledged": true,
        "shards_acknowledged": true,
        "index": "web"
    }

    3.删除索引

    {
        "acknowledged": true
    }

    4.插入数据

     request body

    {
        "name":"张根",
        "age":22,
        "marrid":"false"
    }

    response body

    {
        "_index": "students",
        "_type": "go",
        "_id": "cuWEWnIBWnQK6MVivzvO",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1
    }

    ps:也可以使用PUT方法,但是需要传入id

     reqeust body

    {
        "name":"李渊",
        "age":1402,
        "marrid":"true"
    }

    response body

    {
        "_index": "students",
        "_type": "go",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1
    }

    5.查询all

    [root@zhanggen zhanggen]#  curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all":{}}}'
    {
      "took" : 211,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 1.0,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "deWQWnIBWnQK6MViazuu",
            "_score" : 1.0,
            "_source" : {
              "name" : "姜尚",
              "age" : 5903,
              "marrid" : "fale"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "duWSWnIBWnQK6MViXDtD",
            "_score" : 1.0,
            "_source" : {
              "name" : "孛儿只斤.铁木真",
              "age" : 814,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "query" : {
                "match" : {
                  "name" : "张根"
                }
              }
            }
          }
        ]
      }
    }
    返回

    6.分页查询 (from, size)

    from 偏移,默认为0,size 返回的结果数,默认为10

    [root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{
    "query": { "match_all": {} },
    "from":1,
    "size":2
    }'

    返回

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    View Code

     7.模糊查询字段中包含某些关键词

    [root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": {"term": {"name":""}}}'

    返回

    {
      "took" : 155,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 0.8161564,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 0.8161564,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 0.7083998,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    View Code

    8.range范围查找

    范围查询接收以下参数:

    • gte:大于等于
    • gt:大于
    • lte:小于等于
    • lt:小于
    • boost:设置查询的推动值(boost),默认为1.0
    curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query":{"range":{"age":{"gt":"18"}}}}'
    {
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 1.0,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "deWQWnIBWnQK6MViazuu",
            "_score" : 1.0,
            "_source" : {
              "name" : "姜尚",
              "age" : 5903,
              "marrid" : "fale"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "duWSWnIBWnQK6MViXDtD",
            "_score" : 1.0,
            "_source" : {
              "name" : "孛儿只斤.铁木真",
              "age" : 814,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    返回

    安装Kibana

    kibana是针对elasticsearch操作及数据展示的工具,支持中文。安装时请确保kibama和Elasticsearch的版本一致。

    配置文件

    [root@zhanggen config]# cat ./kibana.yml|grep -Ev '^$|#'
    server.port: 5601
    server.host: "0.0.0.0"
    elasticsearch.hosts: ["http://localhost:9200"]
    elasticsearch.username: "kibana"
    elasticsearch.password: "xxxxxxx"
    i18n.locale: "zh-CN"
    [root@zhanggen config]# 

    启动

    [root@zhanggen bin]# ./kibana  --allow-root

    kibana使用

    管理-----》kibana索引模式-----》创建索引模式 

    ps:

    Elasticsearch果然是全文检索, 666。果然对用户输入的搜素关键词,进行了分词。我输入了(访问日志为例)把包含“问”的文档也搜素出来了!

    而不是仅仅搜素内容包含“访问日志为例”这个1个词的文档。

    Go操作Elasticsearch

    我们使用第三方库https://github.com/olivere/elastic来连接ES并进行操作。

    注意下载与你的ES相同版本的client,例如我们这里使用的ES是7.2.1的版本,那么我们下载的client也要与之对应为github.com/olivere/elastic/v7

    使用go.mod来管理依赖下载指定版本的第三库:

    module go相关模块/elasticsearch
    
    go 1.13
    
    require github.com/olivere/elastic/v7 v7.0.4
    

      

    代码 

    package main
    
    
    import (
    "context"
    "fmt"
    
    "github.com/olivere/elastic/v7"
    )
    
    // Elasticsearch demo
    
    type Person struct {
    	Name    string `json:"name"`
    	Age     int    `json:"age"`
    	Married bool   `json:"married"`
    }
    
    func main() {
    	client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
    	if err != nil {
    		// Handle error
    		panic(err)
    	}
    
    	fmt.Println("connect to es success")
    	p1 := Person{Name: "曹操", Age: 155, Married: true}
    	put1, err := client.Index().
    		Index("students").Type("go").
    		BodyJson(p1).
    		Do(context.Background())
    	if err != nil {
    		// Handle error
    		panic(err)
    	}
    	fmt.Printf("Indexed user %s to index %s, type %s
    ", put1.Id, put1.Index, put1.Type)
    }
    

    kafka消息---->elasticsearch支持消息检索

    参考

  • 相关阅读:
    《PostgreSQL 指南:内幕探索》之基础备份与时间点恢复(下)
    数据和云,半年文章精选(文末赠书)
    js for 循环跳过undefined值
    js for循环,会遇到undefined
    洛谷P3018 [USACO11MAR]树装饰Tree Decoration
    洛谷P3018 [USACO11MAR]树装饰Tree Decoration
    洛谷P3018 [USACO11MAR]树装饰Tree Decoration
    洛谷P3018 [USACO11MAR]树装饰Tree Decoration
    js特效:鼠标滑过图片时切换为动图
    js特效:鼠标滑过图片时切换为动图
  • 原文地址:https://www.cnblogs.com/sss4/p/12964422.html
Copyright © 2011-2022 走看看