  • ElasticSearch: A Quick Introduction

    1. Preface

    ElasticSearch is a distributed, scalable, real-time search and data analytics engine. It is built on top of Apache Lucene, arguably the most advanced, high-performance, full-featured search engine library available today, whether open source or proprietary. ElasticSearch packages all of that functionality into a single standalone service that you communicate with through a simple RESTful API, so you can use whatever programming language you like as the client.
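
    Because everything is exposed over HTTP, even plain curl is enough to act as a client. A minimal sketch, assuming a local node on the default port 9200 with security disabled and a hypothetical "books" index:

    # Node banner
    curl -XGET 'http://localhost:9200/?pretty'

    # A simple full-text search against the hypothetical "books" index
    curl -XGET 'http://localhost:9200/books/_search?pretty' -H 'Content-Type: application/json' -d '
    {
      "query": { "match": { "title": "elasticsearch" } }
    }'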

    2. Use Cases
     
    eBay runs hundreds of ElasticSearch clusters internally, totalling more than 4,000 data nodes. In eBay's production environment these clusters back services across many domains, including order search, product recommendation, centralized log management, risk control, IT operations, and security monitoring.
     
    Example scenarios:
    • When you search on GitHub, ElasticSearch not only finds the relevant repositories, it also powers code-level search and highlighting.
    • When you shop online, ElasticSearch can recommend related products.
    • When you take a taxi home after work, ElasticSearch can locate nearby passengers and drivers and help the platform optimize dispatching.
    • Wikipedia uses ElasticSearch for full-text search with highlighted snippets.
    Beyond search, the Elastic Stack (together with Kibana, Logstash, and Beats) is widely used for near-real-time big-data analytics, including log analysis, metrics monitoring, and information security. It lets you explore huge volumes of structured and unstructured data, build visual reports on demand, set alert thresholds on monitoring data, and even detect anomalies automatically with machine learning.

    3. Single-Instance Installation

    Installation packages:

    elasticsearch-7.10.2-linux-x86_64.tar.gz
    elasticsearch-analysis-ik-7.10.2.zip
    elasticsearch-analysis-pinyin-7.10.2.zip
    kibana-7.10.2-linux-x86_64.tar.gz

    Host kernel parameters (/etc/sysctl.conf):

    # sysctl settings are defined through files in
    # /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
    #
    # Vendors settings live in /usr/lib/sysctl.d/.
    # To override a whole file, create a new file with the same name in
    # /etc/sysctl.d/ and put new settings there. To override
    # only specific settings, add a file with a lexically later
    # name in /etc/sysctl.d/ and put new settings there.
    #
    # For more information, see sysctl.conf(5) and sysctl.d(5).
    net.ipv4.tcp_tw_reuse = 0
    net.ipv4.tcp_tw_recycle = 0
    net.ipv4.tcp_fin_timeout = 5
    net.ipv4.tcp_keepalive_time = 15
    net.ipv4.ip_local_port_range = 21000 61000
    fs.file-max = 6553600
    kernel.sem = 250 32000 100 128
    net.ipv4.conf.all.accept_redirects = 0
    net.core.somaxconn = 32768
    vm.max_map_count = 524288

    Apply the settings: sysctl -p
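
    To spot-check the values ElasticSearch cares about most (vm.max_map_count in particular), for example:

    sysctl vm.max_map_count
    sysctl fs.file-max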

    User limits (/etc/security/limits.conf):

    *  soft  nofile   1048576
    *  hard  nofile   1048576
    *  soft  nproc    65536
    *  hard  nproc    65536
    *  soft  memlock  unlimited
    *  hard  memlock  unlimited
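
    These limits only apply to sessions opened after the change. After logging in again as the es user, a quick check (ulimit: -n open files, -u processes, -l locked memory):

    ulimit -n   # expect 1048576
    ulimit -u   # expect 65536
    ulimit -l   # expect unlimited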

    Directory layout:

    .
    |-- bin
    |   |-- schema
    |   |-- start-es.sh
    |   |-- start-kibana.sh
    |   |-- stop-es.sh
    |   `-- sync
    |-- data -> /data/es-data
    |-- etc
    |-- lib
    |   |-- ojdbc8-19.8.0.0.jar
    |   `-- orai18n-19.8.0.0.jar
    |-- logs
    |-- sbin
    `-- support
        |-- elasticsearch-7.10.2
        |-- es -> elasticsearch-7.10.2
        |-- kibana -> kibana-7.10.2-linux-x86_64
        |-- kibana-7.10.2-linux-x86_64
        |-- logstash -> logstash-7.10.2
        `-- logstash-7.10.2

    .bash_profile settings:

    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
        . ~/.bashrc
    fi
    
    # +-------------------------------------+
    # |      AI'S PROFILE, DON'T MODIFY!    |
    # +-------------------------------------+
    alias grep='grep --colour=auto'
    alias vi='vim'
    alias ll='ls -l'
    alias ls='ls --color=auto'
    alias mv='mv -i'
    alias rm='rm -i'
    alias ups='ps -u `whoami` -f'
    
    export ES_HOME=${HOME}/support/es
    export JAVA_HOME=${ES_HOME}/jdk
    export PS1="\[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] "
    export TERM=linux
    export EDITOR=vim
    export PATH=${HOME}/bin:${HOME}/sbin:${JAVA_HOME}/bin:${ES_HOME}/bin:${HOME}/support/logstash/bin:$PATH
    export LANG=zh_CN.utf8
    export TMOUT=3000
    export HISTSIZE=1000

    Adjust the JVM heap for your environment: ~/support/es/config/jvm.options

    -Xms16g
    -Xmx16g
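
    As the comments in elasticsearch.yml below also note, the heap should be no more than about half of the machine's RAM. Once the node is running and the elastic password is set (see below), the effective heap can be spot-checked via the _cat API, for example:

    curl --user elastic:123456 'http://10.230.55.48:9200/_cat/nodes?v&h=name,heap.max,ram.max'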

    Set the basic configuration for your environment: ~/support/es/config/elasticsearch.yml

    # ======================== Elasticsearch Configuration =========================
    #
    # NOTE: Elasticsearch comes with reasonable defaults for most settings.
    #       Before you set out to tweak and tune the configuration, make sure you
    #       understand what are you trying to accomplish and the consequences.
    #
    # The primary way of configuring a node is via this file. This template lists
    # the most important settings you may want to configure for a production cluster.
    #
    # Please consult the documentation for further information on configuration options:
    # https://www.elastic.co/guide/en/elasticsearch/reference/index.html
    #
    # ---------------------------------- Cluster -----------------------------------
    #
    # Use a descriptive name for your cluster:
    #
    cluster.name: crm
    #
    # ------------------------------------ Node ------------------------------------
    #
    # Use a descriptive name for the node:
    #
    node.name: node-1
    #
    # Add custom attributes to the node:
    #
    node.attr.rack: r1
    #
    # ----------------------------------- Paths ------------------------------------
    #
    # Path to directory where to store the data (separate multiple locations by comma):
    #
    path.data: /home/es/data
    #
    # Path to log files:
    #
    path.logs: /home/es/logs
    #
    # ----------------------------------- Memory -----------------------------------
    #
    # Lock the memory on startup:
    #
    #bootstrap.memory_lock: true
    #
    # Make sure that the heap size is set to about half the memory available
    # on the system and that the owner of the process is allowed to use this
    # limit.
    #
    # Elasticsearch performs poorly when the system is swapping the memory.
    #
    # ---------------------------------- Network -----------------------------------
    #
    # Set the bind address to a specific IP (IPv4 or IPv6):
    #
    network.host: 10.230.55.48
    #
    # Set a custom port for HTTP:
    #
    http.port: 9200
    #
    # For more information, consult the network module documentation.
    #
    # --------------------------------- Discovery ----------------------------------
    #
    # Pass an initial list of hosts to perform discovery when this node is started:
    # The default list of hosts is ["127.0.0.1", "[::1]"]
    #
    discovery.seed_hosts: ["10.230.55.48"]
    #
    # Bootstrap the cluster using an initial set of master-eligible nodes:
    #
    cluster.initial_master_nodes: ["10.230.55.48"]
    #
    # For more information, consult the discovery and cluster formation module documentation.
    #
    # ---------------------------------- Gateway -----------------------------------
    #
    # Block initial recovery after a full cluster restart until N nodes are started:
    #
    #gateway.recover_after_nodes: 1
    #
    # For more information, consult the gateway module documentation.
    #
    # ---------------------------------- Various -----------------------------------
    #
    # Require explicit names when deleting indices:
    #
    #action.destructive_requires_name: true
    
    # Security / authentication settings:
    http.cors.enabled: true
    http.cors.allow-origin: "*"
    http.cors.allow-headers: Authorization
    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
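
    With xpack.security.enabled and transport SSL turned on, Elasticsearch 7.x also expects a transport certificate, which the snippet above does not show. One common way to generate and reference one (the file names below are the tool's defaults, placed in the config directory; they are illustrative, not taken from the original setup):

    # Generate a CA and a node certificate with the bundled tool
    ~/support/es/bin/elasticsearch-certutil ca
    ~/support/es/bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12

    # Then point elasticsearch.yml at the resulting keystore, e.g.:
    # xpack.security.transport.ssl.verification_mode: certificate
    # xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
    # xpack.security.transport.ssl.truststore.path: elastic-certificates.p12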

    Startup script (~/bin/start-es.sh):

    #!/bin/sh
    
    cd ~/support/es/bin
    ./elasticsearch -d
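
    The directory layout above also lists ~/bin/stop-es.sh, which is not shown here. One possible sketch is to have the start script record a PID file (-p is a standard elasticsearch flag) and have the stop script kill that PID; the pid-file location below is an assumption:

    #!/bin/sh
    # stop-es.sh: stop the daemon started with "./elasticsearch -d -p ${ES_HOME}/es.pid"

    PID_FILE="${ES_HOME:-$HOME/support/es}/es.pid"
    if [ -f "${PID_FILE}" ]; then
        kill "$(cat "${PID_FILE}")"
    else
        echo "PID file ${PID_FILE} not found" >&2
        exit 1
    fi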

    Set the built-in user passwords:

    ~/support/es/bin/elasticsearch-setup-passwords interactive

    You will be prompted to set passwords for the built-in users elastic, apm_system, kibana, kibana_system, logstash_system, beats_system, and remote_monitoring_user; once these are set, this step is done.
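
    If one of these passwords needs to change later, the security API can do it without rerunning the interactive tool (the new password below is a placeholder):

    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/_security/user/kibana_system/_password' -H 'Content-Type: application/json' -d '
    {
      "password": "new-password-here"
    }'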

    Verify:

    es@centos01 ~/bin $ curl --user elastic:123456 -XGET http://10.230.55.48:9200?pretty=true
    {
      "name" : "node-1",
      "cluster_name" : "crm",
      "cluster_uuid" : "1SAd8U-zRyGKy8ztRWAQhQ",
      "version" : {
        "number" : "7.10.2",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
        "build_date" : "2021-01-13T00:42:12.435326Z",
        "build_snapshot" : false,
        "lucene_version" : "8.7.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }

    Kibana installation:

    Directory: ~/support/kibana

    Edit the configuration: ~/support/kibana/config/kibana.yml

    server.port: 5601
    server.host: "10.230.55.48"
    elasticsearch.hosts: ["http://10.230.55.48:9200"]
    elasticsearch.username: "elastic"
    elasticsearch.password: "123456"
    i18n.locale: "en"
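
    The directory layout also lists ~/bin/start-kibana.sh. Kibana has no built-in daemon mode, so a minimal sketch is to background it with nohup (the log path is an assumption):

    #!/bin/sh

    cd ~/support/kibana/bin
    nohup ./kibana > ~/logs/kibana.out 2>&1 &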

    Dev Tools:

    # Show Elastic version information
    GET /
    
    # Cluster health
    GET _cluster/health
    
    # Cluster nodes
    GET _cat/nodes
    
    # Shard allocation
    GET _cat/shards
    
    # List indices
    GET _cat/indices
    
    # Document count of an index
    GET sec_function/_count
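
    The same checks can also be run from the shell with curl, for example:

    curl --user elastic:123456 'http://10.230.55.48:9200/_cluster/health?pretty'
    curl --user elastic:123456 'http://10.230.55.48:9200/_cat/nodes?v'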

    4. Indices

    List all indices on the current node:

    es@centos01 ~ $ curl --user elastic:123456 -XGET http://10.230.55.48:9200/_cat/indices?v
    health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
    green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
    green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb

    Create and delete an index:

    es@centos01 ~ $ curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/weather'
    {"acknowledged":true,"shards_acknowledged":true,"index":"weather"}
    
    es@centos01 ~ $ curl -uelastic -XGET http://10.230.55.48:9200/_cat/indices?v
    health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
    green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
    green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
    green  open   weather            vIVMeX22SReCpKGD0Pk5uw   5   1          0            0      2.2kb          1.1kb
    
    es@centos01 ~ $ curl -uelastic -XDELETE 'http://10.230.55.48:9200/weather'
    {"acknowledged":true}

    5. Chinese Word Segmentation

    Unzip elasticsearch-analysis-ik-7.10.2.zip and elasticsearch-analysis-pinyin-7.10.2.zip into the ~/support/es/plugins directory and restart ES (a quick verification example follows the directory listing below).

    es@centos01 ~/support $ tree ~/support/es/plugins/
    /home/es/support/es/plugins/
    |-- ik
    |   |-- commons-codec-1.9.jar
    |   |-- commons-logging-1.2.jar
    |   |-- config
    |   |   |-- extra_main.dic
    |   |   |-- extra_single_word.dic
    |   |   |-- extra_single_word_full.dic
    |   |   |-- extra_single_word_low_freq.dic
    |   |   |-- extra_stopword.dic
    |   |   |-- IKAnalyzer.cfg.xml
    |   |   |-- main.dic
    |   |   |-- preposition.dic
    |   |   |-- quantifier.dic
    |   |   |-- stopword.dic
    |   |   |-- suffix.dic
    |   |   `-- surname.dic
    |   |-- elasticsearch-analysis-ik-7.10.2.jar
    |   |-- httpclient-4.5.2.jar
    |   |-- httpcore-4.4.4.jar
    |   |-- plugin-descriptor.properties
    |   `-- plugin-security.policy
    `-- pinyin
        |-- elasticsearch-analysis-pinyin-7.10.2.jar
        |-- nlp-lang-1.7.jar
        `-- plugin-descriptor.properties
    
    3 directories, 22 files
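
    To verify the plugins after the restart, _cat/plugins lists what was loaded, and the _analyze API can exercise the analyzers they register (ik_max_word and ik_smart from the ik plugin, pinyin from the pinyin plugin). A quick sketch:

    curl --user elastic:123456 'http://10.230.55.48:9200/_cat/plugins?v'

    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/_analyze?pretty' -H 'Content-Type: application/json' -d '
    {
      "analyzer": "ik_max_word",
      "text": "中华人民共和国国歌"
    }'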

    6. Data Operations

    Prepare a new index:

    curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/student' -H 'Content-Type: application/json' -d '
    {
      "mappings" : {
        "properties" : {
          "name" : {
            "type" : "keyword"
          },
          "age" : {
            "type" : "integer"
          }
        }
      },
      "settings" : {
        "index" : {
          "number_of_shards" : 1,
          "number_of_replicas" : 0
        }
      }
    }'
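
    To confirm that the mapping was applied as intended, it can be read back:

    curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_mapping?pretty'
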
    Insert documents (using POST):

    Example 1: adding data. (POST writes the document; if it does not exist, it is created.)
    # Request. When no _id is specified, Elastic generates a random string as the _id.
    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "name": "张三"
    }'
    
    # Response
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "q6ek7XcBqu3Z6vLyxDD4",
      "_version" : 1,
      "result" : "created",
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "_seq_no" : 0,
      "_primary_term" : 1
    }

    Example 2: adding data with _id set to 2.

    # Request, specifying _id = 2
    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "name": "李四"
    }'
    
    # Response
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "result" : "created",
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "_seq_no" : 1,
      "_primary_term" : 1
    }

    A wrong way to update data:

    # Request
    curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'

    # Response
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 1,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "李四"
      }
    }

    Notice that the result contains no age field.

    # Request
    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "age": 10
    }'

    # Response
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 2,
      "result" : "updated",
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "_seq_no" : 2,
      "_primary_term" : 1
    }

    # Request
    curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'

    # Response
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 2,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "age" : 10
      }
    }

    The result: _version went from 1 to 2, and the name field is gone. The reason is that the POST student/_doc/2 form overwrites the document; you can think of it as deleting the old document and then indexing a new one.

    Updating a document with _update

    es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "name": "李四"
    }'
    
    es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2/_update?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "doc": {
        "age": 10
      }
    }'
    
    
    # Request
    es@centos01 ~ $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'                                                  
    {
      "_index" : "student",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 4,
      "_seq_no" : 4,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "李四",
        "age" : 10
      }
    }

    When _update is used, ES does the following (a scripted-update sketch follows this list):

    • Retrieve the JSON of the old document
    • Apply the change to that JSON
    • Delete the old document
    • Index a new document
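
    Besides the "doc" form above, the same API also accepts a script that modifies the retrieved source before it is re-indexed. A small sketch, using the ES 7 endpoint form POST <index>/_update/<id> (equivalent to the older .../_doc/2/_update path used above):

    curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_update/2?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "script": {
        "source": "ctx._source.age += params.n",
        "params": { "n": 1 }
      }
    }'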

  • Original article: https://www.cnblogs.com/steven-note/p/14463634.html