  • Getting Started with Elasticsearch (4): Indexes, Installing the IK Analyzer, and Adding, Deleting, Updating and Querying Data

    I. Creating an index

    Create an index named stu:
    curl -X PUT 'localhost:9200/stu'

    {"acknowledged":true,"shards_acknowledged":true,"index":"stu"}
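The same call can be made from a script. This is a minimal Python sketch using only the standard library. It builds the PUT /stu request without sending it, so the URL and method can be checked first. create_index_request is a hypothetical helper, not part of any Elasticsearch client. Sending the request a second time fails with resource_already_exists_exception.

```python
import json
import urllib.request


def create_index_request(base_url, index, body=None):
    """Hypothetical helper: build (but do not send) a PUT /<index> request."""
    data = None
    if body is not None:
        # Optional settings/mappings body, sent as JSON
        data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/{index}", data=data, method="PUT")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    req = create_index_request("http://localhost:9200", "stu")
    print(req.get_method(), req.full_url)  # PUT http://localhost:9200/stu
    # Against a running node, send it with:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```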
    

    II. Viewing indexes

    Open http://192.168.56.101:9200/_cat/indices?v (replace the IP address with your own server's).

    pri: number of primary shards; rep: number of replicas per primary shard
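The output of _cat/indices?v is a whitespace-aligned table with a header row. To read it from a script, one minimal Python sketch splits the header and zips it with each data row. The function name is hypothetical. It assumes every cell is non-empty, which is not always true because a closed index can leave blank columns. For scripting, requesting _cat/indices?format=json is the sturdier option.

```python
def parse_cat_table(text):
    """Turn `_cat/... ?v` text output into a list of dicts keyed by the header row.

    Assumes no empty cells; values are returned as strings.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


if __name__ == "__main__":
    import urllib.request
    # Hypothetical host; replace with your own node's address
    with urllib.request.urlopen("http://localhost:9200/_cat/indices?v") as resp:
        for row in parse_cat_table(resp.read().decode("utf-8")):
            print(row["index"], "pri =", row["pri"], "rep =", row["rep"])
```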

    III. Deleting an index

    curl -X DELETE 'localhost:9200/stu'

    {"acknowledged":true} 
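Deleting an index that does not exist returns HTTP 404 (index_not_found_exception). Here is a hedged Python sketch of a delete that treats 404 as "already gone", so that cleanup scripts can be rerun safely. delete_index and its opener parameter are hypothetical. opener defaults to urllib.request.urlopen, which raises HTTPError for 4xx responses.

```python
import urllib.error
import urllib.request


def delete_index(base_url, index, opener=urllib.request.urlopen):
    """Hypothetical helper: DELETE /<index>; True if deleted, False if it was missing."""
    req = urllib.request.Request(f"{base_url}/{index}", method="DELETE")
    try:
        with opener(req) as resp:
            resp.read()  # e.g. {"acknowledged":true}
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # index_not_found_exception: nothing to delete
            return False
        raise  # any other error is a real problem


if __name__ == "__main__":
    print(delete_index("http://localhost:9200", "stu"))
```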
    

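    IV. Installing the IK 6.2.2 analyzer. The IK version must match the Elasticsearch version exactly; elasticsearch-plugin refuses to install a plugin built for a different Elasticsearch version.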

    cd /
    /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.2/elasticsearch-analysis-ik-6.2.2.zip
    Restart Elasticsearch:
    systemctl restart elasticsearch
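The plugin version has to track the Elasticsearch version, so the download URL can be derived from the version string. This Python sketch assumes that other 6.x releases follow the same URL pattern as the 6.2.2 link above; check the IK releases page before relying on it. ik_plugin_url is a hypothetical helper. The running Elasticsearch version is reported as version.number by GET /.

```python
# URL pattern copied from the 6.2.2 install command above (assumed stable across 6.x)
IK_URL_TEMPLATE = ("https://github.com/medcl/elasticsearch-analysis-ik/releases/"
                   "download/v{v}/elasticsearch-analysis-ik-{v}.zip")


def ik_plugin_url(es_version):
    """Hypothetical helper: IK plugin zip URL for an x.y.z Elasticsearch version."""
    parts = es_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like 6.2.2, got {es_version!r}")
    return IK_URL_TEMPLATE.format(v=es_version)


if __name__ == "__main__":
    # Print the full install command for a given version
    print("/usr/share/elasticsearch/bin/elasticsearch-plugin install " + ik_plugin_url("6.2.2"))
```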
    Test whether the IK Chinese analyzer was installed successfully:
    curl -XGET -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty' -d '{ "analyzer" : "ik_max_word", "text": "中华人民共和国国歌" }'
    The response JSON:

    {
      "tokens" : [
        { "token" : "中华人民共和国",  "start_offset" : 0,  "end_offset" : 7,  "type" : "CN_WORD", "position" : 0  },
        { "token" : "中华人民",  "start_offset" : 0,  "end_offset" : 4, "type" : "CN_WORD",  "position" : 1  },
        { "token" : "中华",  "start_offset" : 0,  "end_offset" : 2,  "type" : "CN_WORD",  "position" : 2 },
        { "token" : "华人", "start_offset" : 1,  "end_offset" : 3, "type" : "CN_WORD", "position" : 3 },
        { "token" : "人民共和国",  "start_offset" : 2,  "end_offset" : 7,  "type" : "CN_WORD", "position" : 4 },
        { "token" : "人民",  "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD",  "position" : 5 },
        { "token" : "共和国",  "start_offset" : 4,  "end_offset" : 7, "type" : "CN_WORD",  "position" : 6 },
        { "token" : "共和", "start_offset" : 4,  "end_offset" : 6, "type" : "CN_WORD",  "position" : 7  },
        { "token" : "国",  "start_offset" : 6, "end_offset" : 7,  "type" : "CN_CHAR",  "position" : 8 },
        { "token" : "国歌",  "start_offset" : 7,  "end_offset" : 9,  "type" : "CN_WORD", "position" : 9 }
      ]
    }
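The start_offset and end_offset values in this response are character positions in the input, so they are easy to sanity-check. This small Python sketch (the function name is hypothetical) confirms that each token equals text[start_offset:end_offset]. It also makes visible that ik_max_word emits overlapping tokens rather than a single segmentation. Python slices by code point, while Elasticsearch offsets count UTF-16 units. The two agree here only because every character is in the Basic Multilingual Plane.

```python
def offset_mismatches(text, analyze_response):
    """Return tokens whose [start_offset, end_offset) slice of text differs from the token."""
    bad = []
    for t in analyze_response["tokens"]:
        if text[t["start_offset"]:t["end_offset"]] != t["token"]:
            bad.append(t["token"])
    return bad


TEXT = "中华人民共和国国歌"
# (token, start_offset, end_offset) triples copied from the response above
TRIPLES = [("中华人民共和国", 0, 7), ("中华人民", 0, 4), ("中华", 0, 2), ("华人", 1, 3),
           ("人民共和国", 2, 7), ("人民", 2, 4), ("共和国", 4, 7), ("共和", 4, 6),
           ("国", 6, 7), ("国歌", 7, 9)]
response = {"tokens": [{"token": t, "start_offset": s, "end_offset": e} for t, s, e in TRIPLES]}

if __name__ == "__main__":
    print(offset_mismatches(TEXT, response))  # prints []
```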
    

    V. Defining the index mapping

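    Suppose our data has the structure below; it covers a fairly wide range of field types. The mapping is supplied when the index is created, so if the stu index created earlier still exists, delete it first.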

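    stu: the index name
    person: the type name
    analyzer: the analyzer applied to the field's text at index time; defaults to the standard analyzer
    search_analyzer: the analyzer applied to query text at search time; defaults to the same value as analyzer
    ik_max_word: the IK Chinese analyzer, which splits text at the finest granularity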

    curl -XPUT -H 'Content-Type: application/json' 'http://127.0.0.1:9200/stu' -d '
    {
      "mappings": {
        "person": {
          "dynamic":true,
          "dynamic_date_formats":["yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd" ],
          "properties": {
            "id": { "type": "integer",  "store":true  },
            "name": { "type": "text", "store": true },
            "cnname": { "type": "text",  "analyzer": "ik_max_word", "search_analyzer": "ik_max_word", "store": true },
            "age": { "type": "integer"  },
            "score": { "type": "float"  },
            "email": { "type": "text", "store":true },
            "birthday": {  "type": "date", "format":"yyyy-MM-dd","store":true },
            "regdate": { "type": "date", "format":"yyyy-MM-dd HH:mm:ss||strict_date_optional_time","store":true  },
            "city": { "type": "keyword", "store":true },
            "address": { "type": "text", "analyzer": "ik_max_word" }
          }
        }
      }
    }'
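With "dynamic": true, a document field whose name does not match the mapping (for example cname vs cnname) is not rejected. It is silently mapped with default settings, so it never gets the IK analyzer. A pre-flight check in Python can catch this before indexing. This is a sketch: undeclared_fields is hypothetical, and the dictionary below lists only field names and types, assuming the Chinese-name field is declared as cnname, the name that the documents in the next section use.

```python
def undeclared_fields(mapping, type_name, doc):
    """Hypothetical helper: document keys not declared under the type's properties."""
    declared = mapping["mappings"][type_name]["properties"]
    return sorted(set(doc) - set(declared))


# Field names and types from the person mapping (analyzers and other settings omitted)
MAPPING = {"mappings": {"person": {"properties": {
    "id": {"type": "integer"}, "name": {"type": "text"}, "cnname": {"type": "text"},
    "age": {"type": "integer"}, "score": {"type": "float"}, "email": {"type": "text"},
    "birthday": {"type": "date"}, "regdate": {"type": "date"},
    "city": {"type": "keyword"}, "address": {"type": "text"},
}}}}

if __name__ == "__main__":
    doc = {"id": "11", "name": "zhang san", "cnname": "张三", "age": 20, "score": 80.8,
           "email": "zhang.san@163.com", "birthday": "2000-03-03",
           "regdate": "2018-03-03T15:33:33Z", "city": "PEK",
           "address": "上海市闸北区保德路389号"}
    print(undeclared_fields(MAPPING, "person", doc))  # prints []
```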
    

    VI. Adding data

    curl -XPOST -H 'Content-Type: application/json' '127.0.0.1:9200/stu/person' -d '
    {
      "id": "11",
      "name": "zhang san",
      "cnname": "张三",
      "age": 20,
      "score":80.8,
      "email":"zhang.san@163.com",
      "birthday":"2000-03-03",
      "regdate":"2018-03-03T15:33:33Z",
      "city":"PEK",
      "address":"上海市闸北区保德路389号"
    }'
    

    If the URL stops at the type ('127.0.0.1:9200/stu/person') with no ID, Elasticsearch generates one, here "_id":"aiO0EWIB1IWtAj8my_8s":

    {"_index":"stu","_type":"person","_id":"aiO0EWIB1IWtAj8my_8s","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
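The only difference between the two indexing calls in this section is whether an ID follows the type in the URL. This Python sketch builds either request with the standard library. index_doc_request is hypothetical, and it builds the request without sending it. The ID is percent-encoded so that characters such as / cannot alter the path, and ensure_ascii=False keeps Chinese values readable in the UTF-8 body.

```python
import json
import urllib.parse
import urllib.request


def index_doc_request(base_url, index, doc_type, doc, doc_id=None):
    """Hypothetical helper: POST <index>/<type>[/<id>]; without doc_id, ES assigns an _id."""
    url = f"{base_url}/{index}/{doc_type}"
    if doc_id is not None:
        url += "/" + urllib.parse.quote(str(doc_id), safe="")
    data = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST",
                                  headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    auto = index_doc_request("http://127.0.0.1:9200", "stu", "person", {"cnname": "张三"})
    fixed = index_doc_request("http://127.0.0.1:9200", "stu", "person", {"cnname": "李四"}, "abc123")
    print(auto.full_url)
    print(fixed.full_url)
```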
    

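    Note: if an ID is appended after person (for example abc123), Elasticsearch creates the document under that ID, or overwrites it if the ID already exists; the response reports result=created or result=updated accordingly.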

    curl -XPOST -H 'Content-Type: application/json' '127.0.0.1:9200/stu/person/abc123' -d '
    {
      "id": "111",
      "name": "li si",
      "cnname": "李四",
      "age": 21,
      "score":98.9,
      "email":"lisi@qq.com",
      "birthday":"2008-03-03",
      "regdate":"2019-03-03T15:33:33Z",
      "city":"SHA",
      "address":"江苏省苏州市园区现代大道188号"
    }'
    
    {"_index":"stu","_type":"person","_id":"abc123","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
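The result and _version fields are the quickest way to tell a create from an overwrite. Sending the same request again returns "result":"updated" with _version 2. This is a Python sketch (describe_index_response is hypothetical) that reads the response above and refuses to report success if any shard write failed. "total":2 with "successful":1 is normal on a single node, because the replica copy cannot be allocated on the same node as its primary.

```python
import json

# Response copied from the request above
RAW = ('{"_index":"stu","_type":"person","_id":"abc123","_version":1,"result":"created",'
       '"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}')


def describe_index_response(raw):
    """Hypothetical helper: one-line summary of an index response."""
    resp = json.loads(raw)
    failed = resp["_shards"]["failed"]
    if failed:
        raise RuntimeError(f"write failed on {failed} shard(s)")
    return f'{resp["_index"]}/{resp["_type"]}/{resp["_id"]}: {resp["result"]} (version {resp["_version"]})'


if __name__ == "__main__":
    print(describe_index_response(RAW))  # stu/person/abc123: created (version 1)
```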
    
    

    VII. Deleting data

    Delete by a specific _id:

    curl -X DELETE 'localhost:9200/stu/person/Dpts-2EBY6Wnp0_K3NkH'
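Deleting by ID is a DELETE on the same <index>/<type>/<id> path used for indexing. If the ID does not exist, the response is HTTP 404 with "result":"not_found" rather than a success. This Python sketch builds the request. delete_doc_request is hypothetical, and it percent-encodes the ID for the same reason as when indexing.

```python
import urllib.parse
import urllib.request


def delete_doc_request(base_url, index, doc_type, doc_id):
    """Hypothetical helper: build DELETE <index>/<type>/<id> without sending it."""
    # Quote the ID so characters such as "/" or "#" cannot change the URL path
    quoted = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(f"{base_url}/{index}/{doc_type}/{quoted}", method="DELETE")


if __name__ == "__main__":
    req = delete_doc_request("http://localhost:9200", "stu", "person", "Dpts-2EBY6Wnp0_K3NkH")
    print(req.get_method(), req.full_url)
```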
    
    

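    Delete by a Query DSL query (see the official Delete By Query API for the syntax). Because city is a keyword field and a term query is not analyzed, the value must match the stored value exactly, including case:

    curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/stu/person/_delete_by_query?pretty' -d '
    {
      "query": {
        "bool":{
          "filter": [
            {"term":{ "city": "PEK" }}
          ]
        }
      }
    }'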
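The body is a non-scoring bool filter with a single term clause. Because the same body is accepted by _search and _count, a cautious habit is to send it to _count first and see how many documents the delete will remove. This is a Python sketch of building that body. term_filter_query is hypothetical.

```python
import json


def term_filter_query(field, value):
    """Hypothetical helper: exact-match filter body for _delete_by_query, _search or _count."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


if __name__ == "__main__":
    # city is a keyword field, so the value is case-sensitive ("PEK", not "pek")
    print(json.dumps(term_filter_query("city", "PEK")))
```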
    

    VIII. Updating data

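    Note: an update does not modify the stored document in place; Elasticsearch reindexes a new version of the whole document.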

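    1. Full update by ID: works the same way as adding data with an explicit ID above; the whole document is replaced.
      (omitted)
    2. Partial update by ID: see Elasticsearch Reference [6.2] » Document APIs » Update API
      Note: doc and script cannot be sent together in a single request.
      Update the document with id=abc123: set name="lisi" and add a new field "bugs": 0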
    curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/stu/person/abc123/_update?pretty' -d '
    {
        "doc" : {
          "name" : "lisi",
          "bugs": 0
       }
    }'
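When the update body is built in code, the rule that doc and script cannot share one request is easy to enforce. This is a Python sketch: update_body is hypothetical and accepts exactly one of the two.

```python
def update_body(doc=None, script=None):
    """Hypothetical helper: body for POST <index>/<type>/<id>/_update with doc XOR script."""
    if (doc is None) == (script is None):
        raise ValueError("provide exactly one of doc or script")
    return {"doc": doc} if doc is not None else {"script": script}


if __name__ == "__main__":
    # Same change as the curl request above
    print(update_body(doc={"name": "lisi", "bugs": 0}))
```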
    


    Update the document with id=abc123, adding 4 to age:

    curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/stu/person/abc123/_update?pretty' -d '
    {
        "script" : {
            "source": "ctx._source.age += 4"
        }
    }'
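Elasticsearch compiles each distinct script source and caches the compiled form. Hard-coding the 4 into source therefore creates a new script for every different increment. Passing the amount through params keeps the source constant. This is a Python sketch of such a body. increment_script is hypothetical, and the field name is still interpolated into the source, so there is one compiled script per field.

```python
def increment_script(field, amount):
    """Hypothetical helper: _update body that adds params.n to a numeric field."""
    return {"script": {
        "lang": "painless",
        "source": f"ctx._source.{field} += params.n",  # constant for a given field
        "params": {"n": amount},                         # varies per call, no recompilation
    }}


if __name__ == "__main__":
    print(increment_script("age", 4))
```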
    


    3. Conditional updates with _update_by_query: see Elasticsearch Reference [6.2] » Document APIs » Update By Query API

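    Without a script, _update_by_query reindexes every matching document without changing its source, which is useful for picking up a newly added mapping property; with a script it can also add or modify fields.
    _update_by_query changes documents through a script; unlike the Update API it does not take a partial doc.
    _update_by_query takes a snapshot of the index when it starts and writes what it finds using internal versioning. If a document changes between the snapshot and the moment its write is processed, a version conflict occurs; otherwise the document is updated and its version number is incremented.
    Internal versioning does not accept 0 as a valid version number, so documents whose version is 0 cannot be updated with _update_by_query and make the request fail.
    Any update or query failure aborts the request (updates already performed are kept). To count version conflicts instead of aborting, add conflicts=proceed to the URL; conflicting documents are then skipped, not retried.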

    URL parameters

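    Besides the standard pretty parameter, Update By Query also supports refresh, wait_for_completion, wait_for_active_shards, timeout, scroll and requests_per_second:

    refresh: refreshes all shards of the index once the request completes. This differs from the Index API's refresh parameter, which refreshes only the shard that received the new data.
    wait_for_completion: with wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task that can be used with the Tasks API to check its status or cancel it.
    wait_for_active_shards: how many copies of each shard must be active before the request proceeds.
    timeout: how long each write request waits for unavailable shards to become available.
    scroll: _update_by_query reads documents with a scroll search whose search context is kept alive for 5 minutes by default; for example, ?scroll=10m extends it to 10 minutes.
    requests_per_second: any positive decimal number; throttles the request by padding each batch of writes with a wait time. No throttling is applied by default, and -1 disables it explicitly.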
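These options are plain query-string parameters, and booleans must be spelled true or false. This is a Python sketch that assembles the URL with urlencode. update_by_query_url is hypothetical.

```python
from urllib.parse import urlencode


def update_by_query_url(base_url, index, **params):
    """Hypothetical helper: URL for POST <index>/_update_by_query with query parameters."""
    # Python booleans become the lowercase strings Elasticsearch expects
    clean = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    query = urlencode(clean)
    return f"{base_url}/{index}/_update_by_query" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(update_by_query_url("http://localhost:9200", "stu",
                              conflicts="proceed", refresh=True, scroll="10m"))
```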

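    Complete example (Kibana Console syntax). _update_by_query accepts only POST, and the string literal inside the Painless source uses single quotes so the JSON stays valid:
    POST stu/_update_by_query?pretty&conflicts=proceed&refresh=true&timeout=1s
    {
      "script": {
       "source": "ctx._source.name='lisi2';ctx._source.bugs=10"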
      },
      "query": {
        "bool": {
          "filter": [
            {"term": { "id": "111" } }
          ]
        }
      }
    }
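The response to this request reports counters such as total, updated, version_conflicts and noops, along with a failures array. With conflicts=proceed, skipped documents are counted in version_conflicts instead of aborting the request. This is a Python sketch that checks failures and extracts those counters. summarize_update_by_query is hypothetical.

```python
def summarize_update_by_query(resp):
    """Hypothetical helper: main counters from a parsed _update_by_query response."""
    failures = resp.get("failures") or []
    if failures:
        # Failures abort the request; updates already applied are kept
        raise RuntimeError(f"update_by_query reported {len(failures)} failure(s)")
    return {key: resp.get(key, 0) for key in ("total", "updated", "version_conflicts", "noops")}
```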
    
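    Incorrect example: this does not update bugs. The script object only takes keys such as source, lang and params, so "bugs": 10 is not applied; the assignment has to happen inside source (for example ctx._source.bugs=10).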
    curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/stu/person/_update_by_query?conflicts=proceed&pretty' -d '
    {
      "script": {
        "source": "ctx._source.age++",
        "bugs": 10
      },
      "query": {
        "bool":{
          "filter": [
            {"term":{ "id": "111" }},
            {"term":{ "name": "lisi" }}
          ]
        }
      }
    }'
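A corrected version of the example above performs the assignment in the Painless source and passes the value through params. This is a Python sketch that builds the body. set_fields_script is hypothetical, and the query is the one from the incorrect example, unchanged.

```python
import json


def set_fields_script(increment_field, **fields):
    """Hypothetical helper: increment one field, assign the others from params."""
    lines = [f"ctx._source.{increment_field}++;"]
    lines += [f"ctx._source.{name} = params.{name};" for name in fields]
    return {"source": " ".join(lines), "lang": "painless", "params": fields}


body = {
    "script": set_fields_script("age", bugs=10),
    "query": {"bool": {"filter": [{"term": {"id": "111"}}, {"term": {"name": "lisi"}}]}},
}

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```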
    

    IX. Querying data

    For querying, see Getting Started with Elasticsearch (3): Query DSL.

  • Original article: https://www.cnblogs.com/nickchou/p/8547185.html