  • Splitting an Elasticsearch string field into an array with Painless

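    I. Scenario:

    In Elasticsearch, the string field imgs holds some historical documents whose value is a comma-separated string. These historical values need to be split into arrays.

    Example:

    1. Create test data:

    Create an index and add a few typical historical documents that cover the following cases:

    • a comma-separated string;
    • an array;
    • a zero-length string;
    • an empty array.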
    PUT test_cj/test/id_1
    {
      "imgs": "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
    }
    
    
    PUT test_cj/test/id_2
    {
      "imgs": [
        "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
      ]
    }
    
    PUT test_cj/test/id_3
    {
      "imgs": ""
    }
    
    PUT test_cj/test/id_4
    {
      "imgs": []
    }
    

    2. Check the data (only the hits.hits array of the response is shown):

    GET test_cj/_search
    
    [
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_1",
            "_score" : 1.0,
            "_source" : {
              "imgs" : "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_2",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [
                "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
              ]
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_3",
            "_score" : 1.0,
            "_source" : {
              "imgs" : ""
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_4",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [ ]
            }
          }
        ]
    

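    3. Run the Painless script

    Update the historical data with a Painless script. A few points to note:

    • To update only the documents that meet certain conditions, pass a query to _update_by_query. This example is simple, so no query is set and every document in the index is processed.
    • Version conflicts during execution are handled with conflicts=proceed. The operation keeps going and counts the conflicts instead of aborting the request.
    • Painless checks an object's type with the instanceof keyword.
    • To split the string without a regular expression, the script uses StringTokenizer.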
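    POST test_cj/_update_by_query?conflicts=proceed
    {
      "script": {
        "source": """
        // Only documents whose imgs is still a string need converting
        if (ctx._source['imgs'] instanceof String) {
          String s = ctx._source['imgs'];
          ArrayList array = new ArrayList();
          // An empty string becomes an empty array
          if (!s.isEmpty()) {
            String splitter = ",";
            // Split on commas without using a regular expression
            StringTokenizer tokenValue = new StringTokenizer(s, splitter);
            while (tokenValue.hasMoreTokens()) {
              array.add(tokenValue.nextToken());
            }
          }
          ctx._source.imgs = array;
        } else {
          // Already an array: skip the write and count the document as a noop
          ctx.op = 'noop';
        }
    """
      }
    }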
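
    Painless borrows Java syntax, and StringTokenizer and ArrayList come from java.util. That means the script's branching can be tried out locally in plain Java before it runs against the cluster. The sketch below is a hypothetical port: the class ImgsSplitter and the method splitImgs are names made up for this example. It assumes each _source value arrives as either a String or a List, which is how the four test documents look after JSON parsing. The URLs are shortened placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical local port of the Painless script, for experimenting outside Elasticsearch.
public class ImgsSplitter {

    // Mirrors the script: a String is split on commas, anything else is returned unchanged.
    static Object splitImgs(Object imgs) {
        if (imgs instanceof String) {
            String s = (String) imgs;
            List<String> array = new ArrayList<>();
            if (!s.isEmpty()) {
                // "," is treated as a delimiter character, not as a regular expression.
                StringTokenizer tokenValue = new StringTokenizer(s, ",");
                while (tokenValue.hasMoreTokens()) {
                    array.add(tokenValue.nextToken());
                }
            }
            return array;
        }
        // Arrays (and any other type) are left as they are.
        return imgs;
    }

    public static void main(String[] args) {
        // The four shapes from the test data, with shortened placeholder URLs.
        Object id1 = "a.jpg,b.jpg,c.jpg";
        Object id2 = Arrays.asList("d.jpg", "e.jpg");
        Object id3 = "";
        Object id4 = new ArrayList<String>();

        System.out.println(splitImgs(id1)); // [a.jpg, b.jpg, c.jpg]
        System.out.println(splitImgs(id2)); // [d.jpg, e.jpg]
        System.out.println(splitImgs(id3)); // []
        System.out.println(splitImgs(id4)); // []
        // Applying the conversion twice gives the same result, so re-running is harmless.
        System.out.println(splitImgs(splitImgs(id1))); // [a.jpg, b.jpg, c.jpg]
    }
}
```

    A design note: StringTokenizer silently drops empty tokens, so "a.jpg,,b.jpg" yields two elements. Java's "a.jpg,,b.jpg".split(",") would yield three, including an empty string, and split takes a regular expression as its argument. For the same reason, the isEmpty() check is not strictly required, because a tokenizer over an empty string produces no tokens.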
    

    4. When a large amount of data is being updated, the operation takes a while. Check its progress in the meantime:

    GET _tasks?detailed=true&actions=*byquery
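
    Each task in this listing has a status object with counters such as total, updated, created, deleted, noops and version_conflicts. The Elasticsearch reindex documentation describes estimating progress by adding updated, created and deleted and comparing the sum with total. The sketch below does that arithmetic. The class ByQueryProgress is hypothetical, and the counts are assumed to have been read from the response by hand or with any JSON library. Counting noops and version_conflicts as processed is an assumption of this sketch. It matters here because conflicts=proceed skips conflicting documents instead of failing.

```java
// Hypothetical helper: estimates _update_by_query progress from task status counters.
public class ByQueryProgress {

    // Returns the completed percentage (0-100). Counting noops and version conflicts
    // as processed is an assumption of this sketch, not an official formula.
    static double percentDone(long total, long updated, long created, long deleted,
                              long noops, long versionConflicts) {
        if (total <= 0) {
            return 0.0; // total not known yet, or nothing to process
        }
        long processed = updated + created + deleted + noops + versionConflicts;
        return Math.min(100.0, 100.0 * processed / total);
    }

    public static void main(String[] args) {
        // Example counters, made up for illustration.
        System.out.println(percentDone(1000, 250, 0, 0, 0, 0)); // 25.0
    }
}
```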
    

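    5. Check the result. id_1 has been converted to an array and the empty string in id_3 to an empty array, while id_2 and id_4 are unchanged: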

    GET test_cj/_search
    
    [
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_1",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [
                "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg",
                "https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg",
                "https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
              ]
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_2",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [
                "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
                "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
              ]
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_3",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [ ]
            }
          },
          {
            "_index" : "test_cj",
            "_type" : "test",
            "_id" : "id_4",
            "_score" : 1.0,
            "_source" : {
              "imgs" : [ ]
            }
          }
        ]
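
    On a large index the result cannot be checked by eye. One option is to page through the documents and look for any imgs value that is still a string. The sketch below shows only that check. It assumes the hits have already been parsed into a map from _id to the _source map, and it leaves out the paging and parsing. The class ImgsMigrationCheck, the method unmigratedIds and the document id_5 are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-migration check over already-parsed _source documents.
public class ImgsMigrationCheck {

    // Returns the ids whose imgs field is still a String, i.e. not yet converted.
    static List<String> unmigratedIds(Map<String, Map<String, Object>> sourcesById) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : sourcesById.entrySet()) {
            if (e.getValue().get("imgs") instanceof String) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> docs = new LinkedHashMap<>();
        docs.put("id_1", Map.of("imgs", List.of("a.jpg", "b.jpg")));
        docs.put("id_3", Map.of("imgs", List.of()));
        docs.put("id_5", Map.of("imgs", "x.jpg,y.jpg")); // hypothetical unconverted document
        System.out.println(unmigratedIds(docs)); // [id_5]
    }
}
```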
    
  • Original article: https://www.cnblogs.com/janes/p/13914360.html