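  • ES doc_values overview: in essence, column-oriented storage of field values, used for aggregations and analytics. Elasticsearch enables it by default, and it takes up disk space. (Column-store compression tricks: divide by a common divisor or subtract the minimum value from every entry; for strings, de-duplicate them and compress using numeric IDs.)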

    doc_values

    Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. (This is the essence!) Doc values are supported on almost all field types, with the notable exception of analyzed string fields.
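
    To make "column-oriented" concrete, here is a minimal sketch in plain Python. It is not how Elasticsearch or Lucene lays out data on disk; the sample documents and the to_columns helper are hypothetical and only illustrate pivoting per-document rows into per-field columns.

```python
# Row-oriented view: one dict per document, similar in spirit to _source.
# The documents below are made-up sample data.
docs = [
    {"status_code": 200, "session_id": "a1"},
    {"status_code": 404, "session_id": "b2"},
    {"status_code": 200, "session_id": "c3"},
]


def to_columns(rows):
    """Pivot a list of documents into one list of values per field."""
    columns = {}
    for doc_id, row in enumerate(rows):
        for field, value in row.items():
            # Pre-fill with None so every column stays aligned with doc IDs.
            column = columns.setdefault(field, [None] * len(rows))
            column[doc_id] = value
    return columns


columns = to_columns(docs)

# An aggregation on one field only has to scan that field's column.
status_counts = {}
for value in columns["status_code"]:
    status_counts[value] = status_counts.get(value, 0) + 1
```

    Counting status codes here touches only the status_code column and never reads session_id, which is the access pattern that sorting and aggregations benefit from.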

    All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space:

    PUT my_index
    {
      "mappings": {
        "my_type": {
          "properties": {
            "status_code": { 
              "type":       "keyword"
            },
            "session_id": { 
              "type":       "keyword",
              "doc_values": false
            }
          }
        }
      }
    }

    The status_code field has doc_values enabled by default.

    The session_id has doc_values disabled, but can still be queried.

    Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

     

    Column-store compression

    At a high level, doc values are essentially a serialized column-store. As we discussed in the last section, column-stores excel at certain operations because the data is naturally laid out in a fashion that is amenable to those queries.

    But they also excel at compressing data, particularly numbers. This is important for both saving space on disk and for faster access. Modern CPUs are many orders of magnitude faster than disk drives (although the gap is narrowing quickly with upcoming NVMe drives). That means it is often advantageous to minimize the amount of data that must be read from disk, even if it requires extra CPU cycles to decompress.

    To see how it can help compression, take this set of doc values for a numeric field:

    Doc      Terms
    -----------------------------------------------------------------
    Doc_1 | 100
    Doc_2 | 1000
    Doc_3 | 1500
    Doc_4 | 1200
    Doc_5 | 300
    Doc_6 | 1900
    Doc_7 | 4200
    -----------------------------------------------------------------

    The column-stride layout means we have a contiguous block of numbers:[100,1000,1500,1200,300,1900,4200]
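
    As a rough illustration of the common-divisor trick applied to this block, the sketch below uses only the Python standard library. It shows the arithmetic, not Lucene's actual encoder; the variable names are chosen for this example.

```python
from functools import reduce
from math import gcd

# The contiguous block of doc values from the table above.
values = [100, 1000, 1500, 1200, 300, 1900, 4200]

# Greatest common divisor shared by every value in the block.
divisor = reduce(gcd, values)

# Store the divisor once and keep only the much smaller quotients.
scaled = [v // divisor for v in values]

# Bits needed for the largest value before and after scaling.
bits_before = max(values).bit_length()
bits_after = max(scaled).bit_length()

# Decoding is a single multiplication per value.
restored = [s * divisor for s in scaled]
```

    Here the divisor is 100, the quotients are [1, 10, 15, 12, 3, 19, 42], and the largest value shrinks from 13 bits to 6 bits.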

    xxx

    Doc values use several tricks like this. In order, the following compression schemes are checked:

    1. If all values are identical (or missing), set a flag and record the value
    2. If there are fewer than 256 values, a simple table encoding is used
    3. If there are > 256 values, check to see if there is a common divisor
    4. If there is no common divisor, encode everything as an offset from the smallest value
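
    The order of checks in the list above can be sketched as follows. This is a simplified illustration, not Lucene's implementation: choose_scheme and offset_encode are hypothetical helpers, "values" in steps 2 and 3 is assumed to mean distinct values, and a block with exactly 256 distinct values (which the list leaves unspecified) is assumed to fall through to the divisor check.

```python
from functools import reduce
from math import gcd


def choose_scheme(values):
    """Return the name of the first scheme from the list that applies."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    # 1. All values identical (or missing): a flag plus one value is enough.
    if len(distinct) <= 1:
        return "constant"
    # 2. Few distinct values: a small lookup table (assumed: distinct count).
    if len(distinct) < 256:
        return "table"
    # 3. Otherwise look for a divisor shared by every value.
    if reduce(gcd, present) > 1:
        return "gcd"
    # 4. No common divisor: store offsets from the smallest value.
    return "offset"


def offset_encode(values):
    """Encode values as (smallest, offsets from smallest)."""
    smallest = min(values)
    return smallest, [v - smallest for v in values]


small_block = [100, 1000, 1500, 1200, 300, 1900, 4200]
multiples_of_ten = list(range(0, 3000, 10))   # 300 distinct values
consecutive = list(range(1000, 1300))         # 300 distinct values, gcd 1
```

    With these inputs, small_block selects the table scheme, multiples_of_ten selects the divisor scheme, and consecutive falls back to offsets from 1000.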

    You’ll note that these compression schemes are not "traditional" general purpose compression like DEFLATE or LZ4. Because the structure of column-stores is rigid and well-defined, we can achieve higher compression by using specialized schemes rather than the more general compression algorithms like LZ4.

    Note

    You may be thinking "Well that’s great for numbers, but what about strings?" Strings are encoded similarly, with the help of an ordinal table. The strings are de-duplicated and sorted into a table, assigned an ID, and then those IDs are used as numeric doc values, which means strings enjoy many of the same compression benefits that numerics do.

    The ordinal table itself has some compression tricks, such as using fixed, variable or prefix-encoded strings.
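
    A minimal sketch of the ordinal idea, assuming a hypothetical build_ordinals helper and made-up sample strings. It shows only the de-duplicate, sort, and assign-ID steps, not the table's own fixed, variable, or prefix encodings.

```python
def build_ordinals(strings):
    """De-duplicate and sort the strings, then map each one to its ID."""
    terms = sorted(set(strings))
    term_to_ord = {term: i for i, term in enumerate(terms)}
    # One small integer per document replaces the original string.
    ordinals = [term_to_ord[s] for s in strings]
    return terms, ordinals


terms, ordinals = build_ordinals(["GET", "POST", "GET", "DELETE", "GET"])

# Looking a value back up is an index into the sorted table.
decoded = [terms[o] for o in ordinals]
```

    The five strings collapse to the table ['DELETE', 'GET', 'POST'] plus the ordinals [1, 2, 1, 0, 1], and those integers can then go through the same numeric compression schemes.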

       

    Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html

  • Original post: https://www.cnblogs.com/bonelee/p/6401466.html