Lucene .dvd and .dvm files are the DocValues files: column-oriented storage for field values


```
public final class Lucene54DocValuesFormat
extends DocValuesFormat
```
Lucene 5.4 DocValues format.
Encodes the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) with these strategies:

NUMERIC:

Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see DirectWriter.

Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).

GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and quotients are stored using Delta-compressed Numerics.

Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.

Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.

Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
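To make the delta and GCD strategies concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class `NumericEncodingSketch` and its methods are hypothetical names, the quotients are kept in an unpacked `long[]` rather than bit-packed with DirectWriter, and the input is assumed to be non-empty with a value range that fits in a `long`.

```java
// Illustrative sketch only: not Lucene's DirectWriter or its on-disk layout.
final class NumericEncodingSketch {

    /** Encoded column: the value for docId is min + gcd * quotients[docId]. */
    static final class Encoded {
        final long min;
        final long gcd;
        final int bitsPerValue;  // width a bit-packer would need per quotient
        final long[] quotients;  // left unpacked here for readability

        Encoded(long min, long gcd, int bitsPerValue, long[] quotients) {
            this.min = min;
            this.gcd = gcd;
            this.bitsPerValue = bitsPerValue;
            this.quotients = quotients;
        }
    }

    /** Number of bits needed to store any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Euclid's algorithm for non-negative inputs; gcd(0, x) == x. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Assumes values is non-empty and (max - min) does not overflow a long. */
    static Encoded encode(long[] values) {
        long min = values[0];
        for (long v : values) {
            min = Math.min(min, v);
        }
        // Common divisor of all deltas from the minimum.
        long divisor = 0;
        for (long v : values) {
            divisor = gcd(divisor, v - min);
        }
        if (divisor == 0) {
            divisor = 1; // every value equals min
        }
        long[] quotients = new long[values.length];
        long maxQuotient = 0;
        for (int i = 0; i < values.length; i++) {
            quotients[i] = (values[i] - min) / divisor;
            maxQuotient = Math.max(maxQuotient, quotients[i]);
        }
        return new Encoded(min, divisor, bitsRequired(maxQuotient), quotients);
    }

    static long decode(Encoded e, int docId) {
        return e.min + e.gcd * e.quotients[docId];
    }

    public static void main(String[] args) {
        long day = 86_400_000L; // timestamps at day granularity, in milliseconds
        long[] values = {3 * day, 5 * day, 10 * day};
        Encoded e = encode(values);
        System.out.println("min=" + e.min + " gcd=" + e.gcd + " bitsPerValue=" + e.bitsPerValue);
        System.out.println("round trip ok: " + (decode(e, 2) == values[2]));
    }
}
```

With day-granularity timestamps the quotients are tiny, so each document needs only a few bits instead of a full 64-bit long.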

BINARY:

Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).

Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.

Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
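The variable-width layout can be sketched as one concatenated array plus an end address per document. This is only an in-memory illustration under assumptions: `BinaryLayoutSketch` is a hypothetical class, and the addresses sit in a plain `long[]` instead of being Monotonic-compressed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only: an in-memory model, not Lucene's file format.
final class BinaryLayoutSketch {
    final byte[] data;          // every document's bytes, concatenated in docID order
    final long[] endAddresses;  // endAddresses[doc] = offset just past doc's bytes

    BinaryLayoutSketch(byte[][] perDocValues) {
        int total = 0;
        for (byte[] v : perDocValues) {
            total += v.length;
        }
        data = new byte[total];
        endAddresses = new long[perDocValues.length];
        int offset = 0;
        for (int doc = 0; doc < perDocValues.length; doc++) {
            byte[] v = perDocValues[doc];
            System.arraycopy(v, 0, data, offset, v.length);
            offset += v.length;
            endAddresses[doc] = offset;
        }
    }

    /** A document's value starts where the previous document's value ended. */
    byte[] get(int docId) {
        int start = docId == 0 ? 0 : (int) endAddresses[docId - 1];
        int end = (int) endAddresses[docId];
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) {
        byte[][] values = {
            "a".getBytes(StandardCharsets.UTF_8),
            new byte[0],
            "hello".getBytes(StandardCharsets.UTF_8)
        };
        BinaryLayoutSketch layout = new BinaryLayoutSketch(values);
        System.out.println(new String(layout.get(2), StandardCharsets.UTF_8));
    }
}
```

Because the end addresses never decrease, they are a natural fit for the Monotonic-compressed numeric strategy described earlier. The fixed-width case needs no address array at all, since the start offset is simply docID * length.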

SORTED:

Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
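As a rough illustration of the SORTED idea, the sketch below deduplicates terms into a sorted dictionary and stores one ordinal per document. The class `SortedDocValuesSketch` is hypothetical, -1 is used here as an assumed marker for a missing value, and Java's natural `String` ordering stands in for the byte-wise term ordering a real index would use.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative sketch only: not Lucene's API or file format.
final class SortedDocValuesSketch {
    final String[] dictionary;  // ordinal -> term, sorted and deduplicated
    final int[] ordinals;       // docID -> ordinal, or -1 when the doc has no value

    SortedDocValuesSketch(String[] perDocTerms) {
        TreeSet<String> unique = new TreeSet<>();
        for (String term : perDocTerms) {
            if (term != null) {
                unique.add(term);
            }
        }
        dictionary = unique.toArray(new String[0]);
        ordinals = new int[perDocTerms.length];
        for (int doc = 0; doc < perDocTerms.length; doc++) {
            String term = perDocTerms[doc];
            // The dictionary is sorted, so binary search yields the ordinal.
            ordinals[doc] = term == null ? -1 : Arrays.binarySearch(dictionary, term);
        }
    }

    /** Returns the term for a document, or null if the document has no value. */
    String lookup(int docId) {
        int ord = ordinals[docId];
        return ord < 0 ? null : dictionary[ord];
    }

    public static void main(String[] args) {
        String[] terms = {"red", "blue", null, "red", "green"};
        SortedDocValuesSketch dv = new SortedDocValuesSketch(terms);
        System.out.println(Arrays.toString(dv.dictionary));
        System.out.println(Arrays.toString(dv.ordinals));
    }
}
```

Since ordinals follow term order, sorting or grouping documents by this field can compare small integers instead of the terms themselves.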

SORTED_SET:

Single: if all documents have 0 or 1 value, then data are written like SORTED.

SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.

SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
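The SortedSet table strategy can be pictured as interning each distinct set of ordinals and storing only a small set id per document. The sketch below assumes the per-document ordinal lists are already sorted and deduplicated; `SortedSetTableSketch` and its members are hypothetical names, and the check that fewer than 256 unique sets exist is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not Lucene's API or file format.
final class SortedSetTableSketch {
    final List<List<Integer>> table = new ArrayList<>();  // set id -> ordinals in that set
    final int[] setIds;                                   // docID -> set id

    /** Assumes each inner list is already sorted and free of duplicates. */
    SortedSetTableSketch(List<List<Integer>> perDocOrdinals) {
        Map<List<Integer>, Integer> idsBySet = new HashMap<>();
        setIds = new int[perDocOrdinals.size()];
        for (int doc = 0; doc < perDocOrdinals.size(); doc++) {
            List<Integer> ords = perDocOrdinals.get(doc);
            Integer id = idsBySet.get(ords);
            if (id == null) {
                // First time this exact set is seen: give it the next id.
                id = table.size();
                idsBySet.put(ords, id);
                table.add(ords);
            }
            setIds[doc] = id;
        }
    }

    List<Integer> ordinalsFor(int docId) {
        return table.get(setIds[docId]);
    }

    public static void main(String[] args) {
        List<List<Integer>> docs = Arrays.asList(
            Arrays.asList(0, 2),
            Arrays.asList(1),
            Arrays.asList(0, 2),
            Collections.<Integer>emptyList(),
            Arrays.asList(1));
        SortedSetTableSketch sketch = new SortedSetTableSketch(docs);
        System.out.println(sketch.table.size() + " unique sets, ids " + Arrays.toString(sketch.setIds));
    }
}
```

The per-document set ids are ordinary small integers, which is why the text says they can be written with the numeric strategies above.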

SORTED_NUMERIC:

Single: if all documents have 0 or 1 value, then data are written like NUMERIC.

SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.

SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
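The value-list-plus-index layout for SORTED_NUMERIC might look like the following sketch. It is an assumption-laden model rather than Lucene's code: `SortedNumericSketch` is a hypothetical class, and both arrays are held uncompressed, whereas the format would write them with the numeric strategies.

```java
import java.util.Arrays;

// Illustrative sketch only: not Lucene's API or file format.
final class SortedNumericSketch {
    final long[] values;  // all documents' values; each document's run is sorted
    final int[] offsets;  // offsets[doc] = start of doc's run; offsets[numDocs] = values.length

    SortedNumericSketch(long[][] perDocValues) {
        int total = 0;
        for (long[] v : perDocValues) {
            total += v.length;
        }
        values = new long[total];
        offsets = new int[perDocValues.length + 1];
        int pos = 0;
        for (int doc = 0; doc < perDocValues.length; doc++) {
            offsets[doc] = pos;
            long[] sorted = perDocValues[doc].clone();
            Arrays.sort(sorted);
            System.arraycopy(sorted, 0, values, pos, sorted.length);
            pos += sorted.length;
        }
        offsets[perDocValues.length] = pos;
    }

    /** Number of values stored for the document (may be zero). */
    int count(int docId) {
        return offsets[docId + 1] - offsets[docId];
    }

    /** The index-th smallest value of the document. */
    long valueAt(int docId, int index) {
        return values[offsets[docId] + index];
    }

    public static void main(String[] args) {
        SortedNumericSketch dv = new SortedNumericSketch(new long[][] {{5, 1}, {}, {7}});
        System.out.println(Arrays.toString(dv.values) + " " + Arrays.toString(dv.offsets));
    }
}
```

When every document has at most one value, the index adds nothing, which matches the "Single" case where data are written like NUMERIC.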

Files:

.dvd: DocValues data

.dvm: DocValues metadata

Source: http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html

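As the listing below shows, the DocValues files take up very little space. In this Elasticsearch shard, the .dvd and .dvm files each round up to 1 MB, compared with 299 MB for the stored-fields data (.fdt). Sizes are in megabytes, as reported by du -sm.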

du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1 elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0 elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock

Original article: https://www.cnblogs.com/bonelee/p/6669414.html