zoukankan      html  css  js  c++  java
  • lucene segment会包含所有的索引文件,如tim tip等,可以认为是mini的独立索引

    A Lucene index segment can be viewed as a "mini" index or a shard. Each segment is a collection of all needed files for an index, including .tim and .tip. If you list your Lucene index directory, you'll see files belonging to the same segment have the same names with all different types. In fact, if you force a merge, you'll get an index of one single segment.

    Each segment  contains an index of a subset of your document collection. Lucene usually creates a new segment when new documents are added to a working index, to avoid (or rather delay and batch later) reindexing cost.

    When a search is executed, Lucene will fan that query over all segments, and all the index wide statistics required for relevance ranking (such as idf) are combined, so from the client's perspective, the ranking is the same as searching from an index of one segment. Note that the other famous stat, tf, is per-document, so it is already available at the segment reader layer.

    Now things get more interesting when you have Lucene indexes across machines (as the case in Solr Cloud, which is one of the distributed search service built on Lucene). Due to performance and complexity, Solr Cloud don't aggregate global stats across clusters (yet), so each machine would use their own stats on the index it holds (which could be consisted of multiple segments :).

    摘自:https://www.quora.com/Are-the-individual-tim-and-tip-files-term-dictionaries-of-a-Lucene-index-segment-updated-when-a-new-segment-is-added-to-Lucene

  • 相关阅读:
    lcd驱动解析(二)
    php参数引用
    Curl来做冒烟测试
    TIB自动化测试工作室QTP脚本汇总比较有价值
    使用QTP自动化 DZ消息复选框选择实例!
    正则表达式30分钟
    sqlserver查询数据库中有多少个表
    怎样获取页面上所有链接的名称和url
    Curl 来做自动跟踪重定向
    sqlserver2008 删除指定表
  • 原文地址:https://www.cnblogs.com/bonelee/p/6668774.html
Copyright © 2011-2022 走看看