  • Lucene IndexOptions can be set to values such as DOCS or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, and the same setting is also available in ES (Elasticsearch)

    org.apache.lucene.index

    Java Code Examples for org.apache.lucene.index.IndexOptions

    Example 4
    Project: languagetool   File: EmptyLuceneIndexCreator.java
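    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.IndexOptions;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class EmptyLuceneIndexCreator {
      public static void main(String[] args) throws IOException {
        if (args.length != 1) {
          System.out.println("Usage: " + EmptyLuceneIndexCreator.class.getSimpleName() + " <indexPath>");
          System.exit(1);
        }
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // try-with-resources closes the writer (which commits by default) and the directory
        try (Directory directory = FSDirectory.open(new File(args[0]).toPath());
             IndexWriter writer = new IndexWriter(directory, config)) {
          // Index only document IDs for this field: no freqs, positions or offsets
          FieldType fieldType = new FieldType();
          fieldType.setIndexOptions(IndexOptions.DOCS);
          fieldType.setStored(true); // also keep the original value retrievable
          Field countField = new Field("totalTokenCount", String.valueOf(0), fieldType);
          Document doc = new Document();
          doc.add(countField);
          writer.addDocument(doc);
        }
      }
    }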
     
    In ES:
    First of all, index_options and term_vectors are two totally different things.
    index_options are "options" for the index you are searching on, a data
    structure that maps "terms" to document lists (posting lists). Term vectors
    are a data structure that gives you the "terms" of a given document, plus
    their positions in the document and their start and end character offsets.
    The index (each field has its own) holds a sorted list of terms, and each
    term points to a posting list: the list of documents that contain the term.
    In the posting list you can also store frequencies (how often term Y occurs
    in document X, which is useful for scoring) and positions (at which
    positions term Y occurs in document X, which is required for phrase and
    span queries).
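    The posting-list description above can be made concrete with a toy in-memory model in plain Java. It is a sketch and not Lucene's implementation. A sorted term dictionary maps each term to a posting list, and a hypothetical Options enum decides whether frequencies and positions are recorded. The enum names only mirror Lucene's IndexOptions levels. Declaration order is used here so that each level includes everything before it. Whitespace tokenization is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT Lucene's implementation): a per-field inverted index whose
// postings keep only what the chosen option level asks for.
public class ToyInvertedIndex {

  // Hypothetical enum mirroring Lucene's IndexOptions names; declaration
  // order matters: each level includes everything declared before it.
  enum Options { DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  // One entry of a posting list: a document that contains the term.
  static final class Posting {
    final int docId;
    int freq;                                           // stays 0 unless freqs are indexed
    final List<Integer> positions = new ArrayList<>();  // empty unless positions are indexed
    Posting(int docId) { this.docId = docId; }
  }

  final Options options;
  // Sorted term dictionary: term -> posting list (docs must be added in increasing id order).
  final TreeMap<String, List<Posting>> terms = new TreeMap<>();

  ToyInvertedIndex(Options options) { this.options = options; }

  void addDocument(int docId, String text) {
    String[] tokens = text.toLowerCase().split("\\s+"); // assumption: whitespace tokenizer
    for (int pos = 0; pos < tokens.length; pos++) {
      List<Posting> postings = terms.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
      Posting last = postings.isEmpty() ? null : postings.get(postings.size() - 1);
      if (last == null || last.docId != docId) {   // first occurrence in this doc
        last = new Posting(docId);
        postings.add(last);
      }
      if (options.compareTo(Options.DOCS_AND_FREQS) >= 0) last.freq++;
      if (options.compareTo(Options.DOCS_AND_FREQS_AND_POSITIONS) >= 0) last.positions.add(pos);
    }
  }

  public static void main(String[] args) {
    ToyInvertedIndex full = new ToyInvertedIndex(Options.DOCS_AND_FREQS_AND_POSITIONS);
    full.addDocument(0, "quick brown fox");
    full.addDocument(1, "the fox jumps over the fox");
    // Print each term (in sorted order) with its posting list
    for (Map.Entry<String, List<Posting>> e : full.terms.entrySet()) {
      StringBuilder sb = new StringBuilder(e.getKey()).append(" ->");
      for (Posting p : e.getValue()) {
        sb.append(" doc").append(p.docId).append("(freq=").append(p.freq)
          .append(", pos=").append(p.positions).append(")");
      }
      System.out.println(sb);
    }
  }
}
```

    Building the same index with Options.DOCS keeps only the doc IDs. That is the "filter-only" case discussed next.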

    If, for instance, you have a field that you only use for filtering, you
    don't need freqs and positions, so docs alone will do the job. Apart from
    stored fields, position information is usually the biggest piece of data in
    an index. If you don't run phrase or span queries you don't need positions
    at all, so save the disk space and improve performance by indexing only
    docs and freqs. In previous versions it wasn't possible to have freqs
    without positions (index_options supersedes
    omit_term_frequencies_and_positions), so this is an overall improvement,
    since the most common use case may need freqs but no positions.
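    The claim that phrase queries need positions can be illustrated with a small sketch in plain Java. This is not Lucene's PhraseQuery. With docs only, the best you can ask is whether every term occurs somewhere in the document. With positions, you can check that the terms occur next to each other in order. The class and method names are hypothetical, and whitespace tokenization is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration (not Lucene code): why phrase matching needs positions.
public class PhraseNeedsPositions {

  // Per-document positional data: term -> positions where it occurs.
  static Map<String, List<Integer>> positionsOf(String text) {
    Map<String, List<Integer>> map = new HashMap<>();
    String[] tokens = text.toLowerCase().split("\\s+"); // assumption: whitespace tokenizer
    for (int i = 0; i < tokens.length; i++) {
      map.computeIfAbsent(tokens[i], t -> new ArrayList<>()).add(i);
    }
    return map;
  }

  // Docs-only view: all we can check is "every term occurs somewhere".
  static boolean allTermsPresent(Set<String> docTerms, String... phrase) {
    for (String term : phrase) {
      if (!docTerms.contains(term)) return false;
    }
    return true;
  }

  // Positional view: phrase[i] must occur at start + i for some start.
  static boolean phraseMatches(Map<String, List<Integer>> positions, String... phrase) {
    if (phrase.length == 0) return false;
    List<Integer> starts = positions.get(phrase[0]);
    if (starts == null) return false;
    for (int start : starts) {
      boolean ok = true;
      for (int i = 1; i < phrase.length && ok; i++) {
        List<Integer> p = positions.get(phrase[i]);
        ok = p != null && p.contains(start + i);
      }
      if (ok) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> doc = positionsOf("york is not new");
    System.out.println(allTermsPresent(doc.keySet(), "new", "york")); // docs-only check
    System.out.println(phraseMatches(doc, "new", "york"));            // positional check
  }
}
```

    For "york is not new", the docs-only check accepts both "new" and "york". The positional check rejects the phrase "new york" because "york" does not follow "new". A filter-only field never needs that second check.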
     
    Some related options:
    1: term_vector
    TermVector.YES: Only store the number of occurrences.
    TermVector.WITH_POSITIONS: Store the number of occurrences and the positions of terms, but no offsets.
    TermVector.WITH_OFFSETS: Store the number of occurrences and the offsets of terms, but no positions.
    TermVector.WITH_POSITIONS_OFFSETS: Store the number of occurrences, the positions and the offsets of terms.
    TermVector.NO: Don't store any term vector information.
    2: index_options
    Sets the indexing options. Possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), and positions (doc numbers, term frequencies and positions). Defaults to positions for analyzed fields and to docs for not_analyzed fields. It can also be set to offsets (doc numbers, term frequencies, positions and offsets).
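    The term_vector variants above can be compared with a toy per-document term vector in plain Java. It is a sketch and not Lucene's term vector API. Frequencies are always recorded. Two hypothetical flags add positions and/or character offsets, roughly mirroring YES, WITH_POSITIONS, WITH_OFFSETS and WITH_POSITIONS_OFFSETS. Splitting on the regex \w+ stands in for a real analyzer, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy term vector for ONE document (not Lucene's term vector classes).
// The flags mimic TermVector.YES / WITH_POSITIONS / WITH_OFFSETS / WITH_POSITIONS_OFFSETS.
public class ToyTermVector {

  static final class TermInfo {
    int freq;                                          // always recorded
    final List<Integer> positions = new ArrayList<>(); // token positions, if requested
    final List<int[]> offsets = new ArrayList<>();     // {startOffset, endOffset}, if requested
  }

  static TreeMap<String, TermInfo> build(String text, boolean withPositions, boolean withOffsets) {
    TreeMap<String, TermInfo> vector = new TreeMap<>();
    Matcher m = Pattern.compile("\\w+").matcher(text); // assumption: regex tokenizer
    int position = 0;
    while (m.find()) {
      TermInfo info = vector.computeIfAbsent(m.group().toLowerCase(), t -> new TermInfo());
      info.freq++;
      if (withPositions) info.positions.add(position);
      if (withOffsets) info.offsets.add(new int[] {m.start(), m.end()}); // end is exclusive
      position++;
    }
    return vector;
  }

  public static void main(String[] args) {
    TreeMap<String, TermInfo> tv = build("Fox hunts; the fox sleeps.", true, true);
    for (Map.Entry<String, TermInfo> e : tv.entrySet()) {
      StringBuilder sb = new StringBuilder(e.getKey()).append(" freq=").append(e.getValue().freq)
          .append(" pos=").append(e.getValue().positions).append(" offsets=");
      for (int[] o : e.getValue().offsets) sb.append('[').append(o[0]).append(',').append(o[1]).append(')');
      System.out.println(sb);
    }
  }
}
```

    For "Fox hunts; the fox sleeps." the entry for fox has freq 2, positions [0, 3] and offsets [0,3) and [15,18). Calling build(text, false, false) keeps only the counts, like TermVector.YES.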
     
    References: https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html
    http://elasticsearch.cn/question/119
  • Original article: https://www.cnblogs.com/bonelee/p/6397455.html