  • (4) Lucene: Boosting Text Fields

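    I. Introduction

      1.1  Use case

    • When searching, it is often useful to assign different weights (boosts) to different keywords or indexed fields, so that content with a higher weight is easier for users to find and ranks nearer the top of the results.

      A field's boost has to be set before the document is written to the index. At search time Lucene scores every matching document, and that score depends on the boost: all else being equal, a higher boost gives a higher score.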
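To make the relationship between boost and score concrete, here is a minimal sketch in plain Java. It assumes a simplified TF-IDF-style formula in the spirit of Lucene's classic similarity; the class `BoostSketch` and its `score` method are hypothetical names for illustration, not Lucene APIs, and the real scorer applies further factors (such as query normalization and lossy norm encoding) that are left out here.

```java
class BoostSketch {

    // Simplified TF-IDF-style score for a single term in a single field.
    // Assumption: this only mirrors the general shape of a classic TF-IDF
    // formula; it is not Lucene's actual implementation.
    static double score(int termFreq, int docFreq, int numDocs,
                        int fieldLength, double fieldBoost) {
        double tf = Math.sqrt(termFreq);                            // term frequency factor
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1)); // rarity factor
        double lengthNorm = 1.0 / Math.sqrt(fieldLength);           // shorter fields score higher
        return tf * idf * idf * lengthNorm * fieldBoost;            // boost is a plain multiplier
    }

    public static void main(String[] args) {
        // Same term statistics, only the field boost differs.
        double plain = score(1, 4, 4, 4, 1.0);
        double boosted = score(1, 4, 4, 4, 2.0);
        System.out.println(boosted / plain); // prints 2.0
    }
}
```

Because the boost enters as a multiplier, doubling it doubles the score in this model as long as every other factor stays the same, which is the "all else being equal" condition described above.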

      1.2  Example

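    import java.io.IOException;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.junit.Test;

    public class IndexTest2 {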
    
        private String ids[] = { "1", "2", "3", "4" };
        private String authors[] = { "Jack", "Marry", "John", "Json" };
        private String positions[] = { "accounting", "technician", "salesperson", "boss" };
        private String titles[] = { "Java is a good language.", "Java is a cross platform language", "Java powerful",
                "You should learn java" };
        private String contents[] = { "If possible, use the same JRE major version at both index and search time.",
                "When upgrading to a different JRE major version, consider re-indexing. ",
                "Different JRE major versions may implement different versions of Unicode,",
                "For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6," };
    
        /**
         * Creates the IndexWriter used to write the index.
         * 
         * @return an IndexWriter for the index directory
         * @throws IOException if the index directory cannot be opened
         */
        public IndexWriter getWriter() throws IOException {
    
            IndexWriter writer = null;
            // The backslash must be escaped inside a Java string literal.
            Directory dir = FSDirectory.open(Paths.get("E:\\lucene3"));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig conf = new IndexWriterConfig(analyzer);
    
            writer = new IndexWriter(dir, conf);
    
            return writer;
        }
    
        /**
         * Builds the index.
         * 
         * @throws IOException if writing the index fails
         */
        @Test
        public void index() throws IOException {
            IndexWriter writer = getWriter();
    
            for (int i = 0; i < ids.length; i++) {
                Document doc = new Document();
                /**
                 * A StringField is indexed as a single term and is never tokenized, no matter how
                 * long the string is. Use TextField when the value should be tokenized.
                 */
                doc.add(new StringField("id", ids[i], Field.Store.YES));
                doc.add(new StringField("author", authors[i], Field.Store.YES));
                doc.add(new StringField("position", positions[i], Field.Store.YES));
                
                /**
                 * Boost the title field when the author's position is "boss".
                 */
                TextField field=new TextField("title", titles[i], Field.Store.YES);
                if(positions[i].equals("boss")) {
                    field.setBoost(2.0f);
                }
                doc.add(field);
                doc.add(new TextField("content", contents[i], Field.Store.NO));
                
                writer.addDocument(doc);
            }
            writer.close();
    
        }
    
        /**
         * Searches the title field for a keyword.
         * @throws Exception if the index cannot be read
         */
        @Test
        public void search() throws Exception {
    
            // directory points to the folder that holds the index
            Directory directory = FSDirectory.open(Paths.get("E:\\lucene3"));
            IndexReader reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);
            // key is the term to search for
            String key="java";
            Term t=new Term("title",key);
            Query query=new TermQuery(t);
            TopDocs hits=searcher.search(query, 20);
            System.out.println("Matched '"+key+"': "+hits.totalHits+" documents found in total");
            for(ScoreDoc scoreDoc:hits.scoreDocs) {
                Document doc=searcher.doc(scoreDoc.doc);
                System.out.println(doc.get("author"));
            }
            reader.close();
        }
    
    }
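The comment in index() points out that a StringField is not tokenized while a TextField is. The plain-Java sketch below illustrates why the search for "java" matches the title field. It is only an approximation under stated assumptions: `FieldMatchSketch` and its methods are hypothetical names, and the toy tokenizer (lowercase, split on anything that is not a letter or digit) stands in for StandardAnalyzer, which does considerably more.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FieldMatchSketch {

    // Toy stand-in for an analyzer: lowercase the text and split it on
    // every run of characters that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    // Untokenized field: the whole value is one term, so only an exact match hits.
    static boolean matchesUntokenized(String value, String term) {
        return value.equals(term);
    }

    // Tokenized field: the term matches if it equals any single token.
    static boolean matchesTokenized(String value, String term) {
        return tokenize(value).contains(term);
    }

    public static void main(String[] args) {
        String title = "Java is a good language.";
        System.out.println(matchesTokenized(title, "java"));    // prints true
        System.out.println(matchesUntokenized(title, "java"));  // prints false
        System.out.println(matchesUntokenized("Jack", "Jack")); // prints true
    }
}
```

In this model, a term query against an untokenized field such as author only hits when the term equals the complete stored value, including its case.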
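    • Note that the boost is applied by the field.setBoost(2.0f) call in index() (highlighted in orange in the original post).
    • field.setBoost(2.0f); is not available in Lucene 7.0 and later, where index-time boosts were removed. This article uses Lucene 5.5.0.
    • An index written by Lucene 5.5.0 can only be opened with Luke 5.5.0; other Luke versions report an error when opening it.
    • Result: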

       1.3  Aside

    • Without the boost, that is, with the following line removed from the code above:
    field.setBoost(2.0f);
    • Result:

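    • This shows that the boost took effect. Json's position is "boss", so the boost on that document's title field was raised to 2.0f (a value below 1.0f would lower its weight instead).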
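The effect of a boost above or below 1.0f on the result order can be sketched in plain Java. This is an illustration only: `BoostRankingSketch` is a hypothetical class, and it assumes every document has the same unboosted score, which is not generally true in Lucene, where field length and term statistics also matter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostRankingSketch {

    // Returns the author names ordered by (baseScore * boost), highest first.
    // Only the document whose position is "boss" receives bossBoost.
    static List<String> rank(String[] authors, String[] positions, float bossBoost) {
        double[] scores = new double[authors.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < authors.length; i++) {
            double baseScore = 1.0; // assumption: identical unboosted scores
            double boost = positions[i].equals("boss") ? bossBoost : 1.0;
            scores[i] = baseScore * boost;
            order.add(i);
        }
        // Stable sort: documents with equal scores keep their original order.
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(authors[i]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        String[] authors = { "Jack", "Marry", "John", "Json" };
        String[] positions = { "accounting", "technician", "salesperson", "boss" };
        System.out.println(rank(authors, positions, 2.0f)); // prints [Json, Jack, Marry, John]
        System.out.println(rank(authors, positions, 0.5f)); // prints [Jack, Marry, John, Json]
    }
}
```

With a boost of 2.0f the "boss" document moves to the top, and with 0.5f it drops to the bottom, matching the promote and demote behaviour described above.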
  • Original article: https://www.cnblogs.com/shyroke/p/7923152.html