
(4) Lucene: Boosting Text Fields

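1. Preface

　　1.1　　Use case

When searching, you sometimes need to give different weights to different key values or indexed fields, so that higher-weighted content is easier for users to find and ranks nearer the top.
A field's boost is set before the index is created. At search time Lucene scores each matching document, and that score depends on the boost: with all other factors equal, a higher boost yields a higher score.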
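As a rough illustration of that relationship, the sketch below models only the index-time part of the score. It assumes the classic TF-IDF similarity of Lucene 5.x, where the field boost is multiplied into the field's length norm, roughly boost / sqrt(number of terms). The class name, method name, and the field length of 4 are hypothetical, and real Lucene also compresses the norm into a single byte, so actual scores will differ.

```java
class BoostNormSketch {

    // Simplified model (an assumption, not Lucene's full formula):
    // the field boost multiplied by 1 / sqrt(number of terms in the field).
    static double norm(float boost, int numTerms) {
        return boost * (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int numTerms = 4; // hypothetical field length, identical for both documents
        double plain = norm(1.0f, numTerms);   // default boost
        double boosted = norm(2.0f, numTerms); // boost set to 2.0f
        System.out.println("plain=" + plain + ", boosted=" + boosted);
    }
}
```

With the field length held constant, doubling the boost doubles this factor, which is why the boosted document moves up in the ranking.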

　　1.2　　Example

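import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test; // JUnit 4 is assumed for @Test

public class IndexTest2 {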

    private String ids[] = { "1", "2", "3", "4" };
    private String authors[] = { "Jack", "Marry", "John", "Json" };
    private String positions[] = { "accounting", "technician", "salesperson", "boss" };
    private String titles[] = { "Java is a good language.", "Java is a cross platform language", "Java powerful",
            "You should learn java" };
    private String contents[] = { "If possible, use the same JRE major version at both index and search time.",
            "When upgrading to a different JRE major version, consider re-indexing. ",
            "Different JRE major versions may implement different versions of Unicode,",
            "For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6," };

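    /**
     * Creates the IndexWriter used to write the index.
     *
     * @return an IndexWriter backed by the on-disk index directory
     * @throws IOException if the index directory cannot be opened
     */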
    public IndexWriter getWriter() throws IOException {

        IndexWriter writer = null;
        // The backslash must be escaped in a Java string literal
        Directory dir = FSDirectory.open(Paths.get("E:\\lucene3"));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig conf = new IndexWriterConfig(analyzer);

        writer = new IndexWriter(dir, conf);

        return writer;
    }

    /**
     * Builds the index.
     *
     * @throws IOException if writing the index fails
     */
    @Test
    public void index() throws IOException {
        IndexWriter writer = getWriter();

        for (int i = 0; i < ids.length; i++) {
            Document doc = new Document();
            /**
             * A StringField is never tokenized, however long its value is;
             * use TextField when the value needs to be analyzed.
             */
            doc.add(new StringField("id", ids[i], Field.Store.YES));
            doc.add(new StringField("author", authors[i], Field.Store.YES));
            doc.add(new StringField("position", positions[i], Field.Store.YES));
            
            /**
             * Boost: double the weight of the title field for documents whose position is "boss".
             */
            TextField field = new TextField("title", titles[i], Field.Store.YES);
            if (positions[i].equals("boss")) {
                field.setBoost(2.0f);
            }
            doc.add(field);
            doc.add(new TextField("content", contents[i], Field.Store.NO));
            
            writer.addDocument(doc);
        }
        writer.close();

    }

    /**
     * Searches the index by keyword.
     *
     * @throws Exception if the index cannot be read
     */
    @Test
    public void search() throws Exception {

        // directory points to the folder that holds the index
        Directory directory = FSDirectory.open(Paths.get("E:\\lucene3"));
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
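        // key is the term to search for; it is lowercase because StandardAnalyzer lowercases tokens at index time
        String key = "java";
        Term t = new Term("title", key);
        Query query = new TermQuery(t);
        TopDocs hits = searcher.search(query, 20);
        System.out.println("Matched '" + key + "': " + hits.totalHits + " documents found");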
        for(ScoreDoc scoreDoc:hits.scoreDocs) {
            Document doc=searcher.doc(scoreDoc.doc);
            System.out.println(doc.get("author"));
        }
        reader.close();
    }

}

Note that the field.setBoost(2.0f) call in index() is the boosting step.

field.setBoost(2.0f): this method is not available in Lucene 7.0 and later. This article uses Lucene 5.5.0.

An index written by Lucene 5.5.0 can only be opened with Luke 5.5.0; other Luke versions report an error when opening it.

Result:

　　1.3　　Aside

Without the boost, that is, with the following line removed from the code above:

field.setBoost(2.0f);

Result:

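This shows that the earlier boost took effect. Json's position is "boss", so the boost on that document's title field was raised to 2.0f (a value below 1.0f lowers the weight instead).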
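To make the effect of values above and below 1.0f concrete, here is a minimal sketch that ranks entries by a base score multiplied by their boost. It is a toy model rather than Lucene's scorer: the Entry class, the rank method, the shared base score, and the 0.5f boost for "Marry" are all hypothetical and do not appear in the example code above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostRankingSketch {

    // Hypothetical holder for an author name and the boost applied to the title field.
    static final class Entry {
        final String author;
        final float boost;

        Entry(String author, float boost) {
            this.author = author;
            this.boost = boost;
        }
    }

    // Returns the authors ordered by baseScore * boost, highest first.
    static List<String> rank(List<Entry> entries, double baseScore) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> baseScore * e.boost).reversed());
        List<String> authors = new ArrayList<>();
        for (Entry e : sorted) {
            authors.add(e.author);
        }
        return authors;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("Jack", 1.0f));  // default weight
        entries.add(new Entry("Marry", 0.5f)); // hypothetical demotion, not in the article's code
        entries.add(new Entry("Json", 2.0f));  // boosted, as in the article
        System.out.println(rank(entries, 1.0)); // prints [Json, Jack, Marry]
    }
}
```

The entry with boost 2.0f moves to the front, the default entry stays in the middle, and the entry with boost 0.5f drops to the end.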

Original article: https://www.cnblogs.com/shyroke/p/7923152.html
