  • Lucene: efficient search with keyword highlighting

      Environment: JDK 8 or later

      References: 1. how2j-lucene

            2. Importing txt data into MySQL

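      Result: fast search (compared with an ordinary database query), with the matched keywords highlighted in red; the output can be pasted into an HTML page to view the effect.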

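      Performance comparison: 1. Lucene returns results at different levels of relevance, ranked by score, which a LIKE fuzzy query cannot do: LIKE only keeps rows that contain the whole keyword, and all of them count equally.

           2. With a large dataset, such as the 140,000 records I compared below, the difference in query time is significant (note that the Lucene timing in the code covers only the search, not building the index).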
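To make the first point concrete, here is a JDK-only sketch with made-up product names. `like` mimics `name LIKE '%keyword%'`: a row either contains the whole keyword as one substring or is dropped. `ranked` is a crude stand-in for relevance scoring, assumed purely for illustration: it splits the query on whitespace (real Chinese text would need a segmenter such as IKAnalyzer) and scores each name by how many query terms it contains, so partial matches still come back, just ranked lower. Lucene's real scoring (BM25 by default since Lucene 6) is more elaborate; only the idea carries over. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class LikeVsRanked {

    // Mimics SQL "WHERE name LIKE '%keyword%'": keep a name only if it contains
    // the whole keyword as one substring (case-sensitive here; in MySQL it
    // depends on the column collation). There is no notion of a better match.
    static List<String> like(List<String> names, String keyword) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (name.contains(keyword)) {
                out.add(name);
            }
        }
        return out;
    }

    // Crude relevance score: how many query terms appear in the name
    static int score(String name, String[] terms) {
        String lower = name.toLowerCase(Locale.ROOT);
        int s = 0;
        for (String t : terms) {
            if (!t.isEmpty() && lower.contains(t)) {
                s++;
            }
        }
        return s;
    }

    // Keep every name that matches at least one term, highest score first
    // (List.sort is stable, so ties keep their original order)
    static List<String> ranked(List<String> names, String query) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (score(name, terms) > 0) {
                out.add(name);
            }
        }
        out.sort(Comparator.comparingInt((String n) -> score(n, terms)).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "red mobile phone", "phone charger", "red apple", "blue phone", "desk lamp");
        System.out.println(like(names, "red phone"));   // []
        System.out.println(ranked(names, "red phone")); // [red mobile phone, phone charger, red apple, blue phone]
    }
}
```

LIKE finds nothing because no name contains "red phone" verbatim, while the ranked version returns the best match first followed by the partial matches.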

     

    Java test files:

     1. MySQL data: download link (contains the txt data and the MySQL table-creation SQL)

     2. Jar download: https://files.cnblogs.com/files/meditation5201314/lucene-lib.rar

     3. Java files:

    package com.empirefree.lucene;
    /**
     * @author Empirefree 胡宇乔
     * @version Created: 2020-03-31 17:48:13
     */
    public class Product {
        int id;
        String name;
        String category;
        float price;
        String place;

        String code;
        public int getId() {
            return id;
        }
        public void setId(int id) {
            this.id = id;
        }
        public String getName() {
            return name;
        }
        public void setName(String name) {
            this.name = name;
        }
        public String getCategory() {
            return category;
        }
        public void setCategory(String category) {
            this.category = category;
        }
        public float getPrice() {
            return price;
        }
        public void setPrice(float price) {
            this.price = price;
        }
        public String getPlace() {
            return place;
        }
        public void setPlace(String place) {
            this.place = place;
        }

        public String getCode() {
            return code;
        }
        public void setCode(String code) {
            this.code = code;
        }
        @Override
        public String toString() {
            return "Product [id=" + id + ", name=" + name + ", category=" + category + ", price=" + price + ", place="
                    + place + ", code=" + code + "]";
        }
    }
    Product.java
    package com.empirefree.lucene;

    import java.io.File;
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.io.FileUtils;

    /**
     * @author Empirefree 胡宇乔
     * @version Created: 2020-03-31 17:49:56
     */
    public class ProductUtil {
        private static final String URL = "jdbc:mysql://127.0.0.1:3306/campus_system?useUnicode=true&characterEncoding=utf-8";
        private static final String USER = "root";
        private static final String PASSWORD = "root";

        // One shared connection: opened once when the class is loaded,
        // closed explicitly with deleteconnection() when all queries are done
        private static Connection connection = null;

        static {
            try {
                // 1. Load the driver (Connector/J 8.x uses com.mysql.cj.jdbc.Driver instead)
                Class.forName("com.mysql.jdbc.Driver");
                // 2. Open the database connection
                connection = DriverManager.getConnection(URL, USER, PASSWORD);
            } catch (ClassNotFoundException | SQLException e) {
                e.printStackTrace();
            }
        }

        // Parse one comma-separated line: id,name,category,price,place,code
        public static Product lineproduct(String line) {
            Product p = new Product();
            String[] fields = line.split(",");
            p.setId(Integer.parseInt(fields[0]));
            p.setName(fields[1]);
            p.setCategory(fields[2]);
            p.setPrice(Float.parseFloat(fields[3]));
            p.setPlace(fields[4]);
            p.setCode(fields[5]);
            return p;
        }

        public static List<Product> filelist(String filename) throws IOException {
            List<String> lines = FileUtils.readLines(new File(filename), "UTF-8");
            List<Product> products = new ArrayList<>();
            for (String line : lines) {
                products.add(lineproduct(line));
            }
            return products;
        }

        // Map the current row of a java.sql.ResultSet (not com.mysql.jdbc.ResultSet) to a Product
        private static Product mapRow(ResultSet rs) throws SQLException {
            Product product = new Product();
            product.setId(rs.getInt("id"));
            product.setName(rs.getString("name"));
            product.setCategory(rs.getString("category"));
            product.setPrice(rs.getFloat("price"));
            product.setPlace(rs.getString("place"));
            product.setCode(rs.getString("code"));
            return product;
        }

        // Load every row of the product table.
        // try-with-resources closes the statement and result set but leaves the
        // shared connection open, so later queries still work.
        public static List<Product> mysqllist() {
            List<Product> products = new ArrayList<>();
            try (PreparedStatement ps = connection.prepareStatement("select * from product");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    products.add(mapRow(rs));
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
            return products;
        }

        // LIKE search on name; the keyword is bound as a parameter
        // instead of being concatenated into the SQL (avoids SQL injection)
        public static List<Product> mysqllist2(String searchname) {
            List<Product> products = new ArrayList<>();
            String sql = "select * from product where name like ?";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                ps.setString(1, "%" + searchname + "%");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        products.add(mapRow(rs));
                    }
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
            return products;
        }

        // Close the shared connection once all queries are finished
        public static void deleteconnection() throws SQLException {
            if (connection != null) {
                connection.close();
            }
        }

        public static void main(String[] args) throws IOException, SQLException {
            // List<Product> products = filelist("140k_products.txt");
            List<Product> products = mysqllist();
            for (Product p : products) {
                System.out.println(p);
            }
            deleteconnection();
        }
    }
    ProductUtil.java (the MySQL connection code, kept in a separate file so it is easy to reuse)
    package com.empirefree.lucene;
    /**
     * @author Empirefree 胡宇乔
     * @version Created: 2020-03-31 17:45:39
     */

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.List;
    import java.util.Scanner;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.wltea.analyzer.lucene.IKAnalyzer;


    public class TestLucene2 {

        // Build an in-memory index over every product in the database.
        // (RAMDirectory is deprecated in Lucene 8 and removed in 9, where ByteBuffersDirectory replaces it.)
        private static Directory createIndex(IKAnalyzer analyzer) throws IOException {
            Directory index = new RAMDirectory();
            IndexWriterConfig config = new IndexWriterConfig(analyzer);

            // List<Product> products = ProductUtil.filelist("140k_products.txt");
            List<Product> products = ProductUtil.mysqllist();
            int total = products.size();
            int count = 0;
            int oldPer = 0;
            try (IndexWriter writer = new IndexWriter(index, config)) {
                for (Product p : products) {
                    addDoc(writer, p);
                    count++;
                    int per = count * 100 / total;
                    if (per != oldPer) {
                        oldPer = per;
                        System.out.printf("Indexing %d records, progress: %d%% %n", total, per);
                    }
                }
            }
            return index;
        }

        // Copy the fields to be searched into a Lucene Document.
        // Only name is indexed; uncomment the other lines to make more fields searchable
        // (non-String values such as id and price must go through String.valueOf).
        private static void addDoc(IndexWriter w, Product p) throws IOException {
            Document doc = new Document();
            // doc.add(new TextField("id", String.valueOf(p.getId()), Field.Store.YES));
            doc.add(new TextField("name", p.getName(), Field.Store.YES));
            // doc.add(new TextField("category", p.getCategory(), Field.Store.YES));
            // doc.add(new TextField("price", String.valueOf(p.getPrice()), Field.Store.YES));
            // doc.add(new TextField("place", p.getPlace(), Field.Store.YES));
            // doc.add(new TextField("code", p.getCode(), Field.Store.YES));
            w.addDocument(doc);
        }

        // Print each hit; matched keywords in the name field are wrapped in a red <span>
        private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer) throws Exception {
            SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
            Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));

            System.out.println("Found " + hits.length + " hits.");
            System.out.println("No.\tScore\tResult");
            for (int i = 0; i < hits.length; ++i) {
                ScoreDoc scoreDoc = hits[i];
                Document d = searcher.doc(scoreDoc.doc);
                System.out.print(i + 1);
                System.out.print("\t" + scoreDoc.score);
                for (IndexableField f : d.getFields()) {
                    String text = d.get(f.name());
                    if ("name".equals(f.name())) {
                        TokenStream tokenStream = analyzer.tokenStream(f.name(), new StringReader(text));
                        String fragment = highlighter.getBestFragment(tokenStream, text);
                        // getBestFragment returns null when nothing in this field matched
                        System.out.print("\t" + (fragment != null ? fragment : text));
                    } else {
                        System.out.print("\t" + text);
                    }
                }
                System.out.println("<br>");
            }
        }

        public static void main(String[] args) throws Exception {
            // Reuse one Scanner: a second Scanner on System.in can lose buffered input
            Scanner s = new Scanner(System.in);
            System.out.print("Enter a search keyword: ");
            String keyword = s.nextLine();
            System.out.println("Current keyword: " + keyword);
            long startTime = System.currentTimeMillis();
            List<Product> products = ProductUtil.mysqllist2(keyword);
            long endTime = System.currentTimeMillis();
            // currentTimeMillis() measures milliseconds, not nanoseconds
            System.out.println("LIKE query time: " + (endTime - startTime) + " ms");

            for (Product p : products) {
                System.out.println(p.getName());
            }

            /******************************************************************************/
            // 1. Prepare the Chinese word segmenter
            IKAnalyzer analyzer = new IKAnalyzer();
            // 2. Build the index
            Directory index = createIndex(analyzer);

            // 3. Parse the query
            System.out.print("Enter a search keyword: ");
            keyword = s.nextLine();
            System.out.println("Current keyword: " + keyword);
            Query query = new QueryParser("name", analyzer).parse(keyword);

            startTime = System.currentTimeMillis();
            // 4. Search (this timing excludes building the index in step 2)
            IndexReader reader = DirectoryReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            int numberPerPage = 10;
            ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
            endTime = System.currentTimeMillis();
            System.out.println("Lucene query time: " + (endTime - startTime) + " ms");

            // 5. Show the results
            showSearchResults(searcher, hits, query, analyzer);
            // 6. Close the reader and the shared database connection
            reader.close();
            ProductUtil.deleteconnection();
        }
    }
    TestLucene2.java (database version)

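    Notes on TestLucene2.java:

    1. I extracted every Product field; if you only need to search name (or another field such as username, just change the field name), comment out the other doc.add lines.

    2. In doc.add, p.getId() is an int, so convert it to a String first (String.valueOf(p.getId())).

    3. The final results can be collected in a List and shown on the front end with EL expressions (you can also limit how many titles are displayed).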
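Note 3 can be sketched without Lucene: copy each hit into a small JavaBean and cap the list at a page size. `SearchHit`, `SearchResults` and `toPage` are hypothetical names; in the real program the scores would come from `scoreDoc.score` and the names from the highlighted fragments in showSearchResults(), already in score order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean for one search result. In a web app it should be a public
// class in its own file so that EL (e.g. ${hit.name}) can call the getters.
class SearchHit {
    private final int rank;
    private final float score;
    private final String name; // may already contain the highlight <span> markup

    SearchHit(int rank, float score, String name) {
        this.rank = rank;
        this.score = score;
        this.name = name;
    }

    public int getRank() { return rank; }
    public float getScore() { return score; }
    public String getName() { return name; }
}

class SearchResults {
    // Wrap the first `limit` hits (already sorted by score) in beans for the page
    static List<SearchHit> toPage(float[] scores, String[] names, int limit) {
        int n = Math.min(limit, Math.min(scores.length, names.length));
        List<SearchHit> page = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            page.add(new SearchHit(i + 1, scores[i], names[i]));
        }
        return page;
    }

    public static void main(String[] args) {
        float[] scores = {2.5f, 1.8f, 0.9f};
        String[] names = {"red mobile phone", "phone charger", "blue phone"};
        for (SearchHit hit : toPage(scores, names, 2)) {
            System.out.println(hit.getRank() + "\t" + hit.getScore() + "\t" + hit.getName());
        }
    }
}
```

If the list is stored as a request attribute (say `hits`, a hypothetical name), a JSP can loop over it with JSTL `<c:forEach items="${hits}" var="hit">`. Output the name with plain `${hit.name}` rather than `<c:out>`, which escapes the `<span>` markup by default.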

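      How the Lucene part works:

        1. addDoc(): copies the Product's fields into a Lucene Document so they can be searched later.

        2. createIndex(): builds the index; it calls mysqllist() to read the data from the database and addDoc() to add each record to the index.

        3. showSearchResults(): prints the hits returned by searching that index, with the keywords highlighted in red.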
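The highlighting step can be approximated with plain strings: wrap every case-insensitive occurrence of each keyword in the same `<span style='color:red'>`...`</span>` tags that the SimpleHTMLFormatter above uses. This is a simplified sketch, not Lucene's Highlighter: it does not use analyzer tokens or QueryScorer and does not pick a best fragment, and `KeywordHighlighter.highlight` is a hypothetical name. Unlike the original code it also escapes the text for HTML (in Lucene that job belongs to an Encoder such as SimpleHTMLEncoder).

```java
import java.util.Arrays;
import java.util.List;

class KeywordHighlighter {
    // Same tags as the SimpleHTMLFormatter in showSearchResults()
    static final String PRE = "<span style='color:red'>";
    static final String POST = "</span>";

    // Minimal HTML escaping so product names cannot inject markup
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap every case-insensitive occurrence of any keyword in PRE/POST.
    // Matches are marked on the raw text first, so overlapping or adjacent
    // matches merge into one span and escaping never shifts the offsets.
    static String highlight(String text, List<String> keywords) {
        boolean[] marked = new boolean[text.length()];
        for (String k : keywords) {
            if (k.isEmpty()) {
                continue;
            }
            for (int i = 0; i + k.length() <= text.length(); i++) {
                if (text.regionMatches(true, i, k, 0, k.length())) {
                    Arrays.fill(marked, i, i + k.length(), true);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Extend j over the run of characters with the same marked state
            int j = i;
            while (j < text.length() && marked[j] == marked[i]) {
                j++;
            }
            String piece = escapeHtml(text.substring(i, j));
            out.append(marked[i] ? PRE + piece + POST : piece);
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Red phone & red case", Arrays.asList("red")));
        // <span style='color:red'>Red</span> phone &amp; <span style='color:red'>red</span> case
    }
}
```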

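        In detail: createIndex() first builds an in-memory index. A plain LIKE query goes to the database every time, whereas Lucene loads the data once and then answers any number of queries from the index ("load once, query anywhere"). Most of the speedup comes from the index structure itself: Lucene keeps an inverted index that maps each term to the documents containing it, while LIKE '%keyword%' starts with a wildcard, so MySQL cannot use an ordinary B-tree index and has to examine every row.

    When the in-memory Directory is built, the Product's properties are copied into a Document, so any Product field can be searched later just by changing "name" to another field name (the code above only indexes name; uncomment the other doc.add lines to index the rest).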
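The "load once, query many times" idea can be shown with a toy inverted index. This is a sketch under simplifying assumptions, not how Lucene stores its index: documents are `Map<String, String>` field maps standing in for Lucene's Document, tokens are lower-cased whitespace-separated words (IKAnalyzer would segment Chinese instead), and a query returns the ids that match any term. `TinyIndex` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    // field name -> (term -> ids of documents whose field contains the term)
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Whitespace tokenizer standing in for IKAnalyzer
    static String[] tokenize(String text) {
        String t = text.toLowerCase(Locale.ROOT).trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Helper that builds a two-field document, like addDoc() filling a Document
    static Map<String, String> doc(String name, String place) {
        Map<String, String> d = new HashMap<>();
        d.put("name", name);
        d.put("place", place);
        return d;
    }

    // Index every field of one document (done once, while building the index)
    void add(int id, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            Map<String, Set<Integer>> terms =
                    postings.computeIfAbsent(f.getKey(), k -> new HashMap<>());
            for (String term : tokenize(f.getValue())) {
                terms.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Ids of documents whose `field` contains any query term: map lookups, no row scan
    Set<Integer> search(String field, String query) {
        Set<Integer> hits = new TreeSet<>();
        Map<String, Set<Integer>> terms = postings.getOrDefault(field, new HashMap<>());
        for (String term : tokenize(query)) {
            hits.addAll(terms.getOrDefault(term, new TreeSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, doc("red mobile phone", "Shenzhen"));
        index.add(2, doc("phone charger", "Dongguan"));
        index.add(3, doc("red apple", "Yantai"));
        // Query as often as needed, on any indexed field
        System.out.println(index.search("name", "phone"));   // [1, 2]
        System.out.println(index.search("place", "yantai")); // [3]
    }
}
```

Switching the searched field is just a different first argument, which mirrors changing "name" in the QueryParser.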

    ---------------------------------------------------- Extras --------------------------------------------------------

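    1. MySQL connection: normally you open a connection, run the query, and close it, but during development you query many times, so the connection is kept in a static field that is opened once, and calling deleteconnection() closes it.

    (See ProductUtil.java for details.)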
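The open-once, close-once pattern can be sketched generically. `SharedResource` is a hypothetical holder: it opens the resource lazily on first use, hands the same instance to every caller, and closes it only when `close()` is called, the role deleteconnection() plays for the JDBC Connection. A fake AutoCloseable stands in so the sketch runs without MySQL; in ProductUtil the supplier would wrap `DriverManager.getConnection(URL, USER, PASSWORD)`, converting its SQLException to an unchecked exception because Supplier cannot throw checked ones. In a web application, a connection pool is the more common choice.

```java
import java.util.function.Supplier;

// Hypothetical holder: opens a resource on first use, shares that one
// instance with every caller, and closes it only when asked to
class SharedResource<T extends AutoCloseable> implements AutoCloseable {
    private final Supplier<T> opener;
    private T resource; // null until the first get(), and again after close()

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    synchronized T get() {
        if (resource == null) {
            resource = opener.get();
        }
        return resource;
    }

    // Counterpart of deleteconnection(); calling it twice is harmless.
    // Checked exceptions from close() are rethrown unchecked to keep callers simple.
    @Override
    public synchronized void close() {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (Exception e) {
            throw new IllegalStateException("close failed", e);
        } finally {
            resource = null;
        }
    }
}

// Fake resource that counts how often it is opened and closed
class FakeConnection implements AutoCloseable {
    static int opened = 0;
    static int closed = 0;

    FakeConnection() {
        opened++;
    }

    @Override
    public void close() {
        closed++;
    }
}

class SharedResourceDemo {
    public static void main(String[] args) {
        SharedResource<FakeConnection> conn = new SharedResource<>(FakeConnection::new);
        FakeConnection a = conn.get();
        FakeConnection b = conn.get(); // same instance, no second open
        System.out.println(a == b);    // true
        conn.close();
        conn.close();                  // second close does nothing
        System.out.println(FakeConnection.opened + " " + FakeConnection.closed); // 1 1
    }
}
```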

    2. Importing txt data into a MySQL table:

    LOAD DATA INFILE 'E:/xxx.txt'
    REPLACE INTO TABLE test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

    (For a file saved with Windows line endings, use LINES TERMINATED BY '\r\n'.)

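    The txt data should have one record per line, with comma-separated fields in the order id,name,category,price,place,code (the order that lineproduct() in ProductUtil.java expects).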
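Because both LOAD DATA and lineproduct() split on commas, a quick pre-check of the file can catch lines that would otherwise fail to import or parse (wrong field count, non-numeric id or price). `LineCheck` is a hypothetical helper that assumes the six-field order above and uses the same `split(",")` as lineproduct(); a name that itself contains a comma would break this simple format altogether.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineCheck {
    static final int FIELDS = 6; // id,name,category,price,place,code

    // Describe the problem with one line, or return null if it looks valid.
    // Same split as lineproduct(): trailing empty fields are dropped, so a
    // missing last field shows up as a short line.
    static String check(String line) {
        String[] f = line.split(",");
        if (f.length != FIELDS) {
            return "expected " + FIELDS + " fields, got " + f.length;
        }
        try {
            Integer.parseInt(f[0]); // no trim, matching lineproduct()
        } catch (NumberFormatException e) {
            return "id is not an integer: " + f[0];
        }
        try {
            Float.parseFloat(f[3]);
        } catch (NumberFormatException e) {
            return "price is not a number: " + f[3];
        }
        return null;
    }

    // 1-based line numbers of every line that fails check()
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (check(lines.get(i)) != null) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,red mobile phone,phone,1999.0,Shenzhen,A001",
                "2,phone charger,accessory,49.9,Dongguan",
                "x,red apple,fruit,5.5,Yantai,C003");
        for (String line : lines) {
            String problem = check(line);
            System.out.println((problem == null ? "ok" : problem) + " : " + line);
        }
        System.out.println(badLines(lines)); // [2, 3]
    }
}
```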

  • Original post: https://www.cnblogs.com/meditation5201314/p/12612057.html