zoukankan      html  css  js  c++  java
  • Java团队课程设计——基于学院的搜索引擎

    团队名称、团队成员介绍、任务分配,团队成员课程设计博客链接

    姓名 成员介绍 任务分配 课程设计博客地址
    谢晓淞(组长) 团队输出主力 爬虫功能实现,Web前端设计及其后端衔接 爬虫:https://www.cnblogs.com/Rasang/p/12169420.html
    前端设计:https://www.cnblogs.com/Rasang/p/12169449.html
    康友煌 团队智力担当 Elasticsearch后台功能实现 https://www.cnblogs.com/xycm/p/12168554.html
    闫栩宁 团队颜值担当 GUI版搜索引擎实现 https://www.cnblogs.com/20000519yxn/p/12169954.html

    项目简介,涉及技术

    基于学院网站的搜索引擎,能够对学院网站的文章进行全文检索,时间范围检索,具有关键词联想功能。
    项目前身:https://www.cnblogs.com/dycsy/p/8351584.html (借鉴16学长们的课设进行重做)

    涉及技术:

    • Jsoup
    • HttpClient
    • HTML+CSS
    • Javascript/Jquery
    • Elasticsearch
    • IK analyzer
    • Java Swing
    • JSP
    • 多线程
    • Web
    • Linux

    项目git地址

    https://github.com/rasang/searchEngine

    项目git提交记录截图

    前期调查

    搜索主页界面

    搜索结果界面

    项目功能架构图、主要功能流程图

    面向对象设计类图

    爬虫类图

    Elasticsearch类图

    GUI类图

    项目运行截图或屏幕录制

    项目关键代码:模块名称-文字说明-关键代码

    爬虫模块

    SingleCrawler.java : 对单个页面进行爬取,并使用linksWriter进行数据保存

    for (Element link : links) {
        String newHref = link.attr("href");
        String httpPattern = "^http";
        Pattern p = Pattern.compile(httpPattern);
        Matcher m = p.matcher(newHref);
        if(m.find()){
            continue;
        }
        String newUrl = null;
        /**
         *  判断href是相对路径还是决定路径,以及是否是传参
         */
        if(newHref.length()>=1 &&  newHref.charAt(0)=='?') {
            newUrl = this.url.substring(0, this.url.indexOf('?')) + newHref;
        }
        else if(newHref.length()>=1 &&  newHref.charAt(0)=='/') {
            Matcher matcher = httpRegexPattern.matcher(this.url);
            if(matcher.find()) {
                String rootUrl = matcher.group(0);
                newUrl = rootUrl + newHref.substring(1);
            }
            else {
                continue;
            }
        }
        else if(newHref.length()>=1 &&  newHref.charAt(0)!='/'){
            Matcher matcher = httpRegexPattern.matcher(this.url);
            if(matcher.find()) {
                String rootUrl = matcher.group(0);
                newUrl = rootUrl + newHref;
            }
            else {
                continue;
            }
        }
        else {
            continue;
        }
        this.linksWriter.write(newUrl, doc);
    }
    

    UrlCollector.java :

    爬取菜单URL

    String url = "http://cec.jmu.edu.cn/";
    String cssSelector = "a[href~=\.jsp\?urltype=tree\.TreeTempUrl&wbtreeid=[0-9]+]";
    List<String> menu = null;
    List<String> list = null;
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
    LinksListWriter tempListWriter = new LinksListWriter();
    /**
     * 1. 第一层是Menu,先把Menu的href爬取下来
     */
    SingleCrawler menuCrawler = new SingleCrawler(url, cssSelector, httpClient, tempListWriter);
    menuCrawler.start();
    try {
        menuCrawler.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    menu = new ArrayList<String>(tempListWriter.getLinks());
    

    爬取文章URL

    SingleCrawler[] listCrawler = new SingleCrawler[menu.size()];
    for (int i = 0; i < listCrawler.length; i++) {
        String menuUrl = menu.get(i);
        if(menuUrl.contains("?")) {
            istCrawler[i] = new SingleCrawler(menu.get(i)+"&a3c=1000000&a2c=10000000", "a[href~=^info/[0-9]+/[0-9]+\.htm]", httpClient, tempListWriter);
            listCrawler[i].start();
        }
        else {
            listCrawler[i] = null;
        }
    }
    

    爬取文章内容

    SingleCrawler[] documentCrawler = new SingleCrawler[list.size()];
    for (int i = 0; i < documentCrawler.length; i++) {
        documentCrawler[i] = new SingleCrawler(list.get(i), "", httpClient, resultWriter);
        documentCrawler[i].start();
    }
    

    Web前端

    自动补全JS代码,监听search-input的input标签,当用户输入时会自动异步请求suggest.jsp页面,然后将返回的结果通过autocomplete呈现

    $(function() {
        $(".search-input").keyup(function(event) {
            var jsonData = "";
            $.ajax({
                type : "get",
                url : "suggest.jsp?term=" + document.getElementById("input").value,
                datatype : "json",
                async : true,
                error : function() {
                    console.error("Load recommand data failed!");
                },
                success : function(data) {
                    data = JSON.parse(data);
                    $(".search-input").autocomplete({
                        source : data
                    });
                }
            });
    })
    });
    

    JS实现翻页功能,主要是获取点击事件的元素,然后判断innerHTML的值:

    function turnPage(e) {
      page = e.innerHTML;
      if (page == "«") {
        var currentPage = GetQueryString("page");
        //无page传参
        if (currentPage == null) {
          AddParamVal("page", 1);
        }
        //有page传参
        else {
          currentPage = parseInt(currentPage) - 1;
          if (currentPage <= 0) currentPage = 1;
          replaceParamVal("page", currentPage);
        }
      } else if (page == "»") {
        var currentPage = GetQueryString("page");
        //无page传参
        if (currentPage == null) {
          AddParamVal("page", 2);
        }
        //有page传参
        else {
          currentPage = parseInt(currentPage) + 1;
          //if(currentPage<=0) currentPage=1;
          replaceParamVal("page", currentPage);
        }
      } else {
        var currentPage = GetQueryString("page");
        //无page传参
        if (currentPage == null) {
          AddParamVal("page", parseInt(e.innerHTML));
        }
        //有page传参
        else {
          replaceParamVal("page", parseInt(e.innerHTML));
        }
      }
    }
    

    设置JQuery日期选择插件

    $(function(){
        if(GetQueryString("timeLimit")!=null){ 
            var option=GetQueryString("timeLimit");
            document.getElementById("test6").setAttribute("placeholder",option);
        }
    })
    

    获得关键词并根据条件进行检索

    String keyword = request.getParameter("keyword");
    if(keyword!=null){
        int pageCount = request.getParameter("page")==null?1:Integer.parseInt(request.getParameter("page"));
        search = new EsSearch();
        search.inseartSearch(keyword);
        String timeLimit = request.getParameter("timeLimit");
        if(timeLimit==null){
            result = search.fullTextSerch(keyword,pageCount);
        }
        else{
            String[] time = timeLimit.split(" - ");
            result = search.rangeSerch(keyword, pageCount, time[0], time[1]);
        }
    }
    

    打印搜索结果

    if(result!=null){
        for(int i=0;i<result.size();i++){
            out.println("<div class="result-container">");
            out.println("<a href=""+result.get(i).getUrl()+"" target="_blank" class="title">"+result.get(i).getTitle()+"</a>");
            int index = result.get(i).getText().indexOf("<span style="color:red;">");
            out.println("<div class="text">"+result.get(i).getText()+"</div>");
            out.println("<div style="float: left;" class="url">"+result.get(i).getUrl()+"</div>");
            out.println("<div style="float: left;color: grey;margin-left: 30px;margin-top: 4px;">"+result.get(i).getTime()+"</div>");
            out.println("</div>");
            out.println("<div class="clear"></div>");
        }
    }
    

    Elasticsearch模块

    jest API连接elasticsearch

    public static JestClient getJestClient() {
    	if(jestClient==null) {
    		JestClientFactory factory = new JestClientFactory();  
    		factory.setHttpClientConfig(new HttpClientConfig
    			.Builder("http://127.0.0.1:9200")
    			//.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd HH:mm").create())
    			.multiThreaded(true)
    			.readTimeout(10000)
    			.build());
    		jestClient=factory.getObject();
    	}
    	return jestClient;
    }
    

    索引的创建与mapping的写入

    public EsCreatIndex() {
    	jestClient =EsClient.getJestClient();
    	try {
    		jestClient.execute(new CreateIndex.Builder(EsClient.indexName).build());
    		jestClient.execute(new CreateIndex.Builder(EsClient.suggestName).build());
    	} catch (IOException e) {
    		e.printStackTrace();
    	}
    	this.createIndexMapping();
    }
    
    
    private void createIndexMapping() {
    	String sourceIndex="{"" + EsClient.typeName + "":{"properties":{"
    		+""title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart""
    		+ ","fields":{"suggest":{"type":"completion","analyzer":"ik_max_word","search_analyzer":"ik_smart"}}}"
    		+","text":{"type":"text","index":"true","analyzer":"ik_max_word","search_analyzer":"ik_smart"}"
    		+","url":{"type":"keyword"}"
    		+","time":{"type":"date"}"
    		+ "}}}";
    	PutMapping putMappingIndex=new PutMapping.Builder(EsClient.indexName, EsClient.typeName, sourceIndex).build();
    		
    	String sourceSuggest="{"" + EsClient.typeName + "":{"properties":{"
    		+""text":{"type":"completion","analyzer":"ik_max_word","search_analyzer":"ik_smart"}"
    		+ "}}}";
    	PutMapping putMappingSuggest=new PutMapping.Builder(EsClient.suggestName, EsClient.typeName, sourceSuggest).build();
    		
    	try {
    		jestClient.execute(putMappingIndex);
    		jestClient.execute(putMappingSuggest);
    	} catch (IOException e) {
    		e.printStackTrace();
    	}
    }
    

    插入数据

    public void bulkIndex(List<SearchResultEntry> list) throws IOException {
    	Bulk.Builder bulk=new Bulk.Builder();
    	for(SearchResultEntry e:list) {
    		Index index=new Index.Builder(e).index(EsClient.indexName).type(EsClient.typeName).build();
    		bulk.addAction(index);
    	}
    	jestClient.execute(bulk.build());
    }
    
    public void insertIndex(SearchResultEntry webpage) throws IOException {
    	Index index=new Index.Builder(webpage).index(EsClient.indexName).type(EsClient.typeName).build();
    	jestClient.execute(index);
    }
    

    删除数据

    public void deleteIndex() throws IOException {
    	jestClient.execute(new DeleteIndex.Builder(EsClient.indexName).build());
    }
    

    根据页数进行全文检索

    public List<SearchResultEntry> fullTextSerch(String queryString,int page) {
    	//声明一个搜索请求体
    	SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
    		
    	BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
    	boolQueryBuilder.must(QueryBuilders.queryStringQuery(queryString));
    		
    	if(this.isDateRangeQuery) {
    		QueryBuilder queryBuilder = QueryBuilders  
    			.rangeQuery("time")  
    			.gte(this.startDate)  
    			.lte(this.closingDate)  
    			.includeLower(true)  
    			.includeUpper(true);
    				/**区间查询*/ 
    		boolQueryBuilder=boolQueryBuilder.filter(queryBuilder);
    	}
    		
    	searchSourceBuilder.query(boolQueryBuilder);
    		
    	//设置高亮字段
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.field("text");
        highlightBuilder.preTags("<span style="color:red;">").postTags("</span>");
        highlightBuilder.fragmentSize(200);
        searchSourceBuilder.highlighter(highlightBuilder);
    		
    	//设置分页
        searchSourceBuilder.from((page-1)*10);
        searchSourceBuilder.size(10);
            
        //构建Search对象
        Search search=new Search.Builder(searchSourceBuilder.toString())
        	.addIndex(EsClient.indexName)
        	.addType(EsClient.typeName)
        	.build();
        SearchResult searchResult = null;
        try {
            searchResult = jestClient.execute(search);
        } catch (IOException e) {
            e.printStackTrace();
        }
        resultNum=searchResult.getTotal();
        return this.storageList(searchResult);
    }
    

    根据日期进行检索

    public List<SearchResultEntry> rangeSerch(String queryString,int page,String startDate,String closingDate){
    	this.isDateRangeQuery=true;
    	this.startDate=startDate;
    	this.closingDate=closingDate;
    	List<SearchResultEntry> list=this.fullTextSerch(queryString,page);
    	this.isDateRangeQuery=false;
    	return list;
    }
    

    GUI模块

    根据页数刷新页面

    private void displayResult(){
           resultJpanel.removeAll();
           resultJpanel.setLayout(new GridLayout(2, 1));
           resultJpanel.add(resultList.get(currentPage*2-2));
           if(currentPage+currentPage <= resultNum){
               resultJpanel.add(resultList.get(currentPage*2-1));
           }
           resultJpanel.revalidate();
           resultJpanel.repaint();
           page.setText(currentPage+"/"+pageNum);
       }
    

    显示出结果后,点击跳转按钮可以用默认浏览器打开原网页

    public void actionPerformed(ActionEvent e) {
        if(Desktop.isDesktopSupported()){
            try {
                URI uri=URI.create(url);
                Desktop dp=Desktop.getDesktop();
                if(dp.isSupported(Desktop.Action.BROWSE)){
                    dp.browse(uri);
                }
            } catch (Exception o) {
                o.printStackTrace();
            }
        }
    }
    

    用List保存结果页面

    private List<JPanel> getJpanelList(List<SearchResultEntry> list) { 
           List<JPanel> resultList = new ArrayList<>();
           for(SearchResultEntry e:list){
               JPanel jPanel=new SearchLook(e);
               resultList.add(jPanel);
           }
           return resultList;
       }
    

    项目代码扫描结果及改正

    扫描结果

    1.对if-else添加完整的大括号,更正前:

    更正后:

    2.没有添加作者,更正前:

    更正后:

    3.覆盖方法没有进行注解,更正前:

    更正后:

    项目总结

    又是一年的课程设计,通过这次课程设计,学到了很多知识,Elasticsearch,JS,GUI,无一不强化了我们的编程技能。可惜的是关键词模糊推荐的功能还不是很好用,没能优化好这个功能。Web页面没有对手机进行适应,GUI也略有不足,希望将来如果有学弟学妹继续这个项目的时候能优化这些缺憾。

  • 相关阅读:
    【面积并】 Atlantis
    【动态前k大 贪心】 Gone Fishing
    【复杂枚举】 library
    【双端队列bfs 网格图建图】拯救大兵瑞恩
    【奇偶传递关系 边带权】 奇偶游戏
    【权值并查集】 supermarket
    CF w4d3 A. Pythagorean Theorem II
    CF w4d2 C. Purification
    CF w4d2 B. Road Construction
    CF w4d2 A. Cakeminator
  • 原文地址:https://www.cnblogs.com/Rasang/p/12169899.html
Copyright © 2011-2022 走看看