  • Java Team Course Design: A Search Engine for the College Website

    Team name, member introductions, task assignment, and links to each member's course-design blog

    Name | Introduction | Task assignment | Course-design blog
    谢晓淞 (team leader) | Main output contributor | Crawler implementation; web front-end design and its back-end integration | Crawler: https://www.cnblogs.com/Rasang/p/12169420.html / Front end: https://www.cnblogs.com/Rasang/p/12169449.html
    康友煌 | The team's brains | Elasticsearch back-end implementation | https://www.cnblogs.com/xycm/p/12168554.html
    闫栩宁 | The team's good looks | GUI version of the search engine | https://www.cnblogs.com/20000519yxn/p/12169954.html

    Project overview and technologies used

    A search engine based on the college website. It supports full-text search over the site's articles, date-range filtering, and keyword suggestion (autocomplete).
    Project predecessor: https://www.cnblogs.com/dycsy/p/8351584.html (a rework based on the 2016 seniors' course design)

    Technologies used:

    • Jsoup
    • HttpClient
    • HTML+CSS
    • Javascript/Jquery
    • Elasticsearch
    • IK analyzer
    • Java Swing
    • JSP
    • Multithreading
    • Web
    • Linux

    Project Git repository

    https://github.com/rasang/searchEngine

    Screenshot of the project's Git commit history

    Preliminary investigation

    Search home page

    Search results page

    Functional architecture diagram and main workflow diagrams

    Object-oriented design class diagrams

    Crawler class diagram

    Elasticsearch class diagram

    GUI class diagram

    Project running screenshots or screen recording

    Key project code: module name - description - key code

    Crawler module

    SingleCrawler.java: crawls a single page and saves the extracted links through linksWriter

    for (Element link : links) {
        String newHref = link.attr("href");
        // skip hrefs that already start with http; only relative links are expanded below
        String httpPattern = "^http";
        Pattern p = Pattern.compile(httpPattern);
        Matcher m = p.matcher(newHref);
        if(m.find()){
            continue;
        }
        String newUrl = null;
        /**
         *  Determine whether href is a relative or an absolute path, or a query-string link
         */
        if(newHref.length()>=1 &&  newHref.charAt(0)=='?') {
            newUrl = this.url.substring(0, this.url.indexOf('?')) + newHref;
        }
        else if(newHref.length()>=1 &&  newHref.charAt(0)=='/') {
            Matcher matcher = httpRegexPattern.matcher(this.url);
            if(matcher.find()) {
                String rootUrl = matcher.group(0);
                newUrl = rootUrl + newHref.substring(1);
            }
            else {
                continue;
            }
        }
        else if(newHref.length()>=1 &&  newHref.charAt(0)!='/'){
            Matcher matcher = httpRegexPattern.matcher(this.url);
            if(matcher.find()) {
                String rootUrl = matcher.group(0);
                newUrl = rootUrl + newHref;
            }
            else {
                continue;
            }
        }
        else {
            continue;
        }
        this.linksWriter.write(newUrl, doc);
    }
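
    The linksWriter used above is not shown in this excerpt. A minimal sketch of what such a writer abstraction might look like (the interface and field names are assumptions, not the project's actual code); LinksListWriter matches the name used in UrlCollector.java below:

    import java.util.Collections;
    import java.util.LinkedHashSet;
    import java.util.Set;
    import org.jsoup.nodes.Document;

    // Hypothetical sketch: a writer that simply collects the URLs discovered by a crawler.
    interface LinksWriter {
        void write(String url, Document doc);
    }

    class LinksListWriter implements LinksWriter {
        // thread-safe, since several SingleCrawler threads may write concurrently
        private final Set<String> links = Collections.synchronizedSet(new LinkedHashSet<String>());

        @Override
        public void write(String url, Document doc) {
            links.add(url); // only the URL is kept; the page content is ignored by this writer
        }

        public Set<String> getLinks() {
            return links;
        }
    }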
    

    UrlCollector.java :

    Crawling the menu URLs

    String url = "http://cec.jmu.edu.cn/";
    String cssSelector = "a[href~=\\.jsp\\?urltype=tree\\.TreeTempUrl&wbtreeid=[0-9]+]";
    List<String> menu = null;
    List<String> list = null;
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
    LinksListWriter tempListWriter = new LinksListWriter();
    /**
     * 1. The first level is the menu: crawl the menu hrefs first
     */
    SingleCrawler menuCrawler = new SingleCrawler(url, cssSelector, httpClient, tempListWriter);
    menuCrawler.start();
    try {
        menuCrawler.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    menu = new ArrayList<String>(tempListWriter.getLinks());
    

    Crawling the article URLs

    SingleCrawler[] listCrawler = new SingleCrawler[menu.size()];
    for (int i = 0; i < listCrawler.length; i++) {
        String menuUrl = menu.get(i);
        if(menuUrl.contains("?")) {
            listCrawler[i] = new SingleCrawler(menu.get(i)+"&a3c=1000000&a2c=10000000", "a[href~=^info/[0-9]+/[0-9]+\\.htm]", httpClient, tempListWriter);
            listCrawler[i].start();
        }
        else {
            listCrawler[i] = null;
        }
    }
    }
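
    The snippet above only starts the list crawlers; how their results end up in the list variable used below is not shown. Presumably the threads are joined and the writer's collected links are copied over, roughly like this (a sketch under that assumption, not the project's exact code):

    // Wait for every started list crawler to finish, then collect the article links.
    for (SingleCrawler crawler : listCrawler) {
        if (crawler != null) {
            try {
                crawler.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    list = new ArrayList<String>(tempListWriter.getLinks());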
    

    Crawling the article content

    SingleCrawler[] documentCrawler = new SingleCrawler[list.size()];
    for (int i = 0; i < documentCrawler.length; i++) {
        documentCrawler[i] = new SingleCrawler(list.get(i), "", httpClient, resultWriter);
        documentCrawler[i].start();
    }
    

    Web front end

    Autocomplete JS: it listens on the search-input input element; whenever the user types, it asynchronously requests suggest.jsp and feeds the returned result into the jQuery UI autocomplete widget

    $(function() {
        $(".search-input").keyup(function(event) {
            $.ajax({
                type : "get",
                url : "suggest.jsp?term=" + document.getElementById("input").value,
                dataType : "json",
                async : true,
                error : function() {
                    console.error("Load recommend data failed!");
                },
                success : function(data) {
                    // data is already parsed as JSON by jQuery (dataType "json")
                    $(".search-input").autocomplete({
                        source : data
                    });
                }
            });
        });
    });
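
    The suggest.jsp page itself is not shown in this post. One plausible way to back it (a sketch only; the class, method, and suggestion names below are assumptions) is to run an Elasticsearch completion suggester against the suggest index through Jest and return the option texts:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonElement;
    import io.searchbox.core.Search;
    import io.searchbox.core.SearchResult;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.suggest.SuggestBuilder;
    import org.elasticsearch.search.suggest.SuggestBuilders;

    public class SuggestBackend {
        // Return up to 10 completion suggestions for the given prefix.
        public static List<String> suggest(String term) throws IOException {
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.suggest(new SuggestBuilder().addSuggestion("word-suggest",
                    SuggestBuilders.completionSuggestion("text").prefix(term).size(10)));

            Search search = new Search.Builder(sourceBuilder.toString())
                    .addIndex(EsClient.suggestName)
                    .addType(EsClient.typeName)
                    .build();
            SearchResult result = EsClient.getJestClient().execute(search);

            // Walk the raw JSON response: suggest -> word-suggest -> [0] -> options -> text
            List<String> suggestions = new ArrayList<>();
            JsonArray options = result.getJsonObject()
                    .getAsJsonObject("suggest")
                    .getAsJsonArray("word-suggest").get(0).getAsJsonObject()
                    .getAsJsonArray("options");
            for (JsonElement option : options) {
                suggestions.add(option.getAsJsonObject().get("text").getAsString());
            }
            return suggestions;
        }
    }

    suggest.jsp would then serialize this list to a JSON array (for example with Gson) and write it to the response, which is what the autocomplete widget above consumes.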
    

    JS pagination: get the element that triggered the click event and branch on its innerHTML value:

    function turnPage(e) {
      var page = e.innerHTML;
      if (page == "«") {
        var currentPage = GetQueryString("page");
        // no page parameter in the URL yet
        if (currentPage == null) {
          AddParamVal("page", 1);
        }
        // page parameter already present
        else {
          currentPage = parseInt(currentPage) - 1;
          if (currentPage <= 0) currentPage = 1;
          replaceParamVal("page", currentPage);
        }
      } else if (page == "»") {
        var currentPage = GetQueryString("page");
        // no page parameter in the URL yet
        if (currentPage == null) {
          AddParamVal("page", 2);
        }
        // page parameter already present
        else {
          currentPage = parseInt(currentPage) + 1;
          //if(currentPage<=0) currentPage=1;
          replaceParamVal("page", currentPage);
        }
      } else {
        var currentPage = GetQueryString("page");
        // no page parameter in the URL yet
        if (currentPage == null) {
          AddParamVal("page", parseInt(e.innerHTML));
        }
        // page parameter already present
        else {
          replaceParamVal("page", parseInt(e.innerHTML));
        }
      }
    }
    

    Configuring the jQuery date-picker plugin

    $(function(){
        if(GetQueryString("timeLimit")!=null){ 
            var option=GetQueryString("timeLimit");
            document.getElementById("test6").setAttribute("placeholder",option);
        }
    })
    

    Get the keyword and run the search according to the given conditions

    String keyword = request.getParameter("keyword");
    if(keyword!=null){
        int pageCount = request.getParameter("page")==null?1:Integer.parseInt(request.getParameter("page"));
        search = new EsSearch();
        search.inseartSearch(keyword);
        String timeLimit = request.getParameter("timeLimit");
        if(timeLimit==null){
            result = search.fullTextSerch(keyword,pageCount);
        }
        else{
            String[] time = timeLimit.split(" - ");
            result = search.rangeSerch(keyword, pageCount, time[0], time[1]);
        }
    }
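
    The inseartSearch call above is not shown anywhere else in the post. A guess at what it might do (a hedged sketch, not the project's confirmed code): store the query term into the suggestion index, whose mapping gives text the completion type, so that past queries feed the autocomplete.

    // Hypothetical sketch of EsSearch.inseartSearch: record the query term in the
    // suggestion index so the completion suggester can offer it later.
    // (uses java.util.HashMap/Map, io.searchbox.core.Index, java.io.IOException)
    public void inseartSearch(String keyword) {
        Map<String, String> source = new HashMap<>();
        source.put("text", keyword); // "text" is the completion-typed field in the suggest mapping
        Index index = new Index.Builder(source)
                .index(EsClient.suggestName)
                .type(EsClient.typeName)
                .build();
        try {
            jestClient.execute(index);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }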
    

    Printing the search results

    if(result!=null){
        for(int i=0;i<result.size();i++){
            out.println("<div class=\"result-container\">");
            out.println("<a href=\""+result.get(i).getUrl()+"\" target=\"_blank\" class=\"title\">"+result.get(i).getTitle()+"</a>");
            int index = result.get(i).getText().indexOf("<span style=\"color:red;\">");
            out.println("<div class=\"text\">"+result.get(i).getText()+"</div>");
            out.println("<div style=\"float: left;\" class=\"url\">"+result.get(i).getUrl()+"</div>");
            out.println("<div style=\"float: left;color: grey;margin-left: 30px;margin-top: 4px;\">"+result.get(i).getTime()+"</div>");
            out.println("</div>");
            out.println("<div class=\"clear\"></div>");
        }
    }
    

    Elasticsearch module

    Connecting to Elasticsearch through the Jest API

    public static JestClient getJestClient() {
    	if(jestClient==null) {
    		JestClientFactory factory = new JestClientFactory();  
    		factory.setHttpClientConfig(new HttpClientConfig
    			.Builder("http://127.0.0.1:9200")
    			//.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd HH:mm").create())
    			.multiThreaded(true)
    			.readTimeout(10000)
    			.build());
    		jestClient=factory.getObject();
    	}
    	return jestClient;
    }
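
    The EsClient fields referenced throughout (jestClient, indexName, typeName, suggestName) are not shown in the post; presumably they are simple static constants living next to getJestClient(). For example (the values below are placeholders, not necessarily the project's actual index names):

    import io.searchbox.client.JestClient;

    public class EsClient {
        private static JestClient jestClient;                 // shared client built lazily by getJestClient()
        public static final String indexName   = "webpage";   // placeholder: index holding crawled articles
        public static final String suggestName = "suggest";   // placeholder: index backing keyword completion
        public static final String typeName    = "doc";       // placeholder: mapping type name

        // getJestClient() as shown above would live here as well.
    }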
    

    Creating the indices and writing the mappings

    public EsCreatIndex() {
    	jestClient =EsClient.getJestClient();
    	try {
    		jestClient.execute(new CreateIndex.Builder(EsClient.indexName).build());
    		jestClient.execute(new CreateIndex.Builder(EsClient.suggestName).build());
    	} catch (IOException e) {
    		e.printStackTrace();
    	}
    	this.createIndexMapping();
    }
    
    
    private void createIndexMapping() {
        String sourceIndex = "{\"" + EsClient.typeName + "\":{\"properties\":{"
            + "\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\""
            + ",\"fields\":{\"suggest\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}}}"
            + ",\"text\":{\"type\":\"text\",\"index\":\"true\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
            + ",\"url\":{\"type\":\"keyword\"}"
            + ",\"time\":{\"type\":\"date\"}"
            + "}}}";
        PutMapping putMappingIndex = new PutMapping.Builder(EsClient.indexName, EsClient.typeName, sourceIndex).build();

        String sourceSuggest = "{\"" + EsClient.typeName + "\":{\"properties\":{"
            + "\"text\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
            + "}}}";
        PutMapping putMappingSuggest = new PutMapping.Builder(EsClient.suggestName, EsClient.typeName, sourceSuggest).build();
    		
    	try {
    		jestClient.execute(putMappingIndex);
    		jestClient.execute(putMappingSuggest);
    	} catch (IOException e) {
    		e.printStackTrace();
    	}
    }
    

    Inserting data

    public void bulkIndex(List<SearchResultEntry> list) throws IOException {
    	Bulk.Builder bulk=new Bulk.Builder();
    	for(SearchResultEntry e:list) {
    		Index index=new Index.Builder(e).index(EsClient.indexName).type(EsClient.typeName).build();
    		bulk.addAction(index);
    	}
    	jestClient.execute(bulk.build());
    }
    
    public void insertIndex(SearchResultEntry webpage) throws IOException {
    	Index index=new Index.Builder(webpage).index(EsClient.indexName).type(EsClient.typeName).build();
    	jestClient.execute(index);
    }
    

    Deleting data

    public void deleteIndex() throws IOException {
    	jestClient.execute(new DeleteIndex.Builder(EsClient.indexName).build());
    }
    

    Full-text search with pagination

    public List<SearchResultEntry> fullTextSerch(String queryString,int page) {
        // build the search request body
    	SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
    		
    	BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
    	boolQueryBuilder.must(QueryBuilders.queryStringQuery(queryString));
    		
        if(this.isDateRangeQuery) {
            // range query: restrict results to the requested date interval
            QueryBuilder queryBuilder = QueryBuilders
                .rangeQuery("time")
                .gte(this.startDate)
                .lte(this.closingDate)
                .includeLower(true)
                .includeUpper(true);
            boolQueryBuilder = boolQueryBuilder.filter(queryBuilder);
        }
    		
    	searchSourceBuilder.query(boolQueryBuilder);
    		
        // configure the highlighted fields
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.field("text");
        highlightBuilder.preTags("<span style=\"color:red;\">").postTags("</span>");
        highlightBuilder.fragmentSize(200);
        searchSourceBuilder.highlighter(highlightBuilder);
    		
        // pagination: 10 results per page
        searchSourceBuilder.from((page-1)*10);
        searchSourceBuilder.size(10);
            
        // build the Jest Search object
        Search search=new Search.Builder(searchSourceBuilder.toString())
        	.addIndex(EsClient.indexName)
        	.addType(EsClient.typeName)
        	.build();
        SearchResult searchResult = null;
        try {
            searchResult = jestClient.execute(search);
        } catch (IOException e) {
            e.printStackTrace();
        }
        resultNum=searchResult.getTotal();
        return this.storageList(searchResult);
    }
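
    The storageList helper used above is not included in the post. A minimal sketch of what it might do, assuming SearchResultEntry exposes setTitle/setText setters: read the hits from the Jest SearchResult and, where highlight fragments exist, substitute them for the stored title and text.

    private List<SearchResultEntry> storageList(SearchResult searchResult) {
        List<SearchResultEntry> entries = new ArrayList<>();
        if (searchResult == null || !searchResult.isSucceeded()) {
            return entries;
        }
        for (SearchResult.Hit<SearchResultEntry, Void> hit
                : searchResult.getHits(SearchResultEntry.class)) {
            SearchResultEntry entry = hit.source;
            Map<String, List<String>> highlight = hit.highlight;
            if (highlight != null) {
                if (highlight.containsKey("title")) {
                    entry.setTitle(highlight.get("title").get(0));
                }
                if (highlight.containsKey("text")) {
                    entry.setText(String.join("", highlight.get("text")));
                }
            }
            entries.add(entry);
        }
        return entries;
    }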
    

    Date-range search

    public List<SearchResultEntry> rangeSerch(String queryString,int page,String startDate,String closingDate){
    	this.isDateRangeQuery=true;
    	this.startDate=startDate;
    	this.closingDate=closingDate;
    	List<SearchResultEntry> list=this.fullTextSerch(queryString,page);
    	this.isDateRangeQuery=false;
    	return list;
    }
    

    GUI module

    Refreshing the displayed results according to the current page number

    private void displayResult(){
        resultJpanel.removeAll();
        // two results per page, stacked vertically
        resultJpanel.setLayout(new GridLayout(2, 1));
        resultJpanel.add(resultList.get(currentPage*2-2));
        // add the second panel only if that many results actually exist
        if(currentPage*2 <= resultNum){
            resultJpanel.add(resultList.get(currentPage*2-1));
        }
        resultJpanel.revalidate();
        resultJpanel.repaint();
        page.setText(currentPage+"/"+pageNum);
    }
    

    Once results are displayed, clicking the jump button opens the original web page in the default browser

    public void actionPerformed(ActionEvent e) {
        if(Desktop.isDesktopSupported()){
            try {
                URI uri=URI.create(url);
                Desktop dp=Desktop.getDesktop();
                if(dp.isSupported(Desktop.Action.BROWSE)){
                    dp.browse(uri);
                }
            } catch (Exception o) {
                o.printStackTrace();
            }
        }
    }
    

    Storing the result panels in a List

    private List<JPanel> getJpanelList(List<SearchResultEntry> list) { 
           List<JPanel> resultList = new ArrayList<>();
           for(SearchResultEntry e:list){
               JPanel jPanel=new SearchLook(e);
               resultList.add(jPanel);
           }
           return resultList;
       }
    

    Code scan results and fixes

    Scan results

    1. if-else statements should use full braces. Before:

    After:

    2. Missing author tag in the class comment. Before:

    After:

    3. Overriding methods missing the @Override annotation. Before:

    After:
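
    Since the before/after screenshots are not reproduced here, an illustrative snippet of the three kinds of fixes (illustration only, not the project's actual code):

    /**
     * Illustrative example only: full braces on if-else, an author tag in the
     * class comment, and @Override on overriding methods.
     *
     * @author searchEngine team
     */
    public class ScanFixExample extends Thread {

        @Override            // fix 3: annotate overriding methods
        public void run() {
            int page = 0;
            if (page <= 0) { // fix 1: every branch wrapped in explicit braces
                page = 1;
            } else {
                page = page - 1;
            }
            System.out.println("page = " + page);
        }
    }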

    Project summary

    Another year, another course design. We learned a great deal from this one: Elasticsearch, JavaScript, GUI programming, each of which sharpened our skills. Regrettably, the fuzzy keyword suggestion feature is still not very usable; we did not manage to tune it well. The web pages are not adapted for mobile, and the GUI also has some rough edges. We hope that any juniors who continue this project in the future can fix these shortcomings.

  • Original post: https://www.cnblogs.com/Rasang/p/12169899.html