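  • Full-Text Search for Community Posts in Practice (with ElasticSearch)

    We need to provide full-text search over the posts of a community app, and are evaluating ElasticSearch (ES) for the job.

    Installing ES is not covered here.

    • Integrate a Chinese analyzer into ES (pick the plugin version that matches your ES version)

      Download the source: https://github.com/medcl/elasticsearch-analysis-ik
      Build it with Maven to get: elasticsearch-analysis-ik-1.9.5.zip

      Create an ik directory under plugins and unzip elasticsearch-analysis-ik-1.9.5.zip into it. Restart ES afterwards so the plugin is loaded.

    • Create the index (settings, mapping)

      Configuration (saved as post.json)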

    {
        "settings":{
            "number_of_shards":5,
            "number_of_replicas":1
        },
        "mappings":{
            "post":{
                "dynamic":"strict",
                "properties":{
                    "id":{"type":"integer","store":"yes"},
                    "title":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                    "content":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                    "author":{"type":"string","store":"yes","index":"no"},
                    "time":{"type":"date","store":"yes","index":"no"}
                }
            }
        }
    }

      Run the following command to create the index

      curl -XPOST 'spark2:9200/community' -d @post.json

    • Insert data

      JAR dependencies of the project

    pom.xml
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <version>2.3.3</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.2.7</version>
    </dependency>

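    ES client utility class

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    // Holds one shared TransportClient for the whole application.
    public class EsClient {

      private static TransportClient transportClient;

      static {
        // cluster.name must match the name configured on the ES nodes.
        Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
        try {
          // 9300 is the transport port used by the Java client (9200 is the HTTP port).
          transportClient = new TransportClient.Builder().settings(settings)
              .build()
              .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
              .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
        } catch (UnknownHostException e) {
          throw new RuntimeException(e);
        }
      }

      public static TransportClient getInstance() {
        return transportClient;
      }
    }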

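    Inserting the data

    import java.util.Date;

    import com.alibaba.fastjson.JSON;

    // Post is the project's own POJO for a post; its definition is not shown here.
    TransportClient client = EsClient.getInstance();

    // Index 10,000 copies of the same sample post under ids 0..9999.
    for (int i = 0; i < 10000; i++) {
      Post post = new Post(String.valueOf(i), "hll", "百度百科", "ES即etamsports ,全名上海英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松", new Date());
      // Serialize the post with fastjson and index it as community/post/{id}.
      client.prepareIndex("community", "post", post.getId())
          .setSource(JSON.toJSONString(post))
          .execute()
          .actionGet();
    }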
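
    The loop above relies on a Post class that is not shown. Below is a minimal sketch of what it might look like, assuming the constructor order (id, author, title, content, time) implied by the call and by the field names in the indexed document. The public getters are what a JSON serializer such as fastjson would read. This is a guess at the shape of the class, not the author's original; PostDemo exists only to exercise it.

```java
import java.util.Date;

// Hypothetical POJO for a community post; in a real project this would
// normally be a public class in its own Post.java file.
final class Post {
    private final String id;
    private final String author;
    private final String title;
    private final String content;
    private final Date time;

    // Argument order is assumed from the call: new Post(id, author, title, content, time).
    Post(String id, String author, String title, String content, Date time) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.content = content;
        this.time = time;
    }

    // Bean-style getters, one per field declared in the index mapping.
    public String getId() { return id; }
    public String getAuthor() { return author; }
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public Date getTime() { return time; }
}

final class PostDemo {
    public static void main(String[] args) {
        Post post = new Post("1", "hll", "sample title", "sample content", new Date());
        System.out.println(post.getId() + " by " + post.getAuthor() + ": " + post.getTitle());
    }
}
```

    Note that the id is kept as a String because it is passed to prepareIndex as the document id, while the mapping declares the id field as integer; ES coerces numeric strings for that field.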
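    • Query with highlighting

    import java.util.Map;

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.text.Text;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.elasticsearch.search.highlight.HighlightField;

    TransportClient client = EsClient.getInstance();
    // Search title and content for "上海", return the first 10 hits,
    // and wrap matched terms in content with <red>...</red>.
    SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content"))
        .setFrom(0).setSize(10)
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute()
        .actionGet();

    SearchHits hits = response.getHits();
    for (SearchHit hit : hits) {
      System.out.println(hit.getHighlightFields());
      Map<String, Object> source = hit.getSource();
      // A hit that matched only on title has no content highlight, so check for null.
      HighlightField highlight = hit.getHighlightFields().get("content");
      if (highlight != null) {
        StringBuilder sb = new StringBuilder();
        for (Text text : highlight.getFragments()) {
          sb.append(text.string());
        }
        // Replace the raw content with the highlighted fragments.
        source.put("content", sb.toString());
      }
      System.out.println(source);
    }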
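
    The query above always asks for the first page (from 0, size 10). A small sketch of how the from offset could be derived for later pages is given here. Paging and fromOffset are hypothetical names, the 1-based page number is an assumption, and the 10,000 limit reflects the default of the index.max_result_window setting, which rejects requests where from + size exceeds it.

```java
// Hypothetical helper that turns a 1-based page number into the "from"
// offset expected by setFrom(...), guarding against deep paging.
final class Paging {
    // Default value of index.max_result_window; adjust if the index overrides it.
    static final int MAX_RESULT_WINDOW = 10_000;

    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // Use long arithmetic so very large page numbers cannot overflow.
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds the result window");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(Paging.fromOffset(1, 10)); // prints 0
        System.out.println(Paging.fromOffset(3, 10)); // prints 20
    }
}
```

    With such a helper, the call would become .setFrom(Paging.fromOffset(page, 10)).setSize(10). For iterating over all 10,000 test documents, a scroll request is usually a better fit than deep paging.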

    Query result

    {author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松}
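
    The time field in the result comes back as epoch milliseconds, which is how the Date was serialized when the document was indexed. A minimal sketch of turning that value into a readable timestamp with java.time follows. EpochTime and format are hypothetical names, and UTC is an arbitrary choice of zone for the example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for displaying the epoch-millisecond "time" field.
final class EpochTime {
    // Formats epoch milliseconds as an ISO-8601 timestamp in the given zone.
    static String format(long epochMillis, ZoneId zone) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME
            .format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    public static void main(String[] args) {
        // Value taken from the query result above.
        System.out.println(EpochTime.format(1490165237878L, ZoneOffset.UTC));
    }
}
```

    Passing ZoneId.of("Asia/Shanghai") instead of ZoneOffset.UTC would render the same instant in local time for a Chinese audience.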

     
  • Original post: https://www.cnblogs.com/huangll99/p/6600008.html