    Solr Study Notes 5: Components and Handlers

    I. Search

    Spell checking (spellCheck)

    Purpose: checks whether the search terms the user entered exist in the index; if they do not, it suggests close or similar terms.

    Configuration: add the following to solrconfig.xml

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">  
     <lst name="spellchecker">  
       <str name="name">default</str>  
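       <!-- The field whose indexed terms the spell checker is built from; here, the field named Title -->
       <str name="field">Title</str>
       <!-- Directory that holds the spell-check index -->
       <str name="spellcheckIndexDir">spellchecker</str>
       <!-- Build the spell-check index on every commit (spell checking only works once the index has been built) -->
       <!-- Alternatively, build it on optimize by replacing "buildOnCommit" with "buildOnOptimize" -->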
        <str name="buildOnCommit">true</str>  
      </lst>  
    </searchComponent> 
    
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">  
      <!-- Default parameters -->
      <lst name="defaults">  
        <str name="spellcheck.onlyMorePopular">false</str>  
        <str name="spellcheck.extendedResults">false</str>  
        <!-- Number of spell-check suggestions to return (increase as needed) -->
        <str name="spellcheck.count">1</str>  
      </lst>  
      <arr name="last-components">  
        <str>spellcheck</str>  
      </arr>  
    </requestHandler>

    Example:

    http://localhost:8080/solr/collection1/spell?q=Title:tests&spellcheck=true

    The request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">119</int>
      </lst>
      <result name="response" start="0" numFound="0"/>
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="tests">
            <int name="numFound">1</int>
            <int name="startOffset">6</int>
            <int name="endOffset">11</int>
            <int name="origFreq">0</int>
            <arr name="suggestion">
              <lst>
                <str name="word">test</str>
                <int name="freq">6</int>
              </lst>
            </arr>
          </lst>
          <bool name="correctlySpelled">false</bool>
          <lst name="collation">
            <str name="collationQuery">Title:test</str>
            <int name="hits">6</int>
            <lst name="misspellingsAndCorrections">
              <str name="tests">test</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </response>
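The suggestions in a spell-check response like the one above can be read out with a few lines of client code. The following is a minimal sketch using Python's standard library; the function name `extract_suggestions` and the trimmed `SAMPLE` string are illustrative assumptions, and the parser assumes the response shape shown above, where each suggestion carries a `word` and a `freq`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the /spell response shown above (illustrative sample data).
SAMPLE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tests">
        <int name="numFound">1</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">test</str>
            <int name="freq">6</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">Title:test</str>
      </lst>
    </lst>
  </lst>
</response>"""


def extract_suggestions(xml_text):
    """Map each misspelled term to a list of (word, freq) suggestions."""
    root = ET.fromstring(xml_text)
    result = {}
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for term in root.findall(path):
        # The collation entry sits next to the terms but is not a term itself.
        if term.get("name") == "collation":
            continue
        words = []
        for entry in term.findall("./arr[@name='suggestion']/lst"):
            word = entry.findtext("./str[@name='word']")
            freq = int(entry.findtext("./int[@name='freq']"))
            words.append((word, freq))
        result[term.get("name")] = words
    return result


suggestions = extract_suggestions(SAMPLE)
print(suggestions)
```

A caller could then offer the highest-frequency word for each term as the "did you mean" hint.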

    Search suggestions (suggest)

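    Purpose: search suggestions offer a list of hints as soon as the user types a search term, and recommend the first similar term that appears. If nothing related to the input exists, no hint is shown. In practice, suggest fits while the user is typing, and spell checking fits after the user submits the search; combining the two gives a better user experience.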

    Configuration: add the following to solrconfig.xml

      <!-- Search suggestions -->
      <searchComponent name="suggest" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text</str>
        <lst name="spellchecker">
          <str name="name">suggest</str>
          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
          <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
          <str name="field">Title</str>
          <float name="threshold">0.0001</float>
          <!-- To use a custom suggest dictionary, uncomment the following two lines
          <str name="sourceLocation">suggest.txt</str>
          <str name="spellcheckIndexDir">spellchecker</str>
          -->
    
          <str name="comparatorClass">freq</str>
          <str name="buildOnOptimize">true</str>
          <str name="buildOnCommit">true</str>
        </lst>
      </searchComponent>
    
      <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
        <lst name="defaults">
          <str name="spellcheck">true</str>
          <str name="spellcheck.dictionary">suggest</str>
          <str name="spellcheck.count">10</str>
          <str name="spellcheck.onlyMorePopular">true</str>
          <str name="spellcheck.extendedResults">false</str>
          <str name="spellcheck.collate">true</str>
          <!--<str name="spellcheck.build">true</str>  -->
        </lst>
        <arr name="components">
          <str>suggest</str>
        </arr>
      </requestHandler>

    Example:

    http://localhost:8080/solr/collection1/suggest?wt=xml&indent=true&spellcheck=true&spellcheck.q=tes

    http://localhost:8080/solr/collection1/suggest?q=Title:tes&wt=xml&indent=true

    The first request (the one with spellcheck.q=tes) returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
      </lst>
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="tes">
            <int name="numFound">1</int>
            <int name="startOffset">0</int>
            <int name="endOffset">3</int>
            <arr name="suggestion">
              <str>test</str>
            </arr>
          </lst>
          <str name="collation">test</str>
        </lst>
      </lst>
    </response>
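The division of labor described in this section (suggest while the user types, spell check once the search is submitted) could look roughly like this on the client side. This is only a sketch: the helper names `suggest_url` and `spell_url` are made up for illustration, and the host, core name, and handler paths are taken from the example URLs above.

```python
from urllib.parse import urlencode

# Host and core name as used in the example URLs above.
BASE = "http://localhost:8080/solr/collection1"


def suggest_url(prefix):
    """URL for as-you-type suggestions from the /suggest handler."""
    params = {"spellcheck.q": prefix, "wt": "xml", "indent": "true"}
    return f"{BASE}/suggest?{urlencode(params)}"


def spell_url(field, text):
    """URL for a spell check once the user has submitted the search."""
    params = {"q": f"{field}:{text}", "spellcheck": "true"}
    return f"{BASE}/spell?{urlencode(params)}"


print(suggest_url("tes"))
print(spell_url("Title", "tests"))
```

Using `urlencode` rather than string concatenation keeps the colon in `Title:tests` and any spaces in the user's input safely percent-encoded.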

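    Faceted search (facet)

    Purpose: faceting is one of Solr's advanced search features and gives users a friendlier search experience. While searching for keywords, results can be grouped by the facet fields and counted. Faceting is a component built into Solr by default.

    Configuration: no extra configuration is required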

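    Notes:

    1. Fields suited to faceting

      These usually represent a common attribute of the entities, such as a product's category, a product's manufacturer, or a book's publisher.

    2. Requirements for facet fields

      A facet field must be indexed; in general it does not need to be tokenized or stored.

           No tokenization is needed because the field value represents one whole concept; the value also needs no processing such as case conversion and should be kept as is.

           No storage is needed because users generally do not care about the field's value itself; it serves as a way to group the query results, and users typically drill further into the search along that grouping.

    3. Special cases

           For ordinary queries, tokenization and storage are both necessary. For example, splitting the CPU type "Intel Core 2 Duo P7570" into keywords such as "Intel", "Core", and "P7570" and indexing them separately may give a better search experience. But if CPU is used as a facet field, it is best left untokenized, which creates a conflict. The solution is to define the CPU field as untokenized and unstored, then create a second field as a copy of it (for example with a copyField directive in the schema) and tokenize and store that copy.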

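    Parameters:

    Field Facet: a facet field is declared by adding the facet.field parameter to the request; to facet on several fields, repeat the parameter once per field.

    Facet fields do not affect one another, and query parameters can be set per facet field in the form f.<field name>.<parameter name>=<value>. A parameter given without the f.<field name>. prefix applies to all facet fields.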
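As a rough illustration of the two rules above (repeat facet.field once per field, and scope a parameter to one field with the f.<field name>. prefix), the request parameters could be assembled as follows. The helper `facet_params` is hypothetical; `facet.mincount` is used here only as an example of a parameter that can be overridden per field.

```python
from urllib.parse import urlencode, parse_qsl


def facet_params(fields, per_field=None):
    """Build the parameter list for a field-facet request.

    fields:    facet field names; facet.field is repeated once per field.
    per_field: optional {field: {parameter: value}} overrides, emitted
               as f.<field>.<parameter>=<value>.
    """
    params = [("q", "*:*"), ("facet", "on")]
    for field in fields:
        params.append(("facet.field", field))
    for field, overrides in (per_field or {}).items():
        for name, value in overrides.items():
            params.append((f"f.{field}.{name}", str(value)))
    return params


params = facet_params(
    ["ArticleTypeName", "EditorialOfficeName"],
    per_field={"ArticleTypeName": {"facet.mincount": 1}},
)
query = urlencode(params)
print(query)
```

A list of pairs is used instead of a dict because the same key, facet.field, has to appear more than once in the query string.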

    Example:

    http://localhost:8080/solr/collection1/select/?q=ArticleId:5&indent=on&facet=on&facet.field=ArticleTypeName&facet.field=EditorialOfficeName

    The request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">ArticleId:5</str>
          <arr name="facet.field">
            <str>ArticleTypeName</str>
            <str>EditorialOfficeName</str>
          </arr>
        </lst>
      </lst>
      <result name="response" start="0" numFound="1">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
          <lst name="ArticleTypeName">
            <int name="体育">1</int>
            <int name="财经">0</int>
          </lst>
          <lst name="EditorialOfficeName">
            <int name="燕赵都市报">1</int>
            <int name="光明日报">0</int>
            <int name="北京晚报">0</int>
          </lst>
        </lst>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
      </lst>
    </response>

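    Date Facet: Solr provides a more convenient way to compute counts over date fields; the field type must be DateField (or a subtype).

    Note that Date Facet requires all four parameters: the field name (facet.date), the start time (facet.date.start), the end time (facet.date.end), and the gap between buckets (facet.date.gap).

    Example: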

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.date=CreateDate&facet.date.start=2014-3-10T0:0:0Z&facet.date.end=2014-3-26T0:0:0Z&facet.date.gap=%2B1DAY&facet.date.other=all

    The request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">39</int>
        <lst name="params">
          <str name="facet.date.start">2014-3-10T0:0:0Z</str>
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">*:*</str>
          <str name="facet.date">CreateDate</str>
          <str name="facet.date.other">all</str>
          <str name="facet.date.gap">+1DAY</str>
          <str name="facet.date.end">2014-3-26T0:0:0Z</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields"/>
        <lst name="facet_dates">
          <lst name="CreateDate">
            <int name="2014-03-10T00:00:00Z">0</int>
            <int name="2014-03-11T00:00:00Z">0</int>
            <int name="2014-03-12T00:00:00Z">0</int>
            <int name="2014-03-13T00:00:00Z">0</int>
            <int name="2014-03-14T00:00:00Z">0</int>
            <int name="2014-03-15T00:00:00Z">0</int>
            <int name="2014-03-16T00:00:00Z">0</int>
            <int name="2014-03-17T00:00:00Z">0</int>
            <int name="2014-03-18T00:00:00Z">0</int>
            <int name="2014-03-19T00:00:00Z">0</int>
            <int name="2014-03-20T00:00:00Z">0</int>
            <int name="2014-03-21T00:00:00Z">0</int>
            <int name="2014-03-22T00:00:00Z">0</int>
            <int name="2014-03-23T00:00:00Z">1</int>
            <int name="2014-03-24T00:00:00Z">1</int>
            <int name="2014-03-25T00:00:00Z">1</int>
            <str name="gap">+1DAY</str>
            <date name="start">2014-03-10T00:00:00Z</date>
            <date name="end">2014-03-26T00:00:00Z</date>
            <int name="before">0</int>
            <int name="after">3</int>
            <int name="between">3</int>
          </lst>
        </lst>
        <lst name="facet_ranges"/>
      </lst>
    </response>
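To make the counts in the date facet response above easier to follow, the bucketing can be reproduced locally. The sketch below is not Solr's implementation; it assumes each bucket includes its lower bound and excludes its upper bound, and the names `parse_solr_date` and `date_facet` are illustrative. The six CreateDate values are the ones in the documents shown above.

```python
from datetime import datetime, timedelta, timezone


def parse_solr_date(text):
    """Parse a UTC timestamp such as 2014-03-24T16:00:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def date_facet(dates, start, end, gap):
    """Count dates per gap-sized bucket in [start, end).

    Returns (counts, before, after, between), where counts maps each
    bucket's lower bound to the number of dates that fall into it.
    """
    counts = {}
    lower = start
    while lower < end:
        counts[lower] = 0
        lower += gap
    before = after = 0
    for value in dates:
        if value < start:
            before += 1
        elif value >= end:
            after += 1
        else:
            index = (value - start) // gap  # whole gaps since start
            counts[start + index * gap] += 1
    return counts, before, after, sum(counts.values())


create_dates = [parse_solr_date(t) for t in (
    "2014-03-24T16:00:00Z", "2014-03-25T16:00:00Z", "2014-03-26T16:00:00Z",
    "2014-03-27T16:00:00Z", "2014-03-28T16:00:00Z", "2014-03-23T16:00:00Z",
)]
counts, before, after, between = date_facet(
    create_dates,
    parse_solr_date("2014-03-10T00:00:00Z"),
    parse_solr_date("2014-03-26T00:00:00Z"),
    timedelta(days=1),
)
nonzero = {day.strftime("%Y-%m-%d"): n for day, n in counts.items() if n}
print(nonzero, before, after, between)
```

The three documents dated 26 to 28 March fall at or after the end of the range, which is why they are reported under after rather than in a bucket.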

    Facet Query: using syntax similar to a filter query, Facet Query provides more flexible faceting; the facet.query parameter can filter on any field.

    Example:

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.query=CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]

    The request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="facet.query">CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]</str>
          <str name="q">*:*</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]">2</int>
        </lst>
        <lst name="facet_fields"/>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
      </lst>
    </response>
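A facet.query value such as the range expression above contains spaces, colons, and square brackets, so a client that is not a browser generally needs to percent-encode it. A small sketch follows, assuming the same host and core as the example; `facet_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as used in the example URL above.
BASE = "http://localhost:8080/solr/collection1/select"


def facet_query_url(queries):
    """Build a select URL with one facet.query parameter per expression."""
    params = [("q", "*:*"), ("indent", "on"), ("facet", "on")]
    params += [("facet.query", expression) for expression in queries]
    return f"{BASE}?{urlencode(params)}"


RANGE = "CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]"
url = facet_query_url([RANGE])
print(url)

# Decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["facet.query"]
print(decoded)
```

Because facet.query can be repeated, passing several expressions to the helper yields one count per expression in the facet_queries section of the response.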

    Range Facet: counts over ranges. Example:

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.range=CreateDate&facet.range.start=2014-03-24T16:00:00Z&facet.range.end=2014-03-26T16:00:00Z&facet.range.gap=%2B1DAY

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">*:*</str>
          <str name="facet.range.start">2014-03-24T16:00:00Z</str>
          <str name="facet.range">CreateDate</str>
          <str name="facet.range.gap">+1DAY</str>
          <str name="facet.range.end">2014-03-26T16:00:00Z</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields"/>
        <lst name="facet_dates"/>
        <lst name="facet_ranges">
          <lst name="CreateDate">
            <lst name="counts">
              <int name="2014-03-24T16:00:00Z">1</int>
              <int name="2014-03-25T16:00:00Z">1</int>
            </lst>
            <str name="gap">+1DAY</str>
            <date name="start">2014-03-24T16:00:00Z</date>
            <date name="end">2014-03-26T16:00:00Z</date>
          </lst>
        </lst>
      </lst>
    </response>

    Grouping and statistics:

    Grouping example (group):

    http://localhost:8080/solr/collection1/select?q=*:*&wt=xml&indent=true&group=true&group.field=TypeId&group.ngroups=true

    Statistics example (stats):

    http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=Price&rows=10&indent=true

    The stats request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">32</int>
        <lst name="params">
          <str name="indent">true</str>
          <str name="stats.field">Price</str>
          <str name="stats">true</str>
          <str name="q">*:*</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="5">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <double name="Price">6.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463715628722421760</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <double name="Price">7.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463715628782190592</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <double name="Price">8.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463715628784287744</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <double name="Price">9.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463715628786384896</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <double name="Price">10.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463715628788482048</long>
        </doc>
      </result>
      <lst name="stats">
        <lst name="stats_fields">
          <lst name="Price">
            <double name="min">6.0</double>
            <double name="max">10.0</double>
            <long name="count">5</long>
            <long name="missing">0</long>
            <double name="sum">40.0</double>
            <double name="sumOfSquares">330.0</double>
            <double name="mean">8.0</double>
            <double name="stddev">1.5811388300841898</double>
            <lst name="facets"/>
          </lst>
        </lst>
      </lst>
    </response>

    Note: the stats field should be a numeric type; for a string field the statistics are incomplete.
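The values in the stats section of the response above can be checked by hand from the five Price values (6.0 to 10.0). The sketch below recomputes them; `field_stats` is an illustrative name, and the use of the sample standard deviation (dividing by count - 1) is an inference from the number in the response, not something the source states.

```python
import math


def field_stats(values):
    """Recompute the summary statistics shown for a numeric stats field."""
    count = len(values)
    total = sum(values)
    mean = total / count
    if count > 1:
        # Sample standard deviation: squared deviations divided by count - 1.
        variance = sum((v - mean) ** 2 for v in values) / (count - 1)
        stddev = math.sqrt(variance)
    else:
        stddev = 0.0
    return {
        "min": min(values),
        "max": max(values),
        "count": count,
        "sum": total,
        "sumOfSquares": sum(v * v for v in values),
        "mean": mean,
        "stddev": stddev,
    }


stats = field_stats([6.0, 7.0, 8.0, 9.0, 10.0])
print(stats)
```

For these inputs the sum is 40.0, the sum of squares is 330.0, and the mean is 8.0, in line with the response.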

    Automatic clustering (clustering)

    Purpose: automatically groups the retrieved content into categories.

    Configuration: add the following to solrconfig.xml

    <config>
      <searchComponent name="clustering"
                       enable="${solr.clustering.enabled:true}"
                       class="solr.clustering.ClusteringComponent" >
        <lst name="engine">
          <str name="name">lingo</str>
          <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
          <str name="carrot.resourcesDir">clustering/carrot2</str>
        </lst>
    
        <!-- An example definition for the STC clustering algorithm. -->
        <lst name="engine">
          <str name="name">stc</str>
          <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
        </lst>
    
        <!-- An example definition for the bisecting kmeans clustering algorithm. -->
        <lst name="engine">
          <str name="name">kmeans</str>
          <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
        </lst>
      </searchComponent>
    
      <requestHandler name="/clustering"
                      startup="lazy"
                      enable="${solr.clustering.enabled:true}"
                      class="solr.SearchHandler">
        <lst name="defaults">
          <bool name="clustering">true</bool>
          <bool name="clustering.results">true</bool>
          <str name="carrot.title">name</str>
          <str name="carrot.url">id</str>
          <str name="carrot.snippet">features</str>
          <bool name="carrot.produceSummary">true</bool>
          <bool name="carrot.outputSubClusters">false</bool>
          <str name="defType">edismax</str>
          <str name="qf">
            text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
          </str>
          <str name="q.alt">*:*</str>
          <str name="rows">10</str>
          <str name="fl">*,score</str>
        </lst>
        <arr name="last-components">
          <str>clustering</str>
        </arr>
      </requestHandler>
    </config>

    Example:

    http://localhost:8080/solr/clustering?q=*:*&rows=10&LingoClusteringAlgorithm.desiredClusterCountBase=20

    The request returns the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">12</int>
      </lst>
      <result name="response" start="0" numFound="6" maxScore="0.42292467">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
          <float name="score">0.42292467</float>
        </doc>
      </result>
      <arr name="clusters">
        <lst>
          <arr name="labels">
            <str>Other Topics</str>
          </arr>
          <double name="score">0.0</double>
          <bool name="other-topics">true</bool>
          <arr name="docs">
            <str>5</str>
            <str>6</str>
            <str>7</str>
            <str>8</str>
            <str>9</str>
            <str>10</str>
          </arr>
        </lst>
      </arr>
    </response>

    Notes:

    This feature requires extra jars in the %solr_home%/lib directory.

    From the downloaded Solr distribution, copy

    dist/apache-solr-clustering-*.jar,

    all jars under contrib/clustering,

    all jars under contrib/clustering/downloads

    into %solr_home%/lib.

    Shortcut: copy the dist and contrib folders from the distribution directly into %solr_home%/collection1\conf.
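
    The copy steps above can be scripted. The following is a minimal sketch, not part of the original post: the function name collect_clustering_jars and both directory arguments are hypothetical, and it assumes the distribution is laid out exactly as the list above describes.

```python
import shutil
from pathlib import Path
from typing import List


def collect_clustering_jars(solr_dist: Path, lib_dir: Path) -> List[str]:
    """Copy the clustering jars listed above into lib_dir and return their names.

    solr_dist is the unpacked Solr download, lib_dir stands for %solr_home%/lib.
    Both are assumptions made for this sketch.
    """
    # dist/apache-solr-clustering-*.jar
    sources = list((solr_dist / "dist").glob("apache-solr-clustering-*.jar"))
    # Every jar below contrib/clustering; rglob also covers the downloads subfolder.
    sources += list((solr_dist / "contrib" / "clustering").rglob("*.jar"))

    lib_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(sources):
        shutil.copy2(jar, lib_dir / jar.name)
        copied.append(jar.name)
    return copied
```

    Calling collect_clustering_jars(Path("/path/to/solr-download"), Path("/path/to/solr_home/lib")) would copy the jars and return their file names; both paths here are placeholders.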

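    Similar Matching (MoreLikeThis)

    Purpose: find documents that are similar to a given document.

    Configuration: add the following to solrconfig.xml

      <!-- MoreLikeThis (similar-document) query -->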
      <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
      </requestHandler>

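    Parameters:
    mlt: Boolean that turns the MoreLikeThisComponent on or off at query time. (true|false)
    mlt.count: Optional. Number of similar documents to retrieve for each result. (> 0)
    mlt.fl: Fields used to build the MLT query. Any field in the schema that is stored or has term vectors.
    mlt.maxqt: Optional. Maximum number of query terms. A long document can contain many significant terms, which makes the MLT query large and can lead to slow responses or the dreaded TooManyClausesException; this parameter keeps only the most significant terms. (> 0)
    mlt.mintf: Optional. Minimum term frequency; terms that occur fewer times than this in the source document are ignored.
    mlt.mindf: Optional. Minimum document frequency; terms that occur in fewer documents than this are ignored.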

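    Example:

    http://localhost:8080/solr/mlt?q=ArticleId:5&mlt=true&mlt.fl=Title&mlt.mintf=1&mlt.mindf=1

    This request finds the document whose ArticleId is 5 and then returns the other documents that are similar to it on the Title field. Note that the field named in mlt.fl needs term vectors enabled (termVectors="true" in the schema) to be effective; as the parameter list says, it must otherwise at least be stored.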
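
    A request like the one above can also be assembled programmatically. This is a minimal sketch using only the Python standard library; the function name build_mlt_url and its defaults are hypothetical, and the host, port and handler path are simply the ones used in this example.

```python
from typing import List
from urllib.parse import urlencode


def build_mlt_url(base: str, doc_query: str, fields: List[str],
                  mintf: int = 1, mindf: int = 1) -> str:
    """Build a MoreLikeThis request URL from the parameters described above."""
    params = {
        "q": doc_query,              # selects the source document
        "mlt": "true",
        "mlt.fl": ",".join(fields),  # fields compared for similarity
        "mlt.mintf": mintf,
        "mlt.mindf": mindf,
    }
    # Keep ':' and ',' unescaped so the URL stays readable; everything else is encoded.
    return base + "?" + urlencode(params, safe=":,")


print(build_mlt_url("http://localhost:8080/solr/mlt", "ArticleId:5", ["Title"]))
```

    Keeping the parameters in a dictionary makes it easy to add optional ones such as mlt.count or mlt.maxqt without editing the URL by hand.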

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">34</int>
      </lst>
      <result name="match" start="0" numFound="1">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
      </result>
      <result name="response" start="0" numFound="5">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
    </response>
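
    The response above holds two result blocks: "match" is the source document and "response" lists the similar documents. As a rough sketch of how a client might read them, the code below pulls the ArticleId values out of each block with the standard library's ElementTree. The function name article_ids and the trimmed SAMPLE_MLT string are hypothetical and only mirror the structure shown above.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed stand-in for the response above (only the ArticleId fields are kept).
SAMPLE_MLT = """<response>
  <result name="match" start="0" numFound="1">
    <doc><str name="ArticleId">5</str></doc>
  </result>
  <result name="response" start="0" numFound="5">
    <doc><str name="ArticleId">6</str></doc>
    <doc><str name="ArticleId">7</str></doc>
  </result>
</response>"""


def article_ids(response_xml: str, result_name: str) -> List[str]:
    """Return the ArticleId of every <doc> in the named <result> block."""
    root = ET.fromstring(response_xml)
    result = root.find(f"result[@name='{result_name}']")
    if result is None:
        return []
    return [doc.findtext("str[@name='ArticleId']") for doc in result.findall("doc")]


print(article_ids(SAMPLE_MLT, "match"))
print(article_ids(SAMPLE_MLT, "response"))
```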

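    Highlighting

    Purpose: highlight the parts of the results that match the search keywords.

    Configuration: no extra configuration is required.

    Parameters:

    hl: whether to enable highlighting (true|false)

    hl.fl: fields to highlight; separate multiple fields with commas (hl.fl=name,name2,name3)

    hl.simple.pre: tag inserted before each highlighted term (default <em>)

    hl.simple.post: tag inserted after each highlighted term (default </em>)

    Example:

    http://localhost:8080/solr/select?q=ArticleId:9&start=0&rows=10&hl=true&hl.fl=Title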
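
    When the default tags are replaced through hl.simple.pre and hl.simple.post, the angle brackets have to be URL-encoded. The sketch below shows one way to build such a request with the Python standard library; the function name build_highlight_url and its defaults are hypothetical, and the base URL is the one from this example.

```python
from typing import List
from urllib.parse import urlencode


def build_highlight_url(base: str, query: str, fields: List[str],
                        pre: str = "<em>", post: str = "</em>",
                        start: int = 0, rows: int = 10) -> str:
    """Build a select request with highlighting enabled on the given fields."""
    params = {
        "q": query,
        "start": start,
        "rows": rows,
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.simple.pre": pre,    # tag placed before each highlighted term
        "hl.simple.post": post,  # tag placed after each highlighted term
    }
    # ':' and ',' stay readable; '<', '>' and '/' are percent-encoded.
    return base + "?" + urlencode(params, safe=":,")


print(build_highlight_url("http://localhost:8080/solr/select",
                          "ArticleId:9", ["Title"], pre="<b>", post="</b>"))
```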

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
          <str name="start">0</str>
          <str name="q">ArticleId:9</str>
          <str name="hl.fl">Title</str>
          <str name="hl">true</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="1">
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
      </result>
      <lst name="highlighting">
        <lst name="9">
          <arr name="Title">
            <str>
              test title <em>9</em> nine
            </str>
          </arr>
        </lst>
      </lst>
    </response>
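
    In the response above, the highlighted snippets sit in a separate "highlighting" section keyed by document id, so a client has to join them back to the documents itself. The following is a minimal sketch of reading that section with ElementTree. The function name highlight_snippets and the SAMPLE_HL string are hypothetical, and the sample assumes the snippet markup arrives XML-escaped in the raw response (the listing above shows it unescaped).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed stand-in for the response above; the <em> tags are assumed to be escaped.
SAMPLE_HL = """<response>
  <lst name="highlighting">
    <lst name="9">
      <arr name="Title">
        <str>test title &lt;em&gt;9&lt;/em&gt; nine</str>
      </arr>
    </lst>
  </lst>
</response>"""


def highlight_snippets(response_xml: str) -> Dict[str, Dict[str, List[str]]]:
    """Map document id -> field name -> list of highlighted snippets."""
    root = ET.fromstring(response_xml)
    snippets: Dict[str, Dict[str, List[str]]] = {}
    for doc in root.findall("lst[@name='highlighting']/lst"):
        fields = {}
        for arr in doc.findall("arr"):
            fields[arr.get("name")] = [(s.text or "").strip() for s in arr.findall("str")]
        snippets[doc.get("name")] = fields
    return snippets


print(highlight_snippets(SAMPLE_HL))
```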

    II. Indexing

    Updating the Index (update)

  • Original article: https://www.cnblogs.com/wangwangfei/p/3623459.html