
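    Solr Study Notes, Part 5: Components and Handlers

    I. Search

    Spell checking (spellCheck)

    Purpose: checks whether the terms the user typed exist in the index and, if they do not, suggests close or similar terms.

    Configuration: add the following to solrconfig.xml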

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">  
     <lst name="spellchecker">  
       <str name="name">default</str>  
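       <!-- The field whose indexed terms the spell checker is built from; here, the field named Title -->
       <str name="field">Title</str>
       <!-- Directory that holds the spell-check index -->
       <str name="spellcheckIndexDir">spellchecker</str>
       <!-- Build the spell-check index on every commit (spell checking only works once the index has been built) -->
       <!-- To build on optimize instead, replace "buildOnCommit" with "buildOnOptimize" -->
       <str name="buildOnCommit">true</str>
     </lst>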
    </searchComponent> 
    
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">  
      <!-- Default parameters -->
      <lst name="defaults">  
        <str name="spellcheck.onlyMorePopular">false</str>  
        <str name="spellcheck.extendedResults">false</str>  
        <!-- Number of spelling suggestions to return (increase as needed) -->
        <str name="spellcheck.count">1</str>  
      </lst>  
      <arr name="last-components">  
        <str>spellcheck</str>  
      </arr>  
    </requestHandler>

    Example:

    http://localhost:8080/solr/collection1/spell?q=Title:tests&spellcheck=true

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">119</int>
      </lst>
      <result name="response" start="0" numFound="0"/>
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="tests">
            <int name="numFound">1</int>
            <int name="startOffset">6</int>
            <int name="endOffset">11</int>
            <int name="origFreq">0</int>
            <arr name="suggestion">
              <lst>
                <str name="word">test</str>
                <int name="freq">6</int>
              </lst>
            </arr>
          </lst>
          <bool name="correctlySpelled">false</bool>
          <lst name="collation">
            <str name="collationQuery">Title:test</str>
            <int name="hits">6</int>
            <lst name="misspellingsAndCorrections">
              <str name="tests">test</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </response>
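
A client usually has to pull the suggestions out of this XML before it can show them. The following is a minimal Python sketch that assumes the response layout shown above; `parse_suggestions` is a hypothetical helper name, and `SAMPLE` is a trimmed copy of that response rather than live output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the /spell response shown above.
SAMPLE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tests">
        <int name="numFound">1</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">test</str>
            <int name="freq">6</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>"""


def parse_suggestions(xml_text):
    """Return {misspelled term: [(suggested word, frequency), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for term in root.findall(path):
        arr = term.find("arr[@name='suggestion']")
        if arr is None:
            continue  # e.g. the "collation" entry, which is not a query term
        words = []
        for item in arr.findall("lst"):
            word = item.find("str[@name='word']").text
            freq = int(item.find("int[@name='freq']").text)
            words.append((word, freq))
        result[term.get("name")] = words
    return result


suggestions = parse_suggestions(SAMPLE)
print(suggestions)  # {'tests': [('test', 6)]}
```

The `collation` entry is skipped here because it holds a rewritten query, not a per-term suggestion list.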

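    Search suggestions (suggest)

    Purpose: as soon as the user types a search term, suggest returns a list of matching completions and proposes the first similar term as the recommendation. If nothing in the index relates to the input, no suggestion is returned. In practice, use suggest while the user is typing and spell checking once the search is submitted; combining the two gives a better user experience.

    Configuration: add the following to solrconfig.xml

      <!-- Search suggestions -->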
      <searchComponent name="suggest" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text</str>
        <lst name="spellchecker">
          <str name="name">suggest</str>
          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
          <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
          <str name="field">Title</str>
          <float name="threshold">0.0001</float>
          <!-- To use a custom suggestion dictionary, uncomment the following two lines
          <str name="sourceLocation">suggest.txt</str>
          <str name="spellcheckIndexDir">spellchecker</str>
          -->
    
          <str name="comparatorClass">freq</str>
          <str name="buildOnOptimize">true</str>
          <str name="buildOnCommit">true</str>
        </lst>
      </searchComponent>
    
      <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
        <lst name="defaults">
          <str name="spellcheck">true</str>
          <str name="spellcheck.dictionary">suggest</str>
          <str name="spellcheck.count">10</str>
          <str name="spellcheck.onlyMorePopular">true</str>
          <str name="spellcheck.extendedResults">false</str>
          <str name="spellcheck.collate">true</str>
          <!--<str name="spellcheck.build">true</str>  -->
        </lst>
        <arr name="components">
          <str>suggest</str>
        </arr>
      </requestHandler>

    Example:

    http://localhost:8080/solr/collection1/suggest?wt=xml&indent=true&spellcheck=true&spellcheck.q=tes

    http://localhost:8080/solr/collection1/suggest?q=Title:tes&wt=xml&indent=true

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
      </lst>
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="tes">
            <int name="numFound">1</int>
            <int name="startOffset">0</int>
            <int name="endOffset">3</int>
            <arr name="suggestion">
              <str>test</str>
            </arr>
          </lst>
          <str name="collation">test</str>
        </lst>
      </lst>
    </response>
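
For type-ahead use, the client sends the text typed so far and reads back the completions. Here is a rough Python sketch of both halves, assuming the `/suggest` handler and host from the examples above; `suggest_url` and `parse_suggest` are hypothetical helper names, and `SAMPLE` is a trimmed copy of the response above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Host, port, and core name are taken from the example URLs above.
BASE = "http://localhost:8080/solr/collection1/suggest"

SAMPLE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tes">
        <int name="numFound">1</int>
        <arr name="suggestion">
          <str>test</str>
        </arr>
      </lst>
      <str name="collation">test</str>
    </lst>
  </lst>
</response>"""


def suggest_url(prefix):
    """Build a /suggest request for the text typed so far."""
    params = {"spellcheck": "true", "spellcheck.q": prefix, "wt": "xml"}
    return BASE + "?" + urlencode(params)


def parse_suggest(xml_text):
    """Return (list of suggested words, collation or None)."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    if node is None:
        return [], None
    words = [s.text for s in node.findall("./lst/arr[@name='suggestion']/str")]
    collation = node.find("str[@name='collation']")
    return words, (collation.text if collation is not None else None)


url = suggest_url("tes")
words, collation = parse_suggest(SAMPLE)
print(url)
print(words, collation)
```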

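    Faceted search (facet)

    Purpose: faceting is one of Solr's advanced search features and makes search friendlier for users. While searching for a keyword, results can also be grouped by the facet fields and counted per group. Faceting is a component built into Solr by default.

    Configuration: none required.

    Notes:

    1. Fields suited to faceting

      These usually represent an attribute shared across entities, such as a product's category, a product's manufacturer, or a book's publisher.

    2. Requirements for facet fields

      A facet field must be indexed. In general it does not need to be tokenized or stored.

           It need not be tokenized because its value stands for one whole concept; it also needs no lowercasing or similar processing, so it can be kept exactly as is.

           It need not be stored because users generally care less about the field's value itself than about using it to group the results, and they usually drill further into the search along that grouping.

    3. A special case

           For ordinary queries, tokenization and storage are both needed. For example, splitting the CPU type "Intel 酷睿2双核 P7570" into keywords such as "Intel", "酷睿", and "P7570" and indexing each of them may give a better search experience. As a facet field, however, CPU is best left untokenized, and the two needs conflict. The fix is to define the CPU field as untokenized and unstored, then create a second field as a copy of it and tokenize and store that copy.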
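
To see why a tokenized field makes a poor facet field, the sketch below counts documents per whole value and per token. It is only an illustration in plain Python: the second CPU value is made-up sample data, and a whitespace split stands in for a real Solr analyzer.

```python
from collections import Counter

# Sample field values; "T6600" is a hypothetical second model for contrast.
cpu_values = [
    "Intel 酷睿2双核 P7570",
    "Intel 酷睿2双核 P7570",
    "Intel 酷睿2双核 T6600",
]

# Untokenized field: each whole value is one facet bucket.
whole_value_facets = Counter(cpu_values)

# Tokenized field: every token becomes its own bucket.
# set() ensures a document is counted once per token.
token_facets = Counter(
    token for value in cpu_values for token in set(value.split())
)

print(whole_value_facets)
print(token_facets)
```

The whole-value counts give two usable buckets (one per CPU model), while the token counts produce buckets such as "Intel" that match every document and say nothing about the model.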

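    Parameters:

    Field Facet: declare a facet field by adding the facet.field parameter to the request. To facet on several fields, repeat the parameter once per field.

    Facet fields do not affect one another, and query parameters can be set per facet field in the form f.<field name>.<parameter name>=<value>. Omitting the field name applies the parameter to all facet fields.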
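
A short Python sketch of how such a query string could be assembled, with a repeated facet.field and one per-field override. The field names come from the example in this section; `facet.mincount` and `facet.limit` are standard Solr facet parameters that the original text does not mention, and their values here are arbitrary.

```python
from urllib.parse import urlencode

# A list of pairs (not a dict) so that facet.field can appear more than once.
params = [
    ("q", "*:*"),
    ("facet", "on"),
    ("facet.field", "ArticleTypeName"),
    ("facet.field", "EditorialOfficeName"),
    # No field prefix: applies to every facet field.
    ("facet.mincount", "1"),
    # f.<field name>.<parameter name>: applies to this field only.
    ("f.EditorialOfficeName.facet.limit", "5"),
]

# safe="*:" keeps the match-all query readable instead of percent-encoding it.
query_string = urlencode(params, safe="*:")
print(query_string)
```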

    Example:

    http://localhost:8080/solr/collection1/select/?q=ArticleId:5&indent=on&facet=on&facet.field=ArticleTypeName&facet.field=EditorialOfficeName

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">ArticleId:5</str>
          <arr name="facet.field">
            <str>ArticleTypeName</str>
            <str>EditorialOfficeName</str>
          </arr>
        </lst>
      </lst>
      <result name="response" start="0" numFound="1">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
          <lst name="ArticleTypeName">
            <int name="体育">1</int>
            <int name="财经">0</int>
          </lst>
          <lst name="EditorialOfficeName">
            <int name="燕赵都市报">1</int>
            <int name="光明日报">0</int>
            <int name="北京晚报">0</int>
          </lst>
        </lst>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
      </lst>
    </response>

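    Date Facet: Solr offers a more convenient way to count over date fields. The field's type must be DateField (or a subtype of it).

    Note that Date Facet requires all four of these parameters: the field name (facet.date), the start time (facet.date.start), the end time (facet.date.end), and the gap (facet.date.gap).

    Example:

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.date=CreateDate&facet.date.start=2014-3-10T0:0:0Z&facet.date.end=2014-3-26T0:0:0Z&facet.date.gap=%2B1DAY&facet.date.other=all

    The request returns the following: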

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">39</int>
        <lst name="params">
          <str name="facet.date.start">2014-3-10T0:0:0Z</str>
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">*:*</str>
          <str name="facet.date">CreateDate</str>
          <str name="facet.date.other">all</str>
          <str name="facet.date.gap">+1DAY</str>
          <str name="facet.date.end">2014-3-26T0:0:0Z</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields"/>
        <lst name="facet_dates">
          <lst name="CreateDate">
            <int name="2014-03-10T00:00:00Z">0</int>
            <int name="2014-03-11T00:00:00Z">0</int>
            <int name="2014-03-12T00:00:00Z">0</int>
            <int name="2014-03-13T00:00:00Z">0</int>
            <int name="2014-03-14T00:00:00Z">0</int>
            <int name="2014-03-15T00:00:00Z">0</int>
            <int name="2014-03-16T00:00:00Z">0</int>
            <int name="2014-03-17T00:00:00Z">0</int>
            <int name="2014-03-18T00:00:00Z">0</int>
            <int name="2014-03-19T00:00:00Z">0</int>
            <int name="2014-03-20T00:00:00Z">0</int>
            <int name="2014-03-21T00:00:00Z">0</int>
            <int name="2014-03-22T00:00:00Z">0</int>
            <int name="2014-03-23T00:00:00Z">1</int>
            <int name="2014-03-24T00:00:00Z">1</int>
            <int name="2014-03-25T00:00:00Z">1</int>
            <str name="gap">+1DAY</str>
            <date name="start">2014-03-10T00:00:00Z</date>
            <date name="end">2014-03-26T00:00:00Z</date>
            <int name="before">0</int>
            <int name="after">3</int>
            <int name="between">3</int>
          </lst>
        </lst>
        <lst name="facet_ranges"/>
      </lst>
    </response>
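
The counts in this response can be reproduced by hand, which helps when checking what start, end, and gap actually do. The Python sketch below is an illustration of the bucketing idea and not Solr's implementation: it assumes each bucket includes its lower bound and excludes its upper bound, and `date_facet` is a hypothetical function name. The six CreateDate values are copied from the documents above.

```python
from datetime import datetime, timedelta, timezone


def parse(ts):
    """Parse a Solr-style UTC timestamp such as 2014-03-24T16:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# CreateDate of the six documents in the response above.
create_dates = [parse(t) for t in [
    "2014-03-24T16:00:00Z", "2014-03-25T16:00:00Z", "2014-03-26T16:00:00Z",
    "2014-03-27T16:00:00Z", "2014-03-28T16:00:00Z", "2014-03-23T16:00:00Z",
]]


def date_facet(values, start, end, gap):
    """Count values per [bucket start, bucket start + gap) between start and end."""
    counts = {}
    cursor = start
    while cursor < end:
        upper = cursor + gap
        counts[cursor] = sum(1 for v in values if cursor <= v < upper)
        cursor = upper
    return {
        "counts": counts,
        "before": sum(1 for v in values if v < start),
        "after": sum(1 for v in values if v >= end),
        "between": sum(1 for v in values if start <= v < end),
    }


result = date_facet(
    create_dates,
    start=parse("2014-03-10T00:00:00Z"),
    end=parse("2014-03-26T00:00:00Z"),
    gap=timedelta(days=1),
)
print(result["before"], result["after"], result["between"])  # 0 3 3
```

The 16 daily buckets, the three non-zero counts on 23, 24, and 25 March, and the before/after/between totals match the facet_dates section above.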

    Facet Query: using a syntax similar to a filter query, Facet Query gives more flexible faceting. The facet.query parameter can restrict the count on any field.

    Example:

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.query=CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="facet.query">CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]</str>
          <str name="q">*:*</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]">2</int>
        </lst>
        <lst name="facet_fields"/>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
      </lst>
    </response>
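
The facet.query value contains spaces, colons, and square brackets, which a browser encodes silently but a program has to encode itself. A small Python sketch of that step follows; it assumes the same field and range as the example, written with zero-padded timestamps.

```python
from urllib.parse import urlencode, parse_qs

# Same range as the example above, with zero-padded date and time parts.
facet_query = "CreateDate:[2014-03-24T00:00:00Z TO 2014-03-26T00:00:00Z]"

query_string = urlencode({
    "q": "*:*",
    "indent": "on",
    "facet": "on",
    "facet.query": facet_query,
})
print(query_string)

# Decoding the query string gives back the original facet query unchanged.
decoded = parse_qs(query_string)["facet.query"][0]
print(decoded == facet_query)  # True
```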

    Range Facet: counts documents per range. Example:

    http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.range=CreateDate&facet.range.start=2014-03-24T16:00:00Z&facet.range.end=2014-03-26T16:00:00Z&facet.range.gap=%2B1DAY

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
          <str name="facet">on</str>
          <str name="indent">on</str>
          <str name="q">*:*</str>
          <str name="facet.range.start">2014-03-24T16:00:00Z</str>
          <str name="facet.range">CreateDate</str>
          <str name="facet.range.gap">+1DAY</str>
          <str name="facet.range.end">2014-03-26T16:00:00Z</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="6">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields"/>
        <lst name="facet_dates"/>
        <lst name="facet_ranges">
          <lst name="CreateDate">
            <lst name="counts">
              <int name="2014-03-24T16:00:00Z">1</int>
              <int name="2014-03-25T16:00:00Z">1</int>
            </lst>
            <str name="gap">+1DAY</str>
            <date name="start">2014-03-24T16:00:00Z</date>
            <date name="end">2014-03-26T16:00:00Z</date>
          </lst>
        </lst>
      </lst>
    </response>
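
The gap is written as %2B1DAY in the URL because a literal + in a query string is decoded as a space. The Python snippet below shows both directions using only the standard library; it illustrates URL encoding in general and makes no claim about how Solr reacts to a malformed gap.

```python
from urllib.parse import urlencode, parse_qs

# Encoding the gap turns "+" into "%2B".
encoded = urlencode({"facet.range.gap": "+1DAY"})
print(encoded)  # facet.range.gap=%2B1DAY

# An unencoded "+" is read back as a space, so the leading sign is lost.
raw = parse_qs("facet.range.gap=+1DAY")["facet.range.gap"][0]
ok = parse_qs(encoded)["facet.range.gap"][0]
print(repr(raw), repr(ok))  # ' 1DAY' '+1DAY'
```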

    Grouping and statistics:

    Grouping example (group):

    http://localhost:8080/solr/collection1/select?q=*:*&wt=xml&indent=true&group=true&group.field=TypeId&group.ngroups=true

    Statistics example (stats):

    http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=Price&rows=10&indent=true

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">32</int>
        <lst name="params">
          <str name="indent">true</str>
          <str name="stats.field">Price</str>
          <str name="stats">true</str>
          <str name="q">*:*</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="5">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <double name="Price">6.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463715628722421760</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <double name="Price">7.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463715628782190592</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <double name="Price">8.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463715628784287744</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <double name="Price">9.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463715628786384896</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <double name="Price">10.0</double>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463715628788482048</long>
        </doc>
      </result>
      <lst name="stats">
        <lst name="stats_fields">
          <lst name="Price">
            <double name="min">6.0</double>
            <double name="max">10.0</double>
            <long name="count">5</long>
            <long name="missing">0</long>
            <double name="sum">40.0</double>
            <double name="sumOfSquares">330.0</double>
            <double name="mean">8.0</double>
            <double name="stddev">1.5811388300841898</double>
            <lst name="facets"/>
          </lst>
        </lst>
      </lst>
    </response>
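
The stats values can be checked by hand from the five Price values in the response. The Python sketch below recomputes them; that the reported stddev matches the sample standard deviation (dividing by n - 1) is an observation from these numbers, not something the original text states.

```python
import math
import statistics

# Price of the five documents in the response above.
prices = [6.0, 7.0, 8.0, 9.0, 10.0]

stats = {
    "min": min(prices),
    "max": max(prices),
    "count": len(prices),
    "sum": sum(prices),
    "sumOfSquares": sum(p * p for p in prices),
    "mean": statistics.mean(prices),
    # Sample standard deviation: sqrt(sum((p - mean)^2) / (n - 1)).
    "stddev": statistics.stdev(prices),
}

for name, value in stats.items():
    print(name, value)

# Compare with the stddev reported in the response.
print(math.isclose(stats["stddev"], 1.5811388300841898))  # True
```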

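    Note: the statistics field should be numeric. With a string field the statistics are incomplete.

    Automatic clustering (clustering)

    Purpose: automatically groups the retrieved documents into categories.

    Configuration: add the following to solrconfig.xml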

    <config>
      <searchComponent name="clustering"
                       enable="${solr.clustering.enabled:true}"
                       class="solr.clustering.ClusteringComponent" >
        <lst name="engine">
          <str name="name">lingo</str>
          <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
          <str name="carrot.resourcesDir">clustering/carrot2</str>
        </lst>
    
        <!-- An example definition for the STC clustering algorithm. -->
        <lst name="engine">
          <str name="name">stc</str>
          <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
        </lst>
    
        <!-- An example definition for the bisecting kmeans clustering algorithm. -->
        <lst name="engine">
          <str name="name">kmeans</str>
          <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
        </lst>
      </searchComponent>
    
      <requestHandler name="/clustering"
                      startup="lazy"
                      enable="${solr.clustering.enabled:true}"
                      class="solr.SearchHandler">
        <lst name="defaults">
          <bool name="clustering">true</bool>
          <bool name="clustering.results">true</bool>
          <str name="carrot.title">name</str>
          <str name="carrot.url">id</str>
          <str name="carrot.snippet">features</str>
          <bool name="carrot.produceSummary">true</bool>
          <bool name="carrot.outputSubClusters">false</bool>
          <str name="defType">edismax</str>
          <str name="qf">
            text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
          </str>
          <str name="q.alt">*:*</str>
          <str name="rows">10</str>
          <str name="fl">*,score</str>
        </lst>
        <arr name="last-components">
          <str>clustering</str>
        </arr>
      </requestHandler>
    </config>

    Example:

    http://localhost:8080/solr/clustering?q=*:*&rows=10&LingoClusteringAlgorithm.desiredClusterCountBase=20

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">12</int>
      </lst>
      <result name="response" start="0" numFound="6" maxScore="0.42292467">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
          <float name="score">0.42292467</float>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
          <float name="score">0.42292467</float>
        </doc>
      </result>
      <arr name="clusters">
        <lst>
          <arr name="labels">
            <str>Other Topics</str>
          </arr>
          <double name="score">0.0</double>
          <bool name="other-topics">true</bool>
          <arr name="docs">
            <str>5</str>
            <str>6</str>
            <str>7</str>
            <str>8</str>
            <str>9</str>
            <str>10</str>
          </arr>
        </lst>
      </arr>
    </response>

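    Notes:

    This feature requires extra jars in the %solr_home%/lib directory.

    From the downloaded Solr distribution, copy the following into %solr_home%/lib:

    dist/apache-solr-clustering-*.jar,

    all jars in the contrib/clustering directory,

    all jars in the contrib/clustering/downloads directory.

    Shortcut: copy the dist and contrib folders from the downloaded distribution straight into %solr_home%/collection1/conf.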
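The manual copy described in the notes can also be scripted. The following is a minimal sketch, not part of the original setup: the function name `copy_clustering_jars` and both directory arguments are hypothetical, and the three glob patterns are taken directly from the list in the notes.

```python
import shutil
from pathlib import Path


def copy_clustering_jars(solr_dist, solr_home):
    """Copy the clustering jars listed in the notes into <solr_home>/lib.

    solr_dist: root of the unpacked Solr download (hypothetical argument).
    solr_home: the Solr home directory (hypothetical argument).
    Returns the names of the copied jar files.
    """
    solr_dist = Path(solr_dist)
    lib_dir = Path(solr_home) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    # The three sources named in the notes, relative to the Solr download.
    patterns = [
        "dist/apache-solr-clustering-*.jar",
        "contrib/clustering/*.jar",
        "contrib/clustering/downloads/*.jar",
    ]

    copied = []
    for pattern in patterns:
        for jar in sorted(solr_dist.glob(pattern)):
            shutil.copy2(jar, lib_dir / jar.name)
            copied.append(jar.name)
    return copied
```

Calling `copy_clustering_jars(<solr download dir>, <solr home dir>)` returns the list of copied file names, so an empty list is a quick sign that the download directory does not have the layout the notes assume.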

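    MoreLikeThis (similar-document matching)

    Purpose: find documents similar to a given document.

    Configuration: add the following to solrconfig.xml

      <!-- MoreLikeThis (similar-document) queries -->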
      <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
      </requestHandler>

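    Parameters:
    mlt: Boolean that turns the MoreLikeThisComponent on or off for the query. (true|false)
    mlt.count: Optional. Number of similar documents to retrieve for each result. (> 0)
    mlt.fl: The fields used to build the MLT query. Any field in the schema that is stored or has term vectors.
    mlt.maxqt: Optional. Maximum number of query terms. A long document can contain many significant terms, which makes the MLT query very large and can lead to slow responses or the dreaded TooManyClausesException; this parameter keeps only the most significant terms. (> 0)

    Example: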

    http://localhost:8080/solr/mlt?q=ArticleId:5&mlt=true&mlt.fl=Title&mlt.mintf=1&mlt.mindf=1

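    This request looks up the document whose ArticleId is 5 and returns the other documents that are similar to it on the Title field. Note that each field named in mlt.fl must be stored or, preferably, have term vectors enabled (termVectors="true" in the schema); otherwise the query has no effect.

    The request returns the following: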

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">34</int>
      </lst>
      <result name="match" start="0" numFound="1">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">5</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content five</str>
          <date name="CreateDate">2014-03-24T16:00:00Z</date>
          <str name="Title">test title 5 five</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978500177920</long>
        </doc>
      </result>
      <result name="response" start="0" numFound="5">
        <doc>
          <str name="TypeId">2</str>
          <str name="ArticleId">6</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content six</str>
          <date name="CreateDate">2014-03-25T16:00:00Z</date>
          <str name="Title">test title 6 six</str>
          <str name="ArticleTypeName">体育</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978552606720</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">7</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">1</str>
          <str name="Content">content seven</str>
          <date name="CreateDate">2014-03-26T16:00:00Z</date>
          <str name="Title">test title 7 seven</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">光明日报</str>
          <long name="_version_">1463443978554703872</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">8</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content eight</str>
          <date name="CreateDate">2014-03-27T16:00:00Z</date>
          <str name="Title">test title 8 eight</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978556801024</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">10</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">2</str>
          <str name="Content">content ten</str>
          <date name="CreateDate">2014-03-23T16:00:00Z</date>
          <str name="Title">test title 10 ten</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">燕赵都市报</str>
          <long name="_version_">1463443978559946752</long>
        </doc>
      </result>
    </response>
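When the request is issued from code rather than typed into a browser, the parameters need to be URL-encoded. The sketch below builds the same kind of /mlt request with Python's standard library. The helper name `build_mlt_url` and its argument names are illustrative assumptions; only the Solr parameter names (`q`, `mlt`, `mlt.fl`, `mlt.mintf`, `mlt.mindf`, `mlt.maxqt`) come from the text above.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_query, fields, min_tf=1, min_df=1, max_query_terms=None):
    """Build a request URL for the /mlt handler (hypothetical helper)."""
    params = {
        "q": doc_query,              # selects the source document
        "mlt": "true",
        "mlt.fl": ",".join(fields),  # fields used to judge similarity
        "mlt.mintf": min_tf,
        "mlt.mindf": min_df,
    }
    if max_query_terms is not None:
        # Cap the number of query terms, as described for mlt.maxqt.
        params["mlt.maxqt"] = max_query_terms
    return base_url.rstrip("/") + "/mlt?" + urlencode(params)


print(build_mlt_url("http://localhost:8080/solr", "ArticleId:5", ["Title"]))
# http://localhost:8080/solr/mlt?q=ArticleId%3A5&mlt=true&mlt.fl=Title&mlt.mintf=1&mlt.mindf=1
```

The colon in `ArticleId:5` is percent-encoded as `%3A`, which Solr decodes back to the same query.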

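    Highlighting

    Purpose: highlight the parts of each result that match the search keywords.

    Configuration: no extra configuration is needed.

    Parameters:

    hl: whether to enable highlighting (true|false)

    hl.fl: the fields to highlight; separate multiple fields with commas (hl.fl=name,name2,name3)

    hl.simple.pre: the tag inserted before each highlighted term (default <em>)

    hl.simple.post: the tag inserted after each highlighted term (default </em>)

    Example: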

    http://localhost:8080/solr/select?q=ArticleId:9&start=0&rows=10&hl=true&hl.fl=Title

    The request returns the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
          <str name="start">0</str>
          <str name="q">ArticleId:9</str>
          <str name="hl.fl">Title</str>
          <str name="hl">true</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" start="0" numFound="1">
        <doc>
          <str name="TypeId">1</str>
          <str name="ArticleId">9</str>
          <bool name="IsDelete">false</bool>
          <str name="EditorialOfficeId">3</str>
          <str name="Content">content nine</str>
          <date name="CreateDate">2014-03-28T16:00:00Z</date>
          <str name="Title">test title 9 nine</str>
          <str name="ArticleTypeName">财经</str>
          <str name="EditorialOfficeName">北京晚报</str>
          <long name="_version_">1463443978558898176</long>
        </doc>
      </result>
      <lst name="highlighting">
        <lst name="9">
          <arr name="Title">
            <str>
              test title <em>9</em> nine
            </str>
          </arr>
        </lst>
      </lst>
    </response>
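A client usually has to do two things with highlighting: send the hl parameters and pull the snippets out of the `highlighting` section of the response. The sketch below shows one possible way to do both with Python's standard library. The helper names and the `SAMPLE` string are illustrative assumptions. The sample is a trimmed copy of the response above and assumes the `<em>` tags arrive entity-escaped inside the `<str>` element, which may differ from how a browser displays them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_highlight_url(base_url, query, fields, pre="<em>", post="</em>"):
    """Build a /select URL with highlighting enabled (hypothetical helper)."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.simple.pre": pre,    # tag placed before each matched term
        "hl.simple.post": post,  # tag placed after each matched term
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def extract_highlights(response_xml):
    """Return {document key: {field name: [snippets]}} from a Solr XML response."""
    root = ET.fromstring(response_xml)
    section = root.find("./lst[@name='highlighting']")
    highlights = {}
    if section is None:
        return highlights
    for doc in section.findall("lst"):
        fields = {}
        for arr in doc.findall("arr"):
            fields[arr.get("name")] = [(s.text or "").strip() for s in arr.findall("str")]
        highlights[doc.get("name")] = fields
    return highlights


# Trimmed sample modelled on the response above (assumed to be entity-escaped).
SAMPLE = """<response>
  <lst name="highlighting">
    <lst name="9">
      <arr name="Title">
        <str>test title &lt;em&gt;9&lt;/em&gt; nine</str>
      </arr>
    </lst>
  </lst>
</response>"""

print(extract_highlights(SAMPLE))
# {'9': {'Title': ['test title <em>9</em> nine']}}
```

Passing different `pre` and `post` values, for example `<strong>` and `</strong>`, changes the wrapper tags without touching the parsing code.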

    II. Indexing

    Updating the index (update)

  • Original article: https://www.cnblogs.com/wangwangfei/p/3623459.html