  • [Solr Basics] Distributed Searching Fundamentals

    Distributed Searching Fundamentals

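    On a single machine, searching struggles to keep up as the index keeps growing.

    Solr lets us split the index into several appropriately sized pieces, called shards, and distribute them across multiple "servers".

    Solr accepts a search request on one server (a single shard), forwards it to every shard, and finally merges the results.

    For details, see: http://wiki.apache.org/solr/DistributedSearch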
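    The "forward, then merge" step can be pictured with a small sketch. This is not Solr's actual code; it only assumes that every shard returns (doc_id, score) pairs sorted by descending score. The name merge_shard_results and the sample hits are hypothetical. Duplicate IDs are resolved by keeping the first hit seen, a simplified stand-in for the behavior listed under the limitations in section 3.

```python
import heapq
from typing import List, Tuple

# Hypothetical shape of a shard response: (doc_id, score) pairs,
# already sorted by descending score.
ShardHits = List[Tuple[str, float]]


def merge_shard_results(shard_hits: List[ShardHits], rows: int) -> ShardHits:
    """Merge per-shard hit lists into one list with the top `rows` hits."""
    # heapq.merge expects ascending order, so merge on the negated score.
    merged = heapq.merge(*shard_hits, key=lambda hit: -hit[1])
    seen = set()
    top: ShardHits = []
    for doc_id, score in merged:
        if doc_id in seen:
            continue  # duplicate unique key: keep the first hit, drop the rest
        seen.add(doc_id)
        top.append((doc_id, score))
        if len(top) == rows:
            break
    return top


# Made-up hits from two shards; "a" appears on both.
core0_hits = [("a", 0.9), ("c", 0.4)]
core1_hits = [("b", 0.7), ("a", 0.5), ("d", 0.1)]
top_hits = merge_shard_results([core0_hits, core1_hits], rows=3)
```

    Merging pre-sorted lists means the coordinating server never has to re-sort all hits, only to walk the heads of the per-shard lists until it has enough rows.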

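    1. Running a distributed search with the shards parameter

    Adding the shards parameter to a search request turns it into a distributed search. The parameter has the following format:

    host:port/base_url[,host:port/base_url]*

    For example: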

    http://localhost:8983/solr3.5/core1/
    select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on
    &shards=localhost:8983/solr3.5/core0,localhost:8983/solr3.5/core1
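    A request like the one above can also be assembled in code. The following is a minimal sketch using only the Python standard library; the helper name build_distributed_query_url is hypothetical, the hosts and core names are taken from the example, and the version parameter is left out for brevity.

```python
from urllib.parse import urlencode


def build_distributed_query_url(base_url, shards, query="*:*", start=0, rows=10):
    """Build a select URL whose shards parameter lists every shard to search."""
    params = {
        "q": query,
        "start": start,
        "rows": rows,
        "indent": "on",
        # shards is a comma-separated list of host:port/base_url entries
        "shards": ",".join(shards),
    }
    # Keep '*', ':', '/' and ',' unescaped so the URL stays readable;
    # every other reserved character is still percent-encoded.
    return base_url.rstrip("/") + "/select/?" + urlencode(params, safe="*:/,")


url = build_distributed_query_url(
    "http://localhost:8983/solr3.5/core1",
    ["localhost:8983/solr3.5/core0", "localhost:8983/solr3.5/core1"],
)
```

    Since the shard list travels in the URL, checking len(url) before sending is a cheap way to notice when the list is getting close to a server's URI length limit.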
    

      

    2. Components that support distributed searching

    Only the following components support distributed searching:

    • The Query component that returns documents matching a query
    • The Facet component, for facet.query and facet.field requests where facets are sorted by count (the default). Solr 1.4 and later also support sorting by name.
    • The Highlighting component
    • The Stats component
    • The Spell Check Component
    • The Terms Component
    • The Term Vector Component
    • The Debug component 

    3. Limitations of distributed searching

    Distributed searching also comes with a number of limitations:

    • Each document indexed must have a unique key.
      (Every document needs a unique identifier because Solr has to merge the results from the shards.)
    • If Solr discovers duplicate document IDs, Solr selects the first document and discards subsequent ones.
      (When Solr finds duplicate IDs, it keeps the first document.)
    • Inverse-document frequency (IDF) calculations cannot be distributed.
      (IDF breaks down: it depends on the total number of documents, which is not readily available when each shard runs the search on its own.)
    • Distributed searching does not support the QueryElevationComponent, which configures the top results for a given query regardless of Lucene's scoring. For more information, see http://wiki.apache.org/solr/QueryElevationComponent.
      (QueryElevationComponent ignores scoring and lets users hand-edit the results, so a simple merge of shard results no longer works.)
    • The index for distributed searching may become out of date; for example, a document that once matched a query and was subsequently changed may no longer match the query but will still be retrieved.
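      (The index can change while a distributed search is in progress. The search runs in more than one stage, so a document matched in an earlier stage may have been modified by the time it is retrieved.)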
    • Distributed searching supports only sorted-field faceting, not date faceting
      (Distributed searching supports only sorted-field faceting.)
    • The number of shards is limited by number of characters allowed for GET method's URI; most Web servers generally support at least 4000 characters, but many servers limit URI length to reduce their vulnerability to Denial of Service (DoS) attacks.
      (The number of shards is limited by the length of the GET URL.)
    • TF/IDF computations are per shard. This may not matter if content is well (randomly) distributed.
      (Similar to the third point: TF/IDF is computed on each shard separately, so the score ordering of the merged results is not entirely "fair".)
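    To see why per-shard IDF can skew the merged ranking, here is a small sketch. It assumes the classic Lucene TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)); the shard names, document counts, and document frequencies are made up for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one term: (numDocs, docFreq) on each shard.
# The term is rare on core0 and common on core1.
shard_stats = {"core0": (1000, 9), "core1": (1000, 499)}

# What each shard computes on its own.
per_shard_idf = {name: idf(n, df) for name, (n, df) in shard_stats.items()}

# What a single, unsharded index holding the same documents would compute.
total_docs = sum(n for n, _ in shard_stats.values())
total_freq = sum(df for _, df in shard_stats.values())
global_idf = idf(total_docs, total_freq)

for name, value in per_shard_idf.items():
    print(f"{name}: idf={value:.3f}")
print(f"global: idf={global_idf:.3f}")
```

    With these numbers the same term gets a much higher weight on core0 than on core1, so otherwise identical documents score differently depending on the shard they live on. When documents are spread randomly across shards, the per-shard statistics converge and the effect mostly disappears.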
  • Original article: https://www.cnblogs.com/huangfox/p/2345335.html