  • Riak Search

    Basho: Riak Search

    Introduction
    Operations
    Indexing
    Querying
    Persistence
    Major Components
    Replication
    Further Reading

    Introduction

    Riak Search is a distributed, easily scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak KV.

    Riak Search allows you to find and retrieve your Riak objects using the objects’ values. When a Riak KV bucket has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search.

    The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak map/reduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
    Riak KV and Riak Search

    Riak Search is a superset of Riak KV, so if you are running Riak Search, then you are automatically running a Riak KV cluster. You don’t need to set up a separate Riak KV cluster to use Riak Search. Downloading a Riak Search binary will give you both Riak KV and Riak Search.
    Search is “Beta Software”

    Note that Riak Search should be considered beta software. Please be aware that there may be bugs and issues that we have not yet covered that may require a full reindexing of your data when migrating to a different release.
    Operations

    Operationally, Riak Search is very similar to Riak KV. An administrator can add nodes to a cluster on the fly with simple commands to increase performance or capacity. Index and query operations can be run from any node. Multiple replicas of data are stored, allowing the cluster to continue serving full results in the face of machine failure. Partitions are handed off and replicated across clusters using the same mechanisms as Riak KV.
    Indexing

    At index time, Riak Search tokenizes a document into an inverted index using standard Lucene Analyzers. (For improved performance, the team re-implemented some of these in Erlang to reduce hops between Erlang and Java.) Custom analyzers can be created in either Java or Erlang. The system consults a schema (defined per-index) to determine required fields, the unique key, the default analyzer, and which analyzer should be used for each field. Field aliases (grouping multiple fields into one field) and dynamic fields (wildcard field matching) are supported.
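
    The indexing step described above can be illustrated with a minimal sketch. This is not Riak Search code: the `analyze` function is a hypothetical stand-in for a Lucene analyzer (it only lowercases and splits on non-alphanumeric characters), and the `(field, term)` index layout is an assumption chosen for readability.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def index_document(index, doc_id, fields):
    """Add one document's postings to an inverted index.

    `index` maps (field, term) -> {doc_id: [positions]}.
    """
    for field, text in fields.items():
        for pos, term in enumerate(analyze(text)):
            index[(field, term)].setdefault(doc_id, []).append(pos)


index = defaultdict(dict)
index_document(index, "doc1", {"title": "Riak Search",
                               "body": "Search your Riak objects"})
index_document(index, "doc2", {"title": "Riak KV",
                               "body": "A distributed key/value store"})

# Looking up a term in a field returns the documents that contain it.
title_hits = sorted(index[("title", "riak")])
```

    A real analyzer would also handle stop words, stemming, and per-field configuration taken from the schema; the sketch only shows how a document becomes a set of postings.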

    After analyzing a document into an inverted index, the system uses a consistent hash to divide the inverted index entries (called postings) by term across the cluster. This is called term-partitioning and is a key difference from other commonly used distributed indexes. Term-partitioning was chosen because it provides higher overall query throughput with large data sets. (This can come at the expense of higher-latency queries for especially large result sets.)
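
    As a rough illustration of term-partitioning, the sketch below hashes each posting's index, field, and term onto a ring of partitions, so that all postings for one term land on the same partition. The ring layout, the key format, the round-robin node assignment, and the function names are assumptions made for this example, not Riak Search's actual implementation.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def build_ring(nodes, num_partitions=8):
    """Split the ring into equal partitions, assigned round-robin to nodes."""
    step = RING_SIZE // num_partitions
    return [(i * step, nodes[i % len(nodes)]) for i in range(num_partitions)]


def partition_for(ring, index_name, field, term):
    """Return the (partition_start, node) pair that owns a term."""
    key = f"{index_name}/{field}/{term}".encode("utf-8")
    h = int.from_bytes(hashlib.sha1(key).digest(), "big")
    starts = [start for start, _ in ring]
    return ring[bisect_right(starts, h) - 1]


def route_postings(ring, index_name, postings):
    """Group (field, term, doc_id) postings by the node that owns each term."""
    by_node = {}
    for field, term, doc_id in postings:
        _, node = partition_for(ring, index_name, field, term)
        by_node.setdefault(node, []).append((field, term, doc_id))
    return by_node


ring = build_ring(["node1", "node2", "node3"])
postings = [("title", "riak", "doc1"), ("title", "search", "doc1"),
            ("title", "riak", "doc2"), ("title", "kv", "doc2")]
routed = route_postings(ring, "docs", postings)
```

    Because the hash depends only on the term (not the document), a single-term query needs to contact only the partition that owns that term, whereas a document-partitioned index would have to ask every partition.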
    Querying

    Search queries use the same syntax as Lucene, and support most Lucene operators including term searches, field searches, boolean operators, grouping, lexicographical range queries, and wildcards (at the end of a word only).

    Querying has two distinct stages: planning and execution. During query planning, the system creates a directed graph of the query, grouping points on the graph in order to maximize data locality and minimize inter-node traffic. Single-term queries can be executed on a single node, while range queries and fuzzy matches are executed using the minimal set of nodes that cover the query.

    As the query executes, Riak Search uses a series of merge-joins, merge-intersections, and filters to generate the resulting set of matching bucket/key pairs.
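
    To make the merge operations concrete, here is a minimal sketch of a merge-intersection (for AND) and a merge-union (for OR) over two sorted posting lists of bucket/key pairs. The function names and the in-memory list representation are assumptions for illustration; the real system works on result streams from multiple nodes.

```python
import heapq


def merge_intersect(a, b):
    """Intersect two sorted lists in one pass (boolean AND)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def merge_union(a, b):
    """Merge two sorted lists, dropping duplicates (boolean OR)."""
    out = []
    for item in heapq.merge(a, b):
        if not out or out[-1] != item:
            out.append(item)
    return out


# Sorted (bucket, key) pairs matching each term.
riak_hits = [("docs", "a"), ("docs", "c"), ("docs", "d")]
search_hits = [("docs", "b"), ("docs", "c"), ("docs", "d"), ("docs", "e")]

both = merge_intersect(riak_hits, search_hits)    # riak AND search
either = merge_union(riak_hits, search_hits)      # riak OR search
```

    Both operations read each input once and never need the full lists in memory at the same time, which is why keeping postings sorted pays off at query time.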
    Persistence

    For a backing store, the Riak Search team developed merge_index. merge_index takes inspiration from the Lucene file format, Bitcask (our standard backing store for Riak KV), and SSTables (from Google’s BigTable paper). It was designed to have a simple, easily recoverable data structure, to allow simultaneous reads and writes with no performance degradation, and to be forgiving of write bursts while taking advantage of low-write periods to perform data compactions and optimizations.
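
    The design goals above resemble those of log-structured stores in general. The sketch below is a generic illustration of that idea, not merge_index's actual format: the class name `TinyMergeStore`, the `buffer_limit` parameter, and the in-memory lists are all hypothetical. Writes go to a small buffer, full buffers become immutable sorted segments, and compaction merges segments during quiet periods.

```python
from heapq import merge


class TinyMergeStore:
    """Hypothetical buffer-plus-segments store for (term, doc_id) postings."""

    def __init__(self, buffer_limit=4):
        self.buffer = []        # recent writes, unsorted
        self.segments = []      # immutable, sorted lists of postings
        self.buffer_limit = buffer_limit

    def write(self, term, doc_id):
        """Append to the buffer; writes never touch existing segments."""
        self.buffer.append((term, doc_id))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Turn the current buffer into a new sorted segment."""
        if self.buffer:
            self.segments.append(sorted(set(self.buffer)))
            self.buffer = []

    def compact(self):
        """Merge all segments into one, dropping duplicate postings."""
        merged = []
        for entry in merge(*self.segments):
            if not merged or merged[-1] != entry:
                merged.append(entry)
        self.segments = [merged] if merged else []

    def lookup(self, term):
        """Read from the buffer and every segment."""
        hits = {d for t, d in self.buffer if t == term}
        for seg in self.segments:
            hits.update(d for t, d in seg if t == term)
        return sorted(hits)


store = TinyMergeStore(buffer_limit=2)
for term, doc_id in [("riak", "d1"), ("search", "d1"), ("riak", "d2"),
                     ("kv", "d2"), ("riak", "d3")]:
    store.write(term, doc_id)

before = store.lookup("riak")   # spans two segments and the buffer
store.flush()
store.compact()                 # would run during a low-write period
after = store.lookup("riak")    # same answer, now from one segment
```

    Compaction changes how many places a read must look, not what it returns, so it can be deferred until write load drops.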
    Major Components

    Riak Search is composed of:

    Riak Core – Dynamo-inspired distributed-systems framework.
    Riak KV – Distributed Key/Value store inspired by Amazon’s Dynamo.
    Bitcask – Default storage backend used by Riak KV.
    Riak Search – Distributed index and full-text search engine.
    Merge Index – Storage backend used by Riak Search. This is a pure Erlang storage format based roughly on ideas borrowed from other storage formats, including log-structured merge trees, SSTables, Bitcask, and the Lucene file format.
    Qilr – Library for parsing queries into execution plans and documents into terms.
    Riak Solr – Adds a subset of Solr HTTP interface capabilities to Riak Search.

    Replication

    Riak Search data is replicated in a manner similar to Riak KV data: a search index has an n_val setting that determines how many copies of the data exist. Copies are written across different partitions located on different physical nodes.

    The underlying data for Riak Search lives in Riak KV and replicates in precisely the same manner. The Search index, created from the underlying data, replicates differently for technical reasons. Those differences are:

    Riak Search uses timestamps, rather than vector clocks, to resolve version conflicts. This leads to fewer guarantees about your data (as depending on wall-clock time can cause problems if the clock is wrong) but was a necessary tradeoff for performance reasons.
    Riak Search does not use quorum values when writing (indexing) data. The data is written in a fire-and-forget model. Riak Search does use hinted handoff to remain write-available when a node goes offline.
    Riak Search does not use quorum values when reading (querying) data. Only one copy of the data is read, and the partition is chosen based on what will create the most efficient query plan overall.
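
    The timestamp tradeoff in the first point can be shown with a small sketch of last-write-wins resolution. The `resolve` function and the `(timestamp, value)` representation are assumptions for illustration only; they show why a node with a slow clock can lose an update that actually happened later.

```python
def resolve(existing, incoming):
    """Last-write-wins: keep whichever (timestamp, value) has the newer timestamp."""
    if existing is None or incoming[0] >= existing[0]:
        return incoming
    return existing


stored = None

# Node A (correct clock) indexes version 1 at wall-clock time 1000.0.
stored = resolve(stored, (1000.0, "v1"))

# Node B indexes version 2 afterwards in real time, but its clock runs
# slow, so the posting carries an older timestamp and is discarded.
stored = resolve(stored, (990.0, "v2"))

# A later write with a correct, newer timestamp wins as expected.
latest = resolve(stored, (1001.0, "v3"))
```

    A vector clock would detect that "v2" descends from "v1" regardless of wall-clock time; comparing timestamps is cheaper but silently drops the skewed write.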

    Further Reading

    Riak Search - Installation and Setup
    Riak Search - Schema
    Riak Search - Indexing
    Riak Search - Querying
    Riak Search - Indexing and Querying Riak KV Data
    Riak Search - Operations and Troubleshooting

    Basho Technologies, Inc.


  • Original source: https://www.cnblogs.com/lexus/p/2207980.html