  • Riak Search

    Basho: Riak Search

    Introduction
    Operations
    Indexing
    Querying
    Persistence
    Major Components
    Replication
    Further Reading

    Introduction

    Riak Search is a distributed, easily-scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak KV.

    Riak Search allows you to find and retrieve your Riak objects using the objects’ values. When a Riak KV bucket has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search.

    The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak map/reduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.

    Riak KV and Riak Search

    Riak Search is a superset of Riak KV, so if you are running Riak Search, then you are automatically running a Riak KV cluster. You don’t need to set up a separate Riak KV cluster to use Riak Search. Downloading a Riak Search binary will give you both Riak KV and Riak Search.

    Search is “Beta Software”

    Note that Riak Search should be considered beta software. There may be bugs and issues that we have not yet discovered, and some of them may require a full reindexing of your data when migrating to a different release.

    Operations

    Operationally, Riak Search is very similar to Riak KV. An administrator can add nodes to a cluster on the fly with simple commands to increase performance or capacity. Index and query operations can be run from any node. Multiple replicas of data are stored, allowing the cluster to continue serving full results in the face of machine failure. Partitions are handed off and replicated across clusters using the same mechanisms as Riak KV.

    Indexing

    At index time, Riak Search tokenizes a document into an inverted index using standard Lucene Analyzers. (For improved performance, the team re-implemented some of these in Erlang to reduce hops between Erlang and Java.) Custom analyzers can be created in either Java or Erlang. The system consults a schema (defined per-index) to determine required fields, the unique key, the default analyzer, and which analyzer should be used for each field. Field aliases (grouping multiple fields into one field) and dynamic fields (wildcard field matching) are supported.
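
    The analysis step can be illustrated with a minimal Python sketch. It is not Riak Search code: the two analyzers are crude stand-ins for Lucene analyzers, and the `SCHEMA` layout, `analyze`, and `build_index` are hypothetical names invented here. It only shows the idea of picking an analyzer per field (falling back to a default) and emitting postings into an inverted index.

```python
import re
from collections import defaultdict


def standard_analyzer(text):
    """Crude stand-in for a Lucene-style analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def whitespace_analyzer(text):
    """Split on whitespace only, preserving case."""
    return text.split()


ANALYZERS = {"standard": standard_analyzer, "whitespace": whitespace_analyzer}

# Hypothetical schema layout, for illustration only (not Riak Search's schema format).
SCHEMA = {
    "default_analyzer": "standard",
    "fields": {"tags": "whitespace"},
}


def analyze(doc_key, doc, schema=SCHEMA):
    """Turn one document (field -> text) into (field, term, doc_key, position) postings."""
    postings = []
    for field, text in doc.items():
        name = schema["fields"].get(field, schema["default_analyzer"])
        for position, term in enumerate(ANALYZERS[name](text)):
            postings.append((field, term, doc_key, position))
    return postings


def build_index(docs):
    """Build an inverted index mapping (field, term) to the set of matching document keys."""
    index = defaultdict(set)
    for doc_key, doc in docs.items():
        for field, term, key, _position in analyze(doc_key, doc):
            index[(field, term)].add(key)
    return index
```

    Because the `tags` field uses the whitespace analyzer in this sketch, its terms keep their case, while every other field is lowercased by the default analyzer.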

    After analyzing a document into an inverted index, the system uses a consistent hash to divide the inverted index entries (called postings) by term across the cluster. This is called term-partitioning and is a key difference from other commonly used distributed indexes. Term-partitioning was chosen because it provides higher overall query throughput with large data sets. (This can come at the expense of higher-latency queries for especially large result sets.)
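
    A rough sketch of term-partitioning follows, under stated assumptions: the hash input (index, field, and term), the SHA-1 ring of size 2^160, and the partition count of 64 are illustrative choices, and `partition_for` and `partition_postings` are hypothetical helpers rather than Riak Search functions. The point is that every posting for a given term lands on the same partition, regardless of which document produced it.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 hash space (assumed ring size)
NUM_PARTITIONS = 64       # assumed number of equally sized partitions


def partition_for(index, field, term):
    """Map a term to a partition by hashing it onto the ring."""
    key = f"{index}/{field}/{term}".encode("utf-8")
    position = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def partition_postings(postings):
    """Group (index, field, term, doc_key) postings by the partition that owns the term."""
    by_partition = {}
    for index, field, term, doc_key in postings:
        owner = partition_for(index, field, term)
        by_partition.setdefault(owner, []).append((term, doc_key))
    return by_partition
```

    With this layout a single-term lookup touches one partition, while a document-partitioned index would have to ask every partition for its share of the matches.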

    Querying

    Search queries use the same syntax as Lucene, and support most Lucene operators including term searches, field searches, boolean operators, grouping, lexicographical range queries, and wildcards (at the end of a word only).

    Querying has two distinct stages: planning and execution. During query planning, the system creates a directed graph of the query, grouping points on the graph in order to maximize data locality and minimize inter-node traffic. Single-term queries can be executed on a single node, while range queries and fuzzy matches are executed using the minimal set of nodes that cover the query.

    As the query executes, Riak Search uses a series of merge-joins, merge-intersections, and filters to generate the resulting set of matching bucket/key pairs.
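
    As an illustration of the merge operations named above, here is a small Python sketch of a merge-intersection (for AND) and a merge-union (for OR) over posting lists. It assumes each posting list is already sorted and holds (bucket, key) pairs; the function names are made up for this example and are not part of Riak Search.

```python
import heapq


def merge_intersect(a, b):
    """Intersect two sorted lists of (bucket, key) pairs in a single pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def merge_union(a, b):
    """Merge two sorted lists of (bucket, key) pairs, dropping duplicates."""
    out = []
    for item in heapq.merge(a, b):
        if not out or out[-1] != item:
            out.append(item)
    return out
```

    Both functions read each input once and keep the output sorted, so their results can feed directly into another merge step further up the query graph.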

    Persistence

    For a backing store, the Riak Search team developed merge_index. merge_index takes inspiration from the Lucene file format, Bitcask (our standard backing store for Riak KV), and SSTables (from Google’s BigTable paper). It was designed to have a simple, easily recoverable data structure, to allow simultaneous reads and writes with no performance degradation, and to be forgiving of write bursts while taking advantage of low-write periods to perform data compactions and optimizations.
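
    The general shape of such a store can be sketched in a few lines of Python. This is a toy model of the log-structured idea (absorb writes in a buffer, flush to immutable sorted segments, compact later), not the merge_index file format or API. `TinyMergeStore` and all of its methods are hypothetical, and everything lives in memory here.

```python
import bisect


class TinyMergeStore:
    """Toy buffer-plus-segments store for (term, doc_key) postings."""

    def __init__(self, buffer_limit=4):
        self.buffer = set()        # recent writes, unsorted
        self.segments = []         # immutable sorted lists of (term, doc_key)
        self.buffer_limit = buffer_limit

    def write(self, term, doc_key):
        """Absorb a write in the buffer; flush when the buffer fills up."""
        self.buffer.add((term, doc_key))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Turn the buffer into a new sorted segment."""
        if self.buffer:
            self.segments.append(sorted(self.buffer))
            self.buffer = set()

    def lookup(self, term):
        """Collect doc keys for a term from the buffer and every segment."""
        found = {key for t, key in self.buffer if t == term}
        for segment in self.segments:
            i = bisect.bisect_left(segment, (term,))
            while i < len(segment) and segment[i][0] == term:
                found.add(segment[i][1])
                i += 1
        return sorted(found)

    def compact(self):
        """Merge all segments into one, as a quiet-period optimization."""
        merged = sorted(set().union(*self.segments))
        self.segments = [merged] if merged else []
```

    Writes never modify an existing segment, which is what lets reads proceed while new data arrives; compaction trades some background work for fewer segments to search on each lookup.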

    Major Components

    Riak Search consists of:

    Riak Core – Dynamo-inspired distributed-systems framework.
    Riak KV – Distributed key/value store inspired by Amazon’s Dynamo.
    Bitcask – Default storage backend used by Riak KV.
    Riak Search – Distributed index and full-text search engine.
    Merge Index – Storage backend used by Riak Search. This is a pure Erlang storage format based roughly on ideas borrowed from other storage formats, including log-structured merge trees, SSTables, Bitcask, and the Lucene file format.
    Qilr – Library for parsing queries into execution plans and documents into terms.
    Riak Solr – Adds a subset of Solr HTTP interface capabilities to Riak Search.

    Replication

    Riak Search data is replicated in a manner similar to Riak KV data: a search index has an n_val setting that determines how many copies of the data exist. Copies are written across different partitions located on different physical nodes.

    The underlying data for Riak Search lives in Riak KV and replicates in precisely the same manner. The Search index, created from the underlying data, replicates differently for technical reasons. Those differences are:

    Riak Search uses timestamps, rather than vector clocks, to resolve version conflicts. This leads to fewer guarantees about your data (as depending on wall-clock time can cause problems if the clock is wrong) but was a necessary tradeoff for performance reasons.
    Riak Search does not use quorum values when writing (indexing) data. The data is written in a fire-and-forget model. Riak Search does use hinted handoff to remain write-available when a node goes offline.
    Riak Search does not use quorum values when reading (querying) data. Only one copy of the data is read, and the partition is chosen based on what will create the most efficient query plan overall.
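
    The timestamp tradeoff in the first point above can be made concrete with a short Python sketch. It assumes a simple last-write-wins rule over wall-clock timestamps; the `Posting` record and `resolve` function are hypothetical and only illustrate why a wrong clock can cause a newer write to be discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    term: str
    doc_key: str
    props: str        # hypothetical payload stored with the posting
    timestamp: float  # wall-clock time assigned by the writing node


def resolve(a, b):
    """Last-write-wins: keep the posting with the later timestamp (ties keep a)."""
    return a if a.timestamp >= b.timestamp else b


# Two writes stamped by nodes with accurate clocks: the later one wins.
first = Posting("riak", "books/k1", "v1", 100.0)
second = Posting("riak", "books/k1", "v2", 105.0)
winner = resolve(first, second)

# A third write happens later in real time, but on a node whose clock lags,
# so it carries an earlier timestamp and loses to the older "v2" posting.
skewed = Posting("riak", "books/k1", "v3", 90.0)
winner_after_skew = resolve(second, skewed)
```

    A vector clock would detect that the third write descends from the second regardless of wall-clock time, which is the guarantee given up here in exchange for cheaper conflict resolution.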

    Further Reading

    Riak Search - Installation and Setup
    Riak Search - Schema
    Riak Search - Indexing
    Riak Search - Querying
    Riak Search - Indexing and Querying Riak KV Data
    Riak Search - Operations and Troubleshooting

    Basho Technologies, Inc.


  • Original article: https://www.cnblogs.com/lexus/p/2207980.html