  • Realtime Search: Solr vs Elasticsearch

    Realtime Search: Solr vs Elasticsearch | Socialcast Engineering
    Tuesday May 31st, 2011 by Ryan Sonnek
    What is Elasticsearch?

    Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (for example, through the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
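To make the JSON + HTTP idea concrete, here is a minimal sketch that only builds the requests and does not send them. It assumes a node on the default local port 9200 and the typed `/{index}/{type}/{id}` URL layout used by Elasticsearch releases of that era; the helper names and the `posts`/`post` index and type names are hypothetical, chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a single local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def search_request(index, query_string):
    """Build (but do not send) a GET request using the `q` URI search parameter."""
    params = urlencode({"q": query_string})
    return Request(f"{BASE_URL}/{index}/_search?{params}", method="GET")


if __name__ == "__main__":
    req = index_request("posts", "post", 1, {"body": "hello world"})
    print(req.get_method(), req.full_url)
    print(search_request("posts", "body:hello").full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running node; the point here is only that documents and queries travel as plain JSON over HTTP.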
    Why is it better than Solr?

    First of all, let’s set the record straight: Solr is fast. I’m serious…it’s really fast! Solr is the de facto search engine for a reason. It’s stable and reliable, and out of the box it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).

    Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.

    Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!

    It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
    Realworld Results…

    After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
    And now for something a bit more interesting…

    The typical realtime search architecture goes something like this:

    index user content into the search engine
    perform set of queries against search engine to determine if content matches particular criteria
    perform specific logic notifying registered channels that new content is available
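The three steps above can be sketched with a toy in-memory engine. This is an illustration of the polling pattern only, not Solr's or Elasticsearch's API: `ToySearchEngine`, `poll_subscriptions` and the channel names are hypothetical, and "matching" is reduced to checking that a document contains every required word.

```python
class ToySearchEngine:
    """In-memory stand-in for a search engine (hypothetical, for illustration)."""

    def __init__(self):
        self.documents = {}  # doc_id -> set of lowercased words

    def index(self, doc_id, text):
        """Step 1: index user content."""
        self.documents[doc_id] = set(text.lower().split())

    def search(self, required_words):
        """Return the ids of documents containing every required word."""
        return {
            doc_id
            for doc_id, words in self.documents.items()
            if required_words <= words
        }


def poll_subscriptions(engine, subscriptions, already_notified):
    """Steps 2 and 3: run every saved query and collect new (channel, doc_id) hits."""
    notifications = []
    for channel, required_words in subscriptions.items():
        for doc_id in sorted(engine.search(required_words)):
            if (channel, doc_id) not in already_notified:
                already_notified.add((channel, doc_id))
                notifications.append((channel, doc_id))
    return notifications


if __name__ == "__main__":
    engine = ToySearchEngine()
    subscriptions = {"ops-alerts": {"outage"}, "solr-news": {"solr"}}
    seen = set()
    engine.index(1, "Solr outage reported")
    print(poll_subscriptions(engine, subscriptions, seen))
    print(poll_subscriptions(engine, subscriptions, seen))
```

Note that every poll re-runs every saved query against the whole index, whether or not anything changed, which is the cost this workflow carries.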

    Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
    Introducing: Percolation!

    Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.

    The new workflow looks like this:

    register specific query (percolation) in Elasticsearch
    index new content (passing a flag to trigger percolation)
    the response to the indexing operation will contain the matched percolations
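The percolation workflow above can be modelled with a small in-memory sketch. It is not Elasticsearch's API: `ToyPercolator`, its method names, the `percolate` flag and the response shape are hypothetical stand-ins, and a "query" is simplified to a set of words that must all appear in the document.

```python
class ToyPercolator:
    """In-memory model of register-then-index percolation (hypothetical)."""

    def __init__(self):
        self.registered = {}  # query name -> set of required words
        self.documents = {}   # doc_id -> set of lowercased words

    def register(self, name, required_words):
        """Step 1: register a named query ahead of time."""
        self.registered[name] = {word.lower() for word in required_words}

    def index(self, doc_id, text, percolate=False):
        """Steps 2 and 3: index a document and optionally report matching queries."""
        words = set(text.lower().split())
        self.documents[doc_id] = words
        response = {"id": doc_id}
        if percolate:
            # Only the incoming document is checked against the stored queries.
            response["matches"] = sorted(
                name
                for name, required in self.registered.items()
                if required <= words
            )
        return response


if __name__ == "__main__":
    engine = ToyPercolator()
    engine.register("solr-news", ["solr"])
    engine.register("es-news", ["elasticsearch"])
    print(engine.index(1, "Solr is fast", percolate=True))
```

The design point is the inversion: the stored queries are evaluated against one new document at index time, so the application learns about matches in the indexing response instead of searching the whole index again.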

    This is the perfect architecture for realtime search and a true gamechanger.
    The Bottom Line

    Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.
    Tagged: search
    Comments

    David says:

    Cool article. Now I know why I love ES! ;-)
    Commented on May 31, 2011
    jrawlings says:

    Was the ‘Search Fresh Index while Idle’ performed against an elasticsearch 5 shard index (the default setup for a newly created index) or a single shard index?
    Commented on May 31, 2011
    Ryan Sonnek says:

    @jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.
    Commented on May 31, 2011
    umad says:

    Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken elasticsearch cluster that I wouldn’t recommend it to anyone.

    I guess for small sites it’s ok. For serious business, I’ll stick with solr.

    It would be nice to see a comparison with riaksearch as well.
    Commented on May 31, 2011
    Ryan Sonnek says:

    @umad in our experience, the exact opposite is true. We pushed Solr so hard to try and support realtime search that we constantly had to deal with Java out of memory issues. Elasticsearch is much more stable (even for a beta application) and runs *so* much smoother.

    I’m not sure what you classify as a “small” site. Our search index contains millions of documents and we’re performing hundreds of requests per minute, and Elasticsearch has not had a single hiccup yet.
    Commented on June 1, 2011
    Philip Ingram says:

    That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.
    Commented on May 31, 2011
    Ben says:

    Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our Solr indexes that require us to batch our adds before a commit, which is certainly not a position to be in with realtime search. I can see how, without tuning and with default cache warming, you might run into bunches of overlapping warming searchers.
    Commented on May 31, 2011
    MarcMarc says:

    And why not use a master-slave configuration in Solr? Isn’t that the perfect solution for separating add-document and query operations?
    Commented on June 1, 2011
    Ryan Sonnek says:

    @MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.

    Introducing master-slave for the search index also introduces a lot of operational complexity that if you can avoid, you really should. :)
    Commented on June 1, 2011
    Vlad Zloteanu says:

    Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/
    Commented on June 1, 2011
    Ryan Sonnek says:

    @vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.
    Commented on June 1, 2011
    Otis Gospodnetic says:

    Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.

    Just to make it clear to readers not familiar with the underlying details:
    It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet, which is different from Solr using the same Lucene API as ES and doing it/still performing poorly.
    Commented on June 1, 2011
    Peter Bengtsson says:

    Being a Xapian fan as of many years I’d love to see Xapian benchmarked against ES.
    Commented on June 1, 2011
    Andy says:

    What’s the difference between “search fresh index” and “search full index”?

    Were you running Solr and ElasticSearch on the same hardware?
    Commented on June 1, 2011
    Ryan Sonnek says:

    @andy the fresh index benchmarks are done against an empty/clean index. the “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic and real world benchmarks.
    Commented on June 1, 2011
    db says:

    Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.

    There have been some nice robustness improvements in ES 0.16

    We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.

    David
    Commented on June 7, 2011
    Steven Hildreth says:

    Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.
    Commented on August 24, 2011
    David says:

    Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.

    In some particular use cases this is a must-have feature (think about SKUs in an index where the search results must be products, not SKUs).
    Commented on September 16, 2011
  • Original article: https://www.cnblogs.com/lexus/p/2207984.html