  • RabbitMQ and batch processing (batch submission)

    RabbitMQ - RabbitMQ and batch processing
    http://rabbitmq.1065348.n5.nabble.com/RabbitMQ-and-batch-processing-td35634.html


    RabbitMQ and batch processing

    Greg Poirier
    I mentioned this on Twitter and a couple of people have requested that I bring this up on the mailing list.
     
    It seems to be a given that RabbitMQ was not designed for the batch processing use case (i.e. using RabbitMQ as a buffer between large serial steps). We have a system in place that attempts to do just that, however.
     
    I have been working with the developers of the software involved in an attempt to help them redesign around a more ideal use of RabbitMQ (or to help them move to a different bus altogether -- database or something like kafka) and some of them have been able to simply operate in smaller batch sizes (thus keeping their queues relatively small).
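Operating in smaller batches amounts to bounding how far a producer can run ahead of its consumers. The sketch below is a toy, broker-free simulation with assumed numbers: `chunked` splits work into fixed-size batches, and `simulate_peak_depth` (a hypothetical helper, not a RabbitMQ API) publishes one batch per tick only while the queue holds fewer than `max_depth` messages, while a consumer drains a fixed number per tick.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def simulate_peak_depth(total: int, batch_size: int,
                        drain_per_tick: int, max_depth: int) -> int:
    """Toy model: each tick the producer publishes one batch if the queue
    holds fewer than `max_depth` messages, then the consumer removes up to
    `drain_per_tick`. Returns the largest depth observed."""
    if batch_size <= 0 or drain_per_tick <= 0:
        raise ValueError("batch_size and drain_per_tick must be positive")
    depth, remaining, peak = 0, total, 0
    while remaining or depth:
        if remaining and depth < max_depth:  # back-pressure check
            n = min(batch_size, remaining)
            depth += n
            remaining -= n
        peak = max(peak, depth)
        depth -= min(drain_per_tick, depth)
    return peak


# One million messages; the consumer drains half a batch per tick (assumed rates).
print(simulate_peak_depth(1_000_000, 10_000, 5_000, max_depth=50_000))  # 55000
print(simulate_peak_depth(1_000_000, 10_000, 5_000, max_depth=10**9))   # 505000
```

The numbers only illustrate the shape: with the depth check, the peak stays just above `max_depth`; without it, most of the burst piles up in the queue at once.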
     
    However, I cannot stem the tide of improper RabbitMQ use.
     
    When things go poorly, millions of messages end up in the queues. 
     
    In 3.1.x we saw this regularly cause our clusters to partition.
     
    In 3.1.x and 3.2.x when we would delete large queues (5+ million messages enqueued), this would cause the cluster to become unresponsive, run out of memory, and then crash.
     
    During the 3.1 -> 3.2 upgrade, we had to completely rebuild our clusters. When 3.2 came up, it soon crashed.
     
    In the most recent upgrade, we saw a 3.2.3 cluster in our dev environment crash. I performed an opportunistic upgrade to 3.3.1, because hey... downtime already, so let's see if 3.3.1 addresses some of the issues we've been seeing.
     
     
    After the upgrade, 3.3.1 would not start up at all. I removed /var/lib/rabbitmq/mnesia on all of the nodes and brought RabbitMQ back up.
     
    3.3.1 has been up and running alright so far, but we haven't done another end-to-end test in our development environment in a while. One of these tests can lead to at least a million messages in the queue over a period of time on average.
     
    So, I guess my question is:
     
    If I know that I have people using RabbitMQ like this, and there is nothing I can do to change that fact... what do I do?

    _______________________________________________ 
    rabbitmq-discuss mailing list 
    [hidden email] 
    https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

    Re: RabbitMQ and batch processing

    Laing, Michael P.
    I'll respond inline with our experience:

    On Sun, May 18, 2014 at 2:55 PM, Greg Poirier <[hidden email]> wrote:
    > I mentioned this on Twitter and a couple of people have requested that I bring this up on the mailing list.
     
    > It seems to be a given that RabbitMQ was not designed for the batch processing use case (i.e. using RabbitMQ as a buffer between large serial steps). We have a system in place that attempts to do just that, however.
     
    It is not a 'given' as far as we are concerned. We have some processes that result in a million or more messages being queued within a minute or so. These messages are processed over the ensuing several minutes (for 'dismissals' of news items from individual devices) to several hours (for lower-priority individualized 'offers'). This is the new 'batch'.
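The spread from minutes to hours follows from simple arithmetic: a backlog shrinks at the difference between the consume rate and whatever publish rate continues. A minimal sketch; the consumer throughputs below are assumptions chosen for illustration, not figures from the thread.

```python
import math


def drain_seconds(backlog: int, consume_rate: float,
                  publish_rate: float = 0.0) -> float:
    """Seconds to empty `backlog` messages when consumers take
    `consume_rate` msg/s while producers keep adding `publish_rate` msg/s.
    Returns math.inf if the queue never shrinks."""
    net = consume_rate - publish_rate
    if net <= 0:
        return math.inf
    return backlog / net


# A one-million-message burst, with assumed consumer throughputs:
print(drain_seconds(1_000_000, 2_000) / 60)   # fast consumers: about 8.3 minutes
print(drain_seconds(1_000_000, 100) / 3600)   # slow consumers: about 2.8 hours
```

The same million-message burst is "several minutes" or "several hours" of work depending only on consumer throughput, which is why a large queue by itself is not necessarily a fault.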
     
     
    > I have been working with the developers of the software involved in an attempt to help them redesign around a more ideal use of RabbitMQ (or to help them move to a different bus altogether -- database or something like kafka) and some of them have been able to simply operate in smaller batch sizes (thus keeping their queues relatively small).
     
    We put large message bodies in S3 and pass them by reference. We never use RabbitMQ persistence and compensate for that with replication. For 'real' persistence we use Cassandra. Most importantly, none of our internal users know this, as we provide them with an abstracted interface.
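Passing large bodies by reference is commonly called the claim-check pattern. The sketch below uses an in-memory dictionary as a hypothetical stand-in for S3 and an assumed 4 KiB inline limit; the JSON envelope format is invented for illustration, and a real deployment would call an object-store client instead.

```python
import base64
import json
import uuid
from typing import Dict

INLINE_LIMIT = 4096  # assumed cutoff in bytes


class InMemoryBlobStore:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, body: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = body
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def to_message(body: bytes, store: InMemoryBlobStore) -> bytes:
    """Inline small bodies; store large ones and send only a reference."""
    if len(body) <= INLINE_LIMIT:
        envelope = {"inline": base64.b64encode(body).decode("ascii")}
    else:
        envelope = {"ref": store.put(body)}
    return json.dumps(envelope).encode("utf-8")


def from_message(message: bytes, store: InMemoryBlobStore) -> bytes:
    """Resolve a message produced by `to_message` back to the original body."""
    envelope = json.loads(message)
    if "ref" in envelope:
        return store.get(envelope["ref"])
    return base64.b64decode(envelope["inline"])


store = InMemoryBlobStore()
big = b"x" * 1_000_000
msg = to_message(big, store)
print(len(msg))                         # a few dozen bytes instead of ~1 MB
print(from_message(msg, store) == big)  # True
```

A deep queue then holds small references rather than payloads. Deleting stored objects once they have been consumed is left out of this sketch.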
     
     
    > However, I cannot stem the tide of improper RabbitMQ use.
     
    We try to make it easier to use us than not. We work hard to be the most reliable, fastest, most scalable, most flexible and cheapest component of our customers' technology mix.
     
     
    > When things go poorly, millions of messages end up in the queues.
     
    We target zero length queues. If they grow unexpectedly we: 1) autoscale, 2) shift load, 3) start new regions - usually all those. Then we diagnose.
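The escalation order described here (autoscale, then shift load, then start new regions, then diagnose) can be written as a small policy function. This is a hypothetical sketch: the action names are placeholders for whatever your orchestration actually does, and escalating one step per consecutive interval of growth is an assumed rule that the thread does not specify.

```python
from typing import List, Sequence

# Placeholder action names, in escalation order (hypothetical).
ACTIONS = ("autoscale", "shift_load", "start_new_region")


def plan_response(depths: Sequence[int], target: int = 0) -> List[str]:
    """Given queue-depth samples oldest-first, escalate one step for each
    consecutive interval in which the queue grew while above `target`.
    Any interval without growth resets the streak."""
    streak = 0
    for prev, cur in zip(depths, depths[1:]):
        if cur > prev and cur > target:
            streak += 1
        else:
            streak = 0
    return list(ACTIONS[:streak])


print(plan_response([0, 0, 0]))            # []
print(plan_response([0, 10, 20]))          # ['autoscale', 'shift_load']
print(plan_response([0, 10, 20, 40, 80]))  # ['autoscale', 'shift_load', 'start_new_region']
```

Diagnosis is deliberately not an action here: it happens after capacity has been restored, as the post describes.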
     
     
    > In 3.1.x we saw this regularly cause our clusters to partition.
     
    We have never had a partition in production because we always overprovision RabbitMQ so it can maintain cluster communications. We basically avoid disk IO due to the risk of IO wait interfering with the cluster heartbeat.
     
     
    > In 3.1.x and 3.2.x when we would delete large queues (5+ million messages enqueued), this would cause the cluster to become unresponsive, run out of memory, and then crash.
     
    When we have tested situations like this, we found it best to just wipe out the cluster and restart. Before doing this, we shift the load to other regions operating in parallel.
     
     
    > During the 3.1 -> 3.2 upgrade, we had to completely rebuild our clusters. When 3.2 came up, it soon crashed.
     
    We have not had that problem.
     
     
    > In the most recent upgrade, we saw a 3.2.3 cluster in our dev environment crash. I performed an opportunistic upgrade to 3.3.1, because hey... downtime already, so let's see if 3.3.1 addresses some of the issues we've been seeing.
     
     
    > After the upgrade, 3.3.1 would not start up at all. I removed /var/lib/rabbitmq/mnesia on all of the nodes and brought RabbitMQ back up.
     
    We are not yet in production with 3.3.1, but 3.2.4 is running solidly in stage and we will upgrade stage to 3.3.1 this coming week.
     
     
    > 3.3.1 has been up and running alright so far, but we haven't done another end-to-end test in our development environment in a while. One of these tests can lead to at least a million messages in the queue over a period of time on average.
     
    A million is not that many - depending on size of course. As I said - our target is 0, but really the question is: what's your rate of change? I try to have enough 'headroom' to easily handle the surges - volumes can vary 20 to 1 depending on the news of the moment etc. If a queue builds and stays high we add resources until it goes down and then investigate.
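The "rate of change" question can be made concrete by fitting a slope to recent depth samples and checking consumer capacity against the surge ratio. A sketch under assumptions: samples are `(seconds, depth)` pairs taken from whatever monitoring you already have, and "headroom" is read as consume capacity of at least 20 times the baseline publish rate, matching the 20 to 1 swing mentioned in the reply.

```python
from typing import Sequence, Tuple


def growth_rate(samples: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of queue depth over time, in messages per second.
    Positive means the queue is building; negative means it is draining."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    return num / den


def has_headroom(consume_capacity: float, baseline_publish: float,
                 surge_factor: float = 20.0) -> bool:
    """True if consumers could keep up with a surge of `surge_factor` times
    the baseline publish rate (assumed default: the 20:1 swing)."""
    return consume_capacity >= baseline_publish * surge_factor


print(growth_rate([(0, 0), (10, 100), (20, 200)]))  # 10.0 (building)
print(has_headroom(20_000, 1_000))                  # True
print(has_headroom(10_000, 1_000))                  # False
```

A single depth reading says little; a sustained positive slope is the signal to add resources, as the reply suggests.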
     
     
    > So, I guess my question is:
     
    > If I know that I have people using RabbitMQ like this, and there is nothing I can do to change that fact... what do I do?
     
    You need enough resources. And it is good to be able to autoscale.
     
    A specific suggestion I would make for any internal service provider is to use an AMQP proxy. We locate proxy clusters that we control in our internal customers' computing environments. They publish to and subscribe from these proxies. We control the shoveling/federation of the proxies to/from our core pipelines in regions, redirecting as needed. The proxies are an additional buffer and also allow us to 'launder' incoming messages, e.g. by forcing persistence off.
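"Laundering" a message by forcing persistence off comes down to rewriting the AMQP 0-9-1 delivery mode, where 1 means non-persistent and 2 means persistent. The sketch models message properties with a small hypothetical dataclass rather than a specific client library, so it runs without a broker; it is an illustration of the idea, not the proxy described in the thread.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional

# AMQP 0-9-1 delivery-mode values.
NON_PERSISTENT = 1
PERSISTENT = 2


@dataclass
class MessageProperties:
    """Hypothetical subset of AMQP basic properties, for illustration."""
    delivery_mode: Optional[int] = None
    headers: Dict[str, str] = field(default_factory=dict)


def launder(props: MessageProperties) -> MessageProperties:
    """Return a copy with persistence forced off; the input is not modified."""
    return replace(props, delivery_mode=NON_PERSISTENT)


incoming = MessageProperties(delivery_mode=PERSISTENT, headers={"source": "app-1"})
outgoing = launder(incoming)
print(outgoing.delivery_mode, incoming.delivery_mode)  # 1 2
```

In a client such as pika the corresponding field is `pika.BasicProperties.delivery_mode`; a proxy would apply the same rewrite before republishing toward the core pipeline.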
     
    We also track and account for every message using metadata, and can charge back... We are cheap but not free.
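Per-message accounting for chargeback can be as simple as tallying counts and bytes by an owner tag carried in message metadata. The header name `x-customer`, the "unattributed" bucket, and the per-million price are all assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def tally(messages: Iterable[Tuple[Mapping[str, str], int]]) -> Dict[str, Tuple[int, int]]:
    """Sum (message count, total bytes) per owner, keyed by a header tag."""
    usage: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for headers, size in messages:
        owner = headers.get("x-customer", "unattributed")  # hypothetical header
        usage[owner][0] += 1
        usage[owner][1] += size
    return {owner: (count, nbytes) for owner, (count, nbytes) in usage.items()}


def charge(usage: Mapping[str, Tuple[int, int]],
           price_per_million: float) -> Dict[str, float]:
    """Bill each owner by message count at an assumed per-million price."""
    return {owner: count * price_per_million / 1_000_000
            for owner, (count, _) in usage.items()}


usage = tally([({"x-customer": "news"}, 120), ({"x-customer": "news"}, 80), ({}, 40)])
print(usage)  # {'news': (2, 200), 'unattributed': (1, 40)}
print(charge(usage, 2.0))
```

Keeping an explicit bucket for untagged traffic makes gaps in the metadata visible instead of silently dropping them from the bill.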
     
    Anyway, I hope this helps.
     
    ml
     

  • Original article: https://www.cnblogs.com/rsapaper/p/11008236.html