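RabbitMQ with Python
Preface: This post collects my notes on RabbitMQ. Compared with the previous post on Redis, RabbitMQ feels considerably harder. Some English explanations are quoted along the way, but they are not hard to follow. For downloading and installing RabbitMQ, see my post "redis & rabbitMQ installation".
RabbitMQ is a message queue. Recall the queues we met earlier: the thread queue (data exchange between threads of one process) and the process Queue (exchange between a parent process and its children, or between children of the same parent). Two independent programs cannot exchange data through either of these queues, so they need an intermediary broker, which is the role RabbitMQ plays.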
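As a reminder of the in-process case, here is a minimal sketch of two threads exchanging data through `queue.Queue`. The names `producer`, `consumer`, `run_demo` and the `None` sentinel are illustrative choices, not something from the earlier posts.

```python
import queue
import threading


def producer(q, items):
    """Put each item on the queue, then a None sentinel to signal the end."""
    for item in items:
        q.put(item)
    q.put(None)


def consumer(q, received):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        received.append(item)


def run_demo():
    q = queue.Queue()
    received = []
    threads = [
        threading.Thread(target=producer, args=(q, ["task-1", "task-2", "task-3"])),
        threading.Thread(target=consumer, args=(q, received)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

This works only because both threads live in one process and share the same queue object; two separate programs have nothing to share, which is the gap a broker fills.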
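I. Simple RabbitMQ queue communication
As the diagram above shows, data is first sent to an exchange, and the exchange forwards it to the appropriate queue. The pika module is a Python client library for RabbitMQ. The receiving side registers a callback function that is invoked whenever a message arrives. Once a message has been received by one consumer and acknowledged, it is removed from the queue. With that background, let's look at a simple RabbitMQ queue example.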
Sender:
import pika

# Connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()  # open a channel; different queues run inside it

# Declare the queue
channel.queue_declare(queue='hello1')

# In RabbitMQ a message can never be sent directly to the queue; it always needs to go through an exchange.
# Send data to the queue
channel.basic_publish(exchange='',           # data goes to the exchange first ('' is the default exchange), which forwards it to the queue
                      routing_key='hello1',  # send to the 'hello1' queue
                      body='HelloWorld!!')   # the message
print("[x] Sent 'HelloWorld!!'")
connection.close()
Receiver:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# You may ask why we declare the queue again - we have already declared it in our previous code.
# We could avoid that if we were sure that the queue already exists. For example if send.py program
# was run before. But we're not yet sure which program to run first. In such cases it's a good
# practice to repeat declaring the queue in both programs.
channel.queue_declare(queue='hello1')  # declare the queue so the program works whichever side starts first


def callback(ch, method, properties, body):
    print("-->ch", ch)
    print("-->method", method)
    print("-->properties", properties)
    print("[x] Received %r" % body)
    # With no_ack=False the broker deletes the message only after this explicit ack
    ch.basic_ack(delivery_tag=method.delivery_tag)


# pika 0.x call style; pika 1.0+ uses
# basic_consume(queue='hello1', on_message_callback=callback, auto_ack=False)
channel.basic_consume(callback,      # callback invoked whenever a message arrives
                      queue='hello1',
                      no_ack=False)  # False (the default): the broker waits for an acknowledgment

print('[*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
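Output: (the comments in the code above should make it easy to follow.)
After more testing, I found two things:
- Run rabbitMQ_1_send.py first to send data while rabbitMQ_2_receive.py is not running. When the receiver is started later, it still receives the data.
- Start several receiving clients (e.g. 3), then run the sender: client 1 receives the message. Run the sender again: client 2 receives it. Run it a third time: client 3 receives it.
By default RabbitMQ dispatches the messages sent by the producer (P) to the consumers (C) in turn (round-robin), much like load balancing.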
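The round-robin behaviour observed above can be modelled in a few lines of plain Python. This is only a toy illustration of the dispatch order, not RabbitMQ code; `round_robin_dispatch` and the consumer names are made up for the example.

```python
from itertools import cycle


def round_robin_dispatch(messages, consumers):
    """Assign each message to the next consumer in turn."""
    deliveries = {name: [] for name in consumers}
    turn = cycle(consumers)
    for message in messages:
        deliveries[next(turn)].append(message)
    return deliveries


# Four messages, three consumers: the fourth message wraps around to c1.
result = round_robin_dispatch(["m1", "m2", "m3", "m4"], ["c1", "c2", "c3"])
print(result)
```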
II. ack, explained in English
Looking back at the example above, you will notice the line no_ack=False (the broker waits for the consumer to send an acknowledgment once it has finished with a message; False is the default). Even with my barely-passed CET-4 English, I found the following explanation of ack excellent, so here it is:
Doing a task can take a few seconds. You may wonder what happens if one of the consumers starts a long task and dies with it only partly done. With our current code once RabbitMQ delivers a message to the consumer it immediately removes it from memory. In this case, if you kill a worker we will lose the message it was just processing. We'll also lose all the messages that were dispatched to this particular worker but were not yet handled.
But we don't want to lose any tasks. If a worker dies, we'd like the task to be delivered to another worker.
In order to make sure a message is never lost, RabbitMQ supports message acknowledgments. An ack(nowledgement) is sent back from the consumer to tell RabbitMQ that a particular message had been received, processed and that RabbitMQ is free to delete it.
If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn't processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.
There aren't any message timeouts; RabbitMQ will redeliver the message when the consumer dies. It's fine even if processing a message takes a very, very long time.
Message acknowledgments are turned on by default. In previous examples we explicitly turned them off via the no_ack=True flag. It's time to remove this flag and send a proper acknowledgment from the worker, once we're done with a task.
Using this code we can be sure that even if you kill a worker using CTRL+C while it was processing a message, nothing will be lost. Soon after the worker dies all unacknowledged messages will be redelivered.
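Think of the sender and the receiver as producer and consumer. The producer publishes task A, and a consumer receives and processes it; once it has been processed, RabbitMQ deletes task A from the queue. Now the problem: suppose the consumer receives task A but crashes halfway through processing, and RabbitMQ has already deleted task A from the queue. Task A was never completed, so the task/message is effectively lost. To fix this, the consumer sends an ack to RabbitMQ only after it has received and successfully processed the task. When RabbitMQ gets the ack it knows task A is done and only then deletes it from the queue; if no ack arrives because the consumer's channel or connection went away, it delivers task A to another consumer, until the task is processed successfully.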
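To make the ack-and-requeue idea concrete without needing a running broker, here is a small in-memory model. `AckQueueModel` and its methods are hypothetical names for this sketch; real RabbitMQ tracks unacknowledged deliveries per channel, which this simplifies to a per-consumer label.

```python
from collections import deque


class AckQueueModel:
    """In-memory stand-in for one queue with manual acknowledgments."""

    def __init__(self):
        self.ready = deque()  # messages waiting for delivery
        self.unacked = {}     # delivery_tag -> (consumer, message)
        self._next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, consumer):
        """Hand the next ready message to a consumer; keep it until acked."""
        message = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = (consumer, message)
        return tag, message

    def ack(self, tag):
        """The consumer finished the task, so the message can be deleted."""
        del self.unacked[tag]

    def consumer_died(self, consumer):
        """Requeue every unacknowledged message held by a dead consumer."""
        dead = [tag for tag, (owner, _) in self.unacked.items() if owner == consumer]
        for tag in sorted(dead, reverse=True):  # reverse keeps the original order
            _, message = self.unacked.pop(tag)
            self.ready.appendleft(message)


broker = AckQueueModel()
broker.publish("task A")
tag1, task1 = broker.deliver("worker-1")  # worker-1 starts task A
broker.consumer_died("worker-1")          # ... and crashes before acking
tag2, task2 = broker.deliver("worker-2")  # task A is redelivered
broker.ack(tag2)                          # worker-2 finishes and acks
print(task2, len(broker.ready), len(broker.unacked))
```

The point of the model is that a message leaves the system only through `ack`; a crash merely moves it from the unacked set back to the ready queue.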
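III. Message persistence
As we have seen, if the producer publishes data first, a consumer started afterwards can still receive it.
However, if the producer publishes data and RabbitMQ is then restarted, the consumer cannot receive it.
E.g. if the RabbitMQ server goes down while messages are in transit, the queue and its messages no longer exist afterwards. This is where message persistence comes in: it keeps the queue and its messages on disk so that they survive a server restart or crash. Here is the English explanation of message persistence: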
We have learned how to make sure that even if the consumer dies, the task isn't lost (this is the default; to disable it, use no_ack=True). But our tasks will still be lost if the RabbitMQ server stops.
When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to. Two things are required to make sure that messages aren't lost: we need to mark both the queue and messages as durable.
First, we need to make sure that RabbitMQ will never lose our queue. In order to do so, we need to declare it as durable:
channel.queue_declare(queue='hello', durable=True)
Although this command is correct by itself, it won't work in our setup. That's because we've already defined a queue called hello which is not durable. RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that. But there is a quick workaround - let's declare a queue with a different name, for example task_queue:
channel.queue_declare(queue='task_queue', durable=True)
This queue_declare change needs to be applied to both the producer and consumer code.
At that point we're sure that the task_queue queue won't be lost even if RabbitMQ restarts. Now we need to mark our messages as persistent - by supplying a delivery_mode property with a value 2.
channel.basic_publish(exchange='',
                      routing_key="task_queue",
                      body=message,
                      properties=pika.BasicProperties(
                          delivery_mode=2,  # make message persistent
                      ))
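The English above explains message persistence well. It takes two steps:
- Make the queue durable. For the hello queue: channel.queue_declare(queue='hello', durable=True)
- Make the messages in the queue persistent: properties=pika.BasicProperties(delivery_mode=2)
One thing to watch out for:
If your code declares the hello queue as durable while a queue named hello already exists on the broker with different parameters (for example the non-durable one from an earlier run), running the code raises an error.
The reason: RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error.
To get around this, declare the queue under a different name (queue_name) from the one that already exists.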
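The two rules above (both the queue and the message must be marked, and an existing queue cannot be redeclared with different parameters) can be captured in a simplified in-memory model. `DurabilityModel` is a hypothetical class written for this sketch and raises a plain `ValueError` where a real broker would close the channel with an error; actual durability guarantees have more nuance than this.

```python
class DurabilityModel:
    """Toy broker that tracks which queues and messages survive a restart."""

    def __init__(self):
        # queue name -> {"durable": bool, "messages": [(body, delivery_mode)]}
        self.queues = {}

    def queue_declare(self, name, durable=False):
        existing = self.queues.get(name)
        if existing is None:
            self.queues[name] = {"durable": durable, "messages": []}
        elif existing["durable"] != durable:
            # Redeclaring with different parameters is rejected.
            raise ValueError(
                "queue %r already exists with durable=%s" % (name, existing["durable"])
            )

    def publish(self, name, body, delivery_mode=1):
        self.queues[name]["messages"].append((body, delivery_mode))

    def restart(self):
        """Keep only durable queues, and within them only persistent messages."""
        survivors = {}
        for name, q in self.queues.items():
            if q["durable"]:
                kept = [(b, m) for b, m in q["messages"] if m == 2]
                survivors[name] = {"durable": True, "messages": kept}
        self.queues = survivors


broker = DurabilityModel()
broker.queue_declare("hello")                     # non-durable
broker.queue_declare("task_queue", durable=True)  # durable
broker.publish("hello", "lost with its queue", delivery_mode=2)
broker.publish("task_queue", "transient", delivery_mode=1)
broker.publish("task_queue", "persistent", delivery_mode=2)
broker.restart()
remaining = {name: [b for b, _ in q["messages"]] for name, q in broker.queues.items()}
print(remaining)
```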
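IV. Fair dispatch
If RabbitMQ simply hands messages to the consumers in turn without considering their load, a consumer on a low-spec machine may pile up more messages than it can process, while a consumer on a powerful machine stays mostly idle. To solve this, set prefetch_count=1 on each consumer (channel.basic_qos(prefetch_count=1)), which tells RabbitMQ: do not send me a new message until I have finished processing, and acknowledged, the current one.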
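A rough simulation shows the difference between blind round-robin and prefetch-style dispatch. It is a sketch under simple assumptions: each worker has a fixed processing cost per message, and a "fair" broker gives the next message to whichever worker becomes idle first. The function names and the cost values are invented for the example.

```python
import heapq


def round_robin(n_messages, costs):
    """Blind round-robin: message i goes to worker i % len(costs)."""
    counts = [0] * len(costs)
    for i in range(n_messages):
        counts[i % len(costs)] += 1
    return counts


def fair_dispatch(n_messages, costs):
    """prefetch_count=1 style: a worker gets a new message only when idle."""
    counts = [0] * len(costs)
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0, w) for w in range(len(costs))]
    heapq.heapify(free_at)
    for _ in range(n_messages):
        t, w = heapq.heappop(free_at)
        counts[w] += 1
        heapq.heappush(free_at, (t + costs[w], w))
    return counts


# Worker 0 is slow (5 time units per message), worker 1 is fast (1 unit).
costs = [5, 1]
print(round_robin(12, costs))    # messages per worker under round-robin
print(fair_dispatch(12, costs))  # messages per worker under fair dispatch
```

Under round-robin both workers get six messages regardless of speed; under fair dispatch the fast worker ends up handling most of them.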
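Complete code with message persistence + fair dispatch
Producer:
Consumer:
When I ran the program above, one line in the consumer's callback puzzled me: ch.basic_ack(delivery_tag=method.delivery_tag). The consumer still receives messages without it, so what is it for?
With persistent messages on the producer side, the consumer needs ch.basic_ack(delivery_tag=method.delivery_tag): after a message has been consumed, the consumer sends an ack, and only then does the server delete the message from the queue. Without the ack, messages are still delivered, but RabbitMQ keeps them as unacknowledged and redelivers them once the consumer disconnects.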
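The following is only a sketch of how the consumer-side pieces named in the text (durable queue, prefetch_count=1, basic_ack) might fit together; it is not the original listing. `QUEUE_NAME`, `handle_task`, `setup_consumer` and `RecordingChannel` are hypothetical names, and `RecordingChannel` is a stand-in that merely records calls so the sketch runs without a broker. The `basic_consume` call follows the older pika 0.x style used elsewhere in this post.

```python
from types import SimpleNamespace

QUEUE_NAME = "task_queue"  # assumed queue name, taken from the quoted tutorial text


def handle_task(ch, method, properties, body):
    """Process one task, then acknowledge it so the broker can delete it."""
    print(" [x] Received %r" % body)
    # ... the actual work would happen here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)


def setup_consumer(channel):
    """Declare the durable queue, enable fair dispatch, register the callback."""
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    channel.basic_qos(prefetch_count=1)
    # pika 0.x call style; pika 1.0+ expects
    # basic_consume(queue=QUEUE_NAME, on_message_callback=handle_task)
    channel.basic_consume(handle_task, queue=QUEUE_NAME)


class RecordingChannel:
    """Hypothetical stand-in for a pika channel; it only records calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record


channel = RecordingChannel()
setup_consumer(channel)
# Simulate the broker delivering one message with delivery tag 7.
handle_task(channel, SimpleNamespace(delivery_tag=7), None, b"demo task")
call_names = [name for name, _, _ in channel.calls]
print(call_names)
```

With a real pika channel in place of `RecordingChannel`, `setup_consumer(channel)` would be followed by `channel.start_consuming()`.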
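V. Publish/subscribe
The earlier examples were basically one-to-one sending and receiving: a message goes only to one specified queue. Sometimes, though, you want a message to reach all queues, like a broadcast, and that is what exchanges are for. PS: if you are interested in Redis publish/subscribe, see my post "python之redis".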
An exchange is a very simple thing. On one side it receives messages from producers and the other side it pushes them to queues. The exchange must know exactly what to do with a message it receives. Should it be appended to a particular queue? Should it be appended to many queues? Or should it get discarded? The rules for that are defined by the exchange type.
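An exchange is declared with a type, which decides which queues qualify to receive a message:
fanout: every queue bound to the exchange receives the message
direct: the queues whose binding key exactly equals the message's routing key receive the message
topic: every queue whose binding key (which may be a pattern here) matches the message's routing key receives the message
Pattern symbols: a routing key is a list of words separated by dots; # matches zero or more words, * matches exactly one word
E.g. #.a matches a.a, aa.a, aaa.a, etc.
*.a matches a.a, b.a, c.a, etc.
Note: a topic exchange with the binding key # behaves like fanout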
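The topic pattern rules can be checked with a small matcher. This is a sketch of the matching semantics as I understand them (dot-separated words, `*` for exactly one word, `#` for zero or more words), written in plain Python; `topic_matches` is a made-up helper, not part of pika.

```python
def topic_matches(pattern, routing_key):
    """Return True if a topic binding pattern matches a routing key.

    Words are separated by dots; '*' stands for exactly one word and
    '#' stands for zero or more words.
    """

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow anywhere from 0 to len(k) words
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


print(topic_matches("*.a", "b.a"))      # one word before 'a'
print(topic_matches("*.a", "x.y.a"))    # two words before 'a'
print(topic_matches("#.a", "x.y.a"))    # any number of words before 'a'
print(topic_matches("#", "kern.critical"))
```

Because `#` on its own matches every routing key, a queue bound with it receives everything, which is why that binding behaves like fanout.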
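Below I go through fanout, direct, and topic in turn:
1. fanout
fanout: every queue bound to the exchange receives the message
Sender:
Receiver:
Two points to note:
- fanout means broadcast: on the sender, routing_key='' (left empty for fanout, since the exchange ignores it)
- The receiver contains the line result=channel.queue_declare(exclusive=True). It declares a queue without naming it (the queue exists only to receive the broadcast), so RabbitMQ assigns a random queue name; exclusive=True makes RabbitMQ delete the queue automatically once the consumer using it disconnects. (pika 1.0+ requires the name argument, so pass queue='' there.)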
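As a broker-free illustration of the broadcast behaviour, here is a toy model of a fanout exchange. `FanoutExchange` and the `amq.gen-N` queue names are placeholders invented for this sketch (real server-generated names are random strings).

```python
class FanoutExchange:
    """Toy model of a fanout exchange: every bound queue gets every message."""

    def __init__(self):
        self.queues = {}  # queue name -> list of delivered messages

    def bind(self, queue_name):
        self.queues.setdefault(queue_name, [])

    def publish(self, body, routing_key=""):
        # A fanout exchange ignores the routing key entirely.
        for messages in self.queues.values():
            messages.append(body)


logs = FanoutExchange()
logs.bind("amq.gen-1")  # receiver 1's exclusive queue
logs.bind("amq.gen-2")  # receiver 2's exclusive queue
logs.publish("info: Hello World!")
logs.bind("amq.gen-3")  # bound after the publish, so it misses that message
print(logs.queues)
```

The late-bound queue stays empty, which mirrors the broadcast nature of fanout: only queues bound at publish time get a copy.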
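2. Receiving messages selectively (exchange type=direct)
RabbitMQ also supports sending by keyword: a queue is bound with a keyword, the sender publishes data to the exchange with a keyword, and the exchange uses that keyword to decide which queue(s) should get the data.
Sender:
Receiver:
To be honest, I was completely lost when I first read the code. Below are screenshots of my test in cmd (they make it easier to follow), with one sender and two receivers (start the receivers first, then the sender):
Sender:
Receiver 1:
Receiver 2: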
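Since the screenshots are not reproduced here, the one-sender, two-receiver scenario can be sketched with a toy model of a direct exchange. `DirectExchange`, the queue names, and the severity keywords are assumptions chosen for illustration.

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of a direct exchange: exact match on the routing key."""

    def __init__(self):
        self.bindings = defaultdict(set)  # binding key -> set of queue names
        self.queues = {}                  # queue name -> delivered messages

    def bind(self, queue_name, binding_key):
        self.bindings[binding_key].add(queue_name)
        self.queues.setdefault(queue_name, [])

    def publish(self, routing_key, body):
        # Messages whose key has no binding are simply dropped.
        for queue_name in self.bindings.get(routing_key, ()):
            self.queues[queue_name].append(body)


exchange = DirectExchange()
for severity in ("info", "warning", "error"):
    exchange.bind("receiver-1", severity)  # receiver 1 wants everything
exchange.bind("receiver-2", "error")       # receiver 2 wants errors only

exchange.publish("info", "service started")
exchange.publish("error", "disk full")
exchange.publish("debug", "nobody is bound to this key")
print(exchange.queues)
```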
3. Finer-grained message filtering with topic (for reference)
Although using the direct exchange improved our system, it still has limitations - it can't do routing based on multiple criteria.
In our logging system we might want to subscribe to not only logs based on severity, but also based on the source which emitted the log. You might know this concept from the syslog unix tool, which routes logs based on both severity (info/warn/crit...) and facility (auth/cron/kern...).
That would give us a lot of flexibility - we may want to listen to just critical errors coming from 'cron' but also all logs from 'kern'.
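My English is not great; with the help of the (rather poor) Youdao translator plus my own understanding, I get the gist of the passage above.
Example: system errors are sent to A, and MySQL errors are sent to B. For B, receiving MySQL errors could be done with selective receiving (exchange type=direct) by binding the keyword error. Now B has a new requirement: it does not want every error message, only specific ones. Filtering again within one category of messages is what the finer-grained topic exchange provides.
Sender:
Receiver: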
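VI. RPC (Remote Procedure Call)
For the concept of RPC, see what I found on Baidu (it is much like the FTP program I wrote earlier: the client sends a command and the server returns the corresponding information):
Now the main topic, RPC over RabbitMQ. I found it hard at first, but after working through it I think the idea is really instructive, and the example code is very well written!
After the server receives a message from the client, it calls its callback and executes the task, and then has to send the result back to the client. But how does the server return the information? And if several clients are connected, how does the server know which one to reply to?
The RPC server listens on rpc_queue by default. The reply certainly cannot be published to rpc_queue (that queue carries data from clients to the server).
A reasonable plan is for the server to create another queue and return the information to the client through it. But then the queue is created by the server, so the client does not know the queue name, and without the name it cannot receive anything.
Solution:
When sending a command, the client also tells the server which queue to return the result through once the task is done (the reply_to property). The client then just listens on that queue.
Client: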
import pika
import uuid


class FibonacciRpcClient(object):
    def __init__(self):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))

        self.channel = self.connection.channel()
        # Create a randomly named queue to listen on for results
        # (pika 1.0+ requires the name argument: queue_declare(queue='', exclusive=True))
        result = self.channel.queue_declare(exclusive=True)
        self.callback_queue = result.method.queue  # the queue name

        # pika 0.x call style; pika 1.0+ uses on_message_callback= and auto_ack=
        self.channel.basic_consume(self.on_response,  # called whenever the server's reply arrives
                                   no_ack=True,
                                   queue=self.callback_queue)

    def on_response(self, ch, method, props, body):  # callback
        # Commands may run at different speeds: command 1 may be sent before command 2,
        # yet the result of command 2 may come back first. Without the check below,
        # the client would mistake the result of command 2 for that of command 1.
        if self.corr_id == props.correlation_id:
            self.response = body

    def call(self, n):
        self.response = None  # the reply to this command
        self.corr_id = str(uuid.uuid4())  # identifies this particular command
        self.channel.basic_publish(exchange='',
                                   routing_key='rpc_queue',  # the client sends commands to rpc_queue
                                   properties=pika.BasicProperties(
                                       reply_to=self.callback_queue,  # queue the result should be returned to
                                       correlation_id=self.corr_id,
                                   ),
                                   body=str(n))
        while self.response is None:
            self.connection.process_data_events()  # poll the connection for incoming data (non-blocking)
        return int(self.response)


fibonacci_rpc = FibonacciRpcClient()

print(" [x] Requesting fib(30)")
response = fibonacci_rpc.call(30)
print(" [.] Got %r" % response)
Server:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
    host='localhost'))

channel = connection.channel()

channel.queue_declare(queue='rpc_queue')


def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)


def on_request(ch, method, props, body):
    n = int(body)

    print(" [.] fib(%s)" % n)
    response = fib(n)  # the result to send back to the client

    # Publish the result to the props.reply_to queue (named by the client when it sent the command)
    ch.basic_publish(exchange='',
                     routing_key=props.reply_to,
                     # echo the correlation_id: each command carries its own random identifier
                     properties=pika.BasicProperties(correlation_id=props.correlation_id),
                     body=str(response))
    ch.basic_ack(delivery_tag=method.delivery_tag)  # acknowledge the request so it is removed from rpc_queue


channel.basic_qos(prefetch_count=1)  # fair dispatch
# pika 0.x call style; pika 1.0+ uses basic_consume(queue='rpc_queue', on_message_callback=on_request)
channel.basic_consume(on_request,  # called whenever a request arrives
                      queue='rpc_queue')

print(" [x] Awaiting RPC requests")
channel.start_consuming()
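The role of correlation_id can also be seen without a broker. The sketch below models requests and replies as plain dictionaries and shows that replies arriving out of order are still paired with the right request. The helpers `make_request`, `serve` and `match_replies` are hypothetical, and `fib` is written iteratively here only to keep the demo fast.

```python
import uuid


def fib(n):
    """Iterative Fibonacci, equivalent in result to the recursive version."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def make_request(n):
    """Client side: tag each request with its own random correlation_id."""
    return {"correlation_id": str(uuid.uuid4()), "body": str(n)}


def serve(request):
    """Server side: compute the result and echo the correlation_id back."""
    return {
        "correlation_id": request["correlation_id"],
        "body": str(fib(int(request["body"]))),
    }


def match_replies(pending, replies):
    """Pair each request with its reply by correlation_id, whatever the arrival order."""
    by_id = {reply["correlation_id"]: reply["body"] for reply in replies}
    return [int(by_id[req["correlation_id"]]) for req in pending]


pending = [make_request(10), make_request(5)]
replies = [serve(req) for req in reversed(pending)]  # replies arrive in reverse order
results = match_replies(pending, replies)
print(results)
```

The client in the listing above keeps only one request in flight and therefore compares against a single `corr_id`; a dictionary keyed by correlation_id, as here, is one way to extend the same idea to several outstanding requests.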