Sending large message to RabbitMQ with Pika 0.9.5: message is silently dropped by Rabbit - rabbitmq

I've got a bunch of celery tasks that take their results and post them to a RabbitMQ message queue. The results that get posted can become quite large (up to a few meg). Opinion is mixed as to whether putting large amounts of data in a RabbitMQ message is a good idea, but I've seen this work in other situations and as long as memory is kept under control, it seems to work.
However, for my current set of tasks, rabbit appears to be just dropping messages that seem to be too big. I've reduced it down to a fairly simple test case:
#!/usr/bin/env python
import string
import random
import pika
import os
qname='examplequeue'
connection = pika.BlockingConnection(pika.ConnectionParameters(
host='mq.example.com'))
channel = connection.channel()
channel.queue_declare(queue=qname,durable=True)
N=100000
body = ''.join(random.choice(string.ascii_uppercase) for x in range(N))
promise = channel.basic_publish(exchange='', routing_key=qname, body=body, mandatory=0, immediate=0, properties=pika.BasicProperties(content_type="text/plain",delivery_mode=2))
print " [x] Sent 'Hello World!'"
connection.close()
I have a 3-node RabbitMQ cluster, and mq.example.com round-robins to each node. Client is using Pika 0.9.5 on Ubuntu 12.04 and the RabbitMQ cluster is running RabbitMQ 2.8.7 on Erlang R14B04.
Executing this script prints the print statement and exits without any exceptions being raised. The message never appears in RabbitMQ.
Changing N to 10000 makes it work as expected.
Why?

I suppose you have problem with tcp-backpressure mechanizm in RabbitMq. You can read about http://www.rabbitmq.com/memory.html.
I see two ways to solve this problem:
Add tcp-callback and make reconnect every tcp-call from rabbit
Use compressing messages before sending it to rabbit, It will make easier push to rabbit.
def compress(s):
return binascii.hexlify(zlib.compress(s))
def decompress(s):
return zlib.decompress(binascii.unhexlify(s))

This is what I do to send and receive packets. It is somewhat more efficient than hexlify, because base64 may use one byte where two bytes are needed by hexlify to represent one character.
import zlib
import base64
def hexpress(send: str):
print(f"send: {send}")
bsend = send.encode()
print(f"byte-encoded send: {bsend}")
zbsend = zlib.compress(bsend)
print(f"zipped-byte-encoded-send: {zbsend}")
hzbsend = base64.b64encode(zbsend)
print(f"hex-zip-byte-encoded-send: {hzbsend}")
shzbsend = hzbsend.decode()
print(f"string-hex-zip-byte-encoded-send: {shzbsend}")
return shzbsend
def hextract(recv: str):
print(f"string-hex-zip-byte-encoded-recv: {recv}")
zbrecv = base64.b64decode(recv)
print(f"zipped-byte-encoded-recv: {zbrecv}")
brecv = zlib.decompress(zbrecv)
print(f"byte-encoded-recv: {brecv}")
recv = brecv.decode()
print(f"recv: {recv}")
return recv
print("sending ...\n")
send = "hello this is dog"
packet = hexpress(send)
print("\nover the wire -------->>>>>\n")
print("receiving...\n")
recv = hextract(packet)

Related

Task queues and result queues with Celery and Rabbitmq

I have implemented Celery with RabbitMQ as Broker. I rely on Celery v4.4.7 since I have read that v5.0+ doesn't support RabbitMQ anymore. RabbitMQ is a MUST in my case.
Everything has been containerized then deployed as pods within Kubernetes 1.19. I am able to execute long running tasks and everything apparently looks fine at first glance. However, I have few concerns which require your expertise.
I have declared inbound and outbound queues but Celery created his owns and I do not see any message within those queues (inbound or outbound) :
inbound_queue = "_IN"
outbound_queue = "_OUT"
app = Celery()
app.conf.update(
broker_url = 'pyamqp://%s//' % path,
broker_heartbeat = None,
broker_connection_timeout = int(timeout)
result_backend = 'rpc://',
result_persistent = True,
task_queues = (
Queue(algorithm_queue, Exchange(inbound_queue), routing_key='default', auto_delete=False),
Queue(result_queue, Exchange(outbound_queue), routing_key='default', auto_delete=False),
),
task_default_queue = inbound_queue,
task_default_exchange = inbound_exchange,
task_default_exchange_type = 'direct',
task_default_routing_key = 'default',
)
#app.task(bind=True,
name='osmq.tasks.add',
queue=inbound_queue,
reply_to = outbound_queue,
autoretry_for=(Exception,),
retry_kwargs={'max_retries': 5, 'countdown': 2})
def execute(self, data):
<method_implementation>
I have implemented callbacks to get results back via REST APIs. However, randomly, it can return or not some results when the status is successfull. This is probably related to message persistency. In details, when I implement flower API to get info, status is successfull and the result is partially displayed (shortened json messages) - when I call AsyncResult, for the same status, result is either None or the right one. I do not understand the mechanism between rabbitmq queues and kombu which seems to cache the resulting message. I must guarantee to retrieve results everytime the task has been successfully executed.
def callback(uuid):
task = app.AsyncResult(uuid)
Specifically, it was that Celery 5.0+ did not support amqp:// as a result back end anymore. However, as your example, rpc:// is supported.
The relevant snippet is here: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/index.html#rabbitmq
We tend to always ignore_results=True in our implementation, so I can't give any practical tips of how to use rpc://, other than to infer that any response is put on an application-specific queue, instead of being able to put on a specified queue (or even different broker / rabbitmq instance) via amqp://.

Is there a way to let Pika BlockingConnection consume one message at a time?

import pika
params = pika.URLParameters([URL])
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue='test', durable=True)
channel.basic_consume(do_things, queue='test')
try:
channel.start_consuming()
except KeyboardInterrupt:
channel.stop_consuming()
except:
rollbar.report_exc_info()
finally:
channel.close()
connection.close()
This is the code I used to consume messages. The problem is, say I have 100 messages in the test queue. Once I start the consumer, it will get all 100 messages and process it one by one, i.e. the queue status became: message ready: 0, unacked: 100, total: 100. As a result, I wouldn't be able to spin up new consumers to process the 100 message in parallel, because there are no messages left for new consumers (all have been taken by the existing consumer, although most messages haven't be processed). Is there a way to let the consumer to only take 1 message at a time?
You need to specify the Quality of Service which is desired for your channel.
In your case, the prefetch_count is the parameter you need.
import pika
params = pika.URLParameters([URL])
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.basic_qos(prefetch_count=1)

celery - Programmatically list queues

How can I programmatically, using Python code, list current queues created on a RabbitMQ broker and the number of workers connected to them? It would be the equivalent to:
rabbitmqctl list_queues name consumers
I do it this way and display all the queues and their details (messages ready, unacknowledged etc.) on a web page -
import kombu
conn = kombu.Connection(broker_url)# example 'amqp://guest:guest#localhost:5672/'
conn.connect()
client = conn.get_manager()
queues = client.get_queues('/')#assuming vhost as '/'
You will need kombu to be installed and queues will be a dictionary with keys representing the queue names.
I think I got this when digging through the code of celery flower (The tool used for monitoring celery).
Update: As pointed out by #zaq178miami, you will also need the management plugin that has the http API. I had forgotten that I had enabled than in rabbitmq.
This way did it for me:
def get_queue_info(queue_name):
with celery.broker_connection() as conn:
with conn.channel() as channel:
return channel.queue_declare(queue_name, passive=True)
This will return a namedtuple with the name, number of messages waiting and consumers of that queue.
ksrini answer is correct too and can be used when you require more information about a queue.
Thanks to Ask Solem who gave me the hint.
As a rabbitmq client you can use pika. However it doesn't have option for list_queues. The easiest solution would be calling rabbitmqctl command from python using subprocess:
import subprocess
command = "/usr/local/sbin/rabbitmqctl list_queues name consumers"
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
print process.communicate()
I would use simply this:
Just replace the user(default= guest), passwd(default= guest) and port with your values.
import requests
import json
def call_rabbitmq_api(host, port, user, passwd):
url = 'https://%s:%s/api/queues' % (host, port)
r = requests.get(url, auth=(user,passwd),verify=False)
return r
def get_queue_name(json_list):
res = []
for json in json_list:
res.append(json["name"])
return res
if __name__ == '__main__':
host = 'rabbitmq_host'
port = 55672
user = 'guest'
passwd = 'guest'
res = call_rabbitmq_api(host, port, user, passwd)
print ("--- dump json ---")
print (json.dumps(res.json(), indent=4))
print ("--- get queue name ---")
q_name = get_queue_name(res.json())
print (q_name)
Referred from here: https://gist.github.com/hiroakis/5088513#file-example_rabbitmq_api-py-L2

Twisted IRC Bot connection lost repeatedly to localhost

I am trying to implement an IRC Bot on a local server. The bot that I am using is identical to the one found at Eric Florenzano's Blog. This is the simplified code (which should run)
import sys
import re
from twisted.internet import reactor
from twisted.words.protocols import irc
from twisted.internet import protocol
class MomBot(irc.IRCClient):
def _get_nickname(self):
return self.factory.nickname
nickname = property(_get_nickname)
def signedOn(self):
print "attempting to sign on"
self.join(self.factory.channel)
print "Signed on as %s." % (self.nickname,)
def joined(self, channel):
print "attempting to join"
print "Joined %s." % (channel,)
def privmsg(self, user, channel, msg):
if not user:
return
if self.nickname in msg:
msg = re.compile(self.nickname + "[:,]* ?", re.I).sub('', msg)
prefix = "%s: " % (user.split('!', 1)[0], )
else:
prefix = ''
self.msg(self.factory.channel, prefix + "hello there")
class MomBotFactory(protocol.ClientFactory):
protocol = MomBot
def __init__(self, channel, nickname='YourMomDotCom', chain_length=3,
chattiness=1.0, max_words=10000):
self.channel = channel
self.nickname = nickname
self.chain_length = chain_length
self.chattiness = chattiness
self.max_words = max_words
def startedConnecting(self, connector):
print "started connecting on {0}:{1}"
.format(str(connector.host),str(connector.port))
def clientConnectionLost(self, connector, reason):
print "Lost connection (%s), reconnecting." % (reason,)
connector.connect()
def clientConnectionFailed(self, connector, reason):
print "Could not connect: %s" % (reason,)
if __name__ == "__main__":
chan = sys.argv[1]
reactor.connectTCP("localhost", 6667, MomBotFactory('#' + chan,
'YourMomDotCom', 2, chattiness=0.05))
reactor.run()
I added the startedConnection method in the client factory, which it is reaching and printing out the proper address:host. It then disconnects and enters the clientConnectionLost and prints the error:
Lost connection ([Failure instance: Traceback (failure with no frames):
<class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]), reconnecting.
If working properly it should log into the appropriate channel, specified as the first arg in the command (e.g. python module2.py botwar. would be channel #botwar.). It should respond with "hello there" if any one in the channel sends anything.
I have NGIRC running on the server, and it works if I connect from mIRC or any other IRC client.
I am unable to find a resolution as to why it is continually disconnecting. Any help on why would be greatly appreciated. Thank you!
One thing you may want to do is make sure you will see any error output produced by the server when your bot connects to it. My hunch is that the problem has something to do with authentication, or perhaps an unexpected difference in how ngirc handles one of the login/authentication commands used by IRCClient.
One approach that almost always applies is to capture a traffic log. Use a tool like tcpdump or wireshark.
Another approach you can try is to enable logging inside the Twisted application itself. Use twisted.protocols.policies.TrafficLoggingFactory for this:
from twisted.protocols.policies import TrafficLoggingFactory
appFactory = MomBotFactory(...)
logFactory = TrafficLoggingFactory(appFactory, "irc-")
reactor.connectTCP(..., logFactory)
This will log output to files starting with "irc-" (a different file for each connection).
You can also hook directly into your protocol implementation, at any one of several levels. For example, to display any bytes received at all:
class MomBot(irc.IRCClient):
def dataReceived(self, bytes):
print "Got", repr(bytes)
# Make sure to up-call - otherwise all of the IRC logic is disabled!
return irc.IRCClient.dataReceived(self, bytes)
With one of those approaches in place, hopefully you'll see something like:
:irc.example.net 451 * :Connection not registered
which I think means... you need to authenticate? Even if you see something else, hopefully this will help you narrow in more closely on the precise cause of the connection being closed.
Also, you can use tcpdump or wireshark to capture the traffic log between ngirc and one of the working IRC clients (eg mIRC) and then compare the two logs. Whatever different commands mIRC is sending should make it clear what changes you need to make to your bot.

Pika + RabbitMQ: setting basic_qos to prefetch=1 still appears to consume all messages in the queue

I've got a python worker client that spins up a 10 workers which each hook onto a RabbitMQ queue. A bit like this:
#!/usr/bin/python
worker_count=10
def mqworker(queue, configurer):
connection = pika.BlockingConnection(pika.ConnectionParameters(host='mqhost'))
channel = connection.channel()
channel.queue_declare(queue=qname, durable=True)
channel.basic_consume(callback,queue=qname,no_ack=False)
channel.basic_qos(prefetch_count=1)
channel.start_consuming()
def callback(ch, method, properties, body):
doSomeWork();
ch.basic_ack(delivery_tag = method.delivery_tag)
if __name__ == '__main__':
for i in range(worker_count):
worker = multiprocessing.Process(target=mqworker)
worker.start()
The issue I have is that despite setting basic_qos on the channel, the first worker to start accepts all the messages off the queue, whilst the others sit there idle. I can see this in the rabbitmq interface, that even when I set worker_count to be 1 and dump 50 messages on the queue, all 50 go into the 'unacknowledged' bucket, whereas I'd expect 1 to become unacknowledged and the other 49 to be ready.
Why isn't this working?
I appear to have solved this by moving where basic_qos is called.
Placing it just after channel = connection.channel() appears to alter the behaviour to what I'd expect.