This is what I'm getting from the RabbitMQ message broker:
=INFO REPORT==== 13-Jan-2015::12:40:24 ===
vm_memory_high_watermark set. Memory used:478063864 allowed:415868518
=WARNING REPORT==== 13-Jan-2015::12:40:24 ===
memory resource limit alarm set on node 'rabbit@matchpointgps-141110'.
* Publishers will be blocked until this alarm clears *
This has happened twice on our server.
I'm still not able to find the correct solution for this.
We had a similar issue when the queue lengths got very high and it tried to write the messages to disk but couldn't do it fast enough. In our testing, we did not have this problem when we used SSD drives.
The easiest solution for us was to have the messages written to disk immediately by marking them persistent (delivery_mode=2) and declaring the queues durable. This was also a good idea because if RabbitMQ restarted, the data in the queues wouldn't be lost.
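For illustration, a minimal pika sketch of a durable queue plus persistent publishing (the queue name and message body are just placeholders):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# durable queue: the queue definition itself survives a broker restart
channel.queue_declare(queue="tasks", durable=True)

# persistent message: delivery_mode=2 asks the broker to write it to disk
channel.basic_publish(
    exchange="",
    routing_key="tasks",
    body=b"hello",
    properties=pika.BasicProperties(delivery_mode=2),
)
connection.close()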
Take a look at this blog post on how RabbitMQ queues use memory: http://www.rabbitmq.com/blog/2011/10/27/performance-of-queues-when-less-is-more/
TL;DR try to keep your queues as empty as possible
Finally found a better configuration for the RabbitMQ queues.
I've added the following line to the Celery config, since Celery was creating an additional result queue for each task.
CELERY_IGNORE_RESULT = True
I also created a separate queue for my task; a sketch of both settings is below.
This keeps memory free and ready to take on heavier, longer-running tasks.
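A minimal sketch of what that Celery (3.x-style) configuration might look like; the queue and task names are just placeholders:

from kombu import Exchange, Queue

CELERY_IGNORE_RESULT = True  # don't store results, so no extra result queues pile up

CELERY_QUEUES = (
    Queue("heavy_tasks", Exchange("heavy_tasks"), routing_key="heavy_tasks"),
)
CELERY_ROUTES = {
    "myapp.tasks.heavy_task": {"queue": "heavy_tasks"},  # hypothetical task name
}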
More information:
https://denibertovic.com/posts/celery-best-practices/
I had a similar issue with a RabbitMQ server running in Docker. Everything was blocked and Rabbit wouldn't accept any messages.
All I did was reconfigure the disk free space limit:
rabbitmqctl set_disk_free_limit 1GB
This changed the "xx GiB low watermark" and solved the problem.
If you are using a "bitnami/rabbitmq" Docker image, you can set this variable with:
RABBITMQ_DISK_FREE_ABSOLUTE_LIMIT: "1GB"
erlang version = 1:24.0.2-1
rabbitmq-server version = 3.8.16-1
Recently installed the latest RabbitMQ on Ubuntu 20.
I verified that all was working fine and the consumer was consuming notifications from the message queue as required.
After approximately a day, RabbitMQ crashed as there was no disk space left.
After analysis I found that around 10 GB was consumed by msg_store_transient; restarting RabbitMQ solved the issue.
But after a day it happened again.
Can someone help me further?
Most likely you are consuming messages without sending back a basic_ack; see, for example, the ch.basic_ack call in the sketch after the checklist below.
What to do:
check the unacked messages
check if you are using too many non-persistent messages
check if you are using too many non-durable queues
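A minimal pika consumer sketch with explicit acking (the queue name is just a placeholder):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

def callback(ch, method, properties, body):
    # ... process the message ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # tell RabbitMQ the message is done

channel.basic_consume(queue="notifications", on_message_callback=callback, auto_ack=False)
channel.start_consuming()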
Issue is fixed:
We had a high number of Ready messages, because of which the .rdq files were taking up huge space.
There was a bug in the code: it was listening to only one queue, not all of them (see the sketch below).
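For illustration, a single pika channel can subscribe to several queues so nothing is left piling up; the queue names here are hypothetical:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

def callback(ch, method, properties, body):
    # ... process the message ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

# subscribe to every queue the application publishes to, not just one of them
for queue in ("notifications", "emails", "audit"):
    channel.basic_consume(queue=queue, on_message_callback=callback)

channel.start_consuming()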
"rabbitmqctl list_connections" shows as running but on the UI in the connections tab, under client properties, i see "connection.blocked: true".
I can see that messages are in queued in RabbitMq and the connection is in idle state.
I am running Airflow with Celery. My jobs are not executing at all.
Is this the reason why jobs are not executing?
How do I resolve the issue so that my jobs start running?
I'm experiencing the same kind of issue just using Celery.
It seems that when you have a lot of messages in the queue, and these are fairly chunky, your node's memory usage climbs until the RabbitMQ memory watermark is crossed, and this blocks the consumers' connections, so no worker can access that node (and the related queues).
At the same time, publishers are happily sending stuff via the exchange, so you end up in a lose-lose situation.
The only solution we had was to avoid hitting that memory watermark and to scale up the number of consumers.
Keep messages/tasks lean so that the signature is measured in KB, not MB (see the sketch below).
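A hypothetical sketch of that idea: pass a reference (e.g. a database ID) to the task instead of the full payload, and let the worker fetch the heavy data itself. The task and helper names are made up:

from celery import Celery

app = Celery("tasks", broker="amqp://localhost")

def load_from_storage(document_id):
    # hypothetical helper: fetch the large payload from a database or object store
    return b"..."

# Heavy version: the whole document travels through RabbitMQ as the task argument,
# so megabytes sit in the queue per message.
@app.task
def process_document_heavy(document_bytes):
    ...

# Lean version: only a small ID travels through RabbitMQ (a few bytes per message).
@app.task
def process_document(document_id):
    document_bytes = load_from_storage(document_id)
    ...

# usage: process_document.delay(42) instead of process_document_heavy.delay(big_blob)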
We have been having the issue below with RabbitMQ and have been manually restarting the servers every weekend as a workaround.
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.
We have gone through other popular posts on the topic e.g. here and here
Our network is not highly reliable and occasional blips are expected, but when a node does come back up I would have expected it to rejoin the rest of the 4-node RabbitMQ cluster, as is the case with the 4 Tomcat nodes installed on the same servers.
Although the nodes in a single partition continue to run independently, that doesn't seem like a graceful recovery from a failure of one node.
We didn't have great luck with rabbitmqctl commands like rabbitmqctl cluster_status either; they would sporadically cause the RabbitMQ process to hang, which needed a sudo kill of the RabbitMQ process.
We are at the point of evaluating a move to Kafka, or any other message broker that handles network partitions well.
Any thoughts on avoiding the manual RabbitMQ restarts, or on Kafka's ability to handle such situations, are highly appreciated.
I think Kafka with replication should be able to handle network partitions quite easily, as long as the number of partitioned brokers is smaller than the replication factor of your topic (i.e., consumers and producers can always reach at least one broker for the topics they're working with).
To avoid backpressure in the clients while ZooKeeper discovers the partition and propagates the information to the producers and consumers, you may want to set a short ZK heartbeat (yes, you'll need ZooKeeper, and a ZK cluster too, since you absolutely don't want your whole ZK cluster partitioned).
Fair warning though: using a cluster of Kafka brokers drops the global FIFO ordering of your message queue, which can be pretty disturbing if you expect consumers to read messages in the same order the producers sent them, as you could with RabbitMQ.
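For illustration, a minimal kafka-python sketch of the replication side of this; the topic name, broker addresses and replication factor are just assumptions:

from kafka import KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

brokers = "broker1:9092,broker2:9092,broker3:9092"

# replication_factor=3 means each partition lives on 3 brokers, so partitioning
# fewer than 3 of them still leaves a reachable copy of the data
admin = KafkaAdminClient(bootstrap_servers=brokers)
admin.create_topics([NewTopic(name="events", num_partitions=6, replication_factor=3)])

# acks="all" makes the producer wait until all in-sync replicas have the message
producer = KafkaProducer(bootstrap_servers=brokers, acks="all", retries=5)
producer.send("events", b"hello")
producer.flush()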
I'm wondering: what are the pros and cons of using Redis as a broker in this infrastructure?
At the moment, all my agents are sending to a central NXLog server which proxies the requests to logstash --> ES.
What would I gain by using a Redis server between my NXLog collector and Logstash? To me it seems pointless, as NXLog already has good memory and disk buffers in case Logstash is down.
What would I gain?
Thank you
Under heavy load, calling ES (HTTP) directly can be dangerous and you can have problems if ES breaks down.
Redis can handle more (much more) write requests and forward them asynchronously to ES (HTTP).
I started using Redis because I felt that it would separate the input part from the filter part.
At least during periods in which I change the configuration a lot.
As you know, if you change the Logstash configuration you have to restart the thing. All clients (in my case via syslog) are doomed to reconnect to the Logstash daemon when it is back in business.
By putting an indexer in front which holds the relatively static input configuration and pushes everything to Redis, I am able to restart Logstash without causing hiccups throughout the datacenter.
I encountered some issues because our developers hadn't found time (yet) to reduce the amount of useless logs sent to syslog, thus overflowing the server. Before we had Logstash they overflowed the disk space for logs - a more general issue though... :)
When used with Logstash, Redis acts as a message queue. You can have multiple writers and multiple readers.
Using Redis (or any other queueing service) allows you to scale Logstash horizontally by adding more servers to the 'cluster'. This will not matter for small operations but can be extremely useful for larger installations.
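For illustration, a minimal redis-py sketch of the list-as-queue pattern (the key name, host and event shape are just assumptions; Logstash's redis output and input do essentially this for you):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

# writer (shipper side): push a log event onto the tail of the list
r.rpush("logstash", json.dumps({"message": "hello", "host": "web01"}))

# reader (indexer side): block until an event is available, then pop it
_key, raw = r.blpop("logstash", timeout=5)
event = json.loads(raw)
print(event["message"])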
When using Logstash with Redis, you can configure Redis to store all the log entries only in memory, which works like an in-memory queue (like memcache).
You may come to a point where the number of logs being sent cannot be processed by Logstash, and this can bring down your system on a constant basis (observed in our environment).
If you feel Redis is an overhead on your disk, you can configure it to store all the logs in memory until they are processed by Logstash.
As we built our ELK infrastructure, we originally had a lot of problems with the logstash indexer (reading from redis). Redis would back up and eventually die. I believe this was because, in the hope of not losing log files, redis was configured to persist the cache to disk once in a while. When the queue got "too large" (but still within available disk space), redis would die, taking all of the cached entries with it.
If this is the best redis can do, I wouldn't recommend it.
Fortunately, we were able to resolve the issues with the indexer, which typically kept the redis queue empty. We set our monitoring to alert quickly when the queue did back up, and it was a good sign that the indexer was unhappy again.
Hope that helps.
I have a problem with RabbitMQ 2.8.2 server.
After one or two days of usage I receive a disk space warning from RabbitMQ, and the only solution I've found is to clear the directory /var/lib/rabbitmq/mnesia/rabbit@linux-3blg/msg_store_transient and restart RabbitMQ. I use rather huge messages in my program (1-50 MB); maybe the problem is there, but I really need stability.
Does anybody knows the solution?
This means that messages are not being acknowledged and consumed. It might help to make use of the Time-To-Live Extensions. See http://www.rabbitmq.com/ttl.html#per-queue-message-ttl and set an expiration for messages.
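For illustration, a minimal pika sketch setting a per-queue message TTL (the queue name and the 60-second TTL are just placeholder values):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# messages older than 60 seconds are dropped (or dead-lettered) instead of
# piling up in the message store
channel.queue_declare(
    queue="events",
    arguments={"x-message-ttl": 60000},  # TTL in milliseconds
)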