If a consumer of a RabbitMQ crashes, with no graceful disconnection, will a subsequent declare-ok request fired several milliseconds later report a diminished consumer-count? Or is there an amount of time that needs to pass before the reported number will change?
declare-ok count all known consumers regardless their actual state.
See, in fact, some time after connection get dangled it still marked as alive (exact time depends on OS settings and whether do you use heartbeats and whether there are any network operation over that connection). In RabbitMQ management panel you may see connection and it channels with consumer tags listed some time after connection died.
Related
I'm not sure how to resiliently handle RabbitMQ messages in the event of an intermittent outage.
I subscribe in a windows service, read the message, then store it my database. If I can't process the record because of the data I publish it to a dead letter queue for a human to address and reprocess.
I am not sure what to do if I have some intermittent technical issue that will fix itself (database reboot, network outage, drive space, etc). I don't want hundreds of messages showing up on dead letter that just needed to wait for a for a glitch but now would be waiting on a human.
Currently, I re-queue the event and retry it once, but it retries so fast the issue is not usually resolved. I thought of retrying forever but I don't want a real issue to get stuck in an infinite loop.
Is a broad topic but from the server side you could persist your messages and make your queues durable, this means that in the eventuality the server gets restarted they won't be lost, check more here How to persist messages during RabbitMQ broker restart?
For the consumer (client) it will depend on how you configure your client, from the docs:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
Check more here: https://www.rabbitmq.com/reliability.html#consumer
I was trying to perform LoadTest on the RabbitMQ messaging to see to what extent it can take messages into the queue and transfer it to the target machine over shovel.
Steps i followed:
producer has 20 threads. Each thread sends message to a dedicated queue(Say suppose ProducerQueue1 -- ProducerQueue20).The message is of size 51Mb each. The messages are sent in random interval using java.util.Random(50) seconds.
After each message sent at random seconds(A random second between 1- 50),
there is a sleep of 2 minutes.Therefore each of the producer threads sleep for
2 min after every send.
The messages are sent in a infinite while loop.
There are shovels from each dedicated queue to the consumer side dedicated queues(Say suppose ConsumerQueue1 --- ConsumerQueue20).
The link speed is 100mbps.
Issue observed:
Initially the messages are transferred with no issues, but after some time the NETWORK AT CONSUMER SIDE IS CHOKED.
The reason for choking is that after certain period of time, even if 4/5 out of 20 thread's random second coincides, then the consumer receives close to 250Mb message in one shot. Since the network speed is 100mbps as mentioned above, the network gets choked.
Due to this, the shovels will not be able to exchange heartbeats to stay in "running" state. The leads shovels to move from "running" to "terminated" state. The shovels try to establish a connection depending upon the "reconnect delay".
Due to break in shovels at producer side, the queues at the producer starts getting accumulated.
My Question:
The consumer's rabbitMq memory starts increasing as the queues start accumulating more messages. The memory is crossing the water mark. The purpose of water mark is not served. I have 16gb ram and i have set watermark to 40%(i.e 6.4gb ram). But still the memory shoots up to 10gb and doesnt recover and the producer system hangs.
Can any one please answer my question. and also tell me can there be any other reason for network choking which i mentioned above.
Thanks in advance.
RabbitMQ allows you to "heartbeat" a connection, i.e. from time to time the client and the server check (using empty messages) that the other party is still there and available. So far, so good.
Unfortunately, I was not able to find a place in the documentation where a suggestion is made what a reasonable value for this is. I know that you need to specify the heartbeat in seconds, but what is a real-world best practice value?
Obviously, it should not be too often (traffic), but also not too rare (proxies, …). Any suggestions?
Is 15 seconds fine? 30? 60? …?
This answer if for RabbitMQ < 3.5.5, for newer versions see the answer from #bmaupin.
It depends on your application needs. Out of the box it is 10 min for RabbitMQ. If you fail to ack heartbeat twice (20min of inactivity), connection will be closed immediately without sending any connection.close method or any error from the broker side.
The case to use heartbeat is firewalls that closes inactive for a long time connection or some other network settings that doesn't allow you to have waiting connections.
In fact, hearbeat is not a must, from RabbitMQ config doc
heartbeat
Value representing the heartbeat delay, in seconds, that the server sends in the connection.tune frame. If set to 0, heartbeats are disabled. Clients might not follow the server suggestion, see the AMQP reference for more details. Disabling heartbeats might improve performance in situations with a great number of connections, but might lead to connections dropping in the presence of network devices that close inactive connections.
Default: 580
Note, that having hearbeat interval too short may result in significant network overhead. Keep in mind, that hearbeat frames are sent when there are no other activity on the connection for a hearbeat time interval.
The RabbitMQ documentation now provides a recommended heartbeat timeout value between 5 and 20 seconds:
Setting heartbeat timeout value too low can lead to false positives (peer being considered unavailable while it really isn't the case) due to transient network congestion, short-lived server flow control, and so on. This should be taken into consideration when picking a timeout value.
Several years worth of feedback from the users and client library maintainers suggest that values lower than 5 seconds are fairly likely to cause false positives, and values of 1 second or lower are very likely to do so. Values within the 5 to 20 seconds range are optimal for most environments.
Source: https://www.rabbitmq.com/heartbeats.html#false-positives
In addition, as of RabbitMQ 3.5.5 the default heartbeat timeout value is 60 seconds (https://www.rabbitmq.com/heartbeats.html#heartbeats-timeout)
I was wondering if this is possible. I want to pull a task from a queue and have some work that could potentially take anywhere from 3 seconds or longer (possibly) minutes before an ack is sent back to RabbitMQ notifying that the work has been completed. The work is done by a user, hence this is why the time it takes to process the job varies.
I don't want to ack the message immediately after I pop off the queue because I want the message to be requeued if no ack is received. Can anyone give me any insights into how to solve my problem?
Having a long timeout should be fine, and certainly as you say you want redelivery if something goes wrong, so you want to only ack after you finish.
The best way to achieve that, IMO, would be to have multiple consumers on the queue (i.e. multiple threads/processes consuming from the same queue). That should be fine as long as there's no particular ordering constraint on your queue contents (i.e. the way there might be if the queue were to contain contents representing Postgres data that involves FK constraints).
This tutorial on the RabbitMQ website provides more info (Python linked, but there should be similar tutorials for other languages): https://www.rabbitmq.com/tutorials/tutorial-two-python.html
Edit in response to comment from OP:
What's your heartbeat set to? If your worker doesn't acknowledge the heartbeat within the set period of time, the server will consider the connection to be dead.
Not sure which language you're using, but for Java you would use the setRequestedHeartbeat method to specify the heartbeat.
The way you implement your workers, it's vital that the heartbeat can still be sent back to the RabbitMQ server. If something blocks the client from sending the heartbeat, the server will kill the connection after the time interval expires.
In the console pane rabbitmq one day I had accumulated 8000 posts, but I am embarrassed that their status is idle at the counter ready and total equal to 1. What status should be completed at the job, idle? In what format is registered x-pires? It seems to me that I had something wrong =(
While it's difficult to fully understand what you are asking, it seems that you simply don't have anything pulling messages off of the queue in question.
In general, RabbitMQ will hold on to a message in a queue until a listener pulls it off and successfully ACKs, indicating that the message was successfully processed. You can configure queues to behave differently by setting a Time-To-Live (TTL) on messages or having different queue durabilities (eg. destroyed when there are no more listeners), but the default is to play it safe.