Is there a way to register to memory alarm of rabbitmq - rabbitmq

My application just got freeze because the memory usage of rabbitmq exceeded its threshold.
I am using pika and pyrabbit as a python wrappers for handling channels and connections.
I wander if there is a way that my process will register to something and get a notification when that event occurs (and hopefully even a bit before it does).

When using rabbitpy you can check if the blocked flag is set. This flag means that the connection is being blocked due to resource constraints (most likely due to low memory).
with rabbitpy.Connection('amqp://guest:guest#localhost:5672/%2f') as conn:
print(conn.blocked)
e.g.
while conn.blocked:
time.sleep(0.1) # wait until connection is unblocked

Related

Limit total size of inflight iot message

I am using IoTHub device client SDK on an embedded device. The application will send telemetry message to iot hub periodically. The iot device connect to a wireless router and wireless connect to internet via WAN port.
When the wireless router lost internet connection, iot device will not get notified immediately about the disconnection. It takes about 60s to get notified, before that iot device will continue to send telemetry message with IoTHubDeviceClient_LL_SendEventAsync(), all those message get queued in SDK layer and eat memory. Since it's on embedded device with limited resource, memory get eaten up and cause program been killed by a lower memory killer app.
Is there way to specified total size of iot message can be queued in sdk layer? If exceed this quota, IoTHubDeviceClient_LL_SendEventAsync() will failed immediately.
Actually this is also needed for normal scenario too. When iot device send message, seems message been queued in low layer and get flushed out at certain time. I don't see any API that can control the flush. That create another problem, even when there is internet connection, from application level, there is no control of how many message been queued and how long it been queued, in turn, app has no control of how much memory been used by process. On my device, there is system monitor that will kill process use too much memory.
The question is what do you do even in that case if the message failure occurs in the case that the queue is full? Do you lose the information then because of lack of storage capacity? From the IoT perspective, I would recommend in this case to consider if your device is reliable IoT device to handle these edge cases as well. And also knowing the limits of the devices, and knowing how long it can be without the internet connection helps to mitigate these risks from your application, not SDK.
From the GitHub, default sendMessageAsync method throws timeout exception in case your message sending fails, unless you have some kind of retry policies implemented(according to the documentation C SDK does not allow custom retry policies
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-reliability-features-in-sdks).
According to the documentation in case of connection failure based on the retry policy(if you have set it), SDK will try to initiate connection this way or that way and queue the messages created in the meantime:
https://github.com/Azure/azure-iot-sdk-c/blob/master/doc/connection_and_messaging_reliability.md
So, an expectation here is that SDK does not take responsibility for the memory limits. This is up to the application to deal. Since your device has some limitations, I would recommend implementing your own queuing mechanism(maybe set no-retry as a policy and that way avoid queuing). That way you have under the control what will happen in the case that there is no internet connection and have under the control memory limitations. Maybe your business case accepts that you calculate an average value and instead of 50 you store 1 message over the time etc..
If this something you do not like, the documentation says also that you set the timeout for the queue - maybe not the memory limit but timeout yes, so maybe you can try to investigate this a bit deeper:
"There are two timeout controls in this system. An original one in the iothub_client_ll layer - which controls the "waiting to send" queue - and a modern one in the protocol transport layer - that applies to the "in progress" list. However, since IoTHubClient_LL_DoWork causes the Telemetry messages to be immediately* processed, sent and moved to the "in progress" list, the first timeout control is virtually non-applicable.
Both can be fine-tuned by users through IoTHubClient_LL_SetOption, and because of that removing the original control could cause a break for existing customers. For that reason it has been kept as is, but it will be re-designed when we move to the next major version of the product."

active mq delay recovery

I'm trying to use an ActiveMQ queue as a Apache Storm Spout.
I use the "INDIVIDUAL_ACK" strategy.
In my idea, I'm planning to trigger a session.recover() periodically, to resend messages that would not be acknowledged (error in the Bolt processing chain).
But if I do that, all the messages corresponding to Storm tuple, currently processed, would be resent. I would try to limit this phenomenon.
Ideally, I would like to parameter a delay, all the message sent younger that this delay should not be resent (this delay should also be in phase with the timeout of Storm tuple processing)
I've read about AMQ policies (http://activemq.apache.org/redelivery-policy.html) but I'm not sure that the redeliveryDelay param applies to my problem.
Any hint?
Franck
With INDIVIDUAL_ACKNOWLEDGE you don't use a session.recover(), you do a message.acknowledge(). Additionally, its best practice to only use JMS-style transactions for automatically recoverable errors (ie. host is down). For a contextual issue (ie.. bad data) that will never work, you should move the message to another queue.. i.e. some .DLQ or .ERR queue.

RabbitMQ consumer crash and consumer-count

If a consumer of a RabbitMQ crashes, with no graceful disconnection, will a subsequent declare-ok request fired several milliseconds later report a diminished consumer-count? Or is there an amount of time that needs to pass before the reported number will change?
declare-ok count all known consumers regardless their actual state.
See, in fact, some time after connection get dangled it still marked as alive (exact time depends on OS settings and whether do you use heartbeats and whether there are any network operation over that connection). In RabbitMQ management panel you may see connection and it channels with consumer tags listed some time after connection died.

How to prevent an I/O Completion Port from blocking when completion packets are available?

I have a server application that uses Microsoft's I/O Completion Port (IOCP) mechanism to manage asynchronous network socket communication. In general, this IOCP approach has performed very well in my environment. However, I have encountered an edge case scenario for which I am seeking guidance:
For the purposes of testing, my server application is streaming data (lets say ~400 KB/sec) over a gigabit LAN to a single client. All is well...until I disconnect the client's Ethernet cable from the LAN. Disconnecting the cable in this manner prevents the server from immediately detecting that the client has disappeared (i.e. the client's TCP network stack does not send notification of the connection's termination to the server)
Meanwhile, the server continues to make WSASend calls to the client...and being that these calls are asynchronous, they appear to "succeed" (i.e. the data is buffered by the OS in the outbound queue for the socket).
While this is all happening, I have 16 threads blocked on GetQueuedCompletionStatus, waiting to retrieve completion packets from the port as they become available. Prior to disconnecting the client's cable, there was a constant stream of completion packets. Now, everything (as expected) seems to have come to a halt...for about 32 seconds. After 32 seconds, IOCP springs back into action returning FALSE with a non-null lpOverlapped value. GetLastError returns 121 (The semaphore timeout period has expired.) I can only assume that error 121 is an artifact of WSASend finally timing out after the TCP stack determined the client was gone?
I'm fine with the network stack taking 32 seconds to figure out my client is gone. The problem is that while the system is making this determination, my IOCP is paralyzed. For example, WSAAccept events that post to the same IOCP are not handled by any of the 16 threads blocked on GetQueuedCompletionStatus until the failed completion packet (indicating error 121) is received.
My initial plan to work around this involved using WSAWaitForMultipleEvents immediately after calling WSASend. If the socket event wasn't signaled within (e.g. 3 seconds), then I terminate the socket connection and move on (in hopes of preventing the extensive blocking effect on my IOCP). Unfortunately, WSAWaitForMultipleEvents never seems to encounter a timeout (so maybe asynchronous sockets are signaled by virtue of being asynchronous? Or copying data to the TCP queue qualifies for a signal?)
I'm still trying to sort this all out, but was hoping someone had some insights as to how to prevent the IOCP hang.
Other details: My server application is running on Win7 with 8 cores; IOCP is configured to use at most 8 concurrent threads; my thread pool has 16 threads. Plenty of RAM, processor and bandwidth.
Thanks in advance for your suggestions and advice.
It's usual for the WSASend() completions to stall in this situation. You won't get them until the TCP stack times out its resend attempts and completes all of the outstanding sends in error. This doesn't block any other operations. I expect you are either testing incorrectly or have a bug in your code.
Note that your 'fix' is flawed. You could see this 'delayed send completion' situation at any point during a normal connection if the sender is sending faster than the consumer can consume. See this article on TCP flow control and async writes. A better plan is to use a counter for the amount of oustanding writes (per connection) that you want to allow and stop sending if that counter gets reached and then resume when it drops below a 'low water mark' threshold value.
Note that if you've pulled out the network cable into the machine how do you expect any other operations to complete? Reads will just sit there and only fail once a write has failed and AcceptEx will simply sit there and wait for the condition to rectify itself.

RabbitMQ memory usage creeping up and blocking calls... why?

I'm using RabbitMQ to handle app logs (windows server 2008 install). apps send messages to the exchange. I have a dedicated queue that gets messages forwarded to it. I then have a windows service connecting to that queue, pulling messages off, and persisting them to DB. I have a n-number of clients connecting to the exchange in real time to latch on the the stream so there are n-number of connections at a time. It is possible that some of these clients may not Close() their connections in code. Many clients have long running connections.
As messages are pulled off the queue, they are auto-ack'ed, so I don't have any unacknowledged messages on the queue. However, I'm seeing the memory of Rabbit grow over time. It starts at 32K or so when first turned on then creeps up until it exceeds the threshold and blocks incoming connections.
I have both .NET and Java clients--but both are auto-ack.
Reading the docs, I didn't see any description of how Rabbit is using memory--i.e. I don't understand why memory would be bloating over time. The messages are getting pulled off and ack'ed which seems to me would mean that Rabbit wouldn't be holding on to it any more and thus can free the associated memory, causing a stable mem usage profile.
I don't see how fiddling with the memory dial in Rabbit would help either--usage just creeps upwards over time: eventually I'll exceed it.
My guess is that there is something I'm doing wrong with my clients that is causing the memory to grow over time, but I can't think of why that would be.
why does Rabbit memory usage creep up when no messages are kept on any queues?
what coding practices could cause the RabbitMQ server to
retain (and grow) memory?
Is it possible that you have other queues bound to the exchange perhaps? Check the Rabbit admin page under exchanges, click on your exchange, and check for queues bound to it. It may be that one of your clients, when declaring the exchange, is inadvertently binding an unnamed (system random named) queue to the exchange, and messages are piling up in there.
The other thing to check is the QoS settings - if you leave QoS set at the default (infinite) then Rabbit will send out messages immediately to any client regardless of how many messages they are already holding. This results in a lot of book-keeping, like which client has which message on the server, and a large buffer on the client.
Make sure to set your QoS pre-fetch limit to something much more reasonable, like say 100. That way, if you have 1M messages and only 1 client with prefetch of 100, Rabbit will send only 100 to the client and keep the other 999900 on disk on the server, and not use nearly as much memory.
This was a big cause of memory bloat in my application, and now that I've addressed prefetch, everything is fine.