I have a very basic demo application for testing the RabbitMQ blocking behaviour. I use RabbitMQ 3.10.6 with the .NET library RabbitMQ.Client 6.2.4 in .NET Framework 4.8.
The connection is created using ConnectionFactory.CreateConnection() and uses AutomaticRecoveryEnabled = true.
The application creates one channel and one queue for sending messages:
IModel sendChannel = Connection.CreateModel();
sendChannel.ConfirmSelect();
sendChannel.QueueDeclare("sendQueueName", true, false, false);
For receiving messages, again one channel and one queue are created:
IModel receiveChannel = Connection.CreateModel();
receiveChannel.ConfirmSelect();
receiveChannel.QueueDeclare("receiveQueueName", true, false, false);
var receiveQueueConsumer = new QueueConsumer(receiveChannel); // This is my own class which inherits from 'DefaultBasicConsumer' and passes 'receiveChannel' to its base in the constructor.
receiveChannel.BasicConsume("receiveQueueName", false, receiveQueueConsumer);
Now I fill my disk until the configured threshold in the RabbitMQ config file is reached.
As expected, the ConnectionBlocked event is fired. The connection now is in state "blocking".
Now I queue a message. AMQP properties are added to the message using channel.CreateBasicProperties() with Persistent = true. It is then queued:
sendChannel.BasicPublish("", "sendQueueName", amqpProperties, someBytes);
sendChannel.WaitForConfirms(TimeSpan.FromSeconds(5)); // Returns as expected after 5 seconds with return value 'true'.
The connection now is in state "blocked".
Now I shut down my demo application and find that disposing does not work as expected.
sendChannel.Close(); // Blocks for 10 seconds.
if (receiveChannel.IsOpen) receiveChannel.BasicCancel(ConsumerTags.First()); // In 'receiveQueueConsumer'. Throws a 'TimeoutException'.
Connection.Close(); // Freezes for at least a minute.
The behaviour is the same when calling Dispose() or Abort() instead of Close(). When I finally force kill the application (or when I set a timeout for Abort()) then the application closes but the underlying connection and channels are not removed. The connection still is in state "blocked".
At least, once there is enough space on the disk again, the broker automatically removes the blocked connections and their channels, without needing a restart.
Here and here it sounds like the broker just won't react when it is "blocked".
There can be a broad number of reasons for a timeout, from a genuine connection interruption to a resource alarm in effect that prevents target node from reading any data coming from clients unless the alarm clears.
Nodes will temporarily block publishing connections by suspending reading from client connection.
Which would mean I can't free my resources unless I restart the broker or I make sure that the broker has plenty of resources to turn off the resource alarm. Is there an official confirmation for this? Or how do I need to adjust the dispose mechanism to make it work when the broker is blocked?
I have a very basic demo application for testing the RabbitMQ blocking behaviour. I use RabbitMQ 3.10.6 with the .NET library RabbitMQ.Client 6.2.4 in .NET Framework 4.8.
The disk is filled until the configured threshold in the RabbitMQ config file is exceeded. The connection state is "blocking".
I queue a message this way:
AMQP properties are added to the message using channel.CreateBasicProperties() with Persistent = true. It is then queued:
sendChannel.BasicPublish("", "sendQueueName", amqpProperties, someBytes);
sendChannel.WaitForConfirmsOrDie(TimeSpan.FromSeconds(5));
WaitForConfirmsOrDie() closes the underlying channel when the broker is blocking or blocked. Because this is the case the channel is closed and I need to create a new one if I want to queue messages again.
The connection state is "blocked".
First example: I catch the TimeoutException that is thrown, remove the resource alarm by providing enough disk space and create a new channel in the catch block. This works.
Second example: I catch the TimeoutException that is thrown but do nothing in the catch block. I remove the resource alarm by providing enough disk space and wait for the ConnectionUnblocked event to be fired. In that event handler I create a new channel. This does not work: I get a TimeoutException.
Why can't I create any more channels outside the catch block once the connection was blocked?
The connection is created using ConnectionFactory.CreateConnection() and uses AutomaticRecoveryEnabled = true (although this doesn't seem to make any difference).
A channel is created using Connection.CreateModel().
Setting up a CMS consumer with a listener involves two separate calls: first, acquiring a consumer:
cms::MessageConsumer* cms::Session::createConsumer( const cms::Destination* );
and then, setting a listener on the consumer:
void cms::MessageConsumer::setMessageListener( cms::MessageListener* );
Could messages be lost if the implementation subscribes to the destination (and receives messages from the broker/router) before the listener is activated? Or are such messages queued internally and delivered to the listener upon activation?
Why isn't there an API call to create the consumer with a listener as a construction argument? (Is it because the JMS spec doesn't have it?)
(Addendum: this is probably a flaw in the API itself. A more logical order would be to instantiate a consumer from a session, and have a cms::Consumer::subscribe( cms::Destination*, cms::MessageListener* ) method in the API.)
I don't think the API is flawed necessarily. Obviously it could have been designed a different way, but I believe the solution to your alleged problem comes from the start method on the Connection object (inherited via Startable). The documentation for Connection states:
A CMS client typically creates a connection, one or more sessions, and a number of message producers and consumers. When a connection is created, it is in stopped mode. That means that no messages are being delivered.
It is typical to leave the connection in stopped mode until setup is complete (that is, until all message consumers have been created). At that point, the client calls the connection's start method, and messages begin arriving at the connection's consumers. This setup convention minimizes any client confusion that may result from asynchronous message delivery while the client is still in the process of setting itself up.
A connection can be started immediately, and the setup can be done afterwards. Clients that do this must be prepared to handle asynchronous message delivery while they are still in the process of setting up.
This is the same pattern that JMS follows.
In any case I don't think there's any risk of message loss regardless of when you invoke start(). If the consumer is using an auto-acknowledge mode then messages should only be automatically acknowledged once they are delivered synchronously via one of the receive methods or asynchronously through the listener's onMessage. To do otherwise would be a bug in my estimation. I've worked with JMS for the last 10 years on various implementations and I've never seen any kind of condition where messages were lost related to this.
If you want to add consumers after you've already invoked start() you could certainly call stop() first, but I don't see any problem with simply adding them on the fly.
I am fairly new to RabbitMQ, and starting on a project that is using RabbitMQ in a fairly old-fashioned "RPC" pattern. So I'm trying something like this on the "server" side:
ConnectionFactory factory = new ConnectionFactory();
factory.setUri(uri);
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
channel.queueDeclare(queueName, false, false, false, null);
while (!shutdown) {
GetResponse gr = channel.basicGet(queueName, true);
... build reply ...
channel.basicPublish("", gr.getProps().getReplyTo(), replyProps, response);
}
My question is: can the thread waiting on basicGet() be interrupted? If so, what happens (InterruptedException is not declared)? I realize this is not a great pattern, but I just want some way to cleanly shut down a service.
UPDATE: one comment indicates that basicGet() does not block at all, and returns immediately if the queue is empty. If that is the case, let me revise my question to be more precise: How do I wait for a message on a queue and retrieve it, with a timeout?
UPDATE2: After experimenting and asking questions on the rabbitmq mailing list, I conclude that this cannot be done directly. It is simply not The Way That You Do Things in RabbitMQ. Instead, you launch a consumer thread pool using Channel.basicConsume() and wait for your handler method to be called. It can be done indirectly by having your consumer post to a SynchronizedQueue or something similar and having your foreground thread(s) wait on that, but be warned that this defeats the automatic scaling offered by basicConsume() and also makes it harder to properly ACK all requests, and also creates additional message buffering that makes it difficult to honor QOS semantics set by the basicQos() call.
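The indirect approach described above (the consumer callback posts to a queue and a foreground thread waits on it with a timeout) can be sketched with java.util.concurrent alone. This is a minimal sketch, not RabbitMQ code: HandoffBuffer, onDelivery() and take() are hypothetical names, and the comments mark where a real handleDelivery() callback would plug in. It assumes a bounded LinkedBlockingQueue rather than a SynchronousQueue; the drawbacks listed above (extra buffering, harder acks, weakened QoS) still apply.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical hand-off buffer between a consumer callback and a waiting thread. */
final class HandoffBuffer {
    // Bounded so a slow foreground thread cannot buffer messages without limit.
    private final BlockingQueue<byte[]> queue;

    HandoffBuffer(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /**
     * Would be called from the consumer callback (for example inside handleDelivery()).
     * Returns false if the buffer is full, so the caller can decide what to do with the message.
     */
    boolean onDelivery(byte[] body) {
        return queue.offer(body);
    }

    /** Waits up to timeoutMillis for a message; returns null on timeout or interruption. */
    byte[] take(long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        HandoffBuffer buffer = new HandoffBuffer(10);
        // Stand-in for the client library's consumer thread.
        new Thread(() -> buffer.onDelivery("request".getBytes())).start();
        byte[] first = buffer.take(1000);
        byte[] second = buffer.take(100);
        System.out.println(first == null ? "timeout" : new String(first));
        System.out.println(second == null ? "timeout" : new String(second));
    }
}
```

Because poll() is interruptible, a thread blocked in take() can be woken with Thread.interrupt() during shutdown, which is the behaviour the original question was after.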
It should also be noted that, once you go down the basicConsume() route, the consumer can be interrupted. This is done something like:
// This starts a background thread pool
String consumerTag = channel.basicConsume(queueName, consumer);
...
// Shutdown the consumer thread pool
channel.basicCancel(consumerTag);
UPDATE3: See last answer. RabbitMQ comes with an RpcClient class that works splendidly.
basicGet doesn't block; it returns immediately (well, just after a network round trip) and returns null if there are no messages on the queue. So it's not necessary to interrupt the thread.
The RabbitMQ Java API comes with a very nice RPC client implementation called (unsurprisingly) RpcClient. Use it! I made an example on GitHub.
I want to ensure that certain kinds of messages cannot be lost, hence I should use Confirms (aka Publisher Acknowledgements).
The broker loses persistent messages if it crashes before said messages are written to disk. Under certain conditions, this causes the broker to behave in surprising ways.
For instance, consider this scenario:
1. a client publishes a persistent message to a durable queue
2. a client consumes the message from the queue (noting that the message is persistent and the queue durable), but doesn't yet ack it,
3. the broker dies and is restarted, and
4. the client reconnects and starts consuming messages.
At this point, the client could reasonably assume that the message will be delivered again. This is not the case: the restart has caused the broker to lose the message. In order to guarantee persistence, a client should use confirms.
But what if, while using confirms, the publisher goes down before it receives the ack, and the message was not delivered to the queue for some reason (e.g. a network failure)?
Suppose we have a simple REST endpoint where we can POST new COMMENTS and, when a new COMMENT is created we want to publish a message in a queue. (Note: it doesn't matter if I send a message of a new COMMENT that at the end isn't created due to a rollback for example).
CommentEndpoint {
Channel channel;
post(String comment) {
channel.publish("comments-queue",comment) // is a persistent queue
Comment aNewComment = new Comment(comment)
repository.save(comment)
// what happens if the server where this publisher is running terminates here ?
channel.waitConfirmations()
}
}
When the server restarts the channel is gone and the message could never be delivered.
One solution that comes to mind is, after a restart, to query the repository for recent comments (something like the comments created in the last 3 minutes before the crash), send one message for each, and await confirmations.
What you are worried about is no longer a RabbitMQ-only issue; it is a distributed transaction issue. This discussion gives one reasonable lightweight solution. There are also stricter solutions, for instance two-phase commit or three-phase commit, to ensure data consistency when it is really necessary.
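The recovery idea from the question (after a restart, re-send anything that is not known to have been confirmed) can be sketched as below. This is only an illustration in plain Java under stated assumptions: CommentStore, Publisher, Resender and the per-comment "confirmed" flag are hypothetical stand-ins for the repository and the confirm-enabled channel, and no RabbitMQ API is used. Since a message may be re-sent even though the broker had already received it, consumers would need to tolerate duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Resender {
    /** Run at startup: re-publish everything not marked confirmed; returns how many were confirmed. */
    static int resendUnconfirmed(CommentStore store, Publisher publisher) {
        int sent = 0;
        for (long id : store.unconfirmedIds()) {
            if (publisher.publishAndWaitForConfirm(store.text(id))) {
                store.markConfirmed(id);
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        CommentStore store = new CommentStore();
        store.save("first comment"); // pretend the process crashed before the confirm arrived
        int confirmed = resendUnconfirmed(store, body -> true); // publisher that always confirms
        System.out.println(confirmed + " re-sent, " + store.unconfirmedIds().size() + " still pending");
    }
}

/** Hypothetical stand-in for a confirm-enabled publish; returns true when the broker confirms. */
interface Publisher {
    boolean publishAndWaitForConfirm(String body);
}

/** Hypothetical repository that records, per comment, whether its message was confirmed. */
final class CommentStore {
    private final Map<Long, String> comments = new LinkedHashMap<>();
    private final Map<Long, Boolean> confirmed = new LinkedHashMap<>();
    private long nextId = 1;

    /** Saves the comment as "not yet confirmed" before anything is published. */
    long save(String text) {
        long id = nextId++;
        comments.put(id, text);
        confirmed.put(id, false);
        return id;
    }

    void markConfirmed(long id) {
        confirmed.put(id, true);
    }

    /** Returns a snapshot of the ids whose messages have not been confirmed yet. */
    List<Long> unconfirmedIds() {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Boolean> e : confirmed.entrySet()) {
            if (!e.getValue()) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    String text(long id) {
        return comments.get(id);
    }
}
```

Tracking an explicit flag avoids guessing a time window such as "the last 3 minutes", at the cost of one extra write per confirmed message.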
The below text is an effort to expand and add color to this question:
How do I prevent a misbehaving client from taking down the entire service?
I have essentially this scenario: a WCF service is up and running with a client callback using straightforward, simple one-way communication, not very different from this one:
public interface IMyClientContract
{
[OperationContract(IsOneWay = true)]
void SomethingChanged(simpleObject myObj);
}
I'm calling this method potentially thousands of times a second from the service to what will eventually be about 50 concurrently connected clients, with as low latency as possible (<15 ms would be nice). This works fine until I set a breakpoint in one of the client apps connected to the server. After maybe 2-5 seconds the service hangs, and none of the other clients receive any data for about 30 seconds or so, until the service registers a connection fault event and disconnects the offending client. After this, all the other clients continue on their merry way receiving messages.
I've done research on serviceThrottling, concurrency tweaking, setting threadpool minimum threads, WCF secret sauces and the whole 9 yards, but at the end of the day this article MSDN - WCF essentials, One-Way Calls, Callbacks and Events describes exactly the issue I'm having without really making a recommendation.
The third solution that allows the service to safely call back to the client is to have the callback contract operations configured as one-way operations. Doing so enables the service to call back even when concurrency is set to single-threaded, because there will not be any reply message to contend for the lock.
but earlier in the article it describes the issue I'm seeing, only from a client perspective
When one-way calls reach the service, they may not be dispatched all at once and may be queued up on the service side to be dispatched one at a time, all according to the service configured concurrency mode behavior and session mode. How many messages (whether one-way or request-reply) the service is willing to queue up is a product of the configured channel and the reliability mode. If the number of queued messages has exceeded the queue's capacity, then the client will block, even when issuing a one-way call
I can only assume that the reverse is true, the number of queued messages to the client has exceeded the queue capacity and the threadpool is now filled with threads attempting to call this client that are now all blocked.
What is the right way to handle this? Should I research a way to check how many messages are queued at the service communication layer per client and abort their connections after a certain limit is reached?
It almost seems that if the WCF service itself is blocking on a queue filling up then all the async / oneway / fire-and-forget strategies I could ever implement inside the service will still get blocked whenever one client's queue gets full.
Don't know much about the client callbacks, but it sounds similar to generic wcf code blocking issues. I often solve these problems by spawning a BackgroundWorker, and performing the client call in the thread. During that time, the main thread counts how long the child thread is taking. If the child has not finished in a few milliseconds, the main thread just moves on and abandons the thread (it eventually dies by itself, so no memory leak). This is basically what Mr.Graves suggests with the phrase "fire-and-forget".
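The "wait briefly, then move on" idea can be sketched as follows. The sketch is in Java rather than C# (an ExecutorService stands in for the BackgroundWorker), and BoundedCall, tryCall and sleepOneSecond are hypothetical names; it illustrates only the timeout pattern under those assumptions, not WCF itself.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a possibly blocking call, but wait for it only briefly. */
final class BoundedCall {
    /**
     * Returns true if the call finished within timeoutMillis, false if it timed out or failed.
     * On timeout the worker is interrupted; a call that ignores interruption keeps its thread busy.
     */
    static boolean tryCall(ExecutorService pool, Runnable call, long timeoutMillis) {
        Future<?> future = pool.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // give up on this client and move on
            return false;
        } catch (ExecutionException e) {
            return false; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates a client that is stuck (for example, paused at a breakpoint). */
    static void sleepOneSecond() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        boolean fast = tryCall(pool, () -> { }, 200);
        boolean slow = tryCall(pool, BoundedCall::sleepOneSecond, 50);
        System.out.println("fast client ok: " + fast + ", slow client ok: " + slow);
        pool.shutdownNow();
    }
}
```

A false result is the natural place to count failures per client and disconnect one that keeps timing out.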
Update:
I implemented a fire-and-forget setup to call the client's callback channel, and the server no longer blocks once the buffer to the client fills up.
MyEvent is an event with a delegate that matches one of the methods defined in the WCF client contract. When a client connects, I essentially add its callback to the event:
MyEvent += OperationContext.Current.GetCallbackChannel<IFancyClientContract>().SomethingChanged
etc... and then to send this data to all clients, I'm doing the following
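// Serialize using protobuf-net
using (var ms = new MemoryStream())
{
ProtoBuf.Serializer.Serialize(ms, new SpecialDataTransferObject(inputData));
// ToArray() copies only the bytes written; GetBuffer() would also return unused trailing capacity.
byte[] data = ms.ToArray();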
Parallel.ForEach(MyEvent.GetInvocationList(), p => ThreadUtil.FireAndForget(p, data));
}
In the ThreadUtil class I made essentially the following change to the code defined in the fire-and-forget article:
static void InvokeWrappedDelegate(Delegate d, object[] args)
{
try
{
d.DynamicInvoke(args);
}
catch (Exception ex)
{
//THIS will eventually throw once the client's WCF callback channel has filled up and timed out, and it will throw once for every single time you ever tried sending them a payload, so do some smarter logging here!!
Console.WriteLine("Error calling client, attempting to disconnect.");
try
{
MyService.SingletonServiceController.TerminateClientChannelByHashcode(d.Target.GetHashCode());//this is an IContextChannel object, kept in a dictionary of active connections, cross referenced by hashcode just for this exact occasion
}
catch (Exception ex2)
{
Console.WriteLine("Attempt to disconnect client failed: " + ex2.ToString());
}
}
}
I don't have any good ideas for how to cancel all the pending sends that the server is still waiting on for that client. Once I get the first exception I should in theory be able to go and terminate all the other requests in some queue somewhere, but this setup is functional and meets the objectives.