All,
I am trying to bind a queue with multiple binding keys. However, not all keys are known upfront. So, I do amqp_queue_bind with one known key and then amqp_basic_consume and later amqp_queue_bind again whren second key is known.
The second amqp_queue_bind gets stuck and on getting core of my process using SIGSEGV, I see following stack trace:
poll
recv_with_timeout
wait_frame_inner
amqp_simple_rpc
amqp_simple_rpc_decoded
amqp_queue_bind
Do I need to do amqp_basic_cancel and re-do amqp_basic_consume if I want to add a binding key later?
Here is the mistake that I had made and solution:
Here is the app flow:
There is a message processing thread:
amqp_exchange_declare
amqp_queue_declare
while (1) {
if (consume_done) {
amqp_maybe_release_buffers
amqp_consume_message
app specific processing.
}
}
This is main app thread:
On getting to know a binding key, which happens later:
amqp_queue_bind
if (!consume_done) {
amqp_basic_consume
consume_done
}
There are multiple binding keys learnt at a later stage and so the above routine gets called multiple times in main app thread context.
The first time biding is successful and I also get messages. But second time amqp_queue_bind call hangs.
Here, the problem is message processing thread is doing amqp_consume_message with infinite timeout and we should not expect amqp_queue_bind to go through from main thread in this condition.
The solution is to call amqp_queue_bind only when amqp_consume_message is not waiting for messages.
It now seems it is so silly of me to have made this mistake.
Related
I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
Seem to be cross posted in the Particular Software google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga check if the saga was already created or already the 'busy' status and if it does not have this status send a message to do some work. This will guard that the task is only initiated once and after that the saga is stored.
The handler can now do the actual task, when it completes it can do a 'Reply' back to the saga
When the saga receives the reply it can now start any other follow up task or raise an event and it can also 'complete'.
Optimistic concurrency control and batched sends
If two message are received that create/update the same saga instance only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequential both fail unless the saga checks if the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries eventually the second copy will be processed to but the saga checks the saga data for a field that it knows would normally be initialized by by a message that 'starts' the saga. If that field is already initialized it assumes the message is already processed and just returns:
It also demonstrates batches sends. Messages are not immediately send until the all handlers/sagas are completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking, it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once and if you use non-transactional resources then make sure that logic is idempotent.
So after a lot of testing
I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using entity framework and azure sql, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.
So I am running into a race condition and I have a few solutions on how to fix the issue. I am new to threading so obviously, my opinion and research is limited. I have a large amount of asynchronization calls that can happen if a user receives certain messages from server. Thus, my design is poor due to the dependent nature of my objects.
Lets say I have a function called
adduser:(NSString s){
does some asynchronize activity
}
Messageuser:(NSString s)
{
Does some more asychronize activity
}
if a user were to recieve a message telling it to addUser "Ryan". he would than create a thread and proceed with looking up Ryan and storing him. However, if the user has the application in suspended mode, and in the buffered of messages waiting to be recieved there is a addUser request and a MessageUser request, a race condition occures because it takes longer to complete Adduser than it does to complete MessageUser. Thus, If messageUser is called , and (in our example) "Ryan" has not been fully added, it throws an error.
What would be a possible solution to this issue. I looked into locks and semaphores, and what I am trying to do is, when MessageUser recieves a call, check to make sure there is no thread currently proccessing addUser. If there is none, proceed. Else wait, than proceed after it has finished.
Well it depends on how the messages are being issued in the first place and what the async response events are.
If the operations have dependencies (ordering requirements) then perhaps a background serial queue would be appropriate? That is a simple way to ensure the messages are processed in order.
If the async operations take completion blocks, then you could have the completion block issue the request for the next operation to be performed, though you may not know about that ahead of time.
If you need to solve this in a more general way then you need some kind of system for tracking prerequisites so you can skip work items that don't have their prerequisites met yet. That probably means your own background thread that monitors a list of waiting tasks and receives notification of all task completions so it can scan for items waiting on that completion and issue them.
It seems really complicated though... I suspect you don't really have such strong async parallel processing requirements and a much simpler design would be just as effective. Given your situation where you are receiving messages from a server, I think a serial queue would be the best option. Then you can process messages in the order the server sent them and keep things simple.
//do this once at app startup
dispatch_queue_t queue = dispatch_queue_create("com.example.myapp", NULL);
//handle server responses
dispatch_async(queue, ^{
//handle server message here, one at a time
});
In reality, depending on how you connect to your server you might be able to just move the entire connection handling to the background queue and communicate with it via messages from the UI, and update the UI by dispatching to the dispatch_get_main_queue() which will be the UI thread.
I have a producer sending durable messages to a RabbitMQ exchange. If the RabbitMQ memory or disk exceeds the watermark threshold, RabbitMQ will block my producer. The documentation says that it stops reading from the socket, and also pauses heartbeats.
What I would like is a way to know in my producer code that I have been blocked. Currently, even with a heartbeat enabled, everything just pauses forever. I'd like to receive some sort of exception so that I know I've been blocked and I can warn the user and/or take some other action, but I can't find any way to do this. I am using both the Java and C# clients and would need this functionality in both. Any advice? Thanks.
Sorry to tell you but with RabbitMQ (at least with 2.8.6) this isn't possible :-(
had a similar problem, which centred around trying to establish a channel when the connection was blocked. The result was the same as what you're experiencing.
I did some investigation into the actual core of the RabbitMQ C# .Net Library and discovered the root cause of the problem is that it goes into an infinite blocking state.
You can see more details on the RabbitMQ mailing list here:
http://rabbitmq.1065348.n5.nabble.com/Net-Client-locks-trying-to-create-a-channel-on-a-blocked-connection-td21588.html
One suggestion (which we didn't implement) was to do the work inside of a thread and have some other component manage the timeout and kill the thread if it is exceeded. We just accepted the risk :-(
The Rabbitmq uses a blocking rpc call that listens for a reply indefinitely.
If you look the Java client api, what it does is:
AMQChannel.BlockingRpcContinuation k = new AMQChannel.SimpleBlockingRpcContinuation();
k.getReply(-1);
Now -1 passed in the argument blocks until a reply is received.
The good thing is you could pass in your timeout in order to make it return.
The bad thing is you will have to update the client jars.
If you are OK with doing that, you could pass in a timeout wherever a blocking call like above is made.
The code would look something like:
try {
return k.getReply(200);
} catch (TimeoutException e) {
throw new MyCustomRuntimeorTimeoutException("RabbitTimeout ex",e);
}
And in your code you could handle this exception and perform your logic in this event.
Some related classes that might require this fix would be:
com.rabbitmq.client.impl.AMQChannel
com.rabbitmq.client.impl.ChannelN
com.rabbitmq.client.impl.AMQConnection
FYI: I have tried this and it works.
How do I get a list of worker threads of nservicebus. I need to register workerThread ids in to db and then bind some type of messages to the exact workerthread. Real idea is handling poison messages. Want to block all the threads not to handle poison messages except specified ones. There will be a seperate service that will manage threads through database.
I would not try to do that. It is almost sure to run into problems.
Of course, in order to get some sort of "identity" for each thread, you could place something like this in your message handler:
[ThreadStatic]
private static readonly Guid ThreadId = Guid.NewGuid();
But again, I wouldn't do that! The guids would change every time the endpoint was restarted, for one.
You could also query the list of threads direct from .NET and try to determine which ones were the message handling threads, but that sounds so scary I don't even want to go into it.
The real issue: Poison Message Handling
As your comment states, the real problem is that a poison message is REALLY poison. Not only is it failing, but it's taking so long to do so that it's really screwing up all the other threads!
Since you are able to identify these messages based on certain properties of the message, I would detect and throw an exception before the operation that times out. All the time.
If you want to be able to test periodically to see if the issue has been fixed, you have a few options:
Test via other means, and return the messages to the source queue when it has been fixed.
Add an appSetting so that the quick-throw behavior is skipped when the config setting is enabled. Then periodically you can edit the config, restart the endpoint, see if it's fixed, and then switch it back if it isn't.
Create another message handler that maintains a thread-locked increment value of zero. Send it a control message to say "Hey, try one now." Then your quick-throw behavior can decrement that value and allow one through to see what happens. This is also dangerous of course. Make sure your locking is tight since you are now sharing this state between different message processing threads.
Handling multiple push notifications from Exchange Server in WCF WF service, I am getting the following exception and the WF aborts:
"Some context on the correlation handler was not consumed properly". No documentation on this error anywhere afai can see.
Full message:
System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Some context on the correlation handler was not consumed properly. Make sure that the handler was initialized properly by the runtime and the workflow has a Send followed by Receive or ReceiveReply activity. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
System.InvalidOperationException: Some context on the correlation handler was not consumed properly. Make sure that the handler was initialized properly by the runtime and the workflow has a Send followed by Receive or ReceiveReply activity.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activities.Dispatcher.ControlOperationInvoker.ControlOperationAsyncResult.End(Object[]& outputs, IAsyncResult result)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeEnd(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage7(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)).
OK I figured this out. What was happening, to cut a long story short, was a race condition where a Receive with both query correlation and Request-Reply correlation was handling multiple messages with the same correlation ID by sitting in a do-while loop. The aim was to process the first one of many messages with the same id, discarding the rest. The first message caused a parallel process to start by raising a flag for a waiting Sequence (with the Send Reply a second later), while subsequent messages were discarded (with the Send Reply occuring immediately).
The problem was that the second message was coming in before the first SendReply had passed. This caused the correlation initialiser to overwrite the Request Reply correlation handler that the first one needed to use. It seems that this is what causes the above exception. I fixed this temporarily by making the SendReply happen immediately, though when I have time I will look at using multiple Correlation Handlers for the Receive Reply. (A dictionary of handlers based on some message id??)