NServiceBus: Cannot enlist the transaction (failed to send msg to audit queue)

I have an endpoint that works as a distributor with three other worker endpoints.
The endpoint that handles the received message opens a transaction and tries to import some XML data into a SQL database. If an exception is thrown during this process, the exception is caught, the transaction is rolled back and the XML data is written to an error folder.
Simplified, it looks like this:
public void Handle(doSomethingCmd message)
{
    // "connection" stands for an already opened SqlConnection (simplified)
    var transaction = connection.BeginTransaction();
    try
    {
        //... some xml data import
        throw new Exception();
        //Commit if succeeded
    }
    catch (Exception exception)
    {
        transaction.Rollback();
        //...Write file to error folder
    }
}
At first, no retry happens after the transaction rollback. But when the message is sent again, all the worker endpoints (only the workers) get an exception (Cannot enlist the transaction, failed to send message to the control queue; see the stack trace below) and NServiceBus retries the message. As a result, the file appears several times in the error folder.
It looks like the distributed transaction is in an invalid state. I could just re-throw the exception so that NServiceBus handles the rollback for me, but in that case the file is also written to the error folder several times (because of the retry mechanism).
Failed raising finished message processing event.|NServiceBus.Unicast.Queuing.FailedToSendMessageException: Failed to send message to address: someEndpoint.distributor.control#SRVPS01 ---> System.Messaging.MessageQueueException: Cannot enlist the transaction.
at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Send(Object obj, MessageQueueTransactionType transactionType)
at NServiceBus.Transports.Msmq.MsmqMessageSender.Send(TransportMessage message, Address address) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqMessageSender.cs:line 49
--- End of inner exception stack trace ---
at NServiceBus.Transports.Msmq.MsmqMessageSender.ThrowFailedToSendException(Address address, Exception ex) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqMessageSender.cs:line 88
at NServiceBus.Transports.Msmq.MsmqMessageSender.Send(TransportMessage message, Address address) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqMessageSender.cs:line 75
at NServiceBus.Distributor.MSMQ.ReadyMessages.ReadyMessageSender.SendReadyMessage(String sessionId, Int32 capacityAvailable, Boolean isStarting) in c:\BuildAgent\work\c3100604bbd3ca20\src\NServiceBus.Distributor.MSMQ\ReadyMessages\ReadyMessageSender.cs:line 62
at NServiceBus.Distributor.MSMQ.ReadyMessages.ReadyMessageSender.TransportOnFinishedMessageProcessing(Object sender, FinishedMessageProcessingEventArgs e) in c:\BuildAgent\work\c3100604bbd3ca20\src\NServiceBus.Distributor.MSMQ\ReadyMessages\ReadyMessageSender.cs:line 50
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at NServiceBus.Unicast.Transport.TransportReceiver.OnFinishedMessageProcessing(TransportMessage msg) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Unicast\Transport\TransportReceiver.cs:line 435
NServicebus version: 4.6.0.0
Queueing: MSMQ

The worker ends its unit of work by sending a message back to the distributor. This send joins the existing distributed transaction. The error you get is caused by this new transactional resource trying to join a transaction that is already failing: something has marked the distributed transaction for rollback.
This is normally caused by your code. Either your database operation is failing somehow, or you have probably exceeded the transaction timeout (one minute by default) while handling the message.
Check your logs to see whether processing the message on the worker takes longer than the transaction timeout.
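As a rough aid for that kind of investigation, here is a minimal sketch of how a handler could check whether the ambient transaction is still usable before it returns. It assumes the handler runs inside a System.Transactions ambient transaction; the class and method names are hypothetical, and only Transaction.Current and TransactionInformation.Status come from System.Transactions.

```csharp
using System;
using System.Transactions;

// Hypothetical helper, not part of NServiceBus.
public static class AmbientTransactionCheck
{
    // Returns true only when an ambient transaction exists and is still active,
    // i.e. it has not been rolled back or aborted (for example by a timeout).
    public static bool IsAmbientTransactionActive()
    {
        Transaction current = Transaction.Current;
        return current != null
            && current.TransactionInformation.Status == TransactionStatus.Active;
    }
}
```

Logging the result at the end of the handler may help show whether the transaction was already doomed before the worker tried to send its message back to the distributor.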

Related

Nservice bus queue not found exception

I have a connector which receives messages from different connectors. While receiving a message, my connector logs the following warning:
[Worker.11] WARN NServiceBus.Unicast.Transport.Transactional.TransactionalTransport [(null)] <(null)> - Failed raising 'transport message received' event for message with ID=fd970068-55ad-49c0-8abc-4133b7f7fe12\213847
NServiceBus.Unicast.Queuing.QueueNotFoundException ---> System.Messaging.MessageQueueException: Cannot enlist the transaction.
at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at NServiceBus.Unicast.Queuing.Msmq.MsmqMessageSender.NServiceBus.Unicast.Queuing.ISendMessages.Send(TransportMessage message, Address address)
--- End of inner exception stack trace ---
at NServiceBus.Unicast.UnicastBus.HandleTransportMessage(IBuilder childBuilder, TransportMessage msg)
at NServiceBus.Unicast.UnicastBus.TransportMessageReceived(Object sender, TransportMessageReceivedEventArgs e)
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.OnTransportMessageReceived(TransportMessage msg)
The weird thing is that it all worked fine during internal testing; it started blowing up as soon as it was hosted on an Amazon server.
This is how I have configured NServiceBus:
NServiceBus.Configure.With(
        AllAssemblies.Except("libBL.dll").And("libCommon.dll").And("libExtra.dll"))
    .StructureMapBuilder()
    .JsonSerializer()
    .UnicastBus()
    .DoNotAutoSubscribe()
    .TransactionTimeout(TimeSpan.FromMinutes(5));
Any help would be much appreciated. If any more information is needed, please let me know.
Thanks
Searching through the code, I found that when an incoming message failed, I was not rolling back the transaction in which the whole operation began, so NServiceBus was not able to query the database because it was locked. After adding the transaction rollback logic, my connector worked.

What is causing EventStore to throw ConcurrencyException so easily?

Using JOliver EventStore 3.0, and just getting started with simple samples.
I have a simple pub/sub CQRS implementation using NServiceBus. A client sends commands on the bus, a domain server receives and processes the commands and stores events to the EventStore, which are then published on the bus by the EventStore's dispatcher. A read-model server then subscribes to those events to update the read model. Nothing fancy, pretty much by the book.
It is working, but even in simple tests I am intermittently getting lots of concurrency exceptions on the domain server when the event is stored to the EventStore. It properly retries, but sometimes it hits the 5-retry limit and the command ends up on the error queue.
Where could I start investigating to see what is causing the concurrency exception? I removed the dispatcher to focus just on storing events, and it has the same issue.
I'm using RavenDB for persistence of my EventStore. I'm not doing anything fancy, just this:
using (var stream = eventStore.OpenStream(entityId, 0, int.MaxValue))
{
    stream.Add(new EventMessage { Body = myEvent });
    stream.CommitChanges(Guid.NewGuid());
}
The stack trace for the exception looks like this:
2012-03-17 18:34:01,166 [Worker.14] WARN NServiceBus.Unicast.UnicastBus [(null)] <(null)> - EmployeeCommandHandler failed handling message.
EventStore.ConcurrencyException: Exception of type 'EventStore.ConcurrencyException' was thrown.
at EventStore.OptimisticPipelineHook.PreCommit(Commit attempt) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticPipelineHook.cs:line 55
at EventStore.OptimisticEventStore.Commit(Commit attempt) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStore.cs:line 90
at EventStore.OptimisticEventStream.PersistChanges(Guid commitId) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStream.cs:line 168
at EventStore.OptimisticEventStream.CommitChanges(Guid commitId) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStream.cs:line 149
at CQRSTest3.Domain.Extensions.StoreEvent(IStoreEvents eventStore, Guid entityId, Object evt) in C:\dev\test\CQRSTest3\CQRSTest3.Domain\Extensions.cs:line 13
at CQRSTest3.Domain.ComandHandlers.EmployeeCommandHandler.Handle(ChangeEmployeeSalary message) in C:\dev\test\CQRSTest3\CQRSTest3.Domain\ComandHandlers\EmployeeCommandHandler.cs:line 55
I figured it out. Had to dig through source code to find it though. I wish this was better documented! Here's my new eventstore wireup:
EventStore = Wireup.Init()
    .UsingRavenPersistence("RavenDB")
    .ConsistentQueries()
    .InitializeStorageEngine()
    .Build();
I had to add .ConsistentQueries() so that the Raven persistence provider internally uses WaitForNonStaleResults on the queries EventStore makes to Raven.
Basically, when I added a new event and then tried to add another before Raven had caught up with indexing, the stream revision was not up to date, so the second event would step on the first one.
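To illustrate the effect, here is a small in-memory sketch of how a stale revision read can trigger an optimistic concurrency conflict. It is a simplified model only, not EventStore's or RavenDB's actual implementation; every type and member name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception standing in for a concurrency conflict.
public class ConcurrencyConflictException : Exception { }

// Hypothetical model: commits are stored immediately, but the "index"
// that answers revision queries only catches up when CatchUpIndex is called.
public class InMemoryStream
{
    private readonly List<string> committed = new List<string>();

    // Revision as reported by a possibly stale query.
    public int IndexedRevision { get; private set; }

    // Optimistic check: the writer must know the true current revision.
    public void Append(string evt, int expectedRevision)
    {
        if (expectedRevision != committed.Count)
        {
            throw new ConcurrencyConflictException();
        }
        committed.Add(evt);
    }

    // Simulates the index catching up with the stored commits.
    public void CatchUpIndex()
    {
        IndexedRevision = committed.Count;
    }
}
```

In this model, appending twice in quick succession using IndexedRevision fails on the second append unless CatchUpIndex has run in between, which is roughly the situation that waiting for non-stale results avoids.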

Suppressing NServicebus Transaction to write errors to database

I'm using NServiceBus to handle some calculation messages. I have a new requirement to handle calculation errors by writing them to the same database. I'm using NHibernate as my DAL, which auto-enlists in the NServiceBus transaction and provides rollback in case of exceptions, which is working really well. However, if I write this particular error to the database, it is also rolled back, which is a problem.
I knew this would be a problem, but I thought I could just wrap the call in a new transaction with TransactionScopeOption = Suppress. However, the error data is still rolled back. I believe that's because it was using the existing session, which has already enlisted in the NServiceBus transaction.
Next I tried opening a new session from the existing SessionFactory within the suppressing transaction scope. However, the first call to the database to retrieve or save data using this new session blocks and then times out.
InnerException: System.Data.SqlClient.SqlException
Message=Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Finally I tried creating a new SessionFactory and using it to open a new session within the suppressing transaction scope. However, again it blocks and times out.
I feel like I'm missing something obvious here, and would greatly appreciate any suggestions on this probably common task.
As Adam suggests in the comments, in most cases it is preferred to let the entire message fail processing, giving the built-in Retry mechanism a chance to get it right, and eventually going to the error queue. Then another process can monitor the error queue and do any required notification, including logging to a database.
However, there are some use cases where the entire message is not a failure, i.e. on the whole, it "succeeds" (whatever the business-dependent definition of that is) but there is some small part that is in error. For example, a financial calculation in which the processing "succeeds" but some human element of the data is "in error". In this case I would suggest catching that exception and sending a new message which, when processed by another endpoint, will log the information to your database.
I could see another case where you want the entire message to fail, but you want the fact that it was attempted noted somehow. This may be closest to what you are describing. In this case, create a new TransactionScope with TransactionScopeOption = Suppress, and then (again) send a new message inside that scope. That message will be sent whether or not your full message transaction rolls back.
You are correct that your transaction is rolling back because the NHibernate session is opened while the transaction is in force. Trying to open a new session inside the suppressed transaction can cause a problem with locking. That's why, most of the time, sending a new message asynchronously is part of the solution in these cases, but how you do it is dependent upon your specific business requirements.
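A minimal sketch of the suppressed scope described above, using only System.Transactions. The helper name is hypothetical; the action passed in would be the send of the notification message, which is not shown here because the message type and bus reference depend on your endpoint.

```csharp
using System;
using System.Transactions;

// Hypothetical helper, not part of NServiceBus.
public static class SuppressScopeDemo
{
    // Runs the action with no ambient transaction, so work done inside it
    // does not enlist in (or roll back with) the outer message transaction.
    public static void RunOutsideAmbientTransaction(Action action)
    {
        using (var scope = new TransactionScope(TransactionScopeOption.Suppress))
        {
            action();
            scope.Complete();
        }
    }
}
```

Inside the suppressed scope Transaction.Current is null, and the outer ambient transaction is restored once the scope is disposed.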
I know I'm late to the party, but as an alternative suggestion, you could simply raise another, separate log message, which NSB handles independently, for example:
public void Handle(DebitAccountMessage message)
{
    var account = this.dbcontext.GetById(message.Id);
    if (account.Balance <= 0)
    {
        // log request - new handler
        this.Bus.Send(new DebitAccountLogMessage
        {
            originalMessage = message,
            account = account,
            timeStamp = DateTime.UtcNow
        });
        // throw error - NSB will handle
        throw new DebitException("Not enough funds");
    }
}
public void Handle(DebitAccountLogMessage message)
{
    var messageString = message.originalMessage.Dump();
    var accountString = message.account.Dump(DumpOptions.SuppressSecurityTokens);
    this.Logger.Log(message.UniqueId, string.Format("{0}, {1}", messageString, accountString));
}

NServicebus : {WARN} failed raising transport message with ID =

Here's the scenario.
I am on NServiceBus 3.0.0.1504 (core DLL version). I get the above warning message when there is an exception in one of the handlers. Queues are set up as DTC. The interesting thing is that, although I get this warning message in QA, it successfully retries N times and places the message in the error queue. However, in Production (in a few cases; I haven't seen this happen consistently), it tries processing the message only once and then stops. It neither put the message in the error queue, nor retried N times, nor put the message back in the queue.
I don't really see a difference between the QA and Prod environments: it's the same code base, and we grant permissions on the queues using build scripts.
Here's the stack trace
NServiceBus.Unicast.Transport.Transactional.TransactionalTransport - Failed raising 'transport message received' event for message with ID=1fb282b5-7a9e-41ea-834a-5f6767273324\195311762 NServiceBus.Unicast.Transport.TransportMessageHandlingFailedException: Exception of type 'NServiceBus.Unicast.Transport.TransportMessageHandlingFailedException' was thrown.
at NServiceBus.Unicast.UnicastBus.DispatchMessageToHandlersBasedOnType(IBuilder builder, IMessage toHandle, Type messageType) in d:\BuildAgent-01\work\NServiceBus.Trunk\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 899
at NServiceBus.Unicast.UnicastBus.HandleMessage(IBuilder builder, TransportMessage m) in d:\BuildAgent-01\work\NServiceBus.Trunk\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 827
at NServiceBus.Unicast.UnicastBus.HandleTransportMessage(IBuilder builder, TransportMessage msg) in d:\BuildAgent-01\work\NServiceBus.Trunk\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 1026
at NServiceBus.Unicast.UnicastBus.TransportMessageReceived(Object sender, TransportMessageReceivedEventArgs e) in d:\BuildAgent-01\work\NServiceBus.Trunk\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 975
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.OnTransportMessageReceived(TransportMessage msg) in d:\BuildAgent-01\work\NServiceBus.Trunk\src\impl\unicast\transport\NServiceBus.Unicast.Transport.Transactional\TransactionalTransport.cs:line 409
I tried dropping the same message in QA to test the scenario, however, QA seems to work (with the warning exception log).
Questions
1) Why would this exception happen? Has anyone seen this in their system? Why is it logged only as a WARN? I looked at the code and it seems to come from the catch block in Unicast.cs where it is handling "Exception".
2) Any suggestions to resolve / dig into the issue?

The destination queue '<QueueName>#<servername>' could not be found

While testing the pub/sub model, I changed the name of the subscriber queue while the subscription for the old queue still existed in the DB, so there is a dangling subscription in the DB.
When the publisher and subscriber started and I tried to send a message from the publisher, the following exception occurred; the publisher stopped and no longer sent any messages:
2011-02-09 09:56:21,115 [6] ERROR Publisher.ServerEndpoint [(null)] <(null)> - Problem occurred when starting the endpoint.
System.Configuration.ConfigurationErrorsException: The destination queue 'StoreInputQueue#' could not be found. You may have misconfigured the destination for this kind of message (Message.EventMessage) in the MessageEndpointMappings of the UnicastBusConfig section in your configuration file.It may also be the case that the given queue just hasn't been created yet, or has been deleted. ---> System.Messaging.MessageQueueException: The queue does not exist or you do not have sufficient permissions to perform the operation.
at System.Messaging.MessageQueue.MQCacheableInfo.get_WriteHandle()
at System.Messaging.MessageQueue.StaleSafeSendMessage(MQPROPS properties, IntPtr transaction)
at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Send(Object obj, MessageQueueTransactionType transactionType)
at NServiceBus.Unicast.Transport.Msmq.MsmqTransport.Send(TransportMessage m, String destination) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\impl\unicast\NServiceBus.Unicast.Msmq\MsmqTransport.cs:line 334
--- End of inner exception stack trace ---
at NServiceBus.Unicast.Transport.Msmq.MsmqTransport.Send(TransportMessage m, String destination) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\impl\unicast\NServiceBus.Unicast.Msmq\MsmqTransport.cs:line 346
at NServiceBus.Unicast.UnicastBus.SendMessage(IEnumerable`1 destinations, String correlationId, MessageIntentEnum messageIntent, IMessage[] messages) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 593
at NServiceBus.Unicast.UnicastBus.Publish[T](T[] messages) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 343
at Publisher.ServerEndpoint.Run() in C:\Downloads\ESB\NServiceBus\publisher\publisher\ServerEndpoint.cs:line 26
at NServiceBus.Host.Internal.ConfigManager.<>c_DisplayClass1.b_0() in d:\BuildAgent-02\work\20b5f701adefe8f8\src\host\NServiceBus.Host\Internal\ConfigurationManager.cs:line 56
Is there a timeout period after which it will try to send the message to the rest of the subscribers? I waited quite a long time...
I don't think it will retry.
Pulling the rug (queue) out from under a running endpoint is not a good thing to do. In production this really should never happen.
Since you're just testing, delete the offending subscription row from the database, and restart the endpoint, and everything should be fine.