MassTransit JobConsumer - RabbitMQ

We recently discovered that a couple of our long-running consumers started failing after we upgraded our RabbitMQ version. This was due to the 30-minute consumer acknowledgement timeout that RabbitMQ introduced in newer versions. Looking through examples and recommendations online, we noticed that MassTransit provides JobConsumers for longer-running consumers. We've refactored our consumers to use JobConsumer with an in-memory repository for the different job sagas (JobTypeSaga, JobAttemptSaga, and JobSaga) for the time being. The in-memory repository is temporary until we can set aside some time to move this to a backing state machine database.
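For reference, the refactored consumers follow the standard IJobConsumer shape (type and method names here are illustrative):

public class LongRunningJobConsumer : IJobConsumer<LongRunningJob>
{
    public async Task Run(JobContext<LongRunningJob> context)
    {
        // DoWorkAsync is our own long-running work (illustrative); the job
        // service tracks the attempt through the saga state machines while it runs
        await DoWorkAsync(context.Job, context.CancellationToken);
    }
}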
This works for three of the four jobs we've moved over. There is one job, however, that we're still running into issues with. What we've noticed is that every time we do a deployment, this job runs fine the first time but then doesn't run again unless we do another deployment.
What we've found in the logs are these messages, which pop up after the first run:
"The AttemptCompleted event is not handled during the Faulted state for the JobStateMachine state machine"
"MassTransit.JobService.Components.StateMachines.JobSaga(cecd0c80-e9ca-4c94-98d1-e496ce0afc9f) Saga exception on receipt of MassTransit.Contracts.JobService.JobAttemptCompleted: Not accepted in state Faulted | The AttemptCompleted event is not handled during the Faulted state for the JobStateMachine state machine"
Our configuration is below. Thanks in advance.
IRabbitMqBusFactoryConfigurator RegisterAdditionalRabbitMQSettings(
    IRabbitMqBusFactoryConfigurator rabbitMqBusFactoryConfigurator,
    IBusRegistrationContext busRegistrationContext)
{
    rabbitMqBusFactoryConfigurator.UseDelayedMessageScheduler();

    var options = new ServiceInstanceOptions()
        .EnableJobServiceEndpoints();

    rabbitMqBusFactoryConfigurator.ServiceInstance(options, instance =>
    {
        instance.ConfigureJobServiceEndpoints(serviceConfigurator =>
        {
            serviceConfigurator.FinalizeCompleted = true;
            serviceConfigurator.ConfigureSagaRepositories(busRegistrationContext);
        });

        instance.ConfigureEndpoints(busRegistrationContext);
    });

    return rabbitMqBusFactoryConfigurator;
}
IServiceCollectionBusConfigurator RegisterAdditionalSagaStorageSettings(
    IServiceCollectionBusConfigurator serviceCollectionBusConfigurator)
{
    serviceCollectionBusConfigurator.AddDelayedMessageScheduler();

    serviceCollectionBusConfigurator.AddSagaRepository<JobSaga>().InMemoryRepository();
    serviceCollectionBusConfigurator.AddSagaRepository<JobTypeSaga>().InMemoryRepository();
    serviceCollectionBusConfigurator.AddSagaRepository<JobAttemptSaga>().InMemoryRepository();

    return serviceCollectionBusConfigurator;
}
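For completeness, each job consumer is registered along these lines (a sketch following the pattern in the MassTransit docs; the timeout and concurrency values are illustrative, and the exact AddConsumer overload may differ by MassTransit version):

serviceCollectionBusConfigurator.AddConsumer<LongRunningJobConsumer>(cfg =>
{
    // JobOptions control how the job service supervises the consumer
    cfg.Options<JobOptions<LongRunningJob>>(options => options
        .SetJobTimeout(TimeSpan.FromHours(2))
        .SetConcurrentJobLimit(1));
});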

Related

Free resources in blocked RabbitMQ connection

I have a very basic demo application for testing the RabbitMQ blocking behaviour. I use RabbitMQ 3.10.6 with the .NET library RabbitMQ.Client 6.2.4 in .NET Framework 4.8.
The connection is created using ConnectionFactory.CreateConnection() and uses AutomaticRecoveryEnabled = true.
The application creates one channel and one queue for sending messages:
IModel sendChannel = Connection.CreateModel();
sendChannel.ConfirmSelect();
sendChannel.QueueDeclare("sendQueueName", true, false, false);
For receiving messages, again one channel and one queue are created:
IModel receiveChannel = Connection.CreateModel();
receiveChannel.ConfirmSelect();
receiveChannel.QueueDeclare("receiveQueueName", true, false, false);
var receiveQueueConsumer = new QueueConsumer(receiveChannel); // This is my own class which inherits from 'DefaultBasicConsumer' and passes 'receiveChannel' to its base in the constructor.
receiveChannel.BasicConsume("receiveQueueName", false, receiveQueueConsumer);
Now I fill my disk until the configured threshold in the RabbitMQ config file is reached.
As expected, the ConnectionBlocked event is fired. The connection now is in state "blocking".
Now I queue a message. AMQP properties are added to the message using channel.CreateBasicProperties() with Persistent = true. The message is then published:
sendChannel.BasicPublish("", "sendQueueName", amqpProperties, someBytes);
sendChannel.WaitForConfirms(TimeSpan.FromSeconds(5)); // Returns as expected after 5 seconds with return value 'true'.
The connection now is in state "blocked".
Now I shut down my demo application and find that disposing does not work as expected.
sendChannel.Close(); // Blocks for 10 seconds.
if (receiveChannel.IsOpen) receiveChannel.BasicCancel(ConsumerTags.First()); // In 'receiveQueueConsumer'. Throws a 'TimeoutException'.
Connection.Close(); // Freezes for at least a minute.
The behaviour is the same when calling Dispose() or Abort() instead of Close(). When I finally force kill the application (or when I set a timeout for Abort()) then the application closes but the underlying connection and channels are not removed. The connection still is in state "blocked".
At least, once there is enough space on the disk again, the blocked connections and their channels are automatically removed by the broker, without the need to restart it.
Here and here it sounds like the broker just won't react when it is "blocked".
There can be a broad number of reasons for a timeout, from a genuine connection interruption to a resource alarm in effect that prevents target node from reading any data coming from clients unless the alarm clears.
Nodes will temporarily block publishing connections by suspending reading from client connection.
This would mean I can't free my resources unless I restart the broker or make sure the broker has enough free resources to clear the resource alarm. Is there an official confirmation for this? Or how do I need to adjust the dispose mechanism to make it work while the broker is blocked?
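For what it's worth, the only shutdown I found that reliably lets the process exit (building on my observation above that Abort() with a timeout returns) is to bound every call; this is a workaround sketch, not official guidance:

// Best-effort, time-bounded shutdown. Abort() swallows errors instead of
// throwing, and the TimeSpan overload on the connection caps how long we
// wait for a broker that may not respond while a resource alarm is active.
var shutdownTimeout = TimeSpan.FromSeconds(5);

sendChannel.Abort();               // IModel.Abort() does not throw
receiveChannel.Abort();
Connection.Abort(shutdownTimeout); // returns after at most 5 seconds

The broker may still list the connection as "blocked" until the alarm clears; the client just stops waiting on it.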

WCF service hosted on Azure App service never seems to finish threads opened for processing

I have deployed a WCF service to Azure App Service that performs just one task: send a message to a topic. Although the app works fine under normal load, the thread count starts climbing as soon as load increases.
The app instance becomes unhealthy when the thread count limit is reached.
Those threads stay in the waiting state forever. We tried the scale-out option on the thread count metric, but the app just keeps adding more instances, since the earlier instances still have almost all their threads waiting and remain unhealthy forever.
This is performed in the following sequence:
Accept a request.
Initialize a Service Bus topic client.
Send the requested message to the topic.
Close the topic client.
While sending a burst of 1000 requests, the app works, but the threads it spins up stay in the waiting state. However, while these threads are waiting, CPU stays at 0%. The average response time from this service is also under 100 ms.
After sending 1000 requests to this service, I see a similar number of open threads.
What could be the potential root cause of this issue? Is there any issue with my code to send the message to the topic?
public async Task SendAsync(Message message)
{
    try
    {
        await _topicClient.SendAsync(message);
    }
    catch (Exception exc)
    {
        throw new Exception(exc.Message);
    }
    finally
    {
        await _topicClient.CloseAsync();
    }
}
(screenshot omitted)
The code sample you provided does not really tell us much. We do not know how SendAsync(Message message) is being invoked. Is your image your queue count dropping to 0 before accepting more messages? I'm assuming a client calls your WCF app service, which tells it to send the message to Service Bus?
It does sound like you are hitting the 1000 maximum connections. Your _topicClient should be a singleton for your app domain that all clients use. You also should only need one App Service instance if all you're doing is message forwarding; there's no need for scaling unless there's more processing that you haven't alluded to.
Have a look at the Service Bus messaging best practices doc for more suggestions.
Thanks for responding. These are good suggestions, and I will review my implementation in line with them.
The good news is that I was able to resolve the issue; it wasn't related to the topic client as I'd thought. It was due to how I was registering dependency injection.
I am implementing a WCF service on .NET Framework 4.8, and initially we did not include a Global.asax but registered DI in the service controller constructor. The implementation worked until we realized (as part of performance testing) that it seemed to add additional threads once we added an ILogger dependency. Those additional threads never cooled down but kept accumulating as the service received more requests.
To resolve this, I moved the DI registration into Application_Start in Global.asax.
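In case it helps someone, a minimal sketch of that change, assuming SimpleInjector as the container (any container works; the singleton TopicClient registration also reflects the advice above, and the connection string and topic name are placeholders):

public class Global : System.Web.HttpApplication
{
    public static Container Container { get; private set; }

    protected void Application_Start(object sender, EventArgs e)
    {
        // register once per app domain instead of in the WCF service constructor
        Container = new Container();

        // one TopicClient for the whole app domain, reused by every request
        Container.RegisterSingleton<ITopicClient>(
            () => new TopicClient("<connection-string>", "<topic-name>"));

        Container.Verify();
    }
}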

Azure Service Bus multiple instances for the same subscriber

I have an ASP.NET Core application which registers a subscription client to a topic on startup (IHostedService). This subscription client essentially has a dictionary of callbacks that need to be fired whenever it detects a new message in the topic with a matching id (this id is stored on the message properties). The dictionary lives in memory for the lifetime of the application.
Everything works fine with a single instance of the ASP.NET Core app service on Azure. As soon as I scale up to 2, I notice that sometimes the callbacks in the subscription are not firing. This makes sense: we now have two instances, each with its own dictionary of callbacks.
So I updated the code to check whether the id exists in the instance's dictionary: if not, abandon the message; if yes, get the callback and invoke it.
public async Task HandleMessage(Microsoft.Azure.ServiceBus.Message message, CancellationToken cancellationToken)
{
    var queueItem = this.converter.DeserializeItem(message);

    // get the session id from the message properties (property name illustrative)
    var sessionId = message.UserProperties.TryGetValue("sessionId", out var id) ? id as string : null;

    if (string.IsNullOrEmpty(sessionId))
    {
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }

    if (!this.subscriptions.TryGetValue(sessionId, out var subscription))
    {
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }

    await subscription.Call(queueItem);

    // subscription was found and executed, so complete the message
    await this.subscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
}
However, the problem still occurs. My only guess is that when calling AbandonAsync, the same instance picks the message up again?
I guess what I am really trying to ask is, if I have multiple instances of a topic subscription client all pointing to the same subscriber for the topic, is it possible for all the instances to get a copy of the message? Or is that not guaranteed.
if I have multiple instances of a topic subscription client all pointing to the same subscriber for the topic, is it possible for all the instances to get a copy of the message? Or is that not guaranteed.
No. If it's the same subscription all clients are pointing to, only one of them will receive any given message.
You're running into an issue of scaling out with competing consumers. When you scale out, you never know which instance will pick up a message, and since your state is local (in the memory of each instance), this will fail from time to time. An additional downside is cost: by fetching messages on the "wrong" instance and abandoning them, you pay a higher cost on the messaging side.
To address this, you either need a shared/centralized state store or need to change your architecture around this.
I managed to solve the issue by making use of Service Bus sessions. What I was trying to do with the dictionary of callbacks was basically a session manager anyway!
Service Bus sessions allow me to have multiple instances of a session client all pointing to the same subscription, while each instance only knows or cares about the sessions it is currently handling.
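A minimal sketch of the session-based handler with Microsoft.Azure.ServiceBus (handler body and option values are illustrative):

// Each instance registers a session handler; the broker hands each accepted
// session exclusively to one instance, so callbacks only fire on the
// instance that owns the session.
var options = new SessionHandlerOptions(args => Task.CompletedTask)
{
    MaxConcurrentSessions = 8,
    AutoComplete = false
};

subscriptionClient.RegisterSessionHandler(
    async (session, message, cancellationToken) =>
    {
        var queueItem = converter.DeserializeItem(message);
        await HandleSessionMessage(session.SessionId, queueItem); // our own dispatch (illustrative)
        await session.CompleteAsync(message.SystemProperties.LockToken);
    },
    options);

Note that senders have to set SessionId on each message for sessions to work.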

Creating queues dynamically on MassTransit

I have a particular scenario with RabbitMQ that needs queues to be created dynamically and bound to exchanges that are also created dynamically (not by me). The creation and binding are triggered by a new SignalR subscription.
This issue: https://github.com/MassTransit/MassTransit/issues/398 is about it, but I still don't know the answer.
It seems that MassTransit is not very flexible about creating things on the fly.
How can I achieve this? What if I stop the bus, recreate all the queues and bindings plus the new one, and then start the bus again?
Thanks in advance.
Receive endpoints can be connected via the bus, as shown in the documentation.
For example:
var handle = bus.ConnectReceiveEndpoint("queue-name", x =>
{
    x.Consumer<SomeConsumer>();
});

// the code below waits for the receive endpoint to be ready
// and throws an exception if a fault occurs
var ready = await handle.Ready;

Scoping transactions and sessions in NHibernate for long running tasks

When using NHibernate in web applications, I will usually let my IoC container take care of opening and closing an ISession per request and committing or rolling back the transaction. The nature of HTTP makes it very easy to define a clear unit of work in such applications.
Now, I have been tasked with putting together a small program, which will be invoked regularly by a task scheduler, for sending out newsletters. The concepts of both newsletters and subscribers are already well defined entities in our domain model, and sending a newsletter to all subscribers would involve doing something similar to this:
var subscribers = _session
    .QueryOver<Subscription>()
    .Where(s => !s.HasReceivedNewsletter)
    .List();

foreach (var subscriber in subscribers)
{
    SendNewsletterTo(subscriber);
    subscriber.HasReceivedNewsletter = true;
}
Notice how each Subscriber object is updated within the loop, recording that she has now received the newsletter. The idea is that if the mail-sending program crashes, it can be restarted and continue sending newsletters from where it left off.
The problem I am facing is in defining and implementing the unit-of-work pattern here. I will probably need to commit changes to the database at the end of each iteration of the loop. Simply wrapping the loop body in a using (var trans = _session.BeginTransaction()) block seems extremely expensive in running time, and I also seem to experience locking issues between this long-running process and other (web) applications using the same database.
After reading some articles and documentation on NHibernate transactions, I've come to think that I might need to detach the list of subscribers from the session to avoid the locking issues, and reattach each one to a fresh session in the loop body. I am not sure how this will work out for performance, though.
So, NHibernate experts, how would you design and implement a long running job like this?
Don't you want to use asynchronous durable messaging here? Something like NServiceBus, Rhino Service Bus, or MassTransit. It seems you don't have to send a lot of messages as quickly as possible, so I think you should do it asynchronously, with one durable message per user.
Don't you think a stateless session with no transaction would do better here?
There's no problem having multiple transactions in a session. It's appropriate here to scope the transaction to updating a single subscriber because it's an independent operation. Depending on the number of subscribers and the likelihood of failure, it might be best to grab a small number of subscribers at a time.
foreach (var subscriber in subscribers)
{
    using (var txn = _session.BeginTransaction())
    {
        try
        {
            SendNewsletterTo(subscriber);
            subscriber.HasReceivedNewsletter = true;
            txn.Commit();
        }
        catch (Exception ex)
        {
            txn.Rollback();
            // log exception, clean up any actions SendNewsletterTo has taken if needed
            // dispose of the session and start over
        }
    }
}
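And a rough sketch of the "small number of subscribers at a time" idea (batch size and session-clearing strategy are illustrative, not prescriptive):

const int batchSize = 50;

IList<Subscription> batch;
do
{
    // re-query each round; rows committed in the previous round
    // no longer match the filter
    batch = _session.QueryOver<Subscription>()
        .Where(s => !s.HasReceivedNewsletter)
        .Take(batchSize)
        .List();

    foreach (var subscriber in batch)
    {
        // send, set HasReceivedNewsletter, and commit per subscriber,
        // exactly as in the loop above
    }

    _session.Clear(); // evict processed entities so the session stays small
} while (batch.Count > 0);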