I have an ASP.NET Core application that registers a subscription client to a topic on startup (in an IHostedService). This subscription client holds a dictionary of callbacks that need to fire whenever it detects a new message in the topic with a matching id (the id is stored in the message properties). The dictionary is kept in memory and lives for the lifetime of the application.
Everything works fine with a single instance of the ASP.NET Core app service on Azure, but as soon as I scale out to 2 instances, I notice that the callbacks sometimes don't fire. This makes sense: with two instances, each one has its own dictionary of callbacks.
So I updated the code to check whether a subscription for the id exists. If not, it abandons the message; if so, it gets the callback and invokes it.
public async Task HandleMessage(Microsoft.Azure.ServiceBus.Message message, CancellationToken cancellationToken)
{
    var queueItem = this.converter.DeserializeItem(message);

    // GetSessionId is a hypothetical helper: read the id from the message properties here
    var sessionId = GetSessionId(message);
    if (string.IsNullOrEmpty(sessionId))
    {
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }

    if (!this.subscriptions.TryGetValue(sessionId, out var subscription))
    {
        // No callback registered on this instance: release the lock so another receiver can try
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }

    await subscription.Call(queueItem);

    // Subscription was found and executed: complete the message
    await this.subscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
}
However, the problem still occurs. My only guess is that when calling AbandonAsync, the same instance is picking up the message again?
I guess what I am really asking is this: if I have multiple instances of a topic subscription client all pointing to the same subscription on the topic, is it possible for all the instances to get a copy of the message? Or is that not guaranteed?
No. If all clients point to the same subscription, only one of them will receive any given message.
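You're running into the competing-consumers problem that comes with scaling out. When you scale out, you never know which instance will pick up a message. Since your state is local (in the memory of each instance), this will fail from time to time. An additional downside is cost: by fetching messages on the "wrong" instance and abandoning them, you pay more on the messaging side. Abandoning also increments the message's delivery count. Once that count exceeds the subscription's MaxDeliveryCount (10 by default), the message is dead-lettered. A message that keeps landing on the wrong instance can therefore end up in the dead-letter queue instead of being processed.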
To address this, you either need shared/centralized state or you need to change your architecture.
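The failure mode can be made concrete with a small in-memory simulation. It is not the Service Bus SDK. Two competing instances share one subscription, and each has its own callback dictionary. The sketch assumes MaxDeliveryCount is left at its default of 10. `pickReceiver` is a hypothetical stand-in for whichever instance happens to win the lock on each delivery.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: an in-memory stand-in for one subscription shared by two app instances.
const int MaxDeliveryCount = 10; // assumed to be left at the Service Bus default

// Each instance has its own in-memory callback dictionary (the situation in the question).
var callbacksByInstance = new Dictionary<string, HashSet<string>>
{
    ["A"] = new HashSet<string> { "session-1" },
    ["B"] = new HashSet<string> { "session-2" },
};

// Delivers one message; pickReceiver decides which competing consumer gets the lock on each attempt.
(string Outcome, int DeliveryCount) Deliver(string sessionId, Func<int, string> pickReceiver)
{
    for (int deliveryCount = 1; deliveryCount <= MaxDeliveryCount; deliveryCount++)
    {
        string receiver = pickReceiver(deliveryCount);
        if (callbacksByInstance[receiver].Contains(sessionId))
            return ($"completed on {receiver}", deliveryCount);
        // Otherwise the receiver abandons: the lock is released and the next delivery bumps the count.
    }
    return ("dead-lettered", MaxDeliveryCount);
}

// Lucky case: instance B happens to receive the message on the second delivery.
var lucky = Deliver("session-2", attempt => attempt == 1 ? "A" : "B");
// Unlucky case: instance A keeps winning the race for this message.
var unlucky = Deliver("session-2", _ => "A");

Console.WriteLine($"{lucky.Outcome} after {lucky.DeliveryCount} deliveries");
Console.WriteLine($"{unlucky.Outcome} after {unlucky.DeliveryCount} deliveries");
```

Nothing in this setup routes a message to the instance that owns its callback, so the second outcome is always possible. That is why the fix has to be shared state or a different routing model rather than retrying via AbandonAsync.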
I managed to solve the issue by using Service Bus sessions. What I was trying to do with the dictionary of callbacks is basically a session manager anyway!
Service Bus sessions allow me to have multiple instances of a session client all pointing to the same subscription. However, each instance only knows or cares about the sessions it is currently handling.
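The property that makes sessions work here can be sketched in memory. A session is locked by at most one receiver at a time, and every message with that SessionId goes to the lock holder, in order. This models the idea only; it is not the SDK. As far as I know, the real entry point in Microsoft.Azure.ServiceBus is `SubscriptionClient.RegisterSessionHandler`. The subscription also has to be created with sessions enabled, and the publisher has to set `SessionId` on each message. The round-robin assignment below is an assumption standing in for "whichever instance accepts the session first".

```csharp
using System;
using System.Collections.Generic;

// Sketch only: models the idea of Service Bus sessions in memory, not the SDK.
// A session is locked by at most one instance; all of its messages go to the lock holder, in order.
var messages = new List<(string SessionId, string Body)>
{
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"),    ("order-2", "paid"),
    ("order-1", "shipped"),
};

var sessionOwner = new Dictionary<string, string>();    // sessionId -> instance holding the lock
var processed = new Dictionary<string, List<string>>(); // instance -> "session:body" log
string[] instances = { "A", "B" };
int nextInstance = 0;

foreach (var (sessionId, body) in messages)
{
    // The first time a session is seen, an instance accepts it and takes the lock
    // (round-robin stands in for "whichever instance accepts the session first").
    if (!sessionOwner.TryGetValue(sessionId, out var owner))
    {
        owner = instances[nextInstance++ % instances.Length];
        sessionOwner[sessionId] = owner;
    }

    // Only the lock holder sees this session's messages, so its local state is the right state.
    if (!processed.TryGetValue(owner, out var log))
        processed[owner] = log = new List<string>();
    log.Add($"{sessionId}:{body}");
}

foreach (var (instance, log) in processed)
    Console.WriteLine($"{instance}: {string.Join(", ", log)}");
```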
I have deployed a WCF service to Azure App Service that performs just one task: sending a message to a topic. The app works fine under normal load, but its thread count climbs as soon as the load increases.
The app instance becomes unhealthy when the thread count limit is reached.
Those threads stay in a waiting state forever. We tried scaling out on the thread count metric, but the app just keeps adding more instances. The earlier instances still have almost all of their threads waiting and remain unhealthy forever.
This is performed in the following sequence:
1. Accept a request.
2. Initialize a Service Bus topic client.
3. Send the requested message to the topic.
4. Close the topic client.
When I send a burst of 1000 requests, the app works, but the threads it starts all stay in the waiting state. While these threads are waiting, CPU stays at 0%. The average response time of the service is also under 100 ms.
After sending 1000 requests to this service, I see a similar number of threads open.
What could be the potential root cause of this issue? Is there any issue with my code to send the message to the topic?
public async Task SendAsync(Message message)
{
    try
    {
        await _topicClient.SendAsync(message);
    }
    catch (Exception exc)
    {
        throw new Exception(exc.Message);
    }
    finally
    {
        await _topicClient.CloseAsync();
    }
}
The code sample you provided does not really tell us much. We do not know how SendAsync(Message message) is being invoked. Is your image your queue count that drops to 0 before accepting more messages? I'm assuming a client calls your WCF app service which tells it send the message to service bus?
It does sound like you are hitting the maximum of 1,000 connections. Your _topicClient should be a singleton for your app domain that all clients use. You should also need only one App Service instance if all you're doing is forwarding messages. There's no need to scale out unless there's more processing that you haven't mentioned.
Have a look at the Service Bus messaging best practices doc for more suggestions.
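The singleton suggestion can be illustrated with a hypothetical `CreateClient` that stands in for constructing a topic client and counts the connections it opens. Creating a client per request opens one connection per request. A `Lazy<T>` shared across the app domain opens one in total. The counting is simulated; the real client's connection behavior is covered in the best-practices doc mentioned above.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: CreateClient is a hypothetical stand-in for constructing a topic client.
// Each call "opens a connection", which is what makes per-request clients expensive.
int connectionsOpened = 0;
Func<string, Task> CreateClient()
{
    Interlocked.Increment(ref connectionsOpened);
    return body => Task.CompletedTask; // pretend to send
}

// Pattern from the question: a new client for every request.
async Task SendPerRequestAsync(string body)
{
    var client = CreateClient();
    await client(body); // the client would be closed here, churning connections
}

// Suggested pattern: one client for the whole app domain, created lazily and thread-safely.
var sharedClient = new Lazy<Func<string, Task>>(CreateClient, LazyThreadSafetyMode.ExecutionAndPublication);
Task SendSharedAsync(string body) => sharedClient.Value(body);

await Task.WhenAll(Enumerable.Range(0, 1000).Select(i => SendPerRequestAsync($"msg {i}")));
int perRequestConnections = connectionsOpened;

connectionsOpened = 0;
await Task.WhenAll(Enumerable.Range(0, 1000).Select(i => SendSharedAsync($"msg {i}")));
int sharedConnections = connectionsOpened;

Console.WriteLine($"per-request: {perRequestConnections}, shared: {sharedConnections}");
```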
Thanks for responding. These are good suggestions, and I will review my implementation in line with them.
The good news is that I was able to resolve the issue, it wasn't related to the topic client as I thought earlier. It was due to how I was registering dependency injection.
I am implementing a WCF service on .NET Framework 4.8. Initially, we did not include a Global.asax and instead registered DI in the service controller constructor. The implementation worked until we realized, during performance testing, that it added threads once we introduced an ILogger dependency. Those additional threads never wound down and kept adding up as the service received more requests.
To resolve it, I moved the DI registration into Application_Start in Global.asax.
I'm considering using SocketCluster to build a realtime app. The docs are very clear, but I could not find a way to create a channel programmatically on demand.
My need is this: as a user, I would like to call a REST API that creates a channel, which is then immediately up and running on the server.
For example, calling POST https://<myServer>/api/channels from the client side with the JSON body { "channel": "myChannel" } would create a myChannel channel on the server. My client-side code would then be able to subscribe directly (after receiving the server response):
var myChannel = socket.subscribe('myChannel');
// channel.publish takes only the data; the channel name is implied by the channel object
myChannel.publish('I am here !');
myChannel.watch(function (data) {
  console.log('received data from myChannel:', data);
});
I suppose this newly created channel would use my authorization middleware, since middlewares are defined at the server level (wsServer.addMiddleware(wsServer.MIDDLEWARE_SUBSCRIBE, ...)).
Thanks a lot for your help,
Pierre
With SocketCluster, channels are created and destroyed for you automatically, so you don't need to manage their lifecycle. A channel is created on the back end if at least one client is subscribed to it (based on the channel name). It is automatically destroyed once all of those clients have disconnected or unsubscribed from it. SC also accounts for failure cases, e.g. internet connections that are unexpectedly lost.
SC is designed to be efficient at creating and destroying lots of unique channels on the fly. You can have hundreds of unique channels per user (so possibly many thousands or even millions of unique channels in total). Channels don't consume any CPU at all if they're idle and each channel has a tiny memory footprint.
Channels in SC are not message queues (unlike what is offered by RabbitMQ, NSQ, Kafka, Stomp...); SC does not store messages on a persistent queue (though you can extend SC with your own persistence logic).
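The lifecycle described in this answer amounts to reference counting per channel name. The sketch below is a generic illustration of that idea, not SocketCluster's actual implementation. `Subscribe`, `Unsubscribe`, and `Disconnect` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: a reference-counted channel registry; not SocketCluster's code.
var subscribers = new Dictionary<string, HashSet<string>>(); // channel -> subscribed socket ids
var log = new List<string>();

void Subscribe(string socketId, string channel)
{
    if (!subscribers.TryGetValue(channel, out var set))
    {
        subscribers[channel] = set = new HashSet<string>();
        log.Add($"created {channel}"); // the first subscriber creates the channel
    }
    set.Add(socketId);
}

void Unsubscribe(string socketId, string channel)
{
    if (subscribers.TryGetValue(channel, out var set) && set.Remove(socketId) && set.Count == 0)
    {
        subscribers.Remove(channel);
        log.Add($"destroyed {channel}"); // the last subscriber leaving destroys it
    }
}

// A disconnect (clean or not) is treated as unsubscribing from every channel.
void Disconnect(string socketId)
{
    foreach (var channel in new List<string>(subscribers.Keys))
        Unsubscribe(socketId, channel);
}

Subscribe("alice", "myChannel");
Subscribe("bob", "myChannel");
Unsubscribe("alice", "myChannel");
Disconnect("bob");

Console.WriteLine(string.Join(" | ", log));
```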
I have an Azure worker role with an NServiceBus host (4.7.5). This host sends events over the Azure Service Bus transport to a topic. Is there a way to either delay sending the event, or to set some properties so that the message only appears on the topic subscription after a delay? The host sends out events after it notices a change in the primary database. There are several secondary databases to which the primary's writes are replicated. The receivers are also Azure worker roles that use the NServiceBus host and have subscriptions to the topics.
By the time the receivers get the message, the secondaries may still have out-of-sync data because of replication lag.
One option is to read from the primary database, but that is a route I don't want to take.
Would it be possible to fail early in your subscription endpoints and let the retries take care of it? You can fine-tune the retry counts and delays to make sure your secondary databases are updated before the message is retried.
You still need to find the best way to look up your data in the database, and a way to tell whether it matches the version in the event. You could use version numbers or last-update dates for updates, or just look up by identifier for creations.
The endpoint reading data off the secondary database might have an event handler like this:
public class CustomerCreationHandler : IHandleMessages<CustomerCreated>
{
    public void Handle(CustomerCreated @event)
    {
        var customer = Database.Load(@event.CustomerId);
        if (customer == null)
        {
            // Not replicated to this secondary yet: fail early and let the retries handle it
            throw new CustomerNotFoundException("Customer was not found.");
        }

        // Your business logic goes here
    }
}
You can control how many times the event handler will retry and how much delay there will be between attempts. In this case, the message will be retried by First-Level Retries and then handed over to Second-Level Retries, which are configured below.
class ProvideConfiguration : IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 2,
            TimeIncrease = TimeSpan.FromSeconds(10)
        };
    }
}
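For updates, the handler needs to tell a stale replica apart from a missing record. Below is a minimal sketch of the version check suggested above. It assumes the event carries the version the publisher wrote (`expectedVersion`) and that the replica stores a version per row. `replica` and `Handle` are hypothetical, and the loop stands in for the framework's retries.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: the replica and event shape are hypothetical. The handler throws (so the
// framework retries later) until the secondary database has caught up to the event's version.
var replica = new Dictionary<Guid, (string Name, int Version)>();
var customerId = Guid.NewGuid();

string Handle(Guid id, int expectedVersion)
{
    if (!replica.TryGetValue(id, out var row))
        throw new InvalidOperationException("Customer not replicated yet; retry later.");
    if (row.Version < expectedVersion)
        throw new InvalidOperationException($"Replica at v{row.Version}, event expects v{expectedVersion}; retry later.");
    return $"processed {row.Name} v{row.Version}"; // business logic would go here
}

// Simulated retries: replication catches up between attempts.
var attemptsLog = new List<string>();
for (int attempt = 1; attempt <= 3; attempt++)
{
    if (attempt == 2) replica[customerId] = ("Ada", 1);          // created, but update not replicated
    if (attempt == 3) replica[customerId] = ("Ada Lovelace", 2); // update has replicated
    try
    {
        attemptsLog.Add(Handle(customerId, expectedVersion: 2));
        break;
    }
    catch (InvalidOperationException ex)
    {
        attemptsLog.Add($"attempt {attempt} failed: {ex.Message}");
    }
}
attemptsLog.ForEach(Console.WriteLine);
```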
Alternatively, instead of publishing the event right away, you can send a deferred message to the same endpoint. That message then publishes the actual event after a certain amount of time has passed.
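The deferred-publish idea boils down to a timeout store: hold the event until its due time, then publish it. The sketch below models that with a simulated clock and .NET 6's `PriorityQueue`. In NServiceBus 4 the bus itself would do the deferral (for example `Bus.Defer`). `Defer` and `Poll` here are hypothetical helpers, not NServiceBus APIs.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: an in-memory timeout store on a simulated clock (requires .NET 6+ for PriorityQueue).
var due = new PriorityQueue<string, TimeSpan>(); // event -> due time on the simulated clock
var published = new List<(TimeSpan At, string Event)>();

void Defer(TimeSpan now, TimeSpan delay, string evt) => due.Enqueue(evt, now + delay);

// Called periodically by a hypothetical timeout poller.
void Poll(TimeSpan now)
{
    while (due.TryPeek(out var evt, out var dueAt) && dueAt <= now)
    {
        due.Dequeue();
        published.Add((now, evt)); // publish the real event to subscribers here
    }
}

Defer(TimeSpan.Zero, TimeSpan.FromSeconds(30), "CustomerCreated#42");
for (int s = 0; s <= 60; s += 10)
    Poll(TimeSpan.FromSeconds(s));

foreach (var (publishedAt, name) in published)
    Console.WriteLine($"{name} published at t={publishedAt.TotalSeconds}s");
```

The delay only needs to be longer than the typical replication lag. The fail-early check in the handler is still worth keeping as a safety net.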
My requirement is to make the Subscriber pause processing the messages depending on whether a web service is up or not. So, when the web service is down, the messages should keep coming to the subscriber queue from Publisher and keep piling up until the web service is up again. (These messages should not go to the error queue, but stay on the Subscriber queue.)
I tried to unsubscribe, but then the publisher stops sending messages, because unsubscribing seems to clear the subscription info in RavenDB. I have also tried setting MaxConcurrencyLevel on the Transport class; if I set the worker threads to 0, the messages coming to the Subscriber go directly to the error queue. Finally, I tried Defer. It seems to put the current message in the audit queue, create a clone of the message, and send the clone locally to the Subscriber queue when the timeout completes. Also, since I have to keep checking the status of the service and keep deferring, I cannot control the order of messages, because I cannot predict when the web service will be up.
What is the best way to achieve the behavior I have explained? I am using NServiceBus version 4.5.
It sounds like you want to keep trying to handle a message until it succeeds, and not shuffle it back in the queue (keep it at the top and keep trying it)?
I think your only pure-NSB option is to tinker with the MaxRetries setting, which controls First Level Retries: http://docs.particular.net/nservicebus/msmqtransportconfig. Setting MaxRetries to a very high number may do what you are looking for, but I can't imagine doing so would be a good practice.
Second Level Retries will defer the message for a configurable amount of time, but IIRC will allow other messages to be handled from the main queue.
I think your best option is to put retry logic into your own code. So the handler can try to access the service x number of times in a loop (maybe on a delay) before it throws an exception and NSB's retry features kick in.
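A minimal sketch of that in-handler retry loop follows. It assumes hypothetical `callService` and `sleep` hooks, so it runs without a real service or real waiting. The last failure is deliberately not caught, so once the loop gives up, NServiceBus's own retries take over as described above.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: callService and sleep are hypothetical hooks. After maxAttempts failures the
// exception propagates, which is where NServiceBus's First/Second Level Retries would kick in.
string CallWithRetries(Func<string> callService, int maxAttempts, TimeSpan delay, Action<TimeSpan> sleep)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return callService();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            sleep(delay); // wait before the next attempt; the final failure is not caught
        }
    }
}

// Usage: a fake service that fails twice, then succeeds.
int calls = 0;
var sleeps = new List<TimeSpan>();
string result = CallWithRetries(
    () => ++calls < 3 ? throw new TimeoutException("service down") : "ok",
    maxAttempts: 5,
    delay: TimeSpan.FromSeconds(2),
    sleep: sleeps.Add);

Console.WriteLine($"{result} after {calls} calls and {sleeps.Count} waits");
```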
Edit:
Your requirement seems to be something like:
"When a MyEvent comes in, I need to make a web service call. If the web service is down, I need to keep trying X number of times at Y intervals, at which point I will consider it a failure and handle a failure condition. Until I either succeed or fail, I will block other messages from being handled."
You have some potentially complex logic on handling a message (retry, timeout, error condition, blocking additional messages, etc.). Keep in mind the role that NSB is intended to play in your system: communication between services via messaging. While NSB does have some advanced features that allow message orchestration (e.g. sagas), it's not really intended to be used to replace Domain or Application logic.
Bottom line, you may need to write custom code to handle your specific scenario. A naive solution would be a loop with a delay in your handler, but you may need to create a more robust in-memory collection/queue that holds messages while the service is down and processes them serially when it comes back up.
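Here is a minimal sketch of the in-memory buffer. `OnMessage` and `Drain` are hypothetical hooks, and the `isServiceUp` flag is assumed to be set by some health check. Messages keep their arrival order and are processed serially once the service is back.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: OnMessage, Drain and isServiceUp are hypothetical. Messages are held in
// arrival order while the service is down and processed serially once it is back.
var pending = new Queue<string>();
var processed = new List<string>();
bool isServiceUp = false;

void OnMessage(string message)
{
    pending.Enqueue(message);
    Drain();
}

void Drain()
{
    // While the service is down nothing is dequeued, so the oldest message stays at the head.
    while (isServiceUp && pending.Count > 0)
        processed.Add(pending.Dequeue()); // call the web service here
}

OnMessage("m1");
OnMessage("m2");     // service down: both are buffered
isServiceUp = true;
Drain();             // e.g. triggered by a health check noticing the service is back
OnMessage("m3");     // service up: processed immediately

Console.WriteLine(string.Join(", ", processed));
```

One caveat of this design: a message in this in-memory queue has already left the transport queue, so a process crash loses it. That trade-off is worth weighing against keeping the message in the transport.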
The easiest way to approximate the required behavior is the following:
Define a message handler which checks whether the service is available and if not calls HandleCurrentMessageLater and a message handler which does the actual message processing. Then you specify the message handler order so that the handler which checks the service availability gets executed first.
public interface ISomeCommand {}

public class ServiceAvailabilityChecker : IHandleMessages<ISomeCommand>
{
    public IBus Bus { get; set; }

    public void Handle(ISomeCommand message)
    {
        try
        {
            // check service
        }
        catch (SpecificException ex)
        {
            this.Bus.HandleCurrentMessageLater();
        }
    }
}

public class ActualHandler : IHandleMessages<ISomeCommand>
{
    public void Handle(ISomeCommand message)
    {
    }
}

public class SomeCommandHandlerOrdering : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(First<ServiceAvailabilityChecker>.Then<ActualHandler>());
    }
}
With that design you gain the following:
- You can check availability before the actual business code is invoked.
- If the service is not available, the message is put back into the queue.
- If the service was available during the check but becomes unavailable just before ActualHandler is invoked, you get First and Second Level Retries (and the availability check runs again in the pipeline).
The below text is an effort to expand and add color to this question:
How do I prevent a misbehaving client from taking down the entire service?
I have essentially this scenario: a WCF service is up and running with a client callback that uses straightforward, simple one-way communication, not very different from this one:
public interface IMyClientContract
{
    [OperationContract(IsOneWay = true)]
    void SomethingChanged(simpleObject myObj);
}
I'm calling this method potentially thousands of times a second from the service to what will eventually be about 50 concurrently connected clients, with as low latency as possible (<15 ms would be nice). This works fine until I set a breakpoint in one of the client apps connected to the server. Then, after maybe 2-5 seconds, the service hangs. None of the other clients receive any data for about 30 seconds, until the service registers a connection fault event and disconnects the offending client. After that, all the other clients continue on their merry way receiving messages.
I've researched service throttling, concurrency tweaking, setting the thread pool's minimum threads, WCF secret sauces and the whole nine yards. At the end of the day, though, the article "MSDN - WCF essentials, One-Way Calls, Callbacks and Events" describes exactly the issue I'm having without really making a recommendation.
The third solution that allows the service to safely call back to the client is to have the callback contract operations configured as one-way operations. Doing so enables the service to call back even when concurrency is set to single-threaded, because there will not be any reply message to contend for the lock.
But earlier in the article, it describes the issue I'm seeing, only from the client's perspective:
When one-way calls reach the service, they may not be dispatched all at once and may be queued up on the service side to be dispatched one at a time, all according to the service configured concurrency mode behavior and session mode. How many messages (whether one-way or request-reply) the service is willing to queue up is a product of the configured channel and the reliability mode. If the number of queued messages has exceeded the queue's capacity, then the client will block, even when issuing a one-way call
I can only assume the reverse is also true here. The number of messages queued to the client has exceeded the queue capacity, and the thread pool is now filled with blocked threads that are all trying to call this client.
What is the right way to handle this? Should I research a way to check how many messages are queued at the service communication layer per client and abort their connections after a certain limit is reached?
It almost seems that if the WCF service itself is blocking on a queue filling up then all the async / oneway / fire-and-forget strategies I could ever implement inside the service will still get blocked whenever one client's queue gets full.
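The per-client limit idea can be sketched with one bounded queue per client. The broadcaster uses a non-blocking `TryAdd`, so a full queue marks that client for disconnection instead of stalling everyone else. The client ids and `Capacity` are hypothetical, as is the assumption that a separate sender drains each queue into the callback channel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Sketch only: one bounded outbound queue per client; a hypothetical sender per client would
// drain its queue into the WCF callback channel. A full queue never blocks the broadcaster.
const int Capacity = 3;
var outbound = new Dictionary<string, BlockingCollection<string>>
{
    ["fast"] = new BlockingCollection<string>(Capacity),
    ["stuck"] = new BlockingCollection<string>(Capacity), // e.g. paused in a debugger
};
var toDisconnect = new HashSet<string>();

void Broadcast(string payload)
{
    foreach (var (clientId, queue) in outbound)
    {
        if (toDisconnect.Contains(clientId)) continue;
        // TryAdd without a timeout returns false immediately when the queue is at capacity.
        if (!queue.TryAdd(payload))
            toDisconnect.Add(clientId);
    }
}

for (int i = 0; i < 5; i++)
{
    Broadcast($"update {i}");
    outbound["fast"].TryTake(out _); // the healthy client keeps up with every message
}

Console.WriteLine($"disconnect: {string.Join(", ", toDisconnect)}");
```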
I don't know much about client callbacks, but this sounds similar to generic WCF code-blocking issues. I often solve these problems by spawning a BackgroundWorker and performing the client call on that thread. Meanwhile, the main thread tracks how long the child thread is taking. If the child has not finished within a few milliseconds, the main thread just moves on and abandons the thread (it eventually dies by itself, so there is no memory leak). This is basically what Mr. Graves suggests with the phrase "fire-and-forget".
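Here is a minimal sketch of the time-boxed call, using `Task.Run` in place of a `BackgroundWorker`. `TryCallWithin` is a hypothetical helper. An abandoned callback still occupies a thread-pool thread until it returns. This caps how long the caller waits, but it does not by itself bound the number of stuck threads.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: runs a client callback on the thread pool and waits at most `timeout` for it.
// If it is still running, the caller moves on and the task is left to finish on its own.
bool TryCallWithin(Action callback, TimeSpan timeout)
{
    Task task = Task.Run(callback);
    // Observe a late failure so it does not surface as an unobserved task exception.
    task.ContinueWith(t => Console.WriteLine($"callback failed: {t.Exception?.GetBaseException().Message}"),
                      TaskContinuationOptions.OnlyOnFaulted);
    // True if the callback finished (successfully or not) within the timeout.
    return Task.WhenAny(task, Task.Delay(timeout)).GetAwaiter().GetResult() == task;
}

var gate = new ManualResetEventSlim(false);
bool fastFinished = TryCallWithin(() => { }, TimeSpan.FromSeconds(5));
bool slowFinished = TryCallWithin(() => gate.Wait(), TimeSpan.FromMilliseconds(100));
gate.Set(); // release the abandoned callback so its thread is returned to the pool

Console.WriteLine($"fast: {fastFinished}, slow: {slowFinished}");
```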
Update:
I implemented a fire-and-forget setup to call the client's callback channel, and the server no longer blocks once the buffer to the client fills up.
MyEvent is an event whose delegate matches one of the methods defined in the WCF client contract. When a client connects, I essentially add its callback to the event:
MyEvent += OperationContext.Current.GetCallbackChannel<IFancyClientContract>().SomethingChanged;
and so on. Then, to send this data to all clients, I do the following:
// serialize using protobuf
using (var ms = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(ms, new SpecialDataTransferObject(inputData));
    // ToArray copies only the bytes written; GetBuffer would also return the unused tail of the internal buffer
    byte[] data = ms.ToArray();
    Parallel.ForEach(MyEvent.GetInvocationList(), p => ThreadUtil.FireAndForget(p, data));
}
In the ThreadUtil class, I made essentially the following change to the code from the fire-and-forget article:
static void InvokeWrappedDelegate(Delegate d, object[] args)
{
    try
    {
        d.DynamicInvoke(args);
    }
    catch (Exception ex)
    {
        // THIS will eventually throw once the client's WCF callback channel has filled up and timed out,
        // and it will throw once for every single time you ever tried sending them a payload,
        // so do some smarter logging here!!
        Console.WriteLine("Error calling client, attempting to disconnect.");
        try
        {
            // d.Target is an IContextChannel object, kept in a dictionary of active connections,
            // cross-referenced by hash code just for this exact occasion
            MyService.SingletonServiceController.TerminateClientChannelByHashcode(d.Target.GetHashCode());
        }
        catch (Exception ex2)
        {
            Console.WriteLine("Attempt to disconnect client failed: " + ex2.ToString());
        }
    }
}
I don't have any good ideas on how to kill all the pending packets the server is still waiting to deliver. In theory, once I get the first exception, I should be able to terminate all the other requests in some queue somewhere. Still, this setup is functional and meets the objectives.