Should a service be marked as UNHEALTHY if it's too busy? - load-balancing

A health check is like a simple ping-pong test: if the service doesn't respond, it's unhealthy. If something is unhealthy, there should be an alarm for it.
But should I mark the service as unhealthy if it's too busy but still responsive (maybe it's just overloaded for a short time)?
Most health check protocols don't have a BUSY state (only a SERVING and a NOT_SERVING state):
enum ServingStatus {
  UNKNOWN = 0;
  SERVING = 1;
  NOT_SERVING = 2;
}
There's no point in connecting to a busy service. But since it's still working properly, it still counts as a healthy service, right?
Or should the load balancer still be able to connect to it, with the service returning a TOO_BUSY status to refuse incoming requests (in which case, why not just mark it as unhealthy)?

Reasons for not marking it unhealthy:
If your service manager restarts unhealthy services, then marking a busy service unhealthy might do more damage.
If you need to wake your team up at 3am for an unhealthy service but not for a busy one, then reporting busy as unhealthy wastes your team's energy.
If your service is unhealthy only because your infrastructure lacks the proper adjective, then add the adjective.
And finally and most importantly:
The real question is, why are you trying to keep load away from your service? Are you afraid that requests will fail because the server is too busy to respond? Maybe it's time to scale. How would your autoscaler ever know that your service needs to scale if, whenever it's busy, the load magically relieves itself? Your autoscaler needs the service to be busy so that it can scale. If you aren't using an autoscaler, then why do you care whether it's "BUSY" or "UNHEALTHY"?
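To make the distinction concrete, here is a minimal sketch that keeps the health status and the load signal separate, so a busy service still reports SERVING while exposing its load as its own signal (for an autoscaler or a dashboard). It is not taken from any health-check library: the `ServiceState` class, its `busyThreshold` parameter, and the `IsBusy` and `InFlight` members are hypothetical names. Only the status values mirror the enum in the question.

```csharp
using System;
using System.Threading;

public enum ServingStatus { Unknown = 0, Serving = 1, NotServing = 2 }

// Hypothetical helper: tracks in-flight requests and reports health and load separately.
public sealed class ServiceState
{
    private readonly int _busyThreshold;   // assumed tuning knob, not a standard setting
    private int _inFlight;
    private volatile bool _failed;

    public ServiceState(int busyThreshold) => _busyThreshold = busyThreshold;

    // Call these at the start and end of every request.
    public void RequestStarted() => Interlocked.Increment(ref _inFlight);
    public void RequestFinished() => Interlocked.Decrement(ref _inFlight);

    // Call this only when the service genuinely cannot do its job.
    public void MarkFailed() => _failed = true;

    // Load signal: something an autoscaler or metrics endpoint could read.
    public int InFlight => Volatile.Read(ref _inFlight);
    public bool IsBusy => InFlight >= _busyThreshold;

    // Health signal: being busy alone never flips this to NotServing.
    public ServingStatus Health =>
        _failed ? ServingStatus.NotServing : ServingStatus.Serving;
}
```

With this split, the health endpoint keeps returning `Health`, and whatever drives scaling reads `InFlight` or `IsBusy` instead.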

Related

Will BackgroundService play nicely on a Kubernetes cluster

I have a Kubernetes cluster in which I intend to run a service in a pod. The service will accept a gRPC request, start a long-running process, and return to the caller indicating that the process has started. Investigation suggests that IHostedService (BackgroundService) is the way to go for this.
My question is, will BackgroundService behave nicely with the various neat features of ASP.NET and k8s:
Will horizontal scaling understand that a service is getting overloaded and spin up a new instance, even though the service will appear to have no pending gRPC requests because all the work is in the background? (I appreciate there are probably hooks that can be implemented; I'm wondering what the default behaviour is.)
Will the notion of awaiting, which allows the current work to be swapped out so that other work can run, work okay with background services? (I've only experienced it where one received message hits an await and so allows another message to be processed, but background services are not a messaging context.)
I think ASP.NET will normally manage throttling of too many requests, backing off if the server is too busy, but will any of that still work if the "busy" is background processes?
What's the best method to mitigate against overloading the service (if horizontal scaling is not an option)? I can have the gRPC call return "too busy" but would need to detect it (not quite sure if that's CPU bound, memory, or just the number of background services).
Should I be considering something other than BackgroundService for this task?
I'm hoping the answer is that "it all just works", but I feel it's better to have that confirmed than to just hope...
> Investigation suggests that IHostedService (BackgroundService) is the way to go for this.
I strongly recommend using a durable queue with a separate background service. It's not that difficult to split into two images, one running ASP.NET gRPC requests, and the other processing the durable queue (this can be a console app - see the Worker Service template in VS). Note that solutions using non-durable queues are not reliable (i.e., work may be lost whenever a pod restarts or is scaled down). This includes in-memory queues, which are commonly suggested as a "solution".
If you do make your own background service in a console app, I recommend applying a few tweaks (noted on my blog):
Wrap ExecuteAsync in Task.Run.
Always have a top-level try/catch in ExecuteAsync.
Call IHostApplicationLifetime.StopApplication when the background service stops for any reason.
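The three tweaks above can be sketched together roughly as follows. This is only an illustration under assumptions: `QueueWorker` and `ProcessNextMessageAsync` are hypothetical names (the latter is a placeholder for reading from your durable queue), and the code assumes the Microsoft.Extensions.Hosting package that the Worker Service template references.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical worker illustrating the three tweaks; not a complete queue processor.
public sealed class QueueWorker : BackgroundService
{
    private readonly IHostApplicationLifetime _lifetime;

    public QueueWorker(IHostApplicationLifetime lifetime) => _lifetime = lifetime;

    // Tweak 1: wrap the work in Task.Run so a synchronous start of the loop
    // cannot hold up host startup.
    protected override Task ExecuteAsync(CancellationToken stoppingToken) =>
        Task.Run(() => RunAsync(stoppingToken));

    private async Task RunAsync(CancellationToken stoppingToken)
    {
        // Tweak 2: top-level try/catch so a failure is never silently swallowed.
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                await ProcessNextMessageAsync(stoppingToken);
            }
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Normal shutdown requested by the host; nothing to report.
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Worker failed: {ex}");
        }
        finally
        {
            // Tweak 3: stop the whole application when the worker stops for any reason,
            // so the orchestrator can restart the pod instead of leaving an idle process.
            _lifetime.StopApplication();
        }
    }

    // Placeholder (assumption): replace with a real read from the durable queue.
    private static Task ProcessNextMessageAsync(CancellationToken ct) =>
        Task.Delay(TimeSpan.FromSeconds(1), ct);
}
```

In a Worker Service project this would typically be registered with `services.AddHostedService<QueueWorker>()`.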
> Will horizontal scaling understand that a service is getting overloaded and spin up a new instance even though the service will appear to have no pending grpc requests because all the work is background (I appreciate there's probably hooks that can be implemented, I'm wondering what's default behaviour)
One reason I prefer using two different images is that they can scale on different triggers: gRPC requests for the API and queued messages for the worker. Depending on your queue, using "queued messages" as the trigger may require a custom metric provider. I do prefer using "queued messages" because it's a natural scaling mechanism for the worker image; out-of-the-box solutions like CPU usage don't always work well - in particular for asynchronous processors, which you mention you are using.
> Will the notion of awaiting allowing the current process to be swapped out and another run work okay with background services (I've only experienced it where one message received hits an await so allows another message to be processed, but background services are not a messaging context)
Background services can be asynchronous without any problems. In fact, it's not uncommon to grab messages in batches and process them all concurrently.
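As a rough illustration of the batch pattern just mentioned, the sketch below starts a handler for every message in a batch and then awaits them all together. `BatchProcessor` and `ProcessBatchAsync` are hypothetical names, and how the batch is fetched from the queue is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchProcessor
{
    // Starts the handler for every message in the batch, then awaits them together.
    // Results come back in the same order as the input messages.
    public static Task<TResult[]> ProcessBatchAsync<TMessage, TResult>(
        IEnumerable<TMessage> batch,
        Func<TMessage, Task<TResult>> handler)
    {
        // ToList() forces every handler to start now instead of lazily, one at a time.
        List<Task<TResult>> running = batch.Select(handler).ToList();
        return Task.WhenAll(running);
    }
}
```

If any handler throws, the returned task faults, so a real worker would decide per message whether to retry or dead-letter before acknowledging the batch.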
> I think asp.net will normally manage throttling too many requests, backing off if the server is too busy, but will any of that still work if the 'busy' is background processes
No. ASP.NET only throttles requests. Background services do register with ASP.NET, but that is only to provide a best-effort at graceful shutdown. ASP.NET has no idea how busy the background services are, in terms of pending queue items, CPU usage, or outgoing requests.
> What's the best method to mitigate against overloading the service (if horizontal scaling is not an option) - I can have the grpc call return 'too busy' but would need to detect it (not quite sure if that's cpu bound, memory or just number of background services)
Not a problem if you use the durable queue + independent worker image solution. gRPC calls can pretty much always stick another message in the queue (very simple and fast), and K8s can autoscale based on your (possibly custom) metric of "outstanding queue messages".
Generally, "it all works".
For automatic horizontal scaling you need an autoscaler; see the Kubernetes documentation on the Horizontal Pod Autoscaler.
But you can also scale it yourself (kubectl scale deployment yourDeployment --replicas=10).
Let's assume you have a deployment of your backend that starts with one pod. Your autoscaler will watch your pod (e.g. CPU usage) and start a new pod for you when the load is high.
Once a second pod has started, new requests are spread across the pods (round-robin).
There is no need for your backend to throttle calls. It should just handle as many calls as possible.

WCF service hosted on Azure App service never seems to finish threads opened for processing

I have deployed a WCF service to Azure App Service that performs just one task: sending a message to a topic. Although the app works fine under normal load, its thread count climbs as soon as load on the app increases.
The app instance becomes unhealthy when the thread count limit is reached.
Those threads stay in the waiting state forever. We tried the scale-out option based on the thread count metric, but the app just keeps adding instances, because the earlier instances still have almost all of their threads waiting and remain unhealthy forever.
Each request is handled in the following sequence:
Accept a request.
Initialize a Service Bus topic client.
Send the requested message to the topic.
Close the topic client.
When I send a burst of 1000 requests, the app works, but the threads that were started stay in the waiting state. While these threads are waiting, CPU stays at 0%. The average response time from this service is also under 100 ms.
After sending 1000 requests to this service, I see a similar number of threads open.
What could be the potential root cause of this issue? Is there any issue with my code to send the message to the topic?
public async Task SendAsync(Message message)
{
    try
    {
        await _topicClient.SendAsync(message);
    }
    catch(Exception exc)
    {
        throw new Exception(exc.Message);
    }
    finally
    {
        await _topicClient.CloseAsync();
    }
}
[Image from the original post, not available]
The code sample you provided does not really tell us much. We do not know how SendAsync(Message message) is being invoked. Does your image show your queue count dropping to 0 before accepting more messages? I'm assuming a client calls your WCF app service, which tells it to send the message to Service Bus?
It does sound like you are hitting the maximum of 1000 connections. Your _topicClient should be a singleton for your app domain that all clients use. You also should only need one app service instance if all you're doing is message forwarding. No need for scaling unless there's more processing that you haven't alluded to.
Have a look at the Service Bus messaging best practices doc for more suggestions.
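The singleton suggestion could look roughly like the sketch below: one lazily created client shared by every request, instead of a client created and closed per call. `SharedClient<TClient>` is a hypothetical wrapper, and the factory you pass in (for example, one that builds your topic client from a connection string) is an assumption that depends on the SDK version you use.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: creates the underlying client once and hands out the same instance.
public sealed class SharedClient<TClient> where TClient : class
{
    private readonly Lazy<TClient> _client;

    public SharedClient(Func<TClient> createClient) =>
        // ExecutionAndPublication guarantees the factory runs at most once,
        // even if many requests ask for the client at the same time.
        _client = new Lazy<TClient>(createClient, LazyThreadSafetyMode.ExecutionAndPublication);

    public TClient Instance => _client.Value;
}
```

A single `SharedClient` would be held for the lifetime of the app domain (a static field or a singleton DI registration), and the per-request `CloseAsync()` in the `finally` block would move to application shutdown.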
Thanks for responding. These are good suggestions, and I will review my implementation in line with them.
The good news is that I was able to resolve the issue. It wasn't related to the topic client as I thought earlier; it was due to how I was registering dependency injection.
I am implementing a WCF service on .NET Framework 4.8. Initially we did not include Global.asax but registered DI in the service controller constructor. That implementation worked until we noticed (as part of performance testing) that it seemed to add additional threads once we added the ILogger dependency. Those additional threads never wound down but kept adding up as the service received more requests.
To resolve this, I moved DI registration into Application_Start in Global.asax.

Strategy for busy WCF service

I've got a really busy self-hosted WCF server that requires 2000+ clients to update their status on a frequent basis. What I'm finding is that the CPU utilization of the server is sitting at around 70% constantly, and the clients have a 50% chance of actually getting a connection to the server. They will time out after 60 seconds. This is problematic because if the server doesn't hear back from a client, it'll assume the client is offline.
I've implemented throttling so I can adjust concurrent connections/sessions/etc., but if I'm not mistaken, increasing this will only lead to higher CPU utilization and worse connectivity problems. Right?
Will increasing the timeout to something more than 60 seconds help? I'm not exactly sure how it works, but will a client sit in a type of queue until the server can field the request? Or is it best to set the timeout to something smaller and make the client check in more often if it can't get connected (this seems like it could only make the problem worse in a sense)?
If it's really important for the server to know if the client is still connected, I don't think relying solely on WCF is your best bet for that.
Maybe your server should have some sort of ping mechanism that either allows it to ping client machines based on some sort of timer or vice versa.
If you're super concerned about the messages always getting through, no matter what, then I suggest exploring Reliable services. Check out the enableReliableSession behavior attribute. I suggest reading through at least the first chapter in Juval Lowy's Programming WCF Services which is available for free as the Kindle sample of the book.
Increasing the timeout may help, but probably not much, and the Amazing Ever-Increasing Timeout is kind of a motif on http://www.thedailywtf.com . Making the client hammer the server if it can't get through the first time is guaranteed to cause pain.
If all that you care about is knowing whether the client is there, might it be practical to go down a layer or two, and have the client send you an HTTP POST once in a while? WCF requires some active back-and-forth, but a POST can just lay there until your server has time to deal with it, and the client can just send it and forget about it.
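Whichever transport carries the check-in, the server-side bookkeeping can stay very cheap. Here is a minimal sketch, assuming each client sends an identifier with its check-in; `ClientPresence`, `RecordCheckIn`, and `OfflineClients` are hypothetical names. A client is only treated as offline after it has been silent for longer than a configurable window, so one missed check-in does not flip its status.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tracker: remembers when each client last checked in.
public sealed class ClientPresence
{
    private readonly ConcurrentDictionary<string, DateTime> _lastSeenUtc =
        new ConcurrentDictionary<string, DateTime>();
    private readonly TimeSpan _offlineAfter;

    public ClientPresence(TimeSpan offlineAfter) => _offlineAfter = offlineAfter;

    // Called from the check-in endpoint; a single dictionary write per request.
    public void RecordCheckIn(string clientId, DateTime nowUtc) =>
        _lastSeenUtc[clientId] = nowUtc;

    // Clients that have been silent for longer than the configured window.
    public IReadOnlyList<string> OfflineClients(DateTime nowUtc) =>
        _lastSeenUtc
            .Where(entry => nowUtc - entry.Value > _offlineAfter)
            .Select(entry => entry.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
}
```

Passing the current time in as a parameter keeps the class easy to test; production code would pass `DateTime.UtcNow`.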

WCF polling, background processing, and resource starvation

I have a web service, implemented with WCF and hosted in IIS7, with a submit-poll communication pattern. An initial request is made, which returns quickly and kicks off a background process. The client polls for the status of the background process. This interface is set and can't be changed (it's a simulation of an external service we depend on).
I implemented the background processing by adding another service contract to the existing service with a one-way message contract that starts the long-running process. The "background" service keeps a database updated with the status in order to communicate with the main service. This avoids creating any new web services or items to deploy.
The problem is that the background process is very CPU intensive, and it seems to be starving the other service calls out. It will take up an entire processor, and while a single instance of the background process is running, status polling calls to the main service can take over a minute. I don't care how long the background process takes.
Is there any way to throttle the resource usage of the background method? Or an obvious way to do long running async processes in WCF without changing my submit/poll service contract? Would separating them into different web services help if the two services were still running on the same server?
The first thing I would try would be to lower the priority.
If you're actually spinning off a separate process for the background work, then you can do it like this:
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
If it's really just a background thread, use this instead (from within the thread):
Thread.CurrentThread.Priority = ThreadPriority.BelowNormal;
(Actually, it's better to start the thread suspended and change the priority at the caller before running it, but it's generally OK to lower your own priority.)
At the very least it should help determine whether or not it's really a CPU issue. If you still have problems after lowering the priority then it might be something else that's getting starved, like file or network I/O.
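The parenthetical about setting the priority before the thread runs can be done in .NET by configuring the `Thread` object before calling `Start()`. The following is only a sketch; `BackgroundWork` and `StartLowPriority` are hypothetical names, and whether a lower priority actually relieves the polling calls depends on what is really being starved.

```csharp
using System;
using System.Threading;

public static class BackgroundWork
{
    // Creates the thread, lowers its priority while it is still unstarted,
    // and only then lets it run.
    public static Thread StartLowPriority(Action work)
    {
        var thread = new Thread(() => work())
        {
            Priority = ThreadPriority.BelowNormal
        };
        thread.Start();
        return thread;
    }
}
```

The one-way operation that kicks off the long-running job would call `BackgroundWork.StartLowPriority(...)` with the CPU-intensive work and return immediately.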

WCF - How to detect if server is alive?

I am developing a client/server application with net tcp binding and I need to be notified if my connection to server goes down.
From the server side, if a client disconnects, I can detect it instantly with the CommunicationObject.Faulted event (with reliable session off). However, from the client side, it seems I have no way to know if the server goes down. The same event doesn't fire. By the way, I am setting receiveTimeout to infinite. Some people suggested a heartbeat or ping function to check if the server is alive. But I think at the WCF level such methodologies have big impacts. After all, it's not a simple packet you send, it's a whole WCF request. What should I do?
There seems to be a common misconception that, in order to find out on the client side whether a WCF session is still alive, one has to implement some kind of custom ping or heartbeat operation on the service. However, the WCF framework, when configured correctly, already does this for you in the background.
The trick is to set the ReliableSession.InactivityTimeout to a period that is short enough. For instance, if you set it to 30 seconds, then the ICommunicationObject.Faulted event will be raised on the client proxy between 30 (minimum) and approximately 45 (maximum) seconds after a service breakdown. The exact delay depends on the rhythm of the WCF-internal session keep-alive control timer and the specific time of the breakdown.
Of course, this can only work for reliable-session capable bindings, combined with the right session properties (ServiceContractAttribute.SessionMode, ServiceBehaviorAttribute.InstanceContextMode, OperationContractAttribute.IsInitiating, and OperationContractAttribute.IsTerminating).
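In code, the binding setup described above might look like the following configuration sketch. It assumes a `NetTcpBinding` configured programmatically rather than in the config file; `ReliableTcp`, `CreateBinding`, and `WatchForFaults` are hypothetical helper names, and the 30-second value is just the figure used in the example above. The service endpoint needs a matching binding with the reliable session enabled.

```csharp
using System;
using System.ServiceModel;

public static class ReliableTcp
{
    // Builds a net.tcp binding with a reliable session and a short inactivity timeout.
    public static NetTcpBinding CreateBinding()
    {
        var binding = new NetTcpBinding();
        binding.ReliableSession.Enabled = true;
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromSeconds(30);
        return binding;
    }

    // Subscribes to Faulted on a client proxy (proxies implement ICommunicationObject).
    public static void WatchForFaults(ICommunicationObject proxy, Action onFaulted)
    {
        proxy.Faulted += (sender, args) => onFaulted();
    }
}
```

The `onFaulted` callback is where the client would discard the faulted proxy and try to reconnect.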