Webjob Errors since move to Azure Sql v12 - azure-sql-database

Each morning at 4am a Scheduled Task creates around 500 messages on a message queue. Each message is the id of an email to send. Each message is picked up, a url is built from it, and that url is requested via await HttpClient.GetAsync(url). The url's target then creates and sends the email. This has worked well for months.
I've just upgraded to SQL Azure v12 and all is now not well.
The very first messages to be processed (after 2 minutes running time) threw a
"System.Threading.Tasks.TaskCanceledException"
I'm also seeing
"System.Data.Entity.Core.EntityException: The underlying provider
failed on Open. ---> System.InvalidOperationException: Timeout
expired. The timeout period elapsed prior to obtaining a connection
from the pool. This may have occurred because all pooled connections
were in use and max pool size was reached."
and a couple of
"The timeout period elapsed prior to completion of the operation or
the server is not responding. This failure occurred while attempting
to connect to the routing destination. The duration spent while
attempting to connect to the original server was - [Pre-Login]
initialization=6; handshake=426; [Login] initialization=0;
authentication=0;"
The webjob that sends the request to the api is awaiting a response. I'm wondering if, because it's async, the thread is freed while awaiting the response to go off and process another queue item, and therefore creates another api request. Essentially this hits the api again and again until there are so many requests being processed by the api that all the threads are in use. Would I be better off NOT making the webjob async, because then the 'trapped' thread would only send a request after the first request completes? Is that right?
edit: actually the IIS logs suggest that the api requests are not all happening at once. So my question is "what should I look at next? Are these common SQL v12 errors, or is the recent upgrade a red herring?"
just to clarify, the webjob that fires in response to the queue message simply does:
using (HttpClient client = new HttpClient())
{
    response = await client.GetAsync(url);
}
and hits the web api of an Always On standard tier azure website. Database DTU % is about 25% while this happens.
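For context, the complete queue-triggered function is shaped roughly like the sketch below (a simplified sketch only: the queue name, url, and class name are placeholders, not the real values, assuming the WebJobs SDK QueueTrigger binding):
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class Functions
{
    // Fires once per queue message; the message body is the id of the email to send.
    public static async Task ProcessQueueMessage(
        [QueueTrigger("email-queue")] string emailId,   // "email-queue" is a placeholder name
        TextWriter log)
    {
        // Hypothetical endpoint; the real api url is not shown here.
        string url = "https://example-api.azurewebsites.net/api/sendemail/" + emailId;
        using (HttpClient client = new HttpClient())
        {
            HttpResponseMessage response = await client.GetAsync(url);
            await log.WriteLineAsync("Email " + emailId + ": " + (int)response.StatusCode);
        }
    }
}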
edit:
"Gateway no longer provides retry logic in V12 Before version V12,
Azure SQL Database had a gateway that acted as a proxy to buffer all
interactions between the database and your client program. The gateway
provided automated retry logic for some transient errors.
V12 eliminated the gateway. Now your program must more fully handle
transient errors."
Adding a DbConfiguration class for SqlAzureExecutionStrategy. Will see how it runs tonight.

Adding the EF retry config class fixed the transient errors. The cancelled tasks are a different issue (new question)
// https://msdn.microsoft.com/en-us/data/jj680699
using System.Data.Entity;
using System.Data.Entity.SqlServer;

public class SqlAzureConfiguration : DbConfiguration
{
    public SqlAzureConfiguration()
    {
        // Retry transient SQL Azure failures rather than surfacing them immediately.
        this.SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy());
    }
}
and in web.config (because I have multiple contexts)
<entityFramework codeConfigurationType="Abc.DataService.SqlAzureConfiguration, Abc.DataService">
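For reference, the whole element then looks roughly like this (the provider entries are the stock EF6 defaults; the only change is the codeConfigurationType attribute shown above):
<entityFramework codeConfigurationType="Abc.DataService.SqlAzureConfiguration, Abc.DataService">
  <defaultConnectionFactory type="System.Data.Entity.Infrastructure.SqlConnectionFactory, EntityFramework" />
  <providers>
    <provider invariantName="System.Data.SqlClient" type="System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer" />
  </providers>
</entityFramework>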

Related

WCF service hosted on Azure App service never seems to finish threads opened for processing

I have deployed a WCF service to Azure App Service that performs just one task: send a message to a Service Bus topic. Although the app works fine under normal load, the thread count starts climbing as soon as load on the app increases.
The app instance becomes unhealthy when the thread count limit is reached.
Those threads stay in a waiting state forever. We tried the scale-out option on the thread count metric, but the app just keeps adding more instances, because the earlier instances still have almost all of their threads waiting and remain unhealthy forever.
This is performed in the following sequence:
Accept a request.
Initialize a Service Bus topic client.
Send the requested message to the topic.
Close the topic client.
While sending a burst of 1000 requests, the app works, but the threads it starts stay in a waiting state. While these threads are waiting, CPU stays at 0%, and the average response time from this service is under 100 ms.
After sending 1000 requests to this service, I see a similar number of threads open.
What could be the potential root cause of this issue? Is there any issue with my code to send the message to the topic?
public async Task SendAsync(Message message)
{
    try
    {
        await _topicClient.SendAsync(message);
    }
    catch (Exception exc)
    {
        throw new Exception(exc.Message);
    }
    finally
    {
        await _topicClient.CloseAsync();
    }
}
The code sample you provided does not really tell us much. We do not know how SendAsync(Message message) is being invoked. Is your image your queue count dropping to 0 before accepting more messages? I'm assuming a client calls your WCF app service, which then sends the message to Service Bus?
It does sound like you are hitting the 1000 maximum connections. Your _topicClient should be a singleton for your app domain that all clients use. You should also only need one app service instance if all you're doing is message forwarding. No need for scaling unless there's more processing that you haven't alluded to.
Have a look at the Service Bus messaging best practices doc for more suggestions.
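As a sketch of the singleton approach (assuming the Microsoft.Azure.ServiceBus client that your Message/SendAsync usage implies; the connection string and topic name are placeholders):
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class ServiceBusClients
{
    // One TopicClient per app domain; it is thread-safe and keeps its connection open.
    public static readonly TopicClient Topic =
        new TopicClient("<service-bus-connection-string>", "<topic-name>");
}

public class TopicSender
{
    public Task SendAsync(Message message)
    {
        // No per-call CloseAsync: the shared client stays open for the app's lifetime.
        return ServiceBusClients.Topic.SendAsync(message);
    }
}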
Thanks for responding. These are good suggestions and I will review my implementation in line with them.
The good news is that I was able to resolve the issue, it wasn't related to the topic client as I thought earlier. It was due to how I was registering dependency injection.
I am implementing a WCF service based on .NET Framework 4.8. Initially we did not include a Global.asax but registered DI in the service constructor. The implementation worked until performance testing showed that it added extra threads once we added an ILogger dependency. Those additional threads never wound down; they kept adding up as the service received more requests.
To resolve it, I moved the DI registration into Application_Start in Global.asax.
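Roughly, the fix has this shape (the bootstrapper call below is a placeholder; I'm not showing the actual container we use):
using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Register all dependencies once per app domain instead of inside the
        // WCF service constructor, so a new object graph (and its threads)
        // is not built for every incoming request.
        DependencyBootstrapper.RegisterAll();   // hypothetical helper wrapping the real container setup
    }
}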

ADO.NET pooled connections cannot be reused

I'm working on an ASP.NET MVC application which uses EF 6.x to work with my Azure SQL Database. Recently, under increased load, the app started getting into a state where it's unable to communicate with the SQL server any more. I can see that there are 100 active connections to my database using exec sp_who, and any new connection fails to open with the following error:
System.Data.Entity.Core.EntityException: The underlying provider
failed on Open. ---> System.InvalidOperationException: Timeout
expired. The timeout period elapsed prior to obtaining a connection
from the pool. This may have occurred because all pooled connections
were in use and max pool size was reached.
Most of the time the app works with an average active connection count of 10 to 20, and load doesn't change this number... even when load is high it stays at 10-20. But in certain situations it can jump to 100 in less than a minute, with no ramp-up time, and this puts the app into a state where all my requests fail. All those 100 connections are in a sleeping state, awaiting command.
The good part is I found a workaround which helps mitigate the issue: clearing the connection pool from the client side. I'm using SqlConnection.ClearAllPools(); it instantly closes all the connections, and sp_who shows my regular 10-20 connections after that.
The bad part is I still don't know the root cause.
Just to clarify, the app load is about 200-300 concurrent users, which generate 1000 requests per minute.
With the great suggestion from @DavidBrowne to track leaked connections with a simple pattern, I was able to find connections being leaked while configuring the Owin engine:
private void ConfigureOAuthTokenGeneration(IAppBuilder app)
{
    // here in the create method I'm also creating a connection leak tracker
    app.CreatePerOwinContext(() => MyCoolDb.Create());
    ...
}
Basically, with every request Owin creates a connection and doesn't let it go, and when the Web API load increases I have trouble.
Could this be the real cause? And is Owin smart enough to lazily create a connection only when needed (using the function provided) and let it go once it has been used?
It's very unlikely that this is caused by anything other than your application code leaking connections.
Here's a helper library you can use to track when a connection is leaked, and report the call site that initially opened the connection.
http://ssbwcf.codeplex.com/SourceControl/latest#SsbTransportChannel/SqlConnectionLifetimeTracker.cs
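The gist of such a tracker, as a minimal sketch (this is not the linked library, just the idea: record the call site that opened each connection so anything still open can be reported):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class ConnectionLeakTracker
{
    private static readonly ConcurrentDictionary<SqlConnection, string> OpenedAt =
        new ConcurrentDictionary<SqlConnection, string>();

    // Open connections through this helper so the opening call site is recorded.
    public static SqlConnection OpenTracked(string connectionString)
    {
        var connection = new SqlConnection(connectionString);
        connection.Open();
        OpenedAt[connection] = Environment.StackTrace;                          // who opened it
        connection.Disposed += (s, e) => OpenedAt.TryRemove(connection, out _); // forget it once disposed
        return connection;
    }

    // Connections that were opened but never disposed, with their opening stack traces.
    public static IEnumerable<string> LeakedCallSites() => OpenedAt.Values.ToList();
}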

WCF service writes log only if client receives results

I'm working on a WCF service to help our new code interoperate with a legacy system. The process goes like this:
Client calls the service with a request for the legacy system.
Service writes the request into a database.
Legacy system services request from the DB in its own time and writes results back into the DB (updating a status flag to say results are ready).
Client retrieves results by calling a second service method, which polls the DB until the ready flag is set.
Just before returning the results, the service updates the status flag to "client has results", so that the related DB rows can be deleted.
My concern is the race condition at the last step. I can see this happening:
Service updates status to client has results.
Client times out after waiting for the service to poll the DB.
Service tries to return results. Hilarity ensues.
One way to solve this would be to have three service calls instead of two: the second call retrieves results, and the last one is an explicit acknowledgement by the client that it has them. I'd like to know whether there is a way which doesn't impose this extra "protocol" burden on the client though.
I've looked briefly into using transactions in WCF, and it sounds like they might be able to do what I need. The client (optionally) starts a transaction, flows it to the service, which uses it if it's there, and commits it when done. This seems as if it implicitly does the "third call".
Does this idea have any merit? Any disadvantages that you can see? Are there any other avenues I could explore?
Using transaction flow is possible, but flowing a transaction in a polling scenario (in each poll call) is terrible architecture. What you generally need is to flow the transaction only for the real read operation, where the service modifies the record and returns the data back to the client. The client then commits the transaction, which also commits the changes performed by the service.
Using transactional processing places some additional requirements on your service and clients.
Another approach can be transactional MSMQ:
Client calls the service with a request for the legacy system = client sends a message to the service's queue
Service writes the request into a database = service processes the message from its queue
Legacy system services request from the DB in its own time and writes results back into the DB (updating a status flag to say results are ready).
Service polls the database and places messages into the correct client queues. Placing the message and modifying the database records run in a single transaction.
Client processes incoming message
A transactional queue allows transactional reading (the message is removed from the queue only if the transaction is committed) and transactional writing (the message is added to the queue only if the transaction is committed). That allows the records to be deleted before the client reads the message, because the message will remain in the queue until the client successfully reads it (or until it times out, and even then it can be moved to an error queue).
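A rough sketch of that transactional read with System.Messaging (the queue path and the database step are placeholders): the message is only removed from the queue if the surrounding transaction, including the database work, commits.
using System.Messaging;
using System.Transactions;

public class ResultQueueReader
{
    // Receives one result message and does the related database work atomically.
    public void ReadOneResult()
    {
        using (var scope = new TransactionScope())
        using (var queue = new MessageQueue(@".\private$\client-results"))   // placeholder queue path
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            Message message = queue.Receive(MessageQueueTransactionType.Automatic);

            // ... update or delete the related database rows here, inside the same transaction ...

            scope.Complete();   // commits both the database work and the removal of the message
        }
    }
}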
In both cases you should think about the clients who will consume the service. Transaction flow can be interoperable, but not every web service stack supports it. MSMQ is not interoperable.
Why not reduce the likelihood of the client timing out by doing this instead:
Client calls service with a request for the legacy system.
Service writes the request into a database.
Legacy system services request from the DB in its own time and writes results back into the DB (updating a status flag to say results are ready).
Client calls a service to find out whether the results are ready. NB. no polling: just returns with an immediate yes or no.
If the results are NOT ready, client waits a bit and then goes back to step 4.
If the results ARE ready, call the service to retrieve the results. The service can update the status to "Client has results" at that point.
By doing this, the client won't be waiting a prolonged period for the service call in step 4 to return, and the chances of a timeout should be minimal.
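On the client side that is just a short loop; a sketch (the contract and method names are invented, since the real WCF contracts aren't shown):
using System;
using System.Threading.Tasks;

// Hypothetical contract shape, for illustration only.
public interface ILegacyService
{
    Task<bool> AreResultsReadyAsync(Guid requestId);
    Task<string> RetrieveResultsAsync(Guid requestId);
}

public class LegacyResultsClient
{
    // Steps 4-6 above: cheap readiness checks, then one retrieval call.
    public async Task<string> GetResultsAsync(ILegacyService service, Guid requestId)
    {
        while (!await service.AreResultsReadyAsync(requestId))   // immediate yes/no, no long-running call
        {
            await Task.Delay(TimeSpan.FromSeconds(5));           // wait a bit, then ask again
        }

        // The service flips the status to "Client has results" as part of this call.
        return await service.RetrieveResultsAsync(requestId);
    }
}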
However, you're never going to be 100% certain that the client has received the results unless the client makes a final service call to say so. (What if, for example, the client dies after making the very last request?)

Handling NServiceBus timeouts correctly

NServiceBus provides a timeout mechanism. From nservicebus.com:
The RequestTimeout method on the base class tells NServiceBus to send a message to another endpoint which will durably keep time for us ... There's a process that comes with NServiceBus called the Timeout Manager which provides a basic implementation of this functionality.
When time is up, the Timeout Manager sends a message back to the saga causing its Timeout method to be called with the same state object originally passed.
As I see it there is a possibility that the timeout is triggered even though the message has been delivered to the receiver (the reply got stuck somewhere for example).
How do I design my application so that it behaves correctly regardless of whether the message made it to the receiver or not?
If the Client sends a message to the Server and then requests a Timeout, the state of the request will be stored. If the Timeout message is received by the Client prior to the reply from the Server, you can compare the state returned by the Timeout to the current state, see that the Server has not replied, and decide what to do. If the request is no longer valid, you might ignore the reply. In that case, you may want to look at the "TimeToBeReceived" attribute for the Server message; it will throw away messages that can't be received in the designated time.
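A sketch of what that can look like in a saga (the message types and saga data here are invented, and the exact base-class members vary between NServiceBus versions; this follows the older API described in the quote above):
using System;
using NServiceBus;
using NServiceBus.Saga;

// Hypothetical messages and saga data, for illustration only.
public class StartRequest : IMessage { public Guid RequestId { get; set; } }
public class ServerRequest : IMessage { public Guid RequestId { get; set; } }
public class ServerReply : IMessage { public Guid RequestId { get; set; } }

public class RequestSagaData : ISagaEntity
{
    public Guid Id { get; set; }
    public string Originator { get; set; }
    public string OriginalMessageId { get; set; }
    public Guid RequestId { get; set; }
    public bool ReplyReceived { get; set; }
}

public class RequestSaga : Saga<RequestSagaData>,
                           IAmStartedByMessages<StartRequest>,
                           IHandleMessages<ServerReply>
{
    public void Handle(StartRequest message)
    {
        Data.RequestId = message.RequestId;
        Data.ReplyReceived = false;
        Bus.Send(new ServerRequest { RequestId = message.RequestId });   // ask the Server
        RequestTimeout(TimeSpan.FromMinutes(5), message.RequestId);      // durable timer via the Timeout Manager
    }

    public void Handle(ServerReply message)
    {
        Data.ReplyReceived = true;   // the Server answered in time
        MarkAsComplete();
    }

    // Called by the Timeout Manager with the state passed to RequestTimeout.
    public override void Timeout(object state)
    {
        if (!Data.ReplyReceived)
        {
            // No reply yet: retry, escalate, or mark the request as no longer valid
            // (in which case a late reply can simply be ignored).
        }
    }
}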

Is it possible to have asynchronous processing

I have a requirement where I need to send continuous updates to my clients. The client is a browser in this case. We have some data which updates every second, so once a client connects to our server, we maintain a persistent connection and keep pushing data to the client.
I am looking for suggestions for this implementation at the server end. Basically what I need is this:
1. A client connects to the server. I maintain the socket and metadata about the socket; the metadata describes which updates need to be sent to this client.
2. The server process now waits for new client connections.
3. Another process has the list of all open sockets, goes through each of them, and sends the updates if required.
Can we do something like this in an Apache module:
1. An Apache process gets the new connection. It maintains the state for the connection, keeps that state in some global memory, and returns to the root process to signify that it is done, so that the root process can accept a new connection.
2. Although the Apache process has returned its status to the root process, it keeps executing in parallel, going through its global store and sending updates to the clients, if any.
So can an Apache process do these things:
1. Have more than one connection associated with it?
2. Asynchronously wait for new connections and at the same time process the previous connections?
This is a complicated and inefficient model of updating. Your server will try to update clients that have closed down, and the server has to maintain all that client data and metadata (last update time, etc.).
Usually, for continuous updates, Ajax is used in a polling model. The client has a JavaScript timer that, when it fires, hits a service that provides updated data. The client keeps getting updates at regular intervals without you having to write an Apache module.
Would this model work for your scenario?
More reasons to opt for polling instead of push: Periodic_Refresh
With a little patch to resume a SUSPENDED mpm_event connection, I've got an asynchronous Apache module working. With this you can do the improved polling:
JavaScript connects to Apache and asks for an update;
if there are no updates, then instead of answering immediately the module uses SUSPENDED;
some time later, when an update or a timeout happens, a callback fires;
the callback gives an update (or a "no updates" message) to the client and resumes the connection;
the client goes back to step 1, repeating the poll, which with Keep-Alive will reuse the same connection.
That way the number of round trips between the client and the server is decreased and the client receives the update immediately. (This is known as Comet's Reverse Ajax, AFAIK.)