RestSharp: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server

I am using RestSharp as the underlying HTTP client library to build a stress/throughput test client for a black-box service. The ThreadPool and ServicePoint connection limits have been raised to 5000, but that shouldn't be much of a worry as we are testing around 500-1000 requests per second. A high-resolution (microsecond) timer component is used to send out requests at the rate we want to test.
The RestSharp code roughly goes:
    restClient.ExecuteAsync(postRequest, res =>
    {
        stopwatch.Stop();
        lock (this.countLocker)
        {
            this.roundTrips.Add(stopwatch.ElapsedMilliseconds);
            if (res.ResponseStatus == ResponseStatus.Completed &&
                (res.StatusCode == HttpStatusCode.OK ||
                 res.StatusCode == HttpStatusCode.NoContent))
            {
                this.responseCount++;
            }
            else
            {
                // Treat all other status codes as errors.
                this.reportError(res);
            }
        }
    });
When pumping too many requests, we observe that the service starts returning some 503 responses after a while. RestSharp treats those as Completed responses, since a 503 is a valid response from the server; no actual exception is thrown.
What is not clear is what happens when RestSharp encounters an exception caused by an underlying connection error, such as
    The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
       at RestSharp.Http.GetRawResponseAsync(IAsyncResult result, Action`1 callback)
       at RestSharp.Http.ResponseCallback(IAsyncResult result, Action`1 callback)
or
    The underlying connection was closed: An unexpected error occurred on a receive.
       at RestSharp.Http.GetRawResponseAsync(IAsyncResult result, Action`1 callback)
       at RestSharp.Http.ResponseCallback(IAsyncResult result, Action`1 callback)
This seems to suggest that RestSharp uses HTTP keep-alive for its connections. Is there a way to control this behaviour? I cannot seem to locate any setting that instructs RestSharp not to use keep-alive.
Beyond that, I am also trying to understand how to investigate the actual issue of the server breaking those connections. Is it simply a matter of the client accumulating more connections than the server can deal with, because the server cannot keep up with the request rate?

After additional investigation and tinkering with the test client code, I think I've come to some understanding what is happening.
By adding a monitoring count to the number of open HTTP/TCP connections to the server, it can be observed that RestSharp keeps HTTP connections around in keep-alive state and reuses them for subsequent requests. For a request rate and throughput that is sustainable, there is no problem; RestSharp can reuse a certain pool of connections and keep them alive perpetually.
But at rates that the server cannot always sustain, the client has to open more connections because previous HTTP requests have not yet completed, which results in a jump in open connections. Later on, if the client reuses a pooled HTTP connection that it thinks the server is still honouring but that the server has already closed, it ends up with the "A connection that was expected to be kept alive was closed by the server." message.
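If the explanation above holds, one mitigation short of disabling keep-alive entirely is to have the client discard idle pooled connections before the server does. The following is a minimal sketch, assuming the HttpWebRequest-based stack on .NET Framework, where pooled connections are managed by a ServicePoint; the helper name `LimitIdleTime` is hypothetical, and a suitable idle limit depends on the server's own keep-alive timeout, which is not known here.

```csharp
using System;
using System.Net;

public static class KeepAliveTuning
{
    // Looks up the ServicePoint that manages pooled connections for the endpoint
    // and tells it to close connections that have been idle longer than maxIdle.
    // Hypothetical helper; the right value depends on the server's idle timeout.
    public static ServicePoint LimitIdleTime(Uri endpoint, TimeSpan maxIdle)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        servicePoint.MaxIdleTime = (int)maxIdle.TotalMilliseconds;
        return servicePoint;
    }
}
```

It would be called once before the test starts, for example `KeepAliveTuning.LimitIdleTime(new Uri("https://test.com"), TimeSpan.FromSeconds(5));`. This only narrows the window in which a stale connection can be reused; it does not remove the race completely.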

I solved it with this simple line:
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;

One solution is to set keep-alive to false:
    var client = new RestClient("https://test.com");
    client.ConfigureWebRequest((r) =>
    {
        r.ServicePoint.Expect100Continue = false;
        r.KeepAlive = false;
    });
After this change, each connection is closed once the response has been received. This adds the cost of opening a new connection for every request, but it prevents the unexpected "connection was closed" error.

Related

SignalR - Unhandled Rejection (Error): WebSocket closed with status code: 1000 ()

An asp.net core application with react+redux on the client side, using signalR.
Getting the following error on the client side:
Unhandled Rejection (Error): WebSocket closed with status code: 1000 ().
Seems like this is a "normal closure", but there's no code to close the connection.
The application sends small images at 60 FPS per viewport, in several viewports. This utilizes the JS thread almost completely, to the extent that I'd assume that it may prevent signalR from maintaining keep-alive.
Tried setting the timeouts in the server for signalR to their max value, that did not prevent the issue from recurring.
What is it that could cause the signalR socket to close without invoking the close and without an error message?
I'm guessing the browser or the server could close out of self-preservation or reaching set limits.
Most likely: the default maximum size of a hub message (MaximumReceiveMessageSize) is 32 KB, and an image could easily exceed this. You could turn on EnableDetailedErrors to see if there's more info.
If the browser is unable to send quickly enough, it will need to buffer and this buffer can't grow infinitely. You could also run into some sort of anti-malware protection based either on hogging the JS thread (maybe use workers?) or on using too much network I/O. The server can also close for similar reasons.
As for why the error message is vague: The browser literally can't give you too much feedback about this - see the warning text before 9.3.4. Edit: this is wrong and only applies to close code 1006.
To solve the issue, I turned on the logs as Jesper suggested.
The issue was that I was cancelling a CancellationToken passed to the SendAsync method. For some odd reason cancelling the send closes the socket (I'd expect it to only cancel the specific message, not close the connection).

Connection to database not closing

I'm using "redis-rs" for Rust. I tested sending a few thousand requests locally, and it works really well for the first few, but at some point it stops accepting requests and starts showing me this error when I send a command to Redis:
"Only one usage of each socket address (protocol/network address/port) is normally permitted"
I am opening a client and a connection on every request that the HTTP server takes in. That's probably a bad idea in the first place, but shouldn't the connections stop existing and close once the function that opened them ends?
Is there a better solution, like some kind of global connection?
thanks
Well if it is an http server, the crate you are using likely is doing multithreading to handle requests. It is possible that one thread got caught in the process of closing the connection just as another thread began processing the next request.
Or in your case, maybe the remote database has not finished closing the previous request by the time the next connection is created. Either way, it's easier to think of it as a race condition between threads.
Since you don't know what thread will request a connection next, it may be better to store the connection as a global resource. Since I assume a mutex lock is faster than opening and closing a socket, I used lazy_static to create a single thread safe connection.
    use lazy_static::lazy_static;
    use parking_lot::Mutex;
    use rusqlite::Connection; // assumed: Connection::open and execute_batch match rusqlite's API
    use std::sync::Arc;
    // CREATE_TABLE is a &str constant holding the schema SQL, defined elsewhere.
    lazy_static! {
        pub static ref LOCAL_DB: Arc<Mutex<Connection>> = {
            let connection = Connection::open("local.sqlite").expect("Unable to open local DB");
            connection.execute_batch(CREATE_TABLE).unwrap();
            Arc::new(Mutex::new(connection))
        };
    }
    // I can then just use it anywhere in functions without any complications.
    let lock = LOCAL_DB.lock();
    lock.execute_batch("begin").unwrap();
    // etc.

ASP.NET Core middleware with unbounded output

I have a piece of ASP.NET Core middleware that produces an unbounded amount of data when it takes over processing a request. It goes into a loop that calls await context.Response.Body.WriteAsync as long as it is allowed to. If the caller stays connected, it expects to keep receiving data. This service is being hosted with Kestrel, and it seems to be working properly as described so far. But, what I am finding is that when the caller disconnects, Kestrel doesn't seem to notice and continues pumping output from the middleware. That output isn't going anywhere, because the memory usage of the process isn't going up, and at the same time netstat -an doesn't show the connection any more. But, the middleware just keeps on chugging away.
For typical HTTP requests, this wouldn't be a terribly serious issue, because most of the time the client doesn't disconnect when it has only read part of the response, and in those cases where it does, the response is finite in size anyway. But the pattern with this endpoint is that the data is conceptually infinite in length: the caller stays connected for as long as it wants and then signals that it no longer wants further data by disconnecting.
These images illustrate the problem:
https://imgur.com/a/9Qp7VV3
How can I make it so that the middleware notices when the client disconnects?
I also posted this as an issue on the aspnetcore GitHub repo, and I got a reply that explained the problem and provided a solution:
https://github.com/dotnet/aspnetcore/issues/22156
Basically, for better or worse, the ASP.NET Core infrastructure suppresses all errors from write operations, and so if the request has been aborted, calling context.Response.Body.WriteAsync fails silently. I personally think there's a mistake in there somewhere, but the rationale given is that this reduces exception/log spam from behaviours over which the server has no control.
Because of this, if you're writing a loop like mine, you have to explicitly check for the request having been aborted. The context provides a CancellationToken that is used to capture aborts. You can also use this CancellationToken on other actions the handler is doing that aren't within the scope of the request context.
My data pump looks like this now:
    while (true)
    {
        int bytesRead = await responseStream.ReadAsync(buffer, 0, buffer.Length, context.RequestAborted);
        if (bytesRead < 0)
            throw new Exception("I/O error");
        if (bytesRead == 0)
            break;
        await context.Response.Body.WriteAsync(buffer, 0, bytesRead);
        if (context.RequestAborted.IsCancellationRequested)
            break;
        _statusConsoleSender.NotifyRequestProgress(requestID, bytesRead);
    }
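The same pattern can be isolated from ASP.NET Core so that it can be run and tested on its own. This is a sketch under assumptions, not the middleware itself: `DataPump.PumpAsync` is a hypothetical helper, plain `Stream` objects stand in for the upstream source and `context.Response.Body`, and the `aborted` parameter plays the role of `context.RequestAborted`.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class DataPump
{
    // Copies source to destination until the source ends or the token is cancelled.
    // Returns the number of bytes written. Hypothetical helper for illustration.
    public static async Task<long> PumpAsync(Stream source, Stream destination, CancellationToken aborted)
    {
        var buffer = new byte[4096];
        long total = 0;

        // Writes are assumed to fail silently once the caller is gone,
        // so the token is the only reliable signal and is checked every pass.
        while (!aborted.IsCancellationRequested)
        {
            int bytesRead;
            try
            {
                bytesRead = await source.ReadAsync(buffer, 0, buffer.Length, aborted);
            }
            catch (OperationCanceledException)
            {
                break; // cancelled while waiting for more data
            }

            if (bytesRead == 0)
                break; // end of the source stream

            await destination.WriteAsync(buffer, 0, bytesRead);
            total += bytesRead;
        }

        return total;
    }
}
```

Checking the token at the top of the loop means at most one extra buffer is written after the caller disconnects.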

Webjob Errors since move to Azure Sql v12

Each morning at 4am a Scheduled Task creates around 500 messages on a message queue. Each message is the id of an email to send. Each message is picked up, a URL is created, and a request is sent via await HttpClient.GetAsync(url). The URL target then creates and sends the email. This has worked well for months.
I've just upgraded to SQL Azure v12 and all is now not well.
The very first messages to be processed (after 2 minutes running time) threw a
"System.Threading.Tasks.TaskCanceledException"
I'm also seeing
"System.Data.Entity.Core.EntityException: The underlying provider
failed on Open. ---> System.InvalidOperationException: Timeout
expired. The timeout period elapsed prior to obtaining a connection
from the pool. This may have occurred because all pooled connections
were in use and max pool size was reached."
and a couple of
"The timeout period elapsed prior to completion of the operation or
the server is not responding. This failure occurred while attempting
to connect to the routing destination. The duration spent while
attempting to connect to the original server was - [Pre-Login]
initialization=6; handshake=426; [Login] initialization=0;
authentication=0;"
The webjob that sends the request to the API awaits the response. I'm wondering whether, because it's async, the thread is freed while awaiting to go off and process another queue item, and therefore creates another API request, so that the API is hit again and again until so many requests are in flight that all its threads are in use. Would I be better off NOT making the webjob async, so that the 'trapped' thread sends a request only after the first one completes? Is that right? Edit: actually the IIS logs suggest that the API requests are not all happening at once. So my question is: what should I look at next? Are these common SQL v12 errors, or is the recent upgrade a red herring?
Just to clarify, the webjob that fires in response to the queue message simply does:
    using (HttpClient client = new HttpClient())
    {
        response = await client.GetAsync(url);
    }
and hits the web api of an Always On standard tier azure website. Database DTU % is about 25% while this happens.
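If unbounded concurrency from the async handler were the cause, it could be capped without making the job synchronous. The sketch below uses a SemaphoreSlim as a gate; `Throttle.RunAsync` and its parameters are hypothetical names for illustration and are not part of the WebJobs SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs work(item) for every item, with at most maxConcurrency calls in flight.
    // Hypothetical helper; not part of any SDK.
    public static async Task RunAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() forces every task to start; each one waits at the gate first.
            List<Task> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

With this shape the calling thread is still released while each request is awaited, but no more than `maxConcurrency` requests reach the API at once.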
edit:
"Gateway no longer provides retry logic in V12
Before version V12, Azure SQL Database had a gateway that acted as a proxy to buffer all interactions between the database and your client program. The gateway provided automated retry logic for some transient errors.
V12 eliminated the gateway. Now your program must more fully handle transient errors."
I'm adding a DbConfiguration class that sets SqlAzureExecutionStrategy. Will see how it runs tonight.
Adding the EF retry config class fixed the transient errors. The cancelled tasks are a different issue (new question)
    // https://msdn.microsoft.com/en-us/data/jj680699
    public class SqlAzureConfiguration : DbConfiguration
    {
        public SqlAzureConfiguration()
        {
            this.SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy());
        }
    }
and in web.config (because I have multiple contexts)
<entityFramework codeConfigurationType="Abc.DataService.SqlAzureConfiguration, Abc.DataService">
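Conceptually, an execution strategy re-runs an operation when it fails with an error judged to be transient. The sketch below illustrates that idea with a generic retry loop and exponential backoff. It is not Entity Framework's implementation; `TransientRetry`, the `isTransient` predicate, and the delay values are all assumptions made for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Runs operation, retrying up to maxAttempts times in total while
    // isTransient(exception) is true. Hypothetical helper, not EF's code.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan baseDelay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Exponential backoff: baseDelay, then 2x, 4x, ...
                double factor = Math.Pow(2, attempt - 1);
                await Task.Delay(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * factor));
            }
        }
    }
}
```

Non-transient exceptions, and the final failed attempt, propagate to the caller unchanged because the exception filter does not match them.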

Reliable session faulting for unknown reason

I am trying to achieve the following - one client-side proxy instance (kept open) accessed by multiple threads using a reliable session. What I have managed so far is to have either A) a reliable session with a client-side proxy which is created and disposed per call or B) what I aim for, but without a reliable session.
When I enable reliable sessions on my binding however, the following behaviour is exhibited:
Client-side
Upon application startup everything appears to work fine until roughly 18 messages in to the WCF session. I firstly get the proxy.InnerChannel.Faulted event raised, then an exception is caught at the point where I am calling the method on the proxy. The exception is a System.TimeoutException, with message:
"The request channel timed out while waiting for a reply after
00:00:59.9062512. Increase the timeout value passed to the call to
Request or increase the SendTimeout value on the Binding. The time
allotted to this operation may have been a portion of a longer
timeout."
The inner exception has a similar message:
"The request operation did not complete within the allotted timeout of
00:01:00. The time allotted to this operation may have been a portion
of a longer timeout."
With the method at the top of the inner stack trace being:
System.ServiceModel.Channels.ReliableRequestSessionChannel.SyncRequest.WaitForReply(TimeSpan timeout)
I then call proxy.Close followed by proxy.Abort (catching and ignoring exceptions). If I use the default settings (i.e. simply <reliableSession/>), calling proxy.Close results in another System.TimeoutException (although this time the allotted timeout is 00:00:00); however, if I override the defaults as specified above, no exception is thrown.
Service-side
Utilizing WCF tracing I get a System.ServiceModel.CommunicationException, with message:
"The sequence has been terminated by the remote endpoint. The session
has stopped waiting for a particular reply. Because of this the
reliable session cannot continue. The reliable session was faulted."
And a stack trace ending at:
System.ServiceModel.AsyncResult.End[TAsyncResult](IAsyncResult result)
When remotely attaching to the server I get the same message, which occurs when code execution steps over the return statement of my service in the service call which causes the error.
The puzzling thing to me is that the service is stable and runs with options A) or B) as described at the beginning of my post, and that the fault occurs after a varying number of messages (around 18). The former fact points to there being nothing wrong with the code (indeed I have checked that no exceptions are thrown), and the latter just serves to confuse me and is why I modified the settings on the reliable session binding.
I am quite stuck on this. Can anyone suggest why the reliable session would fault in such a way?
You need to override the defaults and set your timeouts explicitly, higher or lower depending on the cause. It seems the timeout is causing an exception just after some other program has started or stopped, a millisecond or so before the exception.
Or, the most likely cause:
your allotted timeouts may be accumulating into one continuous timeout of 18 minutes or 18 calls, with other usage time added on top as one overall timeout, which may be why it is asking for more time.
In any case, you have to set your own values statically, because the automatic defaults will otherwise override any changes you made.
In the binding for your local host HTTP endpoint, set the close timeout to perhaps 5 minutes, and maybe raise the request timeout as well, to perhaps 2 minutes:
    closeTimeout="00:05:00"