An ASP.NET Core application with React + Redux on the client side, using SignalR.
I'm getting the following error on the client side:
Unhandled Rejection (Error): WebSocket closed with status code: 1000 ().
This looks like a "normal closure", but none of my code closes the connection.
The application sends small images at 60 FPS per viewport, in several viewports. This uses the JS thread almost completely, to the extent that I suspect it may prevent SignalR from maintaining its keep-alive.
I tried setting the SignalR timeouts on the server to their maximum values; that did not prevent the issue from recurring.
What could cause the SignalR socket to close without close being invoked and without an error message?
I'm guessing the browser or the server could close out of self-preservation or reaching set limits.
Most likely: the default maximum size of a hub message (MaximumReceiveMessageSize) is 32 KB, and an image could easily surpass this. You could turn on EnableDetailedErrors to see if there's more info.
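To make the size concern concrete, here is a rough back-of-the-envelope check, written in Python only for illustration. It assumes the image bytes travel as a base64 string inside a JSON hub message and allows a made-up 200 bytes for the surrounding envelope; the helper names are hypothetical and not part of SignalR.

```python
import math

HUB_LIMIT_BYTES = 32 * 1024  # default MaximumReceiveMessageSize quoted above


def base64_length(raw_bytes: int) -> int:
    """Length of the padded base64 text for raw_bytes of binary data."""
    return 4 * math.ceil(raw_bytes / 3)


def exceeds_hub_limit(raw_bytes: int, envelope_bytes: int = 200) -> bool:
    """True if the encoded image plus an assumed envelope is over the limit."""
    return base64_length(raw_bytes) + envelope_bytes > HUB_LIMIT_BYTES


for size_kb in (8, 16, 24, 32):
    raw = size_kb * 1024
    print(f"{size_kb} KB image -> {base64_length(raw)} bytes encoded, "
          f"over limit: {exceeds_hub_limit(raw)}")
```

Under these assumptions a 24 KB image already encodes to exactly 32 KB of text, so it would go over the limit once any envelope is added.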
If the browser is unable to send quickly enough, it will need to buffer and this buffer can't grow infinitely. You could also run into some sort of anti-malware protection based either on hogging the JS thread (maybe use workers?) or on using too much network I/O. The server can also close for similar reasons.
As for why the error message is vague: The browser literally can't give you too much feedback about this - see the warning text before 9.3.4. Edit: this is wrong and only applies to close code 1006.
To solve the issue, I turned on the logs as Jesper suggested.
The issue was that I was cancelling a CancellationToken passed to the SendAsync method. For some odd reason cancelling the send closes the socket (I'd expect it to only cancel the specific message, not close the connection).
Related
I have a multi-threaded (Linux) server that registers async_writes and async_reads on the same native file descriptor through a socket object. I noticed that under very heavy load, when the server was dropping connections, on very rare occasions a client would receive a garbled first message.
Tracking it down, the async_read detects an error on the socket and closes the socket. This closes the native file descriptor. If that file descriptor is reused before the original async_write has a chance to fire, the async_write will find its native file descriptor valid and proceed to send its message (which is really a message from a previous session).
The only way I could see to fix this was to make the async_read and async_write callbacks aware of whether other callbacks were still registered, and to close the socket only from the last one.
Has anyone seen this issue?
Haven't seen it, but it sounds plausible. Although I am surprised to see a new native file descriptor getting the exact same number as a recently closed descriptor.
You might want to put the socket in a shared_ptr and query shared_ptr::unique() (equivalently, use_count() == 1) in both async_read and async_write. That'd be the easiest way to let each callback know whether the other one is still registered. If unique() is true you can be sure that no one else is still using this socket and can close it.
So if the connection gets dropped, async_read can check unique(). If it is true, close the socket. And let go of the shared_ptr in either case.
Then, when async_write also fires, it will find unique() true and can close the socket, unless async_read has already closed it.
The only drawback is of course: async_write has to fire also (perhaps with an error code) in order to close the socket.
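The "last one out closes the socket" idea can be sketched independently of Asio. The following is a minimal illustration in Python, not C++: SharedHandle, acquire and release are hypothetical names, and the close function is a stand-in for closing the real socket. In C++ the same effect would normally come from letting the last shared_ptr owner destroy the socket.

```python
import threading


class SharedHandle:
    """Runs close_fn exactly once, when the last holder lets go."""

    def __init__(self, close_fn):
        self._close_fn = close_fn
        self._holders = 0
        self._lock = threading.Lock()

    def acquire(self):
        """Register one more pending operation (e.g. a read or a write)."""
        with self._lock:
            self._holders += 1
        return self

    def release(self):
        """Called from a completion handler; closes if nothing else is pending."""
        with self._lock:
            self._holders -= 1
            last_one_out = self._holders == 0
        if last_one_out:
            self._close_fn()
        return last_one_out


closed = []
handle = SharedHandle(lambda: closed.append("closed"))
handle.acquire()  # pending read
handle.acquire()  # pending write

read_closed = handle.release()   # read fails first: the write is still pending
write_closed = handle.release()  # write completes last: now closing is safe
print(read_closed, write_closed, closed)
```

The point of the design is that the descriptor cannot be recycled while any handler that still refers to it is outstanding.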
Oh, I've seen exactly this in production code. (Much fun: we would be talking a proprietary protocol on a TCP socket to a MySQL server.) The problem arises when some thread "handles" (mis-handles) errors by closing sockets using the native handle (fd). Don't. Use shutdown (perhaps with cancel) instead and let the destructor take care of close. Of course, the real problem is the non-owning copies of the handle (fd), which are the cause of the resource race.
Critical Note:
Tracking it down, the async_read detects an error on the socket and closes the socket. This closes the native file descriptor
That's patently UNTRUE for Asio itself. Perhaps you have (third-party) code in the completion handlers doing that, but as I mention above, you cannot afford to do that.
We are currently looking at SSE using Angular 5 and Spring 5 WebFlux. The basic application is working correctly, but whilst investigating error handling we've noticed that the EventSource in the Angular application doesn't see any difference between Spring closing the connection because the Flux stream of data has ended and an error occurring (e.g. terminating the application mid-transfer).
The examples we've based our investigations on are the following.
https://thepracticaldeveloper.com/2017/11/04/full-reactive-stack-ii-the-angularjs-client/
http://javasampleapproach.com/reactive-programming/angular-4-spring-webflux-spring-data-reactive-mongodb-example-full-reactive-angular-4-http-client-spring-boot-restapi-server#25_Service
Both onerror and the completion function in the EventSource get called in all three of these cases: when the data is sent by Spring successfully and reaches the end of the stream (which then closes the connection), when we Ctrl+C the application mid-stream, and when we throw an exception at a random point in the middle of sending data.
The EventSource argument just contains {type: 'error'} in all 3 cases.
From what I understand, SSE streams are mainly about infinite streams; the spec doesn't seem to offer a standard way of signaling the end of a stream to clients (they will try to reconnect by default).
You could implement that in your controller, by returning a Flux<ServerSentEvent> and terminating the flux with a custom event:
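    return Flux.concat(
        fetchUserEvents(),
        // Browsers only dispatch events that carry a data field, so send a
        // placeholder payload ("done" is an arbitrary value).
        Flux.just(ServerSentEvent.builder().event("disconnect").data("done").build())
    );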
On the client side, you can then close the connection cleanly when you're done, leaving all other cases as errors and letting the browser reconnect automatically:
    evtSource.addEventListener("disconnect", function(e) {
        evtSource.close();
    }, false);
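The same convention can be exercised outside the browser. Below is a minimal sketch in Python, assuming the stream arrives as decoded text lines; consume_sse is a hypothetical helper that only understands the event and data fields. Like a browser EventSource, it drops events that have no data field, and it reports a clean end only when the "disconnect" marker was actually seen.

```python
def consume_sse(lines, end_event="disconnect"):
    """Parse SSE text lines; return (events, ended_cleanly).

    ended_cleanly is True only if the end_event marker was seen, so a stream
    that simply stops (crash, dropped connection) is reported as an error.
    """
    events = []
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line terminates one event
            if data:                        # events without data are dropped
                if event_type == end_event:
                    return events, True
                events.append((event_type, "\n".join(data)))
            event_type, data = "message", []
        elif not line.startswith(":"):      # lines starting with ':' are comments
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event_type = value
            elif field == "data":
                data.append(value)
    return events, False                    # end of input without the marker


complete = ["event: user\n", "data: alice\n", "\n",
            "event: disconnect\n", "data: done\n", "\n"]
cut_off = ["event: user\n", "data: alice\n", "\n",
           "event: user\n", "data: bo"]

print(consume_sse(complete))
print(consume_sse(cut_off))
```

The first stream ends with the marker and is reported as clean; the second stops mid-event and is reported as an error, which is the distinction the raw EventSource error callback does not provide.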
This is rather annoying, so I've raised SPR-16761 to improve SSE support there.
Please excuse the obvious self-Q/A, but this information is widely misunderstood and almost always answered incorrectly, so I wanted to place it here for people searching for a definitive answer to this problem.
Even so, there's still some information I haven't been able to nail down. I will put this towards the end of the question (skip to that if you are not interested in the preamble).
How do I correctly configure a WCF NetTcp Duplex Reliable Session?
There are many questions and answers regarding this topic, and nearly all of them suggest setting inactivityTimeout="Infinite" in your configuration. This doesn't really seem to work correctly, particularly for the case of NetTcp (It may work correctly for WSDualHttp Bindings, but I have never used those).
There are a number of other issues that are often associated with this: Including, Channel not faulting after client or server unexpectedly disconnected, Channel disconnecting after 10 minutes, Channel randomly disconnecting... Channel throwing exception when trying to open... Unable to configure Metadata on same endpoint...
Please note: There are two concepts that are important below. Infrastructure messages are internal to the way WCF communicates, and are used by the framework to keep things running smoothly. Operation messages are messages that occur because your app has done something, like send a message across the wire. Infrastructure messages are largely invisible to your app (but they still occur in the background) while operation messages are the result of an action your app has taken.
Information I have figured out through hard-won trial and error:
Infinite does not appear to be a valid configuration setting in all situations (and certainly, the Visual Studio validation schema doesn't know about it).
There are two special configuration converters, called InfiniteIntConverter and InfiniteTimeSpanConverter, which will sometimes convert the value Infinite to either int.MaxValue or TimeSpan.MaxValue, but I haven't yet figured out the situations in which this is valid: sometimes it works, and sometimes it doesn't. What's more, it appears that some libraries will allow Infinite in the config while others will not, so you can succeed in one part of a configuration but fail in another.
You must configure BOTH inactivityTimeout and receiveTimeout, on both the client and the server. While these values do not HAVE to be the same, they probably should be as they will probably cause confusion if they are not. (technically, you can leave inactivityTimeout to its default value if you want, but you should be aware of its value, and what it does)
inactivityTimeout should NEVER be set to a large value, much less Infinite or TimeSpan.MaxValue.
inactivityTimeout has two functions (and this is not widely understood). The first function defines the maximum amount of time that can elapse on a channel without receiving any "infrastructure" or "operation" messages. The second function defines the time period in which infrastructure messages are sent (half the time specified). If no infrastructure or operation messages have been received during the timeout period, the connection is aborted.
receiveTimeout specifies the maximum amount of time that can elapse between operation messages only. This value can be set to a large value, such as TimeSpan.MaxValue (particularly if your channel runs internally over a trusted network or over a VPN). This value is what defines how long the reliable session will "stay alive" if there is no activity between client and server (other than infrastructure messages), i.e. your client does not call any methods of the interface, and your server does not call back into the client.
Setting a short inactivityTimeout and a large receiveTimeout keeps your reliable session "tacked up" even when there is no operational activity between your client and server. The short inactivity timeout (I like to keep the default 10 minutes or less) sends infrastructure "ping" messages to keep the TCP connection alive, while the long receive timeout keeps the reliable session active. At the same time, the short inactivity timeout provides a reasonable timeout in case of disconnection.
If you set inactivityTimeout to a large value, then the reliable session will not be reliable, as it has no way to keep the TCP connection alive, nor does it have any way to verify the integrity of the connection. It won't know if a user has disconnected unexpectedly until you try to send a message to that client and find out the connection is no longer there. This is why many people who use Infinite for this setting resort to creating a "Ping" method in their service, which is completely unnecessary if you've configured these settings correctly.
If you set inactivityTimeout to a value larger than receiveTimeout, then it will likewise be unreliable, as you will still be governed by the receiveTimeout for operation messages. For example, if you forget to set receiveTimeout and leave it at the default 10 minutes, then if the user is idle for 10 minutes, the connection will be aborted.
When the client or server unexpectedly disconnects (app crashes, network failure, someone trips over the power cord, etc.), the other side may not notice right away. I have attached various ChannelFaulted event handlers in various test situations, and sometimes the connection is faulted right away, while other times it doesn't seem to fault at all. What I have discovered through trial and error is that when it doesn't seem to fault, it will actually fault after the inactivityTimeout expires on that end (so if it's set to 10 minutes, then after 10 minutes it will call the ChannelFaulted event).
I have not yet figured out why in some situations it notices the disconnection right away, and others it waits for the timer to expire. In both cases, I notice internal first chance communication exceptions thrown and handled by the framework, and there are calls to Abort the connection... but somehow the call to the event handler gets lost and it must wait for the timeout. My suspicion is this is somehow thread related.
When trying to configure Metadata to work across the NetTcp channel, I have had sporadic results. Sometimes it works, sometimes it doesn't. I've read many reports that Metadata doesn't work over NetTcp and that you have to use an Http channel for the Metadata, but I have in fact had it work on several occasions using the net.tcp:// url to generate the proxy. Then I would change something, recompile, and it would no longer work. After changing things back, it still wouldn't work. So it was very confusing what magic incantation was necessary to make Metadata function over net.tcp, shared with the endpoint on the same port (obviously with a different address).
When configuring both a NetTcp and a Metadata endpoint on the same service, and specifying non-default settings for connection parameters like listenBacklog and maxConnections, you also need to make sure the Metadata endpoint uses the same settings, which typically means you have to define a custom binding, since these settings are not available from the standard tcp mex binding. This includes setting listenBacklog and maxPendingConnections on tcpTransport, and groupName and maxOutboundConnectionsPerEndpoint on connectionPoolSettings.
The default value of the Ordered setting of ReliableSession is True. This carries a lot more overhead than turning it off. If you don't need ordered messages, I would suggest turning it off (you need to set this on both sides).
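The interplay of the two timeouts described in the points above can be summarized as a small model. This is a sketch in Python of the behaviour as described in this post, not WCF code; reliable_session_timing and its result keys are hypothetical names, and None stands in for an Infinite inactivityTimeout.

```python
from datetime import timedelta


def reliable_session_timing(inactivity_timeout, receive_timeout):
    """Summarize the timing rules described above (None means 'Infinite')."""
    return {
        # Infrastructure keep-alives go out at half the inactivity timeout.
        "keep_alive_every": (None if inactivity_timeout is None
                             else inactivity_timeout / 2),
        # With no operation messages, receiveTimeout ends the session.
        "idle_session_aborted_after": receive_timeout,
        # A vanished peer is noticed once inactivityTimeout expires, if ever.
        "dead_peer_noticed_within": inactivity_timeout,
    }


configs = {
    "short inactivity, long receive": (timedelta(minutes=10), timedelta(days=7)),
    "both left at 10 minutes": (timedelta(minutes=10), timedelta(minutes=10)),
    "Infinite inactivity": (None, timedelta(days=7)),
}
for name, (inactivity, receive) in configs.items():
    print(name, reliable_session_timing(inactivity, receive))
```

The first configuration is the recommended one: keep-alives every 5 minutes, a dead peer noticed within 10 minutes, and an idle session that survives for as long as receiveTimeout allows.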
Configuration I still need to understand:
How do I correctly configure the shared net.tcp Metadata endpoint? (I will add an example when I get a chance.) Currently, I'm specifying an http get url to bypass the problem. It is very inconsistent: sometimes it works and sometimes it does not. I kept getting the error 'The URI Prefix is not recognized' when generating the proxy in Visual Studio.
Why does WCF sometimes Fault the channel immediately upon disconnect, and sometimes waits for inactivityTimeout to expire? What controls/causes one vs the other behavior?
I have two applications talking to each other over SSL. The client runs on a Windows machine; the server is a Linux-based application. The client sends a large amount of data to the server on startup. The data is sent in ~4000-byte chunks, each containing 30 entries. I have to send about 50000 entries in total.
During that transmission the server sends a message to the client; the message size is ~4000 bytes. After that happens, SSL_write() on the client side begins to return SSL_ERROR_WANT_WRITE. The client sleeps for 10 ms and retries SSL_write with the exact same parameters; however, SSL_write keeps failing indefinitely, and the client eventually aborts. If it then tries to send a new message, I get an error indicating that I am not resending the aborted message from earlier:
error:1409F07F:SSL routines:SSL3_WRITE_PENDING: bad write retry
The server eventually kills the connection since it has not heard from the client for 60s and re-establishes a new one. This is just an FYI, the real issue is how can I get SSL_write to resume.
If the server does not send a request during the receive the problem goes away. If I shrink the size of the request from 16K to 100 bytes the problem does not happen.
The SSL CTX MODE is set to SSL_MODE_AUTO_RETRY and SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER.
Does anyone have an idea why simultaneous transmission of large amounts of data from both sides might cause this failure? If this is a limitation, what can I do to prevent it other than capping the size of what goes out from the server to the client? My concern is that if the client is not sending anything, the throttling I applied to avoid this issue is a waste.
On the client side I tried to perform an SSL_read to see if I need to read during a write, even though I never receive an SSL_ERROR_WANT_READ, but the buffer is not that big anyway (~1000 bytes in size).
Any insight on this would be appreciated.
SSL_ERROR_WANT_WRITE - This error is returned by OpenSSL (I am assuming you are using OpenSSL) only when the socket send gives it an EWOULDBLOCK or EAGAIN error. The socket send will give an EWOULDBLOCK error when the send-side buffer is full, which in turn means that your server is not reading the messages sent from the client.
So, essentially, the problem lies with your Server which is not reading the messages sent to it. You need to check your server and fix it, which will automatically fix your client problem.
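The underlying mechanism can be reproduced without OpenSSL. The following minimal sketch uses plain non-blocking sockets in Python (a local socketpair rather than a TLS connection, and hypothetical helper names): the sender hits EWOULDBLOCK once the peer stops reading, and the very same write goes through as soon as the peer drains its side.

```python
import socket


def send_until_blocked(sock, chunk=b"x" * 4000):
    """Keep sending on a non-blocking socket until the send buffer is full."""
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:      # EWOULDBLOCK / EAGAIN
            return total


def drain(sock):
    """Read everything currently queued on a non-blocking socket."""
    total = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:      # nothing left to read right now
            return total
        if not data:                 # peer closed the connection
            return total
        total += len(data)


client, server = socket.socketpair()
client.setblocking(False)
server.setblocking(False)

queued = send_until_blocked(client)   # server is not reading: buffers fill up
drained = drain(server)               # server finally reads
resumed = client.send(b"x" * 4000)    # the same write now succeeds

print(f"blocked after {queued} bytes, drained {drained}, resumed with {resumed}")
client.close()
server.close()
```

With OpenSSL the equivalent step would be to wait until the socket is writable and then retry SSL_write with the same arguments, rather than sleeping for a fixed time.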
Also, why have you set the option "SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER"? SSL always expects that the record which it is trying to send should be sent completely before the next record can be sent.
As it turns out, in both the client and the server app the reads and writes are processed in one thread. In a perfect storm as I described above, the client is busy writing (non-blocking). The server then decides to write a large set of messages of its own in between processing its rx buffers. The server tx is a blocking call. The server gets stuck writing, which starves the read; the buffers fill up, and we have a deadlock scenario.
The default Windows buffer is 8k bytes, so it doesn't take much to fill it up.
The architecture should be such that there is a separate thread for the rx and tx processing on both sides. As a short-term fix, one can increase the rx buffers and rate-limit the tx side to prevent the deadlock.
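The "separate rx and tx threads" arrangement can be illustrated with a small sketch. It is written in Python with plain sockets rather than OpenSSL, and exchange is a hypothetical function: both peers issue a large blocking write at the same time, which completes only because each side also has a dedicated reader draining its socket.

```python
import socket
import threading


def exchange(payload_size=1_000_000):
    """Both peers send payload_size bytes at once; each has its own reader."""
    left, right = socket.socketpair()
    received = {}

    def reader(name, sock):
        count = 0
        while count < payload_size:
            chunk = sock.recv(65536)
            if not chunk:            # peer closed early
                break
            count += len(chunk)
        received[name] = count

    def writer(sock):
        sock.sendall(b"x" * payload_size)   # blocking write, like the server tx

    threads = [
        threading.Thread(target=reader, args=("left", left)),
        threading.Thread(target=reader, args=("right", right)),
        threading.Thread(target=writer, args=(left,)),
        threading.Thread(target=writer, args=(right,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    left.close()
    right.close()
    return received


print(exchange())
```

If the readers were removed and each side only wrote, both blocking writes would stall once the buffers filled, which is the deadlock described above.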
I am trying to achieve the following - one client-side proxy instance (kept open) accessed by multiple threads using a reliable session. What I have managed so far is to have either A) a reliable session with a client-side proxy which is created and disposed per call or B) what I aim for, but without a reliable session.
When I enable reliable sessions on my binding however, the following behaviour is exhibited:
Client-side
Upon application startup everything appears to work fine until roughly 18 messages into the WCF session. First the proxy.InnerChannel.Faulted event is raised, then an exception is caught at the point where I am calling the method on the proxy. The exception is a System.TimeoutException, with message:
"The request channel timed out while waiting for a reply after
00:00:59.9062512. Increase the timeout value passed to the call to
Request or increase the SendTimeout value on the Binding. The time
allotted to this operation may have been a portion of a longer
timeout."
The inner exception has a similar message:
"The request operation did not complete within the allotted timeout of
00:01:00. The time allotted to this operation may have been a portion
of a longer timeout."
With the method at the top of the inner stack trace being:
System.ServiceModel.Channels.ReliableRequestSessionChannel.SyncRequest.WaitForReply(TimeSpan timeout)
I then call proxy.Close followed by proxy.Abort (catching and ignoring exceptions). If I use the default settings (i.e. simply <reliableSession/>), then calling proxy.Close results in another System.TimeoutException (although this time the allotted timeout is 00:00:00); however, if I override the defaults as specified above, no exception is thrown.
Service-side
Utilizing WCF tracing I get a System.ServiceModel.CommunicationException, with message:
"The sequence has been terminated by the remote endpoint. The session
has stopped waiting for a particular reply. Because of this the
reliable session cannot continue. The reliable session was faulted."
And a stack trace ending at:
System.ServiceModel.AsyncResult.End[TAsyncResult](IAsyncResult result)
When remotely attaching to the server I get the same message, which occurs when code execution steps over the return statement of my service in the service call which causes the error.
The puzzling thing to me is that the service is stable and runs with options A) or B) as described at the beginning of my post, and that the fault occurs after a varying number of messages (around 18). The former fact suggests there is nothing wrong with the code (indeed I have checked that no exceptions are thrown), and the latter just serves to confuse me and is why I modified the settings on the reliable session binding.
I am quite stuck on this. Can anyone suggest why the reliable session would fault in such a way?
You need to override the defaults and set your timeout higher or lower, depending on the cause. It seems the timeout is causing an exception just after some other program has started or stopped, a millisecond or so before the exception.
Or, as the most likely cause:
Your allotted timeouts may be added together as one continuous timeout covering 18 minutes or 18 calls, with other usage time added on top, which may be why it is asking for more time.
In any case, you have to set your own values statically, because the automatic defaults will always override any changes you make.
Enter your localhost HTTP binding name and set closeTimeout to perhaps 5 minutes,
and maybe change the request timeout as well, e.g. to 2 minutes:
closeTimeout="00:05:00"