BitMEX WebSocket connection drops / disconnects despite keep-alive, burning through connection pool - api

Every once in a while BitMEX disconnects our WebSocket connection, which forces us to reconnect. However, they provide a connection pool of only 40 connections per hour. In times of low volatility this is not a problem at all, but as soon as trading activity goes up we run through these 40 connections in no time, eventually leaving our connection dead.
We do have a keep-alive but it does not solve the problem at all.
We haven't seen anything specific in the API documentation about how to deal with this problem, or about why we get so many close opcodes whenever volatility rises.
Does anyone know if we are doing something wrong?
EDIT: a heartbeat is also in place.

I suggest implementing heartbeats as per https://www.bitmex.com/app/wsAPI#Heartbeats
In general, WebSocket connections can drop if the connection remains idle for too long without transmitting any data.
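A minimal sketch of such an application-level heartbeat, assuming .NET's ClientWebSocket; the "ping" payload and the 5-second idle interval are assumptions to verify against the linked BitMEX page:
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Periodically sends a "ping" so the server never sees a silent socket
// and has no idle-timeout reason to close the connection.
static async Task HeartbeatLoopAsync(ClientWebSocket ws, CancellationToken ct)
{
    var ping = new ArraySegment<byte>(Encoding.UTF8.GetBytes("ping"));
    while (ws.State == WebSocketState.Open && !ct.IsCancellationRequested)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), ct);
        // A fuller implementation would skip the ping if other traffic went
        // out in the meantime and would check that a "pong" comes back.
        await ws.SendAsync(ping, WebSocketMessageType.Text, true, ct);
    }
}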

Related

Prevent disconnections caused by intermittent packet loss

I've been trying to find the issue here (the Unity devs make it seem like it's a design choice somehow). I've been simulating lag to see how my game works over the internet (I've also tried connecting a client from a tethered hotspot, and it disconnected as well) and found that any amount of packet loss (even just 1%) will eventually lead to a disconnection within a couple of minutes.
My game receives data just fine; you can see the remote players moving around normally right up until the disconnection. I think it has something to do with UNET's heartbeat packets not being resent for some reason, and as soon as the first one is dropped you get disconnected.
If this is a design choice, I can't see how Unity would expect you to have a rock-solid connection when some players could obviously be on a cellular connection. Does anyone know anything about this? I've tried asking on the Unity forums as well, and there have been no replies for over a week now.
Thanks
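One thing worth ruling out is whether UNET's default disconnect thresholds are simply too aggressive for a lossy link. A rough sketch, assuming the HLAPI NetworkManager is in use; the property names come from UnityEngine.Networking.ConnectionConfig and the concrete values are placeholders to experiment with:
using UnityEngine.Networking;

public class TolerantNetworkManager : NetworkManager
{
    void Awake()
    {
        // Use a custom ConnectionConfig instead of the defaults.
        customConfig = true;
        // Give heartbeats more time before the peer is declared dead (ms).
        connectionConfig.DisconnectTimeout = 5000;
        connectionConfig.PingTimeout = 1000;
        // Tolerate a higher percentage of dropped packets before disconnecting.
        connectionConfig.NetworkDropThreshold = 80;
    }
}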

What is a reasonable value for heartbeat in RabbitMQ?

RabbitMQ allows you to "heartbeat" a connection, i.e. from time to time the client and the server check (using empty messages) that the other party is still there and available. So far, so good.
Unfortunately, I was not able to find a place in the documentation that suggests what a reasonable value for this is. I know that you need to specify the heartbeat in seconds, but what is a real-world best-practice value?
Obviously, it should not be too often (traffic), but also not too rare (proxies, …). Any suggestions?
Is 15 seconds fine? 30? 60? …?
This answer is for RabbitMQ < 3.5.5; for newer versions, see the answer from #bmaupin.
It depends on your application's needs. Out of the box it is 10 minutes for RabbitMQ. If you fail to ack the heartbeat twice (20 minutes of inactivity), the connection will be closed immediately, without the broker sending any connection.close method or any error.
The case for using heartbeats is firewalls that close connections that have been inactive for a long time, or other network settings that don't let you keep idle connections open.
In fact, heartbeat is not a must; from the RabbitMQ config doc:
heartbeat
Value representing the heartbeat delay, in seconds, that the server sends in the connection.tune frame. If set to 0, heartbeats are disabled. Clients might not follow the server suggestion, see the AMQP reference for more details. Disabling heartbeats might improve performance in situations with a great number of connections, but might lead to connections dropping in the presence of network devices that close inactive connections.
Default: 580
Note that having the heartbeat interval too short may result in significant network overhead. Keep in mind that heartbeat frames are only sent when there has been no other activity on the connection for a heartbeat interval.
The RabbitMQ documentation now provides a recommended heartbeat timeout value between 5 and 20 seconds:
Setting heartbeat timeout value too low can lead to false positives (peer being considered unavailable while it really isn't the case) due to transient network congestion, short-lived server flow control, and so on. This should be taken into consideration when picking a timeout value.
Several years worth of feedback from the users and client library maintainers suggest that values lower than 5 seconds are fairly likely to cause false positives, and values of 1 second or lower are very likely to do so. Values within the 5 to 20 seconds range are optimal for most environments.
Source: https://www.rabbitmq.com/heartbeats.html#false-positives
In addition, as of RabbitMQ 3.5.5 the default heartbeat timeout value is 60 seconds (https://www.rabbitmq.com/heartbeats.html#heartbeats-timeout)
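For reference, a minimal sketch of requesting a heartbeat in that range from the client side, assuming the official .NET RabbitMQ.Client library (recent versions take a TimeSpan for RequestedHeartbeat; older versions take a ushort number of seconds):
using System;
using RabbitMQ.Client;

var factory = new ConnectionFactory
{
    HostName = "localhost",
    // A value in the documented 5-20 second range; client and broker
    // negotiate, and when both values are non-zero the lower one wins.
    RequestedHeartbeat = TimeSpan.FromSeconds(15)
};
using var connection = factory.CreateConnection();
// Heartbeat frames are only sent when the connection is otherwise idle,
// so this adds essentially no traffic on a busy connection.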

How to prevent an I/O Completion Port from blocking when completion packets are available?

I have a server application that uses Microsoft's I/O Completion Port (IOCP) mechanism to manage asynchronous network socket communication. In general, this IOCP approach has performed very well in my environment. However, I have encountered an edge case scenario for which I am seeking guidance:
For the purposes of testing, my server application is streaming data (let's say ~400 KB/sec) over a gigabit LAN to a single client. All is well...until I disconnect the client's Ethernet cable from the LAN. Disconnecting the cable in this manner prevents the server from immediately detecting that the client has disappeared (i.e. the client's TCP network stack does not send notification of the connection's termination to the server).
Meanwhile, the server continues to make WSASend calls to the client...and because these calls are asynchronous, they appear to "succeed" (i.e. the data is buffered by the OS in the outbound queue for the socket).
While this is all happening, I have 16 threads blocked on GetQueuedCompletionStatus, waiting to retrieve completion packets from the port as they become available. Prior to disconnecting the client's cable, there was a constant stream of completion packets. Now, everything (as expected) seems to have come to a halt...for about 32 seconds. After 32 seconds, IOCP springs back into action returning FALSE with a non-null lpOverlapped value. GetLastError returns 121 (The semaphore timeout period has expired.) I can only assume that error 121 is an artifact of WSASend finally timing out after the TCP stack determined the client was gone?
I'm fine with the network stack taking 32 seconds to figure out my client is gone. The problem is that while the system is making this determination, my IOCP is paralyzed. For example, WSAAccept events that post to the same IOCP are not handled by any of the 16 threads blocked on GetQueuedCompletionStatus until the failed completion packet (indicating error 121) is received.
My initial plan to work around this involved using WSAWaitForMultipleEvents immediately after calling WSASend. If the socket event wasn't signaled within, say, 3 seconds, I would terminate the socket connection and move on (in hopes of preventing the extensive blocking effect on my IOCP). Unfortunately, WSAWaitForMultipleEvents never seems to hit that timeout (so maybe asynchronous sockets are signaled by virtue of being asynchronous? Or copying the data into the TCP queue qualifies as a signal?).
I'm still trying to sort this all out, but was hoping someone had some insights as to how to prevent the IOCP hang.
Other details: My server application is running on Win7 with 8 cores; IOCP is configured to use at most 8 concurrent threads; my thread pool has 16 threads. Plenty of RAM, processor and bandwidth.
Thanks in advance for your suggestions and advice.
It's usual for the WSASend() completions to stall in this situation. You won't get them until the TCP stack times out its resend attempts and completes all of the outstanding sends in error. This doesn't block any other operations. I expect you are either testing incorrectly or have a bug in your code.
Note that your 'fix' is flawed. You could see this 'delayed send completion' situation at any point during a normal connection if the sender is sending faster than the consumer can consume. See this article on TCP flow control and async writes. A better plan is to keep a per-connection counter of the number of outstanding writes that you want to allow, stop sending once that limit is reached, and resume when it drops below a 'low water mark' threshold value.
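To make the counter idea concrete, a rough sketch in C# (the question is Win32 IOCP, but the pattern is the same in any language); the class name, the water marks, and the blocking Wait are illustrative assumptions, and a real IOCP server would more likely queue the data than block a thread:
using System.Threading;

class SendThrottle
{
    private readonly int _highWater;
    private readonly int _lowWater;
    private int _outstanding;
    private readonly ManualResetEventSlim _canSend = new ManualResetEventSlim(true);

    public SendThrottle(int highWater = 64, int lowWater = 16)
    {
        _highWater = highWater;
        _lowWater = lowWater;
    }

    // Call before issuing an async send: closes the gate once too many
    // sends are in flight for this connection.
    public void BeforeSend()
    {
        _canSend.Wait();
        if (Interlocked.Increment(ref _outstanding) >= _highWater)
            _canSend.Reset();
    }

    // Call from the send-completion handler: reopens the gate once the
    // backlog has drained below the low-water mark.
    public void OnSendCompleted()
    {
        if (Interlocked.Decrement(ref _outstanding) <= _lowWater)
            _canSend.Set();
    }
}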
Note that if you've pulled out the network cable to the machine, how do you expect any other operations to complete? Reads will just sit there and only fail once a write has failed, and AcceptEx will simply sit there and wait for the condition to rectify itself.

Sockets and Timeout Errors

I'm building a program that has a very basic premise.
For X amount of Objects
Open Connection
Perform Actions
Close Connection
Open Next
Each of these connections is made through a SOCKS5 proxy, and after about the 200th connection I get "The operation has timeout" errors. I have tested all the proxies and they work just fine, and the really weird thing is that if I shut down the program and restart it, the problems go away. So I'm led to believe that when I'm closing my connection it isn't really being closed, and the computer is getting overloaded. How can I force all SOCKS connections associated with a class to close?
socket.Shutdown(SocketShutdown.Both);
//socket.Close();
socket.Disconnect(true);
socket = null;
In response to a tip to use netstat, I checked it out. I noticed connections were lingering but would eventually go away. However, the problem still remains: after about the 100th connection, with a 5-second pause between connections, I get timeout errors. If I close the program and restart it, they go away. So for some reason I feel that the connections are leaving something behind, but netstat doesn't show it. I've even tried adding the client instances to a list and, each time one is finished, removing it from the list and setting it to null. Is there a way to kill a port? Maybe that would work, if I killed the port the connection was being made on? Is it possible this is a Windows OS issue, something that's used to prevent viruses? I'm making roughly a connection a minute and maintaining each connection for about a minute before moving on to the next, with at least 20 concurrent connections, if not more, at the same time. What doesn't make sense to me is that shutting down the program seems to clean up whatever resources I'm not cleaning up in my code. I'm using a class I found on the internet that allows SOCKS5 proxies to be used with the Socket class. So I'm really at a loss; any advice or direction would be great, and it doesn't have to be pretty. At this point I'm half tempted to write out to a text file where I was in my connection list, shut the program down, and have another program restart it to pick up where it left off.
Sounds like your connections aren't really closed. Without seeing the code, it's hard to troubleshoot this; can you boil it down to a program that loops through an open-close sequence?
If the connection doesn't close as expected, you can probably see what state it is in with netstat. Do you have 200 established connections, or are they in some sort of closing state?
Sockets implement IDisposable. Only calling Dispose or Close will cause the socket to give up its unmanaged resources in a deterministic manner. This is causing you to run out of the resources that the socket uses (probably a network handle of some sort), even though you may not have any managed objects using them.
So you should probably just do
socket.Shutdown(SocketShutdown.Both);
socket.Close();
To be clear, setting the socket to null does not do this: it only makes the socket eligible for collection, so it ends up on the freachable queue and has its finalizer run whenever the GC gets around to processing that queue.
You may want to review this article, which gives a good model of how unmanaged resources are dealt with in .NET.
Update
I checked, and Sockets do indeed contain a handle to a WSASocket. So unless you call Close or Dispose, you'll have to wait until the finalizers run (or the application exits) to get those handles back.
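To illustrate the deterministic cleanup, a minimal sketch using a plain System.Net.Sockets.Socket (the SOCKS5 wrapper class from the question isn't shown, so treat this as the general pattern rather than a drop-in fix):
using System.Net;
using System.Net.Sockets;

static void RunOneConnection(EndPoint endpoint)
{
    using (var socket = new Socket(AddressFamily.InterNetwork,
                                   SocketType.Stream, ProtocolType.Tcp))
    {
        socket.Connect(endpoint);
        // ... perform actions ...
        socket.Shutdown(SocketShutdown.Both); // polite TCP shutdown first
    } // Dispose (same as Close) releases the unmanaged handle right here,
      // instead of waiting for the finalizer thread.
}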

WCF: What happens if a channel is established but no method is called?

In my specific case: A WCF connection is established, but the only method with "IsInitiating=true" (the login method) is never called. What happens?
In case the connection is closed due to inactivity after some time: Which setting configures this timeout? Is there still a way for a client to keep the connection alive?
Reason for this question: I'm considering the above case as a possible security hole. Imagine many clients connecting to a server without logging in thus preventing other clients from connecting due to bandwidth problems or port shortage or lack of processing power or ...
Am I dreaming, or is this an actual issue?
The WCF client side proxy will close the connection (if open) when it goes out of scope, e.g. when the method it is being used in terminates.
If you're using sessions (but that only kicks in if you have actually established a session, i.e. after a method has been called), there's an inactivityTimeout setting for the sessions, both on the client and the server side - the smaller value "wins", so to speak.
If your "concurrentSessions" setting is quite low on your server, this might be an issue - but again, this only kicks in when there is an actual session in place, i.e. at least one method has been called - and in that case, the inactivity timeout on the session will clear out those unused sessions as needed.