Our application (which uses the existing Erlang OTP R15B01 modules) sends HTTPS requests to an external authentication server, gets replies, and seems to work fine under normal conditions. Under heavy load, however, some requests fail because the SSL handshake takes too long.
I have observed the following during the SSL handshake:
The client (our application) takes nearly 80 seconds to send its certificate after the server has finished its Server Hello and sent the server certificate.
Our server expects the request/response to complete within 30 seconds and drops the connection otherwise, which results in connection failures and severely hurts the application's performance.
Finally, I would like to know:
Is our application failing to load the client certificate quickly enough? That is, does the httpc module perform file I/O to load the certificates, which could slow responses under heavy load?
Does Erlang have any limitations in its SSL handshake procedure?
Related
I created and manage a SOAP API built in ASP.NET ASMX. The API processes about 10,000 requests per day. Most days, about 3 requests sent by the client (we only have 1 client) do not reach the web server (IIS). There is no discernible pattern.
We are actually using 2 web servers that sit behind a load balancer. From the IIS logs, I am 100% confident that the requests are not reaching either web server.
The team that manages the network and the load balancer has not been able to 'confirm or deny' whether the problem is occurring at the load balancer. They suggested it's normal for requests to sometimes "get lost in the internet", and said that we should add retry logic to the API.
The requests are using TCP (and TLS). The client has confirmed that there is no problem occurring on their end.
My question is: is it normal for TCP requests to "get lost in the internet" at the frequency we are seeing (about 3 out of 10,000 per day)?
BTW, both the web server and the client are located in the same country. For what it's worth, the country in question is an anglosphere country, so it's not the case that our internet infrastructure is shoddy.
There is no such thing as a TCP request getting lost, because there is no such thing as a TCP request in the first place. There is a TCP connection, within it a TLS tunnel, and within that the HTTP protocol is spoken; only at this HTTP level is there a concept of request and response, and only that level is visible in the server logs.
Problems can occur in many places, like failing to establish the TCP connection in the first place due to no route (i.e. no internet) or too much packet loss. There can be random problems at the TLS level caused by bit flips which cause integrity errors and thus connection close. There can be problems at the HTTP level, for example when using HTTP keep-alive and the server closing an idle connection while at the same time the client is trying to send another request. And probably more places.
The client has confirmed that there is no problem occurring on their end.
I have no idea what exactly this means. "No problem" would mean the client sends the request and gets a response. That is obviously not the case here, so the client is either failing to establish the TCP connection, failing at the TLS level, failing while sending the request, failing while reading the response, getting timeouts ... Or maybe the client is simply ignoring some errors, so no problem is visible at the client's end.
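One practical way to make these failures visible is to have the client log which layer actually failed before it retries (the retry logic the network team suggested). The question doesn't say what the client is written in, so the following Python sketch using the requests library is only an illustration of the idea, not the actual client code:

    import logging
    import time

    import requests  # assumed HTTP library; the real client stack isn't named in the question

    log = logging.getLogger("api-client")

    def call_api_with_retry(url, payload, attempts=3, timeout=10):
        """Send one request, log which layer failed, and retry with backoff."""
        for attempt in range(1, attempts + 1):
            try:
                resp = requests.post(url, data=payload, timeout=timeout)
            except requests.exceptions.SSLError as exc:
                log.warning("attempt %d: TLS-level failure: %s", attempt, exc)
            except requests.exceptions.ConnectTimeout as exc:
                log.warning("attempt %d: TCP connect timed out: %s", attempt, exc)
            except requests.exceptions.ConnectionError as exc:
                log.warning("attempt %d: connection failed or was dropped: %s", attempt, exc)
            except requests.exceptions.ReadTimeout as exc:
                log.warning("attempt %d: request sent, no reply in time: %s", attempt, exc)
            else:
                # Only in this branch did an HTTP request/response actually happen,
                # i.e. the only level at which the server's logs can show anything.
                return resp
            time.sleep(2 ** attempt)  # simple exponential backoff between attempts
        raise RuntimeError("request failed after %d attempts" % attempts)

Even if the retries end up masking the problem, the per-layer log lines tell you whether the few daily failures die before the TCP connection is established, during TLS, or after the request was sent, which is exactly the distinction the server-side logs cannot make.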
My IIS 7.5 web server farm (2x Windows 2008 R2 physical servers using Network Load Balancing) is experiencing heavy use, and SSL/TLS requests to port 443 are timing out on what appears to be the TLS negotiation (500+ GET requests/sec with over 20K current connections).
Despite the heavy load, the performance of the server hardware is fine: less than 20% processor utilization, 75% of memory still available, and virtually no processor queuing. Bandwidth utilization is fine as well. However, during this heavy usage event, my websites stopped responding to SSL-based (https) requests and clients were unable to negotiate a TLS connection. During this same time, requests using http to the same websites were working fine and the websites were very responsive (I disabled the IIS rewrite rule from http to https). The problem may have gone away after I uninstalled my CA-issued certificate, reinstalled the same one, and then restarted all web services; however, I can't say for sure that this corrected it, because I also stopped forcing the use of SSL.
In troubleshooting, the only thing I see is that my Windows event logs are filled with Event ID: 36887, which seems to be related to SSL but the meaning of the error is vague to me. This is the description of the error message:
"This error message indicates the computer received an SSL fatal alert message from the server ( It is not a bug in the Schannel or the application that uses Schannel). Sometimes is caused by the installation of third party web browser (other than Internet Explorer)."
There are hundreds of entries per minute corresponding to the time of the performance issues. After this occurred, I was told to enable the CAPI2 log but since the issue is not occurring now, I only see informational messages in this log.
What would cause TLS to be unable to negotiate a connection under heavy load in my network-load-balanced web farm, and how can I prevent this from occurring again?
I'm testing SSL/TLS stream proxying within NGINX that will connect to a web server using GnuTLS as the underlying TLS API. Using the command-line test tool in GnuTLS (gnutls-serv), the entire process works, but I can't understand the logic:
The NGINX client (proxying HTTP requests from an actual client to the GnuTLS server) seems to want to handshake the connection multiple times. In fact, in most tests it handshakes 3 times without error before the server responds with a test web page. Using Wireshark, or just debug messages, it looks like the client-side socket (from the perspective of the GnuTLS server) is being closed and reopened on different ports. Finally, on the successful connection, GnuTLS uses a resumed session, which I imagine comes from one of the previously mentioned successful handshakes.
I am failing to find any documentation about this sort of behaviour, and am wondering if this is just an 'NGINX thing.'
Though the handshake eventually works with the test programs, it seems kind of wasteful (to have multiple expensive handshakes) and implementing handshake logic in a non-test environment will be tricky without actually understanding what the client is trying to do.
I don't think there are any timeouts or problems on the transport; the test environment is a few different VMs on the same subnet, connected through a single switch.
NGINX version is the latest mainline: 1.11.7. I was originally using 1.10.something, and the behaviour was similar though there were more transport errors. Those errors seemed to get cleaned up nicely with upgrading.
Any info or experience from other people is greatly appreciated!
Use either RSA key exchange between NGINX and the backend server, or use an SSLKEYLOGFILE LD_PRELOAD shim for NGINX, so that Wireshark has the data it needs to decrypt the traffic.
While a single incoming connection should generate just one outgoing connection, there may be some optimisations in NGINX to fetch common files (favicon.ico, robots.txt).
I have a client application that connects to a remote server via HTTPS for commercial purposes. The connection uses old, blocking I/O. It normally runs smoothly.
Recently I cloned the client, thereby creating a new client instance running on the same box and using the same client certificate. I'm now noticing many connection timeouts from the server. I wonder whether the cloning may somehow be the cause of the timeouts and whether there is an SSL issue here.
Both instances receive the following security-related system properties:
javax.net.ssl.trustStore=cacerts
javax.net.ssl.keyStore=1234567890123
javax.net.ssl.keyStorePassword=wordpass
Unfortunately, support from the server side is quite limited. I hope someone on this forum may come up with an idea.
I am writing a client for an HTTP API which is not yet publicly available.
Based on the specs I was given, I have mocked up a server that simulates the API, to test how my client reacts.
This server is a very simple Rack application, which currently runs on WEBRick.
The API client interacts with this fake API and performs correctly in the different test cases.
Hopefully, I will just have to change the hostname in the config file when the API goes live.
However, I know for a fact that the API will be put under heavy load when it goes live. My client will thus most likely have to face:
HTTP timeouts
Jitter
Dropped TCP connections
503 Responses
...
I know that my client performs well in an ideal scenario, but how can I randomly (or not randomly) introduce these behaviors in my test cases, to verify that the client handles these errors correctly?
Is there some kind of reverse proxy that can be configured to simulate these errors when serving data from a stable server on a stable network (in my case: a local server on localhost)?
You can try Net Limiter (http://www.netlimiter.com/) to shape bandwidth.
On the other hand, to make more accurate simulations you need to control both the server and the client side.
For instance, to simulate a timeout condition your mock server can receive the request from the HTTP API client and then simply stop responding, triggering a timeout on the client side.
Another benefit of your own mock/test server is that you can emulate dropped TCP connections (just close a newly accepted client connection), 50x responses, invalid responses, protocol-breaking responses, and much more.
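To make that concrete: the mock in the question is a Rack application, so a real implementation would live in Rack middleware; the sketch below is only a Python stand-in (with a made-up address, probabilities, and handler) showing the behaviours just described: hang to force a timeout, drop the connection, or return a 503.

    import random
    import socket
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    class FlakyHandler(BaseHTTPRequestHandler):
        """Mock endpoint that randomly misbehaves to exercise the client's error handling."""

        def do_GET(self):
            roll = random.random()
            if roll < 0.2:
                # Hang without answering so the client hits its HTTP timeout.
                time.sleep(60)
                self.close_connection = True
                return
            if roll < 0.4:
                # Drop the TCP connection without sending any response.
                self.connection.shutdown(socket.SHUT_RDWR)
                self.close_connection = True
                return
            if roll < 0.6:
                # Pretend the backend is overloaded.
                self.send_error(503, "Service Unavailable")
                return
            # Otherwise behave like the normal fake API.
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # 127.0.0.1:8080 is an arbitrary address for local testing.
        ThreadingHTTPServer(("127.0.0.1", 8080), FlakyHandler).serve_forever()

Pointing the client's config at this address instead of the real host exercises the timeout, reconnect, and 5xx code paths without touching the network; adding a short random sleep before the normal response is enough to simulate jitter as well.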