TCP window full - intermittent issues in SSL handshake

I am seeing an intermittent SSL handshake error. Looking at the TCP packets, it seems that whenever the SSL handshake fails, TCP options are missing. I have attached the Wireshark screenshots for the success and failure scenarios. Notice the difference in the [SYN,ACK] packet sent by the server: in the success case it has a larger window size (4380) compared to 512 when it fails, and it also carries additional options like MSS and SACK_PERM.
Would anyone know why this would happen? It's the same server, but it's sending different capabilities in the two scenarios. Any help troubleshooting this issue would be appreciated. Thanks!
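One way to capture just the server's [SYN,ACK] packets on both the client and the server, to see whether something in between rewrites them, is a filter like this (interface name and port are placeholders for whatever applies here):
tcpdump -i eth0 -nn -v 'tcp[13] == 18 and port 443'
The flags byte value 18 (0x12) matches packets with only SYN and ACK set.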

Related

Syslog-ng to Syslog-ng over TLS - destination not writing to disk

Trying to configure a syslog-ng server to send all of the logs that it receives to another syslog-ng server over TLS. Both are running RHEL 7. Everything seems to be working from an encryption and cert perspective: I'm not seeing any error messages in the logs, an openssl s_client test connection works successfully, and I can see the packets coming in over the port that I'm using for TLS, but nothing is being written to disk on the second syslog-ng server. Here's the summary of the config on the syslog server that I'm trying to send the logs to:
source:
source s_encrypted_syslog {
    syslog(ip(0.0.0.0) port(1470) transport("tls")
        tls(key-file("/etc/syslog-ng/key.d/privkey.pem")
            cert-file("/etc/syslog-ng/cert.d/servercert.pem")
            peer-verify(optional-untrusted)  # changing to trusted once issue is fixed
        )
    );
};
destination:
destination d_syslog_facility_f {
    file("/mnt/syslog/$LOGHOST/log/$R_YEAR-$R_MONTH-$R_DAY/$HOST_FROM/$HOST/$FACILITY.log"
        dir-owner("syslogng") dir-group("syslogng") owner("syslogng") group("syslogng"));
};
log setting:
log { source(s_encrypted_syslog); destination(d_syslog_facility_f); };
syslog-ng is currently running as root to rule out permission issues. SELinux is currently set to permissive. I tried increasing the verbosity of the syslog-ng logs and turned on debugging, but nothing jumps out at me as far as errors or issues go. Also, the odd thing is that I have a very similar config on the first syslog-ng server and it's receiving and storing logs just fine.
Also, I should note that there could be some small typos in the config above, as I'm not able to copy and paste it. syslog-ng lets me start the service with no errors with the config that I have loaded currently. It's simply not writing the data that it's receiving to the destination that I have specified.
It happens quite often that the packet filter prevents a connection to the syslog port, or in your case port 1470. In that case the server starts up successfully, and you might even be able to connect using openssl s_client on the same host, but the client will not be able to establish a connection to the server.
Please check that you can actually connect to the server from the client computer (e.g. via openssl s_client, or at least with something like netcat or telnet).
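For example (the hostname is a placeholder for the receiving server):
openssl s_client -connect syslog-server.example.com:1470
telnet syslog-server.example.com 1470
If openssl s_client completes the handshake from the client host, basic connectivity and the TLS setup are fine.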
If the connection works, another issue might be that the client is not routing messages to this encrypted destination. syslog-ng only performs the SSL handshake as messages are being sent. With no messages, the connection would stay open without really exchanging packets at the TCP level.
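A simple way to force a message through, assuming the client's syslog-ng picks up local syslog messages, is:
logger "syslog-ng TLS forwarding test"
and then check whether that line shows up on the receiving server.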
Couple of troubleshooting tips:
You can check if there is a connection between the client and the server with "netstat -antp | grep syslog-ng" on the server or the client. You should see connections in the ESTABLISHED state on both sides of the connection (with local/remote addresses switched of course).
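On the server that would look roughly like this (addresses and PID are made up for illustration):
netstat -antp | grep syslog-ng
tcp   0   0 192.0.2.10:1470   192.0.2.20:52044   ESTABLISHED   1234/syslog-ng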
Check that your packet filter lets port 1470 connections through. You are most likely using iptables, try reviewing your ruleset and see if port 1470 on TCP is allowed to pass in the INPUT chain. You could try adding a "LOG" rule right before the default rule to see if the packets are dropped at that level. If you already have LOG rules, you might check the kernel logs of the server to see if that LOG rule produced any messages.
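A sketch of that check (the rule position 4 is a placeholder for wherever your default rule sits):
iptables -L INPUT -n -v --line-numbers
iptables -I INPUT 4 -p tcp --dport 1470 -j LOG --log-prefix "syslog-tls: "
Dropped packets would then show up in the kernel log (dmesg or /var/log/messages) with that prefix.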
You can also confirm if there's traffic with tcpdump on the server (e.g. tcpdump -pen port 1470). If you write the traffic dump to a file (e.g. the -w argument to tcpdump, along with -s 0 to avoid truncation), then this dump file can be analyzed with wireshark to see if the negotiation takes place. You should at the very least see a "Client Hello" and a "Server Hello" packet which are not encrypted at the beginning of the handshake.
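For example (interface name and output path are placeholders):
tcpdump -pn -i eth0 -s 0 -w /tmp/syslog-tls.pcap port 1470
The resulting file can then be opened in Wireshark to check for the Client Hello / Server Hello.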

TLS handshake fails, but communication is not closed

I have a TLS program and I did some experiments on it.
I start a confidential TLS server session and try to connect to it with a plain Telnet client.
As expected, the handshake failed and the server is available to the next client, but on the Telnet client side I didn't receive any indication that the handshake had failed or that the server was accepting other clients.
I can see in Wireshark that even after the handshake failed, the Telnet client can still send strings; I see [PSH, ACK] from the client answered by [ACK] from the server.
Adding a Wireshark snapshot: Telnet fails the handshake, Telnet keeps sending messages, followed by a successful TLS handshake and more Telnet messages:
Why is the server ACKing the Telnet client if the handshake failed and it is accepting other clients?
As expected, the handshake failed ...
I cannot see a failed TLS handshake in the packet capture and I'm not sure how you came to this conclusion.
All I can see is that the client on source port 60198 (presumably your telnet) is sending 3 bytes several times and the server just ACKs these without sending anything back and without closing the connection. Likely the server is still buffering data in the hope that at some point it adds up to a complete TLS record. Only then will it be processed by the TLS stack, which might then realize that something is wrong with the client.
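This is easy to reproduce with a test server, e.g. (certificate and key file names are placeholders):
openssl s_server -accept 4433 -cert server.pem -key server.key
telnet localhost 4433
Short inputs typed into the telnet session are just buffered and ACKed; once the server has enough bytes to try to parse them as a TLS record, it will typically error out and close the connection.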
... the server is available to the next client
It is pretty normal for a server to handle multiple clients in parallel. On the contrary, it would be unusual if the server could not do this.

nginx - log SSL handshake failures

I'm running an nginx server with SSL enabled.
My protocol / cipher settings are fairly secure, and I've checked them at ssllabs.com, but --
-- since this is a web service which is called by HTTP clients that I have no control over, I have concerns about compatibility.
To the point:
Is there a way to log SSL handshake failures as they happen (if they happen) in my nginx logs?
For example, I've got SSLv3 disabled, and if I try to "curl -3" (forcing SSLv3) to my server, then I get this:
NSS error -12286 (SSL_ERROR_NO_CYPHER_OVERLAP)
Cannot communicate securely with peer: no common encryption algorithm(s).
Closing connection 0 curl: (35) Cannot communicate securely with peer: no common encryption algorithm(s).
I would like to log this type of error in the server logs too; with the default nginx settings, there is nothing.
Enabling the "debug" log level for the error log does what I want and will log SSL handshake errors -- but unfortunately it also logs too much other stuff, bloating the log and drowning out other potentially useful info.
You can use the info log level.
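For example, in nginx.conf (the path is whatever your error log already points at):
error_log /var/log/nginx/error.log info;
At the info level nginx records SSL handshake failures without the full debug noise.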

How to debug and fix intermittent SSL 'connection reset by peer' error?

We are having an occasional (1 in 100) error appear on our client (CentOS) when connecting to a server (Windows/IIS) over HTTPS.
The error is: SSL: Connection reset by peer.
Running openssl s_client -connect example.com:443 -prexit works 99% of the time but sometimes returns write:errno=104 confirming the connection reset issue.
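One way to catch it in the act is to loop the probe (a bash sketch, using the same host as above):
for i in $(seq 1 200); do echo | openssl s_client -connect example.com:443 -prexit 2>&1 | grep -E 'errno=104|handshake has read'; done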
Interestingly, the handshake is a different (smaller) size when the connection is reset and fails, but I cannot see how to actually inspect the handshake.
A successful connection is: SSL handshake has read 5308 bytes and written 319 bytes
A failed connection is: SSL handshake has read 5249 bytes and written 198 bytes
The same protocol (TLS) and cipher is used at all times.
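To actually see the handshake on the wire, a capture on the client would work (interface name and file path are placeholders):
tcpdump -i eth0 -s 0 -w /tmp/handshake.pcap host example.com and port 443
The pcap can then be opened in Wireshark, which dissects the ClientHello, ServerHello and certificate messages even without the server's private key.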
Server side, the error in Windows Event log is: A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 20. The Windows SChannel error state is 960.
Fatal error code 20 is "Received a record with an incorrect MAC. This message is always fatal."
Can anyone help debug this further? As it's only an occasional issue I am struggling to think why it would happen. Thanks!
Not an application error, but most likely a low-level error in the infrastructure. Not specific to SSL but to connection-oriented sockets: packet TTL expiring, a network route changing, or many other causes. Well-written socket code will always retry a few times before failing. This is very hard to debug because it is often not repeatable over short time periods.
Many years ago this error was making me crazy. I did everything I could to track it down, even wrote a monitor to walk the network graph of the system to make sure each node of the graph was functional and responding properly. About a year later the problem disappeared when a switch on the subnet was replaced. The switch was close to the application, not to the nodes on the graph in the datacenter.

How to track down "Connection timeout during SSL handshake" and "Connection closed during SSL handshake" errors

I have recently switched over to HAProxy from AWS ELB. I am terminating SSL at the load balancer (HAProxy 1.5dev19).
Since switching, I keep getting some SSL connection errors in the HAProxy log (5-10% of the total number of requests). There are three types of errors repeating:
Connection closed during SSL handshake
Timeout during SSL handshake
SSL handshake failure (this one happens rarely)
I'm using a free StartSSL certificate, so my first thought was that some hosts are having trouble accepting this certificate, and I didn't see these errors in the past because ELB offers no logging. The only issue with that theory is that some hosts do have successful connections eventually.
I can connect to the servers without any errors, so I'm not sure how to replicate these errors on my end.
This sounds like clients who are going away mid-handshake (TCP RST or timeout). This would be normal at some rate, but 5-10% sounds too high. It's possible it's a certificate issue; I'm not certain exactly how that presents to HAProxy.
Things that occur to me:
If negotiation is very slow, you'll have more clients drop off.
You may have underlying TCP problems which you weren't aware of until your new SSL endpoint proxy started reporting them.
Do you see individual hosts that sometimes succeed and sometimes fail? If so, this is unlikely to be a certificate issue. I'm not sure how connections get torn down when a user rejects an untrusted certificate.
You can use Wireshark on the HAProxy machine to capture SSL handshakes and parse them (you won't need to decrypt the sessions for handshake analysis, although you could since you have the server private key).
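A minimal capture sketch on the HAProxy host (interface and file path are placeholders; recent Wireshark versions use "tls" rather than "ssl" in display filters):
tcpdump -i eth0 -s 0 -w /tmp/haproxy-tls.pcap port 443
Then open the file in Wireshark and filter on ssl.handshake (or tls.handshake) to line up incomplete handshakes with the log entries.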
I had this happen as well. First, SSL handshake failure appeared; then, after switching off option dontlognull, we also got Timeout during SSL handshake in the HAProxy logs.
At first, I made sure all the default timeouts were correct.
timeout connect 30s
timeout client 30s
timeout server 60s
Unfortunately, the issue was in the frontend section.
There was a line with timeout client 60, which HAProxy reads as 60 ms rather than 60 s (a timeout value without a unit defaults to milliseconds).
It seems certain clients were slow to connect and were getting kicked out during the SSL handshake. Check your frontend for client timeouts.
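Something like this in the frontend is what you want (the bind line and certificate path are just examples):
frontend https-in
    bind *:443 ssl crt /etc/haproxy/ssl/site.pem
    timeout client 30s
The key point is to always give the timeout a unit.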
How is your HAProxy SSL frontend configured?
For example, I use the following to mitigate BEAST attacks:
bind X.X.X.X:443 ssl crt /etc/haproxy/ssl/XXXX.pem no-sslv3 ciphers RC4-SHA:AES128-SHA:AES256-SHA
But some clients seem to generate the same "SSL handshake failure" errors. I think it's because the configuration is too restrictive.
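If that's the case, a less restrictive bind line (same idea, broader cipher list; the address and certificate path are kept as placeholders) might cut down on those failures while still disabling SSLv3:
bind X.X.X.X:443 ssl crt /etc/haproxy/ssl/XXXX.pem no-sslv3 ciphers HIGH:!aNULL:!MD5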