Error occurred during the pre-login handshake, due to AntiVirus? - sql

I need your expertise on one of my issues. I often get an intermittent connectivity error between our Power BI on-premises gateway and SQL Server.
Error from the gateway log:
Error: A connection was successfully established with the server, but
then an error occurred during the pre-login handshake. (provider: SSL
Provider, error: 0 - The wait operation timed out.)
The hard part is that it is very difficult to reproduce ☹️ Whenever I test the connectivity from the gateway to SQL Server it succeeds, but in some very rare cases it fails.
Steps we took to find the root cause:
• Checked that only TLS 1.2 is enabled on both the gateway server and the SQL Server; other versions of TLS are disabled.
• Created a .udl file and tested the connectivity, but got this error:
[DBNETLIB][ConnectionOpen (SECCreateCredentials()).]SSL Security error.
Finally, we contacted our internal support team, who told us to run a network trace, so we did.
After a long wait, we were lucky enough to capture the error in the network trace.
The support team said:
We see that the client (gateway server) sends the Client Hello for the TLS handshake after 14 seconds. This delay causes the connection to fail, because the connection needs to be established within 15 seconds.
We see the same pattern, where the client causes the delay, in multiple instances of the communication.
Such a delay is generally caused by antivirus software.
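The support team's reasoning can be checked against any capture by measuring the gap between the end of the TCP handshake and the Client Hello. The following is a minimal sketch, assuming the two timestamps (in seconds) are copied from the trace by hand. The function name, the `risk_ratio` threshold and the sample values are hypothetical; the 15-second limit is the figure quoted by the support team.

```python
from dataclasses import dataclass

# Connection time limit quoted by the support team; adjust if yours differs.
CONNECT_TIMEOUT_S = 15.0


@dataclass
class HandshakeTiming:
    delay_s: float      # gap between TCP established and Client Hello
    remaining_s: float  # time left for the rest of the handshake
    at_risk: bool       # True when the delay uses most of the time budget


def analyze_client_hello_delay(tcp_established_s, client_hello_s,
                               timeout_s=CONNECT_TIMEOUT_S, risk_ratio=0.8):
    """Compare the client-side delay with the connection time budget."""
    delay = client_hello_s - tcp_established_s
    if delay < 0:
        raise ValueError("Client Hello cannot precede the TCP handshake")
    remaining = max(timeout_s - delay, 0.0)
    # risk_ratio is an arbitrary illustrative threshold, not a SQL Server setting.
    return HandshakeTiming(delay, remaining, delay >= risk_ratio * timeout_s)


# Hypothetical timestamps resembling the capture described in the question.
timing = analyze_client_hello_delay(tcp_established_s=0.0, client_hello_s=14.0)
print(timing)
```

Running this over several captured connections would show whether the long client-side gap appears only on the failing attempts or on successful ones as well.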
My question:
Is this really an antivirus issue? If so, why doesn't it happen every time?
P.S. I know this question has already been asked on SO and may be a duplicate, but my real question is whether the antivirus could be a possible cause of this.

The issue was finally resolved after many attempts. Below is the solution that worked for us:
• Azure AD join: connections head to “login.microsoft.com” and delay the handshake. A few registry and GPO settings need to be applied to disable the automatic Azure Workplace Join.
https://learn.microsoft.com/en-us/azure/active-directory/device-management-troubleshoot-hybrid-join-windows-current
It describes preventing the server from joining Azure AD through a GPO, which comes down to this value under the key HKLM\SOFTWARE\Policies\Microsoft\Windows\WorkplaceJoin\:
autoWorkplaceJoin = 0
• Connections headed to http://ctldl.windowsupdate.com. See the article below, which discusses this issue.
https://blogs.technet.microsoft.com/askds/2018/04/10/tls-handshake-errors-and-connection-timeouts-maybe-its-the-ctl-engine/
To disable it:
• Create a backup of this registry key (export and save a copy):
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot
• Then create the following DWORD registry values under the key:
“EnableDisallowedCertAutoUpdate”=dword:00000000
“DisableRootAutoUpdate”=dword:00000001
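The registry values from both bullets can be collected into one .reg file so they are easy to review before being applied. This is a minimal sketch that only builds the file text. The helper name `render_reg_file` is hypothetical, the keys and values are the ones given in this answer, and it assumes the WorkplaceJoin key lives under HKEY_LOCAL_MACHINE as the HKLM abbreviation indicates. Export a backup of the keys first, as advised above.

```python
# Keys and DWORD values taken from the answer above.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WorkplaceJoin": {
        "autoWorkplaceJoin": 0,
    },
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot": {
        "EnableDisallowedCertAutoUpdate": 0,
        "DisableRootAutoUpdate": 1,
    },
}


def render_reg_file(settings):
    """Render {key_path: {value_name: int}} as .reg text with DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, values in settings.items():
        lines.append(f"[{key_path}]")
        for name, value in values.items():
            if not 0 <= value <= 0xFFFFFFFF:
                raise ValueError(f"{name} does not fit in a DWORD: {value}")
            # DWORD values are written as eight hexadecimal digits.
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")  # blank line between keys
    return "\n".join(lines)


print(render_reg_file(SETTINGS))
```

The printed text can be saved with a .reg extension and inspected before anyone imports it on the gateway server.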
I hope this helps someone in the future !

Related

RavenDB 5.3.102 Issue Errors with installation and Lets Encrypt bug

I have been trying, unsuccessfully, to install RavenDB 5 on a Windows Server 2012R2 machine in secure mode, but I get error messages that are not covered in Raven's troubleshooting documentation. I have seen similar errors and applied all the suggested fixes: they blame port numbers being blocked by firewalls (I disabled the firewall completely, so no luck there) or IP address binding (when I set it up in unsecure mode it worked fine, so no luck there).
I am using the downloaded free Community version. I suspect that Windows Server 2012R2 does not support TLS 1.2, or that there is a configuration issue.
Here's the message:
Setting up RavenDB in Let's Encrypt security mode failed.
System.InvalidOperationException: Setting up RavenDB in Let's Encrypt security mode failed.
---> System.InvalidOperationException: Validation failed.
---> System.InvalidOperationException: Failed to simulate running the server with the supplied settings using: https://a.******.ravendb.community:60443
---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
---> System.Security.Authentication.AuthenticationException: Authentication failed because the remote party sent a TLS alert: 'HandshakeFailure'.
---> System.ComponentModel.Win32Exception (0x80090326): The message received was unexpected or badly formatted.
--- End of inner exception stack trace ---
Before disabling the firewall in Windows Server completely, I tried opening every port that might be a problem, including 60443, 38888, 8080 (I know, it's not there, but), 80 and 443. That did nothing, so I disabled the firewall (bad form, yes, I know).
Since I use Let's Encrypt on IIS for a couple of other sites with other port #s and have not had any problem with auto-renewals, I figure that maybe it's a Kestrel configuration issue (so looking into this). The ambiguity of this error: HandshakeFailure could mean hundreds of different things, so it's hard to determine what.
It looks like RavenDB is running a simulation "Failed to simulate running the server" and, perhaps it's a bug on their end (perhaps LetsEncrypt does not recognize the simulation?).
Anywho, before saying to heck with this DB (which has raving reviews) and moving on to another NoSQL database like FoundationDB or CouchDB, I'd love to figure out how to secure it. It DOES WORK in nonsecure mode fine!!
Any ideas?
The issue is that Windows Server 2012R2 lacks the ciphers that are required by RavenDB.
To fix that, make sure TLS 1.2 is enabled and add the missing cipher suites.
You can use IIS Crypto to add them; see:
https://stackoverflow.com/a/63274439/11341261
Turns out, Windows Server 2012R2 does not come with the following cipher suites:
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
Because of this, it's not possible to use Let's Encrypt (as configured for RavenDB) on a Windows 2012R2 Server.
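To see which of the three suites listed above a given server lacks, its enabled suite names can be compared against that list. This is a minimal sketch, assuming the enabled names have already been collected by whatever means is available (for example, copied from IIS Crypto). The helper name and the sample list are hypothetical, and the curve suffix handling assumes Windows-style names such as `..._P256`.

```python
import re

# The three suites named in the answer above.
REQUIRED_SUITES = [
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
]


def missing_suites(enabled, required=REQUIRED_SUITES):
    """Return the required suites that do not appear in the enabled list."""
    # Older Windows versions append the curve to the name (e.g. _P256);
    # strip it so both naming styles compare equal.
    normalized = {
        re.sub(r"_P(256|384|521)$", "", name.strip().upper())
        for name in enabled
    }
    return [suite for suite in required if suite not in normalized]


# Hypothetical sample of enabled suites, for illustration only.
sample_enabled = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256",
]
print(missing_suites(sample_enabled))
```

An empty result would mean all three suites are present, in which case the handshake failure has some other cause.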

Azure SQL Server: Error occurred during the pre-login handshake

Occasionally, when I go to connect to my Azure SQL Server through either Power BI or SSMS, I get the following error:
A connection was successfully established with the server, but then an
error occurred during the pre-login handshake. (provider: TCP
Provider, error: 0 - The semaphore timeout period has expired.)
(Microsoft SQL Server, Error: 121)
There are numerous questions about this topic, but very few address this for Azure SQL Server. My IP address is added to the firewall (and works sometimes!) I have tried increasing the connection time out. I have tried to look at the "netsh WinSock Show Catalog" information as directed by the answer to this question, but nothing looked incorrectly formatted.
Has anyone seen any other reason for this error? It popped up a few days ago, went away for a few days, and is back now, but I haven't changed anything in the meantime.
The problem ended up being that my computer had disconnected from my company's VPN.
Error 121 has always been considered a network-related error, as you can read in this Microsoft Support article. Your Internet service and your network adapters are things you should consider examining.
Network connectivity problems have various causes, but they typically
occur because of incorrect network adapters, incorrect switch
settings, faulty hardware, or driver issues. Some connectivity
symptoms are intermittent and do not clearly point to any one of these
causes.
Typical error messages include the following:
Error 121: "The semaphore timeout period has expired"
(ERROR_SEM_TIMEOUT).
In SQL Server Management Studio, click the "Options" button and, on the "Connection Properties" tab, try setting a greater value for the "Connection time-out" setting.
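Because the error is intermittent, it can help to log plain TCP connect attempts over time and see whether the failures line up with network events such as a dropped VPN. This is a minimal sketch with a hypothetical helper name. It only exercises the TCP layer, not the SQL Server pre-login exchange, so a successful result does not rule out a later failure.

```python
import socket
import time


def check_tcp(host, port, timeout_s=5.0):
    """Return (ok, elapsed_seconds, detail) for one TCP connect attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # Covers timeouts, refused connections and name resolution failures.
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"


# Example call (the host name is a placeholder, not a real server):
# ok, elapsed, detail = check_tcp("your-server.database.windows.net", 1433)
# print(ok, round(elapsed, 3), detail)
```

Calling this on a schedule and recording the results gives a timeline to compare against the moments when Power BI or SSMS reports error 121.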

Is a TLS negotiation failure error proof that connectivity exists?

We are attempting to allow a client to access one of our QA environments. They are seeing the following error in IE:
This page can't be displayed
Turn on TLS 1.0, TLS 1.1, and TLS 1.2 in Advanced settings and try connecting to https://oursite.com again. If this error persists, it is possible that this site uses an unsupported protocol or cipher suite such as RC4 (link for the details), which is not considered secure. Please contact your site administrator.
I am not asking stackoverflow users to solve this problem.
I am asking the following very specific question:
Because we are seeing this error, does this prove that connectivity exists, i.e. that our firewall is letting them through? I am thinking that if they were blocked at the firewall they would simply get a timeout or perhaps a 403 or 500 error. Since they are getting far enough to see which TLS protocols are supported on the web server, I infer that they must be able to communicate with it on OSI levels 1-4. Am I correct? (I need to know whether to engage the networking team, which runs the firewalls, or the application support team, which sets up the TLS configuration.)
Note that SSL terminates on our IIS web server (we don't have SSL offloading).
Unfortunately we have port 80 blocked so we can only test on 443; otherwise I would suggest using http access to help isolate the problem.
... if they were blocked at the firewall they would simply get a timeout or perhaps a 403 or 500 error.
In order to send back a 403 or 500 error, the firewall must have successfully completed the SSL handshake with the client, because the HTTP response (which includes the status code, i.e. 403, 500...) is only sent inside the encrypted connection. There is no way to return a 403 or 500 during the SSL handshake itself.
Typical behavior with a firewall in between would be a timeout (the firewall drops packets) or, more likely, a connection reset or close (the firewall resets or closes the connection). A simple packet filter firewall will usually block the TCP connection itself, resulting in connection refused. But a firewall using DPI might actually let the TCP connection establish and only block once it gets actual data, based on the content of the payload (i.e. application detection).
The last case might result in the error you see. But exactly the same behavior can be seen if there is a problem on the server side where the server simply closes or resets the connection. Some TLS stacks show such behavior (instead of sending back a TLS alert) when they cannot find a shared protocol version or cipher. So from this error message you can conclude neither that the firewall is blocking the connection nor that the server is causing the error.
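The outcomes described above (timeout, refused, reset or close, TLS-level error) can be told apart from the client side by checking which stage fails and with which exception. This is a minimal sketch with hypothetical helper names and labels. As the answer notes, the result narrows down where the connection dies but does not by itself identify whether a firewall or the server is responsible.

```python
import socket
import ssl


def classify_failure(exc):
    """Map a connection exception to one of the outcomes discussed above."""
    if isinstance(exc, ssl.SSLEOFError):
        return "closed_during_handshake"  # peer closed without a TLS alert
    if isinstance(exc, ssl.SSLError):
        return "tls_error"                # e.g. an alert or protocol mismatch
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"                  # packets silently dropped
    if isinstance(exc, ConnectionRefusedError):
        return "refused"                  # TCP connection itself rejected
    if isinstance(exc, (ConnectionResetError, ConnectionAbortedError)):
        return "reset"                    # connection reset by peer or middlebox
    return "other"


def probe(host, port=443, timeout_s=5.0):
    """Return (stage, outcome): how far the connection got and what happened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError as exc:
        return "tcp", classify_failure(exc)
    context = ssl.create_default_context()
    # The probe only asks how far the handshake gets, so certificate
    # validation is switched off. Do not reuse this context for real traffic.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return "tls", f"ok ({tls.version()})"
    except OSError as exc:
        sock.close()
        return "tls", classify_failure(exc)
```

A result of `("tcp", "timeout")` or `("tcp", "refused")` points toward the network path, while a failure at the `"tls"` stage means the TCP connection was established and something closed or rejected it afterwards.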

Connection timeout over ssl

I have a client application that connects to a remote server via https for commercial purposes. This connection uses old IO (blocking connection). It normally runs smoothly.
Recently I cloned the client, creating a new client instance that runs on the same box and uses the same client certificate. I'm now seeing many connection timeouts from the server. I wonder whether the cloning may somehow have caused the timeouts, and whether there is an SSL issue here.
Both instances receive the following system parameters for security:
javax.net.ssl.trustStore=cacerts
javax.net.ssl.keyStore=1234567890123
javax.net.ssl.keyStorePassword=wordpass
Unfortunately the support from the server side is quite limited. I hope someone in this forum may come up with an idea.

Can't make an SSL Connection

I'm using a device that connects over GPRS to a PC running stunnel. With plain TCP/IP connections, the number of sessions is limitless. With SSL connections, however, it only gets as far as 1062 successful sessions. I've tried it about 3 times, but it makes no difference. I've checked the OpenSSL code and couldn't find any code block that limits SSL connections to 1062. From SSL's point of view, is there anything that limits the number of connections?
Yes, I'm using a postpaid phone SIM, but there isn't any problem with TCP/IP. It only happens with SSL connections. We've tried connecting to other PCs as well, using the same OpenSSL stunnel, but it always stops at 1062 connections.
I guess I'm not the only one having this kind of problem. I found out that Sun Java System Directory Server had a limit of opened ssl connection which only reached 1020 (FD_SETSIZE=1024). It was hardcoded though so you could obviously see the cause of the problem. In my case however, I couldn't seem to find the culprit... :(
Are you connecting via a phone provider - could that be the issue?