What's disconnecting my WebSocket connection? (Cloudflare, Apache's mod_proxy)

All my websocket connections arrive over the http(s) port and are proxied to the backend WS server:
[client]----[cloudflare]----[Apache 2.4 mod_proxy_wstunnel]----[websocket server]
Once a client connects to my WS server, if no data goes through the socket, the connection is always cut off after exactly 100 seconds.
In the dev environment, with the same client, also using mod_proxy_wstunnel, and the same WS server, this limitation does not occur.
If the WS server sends a ping every 60 seconds, the connection is not cut off.
I'd like to know whether anyone has seen documentation about Cloudflare disconnecting quiet WS connections, and whether mod_proxy as it is set up on the server could be the cause. I'm not sure how to get to the bottom of this.

A support tech confirmed to me via email that Cloudflare automatically disconnects websocket connections that remain dormant for 100 seconds.

Cloudflare has a 100-second idle timeout.
Only Enterprise customers of Cloudflare can change this setting.
So the solution is to send some kind of keep-alive to keep the connection open. Alternatively, a much better solution would be not to use Cloudflare at all. That gives you much more flexibility, and you won't have to depend on an external third party.

I faced the same issue. I changed the "proxied" mode to DNS only and bought my own certificate from GoDaddy. After that, I have not faced the issue again. A detailed discussion is here: https://techxperiment.blogspot.com/2020/06/aws-ec2-tomcat-jsr-356-secure.html

I highly suggest sending periodic pings over the connection to avoid the disconnect; ping/pong traffic keeps the WebSocket connection from looking idle.
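As a concrete starting point, here is a minimal keep-alive sketch using Python's websockets package (the URL and the 60-second interval are placeholders, not from the original posts): sending a protocol-level ping well inside the 100-second idle window keeps the proxy from seeing the connection as dormant.

# Minimal keep-alive sketch (assumes the `websockets` package; URL is a placeholder).
import asyncio
import websockets

async def main():
    async with websockets.connect("wss://example.com/ws") as ws:
        while True:
            await ws.ping()          # ping/pong frames count as traffic for the proxy
            await asyncio.sleep(60)  # well under Cloudflare's 100-second idle cutoff

asyncio.run(main())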

Related

Cloudflare Error 524: Connection timeout

What is the reason behind the Error 524 connection timeout, and how do I fix it, given that the website doesn't use much storage?
The origin server being used is located in India.
Website: https://motogenes.com/
I am trying to fix the error. I have tried pausing the running website and then enabling it again via Cloudflare.
Error 524 is usually related to a connection timeout issue between Cloudflare and the origin server. This means that Cloudflare was unable to establish a connection with the origin server in a timely manner.
It's hard to tell what is the issue without debugging the server, but I would start by checking the firewall and network settings.
In Cloudflare's documentation, they say that you can change the timeout value, but only on Enterprise accounts:
https://api.cloudflare.com/#zone-settings-get-proxy-read-timeout-setting
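As a rough sketch of what reading that setting through the endpoint linked above could look like (ZONE_ID and API_TOKEN are placeholders; actually changing the value is only accepted on Enterprise plans):

# Hedged sketch: query the zone's proxy_read_timeout setting (assumes the `requests` package).
import requests

ZONE_ID = "your-zone-id"      # placeholder
API_TOKEN = "your-api-token"  # placeholder

resp = requests.get(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/settings/proxy_read_timeout",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
print(resp.json())  # the "value" field is the current proxy read timeout in seconds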

Is there any way to increase the Cloudflare proxy request timeout limit (524)? [duplicate]

Is it possible to increase CloudFlare's time-out? If yes, how?
My code takes a while to execute and I wasn't planning on Ajaxifying it in the coming days.
No, CloudFlare only offers that kind of customisation on Enterprise plans.
CloudFlare will time out if it fails to establish an HTTP handshake after 15 seconds.
CloudFlare will also wait 100 seconds for an HTTP response from your server before you will see a 524 timeout error.
Other than this there can be timeouts on your origin web server.
It sounds like you need inter-process communication. HTTP should not be used as a mechanism for performing blocking tasks without sending responses; these kinds of activities should instead be abstracted away to a non-HTTP service on the server. By using RabbitMQ (or any other message queue) you can then pass messages from the HTTP element of your server over to the processing service on your web server.
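As an illustration of that pattern (not the poster's actual setup), a producer using the pika client might look roughly like this, assuming RabbitMQ is reachable on localhost and the queue name is made up:

# Sketch only: the HTTP handler just enqueues the job and returns; a separate worker consumes the queue.
import json
import pika  # assumes the `pika` package and a local RabbitMQ broker

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="long_jobs", durable=True)  # hypothetical queue name

def enqueue_job(params: dict) -> None:
    # Called from the web request handler; returns immediately instead of blocking for minutes.
    channel.basic_publish(
        exchange="",
        routing_key="long_jobs",
        body=json.dumps(params),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )

enqueue_job({"report_id": 42})  # a separate worker process picks this up and does the slow work
connection.close()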
I was in communication with Cloudflare about the same issue, and also with the technical support of RabbitMQ.
RabbitMQ suggested using Web Stomp which relies on Web Sockets. However Cloudflare suggested...
Websockets would create a persistent connection through Cloudflare and
there's no timeout as such, but the best way of resolving this would
be just to process the request in the background and respond asynchronously, serving a 'Loading...' page or similar, rather than having the user wait for 100 seconds. That would also give the user a better experience.
UPDATE:
For completeness, I will also record here that I asked Cloudflare about running the report via a subdomain and "grey-clouding" it, and they replied as follows:
I would suggest verifying why the reports take more than 100 seconds.
Disabling Cloudflare on the sub-domain allows attackers to learn your
origin IP, and attackers will then attack it directly, bypassing
Cloudflare.
FURTHER UPDATE
I finally solved this problem by running the report using a thread and using AJAX to "poll" whether the report had been created. See Bypassing CloudFlare's time-out of 100 seconds
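For readers who want to see the shape of that fix, here is an illustrative sketch (Flask and the endpoint names are my assumptions, not the code behind the linked question): start the report in a background thread, return immediately, and let the browser poll a lightweight status endpoint instead of holding one request open past Cloudflare's 100-second limit.

import threading
import uuid
from flask import Flask, jsonify  # assumes Flask

app = Flask(__name__)
jobs = {}  # job_id -> "running" | "done" (in-memory for the sketch only)

def build_report(job_id):
    # ... long-running report generation would go here ...
    jobs[job_id] = "done"

@app.route("/report/start", methods=["POST"])
def start_report():
    job_id = uuid.uuid4().hex
    jobs[job_id] = "running"
    threading.Thread(target=build_report, args=(job_id,), daemon=True).start()
    return jsonify({"job_id": job_id})  # responds well within 100 seconds

@app.route("/report/status/<job_id>")
def report_status(job_id):
    return jsonify({"status": jobs.get(job_id, "unknown")})  # the browser polls this via AJAX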
Cloudflare doesn't trigger 504 errors on timeout
504 is a timeout triggered by your server - nothing to do with Cloudflare.
524 is a timeout triggered by Cloudflare.
See: https://support.cloudflare.com/hc/en-us/articles/115003011431-Troubleshooting-Cloudflare-5XX-errors#502504error
524 error? There is a workaround:
As #mjsa mentioned, Cloudflare only offers timeout settings to Enterprise clients, which is not an option for most people.
However, you can disable Cloudflare proxying for that specific (sub)domain by turning the orange cloud into grey:
(Screenshots of the DNS record before and after: the orange proxied cloud toggled to the grey DNS-only cloud.)
Note: it will disable extra functionalities for that specific (sub)domain, including IP masking and SSL certificates.
As Cloudflare state in their documentation:
If you regularly run HTTP requests that take over 100 seconds to
complete (for example large data exports), consider moving those
long-running processes to a subdomain that is not proxied by
Cloudflare. That subdomain would have the orange cloud icon toggled to
grey in the Cloudflare DNS Settings. Note that you cannot use a Page
Rule to circumvent Error 524.
I know this cannot be treated as a real solution, but there are two ways of avoiding it:
1) Since this timeout is usually related to something that takes a long time to generate, this type of work can be done through crontab, or, if you have SSH access, you can run the PHP command directly. In that case the connection is not served through Cloudflare, so the script runs for as long as your configuration allows. Look up how to run scripts from the command line, or how to schedule them in crontab, using /usr/bin/php /direct/path/to/file.php
2) You can create a subdomain that is not added to Cloudflare, move your script there, and run it directly through a URL, an Ajax call, or whatever.
There is a good answer on Cloudflare community forums about this:
If you need to have scripts that run for longer than around 100 seconds without returning any data to the browser, you can’t run these through Cloudflare. There are a couple of options: Run the scripts via a grey-clouded subdomain or change the script so that it kicks off a long-running background process and quickly returns a status which the browser can poll until the background process has completed, at which point the full response can be returned. This is the way most people do this type of action as keeping HTTP connections open for a long time is unreliable and can be very taxing also.
This topic on Stack Overflow is high in SERPs, so I decided to write down this answer for those who will find it useful.
https://support.cloudflare.com/hc/en-us/articles/115003011431-Troubleshooting-Cloudflare-5XX-errors#502504error
Cloudflare 524 error results from a web page taking more than 100 seconds to completely respond.
This can be overridden to (up to) 600 seconds ... if you change to an "Enterprise" Cloudflare account. The cost of Enterprise is roughly $40k per year (annual contract required).
If you are fetching your results with curl, you could use the --resolve option to connect directly to your origin IP instead of going through the Cloudflare proxy:
For example:
curl --max-time 120 -s -k --resolve lifeboat.com:443:127.0.0.1 -L https://lifeboat.com/blog/feed
The simplest way to do this is to increase your proxy waiting timeout.
If you are using Nginx, for instance, you can simply add this line in /etc/nginx/sites-available/your_domain:
location / {
    ...
    proxy_read_timeout 600s;  # raise the upstream read timeout to 10 minutes; adjust as needed
    ...
}
If the issue persists, make sure you use Let's Encrypt to secure your server alongside Nginx, and then disable the orange cloud on that specific subdomain in Cloudflare.
Here are some resources you can check to help do that
installing-nginx-on-ubuntu-server
secure-nginx-with-let's-encrypt

ERR_CONNECTION_REFUSED over SSL

I've been searching and haven't found a solution for this yet.
I have a LAMP server running CentOS 5 and cPanel. I have converted the site from http to https. The site works just fine. However, periodically there are ERR_CONNECTION_REFUSED errors on my PC only. This happens only over https and only periodically. Port 443 is open on the server.
FTP, Remote MySQL, SSH, and HTTPS connections are refused during that brief period. I've checked the server's firewall to allow my ip and unblock my ip. The ip is allowed and was never blocked.
We have other PCs connected to the same network with no issues during the brief period where only my connection is refused. I've cleared my cookies and cache with no luck. However, when I run a trace route, it stops at the first hop in our network.
Any suggestions with what I need to do or look at?
Do you think it is a server related issue?
Do you think it is an internal network related issue?
Could it be the issuer of the SSL cert?
You're probably running into a full backlog queue. A Windows server will actively refuse a connection if the backlog queue is currently full. The defence is to increase the backlog or speed up the accept loop.
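To make the backlog part concrete, here is a small, generic sketch (port and backlog value are illustrative): the argument to listen() bounds how many not-yet-accepted connections the OS will queue before it starts refusing new ones.

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8443))  # placeholder port; binding 443 would need elevated privileges
srv.listen(512)              # larger backlog -> fewer refused connections under bursts

while True:
    conn, addr = srv.accept()  # a slow accept loop also lets the backlog fill up
    conn.close()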

Telnet is blocked on a port (443) while still allowing web service requests on the same host and port

I have been trying to connect to a partner's web service which is running on the HTTPS default port 443. I had been under the wrong impression that they had not opened firewall ports for us, because telnet from my server was unable to establish a connection. For example, I was typing:
$ telnet <vendor's host> 443
After waiting a long time (around 15-20 seconds), it prints out that it connected, but immediately also says that the connection closed:
Connected to <host>.
Escape character is '^]'.
Connection to <host> closed by foreign host.
However, on running the SOAP UI from the server and hitting a URL that is hosted on the same host and port works fine.
Just wondering why telnet connection gets tripped. Is there any kind of setting possible at the server side?
You are actually making the Telnet connection, but it then closes because the server finds no interesting conversation: the server is expecting an SSL negotiation to complete.
Understand that Telnet is not very different from raw TCP. (CyberPillar: Telnet may discuss that.) So what would you expect the SSL server to do with a bare TCP connection? In the case of an HTTPS server (which is what I'm presuming, since you mentioned TCP port 443), I would expect it to want to perform SSL negotiation immediately. If a client does not successfully complete the SSL negotiation, then the client may just be an attacker trying to use up the server's resources, so the server won't waste resources by responding in interesting ways (like printing out an informative message). That is the behaviour that provides the most desirable results most of the time: most connections from clients who know what they are doing will be HTTPS connections from a client that knows how to negotiate SSL.
I would expect similar results from many other protocols that are designed to use encryption. Offhand, I don't know that this behavior is absolutely required by any specific technical specifications/requirements. However, what I do know is that the description you provide, which notes the behavior you experienced, is really not surprising to me whatsoever. Perhaps just from some experience I've had, it's what I would expect. The results you describe would not be surprising to me, even if your firewall was doing nothing. Consequently, I don't offhand know whether your firewall is effectively doing anything noteworthy with this traffic. Maybe the firewall is blocking it, or maybe the firewall is passing it to an HTTPS server which is just handling the connection in a way that you weren't expecting.
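If it helps, here is a hedged way to test the port with an actual TLS handshake instead of a bare telnet connection (the hostname is a placeholder). If this succeeds, the firewall is clearly letting you through, and the server is simply dropping clients that never start the handshake.

import socket
import ssl

HOST, PORT = "vendor.example.com", 443  # placeholder hostname

ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as raw:
    with ctx.wrap_socket(raw, server_hostname=HOST) as tls:
        # Reaching this point means both the TCP connect and the TLS handshake succeeded.
        print("TLS established:", tls.version(), tls.cipher())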

Delay issue with Websocket over SSL on Amazon's ELB

I followed the instructions from this link:
How do you get Amazon's ELB with HTTPS/SSL to work with Web Sockets? to set up ELB to work with WebSocket (having ELB forward 443 to 8443 in TCP mode). Now I am seeing this issue for wss: the server sends message1, the client does not receive it; after a few seconds, the server sends message2, and the client receives both messages (both messages are around 30 bytes). I can reproduce the issue fairly easily. If I set up port forwarding with iptables on the server and have the client connect directly to the server (port 443), I don't have the problem. Also, the issue seems to happen only with wss; ws works fine.
The server is running jetty8.
I checked EC2 forums and did not really find anything. I am wondering if anyone has seen the same issue.
Thanks
From what you describe, this pretty likely is a buffering issue with ELB. Quick research suggests that this actually is the issue.
From the ELB docs:
When you use TCP for both front-end and back-end connections, your
load balancer will forward the request to the back-end instances
without modification to the headers. This configuration will also not
insert cookies for session stickiness or the X-Forwarded-* headers.
When you use HTTP (layer 7) for both front-end and back-end
connections, your load balancer parses the headers in the request and
terminates the connection before re-sending the request to the
registered instance(s). This is the default configuration provided by
Elastic Load Balancing.
From the AWS forums:
I believe this is HTTP/HTTPS-specific and not configurable, but I can't
say I'm sure. You may want to try using the ELB in just plain TCP mode
on port 80, which I believe will just pass the traffic to the client
and vice versa without buffering.
Can you try to make more measurements and see how this delay depends on the message size?
Now, I am not entirely sure what you already did and what failed or did not fail. From the docs and the forum post, however, the solution seems to be to use the TCP/SSL (Layer 4) ELB type for both front-end and back-end.
This resonates with Nagle's algorithm: the TCP stack can be configured to bundle small writes before sending them over the wire in order to reduce traffic. That would explain the symptoms, so it is worth a try.
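A sketch of what trying that could look like, assuming you control the backend socket (the host and port below are made up): setting TCP_NODELAY disables Nagle's algorithm so small WebSocket frames are flushed immediately rather than coalesced.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # turn Nagle's algorithm off
sock.connect(("backend.example.internal", 8443))            # placeholder backend address
sock.sendall(b"small frame")  # sent right away instead of being held back for batching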