Prevent Cloudflare 524 on long-running scripts

It seems Cloudflare times out after 100 seconds of not receiving a response from the server. I have some scripts that take longer than that to run. Is there anything within Cloudflare, e.g. page rules, that can be used to bypass that limit? Or is there another way around it without having to recode things, set up bypass routes or run on a separate server?

Related

Too many 429 errors when the cache extension and the proxy middleware are enabled at the same time in Scrapy

I am using Scrapy to crawl data. The target website blocks my IP after it has sent about 1000 requests.
To deal with this, I wrote a proxy middleware, and because the amount of data is relatively large, I also wrote a cache extension. When I enable both of them, I get banned more often; it works well when only the proxy middleware is enabled.
I know that when the Scrapy engine starts, extensions start before middlewares. Could this be the reason? If not, what else should I consider?
Any suggestions will be appreciated!
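For reference, a proxy middleware of the kind described above can be sketched as a Scrapy downloader middleware that rotates to the next proxy before any single IP reaches the ban threshold. This is a minimal sketch, not the poster's code: the class name, the `PROXY_LIST` and `PROXY_REQUESTS_PER_PROXY` settings, and the default of 900 requests per proxy (chosen to stay under the roughly 1000-request ban) are all hypothetical. It relies on Scrapy's built-in HttpProxyMiddleware routing a request through whatever is in `request.meta["proxy"]`.

```python
import itertools


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware: switch proxy every N requests."""

    def __init__(self, proxies, requests_per_proxy=900):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = itertools.cycle(proxies)  # endless round-robin over the list
        self._limit = requests_per_proxy
        self._current = next(self._proxies)
        self._count = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical setting names; define them in settings.py.
        return cls(
            crawler.settings.getlist("PROXY_LIST"),
            crawler.settings.getint("PROXY_REQUESTS_PER_PROXY", 900),
        )

    def process_request(self, request, spider):
        # Move to the next proxy once the current one has carried its quota.
        if self._count >= self._limit:
            self._current = next(self._proxies)
            self._count = 0
        self._count += 1
        # Scrapy's HttpProxyMiddleware reads this key to route the request.
        request.meta["proxy"] = self._current
        return None  # continue normal processing of the request
```

It would be enabled with something like `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RotatingProxyMiddleware": 350}` (the module path is hypothetical). Because `process_request` runs in increasing order number, at 350 it runs before the built-in HttpProxyMiddleware (750) and, if you use it, Scrapy's own HttpCacheMiddleware (900), so it counts every request, including ones later answered from the cache.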

Apache server seems to be caching requests

I am running a Flask app on an Apache 2.4 server. The app uses the Requests library to send requests to an API built by a colleague. The requests are in a specific format and are constructed from data stored in a MySQL database. The site is designed to show the API's response on the index page, and the user can edit the data stored in the MySQL database (and, by extension, the data sent in the request) on another page, the editing page.
Say, for example, a custom date field is set to "2006". I would access the index page, a request would be sent, and the API does its magic and sends back data relevant to 2006. If I then changed the date to "2007", the new value is saved in MySQL, and upon navigating back to the index the new request is constructed and sent off, and data for 2007 should be returned.
Unfortunately that's not happening.
When I change details on my editing page they are definitely stored in the database, but when I navigate back to the index the request sends the previous set of data. I think that Apache is causing the problem for two reasons:
When I restart the server (service apache2 restart), the data sent back is the 'proper' data, even though I haven't touched the database. That is, the index initially requests 2006 data; I change it to request 2007 data, but it still requests 2006 data; I restart the server, refresh the index, and only then does it request the 2007 data it should have been requesting since I edited it.
When I run this on my local Flask development server, navigating to the index page after editing an entry immediately returns the right result - it feeds off the same database and is essentially identical to the deployed server except that it's not running on apache.
Is there a way that Apache could be caching requests or something? I can't figure out why the server would keep sending old requests until I restart it.
EDIT:
The requests themselves are large and ungainly, and the responses contain data that I'm not comfortable sharing as examples for privacy reasons.
I am almost certain that Apache is the issue because as previously stated, the Flask development server has no issues with returning the correct dataset. I have also written some requests to run through Postman, and these also return the data as requested, so the request structure must be fine. The only difference I can see between the local Flask app and the deployed one is Apache, and given that restarting the Apache server 'updates' the requests until the data is changed again, I think that it's quite clearly doing something untoward.
Dirn was completely right: it turned out not to be an Apache issue at all. It was SQLAlchemy all along.
I imagine that SQLAlchemy knows not to do any 'caching' when it requests data on the development server but decides that it's a good idea in production, which makes perfect sense really. It was not reading the newly committed data on every search, which is why restarting the Apache server fixed it: the restart also reset the connection.
I guess that's what dirn meant by "How are you loading data in your application?" I had assumed that since I turned off Flask's debugging on the development server it would behave just like it would in deployment, but it looks like something slipped through.
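The post doesn't show the data-loading code, so the following is an assumption about the mechanism: a common cause of exactly this symptom is one database session or connection reused across requests with its transaction left open. Under MySQL/InnoDB's default REPEATABLE READ isolation, every read inside that transaction sees the snapshot taken at its first read, so later commits from the editing page stay invisible until the transaction ends (or the process is restarted). The sketch below reproduces that with the standard-library `sqlite3` module as a stand-in for MySQL, using WAL mode, which likewise gives a read transaction a fixed snapshot; the table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile

QUERY = "SELECT value FROM settings WHERE name = 'date'"


def read_date(conn):
    # fetchall() drains the cursor so no statement is left pending.
    return conn.execute(QUERY).fetchall()[0][0]


def demo_stale_read():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # Autocommit connection standing in for the editing page.
        editor = sqlite3.connect(path, isolation_level=None)
        editor.execute("PRAGMA journal_mode=WAL").fetchall()  # snapshot reads
        editor.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
        editor.execute("INSERT INTO settings VALUES ('date', '2006')")

        # Long-lived connection standing in for a session shared by all requests.
        app = sqlite3.connect(path, isolation_level=None)
        app.execute("BEGIN")      # transaction opened and never closed per request
        first = read_date(app)    # snapshot is taken here

        editor.execute("UPDATE settings SET value = '2007' WHERE name = 'date'")

        stale = read_date(app)    # same transaction, same old snapshot
        app.execute("COMMIT")     # ending the transaction, as a request teardown would
        fresh = read_date(app)    # a new read sees the committed change

        app.close()
        editor.close()
    return first, stale, fresh
```

In SQLAlchemy terms, ending the transaction corresponds to calling `commit()`, `rollback()` or `close()` on the session at the end of each request; with a `scoped_session`, a common pattern is calling its `remove()` from a Flask `teardown_appcontext` handler, which is what Flask-SQLAlchemy does for you.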

Server timeout when re-assembling the uploaded file

I am running a simple server app to receive uploads from a fine-uploader web client. It is based on the fine-uploader Java example and runs in Tomcat 6 with Apache sitting in front of it, using ProxyPass to route the requests. I am running into an occasional problem where the upload reaches 100% but ultimately fails. In the server logs, as well as on the client, I can see that Apache is timing out on the proxy with a 502 error.
After reproducing this myself, I realized the problem occurs with really large files. The Java server app was taking longer than 30 seconds to reassemble the chunks into a single file, so Apache would stop waiting and kill the connection. I have increased Apache's Timeout to 300 seconds, which should largely fix the problem, but the potential remains.
Any ideas on other ways to handle this so that the connection between Apache and Tomcat is not killed while the app is assembling the chunks on the server? I am currently using 2 MB chunks and was thinking maybe I should use a larger chunk size: with fewer chunks to assemble, perhaps the server code could do it faster. I could test that, but unless the speedup is dramatic, the potential for problems remains, and I would just be waiting for a large enough upload to come along and trigger them.
It seems like you have two options:
1. Remove the timeout in Apache.
2. Delegate the chunk-combination effort to a separate thread, and return a response to the request as soon as possible.
With the latter approach, you will not be able to let Fine Uploader know if the chunk combination operation failed, but perhaps you can perform a few quick sanity checks before responding, such as determining if all chunks are accessible.
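The second option, with a quick sanity check before responding, can be sketched as follows. This is Python rather than the poster's Java, and the handler name, the chunk naming scheme (files `0`, `1`, `2`, ... in one directory) and the pool size are hypothetical. The returned dict stands in for the JSON body with a `success` property that Fine Uploader's traditional endpoint expects.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool: combination jobs run here, not on the request thread.
_executor = ThreadPoolExecutor(max_workers=2)


def combine_chunks(chunk_paths, target_path):
    """Concatenate chunk files, in order, into target_path (the slow part)."""
    with open(target_path, "wb") as out:
        for path in chunk_paths:
            with open(path, "rb") as chunk:
                shutil.copyfileobj(chunk, out)
    return target_path


def handle_upload_complete(chunk_dir, total_parts, target_path):
    """Hypothetical handler for the final 'all chunks uploaded' request."""
    chunk_paths = [os.path.join(chunk_dir, str(i)) for i in range(total_parts)]
    # Quick sanity check before answering: is every chunk present and readable?
    missing = [p for p in chunk_paths if not os.access(p, os.R_OK)]
    if missing:
        return {"success": False, "error": f"{len(missing)} chunk(s) missing"}, None
    # Hand off the slow work and answer immediately, well inside the proxy timeout.
    future = _executor.submit(combine_chunks, chunk_paths, target_path)
    return {"success": True}, future
```

The returned future is only there so the server can log or record a failure later; as noted above, the client will already have been told the upload succeeded.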
There's nothing Fine Uploader can do here; the issue is server-side. After Fine Uploader sends the request, its job is done until your server responds.
As you mentioned, it may be reasonable to increase the chunk size or make other changes to speed up the chunk combination operation to lessen the chance of a timeout (if #1 or #2 above are not desirable).

Varnish Cache - Initial cache of web pages

I have installed the Varnish cache with my Apache web server and configured them correctly. It works OK, and I can now access my web pages through Varnish Cache.
The default behavior of Varnish is to store copies of the pages served by the web server. The next time the same page is requested, Varnish will serve the copy instead of requesting the page from the Apache server.
And now comes my question: is it possible to cache my entire website right after setting up Varnish, without waiting for each page to be accessed before it is stored in the cache? After Varnish has been set up, the cache is empty, and a page has to be accessed before it is available from the cache. Can this be done without accessing each page manually?
What you are looking for is a way of warming up the cache. You could use varnishreplay or a Web crawler, such as Wget or HTTrack to go through your site. Alternatively if you have a sitemap of your pages you could use that as a starting point and warm up the cache by looping over it and issuing requests on the pages using e.g. curl or wget.
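The sitemap-based warm-up described above can be sketched in a few lines of Python instead of a curl or wget loop. This is a minimal sketch, assuming a standard sitemap.xml in the sitemaps.org 0.9 namespace and that the URLs point at Varnish rather than directly at Apache; the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a standard sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def fetch_page(url, timeout=30):
    """GET one page through Varnish, reading the whole body so it gets cached."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def warm_cache(urls, fetch=fetch_page):
    """Request each URL once; returns {url: HTTP status or error message}."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            results[url] = str(exc)
    return results
```

Usage would be along the lines of `warm_cache(sitemap_urls(open("sitemap.xml").read()))`; the `fetch` parameter only exists so the loop can be exercised without a network.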
Using varnishreplay requires you to first run varnishlog and gather a log of traffic before you can use it later for playing back the traffic and warming up the cache.
Wget, HTTrack etc. can be pointed to your home page and they will crawl their way through your site. Depending on the size and nature of your site this might not be practical though (for example if you use Ajax extensively).
Unless your pages take a very long time to load from the backend server (i.e. Apache), I wouldn't worry too much about warming up the cache. If the TTL for the cached content is high enough most of the visitors will only ever receive cached content anyway.
There is a much better way to do this, which uses req.hash_always_miss, works with Varnish 3 and 4, and is also driven by a sitemap. It warms up your cache and refreshes old pages without having to purge the cache. A full diagram, an outline of how to configure it and three scripts for various use cases are available at http://www.htpcguides.com/smart-warm-up-your-wordpress-varnish-cache-scripts/ and are easily adapted for non-WordPress sites.
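The client side of the hash_always_miss approach can be sketched as sending each URL with a marker header that the VCL turns into `set req.hash_always_miss = true;`, which makes Varnish fetch a fresh copy from the backend and replace the cached object instead of purging it. The header name `X-Cache-Refresh` and both helpers are hypothetical, and the matching VCL (for example `if (req.http.X-Cache-Refresh && client.ip ~ purgers) { set req.hash_always_miss = true; }` in `vcl_recv`, with `purgers` an ACL you define) is an assumption to adapt, not taken from the linked scripts.

```python
import urllib.request

# Hypothetical header name; your VCL must check for exactly this header.
REFRESH_HEADER = "X-Cache-Refresh"


def build_refresh_request(url):
    """Request carrying the marker header that the VCL maps to hash_always_miss."""
    return urllib.request.Request(url, headers={REFRESH_HEADER: "1"})


def refresh(urls, timeout=30):
    """Force Varnish to refetch each URL from the backend and re-cache it."""
    for url in urls:
        with urllib.request.urlopen(build_refresh_request(url), timeout=timeout) as resp:
            resp.read()  # read the full body so Varnish completes the fetch
```

Restricting the header to trusted client IPs (the ACL above) matters: otherwise anyone could force backend fetches for every page.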

How can I test a comet ajax site on a single host and work around browser simultaneous connection limit?

I am using the comet long-polling technique with Apache, PHP and jQuery.
I've got a basic comet update running and it works great. I'm now attempting to build a more complex comet script, and I want a better way to debug.
My comet scripts use $.ajax() with a long timeout, and the server side just sleeps until it either runs up to the timeout or has an event to send to the client. The comet requests go to a different subdomain than the main ajax requests.
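The server-side wait described above (sleep until the timeout or until there is an event to send) can be sketched with a condition variable instead of a fixed sleep loop. This is Python rather than the poster's PHP, and it assumes events are published inside the same process; a PHP setup would need a shared store between requests. `EventChannel` and its methods are hypothetical names.

```python
import threading


class EventChannel:
    """In-process stand-in for the server side of a long-poll endpoint."""

    def __init__(self):
        self._cond = threading.Condition()
        self._events = []

    def publish(self, event):
        # Record the event and wake every request currently waiting.
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def wait_for_events(self, since, timeout):
        """Block until events exist after index `since`, or `timeout` seconds pass.

        Returns (new_events, next_index); new_events is empty on timeout, which
        is the point where the client simply re-issues its long-poll request.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > since, timeout=timeout)
            new = self._events[since:]
            return new, since + len(new)
```

Each held-open call to `wait_for_events` corresponds to one open browser connection, which is why the per-host connection limit discussed below bites when everything runs on one hostname.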
For normal pages I edit and test on a Linux laptop. I've got Apache, MySQL and PHP with a test database and a mirror image of the site. I can edit, save and see the changes with no upload step. For the comet stuff I've been having to upload to a server to test. This requires me to set up a few fake servers, but mostly it requires me to upload changed files for each test. I've got a mostly automatic upload script, but it's still too slow.
The problem with testing locally is the long timeout. The browser won't open another connection to the same server while the comet request is still open. I don't have a separate subdomain locally, so all the requests go to the same server and basically block each other.
I've tried a number of things to make this work, and none really do it. First I tried changing my browser's setting for the number of simultaneous connections. This didn't work in Firefox on Linux, and I didn't find anything about changing this limit in other browsers.
I tried setting up my hosts file to give me two names that map to my IP address. Then I tried configuring VirtualHost directives in Apache, but that didn't work. I think that's because Apache is looking for an actual DNS server to tell it the hostname, not just my /etc/hosts file. Maybe I could run a local DNS server to fool Apache into thinking my box has two names, but that seems like a really long way around this problem.
So, does anyone have an idea of how to make this work on one ip address/host?
I'm new to the comet thing, so maybe I've just got the wrong idea about something. Maybe this isn't even possible. Either way, it's time to just ask if this is already a solved problem.
It really should be possible to use /etc/hosts to fool Apache. It certainly does work on Ubuntu Hardy with Apache 2.2.
Try giving several hostnames to your local address. Simply add a line like this to /etc/hosts:
127.0.0.1 a.example.com b.example.com c.example.com d.example.com
(Note: use a tab after the IP.)
Verify this with ping:
ping a.example.com
In your Apache configuration, you can use a wildcard alias together with a name-based virtual host:
# On Apache 2.2, name-based virtual hosts also need: NameVirtualHost *:80
<VirtualHost *:80>
ServerName example.com
ServerAlias *.example.com
## snip ##
</VirtualHost>
Instead of using example.com, you might want to use something that's under your control. I use a local subdomain of our company's domain (e.g. something.local.molindo.at).
Now you can use different subdomains for your tests, each with its own limit on concurrent connections.
You may need to restart your browser to get this working.
I built something similar, and my host reports that I've reached the maximum query limit, which shouldn't happen. But I have read that if PHP code is in an infinite loop (i.e. the sleep mode), the host detects it and treats the database user as having run more queries than allowed. That is a lot to presume, but I have found a workaround based on the same speculation.