mod_rewrite to serve static cached files if they exist and if the source file hasn't changed - apache

I am working on a project that processes images, saves the processed images in a cache, and outputs the processed image to the client. Let's say that the project is located in /project/, the cache is located in /project/cache/, and the source images are located anywhere else on the server (like in /images/ or /otherproject/images/). I can set up the cache to mirror the path to the source image (e.g. if the source image is /images/image.jpg, the cache for that image could be /project/cache/images/image.jpg), and the requests to the project are roughly /project/path/to/image (e.g. /project/images/image.jpg).
I would like to serve the images from the cache, if they exist, as efficiently as possible. However, I also want to be able to check to see if the source image has changed since the cached image was created. Ideally, this would all be done with mod_rewrite so PHP wouldn't need to be used to do any of the work.
Is this possible? What would the mod_rewrite rules need to be for this to work?
Alternatively, it seems like it would be a fine compromise to have mod_rewrite serve the cached file most of the time but send 1 out of X requests to the PHP script for files that are cached. Is this possible?

You cannot access the file modification timestamp from a RewriteRule, so there is no way around using PHP or another programming language for that task.
On the other hand, this is really simple in PHP, so you should first check whether the PHP solution is good enough in your case. Only if it isn't should you look for alternatives.
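For example, a minimal sketch, assuming a rewrite rule passes the original image path as ?path=images/image.jpg and a hypothetical process_image() function does the actual work:

<?php
// Sketch of /project/index.php. Real code would validate the path and pick
// the correct Content-Type instead of assuming JPEG.
$path   = ltrim($_GET['path'], '/');
$source = $_SERVER['DOCUMENT_ROOT'] . '/' . $path;   // e.g. /images/image.jpg on disk
$cached = __DIR__ . '/cache/' . $path;               // e.g. /project/cache/images/image.jpg

// Rebuild the cached copy if it is missing or older than the source image.
if (!is_file($cached) || filemtime($cached) < filemtime($source)) {
    process_image($source, $cached);   // hypothetical function doing the processing
}

header('Content-Type: image/jpeg');
readfile($cached);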

What if you used the client to do some of the work? Say you display an image in the web browser and always use src="/cache/images/foobar.jpg", and add an onerror="this.src='/images/foobar.jpg'". In mod_rewrite, send anything that goes to the /images/ dir to a script that generates the image, writes it to the cache, and returns it.
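A rough sketch of the rewrite side of that idea, with generate.php as a hypothetical script that writes the cached copy and returns the image:

# Document-root .htaccess sketch: any request under /images/ goes to the generator.
RewriteEngine On
RewriteRule ^images/(.+)$ /generate.php?path=$1 [L,QSA]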

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
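Putting it together, the crawl command ends up looking roughly like this (example.com is a placeholder for the real site):

wget --recursive --adjust-extension --restrict-file-names=windows https://example.com/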
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative, due to how Netlify handles the presence or absence of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
Of course, if not all query parameter combinations are actually used in the dumped files, not all of the redirect lines need to be included.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
Content-Type: application/rss+xml
I hope this helps somebody.

Apache redirect for single XML file

I have a number of subdomains which use a crossdomain.xml file, and I'm looking for a simple way of managing them all, since they get updated semi-regularly. One way I've thought of is a PHP script which pushes and overwrites the xml file. The other, which I much prefer, is an Apache redirect to a single file.
So, the question is: how would I, across multiple domains, redirect the xml on dom1.domain.com and dom2.wirewax.com to the same crossdomain.xml file without Flash getting upset about it? I.e. not a 302 HTTP redirect, but internal file fetching.
You can write a PHP script that fetches the content from a single location (database or text file) and sends it as-is to Flash. Yes, the script itself needs to be copied to all hosts.
If you have all websites hosted on the same webserver, perhaps mod_alias could help:
Alias /crossdomain.xml /path/to/shared/crossdomain.xml
I have not personally tested this. The reference page includes instructions for setting up the shared directory so that it can be read by multiple hosts.
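A sketch of what that might look like, repeated per virtual host (paths are placeholders, and the Directory block uses Apache 2.4 syntax):

<VirtualHost *:80>
    ServerName dom1.domain.com
    DocumentRoot /var/www/dom1
    Alias /crossdomain.xml /var/www/shared/crossdomain.xml
</VirtualHost>

# Allow every host to read the shared directory.
<Directory /var/www/shared>
    Require all granted
</Directory>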

Append file date to css file in apache

I am trying to find a way to make sure browsers don't keep using cached versions of my css files every time I push a new update.
I was thinking the best way would be to somehow get the file timestamp of the css file on the filesystem and append it to the css URL, somehow like www.mysite.com/css/style.css?13245645434
Is this possible at all? If not, then any idea how I can make sure the browser gets a new version of the file when it is updated? I don't want to eliminate browser caching altogether, because if the file hasn't been touched then I would like it to be cached. However, when I push a new update I would like to somehow tell the browser that.
I understand I can write server-side code to put the style.css?2342343 in, but I want to see if it's possible through Apache at all.
Thanks
every time I push a new update
How do you push updates? If you have an automated build process, then that is the right place to rewrite your URLs.
If you wanted to do the rewriting via Apache, you'd need a module which would parse the HTML and rewrite the links. That would not be optimal.
Lastly, consider rewriting to /css/42/style.css (where 42 is the current version), because if you cache your site through a proxy or CDN, query parameters may not work.
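A sketch of the Apache side of that scheme, placed in the document-root .htaccess: the version segment exists only for cache busting and is stripped before the file lookup.

RewriteEngine On
# /css/42/style.css -> /css/style.css; the number never touches the filesystem.
RewriteRule ^css/\d+/(.+)$ /css/$1 [L]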

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed xmls. Any signed xml, once created, never changes. So I store every xml in the public folder and redirect the client appropriately to avoid unnecessary processing from the controller.
Now I want a new feature: every xml is associated with a date, and I'd like to implement the ability to serve a compressed file containing every xml whose date lies in a period specified by the client. Nevertheless, the period cannot be limited to less than one month for the feature to be useful, and this implies some zip files being served will be as big as 50M.
My application is deployed as a Passenger module of Apache. Thus, it's totally unacceptable to serve the file with send_data, since the client will have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea of how to implement the feature in Rails so the compressed file is produced while being served, I fear my server will run short on resources once some lengthy Ruby/Passenger processes are tied up serving big zip files.
I've read about a better solution to serve static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I inform Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file to the client containing those files. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
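The upstream response is just a header plus a plain-text file list; roughly like this (paths, sizes and names here are invented, and the exact format is described in the module docs):

X-Archive-Files: zip

- 52104 /signed/2011-01-03.xml 2011-01-03.xml
- 49873 /signed/2011-01-04.xml 2011-01-04.xml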
You simply need to use whatever API you have available to create a zip file and write it to the response, flushing the output periodically. If this is serving large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value (low priority).
Worst case, you could run a command-line zip in a low priority process and pass the output along periodically.
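For instance, something along these lines (a sketch; the paths are placeholders and ionice assumes Linux), with the application simply copying the command's stdout through to the HTTP response:

nice -n 19 ionice -c 3 zip -q - /path/to/xmls/2011-01-*.xml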
It's tricky to do, but I've made a gem called zipline (http://github.com/fringd/zipline) that gets things working for me. I want to update it so that it can support plain file handles or paths; right now it assumes you're using carrierwave...
Also, you probably can't stream the response with Passenger... I had to use Unicorn to make streaming work properly... and certain Rack middleware can even screw that up (calling response.to_s breaks it).
If anybody still needs this, bother me on the GitHub page.

Is mod_rewrite a valid option for caching dynamic pages with Apache?

I have read about a technique that involves writing a rendered dynamic page to disk and, when it exists, serving it via mod_rewrite. I was thinking about cleaning out the cached version every X minutes using a cron job.
I was wondering if this was a viable option or if there were better alternatives that I am not aware of.
(Note that I'm on a shared machine and mod_cache is not an option.)
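As far as I understand it, the rewrite rules for that technique look roughly like this (a sketch; the cache/ directory name and index.php front controller are just examples):

RewriteEngine On
# Serve the pre-rendered copy when one has been written to disk...
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -f
RewriteRule ^(.+)$ cache/$1.html [L]
# ...otherwise fall through to the dynamic script.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ index.php?page=$1 [L,QSA]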
You could use your cron job to run the scripts and redirect the output to a file.
If you had a php file index.php, all you would have to do is run
php index.php > (location of static file)
You just have to make sure that your script runs the same on the command line as it does when served by Apache.
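For example, a crontab entry regenerating the static copy every ten minutes might look like this (paths are placeholders):

*/10 * * * * php /var/www/site/index.php > /var/www/site/cache/index.html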
I would use a cache at the application level, because the application knows best when the cached version is out of date, and it is more flexible and powerful in the matter of cache negotiation.
Does the page need to be junked every so often simply because it has to be, or should a parallel static version be produced after each update to the page?
If the latter, you could try writing a script that makes a copy of the just-edited page and saves it under its static filename. That should lighten the write load, since in that scenario you wouldn't need a fresh static copy unless a change was made that actually needs to be shown.
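A minimal sketch of that idea in PHP, assuming a hypothetical render_page.php template and a cache/ directory:

<?php
// Call this right after a successful edit: re-render the page once and store
// the static copy that will be served from then on.
function publish_static_copy($pageId)
{
    ob_start();
    include __DIR__ . '/render_page.php';   // hypothetical template; uses $pageId
    $html = ob_get_clean();
    file_put_contents(__DIR__ . "/cache/{$pageId}.html", $html, LOCK_EX);
}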