How to set up static asset caching with Apache? - apache

I'd like to optimize caching of static assets (.js, .css, ... files) used on our website. My goal is based on this article (https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#invalidating-and-updating-cached-responses).
In short - because these static assets tend to be updated ad hoc (sometimes weekly, sometimes twice a day, ...) I'd like to cache them with a far-future expiration and give them unique names based on the content, modification date or similar. This should allow them to stay cached for a long time, yet be updated as soon as a change occurs.
Is this technique supported by the Apache2 server? Or is there some middleware which handles fingerprint generation (to produce unique asset names) and updates references to them in the HTML file (which won't be cached at all)?
We use LAMP stack on our host.
Thank you in advance

There are a number of techniques, some better than others. One good one is to have the following configuration:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)\.(\d+)\.(bmp|css|cur|gif|ico|jpe?g|js|png|svgz?|webp|webmanifest)$ $1.$3 [L]
</IfModule>
This allows URLs of the form /i/filename.1433499948.gif - but the file that is actually read from disk is just /i/filename.gif, i.e. parts 1 and 3 of the matched filename.
This Apache vhost/.htaccess stanza is from the H5BP filename-based_cache_busting.conf file, and there are other examples of good practices in that repository.
That, combined with the H5BP mod_expires config, means you will always be able to trivially refresh a user's local browser cache just by referencing the file under a new name.
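Apache itself only strips the fingerprint back off; generating it and updating the references in your HTML is usually done in the application. As a minimal sketch (the helper name and asset paths are assumptions, not part of the H5BP config), a LAMP app could emit fingerprinted URLs like this:
<?php
// Hypothetical helper: embed the file's mtime in the URL so the rewrite
// rule above strips it back off (/css/site.css -> /css/site.1433499948.css).
// Assumes assets live under DOCUMENT_ROOT.
function asset_url($path) {
    $file  = $_SERVER['DOCUMENT_ROOT'] . $path;
    $mtime = is_file($file) ? filemtime($file) : 0;
    // insert the timestamp before the extension
    return preg_replace('/\.(\w+)$/', '.' . $mtime . '.$1', $path);
}
?>
<link rel="stylesheet" href="<?php echo asset_url('/css/site.css'); ?>">
<script src="<?php echo asset_url('/js/app.js'); ?>"></script>
Because the name changes whenever the file's modification time changes, the HTML (served uncached) always points at the newest version while the assets themselves can carry far-future expiry headers.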

You can enable mod_mime, mod_expires for Apache and use the following snippet
<FilesMatch "\.(png|jpe?g|gif|ico|mp4|wmv|mov|mpeg|css|map|woff2?|eot|svg|ttf|js|json|pdf|csv)$">
ExpiresActive on
ExpiresDefault "access plus 2 weeks"
</FilesMatch>
Or set the respective php headers
session_cache_limiter('none');
header('Cache-control: max-age='.(60*60*24*7)); //one week
header('Expires: '.gmdate(DATE_RFC1123,time()+60*60*24*365)); //one year
Also related article here: How to get the browser to cache images, with php?

Is it possible to have a directory listed and hide specific files based on an excluder, i.e. HTTP_Host?

Request for cloud -> from example.com -> Hide all files beginning with xamplee_
Request for cloud -> from xamplee.com -> Hide all files beginning with example_
I tried the following with no success:
IndexIgnore xxx
Options -MultiViews
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^xamplee_ xxx [NC,L]
#RewriteCond %{HTTP_HOST} ^www\.xamplee\.com$ [NC]
IndexIgnore example_ xxx [NC,L]
I'm not even sure if it is possible at all. A different approach would also be welcome.
OK, based on the exchange in the comments to the question, here are some means to get you started (hopefully):
The Apache HTTP server comes with a bundle of modules. One of those is mod_dir, which controls what gets served when a directory is requested (the automatic listings themselves come from mod_autoindex).
It offers a standard listing but also allows more flexible usage. The DirectoryIndex directive lets you declare any resource as the index document for a directory. With this you can configure any type of handler to generate such an index. Typical examples would be a static HTML document or a dynamic index.php document, i.e. a PHP script generating an HTML-based index of the directory at hand. Of course, any other scripting language can be chosen just as well.
Its documentation helps with the details: http://httpd.apache.org/docs/2.2/mod/mod_dir.html
The approach here would be to pick a suitable scripting language which is available in your environment (or make it available) and point the DirectoryIndex directive to a script coded in that language.
Another approach would be to create a rewriting rule that relays all requests for directories to a script. That comes out more or less the same, as long as the script knows which directory it should create the index for.
That way, whenever a request for a directory comes in, that script will be executed and is expected to produce the correct HTML index document for this directory. Inside such a script you are free to do and output whatever you like.
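As a minimal sketch (the host names and prefixes below are assumptions taken from your example, not tested config), you could point DirectoryIndex at a small PHP script that filters the listing by requesting host:
DirectoryIndex index.php
<?php
// index.php - hypothetical index generator: lists the directory it lives
// in, hiding files whose prefix depends on the requesting host.
$host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
// assumed mapping: requests via xamplee.com hide example_*, all other
// hosts (e.g. example.com) hide xamplee_*
$hidden = (stripos($host, 'xamplee.com') !== false) ? 'example_' : 'xamplee_';
echo "<ul>\n";
foreach (scandir(__DIR__) as $file) {
    if ($file[0] === '.' || $file === basename(__FILE__)) continue; // skip dot entries and this script
    if (strpos($file, $hidden) === 0) continue;                     // skip the excluded prefix
    printf("<li><a href=\"%s\">%s</a></li>\n",
        rawurlencode($file), htmlspecialchars($file));
}
echo "</ul>\n";
Drop the script into each directory that should get the filtered listing, or combine it with a rewrite rule as described above.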

Control Cache Expiration For Custom File On Server

We have files that we serve to a native Windows application from our server. The files can change every minute, so we need to ensure the user is downloading the latest file.
We've found that users on portable WiFi devices tend to get served an older file. So we are changing our server's .htaccess file expirations for certain files.
We serve a custom file type (.ebc) and the files' contents are sent over HTTP as plain text. In this case should we use ExpiresByType text/ebc "access 1 minute"?
Will changing .htaccess cache control affect portable WiFi caching, or will this only affect browsers?
Should mod_expires / mod_headers code occur before redirects and rewrites? I've discovered before that you should perform certain .htaccess code operations before others (such as placing redirects at the top of the file).
Here's my code:
RedirectMatch (?i)^/wp-content/uploads/2014/10/a.exe http://www.website.com/wp-content/uploads/2014/10/b.exe
## EXPIRES CACHING Should we place this before mod_rewrite or after? ##
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/ebc "access 1 minute"
</IfModule>
## EXPIRES CACHING ##
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
There are 3 questions here, so I'll attempt to answer them.
We serve a custom file type (.ebc) and the files contents are sent over HTTP as plain text. In this case should we use ExpiresByType text/ebc access 1 minute
That should be fine, as long as you have the text/ebc mime-type properly set on your server.
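For example, if the type isn't registered yet, mod_mime's AddType directive can map it (the mapping itself is an assumption about your setup):
AddType text/ebc .ebc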
Will changing .htaccess cache control affect Portable Wifi caching or will this only affect browsers
I don't really know what "Portable Wifi caching" is. These headers are targeted at browsers only. If a custom application is downloading these files, it could be implementing its own caching and so these headers might get ignored.
Should mod_expires / mod_headers code occur before redirects and rewrites?
I'd put it before the redirects, but only from a logical point of view. These are not like RewriteRules; I believe they get evaluated separately.
Additionally, I'll add that caching is difficult, and once a file has left your server it can be hard to force an update. Different browsers behave in different ways, and I've come across configurations that work in one place and not another.
I would additionally consider two other approaches to what you're attempting.
Firstly, don't cache your files at all:
<FilesMatch "\.ebc$">
Header set Cache-Control no-cache
Header set Pragma no-cache
</FilesMatch>
Secondly think about implementing a cache-busting mechanism. If the file is linked from somewhere, try and make sure that link is changed (normally a querystring with a timestamp suffices) each time the file changes. You obviously then need to make sure whatever contains the link also isn't being cached.
An easier solution I used in the past was adding a parameter to the downloadable files.
For example, if the file you're serving is
http://www.domain.tld/file.pdf
then you can create the following link:
http://www.domain.tld/file.pdf?d486dFyg
The question mark and whatever comes after it (the random part) will be ignored by the server, but it guarantees that the user always downloads the latest version, as the URL is different each time (because the random part is always different, of course).
The downloaded file on the user's computer will just be file.pdf so absolutely no downside.
EDIT: I noticed some reference to WordPress in your question, which is PHP, so you can use the rand() function to append the random part: http://php.net/manual/en/function.rand.php
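A minimal sketch of that in PHP (the file name is just an example):
// append a random query-string token so each generated link is unique
$url = 'http://www.domain.tld/file.pdf?' . rand();
echo '<a href="' . htmlspecialchars($url) . '">Download the latest file</a>';
A timestamp such as filemtime() works just as well, with the nicety that the URL only changes when the file actually changes.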

Manual content compression in Apache

I need a manual compression solution on Apache. My goals:
Provide gzip-encoded content on my server along with uncompressed.
Files are pre-compressed.
Not all files are served compressed. I want to specify these files and the choice isn't type (extension) based.
Many content-types (custom ones) are served, and new types are showing up from time to time. Also, file extension doesn't determine if it will be compressed or not (!!!).
Keep overhead minimal (the less extra headers, the better).
Always provide Content-Length header and never send chunked response (this disqualifies mod_deflate).
Ideal Functionality
Ideal functionality would work like that:
Web client asks for file file.ext.
If file.ext.gz exists on server:
Content-Encoding is set to gzip.
Content-Type is set to value of file.ext (!!!).
Server returns file.ext.gz.
Otherwise, file.ext is returned.
I tested a number of solutions; this article contains a good compilation, but there was always a problem with the parts marked with (!!!). I have hundreds of thousands of files and dozens of content types, and because of that I'm looking for some automatic solution, without the need to add ForceType for each file.
What I tried
Multiviews
How this works:
Rename a file file.ext to file.ext.en
Create a file file.ext.gz
Configure Apache:
Options +MultiViews
AddEncoding x-gzip .gz
RemoveType .gz
Works almost as expected, except that it requires the original file (file.ext) not to exist, and it adds many (useless to me) headers (TCN, Content-Language and a few more) that can't be removed (Header unset doesn't remove them).
Rewrite
How this works:
Create a file.ext.gz file.
Configure Apache:
RewriteEngine On
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule (.*)$ $1.gz [L]
AddEncoding x-gzip .gz
<Files file.ext>
ForceType my-custom/mime-type
</Files>
This works well, but requires a ForceType for each compressed file. As I said before, I can't rely on extensions because not all files of certain type will be compressed.
mod_deflate
I didn't investigate it too much; the problem is that if a file is too big, it's split into pieces and sent in chunks, and Content-Length is not provided. Increasing the size of the compression buffers can't eliminate this problem.
Is it possible at all to configure Apache to work the way I'd like?
I tried to dynamically get the Content-Type of file.ext and set it on file.ext.gz, but I didn't find a way to do this.
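One direction to explore (a sketch only, not from this thread; the script name and the rewrite are assumptions): route gzip-capable requests through a tiny PHP front controller that serves file.ext.gz with the Content-Type looked up from file.ext and an explicit Content-Length:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/gzserve\.php$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.gz -f
RewriteRule .* /gzserve.php [L]
<?php
// gzserve.php - hypothetical front controller (sketch only)
$path = $_SERVER['DOCUMENT_ROOT']
      . parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); // original URI survives the rewrite
$gz = $path . '.gz';
if (!is_file($gz)) { header('HTTP/1.1 404 Not Found'); exit; }
// Sniff the type from the uncompressed file; custom content types may
// need your own lookup table instead of mime_content_type().
$type = is_file($path) ? mime_content_type($path) : 'application/octet-stream';
header('Content-Type: ' . $type);
header('Content-Encoding: gzip');
header('Content-Length: ' . filesize($gz)); // explicit length, no chunked response
readfile($gz);
This keeps the minimal-header goal mostly intact, though it does add the cost of a PHP invocation per compressed asset.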

Virtual Hosts (Apache) with mod_rewrite issues

I am trying to fix this whole day without success, so I hope someone might be able to help me. I have an app at http://localhost/, and it uses Pylons for the app I am hosting. In addition to that, I need to host a PHP/MySQL site, so I had to use Apache too.
My current setup is that I use haproxy with this config for the Apache backend:
backend apache
mode http
timeout connect 4000
timeout server 30000
timeout queue 60000
balance roundrobin
server app02-8002 localhost:8002 maxconn 1000
This is triggered by this:
acl image url_sub images
use_backend apache if image
So, when I open my IP/images, that rule is triggered and the request is sent to Apache on port 8002.
For Apache, I created virtual hosts, and this is the "image" one:
<VirtualHost *:8002>
ServerAdmin my#email.com
ServerName image
ServerAlias image
DocumentRoot /srv/www/image/public_html/
ErrorLog /srv/www/image/logs/error.log
CustomLog /srv/www/image/logs/access.log combined
</VirtualHost>
So, that all works nicely: when I type IP/images it opens /srv/www/image/public_html. But then the issues come. As I am using an image-uploading script, it involves a lot of rewriting, so I had to enable that mod. This is the .htaccess, which is located in the public_html/images folder (I somehow had to make this subfolder too, to "match" the URL with the actual location in public_html).
SetEnv PHP_VER 5_3
RewriteEngine On
# You must define your installation directory and uncomment the line :
RewriteBase /images/
RewriteRule ^([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=original&a=$1&e=$2 [L]
RewriteRule ^(icon|small|medium|square)\/([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=$1&a=$2&e=$3 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) application.php?request=$1 [L,QSA]
So, basically, this is somehow not working. I suppose there is a conflict between this virtual host, the subdirectory, the rewriting or something, but I can't seem to isolate it.
It is a bit confusing that when I open IP/images/xxxx.jpg it serves the image, which is located in the public_html/images/upload/original folder, so that rewrite is working. But the other rules seem not to be working: all of the thumbnails and smaller versions (icon, small, medium, square) are not rendering properly, which makes the site quite unusable.
Here is the link of the development server: http://localhost/images/
Thanks in advance for your time and help!
The first thing you should do is determine whether mod_rewrite is in fact part of the problem by accessing one of the failing URLs directly via its rewritten form and verifying that you get the expected result.
Indeed, the problem might simply be that the PHP script for the smaller resolutions "doesn't work" while it does for the original size ones. The first of the following URLs nicely served me an image; the second one is supposed to give me a smaller version of the same image, but served me an HTTP 500:
http://106.186.21.176/images/controller/Resizer.php?m=original&a=q&e=png
http://106.186.21.176/images/controller/Resizer.php?m=small&a=q&e=png
I got the same result (HTTP 500) for any of the smaller-size format names mentioned in your post, which matches your problem description.
Once you've verified that the script works as expected, it's likely that the problem is with mod_rewrite. If so, enable rewrite logging: use the RewriteLog directive to activate it, and RewriteLogLevel to control its verbosity. Especially at the higher log levels, it can give you very detailed information about exactly what it's doing. This should make the problem readily apparent from the logs.
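For example (Apache 2.2 syntax, matching the docs linked earlier; the log path is an assumption), in the main server config:
RewriteLog /var/log/apache2/rewrite.log
RewriteLogLevel 6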
Also, if possible, try to avoid configuring mod_rewrite rules in .htaccess files -- move them into your main server config file instead. The reason is explained on Apache mod_rewrite Technical Details, section "API phases":
Unbelievably mod_rewrite provides URL manipulations in per-directory context, i.e., within .htaccess files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way because .htaccess files live in the filesystem, so processing has already reached this stage. In other words: According to the API phases at this time it is too late for any URL manipulations. To overcome this chicken and egg problem mod_rewrite uses a trick: When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
Again mod_rewrite tries hard to make this complicated step totally transparent to the user, but you should remember here: While URL manipulations in per-server context are really fast and efficient, per-directory rewrites are slow and inefficient due to this chicken and egg problem. But on the other hand this is the only way mod_rewrite can provide (locally restricted) URL manipulations to the average user.
In general, not using .htaccess at all has the added advantage that you can tell Apache not to even bother, and disable the functionality altogether, which saves Apache from having to scan every directory level it serves from for .htaccess files.
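A sketch of that, using the paths from your vhost (only the first rule shown; note that in per-server context the pattern sees the full URL path including the leading slash, so RewriteBase is not needed):
<Directory /srv/www/image/public_html>
    AllowOverride None
</Directory>
RewriteEngine On
RewriteRule ^/images/([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ /images/controller/Resizer.php?m=original&a=$1&e=$2 [L]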

How can I block mp3 crawlers from my website under Apache?

Is there some way to block access from a referrer using a .htaccess file or similar? My bandwidth is being eaten up by people referred from http://www.dizzler.com, which is a Flash-based site that allows you to browse a library of crawled, publicly available mp3s.
Edit: Dizzler was still getting in (probably wasn't indicating referrer in all cases) so instead I moved all my mp3s to a new folder, disabled directory browsing, and created a robots.txt file to (hopefully) keep it from being indexed again. Accepted answer changed to reflect futility of my previous attempt :P
That's like saying you want to stop spam-bots from harvesting emails on your publicly visible page - it's very tough to tell the difference between users and bots without forcing your viewers to log in to confirm their identity.
You could use robots.txt to disallow the spiders that actually follow those rules, but that's on their side, not your server's. There's a page that explains how to catch the ones that break the rules and explicitly ban them : Using Apache to stop bad robots [evolt.org]
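For example, a minimal robots.txt along those lines (the folder name is an assumption):
User-agent: *
Disallow: /mp3s/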
If you want an easy way to stop dizzler in particular, you can add the following to your server config (if you use .htaccess instead, drop the <Directory> wrapper, since <Directory> sections aren't allowed there and the remaining directives already apply to the containing directory):
<Directory /directoryName/subDirectory>
Order Allow,Deny
Allow from all
Deny from 66.232.150.219
</Directory>
From this site: (put this in your .htaccess file)
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://(www\.)?dizzler\.com [NC]
RewriteRule .* - [F]
You could use something like
SetEnvIfNoCase Referer dizzler.com spammer=yes
Order allow,deny
allow from all
deny from env=spammer
Source: http://codex.wordpress.org/Combating_Comment_Spam/Denying_Access
It's not a very elegant solution, but you could block the site's crawler bot, then rename your mp3 files to break the links already on the site.