Manual content compression in Apache

I need a manual compression solution on Apache. My goals:
- Serve gzip-encoded content from my server alongside the uncompressed content.
- The files are pre-compressed.
- Not all files are served compressed. I want to pick these files individually, and the choice isn't based on type (extension).
- Many content types (including custom ones) are served, and new types show up from time to time. Also, the file extension doesn't determine whether a file will be compressed (!!!).
- Keep overhead minimal (the fewer extra headers, the better).
- Always send a Content-Length header and never send a chunked response (this disqualifies mod_deflate).
Ideal Functionality
Ideal functionality would work like this:
- A web client asks for file.ext.
- If file.ext.gz exists on the server:
  - Content-Encoding is set to gzip.
  - Content-Type is set to the type of file.ext (!!!).
  - The server returns file.ext.gz.
- Otherwise, file.ext is returned.
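The lookup described above can be sketched in Python to make the rules concrete: the bytes come from file.ext.gz, but the Content-Type is derived from the name file.ext, and because the file already exists on disk its size (and so Content-Length) is known up front. This is a minimal sketch under stated assumptions, not Apache's behaviour: `accepts_gzip` and `resolve` are hypothetical names, and `mimetypes` stands in for whatever type table the server really uses for the custom types.

```python
from __future__ import annotations

import mimetypes
import os


def accepts_gzip(accept_encoding: str) -> bool:
    """True if the Accept-Encoding value allows gzip (an explicit q=0 refuses it)."""
    for part in accept_encoding.split(","):
        token, _, params = part.partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def resolve(path: str, accept_encoding: str) -> tuple[str, dict[str, str]]:
    """Return (file to send, response headers) for a request for `path`."""
    # Assumption: mimetypes stands in for the server's (custom) type map.
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    headers = {"Content-Type": ctype}  # always the type of file.ext
    served = path
    gz = path + ".gz"
    if accepts_gzip(accept_encoding) and os.path.isfile(gz):
        served = gz
        headers["Content-Encoding"] = "gzip"
    # Vary keeps shared caches from handing gzip bytes to other clients.
    headers["Vary"] = "Accept-Encoding"
    # A file on disk has a known size, so no chunked encoding is needed.
    headers["Content-Length"] = str(os.path.getsize(served))
    return served, headers
```

Vary is the one extra header kept here, despite the goal of minimal headers, because without it a proxy could serve the compressed variant to a client that never asked for gzip.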
I tested a number of solutions (this article contains a good compilation), but there was always a problem with the parts marked (!!!). I have hundreds of thousands of files and dozens of content types, so I'm looking for an automatic solution that doesn't require adding a ForceType for each file.
What I tried
MultiViews
How this works:
- Rename file.ext to file.ext.en.
- Create file.ext.gz.
- Configure Apache:
Options +MultiViews
AddEncoding x-gzip .gz
RemoveType .gz
This works almost as expected, except that it requires the original file (file.ext) not to exist, and it adds many headers that are useless to me (TCN, Content-Language and a few more) and that can't be removed (Header unset doesn't remove them).
Rewrite
How this works:
- Create file.ext.gz file.
- Configure Apache:
RewriteEngine On
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule (.*)$ $1.gz [L]
AddEncoding x-gzip .gz
<Files file.ext>
ForceType my-custom/mime-type
</Files>
This works well, but requires a ForceType for each compressed file. As I said before, I can't rely on extensions, because not all files of a certain type will be compressed.
mod_deflate
I didn't investigate it much; the problem is that if a file is too big, it's split into pieces and sent chunked, without a Content-Length header. Increasing the size of the compression buffers doesn't eliminate this problem.
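The reason is that a streaming compressor only knows its total output size once the last byte has been flushed, so a filter that compresses on the fly cannot always announce Content-Length before it starts sending. The following is a simplified Python illustration of that mechanism, not mod_deflate's actual code: it compresses a body piece by piece and frames each piece of output as an HTTP/1.1 chunk (`gzip_chunked` is a made-up name).

```python
import zlib
from typing import Iterable, Iterator


def gzip_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Compress pieces as they arrive and frame the output for
    Transfer-Encoding: chunked (simplified illustration)."""
    comp = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
    for piece in pieces:
        out = comp.compress(piece)
        if out:  # zlib may buffer input and return nothing yet
            yield b"%X\r\n" % len(out) + out + b"\r\n"
    tail = comp.flush()  # only now is the total compressed size known
    if tail:
        yield b"%X\r\n" % len(tail) + tail + b"\r\n"
    yield b"0\r\n\r\n"  # terminating zero-length chunk
```

With a pre-compressed file this problem disappears: the size of file.ext.gz is simply its size on disk.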
Is it possible at all to configure Apache to work the way I'd like?
I tried to dynamically get the Content-Type of file.ext and apply it to file.ext.gz, but I couldn't find a way to do this.

Related

Serving precompressed content with Brotli on Apache

I have installed mod_brotli on my WHM server via EasyApache 4; HTML, CSS, JS files etc. are all being compressed.
I then came across this in the official docs: https://httpd.apache.org/docs/2.4/mod/mod_brotli.html#precompressed
I have since added this to my Post VirtualHost include file in WHM (post_virtualhost_global.conf) instead of .htaccess, as I want this to be server-wide.
How can I verify that this is working and actually serving precompressed files? I haven't found anything either way; I can only confirm that Brotli compression is in use. CPU load is about the same with or without the include, so I suspect it may not be saving the compressed files for next time.
This is the virtual host include:
<IfModule mod_headers.c>
# Serve brotli compressed CSS and JS files if they exist
# and the client accepts brotli.
RewriteCond "%{HTTP:Accept-encoding}" "br"
RewriteCond "%{REQUEST_FILENAME}\.br" "-s"
RewriteRule "^(.*)\.(js|css)" "$1\.$2\.br" [QSA]
# Serve correct content types, and prevent double compression.
RewriteRule "\.css\.br$" "-" [T=text/css,E=no-brotli:1]
RewriteRule "\.js\.br$" "-" [T=text/javascript,E=no-brotli:1]
<FilesMatch "(\.js\.br|\.css\.br)$">
# Serve correct encoding type.
Header append Content-Encoding br
# Force proxies to cache brotli &
# non-brotli css/js files separately.
Header append Vary Accept-Encoding
</FilesMatch>
</IfModule>
This is my /etc/apache2/conf.2/brotli.conf:
<IfModule brotli_module>
# Compress only a few types
# https://httpd.apache.org/docs/trunk/mod/mod_brotli.html
AddOutputFilterByType BROTLI_COMPRESS text/plain text/css text/html application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript
SetOutputFilter BROTLI_COMPRESS
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-brotli
BrotliFilterNote Input instream
BrotliFilterNote Output outstream
BrotliFilterNote Ratio ratio
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' brotli
CustomLog "logs/brotli_log" brotli
</IfModule>
And this is /etc/apache2/conf.modules.d/115_mod_brotli.conf:
# Enable mod_brotli
LoadModule brotli_module modules/mod_brotli.so
So if anyone can help me figure out how to confirm whether the files are precompressed or not, that would be great.
Edit: I don't think my files are being pre-compressed. Does anyone have any further info about this? I can't find any other posts or docs on it at all.
To configure Apache to serve pre-compressed Brotli files:
Make sure Brotli-compressed files exist right next to the normal files in their respective folders. E.g. if you have a file /var/www/html/index.html, there should also be /var/www/html/index.html.br.
Add the following to the right VirtualHost configuration:
RewriteEngine On
RewriteCond %{HTTP:Accept-Encoding} br
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME}.br -f
RewriteRule ^(.*)$ $1.br [L]
<Files *.js.br>
AddType "text/javascript" .br
AddEncoding br .br
</Files>
<Files *.css.br>
AddType "text/css" .br
AddEncoding br .br
</Files>
<Files *.svg.br>
AddType "image/svg+xml" .br
AddEncoding br .br
</Files>
<Files *.html.br>
AddType "text/html" .br
AddEncoding br .br
</Files>
To check whether pre-compressed Brotli files are being served:
You can log the rewrites to see whether your rules are firing. If they are, your pre-compressed Brotli files are being served. In your virtual host, add the following:
LogLevel alert rewrite:trace6
Restart Apache, request your URL and then grep for rewrite entries in your Apache error log:
tail -f /var/log/apache2/error.log | grep -F '[rewrite'
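A complementary check that doesn't rely on the log: request the file with `Accept-Encoding: br` and compare the raw response body with the .br file on disk. If the bytes are identical, the file was almost certainly served as-is rather than recompressed on the fly (mod_brotli would normally produce different bytes at its own quality setting). A minimal Python sketch; `fetch_raw` and `body_matches_file` are hypothetical helper names, and it assumes no proxy or CDN in between re-encodes the response.

```python
from __future__ import annotations

import urllib.request


def fetch_raw(url: str, accept_encoding: str = "br") -> tuple[str | None, bytes]:
    """Fetch url and return (Content-Encoding, undecoded body)."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": accept_encoding})
    with urllib.request.urlopen(req) as resp:
        # urllib does not decode Content-Encoding, so these are the raw bytes.
        return resp.headers.get("Content-Encoding"), resp.read()


def body_matches_file(body: bytes, br_path: str) -> bool:
    """True if the served bytes are exactly the pre-compressed file on disk."""
    with open(br_path, "rb") as f:
        return f.read() == body
```

For example, fetch a CSS file from your site with fetch_raw, check that the returned encoding is "br", and pass the body together with the matching /var/www/html/....css.br path to body_matches_file.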
I'm late to the party, but from my crash course in Brotli on Apache, what the OP wants isn't possible.
What the Apache docs show is how to properly serve the files "if" they are pre-compressed, hence the text: "if they exist".
From what I gather in my search to understand this better, Apache can't actually pre-compress the files, this must be accomplished through a binary or extension which is out of Apache's scope.
What mod_brotli does for you is compress responses dynamically, on the fly, as they are sent. In the OP's case, using cPanel, if you enable mod_brotli, EasyApache 4 adds the necessary bits to compress the types listed in AddOutputFilterByType with Brotli. Again, these are generated and compressed on the fly. As far as I can tell, the compressed output isn't stored on disk; it's produced again for every request.
Enabling mod_brotli is the easy way to get Brotli, but it's better to pre-compress the files being served, as the OP wanted, because of the overhead and performance hit of literally compressing every response flowing through Apache. I ran across a blog post discussing this, and the difference between dynamic and static compression makes pre-compressed static files worth using. However, if you have a small site, or a really beefy hosting platform, dynamic compression might work just fine for you.
If I'm not mistaken, you don't even need mod_brotli enabled in order to serve the pre-compressed .br files if you can figure out a way to pre-compress them.
Here's an example of using PHP to pre-compress the files: https://github.com/kjdev/php-ext-brotli
So far, no one has answered the OP with a viable way to pre-compress files as Brotli (I'm searching for this too), but what I need to point out is that Apache doesn't do the pre-compressing, so you will have to look elsewhere if you want to serve static Brotli .br files.
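Since Apache won't create the .br (or .gz) files itself, they have to come from a build or deploy step. A minimal Python sketch of such a step, which writes a compressed copy next to each matching file and skips copies that are already newer than their source. It defaults to gzip from the standard library; for Brotli, the assumption is that the third-party `brotli` package is installed, in which case `compress=brotli.compress, suffix=".br"` should be passed. `precompress` and the extension list are illustrative choices, not anything Apache defines.

```python
from __future__ import annotations

import gzip
import os
from pathlib import Path
from typing import Callable


def precompress(
    root: str,
    suffix: str = ".gz",
    compress: Callable[[bytes], bytes] = lambda data: gzip.compress(data, compresslevel=9),
    extensions: tuple[str, ...] = (".css", ".js", ".html", ".svg"),
) -> list[Path]:
    """Write <file><suffix> next to every matching file that is new or changed."""
    written = []
    # List everything first so files created below are not revisited.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        target = path.with_name(path.name + suffix)
        if target.exists() and target.stat().st_mtime >= path.stat().st_mtime:
            continue  # compressed copy is already up to date
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(compress(path.read_bytes()))
        os.replace(tmp, target)  # atomic swap: never serve a half-written file
        written.append(target)
    return written
```

Running it from cron or a deploy hook over the document root keeps the compressed copies in step with the originals; the rewrite rules above then decide which copy each client gets.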
Just remove the reference to /etc/apache2/conf.2/brotli.conf temporarily and restart Apache. Your precompressed Brotli files should still be delivered with Brotli encoding, whereas dynamically compressed files (e.g. HTML, CSS or JS where no precompressed file exists) are no longer compressed at all.

How to setup static assets caching with apache?

I'd like to optimize caching of the static assets (.js, .css, ... files) used on our website. My goal is based on this article (https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#invalidating-and-updating-cached-responses).
In short: because these static assets tend to be updated ad hoc (sometimes weekly, sometimes twice a day, ...), I'd like to cache them with a far-future expiration and give them unique names based on their content, modification date or similar. This should allow them to be cached for a long time but updated as soon as a change occurs.
Is this technique supported by Apache2? Or is there some middleware that handles generating fingerprints (unique asset names) and updating references to them in the HTML files (which won't be cached at all)?
We use LAMP stack on our host.
Thank you in advance
There are a number of techniques, some better than others. One good one is to have the following configuration:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)\.(\d+)\.(bmp|css|cur|gif|ico|jpe?g|js|png|svgz?|webp|webmanifest)$ $1.$3 [L]
</IfModule>
This allows URLs of the form /i/filename.1433499948.gif, while the file actually read from disk is just /i/filename.gif (parts 1 and 3 of the filename). The RewriteCond limits the rewrite to requests where the fingerprinted name doesn't exist as a real file.
This Apache vhost/.htaccess stanza comes from the H5BP filename-based_cache_busting.conf file, and there are other examples of good practices in that repository.
That, combined with the H5BP mod_expires config, means you can always trivially refresh users' local browser caches just by updating the reference to a file with a new name.
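The rule only handles the serving side; something still has to put the version number into the URLs written into the HTML. A minimal Python sketch of both halves, assuming a build or templating step calls a helper like the hypothetical `fingerprinted_url`. Because the rule matches `\d+`, the fingerprint must be digits only, so the file's modification time is used here rather than a hex content hash.

```python
import re
from pathlib import Path, PurePosixPath

# Same pattern as the RewriteRule above: name.<digits>.ext -> name.ext
FINGERPRINTED = re.compile(
    r"^(.+)\.(\d+)\.(bmp|css|cur|gif|ico|jpe?g|js|png|svgz?|webp|webmanifest)$"
)


def strip_fingerprint(url_path: str) -> str:
    """Serving side: map /i/filename.1433499948.gif back to /i/filename.gif."""
    return FINGERPRINTED.sub(r"\1.\3", url_path)


def fingerprinted_url(url_path: str, file_on_disk: str) -> str:
    """Build side: insert the file's mtime (whole seconds) before the extension."""
    version = int(Path(file_on_disk).stat().st_mtime)
    p = PurePosixPath(url_path)
    return str(p.with_name(f"{p.stem}.{version}{p.suffix}"))
```

Every time the file changes, its mtime and therefore its URL change, so a far-future Expires header on the asset is safe as long as the HTML referencing it is not cached.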
You can enable mod_mime and mod_expires in Apache and use the following snippet:
<FilesMatch "\.(png|jpe?g|gif|ico|mp4|wmv|mov|mpeg|css|map|woff2?|eot|svg|ttf|js|json|pdf|csv)$">
ExpiresActive on
ExpiresDefault "access plus 2 weeks"
</FilesMatch>
Or set the corresponding headers from PHP:
session_cache_limiter('none');
header('Cache-Control: max-age='.(60*60*24*7)); // one week
header('Expires: '.gmdate('D, d M Y H:i:s', time()+60*60*24*7).' GMT'); // one week
See also the related question: How to get the browser to cache images, with php?
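Cache-Control and Expires describe the same lifetime in two ways, so they are easiest to keep in sync when both are computed from one number. A small Python sketch of that arithmetic (`cache_headers` is a made-up name). HTTP dates are always written in GMT, which `email.utils.formatdate(..., usegmt=True)` produces directly.

```python
from __future__ import annotations

import time
from email.utils import formatdate


def cache_headers(max_age: int, now: float | None = None) -> dict[str, str]:
    """Cache-Control and Expires for the same lifetime, in seconds."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"max-age={max_age}",
        # HTTP-date format, e.g. "Thu, 08 Jan 1970 00:00:00 GMT"
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

When both headers are present, HTTP/1.1 caches use max-age and ignore Expires; Expires mainly matters for old HTTP/1.0 clients and proxies.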

Control Cache Expiration For Custom File On Server

We have files that we serve from our server to native Windows applications. The files can change every minute, so we need to ensure the user is downloading the latest file.
We've found that users on portable Wi-Fi devices tend to get served an older file, so we are changing the expiration settings for certain files in our server's .htaccess file.
We serve a custom file type (.ebc), and the file contents are sent over HTTP as plain text. In this case, should we use ExpiresByType text/ebc "access 1 minute"?
Will changing the .htaccess cache control affect Portable Wifi caching, or will it only affect browsers?
Should the mod_expires / mod_headers code come before redirects and rewrites? I've found before that certain .htaccess operations should come before others (such as placing redirects at the top of the file).
Here's my code:
RedirectMatch (?i)^/wp-content/uploads/2014/10/a.exe http://www.website.com/wp-content/uploads/2014/10/b.exe
## EXPIRES CACHING Should we place this before mod_rewrite or after? ##
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/ebc "access 1 minute"
</IfModule>
## EXPIRES CACHING ##
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
There are 3 questions here, so I'll attempt to answer them.
We serve a custom file type (.ebc) and the files contents are sent over HTTP as plain text. In this case should we use ExpiresByType text/ebc access 1 minute
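That should be fine, as long as the .ebc extension is actually mapped to the text/ebc MIME type on your server (for example with AddType text/ebc .ebc); ExpiresByType only matches the Content-Type that is actually sent.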
Will changing .htaccess cache control affect Portable Wifi caching or will this only affect browsers
I don't really know what "Portable Wifi caching" is. These headers are aimed at browsers and at any HTTP caches (proxies) along the way that honour them. If a custom application is downloading these files, it could be implementing its own caching, in which case these headers might be ignored.
Should mod_expires / mod_headers code occur before redirects and rewrites?
I'd put it before the redirects, but only for readability. These directives are not like RewriteRules; I think they are evaluated separately, so their position relative to the rewrites shouldn't matter.
Additionally, caching is difficult, and once a file has left your server it can be hard to force an update. Different browsers behave in different ways, and I've come across configurations that work in one place and not in another.
I would additionally consider two other approaches to what you're attempting.
Firstly, don't cache your files at all:
<FilesMatch "\.ebc$">
Header set Cache-Control no-cache
Header set pragma no-cache
</FilesMatch>
Secondly, think about implementing a cache-busting mechanism. If the file is linked from somewhere, try to make sure that link changes each time the file changes (normally a query string with a timestamp suffices). You then obviously need to make sure that whatever contains the link isn't being cached either.
An easier solution I used in the past was adding a parameter to the downloadable files.
For example, if the file you're serving is
http://www.domain.tld/file.pdf
then you can create the following link:
http://www.domain.tld/file.pdf?d486dFyg
The question mark and the (random) text after it will be ignored when serving the file, but they guarantee that the user always downloads the latest version, because the URL is different every time.
The downloaded file on the user's computer will just be file.pdf so absolutely no downside.
EDIT: I noticed some reference to WordPress in your question, which is PHP, so you can use the rand() function to append the random part: http://php.net/manual/en/function.rand.php
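The same idea expressed in Python, as a sketch rather than the answer's PHP: append a token to the URL's query string, using `&` if a query already exists. `bust` is a hypothetical helper; the random token comes from the standard `secrets` module.

```python
from __future__ import annotations

import secrets
from urllib.parse import urlsplit, urlunsplit


def bust(url: str, token: str | None = None) -> str:
    """Append a cache-busting token to the query string of url."""
    token = token or secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe chars
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))
```

A random token defeats caching entirely. Passing the file's modification time instead (e.g. `str(int(os.path.getmtime(path)))`), as suggested in the previous answer, keeps the URL stable until the file actually changes.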

Manual alternative to mod_deflate

Say I don't have mod_deflate compiled into apache, and I don't feel like recompiling right now. What are the downsides to a manual approach, e.g. something like:
AddEncoding x-gzip .gz
RewriteCond %{HTTP_ACCEPT_ENCODING} gzip
RewriteRule ^/css/styles.css$ /css/styles.css.gz
(Note: I'm aware that the specifics of that RewriteCond need to be tweaked slightly)
Another alternative would be to forward everything to a PHP script, which gzips and caches everything on the fly. On every request, it would compare timestamps with the cached version and return that if it's newer than the source file. With PHP, you can also overwrite the HTTP Headers, so it is treated properly as if it was GZIPed by Apache itself.
Something like this might do the job for you:
.htaccess
RewriteEngine On
RewriteRule ^(css/styles.css)$ cache.php?file=$1 [L]
cache.php:
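<?php
// Resolve the requested path inside this script's directory and refuse
// anything outside it (e.g. ?file=../../etc/passwd); adjust $root as needed
$root = realpath(__DIR__);
$file = realpath($root.'/'.($_GET['file'] ?? ''));
if ($file === false || strpos($file, $root.DIRECTORY_SEPARATOR) !== 0 || !is_file($file)) {
    http_response_code(404);
    exit;
}
cache($file);
// Return cached or raw file (autodetect)
function cache($file)
{
    // Regenerate cache if the source file is newer
    if (!is_file($file.'.gz') or filemtime($file.'.gz') < filemtime($file)) {
        write_cache($file);
    }
    // Send the original file's type; PHP would otherwise default to text/html
    $types = ['css' => 'text/css', 'js' => 'application/javascript'];
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    header('Content-Type: '.($types[$ext] ?? 'application/octet-stream'));
    header('Vary: Accept-Encoding');
    // If the client supports GZIP, send compressed data
    if (!empty($_SERVER['HTTP_ACCEPT_ENCODING']) and strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false) {
        $path = $file.'.gz';
        header('Content-Encoding: gzip');
    } else { // Fallback to static file
        $path = $file;
    }
    header('Content-Length: '.filesize($path));
    readfile($path);
    exit;
}
// Save a GZIPed copy of the file next to the original
function write_cache($file)
{
    copy($file, 'compress.zlib://'.$file.'.gz');
    clearstatcache(); // make filesize() see the freshly written .gz
}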
You will need write permissions for apache to generate the cached versions. You can modify the script slightly to store cached files in a different place.
This hasn't been extensively tested and it might need to be modified slightly for your needs, but the idea is all there and should be enough to get you started.
There doesn't seem to be a big performance difference between the manual and automatic approaches. I did some apache-bench runs with automatic and manual compression and both times were within 4% of each other.
The obvious downside is that you'll have to manually compress the CSS files before deploying. The other thing you might want to make very sure is that you've got the configurations right. I couldn't get wget to auto-decode the css when I tried the manual approach and ab reports also listed the compressed data size instead of uncompressed ones as with automatic compression.
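One way to make sure the configuration is right is to inspect the response headers of a request sent with `Accept-Encoding: gzip`. A heuristic Python sketch (`check_gzip_response` is a made-up name) that flags the usual server-side mistakes of a manual setup: no Content-Encoding, the type of the .gz file leaking through instead of the original type, and no Vary header. It checks headers only; how a particular client such as wget or ab treats the body is a separate question.

```python
from __future__ import annotations


def check_gzip_response(headers: dict[str, str]) -> list[str]:
    """List likely problems with a manually gzipped response's headers."""
    h = {name.lower(): value for name, value in headers.items()}
    problems = []
    if h.get("content-encoding", "").strip().lower() not in ("gzip", "x-gzip"):
        problems.append("Content-Encoding is not gzip, so clients will not decode the body")
    ctype = h.get("content-type", "").split(";")[0].strip().lower()
    if ctype in ("", "application/gzip", "application/x-gzip"):
        problems.append("Content-Type should be the original file's type, not %r" % (ctype or "missing"))
    if "accept-encoding" not in h.get("vary", "").lower():
        problems.append("Vary lacks Accept-Encoding, so shared caches may mix up the variants")
    return problems
```

An empty list means the headers look like those of an automatically compressed response; the headers can come from any HTTP client, for instance `dict(resp.headers.items())` with urllib.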
You could also use mod_ext_filter and pipe things through gzip. In fact, it's one of the examples:
# mod_ext_filter directive to define the external filter
ExtFilterDefine gzip mode=output cmd=/bin/gzip
<Location /gzipped>
# core directive to cause the gzip filter to be
# run on output
SetOutputFilter gzip
# mod_headers directive to add
# "Content-Encoding: gzip" header field
Header set Content-Encoding gzip
</Location>
The advantage of this is that it's really, really easy… The disadvantage is that there will be an additional fork() and exec() on each request, which will obviously have a small impact on performance.

Combining deflate and minify - am i creating overhead?

I minify my CSS and JS files on the fly with Google Code's Minify. I have also set my .htaccess to use deflate on all my CSS and JS files, the reason being that some JS files (like shadowbox and tinymce) reference other JS files in their code.
So I'm compressing with Apache's deflate, and Minify also compresses some JS and CSS files with gzip. Am I creating overhead by doing this, first gzipping (Minify) and then running everything through zlib again (deflate)? Or will Apache's deflate skip the files that are already gzipped, given the headers set by Minify? Does anyone have experience with this?
Minifying + deflating/gzipping works great together.
I use mod_rewrite for that purpose: I have pre-built all the CSS/JS files in two versions, the original and a .css.gz/.js.gz version.
The browser just sends a .js/.css request; the server checks whether the .js.gz/.css.gz file exists and returns the gzipped content if certain conditions are met.
So it doesn't matter whether the JS/CSS files are loaded on the fly from JS (for example by your shadowbox or tinymce).
Basically, like this
RewriteEngine On
RewriteBase /
# Check the browser's Accept-Encoding (this pattern requires both gzip and deflate to be listed)
RewriteCond "%{HTTP:Accept-Encoding}" "gzip.*deflate|deflate.*gzip"
# Check that the file name ends with css or js
RewriteCond %{REQUEST_FILENAME} "\.(css|js)$"
# Check that the .gz file exists (and is not empty)
RewriteCond %{REQUEST_FILENAME}.gz -s
# Rewrite to .js.gz or .css.gz
RewriteRule ^.*$ %{REQUEST_URI}.gz [L]
# Set the response headers
<FilesMatch "\.js\.gz$">
AddEncoding gzip .gz
ForceType "text/javascript"
</FilesMatch>
<FilesMatch "\.css\.gz$">
AddEncoding gzip .gz
ForceType "text/css"
</FilesMatch>
gzip and zlib both use the same DEFLATE compression algorithm, and data that has already been compressed will not compress well a second time.
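This is easy to see by compressing something twice. A small Python demonstration using a made-up, deterministic CSS-like sample (`gzip_twice` is just a helper for the demo): the first pass shrinks the text substantially, while the second pass gains essentially nothing, since compressed output looks close to random and gzip adds its own header on top.

```python
from __future__ import annotations

import gzip
import random


def gzip_twice(data: bytes) -> tuple[int, int, int]:
    """Return the sizes of (original, gzipped once, gzipped twice) in bytes."""
    once = gzip.compress(data)
    twice = gzip.compress(once)
    return len(data), len(once), len(twice)


# Deterministic CSS-like sample text, invented for this demo.
rng = random.Random(0)
words = ["color", "margin", "padding", "display", "block", "none", "0", "1px", "{", "}", ";"]
sample = " ".join(rng.choice(words) for _ in range(20000)).encode()
print(gzip_twice(sample))
```

So double compression costs CPU time without saving bytes, which is why it's worth making sure each response is compressed only once.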
Minify doesn't serve the files through Apache, so there's no double-encoding.
With the DEFLATE filter, Apache gzips the requested file on-the-fly each time. Minify gzips the file on the first request then sends the pre-gzipped cached version for later requests.
Being PHP-based, it trades performance for flexibility and ease of maintenance, but if you put a proxy cache in front of it, it'll perform as well as S.Mark's configuration.