mod_pagespeed is not sending expected output with extended cache - apache

For testing purposes, I have this in my Apache configuration:
<Directory "/home/http">
...
<FilesMatch "\.(html|htm)$">
Header unset Etag
Header set Cache-control "max-age=0, no-cache"
</FilesMatch>
<FilesMatch "\.(jpg|jpeg|gif|png|js|css)$">
Header unset Etag
Header set Cache-control "public, max-age=10"
</FilesMatch>
</Directory>
This basically tells Apache to give static assets a cache lifetime of 10 seconds. Again, this is for testing and demonstration purposes.
I test it out by requesting the file directly:
$ wget -O - --save-headers localhost/mod_pagespeed_example/images/Puzzle.jpg
Cache-control: public, max-age=10
which works fine. But then I try to load the page with mod_pagespeed and extend_cache enabled
$ wget -O - --save-headers 'localhost/mod_pagespeed_example/extend_cache.html?ModPagespeed=on&ModPagespeedFilters=extend_cache'
<img src="images/Puzzle.jpg"/>
$ wget -O - --save-headers 'localhost/mod_pagespeed_example/extend_cache.html?ModPagespeed=on&ModPagespeedFilters=extend_cache'
<img src="http://localhost/mod_pagespeed_example/images/xPuzzle.jpg.pagespeed.ic.hgbHsZe0IN.jpg"/>
This is all fine and dandy. The first request isn't rewritten because mod_pagespeed still needs to load the resource into its cache, but from then on it correctly replaces the src of the img tag with the cached, hashed version.
However, this only persists until max-age. So, with it set to 10 seconds, the page keeps pointing to http://localhost/mod_pagespeed_example/images/xPuzzle.jpg.pagespeed.ic.hgbHsZe0IN.jpg, but after 10 seconds it reverts to images/Puzzle.jpg, and only switches back to the rewritten version once the resource has been re-fetched into the cache.
Is this expected behavior? I would think that pagespeed would check the hash after max-age, and if it's the same it wouldn't bother changing it back to the original value, but instead continue serving the cached file.
This is somewhat concerning. If I set max-age to something more useful, say 60 minutes, I can keep updating these asset files and ensure that my updates are seen in a timely manner. However, if users visit the site once per day, that is longer than the max-age, so they will always be served the original file rather than the cached version.

This is expected behavior. As you mentioned, the reason is that the resource has expired in cache, so we need to re-check it to make sure it is still the same, and we do not want to block the user's request while we check all the sub-resources.
Note that one solution to this is ModPagespeedLoadFromFile. It checks the file's last-modified time on disk, so it can re-validate a resource even after it has expired in cache.
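For reference, a minimal sketch of that directive. The URL prefix and filesystem path below are assumptions pieced together from the paths in your question; adjust them to your actual layout:
# Assumed mapping: lets mod_pagespeed read these resources straight from
# disk and use their mtime, rather than re-fetching them over HTTP when
# the cache TTL expires
ModPagespeedLoadFromFile "http://localhost/mod_pagespeed_example/" "/home/http/mod_pagespeed_example/"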

Related

Netlify Headers Cache Control For Static Assets

Is it possible to set Cache-Control only for static assets like images, fonts, CSS and JS?
Here's my workaround
[[headers]]
for = "/*" # This defines which paths this specific [[headers]] block will cover.
[headers.values]
Cache-Control = "public, max-age=604800"
It pretty much works, but not as I expected. The site seems to use the old version even when I update the content.
You've now said that the browser should cache every file, including index.html, for a week, for anyone who has visited your site. So, you'll see the old copy of your site for that long.
This is probably not what you want. A better way to do it is to create several header rules, one for each type:
[[headers]]
for = "*.js" # js files should be set this way
[headers.values]
Cache-Control = "public, max-age=604800"
[[headers]]
for = "*.css" # css files too
[headers.values]
Cache-Control = "public, max-age=604800"
However, you may not want to do even this. Netlify sets the caching very intentionally to a max-age of 0, but it still allows content to be cached AND enables atomic rollbacks and deploys. Here are the details about that: https://www.netlify.com/blog/2017/02/23/better-living-through-caching/

apache2 DirectoryIndex change does not bypass cached index

I am trying to make sure that visitors to my website see the latest version. To this end I have written a script that renames the appropriate files (and the requests for them) to append a fresh version number at build time. This includes the index file; let's call it index-v123.html.
I have uploaded this built source and pointed my apache2 server to the new index file by including
DirectoryIndex index-v123.html
in my apache2.conf. I have restarted it, and when viewing the website in chrome incognito mode or on hard refresh I can see that all the new files are loaded and the website works as expected.
My issue is that in my normal browser, when I visit the URL, I still load up a cached version of index.html. Clearly changing the DirectoryIndex didn't convince the client to go to the new index file like I'd hoped...
So can I do anything to make this happen?
(Also may be relevant: I am running a progressive web app using Polymer 2.0, with a service-worker.js that is built automatically by polymer build.)
This turned out to be a service worker issue: service-worker.js was being cached on the client side, and hence was serving outdated content as if the client were in offline mode. It could only be updated by deregistering the worker. The solution was to set max-age=0 on the service worker on the apache2 server side:
<FilesMatch "^(index\.html|service-worker\.js)$">
FileETag None
Header unset ETag
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "Wed, 11 Jan 1984 05:00:00 GMT"
</FilesMatch>
Was surprised this wasn't better highlighted in the polymer build/production docs somewhere. For reference, the Google primer on service workers says:
The caching headers on the service worker script are respected (up to 24 hours) when fetching updates. We're going to make this opt-in behaviour, as it catches people out. You probably want a max-age of 0 on your service worker script.

Clear web browser cache programmatically

I am working on a website with PHP on the backend and AngularJS on the frontend, served via Apache 2.4.
My problem is that when I update the website to a new version, some users don't see the latest modifications, so I added this .htaccess to force the cache to be refreshed every hour, but it doesn't work as I expected.
FileETag None
<IfModule mod_headers.c>
Header unset ETag
Header set Cache-Control "max-age=3600, must-revalidate, private"
</IfModule>
Could you give me the right cache configuration to force the browsers to get the last update whenever a new version is available?
Within your build process, you could append a query parameter to your static files such as JS/CSS, like app.js?1476109496 (where the value is a unique reference such as the deployment epoch, a commit hash or similar). This causes browsers to request a new version without you needing to mess with your .htaccess.
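If intermediate caches in your path ignore or mishandle query strings, a common alternative is to put the version in the filename itself and map it back to the real file with mod_rewrite. A minimal, hypothetical sketch (the app.1476109496.js naming scheme is an assumption, and mod_rewrite must be enabled):
# Hypothetical versioned-filename scheme: serve app.1476109496.js (and any
# similarly named js/css file) from app.js on disk, so each deploy can embed
# a fresh version in the filename rather than a query string
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^(.+)\.\d{10}\.(js|css)$ $1.$2 [L]
</IfModule>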

Apache2 no cache specific file only

I'm running an Amazon EC2 instance with Ubuntu 14.04 and Apache2, with no PHP or other server-side scripting--just a static content site. I used this tutorial to get to the point I am at now with the Apache config (linked below):
https://www.digitalocean.com/community/tutorials/how-to-configure-apache-content-caching-on-ubuntu-14-04
I want to have a directive (if that is the nomenclature) that tells Apache not to cache one single, specific file, while still handling everything else as already configured. I'm no computer whizz here--just learning. Is there a way to do this? Currently I have made a new directory inside my images folder called "no-cache", where the image I do not want cached lives.
I tried adding a second Location tag below the first one with "CacheDisable on" inside it; however, this is not supported. I also tried a Directory tag, but that does not work with the current configuration either.
Thanks in advance!
/etc/apache2/sites-enabled/000.default.conf
The link you provided is a bit confusing since it mentions so many different types of caching.
When dealing with web servers and caching, what you usually mean is sending caching instructions (using HTTP headers) that define how the browser should handle caching, to improve performance for visitors. This is the last item discussed in that link of yours, despite being the most common. The first section talks about Apache caching files itself to improve its own performance, which is much less common.
If client-side caching using mod_expires is what you mean, then you can control this with Location sections:
#Allow long term assets to be cached for 6 months as they are versioned and should never change
<Location /assets/libraries/ >
ExpiresDefault A15724800
Header set Cache-Control "public"
</Location>
#Do not cache these files
<Location /login >
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
</Location>
I've a more detailed blog on this here: https://www.tunetheweb.com/performance/http-performance-headers/caching/.
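For your specific case, a minimal sketch along the same lines, assuming the image really does live under /images/no-cache/ as you described:
# Assumed path: stop clients caching anything under the no-cache image folder
<Location /images/no-cache/ >
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
</Location>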

Enable caching of css and js files in Apache

Using Apache 2.4 on Debian 8.2, I am trying to enable caching of all css and js files. Caching of images works fine; that is, the browser receives a 304 status, so it doesn't download again. But I cannot get caching of other files working.
I use this inside a virtual host file:
<IfModule mod_expires.c>
<FilesMatch "\.(jpe?g|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 week"
</FilesMatch>
</IfModule>
The expires module is enabled. I did restart apache, cleaned browser cookies, etc. No success.
The response for a gif image, from browser developer tools:
Cache-Control:max-age=604800
Connection:Keep-Alive
Date:Wed, 25 Nov 2015 21:37:50 GMT
ETag:"4174a-4e69c97fbf080"
Expires:Wed, 02 Dec 2015 21:37:50 GMT
Keep-Alive:timeout=5, max=100
Server:Apache/2.4.10 (Debian)
The response for a css file:
Accept-Ranges:bytes
Cache-Control:max-age=604800
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:135
Content-Type:text/css
Date:Wed, 25 Nov 2015 21:37:50 GMT
ETag:"5116-525639d271c78-gzip"
Expires:Wed, 02 Dec 2015 21:37:50 GMT
Keep-Alive:timeout=5, max=99
Last-Modified:Wed, 25 Nov 2015 20:50:52 GMT
Server:Apache/2.4.10 (Debian)
Vary:Accept-Encoding
It looks like the Expires header is set correctly, but the browser keeps re-requesting the file (200 OK).
I tried with Chrome and Firefox.
Summary:
1.) When I follow links inside the web site, the browser uses the cached files. But when I press F5, it re-downloads the CSS and JS files, while images return 304 and are not re-downloaded. That is fine.
2.) When I press Ctrl-F5, naturally enough, all files are re-downloaded. That's fine too.
3.) So the problem is how to enable caching for the other files, just like images. Why is Apache discriminating between images and other files? I didn't configure anything special for images.
Q: How to properly enable caching of CSS and JS files?
Another Q: is there a special HTTP header that tells the browser never to request the file again? The reason is that even a request to check whether the file has been modified takes 100-200 ms, which is too much. I am sure the files will not be modified, and if they are, I can easily append a version string to the CSS file, such as myFile.css?v=1.1. So I hope there is a way to stop these requests completely.
SOLVED
First, there is a bug in apache as mentioned in the answer below.
Second, there was a misunderstanding on my part. I guess this is how modern browsers work:
1.) Follow links inside a web site: No request is sent, even to check if the file has been modified.
2.) F5: Send a request. If file is not modified then the server responds 304.
3.) Ctrl+F5: Full download.
The behavior about F5 does not make sense to me. Anyway.
In case anybody needs it, here is a working solution that I put into virtual host file:
# Strip mod_deflate's "-gzip" suffix from the incoming If-None-Match so the
# conditional request can match the on-disk ETag...
RequestHeader edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""
# ...then put the suffix back on the outgoing ETag
Header edit "ETag" "^\"(.*[^g][^z][^i][^p])\"$" "\"$1-gzip\""
LoadModule expires_module /usr/lib/apache2/modules/mod_expires.so
ExpiresActive On
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
ExpiresDefault "access plus 4 weeks"
</FilesMatch>
Turn off ETags; they don't play well in Apache when gzip is on, as far as 304s are concerned.
See here: Apache is not sending 304 response (if mod_deflate and AddOutputFilterByType is enabled)
As images are already compressed, they are typically not gzipped, which is why they work.
ETags are not that useful, in my opinion, in their current implementation (see my blog for a more in-depth discussion as to why), so, coupled with the above bug, I turn them off.
For your second question just set a long expiry.
As discussed in the comments below there are three scenarios:
Normal browsing - caching is used, and a 304 only happens when a cached item is rechecked after its expiry and turns out to be unchanged (in which case it is marked valid again for the same expiry).
F5 or the Refresh button. This is an explicit action by the user to confirm that the page and all its resources are still valid, so all of them are double-checked (even those still in cache and still valid according to the Expires header), and 304s are sent for those that haven't changed. It does not mean "just re-download anything that has expired but leave cached items alone because they are still valid", as you think it should. Personally I think the browsers' current implementation makes sense, and your method would be confusing to end users. While some sites version assets like images, CSS and JavaScript, so rechecking is a waste of time, not all sites do this.
Ctrl+F5. This is a force refresh. It means "Ignore cache and download everything". It's rarely needed except by developers who change files requested on development servers.
Hope that all makes sense.
Edit 12 May 2016: Looks like Firefox is bringing in the functionality you actually want: https://bitsup.blogspot.ie/2016/05/cache-control-immutable.html?m=1
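For when that lands, a hedged sketch of what it could look like. It assumes your assets are versioned (so filenames change on every update) and only helps in browsers that support the immutable extension:
# Assumes versioned filenames: supporting browsers will skip revalidation
# entirely for these files, even on F5
<FilesMatch "\.(js|css)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>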
If nothing else seems to work, don't forget to turn off the "Disable cache" option in your browser's devtools!