mod_expires in Apache .htaccess - apache

I am learning about Apache and its various modules, and currently I am confused about mod_expires. What I have read so far is that with this module we can set a far-future expiry header on static files so that the browser need not request them each time.
What confuses me is this: if someone changes a CSS/JS or image file in the meantime, how will the browser come to know about it, since we have already told the browser that the file is not going to change for, say, the next year?
Thanks in advance

It may not be practical for all content your HTTP server provides, but you can simply change the name of a file when you update it; since the URL is new, the browser will download the fresh content.
Sometimes, for websites with less traffic, it is more practical to set the cache lifetime to a much lower value.
An expiration of 365 days should always be used with caution, and the fact that you can set an expiration of 1 year does not mean you always have to do it. In other words, do not fall prey to premature optimization.
A good example of content suited to a 1-year expiration is country flags, which are unlikely to change. Also, be aware that with a simple refresh of a page, the browser can discard the local cache and download the content again from the origin.
A good and easy way of testing all this is to use Firefox with Firebug. With this extension, you can analyze requests and responses.
Here you can find the RFC specifications.
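For reference, a minimal .htaccess sketch of this approach, assuming mod_expires is enabled on the server (the MIME types and lifetimes below are only example values):
<IfModule mod_expires.c>
  ExpiresActive On
  # Long-lived static images
  ExpiresByType image/png "access plus 1 year"
  ExpiresByType image/jpeg "access plus 1 year"
  # Shorter lifetimes for CSS/JS so edits reach returning visitors sooner
  ExpiresByType text/css "access plus 1 week"
  ExpiresByType application/javascript "access plus 1 week"
</IfModule>
With something like this in place, the question above still applies: a changed CSS file will not be re-fetched until its lifetime runs out, unless its URL changes.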

Related

Force a page to be re-downloaded, rather than fetched from browser cache - Apache Server

I've made a minor text change to our website. The change is minor in that it's only a couple of words, but the change in meaning is quite significant.
I want all users (both new and returning) to see the new text rather than any cached version. Is there a way I can force a user (new or returning) to re-download the page, rather than fetch it from their browser cache?
The site is a static HTML site hosted on a LAMP server.
This depends totally on how your web server has caching set up but, in short, if the page is already cached then you cannot force a download again until the cache expires. So you'll need to look at the cache headers in your browser's developer tools to see how long it's been set for.
Caching gives huge performance benefits and, in my opinion, really should be used. However, that does mean you'll have difficulty forcing a refresh, as you've discovered.
In case you're interested in how to handle this in the future, there are various cache-busting methods, all of which basically involve changing the URL to fool the browser into thinking it's a different resource, forcing a fresh download.
For example, you can add a version number to a resource, so that instead of requesting index.html the browser asks for index2.html, but that could mean renaming the file and every reference to it each time.
You can also set up rewrites in Apache using regular expressions so that index[0-9]*.html actually loads index.html. That way you don't need multiple copies of the file; you can refer to it as index2.html or index3.html or even index5274.html, and Apache will always serve the contents of index.html.
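As a rough sketch, assuming the pages sit in the document root's .htaccess and mod_rewrite is enabled (the pattern is illustrative; [0-9]+ rather than [0-9]* keeps index.html itself out of the rule):
RewriteEngine On
# Any "versioned" name such as index2.html or index5274.html serves the real index.html
RewriteRule ^index[0-9]+\.html$ index.html [L]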
These methods, though a little complicated to maintain unless you have an automated build process, work very well for resources that users don't see. For example css style sheets or JavaScript.
Cache busting techniques work less well for HTML pages themselves for a number of reasons: 1) they create unfriendly urls, 2) they cannot be used for default files where the file name itself is not specified (e.g. the home page) and 3) without changing the source page, your browser can't pick up the new URLs. For this reason some sites turn off caching for the HTML pages, so they are always reloaded.
Personally I think not caching HTML pages is a lost opportunity. For example, visitors often visit a site's home page, then try a few pages, going back to the home page in between. If you have no caching, those pages will be reloaded each time even though they are unlikely to have changed in between. So I prefer to set a short expiry and just live with the fact that I can't force a refresh during that time.
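A sketch of that compromise, assuming mod_headers is available (five minutes is just an example lifetime):
<FilesMatch "\.html$">
  # Short cache: pages are reused while someone browses around, but refresh within minutes
  Header set Cache-Control "max-age=300, public"
</FilesMatch>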

Best apache caching scheme for files that get refreshed on specific date

On my website I have some google map overlay images that get updated once per year. So they are candidates for browser caching.
What is the best way of specifying the caching? E.g. if I use ...
Header set Cache-Control "max-age=31536000, public"
(31536000 secs = 1 year)
As far as I understand it, this is no use: if somebody accesses the website one day before I update the images, they will have to wait one year before they see the correct new images. Can I specify a date when the images will expire, rather than a duration? Or is there a better way to handle this?
Also, I can't seem to get the regular expression to work. Can anyone see what could be wrong with this code in my .htaccess file (I want to match all .png images in a specific directory)? ...
<FilesMatch "\/overlayDirectoty\/[^\.]+\.png$">
Header set Cache-Control "max-age=31536000, public"
</FilesMatch>
I'm on shared Linux/Apache hosting (goDaddy).
UPDATE
The image files have an average size of 580 bytes. But many will be downloaded as the user pans and zooms the map (there are 12000 of them in total).
UPDATE
I've just discovered this. If I know I am going to update the images on 1st January every year at the earliest, will this work? ...
Header set Expires "Sun, 1 Jan 2014 00:00:00 GMT"
In this case I would set the image to never expire, and then when you do change it, use a different file name.
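A sketch of what that could look like with mod_headers, assuming the directive lives in an .htaccess file inside the overlay directory itself (FilesMatch matches only file names, not paths, which may be why the regex above fails):
<FilesMatch "\.png$">
  # One year is the conventional "effectively never expires" lifetime
  Header set Cache-Control "max-age=31536000, public"
</FilesMatch>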
I'm not sure you need to do anything at all.
Is Apache already responding with a Last-modified header? (It should be, for static png files.) If so, then browsers should be sending an If-modified-since header with all subsequent requests; this will cause your server to reply with an HTTP 304 instead of actually re-sending the image. (The ETag header acts similarly.)
When you do update your files--that one time per year--the file update time will change and all subsequent requests will get the new version of the png file.
The down side of this approach is that every browser will still make a request to your server for each image it's trying to render--so you'll see many 304s in your logs. But that 304 traffic is (generally) quite minimal when compared with multi-kilobyte image files.
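If you want to make that revalidation behaviour explicit rather than relying on heuristic caching, a one-line sketch with mod_headers (assuming it is available on the shared host):
<FilesMatch "\.png$">
  # "no-cache" means: store the response, but revalidate with the server before reusing it
  Header set Cache-Control "no-cache"
</FilesMatch>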
First you have to consider whether you want to take advantage of revalidation using ETags.
http://en.wikipedia.org/wiki/HTTP_ETag
You can either let the browser cache the image entirely, or let the client make a small conditional request that is only used to check whether the ETag is still the same. The ETag is calculated by Apache on the fly from the file's modification time and size.
In other words: if your image changes, its ETag changes.
If the ETag does not change, the client uses the cached version and does not download the file again.
You will have the overhead of that revalidation request, but it is minimal, and I would recommend this approach.
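If you want the ETag components to be explicit rather than relying on the server default (which has varied between Apache versions), a one-line sketch:
# Build ETags from modification time and size, which is portable across servers and filesystems
FileETag MTime Size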
Nevertheless, for completeness, let's discuss the other possibilities:
Change the filename
When the filename changes, the browser will re-fetch the file. A common practice is to append a so-called "cache breaker" to the file as a query string. If you generate your src URLs in PHP, just append something like the modification timestamp so the URL looks like
image.jpg?UNIX_TIMESTAMP
Use a caching rule that expires at a point of your choosing
I think this is hard to maintain because you nail yourself down to a fixed point at which the image refreshes and cannot update it earlier. But then, changing the filename always remains as a measure of last resort.
You could set the header dynamically using a scripting language and calculate the expiry yourself, but this will not be as performant as letting the web server deliver the file. There are also combinations like mod_xsendfile, but that would be absolute overkill for your needs.
No, I think you are not looking at the bigger picture.
mod_expires lets you specify caching lifetimes relative to the current time (access) or to the file's modification time. If you make sure your file modification time is correct, just set the lifetime in reference to that.
Read up here:
http://httpd.apache.org/docs/2.4/mod/mod_expires.html
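A minimal sketch of that, assuming mod_expires is enabled and the PNGs' modification times are kept accurate (the one-year value mirrors the question):
<IfModule mod_expires.c>
  ExpiresActive On
  # Expire one year after the file was last modified, not after it was accessed
  ExpiresByType image/png "modification plus 1 year"
</IfModule>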
But what you could really do is just set the header yourself!
It's only a header, you know, and no one forces you to use mod_expires.
Just set the header manually using mod_headers!
Read up here:
http://httpd.apache.org/docs/2.4/mod/mod_headers.html
I will not cover this with examples because I really think you should use ETags.

Asset cache time on Shopify servers

When editing a custom .css.liquid file that is not automatically set up by Shopify and cannot be placed in a page (since it does not have access to Shopify's Liquid templating system), I find that it can take hours for the CDNs to start serving up the new version of said .css.liquid file.
In the future, how can I cut down on this waiting time? Currently, here's what I think is going on:
Most asset URLs have some number appended to them, like so: path/to/filename?270. It could be that this number represents the last time the file was served, a version number, or some other flag telling the CDN which version of the file to serve. If so, I can just create a template to grab this info myself (though I'd prefer not to have to take that additional step).
The CDN servers' cache times are high, and they will not serve a new representation of the file until the cached data has expired. If so, there's not much I can do about this.
Please let me know if it's one of the above situations, or if it's something else.
I've had success with re-saving the layout file that calls the .css.liquid file.
For example: edit something and save it up to the server, then edit it back again and save that up to the server as well.
This seems to increment the query string on the path to the css file.

random numbers after a ? on an image - is this for far-future expires headers?

I was poking around a random site on the internet and noticed that the images have numbers appended to them: icons-16.png?1292032550
I've heard of people optimising websites with far-future Expires headers. The catch is that if someone then changes content that doesn't change very often, the cache won't get refreshed, so the new image won't be re-downloaded into anyone's cache unless the filename changes. Is that what these numbers are for?
Yes, the intent is probably to force a refresh of the browser cache. However, I do not recommend this approach:
Many proxies (and possibly some browsers) simply will not cache anything with a query string, regardless of Cache-Control headers. You're shooting yourself in the foot if you include a superfluous query string – you'll needlessly consume your own bandwidth sending images that should be cached, but aren't.
Depending on how you configure your server, user agents will periodically make a request for cached resources with an If-Modified-Since and/or If-None-Match header. If the client's cache is up to date, the server responds with 304 Not Modified and stops; otherwise it responds with a normal 200 OK and sends the new content. You do not have to change a resource's file name in order for client caches to be updated when the resource changes. Trying to get clever with a query string only serves to defeat caching mechanisms.
That said, if you do optimize caching by setting an Expires date a year out (and if the Last Modified date of the resource is long ago), user agents may check for updates infrequently. If this is unacceptable to you, you have two options: either reduce the amount of time before the resource expires (so that the browser will issue a GET request and you can respond with 304 or 200 as appropriate), or use "URL fingerprinting," where a random token is included in the path, instead of in the query string. For example:
/img/a03f/image.png
instead of
/img/image.png?a03f
This way, your resources are still cached by proxies. You'll probably want to look into using mod_rewrite to allow you to include a token in the path. Of course, you need to be able to change all references to this URL whenever you change the resource.
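A rough mod_rewrite sketch of such fingerprinting, assuming the rule lives in the document root's .htaccess and the token is a hex string in a fixed path segment (the directory layout and pattern are illustrative):
RewriteEngine On
# Strip the fingerprint segment so /img/a03f/image.png is served from /img/image.png
RewriteRule ^img/[0-9a-f]+/(.+)$ /img/$1 [L]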
For further reading, I highly recommend Google's page speed best practices, specifically the section on optimizing caching.
Yes. One way to get around caches is to append an otherwise inconsequential query parameter, such as a time stamp, to a URL.
This is a way to compensate for a web server that does not generate correct ETag headers when your content changes. Using an ETag whose value is something like the file's hash is a better approach for telling the browser that content has changed and must be reloaded.

Web site migration and differences in firebug time profiles

I have a PHP website under Apache (at enginehosting.com). I rewrote it in ASP.NET MVC and installed it at discountasp.net. I am comparing response times with Firebug.
Here is the old time profile:
Here is the new one:
Basically, I get longer response times with the new site (not obvious in the pictures I posted here, but on average yes, sometimes with a big difference, like 2 s for the old site and 9 s for the new one), and images appear more progressively (as opposed to almost instantly with the old site). Moreover, the time profile is completely different. As you can see in the second picture, a lot of time is spent on DNS lookup, and this happens for images only (the raw HTML is even faster on the new site). I thought that once a URL had been resolved, the result would apply to all subsequent requests...
Also note that since I still want to keep my domain pointed at the old location while I'm testing, my new site is under a weird URL like myname.web436.discountasp.net. Could that be the reason? Otherwise, what else could it be?
If this is more a serverfault question, feel free to move it.
Thanks
Unfortunately you're comparing apples and oranges here. The test results shown are of little use because you're trying to compare the performance of an application written using a different technology AND on a different hosting company's shared platform.
We could speculate any number of reasons why there may be a difference:
ASP.NET MVC first hit and lag due to warmup and compilation
The server that you're hosting on at DiscountASP may be under heavy load
The server at EngineHosting may be under utilised
The bandwidth available at DiscountASP may be under contention
You perhaps need to profile and optimise your code
...and so on.
But until you benchmark both applications on the same machine you're not making a proper scientific comparison and are just clutching at straws.
Finally, ignore the myname.web436.discountasp.net URL; that's just a host name/header that DiscountASP and many other hosts add so you can test your site while you're waiting for a domain to be transferred/registered, or for DNS propagation of the real domain name to complete. You usually can't use the IP address of your site because most shared hosts share a single IP address across multiple sites on the same server and rely on HTTP Host headers.