Browser Resource Caching (Images, Javascript, CSS) - apache

I am having trouble with caching on a website that I'm working on updating. Many of the resources I've updated (javascript, css, and image files) appear to be cached locally by browsers. What I can't understand is why, or how to resolve this short of renaming everything I've edited (which isn't a very attractive solution).
The server is generating the following HTTP headers:
Date: Fri, 06 Jan 2012 00:09:32 GMT
Server: Apache/2.2.16 (Amazon)
X-Powered-By: PHP/5.3.5
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 3636
Connection: close
Content-Type: text/html; charset=UTF-8
Based on what I understand from cache control directives, resources shouldn't be cached. Despite this, I'm getting images, css, and javascript files that are not updating after I upload changes.
Any ideas on how I can resolve this or best practices?

After making little headway on this, I decided to place all of my cacheable resources in versioned folders (i.e. css-1.2.1, js-1.2.1, etc.). Any time I update the site I just increment the version numbers (I keep all version numbers synchronized). This is suboptimal in terms of cache efficiency, but it means I don't have to track individual version numbers for every resource. Given that I'm not updating the site every four hours, it just means users have to download a fresh resource set whenever I update the site.
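The versioned-folder scheme described above can be sketched as a small helper that builds asset URLs from one site-wide version string. This is only an illustration in Python: SITE_VERSION and versioned_url are hypothetical names, and the folder layout simply follows the css-1.2.1 / js-1.2.1 example in the text.

```python
# Hypothetical site-wide version, bumped once per deployment.
SITE_VERSION = "1.2.1"


def versioned_url(kind, filename, version=SITE_VERSION):
    """Return a URL such as /css-1.2.1/main.css for a cacheable asset."""
    return "/{}-{}/{}".format(kind, version, filename)


if __name__ == "__main__":
    print(versioned_url("css", "main.css"))
    print(versioned_url("js", "app.js"))
```

Because the URL changes with every release, browsers treat each release's files as new resources, so the old cached copies are simply never requested again.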

Related

Mobile UC Browser blocking CDN assets

I have a website set up to serve static files via a CDN (I'm using CloudFront). When I open my website on mobile in UC Browser, it loads only the HTML page and does not accept the files (images, .js, .css, etc.) delivered by the CDN.
However, when I turn off the CDN and serve the files directly, my site works without any problem. The problem occurs when ad-block mode is on in UC Browser.
About 80% of my site's visitors use UC Browser, and I don't want to turn off the CDN. I've also seen some websites that use a CDN and work fine in UC Browser with ad-block mode turned on.
Here are the headers returned by the CDN for a js file:
age: 237446
cache-control: public, max-age=2592000
content-encoding: gzip
content-type: application/javascript
date: Mon, 24 Sep 2018 10:51:26 GMT
last-modified: Sat, 15 Sep 2018 08:56:57 GMT
server: Apache/2.4.18 (Ubuntu)
status: 200
vary: Accept-Encoding
via: 1.1 b5fbf0f0a39109b04a248c9bb51b5d6a.cloudfront.net (CloudFront)
x-amz-cf-id: IoyQ_74Qgo9nqBRJN0YHkTZk-atMfRscw50y27LxPiMuUG0VeQUxHA==
x-cache: Hit from cloudfront

What is the difference between "always" and "onsuccess" in Apache's Header config?

I have a website where virtual hosts are defined in /etc/apache2/sites-enabled/ with a header being set with the always option like this:
Header always set X-Frame-Options DENY
If I now set the same header using .htaccess in the web site's root folder, but without always then the header is returned twice in the server's response.
The setting in .htaccess (amongst others):
Header set X-Frame-Options DENY
The server's response:
HTTP/1.1 200 OK
Date: Mon, 02 May 2016 16:02:29 GMT
Server: Apache/2.4.10 (Debian)
X-Frame-Options: DENY
Cache-Control: no-cache, no-store, must-revalidate, private
Pragma: no-cache
X-XSS-Protection: 1
X-Content-Type-Options: nosniff
Last-Modified: Mon, 02 May 2016 15:03:42 GMT
Accept-Ranges: bytes
Content-Length: 0
X-Frame-Options: DENY
X-XSS-Protection: 1
X-Content-Type-Options: nosniff
Cache-Control: no-cache, no-store, must-revalidate, private
Pragma: no-cache
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
The Apache docs say that without the always option the default value of onsuccess is used. But they also say that "... the default value of onsuccess does not limit an action to responses with a 2xx status code..." (http://httpd.apache.org/docs/current/en/mod/mod_headers.html#header).
But if I don't add always, then error pages like 301s and 404s will not have the header set. On the other hand, if I do add always then the headers might be set twice if I do use the default value (i.e. onsuccess) in .htaccess. As the docs state: "repeating this directive with both conditions makes sense in some scenarios because always is not a superset of onsuccess with respect to existing headers". Setting headers twice is not always valid for an HTTP response, see https://stackoverflow.com/a/4371395/641481. So I want to avoid it, naturally.
My question now is: When exactly should I use onsuccess (i.e. the default value) and when always? I must admit that even after reading through the Apache docs a couple of times I do not exactly understand this. Pragmatically it seems that always using always leads to the correct/expected behaviour.
I also do not understand why Apache writes the header twice if it is set in always and onsuccess. It seems wrong to me, but there must be a good reason for this, since I assume the Apache-devs know a lot more than I do about HTTP ;-)
This is only a partial answer since it does not cover the onsuccess condition. It is based on experience with Apache 2.4.7 running on Ubuntu 14. Hope it helps you along.
The plain set action, without a condition, overrides any always variant: the value given to Header set is the only one delivered. If the same directive appears in a directory-level (file-system based) .htaccess file, it takes precedence over the same directive in the virtual host definition file for that directory. If the always condition is additionally specified, any other occurrence of the same directive, whether equal or different, is added to the server response instead of overwriting or replacing it.
The onsuccess condition, which I unfortunately do not have the time to explore now, is probably handled similarly to always.
We use Adobe Experience Manager with a Dispatcher [caching] module for our Apache webserver. Adobe recently changed the code below. Essentially I believe you may need to use the "expr=" syntax to make sure the value is not already set. That should eliminate the duplicates.
Here's the reference code from Adobe:
Original Config: Header always append X-Frame-Options SAMEORIGIN
New Config: Header merge X-Frame-Options SAMEORIGIN "expr=%{resp:X-Frame-Options}!='SAMEORIGIN'"
When I inquired, Adobe gave me the following reasons. Thanks Adobe.
Explanation:
Using "merge" instead of "append" prevents the entry's value from being added to the header more than once.
expr=expression: The directive is applied if and only if expression evaluates to true. Details of expression syntax and evaluation are documented in the ap_expr documentation. The "expr" looks in the response headers from the server (Publisher Application Server) to make sure that they do not already include SAMEORIGIN. This ensures that SAMEORIGIN is not duplicated in the response header sent back to the requesting client.
It's required because testing found that when AEM included this header, Apache would duplicate the SAMEORIGIN value even with the merge option. Apache merges properly when it sets the header itself, but when the first header was set by AEM, outside of the Apache instance, it gets weird (and requires the extra expression).
It also appears that they do not use "always" with the merge+expr syntax. Perhaps to also work around an Apache weirdness.
PS... remember to change "SAMEORIGIN" for "DENY" in your case.
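As a possible way to picture the duplicate headers from the question: the mod_headers documentation describes the condition argument (onsuccess or always) as selecting which internal table of response headers a directive operates on. The toy Python model below is only a sketch of that idea, not Apache's code. The class and method names are made up, and the assumption that both tables are written to the response is chosen to match the duplicated output shown in the question.

```python
class ResponseHeaders:
    """Toy model of two independent header tables (not Apache's real code)."""

    def __init__(self):
        self.tables = {"onsuccess": [], "always": []}

    def set(self, name, value, condition="onsuccess"):
        table = self.tables[condition]
        # "set" replaces an existing header, but only inside the chosen table.
        table[:] = [(n, v) for (n, v) in table if n.lower() != name.lower()]
        table.append((name, value))

    def emit(self):
        # Assumption of this model: both tables end up in the response.
        return self.tables["always"] + self.tables["onsuccess"]


if __name__ == "__main__":
    headers = ResponseHeaders()
    headers.set("X-Frame-Options", "DENY", condition="always")  # virtual host
    headers.set("X-Frame-Options", "DENY")                      # .htaccess
    print(headers.emit())
```

In this model, a set in one table never sees the value stored in the other, which is why using the same condition in both the virtual host and .htaccess yields a single header.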

Enable caching of css and js files in Apache

Using Apache 2.4 on Debian 8.2, I am trying to enable caching of all css and js files. Caching of images works fine; that is, the browser receives a 304 status, so it doesn't download again. But I cannot get caching of other files working.
I use this inside a virtual host file:
<IfModule mod_expires.c>
<FilesMatch "\.(jpe?g|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 week"
</FilesMatch>
</IfModule>
The expires module is enabled. I restarted Apache, cleared the browser cookies, etc. No success.
The response for a gif image, from browser developer tools:
Cache-Control:max-age=604800
Connection:Keep-Alive
Date:Wed, 25 Nov 2015 21:37:50 GMT
ETag:"4174a-4e69c97fbf080"
Expires:Wed, 02 Dec 2015 21:37:50 GMT
Keep-Alive:timeout=5, max=100
Server:Apache/2.4.10 (Debian)
The response for a css file:
Accept-Ranges:bytes
Cache-Control:max-age=604800
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:135
Content-Type:text/css
Date:Wed, 25 Nov 2015 21:37:50 GMT
ETag:"5116-525639d271c78-gzip"
Expires:Wed, 02 Dec 2015 21:37:50 GMT
Keep-Alive:timeout=5, max=99
Last-Modified:Wed, 25 Nov 2015 20:50:52 GMT
Server:Apache/2.4.10 (Debian)
Vary:Accept-Encoding
It looks like the Expires header is set correctly, but the browser keeps requesting the file (200 OK).
I tried with Chrome and Firefox.
Summary:
1.) When I follow links inside the web site, the browser uses the cached files. But when I press F5, it re-downloads the css and js files, but not the images. Images give 304. That is fine.
2.) When I press Ctrl-F5, naturally enough, all files are re-downloaded. That's fine too.
3.) So the problem is how to enable caching (just like images) for other files. Why is apache discriminating between images and other files? I didn't put anything special to images in the config files.
Q: How to properly enable caching of css and js files?
Another Q: Is there a special HTTP header that tells the browser never to request the file again? The reason is that even a request to check whether the file has been modified takes 100-200 ms, which is too much. I am sure the files will not be modified. And if they are modified, I can easily put a version string at the end of the css file name, such as myFile.css?v=1.1. So I hope there is a way to stop sending requests completely.
SOLVED
First, there is a bug in apache as mentioned in the answer below.
Second, there was a misunderstanding on my part. I guess this is how modern browsers work:
1.) Follow links inside a web site: No request is sent, even to check if the file has been modified.
2.) F5: Send a request. If file is not modified then the server responds 304.
3.) Ctrl+F5: Full download.
The behavior about F5 does not make sense to me. Anyway.
In case anybody needs it, here is a working solution that I put into virtual host file:
# Strip the "-gzip" suffix from incoming If-None-Match values
RequestHeader edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""
# Append "-gzip" to outgoing ETag values that lack it
Header edit "ETag" "^\"(.*[^g][^z][^i][^p])\"$" "\"$1-gzip\""
LoadModule expires_module /usr/lib/apache2/modules/mod_expires.so
ExpiresActive On
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
ExpiresDefault "access plus 4 weeks"
</FilesMatch>
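To illustrate what the RequestHeader edit rule above does to an incoming validator, here is the same regular expression applied in Python. It is a sketch for experimenting with the pattern only: strip_gzip_suffix is a made-up name, and the sample value is the ETag from the css response shown earlier.

```python
import re


def strip_gzip_suffix(if_none_match):
    """Remove a trailing -gzip from a quoted ETag, leaving other values alone."""
    return re.sub(r'^"(.*)-gzip"$', r'"\1"', if_none_match)


if __name__ == "__main__":
    print(strip_gzip_suffix('"5116-525639d271c78-gzip"'))
    print(strip_gzip_suffix('"4174a-4e69c97fbf080"'))
```

With the suffix removed, the value the browser sends back can match the ETag the server computes for the uncompressed file, which is what allows a 304 to be returned.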
Turn off ETags; they don't play well in Apache when gzip is on for 304s.
See here: Apache is not sending 304 response (if mod_deflate and AddOutputFilterByType is enabled)
As images are already compressed, they are typically not gzipped, which is why they work.
ETags are not that useful in my opinion in their current implementation (see my blog here for a more in depth discussion as to why) so, coupled with above bug, I turn them off.
For your second question just set a long expiry.
As discussed in the comments below there are three scenarios:
Normal browsing: the cache is used without contacting the server, and a 304 only comes into play when a cached item has passed its expiry but has not changed (in which case it is marked valid again for the same expiry period).
F5 or the Refresh button: this is an explicit action by the user to confirm that the page and all its resources are still valid, so they are all double-checked (even those still in cache and still valid according to the Expires header) and 304s are sent for those that haven't changed. It does not mean "just redownload anything which has expired but leave cached items alone as they are still valid", as you think it should. Personally I think the current browser behavior makes sense and your method would be confusing to end users. While some sites version assets like images, css and JavaScript, so that rechecking is a waste of time, not all sites do this.
Ctrl+F5: this is a force refresh. It means "Ignore cache and download everything". It's rarely needed except by developers who change files requested on development servers.
Hope that all makes sense.
Edit 12 May 2016: Looks like Firefox is bringing in the functionality you actually want: https://bitsup.blogspot.ie/2016/05/cache-control-immutable.html?m=1
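As a rough sketch of the "long expiry" advice combined with the immutable directive from the linked post, the following Python helper picks a Cache-Control value depending on whether the URL carries a version (such as myFile.css?v=1.1). The function name and the one-year default are assumptions for illustration, and browser support for immutable varies.

```python
def cache_control_for(url_is_versioned, max_age=365 * 24 * 3600):
    """Return a Cache-Control value for a static asset."""
    if not url_is_versioned:
        # Unversioned URLs may change in place, so ask for revalidation.
        return "no-cache"
    # Versioned URLs never change content, so a long lifetime is safe.
    return "public, max-age={}, immutable".format(max_age)


if __name__ == "__main__":
    print(cache_control_for(True))
    print(cache_control_for(False))
```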
If nothing else seems to work, don't forget to turn off the "Disable cache" option in your browser's DevTools!

Why browser does not send "If-None-Match" header?

I'm trying to download (and hopefully cache) a dynamically loaded image in PHP. Here are the headers sent and received:
Request:
GET /url:resource/Pomegranate/resources/images/logo.png HTTP/1.1
Host: pome.local
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: PHPSESSID=fb8ghv9ti6v5s3ekkmvtacr9u5
Response:
HTTP/1.1 200 OK
Date: Tue, 09 Apr 2013 11:00:36 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.14 ZendServer/5.0
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Disposition: inline; filename="logo"
ETag: "1355829295"
Last-Modified: Tue, 18 Dec 2012 14:44:55 Asia/Tehran
Keep-Alive: timeout=5, max=98
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: image/png
When I reload the URL, the exact same headers are sent and received. My question is: what should I send in my response in order to see the If-None-Match header in the subsequent request?
NOTE: I believe these headers were working just fine not long ago. I can't be sure, but I think browsers have changed and no longer send the If-None-Match header (I used to see that header). I'm testing with Chrome and Firefox and both fail to send the header.
Same Problem, Similar Solution
I've been trying to determine why Google Chrome won't send If-None-Match headers when visiting a site that I am developing. (Chrome 46.0.2490.71 m, though I'm not sure how relevant the version is.)
This is a different — though very similar — answer than the OP ultimately cited (in a comment regarding the Accepted Answer), but it addresses the same problem:
The browser does not send the If-None-Match header in subsequent requests "when it should" (i.e., server-side logic, via PHP or similar, has been used to send an ETag or Last-Modified header in the first response).
Prerequisites
Using a self-signed TLS certificate, which turns the lock red in Chrome, changes Chrome's caching behavior. Before attempting to troubleshoot an issue of this nature, install the self-signed certificate in the effective Trusted Root Store, and completely restart the browser, as explained at https://stackoverflow.com/a/19102293 .
1st Epiphany: If-None-Match requires an ETag from the server, first
I came to realize rather quickly that Chrome (and probably most or all other browsers) won't send an If-None-Match header until the server has already sent an ETag header in response to a previous request. Logically, this makes perfect sense; after all, how could Chrome send If-None-Match when it's never been given the value?
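The exchange described above can be sketched as a tiny server-side decision function in Python. This is a simplified illustration, not a complete implementation: respond is a hypothetical name, the comparison is a plain string match, and splitting on commas ignores the corner case of commas inside a quoted tag.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    headers = {"ETag": current_etag}
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]
    if current_etag in candidates or "*" in candidates:
        # The client's cached copy is current: no body is sent.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    first = respond({}, '"v1"', b"payload")
    print(first[0])  # first visit: no validator available yet
    second = respond({"If-None-Match": first[1]["ETag"]}, '"v1"', b"payload")
    print(second[0])
```

The first response is what hands the browser its validator; only requests after that can carry If-None-Match.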
This led me to look at my server-side logic (particularly, how headers are sent when I want the user-agent to cache the response) in an effort to determine why the ETag header was not being sent in response to Chrome's very first request for the resource. I had made a calculated effort to include the ETag header in my application logic.
I happen to be using PHP, so @Mehran's (the OP's) comment jumped out at me (he/she says that calling header_remove() before sending the desired cache-related headers solves the problem).
Candidly, I was skeptical about this solution, because a) I was pretty sure that PHP wouldn't send any headers of its own by default (and it doesn't, given my configuration); and b) when I called var_dump(headers_list()); just before setting my custom caching headers in PHP, the only header set was one that I was setting intentionally just above:
header('Content-type: application/javascript; charset=utf-8');
So, having nothing to lose, I tried calling header_remove(); just before sending my custom headers. And much to my surprise, PHP began sending the ETag header all of a sudden!
2nd Epiphany: gzipping the response changes its hash
It then hit me like a bag of bricks: by specifying the Content-type header in PHP, I was telling NGINX (the webserver I'm using) to GZIP the response once PHP hands it back to NGINX! To be clear, the Content-type that I was specifying was on NGINX's list of types to gzip.
For thoroughness, my NGINX GZIP settings are as follows, and PHP is wired-up to NGINX via php-fpm:
gzip on;
gzip_min_length 1;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript image/svg+xml;
gzip_vary on;
I pondered why NGINX might remove the ETag that I had sent in PHP when a "gzippable" content-type is specified, and came up with a now-obvious answer: because NGINX modifies the response body that PHP passes back when NGINX gzips it! This makes perfect sense; there is no point in sending the ETag when it's not going to match the response used to generate it. It's pretty slick that NGINX handles this scenario so intelligently.
I don't know if NGINX has always stripped the ETag header from responses that it compresses on the fly, but that seems to be what's happening here.
UPDATE: I found commentary that explains NGINX's behavior in this regard, which in turn cites two valuable discussions regarding this subject:
NGINX forum thread discussing the behavior.
Tangentially-related discussion in a project repository; see the comment Posted on Jun 15, 2013 by Massive Bird.
In the interest of preserving this valuable explanation, should it happen to disappear, I quote from Massive Bird's contribution to the discussion:
Nginx strips the Etag when gzipping a response on the fly. This is
according to spec, as the non-gzipped response is not byte-for-byte
comparable to the gzipped response.
However, NGINX's behavior in this regard might be considered slightly flawed in that the same spec
... also says there is a thing called weak Etags (an Etag value
prefixed with W/), and tells us it can be used to check if a response
is semantically equivalent. In that case, Nginx should not mess with
it. Unfortunately, that check never made it into the source tree [citation is now filled with spam, sadly].
I'm unsure as to NGINX's current disposition in this regard, and specifically, whether or not it has added support for "weak" Etags.
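To make the strong versus weak distinction from the quote concrete, here is a short Python sketch of the two comparison rules as I understand them from the HTTP conditional-request specification: a weak comparison ignores the W/ prefix, while a strong comparison requires that neither tag is weak. The function names are hypothetical.

```python
def opaque_tag(etag):
    """Return the quoted part of an ETag, dropping a leading W/ if present."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def weak_match(a, b):
    """Weak comparison: the quoted parts match, weak or not."""
    return opaque_tag(a) == opaque_tag(b)


def strong_match(a, b):
    """Strong comparison: neither tag is weak and both are identical."""
    a, b = a.strip(), b.strip()
    return not a.startswith("W/") and not b.startswith("W/") and a == b


if __name__ == "__main__":
    print(weak_match('W/"1355829295"', '"1355829295"'))
    print(strong_match('W/"1355829295"', 'W/"1355829295"'))
```

A weak tag therefore lets a gzipped and an uncompressed variant be treated as equivalent for cache validation, even though they are not byte-for-byte identical.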
So, What's the Solution?
So, what's the solution to getting ETag back into the response? Do the gzipping in PHP, so that NGINX sees that the response is already compressed, and simply passes it along while leaving the ETag header intact:
ob_start('ob_gzhandler');
Once I added this call prior to sending the headers and the response body, PHP began sending the ETag value with every response. Yes!
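The same idea, compressing in the application so that the validator describes the bytes actually sent, can be sketched in Python. This is an analogue for illustration rather than a translation of the PHP above: gzip_with_etag is a hypothetical helper, the SHA-1 based tag is just one possible choice, the mtime argument needs Python 3.8 or later, and a real handler would first check that the client sent Accept-Encoding: gzip.

```python
import gzip
import hashlib


def gzip_with_etag(body):
    """Compress body and derive an ETag from the compressed bytes."""
    # mtime=0 keeps the gzip header constant, so equal bodies give equal bytes.
    compressed = gzip.compress(body, mtime=0)
    etag = '"{}"'.format(hashlib.sha1(compressed).hexdigest())
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "ETag": etag,
        "Vary": "Accept-Encoding",
    }
    return headers, compressed


if __name__ == "__main__":
    headers, payload = gzip_with_etag(b"console.log('hello');" * 10)
    print(headers["ETag"], len(payload))
```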
Other Lessons Learned
Here are some interesting tidbits gleaned from my research. This information is rather handy when attempting to test a server-side caching implementation, whether in PHP or another language.
Chrome, and its Developer Tools "Net" panel, behave differently depending on how the request is initiated.
If the request is "made fresh", e.g., by pressing Ctrl+F5, Chrome sends these headers:
Cache-Control: no-cache
Pragma: no-cache
and the server responds 200 OK.
If the request is made with only F5, Chrome sends these headers:
Pragma: no-cache
and the server responds 304 Not Modified.
Lastly, if the request is made by clicking on a link to the page you're already viewing, or placing focus into Chrome's address bar and pressing Enter, Chrome sends these headers:
Cache-Control: no-cache
Pragma: no-cache
and the server responds 200 OK (from cache).
While this behavior is a bit confusing at first if you don't know how it works, it's ideal, because it allows one to test every possible request/response scenario very thoroughly.
Perhaps most confusing is that Chrome automatically inserts the Cache-Control: no-cache and Pragma: no-cache headers in the outgoing request when in fact Chrome is acquiring the responses from its cache (as evidenced in the 200 OK (from cache) response).
This experience was rather informative for me, and I hope others find this analysis of value in the future.
Your response headers include Cache-Control: no-store, no-cache; these prevent caching.
Remove those values (I think must-revalidate, post-check=0, pre-check=0 could/should be kept – they tell the browser to check with the server if there was a change).
And I would stick with Last-Modified alone (if the changes to your resources can be detected using this criterion alone) – ETag is a more complicated thing to handle (especially if you want to deal with it in your PHP script yourself), and Google PageSpeed/YSlow advise against this one too.
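A minimal Python sketch of the Last-Modified approach follows. It assumes the file's modification time is available as a Unix timestamp; the sample value 1355829295 is the number that appears as the ETag in the question, which I am assuming is that timestamp. Note that HTTP dates are expressed in GMT, unlike the "Asia/Tehran" value in the response above. The function names are hypothetical.

```python
from email.utils import formatdate, parsedate_to_datetime


def http_date(timestamp):
    """Format a Unix timestamp as an HTTP date, which is always in GMT."""
    return formatdate(timestamp, usegmt=True)


def not_modified(if_modified_since, mtime):
    """Return True when the client's copy is at least as new as the file."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: fall back to a full response
    if since.tzinfo is None:
        return False  # no usable time zone: do not guess
    return int(mtime) <= since.timestamp()


if __name__ == "__main__":
    stamp = 1355829295
    print(http_date(stamp))
    print(not_modified(http_date(stamp), stamp))
```

When not_modified returns True, the server can answer 304 without a body; otherwise it sends 200 with a fresh Last-Modified header.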
Posting this for future me...
I was having a similar problem: I was sending an ETag in the response, but the HTTP client wasn't sending an If-None-Match header in subsequent requests (which was strange, because it had been the day before).
It turns out I was using http://localhost:9000 for development (which didn't use If-None-Match). After switching to http://127.0.0.1:9000, Chrome [1] automatically started sending the If-None-Match header in requests again.
Additionally, ensure Devtools > Network > Disable Cache [ ] is unchecked.
Chrome: Version 71.0.3578.98 (Official Build) (64-bit)
[1] I can't find anywhere that this is documented. I'm assuming Chrome was responsible for this logic.
Similar problem
I was trying to get the browser to send a conditional GET request with an If-None-Match header, having supplied a proper ETag header, but to no avail in any browser I tried.
After a lot of trial and error I realized that browsers treat GET and POST to the same path as the same cache candidate. Thus the GET with a proper ETag was effectively cancelled by an immediate POST to the same path with Cache-Control: "no-cache, private", even though the POST was sent with X-Requested-With: "XMLHttpRequest".
Hope this might be helpful to someone.
This was happening to me because I had set the cache size to be too small (via Group Policy).
It didn't happen in Incognito, which is what made me realize this might be the case.
Fixing that resolved the issue.
This happened to me for two reasons:
1.) My server didn't send an ETag response header. I updated my Jetty web.xml to return ETags by adding the following:
<init-param>
<param-name>etags</param-name>
<param-value>true</param-value>
</init-param>
2.) The URL I called was for an xml file. When I changed it to an html file, Chrome started sending the "If-None-Match" header!
I hope it helps someone.

How to get mod_python site to allow clients to cache selected image content?

I have a small dynamic site implemented in mod_python. I inherited this, and while I have successfully made relatively minor changes to its content and logic, with HTTP caching I'm out of my depth. The site works fine already, so this isn't "the usual question" about how to disable caching for a dynamic site.
My problem is there is one large banner image on each page (the same image from same URL on each page) which accounts for ~90% of site bandwidth but which so far as I can tell isn't being cached; as I browse the site every time I click to a new page (or back to a previously visited one) there it is downloading it yet again.
If I wget the banner's image URL (to see the headers) I see:
$ wget -S http://example.com/site/gif?name=banner.gif
--2012-04-04 23:02:38-- http://example.com/site/gif?name=banner.gif
Resolving example.com... 127.0.0.1
Connecting to example.com|127.0.0.1|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 04 Apr 2012 22:02:38 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Location: gif.py
Vary: negotiate
TCN: choice
Set-Cookie: <blah blah blah>
Connection: close
Content-Type: image/gif
Length: unspecified [image/gif]
Saving to: `gif?name=banner.gif'
and the code which is serving it up isn't much more than
req.content_type = 'image/gif'
req.sendfile(fullname)
where fullname is a file-path munged from the request's name parameter.
My question is: is there some quick fix along the lines of setting an Expires: or Vary: field in the image's response which will result in clients being less keen to repeatedly download it?
The site is hosted on Ubuntu 10.04 and doesn't have any non-default apache mods enabled other than rewrite.
I note that most (not all) of the site pages' headers themselves do contain
Pragma: no-cache
Cache-Control: no-cache
Expires: -1
Vary: Accept-Encoding
(and the original site author has clearly thought about this as no-cache is applied selectively to non-static content pages). I don't know enough about caching to know whether this somehow poisons the included .gif IMG into being reloaded every time too though.
I don't know if my answer can help you or not, but I post it anyway.
Instead of serving image files from within the Python application, you can create another virtual host within Apache (on the same server) just to serve static and image files. In your Python application, you can embed the image like this:
<img src="http://img.yoursite.com/banner.gif" alt="banner" />
With a separate virtual host, you can add various headers to various content types using mod_headers, or add another caching layer for your static files.
Hope this helps.
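If a separate virtual host is not an option, a possible quick fix inside the existing handler is to set caching headers before the file is sent. The sketch below is an assumption-laden illustration: add_cache_headers is a hypothetical helper that works on any dict-like object, and it relies on mod_python exposing response headers as req.headers_out. The one-week lifetime is an arbitrary choice.

```python
import time
from email.utils import formatdate


def add_cache_headers(headers_out, max_age=7 * 24 * 3600, now=None):
    """Mark a response as cacheable for max_age seconds."""
    now = time.time() if now is None else now
    headers_out["Cache-Control"] = "public, max-age={}".format(max_age)
    headers_out["Expires"] = formatdate(now + max_age, usegmt=True)
    return headers_out


# Assumed usage inside the existing mod_python handler, before sendfile:
#   add_cache_headers(req.headers_out)
#   req.content_type = 'image/gif'
#   req.sendfile(fullname)

if __name__ == "__main__":
    print(add_cache_headers({}, now=0))
```

Only the banner handler would call this, so the no-cache headers on the dynamic pages stay as the original author intended.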