Cache Control Question - apache

If I set this for cache control on my site:
Header unset Pragma
FileETag None
Header unset ETag
# 1 YEAR
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "public"
Header set Expires "Thu, 15 Apr 2010 20:00:00 GMT"
Header unset Last-Modified
</FilesMatch>
# 2 HOURS
<FilesMatch "\.(html|htm|xml|txt|xsl)$">
Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>
# CACHED FOREVER
# MOD_REWRITE TO RENAME EVERY CHANGE
<FilesMatch "\.(js|css)$">
Header set Cache-Control "public"
Header set Expires "Thu, 15 Apr 2010 20:00:00 GMT"
Header unset Last-Modified
</FilesMatch>
...then what if I update any CSS, image, or other files? Will the user's browser still use the cached version until it expires (a year later)?
Thanks

Your css, js and image files will never be cached, as you are setting a date in the past.
I assume this is a mistake and that you intended to set a date a year in the future. This is one reason to favour max-age over Expires: max-age is relative to the time of the response, so it cannot silently end up in the past.
If that was the case, then your images will be cached for up to a year. A cache is allowed to drop an entry at any time, for example to clean out less frequently used entries and reduce the disk space the cache takes up.
There are two possible approaches to reducing the risk of staleness. One is to set a much shorter expiry time and use ETags and modification dates, so that once the expiry time has passed the server can send a 304 if nothing has changed, and so needs to send only a few bytes rather than the entire entity.
The other is to keep the expiry at a year, but to change the URI used when you change. This can be useful in the case of e.g. a large file that is used on almost every page on your site. It requires that you change all references to that resource when it does change (because you are essentially changing to use a new resource), which can be fiddly and therefore is only advised as an optimisation in a few hotspot cases. If a file ignores query attributes (e.g. it's just served straight from a file) the browser won't know that, hence you could use something like /scripts/bigScript.js?version=1.2.3 and then change to /scripts/bigScript.js?version=1.2.4 when you change bigScript.js. This will have no effect on bigScript.js, but will cause the browser to get a new file, as for all it knows it's a completely different resource.
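As a rough sketch of the versioned-URL idea above: the helper below appends a version parameter to a URL. Deriving the version from a hash of the file content (rather than a hand-maintained number such as 1.2.3) is an assumption made here for illustration, and the function name versioned_url is hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, content: bytes) -> str:
    """Return url with a 'version' query parameter derived from content."""
    # A short content hash stands in for a manually bumped version number:
    # it changes whenever the file changes, and only then.
    version = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Drop any existing 'version' parameter and keep the rest of the query.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "version"]
    query.append(("version", version))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Every page that references the file would emit the URL through such a helper, so a changed file automatically gets a URL the browser has never cached.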

Yes, a response with an expiration date in the future will be considered as fresh until the expiration date:
The Expires entity-header field gives the date/time after which the response is considered stale. […]
The presence of an Expires header field with a date value of some time in the future on a response that otherwise would by default be non-cacheable indicates that the response is cacheable, unless indicated otherwise by a Cache-Control header field (section 14.9).
Note that an expiration date more than one year in the future may be interpreted as never expires:
To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
So if a cache has the response stored, it will probably take the response from the cache even without revalidating the cached response before sending it.
Now if you change a resource that is already stored in caches and still fresh, there is no way to invalidate them:
[…] although they might continue to be "fresh," they do not accurately reflect what the origin server would return for a new request on that resource.
There is no way for the HTTP protocol to guarantee that all such cache entries are marked invalid. For example, the request that caused the change at the origin server might not have gone through the proxy where a cache entry is stored.
This is why such never-expiring resources use a unique version number in the URL (e.g. style-v123.css) that is changed with each update. This is also what I recommend in this case.
By the way, declaring the response with Cache-Control as public doesn’t do anything in this case. This is only used when a response that required authorization should be cacheable:
public – Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache. (See also Authorization, section 14.8, for additional details.)
For further information on HTTP caching:
HTTP 1.1 specification – Caching in HTTP
Mark Nottingham’s Caching Tutorial


Apache: difference between "Header always set" and "Header set"?

Questions
What is the difference between Header always set and Header set in Apache?
That is, what does the always keyword change about the circumstances under which the header is set?
Should I always set my headers using always?
Is there any reason not to?
Background
I've seen...
Header always set X-Frame-Options DENY
...as well as...
Header always set Access-Control-Allow-Headers "*"
...and I sometimes hear that the presence of the always keyword ensures that the header is properly set, or that it's simply better to include the always keyword in general. However, I have never found a clear, definitive answer for why that is the case.
I've already checked the Apache docs for mod_headers, which only briefly mention always:
When your action is a function of an existing header, you may need to specify a condition of always, depending on which internal table the original header was set in. The table that corresponds to always is used for locally generated error responses as well as successful responses. Note also that repeating this directive with both conditions makes sense in some scenarios because always is not a superset of onsuccess with respect to existing headers:
You're adding a header to a locally generated non-success (non-2xx) response, such as a redirect, in which case only the table corresponding to always is used in the ultimate response.
You're modifying or removing a header generated by a CGI script, in which case the CGI scripts are in the table corresponding to always and not in the default table.
You're modifying or removing a header generated by some piece of the server but that header is not being found by the default onsuccess condition.
As far as I can tell, this means that Header always set ensures that the header is set even on non-200 pages. However, my HTTP headers set with Header set have always seemed to apply just fine on my 404 pages and such. Am I misunderstanding something here?
FWIW, I've found SO posts like What is the difference between "always" and "onsuccess" in Apache's Header config?, but the only answer there didn't really explain it clearly for me.
Thanks very much,
Caleb
What is the difference between Header always set and Header set in Apache?
As the quoted bit from the manual says, without 'always' your additions will only go out on successful responses.
But this also includes errors that are "successfully" forwarded via mod_proxy, and perhaps via other similar handlers that roughly act like proxies. What generates the 404s that you found to disagree with the manual? A 404 on a local file certainly behaves as the quoted bit describes.
That is, what does the always keyword change about the circumstances under which the header is set?
Apache's API keeps two lists associated with each request, headers and err_headers. The former is not used if the server encounters an error processing the request; the latter is.
Should I always set my headers using always?
It depends on their significance. Let's say you were setting Cache-Control headers that were related to what you had expected to serve for some resource. Now let's say you were actually serving something like a 400 or 502. You might not want that cached!
Is there any reason not to?
See above.
-/-
There is also a bit in the manual you did not quote. It explains why headers set without always still appear when an error status comes from a proxied or CGI response, but not when Apache itself generates the error response:
The optional condition argument determines which internal table of
responses headers this directive will operate against. Despite the
name, the default value of onsuccess does not limit an action to
responses with a 2xx status code.
Headers set under this condition are still used when, for example, a
request is successfully proxied or generated by CGI, even when they
have generated a failing status code.

Understanding caching strategies of a dynamically generated search page

While studying the caching strategies adopted by various search engine websites and Stackoverflow itself, I can't help but notice the subtle differences in the response headers:
Google Search
Cache-Control: private, max-age=0
Expires: -1
Yahoo Search
Cache-Control: private
Connection: Keep-Alive
Keep-Alive: timeout=60, max=100
Stackoverflow Search
Cache-Control: private
There must be some logical explanation behind the settings adopted. Could someone explain the differences so that all of us can learn and benefit?
From RFC2616 HTTP/1.1 Header Field Definitions, 14.9.1 What is Cacheable:
private
Indicates that all or part of the response message is intended for a single
user and MUST NOT be cached by a shared cache. This allows an origin server
to state that the specified parts of the response are intended for only one
user and are not a valid response for requests by other users. A private
(non-shared) cache MAY cache the response.
max-age=0 means that the response is considered fresh for 0 seconds. In effect, a cached copy is stale immediately and has to be revalidated with the server before it is reused.
Expires: -1 should be ignored when max-age is present. In any case, -1 is an invalid date and should be treated as a value in the past (meaning already expired).
From RFC2616 HTTP/1.1 Header Field Definitions, 14.21 Expires:
Note: if a response includes a Cache-Control field with the max-age directive
(see section 14.9.3), that directive overrides the Expires field
HTTP/1.1 clients and caches MUST treat other invalid date formats, especially
including the value "0", as in the past (i.e., "already expired").
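The two rules quoted above (max-age overrides Expires, and an invalid Expires value counts as already expired) can be sketched as a small function. This is an illustration under simplifying assumptions, not how any particular browser implements it: the function name freshness_lifetime and the plain-dict header representation are made up for the example, and directives such as s-maxage and no-cache are ignored.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Seconds the response stays fresh, or None if no explicit lifetime.

    headers is a plain dict of header name to value (an assumption for
    this sketch); response_time is a timezone-aware datetime.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age is present, so Expires is not consulted at all.
        return float(match.group(1))
    expires = headers.get("Expires")
    if expires is None:
        return None
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # Invalid dates such as "-1" or "0" mean "already expired".
        return 0.0
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return max(0.0, (expiry - response_time).total_seconds())
```

With Google's headers from the question (private, max-age=0 plus Expires: -1) the result is 0 seconds either way; with only Cache-Control: private there is no explicit lifetime, so a private cache has to fall back on its own heuristics.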
The Connection: Keep-Alive and Keep-Alive: timeout=60, max=100 headers configure settings for persistent connections. All connections using HTTP/1.1 are persistent unless otherwise specified, but these headers set the actual timeout values instead of leaving them to the browser's default (which varies greatly).

What's the default cache expiration time for NSURLRequests?

I'm using an NSURLRequest to check for available data updates. Today I noticed that NSURLRequest caches the response by default. Even after several hours, no new request was sent to the server. Now I'm wondering what the default behavior of this cache is. When will the cached data become stale, so that a new request is sent to the server?
The data is a static file and the server does not send explicit cache control headers:
HTTP/1.1 200 OK
Date: Fri, 13 Apr 2012 08:51:13 GMT
Server: Apache
Last-Modified: Thu, 12 Apr 2012 14:02:17 GMT
ETag: "2852a-64-4bd7bcdba2c40"
Accept-Ranges: bytes
Content-Length: 100
Vary: Accept-Encoding,User-Agent
Content-Type: text/plain
P.S.: The new version of my app sets an explicit caching policy, so that this isn't a problem anymore, but I'm curious what the default behavior is.
Note: how this should work is specified in detail here. In short:
1. If there is no cache, then fetch the data.
2. If there is a cache, then check the loading scheme:
   a. if re-validation is specified, check the source for changes
   b. if re-validation is not specified, then fetch from the local cache as per 3)
3. If re-validation is not specified, the local cache is checked to see if it is recent enough:
   a. if the cache is not stale, then it pulls the data from the cache
   b. if the data is stale, then it re-validates from the source
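The steps above can be written down as a small decision function. This is only a model of the described flow, not Apple's implementation; the names Action and decide, and the three boolean inputs, are invented for the sketch.

```python
from enum import Enum


class Action(Enum):
    FETCH = "fetch from the originating source"
    REVALIDATE = "check the source for changes"
    USE_CACHE = "return the locally cached response"


def decide(has_cached_response: bool, must_revalidate: bool, is_stale: bool) -> Action:
    """Model of the loading steps described above (hypothetical helper)."""
    if not has_cached_response:
        # Step 1: nothing cached, so the data has to be fetched.
        return Action.FETCH
    if must_revalidate:
        # Step 2a: the cached response demands revalidation.
        return Action.REVALIDATE
    if is_stale:
        # Step 3b: too old, so ask the source whether it changed.
        return Action.REVALIDATE
    # Step 3a: recent enough, serve it from the cache.
    return Action.USE_CACHE
```

In this model, the behaviour from the question (no request for hours) corresponds to the last branch: a cached response that does not demand revalidation and is not yet considered stale.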
From here:
The default cache policy for an NSURLRequest instance is NSURLRequestUseProtocolCachePolicy. The NSURLRequestUseProtocolCachePolicy behavior is protocol specific and is defined as being the best conforming policy for the protocol
From here:
If an NSCachedURLResponse does not exist for the request, then the data is fetched from
the originating source. If there is a cached response for the request, the URL loading
system checks the response to determine if it specifies that the contents must be
revalidated. If the contents must be revalidated a connection is made to the originating
source to see if it has changed. If it has not changed, then the response is returned
from the local cache. If it has changed, the data is fetched from the originating source.
If the cached response doesn’t specify that the contents must be revalidated, the maximum
age or expiration specified in the response is examined. If the cached response is recent
enough, then the response is returned from the local cache. If the response is determined
to be stale, the originating source is checked for newer data. If newer data is
available, the data is fetched from the originating source, otherwise it is returned from
the cache.
Other options listed here.
It used to do a fresh request after I restarted my app. By restart, I mean pressing the home button twice to show the list of running apps, killing my app, and starting it again by tapping the app icon.
It is also possible to disable caching if you want to. Override this NSURLConnection delegate method:
// Disable caching, since our files are stored on multiple servers and
// a falsely cached response can cause issues.
- (NSCachedURLResponse *)connection:(NSURLConnection *)connection
                  willCacheResponse:(NSCachedURLResponse *)cachedResponse
{
    NSCachedURLResponse *newCachedResponse = cachedResponse;
    // Uncomment the condition to disable caching for HTTPS responses only.
    // if ([[[[cachedResponse response] URL] scheme] isEqual:@"https"])
    {
        // Returning nil tells the connection not to cache the response.
        newCachedResponse = nil;
    }
    return newCachedResponse;
}
According to this article: http://blackpixel.com/blog/2012/05/caching-and-nsurlconnection.html , if you are using NSURLRequestUseProtocolCachePolicy and the server returns neither an expiration date nor max-age, the default cache time interval is 6 to 24 hours. So be careful about this condition. It is better practice to set max-age or an expiration date when using NSURLRequestUseProtocolCachePolicy.

What's the difference between Expires and Cache-Control: max-age?

Could you tell me the difference between Expires and Cache-Control: max-age?
Expires was defined in the HTTP/1.0 specifications, and Cache-Control in the HTTP/1.1 specifications.
I would suggest defining both so you cater to both, the older clients that only understand HTTP/1.0, and the newer ones.
Expires was specified in the HTTP 1.0 specification, whereas Cache-Control: max-age was introduced in the early HTTP 1.1 specification. The value of the Expires header has to be in a very specific date and time format, and any error in it will make your resources non-cacheable. The value of the Cache-Control: max-age header is simply a number of seconds, so there is much less room for error.
Since you can specify only one of the two headers in your web.config file, I'd suggest going with the Cache-Control: max-age header because of the flexibility it offers in setting a relative timespan from the present date to a date in the future. You can basically set and forget, as compared to the case with Expires header, whose value you will have to remember to update at least once every year. And if you set both headers programmatically from within your code, know that the value of Cache-Control: max-age header will take precedence over Expires header. So, something to keep in mind there as well.
From Setting Expires and Cache-Control: max-age headers for static resources in ASP.NET

Does the `Expires` HTTP header need to be consistent across multiple cold-cache requests?

I'm implementing a custom web server of a kind, and am looking into adding support for the Expires header. However, I'm a little unsure of how exactly to implement it.
If multiple cold-cache requests are made for the same unchanged resource and the server returns a different Expires header each time (say it uses relative time to calculate the exact value of the Expires date, e.g. +6 hours from the request time), does that invalidate the cache on all the proxy servers in between as well? Or can that not happen (per the spec)?
Does the Expires HTTP header need to be consistent across multiple cold-cache requests?
Ok, never mind, found the relevant information under the Cache Revalidation and Reload Controls section of the HTTP Spec
Basically, you can serve all the different validators you want but you must be aware that in such case proxies may have a set of different validators from their own cache and from various user agents communicating with the proxy. They may choose to send one to you and that might not be the correct or the most optimal one for the end-users. However, a "best approach" has been suggested in the spec.
I suppose this should cover Expires headers as well as ETags, Cache-Control and whatnot.
Here's the relevant excerpt, in case anyone's interested:
When an intermediate cache is forced, by means of a max-age=0 directive, to revalidate its own cache entry, and the client has supplied its own validator in the request, the supplied validator might differ from the validator currently stored with the cache entry. In this case, the cache MAY use either validator in making its own request without affecting semantic transparency.
However, the choice of validator might affect performance. The best approach is for the intermediate cache to use its own validator when making its request. If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response. If the server replies with a new entity and cache validator, however, the intermediate cache can compare the returned validator with the one provided in the client's request, using the strong comparison function. If the client's validator is equal to the origin server's, then the intermediate cache simply returns 304 (Not Modified). Otherwise, it returns the new entity with a 200 (OK) response.
If a request includes the no-cache directive, it SHOULD NOT include min-fresh, max-stale, or max-age.
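The "best approach" in the excerpt can be sketched roughly as follows. The sketch assumes ETags are the validators, and that the intermediate cache has already revalidated with its own validator and knows the origin's status code; the function names strong_match and respond_to_client are hypothetical.

```python
def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither validator is weak and both are identical."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def respond_to_client(client_etag, origin_status, origin_etag=None):
    """Status code the intermediate cache returns to the client.

    origin_status is the origin's reply to the cache's own conditional
    request; origin_etag is the validator of a newly returned entity.
    """
    if origin_status == 304:
        # The cache's copy was validated: return it with 200 (OK).
        return 200
    # The origin sent a new entity: compare its validator with the client's.
    if client_etag is not None and origin_etag is not None and strong_match(client_etag, origin_etag):
        # The client already holds the current entity.
        return 304
    return 200
```

This also hints at why varying Expires values on an unchanged resource need not cause trouble on their own: revalidation hinges on the validators, not on the expiry date.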