Where is externally loaded resource (JS/CSS) Cache Control TTL set? - browser-cache

We have an SEO individual telling us to change the Cache TTL properties for a number of elements on the page - including externally loaded JS and CSS files.
Can you do that without serving those resources from your own server? Aren't the cache TTL settings set by the server the resource comes from?

Yes. The cache TTL is set by the server that serves the resource, through its Cache-Control and Expires response headers, so you would have to host the files yourself to change it.
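To see which TTL a third-party server currently sets, you can inspect the response headers it returns. This is a minimal Python sketch using only the standard library. The example URL in the comment is a placeholder, and some servers answer HEAD slightly differently from GET.

```python
import urllib.request
from typing import Dict, Optional


def cache_headers(url: str, timeout: float = 10.0) -> Dict[str, Optional[str]]:
    """Return the caching-related headers the origin server sends for `url`.

    These values are chosen by whoever serves the file; a page that merely
    links to it cannot change them.
    """
    # HEAD avoids downloading the body; only the headers matter here.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Missing headers come back as None.
        return {
            name: resp.headers.get(name)
            for name in ("Cache-Control", "Expires", "ETag", "Last-Modified")
        }

# Example (placeholder URL): cache_headers("https://cdn.example.com/library.min.js")
```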

Related

Is Cache Invalidation on S3 a One Time Event or a Type of Policy?

I have a set of files on Amazon S3, served through CloudFront, that I do not want cached. I was able to create an invalidation for a single file, and that seemed to work. However, the file now seems to be cached again even though the invalidation entry is still present. Is an invalidation a one-time event? Does anyone know exactly how this works?
I would like a set of files to basically never be cached going forward.
Thanks for any suggestions and best practices advice.
Invalidation removes a cached entry from CloudFront's edge locations, but has no impact on whether or not the invalidated object(s) are cached again in the future. All else held equal: after you issue an invalidation, objects that were previously cached will be cached again on subsequent requests.
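The one-off nature is visible in the API: each CreateInvalidation call evicts the listed paths once and is then finished. This is a minimal sketch assuming boto3 and an existing CloudFront client (`boto3.client("cloudfront")`); the distribution ID and paths you pass are placeholders.

```python
import time
from typing import Any, List


def build_invalidation_batch(paths: List[str]) -> dict:
    """Build the InvalidationBatch structure expected by CreateInvalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is a simple choice.
        "CallerReference": f"invalidate-{time.time_ns()}",
    }


def invalidate(cloudfront: Any, distribution_id: str, paths: List[str]) -> str:
    """Evict `paths` from the edge caches once and return the invalidation ID.

    This does not prevent future caching: the next request fetches the object
    from the origin again and it is cached according to the cache policy.
    """
    resp = cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return resp["Invalidation"]["Id"]
```

Running it again later is a new, separate invalidation; nothing persists as a policy.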
Before we explore the options, two definitions that are important to understand:
Cache behaviors are effectively routes with dedicated configurations that apply only to requests matching the route (known as a path pattern).
Cache policies are instructions for how CloudFront will cache your responses. Cache policies are attached to one or more cache behaviors. The min and max TTL set a floor and ceiling on the value returned in your Cache-Control/Expires headers. The default TTL determines the length of time to cache a response when you don't provide a Cache-Control/Expires header.
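How the three TTLs interact with the origin's headers can be sketched as a small function. This is a simplified model for illustration, not CloudFront's exact algorithm. It assumes s-maxage is preferred over max-age, the header value is clamped between the minimum and maximum TTL, the default TTL applies when no max-age is sent, and no-cache, no-store or private leave only the minimum TTL (so nothing is cached when it is 0). Expires handling is omitted.

```python
import re
from typing import Optional


def _directive(cache_control: str, name: str) -> Optional[int]:
    """Return the integer value of a directive such as max-age=60, if present."""
    match = re.search(rf'(?:^|,)\s*{re.escape(name)}\s*=\s*"?(\d+)', cache_control)
    return int(match.group(1)) if match else None


def effective_ttl(cache_control: Optional[str], min_ttl: int,
                  default_ttl: int, max_ttl: int) -> int:
    """Seconds an edge location keeps the object in this model; 0 means not cached."""
    if not cache_control:
        return default_ttl                     # no header: the default TTL applies
    cc = cache_control.lower()
    if any(d in cc for d in ("no-cache", "no-store", "private")):
        return min_ttl                         # only the floor can force caching
    ttl = _directive(cc, "s-maxage")           # shared-cache lifetime wins if present
    if ttl is None:
        ttl = _directive(cc, "max-age")
    if ttl is None:
        return default_ttl
    return max(min_ttl, min(ttl, max_ttl))     # clamp into [min_ttl, max_ttl]
```

For example, with a minimum TTL of 3600, an origin sending `max-age=60` is still cached for an hour in this model.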
Do you want to prevent caching for all files in your S3 bucket?
Attach the CachingDisabled cache policy (provided by CloudFront) to your default cache behavior.
Do you want to prevent caching for only certain files in your S3 bucket?
If the files you do not want to cache live in the same directory, create a cache behavior to match that path and use the CachingDisabled cache policy (provided by CloudFront) to prevent files in that directory from being cached. This instructs CloudFront to use a cache policy that does not cache responses when processing requests that match a specific path/route.
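Alternatively, set a Cache-Control header (for example no-cache, no-store) as metadata on those objects in S3 to instruct CloudFront not to cache them, while the other objects are still cached. This only works if the cache policy's minimum TTL is 0, because the minimum TTL acts as a floor on what the header can request.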
See https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html (scroll down to "Adding headers to your objects using the Amazon S3 console").
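The same metadata can be set programmatically. This is a minimal sketch assuming boto3 and an existing S3 client (`boto3.client("s3")`); bucket, key and header value are placeholders. It copies the object onto itself with `MetadataDirective="REPLACE"`, which is how metadata is changed on an existing S3 object. Because REPLACE discards the old metadata (including any x-amz-meta-* entries), the content type is passed again explicitly.

```python
from typing import Any


def set_cache_control(s3: Any, bucket: str, key: str, content_type: str,
                      cache_control: str = "no-cache, no-store, must-revalidate") -> None:
    """Rewrite an object's Cache-Control metadata in place by copying it onto itself."""
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        # REPLACE is required to change metadata when copying an object onto itself.
        MetadataDirective="REPLACE",
        CacheControl=cache_control,
        # REPLACE drops the existing metadata, so restate the content type.
        ContentType=content_type,
    )
```

For new uploads, `put_object` accepts the same `CacheControl` parameter directly.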

Memcached with Apache - handling stale objects

I'm using memcached and Apache with the following default configuration:
CacheEnable socache /
CacheSocache memcache:IP:PORT
MemcacheConnTTL 30
What will happen when the 30 seconds expire and a request for the same URL comes in? Is there a way to configure the cache key, i.e. which information makes a request unique?
What if the server can't get an answer (for example, a timeout while fetching the updated object)? Can it be configured to serve the old object?
Thanks
What will the behavior be when 30 seconds expire and a request for the same URL comes in?
Apache simply creates a new connection to memcached. MemcacheConnTTL only controls how long idle connections to the memcached server(s) are kept open; it has no effect on the data stored in memcached.
https://httpd.apache.org/docs/2.4/mod/mod_socache_memcache.html#memcacheconnttl
Set the time to keep idle connections with the memcache server(s) alive (threaded platforms only).
If you need to control how long an object is stored in the cache, see CacheDefaultExpire.
Is there a way to configure the cache key
The URL is used to build the key, but you can partly configure which parts of the URL are used; see CacheIgnoreQueryString and CacheIgnoreURLSessionIdentifiers.
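To illustrate what those two directives change, here is a toy model of a URL-based cache key. It is not mod_cache's actual key format (which also includes the scheme and port); it only shows the difference between ignoring the query string entirely and stripping named session identifiers from the path and query. `jsessionid` in the example is just a sample identifier name.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, ignore_query_string: bool = False,
              session_identifiers: Iterable[str] = ()) -> str:
    """Toy cache key: host + path (+ query), after optional normalization."""
    parts = urlsplit(url)
    ids = set(session_identifiers)
    # Drop path parameters such as ";jsessionid=ABC" for the listed identifiers.
    segments = []
    for segment in parts.path.split("/"):
        base, *params = segment.split(";")
        kept = [p for p in params if p.split("=", 1)[0] not in ids]
        segments.append(";".join([base, *kept]))
    path = "/".join(segments)
    if ignore_query_string:
        query = ""                              # every query string maps to one key
    else:
        pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                 if k not in ids]
        query = urlencode(pairs)
    return f"{parts.netloc}{path}" + (f"?{query}" if query else "")
```

With `session_identifiers=["jsessionid"]`, two visitors with different session IDs share one cached copy, while `?item=42` and `?item=43` still get separate entries.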
I.e., what information makes a request unique?
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cacheenable
The CacheEnable directive instructs mod_cache to cache urls at or below url-string
Note that not all requests can be cached; mod_cache applies a number of rules to decide what is cacheable.
What if the server can't get an answer? Can it be configured to serve the old object?
You need CacheStaleOnError.
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cachestaleonerror
When the CacheStaleOnError directive is switched on, and when stale data is available in the cache, the cache will respond to 5xx responses from the backend by returning the stale data instead of the 5xx response
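An expiry time combined with CacheStaleOnError can be modeled as follows. This is an illustrative Python sketch, not how mod_cache is implemented: a fresh entry is served directly, a stale one triggers a backend request, and a 5xx answer is replaced by the stale copy. `fetch` is a hypothetical callable returning `(status, body)`. The 3600-second default matches CacheDefaultExpire's default of one hour; mod_cache only uses that value when the response has neither an expiry date nor a last-modified date, while this sketch applies it to everything for simplicity.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Fetch = Callable[[str], Tuple[int, str]]   # hypothetical backend: url -> (status, body)


class StaleOnErrorCache:
    def __init__(self, fetch: Fetch, default_expire: float = 3600.0,
                 stale_on_error: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.default_expire = default_expire
        self.stale_on_error = stale_on_error
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}   # url -> (expires_at, body)

    def get(self, url: str) -> Tuple[int, str]:
        now = self.clock()
        entry: Optional[Tuple[float, str]] = self.entries.get(url)
        if entry and entry[0] > now:
            return 200, entry[1]                  # fresh hit, backend not contacted
        status, body = self.fetch(url)            # miss or stale: ask the backend
        if 500 <= status <= 599 and entry and self.stale_on_error:
            return 200, entry[1]                  # serve stale data instead of the 5xx
        if status == 200:
            self.entries[url] = (now + self.default_expire, body)
        return status, body
```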

Varnish Cache - Initial cache of web pages

I have installed Varnish Cache in front of my Apache web server and configured them correctly. It works, and I can now access my web pages through Varnish Cache.
The default behavior of Varnish is to store copies of the pages served by the web server. The next time the same page is requested, Varnish serves the copy instead of requesting the page from the Apache server.
And now my question: is it possible to cache my entire website right after setting up Varnish, instead of waiting for each page to be requested before it is stored in the cache? After Varnish is set up, the cache is empty, and a page only becomes available in the cache once it has been accessed. Can this be done without accessing each page manually?
What you are looking for is a way of warming up the cache. You could use varnishreplay or a Web crawler, such as Wget or HTTrack, to go through your site. Alternatively, if you have a sitemap of your pages, you could use it as a starting point and warm up the cache by looping over it and requesting each page with, e.g., curl or wget.
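A sitemap-driven warm-up can be as simple as reading the `<loc>` entries and requesting each URL once through Varnish. This is a minimal Python sketch using only the standard library. It assumes a standard sitemaps.org `urlset` file (not a sitemap index) and that the URLs in it resolve to Varnish rather than directly to Apache.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Extract page URLs from a sitemaps.org <urlset> document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def warm(urls: List[str], timeout: float = 30.0) -> List[int]:
    """Request each URL once so the cache can store a copy; return the status codes."""
    statuses = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()                   # transfer the full response body
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)         # record 4xx/5xx and keep going
    return statuses
```

Requests are sequential, which keeps the load on Apache modest while the cache fills.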
Using varnishreplay requires you to first run varnishlog and gather a log of traffic before you can use it later for playing back the traffic and warming up the cache.
Wget, HTTrack etc. can be pointed to your home page and they will crawl their way through your site. Depending on the size and nature of your site this might not be practical though (for example if you use Ajax extensively).
Unless your pages take a very long time to load from the backend server (i.e. Apache), I wouldn't worry too much about warming up the cache. If the TTL for the cached content is high enough most of the visitors will only ever receive cached content anyway.
There is a much better way to do this that uses req.hash_always_miss and works with Varnish 3 and 4 (it also uses a sitemap). It warms up your cache and refreshes old pages without having to purge the cache. A full diagram, an outline of how to configure it, and three scripts for various use cases are available at http://www.htpcguides.com/smart-warm-up-your-wordpress-varnish-cache-scripts/, and they are easily adapted for non-WordPress sites.
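The refresh approach relies on a VCL rule, configured on the Varnish side, that sets `req.hash_always_miss = true` for trusted warm-up requests, so Varnish ignores the cached object, fetches a new copy from the backend and stores it. The client then only needs to send a marker header. In this Python sketch the header name `X-Cache-Warmup` is hypothetical: use whatever your VCL checks for, and restrict it (for example by client IP) so outsiders cannot force cache misses.

```python
import urllib.request


def refresh_request(url: str, header_name: str = "X-Cache-Warmup",
                    value: str = "1") -> urllib.request.Request:
    """Build a GET carrying the hypothetical marker header that the VCL is
    assumed to translate into req.hash_always_miss = true."""
    return urllib.request.Request(url, headers={header_name: value}, method="GET")


def refresh(url: str, timeout: float = 30.0) -> int:
    """Ask Varnish to re-fetch `url` from the backend and re-cache it."""
    with urllib.request.urlopen(refresh_request(url), timeout=timeout) as resp:
        resp.read()
        return resp.status
```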

CloudFront and mod_pagespeed - wrong content received

I am using CloudFront with mod_pagespeed running on the server.
When I update a CSS file or flush the cache, I see problematic behavior: the first browser refresh returns the original CSS (this is fine). When I refresh a second time, I get the correct rewritten CSS file name, but the content served by CloudFront is still the original, not the rewritten content.
Why would this happen?
Any idea how to fix this?
Update:
For some reason it just stopped happening... I don't know why.
SimonW, since your original post a feature has been added to PageSpeed (in March 2013, in version 1.2.24.1) to deal with this issue directly. The directive is set as follows:
Apache:
ModPagespeedRewriteDeadlinePerFlushMs deadline_value_in_milliseconds
Nginx:
pagespeed RewriteDeadlinePerFlushMs deadline_value_in_milliseconds;
The docs describe the directive as follows (emphasis mine):
When PageSpeed attempts to rewrite an uncached (or expired) resource it will wait for up to 10ms per flush window (by default) for it to finish and return the optimized resource if it's available. If it has not completed within that time the original (unoptimized) resource is returned and the optimizer is moved to the background for future requests. The following directive can be applied to change the deadline. Increasing this value will increase page latency, but might reduce load time (for instance on a bandwidth-constrained link where it's worth waiting for image compression to complete). Note that a value less than or equal to zero will cause PageSpeed to wait indefinitely.
So, if you specify a value of 0 for deadline_value_in_milliseconds, you should always get the fully optimized page. Be aware that latency can be high in some cases. In my case I really wanted this behavior despite the latency, because the content was going to be cached on my CDN's edge servers, so I wanted the most optimized version possible to be served to the CDN for caching.
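The deadline behavior quoted above can be modeled in a few lines. This is an illustrative Python sketch, not PageSpeed's implementation: `rewrite` stands in for the optimizer, a thread pool plays the role of background rewriting, the deadline applies per resource rather than per flush window, and `deadline_ms <= 0` waits indefinitely as the documentation describes.

```python
import concurrent.futures
from typing import Callable, Dict

# A shared pool stands in for PageSpeed's background rewriting (illustrative only).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)
# Finished rewrites, reused by later requests (hypothetical in-memory cache).
_optimized: Dict[str, str] = {}


def serve_resource(url: str, original: str, rewrite: Callable[[str], str],
                   deadline_ms: int = 10) -> str:
    """Return the optimized resource if ready within deadline_ms, else the original.

    This sketch does not deduplicate rewrites that are already in flight.
    """
    if url in _optimized:                      # an earlier background rewrite finished
        return _optimized[url]
    future = _POOL.submit(rewrite, original)

    def _store(done: concurrent.futures.Future) -> None:
        # Keep successful results for future requests, whenever they complete.
        if done.exception() is None:
            _optimized[url] = done.result()

    future.add_done_callback(_store)
    timeout = None if deadline_ms <= 0 else deadline_ms / 1000.0
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return original                        # unoptimized now; rewrite continues
```

A CDN that fetches during the first, timed-out request caches the unoptimized copy, which is why a deadline of 0 helps in the CDN case described above.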
This could happen if you have multiple backend servers and CloudFront is hitting a different one than the one the HTML request went through. In that case the resource was rewritten on the server that handled the HTML, but not on the other one. There is a short timeout, and if the other server doesn't finish the rewrite in time, it serves the original content with Cache-Control: private,max-age=300. CloudFront may cache that for a little while (even though, since the response is marked private, it obviously shouldn't) and then eventually re-request the resource from your backend, this time getting the correctly rewritten version.

how to invalidate server side caching on apache

I've used Apache's content expiry times for caching static assets. The only way I know to force it to update the content on demand is to restart Apache. Is there a better way?
Caching on our server, as it turns out, was data stored in a file, so we had to clear the cache file's contents, and voila, the cache was cleared.
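If the cache turns out to be files on disk, clearing it means deleting those files. This is a minimal Python sketch; the directory you pass is a placeholder for wherever your cache lives (for mod_cache_disk that is the CacheRoot directory, and Apache also ships the htcacheclean utility for maintaining it). It removes everything inside the directory but keeps the directory itself, so Apache can continue writing to it.

```python
import shutil
from pathlib import Path


def clear_cache_dir(cache_root: str) -> int:
    """Delete every file and subdirectory under cache_root; return entries removed."""
    removed = 0
    for entry in Path(cache_root).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)               # nested cache directories
        else:
            entry.unlink()                     # plain files and symlinks
        removed += 1
    return removed
```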