Can browser caching be controlled by HTTP headers alone w/o using hash names for asset files? - http-headers

I'm reading it in Webpack docs:
The way it works has a pitfall: if we don’t change filenames of our resources when deploying a new version, browser might think it hasn’t been updated and client will get a cached version of it.
I'm curious, is it mandatory to use this mechanism with ugly file names main.55e783391098c2496a8f.js for assets in order to inform a browser that an asset file has changed?
Can it be controlled by HTTP headers only? There are multiple HTTP headers in the standard to control how browser caches assets, like:
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Wed, 24 Aug 2020 18:32:02 GMT
Last-Modified: Tue, 15 Nov 2024 12:45:26 GMT
ETag: x234dff
max-age: 12345
So can I use those headers alone? Or do I still have to bother about hash parts in file names main.55e783391098c2496a8f.js?

When user agent opens a page it must always get correct version of a source code. You have two options to achieve this:
Set Cache-Control, Expires and strong validator (ETag) response headers . This way you instruct user agent to perform relatively lightweight conditional request on each page load
Embed version in source code file URL and set Cache-Control and Expires response headers. This way you instruct user agent to cache source code with particural version forever
For more information check HTTP Caching article by Ilya Grigorik, HTTP conditional requests MDN page and this StackOverflow answer about resource revalidation.

Related

How to serve a wbn (WebPackage/WebBundle) file from a web server?

does anyone know how to serve a web bundle so that it loads, rather than just downloading as a file?
Some disambiguation: There is a format called WebPackage (not to be confused with webpack), also called a Web Bundle. Files typically have the .wbn suffix. It contains html and js files and can be used to view websites offline. Useful for e.g. archiving websites or making websites that work well with intermittent network access. Download the file once, and you have all the assets you need for at last basic operation of the site.
The standard on how to serve a .wbn file is here:
https://wicg.github.io/webpackage/draft-yasskin-wpack-bundled-exchanges.html
However when I add the required headers in the web server, the .wbn file is just downloaded. If I drag the downloaded file onto my browser (google-chrome), the file is displayed as the website it contains, so unless there is some very subtle bug in there I believe that the format of the bundle is OK.
Here is a sample request:
Request URL: http://localhost/bundle/www-signed.wbn
Request Method: GET
Status Code: 200 OK
Remote Address: [::1]:80
Referrer Policy: strict-origin-when-cross-origin
and the server response:
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 4300
Content-Type: application/webbundle <-- Required by the standard
Date: Thu, 02 Sep 2021 12:00:24 GMT
ETag: "612ef7cb-10cc"
Last-Modified: Wed, 01 Sep 2021 03:47:23 GMT
Server: nginx/1.18.0 (Ubuntu)
X-Content-Type-Options: nosniff <-- required by the standard
If anyone has this working on a website or knows how to do it, I would love to have a look.
I had the same problem that the wbn file was just downloaded instead of executed.
I had to enable the web bundles feature even though my chrome version is 96+

Cache control: max-age settings

This below is the HTTP header of my site. I need to know:
what is Cache-Control: max-age=259200?
Do you think that a so high value 259200 would prevent Googlebot to index my pages? Should I lower that value?
We talk about a blog of information, publishing articles every day.
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 25 Feb 2017 15:07:53 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 123783
Connection: keep-alive
X-Powered-By: PHP/7.0.14
X-Pingback: http://www.example.com/xmlrpc.php
Link: <http://www.example.com/wp-json/>; rel="https://api.w.org/", <http://www.example.com/?p=1427>; rel=shortlink
Vary: Accept-Encoding
X-Powered-By: PleskLin
Cache-Control: max-age=259200
Expires: Tue, 28 Feb 2017 15:07:52 GMT
According to https://developer.mozilla.org/ru/docs/Web/HTTP/Headers/Cache-Control
max-age=<seconds>
Specifies the maximum amount of time a resource will be considered fresh. Contrary to Expires, this directive is relative to the time of the request.
In other words this is time interval for which any client such as browser or proxy server might use cached version.
How exactly it affects google I'm not sure. Googlebot might take it into account in some way (but I doubt they blindly trust you). This might be an issue if you have it on your main page because the bot might not come back for 3 days (259200 seconds = 3 days) to see new articles/posts. The same goes for new comments. Still if google ignores your site for much longer than that, the issue is not with caching but somewhere else.
You might also consider looking into Google Webmaster Tools. Start at https://support.google.com/webmasters/answer/34397/?hl=en and https://support.google.com/webmasters/answer/6065812/?hl=en

Analysis of HTTP header

Hello I want to analyze & understand at first place and then optimize the HTTP header responses of my site. What I get when I fetch as Google from webmasters is:
HTTP/1.1 200 OK
Date: Fri, 26 Oct 2012 17:34:36 GMT // The date and time that the message was sent
Server: Apache // A name for the server
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" // P3P Does an e-commerse store needs this?
ETag: c4241ffd9627342f5f6f8a4af8cc22ed // Identifies a specific version of a resource
Content-Encoding: gzip // The type of encoding used on the data
X-Content-Encoded-By: Joomla! 1.5 // This is obviously generated by Joomla, there wont be any issue if I just remove it, right?
Expires: Mon, 1 Jan 2001 00:00:00 GMT // Gives the date/time after which the response is considered stale: Since the date is set is already expired, this creates any conflicts?
Cache-Control: post-check=0, pre-check=0 // This means site is not cached? or what?
Pragma: no-cache // any idea?
Set-Cookie: 5d962cb89e7c3329f024e48072fcb9fe=9qdp2q2fk3hdddqev02a9vpqt0; path=/ // Why do I need to set cookie for any page?
Last-Modified: Fri, 26 Oct 2012 17:34:37 GMT
X-Powered-By: PleskLin // Can this be removed?
Cache-Control: max-age=0, must-revalidate // There are 2 cache-controls, this needs to be fixed right? which one is preffected? max-age=0, must-revalidate? post-check=0, pre-check=0?
Keep-Alive: timeout=3, max=100 // Whats that?
Connection: Keep-Alive
Transfer-Encoding: chunked // This shouldnt be deflate or gzip ??
Content-Type: text/html
post-check
Defines an interval in seconds after which an entity must be checked for freshness. The check may happen after the user is shown the resource but ensures that on the next roundtrip the cached copy will be up-to-date.
http://www.rdlt.com/cache-control-post-check-pre-check.html
pre-check
Defines an interval in seconds after which an entity must be checked for freshness prior to showing the user the resource.
Pragma: no-cache header field is an HTTP/1.0 header intended for use in requests. It is a means for the browser to tell the server and any intermediate caches that it wants a fresh version of the resource, not for the server to tell the browser not to cache the resource. Some user agents do pay attention to this header in responses, but the HTTP/1.1 RFC specifically warns against relying on this behavior.
Set-Cookie: When the user browses the same website in the future, the data stored in the cookie can be retrieved by the website to notify the website of the user's previous activity.[1] Cookies were designed to be a reliable mechanism for websites to remember the state of the website or activity the user had taken in the past. This can include clicking particular buttons, logging in, or a record of which pages were visited by the user even months or years ago.
X-Powered-By: specifies the technology (e.g. ASP.NET, PHP, JBoss) supporting the web application.This comes under common non-standard response headers and can be removed.
Keep-Alive It is meant to reduce the number of connections for a website. Instead of creating a new connection for each image/css/javascript in a webpage many requests will be made re-using the same connection.
Transfer-Encoding: The form of encoding used to safely transfer the entity to the user. Currently defined methods are: chunked, compress, deflate, gzip, identity.

Must-revalidate headers of this request wrong?

I noticed that chrome cached a video file. I replaced it with another one on the server and chrome kept serving the old one from cache (using JW flash player 5)
The headers of the request look like this:
joe#joe-desktop:~$ wget -O - -S --spider http://www.2xfun.de/files_geheimhihi14/20759.mp4
Spider mode enabled. Check if remote file exists.
--2011-05-15 22:40:56-- http://www.2xfun.de/files_geheimhihi14/20759.mp4
Resolving www.2xfun.de... 213.239.214.112
Connecting to www.2xfun.de|213.239.214.112|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Sun, 15 May 2011 20:40:56 GMT
Server: Apache
Last-Modified: Sun, 15 May 2011 20:37:59 GMT
ETag: "89b38-3bb227-4a35683b477c0"
Accept-Ranges: bytes
Content-Length: 3912231
Cache-Control: max-age=29030400, public, must-revalidate
Expires: Sun, 15 Apr 2012 20:40:56 GMT
Connection: close
Content-Type: video/mp4
Length: 3912231 (3.7M) [video/mp4]
Remote file exists.
I am using mod_headers and mod_expires in apache2 like this:
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav|mp4)$">
ExpiresDefault A29030400
Header append Cache-Control "public, must-revalidate"
</FilesMatch>
Did I spell revalidate wrong or something?
edit:
To make the use case clear: I want the files to be cached, because they are rather big and I want to save bandwidth. But on the other hand I want the files to be re-validated. So the client does a HEAD request and checks whether the content has changed (thats what the etag is for), and only re-fetches if necessary.
Your problem is that must-revalidate only kicks in once a cache entry is no longer fresh, but you've marked the response as cacheable for 29 million seconds. 'Cache-Control: max-age=0, must-revalidate' may be closer to what you want, if you want to allow caching but require revalidation on each use.

Client- headers in a HTTP request

Lately, I've been seeing headers such as:
Client-Date: Fri, 10 Dec 2010 15:20:02 GMT
Client-Peer: 64.64.64.64:80
Client-Response-Num: 1
On a apache 2 server with php 5.3.
What is the purpose / meaning of these headers, and where/when are they inserted into the response?
These HTTP headers are generated by libwww-perl - LWP::UserAgent (makes http requests / simulates web user agent) for debugging purposes and their meaning is exactly what their names imply.
References:
http://www.google.com/codesearch?hl=en&lr=&q=Client-Date+Client-Peer+Client-Response-Num&sbtn=Search
http://metacpan.org/pod/LWP::UserAgent
http://metacpan.org/pod/HTTP::Proxy
They are non-standard http response headers and they can have any meaning.
Are you, by any chance, using HTTP::Proxy? If you turn on [client_headers]1 there, you get these headers.