Configuring ETags - optimization

I am using YSlow as a simple speed-benchmarking tool and I came across a really confusing concept: the ETag.
So the main question is: how do I configure ETags? My grade in YSlow says:
There are 19 components with misconfigured ETags
* http://thehotelinventory.com/media/js/jquery.min.js
* http://thehotelinventory.com/media/js/jquery.colorbox.min.js
* http://thehotelinventory.com/media/js/easyslider.min.js
* http://thehotelinventory.com/media/js/jquery.tools.min.js
* http://thehotelinventory.com/media/js/custom.min.js
* http://thehotelinventory.com/media/js/jquery.validate.min.js
* http://thehotelinventory.com/media/images/colorbox/loading_background.png
* http://thehotelinventory.com/media/images/productheaderbg.jpg
* http://thehotelinventory.com/media/images/buttons/field-bg. //etc
I browsed through the developer.yahoo.com guidelines on website optimization, yet I still can't really understand the whole ETag thing.

This page shows how to disable ETags for IIS and this page shows how to do it for Apache.

Assuming you are running Apache...
You can set up a simple ETag like this:
FileETag MTime Size
If you have multiple servers, you want to disable ETags.
FileETag None
Put the above code in your httpd.conf (if you have access), otherwise you can put it in .htaccess.
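The multiple-servers advice makes more sense once you see what goes into a file-based ETag. Below is a minimal Python sketch that derives an ETag from file metadata. The function names and the hex "size-mtime" layout are assumptions for illustration; Apache's real ETag format may differ in detail. The second variant also mixes in the inode number. That number is usually different on each server even when the files are identical, so the same URL can return different ETags depending on which server answers.

```python
import os


def etag_mtime_size(path: str) -> str:
    """Quoted ETag from size and modification time only (cf. FileETag MTime Size).

    Hypothetical layout for illustration: "<size hex>-<mtime in microseconds, hex>".
    """
    st = os.stat(path)
    mtime_us = st.st_mtime_ns // 1_000  # nanoseconds -> microseconds
    return f'"{st.st_size:x}-{mtime_us:x}"'


def etag_with_inode(path: str) -> str:
    """Same idea, but also including the inode number.

    Identical copies of a file on two servers usually have different inodes,
    so this variant can produce different ETags for the same content.
    """
    st = os.stat(path)
    mtime_us = st.st_mtime_ns // 1_000
    return f'"{st.st_ino:x}-{st.st_size:x}-{mtime_us:x}"'
```

Two copies with the same size and mtime get the same size/mtime tag but normally different inode-based tags. The size/mtime variant only agrees across servers if your deployment preserves modification times.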

Think of ETags as a sort of hash. When a browser requests a resource it already has in its cache, it sends along the ETag of the cached version (in an If-None-Match header). If the server decides the two versions match (there are "strong" and "weak" ETags, so it's not always a simple equality check), it sends a "304 Not Modified" response to the client rather than the resource itself. This translates into a speed boost, since no bandwidth is wasted re-sending unchanged files.
ETags are sent via HTTP headers.
There's a good example of ETags at work (and also how to disable them for Apache) here:
http://www.askapache.com/htaccess/apache-speed-etags.html
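Here is the server side of that exchange as a minimal Python sketch. It assumes the ETag is a truncated SHA-256 hash of the response body, which is an illustrative choice. The function names are hypothetical. When the client's If-None-Match matches the current tag, the function answers 304 with no body. Otherwise it sends the full resource.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: a strong ETag derived from a SHA-256 hash of the exact bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may be "*" or a comma-separated list of tags. This naive
        # split is fine for hex tags like ours, which never contain commas.
        candidates = [t.strip() for t in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        opaque = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in opaque or etag in opaque:
            return 304, headers, b""  # cached copy is still valid: send no body
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

The first request gets a 200 with the ETag. Repeating the request with that tag in If-None-Match gets a 304, and changing the body changes the tag, so the client receives a fresh 200.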

By removing the ETag header, you prevent caches and browsers from validating files by their ETag, so they are forced to rely on your Cache-Control and Expires headers.
Add these lines to .htaccess:
<IfModule mod_headers.c>
Header unset ETag
</IfModule>
FileETag None

Go straight to the source: YSlow provides guidance on all of its advice, including how to configure ETags.

The best way to configure your ETags is to remove them. For static files, far-future expiration dates are a much better approach.
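A far-future expiration date tells caches up front how long a static file stays fresh, so they never need to revalidate it. Below is a minimal Python sketch that computes the two usual header values. The one-year lifetime and the function name are assumptions. With lifetimes this long you generally have to change a file's URL (for example by renaming it) whenever its content changes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict

ONE_YEAR = timedelta(days=365)


def far_future_headers(now: datetime, lifetime: timedelta = ONE_YEAR) -> Dict[str, str]:
    """Caching headers for a static file that should not be revalidated.

    `now` must be a timezone-aware UTC datetime, because HTTP dates are in GMT.
    """
    expires = now + lifetime
    return {
        # Understood by HTTP/1.1 caches; takes precedence over Expires.
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
        # For older HTTP/1.0 caches.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

For example, `far_future_headers(datetime(2020, 1, 1, tzinfo=timezone.utc))` yields `max-age=31536000` and an Expires date of `Thu, 31 Dec 2020 00:00:00 GMT`.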
The way to remove them depends on the web server you're using. For IIS 7, it can be done with a simple HttpModule.

Entity tags are a feature of the HTTP protocol; see http://www.ietf.org/rfc/rfc2616.txt:
Entity tags are used for comparing two or more entities from the same requested resource. HTTP/1.1 uses entity tags in the ETag (section 14.19), If-Match (section 14.24), If-None-Match (section 14.26), and If-Range (section 14.27) header fields. The definition of how they are used and compared as cache validators is in section 13.3.3. An entity tag consists of an opaque quoted string, possibly prefixed by a weakness indicator.
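The quoted text mentions the opaque quoted string, the weakness indicator (`W/`) and the comparison rules in section 13.3.3. Below is a minimal Python sketch of the two comparison functions, following my reading of that section. Strong comparison requires both tags to be strong and identical. Weak comparison requires identical opaque strings and ignores the `W/` prefix. The function names are hypothetical.

```python
from typing import Tuple


def parse_entity_tag(tag: str) -> Tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_string_without_quotes)."""
    tag = tag.strip()
    weak = tag.startswith("W/")
    if weak:
        tag = tag[2:]
    if len(tag) < 2 or tag[0] != '"' or tag[-1] != '"':
        raise ValueError(f"not a quoted entity tag: {tag!r}")
    return weak, tag[1:-1]


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags strong and opaque strings identical."""
    weak_a, opaque_a = parse_entity_tag(a)
    weak_b, opaque_b = parse_entity_tag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque strings identical; the W/ prefix is ignored."""
    return parse_entity_tag(a)[1] == parse_entity_tag(b)[1]
```

So `W/"1"` and `"1"` match weakly but not strongly. This is why a server can answer a conditional GET from a weak tag but should not use one for byte-range requests (If-Range requires strong comparison).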

Wikipedia is a man's best friend :)
http://en.wikipedia.org/wiki/HTTP_ETag
Basically it's a hash, as ShZ said, that should be unique (or almost unique) for a file.

Related

GCP HTTP(S) Load Balancers will convert HTTP/1.1 header names to lowercase; could my code be affected?

Yesterday I received an email from GCP about load balancers and uppercase vs. lowercase headers. Part of the message says:
After September 30, HTTP(S) Load Balancers will convert HTTP/1.1 header names to lowercase in the request and response directions; header values will not be affected. As header names are case-insensitive, this change will not affect clients and servers that follow the HTTP/1.1 specification (including all popular web browsers and open source servers). Similarly, as HTTP/2 and QUIC protocols already require lowercase header names, traffic arriving at load balancers over these protocols will not be affected. However, we recommend testing projects that use custom clients or servers prior to the rollout to ensure minimal impact.
Google talks specifically about request and response header names (not values), but, for example, is the Google Load Balancer asking me to change a classic PHP redirection header "Location" to a lowercase "location"?
header("location: http://www.example.com/error/403");
Of course, the plan is to do what the standard says, but in many cases there is work that can't be done before the GCP deadline (September 30, 2019).
Since it is a standard, are all modern browsers prepared to handle case-insensitive header names?
Should I be worried about file naming (camelCase)?
If so, is there some Apache module (for example) I can use in the meantime while I change my code?
https://cloud.google.com/load-balancing/docs/https/
The HTTP/1.1 specification states that HTTP headers are case-insensitive. This only applies to the header name ("content-type") and not to the value of the header ("application/json").
If this new policy causes problems for you, you can contact Google Support and opt out temporarily.
Code that is correctly written and performs case-insensitive comparisons will not have problems. In most cases, you can use curl with various HTTP headers to test your backend code. Of course, doing a code walkthrough is also a good idea.
Example curl command:
curl --http1.1 -H "x-goog-downcase-all-headers: test" http://example.com/
Curl documentation for the --http1.1 command line option:
https://curl.haxx.se/docs/manpage.html
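In practice, "correctly written" mostly means never looking up a header by its exact spelling. Below is a minimal Python sketch that assumes your code sees headers as a plain dict. The function names are hypothetical. The first function mimics the described change by lowercasing names and leaving values alone. The second performs a lookup that works either way.

```python
from typing import Dict, Optional


def downcase_header_names(headers: Dict[str, str]) -> Dict[str, str]:
    """Mimic the described load-balancer change: lowercase names, keep values."""
    return {name.lower(): value for name, value in headers.items()}


def get_header(headers: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup, as HTTP/1.1 requires for header names."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None
```

After downcasing, `headers["Location"]` would raise `KeyError`, but `get_header(headers, "Location")` still finds the redirect target, and the URL value itself is unchanged. Many HTTP libraries already expose case-insensitive header containers, so check what yours provides before writing your own.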
Since it is a standard, are all modern browsers prepared to handle case-insensitive header names?
Yes. This has been the norm for a long time.
Should I be worried about file naming (camelCase)?
No. The new changes do not affect values of HTTP headers, only the header names.
If so, is there some Apache module (for example) I can use in the meantime while I change my code?
Not that I am aware of.

How do servers set HTTP response headers?

I sense I'm going to end up embarrassed for asking such a simple question, but I've been researching for days and can't find any useful information.
What determines the HTTP response headers that a server sends? If I control the server (for concreteness, let's say Apache), what file can I edit to change the response headers? For example, how do I make it send Content-Length instead of Transfer-Encoding: chunked?
I'm aware that PHP and Java Servlets can be used to manipulate headers. The existence and content of response headers is fundamental to HTTP, though, so there ought to exist a way to edit these without using outside technology, no?
Certain headers are set automatically. They are part of the HTTP spec and the server takes care of them for you. That’s what a web server is for and why it differs from, say, an FTP server or a fileshare. For example Content-Length is easily calculated by the webserver and needs to be set so the server just does it.
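As an illustration of what the server does with those two headers, here is a simplified Python sketch that builds raw HTTP/1.1 responses. Other headers are omitted, and the function names are hypothetical. If the server has the whole body before it starts sending, it can announce Content-Length. If the body is produced as it goes (for example by a script that streams output), it typically falls back to Transfer-Encoding: chunked, where each piece is prefixed by its size in hex.

```python
from typing import Iterable


def frame_with_content_length(body: bytes) -> bytes:
    """Whole body known up front: the server can announce its exact length."""
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


def frame_chunked(chunks: Iterable[bytes]) -> bytes:
    """Length unknown up front: send each piece prefixed by its size in hex."""
    parts = [b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"]
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk would signal the end of the body
        parts.append(f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n")
    parts.append(b"0\r\n\r\n")  # last-chunk followed by an empty trailer
    return b"".join(parts)
```

So getting Content-Length instead of chunked usually means letting the server (or your script) have the complete body before it starts sending, for example by buffering the output.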
Certain other headers are set based on config. Apache usually loads a main config file (often called httpd.conf or apache2.conf), but then, to keep this file from becoming a big unwieldy mess, it often loads other files from within it. Those files are just text files with lines of configuration text that change the behaviour of the server. Other web servers may use XML configuration files and may have a GUI to control the config (e.g. IIS).
So, for some of the headers, you might not explicitly set the header value, but you configure the server and it then uses that config to figure out the appropriate headers to send. For example, you can configure the server to gzip certain files (e.g. text files but not JPEGs, which are already compressed). In Apache this is handled by the mod_deflate module and the config options it gives you. Once the appropriate config is added to the server config, the server will do the necessary processing (e.g. gzip the file or not, depending on type) and then automatically add the headers. So an Apache module is basically something that changes how the server works, and it may or may not also set headers. Another example is sending caching headers to tell the browser how long to cache files for. This is controlled by enabling the mod_expires module and using the config options it provides. While some of these headers could be hardcoded (e.g. Cache-Control), others depend on Apache doing calculations (e.g. Expires), so it is better to let the module do this for you based on your config.
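To make the "configure the behaviour, get the headers automatically" idea concrete, here is a simplified Python model of a mod_deflate-style decision. It is not mod_deflate's actual logic. The type list and the function name are assumptions. Given the configured types and the client's Accept-Encoding, it decides whether to compress, then derives Content-Encoding, Vary and Content-Length from that decision.

```python
import gzip
from typing import Dict, Tuple

# Assumption: which types are worth compressing is a config choice, as with mod_deflate.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "text/plain", "application/javascript"}


def encode_response(body: bytes, content_type: str, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    headers = {"Content-Type": content_type}
    if content_type in COMPRESSIBLE_TYPES:
        # The response now depends on the request's Accept-Encoding header.
        headers["Vary"] = "Accept-Encoding"
        # Simplified check: ignores q-values such as "gzip;q=0".
        if "gzip" in accept_encoding.lower():
            body = gzip.compress(body)
            headers["Content-Encoding"] = "gzip"
    # Computed after compression, so it describes the bytes actually sent.
    headers["Content-Length"] = str(len(body))
    return headers, body
```

None of these headers is written by hand. They all follow from the configuration (the type set) and the request.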
And finally you can explicitly set headers in your server (in Apache this is done using the mod_headers module). This is useful for new features added to browsers for example (e.g. HSTS, CSP or HPKP) where the server doesn't need to do anything but just add the header and the client (e.g. the web browser) knows what to do with them. You could add a JonahHuron header for example by adding this config to httpd.conf:
Header always set JonahHuron "Some Value"
Whether that header is actually used depends entirely on the program receiving the response.

Browser cache, Last-Modified header

Let's assume there are two URLs:
http://example.com/myaccount?user=12345
http://example.com/myaccount?user=34567
As far as I understand the browser will cache them separately and will not use the Last-Modified header from the first request to revalidate the second.
Is it possible to force the browser to use the Last-Modified header in this case?
Could you please explain why it works this way?
I think that in certain situations the web server will ignore the query string. For instance, on an out-of-the-box Apache server, run curl -I http://example.com/styles.css | grep Last-Modified and then run the same command for http://example.com/styles.css?v=2. Assuming the file exists, you'll probably get the same Last-Modified timestamp.
It might be the browsers (my guess is that they do) that treat ?v=2 as a different file, or as a file with updated content. Also, I think most Content Delivery Networks are configured that way, which allows them to serve a fresh copy of a file when its query string differs.
It's an interesting question anyway. I'll read up on it more. I hope that somebody might explain this more closely here.

HTTP Content-type header for cached files

Using Apache with mod_rewrite, when I load a .css or .js file and view the HTTP headers, the Content-type is only set correctly the first time I load it - subsequent refreshes are missing Content-type altogether and it's creating some problems for me.
I can get around this by appending a random query string value to the end of each filename, e.g. http://www.site.com/script.js?12345
However, I don't want to have to do that, since caching is good and all I want is for the Content-type to be present. I've tried using a RewriteRule to force the type but still didn't solve the problem. Any ideas?
Thanks, Brian
The answer depends on information you've not provided here; specifically, where are you seeing these headers?
Unless it's from sniffing the network traffic between the browser and the server, you can't be sure whether you are looking at a real request to the server or a request that has been satisfied from the cache. Indeed, changing the URL as you describe is a very simple way to force a reload from the server rather than a load from the cache.
I don't think it's as broken as you seem to think. Fire up Wireshark and see for yourself, or just disable caching for these content types.
C.

How do I configure apache - that has not got mod_expires or mod_headers - to send expiry headers?

The webserver hosting my website is not returning last-modified or expiry headers. I would like to rectify this to ensure my web content is cacheable.
I don't have access to the apache config files because the site is hosted on a shared environment that I have no control over. I can however make configurations via an .htaccess file. The server - apache 1.3 - is not configured with mod_expires or mod_headers and the company will not install these for me.
With these limitations in mind, what are my options?
Sorry for the post here. I recognise this question is not strictly a programming question, and more a sys admin question. When serverfault is public I'll make sure I direct questions of this nature there.
What sort of content? If static (HTML, images, CSS), then really the only way to attach headers is via the front-end webserver. I'm surprised the hosting company doesn't have mod_headers enabled, although they might not enable it for .htaccess. It's costing them more bandwidth and CPU (i.e., money) not to cache.
If it's dynamic content, then you'll have control when generating the page. This will depend on your language; here's an example for PHP (it's from the PHP manual, and is a bad example, as it should also set the response code):
if (!headers_sent()) {
header('Location: http://www.example.com/');
exit;
}
Oh, and one thing about setting caching headers: don't set them for too long a duration, particularly for CSS and scripts. You may not think you want to change these, but you don't want a broken site while people still have the old content in their browsers. I would recommend maximum cache settings in the 4-8 hour range: good for a single user's session, or a work day, but not much more.
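For the dynamic-content case, here is a minimal sketch of what the page-generating code might do. It is written in Python for illustration; PHP's header() can send the same headers. It sends Last-Modified plus a short Cache-Control lifetime in the 4-hour range recommended above, and it answers a matching If-Modified-Since with 304. The function name is hypothetical, and deciding what `last_modified` means for a given page is up to your application.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def cache_headers_for_page(
    last_modified: datetime,
    if_modified_since: Optional[str],
    max_age: timedelta = timedelta(hours=4),
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a generated page.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": f"max-age={int(max_age.total_seconds())}",
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates only have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers
    return 200, headers
```

A browser that sends back the Last-Modified value it received gets a 304 until the page changes. An older or unparseable date leads to a normal 200.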