Analysis of HTTP header - http-headers

Hello, I want to first analyze and understand, and then optimize, the HTTP response headers of my site. This is what I get when I use Fetch as Google in Webmaster Tools:
HTTP/1.1 200 OK
Date: Fri, 26 Oct 2012 17:34:36 GMT // The date and time that the message was sent
Server: Apache // A name for the server
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" // Does an e-commerce store need P3P?
ETag: c4241ffd9627342f5f6f8a4af8cc22ed // Identifies a specific version of a resource
Content-Encoding: gzip // The type of encoding used on the data
X-Content-Encoded-By: Joomla! 1.5 // This is obviously generated by Joomla; there won't be any issue if I just remove it, right?
Expires: Mon, 1 Jan 2001 00:00:00 GMT // Gives the date/time after which the response is considered stale. Since this date is already in the past, does it create any conflicts?
Cache-Control: post-check=0, pre-check=0 // Does this mean the site is not cached, or what?
Pragma: no-cache // any idea?
Set-Cookie: 5d962cb89e7c3329f024e48072fcb9fe=9qdp2q2fk3hdddqev02a9vpqt0; path=/ // Why do I need to set a cookie on every page?
Last-Modified: Fri, 26 Oct 2012 17:34:37 GMT
X-Powered-By: PleskLin // Can this be removed?
Cache-Control: max-age=0, must-revalidate // There are two Cache-Control headers; this needs to be fixed, right? Which one is preferred: max-age=0, must-revalidate or post-check=0, pre-check=0?
Keep-Alive: timeout=3, max=100 // What's that?
Connection: Keep-Alive
Transfer-Encoding: chunked // Shouldn't this be deflate or gzip?
Content-Type: text/html

post-check
Defines an interval in seconds after which an entity must be checked for freshness. The check may happen after the user is shown the resource but ensures that on the next roundtrip the cached copy will be up-to-date.
http://www.rdlt.com/cache-control-post-check-pre-check.html
pre-check
Defines an interval in seconds after which an entity must be checked for freshness prior to showing the user the resource.
The Pragma: no-cache header field is an HTTP/1.0 header intended for use in requests. It is a means for the browser to tell the server and any intermediate caches that it wants a fresh version of the resource, not for the server to tell the browser not to cache the resource. Some user agents do pay attention to this header in responses, but the HTTP/1.1 RFC specifically warns against relying on this behavior.
Set-Cookie: When the user browses the same website in the future, the data stored in the cookie can be retrieved by the website to notify the website of the user's previous activity.[1] Cookies were designed to be a reliable mechanism for websites to remember the state of the website or activity the user had taken in the past. This can include clicking particular buttons, logging in, or a record of which pages were visited by the user even months or years ago.
X-Powered-By: specifies the technology (e.g. ASP.NET, PHP, JBoss) supporting the web application. This is a common non-standard response header and can be removed.
Keep-Alive: It is meant to reduce the number of connections to a website. Instead of creating a new connection for each image/CSS/JavaScript file on a page, many requests are made re-using the same connection.
Transfer-Encoding: The form of encoding used to safely transfer the entity to the user. Currently defined methods are: chunked, compress, deflate, gzip, identity.
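The duplicate Cache-Control headers flagged in the question can be inspected programmatically. A minimal sketch using only the Python standard library, with the header values copied from the dump above (status line and unrelated headers omitted):

```python
from email import message_from_string

# Raw response headers from the dump above (a subset, for brevity)
raw = (
    "Server: Apache\r\n"
    "Cache-Control: post-check=0, pre-check=0\r\n"
    "Pragma: no-cache\r\n"
    "X-Powered-By: PleskLin\r\n"
    "Cache-Control: max-age=0, must-revalidate\r\n"
    "\r\n"
)

headers = message_from_string(raw)

# Repeated fields are preserved; per the HTTP spec they are
# semantically equivalent to one comma-joined field value.
values = headers.get_all("Cache-Control")
print(values)             # ['post-check=0, pre-check=0', 'max-age=0, must-revalidate']
print(", ".join(values))  # the combined value a cache should interpret
```

This shows why having two Cache-Control lines is not fatal: a compliant recipient treats them as one comma-separated list, though consolidating them into a single header is still cleaner.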

Related

Cache control: max-age settings

Below is the HTTP header of my site. I need to know:
what is Cache-Control: max-age=259200?
Do you think such a high value (259200) would prevent Googlebot from indexing my pages? Should I lower it?
We're talking about an information blog that publishes articles every day.
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 25 Feb 2017 15:07:53 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 123783
Connection: keep-alive
X-Powered-By: PHP/7.0.14
X-Pingback: http://www.example.com/xmlrpc.php
Link: <http://www.example.com/wp-json/>; rel="https://api.w.org/", <http://www.example.com/?p=1427>; rel=shortlink
Vary: Accept-Encoding
X-Powered-By: PleskLin
Cache-Control: max-age=259200
Expires: Tue, 28 Feb 2017 15:07:52 GMT
According to https://developer.mozilla.org/ru/docs/Web/HTTP/Headers/Cache-Control
max-age=<seconds>
Specifies the maximum amount of time a resource will be considered fresh. Contrary to Expires, this directive is relative to the time of the request.
In other words, this is the time interval during which any client, such as a browser or proxy server, may use the cached version.
How exactly it affects Google I'm not sure. Googlebot might take it into account in some way (but I doubt they blindly trust you). This might be an issue if you have it on your main page, because the bot might not come back for 3 days (259200 seconds = 3 days) to see new articles/posts. The same goes for new comments. Still, if Google ignores your site for much longer than that, the issue is not with caching but somewhere else.
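The arithmetic can be checked against the response headers themselves. A small sketch (Python standard library only; the Date value is copied from the response above):

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values taken from the response headers above
date_header = "Sat, 25 Feb 2017 15:07:53 GMT"
max_age = 259200  # seconds, from Cache-Control: max-age=259200

fetched = parsedate_to_datetime(date_header)
stale_after = fetched + timedelta(seconds=max_age)

print(max_age / 86400)  # 3.0 -> the response is fresh for 3 days
print(stale_after)      # 2017-02-28 15:07:53+00:00
```

The computed expiry lines up with the Expires header in the dump (give or take a second), which is expected: max-age is relative to the request time, while Expires is the same deadline expressed as an absolute date.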
You might also consider looking into Google Webmaster Tools. Start at https://support.google.com/webmasters/answer/34397/?hl=en and https://support.google.com/webmasters/answer/6065812/?hl=en

Can browser caching be controlled by HTTP headers alone w/o using hash names for asset files?

I'm reading it in Webpack docs:
The way it works has a pitfall: if we don’t change filenames of our resources when deploying a new version, browser might think it hasn’t been updated and client will get a cached version of it.
I'm curious, is it mandatory to use this mechanism with ugly file names main.55e783391098c2496a8f.js for assets in order to inform a browser that an asset file has changed?
Can it be controlled by HTTP headers only? There are multiple HTTP headers in the standard to control how browser caches assets, like:
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Wed, 24 Aug 2020 18:32:02 GMT
Last-Modified: Tue, 15 Nov 2024 12:45:26 GMT
ETag: x234dff
Cache-Control: max-age=12345
So can I use those headers alone? Or do I still have to bother about hash parts in file names main.55e783391098c2496a8f.js?
When a user agent opens a page it must always get the correct version of the source code. You have two options to achieve this:
Set Cache-Control, Expires and a strong validator (ETag) in the response headers. This way you instruct the user agent to perform a relatively lightweight conditional request on each page load.
Embed a version in the source-code file URL and set Cache-Control and Expires response headers. This way you instruct the user agent to cache that particular version forever.
For more information check HTTP Caching article by Ilya Grigorik, HTTP conditional requests MDN page and this StackOverflow answer about resource revalidation.
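The second option is what Webpack's hashed filenames implement. A minimal sketch of how such a versioned name could be derived (Python; the `hashed_name` helper is illustrative, and Webpack's own hashing differs in detail):

```python
import hashlib

def hashed_name(basename: str, content: bytes, ext: str) -> str:
    """Derive a cache-busting filename from the file's content."""
    digest = hashlib.md5(content).hexdigest()[:20]  # 20 hex chars, Webpack-style
    return f"{basename}.{digest}.{ext}"

source = b"console.log('hello');"
print(hashed_name("main", source, "js"))  # e.g. main.<20-char-hash>.js
```

Because any change to the content changes the URL, the old URL never needs to be revalidated and can be served with a very long lifetime (e.g. Cache-Control: max-age=31536000, immutable).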

Cache-control in response headers

I have this server response for a file that I want not to be cached from the browsers. The response has two cache control headers.
Cache-Control: no-cache, no-store, must-revalidate (which is what I want and)
Cache-Control: private (which is appended by default by the NetScaler, and the server-side guys tell me they cannot remove it)
My question is which one will prevail?
HTTP/1.1 200 OK
Date: Mon, 20 Jan 2014 15:29:53 GMT
Server: Apache
Last-Modified: Fri, 17 Jan 2014 16:50:54 GMT
ETag: "682-4f02d58643780"
Accept-Ranges: bytes
Cteonnt-Length: 1666
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTR STP IND DEM"
Keep-Alive: timeout=5, max=1000
Connection: Keep-Alive
Content-Type: text/javascript
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Cache-Control: private
Content-Encoding: gzip
Content-Length: 716
As per RFC 2616, setting the same header multiple times should be equivalent to setting it once with all values separated by commas.
Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.
So in your case, it would be equivalent to
Cache-Control: no-cache, no-store, must-revalidate, private
private will just further prevent the response from being cached by a shared proxy between the server and the browser, so it shouldn't have any adverse effect.
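Following that combining rule, a cache ends up interpreting a single directive list. A small sketch of that interpretation (Python; the helper names are illustrative and the parsing is deliberately simplified, ignoring quoted directive arguments):

```python
def combine(fields):
    """Join repeated header fields into one value, per the HTTP spec."""
    return ", ".join(fields)

def directives(value):
    """Split a Cache-Control value into its individual directives."""
    return {d.strip() for d in value.split(",")}

# The two Cache-Control fields from the response above
fields = ["no-cache, no-store, must-revalidate", "private"]
combined = combine(fields)
print(combined)                      # no-cache, no-store, must-revalidate, private
print(sorted(directives(combined)))  # ['must-revalidate', 'no-cache', 'no-store', 'private']
```

Since no-store alone already forbids storing the response anywhere, the extra private directive (which only restricts shared caches) adds nothing harmful here.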
Having researched a similar issue for a client, I can tell you from my own experience that, if this content is being served through a Citrix NetScaler and compression has been enabled, anything with a content-type of text will have a Cache-Control: private value set by the NetScaler. How you're getting two entries is beyond me. However, Yolanda's answer is most likely correct. The only reason for the caveat is that RFC2616 was superseded in 2014. (See https://www.w3.org/Protocols/rfc2616/rfc2616.html)
Regarding the NetScaler adding/replacing the Cache-Control header: it appears that it can be turned off; you just have to know how. I had to open a case with Citrix to learn about CTX124717 (FAQ: Preventing the Cache-Control Response Header from being Set to private).
If compression is enabled on the NetScaler, two of the default policies (ns_cmp_content_type and ns_adv_cmp_content_type) "compress data when the response contains Content-Type header and contains text" (see http://docs.citrix.com/en-us/netscaler/10-5/ns-optimization-wrapper-10-con/ns-compression-gen-wrapper-con/ns-compression-configactions-tsk.html). Using the NetScaler API Mgr (nsapimgr) you can prevent the Compression feature from adding the Cache-Control response header (nsapimgr -ys cmp_no_cc_hdr=1).

Keep the assets fresh in browser and cancel the freshness check request of the cache [for rails 3.1 app on heroku]

I have a lot of small images (~3 KB or so) and a lot of CSS and JS files. After the first request they get cached in the browser, but when I reload the page the browser checks the freshness of the cached content (by setting If-Modified-Since etc.) and gets a 304 Not Modified response. Each of these validation requests seriously increases the page load time (say 20 times 300 ms).
How can I stop the browser from making this freshness check with the server? How can I instruct the browser to use the locally cached files/images for a certain time (say 1 hour) without re-validating them against the remote server on every reload within that period?
sample small image fetch header details below [using rails 3.1, on heroku]:
Response Headers
HTTP/1.1 304 Not Modified
Server: nginx/0.7.67
Date: Thu, 10 Nov 2011 17:53:33 GMT
Connection: keep-alive
Via: 1.1 varnish
X-Varnish: 1968827848
Last-Modified: Tue, 08 Nov 2011 07:36:04 GMT
Cache-Control: public, max-age=31536000
Etag: "5bda917d22f8a144c293f3f19723dbc6"
Request Headers
GET /assets/icons/flash_close_button-5bda917d22f8a144c293f3f19723dbc6.png HTTP/1.1
Host: ???.heroku.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:6.0.1) Gecko/20100101 Firefox/6.0.1
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Referer: http://???.heroku.com/
Cookie: ???
If-Modified-Since: Tue, 08 Nov 2011 07:36:04 GMT
If-None-Match: "5bda917d22f8a144c293f3f19723dbc6"
Cache-Control: max-age=0
This line:
Cache-Control: public, max-age=31536000
is telling the browser not to ask for updates for a long time, and to store the files in a publicly accessible cache (which here means public to the local machine, not the general public). Your browser should therefore not really be re-checking those files. Have you tried another browser to verify this behaviour exists elsewhere?
Saying all of this, though: considering that your files are coming from the Varnish cache and not your dyno, and are being returned as HTTP 304, 300 ms for 20 files sounds like a very long time. However, this should be barely perceptible to the user.
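The decision the answer describes, reusing a cached copy without revalidating while still within max-age, can be sketched as follows. A simplified model in Python (illustrative only; real browsers also honor directives like no-cache and heuristics this sketch ignores):

```python
from datetime import datetime, timedelta, timezone

def is_fresh(stored_at: datetime, cache_control: str, now: datetime) -> bool:
    """True if the cached response may be reused without revalidation."""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            max_age = int(directive.split("=", 1)[1])
            return now - stored_at < timedelta(seconds=max_age)
    return False  # no max-age: fall back to a conditional request

# Date from the 304 response above
stored = datetime(2011, 11, 10, 17, 53, 33, tzinfo=timezone.utc)

# Within a year of max-age=31536000 -> serve straight from cache
print(is_fresh(stored, "public, max-age=31536000", stored + timedelta(days=30)))  # True
# max-age=0 (as the reload request sends) forces revalidation
print(is_fresh(stored, "max-age=0", stored + timedelta(seconds=1)))               # False
```

Note the request headers in the question include Cache-Control: max-age=0, which is what a browser reload sends; that, not the response headers, is what triggers the conditional requests.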

Fiddler doesn't decompress gzip responses

I use Fiddler to debug my application. Whenever the response is compressed by server, instead of decompressed response, Fiddler shows unreadable binary data:
/* Response to my request (POST) */
HTTP/1.1 200 OK
Server: xyz.com
Date: Tue, 07 Jun 2011 22:22:21 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.3.3
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
14
����������������
0
How can I get the response decompressed?
I use Fiddler version 2.3.4.4 and just noticed that in the Inspectors tab ("Raw" sub-tab), above the response section (in the case of a gzipped response), the message "Response is encoded and may need to be decoded before inspection. Click here to transform." appears.
If you click on that, the response becomes readable.
The settings are pretty much the default, I just installed Fiddler and did not change anything.
If you don't want to have to click per response as in the accepted answer, using the menu, click Rules -> Remove All Encodings.
From the Fiddler FAQ:
Q: I like to navigate around a site then do a "search" for a text on all the logged request/responses. I was curious if Fiddler automatically decompressed gzipped responses during search?
A: Fiddler does not decompress during searches by default, since it would need to keep both the compressed and decompressed body in memory (for data integrity reasons).
In current versions of Fiddler, you can tick the "Decode Compressed Content" checkbox on the Find dialog.
Here is a link to the site
http://www.fiddler2.com/fiddler/help/faq.asp
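What Fiddler's "Click here to transform" does is essentially a gzip decode of the response body. The equivalent operation in code, as a sketch with the Python standard library (the body here is illustrative, standing in for the raw bytes Fiddler shows as unreadable):

```python
import gzip

# A gzip-compressed response body (illustrative stand-in for the
# binary data shown in the Raw inspector)
original = b"<html><body>Hello</body></html>"
compressed = gzip.compress(original)

print(compressed[:2])               # b'\x1f\x8b' -- the gzip magic bytes
print(gzip.decompress(compressed))  # the readable response body
```

Note that the "14" and "0" lines in the raw response above are not part of the gzip stream: they are chunk-size markers in hex (0x14 = 20 bytes, then a final zero-length chunk) from the chunked transfer encoding, which must be stripped before decompressing.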