Apache: difference between "Header always set" and "Header set"?

Questions
What is the difference between Header always set and Header set in Apache?
That is, what does the always keyword change about the circumstances under which the header is set?
Should I always set my headers using always?
Is there any reason not to?
Background
I've seen...
Header always set X-Frame-Options DENY
...as well as...
Header always set Access-Control-Allow-Headers "*"
...and I sometimes hear that the presence of the always keyword ensures that the header is properly set, or that it's simply better to include the always keyword in general. However, I have never found a clear, definitive answer for why that is the case.
I've already checked the Apache docs for mod_headers, which only briefly mention always:
When your action is a function of an existing header, you may need to specify a condition of always, depending on which internal table the original header was set in. The table that corresponds to always is used for locally generated error responses as well as successful responses. Note also that repeating this directive with both conditions makes sense in some scenarios because always is not a superset of onsuccess with respect to existing headers:
You're adding a header to a locally generated non-success (non-2xx) response, such as a redirect, in which case only the table corresponding to always is used in the ultimate response.
You're modifying or removing a header generated by a CGI script, in which case the CGI scripts are in the table corresponding to always and not in the default table.
You're modifying or removing a header generated by some piece of the server but that header is not being found by the default onsuccess condition.
As far as I can tell, this means that Header always set ensures that the header is set even on non-200 pages. However, my HTTP headers set with Header set have always seemed to apply just fine on my 404 pages and such. Am I misunderstanding something here?
FWIW, I've found SO posts like What is the difference between "always" and "onsuccess" in Apache's Header config?, but the only answer there didn't really explain it clearly for me.
Thanks very much,
Caleb

What is the difference between Header always set and Header set in Apache?
As the quoted bit from the manual says, without 'always' your additions will only go out on successful responses.
But "successful" here also includes error responses that were successfully forwarded via mod_proxy, and perhaps via other similar handlers that act roughly like proxies. What generates the 404s that you found to disagree with the manual? A 404 for a local file certainly behaves as the quoted bit describes.
That is, what does the always keyword change about the circumstances under which the header is set?
Apache's API keeps two lists associated with each request, headers and err_headers. The former is not used if the server encounters an error while processing the request; the latter is.
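As a rough mental model only (this is not Apache's actual code), the two lists can be sketched in Python. The function name and the simplification that a locally generated error drops the headers list entirely are assumptions made for illustration:

```python
def response_headers(headers, err_headers, locally_generated_error):
    """Toy model of which header list reaches the client.

    headers      -- entries added by plain `Header set` (the onsuccess table)
    err_headers  -- entries added by `Header always set` (the always table)
    """
    if locally_generated_error:
        # Apache builds its own error response: only the "always" list survives.
        return list(err_headers)
    # Normal handler output (including proxied or CGI responses, whatever
    # their status code): both lists are sent.
    return list(err_headers) + list(headers)


onsuccess = [("Cache-Control", "max-age=3600")]
always = [("X-Frame-Options", "DENY")]

print(response_headers(onsuccess, always, locally_generated_error=False))
print(response_headers(onsuccess, always, locally_generated_error=True))
```

In this sketch the Cache-Control entry disappears from the locally generated error response, while the X-Frame-Options entry is present in both cases.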
Should I always set my headers using always?
It depends on their significance. Let's say you were setting Cache-Control headers that were related to what you had expected to serve for some resource. Now let's say you were actually serving something like a 400 or 502. You might not want that cached!
Is there any reason not to?
See above.
-/-
There is also a bit in the manual you did not quote, which explains why headers set without always still go out when an error code is proxied or produced by CGI, but not when Apache itself generates the error response:
The optional condition argument determines which internal table of
responses headers this directive will operate against. Despite the
name, the default value of onsuccess does not limit an action to
responses with a 2xx status code.
Headers set under this condition are still used when, for example, a
request is successfully proxied or generated by CGI, even when they
have generated a failing status code.

Related

Concept of default headers in mule

I want to understand the concept of default headers in Mule. I want to make a GET API call (the code is written in Java) from Mule, and I am sending a token in the header, but I am setting the token in the default header inside the HTTP request configuration.
<http:default-headers >
<http:default-header key="testing" value="#[vars.authorizationHeader]" />
</http:default-headers>
Will my Java code be able to read this header from attributes?
Default headers are just ones that will always be sent across all requests referencing that configuration, so yes, your server will get that token. However, it is not good practice to use them along with expressions as you are doing, because that makes the configuration very fragile (what if there's no such var in the request flow?) and forces a new configuration to be used (as the expression must be resolved each time). Default headers make sense when you want to force a static header everywhere, for tracking purposes for example. If the header will be dynamic, then it's best to configure it on each request.

Is the URL subject to HTTP/2 header compression?

I understand that, if you send duplicate header values in subsequent requests, the dynamic table makes it so that you do not send the value again but a reference to it in the table is sent instead.
My question is whether this applies to the URL as well?
Say you have repeated requests to the same URL (possibly containing long IDs and/or tokens), would bandwidth be saved in this instance?
There are various options that a client can use to send headers under HTTP/2, as defined in the HPACK specification. These basically say whether to refer to a previously sent header, whether to store a header for later reference, whether to never store a header for reuse, and so on. The client decides which of these to use for the headers it sends.
In HTTP/2 the URL is sent in the :path pseudo-header, so unlike in HTTP/1.1 it is just like any other HTTP header and can be compressed. Typically a URL is not repeated often, however, so it would be sent as a Literal Header Field without Indexing, which marks it as a one-off header that should not be stored for reuse. Of course, as it’s an HTTP header much like any other, there’s nothing to stop an HTTP/2 client sending it as an indexed type, but web browsers are unlikely to do this, so this is probably only really an option for custom clients.
Incidentally, if you wish to know more about this and find the spec a little difficult to follow, my book HTTP/2 in Action goes into this in a lot more detail in Chapter 8.
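The indexing choice described above can be illustrated with a toy model in Python. This is not real HPACK (there is no static table, Huffman coding, binary format, or eviction, and real HPACK numbers its entries differently); the class and method names are hypothetical:

```python
class ToyHeaderEncoder:
    """Toy illustration of HPACK's indexing choice, not a real encoder."""

    def __init__(self):
        self.dynamic_table = []  # (name, value) pairs remembered for reuse

    def encode(self, name, value, index=True):
        entry = (name, value)
        if entry in self.dynamic_table:
            # Already known to both sides: send a short reference only.
            return ("indexed", self.dynamic_table.index(entry))
        if index:
            # Literal with incremental indexing: send in full and remember it.
            self.dynamic_table.append(entry)
            return ("literal+index", name, value)
        # Literal without indexing: send in full and do not remember it.
        return ("literal", name, value)


enc = ToyHeaderEncoder()
path = "/items/0123456789abcdef?token=long-token-value"

print(enc.encode(":path", path, index=False))  # sent in full
print(enc.encode(":path", path, index=False))  # sent in full again
print(enc.encode(":path", path))               # sent in full, now stored
print(enc.encode(":path", path))               # only a table reference
```

Only a client that chooses to index :path gets the saving on repeated requests to the same URL, which is the trade-off the answer describes.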

How to force dispatcher cache urls with get parameters

As I understood after reading these links:
How to find out what does dispatcher cache?
http://docs.adobe.com/docs/en/dispatcher.html
The Dispatcher always requests the document directly from the AEM instance in the following cases:
If the HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
If the request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
The file extension is missing. The web server needs the extension to determine the document type (the MIME-type).
The authentication header is set (this can be configured)
But I want to cache URLs with parameters.
If I once request myUrl/?p1=1&p2=2&p3=3
then the next request to myUrl/?p1=1&p2=2&p3=3 must be served from the dispatcher cache, but myUrl/?p1=1&p2=2&p3=3&newParam=newValue should be served by CQ the first time and from the dispatcher cache for subsequent requests.
I think the config /ignoreUrlParams is what you are looking for. It can be used to white list the query parameters which are used to determine whether a page is cached / delivered from cache or not.
Check http://docs.adobe.com/docs/en/dispatcher/disp-config.html#Ignoring%20URL%20Parameters for details.
It's not possible to cache requests that contain a query string. Such calls are considered dynamic, so you should not expect them to be cached.
On the other hand, if you are certain that such requests should be cached because your application or feature is query-driven, you can approach it this way:
Add an Apache rewrite rule that moves the given query string parameter into a selector.
(Optional) Add a CQ filter that recognizes the selector and moves it back into the query string.
The selector can be constructed as key_value, but that puts some constraints on what can be passed here.
You can do this with Apache rewrites, BUT it would not be ideal practice. You'll be breaking the pattern that AEM uses.
Instead, use selectors and extensions. E.g. instead of server.com/mypage.html?somevalue=true, use:
server.com/mypage.myvalue-true.html
Most things you will need to do that would ever get cached will work this way just fine. If you give me more details about your requirements and what you are trying to achieve, I can help you perfect the solution.
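To make the selector idea concrete, here is a minimal Python sketch of the URL mapping only. It is not dispatcher or AEM configuration; the function name and the key-value selector format are assumptions chosen for illustration:

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def query_to_selectors(url):
    """Rewrite page.ext?key=value&... as page.key-value....ext (illustrative only)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query)
    base, ext = posixpath.splitext(parts.path)
    if not pairs or not ext:
        return url  # nothing to move, or no extension to anchor the selectors
    # Sort the pairs so that parameter order does not create separate cache entries.
    selectors = ".".join(f"{key}-{value}" for key, value in sorted(pairs))
    prefix = f"{parts.scheme}://{parts.netloc}" if parts.scheme else ""
    return f"{prefix}{base}.{selectors}{ext}"


print(query_to_selectors("http://server.com/mypage.html?somevalue=true"))
```

Values containing dots or slashes would break this mapping, which is one of the constraints on selectors mentioned above.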

Content-Range header - allowed units?

This is related to:
How should I implement a COUNT verb in my RESTful web service? , Paging in a Rest Collection
and Using the HTTP Range Header with a range specifier other than bytes?
Actually, I think the -1 rated answer here is correct: https://stackoverflow.com/a/1434701/1237617
Generally, answers say that you can use custom units, citing section 3.12:
range-unit = bytes-unit | other-range-unit
bytes-unit = "bytes"
other-range-unit = token
However when you read the HTTP spec please notice the production rules are thus:
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec = byte-content-range-spec
byte-content-range-spec = bytes-unit SP byte-range-resp-spec "/" ( instance-length | "*" )
The header spec only references bytes-unit from sec 3.12, not range-units, so I think that actually it's against the spec to use custom units here.
Am I missing something, or is the popular answer wrong?
EDIT: Since this probably isn't clear, the gist of my question is:
RFC 2616 section 14.16 only references bytes-unit. It never mentions range-unit, so the range-unit production is not relevant for Content-Range, and thus only bytes-unit can be used.
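Read literally, the Content-Range production quoted above accepts only the bytes unit. A minimal Python sketch of that grammar, assuming byte positions and instance-length are plain runs of ASCII digits (the function name is hypothetical):

```python
import re

# byte-content-range-spec = bytes-unit SP byte-range-resp-spec "/" ( instance-length | "*" )
CONTENT_RANGE_2616 = re.compile(r"bytes (?:([0-9]+)-([0-9]+)|\*)/([0-9]+|\*)")


def parse_content_range(value):
    """Return (first, last, length) with None for '*', or None if the value does not match."""
    match = CONTENT_RANGE_2616.fullmatch(value)
    if match is None:
        return None
    first, last, length = match.groups()
    return (
        int(first) if first is not None else None,
        int(last) if last is not None else None,
        None if length == "*" else int(length),
    )


print(parse_content_range("bytes 0-499/1234"))
print(parse_content_range("items 0-9/100"))
```

Under this literal reading, a value with a custom unit such as items is simply rejected, which is the point the question raises.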
I think this addresses my concerns best, although I needed some time to understand it (plus I wanted to make sure that there is something wrong with the wording).
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests
thanks to elgaton
The spec, as being revised, allows custom range units. See HTTPbis Part 5, Section 2.
If you read the HTTP/1.1 RFC, section 3.12, you will see that:
The only range unit defined by HTTP/1.1 is "bytes". HTTP/1.1 implementations MAY ignore ranges specified using other units.
So, the other-range-unit token has been introduced only to make servers more "liberal" when accepting. This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests, so that servers could accept even invalid requests (they will be simply ignored) and clients would use only the universally-accepted bytes unit.
Therefore, I personally recommend to:
use only the bytes unit when acting as a client, and
accept other units (discarding the Content-Range header if they are invalid) when acting as a server.
This is a purely personal opinion, but I think it is fairly consistent with how other HTTP extensions (custom methods or headers) are used. Here is how I read it: Yes I can use custom range units and no, I shouldn't submit a bug report when it gets ignored when passing through firewalls, web proxies, and other intermediaries. I conform to the HTTP spec when I'm sending it and they conform to HTTP when they ignore it. WebDAV uses HTTP extensions correctly, IMO, but rarely works over the Internet for exactly this reason. As I said, a personal opinion only.
Apparently it's OK to use custom units, because:
This reflects the fact that, apparently, the first set of grammar
rules has been specifically made for parsing and the second one for
producing HTTP requests

What are the Valid values for http Pragma

What are the valid values for the HTTP Pragma header? I know no-cache is one, but I want to enable caching, so what should I set it to? I did some googling, and all I found was that most clients ignore this header, but no information on the other values it accepts.
Surprisingly there is only one parameter defined by default, which is no-cache and no new Pragma directives will be defined in HTTP as per RFC.
ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32
Moreover, you will need to use the Cache-Control header for managing the caching behaviors rather than the Pragma directive which seems to be still included only to support the legacy HTTP/1.0.
ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
Bonus: http://www.mnot.net/cache_docs/
You're probably looking for Cache-Control, this is supported in HTTP/1.1 and defines more states than Pragma.
Some more information, that might help some people that are less interested in caching, and more interested in http headers in general. i.e the literal interpretation of the original question, "what are the valid values for the http header pragma"?
The reference in the accepted answer (https://stackoverflow.com/a/7376516/3246928) is the RFC http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32. It defines the syntax as:
Pragma = "Pragma" ":" 1#pragma-directive
pragma-directive = "no-cache" | extension-pragma
extension-pragma = token [ "=" ( token | quoted-string ) ]
This implies that any 'token=value' pair is acceptable (with the value being optional). The spec goes on to say
No new Pragma directives will be defined in HTTP.
and I would guess this is also meant to cover the "extension-pragma" part, but I wish they had been more unambiguous here.
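To see what that grammar accepts, here is a minimal Python sketch of a syntax check. The regular expressions are my own transcription of the RFC 2616 token and quoted-string rules (slightly simplified), and the function name is hypothetical. Passing this check only means a value is syntactically allowed, not that any recipient acts on it:

```python
import re

# token: one or more characters that are neither controls nor separators
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string, simplified: anything between double quotes, with backslash escapes
QUOTED = r'"(?:[^"\\]|\\.)*"'
# pragma-directive = "no-cache" | token [ "=" ( token | quoted-string ) ]
DIRECTIVE = rf"{TOKEN}(?:=(?:{TOKEN}|{QUOTED}))?"
# 1#pragma-directive: one or more directives separated by commas
PRAGMA = re.compile(rf"\s*{DIRECTIVE}(?:\s*,\s*{DIRECTIVE})*\s*")


def is_valid_pragma(value):
    """Check a Pragma header value against the grammar sketched above."""
    return PRAGMA.fullmatch(value) is not None


print(is_valid_pragma("no-cache"))
print(is_valid_pragma("no-cache, x-trace=abc"))
print(is_valid_pragma("no cache"))
```

Note that no-cache is itself a valid token, so the grammar does not need a special case for it.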
This header does not seem to be specifically created for caching; the description in the RFC says:
The Pragma general-header field is used to include implementation-
specific directives that might apply to any recipient along the
request/response chain
So, in theory, you could add things here, and they could work. However, despite much searching, I have not found any reference to any other values ever being used here. It is effectively a dead and embarrassing part of http/1.
It seems like the normal thing to do is:
Only use pragma with the no-cache flag. This is the only value anyone should ever use. (And of course you should also use the cache-control header for your caching to behave as expected).
If you want to put some special information into an HTTP header, i.e. if you want to "include implementation-specific directives that might apply to any recipient along the request/response chain", then create a custom HTTP header. Google and Amazon, for example, do this:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html and
https://cloud.google.com/storage/docs/reference-headers
Note the naming convention on the http header. The "x-" prefix is deprecated
by https://www.rfc-editor.org/rfc/rfc6648, but everyone seems to use it
anyway.