Caching proxy with authenticated REST requests - authentication

Consider the following scenario:
I have a RESTful URL, /articles, that returns a list of articles
the user provides credentials in the Authorization HTTP header on each request
the articles may vary from user to user based on the user's privileges
Is it possible to use a caching proxy, such as Squid, in this scenario?
The proxy sees only the URL /articles, so it may return the list of articles that is valid only for the first user who populated the cache. Other users requesting /articles could then see articles they do not have access to, which is of course not desirable.
Should I roll my own cache, or can some caching proxy software be configured to key its cache on the Authorization HTTP header?

One option to try is the Vary: Authorization response header, which tells downstream caches to keep a separate cached copy of the document for each distinct value of the request's Authorization header.
You may already be using this header if you use response-compression. The user generally requests a resource with the header Accept-Encoding: gzip, deflate; if the server is configured to support compression, then the response might come with the headers Content-Encoding: gzip and Vary: Accept-Encoding already.
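To make the effect of Vary concrete, here is a minimal sketch of how a cache could fold the headers named in Vary into its lookup key. The function name cache_key and the sample credentials are made up for illustration, and real proxies such as Squid implement this differently internally (typically a primary lookup by URL followed by a match on the varied headers).

```python
from typing import Mapping, Optional


def cache_key(method: str, url: str, request_headers: Mapping[str, str],
              vary: Optional[str]) -> tuple:
    """Build a lookup key from the URL plus every request header named in Vary."""
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = []
    if vary:
        names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
        for name in names:
            # A missing header is recorded as an empty string.
            varied.append((name, headers.get(name, "")))
    return (method.upper(), url, tuple(varied))


# Hypothetical credentials for two different users.
alice = cache_key("GET", "/articles", {"Authorization": "Basic YWxpY2U6cHc="}, "Authorization")
bob = cache_key("GET", "/articles", {"Authorization": "Basic Ym9iOnB3"}, "Authorization")
print(alice != bob)  # the two users no longer share a cache entry
```

Without the Vary header (vary=None in the sketch) both users map to the same key, which is exactly the leak described in the question.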

According to section 14.8 of the HTTP/1.1 RFC (https://www.rfc-editor.org/rfc/rfc2616#section-14.8):
When a shared cache (see section 13.7) receives a request
containing an Authorization field, it MUST NOT return the
corresponding response as a reply to any other request, unless one
of the following specific exceptions holds:
1. If the response includes the "s-maxage" cache-control
directive, the cache MAY use that response in replying to a
subsequent request. But (if the specified maximum age has
passed) a proxy cache MUST first revalidate it with the origin
server, using the request-headers from the new request to allow
the origin server to authenticate the new request. (This is the
defined behavior for s-maxage.) If the response includes
"s-maxage=0", the proxy MUST always revalidate it before re-using
it.
2. If the response includes the "must-revalidate" cache-control
directive, the cache MAY use that response in replying to a
subsequent request. But if the response is stale, all caches
MUST first revalidate it with the origin server, using the
request-headers from the new request to allow the origin server
to authenticate the new request.
3. If the response includes the "public" cache-control directive,
it MAY be returned in reply to any subsequent request.
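The three exceptions quoted above can be sketched as a small check. This is only an illustration of the rule, not how any particular proxy implements it: the function names are invented here, the Cache-Control parser is deliberately naive (it does not handle commas inside quoted directive values), and a True result still leaves the revalidation duties from exceptions 1 and 2 to the cache.

```python
def parse_cache_control(value: str) -> dict:
    """Naively split a Cache-Control value into {directive: argument}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def authorization_rule_allows_reuse(request_had_authorization: bool,
                                    response_cache_control: str) -> bool:
    """Return True if the section 14.8 rule does not forbid a shared cache
    from reusing the response for another request."""
    if not request_had_authorization:
        # The rule only restricts responses to requests carrying Authorization.
        return True
    directives = parse_cache_control(response_cache_control)
    # Any one of the three listed directives lifts the restriction.
    return any(d in directives for d in ("s-maxage", "must-revalidate", "public"))


print(authorization_rule_allows_reuse(True, "max-age=60"))          # False
print(authorization_rule_allows_reuse(True, "public, max-age=60"))  # True
```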

Related

CORS: Is the Access-Control-Allow-Credentials header mandatory for subsequent calls to the OPTIONS request?

I have an application that requires credentials. For preflight requests, I set Access-Control-Allow-Credentials to true only in the responses to OPTIONS requests. I assumed that this header would not be necessary in subsequent responses, but the request is failing.
Is this behaviour expected, or should I change something?
The MDN website mentions the following but it is not entirely clear to me:
When used as part of a response to a preflight request, this indicates whether or not the actual request can be made using credentials. Note that simple GET requests are not preflighted. So, if a request is made for a resource with credentials, and if this header is not returned with the resource, the response is ignored by the browser and not returned to the web content.
(source)
The fetch standard includes this note for Access-Control-Allow-Credentials header but it's not clear to me either.
For a CORS-preflight request, request’s credentials mode is always "same-origin", i.e., it excludes credentials, but for any subsequent CORS requests it might not be. Support therefore needs to be indicated as part of the HTTP response to the CORS-preflight request as well.
(source)
I set Access-Control-Allow-Credentials to true only in responses to OPTIONS requests, assuming that the header would not be needed in the calls that follow.
The OPTIONS request succeeds, but the browser blocks the subsequent POST request (whose response does not include Access-Control-Allow-Credentials: true) with the following message:
You want to make a credentialed CORS request (that is, fetch(..., {credentials: "include"})) that requires a preflight (for example, because it is a POST request with Content-Type: application/json).
Without the Access-Control-Allow-Credentials: true header in the preflight response, the browser would not make the credentialed request in the first place.
Since you set that header in the preflight response, the browser makes the credentialed request (so any side effect still happens on the server). But the response to that request lacks the Access-Control-Allow-Credentials: true header, so the browser refuses to make the response accessible to JavaScript. This is the same behavior as if you made a simple CORS GET request (which does not require a preflight) but the response lacked an Access-Control-Allow-Origin header.
So you really need this header in both responses.
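A minimal, framework-independent sketch of that rule: one helper produces the CORS headers for both the preflight response and the actual response, so Access-Control-Allow-Credentials is never missing from the second one. The allowed origin, the allowed method and header, and the helper name cors_headers are assumptions made for the example. Treating every OPTIONS request as a preflight is also a simplification.

```python
from typing import Dict

# Hypothetical allow-list; with credentials the origin must be echoed
# explicitly, because a wildcard "*" is not accepted by browsers here.
ALLOWED_ORIGINS = {"https://app.example.com"}


def cors_headers(method: str, origin: str) -> Dict[str, str]:
    """Return the CORS response headers for a preflight or an actual request."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    headers = {
        "Access-Control-Allow-Origin": origin,
        # Sent on EVERY response, not only on the preflight.
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }
    if method.upper() == "OPTIONS":
        # Extra headers that only make sense on the preflight response.
        headers["Access-Control-Allow-Methods"] = "POST"
        headers["Access-Control-Allow-Headers"] = "Content-Type"
    return headers


preflight = cors_headers("OPTIONS", "https://app.example.com")
actual = cors_headers("POST", "https://app.example.com")
print(preflight["Access-Control-Allow-Credentials"], actual["Access-Control-Allow-Credentials"])
```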

To detect page request and API/XHR call from backend

We have a single entry-point service that acts as a facade/proxy for downstream services. The service needs to be able to detect whether a request is a "page request" or an "api/xhr" request in order to handle errors appropriately (302 redirect or 401).
So far we have considered:
1. Using the Accept header and checking for text/html, following the referenced approach; we can't tell whether this is a good indicator of a page request
2. Introducing a custom header for all "api/xhr" requests
3. Requiring all "api/xhr" requests to follow an "/api" URL pattern (troublesome, because in some applications the XHR calls are not RESTful APIs)
Any good suggestions are welcome.
We ended up using option 1:
detect page requests using the Accept header with the value "text/html",
since we do not use AJAX for partial views.
Usually the non-standard HTTP header X-Requested-With is used. The mere presence of the header should be enough. It has at least one advantage over Accept: it cannot be set on a cross-site request unless the server permits it through a CORS preflight, which helps prevent CSRF.
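Both signals discussed in this thread can be combined in a few lines. The sketch below is one possible policy, not an established convention: the function names are made up, and a request that has neither an X-Requested-With header nor text/html in Accept is treated as an API call, which is an assumption you may want to reverse.

```python
from typing import Mapping


def is_api_request(headers: Mapping[str, str]) -> bool:
    """Guess whether a request is an api/xhr call rather than a page request."""
    normalised = {name.lower(): value for name, value in headers.items()}
    # The presence of X-Requested-With alone marks the request as XHR.
    if "x-requested-with" in normalised:
        return True
    # Otherwise fall back to Accept: browsers navigating to a page ask for HTML.
    return "text/html" not in normalised.get("accept", "").lower()


def unauthenticated_status(headers: Mapping[str, str]) -> int:
    """401 for api/xhr calls, 302 (redirect to login) for page requests."""
    return 401 if is_api_request(headers) else 302


print(unauthenticated_status({"Accept": "text/html,application/xhtml+xml"}))  # 302
print(unauthenticated_status({"X-Requested-With": "XMLHttpRequest"}))         # 401
```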

Does the must-revalidate cache-control header tell the browser to only download a cached file if it has changed?

If I want browsers to load PDF files from cache until they have changed on the server, do I have to use max-age=0 and must-revalidate as Cache-Control directives?
If I used another value (larger than 0) for max-age, would that mean the revalidation only happens once the max-age value has been exceeded?
What would happen if I only set must-revalidate, without max-age?
I was reading through this question and I am not 100% sure.
Also, what exactly does revalidate mean? Does it mean the client asks the server if the file has changed?
On the other hand, I've read that Cache-Control: no-cache pretty much does what I want to achieve: cache, and check with the server whether there is a new version. So what's the correct way?
I assume you are asking which headers you should configure your server to send, and that by "client" you mean "modern web browser"? Then the quoted question/answer is correct, so:
Yes, you should set both, although max-age=0 on its own is enough (revalidating once the response is stale is the default behavior).
Yes, correct: the response is served from the local cache until max-age expires, after that it is revalidated (once), then served from the local cache again, and so on.
It is more or less undefined, and it differs between browsers and with the way the request is sent (clicking a link in HTML, hitting the reload button, typing directly in the address bar and hitting enter). Generally, the response should not be served directly from cache, but it could either just be revalidated or the full response could be requested from the server.
Revalidate means that the client asks the server to send the content only if it has changed since it was last retrieved. For this to work, the server sends one or both of the following in its response to the initial request:
ETag header (typically a hash of the content), which the client caches and sends back in the revalidation request as the If-None-Match header, so the server can compare the client's cached ETag value with the current ETag on the server side. If the value did not change, the server responds with 304 Not Modified (and an empty body); if the value changed, the server responds with 200 and the full (new) content.
Last-Modified header (the timestamp of the last content modification), which the client sends back in the revalidation request as the If-Modified-Since header, and which the server uses to determine the response (304 or 200).
Cache-Control: no-cache might achieve the same effect in most (simple) cases. Things get complicated when there are intermediate caches between the client and the server, or when you want to tweak client behavior (for example when sending AJAX requests), and that is when most of the caching directives come into use.
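The ETag round trip described above can be sketched on the server side as follows. This is a simplified illustration: the names make_etag and respond are invented, the ETag is a SHA-256 hash of the body (any stable validator would do), weak validators (W/"...") are ignored, and Last-Modified / If-Modified-Since is left out.

```python
import hashlib
from typing import Dict, Mapping, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(request_headers: Mapping[str, str],
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 with an empty body if the client's copy is current, else 200."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=0, must-revalidate"}
    normalised = {name.lower(): value for name, value in request_headers.items()}
    # If-None-Match may carry several comma-separated ETags, or "*".
    candidates = [tag.strip() for tag in normalised.get("if-none-match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""
    return 200, headers, body


pdf = b"%PDF-1.4 first version"
status, headers, _ = respond({}, pdf)                         # initial download
print(status)                                                 # 200
status, _, _ = respond({"If-None-Match": headers["ETag"]}, pdf)
print(status)                                                 # 304, file unchanged
```

Once the file changes on the server, the hash changes, the cached ETag no longer matches, and the same conditional request gets a 200 with the new content.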

Does the presence of an Origin header imply a CORS request

Is it safe to assume that an HTTP request with an Origin header is a CORS request?
If not, what is the correct way to distinguish a CORS request from a regular HTTP request originating from, say, a PHP app on an external server?
If by “CORS request” you mean a cross-origin request (i.e. a request using the CORS protocol), then no, it isn’t safe to assume that a request with an Origin header is a cross-origin request.
That’s because, along with requiring browsers to send an Origin in all cross-origin requests that use the CORS protocol, the Fetch spec also requires browsers to send the Origin header for all requests whose method is neither GET nor HEAD:
If the CORS flag is set or httpRequest’s method is neither GET nor HEAD, then append Origin/httpRequest’s origin, serialized and UTF-8 encoded, to httpRequest’s header list.
So browsers must also send the Origin header for, e.g., all POST requests.
The Fetch spec further states:
A CORS request is an HTTP request that includes an Origin header. It cannot be reliably identified as participating in the CORS protocol as the Origin header is also included for all requests whose method is neither GET nor HEAD.
So the spec actually defines CORS request to mean “any request that has an Origin header”, even if that request isn’t cross-origin and so doesn’t use the CORS protocol.
That may seem like a weird way to define it, but anyway given that, it’s important to remember that anywhere else the spec mentions CORS request, it does not necessarily mean a cross-origin request or even “a request participating in the CORS protocol” — because per the above definition from the spec, a same-origin POST request is also a CORS request.
So, there’s no way from the server side to reliably identify a given request as participating in the CORS protocol. Only the browser sending the request knows—and other than Origin, there are no other headers we can assume browsers will always send in CORS-protocol requests.
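What a server can do is compare the Origin value with its own origin. The sketch below shows that this distinguishes "no Origin header", "same-origin" and "cross-origin" requests, but, in line with the answer above, it does not tell you whether the request takes part in the CORS protocol, and non-browser clients can send any Origin value they like. OWN_ORIGIN and classify_origin are hypothetical names chosen for the example.

```python
from typing import Mapping

# Hypothetical origin (scheme, host, optional port) of the server itself.
OWN_ORIGIN = "https://api.example.com"


def classify_origin(headers: Mapping[str, str]) -> str:
    """Classify a request by its Origin header only."""
    normalised = {name.lower(): value for name, value in headers.items()}
    origin = normalised.get("origin")
    if origin is None:
        # Typical for same-origin GET/HEAD requests and many non-browser clients.
        return "no-origin"
    if origin == OWN_ORIGIN:
        # For example a same-origin POST: it has an Origin header but is not cross-origin.
        return "same-origin"
    # Includes the opaque value "null" as well as real foreign origins.
    return "cross-origin"


print(classify_origin({"Origin": "https://api.example.com"}))  # same-origin
print(classify_origin({"Origin": "https://other.example"}))    # cross-origin
```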

Restful Webservices Caching Data

I want to know where exactly data is cached in RESTful web services. Please don't just say that browsers cache RESTful web service data.
REST is based on HTTP.
In HTTP you do not know whether your data is cached somewhere. It may be in the browser or in any node between the client and the server.
However, your REST server may add the Cache-Control HTTP header to its response, e.g. Cache-Control: no-cache to tell caches not to reuse the response without revalidating it (or no-store to forbid storing it at all).
There is no guarantee that a proxy or other intermediary will not ignore it.
Your client can also request that data not be served from cache. In jQuery you just add cache: false to the AJAX request and it will do the trick.
If jQuery is not available, you will have to use the If-Modified-Since header (http://www.w3.org/Protocols/HTTP/HTRQ_Headers.html#if-modified-since).
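For reference, jQuery's cache: false works by appending a timestamp parameter named "_" to the URL of GET requests, so that every request looks new to any cache along the way. A minimal sketch of the same trick without jQuery, where the function name bust_cache and the example URL are made up:

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def bust_cache(url: str, now_ms: Optional[int] = None) -> str:
    """Append (or refresh) a "_" timestamp parameter so the URL is unique."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    parts = urlparse(url)
    # Keep existing parameters, dropping any earlier "_" value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "_"]
    query.append(("_", str(now_ms)))
    return urlunparse(parts._replace(query=urlencode(query)))


print(bust_cache("http://example.com/articles?page=2", now_ms=1700000000000))
# http://example.com/articles?page=2&_=1700000000000
```

Note that this bypasses caches rather than controlling them: each unique URL may still be stored by intermediaries, it is just never requested again.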
This post probably cleared up my doubt:
http://www.openlogic.com/wazi/bid/283625/Caching-web-service-results-can-enhance-Apache-application-performance.