Can I use HTTP headers to check whether a dynamic page has changed?

You can request the HTTP headers to check whether a web page has been edited by looking at its date, but what about dynamic pages (PHP, ASPX and so on) that pull their data from a database?

Even though you might think it's outdated, I've always found Simon Willison's article on Conditional GET more than useful. The example is in PHP, but it is simple enough to adapt to other languages. Here is the example:
function doConditionalGet($timestamp) {
    // A PHP implementation of conditional get, see
    // http://fishbowl.pastiche.org/archives/001132.html
    // gmdate() keeps the value in GMT even when the server's timezone is not UTC
    $last_modified = gmdate('D, d M Y H:i:s', $timestamp).' GMT';
    $etag = '"'.md5($last_modified).'"';
    // Send the headers
    header("Last-Modified: $last_modified");
    header("ETag: $etag");
    // See if the client has provided the required headers
    $if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ?
        stripslashes($_SERVER['HTTP_IF_MODIFIED_SINCE']) :
        false;
    $if_none_match = isset($_SERVER['HTTP_IF_NONE_MATCH']) ?
        stripslashes($_SERVER['HTTP_IF_NONE_MATCH']) :
        false;
    if (!$if_modified_since && !$if_none_match) {
        return;
    }
    // At least one of the headers is there - check them
    if ($if_none_match && $if_none_match != $etag) {
        return; // etag is there but doesn't match
    }
    if ($if_modified_since && $if_modified_since != $last_modified) {
        return; // if-modified-since is there but doesn't match
    }
    // Nothing has changed since their last request - serve a 304 and exit
    header('HTTP/1.0 304 Not Modified');
    exit;
}
With this you can use the HTTP verbs GET or HEAD (I think it's also possible with the others, but I can't see a reason to use them). All you need to do is add either If-Modified-Since or If-None-Match, carrying the value of the Last-Modified or ETag header sent with a previous version of the page. As of HTTP version 1.1, ETag is recommended over Last-Modified, but both will do the job.
This is a very simple example of how a conditional GET works. First we need to retrieve the page the usual way:
GET /some-page.html HTTP/1.1
Host: example.org
First response, with the conditional headers and the contents:
HTTP/1.1 200 OK
ETag: "YourETagHere"
Now the conditional GET request:
GET /some-page.html HTTP/1.1
Host: example.org
If-None-Match: "YourETagHere"
And the response indicating that you can use the cached version of the page, as only the headers are delivered:
HTTP/1.1 304 Not Modified
ETag: "YourETagHere"
With this, the server has told you that the page was not modified.
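From the client's point of view, the exchange above boils down to two small steps: replay the validators you saved, then look at the status code. Here is a minimal Python sketch of that; the function names are made up for illustration and nothing here is tied to a particular HTTP library.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> Dict[str, str]:
    """Build request headers from validators saved from an earlier response."""
    headers: Dict[str, str] = {}
    if etag:
        # Send the ETag back exactly as received, quotes included
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def page_changed(status_code: int) -> bool:
    """304 means the cached copy is still good; 200 means fresh content was sent."""
    if status_code == 304:
        return False
    if status_code == 200:
        return True
    raise ValueError(f"unexpected status for a conditional GET: {status_code}")
```

If you feed these headers to urllib.request, keep in mind that urlopen reports a 304 by raising HTTPError, so the status has to be read from the exception's code attribute.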
I can also recommend another article about conditional GET: HTTP conditional GET for RSS hackers.

This is the exact purpose of the ETag header, but it has to be supported by your web framework, or you need to make sure that your application responds properly to requests with the headers If-Match, If-None-Match and If-Range (see HTTP Ch 3.11).

You can, if the site uses the HTTP response headers correctly, but that is often overlooked.
Otherwise, storing a local MD5 hash of the content might be useful to you (unless there's an easier in-content string you could pick out). It's not ideal (it's quite a slow process, since the whole page has to be fetched and hashed), but it's an option.
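A rough Python sketch of the hash approach, assuming you already have the page body as bytes and have stored the previous hash somewhere (the function names are only illustrative):

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """MD5 hex digest of the page body, used only as a change marker."""
    return hashlib.md5(body).hexdigest()


def has_changed(body: bytes, previous_fingerprint: str) -> bool:
    """Compare the freshly fetched body against the stored fingerprint."""
    return content_fingerprint(body) != previous_fingerprint
```

Note that any byte that differs between requests (a timestamp in the footer, a session token, a rotating ad) will make the page look changed, so you may need to hash only the relevant part of the content.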

Yes, you can and should use HTTP headers to mark pages as unexpired. If they are dynamic though (PHP, ASPX, etc.) and/or database driven, you'll need to manually control setting the Expires header/sending HTTP Not Modified appropriately. ASP.NET has some SqlDependency objects for this, but they still need to be configured and managed. (Not sure if PHP has something just like it, but there's probably something in PEAR if not...)

The Last-Modified header will only be of use to you if the programmer of the site has explicitly set it to be returned.
For a regular, static page Last-Modified is the timestamp of the last modification of the HTML file. For a dynamically generated page the server can't reliably assign a Last-Modified value, as it has no real way of knowing how the content changes from request to request, so many servers don't generate the header at all.
If you have control over the page, then making sure the Last-Modified header is set will make a check on Last-Modified work. Otherwise you may have to fetch the page and run a regex to find a section that changes (e.g. the date/time in the header of a news site). If no such obvious marker exists, then I'd second Oli's suggestion of an MD5 of the page content as a way to be sure it has changed.
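To make the "header may or may not be there" point concrete, here is a small Python sketch that compares a Last-Modified value with the time of your last check and returns None when the header is missing or unparseable, which is the signal to fall back to a content check. The function name and the three-way return value are my own choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def modified_since(last_modified: Optional[str],
                   last_check: datetime) -> Optional[bool]:
    """True/False when the header is usable, None when it is missing or malformed.

    last_check must be a timezone-aware datetime.
    """
    if not last_modified:
        return None
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError
        return None
    if stamp.tzinfo is None:
        # Treat a value without a usable zone as GMT, as HTTP dates should be
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp > last_check
```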

Related

Override default headers for an IXMLHTTPRequest2

I'm trying to send a HTTP request on Windows 8 using an IXMLHTTPRequest2 object and I want to customise the outgoing Accept-Encoding header to something other than the default value of "gzip, deflate". When I try and use SetRequestHeader method to set the Accept-Encoding header, the method call succeeds but the request is still sent with the default header value instead of the value I provided (Verified by using Wireshark to capture the HTTP request).
Sample code (simplified for brevity):
::CoCreateInstance( CLSID_FreeThreadedXMLHTTP60, NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS( &m_pXHR ));
m_pXHR->Open( L"GET", L"http://192.168.0.100/test", m_pXHRCallback.Get(), NULL, NULL, NULL, NULL );
m_pXHR->SetRequestHeader( L"Accept-Encoding", L"gzip" );
m_pXHR->Send( NULL, 0 );
Wireshark capture of request that gets made:
GET /users/me/id HTTP/1.1
Accept: */*
Host: 192.168.0.100
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
According to the docs for SetRequestHeader, it is an append operation only. You're getting the gzip Accept-Encoding, so I think it's working as intended. I don't see a way of removing the default header, however.
It seems that you can use IXMLHTTPRequest2::SetProperty() with XHR_PROP_NO_DEFAULT_HEADERS to suppress the default headers.
See: http://msdn.microsoft.com/en-us/library/windows/desktop/hh831167(v=vs.85).aspx
It seems that the IXMLHTTPRequest2 API is plainly broken, as it doesn't have a way to remove a header. Or, perhaps, documentation is broken because it doesn't mention that passing an empty string or NULL removes a header.
Also, according to IXMLHTTPRequest2::SetRequestHeader declaration:
virtual HRESULT STDMETHODCALLTYPE SetRequestHeader(
/* [ref][string][in] */ __RPC__in_string const WCHAR *pwszHeader,
/* [unique][string][in] */ __RPC__in_opt_string const WCHAR *pwszValue) = 0;
header's value is marked as optional (__RPC__in_opt_string) and can be NULL.
So, if you really want to set a header's value, the only proper way that works with IXMLHTTPRequest2 is to do this:
m_pXHR->SetRequestHeader(L"SomeMyHeader", L"");
m_pXHR->SetRequestHeader(L"SomeMyHeader", L"value");
This way you can remove, or change some default headers:
m_pXHR->SetRequestHeader(L"Accept-Language", L"");
Since this isn't documented, this may or may not work for you at some point on some particular version of Windows. If you tried to use IXMLHTTPRequest2 heavily you'd come to the same conclusion as me: it's just broken crap. This for example doesn't work:
m_pXHR->SetRequestHeader(L"Accept-Encoding", L"");
Seems that some dude who implemented IXMLHTTPRequest2 put lots of undocumented logic in there:
You can completely remove some headers if you set them to NULL or an empty string (for example, the Accept-Language header or your own headers can be removed).
You can change, but not remove some headers (for example, User-Agent header).
You cannot change some headers at all (for example, Accept-Encoding header).
When you call ->Send on IXMLHTTPRequest2, internally they unconditionally set Accept-Encoding to whatever they feel like using. That means that you cannot add some alternative encoding like brotli without resorting to hacks and custom headers.
They should just start using libcurl and expose its API instead of exposing IXMLHTTPRequest2 in its current shameful state.

Is it possible to remove a Pragma no-cache response header once it has been set by SetCacheability method?

I have an MVC4 GET action method that returns a FileStreamResult. A requirement exists to only use SSL and to not allow caching of the served document so SSL it is and I've also used the OutputCache filter with the following properties:
[OutputCache(NoStore = true, Duration = 0, VaryByParam = "None", Location = OutputCacheLocation.None)]
This behaves as expected and produces the following response headers:
Cache-Control: no-cache, no-store
Expires: -1
Pragma: no-cache
All was well until I was asked to also support IE8. As many here have also encountered, the documents just won't download with both no-cache set and SSL in the mix. The workaround for IE8 and below is either to add a registry setting, which is not really viable, or to remove the no-cache headers, which breaks a fundamental requirement.
I experimented with Fiddler and IE8 and was able to download a document if I just removed the pragma: no-cache header but left the Cache-Control header intact. This didn't appear to leave a copy of the document in my temporary internet files but I might need to test this some more.
With this information in mind I thought it might be a simple task to remove the pragma using a filter on the action but it seems no matter what I do I cannot change whatever the OutputCache is going to set. I've even removed the OutputCache attribute and used:
Response.Cache.SetCacheability(HttpCacheability.NoCache)
Using this method alone ensures I get the same cache settings as before but they are not set at the point of this method call. This merely sets up the cache policy which gets applied at some point in the response pipeline but I just don't know where.
Does anyone know if there is a way of hooking into the response pipeline to alter the cache headers as they are being written?
EDIT
I've added a simple custom IHttpModule into the pipeline that looks for and removes any pragma header in the response NameValueCollection and whilst the cache-control is set the pragma is not there. Does this mean that IIS 7.5 is inserting the pragma itself based upon what it sees in the cache-control perhaps? I know for sure I have not set anything beyond defaults for a simple web site.
EDIT
Checked the Cache-Control header value within the module and it is set private so the cache headers haven't been applied to the response yet. So it would appear the cache headers get added after modules are executed perhaps?
I was troubleshooting this same issue and ran into the same problem removing the pragma header. When .NET renders a Page object, it outputs the cache headers. The cache handling is controlled by an HttpModule. I've tried several ways to remove the pragma header, but to no avail.
One method I haven't tried yet that looks like it might work, but also looks like a PITA is to implement a filter on the Response output stream via Response.Filter = new MyCustomFilter(...).
Prior to this I tried checking the headers in various locations, but the output cache processing had not been executed yet and pragma header did not exist and so could not be removed. Notably the HttpApplication event PreSendRequestHeaders did not work.
Some other options include implementing your own OutputCache module instead of using the built-in framework version, or somehow overriding the System.Web.HttpCachePolicy class where the pragma header is rendered.
The pragma header is rendered as part of the HttpCacheability.NoCache option:
if (httpCacheability == HttpCacheability.NoCache || httpCacheability == HttpCacheability.Server)
{
    if (HttpCachePolicy.s_headerPragmaNoCache == null)
        HttpCachePolicy.s_headerPragmaNoCache = new HttpResponseHeader(4, "no-cache");
    this._headerPragma = HttpCachePolicy.s_headerPragmaNoCache;
    if (this._allowInHistory != 1)
    {
        if (HttpCachePolicy.s_headerExpiresMinus1 == null)
            HttpCachePolicy.s_headerExpiresMinus1 = new HttpResponseHeader(18, "-1");
        this._headerExpires = HttpCachePolicy.s_headerExpiresMinus1;
    }
}
The only pragmatic option I've found is to set the cache-control to private and also set a short expiration for the URL. It doesn't address the root cause on either end, but it does end up with almost the same desired effect.
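To show what that workaround looks like on the wire, here is a language-neutral sketch (written in Python, not ASP.NET) of the two header values involved. The function name and the choice to emit both Cache-Control and Expires are assumptions for illustration. In ASP.NET you would reach the same result through the HttpCachePolicy API rather than by building strings.

```python
from email.utils import formatdate
from typing import Dict


def short_lived_private_headers(max_age_seconds: int, now: float) -> Dict[str, str]:
    """Headers for a response that only the browser may cache, and only briefly.

    now is a Unix timestamp (e.g. time.time()).
    """
    return {
        "Cache-Control": f"private, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

Because the cacheability is private rather than no-cache, the Pragma: no-cache branch shown above is never taken.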

Caching JSON with Cloudflare

I am developing a backend system for my application on Google App Engine.
My application and backend server communicate using JSON, with URLs like http://server.example.com/api/check_status/3838373.json or just http://server.example.com/api/check_status/3838373/
And I am planning to use CloudFlare for caching JSON pages.
Which one should I use in the header?
Content-type: application/json
Content-type: text/html
Will CloudFlare cache my server's responses to reduce my costs? I won't be serving CSS, images, etc.
The standard Cloudflare cache level (under your domain's Performance Settings) is set to Standard/Aggressive, meaning that by default it caches only certain types: scripts, stylesheets, images. Aggressive caching won't cache normal web pages (i.e. at a directory location or *.html) and won't cache JSON. All of this is based on the URL pattern (e.g. does it end in .jpg?), regardless of the Content-Type header.
The global setting can only be made less aggressive, not more, so you'll need to set up one or more Page Rules to match those URLs, using Cache Everything as the custom cache rule.
http://blog.cloudflare.com/introducing-pagerules-advanced-caching
BTW I wouldn't recommend using an HTML Content-Type for a JSON response.
By default, Cloudflare does not cache JSON files. I ended up configuring a new page rule:
https://example.com/sub-directiory/*.json*
Cache level: Cache Everything
Browser Cache TTL: set a timeout
Edge Cache TTL: set a timeout
Hope it saves someone's day.
The new workers feature ($5 extra) can facilitate this:
Important point:
Cloudflare normally treats normal static files as pretty much never expiring (or maybe it was a month - I forget exactly).
So at first you might think "I just want to add .json to the list of static extensions". This is likely NOT what you want with JSON, unless it changes really rarely or is versioned by filename. You probably want something like 60 seconds or 5 minutes, so that if you update a file it'll update within that time but your server won't get bombarded with individual requests from every end user.
Here's how I did this with a worker to intercept all .json extension files:
// Note: there could be tiny cut and paste bugs in here - please fix if you find!
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event));
});
async function handleRequest(event)
{
  let request = event.request;
  let ttl = undefined;
  let cache = caches.default;
  let url = new URL(event.request.url);
  let shouldCache = false;
  // cache JSON files with custom max age
  if (url.pathname.endsWith('.json'))
  {
    shouldCache = true;
    ttl = 60;
  }
  // look in cache for existing item
  let response = await cache.match(request);
  if (!response)
  {
    // fetch URL
    response = await fetch(request);
    // if the resource should be cached then put it in cache using the cache key
    if (shouldCache)
    {
      // clone response to be able to edit headers
      response = new Response(response.body, response);
      if (ttl)
      {
        // https://developers.cloudflare.com/workers/recipes/vcl-conversion/controlling-the-cache/
        // set() replaces any Cache-Control sent by the origin; append() would add a second value
        response.headers.set('Cache-Control', 'max-age=' + ttl);
      }
      // put into cache (need to clone again)
      event.waitUntil(cache.put(request, response.clone()));
    }
  }
  // either the cached copy or the freshly fetched response
  return response;
}
You could do this with mime-type instead of extension - but it'd be very dangerous because you'd probably end up over-caching API responses.
Also if you're versioning by filename - eg. products-1.json / products-2.json then you don't need to set the header for max-age expiration.
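The decision described above (cache by extension, give unversioned JSON a short max-age, leave versioned files alone) can be written down as a tiny policy function. This is a Python sketch of the logic only, not Worker code. The regex for "versioned by filename" is an assumption based on the products-1.json / products-2.json example, and the 60 second default is the value used in the worker.

```python
import re
from typing import Optional, Tuple

# Assumed naming scheme for versioned files, e.g. products-2.json
VERSIONED_JSON = re.compile(r"-\d+\.json$")


def cache_policy(path: str, default_ttl: int = 60) -> Tuple[bool, Optional[int]]:
    """Return (should_cache, max_age_seconds); max_age None means 'do not set it'."""
    if not path.endswith(".json"):
        return (False, None)
    if VERSIONED_JSON.search(path):
        # A new version gets a new URL, so no short expiry is needed
        return (True, None)
    return (True, default_ttl)
```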
You can cache your JSON responses on Cloudflare similar to how you'd cache any other page - by setting the Cache-Control headers. So if you want to cache your JSON for 60 seconds on the edge (s-maxage) and the browser (max-age), just set the following header in your response:
Cache-Control: max-age=60, s-maxage=60
You can read more about different cache control header options here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
Please note that different Cloudflare plans allow different minimum edge cache TTL values (the Enterprise plan allows as low as 1 second). If your headers have a value lower than that, then I guess they might be ignored. You can see the limits here:
https://support.cloudflare.com/hc/en-us/articles/218411427-What-does-edge-cache-expire-TTL-mean-#summary-of-page-rules-settings
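Since the question mentions App Engine, here is a minimal sketch of an origin that sends those headers, written as a bare WSGI callable in Python. The payload and the app name are placeholders. Whether Cloudflare actually honours the header still depends on the cache level and page rules discussed above.

```python
import json
from typing import Callable, Iterable, List, Tuple


def json_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve a JSON body with a JSON content type and a 60 second cache lifetime."""
    body = json.dumps({"status": "ok"}).encode("utf-8")  # placeholder payload
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # max-age is for browsers, s-maxage for shared caches such as the CDN edge
        ("Cache-Control", "max-age=60, s-maxage=60"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally you can hand json_app to wsgiref.simple_server.make_server from the standard library.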

Implementing ETag Support in ASP.NET MVC4 WebAPI

In the latest ASP.NET MVC4 beta, how would you implement conditional GET support via ETags? The ActionFilter would need to be able to complete the request to generate the ETag for the returned resource in order to compare to the If-None-Match header in the request. And then, regardless of whether the incoming ETag in the If-None-Match header was the same as the generated ETag, add the generated ETag to the ETag response header. But with ASP.NET MVC4, I have no idea where to begin. Any suggestions?
Personally, I'm not a fan of "framework magic" and prefer plain old code in the web methods, else we end up with something more akin to WCF, yuk.
So, within your Get web method, manually create the response like so:
var response = this.Request.CreateResponse(HttpStatusCode.OK, obj);
string hash = obj.ModifiedDate.GetHashCode().ToString();
response.Headers.ETag =
new EntityTagHeaderValue(String.Concat("\"", hash, "\""), true);
return response;
Please note that the ETag produced from the hash code of the timestamp is purely illustrative of a weak entity tagging system. It also shows the additional quotes required.
There is an ETagMessageHandler in WebApiContrib which does what you need.
UPDATE
I have implemented RFC 2616's server side caching in WebApiContrib. Look for CachingHandler.
More info here.
More Update
This will be actively developed and expanded upon under CacheCow. This will include both client and server components. NuGet packages are published now.
WebApiContrib's CachingHandler will still be maintained so any bugs or problems please let me know.
Luke Puplett's answer got me on the right track (+1), but note that you also have to read the ETag on the server side to avoid sending all the data with each request:
string hash = obj.ModifiedDate.GetHashCode().ToString();
var etag = new EntityTagHeaderValue(String.Concat("\"", hash, "\""), true);
if (Request.Headers.IfNoneMatch.Any(h => h.Equals(etag)))
{
return new HttpResponseMessage(HttpStatusCode.NotModified);
}
var response = this.Request.CreateResponse(HttpStatusCode.OK, obj);
response.Headers.ETag = etag;
return response;
It would also be a good idea to respect the If-Modified-Since header. See RFC 2616.
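The If-None-Match check itself is independent of the framework, so here is the same idea as a small Python sketch. It assumes a simplified header format (tags separated by commas, no commas inside a tag) and uses weak comparison, i.e. the W/ prefix is ignored. All function names are illustrative.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted hash of the response bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _opaque(tag: str) -> str:
    """Drop the weak marker so that tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header: Optional[str], current_etag: str) -> bool:
    """True when the server may answer a GET or HEAD with 304 Not Modified."""
    if not header:
        return False
    if header.strip() == "*":
        return True
    # Simplification: assumes no commas inside the quoted tags
    candidates = [_opaque(t) for t in header.split(",")]
    return _opaque(current_etag) in candidates
```

Hashing the body means the response has to be produced before you can decide on a 304. Deriving the tag from a modification date or a version column, as in the answers above, avoids that cost.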
It seems this is what you are looking for (see section "Support for ETags"):
http://blogs.msdn.com/b/webdev/archive/2014/03/13/getting-started-with-asp-net-web-api-2-2-for-odata-v4-0.aspx
In case your model is stored deeper in domain and you are not able to apply the [ConcurrencyCheck] attribute, you can do that using the ODataModelBuilder:
ODataModelBuilder builder = new ODataConventionModelBuilder();
var myEntity = builder.EntitySet<MyEntity>("MyEntities");
myEntity.EntityType.Property(l => l.Version).ConcurrencyToken = true;
This will make it add the "@odata.etag" property to the response body.

Why am I not getting the soap header?

Why is this so hard in WCF 4.0?
I add a custom header in my client:
Authorization: 18732818 gfdsgShoyh3sfayql6jWCRc=
so that my request looks like the following:
GET http://HOSTNAME/Public/Xml/SyncReply/TestClearUsername?Id=1 HTTP/1.1
Authorization: 18732818 gfdsgShoyh3sfayql6jWCRc=
Host: HOSTNAME
Connection: Keep-Alive
In my wired-up service responder I can access the property Id and get the value 1. I would also like to access the value of Authorization, but it always shows as null.
What am I doing wrong?
After much googling I finally found the answer to this so I will post it here so it might be of use to someone else. I am assuming this is an undocumented feature, since it is so well hidden, but someone else might know different.
I found this enumeration System.Web.HttpWorkerRequest.HeaderAuthorization (value=24)
and this method System.Web.HttpWorkerRequest.GetKnownRequestHeader(24)
Just to summarize: the reason Authorization was hiding from me is that it's a reserved (known) header. If you add a header with a name of your own and want to retrieve it, you can use:
.GetUnknownRequestHeader("YOUR_WORD_HERE")
So in full you need:
HttpRequestContext hrc = (HttpRequestContext)this.RequestContext;
RequestAttributes ra = (RequestAttributes)hrc.RequestAttributes;
System.Web.HttpWorkerRequest hwr = ra.HttpWorkerRequest;
string Auth = hwr.GetKnownRequestHeader(System.Web.HttpWorkerRequest.HeaderAuthorization);