Caching JSON with Cloudflare

I am developing a backend system for my application on Google App Engine.
My application and backend server communicate using JSON, e.g. http://server.example.com/api/check_status/3838373.json or just http://server.example.com/api/check_status/3838373/
I am planning to use Cloudflare to cache these JSON responses.
Which Content-Type should I send in the header?
Content-type: application/json
Content-type: text/html
Will Cloudflare cache my server's responses to reduce my costs? I won't be serving CSS, images, etc.

The standard Cloudflare cache level (under your domain's Performance Settings) is set to Standard/Aggressive, meaning it only caches certain file types by default: scripts, stylesheets, and images. Aggressive caching won't cache normal web pages (i.e. at a directory location or *.html) and won't cache JSON. All of this is based on the URL pattern (e.g. does it end in .jpg?), regardless of the Content-Type header.
The global setting can only be made less aggressive, not more, so you'll need to set up one or more Page Rules to match those URLs, using Cache Everything as the custom cache rule.
http://blog.cloudflare.com/introducing-pagerules-advanced-caching
BTW I wouldn't recommend using an HTML Content-Type for a JSON response.

By default, Cloudflare does not cache JSON files. I ended up configuring a new page rule:
https://example.com/sub-directiory/*.json*
Cache level: Cache Everything
Browser Cache TTL: set a timeout
Edge Cache TTL: set a timeout
Hope it saves someone's day.

The new workers feature ($5 extra) can facilitate this:
Important point:
Cloudflare normally treats normal static files as pretty much never expiring (or maybe it was a month - I forget exactly).
So at first you might think "I just want to add .json to the list of static extensions". This is likely NOT what you want with JSON - unless it really rarely changes or is versioned by filename. You probably want something like 60 seconds or 5 minutes, so that if you update a file it'll be refreshed within that time, but your server won't get bombarded with individual requests from every end user.
Here's how I did this with a worker to intercept all .json extension files:
// Note: there could be tiny cut and paste bugs in here - please fix if you find!
addEventListener('fetch', event => {
    event.respondWith(handleRequest(event));
});

async function handleRequest(event)
{
    let request = event.request;
    let ttl = undefined;
    let cache = caches.default;
    let url = new URL(event.request.url);
    let shouldCache = false;

    // cache JSON files with custom max age
    if (url.pathname.endsWith('.json'))
    {
        shouldCache = true;
        ttl = 60;
    }

    // look in cache for existing item
    let response = await cache.match(request);
    if (!response)
    {
        // fetch URL
        response = await fetch(request);

        // if the resource should be cached then put it in cache using the cache key
        if (shouldCache)
        {
            // clone response to be able to edit headers
            response = new Response(response.body, response);
            if (ttl)
            {
                // https://developers.cloudflare.com/workers/recipes/vcl-conversion/controlling-the-cache/
                response.headers.append('Cache-Control', 'max-age=' + ttl);
            }
            // put into cache (need to clone again)
            event.waitUntil(cache.put(request, response.clone()));
        }
    }
    return response;
}
You could do this with mime-type instead of extension - but it'd be very dangerous because you'd probably end up over-caching API responses.
Also, if you're versioning by filename - e.g. products-1.json / products-2.json - then you don't need to set the header for max-age expiration.

You can cache your JSON responses on Cloudflare much as you'd cache any other page: by setting Cache-Control headers. So if you want to cache your JSON for 60 seconds at the edge (s-maxage) and in the browser (max-age), just set the following header in your response:
Cache-Control: max-age=60, s-maxage=60
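For illustration, here's a minimal sketch of emitting that header from an ASP.NET Web API action (the stack used in the related questions below); obj is a placeholder, and any origin framework can set the same header:
// Hypothetical Web API action body: attach Cache-Control so Cloudflare's edge
// (s-maxage) and the browser (max-age) both cache the JSON for 60 seconds.
var response = Request.CreateResponse(HttpStatusCode.OK, obj);
response.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue
{
    MaxAge = TimeSpan.FromSeconds(60),       // browser TTL (max-age)
    SharedMaxAge = TimeSpan.FromSeconds(60)  // edge TTL (s-maxage)
};
return response;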
You can read more about different cache control header options here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
Please note that different Cloudflare plans allow different values for the minimum edge cache TTL (the Enterprise plan allows as low as 1 second). If your headers have a value lower than that, then I guess they might be ignored. You can see the limits here:
https://support.cloudflare.com/hc/en-us/articles/218411427-What-does-edge-cache-expire-TTL-mean-#summary-of-page-rules-settings

Related

IE - XSL files not getting loaded from browser cache (always hitting the server to load)

In our application, we have some pages where XSL transformations happen using the ActiveX object Microsoft.XMLDOM. It's a legacy application, so there is not much scope for changes.
Example:
<script>
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false;
    doc.load("/<<Some Path>>/myXSL.xsl");
    document.write(myXML.transformNode(doc));
</script>
Our HTTP server is sending the correct cache-control and expiry attributes to the client. The CSS, JS, image files etc. are, as we can see, picked up from the browser cache (status 304), but the XSL files always hit the server (status 200).
If we open Temporary Internet Files, we can see that although a future date is present in the Expires column, Last Checked is updated on every request.
Any help here would be much appreciated.
After searching a lot on Google, we understood that the ActiveX Microsoft.XMLDOM object sends "Pragma: no-cache" in the request headers (to bypass the browser cache).
To solve this, we need to set the ForcedResync property to false (telling the XMLDOM object not to send that pragma).
Example:
<script>
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false;
    doc.setProperty("ForcedResync", false);
    doc.load("/<<Some Path>>/myXSL.xsl");
    document.write(myXML.transformNode(doc));
</script>

CSRF failure in custom mongoose pre-hook (Keystone.js)

I'm using Keystone's LocalFile type to handle image uploads. Similar to the Cloudinary autoCleanup option, I want to be able to delete the uploaded file itself, in addition to the corresponding Mongo entry, when deleting entries through the admin UI.
In this case, I want to delete an "Album" and its corresponding album cover.
var fs = require('fs'); // needed for the unlink call below

Album.schema.pre('remove', function (next) {
    var path = this._original.album_cover.path + "/" + this._original.album_cover.filename;
    fs.unlink(path, function () {
        console.log('deleted');
        next(); // continue with the remove once the file is gone
    });
});
I get "CSRF failure" when using the fs module. I thought all CSRF protection was handled internally with Keystone.
Anyone know of a better solution to this?
Took a 10 minute break and came back and it seems to be working now. I also found this, which seems to be the explanation.
"Moreover double check your session timeout. In my dev settings the session duration is set to 3 minutes. So, if I end up editing something for more than that time, Keystone will return a CSRF error on save because the new session (generate in the meantime) invalidates the old token."
https://github.com/keystonejs/keystone/issues/1330

ASP.NET Web API - Reading querystring/formdata before each request

For reasons outlined here I need to review a set of values from the querystring or formdata before each request (so I can perform some authentication). The keys are the same each time and should be present in each request; however, they will be located in the querystring for GET requests, and in the formdata for POST and other methods.
As this is for authentication purposes, it needs to run before the request; at the moment I am using a MessageHandler.
I can work out whether I should be reading the querystring or formdata based on the method, and when it's a GET I can read the querystring OK using Request.GetQueryNameValuePairs(); however, the problem is reading the formdata when it's a POST.
I can get the formdata using Request.Content.ReadAsFormDataAsync(); however, formdata can only be read once, and once I have read it here it is no longer available to the request (i.e. my controller actions get null models).
What is the most appropriate way to consistently and non-intrusively read querystring and/or formdata from a request before it gets to the request logic?
Regarding your question of which place would be better: in this case I believe AuthorizationFilters to be better than a message handler, but either way I see that the problem is related to reading the body multiple times.
After doing Request.Content.ReadAsFormDataAsync() in your message handler, can you try doing the following?
Stream requestBufferedStream = Request.Content.ReadAsStreamAsync().Result;
requestBufferedStream.Position = 0; //resetting to 0 as ReadAsFormDataAsync might have read the entire stream and position would be at the end of the stream causing no bytes to be read during parameter binding and you are seeing null values.
Note: whether a request's content can be read a single time only or multiple times depends on the host's buffer policy. By default, the host's buffer policy is Buffered. In this case you will be able to reset the position back to 0. However, if you explicitly set the policy to Streamed, then you cannot reset back to 0.
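Putting those pieces together, here's a rough sketch of such a message handler; the class name and the authentication placeholders are illustrative, not from the question:
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class PreAuthHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Method == HttpMethod.Get)
        {
            var values = request.GetQueryNameValuePairs();
            // ...authenticate using the query string values...
        }
        else if (request.Content != null)
        {
            var formData = await request.Content.ReadAsFormDataAsync();
            // ...authenticate using the form data...

            // Rewind the buffered body so parameter binding can read it again.
            // This only works while the host's buffer policy is Buffered.
            var body = await request.Content.ReadAsStreamAsync();
            body.Position = 0;
        }
        return await base.SendAsync(request, cancellationToken);
    }
}
Register it during Web API configuration with config.MessageHandlers.Add(new PreAuthHandler());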
What about using ActionFilterAttributes?
This code worked well for me:
public HttpResponseMessage AddEditCheck(Check check)
{
    var request = ((System.Web.HttpContextWrapper)Request.Properties.ToList<KeyValuePair<string, object>>().First().Value).Request;
    var i = request.Form["txtCheckDate"];
    return Request.CreateResponse(HttpStatusCode.OK);
}

Implementing ETag Support in ASP.NET MVC4 WebAPI

In the latest ASP.NET MVC4 beta, how would you support conditional GETs via ETags? The ActionFilter would need to be able to complete the request to generate the ETag for the returned resource, in order to compare it to the If-None-Match header in the request. And then, regardless of whether the incoming ETag in the If-None-Match header was the same as the generated ETag, add the generated ETag to the ETag response header. But with ASP.NET MVC4, I have no idea where to begin. Any suggestions?
Personally, I'm not a fan of "framework magic" and prefer plain old code in the web methods, else we end up with something more akin to WCF, yuk.
So, within your Get web method, manually create the response like so:
var response = this.Request.CreateResponse(HttpStatusCode.OK, obj);
string hash = obj.ModifiedDate.GetHashCode().ToString();
response.Headers.ETag =
    new EntityTagHeaderValue(String.Concat("\"", hash, "\""), true);
return response;
Please note that the ETag produced from the hash code of the timestamp is purely illustrative of a weak entity tagging system. It also shows the additional quotes required.
There is an ETagMessageHandler in WebApiContrib which does what you need.
UPDATE
I have implemented RFC 2616's server side caching in WebApiContrib. Look for CachingHandler.
More info here.
More Update
This is being actively developed and expanded upon under CacheCow, which includes both client and server components. NuGet packages are published now.
WebApiContrib's CachingHandler will still be maintained so any bugs or problems please let me know.
Luke Puplett's answer got me on the right track (+1), but note that you also have to read the ETag on the server side to avoid sending all the data with each request:
string hash = obj.ModifiedDate.GetHashCode().ToString();
var etag = new EntityTagHeaderValue(String.Concat("\"", hash, "\""), true);
if (Request.Headers.IfNoneMatch.Any(h => h.Equals(etag)))
{
    return new HttpResponseMessage(HttpStatusCode.NotModified);
}
var response = this.Request.CreateResponse(HttpStatusCode.OK, obj);
response.Headers.ETag = etag;
return response;
It would also be a good idea to respect the If-Modified-Since header. See RFC 2616.
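For instance, a hedged sketch of honouring If-Modified-Since in the same action, assuming obj.ModifiedDate holds the resource's last modification time as in the snippets above:
// Return 304 when the client's copy is at least as new as the resource.
DateTimeOffset? ifModifiedSince = Request.Headers.IfModifiedSince;
if (ifModifiedSince.HasValue && obj.ModifiedDate <= ifModifiedSince.Value)
{
    return new HttpResponseMessage(HttpStatusCode.NotModified);
}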
It seems this is what you are looking for (see section "Support for ETags"):
http://blogs.msdn.com/b/webdev/archive/2014/03/13/getting-started-with-asp-net-web-api-2-2-for-odata-v4-0.aspx
In case your model is stored deeper in the domain and you are not able to apply the [ConcurrencyCheck] attribute, you can do that using the ODataModelBuilder:
ODataModelBuilder builder = new ODataConventionModelBuilder();
var myEntity = builder.EntitySet<MyEntity>("MyEntities");
myEntity.EntityType.Property(l => l.Version).ConcurrencyToken = true;
This will add the "@odata.etag" property to the response body.

Can I use HTTP headers to check if a dynamic page has been changed?

You can request the HTTP headers to check if a web page has been edited by looking at its date, but what about dynamic pages - PHP, ASPX - which grab their data from a database?
Even though you might think it's outdated, I've always found Simon Willison's article on conditional GET to be more than useful. The example is in PHP, but it is so simple that you can adapt it to other languages. Here is the example:
function doConditionalGet($timestamp) {
    // A PHP implementation of conditional get, see
    // http://fishbowl.pastiche.org/archives/001132.html
    $last_modified = substr(date('r', $timestamp), 0, -5).'GMT';
    $etag = '"'.md5($last_modified).'"';
    // Send the headers
    header("Last-Modified: $last_modified");
    header("ETag: $etag");
    // See if the client has provided the required headers
    $if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ?
        stripslashes($_SERVER['HTTP_IF_MODIFIED_SINCE']) :
        false;
    $if_none_match = isset($_SERVER['HTTP_IF_NONE_MATCH']) ?
        stripslashes($_SERVER['HTTP_IF_NONE_MATCH']) :
        false;
    if (!$if_modified_since && !$if_none_match) {
        return;
    }
    // At least one of the headers is there - check them
    if ($if_none_match && $if_none_match != $etag) {
        return; // etag is there but doesn't match
    }
    if ($if_modified_since && $if_modified_since != $last_modified) {
        return; // if-modified-since is there but doesn't match
    }
    // Nothing has changed since their last request - serve a 304 and exit
    header('HTTP/1.0 304 Not Modified');
    exit;
}
With this you can use the HTTP verbs GET or HEAD (I think it's also possible with the others, but I can't see a reason to use them). All you need to do is add either If-Modified-Since or If-None-Match with the respective values of the Last-Modified or ETag headers sent by a previous version of the page. As of HTTP 1.1, ETag is recommended over Last-Modified, but both will do the job.
This is a very simple example of how a conditional GET works. First we need to retrieve the page the usual way:
GET /some-page.html HTTP/1.1
Host: example.org
The first response comes with conditional headers and the full contents:
HTTP/1.1 200 OK
ETag: YourETagHere
Now the conditional get request:
GET /some-page.html HTTP/1.1
Host: example.org
If-None-Match: YourETagHere
And the response indicating you can use the cached version of the page, as only the headers are going to be delivered:
HTTP/1.1 304 Not Modified
ETag: YourETagHere
With this, the server notifies you that there was no modification to the page.
I can also recommend you another article about conditional GET: HTTP conditional GET for RSS hackers.
This is the exact purpose of the ETag header, but it has to be supported by your web framework, or you need to take care that your application responds properly to requests with the If-Match, If-None-Match and If-Range headers (see RFC 2616, section 3.11).
You can if the page uses the HTTP response headers correctly, but it's often overlooked.
Otherwise, storing a local MD5 hash of the content might be useful to you (unless there's an easier in-content string you could hook out). It's not ideal (because it's quite a slow process), but it's an option.
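A rough sketch of that approach, assuming you poll the page yourself and compare against the hash stored from the previous fetch (the URL and class name are placeholders):
using System;
using System.Net.Http;
using System.Security.Cryptography;
using System.Threading.Tasks;

class PageChangeDetector
{
    static readonly HttpClient Http = new HttpClient();

    // Fetch the page and hash its raw bytes; a different hash than last
    // time means the content changed.
    public static async Task<string> FetchHashAsync(string url)
    {
        byte[] body = await Http.GetByteArrayAsync(url);
        using (var md5 = MD5.Create())
            return Convert.ToBase64String(md5.ComputeHash(body));
    }
}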
Yes, you can and should use HTTP headers to mark pages as unexpired. If they are dynamic though (PHP, ASPX, etc.) and/or database-driven, you'll need to manually control setting the Expires header / sending HTTP 304 Not Modified appropriately. ASP.NET has SqlDependency objects for this, but they still need to be configured and managed. (Not sure if PHP has something just like it, but there's probably something in PEAR if not...)
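For reference, a rough sketch of the SqlDependency mechanism mentioned above; the connection string, table and class name are placeholders, and the database needs Service Broker enabled for notifications to fire:
using System.Data.SqlClient;

class PageChangeNotifier
{
    // Placeholder connection string; notification queries also have restrictions
    // (two-part table names, an explicit column list, no SELECT *).
    const string ConnectionString = "...";

    public static void WatchPages()
    {
        SqlDependency.Start(ConnectionString);
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand("SELECT Id, Content FROM dbo.Pages", conn))
        {
            var dependency = new SqlDependency(cmd);
            dependency.OnChange += (sender, e) =>
            {
                // Invalidate cached output / bump the Last-Modified value here.
            };
            conn.Open();
            cmd.ExecuteReader().Dispose(); // executing the query registers the notification
        }
    }
}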
The Last-Modified header will only be of use to you if the programmer of the site has explicitly set it to be returned.
For a regular, static page, Last-Modified is the timestamp of the last modification of the HTML file. For a dynamically generated page, the server can't reliably assign a Last-Modified value, as it has no real way of knowing how the content changes depending on the request, so many servers don't generate the header at all.
If you have control over the page, then ensuring the Last-Modified header is set will make a check on Last-Modified reliable. Otherwise you may have to fetch the page and perform a regex to find a changed section (e.g. the date/time in the header of a news site). If no such obvious marker exists, then I'd second Oli's suggestion of an MD5 on the page content as a way to be sure it has changed.