CQ5 dispatcher: exclude specific URLs from caching (Apache)

I need to exclude certain pages from caching in the dispatcher. I read that one way is to set the following header in the page, but somehow it does not work on my page:
<%
response.setHeader("Dispatcher", "no-cache");
%>
Another solution is to append a query parameter such as ?v=1 to the page URL, since the dispatcher does not cache requests that carry a query string, but this is not suitable for a production website.
Is there a way to tell the dispatcher NOT to cache certain URLs? Perhaps something similar to allowing/denying certain file types in dispatcher.any?

If you can describe the resources that should not be cached with a pattern, you can use the /rules section of the /cache configuration in dispatcher.any. Note that the patterns are globs rather than regular expressions, and when several rules match a path, the last matching rule wins.
Take a look at the Configuring Dispatcher documentation.
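/rules
  {
  # Cache everything by default...
  /0000  { /glob "*" /type "allow" }
  # ...except pages under /en/news/ and anything in a "private" folder
  /0001  { /glob "/en/news/*" /type "deny" }
  /0002  { /glob "*/private/*" /type "deny" }
  }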
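To see how these rules combine, here is a small Python sketch that evaluates a path against the same globs. It is not the dispatcher's own code: it assumes the last matching rule wins and, as a further assumption, that a path matching no rule is not cached. Python's fnmatch stands in for the dispatcher's glob matching, and `is_cacheable` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Copy of the /rules section above, in order (hypothetical Python form).
RULES = [
    ("*", "allow"),
    ("/en/news/*", "deny"),
    ("*/private/*", "deny"),
]


def is_cacheable(path, rules=RULES):
    """Return True if the last rule whose glob matches `path` is "allow"."""
    decision = "deny"  # assumption: with no matching rule, nothing is cached
    for glob, rule_type in rules:
        # fnmatchcase's "*" also matches "/", like a dispatcher glob "*"
        if fnmatchcase(path, glob):
            decision = rule_type  # a later match overrides an earlier one
    return decision == "allow"


for url in ("/en/products.html", "/en/news/launch.html", "/content/private/report.html"):
    print(url, is_cacheable(url))
```

With the rules above, only /en/products.html is reported as cacheable; moving the "*" allow rule to the end would make every path cacheable again, which is why the catch-all rule comes first.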

Related

Which rule takes priority when multiple routing rules apply to an AWS S3 static site?

I'm setting up redirect rules for an AWS S3 static site. I don't see anywhere in the documentation which rule takes priority when several rules might match.
Let's say I have a configuration like:
[
  {
    "Condition": {
      "KeyPrefixEquals": "docs/foo"
    },
    "Redirect": {
      "ReplaceKeyPrefixWith": "foo/"
    }
  },
  {
    "Condition": {
      "KeyPrefixEquals": "docs/foo/bar"
    },
    "Redirect": {
      "ReplaceKeyPrefixWith": "bar/"
    }
  }
]
For a request for docs/foo/bar/baz, which object would the static site route to, and why?
Good question. It is indeed not mentioned in the documentation, but after some experiments it appears that the rule with the smaller index in the array takes priority. My guess is that S3 iterates through the configuration array in order when a request comes in and applies the first rule that matches.
So for a request with the path docs/foo/bar/baz, the first routing rule matches and you are redirected to foo//bar/baz: the matched prefix docs/foo is replaced with foo/, and the rest of the key, /bar/baz, is appended unchanged.
Hope this helps.
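A minimal Python sketch of the behavior observed above, assuming (as the experiments suggest, not the documentation) that S3 applies the first matching rule and replaces only the matched prefix. The rule list mirrors the question's configuration; `route` is a hypothetical helper, not an AWS API.

```python
# The two routing rules from the question, in the order they appear.
ROUTING_RULES = [
    {"Condition": {"KeyPrefixEquals": "docs/foo"},
     "Redirect": {"ReplaceKeyPrefixWith": "foo/"}},
    {"Condition": {"KeyPrefixEquals": "docs/foo/bar"},
     "Redirect": {"ReplaceKeyPrefixWith": "bar/"}},
]


def route(key, rules=ROUTING_RULES):
    """Apply the first rule whose prefix matches (assumed first-match-wins)."""
    for rule in rules:
        prefix = rule["Condition"]["KeyPrefixEquals"]
        if key.startswith(prefix):
            # Only the matched prefix is replaced; the rest of the key is kept.
            return rule["Redirect"]["ReplaceKeyPrefixWith"] + key[len(prefix):]
    return key  # no rule matched: the key is served as-is


print(route("docs/foo/bar/baz"))                       # foo//bar/baz
print(route("docs/foo/bar/baz", ROUTING_RULES[::-1]))  # bar//baz
```

If you want the more specific rule to win, listing it first (as in the reversed call) is the safe choice under this ordering assumption.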

CloudFront serving PNG/JPG vs. WebP based on request headers

I have CloudFront in front of S3 serving images (PNG and JPG).
I also have a WebP version of every PNG and JPG image in the same directory, with .webp appended to the filename. For example:
png: /path/to/file.png
webp: /path/to/file.png.webp
I'd like to serve the webp file dynamically without changing the markup.
Since browsers signal WebP support via the Accept header, what I need is this: if the user's browser supports WebP (according to the Accept header), CloudFront should pull the WebP version (filename.png.webp); otherwise it should serve the original file (filename.png).
Is this possible to achieve?
Making CloudFront serve different resources is easy (once you have done it a couple of times), but my concern is whether the client making the request (i.e. the browser) and any caches in between (proxies etc.) expect to receive different media types for the same request URI. That is a bit beyond your question, though. I believe the usual way to handle this problem is with a <picture> element, where the browser is free to choose an image from different media types, like this:
<picture>
<source type="image/svg+xml" srcset="pyramid.svg" />
<source type="image/webp" srcset="pyramid.webp" />
<img
src="pyramid.png"
alt="regular pyramid built from four equilateral triangles" />
</picture>
But if you still want CloudFront to serve different content for the same URL, this is how you do it:
CloudFront has four points where you can attach a Lambda function to manipulate requests and responses (Lambda@Edge): viewer request, origin request, origin response, and viewer response.
For your use case, create a Lambda@Edge function for the origin request event, then associate this function with your CloudFront distribution.
Below is an example from the AWS documentation that looks at the device type and rewrites the URL. For your use case, something similar can be done by looking at the Accept header.
'use strict';
/* This is an origin request function */
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;
    /*
     * Serve different versions of an object based on the device type.
     * NOTE: 1. You must configure your distribution to cache based on the
     *          CloudFront-Is-*-Viewer headers. For more information, see
     *          the following documentation:
     *          https://docs.aws.amazon.com/console/cloudfront/cache-on-selected-headers
     *          https://docs.aws.amazon.com/console/cloudfront/cache-on-device-type
     *       2. CloudFront adds the CloudFront-Is-*-Viewer headers after the viewer
     *          request event. To use this example, you must create a trigger for the
     *          origin request event.
     */
    const desktopPath = '/desktop';
    const mobilePath = '/mobile';
    const tabletPath = '/tablet';
    const smarttvPath = '/smarttv';
    if (headers['cloudfront-is-desktop-viewer']
        && headers['cloudfront-is-desktop-viewer'][0].value === 'true') {
        request.uri = desktopPath + request.uri;
    } else if (headers['cloudfront-is-mobile-viewer']
        && headers['cloudfront-is-mobile-viewer'][0].value === 'true') {
        request.uri = mobilePath + request.uri;
    } else if (headers['cloudfront-is-tablet-viewer']
        && headers['cloudfront-is-tablet-viewer'][0].value === 'true') {
        request.uri = tabletPath + request.uri;
    } else if (headers['cloudfront-is-smarttv-viewer']
        && headers['cloudfront-is-smarttv-viewer'][0].value === 'true') {
        request.uri = smarttvPath + request.uri;
    }
    console.log(`Request uri set to "${request.uri}"`);
    callback(null, request);
};
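For the Accept-header variant, an origin request function could look roughly like the Python sketch below (Lambda@Edge also supports Python runtimes). It is a sketch under assumptions: every .png/.jpg/.jpeg object has a sibling named `<original>.webp` in the bucket, and a plain substring check for `image/webp` is good enough. `WEBP_SOURCE_EXTENSIONS` and `accepts_webp` are hypothetical names.

```python
# Hypothetical list of extensions that have a ".webp" sibling in the bucket.
WEBP_SOURCE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def accepts_webp(headers):
    """True if any Accept header value advertises image/webp."""
    # Lambda@Edge passes header names in lowercase, each with a list of {key, value}.
    return any("image/webp" in h.get("value", "") for h in headers.get("accept", []))


def lambda_handler(event, context):
    """Origin request handler: rewrite /file.png to /file.png.webp for WebP clients."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    # Assumes every matching object has a "<name>.webp" copy next to it.
    if uri.lower().endswith(WEBP_SOURCE_EXTENSIONS) and accepts_webp(request["headers"]):
        request["uri"] = uri + ".webp"
    return request
```

If an image has no .webp copy, S3 returns an error for the rewritten URI instead of falling back to the original, so the assumption about matching files matters.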
Next, you need to tell CloudFront to include the Accept header in the cache key (otherwise CloudFront would run your origin request function only once per cached object, and it would not forward this header to your function either).
Nowadays you do this with cache policies and origin request policies, or with the legacy cache settings (Edit Behavior under your CloudFront distribution settings).
It is worth noting that if you get a low cache hit ratio because of the many different variants of the Accept header, you need to pre-process (normalize) it. The way I would do it is with a viewer request Lambda function, which runs for every request. This new function would check whether the Accept header includes WebP and then add a single NEW header to the request that it passes on to the origin request function above. That way the cache key can be based on this new header, which has only two possible values.
There's more configuration needed, such as IAM policies that allow Lambda@Edge to run, but there is plenty of good material out there that walks you through the steps.
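A rough Python sketch of that viewer request function, assuming a hypothetical header name `x-webp-supported`. Under this assumption, the cache policy would include this header instead of Accept, and the origin request function would check it rather than parsing Accept itself.

```python
# Hypothetical header name; include it (instead of Accept) in the cache policy.
WEBP_FLAG_HEADER = "x-webp-supported"


def lambda_handler(event, context):
    """Viewer request handler: collapse Accept into a two-valued header."""
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]
    # Join all Accept values, since a header may appear more than once.
    accept = ",".join(h.get("value", "") for h in headers.get("accept", []))
    flag = "true" if "image/webp" in accept else "false"
    # Only "true" or "false" reaches the cache key, instead of every Accept variant.
    headers[WEBP_FLAG_HEADER] = [{"key": "X-WebP-Supported", "value": flag}]
    return request
```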

Get the original filename of symlinks in nginx

Another script generates some symlinks for me:
2QGPCKVNG1R -> /anotherdir/movie1.mp4
HJS7J9ND2L5 -> /anotherdir/movie2.mp4
LKA6A9LA7SK -> /anotherdir/movie3.mp4
Serving these files with nginx works fine, but I'd like to rename the files on download via the Content-Disposition header.
The question is: how do I get the original filename into an nginx variable?
I'm not sure it is possible at all. Is that other script yours, or under your control? If so, you can have the same script generate an additional nginx config file containing a map block that maps each URI to the download filename (or you can write an additional script that does this with the readlink -f <symlink> command):
map $uri $content_disposition {
~/2QGPCKVNG1R$ movie1.mp4;
~/HJS7J9ND2L5$ movie2.mp4;
~/LKA6A9LA7SK$ movie3.mp4;
}
Then include that file in the http block of the main nginx config:
include /path/to/content-disposition-map.conf;
server {
...
add_header Content-Disposition 'attachment; filename="$content_disposition"';
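The map file can be generated by a script. Here is a minimal Python sketch, assuming all symlinks live in one directory and that target names contain no double quotes. `build_map` is a hypothetical helper, and os.path.realpath plays the role of `readlink -f`.

```python
import os
import re


def build_map(symlink_dir, variable="content_disposition"):
    """Return an nginx map block mapping each symlink name to its target's file name."""
    lines = [f"map $uri ${variable} {{"]
    for name in sorted(os.listdir(symlink_dir)):
        link = os.path.join(symlink_dir, name)
        if not os.path.islink(link):
            continue  # only symlinks are mapped
        # os.path.realpath resolves the whole link chain, like `readlink -f`
        target = os.path.basename(os.path.realpath(link))
        # Regex-escape the link name; quote the value in case it contains spaces
        lines.append(f'    ~/{re.escape(name)}$ "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling `build_map("/path/to/symlinks")` and writing the result to content-disposition-map.conf would produce a block in the same shape as the hand-written map, with lines such as `~/2QGPCKVNG1R$ "movie1.mp4";`.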
Another way I see is to use lua-nginx-module with a Lua block like the following (note that it spawns a readlink process for every request):
map $symlink_target $content_disposition {
~/([^/]*)$ $1;
}
server {
...
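set_by_lua_block $symlink_target {
    -- Quote the path for the shell so characters in the URI cannot inject commands
    local path = ngx.var.request_filename:gsub("'", "'\\''")
    local handle = io.popen("/bin/readlink -n -f '" .. path .. "'")
    local target = handle:read("*a")
    handle:close()
    return target
}
add_header Content-Disposition 'attachment; filename="$content_disposition"';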

Rewrite Requests for Images to CDN URL with Varnish

I've got Varnish (3.0.3) sitting as a load balancer/static cache in front of two web servers. I've got a CDN set up using the origin pull method. If I manually take the URL of an image on my site and swap in the CDN address, I can verify that origin pull is working: the image is pulled to the CDN and served.
My application is fairly complex and I'm testing this CDN to see whether it significantly speeds up the web app, so I don't want to rewrite any of my PHP code to use the CDN images just yet.
What I'd like to do is set Varnish up to rewrite requests received for image files and pull them through the CDN instead of from the two Apache servers directly in my cluster.
I've read through the Varnish documentation and a couple of how-tos online about doing something similar, but I just can't get it to work properly and need a little help here.
Here are a couple different ways I tried doing this (edited for brevity):
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        set req.http.host = "cdn.domain.com/";
        error 750 req.http.host + req.url;
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}
That didn't work. It resulted in broken images everywhere, and anything that did show up was using the .webp extension, so it wasn't being processed by the condition above.
So I tried this:
backend cdn {
    .host = "cdn.domain.com";
    .port = "80";
}
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        set req.backend = cdn;
        return(lookup);
    }
}
This showed some images on the page, but when I viewed their source, they appeared to come from the Apache servers (the domain name wasn't the CDN's), and only about half the images were displaying, probably thanks to the browser cache.
I'd love some input here, thanks guys.
Is there no way to use Varnish for this kind of redirect? Would I be better off setting nginx up in front of Varnish to rewrite requests to the cdn?
UPDATE:
Using both answers given below, I have the redirect working and an ACL in place so the CDN can pull images directly instead of being redirected to itself. However, although I verified with my own external IP that the ACL lets connections through, the CDN isn't pulling new images from the server. Instead of pulling the image from the local server to the CDN and serving it, it returns a 502 error (odd). This is what the relevant block of my VCL looks like now:
acl cdn {
    "ip.of.CDN";
}
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        if(!client.ip ~ cdn){
            error 750 "http://cdn.domain.com" + req.url;
        }
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}
You can definitely do this with Varnish quite easily - no need to setup nginx or anything. Actually your first solution is very close to doing the trick. It just needs a few modifications.
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}
You left out "http://" from your CDN URL, and you can omit the trailing slash from the host, as every req.url begins with /.
You also need to make sure that this code is the first thing that runs in vcl_error(). That is, if you have multiple definitions of vcl_error, make sure that none of them delivers any output before the if (obj.status == 750) check is reached.
Bear in mind that this solution makes every client browser query your server first and then make a second request to the CDN after the 302 redirect. This adds a significant delay to each image load, and is probably not the best way to determine whether the CDN improves your app's performance.
Update: Regarding the 502 errors the CDN returns when trying to pull content from your origin: relying on the remote IP address to decide on the redirection is quite risky, as the CDN could well use a number of servers for the pull, and their addresses could change over time. That would make the VCL laborious and error-prone to maintain.
Would it be possible to set up a dedicated virtual host for the CDN to use? For instance, originpull.domain.com, with the CDN configured to pull content from that address instead of your primary www.domain.com address?
You could then modify the vcl_recv() as follows:
sub vcl_recv {
    #if request is image and request is not made from CDN, redirect to CDN
    if (req.http.host != "originpull.domain.com" &&
        req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
That would ensure that the requests from CDN will never be redirected.
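To make the redirect condition easy to check, here is the same decision expressed as a small Python function. It only mirrors the vcl_recv logic above and is not Varnish code; the hostnames are the placeholder values from the answer, and `redirect_location` is a hypothetical name.

```python
import re

# Placeholder values taken from the VCL above.
IMAGE_RE = re.compile(r"\.(gif|ico|jpg|jpeg|png)$")
CDN_BASE = "http://cdn.domain.com"
ORIGIN_PULL_HOST = "originpull.domain.com"


def redirect_location(host, url):
    """Return the 302 Location for an image request, or None to pass it to the backends."""
    if host != ORIGIN_PULL_HOST and IMAGE_RE.search(url):
        return CDN_BASE + url  # req.url always starts with "/"
    return None  # CDN pulls (and non-image requests) are served normally
```

Because the pattern is anchored with $, a URL with a query string such as /logo.png?v=2 is not redirected, either in VCL or in this sketch.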
Assuming the CDN pulls its copy of the images from your site and you're not pushing images to the CDN manually, aren't you missing a simple exclusion of the CDN network from either your rewrite or your backend proxy? The CDN needs to be able to pull a copy of the images directly from your site to populate its caches.
Been a while since I played with Varnish, and never an expert, but something along the following lines may work:
# Define the IP ranges of the CDN servers.
acl cdn {
    "localhost";
    "11.22.33.0"/24;
}
...
    #if request is image, redirect to CDN, unless from the CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        if (!client.ip ~ cdn) {
            error 750 "http://cdn.domain.com" + req.url;
        }
    }
...
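For reference, the ACL test amounts to a CIDR membership check. The Python sketch below mimics `client.ip ~ cdn` with the standard ipaddress module; it assumes "localhost" means 127.0.0.1, and the 11.22.33.0/24 range is the answer's example value, not a real CDN range.

```python
import ipaddress
import re

# Mirrors the example acl; "localhost" is assumed to resolve to 127.0.0.1.
CDN_ACL = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("11.22.33.0/24"),
]
IMAGE_RE = re.compile(r"\.(gif|ico|jpg|jpeg|png)$")


def client_in_acl(client_ip, acl=CDN_ACL):
    """Rough equivalent of Varnish's `client.ip ~ cdn`."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in acl)


def acl_redirect_location(client_ip, url):
    """Return the CDN URL to redirect to, or None if the request should reach the backend."""
    if IMAGE_RE.search(url) and not client_in_acl(client_ip):
        return "http://cdn.domain.com" + url
    return None
```

As the other answer points out, this only works while the CDN's pull servers stay inside the listed ranges.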

Web-hosted file authorization

I'm using a PHP-based login mechanism to allow/restrict access to some parts of my website (folders module1, module2, etc.), but I have a problem restricting access to files.
I use the documents folder (see below) to host some downloadable files. The links to those files appear in index.php (in the root directory). However, if for some reason an unauthorized user gets the URL of a file hosted in documents, they will be able to download it.
/
/documents/
/module1/
/module2/
PS: as this is an intranet website, I restricted access to documents by IP address, but there is still a small chance that someone uses a PC with an allowed IP address and has the URL of a document.
Use some sort of proxy PHP script that serves the file to the user without revealing its real location, and make sure the files themselves cannot be requested directly (for example, by keeping them outside the web root or denying direct access to that folder).
The user would then see http://yourdomain.com/download.php?file=mydoc.docx
The real path is still /documents/userid/2342/mydoc.docx, or whatever your structure looks like.
Then let your download.php file serve the file by:
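<?php
// Validate the user here (e.g. check the login session) and set $userID;
// stop with an error if the user is not authorized.

// Set document root
$root = 'documents/userid/' . $userID . '/';

// Requested file: basename() drops any directory part, so a value
// such as "../../config.php" cannot escape $root
$file = basename($_GET['file'] ?? '');

// Validate
if ($file !== '' && is_file($root . $file))
{
    header("Pragma: public");
    header("Expires: 0");
    header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
    header("Cache-Control: private", false);
    header("Content-Type: application/force-download");
    header("Content-Disposition: attachment; filename=\"" . $file . "\";");
    header("Content-Transfer-Encoding: binary");
    header("Content-Length: " . filesize($root . $file));
    // Discard any buffered output so it does not corrupt the download
    if (ob_get_level()) {
        ob_clean();
    }
    flush();
    readfile($root . $file);
    exit;
}
else
{
    http_response_code(404);
    echo "File not found";
}
?>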
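The validation step is the part that is easiest to get wrong, so here is the same idea sketched in Python for clarity: a hypothetical `resolve_download` helper that keeps only the file name, resolves it inside the user's directory, and refuses anything that escapes that directory or does not exist. It assumes the authorization check has already happened.

```python
import os


def resolve_download(root, requested):
    """Return the real path of `requested` inside `root`, or None if it must not be served."""
    name = os.path.basename(requested)  # drop any directory components
    if name in ("", ".", ".."):
        return None
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, name))
    # Reject anything that resolves outside root, e.g. through a symlink
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate if os.path.isfile(candidate) else None
```

A request such as ?file=../../etc/passwd is reduced to passwd and then rejected because no such file exists in the user's folder.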