Rewrite Requests for Images to CDN URL with Varnish - apache

I've got Varnish (3.0.3) sitting as a load-balancer/static cache in front of two web servers. I've got a CDN set up using the Origin Pull method. If I grab the URL of an image on my site manually and drop in the CDN address, I can verify that origin pull is working: the image is pulled to the CDN and served.
My application is fairly complex and I'm testing this CDN to see if it significantly speeds up the web app, so I don't want to rewrite any of my PHP code to use the CDN images just yet.
What I'd like to do is set Varnish up to rewrite requests received for image files and pull them through the CDN instead of from the two Apache servers directly in my cluster.
I've read through the Varnish documentation and a couple howto's online about doing something similar, but I just can't get it to work properly and need a little help here.
Here are a couple different ways I tried doing this (edited for brevity):
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        set req.http.host = "cdn.domain.com/";
        error 750 req.http.host + req.url;
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}
That didn't work. It resulted in broken images everywhere, and anything that did show up was using the .webp extension, so it wasn't being processed by the condition above.
So I tried this:
backend cdn {
    .host = "cdn.domain.com";
    .port = "80";
}
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        set req.backend = cdn;
        return(lookup);
    }
}
This showed some images on the page, but when viewing their source, they looked to be coming from the Apache servers (the domain name wasn't that of the CDN) and only about half the images were displaying...probably browser cache.
I'd love some input here, thanks guys.
Is there no way to use Varnish for this kind of redirect? Would I be better off setting nginx up in front of Varnish to rewrite requests to the cdn?
UPDATE:
Using both answers given below, I have the redirect working and an ACL in place so that the CDN can pull images directly instead of being redirected to itself. However, although I verified that the ACL lets connections through by using my own external IP, the CDN isn't pulling new images from the server. It gives a 502 error (odd) instead of pulling the image from the local server to the CDN and serving it. This is what the relevant block of my VCL looks like now:
acl cdn {
    "ip.of.CDN";
}
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        if(!client.ip ~ cdn){
            error 750 "http://cdn.domain.com" + req.url;
        }
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}

You can definitely do this with Varnish quite easily, with no need to set up nginx or anything. Actually, your first solution is very close to doing the trick. It just needs a few modifications.
sub vcl_recv {
    #if request is image, redirect to CDN
    if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
sub vcl_error {
    if (obj.status == 750) {
        set obj.status = 302;
        set obj.http.Location = obj.response;
        return(deliver);
    }
}
You left "http://" out of your CDN URL, and you can omit the trailing slash from the host because every req.url begins with /.
You also need to make sure that the vcl_error code is the first one that is run in vcl_error(). I.e. if you have multiple definitions of vcl_error, make sure that none of them get to deliver any output before the if (obj.status == 750) check is reached.
Bear in mind that this solution causes all client browsers to query your server first and then make another request to the CDN after the 302 redirect. This adds a significant delay to each image load, and is probably not the best way of determining whether the CDN improves your app's performance.
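The pattern used above only matches URLs that end in one of the listed extensions, so .webp files (which the question mentions) and image URLs carrying a query string are not redirected. The following is only a sketch of a broader pattern, assuming Varnish 3.0 syntax, assuming the CDN can also pull .webp files from the origin, and relying on the same 750 handler in vcl_error shown above:

```vcl
sub vcl_recv {
    # Redirect image requests to the CDN, including .webp files and
    # URLs with a query string (for example /img/logo.png?v=2).
    # (?i) makes the match case-insensitive.
    if (req.url ~ "(?i)\.(gif|ico|jpe?g|png|webp)(\?.*)?$") {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
```

As with the original rule, requests coming from the CDN itself still need to be excluded, otherwise the origin pull is redirected back to the CDN.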
Update: Regarding the 502 errors the CDN shows when trying to pull content from your origin: relying on the remote IP address to decide on the redirection is quite risky, as the CDN could very well use a number of servers to do the pull, and the addresses could change over time. That would make the VCL laborious and error-prone to maintain.
Would it be possible to set up a unique virtual host for the CDN to use? For instance, originpull.domain.com, with the CDN set up to pull content from that address instead of your primary www.domain.com address?
You could then modify the vcl_recv() as follows:
sub vcl_recv {
    #if request is image and request is not made from CDN, redirect to CDN
    if (req.http.host != "originpull.domain.com" &&
        req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
That would ensure that the requests from CDN will never be redirected.

Assuming the CDN pulls its copy of the images from your site, and you're not manually pushing images to the CDN, aren't you missing a simple exclusion of the CDN network from either your rewrite or your backend proxy? The CDN needs to be able to pull a copy of the images directly from your site to populate its caches.
It's been a while since I played with Varnish, and I was never an expert, but something along the following lines may work:
# Define the IP ranges of the CDN servers.
acl cdn {
    "localhost";
    "11.22.33.0"/24;
}
...
#if request is image, redirect to CDN, unless from the CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
    if (!client.ip ~ cdn) {
        error 750 "http://cdn.domain.com" + req.url;
    }
}
...

Related

S3 hosted website with Cloudflare returns 404 status code for any route

I have an S3 hosted website working well behind Cloudflare with the following:
example.com/ works fine
example.com/test also works but the document itself in the network tab is returning 404, naturally, because /test doesn't exist on S3.
This is a problem for SEO. How do I configure Cloudflare to treat 404s as 200s?
In CloudFront I usually do this:
But I can find no corresponding configuration in Cloudflare. Will this have to be done in a Cloudflare worker? What did people do before Workers existed?
Turns out people just didn't host on S3 with Cloudflare before workers, and if they did, they didn't care/notice that their routes would return 404.
Anyway, this is the solution with Cloudflare workers to force the return code of 200:
addEventListener('fetch', event => {
    event.respondWith(fetchAndApply(event.request))
})
async function fetchAndApply(request) {
    const originalResponse = await fetch(request)
    const contentType = originalResponse.headers.get('Content-Type')
    // Only bother with index pages (not assets)
    if (contentType && contentType.includes('text/html')) {
        // Force 404s from S3 to return as 200 to prevent Google indexing issues.
        // Headers are passed explicitly: spreading a Response object does not copy them.
        const response = new Response(originalResponse.body, {
            status: 200,
            statusText: 'OK',
            headers: originalResponse.headers
        })
        // Don't cache index.html
        response.headers.set('Cache-Control', 'max-age=0')
        return response
    }
    return originalResponse
}
I believe you can use this approach from the AWS docs.
https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
See Example #3 at the bottom of that page.
EDIT: I removed the URL of the demo S3 bucket; it had served its purpose and was only useful to the author of the question.
Here is a short example, which redirects to "home" if the requested key is not found.
<RoutingRules>
    <RoutingRule>
        <Condition>
            <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
        </Condition>
        <Redirect>
            <HostName>BUCKETNAME.s3-website-eu-west-1.amazonaws.com</HostName>
            <ReplaceKeyWith></ReplaceKeyWith>
        </Redirect>
    </RoutingRule>
</RoutingRules>

Varnish not to cache urls with specific word

I am using Varnish 4.0.3 as a caching reverse proxy and load balancer.
I want Varnish not to cache links that start with /api/v1/ or that contain feed anywhere in the URL, and to serve those requests from the backend servers directly.
I have done this:
sub vcl_recv {
    if ((req.url ~ "^/api/v1/" || req.url ~ "feed") &&
        req.http.host ~ "api.example.com") {
        set req.backend_hint = apis.backend();
    }
}
But based on the access log, it serves the first request from the backend and then serves subsequent requests from Varnish directly! Have I done anything wrong, or is there anything else I need to do?
It should be:
sub vcl_recv {
    if ((req.url ~ "^/api/v1/" || req.url ~ "feed")
        && req.http.host == "api.example.com") {
        return (pass);
    }
}
The return (pass) switches Varnish to pass mode for the matching requests. In pass mode, Varnish neither stores the result in the cache nor delivers from the cache; it always talks to the backend.
Matching req.http.host with the == operator is a micro-optimisation of sorts: regex matching is not really needed in this case.

Why is Varnish redirecting as 301?

I have deployed a MediaWiki Docker container (appscontainer/mediawiki) based on Apache2 on a VPS, and I put a fresh install of Varnish in front of it so that I can proxy different subdomains to the proper applications on the same server.
My current default.vcl configuration file looks like the following:
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
backend wikimedia {
    .host = "localhost";
    .port = "8080";
}
sub vcl_recv {
    if(req.http.host == "wiki.virtual-assembly.org") {
        set req.backend_hint = wikimedia;
    }
    set req.backend_hint = default;
}
My issue is that when I request the URL http://wiki.virtual-assembly.org, I get redirected via a 301 to the IP address of the server on port 8080 (the port on which the Apache2 instance is listening).
Is there a way to tell Varnish to keep the location as http://wiki.virtual-assembly.org, or is it an Apache2 misconfiguration?
Thanks in advance,
PS: I know my two backends are equivalent. I will change the default in the future, once I have deployed more apps.
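Shot-in-the-dark answer: as written, the final set always runs, so the wikimedia backend is never actually selected. Do you still get a 301 if you put the default backend_hint assignment into an else statement instead of outside the if?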
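A minimal sketch of that suggestion, assuming Varnish 4 syntax and the two backend definitions from the question:

```vcl
sub vcl_recv {
    if (req.http.host == "wiki.virtual-assembly.org") {
        # Only requests for the wiki hostname use the wikimedia backend.
        set req.backend_hint = wikimedia;
    } else {
        # Everything else falls through to the default backend.
        set req.backend_hint = default;
    }
}
```

Since both backends currently point to port 8080 on the same machine, this change may not affect the 301 by itself; the redirect could still originate in the Apache2 or MediaWiki configuration, as the question suspects.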

Configuring Varnish on cPanel with multiple IP addresses

So I am trying to configure Varnish on my cPanel server which has a primary shared IP along with a few other secondary IP addresses for dedicated domains that are hosted with me.
I have followed the following guide on how to get varnish to run, and it works perfectly for the shared IP domains, but the secondary IP domains won't load at all, going to the default Apache page.
http://crybit.com/how-to-enable-varnish-in-cpanel-server/
I looked online for other resources and found advice to configure multiple hosts in the default.vcl file for Varnish. I did exactly that, but the service fails to load as soon as I try to launch it, even with just two hosts in the file.
Am I doing something wrong?
backend default {
    .host = "11.11.11.11";
    .port = "8080";
}
backend secondary1 {
    .host = "22.22.22.22";
    .port = "8080";
}
I have also tried adding the following, but again without success: the service won't load!
sub vcl_recv{
if(req.http.host == "www.secondary1.com") || (req.http.host == "secondary1.com) {
set req.backend = secondary1;
} else {
set req.backend = default;
}
}
Hoping that someone can give me a hand!
Can you please check your /etc/sysconfig/varnish file and change the -a flag so that it lists your IPs:
-a 192.168.0.1:80,192.168.0.2:80 \
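Separately from the -a flag, the if condition as posted in the question has unbalanced parentheses and an unterminated string ("secondary1.com), which would by itself stop the VCL from compiling. Whether that is the only reason the service fails is not known here. A sketch of a syntactically valid version, assuming Varnish 3 syntax (req.backend) and the backend names from the question:

```vcl
sub vcl_recv {
    # One pair of parentheses around the whole condition,
    # and both host strings properly closed.
    if (req.http.host == "www.secondary1.com" || req.http.host == "secondary1.com") {
        set req.backend = secondary1;
    } else {
        set req.backend = default;
    }
}
```

Running varnishd with the -C option against the VCL file (for example varnishd -C -f /etc/varnish/default.vcl, where the path is an assumption) compiles the VCL and reports syntax errors without starting the service.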

Using Varnish with multiple apache named vhosts

I'm implementing Varnish (4.0) for a server with lots (1000+) of named virtual hosts (on Apache), most of which point to the same IP and web. I get Varnish to work fine with:
backend default {
    .host = "127.0.0.1";
    .port = "80";
}
sub vcl_recv {
    if (req.http.host ~ "^www.domain1.de(:[0-9]+)?$") {
        set req.http.host = "www.domain1.de";
    } else if (req.http.host ~ "^www.domain2.de(:[0-9]+)?$") {
        set req.http.host = "www.domain2.de";
    }
    ....
    ....
    set req.backend_hint = default;
}
but, to do this for 1000+ domains seems a bit odd. I don't need any special configuration for the sites, they have all the same backend.
If I don't add any specific configuration, I only get to the standard website (no matter what domain I enter).
Any hint on how to solve that?
Thanks!
If you want to remove the port number, for example, or need to make other changes to req.http.host, you can use the regsub() function in your VCL:
set req.http.host = regsub(req.http.host, "^([^:]*)(:[0-9]+)?$", "\1");
This example removes the port number if present.
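To illustrate how a single normalisation rule could replace the per-domain if/else chain from the question, here is a minimal sketch. It assumes Varnish 4.0, one shared backend as in the question, and that stripping the port is the only change needed to the Host header:

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

sub vcl_recv {
    # Strip an optional ":port" suffix so that "www.domain1.de:80" and
    # "www.domain1.de" are treated as the same host, for every domain.
    set req.http.host = regsub(req.http.host, ":[0-9]+$", "");
    set req.backend_hint = default;
}
```

The Host header is otherwise passed through unchanged, so Apache can still pick the matching named virtual host.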
Please set up the regexp according to your needs as your question does not really state what you are trying to achieve.
Note that you refer to captured groups in the replacement string as \N, not as $1 as some man pages suggest. (A bug has already been filed to address this issue.)
Finally, a nice Varnish regexp cheat sheet:
http://kly.no/varnish/regex.txt