Varnish not to cache urls with specific word - load-balancing

I am using Varnish 4.0.3 as a caching reverse proxy and load balancer.
I want Varnish not to cache URLs that start with /api/v1/ or that contain the word feed anywhere in the URL, and to serve those requests from the backend servers directly.
I have done this:
sub vcl_recv {
if ((req.url ~ "^/api/v1/" || req.url ~ "feed") &&
req.http.host ~ "api.example.com") {
set req.backend_hint = apis.backend();
}
}
But according to the access log, only the first request is served from the backend; subsequent requests are served from Varnish's cache. Have I done something wrong, or is there anything else I need to do?

It should be:
sub vcl_recv {
if ((req.url ~ "^/api/v1/" || req.url ~ "feed")
&& req.http.host == "api.example.com") {
return (pass);
}
}
return (pass) switches Varnish to pass mode for matching requests. In pass mode, Varnish neither stores the response in the cache nor delivers one from the cache; it always talks to the backend.
As a micro-optimisation, req.http.host is matched with the == operator; regex matching is not needed in this case.
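The rule in the question only set req.backend_hint, which chooses the backend for a fetch but does not affect whether the response is cached, so later requests were still cache hits. One way to confirm that the pass rule takes effect is to expose hit/miss information in a response header. The following is a minimal sketch for Varnish 4.0; the X-Cache header name is an arbitrary choice, not something from the question:

```vcl
sub vcl_deliver {
    # obj.hits counts cache hits for the delivered object;
    # it is expected to stay at 0 for passed and freshly fetched responses.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

With this in place, repeated requests to a URL under /api/v1/ should keep reporting MISS.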

Related

Configuring Varnish on cPanel with multiple IP addresses

So I am trying to configure Varnish on my cPanel server which has a primary shared IP along with a few other secondary IP addresses for dedicated domains that are hosted with me.
I have followed the following guide on how to get varnish to run, and it works perfectly for the shared IP domains, but the secondary IP domains won't load at all, going to the default Apache page.
http://crybit.com/how-to-enable-varnish-in-cpanel-server/
Looking online for other resources, I found that I should configure multiple hosts in Varnish's default.vcl file. I did exactly that, but the service fails to start as soon as I launch it, even with just two hosts in the file.
Am I doing something wrong?
backend default {
.host = "11.11.11.11";
.port = "8080";
}
backend secondary1 {
.host = "22.22.22.22";
.port = "8080";
}
I have also tried the following, again without success; the service won't start:
sub vcl_recv{
if(req.http.host == "www.secondary1.com") || (req.http.host == "secondary1.com) {
set req.backend = secondary1;
} else {
set req.backend = default;
}
}
Hoping that someone can give me a hand!
Please check your /etc/sysconfig/varnish file and change the -a flag to list your IPs:
-a 192.168.0.1:80,192.168.0.2:80 \
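Independently of the -a flag, the vcl_recv snippet in the question cannot compile as posted: the second hostname string is missing its closing quote, and the parentheses of the if condition close before the || operator. A corrected sketch, keeping the question's Varnish 3-style req.backend and its backend names:

```vcl
sub vcl_recv {
    # Both comparisons sit inside one pair of parentheses,
    # and both string literals are terminated.
    if (req.http.host == "www.secondary1.com" ||
        req.http.host == "secondary1.com") {
        set req.backend = secondary1;
    } else {
        set req.backend = default;
    }
}
```

Running varnishd -C -f with the path to the VCL file (assumed here to be /etc/varnish/default.vcl) compiles the VCL without starting the daemon and prints any syntax errors, which can help narrow down why the service fails to start.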

Varnish and digest authentication resulting in uri mismatch

I have a live website and staging version set up on the same virtual server. The live site uses Varnish and no authentication, the staging site bypasses Varnish but uses digest authentication. In my VCL file I have this:
sub vcl_recv {
if (req.http.Authorization || req.http.Authenticate) {
return(pass);
}
if (req.http.host != "live.site.com") {
return(pass);
}
I'm seeing a problem on the staging site, whereby resources with any querystring are not being served - in Firebug I see '400 Bad request' and in the Apache logs this:
[Fri Sep 19 11:13:03 2014] [error] [client 127.0.0.1] Digest: uri mismatch -
</wp-content/plugins/jetpack/modules/wpgroho.js?ver=3.9.2> does not match
request-uri </wp-content/plugins/jetpack/modules/wpgroho.js>, referer:
http://stage.site.com/
What have I done wrong, does anyone know how to fix this?
Thanks,
Toby
Ok, found it, here's what I found (in case it helps anyone else):
I do, of course, have a section in my Varnish VCL that removes querystrings from static files, to aid caching:
if (req.request ~ "^(GET|HEAD)$" && req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)(\?.*)?$") {
if (req.url ~ "nocache") {
return(pass);
}
set req.url = regsub(req.url, "\?.*$", "");
unset req.http.Cookie;
set req.grace = 2m;
return(lookup);
}
This clearly conflicts with digest authentication, so I will have to revisit that part of the VCL.
UPDATE I just changed the second conditional to:
if (req.http.Authorization || req.http.Authenticate ||
req.url ~ "nocache") {
return(pass);
}
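With that change applied, the static-file block would read roughly as follows. This is only the two snippets above merged, in Varnish 3 syntax as in the original; the enclosing sub vcl_recv is an assumption about where the block lives. Requests that carry an Authorization header now pass before the query string is stripped, so the URI Apache sees should match the one in the Digest header:

```vcl
sub vcl_recv {
    if (req.request ~ "^(GET|HEAD)$" &&
        req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)(\?.*)?$") {
        # Leave authenticated and explicitly uncached requests untouched.
        if (req.http.Authorization || req.http.Authenticate ||
            req.url ~ "nocache") {
            return (pass);
        }
        # Only the remaining static requests lose their query string.
        set req.url = regsub(req.url, "\?.*$", "");
        unset req.http.Cookie;
        set req.grace = 2m;
        return (lookup);
    }
}
```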

Using Varnish with multiple apache named vhosts

I'm implementing Varnish (4.0) for a server with lots (1000+) of named virtual hosts (on Apache), most of which point to the same IP and web server. I get Varnish to work fine with:
backend default {
.host = "127.0.0.1";
.port = "80";
}
sub vcl_recv {
if (req.http.host ~ "^www.domain1.de(:[0-9]+)?$") {
set req.http.host = "www.domain1.de";
} else if (req.http.host ~ "^www.domain2.de(:[0-9]+)?$") {
set req.http.host = "www.domain2.de";
}
....
....
set req.backend_hint = default;
}
But doing this for 1000+ domains seems a bit odd. I don't need any special configuration for the sites; they all have the same backend.
If I don't add any specific configuration, I only get to the standard website (no matter what domain I enter).
Any hint on how to solve that?
Thanks!
If you want to strip the port number, for example, or otherwise modify req.http.host, you can use the regsub() function in your Varnish VCL:
set req.http.host = regsub(req.http.host, "^([^:]+)(:[0-9]+)?$", "\1");
This example removes the port number if present.
Please set up the regexp according to your needs as your question does not really state what you are trying to achieve.
Note that the replacement string refers to capture groups as \N, not as $1 as some man pages suggest. (A bug has already been filed to address this issue.)
Finally, a handy Varnish regexp cheat sheet:
http://kly.no/varnish/regex.txt
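Because every vhost shares the same backend, the per-domain branches in the question can collapse into a single normalisation step. A minimal sketch for Varnish 4.0, assuming the only goal is to drop an optional port from the Host header:

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

sub vcl_recv {
    # Strip a trailing ":port" from any hostname, leaving the name itself intact.
    set req.http.host = regsub(req.http.host, ":[0-9]+$", "");
    set req.backend_hint = default;
}
```

Varnish otherwise forwards the Host header to the backend as received, so Apache can still select the matching named vhost.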

Using Varnish Cache while preserving Google Analytics cookie on a NGINX SSL Terminator

I am using the Unixy Varnish plugin for cPanel and one particular website and all its subdomains use Full SSL + HTTP Strict Transport Security.
Nginx listens on a non-standard ssl port, passes the request to Varnish which by default strips all cookies. The request is then finally served up by Apache.
The website is mostly static HTML, plus a WordPress subdomain, an IPB installation, and a Piwik installation.
The main domain is only static pages so I would like to force Varnish to cache it anyway since there isn't anything that involves logging in, then strip cookies excluding those belonging to Google Analytics.
Currently for Google Analytics I am using the script from http://www.ga-script.org, which uses the classical tracking code js. I intend to add the Universal Analytics code in addition, removing my UA-XXXXXXX id (Only from the classical js).
Then I will parse the Analytics cookie (as described here: http://www.dannytalk.com/read-google-analytics-cookie-script/), with the fix for Universal Analytics, in the latest comment on that post - so I can pass the resulting values to Piwik and/or a CRM system.
I'm not 100% clear on what I need to do to configure Varnish correctly for this kind of scenario and would appreciate others' help with this.
Current Varnish config supplied by Unixy:
###################################################
# Copyright (c) UNIXY - http://www.unixy.net #
# The leading truly fully managed server provider #
###################################################
include "/etc/varnish/cpanel.backend.vcl";
include "/etc/varnish/backends.vcl";
# mod_security rules
include "/etc/varnish/security.vcl";
sub vcl_recv {
# Use the default backend for all other requests
set req.backend = default;
# Setup the different backends logic
include "/etc/varnish/acllogic.vcl";
# Allow a grace period for offering "stale" data in case backend lags
set req.grace = 5m;
remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;
# cPanel URLs
include "/etc/varnish/cpanel.url.vcl";
# Properly handle different encoding types
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|ico)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm
remove req.http.Accept-Encoding;
}
}
# Set up disabled
include "/etc/varnish/disabled.vcl";
# Exclude upgrade, install, server-status, etc
include "/etc/varnish/known.exclude.vcl";
# Set up exceptions
include "/etc/varnish/url.exclude.vcl";
# Set up exceptions
include "/etc/varnish/debugurl.exclude.vcl";
# Set up exceptions
include "/etc/varnish/vhost.exclude.vcl";
# Set up vhost+url exceptions
include "/etc/varnish/vhosturl.exclude.vcl";
# Set up cPanel reseller exceptions
include "/etc/varnish/reseller.exclude.vcl";
# Restart rule for bfile recv
include "/etc/varnish/bigfile.recv.vcl";
if (req.request == "PURGE") {
if (!client.ip ~ acl127_0_0_1) {error 405 "Not permitted";}
return (lookup);
}
## Default request checks
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
return (pipe);
}
if (req.request != "GET" && req.request != "HEAD") {
return (pass);
}
## Modified from default to allow caching if cookies are set, but not http auth
if (req.http.Authorization) {
return (pass);
}
include "/etc/varnish/versioning.static.vcl";
## Remove has_js and Google Analytics cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}
include "/etc/varnish/slashdot.recv.vcl";
# Cache things with these extensions
if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|pdf)$" && ! (req.url ~ "\.(php)") ) {
unset req.http.Cookie;
return (lookup);
}
return (lookup);
}
sub vcl_fetch {
set beresp.ttl = 40s;
set beresp.http.Server = " - Web acceleration by http://www.unixy.net/varnish ";
# Turn off Varnish gzip processing
include "/etc/varnish/gzip.off.vcl";
# Grace to allow varnish to serve content if backend is lagged
set beresp.grace = 5m;
# Restart rule bfile for fetch
include "/etc/varnish/bigfile.fetch.vcl";
# These status codes should always pass through and never cache.
if (beresp.status == 503 || beresp.status == 500) {
set beresp.http.X-Cacheable = "NO: beresp.status";
set beresp.http.X-Cacheable-status = beresp.status;
return (hit_for_pass);
}
if (beresp.status == 404) {
set beresp.http.magicmarker = "1";
set beresp.http.X-Cacheable = "YES";
set beresp.ttl = 20s;
return (deliver);
}
/* Remove Expires from backend, it's not long enough */
unset beresp.http.expires;
if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|pdf|ico)$" && ! (req.url ~ "\.(php)") ) {
unset beresp.http.set-cookie;
include "/etc/varnish/static.ttl.vcl";
}
include "/etc/varnish/slashdot.fetch.vcl";
else {
include "/etc/varnish/dynamic.ttl.vcl";
}
/* marker for vcl_deliver to reset Age: */
set beresp.http.magicmarker = "1";
# All tests passed, therefore item is cacheable
set beresp.http.X-Cacheable = "YES";
return (deliver);
}
sub vcl_deliver {
# From http://varnish-cache.org/wiki/VCLExampleLongerCaching
if (resp.http.magicmarker) {
/* Remove the magic marker */
unset resp.http.magicmarker;
/* By definition we have a fresh object */
set resp.http.age = "0";
}
set resp.http.Location = regsub(resp.http.Location, ":[0-9]+", "");
#add cache hit data
if (obj.hits > 0) {
#if hit add hit count
set resp.http.X-Cache = "HIT";
set resp.http.X-Cache-Hits = obj.hits;
}
else {
set resp.http.X-Cache = "MISS";
}
}
sub vcl_error {
if (obj.status == 503 && req.restarts < 5) {
set obj.http.X-Restarts = req.restarts;
return (restart);
}
}
# Added to let users force refresh
sub vcl_hit {
if (obj.ttl < 1s) {
return (pass);
}
if (req.http.Cache-Control ~ "no-cache") {
# Ignore requests via proxy caches, IE users and badly behaved crawlers
# like msnbot that send no-cache with every request.
if (! (req.http.Via || req.http.User-Agent ~ "bot|MSIE|HostTracker")) {
set obj.ttl = 0s;
return (restart);
}
}
return (deliver);
}
sub vcl_hash {
hash_data(req.http.cookie);
}
You can simply remove the GA cookies from the request; they are not used by your backend.
For example, you can remove all cookies except on admin URLs:
if ( !( req.url ~ "^/admin/" ) ) {
unset req.http.Cookie;
}
Or discard all cookies that start with an underscore:
// Remove has_js and Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
// Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
https://www.varnish-cache.org/docs/4.0/users-guide/increasing-your-hitrate.html
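Stripping cookies in vcl_recv only changes the request that Varnish hashes and forwards; the browser keeps its own cookies, so JavaScript that reads the Google Analytics cookie client-side should be unaffected. For the static main domain, a minimal sketch could look like this, assuming Varnish 3 syntax as in the Unixy config and a hypothetical hostname www.example.com:

```vcl
sub vcl_recv {
    # Hypothetical hostname; substitute the static main domain.
    if (req.http.host == "www.example.com") {
        # Only the copy of the header that Varnish sees is removed;
        # nothing is changed in the visitor's browser.
        unset req.http.Cookie;
    }
}
```

Subroutines with the same name are concatenated in the order they appear, and the Unixy vcl_recv ends with return (lookup), so this definition would need to come before it. Because the supplied vcl_hash includes req.http.cookie, removing the header also means all visitors to that host share one cached copy of each page.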

Rewrite Requests for Images to CDN URL with Varnish

I've got Varnish (3.0.3) sitting as a load balancer and static cache in front of two web servers. I've got a CDN set up using the origin pull method. If I manually take the URL of an image on my site and swap in the CDN address, I can verify that origin pull is working and the image is pulled to the CDN and served.
My application is fairly complex and I'm testing this CDN to see if it significantly speeds up the web app, so I don't want to rewrite any of my php code to use the CDN images just yet.
What I'd like to do is set Varnish up to rewrite requests received for image files and pull them through the CDN instead of from the two Apache servers directly in my cluster.
I've read through the Varnish documentation and a couple howto's online about doing something similar, but I just can't get it to work properly and need a little help here.
Here are a couple different ways I tried doing this (edited for brevity):
sub vcl_recv {
#if request is image, redirect to CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
set req.http.host = "cdn.domain.com/";
error 750 req.http.host + req.url;
}
}
sub vcl_error {
if (obj.status == 750) {
set obj.status = 302;
set obj.http.Location = obj.response;
return(deliver);
}
}
That didn't work. It resulted in broken images everywhere, and anything that did show up was using the .webp extension, so it wasn't being processed by the condition above.
So I tried this:
backend cdn {
.host = "cdn.domain.com";
.port = "80";
}
sub vcl_recv {
#if request is image, redirect to CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
set req.backend = cdn;
return(lookup);
}
}
This showed some images on the page, but when viewing their source, they looked to be coming from the Apache servers (the domain name wasn't that of the CDN) and only about half the images were displaying...probably browser cache.
I'd love some input here, thanks guys.
Is there no way to use Varnish for this kind of redirect? Would I be better off setting nginx up in front of Varnish to rewrite requests to the cdn?
UPDATE:
Using both answers given below, I have the redirect working and an ACL in place that lets the CDN pull images directly instead of being redirected to itself. However, although I verified with my own external IP that the ACL lets connections through, the CDN isn't pulling new images from the server. It gives a 502 error (odd) instead of pulling the image from the local server to the CDN and serving it. This is what the relevant block of my VCL looks like now:
acl cdn {
"ip.of.CDN";
}
sub vcl_recv {
#if request is image, redirect to CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
if(!client.ip ~ cdn){
error 750 "http://cdn.domain.com" + req.url;
}
}
}
sub vcl_error {
if (obj.status == 750) {
set obj.status = 302;
set obj.http.Location = obj.response;
return(deliver);
}
}
You can definitely do this with Varnish quite easily; there is no need to set up nginx or anything else. Actually, your first solution is very close to doing the trick. It just needs a few modifications.
sub vcl_recv {
#if request is image, redirect to CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
error 750 "http://cdn.domain.com" + req.url;
}
}
sub vcl_error {
if (obj.status == 750) {
set obj.status = 302;
set obj.http.Location = obj.response;
return(deliver);
}
}
You left "http://" out of your CDN URL, and you can omit the trailing slash from the host because every req.url begins with /.
You also need to make sure that the vcl_error code is the first one that is run in vcl_error(). I.e. if you have multiple definitions of vcl_error, make sure that none of them get to deliver any output before the if (obj.status == 750) check is reached.
Bear in mind that this solution causes all client browsers to query your server first and then make another request to the CDN after the 302 redirect. This adds a significant delay to each image load, and is probably not the best way of determining if CDN improves your app performance.
Update: regarding the 502 errors the CDN shows when it tries to pull content from your origin: relying on the remote IP address to decide on the redirect is quite risky, because the CDN could well use a number of servers for the pull, and their addresses could change over time. That would make the VCL laborious and error-prone to maintain.
Would it be possible to set up a unique virtual host for the CDN to use, for instance originpull.domain.com, and configure the CDN to pull content from that address instead of your primary www.domain.com address?
You could then modify the vcl_recv() as follows:
sub vcl_recv {
#if request is image and request is not made from CDN, redirect to CDN
if (req.http.host != "originpull.domain.com" &&
req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
error 750 "http://cdn.domain.com" + req.url;
}
}
That would ensure that the requests from CDN will never be redirected.
This assumes the CDN pulls its copy of the images from the site and you're not pushing images to the CDN manually. Aren't you missing a simple exclusion of the CDN network from either your rewrite or your backend proxy? The CDN needs to be able to pull a copy of the images directly from your site to populate its caches.
Been a while since I played with Varnish, and never an expert, but something along the following lines may work:
# Define the IP ranges of the CDN servers.
acl cdn {
"localhost";
"11.22.33.0"/24;
}
...
#if request is image, redirect to CDN, unless from the CDN
if (req.url ~ "\.(gif|ico|jpg|jpeg|png)$") {
if (!client.ip ~ cdn) {
error 750 "http://cdn.domain.com" + req.url;
}
}
...