How to tell varnish to cache specific file types - vcl

I have a dedicated server that hosts my own websites. I have installed Varnish with the default VCL file. Now I want to tell Varnish to do the following:
Cache only the following static file types: .js, .css, .jpg, .png, .gif. Matching should be based on the content type the server actually serves, not on URLs ending with those extensions.
Do not cache files bigger than 1 MB.
Any cached file should expire after 1 day (or whatever period).
Caching may happen only when Apache sends a 200 HTTP status code.
Otherwise leave the request intact so it is served by Apache or whatever backend.
What should I write in the VCL file to achieve those requirements? Or what should I do?

You can do all of this in the vcl_fetch subroutine. Treat the following as a starting point rather than a drop-in config; it assumes VCL 3.x (where the backend response is beresp) and the std vmod, so put import std; at the top of your VCL file:
import std;

sub vcl_fetch {
    if (beresp.http.Content-Type ~ "text/javascript|text/css|image/.*") {
        if (std.integer(beresp.http.Content-Length, 0) < 1048576) { /* max size: 1 MB */
            if (beresp.status == 200) { /* backend returned 200 */
                set beresp.ttl = 1d; /* cache for one day */
                return (deliver);
            }
        }
    }
    set beresp.ttl = 120s;
    return (hit_for_pass); /* won't be cached */
}

What I did:
1- Isolate all static content on a separate domain (i.e. the domain serving the dynamic pages is different from the domain serving the static content).
2- Assign another dedicated IP address to the domain that serves the static content.
3- Tell Varnish to listen only on that IP (i.e. the static-content IP) on port 80.
4- Use the Apache conf to control the caching period for each static content type (Varnish will just obey those headers).
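For step 4, a minimal sketch of what the Apache side might look like, assuming mod_expires is enabled (the content types and the one-day period here are placeholders to adapt):

```apache
# Hypothetical sketch: let Apache emit the caching headers,
# which Varnish will honor by default
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/css "access plus 1 day"
    ExpiresByType text/javascript "access plus 1 day"
    ExpiresByType image/jpeg "access plus 1 day"
    ExpiresByType image/png "access plus 1 day"
    ExpiresByType image/gif "access plus 1 day"
</IfModule>
```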
Pros:
1- Varnish will not even see the requests that it should leave intact. Those requests (for dynamic pages) go directly to Apache, since Apache listens on the original IP (performance).
2- No need to change the default VCL file (only if you want to debug), which is helpful for those who don't know the principles of the VCL language.
3- You control everything from the Apache conf.
Cons:
1- You have to buy a new dedicated IP if you don't have a spare one.
thanks


ktor redirecting to 0.0.0.0 when doing an https redirect

I've added an HTTP redirect to my Ktor application and it's redirecting to https://0.0.0.0 instead of to the actual domain's HTTPS URL.
@ExperimentalTime
fun Application.module() {
    if (ENV.env != LOCAL) {
        install(ForwardedHeaderSupport)
        install(XForwardedHeaderSupport)
        install(HttpsRedirect)
    }
Intercepting the route and printing out the host:
    routing {
        intercept(ApplicationCallPipeline.Features) {
            val host = this.context.request.host()
I seem to be getting 0:0:0:0:0:0:0:0 for the host.
Do I need to add any special headers to Google Cloud's load balancer for this HTTPS redirect to work correctly? It seems like it's not picking up the correct host.
As your Ktor server is hidden behind a reverse proxy, it isn't tied to the "external" host of your site. Ktor has a specific feature to handle working behind a reverse proxy, so it should be as simple as install(XForwardedHeaderSupport) during configuration and referencing request.origin.remoteHost to get the actual host.
Let's try to see what's going on.
You create a service under http://example.org. On the port 80 of the host for example.org, there is a load balancer. It handles all the incoming traffic, routing it to servers behind itself.
Your actual application is running on another virtual machine. It has its own IP address, internal to your cloud, and accessible by the load balancer.
Let's see a flow of HTTP request and response for this system.
An external user sends an HTTP request to GET / with Host: example.org on port 80 of example.org.
The load balancer gets the request, checks its rules and finds an internal server to direct the request to.
Load balancer crafts the new HTTP request, mostly copying incoming data, but updating Host header and adding several X-Forwarded-* headers to keep information about the proxied request (see here for info specific to GCP).
The request hits your server. At this point you can analyze X-Forwarded-* headers to see if you are behind a reverse proxy, and get needed details of the actual query sent by the actual user, like original host.
You craft the HTTP response, and your server sends it back to the load balancer.
Load balancer passes this response to the external user.
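The recovery step in (4) is language-agnostic. A rough sketch (in JavaScript for brevity; the header names are the de-facto conventions, not guaranteed for every proxy):

```javascript
// Reconstruct the original request details from X-Forwarded-* headers.
// Header names are de-facto conventions; your load balancer may differ.
function originalRequest(headers) {
  return {
    proto: headers['x-forwarded-proto'] || 'http',                     // original scheme
    host: headers['x-forwarded-host'] || headers['host'],              // external host
    clientIp: (headers['x-forwarded-for'] || '').split(',')[0].trim() // first hop = client
  };
}
```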
Note that although there is RFC 7239 for specifying information on request forwarding, GCP load balancer seems to use de-facto standard X-Forwarded-* headers, so you need XForwardedHeaderSupport, not ForwardedHeaderSupport (note additional X).
So it seems either Google Cloud Load Balancer is sending the wrong headers or Ktor is reading the wrong headers or both.
I've tried
install(ForwardedHeaderSupport)
install(XForwardedHeaderSupport)
install(HttpsRedirect)
or
//install(ForwardedHeaderSupport)
install(XForwardedHeaderSupport)
install(HttpsRedirect)
or
install(ForwardedHeaderSupport)
//install(XForwardedHeaderSupport)
install(HttpsRedirect)
or
//install(ForwardedHeaderSupport)
//install(XForwardedHeaderSupport)
install(HttpsRedirect)
All these combinations are working on another project, but that project is using an older version of Ktor (this being the one that was released with 1.4 rc) and that project is also using an older Google Cloud load balancer setup.
So I've decided to roll my own.
This line will log all the headers coming in with your request,
log.info(context.request.headers.toMap().toString())
then just pick the relevant ones and build an https redirect:
routing {
    intercept(ApplicationCallPipeline.Features) {
        if (ENV.env != LOCAL) {
            log.info(context.request.headers.toMap().toString())
            // workaround for call.request.host that contains the wrong host
            // and not redirecting properly to the correct https url
            val proto = call.request.header("X-Forwarded-Proto")
            val host = call.request.header("Host")
            val path = call.request.path()
            if (host == null || proto == null) {
                log.error("Unknown host / port")
            } else if (proto == "http") {
                val newUrl = "https://$host$path"
                log.info("https redirecting to $newUrl")
                // redirect browser
                this.context.respondRedirect(url = newUrl, permanent = true)
                this.finish()
            }
        }
    }
}

Apache - limit scope of RequestReadTimeout

We are running apache and using nagios to query http for alerting / monitoring purposes. We have a few webservers that required more sensitive settings for mod_reqtimeout.c and on those servers we periodically / sporadically get alerts about "UNKNOWN 500 read timeout". Nothing is actually wrong with the webserver / apache when this is happening and we think we have narrowed down the problem to our relatively strict settings for:
RequestReadTimeout header=
We have quite a few vhosts configured on some of these servers and are trying to find a way to modify our global header read timeout setting to ignore certain IP addresses, for example the IP address of our nagios server.
Otherwise a way to have it only apply to certain domains, without having to specifically add the setting into every vhost entry where it needs to exist.
Is there a resource available that talks about how to limit a global parameter to ignore certain IPs or page requests?
Although you can define the timeout at both the server config and virtual host level, in my testing with Apache 2.4.41 I wasn't able to apply a configuration at the server config level and then override it at the virtual host. It just continued to apply the server config values. So I ended up increasing values in the server config.
If you are on Ubuntu then you probably have defaults defined in /etc/apache2/mods-available/reqtimeout.conf for the whole server which then means you aren't able to set values for a virtual host without first changing the configuration here.
There's a short thread about this on the apache users list.
According to the official documentation (https://httpd.apache.org/docs/2.4/mod/mod_reqtimeout.html), you should be able to override the global config in the vhost config.
Context: server config, virtual host
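If the per-vhost override works on your Apache build, the shape would be something like this (the server name and the looser values are made up for illustration; the global line matches the module's documented defaults):

```apache
# Strict global defaults (e.g. in reqtimeout.conf)
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500

# Looser limits only for the vhost your monitoring server hits
<VirtualHost *:80>
    ServerName monitoring.example.com
    RequestReadTimeout header=60-120,MinRate=200 body=60,MinRate=200
</VirtualHost>
```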

nginx: Is it safe to use "open_file_cache" in this scenario?

I'm currently in the progress of switching from Apache to nginx.
nginx – without any optimizations done – is much faster than Apache.
But I want to get the maximum performance.
I read about "open_file_cache" and I'm considering using it in my configuration for a site – let's call it MY-SITE.
MY-SITE mainly serves static files, but also some PHP stuff.
MY-SITE has an api, which serves content via GET and POST requests.
(static content for GET requests, dynamic content for POST requests)
One of the statically served files returns a JSON formatted list of something.
This file gets around 15 reqs/s.
Current nginx config for MY-SITE:
..
location = /api/v1/something {
rewrite /api/v1/something /la/la/la/something.json;
}
..
I've read that when using "open_file_cache", one should NOT modify the file content / replace the file.
Why?
The API file I talked about (/la/la/la/something.json) may change regularly.
It might get completely replaced (deleted, then re-created -> inode will change) or only updated (inode will not change)
So is it safe to add the following configuration to my nginx config?
open_file_cache max=2000 inactive=20s;
open_file_cache_valid 10s;
open_file_cache_min_uses 5;
open_file_cache_errors off;
Does it possibly break anything?
Why is "open_file_cache" not enabled by default, if it greatly increases speed?

Use one instance of Yourls URL shortener with multiple domains

I've been looking for a way to use Yourls with multiple domains. The main issue is that when configuring Yourls you need to supply the domain name in the config.php file (the YOURLS_SITE constant).
Configuring just one domain while multiple domains actually point to Yourls causes unexpected behavior.
I've looked around and couldn't find a quick hack for this.
I would use this line in config.php...
define('YOURLS_SITE', 'http://' . $_SERVER['HTTP_HOST']);
(note to add any /subdirectory or whatever if that applies)
then as long as your Apache hosts config is correct, any domain or subdomain pointing at this directory will work. Keep in mind, though, that any redirect will work with any domain, so domain.one/redirect == domain.two/redirect.
I found this quick-and-dirty solution and thought it might be useful for someone.
In the config.php file I changed the constant definition to be based on the value of $_SERVER['HTTP_HOST']. This works for me because I have a proxy in front of the server that sets this header; you can also define virtual hosts on your Apache server and it should work the same (perhaps you will need to use $_SERVER['SERVER_NAME'] instead).
so in config.php I changed:
define( 'YOURLS_SITE', 'http://domain1.com');
to
if (strpos($_SERVER['HTTP_HOST'], 'domain2.com') !== false) {
    define('YOURLS_SITE', 'http://domain2.com/YourlsDir');
    /* domain2 doesn't use HTTPS */
    $_SERVER['HTTPS'] = 'off';
} else {
    define('YOURLS_SITE', 'https://domain1.com/YourlsDir');
    /* domain1 always uses HTTPS */
    $_SERVER['HTTPS'] = 'on';
}
Note1: if Yourls is located in the html root you can remove /YourlsDir from the URL
Note2: The url in YOURLS_SITE must not end with /
Hopefully this will help anyone else

How to use vhosts alongside node-http-proxy?

I'm running Nodejs and Apache alongside each other.
node-http-proxy is listening on port 80 and then forwarding requests to either Apache(:9000) or to Express(:8000).
My virtual hosts on Apache look like:
<VirtualHost 127.0.0.1>
    DocumentRoot "/localhost/myVhost"
    ServerName myVhost
</VirtualHost>
My question is, what is the "correct" way to have vhost-like functionality on the Express/Nodejs side? I would prefer not to have to place each Nodejs app on its own port, as is suggested here:
https://github.com/nodejitsu/node-http-proxy
(Section titled "Proxy requests using a 'Hostname Only' ProxyTable")
I noticed Connect (which as I understand it, gets bundled in Express) has some vhosts functionality. Should I be using that? If so, what would be the correct way to run it alongside node-http-proxy?
http://www.senchalabs.org/connect/middleware-vhost.html
I also noticed this other module called "Cluster", it seems to be related but I'm not sure how:
http://learnboost.github.com/cluster/
While not wanting to overwhelm, I also came across one called, "Haibu" it seems to be related but I'm not sure if it would just be an all out replacement for using vhosts:
https://github.com/nodejitsu/haibu
Note: I'm a front-end guy, so I'm not very familiar with a lot of server terminology
I never figured out Haibu or Cluster. But I did find a good solution that addressed my issue. To my surprise, it was actually quite simple. However, I don't know much about servers, so while this works, it may not be optimal.
I set up virtual hosts like normal on Apache
(http://httpd.apache.org/docs/2.0/vhosts/examples.html)
I installed the following on Node
Express (http://expressjs.com/)
node-http-proxy (https://github.com/nodejitsu/node-http-proxy)
Then, as a matter of personal style, I placed all my virtual hosts in a common directory (/localhost)
I then switched Apache to listen on a port other than port 80. I just happened to choose port 9000 because I had seen it used somewhere. (In httpd.conf, I changed "Listen 80" to "Listen 9000".) I also had to make sure that all my virtual hosts, as defined in extra/httpd-vhosts.conf, were set to an IP-based NameVirtualHost (127.0.0.1) instead of using a port (*:80).
On the Node side, I created my app/server (aka node virtual host) listening on port 8000 (a somewhat arbitrary choice of port number). See this link on creating a server with express: http://expressjs.com/guide.html
In my /localhost directory I then created a file called "nodeHttpProxy.js"
Using node-http-proxy, in nodeHttpProxy.js I then created a proxy server that listens on port 80. Using express, which wraps connect (http://www.senchalabs.org/connect/) I created my virtual hosts.
The nodeHttpProxy.js file looks like this:
// Module dependencies
var httpProxy = require('/usr/local/lib/node_modules/http-proxy/lib/node-http-proxy')
  , express = require('/usr/local/lib/node_modules/express/lib/express');

// Http proxy-server
httpProxy.createServer(function (req, res, proxy) {
    // Array of node host names
    var nodeVhosts = [
        'vhost1'
      , 'vhost2'
    ]
      , host = req.header('host')
      , port = nodeVhosts.indexOf(host) > -1
          ? 8000
          : 9000;
    // Now proxy the request
    proxy.proxyRequest(req, res, {
        host: host
      , port: port
    });
}).listen(80);

// Vhosts server
express.createServer()
    .use(express.vhost('vhost1', require('./vhost1/app')))
    .use(express.vhost('vhost2', require('./vhost2/app')))
    .listen(8000);
As you can see, I will have to do two things each time I create a new Node virtual host:
add the virtual host name to my "nodeVhosts" array
define a new express virtual host using the .use method
Of course, I will also have to create the actual host path/files in my /localhost directory.
Once all this is done I just need to run nodeHttpProxy.js:
node nodeHttpProxy.js
You might get a weird "EACCES" error, in which case just run it with sudo.
It will listen on port 80, and if the host matches one of the names in the nodeVhosts array it will forward the request to that host on port 8000; otherwise it will forward the request to that host on port 9000.
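The routing rule described above boils down to a tiny function (a sketch; the names and ports match the nodeHttpProxy.js snippet):

```javascript
// Pick the backend port for a request: known node vhosts go to
// Express on 8000, everything else falls through to Apache on 9000
function pickPort(host, nodeVhosts) {
  return nodeVhosts.indexOf(host) > -1 ? 8000 : 9000;
}
```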
I've been giving this some thought lately as I'm tackling the same problems in my personal test environment. You are not going to be able to get around having each node application running on its own port, but you can abstract away the pain of that process. Here is what I am using now, but I hope to build an npm package around this to simplify things in the future.
Each of my node.js applications has a map file that contains the port that the application is listening on as well as a map that indicates the expected path which the application is being served on. The contents of the file look like this:
{"path": "domain.com/path", "port": 3001}
When I start my application, it will read the port from the map.json file and listen on the specified port.
var map = JSON.parse(fs.readFileSync('map.json', 'ascii'));
app.listen(map.port);
Then in my proxy setup, I iterate over each of my node.js application directories, and check for a map.json file which indicates port 80 traffic should be proxied to this application.
I use almost the exact same method to setup the proxy for our apache hosted applications as well. We use a folder based convention on the PHP websites that we are serving and it uses the following configuration:
VirtualDocumentRoot /var/www/%-2.0.%-1/%-3+/
VirtualScriptAlias /var/www/%-2.0.%-1/%-3+/cgi-bin/
This essentially allows us to map domains to folders using the following structure.
http://sub.domain.com/ = /var/www/domain.com/sub/
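To make the %-N interpolations concrete, here is the same mapping expressed as a function (a JavaScript sketch of what Apache does internally; it assumes a two-label registered domain with everything before it treated as the subdomain):

```javascript
// Mirror of: VirtualDocumentRoot /var/www/%-2.0.%-1/%-3+/
// %-2.0.%-1 = last two host labels ("domain.com"), %-3+ = the rest ("sub")
function docrootFor(host) {
  var labels = host.split('.');
  var domain = labels.slice(-2).join('.');   // "domain.com"
  var sub = labels.slice(0, -2).join('.');   // "sub"
  return '/var/www/' + domain + '/' + sub + '/';
}
```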
There is no additional configuration needed to add or remove sites. This is very close to what I am currently using to proxy both apache and node sites. I am able to add new node and new apache sites without modifying this proxy application.
proxy.js
var fs = require('fs');
var httpProxy = require('http-proxy');
var proxyTable = {};

// Map apache proxies: every /var/www/<domain>/<path> folder is served
// by Apache (listening on 8080) via VirtualDocumentRoot, so the target
// is always local Apache; the Host header picks the right docroot
fs.readdirSync('/var/www/').forEach(function(domain) {
    fs.readdirSync('/var/www/' + domain).forEach(function(path) {
        var fqd = domain + '/' + path;
        proxyTable[fqd] = '127.0.0.1:8080';
    });
});

// Map node proxies: each app's map.json declares its public path and port
fs.readdirSync('/var/www-node/').forEach(function(domain) {
    var map = JSON.parse(fs.readFileSync('/var/www-node/' + domain + '/map.json', 'ascii'));
    proxyTable[map.path] = '127.0.0.1:' + map.port;
});
var options = {
router: proxyTable
};
var proxyServer = httpProxy.createServer(options);
proxyServer.listen(80);
In the future, I will probably decouple the path from the port that the application is listening on, but this configuration allows me to build the proxy map automatically with very little work. Hopefully this helps.
I took some inspiration from #uglymunky and wrote a chef script to do this on Ubuntu.
With this script you can install express and apache with vhost support on a single server using 1 line after you pull down my chef project from github
https://github.com/toranb/ubuntu-web-server
If you have git installed and you pull it down you can kick it off like so ...
sudo ./install.sh configuration.json
This does require Ubuntu 12.04 or greater, as I took advantage of an Upstart script to start node when the machine reboots.
When the script is finished you will have a working Ubuntu web server, with Express to run any node apps you configured alongside Apache to run any WSGI apps you configured.
I'm working on an extremely minimal and to-the-point library that can be kept totally separate from your projects. The idea is to run it independently on your servers and never worry about bundling it into your projects the way you would with Connect.
Take a look at the config.json file to see how simple it actually is to set up.
I was looking for something like this myself and did find a few options, but none supported everything I needed, specifically HTTPS, WS and WSS!
Right now the library I wrote only works for HTTP, but in the next few days I hope to have it finished and working for HTTPS, WS and WSS as well.