Varnish Cache - Initial cache of web pages - apache

I have installed Varnish Cache in front of my Apache web server and configured both correctly. It works, and I can now access my web pages through Varnish Cache.
The default behavior of varnish is to store copies of the pages served by the web server. The next time the same page is requested, Varnish will serve the copy instead of requesting the page from the Apache server.
And now comes my question: is it possible to cache my entire website right after setting up Varnish, without waiting for each page to be requested before it is stored in the cache? After Varnish has been set up, the cache is initially empty, and a page only becomes available in the cache once it has been accessed. Can this be done without having to access each page manually?

What you are looking for is a way of warming up the cache. You could use varnishreplay or a web crawler such as Wget or HTTrack to go through your site. Alternatively, if you have a sitemap of your pages, you could use that as a starting point and warm up the cache by looping over it and requesting each page with, e.g., curl or wget.
Using varnishreplay requires you to first run varnishlog and gather a log of traffic before you can use it later for playing back the traffic and warming up the cache.
Wget, HTTrack etc. can be pointed at your home page and they will crawl their way through your site. Depending on the size and nature of your site, this might not be practical though (for example, if you use Ajax extensively, a crawler will not discover content that is only loaded by scripts).
Unless your pages take a very long time to load from the backend server (i.e. Apache), I wouldn't worry too much about warming up the cache. If the TTL for the cached content is high enough most of the visitors will only ever receive cached content anyway.
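If you do want to try the sitemap approach, here is a minimal Python sketch of it. It assumes the sitemap is a standard sitemaps.org urlset; the function names (extract_urls, http_get, warm_cache) are made up for illustration, and the requests have to go through the address Varnish listens on, not straight to Apache.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml):
    """Return every <loc> URL found in a sitemap document (bytes or str)."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def http_get(url, timeout=10):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # read the whole body so the full page is fetched
        return response.status


def warm_cache(urls, fetch=http_get):
    """Request each URL once; return {url: status code or error message}."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # urllib's URLError/HTTPError are OSError subclasses
            results[url] = str(exc)
    return results
```

To use it, download your sitemap (for example from a hypothetical http://www.example.com/sitemap.xml), pass its contents to extract_urls, and hand the result to warm_cache. The fetch parameter is only there so the loop can be tried out without touching the network.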

There is a much better way to do this, which employs req.hash_always_miss and works with Varnish 3 and 4 (it uses a sitemap too). It warms up your cache and refreshes old pages without having to purge the cache. A full diagram, an outline of how to configure it, and 3 scripts for various use cases are available here http://www.htpcguides.com/smart-warm-up-your-wordpress-varnish-cache-scripts/ and are easily adapted for non-WordPress sites.

Related

Domain URL masking

I am currently hosting the contents of a site with ProviderA. I have a domain registered with ProviderB. I want users to access the contents (www.providerA.com/sub/content) by visiting www.providerB.com. A domain forward is easy enough and works as intended, however, unless I embed the site in a frame (which is a big no-no), the actual URL reads www.providerA.com/sub/content despite the user inputting www.providerB.com.
I really need a solution for this. A domain masking without the use of a frame. I'm sure this has been done before. An .htaccess domain rewrite?
Your help would be hugely appreciated! I'm going nuts trying to find a solution.
For Apache
Usual way: set up mod_proxy. The Apache on providerB becomes a client of providerA's Apache. It gets the content and sends it back to the user's browser.
But it looks like you only have .htaccess. A proxy needs full configuration access, so that option is out.
So you cannot, see: How to set up proxy in .htaccess
If you have PHP on providerB
Set up a proxy written in PHP. All requests to providerB are intercepted by that PHP proxy. It gets the content from providerA and sends it back, so it does the same thing as the Apache module. However, depending on the quality of the implementation, it might fail on some requests, content types, sizes, timeouts, ...
Search for "php proxy" on the web; you will see a couple available on GitHub and elsewhere. YMMV as to how difficult it is to set up, and how reliable it is.
No PHP but some other server side language
Obviously that could be done in another language. I looked at PHP because that is what I use the most.
The best solution would be to transfer the content to providerB :-)

Can a website be cached anywhere other than a browser's cache?

My client is seeing a different version of the website on his computers than what I am seeing on mine. He claims to be deleting the cache. I'm using Safari with the cache disabled via the Develop menu and I see the correct version of the site.
Is it possible that the website is somehow cached by my client's ISP or something along those lines?
Update:
I think I need to describe the problem better:
My client has a web hosting package where he has his domains and email accounts. somedomain.com has had its A record changed to point to Behance's ProSite hosted service.
The problem is that when he goes to somedomain.com he gets the index.html that's sitting in his web server's public_html directory, and not his ProSite. Using the same domain I see the ProSite. He has cleared his cache and tried on a computer at home with the same result. This is what led me to believe that there is some sort of caching issue somewhere along the line with his ISP(s).
Is there anything I can do about this?
Proxy servers at the ISP or even at the client's site might do this. So might network compressors in some misconfigurations.
Depending on the site, you might also actually be seeing a different site, e.g. Google directs users to different servers using DNS load balancing.
Yes, you're right. Modern browsers cache aggressively to improve performance and page load speed for repeated requests. I have had the same problem myself. To resolve it, you should tag each release of your project with a version whenever you deploy it to production.
Based on the update, the problem was with DNS cache.
DNS can be cached at the following levels:
browser
operating system
router
DNS provider
Each of them has its own way to flush the DNS cache, except the DNS provider, where the only thing you can do is wait for the cached record to expire. You can, however, switch from your current DNS provider to another one that doesn't have your domain in its cache. You have a good chance of finding one if your domain isn't popular.
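To see whether a machine is still getting a stale record, you can compare what its resolver returns with the address the new A record should point to. Here is a small Python sketch; it only covers the operating system's resolver and whatever sits behind it (router, DNS provider), not the browser's own cache, and the function names and addresses are made up for illustration.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_dns(hostname, expected_ip, resolver=resolve_ipv4):
    """Report whether the expected address is among the resolved ones."""
    addresses = resolver(hostname)
    return {
        "hostname": hostname,
        "addresses": addresses,
        "up_to_date": expected_ip in addresses,
    }
```

Running check_dns("somedomain.com", "203.0.113.10") on the client's machine and on yours (with the real address in place of the placeholder 203.0.113.10) should show whether the two machines resolve the domain differently.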

Page Caching with Memcached

I am using Memcached in my Ruby on Rails 3 app. It works great with action and fragment caching, but when I try to use page caching, the page is stored in the filesystem instead of in Memcached. How can I tell Rails to use Memcached for page caching too?
In my development.rb file:
config.action_controller.perform_caching = true
config.cache_store = :mem_cache_store
You can't. The equivalent of page caching in Memcached is action caching, because the request must be served through Rails. Page caching is meant to bypass Rails, so the data must be stored in a file that can be served directly by the web server, such as Nginx or Apache. The reason page caching is so fast is that it bypasses Rails entirely. Here is what the Rails documentation says:
Page caching is a Rails mechanism which allows the request for a generated page to be fulfilled by the webserver (i.e. apache or nginx), without ever having to go through the Rails stack at all. Obviously, this is super-fast. Unfortunately, it can’t be applied to every situation (such as pages that need authentication) and since the webserver is literally just serving a file from the filesystem, cache expiration is an issue that needs to be dealt with.
You can find more information here.
Check this:
http://globaldev.co.uk/2012/06/serving_memcached_pages_from_nginx/
In short: install the "memcaches_page" gem (add it to your Gemfile, then run bundle), change the caches_page directive to memcaches_page, and configure Nginx to serve pages from the Memcached server before hitting the application (as described in the article).

Caching dynamic content in Apache (mod_cache)

I'm trying to understand how dynamic caching is done in Apache.
I read the Caching Guide of Apache and an article about dynamic caching, but still don't understand exactly the internals of how dynamic caching works.
Say for example I have a PHP page that serves content through reading from a database according to parameters in the user's URL query-string (or parameters specified in POST).
e.g. www.mySite.com?articleID=31
How is that cached then?
Does mod_cache keep the content retrieved from the database for this specific article?
Any sources or suggestions are welcomed.
It caches the output of your script, i.e. the HTML. What I'm not sure about is whether CacheStorePrivate On will cache PHP scripts that send no cache headers. I'm using Apache 2.4.17 and it looks like it doesn't.
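To make "caches the output" concrete, here is a toy Python model of the idea. It is not how mod_cache is implemented: the class name, the key layout, and the render callback are all made up, and expiry is left out entirely. As far as I understand the Apache Caching Guide, only GET requests are cacheable, and a URL with a query string is only cached when the response carries explicit freshness information (Expires or Cache-Control max-age).

```python
from urllib.parse import urlsplit


class OutputCache:
    """Toy response cache keyed by URL, query string included."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(url):
        # Host, path and query together identify one cached response.
        parts = urlsplit(url)
        return (parts.hostname, parts.path or "/", parts.query)

    def fetch(self, method, url, render):
        """Return the cached output for a GET, calling render only on a miss."""
        if method != "GET":
            # Anything else (e.g. POST) always goes to the backend script.
            return render(url)
        cache_key = self.key(url)
        if cache_key not in self._store:
            self._store[cache_key] = render(url)  # script and database run here
        return self._store[cache_key]
```

With this model, www.mySite.com?articleID=31 and www.mySite.com?articleID=32 are two separate cache entries, each holding the finished HTML for that article. The database rows themselves are never stored, only the page that was built from them.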

How do I configure apache - that has not got mod_expires or mod_headers - to send expiry headers?

The webserver hosting my website is not returning last-modified or expiry headers. I would like to rectify this to ensure my web content is cacheable.
I don't have access to the Apache config files because the site is hosted on a shared environment that I have no control over. I can, however, make configuration changes via an .htaccess file. The server (Apache 1.3) is not configured with mod_expires or mod_headers, and the company will not install these for me.
With these limitations in mind, what are my options?
Sorry for the post here. I recognise this question is not strictly a programming question, and more a sys admin question. When serverfault is public I'll make sure I direct questions of this nature there.
What sort of content? If static (HTML, images, CSS), then really the only way to attach headers is via the front-end webserver. I'm surprised the hosting company doesn't have mod_headers enabled, although they might not enable it for .htaccess. It's costing them more bandwidth and CPU (i.e., money) to not cache.
If it's dynamic content, then you'll have control when generating the page. This will depend on your language; here's an example for PHP (it's from the PHP manual, and is a bad example, as it should also set the response code):
if (!headers_sent()) {
    header('Location: http://www.example.com/');
    exit;
}
Oh, and one thing about setting caching headers: don't set them for too long a duration, particularly for CSS and scripts. You may not think you want to change these, but you don't want a broken site while people still have the old content in their browsers. I would recommend maximum cache settings in the 4-8 hour range: good for a single user's session, or a work day, but not much more.
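For dynamic pages, the headers themselves are easy to compute. Here is a minimal Python sketch of the values you would send; the function name and parameters are made up for illustration, and the 4-hour default just follows the range suggested above. In PHP you would pass the same name/value pairs to header().

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_hours=4, last_modified=None, now=None):
    """Build caching headers for a response that may be reused for a few hours.

    last_modified and now must be timezone-aware UTC datetimes when given.
    """
    now = now or datetime.now(timezone.utc)
    max_age = int(timedelta(hours=max_age_hours).total_seconds())
    headers = {
        "Cache-Control": "max-age=%d" % max_age,
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }
    if last_modified is not None:
        headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
    return headers
```

Calling cache_headers() with no arguments gives a Cache-Control of max-age=14400 plus a matching Expires date four hours from now. Pass last_modified if your application knows when the underlying content last changed.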