Speeding up a site using gzip and far-future expiration dates - Apache

I recently deployed a site, http://boardlite.com . One of the gzip-testing websites, http://www.gidnetwork.com/tools/gzip-test.php , suggests that gzip is not enabled for my site, while YSlow gives an A grade for gzip but does not actually say whether gzip is on.
How do I make sure the site properly implements gzip? I am also going to enable far-future expiry dates for static media. Are there any best practices for setting the expiry date?
Static media on the site is served by nginx, while the site itself runs on Apache, in case this information is relevant.
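Assuming the missing compression is on the Apache side, a minimal sketch with mod_deflate (the MIME types listed are illustrative, not exhaustive):

```apache
# .htaccess or httpd.conf - requires mod_deflate to be loaded
AddOutputFilterByType DEFLATE text/html text/css application/javascript
```

The nginx server in front of the static media needs its own directives (`gzip on;` plus `gzip_types`), since compression is configured per server.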

I'd advise against going too far into the future or you'll make site upgrades a nightmare. I believe a week should be enough, since after that you'll still only be serving 304 Not Modified responses, not the whole image.
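If mod_expires is available, the one-week window suggested above could be sketched like this (the types shown are illustrative):

```apache
# .htaccess sketch - requires mod_expires
ExpiresActive On
ExpiresByType image/png  "access plus 1 week"
ExpiresByType image/jpeg "access plus 1 week"
ExpiresByType text/css   "access plus 1 week"
```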

It looks like Gzip is enabled on your server. You can tell by checking the HTTP response headers for 'Content-Encoding: gzip'.
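One way to check is to request the page with an Accept-Encoding header and look for Content-Encoding in the response. A canned response stands in for the live curl output here, so the check itself is runnable offline:

```shell
# A live check would be: curl -sI -H 'Accept-Encoding: gzip' http://boardlite.com/
# (real curl output has \r\n line endings; pipe through tr -d '\r' if needed)
response='HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip'

if printf '%s\n' "$response" | grep -qi '^content-encoding:.*gzip'; then
  echo "gzip enabled"
else
  echo "gzip NOT enabled"
fi
```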
I can't think of any "best practices" for future expiry dates - other than to make sure they're not in the past ;)
There are many other ways you can optimize your web site. Spriting your CSS background images and using a content delivery network for your static content are a few.
Andrew

Related

Domain URL masking

I am currently hosting the contents of a site with ProviderA. I have a domain registered with ProviderB. I want users to access the contents (www.providerA.com/sub/content) by visiting www.providerB.com. A domain forward is easy enough and works as intended, however, unless I embed the site in a frame (which is a big no-no), the actual URL reads www.providerA.com/sub/content despite the user inputting www.providerB.com.
I really need a solution for this. A domain masking without the use of a frame. I'm sure this has been done before. An .htaccess domain rewrite?
Your help would be hugely appreciated! I'm going nuts trying to find a solution.
For Apache
Usual way: set up mod_proxy. The Apache on providerB becomes a client to providerA's Apache: it fetches the content and sends it back to the client.
But it looks like you only have .htaccess, and proxying needs full configuration access.
So you cannot; see: How to set up proxy in .htaccess
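For contrast, this is roughly the full-access configuration that .htaccess rules out (hostnames taken from the question; requires mod_proxy and mod_proxy_http in the main server config):

```apache
<VirtualHost *:80>
    ServerName www.providerB.com
    ProxyPass        / http://www.providerA.com/sub/content/
    ProxyPassReverse / http://www.providerA.com/sub/content/
</VirtualHost>
```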
If you have PHP on providerB
Set up a proxy written in PHP. All requests to providerB are intercepted by that PHP proxy, which fetches the content from providerA and sends it back, doing the same thing as the Apache module. However, depending on the quality of the implementation, it might fail on some request types, sizes, timeouts, ...
Search for "php proxy" on the web, you will see a couple available on GitHub and others. YMMV as to how difficult it is to setup, and the reliability.
No PHP but some other server side language
Obviously that could be done in another language; I mentioned PHP because that is what I use the most.
The best solution would be to transfer the content to providerB :-)

Is it possible to find out the version of Apache HTTPD when ServerSignature is off?

I have a question: can I find out the version of Apache when the full signature is disabled? Is it even possible? If it is, how? I think it must be possible, because blackhats hacking big corporate servers need knowledge of the version of the victim's services. What do you think? Thanks.
Well for a start there are two (or even three) things to hide:
The Server response header - which shows the version. This cannot be turned off entirely in the Apache config, but with ServerTokens it can be reduced to just "Apache".
ServerSignature - which displays the server version in the footer of error pages.
X-Powered-By which is not used by Apache but used by back end servers and services it might send requests to (e.g. PHP, J2EE servers... etc.).
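A sketch of trimming all three, assuming a PHP backend (the Apache directives are standard; the php.ini setting only applies if PHP is in play):

```apache
# httpd.conf sketch
ServerTokens Prod      # Server header reduced to just "Apache"
ServerSignature Off    # no version string in error-page footers

# X-Powered-By is set by the backend; for PHP, in php.ini:
#   expose_php = Off
```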
Now servers do show some information due to differences in how they operate or how they interpret the spec. For example, the order of response headers, capitalisation, and how they respond to certain requests all give clues to which server software might be answering HTTP requests. However, using this to fingerprint a specific version of that software is trickier, unless there was an obvious, observable change from the client side.
Other options include looking at the server-status page, though you would hope any administrator clever enough to reduce the default server header would also restrict access to that page. Or through another security hole (e.g. being able to upload executable scripts or the like).
I would guess most hackers are more likely to know of bugs and exploits in particular versions of Apache or other web servers, and simply try to see whether any of those can be exploited, rather than trying to guess the specific version first.
In fact, as an interesting aside, Apache themselves have long been of the opinion that hiding server header information is pointless "security through obscurity" (a point I, and many others, disagree with them on), even putting this in their documentation:
Setting ServerTokens to less than minimal is not recommended because it makes it more difficult to debug interoperational problems. Also note that disabling the Server: header does nothing at all to make your server more secure. The idea of "security through obscurity" is a myth and leads to a false sense of safety.
And even allowing open access to their server-status page.

Can you use gzip over SSL? And Connection: Keep-Alive headers

I'm evaluating the front end performance of a secure (SSL) web app here at work and I'm wondering if it's possible to compress text files (html/css/javascript) over SSL. I've done some googling around but haven't found anything specifically related to SSL. If it's possible, is it even worth the extra CPU cycles since responses are also being encrypted? Would compressing responses hurt performance?
Also, I want to make sure we're keeping the SSL connection alive so we're not doing SSL handshakes over and over. I'm not seeing Connection: Keep-Alive in the response headers. I do see Keep-Alive: 115 in the request headers, but that only keeps the connection alive for 115 seconds (it seems like the app server is closing the connection after a single request is processed?). Wouldn't you want the server to set that response header for as long as the session inactivity timeout?
I understand browsers don't cache SSL content to disk so we're serving the same files over and over and over on subsequent visits even though nothing has changed. The main optimization recommendations are reducing the number of http requests, minification, moving scripts to bottom, image optimization, possible domain sharding (though need to weigh the cost of another SSL handshake), things of that nature.
Yes, compression can be used over SSL; it takes place before the data is encrypted, so it can help over slow links. It should be noted, though, that this opens a vulnerability (see the notes on BREACH and CRIME below).
After the initial handshake, SSL is less of an overhead than many people think* - even if the client reconnects, there's a mechanism to continue existing sessions without renegotiating keys, resulting in less CPU usage and fewer round-trips.
Load balancers can screw with the continuation mechanism, though: if requests alternate between servers then more full handshakes are required, which can have a noticeable impact (~few hundred ms per request). Configure your load balancer to forward all requests from the same IP to the same app server.
Which app server are you using? If it can't be configured to use keep-alive, compress files and so on then consider putting it behind a reverse proxy that can (and while you're at it, relax the cache headers sent with static content - HttpWatchSupport's linked article has some useful hints on that front).
(*SSL hardware vendors will say things like "up to 5 times more CPU" but some chaps from Google reported that when Gmail went to SSL by default, it only accounted for ~1% CPU load)
You should probably never use TLS compression. Some user agents (at least Chrome) will disable it anyway.
You can selectively use HTTP compression
You can always minify
Let's talk about caching too
I am going to assume you are using an HTTPS Everywhere style web site.
Scenario:
Static content like css or js:
Use HTTP compression
Use minification
Long cache period (like a year)
etag is only marginally useful (due to long cache)
Include some sort of version number in the URL in your HTML pointing to this asset so you can cache-bust
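A sketch of those static-asset rules with mod_headers; the one-year max-age matches the long cache period, and the hypothetical app.v42.css name shows the version number living in the URL:

```apache
# .htaccess sketch - requires mod_headers
# HTML references e.g. /static/app.v42.css; a deploy bumps the version,
# changing the URL and so busting the cache
<FilesMatch "\.(css|js|png|jpg)$">
    Header set Cache-Control "public, max-age=31536000"
</FilesMatch>
```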
HTML content with ZERO sensitive info (like an About Us page):
Use HTTP compression
Use HTML minification
Use a short cache period
Use etag
HTML content with ANY sensitive info (like a CSRF token or bank account number):
NO HTTP compression
Use HTML minification
Cache-Control: no-store, must-revalidate
etag is pointless here (due to revalidation)
some logic to redirect the page after session timeout (taking into account multiple tabs). If someone presses the browser's Back button, the sensitive info is not displayed due to the cache header.
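For the sensitive pages, the headers above might be set like this (the URL pattern is hypothetical; in practice the application itself usually emits these headers):

```apache
# .htaccess sketch - requires mod_headers
<FilesMatch "^(account|statement)">
    Header set Cache-Control "no-store, must-revalidate"
</FilesMatch>
FileETag None
```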
You can use HTTP compression with sensitive data IF:
You never return user input in the response (got a search box? don't use HTTP compression)
Or you do return user input in the response but randomly pad the response
Using compression with SSL opens you up to vulnerabilities like BREACH, CRIME, or other chosen-plaintext attacks. You should disable TLS-level compression, as SSL/TLS currently has no way to mitigate these length-oracle attacks.
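In Apache with mod_ssl (2.4.3 and later), TLS-level compression can be disabled explicitly; newer releases already default to off:

```apache
# SSL config sketch - requires mod_ssl
SSLCompression off
```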
To your first question: SSL works at a different layer than compression. In a sense these are two features of a web server that work together without overlapping. Yes, by enabling compression you'll use more CPU on your server, but you'll send less outgoing traffic, so it's a tradeoff.
To your second question: Keep-Alive behavior really depends on the HTTP version and the server configuration. You could also move your static content (images, movies, audio, etc.) to a non-SSL server.

How do I configure apache - that has not got mod_expires or mod_headers - to send expiry headers?

The webserver hosting my website is not returning last-modified or expiry headers. I would like to rectify this to ensure my web content is cacheable.
I don't have access to the apache config files because the site is hosted on a shared environment that I have no control over. I can however make configurations via an .htaccess file. The server - apache 1.3 - is not configured with mod_expires or mod_headers and the company will not install these for me.
With these limitations in mind, what are my options?
Sorry for the post here. I recognise this question is not strictly a programming question, and more a sys admin question. When serverfault is public I'll make sure I direct questions of this nature there.
What sort of content? If static (HTML, images, CSS), then really the only way to attach headers is via the front-end webserver. I'm surprised the hosting company doesn't have mod_headers enabled, although they might not enable it for .htaccess. It's costing them more bandwidth and CPU (ie, money) to not cache.
If it's dynamic content, then you'll have control when generating the page. This will depend on your language; here's a PHP example that sends caching headers before any output is written:
if (!headers_sent()) {
    // cache for 4 hours, in line with the advice below
    header('Cache-Control: public, max-age=14400');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 14400) . ' GMT');
}
Oh, and one thing about setting caching headers: don't set them for too long a duration, particularly for CSS and scripts. You may not think you want to change these, but you don't want a broken site while people still have the old content in their browsers. I would recommend maximum cache settings in the 4-8 hour range: good for a single user's session, or a work day, but not much more.

Mask redirect to temporary domain with mod_rewrite

We are putting up a company blog at companyname.com/blog but for now the blog is a Wordpress installation that lives on a different server (blog.companyname.com).
The intention is to have the blog and web site both on the same server in a month or two, but that leaves a problem in the interim.
At the moment I am using mod_rewrite to do the following:
http://companyname.com/blog/article-name redirects to http://blog.companyname.com/article-name
Can I somehow keep the address bar displaying companyname.com/blog even though the content is coming from the latter blog.companyname.com?
I can see how to do this if it is on the same server and vhost, but not across a different server?
Thanks
Rather than using mod_rewrite, you could use mod_proxy to set up a reverse proxy on companyname.com, so that requests to http://companyname.com/blog/article-name are proxied (rather than redirected) to http://blog.companyname.com/article-name.
Here are more instructions and examples.
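Concretely, the reverse-proxy approach might look like this in the companyname.com virtual host (requires mod_proxy and mod_proxy_http):

```apache
<VirtualHost *:80>
    ServerName companyname.com
    ProxyPass        /blog/ http://blog.companyname.com/
    ProxyPassReverse /blog/ http://blog.companyname.com/
</VirtualHost>
```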
There is functionality with ZoneEdit called webforwards which could probably do this and hide what you are actually doing (unless someone looked into it).
The only thing that mod_rewrite can do is send HTTP header redirects, and those redirects (across servers) always result in the browser address bar reflecting the reality.
You should instead consider writing a 404 script that 'reflects' the blog. This would essentially be a transparent proxy, and many are already written.
The script would find if the requested page (that was 404'd) started with http://mycompany.com/blog/ . If it did, it would download and then send onto the client the blog page and associated files (probably caching them as well).
So requesting http://mycompany.com/blog/article_xyz would cause the 404 script to download and send http://blog.companyname.com/article_xyz.
It's probably more work than it's worth, but you might be able to design a simple enough 404 script that it's worthwhile.
-Adam