I ran Google's Page Speed on our web app to analyze and optimize it.
One of the many items under Web Performance Best Practices as listed in Page Speed says "To take advantage of the full benefits of caching consistently across all browsers, we recommend that you configure your web server to explicitly set caching headers and apply them to all cacheable static resources, not just a small subset (such as images). Cacheable resources include JS and CSS files, image files, and other binary object files (media files, PDFs, Flash files, etc.). In general, HTML is not static, and shouldn't be considered cacheable."
How do I configure Tomcat to achieve this? I know it can be done via a servlet filter that sets the appropriate HTTP headers, but can we do it purely through configuration, without touching code?
Edit: For information, we use JSF 1.2, although I think this is irrelevant in the context of this question.
If you are on Tomcat 7, there is a built-in ExpiresFilter for that.
http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Expires_Filter
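A minimal web.xml entry, adapted from the example in that documentation, looks like this (adjust the MIME types and durations to suit your app):

<filter>
    <filter-name>ExpiresFilter</filter-name>
    <filter-class>org.apache.catalina.filters.ExpiresFilter</filter-class>
    <!-- one ExpiresByType per content type you want cached -->
    <init-param>
        <param-name>ExpiresByType image</param-name>
        <param-value>access plus 10 days</param-value>
    </init-param>
    <init-param>
        <param-name>ExpiresByType text/css</param-name>
        <param-value>access plus 10 days</param-value>
    </init-param>
    <init-param>
        <param-name>ExpiresByType application/javascript</param-name>
        <param-value>access plus 10 days</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>ExpiresFilter</filter-name>
    <url-pattern>/*</url-pattern>
    <dispatcher>REQUEST</dispatcher>
</filter-mapping>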
We use the wonderful UrlRewriteFilter to do this. No code change, just configuration in web.xml, that's all. Link and rule below.
http://tuckey.org/urlrewrite/
<rule>
<from>^.*\.(js|css|gif)$</from>
<set type="expires">6 hours</set>
</rule>
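And the web.xml entries that enable the filter look roughly like this (the rule above then goes in WEB-INF/urlrewrite.xml):

<filter>
    <filter-name>UrlRewriteFilter</filter-name>
    <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>UrlRewriteFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>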
I have seen two popular options for forcing IE to render an HTML page in a particular document mode:
1) <meta http-equiv="X-UA-Compatible" content="IE=edge" />
2) Specify it as a Header in httpd.conf
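In httpd.conf, option 2 would look something like this (assuming mod_headers is enabled):

Header set X-UA-Compatible "IE=edge"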
What are the advantages of either of these options? Is there a recommended approach to do this?
Most applications I have seen use Apache as a load balancer: it handles a request to www.url.com and forwards it to one of several possible application servers. In that setup, accessing an application server's IP directly would not get the benefit of the emulation, because no headers are set. Doesn't the meta tag solve the problem closer to the source than Apache does? So isn't that the better way to set a specific emulation mode, or does the Apache approach have other benefits?
Neither, to be honest.
X-UA-Compatible is no longer supported (as of IE11 and above) and Microsoft recommends not using it and instead using the HTML5 doc type.
That said, to answer your question (in case you're interested in other headers like this): it depends. There are benefits to both.
Benefits of setting HTTP headers:
Can be set once at server level and don't need to remember to include on every page.
Useful if you don't have control over all the pages (e.g. many developers/contributors upload content to the site).
HTTP Header usually takes precedence (though not with X-UA-Compatible).
Benefits of setting at page level:
Doesn't require access to server (e.g. If page is hosted on a server where you don't have access to server config, or served across CDN).
Will be copied when the page is served over a CDN or other caching solution.
Can be set by page author (e.g. if page requires a specific header and author knows this).
It's usually easier to override at the page level when you need different settings for different pages, rather than loading all that config into Apache.
When an individual page contains an x-ua-compatible header, it overrides headers provided by the server. There are times this is useful (for serving legacy websites that do not have DOCTYPE directives) and times when it's not. Usually, you know which situation you're in by the problems you're trying to resolve.
The recommended practice is to use an HTML5 doctype (<!DOCTYPE html>) for most situations and to only use x-ua-compatible for legacy sites that rely on legacy markup. Ideally, this would be a temporary solution used only until a new version of the site has been developed so that it no longer relies on the legacy behavior.
We have an MVC web site deployed in a Cloud Service on Microsoft Azure. To boost performance, some of my colleagues suggested that we avoid the bundling and minification provided by ASP.NET MVC4 and instead store the .js and .css files in an Azure blob. Please note that the solution does not use a CDN; it merely serves the files from a blob.
My take on this is that merely serving the files this way will not yield any major performance benefits. Since we are not using a CDN, the files will always be served from the region in which our storage is deployed. Every time a user requests a page, at least for the first time, the data will flow across the data center boundary, which in turn incurs cost. Also, since the files are not bundled but kept individually, there will be more server requests, so we forfeit the benefits of bundling and minification. The only benefit I see to this approach is that we can change the .js and .css files and upload them without needing to re-deploy.
Can anyone please tell me which of the two options is preferable in terms of performance?
I can't see how this would be better than bundling and minification, unless the intent is to blob-store your minified and bundled files. The whole idea is to reduce requests to the server, because each request adds download time and JavaScript executes on a single thread; I'd do everything I can to reduce that request count.
As a separate note on the image side, I'd also combine images into a single image and use CSS sprites, à la: http://vswebessentials.com/features/bundling
When you add a script or style bundle to an MVC site, the bundling framework will append a version to the output markup.
e.g. <script src="/Scripts/custom/App.js?v=nf9WQHcG-UNbqZZzi4pJC3igQbequHCOPB50bXWkT641"></script>
Notice the query string ?v=xxx-xxx.
If you are hosting your app on multiple servers, then each server could append a different version to the resource URL, which means that in a classic round-robin load-balanced environment you will re-download that resource each time you hit a different server.
To me, this seems to negate the value of bundling in some ways: the initial load is quicker, but performance deteriorates on subsequent user interaction.
In practice, how have others handled this issue? I know that, depending on the size of the download, it could be insignificant, since the minified and gzipped resource is tiny, but in many situations this might not be the case. So how can one, with minimal effort, reap the benefits of bundling and minification in a heavily scaled-out environment?
In practice, the version number is a hash of the contents of the files. So if you have the same JavaScript files on all nodes of your web farm, they should all get the same version number. If you are getting different hashes, that could be an indication that you haven't deployed the same contents of those files to all nodes of your web farm.
My server has been compromised recently. This morning, I discovered that the intruder is injecting an iframe into each of my HTML pages. After testing, I found out that he does this by getting Apache (?) to replace every instance of
</body>
by
<iframe link to malware></iframe></body>
For example, if I browse a file residing on the server consisting of:
</body>
</body>
Then my browser sees a file consisting of:
<iframe link to malware></iframe></body>
<iframe link to malware></iframe></body>
I immediately stopped Apache to protect my visitors, but so far I have not been able to find what the intruder changed on the server to perform the attack. I presume he modified an Apache config file, but I have no idea which one. In particular, I looked for recently modified files by timestamp, but did not find anything noteworthy.
Thanks for any help.
Tuan.
PS: I am in the process of rebuilding a new server from scratch, but in the meantime I would like to keep the old one running, since this is a business site.
I don't know the details of your compromised server. This is a fairly standard drive-by attack against Apache, and ideally you would resolve it by rolling back to a previous version of your web content and server configuration (if you have a colo, contact the technical team responsible for your backups). But let's presume you're entirely on your own and need to fix the problem yourself.
Pulling from StopBadware.org's documentation on the most common drive-by scenarios and resolution cases:
Malicious scripts
Malicious scripts are often used to redirect site visitors to a
different website and/or load badware from another source. These
scripts will often be injected by an attacker into the content of your
web pages, or sometimes into other files on your server, such as
images and PDFs. Sometimes, instead of injecting the entire script
into your web pages, the attacker will only inject a pointer to a .js
or other file that the attacker saves in a directory on your web
server.
Many malicious scripts use obfuscation to make them more difficult for
anti-virus scanners to detect. Some malicious scripts also use names
that look like they're coming from legitimate sites (for example, a
misspelled "analytics" domain).
.htaccess redirects
The Apache web server, which is used by many hosting providers, uses a
hidden server file called .htaccess to configure certain access
settings for directories on the website. Attackers will sometimes
modify an existing .htaccess file on your web server or upload new
.htaccess files to your web server containing instructions to redirect
users to other websites, often ones that lead to badware downloads or
fraudulent product sales.
Hidden iframes
An iframe is a section of a web page that loads content from another
page or site. Attackers will often inject malicious iframes into a web
page or other file on your server. Often, these iframes will be
configured so they don’t show up on the web page when someone visits
the page, but the malicious content they are loading will still load,
hidden from the visitor’s view.
How to look for it
If your site was reported as a badware site by Google, you can use
Google’s Webmaster Tools to get more information about what was
detected. This includes a sampling of pages on which the badware was
detected and, using a Labs feature, possibly even a sample of the bad
code that was found on your site. Certain information can also be
found on the Google Diagnostics page, which can be found by replacing
example.com in the following URL with your own site’s URL:
www.google.com/safebrowsing/diagnostic?site=example.com
There exist several free and paid website scanning services on the
Internet that can help you zero in on specific badware on your site.
There are also tools that you can use on your web server and/or on a
downloaded copy of the files from your website to search for specific
text. StopBadware does not list or recommend such services, but the
volunteers in our online community will be glad to point you to their
favorites.
In short, use the stock-standard tools and scanners provided by Google first. If the threat can't otherwise be identified, you'll need to work back through the code of your CMS, your Apache configuration, your SQL setup, and the remaining content of your website to determine where you were compromised and what the right remediation steps should be.
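If you have shell access, a few standard commands can help with that text-search approach; the paths below are examples, so point them at your own document root:

# Search the served content for injected iframes and obfuscated scripts
grep -rn "<iframe" /var/www/html
grep -rn "eval(unescape" /var/www/html

# List .htaccess files, and flag anything modified in the last two weeks
find /var/www/html -name ".htaccess" -ls
find /var/www/html -type f -mtime -14 -ls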
Best of luck handling your issue!
I'm building an application with a self-contained HTTP server which can be either accessed directly, or put behind a reverse proxy (like Apache mod_proxy).
So, let's say my application is running on port 8080 and you set up your Apache like this:
ProxyPass /myapp http://localhost:8080
ProxyPassReverse /myapp http://localhost:8080
This will cause HTTP requests coming into the main Apache server that go to /myapp/* to be proxied to my application. If a request comes in like GET /myapp/bar, my application will see GET /bar. This is as it should be.
The problem that arises is in generating URIs that have to be translated from my application's URI-space in order to work correctly via the proxy (i.e. prepending /myapp/).
The ProxyPassReverse directive takes care of handling this for URIs in HTTP headers (redirects and so forth), but that doesn't handle URIs in the HTML generated by my application, or in static files and templates.
I'm aware of filters like mod_proxy_html, but this is a non-standard Apache module, and in any case, such filters may not be available for other front-end web servers which are capable of acting as a reverse proxy.
So I've come up with a few possible strategies:
Require an environment variable be set somewhere that contains the proxy path, and prepend this to all generated URIs. This seems inelegant; it breaks the encapsulation provided by the reverse proxy.
Put the proxy path in a configuration file for my application. Same objection as above.
Use only relative URIs in my application. This can get somewhat tricky; I would have to calculate the path difference between the current resource and where the link is going and add the appropriate number of ../'es. Seems messy. Another problem is that some things must generate absolute URIs, like RSS feeds and generated emails.
Use some hacky Javascript on the front-end to munge URIs in the document text. This seems like a really horrible idea from an interoperability standpoint.
Use a single URI-generating function throughout my code, and require "static" files like Javascript, CSS, etc. to be run through my templating system. This is the idea I'm leaning towards now.
This must be a fairly common problem. How have you approached it in the past? What has worked and what has made things more difficult?
Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far.
My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html to do it, because there are often too many special cases (e.g. client-side Javascript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, but I decided to revise the module code myself to handle the corner cases. Also keep performance in mind: fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML output through another server module.
Here's my recommendation of how to handle this in your code:
First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding.
Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code:
For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial, since URL generation in an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code; the approach you'll want to take is to funnel all URL generation through a single method, and then extend that method to transform non-relative URLs into relative ones. You can usually search for patterns in your code (like "/, '/, "http://, 'http://) and do a manual search and replace (or, if you're really nerdy and have more patience than I do, craft a regex to replace each common case in your source code).
The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and far less brittle.
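To make that concrete, here's a minimal sketch of such a relativizer (shown in Javascript; the function name and signature are my own illustration, not a standard API, and fully-qualified http:// URLs would need their host stripped before being passed in):

function relativize(targetPath, currentPath) {
    // targetPath: absolute path in the app's own URI space, e.g. '/css/site.css'
    // currentPath: path of the page being served, e.g. '/articles/2012/foo'
    var target = targetPath.split('/');
    var current = currentPath.split('/');

    // Drop the leading directory segments the two paths share.
    while (target.length > 1 && current.length > 1 && target[0] === current[0]) {
        target.shift();
        current.shift();
    }

    // Climb out of each remaining directory level of the current page...
    var prefix = '';
    for (var i = 0; i < current.length - 1; i++) {
        prefix += '../';
    }

    // ...then descend into the target.
    return prefix + target.join('/');
}

// relativize('/css/site.css', '/articles/2012/foo') => '../../css/site.css'
// The browser resolves that correctly against /myapp/articles/2012/foo,
// so whatever prefix the reverse proxy adds no longer matters.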
For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too.
For CSS, URLs in CSS are relative to the location of the CSS file (not the calling HTML page) so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good best practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy.
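For example (the file layout here is hypothetical):

/* In /static/css/site.css; the image lives in the parallel folder
   /static/images/. The URL resolves relative to this CSS file, so it
   works unchanged behind any proxy prefix. */
.header {
    background-image: url(../images/header.png);
}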
Finally, you'll need to figure out what to do about other oddball static files (legacy static HTML files, for example, sometimes creep in). In general, I recommend the same practice as for CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle.
Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you wouldn't be comfortable editing the module's source code if needed.
BTW, I realize I didn't mention anything about your idea #4 above (Javascript link-fixup). I wouldn't do that-- if the user has Javascript turned off, or (more commonly) some network problem delays that Javascript until after the rest of the page loads, then your links won't work. Too risky.