How does Gzip / mod_deflate affect crawlers / spiders / robots (and ultimately my SEO)?

I've been tasked to look into Gzip compression. I've figured out so far that it is not just generally accepted, but also pretty common. Now my question is: how does Gzip compression influence the crawlers that visit my page?
What should I keep in mind when I decide to Gzip my page?
Will my SEO suffer from this in any way?
In short, compression: How to do it the right way?

Gzipping should not affect how crawlers look at the page. Google even suggests compressing the content, so there should not be any problem in serving gzipped content. In fact it can help your page ranking, since page speed is a ranking factor. Most modern browsers support gzipped content and request it via the Accept-Encoding header. There are certain things you should do first, like minifying your JS and CSS before gzipping, and there are plenty of good articles about what needs to be done. It also depends on the capabilities of your server environment.
Caution: There might be a few region-specific crawlers that do not support it, but most of the major ones do.
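For reference, mod_deflate is usually enabled with just a few lines of httpd.conf. A minimal sketch, assuming Apache with mod_deflate loaded (the MIME types listed are only the usual text-based candidates, adjust them to your site):

<IfModule mod_deflate.c>
    # Compress only text-based responses; images and other binaries are already compressed
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript application/json
</IfModule>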

Related

GZip compression in Contentful

My images coming from Contentful don't seem to be gzip-compressed. Is there any setting that I need to make?
This link suggests that it is possible, but I couldn't find any such setting. Can you please let me know how I can ensure that the images I'm getting from Contentful's CloudFront CDN are gzip-encoded?
There is no need for gzip compression on images. Gzip and other general-purpose compression algorithms only make sense for text-based formats; images (and other binary formats) are already compressed by their own, format-specific algorithms.
You can find more information here: https://webmasters.stackexchange.com/questions/8382/is-gzipping-images-worth-it-for-a-small-size-reduction-but-overhead-compressing
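If you want to verify this yourself, you can inspect the response headers, for example with curl (the asset URL below is just a placeholder for one of your Contentful image URLs):

# Ask for gzip and see whether the CDN actually applies it
curl -sI -H "Accept-Encoding: gzip" "https://images.ctfassets.net/your-space-id/your-asset.jpg"
# A "Content-Encoding: gzip" header means the response was compressed in transit;
# for JPEG/PNG assets you will normally not see it, which is expected.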
Hope that helps. :)

Apache vs Meta tags for specific IE Mode emulation?

I have seen two popular options to force IE to open an HTML page in a particular mode:
1) <meta http-equiv="X-UA-Compatible" content="IE=edge" />
2) Specify it as a Header in httpd.conf
What are the advantages of either of these options? Is there a recommended approach to do this?
Most applications I have seen use Apache as a load balancer: it handles a request to www.url.com and sends it on to one of several application servers. In that setup, accessing an application server's IP directly would not get the benefit of the emulation, because no headers are set. The meta tag solves the problem closer to the page than Apache does, so isn't that the better way to set a specific emulation mode, or does the Apache approach have other benefits?
Neither to be honest.
X-UA-Compatible is no longer supported (as of IE11 and above), and Microsoft recommends not using it and instead using the HTML5 doctype.
That said, to answer your question (in case you're interested in other headers like this): it depends. There are benefits to both.
Benefits of setting HTTP headers:
Can be set once at server level, so you don't need to remember to include it on every page.
Useful if you don't have control over all the pages (e.g. many developers/contributors upload content to the site).
The HTTP header usually takes precedence (though not with X-UA-Compatible).
Benefits of setting at page level:
Doesn't require access to the server (e.g. if the page is hosted on a server where you don't have access to the server config, or served via a CDN).
Will be copied when the page is served over a CDN or other caching solution.
Can be set by page author (e.g. if page requires a specific header and author knows this).
It's usually easier to override per page if you need different settings per page, rather than loading all that config into Apache.
When an individual page contains an X-UA-Compatible meta tag, it overrides headers provided by the server. There are times this is useful (for serving legacy websites that do not have DOCTYPE directives) and times when it's not. Usually, you know which situation you're in by the problems you're trying to resolve.
The recommended practice is to use an HTML5 doctype (<!DOCTYPE html>) for most situations and to only use x-ua-compatible for legacy sites that rely on legacy markup. Ideally, this would be a temporary solution used only until a new version of the site has been developed so that it no longer relies on the legacy behavior.
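For reference only (given the advice above to prefer the HTML5 doctype), the httpd.conf side of option 2 would look roughly like this, assuming mod_headers is enabled:

<IfModule mod_headers.c>
    # Send the compatibility header with every response from this server
    Header set X-UA-Compatible "IE=edge"
</IfModule>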

Will Gzip compression affect my db queries?

Maybe it is a stupid question, but will gzip compression affect my db queries (I mean, will they be cached), or is gzip related only to the HTML and CSS output delivered to the browser, while the PHP will still be executed on the server with every page load?
If you're using HTTP server compression then, as you guessed, it is just a way to optimize transmission from server to client, and it (generally) works transparently: your PHP still runs and your database queries are still executed on every page load; only the response body is compressed before it is sent.
Browser extensions such as Firebug, or monitoring tools such as Fiddler, can help you see what really happens behind the scenes.

Configuring Tomcat for leveraging browser caching?

I ran Google's Page Speed on our web app to analyze and optimize our web site.
One of the many items under Web Performance Best Practices as listed in Page Speed says "To take advantage of the full benefits of caching consistently across all browsers, we recommend that you configure your web server to explicitly set caching headers and apply them to all cacheable static resources, not just a small subset (such as images). Cacheable resources include JS and CSS files, image files, and other binary object files (media files, PDFs, Flash files, etc.). In general, HTML is not static, and shouldn't be considered cacheable."
How do I configure Tomcat to achieve this? I know it can be done via filters by setting some HTTP headers, but can we do it without touching code, just by configuration?
Edit: Just for information, we use JSF 1.2, although I think this is irrelevant in the context of this question.
If you are on Tomcat 7, there is a built-in filter for that:
http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Expires_Filter
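A minimal web.xml sketch of that filter (the MIME types and durations below are only illustrative; adjust them to your resources):

<filter>
    <filter-name>ExpiresFilter</filter-name>
    <filter-class>org.apache.catalina.filters.ExpiresFilter</filter-class>
    <!-- Cache static resources; HTML is deliberately left out -->
    <init-param>
        <param-name>ExpiresByType image</param-name>
        <param-value>access plus 1 week</param-value>
    </init-param>
    <init-param>
        <param-name>ExpiresByType text/css</param-name>
        <param-value>access plus 1 week</param-value>
    </init-param>
    <init-param>
        <param-name>ExpiresByType application/javascript</param-name>
        <param-value>access plus 1 week</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>ExpiresFilter</filter-name>
    <url-pattern>/*</url-pattern>
    <dispatcher>REQUEST</dispatcher>
</filter-mapping>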
We use the wonderful UrlRewriteFilter to do this. No code change, just configuration in web.xml, that's all. Link and rule below.
http://tuckey.org/urlrewrite/
<rule>
    <from>^.*\.(js|css|gif)$</from>
    <set type="expires">6 hours</set>
</rule>

Why are the files from the Google Libraries API loaded via HTTPS?

Well, the main question says it all: why are the files loaded via HTTPS? I am just adding some new libraries to the website and noticed that the links are all https://.
Now, from what I understand, you use HTTPS when there is some sensitive information involved, and I don't think that is the case with these libraries. I doubt anybody is interested in intercepting the content of these files.
Is there any explanation for this?
People asked for it so they could use the libraries on things like e-commerce sites, which eventually require an SSL connection. They provide links to the https version by default to make it easier for everyone overall (automatically avoids mixed-content warnings), and for most people the slight performance cost won't matter. But if you know you won't have any need for it, just strip it down to a regular http connection:
https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
They did actually publish the http URLs at one point, but I'd imagine the mixed-content warnings etc. that came about when people added SSL connections without thinking it through created a bunch of support questions, so it was simpler to default to showing https and let people change it if they really wanted.
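For example, on a page that is itself served over SSL you would simply keep the https reference, so the browser doesn't raise a mixed-content warning:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>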