Scribd Search Engine Optimization Features for PDF

I recently noticed that PDF documents on Scribd are also SEO-friendly. For example: http://www.scribd.com/doc/17135767/FREE-by-Chris-Anderson
If you open the page and view the HTML source, the plain text from the PDF is not present. However, if you open the cached version of the page from Google search, a tag html_wrapper appears that contains the text of the entire PDF document.
Do they display different content depending on the User-Agent that makes the request, e.g. a browser versus a bot?
I've heard that some SEO guidelines recommend against displaying different content to bots. How bad a practice is this from an SEO perspective?

This is what Google sees:
http://webcache.googleusercontent.com/search?q=cache:-LY7o-liYlsJ:www.scribd.com/doc/17135767/FREE-by-Chris-Anderson+site:www.scribd.com/doc/17135767/FREE-by-Chris-Anderson&hl=en&strip=1
Yeah, you should not show Googlebot different content than a human user. That said, there are acceptable ways to do conditional rendering (i.e. render for clients without cookies, clients without JavaScript, clients without a language header, ...). This kind of rendering can be misleading, but if it is not misleading then it might be OK for Google. If you do this kind of conditional rendering, it is always a question of intent.
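One way to check whether a site serves bots different text is to fetch the same URL with a browser-like and a Googlebot-like User-Agent and compare the visible words. Below is a minimal Python sketch using only the standard library; the function names and the browser string are hypothetical, and it assumes the server varies content on the User-Agent header alone. A server that keys on IP address (or verifies Googlebot by reverse DNS) will not be caught this way.

```python
import difflib
import urllib.request
from html.parser import HTMLParser

# A commonly documented Googlebot User-Agent string; the browser string is a generic example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_words(html):
    """Return the page's visible text as a list of words."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()


def fetch(url, user_agent):
    """Fetch a page while sending the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def text_similarity(html_a, html_b):
    """Ratio in [0, 1]; 1.0 means both pages expose the same visible words."""
    return difflib.SequenceMatcher(None, visible_words(html_a), visible_words(html_b)).ratio()
```

Usage would be `text_similarity(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))`. A ratio well below 1.0 only tells you the two responses differ; whether the difference is misleading is still the question of intent described above.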

Related

noscript text is appearing in Google

I have added this at the bottom of my HTML (just like Stack Overflow has it implemented):
<noscript>This site works best with Javascript is enabled</noscript>
But on one of my pages that has very little text, the text "Javascript is disabled" appears in Google search results.
Is there a way to tell Google to avoid indexing this part? Or is there a better alternative to using the <noscript> tag?
The issue is that Google often won't render JavaScript. It can, but it often won't.
You either need to present a pre-rendered page or provide a meta description that accurately describes the content. Look up meta tags and how Google uses them to embellish its search listings.
Other tag options can discourage Google from deviating from the provided description. However, a pre-rendered page for it to scrape is always more reliable.
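Before choosing between those fixes, it can help to see what a crawler that skips JavaScript has to work with on the page. This Python sketch (the class and function names are hypothetical; only the standard-library html.parser is assumed) pulls out the meta description, if there is one, and any text inside <noscript> elements.

```python
from html.parser import HTMLParser


class SnippetSources(HTMLParser):
    """Records the meta description and the text inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.noscript_text = []
        self._noscript_depth = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = attr.get("content")
        elif tag == "noscript":
            self._noscript_depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._noscript_depth:
            self._noscript_depth -= 1

    def handle_data(self, data):
        # Only keep text that sits inside a <noscript> element.
        if self._noscript_depth and data.strip():
            self.noscript_text.append(data.strip())


def snippet_sources(html):
    """Return (meta description or None, joined noscript text)."""
    parser = SnippetSources()
    parser.feed(html)
    parser.close()
    return parser.meta_description, " ".join(parser.noscript_text)
```

If the first value is None and the page has little other text, the noscript sentence is one of the few strings available to quote, which is consistent with the behaviour described in the question.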

Googlebot and "hidden" content inside dynamically shown (JS-based) tabs within a page: impact on SERPs?

Let's say someone has 'legitimately' hidden content within a page.
To explain this further, imagine the following:
<div id="tab-one">This is the content inside tab one</div>
<div id="tab-two">This is the content inside tab two</div>
Tab one
Tab two
From an SEO perspective, assuming that none of this is done to manipulate Google and that "tab two" in fact contains spam-free, relevant data, how does this impact SEO?
Will Googlebot index the 'hidden' content and consider it part of the content of the page?
Will it use this content in the same way as if the content were "visible" on the page without the use of JavaScript?
Thanks.
I don't believe Google has ever given an official response on this topic, but from experience I can tell you that Google will index the tabbed content just fine. You'll even see SEO traffic from that content. If your site is fairly clean, I wouldn't worry about being flagged as having "hidden content", as long as the content is accessible by user action (e.g. clicking) and is obviously clickable.
However, you'll want to consider this. Say, for example, some of the content in a hidden tab is a product description such as "child safe". If a user is looking for "child safe products" and arrives at your site through a search engine, they probably won't immediately see that information, because they don't know it's buried behind a tab.
Most users don't spend a lot of time hunting, so they might not find the content and bounce because they don't feel they found the relevant information they were looking for. If you subscribe to the idea that Google and Bing use search query refinements as a search signal, this could potentially "harm" your SEO.
Personally, I wouldn't put information behind a tab unless it's truly tertiary or the tab is crucial to the UX. From my experience, users don't mind scrolling if the information is relevant, but they tend to have "tab blindness" or only really interact with "hidden" elements when those are part of the navigation or they are already in a transactional flow.
P.S. An alternative is to use crawlable AJAX or pushState() to have the individual tabs indexed separately on their own URLs. But be careful: if you're rendering the main content on the tab "pages", you might have a duplicate content concern. If it makes sense, you can potentially use the rel="next" and rel="prev" spec that Google released (though only Google supports it right now).
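If each tab does get its own URL, the rel="prev" and rel="next" hints are ordinary <link> elements placed in the <head> of each tab page. The Python sketch below builds them for one tab in a sequence; the URL scheme, slugs and function name are hypothetical. Note that Google said in 2019 that it no longer uses rel="prev"/"next" as an indexing signal.

```python
from html import escape


def prev_next_links(base_url, tab_slugs, current):
    """Return the <link rel="prev"/"next"> tags for the tab page `current`.

    base_url and tab_slugs describe a hypothetical URL scheme such as
    https://example.com/product/overview, .../specs, .../reviews.
    """
    i = tab_slugs.index(current)  # raises ValueError for an unknown tab
    tags = []
    if i > 0:
        href = escape(base_url + tab_slugs[i - 1])
        tags.append(f'<link rel="prev" href="{href}">')
    if i < len(tab_slugs) - 1:
        href = escape(base_url + tab_slugs[i + 1])
        tags.append(f'<link rel="next" href="{href}">')
    return tags
```

The first tab only gets a "next" link and the last only a "prev" link, so the chain has clear ends.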
In Webmaster Tools you will find the Fetch as Google option, where you can see exactly how Google is crawling the page. I've noticed that some JavaScript carousel libraries are crawled while others aren't; it's just a matter of whether Google is able to read the JavaScript code.
As far as impact goes, not all hidden content is bad. The content is still crawled (as you will see with the fetch). Now, if there were an abundance of keyword-stuffed content, that would be susceptible to a penalty.
Used correctly, it's definitely still beneficial.
The hidden content will be crawled, and this is not a problem for Google; many sites have this kind of menu. Assuming the hidden tabs are not keyword-stuffed and are useful to users, you shouldn't worry about this: it is useful for both the user and Googlebot!

SEO: does Googlebot see text in hidden divs?

I have login/signup popups on my site which are in hidden divs by default.
According to Google SEO and hidden elements, Googlebot should NOT see them.
But Google Webmaster Tools says that "email" and "password" are the top keywords across the site.
Why is that? Why does Googlebot see them?
Should I worry about relevancy of top keywords at all?
Yes, Googlebot will see the text since it's in the HTML. However, it will probably know that it is hidden text, and thus may not give it a very high priority. Users searching for the text in hidden elements would be less likely to see your page.
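To see which of a page's words come from hidden markup, you can split the text inside hidden elements from the rest. This Python sketch (hypothetical names) assumes hiding is done with an inline style="display:none" or the HTML hidden attribute; rules applied through CSS classes or JavaScript are not resolved, and badly unbalanced tags can confuse its element stack.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Splits text nodes into those inside hidden elements and the rest."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one flag per open element: hidden itself or via an ancestor
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = (attr.get("style") or "").replace(" ", "").lower()
        hidden_here = "hidden" in attr or "display:none" in style
        inherited = self._stack[-1] if self._stack else False
        self._stack.append(inherited or hidden_here)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._stack and self._stack[-1]:
            self.hidden.append(text)
        else:
            self.visible.append(text)


def split_hidden_text(html):
    """Return (visible text chunks, hidden text chunks)."""
    parser = HiddenTextFinder()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden
```

Run against a page with a hidden login popup, words like "Email" and "Password" land in the hidden list even though they are present in the HTML a crawler downloads.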
Open your site in the Lynx browser. It displays only text, and this is roughly what Googlebot sees as well.
Also check the Google Webmaster Guidelines: scroll down to Technical Guidelines and you will see this text:
Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
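If Lynx is installed, `lynx -dump <url>` prints a rendered text version of a page. Without it, a rough text-only view can be approximated in Python. This sketch is only in the spirit of a text browser (text, image alt text in brackets, numbered links with a reference list); it is not how Googlebot actually processes pages, and the class name, function name and "[IMAGE]" placeholder are hypothetical.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Builds a crude text-only rendering of a page."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.links = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1  # scripts and styles produce no visible text
        elif tag == "img":
            alt = (attr.get("alt") or "").strip()
            self.out.append(f"[{alt}]" if alt else "[IMAGE]")
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
            self.out.append(f"[{len(self.links)}]")  # numbered link marker

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.out.append(data.strip())


def text_view(html):
    """Return the text view, followed by a numbered list of link targets."""
    parser = TextView()
    parser.feed(html)
    parser.close()
    lines = [" ".join(parser.out)]
    if parser.links:
        lines.append("References")
        lines += [f"{i}. {href}" for i, href in enumerate(parser.links, 1)]
    return "\n".join(lines)
```

Anything that only appears after a script runs is simply missing from this view, which is the situation the guideline above warns about.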

How do I tell search engines about my flash content?

I use the embed & object tag combo to display SWFs. Just as we use alt for img, how do I tell search engines what content my SWF contains?
Google recently introduced indexing of SWF text content, and other engines may follow.
For details see http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html
AFAIK there's no SWF-specific equivalent of "ALT". Google can read the contents just fine, though, as long as you don't hide them behind JavaScript (generating the tags on the fly or upon page load).

How to tell image search which image matters?

Google Image Search seems to do a poor job, on a site I run, of identifying which image on a page should be indexed. In addition, it doesn't seem to link that image with much of the associated data.
Are there any ways of focusing spiders' attention on particular images and their associated data? Do they need to be within the same tags, or adjacent on the page?
A few tips:
Use a descriptive name, i.e. "tabby-cat.jpg" instead of "img02396.jpg".
Use alt tags on images.
Use descriptive text on the page and around the image.
Make sure the images are in the generated source, i.e. if you click "View source" in your browser, you see <img> tags.
It's also useful to validate your site at http://validator.w3.org in case there are major errors, like missing brackets, that could prevent a spider from parsing the page. (Note: I wouldn't worry about making everything 100% valid, since Google is fine with invalid code.)
Images in CSS (i.e. backgrounds) are not indexed AFAIK. However, I'd suggest using CSS backgrounds for "design" images (a subtle way of getting Google to ignore site headers, custom borders, shadows, etc.).
Nor are any images generated from JavaScript.
Make sure you're not blocking images through robots.txt. I know that Joomla does this by default.
Sign up at Google Webmaster Tools, add your site, then allow it to be used in Google's "Image Labeller" game, which should help tag images.
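For the robots.txt tip, Python's standard urllib.robotparser can report whether a given crawler may fetch your image URLs. A sketch under stated assumptions: the sample rules, URLs and function name are hypothetical, "Googlebot-Image" is the user-agent token Google uses for its image crawler, and this parser does simple prefix matching, so results may differ from Google's own handling of wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_images(robots_txt, image_urls, agent="Googlebot-Image"):
    """Return the image URLs that robots_txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in image_urls if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks an /images/ directory for every crawler.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /images/
"""
```

To check a live site instead of a string, `RobotFileParser("https://example.com/robots.txt")` followed by `.read()` loads the real file before calling `can_fetch`.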
All images on a page should be indexed. If they aren't, improve your alt tags and possibly rename the image files. There really isn't anything more you can do, since search engines do not read any other context for the image itself except size. If Google thinks the image is a duplicate, it won't index it either.
Of course, if images really do inherit context from the surrounding page, then you could just use fewer images or move them into CSS.
I think search robots cannot read images the way we do, so the simple, essential thing to do for your images is to use descriptive names, so that the spider can know what each image is about. The second is to use ALT tags on images and put in keywords relating to the images.
Those are the things I do.
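Descriptive file names and alt text, the two points several answers here agree on, are easy to check automatically. This Python sketch flags <img> tags that lack alt text or use camera/CMS-style names such as "img02396.jpg"; the NON_DESCRIPTIVE pattern is a hypothetical heuristic you would tune for your own site, and the class and function names are likewise illustrative.

```python
import re
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse

# Hypothetical heuristic: an optional generic prefix followed only by digits,
# e.g. "img02396", "DSC_0042", "12345".
NON_DESCRIPTIVE = re.compile(r"^(img|image|dsc|pic|photo)?[_-]?\d+$", re.IGNORECASE)


class ImageAudit(HTMLParser):
    """Collects (src, problem) pairs for <img> tags."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        stem = basename(urlparse(src).path).rsplit(".", 1)[0]  # file name without extension
        if not (attr.get("alt") or "").strip():
            self.problems.append((src, "missing alt text"))
        if NON_DESCRIPTIVE.match(stem):
            self.problems.append((src, "non-descriptive file name"))


def audit_images(html):
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```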