Do web spiders read content that is not shown with display: none? - seo

As the title says: Do web spiders read content that is not shown with display: none; CSS code?
I have visited my website with Lynx, and I can still see the content.

Yes. Some search engine spiders supposedly don't index it, or only use it to check for spam, but the content is visible to them. Most spiders don't process CSS or client-side scripting.
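For example, markup like the following (a made-up snippet, not from the asker's site) is hidden by visual browsers but still present in the HTML that spiders download:
<!-- Hidden from graphical browsers by CSS, but still part of the page source. -->
<div style="display: none;">
  This text never renders on screen, yet Lynx and search spiders can still read it.
</div>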

Related

noscript text is appearing in Google

I have added this at the bottom of my HTML (just like how Stack Overflow has it implemented):
<noscript>This site works best with Javascript enabled</noscript>
but on one of my pages that has very little text, this noscript message appears in the Google search snippet.
Is there a way to tell Google to avoid indexing this part? Or is there a better alternative to using the <noscript> tag?
The issue is that Google often won't render JavaScript. It can, but it often won't.
You either need to present a pre-rendered page or provide a meta description that accurately describes the content. Look up meta description tags and how Google uses them to embellish its search listings.
Other directives, such as snippet controls, can discourage Google from deviating from the provided description. However, a pre-rendered page for it to scrape is always more reliable.
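As a rough illustration, a hand-written description plus an optional snippet-control directive might look like this (the wording is made up, and nosnippet is only worth adding if you want to suppress auto-generated snippets entirely):
<!-- An accurate summary Google can use instead of the <noscript> text. -->
<meta name="description" content="Short, accurate summary of what this page is actually about.">
<!-- Optional and heavy-handed: suppress snippets for this page altogether. -->
<meta name="robots" content="nosnippet">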

Prevent a div from being read and followed by Google without JavaScript

I don't want some of the divs on my page to be followed by Google because they contain duplicate content from other websites. Is it possible to have those divs 'no-followed'?
I have seen this:
Methods for preventing search engines from indexing irrelevant content on a page
but it suggests JavaScript for this purpose and I can't use JS in my case. Also, that question is from 2009; I hope things have changed a bit since then?
If you really do not want to use JavaScript for this, the only approach I'm confident will stop Google from indexing some of the content on your page is an iframe combined with a robots noindex/nofollow meta tag.
Instead of using a div, create a normal iframe, styled so that it does not look like an iframe.
On the page that is the target of the iframe, add the robots meta tag <meta name="robots" content="noindex, nofollow">
Keep in mind that it may be interpreted as advertising, so there could be some penalty, but that penalty is likely lower than the one for copying a significant amount of content.
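A rough sketch of the idea (the file name and styling are illustrative, not prescribed by the answer):
<!-- On the main page: replace the div of duplicated content with a borderless iframe. -->
<iframe src="/duplicated-content.html" style="border: 0; width: 100%; height: 400px;"></iframe>
<!-- In the <head> of /duplicated-content.html: keep that page out of the index. -->
<meta name="robots" content="noindex, nofollow">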

Combining age verification and google indexing

As spiders will generally not execute JavaScript, I am thinking of taking one of the options below in order to get them to index the content of a website that requires age verification.
My preferred solution:
Checking for a cookie 'ageverification'. If it does not exist, add some JavaScript to redirect the user to ~/verifyage.aspx, which will add the required cookie and redirect the user back to their previous page (a sketch of this follows the options below).
Another solution:
As above, but do not redirect the user. Instead, if the cookie doesn't exist, draw the age verification form 'over the top' of the existing page.
Another solution:
Add a 'Yes I am over 18' anchor link that a crawler can follow. I am slightly skeptical about the legality of this.
Any insight or ideas greatly appreciated.
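For reference, the first option might be sketched like this; the cookie name and page name come from the question, everything else is illustrative:
<script>
// Option 1 sketch: if the 'ageverification' cookie is missing, bounce through
// the verification page, passing the current URL so it can send the user back.
if (document.cookie.indexOf('ageverification=') === -1) {
  window.location.href = '/verifyage.aspx?return=' + encodeURIComponent(window.location.pathname);
}
</script>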
What I do: I store age verification in session data. If the session variable does not exist, the server appends a div to the end of the body (after the footer) with the "click to verify" and "click to exit" links. I use CSS to have it cover the content.
For the CSS, I use:
display: block; width: 100%; height: 100%; position: fixed; top: 0px; left: 0px; z-index: 9999;
That causes the div to cover all other content in a graphical browser even though it is placed at the very end of the body.
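Put together, the overlay might look something like this (the ids, text, background colour and link targets are my own illustration of the approach, not the answerer's exact markup):
<!-- Appended after the footer only when the session has no age-verification flag. -->
<div id="age-gate" style="display: block; width: 100%; height: 100%; position: fixed; top: 0; left: 0; z-index: 9999; background: #fff;">
  <p>You must be 18 or older to view this site.</p>
  <a id="age-enter" href="/verifyage.aspx?return=/requested-page">Enter</a>
  <a href="/underage.html">Exit</a>
</div>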
For users without JS enabled, the "Enter" link points to a web page that sets the session variable and returns the user to the page they requested. That results in two page loads for them to reach the content they want, which is not ideal, but it is the only way to do it for browsers without JS enabled.
For JS-enabled browsers, a small JavaScript is attached to the page that changes the "Enter" link's href to # and attaches a very basic function to the click event, so that clicking "Enter" triggers an XMLHttpRequest telling the server the person clicked "Enter". The server then updates the session and responds to the XMLHttpRequest with a 200 OK response, which triggers the JavaScript to hide the age verification div covering the content. Thus the session is updated so the server knows the user verified their age, and the user gets to see the content they wanted with no page reloading in the browser, a much better user experience.
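In outline, the JavaScript side might look like this (the endpoint and element ids are assumptions matching the sketch above, not code from the answer):
<script>
// Progressive enhancement: with JS available, verify age via XMLHttpRequest
// instead of a full round trip to the verification page.
var enterLink = document.getElementById('age-enter');
enterLink.setAttribute('href', '#');
enterLink.addEventListener('click', function (event) {
  event.preventDefault();
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/verifyage.aspx');  // assumed endpoint that sets the session flag
  xhr.onload = function () {
    if (xhr.status === 200) {
      // Server recorded the verification; hide the overlay without reloading the page.
      document.getElementById('age-gate').style.display = 'none';
    }
  };
  xhr.send();
});
</script>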
The age verification thus works without JavaScript, by sending the user to the verification page the stateless way, or in a much friendlier way with JavaScript.
When a search spider crawls the site, it gets the age verification div on every page, because a spider will not have the necessary session variable set, but since the div is at the very end of the HTML body the spider still indexes the real content first.
You've got a real problem either way.
If you let the crawler onto the age-verified portion of your site, then it has that content in its index, which means it will present snippets of it to users who search for it, users who haven't been through your age verification. In the case of Google, this means users actually have access to the entire body of content you were putting behind the verifywall without going through your screener: they can pull it from the Google cache!
No-win situation, sorry. Either have age-verified content or SEO, not both. Even if you somehow tell the search engine not to spit out your content, the mere fact that your URL shows up in search results tells people about your site's (restricted) contents.
Additionally, about your JavaScript idea: this means that users who have JavaScript disabled would get the content without even knowing that there should have been a click-through. And if you display a banner on top, that means that you sent the objectionable content to their computer before they accepted. Which means it's in their browser cache. Or they could just hack out your banner and have at whatever it is you were covering up without clicking 'OK'.
I don't know what it is your website does, but I really suggest forcing users to POST a form to you before they're allowed to view anything mature. Store their acceptance status in a session variable. That's not fakeable. Don't let the search engine in unless it's old enough, too, or you have some strong way to limit what it does with what it sees AND strong information about your own liability.
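A minimal version of that POST form might look like this (the action URL and field names are assumptions):
<!-- Users must actively submit this form; the server then stores acceptance in the session. -->
<form method="post" action="/verifyage.aspx">
  <button type="submit" name="confirm" value="over18">I confirm I am 18 or older</button>
</form>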

SEO: does Googlebot see text in hidden divs

I have login/signup popups on my site which are in hidden divs by default.
According to "Google SEO and hidden elements", Googlebot should NOT see them.
But Google Webmaster tool says that keywords "email" and "password" are top keywords over the site.
Why is that? Why does Googlebot see them?
Should I worry about relevancy of top keywords at all?
Yes, Googlebot will see the text since it's in the HTML. However, it will probably know that it is hidden text, and thus may not give it a very high priority. Users searching for the text in hidden elements would be less likely to see your page.
Open your site in the Lynx browser. It is a browser that displays only text, and that is roughly what Googlebot sees as well.
Also check the Google Webmaster Guidelines; scroll down to Technical Guidelines and you will see this text:
Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

Will <noscript> hide the rest of the static content from Google crawlers?

This should be an easy one for someone:
Will the <noscript> element cause the HTML page to serve only the content within the <noscript> tag itself to Google crawlers, and hide all the rest of my static content so that it is not indexed?
Thanks!
No. The crawlers will see all your content, both within the <noscript> element and everywhere else.
Crawlers behave a lot like browsers with JavaScript turned off - they see all the static content plus the <noscript> content, but not anything JavaScript-dependent.
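A trivial example of what that means in practice (made-up content):
<p>Static content: every visitor and every crawler sees and indexes this.</p>
<script>document.write('Only JavaScript-capable clients ever see this.');</script>
<noscript>Clients without JavaScript see this instead, and crawlers index it alongside the static content.</noscript>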
The whole HTML file is served in response to a GET request. Google should honour robots.txt and not spider directories disallowed there.