Is there a way to recognize Google bots with PHP? - seo

I just made a website for an alcoholic drink. They need to have age verification on all links. It's a single-page website and I use the Backbone routing system. I've created the check with the SESSION object, so I load the intro view (age verification view) if the SESSION object is unset. This is all working as expected, but the problem is the Google bots: when they try to crawl my pages, the app always loads the intro (age verification) view. Here is a link for the website, but I think it won't be very useful, because I guess this is more a logical than a technical question...
So... my question is: how do I redirect only visitors and let the Google bots see the actual content of the page? Should I use cookies, or is there a way to achieve this with PHP?

Yes. Something like
if (stripos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') !== false) {
    // The real Googlebot user agent contains "Googlebot" as a substring
    // (e.g. "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"), so a substring
    // check is needed rather than an exact == comparison.
    $_SESSION['ageverified'] = true;
    // do more
}
Should work.
See here for all the exact user-agent names and what they crawl.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1061943
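Note that anyone can fake that user-agent string. If you want to be stricter, Google's documentation also describes verifying Googlebot with a reverse DNS lookup followed by a forward lookup. A rough PHP sketch of that check (the function name is my own):

function isVerifiedGooglebot($userAgent, $ip)
{
    // Only bother with DNS lookups for visitors that claim to be Googlebot.
    if (stripos($userAgent, 'Googlebot') === false) {
        return false;
    }

    // Reverse lookup: the host name should end in googlebot.com or google.com.
    $host = gethostbyaddr($ip);
    if (!$host || !preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Forward lookup: the host name must resolve back to the same IP.
    return gethostbyname($host) === $ip;
}

if (isVerifiedGooglebot($_SERVER['HTTP_USER_AGENT'], $_SERVER['REMOTE_ADDR'])) {
    $_SESSION['ageverified'] = true;
}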

Related

hashHistory, _escaped_fragment_, and Google

I've been working with React Router for some time now and I've been using hashHistory to handle routing. At some point I am going to transition the app to browserHistory, but I'm curious as to why Google's "Fetch as Google" feature does not appear to work for anything other than the root route (/). It's clear that it's rendering something, just not the routes that aren't handled by the server.
I see that Google has deprecated their AJAX crawling scheme, which leads me to believe that I no longer need to deal with ?_escaped_fragment_=, but even so, I cannot get Google to render any other routes.
For example, the site is www.learnphoenix.io and the lessons are listed under www.learnphoenix.io/#/phoenix-chat/lessons. Yet Google's Fetch as Google feature in Webmaster Tools redirects to the homepage and only renders the homepage. Using _escaped_fragment_ leads to the same result.
Is there a way to allow Google to index my site using hashHistory, or do I just have to accept that only my homepage will be indexed until I switch to browserHistory?
By default, it seems that Google ignores URL fragments (#). According to this article, which may be dated, using #! will tell Google that the fragments can be used to define different canonical pages.
https://www.oho.com/blog/explained-60-seconds-hash-symbols-urls-and-seo
It's worth a shot, though the hashbang isn't supported by React Router, again, because it's supposed to be deprecated.
A better option might be to just bite the bullet and use browserHistory (pushState) in your react-router. The problem with that is that if you're running a static, server-less app, paths like /phoenix-chat/lessons will return a 404. In AWS there is a hack around that too: setting your 404 page to be your app index page.
http://blog.boushley.net/2015/10/29/html5-deep-link-on-amazon-s3/
Feels dirty, but again, worth a shot. Hopefully there's something of value in this answer for you!

Is it possible to do SEO for API content?

One of my clients has a website which is entirely based on API content, i.e. content coming from a 3rd-party website. He wants to do some SEO on that data. I wonder if this is possible, as the data is not available in his database, and I think the Google crawler redirects to the 3rd-party website while crawling such pages. We already asked the owner of that website for permission to store the API data on our end in order to do some SEO, but he refused our request.
It will be highly appreciated if you can suggest any other way that is not against policies and guidelines.
Thank You
Vikas S.
Yes - with a huge BUT:
Google explains how parameters can be set within their Search Console (Google Webmaster) and how these can affect the crawler's behaviour.
@Nadeem Haddadeen is right about the canonical links between duplicates. There's also an issue if you don't have consistent content when calling up the same parameters. This essentially makes your page un-indexable, as it's dynamic content. If you are dealing with dynamic content then you need to optimise a host page based around popular queries rather than trying to have the content rank on its own.
It's not recommended to take the same content and post it on your website; it's duplicate content and Google will give you a penalty.
If you still want to post it on your website, you have to make some changes to the original text before posting it, so that it looks original.
Also, if you want to keep it without any changes and avoid any penalties from Google, you have to add a link to the original article from your website, or add a cross-domain canonical link like the example below:
<link rel="canonical" href="https://example.com/original-article-url" />
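To make that concrete, here is a rough sketch of how a page built on third-party API content could point its canonical at the original article. The API endpoint and field names are made up for illustration:

// Fetch the article from the 3rd-party API (hypothetical endpoint and fields).
$article = json_decode(file_get_contents('https://api.example.com/articles/123'), true);

echo '<head>';
echo '<title>' . htmlspecialchars($article['title']) . '</title>';
// Cross-domain canonical pointing at the original source, as described above.
echo '<link rel="canonical" href="' . htmlspecialchars($article['source_url']) . '" />';
echo '</head>';
echo '<body><article>' . $article['body_html'] . '</article></body>';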

How do the Facebook like button and Google +1 button deal with a redirected url? [duplicate]

I understand the og:url meta tag is the canonical url for the resource in the open graph.
What strategies can I use if I wish to support 301 redirecting of the resource, while preserving its place in the open graph? I don't want to lose my likes because i've changed the URLs.
Is the best way to do this to store the original url of the content, and refer to that? Are there any other strategies for dealing with this?
To clarify - I have page:
/page1, with an og:url of http://www.example.com/page1
I now want to move it to
/page2, using a 301 redirect to http://www.example.com/page2
Do I have any options to avoid losing the likes and comments other than setting the og:url meta to /page1?
Short answer, you can't.
Once the object has been created on Facebook's side its URL in Facebook's graph is fixed - the Likes and Comments are associated with that URL and object; you need that URL to be accessible by Facebook's crawler in order to maintain that object in the future. (note that the object becoming inaccessible doesn't necessarily remove it from Facebook, but effectively you'd be starting over)
What I usually recommend here is (with examples http://www.example.com/oldurl and http://www.example.com/newurl):
On /newurl, keep the og:url tag pointing to /oldurl
Add a HTTP 301 redirect from /oldurl to /newurl
Exempt the Facebook crawler from this redirect
Continue to serve the meta tags for the page on http://www.example.com/oldurl if the request comes from the Facebook crawler.
No need to return any actual content to the crawler, just a simple HTML page with the appropriate tags
Thus:
Existing instances of the object on Facebook will, when clicked, bring users to the correct (new) page via your redirect
The Like button on the (new) page will still produce a like of the correct object (but at the old URL)
If you're moving a lot of URLs around or completely rewriting your URL scheme you should use the new URLs for new articles/products/etc, but you'll need to keep the redirect in place if you want to retain likes, comments, etc on the older content.
This includes if you're changing domain.
The only problem here is maintaining the old URL -> new URL mapping somewhere in your code, but it's not technically difficult, just an additional thing to maintain in the future.
BTW, the Facebook crawler UA is currently facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
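Put together in PHP, the crawler exemption described above might look roughly like this sketch, served on /oldurl (the og values are placeholders for your real data):

$ua = $_SERVER['HTTP_USER_AGENT'];

if (stripos($ua, 'facebookexternalhit') !== false) {
    // Facebook's crawler: serve just the Open Graph tags for the original
    // object, no real content needed.
    echo '<!DOCTYPE html><html><head>';
    echo '<meta property="og:url" content="http://www.example.com/oldurl" />';
    echo '<meta property="og:title" content="Original article title" />';
    echo '<meta property="og:type" content="article" />';
    echo '</head><body></body></html>';
    exit;
}

// Everyone else gets the permanent redirect to the new location.
header('Location: http://www.example.com/newurl', true, 301);
exit;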
I'm having the same problem with my old sites. Domains are changing, admins want to change URLs for SEO, etc.
I came to the conclusion that it's best to have some sort of unique ID in the database just for Facebook, from the beginning. For articles, for example, I have myurl.com/a/123 where 123 is the ID of the article.
The real URL is myurl.com/category/article-title. The article can then be put in a different category, renamed, etc., with extensive logic for 301 redirects behind it, but the basic FB identifier can stay the same forever.
Of course this is viable only when starting with a fresh site or when implementing fb comments for the first time.
Just an idea if you can plan ahead :) Let me know what you think.

SEO Question - Google not getting past cookies?

I'm completely stumped on an SEO issue and could really use some direction from an expert. We built a website recently, http://www.ecovinowines.net and because it is all about wine, we set up an age verification that requires the user to click before entering the site. By using cookies, we prevent the user from accessing any page in the site before clicking the age verification link. It's been a couple of months since launching the site so I thought I'd check out some keywords on google. I just typed in the name of the website to see what pages would be indexed and it is only showing the age verification pages. From the googling I've done, apparently nothing behind the age verification will be visible to the google bots because they ignore cookies.
Is there no safe workaround for this? I checked out New Belgium's site, which uses a similar age verification link, and all of its pages seem to be getting indexed. Once you click on one of its links from Google, it redirects the user to the age verification page. Are they not using cookies? Or how might they be getting around the cookie/bot issue?
Do a test for Googlebot's user agent and allow access if it matches. You might want to let other search engines through too...
Googlebot/2.1 (+http://www.google.com/bot.html)
msnbot/1.0 (+http://search.msn.com/msnbot.htm)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
ia_archiver
Semi-official response from Google:
This topic comes up periodically for sites (alcohol, porn, etc.) that need to serve an age verification notice on every page. What we recommend in this case is to serve it via JavaScript. That way users can see the age verification any time they try to access your content, but search engines that don't run JavaScript won't see the warning and will instead be able to see your content.
http://groups.google.com/group/Google_Webmaster_Help-Tools/browse_thread/thread/3ce6c0b3cc72ede8/
I think a more modern technique would be to render all the content normally, then obscure it with a JavaScript overlay.
I had a quick look at New Belgium and it's not clear what they're doing. Further investigation needed.
Assuming you are using PHP, something like this will do the job. You'd need to add other bots if you want them.
// The !== false is needed because strpos() returns 0 (falsy) when "Googlebot" is at the start of the UA.
$isBot = strpos($_SERVER['HTTP_USER_AGENT'], "Googlebot") !== false;
Then you can toggle your age verification on or off based on that variable.
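For example, a rough sketch of that toggle (the two included files are placeholders for your own views):

session_start();

$isBot = strpos($_SERVER['HTTP_USER_AGENT'], "Googlebot") !== false;

if ($isBot || !empty($_SESSION['ageverified'])) {
    // Crawler or already-verified visitor: serve the real page.
    include 'content.php';   // placeholder for your real content view
} else {
    // Unverified visitor: show the age verification view instead.
    include 'age-gate.php';  // placeholder for your age verification view
}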

Can search engine bots crawl pages requiring login?

If a homepage on a website shows one version of the content when a user is not logged in and another when the user logs in, would a search engine bot be able to crawl the user-specific content?
If they are not able to crawl it, then I can duplicate the content from another part of the website to make it easily accessible to users who have mentioned their needs at registration time.
My guess is no, but I would rather make sure before I do something stupid.
You cannot assume that crawlers support cookies, but you can identify the crawler and let the crawler be "logged in" on your site by code. However, this opens it up for any user to pretend to be a crawler to get at the data in the logged-in area.
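A rough sketch of that idea in PHP (the session flag name is made up, and as noted the user agent check alone can be spoofed):

session_start();

$ua = $_SERVER['HTTP_USER_AGENT'];
$looksLikeCrawler = stripos($ua, 'Googlebot') !== false
    || stripos($ua, 'bingbot') !== false;

if ($looksLikeCrawler && empty($_SESSION['user_id'])) {
    // Treat the crawler as "logged in" so it can reach the member-only content.
    // Anyone faking the user agent would get the same access.
    $_SESSION['is_crawler_session'] = true;
}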
The bot will be able to see all the content in your document. If the content does not exist in the document, then it will not be seen by the bot. If it exists in the document but is hidden from view, the crawler will be able to pick it up.
Even if this could be done, it is against the terms of most search engines to show the crawler content that is not the same as what any user will get on entry, and it can cause your site to be banned from the index.
This is why sites like expertsexchange have to provide the answer if you scroll all the way to the bottom, even though they try to make it look like you have to register. (This is only possible if you enter expertsexchange with a Google referer, by the way, for this reason.)