Will a Web API-based website suffer SEO problems?
Given that all of a page's content is pulled in by JavaScript,
will search engine crawlers be able to get the page content?
I heard that crawlers do not always support JavaScript or execute it when crawling a page.
It's not Web API that is bad for SEO, it's choosing an architecture where you use a web browser to navigate to empty HTML pages and then use JS to pull in the data. ASP.NET Web API does not have to be used that way.
You can't blame a hammer for building a bad house.
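To illustrate the point, here is a minimal sketch (in Python rather than ASP.NET, purely for brevity) of serving the same data both as JSON for client-side code and as server-rendered HTML that a crawler can read without running JavaScript. The ARTICLES data and both function names are made up for the example.

```python
import html
import json

# Hypothetical data; in a real site this would come from the same
# service that backs the Web API.
ARTICLES = [
    {"id": 1, "title": "Hammers & houses"},
    {"id": 2, "title": "Crawlers and JavaScript"},
]


def api_response(articles):
    """JSON payload, as a Web API endpoint would return it to client-side JS."""
    return json.dumps(articles)


def render_page(articles):
    """Server-rendered HTML containing the same data, readable without JS."""
    items = "\n".join(
        f'    <li id="article-{a["id"]}">{html.escape(a["title"])}</li>'
        for a in articles
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n  <ul>\n"
        f"{items}\n"
        "  </ul>\n</body>\n</html>"
    )


if __name__ == "__main__":
    print(render_page(ARTICLES))
```

The API stays exactly as it is; the only architectural change is that the first response already contains the content instead of an empty shell.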
Depends.
Will ALL search engine crawlers be able to get the page content? I do not know.
Do the search engine crawlers that matter get the page content? Yes.
Google and Bing together dominate the search market, and both can index content pulled in by JavaScript (other engines probably can as well).
Robert Scavilla on how content is indexed.
Search Engine Land on google executing javascript for indexing.
I have a Single Page Application with some dynamic content but the Meta tags and stuff don't work when sharing on social sites (for obvious reasons). I was thinking of detecting the user-agent on the server side and rendering a static version of the page when I detect Googlebot or Facebook or others.
Is this good practice? Would it land me in any trouble with any of the social sites/search engines?
I think you may get banned by Google for this.
It is not a good idea at all, because Google will treat it as black-hat SEO.
Google wants to see the same thing a normal user sees.
Before responsive design we needed mobile-specific sitemaps; with responsive design they were no longer needed.
But with the introduction of Accelerated Mobile Pages (AMP), we again have mobile-specific URLs, so my questions are:
Do we need a separate (mobile) sitemap for AMP pages?
If yes, which schema should we use?
The old schema http://www.google.com/schemas/sitemap-mobile/1.0, or something new?
No need, provided you have a rel="amphtml" link in your regular page to tell crawlers where the AMP HTML version is, as discussed here:
https://www.ampproject.org/docs/guides/discovery.html
Similarly, your AMP pages should have a rel="canonical" link pointing to the real page, so that search engines do not think you have duplicate content.
In fact, for Google, the Google Search Console for your site has an AMP section (under the Search Appearance section) that shows all the AMP pages it has found and whether there are any problems with them.
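As a rough sketch of the pairing described above, the following Python helper builds the two link tags: one for the head of the regular page and one for the head of the AMP page. The function name and the example.com URLs are hypothetical.

```python
from html import escape


def amp_discovery_links(canonical_url, amp_url):
    """Return (tag for the regular page, tag for the AMP page)."""
    # The regular page points crawlers at its AMP version.
    regular_page_tag = (
        f'<link rel="amphtml" href="{escape(amp_url, quote=True)}">'
    )
    # The AMP page points back at the regular (canonical) page.
    amp_page_tag = (
        f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'
    )
    return regular_page_tag, amp_page_tag


regular_tag, amp_tag = amp_discovery_links(
    "https://example.com/article.html",      # hypothetical regular page
    "https://example.com/amp/article.html",  # hypothetical AMP page
)
print(regular_tag)
print(amp_tag)
```

With both tags in place, each version of the page identifies the other, which is what makes a separate sitemap unnecessary.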
As BazzaDP said, there is no need for a separate sitemap, but you do need to add rel="amphtml" near the top of the page (in the head). A separate sitemap for AMP pages can still be nice to have: it may make it easier for Google's crawler to detect the AMP pages and display them in search results, though it is not necessary. My opinion: if generating a sitemap for AMP pages is difficult for your stack, leave it; if it is not, do it, since it may also help other search engines detect the pages. Beyond easier discovery, a separate sitemap doesn't give you any advantage.
As for your question, there is no need for it.
I have created widgets for my website (xyz.com) which can be embedded in other websites. Let's say I embed a widget, a photo album, in another website, abc.com. The content resides on xyz.com but is pulled into abc.com via JavaScript.
Will the content generated by the widgets (JavaScript) on abc.com be indexed by search engines?
Google will not index anything that is not visible when a page is loaded with JavaScript disabled.
There is more information in this similar question:
google indexing text retrieved by ajax or javascript after page load
Also, you can test what Googlebot 'sees' by using the "Fetch as Googlebot" feature of Google Webmaster Tools.
If you want Google to index your Ajax, you can read Google's recommendations here:
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
If you follow Google's scheme for Making AJAX Applications Crawlable, then Google will index content that's generated with Javascript. So will Bing and Yandex.
Implementing this scheme is somewhat involved, which is why there are companies that provide it as a service that plugs in at the web server level. (I work for one of these: https://ajaxsnapshots.com)
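For a feel of what the scheme involves, here is a small Python sketch of its URL mapping as I understand it: a "pretty" URL with a "#!" fragment is requested by the crawler with the fragment moved into an _escaped_fragment_ query parameter, and the server is expected to answer that request with an HTML snapshot. The function name is made up, and the escaping via urllib's quote only approximates the scheme's own escaping rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_crawler_url(pretty_url):
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler requests."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # not a "#!" URL, so the scheme does not apply

    # Approximation: percent-escape everything except "=" in the fragment.
    escaped = quote(parts.fragment[1:], safe="=")
    param = "_escaped_fragment_=" + escaped

    # Append to an existing query string if there is one.
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_crawler_url("http://example.com/page#!key=value"))
```

The mapping itself is the easy part; generating the HTML snapshot for each such request is the "somewhat involved" piece.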
I have a site which has been developed completely in Flash. The site owners do not want to move to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site which Googlebot will be redirected to (by checking the user agent). My question: is this officially allowed by Google?
If not, then how come there are many subscription-based sites which display a different set of data to Google than to their users? Is that allowed?
Thank you very much.
I've dealt with this exact scenario for a large ecommerce site and Google essentially ignored the site. Google considers it cloaking and addresses it directly here and says:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Instead, create an ADA-compliant version of the website so that users with screen readers and vision aids can use your web site. As long as there is a link from your home page to your ADA-compliant pages, Google will index them.
The official advice seems to be: offer a visible link to a non-flash version of the site. Fooling the googlebot is a surefire way to get in trouble. And remember, Google results will link to the matching page! Do not make useless results.
Google already indexes flash content so my suggestion would be to check how your site is being indexed. Maybe you don't have to do anything.
I don't think showing an alternate version of the site is good from a Google perspective.
If you serve up the alternate page at the exact same address, then you're probably fine. But if, for example, you show 'http://www.somesite.com/' to users and direct Googlebot to 'http://www.somesite.com/alt.htm', then Google might send search users to alt.htm. You don't want that, right?
This is called cloaking. I'm not sure what the effects of it are but it is certainly not whitehat. I am pretty sure Google is working on a way to crawl flash now so it might not even be a concern.
I'm assuming you're not really doing a redirect but instead a PHP import or something similar so it shows up as the same page. If you're actually redirecting then it's just going to index the other page like normal.
Some sites offer a different level of content: they LIMIT the content; they don't offer alternative or additional content. This is generally done so that the search engine doesn't index unrelated things.
Technorati's got their Cosmos API, which works fairly well but limits you to noncommercial use and no more than 500 queries a day.
Yahoo's got a Site Explorer InLink Data API, but it defines the task very literally, returning links from sidebar widgets in blogs rather than just links from inside blog content.
Is there any other alternative for tracking who's linking to a given URL (think of the discussion links that run below stories on Techmeme.com)? Or will I have to roll my own?
Well, it's not an API, but if you google (for example): "link:nytimes.com", the search results that come back show inbound links to that site.
I haven't tried to implement what you want yet, but the Google search API almost certainly has that functionality built in.
Is this for links to URLs under your control?
If so, you could whip up something quick that logs the value of the Referer HTTP header for each request.
If you wanted to do this for an entire web site without altering application code, you could implement it as an ISAPI filter or the equivalent for your web server of choice.
Information available publicly from web crawlers is always going to be incomplete and unreliable (not that my solution isn't...).
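A minimal sketch of the Referer-logging idea, written as Python WSGI middleware rather than an ISAPI filter (WSGI being one possible "equivalent"). It wraps an existing application without changing its code. The class name, the in-memory counter, and hello_app are all hypothetical; a real deployment would write to a log or a database.

```python
from collections import Counter


class RefererLogger:
    """WSGI middleware that counts inbound Referer values per requested path."""

    def __init__(self, app):
        self.app = app
        self.inbound = Counter()  # (path, referer) -> number of hits

    def __call__(self, environ, start_response):
        # WSGI exposes the Referer request header as HTTP_REFERER.
        referer = environ.get("HTTP_REFERER")
        if referer:
            path = environ.get("PATH_INFO", "/")
            self.inbound[(path, referer)] += 1
        # Hand the request to the wrapped application unchanged.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = RefererLogger(hello_app)
```

This only sees visitors who actually follow a link and whose browser sends the header, so it undercounts, but unlike the crawler-based APIs it needs no third-party quota.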