We recently released our website in AMP format to improve the mobile user experience. We have now submitted our sitemap to Google Webmaster Tools and added the correct rel="amphtml" links.
However, Webmaster Tools reports all my pages with 'Missing supported structured data element' and a link to https://developers.google.com/structured-data/rich-snippets/articles
I understand that this is needed for the Top Stories feature. What I was not aware of is that any other structured data does not seem to be indexed by Google.
Is AMP useless for non-Article structured data?
Edit: video is also supported.
Depends how people are getting to your AMP page.
At the moment the Top Stories feature is the only place using AMP, unless your website itself directs people to the AMP pages. So in that sense AMP is only useful for NewsArticle documents at present, though Google have said this list is likely to grow quickly.
However if the AMP page is your main page then users will still benefit from the fact your AMP page is fast - even without the NewsArticle structured data tag. Though personally I don't think AMP pages are ready to (or should ever be) completely replace standard HTML.
Will be interesting to see how this changes in future as others (e.g. Twitter) start to integrate AMP pages. Though I'd imagine they'll likely follow Google's lead and demand the same requirements as they do.
Edit 1st March 2016: Google have added a page to the Structured Data site explaining requirements to get in the top stories section: https://developers.google.com/structured-data/carousels/top-stories
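For illustration, here is a minimal sketch of generating NewsArticle structured data as JSON-LD for an AMP page. The function name and the exact set of properties are assumptions; the property names come from schema.org, and the linked Google pages remain the reference for what is actually required.

```python
import json


def news_article_jsonld(headline, canonical_url, image_url, published_iso,
                        author_name, publisher_name, logo_url):
    """Return a <script> tag holding schema.org NewsArticle markup as JSON-LD.

    Hypothetical helper: which properties Google requires is an assumption,
    so check the structured data documentation before relying on this set.
    """
    data = {
        "@context": "http://schema.org",
        "@type": "NewsArticle",
        "mainEntityOfPage": canonical_url,
        "headline": headline,
        "image": [image_url],
        "datePublished": published_iso,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "logo": {"@type": "ImageObject", "url": logo_url},
        },
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(news_article_jsonld(
        "Example headline",
        "https://example.com/news/story.html",
        "https://example.com/img/story.jpg",
        "2016-03-01T08:00:00+00:00",
        "Jane Doe",
        "Example News",
        "https://example.com/img/logo.png",
    ))
```

The resulting tag would go in the head of the AMP page; all the example.com values are placeholders.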
Before responsive design we needed mobile-specific sitemaps, but with responsive design they were no longer needed.
With the introduction of Accelerated Mobile Pages (AMP), however, we again have mobile-specific URLs, so my questions are:
Do we need Separate (Mobile) Sitemap for AMP pages?
If yes, then which schema should we use?
The old schema http://www.google.com/schemas/sitemap-mobile/1.0, or something new?
No need, provided you have a rel="amphtml" link in your regular page to tell crawlers where the AMP HTML version is, as discussed here:
https://www.ampproject.org/docs/guides/discovery.html
Similarly, your AMP pages should have a rel="canonical" link pointing to the regular page, to avoid search engines thinking you have duplicate content.
In fact, in the Google Search Console for your site there is an AMP section (under the Search Appearance section) that shows all the AMP pages Google has found and whether there are any problems with them.
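The rel="amphtml" / rel="canonical" pairing described above can be checked with a few lines of Python. This is only a sketch using the standard library HTML parser; the helper names and the example.com URLs are made up for illustration, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect href values of <link> tags, keyed by each rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # rel may hold several space-separated tokens; keep the first href seen
            for token in rel.lower().split():
                self.links.setdefault(token, href)


def link_rels(html):
    """Return a dict such as {"amphtml": "..."} for the given HTML text."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


def is_paired(regular_url, regular_html, amp_url, amp_html):
    """True if the regular page points at the AMP page and the AMP page points back."""
    return (link_rels(regular_html).get("amphtml") == amp_url
            and link_rels(amp_html).get("canonical") == regular_url)


if __name__ == "__main__":
    regular = '<head><link rel="amphtml" href="https://example.com/a.amp.html"></head>'
    amp = '<head><link rel="canonical" href="https://example.com/a.html"></head>'
    print(is_paired("https://example.com/a.html", regular,
                    "https://example.com/a.amp.html", amp))
```

The comparison is an exact string match, so relative hrefs or trailing-slash differences would need normalising first.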
As BazzaDP said, there is no need for a separate sitemap, but you do need to add rel="amphtml" to the head of the page. That said, it can be good to have a separate sitemap for AMP pages: it may make it easier for Google's crawler to detect them and display them in search results, though it is not necessary. My opinion: if generating a sitemap for AMP pages is difficult on your stack, leave it; if it is not, do it, as it may let other search engines detect the pages more easily. Beyond that, creating a separate sitemap doesn't give you any advantage.
As for your question, there is no need for it.
I have created a sitemap for my site and it complies with the protocol set by http://www.sitemaps.org/
Google has been told about this sitemap via webmaster tools. It has tracked all the urls within the sitemap (500+ urls) but has only indexed 1 of them. The last time google downloaded the sitemap was on the 21st of Oct 2009.
When I do a google search for site:url it picks up 2500+ results.
Google says it can crawl the site.
Does anyone have any ideas as to why only 1 url is actually indexed?
Cheers,
James
First off, make sure Google hasn't been forbidden from those pages using robots.txt, etc. Also make sure those URLs are correct. :)
Second, Google doesn't just take your sitemap at face value. It uses other factors, such as inbound links, etc, to determine whether it wants to crawl all of the pages in your sitemap. The sitemap then serves mostly as a hint more than anything else (it helps Google know when pages are updated more quickly, for example). Get high-quality, relevant, useful links (inbound and outbound) and your site should start getting indexed.
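The robots.txt check from the first point can be scripted. Below is a rough sketch using Python's standard urllib.robotparser; the function name, the sample rules and the example.com URLs are assumptions for illustration. Passing the robots.txt text in directly keeps it runnable without network access.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Placeholder rules; in practice this would be the text of your own robots.txt
    robots = "User-agent: *\nDisallow: /private/\n"
    sitemap_urls = [
        "http://www.example.com/",
        "http://www.example.com/private/page.html",
    ]
    for url in blocked_urls(robots, sitemap_urls):
        print("blocked:", url)
```

Running the URLs from the sitemap through this would show quickly whether a stray Disallow rule explains the missing pages. It does not cover other blocking mechanisms such as noindex meta tags.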
Your two statements seem to contradict one another.
but has only indexed 1 of them.
and
When I do a google search for site:url it picks up 2500+ results
bdonlan is correct in their logic (robots.txt and Google's lack of trust in sitemaps) but I think the issue is what you "think" is true about your site.
That is, Google Webmaster Tools says you only have 1 page indexed but site:yoursite.com shows 2.5k.
Google Webmaster Tools aren't very accurate. They are nice but they are buggy and MIGHT help you learn about issues with your site. Trust the site: command. You're in Google's index if you search site:yoursite.com and you see more than 1 result.
I'd trust site:yoursite.com. You have 2.5k pages in Google, indexed and search-able.
So, now optimize those pages and see the traffic flow. :D
Sidenote: Google can crawl any site, flash, javascript, etc.
What are other ways of making your website searchable by Google, other than submitting the link directly to Google?
Submitting links to Yahoo is a breeze; the site gets crawled within a day or two... Google, though, takes a while...
Thanks...
If you add a link to your website on a website that's already indexed by Google, Google will follow that link and reach your site without you needing to submit it through their page. It's actually not recommended to submit your site through their page, because then you're put at the end of the queue. But if you have a link on a page that Google indexes in the next minute, it will get to you much faster. The more links you have, on many pages with higher ranking, the better. Cheers
Add your site to DMOZ.org, and encourage everyone you know to link to your site. The more places that link to your site, the more likely it'll get indexed sooner (and more fully), and the better it will rank.
Also, if your site is very large, it is not unreasonable to sign up for their webmaster tools and submit a sitemap index. This is especially effective for fast ranking, and showing up in obscure search results, but it will not help you rank for difficult terms.
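If "sitemap index" here is meant in the sitemaps.org sense (one file that lists several sitemap files), it can be generated along these lines. This is a sketch only: the function name and the example.com URLs are placeholders, while the element names and namespace are the ones defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as a string) listing the given sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        # ElementTree escapes characters such as & in the URL when serialising
        ET.SubElement(sitemap, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index([
        "http://www.example.com/sitemap-products.xml",
        "http://www.example.com/sitemap-articles.xml",
    ]))
```

The output would be saved as, say, sitemap_index.xml and submitted in Webmaster Tools in place of the individual sitemaps.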
Also note that if your site was visited by googlebot,
it doesn't necessarily end up in the google index.
Use this link to check:
http://www.google.com/webmasters/tools/sitestatus
I have a site which has been developed completely in Flash. The site owners do not want to shift to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site which Googlebot will be redirected to (by checking the user agent). My question is: is this officially allowed by Google?
If not then how come there are many subscription based sites which display a different set of data to google compared to the users? Is that allowed?
Thank you very much.
I've dealt with this exact scenario for a large ecommerce site and Google essentially ignored the site. Google considers it cloaking and addresses it directly here and says:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Instead, create an ADA compliant version of the website so that users with screen readers and vision aids can use your web site. As long as there is a link from your home page to your ADA compliant pages, Google will index them.
The official advice seems to be: offer a visible link to a non-flash version of the site. Fooling the googlebot is a surefire way to get in trouble. And remember, Google results will link to the matching page! Do not make useless results.
Google already indexes flash content so my suggestion would be to check how your site is being indexed. Maybe you don't have to do anything.
I don't think showing an alternate version of the site is good from a Google perspective.
If you serve up your page with the exact same address, then you're probably fine. For example, if you show 'http://www.somesite.com/' but direct googlebot to 'http://www.somesite.com/alt.htm', then Google might direct search users to alt.htm. You don't want that, right?
This is called cloaking. I'm not sure what the effects of it are but it is certainly not whitehat. I am pretty sure Google is working on a way to crawl flash now so it might not even be a concern.
I'm assuming you're not really doing a redirect but instead a PHP import or something similar so it shows up as the same page. If you're actually redirecting then it's just going to index the other page like normal.
Some sites offer a different level of content -- they LIMIT the content, they don't offer alternative and additional content. This is done so it doesn't index unrelated things generally.
Are all these types of sites just illegally scraping Google or another search engine?
As far as I can tell there is no 'legal' way to get this data for a commercial site. The Yahoo! API ( http://developer.yahoo.com/search/siteexplorer/V1/inlinkData.html ) is only for noncommercial use, Yahoo! BOSS does not allow automated queries, etc.
Any ideas?
For example, if you wanted to find all the links to Google's homepage, search for
link:http://www.google.com
So if you want to find all the inbound links, you can simply traverse your website's tree, and for each item it finds, build a URL. Then query Google for:
link:URL
And you'll get a collection of all the links that Google has from other websites into your website.
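The "build a URL for each page" step might look like the sketch below. It only constructs the link: query strings and sends no requests. The function name is hypothetical, and the https://www.google.com/search?q=... URL shape is an assumption about the ordinary search page, not an official API.

```python
from urllib.parse import urlencode

# Assumed base URL of the normal search page (not an API endpoint)
SEARCH_BASE = "https://www.google.com/search?"


def link_queries(page_urls):
    """Map each page URL to a search URL for the query 'link:<page URL>'."""
    return {
        url: SEARCH_BASE + urlencode({"q": "link:" + url})
        for url in page_urls
    }


if __name__ == "__main__":
    pages = ["http://www.example.com/", "http://www.example.com/about.html"]
    for page, query_url in link_queries(pages).items():
        print(page, "->", query_url)
```

Whether these queries may be run automatically is exactly the question raised in this thread, so the sketch stops at building the strings.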
As for the legality of such harvesting, I'm sure it's not-exactly-legal to make a profit from it, but that's never stopped anyone before, has it?
(So I wouldn't bother wondering whether they did it or not. Just assume they do.)
I don't know what hubspot do, but, if you wanted to find out what sites link to your site, and you don't have the hardware to crawl the web, one thing you can do is monitor the HTTP_REFERER of visitors to your site. This is, for example, how Google Analytics (as far as I know) can tell you where your visitors are arriving from. This is not 100% reliable as not all browsers set it, particularly in "Privacy Mode", but you only need one visitor per link to know that it exists!
This is often accomplished by embedding a script into each of your webpages (often in a common header or footer). For example, if you examine the source for the page you are currently reading you will find (right down at the bottom) a script that reports back to Google information about your visit.
Now this won't tell you if there are links out there that no one has ever used to get to your site, but let's face it, they are a lot less interesting than the ones people actually use.
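As a rough illustration of the referrer approach, the sketch below tallies external referring pages from a list of Referer header values. It assumes you already have those values (from server logs or a tracking script); the function name and the sample hosts are made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def inbound_links(referrers, own_host):
    """Count visits per external referring page, skipping empty and internal referrers."""
    own_host = own_host.lower()
    counts = Counter()
    for ref in referrers:
        if not ref:
            continue  # header missing, e.g. direct visit or privacy mode
        host = urlsplit(ref).hostname  # lowercased by urlsplit, None if absent
        if host and host != own_host:
            counts[ref] += 1
    return counts


if __name__ == "__main__":
    seen = [
        "http://blog.example.org/post",
        "",
        "http://www.mysite.test/about",
        "http://blog.example.org/post",
    ]
    for page, hits in inbound_links(seen, "www.mysite.test").most_common():
        print(hits, page)
```

Each distinct key in the result is a page that links to you and that at least one visitor actually followed.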