What do you think, is Google using Dublin Core tags when indexing sites?
Are these tags (DC.description, etc.) important to Google?
Thanks.
I'm assuming you're talking about this: http://dublincore.org/
It's certain that Google is NOT using these in its indexing. I've never heard of them before, never heard Google mention them, and there is no mention of Google or search engines on that site.
That's not to say Google, Bing, Yahoo, etc. will never implement them. Google is using more metadata and rich snippets these days. Although from a quick look they appear unnecessarily complex, so it probably won't happen.
The above answers may apply to Google Web Search, but it is worth noting that Google Scholar does use Dublin Core tags (reluctantly) in indexing content. From their indexing guidelines:
Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers. To check that these tags are present, visit several abstracts and view their HTML source.
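For reference, here is a rough sketch of what these tags can look like in a page's <head>, with the Highwire Press citation_* tags that Scholar's guidelines prefer shown alongside the Dublin Core ones (all titles, names, and numbers below are hypothetical placeholders):

    <head>
      <!-- Dublin Core tags: Scholar's "last resort" option -->
      <meta name="DC.title"       content="An Example Paper Title">
      <meta name="DC.creator"     content="Jane Doe">
      <meta name="DC.date"        content="2011/06/01">
      <meta name="DC.description" content="A short abstract of the paper.">

      <!-- Highwire Press tags, preferred because they have unambiguous
           fields for journal title, volume, issue, and page numbers -->
      <meta name="citation_title"            content="An Example Paper Title">
      <meta name="citation_author"           content="Doe, Jane">
      <meta name="citation_publication_date" content="2011/06/01">
      <meta name="citation_journal_title"    content="Journal of Examples">
      <meta name="citation_volume"           content="12">
      <meta name="citation_issue"            content="3">
      <meta name="citation_firstpage"        content="101">
      <meta name="citation_lastpage"         content="110">
    </head>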
I'm making a site which will have reviews of the privacy policies of hundreds of thousands of other sites on the internet. Its initial content is based on my running through the CommonCrawl 5 billion page web dump and analyzing all the privacy policies with a script, to identify certain characteristics (e.g. "Sells your personal info").
According to the SEOmoz Beginner's Guide to SEO:
Search engines tend to only crawl about 100 links on any given page. This loose restriction is necessary to keep down on spam and conserve rankings.
I was wondering what would be a smart way to create a web of navigation that leaves no page orphaned, but would still avoid this SEO penalty they speak of. I have a few ideas:
Create alphabetical pages (or Google Sitemap .xml files), like "Sites beginning with Ado*", which would link to "Adobe.com" there, for example. This, or any other meaningless split of the pages, seems kind of contrived and I wonder whether Google might not like it.
Use meta keywords or descriptions to categorize.
Find some way to apply more interesting categories, such as geographical or content-based. My concern here is I'm not sure how I would be able to apply such categories across the board to so many sites. I suppose if need be I could write another classifier to try and analyze the content of the pages from the crawl. Sounds like a big job in and of itself though.
Use the DMOZ project to help categorize the pages.
Wikipedia and StackOverflow have obviously solved this problem very well by allowing users to categorize or tag all of the pages. In my case I don't have that luxury, but I want to find the best option available.
At the core of this question is how Google responds to different navigation structures. Does it penalize those who create a web of pages in a programmatic/meaningless way? Or does it not care so long as everything is connected via links?
Google PageRank does not penalize you for having more than 100 links on a page, but each link above a certain threshold decreases in value/importance in the PageRank algorithm.
Quoting SEOmoz and Matt Cutts:
Could You Be Penalized?
Before we dig in too deep, I want to make it clear that the 100-link limit has never been a penalty situation. In an August 2007 interview, Rand quotes Matt Cutts as saying:
The "keep the number of links to under 100" is in the technical guideline section, not the quality guidelines section. That means we're not going to remove a page if you have 101 or 102 links on the page. Think of this more as a rule of thumb.
At the time, it's likely that Google started ignoring links after a certain point, but at worst this kept those post-100 links from passing PageRank. The page itself wasn't going to be de-indexed or penalized.
So the question really is how to get Google to take all your links seriously. You accomplish this by generating an XML sitemap for Google to crawl (you can either have a static sitemap.xml file, or its content can be dynamically generated). You will want to read up on the About Sitemaps section of the Google Webmaster Tools help documents.
Just like having too many links on a page is an issue, having too many links in an XML sitemap file is also an issue. What you need to do is paginate your XML sitemap. Jeff Atwood talks about how StackOverflow implements this: The Importance of Sitemaps. Jeff also discusses the same issue on StackOverflow podcast #24.
This concept applies to Bing as well.
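To make the "paginate your XML sitemap" idea concrete: the Sitemaps protocol lets you split your URLs across multiple sitemap files (each limited to 50,000 URLs) and point to them from a single sitemap index. A minimal sketch, with hypothetical file names:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Each child sitemap is a normal <urlset> file of up to 50,000 URLs -->
      <sitemap>
        <loc>http://www.example.com/sitemap-sites-a.xml</loc>
        <lastmod>2012-01-01</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://www.example.com/sitemap-sites-b.xml</loc>
        <lastmod>2012-01-01</lastmod>
      </sitemap>
    </sitemapindex>

You can submit the index file in Webmaster Tools and regenerate the child sitemaps on whatever schedule suits your content.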
I was wondering if there are any markups in schema.org for a search results page which Google currently honors. I was trying
ItemList (http://schema.org/ItemList)
and
AggregateOffer (http://schema.org/AggregateOffer),
but neither of them seems to be coming up on Google yet (as in, Google still doesn't support them or show that markup on the search page). Are there any other markups I can try?
Thank you :)
Search for a restaurant, place, or product and you'll see microformats that Google recognizes and uses to format its search results. Yelp reviews, for example, also show a price range. They are used widely. I am pretty sure they use the Places stuff widely as well, and I believe I have seen cases of books having the author name and so on displayed.
But...
How they are used, in what cases, for what sites, and for what queries Google decides to use this information is entirely up to the search engine.
Within weeks of announcements about microformats for product ratings, sites entirely unrelated to the topic were adding markup with product-rating information, so think of these formats as a hint that Google (and other search engines) might use in some cases when they are confident that the data is accurate and helpful.
It might just take time for Google to trust your site.
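To make the product-rating markup mentioned above concrete, here is a rough sketch in schema.org microdata combining an AggregateRating with the AggregateOffer type from the question (product name, prices, and counts are hypothetical):

    <div itemscope itemtype="http://schema.org/Product">
      <span itemprop="name">Example Widget</span>
      <!-- Rating summary that Google may show as review stars -->
      <div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
        Rated <span itemprop="ratingValue">4.2</span>/5
        based on <span itemprop="reviewCount">87</span> reviews
      </div>
      <!-- Price range across multiple sellers -->
      <div itemprop="offers" itemscope itemtype="http://schema.org/AggregateOffer">
        $<span itemprop="lowPrice">19.99</span> to
        $<span itemprop="highPrice">49.99</span>
        <meta itemprop="priceCurrency" content="USD">
      </div>
    </div>

Whether Google actually renders a rich snippet from it is, as said above, entirely at the search engine's discretion.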
If you type "Boyce Avenue" into Google, it shows a rich snippet of their upcoming events, which I'm assuming is using Event information. However, if you type "boyceavenue.com/tour.html" into Google's rich snippet tool, nothing shows up. Why does this happen?
I have answered this here in your other question.
Google and other search engines use various data sources for generating Rich Snippets and similar features. Rich meta-data crawled from the pages, be it RDFa, microdata, or microformats, is only ONE (yet important) source.
Google seems to have bilateral agreements with some (typically large) sites for consuming their structured data directly. This is why you may see rich snippets on pages that have no data markup in the HTML.
Google Fusion Tables (http://www.google.com/fusiontables/Home/) may also become a technique for exposing structured data to Google.
However, in general, using schema.org or http://purl.org/goodrelations/ (for e-commerce) markup in RDFa or microdata syntax is IMO the best option for new sites,
because the data will be accessible to search engines and browser extensions alike, and
because the data will also send relevance signals to search engines (see http://wiki.goodrelations-vocabulary.org/GoodRelations_for_Semantic_SEO for more details).
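As a small illustration of that recommendation, and since the page in question is a tour/events page, a schema.org Event in microdata syntax might look roughly like this (band, venue, date, and URL are hypothetical):

    <div itemscope itemtype="http://schema.org/Event">
      <a itemprop="url" href="http://www.example.com/tour/berlin">
        <span itemprop="name">Example Band Live in Berlin</span>
      </a>
      <meta itemprop="startDate" content="2012-09-15T20:00">Sep 15, 8pm
      <div itemprop="location" itemscope itemtype="http://schema.org/Place">
        <span itemprop="name">Example Hall</span>
      </div>
    </div>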
Best wishes
Martin Hepp
I would like to know what the main differences are between Microformats' hRecipe and Schema.org's Recipe, and how search engines treat each one.
Besides the differences in code and the fact that the former is open while the latter is proprietary, how do search engines treat each one, and which one is better to implement, both from a long-term perspective and an SEO perspective?
Schema.org with Google, Bing, Yahoo!, and Yandex
Since you asked this question, Microformats' hRecipe has been updated for microformats2 as h-recipe, but otherwise your question remains relevant and is worth answering more than 6 years later.
…how do search engines treat each one…?
The search engine giants Google, Microsoft (Bing), and Yahoo!, along with Yandex (a popular search engine in Russia and elsewhere), collaborated to create Schema.org and the schemas therein.
This collaboration is the biggest differentiator between Schema.org and Microformats; it does and will likely continue to have an impact on how each treats schemas defined by other parties.
You can read about why they created it and how they treat other formats in the Schema.org FAQ.
Specifically, you may be interested in their answers to…
What is the purpose of schema.org?
Why are Google, Bing, Yandex and Yahoo! collaborating?
I have already added markup in some other format (i.e. microformats, RDFa, data-vocabulary.org, etc). Do I need to change anything on my site?
Why microdata? Why not RDFa or microformats?
Why don't you support other vocabularies such as FOAF, SKOS, etc?
…which one is better to implement, both from a long-term perspective and an SEO perspective?
The better schema to implement is the one with the most support; in this case, that appears to be Schema.org's Recipe. While all of the above search engines still support microformats, mentions of the format have disappeared from some of Google's official documentation regarding structured data and rich snippets.
Interestingly, Google recommends a newer syntax for structured data called JSON-LD.
JSON-LD: The future of structured data?
From a long-term perspective, you may want to consider adopting the ever more popular JSON-LD markup syntax with the Schema.org Recipe schema, which even Bing is supporting now (here are examples demonstrating it) despite their documentation having no mention of it.
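As a rough sketch of what that looks like, here is a minimal Schema.org Recipe expressed in JSON-LD inside a script tag (the recipe details are hypothetical):

    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Recipe",
      "name": "Example Banana Bread",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "prepTime": "PT15M",
      "cookTime": "PT1H",
      "recipeYield": "1 loaf",
      "recipeIngredient": [
        "3 ripe bananas",
        "2 cups flour",
        "1 cup sugar"
      ],
      "recipeInstructions": "Mash the bananas, mix in the remaining ingredients, and bake for one hour."
    }
    </script>

One design point in JSON-LD's favor: the structured data lives in a single block, so it does not have to be interleaved with your visible HTML the way microdata and microformats do.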
Pinterest's interesting support
The popular content discovery platform Pinterest supports both schemas and even supports the new JSON-LD syntax (though it is not explicitly mentioned in their documentation).
Despite Schema.org's growing popularity and adoption, Pinterest offers seemingly greater support for the h-recipe microformat, with their inclusion of e-instructions as a supported class, whereas Schema.org's corresponding recipeInstructions property is not supported.
It's unclear if this is intentional or even which schema they actually prefer, but it is worth keeping in mind if you intend to develop specifically for this platform.
hRecipe is based on class attributes, while Schema.org's Recipe is based on several dedicated attributes. Those are the main differences in the markup; hRecipe is backwards compatible whereas Recipe is not, because it uses HTML5 microdata attributes (itemscope, itemtype, itemprop).
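To make that difference concrete, here is a rough side-by-side sketch (the recipe content is hypothetical; property names are as Schema.org currently defines them):

    <!-- hRecipe: plain class attributes on whatever elements you already have -->
    <div class="hrecipe">
      <h2 class="fn">Example Banana Bread</h2>
      <span class="ingredient">3 ripe bananas</span>
      <p class="instructions">Mash the bananas, mix, and bake for one hour.</p>
    </div>

    <!-- Schema.org Recipe: HTML5 microdata attributes -->
    <div itemscope itemtype="http://schema.org/Recipe">
      <h2 itemprop="name">Example Banana Bread</h2>
      <span itemprop="recipeIngredient">3 ripe bananas</span>
      <p itemprop="recipeInstructions">Mash the bananas, mix, and bake for one hour.</p>
    </div>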
The big three search engines say that they'll treat both the same; however, I don't buy that. Google has been pushing their web platform(s) long enough for me to think that they'll be adding extra juice to Recipe, even though I can't prove it. Even if they aren't throwing extra SEO at Recipe, you can be sure that they'll work something into SERPs so that if you are using their proprietary markup, you get noticed... more. Take the link element's prefetch and prerender rel values as an example: Google created prerender, and if you use it on your site, voila, it prerenders in SERPs for the user. prefetch does not.
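For reference, the two hints mentioned above look like this (target URL hypothetical); prerender originated with Google/Chrome and actually renders the target page in the background, while prefetch only fetches it:

    <link rel="prefetch"  href="http://www.example.com/next-page.html">
    <link rel="prerender" href="http://www.example.com/next-page.html">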
I'm not sure how to differentiate between a long-term perspective and an SEO perspective; I look at them the same. I'm not saying that you can't, just trying to explain more. I have thought this over before from a client's perspective and asked myself these same questions in regards to microformats as a whole vs. schema. It's basically a judgement call: microformats are a tried and true format; there are millions more sites using microformatted data than there are using Schema.org's. They aren't going anywhere, and (as noted earlier) they are backwards compatible.
That said, schema is backed by the big three and, being HTML5 based, shouldn't have portability problems in the future. As previously mentioned, I'm sure all three will be rewarding users (though I have no proof) in their respective search results. One caveat here, though, is how fast everything on the web is moving; just as quickly as Schema popped up, it could conceivably be dropped. I doubt it (though I'm hoping), but it is a possibility.
I can't say which is better to implement, but microformats are certainly much easier to implement: they're class-based and so freaking easy.
It is better to use the schema.org formats, as they have been accepted as a standard by all of the major search engines (Google, Yahoo, and Bing). Using an alternative microformat may mean that some of the search engines will not recognize that data as special, and you will lose any possible advantages it offers.
Pretty much, that is the question. Is there a way that is more efficient than the standard sitemap.xml to [add / force a recrawl / remove], i.e. manage your website's index entries in Google?
I remember a few years ago reading an article by an unknown blogger who said that when he published news on his website, the URL of the news item would appear in Google's search results immediately. I think he was mentioning something special, I don't remember exactly what... some automatic re-crawling system offered by Google themselves? However, I'm not sure about it. So I ask: do you think I am fooling myself and there is NO OTHER way to manage index content besides sitemap.xml? I just need to be sure about this.
Thank you.
I don't think you will find that magical "silver bullet" answer you're looking for, but here's some additional information and tips that may help:
Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence), so increasing the back-link count and quality of your site's homepage and internal pages will assist you.
QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
XML sitemaps do help with indexation, but they won't result in better ranking. Use them to assist search bots to find content that is deep in your architecture.
Pinging services that monitor site changes, like Ping-O-Matic, especially from blogs, can really assist in pushing notification of your new content; it can also ensure the search engines become aware of it immediately.
Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index. Using robots.txt and the robots meta tags can herd the search bots to different parts of your site (use with caution so as not to remove high-value content).
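To illustrate that last tip: a robots meta tag in the <head> of low-value pages keeps them out of the index while still letting the bot follow their links, whereas a robots.txt Disallow rule stops the bot from fetching those URLs at all (the choice depends on whether you want link discovery to continue). A minimal sketch:

    <!-- On pages that shouldn't occupy a place in the index -->
    <meta name="robots" content="noindex, follow">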
Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, and site architecture that also contribute just as much as any "trick" or "device".
Getting many links from good sites to your website will make the Google "spiders" reach your site faster.
Also, links from social sites like Twitter can help the crawlers visit your site (although Twitter links do not pass "link juice", the spiders still go through them).
One last thing: update your content regularly, and think of content as "Google spider food". If the spiders come to your site and do not find new food, they will not come back again soon; if there is new food each time they come, they will come often. Article directories, for example, get indexed several times a day.