Which microformats should my blog implement? - semantics

Or microdata, RDF(a) or others.
The 'entities' a blog has would include posts, comments, taxonomies and users.
For posts I found BlogPosting and hAtom, which is a draft spec.
hCard and rel="tag" come to mind for users and taxonomies, but what do you think?

There is a big discussion in the community about this specific topic. I would go for schema.org, the recent schema proposal from Bing, Google and Yahoo. That is (quoting them):
This site provides a collection of
schemas, i.e., html tags, that
webmasters can use to markup their
pages in ways recognized by major
search providers.
... see BlogPosting for their specific schema for blog postings.
They also provide a data-model mapping for use with RDFa ... see this other one
More RDFa-related: there is a port of schema.org to RDF by the Linked Data community, using URIs, here. Quoting again ...
This site is a complementary effort by
people from the Linked Data community
to express the terms provided by the
Schema.org consortium in RDF. We
currently provide static RDFS
documents of the Schema.org terms in
the formats listed below - and yes,
we're heavily working on more ;)
Which one to use? As I said, there's a big discussion going on right now around this issue.
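As a rough illustration of what the schema.org BlogPosting route could look like for a post, here is a minimal Python sketch that renders a post as HTML with microdata attributes. The function name and its parameters are made up for this example; headline, author, datePublished and articleBody are schema.org property names, but check the BlogPosting page for what your consumers actually read.

```python
from html import escape


def blog_posting_microdata(headline, author_name, date_published, body_text):
    """Render a post as HTML annotated with schema.org BlogPosting microdata.

    Hypothetical helper for illustration; only the itemtype/itemprop values
    come from schema.org.
    """
    return (
        '<article itemscope itemtype="https://schema.org/BlogPosting">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        '  <span itemprop="author" itemscope itemtype="https://schema.org/Person">\n'
        f'    <span itemprop="name">{escape(author_name)}</span>\n'
        '  </span>\n'
        f'  <time itemprop="datePublished" datetime="{escape(date_published)}">'
        f'{escape(date_published)}</time>\n'
        f'  <div itemprop="articleBody">{escape(body_text)}</div>\n'
        '</article>'
    )


if __name__ == "__main__":
    print(blog_posting_microdata("Hello world", "Jane Doe", "2011-06-10", "First post."))
```

The same properties could be expressed in RDFa instead; only the attribute names change.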

Related

Microdata or JSON-LD? I'm confused

I haven't found a clear and updated answer, even after googling for a few hours, so here it goes:
I am aware of the advantages and disadvantages of both Microdata and JSON-LD. I also know that Microdata was dropped from W3C (and consequently from the browsers' API). What I'm not sure about is how this will affect a site where Microdata is used specifically for SEO purposes.
Does Google support JSON-LD for SERPs? What format does it recommend? I am looking for updated answers, not ones from 2011 or 2012 (if they are still applicable, though, feel free to post them).
What is more appropriate for a dynamic site with lots of content (think: 50000 videos, images etc.): JSON-LD, Microdata or RDFa? Why?
Consumers that support Microdata support Microdata, no matter if or where Microdata is specified.
It’s conceivable that new consumers might decide not to support it, but the syntax is still very popular and still part of WHATWG’s HTML Living Standard, so it’s probably not going to vanish.
About the consumer Google
Some years ago, JSON-LD was not supported for many of their features, and they recommended that authors use Microdata (and they supported RDFa, too). Today it’s different.
See Google’s Markup formats and placement:
JSON-LD is the recommended format. Google is in the process of adding JSON-LD support for all markup-powered features. The table below lists the exceptions to this. We recommend using JSON-LD where possible.
According to the mentioned table, Microdata and RDFa support all of Google’s data types, while JSON-LD supports everything except their Breadcrumbs feature.
I wouldn’t give much weight to their recommendation. They say that "Structured data markup is most easily represented in JSON-LD format", but I think it’s safe to say that this only applies to authors that generate the structured data programmatically (especially from tools that support JSON).
For authors that manually add the structured data markup, it’s typically easier to use Microdata or RDFa, and using these syntaxes minimizes the risk that an author updates the content without updating the structured data, too (see DRY principle).
JSON-LD vs. Microdata vs. RDFa
Unless you know (and care for) consumers that don’t support all three syntaxes, it doesn’t matter. Use what is easier for you and your tools.
If you have no preference, I would say JSON-LD or RDFa, because contrary to Microdata,
both are W3C Recommendations,
both can be used in non-HTML5 contexts,
both allow you to (easily) mix several vocabularies.
JSON-LD if you like your structured data not "intermingled" with your markup (= duplicating the content), RDFa if you like to use your existing markup (= not duplicating the content).
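For a site that generates its pages programmatically, the JSON-LD option might look roughly like the Python sketch below, which builds a schema.org VideoObject and wraps it in a script element. The function name and parameters are assumptions for illustration; the property names are from schema.org, but which ones a given consumer requires is not something this sketch settles.

```python
import json


def video_jsonld_script(name, description, thumbnail_url, upload_date, content_url):
    """Return a <script type="application/ld+json"> element for one video.

    Hypothetical helper: in a real site the arguments would come from the
    database row that also drives the visible page.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        "contentUrl": content_url,
    }
    # Escape "<" so a value containing "</script>" cannot end the element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(video_jsonld_script(
        "Intro", "A short intro clip.", "https://example.com/thumb.jpg",
        "2017-05-16", "https://example.com/intro.mp4",
    ))
```

Because the structured data is generated from the same source as the page, the duplication concern mentioned above is smaller here than for hand-written markup.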
I've opted to go for JSON-LD because it is easier to read and compile. Spotting errors is easy for more complicated dictionaries. It is the W3C and Google recommended standard.
One caveat (major if you need to support it) is that, as of May 16 2017, Bing STILL doesn't support JSON-LD.
Google's Understand how structured data works now says:
Google recommends using JSON-LD for structured data whenever possible.
It seems reasonable to me to still mix in microdata to avoid duplication of long content, such as articleBody, but generally the industry is JSON-LD all the way.
I discovered that JSON-LD does support breadcrumbs. I applied breadcrumbs using the latest version of Yoast on my WordPress site, and it passed muster with Google Search Console, both in the Rich Results Test of the live page and in a crawl of the live page after submitting the sitemap.
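For readers who are not using a plugin, breadcrumbs in JSON-LD can be sketched as a schema.org BreadcrumbList. The Python below is only an illustration: the function name and the example URLs are invented, and the output of Yoast or any other tool may be shaped differently.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs.

    `trail` is ordered from the site root to the current page; positions
    are numbered from 1.
    """
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,
                "name": name,
                "item": url,
            }
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


if __name__ == "__main__":
    crumbs = breadcrumb_jsonld([
        ("Home", "https://example.com/"),
        ("Blog", "https://example.com/blog/"),
    ])
    print(json.dumps(crumbs, indent=2))
```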
It should be noted that Google has deprecated the use of data-vocabulary.org. It wants schema.org.
Microdata is easy to use with Angular 8+,
but you can do the same thing with JSON-LD.
For a human, JSON-LD is a bit easier to read, but there is no big difference between the two. Just use the one you already know, to save time.

Microdata - itemid / global identifier conventions for organizations, business or brands markup with schema.org

My question is the following: when marking up an organization, business or brand with microdata and schema.org, should I use its official webpage URL as a global identifier? Is there any kind of better reference that I could use (like IMDB for movies or actors)?
I'd like to know if there's any standard, convention or common practice recommended.
It would be better to use some kind of controlled vocabulary (e.g. VIAF) that uniquely identifies the organization in question.
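To make the distinction concrete, here is a small Python sketch that renders an Organization in microdata with the global identifier in itemid and the official homepage in the url property. The function name is made up, and the VIAF URI in the usage example is a placeholder, not a real record.

```python
from html import escape


def organization_microdata(name, homepage_url, global_id):
    """Render schema.org Organization microdata.

    `global_id` goes into itemid (the item's global identifier), while the
    official homepage is kept separately in the `url` property.
    """
    return (
        '<div itemscope itemtype="https://schema.org/Organization" '
        f'itemid="{escape(global_id)}">\n'
        f'  <a itemprop="url" href="{escape(homepage_url)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        '</div>'
    )


if __name__ == "__main__":
    # The VIAF number below is a placeholder for illustration only.
    print(organization_microdata(
        "Example Corp", "https://example.com/", "http://viaf.org/viaf/0000000"
    ))
```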
The choice of identifiers is part of the explanation of REST. http://www.infoq.com/articles/rest-introduction
Inspect closely the first principle (for convention), though it is in broader terms of resources rather than specific to org/biz/brand. REST is the thesis that started this trend. Microformats accordingly makes use of rel="profile" link tags. The concept is further expanded at http://purl.org/ so, if IMDB, for example, switches to W3 like W3C did, then in the future this will minimize the impact on the application you're making right now. RDFa Dublin Core vocab's use of this is seen in the profile at http://www.w3.org/2011/rdfa-context/rdfa-1.1.html.
(For references) Applications serving general public or open initiatives such as academic support might be better served by these profiles, however when operating a site for commercial purposes, building application-specific "custom" profiles considering various legal matters identified, that should perform reliably with PURLs, might be advantageous to build credible reputation.
Finally, WHATWG considers prefixes too advanced and HTML5 for newbies only, so support for W3's XHTML xmlns/RDFa prefixes was dropped in microdata. This compels us to reuse long-form URLs for schema.org business/org/brand resources with microdata syntax. The "custom" profile then serves as mere goodwill when picking up from where tasks were wrapped up; otherwise a greater variety of items might appear in the content than actually intended, owing to mix-ups.
The good news is, Google supports schema.org usage as a vocab in RDFa syntax. So considering RDFa as an already "living" standard that originated in W3 spec, as per the (non-)commercial nature of application, defining PURL for scope namespaces, profiles exhibiting prefixes, and syntax (of official web-page or substitute IRIs) as per target processors is the way to go. Currently no vocab besides schema is processed as microdata, and schema in RDFa isn't supported by anybody but Google!

Knowing what RDFA vocabulary to use

How do we know which vocabulary/namespace to use to describe data with RDFa?
I have seen a lot of examples that use xmlns:dcterms="http://purl.org/dc/terms/" or xmlns:sioc="http://rdfs.org/sioc/ns#" then there is this video that uses FOAF vocabulary.
This is all pretty confusing and I am not sure what these vocabularies mean or what is best to use for the data I am describing. Is there some trick I am missing?
There are many vocabularies. And you could create your own, too, of course (but you probably shouldn’t before you checked possible alternatives).
You’d have to look for vocabularies for your specific needs, for example
by browsing and searching on http://lov.okfn.org/dataset/lov/ (they collect and index open vocabularies),
on W3C’s RDFa Core Initial Context (it lists vocabularies that have pre-defined prefixes for use with RDFa), or
by browsing through http://prefix.cc/ (it’s a lookup for typically used namespaces, but you might get an overview by that).
After some time you get to know the big/broad ones: Schema.org, Dublin Core, FOAF, RSS, SKOS, SIOC, vCard, DOAP, Open Graph, Ontology for Media Resources, GoodRelations, DBpedia Ontology, ….
The simplest thing is to check if schema.org covers your needs. Schema.org is backed by Google and the other major search engines and generally pretty awesome.
If it doesn't suit your needs, then enter a few of the terms you need into a vocabulary search engine. My recommendation is LOV.
Another option is to just ask the community about the best vocabularies for the specific domain you need to represent. A good place is answers.semanticweb.com, which is like StackOverflow but with more RDF experts hanging out.
Things have changed quite a bit since that video was posted. First, like Richard said, you should check if schema.org fits your needs. Personally, when I need to describe something that's not covered by schema.org, I check LOV as well. If, and only if, I can't find anything in LOV, I will then consider creating a new type or property. A quick way to do this is to use http://open.vocab.org/
Newer versions of RDFa have been published since that video was released: RDFa 1.1 and RDFa Lite. If you want to use schema.org only, I'd recommend checking http://www.w3.org/TR/rdfa-lite/
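To give a feel for how little RDFa Lite needs when schema.org is the only vocabulary, here is a minimal Python sketch that renders a Person using just the vocab, typeof and property attributes. The function name and arguments are invented for the example.

```python
from html import escape


def person_rdfa_lite(name, homepage_url):
    """Render a schema.org Person using RDFa Lite attributes only.

    `vocab` sets the default vocabulary, so the `typeof` and `property`
    values need no prefixes.
    """
    return (
        '<div vocab="https://schema.org/" typeof="Person">\n'
        f'  <span property="name">{escape(name)}</span>\n'
        f'  <a property="url" href="{escape(homepage_url)}">homepage</a>\n'
        '</div>'
    )


if __name__ == "__main__":
    print(person_rdfa_lite("Jane Doe", "https://example.com/jane"))
```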
Vocabularies are usually domain specific. Declaring prefixes with xmlns is deprecated. The RDFa profile at http://www.w3.org/profile/rdfa-1.1 lists the vocabularies available as part of the initial context.
Sometimes vocabularies may overlap in the context of your data. Just as a math problem can be solved with an algebraic, a geometric or some other technique, mixing vocabularies is fine. Equivalent terms can be found using http://sameas.org/ To cater to your consumers' preferences about which vocabularies they recognize, skos:closeMatch and skos:exactMatch may be used, e.g. "gr:Brand skos:closeMatch owl:Thing", with any terms you please.
The prefix attribute can be used with vocabularies besides those covered by the initial context, like: prefix="fb: http://ogp.me/ns/fb# vocab2: path2 ..."
For a cross-cutting concern across different domain vocabularies, such as customizing presentation in search results, microdata following the schema.org guidelines should be beneficial. However, as this has nothing to do with specialization in any particular domain, prefixes are unavailable in this syntax. RDFa vocabularies have been helpful in such domain-specific contexts, where the content appeals more to a participative audience, while microdata targets those who've lost their way. For tasks that are too simple to merit a full-fledged vocabulary, but have semantic implications, try http://microformats.org/
Interchanging REST profile URIs for vocabularies amongst the 3 syntaxes is valid, but useless owing to the lack of affordable manpower to implement alternative support for the vocabularies at Web scale.
How and why the schema.org vocabulary merited a separate microdata syntax of its own is discussed by Google employee Ian Hickson, a.k.a. Hixie, the editor of the WHATWG HTML5 draft, at http://logbot.glob.com.au/?c=freenode%23whatwg&s=28+Nov+2012&e=28+Nov+2012#c747855 or http://krijnhoetmer.nl/irc-logs/whatwg/20121128#l-1122 If only Google had smart enough employees to implement a parser for one syntax whose WG included its own employee, then RDFa Lite inside RDFa would have been just another course, like Core Java within Java, with no need for a separate, mocking rip-off named microdata. But alas, ours is an imperfect world!
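The prefix attribute mentioned above is just a space-separated list of "name: IRI" pairs, and a prefixed term is shorthand for the namespace IRI with the local name appended. A minimal Python sketch of both ideas follows; the helper names are invented, the fb namespace is the one quoted in the answer, and the FOAF namespace is the commonly used one.

```python
def rdfa_prefix_attribute(prefixes):
    """Build an RDFa 1.1 prefix attribute from a {prefix: namespace IRI} mapping."""
    pairs = " ".join(f"{name}: {iri}" for name, iri in prefixes.items())
    return f'prefix="{pairs}"'


def expand_curie(curie, prefixes):
    """Expand a compact URI such as "foaf:name" into a full IRI."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


if __name__ == "__main__":
    prefixes = {
        "fb": "http://ogp.me/ns/fb#",
        "foaf": "http://xmlns.com/foaf/0.1/",
    }
    print(rdfa_prefix_attribute(prefixes))
    print(expand_curie("foaf:name", prefixes))
```

Seeing the expansion makes it clearer why two vocabularies can be mixed freely: each term ends up as a distinct, full IRI.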

Microformat's hRecipe vs. Schema's Recipe

I would like to know what are the main differences between Microformat's hRecipe and Schema.org's Recipe and how search engines treat each one.
Besides the differences in code and the fact that the former is open while the latter is proprietary, how do search engines treat each one, and which one is better to implement, from both a long-term perspective and an SEO perspective?
Schema.org with Google, Bing, Yahoo!, and Yandex
Since you asked this question, Microformat's hRecipe has been updated with microformats2 as h-recipe, but otherwise your question remains relevant and is worth answering more than 6 years later.
…how do search engines treat each one…?
Search engine giants, Google, Microsoft (Bing), and Yahoo!, along with Yandex (a popular search engine in Russia and elsewhere globally) collaborated to create Schema.org and the schemas therein.
This collaboration is the biggest differentiator between Schema.org and Microformats; it does and will likely continue to have an impact on how each treats schemas defined by other parties.
You can read about why they created it and how they treat other formats in the Schema.org FAQ.
Specifically, you may be interested in their answers to…
What is the purpose of schema.org?
Why are Google, Bing, Yandex and Yahoo! collaborating?
I have already added markup in some other format (i.e. microformats, RDFa, data-vocabulary.org, etc). Do I need to change anything on my site?
Why microdata? Why not RDFa or microformats?
Why don't you support other vocabularies such as FOAF, SKOS, etc?
…which one is better to implement, both from a long-term perspective and a SEO perspective?
The better schema to implement is the one with the most support; in this case, that appears to be Schema.org's Recipe. While all of the above search engines still support microformats, mentions of them have disappeared from some of Google's official documentation regarding structured data and rich snippets.
Interestingly, Google recommends a newer syntax for structured data called JSON-LD.
JSON-LD: The future of structured data?
From a long-term perspective, you may want to consider adopting the ever more popular JSON-LD markup syntax with the Schema.org Recipe schema, which even Bing is supporting now (here are examples demonstrating it) despite their documentation having no mention of it.
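As a sketch of what the JSON-LD route could look like for a recipe, the Python below builds a schema.org Recipe object with its steps expressed as HowToStep items. The function name and sample data are made up; the property names come from schema.org, and each search engine documents its own required fields.

```python
import json


def recipe_jsonld(name, ingredients, steps):
    """Build a schema.org Recipe as a JSON-LD dictionary.

    `ingredients` and `steps` are lists of plain strings; each step becomes
    a HowToStep under recipeInstructions.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": list(ingredients),
        "recipeInstructions": [
            {"@type": "HowToStep", "text": step} for step in steps
        ],
    }


if __name__ == "__main__":
    recipe = recipe_jsonld("Pancakes", ["flour", "milk", "eggs"], ["Mix.", "Fry."])
    print(json.dumps(recipe, indent=2))
```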
Pinterest's interesting support
The popular content discovery platform Pinterest supports both schemas and even supports the new JSON-LD syntax (though it is not explicitly mentioned in their documentation).
Despite Schema.org's growing popularity and adoption, Pinterest offers seemingly greater support for the h-recipe microformat with their inclusion of e-instructions as a supported class, whereas Schema.org's corresponding recipeInstructions property is not a supported property.
It's unclear if this is intentional or even which schema they actually prefer, but it is worth keeping in mind if you intend to develop specifically for this platform.
hRecipe is based on class attributes, while schema's Recipe is based on multiple attributes. Those are the main differences in the markup; hRecipe is backwards compatible whereas Recipe is not, because it uses HTML5 microdata attributes (itemscope, itemtype, itemprop).
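To show the markup difference side by side, here is a small Python sketch that renders the same recipe once with hRecipe class names and once with schema.org Recipe microdata attributes. The two function names are invented; the class names (hrecipe, fn, ingredient, instructions) and the itemprop names are taken from the respective specs, and only a few properties are shown.

```python
from html import escape


def hrecipe_html(name, ingredients, instructions):
    """Classic hRecipe: the semantics are carried entirely by class names."""
    items = "".join(f'<li class="ingredient">{escape(i)}</li>' for i in ingredients)
    return (
        '<div class="hrecipe">'
        f'<h2 class="fn">{escape(name)}</h2>'
        f'<ul>{items}</ul>'
        f'<div class="instructions">{escape(instructions)}</div>'
        '</div>'
    )


def recipe_microdata_html(name, ingredients, instructions):
    """schema.org Recipe: the semantics are carried by itemscope/itemtype/itemprop."""
    items = "".join(
        f'<li itemprop="recipeIngredient">{escape(i)}</li>' for i in ingredients
    )
    return (
        '<div itemscope itemtype="https://schema.org/Recipe">'
        f'<h2 itemprop="name">{escape(name)}</h2>'
        f'<ul>{items}</ul>'
        f'<div itemprop="recipeInstructions">{escape(instructions)}</div>'
        '</div>'
    )


if __name__ == "__main__":
    args = ("Pancakes", ["flour", "milk"], "Mix and fry.")
    print(hrecipe_html(*args))
    print(recipe_microdata_html(*args))
```

Nothing stops a page from carrying both sets of attributes on the same elements if it needs to serve consumers of each format.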
The big three search engines say that they'll treat both the same; however, I don't buy that. Google has been pushing their web platform(s) long enough for me to think that they'll be adding extra juice to Recipe, even though I can't prove it. Even if they aren't throwing extra SEO at Recipe, you can be sure that they'll work something into SERPs so that if you are using their proprietary markup, you get noticed... more. Take the link element's prefetch and prerender relations as an example: Google created prerender, and if you use it on your site, voilà, it prerenders in SERPs for the user. Prefetch does not.
I'm not sure how to differentiate between a long-term perspective and an SEO perspective; I look at them the same. I'm not saying that you can't, just trying to explain more. I have thought this over before from a client's perspective and asked myself these same questions in regard to microformats as a whole vs. schema. It's basically a judgement call: microformats are a tried and true format; there are millions more sites using microformatted data than there are using schema's. They aren't going anywhere. And (as noted earlier) they are backwards compatible.
That said, schema is backed by the big three and, being HTML5 based, shouldn't have portability problems in the future. As previously mentioned, I'm sure all three will be rewarding users (though I have no proof) in their respective search results. One caveat here, though, is how fast everything on the web is moving: just as quickly as Schema popped up, it could conceivably be dropped. I doubt it (though I'm hoping), but it is a possibility.
I can't say which is better to implement, but microformats are certainly much easier to implement; they're class based and so freaking easy.
It is better to use the schema.org formats, as they have been accepted as the standard by all of the major search engines (Google, Yahoo, and Bing). Using an alternative microformat may mean that some of the search engines will not recognize that data as being special, and you will lose any possible advantages it offers.

Where can I find a good collection of public domain owl ontologies for various domains?

I am building an ontology-processing tool and need lots of examples of various owl ontologies, as people are building and using them in the real world. I'm not talking about foundational ontologies such as Cyc, I'm talking about smaller, domain-specific ones.
There's no definitive collection afaik, but these links all have useful collections of OWL and RDFS ontologies:
schemaweb.info
vocab.org
owlseek
linking open data constellation
RDF schema registry (rather old now)
In addition, there are some general-purpose RDF/RDFS/OWL search engines you may find helpful:
sindice
swoogle
Ian
My go-to site for this probably didn't exist at the time of the question. For latecomers like me:
Linked Open Vocabularies
I wish I'd found it much sooner!
It's well-groomed, maintained, has all the most-popular ontologies, and has a good search engine. However, it doesn't include some specialized collections, most notably, (most of?) the stuff in OBO Foundry.
Thanks! A couple more I found:
OntoSelect - browsable ontology repository
Protege Ontology Library
CO-ODE Ontologies
Within the life-science domain, the publicly available ontologies can be found listed on the OBO Foundry site. These ontologies can be queried via the Ontology Lookup Service or the NCBO's BioPortal, which also contains additional resources.
One more concept search tool: falcons
There is also one good web engine for searching for ontologies. It is called Watson Semantic Web Search and you can try it here.