Microdata - itemid / global identifier conventions for organization, business or brand markup with schema.org

My question is the following: when marking up an organization, business or brand with microdata and schema.org, should I use its official webpage URL as the global identifier? Is there any better reference that I could use (like IMDB for movies or actors)?
I'd like to know if there's any standard, convention or common practice recommended.

It would be better to use some kind of controlled vocabulary (e.g. VIAF) that uniquely identifies the organization in question.
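As a rough illustration of the two options discussed here (the official URL versus an external authority record), the sketch below renders microdata for an organization. The helper name `organization_microdata`, the example.com URLs, and the VIAF number are hypothetical placeholders. The only real parts are the microdata attributes `itemscope`, `itemtype`, and `itemid`, and the schema.org properties `name`, `url`, and `sameAs`.

```python
from html import escape


def organization_microdata(name, item_id, homepage, same_as):
    """Render a schema.org Organization as microdata (illustrative helper).

    item_id  -- IRI used as the global identifier (itemid)
    homepage -- official web page (schema.org "url")
    same_as  -- IRIs of external records describing the same organization
    """
    lines = [
        '<div itemscope itemtype="https://schema.org/Organization" '
        f'itemid="{escape(item_id, quote=True)}">',
        f'  <span itemprop="name">{escape(name)}</span>',
        f'  <link itemprop="url" href="{escape(homepage, quote=True)}">',
    ]
    for iri in same_as:
        lines.append(f'  <link itemprop="sameAs" href="{escape(iri, quote=True)}">')
    lines.append("</div>")
    return "\n".join(lines)


markup = organization_microdata(
    name="Example Corp",
    item_id="https://example.com/#organization",      # hypothetical identifier
    homepage="https://example.com/",                  # hypothetical homepage
    same_as=["https://viaf.org/viaf/000000000"],      # placeholder VIAF record
)
print(markup)
```

With this shape, the identifier stays under your own control while `sameAs` points at whatever external record consumers may already know.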

The choice of identifiers is part of the explanation of REST. http://www.infoq.com/articles/rest-introduction
Inspect closely the first principle (for convention), though it is in broader terms of resources rather than specific to org/biz/brand. REST is the thesis that started this trend. Microformats accordingly makes use of rel="profile" link tags. The concept is further expanded at http://purl.org/ so, if IMDB, for example, switches to W3 like W3C did, then in the future this will minimize the impact on the application you're making right now. RDFa Dublin Core vocab's use of this is seen in the profile at http://www.w3.org/2011/rdfa-context/rdfa-1.1.html.
(For reference) Applications serving the general public or open initiatives such as academic support might be better served by these profiles. However, when operating a site for commercial purposes, it might be advantageous for building a credible reputation to create application-specific "custom" profiles that take the relevant legal matters into account and perform reliably with PURLs.
Finally, WHATWG considers prefixes too advanced and HTML5 for newbies only, so the support for W3's XHTML xmlns/RDFa prefix is dropped in microdata. This compels us to reuse long-form URLs for schema.org business/org/brand resources with microdata syntax. The "custom" profile then serves as mere good-will when picking up from where tasks are wrapped up; otherwise a greater variety of items might appear in the content than actually intended, owing to mix-ups.
The good news is, Google supports schema.org usage as a vocab in RDFa syntax. So considering RDFa as an already "living" standard that originated in W3 spec, as per the (non-)commercial nature of application, defining PURL for scope namespaces, profiles exhibiting prefixes, and syntax (of official web-page or substitute IRIs) as per target processors is the way to go. Currently no vocab besides schema is processed as microdata, and schema in RDFa isn't supported by anybody but Google!

Related

Are nested resources RESTful?

How architecturally sound, and how in line with industry standards, is nesting resource representations in REST APIs, especially when it comes to nested lists of resources (like the books of an author)?
I'm interested in finding links to authoritative sources that answer this question.
The authoritative source for REST is the dissertation of Roy Fielding, based on work he did during the standardization of HTTP/1.1 (RFC 2068, RFC 2616, etc) in the 1990s.
REST defines resource ("Any information that can be named can be a resource..."), and requires that all resources understand messages the same way (uniform interface) but does not actually constrain your resource model.
"RESTful", historically, is context sensitive; in practice it means something like "more like REST than our current designs". In the web services community, it meant "more like REST than WS-* and SOAP". In Rails, it meant more like REST than the resource models that were recommended prior to Rails 1.2. And so on.
If what you are interested in is describing the relationship between a resource that is a collection and a resource that is an item in that collection, then the standard you want is RFC 6573.
But again, it doesn't tell you how to design the resources, or how to design the identifiers for those resources -- it just tells you how to indicate a relationship between them.
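To make the RFC 6573 point concrete, here is a minimal sketch that expresses the "collection" and "item" relation types as HTTP Link header values. The `link_header` helper and the api.example.com URLs are assumptions for illustration; only the two relation names come from the RFC.

```python
def link_header(links):
    """Format (target IRI, relation type) pairs as a Link header value."""
    return ", ".join(f'<{target}>; rel="{rel}"' for target, rel in links)


# Response headers for one book: it points to the collection it belongs to.
book_headers = {
    "Link": link_header([("https://api.example.com/authors/42/books", "collection")])
}

# Response headers for the collection: one "item" link per member.
collection_headers = {
    "Link": link_header([
        ("https://api.example.com/books/7", "item"),
        ("https://api.example.com/books/9", "item"),
    ])
}

print(book_headers["Link"])
print(collection_headers["Link"])
```

Note that the book URLs are not nested under the author at all; the relationship is carried by the link relation, not by the spelling of the identifier.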
As far as I understand, a web resource is something abstract, identified by an IRI and accessible through the web. What dereferencing the IRI gives back is a representation of the current state of the identified resource; this is why it is called representational state transfer. I don't remember any standard that discusses nested resources. Maybe RDF is the closest to what you are looking for. In practice, if we follow RDF concepts, then to answer a GET request the REST API responds with a representation of an RDF subgraph starting at the resource identified by the given IRI, and it can be any number of levels deep. Nestedness is not something I would consider here, because it is a graph, not a hierarchy; it is a matter of either expanding relationships between resources or returning hyperlinks that the API consumers can follow to do the exact same thing.
Not sure if this helps. I did not find any RFC beyond what VoiceOfUnreason's answer contains. I remember reading explicitly about web resources and identifying real things with hash (#) IRIs or non-dereferenceable IRIs in an RFC 5+ years ago, but I have no idea which one it was. Maybe it was the Lanthaler dissertation or the SemWeb document VoiceOfUnreason suggested. What is certain is that it was somehow connected to the semantic web and RDF.
REST’s identification of resources constraint requires that resources
are identifiable so that they can be accessed and manipulated via
generic interfaces. On the Web, resources are identified by IRIs [44].
Since a resource may represent concepts which cannot be serialized
into a byte stream (e.g., persons or a feeling), resources are not
manipulated directly. Instead, REST is built on the concept of
manipulation of resources through representations; i.e., an additional
layer of indirection in the form of resource representations is
introduced.
https://www.markus-lanthaler.com/research/third-generation-web-apis-bridging-the-gap-between-rest-and-linked-data.pdf
On the Semantic Web, all information has to be expressed as statements
about resources, like the members of the company Example.com are Alice
and Bob or Bob's telephone number is "+1 555 262" or this Web page was
created by Alice. Resources are identified by Uniform Resource
Identifiers (URIs) [RFC3986]. This modelling approach is at the heart
of Resource Description Framework (RDF) [RDFPrimer]. A nice
introduction is given in the N3 primer [N3Primer].
Using RDF, the statements can be published on the Web site of the
company. Others can read the data and publish their own information,
linking to existing resources. This forms a distributed model of the
world. It allows the user to pick any application to view and work
with the same data, for example to see Alice's published address in
your address book.
https://www.w3.org/TR/cooluris/#semweb
So what I want to say is that what you see in the HTTP response is not the resource itself, just a representation of it and its relationship to other resources.
REST does not have a constraint which tells you how verbose that response must be. It just tells you that you must use hyperlinks to connect resources and that you must use standard MIME types and document your API. At least this is how I interpret the uniform interface constraint.
I think the question is very good, because this part of the architecture is open, and there have been many questions over the past years asking how to use URIs for querying nested resources. The answer is always that REST does not cover it, and the URI and URI template standards don't cover it either. There are standards like OData and Hydra which have suggestions, but it is up to you. Your problem is connected to this, because it asks how verbose a response to such a query can be. That is not covered either, as far as I can tell, but what is certain is that it can and must contain at least hyperlinks to other resources. RDF allows describing several resources in a single document, so if we extend the RDF approach to REST, which does not forbid it, then I guess we can do it.
From a practical perspective, a collection, for example, is a sort of nested resource too, and if the API consumer had to send a dedicated request for every collection item just to learn basic things like product names, it would be wasting resources. Normally we respond to this kind of request with a single HTTP response, or with multiple ones of 25, 50 or 100 items per page. It does not make much sense from a usability and scalability perspective to give the consumer a hyperlink for each item and force them to follow those links one by one. In fact we like to respond with the exact view model the consumer needs, and we design APIs this way. I think the same is true for nested properties as well. From an RDF perspective these responses represent a subgraph of a massive resource graph, which is managed by the REST service and, for example, by RDF vocabulary maintainers like OWL, Schema.org, etc.
So to have a one sentence answer: the representation of "nested resources" is not covered by REST and as far as I know not covered by standards like HTTP and URI either, but currently it is the best practice to use them and MIME types we frequently use for REST e.g. HAL+JSON or RDF/JSON-LD support nested representations too, so I would say yes.
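As an illustration of a nested representation in one of the media types mentioned, here is a sketch of a HAL+JSON-style response body. The `_links` and `_embedded` keys follow HAL conventions; the author and book data, the URLs, and the `books` relation name are made up for the example.

```python
import json

# One author representation that embeds partial representations of its books
# and still links to each of them, so a client can follow up if it needs more.
author = {
    "_links": {
        "self": {"href": "/authors/42"},
        "books": {"href": "/authors/42/books"},
    },
    "name": "Jane Doe",
    "_embedded": {
        "books": [
            {"_links": {"self": {"href": "/books/7"}}, "title": "First Book"},
            {"_links": {"self": {"href": "/books/9"}}, "title": "Second Book"},
        ]
    },
}

body = json.dumps(author, indent=2)
print(body)
```

The embedded entries stay small (just a title and a self link), which matches the "view model" idea above: enough for a listing, with hyperlinks for everything else.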

What would be REST way to get relevant resources?

Let's say I have an API to get a product given a product ID: api/products/<productid>.
What would be a REST way to get relevant products given a product ID? I think I can use a query on the same endpoint, as in api/products?id=<productid>, but I'm not sure if this is ideal or whether it might be confusing.
Standard practice for an API URL is api/products/<productid>.
For params other than the ID (generally for filtering or searching), query params are used: api/products?name=somename
The detailed resource naming guide can be found here
What would be REST way to get relevant resources?
Answer this question: how would you do it on the web?
You might have a web page for the product, and then a link from that page to a new page describing the related resources, where each entry in that page would include links to the product page of the specific resources.
What we need to define for the client (or make it easy to discover) is how to find the links.
In the case of the web, we are typically using HTML representation of our pages. HTML is special in that it has a standardized understanding of links. For human consumers, we surround the link with context (typically, the contents of the A element); for machines, we should be a bit more precise about defining the rule.
For representations that don't have a standardized understanding of links, we need to do something else. The most common answer is to use Web Linking, which is to say we put a description of the relationship between two URIs into the headers of the response.
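From the client's side, "finding the links" can be sketched as below. This is a deliberately minimal parser that only handles the simple `<uri>; rel="..."` form, not the full Link header grammar. The header value, the /api/products/17/related path, and the `find_links` helper are hypothetical.

```python
import re

# Matches one link-value of the simple form: <target>; rel="relation types"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def find_links(link_header_value, relation):
    """Return the target URIs whose rel parameter includes `relation`."""
    return [
        target
        for target, rels in LINK_PATTERN.findall(link_header_value)
        if relation in rels.split()
    ]


# A Link header as a product resource might return it (made-up example).
header = (
    '</api/products/17/related>; rel="related", '
    '</api/products/17>; rel="self"'
)
related = find_links(header, "related")
print(related)
```

The client only needs to know the relation name; the server remains free to change how the target URI is spelled.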
REST doesn't care what spelling conventions you use for your resource identifiers, so long as they are consistent with the production rules defined in RFC 3986. So you can choose any spelling convention you like.
For instance, it is common to use spelling conventions that include key value pairs in the query part of the URI, because HTML GET forms can be used to describe links with that shape, which simplifies certain use cases.
Since REST doesn't care about your spelling conventions, you can use the extra freedom to address other problems (what spellings are easy for people reading the logs? what spellings are easy for people documenting the API? and so on....)
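A small sketch of two equally acceptable spellings for a "related products" resource follows. The host name, the `/related` path segment, and the `related_to` parameter name are all hypothetical choices, which is exactly the point.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://api.example.com"  # hypothetical host
product_id = 17

# Spelling 1: the related products as a path segment under the product.
path_style = f"{BASE}/api/products/{product_id}/related"

# Spelling 2: key/value pairs in the query part, the shape an HTML GET form produces.
query_style = f"{BASE}/api/products?" + urlencode({"related_to": product_id})

print(path_style)
print(query_style)

# Either way, the server can recover the product ID from the identifier.
print(parse_qs(urlsplit(query_style).query))
```

Which one to pick is then a question of logs, documentation, and tooling, not of REST.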

Microdata or JSON-LD? I'm confused

I haven't found a clear and updated answer, even after googling for a few hours, so here it goes:
I am aware of the advantages and disadvantages of both Microdata and JSON-LD. I also know that Microdata was dropped from W3C (and consequently from the browsers' API). What I'm not sure about is how it will affect any site where Microdata is used specifically for SEO purposes.
Does Google support JSON-LD for SERPs? What format does it recommend to use? I am looking for updated answers - not from 2011 or 2012 (if they are still applicable though, feel free to post it).
What is more appropriate for a dynamic site with lots of contents (think: 50000 videos, images etc): JSON-LD, Microdata or RDFa? Why?
Consumers that support Microdata support Microdata, no matter if or where Microdata is specified.
It’s conceivable that new consumers might decide not to support it, but the syntax is still very popular and still part of WHATWG’s HTML Living Standard, so it’s probably not going to vanish.
About the consumer Google
Some years ago, JSON-LD was not supported for many of their features, and they recommended that authors use Microdata (and they supported RDFa, too). Today it’s different.
See Google’s Markup formats and placement:
JSON-LD is the recommended format. Google is in the process of adding JSON-LD support for all markup-powered features. The table below lists the exceptions to this. We recommend using JSON-LD where possible.
According to the mentioned table, Microdata and RDFa support all of Google’s data types, while JSON-LD supports everything except their Breadcrumbs feature.
I wouldn’t give much weight to their recommendation. They say that "Structured data markup is most easily represented in JSON-LD format", but I think it’s safe to say that this only applies to authors that generate the structured data programmatically (especially from tools that support JSON).
For authors that manually add the structured data markup, it’s typically easier to use Microdata or RDFa, and using these syntaxes minimizes the risk that an author updates the content without updating the structured data, too (see DRY principle).
JSON-LD vs. Microdata vs. RDFa
Unless you know (and care for) consumers that don’t support all three syntaxes, it doesn’t matter. Use what is easier for you and your tools.
If you have no preference, I would say JSON-LD or RDFa, because contrary to Microdata,
both are W3C Recommendations,
both can be used in non-HTML5 contexts,
both allow to (easily) mix several vocabularies.
JSON-LD if you like your structured data not "intermingled" with your markup (= duplicating the content), RDFa if you like to use your existing markup (= not duplicating the content).
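For a site that generates its structured data programmatically, the JSON-LD route can be sketched roughly as follows. The `json_ld_script` helper and the video data are hypothetical; `VideoObject` and the properties used are schema.org terms.

```python
import json


def json_ld_script(data):
    """Serialize a dict as a JSON-LD <script> element for an HTML page."""
    payload = json.dumps(data, indent=2)
    # Escape "<" so a value containing "</script>" cannot close the element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder record, e.g. one row out of a large video catalogue.
video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Example video",
    "description": "A placeholder description.",
    "uploadDate": "2020-01-01",
    "contentUrl": "https://example.com/videos/1.mp4",
}

print(json_ld_script(video))
```

The trade-off named above is visible here: the data lives apart from the visible markup, so it has to be produced from the same source as the page content to stay in sync.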
I've opted to go for JSON-LD because it is easier to read and compile. Spotting errors is easy for more complicated dictionaries. It is the W3C and Google recommended standard.
One caveat (major if you need to support it), is that as of May 16 2017, Bing STILL doesn't support JSON-LD
Google's Understand how structured data works now says:
Google recommends using JSON-LD for structured data whenever possible.
It seems reasonable to me to still mix in microdata to avoid duplication of long content, such as articleBody, but generally the industry is JSON-LD all the way.
I discovered that JSON-LD does support breadcrumbs. I applied breadcrumbs using the latest version of Yoast on my WordPress site, and it passed muster with Google Search Console in the rich results test of the live page as well as a crawl of the live page after submitting the sitemap.
It should be noted that Google had deprecated the use of data-vocabulary.org. It wants schema.org.
Microdata is easy to use with Angular 8+, but you can do the same thing with JSON-LD.
For a human, attributes are easiest to read in JSON-LD, but there is no big difference between the two. Just use what you already know how to do, to save time.

Knowing what RDFa vocabulary to use

How do we know which vocabulary/namespace to use to describe data with RDFa?
I have seen a lot of examples that use xmlns:dcterms="http://purl.org/dc/terms/" or xmlns:sioc="http://rdfs.org/sioc/ns#" then there is this video that uses FOAF vocabulary.
This is all pretty confusing and I am not sure what these vocabularies mean or what is best to use for the data I am describing. Is there some trick I am missing?
There are many vocabularies. And you could create your own, too, of course (but you probably shouldn’t before you checked possible alternatives).
You’d have to look for vocabularies for your specific needs, for example
by browsing and searching on http://lov.okfn.org/dataset/lov/ (they collect and index open vocabularies),
on W3C’s RDFa Core Initial Context (it lists vocabularies that have pre-defined prefixes for use with RDFa), or
by browsing through http://prefix.cc/ (it’s a lookup for typically used namespaces, but you might get an overview by that).
After some time you get to know the big/broad ones: Schema.org, Dublin Core, FOAF, RSS, SKOS, SIOC, vCard, DOAP, Open Graph, Ontology for Media Resources, GoodRelations, DBpedia Ontology, ….
The simplest thing is to check if schema.org covers your needs. Schema.org is backed by Google and the other major search engines and generally pretty awesome.
If it doesn't suit your needs, then enter a few of the terms you need into a vocabulary search engine. My recommendation is LOV.
Another option is to just ask the community about the best vocabularies for the specific domain you need to represent. A good place is answers.semanticweb.com, which is like StackOverflow but with more RDF experts hanging out.
Things have changed quite a bit since that video was posted. First, like Richard said, you should check if schema.org fits your needs. Personally when I need to describe something that's not covered on schema.org, I check LOV as well. If, and only if I can't find anything in LOV, I will then consider creating a new type or property. A quick way to do this is to use http://open.vocab.org/
A newer version of RDFa was published since that video was released: RDFa 1.1 and RDFa Lite. If you want to use schema.org only, I'd recommend to check http://www.w3.org/TR/rdfa-lite/
Vocabularies are usually domain-specific. The xmlns declaration is deprecated. The RDFa profile at http://www.w3.org/profile/rdfa-1.1 lists the vocabularies available as part of the initial context.
Sometimes vocabularies overlap in the context of your data. Just as a math problem can be solved by an algebraic, a geometric, or some other technique, mixing vocabularies is fine. Use http://sameas.org/ to find equivalent terms. To accommodate your consumer base's preferences in vocabulary recognition, skos:closeMatch and skos:exactMatch may be used, e.g. "gr:Brand skos:closeMatch owl:Thing", with any terms you please. The prefix attribute can be used with vocabularies beyond those covered by the initial context, like: prefix="fb: http://ogp.me/ns/fb# vocab2: path2 ..."
For concerns that cut across different domain vocabularies, such as customizing presentation in search results, microdata following the schema.org guidelines should be beneficial. However, as this has nothing to do with specialization in any particular domain, prefixes are unavailable in this syntax. RDFa vocabularies have been helpful in specific domain contexts where the content appeals to a participative audience, while microdata targets those who've lost their way. For tasks that are too simple to merit a full-fledged vocabulary but still have semantic implications, try http://microformats.org/
Interchanging REST profile URIs for vocabularies among the three syntaxes is valid but useless, owing to the lack of affordable manpower to implement alternative support for the vocabularies at Web scale. How and why the schema.org vocabulary merited a separate microdata syntax of its own is discussed by Google employee Ian Hickson, a.k.a. Hixie, the editor of the WHATWG HTML5 draft, at http://logbot.glob.com.au/?c=freenode%23whatwg&s=28+Nov+2012&e=28+Nov+2012#c747855 or http://krijnhoetmer.nl/irc-logs/whatwg/20121128#l-1122
If only Google had employees smart enough to implement a parser for one syntax whose WG included one of its own employees, then RDFa Lite inside RDFa would have been another course like Core Java within Java, with no need for a separate, mocking rip-off named microdata. But alas, ours is an imperfect world!
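To show what mixing vocabularies with the prefix attribute can look like, here is a sketch that renders RDFa with schema.org as the default vocabulary and FOAF declared explicitly as a prefix. The `rdfa_person` helper and the person data are hypothetical; `vocab`, `prefix`, `typeof`, and `property` are RDFa 1.1 attributes.

```python
from html import escape


def rdfa_person(name, homepage):
    """Render a Person using schema.org terms plus prefixed FOAF terms."""
    return "\n".join([
        '<div vocab="https://schema.org/" '
        'prefix="foaf: http://xmlns.com/foaf/0.1/" typeof="Person">',
        # One element can carry terms from both vocabularies.
        f'  <span property="name foaf:name">{escape(name)}</span>',
        f'  <a property="url foaf:homepage" href="{escape(homepage, quote=True)}">homepage</a>',
        "</div>",
    ])


print(rdfa_person("Alice", "https://example.com/alice"))
```

Microdata has no equivalent of the prefixed terms here, which is the limitation described above.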

Microformat's hRecipe vs. Schema's Recipe

I would like to know what are the main differences between Microformat's hRecipe and Schema.org's Recipe and how search engines treat each one.
Besides the differences in code and the fact that the former is open while the latter is proprietary, how do search engines treat each one and which one is better to implement, both from a long-term perspective and an SEO perspective?
Schema.org with Google, Bing, Yahoo!, and Yandex
Since you asked this question, Microformat's hRecipe has been updated with microformats2 as h-recipe, but otherwise your question remains relevant and is worth answering more than 6 years later.
…how do search engines treat each one…?
Search engine giants, Google, Microsoft (Bing), and Yahoo!, along with Yandex (a popular search engine in Russia and elsewhere globally) collaborated to create Schema.org and the schemas therein.
This collaboration is the biggest differentiator between Schema.org and Microformats; it does and will likely continue to have an impact on how each treats schemas defined by other parties.
You can read about why they created it and how they treat other formats in the Schema.org FAQ.
Specifically, you may be interested in their answers to…
What is the purpose of schema.org?
Why are Google, Bing, Yandex and Yahoo! collaborating?
I have already added markup in some other format (i.e. microformats, RDFa, data-vocabulary.org, etc). Do I need to change anything on my site?
Why microdata? Why not RDFa or microformats?
Why don't you support other vocabularies such as FOAF, SKOS, etc?
…which one is better to implement, both from a long-term perspective and a SEO perspective?
The schema better to implement is the one with the most support; in this case, that appears to be Schema.org's Recipe. While all of the above search engines still support microformats, mentions of it have disappeared from some of Google's official documentation regarding structured data and rich snippets.
Interestingly, Google recommends a newer syntax for structured data called JSON-LD.
JSON-LD: The future of structured data?
From a long-term perspective, you may want to consider adopting the ever more popular JSON-LD markup syntax with the Schema.org Recipe schema, which even Bing is supporting now ( here are examples demonstrating it ) despite their documentation having no mention of it.
Pinterest's interesting support
The popular content discovery platform Pinterest supports both schemas and even supports the new JSON-LD syntax (though it is not explicitly mentioned in their documentation).
Despite Schema.org's growing popularity and adoption, Pinterest offers seemingly greater support for the h-recipe microformat with their inclusion of e-instructions as a supported class, whereas Schema.org's corresponding recipeInstructions property is not a supported property.
It's unclear if this is intentional or even which schema they actually prefer, but it is worth keeping in mind if you intend to develop specifically for this platform.
hRecipe is based on class attributes, while schema's Recipe is based on multiple attributes. those are the main differences in the markup; hRecipe is backwards compatible whereas Recipe is not, because it's using html5 microdata attributes (itemscope, itemtype, itemprop).
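The markup difference can be seen side by side in the sketch below, which renders the same made-up recipe once as a microformats2 h-recipe and once as Schema.org Recipe microdata. The helper names and recipe data are hypothetical. The class names (`h-recipe`, `p-name`, `p-ingredient`, `e-instructions`) and the properties (`name`, `recipeIngredient`, `recipeInstructions`) are the real vocabulary terms.

```python
from html import escape

recipe = {
    "name": "Pancakes",
    "ingredients": ["2 eggs", "200 g flour"],
    "instructions": "Mix everything and fry.",
}


def as_h_recipe(r):
    """microformats2 h-recipe: meaning is carried by class names."""
    rows = ['<article class="h-recipe">',
            f'  <h1 class="p-name">{escape(r["name"])}</h1>']
    rows += [f'  <p class="p-ingredient">{escape(i)}</p>' for i in r["ingredients"]]
    rows += [f'  <div class="e-instructions">{escape(r["instructions"])}</div>',
             "</article>"]
    return "\n".join(rows)


def as_schema_recipe(r):
    """Schema.org Recipe in microdata: meaning is carried by item* attributes."""
    rows = ['<article itemscope itemtype="https://schema.org/Recipe">',
            f'  <h1 itemprop="name">{escape(r["name"])}</h1>']
    rows += [f'  <p itemprop="recipeIngredient">{escape(i)}</p>' for i in r["ingredients"]]
    rows += [f'  <div itemprop="recipeInstructions">{escape(r["instructions"])}</div>',
             "</article>"]
    return "\n".join(rows)


print(as_h_recipe(recipe))
print(as_schema_recipe(recipe))
```

Nothing stops a page from carrying both sets of attributes on the same elements if it needs to serve consumers of each.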
the big three search engines say that they'll treat both the same, however i don't buy that; Google has been pushing their web platform(s) long enough for me to think that they'll be adding extra juice to Recipe, even though i can't prove it. even if they aren't throwing extra seo at Recipe, you can be sure that they'll work something into SERPs so that if you are using their proprietary markup, you get noticed... more. take the link element's prefetch and prerender rel values as an example; google created prerender and if you use it on your site, voila, it prerenders in SERPs for the user. prefetch does not.
i'm not sure how to differentiate between a long-term perspective and an seo perspective, i look at them the same; i'm not saying that you can't, just trying to explain more. i have thought this over before from a client's perspective and asked myself these same questions in regards to microformats as a whole vs. schema. it's basically a judgement call: microformats are a tried and true format; there are millions more sites using microformatted data than there are using schema's. they aren't going anywhere. and (as noted earlier) they are backwards compatible.
that said, schema is backed by the big three, and being html5 based, shouldn't have portability problems in the future. also previously mentioned, i'm sure all three will be rewarding users (though i have no proof) in their respective search results. one caveat here though, is how fast everything on the web is moving; just as quickly as Schema popped up, it could conceivably be dropped. i doubt it (though i'm hoping) but it is a possibility.
i can't say which is better to implement, but microformats are certainly much easier to implement, they're class based and so freaking easy.
It is better to use the schema.org formats, as they have been accepted as the standard by all of the major search engines (Google, Yahoo, and Bing). Using an alternative microformat may mean that some of the search engines will not recognize that data as being special, and you would lose any possible advantages it offers.