I started digging into RDFa recently and am trying to spice up my website with semantic information. The site offers services, events, a blog, and may offer products in the future. Happily, schema.org has coarse but adequate categories for all of it. But now it comes down to practical questions.
All the examples keep all the information on a single page, which seems pretty academic to me. E.g., my landing page has a list of upcoming events. Events have a location property, and my events run at two different locations. I could paste the location information into each entry and inflate my HTML, but I'd rather link to the pages that describe the locations and hold the full details. I'm not sure whether this is what sameAs is for. But even then, how would a consumer know which RDFa information on the target URL should be used as the appropriate vCard?
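For illustration, the inline approach I'd rather avoid would look roughly like this for every event entry (assuming a schema: prefix is declared; the venue name and address are made up):
<!-- repeated on the landing page for each event -->
<div typeof="schema:Event">
  <span property="schema:name">Some event</span>
  <div property="schema:location" typeof="schema:Place">
    <span property="schema:name">Venue A</span>
    <span property="schema:address">Some Street 1, Some Town</span>
  </div>
</div>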
Similarly, my landing page shows only partial company information. I could add a lot of <meta> elements, but again, a reference to the contact page would be nicer.
I just can't believe that this aspect slipped past the RDF creators. Are there any best practices for reducing redundancy?
URIs! (or IRIs, e.g., in RDFa 1.1)
That’s one of the primary qualities of RDF, and it makes Linked Data possible, as coined by Tim Berners-Lee (emphasis mine):
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data.
Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext, where links are relationship anchors in hypertext documents written in HTML, for data they [are] links between arbitrary things described by RDF.
From my answer to a question about the Semantic Web:
Use RDF (in the form of a serialization format of your choice) and define URIs for your entities so that you and other people can make statements about them.
So give each of your "entities" a URI and use it as the subject or object in RDF triples. Note that you may not want to use the same URIs your web pages have, as that would make it hard to distinguish between data about the web page and data about the thing represented by the web page (see my answer describing this in more detail).
So let’s say your website has these two pages:
http://example.com/event/42 (about the event 42, i.e., the HTML page)
http://example.com/location/51 (about the location 51, i.e., the HTML page)
Using the hash URI method, you could mint these URIs:
http://example.com/event/42#it (the event 42, i.e., the real thing)
http://example.com/location/51#it (the location 51, i.e., the real thing)
Now when you want to use the Schema.org vocabulary to give information about your event, you may use resource to give its URI:
<!-- on http://example.com/event/42 -->
<article resource="#it" typeof="schema:Event">
<h1 property="schema:name">Event 42</h1>
</article>
And when you want to specify the event’s location (using Place), you could use the URI of the location:
<!-- on http://example.com/event/42 -->
<article about="#it" typeof="schema:Event">
<h1 property="schema:name">Event 42</h1>
<a property="schema:location" typeof="schema:Place" href="/location/51#it">Location 51</a>
</article>
And on the location page you might have something like:
<!-- on http://example.com/location/51 -->
<article about="#it" typeof="schema:Place">
<h1 property="schema:name">Location 51</h1>
<a property="schema:event" typeof="schema:Event" href="/event/42#it">Event 42</a>
</article>
Aggregating this data, you’ll have these triples (in Turtle):
@prefix schema: <http://schema.org/> .
<http://example.com/location/51#it> a schema:Place .
<http://example.com/location/51#it> schema:event <http://example.com/event/42#it> .
<http://example.com/location/51#it> schema:name "Location 51" .
<http://example.com/event/42#it> a schema:Event .
<http://example.com/event/42#it> schema:location <http://example.com/location/51#it> .
<http://example.com/event/42#it> schema:name "Event 42" .
EDIT: I’m not sure (and I hope it’s not the case), but maybe Schema.org expects a blank node with a url (or sameAs?) property instead, e.g.:
<article about="#it" typeof="schema:Event">
<h1 property="schema:name">Event 42</h1>
<div property="schema:location" typeof="schema:Place">
<a property="schema:url" href="/location/51#it">Location 51</a>
</div>
</article>
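The sameAs variant mentioned in the EDIT would look much the same; this is only a sketch of that alternative, not something Schema.org necessarily expects:
<!-- on http://example.com/event/42 -->
<article about="#it" typeof="schema:Event">
  <h1 property="schema:name">Event 42</h1>
  <div property="schema:location" typeof="schema:Place">
    <a property="schema:sameAs" href="/location/51#it">Location 51</a>
  </div>
</article>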
Each RDF resource has an identifier. An identifier is an IRI (and URLs are a subset of IRIs). So just reference locations by their identifiers.
Usually, each page describes one implicit main resource and several explicit additional ones. Take a look at the RDFa 1.1 Primer; it has a lot of relevant info.
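A minimal sketch of that idea (URIs and names made up): the event entry just points at the location's identifier instead of repeating its details:
<div vocab="http://schema.org/" typeof="Event">
  <span property="name">Event 42</span>
  <a property="location" href="http://example.com/location/51#it">Location 51</a>
</div>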
Related
I am trying to decide whether to use schema.org entities in my own open source app, for potential compatibility with existing open data sets. So I'm looking for usage of relevant schema.org entities "in the wild".
Right now I'm looking for dietary supplement data, i.e., http://schema.org/DietarySupplement or http://health-lifesci.schema.org/DietarySupplement.
I've been searching for semantic web search engines and have only found Swoogle, but I get either no results for that URI or a "service temporarily unavailable" error.
The DietarySupplement page on schema.org says that "between 10 and 100" domains are using this entity. Is that talking about DNS domains, abstract domains defined on Schema.org, abstractions defined elsewhere, or something else?
There are only a couple of other resources I can find on this subject.
Web Data Commons - RDFa, Microdata, and Microformat Data Sets
BuiltWith trends - Microdata Usage Statistics
I'm using a Drupal site which seems to add FOAF by default, therefore an image would have the following tag:
<img typeof="foaf:Image" src="....
I'm trying to set up Rich Snippets, and on a wrapper I have this div:
<div typeof="schema:Recipe">
and then I have schema markup on all fields within this.
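For illustration, the generated markup presumably looks roughly like this (the field names, values, and exactly where schema:image ends up are hypothetical; the foaf: and schema: prefixes are assumed to be declared by the theme):
<div typeof="schema:Recipe">
  <h1 property="schema:name">Pulled beef brisket</h1>
  <!-- Drupal adds the FOAF type to the image field -->
  <img typeof="foaf:Image" property="schema:image" src="/sites/default/files/brisket.jpg" alt=""/>
  <div property="schema:recipeInstructions">...</div>
</div>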
Using Google Search Console, it pulls in all the other values, but I've had trouble with the Recipe image specifically. I'm wondering if there's a conflict going on. How do FOAF and Schema interact with each other? What is FOAF even used for?
Google's Structured Data tool is only pulling the Image type in, whereas I'd assume it should pull the src URL in by default too?
For reference the Recipe I'm testing is here: http://www.simplybeefandlamb.co.uk/recipes/pulled-beef-brisket
Sorry for being a total rookie.
I am trying to help my professor implement this advice:
Either as a courtesy to Forbes or a favor to yourself, you may want to include the rel="canonical" link element on your cross-posts. To do this, on the content you want to take the backseat in search engines, you add a <link rel="canonical" href="…"> element in the head of the page. The URL should be for the content you want to be favored by search engines. Otherwise, search engines see duplicate content, grow confused, and then get upset. You can read more about the canonical tag here: http://www.mattcutts.com/blog/canonical-link-tag/. Have a great day!
The problem is I am having trouble figuring out how to edit the head element on a post-by-post basis. We are currently on a super old blogging platform (Movable Type 3.2 from 2005), so maybe it is not possible. But I'd like to know if that is likely the reason, so I'm not missing out on a workaround.
If anyone could point me in the right direction, I would greatly appreciate it!
Without knowing much about your installation, I'll give a general description, and hopefully it matches what you see and helps.
In Movable Type, each blog has a "Design" section where you can see and edit the templates for the blog. On this page, the templates that are published once are listed under "Index Templates," and the templates published multiple times, once per entry, per category, etc., are listed under "Archive Templates."
There is probably an archive template called "Entry" (it could have been renamed) that publishes to a path like category/sub-category/entry-basename.php. This is the main template that publishes each entry; click on it to open the template editor.
This template could be an entire HTML document, or it might have "includes" that look like <MTInclude module=""> or <$mt:Include module=""$> (MT supports varying tag styles).
You may find there is an included module that contains the <head> content, or it might just be right in that template. To "follow" the includes and see those templates, there should be links on the side of the included templates.
Once you find the <head> content, you can add a canonical link tag like this:
<mt:IfArchiveType type="Individual">
<mt:If tag="EntryPermalink">
<link rel="canonical" href="<$mt:EntryPermalink$>" />
</mt:If>
</mt:IfArchiveType>
Depending on your needs, you might want to customize this to output a specific URL structure for other types of content, like category listings. The above will just take care of telling search engines the preferred URL for each entry.
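For instance, a category listing could get a similar block. This is only a sketch and assumes the standard category archive tag is available in your (quite old) MT 3.2 install:
<mt:IfArchiveType type="Category">
<link rel="canonical" href="<$mt:CategoryArchiveLink$>" />
</mt:IfArchiveType>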
@Charlie: maybe I'm missing something, but your solution basically places a canonical link on each entry pointing to… itself, which is a no-no for search engines (the link should point to another page that's considered the canonical one).
@user2359284: you need a way to define the canonical URL for the entries that need this link. As Shmuel suggested, either reuse an unused field or use a custom-field plugin. Then you simply add that link to the header in the archive template that outputs your posts. Assuming the Entry template includes the same header as other templates, and that, say, you're using the Keywords field to store the URL, the following code should work (the mt:IfArchiveType test simply ensures it's output in the proper context, which you don't need if your Entry template has its own header code):
<mt:IfArchiveType type="Individual">
<link rel="canonical" href="<$mt:EntryKeywords$>" />
</mt:IfArchiveType>
I am a bit confused about the semantics of websites. I understand that every URI should represent a resource, and I assume that all information provided by RDFa inside a web page describes the resource represented by the URI of that page. My question is: what are the best practices for providing semantic data for the subpages of a website?
In my case, I want to create a website for a theater group called magma, using RDFa with the schema.org and Open Graph vocabularies. Let's say I have a welcome page (http://magma.com/), a contact page (http://magma.com/contact/), and pages for individual plays (http://magma.com/play/<playid>/).
Now I would think that both the welcome page and the contact page represent the same resource (magma) while providing different information about that resource. The play pages, however, represent plays that merely happen to be performed by magma. Or is it better to say that the play pages also represent magma but provide information about plays which will be performed by that group? The third option I stumbled upon is http://schema.org/WebPage; especially subtypes like ContactPage seem relevant.
When it comes to implementation, where do I put the RDFa?
And finally: how will my choice change the way the website is treated by third parties (Google, Facebook, …)?
I realize this question is a bit vague. To make it more concrete, I will add an example that you might criticize:
<html vocab="http://schema.org/" typeof="TheaterGroup">
<head>
<meta charset="UTF-8"/>
<title>Magma - Romeo and Juliet</title>
<!-- magma semantics from a template file -->
<meta property="name" content="Magma"/>
<meta property="logo" content="/static/logo.png"/>
<link rel="home" property="url" content="http://magma.com/"/>
</head>
<body>
<h1>Romeo and Juliet</h1>
<!-- semantics of the play -->
<div typeof="CreativeWork" name="Romeo and Juliet">
...
</div>
<h2>Shows</h2>
<!-- semantics of magma events -->
<ul property="events">
<li typeof="Event"><time property="startDate">...</time></li>
...
</ul>
</body>
</html>
I understand that every URI should represent a resource. I assume that all information provided by RDFa inside a webpage describes the resource represented by the URI of that webpage.
Well, an HTTP URI could identify the page itself OR the thing the page is about. You can't tell whether a URI identifies the page or the thing simply by looking at it.
Example (in Turtle syntax):
<http://en.wikipedia.org/wiki/The_Lord_of_the_Rings> ex:author "John Doe" .
This could mean that the HTML page with the URI http://en.wikipedia.org/wiki/The_Lord_of_the_Rings is authored by "John Doe". Or it could mean that the thing described by that HTML page (→ the novel) is authored by "John Doe". Of course this is an important difference.
There are various ways to differentiate what a URI represents, and there is some dispute about it. The discussion around this is known as the httpRange-14 issue. See, for example, the Wikipedia article Web resource.
One way is using hash URIs (see also this answer). Example: http://magma.com/play/42 could identify the page about the play, while http://magma.com/play/42#play could identify the play itself.
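A sketch of what that could look like in RDFa on the play page (the #play fragment and the property names are just illustrative choices):
<!-- on http://magma.com/play/42 -->
<div vocab="http://schema.org/" resource="#play" typeof="CreativeWork">
  <h1 property="name">Romeo and Juliet</h1>
</div>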
Another way is using the HTTP status code 303 See Other: a URI answering with 200 identifies the page itself, while a URI answering with 303 identifies the thing and redirects to a page about it. This method is used by DBpedia:
http://dbpedia.org/resource/The_Lord_of_the_Rings represents the novel
http://dbpedia.org/page/The_Lord_of_the_Rings represents the page about the novel
(resp. http://dbpedia.org/data/The_Lord_of_the_Rings for machines)
See Choosing between 303 and Hash.
Now, when using RDFa, you can make statements about both the page itself and the thing represented by the page. Just use the corresponding URI as the subject (e.g., by using the resource attribute).
So let's say http://magma.com/#magma represents the theater group. Now you could use this URI on every page (/contact, /play/, …) to make statements about the group or to refer to it.
<div resource="http://magma.com/#magma">
<span property="ex:name">Magma</span>
</div>
<div resource="http://magma.com/">
<span property="ex:name">Website of Magma</span>
</div>
I suggest that you first look at schema.org's straightforward documentation. This vocabulary covers your concerns comprehensively and is supported by the major search engines.
Here is a snippet example to get you started; you can include this straight in an HTML page. On a page about a performance of the play, you could use:
<div itemscope itemtype="http://schema.org/TheaterEvent">
<h1 itemprop="name">Romeo and Juliet</h1>
<span itemprop="location">Council Bluffs, IA, US</span>
<meta itemprop="startDate" content="2011-05-23">May 23
Buy tickets
</div>
On your contact page you could include:
<div itemscope itemtype="http://schema.org/TheaterGroup">
<span itemprop="name">Magma</span>
Tel: <span itemprop="telephone">(+33 1) 42 68 53 00</span>
</div>
One of the core ideas behind HATEOAS is that clients should be able to start from a single entry-point URL and discover all exposed resources and the state transitions available for them. While I can perfectly see how that works with HTML and a human behind a browser clicking links and "Submit" buttons, I'm puzzled about how this principle can be applied to the problems I'm (un)lucky enough to deal with.
I like how the RESTful design principles are presented in papers and educational articles, where it all makes sense; How to GET a Cup of Coffee is a good example. I'll try to follow that convention and come up with an example which is simple and free of tedious details. Let's look at zip codes and cities.
Problem 1
Let's say I want to design a RESTful API for finding cities by zip code. I come up with a resource called 'cities' nested under zip codes, so that a GET on http://api.addressbook.com/zip_codes/02125/cities returns a document containing, say, two records which represent Dorchester and Boston.
My question is: how can such a URL be discovered through HATEOAS? It's probably impractical to expose an index of all ~40K zip codes under http://api.addressbook.com/zip_codes. Even if a 40K-item index is not a problem, remember that I've made this example up, and there are collections of much greater magnitude out there.
So essentially, I would want to expose not a link but a link template, like this: http://api.addressbook.com/zip_codes/{:zip_code}/cities, and that goes against the principles and relies on out-of-band knowledge possessed by the client.
Problem 2
Let's say I want to expose a cities index with certain filtering capabilities:
GET on http://api.addressbook.com/cities?name=X would return only cities with names matching X.
GET on http://api.addressbook.com/cities?min_population=Y would only return cities with population equal or greater than Y.
Of course these two filters can be used together: http://api.addressbook.com/cities?name=X&min_population=Y.
Here I'd like to expose not only the URL, but also these two possible query options and the fact that they can be combined. This seems to be simply impossible without the client's out-of-band knowledge of the semantics of those filters and of the principles behind combining them into dynamic URLs.
So how can the principles behind HATEOAS help make such a trivial API really RESTful?
I suggest using XHTML forms:
GET /
HTTP/1.1 200 OK
<form method="get" action="/zip_code_search" rel="http://api.addressbook.com/rels/zip_code_search">
<p>Zip code search</p>
<input name="zip_code"/>
</form>
GET /zip_code_search?zip_code=02125
HTTP/1.1 303 See Other
Location: /zip_code/02125
What's missing in HTML is a rel attribute for form.
Check out this article:
To summarize, there are several reasons to consider XHTML as the default representation for your RESTful services. First, you can leverage the syntax and semantics for important elements like <a>, <form>, and <input> instead of inventing your own. Second, you'll end up with services that feel a lot like sites because they'll be browsable by both users and applications. The XHTML is still interpreted by a human—it's just a programmer during development instead of a user at runtime. This simplifies things throughout the development process and makes it easier for consumers to learn how your service works. And finally, you can leverage standard Web development frameworks to build your RESTful services.
Also check out OpenSearch.
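For instance, an OpenSearch description document exposes a URL template that a client can discover and fill in. This is only a sketch reusing the made-up addressbook URLs from the question:
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Zip code search</ShortName>
  <Description>Find the cities for a given zip code</Description>
  <Url type="application/xhtml+xml"
       template="http://api.addressbook.com/zip_code_search?zip_code={searchTerms}"/>
</OpenSearchDescription>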
To reduce the number of requests, consider this response:
HTTP/1.1 200 OK
Content-Location: /zip_code/02125
<html>
<head>
<link href="/zip_code/02125/cities" rel="related http://api.addressbook.com/rels/zip_code/cities"/>
</head>
...
</html>
This solution comes to mind, but I'm not sure that I'd actually recommend it: instead of returning a resource URL, return a WADL URL that describes the endpoint. Example:
<application xmlns="http://wadl.dev.java.net/2009/02" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <grammars/>
  <resources base="http://localhost:8080/cities">
    <resource path="/">
      <method name="GET">
        <request>
          <param name="name" style="query" type="xs:string"/>
          <param name="min-population" style="query" type="xs:int"/>
        </request>
        <response>
          <representation mediaType="application/octet-stream"/>
        </response>
      </method>
    </resource>
  </resources>
</application>
That example was autogenerated by CXF from this Java code:
import javax.ws.rs.GET;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

public class Cities {
    @GET
    public Response get(@QueryParam("name") String name, @QueryParam("min-population") int min_population) {
        // TODO: build the real response
        return Response.ok().build();
    }
}
In answer to question 1, I'm assuming your single entry point is http://api.addressbook.com/zip_codes, and the intention is to enable the client to traverse the entire collection of zip codes and ultimately retrieve the cities related to them.
In that case I would make the http://api.addressbook.com/zip_codes resource return a redirect to the first page of zip codes, for example:
http://api.addressbook.com/zip_codes?start=0&end=xxxx
This would contain a "page" worth of zip code links (whatever number is suitable for the system to handle), plus a link to the next page (and to the previous page if there is one).
This would enable a client to crawl the entire list of zip codes if it so desired.
The URLs returned in each page would look similar to this:
http://api.addressbook.com/zip_codes/02125
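Concretely, such a page of zip codes might be an XHTML document along these lines (the rel values here are only an illustration, not part of the original design):
<!-- response for /zip_codes?start=0&end=99 -->
<ul>
  <li><a rel="item" href="http://api.addressbook.com/zip_codes/02125">02125</a></li>
  <li><a rel="item" href="http://api.addressbook.com/zip_codes/02126">02126</a></li>
  ...
</ul>
<a rel="next" href="http://api.addressbook.com/zip_codes?start=100&amp;end=199">Next page</a>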
And then it would be a matter of deciding whether to include the city information in the representation returned by a zip code URL, or just a link to it, depending on the need.
Now the client has a choice: traverse the entire list of zip codes and then request each zip code (and then its cities), or request a page of zip codes and drill down to a particular one as needed.
I was running into these same questions - so I worked through a practical example that solves both of these problems (and a few you haven't thought of yet). http://thereisnorightway.blogspot.com/2012/05/api-example-using-rest.html?m=1
Basically, the solution to problem 1 is that you change your representation (as Roy says, spend your time on the resource). You don't have to return all zips; just make your resource contain paging. As an example, when you request news pages from a news site, it gives you today's news and links to more, even though all the articles may live under the same URL structure, i.e. ...article/123, etc.
Problem 2 is a little awkward. There is a little-used HTTP method called OPTIONS that I used in the example to basically reflect the URL's capabilities; you could solve this in the representation too, it would just be more complicated. Basically, it gives back a custom structure that shows the capabilities of the resource (including optional parameters).
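A rough sketch of what such an exchange could look like (the capability document below is made up purely for illustration; the linked post defines its own structure):
OPTIONS /cities HTTP/1.1
Host: api.addressbook.com

HTTP/1.1 200 OK
Allow: GET, OPTIONS
Content-Type: application/xml

<!-- hypothetical capability document listing the optional query parameters -->
<resource href="/cities">
  <param name="name" style="query" optional="true"/>
  <param name="min_population" style="query" optional="true"/>
</resource>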
Let me know what you think!
I feel like you skipped over the bookmark URL. That is the first URL, not the ones for getting cities or zip codes.
So you start at ab:=http://api.addressbook.com
This first link returns a list of available links. This is how the web works: you go to www.yahoo.com and then you start clicking links without knowing in advance where they go.
So from the original link ab: you would get back the other links, and they could carry rel links that explain how those resources should be accessed or what parameters can be submitted.
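For example, the entry-point document might be nothing more than a small HTML page of typed links (the rel URIs here are invented for illustration):
<!-- response for GET http://api.addressbook.com -->
<ul>
  <li><a rel="http://api.addressbook.com/rels/zip_codes" href="/zip_codes">Zip codes</a></li>
  <li><a rel="http://api.addressbook.com/rels/cities" href="/cities">Cities</a></li>
</ul>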
The first thing we did when designing our systems was to start from the bookmark page and determine all the different links that could be accessed.
I do agree with you about the 'client's out-of-band knowledge of the semantics of those filters': it's hard for me to buy that a machine can just adapt to what is there unless it has some preconceived specification like HTML. It's more likely that the client is built by a developer who knows all the possibilities and then codes the application to 'potentially' expect those links to be available. If a link is available, the program can use the logic the developer implemented beforehand to act on the resource. If it's not there, it just doesn't follow the link. In the end, the possible paths are laid out before the client begins traversing the application.