How to restrict publishing of RDF graphs on the Semantic Web?

I am trying to create a sample ontology with some dummy data using Protégé 5.5. But the OWL file it generated shows something like this:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.semanticweb.org/hs/ontologies/2019/3/untitled-ontology-3#"
xml:base="http://www.semanticweb.org/hs/ontologies/2019/3/untitled-ontology-3"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<owl:Ontology rdf:about="http://www.semanticweb.org/hs/ontologies/2019/3/untitled-ontology-3"/>
Seems like this data can be accessed publicly (http://www.semanticweb.org/hs/ontologies/2019/3/untitled-ontology-3). I don't wish to publish my data on the Semantic Web. Is there any way to keep this data private? I could not find the answer on the web.

No, just because a URI shows up in an OWL or RDF file doesn't mean the data is publicly accessible. A local file on your computer is just a local file, until you upload it to a server somewhere.
OWL and RDF use URIs mainly as identifiers, that is, as names that allow different programs and people to work out whether they are talking about the same thing. So, if your ontology and my ontology use the same URI for some entity, we know that we are talking about the same entity. This doesn't mean your ontology or my ontology is publicly accessible, and it works even if we keep both ontologies private.
By convention, the owner of a URI gets to decide what entity to use a URI for. This ensures that there are no accidental clashes. Ownership of URIs is based on domain names. For example, the owner of the domain dbpedia.org (the DBpedia project) has decided that http://dbpedia.org/resource/London is a URI that names the city of London. They also happen to publish some data about London at that URI, which is a good way of letting the world know what the URI identifies.
Protégé is actually a bad citizen of the web by encouraging people to use URIs on a domain they don't own (www.semanticweb.org).
If you don't own a domain, you can use http://example.org/ for local experiments and private use, because that domain is explicitly allowed to be used by anyone. But if you actually decide to publish your ontology/data at some point, then you should change to a real domain.
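For instance, here is a minimal sketch of declaring an ontology under example.org with Apache Jena; the ontology name and prefix are made up for illustration:
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class LocalOntology {
    public static void main(String[] args) {
        // example.org is reserved for exactly this kind of private use
        String base = "http://example.org/my-ontology";
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("ex", base + "#");
        Resource ontology = model.createResource(base);
        ontology.addProperty(RDF.type, OWL.Ontology);
        // Nothing here is published anywhere; the output stays local
        // until you choose to host it on a real domain.
        model.write(System.out, "RDF/XML");
    }
}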

Related

What's the correct way to create an endpoint in a REST API

I'm drawing my API routes.
A user has projects, projects have actors, actors have addresses.
There is no address without an actor, there is no actor without a project, and there is no project without a user.
Would this be the correct way to build the endpoint?
GET /users/{user_id}/projects/{project_id}/actors/{actor_id}/addresses
There is no such thing as a REST endpoint. There are resources. -- Fielding, 2018
What you seem to be asking about here is how to design the identifier for your resource.
REST doesn't care what spelling conventions you use for your resource identifiers, so long as they satisfy the production rules described by RFC 3986.
Identifiers that load data into the query part are convenient if you are expecting to leverage HTML forms:
GET /addresses?user={user_id}&project={project_id}&actor={actor_id}
But that design is not particularly convenient if you are expecting to use dot segments to reference other resources.
Choosing some alternative that is described by a URI Template will make some things easier down the road.
/users/{user_id}/projects/{project_id}/actors/{actor_id}/addresses
That's fine. Other spellings would also be fine (hint: URL shorteners work).
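For illustration, here is a tiny Java sketch of expanding such a template; this is a hand-rolled, level-1-style expansion with made-up values, not a full RFC 6570 implementation:
import java.util.Map;

public class UriTemplateDemo {
    // Naive expansion: replace each {name} with its value.
    static String expand(String template, Map<String, String> vars) {
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "/users/{user_id}/projects/{project_id}/actors/{actor_id}/addresses";
        System.out.println(expand(template,
                Map.of("user_id", "42", "project_id", "7", "actor_id", "3")));
        // prints: /users/42/projects/7/actors/3/addresses
    }
}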
Broadly, you choose identifier spellings by thinking about the different contexts in which a human being has to look at the URI (documentation for your API, browser histories, access logs, etc.) and choose a spelling that works well in at least one of those settings.

REST API: How to name a derived resource?

There are a gazillion questions about RESTful interface naming conventions, especially around singular vs. plural resource names. A common convention is:
GET /users Retrieve collection of users
GET /users/{id} Retrieve user
POST /users Create user
PUT /users/{id} Update user
DELETE /users/{id} Delete user
However, the above does not work when the resource is a value derived from the environment.
My hypothetical application has the following endpoint:
GET /source Get information about the source of the query.
That responds with:
Referrer URL
Remote IP
Since the source is derived from the environment, there is never more than one source, so calling the resource sources or providing a sources/{foo} lookup is not practical.
Does the REST style propose how to handle the naming of such entities?
Dr. Fielding notes in section 6.2.1 of his famous dissertation:
...authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time.
Therefore, it makes sense to use a plain source endpoint.
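As a sketch, a JAX-RS resource for such a singleton could look like the following; the class name and the JSON shape are illustrative, nothing here is prescribed by REST:
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.MediaType;

@Path("/source")
public class Source {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String get(@Context HttpHeaders headers,
                      @Context HttpServletRequest request) {
        // Both values are derived from the environment of this request.
        String referrer = headers.getHeaderString("Referer"); // may be null
        String remoteIp = request.getRemoteAddr();
        return String.format("{\"referrer\":\"%s\",\"remote_ip\":\"%s\"}",
                referrer, remoteIp);
    }
}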
It would be a different thing if you wanted to provide a more general service around the IP address provided, like this one.

Ontologies on linked open data cloud

I am currently working on the Linked Open Data cloud and would like to know whether it is possible to obtain the ontologies of the datasets present in the LOD cloud.
Publishing in the Linked Open Data cloud is as easy as making your data publicly available, for example as RDF, as HTML, or via a SPARQL endpoint.
To spread awareness about your data, so that people can use it and link to it, describe your data according to Guidelines for Collecting Metadata on Linked Datasets in CKAN and add it to datahub.io (a canonical CKAN installation).
For an intro to publishing open data, have a look at Richard Cyganiak's presentation. A comprehensive guide is the book Linked Data: Evolving the Web into a Global Data Space.
Linked Open Data normally contains both ontologies and instance data. I don't think that the ontologies of LOD datasets are available separately. There are various ontology repositories (see "Where can I find useful ontologies?" and the ontology repositories list at W3C), but none of them is really comprehensive and well maintained, as far as I know.
For most datasets, you should be able to get the ontology by dereferencing its namespace URL. For example, if I take the arbitrary LOD dataset http://datahub.io/dataset/oferta-empleo-zaragoza/resource/6704184d-42e3-4778-a576-826f1d3672e2 I can see that it starts with:
<rdf:RDF
xmlns:j.0="http://purl.org/dc/terms/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:j.1="http://purl.org/ctic/empleo/oferta#" >
So if you point your browser to http://purl.org/ctic/empleo/oferta# it will redirect to http://data.fundacionctic.org/vocab/empleo/oferta.html, the HTML description of the ontology, which also links to an RDF representation: http://data.fundacionctic.org/vocab/empleo/oferta_20130903.rdf
The namespace URL has content negotiation configured, so you can get the ontology as RDF directly if you set the header "Accept: application/rdf+xml" on the GET request:
curl -LH "Accept: application/rdf+xml" http://purl.org/ctic/empleo/oferta#
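Programmatically, a library like Apache Jena performs this content negotiation for you. A minimal sketch, assuming the server still offers an RDF representation at that URL:
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class FetchOntology {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // read() sends RDF Accept headers and follows the PURL redirect
        model.read("http://purl.org/ctic/empleo/oferta");
        System.out.println("Loaded " + model.size() + " triples");
    }
}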

RESTful api design, HATEOAS and resource discovery

One of the core ideas behind HATEOAS is that clients should be able to start from a single entry point URL and discover all exposed resources and the state transitions available for them. While I can perfectly well see how that works with HTML and a human behind a browser clicking on links and "Submit" buttons, I'm puzzled about how this principle can be applied to the problems I'm (un)lucky enough to deal with.
I like how the RESTful design principles are presented in papers and educational articles, where it all makes sense; How to GET a Cup of Coffee is a good example. I'll try to follow that convention and come up with an example that is simple and free from tedious details. Let's look at zip codes and cities.
Problem 1
Let's say I want to design a RESTful API for finding cities by zip code. I come up with a resource called 'cities' nested under zip codes, so that a GET on http://api.addressbook.com/zip_codes/02125/cities returns a document containing, say, two records representing Dorchester and Boston.
My question is: how can such a URL be discovered through HATEOAS? It's probably impractical to expose an index of all ~40K zip codes under http://api.addressbook.com/zip_codes. Even if a 40K-item index is not a problem, remember that I've made this example up and there are collections of much greater magnitude out there.
So essentially, I would want to expose not a link but a link template, like this: http://api.addressbook.com/zip_codes/{:zip_code}/cities, and that goes against the principles and relies on out-of-band knowledge possessed by the client.
Problem 2
Let's say I want to expose a cities index with certain filtering capabilities:
GET on http://api.addressbook.com/cities?name=X would return only cities with names matching X.
GET on http://api.addressbook.com/cities?min_population=Y would return only cities with a population equal to or greater than Y.
Of course these two filters can be used together: http://api.addressbook.com/cities?name=X&min_population=Y.
Here I'd like to expose not only the URL but also the two possible query options and the fact that they can be combined. This seems to be simply impossible without the client having out-of-band knowledge of the semantics of those filters and the principles behind combining them into dynamic URLs.
So how can the principles behind HATEOAS help make such a trivial API really RESTful?
I suggest using XHTML forms:
GET /
HTTP/1.1 200 OK
<form method="get" action="/zip_code_search" rel="http://api.addressbook.com/rels/zip_code_search">
<p>Zip code search</p>
<input name="zip_code"/>
</form>
GET /zip_code_search?zip_code=02125
HTTP/1.1 303 See Other
Location: /zip_code/02125
What's missing in HTML is a rel attribute for forms.
Check out this article:
To summarize, there are several reasons to consider XHTML as the default representation for your RESTful services. First, you can leverage the syntax and semantics for important elements like <a>, <form>, and <input> instead of inventing your own. Second, you'll end up with services that feel a lot like sites because they'll be browsable by both users and applications. The XHTML is still interpreted by a human—it's just a programmer during development instead of a user at runtime. This simplifies things throughout the development process and makes it easier for consumers to learn how your service works. And finally, you can leverage standard Web development frameworks to build your RESTful services.
Also check out OpenSearch.
To reduce the number of requests, consider this response:
HTTP/1.1 200 OK
Content-Location: /zip_code/02125
<html>
<head>
<link href="/zip_code/02125/cities" rel="related http://api.addressbook.com/rels/zip_code/cities"/>
</head>
...
</html>
This solution comes to mind, but I'm not sure that I'd actually recommend it: instead of returning a resource URL, return a WADL URL that describes the endpoint. Example:
<application xmlns="http://wadl.dev.java.net/2009/02" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<grammars/>
<resources base="http://localhost:8080/cities">
<resource path="/">
<method name="GET">
<request>
<param name="name" style="query" type="xs:string"/>
<param name="min-population" style="query" type="xs:int"/>
</request>
<response>
<representation mediaType="application/octet-stream"/>
</response>
</method>
</resource>
</resources>
</application>
That example was autogenerated by CXF from this Java code:
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/") // root resource path, matching the WADL's <resource path="/">
public class Cities {
    @GET
    public Response get(@QueryParam("name") String name,
                        @QueryParam("min-population") int minPopulation) {
        // TODO: build the real response
        return Response.ok().build();
    }
}
In answer to question 1, I'm assuming your single entry point is http://api.addressbook.com/zip_codes, and the intention is to enable the client to traverse the entire collection of zip codes and ultimately retrieve the cities related to them.
In which case I would make the http://api.addressbook.com/zip_codes resource return a redirect to the first page of zip codes, for example:
http://api.addressbook.com/zip_codes?start=0&end=xxxx
This would contain a "page" worth of zip code links (whatever number is suitable for the system to handle), plus a link to the next page (and the previous page if there is one).
This would enable a client to crawl the entire list of zip codes if it so desired.
The URLs returned in each page would look similar to this:
http://api.addressbook.com/zip_codes/02125
And then it would be a matter of deciding whether to include the city information in the representation returned by a zip code URL, or a link to it, depending on the need.
Now the client has a choice whether to traverse the entire list of zip codes and then request the zip code (and then the cities) for each, or request a page of zip codes and then drill down to a particular one.
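A rough JAX-RS sketch of that paging scheme; the path, parameter names, page size, and plain-text representation are all illustrative:
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;

@Path("/zip_codes")
public class ZipCodes {
    private static final int PAGE_SIZE = 100;

    @GET
    public String page(@QueryParam("start") @DefaultValue("0") int start) {
        // Fake data: a real implementation would read zip codes from storage.
        String links = IntStream.range(start, start + PAGE_SIZE)
                .mapToObj(i -> String.format("/zip_codes/%05d", i))
                .collect(Collectors.joining("\n"));
        // One "page" of links, plus the link a client follows to keep crawling.
        return links + "\nnext: /zip_codes?start=" + (start + PAGE_SIZE);
    }
}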
I was running into these same questions - so I worked through a practical example that solves both of these problems (and a few you haven't thought of yet). http://thereisnorightway.blogspot.com/2012/05/api-example-using-rest.html?m=1
Basically, the solution to problem 1 is that you change your representation (as Roy says, spend your time on the resource). You don't have to return all zip codes; just make your resource contain paging. As an example, when you request news pages from a news site, it gives you today's news and links to more, even though all the articles may live under the same URL structure, i.e. ...article/123, etc.
Problem 2 is a little awkward. There is a little-used method in HTTP called OPTIONS that I used in the example to reflect the URL's capabilities. You could solve this in the representation too; it would just be more complicated. Basically, it gives back a custom structure that shows the capabilities of the resource (including optional parameters).
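A sketch of that OPTIONS idea; the capability format below is made up, since HTTP prescribes no body format for OPTIONS responses:
import javax.ws.rs.OPTIONS;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

@Path("/cities")
public class CitiesCapabilities {
    @OPTIONS
    public Response options() {
        // A custom structure describing what the resource supports,
        // including the optional, combinable query parameters.
        String capabilities = "GET /cities?name={name}&min_population={n}\n"
                + "optional parameters: name, min_population (combinable)";
        return Response.ok(capabilities)
                .header("Allow", "GET, OPTIONS")
                .build();
    }
}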
Let me know what you think!
I feel like you skipped over the bookmark URL. That is the first URL, not the ones to get cities or zip codes.
So you start at ab:=http://api.addressbook.com
This first link returns a list of available links. This is how the web works. You go to www.yahoo.com and then you start clicking links, not knowing where they go.
So from the original link ab: you would get back the other links, and they could have rel links that explain how those resources should be accessed or what parameters can be submitted.
The first thing we did when designing our systems was to start from the bookmark page and determine all the different links that could be accessed.
I do agree with you about the 'client's out-of-band knowledge of the semantics of those filters'; it's hard for me to buy that a machine can just adapt to what is there unless it has some preconceived specification like HTML. It's more likely that the client is built by a developer who knows all the possibilities and then codes the application to 'potentially' expect those links to be available. If a link is available, the program can use the logic the developer implemented beforehand to act on the resource. If it's not there, it just doesn't execute the link. In the end, the possible paths are laid out before the client begins to traverse the application.

Semantic store and entity hub

I am working on a content platform that should provide semantic features such as querying with SPARQL and providing RDF documents for the contained content.
I would be very thankful for some clarification on the following questions:
Did I get that right, that an entity hub can connect several semantic stores to a single point of access? And if not, what is the difference between a semantic store and an entity hub?
What frameworks would you use to store content documents as well as their semantic annotation?
It is important for the solution to be able to later retrieve the documents (HTML pages / docs such as PDF, DOC, ...) and their annotated versions.
Thanks in advance,
Chris
The only Entityhub term that I know of belongs to the Apache Stanbol project. Here is a paragraph from the original documentation explaining what the Entityhub does:
The Entityhub provides two main services. The Entityhub provides the connection to external linked open data sites as well as using indexes of them locally. Its services allow to manage a network of sites to consume entity information and to manage entities locally.
Entityhub documentation:
http://incubator.apache.org/stanbol/docs/trunk/entityhub.html
The Enhancer component of Apache Stanbol extracts external entities related to the submitted content, using the linked open data sites managed by the Entityhub. These enhancements of content are formed as RDF data. It is then also possible to store those content items in Apache Stanbol and run SPARQL queries on top of the RDF enhancements. The Contenthub component of Apache Stanbol also provides faceted search functionality over the submitted content items.
Documentation of Apache Stanbol:
http://incubator.apache.org/stanbol/docs/trunk/
Access to running demos:
http://dev.iks-project.eu/
You can also send further questions to stanbol-dev AT incubator.apache.org.
Alternative suggestion...
Drupal 7 has in-built RDFa support for annotation and is more of a general purpose CMS than Semantic MediaWiki
In more detail...
I'm not really sure what you mean by entity hub; where are you getting that definition from, or what do you mean by it?
Yes, one can easily write a system that connects to multiple semantic stores; given the context of your question, I assume you are referring to RDF triple stores?
Any decent CMS should assign some form of unique/persistent ID to documents, so even if the system you go with does not support semantic annotation natively, you could build your own extension for this. The extension would simply store annotations against the document's ID in whatever storage layer you choose (I'd assume a triple store would be appropriate), and then you can build appropriate query and presentation layers for querying and viewing this data as required (see the sketch after the links below).
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
Apache Stanbol
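A toy sketch of that extension idea, storing annotations against document IDs in an in-memory Jena model; all URIs here are invented for illustration, and a real system would use a persistent triple store:
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class AnnotationStore {
    private final Model model = ModelFactory.createDefaultModel();
    private final Property annotation =
            model.createProperty("http://example.org/vocab#annotation");

    // Store an annotation against the CMS document's persistent ID.
    public void annotate(String documentId, String text) {
        Resource doc = model.createResource("http://example.org/docs/" + documentId);
        doc.addProperty(annotation, text);
    }

    public static void main(String[] args) {
        AnnotationStore store = new AnnotationStore();
        store.annotate("doc-42", "mentions the city of London");
        store.model.write(System.out, "TURTLE");
    }
}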
Do you want to implement a traditional CMS extended with some semantic capabilities, or do you want to build a Semantic CMS? It could look the same, but these are actually two completely opposite approaches.
It is important for the solution to be able to later retrieve the documents (HTML pages / docs such as PDF, DOC, ...) and their annotated versions.
You can integrate Apache Stanbol with a JCR/CMIS-compliant CMS like Alfresco. To get custom annotations, I suggest creating your own custom enhancement engine (there is a Maven archetype for this) based on your domain and adding it to the enhancement engine chain.
https://stanbol.apache.org/docs/trunk/components/enhancer/
Once this is done, you can use the REST API endpoints provided by Stanbol to retrieve the results in RDF/Turtle format.