Semantic store and entity hub - semantics

I am working on a content platform that should provide semantic features such as querying with SPARQL and providing rdf documents for the contained content.
I would be very thankful for some
clarification on the following
questions:
Did I get that right, that an entity
hub can connect several semantic
stores to a single point of access?
And if not, what is the difference
between a semantic store and an
entity hub?
What frameworks would you use to
store content documents as well as
their semantic annotation?
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
Thanks in advance,
Chris

The only Entityhub term that I know is belong to Apache Stanbol project. And here is a paragraph from the original documentation explaining what Entityhub does:
The Entityhub provides two main services. The Entityhub provides the
connection to external linked open data sites as well as using indexes
of them locally. Its services allow to manage a network of sites to
consume entity information and to manage entities locally.
Entityhub documentation:
http://incubator.apache.org/stanbol/docs/trunk/entityhub.html
Enhancer component of Apache Stanbol provides extracting external entities related with the submitted content using the linked open data sites managed by Entityhub. These enhancements of contents are formed as RDF data. Then, it is also possible to store those content items in Apache Stanbol and run SPARQL queries on top of RDF enhancements. Contenthub component of Apache Stanbol also provides faceted search functionality over the submitted content items.
Documentation of Apache Stanbol:
http://incubator.apache.org/stanbol/docs/trunk/
Access to running demos:
http://dev.iks-project.eu/
You can also ask your further questions to stanbol-dev AT incubator.apache.org.

Alternative suggestion...
Drupal 7 has in-built RDFa support for annotation and is more of a general purpose CMS than Semantic MediaWiki
In more detail...
I'm not really sure what you mean by entity hub, where are you getting that definition from or what do you mean by it?
Yes one can easily write a system that connects to multiple semantic stores, given the context of your question I assume you are referring to RDF Triple Stores?
Any decent CMS should be assigning documents some form of unique/persistent ID to documents so even if the system you go with does not support semantic annotation natively you could build your own extension for this. The extension would simply store annotations against the documents ID in whatever storage layer you chose (I'd assume a Triple Store would be appropriate) and then you can build appropriate query and presentation layers for querying and viewing this data as required.

http://semantic-mediawiki.org/wiki/Semantic_MediaWiki

Apache Stanbol

Do you want to implement a traditional CMS extended with some Semantic capabilities, or do you want to build a Semantic CMS? It could look the same, but actually both a two completely opposite approaches.

It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
You can integrate Apache Stanbol with a JCR/CMIS compliant CMS like Alfresco. To get custom annotations, I suggest creating your own custom enhancement engine (maven archetype) based on your domain and adding it to the enhancement engine chain.
https://stanbol.apache.org/docs/trunk/components/enhancer/
One this is done, you can use the REST API endpoints provided by Stanbol to retrieve the results in RDF/Turtle format.

Related

Python / rdflib HTTP server for sparql endpoint

Is there a capability for or example of creating a Sparql HTTP endpoint with rdflib? We would want it to follow the spec and be able to return json and/or csv formats. This would mostly be for POC usage. It would also be possible to use Javascript/Node.
Thanks!
You might try https://github.com/rdflib/pyLDAPI. It's been touched much more recently than https://github.com/RDFLib/rdflib-web and there are some public examples of it to follow, e.g. https://geofabricld.net. Also, the SKOS-specific tool VocPrez uses it under the hood.
As of earlier this year, pyLDAPI implements the W3C's Content Negotiation by Profile specification which is, I suppose, the latest an greatest Linked Data API-relevant specification, although it's not just for Linked Data APIs.
Feel free to contact me directly if you need more of a hand with this.

How to create a searchable central repository of code documentation using DocFx

I'm looking to create a central repository for all of our published API documentation using DocFx. I have documentation auto-generated via my build (using TFS) and published through my release (using Octopus) just fine for multiple individual sites. However, I'm wanting to pull it altogether in one location. The thinking is that through a parent site you could filter content in any of the individual sites without having to drill down into them. Do you have a recommendation on how to do this?
Also, within this same documentation repository I want to provide the capability to search by all of the meta data (project-level documentation) across the hundreds of projects in our portfolio. This will give our BA, DEV and QA teams easier access to what all our systems do. I like the "filtering" capability built into DocFx, but I'm wanting full-text search across all of the meta data. Do you have a recommendation for this functionality as well?
To change the location of the docfx output, edit the docfx.json file and specify the dest value. By default it is "dest": "_site". For more formatting guidance, reference: https://dotnet.github.io/docfx/tutorial/docfx.exe_user_manual.html.
Regarding full-text search, that is possible by simply ensuring the ExtractSearchIndex post-processor is invoked (in order to generate an index.json file of keywords) and that the global _enableSearch value is set to true in the docfx.json file. A snippet from that file would look like:
"postProcessors": [ "ExtractSearchIndex" ],
"globalMetadata": {
"_enableSearch": "true"
}
For your first question:
I think what you expect is like the .NET API Browser. The source code behind this page is not open to public, so you need create this page by yourself, through collecting xrefmap.yml from multiple sites, and extract the needed data into this page.
For your second question:
DocFX uses Luna to scan all the output files and generate an index file called index.json for later search use. In your case, you should want to limit the search scope only in the metadata you defined. This is also not supported by DocFX by default. You can also use Luna in your central place to search these meta. You can create your specific index.json for each project first, and the cental place to collect them for the search page.

How do I find data sets using a particular Schema.org entity?

I am trying to decide whether to use schema.org entities in my own open source app, for potential compatibility with existing open data sets. So I'm looking for usage of relevant schema.org entities "in the wild".
Right now I'm looking for dietary supplement data, IE http://schema.org/DietarySupplement, or http://health-lifesci.schema.org/DietarySupplement
I've been searching for semantic web search engines, and have only found Swoogle, but I get no results for that URI, or "service temporarily unavailable".
The DietarySupplement page on schema.org says that "between 10 and 100" domains are using this entity. Is that talking about DNS, abstract domains that are defined on Schema.org, abstractions defined elsewhere, or something else?
There are only a couple of other resources I can find on this subject.
Web Data Commons - RDFa, Microdata, and Microformat Data
Sets
BuiltWith trends - Microdata Usage Statistics

Authors fields in domino DDS REST api

I am building a Javascript Web application with a Domino back end, using the Domino DDS REST api to do POST, PUT, and GET operations against the database. I want to use Authors and Readers fields in documents to control which users can see which documents and to give users with Author access in the ACL the ability to edit documents they have created. When doing a POST of a new document (implemented by the save() method of a new Backbone model) is there a way to designate one or more fields as Readers or Authors?
Doing a GET on an existing document returns a JSON object with an attribute named '#authors' containing the names and roles in the Authors fields. Is this attribute read/write?
Can I populate #authors with the desired values before doing a POST to have these values control author access?
My colleague says the Domino REST api makes no provision for setting Authors and Readers fields, and that this functionality can only be done through Java servlets. Is this right?
I'm not familiar with the Domino DDS REST API, but from what I gather it is doubtfull that when POSTing a document, you get to chose the type of the fields. I suspect they all end up as text.
What you could do however is to link the action of your form to a Domino agent which, using the backend Java or LotusScript API, will be able to control precisely the final shape of your document, hereby allowing you to fully utilize the powerfull security model of Domino.
Nevertheless, keep in mind that at some point, your users will have to authenticate against the Domino Directory. Depending where your users originally log in, you may need to talk to your Domino administrator to sort out a Single Sing-On scheme linked to your other directory.
Alternatively, you could take advantage of the fact that Domino is also a web server and an application server : you can build your HTML form in there, starting with a Domino form (simple) or an xPage (a bit more complex).
You may want to have a look here.
Some would say that you could even build your whole application in Domino, as using it as a mere back-end data repository is akin to using a Rolls-Royce to ferry potatoes, but I suppose that you and your organization have good reasons to do so.
Finally you could also completely ditch Domino and use another nosql database like MongoDB, but that would only displace your access control problem.
You can post data back to Domino and nominate a form to use. If you use the 'computewithform=true' parameter and the form design includes the authors/reader fields you need, this will set the field flags correctly and automatically.

How to implement HATEOAS in Rails

I've started with ActiveResource, but quickly hit the wall. Could not get ActiveResource to work when overriding to_json and to_xml on the underlying model. Plus, could not make resource representation inject links into the generated xml document. Oh btw, I'm using Rails 3.2.1.
I did a bit of research and found out about its gem. Tried it, for some reason didn't work for me. So my question is:
If I have one resource (say books) hosted in one web site (something like http://books.org), and another resource (say students, http://students.org), hosted in another web site, how can I get books to represent themselves to a student in their full HATEOS glory?
I was able to get the book resource to represent itself to the asking student as an XML document. I did that by using vanilla Rails ActiveResource in the students site. I've created Books resource that inherits from ActiveResource::Base. Then I specified the self.site and self.element_name, after which I was able to perform some rudimentary ActiveRecord-like queries against the remote books site. The only thing that worked for me was Book.all and Book.find(1). Even that was not satisfactory because the representation contained all database columns, and I wanted to at least remove some of those, which turned out not to be possible.
Now that I've abandoned that approach, I am wondering if there is a working example in Rails where it is possible to build a more sophisticated representation of a resource (i.e. books) that will contain links that will drive the application state transfer? I find it simply unbelievable that such a simple requirement seems so devilishly difficult to implement in Rails. All I'm trying to do is create a representation of a resource that will include some links which will guide the consumer on its discovery of what that resource is capable of. I'm mostly interested in implementing the workflow, which is a layered, peeling-the-onion type of conversational process of discovery.
In Rails, you'd need to change the way the serialization of your object happens if you're looking to do this in JSON. (You need to override the way Rails gives back representations of resources.) The most common gem for doing that would be: https://github.com/rails-api/active_model_serializers
If you don't want to use AMS or want to return HTML, consider following this presenter pattern: http://blog.steveklabnik.com/posts/2012-01-06-implementing-hateoas-with-presenters