RDF4J SAIL API implementation (Sesame)

I am trying to build a federated RDF application based on RDF4J and FedX. What I need is to be able to:
Optimize the query plan and join strategies.
Expose different, heterogeneous databases (a time-series or a relational DB, for example) in a federated fashion.
I went through the RDF4J documentation and got a basic grasp of it, and I have a few questions:
Is there any documentation that explains how to implement the SAIL API? I tried to debug and follow the execution flow of an example query using an RDF memory store, and I got lost.
Suppose I want to expose a relational database in my datacenter. Should I implement a SPARQLRepository or an HTTPRepository? Should I implement the SAIL API at all?
Concerning FedX, how can I make it possible to use the SERVICE and VALUES keywords as proposed in SPARQL 1.1 federated queries? How can I change the join strategies? The query plan?
I know this could be answered if I dove deeply into the code, but I wonder if someone has already exposed some kind of database using the RDF4J API, or has worked with and tuned RDF4J.
Thanks to you all!

Is there any documentation that explains how to implement the SAIL API? I tried to debug and follow the execution flow of an example query using an RDF memory store, and I got lost.
There is a basic design draft, but it's incomplete. A more comprehensive how-to has been in the planning for a while, but it never quite gets the priority it needs.
That said, I don't think you need to implement your own SAIL for what you have in mind. There are plenty of existing implementations that can do what you need.
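To make that concrete, a minimal sketch of composing existing SAILs instead of writing one (assuming a recent RDF4J release; the particular stack shown, an RDFS inferencer over a plain memory store, is just one example):

    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.sail.SailRepository;
    import org.eclipse.rdf4j.sail.inferencer.fc.SchemaCachingRDFSInferencer;
    import org.eclipse.rdf4j.sail.memory.MemoryStore;

    public class SailStackExample {
        public static void main(String[] args) {
            // SAILs are stackable: the repository talks to the top of the
            // stack, and each SAIL delegates down to the one it wraps.
            Repository repo = new SailRepository(
                    new SchemaCachingRDFSInferencer(new MemoryStore()));
            repo.init();
            // ... use repo.getConnection() to add data and run queries ...
            repo.shutDown();
        }
    }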
Suppose I want to expose a relational database in my datacenter. Should I implement a SPARQLRepository or an HTTPRepository?
I don't understand the question. HTTPRepository is a client-side proxy for an RDF4J Server. SPARQLRepository is a client-side proxy for a (non-RDF4J) SPARQL endpoint. Neither has anything to do with relational databases.
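For illustration, both are created the same way on the client side; a small sketch (the server URL, repository ID, and endpoint URL below are placeholders):

    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.http.HTTPRepository;
    import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

    public class ClientProxies {
        public static void main(String[] args) {
            // Proxy for a repository hosted on an RDF4J Server.
            Repository rdf4jProxy = new HTTPRepository(
                    "http://localhost:8080/rdf4j-server", "my-repo");

            // Proxy for any (non-RDF4J) SPARQL 1.1 endpoint.
            Repository sparqlProxy = new SPARQLRepository(
                    "http://example.org/sparql");

            rdf4jProxy.init();
            sparqlProxy.init();
        }
    }

In both cases the heavy lifting happens server-side; the client objects just forward requests over HTTP.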
Should I implement the SAIL API at all?
It depends on your use case, but I doubt it - at least not right at the outset. I'd probably use an existing R2RML library that is compatible with RDF4J, such as the R2RML API or CARML. Either a live mapping or an offline batch mapping between the relational data and your triplestore may solve your problem.
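As a sketch of the offline batch variant ("mapped-output.ttl" is a hypothetical file produced by running an R2RML/RML mapper against the relational database, and the data directory is a placeholder), loading the mapped RDF into a persistent RDF4J store could look like this:

    import java.io.File;
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.sail.SailRepository;
    import org.eclipse.rdf4j.rio.RDFFormat;
    import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

    public class BatchLoad {
        public static void main(String[] args) throws Exception {
            // Persistent, file-based triplestore (directory is a placeholder).
            Repository repo = new SailRepository(
                    new NativeStore(new File("/data/rdf4j")));
            repo.init();
            try (RepositoryConnection conn = repo.getConnection()) {
                // Hypothetical output of an R2RML/RML mapping run.
                conn.add(new File("mapped-output.ttl"), null, RDFFormat.TURTLE);
            }
            repo.shutDown();
        }
    }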
Concerning FedX, how can I make it possible to use the SERVICE and VALUES keywords as proposed in SPARQL 1.1 federated queries?
You don't need to "make it possible": FedX supports this out of the box.
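A small sketch to illustrate (the endpoint URLs are placeholders; FedXFactory.createSparqlFederation is the FedX entry point bundled with recent RDF4J releases):

    import java.util.Arrays;
    import org.eclipse.rdf4j.federated.FedXFactory;
    import org.eclipse.rdf4j.federated.repository.FedXRepository;
    import org.eclipse.rdf4j.query.TupleQueryResult;
    import org.eclipse.rdf4j.repository.RepositoryConnection;

    public class FedXServiceValues {
        public static void main(String[] args) {
            // Federation over two SPARQL endpoints.
            FedXRepository repo = FedXFactory.createSparqlFederation(Arrays.asList(
                    "http://example.org/sparql1", "http://example.org/sparql2"));
            repo.init();

            // SERVICE and VALUES are plain SPARQL 1.1; FedX evaluates them as-is.
            String query = "SELECT ?s ?label WHERE {"
                    + "  VALUES ?s { <http://example.org/thing1> <http://example.org/thing2> }"
                    + "  SERVICE <http://example.org/sparql3> {"
                    + "    ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label"
                    + "  }"
                    + "}";
            try (RepositoryConnection conn = repo.getConnection();
                 TupleQueryResult result = conn.prepareTupleQuery(query).evaluate()) {
                while (result.hasNext()) {
                    System.out.println(result.next());
                }
            }
            repo.shutDown();
        }
    }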
How can I change the join strategies? The query plan?
You can't (at least not easily), nor should you want to. Quite a lot of research and development went into RDF4J's and FedX's query planning strategies. I'm not saying either is perfect, but you're unlikely to do better.

Related

SPARQL over custom representation of semantic data

I have a non-standard way of storing and representing semantic data, and I was looking into possibilities for supporting SPARQL queries. It seems that the best solution is to implement a so-called driver for a standard API framework, such as Apache Jena, but at least for Jena it's not clear how this can be done. A diagram in the official documentation suggests that I should implement the Store API; however, I couldn't find any documentation about it. Furthermore, the Javadocs of TDB, Jena's native triple store, imply that there is no Store API.
A secondary question: is there a Python alternative to Jena (which is written in Java)?

Umbraco Hive and Services Layer

I'm experimenting with the new Umbraco 5 Hive, and I'm a bit confused.
I'm plugging in an existing LINQ to SQL services layer, which I developed for a WebForms site.
I don't know much about the repository pattern; my services handle all connections with the data context and work very well.
I have made a few repositories that plug into the Hive and handle conversion of my entities to the Umbraco TypedEntity type.
These repositories reference my existing services layer to retrieve, add, update, and delete. The services also handle other entity-specific functions, which will not be used by the Hive.
Now, it's nice to plug in these services and just reference them in the Hive repositories, but it seems I may be doing things the wrong way round, according to the official repository pattern as I have read about it.
I know there are no hard and fast rules, but I would appreciate comments on what I'm doing to achieve this functionality.
I've asked this here instead of the Umbraco forum, as I want a wider perspective.
Cheers.
I personally feel that the Hive is overkill. With the ability to use your own classes directly within Razor macros, I think the best approach is to forgo the Hive altogether and simply use your classes. Why would you trade all of the power of your existing services layer just to make it fit the Hive interface?
If you're writing a library for other Umbraco developers, you may need to do this, but in my personal opinion the Hive is over-engineered at worst and, at best, a layer of abstraction aimed at newish developers.
So, if I were to advise you, I would say to consider the more general principles: "Keep It Simple" and "You Aren't Gonna Need It". If the interface they give you offers a tangible benefit, implement it. If not, consider what you really gain for all of that work.

How to reflect the semantic web benefits in Enterprise Information System?

I am developing a demo of a semantic web-based information system, which simply uses SPARQL instead of traditional SQL to manipulate the dataset. How can the application demonstrate the benefits of the Semantic Web?
I did the steps below:
1. The client gets parameters from the web UI.
2. It requests a web service.
3. The service generates a SPARQL command according to the given parameters.
4. The service uses the Jena/SDB API to execute the SPARQL command (see the sketch after these steps).
5. It retrieves or persists data from or to MySQL.
6. It parses the returned result set.
7. It responds with a JSON object to the client.
8. The client uses JavaScript + HTML to display the data.
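For step 4, a minimal sketch of the Jena query execution (assuming modern org.apache.jena packages, with an in-memory model standing in for the SDB-backed dataset over MySQL; the query string is a hypothetical example of what the service might generate):

    import org.apache.jena.query.Query;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class SparqlService {
        public static void main(String[] args) {
            // Stand-in for the SDB-backed dataset; in the real service this
            // would be the model connected to MySQL via SDB.
            Model model = ModelFactory.createDefaultModel();

            // A query the service might build from the UI parameters.
            Query query = QueryFactory.create(
                    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10");

            try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row); // serialize rows to JSON here
                }
            }
        }
    }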
Currently, the application just has CRUD operations. The only difference from a traditional IS is that it uses SPARQL instead of SQL, so no obviously semantic features are visible. I'm thinking of two points:
1. Demonstrating data federation through SPARQL. On this point, can I imagine the system being broken down into several subsystems that work on their own independent datasets but can communicate with each other via SPARQL, because they all build on the RDF specification?
2. Reasoning over datasets. I use ontologies to describe the data schema; should my reasoning operations be based on them? In my application, I try to get an RDF model and use Pellet to do inference. Is that the correct way?
Basically, if the application can demonstrate data federation and reasoning, can it be seen as a semantic web-based application? Do I understand this right?
Hopefully, the application will also be able to combine services automatically through semantic descriptions. Furthermore, any third-party data source could then communicate with the system and work immediately.
Yes, you are right. The benefit of the semantic web is that you can write separate sets of ontologies describing the domains (e.g. product, user) and then combine them using inference and reasoning, making the data much more useful (e.g. product types matched to user preferences).
The difference is that the rules for the data are now written with the data and not in the business logic layer.
Hope this helps. :)
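To sketch that point in code (the file names are hypothetical, and Jena's built-in OWL reasoner is used here for simplicity; Pellet plugs into the same Reasoner interface when complete OWL DL reasoning is needed):

    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.Reasoner;
    import org.apache.jena.reasoner.ReasonerRegistry;
    import org.apache.jena.riot.RDFDataMgr;

    public class InferenceExample {
        public static void main(String[] args) {
            // The ontology (schema) travels with the data, not in business logic.
            Model data = RDFDataMgr.loadModel("data.ttl");       // hypothetical input
            data.add(RDFDataMgr.loadModel("ontology.ttl"));      // hypothetical input

            Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
            InfModel inf = ModelFactory.createInfModel(reasoner, data);

            // Queries against 'inf' see both asserted and inferred statements.
            System.out.println("Statements incl. inferred: " + inf.size());
        }
    }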

Which one is better for efficient free text search, Hibernate Search or Lucene?

We are developing a web application using Spring MVC, Spring and Hibernate.
We need to add efficient free-text search capabilities to our application. For this we are thinking of using either Hibernate Search (which uses Lucene under the hood) or Lucene directly.
What is the best option for us, given that we are already using Hibernate in our application? What are the pros and cons of one over the other?
Thanks.
You said it yourself - you'll be using Lucene one way or the other.
The raw Lucene API isn't very easy to use; it's much more low-level than Hibernate Search. If you're already using Hibernate, then it's a no-brainer: use Hibernate Search to implement your text search functionality.
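To give an idea of how little glue is involved, here is a sketch using the classic Hibernate Search annotations and query DSL (the Book entity and its fields are hypothetical; the API shown is the Hibernate Search 3.x-5.x style):

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.Session;
    import org.hibernate.search.FullTextSession;
    import org.hibernate.search.Search;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;
    import org.hibernate.search.query.dsl.QueryBuilder;

    // @Indexed and @Field are all that is needed for Hibernate Search
    // to maintain a Lucene index alongside the mapped table.
    @Entity
    @Indexed
    class Book {
        @Id @GeneratedValue Long id;
        @Field String title;
    }

    public class TextSearchExample {
        @SuppressWarnings("unchecked")
        static List<Book> search(Session session, String terms) {
            FullTextSession fts = Search.getFullTextSession(session);
            QueryBuilder qb = fts.getSearchFactory()
                    .buildQueryBuilder().forEntity(Book.class).get();
            org.apache.lucene.search.Query lucene = qb.keyword()
                    .onField("title").matching(terms).createQuery();
            // Results come back as managed entities, not raw Lucene documents.
            return fts.createFullTextQuery(lucene, Book.class).list();
        }
    }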
Disclaimer: I'm one of the developers of Hibernate Search.
The goal of the project is not to compete with Lucene or Solr, but to make integration with Hibernate applications as easy as possible, so you don't have to keep the two worlds in sync and duplicate all mapping and CRUD operations.
While we provide some common helpers and a nice encapsulation, Hibernate Search can also hand you a direct reference to the Lucene API, so if you find yourself needing the "raw" Lucene API you will never be stuck. For writing to the index, Hibernate Search provides a common pattern that covers most known requirements, but if you have very non-standard requirements you can take full control of the written Documents.
Solr is a good alternative, but as it is a separate server you have to interact with it via REST APIs, which is quite different, with its own pros and cons. Having a second service to manage is not always wanted, and of course remote invocations will never be as efficient as direct references to Lucene and all its internal filters and caches.
Not all of Lucene's functionality can be exposed via a remote API, and if you need some "low-level" operation that is not implemented in Solr, you won't be able to do it (without patching Solr). Still, Solr is very nice, especially when you want to share the index with other non-Java applications, so we might eventually add a Solr backend for Hibernate Search to keep a Solr server in sync (especially if there's interest in it, and possibly some help).
Finally, the Lucene API is really hardcore stuff. We spend a lot of effort making the best use of it, providing top performance while exposing a stable API to people using Hibernate Search; basically, all releases so far have been backwards compatible, providing a "drop-in" performance boost from the latest and greatest tricks in Lucene, which actually changes its API quite often. These changes are always exciting, but be prepared to maintain that in your application if you don't use a proper abstraction.
The other way of using Lucene is through the middleman API known as Solr. Solr connects to Lucene and performs searches via HTTP calls. Please note that you will need to build and parse the XML that Solr consumes. Most of Lucene's functionality is exposed via Solr, which should be really helpful.

Is NHibernate the right choice for Enterprise Applications?

Hello all,
I am planning to use NHibernate in a .NET 3.5 application. This application is an enterprise-style application which will provide core services to all the other applications in my company. So my questions are:
Is NHibernate the right choice for this kind of application?
Are there any performance issues with NHibernate?
NHibernate is good for any kind of data access application.
As for performance, it's good enough for most apps. The only place I think it won't fit is synchronization tasks where really tons of data could be transferred. For that kind of work, any ORM would suck.
I know of many places that use NHibernate for enterprise and mission-critical applications and are satisfied.
NHibernate supplies an object-oriented abstraction over database entities. If you're comparing it to equivalent SQL operations, the performance penalty is negligible.
However, for more complex operations, mainly on large sets of data, native SQL may perform significantly better.
NHibernate is probably the right choice. The library is very mature (currently v3) and is based on Java's Hibernate, which has been around even longer.
It depends. NHibernate is designed for OLTP scenarios. That means you load a small chunk of data, process it, and save it back. The critical part is: "small chunk".
If you find yourself in an OLAP-like scenario where you must batch-process large chunks, NH is not a good choice. In such situations you most likely don't want to use any ORM at all.
As always: the right tool for the right job.
If your application is a typical enterprise app, you will be happy with NH. The library is extremely flexible, with many fine-tuning options. Also remember that you can still use plain old ADO.NET in the OLAP parts of your app, if there are any.
If you run into any problems, the NH community is very supportive, and there are companies and individuals offering consulting and paid support (in case your company policy requires it).
I've been using NH for 3 years and I can recommend it - the tool does its job.