semantic web + linked data integration - sparql

I'm new to the Semantic Web.
I'm trying to build a sample application where I can query data from different data sources in one query.
I have created a small RDF file which contains references to DBpedia resources for defining localities. My question is: how can I get the data contained in my file together with information from the description of the remote resource (for example: the name of a person from the local file, and the total population of a city, dbpedia-owl:populationTotal, from the remote RDF file)?
I don't really understand the SPARQL query language. I tried to use the Jena ARQ API with the SERVICE keyword, but it doesn't solve the problem.
Any help please?

I guess you are looking for something like the Semantic Web Client Library, which tries to leverage the GGG (Giant Global Graph). Admittedly, the standard exploration algorithm of this framework just follows rdfs:seeAlso links. Nevertheless, the general approach seems to be what you are looking for: you would create a local graph that consists of your seed graph, traverse the relations up to a certain depth (e.g., three steps), resolve the URIs, and load that content into your local triple store. Utilising advanced technologies like SPARQL federation might be something for later ;)
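Since the question mentions Jena ARQ and the SERVICE keyword, here is a minimal sketch of what a federated query combining the local file with DBpedia could look like. The ex: namespace and the ex:livesIn property are hypothetical placeholders, standing in for whatever terms the local RDF file actually uses:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX ex:   <http://example.org/>   # hypothetical namespace for the local file

SELECT ?name ?population
WHERE {
  # matched against the local file loaded into the default graph
  ?person foaf:name ?name ;
          ex:livesIn ?city .         # ?city should be a dbpedia.org resource URI
  # resolved remotely by DBpedia's public endpoint
  SERVICE <https://dbpedia.org/sparql> {
    ?city dbo:populationTotal ?population .
  }
}
```

Run with Jena ARQ over a model loaded from the local file, a query shaped like this should pull the population values from DBpedia at query time.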

I have retrieved data from two different sources using a SPARQL query with named graphs.
I used Jena ARQ to execute the SPARQL query.
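A sketch of that named-graph approach (the graph URIs here are hypothetical): each source is loaded into its own named graph, and both are queried together, with ?g recording which source each triple came from.

```sparql
SELECT ?s ?p ?o ?g
FROM NAMED <http://example.org/source1>
FROM NAMED <http://example.org/source2>
WHERE {
  GRAPH ?g { ?s ?p ?o }   # ?g binds to the graph (source) of each match
}
```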

Related

Is there a central SPARQL endpoint for the semantic web

We were supposed to get the Semantic Web; what we have now is the LOD cloud.
Every dataset has its own SPARQL endpoint,
and I can query that dataset's triples.
How can I query the whole Semantic Web, or the whole LOD cloud?
No, there is no such single SPARQL endpoint, because the Semantic Web is decentralized by design. However, SPARQL 1.1 supports federated queries over different SPARQL endpoints using the SERVICE keyword; see https://www.w3.org/TR/sparql11-federated-query/ for reference. There is also work in the literature on how to determine which data sources might be relevant for query answering at Internet scale:
Hartig O., Bizer C., Freytag J.C. (2009) Executing SPARQL queries over the Web of Linked Data. In: Bernstein A. et al. (eds.) The Semantic Web – ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol. 5823, pp. 293–309. Heidelberg: Springer. doi: 10.1007/978-3-642-04930-9_19
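As an illustration of the SERVICE keyword mentioned above, a query run on any SPARQL 1.1 processor can delegate a pattern to a remote endpoint. This is just a sketch; DBpedia and the dbo: properties are example choices:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?city ?population
WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?city a dbo:City ;
          dbo:populationTotal ?population .
  }
}
LIMIT 10
```

Multiple SERVICE blocks against different endpoints can be combined in one WHERE clause, which is as close as SPARQL currently gets to "querying the LOD cloud" in a single query.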
There is a W3C-owned and (un-?)maintained wiki page listing ~60 SPARQL endpoints; many of its "last accessed/checked" entries are from 2010. That page links to http://sparqles.ai.wu.ac.at/availability, which lists more endpoints and is much more up-to-date.
Read the second paragraph, titled "SPARQL Endpoints", of the blog post Querying DBpedia with GraphQL for a skeptical view of the state of SPARQL today. I cannot say it any better myself.
Also note that SPARQL permits every endpoint to offer any number of named graphs (GRAPH constructs) that can be queried at that endpoint, so that is one more feature to consider.
There is no central point regarding the notion of a Semantic Web of Linked Data. Instead, like any Super Information Highway, you have major concentration points (hubs or junctions) that enable you to discover routes to a variety of destinations.
Major Semantic Web of Linked Data hubs that we oversee at OpenLink Software include:
DBpedia
DBpedia-Live
URIBurner
LOD Cloud Cache
Remember, the fundamental principle behind Linked Open Data is that hyperlinks (HTTP URIs) function as words in sentences constructed using RDF Language. Thus, you can use the SPARQL Query Language to produce query solutions (tables or graphs) that expose desired routes (e.g., using Property Paths).
Finally, you can also use Federated SPARQL Query (SPARQL-FED) to navigate a Semantic Web of Linked Data.
Examples:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT *
WHERE {
  ?s a <http://dbpedia.org/ontology/AcademicJournal> ;
     rdf:type{1,3} ?o
}
LIMIT 50
We are also working on a publicly available Google Spreadsheet that provides additional information related to the kinds of datasets accessible via the LOD Cloud that we maintain.
To my knowledge, LOD-a-lot is currently the ongoing effort that gets closest to the vision of querying the whole web of data. And this is, of course, done by different means than SPARQL endpoints.
It is still a prototype, which means bugs, but one of the aims of wimuQ is to provide a way to query all 539 public SPARQL endpoints plus all datasets from LODLaundromat and LODStats: more than 600,000 datasets, more than 5 terabytes. As far as I know, it is the most extensive collection of datasets accessible from one single place.
For more information, the paper is available here:

Mount a SPARQL endpoint for use with custom ontologies and triple RDFs

I've been trying to figure out how to mount a SPARQL endpoint for a couple of days, but no matter how much I read, I cannot understand it.
Let me explain my intention: I have an open data server mounted on CKAN, and my goal is to be able to use SPARQL queries on the data. I know I cannot do it directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV format (which is the format it is currently in) to RDF triples (to be used as linked data).
The idea was to first test with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but the problem is that I really do not know where to start. I've searched for information on how to set up a Virtuoso server for SPARQL, but what I've found leaves a lot to be desired, not to mention that I can find nothing that explains how I could actually load my own OWL files and RDF into Virtuoso itself.
Can someone lend me a hand getting started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's a good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.

Using an ontology to produce full semantic information from raw data

Problem definition: store sensor data (temperature readings, sensor description) in RDF form using an ontology, then use SPARQL to perform queries on the stored data.
My approach: I am not an expert in this domain, but I have some basic understanding, and accordingly I am using this approach: 1. create an ontology; 2. convert the data according to the ontology vocabulary; 3. store the converted data in a triple store; 4. perform SPARQL queries. I am not sure whether I am following the right way. Any comments from your side will be valuable.
Till now, I have done the following:
I have created an ontology in Protégé 5.0.0 for representing a temperature sensor. This ontology represents only one part of the full ontology.
I have collected data in a CSV file which includes date, time and temperature readings, as shown below.
Now I want to use this ontology to store the CSV file in RDF form in some data store. I have been stuck at this step for the last three days. I have found some links, like link1 and link2, but I still find it difficult to proceed further. Do I need a script which will read the CSV file and perform the mapping to the given ontology concepts? If yes, is there a sample script which does this? Possibly, the outcome might look like:
<datetime>valX</datetime>
<tempvalue>valY</tempvalue>
Can anyone guide me on the following:
1. Am I taking the correct steps to solve the problem?
2. How should I solve step 3, i.e., storing the data according to the ontology?
P.S.: I have also posted this question on answers.semanticweb.com, just to get a response asap.
Actually, this is a great use case for the D2RQ mapping language and D2RQ Server.
Install D2RQ, then start it up with a connection to your relational database. Then generate a mapping file using the generator that comes with the software. You'll then have a mapping file; edit it and swap out the automatically generated ontology prefixes for your own. Their website has a page that explains how the mapping language works.
Once you've done that and there are no errors in the mapping file, you can actually query your whole relational dataset with SPARQL without even having to export it and load it into a real triplestore.
However, if you want to export and load into a triplestore, you'd just run D2RQ's generate-triples functionality (also included in D2RQ Server) and then import the triples file into a triplestore like Jena Fuseki.
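Once the triples are in Fuseki (or exposed via D2RQ Server), step 4 of the question becomes straightforward. A sketch, assuming hypothetical ontology terms ex:dateTime and ex:tempValue for a reading's timestamp and value, matching the desired outcome shown in the question:

```sparql
PREFIX ex: <http://example.org/sensor#>   # hypothetical ontology namespace

SELECT ?time ?temp
WHERE {
  ?reading ex:dateTime  ?time ;
           ex:tempValue ?temp .
  FILTER (?temp > 25.0)   # e.g., only readings above 25 degrees
}
ORDER BY ?time
```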

Manipulating RDF in Jena

I recently found out that, after loading a model from a file, I can query RDF in Jena using the Model API, and it gives me the same output as applying a SPARQL query. So I want to know: is it a good practice to do that without SPARQL? I have only tested it with a small RDF file so far. I also want to know: if I use Virtuoso, can I manipulate data using the Model API without SPARQL?
Thanks in advance.
I'm not quite sure if I understand your question. If I can paraphrase, I think you're asking:
Is it OK to query and manipulate RDF data using the Jena Model API instead of using
SPARQL? Does it make a difference if the back-end store is Virtuoso?
Assuming that's the right rephrasing of the question, the answer to the first part is definitely yes: you can manipulate RDF data through the Model and OntModel APIs. In fact, I would say that's what the majority of Jena users do, particularly for small queries or updates. Personally, I find that going direct to the API is more succinct up to a certain point of complexity; after that, my code is clearer and more concise if I express the query in SPARQL. Obviously circumstances will have an effect: if you're working with a mixture of local stores and remote SPARQL endpoints (for which sending a query string is your only option), you may find that the consistency of always using SPARQL makes your code clearer.
Regarding Virtuoso, I don't have any direct experience to offer. As far as I know, the Virtuoso Jena Provider fully implements the features of the Model API using a Virtuoso store as the storage layer. Whether the direct API or using SPARQL queries gives you a performance advantage is something you should measure by benchmark with your data and your typical query patterns.

Named Graphs and Federated SPARQL Endpoints

I recently came across the working draft for SPARQL 1.1 Federation Extensions and wondered whether this was already possible using Named Graphs (not to detract from the usefulness of the aforementioned draft).
My understanding of named graphs is a little hazy; the only thing I have gleaned from reading the specs comprises the rules around merger and non-merger in relation to other graphs at query time. Since this doesn't fully satisfy my understanding, my question is as follows:
Given the following query:
SELECT ?something
FROM NAMED <http://www.vw.co.uk/models/used>
FROM NAMED <http://www.autotrader.co.uk/cars/used>
WHERE {
...
}
Is it reasonable to assume that a query processor/endpoint could or should, in the context of the named graphs, do the following:
Check if the named graph exists locally.
If it doesn't, then perform the following operation (for the above query, I will use the second named graph):
GET /sparql/?query=EncodedQuery HTTP/1.1
Host: www.autotrader.co.uk
User-agent: my-sparql-client/0.1
Where the EncodedQuery only includes the second named graph in the FROM NAMED clause, and the WHERE clause is amended accordingly with respect to GRAPH clauses (e.g. if a GRAPH <http://www.vw.co.uk/models/used> {...} clause is being used).
Only if it can't perform the above, then do either of the following:
GET /cars/used HTTP/1.1
Host: www.autotrader.co.uk
or
LOAD <http://www.autotrader.co.uk/cars/used>
Return appropriate search results.
Obviously there might be some additional considerations around OFFSET and LIMIT.
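For concreteness, the rewritten sub-query sent to the remote host in step 2 might look something like the following. This is a sketch of the idea in the question, not behaviour mandated by any spec:

```sparql
SELECT ?something
FROM NAMED <http://www.autotrader.co.uk/cars/used>
WHERE {
  GRAPH <http://www.autotrader.co.uk/cars/used> {
    ...   # the portion of the original pattern scoped to this graph
  }
}
```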
I also remember reading somewhere, a long time ago in a galaxy far, far away, that the default graph of any SPARQL endpoint should be a named graph according to the following convention:
for http://www.vw.co.uk/sparql/ there should be a named graph http://www.vw.co.uk that represents the default graph. By the above logic, it should already be possible to federate SPARQL endpoints using named graphs.
The reason I ask is that I want to start promoting federation across the domains in the above example, without having to wait around for the standard, making sure that I won't do something that is out of kilter or incompatible with something else in the future.
Named graphs and the URLs used in federated queries (via SERVICE or FROM) are two different things. The latter point to SPARQL endpoints; named graphs live within a triple store and have the main function of separating different data sets. This, in turn, can be useful both to improve performance and to represent knowledge, such as recording the source of a set of statements.
For instance, you might have two data sources both stating that ?movie has-rating ?x, and you might want to know which source states which rating; in this case you can use two named graphs associated with the two sources (e.g., http://www.example.com/rotten-tomatoes and http://www.example.com/imdb). If you're storing both data sets in the same triple store, you will probably want to use NGs, and remote endpoints are a different thing. Furthermore, the URL of a named graph can be used with vocabularies like VoID to describe a dataset as a whole (e.g., the data set name, where and when the triples were imported from, who the maintainer is, the usage licence). This is another reason to partition your triple store into NGs.
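The movie-rating example can be sketched as a query that reports which graph, i.e. which source, each rating comes from (the ex: prefix and ex:hasRating property are hypothetical):

```sparql
PREFIX ex: <http://www.example.com/terms#>   # hypothetical

SELECT ?movie ?rating ?source
WHERE {
  GRAPH ?source {               # binds to e.g. .../rotten-tomatoes or .../imdb
    ?movie ex:hasRating ?rating .
  }
}
```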
That said, your mechanism to bind NGs to endpoint URLs might be implemented as an option, but I don't think it's a good idea to have it as mandatory, since managing remote endpoint URLs and NGs separately can be more useful.
Moreover, the real challenge in federated queries is to offer endpoint-transparent queries, making the query engine smart enough to analyse the query and understand how to split it and perform partial queries on the right endpoints (and join the results later, in an efficient way). There is a lot of research being done on that, one of the most significant results (as far as I know) is FedX, which has been used to implement several query distribution optimisations (example).
One last thing to add: I vaguely remember the convention you mention about $url, $url/sparql. There are a couple of approaches around (e.g., the LOD cloud). That said, in most of today's triple stores (e.g., Virtuoso), queries that don't specify a named graph (don't use GRAPH) work differently from falling back to a default graph: they actually query the union of all named graphs in the store, which is usually much more useful (when you don't know where something is stated, or when you want to integrate cross-graph data).