Federated SPARQL query on a subgraph of a SPARQL endpoint

How can I run a federated SPARQL query against a subgraph of a SPARQL endpoint, rather than against the entire remote endpoint?
My data is in Virtuoso v7, whose SPARQL endpoint is "http://localhost:8890/sparql". I'd like to run a remote query on one subgraph of this endpoint, "http://localhost:8890/TC", and I tried:
SELECT *
WHERE
{ SERVICE <http://localhost:8890/sparql>
{ SELECT ?subject ?predicate ?object
FROM <http://localhost:8890/TC>
WHERE
{ ?subject ?predicate ?object }
}
} LIMIT 50
I got an error saying that "FROM" is not used correctly, so I have two questions:
1) Can I run a remote query on a subgraph of a SPARQL endpoint?
2) Can I have a separate SPARQL endpoint for each graph in Virtuoso v7?
Thanks a lot for your help.

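You can use GRAPH instead of FROM. The SPARQL 1.1 grammar does not allow a FROM clause in a subquery, but a GRAPH pattern is allowed anywhere in a WHERE clause, including inside SERVICE.
In your example: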
SELECT *
WHERE
{
SERVICE <http://localhost:8890/sparql>
{
SELECT ?subject ?predicate ?object
WHERE
{ graph <http://localhost:8890/TC> { ?subject ?predicate ?object } }
}
} LIMIT 50
I tested this syntax with the following query on the UniProt SPARQL endpoint (Virtuoso), federating with DBpedia (also Virtuoso):
SELECT *
WHERE
{ SERVICE <http://dbpedia.org/sparql>
{ SELECT DISTINCT ?activity
WHERE { GRAPH <http://dbpedia.org> { ?activity a <http://www.ontologydesignpatterns.org/ont/d0.owl#Activity> } }
LIMIT 10
}
} LIMIT 50
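As a minimal sketch of the same idea, the helper below assembles a federated query that restricts the remote pattern with GRAPH instead of FROM. The function names `iri_ref` and `federated_graph_query` are hypothetical, the sketch only builds the query string and does not contact any endpoint, and it places GRAPH directly inside SERVICE rather than in a subquery.

```python
# Characters that may not appear inside a SPARQL IRI reference <...>.
_FORBIDDEN = set('<>"{}|^`\\ ')


def iri_ref(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters SPARQL forbids there."""
    if any(ch in _FORBIDDEN for ch in value):
        raise ValueError(f"not a usable IRI: {value!r}")
    return f"<{value}>"


def federated_graph_query(endpoint: str, graph: str, limit: int = 50) -> str:
    """Build a federated query limited to one named graph of a remote endpoint."""
    return (
        "SELECT *\n"
        "WHERE {\n"
        f"  SERVICE {iri_ref(endpoint)} {{\n"
        f"    GRAPH {iri_ref(graph)} {{ ?subject ?predicate ?object }}\n"
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


# The endpoint and graph IRIs are the ones from the question.
query = federated_graph_query(
    "http://localhost:8890/sparql", "http://localhost:8890/TC"
)
print(query)
```

The resulting string can be sent with whatever SPARQL client is already in use.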

Related

SPARQL query containing 'coalesce' returns no result on rdflib-endpoint.SparqlEndpoint()

I am running a SparqlEndpoint from this package: https://pypi.org/project/rdflib-endpoint/
I define a graph with g = Graph(), call g.parse() on a number of .ttl files, and create the endpoint with SparqlEndpoint(graph=g).
Then I use uvicorn.run(app, ...) to expose the endpoint on my local machine.
I can successfully run this simple SPARQL query for keywords:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT
?keyword
WHERE { ?subj dcat:keyword ?keyword }
However, as soon as I add a coalesce statement, the query does not return results any more:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT
?keyword
(coalesce(?keyword, "xyz") as ?foo)
WHERE { ?subj dcat:keyword ?keyword }
I tried something similar on Wikidata, which works without problems:
SELECT ?item ?itemLabel
(coalesce(?itemLabel, 2) as ?foo)
WHERE
{
?item wdt:P31 wd:Q146. # Must be of a cat
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Is there maybe an issue with the rdflib-endpoint.SparqlEndpoint()?
The server debug output for the keywords query looks ok to me.
INFO: 127.0.0.1:43942 - "GET /sparql?format=json&query=%0APREFIX+dcat%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fdcat%23%3E%0ASELECT+%0A%3Fkeyword%0A%28coalesce%28%3Fkeyword%2C+%22xyz%22%29+as+%3Ffoo%29+%0AWHERE+%7B+%3Fsubj+dcat%3Akeyword+%3Fkeyword+%7D%0ALIMIT+10%0A HTTP/1.1" 200 OK
Putting COALESCE in the WHERE clause does not solve the problem:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT
?keyword
?foo
WHERE {
?subj dcat:keyword ?keyword
BIND(coalesce(?keyword, 2) as ?foo)
}
TIA

Prevent timeout while querying Dbpedia endpoint using Apache Jena

I'm using Apache Jena to fetch a large amount of data from DBpedia and write it to a CSV file. However, I only get about 10,000 triples instead of the full result, and I need all the triples that match the query. I can't tell whether this is an endpoint timeout or something else. The code I've written is as follows:
public class FetchCountriesData {
public void getCountriesInformation() throws FileNotFoundException {
ParameterizedSparqlString qs = new ParameterizedSparqlString("PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n "
+ "SELECT * { ?Subject rdf:type <http://dbpedia.org/ontology/Country> . ?Subject ?Predicate ?Object } ORDER BY ?Subject ");
QueryExecution exec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", qs.asQuery());
//exec.setTimeout(10000000);
exec.setTimeout(10, TimeUnit.MINUTES);
ResultSet results = exec.execSelect();
ResultSetFormatter.outputAsCSV(new FileOutputStream(new File("C:/fakepath/CountryData.csv")), results);
ResultSetFormatter.out(results);
}
}
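You are almost certainly hitting one of DBpedia's limits: the public endpoint caps the number of rows it returns per query, which would explain a result that stops at about 10,000 rows regardless of the client-side timeout. For further information see http://wiki.dbpedia.org/OnlineAccess and http://lists.w3.org/Archives/Public/public-lod/2011Aug/0028.html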

SPARQL Query Error with OPTION(TRANSITIVE) on Jena

I have the following Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf ?type .
}
}
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) ) .
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
It works fine when I send it to a Virtuoso endpoint but does not work on my Jena instance. Specifically, I get the following error:
INFO [1] 400 Parse error:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf ?type .
}
}
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) ) .
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
Lexical error at line 12, column 39. Encountered: " " (32), after : "OPTION" (17 ms)
In case this is a Virtuoso-specific function, I would appreciate an equivalent query that works with Jena (standard SPARQL). The expected output should be:
http://dbpedia.org/ontology/Building
http://dbpedia.org/ontology/ArchitecturalStructure
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/d0:Location
which represents all superclasses for "Hospital"
This is the expected behavior. This part of the query:
OPTION (TRANSITIVE, t_distinct, t_in (?x), t_out (?type) )
is not standard SPARQL 1.1; it is a Virtuoso-specific extension.
Jena is a SPARQL 1.1 compliant implementation, so its parser rejects the OPTION keyword.
The following query does the same thing using standard SPARQL 1.1 syntax, and should work with both Fuseki and Virtuoso (just tested on the dbpedia endpoint and got the same result):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type
WHERE
{
{
SELECT *
WHERE
{
?x rdfs:subClassOf+ ?type .
}
}
FILTER (?x = <http://dbpedia.org/ontology/Hospital>)
}
The feature used is a "property path": rdfs:subClassOf+ matches a chain of one or more rdfs:subClassOf links.
See http://www.w3.org/TR/sparql11-query/
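To illustrate what the `+` path operator computes, here is a small in-memory sketch that follows subClassOf edges one or more steps from a starting class. The edge table is a hypothetical, simplified excerpt based on the expected output in the question, not data fetched from DBpedia, and the function name `superclasses` is made up for the example.

```python
from collections import deque


def superclasses(start, sub_class_of):
    """Return every class reachable from `start` via one or more subClassOf edges."""
    found = []
    queue = deque(sub_class_of.get(start, []))
    while queue:
        cls = queue.popleft()
        if cls in found:
            continue  # already reached by another route (or a cycle)
        found.append(cls)
        queue.extend(sub_class_of.get(cls, []))
    return found


# Hypothetical excerpt of the class hierarchy: class -> direct superclasses.
EDGES = {
    "dbo:Hospital": ["dbo:Building"],
    "dbo:Building": ["dbo:ArchitecturalStructure"],
    "dbo:ArchitecturalStructure": ["dbo:Place"],
}

print(superclasses("dbo:Hospital", EDGES))
# ['dbo:Building', 'dbo:ArchitecturalStructure', 'dbo:Place']
```

A plain `rdfs:subClassOf` pattern corresponds to a single lookup in the table, which is why the original query without `+` or the Virtuoso option returns only the direct superclass.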

Extract city data from DBpedia or LinkedGeoData

I've been trying for a couple of hours to figure out how to get various pieces of information out of DBpedia or LinkedGeoData.
I used this interface (http://dbpedia.org/snorql) and tried different approaches, but I never got the result that I need.
If I use something like this:
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationTotal> ?populationTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrban> ?populationUrban.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/areaTotal> ?areaTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrbanDensity> ?populationUrbanDensity.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/isPartOf> ?isPartOf.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/country> ?country.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/utcOffset> ?utcOffset.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janHighC> ?janHighC.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janLowC> ?janLowC.
}
}
LIMIT 20
I run out of limits.
I also tried this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
?subject rdfs:label ?label.
FILTER ( lang(?label) = 'en'
}
LIMIT 100
But that gives me an error, which I don't understand. If I remove the FILTER, it works but gives me the labels in all languages...
What I'm looking for is something like this: http://dbpedia.org/page/Vancouver
Not all of the data, just some of it, such as population, area, country, elevation, lat, long, timezone, label#en, abstract#en etc.
Can someone help me to get working syntax?
Thanks for y'all help.
UPDATE:
I got it to work so far with:
SELECT DISTINCT *
WHERE {
?city rdf:type dbpedia-owl:Settlement ;
rdfs:label ?label;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:populationTotal ?pop ;
dbpedia-owl:country ?country ;
dbpprop:website ?website .
FILTER ( lang(?abstract) = 'en' && lang(?label) = 'en')
}
LIMIT 20
But I'm still running out of limits if I want to get all settlements. By the way, is there a way to get all cities and settlements in one table?
By "run out of limits", do you mean the error "Bandwidth Limit Exceeded URI = '/!sparql/'"? I guess this is a limit set by DBpedia to make sure that it is not flooded with queries that take "forever" to run, and if so, there is probably not much you can do about the limit itself. You can, however, ask for the data in chunks using ORDER BY, LIMIT and OFFSET; see http://www.w3.org/TR/rdf-sparql-query/#modOffset.
UPDATE: Yes, this seems to be the way to go: http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg03368.html
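A rough sketch of the chunked approach: append ORDER BY, LIMIT and OFFSET to a base query and keep requesting pages until one comes back short. The names `page_query`, `fetch_all` and `fake_endpoint` are hypothetical, and the fake endpoint below stands in for a real HTTP call so the loop can be tried without network access.

```python
import re


def page_query(base_query, order_by, limit, offset):
    """Append the paging modifiers to a SELECT query without its own modifiers."""
    return f"{base_query}\nORDER BY {order_by}\nLIMIT {limit}\nOFFSET {offset}"


def fetch_all(run_query, base_query, order_by, page_size):
    """Collect rows page by page; stop at the first page shorter than page_size."""
    rows = []
    offset = 0
    while True:
        page = run_query(page_query(base_query, order_by, page_size, offset))
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


# Stand-in for a real endpoint (assumption): serves slices of 25 made-up rows.
FAKE_ROWS = [f"city{i}" for i in range(25)]


def fake_endpoint(query):
    limit = int(re.search(r"LIMIT (\d+)", query).group(1))
    offset = int(re.search(r"OFFSET (\d+)", query).group(1))
    return FAKE_ROWS[offset:offset + limit]


BASE = "SELECT ?city WHERE { ?city rdf:type dbpedia-owl:Settlement }"
all_rows = fetch_all(fake_endpoint, BASE, "?city", page_size=10)
print(len(all_rows))  # 25
```

ORDER BY matters here: without a stable ordering, the endpoint is free to return rows in a different order for each page, so pages could overlap or miss rows.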
In the second query the error is a missing parenthesis. This
FILTER ( lang(?label) = 'en'
should be
FILTER ( lang(?label) = 'en')
For your last question, a natural way to collect the results of several similar queries in one table is UNION, e.g.,
SELECT ?x
WHERE {
{ ?x rdf:type dbpedia-owl:City }
UNION
{ ?x rdf:type dbpedia-owl:Settlement }
}

Rewrite SPARQL DESCRIBE query as CONSTRUCT

For some reason I can't issue DESCRIBE queries using Redland (librdf.org). Is it possible to rewrite DESCRIBE as a CONSTRUCT query for a given URI?
DESCRIBE <urn:my-uri>
I was thinking about writing it as something like this, but I don't think it is valid SPARQL:
CONSTRUCT { ?subject ?predicate ?object }
WHERE {
{ ?subject ?predicate ?object }
AND {
{ <urn:my-uri> ?predicate ?object }
OR { ?subject <urn:my-uri> ?object }
OR { ?subject ?predicate <urn:my-uri> }
}
}
You are right, that is not valid SPARQL. The closest thing to your OR is UNION. There is also no need for an AND operator: triple patterns in the same group are joined by default, not unioned.
For what you are trying to do, it is better to use a FILTER, as in this example:
CONSTRUCT { ?subject ?predicate ?object }
WHERE { ?subject ?predicate ?object .
FILTER ( ?subject = <urn:your_uri> || ?object = <urn:your_uri>)
}
On some systems, and for large knowledge bases, this query can be very expensive. Also, if your database contains blank nodes, this query won't return the description of those nodes, only their internal identifiers. In most cases, emulating DESCRIBE by hand can't be done with a single query, and you'll have to implement some recursive logic to collect all the information that describes a URI.
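The recursive logic mentioned above could look roughly like the following sketch, which works on an in-memory list of triples instead of a live store. It collects the triples whose subject is the resource and then repeats the step for every blank node it meets as an object. The sample triples, the `_:` prefix convention for blank nodes and the function name `describe` are assumptions made for illustration. Against a real endpoint, blank node labels are generally not stable across queries, so each step would need a store-specific way to address them.

```python
def describe(resource, triples, is_blank):
    """Collect triples about `resource`, following blank-node objects recursively."""
    result = []
    visited = set()
    pending = [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue  # guard against cycles between blank nodes
        visited.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                if is_blank(o):
                    pending.append(o)
    return result


# Hypothetical sample data; "_:" marks a blank node in this sketch.
TRIPLES = [
    ("urn:my-uri", "ex:name", "Example"),
    ("urn:my-uri", "ex:address", "_:b1"),
    ("_:b1", "ex:city", "Vancouver"),
    ("urn:other", "ex:name", "Unrelated"),
]

description = describe("urn:my-uri", TRIPLES, lambda term: term.startswith("_:"))
for triple in description:
    print(triple)
```

Unlike the FILTER query, this sketch only follows outgoing triples; incoming triples (where the resource is the object) would need a second pass if they are wanted.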
After trying something like the FILTER ( A || B ) method, I got the impression that it is pretty slow.
I think you can do basically the same thing using VALUES and UNION.
I tried it on DBpedia (~2.46 billion triples) with a movie, and it seemed to perform well.
CONSTRUCT {
?subject ?predicate ?object
}
WHERE {
{ ?subject ?predicate ?object .
VALUES ?subject { dbpedia:The_Matrix }
}
UNION
{ ?subject ?predicate ?object .
VALUES ?object { dbpedia:The_Matrix }
}
}
Edit: Just for the sake of additional info, I think you could technically also write the following:
CONSTRUCT { ?subject ?predicate ?object }
WHERE {
?subject ?predicate ?object .
OPTIONAL { dbpedia:The_Matrix ?predicate ?object . }
OPTIONAL { ?subject ?predicate dbpedia:The_Matrix . }
}
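but note that OPTIONAL never removes solutions of the pattern before it, so this query matches every triple in the store rather than only those that mention the resource. That is likely why some popular RDF databases can't handle it performantly and will die.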