SPARQL - CONSTRUCT Query Performance Problem

I have to query a set of statements before I create a new version of an object with the same identifier. With the result I analyse the changes and reuse referenced objects if they haven't changed. But the following query is awful and very slow: with about 1000 object versions it runs for about 120 seconds, and I have to import a lot more!
Is there a way to query the statements in a more performant way? I know that OPTIONAL is bad, but the properties can be empty.
Thanks
PREFIX schema: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX as: <https://www.w3.org/ns/activitystreams#>
CONSTRUCT {
?publication rdf:type ?type ;
schema:about ?about ;
schema:name ?name ;
as:name ?asname ;
as:published ?published ;
as:attributedTo ?attributedTo ;
schema:creativeWorkStatus ?creativeWorkStatus ;
schema:dateCreated ?dateCreated ;
schema:dateModified ?dateModified ;
schema:description ?description ;
schema:identifier ?identifier ;
schema:keywords ?keywords ;
schema:license ?license ;
schema:version ?version .
?about rdf:type ?aboutType ;
schema:name ?aboutName ;
schema:contactPoint ?aboutContactPoint ;
schema:location ?aboutLocation .
?aboutContactPoint rdf:type ?contactPointType ;
schema:email ?contactPointEmail ;
schema:name ?contactPointName ;
schema:telephone ?contactPointTelephone .
?aboutLocation rdf:type ?locationType ;
schema:latitude ?locationLatitude ;
schema:longitude ?locationLongitude ;
schema:address ?locationAddress .
# Address
?locationAddress rdf:type ?addressType ;
schema:addressCountry ?addressCountry ;
schema:addressLocality ?addressLocality ;
schema:streetAddress ?addressStreetAddress ;
schema:postalCode ?addressPostalCode .
}
FROM <http://localhost:8081/camel/fef6dc4b-e1e9-46f5-a34e-8ab5c1600ec4>
WHERE {
?publication rdf:type schema:CreativeWork ;
rdf:type ?type ;
schema:identifier "6472c8bbf6504b56b473a1e6a10fcc8d".
OPTIONAL {?publication schema:name ?name . }
OPTIONAL {?publication as:name ?asname . }
OPTIONAL {?publication as:published ?published . }
OPTIONAL {?publication as:attributedTo ?attributedTo . }
OPTIONAL {?publication schema:creativeWorkStatus ?creativeWorkStatus . }
OPTIONAL {?publication schema:dateCreated ?dateCreated . }
OPTIONAL {?publication schema:dateModified ?dateModified . }
OPTIONAL {?publication schema:description ?description . }
OPTIONAL {?publication schema:identifier ?identifier . }
OPTIONAL {?publication schema:keywords ?keywords . }
OPTIONAL {?publication schema:license ?license . }
OPTIONAL {?publication schema:version ?version . }
OPTIONAL {?publication schema:about ?about . }
# Organization
OPTIONAL {?publication schema:about/rdf:type ?aboutType . }
OPTIONAL {?publication schema:about/schema:name ?aboutName . }
OPTIONAL {?publication schema:about/schema:contactPoint ?aboutContactPoint . }
OPTIONAL {?publication schema:about/schema:location ?aboutLocation . }
# ContactPoint
OPTIONAL {?publication schema:about/schema:contactPoint/rdf:type ?contactPointType . }
OPTIONAL {?publication schema:about/schema:contactPoint/schema:email ?contactPointEmail . }
OPTIONAL {?publication schema:about/schema:contactPoint/schema:name ?contactPointName . }
OPTIONAL {?publication schema:about/schema:contactPoint/schema:telephone ?contactPointTelephone . }
# Place
OPTIONAL {?publication schema:about/schema:location/rdf:type ?locationType . }
OPTIONAL {?publication schema:about/schema:location/schema:latitude ?locationLatitude . }
OPTIONAL {?publication schema:about/schema:location/schema:longitude ?locationLongitude . }
OPTIONAL {?publication schema:about/schema:location/schema:address ?locationAddress . }
# Address
OPTIONAL {?publication schema:about/schema:location/schema:address/rdf:type ?addressType . }
OPTIONAL {?publication schema:about/schema:location/schema:address/schema:addressLocality ?addressLocality . }
OPTIONAL {?publication schema:about/schema:location/schema:address/schema:streetAddress ?addressStreetAddress . }
OPTIONAL {?publication schema:about/schema:location/schema:address/schema:postalCode ?addressPostalCode . }
OPTIONAL {?publication schema:about/schema:location/schema:address/schema:addressCountry ?addressCountry . }
}

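Use VALUES to avoid OPTIONAL: list the properties you are interested in once and match them with a single generic triple pattern.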
VALUES ?p {
schema:name
as:name
as:published
as:attributedTo
schema:creativeWorkStatus
schema:dateCreated
schema:dateModified
schema:description
schema:identifier
schema:keywords
schema:license
schema:version
schema:about
}
?publication ?p ?o .
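For context, here is a minimal sketch of how that fragment might fit into a complete CONSTRUCT query. It reuses the graph IRI, identifier and property names from the question. The `?s` variable, the UNION branch for the nested organization, and the idea of returning generic `?s ?p ?o` triples are assumptions made for illustration, not something the answer spells out.

```sparql
PREFIX schema: <https://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX as: <https://www.w3.org/ns/activitystreams#>

# Every solution row carries exactly one triple, so a missing property
# simply produces no row and no OPTIONAL is needed.
CONSTRUCT { ?s ?p ?o }
FROM <http://localhost:8081/camel/fef6dc4b-e1e9-46f5-a34e-8ab5c1600ec4>
WHERE {
  ?publication rdf:type schema:CreativeWork ;
               schema:identifier "6472c8bbf6504b56b473a1e6a10fcc8d" .
  {
    # Level 1: properties of the publication itself.
    VALUES ?p {
      rdf:type schema:name as:name as:published as:attributedTo
      schema:creativeWorkStatus schema:dateCreated schema:dateModified
      schema:description schema:identifier schema:keywords
      schema:license schema:version schema:about
    }
    ?publication ?p ?o .
    BIND(?publication AS ?s)
  }
  UNION
  {
    # Level 2 (assumed extension): properties of the referenced organization.
    ?publication schema:about ?s .
    VALUES ?p { rdf:type schema:name schema:contactPoint schema:location }
    ?s ?p ?o .
  }
  # Deeper levels (contact point, location, address) would follow the
  # same pattern, one UNION branch per level.
}
```

A likely reason the original is slow is that a long chain of OPTIONALs over multi-valued properties multiplies the solution rows, whereas here each matching triple yields one row. Whether that is the actual bottleneck depends on the store, so it is worth measuring both variants.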

Related

SPARQL query for dbpedia to find ski lifts

I'm currently trying to teach myself how to formulate SPARQL queries to extract tourism-related information from DBpedia (via http://dbpedia.org/sparql/).
So far, I've managed to get all museums for a country.
select ?thing ?type ?category ?long ?lat ?country
where
{
VALUES ?country { <http://dbpedia.org/resource/Canada> }
optional
{
?city dbo:country ?country
}
?thing dbo:location ?city.
optional
{
?thing a ?type .
VALUES ?type { dbo:Museum }
BIND( 'Museum' as ?category )
}
optional
{
?thing a ?type.
VALUES ?type { dbo:skiLift }
BIND( 'Skilift' as ?category )
}
optional
{
?thing geo:long ?long.
?thing geo:lat ?lat
}
{
?thing a dbo:Place
}
filter (BOUND (?type))
}
However, I don't understand what I need to do to also get the same information for things like dbo:skiLift, dbo:touristicSite and the like (found here: http://dbpedia.org/ontology/Place).
What am I doing wrong?
This is because both dbo:skiLift and dbo:touristicSite are properties, not classes. They appear on the page for Place not as subclasses of Place, but as properties that have the class Place as their domain or range. To find the subclasses of Place, you can run the following exploratory query, which uses a property path to retrieve the transitive closure of rdfs:subClassOf:
select ?thing
where
{
?thing rdfs:subClassOf+ <http://dbpedia.org/ontology/Place> .
}
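To check for yourself whether a term such as dbo:skiLift is a class or a property, a small inspection query along these lines may help. It is only a sketch: the prefix declarations are written out explicitly (the DBpedia endpoint normally predefines them), and which types, domains or ranges come back depends on the DBpedia release.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Ask the ontology what kind of term dbo:skiLift is.
SELECT ?type ?domain ?range
WHERE {
  dbo:skiLift a ?type .
  OPTIONAL { dbo:skiLift rdfs:domain ?domain }
  OPTIONAL { dbo:skiLift rdfs:range ?range }
}
```

If ?type comes back as a property type (for example owl:ObjectProperty or owl:DatatypeProperty) rather than owl:Class, the term cannot be used as the object of `?thing a ...`.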
Apart from that, I cannot understand why you use two optional clauses for different types in the same query. For example, the following query retrieves museums located at cities of Canada, possibly with their lat and lon, without the use of other optional clauses:
select ?thing ?city ?long ?lat
where
{
?city dbo:country <http://dbpedia.org/resource/Canada> .
?thing dbo:location ?city .
?thing a dbo:Museum .
optional
{
?thing geo:long ?long .
?thing geo:lat ?lat
}
}
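If several kinds of places are wanted in one result set, a VALUES block over the classes is one way to get there without a separate OPTIONAL per type. The sketch below assumes that dbo:SkiArea is a class in the DBpedia ontology; verify it with the subclass query above before relying on it. The explicit prefixes are also assumptions about what the endpoint expects.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?thing ?type ?city ?long ?lat
WHERE {
  # Classes of interest; dbo:SkiArea is an assumed example.
  VALUES ?type { dbo:Museum dbo:SkiArea }
  ?city dbo:country <http://dbpedia.org/resource/Canada> .
  ?thing dbo:location ?city ;
         a ?type .
  # Coordinates stay optional because not every resource has them.
  OPTIONAL {
    ?thing geo:long ?long ;
           geo:lat ?lat
  }
}
```

The ?type column then plays the role of the ?category variable in the original query.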

Group_concat in SPARQL

I am a beginner with SPARQL, and I am trying to deal with the endpoint of the Spanish National Library.
I have a code that works, here it is:
prefix bne: <http://datos.bne.es/def/> # base URI for ontology documented at http://datos.bne.es/def/
prefix resource: <http://datos.bne.es/resource/>
select distinct
?book
?author
?title
?subtitle
?ISBN
?publisher
?date
?pags
?size
?series
?edition
?subjectLabel
where {
?book a bne:C1003 .
?book bne:P3001 "Errata Naturae" .
?book bne:P1011 ?author .
?book bne:P3002 ?title .
OPTIONAL { ?book bne:P3013 ?ISBN }
OPTIONAL { ?book bne:P3001 ?publisher }
OPTIONAL { ?book bne:P3006 ?date }
OPTIONAL { ?book bne:P3004 ?pags }
OPTIONAL { ?book bne:P3007 ?size }
OPTIONAL { ?book bne:P3016 ?series }
OPTIONAL { ?book bne:P1004 ?date }
OPTIONAL { ?book bne:P3017 ?edition }
OPTIONAL { ?book bne:P3014 ?subtitle }
OPTIONAL { ?book bne:OP3008 ?subject }
?subject rdfs:label ?subjectLabel
}
limit 50
But as some books have two or more subjects, the query repeats them in the results. I used group_concat, but for some reason it doesn't work:
prefix bne: <http://datos.bne.es/def/> # base URI for ontology documented at http://datos.bne.es/def/
prefix resource: <http://datos.bne.es/resource/>
select distinct
?book
?author
?title
?subtitle
?ISBN
?publisher
?date
?pags
?size
?series
?edition
(GROUP_CONCAT(DISTINCT(?subjectLabel); separator="//") as ?subjects)
where {
?book a bne:C1003 .
?book bne:P3001 "Errata Naturae" .
?book bne:P1011 ?author .
?book bne:P3002 ?title .
OPTIONAL { ?book bne:P3013 ?ISBN }
OPTIONAL { ?book bne:P3001 ?publisher }
OPTIONAL { ?book bne:P3006 ?date }
OPTIONAL { ?book bne:P3004 ?pags }
OPTIONAL { ?book bne:P3007 ?size }
OPTIONAL { ?book bne:P3016 ?series }
OPTIONAL { ?book bne:P1004 ?date }
OPTIONAL { ?book bne:P3017 ?edition }
OPTIONAL { ?book bne:P3014 ?subtitle }
OPTIONAL { ?book bne:OP3008 ?subject }
?subject rdfs:label ?subjectLabel
}
limit 50
group by ?book
order by ?date
Does someone know where I am making a mistake?
Thanks!
Edit:
I was doing one thing wrong: as @AKSW said, every non-aggregated variable in the SELECT clause must also be listed in GROUP BY at the end of the query (or be wrapped in an aggregate). I have a reduced version of the code for testing this:
PREFIX bne: <http://datos.bne.es/def/>
PREFIX resource: <http://datos.bne.es/resource/>
SELECT ?book ?author ?title (GROUP_CONCAT(DISTINCT ?subject ; separator='//') AS ?subjects)
WHERE
{ ?book a bne:C1003 ;
bne:P3001 "Errata Naturae" ;
bne:P1011 ?author ;
bne:P3002 ?title ;
bne:OP3008 ?subject
}
#group everything here
GROUP BY ?book ?author ?title
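The longer query above has a second problem besides the ungrouped variables: in SPARQL 1.1 the solution modifiers have a fixed order, GROUP BY first, then ORDER BY, then LIMIT. A sketch of the reduced query with all three in that order follows. Sorting by ?title is an illustrative choice, since ?date is neither grouped nor aggregated in this reduced version.

```sparql
PREFIX bne: <http://datos.bne.es/def/>

SELECT ?book ?author ?title
       (GROUP_CONCAT(DISTINCT ?subject ; separator='//') AS ?subjects)
WHERE {
  ?book a bne:C1003 ;
        bne:P3001 "Errata Naturae" ;
        bne:P1011 ?author ;
        bne:P3002 ?title ;
        bne:OP3008 ?subject .
}
GROUP BY ?book ?author ?title   # 1. grouping
ORDER BY ?title                 # 2. ordering, on grouped or aggregated values only
LIMIT 50                        # 3. the limit always comes last
```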
@JeenBroekstra, when I run it in a SPARQL validator, it says it is OK, but when I try to run it against the library's SPARQL endpoint, it gives me an error:
Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at 'GROUP_CONCAT' before '('
As noted in the comments, the problem is that the target endpoint is running an old Virtuoso build (Open Source Edition, v06.01.3127, as of 2011-11-16) that did not support GROUP_CONCAT in SPARQL, because SPARQL 1.1 had not yet been finalized when it was released.
Upgrading to a current version is strongly recommended!
This version of Virtuoso does have a built-in function, available as sql:group_concat and covered in the Virtuoso documentation, which may serve you for now.
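As a rough sketch of that workaround, the reduced query could be rewritten as below. The two-argument form (value, separator) and the predefined `sql:` prefix are assumptions about this Virtuoso build, and the query is untested against that endpoint, so check the Virtuoso documentation before relying on it. STR() is added on the assumption that ?subject is an IRI.

```sparql
PREFIX bne: <http://datos.bne.es/def/>

# Virtuoso-specific: sql:group_concat is not part of standard SPARQL.
SELECT ?book ?author ?title
       (sql:group_concat(str(?subject), '//') AS ?subjects)
WHERE {
  ?book a bne:C1003 ;
        bne:P3001 "Errata Naturae" ;
        bne:P1011 ?author ;
        bne:P3002 ?title ;
        bne:OP3008 ?subject .
}
GROUP BY ?book ?author ?title
```

Unlike the standard aggregate, this form has no DISTINCT, so repeated subjects may show up more than once in the concatenated string.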

Virtuoso giving wrong result, redirect involved

I have this query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia_property: <http://dbpedia.org/property/>
PREFIX dbpedia_ontology: <http://dbpedia.org/ontology/>
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX schema: <http://schema.org/>
SELECT * WHERE
{
{
SELECT ?school, ?name, ?snippet, ?url, ?pageid, ?alias_1, ?alias_2, ?alias_3
WHERE
{
{ ?school rdf:type schema:EducationalOrganization . }
UNION
{ ?school rdf:type yago:EducationalInstitution108276342 . }
?school rdfs:label ?name .
OPTIONAL {
?school foaf:isPrimaryTopicOf ?url .
}
OPTIONAL {
?school dbpedia_ontology:wikiPageID ?pageid .
}
OPTIONAL {
?school rdfs:comment ?snippet .
FILTER (langMatches(lang(?snippet),"en"))
}
OPTIONAL {
?school dbpedia_property:name ?alias_1 .
FILTER ( langMatches(lang(?alias_1),"en") )
}
OPTIONAL {
?school foaf:name ?alias_2 .
FILTER ( langMatches(lang(?alias_2),"en") )
}
OPTIONAL {
?school dbpedia_ontology:wikiPageRedirects ?temp .
?temp rdfs:label ?alias_3 .
FILTER ( langMatches(lang(?alias_3),"en") )
}
OPTIONAL {
?school rdf:type ?excluded .
FILTER (?excluded = schema:Library)
}
FILTER ( langMatches(lang(?name),"en") && !BOUND(?excluded) )
}
ORDER BY ?name
}
}
LIMIT 1
OFFSET 0
You can see that the result gives the resource
http://dbpedia.org/resource/"Wesleyan_Methodist_College"
This will be redirected to
http://dbpedia.org/page/Southern_Wesleyan_University
Why doesn't Virtuoso resolve the resource and give the final destination?
Is there a way to instruct it to ignore the redirects?
The /resource/ and the /page/ about the resource are different things. One has a length in bytes, for example.
A web page is not a schema:EducationalOrganization.
If you look up the /resource/ URI over HTTP, DBpedia sends back an HTTP 303, which a browser will then follow. That's your browser's choice.
See the output from:
wget --max-redirect 0 -O/dev/null -S http://dbpedia.org/resource/Wesleyan_Methodist_College
or
curl -v --max-redirs 0 http://dbpedia.org/resource/Wesleyan_Methodist_College
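The HTTP 303 is separate from Wikipedia-level redirects, which DBpedia records in the data with dbpedia_ontology:wikiPageRedirects (the property the question already uses), pointing from the redirect page to its target. If the goal is to leave redirect pages out of the result, a filter along these lines might do it. This is a sketch that assumes the endpoint supports SPARQL 1.1 FILTER NOT EXISTS; `?target` is a name introduced here for illustration.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia_ontology: <http://dbpedia.org/ontology/>
PREFIX schema: <http://schema.org/>

SELECT ?school ?name
WHERE {
  ?school rdf:type schema:EducationalOrganization ;
          rdfs:label ?name .
  FILTER ( langMatches(lang(?name), "en") )
  # Skip resources that are themselves redirect pages.
  FILTER NOT EXISTS { ?school dbpedia_ontology:wikiPageRedirects ?target }
}
ORDER BY ?name
LIMIT 10
```

On an endpoint without FILTER NOT EXISTS, the OPTIONAL plus !BOUND idiom that the question uses for ?excluded should give the same effect.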

SPARQL DBpedia by taxonomic term

Have the following working SPARQL query that selects items from DBpedia that include the string "fish" in their name.
SELECT ?name, ?kingdom, ?phylum, ?class, ?order, ?family, ?genus, ?species, ?subspecies, ?img, ?abstract
WHERE {
?s dbpedia2:regnum ?hasValue;
rdfs:label ?name
FILTER regex( ?name, "fish", "i" )
FILTER ( langMatches( lang( ?name ), "EN" ))
?animal dbpedia2:name ?name;
foaf:depiction ?img;
dbpedia2:regnum ?kingdom
OPTIONAL { ?animal dbpedia2:ordo ?order . }
OPTIONAL { ?animal dbpedia2:phylum ?phylum . }
OPTIONAL { ?animal dbpedia2:classis ?class . }
OPTIONAL { ?animal dbpedia2:familia ?family . }
OPTIONAL { ?animal dbpedia2:genus ?genus . }
OPTIONAL { ?animal dbpedia2:species ?species . }
OPTIONAL { ?animal dbpedia2:subspecies ?subspecies . }
OPTIONAL {
FILTER ( langMatches( lang( ?abstract ), "EN" ))
}
}
GROUP BY ?name
LIMIT 500
Here is the result on SNORQL.
This approach finds animals with the word "fish" in their name (example: "starfish", which is not a fish but a member of the phylum Echinodermata).
Would like a more precise query that selects DBpedia items by phylum, or by class, or by order, etc.
How to change the query to search only on dbpedia2:phylum (Chordata); on dbpedia2:classis (Actinopterygii); on dbpedia2:familia; etc. ?
Looking at Tuna, I see that there is a rdf:type assertion for the class
http://umbel.org/umbel/rc/Fish
that looks useful. E.g.,
select ?fish { ?fish a <http://umbel.org/umbel/rc/Fish> }
SPARQL results (10,000)
There's also the dbpedia-owl:Fish class, which gets more results:
select (count(*) as ?nFish) where {
?fish a dbpedia-owl:Fish .
}
SPARQL results (17,420)
While Wikipedia has lots of scientific classification information, I don't see much of it reflected in DBpedia. E.g., while the Wikipedia article for Tuna has kingdom, phylum, class, order, etc., I don't see that data in the corresponding DBpedia resource.
Notes
Note that your query, as written, isn't actually legal SPARQL (even if Virtuoso, the SPARQL endpoint that DBpedia uses, accepts it). You can't have commas between the projection variables. Also, once you group by one variable, the non-group variables can't appear in the variable list. You could sample the other values though. E.g., you should end up with something like:
SELECT
?name
(sample(?kingdom) as ?kingdom_)
(sample(?phylum) as ?phylum_)
#-- ...
(sample(?img) as ?img_)
(sample(?abstract) as ?abstract_)
WHERE {
?s dbpedia2:regnum ?hasValue;
rdfs:label ?name
FILTER regex( ?name, "fish", "i" )
FILTER ( langMatches( lang( ?name ), "EN" ))
?animal dbpedia2:name ?name;
foaf:depiction ?img;
dbpedia2:regnum ?kingdom
OPTIONAL { ?animal dbpedia2:ordo ?order . }
OPTIONAL { ?animal dbpedia2:phylum ?phylum . }
OPTIONAL { ?animal dbpedia2:classis ?class . }
OPTIONAL { ?animal dbpedia2:familia ?family . }
OPTIONAL { ?animal dbpedia2:genus ?genus . }
OPTIONAL { ?animal dbpedia2:species ?species . }
OPTIONAL { ?animal dbpedia2:subspecies ?subspecies . }
OPTIONAL {
FILTER ( langMatches( lang( ?abstract ), "EN" ))
}
}
GROUP BY ?name
LIMIT 500
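For resources that do carry the infobox properties from the question, filtering on a rank instead of on the name could look like the sketch below. It assumes dbpedia2: stands for http://dbpedia.org/property/ (the usual SNORQL default) and makes no assumption about whether the value of dbpedia2:classis is an IRI or a plain literal. As noted above, coverage of this data in DBpedia may be thin.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia2: <http://dbpedia.org/property/>

SELECT DISTINCT ?animal ?name ?class
WHERE {
  ?animal dbpedia2:classis ?class ;
          rdfs:label ?name .
  # STR() lets the test work for both IRIs and literals.
  FILTER ( CONTAINS( LCASE( STR( ?class ) ), "actinopterygii" ) )
  FILTER ( langMatches( lang( ?name ), "EN" ) )
}
LIMIT 500
```

Swapping dbpedia2:classis for dbpedia2:phylum, dbpedia2:ordo or dbpedia2:familia, and changing the search string, gives the same filter for the other ranks.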

sparql - get a list of cities in certain country from dbpedia

I want to get triples about cities that are in a certain country. How can I do that?
I tried:
CONSTRUCT { ?c rdfs:label ?name . ?c rdfs:comment ?desc }
WHERE {
?c dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_settlement> .
?c rdfs:label ?name .
?c rdfs:comment ?desc .
?c <http://dbpedia.org/ontology/country> ?country . ?country a <http://dbpedia.org/resource/CountryName>
FILTER ( lang(?name) = "en" && lang(?desc) = "en" )
}
But no luck :/ How can I do this?
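CONSTRUCT { ?c rdfs:label ?name }
WHERE {
?c dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_settlement> .
?c rdfs:label ?name .
# Link the city directly to the country resource; replace "Country" with the real one.
?c dbpedia-owl:country <http://dbpedia.org/resource/Country> .
# ?population must be bound before it can be filtered;
# dbpedia-owl:populationTotal is assumed to be the intended property.
?c dbpedia-owl:populationTotal ?population .
OPTIONAL { ?c dbpedia-owl:areaCode ?areacode }
FILTER ( lang(?name) = "pl" && ?population > 5000)
}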
Hope it will help :)
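The query in the question most likely returns nothing because `?country a <http://dbpedia.org/resource/CountryName>` treats the country as a class, while in DBpedia a country is an ordinary resource that cities point to directly. A self-contained sketch with explicit prefixes follows. Poland is an arbitrary example, and dbo:Settlement is swapped in for the infobox-template pattern as an assumption about what counts as a city.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

CONSTRUCT {
  ?c rdfs:label ?name .
  ?c rdfs:comment ?desc
}
WHERE {
  ?c a dbo:Settlement ;
     dbo:country dbr:Poland ;   # the country is the object, not a type
     rdfs:label ?name ;
     rdfs:comment ?desc .
  FILTER ( lang(?name) = "en" && lang(?desc) = "en" )
}
```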