SPARQL property paths - sparql

I have the following RDF data
(Person, worksAt, Branch)
(Branch, location, Town)
(Town, country, Country)
and the property path
worksAt, location, country
Can I formulate a SPARQL query that, given a property such as country,
returns the leftmost class of the path (i.e. Person)?

Without property paths:
SELECT ?person WHERE
{
  ?person :worksAt ?branch .
  ?branch :location ?town .
  ?town :country ?country .
}
With property paths, using a sequence path, which the SPARQL 1.1 specification defines as:
elt1 / elt2: a sequence path of elt1, followed by elt2
SELECT ?person WHERE
{
  ?person :worksAt/:location/:country ?country .
}
I don't understand what you mean by "given a property e.g. country". Do you mean a specific instance instead? For example, if the person should work in Germany, you would replace ?country with :Germany.
These examples assume that your properties only have a single class as a domain, e.g. only persons can work somewhere. Otherwise you have to add ?person a :Person. and so on.
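To see what the sequence path does, here is a minimal Python sketch that evaluates the same three-step chain over an in-memory list of triples. The sample data (alice, bob, the branch and town names) is invented for illustration, and a real triple store would evaluate the path with joins rather than repeated scans.

```python
# Toy triple store; every name below is made-up sample data.
TRIPLES = [
    ("alice", "worksAt", "branch1"),
    ("bob", "worksAt", "branch2"),
    ("branch1", "location", "berlin"),
    ("branch2", "location", "paris"),
    ("berlin", "country", "Germany"),
    ("paris", "country", "France"),
]


def follow(subjects, predicate):
    """Return every object reachable from `subjects` via one `predicate` edge."""
    return {o for s, p, o in TRIPLES if p == predicate and s in subjects}


def sequence_path(start, predicates):
    """Mimic elt1/elt2/...: follow each predicate in order, starting at `start`."""
    frontier = {start}
    for predicate in predicates:
        frontier = follow(frontier, predicate)
    return frontier


def people_in(country):
    """Subjects whose worksAt/location/country path ends at `country`."""
    path = ["worksAt", "location", "country"]
    candidates = {s for s, p, _ in TRIPLES if p == "worksAt"}
    return sorted(c for c in candidates if country in sequence_path(c, path))
```

Calling people_in("Germany") corresponds to replacing ?country with :Germany in the query above.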

Related

Wikidata SPARQL: Get value if only one value, otherwise get value with sub-property

Question
Is there a way of expressing the following in Wikidata SPARQL:
#pseudo code
if property has one value:
    use value
else:
    use value with sub-property x
Context and Current Attempts
I'm trying to use Wikidata to get the given names of members of the Swedish parliament (to use in a data visualisation of their elections).
I want to get one given name per member of the Swedish parliament.
Here is a Wikidata example of a person with one given name (Fredrick).
Here is a Wikidata example of a person with multiple given names (Gustav, Per and Edvard).
The name Gustav has the 'object has role' property with a value 'usual first name'.
The first example's name (Fredrick) does not have the 'object has role' property with a value 'usual first name'.
The following code will return a row for each first name (i.e. 3 rows for the second example: Gustav, Per and Edvard)
SELECT ?personLabel ?givenNamesLabel
WHERE {
  ?person wdt:P39 wd:Q10655178 . # ?person, held position, member of the Swedish parliament
  ?person wdt:P735 ?givenNames . # ?person, given name, ?givenNames
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE]".
  }
}
The following code selects only the names with the 'object has role' property with a value 'usual first name'.
The first example (Fredrick) would not be included.
SELECT ?personLabel ?givenNameSingularLabel
WHERE {
  ?person wdt:P39 wd:Q10655178 . # ?person, held position, member of the Swedish parliament
  ?person p:P735 ?givenNames . # ?person, given name statement, ?givenNames (statement node)
  ?givenNames ps:P735 ?givenNameSingular . # statement node, given name value, ?givenNameSingular
  ?givenNames pq:P3831 wd:Q3409033 . # statement node, object has role, usual first name
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE]".
  }
}
Question in Context
Is there a way of writing:
#pseudo code
if given name has one value:
    use value
else:
    use value with sub-property 'object has role' = 'usual first name' value
I could not find a query structure to implement:
#pseudo code
if given name has one value:
    use value
else:
    use value with sub-property 'object has role' = 'usual first name' value
But I managed to implement:
#pseudo code
if 'given name' with sub-property 'object has role' = 'usual first name' exists:
    use value
else:
    use the MAX given name
This works well enough for what I am trying to solve:
SELECT ?person ?personLabel (MAX(?firstNameLbl) AS ?firstName)
WHERE {
  ?person wdt:P39 wd:Q10655178 . # ?person, held position, member of the Swedish parliament
  # OPTIONAL is used because some members of the Swedish parliament do not have given names
  # .. in Wikidata (example: Q97965511) but we want to include them.
  OPTIONAL {
    ?person wdt:P735 ?givenNames . # ?person, given name, ?givenNames
  }
  # If a person has a given name with a 'usual first name' role:
  # .. assign that name to ?usualFirstName
  OPTIONAL {
    ?person p:P735 ?givenNamesProp . # ?person, given name statement, ?givenNamesProp
    ?givenNamesProp ps:P735 ?usualFirstName . # ?givenNamesProp, given name value, ?usualFirstName
    ?givenNamesProp pq:P3831 wd:Q3409033 . # ?givenNamesProp, object has role, usual first name
  }
  # Use ?usualFirstNameLabel if bound, otherwise ?givenNamesLabel, otherwise "<no label>"
  BIND(COALESCE(?usualFirstNameLabel, ?givenNamesLabel, "<no label>") AS ?firstNameLbl).
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE]".
    # Not 100% sure about this, but apparently there is a bug in WDQS and
    # .. sometimes the labels have to be set manually
    ?person rdfs:label ?personLabel .
    ?givenNames rdfs:label ?givenNamesLabel .
    ?usualFirstName rdfs:label ?usualFirstNameLabel .
  }
}
GROUP BY ?person ?personLabel
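The fallback logic of this query can be traced by hand. The following Python sketch is only a model of it, under the assumption that each OPTIONAL contributes one row per match (or one unbound row when nothing matches) and that the two OPTIONAL blocks combine as a cross product per person. The function name first_name is hypothetical; the names come from the examples in the question.

```python
from itertools import product


def first_name(given_names, usual_first_names):
    """Model COALESCE(usual, given, "<no label>") per row, then MAX over the rows."""
    # An OPTIONAL that matches nothing still yields one row, with the variable
    # left unbound (represented here by None).
    given_rows = given_names or [None]
    usual_rows = usual_first_names or [None]
    labels = []
    for given, usual in product(given_rows, usual_rows):
        # COALESCE returns the first bound argument.
        label = next((v for v in (usual, given) if v is not None), "<no label>")
        labels.append(label)
    # GROUP BY ?person with MAX(...) collapses the rows to a single value.
    return max(labels)


print(first_name(["Gustav", "Per", "Edvard"], ["Gustav"]))
print(first_name(["Fredrick"], []))
print(first_name([], []))
```

When a usual first name exists, every row carries it, so MAX has only that value to choose from. Without one, MAX picks the alphabetically last given name, which is why the result is "well enough" rather than exact.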

DBpedia: all population fields?

I am trying to extract all entities in DBpedia that have a population. However, I have found that there are different field names for population depending on the entity. For instance, http://dbpedia.org/page/Boston has the field populationTotal while http://dbpedia.org/page/Alaska has the field 2010pop. Is there a complete list of the population fields that I can query for?
Solution via @AKSW: query for all properties whose IRI contains "pop", together with their range where one is declared.
SELECT ?p ?range {
  ?p a rdf:Property .
  FILTER(regex(str(?p), "pop"))
  OPTIONAL { ?p rdfs:range ?range }
}

How to exclude nodes from path?

I want to get all mathematicians from DBpedia, so I wrote this query for DBpedia's SPARQL service:
SELECT DISTINCT ?person
{
  ?person dct:subject ?category .
  ?category skos:broader* dbc:Mathematicians .
}
The problem with this is that the category Mathematicians is polluted, due to categories like dbc:Euclid, which then includes all of Euclidean geometry. I believe it's categories like these which cause the query to fail:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool
A lot of the problematic categories are in dbc:Wikipedia_categories_named_after_mathematicians.
Is there some way to ignore these categories in the skos:broader* path that would make the error go away?
You can list the categories that you don't want to include by filtering them out:
SELECT DISTINCT ?person
{
  ?person dct:subject ?category .
  ?category skos:broader* dbc:Mathematicians .
  FILTER (?category NOT IN (dbc:Euclid))
}
But that won't remove the error, because Virtuoso still needs to traverse the skos:broader hierarchy, exhausting its transitive temp memory. Other approaches include selecting specific categories or traversing only part of the hierarchy.
To select specific categories you could chain UNION patterns, but the VALUES shorthand is simpler:
SELECT DISTINCT ?person
{
  VALUES ?category { dbc:Mathematicians dbc:Mental_calculators dbc:Lists_of_mathematicians }
  ?person dct:subject ?category .
}
To query only part of the hierarchy, you can use a bounded property path expression. This one matches categories that are one or two skos:broader steps below dbc:Mathematicians, that is, its subcategories and their subcategories:
SELECT DISTINCT ?person
{
  ?person dct:subject ?category .
  ?category skos:broader | (skos:broader/skos:broader) dbc:Mathematicians .
  # filter as desired - FILTER (?category NOT IN (dbc:Euclid))
}
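The difference between filtering results after a full traversal and limiting the traversal itself can be sketched in Python. This is a conceptual model only, not how Virtuoso evaluates paths. The NARROWER map (parent category to subcategories) is a hypothetical hierarchy loosely based on the category names in the question, not real DBpedia data, and pruning during traversal is an idea shown for contrast: it is not something the FILTER in the queries above does.

```python
from collections import deque

# Hypothetical parent -> subcategory edges (the inverse of skos:broader).
NARROWER = {
    "Mathematicians": [
        "Mathematicians_by_field",
        "Wikipedia_categories_named_after_mathematicians",
    ],
    "Mathematicians_by_field": ["Geometers"],
    "Wikipedia_categories_named_after_mathematicians": ["Euclid"],
    "Euclid": ["Euclidean_geometry"],
    "Euclidean_geometry": ["Triangles"],
}


def subcategories(root, max_depth, excluded=frozenset()):
    """Breadth-first walk below `root`, at most `max_depth` levels deep.

    Excluded categories are pruned during the walk, so nothing beneath
    them is ever visited.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand further
        for child in NARROWER.get(category, []):
            if child in excluded or child in seen:
                continue
            seen.add(child)
            queue.append((child, depth + 1))
    return seen - {root}
```

With max_depth=2 the walk stops at grandchildren, which mirrors the two-step path. Passing excluded={"Wikipedia_categories_named_after_mathematicians"} with a large depth keeps the Euclid branch out entirely.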

filter for two properties having the same value

I need to retrieve the list of South American countries where the capital is also the largest city. I can display all South American countries with their largest cities, but I can't work out how to use a filter to compare the capital with the largest city:
SELECT DISTINCT ?country ?capital
WHERE {
  ?country a dbo:Country .
  ?country a <http://dbpedia.org/class/yago/SouthAmericanCountries> .
  ?country dbp:largestCity|dbo:largestCity ?capital .
}
For the third triple pattern I use two alternative properties because not all countries have complete data. Running this on the DBpedia SPARQL endpoint gives me this:
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+DISTINCT+%3Fcountry+%3Fcapital%0D%0AWHERE+%7B%0D%0A%0D%0A++++%3Fcountry+a+dbo%3ACountry+.%0D%0A++++%3Fcountry+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fclass%2Fyago%2FSouthAmericanCountries%3E.%0D%0A++++%0D%0A++++%3Fcountry+dbp%3AlargestCity%7Cdbo%3AlargestCity+%3Fcapital.%0D%0A%0D%0A++++%0D%0A%0D%0A%7D&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on
The next step is to filter the countries using this logic: largestCity=capital. So I used:
FILTER (dbp:largestcity="capital")
but the Virtuoso endpoint returns an error for exceeding the execution time.
Any suggestions?
I'm not sure what you're expecting your filter to do. In FILTER (dbp:largestcity="capital"), "capital" is a string, and dbp:largestcity (which is not the same as dbp:largestCity, by the way) is an IRI; they're never going to be equal.
Now, I do see that sometimes the value of dbp:largestCity is the string "capital"@en, so there is some string matching that may be helpful. In general, though, the DBpedia ontology properties have much cleaner data than the raw infobox properties, so if you can, you should prefer the dbo: properties. Here, I guess, you'll want both, though.
You need your query to extract the capital as well as the largest city, and then filter for cases where they're equal, or where the largest city is the string "capital".
SELECT DISTINCT ?country ?capital WHERE {
  ?country a dbo:Country .
  ?country a <http://dbpedia.org/class/yago/SouthAmericanCountries> .
  ?country dbo:capital ?capital .
  ?country dbp:largestCity|dbo:largestCity ?largestCity .
  FILTER (?largestCity = ?capital || str(?largestCity) = "capital")
}
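The difference between the two filters can be illustrated with a small Python sketch. The rows below are invented stand-ins for query solutions, not real DBpedia values, and the function names are hypothetical. The point is that a filter comparing two constants ignores the row entirely, while a filter over bound variables tests each row.

```python
# Invented sample rows standing in for query solutions; not real DBpedia data.
ROWS = [
    {"country": "CountryA", "capital": "CityA", "largestCity": "CityA"},
    {"country": "CountryB", "capital": "CityB", "largestCity": "CityC"},
    # Some infobox values are the literal string "capital" rather than a city.
    {"country": "CountryC", "capital": "CityD", "largestCity": "capital"},
]


def constant_filter(row):
    """Like FILTER(dbp:largestcity = "capital"): two constants, never equal."""
    return "dbp:largestcity" == "capital"


def variable_filter(row):
    """Like FILTER(?largestCity = ?capital || str(?largestCity) = "capital")."""
    return row["largestCity"] == row["capital"] or row["largestCity"] == "capital"


def select_countries(rows, keep):
    """Return the countries of the rows that pass the `keep` predicate."""
    return [row["country"] for row in rows if keep(row)]
```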

How to know if a string is a proper name of person or a place name using DBpedia?

I am running SPARQL queries against DBpedia from a Prolog project and I have a question. I would like to know whether a word is most probably the NAME OF A PERSON (something like John or Mario) or a PLACE (like a city: Rome, London, New York).
I have implemented the following two queries: the first gives me the number of persons having a specific name, and the second gives me the number of places having a specific name.
1) Query for a PERSON NAME:
select COUNT(?person) where {
  ?person a dbpedia-owl:Person .
  { ?person foaf:givenName "John"@en }
  UNION
  { ?person foaf:surname "John"@en }
}
For the name John, I obtain the following output: callret-0: 7313, so I think that it has found 7313 instances for the proper name John. Is it right?
2) Query for a PLACE NAME:
select COUNT(?place) where {
  ?place a dbpedia-owl:Place .
  { ?x rdfs:label "John"@en }
}
The problem is that, as you can see in the "place" query above, I passed John as the parameter, which is a personal name and not a place name, yet I obtain this strange result: callret-0: 81900104
Comparing the output of the two queries, it seems that John is a place name and not a personal name! This does not suit my purpose. I have tried other personal names, and the place query always returns a larger count than the person query.
Why? What am I missing? Are there some errors in my queries? How can I solve it to have a correct result?
Actually, when I run the query you provided:
select COUNT(?place) where {
  ?place a dbpedia-owl:Place .
  { ?x rdfs:label "John"@en }
}
I get the result 93027312, not 81900104, but that does not really matter much. The strange results arise because ?x and ?place don't have to be bound to the same thing, so you are counting every combination of the two: the number of result rows is the number of dbpedia-owl:Places multiplied by the number of things with the rdfs:label "John"@en:
select COUNT(?place) where { ?place a dbpedia-owl:Place }
=> 646023
select COUNT(?x) where { ?x rdfs:label "John"@en }
=> 144
646023 × 144 = 93027312
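The multiplication can be reproduced on a small scale. In this Python sketch the identifiers are invented; only the two counts checked at the end (646023 and 144) come from the answer. It contrasts two unrelated variables, which produce every pairing, with a single shared variable, which keeps only resources matching both patterns.

```python
# Invented identifiers standing in for query bindings.
PLACES = ["place1", "place2", "place3"]
LABELLED_JOHN = ["thing1", "thing2"]


def unjoined_rows(places, labelled):
    """?place and ?x are independent, so every pairing is a solution."""
    return [(place, x) for place in places for x in labelled]


def joined_rows(places, labelled):
    """Using ?place in both patterns keeps only resources that satisfy both."""
    return [place for place in places if place in labelled]


print(len(unjoined_rows(PLACES, LABELLED_JOHN)))  # 3 * 2 pairings
print(len(joined_rows(PLACES, LABELLED_JOHN)))    # no place carries the label
print(646023 * 144 == 93027312)                   # the counts from the answer
```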
If you actually ask for dbpedia-owl:Places that have the rdfs:label "John"@en, you'll get no results:
select (COUNT(?place) as ?numPlaces) where {
  ?place a dbpedia-owl:Place ;
         rdfs:label "John"@en .
}
Also, you might consider using dbpprop:name instead of rdfs:label. Some results seem like they are more useful that way. For instance, let us find places called "Springfield". If we ask for places with that name we get no results:
select * where {
  ?place a dbpedia-owl:Place ;
         rdfs:label "Springfield"@en .
}
However, if we modify the query and use dbpprop:name, we get 17 results. Some of these are duplicates though, so you might have to do something else to remove duplicates. The point, though, is that dbpprop:name got some results, and rdfs:label didn't.
select * where {
  ?place a dbpedia-owl:Place ;
         dbpprop:name "Springfield"@en .
}
You can even use dbpprop:name in working with the names of persons, although it's not as useful, because the dbpprop:name value for most persons is their entire name. To find persons with the given name John using dbpprop:name requires a query like:
select * where {
  ?person a dbpedia-owl:Person ;
          dbpprop:name ?name .
  FILTER( STRSTARTS( str( ?name ), "John" ) )
}
(or you could use CONTAINS instead of STRSTARTS), but this becomes much more expensive, because it has to select all persons and their names, and then filter through that set. Being able to select persons based on a specific name (e.g., with foaf:givenName) is much more efficient.