DBPedia SPARQL OPTIONAL

I am running this simple query on http://dbpedia.org/sparql
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
LIMIT 10
I expect to get the first 10 people from http://dbpedia.org/resource/Category:British_journalists, whether or not they have a name and a dateOfBirth. Instead, I am getting the first 10 people who have both properties. For example, there are 2 people missing after Andrew Rothstein.
What am I doing wrong?

The problem with your query is that a) LIMIT limits the number of rows returned, not subjects, and b) some people have more than one name.
So, for example, the journalist Daniel Singer (http://dbpedia.org/resource/Daniel_Singer_(journalist)) has at least two names: "Daniel Singer"@en and "Singer, Daniel"@en. That doubles the number of rows with him as a subject.
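To see which subjects are affected, a diagnostic query along these lines lists the British journalists with more than one p:name value, i.e. the ones that take up more than one row. This is a sketch; the prefixes the DBpedia endpoint predefines are declared explicitly here so the query is self-contained:

```sparql
PREFIX p: <http://dbpedia.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>

# Journalists with more than one p:name value; each extra name
# adds another row to the original OPTIONAL query.
SELECT ?person (COUNT(DISTINCT ?name) AS ?nameCount)
WHERE {
  ?person dcterms:subject category:British_journalists .
  ?person p:name ?name .
}
GROUP BY ?person
HAVING (COUNT(DISTINCT ?name) > 1)
ORDER BY DESC(?nameCount)
```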
If you GROUP BY ?person, you ensure there is only one row per person. You then need to SAMPLE the names and dates of birth to pick just one of each per person.
PREFIX p: <http://dbpedia.org/property/>
SELECT ?person (SAMPLE(?name) as ?aname) (SAMPLE(?dob) as ?adob) WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
GROUP BY ?person
LIMIT 10
(I'm not sure how well SAMPLE plays with unbound cases, i.e. where there isn't a name)
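An alternative that keeps every value instead of sampling one is to apply the LIMIT to subjects in a subquery and attach the OPTIONAL patterns afterwards. This is a sketch rather than a tested query; the ORDER BY ?person in the inner query is an assumption added so that "the first 10" is well defined:

```sparql
PREFIX p: <http://dbpedia.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>

SELECT ?person ?name ?dob
WHERE {
  # Inner query: choose the first 10 subjects, independent of how many names they have.
  {
    SELECT DISTINCT ?person
    WHERE { ?person dcterms:subject category:British_journalists . }
    ORDER BY ?person
    LIMIT 10
  }
  # Outer query: attach the optional values for just those subjects.
  OPTIONAL { ?person p:name ?name }
  OPTIONAL { ?person p:dateOfBirth ?dob }
}
ORDER BY ?person
```

Because the inner query is evaluated on its own before being joined, a journalist with two names still produces two rows, but the result covers exactly ten people.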

As per O.R Mapper's comment, removing the LIMIT returns the full result set, which reveals that OPTIONAL is in fact working.
Thank you.

Related

How can I build a better SPARQL query, to get only the data I want from DBpedia? (was: "How to get rid of multiple rows with DBPEDIA SPARQL")

I run this query from the SPARQL Explorer at DBpedia. I want to get each president only once, but because some of them have multiple birthplace entries, the query returns multiple rows.
SELECT DISTINCT ?person ?birthPlace ?presidentStart ?presidentEnd
WHERE {
?person dct:subject dbc:Presidents_of_the_United_States.
?person dbo:birthPlace ?birthPlace .
OPTIONAL { ?person dbp:presidentEnd ?presidentEnd } .
OPTIONAL { ?person dbp:presidentStart ?presidentStart } .
FILTER ( regex(?birthPlace, "_") OR
regex(?birthPlace, ";_")
) .
}
GROUP BY ?person
ORDER BY ?presidentStart ?person
LIMIT 100
I would like to have only the STATE where they are born.
person birthPlace presidentStart presidentEnd
:Abraham_Lincoln :Hodgenville,_Kentucky - -
:Barack_Obama :Kapiolani_Medical_Center_for_Women_and_Children - -
:Bill_Clinton :Hope,_Arkansas - -
:Dwight_D._Eisenhower :Denison,_Texas - -
:George_W._Bush :New_Haven,_Connecticut - -
:George_Washington :Westmoreland_County,_Virginia - -
:George_Washington :British_America - -
:George_Washington :George_Washington_Birthplace_National_Monument - -
:James_A._Garfield :Orange,_Ohio - -
:James_A._Garfield :Moreland_Hills,_Ohio - -
:Jimmy_Carter :Plains,_Georgia
As SPARQL is a pattern matching language, the trick, when your query result is "too broad/general", is to create a more specific pattern. In this case, your intent is not just to get back all resources that are marked as dbo:birthPlace values, but only those resources that represent U.S. states.
So we need to figure out how U.S. states are distinguished from other locations in DBPedia.
Let's take Kentucky as an example. The resource representing Kentucky is http://dbpedia.org/resource/Kentucky . If we scroll down the page outlining the properties of that resource, we find multiple entries for the rdf:type relation, but the one that jumps out at me as most suitable is yago:WikicatStatesOfTheUnitedStates (http://dbpedia.org/class/yago/WikicatStatesOfTheUnitedStates).
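If you would rather check the candidate classes without scrolling through the resource page, a query like this sketch lists every rdf:type of the Kentucky resource:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Every class the Kentucky resource is declared to be an instance of.
SELECT ?type
WHERE {
  <http://dbpedia.org/resource/Kentucky> rdf:type ?type .
}
ORDER BY ?type
```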
If we modify your query to put that in as an extra restriction, and drop the weird regular expression, like so:
SELECT DISTINCT ?person ?birthPlace ?presidentStart ?presidentEnd
WHERE {
?person dct:subject dbc:Presidents_of_the_United_States.
?person dbo:birthPlace ?birthPlace .
?birthPlace a yago:WikicatStatesOfTheUnitedStates .
OPTIONAL { ?person dbp:presidentEnd ?presidentEnd } .
OPTIONAL { ?person dbp:presidentStart ?presidentStart } .
}
GROUP BY ?person
ORDER BY ?presidentStart ?person
LIMIT 100
You should get what you need.
Unfortunately, if you try, you find that you don't. This is because DBPedia data is messy. The above query only returns three results, and worse, one result is clearly incorrect:
person birthPlace presidentStart presidentEnd
dbr:Barack_Obama dbr:Hawaii
dbr:George_Washington dbr:Virginia
dbr:Theodore_Roosevelt dbr:New_York_City
There are two things going on here. First, New York City is incorrectly classified as a state in DBPedia. Second, most presidents do not have their state explicitly recorded as their birthplace, only things like their home town.
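To gauge the first problem, you can count the members of the class; a result noticeably above 50 would suggest that entries like New York City have been swept in. This is a sketch, and the yago class may not be present in newer DBpedia releases:

```sparql
PREFIX yago: <http://dbpedia.org/class/yago/>

# How many resources DBpedia tags as U.S. states.
SELECT (COUNT(DISTINCT ?state) AS ?stateCount)
WHERE {
  ?state a yago:WikicatStatesOfTheUnitedStates .
}
```

Selecting ?state instead of the COUNT lists the members, so odd ones out stand out directly.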
Fortunately, we can amend the query slightly. DBPedia knows that Hodgenville, Kentucky, is located in Kentucky. How does it know? Have a look at the resource page for Hodgenville: http://dbpedia.org/resource/Hodgenville,_Kentucky . You'll see that it has a dbo:isPartOf relation with the resource representing the state of Kentucky.
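The same relation can be checked with a short query (a sketch; the IRI is the Hodgenville resource linked above):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# The resources that Hodgenville is recorded as being part of.
SELECT ?whole
WHERE {
  <http://dbpedia.org/resource/Hodgenville,_Kentucky> dbo:isPartOf ?whole .
}
```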
So, we need to rephrase our query again: we want the state for each president where their birthplace is part of that state. In SPARQL:
SELECT DISTINCT ?person ?birthState ?presidentStart ?presidentEnd
WHERE {
?person dct:subject dbc:Presidents_of_the_United_States.
?person dbo:birthPlace ?birthPlace .
?birthPlace dbo:isPartOf ?birthState .
?birthState a yago:WikicatStatesOfTheUnitedStates .
OPTIONAL { ?person dbp:presidentEnd ?presidentEnd } .
OPTIONAL { ?person dbp:presidentStart ?presidentStart } .
}
GROUP BY ?person
ORDER BY ?presidentStart ?person
LIMIT 100
This should get you very nearly the result you need.
Update: as you noted, Donald Trump is missing from the list. This seems to be because DBPedia is behind the times: he's still classified as a "presidential candidate" rather than a president.
As for Grover Cleveland appearing four times, this is an interesting anomaly. Cleveland served two non-consecutive terms as president, from 1885 to 1889 and again from 1893 to 1897, so there are two start dates and two end dates. Because DBPedia does not explicitly model which start date belongs to which end date, you get a result for each combination of start and end dates, four in total. There may be a way to query around this (one option would be to group the start and end dates together using a GROUP_CONCAT aggregate), but it's such an edge case that it might be simpler to just handle it in post-processing.
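The GROUP_CONCAT idea could look roughly like the sketch below. It is untested against the live endpoint, and the separator and the COALESCE fallback to an empty string are my own choices. Unlike the queries above, every projected variable is either grouped or aggregated, which is what standard SPARQL 1.1 requires when GROUP BY is used:

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?person
       (SAMPLE(?birthState) AS ?state)
       # COALESCE turns an unbound date into "" so the aggregate never sees an error.
       (GROUP_CONCAT(DISTINCT COALESCE(STR(?presidentStart), ""); separator=", ") AS ?starts)
       (GROUP_CONCAT(DISTINCT COALESCE(STR(?presidentEnd), ""); separator=", ") AS ?ends)
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  ?person dbo:birthPlace ?birthPlace .
  ?birthPlace dbo:isPartOf ?birthState .
  ?birthState a yago:WikicatStatesOfTheUnitedStates .
  OPTIONAL { ?person dbp:presidentStart ?presidentStart }
  OPTIONAL { ?person dbp:presidentEnd ?presidentEnd }
}
GROUP BY ?person
ORDER BY ?person
LIMIT 100
```

For Grover Cleveland this should give a single row with both start dates in ?starts and both end dates in ?ends; matching each start to its end still has to be done afterwards.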
Focusing on
I would like to have only the STATE where they are born
rather than on
How to get rid of multiple rows with DBPEDIA SPARQL
this could be a solution:
SELECT DISTINCT ?person ?birthState ?presidentStart ?presidentEnd
WHERE {
?person dct:subject dbc:Presidents_of_the_United_States.
OPTIONAL { ?person dbp:presidentEnd ?presidentEnd } .
OPTIONAL { ?person dbp:presidentStart ?presidentStart } .
OPTIONAL {?person dbo:birthPlace/dbp:subdivisionType/dbp:territory ?birthState } .
FILTER ( regex(?birthState, "_") OR
regex(?birthState, ";_")
) .
}
GROUP BY ?person
ORDER BY ?presidentStart ?person
LIMIT 100

SPARQL Query DBpedia Getting Multiple Values

I am new to DBPedia SPARQL Query, I am currently using the http://dbpedia.org/snorql to test queries.
My query looks like this
SELECT ?name ?school ?person
WHERE {
?person dbo:almaMater :Harvard_University .
?person foaf:name ?name .
?person dbo:birthDate ?birth .
?person dbo:country ?country .
?person dbo:almaMater ?school .
FILTER (?birth > "1980-01-01"^^xsd:date) .
} ORDER BY ?name
And below is my result
From the result above, it looks like the name "Mingze Xi"@en is repeated three times. I checked the link under the person field, and it shows that "Mingze Xi"@en attended Harvard University, Hangzhou ... and Zhejiang University.
Is there a way for me to query and show that under this name the person has attended these schools? I need this because there is no unique ID that I can use to indicate that this is the same person.
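One possible approach, sketched under the assumption that the rest of your pattern stays the same: the ?person IRI already identifies each resource uniquely, so you can GROUP BY ?person and collapse the schools into one column with GROUP_CONCAT. Prefixes are declared explicitly, with dbr:Harvard_University assumed to be what :Harvard_University expands to in SNORQL:

```sparql
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# One row per person, with all of their almaMater values joined into ?schools.
SELECT ?person
       (SAMPLE(?name) AS ?aName)
       (GROUP_CONCAT(DISTINCT STR(?school); separator=" | ") AS ?schools)
WHERE {
  ?person dbo:almaMater dbr:Harvard_University .
  ?person foaf:name ?name .
  ?person dbo:birthDate ?birth .
  ?person dbo:country ?country .
  ?person dbo:almaMater ?school .
  FILTER (?birth > "1980-01-01"^^xsd:date)
}
GROUP BY ?person
ORDER BY ?aName
```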

SPARQL: selecting people by country

I am trying to select all people born in a specific country (e.g. Portugal) from DBPedia.
I could use this query:
SELECT DISTINCT ?person
WHERE {
?person dbpedia-owl:birthPlace dbpedia:Portugal.
}
But the problem is that not all people have dbpedia:Portugal as birthPlace. About 30% of people have just a town name as birthPlace, e.g.
dbpedia:Lisbon
I could add all Portugal cities in a FILTER clause but it's a big list.
May be it's possible to infer Portugal from Lisbon in the SPARQL query somehow?
(to not to add all Portugal cities in FILTER to get ALL persons)
If we assume all the cities in a specific country are defined as part of that country in dbpedia, you could have a query that first looks for the people that have dbpedia:Portugal as a country and then cities within dbpedia:Portugal.
SELECT DISTINCT ?person
WHERE {
?person a dbpedia-owl:Person.
Optional{
?person dbpedia-owl:birthPlace ?country.
}
Optional{
?person dbpedia-owl:birthPlace ?place.
?place dbpedia-owl:country ?country
}
filter(?country= dbpedia:Portugal)
}
The query that you have written identifies 1723 distinct URIs, and this finds 2563 URIs.
Artemis' answer works, but it's very verbose for what's a pretty simple query. It can be simplified to:
select distinct ?person where {
?person a dbpedia-owl:Person ;
dbpedia-owl:birthPlace/dbpedia-owl:country? dbpedia:Portugal
}
This returns 2449 results.
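To reproduce a count like this yourself, the same pattern can be wrapped in COUNT. The number will vary with the DBpedia release, so treat the figures quoted in this thread as snapshots:

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>

# People born in Portugal, or in a place whose dbpedia-owl:country is Portugal.
SELECT (COUNT(DISTINCT ?person) AS ?personCount)
WHERE {
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:birthPlace/dbpedia-owl:country? dbpedia:Portugal .
}
```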
Fuller results (2730 persons) can be obtained with the query from http://answers.semanticweb.com/questions/22450/sparql-selecting-people-by-country:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?person
WHERE
{
{?person a <http://dbpedia.org/ontology/Person>;
<http://dbpedia.org/ontology/birthPlace> ?place.
?place <http://dbpedia.org/ontology/country> ?birthCountry.
?birthCountry a <http://dbpedia.org/ontology/Country>.
FILTER (?birthCountry = dbpedia:Portugal).
}
UNION
{ ?person a <http://dbpedia.org/ontology/Person>;
<http://dbpedia.org/ontology/birthPlace> ?birthCountry.
?birthCountry a <http://dbpedia.org/ontology/Country>.
FILTER (?birthCountry = dbpedia:Portugal).
}
}
GROUP BY ?person
ORDER BY ?person

dbpedia fetch entitites in language other than english

I'm trying to extract an entity dictionary containing person names etc. from DBpedia using SPARQL.
PREFIX owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?name
WHERE {
?person a owl:Person .
?person dbpprop:name ?name . FILTER(lang(?name) = "en")
}
The query above succeeds, but when I change the language tag to fr, nothing is returned.
How can I fetch names in other languages?
Moreover, why can't I filter by language using the query below?
SELECT ?name
WHERE {
?person a owl:Person .
?person dbpprop:language "English" .
?person dbpprop:name ?name .
}
This query returns nothing.
I tried to fetch all languages using
SELECT DISTINCT ?lanName
WHERE {
?person a owl:Person .
?person dbpprop:language ?lanName .
}
and the result set contains English.
You need to filter on the language tag of the property's values. Not every property has values in several languages, but some do. From your example, it seems that dbpprop:name doesn't have values in every language. You may find more values in other languages by querying the other language-specific DBpedia editions.
However, for something like a name, you'll probably get multi-language results if you use the rdfs:label property. For instance, to get the names of Barack Obama, Daniel Webster, and Johnny Cash in Russian, you could do:
select ?label {
values ?person { dbpedia:Johnny_Cash dbpedia:Barack_Obama dbpedia:Daniel_Webster }
?person rdfs:label ?label .
filter langMatches(lang(?label),"ru")
}
As an aside, note the use of langMatches rather than equality for matching language tags. This is usually the better approach, because it correctly handles the different regional tags within a language. For example (from the SPARQL specification), you can find both of these French literals:
"Cette Série des Années Soixante-dix"@fr .
"Cette Série des Années Septante"@fr-BE .
with langMatches(lang(?title),"fr"), but only the first one with lang(?title) = "fr".
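The specification example can be tried on any SPARQL 1.1 endpoint by inlining the two literals with VALUES, so no DBpedia data is needed. In this sketch, ?matchesFr should be true for both literals, while ?equalsFr should be true only for the @fr one:

```sparql
# Compare langMatches with plain equality on the two example literals.
SELECT ?title
       (langMatches(lang(?title), "fr") AS ?matchesFr)
       (lang(?title) = "fr" AS ?equalsFr)
WHERE {
  VALUES ?title {
    "Cette Série des Années Soixante-dix"@fr
    "Cette Série des Années Septante"@fr-BE
  }
}
```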
To get names in other languages, use rdfs:label. The dbpprop:name values are, of course, all in English, because you are querying the English DBpedia.
PREFIX owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT distinct *
WHERE {
?person a owl:Person .
?person rdfs:label ?name .
FILTER(lang(?name) = "fr")
}
Likewise, for the second query, if you use rdfs:label for the name and compare dbpprop:language with the resource for the English language instead of the literal "English", you get:
PREFIX owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT distinct *
WHERE {
?person a owl:Person .
?person rdfs:label ?name .
?person dbpprop:language <http://dbpedia.org/resource/English_language>.
}

Sparql - Order by to return empty values last

I use AllegroGraph and SPARQL 1.1.
I need to sort a column in ascending order and have the SPARQL query return empty values last.
Sample data:
<http://mydomain.com/person1> <http://mydomain.com/name> "John"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
<http://mydomain.com/person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
<http://mydomain.com/person2> <http://mydomain.com/name> "Abraham"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
<http://mydomain.com/person2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
<http://mydomain.com/person3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
Here I need the SPARQL query to return Abraham, followed by John, and then person3, which does not have a name attribute.
Query I use:
select ?name ?person {
?person <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>.
optional {?person <http://mydomain.com/name> ?name.}
} order by asc(?name )
Current output is person3 (null), followed by Abraham and John.
Please let me know your thoughts.
I don't have AllegroGraph at hand, but AFAIK it supports multiple order conditions:
select ?name ?person {
?person <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person> .
optional {?person <http://mydomain.com/name> ?name . }
} order by (!bound(?name)) asc(str(?name))
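The first condition sorts on whether ?name is bound: !bound(?name) is false for persons with a name and true for those without, and false sorts before true, so named persons come first. When this condition does not distinguish two rows, the second condition is used. Note the use of str() to convert the rdf:XMLLiteral values to plain strings, for which comparison is supported.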
(You may also want to add a "." at the end of each line in your N-Triples data.)
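For experimenting without loading any data, here is a self-contained sketch that inlines the sample persons with VALUES. The names are plain literals rather than rdf:XMLLiteral, which is a simplification. DESC(BOUND(?name)) is simply another spelling of (!bound(?name)), since true sorts after false:

```sparql
SELECT ?name ?person
WHERE {
  # Inline stand-in for the three person resources in the question.
  VALUES ?person {
    <http://mydomain.com/person1>
    <http://mydomain.com/person2>
    <http://mydomain.com/person3>
  }
  # Only person1 and person2 get a name; person3 stays unbound.
  OPTIONAL {
    VALUES (?person ?name) {
      (<http://mydomain.com/person1> "John")
      (<http://mydomain.com/person2> "Abraham")
    }
  }
}
ORDER BY DESC(BOUND(?name)) ASC(STR(?name))
```

This should list Abraham, then John, then person3 with no name.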