SPARQL: Exclude double resources in COUNT

I am trying to write a SPARQL query that counts the number of Dutch politicians for each university. However, for some politicians dbo:almaMater has one extra resource (their major, dbr:Sociology for example), and this is reflected in the COUNT. For example, I get a count of 49 for Leiden University where it should be only 42. Any idea how I can resolve this? I've tried FILTER NOT EXISTS and MINUS, but neither changes the count. Thank you.
My query:
SELECT distinct ?education (COUNT(?education) AS ?edu_count)
WHERE { ?name <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person>.
?name <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Netherlands>.
?name <http://dbpedia.org/ontology/party> ?party.
?name <http://dbpedia.org/ontology/almaMater> ?education.
?education <http://dbpedia.org/ontology/type> <http://dbpedia.org/resource/Public_university>.
} GROUP BY ?education
ORDER BY DESC(?edu_count)

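Revised query, based on comments, with extra whitespace for clarity. COUNT ( DISTINCT ?name ) counts each politician once per university, even when the pattern matches the same person on several rows (for example, once per ?party value):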
SELECT ?education
( COUNT ( DISTINCT ?name ) AS ?cnt )
WHERE { ?name <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
?name <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Netherlands> .
?name <http://dbpedia.org/ontology/party> ?party .
?name <http://dbpedia.org/ontology/almaMater> ?education .
?education <http://dbpedia.org/ontology/type> <http://dbpedia.org/resource/Public_university> .
}
GROUP BY ?education
ORDER BY DESC ( ?cnt )
ASC ( ?education )
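To see where the inflated numbers come from, it can help to put both counts side by side. The following is only a sketch: the dbo: and dbr: prefixes are shorthand for the full IRIs used above, and the variable names ?row_count and ?person_count are made up for illustration. It lists only the universities where the number of matched rows differs from the number of distinct politicians.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?education
       (COUNT(?name) AS ?row_count)              # one per matching row
       (COUNT(DISTINCT ?name) AS ?person_count)  # one per politician
WHERE {
  ?name a dbo:Person ;
        dbo:birthPlace dbr:Netherlands ;
        dbo:party ?party ;
        dbo:almaMater ?education .
  ?education dbo:type dbr:Public_university .
}
GROUP BY ?education
# keep only the groups where some politician was matched more than once
HAVING (COUNT(?name) > COUNT(DISTINCT ?name))
ORDER BY DESC(?person_count)
```

If ?row_count is larger than ?person_count for a university, at least one politician matched the pattern more than once, for instance because of several dbo:party values.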

Related

SPARQL count item in an intersection

Given the following triples
@prefix ex: <http://example.org/> .
@base <http://example.com/> .
<person1> ex:has_interpretation <interpretation1> .
<interpretation1> ex:refers_to <objectA> ;
ex:resultIn <X> .
<person2> ex:has_interpretation <interpretation2> .
<interpretation2> ex:refers_to <objectA> ;
ex:resultIn <Y> .
<person2> ex:has_interpretation <interpretation3> .
<interpretation3> ex:refers_to <objectB> ;
ex:resultIn <Z> .
<person3> ex:has_interpretation <interpretation4> .
<interpretation4> ex:refers_to <objectA> ;
ex:resultIn <ZZ> .
I am trying to use SPARQL to:
1. count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
2. count the number of distinct interpretations over the object
3. count the number of objects not referred to by an interpretation by both person1 and person2
4. have the above counts together with a list of the objects referred to by an interpretation and the people who created the interpretation.
I am having trouble specifically with 1 (and consequently, 3), as I cannot find a way to count the intersection of the interpreted objects.
My current SPARQL query which does not obtain what I want:
PREFIX ex: <http://example.org/>
SELECT ?person (COUNT(distinct ?object) as ?c_object) (group_concat(distinct ?interpretation;separator="; ") as ?interpretations)
WHERE {
?person ex:has_interpretation ?interpretation .
?interpretation
ex:refers_to ?object ;
ex:resultIn ?result .
FILTER (?person = <http://example.com/person1> || ?person = <http://example.com/person2> )
}
GROUP BY ?person ?object
What I would like instead is just:
object_uri | number_object | interpretations                     | person_involved
<objectA>  | 1             | <interpretation1>,<interpretation2> | <person1>,<person2>
Any ideas?
1. count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
Here the query matches only those objects (?obj) which have ex:has_interpretation/ex:refers_to paths to them from both <person1> and <person2>.
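If only the number is needed rather than the list of objects, the same pattern can be wrapped in an aggregate. A minimal sketch, where ?n_shared is just an illustrative variable name:

```sparql
PREFIX ex: <http://example.org/>

# number of distinct objects reached from both person1 and person2
SELECT (COUNT(DISTINCT ?obj) AS ?n_shared)
WHERE {
  <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
  <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
```

DISTINCT matters here because one person may reach the same object through more than one interpretation.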
2. count the number of distinct interpretations over the object
Just use the COUNT() function as the paths used in the answer for 1. above are distinct (different):
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
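The query shown for this step selects the shared objects but does not apply COUNT() yet. One possible way to count the distinct interpretations per shared object, and to collect the interpretations and people as asked for in the question, is sketched below. The variable names ?n_interpretations, ?interpretations and ?persons are illustrative, and wrapping the IRIs in STR() is an assumption made so that GROUP_CONCAT receives plain strings.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?obj
       (COUNT(DISTINCT ?interp) AS ?n_interpretations)
       (GROUP_CONCAT(DISTINCT STR(?interp); SEPARATOR=",") AS ?interpretations)
       (GROUP_CONCAT(DISTINCT STR(?person); SEPARATOR=",") AS ?persons)
WHERE {
  # restrict to the two people of interest
  VALUES ?person { <http://example.com/person1> <http://example.com/person2> }
  ?person ex:has_interpretation ?interp .
  ?interp ex:refers_to ?obj .
  # keep only objects that are reached from both people
  <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
  <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
GROUP BY ?obj
```

Grouping by ?obj alone, rather than by ?person and ?obj as in the question's query, is what puts both people on the same result row.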
3. count the number of objects not referred to by an interpretation by both person1 and person2
You might be able to find a fancy way to query for this but I would just count distinct objects:
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
?x ex:refers_to ?obj .
}
...and then subtract the results from 1. above. This is easy to understand and you've already got the results from 1. to work with.
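For anyone who prefers to do the subtraction inside the query, FILTER NOT EXISTS can express it directly. This is only a sketch of that idea, with ?n_not_shared as an illustrative variable name:

```sparql
PREFIX ex: <http://example.org/>

# objects that have some interpretation, minus those reached from both people
SELECT (COUNT(DISTINCT ?obj) AS ?n_not_shared)
WHERE {
  ?x ex:refers_to ?obj .
  FILTER NOT EXISTS {
    <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
    <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
  }
}
```

The two-query approach above is arguably easier to read; this version mainly saves a round trip.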

SPARQL Query DBpedia Getting Multiple Values

I am new to DBpedia SPARQL queries and am currently using http://dbpedia.org/snorql to test them.
My query looks like this:
SELECT ?name ?school ?person
WHERE {
?person dbo:almaMater :Harvard_University .
?person foaf:name ?name .
?person dbo:birthDate ?birth .
?person dbo:country ?country .
?person dbo:almaMater ?school .
FILTER (?birth > "1980-01-01"^^xsd:date) .
} ORDER BY ?name
And below is my result
From the result above, it looks like the name "Mingze Xi"@en is repeated three times. I checked the link in the person field, and it shows that "Mingze Xi"@en attended Harvard University, Hangzhou ... and Zhejiang University.
Is there a way to query so that the result shows, under this one name, all the schools the person attended? I need this because there is no unique ID that I can use to indicate that this is the same person.
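One common way to get a single row per person is to group by ?person and fold the schools into one value with GROUP_CONCAT. The following is a sketch under a few assumptions: dbr: stands in for the default `:` prefix that snorql predefines, the prefixes are declared explicitly so the query is self-contained, and ?a_name and ?schools are illustrative variable names.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
       (SAMPLE(?name) AS ?a_name)   # pick one name if there are several
       (GROUP_CONCAT(DISTINCT STR(?school); SEPARATOR=", ") AS ?schools)
WHERE {
  ?person dbo:almaMater dbr:Harvard_University ;
          foaf:name ?name ;
          dbo:birthDate ?birth ;
          dbo:country ?country ;
          dbo:almaMater ?school .
  FILTER (?birth > "1980-01-01"^^xsd:date)
}
GROUP BY ?person
ORDER BY ?a_name
```

The ?person IRI itself already acts as the unique identifier, so grouping on it keeps different people with the same name apart.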

how to remove duplicates in sparql query

I wrote this query, which returns a list of couples that meet a particular condition (run on http://live.dbpedia.org/sparql):
SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
where {
?film dbo:starring ?actor .
?actor dbo:spouse ?person2.
?film dbo:starring ?person2.
}
order by ?actor
}
FILTER (?cnt >9)
}
The problem is that some rows are duplicates.
Example:
http://dbpedia.org/resource/George_Burns http://dbpedia.org/resource/Gracie_Allen 12
http://dbpedia.org/resource/Gracie_Allen http://dbpedia.org/resource/George_Burns 12
How can I remove these duplicates?
I added gender to ?actor, but that damaged the current result.
Natan Cox's answer shows the typical way to exclude this kind of pseudo-duplicate. The results aren't actually duplicates, because in one, e.g., George Burns is the ?actor, and in the other he is the ?person2. In many cases, you can add a filter to require that the two things are ordered, and that will remove the duplicate cases. E.g., when you have data like:
:a :likes :b .
:a :likes :c .
and you search for
select ?x ?y where {
:a :likes ?x, ?y .
}
you can add filter(?x < ?y) to enforce an ordering between ?x and ?y, which will remove these pseudo-duplicates. However, in this case, it's a bit trickier, since ?actor and ?person2 aren't found using the same criteria. If DBpedia contains
:PersonB dbo:spouse :PersonA
but not
:PersonA dbo:spouse :PersonB
then the simple filter won't work, because you'll never find the triple where the subject PersonA is less than the object PersonB. So in this case, you also need to modify your query a bit to make the criteria symmetric:
select distinct ?actor ?spouse (count(?film) as ?count) {
?film dbo:starring ?actor, ?spouse .
?actor dbo:spouse|^dbo:spouse ?spouse .
filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(?film) > 9)
order by ?actor
(This query also shows that you don't need a subquery here, you can use having to "filter" on aggregate values.) But the important part is using the property path dbo:spouse|^dbo:spouse to find a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor. This makes the relationship symmetric, so that you're guaranteed to get all the pairs, even if the relationship is only declared in one direction.
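Two details may be worth hedging against when running this outside DBpedia's Virtuoso endpoint. First, the SPARQL 1.1 operator table defines < for numbers, strings, booleans and dateTimes, so comparing two IRIs directly relies on an engine extension; comparing their STR() forms is a more portable sketch of the same idea. Second, if the spouse relation happens to be declared in both directions, the alternative path can match twice, so counting distinct films avoids double counting. The dbo: prefix is declared here only to make the sketch self-contained.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?actor ?spouse (COUNT(DISTINCT ?film) AS ?count)
WHERE {
  ?film dbo:starring ?actor, ?spouse .
  # symmetric: matches the spouse link in either direction
  ?actor dbo:spouse|^dbo:spouse ?spouse .
  # compare the IRIs as strings so each pair is kept in one order only
  FILTER (STR(?actor) < STR(?spouse))
}
GROUP BY ?actor ?spouse
HAVING (COUNT(DISTINCT ?film) > 9)
ORDER BY ?actor
```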
They are not actual duplicates, of course, since you can look at each pair both ways. If you want to fix it, add a filter. It is a bit of a dirty hack, but it keeps only one of the 2 rows that are the "same".
SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
where {
?film dbo:starring ?actor .
?actor dbo:spouse ?person2.
?film dbo:starring ?person2.
FILTER (?actor < ?person2)
}
order by ?actor
}
FILTER (?cnt >9)
}

SPARQL: selecting people by country

I am trying to select all people born in a specific country (e.g. Portugal) from DBPedia.
I could use this query:
SELECT DISTINCT ?person
WHERE {
?person dbpedia-owl:birthPlace dbpedia:Portugal.
}
But the problem is that not all people have dbpedia:Portugal as birthPlace. About 30% of people have just a town name as birthPlace, e.g.
dbpedia:Lisbon
I could add all Portuguese cities in a FILTER clause, but it's a big list.
Is it possible to infer Portugal from Lisbon in the SPARQL query somehow, so that I get all persons without listing every Portuguese city in the FILTER?
If we assume all the cities in a specific country are defined as part of that country in dbpedia, you could have a query that first looks for the people that have dbpedia:Portugal as a country and then cities within dbpedia:Portugal.
SELECT DISTINCT ?person
WHERE {
?person a dbpedia-owl:Person.
Optional{
?person dbpedia-owl:birthPlace ?country.
}
Optional{
?person dbpedia-owl:birthPlace ?place.
?place dbpedia-owl:country ?country
}
filter(?country= dbpedia:Portugal)
}
The query that you have written identifies 1723 distinct URIs, and this finds 2563 URIs.
Artemis' answer works, but it's very verbose for what's a pretty simple query. It can be simplified to:
select distinct ?person where {
?person a dbpedia-owl:Person ;
dbpedia-owl:birthPlace/dbpedia-owl:country? dbpedia:Portugal
}
SPARQL results (2449)
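For readers less used to property paths: dbpedia-owl:birthPlace/dbpedia-owl:country? means "follow birthPlace, then optionally follow country". A sketch of the same query written out with UNION is given below; the two prefixes are declared explicitly only to make it self-contained.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia:     <http://dbpedia.org/resource/>

SELECT DISTINCT ?person WHERE {
  ?person a dbpedia-owl:Person .
  {
    # zero country steps: the birth place is Portugal itself
    ?person dbpedia-owl:birthPlace dbpedia:Portugal .
  }
  UNION
  {
    # one country step: the birth place is located in Portugal
    ?person dbpedia-owl:birthPlace ?place .
    ?place dbpedia-owl:country dbpedia:Portugal .
  }
}
```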
Fuller results (2730 persons) can be obtained with the query from http://answers.semanticweb.com/questions/22450/sparql-selecting-people-by-country:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?person
WHERE
{
{?person a <http://dbpedia.org/ontology/Person>;
<http://dbpedia.org/ontology/birthPlace> ?place.
?place <http://dbpedia.org/ontology/country> ?birthCountry.
?birthCountry a <http://dbpedia.org/ontology/Country>.
FILTER (?birthCountry = dbpedia:Portugal).
}
UNION
{ ?person a <http://dbpedia.org/ontology/Person>;
<http://dbpedia.org/ontology/birthPlace> ?birthCountry.
?birthCountry a <http://dbpedia.org/ontology/Country>.
FILTER (?birthCountry = dbpedia:Portugal).
}
}
GROUP BY ?person
ORDER BY ?person

DBPedia SPARQL OPTIONAL

I am running this simple query on http://dbpedia.org/sparql
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
LIMIT 10
I expect to get the first 10 people from http://dbpedia.org/resource/Category:British_journalists whether they have name and dateOfBirth or not. Instead, I am getting the first 10 people who have both properties. For example, there are 2 people missing after Andrew Rothstein.
What am I doing wrong?
The problem with your query is that a) LIMIT limits the number of rows returned, not subjects, and b) some people have more than one name.
So, for example, the journalist Daniel Singer (http://dbpedia.org/resource/Daniel_Singer_(journalist)) has at least two names: "Daniel Singer"@en and "Singer, Daniel"@en. That doubles the number of rows with him as a subject.
If you GROUP BY ?person you ensure only one row per person. You then need to SAMPLE names and dobs to pick just one per person.
PREFIX p: <http://dbpedia.org/property/>
SELECT ?person (SAMPLE(?name) as ?aname) (SAMPLE(?dob) as ?adob) WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
GROUP BY ?person
LIMIT 10
(I'm not sure how well SAMPLE plays with unbound cases, i.e. where there isn't a name)
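If every name and date of birth should be kept rather than sampled, another option is to apply the LIMIT to the people first and attach the OPTIONAL parts afterwards. This is a sketch of that idea; the dcterms: and category: declarations are an assumption that mirrors the prefixes DBpedia's endpoint predefines, written out so the query is self-contained.

```sparql
PREFIX p:        <http://dbpedia.org/property/>
PREFIX dcterms:  <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>

SELECT ?person ?name ?dob WHERE {
  {
    # the limit applies to subjects here, not to result rows
    SELECT DISTINCT ?person WHERE {
      ?person dcterms:subject category:British_journalists .
    }
    ORDER BY ?person
    LIMIT 10
  }
  OPTIONAL { ?person p:name ?name }
  OPTIONAL { ?person p:dateOfBirth ?dob }
}
```

The outer query may still return more than 10 rows, since a person with two names appears twice, but all rows belong to the same 10 people.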
As per O.R Mapper's comment: removing the LIMIT returns the full result set, which reveals that OPTIONAL is in fact working.
Thank you.