how to remove duplicates in sparql query - sparql

I wrote this query and return list of couples and particular condition. ( in http://live.dbpedia.org/sparql)
SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
where {
?film dbo:starring ?actor .
?actor dbo:spouse ?person2.
?film dbo:starring ?person2.
}
order by ?actor
}
FILTER (?cnt >9)
}
Problem is that some rows is duplicate.
example:
http://dbpedia.org/resource/George_Burns http://dbpedia.org/resource/Gracie_Allen 12
http://dbpedia.org/resource/Gracie_Allen http://dbpedia.org/resource/George_Burns 12
how to remove these duplications?
I added gender to ?actor but it damage current result.

Natan Cox's answer shows the typical way to exclude these kind of pseudo-duplicates. The results aren't actually duplicates, because in one, e.g., George Burns is the ?actor, and in the other he is the ?person2. In many cases, you can add a filter to require that the two things are ordered, and that will remove the duplicate cases. E.g., when you have data like:
:a :likes :b .
:a :likes :c .
and you search for
select ?x ?y where {
:a :likes ?x, ?y .
}
you can add filter(?x < ?y) to enforce an ordering between the between ?x and ?y which will remove these pseudo-duplicates. However, in this case, it's a bit trickier, since ?actor and ?person2 aren't found using the same critera. If DBpedia contains
:PersonB dbo:spouse :PersonA
but not
:PersonA dbo:spouse :PersonB
then the simple filter won't work, because you'll never find the triple where the subject PersonA is less than the object PersonB. So in this case, you also need to modify your query a bit to make the criteria symmetric:
select distinct ?actor ?spouse (count(?film) as ?count) {
?film dbo:starring ?actor, ?spouse .
?actor dbo:spouse|^dbo:spouse ?spouse .
filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(?film) > 9)
order by ?actor
(This query also shows that you don't need a subquery here, you can use having to "filter" on aggregate values.) But the important part is using the property path dbo:spouse|^dbo:spouse to find a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor. This makes the relationship symmetric, so that you're guaranteed to get all the pairs, even if the relationship is only declared in one direction.

It is not actual duplicates of course since you can look at it from both ways. The way to fix it if you want to is to add a filter. It is a bit of a dirty hack but it only takes on of the 2 rows that are the "same".
SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
select DISTINCT ?actor ?person2 (count (?film) as ?cnt)
where {
?film dbo:starring ?actor .
?actor dbo:spouse ?person2.
?film dbo:starring ?person2.
FILTER (?actor < ?person2)
}
order by ?actor
}
FILTER (?cnt >9)
}

Related

Type of returned value in SPARQL Query

Is it possible to know the type of the return values in a SPARQL query?
For example, is there a function to define the type of ?x ?price ?p
in the following query?
SELECT DISTINCT ?x ?price ?p
WHERE {
?x a :Product .
?x :price ?price .
?x ?p ?o .
}
I want to know that
typeOf(x) = resource
typeOf(?p) = property
typeOf(?price) = property target etc.
datatype(?x)
The datatype function will tell you whether a result is a resource or a literal and in the latter case tell you which datatype it has exactly.
For example, the following query on https://dbpedia.org/sparql...
SELECT DISTINCT ?x ?code ?p datatype(?x) datatype(?code) datatype(?p)
WHERE {
?x a dbo:City.
?x dbo:areaCode ?code .
?x ?p ?o .
} limit 1
...will return:
x
code
p
callret-3
callret-4
callret-5
http://dbpedia.org/resource/Aconchi
"+52 623"
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2001/XMLSchema#anyURI
http://www.w3.org/2001/XMLSchema#string
http://www.w3.org/2001/XMLSchema#anyURI
However this will not differentiate between "resource" and "property" because a resource may be a property. What you probably mean is "individual" and "property" but even a property can be treated as an individual, for example in the triple rdfs:label rdfs:label "label".
However you can always query the rdf:type of a resource, which may give you rdf:Property, owl:DatatypeProperty or owl:ObjectProperty.

i want to get the names of similar types using sparql queries from dbpedia

I need to find the names of similar types from DBpedia so I'm trying to figure out a query which can return me the names of entities which have same subject type in its dct:subject (example I want to find similar types of white house so i want to write a query for same . I'm considering the dct:subject to find them ). If there is any other approach please mention it
Previously I tried it for rdf:type but the result are not so good and some time it shows time out
I have done my problem by the query mentioned below and now i want to consider dct:subject instead of rdf:type
select distinct ?label ?resource count(distinct ?type) as ?score where {
values ?type { dbo:Thing dbo:Organization yago:WikicatIslam-relatedControversies yago:WikicatIslamistGroups yago:WikicatRussianFederalSecurityServiceDesignatedTerroristOrganizations yago:Abstraction100002137 yago:Act100030358 yago:Cabal108241798 yago:Group100031264 yago:Movement108464601 yago:PoliticalMovement108472335
}
?resource rdfs:label ?label ;
foaf:name ?name ;
a ?type .
FILTER (lang(?label) = 'en').
}
ORDER BY DESC(?score)

Sparql MULTIPLE not exists condition

I'm executing a Sparql query that returns all the URIs where the keyword apple does not belong to a specific subclass Species
select distinct ?s
where
{
?s a owl:Thing . ?s rdfs:label ?label .
filter(langmatches(lang(?label), 'en')) ?label bif:contains '"apple"' .
filter not exists {?s rdf:type/rdfs:subClassOf* dbo:Species }
}
I want to include more subclasses. I want to include MANY subclasses, so I want filter out like so:
filter not exists {?s rdf:type/rdfs:subClassOf* dbo:Species AND filter not exists {?s rdf:type/rdfs:subClassOf* dbo:Organisation AND filter not exists {?s rdf:type/rdfs:subClassOf* dbo:SomeOtherSubclass
How do I chain MULTIPLE ANDs together?
You can do this:
FILTER NOT EXISTS {
VALUES ?clazz { dbo:Species dbo:Organisation dbo:SomeOtherSubclass }
?s rdf:type/rdfs:subClassOf* ?clazz.
}
No guarantees on how well this performs though.

Retrieving data from blank nodes in Wikidata

I am attempting to retrieve data about the lifespans of certain people. This is problematic in cases of people that have lived a while ago. The dataset for e.g. Pythagoras seems to have a so called "blank node" for date of birth (P569). But this blank node references another node earliest date (P1319) which has data I could work with just fine.
But for some reason I am not able to retrieve that node. My first try looked like this, but somehow that results in a completly empty result set:
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
?person wdt:P31 wd:Q5. # This thing is Human
?person rdfs:label ?name. # Name for better conformation
?person wdt:P569 ?dateofbirth. # Birthday may result in a blank node
?dateofbirth wdt:P1319 ?earliestdateofbirth # Problem: Plausbible Birth
}
I then found another Syntax that suggested using ?person wdt:P569/wdt:P1319 ?earliestdateofbirth as some kind of "shortcut"-syntax for the explicit navigation I did above but this also ends with a empty result set.
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
?person wdt:P31 wd:Q5. # Is Human
?person rdfs:label ?name. # Name for better conformation
?person wdt:P569/wdt:P1319 ?earliestdateofbirth.
}
So how do I access a node referenced by a blank node (in my case specifically the earliest birthdate) in Wikidata?
But this blank node references another node…
Things are slightly different. The earliest date property is not a property of _:t550690019, but rather is a property of the statement wd:Q10261 wdt:P569 _:t550690019.
In the Wikidata data model, these annotations are expressed using qualifiers.
Your query should be:
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
VALUES (?person) {(wd:Q10261)}
?person wdt:P31 wd:Q5. # --Is human
?person rdfs:label ?name. # --Name for better conformation
?person p:P569/pq:P1319 ?earliestdateofbirth.
FILTER (lang(?name) = "en")
}
Try it!
By the way, time precision (which is used when date of birth is known) is yet another qualifier:
SELECT ?person ?personLabel ?value ?precisionLabel {
VALUES (?person) {(wd:Q859) (wd:Q9235)}
?person wdt:P31 wd:Q5 ;
p:P569/psv:P569 [ wikibase:timeValue ?value ;
wikibase:timePrecision ?precisionInteger ]
{
SELECT ?precision (xsd:integer(?precisionDecimal) AS ?precisionInteger) {
?precision wdt:P2803 ?precisionDecimal .
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Try it!

Querying DBPedia using SPARQL to find subjects with same objects

I am trying to find all those resources from dbpedia for eg rdf:type person who have same object eg date of birth.?
I thought of doing it with subquery but its definitely not the solution.
Can anyone provide some useful pointer?
From what you describe I think you mean:
prefix dbp: <http://dbpedia.org/property/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?s1 ?s2 ?dob
where {
?s1 a foaf:Person ; dbp:birthDate ?dob . # Find a person, get their dob
?s2 a foaf:Person ; dbp:birthDate ?dob . # Find a person with the same dob
}
Adjust type and predicate to suit.
This will include some redundancy: you will find answers where the subjects are the same ('Napoleon' 'Napoleon') and get answers twice ('Daniel Dennett' 'Neil Kinnock', 'Neil Kinnock' 'Daniel Dennett'). You can remove that with a filter:
filter (?s1 < ?s2)
which just ensures that one comes before the other (however the query engine wants to do that).
prefix dbp: <http://dbpedia.org/property/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?s1 ?s2 ?dob
where {
?s1 a foaf:Person ; dbp:birthDate ?dob .
?s2 a foaf:Person ; dbp:birthDate ?dob .
filter (?s1 < ?s2)
}
See the result
A SPARQL query is basically a set of triple patterns, i.e., a join (logical AND) of queries of the form
?subject ?predicate ?object.
What you need is identical ?object. Considering that you only care about ?subject (?predicate is not of importance), you can perform such a query you by ordering the results depending on ?object. Thus you will see results sharing ?object together.
select ?s ?p ?o where {
?s ?p ?o.
}
order by ?o
If you care about ?predicate as well, you should order the result using it second.
select ?s ?p ?o where {
?s ?p ?o.
}
order by ?o ?p
As those couple of queries may involve too many results as they will retrieve all the results possible. I recommend filtering ?object depending on some specific criteria. For example, to select all ?subject sharing an instance of Person as their ?object, use:
select ?s where {
?s ?p ?o.
{select ?o where{
?o a <http://dbpedia.org/ontology/Person>}
}
}
An alternative solution to the others is using aggregate functions like in this query template
select ?o (count(distinct ?s) as ?cnt) (group_concat(distinct ?s; separator=";") as ?subjects) {
?s a <CLASS> ;
<PREDICATE> ?o .
}
group by ?o
order by desc(count(distinct ?s))
which returns for each object the number of subjects and the list of subject belonging to a class CLASS for a given predicate PREDICATE
For example, asking for the dates of soccer players one could use
prefix dbo: <http://dbpedia.org/ontology/>
select ?date (count(distinct ?s) as ?cnt) (group_concat(distinct ?s; separator=";") as ?subjects) {
?s a dbo:SoccerPlayer ;
dbo:birthDate ?date .
}
group by ?date
order by desc(count(distinct ?s))
select * where {
?person1 a <http://dbpedia.org/ontology/Person>.
?person1 dbo:birthYear ?date.
?person2 a <http://dbpedia.org/ontology/Person>.
?person2 dbo:birthYear ?date
FILTER (?person1 != ?person2)
}
limit 10
Dbpedia will not allow you to execute that query on its public endpoint because it consumes more time that allowed, and you cannot change that time. Nevertheless, there are ways to execute it