Restricting DISTINCT in SPARQL

I want to get the name, surname and place of birth from a certain graph. I get results like "John" "Doe" "London" and "John" "Doe" "UK". Is there a way to apply DISTINCT to only two of the three variables (name and surname, but not birthplace)?

You haven't shown any example data or the queries you've tried, which makes this question harder to answer; please include those details in future.
One approach is to use GROUP BY instead of DISTINCT, e.g.:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?surname (SAMPLE(?loc) AS ?location)
WHERE
{
?person foaf:givenName ?name ;
foaf:familyName ?surname ;
foaf:based_near ?loc .
}
GROUP BY ?name ?surname
This query groups your results together by ?name and ?surname and selects one of the possible locations for each group of results.
The SAMPLE() aggregate used here basically asks the query engine to select one possible value for a non-group key variable from the grouped results.
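To make the grouping step concrete, here is a minimal Python sketch of what GROUP BY plus SAMPLE() amounts to. It is only a model of the semantics, not how any engine implements it, and the rows (including "Jane Roe") are invented for illustration.

```python
from collections import defaultdict

# Hypothetical solution rows (name, surname, location), as a plain SELECT would return them.
rows = [
    ("John", "Doe", "London"),
    ("John", "Doe", "UK"),
    ("Jane", "Roe", "Paris"),
]

# GROUP BY ?name ?surname: collect the locations seen for each key.
groups = defaultdict(list)
for name, surname, loc in rows:
    groups[(name, surname)].append(loc)

# SAMPLE(?loc): any one value from the group is a valid answer.
# Taking the first one is this sketch's choice, not something SPARQL promises.
result = [(name, surname, locs[0]) for (name, surname), locs in groups.items()]

for row in result:
    print(row)
```

Each (name, surname) pair now appears once. Which location you get back for John Doe is up to the engine, so don't rely on it being stable between runs or stores.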

Related

SPARQL group by and order by: not ordered

This follows up on a query that uses the schema.org vocabulary to find the number of children of a class, as a simpler dataset than my application. I want to get the names of the children concatenated in alphabetical order. The query:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} group by ?child order by asc(?string)
limit 20
gives
schema:PublicationEvent "OnDemandEvent BroadcastEvent"
schema:UserInteraction "UserPageVisits UserComments UserPlays UserBlocks UserDownloads UserPlusOnes UserLikes UserCheckins UserTweets"
which is not alphabetically ordered. If I change the sort order to desc, the result is exactly the same. I do not seem to understand how group by, order by and possibly bind interact.
An additional select subquery is required to push the order inside the groups:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
select *
{
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} order by asc(?string)
} group by ?child
limit 20
Note, however, what the SPARQL 1.1 Query specification says in section 18.5.1.7 GroupConcat:
The order of the strings is not specified.
From the horse's mouth:
On 2011-04-22, at 19:01, Steve Harris wrote:
On 2011-04-22, at 06:18, Jeen Broekstra wrote:
However, looking at the SPARQL 1.1 query spec, I think this is not a guaranteed result: as far as I can tell the solution modifier ORDER BY should be applied to the solution sequence after grouping and aggregation, so it can not influence the order of the input for the GROUP_CONCAT.
That's correct.
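The difference between the two queries can be sketched in plain Python. This is only a model under stated assumptions: the bindings are a made-up subset, and the `group_concat` helper here happens to preserve input order, which is exactly the behaviour the spec does not guarantee.

```python
from collections import defaultdict

# Hypothetical (child, string) bindings, in the arbitrary order the pattern matched them.
bindings = [
    ("PublicationEvent", "OnDemandEvent"),
    ("UserInteraction", "UserPlays"),
    ("PublicationEvent", "BroadcastEvent"),
    ("UserInteraction", "UserComments"),
]

def group_concat(rows):
    """Group by child and join the strings in the order the rows arrive."""
    groups = defaultdict(list)
    for child, string in rows:
        groups[child].append(string)
    return {child: " ".join(strings) for child, strings in groups.items()}

# Outer ORDER BY: aggregation has already happened, so only whole rows
# (one per child) can be reordered; the text inside each row is untouched.
unsorted_groups = group_concat(bindings)
outer_sorted = sorted(unsorted_groups.items())

# Subquery ORDER BY: rows are sorted first, then fed to the aggregate.
# This sketch's aggregate keeps input order; a real engine is not required to.
inner_sorted = group_concat(sorted(bindings, key=lambda row: row[1]))

print(outer_sorted)
print(inner_sorted)
```

So the subquery tends to work in practice, but if you need a guaranteed order it is safer to sort the concatenated values on the client side.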

DBpedia SPARQL to eliminate unwanted data

PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?attractions
?location
WHERE
{ ?attractions dcterms:subject ?places
. ?places skos:broader ?border
. ?attractions dbpprop:location|dbpedia-owl:locatedInArea|dbpprop:locale ?location
. FILTER( ?border = category:Visitor_attractions_in_Delhi )
}
The query above returns attractions in Delhi together with their locations. I need to make it generic for all places and, secondly, to filter out unwanted data. I want only actual attractions; for example, I don't want entries such as List of Monuments or SelectCityWalk in my output.

DBPedia SPARQL OPTIONAL

I am running this simple query on http://dbpedia.org/sparql
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
LIMIT 10
I expect to get the first 10 people from http://dbpedia.org/resource/Category:British_journalists whether or not they have a name and dateOfBirth. Instead, I am getting the first 10 people who have both properties. For example, two people are missing after Andrew Rothstein.
What am I doing wrong?
The problem with your query is that a) LIMIT limits the number of rows returned, not subjects, and b) some people have more than one name.
So, for example, the journalist Daniel Singer (http://dbpedia.org/resource/Daniel_Singer_(journalist)) has at least two names: "Daniel Singer"@en and "Singer, Daniel"@en. That doubles the number of rows with him as a subject.
If you GROUP BY ?person you ensure only one row per person. You then need to SAMPLE names and dobs to pick just one of each per person.
PREFIX p: <http://dbpedia.org/property/>
SELECT ?person (SAMPLE(?name) as ?aname) (SAMPLE(?dob) as ?adob) WHERE {
?person dcterms:subject category:British_journalists .
OPTIONAL { ?person p:name ?name } .
OPTIONAL { ?person p:dateOfBirth ?dob } .
}
GROUP BY ?person
LIMIT 10
(I'm not sure how well SAMPLE plays with unbound cases, i.e. where there isn't a name)
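Here is a rough Python model of why LIMIT appears to "lose" people. The people, names and date are placeholders I made up, and keeping the first row per person is a simplification of SAMPLE, which picks each variable independently.

```python
# Hypothetical rows after the OPTIONAL joins: (person, name, dob).
# None stands for an unbound variable.
rows = [
    ("A", "Alice Example", None),
    ("B", "Bob Example", "1950-01-01"),
    ("B", "Example, Bob", "1950-01-01"),
    ("C", None, None),
]

LIMIT = 3

# LIMIT without grouping: the cut is made on rows, so B uses two of the three slots.
limited_rows = rows[:LIMIT]
people_seen = {person for person, _name, _dob in limited_rows}

# GROUP BY ?person: one row per person first, then LIMIT.
grouped = {}
for person, name, dob in rows:
    grouped.setdefault(person, (person, name, dob))  # keep the first row seen per person
limited_people = list(grouped.values())[:LIMIT]

print(sorted(people_seen))
print(limited_people)
```

Without grouping, three rows cover only two people; with grouping, the same limit covers three, including C, who has neither a name nor a date of birth.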
As per O.R Mapper's comment, removing the LIMIT returns the full result set, which shows that OPTIONAL is in fact working.
Thank you.

Sparql - Order by to return empty values last

I use AllegroGraph and Sparql 1.1.
I need to sort a column in ascending order and have the SPARQL query return empty values last.
Sample data:
<http://mydomain.com/person1> <http://mydomain.com/name> "John"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
<http://mydomain.com/person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
<http://mydomain.com/person2> <http://mydomain.com/name> "Abraham"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
<http://mydomain.com/person2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
<http://mydomain.com/person3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>
Here I need the query to return Abraham, followed by John, and then person3, which does not have a name attribute.
Query I use:
select ?name ?person {
?person <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person>.
optional {?person <http://mydomain.com/name> ?name.}
} order by asc(?name )
Current output is person3 (null), followed by Abraham and John.
Please let me know your thoughts.
I don't have AllegroGraph at hand but AFAIK it supports multiple order conditions:
select ?name ?person {
?person <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mydomain.com/person> .
optional {?person <http://mydomain.com/name> ?name . }
} order by (!bound(?name)) asc(str(?name))
The first condition sorts on whether ?name is bound: !bound(?name) is false for rows that have a name, and false sorts before true, so those rows come first. When the first condition does not distinguish two rows, the second condition is used. Note the use of str() to convert rdf:XMLLiteral to a datatype for which comparison is supported.
(You may also want to add a . at the end of each row in your N-Triples data.)
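The same two-level ordering can be sketched in Python with a tuple sort key. This only illustrates the idea; the `solutions` list and the `sort_key` helper are made up for the example, with None standing in for an unbound ?name.

```python
# Hypothetical (name, person) solutions; None stands for an unbound ?name.
solutions = [
    (None, "person3"),
    ("John", "person1"),
    ("Abraham", "person2"),
]

def sort_key(solution):
    name, _person = solution
    # First component mirrors !bound(?name): False (bound) sorts before True (unbound).
    # Second component mirrors str(?name); "" is a placeholder so None is never compared.
    return (name is None, name or "")

ordered = sorted(solutions, key=sort_key)
print([person for _name, person in ordered])
```

The boolean does the "empty values last" part and the string does the alphabetical part, which is the same division of labour as in the ORDER BY clause.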

How to know if a string is a proper name of person or a place name using DBpedia?

I am using SPARQL queries on DBpedia in a Prolog project and I have a question. I would like to know whether a word is most probably the name of a person (something like John or Mario) or of a place (like a city: Rome, London, New York).
I have implemented the following two queries: the first gives me the number of persons having a specific name, and the second gives me the number of places having a specific name.
1) Query for a PERSON NAME:
select COUNT(?person) where {
?person a dbpedia-owl:Person .
{ ?person foaf:givenName "John"@en }
UNION
{ ?person foaf:surname "John"@en }
}
For the name John, I obtain the following output: callret-0: 7313, so I think that it has found 7313 instances for the proper name John. Is it right?
2) Query for a PLACE NAME:
select COUNT(?place) where {
?place a dbpedia-owl:Place .
{ ?x rdfs:label "John"@en }
}
The problem is that, as you can see in the previous “place” query, I have inserted John as parameter, which is not a place name but a proper name of persons, but I obtain the following strange result: callret-0: 81900104
Comparing the output of the two queries, it seems that John is a place name and not a person name! This defeats my purpose; I have tried other personal names, and the place query always gives a bigger count than the name query.
Why? What am I missing? Are there some errors in my queries? How can I solve it to have a correct result?
Actually, when I run the query you provided:
select COUNT(?place) where {
?place a dbpedia-owl:Place .
{ ?x rdfs:label "John"@en }
}
I get the result 93027312, not 81900104, but that does not really matter much. The strange results arise because ?x and ?place don't have to be bound to the same thing, so you are getting all the dbpedia-owl:Places and counting them, but the number of result rows is the number of dbpedia-owl:Places multiplied by the number of things with rdfs:label "John"@en:
select COUNT(?place) where { ?place a dbpedia-owl:Place }
=> 646023
select COUNT(?x) where { ?x rdfs:label "John"@en }
=> 144
646023 × 144 = 93027312
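To see why the counts multiply, here is a tiny Python model of the two patterns. The three places and two "John" resources are invented stand-ins for the real result sets; only the figures 646023 and 144 come from the queries above.

```python
from itertools import product

# Tiny hypothetical stand-ins for the two result sets.
places = ["Rome", "London", "New_York"]   # matches for: ?place a dbpedia-owl:Place
johns = ["John_1", "John_2"]              # matches for: ?x rdfs:label "John"@en

# The two patterns share no variable, so every ?place pairs with every ?x.
cross_join = list(product(places, johns))
print(len(cross_join))

# Using the same variable in both patterns turns the product into a join:
# only resources that are both a place and labelled "John" survive.
joined = [place for place in places if place in set(johns)]
print(len(joined))

# Same arithmetic with the figures reported above.
reported_total = 646023 * 144
print(reported_total)
```

The unconnected patterns give 3 × 2 = 6 rows here, while sharing the variable gives none, which mirrors the behaviour of the corrected query.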
If you actually ask for dbpedia-owl:Places that have the rdfs:label "John"@en, you'll get no results:
select (COUNT(?place) as ?numPlaces) where {
?place a dbpedia-owl:Place ;
rdfs:label "John"@en .
}
Also, you might consider using dbpprop:name instead of rdfs:label. Some results seem like they are more useful that way. For instance, let us find places called "Springfield". If we ask for places with that name we get no results:
select * where {
?place a dbpedia-owl:Place ;
rdfs:label "Springfield"@en .
}
However, if we modify the query and use dbpprop:name, we get 17 results. Some of these are duplicates though, so you might have to do something else to remove duplicates. The point, though, is that dbpprop:name got some results, and rdfs:label didn't.
select * where {
?place a dbpedia-owl:Place ;
dbpprop:name "Springfield"@en .
}
You can even use dbpprop:name when working with the names of persons, although it's not as useful, because the dbpprop:name value for most persons is their entire name. To find persons with the given name John using dbpprop:name requires a query like:
select * where {
?person a dbpedia-owl:Person ;
dbpprop:name ?name .
FILTER( STRSTARTS( str( ?name ), "John" ) )
}
(or you could use CONTAINS instead of STRSTARTS), but this becomes much more expensive, because it has to select all persons and their names, and then filter through that set. Being able to select persons based on a specific name (e.g., with foaf:givenName) is much more efficient.
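As a loose analogy in Python (the people and the dictionary index are made up, and a real triple store's indexing is more involved), filtering on a string prefix means looking at every name, whereas matching an exact value can go through a lookup structure:

```python
# Hypothetical (id, given name, full name) records.
people = [
    ("p1", "John", "John Example"),
    ("p2", "Mary", "Mary Example"),
    ("p3", "Johnathan", "Johnathan Example"),
]

# FILTER(STRSTARTS(str(?name), "John")): every person's full name is inspected.
by_prefix = [pid for pid, _given, full in people if full.startswith("John")]

# A triple pattern with a constant object behaves more like an index lookup.
given_index = {}
for pid, given, _full in people:
    given_index.setdefault(given, []).append(pid)
by_given_name = given_index.get("John", [])

print(by_prefix)
print(by_given_name)
```

Note that the prefix test also matches "Johnathan Example", so besides being slower it can overcount if you only want people whose given name is exactly John.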