SPARQL group by and order by: not ordered

I am following up on a query where the schema.org dataset is used to find the number of children of a class, as a simpler database than my application's. I want to get the names of the children concatenated in alphabetical order. The query:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} group by ?child order by asc(?string)
limit 20
gives
schema:PublicationEvent "OnDemandEvent BroadcastEvent"
schema:UserInteraction "UserPageVisits UserComments UserPlays UserBlocks UserDownloads UserPlusOnes UserLikes UserCheckins UserTweets"
The names are not in alphabetical order, and if I change the sort order to desc the result is exactly the same. I don't seem to understand how group by, order by and possibly bind interact.

An additional select subquery is required to push the order inside the groups:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
select *
{
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} order by asc(?string)
} group by ?child
limit 20
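A toy Python model of the two queries (a sketch of the evaluation order, not a SPARQL engine) shows why asc and desc gave the same output. The WHERE solutions are grouped and aggregated first, and ORDER BY then sorts the group rows, where ?string is no longer bound. In the subquery version the sort happens before grouping. The model assumes the grouping step keeps input order within each group, which is common in practice but, as the next answer points out, not guaranteed. The sample rows are a made-up subset of the question's data.

```python
# Made-up subset of the (?child, ?string) solutions produced by the WHERE clause.
solutions = [
    ("schema:PublicationEvent", "OnDemandEvent"),
    ("schema:PublicationEvent", "BroadcastEvent"),
    ("schema:UserInteraction", "UserPageVisits"),
    ("schema:UserInteraction", "UserComments"),
    ("schema:UserInteraction", "UserBlocks"),
]


def group_concat(rows):
    """GROUP BY ?child with GROUP_CONCAT(?string), keeping input order inside each group."""
    groups = {}
    for child, string in rows:
        groups.setdefault(child, []).append(string)
    return {child: " ".join(strings) for child, strings in groups.items()}


def order_after_grouping(rows, descending=False):
    """First query: aggregate, then ORDER BY ?string.

    After grouping only ?child and ?strings are bound, so in this model every
    group row has the same (unbound) sort key and the stable sort changes nothing.
    """
    grouped = list(group_concat(rows).items())
    return sorted(grouped, key=lambda item: 0, reverse=descending)


def order_before_grouping(rows):
    """Second query: ORDER BY ?string inside the subquery, GROUP BY outside."""
    return group_concat(sorted(rows, key=lambda row: row[1]))


print(order_after_grouping(solutions) == order_after_grouping(solutions, descending=True))  # True
print(order_before_grouping(solutions)["schema:PublicationEvent"])  # BroadcastEvent OnDemandEvent
```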

From the SPARQL 1.1 Query specification, section 18.5.1.7 GroupConcat:
The order of the strings is not specified.
From the horse's mouth:
On 2011-04-22, at 19:01, Steve Harris wrote:
On 2011-04-22, at 06:18, Jeen Broekstra wrote:
However, looking at the SPARQL 1.1 query spec, I think this is not a guaranteed result: as far as I can tell the solution modifier ORDER BY should be applied to the solution sequence after grouping and aggregation, so it can not influence the order of the input for the GROUP_CONCAT.
That's correct.
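Since the order inside GROUP_CONCAT is unspecified, the subquery trick relies on engine behaviour. If the order must be guaranteed, one option (my assumption, not something the spec prescribes) is to re-sort each concatenated value on the client. A minimal Python sketch, assuming the default single-space separator and values that never contain the separator:

```python
def sort_group_concat(value, separator=" "):
    """Re-sort a GROUP_CONCAT result on the client, since SPARQL does not fix its order.

    Assumes the query used `separator` and that no individual value contains it.
    """
    if not value:
        return value  # an empty group concatenates to the empty string
    return separator.join(sorted(value.split(separator)))


# Row taken from the question's result.
print(sort_group_concat("OnDemandEvent BroadcastEvent"))  # BroadcastEvent OnDemandEvent
```

Python's `sorted` compares strings by code point, so the result may differ from a locale-aware collation.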


SPARQL: get "is dbo:parent of" records from DBpedia

I am trying to get the children of persons. Some persons have their children listed under "dbo:child", while others have them listed as "is dbo:parent of".
Here are examples of the two types:
https://dbpedia.org/page/Elizabeth_II
https://dbpedia.org/page/George_V
For the first one I can pull child records off as follows:
SELECT * WHERE {
<http://dbpedia.org/resource/Elizabeth_II>
dbo:child ?child
}
For the second it's a bit harder, as I think I have to find other records that point to George_V. I'm struggling to find anything that works; this is the best I can come up with:
SELECT *
WHERE
{
?person a dbo:Person ;
dbo:parent [dbo:Person dbr:George_V]
}
limit 10
What's the best way to do this? Is there a way I can combine the two approaches?
Here's an example that solves the problem. Note that you have to look at the data to find the familial properties (relationship types) and then check the quality of that data.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?royal ?ancestor
WHERE {
{
<http://dbpedia.org/resource/Charles,_Prince_of_Wales>
<http://dbpedia.org/property/father>+ ?ancestor .
}
UNION
{
<http://dbpedia.org/resource/Charles,_Prince_of_Wales>
<http://dbpedia.org/property/mother>+ ?ancestor .
}
BIND ( <http://dbpedia.org/resource/Charles,_Prince_of_Wales> as ?royal )
}
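A toy Python model of how a `+` property path is evaluated may help when exploring these familial properties: `p+` is the transitive closure of `p` starting from the subject. The family graph below is made up. Note that a UNION of `father+` and `mother+` only follows purely paternal or purely maternal lines, whereas the alternative path `(dbp:father|dbp:mother)+` would also reach, for example, a father's mother.

```python
from collections import deque

# Made-up edges: (subject, predicate) -> set of objects.
edges = {
    ("child", "father"): {"dad"},
    ("child", "mother"): {"mum"},
    ("dad", "father"): {"paternal_grandfather"},
    ("dad", "mother"): {"paternal_grandmother"},
    ("mum", "father"): {"maternal_grandfather"},
    ("mum", "mother"): {"maternal_grandmother"},
}


def one_or_more(start, predicates):
    """Nodes reachable from `start` in one or more steps, each step using any of `predicates`.

    Models `(p1|p2|...)+`; a single predicate models `p+`.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for predicate in predicates:
            for target in edges.get((node, predicate), set()):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen


union = one_or_more("child", ["father"]) | one_or_more("child", ["mother"])
alternative = one_or_more("child", ["father", "mother"])
print(sorted(alternative - union))  # ['maternal_grandfather', 'paternal_grandmother']
```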
Alternatively, here is another property-path-based example that is closer to your original property preference (i.e., dbo:parent).
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?royal ?ancestor ?ancestorName
WHERE {
<http://dbpedia.org/resource/Elizabeth_II>
dbo:parent* ?ancestor .
?ancestor
foaf:name ?ancestorName .
FILTER (LANG(?ancestorName) = "en")
BIND ( <http://dbpedia.org/resource/Elizabeth_II> as ?royal )
}
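One detail of this second query: `dbo:parent*` matches zero or more steps, so the zero-length path also binds ?ancestor to dbr:Elizabeth_II herself, while `dbo:parent+` would leave her out. A small Python model of the difference, over a made-up parent graph:

```python
# Made-up dbo:parent edges: person -> set of parents.
parents = {
    "person": {"parent_a", "parent_b"},
    "parent_a": {"grandparent"},
}


def one_or_more(start):
    """Models `start dbo:parent+ ?x`: nodes reachable in at least one step."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, set()):
            if parent not in reached:
                reached.add(parent)
                frontier.append(parent)
    return reached


def zero_or_more(start):
    """Models `start dbo:parent* ?x`: the same nodes plus `start` itself (zero-length path)."""
    return one_or_more(start) | {start}


print(sorted(zero_or_more("person") - one_or_more("person")))  # ['person']
```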
We are working on a new DBpedia release that should improve the quality of this kind of information.
Finally, a tweak to some earlier suggestions (posted as comments).
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?royal ?child
WHERE {
dbr:George_V dbo:child+|^dbo:parent+ ?child .
BIND ( dbr:George_V AS ?royal )
}
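In `dbo:child+|^dbo:parent+`, the `^` walks `dbo:parent` backwards (from parent to child) and `|` takes the union of the two alternatives, which covers both data shapes from the question. Because of the `+`, ?child also picks up grandchildren and further descendants; `dbo:child|^dbo:parent` without the `+` matches direct children only. A toy Python model with made-up triples:

```python
# Made-up triples in the two shapes from the question.
triples = {
    ("king", "child", "prince"),         # stated as dbo:child on the parent
    ("princess", "parent", "king"),      # stated as dbo:parent on the child
    ("grandchild", "parent", "princess"),
}


def step(node, predicate, inverse=False):
    """One step along `predicate`, or along `^predicate` when inverse=True."""
    if inverse:
        return {s for (s, p, o) in triples if p == predicate and o == node}
    return {o for (s, p, o) in triples if p == predicate and s == node}


def plus(start, predicate, inverse=False):
    """Models `p+` (or `^p+`): transitive closure, at least one step."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in step(node, predicate, inverse):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


# `dbr:George_V dbo:child+|^dbo:parent+ ?child` is the union of both alternatives.
descendants = plus("king", "child") | plus("king", "parent", inverse=True)
direct = step("king", "child") | step("king", "parent", inverse=True)
print(sorted(descendants))  # ['grandchild', 'prince', 'princess']
print(sorted(direct))  # ['prince', 'princess']
```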

Matching two words together in SPARQL

How can I list the laureate awards (given by their label) for which the description of the contribution (given by nobel:motivation) contains both the word "human" and the word "peace"?
I have used the bds:search predicate from the full-text search feature of Blazegraph.
After reading the question "Free text search in sparql when you have multiword and scaping character", I composed this query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX nobel: <http://data.nobelprize.org/terms/>
SELECT ?awards ?description
WHERE {
?entity rdfs:label ?awards .
?entity nobel:motivation ?description .
FILTER ( bds:search ( ?description, '"human" AND "peace"' ) )
}
Executing this query returns an error.
How can I correct this query and get the desired result?
You may take a look at the specification of this dataset or download an RDF dump of the dataset.
Use bds:search to search for "human", then apply a FILTER with the CONTAINS function for "peace".
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX nobel: <http://data.nobelprize.org/terms/>
SELECT ?awards ?description
WHERE {
?entity rdfs:label ?awards .
?entity nobel:motivation ?description .
?description bds:search "human" .
FILTER (CONTAINS(?description, "peace"))
}
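The pattern here is two-stage: let the full-text index pick candidate descriptions for one word, then use FILTER(CONTAINS(...)) for the second. A Python sketch of that filter over made-up motivations; the whole-word, case-insensitive match standing in for bds:search is an assumption, since Blazegraph's actual tokenisation and scoring differ. CONTAINS itself is case-sensitive, so "Peace" does not match "peace"; writing CONTAINS(LCASE(?description), "peace") in the query is one way around that.

```python
import re

# Made-up (award label, motivation) pairs.
awards = [
    ("Award A", "for work on human rights and peace"),
    ("Award B", "for work on human genetics"),
    ("Award C", "for efforts towards Peace between human communities"),
]


def full_text_match(text, word):
    """Stand-in for `?description bds:search "word"`: whole-word, case-insensitive (assumed)."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def search_then_contains(rows, search_word, contains_word):
    """First stage mimics bds:search; second mimics FILTER(CONTAINS(...)), which is case-sensitive."""
    return [
        label
        for label, description in rows
        if full_text_match(description, search_word) and contains_word in description
    ]


print(search_then_contains(awards, "human", "peace"))  # ['Award A']
```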

OpenLink Virtuoso SPARQL OFFSET and LIMIT behavior

The following SPARQL query returns 20 results. I was expecting 10, given the OFFSET and LIMIT.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?person_id ?person2_id
WHERE {
{
SELECT DISTINCT ?person_id ?person2_id WHERE {
?person rdf:type dbpedia-owl:Person .
?person2 rdf:type dbpedia-owl:Person .
?person ?link ?person2 .
?person dbpedia-owl:wikiPageID ?person_id .
?person2 dbpedia-owl:wikiPageID ?person2_id .
FILTER (?link = dbpedia-owl:wikiPageWikiLink) .
} ORDER BY ?link
}
} OFFSET 10 LIMIT 10
I execute the query on the SPARQL endpoint of an OpenLink Virtuoso server.
What is the problem with the query?
jordipala said:
The clause that makes the query behave weirdly is ORDER BY ?link. Replacing it with ORDER BY ?person_id, everything works as expected. It still makes no sense to me, but I am a SPARQL newbie too.
Part of the issue is that the ?link values aren't string literals, though they may appear to be if you include that variable in the SELECT clause. (Also note that the ?link values are all the same for these solutions, so you definitely need to put something else into the ORDER BY so that it does the job of preventing both duplicate and omitted solutions between pages.)
Also, counterintuitively given the ?link datatype, the numeric-looking ?person_id and ?person2_id values are not typed as numbers; they're strings unless you force their datatype.
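Whether the IDs are strings matters for ORDER BY: plain strings compare character by character, so "10" sorts before "9". A quick Python illustration with made-up ID values; in SPARQL the usual fix is a cast such as ORDER BY xsd:integer(?person_id).

```python
# Made-up wikiPageID values, compared as strings versus as integers.
ids = ["9", "10", "100", "25"]

lexical = sorted(ids)           # string comparison, character by character
numeric = sorted(ids, key=int)  # what ORDER BY xsd:integer(?person_id) aims for

print(lexical)  # ['10', '100', '25', '9']
print(numeric)  # ['9', '10', '25', '100']
```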
If you simply change ?link to str(?link) in the ORDER BY, the query delivers the desired 10 rows (and you may notice that all the ?link values are identical!). If you also include person_id and person2_id in the ORDER BY, you'll get more useful output. In the query I linked, I did this by using the ordinals of the SELECT variables, because of where and how I coerced the datatypes.
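The paging problem in miniature: OFFSET 10 LIMIT 10 should return at most 10 solutions, and consecutive pages only line up when ORDER BY yields a total order. A Python sketch with made-up rows; the "reversed tie order" is an assumption used to simulate an engine returning equally ranked rows in a different order on each request. It does not reproduce the 20-row Virtuoso result itself.

```python
# Made-up (person_id, person2_id) solutions; all share the same ?link value.
rows = [(str(a), str(b)) for a in range(1, 6) for b in range(1, 5)]  # 20 rows


def page(solutions, offset, limit):
    """OFFSET/LIMIT: skip `offset` solutions, then return at most `limit` of them."""
    return solutions[offset:offset + limit]


# ORDER BY ?link: every key is equal, so each request may return ties in any order.
# Simulate two requests whose tie order happened to be opposite.
first_request = page(rows, 0, 10)
second_request = page(rows[::-1], 10, 10)
print(set(first_request) == set(second_request))  # True: "page 2" repeats page 1

# ORDER BY on both ids (cast to integers) is a total order, so pages are disjoint.
ordered = sorted(rows, key=lambda r: (int(r[0]), int(r[1])))
print(set(page(ordered, 0, 10)).isdisjoint(page(ordered, 10, 10)))  # True
```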

SPARQL not grouping my results properly

I have the following SPARQL query to get the list of countries with the lowest population density (per km²) and their presidents (leaders):
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?populationdensity ?leader
WHERE {
?country a dbpedia-owl:Country ;
rdfs:label ?country_name ;
prop:populationDensityKm ?populationdensity ;
dbpedia-owl:leader ?leader .
FILTER (?populationdensity < 10 && langMatches(lang(?country_name), "en")) .
}
GROUP BY ?populationdensity
ORDER BY ASC(?populationdensity)
limit 10
As you can see, I am grouping the results by population density, yet the results include numerous duplicate population densities.
Can someone tell me what I am doing wrong?
I assume it has something to do with the list of leaders, since more than one is returned for each country.
Is there a way to limit that to 1 leader per country somehow?
The first thing is that every variable you select without an aggregate must be listed in the GROUP BY clause.
Virtuoso currently is loose in its parsing of queries and allows things it should not.
The second is that you need to select just one leader; if you don't care which one, use SAMPLE. If you want all of them, use a GROUP_CONCAT variation.
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?populationdensity (sample(?leader) as ?ls)
WHERE {
?country a dbpedia-owl:Country ;
rdfs:label ?country_name ;
prop:populationDensityKm ?populationdensity ;
dbpedia-owl:leader ?leader .
FILTER (?populationdensity < 10 && langMatches(lang(?country_name), "en")) .
}
GROUP BY ?country_name ?populationdensity
ORDER BY ASC(?populationdensity)
limit 10
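A toy Python model of what the corrected query computes, with made-up rows: one output row per (?country_name, ?populationdensity) group. SAMPLE picks an arbitrary leader; taking the first one seen is an assumption of this model, since engines may return any member. GROUP_CONCAT joins all of them.

```python
# Made-up WHERE-clause solutions: (country_name, population_density, leader).
solutions = [
    ("Country X", 2.0, "leader_1"),
    ("Country X", 2.0, "leader_2"),
    ("Country Y", 3.5, "leader_3"),
]


def group(rows):
    """GROUP BY ?country_name ?populationdensity: collect the leaders of each group."""
    groups = {}
    for name, density, leader in rows:
        groups.setdefault((name, density), []).append(leader)
    return groups


def with_sample(rows):
    """(SAMPLE(?leader) AS ?ls): one arbitrary leader per group (here the first seen)."""
    return [(name, density, leaders[0]) for (name, density), leaders in group(rows).items()]


def with_group_concat(rows, separator=" "):
    """(GROUP_CONCAT(?leader) AS ?ls): all leaders of the group in one string."""
    return [(name, density, separator.join(leaders))
            for (name, density), leaders in group(rows).items()]


# ORDER BY ASC(?populationdensity) is applied to the grouped rows.
print(sorted(with_sample(solutions), key=lambda row: row[1]))
print(sorted(with_group_concat(solutions), key=lambda row: row[1]))
```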
If you want the current leader, you need to replace the line
dbpedia-owl:leader ?leader .
with this:
dbpprop:leaderTitle/dbpprop:incumbent ?leader .
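The sequence path `dbpprop:leaderTitle/dbpprop:incumbent` is shorthand for two triple patterns joined through an unnamed intermediate node, roughly `?country dbpprop:leaderTitle ?t . ?t dbpprop:incumbent ?leader .`. A Python sketch of that composition over made-up triples; whether DBpedia data actually has this shape for a given country is worth checking against the data.

```python
# Made-up triples: country -> leader title (an office) -> current incumbent.
triples = [
    ("country_x", "leaderTitle", "office_president"),
    ("office_president", "incumbent", "person_a"),
    ("country_x", "leader", "person_b"),  # hypothetical older dbo:leader value
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def sequence_path(subject, first, second):
    """Models `subject first/second ?x`: join first's objects with second's subjects."""
    return [x for middle in objects(subject, first) for x in objects(middle, second)]


print(sequence_path("country_x", "leaderTitle", "incumbent"))  # ['person_a']
```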

How do I fit that SPARQL calculation?

This is my actual problem:
?var0 is a group variable and ?var1 is not. But whenever I try to validate the syntax, I get the following error message:
Non-group key variable in SELECT: ?var1 in expression ( sum(?var0) / ?var1 )
The complete Query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cz: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster>
PREFIX n: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node>
SELECT ( (SUM(?var0) / ?var1) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
?this cz:node_count ?var1 .
}
My question is: how can I correct that calculation to fit the SPARQL syntax?
The immediate problem is that ?var1 is not grouped on, so a fix would be to simply append
GROUP BY ?var1
at the end of your query.
However, whether that gives you the calculation you actually want is another matter.
It's not quite clear what you're trying to calculate, but it looks as if you're attempting to determine the average node_realtime_cpu for a cluster. If that is the case, you can probably do your calculation by just using SPARQL's AVG function instead:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX n: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node>
SELECT ( AVG(?var0) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
}
GROUP BY ?this # grouping on the cluster identifier so we get an average per cluster
Yet another alternative would be to keep your query as-is, but group on two variables:
GROUP BY ?this ?var1
Which is best depends on what your data looks like and what, exactly, you're trying to calculate.
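How much the choice matters shows up on a small made-up example, sketched in Python (not SPARQL) by grouping the WHERE-clause solutions by hand. Grouping only on ?var1 merges every cluster that happens to have the same node_count, while SUM(?var0)/?var1 and AVG(?var0) agree only when node_count equals the number of nodes that actually have a node_realtime_cpu value.

```python
# Made-up WHERE-clause solutions: (?this cluster, ?var0 node cpu, ?var1 node_count).
solutions = [
    ("c1", 10, 2), ("c1", 20, 2),
    ("c2", 30, 2), ("c2", 50, 2),
    ("c3", 40, 4),  # node_count says 4, but only one node has a cpu value
]


def group_by(rows, key):
    """Collect rows into lists keyed by key(row), preserving first-seen key order."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return groups


# GROUP BY ?var1, SELECT (SUM(?var0) / ?var1): clusters with equal node_count are merged.
by_count = {count: sum(r[1] for r in rows) / count
            for count, rows in group_by(solutions, lambda r: r[2]).items()}

# GROUP BY ?this, SELECT (AVG(?var0)): one average per cluster.
avg_per_cluster = {this: sum(r[1] for r in rows) / len(rows)
                   for this, rows in group_by(solutions, lambda r: r[0]).items()}

# GROUP BY ?this ?var1, SELECT (SUM(?var0) / ?var1): per cluster, divided by node_count.
sum_over_count = {key: sum(r[1] for r in rows) / key[1]
                  for key, rows in group_by(solutions, lambda r: (r[0], r[2])).items()}

print(by_count)         # {2: 55.0, 4: 10.0}
print(avg_per_cluster)  # {'c1': 15.0, 'c2': 40.0, 'c3': 40.0}
print(sum_over_count)   # {('c1', 2): 15.0, ('c2', 2): 40.0, ('c3', 4): 10.0}
```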