SPARQL limit the result for each value of a variable - sparql

This is the minimum data required to reproduce the problem
@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:artist1 rdf:type :Artist .
:artist2 rdf:type :Artist .
:artist3 rdf:type :Artist .
:en rdf:type :Language .
:it rdf:type :Language .
:gr rdf:type :Language .
:c1
rdf:type :CountableClass ;
:appliedOnClass :Artist ;
:appliedOnProperty :hasArtist
.
:c2
rdf:type :CountableClass ;
:appliedOnClass :Language ;
:appliedOnProperty :hasLanguage
.
:i1
rdf:type :RecommendableClass ;
:hasArtist :artist1 ;
:hasLanguage :en
.
:i2
rdf:type :RecommendableClass ;
:hasArtist :artist1 ;
:hasLanguage :en
.
:i3
rdf:type :RecommendableClass;
:hasArtist :artist1 ;
:hasLanguage :it
.
:i4
rdf:type :RecommendableClass;
:hasArtist :artist2 ;
:hasLanguage :en
.
:i5
rdf:type :RecommendableClass;
:hasArtist :artist2 ;
:hasLanguage :it
.
:i6
rdf:type :RecommendableClass;
:hasArtist :artist3 ;
:hasLanguage :gr
.
:ania :likes :i1 .
:ania :likes :i3 .
:ania :likes :i4 .
This is my query
PREFIX : <http://example.org/rs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rs: <http://spektrum.ctu.cz/ontologies/radio-spectrum#>
SELECT ?item ?count ?value
WHERE
{ ?item rdf:type :RecommendableClass
{ SELECT ?countableProperty ?value (count(*) AS ?count)
WHERE
{ VALUES ?user { :ania }
VALUES ?countableConfiguration { :c1 }
?user :likes ?x .
?countableConfiguration :appliedOnProperty ?countableProperty .
?countableConfiguration :appliedOnClass ?countableClass .
?x ?countableProperty ?value .
?value rdf:type ?countableClass
}
GROUP BY ?countableProperty ?value
ORDER BY DESC(?count)
LIMIT 3
}
FILTER NOT EXISTS {?user :likes ?item}
}
This is the result:
As you can see, three items have the value artist1 and three others have artist2.
Is there any way I can limit the result to just two for each of them?

First, some minimal data: three artists, and some items for each one. I always stress the point of minimal data on Stack Overflow, because it's important for isolating the problem. In this case, you've still provided a relatively large query and a lot more data than we need. Since we know the problem is how to group artists that are each related to a number of items, all the data needs is some artists that are each related to a number of items. Then we can retrieve them easily, and group them easily.
@prefix : <urn:ex:> .
:artist1 :p :a1, :a2, :a3, :a4 .
:artist2 :p :b2, :b2, :b3, :b4, :b5 .
:artist3 :p :c2 .
Now, you can select artists and their items, and you can determine an index for each item. This method checks, for each item, how many items of the same artist are less than or equal to it. There's always at least one such item (the item itself), so the counts are essentially a 1-based index.
prefix : <urn:ex:>
select ?artist ?item (count(?item_) as ?pos){
?artist :p ?item_, ?item .
filter (str(?item_) <= str(?item))
}
group by ?artist ?item
-------------------------
| artist | item | pos |
=========================
| :artist1 | :a1 | 1 |
| :artist1 | :a2 | 2 |
| :artist1 | :a3 | 3 |
| :artist1 | :a4 | 4 |
| :artist2 | :b2 | 1 |
| :artist2 | :b3 | 2 |
| :artist2 | :b4 | 3 |
| :artist2 | :b5 | 4 |
| :artist3 | :c2 | 1 |
-------------------------
Now you can use having to filter on the position, so that you get at most two per artist:
prefix : <urn:ex:>
select ?artist ?item {
?artist :p ?item_, ?item .
filter (str(?item_) <= str(?item))
}
group by ?artist ?item
having (count(?item_) < 3)
-------------------
| artist | item |
===================
| :artist1 | :a1 |
| :artist1 | :a2 |
| :artist2 | :b2 |
| :artist2 | :b3 |
| :artist3 | :c2 |
-------------------
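If the self-join is easier to follow procedurally, here is a minimal Python sketch of the same counting idea. It is not SPARQL and not a description of how an engine evaluates the query; the names `items_by_artist` and `top_n_per_group` are made up for illustration, and the data simply mirrors the toy graph above.

```python
# Toy data mirroring the graph above. Stored as sets, so the repeated
# :b2 collapses to a single item, just as it does in an RDF graph.
items_by_artist = {
    ":artist1": {":a1", ":a2", ":a3", ":a4"},
    ":artist2": {":b2", ":b2", ":b3", ":b4", ":b5"},
    ":artist3": {":c2"},
}

def top_n_per_group(groups, n):
    """Keep, per group, the items whose 1-based position is at most n."""
    result = {}
    for group, items in groups.items():
        kept = []
        for item in items:
            # Position = number of items in the same group that compare
            # less than or equal to this one (the item itself counts).
            pos = sum(1 for other in items if other <= item)
            if pos <= n:
                kept.append(item)
        result[group] = sorted(kept)
    return result

print(top_n_per_group(items_by_artist, 2))
```

Like the query, this compares every item with every other item of the same artist, so the cost grows quadratically with the size of each group.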
References
Doing "n per each x" queries in SPARQL is kind of a challenge, and there's no great solution for it yet. Some related reading that might help (be sure to check the comments on these questions and answers, too):
SPARQL using subquery with limit (subqueries with limits can sometimes be helpful)
How to select first N row of each group (canonical question, in my opinion, but has no answer, since there's no general answer)
Find the two nearest neighbors of points (recent question with a "hack" answer)

Related

anzo query console results URI compression

Suppose I have data inserted into ANZO:
insert data {<a> rdf:type <c1>}
When I issue query:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?s ?p ?o where {bind(rdf:type as ?p). ?s ?p ?o}
I get this answer in the console:
s | p | o
---+-------------------------------------------------+----
a | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | c1
Now my question: is there any way in ANZO to get the console to show rdf:type as a compressed URI, like this:
s | p | o
---+----------+----
a | rdf:type | c1

Conjunction in SPARQL query using OWL and RDF [duplicate]

I want to query for the movies that share the highest number of types with the movie The Matrix.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?movie_name (count(distinct ?atype) as ?numatype)
FROM <http://dbpedia.org/>
WHERE {
?movie rdf:type dbo:Film;
rdf:type ?ftype.
dbr:The_Matrix rdf:type ?ttype.
?atype a owl:class;
owl:intersectionOf [?ftype ?ttype].
?movie rdfs:label ?movie_name.
FILTER (LANG(?movie_name)="en").
}
GROUP BY ?movie_name
ORDER BY DESC(?numatype)
LIMIT 100
I defined ?ttype as the type of The Matrix and ?ftype as the type of ?movie.
When I run this query at http://dbpedia.org/sparq there are no results.
The idea is to use a simple join on the types:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT (SAMPLE(?l) as ?movie_name)
(count(distinct ?ttype) as ?numSharedTypes)
WHERE {
VALUES ?s {dbr:The_Matrix}
?s a ?ttype .
?movie a dbo:Film ;
a ?ttype .
FILTER(?movie != ?s)
?movie rdfs:label ?l .
FILTER (LANGMATCHES(LANG(?l), 'en'))
}
GROUP BY ?movie
ORDER BY desc(?numSharedTypes)
LIMIT 100
The join itself might be expensive, so you could get a timeout or, because of Virtuoso's anytime query feature, an incomplete result.
It looks like the query optimizer isn't smart enough here; the labels in particular make the performance worse. A bunch of sub-SELECTs makes it much faster, although the query becomes harder to read:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?movie_name ?numSharedTypes
WHERE
{ ?movie rdfs:label ?l
FILTER langMatches(lang(?l), "en")
BIND(replace(replace(str(?l), "\\(film\\)$", ""), "[^0-9]*\\sfilm\\)$", ")") AS ?movie_name)
{ SELECT ?movie (COUNT(?type) AS ?numSharedTypes)
WHERE
{ ?movie rdf:type dbo:Film ;
rdf:type ?type
{ SELECT ?type
WHERE
{ dbr:The_Matrix rdf:type ?type
}
}
FILTER ( ?movie != dbr:The_Matrix )
}
GROUP BY ?movie
ORDER BY DESC(?numSharedTypes) ASC(?movie)
LIMIT 100
}
}
ORDER BY DESC(?numSharedTypes) ASC(?movie_name)
Result (chunk):
+------------------------+----------------+
| movie_name | numSharedTypes |
+------------------------+----------------+
| The Matrix Reloaded | 36 |
| The Matrix Revolutions | 33 |
| The Matrix (franchise) | 30 |
| Demolition Man | 28 |
| Freejack | 28 |
| Conspiracy Theory | 27 |
| Deep Blue Sea (1999) | 27 |
| Fair Game (1995) | 27 |
| Judge Dredd | 27 |
| Revenge Quest | 27 |
| Screamers (1995) | 27 |
| Soldier (1998) | 27 |
| The Invasion | 27 |
| Timecop | 27 |
| Total Recall (1990) | 27 |
| V for Vendetta | 27 |
| Assassins | 26 |
| ... | ... |
+------------------------+----------------+
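To make the join on shared types concrete, here is a small Python sketch of the same idea using set intersection. The `TYPES` table is hypothetical toy data invented for this example (it is not real DBpedia data), and `most_similar` is just an illustrative name.

```python
# Hypothetical toy data: resource -> set of rdf:type values.
# These names and types are invented; they are NOT taken from DBpedia.
TYPES = {
    "The_Matrix": {"Film", "Work", "ScienceFictionFilm", "ActionFilm"},
    "Film_X": {"Film", "Work", "ScienceFictionFilm"},
    "Film_Y": {"Film", "Work"},
    "Book_Z": {"Work", "Book"},
}

def most_similar(target, limit):
    """Rank other films by how many types they share with target."""
    target_types = TYPES[target]
    scores = [
        (name, len(types & target_types))   # number of shared types
        for name, types in TYPES.items()
        if name != target and "Film" in types
    ]
    # Most shared types first; ties broken by name, as in the
    # ORDER BY DESC(?numSharedTypes) ASC(?movie) clause above.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:limit]

print(most_similar("The_Matrix", 100))
```

The `"Film" in types` test plays the role of the `?movie a dbo:Film` pattern, and `name != target` corresponds to the `FILTER` that excludes The Matrix itself.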

How can I retrieve the relations between two classes with object properties using a SPARQL query

Assume we have an OWL file which contains the following properties with domains and ranges:
Domain Property Range
----------------------------
tour hascountry country
country hascity city
city hasward ward
ward hashouse house
Using SPARQL, how can I get results "between" the Tour and House classes? That is, properties whose domains and ranges are such that there's a "path" from Tour to the domain and from the range to House. With just these two classes, how could we find results like the following? It seems like some kind of loop might be necessary, but I don't know how to do that in SPARQL.
|tour -------- (hascountry) ----- country|
|country -------- (hascity) ----- city |
|city -------- (hasward) ----- ward |
|ward -------- (hashouse) ----- house |
First, it's always easier to work with some real data. We can't write real SPARQL queries against data that we don't have. In the future, please be sure to provide some sample that we can work with. For now, here's some sample data that describes the domains and ranges of the properties that you mentioned. Also note that properties don't connect classes; properties connect individuals. Properties can have domains and ranges, and that provides us with a way to infer additional information about the individuals that are related by the property. Anyhow, here's the data:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <https://stackoverflow.com/q/29737549/1281433/> .
:hascountry rdfs:domain :tour ; rdfs:range :country .
:hascity rdfs:domain :country ; rdfs:range :city .
:hasward rdfs:domain :city ; rdfs:range :ward .
:hashouse rdfs:domain :ward ; rdfs:range :house .
Now, note that you could get from :tour to :country if you follow the rdfs:domain property backward to :hascountry, and then follow the rdfs:range property forward to :country. In SPARQL, you can write that as a property path:
:tour ^rdfs:domain/rdfs:range :country
If you can follow chains of that property path, you can find all the properties that are "between" :tour and :house:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <https://stackoverflow.com/q/29737549/1281433/>
select ?domain ?property ?range where {
#-- find ?properties that have a domain
#-- and range...
?property rdfs:domain ?domain ;
rdfs:range ?range .
#-- where there's a ^rdfs:domain to rdfs:range
#-- chain from :tour to ?domain...
:tour (^rdfs:domain/rdfs:range)* ?domain .
#-- and from ?range to :house.
?range (^rdfs:domain/rdfs:range)* :house .
}
-------------------------------------
| domain | property | range |
=====================================
| :ward | :hashouse | :house |
| :city | :hasward | :ward |
| :country | :hascity | :city |
| :tour | :hascountry | :country |
-------------------------------------
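The "loop" that the question asks about is what the `*` in the property path provides. As a rough illustration only, here is a Python sketch that does the same reachability check by hand. The `PROPERTIES` list mirrors the sample data above; `reachable` and `properties_between` are hypothetical helper names, not part of any library.

```python
# (property, domain, range) entries mirroring the sample data above.
PROPERTIES = [
    ("hascountry", "tour", "country"),
    ("hascity", "country", "city"),
    ("hasward", "city", "ward"),
    ("hashouse", "ward", "house"),
]

def reachable(start):
    """Classes reachable from start via zero or more domain-to-range steps,
    i.e. the (^rdfs:domain/rdfs:range)* path."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for _, domain, rng in PROPERTIES:
            if domain == current and rng not in seen:
                seen.add(rng)
                frontier.append(rng)
    return seen

def properties_between(start, end):
    """Properties whose domain is reachable from start and whose range
    can reach end."""
    from_start = reachable(start)
    return [
        (domain, prop, rng)
        for prop, domain, rng in PROPERTIES
        if domain in from_start and end in reachable(rng)
    ]

for row in properties_between("tour", "house"):
    print(row)
```

Because the path allows zero steps, the start class counts as reachable from itself, which is why `hascountry` (domain `tour`) and `hashouse` (range `house`) are both included.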
Getting the results "in order"
If you want the properties "in order" from the start class to the end class, you can compute the distance from the start class to each property and order by that. You can do that using the technique in my answer to Is it possible to get the position of an element in an RDF Collection in SPARQL?. Here's what it looks like as a SPARQL query:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <https://stackoverflow.com/q/29737549/1281433/>
select ?domain ?property ?range
(count(?mid) as ?dist)
where {
#-- find ?properties that have a domain
#-- and range...
?property rdfs:domain ?domain ;
rdfs:range ?range .
#-- where there's a ^rdfs:domain to rdfs:range
#-- chain from :tour to ?domain...
:tour (^rdfs:domain/rdfs:range)* ?domain .
#-- and from ?range to :house.
?range (^rdfs:domain/rdfs:range)* :house .
#-- then, compute the "distance" from :tour
#-- to the property. This is based on binding
#-- ?mid to each class in between them and
#-- taking the number of distinct ?mid values
#-- as the distance.
:tour (^rdfs:domain/rdfs:range)* ?mid .
?mid (^rdfs:domain/rdfs:range)* ?domain .
}
group by ?domain ?property ?range
order by ?dist
--------------------------------------------
| domain | property | range | dist |
============================================
| :tour | :hascountry | :country | 1 |
| :country | :hascity | :city | 2 |
| :city | :hasward | :ward | 3 |
| :ward | :hashouse | :house | 4 |
--------------------------------------------
I included ?dist in the select just so we could see the values. You don't have to select it in order to sort by it. You can do this too:
select ?domain ?property ?range {
#-- ...
}
group by ?domain ?property ?range
order by count(?mid)
-------------------------------------
| domain | property | range |
=====================================
| :tour | :hascountry | :country |
| :country | :hascity | :city |
| :city | :hasward | :ward |
| :ward | :hashouse | :house |
-------------------------------------

The SPARQL query - the closest blond ancestor

Let us consider two classes: Person and its subclass BlondePerson.
Let us consider a relationship isParent, where one Person is the parent of another Person.
Let us define the relationship isAncestor, which holds where there is a sequence of isParent relationships.
Many of my ancestors might be BlondePersons.
My question: how do I write a SPARQL query that finds my closest ancestor who is blond? The closest means my parent if possible; if not, a grandparent; otherwise a great-grandparent, and so on.
How do I compose a SPARQL query for that? How can I make sure that I get the ancestor who is the closest one?
Thank you.
This isn't too hard; you can use the same technique demonstrated in Is it possible to get the position of an element in an RDF Collection in SPARQL?. The idea is essentially to treat the ancestry as a sequence, from which you can get the "closeness" of each ancestor, and select the closest ancestor from a given class. If we create some sample data, we end up with something like this:
@prefix : <urn:ex:> .
:a a :Person ; :hasParent :b .
:b a :Person ; :hasParent :c .
:c a :Person, :Blond ; :hasParent :d .
:d a :Person, :Blond ; :hasParent :e .
:e a :Person .
prefix : <urn:ex:>
select distinct
?person
?ancestor
(count(distinct ?mid) as ?closeness)
?isBlond
where {
values ?person { :a }
?person :hasParent+ ?mid .
?mid a :Person .
?mid :hasParent* ?ancestor .
?ancestor a :Person .
bind( if( exists { ?ancestor a :Blond }, true, false ) as ?isBlond )
}
group by ?person ?ancestor ?isBlond
order by ?person ?closeness
-------------------------------------------
| person | ancestor | closeness | isBlond |
===========================================
| :a | :b | 1 | false |
| :a | :c | 2 | true |
| :a | :d | 3 | true |
| :a | :e | 4 | false |
-------------------------------------------
That's actually more information than we needed; I just included it to show how this works. Now we can actually just require that ?ancestor is blond, order by closeness, and limit the results to the first (and thus the closest):
prefix : <urn:ex:>
select distinct
?person
?ancestor
(count(distinct ?mid) as ?closeness)
where {
values ?person { :a }
?person :hasParent+ ?mid .
?mid a :Person .
?mid :hasParent* ?ancestor .
?ancestor a :Person, :Blond .
}
group by ?person ?ancestor
order by ?person ?closeness
limit 1
---------------------------------
| person | ancestor | closeness |
=================================
| :a | :c | 2 |
---------------------------------
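For comparison, the same question can be answered procedurally with a breadth-first walk up the family tree. Note that this is a different technique from the counting trick in the query above. The sketch below is plain Python under assumed names (`PARENTS`, `BLOND`, `closest_blond_ancestor`), with data mirroring the sample graph.

```python
from collections import deque

# Toy data mirroring the sample graph above: child -> list of parents.
PARENTS = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}
BLOND = {"c", "d"}

def closest_blond_ancestor(person):
    """Return (ancestor, closeness) for the nearest blond ancestor,
    or None if no ancestor is blond. Closeness 1 means a parent."""
    queue = deque((parent, 1) for parent in PARENTS.get(person, []))
    seen = set()
    while queue:
        ancestor, closeness = queue.popleft()
        if ancestor in seen:
            continue
        seen.add(ancestor)
        # Generations are visited nearest first, so the first blond
        # ancestor found is a closest one.
        if ancestor in BLOND:
            return ancestor, closeness
        queue.extend((p, closeness + 1) for p in PARENTS.get(ancestor, []))
    return None

print(closest_blond_ancestor("a"))
```

With two parents per person there can be several blond ancestors at the same distance; this sketch returns whichever is dequeued first, just as `limit 1` in the query picks one of the tied rows.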

How to rank values in SPARQL?

I would like to create a ranking of observations using SPARQL. Suppose I have:
@prefix : <http://example.org#> .
:A :value 60 .
:B :value 23 .
:C :value 89 .
:D :value 34 .
The ranking should be: :C = 1 (the highest), :A = 2, :D = 3, :B = 4. Up until now, I was able to solve it using the following query:
prefix : <http://example.org#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?x ?v ?ranking {
?x :value ?v .
{ SELECT (GROUP_CONCAT(?x;separator="") as ?ordered) {
{ SELECT ?x {
?x :value ?v .
} ORDER BY DESC(?v)
}
}
}
BIND (str(?x) as ?xName)
BIND (strbefore(?ordered,?xName) as ?before)
BIND ((strlen(?before) / strlen(?xName)) + 1 as ?ranking)
} ORDER BY ?ranking
But that query only works if the URIs for ?x all have the same length. A better solution would be to have a function similar to strpos in PHP or indexOf in Java, but as far as I know, they are not available in SPARQL 1.1. Are there simpler solutions?
One way of doing this is to take as the ranking for a value the number of values which are greater than or equal to it, so that the highest value gets rank 1. This might be inefficient for larger data sets, since for each value it has to check all the other values. It doesn't require string manipulation though.
PREFIX : <http://example.org#>
SELECT ?x ?v (COUNT(*) as ?ranking) WHERE {
?x :value ?v .
[] :value ?u .
FILTER( ?v <= ?u )
}
GROUP BY ?x ?v
ORDER BY ?ranking
---------------------
| x | v | ranking |
=====================
| :C | 89 | 1 |
| :A | 60 | 2 |
| :D | 34 | 3 |
| :B | 23 | 4 |
---------------------
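The counting logic is easy to check outside SPARQL. Here is a minimal Python sketch of it; the dictionary mirrors the sample data, and the function name `rank` is just for illustration.

```python
# Observations mirroring the sample data above.
values = {":A": 60, ":B": 23, ":C": 89, ":D": 34}

def rank(observations):
    """Rank of x = number of observed values u with v <= u,
    the same condition as FILTER( ?v <= ?u ) in the query."""
    return {
        x: sum(1 for u in observations.values() if v <= u)
        for x, v in observations.items()
    }

for x, r in sorted(rank(values).items(), key=lambda pair: pair[1]):
    print(x, values[x], r)
```

One consequence of counting this way is that tied values all receive the same rank, and that rank counts every tied value: two observations sharing the highest value would both get rank 2, and rank 1 would be unused.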