SPARQL full aggregation on a group aggregation

I have an ontology in which users can use one of five predicates to express how much they like an item.
The ontology also contains specific predicates that have a property called hasSimilarityValue.
I am trying to do the following:
1. Take a user, let's say rs:ania.
2. Extract all the items that this user has rated before. (This is easy because the ontology already contains triples from the user to the items.)
3. Extract items similar to the items extracted in step 2 and calculate their similarities. (Here we are using our own approach to calculate the similarities.)
However, the issue is this: step 2 gives many items that the user has rated, and step 3 extracts and scores the items similar to each of them. So an item from step 3 can be similar to two (or more) items from step 2. Thus we end up with the following:
user :ania rated item x1
user :ania rated item x2
item y is similar by y1 to x1
item y is similar by y2 to x2
item z is similar by z1 to x1
y1, y2, and z1 are values between 0 and 1
The thing is that we need to normalize these values to know the final similarities for item y and item z.
The normalization is simple: group by the item and divide by the maximum number of items (here, 2).
So to get the similarity of y, I should compute (y1 + y2) / 2.
To get the similarity of z, I should compute z1 / 2.
My problem
As you can see, I need to count the items and then know the max of this count.
This is the query that calculates everything except the normalization part:
PREFIX rs: <http://www.musicontology.com/rs#>
select ?s (sum(?weight * ?factor) as ?similarity) (sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
order by ?s
the result is:
Now I need to divide each row by the maximum of the count column.
My current solution is to repeat the exact query twice, once to get the similarities and once to get the max, then join the two and do the division (normalization). It works, but it is ugly, and the performance will be a disaster because I am running the same query twice. It is a stupid solution, and I would like to ask you for a better one, please.
Here is my stupid solution:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rs: <http://www.musicontology.com/rs#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?s ?similarity ?similarityWithRating ?countOfItemsUsedInDeterminingTheSimilarities ?maxCountOfItemsUsedInDeterminingTheSimilarities ?finalSimilarity where {
{
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
order by ?s
} #end first part
{
select (Max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxCountOfItemsUsedInDeterminingTheSimilarities) where {
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
order by ?s
}
}#end second part
bind (?similarityWithRating/?maxCountOfItemsUsedInDeterminingTheSimilarities as ?finalSimilarity)
}
order by desc(?finalSimilarity)
Finally
Here is the data if you want to try it yourself.
http://www.mediafire.com/view/r4qlu3uxijs4y30/musicontology

It's really helpful if you can provide minimal data to work with in these examples. That means data that doesn't include anything we don't need in order to solve the problem, and that is as simple as possible. I think that How to create a Minimal, Complete, and Verifiable example might be useful for your Stack Overflow questions.
Anyhow, here's some simple data that should be enough for us to work with. There are two users who have made some ratings, and some similarities in the data. Note that I made the similarities directed; you'd probably want them to be bidirectional, but that's not really the main part of this problem.
@prefix : <urn:ex:> .
:user1 :rated :a , :b .
:user2 :rated :b , :c , :d .
:a :similarTo [ :piece :c ; :value 0.1 ] ,
[ :piece :d ; :value 0.2 ] .
:b :similarTo [ :piece :d ; :value 0.3 ] ,
[ :piece :e ; :value 0.4 ] .
:c :similarTo [ :piece :e ; :value 0.5 ] ,
[ :piece :f ; :value 0.6 ] .
:d :similarTo [ :piece :f ; :value 0.7 ] ,
[ :piece :g ; :value 0.8 ] .
Now the query just needs to retrieve a user and the pieces that they've rated, along with similar pieces and the actual similarity values. If you group by the user and the similar piece, you end up with groups that each have a single similar piece, a single user, and a bunch of rated pieces with their similarity to the similar piece. Since all the similarity values are in a fixed range (0,1), you can just average them to get an overall similarity. In this query, I've also added a group_concat to show which rated pieces each similarity value is based on.
prefix : <urn:ex:>
select
?user
(group_concat(?piece) as ?ratedPieces)
?similarPiece
(avg(?similarity_) as ?similarity)
where {
#-- Find ?pieces that ?user has rated.
?user :rated ?piece .
#-- Find other pieces (?similarPiece) that are
#-- similar to ?piece, along with the
#-- similarity value (?similarity_)
?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece
------------------------------------------------------------
| user | ratedPieces | similarPiece | similarity |
============================================================
| :user1 | "urn:ex:a" | :c | 0.1 | ; a-c[0.1]
| :user1 | "urn:ex:b urn:ex:a" | :d | 0.25 | ; b-d[0.3], a-d[0.2]
| :user1 | "urn:ex:b" | :e | 0.4 | ; b-e[0.4]
| :user2 | "urn:ex:b" | :d | 0.3 | ; b-d[0.3]
| :user2 | "urn:ex:c urn:ex:b" | :e | 0.45 | ; c-e[0.5], b-e[0.4]
| :user2 | "urn:ex:d urn:ex:c" | :f | 0.65 | ; d-f[0.7], c-f[0.6]
| :user2 | "urn:ex:d" | :g | 0.8 | ; d-g[0.8]
------------------------------------------------------------
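To see the arithmetic behind this table, here is a small Python sketch (an emulation, not SPARQL) of `group by ?user ?similarPiece` with `avg` over the sample data above. It also computes the normalization described in the question (the group's sum divided by the largest group size), assuming the maximum is taken per user; the helper names are illustrative only.

```python
from collections import defaultdict

# Sample data from the answer above (prefix ":" = <urn:ex:>).
RATED = {":user1": [":a", ":b"], ":user2": [":b", ":c", ":d"]}
SIMILAR_TO = {
    ":a": [(":c", 0.1), (":d", 0.2)],
    ":b": [(":d", 0.3), (":e", 0.4)],
    ":c": [(":e", 0.5), (":f", 0.6)],
    ":d": [(":f", 0.7), (":g", 0.8)],
}


def group_similarities():
    """Emulate GROUP BY ?user ?similarPiece: one list of similarity
    values per (user, similar piece) pair."""
    groups = defaultdict(list)
    for user, pieces in RATED.items():
        for piece in pieces:
            for similar_piece, value in SIMILAR_TO.get(piece, []):
                groups[(user, similar_piece)].append(value)
    return groups


def averaged(groups):
    """avg(?similarity_) per group, as in the query above."""
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def normalized_by_max_count(groups):
    """The question's normalization: the group's sum divided by the size of
    the largest group for the same user, e.g. (y1 + y2) / 2 and z1 / 2."""
    max_count = defaultdict(int)
    for (user, _), vals in groups.items():
        max_count[user] = max(max_count[user], len(vals))
    return {key: sum(vals) / max_count[key[0]] for key, vals in groups.items()}


groups = group_similarities()
avg = averaged(groups)
norm = normalized_by_max_count(groups)
for user, piece in sorted(groups):
    print(user, piece, round(avg[(user, piece)], 3), round(norm[(user, piece)], 3))
```

For groups as large as the user's largest group the two agree (:user1 and :d give 0.25 either way); for smaller groups they differ (for :user1 and :c, avg gives 0.1 while dividing by the maximum count gives 0.05).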

Related

The mechanism of "FILTER NOT EXISTS" in SPARQL

Assume the following triples:
@prefix : <http://example/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
:alice rdf:type foaf:Person .
:alice foaf:name "Alice" .
:bob rdf:type foaf:Person .
and then we perform 3 queries based on SPARQL 1.1:
Q1:
SELECT ?s
WHERE
{
?s ?p ?o .
FILTER NOT EXISTS { ?s foaf:name ?y }
}
Q2:
SELECT ?s
WHERE
{
?s ?p ?o .
FILTER NOT EXISTS { ?x foaf:name ?y }
}
Q3:
SELECT ?s
WHERE
{
?s ?p ?o .
FILTER NOT EXISTS { ?x foaf:mailbox ?y }
}
These three queries return three different results. Could anyone help me figure out why Q2 returns no solutions, in contrast to Q1 and Q3? Many thanks in advance :)
Q2 returns no solution because in your data, there exists a statement that matches ?x foaf:name ?y: ?x = :alice and ?y = "Alice". You've put no further constraints on either ?x or ?y. So no matter what the other variables in your query (?s, ?p and ?o) are bound to, the NOT EXISTS condition will always fail and therefore the query returns no result.
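The substitution behaviour can be checked with a toy evaluator. The Python sketch below is a simplified, hypothetical emulation (a single triple pattern inside NOT EXISTS, no real SPARQL engine): for each solution of `?s ?p ?o` it substitutes the variables that are already bound and asks whether the inner pattern still matches any triple. Only Q1 shares a variable (`?s`) with the outer pattern; in Q2 and Q3 the outcome is the same for every row.

```python
# The three triples from the question, written as (subject, predicate, object).
TRIPLES = [
    (":alice", "rdf:type", "foaf:Person"),
    (":alice", "foaf:name", '"Alice"'),
    (":bob", "rdf:type", "foaf:Person"),
]


def matches(pattern, triple, binding):
    """Does one triple pattern match `triple`, given the variables already
    bound by the outer solution? Unbound variables match anything."""
    local = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if local.get(term, value) != value:
                return False
            local[term] = value
        elif term != value:
            return False
    return True


def run(not_exists_pattern):
    """SELECT ?s WHERE { ?s ?p ?o . FILTER NOT EXISTS { <pattern> } }"""
    results = []
    for s, p, o in TRIPLES:
        binding = {"?s": s, "?p": p, "?o": o}
        if not any(matches(not_exists_pattern, t, binding) for t in TRIPLES):
            results.append(s)
    return results


q1 = run(("?s", "foaf:name", "?y"))     # ?s is shared with the outer pattern
q2 = run(("?x", "foaf:name", "?y"))     # nothing shared: matches :alice every time
q3 = run(("?x", "foaf:mailbox", "?y"))  # nothing shared: never matches
print(q1, q2, q3)  # [':bob'] [] [':alice', ':alice', ':bob']
```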

Finding the relative position of elements in a list using SPARQL

I'm trying to return subjects based on the relative position of their objects in an ordered list.
A subject can be associated with multiple objects (via a single predicate) and all objects are in an ordered list. Given a reference object in this list I'd like to return the subjects in order of relative distance of their objects from the reference object.
:a : :x .
:b : :v .
:b : :z .
:c : :v .
:c : :y .
:ls :list (:v :w :x :y :z) .
Taking x as our starting object in the list, the code below returns
:a :x 0
:c :y 1
:b :v 2
:b :z 2
:c :v 2
Instead of returning all positions, I would like only the objects at the subject's minimum object 'distance' to be returned (which may mean up to two objects per subject: one up and one down the list). So I'd like to return
:a :x 0
:c :y 1
:b :v 2
:b :z 2
The code so far...
(with a lot of help from Find lists containing ALL values in a set? and Is it possible to get the position of an element in an RDF Collection in SPARQL?)
SELECT ?s ?o (abs(?refPos-?pos) as ?dif)
WHERE {
:ls :list/rdf:rest*/rdf:first ?o .
?s : ?o .
{
SELECT ?o (count(?mid) as ?pos) ?refPos
WHERE {
[] :list/rdf:rest* ?mid . ?mid rdf:rest* ?node .
?node rdf:first ?o .
{
SELECT ?o (count(?mid2) as ?refPos)
WHERE {
[] :list/rdf:rest* ?mid2 . ?mid2 rdf:rest* ?node2 .
?node2 rdf:first :x .
}
}
}
GROUP BY ?o
}
}
GROUP BY ?s ?o
ORDER BY ?dif
I've been trying to get a minimum ?dif (difference/distance) by grouping by ?s, but because I then have to apply it (something like ?dif = ?minDif) to the ?s ?o grouping from earlier, I don't know how to go back and forth between these two groupings.
Thanks for any assistance you can provide
All you need to compose a solution is yet another of Joshua Taylor's answers: this or this.
Below I'm using Jena functions, but I hope the idea is clear.
Query 1
PREFIX list: <http://jena.apache.org/ARQ/list#>
SELECT ?s ?el ?dif {
?s : ?el .
:ls :list/list:index (?pos ?el) .
:ls :list/list:index (?ref :x) .
BIND (ABS(?pos -?ref) AS ?dif)
{
SELECT ?s (MIN (?dif_) AS ?dif) WHERE {
?s : ?el_ .
:ls :list/list:index (?pos_ ?el_) .
:ls :list/list:index (?ref_ :x) .
BIND (ABS(?pos_ - ?ref_) AS ?dif_)
} GROUP by ?s
}
}
Query 2
PREFIX list: <http://jena.apache.org/ARQ/list#>
SELECT ?s ?el ?dif {
?s : ?el .
:ls :list/list:index (?pos ?el) .
:ls :list/list:index (?ref :x) .
BIND (ABS(?pos -?ref) AS ?dif)
FILTER NOT EXISTS {
?s : ?el_ .
:ls :list/list:index (?pos_ ?el_) .
BIND (ABS(?pos_ - ?ref) AS ?dif_) .
FILTER(?dif_ < ?dif)
}
}
Update
Query 1 can be rewritten in this way:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?el ?dif {
?s : ?el
{ select (count(*) as ?pos) ?el {[] :list/rdf:rest*/rdf:rest*/rdf:first ?el} group by ?el }
{ select (count(*) as ?ref) {[] :list/rdf:rest*/rdf:rest*/rdf:first :x} }
BIND (ABS(?pos - ?ref) AS ?dif)
{
SELECT ?s (MIN(?dif_) AS ?diff) {
?s : ?el_
{ select (count(*) as ?pos_) ?el_ {[] :list/rdf:rest*/rdf:rest*/rdf:first ?el_} group by ?el_ }
{ select (count(*) as ?ref_) {[] :list/rdf:rest*/rdf:rest*/rdf:first :x} }
BIND (ABS(?pos_ - ?ref_) AS ?dif_)
} GROUP by ?s
}
FILTER (?dif = ?diff)
}
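Both queries implement the same idea: compute the distance of each (subject, object) pair from :x, then keep only the rows whose distance equals that subject's minimum. The Python sketch below emulates that logic over the question's data as an illustration only. It uses 0-based list positions, whereas the count(*) trick yields 1-based ones, which does not change the differences.

```python
# The question's data: (subject, object) pairs via the ':' predicate, and the list.
LINKS = [(":a", ":x"), (":b", ":v"), (":b", ":z"), (":c", ":v"), (":c", ":y")]
LIST = [":v", ":w", ":x", ":y", ":z"]


def closest(reference):
    position = {el: i for i, el in enumerate(LIST)}
    ref = position[reference]
    # Like BIND (ABS(?pos - ?ref) AS ?dif).
    rows = [(s, el, abs(position[el] - ref)) for s, el in LINKS]
    # Like the subquery SELECT ?s (MIN(?dif_) AS ?diff) ... GROUP BY ?s.
    min_dif = {}
    for s, _, dif in rows:
        min_dif[s] = min(dif, min_dif.get(s, dif))
    # Like FILTER (?dif = ?diff): keep only the rows at the minimum distance.
    kept = [row for row in rows if row[2] == min_dif[row[0]]]
    return sorted(kept, key=lambda row: (row[2], row[0], row[1]))


for row in closest(":x"):
    print(*row)
```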
Notes
As you can see, this is not what SPARQL was designed for. For example, Blazegraph supports Gremlin...
Possibly this is not what RDF was designed for. Or try another modeling approach: do you really need RDF lists?
I haven't tested the above query in Virtuoso.

sparql "diff" (minus with some unshared variables)

This is a follow-up on my question Compare models for identity, but with variables? Construct with minus?. Either I forgot what I learned then, or I didn't learn as much as I had thought.
I have triples like this:
prefix : <http://example.com/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
:rose :color :red .
:violet :color :blue .
:rose a :flower .
:flower rdfs:subClassOf :plant .
:dogs :love :trucks .
I want to discover any triples in my triplestore that satisfy none of these rules:
1. The triple has :rose as its subject.
2. The triple has :rose's parent class as its subject.
3. The triple uses a predicate that is used in some triple with :rose as the subject.
4. The triple has, as its object, the object of some triple with :rose as the subject.
Thus, in this case, the exception-discovery query (either select or construct) should only return
:dogs :love :trucks .
This query shows what should be in the triplestore:
PREFIX : <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct where {
:rose ?p ?o .
:rose a ?c .
?c rdfs:subClassOf ?super .
?s ?p ?x1 .
?x2 ?x3 ?o
}
+---+---------+-----------------+---------+
| | subject | predicate | object |
+---+---------+-----------------+---------+
| A | :flower | rdfs:subClassOf | :plant |
| B | :rose | :color | :red |
| C | :rose | rdf:type | :flower |
| D | :violet | :color | :blue |
+---+---------+-----------------+---------+
Is there a way to subtract that pattern out of everything in the triplestore, { ?s ?p ?o }, even though I'm using variable names other than ?s, ?p and ?o in the construct statement?
I have seen this post with strategies for comparing RDF, but I'd like to do it with standard SPARQL.
To tie it together with my earlier post, this final query erroneously suggests that several desired triples violate the rules set out at the top of the message.
+---+---------+-----------------+---------+
| E | :dogs   | :love           | :trucks |
| A | :flower | rdfs:subClassOf | :plant  |
| D | :violet | :color          | :blue   |
+---+---------+-----------------+---------+
Triple E is indeed undesired. But A is desired because it has :rose's class as its subject (rule 2), and triple D is desired because its predicate is also used in some triples with :rose as the subject (rule 3).
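PREFIX : <http://example.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>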
CONSTRUCT
{
?s ?p ?o .
}
WHERE
{ SELECT ?s ?p ?o
WHERE
{ { ?s ?p ?o }
MINUS
{ :rose ?p ?o ;
rdf:type ?c .
?c rdfs:subClassOf ?super .
?s ?p ?x1 .
?x2 ?x3 ?o
}
}
}
If I understand this correctly, you want to allow four types of triples in your data. If a triple (s,p,o) is in your data, it should satisfy at least one of the following criteria:
1. s = rose (about rose)
2. p = rdfs:subClassOf, and the data contains (rose,a,s) (about a type)
3. The data also contains (rose,p,x) (shared predicate)
4. The data also contains (rose,q,o) (shared object)
It's easy enough to write a pattern for each one of those. You just need to find each triple (s,p,o) and filter out the ones that match none of those criteria. I think you can do it like this:
select ?s ?p ?o {
?s ?p ?o
filter not exists {
{ values ?s { :rose } } #-- (1)
union { values ?p { rdfs:subClassOf } :rose a ?s } #-- (2)
union { :rose ?p ?x } #-- (3)
union { :rose ?x ?o } #-- (4)
}
}
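The four criteria can be sanity-checked outside a triplestore. This Python sketch is a hypothetical emulation of the filter, not SPARQL: it keeps each triple of the question's data for which none of the four union branches matches, and "a" is written as rdf:type.

```python
ROSE, RDF_TYPE, SUBCLASS = ":rose", "rdf:type", "rdfs:subClassOf"

# The question's data as (subject, predicate, object) tuples.
TRIPLES = [
    (":rose", ":color", ":red"),
    (":violet", ":color", ":blue"),
    (":rose", RDF_TYPE, ":flower"),
    (":flower", SUBCLASS, ":plant"),
    (":dogs", ":love", ":trucks"),
]


def allowed(triple, data):
    s, p, o = triple
    about_rose = s == ROSE                                                # (1)
    about_type = p == SUBCLASS and (ROSE, RDF_TYPE, s) in data            # (2)
    shared_predicate = any(ts == ROSE and tp == p for ts, tp, _ in data)  # (3)
    shared_object = any(ts == ROSE and to == o for ts, _, to in data)     # (4)
    return about_rose or about_type or shared_predicate or shared_object


def violations(data):
    """Like FILTER NOT EXISTS { (1) UNION (2) UNION (3) UNION (4) }."""
    return [t for t in data if not allowed(t, data)]


print(violations(TRIPLES))  # [(':dogs', ':love', ':trucks')]
```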

SPARQL: extract distinct values from dbpedia

I want to extract the graphs of 5 individuals from DBpedia that are films (movies).
My query is:
ParameterizedSparqlString qs = new ParameterizedSparqlString( "" +
"construct{?s ?p ?o}"+
"where{?s a <http://dbpedia.org/ontology/Film> ."+
"?s ?p ?o} OFFSET 0 LIMIT 5" );
I get the following result:
1- <http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> .
2- <http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
3- <http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.wikidata.org/entity/Q386724> .
4- <http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Wikidata:Q11424> .
5- <http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Work> .
Problem:
The same film is returned 5 times, because all of the classes Film, Thing, Q386724, Wikidata:Q11424, and Work are equivalent classes (or a subclass relation exists between them).
My question:
I want to return only the triple
<http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Film> .
and filter out the other 4 triples.
How can I do this?
Thank you in advance
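LIMIT applies to the solutions of the whole WHERE pattern, which here are individual triples, so all five can belong to the same film. Selecting the films in a subquery and applying LIMIT there limits the number of films instead; the outer ?s ?p ?o pattern then fetches all the triples of those films. I think the following should work for you: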
CONSTRUCT {?s ?p ?o}
WHERE {
{ SELECT DISTINCT ?s
WHERE {
?s a <http://dbpedia.org/ontology/Film> .
} LIMIT 5
}
?s ?p ?o .
}
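The effect of moving LIMIT into the subquery can be emulated in a few lines of Python. The dataset below is hypothetical (seven made-up films, each with the five rdf:type triples from the question plus a label), and Python lists keep insertion order, whereas a SPARQL engine returns solutions in no guaranteed order without ORDER BY, so which films you actually get from DBpedia may differ.

```python
# Hypothetical data: 7 films, each with five rdf:type triples and one label.
CLASSES = ["dbo:Film", "owl:Thing", "wikidata:Q386724", "dbo:Wikidata:Q11424", "dbo:Work"]
TRIPLES = []
for n in range(1, 8):
    film = f":film{n}"
    TRIPLES += [(film, "rdf:type", cls) for cls in CLASSES]
    TRIPLES.append((film, "rdfs:label", f'"Film {n}"'))


def is_film(s):
    return (s, "rdf:type", "dbo:Film") in TRIPLES


def limit_on_triples(limit):
    """WHERE { ?s a dbo:Film . ?s ?p ?o } LIMIT n: the limit counts triples."""
    solutions = [t for t in TRIPLES if is_film(t[0])]
    return solutions[:limit]


def limit_on_subjects(limit):
    """{ SELECT DISTINCT ?s WHERE { ?s a dbo:Film } LIMIT n } ?s ?p ?o:
    the limit counts films, then every triple of those films is returned."""
    films = []
    for s, _, _ in TRIPLES:
        if is_film(s) and s not in films:
            films.append(s)
    chosen = films[:limit]
    return [t for t in TRIPLES if t[0] in chosen]


print(sorted({s for s, _, _ in limit_on_triples(5)}))   # only one film
print(sorted({s for s, _, _ in limit_on_subjects(5)}))  # five films
```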
I think you want this query
construct {?s a <http://dbpedia.org/ontology/Film> .}
where { ?s a <http://dbpedia.org/ontology/Film>. }
limit 5

SPARQL: how to group this data correctly

First of all, I didn't make a minimal example because I think that my problem can be understood without it.
Second, I didn't give you the data because I think that my problem can be solved without it. However, I'm open to giving it to you if you ask.
This is my query:
select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed that being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items.
#I assumed that having the same instance for important predicates accounts for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
This is the result:
As you can see, the result contains many rows for the same suggestedItem. I want to group by suggestedItem and sum the values of finalSimilarity.
I tried this:
select ?item (SUM(?similarity * ?importance * ?levelImportance ) as ?finalSimilarity) (group_concat(distinct ?x) as ?likedItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason) where
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed that being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items.
#I assumed that having the same instance for important predicates accounts for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
group by ?item
order by desc(?finalSimilarity)
but the result is:
There is something wrong in my approach, because if you look at the finalSimilarity in that result, the value is 1.7. However, if you sum the values manually from the first query, you get 0.62, so I did something wrong.
Could you help me find it?
Please note that the two queries are the same; only the select clauses differ.
Hint
I am already able to solve it using two selects like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rs: <http://www.SemanticRecommender.com/rs#>
PREFIX bo: <http://www.BookOntology.com/bo#>
PREFIX :<http://www.SemanticBookOntology.com/sbo#>
select ?suggestedItem ( SUM (?finalSimilarity) as ?summedFinalSimilarity) (group_concat(distinct strafter(str(?likedItem), "#")) as ?becauseYouHaveLikedThisItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason)
where {
select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
where
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed that being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items.
#I assumed that having the same instance for important predicates accounts for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
}
group by ?suggestedItem
order by desc(?summedFinalSimilarity)
But to me that is a stupid solution, and there must be a cleverer one where I can get the aggregated data using a single select.
Without seeing your data, it's impossible to say, and with a query this big, it's probably not worth trying to debug the exact problem, but it's easy for this to happen if you can have duplicates (which would be easy to get, especially if you're using unions where some condition could match both parts). For instance, suppose you have data like this:
@prefix : <urn:ex:> .
:x :similar [ :sim 0.10 ; :mult 2 ] ,
[ :sim 0.12 ; :mult 1 ] ,
[ :sim 0.12 ; :mult 1 ] , # yup, a duplicate
[ :sim 0.15 ; :mult 4 ] .
Then if you run this query, you'll get four result rows:
prefix : <urn:ex:>
select ?sim ((?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
----------------
| sim | final |
================
| 0.15 | 0.60 |
| 0.12 | 0.12 |
| 0.12 | 0.12 |
| 0.10 | 0.20 |
----------------
However, if you select distinct, you'll only see three:
select distinct ?sim ((?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
----------------
| sim | final |
================
| 0.15 | 0.60 |
| 0.12 | 0.12 |
| 0.10 | 0.20 |
----------------
Once you start to group by and sum, those non-distinct values will both get included:
select (sum(?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
---------
| final |
=========
| 1.04 |
---------
That sum is the sum of all four terms, not the three distinct ones. Even if the data doesn't have the duplicate values, the union can introduce the duplicate results:
@prefix : <urn:ex:> .
:x :similar [ :sim 0.10 ; :mult 2 ] ,
[ :sim 0.12 ; :mult 1 ] ,
[ :sim 0.15 ; :mult 4 ] .
prefix : <urn:ex:>
select (sum(?sim * ?mult) as ?final) {
{ :x :similar [ :sim ?sim ; :mult ?mult ] }
union
{ :x :similar [ :sim ?sim ; :mult ?mult ] }
}
---------
| final |
=========
| 1.84 |
---------
Since you found the need to use group_concat(distinct …), I wouldn't be surprised if there are duplicates of that nature.
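The three totals above can be reproduced with plain arithmetic. This Python sketch treats each solution row as a (sim, mult) pair and emulates, in turn, summing every row, projecting distinct rows first (as the nested select in the question does), and a union whose two branches match the same solutions. It is an illustration of the counting, not an engine.

```python
# Each row is (sim, mult), as produced by :x :similar [ :sim ?sim ; :mult ?mult ].
ROWS_WITH_DUPLICATE = [(0.10, 2), (0.12, 1), (0.12, 1), (0.15, 4)]
ROWS = [(0.10, 2), (0.12, 1), (0.15, 4)]


def total(rows):
    """sum(?sim * ?mult) over every solution, duplicates included."""
    return sum(sim * mult for sim, mult in rows)


def total_of_distinct(rows):
    """Project DISTINCT rows first (the nested select), then sum."""
    return sum(sim * mult for sim, mult in set(rows))


def union_of_same_pattern(rows):
    """{ P } UNION { P }: every solution of P appears once per branch."""
    return rows + rows


print(round(total(ROWS_WITH_DUPLICATE), 2))              # 1.04
print(round(total_of_distinct(ROWS_WITH_DUPLICATE), 2))  # 0.92
print(round(total(union_of_same_pattern(ROWS)), 2))      # 1.84
```

Note that DISTINCT compares only the projected values, so it also merges rows that came from genuinely different resources (here, two different blank nodes) whenever their projected values happen to be equal.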