sparql how to group correctly this data - sparql

First of all, I didn't make a minimum example because i think that my problem can be understood without it.
Second, I didn't give you the data because i think that my problem can be solved without it. However, I'm open to give it to you if you ask.
This is my query:
select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed the being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items..
#I assumed that having the same instance for important predicates account for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
This is the result:
As you see, the result contains many values for the same suggestedItem, I want to make group according to the suggestedItem and sum the values of finalSimilarity
I tried this:
select ?item (SUM(?similarity * ?importance * ?levelImportance ) as ?finalSimilarity) (group_concat(distinct ?x) as ?likedItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason) where
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed the being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items..
#I assumed that having the same instance for important predicates account for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
group by ?item
order by desc(?finalSimilarity)
but the result is:
this is something wrong in my way because if you look at the finalSimilarity in, the value is 1.7. However, if you sum that manually from the first query, you get 0.62 so I did something wrong,
could you help me discover it?
Please note that the two queries are the same, it is just the select statment are different
Hint
I am already able to solve it using two selects like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rs: <http://www.SemanticRecommender.com/rs#>
PREFIX bo: <http://www.BookOntology.com/bo#>
PREFIX :<http://www.SemanticBookOntology.com/sbo#>
select ?suggestedItem ( SUM (?finalSimilarity) as ?summedFinalSimilarity) (group_concat(distinct strafter(str(?likedItem), "#")) as ?becauseYouHaveLikedThisItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason)
where {
select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
where
{
values ?user {bo:ania}
#the variable ?x is bound to the items the user :ania has liked.
?user rs:hasRated ?ratings.
?ratings a rs:Likes.
?ratings rs:aboutItem ?x.
?ratings rs:ratesBy ?ratingValue.
#level 0 class similarities
{
#extract all the items that are from the same class (type) as the liked items.
#I assumed the being from the same class accounts for 50% of the similarities.
#This value can be changed according to the test or the application domain.
values ?classImportance {0.5} #class level
bind (?classImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
?classSimilarity rs:appliedOnClass ?class .
?classSimilarity rs:hasClassSimilarityValue ?similarity .
?item a ?class.
bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
}
union
#level 0 instance similarities
{
#extract the items that share the same value for important predicates with the already liked items..
#I assumed that having the same instance for important predicates account for 100% of the similarities.
#This value can be changed according to the test or the application domain.
values ?instanceImportance {1} #instance level
bind (?instanceImportance as ?importance)
bind( 4/7 as ?levelImportance)
?x a ?class.
?class rdfs:subClassOf ?mainClass .
?mainClass rdfs:subClassOf rs:RecommendableClass .
?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
?propertySimilarity rs:appliedOnProperty ?property .
?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
?x ?property ?value .
?item ?property ?value .
bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
}
filter (?x != ?item)
}
}
group by ?suggestedItem
order by desc(?summedFinalSimilarity)
but to me that is a stupid solution and there must be a more clever one where i can take the aggregated data using one select

Without seeing your data, it's impossible to say, and with a query this big, it's probably not worth trying to debug the exact problem, but it's easy for this to happen if you can have duplicates (which would be easy to get, especially if you're using unions where some condition could match both parts). For instance, suppose you have data like this:
#prefix : <urn:ex:>
:x :similar [ :sim 0.10 ; :mult 2 ] ,
[ :sim 0.12 ; :mult 1 ] ,
[ :sim 0.12 ; :mult 1 ] , # yup, a duplicate
[ :sim 0.15 ; :mult 4 ] .
Then if you run this query, you'll get four result rows:
prefix : <urn:ex:>
select ?sim ((?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
----------------
| sim | final |
================
| 0.15 | 0.60 |
| 0.12 | 0.12 |
| 0.12 | 0.12 |
| 0.10 | 0.20 |
----------------
However, if you select distinct, you'll only see three:
select distinct ?sim ((?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
----------------
| sim | final |
================
| 0.15 | 0.60 |
| 0.12 | 0.12 |
| 0.10 | 0.20 |
----------------
Once you start to group by and sum, those non-distinct values will both get included:
select (sum(?sim * ?mult) as ?final) {
:x :similar [ :sim ?sim ; :mult ?mult ] .
}
---------
| final |
=========
| 1.04 |
---------
That sum is the sum of all four terms, not the three distinct ones. Even if the data doesn't have the duplicate values, the union can introduce the duplicate results:
#prefix : <urn:ex:>
:x :similar [ :sim 0.10 ; :mult 2 ] ,
[ :sim 0.12 ; :mult 1 ] ,
[ :sim 0.15 ; :mult 4 ] .
prefix : <urn:ex:>
select (sum(?sim * ?mult) as ?final) {
{ :x :similar [ :sim ?sim ; :mult ?mult ] }
union
{ :x :similar [ :sim ?sim ; :mult ?mult ] }
}
---------
| final |
=========
| 1.84 |
---------
Since you found the need to use group_concat(distinct …), I wouldn't be surprised if there are duplicates of that nature.

Related

Make a UNION return a single result in SPARQL

I am trying to query for the sh:description of a property shape, and if there is none, I want to follow the path to a property and get the rdfs:comment (Also for sh:name and rdfs:label). My data looks like the following:
ex:File
a owl:Class ;
a sh:NodeShape ;
sh:property ex:File-name ;
ex:File-name
a sh:PropertyShape ;
sh:path ex:filename ;
sh:datatype xsd:string ;
sh:description "The name of the file" ;
sh:maxCount 1 ;
ex:filename
a rdf:Property ;
rdfs:label "Filename" ;
rdfs:comment "The file name" ;
.
I have the query below, but it returns two results. How can I modify it to only return a single result (preferring sh:description over rdfs:comment)?
PREFIX : <http://www.example.org/#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?propertyShape ?property ?name ?description where {
:File sh:property ?propertyShape .
?propertyShape sh:path ?property .
{ ?propertyShape sh:name ?name } UNION { ?property rdfs:label ?name } .
{ ?propertyShape sh:description ?description } UNION { ?property rdfs:comment ?description } .
The above returns something like this:
propertyShape
property
name
description
:File-name
ex:filename
"Filename"
"The name of the file"
:File-name
ex:filename
"Filename"
"The file name"
I would like for it to return something like:
propertyShape
property
name
description
:File-name
ex:filename
"Filename"
"The name of the file"
You should be able to use a sequence of OPTIONAL blocks to do this:
SELECT ?propertyShape ?property ?name ?description WHERE {
:File sh:property ?propertyShape .
?propertyShape sh:path ?property .
OPTIONAL { ?propertyShape sh:name ?name }
OPTIONAL { ?property rdfs:label ?name }
OPTIONAL { ?propertyShape sh:description ?description }
OPTIONAL { ?property rdfs:comment ?description }
}
You'll still have to think about cases where there are multiple names/descriptions per property (this isn't guaranteed to return just a single result), but it should give you the "preferring sh:description over rdfs:comment" behavior.
Can't you just use path expressions such as
...
?propertyShape sh:name|(sh:path/rdfs:label) ?name .
?propertyShape sh:description|(sh:path/rdfs:comment) ?description .

SPARQL Query: How to get all individual and data property assertions?

Same as you know if we retrieve the object property or data property and subclass can do that by joining them in one variable :
Query 1
SELECT ?x ?y
WHERE { ?x rdfs:subClassOf ?y.
?x rdf:type owl:ObjectProperty.
}
the x var it's same object property and subclass of other class
im need to join (all individual "NamedIndividual")
with object property or subclass .
the problem that is ( ?x rdf:type owl:NamedIndividual . )
couldn't use "?x" in any other location same as :
?x rdfs:subClassOf ?y.
Query 2
SELECT ?x ?y
WHERE { ?x rdfs:subClassOf ?y.
?x rdf:type owl:NamedIndividual .
}
Query 3
SELECT ?x ?y
WHERE { ?x rdf:type owl:ObjectProperty.
?x rdf:type owl:NamedIndividual .
}
So: Query 2 and Query 3 cannot be implemented.
how I can solve this problem?

SPARQL query for individuals with same properties

I'd like to identify all individuals, that have the same properties as some other individuals. Imagine having different shopping lists with different items on them one can either buy or rent (object properties). To make things more complex, I also want to pay an exact amount for the parking and travel a certain distance (data properties). At the same time, there exist different stores offering different items.
Being a lazy person, I would like to identify the stores for each list, that offer all the items on the list.
I believe this is a generalization of this question but I somehow cannot wrap my head around how to do this.
I created some sample individuals:
#prefix : <http://www.shopping.org/model#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix xml: <http://www.w3.org/XML/1998/namespace> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#base <http://www.shopping.org/model> .
<http://www.shopping.org/model> rdf:type owl:Ontology .
# Object Properties
:buy rdf:type owl:ObjectProperty .
:rent rdf:type owl:ObjectProperty .
# Data properties
:distance rdf:type owl:DatatypeProperty .
:parking rdf:type owl:DatatypeProperty .
# Classes
:Product rdf:type owl:Class .
:ShoppingList rdf:type owl:Class .
:Store rdf:type owl:Class .
# Individuals
:Apples rdf:type owl:NamedIndividual ,
:Product .
:Cereal rdf:type owl:NamedIndividual ,
:Product .
:List1 rdf:type owl:NamedIndividual ,
:ShoppingList ;
:buy :Apples ,
:Milk ;
:distance "9.0"^^xsd:float ;
:parking "10.0"^^xsd:float .
:List2 rdf:type owl:NamedIndividual ,
:ShoppingList ;
:buy :Cereal ,
:Milk ;
:rent :TV ;
:distance "5.0"^^xsd:float ;
:parking "10.0"^^xsd:float .
:Milk rdf:type owl:NamedIndividual ,
:Product .
:Store1 rdf:type owl:NamedIndividual ,
:Store ;
:buy :Apples ,
:Cereal ,
:Milk ,
:TV ;
:distance "9.0"^^xsd:float ;
:parking "10.0"^^xsd:float .
:Store2 rdf:type owl:NamedIndividual ,
:Store ;
:buy :Cereal ,
:Milk ;
:rent :TV ;
:distance "5.0"^^xsd:float ;
:parking "10.0"^^xsd:float .
:TV rdf:type owl:NamedIndividual ,
:Product .
# General axioms
[ rdf:type owl:AllDisjointClasses ;
owl:members ( :Product
:ShoppingList
:Store
)
] .
and also tried some first queries. This is the best I have:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://www.shopping.org/model#>
SELECT DISTINCT ?list ?store ?prop
WHERE {
?list a :ShoppingList .
?store a :Store .
?list ?prop [] .
FILTER NOT EXISTS {
?list ?prop ?value .
FILTER NOT EXISTS {
?store ?prop ?value .
}
}
}
ORDER BY ?list ?store
However, this query returns all combinations of stores and lists, since every store charges 10.0f for parking.
How can I filter only those stores, which meet all the requirements of a list?
An additional assumption is, that there may exist other business models than buy and rent, as well as other criteria, i.e. data properties, which are of interest, too. This is why I do not want to specify these properties, but want to use variables instead.
This query did what I intended to do:
PREFIXES [as above]
PREFIX : <http://www.shopping.org/model#>
SELECT DISTINCT ?list ?store
WHERE {
?list a :ShoppingList .
?store a :Store .
FILTER NOT EXISTS {
?compat a owl:DatatypeProperty
FILTER NOT EXISTS {
?list ?compat ?value .
?store ?compat ?value .
}
}
FILTER NOT EXISTS {
?subset a owl:ObjectProperty .
?list ?subset ?value .
FILTER NOT EXISTS {
?store ?subset ?value .
}
}
}
ORDER BY ?list ?store
My mistake was actually just that I defined the ?prop outside of the filters. This resulted in them being part of the solution, i.e. I was was unable to remove whole stores through the filter(s).
If you're willing to specific about the particular property that the things need to have in common, you can do something like this. I've cleaned up your data, because it didn't actually have the necessary prefix declarations. Please include complete data if you expect people to be able to work with what you're providing. I didn't add correct prefixes, but just enough to get things working.
Sample Data
#prefix : <urn:ex:> .
#prefix owl: <file:///home/taylorj/tmp/data.ttl> .
#prefix xsd: <file:///home/taylorj/tmp/data.ttl> .
:Store1 a owl:NamedIndividual , :Store ;
:buy :Apples , :Cereal , :Milk , :TV ;
:distance "9.0"^^owl:float ;
:parking "10.0"^^owl:float .
:List2 a owl:NamedIndividual , :ShoppingList ;
:buy :Cereal , :Milk ;
:distance "5.0"^^owl:float ;
:parking "10.0"^^owl:float ;
:rent :TV .
:List1 a owl:NamedIndividual , :ShoppingList ;
:buy :Apples , :Milk ;
:distance "9.0"^^owl:float ;
:parking "10.0"^^owl:float .
:Store2 a owl:NamedIndividual , :Store ;
:buy :Cereal , :Milk ;
:distance "5.0"^^owl:float ;
:parking "10.0"^^owl:float ;
:rent :TV .
Specific Solution
First, we can address the specific case where we need one individual to have a subset of the property values of another for one property (buy), and to have intersecting values for other properties (distance, parking).
Query
prefix : <urn:ex:>
select ?list ?store {
#-- For each list and store
?list a :ShoppingList .
?store a :Store .
#-- Check that they have the same distance and parking.
?distance ^:distance ?list, ?store .
?parking ^:parking ?list, ?store .
#-- Check that every item to buy (or rent) on the list is also
#-- in the store to buy (or rent).
filter not exists {
values ?get { :buy :rent }
?list ?get ?item .
filter not exists {
?store ?get ?item
}
}
}
Results
--------------------
| list | store |
====================
| :List1 | :Store1 |
| :List2 | :Store2 |
--------------------
General Approach
Often times it's helpful to think in terms of eliminating the results that you don't want. In this case, you have two kinds of properties: (i) properties where the individuals must have a compatible value, and (ii) properties where one individual must have a superset of the values of the other individual. Checking if either of those conditions doesn't hold is not too hard:
prefix : <urn:ex:>
select ?list ?store {
#-- For each list and store
?list a :ShoppingList .
?store a :Store .
#-- Check that for each "compatible property" ?compat,
#-- there must be some value that the ?list and the
#-- ?store have in common.
filter not exists {
values ?compat { :distance :parking }
filter not exists {
?list ?compat ?value .
?store ?compat ?value .
}
}
#-- Check that for each "subset property" ?subset,
#-- there is no value that the ?list contains that
#-- the store does not.
filter not exists {
values ?subset { :buy :rent }
?list ?subset ?value
filter not exists {
?store ?subset ?value
}
}
}
Results (the same)
--------------------
| list | store |
====================
| :List1 | :Store1 |
| :List2 | :Store2 |
--------------------

SPARQL - Query based on some property

I want to get all the pizza names which has cheese toppings but the result shows (_:b0) which is kind of an owl restriction following is my query
PREFIX pizza: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
SELECT ?X WHERE {
?X rdfs:subClassOf* [
owl:onProperty pizza:hasTopping ;
owl:someValuesFrom pizza:CheeseTopping
]
}
using Pizza ontology from stanford
This works (Without reasoning enabled)
PREFIX pizza: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
SELECT ?X ?topping WHERE {
?X rdfs:subClassOf ?Y .
?Y owl:someValuesFrom ?topping .
?topping rdfs:subClassOf* pizza:CheeseTopping
}
ORDER BY ?X
Some are listed more than once as they could contain more than one CheeseTopping. To remove duplicates:
PREFIX pizza: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
SELECT DISTINCT ?X WHERE {
?X rdfs:subClassOf ?Y .
?Y owl:someValuesFrom ?topping .
?topping rdfs:subClassOf* pizza:CheeseTopping
}
ORDER BY ?X
This works if you enable a reasoner:
PREFIX pizza: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
SELECT DISTINCT ?X WHERE {
?X rdfs:subClassOf pizza:CheeseyPizza
}
Ref:
Used the pizza ontology from here: http://protege.stanford.edu/ontologies/pizza/pizza.owl
That query works but is really complex and might be incomplete because some pizzas use complex OWL constructs:
PREFIX pizza: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
SELECT DISTINCT ?pizza WHERE {
{
?pizza rdfs:subClassOf* pizza:Pizza .
?pizza owl:equivalentClass|rdfs:subClassOf [
rdf:type owl:Restriction ;
owl:onProperty pizza:hasTopping ;
owl:someValuesFrom/rdfs:subClassOf* pizza:CheeseTopping
]
} UNION {
?pizza owl:equivalentClass _:b0 .
_:b0 rdf:type owl:Class ;
owl:intersectionOf _:b1 .
_:b1 (rdf:rest)*/rdf:first ?otherClass.
?otherClass rdf:type owl:Restriction ;
owl:onProperty pizza:hasTopping ;
owl:someValuesFrom/rdfs:subClassOf* pizza:CheeseTopping
}
}

SPARQL full aggregation on a group aggregation

I have an Ontology where users can use one of five predicates to express how much they like an item.
The Ontology contains specific predicates that have a property called hasSimilarityValue.
I am trying to do the following:
Having a user let's say rs:ania
Extract all the items that this user has rated before. (this is easy because the Ontology already contains triple from the user to the items)
Extract similary items to the items that have been extracted in step 2 and calculate their similarities. (here we are using our own approach to calculate the similarites ). However the issue is: from step 2, we have many items the user has rated, from step there we are extracting and calculating similar items to these items that came from step 2. So, it is possible that an item in step 3 is similar to two (or more) items from step 2. Thus we end up with the following:
user :ania rated item x1
user :ania rated item x2
item y is similar by y1 to x1
item y is similar by y2 to x2
item z is similar by z1 to x1
y1, y2, and z1 are values between 0 and 1
the thing is that we need to normalize these values to know the final similarities for item y and item z.
the normalization is simple, just group by the item and divide by the maximum number of items
so to know the similarity with y, i should do (y1+y2/2)
to know the similarity with z, i should do (z1/2)
my problem
as you see, i need to count the items and then know the max of this count
this is the query that calculates everything without the normalization part
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
order by ?s
the result is:
now I need to divide each row by the maximum of the count column,
my proposed solution is to repeat the exact query twice, once to get the similarities and once to get the max and then join them and then do the divide (normalization). it is working but it is ugly, the performance will be disaster because i am repeating the same query twice. it is stupid solution and i would like to ask you guys for a better one please
here is my stupid solutions
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rs: <http://www.musicontology.com/rs#>
PREFIX pdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#select
#?s ?similarityWithRating (max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxNumberOfItemsUsedInDeterminingTheSimilarities)
#where {
# {
select ?s ?similarity ?similarityWithRating ?countOfItemsUsedInDeterminingTheSimilarities ?maxCountOfItemsUsedInDeterminingTheSimilarities ?finalSimilarity where {
{
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
#}
#}
#group by ?s
order by ?s
} #end first part
{
select (Max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxCountOfItemsUsedInDeterminingTheSimilarities) where {
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
#}
#}
#group by ?s
order by ?s
}
}#end second part
bind (?similarityWithRating/?maxCountOfItemsUsedInDeterminingTheSimilarities as ?finalSimilarity)
}
order by desc(?finalSimilarity)
Finally
Here is the data if you want to try it yourself.
http://www.mediafire.com/view/r4qlu3uxijs4y30/musicontology
It's really helpful if you can provide data to work with in these examples that's minimal. That means data that doesn't have stuff we don't need in order to solve the problem, and that is pretty much as simple as possible. I think that How to create a Minimal, Complete, and Verifiable example might be useful for your Stack Overflow questions.
Anyhow, here's some simple data that should be enough for us to work with. There are two users who have made some ratings, and some similarities in the data. Note that I made the similarities directed; you'd probably want them to be bidirectional, but that's not really the main part of this problem.
#prefix : <urn:ex:>
:user1 :rated :a , :b .
:user2 :rated :b , :c , :d .
:a :similarTo [ :piece :c ; :value 0.1 ] ,
[ :piece :d ; :value 0.2 ] .
:b :similarTo [ :piece :d ; :value 0.3 ] ,
[ :piece :e ; :value 0.4 ] .
:c :similarTo [ :piece :e ; :value 0.5 ] ,
[ :piece :f ; :value 0.6 ] .
:d :similarTo [ :piece :f ; :value 0.7 ] ,
[ :piece :g ; :value 0.8 ] .
Now, the query just needs to retrieve a user and the pieces that they've rated, along with similar pieces and the actual similarity values. Now, if you group by the user and the similar piece, you end up with a groups that have a single similar piece, a single user, and a bunch of rated pieces and their similarity to the similar piece. Since all the similarity ratings are in a fixed range (0,1), you can just average them to get overall similarity. In this query, I've also added a group_concat to show which rated pieces the similarity value is based on.
prefix : <urn:ex:>
select
?user
(group_concat(?piece) as ?ratedPieces)
?similarPiece
(avg(?similarity_) as ?similarity)
where {
#-- Find ?pieces that ?user has rated.
?user :rated ?piece .
#-- Find other pieces (?similarPiece) that are
#-- similar to ?piece, along with the
#-- similarity value (?similarity_)
?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece
------------------------------------------------------------
| user | ratedPieces | similarPiece | similarity |
============================================================
| :user1 | "urn:ex:a" | :c | 0.1 | ; a-c[0.1]
| :user1 | "urn:ex:b urn:ex:a" | :d | 0.25 | ; b-d[0.3], a-d[0.2]
| :user1 | "urn:ex:b" | :e | 0.4 | ; b-e[0.4]
| :user2 | "urn:ex:b" | :d | 0.3 | ; b-d[0.3]
| :user2 | "urn:ex:c urn:ex:b" | :e | 0.45 | ; c-e[0.5], b-e[0.4]
| :user2 | "urn:ex:d urn:ex:c" | :f | 0.65 | ; d-f[0.7], c-f[0.6]
| :user2 | "urn:ex:d" | :g | 0.8 | ; d-g[0.8]
------------------------------------------------------------