sparql "diff" (minus with some unshared variables) - sparql

This is a follow-up to my question "Compare models for identity, but with variables? Construct with minus?". Either I forgot what I learned then, or I didn't learn as much as I had thought.
I have triples like this:
prefix : <http://example.com/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
:rose :color :red .
:violet :color :blue .
:rose a :flower .
:flower rdfs:subClassOf :plant .
:dogs :love :trucks .
I want to discover any triples in my triplestore that don't satisfy at least one of these rules:
take :rose as the subject
take :rose's parent class as the subject
use any of the predicates that are used in any triple with :rose as the subject
take, as their object, the object of any triple with :rose as the subject.
Thus, in this case, the exception-discovery query (either select or construct) should only return
:dogs :love :trucks .
This query shows what should be in the triplestore:
PREFIX : <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct where {
:rose ?p ?o .
:rose a ?c .
?c rdfs:subClassOf ?super .
?s ?p ?x1 .
?x2 ?x3 ?o
}
+---+---------+-----------------+---------+
| | subject | predicate | object |
+---+---------+-----------------+---------+
| A | :flower | rdfs:subClassOf | :plant |
| B | :rose | :color | :red |
| C | :rose | rdf:type | :flower |
| D | :violet | :color | :blue |
+---+---------+-----------------+---------+
Is there a way to subtract that pattern out of everything in the triplestore, { ?s ?p ?o }, even though I'm using variable names other than ?s, ?p and ?o in the construct statement?
I have seen this post with strategies for comparing RDF, but I'd like to do it with standard SPARQL.
To tie it together with my earlier post, this final query erroneously suggests that several desired triples violate the rules set out at the top of the message.
+---+---------+-----------------+---------+
| E | :dogs   | :love           | :trucks |
| A | :flower | rdfs:subClassOf | :plant  |
| D | :violet | :color          | :blue   |
+---+---------+-----------------+---------+
Triple E is indeed undesired. But A is desired because it has :rose's class as its subject (rule 2), and triple D is desired because its predicate is also used in some triples with :rose as the subject (rule 3).
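PREFIX : <http://example.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>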
CONSTRUCT
{
?s ?p ?o .
}
WHERE
{ SELECT ?s ?p ?o
WHERE
{ { ?s ?p ?o }
MINUS
{ :rose ?p ?o ;
rdf:type ?c .
?c rdfs:subClassOf ?super .
?s ?p ?x1 .
?x2 ?x3 ?o
}
}
}

If I understand this correctly, you want to allow four types of triples in your data. If a triple (s,p,o) is in your data, it should satisfy at least one of the following criteria:
s = rose (About rose)
p = rdfs:subClassOf, and data contains (rose,a,s) (About a type)
The data also contains (rose,p,x) (Shared predicate)
The data also contains (rose,q,o) (Shared object)
It's easy enough to write a pattern for each one of those. You just need to find each triple (s,p,o) and filter out the ones that match none of those criteria. I think you can do it like this:
prefix : <http://example.com/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s ?p ?o {
?s ?p ?o
filter not exists {
{ values ?s { :rose } } #-- (1)
union { values ?p { rdfs:subClassOf } :rose a ?s } #-- (2)
union { :rose ?p ?x } #-- (3)
union { :rose ?x ?o } #-- (4)
}
}
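To check the logic of the four criteria independently of any SPARQL engine, here is a minimal Python sketch that models the sample triples as plain tuples. The names `allowed`, `data`, and `violations` are made up for illustration, and the prefixed names are treated as opaque strings. It is only a model of the filter above, not a replacement for it.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# The sample data from the question, one tuple per triple.
data = {
    (":rose", ":color", ":red"),
    (":violet", ":color", ":blue"),
    (":rose", RDF_TYPE, ":flower"),
    (":flower", SUBCLASS_OF, ":plant"),
    (":dogs", ":love", ":trucks"),
}


def allowed(triple, data, focus=":rose"):
    """Return True if the triple satisfies at least one of the four criteria."""
    s, p, o = triple
    about_focus = s == focus                                          # (1)
    about_type = p == SUBCLASS_OF and (focus, RDF_TYPE, s) in data    # (2)
    shared_predicate = any(fs == focus and fp == p for fs, fp, _ in data)  # (3)
    shared_object = any(fs == focus and fo == o for fs, _, fo in data)     # (4)
    return about_focus or about_type or shared_predicate or shared_object


# Same idea as FILTER NOT EXISTS { (1) UNION (2) UNION (3) UNION (4) }.
violations = sorted(t for t in data if not allowed(t, data))
print(violations)
```

On the sample data, the only triple left over should be the one about :dogs.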

Related

Return values under same column in SPARQL query

Given three possible predicates for triples, foaf:name, foaf:givenName, and foaf:familyName, where each subject has either foaf:name or foaf:givenName + foaf:familyName, e.g.:
<uri1> <foaf:name> "Lolly Loozles" .
<uri2> <foaf:givenName> "Stotly" .
<uri2> <foaf:familyName> "Styles" .
I am wondering how to write a SPARQL query that returns a new variable, such as pretty_name, that is either the value of foaf:name or a concatenation of the values of foaf:givenName and foaf:familyName.
Resulting in something like:
?o | ?pretty_name
----------------------
<uri1> | Lolly Loozles
<uri2> | Stotly Styles
This is what I have so far, but unsure how to proceed:
PREFIX : <https://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# select two variables, not ideal...
SELECT ?foaf_fullName ?pretty_name
WHERE {
# Find all triples
?s ?p ?o .
# Binds
OPTIONAL { ?s foaf:name ?foaf_fullName }
OPTIONAL { ?s foaf:givenName ?givenName }
OPTIONAL { ?s foaf:familyName ?familyName }
# Filter where predicate is part of list
FILTER (?p IN (foaf:name, foaf:givenName, foaf:familyName ) )
# Binds
BIND( CONCAT(?givenName, ' ', ?familyName) AS ?pretty_name ) .
}
I had imagined, and tried, adding another BIND to add to ?pretty_name, but the SPARQL engine wouldn't have it:
BIND( ?foaf_fullName AS ?pretty_name ) .
I also had luck writing a CONSTRUCT statement to get the values I'm looking for, but don't have the ability to write back to this triplestore (for a number of reasons):
CONSTRUCT {
?s :hasPrettyName ?foaf_fullName .
?s :hasPrettyName ?pretty_name .
}
I had thought that CONSTRUCT could accompany SELECT, but must have been mistaken?
Any insight or suggestions would be much appreciated.
Using @StanislavKralin's comment/suggestion to use COALESCE without IF clauses works great:
PREFIX : <https://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# select two variables, not ideal...
SELECT ?foaf_fullName ?pretty_name
WHERE {
# Find all triples
?s ?p ?o .
# Binds
OPTIONAL { ?s foaf:name ?foaf_fullName }
OPTIONAL { ?s foaf:givenName ?givenName }
OPTIONAL { ?s foaf:familyName ?familyName }
# Filter where predicate is part of list
FILTER (?p IN (foaf:name, foaf:givenName, foaf:familyName ) )
# Binds
BIND( COALESCE(?foaf_fullName, CONCAT(?givenName, ' ', ?familyName)) AS ?pretty_name )
}
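For anyone wondering why COALESCE is enough here: it returns the first argument that evaluates without an error and is bound, and CONCAT raises an error when one of its arguments is unbound. The Python sketch below imitates that behaviour under simplifying assumptions: unbound variables are modelled as `None`, SPARQL errors as `ValueError`, and the helper names (`concat`, `coalesce`, `pretty_name`) are made up for illustration.

```python
def concat(*parts):
    """Model of CONCAT: an unbound (None) argument is an error."""
    if any(part is None for part in parts):
        raise ValueError("unbound argument")
    return "".join(parts)


def coalesce(*expressions):
    """Model of COALESCE: first expression that evaluates without error and is bound."""
    for expression in expressions:
        try:
            value = expression()
        except ValueError:
            continue  # an erroring expression is skipped, not fatal
        if value is not None:
            return value
    return None


def pretty_name(row):
    # Mirrors COALESCE(?foaf_fullName, CONCAT(?givenName, ' ', ?familyName)).
    return coalesce(
        lambda: row.get("name"),
        lambda: concat(row.get("givenName"), " ", row.get("familyName")),
    )


# One dict per subject, holding whatever the OPTIONAL clauses bound.
rows = {
    "uri1": {"name": "Lolly Loozles"},
    "uri2": {"givenName": "Stotly", "familyName": "Styles"},
}
names = {subject: pretty_name(row) for subject, row in rows.items()}
print(names)
```

This also shows why a second BIND was not needed: the fallback is expressed inside a single expression, so ?pretty_name is only assigned once.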

SPARQL: extract distinct values from dbpedia

I want to extract graphs for 5 individuals that are films (movies) from DBpedia.
My query is:
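// The class IRI needs angle brackets, and the whole query must stay inside the string literals.
ParameterizedSparqlString qs = new ParameterizedSparqlString(
    "construct { ?s ?p ?o } " +
    "where { ?s a <http://dbpedia.org/ontology/Film> . " +
    "?s ?p ?o } OFFSET 0 LIMIT 5");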
I get the following result:
1- http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Film .
2- http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Thing .
3- http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q386724 .
4- http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Wikidata:Q11424 .
5- http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Work .
Problem:
The same film is returned 5 times, because the classes Film, Thing, Q386724, Wikidata:Q11424, and Work are equivalent classes (or a subclass relation exists between them).
My question:
I want to return the following triple only once:
<http://dbpedia.org/resource/1001_Inventions_and_the_World_of_Ibn_Al-Haytham>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Film> .
and filter out the other 4 triples.
How can I do this, please?
Thank you in advance
I think the following should work for you:
CONSTRUCT {?s ?p ?o}
WHERE {
{ SELECT DISTINCT ?s
WHERE {
?s a <http://dbpedia.org/ontology/Film> .
} LIMIT 5
}
?s ?p ?o .
}
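The difference between limiting triples and limiting subjects can be illustrated with a small Python sketch. The data below is hypothetical (it is not taken from DBpedia), and the function names are made up. Note also that, without ORDER BY, SPARQL does not guarantee which rows a LIMIT keeps, so the list order here is only a stand-in.

```python
TYPE = "rdf:type"
FILM = "dbo:Film"

# Hypothetical triples: two films with several types each, plus one non-film.
triples = [
    ("film1", TYPE, "dbo:Film"),
    ("film1", TYPE, "owl:Thing"),
    ("film1", TYPE, "dbo:Work"),
    ("film2", TYPE, "dbo:Film"),
    ("film2", TYPE, "dbo:Work"),
    ("book1", TYPE, "dbo:Book"),
]


def limit_triples(triples, n):
    """Like the original query: LIMIT applies to the matched triples."""
    films = {s for s, p, o in triples if p == TYPE and o == FILM}
    return [t for t in triples if t[0] in films][:n]


def limit_subjects(triples, n):
    """Like the subquery: pick n distinct films first, then fetch their triples."""
    films = []
    for s, p, o in triples:
        if p == TYPE and o == FILM and s not in films:
            films.append(s)
    chosen = set(films[:n])
    return [t for t in triples if t[0] in chosen]


print({s for s, _, _ in limit_triples(triples, 2)})
print({s for s, _, _ in limit_subjects(triples, 2)})
```

With a limit of 2, the first function can return two triples about the same film, while the second returns every triple of two different films.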
I think you want this query
construct {?s a <http://dbpedia.org/ontology/Film> .}
where { ?s a <http://dbpedia.org/ontology/Film>. }
limit 5

SPARQL full aggregation on a group aggregation

I have an Ontology where users can use one of five predicates to express how much they like an item.
The Ontology contains specific predicates that have a property called hasSimilarityValue.
I am trying to do the following:
Given a user, let's say rs:ania
Extract all the items that this user has rated before. (This is easy because the Ontology already contains triples from the user to the items.)
Extract items similar to the items that were extracted in step 2 and calculate their similarities. (Here we are using our own approach to calculate the similarities.) However, the issue is: from step 2, we have many items the user has rated, and in step 3 we extract items similar to them and calculate the similarities. So it is possible that an item in step 3 is similar to two (or more) items from step 2. Thus we end up with the following:
user :ania rated item x1
user :ania rated item x2
item y is similar by y1 to x1
item y is similar by y2 to x2
item z is similar by z1 to x1
y1, y2, and z1 are values between 0 and 1
the thing is that we need to normalize these values to know the final similarities for item y and item z.
the normalization is simple, just group by the item and divide by the maximum number of items
so to know the similarity with y, i should do (y1+y2)/2
to know the similarity with z, i should do (z1/2)
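To make the arithmetic concrete, here is a small Python sketch of that normalization. The similarity values (0.4, 0.6, 0.5) are made-up placeholders for y1, y2 and z1, and the variable names are only for illustration.

```python
from collections import defaultdict

# (similar item, rated item, similarity); the numbers are hypothetical.
similarities = [
    ("y", "x1", 0.4),  # y1
    ("y", "x2", 0.6),  # y2
    ("z", "x1", 0.5),  # z1
]

totals = defaultdict(float)   # sum of similarities per similar item
sources = defaultdict(set)    # rated items that contributed to each similar item
for item, rated, value in similarities:
    totals[item] += value
    sources[item].add(rated)

# The maximum count over all groups, computed once from the grouped result.
max_count = max(len(rated_items) for rated_items in sources.values())

normalized = {item: total / max_count for item, total in totals.items()}
print(max_count, normalized)
```

Here y is based on two rated items and z on one, so the maximum count is 2 and both sums are divided by 2.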
my problem
as you see, i need to count the items and then know the max of this count
this is the query that calculates everything without the normalization part
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
order by ?s
the result is:
now I need to divide each row by the maximum of the count column,
my proposed solution is to repeat the exact same query twice, once to get the similarities and once to get the max, then join them and do the division (normalization). it is working but it is ugly, and the performance will be a disaster because i am repeating the same query twice. it is a stupid solution and i would like to ask you guys for a better one please
here is my stupid solution
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rs: <http://www.musicontology.com/rs#>
PREFIX pdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#select
#?s ?similarityWithRating (max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxNumberOfItemsUsedInDeterminingTheSimilarities)
#where {
# {
select ?s ?similarity ?similarityWithRating ?countOfItemsUsedInDeterminingTheSimilarities ?maxCountOfItemsUsedInDeterminingTheSimilarities ?finalSimilarity where {
{
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
#}
#}
#group by ?s
order by ?s
} #end first part
{
select (Max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxCountOfItemsUsedInDeterminingTheSimilarities) where {
select ?s (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {
values (?user) { (rs:ania) }
values (?ratingPredict) {(rs:ratedBy4Stars) (rs:ratedBy5Stars)}
?user ?ratingPredict ?x.
?ratingPredict rs:hasRatingValue ?ratings.
{
?s ?p ?o .
?x ?p ?o .
bind(4/7 as ?weight)
}
union
{
?s ?a ?b . ?b ?p ?o .
?x ?c ?d . ?d ?p ?o .
bind(1/7 as ?weight)
}
?p rs:hasSimilarityValue ?factor .
filter (?s != ?x)
}
group by ?s
#}
#}
#group by ?s
order by ?s
}
}#end second part
bind (?similarityWithRating/?maxCountOfItemsUsedInDeterminingTheSimilarities as ?finalSimilarity)
}
order by desc(?finalSimilarity)
Finally
Here is the data if you want to try it yourself.
http://www.mediafire.com/view/r4qlu3uxijs4y30/musicontology
It's really helpful if you can provide data to work with in these examples that's minimal. That means data that doesn't have stuff we don't need in order to solve the problem, and that is pretty much as simple as possible. I think that How to create a Minimal, Complete, and Verifiable example might be useful for your Stack Overflow questions.
Anyhow, here's some simple data that should be enough for us to work with. There are two users who have made some ratings, and some similarities in the data. Note that I made the similarities directed; you'd probably want them to be bidirectional, but that's not really the main part of this problem.
@prefix : <urn:ex:> .
:user1 :rated :a , :b .
:user2 :rated :b , :c , :d .
:a :similarTo [ :piece :c ; :value 0.1 ] ,
[ :piece :d ; :value 0.2 ] .
:b :similarTo [ :piece :d ; :value 0.3 ] ,
[ :piece :e ; :value 0.4 ] .
:c :similarTo [ :piece :e ; :value 0.5 ] ,
[ :piece :f ; :value 0.6 ] .
:d :similarTo [ :piece :f ; :value 0.7 ] ,
[ :piece :g ; :value 0.8 ] .
Now, the query just needs to retrieve a user and the pieces that they've rated, along with similar pieces and the actual similarity values. If you group by the user and the similar piece, you end up with groups that each have a single similar piece, a single user, and a bunch of rated pieces with their similarity to the similar piece. Since all the similarity ratings are in a fixed range (0,1), you can just average them to get the overall similarity. In this query, I've also added a group_concat to show which rated pieces the similarity value is based on.
prefix : <urn:ex:>
select
?user
(group_concat(?piece) as ?ratedPieces)
?similarPiece
(avg(?similarity_) as ?similarity)
where {
#-- Find ?pieces that ?user has rated.
?user :rated ?piece .
#-- Find other pieces (?similarPiece) that are
#-- similar to ?piece, along with the
#-- similarity value (?similarity_)
?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece
------------------------------------------------------------
| user | ratedPieces | similarPiece | similarity |
============================================================
| :user1 | "urn:ex:a" | :c | 0.1 | ; a-c[0.1]
| :user1 | "urn:ex:b urn:ex:a" | :d | 0.25 | ; b-d[0.3], a-d[0.2]
| :user1 | "urn:ex:b" | :e | 0.4 | ; b-e[0.4]
| :user2 | "urn:ex:b" | :d | 0.3 | ; b-d[0.3]
| :user2 | "urn:ex:c urn:ex:b" | :e | 0.45 | ; c-e[0.5], b-e[0.4]
| :user2 | "urn:ex:d urn:ex:c" | :f | 0.65 | ; d-f[0.7], c-f[0.6]
| :user2 | "urn:ex:d" | :g | 0.8 | ; d-g[0.8]
------------------------------------------------------------
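The grouping and averaging can be reproduced outside SPARQL with a few lines of Python, which may help when checking the numbers by hand. This is only a sketch over the sample data above; the dictionary layout and variable names are my own choice, not part of the query.

```python
from collections import defaultdict

# The sample data: who rated what, and directed similarities between pieces.
rated = {":user1": [":a", ":b"], ":user2": [":b", ":c", ":d"]}
similar_to = {
    ":a": {":c": 0.1, ":d": 0.2},
    ":b": {":d": 0.3, ":e": 0.4},
    ":c": {":e": 0.5, ":f": 0.6},
    ":d": {":f": 0.7, ":g": 0.8},
}

# Equivalent of "group by ?user ?similarPiece".
groups = defaultdict(list)
for user, pieces in rated.items():
    for piece in pieces:
        for similar_piece, value in similar_to.get(piece, {}).items():
            groups[(user, similar_piece)].append((piece, value))

# Equivalent of avg(?similarity_) within each group.
averages = {
    key: sum(value for _, value in members) / len(members)
    for key, members in groups.items()
}

for (user, similar_piece), average in sorted(averages.items()):
    print(user, similar_piece, round(average, 2))
```

Each key corresponds to one row of the result table, and the list in `groups` plays the role of the group_concat column.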

SPARQL algebra: tricky ASK from named graphs if any triples do not exist

Take these two named graphs:
# graph :yesterday
:Foo
:likes :Bar ;
:likes :Qux .
# graph :today
:Foo
:likes :Bar ;
:likes :Baz .
Now say you want to find out if any of the triples from graph :yesterday are absent from graph :today. How would you ASK this query?
ASK
FROM NAMED :yesterday
FROM NAMED :today
{
GRAPH :yesterday {
?s ?p ?o .
...
}
}
SPARQL has two operations for negation: use the one you find most natural. As I read the problem description, it reads more like the first one below to me, but in this situation they are very similar. They differ in their effects when one part or the other of the pattern does not match anything, or when there are no variables in common.
NOT EXISTS tests for the absence of a pattern (there is EXISTS as well). It is a filter applied to each solution of the first pattern. It is like a nested ASK in which the variables are replaced by the values bound in the solution coming into the filter.
PREFIX : <http://example/>
SELECT * {
GRAPH :yesterday { ?s ?p ?o }
FILTER NOT EXISTS { GRAPH :today { ?s ?p ?o } }
}
MINUS executes the two patterns (left and right sides) and then returns the rows of the left where there is no matching one anywhere on the right. It is an anti-join.
PREFIX : <http://example/>
SELECT * {
GRAPH :yesterday { ?s ?p ?o }
MINUS { GRAPH :today { ?s ?p ?o } }
}
For both I get:
------------------------
| s | p | o |
========================
| :Foo | :likes | :Qux |
------------------------
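To illustrate where the two forms agree and where they differ, here is a simplified Python model of both operations over the two graphs. It treats solutions as dictionaries and only covers plain triple patterns, so it is a sketch of the semantics rather than a faithful implementation; the function names are made up.

```python
yesterday = {(":Foo", ":likes", ":Bar"), (":Foo", ":likes", ":Qux")}
today = {(":Foo", ":likes", ":Bar"), (":Foo", ":likes", ":Baz")}


def solutions(graph, names):
    """Solutions of a pattern like { ?s ?p ?o } using the given variable names."""
    return [dict(zip(names, triple)) for triple in sorted(graph)]


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def not_exists(left, right):
    """Keep a left solution if, after substitution, the right pattern has no match."""
    return [a for a in left if not any(compatible(a, b) for b in right)]


def minus(left, right):
    """Drop a left solution only if a compatible right one shares a variable with it."""
    return [a for a in left
            if not any(a.keys() & b.keys() and compatible(a, b) for b in right)]


left = solutions(yesterday, ("s", "p", "o"))
same_vars = solutions(today, ("s", "p", "o"))
other_vars = solutions(today, ("x", "y", "z"))  # no variables in common

missing_filter = not_exists(left, same_vars)
missing_minus = minus(left, same_vars)
any_missing = bool(missing_filter)  # what an ASK over the same pattern would report

print(missing_filter, missing_minus, any_missing)
print(not_exists(left, other_vars), minus(left, other_vars))
```

With shared variables both give the :Qux triple. With no variables in common, the NOT EXISTS model removes everything (the right pattern always matches), while the MINUS model removes nothing.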
TriG:
@prefix : <http://example/> .
:yesterday {
:Foo
:likes :Bar ;
:likes :Qux .
}
:today {
:Foo
:likes :Bar ;
:likes :Baz .
}

SPARQL: Query to get all Instances of two classes that share the same connections to a third class

I'm trying to understand the capabilities of SPARQL and I'm wondering if this kind of query is possible:
Diagram of my Ontology structure (sorry, I'm not allowed to post pictures yet)
I want to get all instances of class A and C that have connections to the same instances of class B. So some kind of:
Select ?a, ?c
Where
{
?a myOntology:ab ?c .
?c myOntology:cb ?B .
}
Which would give me:
A:1 C:1
A:2 C:1 (with B:2)
A:2 C:1 (with B:3)
(Where the letter is the class and the number the instance, counted from the top)
But with the difference that I only want the ones that have exactly the same related instances of B:
A:2 C:1 (with B:2 and B:3)
Is that possible or do I have to use external logic to get that?
I would be grateful for any answers.
Yes, you can, provided you can use NOT EXISTS.
SPARQL, like SQL, does not have a universal quantifier, but you can emulate one with nested NOT EXISTS filters.
Your query is, in pseudo-SPARQL, "give me all pairs (a,c) connected as a-b-c such that there is no other bb with a-bb but not bb-c, and vice versa: no bb with bb-c but not a-bb":
PREFIX : <http://test/>
SELECT ?a ?b ?c
WHERE
{ ?a :ab ?b .
?b :bc ?c .
FILTER NOT EXISTS
{ ?a :ab ?bb .
FILTER NOT EXISTS
{ ?bb :bc ?c . }
}
## vice versa:
FILTER NOT EXISTS
{ ?bb :bc ?c .
FILTER NOT EXISTS
{ ?a :ab ?bb . }
}
}
Running it on
@prefix : <http://test/> .
:a1 :ab :b1 .
:a1 :ab :b2 .
:a2 :ab :b2 .
:a2 :ab :b3 .
:b2 :bc :c1 .
:b3 :bc :c1 .
gives
----------------------------------------------------------
| a | b | c |
==========================================================
| <http://test/a2> | <http://test/b3> | <http://test/c1> |
| <http://test/a2> | <http://test/b2> | <http://test/c1> |
----------------------------------------------------------
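The two nested NOT EXISTS filters together amount to a set-equality test: every b reachable from a must lead to c, and every b leading to c must be reachable from a. A short Python sketch over the same test data shows that reading; the helper names `bs_of` and `bs_to` are made up for illustration.

```python
# The test data, split by predicate: (subject, object) pairs.
ab = {(":a1", ":b1"), (":a1", ":b2"), (":a2", ":b2"), (":a2", ":b3")}
bc = {(":b2", ":c1"), (":b3", ":c1")}


def bs_of(a):
    """All b with a :ab b."""
    return {b for x, b in ab if x == a}


def bs_to(c):
    """All b with b :bc c."""
    return {b for b, x in bc if x == c}


# Keep an a-b-c path only when both sets of b's are exactly the same.
rows = sorted(
    (a, b, c)
    for a, b in ab
    for b2, c in bc
    if b == b2 and bs_of(a) == bs_to(c)
)
print(rows)
```

:a1 is rejected because it also connects to :b1, which does not lead to :c1, while :a2 connects to exactly :b2 and :b3.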