I want to query the movies that share the highest number of types with the movie The Matrix.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?movie_name (count(distinct ?atype) as ?numatype)
FROM <http://dbpedia.org/>
WHERE {
?movie rdf:type dbo:Film;
rdf:type ?ftype.
dbr:The_Matrix rdf:type ?ttype.
?atype a owl:class;
owl:intersectionOf [?ftype ?ttype].
?movie rdfs:label ?movie_name.
FILTER (LANG(?movie_name)="en").
}
GROUP BY ?movie_name
ORDER BY DESC(?numatype)
LIMIT 100
I defined ?ttype as the type of The Matrix and ?ftype as the type of ?movie.
When I run this query at http://dbpedia.org/sparq, there are no results.
The idea is to use a simple join on the types:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT (SAMPLE(?l) as ?movie_name)
(count(distinct ?ttype) as ?numSharedTypes)
WHERE {
VALUES ?s {dbr:The_Matrix}
?s a ?ttype .
?movie a dbo:Film ;
a ?ttype .
FILTER(?movie != ?s)
?movie rdfs:label ?l .
FILTER (LANGMATCHES(LANG(?l), 'en'))
}
GROUP BY ?movie
ORDER BY desc(?numSharedTypes)
LIMIT 100
The join itself can be expensive, so you may hit a timeout or, because of Virtuoso's anytime query feature, get an incomplete result back.
It looks like the query optimizer isn't that smart, and retrieving the labels in particular makes the performance worse. A couple of sub-SELECTs make the query much faster, although harder to read:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?movie_name ?numSharedTypes
WHERE
{ ?movie rdfs:label ?l
FILTER langMatches(lang(?l), "en")
BIND(replace(replace(str(?l), "\\(film\\)$", ""), "[^0-9]*\\sfilm\\)$", ")") AS ?movie_name)
{ SELECT ?movie (COUNT(?type) AS ?numSharedTypes)
WHERE
{ ?movie rdf:type dbo:Film ;
rdf:type ?type
{ SELECT ?type
WHERE
{ dbr:The_Matrix rdf:type ?type
}
}
FILTER ( ?movie != dbr:The_Matrix )
}
GROUP BY ?movie
ORDER BY DESC(?numSharedTypes) ASC(?movie)
LIMIT 100
}
}
ORDER BY DESC(?numSharedTypes) ASC(?movie_name)
Result (chunk):
+------------------------+----------------+
| movie_name | numSharedTypes |
+------------------------+----------------+
| The Matrix Reloaded | 36 |
| The Matrix Revolutions | 33 |
| The Matrix (franchise) | 30 |
| Demolition Man | 28 |
| Freejack | 28 |
| Conspiracy Theory | 27 |
| Deep Blue Sea (1999) | 27 |
| Fair Game (1995) | 27 |
| Judge Dredd | 27 |
| Revenge Quest | 27 |
| Screamers (1995) | 27 |
| Soldier (1998) | 27 |
| The Invasion | 27 |
| Timecop | 27 |
| Total Recall (1990) | 27 |
| V for Vendetta | 27 |
| Assassins | 26 |
| ... | ... |
+------------------------+----------------+
Related
My data is basically an event log in RDF. I have cases and events, the latter belong to the former. Events have timestamps and an actor who triggered them.
For each case I now need the latest event, when it happened, and who triggered it.
This is roughly my current query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event ?timestamp ?actor
WHERE {
?case rdf:type ex:Case ;
ex:hasEvent ?event .
?event ex:timestamp ?timestamp ;
ex:hasActor ?actor .
}
ORDER BY ASC(?case) DESC(?timestamp)
Which yields something like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case1 | ex:event2 | "2020-01-01T01:00:00Z"^^xsd:dateTimeStamp | ex:Bob |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
| ex:case2 | ex:event4 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Dan |
However, I would like to get only the first and third rows, as they correspond to the latest event for each case. Like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
To achieve this, I tried SELECT ?case ?event (MAX(?timestamp) AS ?latest) ?actor combined with GROUP BY ?case. However, SPARQL complains that I need to group by ?event and ?actor as well, which of course is not what I want.
I am aware that PostgreSQL has DISTINCT ON which would solve my problem, but I need to do it in SPARQL. Is there a nice way to achieve this?
Self-answer based on @UninformedUser's comment:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event (?latest as ?timestamp) ?actor WHERE {
?case ex:hasEvent ?event .
?event ex:timestamp ?latest ;
ex:hasActor ?actor .
{ SELECT ?case (MAX(?timestamp) AS ?latest) {
?case rdf:type ex:Case ;
ex:hasEvent ?event .
?event ex:timestamp ?timestamp }
group by ?case }
}
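An alternative to the MAX subquery, shown only as a sketch against the same example vocabulary from the question (ex:Case, ex:hasEvent, ex:timestamp, ex:hasActor), is to keep an event only when the same case has no later one. The variables ?other and ?later are names introduced here for illustration.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event ?timestamp ?actor
WHERE {
  ?case rdf:type ex:Case ;
        ex:hasEvent ?event .
  ?event ex:timestamp ?timestamp ;
         ex:hasActor ?actor .
  # Keep this event only if the same case has no event with a later timestamp.
  FILTER NOT EXISTS {
    ?case ex:hasEvent ?other .
    ?other ex:timestamp ?later .
    FILTER (?later > ?timestamp)
  }
}
ORDER BY ?case
```

Like the subquery version, this returns more than one row for a case when several of its events share the latest timestamp.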
Suppose I have inserted this data into ANZO:
insert data {<a> rdf:type <c1>}
When I issue this query:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?s ?p ?o where {bind(rdf:type as ?p). ?s ?p ?o}
I get this answer on the console:
s | p | o
---+-------------------------------------------------+----
a | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | c1
Now my question: is there any way in ANZO to get the answer on the console with rdf:type shown as a compressed URI, like this:
s | p | o
---+----------+----
a | rdf:type | c1
I developed an ontology about machine learning using Protege.
I have the following classes with their instances:
Algorithm : A1, A2
LearningMethod : M1, M2
An algorithm can be linked to a learning method through the has-learning-method object property.
I want to build a query to select all learning methods assigned for instance A1.
I managed to build a query that gives me all the instances of Algorithm with their corresponding LearningMethod.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ml: <http://www.semanticweb.org/machine-learning-ontology#>
SELECT DISTINCT ?x0 ?x1 WHERE {
?x0 rdf:type ml:Algorithm.
?x1 rdf:type ml:LearningMethod.
?x0 ml:has-learning-method ?x1.
}
Given answer:
+---------+---------+
| x0 | x1 |
+---------+---------+
| A1 | M1 |
+---------+---------+
| A2 | M2 |
+---------+---------+
How can I select only the learning methods linked to A1?
Is there any prefix I could use?
The expected result should be:
+---------+---------+
| x0 | x1 |
+---------+---------+
| A1 | M1 |
+---------+---------+
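One way to get that result, sketched here under the assumption that the individual A1 lives in the same namespace as the classes (that is, its IRI is ml:A1; adjust this if Protege assigned a different IRI), is to bind ?x0 to that single individual with VALUES:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ml: <http://www.semanticweb.org/machine-learning-ontology#>
SELECT DISTINCT ?x0 ?x1 WHERE {
  # Assumption: A1 is identified by ml:A1.
  VALUES ?x0 { ml:A1 }
  ?x0 rdf:type ml:Algorithm .
  ?x1 rdf:type ml:LearningMethod .
  ?x0 ml:has-learning-method ?x1 .
}
```

No extra prefix is needed. The existing ml: prefix is enough as long as it matches the namespace of the individual.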
This is the minimum data required to reproduce the problem:
@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:artist1 rdf:type :Artist .
:artist2 rdf:type :Artist .
:artist3 rdf:type :Artist .
:en rdf:type :Language .
:it rdf:type :Language .
:gr rdf:type :Language .
:c1
rdf:type :CountableClass ;
:appliedOnClass :Artist ;
:appliedOnProperty :hasArtist
.
:c2
rdf:type :CountableClass ;
:appliedOnClass :Language ;
:appliedOnProperty :hasLanguage
.
:i1
rdf:type :RecommendableClass ;
:hasArtist :artist1 ;
:hasLanguage :en
.
:i2
rdf:type :RecommendableClass ;
:hasArtist :artist1 ;
:hasLanguage :en
.
:i3
rdf:type :RecommendableClass;
:hasArtist :artist1 ;
:hasLanguage :it
.
:i4
rdf:type :RecommendableClass;
:hasArtist :artist2 ;
:hasLanguage :en
.
:i5
rdf:type :RecommendableClass;
:hasArtist :artist2 ;
:hasLanguage :it
.
:i6
rdf:type :RecommendableClass;
:hasArtist :artist3 ;
:hasLanguage :gr
.
:ania :likes :i1 .
:ania :likes :i3 .
:ania :likes :i4 .
This is my query:
PREFIX : <http://example.org/rs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rs: <http://spektrum.ctu.cz/ontologies/radio-spectrum#>
SELECT ?item ?count ?value
WHERE
{ ?item rdf:type :RecommendableClass
{ SELECT ?countableProperty ?value (count(*) AS ?count)
WHERE
{ VALUES ?user { :ania }
VALUES ?countableConfiguration { :c1 }
?user :likes ?x .
?countableConfiguration :appliedOnProperty ?countableProperty .
?countableConfiguration :appliedOnClass ?countableClass .
?x ?countableProperty ?value .
?value rdf:type ?countableClass
}
GROUP BY ?countableProperty ?value
ORDER BY DESC(?count)
LIMIT 3
}
FILTER NOT EXISTS {?user :likes ?item}
}
This is the result:
As you can see, there are three items that have the value artist1 and three others that have artist2.
Is there any way to limit the result to just two for each of them?
First some minimal data, with three artists, and some items for each one. I always stress the point of minimal data on Stack Overflow, because it's important for isolating the problem. In this case, you've still provided a relatively large query and a lot more data than we need. Since we know the problem is in how to group artists that are each related to a number of items, all the data needs here is some artists that are related to a number of items. Then we can retrieve them easily, and group them easily.
@prefix : <urn:ex:> .
:artist1 :p :a1, :a2, :a3, :a4 .
:artist2 :p :b2, :b2, :b3, :b4, :b5 .
:artist3 :p :c2 .
Now, you can select artists and their items, and you can determine an index for each item. This method checks, for each item, how many items are less than or equal to it. There is always at least one such item (the item itself), so the counts are essentially a 1-based index.
prefix : <urn:ex:>
select ?artist ?item (count(?item_) as ?pos){
?artist :p ?item_, ?item .
filter (str(?item_) <= str(?item))
}
group by ?artist ?item
-------------------------
| artist | item | pos |
=========================
| :artist1 | :a1 | 1 |
| :artist1 | :a2 | 2 |
| :artist1 | :a3 | 3 |
| :artist1 | :a4 | 4 |
| :artist2 | :b2 | 1 |
| :artist2 | :b3 | 2 |
| :artist2 | :b4 | 3 |
| :artist2 | :b5 | 4 |
| :artist3 | :c2 | 1 |
-------------------------
Now you can use having to filter on the position, so that you get at most two per artist:
prefix : <urn:ex:>
select ?artist ?item {
?artist :p ?item_, ?item .
filter (str(?item_) <= str(?item))
}
group by ?artist ?item
having (count(?item_) < 3)
-------------------
| artist | item |
===================
| :artist1 | :a1 |
| :artist1 | :a2 |
| :artist2 | :b2 |
| :artist2 | :b3 |
| :artist3 | :c2 |
-------------------
References
Doing "n per each x" queries in SPARQL is kind of a challenge, and there's no great solution for it yet. Some related reading that might help (be sure to check the comments on these questions and answers, too):
SPARQL using subquery with limit (subqueries with limits can sometimes be helpful)
How to select first N row of each group (canonical question, in my opinion, but has no answer, since there's no general answer)
Find the two nearest neighbors of points (recent question with a "hack" answer)
There is strange behaviour when the command-line tools of ARQ and TDB are combined with named graphs. Data imported into a named graph via tdbloader cannot be queried via a GRAPH clause in a SPARQL SELECT query. However, the query works when the data is inserted into the same graph with SPARQL INSERT.
I have the following assembler description file tdb.ttl:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
[] rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
.
There is a dataset in the file data.ttl:
<a> <b> <c>.
Now I insert this data with tdbloader and then another triple with SPARQL INSERT, both into the named graph data:
tdbloader --desc tdb.ttl --graph data data.ttl
update --desc tdb.ttl "INSERT DATA {GRAPH <data> {<d> <e> <f>.}}"
Now, the data can be queried with SPARQL via:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH ?g {?s ?p ?o.}}"
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | <data> |
----------------------------
Everything seems perfect. But now I want to query only this specific named graph data:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH <data> {?s ?p ?o.}}"
-------------------
| s | p | o |
===================
| <d> | <e> | <f> |
-------------------
Why is the data imported from tdbloader missing? What is wrong with this query? How can I get results back from both imports?
Try this query:
PREFIX : <data>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
and the output is
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | : |
----------------------------
or try:
tdbquery --loc DB --file Q.rq -results srj
to get the results in a different form.
The text output is making things look nice, but two different things end up displayed as <data>.
What you are seeing is that
tdbloader --desc tdb.ttl --graph data data.ttl
used data exactly as is to name the graph. But
INSERT DATA {GRAPH <data> {<d> <e> <f>.}}
does a full SPARQL parse and resolves <data> against the base URI, which probably looks like file://*currentdirectory*.
When printing in text, URIs get abbreviated, including using the base. So both the original data (from tdbloader) and file:///path/data appear as <data>.
PREFIX : <data>
gives the text output a different way to write it as :.
Finally try:
BASE <http://example/>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
which sets the base URI to something nowhere near your data URIs, and so switches off the nice formatting by base URI:
----------------------------------------------------------------------------------------------------------------
| s | p | o | g |
================================================================================================================
| <file:///home/afs/tmp/a> | <file:///home/afs/tmp/b> | <file:///home/afs/tmp/c> | <data> |
| <file:///home/afs/tmp/d> | <file:///home/afs/tmp/e> | <file:///home/afs/tmp/f> | <file:///home/afs/tmp/data> |
----------------------------------------------------------------------------------------------------------------
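Given that the two graph names really differ as shown above, one way to get results back from both imports is to match on the string form of the graph name instead of writing GRAPH <data>. This is only a sketch: it assumes that str(?g) yields the plain string "data" for the graph created by tdbloader and an absolute file: IRI ending in /data for the graph created by INSERT.

```sparql
SELECT * WHERE {
  GRAPH ?g { ?s ?p ?o }
  # Assumption: the graph name is either exactly "data" (tdbloader)
  # or an absolute IRI whose last path segment is "data" (SPARQL INSERT).
  FILTER (regex(str(?g), "(^|/)data$"))
}
```

A cleaner long-term fix is probably to use the same absolute IRI as the graph name in both the tdbloader call and the update, so that no base-URI resolution is involved.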