How to transpose the query result in SPARQL

I am using TopBraid Composer for writing SPARQL queries. I have queried the following result:
| Header | Total |
|-------- |------- |
| A | 5 |
| B | 6 |
| C | 7 |
| D | 8 |
Now my humble question is whether we can transpose the result somehow as follows:
| Header | A | B | C | D |
|-------- |--- |--- |--- |--- |
| Total | 5 | 6 | 7 | 8 |

Yes and no. The first notation you use is essential for understanding SPARQL SELECT - each row represents a separate graph pattern match on the data where the first column shows the binding for ?Header and the second column shows the binding for ?Total, per your unstated query. E.g. in one of the matches, ?Header is bound to "A" and ?Total is bound to "5". Another match is ?Header = "B" and ?Total = "6", etc. (I'd suggest doing some homework on SPARQL)
From that, whatever language runs the SPARQL query will have some means of iterating over the result set, and you can place the bindings in a transposed table as you show.
So, no, SPARQL can't do that (look into SPARQL graph pattern matching), but whatever language you are using should be able to iterate over the result set to get what you are looking for.
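As a minimal sketch of that client-side iteration, assume the result set has already been fetched as a list of row dictionaries keyed by variable name. The `rows` literal and the `transpose` helper below are hypothetical illustrations, not part of any SPARQL library:

```python
# Hypothetical result set: one dict per solution, keyed by variable name.
rows = [
    {"Header": "A", "Total": 5},
    {"Header": "B", "Total": 6},
    {"Header": "C", "Total": 7},
    {"Header": "D", "Total": 8},
]


def transpose(rows, key_var, value_var):
    """Turn each row's key binding into a column of a single output row."""
    header = [key_var] + [row[key_var] for row in rows]
    values = [value_var] + [row[value_var] for row in rows]
    return header, values


header, values = transpose(rows, "Header", "Total")
print(" | ".join(str(cell) for cell in header))
print(" | ".join(str(cell) for cell in values))
```

The column set is only known after the query has run, which is why this step sits more naturally in the host language than in a SELECT with a fixed projection.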

You can use a filtered left outer join query to build your own transposed table (aka pivot table).
PREFIX wd: <http://cocreate-cologne.wiki.opencura.com/entity/>
PREFIX wdt: <http://cocreate-cologne.wiki.opencura.com/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://cocreate-cologne.wiki.opencura.com/prop/>
PREFIX ps: <http://cocreate-cologne.wiki.opencura.com/prop/statement/>
PREFIX pq: <http://cocreate-cologne.wiki.opencura.com/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
select ?item ?itemLabel ?enthalten_in1Label ?enthalten_in2Label ?enthalten_in3Label {
  SELECT ?item ?itemLabel ?enthalten_in1Label ?enthalten_in2Label ?enthalten_in3Label WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
    ?item p:P3 ?statement.
    ?statement ps:P3 wd:Q15.
    ?statement pq:P13 wd:Q17.
    OPTIONAL { ?item wdt:P11 ?buendnis. FILTER (?buendnis in (wd:Q32)) }
    OPTIONAL { ?item wdt:P11 ?sdgKarte. FILTER (?sdgKarte in (wd:Q14)) }
    OPTIONAL { ?item wdt:P11 ?agora. FILTER (?agora in (wd:Q3)) }
    BIND(?buendnis as ?enthalten_in1).
    BIND(?sdgKarte as ?enthalten_in2).
    BIND(?agora as ?enthalten_in3).
    #debug
    #Filter (?item in (wd:Q1))
  }
  LIMIT 2000
} ORDER BY ?itemLabel

Related

Is there a "DISTINCT ON" equivalent in SPARQL?

My data is basically an event log in RDF. I have cases and events, the latter belong to the former. Events have timestamps and an actor who triggered them.
For each case I now need the latest event, when it happened, and who triggered it.
This is roughly my current query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event ?timestamp ?actor
WHERE {
  ?case rdf:type ex:Case ;
        ex:hasEvent ?event .
  ?event ex:timestamp ?timestamp ;
         ex:hasActor ?actor .
}
ORDER BY ASC(?case) DESC(?timestamp)
Which yields something like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case1 | ex:event2 | "2020-01-01T01:00:00Z"^^xsd:dateTimeStamp | ex:Bob |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
| ex:case2 | ex:event4 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Dan |
However I would like to only get the first and third row, as they correspond to the latest events for this case. Like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
To achieve this, I tried SELECT ?case ?event (MAX(?timestamp) AS ?latest) ?actor combined with GROUP BY ?case. However, SPARQL complains that I need to group by ?event and ?actor as well, which is of course not what I want.
I am aware that PostgreSQL has DISTINCT ON which would solve my problem, but I need to do it in SPARQL. Is there a nice way to achieve this?
Self answer based on @UninformedUser's comment:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event (?latest as ?timestamp) ?actor WHERE {
  ?case ex:hasEvent ?event .
  ?event ex:timestamp ?latest ;
         ex:hasActor ?actor .
  # Subquery: the latest timestamp per case
  { SELECT ?case (MAX(?timestamp) AS ?latest) {
      ?case rdf:type ex:Case ;
            ex:hasEvent ?event .
      ?event ex:timestamp ?timestamp }
    group by ?case }
}
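To make the two steps of that query easier to follow, here is a rough Python sketch of the same logic over in-memory rows. The `rows` list mirrors the example result from the question, and `latest_per_case` is a hypothetical helper. It also assumes all timestamps are UTC strings in the same ISO 8601 format, so plain string comparison orders them correctly:

```python
# Hypothetical rows mirroring the example result in the question.
rows = [
    {"case": "ex:case1", "event": "ex:event1", "timestamp": "2020-01-01T02:00:00Z", "actor": "ex:Alice"},
    {"case": "ex:case1", "event": "ex:event2", "timestamp": "2020-01-01T01:00:00Z", "actor": "ex:Bob"},
    {"case": "ex:case2", "event": "ex:event3", "timestamp": "2020-01-01T03:00:00Z", "actor": "ex:Charlie"},
    {"case": "ex:case2", "event": "ex:event4", "timestamp": "2020-01-01T02:00:00Z", "actor": "ex:Dan"},
]


def latest_per_case(rows):
    # Step 1 (the subquery): MAX(?timestamp) grouped by ?case.
    # Assumption: same-format UTC strings, so string order equals time order.
    latest = {}
    for row in rows:
        case = row["case"]
        if case not in latest or row["timestamp"] > latest[case]:
            latest[case] = row["timestamp"]
    # Step 2 (the outer pattern): keep rows whose timestamp equals that maximum.
    return [row for row in rows if row["timestamp"] == latest[row["case"]]]


latest_rows = latest_per_case(rows)
for row in latest_rows:
    print(row["case"], row["event"], row["timestamp"], row["actor"])
```

As with the SPARQL version, a case whose two latest events share the same timestamp would yield two rows, so this is not a strict equivalent of DISTINCT ON.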

SPARQL: find objects with all sub-objects matching a criteria

Given this data, where each person may optionally have a "smart" predicate, and each department may have zero or more people, I need to find departments that contain only the smart people. The result should only include departments 1 and 2. Ideally, the result should also include the "smart" objects for each department. Thanks!
person:A p:type 'p' ;
         p:smart 'yes' .
person:B p:type 'p' ;
         p:smart 'maybe' .
person:C p:type 'p' .
department:1 p:type 'd' ;
             p:has person:A, person:B .
department:2 p:type 'd' ;
             p:has person:B .
department:3 p:type 'd' ;
             p:has person:B, person:C .
department:4 p:type 'd' .
I have a feeling I've answered something similar before, but anyway there is a reasonably nice way to do this:
select ?dept
       (count(?person) as ?pc) (count(?smart) as ?sc)
       (group_concat(?smart; separator=',') as ?smarts)
{
  ?dept p:has ?person .
  optional { ?person p:smart ?smart }
}
group by ?dept
having (?pc = ?sc)
That is: find the departments, their people, and (where available) each person's smart value. Then keep only the departments where the number of people matches the number of smart values.
-------------------------------------------------------------
| dept | pc | sc | smarts |
=============================================================
| <http://example.com/department#2> | 1 | 1 | "maybe" |
| <http://example.com/department#1> | 2 | 2 | "yes,maybe" |
-------------------------------------------------------------
When you want to get results for each object, matching some criteria, group by / having is often the cleanest answer (in that you can separate out matching from filtering).
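The count comparison can be sketched outside SPARQL as well. The following Python is only an illustration under assumptions: the `has` and `smart` dictionaries are a hypothetical in-memory copy of the example data, and `all_smart_departments` is a made-up helper name:

```python
# Hypothetical in-memory copy of the example data.
has = {
    "department:1": ["person:A", "person:B"],
    "department:2": ["person:B"],
    "department:3": ["person:B", "person:C"],
    "department:4": [],
}
smart = {"person:A": "yes", "person:B": "maybe"}


def all_smart_departments(has, smart):
    result = {}
    for dept, people in has.items():
        if not people:
            # No "?dept p:has ?person" match, so the department yields no group.
            continue
        # The OPTIONAL part: only people with a smart value contribute one.
        smarts = [smart[p] for p in people if p in smart]
        # The HAVING part: person count must equal smart-value count.
        if len(people) == len(smarts):
            result[dept] = ",".join(smarts)
    return result


matches = all_smart_departments(has, smart)
print(matches)
```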
Something like double-negation might work:
SELECT DISTINCT ?dept WHERE {
  ?dept p:has ?person .
  FILTER NOT EXISTS {
    ?dept p:has ?person1 .
    FILTER NOT EXISTS {
      ?person1 p:smart ?smartVal
    }
  }
}
Result:
+---------------+
| dept |
+---------------+
| department:1 |
| department:2 |
+---------------+
With values:
SELECT ?dept (GROUP_CONCAT(DISTINCT ?smart;separator=";") as ?smartValues) WHERE {
  ?dept p:has ?person .
  ?person p:smart ?smart
  FILTER NOT EXISTS {
    ?dept p:has ?person1 .
    FILTER NOT EXISTS {
      ?person1 p:smart ?smartVal
    }
  }
}
GROUP BY ?dept
Result:
+---------------+-------------+
| dept | smartValues |
+---------------+-------------+
| department:1 | maybe;yes |
| department:2 | maybe |
+---------------+-------------+
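The double negation in these queries reads as "there is no person in the department who lacks a smart value". A small Python sketch of that reading follows, using a hypothetical in-memory copy of the example data and a made-up helper name:

```python
# Hypothetical in-memory copy of the example data.
has = {
    "department:1": ["person:A", "person:B"],
    "department:2": ["person:B"],
    "department:3": ["person:B", "person:C"],
    "department:4": [],
}
smart = {"person:A": "yes", "person:B": "maybe"}


def departments_without_non_smart(has, smart):
    return sorted(
        dept
        for dept, people in has.items()
        # Outer pattern: the department must have at least one person.
        # Inner double negation: no person may be missing a smart value.
        if people and not any(person not in smart for person in people)
    )


depts = departments_without_non_smart(has, smart)
print(depts)
```

The explicit non-empty check matters: without the outer `?dept p:has ?person` pattern, a department with no people would pass the "no counterexample" test trivially.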

Conjunction in SPARQL query using OWL and RDF [duplicate]

I want to find the movies that share the highest number of types with the movie The Matrix.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?movie_name (count(distinct ?atype) as ?numatype)
FROM <http://dbpedia.org/>
WHERE {
  ?movie rdf:type dbo:Film;
         rdf:type ?ftype.
  dbr:The_Matrix rdf:type ?ttype.
  ?atype a owl:class;
         owl:intersectionOf [?ftype ?ttype].
  ?movie rdfs:label ?movie_name.
  FILTER (LANG(?movie_name)="en").
}
GROUP BY ?movie_name
ORDER BY DESC(?numatype)
LIMIT 100
I defined ?ttype as the type of The Matrix and ?ftype as the type of ?movie.
When I run this query at http://dbpedia.org/sparql, there are no results.
The idea is to use a simple join on the types:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT (SAMPLE(?l) as ?movie_name)
       (count(distinct ?ttype) as ?numSharedTypes)
WHERE {
  VALUES ?s {dbr:The_Matrix}
  ?s a ?ttype .
  ?movie a dbo:Film ;
         a ?ttype .
  FILTER(?movie != ?s)
  ?movie rdfs:label ?l .
  FILTER (LANGMATCHES(LANG(?l), 'en'))
}
GROUP BY ?movie
ORDER BY desc(?numSharedTypes)
LIMIT 100
The join itself might be expensive, so you could hit a timeout or, because of Virtuoso's anytime query feature, get an incomplete result back.
It looks like the query optimizer isn't smart enough here, and the labels in particular make performance worse. A couple of sub-SELECTs make the query much faster, although it becomes harder to read:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?movie_name ?numSharedTypes
WHERE
  { ?movie rdfs:label ?l
    FILTER langMatches(lang(?l), "en")
    BIND(replace(replace(str(?l), "\\(film\\)$", ""), "[^0-9]*\\sfilm\\)$", ")") AS ?movie_name)
    { SELECT ?movie (COUNT(?type) AS ?numSharedTypes)
      WHERE
        { ?movie rdf:type dbo:Film ;
                 rdf:type ?type
          { SELECT ?type
            WHERE
              { dbr:The_Matrix rdf:type ?type
              }
          }
          FILTER ( ?movie != dbr:The_Matrix )
        }
      GROUP BY ?movie
      ORDER BY DESC(?numSharedTypes) ASC(?movie)
      LIMIT 100
    }
  }
ORDER BY DESC(?numSharedTypes) ASC(?movie_name)
Result (chunk):
+------------------------+----------------+
| movie_name | numSharedTypes |
+------------------------+----------------+
| The Matrix Reloaded | 36 |
| The Matrix Revolutions | 33 |
| The Matrix (franchise) | 30 |
| Demolition Man | 28 |
| Freejack | 28 |
| Conspiracy Theory | 27 |
| Deep Blue Sea (1999) | 27 |
| Fair Game (1995) | 27 |
| Judge Dredd | 27 |
| Revenge Quest | 27 |
| Screamers (1995) | 27 |
| Soldier (1998) | 27 |
| The Invasion | 27 |
| Timecop | 27 |
| Total Recall (1990) | 27 |
| V for Vendetta | 27 |
| Assassins | 26 |
| ... | ... |
+------------------------+----------------+
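The join on types used in the answer's queries amounts to counting, per film, the size of the intersection between its type set and the type set of The Matrix. A toy Python sketch of that idea follows. The type assignments are invented for illustration (the `ex:` types are hypothetical and not taken from DBpedia), and `rank_by_shared_types` is a made-up helper:

```python
# Invented type assignments for illustration; real DBpedia resources have many more types.
types = {
    "dbr:The_Matrix": {"dbo:Film", "dbo:Work", "ex:SciFiFilm", "ex:ActionFilm"},
    "dbr:The_Matrix_Reloaded": {"dbo:Film", "dbo:Work", "ex:SciFiFilm", "ex:ActionFilm"},
    "dbr:Timecop": {"dbo:Film", "dbo:Work", "ex:SciFiFilm"},
    "dbr:Casablanca": {"dbo:Film", "dbo:Work", "ex:RomanceFilm"},
    "dbr:Neuromancer": {"dbo:Work", "ex:Novel"},  # not a film, so it is skipped
}


def rank_by_shared_types(types, target):
    target_types = types[target]
    counts = [
        (movie, len(movie_types & target_types))
        for movie, movie_types in types.items()
        if movie != target and "dbo:Film" in movie_types
    ]
    # Highest number of shared types first, ties broken by name.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_by_shared_types(types, "dbr:The_Matrix")
for movie, shared in ranking:
    print(movie, shared)
```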

Select literal in SPARQL?

I want to construct a SPARQL query populated with values I'm setting as literals.
e.g.
SELECT 'A', 'B', attribute
FROM
TABLE
Would return a table that might look like this:
A | B | attribute
-------|---------|--------------
A | B | Mary
A | B | has
A | B | a
A | B | little
A | B | lamb
What I want to do is run a query like this to get all the object types in a triplestore:
select distinct ?o ("class" as ?item_type)
where {
?s rdf:type ?o.
}
and then (ideally) UNION it with a second query that pulls out all the distinct predicate values:
select distinct ?p ("predicate" as ?item_type)
where {
?s ?p ?o.
}
the results of which might look like:
item | item_type
-----------------|-----------------
a_thing | class
another_thing | class
a_relation | predicate
another_relation | predicate
But a UNION in SPARQL only links in an additional where clause, which means I can't specify the item_type literal I want to inject into my results-set.
I think the following should get you what you want:
SELECT DISTINCT ?item ?item_type
WHERE {
  { ?s a ?item .
    BIND("class" AS ?item_type)
  }
  UNION
  { ?s ?item ?o
    BIND("predicate" AS ?item_type)
  }
}
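The pattern here is that each UNION branch binds its own constant label next to the shared ?item variable. The same shape can be sketched in Python over a hypothetical list of triples (the data and the `items_with_type` helper are assumptions made for illustration):

```python
# Hypothetical triples standing in for the triplestore contents.
triples = [
    ("ex:s1", "rdf:type", "ex:a_thing"),
    ("ex:s2", "rdf:type", "ex:another_thing"),
    ("ex:s1", "ex:a_relation", "ex:s2"),
]


def items_with_type(triples):
    # Branch 1: objects of rdf:type, each tagged with the literal "class".
    classes = {(o, "class") for _, p, o in triples if p == "rdf:type"}
    # Branch 2: every predicate, each tagged with the literal "predicate".
    predicates = {(p, "predicate") for _, p, _ in triples}
    # Set union plays the role of UNION plus DISTINCT.
    return sorted(classes | predicates, key=lambda pair: (pair[1], pair[0]))


items = items_with_type(triples)
for item, item_type in items:
    print(item, item_type)
```

Note that rdf:type itself shows up as a predicate, which the SPARQL query would also report.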

tdbloader vs SPARQL INSERT - Why different behaviour with named graphs?

The command-line tools for ARQ and TDB behave strangely with named graphs. Data imported into a named graph via tdbloader cannot be queried with a GRAPH clause in a SPARQL SELECT query. The same query works, however, when the data is inserted into the same graph with SPARQL INSERT.
I have following assembler description file tdb.ttl:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
[] rdf:type tdb:DatasetTDB ;
   tdb:location "DB" ;
   .
There is a dataset in the file data.ttl:
<a> <b> <c>.
Now, I am inserting this data with tdbloader and secondly another triple with SPARQL INSERT, both in the named graph data:
tdbloader --desc tdb.ttl --graph data data.ttl
update --desc tdb.ttl "INSERT DATA {GRAPH <data> {<d> <e> <f>.}}"
Now, the data can be queried with SPARQL via:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH ?g {?s ?p ?o.}}"
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | <data> |
----------------------------
Everything seems perfect. But now I want to query only this specific named graph data:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH <data> {?s ?p ?o.}}"
-------------------
| s | p | o |
===================
| <d> | <e> | <f> |
-------------------
Why is the data imported from tdbloader missing? What is wrong with this query? How can I get results back from both imports?
Try this query:
PREFIX : <data>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
and the output is
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | : |
----------------------------
or try:
tdbquery --loc DB --file Q.rq -results srj
to get the results in a different form.
The text output is making things look nice, but two different things end up as <data>.
What you are seeing is that
tdbloader --desc tdb.ttl --graph data data.ttl
used data exactly as is to name the graph. But
INSERT DATA {GRAPH <data> {<d> <e> <f>.}}
does a full SPARQL parse, and resolves against the base URI, probably looking like file://*currentdirectory*.
When printing in text, URIs get abbreviated, including using the base. So both the original data (from tdbloader) and file:///path/data appear as <data>.
PREFIX : <data>
gives the text output a different way to write it as :.
Finally try:
BASE <http://example/>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
which sets the base URI to something nowhere near your data URIs, and so switches off the abbreviation of URIs against the base:
----------------------------------------------------------------------------------------------------------------
| s | p | o | g |
================================================================================================================
| <file:///home/afs/tmp/a> | <file:///home/afs/tmp/b> | <file:///home/afs/tmp/c> | <data> |
| <file:///home/afs/tmp/d> | <file:///home/afs/tmp/e> | <file:///home/afs/tmp/f> | <file:///home/afs/tmp/data> |
----------------------------------------------------------------------------------------------------------------
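The mismatch between the two graph names can be illustrated with generic relative-reference resolution. The sketch below uses Python's `urllib.parse.urljoin` purely as a stand-in. It is not what Jena runs internally, and the base directory is an assumption copied from the example output above:

```python
from urllib.parse import urljoin

# Assumed base URI: the directory the commands were run from.
base = "file:///home/afs/tmp/"

# tdbloader --graph data: the name is used exactly as given.
raw_name = "data"

# INSERT DATA { GRAPH <data> ... }: the parser resolves <data> against the base.
resolved_name = urljoin(base, "data")

print(raw_name)
print(resolved_name)
print(raw_name == resolved_name)  # the two graph names differ
```

Giving tdbloader an absolute URI for --graph, or writing the same absolute URI in the GRAPH clause, should avoid ending up with two differently named graphs.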