SPARQL returns an empty result when passing MIN(?date1) from a subquery into the outer query with BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate)

(This question is now resolved; see the comment by Valerio Cocchi.)
I am trying to pass a variable from a subquery to the outer query. The subquery takes the minimum ?minDate of the set of dates ?date1 belonging to ?p and passes it to the outer query. The outer query takes another date ?date2 belonging to ?p (there is at most one ?date2 for every ?p) and computes the difference between the years of the two dates, to get an integer number of years between them. However, ?diffDate comes back blank, i.e. it has no value.
I am using Fuseki version 4.3.2. Here is an example of the query:
SELECT ?p ?minDate ?date2 ?diffDate
{
  ?p a abc:P ;
     abc:hasAnotherDate ?date2 .
  BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate)
  {
    SELECT ?p (MIN(?date1) as ?minDate)
    WHERE
    {
      ?p a abc:P ;
         abc:hasDate ?date1 .
    } group by ?p
  }
}
and an example of the kind of result I am getting:
| ?p    | ?minDate                            | ?date2                              | ?diffDate |
| <123> | "2012-11-22T00:00:00"^^xsd:dateTime | "2008-08-18T00:00:00"^^xsd:dateTime |           |
I would expect that ?diffDate would give me an integer value. Am I missing something fundamental about how subqueries work in SPARQL?

It seems you have encountered quite an obscure part of the SPARQL spec, namely how BIND works.
Normally, SPARQL evaluates triple patterns without regard to their position, e.g.
SELECT *
WHERE {
  ?a :p1 ?b .
  ?b :p2 ?c .
}
is the same query as:
SELECT *
WHERE {
  ?b :p2 ?c .
  ?a :p1 ?b .
}
However, BIND is position dependent, so e.g.:
SELECT *
WHERE {
  ?a :p1 ?b .
  BIND(:john AS ?a)
}
is not a valid query, because ?a is already in use before the BIND that tries to assign it, whereas:
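SELECT *
WHERE {
  BIND(:john AS ?a)
  ?a :p1 ?b .
}
is entirely valid. Variables used inside the BIND expression are position dependent too: they must be bound by patterns that appear before the BIND. If they are not, the expression raises an error when it is evaluated and the BIND variable is simply left unbound (the query itself is still valid).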
See here for more.
To go back to your problem, your BIND uses the ?minDate variable before it has been bound (the subquery that binds it comes after the BIND), so YEAR(?minDate) raises an error and ?diffDate is left without a value.
This query should do the trick:
SELECT ?p ?minDate ?date2 ?diffDate
{
  ?p a abc:P ;
     abc:hasAnotherDate ?date2 .
  {
    SELECT ?p (MIN(?date1) as ?minDate)
    WHERE
    {
      ?p a abc:P ;
         abc:hasDate ?date1 .
    } group by ?p
  }
  BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate) # Put the BIND after all the variables it uses are bound.
}
Alternatively, you could evaluate the difference in the SELECT, like so:
SELECT ?p ?minDate ?date2 (YEAR(?minDate) - YEAR(?date2) AS ?diffDate)
{
  ?p a abc:P ;
     abc:hasAnotherDate ?date2 .
  {
    SELECT ?p (MIN(?date1) as ?minDate)
    WHERE
    {
      ?p a abc:P ;
         abc:hasDate ?date1 .
    } group by ?p
  }
}
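To see why the position of the BIND matters, here is a small Python toy model of how the SPARQL algebra treats the two placements: a group is built left to right, so a BIND written before the subquery becomes Extend(outer patterns) followed by a Join with the subquery, while a BIND written after it extends the already-joined solutions. Solutions are modelled as dicts (a missing key means unbound), dates as (year, month, day) tuples, and the data is hypothetical, shaped like the row in the question. It is an illustration of the spec's semantics, not how Fuseki is implemented.

```python
# Toy model of SPARQL solution mappings: each solution is a dict from variable
# name to value, and a variable missing from the dict is unbound.
# Illustrative only; this is not how Fuseki evaluates queries.


def join(left, right):
    """Join two lists of solutions that agree on every shared variable."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def extend(solutions, var, expr):
    """BIND(expr AS var): an evaluation error leaves var unbound."""
    out = []
    for s in solutions:
        try:
            out.append({**s, var: expr(s)})
        except KeyError:  # expr read an unbound variable, i.e. an error
            out.append(dict(s))
    return out


def year_diff(s):
    """YEAR(?minDate) - YEAR(?date2), with dates modelled as (y, m, d) tuples."""
    return s["minDate"][0] - s["date2"][0]


# Hypothetical data shaped like the row in the question.
outer = [{"p": "<123>", "date2": (2008, 8, 18)}]   # ?p abc:hasAnotherDate ?date2
sub = [{"p": "<123>", "minDate": (2012, 11, 22)}]  # the GROUP BY ?p subquery

# BIND before the subquery: extend the outer patterns alone, then join.
bind_first = join(extend(outer, "diffDate", year_diff), sub)
# BIND after the subquery: join first, then extend.
bind_last = extend(join(outer, sub), "diffDate", year_diff)

print(bind_first)  # no "diffDate" key
print(bind_last)   # "diffDate": 4
```

The SELECT-expression variant behaves like `bind_last`, because the projection is computed only after the whole WHERE clause has been evaluated.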

Related

EXISTS and join with distinct values in two SPARQL queries return different results while they should do the same thing?

When running the following two queries on DBpedia, the results differ: the first query gives 68 while the second gives 42. The only difference is that the line
filter(exists {[] <http://dbpedia.org/ontology/nationality> ?o.})
is replaced by a join, to ensure that the object of dbpo:country also appears as an object of dbpo:nationality:
{select distinct ?o { [] <http://dbpedia.org/ontology/nationality> ?o.}}
First Query:
select (count(*) as ?count) {
  { select distinct ?s ?o
    { ?o1 <http://dbpedia.org/ontology/successor> ?s .
      ?o1 <http://dbpedia.org/ontology/governor> ?o2 .
      ?o2 <http://dbpedia.org/ontology/country> ?o
      filter(exists {[] <http://dbpedia.org/ontology/nationality> ?o.})
      filter(exists {?s <http://dbpedia.org/ontology/nationality> []})
    }
  } .
}
Second Query:
select (count(*) as ?count) {
  { select distinct ?s ?o
    { ?o1 <http://dbpedia.org/ontology/successor> ?s .
      ?o1 <http://dbpedia.org/ontology/governor> ?o2 .
      ?o2 <http://dbpedia.org/ontology/country> ?o
      {select distinct ?o { [] <http://dbpedia.org/ontology/nationality> ?o.}}
      filter(exists {?s <http://dbpedia.org/ontology/nationality> []})
    }
  } .
}
The result of the first query seems to be the correct one.
You've got a DISTINCT in the subquery within the second full query, which is causing some results not to be carried through to the final result set.
Note that the result of this query, which drops that keyword from the subquery, matches your first query, i.e. 68:
select (count(*) as ?count)
{ { select distinct ?s ?o
    { ?o1 <http://dbpedia.org/ontology/successor> ?s .
      ?o1 <http://dbpedia.org/ontology/governor> ?o2 .
      ?o2 <http://dbpedia.org/ontology/country> ?o
      { select ?o { [] <http://dbpedia.org/ontology/nationality> ?o. } }
      filter ( exists { ?s <http://dbpedia.org/ontology/nationality> [] } )
    } } }
I can't spare the time to investigate which result rows from the first and third queries are not found in the second, but I imagine that if you dig further into the descriptions of all these ?s and ?o, you will be able to find the answer.
A key hint: SPARQL queries are evaluated from the inside out (also described as bottom-up, which is confusing because it refers not to the literal bottom of the query text but to the innermost subquery). That means that select ?o { [] <http://dbpedia.org/ontology/nationality> ?o. } (or select distinct ?o { [] <http://dbpedia.org/ontology/nationality> ?o. }) is evaluated before the rest of the query, while the filter clauses are applied to the whole group they appear in, after its triple patterns and subquery have been joined.
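To make the inside-out order concrete, here is a small Python toy model of the second query's shape: the subquery's solutions are computed on their own, then joined with the outer triple patterns on the shared ?o, and only then is the FILTER applied to the joined group. Short names such as "successor" and "nationality" stand in for the DBpedia ontology IRIs, and the triples are hypothetical; the sketch models evaluation order only and does not try to reproduce the 68 and 42 counts.

```python
# Toy model of inside-out evaluation for the second query. Short names stand in
# for DBpedia resources and predicates; the triples are hypothetical.

triples = {
    ("gov1", "successor", "s1"),
    ("gov1", "governor", "g1"),
    ("g1", "country", "c1"),
    ("x", "nationality", "c1"),
    ("y", "nationality", "c1"),
    ("s1", "nationality", "c2"),
}


def join(left, right):
    """Join two lists of solutions that agree on every shared variable."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def pattern(pred, subj_var, obj_var):
    """Solutions for the triple pattern ?subj_var <pred> ?obj_var."""
    return [{subj_var: s, obj_var: o} for (s, p, o) in triples if p == pred]


# 1. The innermost subquery, select distinct ?o { [] :nationality ?o },
#    runs first and knows nothing about the outer ?s, ?o1 or ?o2.
sub = [{"o": o} for o in sorted({o for (_, p, o) in triples if p == "nationality"})]

# 2. The outer triple patterns are joined with each other ...
outer = join(join(pattern("successor", "o1", "s"),
                  pattern("governor", "o1", "o2")),
             pattern("country", "o2", "o"))

# 3. ... and with the subquery's solutions on the shared variable ?o.
joined = join(outer, sub)

# 4. filter(exists { ?s :nationality [] }) is applied to the joined group.
result = [r for r in joined
          if any(s == r["s"] and p == "nationality" for (s, p, _) in triples)]
print(result)
```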

SPARQL CONSTRUCT with DISTINCT

PREFIX content: <http://example.com/content#>
construct { ?s content:field ?o}
WHERE { ?s content:field ?o }
90% of all the ?o I get here are the same URI <http://example.com/name>.
I'm trying to find a way to filter out all quads that have the same value for ?o, so that in the end I get a list of quads that are unique by their ?o.
I tried DISTINCT ?o CONSTRUCT{...}, but from what I saw you can't use DISTINCT on a CONSTRUCT.
How would you filter the returned list of quads?
> I'm trying to find a way to filter out all quads that have the same value for ?o, so in the end I get a list of quads which are unique by its ?o
If it does not matter which exact value is bound to ?s, then a sub-select with a group by ?o is the way to go. Use (SAMPLE(?subj) as ?s), e.g. something like:
PREFIX content: <http://example.com/content#>
construct { ?s content:field ?o }
WHERE {
  { select ?o (SAMPLE(?subj) as ?s)
    { ?subj content:field ?o }
    group by ?o
  }
}
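As a rough illustration of what the grouped sub-select does, here is a Python toy model: triples are grouped by object, one subject is sampled per group, and the CONSTRUCT template then emits one triple per distinct ?o. SAMPLE is allowed to return any member of its group; this sketch simply keeps the first subject it sees, and the example triples are hypothetical.

```python
# Toy model of: select ?o (SAMPLE(?subj) as ?s) { ?subj content:field ?o } group by ?o
# followed by construct { ?s content:field ?o }. SAMPLE may return any member
# of the group; this sketch keeps the first one it sees. Data is hypothetical.

FIELD = "http://example.com/content#field"
triples = [
    ("http://example.com/a", FIELD, "http://example.com/name"),
    ("http://example.com/b", FIELD, "http://example.com/name"),
    ("http://example.com/c", FIELD, "http://example.com/name"),
    ("http://example.com/d", FIELD, "http://example.com/title"),
]


def construct_unique_by_object(triples, predicate):
    groups = {}  # ?o -> sampled ?s  (group by ?o with SAMPLE(?subj))
    for s, p, o in triples:
        if p == predicate:
            groups.setdefault(o, s)
    # CONSTRUCT template { ?s content:field ?o }: one triple per group
    return [(s, predicate, o) for o, s in groups.items()]


result = construct_unique_by_object(triples, FIELD)
for t in result:
    print(t)
```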

Number of triples of specific group instances?

I have found another problem while SPARQLing DBpedia. I am trying to get the number of triples for a specific group of class instances.
Number of triples of class Politician:
SELECT * WHERE {?s ?p ?o FILTER (?s = dbo:Politician OR ?o = dbo:Politician)}
But what about the total number of triples for a specific group of politicians, for example the number of triples about German politicians? How is it possible to get that?
Thank you for your help!
revised answer
This will get the count of entities who are described as being Politicians from Germany:
SELECT (COUNT(*) AS ?count)
{ ?s a dbo:Politician .
  ?s dbo:nationality dbr:Germany .
}
And this will get the count of all records where those entities who are described as being Politicians from Germany appear as Subject:
SELECT (COUNT(*) AS ?count)
{ ?s a dbo:Politician .
  ?s dbo:nationality dbr:Germany .
  ?s ?p ?o .
}
It is possible that you're looking for a bit more info, to include all records where the entities who are described as being Politicians from Germany appear as either Subject or Object (not just as Subject):
SELECT (COUNT(*) AS ?count)
{ { ?s a dbo:Politician .
    ?s dbo:nationality dbr:Germany .
    ?s ?p ?o .
  }
  UNION
  { ?o a dbo:Politician .
    ?o dbo:nationality dbr:Germany .
    ?s ?p ?o .
  }
}
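One detail worth knowing about the UNION query: SPARQL UNION keeps duplicates, so a triple whose subject and object are both German politicians is counted once in each branch. The Python toy model below shows the effect on a handful of hypothetical triples (the short names stand in for dbo:/dbr: IRIs).

```python
# Toy model of the UNION count: triples with a German politician as subject,
# plus triples with one as object. UNION keeps duplicates, so a triple linking
# two German politicians is counted twice. All names are hypothetical.

triples = {
    ("merkel", "type", "Politician"), ("merkel", "nationality", "Germany"),
    ("scholz", "type", "Politician"), ("scholz", "nationality", "Germany"),
    ("merkel", "successor", "scholz"),
    ("berlin", "mayor", "scholz"),
}

german_politicians = {
    s for (s, p, o) in triples
    if p == "type" and o == "Politician" and (s, "nationality", "Germany") in triples
}

as_subject = [t for t in triples if t[0] in german_politicians]  # first branch
as_object = [t for t in triples if t[2] in german_politicians]   # second branch

union_count = len(as_subject) + len(as_object)          # what COUNT(*) over UNION gives
distinct_count = len(set(as_subject) | set(as_object))  # each triple counted once
print(union_count, distinct_count)  # 7 6
```

Whether the double-counted triples matter depends on what the count is meant to measure.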
original answer
I think you are currently aiming for this, which counts all triples with dbo:Politician as either Subject or Object (which is currently 41105, without timeout), but note that this query doesn't count "entities which are politicians" which is (I think) what you're really after!
SELECT ( COUNT ( * ) AS ?NumberOfTriples )
WHERE
{ { dbo:Politician ?p ?o }
UNION
{ ?s ?p dbo:Politician }
}
If you want to count the number of "entities which are politicians" (i.e., rdf:type dbo:Politician) (currently 41078), you need a different query, like this:
SELECT ( COUNT ( DISTINCT ?s ) AS ?NumberOfPoliticians )
WHERE
{ ?s rdf:type dbo:Politician }
Looking at the { dbo:Politician ?p ?o } triples should make this clearer:
SELECT *
WHERE
{ dbo:Politician ?p ?o }

Is it possible to Filter Graphs in a way that they at most contain requested Data?

Let me start with an example query to explain my problem:
SELECT ?g ?s ?p ?o WHERE
{
  { GRAPH ?g
    { ?s ?p ?o .
      OPTIONAL { ?s ab:temperature ?temperature . }
      FILTER (?temperature = 20)
      FILTER NOT EXISTS { ?s ab:person ?person }
    }
  }
}
This query gives me all graphs (in this case representing context data) that have a temperature of 20 but no associated person. My problem is that I want to query the graphs for certain optional properties, but they shouldn't have any other properties. At query time I only know the OPTIONAL part; I don't know which additional properties might be there. Is there an easy way to do this with SPARQL, or would it be easier to check after I have received the graph and converted it to an object that my program can handle?
If I understand your question correctly, you are searching for graphs whose subjects have only some properties and no others. In that case I'd run something like this:
SELECT ?g ?s ?p ?o WHERE {
  GRAPH ?g {
    ?s ?p ?o.
    FILTER NOT EXISTS {
      ?s ?bad [] .
      FILTER (?bad NOT IN ( ab:temperature, ... ) )
    }
  }
}
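The FILTER NOT EXISTS pattern above amounts to an allow-list check per graph: a subject is kept only if none of its predicates in that graph falls outside the list. The Python toy model below returns the surviving (graph, subject) pairs rather than full ?g ?s ?p ?o rows; ALLOWED stands in for the answer's ( ab:temperature, ... ) list, and the graph, subject and extra predicate names are hypothetical.

```python
# Toy model of FILTER NOT EXISTS { ?s ?bad [] . FILTER(?bad NOT IN (...)) }
# inside GRAPH ?g. ALLOWED stands in for ( ab:temperature, ... ); the quads
# and the ab:humidity predicate are hypothetical.

ALLOWED = {"ab:temperature"}

quads = [  # (graph, subject, predicate, object)
    ("g1", "ctx1", "ab:temperature", 20),
    ("g2", "ctx2", "ab:temperature", 20),
    ("g2", "ctx2", "ab:humidity", 40),
]


def subjects_with_only_allowed(quads, allowed):
    """(graph, subject) pairs whose every predicate in that graph is allowed."""
    result = set()
    for g, s, _, _ in quads:
        # Is there a ?bad predicate for this ?s in the same graph ?g?
        bad = any(g2 == g and s2 == s and p not in allowed
                  for g2, s2, p, _ in quads)
        if not bad:
            result.add((g, s))
    return result


print(subjects_with_only_allowed(quads, ALLOWED))  # {('g1', 'ctx1')}
```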

How to bind a variable to a queried item in SPARQL

In this simple SPARQL query, I get a list of subjects whose object is 42:
SELECT ?v WHERE { ?v ?p 42 }
If I add ?p as a variable
SELECT ?v ?p WHERE { ?v ?p 42 }
I will get two entities per row: the subject and the predicate.
What if I wanted three entities per row, including the 42? Something like:
SELECT ?v ?p ?m WHERE { ?v ?p (42 as m) }
Another variant is to use BIND, e.g.:
SELECT ?v ?p ?m
WHERE {
  BIND(42 AS ?m)
  ?v ?p ?m
}
The BIND statement simply adds a binding for ?m, which can then be selected for the result set.
In SPARQL 1.1, you can use VALUES for this. You would write
SELECT ?v ?p ?m WHERE {
  values ?m { 42 }
  ?v ?p ?m
}
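Both the BIND and the VALUES variants work the same way: the inline data contributes a single solution with ?m bound to 42, and that solution is joined with the triple pattern ?v ?p ?m, so ?m shows up as a column in every row. Here is a Python toy model of that join, with hypothetical triples.

```python
# Toy model of VALUES ?m { 42 } (or BIND(42 AS ?m)) followed by ?v ?p ?m:
# one inline solution {m: 42} is joined with the triple pattern.
# The triples are hypothetical.

triples = [
    ("ex:a", "ex:answer", 42),
    ("ex:b", "ex:count", 7),
    ("ex:c", "ex:size", 42),
]


def join(left, right):
    """Join two lists of solutions that agree on every shared variable."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]


values = [{"m": 42}]                                          # VALUES ?m { 42 }
pattern = [{"v": s, "p": p, "m": o} for s, p, o in triples]   # ?v ?p ?m
rows = join(values, pattern)
for row in rows:
    print(row["v"], row["p"], row["m"])
```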
Standard SPARQL 1.0 does not really allow that. There may be some implementation-specific extensions for doing it, though.
As a workaround, if the data contains a triple with 42 as an object literal, you can do it e.g. like this:
SELECT ?v ?p ?m { ?v ?p 42, ?m FILTER(?m=42)}
which is equivalent to
SELECT ?v ?p ?m WHERE { ?v ?p 42 . ?v ?p ?m FILTER(?m=42)}
as you can write graph patterns sharing the same subject and predicate with the comma object list notation, and the WHERE keyword is optional.
For efficiency, you want to use basic graph patterns to reduce the working set of triples first, and only then apply FILTER expressions to prune the results further.
You can accomplish this in two ways, using the BINDINGS keyword or using FILTER. (BINDINGS comes from drafts of SPARQL 1.1; in the final SPARQL 1.1 Recommendation it was replaced by VALUES.)
Using BINDINGS
SELECT ?v ?p ?m
WHERE { ?v ?p ?m}
BINDINGS ?m {(42)}
Using FILTER
SELECT ?v ?p ?m
WHERE {
  ?v ?p ?m
  FILTER (?m = 42)
}
select ?v ?p ?m where { ?v ?p ?m . FILTER( ?m = 42 ) }
I know this is roundabout, but I believe this is doable with a subquery.
This is a useful pattern to help you work on the query in the narrow, before you let it loose on your entire dataset:
SELECT ?v ?p ?m WHERE {
  { SELECT (42 AS ?m) WHERE { } }
  ?v ?p ?m .
}