Specification of how MIN/MAX aggregators deal with unbound optional values - sparql

The aggregators min and max skip unbound values, at least in Virtuoso and Stardog. Could somebody point me to where this is defined in the SPARQL 1.1 specification?
For example, given:
insert data {
<http://s1> <http://p1> <http://o1> .
<http://o1> <http://p2> <http://o2> .
<http://x> <http://p1> <http://y> .
}
the query:
select (min(?o2) as ?min) {
?s <http://p1> ?o1 .
optional { ?o1 <http://p2> ?o2 }
}
returns <http://o2>, ignoring the unbound value for ?o2 for ?s = <http://x>.
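To make the reported behaviour concrete, here is a small Python sketch of it. It only models what Virtuoso and Stardog are described as doing above and says nothing about what the specification mandates. The shortened term names (s1, p1, ...) and the helper variables are illustrative assumptions.

```python
# Toy model of the query above: an OPTIONAL (left join) followed by MIN.
# Terms are shortened to plain strings; this is not a SPARQL engine.
triples = [
    ("s1", "p1", "o1"),
    ("o1", "p2", "o2"),
    ("x", "p1", "y"),
]

solutions = []
for s, p, o1 in triples:
    if p != "p1":
        continue
    # OPTIONAL { ?o1 p2 ?o2 }: keep the row even when nothing matches.
    matches = [o for (s2, p2, o) in triples if s2 == o1 and p2 == "p2"]
    if matches:
        for o2 in matches:
            solutions.append({"s": s, "o1": o1, "o2": o2})
    else:
        solutions.append({"s": s, "o1": o1})  # ?o2 stays unbound

# Observed behaviour: only rows where ?o2 is bound contribute to MIN.
bound_values = [row["o2"] for row in solutions if "o2" in row]
minimum = min(bound_values) if bound_values else None
print(len(solutions), minimum)
```

The row for x is still part of the solutions; it just contributes no value to the aggregate.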

Related

SPARQL returning empty result when passing MIN(?date1) from subquery into outer query, with BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate)

<This question is now resolved, see comment by Valerio Cocchi>
I am trying to pass a variable from a subquery to the outer query. The subquery takes the minimum of the dates ?date1 belonging to ?p and returns it as ?minDate. The outer query then takes another date ?date2 belonging to ?p (there is at most one ?date2 for every ?p) and computes the difference in years between ?minDate and ?date2 as an integer. However, ?diffDate comes back blank, i.e. it has no value.
I am using Fuseki version 4.3.2. Here is an example of the query:
SELECT ?p ?minDate ?date2 ?diffDate
{
?p a abc:P;
abc:hasAnotherDate ?date2.
BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate)
{
SELECT ?p (MIN(?date1) as ?minDate)
WHERE
{
?p a abc:P;
abc:hasDate ?date1.
} group by ?p
}
}
and an example of the kind of result I am getting:
| ?p    | ?minDate                             | ?date2                              | ?diffDate |
| <123> | "20012-11-22T00:00:00"^^xsd:dateTime | "2008-08-18T00:00:00"^^xsd:dateTime |           |
I would expect that ?diffDate would give me an integer value. Am I missing something fundamental about how subqueries work in SPARQL?
It seems you have encountered quite an obscure part of the SPARQL spec, namely how BIND works.
Normally SPARQL is evaluated without regard for the position of atoms, i.e.
SELECT *
WHERE {
?a :p1 ?b .
?b :p2 ?c .}
is the same query as:
SELECT *
WHERE {
?b :p2 ?c .
?a :p1 ?b .}
However, BIND is position dependent, so e.g.:
SELECT *
WHERE {
?a :p1 ?b .
BIND(:john AS ?a)}
is not a valid query, whereas:
SELECT *
WHERE {
BIND(:john AS ?a)
?a :p1 ?b .
}
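is entirely valid. A similar ordering rule applies to variables used inside the BIND expression: they must already be bound at the point where the BIND appears, otherwise the expression raises an error and the target variable is left unbound.
See section 10.1 (BIND: Assigning to Variables) of the SPARQL 1.1 Query Language specification for more.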
To go back to your problem, your BIND is using the ?minDate variable before it has been bound, which is why it fails to produce a value for ?diffDate.
This query should do the trick:
SELECT ?p ?minDate ?date2 ?diffDate
{
?p a abc:P;
abc:hasAnotherDate ?date2.
{
SELECT ?p (MIN(?date1) as ?minDate)
WHERE
{
?p a abc:P;
abc:hasDate ?date1.
} group by ?p
}
BIND((YEAR(?minDate) - YEAR(?date2)) AS ?diffDate) #Put the BIND after all the variables it uses are bound.
}
Alternatively, you could evaluate the difference in the SELECT, like so:
SELECT ?p ?minDate ?date2 (YEAR(?minDate) - YEAR(?date2) AS ?diffDate)
{
?p a abc:P;
abc:hasAnotherDate ?date2.
{
SELECT ?p (MIN(?date1) as ?minDate)
WHERE
{
?p a abc:P;
abc:hasDate ?date1.
} group by ?p
}
}
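If it helps to see why the position of the BIND matters, here is a rough Python model of evaluating the group in order. It is only a sketch: the helper names (bind, join, year_diff) are made up for illustration, the years are made-up sample values stored as plain integers, and a real engine does considerably more.

```python
# Toy model: each BIND extends the solutions built so far, in order.

def bind(solutions, target, expr):
    """Extend each solution with target = expr(solution); on error leave it unbound."""
    out = []
    for sol in solutions:
        new = dict(sol)
        try:
            new[target] = expr(sol)
        except KeyError:  # the expression used a variable that is not bound yet
            pass
        out.append(new)
    return out

def join(left, right):
    """Join two lists of solutions on the variables they share."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out

def year_diff(sol):
    return sol["minDate"] - sol["date2"]

outer = [{"p": "123", "date2": 2008}]       # ?p abc:hasAnotherDate ?date2
subquery = [{"p": "123", "minDate": 2012}]  # SELECT ?p (MIN(?date1) AS ?minDate)

# BIND placed before the subquery: ?minDate is not bound yet.
early = join(bind(outer, "diffDate", year_diff), subquery)
# BIND placed after the subquery: both variables are bound.
late = bind(join(outer, subquery), "diffDate", year_diff)

print(early)
print(late)
```

In the early case the row still comes back, just without a ?diffDate, which matches the blank column in the question.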

Is there a good example of how to use SPARQL to replace a substring with another substring across a collection of triples?

I want to edit a set of URIs replacing a substring "iso-693" with "iso-639" using a SPARQL query. I am using REPLACE but it doesn't seem to do anything.
I have a large SKOS taxonomy with URIs that have an incorrect string. They should have this string: "iso-639" but I made a mistake when creating it and put "iso-693". I'd like to correct it. I used the SPARQL query shown below, which when run returns a message "update successful", but none of the triples data actually changes. Where am I going wrong?
INSERT
{
?s ?p ?o2
}
WHERE
{
?s ?p ?o .
FILTER (regex(str(?s), "iso-693") || regex(str(?o), "iso-693"))
BIND(REPLACE(?o, "iso-693", "iso-639", "i") AS ?o2) .
}
I expected all of the occurrences of the substring to change to the desired value, but nothing seems to change at all despite the success message.
You are missing the bit that removes the old value (INSERT just adds new triples). To replace a triple, you should DELETE the old triple at the same time as you INSERT the new one, like this:
DELETE
{
?s ?p ?o
}
INSERT
{
?s ?p ?o2
}
WHERE
{
?s ?p ?o .
FILTER (regex(str(?s), "iso-693") || regex(str(?o), "iso-693"))
BIND(REPLACE(?o, "iso-693", "iso-639", "i") AS ?o2) .
}
If you are targeting URIs then you need to construct new IRIs with the required substitution and use these in the INSERT part of the update, along with the original values for ?s and ?o in the DELETE part. REPLACE produces literals, which are not valid as subjects.
I suggest something along the following lines:
DELETE {
?s ?p ?o
}
INSERT {
?newS ?p ?newO
} WHERE {
?s ?p ?o .
bind("iso-693" as ?match) .
bind("iso-639" as ?replacement) .
bind (regex(str(?s), ?match) as ?subjMatch) .
bind (regex(str(?o), ?match) as ?objMatch) .
filter (?subjMatch || ?objMatch)
bind (if(?subjMatch, IRI(replace(str(?s), ?match, ?replacement)), ?s) as ?newS)
bind (if(?objMatch, IRI(replace(str(?o), ?match, ?replacement)), ?o) as ?newO)
}
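As a rough illustration of the difference between inserting only and deleting plus inserting, here is a small Python model that uses tuples as triples. The example.org IRIs and all function names are assumptions made up for the sketch. Unlike the update above, which tests str(?o) for any term, this sketch only rewrites terms written as <...> and leaves literals alone.

```python
MATCH, REPLACEMENT = "iso-693", "iso-639"

def fix(term):
    """Replace the substring in IRIs (written here as <...>); leave literals alone."""
    if term.startswith("<") and term.endswith(">"):
        return term.replace(MATCH, REPLACEMENT)
    return term

def corrected(triple):
    s, p, o = triple
    return (fix(s), p, fix(o))

def insert_only(triples):
    """INSERT without DELETE: corrected triples are added, the old ones stay."""
    return set(triples) | {corrected(t) for t in triples}

def delete_insert(triples):
    """DELETE + INSERT: every triple is swapped for its corrected form."""
    return {corrected(t) for t in triples}

def has_old(triples):
    """True if any subject or object still contains the misspelled substring."""
    return any(MATCH in s or MATCH in o for s, _, o in triples)

data = {
    ("<http://example.org/iso-693/en>", "<http://example.org/label>", "English"),
    ("<http://example.org/scheme>", "<http://example.org/member>",
     "<http://example.org/iso-693/en>"),
}

print(has_old(insert_only(data)))    # the misspelled IRIs are still there
print(has_old(delete_insert(data)))  # the misspelled IRIs are gone
```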

Return values under same column in SPARQL query

Given three possible predicates, foaf:name, foaf:givenName, and foaf:familyName, where each subject has either foaf:name or foaf:givenName + foaf:familyName, e.g.:
<uri1> <foaf:name> "Lolly Loozles" .
<uri2> <foaf:givenName> "Stotly" .
<uri2> <foaf:familyName> "Styles" .
I am wondering how to write a SPARQL query that returns a new variable, such as pretty_name, that is either the value of foaf:name or a concatenation of the values of foaf:givenName and foaf:familyName.
Resulting in something like:
?o | ?pretty_name
----------------------
<uri1> | Lolly Loozles
<uri2> | Stotly Styles
This is what I have so far, but unsure how to proceed:
PREFIX : <https://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# select two variables, not ideal...
SELECT ?foaf_fullName ?pretty_name
WHERE {
# Find all triples
?s ?p ?o .
# Binds
OPTIONAL { ?s foaf:name ?foaf_fullName }
OPTIONAL { ?s foaf:givenName ?givenName }
OPTIONAL { ?s foaf:familyName ?familyName }
# Filter where predicate is part of list
FILTER (?p IN (foaf:name, foaf:givenName, foaf:familyName ) )
# Binds
BIND( CONCAT(?givenName, ' ', ?familyName) AS ?pretty_name ) .
}
I had imagined, and tried, adding another BIND to add to ?pretty_name, but the SPARQL engine wouldn't have it:
BIND( ?foaf_fullName AS ?pretty_name ) .
I also had luck writing a CONSTRUCT statement to get the values I'm looking for, but don't have the ability to write back to this triplestore (for a number of reasons):
CONSTRUCT {
?s :hasPrettyName ?foaf_fullName .
?s :hasPrettyName ?pretty_name .
}
I had thought that CONSTRUCT could accompany SELECT, but must have been mistaken?
Any insight or suggestions would be much appreciated.
Using @StanislavKralin's suggestion to use COALESCE without IF clauses works great:
PREFIX : <https://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# select two variables, not ideal...
SELECT ?foaf_fullName ?pretty_name
WHERE {
# Find all triples
?s ?p ?o .
# Binds
OPTIONAL { ?s foaf:name ?foaf_fullName }
OPTIONAL { ?s foaf:givenName ?givenName }
OPTIONAL { ?s foaf:familyName ?familyName }
# Filter where predicate is part of list
FILTER (?p IN (foaf:name, foaf:givenName, foaf:familyName ) )
# Binds
BIND( COALESCE(?foaf_fullName, CONCAT(?givenName, ' ', ?familyName)) AS ?pretty_name )
}
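For anyone wondering why COALESCE is enough here, this Python sketch models the idea: COALESCE returns the first argument that evaluates without error, and a CONCAT over an unbound variable counts as an error. The Unbound exception and the helper names are assumptions for illustration, not part of any SPARQL library.

```python
class Unbound(Exception):
    """Raised when an expression touches a variable that has no value."""

def var(row, name):
    """Look up a variable in a solution row, failing if it is unbound."""
    if row.get(name) is None:
        raise Unbound(name)
    return row[name]

def coalesce(*exprs):
    """Return the value of the first expression that evaluates without error."""
    for expr in exprs:
        try:
            return expr()
        except Unbound:
            continue
    return None  # every argument failed, so the result stays unbound

def pretty_name(row):
    return coalesce(
        lambda: var(row, "foaf_fullName"),
        lambda: var(row, "givenName") + " " + var(row, "familyName"),
    )

rows = [
    {"s": "uri1", "foaf_fullName": "Lolly Loozles"},
    {"s": "uri2", "givenName": "Stotly", "familyName": "Styles"},
]
names = [pretty_name(row) for row in rows]
print(names)
```

A row with none of the three properties would simply end up with no pretty name in this model.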

SPARQL multi graph request and sort (Virtuoso 7)

Is it possible to easily make a CONSTRUCT request where I would be able to check data in different graphs AND sort them by "graph preference"?
Let's say I sell products. For each product, I may have different suppliers, so that my setup would look like this:
<http://data.experiment.com/product/1> <http://purl.org/goodrelations/v1#hasCurrencyValue> "10" <http://data.experiment.com/graph/supplier/1> .
<http://data.experiment.com/product/1> <http://purl.org/goodrelations/v1#hasCurrencyValue> "8" <http://data.experiment.com/graph/supplier/2> .
<http://data.experiment.com/product/2> <http://purl.org/goodrelations/v1#hasCurrencyValue> "5" <http://data.experiment.com/graph/supplier/2> .
For each product specification, I want it from <http://data.experiment.com/graph/supplier/1>, then from <http://data.experiment.com/graph/supplier/2> if not found in <http://data.experiment.com/graph/supplier/1>.
This is what I've come up with:
CONSTRUCT
{
<http://data.experiment.com/product/1> ?p ?o .
}
WHERE
{
GRAPH <http://data.experiment.com/graph/supplier/1>
{
OPTIONAL
{
<http://data.experiment.com/product/1> ?p1 ?o1 .
}
}
GRAPH <http://data.experiment.com/graph/supplier/2>
{
OPTIONAL
{
<http://data.experiment.com/product/1> ?p2 ?o2 .
}
}
BIND (IF (BOUND(?p1), ?p1, IF (BOUND(?p2), ?p2, UNDEF)) AS ?p)
BIND (IF (BOUND(?o1), ?o1, IF (BOUND(?o2), ?o2, UNDEF)) AS ?o)
}
It works pretty nicely if I know what I'm looking for. Now if I consider:
CONSTRUCT
{
<http://data.experiment.com/product/1> ?p ?o . ?o ?cp ?co
}
WHERE
{
GRAPH <http://data.experiment.com/graph/supplier/1>
{
OPTIONAL
{
<http://data.experiment.com/product/1> ?p1 ?o1 .
OPTIONAL { ?o1 ?cp1 ?co1 . }
}
}
GRAPH <http://data.experiment.com/graph/supplier/2>
{
OPTIONAL
{
<http://data.experiment.com/product/1> ?p2 ?o2 .
OPTIONAL { ?o2 ?cp2 ?co2 . }
}
}
BIND (IF (BOUND(?p1), ?p1,IF (BOUND(?p2), ?p2, UNDEF)) AS ?p)
BIND (IF (BOUND(?o1), ?o1,IF (BOUND(?o2), ?o2, UNDEF)) AS ?o)
BIND (IF (BOUND(?cp1), ?cp1,IF (BOUND(?cp2), ?cp2, UNDEF)) AS ?cp)
BIND (IF (BOUND(?co1), ?co1,IF (BOUND(?co2), ?co2, UNDEF)) AS ?co)
}
Sometimes it doesn't work because I explicitly BIND ?o, and ?o may be a literal, which cannot be the subject of the ?o ?cp ?co template:
Virtuoso RDF01 Error Bad variable value in CONSTRUCT: "1532610063"
(tag 189 box flags 0) is not a valid subject, only object of a triple
can be a literal
I can't seem to find anyone trying to sort data by "graphs", and I'm struggling to find an "easy" way to do it.
I've tried with SELECT and FROM NAMED, but you still have to manually select data from the graph you want.
If anyone can help, it is more than welcome.
Thank you.
Update to my previous post. The suppliers for a given book are stored in a "default" graph.
# Named graph : http://data.books.com/default
#prefix book: <http://data.books.com/resource/Book/>
#prefix ns: <http://data.books.com/ns#>
book:8780953608758 ns:hasSupplier <http://data.books.com/supplier/Alpha> .
book:8780953608758 ns:hasSupplier <http://data.books.com/supplier/Beta> .
# Named graph : http://data.books.com/supplier/Alpha
#prefix book: <http://data.books.com/resource/Book/>
#prefix price: <http://data.books.com/resource/Price/>
#prefix gr: <http://purl.org/goodrelations/v1#>
#prefix dc: <http://purl.org/dc/terms/>
book:8780953608758 gr:hasPriceSpecification price:8780953608758_FR_EUR .
price:8780953608758_FR_EUR gr:hasCurrencyValue "10" .
book:8780953608758 dc:available 1447632000 .
# Named graph : http://data.books.com/supplier/Beta
#prefix book: <http://data.books.com/resource/Book/>
#prefix price: <http://data.books.com/resource/Price/>
#prefix gr: <http://purl.org/goodrelations/v1#>
#prefix dc: <http://purl.org/dc/terms/>
book:8780953608758 gr:hasPriceSpecification price:8780953608758_FR_USD .
price:8780953608758_FR_USD gr:hasCurrencyValue "8" .
book:8780953608758 dc:available 1547632000 .
The subquery in the query below uses the graph http://data.books.com/default to find and sort all the supplier graphs for the book 8780953608758. The outer query then matches another pattern against the selected supplier graph.
PREFIX book: <http://data.books.com/resource/Book/>
CONSTRUCT
{
book:8780953608758 ?p ?o . ?o ?cp ?co .
}
WHERE
{
{
SELECT ?supplier
FROM <http://data.books.com/default>
WHERE
{
VALUES (?supplier ?priority)
{
(<http://data.books.com/supplier/Beta> 1)
(<http://data.books.com/supplier/Alpha> 2)
}
book:8780953608758 <http://data.books.com/ns#hasSupplier> ?supplier.
}
ORDER BY ?priority
LIMIT 1
}
GRAPH ?supplier
{
book:8780953608758 ?p ?o .
OPTIONAL { ?o ?cp ?co . }
}
}
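The selection logic of that query can be sketched in Python as follows. This is only a model of the idea (pick the highest-priority supplier graph that lists the book, then read triples from that graph alone). The shortened graph and price names, the dictionaries, and the function names are all illustrative assumptions.

```python
# Illustrative data: graph name -> set of (subject, predicate, object) triples.
BOOK = "book:8780953608758"
graphs = {
    "supplier/Alpha": {
        (BOOK, "gr:hasPriceSpecification", "price:FR_EUR"),
        ("price:FR_EUR", "gr:hasCurrencyValue", "10"),
    },
    "supplier/Beta": {
        (BOOK, "gr:hasPriceSpecification", "price:FR_USD"),
        ("price:FR_USD", "gr:hasCurrencyValue", "8"),
    },
}
suppliers_of = {BOOK: {"supplier/Alpha", "supplier/Beta"}}  # the "default" graph
priority = {"supplier/Beta": 1, "supplier/Alpha": 2}        # the VALUES block

def preferred_graph(book):
    """Model of ORDER BY ?priority LIMIT 1 over the suppliers listing the book."""
    candidates = [g for g in suppliers_of.get(book, ()) if g in priority]
    return min(candidates, key=priority.get) if candidates else None

def describe(book):
    """Triples about the book, plus one level of triples about its objects."""
    graph = graphs.get(preferred_graph(book), set())
    direct = {t for t in graph if t[0] == book}
    objects = {o for _, _, o in direct}
    nested = {t for t in graph if t[0] in objects}
    return direct | nested

print(preferred_graph(BOOK))
print(sorted(describe(BOOK)))
```

Because only one graph is ever selected, values from different suppliers are never mixed in the result.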

Is there a way to use a variable without returning it with SPARQL "select *"?

Is there some way to use a kind of placeholder variable with SPARQL without returning it when using SELECT * ?
For example:
SELECT * WHERE {
?s dcterms:title ?title;
foaf:person ?name.
?s2 :inProject ?s.
}
Where I would not want to return the ?s variable, just the ?title, ?name, and ?s2 variables while leaving the SELECT *.
I understand I can limit the select results by using SELECT ?title ?name ..., but I'm just curious if there is some kind of notation for placeholder variables or some way to manage this.
EDIT:
I understand you can use blank nodes to accomplish this in some cases, for example:
SELECT * WHERE {
_:s dcterms:title ?title;
foaf:person ?name.
?s2 :inProject _:s.
}
However, doesn't this cause problems since blank nodes can't be used across basic graph patterns? For example, this breaks in the case:
SELECT * WHERE {
_:s dcterms:title ?title;
foaf:person ?name.
OPTIONAL { ?s2 :inProject _:s. }
}
Thanks!
For variables scoped to a single pattern, yes: use the blank node syntax you demonstrated in your question.
In the general case, no: if you use a variable, SELECT * will return it.
One possible workaround is to use a subquery in which you list the variables you want and leave out the ones you don't, e.g.:
SELECT * WHERE
{
{
SELECT ?title ?name ?s2 WHERE
{
?s dcterms:title ?title;
foaf:person ?name.
OPTIONAL{ ?s2 :inProject ?s. }
}
}
}
But then I assume this is exactly what you are trying to avoid since you end up listing variables anyway.