Update or create numeric counters in SPARQL (UPSERT)

I would like to store a large number of integers in an RDF database (Blazegraph). Those values need to be updated (incremented) with new data or, if missing, created. What's the best way to do that in SPARQL 1.1? If I were writing it in SQL (MySQL/MariaDB), it would be this, assuming the database is empty at first and the table's unique key is set to "s,p" (subject and predicate):
-- Inserting or adding A=10, B=1
INSERT INTO tbl (s,p,o) VALUES ('A','cnt',10), ('B','cnt',1)
ON DUPLICATE KEY UPDATE o=o+VALUES(o);
Resulting RDF data:
:A :cnt 10 .
:B :cnt 1 .
Next run:
-- Inserting or adding A=3, C=5
INSERT INTO tbl (s,p,o) VALUES ('A','cnt',3), ('C','cnt',5)
ON DUPLICATE KEY UPDATE o=o+VALUES(o);
Resulting RDF data:
:A :cnt 13 .
:B :cnt 1 .
:C :cnt 5 .
So the question is: how would one construct a SPARQL update that does a bulk UPSERT based on existing data and increments, and does so efficiently?

PREFIX : <http://example.org/>
DELETE { ?letter :cnt ?outdated }
INSERT { ?letter :cnt ?updated }
WHERE {
VALUES (?letter ?increment) { (:A 10) (:B 1) }
OPTIONAL { ?letter :cnt ?outdated }
BIND ((IF(BOUND(?outdated), ?outdated + ?increment, ?increment)) AS ?updated)
}
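If the increments come from application code, the VALUES rows can be generated instead of typed by hand. The following is a minimal Python sketch, assuming the `http://example.org/` prefix used in the update above and local names that are valid in prefixed names; `build_counter_upsert` is a hypothetical helper, not part of any library.

```python
def build_counter_upsert(increments):
    """Return a SPARQL 1.1 update that adds each increment to :cnt,
    creating the counter when it does not exist yet.

    `increments` maps a local name (e.g. "A") to an integer.
    Assumption: local names are safe to use after the ":" prefix.
    """
    rows = " ".join(f"(:{name} {int(delta)})" for name, delta in increments.items())
    return (
        "PREFIX : <http://example.org/>\n"
        "DELETE { ?letter :cnt ?outdated }\n"
        "INSERT { ?letter :cnt ?updated }\n"
        "WHERE {\n"
        f"  VALUES (?letter ?increment) {{ {rows} }}\n"
        "  OPTIONAL { ?letter :cnt ?outdated }\n"
        "  BIND (IF(BOUND(?outdated), ?outdated + ?increment, ?increment) AS ?updated)\n"
        "}"
    )


# Second run from the question: add 3 to A, create C with 5.
update = build_counter_upsert({"A": 3, "C": 5})
print(update)
```

The whole batch goes into a single update request, so the store evaluates the OPTIONAL lookup once per VALUES row instead of once per round trip.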

Related

SPARQL same predicates with different values

I am using SPARQL to query my data from the triplestore.
In the triplestore I have a form containing a table, the table has different table entries. Each entry has a number of predicates. I want to use the predicates index and cost. My query needs to catch the first 2 rows, indices 0 and 1 and save the cost under a specific variable for each.
So I have tried the following (the ext prefix is omitted):
SELECT DISTINCT ?entry ?priceOne ?priceTwo
WHERE {
?form ext:table ?table.
?table ext:entry ?entry.
?entry ext:index 0;
ext:cost ?priceOne.
?entry ext:index 1;
ext:cost ?priceTwo.
}
This, however, does not return any values. If I remove the second part (with index 1), then I do get ?priceOne. How can I get both values?
Your current query finds each ?entry that has both ext:index values, 0 and 1. You could avoid this by using something like ?entry0 and ?entry1, essentially duplicating your triple patterns.
But typically, you would match alternatives with UNION:
SELECT DISTINCT ?entry ?priceOne ?priceTwo
WHERE {
# a shorter way to specify this, if you don’t need the ?table variable
?form ext:table/ext:entry ?entry .
{
?entry ext:index 0 ;
ext:cost ?priceOne .
}
UNION
{
?entry ext:index 1 ;
ext:cost ?priceTwo .
}
}
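To see why the original pattern matches nothing while the UNION version returns rows, it can help to mimic both on a tiny in-memory table. This is only an illustrative Python sketch with made-up entries (`e0`, `e1`) and costs; it does not query a triplestore.

```python
# Hypothetical data: each table entry has exactly one index and one cost.
entries = {
    "e0": {"index": 0, "cost": 12.5},
    "e1": {"index": 1, "cost": 7.0},
}

# Original query: the SAME ?entry must have index 0 AND index 1.
same_entry = [
    (name, e["cost"], e["cost"])
    for name, e in entries.items()
    if e["index"] == 0 and e["index"] == 1
]

# UNION: each branch matches independently and binds only its own variable.
union_rows = (
    [(name, e["cost"], None) for name, e in entries.items() if e["index"] == 0]
    + [(name, None, e["cost"]) for name, e in entries.items() if e["index"] == 1]
)

print(same_entry)
print(union_rows)
```

Note that each UNION row leaves one of the two price variables unbound. If both prices are needed in the same row, the ?entry0 / ?entry1 variant mentioned above is the one to use.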

Transform multiple columns to records in BigQuery

I'm trying to convert a flat table into a nested table in BigQuery.
I want to take a row and transform some of its columns into 2 fields:
key.name
key.value
For example, if I take this table:
and I want to convert it to the following structure:
You can define this as an array. I would suggest using an array of structs with string fields, so you have only one array:
select unique_key, cast_number, date,
[struct('block' as key, block as value),
struct('iucr' as key, iucr as value),
struct('primary_type' as key, primary_type as value),
. . .
] as key_values
But for what you specifically ask for:
select unique_key, cast_number, date,
['block', 'iucr', 'primary_type', . . . ] as keys,
[block, iucr, primary_type, . . . ] as values
Note that these assume that the values are all strings. You may have to convert some values if they are not.
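The reshaping itself can be sketched outside SQL to make the target structure concrete. Below is a small Python illustration, not BigQuery code: `to_key_values` is a hypothetical helper, the sample row is made up, and the explicit `str()` cast mirrors the note above about converting non-string values.

```python
def to_key_values(row, fixed=("unique_key", "cast_number", "date")):
    """Keep the `fixed` columns and fold every other column into a list
    of {"key", "value"} records, casting non-null values to strings."""
    nested = {col: row[col] for col in fixed}
    nested["key_values"] = [
        {"key": col, "value": None if val is None else str(val)}
        for col, val in row.items()
        if col not in fixed
    ]
    return nested


# Made-up sample row, for illustration only.
row = {
    "unique_key": 1,
    "cast_number": "X1",
    "date": "2020-01-01",
    "block": "001XX N STATE ST",
    "iucr": 110,
}
result = to_key_values(row)
print(result)
```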

Aggregate functions in Sparql query with empty records

I've been trying to run a SPARQL query against https://landregistry.data.gov.uk/app/qonsole# to get results for sold properties.
The query is the following:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT sum(?ukhpi_salesVolume)
WHERE
{ { SELECT ?ukhpi_refMonth ?item
    WHERE
    { ?item ukhpi:refRegion <http://landregistry.data.gov.uk/id/region/haringey> ;
            ukhpi:refMonth ?ukhpi_refMonth
      FILTER ( ?ukhpi_refMonth >= "2019-03"^^xsd:gYearMonth )
      FILTER ( ?ukhpi_refMonth < "2020-03"^^xsd:gYearMonth )
    }
  }
  OPTIONAL
  { ?item ukhpi:salesVolume ?ukhpi_salesVolume }
}
The problem is that the result is empty. However, if I run the same query without the SUM on the 4th line, I can see there are 11 integer records.
My guess is that there is a 12th, empty record that breaks the SUM operation, but SPARQL is not my strongest side, so I'm not sure how to filter it out (and remove any empty records), if that's really the problem.
I've also noticed that most of the other aggregate functions (MIN, MAX, AVG) do not work either. COUNT does, and returns 11.
I actually solved this myself: all that was needed was COALESCE, which apparently exists in SPARQL too.
So:
SELECT sum(COALESCE(?ukhpi_salesVolume, 0))
instead of just
SELECT sum(?ukhpi_salesVolume)
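A rough Python analogy of the behaviour, assuming (as the question suspects) that a twelfth row has no salesVolume: one missing value is enough to make the plain sum fail, while counting or substituting a default still works. The numbers are made up and `coalesce` is a hypothetical stand-in for the SPARQL function.

```python
# 11 made-up sales volumes plus one month with no value (None ~ unbound).
volumes = [120, 98, 143, 110, 87, 131, 125, 99, 104, 117, 92, None]


def coalesce(*values):
    """Return the first value that is not None, similar to SPARQL's COALESCE."""
    return next((v for v in values if v is not None), None)


try:
    naive_total = sum(volumes)  # fails as soon as it reaches the None
except TypeError:
    naive_total = None          # analogous to the empty SUM result

# Replace the missing value with 0 before adding.
safe_total = sum(coalesce(v, 0) for v in volumes)

# Counting only the present values, like COUNT(?ukhpi_salesVolume).
count = sum(1 for v in volumes if v is not None)

print(naive_total, safe_total, count)
```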

How to get UUID() from INSERT Sparql Request?

I would like to know whether it is possible to retrieve the urn: value generated by UUID() when using an INSERT statement in a SPARQL update.
My problem is simple, but I don't know how to solve it using SPARQL:
I would like to store a lot of timestamp values. The same timestamp can appear multiple times, so I guess I can use UUID() to generate a random URI.
I need the urn: from the UUID() function to relate my new triples.
Am I right?
Or is UUID() not the solution?
Thanks for your help.
EDIT:
OK, I should add that I would like to retrieve the data in my Python code.
I am using SPARQLWrapper to run my requests.
If I create one INSERT request like this:
INSERT {
?uuid1 rdf:type capt:ECHANTILLON .
?uuid1 capt:Nom ?uuid1 .
?uuid1 capt:Description '2019-08-07 16:07:34.027636' .
?uuid1 capt:Valeur '1565182189' .
} WHERE {
FILTER NOT EXISTS { ?uuid1 rdf:type capt:ECHANTILLON} .
BIND (UUID() AS ?uuid1) .
};
Then another INSERT request using ?uuid1 from the first:
INSERT {
?uuid2 rdf:type capt:VALEUR .
?uuid2 capt:Nom ?uuid2 .
?uuid2 capt:Description '2019-08-07 16:07:34.027659' .
?uuid2 capt:Valeur '27.0' .
?uuid1 capt:A_Pour_Valeur ?uuid2 .  # <===
} WHERE {
FILTER NOT EXISTS { ?uuid2 rdf:type capt:VALEUR} .
BIND (UUID() AS ?uuid2) .
};
What I want is:
uuid1 = endpoint.setQuery(request_1)
request_2.addSubject(uuid1)
uuid2 = endpoint.setQuery(request_2)
Something like that.
How can I retrieve ?uuid1 from the first request if INSERT does not return this value? I would like to make two requests if possible.
Do I have to combine the two requests into one, or do I have to run a SELECT ?uuid1 query before running the second?
You can't get the value of the UUID directly from the SPARQL update - if you want to retrieve it via SPARQL somehow, you'll have to do a query after you've inserted it - or, of course, you could adapt your second SPARQL update to do the selection for you by querying for the 'correct' UUID in its WHERE clause.
However, in this case, that looks difficult to do, and I think the easiest solution for you is that you don't create the UUID in the SPARQL operation itself, but instead create it in code and then use that in your query string, e.g. by adding a VALUES clause, something like this:
import uuid
# Convert the UUID object to str so it can be concatenated into the query.
uuid1 = str(uuid.uuid1())
update = """INSERT { .... }
WHERE { VALUES ?uuid1 { <urn:uuid:""" + uuid1 + """> }
FILTER NOT EXISTS .... (etc) }"""
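Extending that idea to the two requests from the question, a sketch could generate both IRIs in Python and inject them through VALUES, so the second request can reference the first resource without an intermediate SELECT. The helper names `new_urn` and `build_requests` are hypothetical, the Description and Valeur triples are left out for brevity, and the PREFIX declarations for rdf: and capt: are assumed to be prepended by the caller because the post does not show them.

```python
import uuid


def new_urn():
    """Return a fresh random UUID as a urn:uuid: IRI (without angle brackets)."""
    return f"urn:uuid:{uuid.uuid4()}"


def build_requests():
    """Build both updates around IRIs generated in Python.

    Assumption: PREFIX lines for rdf: and capt: are added by the caller.
    """
    uuid1, uuid2 = new_urn(), new_urn()
    request_1 = f"""INSERT {{
  ?uuid1 rdf:type capt:ECHANTILLON .
  ?uuid1 capt:Nom ?uuid1 .
}} WHERE {{
  VALUES ?uuid1 {{ <{uuid1}> }}
  FILTER NOT EXISTS {{ ?uuid1 rdf:type capt:ECHANTILLON }}
}}"""
    # The second request binds BOTH IRIs, so it can link uuid1 to uuid2.
    request_2 = f"""INSERT {{
  ?uuid2 rdf:type capt:VALEUR .
  ?uuid2 capt:Nom ?uuid2 .
  ?uuid1 capt:A_Pour_Valeur ?uuid2 .
}} WHERE {{
  VALUES (?uuid1 ?uuid2) {{ (<{uuid1}> <{uuid2}>) }}
  FILTER NOT EXISTS {{ ?uuid2 rdf:type capt:VALEUR }}
}}"""
    return uuid1, uuid2, request_1, request_2


uuid1, uuid2, request_1, request_2 = build_requests()
print(request_1)
print(request_2)
```

Each string can then be sent as a separate update with whatever client is already in use. The Python variables `uuid1` and `uuid2` remain available afterwards for any further requests.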

SPARQL for unique set of values from all columns and rows

I have a query that returns several columns, i.e.:
SELECT ?a ?b ?c
WHERE { ... }
Every column variable is an IRI. Obviously this returns a unique row for every combination of column values (note that values may not be unique to a column):
<urn:id:x:1> <urn:id:a:2> <urn:id:j:3>
<urn:id:x:1> <urn:id:a:2> <urn:id:j:4>
<urn:id:x:1> <urn:id:j:4> <urn:id:k:5>
<urn:id:y:2> <urn:id:j:4> <urn:id:k:6>
...
However, all I need are the unique IRIs spanning all rows and columns. i.e.:
<urn:id:x:1>
<urn:id:a:2>
<urn:id:j:3>
<urn:id:j:4>
<urn:id:k:5>
<urn:id:y:2>
<urn:id:k:6>
...
Is it possible to achieve this using SPARQL, or do I need to post-process the results to merge and deduplicate the values? Order is unimportant.
SELECT DISTINCT ?d {
...
VALUES ?i { 1 2 3 }
BIND (if(?i=1, ?a, if(?i=2, ?b, ?c)) AS ?d)
}
What does this do?
The VALUES clause creates three copies of each solution and numbers them with a variable ?i
The BIND clause creates a new variable ?d whose value is ?a, ?b or ?c, depending on whether ?i is 1, 2 or 3 in the given solution
The SELECT DISTINCT ?d returns only ?d and removes duplicates
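The three steps can be mimicked in plain Python to check the logic against the sample rows from the question. This is only a simulation of what the query does, not code that runs against a SPARQL endpoint.

```python
# The four sample solutions (?a, ?b, ?c) from the question.
rows = [
    ("urn:id:x:1", "urn:id:a:2", "urn:id:j:3"),
    ("urn:id:x:1", "urn:id:a:2", "urn:id:j:4"),
    ("urn:id:x:1", "urn:id:j:4", "urn:id:k:5"),
    ("urn:id:y:2", "urn:id:j:4", "urn:id:k:6"),
]

distinct = []
seen = set()
for a, b, c in rows:
    # VALUES ?i { 1 2 3 }: three copies of every solution.
    for i in (1, 2, 3):
        # BIND (if(?i=1, ?a, if(?i=2, ?b, ?c)) AS ?d)
        d = a if i == 1 else (b if i == 2 else c)
        # SELECT DISTINCT ?d: keep each value only once.
        if d not in seen:
            seen.add(d)
            distinct.append(d)

print(distinct)
```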