Increment counter or insert row in one statement, in SQLite - sql

In SQLite, given this database schema
CREATE TABLE observations (
src TEXT,
dest TEXT,
verb TEXT,
occurrences INTEGER
);
CREATE UNIQUE INDEX observations_index
ON observations (src, dest, verb);
whenever a new observation tuple (:src, :dest, :verb) comes in, I want to either increment the "occurrences" column for the existing row for that tuple, or add a new row with occurrences=1 if there isn't already one. In concrete pseudocode:
if (SELECT COUNT(*) FROM observations
WHERE src == :src AND dest == :dest AND verb == :verb) == 1:
UPDATE observations SET occurrences = occurrences + 1
WHERE src == :src AND dest == :dest AND verb == :verb
else:
INSERT INTO observations VALUES (:src, :dest, :verb, 1)
I'm wondering if it's possible to do this entire operation in one SQLite statement. That would simplify the application logic (which is required to be fully asynchronous wrt database operations) and also avoid a double index lookup with exactly the same key. INSERT OR REPLACE doesn't appear to be what I want, and alas there is no UPDATE OR INSERT.

I got this answer from Igor Tandetnik on sqlite-users:
INSERT OR REPLACE INTO observations
VALUES (:src, :dest, :verb,
COALESCE(
(SELECT occurrences FROM observations
WHERE src=:src AND dest=:dest AND verb=:verb),
0) + 1);
It's slightly but consistently faster than dan04's approach.
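For anyone who wants to try the statement above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `observe` helper and the sample tuples are made up for illustration; only the SQL comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observations (
        src TEXT, dest TEXT, verb TEXT, occurrences INTEGER
    );
    CREATE UNIQUE INDEX observations_index
        ON observations (src, dest, verb);
""")

UPSERT = """
    INSERT OR REPLACE INTO observations
    VALUES (:src, :dest, :verb,
            COALESCE((SELECT occurrences FROM observations
                      WHERE src = :src AND dest = :dest AND verb = :verb),
                     0) + 1)
"""

def observe(src, dest, verb):
    # Hypothetical helper. The subquery reads the old count (NULL if the
    # row is absent), COALESCE maps NULL to 0, and OR REPLACE overwrites
    # the row that conflicts on the unique index.
    with conn:
        conn.execute(UPSERT, {"src": src, "dest": dest, "verb": verb})

observe("a", "b", "sees")
observe("a", "b", "sees")
observe("a", "c", "sees")

rows = conn.execute(
    "SELECT src, dest, verb, occurrences FROM observations ORDER BY dest"
).fetchall()
print(rows)
```

Keep in mind that OR REPLACE deletes the conflicting row and inserts a new one, so the rowid changes on every increment.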

Don't know of a way to do it in one statement, but you could try
BEGIN;
INSERT OR IGNORE INTO observations VALUES (:src, :dest, :verb, 0);
UPDATE observations SET occurrences = occurrences + 1 WHERE
src = :src AND dest = :dest AND verb = :verb;
COMMIT;
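A rough sketch of the same two-statement transaction driven from Python's sqlite3 module, assuming an in-memory database. The `observe` helper is a made-up name; the `with conn:` block plays the role of BEGIN/COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observations (
        src TEXT, dest TEXT, verb TEXT, occurrences INTEGER
    );
    CREATE UNIQUE INDEX observations_index
        ON observations (src, dest, verb);
""")

def observe(src, dest, verb):
    # Hypothetical helper: both statements run in one transaction.
    key = {"src": src, "dest": dest, "verb": verb}
    with conn:  # commits on success, rolls back if an exception is raised
        # Creates the row with a zero count only if it does not exist yet.
        conn.execute(
            "INSERT OR IGNORE INTO observations "
            "VALUES (:src, :dest, :verb, 0)", key)
        # The row is now guaranteed to exist, so the increment always hits.
        conn.execute(
            "UPDATE observations SET occurrences = occurrences + 1 "
            "WHERE src = :src AND dest = :dest AND verb = :verb", key)

for _ in range(3):
    observe("a", "b", "sees")
observe("x", "y", "hears")

counts = dict(conn.execute(
    "SELECT src || '>' || dest, occurrences FROM observations"))
print(counts)
```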

Related

Postgres: on conflict, summing two vectors (arrays)

I'm trying to maintain an array-of-counters column in Postgres.
For example, let's say I have this table:
name | counters
Joe  | [1,3,1,0]
and now I'm adding two rows, ("Ben", [1,3,1,0]) and ("Joe", [2,0,2,1]).
I expect the query to sum the two counter vectors element-wise on conflict ([1,3,1,0] + [2,0,2,1] = [3,3,3,1]).
The expected result:
name | counters
Joe  | [3,3,3,1]
Ben  | [1,3,1,0]
I tried this query
insert into test (name, counters)
values ("Joe",[2,0,2,1])
on conflict (name)
do update set
counters = array_agg(unnest(test.counters) + unnest([2,0,2,1]))
but it didn't seem to work, what am I missing?
There are two problems with the expression:
array_agg(unnest(test.counters) + unnest([2,0,2,1]))
there is no + operator for arrays,
you cannot use set-valued expressions as an argument in an aggregate function.
You need to unnest both arrays in a single unnest() call placed in the from clause:
insert into test (name, counters)
values ('Joe', array[2,0,2,1])
on conflict (name) do
update set
counters = (
select array_agg(e1 + e2)
from unnest(test.counters, excluded.counters) as u(e1, e2)
)
Also note the correct literal syntax in values (single-quoted strings and array[...]) and the use of the special record excluded, which holds the row proposed for insertion (see the INSERT documentation).
Test it in db<>fiddle.
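To make the pairing logic concrete without a Postgres instance, here is a small Python model of what the query does. It is only an illustration: `add_counters` and `upsert` are made-up helpers and a dict stands in for the table. The multi-argument unnest() pairs elements by position and pads the shorter array with NULL, which is modelled with zip_longest and None.

```python
from itertools import zip_longest

def add_counters(existing, incoming):
    # Models: select array_agg(e1 + e2) from unnest(a, b) as u(e1, e2).
    # A NULL operand makes the sum NULL in SQL, represented here as None.
    return [
        None if e1 is None or e2 is None else e1 + e2
        for e1, e2 in zip_longest(existing, incoming)
    ]

def upsert(table, name, counters):
    # Models: insert ... on conflict (name) do update set counters = ...
    if name in table:
        table[name] = add_counters(table[name], counters)
    else:
        table[name] = list(counters)

test = {"Joe": [1, 3, 1, 0]}
upsert(test, "Ben", [1, 3, 1, 0])
upsert(test, "Joe", [2, 0, 2, 1])
print(test)
```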
Based on your reply to my comments that it will always be four elements in the array and the update is being done by a program of some type, I would suggest something like this:
insert into test (name, counters)
values (:NAME, :COUNTERS)
on conflict (name) do
update set
counters[1] = test.counters[1] + excluded.counters[1],
counters[2] = test.counters[2] + excluded.counters[2],
counters[3] = test.counters[3] + excluded.counters[3],
counters[4] = test.counters[4] + excluded.counters[4]

How to write Azure storage table queries for non-existent columns

We have a storage table to which we want to add a new integer column (it is in fact an enum of 3 values converted to int). We want a row to be returned when either:
It is an older row and the column does not exist
It is a new row, the column exists, and its value does not match a particular value
When I just use a not-equal operator on the column, the old rows do not get returned. How can this be handled?
Update
Assuming a comparison always returns false for a non-existent column, I tried something like the code below (the value of the property will always be > 0 when it exists), which does not work either:
If the (Prop GreaterThanOrEqual -1) condition returns false, I assume the value is null.
If not, the actual comparison happens.
string propNullCondition = TableQuery.GenerateFilterConditionForInt(
"Prop",
QueryComparisons.GreaterThanOrEqual,
-1);
propNullCondition = $"{TableOperators.Not}({propNullCondition})";
string propNotEqualValueCondition = TableQuery.CombineFilters(
propNullCondition,
TableOperators.Or,
TableQuery.GenerateFilterConditionForInt(
"Prop",
QueryComparisons.NotEqual,
XXXX));
Note: The table rows written so far do not have "Prop"; only new rows will have this column. The expectation is that the query returns all old rows, and new rows only when Prop != XXXX.
Your code looks correct; there may be a minor error elsewhere. The code below works fine in my test:
Note: in the filter, the column name is case-sensitive.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("test1");
string propNullCondition = TableQuery.GenerateFilterConditionForInt(
"prop1", // note: the column name is case-sensitive here
QueryComparisons.GreaterThanOrEqual,
-1);
propNullCondition = $"{TableOperators.Not}({propNullCondition})";
TableQuery<DynamicTableEntity> propNotEqualValueCondition = new TableQuery<DynamicTableEntity>()
.Where(
TableQuery.CombineFilters(
propNullCondition,
TableOperators.Or,
TableQuery.GenerateFilterConditionForInt(
"prop1", // note: the column name is case-sensitive here
QueryComparisons.NotEqual,
2)));
var query = table.ExecuteQuery(propNotEqualValueCondition);
foreach (var q in query)
{
Console.WriteLine(q.PartitionKey);
}
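The intent of the combined filter can be modelled in a few lines of Python. This is a sketch of the logic only, not of the Azure SDK: it assumes, as the question does, that any comparison against a missing property evaluates to false, and the `matches` helper and sample entities are made up.

```python
def matches(entity, prop="prop1", excluded_value=2):
    value = entity.get(prop)
    # Assumption taken from the question: comparing a missing property
    # is always false, whichever operator is used.
    ge_minus_one = value is not None and value >= -1
    not_equal = value is not None and value != excluded_value
    # not (prop ge -1)  or  (prop ne excluded_value)
    return (not ge_minus_one) or not_equal

entities = [
    {"PartitionKey": "old"},                # written before the column existed
    {"PartitionKey": "new-2", "prop1": 2},  # has the excluded value
    {"PartitionKey": "new-1", "prop1": 1},  # has a different value
]
selected = [e["PartitionKey"] for e in entities if matches(e)]
print(selected)
```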

Can Slick's insertOrUpdate modify a subset of columns in the event the record already exists?

In my use case I have a createdDate field that I would like to preserve in the event that the record already exists.
case class Record(id:Long, value:String, createdDate:DateTime, updateDate:DateTime)
Is it possible to use a TableQuery.insertOrUpdate(record) such that only parts of the record are updated in the event the record already exists?
In my case I'd want only the value and updateDate fields to change. Using plain SQL in a stored procedure I'd do something like:
merge Record r
using (
    select @id,
           @value
) as source (
    id,
    value
)
on r.id = source.id
when matched then
    update set value = source.value, updateDate = getDate()
when not matched then
    insert (id, value, createdDate, updateDate) values
    (source.id, source.value, getDate(), getDate());
Can Slick's insertOrUpdate modify a subset of columns?
No, I don't believe this is possible with the insertOrUpdate function. This has been requested as a feature but it is not currently implemented.
How can we work around this?
Since the update function does support updating a specific list of columns, we can write our own upsert logic instead of using the insertOrUpdate function. It might work like this:
def insertOrUpdate(record: Record): Future[Int] = {
val insertOrUpdateAction = for {
recordOpt <- records.filter(_.id === record.id).result.headOption
updateAction = recordOpt.map(_ => updateRecord(record))
action <- updateAction.getOrElse(insertRecord(record))
} yield action
connection.run(insertOrUpdateAction)
}
private def updateRecord(record: Record) = {
val query = for {
r <- records.filter(_.id === record.id)
} yield (r.value, r.updateDate) // list of columns which can be updated
query.update(record.value, record.updateDate)
}
private def insertRecord(record: Record) = records += record
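The same look-up-then-branch shape can be tried outside Slick. The sketch below uses Python's sqlite3 module with a made-up `record` table and a hypothetical `insert_or_update` helper. It only illustrates the idea that the update path touches a subset of columns, so the created date survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE record (
        id INTEGER PRIMARY KEY,
        value TEXT,
        created_date TEXT,
        update_date TEXT
    )
""")

def insert_or_update(record_id, value, now):
    # Hypothetical helper: look the row up, then update only the mutable
    # columns or insert a full row, all within one transaction.
    with conn:
        exists = conn.execute(
            "SELECT 1 FROM record WHERE id = ?", (record_id,)
        ).fetchone()
        if exists:
            conn.execute(
                "UPDATE record SET value = ?, update_date = ? WHERE id = ?",
                (value, now, record_id))
        else:
            conn.execute(
                "INSERT INTO record VALUES (?, ?, ?, ?)",
                (record_id, value, now, now))

insert_or_update(1, "first", "2020-01-01")
insert_or_update(1, "second", "2020-06-01")
row = conn.execute("SELECT * FROM record WHERE id = 1").fetchone()
print(row)
```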

sqlite3 UPDATE generating nulls

I'm trying to transition from MySQL to SQLite3 and running into an update problem. I'm using SQLite 3.6.20 on Red Hat.
My first line of code behaves normally
update atv_covar set noncomp= 2;
All values for noncomp (in the rightmost column) are appropriately set to 2.
select * from atv_covar;
A5202|S182|2
A5202|S183|2
A5202|S184|2
It is the second line of code that gives me problems:
update atv_covar
set noncomp= (select 1 from f4003 where
atv_covar.study = f4003.study and
atv_covar.rpid = f4003.rpid and
(rsoffrx="81" or rsoffrx="77"));
It runs without generating errors and appropriately sets atv_covar.noncomp to 1 where it matches the SELECT statement. The problem is that it changes atv_covar.noncomp for the non-matching rows to null, where I want it to keep them as 2.
select * from atv_covar;
A5202|S182|
A5202|S183|1
A5202|S184|
Any help would be welcome.
@Dan, the problem with your query is not specific to SQLite; you are updating all rows of atv_covar, but not all of them have a matching row in f4003, so for those rows the subquery returns NULL. You should filter the update or provide a default value.
The following statement sets 1 only for the rows that match the filtering condition:
UPDATE atv_covar
SET noncomp = 1
WHERE EXISTS (
SELECT 'x'
FROM f4003
WHERE atv_covar.study = f4003.study
AND atv_covar.rpid = f4003.rpid
AND (rsoffrx = '81' OR rsoffrx = '77')
);
The following statement sets 1 or 2 for all rows of noncomp, depending on the filtering match (use this instead of two updates):
UPDATE atv_covar
SET noncomp = COALESCE((
SELECT 1
FROM f4003
WHERE atv_covar.study = f4003.study
AND atv_covar.rpid = f4003.rpid
AND (rsoffrx = '81' OR rsoffrx = '77')
), 2);
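The behaviour is easy to reproduce with Python's sqlite3 module. The sketch below assumes column types and a single matching row in f4003 (both invented for the demo), first showing the NULLs from the unfiltered update and then the filtered version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atv_covar (study TEXT, rpid TEXT, noncomp INTEGER);
    CREATE TABLE f4003 (study TEXT, rpid TEXT, rsoffrx TEXT);
    INSERT INTO atv_covar VALUES
        ('A5202', 'S182', 2), ('A5202', 'S183', 2), ('A5202', 'S184', 2);
    INSERT INTO f4003 VALUES ('A5202', 'S183', '81');
""")

def noncomp_values():
    cur = conn.execute("SELECT noncomp FROM atv_covar ORDER BY rpid")
    return [r[0] for r in cur]

MATCH = """
    SELECT 1 FROM f4003
    WHERE atv_covar.study = f4003.study
      AND atv_covar.rpid = f4003.rpid
      AND (rsoffrx = '81' OR rsoffrx = '77')
"""

# Unfiltered update: rows without a match receive the empty subquery
# result, which is NULL.
conn.execute(f"UPDATE atv_covar SET noncomp = ({MATCH})")
unfiltered = noncomp_values()

# Reset, then update only the rows for which a match exists.
conn.execute("UPDATE atv_covar SET noncomp = 2")
conn.execute(f"UPDATE atv_covar SET noncomp = 1 WHERE EXISTS ({MATCH})")
filtered = noncomp_values()

print(unfiltered, filtered)
```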

Creating a new table from grouped substring of existing table

I am having some trouble writing some SQL (for SQL Server 2008).
I have a table of tasks whose names are priority-ordered, comma-delimited lists:
Id = 1, LongTaskName = "a,b,c"
Id = 2, LongTaskName = "a,c"
Id = 3, LongTaskName = "b,c"
Id = 4, LongTaskName = "a"
etc...
I am trying to build a new table that groups them by the first task, along with the id:
GroupName: "a", TaskId: 1
GroupName: "a", TaskId: 2
GroupName: "a", TaskId: 4
GroupName: "b", TaskId: 3
Here is the naive, slow LINQ code:
foreach(var t in Tasks)
{
var gt = new GroupedTasks();
gt.TaskId = t.Id;
var firstWord = t.LongTaskName.Split(',');
if(firstWord.Count() > 0)
{
gt.GroupName = firstWord.First();
}
else
{
gt.GroupName = t.LongTaskName;
}
GroupedTasks.InsertOnSubmit(gt);
}
I wrote a SQL function to do the string split:
create function fn_Split(
    @String nvarchar (4000),
    @Delimiter nvarchar (10)
)
returns nvarchar(4000)
begin
    declare @FirstComma int
    set @FirstComma = charindex(@Delimiter, @String)
    if (@FirstComma = 0)
        return @String
    return substring(@String, 0, @FirstComma)
end
go
However, I am getting stuck on the real sql to do the work.
I can get the group by alone:
SELECT dbo.fn_Split(LongTaskName, ',')
FROM [dbo].[Tasks]
GROUP BY dbo.fn_Split(LongTaskName, ',')
And I know I need to head down something like this:
DECLARE @RowSet TABLE (GroupName nvarchar(1024), Id nvarchar(5))
insert into @RowSet
select ???
FROM [dbo].Tasks as T
INNER JOIN
(
SELECT dbo.fn_Split(LongTaskName, ',')
FROM [dbo].[Tasks]
GROUP BY dbo.fn_Split(LongTaskName, ',')
) G
ON T.??? = G.???
ORDER BY ???
INSERT INTO dbo.GroupedTasks(GroupName, Id)
select * from @RowSet
But I am not quite grokking how to reference the grouped relationships and am confused about having to call split multiple times.
Any thoughts?
If you only care about the first item in the list, there's really no need for a function. I would recommend the following. You also don't need the @RowSet table variable for any temporary holding.
INSERT dbo.GroupedTasks(GroupName, Id)
SELECT
LEFT(LongTaskName, COALESCE(NULLIF(CHARINDEX(',', LongTaskName)-1, -1), 1024)),
Id
FROM dbo.Tasks;
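The "everything before the first comma, or the whole string if there is no comma" rule that the LEFT/CHARINDEX expression implements can be checked quickly in Python. This is only an illustration of the rule; `first_task` is a made-up helper and the sample rows come from the question.

```python
def first_task(long_task_name):
    # str.partition returns the whole string as the head when the
    # delimiter is absent, which mirrors the fallback in the SQL.
    head, _, _ = long_task_name.partition(",")
    return head

tasks = {1: "a,b,c", 2: "a,c", 3: "b,c", 4: "a"}
grouped = [(first_task(name), task_id) for task_id, name in tasks.items()]
print(grouped)
```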
It is even easier if the tasks are 1-character long, you can use LEFT(LongTaskName, 1) instead of the ugly SUBSTRING/CHARINDEX mess. But I'm guessing your task names are not one character long (if this is the case, you should include some data that varies a bit so that others don't make assumptions about length).
Now, keep in mind that you'll have to do something like this to keep dbo.GroupedTasks up to date every time a dbo.Tasks row is inserted, updated or deleted. How are you going to keep these two tables in sync?
More to the point, you should consider storing the top priority task separately in the first place, either by using a computed column or separating it out before the insert. Munging data together is something that you do with hash tables and arrays in application code, but it rarely has any positive attributes inside a database. You almost always spend more time and effort extracting the data apart than you ever saved by keeping it together in the first place. This will negate the need for a second table at all.
SELECT Id, dbo.fn_Split(LongTaskName, ',') AS GroupName INTO TasksWithGroupInfo FROM dbo.Tasks;
Does this answer your question?