Prevent consecutive duplicate values without a trigger - sql

Within a group, I'd like to prevent INSERTs of consecutive duplicate values, where "consecutive" is defined by a simple ORDER BY clause.
Imagine a set of experiments, each of which regularly samples values from a sensor. We only want to insert a value if it is new for that experiment.
Note that older values are allowed to be duplicates. So this is allowed:
id | experiment | value
---|------------|------
 1 | A          | 10
 2 | A          | 20
 3 | A          | 10
but this is not:
id | experiment | value
---|------------|------
 1 | A          | 10
 2 | A          | 10
I know how to find the previous value per experiment:
SELECT
    *,
    lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
    PARTITION BY experiment
    ORDER BY id
);
From the docs I know that CHECK constraints are not allowed to use other rows in their checking:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved). This would cause a database dump and reload to fail. The reload could fail even when the complete database state is consistent with the constraint, due to rows not being loaded in an order that will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints to express cross-row and cross-table restrictions.
If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that. (This approach avoids the dump/reload problem because pg_dump does not reinstall triggers until after reloading data, so that the check will not be enforced during a dump/reload.)
The EXCLUDE constraint looks promising, but is primarily for cases where the test is not equality. And I'm not sure if I can include window functions in there.
So I'm left with a custom trigger but this seems like a bit of a hack for what seems like a fairly common use case.
Can anyone improve on using a trigger?
Ideally, I'd like to be able to just say:
INSERT ....
ON CONFLICT DO NOTHING
and have Postgres deal with the rest!
Minimal working example
BEGIN;

CREATE TABLE new_samples (
    id INT GENERATED ALWAYS AS IDENTITY,
    experiment VARCHAR,
    sample_value INT
);

INSERT INTO new_samples(experiment, sample_value)
VALUES
    ('A', 1),
    -- This is fine because they are for different groups
    ('B', 1),
    -- This is fine because the value has changed
    ('A', 2),
    -- This is fine because it's different to the previous value in
    -- experiment A.
    ('A', 1),
    -- This is not allowed because it's the same as the value
    -- before it, within this experiment.
    ('A', 1);
SELECT
    *,
    lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
    PARTITION BY experiment
    ORDER BY id
);
ROLLBACK;

If the samples will not change, then the restriction cited in the docs will not be relevant to your use case.
You can create a function to accomplish this:
-- Returns false only when the new value equals the latest existing value for
-- the experiment. For the first row of an experiment the query returns no
-- rows, i.e. NULL, which a CHECK constraint treats as passing.
create or replace function check_new_sample(_experiment text, _sample_value int)
returns boolean as
$$
    select _sample_value != first_value(sample_value)
                            over (partition by experiment
                                      order by id desc)
      from new_samples
     where experiment = _experiment;
$$ language sql;
alter table new_samples add constraint new_samples_ck_repeat
check (check_new_sample(experiment, sample_value));
Example inserts:
insert into new_samples (experiment, sample_value) values ('A', 1);
INSERT 0 1
insert into new_samples (experiment, sample_value) values ('B', 1);
INSERT 0 1
insert into new_samples (experiment, sample_value) values ('A', 2);
INSERT 0 1
insert into new_samples (experiment, sample_value) values ('A', 1);
INSERT 0 1
insert into new_samples (experiment, sample_value) values ('A', 1);
ERROR: new row for relation "new_samples" violates check constraint "new_samples_ck_repeat"
DETAIL: Failing row contains (5, A, 1).
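For comparison, here is what the trigger route mentioned by the docs might look like. This is a minimal sketch, assuming the same new_samples table; the function and trigger names are mine, and EXECUTE FUNCTION needs Postgres 11+ (older versions use EXECUTE PROCEDURE):
CREATE OR REPLACE FUNCTION reject_consecutive_duplicate()
RETURNS trigger AS
$$
BEGIN
    -- Reject the insert if it matches the latest value for this experiment.
    -- For the first row of an experiment the subquery yields NULL, the
    -- comparison is not true, and the insert goes through.
    IF NEW.sample_value = (SELECT sample_value
                             FROM new_samples
                            WHERE experiment = NEW.experiment
                            ORDER BY id DESC
                            LIMIT 1) THEN
        RAISE EXCEPTION 'consecutive duplicate value % for experiment %',
                        NEW.sample_value, NEW.experiment;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER new_samples_no_consecutive_dupes
BEFORE INSERT ON new_samples
FOR EACH ROW EXECUTE FUNCTION reject_consecutive_duplicate();
Like the CHECK-based version, this is the one-time check at insertion that the docs describe, not a continuously maintained guarantee.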

Related

How to do a Cross column unique constraint in SQL (Oracle)

How can I have a unique constraint in an Oracle DB over two columns, so that a duplicate must not occur in either one column or the other?
Assume this table
|id | A | B |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 3 | 4 |
I want that a new row is not allowed to have, in column "A", a value that duplicates a value from column "A" or "B".
In the example above: I am allowed to add 5 to column "A" but not 1, 2, 3, or 4.
My idea was to do something like:
CREATE UNIQUE INDEX crossTest ON test (
SELECT t.A AS x FROM test t
UNION ALL
SELECT t.B AS x FROM test t
)
but it does not work because Oracle does not accept this syntax.
The two classic approaches do not work:
having two unique constraints, CREATE UNIQUE INDEX uidxA ON test (A) and CREATE UNIQUE INDEX uidxB ON test (B), fails because then I could still add 2 and 4 to column "A";
having a unique constraint over two columns, CREATE UNIQUE INDEX uidxAB ON test (A, B), fails because it only checks existing pairs.
(Bonus question: "A" and "B" of the same row should be allowed to be equal.)
SQL scripts for the example
CREATE TABLE test (id NUMBER (10) NOT NULL, a VARCHAR2(12), b VARCHAR2(12));
INSERT INTO test (id,a,b) VALUES(1, '1', '2');
INSERT INTO test (id,a,b) VALUES(2, '3', '4');
INSERT INTO test (id,a,b) VALUES(3, '4', 'x'); -- should fail
INSERT INTO test (id,a,b) VALUES(3, '5', 'x'); -- should work
@Tejash's answer gave me an idea to avoid locking or serialization. You can create an auxiliary table duet_index to produce the extended data set with all rows. Then a simple trigger will do the trick, including your bonus question.
For example:
create table duet_index (
    n number,
    constraint uq1 unique (n)
);
And then the trigger:
create or replace trigger test_trg
before insert on test
for each row
begin
    insert into duet_index (n) values (:new.a);
    if (:new.a <> :new.b) then
        insert into duet_index (n) values (:new.b);
    end if;
end;
Please consider I'm not proficient at writing Oracle triggers. The syntax can be wrong, but the idea should fly.
I've been working with Oracle for decades now and I don't recall having such a requirement. It makes me nervous about your data model.
What you want to do cannot be done with a single index. Trigger-based approaches are going to have trouble working correctly in all multi-user cases. A materialized-view approach seems promising.
My suggestion is to create a materialized view that refreshes on commit and that contains a concatenation (UNION ALL) of the column A and column B values.
Here is what I mean (see comments in code for more details):
create table test1 ( id number not null primary key, a number, b number );
insert into test1 values ( 1, 1, 2);
insert into test1 values ( 2, 3, 4);
commit;
-- Create a snapshot log, to allow us to create a REFRESH FAST ON COMMIT snapshot...
create snapshot log on test1 with primary key, rowid;
-- And create that snapshot... this will be updated when any changes to TEST1 are committed
create materialized view test1_concat
refresh fast on commit
as
select t1.rowid row_id, 1 as marker, t1.a concatenation from test1 t1
union all
select t2.rowid row_id, 2 as marker, t2.b concatenation from test1 t2
-- this next bit allows a = b in single rows (i.e., bonus question)
where t2.a != t2.b;
-- Now, enforce the constraint on our snapshot to prevent cross-column duplicates
create unique index test1_concat_u1 on test1_concat ( concatenation );
-- Test #1 -- column a may equal column b without error (bonus!)
insert into test1 values ( 3, 5, 5);
commit;
-- Test #2 uniqueness enforced
insert into test1 values ( 4, 6, 1);
-- (no error at this point)
commit;
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (APPS.TEST1_CONCAT_U1) violated
Drawbacks
1. There is a scalability issue here. Oracle will synchronize on the commit. Every working solution to your problem will have this drawback, I believe.
2. You do not get an error until the transaction tries to commit, at which point it is impossible to correct and recover the transaction. I believe you cannot solve this drawback in any solution without making drawback #1 much worse (i.e., without much more extensive and longer-lasting locks on your table).
I suggest fixing your data model, so the values are in rows rather than columns:
CREATE TABLE test (
    id NUMBER(10) NOT NULL,
    type VARCHAR2(1) CHECK (type IN ('A', 'B')),
    value VARCHAR2(12),
    UNIQUE (value),
    UNIQUE (id, type)
);
The unique constraint is then easy.
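For illustration, the question's sample data in this remodeled table would look like the following; these inserts are my own example built on the answer's schema, not part of the original:
INSERT INTO test (id, type, value) VALUES (1, 'A', '1');
INSERT INTO test (id, type, value) VALUES (1, 'B', '2');
INSERT INTO test (id, type, value) VALUES (2, 'A', '3');
INSERT INTO test (id, type, value) VALUES (2, 'B', '4');
-- Fails with ORA-00001: '4' already exists (it was "column B" of id 2)
INSERT INTO test (id, type, value) VALUES (3, 'A', '4');
Note that UNIQUE (value) also forbids the bonus case of equal "A" and "B" values within one row, so this design trades the bonus requirement for simplicity.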
Not possible using INDEX or CONSTRAINT. You need a trigger, something like this:
CREATE OR REPLACE TRIGGER TEST_TRG
BEFORE INSERT ON TEST
FOR EACH ROW
DECLARE
    CNT NUMBER := 0;
BEGIN
    SELECT COUNT(1) INTO CNT FROM TEST
     WHERE A = :NEW.A OR B = :NEW.A OR A = :NEW.B OR B = :NEW.B;
    IF CNT > 0 THEN
        raise_application_error(-20111, 'This value is not allowed');
    END IF;
END;
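Assuming the trigger compiles, the question's test script should then behave as requested; the inline results below are my expectation, not captured output:
INSERT INTO test (id,a,b) VALUES(1, '1', '2'); -- ok
INSERT INTO test (id,a,b) VALUES(2, '3', '4'); -- ok
INSERT INTO test (id,a,b) VALUES(3, '4', 'x'); -- ORA-20111: This value is not allowed
INSERT INTO test (id,a,b) VALUES(3, '5', 'x'); -- ok
One caveat, as far as I know: because the trigger reads the table it fires on, a multi-row INSERT ... SELECT would raise ORA-04091 (mutating table); single-row inserts like the ones above are fine.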

INSERT OR REPLACE multiple rows, but there is no unique or primary keys

Hi, I'm running into the following problem on SQLite3.
I have a simple table
CREATE TABLE TestTable (id INT, cnt INT);
There are some rows already in the table.
I have some data I want to be inserted into the table: {(id0, cnt0), (id1, cnt1)...}
I want to insert the data into the table and, on id conflict, update TestTable.cnt = TestTable.cnt + value.cnt
(value.cnt is cnt0, cnt1, ...; basically my data to be inserted).
But the problem is, there is no primary or unique constraint on id, and I am not allowed to change that!
What I currently have :
In my program I loop through all the values:
UPDATE TestTable SET cnt = cnt + value.cnt WHERE id = value.id;
if (sqlite3_changes() == 0)
    INSERT INTO TestTable (id, cnt) VALUES (value.id, value.cnt);
But the problem is, with a very large dataset, doing 2 queries for each data entry takes too long. I'm trying to bundle multiple entries together into one call.
Please let me know if you have questions about my description, thank you for helping!
If you are able to create temporary tables, then do the following. Although I don't show it here, I suggest wrapping all this in a transaction. This technique will likely increase efficiency even if you are also able to add a temporary unique index. (In that case you could use an UPSERT with source data in the temporary table.)
CREATE TEMP TABLE data(id INT, cnt INT);
Now insert the new data into the temporary table, whether by using the host-language data libraries or crafting an insert statement similar to
INSERT INTO data (id, cnt)
VALUES (1, 100),
(2, 200),
(5, 400),
(7, 500);
Now update all existing rows using a single UPDATE statement. SQLite does not have a convenient syntax for joining tables or providing a source query for an UPDATE statement, but nested subqueries offer similar convenience:
UPDATE TestTable AS tt
SET cnt = cnt + ifnull((SELECT cnt FROM data WHERE data.id == tt.id), 0)
WHERE tt.id IN (SELECT id FROM data);
Note that the two nested queries are independent of each other. In fact, one could eliminate the WHERE clause altogether and get the same results for this simple case; it just makes the statement more efficient by only attempting to update matching ids. The subquery in the SET clause also matches on id, but alone it would still allow updates of rows without a match, defaulting to NULL and being converted to 0 by the ifnull() function for a no-op. By the way, without the ifnull() function the sum would be NULL and would overwrite non-null values.
Finally, insert only rows with non-existing id values:
INSERT INTO TestTable (id, cnt)
SELECT data.id, data.cnt
FROM data LEFT JOIN TestTable
ON data.id == TestTable.id
WHERE TestTable.id IS NULL;
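As an aside: if you were permitted to add a (later droppable) unique index on TestTable(id), the UPDATE/INSERT pair above could collapse into a single UPSERT driven by the temp table, as mentioned in the opening parenthetical. A sketch, assuming SQLite 3.24+ for ON CONFLICT; note the index creation fails if TestTable already holds duplicate ids:
CREATE UNIQUE INDEX IF NOT EXISTS TestTable_id_uidx ON TestTable(id);

INSERT INTO TestTable (id, cnt)
SELECT id, cnt FROM data WHERE true  -- WHERE true disambiguates the parse
ON CONFLICT (id) DO UPDATE SET cnt = cnt + excluded.cnt;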

Return rows from INSERT with ON CONFLICT without needing to update

I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
    id SERIAL PRIMARY KEY,
    name TEXT,
    CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
---|------
 1 | bob
 2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON     CONFLICT ON CONSTRAINT names_name_key DO UPDATE
   SET    name = NULL
   WHERE  FALSE       -- never executed, but locks the row
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'   -- only executed if no INSERT
LIMIT  1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
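For the single-row case, the pattern behind the first link boils down to retrying until either the SELECT or the INSERT succeeds. A rough sketch; the function name and shape are my paraphrase of that answer, not verbatim:
CREATE OR REPLACE FUNCTION f_name_id(_name text)
  RETURNS int
  LANGUAGE plpgsql AS
$$
DECLARE
   _id int;
BEGIN
   LOOP
      SELECT id FROM names WHERE name = _name
      INTO _id;

      EXIT WHEN FOUND;

      INSERT INTO names (name)
      VALUES (_name)
      ON     CONFLICT (name) DO NOTHING  -- lost a race: loop and select again
      RETURNING id
      INTO _id;

      EXIT WHEN FOUND;
   END LOOP;

   RETURN _id;
END
$$;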
Without concurrent write load
If concurrent writes (from a different session) are not possible, you don't need to lock the row and can simplify:
WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON     CONFLICT ON CONSTRAINT names_name_key DO NOTHING  -- no lock needed
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'   -- only executed if no INSERT
LIMIT  1;

Create Unique Index One/Only/Single NULL

It seems that in SQL Server, unique indexes treat NULLs as 'just another value', unlike the rest of SQL, where comparisons against NULL return NULL.
Say you've got a table (t) with a unique index on a nullable column K:
K V
0 32
1 12
3 45
All good.
But it will also allow
K V
0 32
1 12
3 45
NULL 89 <-- Baaad
And vice versa, it will also allow the following:
K V
NULL 89
0 32 <-- not good
I can see this could be a potential disaster, as I'm using NULL key values to represent values where no further breakdown is possible: having both a total and a breakdown leads to double counting or inconsistency.
I can find seemingly thousands of questions where people want to do the opposite (allow multiple NULLs), but none that want to treat NULLs as NULLs.
How can I get SQL Server to treat NULLs as NULLs (and only allow one NULL or any number of unique values in a column) in a unique index?
If Andomar's interpretation of what you want is correct, it may be doable if you have a table that already contains all possible K values:
create table dbo.T (
    K int null,
    V int not null
)
go
create table dbo.PossibleKs (
    K int not null
)
insert into dbo.PossibleKs (K) values (0),(1),(2)
go
create view dbo.TV
with schemabinding
as
select pk.K
from dbo.T t
inner join dbo.PossibleKs pk
    on t.K = pk.K
    or t.K is null
GO
create unique clustered index IX_TV on dbo.TV (K)
And your test cases:
insert into dbo.T(K,V) values
(0, 32),
(1, 12),
(3, 45)
go
insert into dbo.T(K,V) values
(NULL,89)
--Msg 2601, Level 14, State 1, Line 1
--Cannot insert duplicate key row in object 'dbo.TV' with unique index 'IX_TV'. The duplicate key value is (0).
--The statement has been terminated.
go
delete from dbo.T
go
insert into dbo.T(K,V) values
(NULL,89)
go
insert into dbo.T(K,V) values
(0, 32)
--Msg 2601, Level 14, State 1, Line 1
--Cannot insert duplicate key row in object 'dbo.TV' with unique index 'IX_TV'. The duplicate key value is (0).
--The statement has been terminated.
So you want either one null or any number of unique values. I don't think that can reliably be enforced using constraints.
You could possibly use a trigger. The trigger will have to answer questions like: are you updating a row to null? Is there already a row that is null? Are you updating a row that already was null? That trigger will be complex and hard to maintain.
You could manipulate the table using stored procedures. The stored procedures could do the update/insert/delete operations in a transaction. Before committing, they could check if the table consists of one null or any number of other values. You could reasonably maintain that.
At the end of the day, your design imposes unusual constraints that are hard to implement. Perhaps you could revisit the design.
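A minimal sketch of that stored-procedure idea, assuming the dbo.T table from above and SQL Server 2012+ for THROW; the procedure name and error number are mine, and duplicates among non-NULL K values are assumed to be handled by the question's unique index:
CREATE PROCEDURE dbo.InsertT @K int, @V int
AS
BEGIN
    SET XACT_ABORT ON;  -- any error rolls the whole transaction back

    BEGIN TRANSACTION;

    -- TABLOCKX serializes writers so two sessions cannot both pass the check
    INSERT INTO dbo.T WITH (TABLOCKX) (K, V) VALUES (@K, @V);

    -- Invariant: either a single NULL row, or only non-NULL rows
    IF EXISTS (SELECT 1 FROM dbo.T WHERE K IS NULL)
       AND (SELECT COUNT(*) FROM dbo.T) > 1
        THROW 50001, 'T must hold one NULL row or only non-NULL rows.', 1;

    COMMIT TRANSACTION;
END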

MERGE INTO table containing AUTO_INCREMENT columns

I've declared the following table for use by audit triggers:
CREATE TABLE audit_transaction_ids (id IDENTITY PRIMARY KEY, uuid VARCHAR UNIQUE NOT NULL, `time` TIMESTAMP NOT NULL);
The trigger will get invoked multiple times in the same transaction.
The first time the trigger is invoked, I want it to insert a new
row with the current TRANSACTION_ID() and time.
The subsequent times the trigger is invoked, I want it to return
the existing "id" (I invoke Statement.getGeneratedKeys() to that end)
without altering "uuid" or "time".
The current schema seems to have two problems.
When I invoke MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW()) I get:
org.h2.jdbc.JdbcSQLException: Column "ID" contains null values; SQL statement:
MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES (TRANSACTION_ID(), NOW()) [90081-155]
I suspect that invoking MERGE on an existing row will alter "time".
How do I fix both these problems?
MERGE is analogous to java.util.Map.put(key, value): it will insert the row if it doesn't exist, and update the row if it does. That being said, you can still merge into a table containing AUTO_INCREMENT columns so long as you use another column as the key.
Given customer [id identity, email varchar(30), count int], you could:
merge into customer(id, email, count)
key(email)
values((select max(id) from customer c2 where c2.email='test@acme.com'), 'test@acme.com', 10)
Meaning: re-use the id if a record exists, use null otherwise.
See also https://stackoverflow.com/a/18819879/14731 for a portable way to insert-or-update depending on whether a row already exists.
1. MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW())
If you just want to insert a new row, use:
INSERT INTO audit_transaction_ids (uuid, time) VALUES(TRANSACTION_ID(), NOW())
MERGE without setting a value for the column ID doesn't make sense if ID is used as the key, because that way it could never (even in theory) update an existing row. What you could do is use another key column (in the case above there is no column that could be used). See the documentation for MERGE for details.
2. Invoking MERGE on an existing row will alter "time"
I'm not sure if you talk about the fact that the value of the column 'time' is altered. This is the expected behavior if you use MERGE ... VALUES(.., NOW()), because the MERGE statement is supposed to update that column.
Or maybe you mean that older versions of H2 returned different values within the same transaction (unlike most other databases, which return the same value within the same transaction). This is true, however with H2 version 1.3.155 (2011-05-27) and later, this incompatibility is fixed. See also the change log: "CURRENT_TIMESTAMP() and so on now return the same value within a transaction." It looks like this is not the problem in your case, because you do seem to use version 1.3.155 (the error message [90081-155] includes the build / version number).
Short Answer:
MERGE INTO AUDIT_TRANSACTION_IDS (uuid, time) KEY (uuid, time)
VALUES (TRANSACTION_ID(), NOW());
A little performance tip: make sure uuid is indexed.
Long Answer:
MERGE is basically an UPDATE which INSERTs when no record is found to be updated.
Wikipedia gives a more concise, standardized syntax of
MERGE but you have to supply your own update and insert.
(Whether this will be supported in H2 or not is not mine to answer)
So how do you update a record using MERGE in H2? You define a key to be looked up; if it is found, you update the row (with the column values you supply, and you can use DEFAULT here to reset a column to its default), otherwise you insert the row.
Now what is Null? Null means unknown, not found, undefined: anything which is not what you're looking for.
That is why Null works as a key to be looked up: it means the record is not found.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (Null, 1, 2)
Null has a value. It IS a value.
Now let's see your SQL.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (DEFAULT, 1, 2)
What is that implying? To me, it says
I have this [DEFAULT, 1, 2]; find me a DEFAULT in column id,
then update col1 to 1, col2 to 2, if found.
Otherwise, insert DEFAULT to id, 1 to col1, 2 to col2.
See what I emphasized there? What does that even mean? What is DEFAULT? How do you compare DEFAULT to id?
DEFAULT is just a keyword.
You can do stuff like,
MERGE INTO table1 (id, col1, timeStampCol)
KEY(id) VALUES (Null, 1, DEFAULT)
but don't put DEFAULT in the key column.
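Pulling these answers together for the question's table, one could reuse the generated id via a subselect (as in the first answer) and preserve the original time the same way. This is my own untested synthesis, not from any of the answers above:
MERGE INTO audit_transaction_ids (id, uuid, `time`)
KEY(uuid)
VALUES (
    -- reuse the existing id if this transaction already has a row, else null
    (SELECT max(id) FROM audit_transaction_ids WHERE uuid = TRANSACTION_ID()),
    TRANSACTION_ID(),
    -- keep the original time if present, else stamp the current time
    coalesce((SELECT `time` FROM audit_transaction_ids WHERE uuid = TRANSACTION_ID()), NOW())
);
On the first invocation in a transaction both subselects come up empty, so a new row is inserted with a generated id and the current time; on later invocations the existing id and time are reused, leaving the row effectively unchanged.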