Can you prevent an insert in an after trigger in pgsql? - sql

In postgres, I have an after-insert trigger on a partitionned table that checks if the record is unique or not.
Currently, if the record is not unique, the trigger raises an exception.
This is not what I want because I just want to ignore this case and act as if the insertion actually happened.
Is it possible to configure the trigger to silently fail ?
The surrounding transaction should not fail, but the row should not be inserted.

I have an after-insert trigger on a partitionned table that checks if the record is unique or not.
You could do this as part of your insert statement, using the on conflict do nothing clause. This is simpler and more efficient than using a trigger.
For this to work, you need a unique constraint on the column (or the tuple of columns) whose uniqueness you want to enforce. Assuming that this is column id, you would do:
insert into mytable(id, ...) -- enumerate the target columns here
values(id, ...) -- enumerate the values to insert here
on conflict(id) do nothing
Note that conflict action do nothing does not actually requires specifiying a conflict target, so strictly speaking you could just write this as on conflict do nothing instead. I find that it is always a good idea to specify the conflict target, so the scope is better defined.
If for some reason, you cannot have a unique index on the target column, then another option is to use the insert ... select syntax along with a not exists condition:
insert into mytable(id, ...)
select id, ...
from (values(id, ...)) t(id, ...)
where not exists (select 1 from mytable t1 where t1.id = t.id)

Related

Is there an elegant way to deal with inserting rows with duplicate identifiers?

I'm trying to insert some rows into my table that have the same unique identifier, but all the other fields are different (the rows represent points on a map, and they just happen to have the same name). The final result I'd like to end up with is to somehow modify the offending rows to have unique identifiers (adding on some incrementing number to the identifier, like "name0", "name1", "name2", etc.) during the insertion command.
I'm aware of Postgres's recent addition of "ON CONFLICT" support, but it's not quite what I'm looking for.
According to the Postgres 9.6 Documentation:
The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or ... the alternative conflict_action is taken. ...ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.
What I would like to do is 1) either modify the offending row or the insertion itself and 2) proceed with the insertion (instead of replacing it with an update, like the ON CONFLICT feature does). Is there an elegant way of accomplishing this? Or am I going to need to write something more complex?
You can do this:
create table my_table
(
name text primary key,
some_column varchar
);
create sequence my_table_seq;
The sequence is used to assign a unique suffix to the new row's PK column.
The "insert on conflict insert modified" behaviour can be done like this:
with data (name, some_column) as (
values ('foo', 'bar')
), inserted as (
insert into my_table
select *
from data
on conflict (name) do nothing
returning *
)
insert into my_table (name, some_column)
select concat(name, '_', nextval('my_table_seq')), some_column
from data
where not exists (select 1 from inserted);
The first time you insert a value into the PK column, the insert (in the CTE "inserted") just proceeds. The final insert won't insert anything because the where not exists () prevents that as the inserted returned one row.
The second time you run this, the first insert won't insert anything, and thus the second (final) insert will.
There is one drawback though: if something was inserted by the "inserted" CTE, the the overall statement will report "0 rows affected" because the final insert is the one "driving" this information.

Return rows from INSERT with ON CONFLICT without needing to update

I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
id SERIAL PRIMARY KEY,
name TEXT,
CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
1 | bob
2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Without concurrent write load
If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

What happens with duplicates when inserting multiple rows?

I am running a python script that inserts a large amount of data into a Postgres database, I use a single query to perform multiple row inserts:
INSERT INTO table (col1,col2) VALUES ('v1','v2'),('v3','v4') ... etc
I was wondering what would happen if it hits a duplicate key for the insert. Will it stop the entire query and throw an exception? Or will it merely ignore the insert of that specific row and move on?
The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.
Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. Many related questions and answers to find here.
Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.
Without concurrent writes
I.e.: No other transactions will try to write to the same table at the same time.
Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique:
Select rows which are not present in other table
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ...
One technique to deal with both at once would be EXCEPT:
INSERT INTO tbl (col1, col2)
VALUES
(text 'v1', text 'v2') -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4') -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;
EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL (or one of the other techniques). See:
Using EXCEPT clause in PostgreSQL
Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:
INSERT INTO tbl (col1, col2)
SELECT *
FROM (
SELECT DISTINCT *
FROM (
VALUES
(text 'v1', text'v2')
, ('v3', 'v4')
, ('v3', 'v4') -- dupes in source
) t(c1, c2)
) t
WHERE NOT EXISTS (
SELECT FROM tbl
WHERE col1 = t.c1 AND col2 = t.c2
);
If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
Related:
Select rows which are not present in other table
With concurrent writes
Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:
INSERT INTO tbl (col1,col2)
SELECT DISTINCT * -- still can't insert the same row more than once
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3','v4')
, ('v3','v4') -- you still need to fold dupes in source!
) t(c1, c2)
ON CONFLICT DO NOTHING; -- ignores rows with *any* conflict!
Further reading:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How do I insert a row which contains a foreign key?
Documentation:
The manual
The commit page
The Postgres Wiki page
Craig's reference answer for UPSERT problems:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Will it stop the entire query and throw an exception? Yes.
To avoid that, you can look on the following SO question here, which describes how to avoid Postgres from throwing an error for multiple inserts when some of the inserted keys already exist on the DB.
You should basically do this:
INSERT INTO DBtable
(id, field1)
SELECT 1, 'value'
WHERE
NOT EXISTS (
SELECT id FROM DBtable WHERE id = 1
);

MERGE INTO table containing AUTO_INCREMENT columns

I've declared the following table for use by audit triggers:
CREATE TABLE audit_transaction_ids (id IDENTITY PRIMARY KEY, uuid VARCHAR UNIQUE NOT NULL, `time` TIMESTAMP NOT NULL);
The trigger will get invoked multiple times in the same transaction.
The first time the trigger is invoked, I want it to insert a new
row with the current TRANSACTION_ID() and time.
The subsequent times the trigger is invoked, I want it to return
the existing "id" (I invoke Statement.getGeneratedKeys() to that end)
without altering "uuid" or "time".
The current schema seems to have two problems.
When I invoke MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW()) I get: org.h2.jdbc.JdbcSQLException: Column "ID" contains null values; SQL
statement: MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES
(TRANSACTION_ID(), NOW()) [90081-155]
I suspect that invoking MERGE on an existing row will alter "time".
How do I fix both these problems?
MERGE is analogous to java.util.Map.put(key, value): it will insert the row if it doesn't exist, and update the row if it does. That being said, you can still merge into a table containing AUTO_INCREMENT columns so long as you use another column as the key.
Given customer[id identity, email varchar(30), count int] you could merge into customer(id, email, count) key(email) values((select max(id) from customer c2 where c2.email='test#acme.com'), 'test#acme.com', 10). Meaning, re-use the id if a record exists, use null otherwise.
See also https://stackoverflow.com/a/18819879/14731 for a portable way to insert-or-update depending on whether a row already exists.
1. MERGE INTO audit_transaction_ids (uuid, time) KEY(id) VALUES(TRANSACTION_ID(), NOW())
If you just want to insert a new row, use:
INSERT INTO audit_transaction_ids (uuid, time) VALUES(TRANSACTION_ID(), NOW())
MERGE without setting the value for the column ID doesn't make sense if ID is used as the key, because that way it could never (even in theory) update an existing rows. What you could do is using another key column (in the case above there is no column that could be used). See the documentation for MERGE for details.
2. Invoking MERGE on an existing row will alter "time"
I'm not sure if you talk about the fact that the value of the column 'time' is altered. This is the expected behavior if you use MERGE ... VALUES(.., NOW()), because the MERGE statement is supposed to update that column.
Or maybe you mean that older versions of H2 returned different values within the same transaction (unlike most other databases, which return the same value within the same transaction). This is true, however with H2 version 1.3.155 (2011-05-27) and later, this incompatibility is fixed. See also the change log: "CURRENT_TIMESTAMP() and so on now return the same value within a transaction." It looks like this is not the problem in your case, because you do seem to use version 1.3.155 (the error message [90081-155] includes the build / version number).
Short Answer:
MERGE INTO AUDIT_TRANSACTION_IDS (uuid, time) KEY (uuid, time)
VALUES (TRANSACTION_ID(), NOW());
little performance tip: make sure uuid is indexed
Long Answer:
MERGE is basically an UPDATE which INSERTs when no record found to be updated.
Wikipedia gives a more concise, standardized syntax of
MERGE but you have to supply your own update and insert.
(Whether this will be supported in H2 or not is not mine to answer)
So how do you update a record using MERGE in H2? You define a key to be looked up for, if it is found you update the row (with column names you supply, and you can define DEFAULT here, to reset your columns to its defaults), otherwise you insert the row.
Now what is Null? Null means unknown, not found, undefined, anything which is not what you're looking for.
That is why Null works as key to be looked up for. Because it means the record is not found.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (Null, 1, 2)
Null has a value. it IS a value.
Now let's see your SQL.
MERGE INTO table1 (id, col1, col2)
KEY(id) VALUES (DEFAULT, 1, 2)
What is that implying? To me, it says
I have this [DEFAULT, 1, 2], find me a DEFAULT in column id,
then update col1 to 1, col2 to 2, if found.
otherwise, insert default to id, 1 to col1, 2 to col2.
See what I emphasized there? What does that even mean? What is DEFAULT? How do you compare DEFAULT to id?
DEFAULT is just a keyword.
You can do stuff like,
MERGE INTO table1 (id, col1,
timeStampCol) KEY(id) VALUES (Null, 1,
DEFAULT)
but don't put DEFAULT in the key column.

Delete and Insert or Select and Update

We have a status table. When the status changes we currently delete the old record and insert a new.
We are wondering if it would be faster to do a select to check if it exists followed by an insert or update.
Although similar to the following question, it is not the same, since we are changing individual records and the other question was doing a total table refresh.
DELETE, INSERT vs UPDATE || INSERT
Since you're talking SQL Server 2008, have you considered MERGE? It's a single statement that allows you to do an update or insert:
create table T1 (
ID int not null,
Val1 varchar(10) not null
)
go
insert into T1 (ID,Val1)
select 1,'abc'
go
merge into T1
using (select 1 as ID,'def' as Val1) upd on T1.ID = upd.ID --<-- These identify the row you want to update/insert and the new value you want to set. They could be #parameters
when matched then update set Val1 = upd.Val1
when not matched then insert (ID,Val1) values (upd.ID,upd.Val1);
What about INSERT ... ON DUPLICATE KEY? First doing a select to check if a record exists and checking in your program the result of that creates a race condition. That might not be important in your case if there is only a single instance of the program however.
INSERT INTO users (username, email) VALUES ('Jo', 'jo#email.com')
ON DUPLICATE KEY UPDATE email = 'jo#email.com'
You can use ##ROWCOUNT and perform UPDATE. If it was 0 rows affected - then perform INSERT after, nothing otherwise.
Your suggestion would mean always two instructions for each status change. The usual way is to do an UPDATE and then check if the operation changed any rows (Most databases have a variable like ROWCOUNT which should be greater than 0 if something changed). If it didn't, do an INSERT.
Search for UPSERT for find patterns for your specific DBMS
Personally, I think the UPDATE method is the best. Instead of doing a SELECT first to check if a record already exists, you can first attempt an UPDATE but if no rows are affected (using ##ROWCOUNT) you can do an INSERT.
The reason for this is that sooner or later you might want to track status changes, and the best way to do this would be to keep an audit trail of all changes using a trigger on the status table.