PL/PgSQL: Catching unique violation exceptions and inserting from 'record' variables

I've got a nightly archive job that sweeps existing rows from a records table into records_archive:
INSERT INTO records_archive
SELECT * FROM records WHERE id > 555 AND id <= 556;
For reasons that are outside the scope of this question, there is a column -- master_guid -- in both records and records_archive that has a UNIQUE index, and I can't change that; rebuilding the index as non-unique is off the table for reasons out of my control. So, in principle, every master_guid value is supposed to be unique. There are some bad client implementations out there that sometimes fail to generate sufficiently unique GUIDs, though, which results in a collision when we attempt to insert a record whose master_guid value already exists in records_archive.
I can't fix the clients, so I need to work around it. The way to do that is to catch the unique_violation exception, modify the GUID (adding some random characters to it), and attempt to re-insert.
I can't just wrap the above-mentioned INSERT query in a stored procedure and catch the unique_violation exception, because the whole query is one transaction. I need row-level access. So, I wrote a stored procedure to iterate over each row and catch a unique_violation exception for that row individually:
CREATE OR REPLACE FUNCTION archive_records(start_id bigint,
                                           end_id bigint)
RETURNS void
AS $$
DECLARE
    _r record;
BEGIN
    FOR _r IN
        SELECT * FROM records WHERE id > start_id AND id <= end_id
    LOOP
        BEGIN
            INSERT INTO records_archive VALUES (_r.*);
        EXCEPTION WHEN unique_violation THEN
            -- Manipulate the _r.master_guid value, add some random
            -- numbers to it or whatever, and attempt reinsertion.
        END;
    END LOOP;
END;
$$ LANGUAGE 'plpgsql';
The problem is that records (and records_archive) are insanely wide tables. I don't want to explicitly enumerate every single column to be copied from _r to records_archive, not only because I'm lazy, but because this stored procedure would become a dependency in any future column changes in those tables.
The problem I've got is that this doesn't work, syntactically or conceptually:
INSERT INTO records_archive VALUES (_r.*);
Neither do any of these:
INSERT INTO records_archive _r;
INSERT INTO records_archive _r.*;
Is there a way to pull this off? If not, is there a better way to accomplish what I'm trying to accomplish?

I would create a BEFORE INSERT trigger on records_archive. Do a SELECT on records_archive for the NEW.master_guid and, if it exists, manipulate it to add your random numbers.
You'd probably want a loop around the check to ensure the modified master_guid still doesn't exist before you go ahead with the insertion.
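For example, a minimal sketch of such a trigger (the function and trigger names and the random-suffix scheme are illustrative, not from the original setup):
CREATE OR REPLACE FUNCTION dedupe_master_guid()
RETURNS trigger AS $$
BEGIN
    -- Keep appending random characters until the GUID no longer collides.
    WHILE EXISTS (SELECT 1 FROM records_archive
                  WHERE master_guid = NEW.master_guid) LOOP
        NEW.master_guid := NEW.master_guid || substr(md5(random()::text), 1, 4);
    END LOOP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_archive_dedupe
BEFORE INSERT ON records_archive
FOR EACH ROW EXECUTE PROCEDURE dedupe_master_guid();
With this in place the nightly INSERT ... SELECT needs no per-row loop at all; colliding rows are fixed up as they arrive.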

Related

How to successfully reference another table before insert with a trigger

I'm trying to create a trigger to validate whether a new entry in the table registraties (registrations) contains a valid MNR (employee number), but I'm getting stuck on the part where I reference the table medewerkers (employees).
Could someone help me out?
CREATE OR REPLACE TRIGGER t_MNRcontrole
BEFORE INSERT OR UPDATE
ON registraties
DECLARE
    MNR_medewerkers number (SELECT MNR FROM MEDEWERKERS);
FOR EACH ROW
BEGIN
    IF :new.MNR <> MNR_medewerkers
    THEN raise_application_error(-20111, 'Medewerker niet herkend!');
    END IF;
END;
Error message received is
ORA-24344: success with compilation error
The PL/SQL assignment operator is :=, or select x into y from z to populate from a SQL query.
FOR EACH ROW is part of the trigger spec, not the PL/SQL code.
If :new.mnr is not present in the parent table, you will get a no_data_found exception, not a mismatched variable.
It's good practice for error messages to include details of what failed.
In programming, we use indentation to indicate code structure.
A fixed version would be something like:
create or replace trigger trg_mnrcontrole
    before insert or update on registraties
    for each row
declare
    mnr_medewerkers medewerkers.mnr%type;
begin
    select mw.mnr
    into   mnr_medewerkers
    from   medewerkers mw
    where  mw.mnr = :new.mnr;
exception
    when no_data_found then
        raise_application_error(-20111, 'Medewerker '||:new.mnr||' niet herkend!');
end;
However, we can implement this kind of check better using a foreign key constraint, for example:
alter table registraties add constraint registraties_mw_fk
    foreign key (mnr) references medewerkers (mnr);
MNR_medewerkers number (SELECT MNR FROM MEDEWERKERS);
will always fail, because a query result is not a NUMBER; even if your table happened to contain exactly one row, I am not sure PL/SQL would allow it to pass.
The more standard approach is to first declare the number, then in the code block do a SELECT INTO with a WHERE clause that picks exactly one specific row from the table. Then you can compare that number with the new one.
If, however, you are not trying to compare against one specific row, but are instead checking whether the entry exists in that table, you can use something like:
BEGIN
    SELECT 1
    INTO   m_variable
    FROM   medewerkers
    WHERE  MNR = :new.MNR;
EXCEPTION
    WHEN TOO_MANY_ROWS THEN
        m_variable := 1;
    WHEN OTHERS THEN
        m_variable := 0;
END;
Declare m_variable beforehand, and then report the error if it ends up as 0.
The TOO_MANY_ROWS handler is there in case more than one row in the table has this MNR, and OTHERS is there for NO_DATA_FOUND; I use OTHERS so it also handles everything else that could happen but probably won't.
By the way, this is a code block to be included within your main block, between your BEGIN and your IF; then just change the IF to check whether the variable is 0, as in the sketch below.
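Putting the pieces together, a hedged sketch of the full trigger with that check wired in (the trigger name is illustrative; m_variable is kept from above, the tables follow the question):
create or replace trigger t_mnrcontrole
before insert or update on registraties
for each row
declare
    m_variable number;
begin
    begin
        select 1
        into   m_variable
        from   medewerkers
        where  mnr = :new.mnr;
    exception
        when too_many_rows then
            m_variable := 1;
        when others then
            m_variable := 0;
    end;

    if m_variable = 0 then
        raise_application_error(-20111, 'Medewerker '||:new.mnr||' niet herkend!');
    end if;
end;
/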

Ensure a serial in PostgreSQL [duplicate]

I'm moving from MySql to Postgres, and I noticed that when you delete rows from MySql, the unique ids for those rows are re-used when you make new ones. With Postgres, if you create rows, and delete them, the unique ids are not used again.
Is there a reason for this behaviour in Postgres? Can I make it act more like MySql in this case?
Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.
PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.
If you need a gapless sequence that re-uses deleted IDs, you can have one, you just have to give up a huge amount of performance for it - in particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID, locking the table for write so no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".
The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.
CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);
CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';
Usage:
INSERT INTO dummy(id, blah)
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );
Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable; for gapless IDs it is by design.
If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters.
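A minimal sketch of that multi-counter variant (table, column, and function names are illustrative):
CREATE TABLE id_counters (
    counter_name text PRIMARY KEY,
    last_id      integer NOT NULL
);
INSERT INTO id_counters VALUES ('thetable', 0);

CREATE OR REPLACE FUNCTION get_next_id_for(_counter_name text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- The UPDATE locks only the matching counter row, so counters
    -- for different purposes don't block each other.
    UPDATE id_counters
    SET    last_id = last_id + 1
    WHERE  counter_name = _counter_name
    RETURNING last_id INTO next_value;

    IF next_value IS NULL THEN
        RAISE EXCEPTION 'unknown counter: %', _counter_name;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;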
None of the counter functions above re-use deleted IDs; they just avoid introducing gaps.
To re-use IDs I advise ... not re-using IDs.
If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table and removes them from that table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids FOR UPDATE LIMIT 1 and, if one is found, DELETEs that row; if nothing is found, it gets a new ID from the generator table as normal. Here's an untested extension of the prior function to support re-use:
CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    EXECUTE format('SELECT %I FROM %s FOR UPDATE LIMIT 1', freelistcolumn, freelisttable) INTO next_value;
    IF next_value IS NOT NULL THEN
        EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
    ELSE
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
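For completeness, an equally untested sketch of the free-list side table and trigger described above, for a hypothetical table thetable with an integer id column (all names illustrative):
CREATE TABLE thetable_free_ids (free_id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION thetable_track_free_ids() RETURNS trigger AS $$
BEGIN
    -- Treat an UPDATE as a DELETE (old id freed) followed by an INSERT (new id claimed).
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO thetable_free_ids (free_id) VALUES (OLD.id);
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        DELETE FROM thetable_free_ids WHERE free_id = NEW.id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER thetable_free_ids_trg
AFTER INSERT OR UPDATE OR DELETE ON thetable
FOR EACH ROW EXECUTE PROCEDURE thetable_track_free_ids();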

Compound trigger: collecting mutating rows into nested table

I have two tables in my project: accounts and transactions (one-to-many relationship). In every transaction I store the balance of the associated account (after the transaction is executed). Additionally in every transaction I store a value of the transaction.
So I needed a trigger that fires when someone adds a new transaction. It should check whether the new account balance will be correct (old account balance + transaction value = new account balance stored in the transaction).
So it was suggested that I use a compound trigger, which would:
in the before each row section: save each row's PK (made of two columns) somewhere,
in the after statement section: check whether all inserted transactions were correct.
Now I can't find anywhere how I could implement the first point.
What I already have:
CREATE OR REPLACE TRIGGER check_account_balance_is_valid
FOR INSERT
ON Transactions
COMPOUND TRIGGER
    TYPE Modified_transactions_T IS TABLE OF Transactions%ROWTYPE;
    Modified_transactions Modified_transactions_T;

    BEFORE STATEMENT IS BEGIN
        Modified_transactions := Modified_transactions_T();
    END BEFORE STATEMENT;

    BEFORE EACH ROW IS BEGIN
        Modified_transactions.extend;
        Modified_transactions(Modified_transactions.last) := :NEW;
    END BEFORE EACH ROW;

    AFTER STATEMENT IS BEGIN
        NULL; -- I will write something here later
    END AFTER STATEMENT;
END check_account_balance_is_valid;
/
However, I got that:
Warning: execution completed with warning
11/58 PLS-00049: bad bind variable 'NEW'
Could someone tell me, how to fix it? Or maybe my whole "compound trigger" idea is wrong and you have better suggestions.
Update 1
Here is my ddl script: http://pastebin.com/MW0Eqf9J
Maybe try this one:
TYPE Modified_transactions_T IS TABLE OF ROWID;
Modified_transactions Modified_transactions_T;

BEFORE STATEMENT IS BEGIN
    Modified_transactions := Modified_transactions_T();
END BEFORE STATEMENT;

BEFORE EACH ROW IS BEGIN
    Modified_transactions.extend;
    Modified_transactions(Modified_transactions.last) := :NEW.ROWID;
END BEFORE EACH ROW;
or this
TYPE PrimaryKeyRecType IS RECORD (
    Col1 Transactions.PK_COL_1%TYPE,
    Col2 Transactions.PK_COL_2%TYPE);
TYPE Modified_transactions_T IS TABLE OF PrimaryKeyRecType;
...
Modified_transactions(Modified_transactions.last) := PrimaryKeyRecType(:NEW.PK_COL_1, :NEW.PK_COL_2);
Your immediate problem is that :new is not a real record, so it is not of type Transactions%ROWTYPE. If you're really going to go down this path, you would generally want to declare a collection of the primary key of the table:
TYPE Modified_transactions_T IS TABLE OF Transactions.Primary_Key%TYPE;
and then put just the primary key in the collection
BEFORE EACH ROW IS BEGIN
Modified_transactions.extend;
Modified_transactions(Modified_transactions.last) := :NEW.Primary_Key;
END BEFORE EACH ROW;
The fact that you are trying to work around a mutating table exception in the first place, however, almost always indicates that you have an underlying data modeling problem that you should really be solving. If you need to query other rows in the table in order to figure out what you want to do with the new rows, that's a pretty good indication that you have improperly normalized your data model and that one row has some dependency on another row in the same table rather than being an autonomous fact. Fixing the data model is almost always preferable to working around the mutating table exception.

Writing an SQL trigger to find if number appears in column more than X times?

I want to write a Postgres SQL trigger that will basically find if a number appears in a column 5 or more times. If it appears a 5th time, I want to throw an exception. Here is how the table looks:
create table tab(
first integer not null constraint pk_part_id primary key,
second integer constraint fk_super_part_id references bom,
price integer);
insert into tab values(1,NULL,100), (2,1,50), (3,1,30), (4,2,20), (5,2,10), (6,3,20);
Above are the original inserts into the table. My trigger will occur upon inserting more values into the table.
Basically if a number appears in the 'second' column more than 4 times after inserting into the table, I want to raise an exception. Here is my attempt at writing the trigger:
create function check() return trigger as '
begin
    if(select first, second, price
       from tab
       where second in (
           select second from tab
           group by second
           having count(second) > 4)
    ) then
        raise exception ''Error, there are more than 5 parts.'';
    end if;
    return null;
end
' language plpgsql;
create trigger check
after insert or update on tab
for each row execute procedure check();
Could anyone help me out? If so that would be great! Thanks!
CREATE FUNCTION trg_upbef()
RETURNS trigger AS
$func$
BEGIN
   IF (SELECT count(*)
       FROM   tab
       WHERE  second = NEW.second) > 3 THEN
      RAISE EXCEPTION 'Error: there are more than 5 parts.';
   END IF;

   RETURN NEW; -- must be NEW for BEFORE trigger
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER upbef
BEFORE INSERT OR UPDATE ON tab
FOR EACH ROW EXECUTE procedure trg_upbef();
Major points
Keyword is RETURNS, not RETURN.
Use the special variable NEW to refer to the newly inserted / updated row.
Use a BEFORE trigger. Better skip early in case of an exception.
Don't count everything for your test, just what you need. Much faster.
Use dollar-quoting. Makes your life easier.
Concurrency:
If you want to be absolutely sure, you'll have to take an exclusive lock on the table before counting. Else, concurrent inserts / updates might outfox each other under heavy concurrent load. While this is rather unlikely, it's possible.
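A hedged sketch of that locked variant (the function name is illustrative; note that acquiring the lock inside the row trigger can itself deadlock, because each session already holds a weaker lock on tab by the time the trigger fires, so taking the lock before the INSERT statement is safer):
CREATE OR REPLACE FUNCTION trg_upbef_locked()
RETURNS trigger AS
$func$
BEGIN
   LOCK TABLE tab IN SHARE ROW EXCLUSIVE MODE;  -- blocks other writers until this transaction ends

   IF (SELECT count(*)
       FROM   tab
       WHERE  second = NEW.second) > 3 THEN
      RAISE EXCEPTION 'Error: there are more than 5 parts.';
   END IF;

   RETURN NEW;
END
$func$ LANGUAGE plpgsql;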

Is SELECT or INSERT in a function prone to race conditions?

I wrote a function to create posts for a simple blogging engine:
CREATE FUNCTION CreatePost(VARCHAR, TEXT, VARCHAR[])
RETURNS INTEGER AS $$
DECLARE
    InsertedPostId INTEGER;
    TagName        VARCHAR;
BEGIN
    INSERT INTO Posts (Title, Body)
    VALUES ($1, $2)
    RETURNING Id INTO InsertedPostId;

    FOREACH TagName IN ARRAY $3 LOOP
        DECLARE
            InsertedTagId INTEGER;
        BEGIN
            -- I am concerned about this part.
            BEGIN
                INSERT INTO Tags (Name)
                VALUES (TagName)
                RETURNING Id INTO InsertedTagId;
            EXCEPTION WHEN UNIQUE_VIOLATION THEN
                SELECT INTO InsertedTagId Id
                FROM Tags
                WHERE Name = TagName
                FETCH FIRST ROW ONLY;
            END;

            INSERT INTO Taggings (PostId, TagId)
            VALUES (InsertedPostId, InsertedTagId);
        END;
    END LOOP;

    RETURN InsertedPostId;
END;
$$ LANGUAGE 'plpgsql';
Is this prone to race conditions when multiple users delete tags and create posts at the same time?
Specifically, do transactions (and thus functions) prevent such race conditions from happening?
I'm using PostgreSQL 9.2.3.
It's the recurring problem of SELECT or INSERT under possible concurrent write load, related to (but different from) UPSERT (which is INSERT or UPDATE).
This PL/pgSQL function uses UPSERT (INSERT ... ON CONFLICT ...) to INSERT or SELECT a single row:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT tag_id  -- only if row existed before
   FROM   tag
   WHERE  tag = _tag
   INTO   _tag_id;

   IF NOT FOUND THEN
      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;
   END IF;
END
$func$;
There is still a tiny window for a race condition. To make absolutely sure we get an ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id
      FROM   tag
      WHERE  tag = _tag
      INTO   _tag_id;

      EXIT WHEN FOUND;

      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;
This keeps looping until either INSERT or SELECT succeeds.
Call:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT statement with FOR SHARE.
If the row gets inserted instead, it is locked (or not visible for other transactions) until the end of the transaction anyway.
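For example, the SELECT in the functions above could take the lock like this (a minimal variant, not part of the original answer):
SELECT tag_id INTO _tag_id
FROM   tag
WHERE  tag = _tag
FOR SHARE;  -- the existing row stays locked until the end of the transaction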
Start with the more common case (here: the SELECT, since the tag usually already exists) to make it faster.
Related:
Get Id from a conditional INSERT
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Related (pure SQL) solution to INSERT or SELECT multiple rows (a set) at once:
How to use RETURNING with ON CONFLICT in PostgreSQL?
What's wrong with this pure SQL solution?
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
LANGUAGE sql AS
$func$
WITH ins AS (
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
)
SELECT tag_id FROM ins
UNION ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT 1;
$func$;
Not entirely wrong, but it fails to seal a loophole, as @FunctorSalad worked out. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. The manual:
All the statements are executed with the same snapshot
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads anyway.
Postgres 9.4 or older
Given this (slightly simplified) table:
CREATE table tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE
);
An almost 100% secure function to insert a new tag / select an existing one could look like this.
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      BEGIN
         WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
            , ins AS (INSERT INTO tag(tag)
                      SELECT _tag
                      WHERE  NOT EXISTS (SELECT 1 FROM sel)  -- only if not found
                      RETURNING tag.tag_id)                  -- qualified so no conflict with param
         SELECT sel.tag_id FROM sel
         UNION  ALL
         SELECT ins.tag_id FROM ins
         INTO   tag_id;

      EXCEPTION WHEN UNIQUE_VIOLATION THEN     -- insert in concurrent session?
         RAISE NOTICE 'It actually happened!'; -- hardly ever happens
      END;

      EXIT WHEN tag_id IS NOT NULL;            -- else keep looping
   END LOOP;
END
$func$;
Why not 100%? Consider the notes in the manual for the related UPSERT example:
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
Explanation
Try the SELECT first. This way you avoid the considerably more expensive exception handling 99.99% of the time.
Use a CTE to minimize the (already tiny) time slot for the race condition.
The time window between the SELECT and the INSERT within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.
No need for FETCH FIRST ROW ONLY (= LIMIT 1). The tag name is obviously UNIQUE.
Remove FOR SHARE in my example if you don't usually have concurrent DELETE or UPDATE on the table tag. Costs a tiny bit of performance.
Never quote the language name: 'plpgsql'. plpgsql is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.
Don't use non-descriptive column names like id or name. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.
Built into your function
Using this function you could largely simplify your FOREACH LOOP to:
...
FOREACH TagName IN ARRAY $3
LOOP
INSERT INTO taggings (PostId, TagId)
VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest():
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
Alternative solution
This variant builds on the behavior of UNION ALL with a LIMIT clause: as soon as enough rows are found, the rest is never executed:
Way to try multiple SELECTs till a result is available?
Building on this, we can outsource the INSERT into a separate function. Only there we need exception handling. Just as safe as the first solution.
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
  RETURNS int
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- catch exception, NULL is returned
END
$func$;
Which is used in the main function:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id FROM tag WHERE tag = _tag
      UNION  ALL
      SELECT f_insert_tag(_tag)  -- only executed if tag not found
      LIMIT  1                   -- not strictly necessary, just to be clear
      INTO   _tag_id;

      EXIT WHEN _tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;
This is a bit cheaper if most of the calls only need SELECT, because the more expensive block with INSERT containing the EXCEPTION clause is rarely entered. The query is also simpler.
FOR SHARE is not possible here (not allowed in UNION query).
LIMIT 1 would not be necessary (tested in pg 9.4). Postgres derives LIMIT 1 from INTO _tag_id and only executes until the first row is found.
There's still something to watch out for even when using the ON CONFLICT clause introduced in Postgres 9.5. Using the same function and example table as in @Erwin Brandstetter's answer, if we do:
Session 1: begin;
Session 2: begin;
Session 1: select f_tag_id('a');

 f_tag_id
----------
       11
(1 row)

Session 2: select f_tag_id('a');
[Session 2 blocks]
Session 1: commit;
[Session 2 returns:]

 f_tag_id
----------
     NULL
(1 row)
So f_tag_id returned NULL in session 2, which would be impossible in a single-threaded world!
If we raise the transaction isolation level to repeatable read (or the stronger serializable), session 2 throws ERROR: could not serialize access due to concurrent update instead. So no "impossible" results at least, but unfortunately we now need to be prepared to retry the transaction.
Edit: With repeatable read or serializable, if session 1 inserts tag a, then session 2 inserts b, then session 1 tries to insert b and session 2 tries to insert a, one session detects a deadlock:
ERROR: deadlock detected
DETAIL: Process 14377 waits for ShareLock on transaction 1795501; blocked by process 14363.
Process 14363 waits for ShareLock on transaction 1795503; blocked by process 14377.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,3) in relation "tag"
SQL function "f_tag_id" statement 1
After the session that received the deadlock error rolls back, the other session continues. So I guess we should treat deadlock just like serialization_failure and retry, in a situation like this?
Alternatively, insert the tags in a consistent order, but this is not easy if they don't all get added in one place.
I think there is a slight chance that, when the tag already existed, it might be deleted by another transaction after your transaction has found it. Using a SELECT ... FOR UPDATE should solve that.