Postgres unique index says duplicate exists on freshly deleted row - sql

I have a postgres database in which I'm refreshing data periodically. Most of the time it works, but sometimes I have issues with a unique index.
Minimal example
create table test_table (
id int
);
create unique index test_table_unique on test_table(id);
(I know, in this case it should be a primary key, but for the sake of example, please bear with me.)
Now, every hour, I do something like this:
begin;
delete from test_table;
insert into test_table (id) values (1), (2), (3)...
commit;
As I said, most of the time it will just work fine. However, sometimes postgres complains about a duplicate entry in the unique index.
error: duplicate key value violates unique constraint test_table_unique
detail: "Key (id)=(2) already exists."
My real database
In my actual table, I'm using JSON payloads, and the unique index is built on fields of that JSON payload. The table definition and the error detail are as follows:
create table if not exists source (
id serial primary key,
payload jsonb not null
);
create unique index if not exists source_index_and_id on source ((payload->>'_index'), (payload->>'_id'));
error details: "Key ((payload ->> '_index'::text), (payload ->> '_id'::text))=(companies, AC9860) already exists."
I'm confident there is no actual duplicate data. I'm deleting everything for a particular payload->>'_index', and payload->>'_id' is unique in my source data.
My understanding is that if I delete rows from a table, the indices will be updated before the next statements are executed. But it doesn't seem to be the case. I've found that it helps (not sure if it actually solves the issue) to commit the changes after the delete, and before the inserts.
begin;
delete...
commit;
begin;
insert...
commit;
What's happening here?

The only ways this could happen are:
the deleting transaction rolled back
concurrent transactions inserted new rows after you deleted the original ones
the inserting transaction inserts the same key twice
the inserting transaction is accidentally run before the deleting one
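The third cause, the same key appearing twice in one batch, can be ruled out before the insert is sent. A minimal Python sketch, assuming the rows are available as dicts shaped like the payload column in the question; the function name find_duplicate_keys and the sample data are hypothetical. It compares raw Python values, whereas PostgreSQL compares the text that ->> yields, so mixed types may need normalizing first.

```python
from collections import Counter


def find_duplicate_keys(payloads):
    """Return the (_index, _id) pairs that occur more than once in the batch."""
    counts = Counter((p.get("_index"), p.get("_id")) for p in payloads)
    # Counter keeps first-seen order, so duplicates are reported in batch order.
    return [key for key, n in counts.items() if n > 1]


# Hypothetical batch: AC9860 appears twice, as in the reported error.
batch = [
    {"_index": "companies", "_id": "AC9860", "name": "first copy"},
    {"_index": "companies", "_id": "AB0001", "name": "other"},
    {"_index": "companies", "_id": "AC9860", "name": "second copy"},
]
duplicates = find_duplicate_keys(batch)
print(duplicates)
```

An empty result only rules out this one cause; rollbacks and concurrent sessions have to be checked in the database logs.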

PostgreSQL is not a truly relational DBMS: it does not satisfy rule 7 of Codd's rules, which concerns set-based operations.
Unlike other RDBMSs, PostgreSQL deletes rows one by one, and this missing functionality can sometimes lead to phantom key violations.
In my paper comparing PostgreSQL to MS SQL Server I ran a test that demonstrates this (§ 7 – The hard way to updates unique values).

Related

Postgres unique constraint performance, insert + fail on duplicate or check?

Which is the bigger performance hit on a Postgres database when a table has a unique constraint:
1. Trying to insert and letting it throw a unique constraint violation error
2. Checking whether the entry exists and skipping the insert if it does
I'm importing some data, and the ORM connects some entries via many-to-many through a connection table. It does not check whether the connection exists; it just runs the query and fails with a unique constraint violation when it does.
Is it better to leave it like that, or to introduce a step that checks whether the entry exists and inserts only if it doesn't?
I would assume that your check from 2. would be an extra statement, so it is probably more expensive. I cannot say for sure, since you were rather vague in your question.
Besides, the second approach suffers from a race condition: you can never guarantee that no conflicting row gets inserted by a concurrent session after you checked.
If you want to avoid the error, the best approach would be
INSERT INTO ... VALUES (...) ON CONFLICT DO NOTHING;
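To try that statement without a server, here is a small sketch using Python's built-in sqlite3 module as a stand-in (an assumption; the question is about PostgreSQL). SQLite 3.24 or later accepts the same ON CONFLICT DO NOTHING clause for this simple case. The table link and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (a_id INTEGER, b_id INTEGER, UNIQUE (a_id, b_id))")

pairs = [(1, 10), (1, 11), (1, 10)]  # the last pair repeats the first
for pair in pairs:
    # A conflicting row is skipped instead of raising a unique violation.
    conn.execute(
        "INSERT INTO link (a_id, b_id) VALUES (?, ?) ON CONFLICT DO NOTHING",
        pair,
    )
conn.commit()

row_count = conn.execute("SELECT count(*) FROM link").fetchone()[0]
print(row_count)
```

Because the database resolves the conflict inside the insert itself, there is no gap between a check and an insert for a concurrent session to slip into.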
Performance hit:
Because a unique constraint creates an index on the specified column, it slows down inserts and updates. The effect is most noticeable in batch operations with very large numbers of inserts and updates.

Why doesn't Kudu fail when inserting duplicate primary key?

From Impala documentation:
In most relational databases, if you try to insert a row that has already been inserted, the insertion will fail because the primary key would be duplicated.
Impala, however, will not fail the query. Instead, it will generate a warning, but continue to execute the remainder of the insert statement.
Why does Impala/Kudu act like that?
Please note that the insert won't update the value (there is an upsert command for that), it will just fail silently.
Is there a way to be aware that I'm inserting a duplicate primary key?
This is because Kudu itself will not throw any exception (it only raises a warning), and hence Impala will (rightly) assume the task succeeded.
As to why Kudu chose to do it this way we can only speculate.
This is just my opinion. Kudu (and Impala) is designed for analytical workloads rather than transactional ones, and these usually involve batch processing of large amounts of data. It would be undesirable for the application to fail because of a small number of records with duplicate keys.
Thus the default behaviour inserts all records with non-duplicate keys and skips all the duplicate keys. This can be changed by using upsert, which replaces duplicates.
According to the Impala documentation:
If an INSERT statement attempts to insert a row with the same values for the primary key columns as an existing row, that row is discarded and the insert operation continues. When rows are discarded due to duplicate primary keys, the statement finishes with a warning, not an error. (This is a change from early releases of Kudu where the default was to return an error in such cases, and the syntax INSERT IGNORE was required to make the statement succeed. The IGNORE clause is no longer part of the INSERT syntax.)
For situations where you prefer to replace rows with duplicate primary key values, rather than discarding the new data, you can use the UPSERT statement instead of INSERT. UPSERT inserts rows that are entirely new, and for rows that match an existing primary key in the table, the non-primary-key columns are updated to reflect the values in the "upserted" data.
If you really want to store new rows, not replace existing ones, but cannot do so because of the primary key uniqueness constraint, consider recreating the table with additional columns included in the primary key.
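The difference between the two statements described above can be modelled in a few lines of plain Python. This is only a toy model of the semantics, with a dict standing in for the table; it is not the Kudu or Impala API, and the function names are hypothetical. Collecting the discarded keys yourself, as the first function does, is one way to find out which rows were duplicates.

```python
def insert_discarding(table, rows):
    """Model INSERT: keep existing rows, return the keys that were discarded."""
    discarded = []
    for key, value in rows:
        if key in table:
            discarded.append(key)  # in Kudu this surfaces only as a warning
        else:
            table[key] = value
    return discarded


def upsert(table, rows):
    """Model UPSERT: insert new keys, overwrite the value of existing keys."""
    for key, value in rows:
        table[key] = value


table = {1: "old"}
discarded = insert_discarding(table, [(1, "new"), (2, "two")])
after_insert = dict(table)  # snapshot before the upsert
upsert(table, [(1, "new")])
print(discarded, after_insert, table)
```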

SQL command works in interactive mode but no in bash script in Oracle Database [duplicate]

Imagine I have this simple table:
Table Name: Table1
Columns: Col1 NUMBER (Primary Key)
Col2 NUMBER
If I insert a record into Table1 with no commit...
INSERT INTO Table1 (Col1, Col2) Values (100, 1234);
How does Oracle know that this next INSERT statement violates the PK constraint, since nothing has been committed to the database yet?
INSERT INTO Table1 (Col1, Col2) Values (100, 5678);
Where/how does Oracle manage the transactions so that it knows I'm violating the constraint when I haven't even committed the transaction yet?
Oracle creates an index to enforce the primary key constraint (a unique index by default). When Session A inserts the first row, the index structure is updated but the change is not committed. When Session B tries to insert the second row, the index maintenance operation notes that there is already a pending entry in the index with that particular key. Session B cannot acquire the latch that protects the shared index structure so it will block until Session A's transaction completes. At that point, Session B will either be able to acquire the latch and make its own modification to the index (because A rolled back) or it will note that the other entry has been committed and will throw a unique constraint violation (because A committed).
It's because of the unique index that enforces the primary key constraint. Even though the insert into the data block is not yet committed, the attempt to add the duplicate entry into the index cannot succeed, even if it's done in another session.
Just because you haven't done a commit yet does not mean the first record hasn't been sent to the server. Oracle already knows about your intention to insert the first record. When you insert the second record, Oracle knows for sure there is no way this will ever succeed without a constraint violation, so it refuses.
If another user were to insert the second record, Oracle will accept it if the first record has not been committed yet. If the second user commits before you do, your commit will fail.
Unless a particular constraint is "deferred", it will be checked at the point of the statement execution. If it is deferred, it will be checked at the end of the transaction. I'm assuming you did not defer your PRIMARY KEY and that's why you get a violation even before you commit.
How this is really done is an implementation detail and may vary between different database systems and even versions of the same system. The application developer should probably not make too many assumptions about it. In Oracle's case, PRIMARY KEY uses the underlying index for performance reasons, while there are systems out there that do not even require an index (if you can live with the corresponding performance hit).
BTW, a deferrable Oracle PRIMARY KEY constraint relies on a non-unique index (vs non-deferrable PRIMARY KEY that uses a unique index).
--- EDIT ---
I just realized you didn't even commit the first INSERT. I think Justin's answer explains nicely how what is essentially a lock contention causes one of the transactions to stall.
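The single-session case from the question can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in (an assumption; it is not Oracle and does not model the cross-session blocking described above). It only shows that a non-deferred primary key is checked when the statement runs, not at commit.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 INTEGER PRIMARY KEY, Col2 INTEGER)")

# First insert: executed inside an open transaction, never committed.
conn.execute("INSERT INTO Table1 (Col1, Col2) VALUES (100, 1234)")

violation = None
try:
    conn.execute("INSERT INTO Table1 (Col1, Col2) VALUES (100, 5678)")
except sqlite3.IntegrityError as exc:
    violation = str(exc)  # raised by the statement itself, before any commit

still_open = conn.in_transaction  # the transaction was never committed
kept = conn.execute("SELECT Col2 FROM Table1 WHERE Col1 = 100").fetchone()[0]
print(violation, still_open, kept)
```

The first row is visible to its own session even though it is uncommitted, which is why the second insert can be rejected immediately.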

Postgresql turns update to insert and creates duplicate record

I'm not quite sure how to ask this question.
The table stores its main data in a JSONB column. The other columns are an integer primary key, a unique text secondary key, an application generated integer transaction id, and the type of operation last performed (insert, update, delete).
There are 5 triggers.
On before insert and update, set the new.operation column to TG_OP (more on this later)
On before insert, generate a unique 6 digit alphameric code for use in URLs
On before insert, generate a unique, random 6 digit numeric code avoiding the German Tank Problem.
On before insert, add the numeric and alphameric codes to the JSONB object.
On after update and delete, insert the old record appended with the new tranid and operation column to an unindexed archive table.
All of the triggers seem to work and the records get created with the new ids and the ids in the JSONB column.
However, on an update the new operation gets set to update from TG_OP variable, but the record gets inserted into the table creating duplicate keys. Subsequent ops on that record fail because of the duplicate records.
I've stepped through it in the pgAdmin debugger. It seems to go through each trigger correctly. It completes with a record from the insert (e.g. tranid=254, operation=insert) and another from the update (e.g. tranid=256, operation=update). The archive table has one record added which shows the original info was 254/insert and it was replaced by 256/update.
But there are two records in the main table!!!
This is a violation of two uniqueness constraints which should have caused it to fail:
CONSTRAINT npprimarykey_id PRIMARY KEY (id),
CONSTRAINT npid_txt_unique UNIQUE (id_txt)
Beyond that, the command being executed was an UPDATE.
I'm not clear where to look or on what forum to ask the question. Which forum do the firms building PostgreSQL frequent?
Thanks,
David
