Alternative to Postgres SERIAL field to solve incrementing values when ON CONFLICT causes update

I've recently been caught out by being unaware of the issue where SERIAL fields increment whether data is inserted or not.
Most of the answers I've read on this matter discuss preventing holes from appearing in the column, which I'm fairly certain isn't what most askers are actually concerned with, and it certainly wasn't in my case.
My situation was that a specific user of my software was using a feature in a way that caused millions of upserts to be performed on a single record. That record was used as status information, and in my naivety I was blissfully unaware of the impending failure until the INTEGER id field's nextval() reached its limit, producing the following error:
ERROR: integer out of range
SQL state: 22003
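For anyone diagnosing the same failure, you can check how close the sequence already is to the INTEGER ceiling of 2147483647 before it overflows; a quick look, assuming the default sequence name Postgres generates for MyTable.idMyTable:
SELECT last_value FROM mytable_idmytable_seq;  -- compare against 2147483647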
So my question is and was, how can I prevent id fields from incrementing the next sequence value in the case of a conflict rollback.
I look forward to others adding their knowledge to my solution.

My immediate solution to this issue which alleviated the out of range situation was to alter the column to BIGINT, as follows:
ALTER TABLE MyTable ALTER COLUMN idMyTable TYPE BIGINT;
The number of records in my case was extremely small (<1000) so this was a trivial alteration to perform.
Once that was out of the way, it was time to look for a solution to the underlying issue. My solution is unlikely to be as performant as using a SERIAL field, so bear that in mind if you implement something similar - there's always a trade-off somewhere.
Consider the following table and resulting data insert/query:
CREATE TABLE TestTable ( id SERIAL PRIMARY KEY NOT NULL, Key TEXT UNIQUE NOT NULL, Val TEXT );
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'banana') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'apple') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'peach') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Animal', 'horse') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
SELECT * FROM TestTable;
id | Key    | Val
---+--------+-------
 1 | Fruit  | peach
 4 | Animal | horse
In this case, each conflict during the Fruit update has bumped the SERIAL value, even though it wasn't creating any new record in TestTable.
Now this is the workaround I'm currently working with. If anyone knows how to concatenate the table name onto 'NEW.id', I'd love to hear that, as I like to name my id columns idTablename for consistency.
CREATE OR REPLACE FUNCTION IncrementSerial()
RETURNS trigger AS $fn$
BEGIN
    EXECUTE format('SELECT COALESCE( MAX( id ), 0 ) + 1 FROM %I.%I;', TG_TABLE_SCHEMA, TG_TABLE_NAME) INTO NEW.id;
    RETURN NEW;
END;
$fn$ LANGUAGE plpgsql;
CREATE TABLE TestTable ( id INTEGER PRIMARY KEY NOT NULL, Key TEXT UNIQUE NOT NULL, Val TEXT );
CREATE TRIGGER trgIncrementSerial
BEFORE INSERT ON TestTable
FOR EACH ROW
EXECUTE PROCEDURE IncrementSerial();
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'banana') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'apple') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'peach') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Animal', 'horse') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
SELECT * FROM TestTable;
id | Key    | Val
---+--------+-------
 1 | Fruit  | peach
 2 | Animal | horse
As you can see, the trigger-maintained id now simply takes the next highest number, which is ideal for most of my use cases.
Obviously this is going to be a problem if the id key must always be unique, as removal of the last record will free up its id for reuse. If that's not a problem (i.e. where the id is just used for references and cascades) then this might be a good solution for you.
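On the open question above about concatenating the table name onto 'NEW.id': plpgsql cannot reference a dynamically named field of NEW directly, but one untested sketch is to round-trip NEW through jsonb (this assumes PostgreSQL 9.5+ for jsonb_build_object, and the idTablename naming convention with identifiers folded to lower case):
CREATE OR REPLACE FUNCTION IncrementSerial()
RETURNS trigger AS $fn$
DECLARE
    col text := 'id' || TG_TABLE_NAME;  -- e.g. 'idtesttable'
    next_id bigint;
BEGIN
    EXECUTE format('SELECT COALESCE( MAX( %I ), 0 ) + 1 FROM %I.%I;', col, TG_TABLE_SCHEMA, TG_TABLE_NAME)
    INTO next_id;
    -- Overwrite the dynamically named id column on NEW
    NEW := jsonb_populate_record(NEW, jsonb_build_object(col, next_id));
    RETURN NEW;
END;
$fn$ LANGUAGE plpgsql;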

Related

manually updating primary key

We are dealing with legacy code that doesn't auto-increment the primary key (see serial), so I have to do it manually. What is the correct way to manually update the primary key field on insert? I am getting an error when I do the below.
Table:
CREATE TABLE pizza (
    id bigint not null,
    price int
);
Insert statement:
INSERT INTO pizza
    (id, price)
VALUES
    ((SELECT max(id) FROM pizza) + 1, 1.75);
Don't use max()+1 to generate a primary key. It's not safe for concurrent inserts and it doesn't really scale well.
Just create a sequence and use that:
create sequence pizza_id_seq;
Then synchronize it with the current values in the table:
select setval('pizza_id_seq', coalesce(max(id), 0) + 1, false)
from pizza;
Then, instead of changing your INSERT statements to use the dreaded max() + 1, just use the sequence:
INSERT INTO pizza
(id, price)
VALUES
(nextval('pizza_id_seq'), 1.75)
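If you can also change the table definition, you could attach the sequence as the column default so plain inserts pick up ids automatically; a sketch building on the sequence created above:
ALTER TABLE pizza ALTER COLUMN id SET DEFAULT nextval('pizza_id_seq');
ALTER SEQUENCE pizza_id_seq OWNED BY pizza.id;  -- drop the sequence together with the table
-- id can now be omitted entirely:
INSERT INTO pizza (price) VALUES (2);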

Controlling a table's primary key value when running On Conflict command

Got a table that I need to populate with data while getting rid of duplicates. I am using ON CONFLICT ... DO NOTHING. The issue is, when the table has an auto-increment primary key field (let's call it id), that field continues to increase even when duplicates are not inserted, resulting in the id value being way higher than the number of records that have successfully been inserted.
Unfortunately SQL Fiddle does not currently support PostgreSQL 9.5, so I'll copy-paste the code below.
CREATE TABLE table_one
(
id serial primary key,
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
INSERT into table_one (col_foo, col_bar)
VALUES ('1a', '1b'), ('2a', '2b'), ('1a', '2b'),('1a', Null), ('3a', '1b'), ('4a', '2b'), ('1a', '2b'),('1a', Null)
ON CONFLICT (col_foo) DO NOTHING;
If you run that on PostgreSQL 9.5, you'll find that the final primary key value is 6 while there are only 4 records. Is it possible to ensure that if only 4 of the 8 rows are successfully inserted, the max/last id field ends up with a value of 4?
In my current case, I was dealing with a large data set which had 1.2 million records inserted, but the very last record had an id value of 62 million. That's what I'm trying to avoid if possible.
You could of course use a temp table to catch and suppress the duplicates:
CREATE TABLE table_one
(
id serial primary key,
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
CREATE TEMP TABLE temp_one
(
id serial primary key, -- don't actually need this
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
INSERT into temp_one (col_foo, col_bar)
VALUES ('1a', '1b'), ('2a', '2b'), ('1a', '2b'),('1a', Null), ('3a', '1b'), ('4a', '2b'), ('1a', '2b'),('1a', Null)
ON CONFLICT (col_foo) DO NOTHING
;
INSERT into table_one (col_foo, col_bar)
SELECT col_foo, col_bar FROM temp_one
ON CONFLICT (col_foo) DO NOTHING -- won't need this
-- (except for suppressing already-existing duplicates)
;
SELECT * FROM temp_one;
SELECT * FROM table_one;
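A hedged single-statement variant of the same idea: dedupe the batch in a subquery so nextval() fires only once per surviving row (conflicts against rows already in the table will still consume sequence values, just as in the temp-table version):
INSERT INTO table_one (col_foo, col_bar)
SELECT DISTINCT ON (col_foo) col_foo, col_bar
FROM (VALUES ('1a', '1b'), ('2a', '2b'), ('1a', '2b'), ('1a', NULL),
             ('3a', '1b'), ('4a', '2b'), ('1a', '2b'), ('1a', NULL)
     ) AS v(col_foo, col_bar)
ON CONFLICT (col_foo) DO NOTHING;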
You cannot really change the behavior of ON CONFLICT. All it allows is updating the conflicting row instead of creating a new one.
You can reset the sequence and reassign the IDs afterwards, though (the third setval argument marks the value as not yet consumed, so the first nextval() returns 1):
SELECT setval('table_one_id_seq', 1, false);
UPDATE table_one SET id = nextval('table_one_id_seq');
And, of course, you should never rely on last ID to get the row count. And if you are worried about running out of IDs, use bigserial instead of serial.
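For an existing table, moving from serial to bigserial only needs the column widened; the backing sequence is already 64-bit internally:
ALTER TABLE table_one ALTER COLUMN id TYPE bigint;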

How not to insert specific value into database

I have an MS SQL Server database and am inserting some values into one of its tables.
Let's say the table contains the columns ID, an int Status, and a text Text.
If possible, I would like to create a trigger which prevents a specific incorrect status (say 1) from being written to the table. Instead, 0 should be written. However, the other columns should be preserved when inserting new values:
If the new row written is (1, 4, "some text"), it is written as is.
If the new row written is (1, 1, "another text"), it is written as (1, 0, "another text")
Is it possible to create such trigger? How?
EDIT: I need to allow writing such record even if status column is invalid, so foreign keys will not work for me.
I think you would need a foreign key to ensure data integrity even if you choose to use a trigger (though I would myself prefer a 'helper' stored proc; triggers can cause debugging hell), e.g.:
CREATE TABLE MyStuff
(
ID INTEGER NOT NULL UNIQUE,
Status INTEGER NOT NULL
CHECK (Status IN (0, 1)),
UNIQUE (Status, ID)
);
CREATE TABLE MyZeroStuff
(
ID INTEGER NOT NULL,
Status INTEGER NOT NULL
CHECK (Status = 0),
FOREIGN KEY (Status, ID)
REFERENCES MyStuff (Status, ID),
my_text VARCHAR(20) NOT NULL
);
CREATE TRIGGER tr__MyZeroStuff
ON MyZeroStuff
INSTEAD OF INSERT, UPDATE
AS
BEGIN
    INSERT INTO MyZeroStuff (ID, Status, my_text)
    SELECT i.ID, 0, i.my_text
    FROM inserted AS i;
END;
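For comparison, here is a minimal sketch of an INSTEAD OF INSERT trigger directly on the original table that does exactly what the question asks, rewriting Status 1 to 0 and passing everything else through (the table name MyTable is assumed; the columns come from the question):
CREATE TRIGGER tr_FixStatus
ON MyTable
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO MyTable (ID, Status, [Text])
    SELECT i.ID,
           CASE WHEN i.Status = 1 THEN 0 ELSE i.Status END,
           i.[Text]
    FROM inserted AS i;
END;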
An insert trigger has been mentioned, but another way to achieve this is to have a foreign key on your Status column which points back to a Status table. Unlike the trigger, this will not rewrite the value; it will simply disallow the write if the foreign key is not valid.
Check out referential integrity for more info on this option.

SQL can I have a "conditionally unique" constraint on a table?

I've had this come up a couple times in my career, and none of my local peers seems to be able to answer it. Say I have a table that has a "Description" field which is a candidate key, except that sometimes a user will stop halfway through the process. So for maybe 25% of the records this value is null, but for all that are not NULL, it must be unique.
Another example might be a table which must maintain multiple "versions" of a record, and a bit value indicates which one is the "active" one. So the "candidate key" is always populated, but there may be three versions that are identical (with 0 in the active bit) and only one that is active (1 in the active bit).
I have alternate methods to solve these problems (in the first case, enforce the rule in code, either in the stored procedure or business layer, and in the second, populate an archive table with a trigger and UNION the tables when I need a history). I don't want alternatives (unless there are demonstrably better solutions); I'm just wondering if any flavor of SQL can express "conditional uniqueness" in this way. I'm using MS SQL, so if there's a way to do it in that, great. I'm mostly just academically interested in the problem.
If you are using SQL Server 2008, a filtered index may be your solution:
http://msdn.microsoft.com/en-us/library/ms188783.aspx
This is how I enforce a Unique Index with multiple NULL values
CREATE UNIQUE INDEX [IDX_Blah] ON [tblBlah] ([MyCol]) WHERE [MyCol] IS NOT NULL
In the case of descriptions which are not yet completed, I wouldn't have those in the same table as the finalized descriptions. The final table would then have a unique index or primary key on the description.
In the case of the active/inactive, again I might have separate tables as you did with an "archive" or "history" table, but another possible way to do it in MS SQL Server at least is through the use of an indexed view:
CREATE TABLE Test_Conditionally_Unique
(
my_id INT NOT NULL,
active BIT NOT NULL DEFAULT 0
)
GO
CREATE VIEW dbo.Test_Conditionally_Unique_View
WITH SCHEMABINDING
AS
SELECT
my_id
FROM
dbo.Test_Conditionally_Unique
WHERE
active = 1
GO
CREATE UNIQUE CLUSTERED INDEX IDX1 ON Test_Conditionally_Unique_View (my_id)
GO
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1) -- This insert will fail
You could use this same method for the NULL/Valued descriptions as well.
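A sketch of that same method for the description case (table and column names assumed; Description must be an indexable, non-max string type):
CREATE VIEW dbo.Unique_Descriptions
WITH SCHEMABINDING
AS
SELECT Description FROM dbo.MyTable WHERE Description IS NOT NULL
GO
CREATE UNIQUE CLUSTERED INDEX IDX_Desc ON dbo.Unique_Descriptions (Description)
GO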
Thanks for the comments, the initial version of this answer was wrong.
Here's a trick using a computed column that effectively allows a nullable unique constraint in SQL Server:
create table NullAndUnique
(
    id int identity,
    name varchar(50),
    uniqueName as case
        when name is null then cast(id as varchar(51))
        else name + '_'
    end,
    unique(uniqueName)
)
insert into NullAndUnique default values
insert into NullAndUnique default values -- Works
insert into NullAndUnique default values -- not accidentally :)
insert into NullAndUnique (name) values ('Joel')
insert into NullAndUnique (name) values ('Joel') -- Boom!
It basically uses the id when the name is null. The + '_' is to avoid cases where name might be numeric, like 1, which could collide with the id.
I'm not entirely aware of your intended use or your tables, but you could try using a one-to-one relationship. Split this "sometimes unique" column out into a new table, create the UNIQUE index on that column in the new table, and FK back to the original table using the original table's PK. Only have a row in the new table when the "unique" data is supposed to exist. A rough DDL sketch follows the outline below.
OLD tables:
TableA
    ID      pk
    Col1    sometimes unique
    Col...
NEW tables:
TableA
    ID
    Col...
TableB
    ID      PK, FK to TableA.ID
    Col1    unique index
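A rough DDL sketch of that split (types and names are placeholders):
CREATE TABLE TableA
(
    ID INT IDENTITY PRIMARY KEY
    -- Col... (the remaining original columns)
);
CREATE TABLE TableB
(
    ID INT PRIMARY KEY REFERENCES TableA (ID), -- one-to-one back to TableA
    Col1 VARCHAR(100) NOT NULL UNIQUE          -- the "sometimes unique" value
);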
Oracle does. A fully null key is not indexed by a B-tree index in Oracle, and Oracle uses B-tree indexes to enforce unique constraints.
Assuming one wished to version ID_COLUMN based on the ACTIVE_FLAG being set to 1:
CREATE UNIQUE INDEX idx_versioning_id ON mytable
(CASE active_flag WHEN 0 THEN NULL ELSE active_flag END,
CASE active_flag WHEN 0 THEN NULL ELSE id_column END);

conditional unique constraint

I have a situation where I need to enforce a unique constraint on a set of columns, but only for one value of a column.
So for example I have a table like Table(ID, Name, RecordStatus).
RecordStatus can only have a value 1 or 2 (active or deleted), and I want to create a unique constraint on (ID, RecordStatus) only when RecordStatus = 1, since I don't care if there are multiple deleted records with the same ID.
Apart from writing triggers, can I do that?
I am using SQL Server 2005.
Behold, the filtered index. From the documentation (emphasis mine):
A filtered index is an optimized nonclustered index especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance as well as reduce index maintenance and storage costs compared with full-table indexes.
And here's an example combining a unique index with a filter predicate:
create unique index MyIndex
on MyTable(ID)
where RecordStatus = 1;
This essentially enforces uniqueness of ID when RecordStatus is 1.
Following the creation of that index, a uniqueness violation will raise an error:
Msg 2601, Level 14, State 1, Line 13
Cannot insert duplicate key row in object 'dbo.MyTable' with unique index 'MyIndex'. The duplicate key value is (9999).
Note: the filtered index was introduced in SQL Server 2008. For earlier versions of SQL Server, please see this answer.
Add a check constraint like this. The difference is, you'll return false if Status = 1 and Count > 0.
http://msdn.microsoft.com/en-us/library/ms188258.aspx
CREATE TABLE CheckConstraint
(
    Id TINYINT,
    Name VARCHAR(50),
    RecordStatus TINYINT
)
GO
CREATE FUNCTION CheckActiveCount(
    @Id INT
) RETURNS INT AS BEGIN
    DECLARE @ret INT;
    SELECT @ret = COUNT(*) FROM CheckConstraint WHERE Id = @Id AND RecordStatus = 1;
    RETURN @ret;
END;
GO
ALTER TABLE CheckConstraint
ADD CONSTRAINT CheckActiveCountConstraint CHECK (NOT (dbo.CheckActiveCount(Id) > 1 AND RecordStatus = 1));
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 1);
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 1);
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 2);
-- Msg 547, Level 16, State 0, Line 14
-- The INSERT statement conflicted with the CHECK constraint "CheckActiveCountConstraint". The conflict occurred in database "TestSchema", table "dbo.CheckConstraint".
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 1);
SELECT * FROM CheckConstraint;
-- Id Name RecordStatus
-- ---- ------------ ------------
-- 1 No Problems 2
-- 1 No Problems 2
-- 1 No Problems 2
-- 1 No Problems 1
-- 2 Oh no! 1
-- 2 Oh no! 2
ALTER TABLE CheckConstraint
DROP CONSTRAINT CheckActiveCountConstraint;
DROP FUNCTION CheckActiveCount;
DROP TABLE CheckConstraint;
You could move the deleted records to a table that lacks the constraint, and perhaps use a view with UNION of the two tables to preserve the appearance of a single table.
You can do this in a really hacky way...
Create a schema-bound view on your table (SCHEMABINDING requires two-part table names and an explicit column list, so SELECT * won't work; the column names come from the question, and the table name MyTable is assumed):
CREATE VIEW dbo.Whatever
WITH SCHEMABINDING
AS
SELECT ID, Name, RecordStatus
FROM dbo.MyTable
WHERE RecordStatus = 1
Now create a unique clustered index on the view over the fields you want.
One note about schema-bound views though: if you change the underlying tables you will have to recreate the view. Plenty of gotchas because of that.
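Concretely, the index step looks something like this (view and column names taken from the sketch above):
CREATE UNIQUE CLUSTERED INDEX IDX_Whatever ON dbo.Whatever (ID);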
For those still searching for a solution, I came across a nice answer to a similar question, and I think it can still be useful for many. While moving deleted records to another table may be a better solution, those who don't want to move the record can use the idea in the linked answer, which is as follows:
Set deleted = 0 when the record is available/active.
Set deleted = <row_id or some other unique value> when marking the row as deleted.
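The reason this works: pair the deleted column with the ID in a unique constraint, so all active rows compete on (ID, 0) while every deleted row carries its own unique marker. A sketch, with table and column names assumed:
ALTER TABLE MyTable ADD CONSTRAINT UQ_Id_Deleted UNIQUE (ID, deleted);
-- Active:  (42, 0)             -- at most one per ID
-- Deleted: (42, 17), (42, 23)  -- marker = row_id, so rows never collide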
If you can't use NULL as a RecordStatus as Bill suggested, you could combine his idea with a function-based index. Create a function that returns NULL if the RecordStatus is not one of the values you want to consider in your constraint (and the RecordStatus otherwise), and create an index over that.
That'll have the advantage that you don't have to explicitly examine other rows of the table in your constraint, which could otherwise cause performance issues.
I should say I don't know SQL Server at all, but I have successfully used this approach in Oracle.
Because you are going to allow duplicates, a plain unique constraint will not work. You can create a check constraint for the RecordStatus column and a stored procedure for INSERT that checks for existing active records before inserting duplicate IDs.
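A minimal sketch of such a procedure (SQL Server 2005 syntax; table and column names assumed from the question):
CREATE PROCEDURE InsertRecord
    @Id INT,
    @Name VARCHAR(50),
    @RecordStatus TINYINT
AS
BEGIN
    IF @RecordStatus = 1
       AND EXISTS (SELECT 1 FROM MyTable WHERE ID = @Id AND RecordStatus = 1)
        RAISERROR('An active record with this ID already exists.', 16, 1);
    ELSE
        INSERT INTO MyTable (ID, Name, RecordStatus)
        VALUES (@Id, @Name, @RecordStatus);
END;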