Controlling a table's primary key value when running On Conflict command - sql

Got a table that I need to populate with data while getting rid of duplicates, so I'm using ON CONFLICT ... DO NOTHING. The issue is that when the table has an auto-incrementing primary key field (let's call it id), that field keeps increasing even when duplicates are not inserted, resulting in the id value being far higher than the number of records that have actually been inserted.
Unfortunately SQL Fiddle does not currently support PostgreSQL 9.5, so I'll copy-paste the code below.
CREATE TABLE table_one
(
id serial primary key,
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
INSERT into table_one (col_foo, col_bar)
VALUES ('1a', '1b'), ('2a', '2b'), ('1a', '2b'),('1a', Null), ('3a', '1b'), ('4a', '2b'), ('1a', '2b'),('1a', Null)
ON CONFLICT (col_foo) DO NOTHING;
If you run that on PostgreSQL 9.5, you'll find that the highest primary key is 6 while there are only 4 records. Is it possible to ensure that if 4 records out of the 8 supplied are successfully inserted, then the max/last id value is 4?
In my current case, I was dealing with a large data set in which 1.2 million records were inserted, but the very last record had an id value of 62 million. That's what I'm trying to avoid if possible.

You could of course use a temp table to catch and suppress the duplicates:
CREATE TABLE table_one
(
id serial primary key,
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
CREATE TEMP TABLE temp_one
(
id serial primary key, -- don't actually need this
col_foo VARCHAR(40) not null unique,
col_bar VARCHAR(20)
);
INSERT into temp_one (col_foo, col_bar)
VALUES ('1a', '1b'), ('2a', '2b'), ('1a', '2b'),('1a', Null), ('3a', '1b'), ('4a', '2b'), ('1a', '2b'),('1a', Null)
ON CONFLICT (col_foo) DO NOTHING
;
INSERT into table_one (col_foo, col_bar)
SELECT col_foo, col_bar FROM temp_one
ON CONFLICT (col_foo) DO NOTHING -- won't need this
-- (except for suppressing already-existing duplicates)
;
SELECT * FROM temp_one;
SELECT * FROM table_one;

You cannot really change the behavior of ON CONFLICT. All it lets you do is update the conflicting row instead of inserting a new one.
You can reset the sequence and reassign the IDs afterwards, though:
SELECT setval('table_one_id_seq', 1, false); -- false: the next nextval() will return 1, not 2
UPDATE table_one SET id = nextval('table_one_id_seq');
And, of course, you should never rely on the last ID to get the row count. If you are worried about running out of IDs, use bigserial instead of serial.
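If the renumbering also needs to be safe against transient duplicate-key errors (a plain UPDATE ... SET id = nextval(...) visits rows in no guaranteed order, so a new id can momentarily collide with a not-yet-updated row), here is a two-pass sketch along the same lines. It assumes all existing ids are positive, the table is not empty, and the table/sequence names are the ones from the question:

```sql
BEGIN;
-- Pass 1: park every id in the negative range so no reassignment can
-- collide with a row that has not been renumbered yet.
UPDATE table_one SET id = -id;
-- Pass 2: hand out 1..n in the original id order.
WITH renumbered AS (
    SELECT id AS old_id,
           row_number() OVER (ORDER BY -id) AS new_id
    FROM table_one
)
UPDATE table_one t
SET id = r.new_id
FROM renumbered r
WHERE t.id = r.old_id;
-- Point the sequence at the new maximum so future inserts continue from n+1.
SELECT setval('table_one_id_seq', (SELECT max(id) FROM table_one));
COMMIT;
```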

Related

Update Postgres SQL table with SERIAL from previous insert [duplicate]

This question already has answers here:
Insert a value from a table in another table as foreign key
(3 answers)
Closed 4 months ago.
Very new to SQL in general. I'm creating two tables: one representing appliances with a primary key, and a second representing, for example, a microwave, with its FK referencing the primary table's PK.
I'm using SERIAL as the id for the primary table, but don't know how to update or insert into the second table using that specific generated value from the first.
I've created my tables using PSQL (Postgres15) like so:
CREATE TABLE Appliances (
id SERIAL NOT NULL,
field1 integer NOT NULL DEFAULT (0),
--
PRIMARY KEY (id),
UNIQUE(id)
);
CREATE TABLE Microwaves (
id integer NOT NULL,
field1 integer,
--
PRIMARY KEY (id),
FOREIGN KEY (id) REFERENCES Appliances(id)
);
Inserting my first row into the Appliances table:
INSERT INTO Appliances(field1) VALUES(1);
SELECT * FROM Appliances;
Yields a single row: id = 1, field1 = 1.
And a query I found somewhere pulls the current increment of the SERIAL:
SELECT currval(pg_get_serial_sequence('Appliances', 'id'));
Yields 1, the most recently generated value for Appliances.id.
I'm struggling to determine how to format the INSERT statement, have tried several variations around the below input:
INSERT INTO Microwaves VALUES(SELECT currval(pg_get_serial_sequence('Appliances', 'id'), 1));
Yields a syntax error.
Appreciate feedback on solving the problem as represented, or a better way to tackle this in general.
Okay, it looks like I stumbled on at least one solution that works in my case, taken from https://stackoverflow.com/a/50004699/3564760
DO $$
DECLARE appliance_id integer;
BEGIN
INSERT INTO Appliances(field1) VALUES(2) RETURNING id INTO appliance_id;
INSERT INTO Microwaves(id, field1) VALUES(appliance_id, 100);
END $$;
Still open to other answers if this isn't ideal.
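A single-statement alternative is a data-modifying CTE (a sketch against the same Appliances/Microwaves tables; requires PostgreSQL 9.1 or later, and the literal values are placeholders):

```sql
-- Insert the parent row and reuse its generated id for the child row,
-- with no currval()/lastval() and no DO block.
WITH new_appliance AS (
    INSERT INTO Appliances (field1) VALUES (3)
    RETURNING id
)
INSERT INTO Microwaves (id, field1)
SELECT id, 100
FROM new_appliance;
```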

How auto increment id and insert 2 rows

I have two tables with a one-to-one relation, and I want to insert a row into each table with the same auto-incremented id. Is it possible?
create table first
(
id bigint primary key,
value varchar(100) not null
);
create table second
(
id bigint references first (id),
sign boolean
);
insert into first(id, value)
values (/* autoincremented */, 'some_value');
insert into second(id, sign)
values (/* the same autoincremented id */, true);
Your id column must be defined as an "auto increment" one before you can use that:
create table first
(
id bigint generated always as identity primary key,
value varchar(100) not null
);
Then you can use lastval() to get the last generated id:
insert into first(id, value)
values (default, 'some_value');
insert into second(id, sign)
values (lastval(), true);
Or if you want to be explicit:
insert into first(id, value)
values (default, 'some_value');
insert into second(id, sign)
values (currval(pg_get_serial_sequence('first','id')), true);
One option uses a cte with the returning clause:
with i as (
insert into first(value) values('some_value')
returning id
)
insert into second(id, sign)
select i.id, true from i
This performs the two inserts at once; the id of the first insert is auto-generated, and then used in the second insert.
For this to work, you need the id of the first table to be defined as serial or as an identity column (as shown above), so that it is generated automatically when omitted from the insert.

Alternative to Postgres SERIAL field to solve incrementing values when ON CONFLICT causes update

I've recently been caught out by being unaware of the issue where SERIAL fields increment whether data is inserted or not.
Most of the answers I've read on this matter discuss preventing holes from appearing in the column, which I'm fairly certain in most cases isn't what the question posed is concerned with, and it certainly wasn't in my case.
My situation was that a specific user of my software used a feature in a way that caused millions of upserts to be performed on a single record. That record held status information, and in my naivety I was blissfully unaware of the impending failure when the INTEGER id field's nextval() reached its limit, producing the following error:
ERROR: integer out of range
SQL state: 22003
So my question is, and was: how can I prevent id fields from advancing to the next sequence value when a conflict causes a rollback?
I look forward to others adding their knowledge to my solution.
My immediate solution to this issue which alleviated the out of range situation was to alter the column to BIGINT, as follows:
ALTER TABLE MyTable ALTER COLUMN idMyTable TYPE BIGINT;
The number of records in my case was extremely small (<1000) so this was a trivial alteration to perform.
Once that was out of the way, it was time to look for a solution to the underlying issue. My solution is unlikely to be as performant as using a SERIAL field, so keep that in mind for your use case if you are going to implement something similar; there's always a trade-off somewhere.
Consider the following table and resulting data insert/query:
CREATE TABLE TestTable ( id SERIAL PRIMARY KEY NOT NULL, Key TEXT UNIQUE NOT NULL, Val TEXT );
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'banana') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'apple') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'peach') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Animal', 'horse') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
SELECT * FROM TestTable;
id Key Val
1 Fruit peach
4 Animal horse
In this case, each conflict during the Fruit upserts bumped the SERIAL value, even though no new record was created in TestTable.
Now this is the workaround I'm currently working with. If anyone knows how to concatenate the table name onto 'NEW.id', I'd love to hear that, as I like to name my id columns idTablename for consistency.
CREATE OR REPLACE FUNCTION IncrementSerial()
RETURNS trigger AS $fn$
BEGIN
EXECUTE format('SELECT COALESCE( MAX( id ), 0 ) + 1 FROM %I.%I;',TG_TABLE_SCHEMA,TG_TABLE_NAME) INTO NEW.id;
RETURN NEW;
END
$fn$ LANGUAGE plpgsql;
CREATE TABLE TestTable ( id INTEGER PRIMARY KEY NOT NULL, Key TEXT UNIQUE NOT NULL, Val TEXT );
CREATE TRIGGER trgIncrementSerial
BEFORE INSERT ON TestTable
FOR EACH ROW
EXECUTE PROCEDURE IncrementSerial();
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'banana') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'apple') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Fruit', 'peach') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
INSERT INTO TestTable (Key,Val) VALUES ('Animal', 'horse') ON CONFLICT( Key ) DO UPDATE SET Val=EXCLUDED.Val;
SELECT * FROM TestTable;
id Key Val
1 Fruit peach
2 Animal horse
As you can see, the id is now just the next highest number, which is ideal for most of my use cases.
Obviously this is going to be a problem if the id must always be unique, as removing the last record will free up its id for reuse. If that's not a problem (i.e. where the id is just used for references and cascades), then this might be a good solution for you. Also note that SELECT MAX(id) + 1 is not safe under concurrent inserts: two transactions can read the same maximum and attempt to insert the same id.
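If concurrent inserts are a concern, one possible variant (my own sketch, not part of the original workaround) serializes inserts per table with a transaction-scoped advisory lock, trading insert throughput for a correct MAX(id) + 1:

```sql
CREATE OR REPLACE FUNCTION IncrementSerial()
RETURNS trigger AS $fn$
BEGIN
    -- Two concurrent transactions could otherwise read the same MAX(id);
    -- hashtext() maps the table name to an advisory-lock key, and the lock
    -- is held until this transaction commits or rolls back.
    PERFORM pg_advisory_xact_lock(hashtext(TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME));
    EXECUTE format('SELECT COALESCE(MAX(id), 0) + 1 FROM %I.%I',
                   TG_TABLE_SCHEMA, TG_TABLE_NAME) INTO NEW.id;
    RETURN NEW;
END
$fn$ LANGUAGE plpgsql;
```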

SQL Merge output both matched and unmatched results

I have two keys for data: pk is generated by the database when a row is inserted, while fk is a complex key which is supplied by another system. I would like to produce a pk key for each fk key.
CREATE TABLE test_target (
[pk] [INT] IDENTITY(1,1),
[fk] [varchar](20) NOT NULL)
And I can use MERGE to ensure that a new pk is produced whenever no corresponding fk exists in the table, and I know I can OUTPUT the newly created ids.
CREATE TABLE test_source (
[fk] [varchar](20) NOT NULL)
INSERT INTO test_source VALUES('abc123'),('def456'),('ghi789')
MERGE test_target WITH (SERIALIZABLE) AS T
USING test_source AS U
ON U.fk = T.fk
WHEN NOT MATCHED THEN
INSERT (fk) VALUES(U.fk)
OUTPUT inserted.pk, inserted.fk;
However, what I really want is all the pk associated with the fk in the test_source table. So I can get all by joining two tables.
SELECT test_target.* FROM test_target
INNER JOIN test_source ON test_target.fk = test_source.fk
But I feel like the associated pk has already been found in the MATCHED case of the merge statement, so it is duplicated effort to do another search on the target table. My question is: is there a way to output the MATCHED pk in the same merge statement?
Yes there is. At first I thought I had to touch the row and update it in some form, but I realized we can just trick it. The OUTPUT clause will output any row the statement touches, not just the rows you did not match on, so you can include a WHEN MATCHED clause; the trick is to make it a no-op.
create table foo
(
id int
,bar varchar(30)
)
insert into foo (id, bar) values (1,'test1');
insert into foo (id, bar) values (2,'test2');
insert into foo (id, bar) values (3,'test3');
declare @temp int;
merge foo as dest
using
(
values (2, 'an updated value')
, (4, 'a new value')
) as src(id, bar) on (dest.id = src.id)
when matched then
update set @temp = 1
when not matched then
insert (id,bar)
values (src.id, src.bar)
output $action, src.id;
You can see that in the WHEN MATCHED clause I set a declared variable to 1. Oddly enough, this is sufficient for the OUTPUT clause to pick the row up. If you need to, you can distinguish which operation occurred (insert vs. update) with $action in the output.
This gives the following results:
$action id
UPDATE 2
INSERT 4
Performance-wise, I'd want to test how this operates at scale, and whether the variable assignment causes a throttling effect.
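To consume those rows rather than just display them, the OUTPUT clause can be redirected into a table variable (a sketch; the @results name is my own, and I've used a real update of bar as the "touch", which also causes the matched row to be output):

```sql
declare @results table ([action] nvarchar(10), id int);

merge foo as dest
using
(
    values (2, 'an updated value')
         , (4, 'a new value')
) as src(id, bar) on (dest.id = src.id)
when matched then
    update set bar = src.bar
when not matched then
    insert (id, bar)
    values (src.id, src.bar)
output $action, src.id into @results ([action], id);

-- The captured rows can now be joined or filtered like any table.
select * from @results;
```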

SQL can I have a "conditionally unique" constraint on a table?

I've had this come up a couple times in my career, and none of my local peers seems to be able to answer it. Say I have a table that has a "Description" field which is a candidate key, except that sometimes a user will stop halfway through the process. So for maybe 25% of the records this value is null, but for all that are not NULL, it must be unique.
Another example might be a table which must maintain multiple "versions" of a record, and a bit value indicates which one is the "active" one. So the "candidate key" is always populated, but there may be three versions that are identical (with 0 in the active bit) and only one that is active (1 in the active bit).
I have alternate methods to solve these problems (in the first case, enforce the rule code, either in the stored procedure or business layer, and in the second, populate an archive table with a trigger and UNION the tables when I need a history). I don't want alternatives (unless there are demonstrably better solutions), I'm just wondering if any flavor of SQL can express "conditional uniqueness" in this way. I'm using MS SQL, so if there's a way to do it in that, great. I'm mostly just academically interested in the problem.
If you are using SQL Server 2008, a filtered index may be your solution:
http://msdn.microsoft.com/en-us/library/ms188783.aspx
This is how I enforce a Unique Index with multiple NULL values
CREATE UNIQUE INDEX [IDX_Blah] ON [tblBlah] ([MyCol]) WHERE [MyCol] IS NOT NULL
In the case of descriptions which are not yet completed, I wouldn't have those in the same table as the finalized descriptions. The final table would then have a unique index or primary key on the description.
In the case of the active/inactive, again I might have separate tables as you did with an "archive" or "history" table, but another possible way to do it in MS SQL Server at least is through the use of an indexed view:
CREATE TABLE Test_Conditionally_Unique
(
my_id INT NOT NULL,
active BIT NOT NULL DEFAULT 0
)
GO
CREATE VIEW dbo.Test_Conditionally_Unique_View
WITH SCHEMABINDING
AS
SELECT
my_id
FROM
dbo.Test_Conditionally_Unique
WHERE
active = 1
GO
CREATE UNIQUE CLUSTERED INDEX IDX1 ON Test_Conditionally_Unique_View (my_id)
GO
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1) -- This insert will fail
You could use this same method for the NULL/Valued descriptions as well.
Thanks for the comments, the initial version of this answer was wrong.
Here's a trick using a computed column that effectively allows a nullable unique constraint in SQL Server:
create table NullAndUnique
(
id int identity,
name varchar(50),
uniqueName as case
when name is null then cast(id as varchar(51))
else name + '_' end,
unique(uniqueName)
)
insert into NullAndUnique default values
insert into NullAndUnique default values -- Works
insert into NullAndUnique default values -- not accidentally :)
insert into NullAndUnique (name) values ('Joel')
insert into NullAndUnique (name) values ('Joel') -- Boom!
It basically uses the id when the name is null. The + '_' is to avoid cases where name might be numeric, like 1, which could collide with the id.
I'm not entirely aware of your intended use or your tables, but you could try using a one-to-one relationship: split this "sometimes unique" column into a new table, create the UNIQUE index on that column in the new table, and FK back to the original table using the original table's PK. Only add a row to the new table when the "unique" data is supposed to exist.
OLD tables:
TableA
  ID pk
  Col1 sometimes unique
  Col...
NEW tables:
TableA
  ID pk
  Col...
TableB
  ID PK, FK to TableA.ID
  Col1 unique index
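A sketch of that split as T-SQL DDL (all names and the varchar length are placeholders taken from the outline above):

```sql
CREATE TABLE TableA (
    ID int IDENTITY(1,1) PRIMARY KEY
    -- Col... : the always-present columns go here
);

CREATE TABLE TableB (
    ID int NOT NULL PRIMARY KEY
        REFERENCES TableA (ID),        -- one-to-one back to TableA
    Col1 varchar(100) NOT NULL UNIQUE  -- the "sometimes unique" value
);
```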
Oracle does. A fully null key is not indexed by a B-tree index in Oracle, and Oracle uses B-tree indexes to enforce unique constraints.
Assuming one wished to version ID_COLUMN based on the ACTIVE_FLAG being set to 1:
CREATE UNIQUE INDEX idx_versioning_id ON mytable
(CASE active_flag WHEN 0 THEN NULL ELSE active_flag END,
CASE active_flag WHEN 0 THEN NULL ELSE id_column END);