Unique column value combination with NULL values

I'm trying to conditionally insert some data into a table, where each combination of column values may only appear at most once in the table. The schema looks something like this:
CREATE TABLE foobar (
id SERIAL PRIMARY KEY,
a_id INTEGER,
b_id INTEGER,
c_id INTEGER,
ident VARCHAR(32),
date_a timestamp,
date_b timestamp,
FOREIGN KEY (a_id) REFERENCES a (id) ON DELETE CASCADE,
FOREIGN KEY (b_id) REFERENCES b (id) ON DELETE CASCADE,
FOREIGN KEY (c_id) REFERENCES c (id) ON DELETE CASCADE);
The combination of (a_id, b_id, c_id, ident) is unique, but only for rows where date_a AND date_b are both NULL.
I want to insert a new row only if the (a_id, b_id, c_id, ident) combination is not already in the table. If it is, nothing needs to happen.
At first I tried to create a unique constraint on these columns, but the problem is that a_id, b_id and c_id are allowed to be NULL, as long as at least one of them is not NULL. This ruins the unique constraint, since rows containing NULL are never treated as duplicates. Because a_id, b_id and c_id are foreign keys, I can't set them to some other stub value (like -1).
I tried playing with locking (against my better judgement) which resulted in a deadlock within a couple of minutes of testing.
Is there any standard solution to this problem?

Rather than using a unique constraint, you should probably be using a unique index on the exact conditions you want to check for. You could then coalesce the null values into a dummy value, such as -1. Something like:
CREATE UNIQUE INDEX
foobar_test ON foobar
(COALESCE(a_id, -1), COALESCE(b_id, -1), COALESCE(c_id, -1), ident) -- Nulls become -1
WHERE date_a IS NULL AND date_b IS NULL; -- Only check rows where date_a and date_b are both null
This would make sure a_id, b_id, c_id, and ident were unique (including their combination of null values) for all rows where both date_a and date_b were null.
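With such an index in place, the conditional insert from the question could be written with INSERT ... ON CONFLICT DO NOTHING, which is available from PostgreSQL 9.5. This is only a sketch: the inserted values are placeholders, and it assumes rows with a.id = 1 and b.id = 2 exist.

```sql
-- Assumes the partial unique index foobar_test defined above exists.
-- The values 1, 2 and 'example' are placeholders, not from the question.
INSERT INTO foobar (a_id, b_id, c_id, ident)
VALUES (1, 2, NULL, 'example')
ON CONFLICT DO NOTHING;  -- skip the row if any unique index would reject it

-- date_a and date_b default to NULL, so the new row falls under the
-- partial index. Running the same statement a second time inserts nothing.
```

Because no conflict target is named, the statement skips the row on a violation of any unique index or constraint on the table, including the primary key.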

In similar situations in several of my production databases, I just add a "default row" to the referenced tables. I use id = 0 for these and start real surrogate keys at 10 or 100. Or you could use -1 (or whatever suits you) if 0 is reserved.
This way I can set all fk columns to NOT NULL DEFAULT 0 and (partial) UNIQUE constraints work out of the box:
CREATE UNIQUE INDEX tbl_uni_idx ON tbl (a_id, b_id, c_id, ident)
WHERE date_a IS NULL AND date_b IS NULL;
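Applied to the foobar table from the question, the setup might look roughly like this for one of the three foreign key columns. It is a sketch that assumes table a accepts a row with only id set, which the question does not show.

```sql
-- Repeat the same three steps for b / b_id and c / c_id.
-- Assumption: a row with only id = 0 is valid in table a.
INSERT INTO a (id) VALUES (0);                  -- the "default row"

UPDATE foobar SET a_id = 0 WHERE a_id IS NULL;  -- replace existing NULLs

ALTER TABLE foobar
    ALTER COLUMN a_id SET DEFAULT 0,
    ALTER COLUMN a_id SET NOT NULL;
```

Since the foreign keys in the question use ON DELETE CASCADE, deleting the default row would also delete every foobar row that points at it.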

Related

How to insert multiple rows into table B, and update table A's null foreign keys with the new IDs?

I've found a million things sounding kind of similar on StackOverflow, but not my case exactly. I'll simplify as much as possible:
I have two tables as follows:
CREATE TABLE B (id uuid PRIMARY KEY);
CREATE TABLE A (id uuid PRIMARY KEY, b_id uuid REFERENCES b);
There are some NULL values in A.b_id. I am trying to create a migration that does the following:
For every row in A with no b_id, create a new row in B, and assign its id to A.b_id.
How can I accomplish this in one query?

Assuming you want a distinct entry in b for every row with a missing UUID in a:
WITH upd AS (
UPDATE a
SET b_id = gen_random_uuid()
WHERE b_id IS NULL
RETURNING b_id
)
INSERT INTO b (id)
SELECT b_id FROM upd;
This works because it's a single command, and the FK reference is only enforced at the end of the command.
See:
SET CONSTRAINTS ALL DEFERRED not working as expected
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
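Two practical notes on running the migration above, written as a small sketch. The column alias rows_missing_b is made up for illustration.

```sql
-- On PostgreSQL 12 and older, gen_random_uuid() comes from the pgcrypto
-- extension; from version 13 it is built in and this line is unnecessary.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- After the migration, no row in a should be left without a b_id.
SELECT count(*) AS rows_missing_b FROM a WHERE b_id IS NULL;
```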

Unique constraints on multiple columns that cannot both be null

I wanted to ask if there's a better way of modeling the following behavior in Postgres 10:
CREATE TABLE test.my_table
(
id UUID PRIMARY KEY,
id_a UUID,
id_b UUID,
some_shared_data JSONB,
UNIQUE (id_a, id_b)
);
CREATE UNIQUE INDEX IF NOT EXISTS b_null_constraint ON test.my_table (id_a) WHERE id_b IS NULL;
CREATE UNIQUE INDEX IF NOT EXISTS a_null_constraint ON test.my_table (id_b) WHERE id_a IS NULL;
ALTER TABLE test.my_table
ADD CONSTRAINT both_null_constraint CHECK (
(id_b IS NOT NULL) OR (id_a IS NOT NULL));
I.e. the constraints are:
1. id_a and id_b cannot both be null.
2. The combination of id_a and id_b must be unique (including cases where one of them is null).
It feels to me the code above to set this up is not very expressive. Would people do this in another/more normalized way? I tried splitting this up in separate tables but then constraint (1.) is hard to satisfy.

It is possible to do this with just two unique indexes: the UNIQUE (id_a, id_b) constraint you already have, plus a second one covering the rows where either column is null:
CREATE UNIQUE INDEX IF NOT EXISTS ab_null_constraint ON my_table (coalesce(id_a, id_b), (id_a IS NULL)) WHERE id_a IS NULL OR id_b IS NULL;
Actually, you can combine all this into one unique index:
CREATE UNIQUE INDEX IF NOT EXISTS ab_null_constraint ON
my_table ( coalesce(id_a, id_b),
coalesce(id_b, id_a),
(id_a is null),
(id_b is null)
);
You might find your original formulation more maintainable.
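To see how the single combined index tells rows apart, here is a small sketch with placeholder UUIDs. It assumes my_table has the columns from the question and the combined four-expression index.

```sql
-- Index key for each row:
-- (coalesce(id_a, id_b), coalesce(id_b, id_a), id_a IS NULL, id_b IS NULL)

-- Accepted: key is (…0a, …0a, false, true)
INSERT INTO my_table (id, id_a, id_b)
VALUES ('00000000-0000-0000-0000-000000000001',
        '00000000-0000-0000-0000-00000000000a', NULL);

-- Rejected with a unique violation: same key as the first row
INSERT INTO my_table (id, id_a, id_b)
VALUES ('00000000-0000-0000-0000-000000000002',
        '00000000-0000-0000-0000-00000000000a', NULL);

-- Accepted: key is (…0a, …0a, true, false), the IS NULL flags differ
INSERT INTO my_table (id, id_a, id_b)
VALUES ('00000000-0000-0000-0000-000000000003',
        NULL, '00000000-0000-0000-0000-00000000000a');
```

A row with both columns null produces NULL in the first two key positions, which a unique index does not treat as a duplicate, so the CHECK constraint from the question is still needed for that case.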

How to create SQL constraint on primary key to make sure it could only be referenced once?

How do I add a constraint to guarantee that a primary key value can be referenced only once? (It could be referenced from either of two tables.)
Each reference should use a different primary key value.
Table A
----------------------
id
1
2
3
4
Table B
----------------------
id a_id (foreign key to table A.id)
1 2
2 3
Table C
----------------------
id a_id (foreign key to table A.id)
1 1
I want an error to be raised when trying to insert a_id = 2 into table C, because it is already used in table B.

You can use an INSERT, UPDATE trigger on each of the child tables to ensure that the PK of the parent table that is about to be inserted or updated does not already exist in the other child table.
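A minimal sketch of such a trigger, written in PostgreSQL's PL/pgSQL because the question does not name a DBMS. The function and trigger names are made up; it assumes tables b and c each have an a_id column referencing a(id).

```sql
-- Reject rows in c whose a_id is already used in b.
CREATE FUNCTION c_check_a_id_unused() RETURNS trigger AS
$$
BEGIN
    IF EXISTS (SELECT 1 FROM b WHERE b.a_id = NEW.a_id) THEN
        RAISE EXCEPTION 'a_id % is already referenced by table b', NEW.a_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER c_a_id_unused
BEFORE INSERT OR UPDATE ON c
FOR EACH ROW EXECUTE PROCEDURE c_check_a_id_unused();

-- A mirror-image function and trigger on b would check table c.
```

A check like this cannot see uncommitted rows from concurrent transactions, so two sessions inserting the same a_id into b and c at the same moment could both pass it unless extra locking is added.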

What you are trying to do requires another table D, that will help unify the references to A.
Table D will contain its own primary key ( Id ), a reference to table A with a UNIQUE constraint on it (call it AId ), and a third column (called "RowType") to indicate to which of the child tables (B or C) the row corresponds. You can make this column to be of type int, and assign value "0" for B and "1" for C, for example.
Then in table B you add a foreign key to D.Id, AND another column "BRowType" as foreign key to D.RowType; then you define a constraint on this column, so it can only have the value '0' ( or whatever value you have decided to correspond to this table).
For table C your constraint will limit the values to '1'.
Of course, in order to insert a record into B or C you first need to create a record in D. But once you have a record in B that references a record in D, which in turn links to a record in A, you will no longer be able to create a record in C for the same row in A, because of the UNIQUE constraint on D.AId and the constraint on C's row-type column.
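The layout described above could be sketched as follows. All names are illustrative, the column types are guesses, and the UNIQUE on d_id in b and c is an added assumption so that one D row cannot be shared by several child rows.

```sql
CREATE TABLE a (id integer PRIMARY KEY);

CREATE TABLE d (
    id       integer PRIMARY KEY,
    a_id     integer NOT NULL UNIQUE REFERENCES a (id),   -- each A row at most once
    row_type integer NOT NULL CHECK (row_type IN (0, 1)), -- 0 = B, 1 = C
    UNIQUE (id, row_type)                                 -- target of the composite FKs
);

CREATE TABLE b (
    id         integer PRIMARY KEY,
    d_id       integer NOT NULL UNIQUE,
    b_row_type integer NOT NULL DEFAULT 0 CHECK (b_row_type = 0),
    FOREIGN KEY (d_id, b_row_type) REFERENCES d (id, row_type)
);

CREATE TABLE c (
    id         integer PRIMARY KEY,
    d_id       integer NOT NULL UNIQUE,
    c_row_type integer NOT NULL DEFAULT 1 CHECK (c_row_type = 1),
    FOREIGN KEY (d_id, c_row_type) REFERENCES d (id, row_type)
);
```

With a D row (id = 10, a_id = 2, row_type = 0) in place, a row in b can reference it, but a row in c cannot: the composite foreign key would need a D row with row_type = 1, and the UNIQUE on d.a_id prevents a second D row for a_id = 2.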

If I understand the question correctly, it sounds like you need to add a unique constraint on the column of each table that references your primary key.
For example:
Table A
----------------------
id (primary key)
1
2
3
Table B
----------------------
id a_id (foreign key to table A.id)
1 2
2 3
Set the a_id column to be UNIQUE and that way you can ensure that the primary key from Table A is not used twice. You would do that in each table which references A.id
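In SQL this could look like the following sketch; the constraint names are made up.

```sql
-- One UNIQUE constraint per referencing table.
ALTER TABLE b ADD CONSTRAINT b_a_id_key UNIQUE (a_id);
ALTER TABLE c ADD CONSTRAINT c_a_id_key UNIQUE (a_id);
```

Each constraint only covers its own table, so on its own this does not stop the same a_id from appearing once in B and once in C.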

If you want to avoid using triggers, you could create a table X with id and a unique constraint on it.
In each transaction that inserts a record into B or C, you also have to insert into X. Both insertions succeed only if the id is not yet in X, i.e. not yet used by the other table.
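A rough sketch of that idea, with hypothetical column names and the values from the question's example:

```sql
-- x records which a.id values are already claimed by B or C.
CREATE TABLE x (id integer PRIMARY KEY REFERENCES a (id));

BEGIN;
INSERT INTO x (id) VALUES (2);           -- fails if a_id = 2 is already claimed
INSERT INTO b (id, a_id) VALUES (1, 2);  -- runs only if the claim succeeded
COMMIT;
```

Nothing in this sketch forces the insert into x. One way to enforce it, as an assumption beyond the answer, would be to have b.a_id and c.a_id reference x(id) instead of a(id).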

Creating a Foreign Key based on an index name rather than table(columns)

I have a table with a timestamp field onto which I've created a composite index.
CREATE INDEX "IDX_NAME_A" ON "A" (a_id, extract(year FROM created_at));
I have another table which stores a year and a_id which I'd like to have a foreign key relation.
I can't seem to find the syntax to do what I want.
ALTER TABLE "B"
ADD FOREIGN KEY(a_id, a_year)
REFERENCES A(a_id, extract(YEAR FROM created_at));
produces:
ERROR: syntax error at or near "("
I've also tried ...
ALTER TABLE "B"
ADD FOREIGN KEY(a_id, a_year)
USING INDEX "IDX_NAME_A";
Any Ideas?
Table A
--------
a_id serial,
created_at timestamp default now()
Table B
-------
b_id serial
a_id integer not null,
a_year date_part('year')

A foreign key constraint cannot reference an index. It has to be a table.
A foreign key constraint cannot reference an expression. It has to point to column name(s) of the referenced table.
And there has to exist a unique index (primary key qualifies, too, implicitly) on the set of referenced columns.
Start by reading the manual's section on foreign keys.
The superior design would be to just drop the column b.a_year. It is 100% redundant and can be derived from a.created_at any time.
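For example, the year can be derived at query time with a join. This sketch assumes the tables a(a_id, created_at) and b(b_id, a_id) from the question, with b.a_year dropped.

```sql
SELECT b.b_id,
       b.a_id,
       extract(year FROM a.created_at)::int AS a_year  -- derived, not stored
FROM   b
JOIN   a USING (a_id);
```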
If you positively need the column (for instance to enforce one row per year for certain criteria in table b), you can achieve your goal like this:
CREATE TABLE a (
a_id serial
,created_at timestamp NOT NULL DEFAULT now()
,a_year int NOT NULL DEFAULT extract(year FROM now())::int -- redundant, for fk
,CHECK (a_year = extract(year FROM created_at)::int)
);
CREATE UNIQUE INDEX a_id_a_year_idx ON a (a_id, a_year); -- needed for fk
CREATE TABLE b (
b_id serial
,a_id integer NOT NULL
,a_year int -- doesn't have to be NOT NULL, but might
,CONSTRAINT id_year FOREIGN KEY (a_id, a_year) REFERENCES a(a_id, a_year)
);
Updated after @Catcall's comment:
The CHECK constraint in combination with the column DEFAULT and NOT NULL clauses enforces your regime.
Alternatively (less simple, but allowing for NULL values) you could maintain the values in a.a_year with a trigger:
CREATE OR REPLACE FUNCTION trg_a_insupbef()
RETURNS trigger AS
$BODY$
BEGIN
NEW.a_year := extract(year FROM NEW.created_at)::int;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE TRIGGER insupbef
BEFORE INSERT OR UPDATE ON a
FOR EACH ROW EXECUTE PROCEDURE trg_a_insupbef();

How to constraint one column with values from a column from another table?

This isn't a big deal, but my OCD is acting up with the following problem in the database I'm creating. I'm not used to working with databases, but the data has to be stored somewhere...
Problem
I have two tables A and B.
One of the data fields is common to both tables: segments. There's a finite number of segments, and I want to write queries that connect values from A to B through their segment values, very much as if the following table structure was used:
However, as you can see, the table Segments is empty. There's nothing more I want to put into that table other than the ID that the other tables would reference as a foreign key. I want my tables to be as simple as possible, so adding another one just seems wrong.
Note also that one of these tables (A, say) is actually the master, in the sense that you should be able to put any value for segment into A, but an insert into B should first be checked against A.
EDIT
I tried one of the answers below:
create table A(
id int primary key identity,
segment int not null
)
create table B(
id integer primary key identity,
segment int not null
)
--Andomar's suggestion
alter table B add constraint FK_B_SegmentID
foreign key (segment) references A(segment)
This produced the following error:
Msg 1776, Level 16, State 0, Line 11 There are no primary or candidate keys in the referenced table 'A' that match the referencing column list in the foreign key 'FK_B_SegmentID'.
Msg 1750, Level 16, State 0, Line 11 Could not create constraint. See previous errors.
Maybe I was somehow unclear that segment is not unique in A or B and can appear many times in both tables.

You can create a foreign key relationship directly from B.SegmentID to A.SegmentID. There's no need for the extra table.
Update: If the SegmentIDs aren't unique in TableA, then you do need the extra table to store the segment IDs, and create foreign key relationships from both tables to this table. This however is not enough to enforce that all segment IDs in TableB also occur in TableA. You could instead use triggers.
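A trigger for the insert side might look like this in T-SQL, since the question's error messages come from SQL Server. It is a sketch only: the trigger name is made up, and it uses the table and column names from the question's own code (A, B, segment).

```sql
CREATE TRIGGER trg_B_segment_exists ON B
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Reject the statement if any new row in B has a segment missing from A.
    IF EXISTS (SELECT 1
               FROM inserted i
               WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.segment = i.segment))
    BEGIN
        RAISERROR('segment value does not exist in table A', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

This covers only changes to B. Deleting or updating rows in A could still orphan segments in B unless a matching trigger is added on A.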

You can ensure the segment exists in A with a foreign key:
alter table B add constraint FK_B_SegmentID
foreign key (SegmentID) references A(SegmentID)
To avoid rows in B without a segment at all, make B.SegmentID not nullable:
alter table B alter column SegmentID int not null
There is no need to create a Segments table unless you want to associate extra data with a SegmentID.

As Andomar and Mark Byers wrote, you don't have to create an extra table.
You can also CASCADE UPDATEs or DELETEs on the master. Be very careful with ON DELETE CASCADE though!
For queries use a JOIN:
SELECT *
FROM A
JOIN B ON a.SegmentID = b.SegmentID
Edit:
You have to add a UNIQUE constraint on segment_id in the "master" table to avoid duplicates there, or else the foreign key is not possible. Like this:
ALTER TABLE A ADD CONSTRAINT UNQ_A_SegmentID UNIQUE (SegmentID);

If I've understood correctly, a given segment cannot be inserted into table B unless it has also been inserted into table A. In that case, table A should reference table Segments and table B should reference table A; table B then references table Segments implicitly (via table A), so an explicit reference is not required. This can be done using foreign keys alone (i.e. no triggers required).
Because table A has its own key, I assume a given segment_ID can appear in table A more than once. For B to be able to reference the segment_ID value in A, a superkey therefore needs to be defined on the compound of A_ID and segment_ID. Here's a quick sketch:
CREATE TABLE Segments
(
segment_ID INTEGER NOT NULL UNIQUE
);
CREATE TABLE A
(
A_ID INTEGER NOT NULL UNIQUE,
segment_ID INTEGER NOT NULL
REFERENCES Segments (segment_ID),
A_data INTEGER NOT NULL,
UNIQUE (segment_ID, A_ID) -- superkey
);
CREATE TABLE B
(
B_ID INTEGER NOT NULL UNIQUE,
A_ID INTEGER NOT NULL,
segment_ID INTEGER NOT NULL,
FOREIGN KEY (segment_ID, A_ID)
REFERENCES A (segment_ID, A_ID),
B_data INTEGER NOT NULL
);
