Performance of ON DELETE CASCADE in PostgreSQL

I have a performance problem with ON DELETE CASCADE and I'm trying to understand why it takes so long. For the purposes of this question I simplified the real case to the schema below:
CREATE TABLE IF NOT EXISTS public.items
(
id uuid NOT NULL,
name text COLLATE pg_catalog."default",
CONSTRAINT items_pk PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS public.links
(
parent uuid,
child uuid,
CONSTRAINT links_parent_fk FOREIGN KEY (parent)
REFERENCES public.items (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT links_child_fk FOREIGN KEY (child)
REFERENCES public.items (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS parent_idx
ON public.links USING btree
(parent ASC NULLS LAST);
CREATE INDEX IF NOT EXISTS child_idx
ON public.links USING btree
(child ASC NULLS LAST);
CREATE EXTENSION "uuid-ossp";
The data can be generated with:
INSERT INTO public.items
SELECT uuid_generate_v4 (), 'item_' || i
FROM generate_series(1, 134001) AS i;
INSERT INTO links
SELECT (SELECT id FROM public.items WHERE name='item_1'), id FROM public.items;
Briefly, the database contains two tables. The items table holds a list of items (identifier and name columns), and the links table defines relations between items (parent <-> child). In the case presented here, all items (children) belong to the item named 'item_1' (parent).
I run the following query to delete all children assigned to that parent:
BEGIN;
EXPLAIN ANALYZE DELETE FROM public.items where id in (SELECT child FROM public.links WHERE parent = (SELECT id FROM public.items WHERE name='item_1'));
ROLLBACK;
Among other things, the execution plan shows:
"Trigger for constraint links_parent_fk: time=10451.471 calls=134001"
"Trigger for constraint links_child_fk: time=2962.035 calls=134001"
The question is: why does the trigger for constraint links_parent_fk take so much time?
I also tried swapping the data between the two columns of the links table. After that, the trigger for links_child_fk took ~10 s and the trigger for links_parent_fk took ~3 s. Why is there such a difference between the execution times of these two cascades?
PostgreSQL version: 12.4 and 13.9.

Related

Cascade Delete Children not working as expected

I have two tables, one of which models the polymorphic relationship between different corporations, and I've added foreign key references to ids to ensure that if I delete a parent, all of its children are deleted. With the table setup below, if I delete a parent corporation the child corporation persists, which is not what I expected. If I delete a corporation_relationship via the parent_id, the parent and its children cascade delete, and if I delete the relationship via the child_id, the parent and siblings are unaffected. My questions are: what am I doing wrong, and how can I ensure that deleting a parent also deletes its children without adding any new columns?
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TYPE "corporation_relationship_type" AS ENUM (
'campus',
'network'
);
CREATE TABLE "corporations" (
"id" uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
"name" varchar(255) NOT NULL
);
CREATE TABLE "corporation_relationships" (
"parent_id" uuid NOT NULL,
"child_id" uuid NOT NULL,
"type" corporation_relationship_type NOT NULL,
PRIMARY KEY ("parent_id", "child_id")
);
ALTER TABLE "corporation_relationships" ADD FOREIGN KEY ("parent_id") REFERENCES "corporations" ("id") ON DELETE CASCADE;
ALTER TABLE "corporation_relationships" ADD FOREIGN KEY ("child_id") REFERENCES "corporations" ("id") ON DELETE CASCADE;
Example queries:
If I add 2 corporations and then add a relationship to the two like so:
insert into corporations (id, name) values ('f9f8f7f6-f5f4f3f2-f1f0f0f0-f0f0f0f0', 'Father');
insert into corporations (id, name) values ('f9f8f7f6-f5f4f3f2-f1f0f0f0-f0f0f0f1', 'Son');
insert into corporation_relationships (parent_id, child_id, type) values ('f9f8f7f6-f5f4f3f2-f1f0f0f0-f0f0f0f0', 'f9f8f7f6-f5f4f3f2-f1f0f0f0-f0f0f0f1', 'campus');
My output for select * from corporations; will be:
id | name
--------------------------------------+--------------------
f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0 | Father
f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f1 | Son
(2 rows)
My output for select * from corporation_relationships; is:
parent_id | child_id | type
--------------------------------------+--------------------------------------+--------
f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0 | f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f1 | campus
Now if I delete the 'father' by executing delete FROM corporations WHERE id = 'f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0'; I would expect my output of select * from corporations; to be nothing but instead it is the following:
id | name
--------------------------------------+--------------------
f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f1 | Son
(1 row)
Also, it is noteworthy that the corporation_relationships table is empty after this delete as well but I would want the cascade to keep going past that table and delete the child entity as well.
Your second foreign key constraint in the corporation_relationships table, the one that references the corporations table, has nothing to do with the cascade you expect on child rows in corporations. To clarify: this foreign key cascades a deletion into corporation_relationships when you delete a referenced row in corporations. You need the opposite direction.
To make it work as you expect in your design, corporations needs a column that references a primary key in corporation_relationships.
So you need to:
Create a single-column primary key, e.g. id, in corporation_relationships (separate from the existing composite key on parent_id and child_id).
Create a column in corporations and add a foreign key constraint on it that references the new corporation_relationships primary key.
Remove the child_id column from corporation_relationships; it is incorrect and useless at this point.
When you create a relationship, set its id in the foreign key column of the corresponding child row in corporations.
Now, if you delete a parent corporation, its relationships are deleted, which in turn deletes the corresponding child corporations, and so on recursively.
That said, in my opinion your design is not correct.
To define a tree-like relation you do not need the link table, i.e. corporation_relationships. You can define it in the corporations table alone. All you need is one parent_id column with a foreign key, declared with a cascade delete rule, that references the primary key of the same table. Top-level corporations have NULL in parent_id; every child holds its parent's id.
Also, the type column in corporation_relationships is not an attribute of the relationship itself; it is an attribute of the child.
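The single-table design described above could look roughly like the sketch below. The table name corporations_tree and the index name are made up for illustration so the example does not clash with the tables from the question, and the type column is left out to keep it short.

```sql
-- Hypothetical self-referencing table: each row optionally points at its parent.
CREATE TABLE corporations_tree (
    id        uuid PRIMARY KEY,
    name      varchar(255) NOT NULL,
    -- NULL for a top-level corporation, the parent's id otherwise.
    parent_id uuid REFERENCES corporations_tree (id) ON DELETE CASCADE
);

-- Index the referencing column so each cascade step can look up children quickly.
CREATE INDEX corporations_tree_parent_id_idx ON corporations_tree (parent_id);

INSERT INTO corporations_tree (id, name, parent_id) VALUES
    ('f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0', 'Father', NULL),
    ('f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f1', 'Son', 'f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0');

-- Deleting the parent cascades to the child through parent_id.
DELETE FROM corporations_tree WHERE id = 'f9f8f7f6-f5f4-f3f2-f1f0-f0f0f0f0f0f0';

-- Both rows are gone at this point.
SELECT count(*) FROM corporations_tree;
```

With this layout the cascade follows the tree to any depth, because every deleted child in turn triggers the same foreign key action for its own children.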
Postgres doesn't maintain referential integrity for optional polymorphic relationships, so I created a trigger to do this for me:
CREATE FUNCTION cascade_delete_children() RETURNS trigger AS $$
BEGIN
-- Check if the corporation is a parent
IF OLD.id IN (SELECT parent_id FROM corporation_relationships) THEN
-- Delete all of the corporation's children
DELETE FROM corporations WHERE id IN (SELECT child_id FROM corporation_relationships WHERE parent_id = OLD.id);
END IF;
RETURN OLD;
END;
$$ LANGUAGE plpgsql;
CREATE trigger cascade_delete_children BEFORE DELETE ON corporations
FOR EACH ROW EXECUTE PROCEDURE cascade_delete_children();

Very slow SQL DELETE query on table with foreign key constraint

I have got some trouble with a SQL DELETE query.
I work on a database (postgres 9.3) with 2 tables (Parent and Child).
The child has a relation to the parent with a foreign key.
Parent Table
CREATE TABLE parent
(
id bigint NOT NULL,
...
CONSTRAINT parent_pkey PRIMARY KEY (id)
)
Child Table
CREATE TABLE child
(
id bigint NOT NULL,
parent_id bigint,
...
CONSTRAINT child_pkey PRIMARY KEY (id),
CONSTRAINT fk_adc9xan172ilseglcmi1hi0co FOREIGN KEY (parent_id)
REFERENCES parent (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I inserted 200,000 rows into each table without any relation (Child.parent_id = NULL).
But a DELETE query like the one below takes more than 20 minutes,
even without a WHERE condition.
DELETE FROM Parent;
If I don't add the foreign key constraint, the same statement finishes in about 400 ms.
What did I miss?
A workable solution is the example below. But I don't know if this is a good idea. Maybe anyone could tell me a better way to do that.
BEGIN WORK;
ALTER TABLE Parent DISABLE TRIGGER ALL;
DELETE FROM Parent;
ALTER TABLE Parent ENABLE TRIGGER ALL;
COMMIT WORK;
When you delete from Parent, the Child table needs to be queried by parent_id to ensure that no child row refers to the parent row you are about to delete.
To ensure that the child lookup runs quickly, you need to have an index on your parent_id column in the Child table.
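As a minimal sketch against the parent and child tables from the question (the index name is an assumption, any name works):

```sql
-- Index the referencing column so the per-row foreign key check on DELETE
-- can use an index lookup instead of scanning child sequentially.
CREATE INDEX child_parent_id_idx ON child (parent_id);

-- Measure the effect without actually removing data: EXPLAIN ANALYZE executes
-- the DELETE, so wrap it in a transaction and roll it back.
BEGIN;
EXPLAIN ANALYZE DELETE FROM parent;
ROLLBACK;
```

The EXPLAIN ANALYZE output reports the time spent in the trigger for the foreign key constraint separately, which makes it easy to compare the run before and after creating the index.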

Updating primary keys in POSTGRESQL

I have a database from a previous project that I want to use in another project. For security reasons I need to update the IDs of one of the tables. The problem is that the table is heavily referenced by foreign keys from other tables:
CREATE TABLE "table_table" (
"id" serial NOT NULL PRIMARY KEY,
"created" timestamp with time zone NOT NULL
);
CREATE TABLE "table_photo" (
"id" serial NOT NULL PRIMARY KEY,
"table_id" integer NOT NULL REFERENCES "table_table" ("id") DEFERRABLE INITIALLY DEFERRED
);
Now if I change the id on table_table the reference from table_photo won't work.
I will probably use something like this to change the IDs:
UPDATE table_table SET id = id + 15613;
I have read somewhere that I could use ON UPDATE CASCADE constraints to do this but I am not very sure how to use it.
btw: I am using Django ORM.
Get the constraint name with \d "table_photo", which shows:
Foreign-key constraints:
"table_photo_table_id_fkey" FOREIGN KEY (table_id) REFERENCES table_table(id) DEFERRABLE INITIALLY DEFERRED
Then replace it with a constraint that has on update cascade:
ALTER TABLE "table_photo"
DROP CONSTRAINT "table_photo_table_id_fkey",
ADD CONSTRAINT "table_photo_table_id_fkey"
FOREIGN KEY ("table_id")
REFERENCES "table_table"
ON UPDATE CASCADE
DEFERRABLE INITIALLY DEFERRED;
Now when you do your UPDATE, referenced row IDs are automatically updated. Adding an index on "table_photo"."table_id" will help a lot.
This can be slow for big tables though. An alternative if you have large tables is to do it in a couple of stages. For table A with field id that's referenced by table B's field A_id:
Add a new column, new_id, to A, with a UNIQUE constraint. Leave it nullable.
Add a new column, A_new_id to table B, giving it a foreign key constraint to A(new_id).
Populate A.new_id with the new values
Do an
UPDATE B
SET A_new_id = A.new_id
FROM A
WHERE B.A_id = A.id;
to do a joined update, setting the new ID values in B.A_new_id to match.
Drop the column B.A_id and rename B.A_new_id to B.A_id.
Drop the column A.id and rename A.new_id to A.id
Create a PRIMARY KEY constraint on the renamed A.id, USING the index created automatically before.
It's a lot more complicated, especially since for big tables you usually want to do each of these steps in batches.
If this seems too complicated, just do it with a cascading foreign key constraint like above.
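The staged approach above can be sketched as follows. The tables a and b are hypothetical stand-ins for "table A" and "table B", all constraint and index names are made up, and the offset is the one from the question. One deliberate deviation: the sketch creates a standalone unique index instead of a UNIQUE constraint, on the assumption that ADD PRIMARY KEY USING INDEX refuses an index that already backs a constraint.

```sql
-- Hypothetical starting point.
CREATE TABLE a (id integer PRIMARY KEY);
CREATE TABLE b (id integer PRIMARY KEY,
                a_id integer NOT NULL REFERENCES a (id));

BEGIN;

-- 1. New nullable key column on a, made unique through a plain unique index.
ALTER TABLE a ADD COLUMN new_id integer;
CREATE UNIQUE INDEX a_new_id_idx ON a (new_id);

-- 2. Matching column on b, with a foreign key to a(new_id).
ALTER TABLE b ADD COLUMN a_new_id integer;
ALTER TABLE b ADD CONSTRAINT b_a_new_id_fkey
    FOREIGN KEY (a_new_id) REFERENCES a (new_id);

-- 3. Populate the new key values.
UPDATE a SET new_id = id + 15613;

-- 4. Joined update: copy the new key into the referencing table.
UPDATE b SET a_new_id = a.new_id FROM a WHERE b.a_id = a.id;

-- 5. Swap the columns on b (dropping a_id also drops its old foreign key).
ALTER TABLE b DROP COLUMN a_id;
ALTER TABLE b RENAME COLUMN a_new_id TO a_id;
ALTER TABLE b ALTER COLUMN a_id SET NOT NULL;

-- 6. Swap the columns on a (dropping id also drops the old primary key).
ALTER TABLE a DROP COLUMN id;
ALTER TABLE a RENAME COLUMN new_id TO id;

-- 7. Promote the existing unique index to the primary key.
ALTER TABLE a ADD CONSTRAINT a_pkey PRIMARY KEY USING INDEX a_new_id_idx;

COMMIT;
```

For genuinely large tables, steps 3 and 4 would be run in batches outside a single transaction; they are kept in one transaction here only to keep the sketch short.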

PostgreSQL delete fails with ON DELETE rule on inherited table

In my PostgreSQL 9.1 database I've defined RULEs that delete rows from child tables whenever a parent table row is deleted. This all worked OK, until I introduced inheritance. If the parent (referencing) table INHERITS from another table and I delete from the base table then the DELETE succeeds, but the RULE doesn't appear to fire at all - the referenced row is not deleted. If I try to delete from the derived table I get an error:
update or delete on table "referenced" violates foreign key constraint "fk_derived_referenced" on table "derived"
There is no other row in the parent table that would violate the foreign key: it's being referenced by the row that's being deleted! How do I fix this?
The following script reproduces the problem:
-- Schema
CREATE TABLE base
(
id serial NOT NULL,
name character varying(100),
CONSTRAINT pk_base PRIMARY KEY (id)
);
CREATE TABLE referenced
(
id serial NOT NULL,
value character varying(100),
CONSTRAINT pk_referenced PRIMARY KEY (id)
);
CREATE TABLE derived
(
referenced_id integer,
CONSTRAINT pk_derived PRIMARY KEY (id),
CONSTRAINT fk_derived_referenced FOREIGN KEY (referenced_id) REFERENCES referenced (id)
)
INHERITS (base);
-- The rule
CREATE OR REPLACE RULE rl_derived_delete_referenced
AS ON DELETE TO derived DO ALSO
DELETE FROM referenced r WHERE r.id = old.referenced_id;
-- Some test data
INSERT INTO referenced (id, value)
VALUES (1, 'referenced 1');
INSERT INTO derived (id, name, referenced_id)
VALUES (2, 'derived 2', 1);
-- Delete from base - deletes the "base" and "derived" rows, but not "referenced"
--DELETE FROM base
--WHERE id = 2;
-- Delete from derived - fails with:
-- update or delete on table "referenced" violates foreign key constraint "fk_derived_referenced" on table "derived"
DELETE FROM derived
WHERE id = 2;
As I said in my comment, this seems an unusual way to do things. But you can make it work with a deferred constraint.
CREATE TABLE derived
(
referenced_id integer,
CONSTRAINT pk_derived PRIMARY KEY (id),
CONSTRAINT fk_derived_referenced FOREIGN KEY (referenced_id)
REFERENCES referenced (id) DEFERRABLE INITIALLY DEFERRED
)
INHERITS (base);
The PostgreSQL docs, Rules vs. Triggers, say
Many things that can be done using triggers can also be implemented
using the PostgreSQL rule system. One of the things that cannot be
implemented by rules are some kinds of constraints, especially foreign
keys.
But it's not clear to me that this specific limitation is what you're running into.
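If the derived table already exists, the constraint can be made deferred in place instead of recreating the table. This is only a sketch: ALTER TABLE ... ALTER CONSTRAINT was added in PostgreSQL 9.4, so on the 9.1 server from the question the second form (drop and re-add) would be the one to use.

```sql
-- PostgreSQL 9.4 and later: change the deferrability of the existing constraint.
ALTER TABLE derived
    ALTER CONSTRAINT fk_derived_referenced DEFERRABLE INITIALLY DEFERRED;

-- Older versions: replace the constraint in a single ALTER TABLE statement.
ALTER TABLE derived
    DROP CONSTRAINT fk_derived_referenced,
    ADD CONSTRAINT fk_derived_referenced FOREIGN KEY (referenced_id)
        REFERENCES referenced (id) DEFERRABLE INITIALLY DEFERRED;
```

With the check deferred to commit time, the row in referenced can be removed by the rule before the referencing row in derived is gone, as long as both are deleted by the end of the transaction.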
Also, you need to check if other records are still referencing the to-be-deleted rows. I added a test derived record#3, which points to the same #1 reference record.
-- The rule
CREATE OR REPLACE RULE rl_derived_delete_referenced
AS ON DELETE TO tmp.derived DO ALSO (
DELETE FROM tmp.referenced re_del
WHERE re_del.id = OLD.referenced_id
AND NOT EXISTS ( SELECT * FROM tmp.derived other
WHERE other.referenced_id = re_del.id
AND other.id <> OLD.id )
;
);
-- Some test data
INSERT INTO tmp.referenced (id, value)
VALUES (1, 'referenced 1');
-- EXPLAIN ANALYZE
INSERT INTO tmp.derived (id, name, referenced_id)
VALUES (2, 'derived 2', 1);
INSERT INTO tmp.derived (id, name, referenced_id)
VALUES (3, 'derived 3', 1);
-- Delete from base - deletes the "base" and "derived" rows, but not "referenced"
--DELETE FROM base
--WHERE id = 2;
-- Delete from derived - fails with:
-- update or delete on table "referenced" violates foreign key constraint "fk_derived_referenced" on table "derived"
EXPLAIN ANALYZE
DELETE FROM tmp.derived
WHERE id = 2
;
SELECT * FROM tmp.base;
SELECT * FROM tmp.derived;
SELECT * FROM tmp.referenced;

Stop invalid data in an attribute with foreign key constraint using triggers?

How do I specify a trigger that checks whether the data inserted into a table's foreign key attribute actually exists in the referenced table? If it exists, no action should be performed; otherwise the trigger should delete the inserted tuple.
E.g., consider 2 tables
R(A int Primary Key) and
S(B int Primary Key , A int Foreign Key References R(A) ) .
I have written a trigger like this :
Create Trigger DelS
BEFORE INSERT ON S
FOR EACH ROW
BEGIN
Delete FROM S where New.A <> ( Select * from R;) );
End;
I am sure I am making a mistake while specifying the inner sub query within the Begin and end Blocks of the trigger. My question is how do I make such a trigger ?
Wouldn't a foreign key constraint better achieve what you want?
ALTER TABLE [dbo].[TABLE2] WITH CHECK
ADD CONSTRAINT [FK_TABLE2_TABLE1] FOREIGN KEY([FK_COLUMN])
REFERENCES [dbo].[TABLE1] ([PK_COLUMN])
GO
This is what foreign key constraints are meant to do: specifically, they do not allow a record to be inserted that violates the foreign key relationship.
Note that to make this example more readable, I used different column and table names - S, A, R and B looked like a mess.
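For comparison, here is a small sketch of the same idea using the R and S tables from the question, written in plain SQL rather than the SQL Server syntax above. The sample values are made up.

```sql
-- R(A) is the referenced table, S(B, A) references it.
CREATE TABLE r (a int PRIMARY KEY);
CREATE TABLE s (
    b int PRIMARY KEY,
    a int REFERENCES r (a)
);

INSERT INTO r (a) VALUES (1);

-- Accepted: the value 1 exists in r.a.
INSERT INTO s (b, a) VALUES (10, 1);

-- Rejected with a foreign key violation: there is no row in r with a = 2,
-- so the row never reaches s and nothing has to be deleted afterwards.
INSERT INTO s (b, a) VALUES (11, 2);
```

The database rejects the invalid row at insert time, so no trigger is needed to clean it up afterwards.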