I need to delete a subset of records from a self referencing table. The subset will always be self contained (that is, records will only have references to other records in the subset being deleted, not to any records that will still exist when the statement is complete).
My understanding is that this might cause an error if one of the records is deleted before the record referencing it is deleted.
First question: does postgres do this operation one-record-at-a-time, or as a whole transaction? Maybe I don't have to worry about this problem?
Second question: is the order of deletion of records consistent or predictable?
I am obviously able to write specific SQL to delete these records without any errors, but my ultimate goal is to write a regression test to show the next person after me why I wrote it that way. I want to set up the test data in such a way that a simplistic delete statement will consistently fail because of the records referencing the same table. That way if someone else messes with the SQL later, they'll get notified by the test suite that I wrote it that way for a reason.
Anyone have any insight?
EDIT: just to clarify, I'm not trying to work out how to delete the records safely (that's simple enough). I'm trying to figure out what set of circumstances will cause such a DELETE statement to consistently fail.
EDIT 2: Abbreviated answer for future readers: this is not a problem. By default, postgres checks the constraints at the end of each statement (not per-record, not per-transaction). Confirmed in the docs here: http://www.postgresql.org/docs/current/static/sql-set-constraints.html And by the SQLFiddle here: http://sqlfiddle.com/#!15/11b8d/1
In standard SQL, and I believe PostgreSQL follows this, each statement should be processed "as if" all changes occur at the same time, in parallel.
So the following code works:
CREATE TABLE T (ID1 int not null primary key, ID2 int not null references T(ID1));
INSERT INTO T(ID1,ID2) VALUES (1,2),(2,1),(3,3);
DELETE FROM T WHERE ID2 in (1,2);
Where we've got circular references involved in both the INSERT and the DELETE, and yet it works just fine.
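For the regression test asked about above, the failure case is simply the same data deleted one row at a time: each statement's FK check then still sees the other row in place. A minimal sketch against the same table:

DELETE FROM T WHERE ID1 = 1;  -- fails: row (2,1) still references ID1 = 1
DELETE FROM T WHERE ID1 = 2;  -- never reached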
A single DELETE with a WHERE clause matching a set of records will delete those records in an implementation-defined order. This order may change based on query planner decisions, statistics, etc. No ordering guarantees are made. Just like SELECT without ORDER BY. The DELETE executes in its own transaction if not wrapped in an explicit transaction, so it'll succeed or fail as a unit.
To force order of deletion in PostgreSQL you must do one DELETE per record. You can wrap them in an explicit transaction to reduce the overhead of doing this and to make sure they all happen or none happen.
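A minimal sketch of that pattern (hypothetical table and ids; the referencing row is deleted before the row it points at):

BEGIN;
DELETE FROM t WHERE id = 12;  -- this row references id = 7
DELETE FROM t WHERE id = 7;
COMMIT;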
PostgreSQL can check foreign keys at three different points:
The default, NOT DEFERRABLE: checks for each row as the row is inserted/updated/deleted
DEFERRABLE INITIALLY IMMEDIATE: Same, but affected by SET CONSTRAINTS DEFERRED to instead check at end of transaction / SET CONSTRAINTS IMMEDIATE
DEFERRABLE INITIALLY DEFERRED: checks all rows at the end of the transaction
In your case, I'd define your FOREIGN KEY constraint as DEFERRABLE INITIALLY IMMEDIATE, and do a SET CONSTRAINTS DEFERRED before deleting.
(Actually if I vaguely recall correctly, despite the name IMMEDIATE, DEFERRABLE INITIALLY IMMEDIATE actually runs the check at the end of the statement instead of the default of after each row change. So if you delete the whole set in a single DELETE the checks will then succeed. I'll need to double check).
(The mildly insane meaning of DEFERRABLE is IIRC defined by the SQL standard, along with gems like a TIMESTAMP WITH TIME ZONE that doesn't have a time zone).
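A sketch of the DEFERRABLE INITIALLY IMMEDIATE approach recommended above (table, column, and constraint names are made up for illustration):

ALTER TABLE t
    ADD CONSTRAINT t_parent_fk FOREIGN KEY (parent_id) REFERENCES t (id)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS t_parent_fk DEFERRED;
DELETE FROM t WHERE id = 12;  -- order no longer matters
DELETE FROM t WHERE id = 7;
COMMIT;                       -- the FK is checked here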
If you issue a single DELETE that affects multiple records (like delete from x where id>100), that will be handled as a single transaction and either all will succeed or fail. If multiple DELETEs, you have to put them in a transaction yourself.
There will be problems. If you have a constraint with DELETE CASCADE, you might delete more than you want with a single DELETE. If you don't, the integrity check might stop you from deleting. Constraints other than NO ACTION are not deferrable, so you'd have to disable the constraint before delete and enable it afterwards (basically drop/create, which might be slow).
If you have multiple DELETEs, then the order is as the DELETE statements are sent. If a single DELETE, the database will delete in the order it happens to find them (index, oids, something else...).
So I would also suggest thinking about the logic and maybe handling the deletes differently. Can you elaborate more on the actual logic? A tree in database?
1) It will run as a single transaction if enclosed within BEGIN/COMMIT. Otherwise, in general, no.
For more see http://www.postgresql.org/docs/current/static/tutorial-transactions.html
The answer in general to your question depends on how the self-referencing is implemented.
If it is within application logic, it is solely your responsibility to check the things yourself.
Otherwise, it is in general possible to restrict or cascade deletes for rows with foreign keys using ON DELETE RESTRICT or ON DELETE CASCADE. However, as far as the PG docs go, I understand they are talking about referencing columns in other tables; I'm not sure whether same-table foreign keys are supported:
http://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
2) In general, the order of deletion will be the order in which you issue the delete statements. If you want them all to be "uninterruptible", with no other statements modifying the table in between, you enclose them in a transaction.
As a warning, I may be wrong, but what you seem to be trying to do must not be done. You should not have to rely on some esoteric "order of deletion" or other undocumented and/or implicit features of the database. The underlying logic does not seem sound; there should be another way.
Related
I have a fairly simple design, as follows:
What I want to achieve in my grouping_individual_history is marked in red:
when a session is deleted, I want to cascade delete the grouping_history....
when a grouping is deleted, I just want the child field to be nullified
It seems that MSSQL will not allow me to have more than one FK that does something other than NO ACTION ... It complains with:
Introducing FOREIGN KEY constraint 'FK_grouping_individual_history_grouping' on table 'grouping_individual_history' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
I've already read this post (https://www.mssqltips.com/sqlservertip/2733/solving-the-sql-server-multiple-cascade-path-issue-with-a-trigger/), although it's not quite the same scenario it seems to me.
I've tried doing an INSTEAD OF DELETE trigger on my grouping table, but it won't accept it, because in turn my grouping table has another FK (fkSessionID) that does a cascade delete... So the fix would be to change it all, in all affected tables with FKs. The chain is long, though, and we cannot consider it.
For one thing, can someone explain to me why SQL Server is giving me the issue for this very simple scenario in the first place? I just don't understand it.
Is there another workaround I could use (besides just removing the foreign key link from my grouping_individual_history table)?
I have a table in a SQL Server database that contains a row I never want deleted. This is a row required by the application to work as designed.
Is there a way to add a constraint to that row that prevents it from being deleted? Or another way to handle this scenario?
Here is an example of using a FOR DELETE trigger to prevent the deletion of a row when a certain condition is satisfied:
CREATE TRIGGER KeepImportantRow ON MyTable
FOR DELETE
AS BEGIN
    -- This check assumes that your important table has a
    -- column called id, and your important row has an id of 0.
    -- Adjust accordingly for your situation.
    IF EXISTS (SELECT 1 FROM deleted WHERE id = 0) BEGIN
        RAISERROR('Cannot delete important row!', 16, 1)
        ROLLBACK TRAN
    END
END
If you want to prevent accidental deletes then you could have a dummy table that declares a foreign key into your table with ON DELETE NO ACTION, and add one row in it with the foreign key matching your 'precious' row primary key. This way if the 'parent' row is deleted, the engine will refuse and raise an error.
If you want to prevent intentional deletes then you should rely on security (deny DELETE permission on the table). Of course, privileged users that have the required permission can delete the row; there is no way to prevent that, nor should you try. Since SQL Server does not support row level security, if you need to deny only certain rows then you have to go back to the drawing board and change your table layout so that all rows that have to be denied are stored in one table, and rows that are allowed to be deleted are stored in a different table.
Other solutions (like triggers) will ultimately be a variation on these themes. What you really must settle is whether you want to prevent accidental deletes (solvable) or intentional deletes (unsolvable: it's their database, not yours).
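A sketch of the dummy-table trick described above (made-up names; assumes the precious row has id 0 and that MyTable.id is its primary key):

CREATE TABLE KeepRows (
    id int NOT NULL,
    CONSTRAINT FK_KeepRows_MyTable FOREIGN KEY (id)
        REFERENCES MyTable (id) ON DELETE NO ACTION
);
INSERT INTO KeepRows (id) VALUES (0);

-- DELETE FROM MyTable WHERE id = 0;  -- now fails with an FK violation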
You could do it in a number of ways, although it depends on the situation.
If the table only contains that row, do not grant deletion / truncate privileges.
If the table contains other rows as well you could use a before deletion trigger.
One issue you will have is that someone with DBA / SA access to the database can get around anything you put in, if they desire. So ask yourself what you are trying to protect against: a casual user, or anyone at all.
Can anybody explain (or suggest a site or paper) the exact difference between triggers, assertions and checks, also describe where I should use them?
EDIT: I mean in database, not in any other systems or programing languages.
Triggers - a trigger is a piece of SQL to execute either before or after an update, insert, or delete in a database. An example of a trigger in plain English might be something like: before updating a customer record, save a copy of the current record. Which would look something like:
CREATE TRIGGER triggerName ON Customers  -- table name assumed for this example
AFTER UPDATE
AS
INSERT INTO CustomerLog (blah, blah, blah)
SELECT blah, blah, blah FROM deleted
The difference between assertions and checks is a little more murky, many databases don't even support assertions.
Check Constraint - A check is a piece of SQL which makes sure a condition is satisfied before action can be taken on a record. In plain English this would be something like: All customers must have an account balance of at least $100 in their account. Which would look something like:
ALTER TABLE accounts
ADD CONSTRAINT CK_minimumBalance
CHECK (balance >= 100)
Any attempt to insert a value in the balance column of less than 100 would throw an error.
Assertions - An assertion is a piece of SQL which makes sure a condition is satisfied or it stops action being taken on a database object. It could mean locking out the whole table or even the whole database.
To make matters more confusing - a trigger could be used to enforce a check constraint and in some DBs can take the place of an assertion (by allowing you to run code un-related to the table being modified). A common mistake for beginners is to use a check constraint when a trigger is required or a trigger when a check constraint is required.
An example: All new customers opening an account must have a balance of $100; however, once the account is opened their balance can fall below that amount. In this case you have to use a trigger because you only want the condition evaluated when a new record is inserted.
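A sketch of such a trigger in SQL Server syntax (the trigger name is made up; the accounts table is the one from the CHECK example above):

CREATE TRIGGER TR_accounts_opening_balance ON accounts
AFTER INSERT
AS
BEGIN
    -- Only newly opened accounts are checked; later updates may drop below 100.
    IF EXISTS (SELECT 1 FROM inserted WHERE balance < 100)
    BEGIN
        RAISERROR('New accounts must open with a balance of at least 100.', 16, 1)
        ROLLBACK TRAN
    END
END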
In the SQL standard, both ASSERTIONS and CHECK CONSTRAINTS are what relational theory calls "constraints" : rules that the data actually contained in the database must comply with.
The difference between the two is that CHECK CONSTRAINTS are, in a sense, much "simpler": they are rules that relate to one single row only, while ASSERTIONs can involve any number of other tables, or any number of other rows in the same table. That obviously makes it (much!) more complex for the DBMS builders to support it, and that is, in turn, the reason why they don't: they just don't know how to do it.
TRIGGERs are pieces of executable code of which it can be declared to the DBMS that those should be executed every time a certain kind of update operation (insert/delete/update) gets done on a certain table. Because triggers can raise exceptions, they are a MEANS for implementing the same thing as an ASSERTION. However, with triggers, it's still the programmer who has to do all the coding, and not make any mistake.
EDIT
Onedaywhen's comments re. ASSERTION/CHECK constraints are correct. The difference is way more subtle (AND confusing). The standard indeed allows subqueries in CHECK constraints. (Most products don't support it though, so my "relate to a single row" is true for most SQL products, but not for the standard.) So is there still a difference? Yes, there still is. More than one, even.
First case : TABLE MEN (ID:INTEGER) and TABLE WOMEN(ID:INTEGER). Now imagine a rule to the effect that "no ID value can appear both in the MEN and in the WOMEN table". That's a single rule. The intent of ASSERTION is precisely that the database designer would state this single rule [and be done with it], and the DBMS would know how to deal with this [efficiently, of course] and how to enforce this rule, no matter what particular update gets done to the database. In the example, the DBMS would know that it has to do a check for this rule upon INSERT INTO MEN, and upon INSERT INTO WOMEN, but not upon DELETE FROM MEN/WOMEN, or INSERT INTO <anyothertable>.
But DBMS's aren't smart enough for doing all that. So what needs to be done? The database designer must add TWO CHECK constraints to his database, one to the MEN table (checking newly inserted MEN IDs against the WOMEN table) and one to the WOMEN table (checking the other way round). There's your first difference: one rule, one ASSERTION, TWO CHECK constraints. CHECK constraints are a lower level of abstraction than ASSERTIONs, because they require the designer to do more thinking himself about (a) all the kinds of update that could potentially cause his ASSERTION to be violated, and (b) what particular check should be carried out for any of the specific "kinds of update" he found in (a). (Although I don't really like making "absolute" statements on what is still "WHAT" and what is "HOW", I'd summarize that CHECK constraints require more "HOW" thinking (procedural) by the database designer, whereas ASSERTIONs allow the database designer to focus exclusively on the "WHAT" (declarative).)
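In (mostly hypothetical) standard-SQL syntax the contrast looks roughly like this: one ASSERTION versus two table-level CHECKs with subqueries. Few products accept either form, so treat it as a sketch:

-- One rule, stated once:
CREATE ASSERTION no_shared_ids CHECK
    (NOT EXISTS (SELECT ID FROM MEN INTERSECT SELECT ID FROM WOMEN));

-- Versus the designer doing the analysis twice:
ALTER TABLE MEN   ADD CONSTRAINT men_ids_not_in_women
    CHECK (ID NOT IN (SELECT ID FROM WOMEN));
ALTER TABLE WOMEN ADD CONSTRAINT women_ids_not_in_men
    CHECK (ID NOT IN (SELECT ID FROM MEN));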
Second case (though I'm not entirely sure of this, so take with a grain of salt): just your average RI rule. Of course you are used to specifying this using some REFERENCES clause. But imagine that a REFERENCES clause was not available. A rule like "Every ORDER must be placed by a known CUSTOMER" is really just that, a rule, thus: a single ASSERTION. However, we all know that such a rule can always be violated in two ways: inserting an ORDER (in this example), and deleting a CUSTOMER. Now, in line with the foregoing MEN/WOMEN example, if we wanted to implement this single rule/ASSERTION using CHECK constraints, then we'd have to write a CHECK constraint that checks CUSTOMER existence upon insertions into ORDER, but what CHECK constraint could we write that does whatever is needed upon deletion from CUSTOMER? They simply aren't designed for that purpose, as far as I can tell. There's your second difference: CHECK constraints are tied to INSERTs exclusively, ASSERTIONs can define rules that will also be checked upon DELETEs.
Third case: Imagine a table COMPOS (componentID: ..., percentage: INTEGER), and a rule to the effect that "the sum of all percentages must at all times be equal to 100". That's a single rule, and an ASSERTION is capable of specifying that. But try and imagine how you would go about enforcing such a rule with CHECK constraints... If you have a valid table with, say, three nonzero rows adding up to a hundred, how would you apply any change to this table that could survive your CHECK constraint? You can't delete or update (decrease) any row without having to add other replacing rows, or update the remaining rows, so that they sum up to the same percentage. Likewise for insert or update (increase). You'd need deferred constraint checking at the very least, and then what are you going to CHECK? There's your third difference: CHECK constraints are targeted to individual rows, while ASSERTIONs can also define/express rules that "span" several rows (i.e. rules about aggregations of rows).
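The percentage rule as an ASSERTION would be a one-liner (hypothetical again, since no major product supports CREATE ASSERTION):

CREATE ASSERTION compos_sums_to_100 CHECK
    (100 = (SELECT SUM(percentage) FROM COMPOS));
-- An empty COMPOS makes the comparison UNKNOWN, which a constraint
-- treats as satisfied, so only a nonempty, wrong total is rejected.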
Assertions do not modify the data, they only check certain conditions
Triggers are more powerful because they can check conditions and also modify the data
Assertions are not linked to specific tables in the database and not linked to specific events
Triggers are linked to specific tables and specific events
A database constraint involves a condition that must be satisfied when the database is updated. In SQL, if a constraint condition evaluates to false then the update fails, the data remains unchanged and the DBMS generates an error.
Both CHECK and ASSERTION are database constraints defined by the SQL standards. An important distinction is that a CHECK is applied to a specific base table, whereas an ASSERTION is applied to the whole database. Consider a constraint that limits the combined rows in tables T1 and T2 to a total of 10 rows e.g.
CHECK (10 >= (
    SELECT COUNT(*)
    FROM (
        SELECT *
        FROM T1
        UNION ALL
        SELECT *
        FROM T2
    ) AS Tn
))
Assume the tables are empty. If this was applied as an ASSERTION only and a user tried to insert 11 rows into T1, then the update would fail. The same would apply if the constraint was applied as a CHECK constraint to T1 only. However, if the constraint was applied as a CHECK constraint to T2 only, the insert would succeed, because a statement targeting T1 does not cause the constraints applied to T2 to be tested.
Both an ASSERTION and a CHECK may be deferred (if declared as DEFERRABLE), allowing for data to temporarily violate the constraint condition, but only within a transaction.
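A sketch of the deferral pattern; it is shown here with a FOREIGN KEY, since that is what mainstream products (PostgreSQL included) actually allow to be declared DEFERRABLE, but a deferrable CHECK or ASSERTION would work the same way (constraint and table names are made up):

BEGIN;
SET CONSTRAINTS orders_customer_fk DEFERRED;          -- assumes the FK was declared DEFERRABLE
INSERT INTO orders (id, customer_id) VALUES (1, 42);  -- customer 42 does not exist yet
INSERT INTO customers (id) VALUES (42);               -- violation repaired in the same transaction
COMMIT;                                               -- the deferred check runs here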
ASSERTION and CHECK constraints involving subqueries are features outside of core Standard SQL and none of the major SQL products support these features. MS Access (not exactly an industrial-strength product) supports CHECK constraints involving subqueries but not deferrable constraints plus constraint testing is always performed on a row-by-row basis, the practical consequences being that the functionality is very limited.
In common with CHECK constraints, a trigger is applied to a specific table. Therefore, a trigger can be used to implement the same logic as a CHECK constraint but not an ASSERTION. A trigger is procedural code and, unlike constraints, the user must take far more responsibility for concerns such as performance and error handling. Most commercial SQL products support triggers (the aforementioned MS Access does not).
A trigger fires when its condition evaluates to true, whereas a CHECK constraint rejects the change whenever its expression evaluates to false.
Is it possible to prevent deletion of the first row in table on PostgreSQL side?
I have a category table and I want to prevent deletion of default category as it could break the application. Of course I could easily do it in application code, but it would be a lot better to do it in database.
I think it has something to do with rules on delete statement, but I couldn't find anything remotely close to my problem in documentation.
You were right about thinking of the rules system. Here is an example matching your problem. It's even simpler than the triggers:
create rule protect_first_entry_update as
on update to your_table
where old.id = your_id
do instead nothing;
create rule protect_first_entry_delete as
on delete to your_table
where old.id = your_id
do instead nothing;
Some answers miss one point: also the updating of the protected row has to be restricted. Otherwise one can first update the protected row such that it no longer fulfills the forbidden delete criterion, and then one can delete the updated row as it is no longer protected.
You want to define a BEFORE DELETE trigger on the table. When you attempt to delete the row (either match by PK or have a separate "protect" boolean column), RAISE an exception.
I'm not familiar with PostgreSQL syntax, but it looks like this is how you'd do it:
CREATE FUNCTION check_del_cat() RETURNS trigger AS $check_del_cat$
BEGIN
    IF OLD.id = 1 /* substitute the primary key value of your row */ THEN
        RAISE EXCEPTION 'cannot delete default category';
    END IF;
    RETURN OLD;  -- returning OLD lets the delete proceed for every other row
END;
$check_del_cat$ LANGUAGE plpgsql;

CREATE TRIGGER check_del_cat BEFORE DELETE ON categories /* table name */
FOR EACH ROW EXECUTE PROCEDURE check_del_cat();
The best way I see to accomplish this is by creating a delete trigger on this table. Basically, you'll have to write a stored procedure to make sure that this 'default' category will always exist, and then enforce it using a trigger on the DELETE event on this table. A good way to do this is to create a per-row trigger that will guarantee that on DELETE events the 'default' category row will never be deleted.
Please check out PostgreSQL's documentation about triggers and stored procedures:
http://www.postgresql.org/docs/8.3/interactive/trigger-definition.html
http://www.postgresql.org/docs/8.3/interactive/plpgsql.html
There are also valuable examples in this wiki:
http://wiki.postgresql.org/wiki/A_Brief_Real-world_Trigger_Example
You could have a row in another table (called defaults) referencing the default category. The FK constraint would not let the deletion of the default category happen.
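A sketch of that idea (hypothetical names; the default category is assumed to have id 1):

CREATE TABLE defaults (
    default_category_id integer NOT NULL REFERENCES category (id)
    -- no ON DELETE action given, so the default NO ACTION applies
);
INSERT INTO defaults (default_category_id) VALUES (1);

-- DELETE FROM category WHERE id = 1;  -- now fails with a foreign key violation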
Keep in mind how triggers work: they will fire for every row your delete statement deletes. This doesn't mean you shouldn't use triggers; just keep it in mind, and most importantly test your usage scenarios and make sure performance meets the requirements.
Should I use a rule or a trigger?
From the official docs:
"For the things that can be implemented by both, which is best depends on the usage of the database. A trigger is fired for any affected row once. A rule manipulates the query or generates an additional query. So if many rows are affected in one statement, a rule issuing one extra command is likely to be faster than a trigger that is called for every single row and must execute its operations many times. However, the trigger approach is conceptually far simpler than the rule approach, and is easier for novices to get right."
See the docs for details.
http://www.postgresql.org/docs/8.3/interactive/rules-triggers.html
Can anyone provide a clear explanation / example of what these functions do, and when it's appropriate to use them?
Straight from the manual...
We know that the foreign keys disallow creation of orders that do not relate to any products. But what if a product is removed after an order is created that references it? SQL allows you to handle that as well. Intuitively, we have a few options:
Disallow deleting a referenced product
Delete the orders as well
Something else?
CREATE TABLE order_items (
product_no integer REFERENCES products ON DELETE RESTRICT,
order_id integer REFERENCES orders ON DELETE CASCADE,
quantity integer,
PRIMARY KEY (product_no, order_id)
);
Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of a referenced row. NO ACTION means that if any referencing rows still exist when the constraint is checked, an error is raised; this is the default behavior if you do not specify anything. (The essential difference between these two choices is that NO ACTION allows the check to be deferred until later in the transaction, whereas RESTRICT does not.) CASCADE specifies that when a referenced row is deleted, row(s) referencing it should be automatically deleted as well. There are two other options: SET NULL and SET DEFAULT. These cause the referencing columns to be set to nulls or default values, respectively, when the referenced row is deleted. Note that these do not excuse you from observing any constraints. For example, if an action specifies SET DEFAULT but the default value would not satisfy the foreign key, the operation will fail.
Analogous to ON DELETE there is also ON UPDATE which is invoked when a referenced column is changed (updated). The possible actions are the same.
edit: You might want to take a look at this related question: When/Why to use Cascading in SQL Server?. The concepts behind the question/answers are the same.
I have a PostgreSQL database and I use ON DELETE when I delete a user from the database and need to delete that user's information from other tables. This way I only need to issue one DELETE, and the FK with ON DELETE CASCADE removes the related rows from the other tables.
You can do the same with ON UPDATE: if the referenced column is changed and the FK is declared with ON UPDATE CASCADE, the change is propagated to the referencing table.
What Daok says is true... it can be rather convenient. On the other hand, having things happen automagically in the database can be a real problem, especially when it comes to eliminating data. It's possible that in the future someone will count on the fact that FKs usually prevent deletion of parents when there are children and not realize that your use of On Delete Cascade not only doesn't prevent deletion, it makes huge amounts of data in dozens of other tables go away thanks to a waterfall of cascading deletes.
In response to @Arthur's comment:
The more frequently "hidden" things happen in the database, the less likely it becomes that anyone will ever have a good handle on what is going on. Triggers (and this is essentially a trigger) can cause my simple action of deleting a row to have wide-ranging consequences throughout my database. I issue a DELETE statement and 17 tables are affected with cascades of triggers and constraints, and none of this is immediately apparent to the issuer of the command. OTOH, if I place the deletion of the parent and all its children in a procedure, then it is very easy and clear for anyone to see EXACTLY what is going to happen when I issue the command.
It has absolutely nothing to do with how well I design a database. It has everything to do with the operational issues introduced by triggers.
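A sketch of the explicit alternative being argued for (PostgreSQL syntax, made-up parent/child tables):

CREATE OR REPLACE FUNCTION delete_parent_and_children(p_parent_id integer)
RETURNS void AS $$
BEGIN
    -- every table touched is listed here, in plain sight
    DELETE FROM grandchild WHERE child_id IN (SELECT id FROM child WHERE parent_id = p_parent_id);
    DELETE FROM child WHERE parent_id = p_parent_id;
    DELETE FROM parent WHERE id = p_parent_id;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT delete_parent_and_children(42);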
Instead of writing the method to do all the work of the cascade delete or cascade update, you could simply write a warning message instead. A lot easier than reinventing the wheel, and it makes the behaviour clear to the client (and to new developers picking up the code).