what is the difference between triggers, assertions and checks (in database) - sql

Can anybody explain (or suggest a site or paper) the exact difference between triggers, assertions and checks, also describe where I should use them?
EDIT: I mean in a database, not in any other systems or programming languages.

Triggers - a trigger is a piece of SQL to execute either before or after an update, insert, or delete in a database. An example of a trigger in plain English might be something like: whenever a customer record is updated, save a copy of the previous version of the record. In SQL Server syntax (the table name Customer is only a placeholder) that would look something like:
CREATE TRIGGER triggerName
ON Customer -- placeholder table name
AFTER UPDATE
AS
-- "deleted" holds the rows as they were before the update
INSERT INTO CustomerLog (blah, blah, blah)
SELECT blah, blah, blah FROM deleted
The difference between assertions and checks is a little more murky; many databases don't even support assertions.
Check Constraint - A check is a piece of SQL which makes sure a condition is satisfied before action can be taken on a record. In plain English this would be something like: All customers must have an account balance of at least $100 in their account. Which would look something like:
ALTER TABLE accounts
ADD CONSTRAINT CK_minimumBalance
CHECK (balance >= 100)
Any attempt to insert a value in the balance column of less than 100 would throw an error.
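To see the check constraint above reject a row, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in chosen because it needs no server; the helper name try_insert and the id column are assumptions added for the illustration, and error types differ in other products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL,
        CONSTRAINT CK_minimumBalance CHECK (balance >= 100)
    )
    """
)


def try_insert(balance):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO accounts (balance) VALUES (?)", (balance,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(150)  # satisfies balance >= 100
rejected = try_insert(50)   # violates the check, so no row is stored
print(accepted, rejected)
```

The same constraint is also tested on UPDATE, which is why it cannot express a rule that should hold only when the account is opened.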
Assertions - An assertion is a piece of SQL which makes sure a condition is satisfied or it stops action being taken on a database object. It could mean locking out the whole table or even the whole database.
To make matters more confusing - a trigger could be used to enforce a check constraint and in some DBs can take the place of an assertion (by allowing you to run code unrelated to the table being modified). A common mistake for beginners is to use a check constraint when a trigger is required or a trigger when a check constraint is required.
An example: All new customers opening an account must have a balance of $100; however, once the account is opened their balance can fall below that amount. In this case you have to use a trigger because you only want the condition evaluated when a new record is inserted.
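The opening-balance rule can be sketched with an insert-only trigger. This is an illustration in SQLite (via Python's sqlite3), not the syntax of any particular production DBMS; the trigger name trg_opening_balance and the helper open_account are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL
    );

    -- Fires only for INSERT, so later UPDATEs are not restricted.
    CREATE TRIGGER trg_opening_balance
    BEFORE INSERT ON accounts
    WHEN NEW.balance < 100
    BEGIN
        SELECT RAISE(ABORT, 'opening balance must be at least 100');
    END;
    """
)


def open_account(balance):
    """Insert a new account; return its id, or None if the trigger aborts."""
    try:
        return conn.execute(
            "INSERT INTO accounts (balance) VALUES (?)", (balance,)
        ).lastrowid
    except sqlite3.DatabaseError:
        return None


too_low = open_account(50)    # rejected by the trigger
acct_id = open_account(100)   # accepted

# Once the account exists, the balance may drop below 100.
conn.execute("UPDATE accounts SET balance = 20 WHERE id = ?", (acct_id,))
balance_now = conn.execute(
    "SELECT balance FROM accounts WHERE id = ?", (acct_id,)
).fetchone()[0]
```

A CHECK (balance >= 100) on the same table would have rejected the final UPDATE, which is the reason a trigger is the right tool here.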

In the SQL standard, both ASSERTIONS and CHECK CONSTRAINTS are what relational theory calls "constraints" : rules that the data actually contained in the database must comply with.
The difference between the two is that CHECK CONSTRAINTS are, in a sense, much "simpler" : they are rules that relate to one single row only, while ASSERTIONs can involve any number of other tables, or any number of other rows in the same table. That obviously makes it (much !) more complex for the DBMS builders to support it, and that is, in turn, the reason why they don't : they just don't know how to do it.
TRIGGERs are pieces of executable code of which it can be declared to the DBMS that those should be executed every time a certain kind of update operation (insert/delete/update) gets done on a certain table. Because triggers can raise exceptions, they are a MEANS for implementing the same thing as an ASSERTION. However, with triggers, it's still the programmer who has to do all the coding, and not make any mistake.
EDIT
Onedaywhen's comments re. ASSERTION/CHECK constraints are correct. The difference is way more subtle (AND confusing). The standard indeed allows subqueries in CHECK constraints. (Most products don't support it though, so my "relate to a single row" is true for most SQL products, but not for the standard.) So is there still a difference ? Yes there still is. More than one even.
First case : TABLE MEN (ID:INTEGER) and TABLE WOMEN(ID:INTEGER). Now imagine a rule to the effect that "no ID value can appear both in the MEN and in the WOMEN table". That's a single rule. The intent of ASSERTION is precisely that the database designer would state this single rule [and be done with it], and the DBMS would know how to deal with this [efficiently, of course] and how to enforce this rule, no matter what particular update gets done to the database. In the example, the DBMS would know that it has to do a check for this rule upon INSERT INTO MEN, and upon INSERT INTO WOMEN, but not upon DELETE FROM MEN/WOMEN, or INSERT INTO <anyothertable>.
But DBMS's aren't smart enough for doing all that. So what needs to be done ? The database designer must add TWO CHECK constraints to his database, one to the MEN table (checking newly inserted MEN ID's against the WOMEN table) and one to the WOMEN table (checking the other way round). There's your first difference : one rule, one ASSERTION, TWO CHECK constraints. CHECK constraints are a lower level of abstraction than ASSERTIONs, because they require the designer to do more thinking himself about (a) all the kinds of update that could potentially cause his ASSERTION to be violated, and (b) what particular check should be carried out for any of the specific "kinds of update" he found in (a). (Although I don't really like making "absolute" statements on what is still "WHAT" and what is "HOW", I'd summarize that CHECK constraints require more "HOW" thinking (procedural) by the database designer, whereas ASSERTIONs allow the database designer to focus exclusively on the "WHAT" (declarative).)
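Since most products accept neither ASSERTIONs nor subqueries in CHECK constraints, the "one rule, two checks" idea usually ends up as two triggers. The sketch below uses SQLite through Python's sqlite3 as a stand-in; the trigger names and the helper try_insert are assumptions for the illustration, and it covers INSERTs only (an UPDATE of ID would need its own pair of triggers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MEN   (ID INTEGER PRIMARY KEY);
    CREATE TABLE WOMEN (ID INTEGER PRIMARY KEY);

    -- One rule, but it has to be spelled out once per table.
    CREATE TRIGGER men_not_in_women
    BEFORE INSERT ON MEN
    BEGIN
        SELECT RAISE(ABORT, 'ID already present in WOMEN')
        WHERE EXISTS (SELECT 1 FROM WOMEN WHERE ID = NEW.ID);
    END;

    CREATE TRIGGER women_not_in_men
    BEFORE INSERT ON WOMEN
    BEGIN
        SELECT RAISE(ABORT, 'ID already present in MEN')
        WHERE EXISTS (SELECT 1 FROM MEN WHERE ID = NEW.ID);
    END;
    """
)

INSERT_SQL = {
    "MEN": "INSERT INTO MEN (ID) VALUES (?)",
    "WOMEN": "INSERT INTO WOMEN (ID) VALUES (?)",
}


def try_insert(table, id_value):
    """Return True if the row is stored, False if a trigger aborts it."""
    try:
        conn.execute(INSERT_SQL[table], (id_value,))
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    try_insert("MEN", 1),     # accepted
    try_insert("WOMEN", 1),   # rejected: 1 is already in MEN
    try_insert("WOMEN", 2),   # accepted
    try_insert("MEN", 2),     # rejected: 2 is already in WOMEN
]
```

Nothing runs on DELETE, which matches the observation above that deletions cannot violate this particular rule.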
Second case (though I'm not entirely sure of this - so take with a grain of salt) : just your average RI rule. Of course you are used to specifying this using some REFERENCES clause. But imagine that a REFERENCES clause was not available. A rule like "Every ORDER must be placed by a known CUSTOMER" is really just that, a rule, thus : a single ASSERTION. However, we all know that such a rule can always be violated in two ways : inserting an ORDER (in this example), and deleting a CUSTOMER. Now, in line with the foregoing MEN/WOMEN example, if we wanted to implement this single rule/ASSERTION using CHECK constraints, then we'd have to write a CHECK constraint that checks CUSTOMER existence upon insertions into ORDER, but what CHECK constraint could we write that does whatever is needed upon deletion from CUSTOMER ? They simply aren't designed for that purpose, as far as I can tell. There's your second difference : CHECK constraints are only tested upon INSERTs (and UPDATEs) on the table they are declared on, ASSERTIONS can define rules that will also be checked upon DELETEs.
Third case : Imagine a table COMPOS (componentID:... percentage:INTEGER), and a rule to the effect that "the sum of all percentages must at all times be equal to 100". That's a single rule, and an ASSERTION is capable of specifying that. But try and imagine how you would go about enforcing such a rule with CHECK constraints ... If you have a valid table with, say, three nonzero rows adding up to a hundred, how would you apply any change to this table that could survive your CHECK constraint ? You can't delete or update(decrease) any row without having to add other replacing rows, or update the remaining rows, that sum up to the same percentage. Likewise for insert or update (increase). You'd need deferred constraint checking at the very least, and then what are you going to CHECK ? There's your third difference : CHECK constraints are targeted to individual rows, while ASSERTIONs can also define/express rules that "span" several rows (i.e. rules about aggregations of rows).

Assertions do not modify the data, they only check certain conditions
Triggers are more powerful because they can check conditions and also modify the data
Assertions are not linked to specific tables in the database and not linked to specific events
Triggers are linked to specific tables and specific events

A database constraint involves a condition that must be satisfied when the database is updated. In SQL, if a constraint condition evaluates to false then the update fails, the data remains unchanged and the DBMS generates an error.
Both CHECK and ASSERTION are database constraints defined by the SQL standards. An important distinction is that a CHECK is applied to a specific base table, whereas an ASSERTION is applied to the whole database. Consider a constraint that limits the combined rows in tables T1 and T2 to a total of 10 rows e.g.
CHECK (10 >= (
SELECT COUNT(*)
FROM (
SELECT *
FROM T1
UNION
SELECT *
FROM T2
) AS Tn
))
Assume the tables are empty. If this was applied as an ASSERTION only and a user tried to insert 11 rows into T1 then the update would fail. The same would apply if the constraint was applied as a CHECK constraint to T1 only. However, if the constraint was applied as a CHECK constraint to T2 only the insert would succeed because a statement targeting T1 does not cause the constraints applied to T2 to be tested.
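The "attached to T2 only" case can be imitated in a product that lacks subquery CHECKs. The sketch below is an approximation, not the standard feature itself: it uses an SQLite trigger on T2 as a stand-in for a CHECK declared on T2, run through Python's sqlite3. The column name c, the trigger name and the helper insert_rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T1 (c INTEGER);
    CREATE TABLE T2 (c INTEGER);

    -- Stand-in for a CHECK declared on T2 only: it runs for inserts into T2
    -- and is never consulted for statements that target T1.
    CREATE TRIGGER t2_combined_limit
    AFTER INSERT ON T2
    BEGIN
        SELECT RAISE(ABORT, 'T1 and T2 may hold at most 10 rows combined')
        WHERE 10 < (
            SELECT COUNT(*)
            FROM (SELECT c FROM T1 UNION SELECT c FROM T2)
        );
    END;
    """
)

INSERT_SQL = {
    "T1": "INSERT INTO T1 (c) VALUES (?)",
    "T2": "INSERT INTO T2 (c) VALUES (?)",
}


def insert_rows(table, values):
    """Insert all values in one transaction; return False if it is aborted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(INSERT_SQL[table], [(v,) for v in values])
        return True
    except sqlite3.DatabaseError:
        return False


t1_result = insert_rows("T1", range(1, 12))  # 11 rows, nothing guards T1
t2_result = insert_rows("T2", [100])         # the rule on T2 is tested now
```

The 11-row insert into T1 goes through even though the combined limit is already exceeded; only the later insert into T2 is refused, which is the gap a database-wide ASSERTION would close.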
Both an ASSERTION and a CHECK may be deferred (if declared as DEFERRABLE), allowing for data to temporarily violate the constraint condition, but only within a transaction.
ASSERTION and CHECK constraints involving subqueries are features outside of core Standard SQL and none of the major SQL products support these features. MS Access (not exactly an industrial-strength product) supports CHECK constraints involving subqueries but not deferrable constraints, and constraint testing is always performed on a row-by-row basis; the practical consequence is that the functionality is very limited.
In common with CHECK constraints, a trigger is applied to a specific table. Therefore, a trigger can be used to implement the same logic as a CHECK constraint but not an ASSERTION. A trigger is procedural code and, unlike constraints, the user must take far more responsibility for concerns such as performance and error handling. Most commercial SQL products support triggers (the aforementioned MS Access does not).

A trigger fires only when its condition evaluates to true, whereas a check constraint rejects the change whenever its expression evaluates to false.

Related

Adding a row to Table A if it has a required foreign key to Table B which has a required foreign key to Table A

This might sound complicated, so I'll give an example.
Say, I have two tables Instructor and Class.
Instructor has a required field called PreferredClassID which has a foreign key against Class.
Class has a required field called CurrentInstructorID which is a foreign key against Instructor
Is it possible to insert a row to either of these tables?
Cause if I insert a row to Instructor, I won't be able to as I'll need to supply a PreferredClassID, but I can't create a Class row either because it needs a CurrentInstructorID.
If I can't do this, how would I solve this problem? Would I just need to make one of those fields non-required (even if business requirements specifies it really should be required?)
If you find yourself here, reevaluate your data relation model.
In this case, you could simply have a lookup table called PreferredCourse with courseId and instructorId.
This will enforce that both the course and instructor exist before adding the row to the PreferredCourse lookup. Maintaining business model requirements without bending the rules of database model requirements.
While it may seem excessive to have another table, it will prevent a whole lot of maintenance overhead in both your database procedures and jobs, and your application code. Circular references create nothing but headaches and are easily solved with small lookup tables and JOINs.
The Impaler gave an example of how to accomplish this with your current data structure. Please note, that you have to 1: make a key nullable in at least one of the tables, and then 2: Perform INSERTs in a specified order. Or, 3: disable the constraints, 4: perform INSERTS, 5: reenable constraints, 6: roll back transaction if constraints are now broken.
There is a whole lot that can go wrong; simply fix the relation model now before things get out of hand.
As long as one of those foreign keys allows a null value, you're good. So you:
1. Insert the row that accepts the null value first (say Instructor), with a null value on the FK. Get the ID of the inserted row.
2. Insert in the other table (say Class). In the FK you use the ID you got from step #1. Once inserted, you get the ID of this new row.
3. Update the FK on the first row (Instructor) with the ID you got from step #2.
4. Commit.
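The insert, insert, update, commit sequence above can be sketched as follows, using SQLite through Python's sqlite3 as a stand-in. The primary key column names (InstructorID, ClassID) and the function name are assumptions; the question only names PreferredClassID and CurrentInstructorID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE Instructor (
        InstructorID     INTEGER PRIMARY KEY,
        PreferredClassID INTEGER REFERENCES Class(ClassID)  -- nullable side
    );
    CREATE TABLE Class (
        ClassID             INTEGER PRIMARY KEY,
        CurrentInstructorID INTEGER NOT NULL
                            REFERENCES Instructor(InstructorID)
    );
    """
)


def add_instructor_with_class():
    """Create an Instructor and a Class that point at each other."""
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: insert the row whose FK may be NULL.
        instructor_id = conn.execute(
            "INSERT INTO Instructor (PreferredClassID) VALUES (NULL)"
        ).lastrowid
        # Step 2: insert the other row, pointing at the row from step 1.
        class_id = conn.execute(
            "INSERT INTO Class (CurrentInstructorID) VALUES (?)",
            (instructor_id,),
        ).lastrowid
        # Step 3: fill in the FK that was left NULL.
        conn.execute(
            "UPDATE Instructor SET PreferredClassID = ? WHERE InstructorID = ?",
            (class_id, instructor_id),
        )
    return instructor_id, class_id


instructor_id, class_id = add_instructor_with_class()
```

Because the three statements share one transaction, no other session ever sees the instructor with a NULL preferred class.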
Alternatively, if both FKs are NOT NULL then you have a bit of a problem. The options I see for this last case are:
Use deferrable FK integrity check. Some databases do allow you to insert without checking integrity until the COMMIT happens. This is really tricky, and enabling this is looking for trouble.
Disable the FK for a short period of time. Some databases allow you to enable/disable constraints. You are not deleting them, just temporarily disabling them. If you do this, don't forget to enable them back.
Drop the constraint temporarily while you do the insert, and then add it again. This is really a workaround of last resort. Adding/dropping constraints are DDL statements and in some databases cannot participate in a transaction. Do this at your own peril.
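For the first option (deferrable FK checks), here is a minimal sketch with both FKs declared NOT NULL. It uses SQLite through Python's sqlite3 as a stand-in and assumes the key values are known before inserting; the primary key column names and the helper run_transaction are made up for the example.

```python
import sqlite3

# isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE Instructor (
        InstructorID     INTEGER PRIMARY KEY,
        PreferredClassID INTEGER NOT NULL
            REFERENCES Class(ClassID) DEFERRABLE INITIALLY DEFERRED
    )
    """
)
conn.execute(
    """
    CREATE TABLE Class (
        ClassID             INTEGER PRIMARY KEY,
        CurrentInstructorID INTEGER NOT NULL
            REFERENCES Instructor(InstructorID) DEFERRABLE INITIALLY DEFERRED
    )
    """
)


def run_transaction(statements):
    """Run (sql, params) pairs in one transaction; False if a constraint fails."""
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("COMMIT")  # deferred FK checks are performed here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False


# Both rows reference each other; the FKs are only verified at COMMIT.
pair_ok = run_transaction([
    ("INSERT INTO Instructor VALUES (?, ?)", (1, 10)),
    ("INSERT INTO Class VALUES (?, ?)", (10, 1)),
])

# A row pointing at a class that never appears is still refused at COMMIT.
dangling_ok = run_transaction([
    ("INSERT INTO Instructor VALUES (?, ?)", (2, 99)),
])
```

The integrity rule is not weakened; only the moment at which it is tested moves to the end of the transaction.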
Something to consider (as per user7396598's answer) is looking at how normal forms apply to your data as it fits within your relational model.
In this case, it might be worth looking at the following:
With your Instructor table, is the PreferredClassID a necessary component? Does an instructor -need- to have a preferred class, or is it okay to say "Hey, I'm creating an entry for a new instructor, I don't know their preferred class."
(if they're new, they might not have a preferred class that your school offers)
This is a case where you definitely want to have a foreign key, but it should be okay to say 'I don't necessarily know the value I want to put there.'
In a similar vein, does a Class need to have an instructor when it's created? Is it possible to create a Class that an instructor has not been assigned to yet?
Again, both of these points are really a case of 'I don't know what I want to put here, but when I do, it should be a specific instance that exists in another table.'

constraints different moments

I have a table of schedules
So my question is this: How can I make a constraint that forbids a vessel from being scheduled more than once a day?
Thanks ahead.
Simply add a unique constraint/index on the vessel and date:
create unique index unq_tourschedule_vesselid_tourdate on tourschedule(vesselid, tourdate);
(A unique constraint is implemented using a unique index.)
You should do this in the database, so even manual changes to the data enforce this constraint.
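A quick way to see the unique index at work is the sketch below, run against SQLite through Python's sqlite3. The id column, the TEXT date format and the helper schedule are assumptions; it also assumes tourdate holds a date without a time part, otherwise two tours on the same day would still count as different values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tourschedule (
        id       INTEGER PRIMARY KEY,
        vesselid INTEGER NOT NULL,
        tourdate TEXT    NOT NULL  -- date only, e.g. '2024-05-01'
    )
    """
)
conn.execute(
    "CREATE UNIQUE INDEX unq_tourschedule_vesselid_tourdate "
    "ON tourschedule(vesselid, tourdate)"
)


def schedule(vesselid, tourdate):
    """Return True if the tour is stored, False if it duplicates an entry."""
    try:
        conn.execute(
            "INSERT INTO tourschedule (vesselid, tourdate) VALUES (?, ?)",
            (vesselid, tourdate),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    schedule(1, "2024-05-01"),  # first booking for vessel 1 that day
    schedule(1, "2024-05-01"),  # same vessel, same day: refused
    schedule(1, "2024-05-02"),  # same vessel, another day
    schedule(2, "2024-05-01"),  # another vessel, same day
]
```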
It depends on what level you need to "prevent" the scheduling. Do you want to prevent it from the UI, the middle-tier, or at the database level?
UI - Do an AJAX check against DB or middle-tier check and prevent insertion of the record there (not a secure solution, but worth mentioning because it informs your users of an existing record).
Middle Tier - best place. Query your DB to see if a record exists with that given vesselID and TourDate. If any records are returned, do not allow insertion. You could then redirect to the page with a helpful message to the user. Business logic goes here typically, and it is best to decouple your business logic from your database.
Database level - most robust, but least maintainable and bad practice for business logic visibility. Many options, all of them cumbersome:
Stored procedure - upon insert, check the records, same procedure as middle tier, but you have to funnel your "error" message up through all the tiers.
Compound key using vesselID and TourDate ensures automatically that only unique entries can be inserted.
Constraint on the table data upon insertion - not just an index, which is for searching optimization, but an actual constraint. This constraint may be added to an existing table or be part of the table creation statement itself.
Yes I have created a unique Index and everything worked out all right thank you for helping me out.

Ordered DELETE of records in self-referencing table

I need to delete a subset of records from a self referencing table. The subset will always be self contained (that is, records will only have references to other records in the subset being deleted, not to any records that will still exist when the statement is complete).
My understanding is that this might cause an error if one of the records is deleted before the record referencing it is deleted.
First question: does postgres do this operation one-record-at-a-time, or as a whole transaction? Maybe I don't have to worry about this problem?
Second question: is the order of deletion of records consistent or predictable?
I am obviously able to write specific SQL to delete these records without any errors, but my ultimate goal is to write a regression test to show the next person after me why I wrote it that way. I want to set up the test data in such a way that a simplistic delete statement will consistently fail because of the records referencing the same table. That way if someone else messes with the SQL later, they'll get notified by the test suite that I wrote it that way for a reason.
Anyone have any insight?
EDIT: just to clarify, I'm not trying to work out how to delete the records safely (that's simple enough). I'm trying to figure out what set of circumstances will cause such a DELETE statement to consistently fail.
EDIT 2: Abbreviated answer for future readers: this is not a problem. By default, postgres checks the constraints at the end of each statement (not per-record, not per-transaction). Confirmed in the docs here: http://www.postgresql.org/docs/current/static/sql-set-constraints.html And by the SQLFiddle here: http://sqlfiddle.com/#!15/11b8d/1
In standard SQL, and I believe PostgreSQL follows this, each statement should be processed "as if" all changes occur at the same time, in parallel.
So the following code works:
CREATE TABLE T (ID1 int not null primary key,ID2 int not null references T(ID1));
INSERT INTO T(ID1,ID2) VALUES (1,2),(2,1),(3,3);
DELETE FROM T WHERE ID2 in (1,2);
Where we've got circular references involved in both the INSERT and the DELETE, and yet it works just fine.
fiddle
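For the regression test the question asks about, the contrast between one set-based DELETE and row-at-a-time DELETEs can be sketched like this. The sketch runs on SQLite through Python's sqlite3, which also tests immediate foreign keys at the end of each statement; it is a stand-in, so the behaviour should be confirmed against PostgreSQL itself. The function names are assumptions.

```python
import sqlite3


def make_table():
    """Build the self-referencing table from the example above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE T (ID1 int not null primary key, "
        "ID2 int not null references T(ID1))"
    )
    conn.execute("INSERT INTO T(ID1,ID2) VALUES (1,2),(2,1),(3,3)")
    conn.commit()
    return conn


def delete_all_at_once(conn):
    """One statement removes the whole cycle; return the rows left."""
    conn.execute("DELETE FROM T WHERE ID2 IN (1, 2)")
    return conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]


def delete_one_by_one(conn):
    """Separate statements: the first leaves a dangling reference."""
    try:
        conn.execute("DELETE FROM T WHERE ID1 = 1")  # row 2 still points at 1
        conn.execute("DELETE FROM T WHERE ID1 = 2")
        return True
    except sqlite3.IntegrityError:
        return False


remaining = delete_all_at_once(make_table())
stepwise_ok = delete_one_by_one(make_table())
```

A test that asserts the stepwise variant fails documents why the set-based DELETE was written that way.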
A single DELETE with a WHERE clause matching a set of records will delete those records in an implementation-defined order. This order may change based on query planner decisions, statistics, etc. No ordering guarantees are made. Just like SELECT without ORDER BY. The DELETE executes in its own transaction if not wrapped in an explicit transaction, so it'll succeed or fail as a unit.
To force order of deletion in PostgreSQL you must do one DELETE per record. You can wrap them in an explicit transaction to reduce the overhead of doing this and to make sure they all happen or none happen.
PostgreSQL can check foreign keys at three different points:
The default, NOT DEFERRABLE: checks at the end of each statement, once all of its row changes have been applied
DEFERRABLE INITIALLY IMMEDIATE: Same, but affected by SET CONSTRAINTS DEFERRED to instead check at end of transaction / SET CONSTRAINTS IMMEDIATE
DEFERRABLE INITIALLY DEFERRED: checks all rows at the end of the transaction
In your case, if you delete the set with several statements, I'd define your FOREIGN KEY constraint as DEFERRABLE INITIALLY IMMEDIATE, and do a SET CONSTRAINTS DEFERRED before deleting.
(Despite the name IMMEDIATE, the check runs at the end of the statement rather than after each row change, as the documentation for SET CONSTRAINTS describes. So if you delete the whole set in a single DELETE the checks will succeed without deferring anything.)
(The mildly insane meaning of DEFERRABLE is IIRC defined by the SQL standard, along with gems like a TIMESTAMP WITH TIME ZONE that doesn't have a time zone).
If you issue a single DELETE that affects multiple records (like delete from x where id>100), that will be handled as a single transaction and either all will succeed or fail. If multiple DELETEs, you have to put them in a transaction yourself.
There will be problems. If you have a constraint with DELETE CASCADE, you might delete more than you want with a single DELETE. If you don't, the integrity check might stop you from deleting. Constraints other than NO ACTION are not deferrable, so you'd have to disable the constraint before delete and enable it afterwards (basically drop/create, which might be slow).
If you have multiple DELETEs, then the order is as the DELETE statements are sent. If a single DELETE, the database will delete in the order it happens to find them (index, oids, something else...).
So I would also suggest thinking about the logic and maybe handling the deletes differently. Can you elaborate more on the actual logic? A tree in database?
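1) It will run as a single transaction if enclosed within "BEGIN/COMMIT". Otherwise each statement runs in its own implicit transaction, so a single DELETE still succeeds or fails as a whole.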
For more see http://www.postgresql.org/docs/current/static/tutorial-transactions.html
The answer in general to your question depends on how is self-referencing implemented.
If it is within application logic, it is solely your responsibility to check the things yourself.
Otherwise, it is in general possible to restrict or cascade deletes for rows with foreign keys and DELETE CASCADE . However, as far as PG docs go, I understand we are talking about referencing columns in other tables, not sure if same-table foreign keys are supported:
http://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
2) In general, the order of deletion will be the order in which you issue delete statements. If you want them all to be "uninterruptible" with no other statements modifying table in between, you enclose them in a transaction.
As a warning, I may be wrong, but what you seem to be trying to do, must not be done. You should not have to rely on some esoteric "order of deletion" or some other undocumented and/or implicit features of database. The underlying logic does not seem sound, there should be another way.

Best practices for referencing natural and/or surrogate key values in code

I'm modifying some stored procedures that manage status changes when records are updated.
For example, if I have these two tables
Request(RequestID, StatusID)
Status(StatusID, StatusName)
I'm trying to determine the best way to handle calling out the statuses in code.
Do I use StatusID or StatusName?
It's not guaranteed that StatusID will match between environments (DEV, PRE, PROD, etc).
Also, StatusName could be changed and I wouldn't want to have to alter code because I needed to change a StatusName.
I could create a 2nd unique column, which would sort of closely resemble StatusID.
I'd make sure this column was matched between regions, but that doesn't seem that clean either and sort of repetitive.
Can anyone suggest a cleaner, simpler way?
The difficulty of matching code to data can only partially be handled with a second column. When someone adds an item, what does this mean? If they re-use a known constant, what does it mean if you don't require this column to be unique?
Often times we will have user modifiable lookup tables, but they will have to be associated with a number of other flags indicating how to interpret the status - "IsTreatedAsExpired", "IsTreatedAsActive" or perhaps other tables which hold the statuses which are treated as certain things.
I think you really need to figure out the scope of what you want to allow with this table first. Because if you have a LOT of code references, you would be better off using a natural key which is in sync with your code on all installations. A possibility to handle this is to use negative numbers for unmovable codes (identity insert to add new unmovable codes) and then have your sequence only add positive ones. But again, this doesn't address the semantics of how your program would handle or use the user-entered extensions.
Again, it's hard to say without getting the full scope sorted out here.
From the information you've given, StatusID may have different values in different databases, presumably because your keys are generated automatically and are not specified by you. If so then obviously it's impossible to use StatusID consistently in your code anyway (without standardizing the values). Therefore the question becomes "is it acceptable/practical/desirable to hard-code StatusName values in my code?"
The obvious answer is yes, what's the alternative? If you have a certain status that represents 'ready' and you want to reference that in code then you must put something in your code that identifies the status unambiguously.
If you add a second key of some kind (as Carlos suggested) you still have the same basic problem that changing a natural key value is changing the identity of the status and therefore changes the meaning of your code. If you change the 'real' natural key (READY) without changing the second key (RDY) then your code will become more confusing and difficult to maintain.
If you do something more complex like extracting 'constants' or 'configuration parameters' into a configuration file or table or even writing a custom preprocessor to insert key values into your scripts at deployment time, you add lots of complexity for very little gain (unless you have other good reasons for doing it). I've seen this approach used, and it was a huge, unmaintainable mess.
In practice, StatusName is most likely to change because a) someone thinks another name would be 'more accurate' or 'look better', or b) you discover that it doesn't correctly represent your requirements. If you're forced to spend time on a) then just change the display name in your front end or reports and leave the database and code alone. If b) comes up then by definition your current data model and code are inaccurate and must be revised and possibly modified anyway. And when b) does happen, it often results in adding a new code, not changing the existing one (e.g. because someone defined a new process step that there is no existing code for).
And if you are open to changing your development and deployment practices there are other ways to look at this issue too, as others have suggested. Can you make your StatusID values the same everywhere? Technically it's possible, so what are the organizational reasons not to? Can you reduce the probability and impact of StatusName changes through change management and code reviews? Can you improve your requirements process to capture certain information more effectively?
Write a user-defined function that accepts a status name and returns the status id, and call it wherever you refer to the status id:
select * from resources where statusid = dbo.getStatusId('COMPLETED');
This would make sure that resolving the status id always happens within the function that you have defined
As a rule of thumb when you have id,value tables (Status, Result, Area, etc..) I usually add a third field that is the record's mnemonic value and always use that, neither the name nor the id.
Now the mnemonic value is like a business key (well, it is a business key) in the sense that it's a business value and does not depend on the database (for the id) or the way it is displayed (the description) so for example for your status table you may have
StatusID,StatusName,StatusMnemo
1 ,COMPLETED ,COM
2 ,REJECTED ,REJ
and so forth.
And in your queries you always join by statusId but you add a clause to join against the status table by StatusMnemo. This is a value that's independent across environments and remains constant.
Also in inserts, you always use statusid.
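As a rough illustration of the mnemonic approach, the sketch below joins on StatusID but filters on StatusMnemo, so the code never hard-codes an id or a display name. It runs on SQLite through Python's sqlite3; the sample rows, the deliberately arbitrary id values and the helper requests_with_status are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Status (
        StatusID    INTEGER PRIMARY KEY,
        StatusName  TEXT NOT NULL,
        StatusMnemo TEXT NOT NULL UNIQUE  -- stable across environments
    );
    CREATE TABLE Request (
        RequestID INTEGER PRIMARY KEY,
        StatusID  INTEGER NOT NULL REFERENCES Status(StatusID)
    );

    -- The ids are arbitrary on purpose: they may differ per environment.
    INSERT INTO Status VALUES (7, 'COMPLETED', 'COM'), (12, 'REJECTED', 'REJ');
    INSERT INTO Request VALUES (1, 7), (2, 12), (3, 7);
    """
)


def requests_with_status(mnemo):
    """Return the RequestIDs whose status has the given mnemonic."""
    rows = conn.execute(
        """
        SELECT r.RequestID
        FROM Request r
        JOIN Status s ON s.StatusID = r.StatusID
        WHERE s.StatusMnemo = ?
        ORDER BY r.RequestID
        """,
        (mnemo,),
    )
    return [row[0] for row in rows]


completed = requests_with_status("COM")
rejected = requests_with_status("REJ")
```

Renaming 'COMPLETED' for display purposes, or regenerating the ids in another environment, leaves this query untouched as long as the mnemonic stays the same.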
If you have statusID values that need special treatment then they should be the same across environments.
Why would you introduce a statusID that needs special treatment in Prod that has not gone thru Pre and Dev?
What I often do is start the identity at 100 and use that range for generic statuses that don't need special treatment.
Then DEV owns the space under 100 for special treatment using IDENTITY INSERT ON.
When deploying from DEV to PRE, insert any records under 100.

What is the best way to handle this constraint in SQL Server 2005?

I have an SMS-based survey application which takes in a survey domain and an answer.
I've gotten requests for detailed DDL, so.... The database looks like this
SurveyAnswer.Answer must be unique within all active Surveys for that SurveyDomain. In SQL terms, this should always return 0..1 rows:
select * from survey s, surveyanswer sa
where s.surveyid = sa.surveyid and
s.active = 1 and
s.surveydomainid = #surveydomainid and
sa.answer = #answer
I plan on handling this constraint at the application level, but would also like some database integrity to be enforced. What is the best way to do this? Trigger? Possible in a constraint?
As you are covering 2 tables there are AFAIK only 2 ways to enforce this.
Trigger as you suggested.
Indexed view with unique constraint across the 3 columns.
As far as reliability is concerned I would go for the Indexed view but the only downside is that it will be difficult to understand by third parties.
It is possible to add a constraint that is implemented in a UDF like this:
alter table MyTable add constraint complexConstraint
check (dbo.complexConstraintFct()=0)
Where complexConstraintFct would be a function containing a query on other tables. However, this approach has some issues, as check constraints were designed to be evaluated on a single row at a time but updates can affect more than one row at a time.
So, the bottom line is: stick with triggers.
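A trigger for this rule might look roughly like the sketch below. The question's DDL is not shown, so the table and column names are guesses based on the query in the question, and the code uses SQLite through Python's sqlite3 rather than SQL Server 2005 trigger syntax. It only guards INSERTs into SurveyAnswer; activating a survey or editing an answer would need further triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema, reconstructed from the query in the question.
    CREATE TABLE Survey (
        SurveyID       INTEGER PRIMARY KEY,
        SurveyDomainID INTEGER NOT NULL,
        Active         INTEGER NOT NULL
    );
    CREATE TABLE SurveyAnswer (
        SurveyAnswerID INTEGER PRIMARY KEY,
        SurveyID       INTEGER NOT NULL REFERENCES Survey(SurveyID),
        Answer         TEXT    NOT NULL
    );

    CREATE TRIGGER trg_unique_active_answer
    BEFORE INSERT ON SurveyAnswer
    BEGIN
        SELECT RAISE(ABORT, 'answer already used by an active survey in this domain')
        WHERE EXISTS (
            SELECT 1
            FROM Survey new_s
            JOIN Survey s
              ON s.SurveyDomainID = new_s.SurveyDomainID AND s.Active = 1
            JOIN SurveyAnswer sa ON sa.SurveyID = s.SurveyID
            WHERE new_s.SurveyID = NEW.SurveyID
              AND new_s.Active = 1
              AND sa.Answer = NEW.Answer
        );
    END;

    INSERT INTO Survey VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 1);
    """
)


def add_answer(survey_id, answer):
    """Return True if the answer is stored, False if the trigger aborts."""
    try:
        conn.execute(
            "INSERT INTO SurveyAnswer (SurveyID, Answer) VALUES (?, ?)",
            (survey_id, answer),
        )
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    add_answer(1, "A"),  # first use in domain 1
    add_answer(2, "A"),  # another active survey in domain 1: refused
    add_answer(3, "A"),  # inactive survey in domain 1: allowed
    add_answer(4, "A"),  # active survey in domain 2: allowed
]
```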
Assuming you are using stored procedures to perform DML operations, you could add a guard clause to the SP that adds answers to surveys to check for the existence of an equivalent answer. You could then either throw an exception or return a status code to indicate that the answer could not be added.
You can't do it at the row level (eg CHECK constraint) so you have to have something that can view all rows
A trigger can send "nice" messages, but they run after the DML statement. You have fine control over processing.
An indexed view prevents the DML statement, but it gives a technical error message. It's an extra object and indexes to maintain.
I think what you're saying is that for any active question, the tuple (surveyDomain, surveyQuestion, surveyAnswer) must be unique?
Or in other words, survey:surveyanswer is 1:1 if the survey is active, even though survey:surveyanswer is set up to be 1:many.
If so, the answer is to change your table structure. Adding a nullable activeAnswerId column to survey will effectively make the relation 1:1; your existing constraint unique SurveyId (or unique SurveyId, SurveyDomainId) will suffice to enforce uniqueness.
Indeed, unless I'm misunderstanding, I'm surprised that Survey has a Question column; I'd expect Survey:Question to be 1:many (a survey has many questions) or even many:many, if a question can show up on more than one survey.
More generally, I suspect the reason that figuring out how to enforce the constraint is difficult and requires "heroics" like triggers or user defined functions, is a symptom of a schema that doesn't accurately model your problem domain.
OP comments:
no, you're missing it. Survey:Answer is 1:n. "Question" is the survey question – Tuple would be (SurveyDomain.SurveyDomainId, Survey.Answer)
You mean that for every domain, there's at most one answer? Again, looking at your schema, it's misleading at best. A SurveyDomain has many Surveys (each of which has a Question column) and a Survey has many Answers? (Schema)
But if the Survey's active bit is set, there should be only one Answer?
Is Survey a misnomer for Question?
It's really not clear what you're trying to model.
Again, if it's hard to add a constraint, that suggests that your model doesn't work.