When creating/inserting a foreign key relationship in Postgres what steps does the backend perform that ensure the referential integrity of my tables?
How does Postgres know where the relevant foreign keys are in the many tables?
All of my searches just turn up how-to guides and examples, not the nuts and bolts of the backend. If I wrote my own checks when inserting data, would that be the same thing, and as efficient as what Postgres does?
The Postgres documentation is unsatisfying in its description:
"In simplistic database systems this would be implemented (if at all)
by first looking at the cities table to check if a matching record
exists, and then inserting or rejecting the new weather records. This
approach has a number of problems and is very inconvenient, so
PostgreSQL can do this for you."
Edit: Nice book link that I'm guessing will answer my question: The Internals of PostgreSQL
Clarification: I am not intending to write my own checks or triggers; I understand they will not be as good. The question is to glean details and a better understanding of the optimizations involved.
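Edit 2: The closest I have come to seeing the nuts and bolts from the outside is the system catalogs. As far as I can tell, PostgreSQL stores each foreign key in pg_constraint and enforces it with internal constraint triggers (functions like RI_FKey_check_ins) that check the referenced table and lock the matching row. A rough sketch, assuming the weather/cities tables from the quoted documentation example:

    -- Hypothetical setup matching the docs' example (names are assumptions):
    -- CREATE TABLE cities  (name text PRIMARY KEY, location point);
    -- CREATE TABLE weather (city text REFERENCES cities (name), temp_lo int);

    -- The foreign key itself is stored in pg_constraint:
    SELECT conname,
           conrelid::regclass  AS fk_table,
           confrelid::regclass AS pk_table,
           confupdtype, confdeltype          -- ON UPDATE / ON DELETE actions
    FROM pg_constraint
    WHERE contype = 'f' AND conrelid = 'weather'::regclass;

    -- Enforcement is done by internal constraint triggers on both tables:
    SELECT tgrelid::regclass AS on_table,
           tgname,
           tgfoid::regproc    AS trigger_func
    FROM pg_trigger
    WHERE tgconstraint IN (SELECT oid FROM pg_constraint
                           WHERE contype = 'f' AND conrelid = 'weather'::regclass);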
It is of course recommended to use foreign keys in the DB rather than writing your own checks in triggers or backend code. Let me explain the reasons:
You don't have to write any additional code in the backend or in triggers.
If the business logic ever changes, you won't have to make many changes to the code.
If you write the foreign key checks manually, you may introduce bugs or forget a check somewhere; foreign keys in the DB reliably provide this check every time.
During a migration, someone might run a bulk insert on the DB from outside your backend, or someone (or a DB admin) might delete data by mistake; foreign keys in the DB will strictly not allow this.
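A minimal sketch of what that looks like in SQL (the table and column names here are just illustrative):

    -- Parent table: each customer has a primary key.
    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    -- Child table: the REFERENCES clause makes the DB itself reject orphaned
    -- orders, no matter which application, migration script, or admin inserts them.
    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL REFERENCES customers (customer_id),
        placed_at   TIMESTAMP NOT NULL
    );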
Related
Should I build the relationships with the database schema or deal with them programmatically?
For example, when I build a database in MSSQL I can choose not to build the relationships and instead deal with them programmatically, e.g. checking whether a key exists as a primary key in another table and deciding, based on that, whether or not to insert the new row.
Could anyone advise me whether this is good practice or not?
DO make relationships between tables explicit by declaring foreign key constraints.
I do not see any good reason for not doing this. Why are foreign key constraints a good idea?
Foreign key constraints are a simple way to help safeguard data integrity/consistency.
Constraints (not just foreign key ones) can also be seen as a form of "living documentation" (making things explicit and therefore discoverable, without having to guess).
You might still want to validate inserts in code; in that case you can look at foreign key constraints as a "safety net", in case your code fails.
(Regarding the second bullet point above: I have to work with one legacy database which is lacking some foreign key constraints that should by all means have been declared. This means that every time I have to make a change to it, I might inadvertently break an application that makes certain assumptions about the schema that aren't obvious by looking at the schema. Working with this database is very painful and error-prone. If I could change one thing about this database, it would be to add all missing constraints.)
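To make the "safety net" point concrete, here is a hedged sketch (all names invented): even if the application-side validation is skipped or buggy, the constraint stops the bad row.

    CREATE TABLE parent_table (id INT PRIMARY KEY);

    CREATE TABLE child_table (
        id        INT PRIMARY KEY,
        parent_id INT NOT NULL,
        CONSTRAINT fk_child_parent FOREIGN KEY (parent_id) REFERENCES parent_table (id)
    );

    INSERT INTO parent_table (id) VALUES (1);
    INSERT INTO child_table (id, parent_id) VALUES (10, 1);  -- accepted
    INSERT INTO child_table (id, parent_id) VALUES (11, 99); -- rejected with a foreign key violation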
It depends upon your needs.
If you are designing an OLTP application then building the relationships is good, but if you are designing a data warehouse (DWH) or data mart then it is advisable not to establish the relationships in the schema and to handle them in code instead.
As a programmer, adding a reference to an object is pretty safe, but adding a foreign key relationship (I think) is pretty dangerous. By adding a FK relationship, ALL the queries that delete a row from the referenced table have to be updated to properly delete the rows tied to it by the foreign key before actually deleting the row. How do you search for all the queries that delete a row from this referenced table? These queries can lie buried in code and in stored procedures. Is this a real-life example of a maintenance nightmare? Is there a solution to this problem?
You should never design a relational database without foreign keys from the very beginning. That is a guarantee of poor data integrity over time.
You can add the code and use cascade delete as others have suggested, but that too is often the wrong answer. There are times when you genuinely want the delete stopped because you have child records. For instance, suppose you have customers and orders. If you delete a customer who has an order, then you lose the financial record of the order, which is a disaster. Instead you would want the application to get an error saying an order exists for this customer. Further, cascade delete could suddenly have you deleting millions of child records, locking up your database while a huge transaction happens. It is a dangerous practice that should rarely, if ever, be used in a production database.
Add the FK (if you have the relationships, it is needed) and then search for the code that deletes from that table and adjust it appropriately. Consider whether a soft delete isn't a better option. This is where you mark a record as deleted or inactive, so it no longer shows up as a data-entry option, but you can still see the existing records. Again, you may need to adjust your database code fairly severely to implement this correctly. There is no easy fix for having a database that was badly designed from the start.
The soft delete is also a good choice if you think you will have many child records and actually do want to delete them. This way you can mark the records so they no longer show in the application and use a job that runs during non-peak hours to batch delete records.
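A rough sketch of that soft-delete pattern (table, column, and interval are only illustrative; it assumes an existing orders table, and the batch syntax below is Postgres-flavoured):

    -- Mark rows instead of deleting them.
    ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMP NULL;

    -- A "delete" from the application's point of view:
    UPDATE orders SET deleted_at = CURRENT_TIMESTAMP WHERE order_id = 12345;

    -- Application queries only show live rows:
    SELECT * FROM orders WHERE deleted_at IS NULL;

    -- Off-peak job purges old soft-deleted rows in modest batches:
    DELETE FROM orders
    WHERE order_id IN (
        SELECT order_id FROM orders
        WHERE deleted_at < CURRENT_TIMESTAMP - INTERVAL '90 days'
        LIMIT 5000
    );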
If you are adding a new table and adding an FK, it is certainly easier to deal with because you would create the table before writing any code against it.
Your statement is simply not true. When establishing a foreign key relationship, you can set the cascading property to cascade delete. Once that's done, the child records will be deleted when the parent is deleted, ensuring that no records are orphaned.
If you use a proper ORM solution, configure FK's and PK's correctly, and enable cascading deletes, you shouldn't have any problems.
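For illustration, a cascading foreign key looks roughly like this (the names are hypothetical):

    CREATE TABLE parents (
        parent_id INT PRIMARY KEY
    );

    CREATE TABLE children (
        child_id  INT PRIMARY KEY,
        parent_id INT NOT NULL,
        CONSTRAINT fk_children_parents
            FOREIGN KEY (parent_id) REFERENCES parents (parent_id)
            ON DELETE CASCADE   -- deleting a parent removes its children as well
    );

    -- With the constraint in place, this single statement also deletes the child rows:
    DELETE FROM parents WHERE parent_id = 42;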
I wouldn't say so (to confirm what others mentioned) - that is usually taken care of with cascading deletes, provided you want it that way, or with careful procedures that clean things up behind the scenes.
The bigger the system gets, the more you see of the 'procedures' and the less of the 'automation' (i.e. cascade deletes). For larger setups, DBAs usually prefer to deal with that during the database maintenance phase. Quite often, records are not allowed to be deleted through middleware application code, but are simply marked as 'deleted' or inactive, and dealt with later according to the database routines and procedures in place in the organization (archived, etc.).
And unless you have a very large code base, that's not a huge issue. Also, most of the DB code usually goes through some DAL layer which can be easily traversed. Or you can also query the system tables for all the relationships and 'dependencies'; many routines have been written for that kind of code maintenance (on both sides of the 'fence'). It's not that it's not an 'issue', it's just not much different from normal DB work, and there are worse things than that.
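As a hedged sketch of the "query the system tables" idea, the standard information_schema views expose foreign-key relationships in most engines (exact coverage varies by product):

    -- List every foreign key, the table it lives on, and its update/delete rules.
    SELECT rc.constraint_name,
           kcu.table_name  AS referencing_table,
           kcu.column_name AS referencing_column,
           rc.unique_constraint_name,
           rc.update_rule,
           rc.delete_rule
    FROM information_schema.referential_constraints AS rc
    JOIN information_schema.key_column_usage AS kcu
      ON kcu.constraint_schema = rc.constraint_schema
     AND kcu.constraint_name   = rc.constraint_name;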
So, I wouldn't lose any sleep over that. There are other issues around using 'too many' referential integrity constraints (performance, maintenance), but that is often a very controversial subject among DBAs (and DB professionals in general), so I won't get into that :)
After having worked at various employers I've noticed a trend of "bad" database design with some of these companies - primarily the exclusion of Foreign Keys Constraints. It has always bugged me that these transactional systems didn't have FK's, which would've promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and they're asked to maintain/enhance the system?
I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exist that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.
There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because, if you write the code correctly, you won't have bugs. True in principle, but it simply won't happen in practice. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.
Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is referred to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships, and I would be more willing to accept the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are OK. However, I must emphasize that consistent naming greatly increases my confidence that things are OK.
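To illustrate that first point: a "create table as" copy carries none of the source's constraints, which is exactly what makes it convenient for this style of work (a sketch with invented names; SQL Server would use SELECT ... INTO instead):

    -- Quick working copy of a subset of rows; no PK, FK, or other constraints come along.
    CREATE TABLE customer_snapshot AS
    SELECT id, customer_start_date
    FROM customers
    WHERE customer_start_date >= DATE '2015-01-01';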
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never have to deal with.
As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.
In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when a row is inserted into the child table requires an index seek on the parent table, and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.
Defensive programming without foreign keys works if you primarily use stored procedures and every application uses those stored procedures instead of writing its own queries. Then you can control it quite easily and more flexibly than with standard foreign keys.
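A hedged sketch of what that stored-procedure style of check tends to look like (T-SQL-flavoured; all names invented, and the customers and orders tables are assumed to exist). Note that, unlike a declared foreign key, a plain existence check like this can race with a concurrent delete unless you add explicit locking:

    CREATE PROCEDURE insert_order
        @order_id    INT,
        @customer_id INT
    AS
    BEGIN
        -- Hand-rolled referential check.
        IF NOT EXISTS (SELECT 1 FROM customers WHERE customer_id = @customer_id)
        BEGIN
            RAISERROR('Unknown customer', 16, 1);
            RETURN;
        END;

        INSERT INTO orders (order_id, customer_id)
        VALUES (@order_id, @customer_id);
    END;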
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user id and others have a group id. If you still wanted foreign key constraints, you would have to have two different nullable fields for the same mutually exclusive information. That means adding another constraint saying that either one is allowed to be null but they can't both be null, and that a combination of three fields must be unique instead of a combination of two (user/group id and permission id). The alternative is two separate tables containing the same kind of data, meaning maintaining both tables separately.
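A sketch of that first scheme (all names hypothetical; the users, groups, and privileges tables are assumed to exist, and the CHECK syntax below is Postgres-flavoured - some engines need a CASE expression instead):

    CREATE TABLE permissions (
        permission_id INT PRIMARY KEY,
        user_id       INT NULL REFERENCES users (user_id),
        group_id      INT NULL REFERENCES groups (group_id),
        privilege_id  INT NOT NULL REFERENCES privileges (privilege_id),
        -- exactly one of user_id / group_id must be set
        CONSTRAINT chk_user_xor_group
            CHECK ((user_id IS NULL) <> (group_id IS NULL)),
        -- uniqueness now has to cover three columns instead of two
        CONSTRAINT uq_permission UNIQUE (user_id, group_id, privilege_id)
    );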
But perhaps in that scenario it's best to separate the data. Anywhere you need the same field to connect to different tables based on other data in that record, you cannot use foreign key constraints, and it becomes best to keep the checks in the stored procedures and views instead.
I have an issue I am working with an existing SQL Server 2008 database: I need to occasionally change the primary key value for some existing records in a table. Unfortunately, there are about 30 other tables with foreign key references to this table.
What is the most elegant way to change a primary key and related foreign keys?
I am not in a situation where I can change the existing key structure, so this is not an option. Additionally, as the system is expanded, more tables will be related to this table, so maintainability is very important. I am looking for the most elegant and maintainable solution, and any help is greatly appreciated. I so far have thought about using Stored Procedures or Triggers, but I wanted some advice before heading in the wrong direction.
Thanks!
When you say "I am not in a situation where I can change the existing key structure" are you able to add the ON UPDATE CASCADE option to the foreign keys? That is the easiest way to handle this situation — no programming required.
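For illustration, adding that option to an existing foreign key means dropping and re-creating the constraint (constraint, table, and column names here are invented; repeat for each referencing table):

    ALTER TABLE child_table
        DROP CONSTRAINT fk_child_parent;

    ALTER TABLE child_table
        ADD CONSTRAINT fk_child_parent
        FOREIGN KEY (parent_key) REFERENCES parent_table (parent_key)
        ON UPDATE CASCADE;   -- key changes in parent_table now propagate automatically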
As Larry said, ON UPDATE CASCADE will work; however, it can cause major problems in a production database, and most DBAs are not too thrilled about letting you use it. For instance, suppose you have a customer who changes his company name (and that is the PK) and there are two million related records in various tables. ON UPDATE CASCADE will do all the updates in one transaction, which could lock up your major tables for several hours. This is one reason why it is a very bad idea to have a PK that will need to be changed. A trigger would be just as bad, and if incorrectly written, it could be much worse.
If you do the changes in a stored proc you can put each part in a separate transaction, so at least you aren't locking everything up. You can also update records in batches, so that if you have a million records to update in a table, you can do them in smaller batches which will run faster and hold fewer locks. The best way to do this is to create a new record in the primary table with the new PK, then move the old records over to the new one in batches, and then delete the old record once all related records are moved. If you do this sort of thing, it is best to have audit tables so you can easily revert the data if there is a problem, since you will want to do this in multiple transactions to avoid locking the whole database. Now, this is harder to maintain: you have to remember to add to the proc when you add an FK (but you would have to remember to add ON UPDATE CASCADE as well). On the other hand, if it breaks due to a problem with a new FK, it is an easy fix; you know right away what the problem is and can push a change to prod relatively quickly.
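A rough T-SQL sketch of that batched approach (key values, types, table names, and the batch size are all placeholders; parent_table and child_table are assumed to exist):

    DECLARE @old_key INT = 123;   -- existing primary key value
    DECLARE @new_key INT = 456;   -- desired primary key value
    DECLARE @rows    INT = 1;

    -- 1. Create the parent row under its new key, copying the other columns.
    INSERT INTO parent_table (parent_key, other_col)
    SELECT @new_key, other_col
    FROM parent_table
    WHERE parent_key = @old_key;

    -- 2. Re-point a referencing table in small batches, one transaction per batch,
    --    so no single huge transaction locks the table for hours.
    --    (Repeat this loop for each of the ~30 referencing tables.)
    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION;

        UPDATE TOP (5000) child_table
        SET parent_key = @new_key
        WHERE parent_key = @old_key;

        SET @rows = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END;

    -- 3. Once every referencing table has been moved, remove the old parent row.
    DELETE FROM parent_table WHERE parent_key = @old_key;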
There are no easy solutions to this problem because the basic problem is poor design. You'll have to look over the pros and cons of all solutions (I would throw out the trigger idea as Cascade Update will perform better and be less subject to bugs) and decide what works best in your case. Remember data integrity and performance are critical to enterprise databases and may be more important than maintainability (heresy, I know).
If you have to update your primary key regularly then something is wrong there. :)
I think the simplest way to do it is to add another column and make that the primary key. This would allow you to change the values easily and also relate the foreign keys to it. Besides, I do not understand why you cannot change the existing key structure.
But, as you pointed out in the question (and Larry Lustig commented), you cannot change the existing structure. I am afraid that if it is a column which requires frequent updates, then the use of triggers could affect performance adversely. And you also say that as the system expands, more tables will be related to this table, so maintainability is very important. But a quick fix now will only worsen the problem.
Recently I've asked a question about the best way to go to design a DB schema to support multiple types of users and interactions between them, one of the answers suggested that I use one table for each user type and Distributed Keys.
The thing is, the only databases I actively work with are MySQL and SQLite, and I've always done this kind of work of maintaining the integrity of the DB on the programming side and never directly in the database. Can someone point me to a detailed yet easy to understand guide on foreign keys, references and related subjects?
Thanks in advance!
EDIT: I'm interested specifically in MySQL usage examples and documentation, I've already searched in the MySQL manual but nothing useful comes up.
This isn't MySQL-specific, but there is some good stuff in here
http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/
I don't agree with him about the use of natural keys versus surrogate keys. I have found surrogate keys in general work better for primary keys, but if you have a natural key you should put a unique index on it to prevent duplication of data. Pay particular attention to the sections on:
- Not using SQL facilities to protect data integrity
- Trying to code generic T-SQL objects
- One table to hold all domain values
Another good starting place is:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx [dead link Feb 17, 2015]
Try these:
http://en.wikipedia.org/wiki/Relational_database (has links to articles on Constraints, Foreign keys, Stored procedures, Indices, etc.)
http://en.wikipedia.org/wiki/Database
Try this one: Relational Database Design Basics or the Wiki. Give this a read too.
Specifically related to MySQL:
Referential Integrity in MySQL
Foreign Keys and Referential Integrity
Also this stackoverflow question: MYSQL and RDBMS
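Since you asked for MySQL-specific examples, here is a minimal hedged sketch (table names invented). The key points are that the tables must use a storage engine that actually enforces foreign keys (InnoDB; MyISAM parses the clause but ignores it) and that the referencing column needs an index (InnoDB creates one for you if it is missing):

    CREATE TABLE authors (
        author_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name      VARCHAR(100) NOT NULL,
        PRIMARY KEY (author_id)
    ) ENGINE=InnoDB;

    CREATE TABLE books (
        book_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
        author_id INT UNSIGNED NOT NULL,
        title     VARCHAR(200) NOT NULL,
        PRIMARY KEY (book_id),
        CONSTRAINT fk_books_authors
            FOREIGN KEY (author_id) REFERENCES authors (author_id)
            ON DELETE RESTRICT
            ON UPDATE CASCADE
    ) ENGINE=InnoDB;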
If you like to read books, try Beginning Database Design: From Novice to Professional by Clare Churcher. You can take a look at it on Google Books.
Hugh Darwen has made his course on Relational Algebra/Database Technology publicly and freely available. Search for "An Introduction to Relational Database Theory" on http://www.thethirdmanifesto.com
It's introductory, so nothing "advanced", but at least you won't be told anything that is an outright violation of the theory.