As a programmer, adding a reference to an object is pretty safe, but adding a foreign key relationship (I think) is pretty dangerous. By adding a FK relationship, ALL the queries that delete a row from this foreign table have to be updated to properly delete the referencing rows tied to that row before actually deleting the row. How do you search for all the queries that delete a row from this foreign table? These queries can lie buried in code and in stored procedures. Is this a real-life example of a maintenance nightmare? Is there a solution to this problem?
You should never design a relational database without foreign keys from the very beginning. That is a guarantee of poor data integrity over time.
You can add the code and use cascade delete as others have suggested, but that too is often the wrong answer. There are times when you genuinely want the delete stopped because you have child records. For instance, suppose you have customers and orders. If you delete a customer who has an order, then you lose the financial record of the order, which is a disaster. Instead you would want the application to get an error saying an order exists for this customer. Further, cascade delete could suddenly get you into deleting millions of child records, thus locking up your database while a huge transaction happens. It is a dangerous practice that should rarely, if ever, be used in a production database.
Add the FK (if you have the relationships, it is needed) and then search for the code that deletes from that table and adjust it appropriately. Consider whether a soft delete isn't a better option. This is where you mark a record as deleted or inactive, so it no longer shows up as a data entry option, but you can still see the existing records. Again, you may need to adjust your database code fairly severely to implement this correctly. There is no easy fix for having a database that was badly designed from the start.
The soft delete is also a good choice if you think you will have many child records and actually do want to delete them. This way you can mark the records so they no longer show in the application and use a job that runs during non-peak hours to batch delete records.
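To make the soft delete concrete, here is a minimal sketch in T-SQL-flavored SQL; the table, column, and batch-size values are made up for illustration:

-- Add a flag so rows are hidden rather than physically deleted (hypothetical Orders table)
ALTER TABLE Orders ADD IsDeleted bit NOT NULL DEFAULT 0;

-- The application "deletes" by flipping the flag
UPDATE Orders SET IsDeleted = 1 WHERE OrderID = @OrderID;  -- @OrderID supplied by the caller

-- Off-peak cleanup job physically removes flagged rows in small batches
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM Orders WHERE IsDeleted = 1;
    IF @@ROWCOUNT = 0 BREAK;
END;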
If you are adding a new table and adding an FK, it is certainly easier to deal with because you would create the table before writing any code against it.
Your statement is simply not true. When establishing a foreign key relationship, you can set the cascading property to cascade delete. Once that's done, the child records will be deleted when the parent is deleted, ensuring that no records are orphaned.
If you use a proper ORM solution, configure FK's and PK's correctly, and enable cascading deletes, you shouldn't have any problems.
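For reference, the cascading option is declared on the foreign key itself; a minimal sketch in plain SQL, with invented table names and assuming a Customers table already exists:

-- Child table with a cascading foreign key to its parent
CREATE TABLE Orders (
    OrderID    int PRIMARY KEY,
    CustomerID int NOT NULL,
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerID)
        REFERENCES Customers (CustomerID)
        ON DELETE CASCADE
);

-- Deleting a parent row now removes its child rows as well
DELETE FROM Customers WHERE CustomerID = 42;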
I wouldn't say so (to confirm what others mentioned) - that is usually taken care of with cascading deletes, provided you want it that way - or with careful procedures that clean up behind the scenes.
The bigger the system gets, the more of the 'procedures' and the less of the 'automation' (i.e. cascade deletes) you see. For larger setups, DBAs usually prefer to deal with that during the database maintenance phase. Quite often, records are not allowed to be deleted through the middleware application code - they are simply marked as 'deleted' or inactive - and dealt with later on according to the database routines and procedures in place in the organization (archived, etc.).
And unless you have a very large code base, that's not a huge issue. Also, usually most of the Db code goes through some DAL layer which can be easily traversed. You can also query the system tables for all the relationships and 'dependencies' (see the sketch below); many routines have been written for that kind of code maintenance (on both sides of the 'fence'). It's not that it isn't an 'issue', it's just nothing much different from normal Db work - and there are worse things than that.
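As an example of querying the system tables, in SQL Server something like the following lists every foreign key together with the referenced and referencing tables (the column aliases are my own):

SELECT fk.name AS constraint_name,
       tref.name AS referenced_table,
       tchild.name AS referencing_table
FROM sys.foreign_keys AS fk
JOIN sys.tables AS tref ON fk.referenced_object_id = tref.object_id
JOIN sys.tables AS tchild ON fk.parent_object_id = tchild.object_id
ORDER BY tref.name, tchild.name;

Other engines expose the same information through INFORMATION_SCHEMA views.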
So, I wouldn't lose sleep over that. There are other issues around using 'too much' referential integrity constraints (performance, maintenance) - but that is often a very controversial issue among DBAs (and Db professionals in general), so I won't get into that :)
I need to use one Access (2007) database in 2 offline locations and then get all the data back into one database. Some advised me to use SharePoint, but after some trial and frustration I wonder if it's really the best way.
Is it possible to manage this in an automated way, with update queries or so?
I have 26 tables, but only 14 need to be updated frequently. I use autonumber to create the parent key and use cascade updating for the linked tables.
If your data can handle it, it's probably better to use a more natural key for the tables that require frequent updating. I.e. ideally you can uniquely identify a record by some combination of the columns in that record. Autonumbers in two databases can, and very likely will, step on each other; then, when you do merge, any records based on an old autonumber need to be mapped properly. That can be done but is kind of a pain. It'd be nicer to avoid it all from the start.
As for using SharePoint (I assume the suggestion is to replace your tables with lists, not to just put your accdb on SP), it has a lot of limitations in terms of the kinds of indices that can be created and relationships you can establish. Maybe your data are simple enough to live with this. I have yet to be able to justify the move.
Ultimately, the answer to your question is YES, it is possible to manage the synchronization with insert/update queries and very likely some VBA (possibly lots, depending on how complicated your table hierarchy is). You'll need to be vigilant about two people updating a single record, and you'll need to come up with some means to resolve the conflict.
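To sketch what those insert/update queries tend to look like per table (names are entirely made up, written in Access-style SQL, and real conflict resolution would need something like a last-modified column rather than blindly overwriting):

-- Append rows that exist only in the branch copy
INSERT INTO Customers_Main (CustomerKey, CustomerName, City)
SELECT CustomerKey, CustomerName, City
FROM Customers_Branch
WHERE NOT EXISTS (SELECT 1 FROM Customers_Main
                  WHERE Customers_Main.CustomerKey = Customers_Branch.CustomerKey);

-- Push changed values for rows that exist in both copies
UPDATE Customers_Main INNER JOIN Customers_Branch
    ON Customers_Main.CustomerKey = Customers_Branch.CustomerKey
SET Customers_Main.CustomerName = Customers_Branch.CustomerName,
    Customers_Main.City = Customers_Branch.City;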
After having worked at various employers I've noticed a trend of "bad" database design with some of these companies - primarily the exclusion of foreign key constraints. It has always bugged me that these transactional systems didn't have FKs, which would've promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and they're asked to maintain/enhance the system?
I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exist that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.
There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. While true in theory, in practice it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.
Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is referred to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships. And then I am more willing to accept the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never have to deal with.
As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.
In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when a row is inserted into the child table requires an index seek on the parent table, and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.
Defensive programming without foreign keys works if you primarily use stored procedures and every application uses those stored procedures instead of writing its own queries. Then you can control it quite easily and more flexibly than with standard foreign keys.
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user id and others have a group id. If you still wanted foreign key constraints, you would have to have two different fields for the same mutually exclusive information and allow them to be null. That means adding another constraint saying that one of them is allowed to be null but they can't both be null, as well as requiring a combination of 3 fields to be unique instead of a combination of 2 fields (user/group id and permission id). And the alternative is two separate tables containing the same data, meaning maintaining both tables separately.
But perhaps in that scenario, it's best to separate the data. Anywhere you need the same field to connect to different tables based on other data in that record, you cannot use foreign key constraints, and it becomes best to keep the constraints in the stored procedures and views instead.
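A hedged sketch of the nullable-two-column variant described above, with the extra constraints spelled out (all names are hypothetical, it assumes PermissionTypes, Users, and Groups tables exist, and note that how UNIQUE treats NULLs varies between engines):

CREATE TABLE Permissions (
    PermissionAssignmentID int PRIMARY KEY,
    PermissionID int NOT NULL REFERENCES PermissionTypes (PermissionID),
    UserID  int NULL REFERENCES Users (UserID),    -- filled in for per-user permissions
    GroupID int NULL REFERENCES Groups (GroupID),  -- filled in for per-group permissions
    -- exactly one of UserID / GroupID must be present
    CONSTRAINT CK_Permissions_OneOwner CHECK (
        (UserID IS NOT NULL AND GroupID IS NULL) OR
        (UserID IS NULL AND GroupID IS NOT NULL)
    ),
    -- uniqueness now has to span all three columns instead of two
    CONSTRAINT UQ_Permissions UNIQUE (PermissionID, UserID, GroupID)
);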
I have a web application with several entities (tables). Each one has its own CRUD pages.
I'd like to add, for some of them, the ability to add comments and attach files.
I was thinking of two scenarios.
One table for all comments/files - the table would have some id for the entity and the particular record.
For each entity a separate comments/files table.
The files would be stored on disk in a directory. The table would hold the name of the file and some additional info.
In terms of application design, having one single table for all comments seems to make sense. In terms of application code, that means the same SQL will be reused for all entities. It's the 'classical way' used by most applications, extending to having the same active records and controllers handle comments and attachments for all objects.
In terms of SQL, the second solution could be useful in some databases like MySQL to get more benefit from the memory cache. Every comment/attachment added in the 1st solution would drop from the memory cache all queries touching the comment table. With individual tables, a comment on one entity would not invalidate queries on other entities. But you would also require more file descriptors and a bigger table cache... so to choose this solution you would need a decision based on a real-life, precise case, where you would be able to compare the benefits in database access speed. And when you add new entities you'll certainly find your each-entity-has-a-comment-table solution tedious; things could have been automated by using the 1st solution.
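For what it's worth, the shared table from the 1st solution usually ends up looking roughly like this (column names invented for the example); note that no real foreign key can be declared on the (entity_type, entity_id) pair, which is part of the tradeoff:

-- One comments table shared by all entities; the pair
-- (entity_type, entity_id) points back at the owning record
CREATE TABLE comments (
    comment_id  INT PRIMARY KEY,
    entity_type VARCHAR(50)  NOT NULL,  -- e.g. 'customer', 'invoice'
    entity_id   INT          NOT NULL,  -- id of the row in that entity's own table
    body        TEXT         NOT NULL,
    file_name   VARCHAR(255) NULL,      -- attachment stored on disk, as described above
    created_at  TIMESTAMP    NOT NULL
);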
It's a tradeoff. With a single comments table, you get a simple, DRY (don't repeat yourself) schema, but you don't get foreign key constraints and thus no cascade deletion. Thus, if you delete an entity with comments, you must also remember to delete the comments!
If you go with multiple comment tables, you get FK constraints and cascade deletion, but you have a "wet" schema (you are repeating yourself). For example, each comment table might have a commentbody column. If you change that column definition, you have to change it in every comment table!
One interesting solution for a DRY-er schema could involve table inheritance (see http://www.postgresql.org/docs/9.0/interactive/ddl-inherit.html) but please read section 5.8.1. Caveats, as there are some "gotchas" regarding indexing, at least in postgres.
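A small sketch of what that can look like in Postgres (table names invented; remember from the caveats that indexes, primary keys and foreign keys defined on the parent are not inherited by the children):

-- Shared definition of a comment
CREATE TABLE comment (
    id         serial PRIMARY KEY,
    body       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Per-entity comment tables inherit all columns from "comment"
CREATE TABLE product_comment (
    product_id integer NOT NULL
) INHERITS (comment);

CREATE TABLE order_comment (
    order_id integer NOT NULL
) INHERITS (comment);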
Either way, kudos to you for thinking carefully about your database design!
I have an issue I am working with an existing SQL Server 2008 database: I need to occasionally change the primary key value for some existing records in a table. Unfortunately, there are about 30 other tables with foreign key references to this table.
What is the most elegant way to change a primary key and related foreign keys?
I am not in a situation where I can change the existing key structure, so this is not an option. Additionally, as the system is expanded, more tables will be related to this table, so maintainability is very important. I am looking for the most elegant and maintainable solution, and any help is greatly appreciated. I so far have thought about using Stored Procedures or Triggers, but I wanted some advice before heading in the wrong direction.
Thanks!
When you say "I am not in a situation where I can change the existing key structure" are you able to add the ON UPDATE CASCADE option to the foreign keys? That is the easiest way to handle this situation — no programming required.
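In case it helps, that option is set on each foreign key; a sketch with made-up names (each of the ~30 referencing tables would need the same treatment):

-- Recreate the FK so that a key change in the parent ripples to this child table
ALTER TABLE Orders
    DROP CONSTRAINT FK_Orders_Customers;

ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerKey)
    REFERENCES Customers (CustomerKey)
    ON UPDATE CASCADE;

-- A single update of the parent key now updates the referencing rows automatically
UPDATE Customers SET CustomerKey = 'NEWKEY' WHERE CustomerKey = 'OLDKEY';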
As Larry said, On Update Cascade will work; however, it can cause major problems in a production database and most DBAs are not too thrilled with letting you use it. For instance, suppose you have a customer who changes his company name (and that is the PK) and there are two million related records in various tables. On Update Cascade will do all the updates in one transaction, which could lock up your major tables for several hours. This is one reason why it is a very bad idea to have a PK that will need to be changed. A trigger would be just as bad, and if incorrectly written, it could be much worse.
If you do the changes in a stored proc you can put each part in a separate transaction, so at least you aren't locking everything up. You can also update records in batches so that if you have a million records to update in a table, you can do them in smaller batches which will run faster and have fewer locks. The best way to do this is to create a new record in the primary table with the new PK, then move the old related records over to the new one in batches, and then delete the old record once all related records are moved. If you do this sort of thing, it is best to have audit tables so you can easily revert the data if there is a problem, since you will want to do this in multiple transactions to avoid locking the whole database. Now this is harder to maintain; you have to remember to add to the proc when you add an FK (but you would have to remember to add ON UPDATE CASCADE as well). On the other hand, if it breaks due to a problem with a new FK, it is an easy fix: you know right away what the problem is and can push a change to prod relatively quickly.
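A rough T-SQL sketch of that batched approach (all names and the batch size are illustrative; in practice this would live in a stored procedure with auditing and error handling, and step 2 would be repeated for every referencing table):

DECLARE @OldKey int = 100, @NewKey int = 200;

-- 1. Create the replacement parent row under the new key
INSERT INTO Customers (CustomerKey, CustomerName)
SELECT @NewKey, CustomerName
FROM Customers
WHERE CustomerKey = @OldKey;

-- 2. Re-point child rows in small batches (each statement commits on its own)
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) Orders
    SET CustomerKey = @NewKey
    WHERE CustomerKey = @OldKey;

    IF @@ROWCOUNT = 0 BREAK;
END;

-- 3. Remove the old parent row once nothing references it any more
DELETE FROM Customers WHERE CustomerKey = @OldKey;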
There are no easy solutions to this problem because the basic problem is poor design. You'll have to look over the pros and cons of all solutions (I would throw out the trigger idea as Cascade Update will perform better and be less subject to bugs) and decide what works best in your case. Remember data integrity and performance are critical to enterprise databases and may be more important than maintainability (heresy, I know).
If you have to update your primary key regularly then something is wrong there. :)
I think the simplest way to do it is to add another column and make it the primary key. This would allow you to change the values easily and also relate the foreign keys to it. Besides, I do not understand why you cannot change the existing key structure.
But, as you pointed out in the question (and Larry Lustig commented), you cannot change the existing structure. I am afraid that if it is a column which requires frequent updates, then the use of triggers could affect performance adversely. And you also say that as the system expands, more tables will be related to this table, so maintainability is very important. A quick fix now will only worsen the problem.
The lead developer on a project I'm involved in says it's bad practice to rely on cascades to delete related rows.
I don't see how this is bad, but I would like to know your thoughts on if/why it is.
I'll preface this by saying that I rarely delete rows period. Generally most data you want to keep. You simply mark it as deleted so it won't be shown to users (ie to them it appears deleted). Of course it depends on the data and for some things (eg shopping cart contents) actually deleting the records when the user empties his or her cart is fine.
I can only assume that the issue here is you may unintentionally delete records you don't actually want to delete. Referential integrity should prevent this however. So I can't really see a reason against this other than the case for being explicit.
I would say that you follow the principle of least surprise.
Cascading deletes should not cause unexpected loss of data. If a delete requires related records to be deleted, and the user needs to know that those records are going to go away, then cascading deletes should not be used. Instead, the user should be required to explicitly delete the related records, or be provided a notification.
On the other hand, if the table relates to another table that is temporary in nature, or that contains records that will never be needed once the parent entity is gone, then cascading deletes may be OK.
That said, I prefer to state my intentions explicitly by deleting the related records in code, rather than relying on cascading deletes. In fact, I've never actually used a cascading delete to implicitly delete related records. Also, I use soft deletion a lot, as described by cletus.
I never use cascading deletes. Why? Because it is too easy to make a mistake. It is much safer to require client applications to explicitly delete (and meet the conditions for deletion, such as deleting FK-referenced records first).
In fact, deletions per se can be avoided by marking records as deleted or moving into archival/history tables.
In the case of marking records as deleted, it depends on the relative proportion of data marked as deleted, since SELECTs will have to filter on 'isDeleted = false'; an index will only be used if less than 10% (approximately, depending on the RDBMS) of records are marked as deleted.
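Where the RDBMS supports it, a filtered/partial index can soften that limitation by indexing only the live rows; a sketch in SQL Server syntax with invented names (Postgres partial indexes are the equivalent):

-- Index covers only rows the application actually queries
CREATE NONCLUSTERED INDEX IX_Orders_Active
    ON Orders (CustomerID)
    WHERE IsDeleted = 0;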
Which of these 2 scenarios would you prefer:
Developer comes to you, says "Hey, this delete won't work". You both look into it and find that he was accidentally trying to delete the entire table contents. You both have a laugh, and go back to what you were doing.
Developer comes to you, and sheepishly asks "Do we have backups?"
There's another great reason to not use cascading UPDATES or DELETES: they hold a serializable lock. Holding a serializable lock can kill performance.
Another huge reason to avoid cascading deletes is performance. They seem like a good idea until you need to delete 10,000 records from the main table which in turn have millions of records in child tables. Given the size of this delete, it is likely to completely lock down all of the tables for hours, maybe even days. Why would you ever risk this? For the convenience of spending ten minutes less time writing the extra delete statements for one-record deletes?
Further, the error you get when you try to delete a record that has a child record is often a good thing. It tells you that you don't want to delete this record because there is data that you need that you would lose if you did so. Cascade delete would just go ahead and delete the child records, resulting in loss of information about orders, for instance, if you deleted a customer who had orders in the past. This sort of thing can thoroughly mess up your financial records.
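For comparison, the 'extra delete statements' alternative mentioned above is just child-first deletes inside one transaction; a hedged sketch in T-SQL with a hypothetical Customers/Orders/OrderItems hierarchy:

DECLARE @CustomerID int = 42;

BEGIN TRANSACTION;

-- Delete grandchildren, then children, then the parent row itself
DELETE FROM OrderItems
WHERE OrderID IN (SELECT OrderID FROM Orders WHERE CustomerID = @CustomerID);

DELETE FROM Orders
WHERE CustomerID = @CustomerID;

DELETE FROM Customers
WHERE CustomerID = @CustomerID;

COMMIT TRANSACTION;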
I was likewise told that cascading deletes were bad practice... and thus never used them until I came across a client who used them. I really didn't know why I was not supposed to use them but thought they were very convenient in not having to code out deleting all the FK records as well.
Thus I decided to research why they were so "bad" and from what I've found so far there doesn't appear to be anything problematic about them. In fact the only good argument I've seen so far is what HLGLEM stated above about performance. But as I am usually not deleting this number of records, I think in most cases using them should be fine. I would like to hear any other arguments others may have against using them to make sure I've considered all options.
I'd add that ON DELETE CASCADE makes it difficult to maintain a copy of the data in a data warehouse using binlog replication which is how most commercial ETL tools work. Explicit deletion from each table maintains a full log record and is much easier on the data team :)
I actually agree with most of the answers here, YET not all scenarios are the same; it depends on the situation at hand and what the entropy of that decision would be. For example:
If you have a deletion command for an entity that has multiple many-to-many/belongs-to relationships with a large number of entities, each time you call that deletion process you would also need to remember to delete all the corresponding FKs from each relational pivot that the entity has relationships with.
Whereas with a cascade on delete, you write that once as part of your schema and it will ONLY delete those corresponding FKs and clean up the pivots for relations that are no longer necessary. Imagine 24 relations for an entity, plus other entities that would also have a large number of relations on top of that. Again, it really depends on your setup and what YOU feel comfortable with. In any case, just FYI, in an Illuminate migration schema file you would write it as such:
$table->dropForeign(['permission_id']);
$table->foreign('permission_id')
->references('id')
->on('permission')
->onDelete('cascade');