Are foreign key constraints needed? [closed] - sql

In a perfect world, are foreign key constraints ever really needed?

Foreign keys enforce consistency in an RDBMS. That is, no child row can ever reference a non-existent parent.
There's a school of thought that consistency rules should be enforced by application code, but this is both inefficient and error-prone. Even if your code is perfect and bug-free and never introduces a broken reference, how can you be certain that everyone else's code that accesses the same database is also perfect?
When constraints are enforced within the RDBMS, you can rely on consistency. In other words, the database never allows a change to be committed that breaks references.
When constraints are enforced by application code, you can never be quite sure that no errors have been introduced in the database. You find yourself running frequent SQL scripts to catch broken references and correct them. The extra code you have to write to do this far exceeds any performance cost of the RDBMS managing consistency.

In addition to protecting the integrity of your data, FK constraints also help document the relationships between your tables within the database itself.
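As a minimal sketch (the table and column names here are invented for illustration), this is all it takes to let the database itself refuse broken references:

-- Hypothetical parent/child tables; any mainstream RDBMS accepts this form.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);

-- Fails with a foreign key violation: there is no customer 999,
-- so no application code ever gets the chance to create the orphan.
INSERT INTO orders (order_id, customer_id) VALUES (1, 999);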

The world is not perfect; that's why they are needed.

A world cannot be perfect without foreign keys.

Yes, if you want to ensure referential integrity.

In addition to consistency enforcement and documentation, they can actually speed up queries. The query optimizer can see a foreign key constraint, understand its effect, and make a plan optimization that would be impossible without the constraint in place. See Foreign Key Constraints (Without NOCHECK) Boost Performance and Data Integrity. (SQL Server specific)
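As a rough illustration with made-up tables: when the foreign key below is trusted (created without NOCHECK), SQL Server knows every order has a matching customer, so a query that selects only order columns can have the join to Customers removed from the plan.

CREATE TABLE Customers (
    CustomerId INT NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY (CustomerId)
);

CREATE TABLE Orders (
    OrderId    INT NOT NULL,
    CustomerId INT NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderId),
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerId) REFERENCES Customers (CustomerId)
);

-- Because the trusted FK guarantees the join cannot add or remove rows,
-- the optimizer may skip reading Customers entirely (join elimination).
SELECT o.OrderId
FROM Orders AS o
JOIN Customers AS c ON c.CustomerId = o.CustomerId;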

In addition to the documentation effect Dave mentioned, FK constraints can help you write less code and automate some bits.
If you, for example, delete a customer record, all of its invoices and invoice lines are also deleted automatically if you have "ON DELETE CASCADE" on their FK constraints.
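A small sketch of that behaviour (table names invented to match the example):

CREATE TABLE customers (
    customer_id INT PRIMARY KEY
);

CREATE TABLE invoices (
    invoice_id  INT PRIMARY KEY,
    customer_id INT NOT NULL,
    CONSTRAINT fk_invoices_customer
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
        ON DELETE CASCADE
);

CREATE TABLE invoice_lines (
    line_id    INT PRIMARY KEY,
    invoice_id INT NOT NULL,
    CONSTRAINT fk_lines_invoice
        FOREIGN KEY (invoice_id) REFERENCES invoices (invoice_id)
        ON DELETE CASCADE
);

-- Deleting the customer cascades to its invoices, and each deleted invoice
-- cascades to its lines; no application code is involved.
DELETE FROM customers WHERE customer_id = 42;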

Related

Should I have parent entities for tables that share many attributes? [closed]

I am trying to design the schema for a live-action game application that supports multiple games. I have several tables that share multiple attributes such as: Assassins_Participants, Zombies_Participants, Traitors_Participants and Group_Messages, User_Messages, Game_Messages.
Should I use some sort of inheritance (i.e. create Participants and Messages tables) or should I leave it as is? If I should create parent tables, how should I go about it?
Also, any other critiques on my schema are welcome! I want to catch mistakes while I am early in the process. The link below is the current schema for my database.
Previous Design
Updated Design
Got a bit long for comments, so here's an 'answer'. Composite keys aren't a bad thing. (But I don't use them.) The benefit of a unique synthetic key (identity column or UUID) is that it's stable. You add it once, leave it alone, and never have to update it. Like the old saying goes, "smart numbers aren't." But one problem with synthetic keys is that they can obscure problems with the "real" key in the data. Say that you need uniqueness on three fields, one or more of which might change. Okay, that's a good place for a unique synthetic key, as long as you still enforce the uniqueness on the three fields. Postgres is great at this.
A synthetic PK is an implementation convenience; it's less important than your real-world rule. If that wasn't clear, the point is that if, say, three fields must be unique, that needs to be checked. The uniqueness here is based on the real world, as you've modeled it. Put another way, you can bolt a synthetic number/UUID onto the row, and voila! It's unique! But not in a useful way. So, use the synthetic PK, but add a unique index on the composite. This way, if any of the combined values change and violate your uniqueness rule, the engine blocks the insert/update. But you don't have to get into the messy business of reworking a PK which may be used elsewhere as an FK. For some docs, see:
https://www.postgresql.org/docs/current/index-unique-checks.html
https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS
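For example (a rough Postgres sketch; the table and column names are invented), the synthetic key and the real-world uniqueness rule live side by side:

CREATE TABLE participant (
    participant_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- stable synthetic key
    game_id        bigint NOT NULL,
    user_id        bigint NOT NULL,
    role           text   NOT NULL,
    -- the "real" rule: the same user can hold a given role in a game only once
    CONSTRAINT participant_game_user_role_uq UNIQUE (game_id, user_id, role)
);

-- Other tables reference the stable synthetic key, so it never has to change:
CREATE TABLE kill_log (
    kill_id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    participant_id bigint NOT NULL REFERENCES participant (participant_id)
);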
On the question “should I have several *_participant tables or not?”:
The big advantage of having a single table is that you can have foreign key relationships between participants and other entities.
If most of the attributes are the same, use a single table that has a type column, all possible attributes, and CHECK constraints to make sure the right ones are set for each type.
If there are many attributes and big differences between the attributes of certain types of participants, you can put these extra attributes into type specific tables that have a foreign key relationship with the common participants table that holds the common attributes.
That latter technique can also be useful if you need foreign key relationships that involve only certain types of participants.
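A hedged sketch of both variants, with invented column names since your actual attributes will differ:

-- Variant 1: one table, a type column, and CHECK constraints that ensure
-- only the attributes belonging to each game type are filled in.
CREATE TABLE participants (
    participant_id INT PRIMARY KEY,
    game_type      VARCHAR(10) NOT NULL
        CHECK (game_type IN ('ASSASSIN', 'ZOMBIE', 'TRAITOR')),
    user_id        INT NOT NULL,
    target_id      INT,       -- hypothetical: only meaningful for assassins
    infected_at    TIMESTAMP, -- hypothetical: only meaningful for zombies
    CONSTRAINT chk_assassin CHECK (game_type = 'ASSASSIN' OR target_id IS NULL),
    CONSTRAINT chk_zombie   CHECK (game_type = 'ZOMBIE'   OR infected_at IS NULL)
);

-- Variant 2: common attributes in the parent table, type-specific attributes
-- in child tables that share the parent's key.
CREATE TABLE assassin_details (
    participant_id INT PRIMARY KEY,
    target_id      INT NOT NULL,
    CONSTRAINT fk_assassin_participant
        FOREIGN KEY (participant_id) REFERENCES participants (participant_id)
);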

Lookup tables implementation - one table or separate tables [closed]

I am going to implement several lookup tables in a system. In general, all lookup tables have the same structure, like:
id, name, value, rank, active
We are using AngularJS as the front end and Web API/Entity Framework as the backend in this project.
There are a few options on my mind:
Option 1 - Create a set of lookup tables with the same structure
e.g. LKRegion, LKStatus, LKPeriod, LKState, LKDepartment, etc.
This option is a traditional design. The data schema is structured and easy to understand, and it is easy to implement/enforce foreign key integrity. But you have to create separate web methods to handle CRUD actions, and you have to repeat the same work for every lookup table you add in the future.
Option 2 - Create a big lookup table by adding an extra column called LookupType to identify the lookup group
This option reduces the number of tables and makes the lookup data easy to maintain and retrieve (e.g. one schema, and one web method can handle all general lookup CRUD actions). But the foreign key integrity is a little loose because of the LookupType.
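For clarity, here is a rough sketch of the two options (T-SQL-flavored, since the Web API/Entity Framework stack suggests SQL Server; column sizes are just placeholders):

-- Option 1: one small table per lookup, each with the shared structure.
CREATE TABLE LKStatus (
    id     INT PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    value  VARCHAR(100),
    rank   INT,   -- may need quoting/bracketing in some dialects
    active BIT NOT NULL
);
-- ...and the same again for LKRegion, LKPeriod, LKState, LKDepartment, ...

-- Option 2: one big table with a LookupType discriminator.
CREATE TABLE Lookup (
    id         INT PRIMARY KEY,
    LookupType VARCHAR(50) NOT NULL,  -- e.g. 'Status', 'Region', 'Period'
    name       VARCHAR(100) NOT NULL,
    value      VARCHAR(100),
    rank       INT,
    active     BIT NOT NULL
);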
Please share your preference and tell me why. I would like to know the best practice for this implementation. Thank you!
I'll defend Option 2, although in general, you want Option 1. As others have mentioned, Option 1 is the simpler method and easily allows foreign key relationships.
There are some circumstances where having a single reference table is handy. For instance, if you are writing a system that will support multiple human languages, then having a single reference table with the names of things is much simpler than a zillion reference tables spread throughout the database. Or, I suppose, you could have very arcane security requirements that require complex encryption algorithms -- and dealing with a single table is easier.
Nevertheless, referential integrity on reference tables is important. Some databases have non-trigger-based mechanisms that will support referential integrity for one table (foreign keys and computed columns in Oracle and SQL Server). These mechanisms are a bit cumbersome but they do allow different foreign key references to a single table. And, you can always enforce referential integrity using triggers, although I don't recommend that approach.
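For what it's worth, here is one common way the computed-column trick is written in SQL Server; this is a hedged sketch with invented names, not the only way to do it:

-- One shared reference table, keyed by (type, code).
CREATE TABLE Lookup (
    LookupType VARCHAR(20)  NOT NULL,
    Code       VARCHAR(20)  NOT NULL,
    Name       VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Lookup PRIMARY KEY (LookupType, Code)
);

-- A referencing table pins its rows to one lookup type via a persisted
-- computed column, which can then take part in a real foreign key.
CREATE TABLE Customer (
    CustomerId INT PRIMARY KEY,
    StatusCode VARCHAR(20) NOT NULL,
    StatusType AS CAST('Status' AS VARCHAR(20)) PERSISTED,
    CONSTRAINT FK_Customer_Status
        FOREIGN KEY (StatusType, StatusCode)
        REFERENCES Lookup (LookupType, Code)
);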
As with most things that databases do, there isn't a right answer. There is a generally accepted answer that works/is correct in most cases (Option 1). The second option would only be desirable under limited circumstances depending on system requirements.
I suggest that:
A. Follow the organization standard if this is an enterprise system (some may laugh out loud at this, I know). If such a thing exists, it would certainly promote individual tables.
B. Use enums or one aggregated lookup table only for programming-level lookups (such as error messages) if you must. Any lookup data for business-related data should (in my opinion) be in a separate table, for at least the following reasons:
When you have separate tables, you join on the correct table name rather than filtering on a code column of the shared reference table. This makes writing queries less error prone. Writing "Select ... Where (TableID=12 and State='NY') AND (TableId=133 and Country='USA')"-style code is quite error prone during development. This is the major issue for me from a coding perspective.
RI errors on inserts and updates may be ambiguous when there is more than one reference to the lookup table in the row being inserted or updated.
In some cases, a lookup table may have self-references (relationships). For example, a geographical location can be described as a hierarchy, which would add more confusion to the model.
The relationships (references) could lose meaning in your database. You will find that almost every table in your system is linked to this one table, which somehow will not make sense.
If you ever decide to allow users to perform ad-hoc reporting, it would be difficult for them to use codes for lookup tables instead of names.
I feel that the one-table approach breaks normalization concepts - can't prove it now, though.
A disadvantage is that you may need to build indexes on PKs and FKs for some (or all) of the separate tables. However, in the world of powerful databases available today, this may not be a big deal.
There are plenty of discussions on the net that I should have read before answering your question, but here are some of the links if you care to take a look yourself:
Link 1, Link 2, Link 3, Link 4, Link 5...
Avoid option 2 at all costs; go with option 1 without even thinking about it. (*)
Referential integrity is far too important to compromise in favour of virtually any other concern.
If there you go, only pain will you find.
If you want to reduce duplication, implement a list of services in your web-api implementation language (java?) and parametrize each service with the name of the lookup table to work with.
Edit
(*) It was wrong on my behalf to say "without even thinking about it". Of course, think about it. If need be, go ahead and even post a question on stackoverflow about it. Thinking is good, and Gordon Linoff's answer above demonstrates this nicely.

PostgreSQL: How safe is it to rely on default constraint names? [closed]

PostgreSQL provides the ability to magically generate constraint names in statements like CREATE TABLE, ALTER TABLE if none are provided explicitly. The naming convention is well known and I personally like it very much. But how stable and official is it? Is it something which one can rely on for different major releases or even the next 50 years?
I always had the impression that this is an implementation detail and while a lot of people rely on it, one shouldn't and always use explicit names to properly document things instead. I think I've read something like that in the official documentation in the past, but couldn't find it anymore...
So is there a definitive, official statement how reliable this naming scheme is or if users should always try to provide explicit names?
Strictly, if it's not in the documentation, you should not rely on it.
The docs only say:
If you don't specify a constraint name in this way, the system chooses a name for you.
so strictly I should recommend not baking the constraint names into the application unless you specify them explicitly in the SQL. This will also make the connection more apparent when reading the SQL - you bothered to specify constraint names for a reason.
That said, constraint name generation has not AFAIK changed, at least since I started using Pg around 7.4. So while it's not part of the official documented API, it's probably also not especially bad to rely on it. Also, constraint names are always going to be preserved by pg_dump and pg_upgrade, so it likely doesn't matter much unless you are doing a clean reload into a new version that has changed default constraint name generation.
TL;DR: It doesn't look like they're officially defined and documented, but they're unlikely to change, and if they do the impact is minimal. So relying on them is probably OK. Just document that in the app.
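To make that concrete, here is a small sketch (the generated names shown in the comments are what recent PostgreSQL versions produce today, which, as above, is conventional rather than contractual):

CREATE TABLE customers (
    id integer PRIMARY KEY
);

-- Letting PostgreSQL choose the constraint names:
CREATE TABLE orders (
    order_id    integer PRIMARY KEY,                -- typically named "orders_pkey"
    customer_id integer REFERENCES customers (id),  -- typically "orders_customer_id_fkey"
    total       numeric CHECK (total >= 0)          -- typically "orders_total_check"
);

DROP TABLE orders;

-- The same table with explicit names, so nothing depends on the generated scheme:
CREATE TABLE orders (
    order_id    integer,
    customer_id integer,
    total       numeric,
    CONSTRAINT orders_pk          PRIMARY KEY (order_id),
    CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id) REFERENCES customers (id),
    CONSTRAINT orders_total_chk   CHECK (total >= 0)
);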

Why are there databases with no relationships between their tables? [closed]

I have never read anything about ignoring relationships in databases whose tables logically relate to each other.
My question is: is not defining relationships in a DB a deliberate way of achieving something? IMHO it might be because of the problems relationships create for developers, such as cascading updates/deletes or other constraints.
Historically, one reason for not defining relationships was improving performance. Checking referential integrity takes time; some systems try to save on it, by claiming that their code has enough checks that the additional verifications inside the RDBMS itself would be redundant.
This rationale is rarely good these days. The only situation when I think it may be applicable is when the entire schema is managed by a framework-type product, with 100% generated table structures, 100% generated queries, and zero need for manual tweaking. In situations like that all you need is tables and indexes. Of course the product that manages such database as its private "storage back end" needs to be extremely reliable to avoid creating orphaned rows, dangling row references, and other unpleasant things that flourish in the absence of referential integrity checks.
When I worked on a product like that in late nineties, we never generated any referential integrity constraints. However, my experience in tracking down problems with the product has been that a significant portion of issues that we've seen in the field could have been detected early with help of referential integrity constraints. That is why I think that the "check redundancy" rationale is flawed, and should not be considered "best practice".
Is there any reason/best practice for not implementing relationships between tables?
Primary and Foreign key constraints haven't always existed. (Citation needed) Sometimes in the early days, they were maintained in code only. Or relationships may have been implemented as unique indexes on the tables rather than PK/FK relationships.
The rationale at the time was that when moving data around, key constraints became cumbersome to manage, there is an overhead associated with them, and people can do stupid things with them at times, like cascading an update when it shouldn't be cascaded because the new developer doesn't understand the whole system.
There is an overhead to primary keys: they usually represent some arbitrary, system-assigned value that has no meaning other than to the system. Because of the early costs of storage, databases would be designed using composite keys built from information that was already required, in order to save space. Yes, it was that important to save space. Was it the right thing to do in terms of current database design and modeling? No. But at the time, given the limits of systems, it was the most economical.
Now, if this database was created in the past 15-20 years, some of those reasons go away. If it's more than 20 years old, I could see why it might not have the constraints.

Primary Key Change Forces Foreign Key Changes

I have an issue I am working with an existing SQL Server 2008 database: I need to occasionally change the primary key value for some existing records in a table. Unfortunately, there are about 30 other tables with foreign key references to this table.
What is the most elegant way to change a primary key and related foreign keys?
I am not in a situation where I can change the existing key structure, so this is not an option. Additionally, as the system is expanded, more tables will be related to this table, so maintainability is very important. I am looking for the most elegant and maintainable solution, and any help is greatly appreciated. So far I have thought about using stored procedures or triggers, but I wanted some advice before heading in the wrong direction.
Thanks!
When you say "I am not in a situation where I can change the existing key structure" are you able to add the ON UPDATE CASCADE option to the foreign keys? That is the easiest way to handle this situation — no programming required.
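A hedged sketch of what that looks like (the table and constraint names here are invented; you would re-create each of the ~30 existing foreign keys this way):

-- Drop and re-create an existing foreign key so key changes propagate.
ALTER TABLE OrderDetail
    DROP CONSTRAINT FK_OrderDetail_Product;

ALTER TABLE OrderDetail
    ADD CONSTRAINT FK_OrderDetail_Product
        FOREIGN KEY (ProductCode) REFERENCES Product (ProductCode)
        ON UPDATE CASCADE;

-- Now changing the primary key value updates every referencing row as well.
UPDATE Product SET ProductCode = 'NEW-CODE' WHERE ProductCode = 'OLD-CODE';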
As Larry said, ON UPDATE CASCADE will work; however, it can cause major problems in a production database, and most DBAs are not too thrilled with letting you use it. For instance, suppose you have a customer who changes his company name (and that is the PK) and there are two million related records in various tables. ON UPDATE CASCADE will do all the updates in one transaction, which could lock up your major tables for several hours. This is one reason why it is a very bad idea to have a PK that will need to be changed. A trigger would be just as bad, and if incorrectly written, it could be much worse.
If you do the changes in a stored proc, you can put each part in a separate transaction, so at least you aren't locking everything up. You can also update records in batches, so that if you have a million records to update in a table, you can do them in smaller batches which will run faster and hold fewer locks. The best way to do this is to create a new record in the primary table with the new PK, then move the old records to the new one in batches, and then delete the old record once all related records are moved. If you do this sort of thing, it is best to have audit tables so you can easily revert the data if there is a problem, since you will want to do this in multiple transactions to avoid locking the whole database. Now, this is harder to maintain; you have to remember to add to the proc when you add an FK (but you would have to remember to add ON UPDATE CASCADE as well). On the other hand, if it breaks due to a problem with a new FK, it is an easy fix: you know right away what the problem is and can push a change to prod relatively quickly.
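A rough outline of the batched approach in T-SQL (all names are hypothetical, and real code would wrap this with the auditing and error handling described above):

-- 1. Insert a copy of the parent row under the new key value.
INSERT INTO Product (ProductCode, ProductName)
SELECT 'NEW-CODE', ProductName
FROM Product
WHERE ProductCode = 'OLD-CODE';

-- 2. Re-point child rows in manageable batches, each in its own transaction.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) OrderDetail
    SET ProductCode = 'NEW-CODE'
    WHERE ProductCode = 'OLD-CODE';

    IF @@ROWCOUNT = 0 BREAK;  -- no more rows to move
END;
-- ...repeat step 2 for every other referencing table...

-- 3. Remove the old parent row once nothing references it any more.
DELETE FROM Product WHERE ProductCode = 'OLD-CODE';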
There are no easy solutions to this problem because the basic problem is poor design. You'll have to look over the pros and cons of all solutions (I would throw out the trigger idea as Cascade Update will perform better and be less subject to bugs) and decide what works best in your case. Remember data integrity and performance are critical to enterprise databases and may be more important than maintainability (heresy, I know).
If you have to update your primary key regularly then something is wrong there. :)
I think the simplest way to do it is to add another column and make it the primary key. This would allow you to change the values easily and also relate the foreign keys. Besides, I do not understand why you cannot change the existing key structure.
But, as you pointed out in the question (and Larry Lustig commented), you cannot change the existing structure. I am afraid that if it is a column which requires frequent updates, then the use of triggers could affect performance adversely. And you also say that as the system expands, more tables will be related to this table, so maintainability is very important. But a quick fix now will only worsen the problem.
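For completeness, a sketch of the "add another column" idea in T-SQL (names are hypothetical; every referencing table would then get a matching FK column pointing at the new key):

-- Add a stable surrogate key alongside the existing, changeable key.
ALTER TABLE Product ADD ProductId INT IDENTITY(1,1) NOT NULL;

-- Note: the existing foreign keys must be dropped and re-pointed at ProductId
-- before the old primary key can be removed.
ALTER TABLE Product DROP CONSTRAINT PK_Product;
ALTER TABLE Product ADD CONSTRAINT PK_Product PRIMARY KEY (ProductId);

-- Keep enforcing the real-world rule on the old business key.
ALTER TABLE Product ADD CONSTRAINT UQ_Product_Code UNIQUE (ProductCode);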