Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I have never read about omitting relationships in databases whose tables are logically related to each other.
My question is: is not defining relationships in a DB a deliberate way of achieving something? IMHO it might be done because of the problems relationships create for developers, such as cascading updates and deletes or other constraints.
Historically, one reason for not defining relationships was improving performance. Checking referential integrity takes time; some systems try to save on it, by claiming that their code has enough checks that the additional verifications inside the RDBMS itself would be redundant.
This rationale is rarely good these days. The only situation when I think it may be applicable is when the entire schema is managed by a framework-type product, with 100% generated table structures, 100% generated queries, and zero need for manual tweaking. In situations like that all you need is tables and indexes. Of course the product that manages such database as its private "storage back end" needs to be extremely reliable to avoid creating orphaned rows, dangling row references, and other unpleasant things that flourish in the absence of referential integrity checks.
When I worked on a product like that in late nineties, we never generated any referential integrity constraints. However, my experience in tracking down problems with the product has been that a significant portion of issues that we've seen in the field could have been detected early with help of referential integrity constraints. That is why I think that the "check redundancy" rationale is flawed, and should not be considered "best practice".
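To make the "orphaned rows" problem concrete, here is a minimal sketch of the kind of check you end up running by hand when no constraints are declared. It uses Python's built-in sqlite3 module as a stand-in engine; the customers and orders tables and the find_orphaned_orders helper are hypothetical names invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY is declared, mimicking a schema without referential integrity.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers (id, name) VALUES (1, 'Ada');
    -- Order 11 points at customer 2, which does not exist; nothing stops it.
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2);
""")


def find_orphaned_orders(connection):
    """Return ids of orders whose customer_id matches no customer row."""
    rows = connection.execute("""
        SELECT o.id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.id = o.customer_id
        WHERE c.id IS NULL
        ORDER BY o.id
    """).fetchall()
    return [row[0] for row in rows]


orphans = find_orphaned_orders(conn)
print(orphans)
```

With a declared foreign key the second insert would have been refused up front, so a cleanup query like this would have nothing to find.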
Is there any reason/best practice for not implementing relationships between tables?
Primary and Foreign key constraints haven't always existed. (Citation needed) Sometimes in the early days, they were maintained in code only. Or relationships may have been implemented as unique indexes on the tables rather than PK/FK relationships.
The rationale at the time was that when moving data around, key constraints became cumbersome to manage, there's an overhead associated with them, and people can do stupid things with them at times, like cascading an update that shouldn't be cascaded because the new developer doesn't understand the whole system.
There is an overhead to primary keys: they usually represent some arbitrary system-assigned value that has no meaning other than to the system. Because of the early costs of storage, databases would be designed using combined keys built from information that was already required, in order to save space. Yes, it was that important to save space. Was it the right thing to do in terms of current database design and modeling? No. But at the time, given the limits of systems, it was the most economical.
Now if this database was created in the past 15-20 years, some of those reasons go away. If it's more than 20 years old, I could see why it might not have the constraints.
Related
I am trying to design the schema for a live-action game application that supports multiple games. I have several tables that share multiple attributes such as: Assassins_Participants, Zombies_Participants, Traitors_Participants and Group_Messages, User_Messages, Game_Messages.
Should I use some sort of inheritance (i.e. create Participants and Messages tables) or should I leave it as is? If I should create parent tables, how should I go about it?
Also, any other critiques on my schema are welcome! I want to catch mistakes while I am early in the process. The link below is the current schema for my database.
Previous Design
Updated Design
Got a bit long for comments, so here's an 'answer'. Composite keys aren't a bad thing. (But I don't use them.) The benefit of a unique synthetic key (identity column or UUID) is that it's stable. You add it once, leave it alone, and never have to update it. Like the old saying goes, "smart numbers aren't." But one problem with synthetic keys is that they can obscure problems with the "real" key on the data. Say that you need uniqueness on three fields, one or more of which might change. Okay, that's a good place for a unique, synthetic key, as long as you still enforce the uniqueness on the three fields. Postgres is great at this.
A synthetic PK is an implementation convenience, it's less important than your real-world rule. If that wasn't clear, the point is that if, say, three fields must be unique, that needs to be checked. The uniqueness here is based on the real world, as you've modeled it. Put another way, you can bolt a synthetic number/UUID onto the row, and voila! It's unique! But not in a useful way. So, use the synthetic PK, but add a unique index on the composites. This way, if any of the combined values change and violate your uniqueness rule, the engine blocks the insert/update. But you don't have to get into the messy business of reworking a PK which may be used elsewhere as a FK. For some docs, see:
https://www.postgresql.org/docs/current/index-unique-checks.html
https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS
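A minimal sketch of "synthetic PK plus a unique constraint on the real key". The answer is about Postgres; this uses Python's built-in sqlite3 module as a stand-in only so the example is self-contained, and the enrollment table, its columns and the try_change_term helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (
        id         INTEGER PRIMARY KEY,      -- synthetic key, never updated
        student_no TEXT NOT NULL,
        course     TEXT NOT NULL,
        term       TEXT NOT NULL,
        UNIQUE (student_no, course, term)    -- the real-world uniqueness rule
    );
    INSERT INTO enrollment (student_no, course, term) VALUES
        ('S1', 'DB101', '2024A'),
        ('S1', 'DB101', '2024B');
""")


def try_change_term(connection, row_id, new_term):
    """Attempt to update one row; return True if the engine accepted it."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "UPDATE enrollment SET term = ? WHERE id = ?", (new_term, row_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_change_term(conn, 2, "2024A")  # would collide with row 1
distinct_accepted = try_change_term(conn, 2, "2024C")   # still unique
print(duplicate_accepted, distinct_accepted)
```

The id column never changes, so anything referencing it as a FK is untouched, while the composite rule is still checked on every insert and update.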
On the question “should I have several *_participant tables or not?”:
The big advantage of having a single table is that you can have foreign key relationships between participants and other entities.
If most of the attributes are the same, use a single table with a type column that has all possible attributes and CHECK constraints to make sure the right ones are set.
If there are many attributes and big differences between the attributes of certain types of participants, you can put these extra attributes into type specific tables that have a foreign key relationship with the common participants table that holds the common attributes.
That latter technique can also be useful if you need foreign key relationships that involve only certain types of participants.
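Both variants can be sketched together, assuming Python's built-in sqlite3 module as the engine. The column names (game_type, target_id, infected_at) and the rule that only assassins have a target are assumptions made up for the example, not taken from the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- One table for all participants, with a type column and CHECK constraints.
    CREATE TABLE participants (
        id        INTEGER PRIMARY KEY,
        game_type TEXT NOT NULL
                  CHECK (game_type IN ('assassins', 'zombies', 'traitors')),
        user_name TEXT NOT NULL,
        target_id INTEGER REFERENCES participants (id),
        -- Assumed rule: only assassins may have a target.
        CHECK (game_type = 'assassins' OR target_id IS NULL)
    );
    -- Attributes that exist for one type only live in a type-specific table.
    -- (Simplification: nothing here forces the referenced row to be a zombie.)
    CREATE TABLE zombie_details (
        participant_id INTEGER PRIMARY KEY REFERENCES participants (id),
        infected_at    TEXT NOT NULL
    );
""")


def insert_participant(connection, game_type, user_name, target_id=None):
    """Insert a participant; return the new id, or None if a constraint failed."""
    try:
        with connection:
            cur = connection.execute(
                "INSERT INTO participants (game_type, user_name, target_id) "
                "VALUES (?, ?, ?)",
                (game_type, user_name, target_id),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


alice = insert_participant(conn, "assassins", "alice")
bob = insert_participant(conn, "assassins", "bob", target_id=alice)
carol = insert_participant(conn, "zombies", "carol", target_id=alice)  # rejected
print(alice, bob, carol)
```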
I am going to implement several lookup tables in a system. In general all lookup tables have the same structure like
id, name, value, rank, active
We are using AngularJS as front-end and Web API/Entity framework as backend in this project
There are some options on my mind
Option 1 - Create a set of lookup tables with the same structure
e.g. LKRegion, LKStatus, LKPeriod, LKState, LKDepartment, etc.
This option is the traditional design. The data schema is well structured and easy to understand, and it is easy to implement and enforce foreign key integrity. But you have to create separate web methods to handle the CRUD actions, and you have to repeat the same work whenever you add another lookup table in the future.
Option 2 - Create a big lookup table by adding an extra column called LookupType to identify the lookup group
This option reduces the number of tables. It makes the lookup data easy to maintain and retrieve (e.g. one schema, and one web method can handle all general lookup CRUD actions). But the foreign key integrity is a little bit loose because of the LookupType column.
Please share your preference and tell me why. I would like to know the best practice for this implementation. Thank you!
I'll defend Option 2, although in general, you want Option 1. As others have mentioned, Option 1 is the simpler method and easily allows foreign key relationships.
There are some circumstances where having a single reference table is handy. For instance, if you are writing a system that will support multiple human languages, then having a single reference table with the names of things is much simpler than a zillion reference tables spread throughout the database. Or, I suppose, you could have very arcane security requirements that require complex encryption algorithms -- and dealing with a single table is easier.
Nevertheless, referential integrity on reference tables is important. Some databases have non-trigger-based mechanisms that will support referential integrity for one table (foreign keys and computed columns in Oracle and SQL Server). These mechanisms are a bit cumbersome but they do allow different foreign key references to a single table. And, you can always enforce referential integrity using triggers, although I don't recommend that approach.
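One way to picture the non-trigger approach: the referencing table carries a type column pinned to a single value, and a composite foreign key covers both the type and the id. The sketch below uses Python's built-in sqlite3 module and swaps the computed column for an ordinary column with a DEFAULT and a CHECK, which is an assumption made to keep the example portable; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE lookup (
        lookup_type TEXT NOT NULL,
        id          INTEGER NOT NULL,
        name        TEXT NOT NULL,
        PRIMARY KEY (lookup_type, id)
    );
    CREATE TABLE employee (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Pinned to 'region'; stands in for a computed column.
        region_type TEXT NOT NULL DEFAULT 'region' CHECK (region_type = 'region'),
        region_id   INTEGER NOT NULL,
        FOREIGN KEY (region_type, region_id) REFERENCES lookup (lookup_type, id)
    );
    INSERT INTO lookup (lookup_type, id, name) VALUES
        ('region', 1, 'North'),
        ('status', 1, 'Active'),
        ('status', 2, 'Inactive');
""")


def add_employee(connection, name, region_id):
    """Insert an employee; return True if the region reference was accepted."""
    try:
        with connection:
            connection.execute(
                "INSERT INTO employee (name, region_id) VALUES (?, ?)",
                (name, region_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_region = add_employee(conn, "Ada", 1)      # ('region', 1) exists
status_as_region = add_employee(conn, "Bob", 2)  # only ('status', 2) exists
print(valid_region, status_as_region)
```

The extra column on every referencing table is the cumbersome part, but it stops a status id from being accepted where a region id is expected.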
As with most things that databases do, there isn't a right answer. There is a generally accepted answer that works/is correct in most cases (Option 1). The second option would only be desirable under limited circumstances depending on system requirements.
I suggest the following:
A. Follow the organization's standard if this is an enterprise system (some may laugh out loud at this, I know). If such a thing exists, it would certainly promote individual tables.
B. Use enums or 1 aggregated lookup table for programming-level lookups only (such as error messages, etc.), and only if you must. Any lookup data for business-related data should (in my opinion) be in a separate table, for at least the following reasons:
When you have separate tables, you need to use the correct table name when you join, rather than a code column of the reference table. This makes writing queries less error prone. Writing queries in the style of "Select ... Where (TableID=12 and State="NY") AND (TableId=133 and Country="USA")" is quite error prone during development. This is the major issue for me from a coding perspective.
RI errors on inserts and updates may be ambiguous when there is more than 1 reference to the lookup table in the row being inserted or updated.
In some cases, a lookup table may have self references (relationships). For example, a geographical location can be described as a hierarchy, which would add more confusion to the model.
The relationships (references) could lose meaning in your database. You will find that almost every table in your system is linked to this one table. Somehow that will not make sense.
If you ever decided to allow the user to perform ad-hoc reporting, it would be difficult for them to use codes for lookup tables instead of names.
I feel that the 1 table approach breaks Normalization concepts - Can't prove it now though.
A disadvantage is that you may need to build indexes on PKs and FKs for some (or all) of the separate tables. However, in the world of powerful databases available today, this may not be a big deal.
There is plenty of discussion on the net that I should have read before answering your question; however, here are some links if you care to take a look yourself:
Link 1, Link 2, Link 3, Link 4, Link 5...
Avoid option 2 at all costs, go with option 1 without even thinking about it.(*)
Referential integrity is far too important to compromise in favour of virtually any other concern.
If there you go, only pain will you find.
If you want to reduce duplication, implement a list of services in your web-api implementation language (java?) and parametrize each service with the name of the lookup table to work with.
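A minimal sketch of that idea, written in Python with the built-in sqlite3 module rather than the asker's actual stack. LookupService, ALLOWED_LOOKUP_TABLES and the method names are hypothetical. Table names cannot be bound as query parameters, so the sketch assumes they are checked against a fixed allow-list before being placed in the SQL text.

```python
import sqlite3

# Only these names may ever be interpolated into SQL text.
ALLOWED_LOOKUP_TABLES = frozenset({"LKRegion", "LKStatus"})


class LookupService:
    """Generic CRUD helper for one lookup table (id, name, value, rank, active)."""

    def __init__(self, connection, table_name):
        if table_name not in ALLOWED_LOOKUP_TABLES:
            raise ValueError(f"unknown lookup table: {table_name}")
        self._conn = connection
        self._table = table_name

    def create(self, name, value, rank=0, active=True):
        with self._conn:
            cur = self._conn.execute(
                f'INSERT INTO "{self._table}" (name, value, rank, active) '
                "VALUES (?, ?, ?, ?)",
                (name, value, rank, int(active)),
            )
        return cur.lastrowid

    def list_active(self):
        return self._conn.execute(
            f'SELECT id, name, value FROM "{self._table}" '
            "WHERE active = 1 ORDER BY rank, id"
        ).fetchall()


conn = sqlite3.connect(":memory:")
for table in sorted(ALLOWED_LOOKUP_TABLES):
    conn.execute(
        f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, name TEXT NOT NULL, '
        "value TEXT NOT NULL, rank INTEGER NOT NULL DEFAULT 0, "
        "active INTEGER NOT NULL DEFAULT 1)"
    )

regions = LookupService(conn, "LKRegion")
regions.create("North", "N", rank=2)
regions.create("South", "S", rank=1)
regions.create("Old", "O", active=False)
active_regions = regions.list_active()
print(active_regions)
```

Each lookup stays in its own table, so ordinary foreign keys still work, while the CRUD code exists only once.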
Edit
(*) It was wrong on my part to say "without even thinking about it". Of course, think about it. If need be, go ahead and even post a question on Stack Overflow about it. Thinking is good, and Gordon Linoff's answer above demonstrates this nicely.
Ok ok, I know you are probably all going to kill me for asking this. However, I got into a friendly programmer argument with a co-worker about one of our database tables, and he asked a question which I know the answer to, but I couldn't explain why it is the better way.
I will simplify the situation for the sake of the question. We have a fairly large table of people / users. Amongst other data being stored, the data in question is as follows: we have a simNumber, a cellNumber and the ipAddress of that sim.
Now I am saying that we should make a table, let's call it SimTable, put those 3 entries in the sim table, and then put a FK in the UsersTable linking the two. Why? Because that's what I have always been taught: NORMALISE your tables!!! Ok, so all is good in that regard.
But now my friend says to me: yes, but now when you want to query a user's phone number, SQL has to go and:
search for the user
search for the sim fk
search for the correct sim row in the sim table
get the phone number
Now when I go and request 10000 users' phone numbers, the number of operations done grows seriously.
Vs the other approach
search for the user
find the phone number
Now the argument is purely performance based. As much as I understand why we normalize the data (to remove redundant data, for maintainability, to make changes in one table which propagate everywhere, etc.), it does appear to me that the approach with the data in one table will be faster, or will at least need fewer tasks/operations to give me the data I want?
So what is the case in this situation? I do hope that I have not asked anything insanely silly; it is early in the morning, so do forgive me if I'm not thinking clearly.
The technology involved is MS SQL Server 2012.
[EDIT]
The article below also touches on some of the concepts I have mentioned above
http://databases.about.com/od/specificproducts/a/Should-I-Normalize-My-Database.htm
The goal of normalization is not performance. The goal is to model your data correctly with minimum redundancy so you avoid data anomalies.
Say, for example, two users share the same phone. If you store the phones in the user table, you'd have the sim number, IP address, and cell number stored on each user's row.
Then you change the IP address on one row but not the other. How can one sim number have two IP addresses? Is that even valid? Which one is correct? How would you fix such discrepancies? How would you even detect them?
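A small sketch of that anomaly, using Python's built-in sqlite3 module as a stand-in engine (the question is about SQL Server). The table and column names, including users_denorm, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the sim details are repeated on every user row.
    CREATE TABLE users_denorm (
        id INTEGER PRIMARY KEY, name TEXT,
        sim_number TEXT, cell_number TEXT, ip_address TEXT);
    INSERT INTO users_denorm (name, sim_number, cell_number, ip_address) VALUES
        ('ann', 'SIM-1', '555-0100', '10.0.0.1'),
        ('ben', 'SIM-1', '555-0100', '10.0.0.1');

    -- Normalized: each sim is stored once and users point at it.
    CREATE TABLE sims (
        sim_number TEXT PRIMARY KEY, cell_number TEXT, ip_address TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name TEXT,
        sim_number TEXT REFERENCES sims (sim_number));
    INSERT INTO sims VALUES ('SIM-1', '555-0100', '10.0.0.1');
    INSERT INTO users (name, sim_number) VALUES ('ann', 'SIM-1'), ('ben', 'SIM-1');
""")

# The IP address changes. In the denormalized table only ann's row gets updated.
conn.execute("UPDATE users_denorm SET ip_address = '10.0.0.2' WHERE name = 'ann'")
# In the normalized design there is exactly one place to update.
conn.execute("UPDATE sims SET ip_address = '10.0.0.2' WHERE sim_number = 'SIM-1'")

# Sims that now claim more than one IP address in the denormalized table.
conflicting_sims = conn.execute("""
    SELECT sim_number FROM users_denorm
    GROUP BY sim_number
    HAVING COUNT(DISTINCT ip_address) > 1
""").fetchall()

# Every user sees the same, current IP address in the normalized design.
normalized_ips = conn.execute("""
    SELECT u.name, s.ip_address
    FROM users AS u
    JOIN sims AS s ON s.sim_number = u.sim_number
    ORDER BY u.name
""").fetchall()

print(conflicting_sims)
print(normalized_ips)
```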
There are times when denormalization is worthwhile, if you really need to optimize data access for one query that you run very frequently. But denormalization comes at a cost, so be prepared to commit yourself to a lot more manual work to take responsibility for data integrity. More code, more testing, more cleanup tasks. Do those count when considering "performance" of the project overall?
Re comments:
I agree with @JoelBrown, as soon as you implement your first case of denormalization, you compromise on data integrity.
I'll expand on what Joel mentions as "well-considered." Denormalization benefits specific queries. So you need to know which queries you have in your app, and which ones you need to optimize for. Do this conservatively, because while denormalization can help a specific query, it harms performance for all other uses of the same data. So you need to know whether you need to query the data in different ways.
Example: suppose you are designing a database for StackOverflow, and you want to support tags for questions. Each question can have a number of tags, and each tag can apply to many questions. The normalized way to design this is to create a third table, pairing questions with tags. That's the physical data model for a many-to-many relationship:
Questions ----<- QuestionsTagged ->---- Tags
But you figure you don't want to do the join to get tags for a given question, so you put tags into a comma-separated string in the questions table. This makes it quicker to query a given question and its associated tags.
But what if you also want to query for one specific tag and find its related questions? If you use the normalized design, it's simply a query against the many-to-many table, but on the tag column.
But if you denormalize by storing tags as a comma-separated list in the Questions table, you'd have to search for tags as substrings within that comma-separated list. Searching for substrings can't be indexed with a standard B-tree style index, and therefore searching for related questions becomes a costly table-scan. It's also more complex and inefficient to insert and delete a tag, or to apply constraints like uniqueness or foreign keys.
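The two designs can be compared in a rough sketch with Python's built-in sqlite3 module. The table names are hypothetical, and the query plans printed at the end come from SQLite, so their wording will differ on other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: tags kept as a comma-separated string on the question row.
    CREATE TABLE questions_denorm (id INTEGER PRIMARY KEY, tags TEXT NOT NULL);
    INSERT INTO questions_denorm (id, tags) VALUES
        (1, 'sql,mysql'), (2, 'nosql'), (3, 'python,sql');

    -- Normalized: one row per (question, tag) pair, indexed on the tag column.
    CREATE TABLE questions_tagged (
        question_id INTEGER NOT NULL,
        tag         TEXT NOT NULL,
        PRIMARY KEY (question_id, tag));
    CREATE INDEX idx_questions_tagged_tag ON questions_tagged (tag, question_id);
    INSERT INTO questions_tagged VALUES
        (1, 'sql'), (1, 'mysql'), (2, 'nosql'), (3, 'python'), (3, 'sql');
""")

# Substring match; wrapping in commas keeps 'sql' from matching 'nosql' or 'mysql'.
DENORM_SQL = ("SELECT id FROM questions_denorm "
              "WHERE ',' || tags || ',' LIKE '%,' || ? || ',%' ORDER BY id")
NORM_SQL = ("SELECT question_id FROM questions_tagged "
            "WHERE tag = ? ORDER BY question_id")


def question_ids(sql, tag):
    return [row[0] for row in conn.execute(sql, (tag,))]


def query_plan(sql, tag):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (tag,)).fetchall()
    return " | ".join(row[-1] for row in rows)


denorm_ids = question_ids(DENORM_SQL, "sql")
norm_ids = question_ids(NORM_SQL, "sql")
denorm_plan = query_plan(DENORM_SQL, "sql")
norm_plan = query_plan(NORM_SQL, "sql")

print(denorm_ids, denorm_plan)
print(norm_ids, norm_plan)
```

Both queries return the same questions, but the plan for the comma-separated version reads every row, while the normalized one looks the tag up in the index.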
That's what I mean by denormalization making an improvement for one type of query at the expense of other uses of the data. That's why it's a good idea to start out with everything in normal form, and then refactor to denormalized designs later on a case by case basis as your bottlenecks reveal themselves.
This goes back to old wisdom:
"Premature optimization is the root of all evil" -- Donald Knuth
In other words, don't denormalize until you can demonstrate during load testing that (a) it makes a real improvement to performance that justifies the loss of data integrity, and (b) it does not degrade performance of other cases unacceptably.
It sounds like you already understand the benefits of normalisation, so I won't cover these.
There are a couple of considerations here:
1. Does a user always have one and only one phone number?
If so, then it is still normalised to add these to the user table. However, if the user can have either no phone number or multiple phone numbers, then the phone details should be held in a separate table.
Assuming you have these in separate tables, but after conducting performance tests you found that joining these 2 tables was having a significant effect on performance, then you may choose to deliberately denormalise the tables for performance gains.
Others have already provided some good points and you may also want to take a look at this.
I'd just like to mention one more aspect that is often overlooked: I/O tends to be the greatest component of the cost of most queries, and denormalization generally increases the storage size of data, therefore making the DBMS cache "smaller".
If your normalized database fits into cache and denormalized doesn't, you may actually observe a performance decrease for the latter.
And you won't be able to spot that in development, unless you actually have the amount of data that is similar to production. This is one of many reasons why you should never, ever denormalize without solid measurements (on representative amounts of data) to justify it.
In a perfect world, are foreign key constraints ever really needed?
Foreign keys enforce consistency in an RDBMS. That is, no child row can ever reference a non-existent parent.
There's a school of thought that consistency rules should be enforced by application code, but this is both inefficient and error-prone. Even if your code is perfect and bug-free and never introduces a broken reference, how can you be certain that everyone else's code that accesses the same database is also perfect?
When constraints are enforced within the RDBMS, you can rely on consistency. In other words, the database never allows a change to be committed that breaks references.
When constraints are enforced by application code, you can never be quite sure that no errors have been introduced in the database. You find yourself running frequent SQL scripts to catch broken references and correct them. The extra code you have to write to do this far exceeds any performance cost of the RDBMS managing consistency.
In addition to protecting the integrity of your data, FK constraints also help document the relationships between your tables within the database itself.
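As a minimal illustration of the engine refusing to commit a broken reference, here is a sketch with Python's built-in sqlite3 module. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; the parent and child tables and the attempt helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent (id)
    );
    INSERT INTO parent (id) VALUES (1);
""")


def attempt(connection, sql, params=()):
    """Run one statement; report whether the engine accepted or rejected it."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


valid_child = attempt(conn, "INSERT INTO child (parent_id) VALUES (?)", (1,))
orphan_child = attempt(conn, "INSERT INTO child (parent_id) VALUES (?)", (99,))
delete_parent = attempt(conn, "DELETE FROM parent WHERE id = ?", (1,))

print(valid_child)
print(orphan_child)
print(delete_parent)
```

The child pointing at a missing parent is refused, and so is deleting a parent that still has children, no matter which application issues the statement.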
The world is not perfect; that's why they are needed.
A world cannot be perfect without foreign keys.
Yes, if you want to ensure referential integrity.
In addition to consistency enforcement and documentation, they can actually speed up queries. The query optimizer can see a foreign key constraint, understand its effect, and make a plan optimization that would be impossible without the constraint in place. See Foreign Key Constraints (Without NOCHECK) Boost Performance and Data Integrity. (SQL Server specific)
In addition to the documentation effect Dave mentioned, FK constraints can help you write less code and automate some bits.
For example, if you delete a customer record, all of that customer's invoices and invoice lines are also deleted automatically if you have "ON DELETE CASCADE" on their FK constraints.
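A short sketch of the customer / invoice / invoice line case, assuming Python's built-in sqlite3 module as the engine and made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoices (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id) ON DELETE CASCADE
    );
    CREATE TABLE invoice_lines (
        id         INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoices (id) ON DELETE CASCADE,
        amount     REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO invoices VALUES (10, 1), (20, 2);
    INSERT INTO invoice_lines VALUES (100, 10, 5.0), (101, 10, 7.5), (200, 20, 1.0);
""")

# One statement; the engine removes Ada's invoice and both of its lines.
with conn:
    conn.execute("DELETE FROM customers WHERE id = 1")

remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("customers", "invoices", "invoice_lines")
}
print(remaining)
```

Without the cascade you would need three deletes in the right order, in every application that removes customers.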
Does setting up proper relationships in a database help with anything other than data integrity?
Do they improve or hinder performance?
As long as you have the obvious indexes in place corresponding to the foreign keys, there should be no perceptible negative effect on performance. It's one of the more foolproof database features you have to work with.
I'd have to say that proper relationships will help people to understand the data (or the intention of the data) better than if omitting them, especially as the overall cost is quite low in maintaining them.
Their presence doesn't hinder performance except in terms of architecture (as others have pointed out, data integrity will occasionally cause foreign key violations which may have some effect) but IMHO is outweighed by the many benefits (if used correctly).
I know you weren't asking whether to use FKs or not, but I thought I'd just add a couple of viewpoints about why to use them (and have to deal with the consequences):
There are other considerations too, such as if you ever plan to use an ORM (perhaps later on) you'll require foreign keys. They can also be very helpful for ETL/Data Import and Export and later for reporting and data warehousing.
It's also helpful if other applications will make use of the schema - since Foreign Keys implement a basic business logic. So your application (and any others) only need to be aware of the relationships (and honour them). It'll keep the data consistent and most likely reduce the number of data errors in any consuming applications.
Lastly, it gives you a pretty decent hint as to where to put indexes, since it's likely you'll look up table data by an FK value.
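To illustrate the indexing hint, here is a rough sketch with Python's built-in sqlite3 module. Many engines, SQLite included, do not create an index on the referencing column automatically. The departments and employees tables are hypothetical, and the plan text is SQLite's own wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments (id)
    );
""")

LOOKUP_SQL = "SELECT id FROM employees WHERE department_id = ?"


def query_plan(connection, sql, params):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, LOOKUP_SQL, (1,))

# The FK column is the natural candidate for an index.
conn.execute("CREATE INDEX idx_employees_department_id ON employees (department_id)")

plan_after = query_plan(conn, LOOKUP_SQL, (1,))

print(plan_before)
print(plan_after)
```

Before the index the lookup by department is a full scan of employees; afterwards the plan names idx_employees_department_id.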
It neither helps nor hurts performance in any significant way. The only hindrance is the check for integrity when inserting/updating/deleting.
Foreign keys are an important part of database design because they ensure consistency. You should use them because they offer protection, at the lowest level, against data screw-ups that can wreck your applications. Another benefit is that database tools (visualization/analysis/code generation) use foreign keys to relate data.
Do relationships in databases improve or hinder performance?
Like any tool in your toolbox, the results you'll get depend on how you use it. Properly specified relationships and a well-designed logical database can be an enormous boon to performance -- consider the difference between searching through normalized and denormalized data, for example.
Depending on your database engine, relationships defined through foreign key constraints can benefit performance. The constraint allows the engine to make certain assumptions about the existence of data in tables on the parent side of the key.
A brief explanation for MS SQL Server can be found at http://www.microsoft.com/technet/abouttn/flash/tips/tips_122104.mspx. I don't know about other engines, but the concept would make sense in other platforms.
Relationships in the data exist whether you declare them or not. Declaring and enforcing the relationships via FK constraints will prevent certain kinds of errors in the data, at a small cost of checking data when inserts/updates/deletes occur.
Declaring cascading deletes via relationships helps prevent certain kinds of errors when deleting data.
Knowing the relationships helps to make flexible and correct use of the data when forming queries.
Designing the tables well can make the relationships more obvious and more useful. Using relationships in the data is the primary power behind using relational databases in the first place.
About impact on performance: In my experience with MS Access 2003, if you have a multi-user application and use Relationships to enforce a lot of referential integrity, you can take a big hit in terms of response time for the end-user.
There are different ways to take care of enforcing referential integrity. I decided to take out some rules in Relationships, build more enforcement into the front-end and live with some loss of RI. Of course in the multi-user environment, you want to be very careful with that bit of liberty.
In my experience building performance-sensitive databases, Foreign Keys hurt performance pretty significantly, since they have to be checked every time the referring record is inserted/updated or master record is deleted. If you need a proof, just look at the execution plan.
I still keep them for documentation and for tools to use but I usually disable them, especially in high-performance systems where access to DB is only through the application layer.