Why have a join table for a 1:m relation in SQL?

What is the benefit of having junction tables between the first 1:m and the second 1:m relations in the following database?
[Diagram: http://dl.getdropbox.com/u/175564/db/db-simple.png]
The book Joe Celko's Trees and Hierarchies in SQL for Smarties says that the reason is to make the relations in the 1:m's unique. For instance, the following tables prevent users from asking exactly the same question twice and from giving exactly the same answer twice, respectively.
The first 1:m relation
users-questions
===============
user_id REFERENCES users( user_id )
question_id REFERENCES questions ( question_id )
PK( user_id, question_id) // User is not allowed to ask same question twice
The second 1:m relation
questions-answers
=================
question_id REFERENCES questions( question_id)
answer_id REFERENCES answers( answer_id )
PK( question_id, answer_id ) // Question is not allowed to have two identical answers
This benefit about uniqueness does not convince me to make my code more challenging.
I cannot understand why I should restrict the possibility of having questions or answers with the same ID in the db, since I can perhaps use PHP to forbid that.

Well, the unique relations thing seems nonsensical to me, probably because I'm used to DBMSes where you can define unique keys other than the primary key. In my world, mapping tables like those are how you implement a many-to-many relationship, and using them for a one-to-many relationship is madness — I mean, if you do that, maybe you intend for the relationship to be used as one-to-many, but what you've actually implemented is many-to-many support.
I don't agree with what you're saying about there being no utility to unique compound keys in the persistence layer because you can enforce that in the application layer, though. Persistence-layer uniqueness constraints have a lot of difficult-to-replicate benefits, such as, in MySQL, the ability to take advantage of INSERT ... ON DUPLICATE KEY UPDATE.
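As a rough illustration of that last point, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no ON DUPLICATE KEY UPDATE; its closest counterpart is INSERT ... ON CONFLICT ... DO UPDATE (SQLite 3.24 or later), which is what is swapped in here. The table name users_questions and the views column are made up for the example.

```python
import sqlite3

# Hypothetical table mirroring the users-questions mapping from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users_questions (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        views       INTEGER NOT NULL DEFAULT 1,  -- made-up payload column
        PRIMARY KEY (user_id, question_id)
    )
""")

# SQLite's counterpart of MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
upsert = """
    INSERT INTO users_questions (user_id, question_id) VALUES (?, ?)
    ON CONFLICT (user_id, question_id) DO UPDATE SET views = views + 1
"""
for _ in range(3):
    conn.execute(upsert, (1, 42))

rows = conn.execute(
    "SELECT user_id, question_id, views FROM users_questions"
).fetchall()
print(rows)
```

Without the compound primary key there is nothing for the statement to conflict with, so the same three calls would simply create three rows.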

It's usually due to duplication of data.
As for your reasoning, yes you can enforce this in the business layer, but if you make a mistake, it could break a significant amount of code. The issue you have is your data model may have only a few tables. Lucky you. When your data model grows, if you can't make sense of the structure and you have to put all the logic to maintain denormalised tables in your GUI layer you could very easily run into problems. Note that it is hard to make things threadsafe on a GUI for your SQL Database without using locking which will destroy your performance.
DBMS are very very good at dealing with these problems. You can keep your data model clean and use indexing to provide you with the speed you need. Your goal should be to get it right first, and only denormalise your tables when you can see a clear need to do so (for performance etc.)
Believe it or not, there are many situations where having normalised data makes your life easier, not harder when it comes to your application. For instance, if you have one big table with questions and answers, you have to write code to check if it is unique. If you have a table with a primary key, you simply write
insert into table (col1, col2) values (@id, @value) --NOTE: You would probably
--make the id column an autonumber so you don't have to worry about this
The database will prevent you from inserting if you have a non unique value there OR if you are placing in an answer with no question. All you need to do is check whether the insertion worked, nothing more. Which one do you think is less code?
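To make the "just check whether the insertion worked" idea concrete, here is a small sketch with Python's sqlite3 module. The questions table, its columns and the try_insert helper are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (question_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
)

def try_insert(question_id, body):
    """Return True if the row was inserted, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO questions (question_id, body) VALUES (?, ?)",
                (question_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert(1, "Why use a join table?")
second = try_insert(1, "Why use a join table?")  # same key again
count = conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]
print(first, second, count)
```

The application never has to query for an existing row first; the primary key does the check and the caller only looks at the outcome.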

I agree that the join table for a one-to-many in this situation doesn't seem to add much benefit, and as @chaos says, you actually end up implementing many-to-many support. But Joe Celko is a smart guy - is this really the exact answer he gives?
One other possible reason for implementing a join table on a one-to-many is that it completely separates questions/answers from a dependence on users.
For example, say you added a Dogs table and a Deities table. We all know that dogs can't register as users because they don't have email addresses, and gods don't register as users because, well, it's beneath them. Maybe dogs and gods still ask questions though, but to do that you might want to implement a dogs-questions table and a deities-questions table. In theory this is still many-to-many, but in practice you do it so that you can have multiple one-to-manys.

Related

SQL relationship table

I have a theoretical question about SQL design. When I have tens of tables between which I need m:n relations, is it better to create a relation table for each required pair, or is it possible (or better from a performance point of view) to have one relation table with columns (id, table1, row1, table2, row2), all holding integers?
I am an Informatics student and based on what I have learned in class it is always better to create a table in between the two tables in order for it to hold the primary keys of each of those two tables. This table will hold the relations like the following:
student: student_id, first_name, last_name
classes: class_id, name, teacher_id
student_classes: class_id, student_id # the relations table
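A minimal runnable sketch of those three tables, using Python's sqlite3 module. The sample rows and the compound primary key on student_classes are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE classes (class_id INTEGER PRIMARY KEY, name TEXT, teacher_id INTEGER);
    CREATE TABLE student_classes (
        class_id   INTEGER NOT NULL REFERENCES classes (class_id),
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        PRIMARY KEY (class_id, student_id)
    );
    INSERT INTO student VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO classes VALUES (10, 'Databases', 100), (20, 'Logic', 200);
    INSERT INTO student_classes VALUES (10, 1), (20, 1), (10, 2);
""")

# Classes taken by student 1, resolved through the relation table.
classes_of_ada = [name for (name,) in conn.execute("""
    SELECT c.name
    FROM student_classes sc
    JOIN classes c ON c.class_id = sc.class_id
    WHERE sc.student_id = ?
    ORDER BY c.name
""", (1,))]

# Number of students enrolled in class 10.
students_in_databases = conn.execute(
    "SELECT COUNT(*) FROM student_classes WHERE class_id = ?", (10,)
).fetchone()[0]
print(classes_of_ada, students_in_databases)
```

With one relation table per pair, each column can carry a real foreign key. A single generic (id, table1, row1, table2, row2) table cannot declare one, because row1 and row2 may point into any table.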
Of course it would help to have a more detailed example, at least for a smaller scope: what the tables are, why you need each relation, and so on.
But generally, in my opinion, database design should follow the logic of your data, so relations should be where they are needed.
Another thing: if you have only one table for all relations, it will be a bottleneck for the whole application. How critical that is depends on how you are going to use it in the application (mostly reads, or lots of writes/updates/deletes, how many rows, etc.).
This consideration can also depend to some extent on which RDBMS you're using.
It depends on the case. Sometimes it's better to follow the normal forms when designing (https://www.sqlshack.com/what-is-database-normalization-in-sql-server/), but in my experience after years of working in software, that discipline is lost with time and the original design gets messy unless there is a team dedicated to it.

What are the consequences of violating a one-to-many relationship & when to use a many-to-many?

Considering that order records outnumber tags by many orders of magnitude:
1. I've read a comment on an SO question saying that inserting order# 68 as pictured wouldn't cause any trouble, but that if I wanted to query orders by certain tags, a many-to-many is more appropriate/convenient/efficient, since otherwise in a one-to-many every single order would have to be checked to find its tags. Is this true?
a. I also recall reading that many-to-many relationships cost the most in resources and don't perform well, yet I have also read claims that the performance losses are negligible and not worth the risk and overhead of handling orphan records of many-to-many relationships. Any comments?
b. Based on other reading I was convinced to model a many-to-many as 2 one-to-many relationships with a junction table of FKs. If there is no longer a many-to-many but 2 one-to-many relationships, are the aforementioned cons of many-to-many avoided?
2. In a similar question I posted in another forum I was told that I couldn't even insert order# 68 and that doing so would cause referential integrity issues, which makes almost no sense to me. Is this also true?
I want to rid myself of the conflicting and self-contradictory posts I've read.
The world is a complex place and your questions assume there is one answer to all questions. In fact, for the questions you ask, the answer is one to many.
The insert you describe could cause a problem, or it could not cause a problem. This depends on if you added rules to your database to enforce referential integrity. Some people like to do such things -- I don't. I have a data model that says that a relationship is 1 to many and I assume people will follow it because it will break if they don't. Making the database enforce this rule just slows things down. In my opinion it is better to have everything break once a year, track down the problem, flog the new programmer with a wet noodle and move on.
As for the rest of 1., this just depends on what makes sense for the data you are working with. Who can say in the general case?
See 1. Yes, it is possible to have the database stop you from doing the insert, but I don't advise setting up your databases in this way. The exception is when Order# has an index that is defined as unique: in that case the duplicate insert would give an error.
Based on your image I would say that you have a one to many relationship between tblTAG and tblOrder.
I don't see a many-to-many relationship in this image, and adding the records will not produce any problems at all.
tblTAG.TagID 1 can exist as many times in tblORDER as you want without problems.
Also a query like
select * from tblORDER where TagID = 1
will perform very fast as long as tblORDER.TagID is indexed. MySQL's InnoDB creates that index automatically for a foreign key; on a DBMS that does not, create the index yourself.
So why exactly do you need a many-to-many relationship here?
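A small sketch of the one-to-many lookup by tag, using Python's sqlite3 module. SQLite is one of the systems that does not index a foreign key column on its own, so the index is created explicitly. The table names come from the question; the TagName column, the index name idx_order_tag and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblTAG (TagID INTEGER PRIMARY KEY, TagName TEXT);
    CREATE TABLE tblORDER (
        OrderID INTEGER PRIMARY KEY,
        TagID   INTEGER REFERENCES tblTAG (TagID)
    );
    -- SQLite does not index foreign key columns on its own, so do it explicitly.
    CREATE INDEX idx_order_tag ON tblORDER (TagID);
    INSERT INTO tblTAG VALUES (1, 'rush'), (2, 'gift');
    INSERT INTO tblORDER VALUES (66, 1), (67, 2), (68, 1);
""")

orders = conn.execute(
    "SELECT OrderID FROM tblORDER WHERE TagID = 1 ORDER BY OrderID"
).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tblORDER WHERE TagID = 1"
))
print(orders)
print(plan)
```

The plan text should mention idx_order_tag, which means the lookup is an index search and not a scan of every order.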

Database design for many-to-many relations with restrictions

I have one database with users and one with questions. What I want is to ensure that every user can answer every question only once.
I thought of a database that has all the question id's as columns and all the user id's as records, but this gets very big (and slow I guess) when the questions and the user count grow.
Is there another way to do this with better performance?
You probably want a setup like this.
Questions table (QuestionID Primary Key, QuestionText)
Users table (UserID Primary Key, Username)
Answers table (QuestionID, UserID, Date) -- plus AnswerText/Score/Etc as needed.
In the Answers table the first two columns together form a compound primary key (QuestionID, UserID), and they are foreign keys to Questions(QuestionID) and Users(UserID) respectively.
The compound primary key ensures that each combination of QuestionID/UserID is only allowed once. If you want to allow users to answer the same question multiple times you could extend the compound primary key to include the date (it would then be a composite key).
This is a normalized design and should be efficient enough. It's common to use a surrogate primary key (like AnswerID) instead of the compound key and use a unique constraint instead to ensure uniqueness - the use of a surrogate key is often motivated by ease of use, but it's by no means necessary.
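Here is a sketch of the surrogate-key variant with Python's sqlite3 module, assuming an AnswerID surrogate key and a UNIQUE constraint on (QuestionID, UserID). The answer helper and the sample data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Questions (QuestionID INTEGER PRIMARY KEY, QuestionText TEXT NOT NULL);
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);
    CREATE TABLE Answers (
        AnswerID   INTEGER PRIMARY KEY,              -- surrogate key
        QuestionID INTEGER NOT NULL REFERENCES Questions (QuestionID),
        UserID     INTEGER NOT NULL REFERENCES Users (UserID),
        Date       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        AnswerText TEXT,
        UNIQUE (QuestionID, UserID)                  -- one answer per user per question
    );
    INSERT INTO Questions VALUES (1, 'Favourite colour?');
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
""")

def answer(question_id, user_id, text):
    """Try to record an answer; return False if the database rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Answers (QuestionID, UserID, AnswerText) VALUES (?, ?, ?)",
                (question_id, user_id, text),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# alice answers, bob answers, alice tries to answer the same question again
results = [answer(1, 1, "blue"), answer(1, 2, "red"), answer(1, 1, "green")]
print(results)
```

Dropping AnswerID and declaring PRIMARY KEY (QuestionID, UserID) instead gives the compound-key version with the same uniqueness guarantee.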
Diagram
Below is a diagram of my own table design, quite similar to the correct Answer by jpw. I made up a few column names to give more of a flavor of the nature of the table. I used Postgres data types.
As the last paragraph of that Answer discusses, I would go with a simple single primary key on the response_ ("Answers") table rather than a compound primary key combining fkey_user_ & fkey_question_.
Unrealistic
This diagram fits the problem description in the Question. However this design is not practicable. This scenario is for a single set of questions to be put to the user, only a single survey or quiz ever. In real life in a situation like a school, opinion survey, or focus group, I expect we would put more than one questionnaire to a user. But I will ignore that to directly address the Question as worded.
Also in some scenarios we might have versions of a question, as it is tweaked and revised over time when given on successive quizzes/questionnaires.
Performance
Your Question correctly identifies this problem as a Many-To-Many relationship between a user and a question, where each user can answer many questions and each question may be answered by many users. In relational database design there is only one proper way to represent a many-to-many. That way is to add a third child table, sometimes called a "bridge table", with a foreign key linking to each of the two parent tables.
In a diagram where you draw parent tables vertically higher up the page than child tables, I personally see such a many-to-many diagram as a butterfly or bird pattern where the child bridge table is the body/thorax and the two parents are wings.
Performance is irrelevant in a sense, as this is the only correct design. Fortunately, modern relational databases are optimized for such situations. You should see good performance for many millions of records, especially if you use a sequential number as your primary key values. I tend to use the UUID data type instead; its arbitrary bit values may have less efficient index performance when table size reaches the millions (but I don't know the details).

Designing Tables Sql Server

Good Morning,
in the design of a database, I have a table (call it TabA) that could have relationships with four other tables. That is, this table can be connected with the first of the four, the second, the third and the fourth, but it might also have no links to any of them; or it could have one link (with any of the tables), or two links (with any two of them), and so on.
To TabA I added four fields that refer to the four tables; each can be NULL when there is no connection.
I am wondering whether this is the optimal design (the four fields in TabA), or whether there is a better design for this type of situation?
Many thanks for your reply.
dave
In answer to the question and clarification in your comment, the answer is that your design can't be improved in terms of the number of foreign key columns. Having a specific foreign key column for every potential foreign key relationship is a best practice design.
However, the schema design itself seems questionable. I don't have enough information to tell whether the "Distributori_[N]_Livello" tables are a truly hierarchical structure or not. If it is, it is often possible to use a self-referential table for hierarchical structures rather than a set of N tables, as the diagram you linked seems to use. If you are able to refactor your design in such a way, it might be possible to reduce the number of foreign key columns required.
Whether this is possible or not is not for me to say given the data provided.
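For what a self-referential table could look like, here is a minimal sketch with Python's sqlite3 module. It assumes the levels really do form one hierarchy; the distributor table, its columns and the sample rows are hypothetical and not taken from the linked diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distributor (
        distributor_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        parent_id      INTEGER REFERENCES distributor (distributor_id)  -- NULL for the top level
    );
    INSERT INTO distributor VALUES
        (1, 'National', NULL),
        (2, 'Regional North', 1),
        (3, 'Regional South', 1),
        (4, 'Local Milan', 2);
""")

# Walk the hierarchy from the root and compute each row's depth.
levels = conn.execute("""
    WITH RECURSIVE tree (distributor_id, name, level) AS (
        SELECT distributor_id, name, 1 FROM distributor WHERE parent_id IS NULL
        UNION ALL
        SELECT d.distributor_id, d.name, t.level + 1
        FROM distributor d JOIN tree t ON d.parent_id = t.distributor_id
    )
    SELECT name, level FROM tree ORDER BY level, name
""").fetchall()
print(levels)
```

In a design like this, TabA would need a single nullable foreign key to distributor instead of one column per level, at the cost of a recursive query whenever the level matters.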

Why is a primary-foreign key relation required when we can join without it?

If we can get data from two tables without a primary and foreign key relation, then why do we need this rule? Can you please explain it clearly, with a suitable example?
It's a test database, don't mind the bad structure.
Table structure:
table - 'test1'
columns - id, lname, fname, dob
no primary or foreign key, and not unique (no constraints at all)
table - 'test2'
columns - id, native_city
again, no relations and no constraints
I can still join these tables with same columns 'id',
so if there's no primary-foreign key, then what is the use of that?
The main reason for primary and foreign keys is to enforce data consistency.
A primary key enforces the consistency of uniqueness of values over one or more columns. If an ID column has a primary key then it is impossible to have two rows with the same ID value. Without that primary key, many rows could have the same ID value and you wouldn't be able to distinguish between them based on the ID value alone.
A foreign key enforces the consistency of data that points elsewhere. It ensures that the data which is pointed to actually exists. In a typical parent-child relationship, a foreign key ensures that every child always points at a parent and that the parent actually exists. Without the foreign key you could have "orphaned" children that point at a parent that doesn't exist.
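The orphan case can be shown in a few lines with Python's sqlite3 module. This is only a sketch: the parent and child tables are made up, and SQLite's foreign_keys pragma stands in for the difference between having the constraint enforced and not having it.

```python
import sqlite3

def orphan_insert_succeeds(enforce_fk):
    """Try to insert a child row whose parent does not exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.executescript("""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL REFERENCES parent (id)
        );
    """)
    try:
        conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 999)")  # no parent 999
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

without_fk = orphan_insert_succeeds(enforce_fk=False)
with_fk = orphan_insert_succeeds(enforce_fk=True)
print(without_fk, with_fk)
```

With enforcement off the orphan goes in silently; with it on the database refuses the row, and the application only has to handle the error.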
You need two columns of the same type, one on each table, to JOIN on. Whether they're primary and foreign keys or not doesn't matter.
You don't need a FK, you can join arbitrary columns.
But having a foreign key ensures that the join will actually succeed in finding something.
Foreign keys give you certain guarantees that would be extremely difficult and error-prone to implement otherwise.
For example, if you don't have a foreign key, you might insert a detail record into the system and, just after you have checked that the matching master record is present, somebody else deletes it. To prevent this you would need to lock the master table whenever you modify the detail table (and vice versa). If you don't need/want that guarantee, screw the FKs.
Depending on your RDBMS, a foreign key might also improve the performance of selects (but it also degrades the performance of updates, inserts and deletes).
I know it's late to post, but I use the site for my own reference and so I wanted to put an answer here for myself to reference in the future too. I hope you (and others) find it helpful.
Let's pretend a bunch of super Einstein experts designed our database. Our super perfect database has 3 tables, and the following relationships defined between them:
TblA 1:M TblB
TblB 1:M TblC
Notice there is no relationship between TblA and TblC
In most scenarios such a simple database is easy to navigate but in commercial databases it is usually impossible to be able to tell at the design stage all the possible uses and combination of uses for data, tables, and even whole databases, especially as systems get built upon and other systems get integrated or switched around or out. This simple fact has spawned a whole industry built on top of databases called Business Intelligence. But I digress...
In the above case, the structure is so simple to understand that it's easy to see you can join from TblA, through to B, and through to C and vice versa to get at what you need. It also very vaguely highlights some of the problems with doing it. Now expand this simple chain to 10 or 20 or 50 relationships long. All of a sudden you start to envision a need for exactly your scenario: in simple terms, a join from A to C or vice versa, or A to F, or B to Z, or whatever, as our system grows.
There are many ways this can be done. The one mentioned above, driving through all the links, is the most popular. The major problem is that it's very slow, and it gets progressively slower the more tables you add to the chain, the more those tables grow, and the further you want to go through it.
Solution 1: Look for a common link. It must be there if you thought of a reason to join A to C. If it is not obvious, create a relationship and then join on it. That is, to join A through B through C there must be some commonality, or your join would produce either zero results or a massive number of results (a Cartesian product). If you know this commonality, simply add the needed columns to A and C and link them directly.
The rule for relationships is that they simply must have a reason to exist. Nothing more. If you can find a good reason to link from A to C then do it. But you must ensure your reason is not redundant (i.e. it's already handled in some other way).
Now a word of warning. There are some pitfalls. But I don't do a good job of explaining them so I will refer you to my source instead of talking about it here. But remember, this is getting into some heavy stuff, so this video about fan and chasm traps is really only a starting point. You can join without relationships. But I advise watching this video first as this goes beyond what most people learn in college and well into the territory of the BI and SAP guys. These guys, while they can program, their day job is to specialise in exactly this kind of thing. How to get massive amounts of data to talk to each other and make sense.
This video is one of the better videos I have come across on the subject. And it's worth looking over some of his other videos. I learned a lot from him.
A primary key is not required. A foreign key is not required either. You can construct a query joining two tables on any column you wish as long as the datatypes either match or are converted to match. No relationship needs to explicitly exist.
To do this you use an outer join:
select tablea.code, tablea.name, tableb.location from tablea left outer join
tableb on tablea.code = tableb.code
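The same kind of join can be tried against the constraint-free test1 and test2 tables from the question. This is a sketch with Python's sqlite3 module; the sample rows are made up, and they deliberately include a repeated id and an unmatched id to show what the missing keys allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id INTEGER, lname TEXT, fname TEXT, dob TEXT);
    CREATE TABLE test2 (id INTEGER, native_city TEXT);
    INSERT INTO test1 VALUES (1, 'Smith', 'Ann', '1990-01-01'), (2, 'Jones', 'Bob', '1985-05-05');
    INSERT INTO test2 VALUES (1, 'Paris'), (1, 'Rome'), (3, 'Oslo');
""")

rows = conn.execute("""
    SELECT test1.id, test1.lname, test2.native_city
    FROM test1 LEFT OUTER JOIN test2 ON test1.id = test2.id
    ORDER BY test1.id, test2.native_city
""").fetchall()
for row in rows:
    print(row)
```

The join runs fine, but nothing stops id 1 from matching two cities, id 2 from matching none, or id 3 in test2 from pointing at nobody. Those are the situations primary and foreign keys exist to rule out.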