Many-to-Many-to-Many Relation in SQL?

I am quite new to SQL and I am trying to make a few relations and come across the following.
I have 3 tables
- Persons
- PhoneNumbers
- PhoneNumberCategories
Persons and PhoneNumbers are self-explanatory, I think. I use PhoneNumberCategories to indicate whether a phone number is for the office, home, mobile, fax, etc.
Now I made a 4th table to make this many-to-many-to-many relation. This table has 4 columns:
ID
PersonID
PhoneNumberID
PhoneNumberCategoryID
And I have set up 3 relations here, one for each column except the ID column. It seems to work, but it just gives me a strange feeling. Can someone tell me whether I am on the right track or going about this all wrong?
The reason I want this is as follows. Of course I need to link a person to a phone number. However, I may have a phone number that is the general office number (this would be the row 'OfficeGeneral'), and it is linked to me because I work there. I also have a direct office number (this would be the row 'OfficeDirect'), which is of course also linked to me. The general number, however, is linked to everyone in the office as 'OfficeGeneral', except for the receptionist, for whom it would be linked as 'OfficeDirect'. That is why I came up with this many-to-many-to-many relation.
I cannot find much about it on the web, which is reason enough to doubt whether this is the correct way to go about it. Anyway, this is just an example. I would like to stay flexible and be able to handle as many exceptions as possible. I am sure that once the database is in use, people will come up with situations I have not anticipated. People are good at that, I have learned over the years.
Clarification in Response to Comment Below:
Person can have more than 1 PhoneNumber.
PhoneNumber can have more than 1 Person.
Person with PhoneNumber can have more than 1 PhoneNumberCategory (i.e. Private Phone used for both Phone and Fax.)
Several people can be linked to the same PhoneNumber with different PhoneNumberCategories. (i.e. OfficeMain for me is OfficeDirect for the receptionist.)
Looking forward to hearing from you all.

Your model looks fine. Your description, perhaps, is not.
Person and PhoneNumber have a many to many relationship. This relationship, in turn, is 1-many with PhoneNumberCategory -- unless a given phone number could be in more than one category.
I would expect that Person/PhoneNumber would be declared unique in this table, so a person could only use a particular phone number once. However, that may not be how you are viewing the structure.
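A minimal sketch of the four-column link table from the question, run against SQLite through Python's sqlite3 module so it is self-contained. The table name PersonPhoneNumbers, the Name and Number columns, and the sample rows are assumptions made for illustration. The unique key here spans all three foreign keys, which matches the question's clarification that one person/number pair may carry several categories; narrowing it to (PersonID, PhoneNumberID) would give the stricter rule suggested in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Persons (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PhoneNumbers (ID INTEGER PRIMARY KEY, Number TEXT NOT NULL UNIQUE);
CREATE TABLE PhoneNumberCategories (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- Hypothetical name for the 4th table described in the question.
CREATE TABLE PersonPhoneNumbers (
    ID INTEGER PRIMARY KEY,
    PersonID INTEGER NOT NULL REFERENCES Persons(ID),
    PhoneNumberID INTEGER NOT NULL REFERENCES PhoneNumbers(ID),
    PhoneNumberCategoryID INTEGER NOT NULL REFERENCES PhoneNumberCategories(ID),
    UNIQUE (PersonID, PhoneNumberID, PhoneNumberCategoryID)
);
INSERT INTO Persons VALUES (1, 'Me'), (2, 'Receptionist');
INSERT INTO PhoneNumbers VALUES (1, '555-0100'), (2, '555-0101');
INSERT INTO PhoneNumberCategories VALUES (1, 'OfficeGeneral'), (2, 'OfficeDirect');
INSERT INTO PersonPhoneNumbers (PersonID, PhoneNumberID, PhoneNumberCategoryID)
VALUES (1, 1, 1),   -- general number, 'OfficeGeneral' for me
       (1, 2, 2),   -- my direct line
       (2, 1, 2);   -- same general number, 'OfficeDirect' for the receptionist
""")

# Who is linked to the general number, and in which role?
rows = conn.execute("""
    SELECT p.Name, c.Name
    FROM PersonPhoneNumbers l
    JOIN Persons p ON p.ID = l.PersonID
    JOIN PhoneNumberCategories c ON c.ID = l.PhoneNumberCategoryID
    WHERE l.PhoneNumberID = 1
    ORDER BY p.Name
""").fetchall()

# Inserting the exact same triple again should violate the unique key.
try:
    conn.execute(
        "INSERT INTO PersonPhoneNumbers (PersonID, PhoneNumberID, PhoneNumberCategoryID) "
        "VALUES (1, 1, 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same phone number appears once in PhoneNumbers, and the category lives on the link, so it can differ per person.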

Related

A different (less redundant) approach to Normalization

Let's say we are to normalize a database into 3rd normal form using the requirement:
I need a movie ticket registry program that can remember customers and
the tickets that they've purchased.
We might end up with a database like this:
ticket
    id
    movie_name
    price
customer
    id
    first_name
However, when I look at this, for some reason it looks redundant. What if I were to break it up into even smaller pieces, like this:
name
    id
    name
customer
    id
    fk_name_id
ticket
    id
    fk_name_id
    price
Would this be a good approach? Is there a name for this approach?
As Jordan says, the point of breaking data out into a separate table is to avoid redundant data.
As you apparently realize, we do NOT want to lay out our tables like this:
WRONG!!!
ticket
    customer_name
    movie_name
That would mean that the customer_name is repeated for every movie he watches, and the movie name is repeated for every person who watches that movie. Lots and lots of redundant names. If the user has to type them in every time, it's likely that he will sometimes misspell a name or use a variation on it, so that our table ends up with "Star Wars", "Star Wars IV", "Star Wars Episode IV", and "Stra Wars", all for the same movie. All sorts of problems.
By breaking the customer and the movie out into separate tables, we eliminate all the redundancy. Great. Celebrate.
But if we take your suggestion of making a "name" table that holds both customer names and movie names, did we eliminate any redundancy?
If a customer has the same name as a movie -- if we happen to have a customer named "Anna Karenina" or "John Carter" or whatever (or maybe someone named their kid "Batman Returns" for that matter) -- are you going to use the same record to store both? If no, then you have not saved any redundancy. You have just forced us to do an extra join every time we read the tables.
If you do use the same record, it's even worse. What if you create a record for customer "Anna Karenina" and you share the id/name record with the movie. Then Anna gets married and now her name is "Anna Smith". If you update the name record, you have not only changed the name of the customer, but also the title of the movie! This would be a very bad thing.
You could, of course, say that if you change the name, that instead of updating in place you create a new record for the new name. But then that defeats half the purpose of breaking the names out to a separate table. Suppose when we originally created the movie record we mistyped the name as "Anna Karina". Now someone points out our mistake and we fix it. But with the "make a new record every time there's a change" logic, we'd have to fix each ticket sale one by one.
I guess you could ask the user if this is a change for just the movie title, just the customer name, or both. But now we've added another level of complexity. And for what? Our program is more complex, our queries are more complex, and our user interface is more complex. In exchange, we get a tiny gain in saving disk space for the rare case where a customer coincidentally has the same name as a movie title.
Not worth it.
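The shared-record problem described above can be reproduced in a few lines. This is only a sketch using SQLite through Python's sqlite3 module; the tables follow the question's second layout, and the sample rows and price are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (id INTEGER PRIMARY KEY,
                       fk_name_id INTEGER NOT NULL REFERENCES name(id));
CREATE TABLE ticket (id INTEGER PRIMARY KEY,
                     fk_name_id INTEGER NOT NULL REFERENCES name(id),
                     price REAL NOT NULL);
-- One shared row serves both the customer and the movie title.
INSERT INTO name VALUES (1, 'Anna Karenina');
INSERT INTO customer VALUES (1, 1);
INSERT INTO ticket VALUES (1, 1, 9.5);
""")

# The customer marries, so we update "her" name row.
conn.execute("""
    UPDATE name SET name = 'Anna Smith'
    WHERE id = (SELECT fk_name_id FROM customer WHERE id = 1)
""")

# The ticket's movie title has silently changed as well.
movie_title = conn.execute("""
    SELECT n.name FROM ticket t JOIN name n ON n.id = t.fk_name_id
    WHERE t.id = 1
""").fetchone()[0]
```

After the update, movie_title holds the customer's new surname, which is exactly the side effect the answer warns about.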
Your first approach is not correct. If you think about the problem, there are three entities:
Movie
Customer
Ticket
The connection between Movie and Customer is really the Ticket table, so this is an example of an association or junction table that has additional information.
I wouldn't think of the problem as "there is an entity 'name' and customers and movies both have names". The name is an attribute of other entities, it is not its own entity (at least in this case).
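As a rough illustration of the three-entity layout, here is one way it might look, again in SQLite via Python's sqlite3 module. The foreign key column names (customer_id, movie_id) and the sample data are assumptions; the point is that Ticket links the other two tables and carries its own attribute, the price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
-- Junction table with an extra attribute (price).
CREATE TABLE ticket (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    price REAL NOT NULL
);
INSERT INTO movie VALUES (1, 'Stra Wars');          -- mistyped on purpose
INSERT INTO customer VALUES (1, 'Anna'), (2, 'John');
INSERT INTO ticket VALUES (1, 1, 1, 9.5), (2, 2, 1, 9.5), (3, 1, 1, 7.0);
""")

# The title is stored once, so one UPDATE fixes it for every ticket sold.
conn.execute("UPDATE movie SET name = 'Star Wars' WHERE id = 1")

sales = conn.execute("""
    SELECT c.first_name, m.name, t.price
    FROM ticket t
    JOIN customer c ON c.id = t.customer_id
    JOIN movie m ON m.id = t.movie_id
    ORDER BY t.id
""").fetchall()
```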
Jay's answer is excellent, and should be chosen as the correct one IMHO.
However I wanted to add: normalization does not mean "storing like data in a separate structure". That is absolutely not the intent of normalization, and this is a mistake made by a lot of inexperienced database modelers, especially when they have a programming (OOP) background.

One column with multiple foreign keys situation

Here's the deal: this is my first ever database project and I am afraid my solution to this problem isn't quite the best. The database keeps track of different "types" of cooperators. Those types are companies, organisations, employed workers, other "persons" ...
All of those have totally different sets of information, but they all have one thing in common: contact information. I decided to let the user choose what kind of contact information to add to any cooperator, whether it's e-mail, phone, URL, fax and so on ...
So I created a "Contacts" table that holds all the contact data for all the cooperators, regardless of the cooperator's type.
The TablesList is the table that contains the list of cooperator types (Companies, Organisations, Workers)
Each row in the Contacts table must contain a "TableID" number, which identifies the type of cooperator (Company/Worker/Organisation...), and a "RowID", which identifies the exact company/worker/organisation the contact belongs to.
The problem is that the Contacts table holds foreign keys from 3 other tables in 1 column, which cannot be good. I could remove the relationships and just fill the column with those IDs without the DBMS knowing about the constraints, but that doesn't look like a good solution to me, so now I doubt this idea is any good.
What do you suggest?
Keep in mind that in future there may be some more types of cooperators added if needed (like temp/contract workers, agencies) and Contacts table should be designed to support them too
Thanks in advance !
By the way, I'm using SQL CE and C#.
Here is a sketch of what's going on:
EDIT
Although it doesn't feel right, I just removed the relationships and it works just fine with my application so far.
I suspect your design is over-normalized. You can simplify it by consolidating data into three tables: Companies, Workers and Organisations. It's not going to be normalized but it is much simpler to work with.
Check out http://www.codinghorror.com/blog/2008/07/maybe-normalizing-isnt-normal.html
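To make the trade-off in the question concrete, here is a small sketch of the TableID/RowID layout, run in SQLite through Python's sqlite3 module rather than SQL CE. Only Contacts, TablesList, TableID and RowID come from the question; the other columns and the sample rows are assumptions. It shows what "removing the relationships" means in practice: RowID cannot carry a foreign key, so the database accepts a contact that points at a worker who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TablesList (TableID INTEGER PRIMARY KEY, TableName TEXT NOT NULL UNIQUE);
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Workers (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Contacts (
    ID INTEGER PRIMARY KEY,
    TableID INTEGER NOT NULL REFERENCES TablesList(TableID),
    RowID INTEGER NOT NULL,        -- no FK: the target table varies per row
    ContactType TEXT NOT NULL,     -- hypothetical column
    Value TEXT NOT NULL            -- hypothetical column
);
INSERT INTO TablesList VALUES (1, 'Companies'), (2, 'Workers');
INSERT INTO Companies VALUES (1, 'Acme');
INSERT INTO Contacts (TableID, RowID, ContactType, Value)
VALUES (1, 1, 'e-mail', 'info@acme.example'),
       (2, 99, 'phone', '555-0199');   -- worker 99 was never created
""")

# The database raised no error, so orphans have to be found by hand.
orphans = conn.execute("""
    SELECT c.ID, c.RowID
    FROM Contacts c
    LEFT JOIN Workers w ON w.ID = c.RowID
    WHERE c.TableID = 2 AND w.ID IS NULL
""").fetchall()
```

With this layout the application, not the DBMS, is responsible for keeping RowID valid.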

Designing a contacts database in SQL Server 2005

I think this is a common thing to do... you have a database server, and you want to store customer contact information in it.
You need the person's name, their address, phone, etc.
What are best practices for storing addresses and phones? Assuming OLTP...
Multiple people may have the same phone number (such as wife and husband, or mother and daughter).
Multiple people share a household.
I read this: http://sqlcat.com/sqlcat/b/whitepapers/archive/2008/09/03/best-practices-for-semantic-data-modeling-for-performance-and-scalability.aspx
And that will work fine for the specific model mentioned, but I don't see how this model can be optimized short of denormalizing.
Ex:
Person table = person id, first name, last name, etc...
Address table = address id, address line 1, etc..
Phone table = phone id, phone number, etc...
So if I designed it like that whitepaper suggests, I'd have a personid in my address table and in my phone table. However, since multiple people may share the same address, that isn't feasible. One person may have multiple addresses or even no addresses. So it seems I'll need a person -> address mapping table as well as a mapping table for the phones, otherwise I'll denormalize both of those tables and let there be some duplicates in the unusual case of two people who share the same phone / address.
Anyway, my point in asking this question is because it seems difficult to find a 'best practices' for this type of thing, yet it seems like the type of thing which would come up in just about any type of application or database.
Normalizing addresses and phone numbers in a one-to-many relationship where a Contact may have many related Phone or Address entities makes perfect sense.
However, there is no need to normalize addresses and phone numbers in a many-to-many relationship in a contacts database, because those are not entities you have any interest in working with by themselves, on their own merits as unique entities. In fact, I would say that in your situation, normalizing them to that level is not a good design.
If you were modeling a business in real estate, rentals, or phone service, where you cared about properties and phone numbers even when no person was associated with them, then it could make sense to model them to this level. It is more work for someone to avoid duplicate addresses and phone numbers in the many-to-many design than it is for them to just enter the address again, and there is no real benefit to avoiding these duplicates. Plus, you'll end up with duplicates anyway (at least for addresses, unless you scrub them all real-time using post office routines), so who is going to go through and match up '123 Ascot Wy #5' to '123 Ascot Way Apt 5'? What value is there in that?
The usual reason for normalizing this deep doesn't apply. Let's say that you do create a PhoneNumber table and the PersonPhoneNumber table needed for the many-to-many relationship. You have three people using the same phone number and they are all properly linked to it. Now, one of them calls you up and tells you that he is changing his phone number. Are you sure you want to change the actual PhoneNumber record and update the numbers of the other two folks at the same time? What if they aren't moving with him? Soon you will find that your data is screwed up. You may as well normalize first names to the FirstName table and last names to the LastName table! Then when "Joey" grows up and changes his name to "Joe", all the other Joeys will get an automatic upgrade. But whoops... "Joe" already exists, as does the phone number that you are changing one of the three people above to... what an awful mess.
For another thing, will you use PhoneID as a surrogate key for the phone number? But phone numbers are one of the few things that actually are good as natural keys, they almost even demand being used as natural keys. Then your Phone table becomes meaningless because it doesn't encode any additional information about that phone number. It would just be a list of phone numbers, which are already present in the referencing table. Don't use a Phone table like that. If you want to find out whether two people share the same phone number, you just join on or group by the column! In my mind it approaches silliness to have a layer of abstraction where a phone number is linked to a monotonically-increasing PhoneID.
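A short sketch of the point about natural keys, using SQLite through Python's sqlite3 module. The PersonPhone table and its sample rows are assumptions. The phone number is stored directly on the row that references the person, shared numbers are found with a plain GROUP BY, and changing one person's number leaves everyone else untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, FirstName TEXT NOT NULL);
-- The number itself is part of the key; there is no PhoneID.
CREATE TABLE PersonPhone (
    PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
    PhoneNumber TEXT NOT NULL,
    PRIMARY KEY (PersonID, PhoneNumber)
);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO PersonPhone VALUES (1, '555-0100'), (2, '555-0100'), (3, '555-0123');
""")

# Which numbers are shared by more than one person?
shared = conn.execute("""
    SELECT PhoneNumber, COUNT(*)
    FROM PersonPhone
    GROUP BY PhoneNumber
    HAVING COUNT(*) > 1
""").fetchall()

# Bob changes his number; Ann keeps hers because nothing is shared by reference.
conn.execute("UPDATE PersonPhone SET PhoneNumber = '555-0177' WHERE PersonID = 2")
ann_number = conn.execute(
    "SELECT PhoneNumber FROM PersonPhone WHERE PersonID = 1"
).fetchone()[0]
```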
If you read A Universal Person and Organization Model you will see the perspective that phone numbers and addresses in fact aren't entities that need modeling to the level of a many-to-many relationship--they are more like "intelligent locators" that route messages to recipients. Why on earth would you force three different people's locator (a.k.a. phone number) to be identical? The locator helps to locate the person, not the physical phone that rings. You couldn't care less about the phone or who else might answer--you only care about the fact that once answered, the person of interest could possibly be reached.
Normalize.
Normalize until it hurts.
Normalize again until it is excruciating.
Then tune your queries; design your indices; and measure your performance; if at that point you have no other options, denormalize the bare minimum to meet performance options.
Remember that every denormalization that speeds up one query, by its nature, degrades performance on (almost) every other operation on that table set. Only keep the denormalization if measurement actually shows a noticeable performance improvement.
Remember that the more you normalize the smaller your indices are; the more index rows sit in cache, and the faster your database performs. Yes, a lot of very small tables get created - they are permanently in cache and thus almost free to access.

Setup Many-to-Many tables that share a common type

I'm preparing a legacy Microsoft SQL Server database so that I can interface with it through an ORM such as Entity Framework, and my question revolves around handling the setup of some of my many-to-many associations that share a common type. Specifically, should a common type be shared among master types or should each master type have its own linked table?
For example, here is a simple example I concocted that shows how the tables of interest are currently setup:
Notice that there are two types, Teachers and Students, and both can contain zero, one, or many PhoneNumbers. The two tables, Teachers and Students, actually share an association table (PeoplePhoneNumbers). The field FKID is either a TeacherId or a StudentId.
The way I think it ought to be setup is like this:
This way, the Teachers table and the Students table each get their own PhoneNumbers table.
My gut tells me the second way is the proper way. Is this true? What if the PhoneNumbers tables contain several fields? My object-oriented programmer brain is telling me that it would be wrong to have several identical tables, each containing a dozen or so fields, if the only difference between these tables is which master table they are linked to. For example:
Here we have two tables that contain the same information, yet the only difference is that one table is addresses for Teachers and the other is for Students. This feels redundant to me, and it seems they should really be one table -- but then I lose the ability for the database to constrain them (right?) and also make it messier for myself when I try to apply an ORM to this.
Should this type of common type be merged or should it stay separated for each master type?
Update
The answers below have directed me to the following solution, which is based on subclassing tables in the database. One of my original problems was that I had a common table shared among multiple other tables because that entity type was common to both the other tables. The proper way to handle that is to subclass the shared tables and essentially descend them from a common parent AND link the common data type to this new parent. Here's an example (keep in mind my actual database has nothing to do with Teachers and Students, so this example is highly manufactured but the concepts are valid):
Since Teachers and Students both required PhoneNumbers, the solution is to create a superclass, Party, and FK PhoneNumbers to the Party table. Also note that you can still FK tables that only have to do with Teachers or only have to do with Students. In this example I also subclassed Students and PartTimeStudents one more level down and descended them from Learners.
Where this solution is very satisfactory is when I implement it in an ORM, such as Entity Framework.
The queries are easy. I can query all Teachers AND Students with a particular phone number:
var partiesWithPhoneNumber = from p in dbContext.Parties
                             where p.PhoneNumbers.Any(x => x.PhoneNumber1.Contains(phoneNumber))
                             select p;
And it's just as easy to do a similar query but only for PhoneNumbers belonging to only Teachers:
var teachersWithPhoneNumber = from t in dbContext.Teachers
                              where t.Party.PhoneNumbers.Any(x => x.PhoneNumber1.Contains(phoneNumber))
                              select t;
Teacher and Student are both subclasses of a more general concept (a Person). If you create a Person table that contains the general data that is shared for all people in your database and then create Student and Teacher tables that link to Person and contain any additional details you will find that you have an appropriate point to link in any other tables.
If there is data that is common for all people (such as zero to many phone numbers) then you can link to the Person table. When you have data that is only appropriate for a Student you link it to the Student ID. You gain the additional advantage that Student Instructors are simply a Person with both a Student and Teacher record.
Some ORMs support the concept of subclass tables directly. LLBLGen does so in the way I describe so you can make your data access code work with higher level concepts (Teacher and Student) and the Person table will be managed on your behalf in the low level data access code.
Edit
Some commentary on the current diagram (which may not be relevant in the source domain this was translated from, so a pinch of salt is advised).
Party, Teachers and Learners look good. Salaries looks good if you add start and end dates for the rate so you can track salary history. Also keep in mind it may make sense to use PartyID (instead of TeacherID) if you end up with multiple entities that have a Salary.
PartyPhoneNumbers looks like you might be able to hang the phone number off of that directly. This would depend on if you expect to change the phone number for multiple people (n:m) at once or if a phone number is owned by each Party independently. (I would expect the latter because you might have a student who is a (real world) child of a teacher and thus they share a phone number. I wouldn't want an update to the student's phone number to impact the teacher, so the join table seems odd here.)
Learners to PaymentHistories seems right, but the Students vs PartTimeStudents difference seems artificial. (It seems like PartTimeStudents is more AttendenceDays which in turn would be a result of a LearnerClasses join).
I think you should look into the supertype/subtype pattern. Add a Party or Person table that has one row for every teacher or student. Then, use the PartyID in the Teacher and Student tables as both the PK and FK back to Party (but name them TeacherID and StudentID). This establishes a "one-to-zero-or-one" relationship between the supertype table and each of the subtype tables.
Note that if you have identity columns in the subtype tables they will need to be removed. When creating those entities going forward you will first have to insert to the supertype and then use that row's ID in either subtype.
To maintain consistency you will also have to renumber one of your subtype tables so its IDs do not conflict with the other's. You can use SET IDENTITY_INSERT <table> ON (it is set per table) to create the missing supertype rows after that.
The beauty of all this is that when you have a table that must allow only one type such as Student you can FK to that table, but when you need an FK that can be either--as with your Address table--you FK to the Party table instead.
A final point is to move all the common columns into the supertype table and put only columns in the subtypes that must be different between them.
Your single Phone table now is easily linked to PartyID as well.
For a much more detailed explanation, please see this answer to a similar question.
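The supertype/subtype layout described above might be sketched as follows. This runs in SQLite through Python's sqlite3 module rather than SQL Server, so identity columns and IDENTITY_INSERT are not shown; the Name, Subject and GradeLevel columns and the sample rows are assumptions. Each subtype's primary key is also a foreign key back to Party, and PhoneNumbers references Party only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Party (PartyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Subtype keys double as FKs to the supertype (one-to-zero-or-one).
CREATE TABLE Teachers (TeacherID INTEGER PRIMARY KEY REFERENCES Party(PartyID),
                       Subject TEXT);
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY REFERENCES Party(PartyID),
                       GradeLevel INTEGER);
CREATE TABLE PhoneNumbers (
    PhoneNumberID INTEGER PRIMARY KEY,
    PartyID INTEGER NOT NULL REFERENCES Party(PartyID),
    PhoneNumber TEXT NOT NULL
);
-- Insert the supertype row first, then reuse its ID in the subtype.
INSERT INTO Party VALUES (1, 'Ms. Green'), (2, 'Tom');
INSERT INTO Teachers VALUES (1, 'Math');
INSERT INTO Students VALUES (2, 7);
INSERT INTO PhoneNumbers (PartyID, PhoneNumber)
VALUES (1, '555-0100'), (2, '555-0100'), (2, '555-0142');
""")

# Everyone (teacher or student) with a given number.
parties = [r[0] for r in conn.execute("""
    SELECT p.Name FROM Party p
    JOIN PhoneNumbers n ON n.PartyID = p.PartyID
    WHERE n.PhoneNumber = '555-0100'
    ORDER BY p.PartyID
""")]

# Teachers only: join through the subtype table.
teachers = [r[0] for r in conn.execute("""
    SELECT p.Name FROM Teachers t
    JOIN Party p ON p.PartyID = t.TeacherID
    JOIN PhoneNumbers n ON n.PartyID = p.PartyID
    WHERE n.PhoneNumber = '555-0100'
""")]

# A subtype row without a matching Party row is rejected by the FK.
try:
    conn.execute("INSERT INTO Teachers VALUES (99, 'Art')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Tables that apply to one subtype only can still reference Teachers or Students directly, while shared tables such as PhoneNumbers reference Party.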
The problem that you have is an example of a "one-of" relationship. A person is a teacher or a student (or possibly both).
I think the existing structure captures this information best.
The person has a phone number. Then, some people are teachers and some are students. The additional information about each entity is stored in either the teacher or student table. Common information, such as name, is in the phone table.
Splitting the phone numbers into two separate tables is rather confusing. After all, a phone number does not know whether it is for a student or a teacher. In addition, you don't have space for other phone numbers, such as for administrative staff. You also have a challenge for students who may sometimes teach or help teach a class.
Reading your question, it looks like you are asking for a common database schema to your situation. I've seen several in the past, some easier to work with than others.
One option is having a Student_Address table and a Teacher_Address table that both use the same Address table. This way if you have entity specific fields to store, you have that capability. But this can be slightly (although not significantly) harder to query against.
Another option is how you suggested above -- I would probably just add a primary key on the table. However you'd want to add a PersonTypeId field to that table (PersonTypeId which links to a PersonType table). This way you'd know which entity was with each record.
I would not suggest having two PhoneNumber tables. I think you'll find it much easier to maintain with all in the same table. I prefer keeping same entities together, meaning Students are a single entity, Teachers are a single entity, and PhoneNumbers are the same thing.
Good luck.

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it clearer, say my site is an auction site (it isn't, but the principle is similar). I have registered users and I have items. A user does not have to be registered to post an item, but they do need to be registered to do anything else. I have a table User, which has info on registered users, and a table Items, which has info on posted items. Now a user can bid on an item, but they can also report an item (spam, etc.); both of these are M-M relationships. All that happens when either event occurs is that an email is generated. In my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid on or reported what.
I think you should name tables after their function. Let's say we have Cars and People tables. A car has owners and a car has assigned drivers. A driver can have more than one car. You could call one of the tables CarsDrivers and the other CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that a report requires an additional dictionary (spam, illegal item, ...) and a bid requires other parameters such as price and bid date, so having two tables is justified. You will probably access bids more often than reports. Sending email will be slightly more complicated than when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make a lot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
    UserID
    EmailAddress
Table Item
    ItemID
    ItemDescription
Table UserItem_SpamReport
    UserID
    ItemID
    ReportDate
Table UserItem_Post
    UserID -- can be (NULL, -1, '', ...)
    ItemID
    PostDate
Table UserItem_Bid
    UserId
    ItemId
    BidDate
    BidAmount
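Two of the suffixed tables listed above could be created like this, as a sketch in SQLite via Python's sqlite3 module. The column types, the one-report-per-user-per-item key and the sample rows are assumptions. The table is written as "User" in quotes because USER is a reserved word in some databases, SQL Server included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (UserID INTEGER PRIMARY KEY, EmailAddress TEXT NOT NULL);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, ItemDescription TEXT NOT NULL);
-- A user may bid on the same item several times, so no unique key here.
CREATE TABLE UserItem_Bid (
    UserID INTEGER NOT NULL REFERENCES "User"(UserID),
    ItemID INTEGER NOT NULL REFERENCES Item(ItemID),
    BidDate TEXT NOT NULL,
    BidAmount REAL NOT NULL
);
-- Assumption: a user reports a given item at most once.
CREATE TABLE UserItem_SpamReport (
    UserID INTEGER NOT NULL REFERENCES "User"(UserID),
    ItemID INTEGER NOT NULL REFERENCES Item(ItemID),
    ReportDate TEXT NOT NULL,
    PRIMARY KEY (UserID, ItemID)
);
INSERT INTO "User" VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO Item VALUES (1, 'Lamp');
INSERT INTO UserItem_Bid VALUES (1, 1, '2024-01-01', 10.0),
                                (2, 1, '2024-01-02', 12.0),
                                (1, 1, '2024-01-03', 12.5);
INSERT INTO UserItem_SpamReport VALUES (2, 1, '2024-01-04');
""")

# Each relationship is queried from its own, descriptively named table.
bidders = [r[0] for r in conn.execute("""
    SELECT DISTINCT u.EmailAddress
    FROM UserItem_Bid b JOIN "User" u ON u.UserID = b.UserID
    WHERE b.ItemID = 1
    ORDER BY u.EmailAddress
""")]
reporters = [r[0] for r in conn.execute("""
    SELECT u.EmailAddress
    FROM UserItem_SpamReport s JOIN "User" u ON u.UserID = s.UserID
    WHERE s.ItemID = 1
""")]
```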
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could put each role in a separate table named after the role (e.g. Stock_Issuer, Stock_Buyer; each has a one-to-many relationship from company to stock).
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
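For comparison, the relation-type column variant might look like the sketch below, in SQLite via Python's sqlite3 module. All table and column names here (RelationType, StockCompany and so on) and the sample rows are hypothetical. A new role becomes a new row in RelationType instead of a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE RelationType (RelationTypeID INTEGER PRIMARY KEY,
                           Name TEXT NOT NULL UNIQUE);
CREATE TABLE Company (CompanyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Stock (StockID INTEGER PRIMARY KEY, Symbol TEXT NOT NULL);
-- One link table for every role; the role is part of the key.
CREATE TABLE StockCompany (
    StockID INTEGER NOT NULL REFERENCES Stock(StockID),
    CompanyID INTEGER NOT NULL REFERENCES Company(CompanyID),
    RelationTypeID INTEGER NOT NULL REFERENCES RelationType(RelationTypeID),
    PRIMARY KEY (StockID, CompanyID, RelationTypeID)
);
INSERT INTO RelationType VALUES (1, 'Issuer'), (2, 'Buyer');
INSERT INTO Company VALUES (1, 'Acme'), (2, 'Beta Fund'), (3, 'Gamma Bank');
INSERT INTO Stock VALUES (1, 'ACM');
INSERT INTO StockCompany VALUES (1, 1, 1), (1, 2, 2);
""")

# An unforeseen role later on: add data, not schema.
conn.execute("INSERT INTO RelationType VALUES (3, 'Underwriter')")
conn.execute("INSERT INTO StockCompany VALUES (1, 3, 3)")

roles = conn.execute("""
    SELECT r.Name, c.Name
    FROM StockCompany sc
    JOIN RelationType r ON r.RelationTypeID = sc.RelationTypeID
    JOIN Company c ON c.CompanyID = sc.CompanyID
    WHERE sc.StockID = 1
    ORDER BY r.Name
""").fetchall()
```

The cost is that role-specific attributes, such as a bid amount, have no natural home in this single table.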
This also depends on the quality of the developers who have to work with your model. The column approach is a bit more abstract... but if they don't get it, maybe they'd better stay away from databases altogether.
Both will work fine I guess.
Good luck, GJ