Relational Database Design (MySQL) - sql

I am starting a new Project for a website based on "Talents" - for example:
Models
Actors
Singers
Dancers
Musicians
The way I propose to do this is that each of these talents will have its own table and include a user_id field to map the record to a specific user.
Any user who signs up on the website can create a profile for one or more of these talents. A talent can have sub-talents, for example an actor can be a tv actor or a theatre actor or a voiceover actor.
So for example I have User A - he is a Model (Catwalk Model) and an Actor (TV actor, Theatre actor, Voiceover actor).
My questions are:
Do I need to create separate tables to store sub-talents of this user?
How should I perform the lookups of the top-level talents for this user? I.e. in the user table should there be fields for the ID of each talent? Or should I perform a lookup in each top-level talent table to see if that user_id exists in there?
Anything else I should be aware of?

before answering your questions... i think that user_id should not be in the Talents table... the main idea here is that "for 1 talent you have many users, and for one user you have multiple talent".. so the relation should be NxN, you'll need an intermediary table
see: many to many
now
Do I need to create seperate tables to store sub-talents of this
user?
if you want to do something dynamic (add or remove subtalents) you can use a recursive relationship. That is a table that is related to itself
TABLE TALENT
-------------
id PK
label
parent_id PK FK (a foreign key to table Talent)
see : recursive associations
How should I perform the lookups of the top-level talents for this
user? I.e. in the user table should
there be fields for the ID of each
talent? Or should I perform a lookup
in each top-level talent table to see
if that user_id exists in there?
if you're using the model before, it could be a nightmare to make queries, because your table Talents is now a TREE that can contain multiple levels.. you might want to restrict yourself to a certain number of levels that you want in your Talent's table i guess two is enough.. that way your queries will be easier
Anything else I should be aware of?
when using recursive relations... the foreign key should alow nulls because the top levels talents wont have a parent_id...
Good luck! :)
EDIT: ok.. i've created the model.. to explain it better
Edit Second model (in the shape of a Christmas tree =D ) Note that the relation between Model & Talent and Actor & Talent is a 1x1 relation, there are different ways to do that (the same link on the comments)
to find if user has talents.. join the three tables on the query =)
hope this helps

You should have one table that has everything about the user (name, dob, any other information about the user). You should have one table that has everything about talents (id, talentName, TopLevelTalentID (to store the "sub" talents put a reference to the "Parent" talent)). You should have a third table for the many to many relationship between users and talents: UserTalents which stores the UserID and the TalentID.
Here's an article that explains how to get to 3rd NF:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx

This is a good question to show some of the differences and similarities between object oriented thinking and relational modelling.
First of all there are no strict rules regarding creating the tables, it depends on the problem space you are trying to model (however, having a field for each of the tables is not necessary at all and constitutes a design fault - mainly because it is inflexible and hard to query).
For example perfectly acceptable design in this case is to have tables
Names (Name, Email, Bio)
Talents (TalentType references TalentTypes, Email references Names)
TalentTypes (TalentType, Description, Parent references TalentTypes)
The above design would allow you to have hierarchical TalentTypes and also to keep track which names have which talents, you would have a single table from which you could get all names (to avoid registering duplicates), you have a single table from which you could get a list of talents and you can add new talent types and/or subtypes easily.
If you really need to store some special fileds on each of the talent types you can still add these as tables that reference general talents table.
As an illustration
Models (Email references Talents, ModelingSalary) -- with a check constraint that talents contain a record with modelling talent type
Do notice that this is only an illustration, it might be sensible to have Salary in the Talents table and not to have tables for specific talents.
If you do end up with tables for specific talents in a sense you can look at Talents table as sort of a class from which a particular talent or sub-talent inherits properties.

ok sorry for the incorrect answer.. this is a different approach.
The way i see it, a user can have multiple occupations (Actor, Model, Musician, etc.) Usually what i do is think in objects first then translate it into tables. In P.O.O. you'd have a class User and subclasses Actor, Model, etc. each one of them could also have subclasses like TvActor, VoiceOverActor... in a DB you'd have a table for each talent and subtalent, all of them share the same primary key (the id of the user) so if the user 4 is and Actor and a Model, you would have one registry on the Actor's Table and another on the Model Table, both with id=4
As you can see, storing is easy.. the complicated part is to retrieve the info. That's because databases dont have the notion of inheritance (i think mysql has but i haven't tried it).. so if you want to now the subclases of the user 4, i see three options:
multiple SELECTs for each talent and subtalent table that you have, asking if their id is 4.
SELECT * FROM Actor WHERE id=4;SELECT * FROM TvActor WHERE id=4;
Make a big query joining all talent and subtalent table on a left join
SELECT * from User LEFT JOIN Actor ON User.id=Actor.id LEFT JOIN TvActor ON User.id=TvActor.id LEFT JOIN... WHERE User.id=4;
create a Talents table in a NxN relation with User to store a reference of each talent and subtalents that the User has, so you wont have to ask all of the tables. You'd have to make a query on the Talents table to find out what tables you'll need to ask on a second query.
Each one of these three options have their pros and cons.. maybe there's another one =)
Good Luck
PS: ahh i found another option here or maybe it's just the second option improved

Related

Get all Many:Many relationships from reference/join-table

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.
Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

Risks with Foreign Key that references the primary key of the same table

I've inherited a project that has a table called Users that stores a unique ID for each user that has permissions to use an application. That Table has the User_ID. This table also has a column called Mgr_User_ID which tracks who the individual reports to. I'd like to make the User_ID a foreign key of Column Mgr_user_ID so that I can access the managers information easily in the MVC project I'm working on that uses this table.
My question is basically, is this considered poor practice and are there any risks associated with doing something like this?
Self-referencing tables are a pretty common practice. It's how you model any hierarchical relationship.
The biggest "risk", which isn't really a risk, is that if you're trying to walk the entire hierarchy (e.g. you want to create an organizational chart) you will need to write a recursive method (which can sometimes be a brain-twister if you're not used to it).
Also, decide in advance how you're going to identify the top of the hierarchy (i.e. the manager who doesn't report to anyone). In some systems the Mgr_User_ID is set to null and in others they decide to make it the same as the User_ID. I've worked with both approaches and there isn't really an advantage of one over the other, just make sure everyone working in the data understands the rule, and if you have any other relationships like this use the same approach for all of them.
This is commonly called "self-join" of tables. It is really common for an employee table or an organization table. Whenever there is a hierarchy within in the rows of a table(worker -> manager, field office -> head quarter), self join can be used.
This is not a bad practice. In fact, breaking an employee table to a worker table and a manager table would be worse practice. As that would complicate things by managing essentially same objects in two tables.
However, querying a table using self join can be unintuitive and makes for a lengthy code. You can make things easier by using views, though. If you have an employee table that has Mgr_User_ID. Maybe you can write a query like this and make it a view. If you just want worker's name and manager's name in a single table:
CREATE VIEW dbo.DIMUser
AS
SELECT Users.UserId
, Users.UserName
, MUsers.UserId AS ManagerId
, MUsers.UserName AS ManagerName
FROM dbo.Users AS Users
LEFT JOIN dbo.Users AS MUsers
ON Users.Mgr_User_ID = MUsers.Id

Setup Many-to-Many tables that share a common type

I'm preparing a legacy Microsoft SQL Server database so that I can interface with in through an ORM such as Entity Framework, and my question revolves around handling the setup of some of my many-to-many associations that share a common type. Specifically, should a common type be shared among master types or should each master type have its own linked table?
For example, here is a simple example I concocted that shows how the tables of interest are currently setup:
Notice that of there are two types, Teachers and Students, and both can contain zero, one, or many PhoneNumbers. The two tables, Teachers and Students, actually share an association table (PeoplePhoneNumbers). The field FKID is either a TeacherId or a StudentId.
The way I think it ought to be setup is like this:
This way, both the Teachers table and the Students table get its own PhoneNumbers table.
My gut tells me the second way is the proper way. Is this true? What about even if the PhoneNumbers tables contains several fields? My object oriented programmer brain is telling me that it would be wrong to have several identical tables, each containing a dozen or so fields if the only difference between these tables is which master table they are linked to? For example:
Here we have two tables that contain the same information, yet the only difference is that one table is addresses for Teachers and the other is for Students. These feels redundant to me and that they should really be one table -- but then I lose the ability for the database to constrain them (right?) and also make it messier for myself when I try to apply an ORM to this.
Should this type of common type be merged or should it stay separated for each master type?
Update
The answers below have directed me to the following solution, which is based on subclassing tables in the database. One of my original problems was that I had a common table shared among multiple other tables because that entity type was common to both the other tables. The proper way to handle that is to subclass the shared tables and essentially descend them from a common parent AND link the common data type to this new parent. Here's an example (keep in mind my actual database has nothing to do with Teachers and Students, so this example is highly manufactured but the concepts are valid):
Since Teachers and Students both required PhoneNumbers, the solution is to create a superclass, Party, and FK PhoneNumbers to the Party table. Also note that you can still FK tables that only have to do with Teachers or only have to do with Students. In this example I also subclassed Students and PartTimeStudents one more level down and descended them from Learners.
Where this solution is very satisfactory is when I implement it in an ORM, such as Entity Framework.
The queries are easy. I can query all Teachers AND Students with a particular phone number:
var partiesWithPhoneNumber = from p in dbContext.Parties
where p.PhoneNumbers.Where(x => x.PhoneNumber1.Contains(phoneNumber)).Any()
select p;
And it's just as easy to do a similar query but only for PhoneNumbers belonging to only Teachers:
var teachersWithPhoneNumber = from t in dbContext.Teachers
where t.Party.PhoneNumbers.Where(x => x.PhoneNumber1.Contains(phoneNumber)).Any()
select t;
Teacher and Student are both subclasses of a more general concept (a Person). If you create a Person table that contains the general data that is shared for all people in your database and then create Student and Teacher tables that link to Person and contain any additional details you will find that you have an appropriate point to link in any other tables.
If there is data that is common for all people (such as zero to many phone numbers) then you can link to the Person table. When you have data that is only appropriate for a Student you link it to the Student ID. You gain the additional advantage that Student Instructors are simply a Person with both a Student and Teacher record.
Some ORMs support the concept of subclass tables directly. LLBLGen does so in the way I describe so you can make your data access code work with higher level concepts (Teacher and Student) and the Person table will be managed on your behalf in the low level data access code.
Edit
Some commentary on the current diagram (which may not be relevant in the source domain this was translated from, so a pinch of salt is advised).
Party, Teachers and Learners looks good. Salaries looks good if you add start and end dates for the rate so you can track salary history. Also keep in mind it may make sense to use PartyID (instead of TeacherID) if you end up with multiple entites that have a Salary.
PartyPhoneNumbers looks like you might be able to hang the phone number off of that directly. This would depend on if you expect to change the phone number for multiple people (n:m) at once or if a phone number is owned by each Party independently. (I would expect the latter because you might have a student who is a (real world) child of a teacher and thus they share a phone number. I wouldn't want an update to the student's phone number to impact the teacher, so the join table seems odd here.)
Learners to PaymentHistories seems right, but the Students vs PartTimeStudents difference seems artificial. (It seems like PartTimeStudents is more AttendenceDays which in turn would be a result of a LearnerClasses join).
I think you should look into the supertype/subtype pattern. Add a Party or Person table that has one row for every teacher or student. Then, use the PartyID in the Teacher and Student tables as both the PK and FK back to Party (but name them TeacherID and StudentID). This establishes a "one-to-zero-or-one" relationship between the supertype table and each of the subtype tables.
Note that if you have identity columns in the subtype tables they will need to be removed. When creating those entities going forward you will first have to insert to the supertype and then use that row's ID in either subtype.
To maintain consistency you will also have to renumber one of your subtype tables so its IDs do not conflict with the other's. You can use SET IDENTITY_INSERT ON to create the missing supertype rows after that.
The beauty of all this is that when you have a table that must allow only one type such as Student you can FK to that table, but when you need an FK that can be either--as with your Address table--you FK to the Party table instead.
A final point is to move all the common columns into the supertype table and put only columns in the subtypes that must be different between them.
Your single Phone table now is easily linked to PartyID as well.
For a much more detailed explanation, please see this answer to a similar question.
The problem that you have is an example of a "one-of" relationship. A person is a teacher or a student (or possibly both).
I think the existing structure captures this information best.
The person has a phone number. Then, some people are teachers and some are students. The additional information about each entity is stored in either the teacher or student table. Common information, such as name, is in the phone table.
Splitting the phone numbers into two separate tables is rather confusing. After all, a phone number does not know whether it is for a student or a teacher. In addition, you don't have space for other phone numbers, such as for administrative staff. You also have a challenge for students who may sometimes teach or help teach a class.
Reading your question, it looks like you are asking for a common database schema to your situation. I've seen several in the past, some easier to work with than others.
One option is having a Student_Address table and a Teacher_Address table that both use the same Address table. This way if you have entity specific fields to store, you have that capability. But this can be slightly (although not significantly) harder to query against.
Another option is how you suggested above -- I would probably just add a primary key on the table. However you'd want to add a PersonTypeId field to that table (PersonTypeId which links to a PersonType table). This way you'd know which entity was with each record.
I would not suggest having two PhoneNumber tables. I think you'll find it much easier to maintain with all in the same table. I prefer keeping same entities together, meaning Students are a single entity, Teachers are a single entity, and PhoneNumbers are the same thing.
Good luck.

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Lets say we have Cars and People tables. Car has owners and car has assigned drivers. Driver can have more than one car. One of the tables you could call CarsDrivers, second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that report requires additional dictinary (spam, illegal item,...) and bid requires other parameters like price, bid date. So having two tables is justified. You will propably be more often accessing bids than reports. Sending email will be slightly more complicated then when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make alot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserId
ItemId
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name with the role (IE Stock_Issuer, Stock_Buyer etc, both have a relationship one - many to company - stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ
GJ

Two tables or one table?

a quick question in regards to table design..
Let's say I am designing a loan application database.
As it is right now, I will have 2 tables..
Applicant (ApplicantID, FirstName , LastName, SSN, Email... )
and
Co-Applicant(CoApplicantID, FirstName, LastName , SSN, Email.., ApplicantID)
Should I consider having just one table because all the fields are the same.. ??
Person( PersonID, FirstName, LastName , SSN, Email...,ParentID (This determines if it is a co-applicant))
What are the pros and cons of these two approaches ?
I suggest the following data model:
PERSON table
PERSON_ID, pk
LOAN_APPLICATIONS table
APPLICATION_ID, pk
APPLICANT_TYPE_CODE table
APPLICANT_TYPE_CODE, pk
APPLICANT_TYPE_CODE_DESCRIPTION
LOAN_APPLICANTS table
APPLICATION_ID, pk, fk
PERSON_ID, pk, fk
APPLICANT_TYPE_CODE, fk
Person( PersonID, FirstName, LastName , SSN, Email...,ParentID (This determines if it is a co-applicant))
That works if a person will only ever exist in your system as either an applicant or a co-applicant. A person could be a co-applicant to numerous loans and/or an applicant themselves - you don't want to be re-entering their details every time.
This is the benefit of how & why things are normalized. Based on the business rules & inherent reality of usage, the tables are setup so stop redundant data being stored. This is for the following reasons:
Redundant data is a waste of space & resources to support & maintain
The action of duplicating the data means it can also be different in subtle ways - capitalizations, spaces, etc that can all lead to complications to isolate real data
Data incorrectly stored due to oversight when creating the data model
Foresight & Flexibility. Currently there isn't any option other than applicant or co-applicant for an APPLICANT_TYPE_CODE value - it could be a stored without using another table & foreign key. But this setup allows support to add different applicant codes in the future, as needed - without any harm to the data model.
There's no performance benefit when you risk bad data. What you would make, will be eaten by the hacks you have to perform to get things to work.
If the Domain Model determines that both people are applicants and that are related, then they belong in the same table with a self-referential foriegn key.
You may want to read up on database normalization.
I think you should have two tables, but not those two. Have a table "loans" which has foreign keys to an applicants table, or just have records in applicants reference the same table.
The advantages:
- Makes searching easier: If you only have a phone number or a name, you can still search, in a single table and find the corresponding person regardless of he/she being a co-applicant or a main-applicant. Otherwise you'd need to use a UNION construct. (Yet, when you know that you search for a particular type of applicant, you add a filter on the type and you only get such applicants.
- Generally easier to maintain. Say tomorrow you need to add the tweeter id of the applicant ;-), only one place to change.
- Also allows inputing persons with say an "open/undefined" type, and assign then as applicant or otherwise, at a later date.
- Allows to introduce new types of applicants (say a co-latteral warrantor... whatever)...
The disadvantages:
with really huge (multi-million person records), there could be a slight performance gain with a two table approach (depending on index and various other things
SQL queries can be bit more complicated, for example with two separate joins to the the person table, one for the applicant the other for the co-applicant . (Nothing intractable but a bit more complexity.
On the whole, the proper design is in most likelihood the one with a single table. Only possible exception is if over time the info kept for one type of applicant was starting to diverge significantly from the other type(s) of applicant. (And even then we can deal with this situation in different ways, including the introduction of a related table for these extra fields, as it may make more sense; Yes, a two table system again, but one where the extra fields may fit "naturally" together in term of their semantics, usage etc...)
Both of your variants have one disadvantage: any person can be an applicant and co-applicant twice and more. So you should use table Person( PersonID, FirstName, LastName , SSN, Email... ) and table Co-Applicants (PersonID as Applicant, PersonID as CoApplicant)
How about since each Applicant can have a Co-Applicant -- just go with one table in total. So you'd have Applicants, which has an optional foreign key field 'coapplicant' (or similar).
If the fields between applicant and co-applicant are identical, then I would suggest that you put them in the same table and use an "applicant type" field to indicate main or co- applicant. IF there's some information special about the co-applicant (such as relationship to main applicant, extra phone numbers, other stuff) you might want to normalize to a separate table and refer from there back to the co-applicant (by (co-)applicant ID) in the applicant table.
Keep Two table>
1ST User type code ID
In this table u can keep user type ie applicat And Co applicant
2nd table User--> here u can keep all the field with similar coloums with user type code as foregin key.
By this you can easily distingush between two user.
I know - I'm too late on this.... The Loan Application is your primary entity. You will have one or more applicants for the loan. Drop the idea of Person - you're creating something that you can't control. I've been there, done that and got the T-Shirt.