Db design - table dependency and naming conventions - sql

I'm about to design a db for a new project and I'm kinda stuck on some "concept" stuff.
My initial question is very similar to this one.
Relational table naming convention
That is:
"If I got a table "user" and then I got products that only the user
will have, should the table be named "user-product" or just "product"?
This is a one to many relationship."
In the above thread I found the answer by PerformanceDBA very useful and well written,
but I'm not sure about some points. Quoting part of the answer:
"It doesn't matter if user::product is 1::n. What matters is whether product is a separate entity and whether it is Independent, ie. it can exist on its own. Therefore product, not user_product. And if product exists only in the context of an user, ie. it is Dependent,
then user_product."
This is a very interesting point, but generates another question:
what exactly are the definitions of Independent and Dependent table?
Example 1, we have two tables:
The table User
Id
Username
FullName
The 1::n table Message, representing a collection of messages sent by the users
UserId (FK to User.Id)
Text
Is the Message table dependent on the User table or not?
The question I'm asking myself here is: "Would the message entity exist without the user?" I'm not sure about the answer, because it would be "the message would exist but would be anonymous." Is this enough to make the Message table dependent on the User table (so I should name the table "UserMessage")?
Example 2, we have two tables:
The table User
Id
Username
FullName
The 1::1 table Profile, representing a user profile
UserId (FK to User.Id)
First Name
Last Name
Gender
Same question: is the Profile table dependent on the User table? I think it is, because a profile without a user would not really make sense.
I'm not sure though, so how can I safely decide whether one table is dependent on another?

I think you may really have 3 entities to consider. User, product and user_product. Test relationships by describing them with a verb. The relationship between a user and a product is most likely a many-to-many (a user can order many products, and a product can be ordered by many users). This indicates that a composite table between them that takes the primary keys of both tables is needed (and maybe attributes only if they describe a fact about the user/product combination). user_product is what links a user with his products (and a product with who ordered it) and is thus dependent.
That said, in your examples the message and profile tables are dependent, since they cannot exist without a user (their primary key). Use user - user_message and user - user_profile.
Another example of an independent table would be a lookup table (code/description table).
To answer your last question, an entity is considered dependent if its primary key must exist in another entity before it can exist, i.e. you can't have a profile without a user, so it is dependent.
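To make the rule concrete, here is a minimal sketch using Python's built-in sqlite3 module (my choice, just to have something runnable). The table and column names (user, user_profile, country) are assumptions based on the examples in the question. The dependent table's primary key is also a foreign key to its parent, so a profile row is rejected until its user exists, while the lookup table stands on its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to

conn.executescript("""
CREATE TABLE user (
    id        INTEGER PRIMARY KEY,
    username  TEXT NOT NULL UNIQUE
);
-- Dependent table: its primary key is the parent's key.
CREATE TABLE user_profile (
    user_id   INTEGER PRIMARY KEY REFERENCES user(id),
    gender    TEXT
);
-- Independent lookup (code/description) table: no parent needed.
CREATE TABLE country (
    code         TEXT PRIMARY KEY,
    description  TEXT NOT NULL
);
""")


def try_insert_profile(user_id):
    """Return True if the profile row was accepted, False if the FK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_profile (user_id, gender) VALUES (?, ?)",
                (user_id, "F"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = try_insert_profile(1)  # user 1 does not exist yet
with conn:
    conn.execute("INSERT INTO user (id, username) VALUES (1, 'alice')")
owned_ok = try_insert_profile(1)   # user 1 exists now
```

Under these assumptions the first insert fails and the second succeeds, which is exactly the "cannot exist without its parent" test described above.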

Related

Database design relations in User and Profile

I'm designing a web application for a school. So far, I'm stuck with the database which has these tables:
users
id
username
password
profile
user_id (FK)
name
last_name
sex
group_id (FK)
(other basic information)
... And other tables irrelevant now, like events, committees, groups and so on.
So, the users table stores basic information about the login, and the profiles table stores all the personal data about the user.
Now, the *group_id* column in the profile table has a foreign key that references the ID column of the group in which the user is currently enrolled, in the groups table. A user can only be enrolled in one group at once, so there's no need for any additional tables.
The thing is that it doesn't make much sense to me declaring a relation like group HAS MANY profiles. Instead, the relation should be group HAS MANY users, but then, I would have to put a *group_id* column on the users table, which doesn't really fit in, since the users table only stores auth information.
On the other hand, I would like to list all the users enrolled in a group using an ORM and get a users collection, not profiles. The way I see it, the users table is like the 'parent' and the profiles table extends the users table.
The same problem would occur when setting attendances for events. Should I reference the profile as a foreign key in the events_attendance table? Or should I reference the user ID?
Of course both solutions could be implemented and work, but which of them is the best choice?
I have dug a little and found that both solutions would comply with 3NF, so in theory both would be correct, but I'm having a hard time designing my database the right way.
This is a question of your own conventions. You need to decide what the main entity is; after that you can easily find a proper solution. Both ways are good, but if you think of User as the main entity and Profile as a property of it, then you should put GroupId into User. Otherwise, if you treat User and Profile as a single entity, you can leave GroupId in Profile, and by doing so you're not saying group HAS MANY profiles but group HAS MANY users.
By setting up a proper one-to-one relation (User-Profile) you can enforce your data integrity well enough.
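As a rough sketch of the second option (group_id stays in the profile), the following uses Python's built-in sqlite3 module. The table name school_group and the sample rows are my assumptions, not part of the question. Because profile.user_id is both primary key and foreign key, the relation is one-to-one, and "users in a group" is a single join through profile:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
-- user_id is PK and FK at once, so each user has at most one profile.
CREATE TABLE profile (
    user_id   INTEGER PRIMARY KEY REFERENCES users(id),
    name      TEXT NOT NULL,
    group_id  INTEGER REFERENCES school_group(id)
);
INSERT INTO school_group VALUES (1, 'Class A'), (2, 'Class B');
INSERT INTO users VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
INSERT INTO profile VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', 2);
""")


def users_in_group(group_id):
    """Return usernames (not profiles) of everyone enrolled in the group."""
    rows = conn.execute(
        """SELECT u.username
           FROM users u
           JOIN profile p ON p.user_id = u.id
           WHERE p.group_id = ?
           ORDER BY u.username""",
        (group_id,),
    )
    return [username for (username,) in rows]


class_a = users_in_group(1)
```

With the one-to-one key in place, an ORM can usually expose the same thing as a "has many users through profile" association, though the exact syntax depends on the ORM.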

Simple database table design

I'm trying to design a database structure using best practice but I can't get my head around something which I'm sure is fundamental. The DB is for users (100+) to record which magazines (100+) they read.
I have a table for the usernames, user info and magazine titles, but I'm unsure where to list the magazines that each user follows. Do I add a column in the user table and link it to the magazine table, or would each user be set up with their own "follow" table that lists the magazines there? I'm getting myself confused I think, so any help would be great.
Regards
Ryan
What you're struggling with is called a many-to-many relationship.
To solve this problem, you need a third table, perhaps called user_magazines. This third table should have two key fields, one from the user table and the other from the magazine table: for example, a user_id column and a magazine_id column. Together they form what is called a compound key. With both of these columns, you are able to tell which magazines are read by which user.
This is best understood visually:
In the picture above you can see that the third table (the middle table, stock_category) enables us to know what stock item belongs to which categories.
First of all, you must understand a many-to-many relationship, like take your example of users and magazines. First understand the scenario : A single user can follow many magazines, and a single magazine can be followed by many users, so there exists a many-to-many relationship between users and magazines.
Whenever there exists many-to-many relationship between two entities, we have to introduce a third entity between them which is called an associative entity!
So you have to introduce a third entity, named as you choose, which will contain information about which user is following which magazine.
You can go to http://sqlrelationship.com/many-to-many-relationship/ for a better understanding using diagrams.
You should have a users table, with an auto-incrementing primary key, username, and anything else you want to store about that user.
Next, a magazines table which contains another auto-incrementing primary key, the name of the mag and anything else you need to store about that magazine.
Finally, a subscriptions table. This should have an auto-incrementing primary key (actually that's not really necessary on this table but personally I would add it), a user_ID column and a magazine_ID column.
To add a subscription, just add a new record to the subscription table containing the ID of the user and the ID of the relevant magazine. This allows for users to subscribe to multiple magazines.
If you want to get fancy you can add referential integrity constraints to the subscriptions table - this tells the database management system that a particular column is a reference to another table, and can specify what to do upon modifying it (for example you could have the DBMS automatically delete subscriptions owned by a particular user if that user is deleted)
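Here is a small sketch of that layout, run through Python's built-in sqlite3 module so it can be tried as-is. The sample rows are made up, and I've used a composite primary key on subscriptions instead of the optional auto-incrementing one, which is my assumption rather than a requirement. The ON DELETE CASCADE clauses are the "referential integrity constraints" mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the constraints
conn.executescript("""
CREATE TABLE users     (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE magazines (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subscriptions (
    user_id      INTEGER NOT NULL REFERENCES users(id)     ON DELETE CASCADE,
    magazine_id  INTEGER NOT NULL REFERENCES magazines(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, magazine_id)  -- one row per user/magazine pair
);
INSERT INTO users VALUES (1, 'ryan'), (2, 'jdoe');
INSERT INTO magazines VALUES (10, 'Alpha Monthly'), (20, 'Beta Weekly');
INSERT INTO subscriptions VALUES (1, 10), (1, 20), (2, 10);
""")


def magazines_for(username):
    """Return the names of all magazines the given user subscribes to."""
    rows = conn.execute(
        """SELECT m.name
           FROM magazines m
           JOIN subscriptions s ON s.magazine_id = m.id
           JOIN users u         ON u.id = s.user_id
           WHERE u.username = ?
           ORDER BY m.name""",
        (username,),
    )
    return [name for (name,) in rows]


before = magazines_for("ryan")

# Deleting the user removes that user's subscriptions automatically.
with conn:
    conn.execute("DELETE FROM users WHERE username = 'ryan'")
remaining = conn.execute("SELECT COUNT(*) FROM subscriptions").fetchone()[0]
```

After the delete only jdoe's single subscription is left, without any explicit cleanup of the subscriptions table.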
You definitely do NOT want to add a column to the user table and have it refer to the magazine table. Users would only be able to follow or subscribe to one magazine which doesn't reflect the real world.
You'll want to have a join table that has a userId and a magazineId. For each magazine that a user subscribes to there will be one entry in the join table.
I'm inferring a little bit about your table structure but if you had:
User (id, login)
Magazine (id, name)
User_Magazine (userId, magazineId)
Perhaps this last table should be called subscription because there may be other information like the subscription end date which you'd want to track and that is really what it is representing in the real world.
You'd be able to put an entry into the User_Magazine table for every subscription.
And if you wanted to see all the magazines a user with the login jdoe had you'd do:
SELECT m.name
FROM User u
JOIN User_Magazine um ON um.userId = u.id
JOIN Magazine m ON m.id = um.magazineId
WHERE u.login = 'jdoe';
You should create a separate table called UserMagazineSubs. Make UserID + MagazineTitle ID the composite key.
This table will capture all User and Magazine relationship details.
A User_To_Magazine table, that has two columns - UserId and MagazineId, and the key is composite containing both columns

Link table(s) or redundant columns, SQL optimisation

I have two tables, Users and People, which share a common attribute, email address, and each of them should be allowed to have many email addresses.
I can see three options myself:
One link table with redundant columns:
Users [id,email_id] and People [id,email_id]
EmailAddress [id,user_id,person_id,email_id]
Emails [id,address,type]
Two link tables without redundancies:
Users [id,email_id] and People [id,email_id]
PersonEmail [id,person_id,email_id]
UserEmail [id,user_id,email_id]
Emails [id,address,type]
No link tables with redundant columns:
Users [id] and People [id]
Emails [id,address,type,user_id,person_id]
Does anyone have any idea what would be the best option, or if there are any other ways? Also, if anyone knows how to implement link tables without the generated id column, or feels that is better, please also specify.
Update: a User has many People, a person belongs to a User
First off, the relationship between user and e-mail is 1:N, not M:N, so in any case you don't need the "link" table EmailAddress.
You need to decide which of these possibilities is true for your application:
User is always person.
Person is always user.
There can be a person that is not user and there can be a user that is not person.
Option 1:
Assuming the option (1) is the correct one, the logical model should look like this:
The symbol between Person and User is "category", which at the level of the physical database can be implemented either:
as a "1 to 0 or 1" relationship between separate tables Person and User,
or a single table containing both person and user fields, where user fields are NULL for persons that are not also users.
If you have...
many user-specific fields,
there are user-specific foreign keys,
new kinds of persons could be added in the future
and you don't need to squeeze-out every last drop of performance,
...choose the implementation strategy with two tables.
If there are:
relatively few user-specific fields,
there are no user-specific relationships,
low "evolvability" is acceptable
and performance is of high importance,
...choose the implementation strategy with the single table.
Similar analysis can be done for each of the remaining possibilities...
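To illustrate the two-table strategy for possibility (1), here is a minimal sketch run through Python's built-in sqlite3 module. The table and column names, the sample rows, and the choice to hang e-mails off person are all my assumptions. The "1 to 0 or 1" relationship comes from user.person_id being primary key and foreign key at once, and the 1:N e-mails need no link table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- "1 to 0 or 1": at most one user row per person, because the FK is also the PK.
CREATE TABLE user (
    person_id  INTEGER PRIMARY KEY REFERENCES person(id),
    login      TEXT NOT NULL UNIQUE
);
-- 1:N, so each e-mail row simply carries its owner's key.
CREATE TABLE email (
    id         INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address    TEXT NOT NULL,
    type       TEXT
);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO user VALUES (1, 'ann');  -- Ann is also a user, Bob is only a person
INSERT INTO email (person_id, address, type) VALUES
    (1, 'ann@example.com', 'work'),
    (1, 'ann@example.org', 'home'),
    (2, 'bob@example.com', 'home');
""")

# One row per person: name, login (NULL if not a user), number of e-mails.
rows = conn.execute("""
    SELECT p.name, u.login, COUNT(e.id)
    FROM person p
    LEFT JOIN user u  ON u.person_id = p.id
    LEFT JOIN email e ON e.person_id = p.id
    GROUP BY p.id, p.name, u.login
    ORDER BY p.id
""").fetchall()
```

The single-table strategy would instead fold login into person as a nullable column, trading flexibility for one fewer join.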
Option 2:
Option 3:
If the two entities are conceptually related, then it might make sense to have one table. But if they are two different concepts, then in my experience it is best to have separate tables in order to avoid future confusion. And you're not going to take a big hit anywhere by doing so.
Isn't the User a Person (People)?
That would solve the redundant field issue right away.
----------
| Person |
----------
     |
 --------
 | User |
 --------
The User should have the single e-mail field, or maintain the relation with the e-mails table, since Person is an abstract concept not related to any application.
I would say start thinking about (re)modeling your schema, so you won't have problems like this.
Read the Multiple Table Inheritance in Rails guide, that should get you started.

Relational Database Design (MySQL)

I am starting a new Project for a website based on "Talents" - for example:
Models
Actors
Singers
Dancers
Musicians
The way I propose to do this is that each of these talents will have its own table and include a user_id field to map the record to a specific user.
Any user who signs up on the website can create a profile for one or more of these talents. A talent can have sub-talents, for example an actor can be a tv actor or a theatre actor or a voiceover actor.
So for example I have User A - he is a Model (Catwalk Model) and an Actor (TV actor, Theatre actor, Voiceover actor).
My questions are:
Do I need to create separate tables to store sub-talents of this user?
How should I perform the lookups of the top-level talents for this user? I.e. in the user table should there be fields for the ID of each talent? Or should I perform a lookup in each top-level talent table to see if that user_id exists in there?
Anything else I should be aware of?
before answering your questions... i think that user_id should not be in the Talents table... the main idea here is that "for one talent you have many users, and for one user you have multiple talents".. so the relation should be N:N (many-to-many), and you'll need an intermediary table
see: many to many
now
Do I need to create separate tables to store sub-talents of this user?
if you want to do something dynamic (add or remove subtalents) you can use a recursive relationship. That is a table that is related to itself
TABLE TALENT
-------------
id PK
label
parent_id FK (a nullable foreign key to table Talent)
see : recursive associations
How should I perform the lookups of the top-level talents for this user? I.e. in the user table should there be fields for the ID of each talent? Or should I perform a lookup in each top-level talent table to see if that user_id exists in there?
if you're using the model above, it could be a nightmare to write queries, because your Talent table is now a TREE that can contain multiple levels.. you might want to restrict yourself to a fixed number of levels in your Talent table, i guess two is enough.. that way your queries will be easier
Anything else I should be aware of?
when using recursive relations... the foreign key should allow nulls because the top-level talents won't have a parent_id...
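A rough sketch of the recursive table plus the intermediary table, run through Python's built-in sqlite3 module. The names user, user_talent and the sample rows are my assumptions, and the query assumes the two-level limit suggested above (a talent is either top-level or a direct child of a top-level talent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE talent (
    id         INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,
    parent_id  INTEGER REFERENCES talent(id)  -- NULL for top-level talents
);
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Intermediary table for the many-to-many relation between users and talents.
CREATE TABLE user_talent (
    user_id    INTEGER NOT NULL REFERENCES user(id),
    talent_id  INTEGER NOT NULL REFERENCES talent(id),
    PRIMARY KEY (user_id, talent_id)
);
INSERT INTO talent VALUES
    (1, 'Model', NULL), (2, 'Actor', NULL),
    (3, 'Catwalk Model', 1), (4, 'TV Actor', 2), (5, 'Theatre Actor', 2);
INSERT INTO user VALUES (1, 'User A');
INSERT INTO user_talent VALUES (1, 3), (1, 4), (1, 5);
""")


def top_level_talents(user_id):
    """Top-level talents of a user, assuming at most two levels in the tree."""
    rows = conn.execute(
        """SELECT DISTINCT COALESCE(parent.label, t.label) AS top_label
           FROM user_talent ut
           JOIN talent t           ON t.id = ut.talent_id
           LEFT JOIN talent parent ON parent.id = t.parent_id
           WHERE ut.user_id = ?
           ORDER BY top_label""",
        (user_id,),
    )
    return [label for (label,) in rows]


talents_of_user_a = top_level_talents(1)
```

If the tree were allowed to grow deeper than two levels, the single self-join would no longer be enough and something like a recursive query would be needed.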
Good luck! :)
EDIT: ok.. i've created the model.. to explain it better
Edit Second model (in the shape of a Christmas tree =D ) Note that the relation between Model & Talent and Actor & Talent is a 1x1 relation, there are different ways to do that (the same link on the comments)
to find if user has talents.. join the three tables on the query =)
hope this helps
You should have one table that has everything about the user (name, dob, any other information about the user). You should have one table that has everything about talents (id, talentName, TopLevelTalentID (to store the "sub" talents put a reference to the "Parent" talent)). You should have a third table for the many to many relationship between users and talents: UserTalents which stores the UserID and the TalentID.
Here's an article that explains how to get to 3rd NF:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
This is a good question to show some of the differences and similarities between object oriented thinking and relational modelling.
First of all there are no strict rules regarding creating the tables, it depends on the problem space you are trying to model (however, having a field for each of the tables is not necessary at all and constitutes a design fault - mainly because it is inflexible and hard to query).
For example, a perfectly acceptable design in this case is to have the tables
Names (Name, Email, Bio)
Talents (TalentType references TalentTypes, Email references Names)
TalentTypes (TalentType, Description, Parent references TalentTypes)
The above design would allow you to have hierarchical TalentTypes and also to keep track which names have which talents, you would have a single table from which you could get all names (to avoid registering duplicates), you have a single table from which you could get a list of talents and you can add new talent types and/or subtypes easily.
If you really need to store some special fields on each of the talent types you can still add these as tables that reference the general talents table.
As an illustration
Models (Email references Talents, ModelingSalary) -- with a check constraint that talents contain a record with modelling talent type
Do notice that this is only an illustration, it might be sensible to have Salary in the Talents table and not to have tables for specific talents.
If you do end up with tables for specific talents in a sense you can look at Talents table as sort of a class from which a particular talent or sub-talent inherits properties.
ok sorry for the incorrect answer.. this is a different approach.
The way i see it, a user can have multiple occupations (Actor, Model, Musician, etc.) Usually what i do is think in objects first, then translate it into tables. In OOP you'd have a class User and subclasses Actor, Model, etc., and each one of them could also have subclasses like TvActor, VoiceOverActor... in a DB you'd have a table for each talent and subtalent, all of them sharing the same primary key (the id of the user), so if user 4 is an Actor and a Model, you would have one record in the Actor table and another in the Model table, both with id=4
As you can see, storing is easy.. the complicated part is retrieving the info. That's because most databases don't have the notion of inheritance (i think PostgreSQL has table inheritance but i haven't tried it).. so if you want to know the subclasses of user 4, i see three options:
multiple SELECTs for each talent and subtalent table that you have, asking if their id is 4.
SELECT * FROM Actor WHERE id=4;SELECT * FROM TvActor WHERE id=4;
Make a big query joining all talent and subtalent table on a left join
SELECT * from User LEFT JOIN Actor ON User.id=Actor.id LEFT JOIN TvActor ON User.id=TvActor.id LEFT JOIN... WHERE User.id=4;
create a Talents table in a NxN relation with User to store a reference of each talent and subtalents that the User has, so you wont have to ask all of the tables. You'd have to make a query on the Talents table to find out what tables you'll need to ask on a second query.
Each one of these three options has its pros and cons.. maybe there's another one =)
Good Luck
PS: ahh i found another option here or maybe it's just the second option improved

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Let's say we have Cars and People tables. A car has owners and a car has assigned drivers. A driver can have more than one car. You could call one of the tables CarsDrivers and the second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that a report requires an additional dictionary (spam, illegal item,...) and a bid requires other parameters like price and bid date. So having two tables is justified. You will probably be accessing bids more often than reports. Sending email will be slightly more complicated than when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make a lot of sense to an outside observer.
There's no official or unofficial convention on relation or table names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserID
ItemID
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name after the role (i.e. Stock_Issuer, Stock_Buyer etc., each a one-to-many relationship between company and stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
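For the relation-type column variant, a minimal sketch applied to the users/items case from the question might look like this (run through Python's built-in sqlite3 module; the table names, the relation_type lookup table and the sample rows are my assumptions). New relationship types then become a new row in the lookup table rather than a new table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE item (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Lookup table of allowed roles; adding a role is an INSERT, not a schema change.
CREATE TABLE relation_type (code TEXT PRIMARY KEY);
CREATE TABLE user_item_relation (
    user_id        INTEGER NOT NULL REFERENCES user(id),
    item_id        INTEGER NOT NULL REFERENCES item(id),
    relation_type  TEXT    NOT NULL REFERENCES relation_type(code),
    -- The role is part of the key, so a user can both bid on and report one item.
    PRIMARY KEY (user_id, item_id, relation_type)
);
INSERT INTO relation_type VALUES ('bid'), ('report');
INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO item VALUES (100, 'Lamp');
INSERT INTO user_item_relation VALUES
    (1, 100, 'bid'), (2, 100, 'bid'), (2, 100, 'report');
""")


def users_with(relation_type, item_id):
    """Return ids of users who have the given relation to the item."""
    rows = conn.execute(
        """SELECT user_id
           FROM user_item_relation
           WHERE relation_type = ? AND item_id = ?
           ORDER BY user_id""",
        (relation_type, item_id),
    )
    return [user_id for (user_id,) in rows]


bidders = users_with("bid", 100)
reporters = users_with("report", 100)
```

The trade-off is that role-specific attributes (a bid amount, a report reason) have no natural home in this shared table, which is where separate tables per role tend to win.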
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ