How to model the association of CDs with Genres? - sql

I'm creating a database design for a web site that sells music CDs, and I've hit a brick wall because I can't get my head around whether the relationship between a CD and its genres is one-to-many or many-to-many.
Each CD can have multiple genres. For example "Ministry of Sound Dubstep Anthems" could have genres such as Dubstep, Dance and Electro.
The more I think about it though, each genre can also be linked to a number of CDs.
The way I have my Database at the moment is:
cd_table
PK ID
FK Genre
Description
...
genre_table
PK ID
Genre
If it was a many-to-many am I right in thinking I will need a join table such as:
cd_genre
CD_ID
Genre_ID
And then have them both acting as a Primary Key? How would I link the cd_genre to the cd_table? Or do I just remove the FK in cd_table, and then do a join when querying the CDs?

You're correct - the relationship between CD and Genre is many-to-many, for exactly the reasons you provide.
And yes, a cross-reference/join table like you describe is what you need.
Interestingly, depending on other constraints, you might want to be able to put genre in other places, too. For instance, on the song: an artist like Weird Al may put several different genres of songs on one of his albums, and the album itself doesn't necessarily have any single 'main' genre (but often has some 'theme').

I think you answered your own question. CDs and Genres are many-to-many, and you accomplish it with that two-column table with the two foreign keys, usually referred to as a "join table." It's generally not necessary to think of that table as having a primary key of its own, but logically the two columns together are a composite primary key.
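A minimal sketch of that join table, run through Python's built-in sqlite3 so it can be tried as-is. The table names come from the question; the lowercase column names and the sample rows are assumptions for illustration, and the FK Genre column is dropped from cd_table as the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE cd_table    (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE genre_table (id INTEGER PRIMARY KEY, genre TEXT NOT NULL UNIQUE);

-- Join table: one row per (CD, genre) pair; the pair itself is the key.
CREATE TABLE cd_genre (
    cd_id    INTEGER NOT NULL REFERENCES cd_table(id),
    genre_id INTEGER NOT NULL REFERENCES genre_table(id),
    PRIMARY KEY (cd_id, genre_id)
);

INSERT INTO cd_table    VALUES (1, 'Ministry of Sound Dubstep Anthems');
INSERT INTO genre_table VALUES (1, 'Dubstep'), (2, 'Dance'), (3, 'Electro');
INSERT INTO cd_genre    VALUES (1, 1), (1, 2), (1, 3);
""")

# All genres of one CD, resolved through the join table.
genres = [row[0] for row in conn.execute("""
    SELECT g.genre
    FROM cd_table c
    JOIN cd_genre cg   ON cg.cd_id = c.id
    JOIN genre_table g ON g.id = cg.genre_id
    WHERE c.id = 1
    ORDER BY g.genre
""")]
print(genres)
```

The composite primary key also stops the same genre from being attached to the same CD twice.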

To determine multiplicity, ask the same question starting with the different relations:
Can a CD belong to many Genres?
Can a Genre contain more than one CD?
Of course, this may be the wrong database model. Instead, Song<->Genre and Song<->CD is the higher Normal Form. It is higher because CD<->Genre can be composed from CD<->Song<->Genre, but CD<->Genre cannot answer the same queries about Songs on the CD (unless a CD can be at most one Genre).
On the other hand, one might only care about CDs and not Songs, in which case modeling Songs is irrelevant or even harmful excess data. Likewise, perfect classification might overcomplicate practical classification: be aware of the tradeoffs between granularity of information and its usefulness to the problem domain.
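To make the "can be composed" point concrete, here is a small sketch (Python's sqlite3, with made-up table names cd_song and song_genre and made-up rows; natural text keys are used only to keep it short). CD<->Genre is derived with a join rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical song-level model: genres hang off songs, not CDs.
CREATE TABLE cd_song    (cd TEXT NOT NULL, song TEXT NOT NULL, PRIMARY KEY (cd, song));
CREATE TABLE song_genre (song TEXT NOT NULL, genre TEXT NOT NULL, PRIMARY KEY (song, genre));

INSERT INTO cd_song    VALUES ('Album A', 'Track 1'), ('Album A', 'Track 2');
INSERT INTO song_genre VALUES ('Track 1', 'Polka'), ('Track 2', 'Rock'), ('Track 2', 'Parody');
""")

# CD<->Genre composed from CD<->Song<->Genre.
cd_genres = conn.execute("""
    SELECT DISTINCT cs.cd, sg.genre
    FROM cd_song cs
    JOIN song_genre sg ON sg.song = cs.song
    ORDER BY cs.cd, sg.genre
""").fetchall()
print(cd_genres)
```

Going the other way is not possible: a stored CD<->Genre table alone cannot say which track is the polka.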

Related

How to resolve multiple values in one column in a table?

I have been tasked with designing a database from a scenario. However, while designing my solution I found I would have multiple values in one cell. We were told this is a repeating group and should be avoided in a database.
I get the repeating groups when I want to link the songs on an album to the album they are found on. For instance, there can be one or many songs on an album. However, a song could be on one or many albums (Dean Martin - Silver Bells could be on a Christmas Hits album and a Dean Martin album).
If I reference each song to its album I would use the AlbumId as a foreign key. However, if it was in multiple albums then I would have multiple AlbumId's as the foreign key. This gives me a repeating group as there will be multiple Ids in the same cell.
If this was reversed by storing all the album's songs on the Album entity I would have the same issue as the SongId would be a foreign key and each album will have multiple SongId's in the same cell.
The design I have includes these following entities:
song
and
album
The song entity will contain the following attribute types:
SongId (PK)
SongDuration
AlbumId (FK)
AudioFileSize
AudioFile
SongTitle
SongLyrics
SongNotes
The album entity will contain the following attribute types:
AlbumId (PK)
AlbumTitle
NumOfTracks
ReleaseDate
ProductionLabel (FK) //Goes to another table that has no issues.
AlbumCoverImage
CoverImageStory
AmountOfCDs
I am quite new to database design and I feel I have grasped it well. However, I am puzzled on how to solve this.
If any more information on the database is required I will happily provide it.
Any help is greatly appreciated.
Best regards,
Steve.
You have a many-to-many relationship. So, you can use a junction/association table:
create table songAlbums (
. . .,
songId int references songs(songId),
albumId int references albums(albumId),
. . .
)
You might want to include other information, such as the position on the album. Such a table could have a composite primary key (songId, albumId) or a synthetic primary key (generated always as identity).
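A runnable sketch of that junction table with the composite-key variant, using Python's sqlite3. The column trackPosition, the UNIQUE constraint on it, and the sample ids and positions are assumptions added for illustration; the song and album come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE songs  (songId  INTEGER PRIMARY KEY, songTitle  TEXT NOT NULL);
CREATE TABLE albums (albumId INTEGER PRIMARY KEY, albumTitle TEXT NOT NULL);

CREATE TABLE songAlbums (
    songId        INTEGER NOT NULL REFERENCES songs(songId),
    albumId       INTEGER NOT NULL REFERENCES albums(albumId),
    trackPosition INTEGER NOT NULL,        -- hypothetical extra attribute
    PRIMARY KEY (songId, albumId),         -- a song appears once per album
    UNIQUE (albumId, trackPosition)        -- one song per position on an album
);

INSERT INTO songs  VALUES (1, 'Silver Bells');
INSERT INTO albums VALUES (10, 'Christmas Hits'), (20, 'Dean Martin');
INSERT INTO songAlbums VALUES (1, 10, 4), (1, 20, 7);
""")

# Every album the song is on: one row per album, no multi-valued cell needed.
albums_for_song = conn.execute("""
    SELECT a.albumTitle, sa.trackPosition
    FROM songAlbums sa
    JOIN albums a ON a.albumId = sa.albumId
    WHERE sa.songId = 1
    ORDER BY a.albumTitle
""").fetchall()
print(albums_for_song)
```

With this in place the AlbumId foreign key on the song entity is no longer needed.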
We were told this is a repeating group and should be avoided in a database.
Not "avoided", but modelled, solved. Each column must be Atomic, which is what First Normal Form (1NF) requires:
no multiple or compound values
no repeating groups
The simple solution is to model the multiple values in a subordinate table. In this case, with two Identifiers, Song and Album, that is an Associative table.
RecordID
That is the first and foremost error. It cripples both the modelling exercise, and the resulting "database".
The Relational Model requires:
the Key must be "made up from the data"
(ie. not a manufactured ID; GUID; UUID; etc, none of which are data)
each row (as distinct from a record with a RecordId) in each table must be unique
Data uniqueness cannot be obtained from an ID; GUID; UUID; etc. Further, the stupid thing is always an additional column and index.
That needs to be corrected.
Third, you have some columns in the wrong tables.
Album Data Model
Modelling is substantially cheaper than trying SQL. See if this satisfies the requirement.
Rather than going back-and-forth bringing you up to speed with Relational Databases, I have fixed up all issues. Eg. you have multiple CDs per Album, but that was not handled or requested, it must be resolved.
It is rendered in IDEF1X, the Standard for modelling a Relational Database. You may find the short Introduction to IDEF1X helpful.

Why no many-to-many relationships?

I am learning about databases and SQL for the first time. In the text I'm reading (Oracle 11g: SQL by Joan Casteel), it says that "many-to-many relationships can't exist in a relational database." I understand that we are to avoid them, and I understand how to create a bridging entity to eliminate them, but I am trying to fully understand the statement "can't exist."
Is it actually physically impossible to have a many-to-many relationship represented?
Or is it just very inefficient since it leads to a lot of data duplication?
It seems to me to be the latter case, and the bridging entity minimizes the duplicated data. But maybe I'm missing something? I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship, either in the text or anywhere else I've searched. I've been searching all day and only finding the same information repeated: "don't do it, and use a bridging entity instead." But I like to ask why. :-)
Thanks!
Think about a simple relationship like the one between Authors and Books. An author can write many books. A book could have many authors. Now, without a bridge table to resolve the many-to-many relationship, what would the alternative be? You'd have to add multiple Author_ID columns to the Books table, one for each author. But how many do you add? 2? 3? 10? However many you choose, you'll probably end up with a lot of sparse rows where many of the Author_ID values are NULL and there's a good chance that you'll run across a case where you need "just one more." So then you're either constantly modifying the schema to try to accommodate or you're imposing some artificial restriction ("no book can have more than 3 authors") to force things to fit.
A true many-to-many relationship involving only two tables is impossible to create in a relational database. I believe that is what they refer to when they say that it can't exist. In order to implement a many-to-many you need an intermediary table with basically 3 fields: an ID, an id attached to the first table and an id attached to the second table.
The reason for not wanting many-to-many relationships is, like you said, that they are incredibly inefficient, and managing all the records tied to each side of the relationship can be tough. For instance, if you delete a record on one side, what happens to the records in the relational table and the table on the other side? Cascading deletes are a slippery slope, at least in my opinion.
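As one possible answer to the "what happens on delete" question, here is a sketch in Python's sqlite3 where the cascade is declared only on the bridge table. The table and column names follow the Author/Books example used elsewhere in this thread; the rows are made up. Note that SQLite needs PRAGMA foreign_keys = ON for this to work; other engines enforce it by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Author (AuthorId INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Books  (BookId   INTEGER PRIMARY KEY, Title TEXT NOT NULL);

-- Only the link rows cascade; Author and Books never delete each other.
CREATE TABLE AuthorBooks (
    AuthorId INTEGER NOT NULL REFERENCES Author(AuthorId) ON DELETE CASCADE,
    BookId   INTEGER NOT NULL REFERENCES Books(BookId)    ON DELETE CASCADE,
    PRIMARY KEY (AuthorId, BookId)
);

INSERT INTO Author VALUES (1, 'Author One'), (2, 'Author Two');
INSERT INTO Books  VALUES (1, 'Co-written Book');
INSERT INTO AuthorBooks VALUES (1, 1), (2, 1);
""")

# Deleting one author removes that author's link rows and nothing else.
conn.execute("DELETE FROM Author WHERE AuthorId = 1")
links_left = conn.execute("SELECT AuthorId, BookId FROM AuthorBooks").fetchall()
books_left = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(links_left, books_left)
```

The book survives with its remaining author; whether a book with no authors left should also go is a business rule this sketch does not decide.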
Normally (pun intended) you would use a link table to establish many-to-many
Like described by Joe Stefanelli, let's say you had Authors and Books
SELECT * from Author
SELECT * from Books
you would create a JOIN table called AuthorBooks
Then,
SELECT *
FROM Author a
JOIN AuthorBooks ab
on a.AuthorId = ab.AuthorId
JOIN Books b
on ab.BookId = b.BookId
hope that helps.
it says that "many-to-many relationships can't exist in a relational database."
I suspect the author is just being controversial. Technically, in the SQL language, there is no means to explicitly declare a M-M relationship. It is an emergent result of declaring multiple 1-M relations to the table. However, it is a common approach to achieve the result of a M-M relationship and it is absolutely used frequently in databases designed on relational database management systems.
I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship,
A more accurate way of saying this is that they should be used where they are appropriate. There are times, such as the books and authors example given by Joe Stefanelli, where any other solution would be inefficient and introduce other data integrity problems. However, M-M relationships are more complicated to use. They add more work on the part of the GUI designer. Thus, they should only be used where it makes sense to use them. If you are highly confident that one entity should never be associated with more than one of some other entity, then by all means restrict it to a 1-M. For example, if you were tracking the status of a shipment, each shipment can have only a single status at any given time. It would overcomplicate the design and not make logical sense to allow a shipment to have multiple statuses.
Of course they can (and do) exist. That sounds to me like a soapbox statement. They are required for a great many business applications.
Done properly, they are not inefficient and do not have duplicate data either.
Take a look at Facebook. How many many-to-many relationships exist between friends and friends of friends? That is a well-defined business need.
The statement that "many-to-many relationships can't exist in a relational database." is patently false.
Many-to-many relationships are in fact very useful, and also common. For example, consider a contact management system which allows you to put people in groups. One person can be in many groups, and each group can have many members.
Representation of these relations requires an extra table--perhaps that's what your book is really saying? In the example I just gave, you'd have a Person table (id, name, address etc) and a Group table (id, group name, etc). Neither contains information about who's in which group; to do that you have a third table (call it PersonGroup) in which each record contains a Person ID and a Group ID--that record represents the relation between the person and the group.
Need to find the members of a group? Your query might look like this (for the group with ID=1):
-- The Group table is not needed here: PersonGroup already carries the group ID.
SELECT Person.firstName, Person.lastName
FROM Person
JOIN PersonGroup ON PersonGroup.PersonID = Person.ID
WHERE PersonGroup.GroupID = 1;
It is correct. The Many to Many relationship is broken down into several One to Many relationships. So essentially, NO many to many relationship exists!
Well, of course M-M relationships do exist in relational databases, and those databases can handle them to some extent through bridging tables. However, as the degree of the M-M relationship increases, so does the complexity, which results in slow read-write cycles and latency.
It is recommended to avoid such complex M-M relationships in a relational database. Graph databases are the best alternative and are good at handling many-to-many relationships between objects, which is why social networking sites use graph databases for handling M-M relationships between Users and Friends, Users and Events, etc.
Let's invent a fictional many-to-many relationship between a books table and a sales table. Suppose you are buying books, and each book you buy needs an invoice number generated for it. Suppose also that one invoice number can represent multiple sales to the same customer (not in reality, but let's assume). We then have a many-to-many relationship between the books and sales entities.
Now if that's the case, how can we get information about only 1 book, given that we have purchased 3 books and all of them would in theory have the same invoice number? That, I guess, is the main problem of using a many-to-many relationship. If we add a bridging entity between books and sales such that each book sold has only 1 invoice number, then no matter how many books are purchased we can still correctly identify each book.
In a many-to-many relationship there is obvious redundancy, as well as insert, update and delete anomalies, which should be eliminated by converting it to two one-to-many relationships via a bridge table.
M:N relationships should not exist in database design. They are extremely inefficient and do not make for functional databases. Two tables (entities) with a many-to-many relationship (aircraft, airport; teacher, student) cannot both be children of each other; there would be nowhere to put foreign keys without an intersecting table. aircraft -> flight <- airport; teacher -> class <- student.
An intersection table provides a place for an entity that is dependent on two other tables. For example, a grade needs both a class and a student, and a flight needs both an aircraft and an airport. Many-to-many relationships conceal data. Intersection tables reveal this data and create one-to-many relationships that can be more easily understood and worked with. So, the question arises: what table should the flight be in, aircraft or airport? Neither; they should be foreign keys in the intersection table, Flight.

Relational Database Design (MySQL)

I am starting a new Project for a website based on "Talents" - for example:
Models
Actors
Singers
Dancers
Musicians
The way I propose to do this is that each of these talents will have its own table and include a user_id field to map the record to a specific user.
Any user who signs up on the website can create a profile for one or more of these talents. A talent can have sub-talents, for example an actor can be a tv actor or a theatre actor or a voiceover actor.
So for example I have User A - he is a Model (Catwalk Model) and an Actor (TV actor, Theatre actor, Voiceover actor).
My questions are:
Do I need to create separate tables to store sub-talents of this user?
How should I perform the lookups of the top-level talents for this user? I.e. in the user table should there be fields for the ID of each talent? Or should I perform a lookup in each top-level talent table to see if that user_id exists in there?
Anything else I should be aware of?
Before answering your questions: I think that user_id should not be in the Talents table. The main idea here is that "for one talent you have many users, and for one user you have multiple talents", so the relation should be NxN and you'll need an intermediary table.
see: many to many
now
Do I need to create separate tables to store sub-talents of this
user?
if you want to do something dynamic (add or remove subtalents) you can use a recursive relationship. That is a table that is related to itself
TABLE TALENT
-------------
id PK
label
parent_id FK (a nullable foreign key to table Talent)
see : recursive associations
How should I perform the lookups of the top-level talents for this
user? I.e. in the user table should
there be fields for the ID of each
talent? Or should I perform a lookup
in each top-level talent table to see
if that user_id exists in there?
If you use the model above, queries could become a nightmare, because your Talent table is now a TREE that can contain multiple levels. You might want to restrict the number of levels allowed in the Talent table; I guess two is enough. That way your queries will be easier.
Anything else I should be aware of?
When using recursive relations, the foreign key should allow NULLs, because the top-level talents won't have a parent_id.
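A rough sketch of the recursive table plus the intermediary table, run through Python's sqlite3. It assumes the two-level limit suggested above; the table name user_talent, the sample ids and the omission of the users table are simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE talent (
    id        INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    parent_id INTEGER REFERENCES talent(id)   -- NULL for top-level talents
);

-- Intermediary table for the NxN relation (users table omitted for brevity).
CREATE TABLE user_talent (
    user_id   INTEGER NOT NULL,
    talent_id INTEGER NOT NULL REFERENCES talent(id),
    PRIMARY KEY (user_id, talent_id)
);

INSERT INTO talent VALUES
    (1, 'Model', NULL), (2, 'Actor', NULL),
    (3, 'Catwalk Model', 1),
    (4, 'TV Actor', 2), (5, 'Theatre Actor', 2), (6, 'Voiceover Actor', 2);

-- User 1 is a catwalk model and three kinds of actor.
INSERT INTO user_talent VALUES (1, 3), (1, 4), (1, 5), (1, 6);
""")

# Top-level talents of user 1: take the parent's label if there is one,
# otherwise the talent is already top-level. Works for two levels only.
top_level = [row[0] for row in conn.execute("""
    SELECT DISTINCT COALESCE(p.label, t.label)
    FROM user_talent ut
    JOIN talent t      ON t.id = ut.talent_id
    LEFT JOIN talent p ON p.id = t.parent_id
    WHERE ut.user_id = 1
    ORDER BY 1
""")]
print(top_level)
```

A deeper tree would need a recursive query instead of the single self-join.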
Good luck! :)
EDIT: ok.. i've created the model.. to explain it better
Edit Second model (in the shape of a Christmas tree =D ) Note that the relation between Model & Talent and Actor & Talent is a 1x1 relation, there are different ways to do that (the same link on the comments)
to find if user has talents.. join the three tables on the query =)
hope this helps
You should have one table that has everything about the user (name, dob, any other information about the user). You should have one table that has everything about talents (id, talentName, TopLevelTalentID (to store the "sub" talents put a reference to the "Parent" talent)). You should have a third table for the many to many relationship between users and talents: UserTalents which stores the UserID and the TalentID.
Here's an article that explains how to get to 3rd NF:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
This is a good question to show some of the differences and similarities between object oriented thinking and relational modelling.
First of all there are no strict rules regarding creating the tables, it depends on the problem space you are trying to model (however, having a field for each of the tables is not necessary at all and constitutes a design fault - mainly because it is inflexible and hard to query).
For example perfectly acceptable design in this case is to have tables
Names (Name, Email, Bio)
Talents (TalentType references TalentTypes, Email references Names)
TalentTypes (TalentType, Description, Parent references TalentTypes)
The above design would allow you to have hierarchical TalentTypes and also to keep track which names have which talents, you would have a single table from which you could get all names (to avoid registering duplicates), you have a single table from which you could get a list of talents and you can add new talent types and/or subtypes easily.
If you really need to store some special fields on each of the talent types you can still add these as tables that reference the general talents table.
As an illustration
Models (Email references Talents, ModelingSalary) -- with a check constraint that talents contain a record with modelling talent type
Do notice that this is only an illustration, it might be sensible to have Salary in the Talents table and not to have tables for specific talents.
If you do end up with tables for specific talents in a sense you can look at Talents table as sort of a class from which a particular talent or sub-talent inherits properties.
ok sorry for the incorrect answer.. this is a different approach.
The way I see it, a user can have multiple occupations (Actor, Model, Musician, etc.). Usually what I do is think in objects first, then translate that into tables. In OOP you'd have a class User and subclasses Actor, Model, etc. Each one of them could also have subclasses like TvActor, VoiceOverActor... In a DB you'd have a table for each talent and subtalent, all of them sharing the same primary key (the id of the user), so if user 4 is an Actor and a Model, you would have one row in the Actor table and another in the Model table, both with id=4.
As you can see, storing is easy; the complicated part is retrieving the info. That's because databases don't have the notion of inheritance (I think MySQL has, but I haven't tried it), so if you want to know the subclasses of user 4, I see three options:
multiple SELECTs for each talent and subtalent table that you have, asking if their id is 4.
SELECT * FROM Actor WHERE id=4;SELECT * FROM TvActor WHERE id=4;
Make a big query joining all talent and subtalent table on a left join
SELECT * from User LEFT JOIN Actor ON User.id=Actor.id LEFT JOIN TvActor ON User.id=TvActor.id LEFT JOIN... WHERE User.id=4;
Create a Talents table in an NxN relation with User to store a reference to each talent and subtalent that the User has, so you won't have to query all of the tables. You'd query the Talents table first to find out which tables you need to query in a second step.
Each one of these three options have their pros and cons.. maybe there's another one =)
Good Luck
PS: ahh i found another option here or maybe it's just the second option improved

What is the best practise for relational database tables in mysql?

I know, there is a lot of info on mysql out there. But I was not really able to find an answer to this specific and actually simple question:
Let's say I have two tables:
USERS
(with many fields, e.g. name, street, email, etc.) and
GROUPS
(also with many fields)
The relation is (I guess?) 1:n, that is ONE user can be a member of MANY groups.
What I did is create another table, named USERS_GROUPS_REL. This table has only two fields:
us_id (unique key of table USERS) and
gr_id (unique key of table GROUPS)
In PHP I do a query with join.
Is this "best practice" or is there a better way?
Thankful for any hint!
Hi all,
thanks for your quick and helpful support. Knowing that I was on the right way builds up my mysql-self-confidence a little. :-)
As many commented, my example is not 1:n but many to many. Just as a quick sql-lesson: :-)
Are these the right terms?
1:n one to many
n:1 many to one
n:n many to many?
You have described a many-to-many relationship. Using that relationship, many USERs can be members of many GROUPS. For your purposes, this is indeed the correct way to go.
One-to-many and one-to-one relationships do not require the link or cross-reference table (USERS_GROUPS_REL).
You might find this tutorial useful:
http://www.tonymarston.net/php-mysql/many-to-many.html
It's quite the best case :)
Creating a composite primary key in the relation table is the optimal solution:
ALTER TABLE USERS_GROUPS_REL ADD PRIMARY KEY (us_id, gr_id);
That would be the best practice.
You have two tables on either side of the join and a mapping table (or linker table, or whatever other terminology is out there) in the middle which handles the Many-to-Many relationship (A Group has many Users and a User can belong to many Groups).
To increase performance, you should make a composite key for the mapping table out of us_id and gr_id. You can also create an index on the inverse so that if you need to sort by gr_id, you get the performance benefit there too.
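Sketched with Python's sqlite3, the composite key plus the inverse index could look like this. The table and column names are from the question; the index name idx_rel_gr_us and the sample rows are made up, and in MySQL the same two statements would be the relevant part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS_GROUPS_REL (
    us_id INTEGER NOT NULL,
    gr_id INTEGER NOT NULL,
    PRIMARY KEY (us_id, gr_id)      -- serves lookups that start from a user
);

-- Inverse index: serves lookups that start from a group.
CREATE INDEX idx_rel_gr_us ON USERS_GROUPS_REL (gr_id, us_id);

INSERT INTO USERS_GROUPS_REL VALUES (1, 10), (1, 20), (2, 10);
""")

# The composite key also rejects a second copy of the same membership.
try:
    conn.execute("INSERT INTO USERS_GROUPS_REL VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

members_of_10 = [row[0] for row in conn.execute(
    "SELECT us_id FROM USERS_GROUPS_REL WHERE gr_id = 10 ORDER BY us_id")]
print(duplicate_rejected, members_of_10)
```

Whether the second index pays off depends on how often you query by group; it costs some write time and storage.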
One user can belong to many groups, and one group contains many users, so it is a many-to-many relationship.
The way you describe it is a typical and correct one. A many-to-many relationship is often mapped with an association class or relation table.
Sometimes the relation table has more columns. For example, a user may be the administrator of a particular group. The users_group_rel table would then also have a field administrator.

How do you resolve a many-to-many collection entity in a RDBMS?

I'm trying to model artists and songs and I have a problem: a Song_Performance can be performed by many artists (say a duet), so I have an Artist_Group to represent who the song is performed by.
Well, I now have a many-to-many relationship between Artist and Artist_Group, where an Artist_Group is uniquely identified by the collection of artists in that group. I can create an intersection entity that represents an Artist's participation in an Artist_Group (Artist_Group_Participation?)
I'm having trouble coming up with how to come up with a primary key for the Artist_Group entity that preserves the fact that the same set of artists represents the same group, and lacking a primary key for the Artist_Group entity means I'm lacking a foreign key for the Artist_Group_Participation entity.
The book "Mastering Data Modeling" by John Carlis and Joseph Maguire mentions this shape, refers to it as a "Many-Many Collection Entity", and states that it is very rare, but doesn't say how to resolve it, since obviously a many-to-many relationship can't be stored directly in a RDBMS. How do I go about representing this?
Edit:
Looks like everyone is suggesting an intersection table, but that's not my issue here. I have that. My issue is enforcing the constraint that you cannot add an Artist_Group entry where the group of artists that it contains are the same as an existing group, ignoring order. I thought about having the ID for Artist_Group be a varchar that is the concatenation of the various artists that comprise it, which would solve the issue if order mattered, but having an Artist_Group for "Elton John and Billy Joel" doesn't prevent the addition of a group for "Billy Joel and Elton John".
I guess I'm missing the point of the "Artist_Group" relation.
The data model in my mind is:
Artist: an individual person.
Song: The song itself.
Performance: A particular performance or arrangement of a song. Usually this would have one song, but you could provide an m:n linking table to accommodate a medley. Ideally, this would be a single real performance, i.e., there would be an associated date.
Recording: A particular fixed version of a performance (CD or whatever). Usually a Performance only has one Recording, but having a separate table would handle the Grateful Dead / multiple-bootleg scenario, as well as re-release albums, radio play vs. live vs. CD versions, etc.
Performance_Artists: A linking table from a particular performance to a list of performers. For each, you could also have an attribute that describes their role(s) in the performance (vocalist, drummer, etc.).
There's no explicit relationship between a set of performers, except that they share performances in common. Thus, any table that attempts to combine random sets of artists outside the context of a recording is not an accurate relational model, as there is no real relationship.
If you are trying to represent an explicit relationship between a set of artists (i.e., they are in the same band), well, bands have names that have uniqueness (though not enough to be a primary key), and a band could be stored simply as an Artist, and then have an Artist_Member linking table that is self-referencing back to the individual Artist records. Or you could have a separate Band table, and a Band_Members table to assign artists to it, perhaps with dates of membership. Either way, just remember that band members change over time and band roles change from one song to the next, so associating a band with a performance should not substitute for linking performances directly to the artists involved.
The primary key for both the Artist and Artist_Group would be a numeric, incremental ID. Then you'd have an Artist_Group_Participation table that has two columns: artist_id and group_id. These would be foreign keys that refer to the ID of their respective tables. Then to SELECT everything you'd use a JOIN.
EDIT: Sorry, I misunderstood your question. The only other way I can think of is add an "artists" column to your Artist_Group table that contains a serialized array (assuming you're using PHP, but other languages have equivalents) of the artists and their IDs. Then just add a UNIQUE constraint to the column.
You could make each artist's ID correspond to a bit in a bitfield. So if Elton John is ID 12 and Billy Joel is ID 123, then the "group" formed by a duet between Elton John and Billy Joel is Artist_Group ID 10633823966279326983230456482242760704 (i.e. it has the 12th and 123rd bit set).
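The arithmetic behind that ID can be checked with a few lines of Python, whose integers are arbitrary-precision. The helper name group_bitfield is made up for this sketch, and bits are counted from zero, which is the reading that reproduces the number above.

```python
def group_bitfield(artist_ids):
    """Return an integer with bit number `id` set for every artist id."""
    key = 0
    for artist_id in artist_ids:
        key |= 1 << artist_id
    return key


duet = group_bitfield([12, 123])       # Elton John = 12, Billy Joel = 123
same_duet = group_bitfield([123, 12])  # order of the artists does not matter

# Membership test: is artist 12 part of the group?
has_artist_12 = (duet >> 12) & 1 == 1

print(duet)
print(duet == same_duet, has_artist_12)
```

The key is order-independent by construction, which is the property the question asks for.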
You could enforce the relationship using the intersection table. For example, using a CHECK constraint in PostgreSQL:
CREATE TABLE Artist_Group_Participation (
artist_id int not null,
artist_group_id int not null,
PRIMARY KEY (artist_id, artist_group_id),
FOREIGN KEY (artist_id) REFERENCES Artists (artist_id),
FOREIGN KEY (artist_group_id) REFERENCES Artist_Group (artist_group_id),
CHECK (B'1'<<artist_id & artist_group_id <> 0)
);
Admittedly, this is a hack. It applies extra significance to the Artist_Group surrogate key, when surrogate keys are supposed to be unique but not contain information.
Also if you have thousands of artists, and new artists every day, things could get unwieldy because the length of the Artist_Group key's data type needs to grow larger all the time.
I guess you could build a primary key by sorting and concatenating the artist ids?
group: 3,2,6 -> 2-3-6 and 6,3,2 -> 2-3-6
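A small sketch of that idea in Python with sqlite3, assuming the sorted key is stored in a UNIQUE column next to a surrogate id. The function name group_key and the column name member_key are made up here, and the application would still have to keep this column in step with the participation table.

```python
import sqlite3


def group_key(artist_ids):
    """Canonical key for a set of artists: ids sorted numerically, joined by '-'."""
    return "-".join(str(i) for i in sorted(set(artist_ids)))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Artist_Group (
        artist_group_id INTEGER PRIMARY KEY,
        member_key      TEXT NOT NULL UNIQUE   -- e.g. '2-3-6'
    )
""")

conn.execute("INSERT INTO Artist_Group (member_key) VALUES (?)",
             (group_key([3, 2, 6]),))

# The same artists in a different order map to the same key and are rejected.
try:
    conn.execute("INSERT INTO Artist_Group (member_key) VALUES (?)",
                 (group_key([6, 3, 2]),))
    reordered_rejected = False
except sqlite3.IntegrityError:
    reordered_rejected = True

print(group_key([3, 2, 6]), reordered_rejected)
```

Unlike the bitfield, the key grows with the size of the group rather than with the largest artist id.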
I don't have much experience in RDBMS. However, I have read papers of Codd and books by C.J. Date.
So, instead of using RDBMS jargon, I'll try to explain in more common sensical terms (at least to me!)
Here goes -
Singer names should be standard on "First Name - Last Name" basis
Each "Singer" should have an entry in the "Artists Group" table even if they have performed solo
Each entry in the "Artists Group" will consist of multiple "Singer" entries ordered alphabetically. There should be a single occurrence of a specific combination.
Each song will have an entry of a unique record from "Artists Group" regardless of whether they are solo, duets or in a gang.
I don't know if this makes much sense, but it's my two cents!