How to resolve multiple values in one column in a table?


I have been tasked with designing a database from a scenario. However, while designing my solution I found I would have multiple values in one cell. We were told this is a repeating group and should be avoided in a database.
I get the repeating groups when I want to link the songs on an album to the album they are found on. For instance, there can be one or many songs on an album. However, a song could be on one or many albums (Dean Martin - Silver Bells could be on a Christmas Hits album and a Dean Martin album).
If I link each song to its album, I would use the AlbumId as a foreign key. However, if a song is on multiple albums, I would need multiple AlbumIds as the foreign key. This gives me a repeating group, as there will be multiple IDs in the same cell.
If I reversed this by storing all of an album's songs on the Album entity, I would have the same issue: SongId would be the foreign key, and each album would have multiple SongIds in the same cell.
The design I have includes the following entities: song and album.
The song entity will contain the following attribute types:
SongId (PK)
SongDuration
AlbumId (FK)
AudioFileSize
AudioFile
SongTitle
SongLyrics
SongNotes
The album entity will contain the following attribute types:
AlbumId (PK)
AlbumTitle
NumOfTracks
ReleaseDate
ProductionLabel (FK) //Goes to another table that has no issues.
AlbumCoverImage
CoverImageStory
AmountOfCDs
I am quite new to database design, and I feel I have grasped it well. However, I am puzzled about how to solve this.
If any more information on the database is required I will happily provide it.
Any help is greatly appreciated.
Best regards,
Steve.

You have a many-to-many relationship. So, you can use a junction/association table:
create table songAlbums (
. . .,
songId int references songs(songId),
albumId int references albums(albumId),
. . .
)
You might want to include other information, such as the position on the album. Such a table could have a composite primary key (songId, albumId) or a synthetic primary key (generated always as identity).
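A minimal sketch of that junction table, run against SQLite through Python's built-in sqlite3 module so it is self-contained (SQLite syntax differs slightly from other databases). The table and column names (song, album, song_album, track_number) and the sample rows are illustrative assumptions. The composite primary key stops the same song being linked to the same album twice; the UNIQUE (album_id, track_number) rule is an assumed extra constraint for the position column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE song (
    song_id    INTEGER PRIMARY KEY,
    song_title TEXT NOT NULL
);
CREATE TABLE album (
    album_id    INTEGER PRIMARY KEY,
    album_title TEXT NOT NULL
);
-- Junction table: one row per (song, album) pairing, so no cell holds a list.
CREATE TABLE song_album (
    song_id      INTEGER NOT NULL REFERENCES song (song_id),
    album_id     INTEGER NOT NULL REFERENCES album (album_id),
    track_number INTEGER NOT NULL,        -- position of the song on this album
    PRIMARY KEY (song_id, album_id),      -- each pairing stored once
    UNIQUE (album_id, track_number)       -- assumed: one song per track slot
);
""")
conn.execute("INSERT INTO song VALUES (1, 'Silver Bells')")
conn.executemany("INSERT INTO album VALUES (?, ?)",
                 [(10, 'Christmas Hits'), (20, 'Dean Martin')])
# The same song on two albums: two rows, not two IDs in one cell.
conn.executemany("INSERT INTO song_album VALUES (?, ?, ?)",
                 [(1, 10, 4), (1, 20, 7)])

albums_for_song = [row[0] for row in conn.execute("""
    SELECT a.album_title
    FROM song_album sa
    JOIN album a ON a.album_id = sa.album_id
    WHERE sa.song_id = ?
    ORDER BY a.album_title""", (1,))]
print(albums_for_song)  # ['Christmas Hits', 'Dean Martin']
```

With this table in place, the AlbumId column is no longer needed on the song entity.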

We were told this is a repeating group and should be avoided in a database.
Not "avoided", but modelled and resolved. Each column must be Atomic:
1NF: no multiple or compound values, and no repeating groups
2NF: every non-key column depends on the whole Key, not just part of it
The simple solution is to model the multiple values in a subordinate table. In this case, with two Identifiers (Song and Album), that table is an Associative table.
RecordID
Using a RecordID as the Key is the first and foremost error. It cripples both the modelling exercise and the resulting "database".
The Relational Model requires:
the Key must be "made up from the data"
(ie. not a manufactured ID, GUID, UUID, etc., none of which are data)
each row (as distinct from a record with a RecordId) in each table must be unique
Data uniqueness cannot be obtained from an ID, GUID, UUID, etc. Further, such an ID is always an additional column and index.
That needs to be corrected.
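A small SQLite sketch (via Python's sqlite3) of why a manufactured ID does not give data uniqueness: the ID-keyed table happily accepts the same album twice, while a table keyed on the data rejects the duplicate. Using AlbumTitle plus ReleaseDate as the natural key is purely an assumption for illustration; the answer's actual model is in the diagram referenced below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Keyed only on a manufactured ID: nothing stops duplicate data.
CREATE TABLE album_by_id (
    album_id     INTEGER PRIMARY KEY,
    album_title  TEXT NOT NULL,
    release_date TEXT NOT NULL
);
-- Keyed on the data itself (assumed natural key: title + release date).
CREATE TABLE album_by_data (
    album_title  TEXT NOT NULL,
    release_date TEXT NOT NULL,
    PRIMARY KEY (album_title, release_date)
);
""")

row = ("Christmas Hits", "2001-11-01")  # hypothetical sample album
insert_by_id = "INSERT INTO album_by_id (album_title, release_date) VALUES (?, ?)"
conn.execute(insert_by_id, row)
conn.execute(insert_by_id, row)  # accepted: a new ID, but the same data

conn.execute("INSERT INTO album_by_data VALUES (?, ?)", row)
try:
    conn.execute("INSERT INTO album_by_data VALUES (?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the data key catches the duplicate row

id_rows = conn.execute("SELECT COUNT(*) FROM album_by_id").fetchone()[0]
print(id_rows, duplicate_rejected)  # 2 True
```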
Third, you have some columns in the wrong tables.
Album Data Model
Modelling is substantially cheaper than experimenting in SQL. See if this model satisfies the requirement.
Rather than going back and forth bringing you up to speed with Relational Databases, I have fixed all the issues. For example, you have multiple CDs per Album (AmountOfCDs), which was neither handled nor requested, but it must be resolved.
Also available in PDF.
It is rendered in IDEF1X, the Standard for modelling a Relational Database. You may find the short Introduction to IDEF1X helpful.

Related

Linking table entry to multiple other entries

I'm attempting to create a basic music database for a school project.
I would like to link each song to three (3) other 'similar' songs.
I know how to link two tables together with FOREIGN KEY but I'm unsure how to link two entries within the same table.
The programs I'm using are phpMyAdmin and DBDesigner 4.
Thanks in advance for any assistance :)

First of all, when designing a database, you never want to "assume '(3)'." In other words, you don't want "repeating groups" such that the database design would be broken if you ever needed 4.
To me, "is-similar-to" is a many-to-many relationship that would list an arbitrary number of songs that are similar, with a structure such as
SONG_ID_1,
SONG_ID_2,
DEGREE_OF_SIMILARITY (some kind of percentage ...?)
So, for any song, you'd look in this table to find all songs that have ever been listed as "similar to" it. You'd incorporate this table with an INNER JOIN and be prepared to deal with an arbitrary number of matches.
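A minimal SQLite sketch (through Python's sqlite3) of that self-referencing many-to-many table. Both columns point back at the same song table, so a song can have any number of similar songs, not a fixed three. Treating DEGREE_OF_SIMILARITY as a 0 to 100 percentage is an assumption, as are the table names and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE song (
    song_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Self-referencing many-to-many: both columns reference song.
CREATE TABLE song_similarity (
    song_id_1            INTEGER NOT NULL REFERENCES song (song_id),
    song_id_2            INTEGER NOT NULL REFERENCES song (song_id),
    degree_of_similarity REAL CHECK (degree_of_similarity BETWEEN 0 AND 100),
    PRIMARY KEY (song_id_1, song_id_2),
    CHECK (song_id_1 <> song_id_2)   -- a song is not "similar" to itself
);
""")
conn.executemany("INSERT INTO song VALUES (?, ?)",
                 [(1, 'Song A'), (2, 'Song B'), (3, 'Song C'),
                  (4, 'Song D'), (5, 'Song E')])
# Four similar songs for song 1: the design does not break past three.
conn.executemany("INSERT INTO song_similarity VALUES (?, ?, ?)",
                 [(1, 2, 90), (1, 3, 75), (1, 4, 60), (1, 5, 40)])

similar = conn.execute("""
    SELECT s.title, ss.degree_of_similarity
    FROM song_similarity ss
    INNER JOIN song s ON s.song_id = ss.song_id_2
    WHERE ss.song_id_1 = ?
    ORDER BY ss.degree_of_similarity DESC""", (1,)).fetchall()
print(similar)
```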

You're looking for a relationship table, where you keep the song id along with the linked song id.
Example:
Table Song - ID PRIMARY KEY, NAME
Table Song_Link - ID_SONG (from Table Song), ID_LINKED_SONG (from Table Song)
This way you can store the link between both songs on a row basis.
Take into consideration that the link goes both ways.
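One way to handle "the link goes both ways", sketched in SQLite via Python's sqlite3: store each pair once with the smaller ID first (enforced by a CHECK), and read it in both directions with a UNION. The canonical-ordering rule and the helper names link and linked_songs are assumptions; the other common choice is to store both directions as two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE song (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE song_link (
    id_song        INTEGER NOT NULL REFERENCES song (id),
    id_linked_song INTEGER NOT NULL REFERENCES song (id),
    PRIMARY KEY (id_song, id_linked_song),
    CHECK (id_song < id_linked_song)   -- each pair stored once, smaller ID first
);
""")
conn.executemany("INSERT INTO song VALUES (?, ?)",
                 [(1, 'Song A'), (2, 'Song B'), (3, 'Song C')])

def link(a, b):
    """Insert an undirected link, normalising the order of the two IDs."""
    conn.execute("INSERT INTO song_link VALUES (?, ?)", (min(a, b), max(a, b)))

def linked_songs(song_id):
    """Read the link in both directions with UNION."""
    rows = conn.execute("""
        SELECT id_linked_song FROM song_link WHERE id_song = ?
        UNION
        SELECT id_song FROM song_link WHERE id_linked_song = ?
        ORDER BY 1""", (song_id, song_id))
    return [r[0] for r in rows]

link(1, 2)
link(3, 2)   # stored as (2, 3)
print(linked_songs(2))  # [1, 3]
```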

How to model the association of CDs with Genres?

I'm creating a database design for a website that sells music CDs, and I've hit a brick wall because I can't get my head around whether the relationship between a CD and its genres is one-to-many or many-to-many.
Each CD can have multiple genres. For example "Ministry of Sound Dubstep Anthems" could have genres such as Dubstep, Dance and Electro.
The more I think about it, though, each genre can also be linked to a number of CDs.
The way I have my Database at the moment is:
cd_table
PK ID
FK Genre
Description
...
genre_table
PK ID
Genre
If it is many-to-many, am I right in thinking I will need a join table such as:
cd_genre
CD_ID
Genre_ID
And then have them both act as the Primary Key? How would I link cd_genre to cd_table? Or do I just remove the FK in cd_table and then do a join when querying the CDs?

You're correct - the relationship between CD and Genre is many-to-many, for exactly the reasons you provide.
And yes, a cross-reference/join table like you describe is what you need.
Interestingly, depending on other constraints, you might want to be able to put genre in other places, too. For instance, song: an artist like Weird Al may put several different genres of songs on one of his albums, and the album itself doesn't necessarily have any single 'main' genre (but often has some 'theme').

I think you answered your own question. CDs and Genres are many-to-many, and you accomplish it with that two-column table with the two foreign keys, usually referred to as a "join table." It's generally not necessary to think of that table as having a primary key, but logically the two columns together are a composite primary key.
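A minimal SQLite sketch (via Python's sqlite3) of the asker's tables with that change applied: the Genre FK is removed from cd_table, cd_genre holds the two foreign keys as a composite primary key, and a CD's genres are fetched with a join. Column names follow the question; SQLite types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- cd_table no longer carries a Genre FK; the link lives in cd_genre.
CREATE TABLE cd_table (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE genre_table (
    id    INTEGER PRIMARY KEY,
    genre TEXT NOT NULL UNIQUE
);
CREATE TABLE cd_genre (
    cd_id    INTEGER NOT NULL REFERENCES cd_table (id),
    genre_id INTEGER NOT NULL REFERENCES genre_table (id),
    PRIMARY KEY (cd_id, genre_id)   -- the two FKs together form the key
);
""")
conn.execute("INSERT INTO cd_table VALUES (1, 'Ministry of Sound Dubstep Anthems')")
conn.executemany("INSERT INTO genre_table VALUES (?, ?)",
                 [(1, 'Dubstep'), (2, 'Dance'), (3, 'Electro')])
conn.executemany("INSERT INTO cd_genre VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)])

# Genres of a CD: join from cd_table through cd_genre to genre_table.
genres = [r[0] for r in conn.execute("""
    SELECT g.genre
    FROM cd_table c
    JOIN cd_genre cg   ON cg.cd_id = c.id
    JOIN genre_table g ON g.id = cg.genre_id
    WHERE c.id = ?
    ORDER BY g.genre""", (1,))]
print(genres)  # ['Dance', 'Dubstep', 'Electro']
```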

To determine multiplicity, ask the same question starting with the different relations:
Can a CD belong to many Genres?
Can a Genre contain more than one CD?
Of course, this may be the wrong database model. Instead, modelling Song<->Genre and Song<->CD is in a higher Normal Form. It is a higher NF because CD<->Genre can be derived from CD<->Song<->Genre, but CD<->Genre cannot answer the same queries about the Songs on the CD (unless a CD can have at most one Genre).
On the other hand, one might only care about CDs and not Songs, in which case modelling Songs adds irrelevant or even harmful excess data. Likewise, perfect classification might overcomplicate practical classification; be aware of the trade-offs between the granularity of information and its use in the problem domain.
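To illustrate the composition argument, here is a sketch in SQLite via Python's sqlite3 (table names and sample rows are hypothetical): a CD's genres are derived from CD<->Song<->Genre with a DISTINCT join, while a song-level question ("which songs on this CD are Rock?") remains answerable, which a bare CD<->Genre table could not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cd    (cd_id    INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE song  (song_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- Song is the hub: CDs contain songs, and songs carry genres.
CREATE TABLE cd_song (
    cd_id   INTEGER NOT NULL REFERENCES cd (cd_id),
    song_id INTEGER NOT NULL REFERENCES song (song_id),
    PRIMARY KEY (cd_id, song_id)
);
CREATE TABLE song_genre (
    song_id  INTEGER NOT NULL REFERENCES song (song_id),
    genre_id INTEGER NOT NULL REFERENCES genre (genre_id),
    PRIMARY KEY (song_id, genre_id)
);
""")
conn.execute("INSERT INTO cd VALUES (1, 'Hypothetical Mixed Album')")
conn.executemany("INSERT INTO song VALUES (?, ?)", [(1, 'Polka Track'), (2, 'Rock Track')])
conn.executemany("INSERT INTO genre VALUES (?, ?)", [(1, 'Polka'), (2, 'Rock')])
conn.executemany("INSERT INTO cd_song VALUES (?, ?)", [(1, 1), (1, 2)])
conn.executemany("INSERT INTO song_genre VALUES (?, ?)", [(1, 1), (2, 2)])

# CD<->Genre derived by composing CD<->Song with Song<->Genre.
cd_genres = [r[0] for r in conn.execute("""
    SELECT DISTINCT g.name
    FROM cd_song cs
    JOIN song_genre sg ON sg.song_id = cs.song_id
    JOIN genre g       ON g.genre_id = sg.genre_id
    WHERE cs.cd_id = ?
    ORDER BY g.name""", (1,))]

# A song-level question that a CD<->Genre table alone could not answer.
rock_songs = [r[0] for r in conn.execute("""
    SELECT s.title
    FROM cd_song cs
    JOIN song s        ON s.song_id = cs.song_id
    JOIN song_genre sg ON sg.song_id = s.song_id
    JOIN genre g       ON g.genre_id = sg.genre_id
    WHERE cs.cd_id = ? AND g.name = 'Rock'""", (1,))]
print(cd_genres, rock_songs)  # ['Polka', 'Rock'] ['Rock Track']
```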

How to design a media table with references to multiple (at least 4) tables?

I am designing a database for my cookbooks. I have created multiple tables in my design: books, authors, recipes, ingredients and for all these items I want to link media (images or video) to items in all these tables.
I was thinking of a design like:
media_id,
rid (primary key of foreign table),
rtype (1=book, 2=author, 3=recipe, 4=ingredient),
media_type(1=image,2=video),
media_url
But how will I ensure relational integrity?
Thanks

Your proposed design seems to imply that each entity (book, author, etc.) can have multiple media files, so to maintain relational integrity, I'd have separate junction tables for each relationship.
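A minimal sketch of the separate-junction-table approach, using SQLite through Python's sqlite3. Only book and author are shown; recipe and ingredient would follow the same pattern. The table names (book_media, author_media) and sample values are assumptions. Because every junction column is a real foreign key, a link to a non-existent book is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite
conn.executescript("""
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE media (
    media_id   INTEGER PRIMARY KEY,
    media_type TEXT NOT NULL CHECK (media_type IN ('image', 'video')),
    media_url  TEXT NOT NULL
);
-- One junction table per owning entity, each with real foreign keys.
CREATE TABLE book_media (
    book_id  INTEGER NOT NULL REFERENCES book (book_id),
    media_id INTEGER NOT NULL REFERENCES media (media_id),
    PRIMARY KEY (book_id, media_id)
);
CREATE TABLE author_media (
    author_id INTEGER NOT NULL REFERENCES author (author_id),
    media_id  INTEGER NOT NULL REFERENCES media (media_id),
    PRIMARY KEY (author_id, media_id)
);
-- recipe_media and ingredient_media would follow the same pattern.
""")
conn.execute("INSERT INTO book VALUES (1, 'Hypothetical Cookbook')")
conn.execute("INSERT INTO media VALUES (100, 'image', 'https://example.com/cover.jpg')")
conn.execute("INSERT INTO book_media VALUES (1, 100)")  # valid link

try:
    conn.execute("INSERT INTO book_media VALUES (999, 100)")  # no book 999
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True  # the FK refuses a link to a missing book
print(orphan_rejected)  # True
```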

If there's only one media item for each table, the media_id should be in those tables instead of the other way around.
If several media items are possible, you should link them through an extra table. There should be one extra table per item (bookid_mediaid, for example).
If you think it should be linkable within one table, you are actually stating that those items have at least something in common. Otherwise the rid would have a different meaning from record to record, depending on the type, and that's not possible in relational theory.
In conclusion:
Your design is not good. Either you should have a relationship table per entity, or you should find what all the entities have in common and use that to link to your media types.

Matthijs,
For a relational design, you have to look at how these objects are related to each other and then decide on the data model. Here is an initial draft and the conditions that I am assuming. You might have different rules for your case; post them and someone should be able to post an accurate model.
Each book can be written by many authors, and each author can write different books (M:N).
RECIPES <--> BOOKS (M:N)
RECIPES <--> INGREDIENTS (M:N)
These would be the initial set of tables.
BOOKS, AUTHORS, BOOKS_AUTHORS_ASC, RECIPES, BOOKS_RECIPES_ASC,
INGREDIENTS, RECIPES_INGREDIENTS_ASC.
If the media is related to one and only one book, you'll have a separate table, say, media, with the following columns. BOOK_ID is the parent column with which this media is associated.
CREATE TABLE media (
MEDIA_ID NUMBER PRIMARY KEY,
BOOK_ID NUMBER NOT NULL REFERENCES books (book_id), -- the one parent book
MEDIA_TYPE VARCHAR2(20), -- 'IMAGE', 'VIDEO', etc.
other_column1 VARCHAR2(50),
other_column2 VARCHAR2(50)
-- and so on
);
And if a given image/video can be associated with multiple books and each book can have different images, then you'll have a separate entity for the association, something like:
CREATE TABLE media (
media_id NUMBER PRIMARY KEY,
media_type VARCHAR2(10),
media_name VARCHAR2(100)
-- other columns
);
CREATE TABLE media_book_asc (
media_book_asc_id NUMBER PRIMARY KEY,
media_id NUMBER NOT NULL REFERENCES media (media_id), -- foreign key to media table
book_id NUMBER NOT NULL REFERENCES books (book_id), -- foreign key to book table
-- other columns related to the association
CONSTRAINT media_book_asc_uk UNIQUE (media_id, book_id) -- each pairing stored once
);

How do you resolve a many-to-many collection entity in a RDBMS?

I'm trying to model artists and songs, and I have a problem: a Song_Performance can be performed by many artists (say, a duet), so I have an Artist_Group to represent who the song is performed by.
Well, I now have a many-to-many relationship between Artist and Artist_Group, where an Artist_Group is uniquely identified by the collection of artists in that group. I can create an intersection entity that represents an Artist's participation in an Artist_Group (Artist_Group_Participation?)
I'm having trouble coming up with a primary key for the Artist_Group entity that preserves the fact that the same set of artists represents the same group, and lacking a primary key for the Artist_Group entity means I'm lacking a foreign key for the Artist_Group_Participation entity.
The book "Mastering Data Modeling" by John Carlis and Joseph Maguire mentions this shape and refers to it as a "Many-Many Collection Entity". It states that it is very rare, but doesn't say how to resolve it, since a many-to-many relationship obviously can't be stored directly in an RDBMS. How do I go about representing this?
Edit:
Looks like everyone is suggesting an intersection table, but that's not my issue here. I have that. My issue is enforcing the constraint that you cannot add an Artist_Group entry where the group of artists that it contains are the same as an existing group, ignoring order. I thought about having the ID for Artist_Group be a varchar that is the concatenation of the various artists that comprise it, which would solve the issue if order mattered, but having an Artist_Group for "Elton John and Billy Joel" doesn't prevent the addition of a group for "Billy Joel and Elton John".

I guess I'm missing the point of the "Artist_Group" relation.
The data model in my mind is:
Artist: an individual person.
Song: The song itself.
Performance: A particular performance or arrangement of a song. Usually this would have one song, but you could provide an m:n linking table to accommodate a medley. Ideally, this would be a single real performance, i.e., there would be an associated date.
Recording: A particular fixed version of a performance (CD or whatever). Usually a Performance only has one Recording, but having a separate table would handle the Grateful Dead / multiple-bootleg scenario, as well as re-release albums, radio play vs. live vs. CD versions, etc.
Performance_Artists: A linking table from a particular performance to a list of performers. For each, you could also have an attribute that describes their role(s) in the performance (vocalist, drummer, etc.).
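A minimal SQLite sketch (via Python's sqlite3) of the Performance and Performance_Artists part of that model, linking a performance directly to its performers with a role per artist. The column names, the composite key (performance, artist, role), and the sample rows are assumptions; the artist IDs reuse the Elton John / Billy Joel example from the question thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE performance (
    performance_id   INTEGER PRIMARY KEY,
    song_id          INTEGER NOT NULL REFERENCES song (song_id),
    performance_date TEXT              -- date of the real performance, if known
);
-- Links a performance directly to its performers, with each artist's role.
CREATE TABLE performance_artists (
    performance_id INTEGER NOT NULL REFERENCES performance (performance_id),
    artist_id      INTEGER NOT NULL REFERENCES artist (artist_id),
    role           TEXT NOT NULL,      -- e.g. 'vocals', 'drums'
    PRIMARY KEY (performance_id, artist_id, role)
);
""")
conn.executemany("INSERT INTO artist VALUES (?, ?)",
                 [(12, 'Elton John'), (123, 'Billy Joel')])
conn.execute("INSERT INTO song VALUES (1, 'Hypothetical Duet')")
conn.execute("INSERT INTO performance VALUES (1, 1, NULL)")  # date unknown here
conn.executemany("INSERT INTO performance_artists VALUES (?, ?, ?)",
                 [(1, 12, 'vocals'), (1, 12, 'piano'), (1, 123, 'vocals')])

performers = conn.execute("""
    SELECT a.name, pa.role
    FROM performance_artists pa
    JOIN artist a ON a.artist_id = pa.artist_id
    WHERE pa.performance_id = ?
    ORDER BY a.name, pa.role""", (1,)).fetchall()
print(performers)
```

No Artist_Group table is needed here: the "group" is simply the set of artists sharing a performance.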
There's no explicit relationship between a set of performers, except that they share performances in common. Thus, any table that attempts to combine random sets of artists outside the context of a recording is not an accurate relational model, as there is no real relationship.
If you are trying to represent an explicit relationship between a set of artists (i.e., they are in the same band), well, bands have names that have uniqueness (though not enough to be a primary key), and a band could be stored simply as an Artist, and then have an Artist_Member linking table that is self-referencing back to the individual Artist records. Or you could have a separate Band table, and a Band_Members table to assign artists to it, perhaps with dates of membership. Either way, just remember that band members change over time and band roles change from one song to the next, so associating a band with a performance should not substitute for linking performances directly to the artists involved.

The primary key for both Artist and Artist_Group would be a numeric, incremental ID. Then you'd have an Artist_Group_Participation table that has two columns: artist_id and group_id. These would be foreign keys that refer to the ID of their respective tables. Then to SELECT everything you'd use a JOIN.
EDIT: Sorry, I misunderstood your question. The only other way I can think of is to add an "artists" column to your Artist_Group table that contains a serialized array (assuming you're using PHP, but other languages have equivalents) of the artists and their IDs, sorted by ID so that the same set of artists always serializes to the same value. Then just add a UNIQUE constraint to the column.

You could make each artist's ID correspond to a bit in a bitfield. So if Elton John is ID 12 and Billy Joel is ID 123, then the "group" formed by a duet between Elton John and Billy Joel is Artist_Group ID 10633823966279326983230456482242760704 (i.e. it has the 12th and 123rd bit set).
You could enforce the relationship using the intersection table. For example, using a CHECK constraint in PostgreSQL:
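CREATE TABLE Artist_Group_Participation (
artist_id int not null,
-- numeric, because an int cannot hold bits as high as 2^123;
-- Artist_Group.artist_group_id must use the same type
artist_group_id numeric not null,
PRIMARY KEY (artist_id, artist_group_id),
FOREIGN KEY (artist_id) REFERENCES Artists (artist_id),
FOREIGN KEY (artist_group_id) REFERENCES Artist_Group (artist_group_id),
-- the artist's bit must be set: floor(group_id / 2^artist_id) is odd
CHECK (mod(div(artist_group_id, power(2::numeric, artist_id)), 2) = 1)
);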
Admittedly, this is a hack. It applies extra significance to the Artist_Group surrogate key, when surrogate keys are supposed to be unique but not contain information.
Also if you have thousands of artists, and new artists every day, things could get unwieldy because the length of the Artist_Group key's data type needs to grow larger all the time.
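The bitfield encoding itself can be checked in plain Python, whose integers are arbitrary precision. This sketch (the helper names group_id and has_member are hypothetical) reproduces the Elton John (12) and Billy Joel (123) example and shows why member order does not matter.

```python
def group_id(artist_ids):
    """Encode a set of artist IDs as an integer with bit n set for artist n.

    Order and repetition do not matter, so [12, 123] and [123, 12] give the
    same value. Larger artist IDs simply make the number longer.
    """
    value = 0
    for artist_id in artist_ids:
        value |= 1 << artist_id
    return value


def has_member(group, artist_id):
    """The same test the CHECK constraint performs: is this artist's bit set?"""
    return (group >> artist_id) & 1 == 1


duet = group_id([12, 123])  # Elton John (12) and Billy Joel (123)
print(duet)  # 10633823966279326983230456482242760704
```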

I guess you could build a primary key by sorting and concatenating the artist IDs:
group: 3,2,6 -> 2-3-6 and 6,3,2 -> 2-3-6
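A sketch of that idea in SQLite via Python's sqlite3: a canonical key built by sorting the IDs numerically and joining them with "-", stored in a UNIQUE column so the same set of artists in a different order is rejected. The column name member_key and the helper group_key are assumptions, and removing duplicate IDs with set() is an added choice.

```python
import sqlite3


def group_key(artist_ids):
    """Canonical key: de-duplicate, sort numerically, join with '-'."""
    return "-".join(str(i) for i in sorted(set(artist_ids)))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artist_group (
        artist_group_id INTEGER PRIMARY KEY,
        member_key      TEXT NOT NULL UNIQUE   -- e.g. '2-3-6'
    )""")

conn.execute("INSERT INTO artist_group (member_key) VALUES (?)", (group_key([3, 2, 6]),))
try:
    conn.execute("INSERT INTO artist_group (member_key) VALUES (?)", (group_key([6, 3, 2]),))
    second_insert_accepted = True
except sqlite3.IntegrityError:
    second_insert_accepted = False  # same members in a different order: rejected
print(group_key([6, 3, 2]), second_insert_accepted)  # 2-3-6 False
```

Note that the key only stays trustworthy if it is kept in step with the Artist_Group_Participation rows for that group.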

I don't have much experience with RDBMSs. However, I have read papers by Codd and books by C.J. Date.
So, instead of using RDBMS jargon, I'll try to explain in more common-sense terms (at least to me!)
Here goes -
Singer names should follow a standard "First Name - Last Name" format
Each "Singer" should have an entry in the "Artists Group" table even if they have performed solo
Each entry in the "Artists Group" table will consist of multiple "Singers", ordered alphabetically. There should be a single occurrence of a specific combination.
Each song will reference a single record from "Artists Group", regardless of whether it is performed solo, as a duet, or by a gang.
I don't know if this makes much sense, but it's my two cents!

Owner ID type database fields

Suppose you have these tables: RestaurantChains, Restaurants, MenuItems - with the obvious relations between them. Now, you have tables Comments and Ratings, which store the customer comments/ratings about chains, restaurants and menu items. What would be the best way to link these tables? The obvious solutions could be:
Use columns OwnerType and OwnerID in the tables Comments and Ratings, but now I can't add foreign keys to link comments/ratings with the objects they are meant for
Create separate tables of Comments and Ratings for each table, e.g. MenuItemRatings, MenuItemComments etc. This solution has the advantage that all the correct foreign keys are present, and the obvious disadvantage of having lots and lots of tables with basically the same structure.
So, which solution works better? Or is there even a better solution that I don't know about?

Since comments about a menu item are different from comments about a restaurant (even if they happen to share the same structure) I would put them in separate tables and have the appropriate FKs to enforce some data integrity in your database.
I don't know why there is an aversion to having more tables in your database. Unless you're going from 50 tables to 50,000 tables, you're not going to see a performance problem due to large catalog tables (and having more, smaller tables in this case should actually give you better performance). I would also think it would be a lot clearer to deal with tables called "Menu_Item_Comments" and "Restaurant_Comments" than with a table called "Comments", whose name alone doesn't tell you what is really in it.

How about this (diagram): http://www.freeimagehosting.net/uploads/8241ff5c76.png

Have a single Comments/Ratings table for all the objects, and don't use automatically generated foreign keys. The key in the ratings table (e.g. RatingID) can be placed in a field in the Restaurant, Chain and MenuItems tables, and they can all point to the same table; they are still foreign keys.
If you need to know in reverse what object the review relates to you would need to have a field specifying the type of review it was, but that should be all.

Use a single table for comments, and use GUIDs as primary keys for your entities.
Then you can select comments without even knowing beforehand where they belong:
SELECT c.CommentText
FROM Comments c
JOIN Restaurants r ON c.Source = r.Id;
SELECT c.CommentText
FROM Comments c
JOIN Chains ch ON c.Source = ch.Id;
etc.
You can't use foreign keys for comments, of course, but comments can live without foreign keys.
You may clean up orphaned comments in triggers, but there's nothing bad if some of them are left.
You may also create a global Entity table (with a single GUID column), make your Chains, Restaurants, MenuItems and Comments refer to that table with a FOREIGN KEY ... ON DELETE CASCADE, and when deleting, say, a restaurant, delete it from that table instead. This deletes both the restaurant and all comments on it, and you still have your integrity.
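A minimal sketch of that global Entity table in SQLite via Python's sqlite3, storing GUIDs as text generated with the standard uuid module. Only Restaurants is shown (Chains and MenuItems would reference entity the same way); the lower-case table and column names are assumptions. Deleting the entity row cascades to both the restaurant and its comments.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
-- One row per commentable thing, whatever its kind (GUID stored as text).
CREATE TABLE entity (id TEXT PRIMARY KEY);
CREATE TABLE restaurants (
    id   TEXT PRIMARY KEY REFERENCES entity (id) ON DELETE CASCADE,
    name TEXT NOT NULL
);
CREATE TABLE comments (
    id           TEXT PRIMARY KEY,
    source       TEXT NOT NULL REFERENCES entity (id) ON DELETE CASCADE,
    comment_text TEXT NOT NULL
);
""")
rid = str(uuid.uuid4())
conn.execute("INSERT INTO entity VALUES (?)", (rid,))
conn.execute("INSERT INTO restaurants VALUES (?, ?)", (rid, 'Hypothetical Diner'))
conn.execute("INSERT INTO comments VALUES (?, ?, ?)",
             (str(uuid.uuid4()), rid, 'Great fries'))

# Delete from entity, not from restaurants: both dependent rows cascade away.
conn.execute("DELETE FROM entity WHERE id = ?", (rid,))
remaining = (conn.execute("SELECT COUNT(*) FROM restaurants").fetchone()[0],
             conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0])
print(remaining)  # (0, 0)
```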

If you want to take advantage of foreign key constraints and normalize the attributes of comments (and ratings) across base tables, you may need to create relationship tables between the base tables and comments (and ratings).
e.g. for Restaurants and Comments:
Restaurants
id (PK)
(attributes of restaurants...)
RestaurantComments
id (PK)
restaurantid (FK to Restaurants)
commentid (FK to Comments)
Comments
id (PK)
(attributes of comments...)
