MS SQL creating many-to-many relation with a junction table

I'm using Microsoft SQL Server Management Studio. When creating a junction table, should I add an ID column to it, and if so, should I also make it the primary key and identity column? Or should I keep just the 2 columns for the tables I'm joining in the many-to-many relation?
For example, if these were the many-to-many tables:
MOVIE
Movie_ID
Name
etc...
CATEGORY
Category_ID
Name
etc...
Should I make the junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
Movie_Category_Junction_ID
[and make the Movie_Category_Junction_ID my Primary Key and use it as the Identity Column] ?
Or:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
[and just leave it at that with no primary key or identity column] ?

I would use the second junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
The primary key would be the combination of both columns. You would also have a foreign key from each column to the Movie and Category table.
The junction table would look similar to this:
create table movie_category_junction
(
movie_id int,
category_id int,
CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
CONSTRAINT FK_movie
FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
CONSTRAINT FK_category
FOREIGN KEY (category_id) REFERENCES category (category_id)
);
Using these two fields as the PRIMARY KEY will prevent duplicate movie/category combinations from being added to the table.
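To see the duplicate check in action, here is a minimal sketch. It is an assumption-laden demo, not the SQL Server setup from the question: it uses Python's built-in sqlite3 module with an in-memory SQLite database so it can run anywhere, and the helper names (build_junction_db, try_link) and sample rows are made up for illustration.

```python
import sqlite3

def build_junction_db():
    """Create movie, category and the junction table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE movie_category_junction (
            movie_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
            CONSTRAINT FK_movie FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
            CONSTRAINT FK_category FOREIGN KEY (category_id) REFERENCES category (category_id)
        );
        -- Hypothetical sample data
        INSERT INTO movie VALUES (1, 'Alien');
        INSERT INTO category VALUES (10, 'Horror'), (20, 'Sci-Fi');
    """)
    return conn

def try_link(conn, movie_id, category_id):
    """Return True if the link row was inserted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO movie_category_junction (movie_id, category_id) VALUES (?, ?)",
            (movie_id, category_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling try_link(conn, 1, 10) twice should succeed the first time and be rejected the second time by the composite primary key; linking a movie_id that does not exist in movie should be rejected by the foreign key.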

There are different schools of thought on this. One school prefers including a primary key and naming the linking table something more significant than just the two tables it is linking. The reasoning is that although the table may start out seeming like just a linking table, it may become its own table with significant data.
An example is a many-to-many between magazines and subscribers. Really that link is a subscription with its own attributes, like expiration date, payment status, etc.
However, I think sometimes a linking table is just a linking table. The many-to-many relationship with categories is a good example of this.
So in this case, a separate one-field primary key is not necessary. You could have an auto-assign key, which wouldn't hurt anything and would make deleting specific records easier. It might be good as a general practice, so that if the table later develops into a significant table with its own significant data (like subscriptions), it will already have an auto-assign primary key.
You can put a unique index on the two fields to avoid duplicates. This will prevent duplicates even if you have a separate auto-assign key. Alternatively, you could use both fields as your primary key (which is also a unique index).
So one school of thought sticks with integer auto-assign primary keys and avoids compound primary keys. This is not the only way to do it, and maybe not the best, but it won't lead you into a problem you really regret.
But, for something like what you are doing, you will probably be fine with just the two fields. I'd still recommend either making the two fields a compound primary key, or at least putting a unique index on the two fields.
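The auto-assign key plus unique index variant can be sketched like this. Again this is only an illustration using SQLite through Python's sqlite3 module (in SQL Server the key would be an IDENTITY column instead of AUTOINCREMENT); the table name movie_category and the helpers are hypothetical, and the foreign keys are left out to keep the sketch short.

```python
import sqlite3

def build_surrogate_junction():
    """Junction table with its own auto-assigned key and a unique constraint on the pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movie_category (
            movie_category_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-assigned key
            movie_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            UNIQUE (movie_id, category_id)  -- still blocks duplicate pairs
        );
    """)
    return conn

def add_pair(conn, movie_id, category_id):
    """Insert a pair and return its auto-assigned id, or None if the pair already exists."""
    try:
        cur = conn.execute(
            "INSERT INTO movie_category (movie_id, category_id) VALUES (?, ?)",
            (movie_id, category_id),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

With this layout a specific link can be deleted by its single id (DELETE ... WHERE movie_category_id = ?), while the unique constraint keeps doing the duplicate check that the compound primary key would otherwise do.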

I would go with the 2nd junction table, but make those two fields the primary key. That will prevent duplicate entries.


Designing the primary key in associative table

Suppose I have an artist table like:
id | name
1  | John Coltrane
2  | Springsteen
and a song table like:
id | title
1  | Singing in the rain
2  | Mimosa
Now an artist can write more than one song, and a song can be written by more than one artist. We have a many-to-many relation. We need an associative table!
How to design the primary key of the associative table?
One way would be to define a composite key of the two foreign keys, like this:
CREATE TABLE artist_song_map(
artist_id INTEGER,
song_id INTEGER,
PRIMARY KEY(artist_id, song_id),
FOREIGN KEY(artist_id) REFERENCES artist(id),
FOREIGN KEY(song_id) REFERENCES song(id)
)
Another way would be to have a synthetic primary key and impose a unique constraint on the tuple of the two foreign keys:
CREATE TABLE artist_song_map(
id INTEGER PRIMARY KEY AUTOINCREMENT,
artist_id INTEGER,
song_id INTEGER,
UNIQUE(artist_id, song_id),
FOREIGN KEY(artist_id) REFERENCES artist(id),
FOREIGN KEY(song_id) REFERENCES song(id)
)
Which design choice is better?
Unless you define the table as WITHOUT ROWID, both statements create a rowid table, so the two designs end up nearly the same.
The id column in your 2nd design adds nothing but an alias for the rowid column, which is created in either case.
Since this is a bridge table, you only need to define the combination of the columns artist_id and song_id as UNIQUE.
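The rowid point can be checked with a small sketch. It runs SQLite through Python's sqlite3 module; the table names map_composite and map_synthetic and the function rowid_demo are hypothetical stand-ins for the two designs in the question, with the foreign keys omitted for brevity.

```python
import sqlite3

def rowid_demo():
    """Insert one link row into each design and read back the implicit rowid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Design 1: composite primary key
        CREATE TABLE map_composite (
            artist_id INTEGER,
            song_id INTEGER,
            PRIMARY KEY (artist_id, song_id)
        );
        -- Design 2: synthetic key plus unique constraint
        CREATE TABLE map_synthetic (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            artist_id INTEGER,
            song_id INTEGER,
            UNIQUE (artist_id, song_id)
        );
        INSERT INTO map_composite VALUES (1, 2);
        INSERT INTO map_synthetic (artist_id, song_id) VALUES (1, 2);
    """)
    composite_row = conn.execute(
        "SELECT rowid, artist_id, song_id FROM map_composite"
    ).fetchone()
    synthetic_row = conn.execute("SELECT rowid, id FROM map_synthetic").fetchone()
    return composite_row, synthetic_row
```

The first table exposes a rowid even though no id column was declared, and in the second table rowid and id come back as the same value.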
If you want to extend your design with other tables, like a playlist table, you will have to decide how it will be linked to the existing tables:
If there is no id column in artist_song_map then you will link playlist to song and artist, just like you did with artist_song_map.
If there is an id column in artist_song_map then you can link playlist directly to that id.
I suggest that you base your decision not only on these 3 tables (song, artist and artist_song_map), but also on the tables that you plan to add.
Logically, both designs are the same. But from an administration standpoint, the identity design is more efficient: there is less disk fragmentation, and future redesign or maintenance will be easier.
Bridge tables normally don't require an ID (AUTOINCREMENT) to identify the rows.
The linking columns (foreign keys) are the main point, as they link artists to songs.
Only when you need special attributes on that bridge, or you want to reference a row of that bridge table without carrying two linking columns, would you use such an ID field. But as I said, normally you never need it.
While the differences are generally minor, the composite (compound) primary key design seems more natural. A separate primary key, together with its associated index, takes additional space in the database. Further, if you use a composite primary key, you can declare the table as WITHOUT ROWID. According to the official docs, "in some cases, a WITHOUT ROWID table can use about half the amount of disk space and can operate nearly twice as fast".
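A WITHOUT ROWID version of the bridge table might look like the sketch below. It is run through Python's sqlite3 module; the comparison table map_rowid and the helpers build_maps and has_rowid are hypothetical, the foreign keys are left out for brevity, and the sketch makes no attempt to measure the space or speed claims.

```python
import sqlite3

def build_maps():
    """Create one ordinary rowid table and one WITHOUT ROWID table for comparison."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE map_rowid (
            artist_id INTEGER NOT NULL,
            song_id INTEGER NOT NULL,
            PRIMARY KEY (artist_id, song_id)
        );
        CREATE TABLE artist_song_map (
            artist_id INTEGER NOT NULL,
            song_id INTEGER NOT NULL,
            PRIMARY KEY (artist_id, song_id)
        ) WITHOUT ROWID;
    """)
    return conn

def has_rowid(conn, table):
    """Return True if the table exposes an implicit rowid column.

    The table name is interpolated directly, so only pass trusted names.
    """
    try:
        conn.execute(f"SELECT rowid FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False
```

The composite primary key still rejects duplicate (artist_id, song_id) pairs in the WITHOUT ROWID table; the difference is only that the hidden rowid column is gone.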

SQL Server use same Guid as primary key in 2 tables

We have 2 tables with a 1:1 relationship.
One table should reference the other; typically one would use an FK relationship.
Since there is a 1:1 relationship, we could also directly use the same Guid in both tables as primary key.
Additional info: the data is split into 2 tables since the data is rather separate, think "person" and "address" - but in a world where there is a clear 1:1 relationship between the 2.
Based on the tags that were suggested to me, I assume this is called a "shared primary key".
Would using the same Guid as PK in 2 tables have any ill effects?
To consolidate info from comments into answer...
No, there are no ill effects from two tables sharing a PK.
You will still need to create an FK reference from the 2nd table; the FK column will be the same as the PK column.
That said, your example of "Person" and "Address" in a 1:1 situation is not the best fit. This practice is commonly used for entities that extend one another. For example, a table "User" can hold common info on all users, while tables "Candidate" and "Recruiter" each expand on it, and all three tables can share the same PK. The programming-language representation would likewise be classes that extend one another.
Another (similar) example is a table that stores more detailed info than its base table, like "User" and "UserDetails". The relationship is 1:1, so there is no need to introduce an additional PK column.
Code sample where the PK is also an FK:
CREATE TABLE [User]
(
id INT PRIMARY KEY
, name NVARCHAR(100)
);
CREATE TABLE [Candidate]
(
id INT PRIMARY KEY FOREIGN KEY REFERENCES [User](id)
, actively_looking BIT
);
CREATE TABLE [Recruiter]
(
id INT PRIMARY KEY
, currently_hiring BIT
, FOREIGN KEY (id) REFERENCES [User](id)
);
PS: As mentioned, a GUID is not the best-suited column type for a PK due to performance issues, but that's another topic.
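To show what the shared PK/FK buys you at insert time, here is a rough sketch of the User/Candidate pair. It is a SQLite translation run through Python's sqlite3 module rather than the T-SQL above, so the column types differ, and the helpers build_user_db and add_candidate plus the sample user are hypothetical.

```python
import sqlite3

def build_user_db():
    """Create a base table and an extension table that reuses the base table's PK."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE "user" (
            id INTEGER PRIMARY KEY,
            name TEXT
        );
        CREATE TABLE candidate (
            id INTEGER PRIMARY KEY REFERENCES "user" (id),  -- PK is also the FK
            actively_looking INTEGER
        );
        -- Hypothetical sample row
        INSERT INTO "user" VALUES (1, 'Ann');
    """)
    return conn

def add_candidate(conn, user_id, actively_looking):
    """Return True if the candidate row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO candidate (id, actively_looking) VALUES (?, ?)",
            (user_id, int(actively_looking)),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A candidate row for an existing user is accepted once; a second row for the same user violates the primary key, and a row for a non-existent user violates the foreign key. Together those two constraints are what keep the relationship at most 1:1.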

composite primary key with practical approach

I just need to understand the concept behind the composite primary key. I have googled it and understand that it is a combination of more than one column of a table. But my question is: what is the practical use of this key over any data? When should I use this concept? Can you show me a practical usage of this key in Excel or SQL Server?
It may be a weird type of question for any SQL expert. I apologize if it comes across as an idiotic question; please forgive me.
A typical use-case for a composite primary key is a junction/association table. Consider orders and products. One order could have many products. One product could be in many orders. The orderProducts table could be defined as:
create table orderProducts (
orderId int not null references orders(orderId),
productId int not null references products(productId),
quantity int,
. . .
);
It makes sense to declare (orderId, productId) as a composite primary key. This would impose the constraint that any given order has any given product only once.
That said, I would normally use a synthetic key (orderProductId) and simply declare the combination as unique.
The benefit of a composite primary key is that it enforces uniqueness (which could also be done with a uniqueness constraint). It also wastes no space on an additional key.
There are downsides to composite primary keys as compared to identity keys:
Identity keys keep track of the order of inserts.
Identity keys are typically only 4 bytes.
Foreign key references consist of only one column.
By default, SQL Server clusters on primary keys. This imposes an ordering and can result in fragmentation (although that is doubtful for this example).
Let's say I have a table of cars. It includes the model and make of the cars. I do not want to insert the same exact car into my table, but there are cars that will have the same make and cars that will have the same model (assume both Ford and Toyota make a car called the 'BlergWagon').
I could enforce uniqueness of make/model with a composite key that includes both values. A unique key on just make would not allow me to add more than 1 Toyota and a unique key on just model would not allow me to enter more than 1 BlergWagon.
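The make/model case can be tried out with a short sketch. It uses SQLite through Python's sqlite3 module purely for convenience; the table name car, the helper names and the 'Hatchback' model are made up for the example.

```python
import sqlite3

def build_car_db():
    """Table where the (make, model) pair, not either column alone, must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE car (
            make TEXT NOT NULL,
            model TEXT NOT NULL,
            PRIMARY KEY (make, model)
        );
    """)
    return conn

def add_car(conn, make, model):
    """Return True if the car was inserted, False if the same make/model already exists."""
    try:
        conn.execute("INSERT INTO car (make, model) VALUES (?, ?)", (make, model))
        return True
    except sqlite3.IntegrityError:
        return False
```

Ford BlergWagon and Toyota BlergWagon can both be stored, and so can a second Toyota model, but a second Ford BlergWagon is rejected.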
Another example would be grades, terms, years, students, and classes. I could enforce uniqueness for a student in a class and a specific semester in a specific year so that my table does not have 2 dupe records that show the same class in the same semester in the same year with the same student.
Another part of your post is about the primary key, which I'll assume means you are talking about a clustered index. A clustered index enforces the order of the table. So you could put it on an identity column to order the table and add a unique, nonclustered index to enforce uniqueness on your other columns.

One Primary Key Value in many tables

This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL developer). I have amongst other tables a table called: Manufacturer and a table called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
For both, I have created columns, each table having its own primary key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of Manufacturer and Parentcompany.
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.
A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each row/record in a database table. Primary keys must contain unique values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or multiple fields. When multiple fields are used as a primary key, they are called a composite key.
If a table has a primary key defined on any field(s), then you cannot have two records having the same value of that field(s).
A primary key is table-unique. You can use the same PK value in every separate table in the DB. In fact, that often happens, since a PK is often an incremental number representing the ID of a row: 1, 2, 3, 4...
For your case, a more common implementation would be a hierarchical table called Company with the fields company_name and parent_company_name. If a company has a parent, its parent_company_name field holds a value from the company_name field.
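A hierarchical Company table along those lines could be sketched as follows. The sketch uses SQLite via Python's sqlite3 module rather than Oracle, and the company names ('ParentCo', 'MakerA', 'MakerB') and helper functions are hypothetical.

```python
import sqlite3

def build_company_db():
    """Single table where each company may point at its parent in the same table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE company (
            company_name TEXT NOT NULL PRIMARY KEY,
            -- NULL means the company has no parent
            parent_company_name TEXT REFERENCES company (company_name)
        );
        INSERT INTO company VALUES ('ParentCo', NULL);
        INSERT INTO company VALUES ('MakerA', 'ParentCo');
        INSERT INTO company VALUES ('MakerB', 'ParentCo');
    """)
    return conn

def add_company(conn, name, parent=None):
    """Return True if the company was inserted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO company VALUES (?, ?)", (name, parent))
        return True
    except sqlite3.IntegrityError:
        return False

def parent_of(conn, name):
    """Return the parent company's name, or None if there is no parent (or no such company)."""
    row = conn.execute(
        "SELECT parent_company_name FROM company WHERE company_name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

The self-referencing foreign key means a company can only name a parent that already exists in the same table.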
There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a 1 to 1 correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.
Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then, as I go along, I will create specific company tables like Manufacturer and Parentcompany that reference a certain company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique, and can't have the same value as another. So then this condition only goes for tables, not the whole database. Thanks for the clarification!

How should this database sub type relationship be modelled?

I am revising a legacy multi-tenant application where the shopping cart function stores multiple vendors and multiple clients in the same database. Some clients of one vendor may be clients of a different vendor. Some vendors might actually be clients of another vendor.
I currently have a table for the super-type 'party' with primary key party_ID, a table for the subtype 'company' with primary key company_ID (references party_ID) and a table for the role of 'vendor' with primary key vendor_ID (references company_ID). I also have a junction table, 'client' with a composite primary key of vendor_ID and party_ID.
My question is how should the 'order' table reference the vendor and client tables? My first thought is that the table should have a composite primary key of vendor_ID, client_ID and order_ID (order_ID could be auto-increment across the table or sequential per vendor_ID + client_ID)
but this seemed a bit fishy as there were three attributes making up the key...
Does anyone have any insight into this topic? Most 'shopping carts' only deal with a single vendor, so the order table simply lists client_ID as a foreign key.
Thanks!
My question is how should the order table reference the vendor and client tables? My first thought is that the table should have a composite primary key of 'vendor_ID', 'client_ID' and 'order_ID' but this seemed a bit fishy as there were three keys...
Composite primary key doesn't mean three keys. It means one key consisting of three columns.
But that's not the real issue.
An order is an accounting record; it must not change over time. Storing the ID numbers is risky unless you've built temporal tables, and I doubt you've done that. If a vendor changes its name today, its name no longer matches the name on earlier orders. You must not let that happen with accounting records.
Unless you mean something unusual by "order", I'd expect Order_id to be its primary key. There might be other constraints; there might even be other key constraints to prevent duplicate orders that differ only by Order_id. But I'd still expect Order_id to be the primary key of a table of orders.
If vendors and clients are subtypes, I'd expect any (high risk) id numbers you store to reference the id numbers in the subtype tables. In your case, you seem to have an additional table that identifies the clients of vendors; it contains the columns {vendor_id, client_id}. The foreign key references for that table should be obvious.
Your table of orders should have one foreign key reference to that table, not one foreign key to vendors and another foreign key to clients. So in the table of orders, foreign key (vendor_id, client_id) references vendor_clients (vendor_id, client_id). The table of vendor clients will need either a primary key constraint or a unique constraint on {vendor_id, client_id}.
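As a sketch of that single composite foreign key, assuming SQLite through Python's sqlite3 module and made-up ids (the helpers build_order_db and place_order are hypothetical, and the name/text snapshot columns discussed for accounting are left out):

```python
import sqlite3

def build_order_db():
    """Orders reference the (vendor_id, client_id) pair, not vendors and clients separately."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE vendor_clients (
            vendor_id INTEGER NOT NULL,
            client_id INTEGER NOT NULL,
            PRIMARY KEY (vendor_id, client_id)
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            vendor_id INTEGER NOT NULL,
            client_id INTEGER NOT NULL,
            FOREIGN KEY (vendor_id, client_id)
                REFERENCES vendor_clients (vendor_id, client_id)
        );
        -- Hypothetical pairs: client 100 buys from vendor 1, client 200 from vendor 2
        INSERT INTO vendor_clients VALUES (1, 100), (2, 200);
    """)
    return conn

def place_order(conn, vendor_id, client_id):
    """Return the new order_id, or None if the vendor/client pair is not registered."""
    try:
        cur = conn.execute(
            "INSERT INTO orders (vendor_id, client_id) VALUES (?, ?)",
            (vendor_id, client_id),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

An order for vendor 1 and client 200 is rejected even though both ids exist somewhere in vendor_clients, because that particular pair does not. Two separate single-column foreign keys would not catch this.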
But you shouldn't do that for accounting unless you're using temporal tables. Instead, you should probably store both the id numbers and the text.
I would start with something like this. I admit that I still do not quite understand the difference between company, vendor, and client in your question. As Catcall mentioned, in this model you are not allowed to delete Parties (People, Organizations); accounting records should be frozen, usually by capturing the current customer/supplier info in the order table.
For your primary key, you'll want just order_id.
Really, the composite (and unique) key I would use would be [vendor_id, client_id, occurredAt] (where occurredAt is a timestamp) - assuming orders could only be placed once a millisecond. However, this is something of a wide key, and some systems don't appreciate those. You'll still need these columns, and probably indexed, however.