Db design relationship between n:n tables? - sql

I have poor skills for db design and I need some help about relations setup.
So use case is:
User which can be Coach or Client.
Client can have many coaches and coaches can have many clients.
Coach can create many workouts for client.
Client can also have many workouts assigned from coach.
Workout have many sets, sets have many exercises and reps etc...
So I created design on image.
But what make me feel that I am doing everything wrong is double keys.
In this table client_has_coach_has_workout I have two keys referencing n:n table and it confuses me as when I need to import data it will always have both keys from n:n table.
Any help please.

This design can work if you want it to, but I would consider reversing the link between coach and client to user.
remove client_id from user
remove coach_id from user
Add user_id as a foreign key to client
Add user_id as a foreign key to coach
Ultimately yes you are correct that now there are composite keys, but this is not a problem because during importing you would always need to lookup both the client_id and the coach_Id anyway.
To simplify the model logic, you can add an arbitrary Id key column to client_has_coach then client_has_coach_has_workout only needs a single foreign key client_has_coach_id that links back to the client_has_coach table, this forces us to lookup the specific linking record from client_has_coach and helps us enforce it's existence.
If you do use a composite key, then we can generally skip the lookup for client_has_coach and assume that it already exists, but this can lead to orphaned rows or scenarios where there is no record in client_has_coach corresponding to a combination of keys in client_has_coach_has_workout.
Composite keys work well in models where we do not need to maintain integrity between client_has_coach and client_has_coach_has_workout. But your naming conventions suggests that this is not a desirable aspect in your model.
Either way, on import of data you would need to first lookup the coach record, then lookup the client record to check for or create the record in client_has_coach, you would then import corresponding rows into client_has_coach_has_workout.

Related

Adding an artificial primary key versus using a unique field [duplicate]

This question already has answers here:
Surrogate vs. natural/business keys [closed]
(19 answers)
Why would one consider using Surrogate keys vs Natural with ON UPDATE CASCADE?
(1 answer)
Closed 7 months ago.
Recently I Inherited a huge app from somebody who left the company.
This app used a SQL server DB .
Now the developer always defines an int base primary key on tables. for example even if Users table has a unique UserName field , he always added an integer identity primary key.
This is done for every table no matter if other fields could be unique and define primary key.
Do you see any benefits whatsoever on this? using UserName as primary key vs adding UserID(identify column) and set that as primary key?
I feel like I have to add add another element to my comments, which started to produce an essay of comments, so I think it is better that I post it all as an answer instead.
Sometimes there are domain specific reasons why a candidate key is not a good candidate for joins (maybe people change user names so often that the required cascades start causing performance problems). But another reason to add an ever-increasing surrogate is to make it the clustered index. A static and ever-increasing clustered index alleviates a high-cost IO operation known as a page split. So even with a good natural candidate key, it can be useful to add a surrogate and cluster on that. Read this for further details.
But if you add such a surrogate, recognise that the surrogate is purely internal, it is there for performance reasons only. It does not guarantee the integrity of your data. It has no meaning in the model, unless it becomes part of the model. For example, if you are generating invoice numbers as an identity column, and sending those values out into the real world (on invoice documents/emails/etc), then it's not a surrogate, it's part of the model. It can be meaningfully referenced by the customer who received the invoice, for example.
One final thing that is typically left out of this discussion is one particular aspect of join performance. It is often said that the primary key should also be narrow, because it can make joins more performant, as well as reducing the size of non-clustered indexes. And that's true.
But a natural primary key can eliminate the need for a join in the first place.
Let's put all this together with an example:
create table Countries
(
countryCode char(2) not null primary key clustered,
countryName varchar(64) not null
);
insert Countries values
('AU', 'Australia'),
('FR', 'France');
create table TourLocations
(
tourLocationName varchar(64) not null,
tourLocationId int identity(1,1) unique clustered,
countryCode char(2) not null foreign key references Countries(countryCode),
primary key (countryCode, tourLocationName)
);
insert TourLocations (TourLocationName, countryCode) values
('Bondi Beach', 'AU'),
('Eiffel Tower', 'FR')
I did not add a surrogate key to Countries, because there aren't many rows and we're not going to be constantly inserting new rows. I already know what all the countries are, and they don't change very often.
On the TourLocations table I have added an identity and clustered on it. There could be very many tour locations, changing all the time.
But I still must have a natural key on TourLocations. Otherwise I could insert the same tour location name with the same country twice. Sure, the Id's will be different. But the Id's don't mean anything. As far as any real human is concerned, two tour locations with the same name and country code are completely indistinguishable. Do you intend to have actual users using the system? Then you've got a problem.
By putting the same country and location name in twice I haven't created two facts in my database. I have created the same fact twice! No good. The natural key is necessary. In this sense The Impaler's answer is strictly, necessarily, wrong. You cannot not have a natural key. If the natural key can't be defined as anything other than "every meaningful column in the table" (that is to say, excluding the surrogate), so be it.
OK, now let's investigate the claim that an int identity key is advantageous because it helps with joins. Well, in this case my char(2) country code is narrower than an int would have been.
But even if it wasn't (maybe we think we can get away with a tinyint), those country codes are meaningful to real people, which means a lot of the time I don't have to do the join at all.
Suppose I gave the results of this query to my users:
select countryCode, tourLocationName
from TourLocations
order by 1, 2;
Very many people will not need me to provide the countries.countryName column for them to know which country is represented by the code in each of those rows. I don't have to do the join.
When you're dealing with a specific business domain that becomes even more likely. Meaningful codes are understood by the domain users. They often don't need to see the long description columns from the key table. So in many cases no join is required to give the users all of the information they need.
If I had foreign keyed to an identity surrogate I would have to do the join, because the identity surrogate doesn't mean anything to anyone.
You are talking about the difference between synthetic and natural keys.
In my [very] personal opinion, I would recommend to always use synthetic keys (and always call it id). The main problem is that natural keys are never unique; they are unique in theory, yes, but in the real world there are a myriad of unexpected and inexorable events that will make this false.
In database design:
Natural keys correspond to values present in the domain model. For example, UserName, SSN, VIN can be considered natural keys.
Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as a unique identifiers for the rows.
I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".
Yes having a dedicated int is a good thing for PK use.
you may have multiple alternate keys, that's ok too.
two great reasons for it:
it is performant
it protects against key mutation ( editing a name etc. )
A username or any such unique field that holds meaningful data is subject to changes. A name may have been misspelled or you might want to edit a name to choose a better one, etc. etc.
Primary keys are used to identify records and, in conjunction with foreign keys, to connect records in different tables. They should never change. Therefore, it is better to use a meaningless int field as primary key.
By meaningless I mean that apart from being the primary key it has no meaning to the users.
An int identity column has other advantages over a text field as primary key.
It is generated by the database engine and is guaranteed to be unique in multi-user scenarios.
it is faster than a text column.
Text can have leading spaces, hidden characters and other oddities.
There are multiple kinds of text data types, multiple character sets and culture dependent behaviors resulting in text comparisons not always working as expected.
int primary keys generated in ascending order have a superior performance in conjunction with clustered primary keys (which is a SQL-Server specialty).
Note that I am talking from a database point of view. In the user interface, users will prefer identifying entries by name or e-mail address, etc.
But commands like SELECT, INSERT, UPDATE or DELETE will always identify records by the primary key.
This subject - quite much like gulivar travels and wars being fought over which end of the egg you supposed to crack open to eat.
However, using the SAME "id" name for all tables, and autonumber? Yes, it is LONG establihsed choice.
There are of course MANY different views on this subject, and many advantages and disavantages.
Regardless of which choice one perfers (or even needs), this is a long established concept in our industry. In fact SharePoint tables use "ID" and autonumber by defualt. So does ms-access, and there probably more that do this.
The simple concpet?
You can build your tables with the PK and child tables with forighen keys.
At that point you setup your relationships between the tables.
Now, you might decide to add say some invoice number or whatever. Rules might mean that such invoice number is not duplicated.
But, WHY do we care of you have some "user" name, or some "invoice" number or whatever. Why should that fact effect your relational database model?
You mean I don't have a user name, or don't have a invoice number, and the whole database and relatonships don't work anymore? We don't care!!!!
The concept of data, even required fields, or even a column having to be unique ?
That has ZERO to do with a working relational data model.
And maybe you decide that invoice number is not generated until say sent to the customer. So, the fact of some user name, invoice number or whatever? Don't care - you can have all kinds of business rules for those numbers, but they have ZERO do to do with the fact that you designed a working relational data model based on so called "surrogate" or sometime called synthetic keys.
So, once you build that data model - even with JUST the PK "id" and FK (forighen keys), you are NOW free to start adding columns and define what type of data you going to put in each table. but, what you shove into each table has ZERO to do with that working related data model. They are to be thought as seperate concpets.
So, if you have a user name - add that column to the table. If you don't want users name, remove the column. As such data you store in the table has ZERO to do with the automatic PK ID you using - it not really any different then say what area of memory the computer going to allocate to load that data. Basic data operations of the system is has nothing to do with having build database with relationships that simple exist. And the data columns you add after having built those relationships is up to you - but will not, and should not effect the operation of the database and relationships you built and setup. Not only are these two concepts separate, but they free the developer from having to worry about the part that maintains the relationships as opposed to data column you add to such tables to store user data.
I mean, in json data, xml? We often have a master + child table relationship. We don't care how that relationship is maintained - but only that it exists.
Thus yes, all tables have that pk "ID". Even better? in code, you NEVER have to guess what the PK id is - it always the same!!!
So, data and columns you put and toss into a table? Those columns and data have zero to do with the PK id, and while it is the database generating that PK? It could be a web service call to some monkeys living in a far away jungle eating banana's and they give you a PK value based on how many bananas they eaten. We just really don't' care about that number - it is just internal house keeping numbers - one that we don't see or even care about in most code. And thus the number one rule to such auto matic PK values?
You NEVER give that auto PK number any meaning from a user and applcation point of view.
In summary:
Yes, using a PK called "id" for all tables? Common, and in fact in SharePoint and many systems, it not only the default, but is in fact required for such systems to operate.
Its better to use userid. User table is referenced by many other tables.
The referenced table would contain the primary key of the user table as foreign key.
Its better to use userid since its integer value,
it takes less space than string values of username and
the searches by the database engine would be faster
user(userid, username, name)
comments(commentid, comment, userid) would be better than
comments(commentid, comment, username)

SQL tables design layout

I'm trying to design my database with very basic tables and I am confused on the CORRECT way to do it.
I've attached a picture of the main info, and I'm not quite sure how to link them. Meaning what should be a foreign key, or should some of these tables include of LIST<> of the other tables.
UPDATE TO TABLES
As per your requirements, You are right about the associative table
Client can have multiple accounts And Accounts can have multiple clients
Then, Many (Client) to Many (Account)
So, Create an associate table to break the many to many relationship first. Then join it that way
Account can have only one Manager
Which means One(Manager) to Many(Accounts)
So, add an attribute called ManagerID in Accounts
Account can have many traedetail
Which means One(Accounts) to Many(TradeDetails)
So, add an attribute called AccountID in TradeDetails
Depends on whether you are looking to have a normalized database or some other type of design paradigm. I recommend doing some reading on the concepts of database normalization and referential integrity.
What I would do is make tables that have a 1 to 1 relationship such as account/manager into a single table (unless you can think of a really good reason not to). Add Clientid as a foreign key to Account. Add AccountID as a foreign key to TradeDetail. You are basically setting up everything as 1 to many relationships where the table that has 1 record for the id has the field as a primary key and the table that has many has it as a foreign key.

Is it possible to implement a TRUE one-to-one relation?

Consider the following model where a Customer should have one and only one Address and an Address should belong to one and only one Customer:
To implement it, as almost everybody in DB field says, Shared PK is the solution:
But I think it is a fake one-to-one relationship. Because nothing in terms of database relationship actually prevents deleting any row in table Address. So truely, it is 1..[0..1] not 1..1
Am I right? Is there any other way to implement a true 1..1 relation?
Update:
Why cascade delete is not a solution:
If we consider cascade delete as a solution we should put this on either of the tables. Let's say if a row is deleted from table Address, it causes corresponding row in table Customer to be deleted. it's okay but half of the solution. If a row in Customer is deleted, the corresponding row in Address should be deleted as well. This is the second half of the solution, and it obviously makes a cycle.
Beside my comment
You could implement DELETE CASCADE See HOW
I realize there is also the problem of insert.
You have to insert Customer first and then Address
So I think the best way if you really want a 1:1 is create a single table instead.
Customer
CustomerID
Name
Address
City
Sorry, is this meant to be a real-world database relationship? In all of the many databases I have ever built with customer data, there has always been real cases of either customers with multiple addresses, or more than one organisation at the same address.
I wouldn't want to lead you into a database modelling fallacy by suggesting anything different.
Yes, the "shared PK" idiom you show is for 1-to-0-or-1.
The straightforward way to have a true 1-to-1 correspondence is to have one table with Customer and Address as CKs (candidate keys). (Via UNIQUE NOT NULL and/or PRIMARY KEY.) You could offer the separate tables as views. Unfortunately typical DBMSs have restrictions on what you can do via the views, in particular re updating.
The relational way to have separate CUSTOMER and ADDRESS tables and a third table/association/relationship with Customer and Address columns as CKs plus FKs on Customer to and from CUSTOMER and on Address to and from ADDRESS (or equivalent constraint(s)). Unfortunately most DBMSs needlessly won't let you declare cycles in FKs and you cannot impose the constraints without triggers/complexity. (Ultimately, if you want to have proper integrity in a typical SQL database you need to use triggers and complex idioms.)
Entity-oriented design methods unfortunately artificially distinguish between entities, associations and properties. Here is an example where if you consider the simplest design to simply be the one table with PKs then you don't want to always have to have distinct tables for each entity. Or if you consider the simplest design to be the three tables (or even two) with the PKs and FKs (or some other constraint(s) for 1-to-1) then unfortunately typical DBMSs just don't declaratively/ergonomically support that particular design situation.
(Straightforward relational design is to have values (that are sometimes used as ids) 1-to-1 with application things but then just have whatever relevant application relationships/associations/relations and corresponding/representing tables/relations as needed to describe your application situations.)
It's possible in principle to implement a true 1-1 data structure in some DBMSs. It's very difficult to add data or modify data in such a structure using standard SQL however. Standard SQL only permits one table to be updated at a time and therefore as soon as you insert a row into one or other table the intended constraint is broken.
Here are two examples. First using Tutorial D. Note that the comma between the two INSERT statements ensures that the 1-1 constraint is never broken:
VAR CUSTOMER REAL RELATION {
id INTEGER} KEY{id};
VAR ADDRESS REAL RELATION {
id INTEGER} KEY{id};
CONSTRAINT one_to_one (CUSTOMER{id} = ADDRESS{id});
INSERT CUSTOMER RELATION {
TUPLE {id 1234}
},
INSERT ADDRESS RELATION {
TUPLE {id 1234}
};
Now the same thing in SQL.
CREATE TABLE CUSTOMER (
id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE ADDRESS (
id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO CUSTOMER (id)
VALUES (1234);
INSERT INTO ADDRESS (id)
VALUES (1234);
ALTER TABLE CUSTOMER ADD CONSTRAINT one_to_one_1
FOREIGN KEY (id) REFERENCES ADDRESS (id);
ALTER TABLE ADDRESS ADD CONSTRAINT one_to_one_2
FOREIGN KEY (id) REFERENCES CUSTOMER (id);
The SQL version uses two foreign key constraints, which is the only kind of multi-table constraint supported by most SQL DBMSs. It requires two INSERT statements which means I could only insert a row before adding the constraints, not after.
A strict one-to-one constraint probably isn't very useful in practice but it's actually just a special case of something more important and interesting: join dependency. A join dependency is effectively an "at least one" constraint between tables rather than "exactly one". In the world outside databases it is common to encounter examples of business rules that ought to be implemented as join dependencies ("each customer must have AT LEAST ONE addresss", "each order must have AT LEAST ONE item in it"). In SQL DBMSs it's hard or impossible to implement join dependencies. The usual solution is simply to ignore such business rules thus weakening the data integrity value of the database.
Yes, what you say is true, the dependent side of a 1:1 relationship may not exist -- if only for the time it takes to create the dependent entity after creating the independent entity. In fact, all relationships may have a zero on one side or the other. You can even turn the relationship into a 1:m by placing the FK of the address in the Customer row and making the field not null. You can still have addresses that aren't referenced by any customer.
At first glance, a m:n may look like an exception. The intersection entry is generally defined so that neither FK can be null. But there can be customers and addresses both that have no entry referring to them. So this is really a 0..m:0..n relationship.
What of it? Everyone I've ever worked with has understood that "one" (as in 1:1) or "many" (as in 1:m or m:n) means "no more than this." There is no "exactly this, no more or less." For example, we can design a 1:3 relationship on paper. We cannot strictly enforce it in any database. We have to use triggers, stored procedures and/or scheduled tasks to seek out and call our attention to deviations. Execute a stored procedure weekly, for instance, that will seek and and flag or delete any such orphaned addresses.
Think of it like a "frictionless surface." It exists only on paper.
I see this question as a conceptual misunderstanding. Relations are between different things. Things with a "true 1-to-1 relation" are by definition aspects or attributes of the same thing, and belong in the same table. No, of course a person and and address are not the same, but if they are inseparable, and must always be inserted, deleted, or otherwise acted upon as a unit, then as data they are "the same thing". This is exactly what is described here.
Yes, and it's actually quite easy: just put both entities in the same table!
OTOH, if you need to keep them in separate tables for some reason, then you need a key in one table referencing1 a key in another, and vice-versa. This, of course, represents a "chicken and egg" problem2 which can be resolved by deferring the enforcement of FKs to the end of the transaction3. This works only on DBMSes that support deferred constraints (such as Oracle and PostgreSQL).
1 Via a foreign key.
2 Inserting a row in the first table is impossible because that would violate the referential integrity towards the second table, but inserting a row in the second table is impossible because that would violate the referential integrity towards the first table, etc... Ditto for deletion.
3 So you simply insert both rows, and then check both FKs.

Should one-to-one relationships ever be split into two tables?

I have one table [Users] and another table [Administrators] linked 1:0..1 . Is it best practice to merge these tables? I have read a lot of answers on SO stating splitting tables is only necessary for one-to-many relationships.
My reasoning for separating them is so I can reference administrators with AdministratorId rather than the general UserId. In other tables I have fields which should only ever contain an administrator so it acts as a referential check.
There is a rule of thumb that states a table either models an entity/class or the relationship between entities/classes but not both. However, it is only a rule of thumb, never say never!
SQL generally has a problem with dedicated 1:1 relationship tables because the only inter-table constraints commonly found are foreign keys. However, a FK does not require that a value exists in the referencing table. This makes the relationship 1:0..1 ("one-to-zero-or-one"), which is usually acceptable.
Strict 1:1 requires a workaround. Because SQL lacks multiple assignment, the workaround usually involves resorting to procedural code e.g. two deferrable 'bi-directional' FKs; triggers; forcing updates via CRUD stored procs; etc.
In contrast, modelling a 1:1 relationship in the same table is easy: declare both columns as NOT NULL!
I think the best option is to have two tables for the two different entities, Users and Administrators, possibly with the same Primary Key.
CREATE TABLE User
( UserId int
, ... other data --- data for all users
, PRIMARY KEY (UserId)
) ;
CREATE TABLE Administrator
( AdministratorId int
, ... other data --- data for administrators only
, PRIMARY KEY (AdministratorId)
, FOREIGN KEY AdministratorId
REFERENCES User(UserId)
) ;
This way, as you mention, other tables can reference the AdministratorId:
CREATE TABLE OtherTable
( OtherTableId int
, AdministratorId int
, ... other data
, ...
, FOREIGN KEY AdministratorId
REFERENCES Administrator(AdministratorId)
) ;
Benefits:
referential integrity is trivially implemented.
the relevant data (for Users and Admins) can be stored in the relevant tables so you have less columns in the tables and fewer NULL data.
any query that needs a JOIN to Administrator table will have to look up only a few rows, compared to the (possibly huge) number of rows of the User table. If you have only one table, you'll end up with code like:
WHERE User.admin = True
which may not be easily optimized.
There are several reasons why you might want to keep them separate. One is if the records in one table represent a subset of the records in the other. This patterns is called sub-classing, and is clearly the case in your situation.
This is wise even if the fields (the data) you need to store about admins are not different from the data you need to store about all users. Another reason is if the the usage patterns for a few columns is very different (greater frequency of access) from the usage patterns for the rest of the columns.
Let me stick to the title of your Question first.
Yes, One to one relationship can be split into different tables when you need the following:
Modularity
Security / Data Abstraction
I guess the modularity bit is quite clear.
The Security bit is obvious too since you will require access to both the tables to fetch the whole picture of data.
For Example, Let's suppose you have the below as the scenario:
Wherein, Customer has a dummy id for passport and Passport table doesn't have corresponding customer info.
Hence, the person with access to both tables can only map the passport id with the customer and see the whole picture.
It is common to have tables in a one-to-one relationship. First if they are separate entities that will need to be queried separately or if they are subclasses (as in your case) then the separate table makes sense. Also if the primary table is getting too large for the maximum record size it makes sense to have an additional table in a one-to-one relationship. Finally you have the case where the relationship is 1-1 now, but has the potential to be 1-many in the future such as when you only have one phone number now but wil probably need to store more later. In this case it will be less work to go ahead and make it a separate table.
The crtical piece to setting up a 1-1 relationship is to enforce it as 1-1. The easiest way to do this is to make the FK field also the PK field in the second table.

Why do I read so many negative opinions on using composite keys?

I was working on an Access database which loved auto-numbered identifiers. Every table used them except one, which used a key made up of the first name, last name and birthdate of a person. Anyways, people started running into a lot of problems with duplicates, as tables representing relationships could hold the same relationship twice or more. I decided to get around this by implementing composite keys for the relationship tables and I haven't had a problem with duplicates since.
So I was wondering what's the deal with the bad rep of composite keys in the Access world? I guess it's slightly more difficult to write a query, but at least you don't have to put in place tons of checks every time data is entered or even edited in the front end. Are they incredibly super inefficient or something?
A composite key works fine for a single table, but when you start to create relations between tables it can get a bit much.
Consider two tables Person and Event, and a many-to-many relations between them called Appointment.
If you have a composite key in the Person table made up of the first name, last name and birth date, and a compossite key in the Event table made up of place and name, you will get five fields in the Appointment table to identify the relation.
A condition to bind the relation will be quite long:
select Person,*, Event.*
from Person, Event, Appointment
where
Person.FirstName = Appointment.PersonFirstName and
Person.LastName = Appointment.PersonLastName and
Person.BirthDate = Appointment.PersonBirthDate and
Event.Place = Appointment.EventPlace and
Event.Name = Appointment.EventName`.
If you on the other hand have auto-numbered keys for the Person and Event tables, you only need two fields in the Appointment table to identify the relation, and the condition is a lot smaller:
select Person,*, Event.*
from Person, Event, Appointment
where
Person.Id = Appointment.PersonId and Event.Id = Appointment.EventId
If you only use pure self-written SQL to access your data, they are OK.
However, some ORMs, adapters etc. require having a single PK field to identify a record.
Also note that a composite primary key is almost invariably a natural key (there is hardly a point in creating a surrogate composite key, you can as well use a single-field one).
The most common usage of a composite primary key is a many-to-many link table.
When using the natural keys, you should ensure they are inherently unique and immutable, that is an entity is always identified by the same value of the key, once been reflected by the model, and only one entity can be identified by any value.
This it not so in your case.
First, a person can change their name and even the birthdate
Second, I can easily imagine two John Smiths born at the same day.
The former means that if a person changes their name, you will have to update it in each and every table that refers to persons; the latter means that the second John Smith will not be able to make it into your database.
For the case like yours, I would really consider adding a surrogate identifier to your model.
Unfortunately one reason for those negative opinions is probably ignorance. Too many people don't understand the concept of Candidate Keys properly. There are people who seem to think that every table needs only one key, that one key is sufficient for data integrity and that choosing that one key is all that matters.
I have often speculated that it would be a good thing to deprecate and phase out the use of the term "primary key" altogether. Doing that would focus database designers minds on the real issue: that a table should have as many keys as are necessary to ensure the correctness of the data and that some of those keys will probably be composite. Abolishing the primary key concept would do away with all those fatuous debates about what the primary key ought to be or not be.
If your RDBMS supports them and if you use them correctly (and consistently), unique keys on the composite PK should be sufficient to avoid duplicates. In SQL Server at least, you can also create FKs against a unique key instead of the PK, which can be useful.
The advantage of a single "id" column (or surrogate key) is that it can improve performance by making for a narrower key. Since this key may be carried to indexes on that table (as a pointer back to the physical row from the index row) and other tables as a FK column that can decrease space and improve performance. A lot of it depends on the specific architecture of your RDBMS though. I'm not familiar enough with Access to comment on that unfortunately.
As Quassnoi points out, some ORMs (and other third party applications, ETL solutions, etc.) don't have the capability to handle composite keys. Other than some ORMs though, most recent third party apps worth anything will support composite keys though. ORMs have been a little slower in adopting that in general though.
My personal preference for composite keys is that although a unique index can solve the problem of duplicates, I've yet to see a development shop that actually fully used them. Most developers get lazy about it. They throw on an auto-incrementing ID and move on. Then, six months down the road they pay me a lot of money to fix their duplicate data issues.
Another issue, is that auto-incrementing IDs aren't generally portable. Sure, you can move them around between systems, but since they have no actual basis in the real world it's impossible to determine one given everything else about an entity. This becomes a big deal in ETL.
PKs are a pretty important thing in the data modeling world and they generally deserve more thought then, "add an auto-incrementing ID" if you want your data to be consistent and clean.
Surrogate keys are also useful, but I prefer to use them when I have a known performance issue that I'm trying to deal with. Otherwise it's the classic problem of wasting time trying to solve a problem that you might not even have.
One last note... on cross-reference tables (or joining tables as some call them) it's a little silly (in my opinion) to add a surrogate key unless required by an ORM.
Composite Keys are not just composite primary keys, but composite foreign keys as well. What do I mean by that? I mean that each table that refers back to the original table needs a column for each column in the composite key.
Here's a simple example, using a generic student/class arrangement.
Person
FirstName
LastName
Address
Class
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime
StudentClass - a many to many join table
StudentFirstName
StudentLastName
StudentAddress
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime
You just went from having a 2-column many-to-many table using surrogate keys to having an 8-column many-to-many table using composite keys, because they have 3 and 5 column foreign keys. You can't really get rid of any of these fields, because then the records wouldn't be unique, since both students and instructors can have duplicate names. Heck, if you have two people from the same address with the same name, you're still in serious trouble.
Most of the answers given here don't seem to me to be given by people who work with Access on a regular basis, so I'll chime in from that perspective (though I'll be repeating what some of the others have said, just with some Access-specific comments).
I use surrogate a key only when there is no single-column candidate key. This means I have tables with surrogate PKs and with single-column natural PKs, but no composite keys (except in joins, where they are the composite of two FKs, surrogate or natural doesn't matter).
Jet/ACE clusters on the PK, and only on the PK. This has potential drawbacks and potential benefits (if you consider a random Autonumber as PK, for instance).
In my experience, the non-Null requirement for a composite PK makes most natural keys impossible without using potentially problematic default values. It likewise wrecks your unique index in Jet/ACE, so in an Access app (before 2010), you end up enforcing uniqueness in your application. Starting with A2010, table-level data macros (which work like triggers) can conceivably be used to move that logic into the database engine.
Composite keys can help you avoid joins, because they repeat data that with surrogate keys you'd have to get from the source table via a join. While joins can be expensive, it's mostly outer joins that are a performance drain, and it's only with non-required FKs that you'd get the full benefit of avoiding outer joins. But that much data repetition has always bothered me a lot, since it seems to go against everything we've ever been taught about normalization!
As I mentioned above, the only composite keys in my apps are in N:N join tables. I would never add a surrogate key to a join table except in the relatively rare case in which the join table is itself a parent to a related tables (e.g., Person/Company N:N record might have related JobTitles, i.e., multiple jobs within the same company). Rather than store the composite key in the child table, you'd store the surrogate key. I'd likely not make the surrogate key the PK, though -- I'd keep the composite PK on the pair of FK values. I would just add an Autonumber with a unique index for joining to the child table(s).
I'll add more as I think of it.
It complicates queries and maintenance. If you are really interested in this subject I'd recommend looking over the number of posts that already cover this. This will give you better info than any one response here.
https://stackoverflow.com/search?q=composite+primary+key
In the first place composite keys are bad for performance in joins. Further they are much worse for updating records as you have to update all the child records as well. Finally very few composite keys are actually really good keys. To be a good key it should be unique and not be subject to change. The example you gave as a composite key you used fails both tests. It is not unique (there are people with the same name born on the same day) and names change frequently causing much unnecessary updating of all the child tables.
As far as table with autogenrated keys casuing duplicates, that is mostly due to several factors:
the rest of the data in the table
can't be identified in any way as
unique
a design failure of forgetting to
create a unique index on the possible
composite key
Poor design of the user interface
which doesn't attempt to find
matching records or which allows data
entry when a pull down might be more
appropriate.
None of those are the fault of the surrogate key, they just indicate incompetent developers.
I think some coders see the complexity but want to avoid it, and most coders don't even think to look for the complexity at all.
Let's consider a common example of a table that had more than one candidate key: a Payroll table with columns employee_number, salary_amount, start_date and end_date.
The four candidate keys are as follows:
UNIQUE (employee_number, start_date); -- simple constraint
UNIQUE (employee_number, end_date); -- simple constraint
UNIQUE (employee_number, start_date, end_date); -- simple constraint
CHECK (
NOT EXISTS (
SELECT Calendar.day_date
FROM Calendar, Payroll AS P1
WHERE P1.start_date <= Calendar.day_date
AND Calendar.day_date < P1.end_date
GROUP
BY P1.employee_number, Calendar.day_date
)
); -- sequenced key i.e. no over-lapping periods for the same employee
Only one of those keys are required to be enforced i.e. the sequenced key. However, most coders wouldn't think to add such a key, let alone know how to code it in the first place. In fact, I would wager that most Access coders would add an incrementing autonumber column to the table, make the autonumber column the PRIMARY KEY, fail to add constraints for any of the candidate keys and will have convinced themselves that their table has a key!