Let's say I have 2 tables:
Species
  SpeciesId
  SpeciesName
Animal
  AnimalId
  SpeciesId - foreign key
If you give the end-user ability to change SpeciesName, that means they can affect the species of all animals that reference the changed record (at least from user standpoint). This may be a bit of an extreme example, but how are situations like this usually handled? Put the responsibility on the end-user to know what they are doing? Disallow name change if it has been used before?
We are discussing this situation at work and I want to get input from some others. One of the solutions that was brought up was to remove the foreign key (e.g. put a text field for species in the Animal table). This doesn't seem right to me, because at what point do you draw the line of using foreign keys? To me it seems like more of a training issue to make sure admins understand the impact of the changes they make. I know it's an open-ended question and it may vary per scenario, but I'm just trying to get some general opinions.
This is a design decision that you have to make. You need to determine what is more important from a business perspective. Do you value historical accuracy or efficiently updating the information?
In your example, I would put less emphasis on history for the following reasons.
Only the most recent convention is significant. If an animal moves from one genus to another, knowing the old and now invalid genus adds little value.
All animals of the same species should have the same species ID. You get this for free with Foreign Keys. Assume a tiger was added prior to a species name change. Then a different tiger was added after the species name change. Both tigers still belong to the same species.
Assume that you would like to retrieve all animals of one or more species. Querying the database by ID will be easier and more reliable than matching on a string, let alone delving into the dirty business of string parsing. You don't need to worry about character encoding, capitalization, white space, punctuation, etc.
Put the responsibility on the end-user to know what they are doing?
You need to decide what the end user is able to update. If your end user is a biologist who is well informed about the scientific names of the species, they should be able to update this information. Otherwise, it may be a good idea to prevent the user from modifying this column at all, or to prevent it only when the species already has animals associated with it.
One of the solutions that was brought up was to remove the foreign key
Don't do that. You will lose the ability to reliably join the information from these tables. Imagine your Species table has a column "Continent", indicating whether the species is found in America, Africa, Europe, Asia, etc. With the foreign key, you can ask questions like "What are all the animals that belong to an American species?" Without it, you would have to match free-text species names, which is fragile at best.
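To make the point about joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the Continent column is the thought experiment from this answer, and the sample rows and the helper animals_on are invented for illustration.

```python
import sqlite3

# Hypothetical schema mirroring the question; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Species (
        SpeciesId   INTEGER PRIMARY KEY,
        SpeciesName TEXT NOT NULL,
        Continent   TEXT NOT NULL
    );
    CREATE TABLE Animal (
        AnimalId  INTEGER PRIMARY KEY,
        SpeciesId INTEGER NOT NULL REFERENCES Species(SpeciesId)
    );
    INSERT INTO Species VALUES (1, 'Jaguar', 'America'), (2, 'Lion', 'Africa');
    INSERT INTO Animal VALUES (10, 1), (11, 1), (12, 2);
""")

def animals_on(continent):
    """Return (AnimalId, SpeciesName) for animals whose species is on a continent."""
    return conn.execute(
        """SELECT a.AnimalId, s.SpeciesName
           FROM Animal a JOIN Species s ON s.SpeciesId = a.SpeciesId
           WHERE s.Continent = ?
           ORDER BY a.AnimalId""",
        (continent,),
    ).fetchall()

before = animals_on("America")

# Renaming the species touches one row; both animals follow automatically.
conn.execute("UPDATE Species SET SpeciesName = 'Panthera onca' WHERE SpeciesId = 1")
after = animals_on("America")
```

Both animals keep pointing at SpeciesId 1, so the rename shows up for each of them without any change to the Animal table.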
Related
As a hobby project, I've taken on the challenge of creating a database for storing the details of monsters from a certain popular monster-collecting RPG whose name rhymes with Blokémon.
The logical place to start of course is a table called Species, to hold the basic demographic details of each species. The trouble is, 20 years of exceptions and gimmicks has meant there's not actually a single demographic left that matches 1:1 to a species in all cases. Some examples:
Name: We call it Bulbasaur but Japan calls it Fushigidane (or フシギダネ if you prefer). Other languages have different names.
Category: (Bulbasaur is a "Seed" Pokémon, for example.) This would be 1:1, but the recently added species Hoopa has to be awkward and have two. And there's still the language thing anyway.
Height/Weight/Stats: Most species just have one "forme", but quite a few now have multiple, and each has different stats and appearance. Many of these stats would live at the Forme level of the hierarchy, not the Species level.
The result of all this is that all that remains is the concept of a species, and a concept is difficult to store in a database. For example, Pikachu's a little yellow electric woodland mouse thing, and that's all it ever is, so it graciously only has one set of demographics (it's even called Pikachu in most languages). If every species were like Pikachu, this would be a very simple table to design. Shaymin, on the other hand? Well, it's one species, but it has two formes - Sky Forme and Land Forme - each with different stats. The Sky Forme is a flying white dog. The Land Forme is a little green hedgehog.
Regardless, species is still a useful thing to have. It links formes together, and every species has a name even if that name differs between languages. You can count the number of species, or look at species that appear within a particular game. But the only field that can exist in such a table is an ID. It's the only thing we can consider fixed for every single species. I will probably also include a "Label" field for my own developer sanity, but it wouldn't be considered part of the dataset, just a helper for me personally.
Is this an acceptable case for a single-column ID table, or is there a better way to structure this?
Is this an acceptable case for a single-column ID table
Yes.
From a relational perspective: A table holds rows of values that are in a certain relation to each other, i.e. participate in a certain relationship, i.e. are associated in a certain way, i.e. satisfy a certain statement template, aka predicate. Your predicate of interest is Species(ID) "ID is a species". So make that a table. You will have lots of other predicates like "ID is a species and ...". But as long as none of them has IDs in 1:1 correspondence with those in Species, you can't use any of them instead of Species. (You might be able to express Species as, say, a union of projections of them, but that's a separate design issue.)
From an ERM perspective: There are some species. So there is a species entity type. Its table gets a surrogate key. You aren't interested in any attributes. So don't have any.
There's just nothing special about having a single-column table.
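As a rough sketch of what that looks like in practice, here is one possible layout using Python's sqlite3 module. Only the single-column Species table comes from the answer; the SpeciesName and Forme tables, their columns, and the ID values 1 and 2 are assumptions made up to show other predicates hanging off it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- The predicate "ID is a species": one column, nothing else.
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY);

    -- Hypothetical: "species ID is called Name in Language".
    CREATE TABLE SpeciesName (
        SpeciesId INTEGER NOT NULL REFERENCES Species(SpeciesId),
        Language  TEXT NOT NULL,
        Name      TEXT NOT NULL,
        PRIMARY KEY (SpeciesId, Language)
    );

    -- Hypothetical: "species ID has a forme called FormeName".
    CREATE TABLE Forme (
        SpeciesId INTEGER NOT NULL REFERENCES Species(SpeciesId),
        FormeName TEXT NOT NULL,
        PRIMARY KEY (SpeciesId, FormeName)
    );

    INSERT INTO Species VALUES (1), (2);
    INSERT INTO SpeciesName VALUES (1, 'en', 'Bulbasaur'), (1, 'ja', 'Fushigidane'),
                                   (2, 'en', 'Shaymin');
    INSERT INTO Forme VALUES (2, 'Land'), (2, 'Sky');
""")

# Counting species needs nothing but the single-column table.
species_count = conn.execute("SELECT COUNT(*) FROM Species").fetchone()[0]

# Formes are linked together through the species ID.
shaymin_formes = [row[0] for row in conn.execute(
    "SELECT FormeName FROM Forme WHERE SpeciesId = 2 ORDER BY FormeName")]
```

The Species table carries no attributes of its own, yet it is what the foreign keys in the other tables point at.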
I wonder, is it a bad or a good idea to use an auto-increment primary key as a business entity identifier, such as a Partner Id or Account Number?
Also, what pitfalls might I face if I choose that approach?
I don't think everyone shares the same opinion, but I do think it is bad practice. Passing IDs to the user as the 'key' is bad in my opinion, for a number of reasons:
IDs aren't natural to users. They are not talking about project '1474623', they are talking about project 'ABC'. They aren't talking about person '363528', they are talking about 'Patrick Hofman';
IDs are fragile. You can't really rely on them not changing. If you move to another database platform, or to a new version of the current platform, and migrate all data using 'insert' statements, it is possible to lose the original ID values.
In our products, we always use a 'natural key', next to the primary key, a key that is understood by humans.
If there is no human-understandable natural key available, for example in a logging table, you can fall back on an artificial key.
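A minimal sketch of the "natural key next to the primary key" idea, using Python's sqlite3 module. The Project table, its columns, and the helper find_project are hypothetical names chosen for illustration; the only thing taken from the answer is the project code 'ABC'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (
        ProjectId   INTEGER PRIMARY KEY,   -- surrogate key, internal only
        ProjectCode TEXT NOT NULL UNIQUE,  -- natural key shown to users
        Title       TEXT NOT NULL
    );
    INSERT INTO Project (ProjectCode, Title) VALUES ('ABC', 'Example project');
""")

def find_project(code):
    """Look a project up by the code users actually talk about."""
    return conn.execute(
        "SELECT ProjectCode, Title FROM Project WHERE ProjectCode = ?", (code,)
    ).fetchone()

found = find_project("ABC")

# The UNIQUE constraint keeps the natural key usable as an identifier.
try:
    conn.execute("INSERT INTO Project (ProjectCode, Title) VALUES ('ABC', 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Users only ever see ProjectCode, so the generated ProjectId can change during a migration without anyone noticing.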
There are at least three desirable characteristics you should keep in mind when choosing or designing keys: Simplicity, Stability and Familiarity. In practice people often find it simpler to remember and work with words and letters rather than just numbers, and that is why alphanumeric identifiers are generally more common than numeric-only identifiers (examples of alphanumeric identifiers: car licence plates, airline flight numbers, seat reservation numbers, state and country codes, postal codes, email addresses). There are studies and anecdotal evidence to support the idea that alphanumeric keys are more usable than numbers alone. Also, alphanumeric identifiers can often be shorter than numeric ones. On the other hand, sequential numeric-only identifiers are very common for some applications (e.g. invoice numbers, bank account numbers). So I suggest that you should be guided by your users' and business needs when determining these things.
Note that DBMS engine-level sequence generators often come with limitations that make them unsuitable for some applications. For example it may not be easy to update them or to use them in a distributed database architecture. Another common limitation is that only one "auto incrementing" column may be permitted per table, which precludes their use as a business key if you also want a surrogate key for the same table.
I want to know if I can use human readable primary keys for a relatively small number of database objects, which will describe large metropolitan areas.
For example, using "washington_dc" as the pk for the Washington, DC metro area, or "nyc" for the New York City one.
Tons of objects will be foreign keyed to these metro area objects, and I'd like to be able to tell where a person or business is located just by looking at their database record.
I'm just worried because my gut tells me this might be a serious crime against good practices.
So, am I "allowed" to do this kind of thing?
Thanks!
It all depends on the application - natural primary keys make a good deal of sense on the surface, since they are human readable and don't require any joins when displaying data to end users.
However, natural primary keys tend to be larger than INT (or even BIGINT) surrogate primary keys, and there are very few domains where there isn't some danger of a natural primary key changing. To take your example, a city changing its name is not a terribly uncommon occurrence. When a city's name changes, you are left with either an update that needs to touch every instance of the city as a foreign key or a primary key that no longer reflects reality ("The data shows Leningrad, but it really is St. Petersburg.")
So in sum, natural primary keys:
1. Take up more disk space (most of the time)
2. Are more susceptible to change (in the majority of cases)
3. Are more human readable (as long as they don't change)
Whether #1 and #2 are sufficiently counteracted by #3 depends on what you are building and what its use is.
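To illustrate point #2, here is a small sketch with Python's sqlite3 module of what a rename looks like when the natural key is the primary key. The MetroArea and Business tables and the key values are invented; ON UPDATE CASCADE is one way to handle the rename, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE MetroArea (
        Code TEXT PRIMARY KEY   -- natural, human-readable key
    );
    CREATE TABLE Business (
        BusinessId INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        MetroCode  TEXT NOT NULL
            REFERENCES MetroArea(Code) ON UPDATE CASCADE
    );
    INSERT INTO MetroArea VALUES ('leningrad'), ('nyc');
    INSERT INTO Business VALUES (1, 'Bakery', 'leningrad'), (2, 'Deli', 'nyc');
""")

# Renaming the key rewrites every referencing row; the cascade only
# automates that work, it does not make it free.
conn.execute("UPDATE MetroArea SET Code = 'st_petersburg' WHERE Code = 'leningrad'")

codes = conn.execute(
    "SELECT BusinessId, MetroCode FROM Business ORDER BY BusinessId").fetchall()
```

With a handful of metro areas the cascade is cheap; with tons of referencing rows, as in the question, the same rename touches every one of them.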
I think that this question
What are the design criteria for primary keys?
gives a really good overview of the tradeoffs you might be making. I think the answer given is the correct one, but its brevity belies some significant thinking you actually have to do to work out what's right for you.
(From that answer)
The criteria for consideration of a primary key are:
Uniqueness
Irreducibility (no subset of the key uniquely identifies a row in the table)
Simplicity (so that relational representation & manipulation can be simpler)
Stability (should not be altered frequently)
Familiarity (meaningful to the user)
For what it's worth, the small number of times I've had problems with scaling by choosing strings as the primary key is about the same as the number of times I've had problems with redundant data using an autoincrement key. The problems that arise with autoincrement keys are worse, in my opinion, because you don't usually see them as soon.
A primary key must be unique and immutable, a human-readable string can be used as a PK so long as it meets both of those requirements.
In the example you've given, it sounds fine, given that cities don't change their names (and in the rare event they do then you can change the PK value with enough effort).
One of the main reasons you'd use numeric PKs instead of strings is performance (the other being to take advantage of automatically-incrementing IDs, see IDENTITY). If you anticipate more than a hundred queries per second on your textual PK then I would move to use int or bigint as a PK type. When you reach that level of database size and complexity you tend to stop using SSMS to edit table data directly and use your own tools, which would presumably perform a JOIN so you'd get the city name in the same resultset as the city's numeric PK.
you are allowed.
it is generally not the best practice.
numeric - auto incrementing keys are preferred. they are easily maintained and allow for coding of input forms and other interfaces where the user does not have to think up a new string as a key...
imagine: should it be washington, or washington_dc or dc or washingtondc.. etc.
I understand the concept of database normalization, but always have a hard time explaining it in plain English - especially for a job interview. I have read the wikipedia post, but still find it hard to explain the concept to non-developers. "Design a database in a way not to get duplicated data" is the first thing that comes to mind.
Does anyone have a nice way to explain the concept of database normalization in plain English? And what are some nice examples to show the differences between first, second and third normal forms?
Say you go to a job interview and the person asks: Explain the concept of normalization and how you would go about designing a normalized database.
What key points are the interviewers looking for?
Well, if I had to explain it to my wife it would have been something like that:
The main idea is to avoid duplication of large data.
Let's take a look at a list of people and the country they came from. Instead of holding the name of the country, which can be as long as "Bosnia & Herzegovina", for every person, we simply hold a number that references a table of countries. So instead of holding 100 "Bosnia & Herzegovina"s, we hold 100 #45s. Now if in the future, as often happens with Balkan countries, it splits into two countries, Bosnia and Herzegovina, I will have to change it in only one place. Well, sort of.
Now, to explain 2NF, I would have changed the example, and let's assume that we hold the list of countries every person visited.
Instead of holding a table like:
Person | CountryVisited | AnotherInformation | D.O.B.
-------------------------------------------------------
Faruz  | USA            | Blah Blah          | 1/1/2000
Faruz  | Canada         | Blah Blah          | 1/1/2000
I would have created three tables, one table with the list of countries, one table with the list of persons and another table to connect them both. That gives me the most freedom I can get changing person's information or country information. This enables me to "remove duplicate rows" as normalization expects.
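The three-table layout could look roughly like this in Python's sqlite3 module. The table and column names are my own guesses at what the answer describes, and the "Blah Blah" column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Person (
        PersonId    INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        DateOfBirth TEXT NOT NULL
    );
    CREATE TABLE Country (
        CountryId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL UNIQUE
    );
    -- Connects persons to the countries they visited.
    CREATE TABLE Visit (
        PersonId  INTEGER NOT NULL REFERENCES Person(PersonId),
        CountryId INTEGER NOT NULL REFERENCES Country(CountryId),
        PRIMARY KEY (PersonId, CountryId)
    );
    INSERT INTO Person VALUES (1, 'Faruz', '2000-01-01');
    INSERT INTO Country VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO Visit VALUES (1, 1), (1, 2);
""")

# The date of birth lives in exactly one row, so one UPDATE corrects it.
conn.execute("UPDATE Person SET DateOfBirth = '2000-01-02' WHERE PersonId = 1")

visits = conn.execute("""
    SELECT p.Name, c.Name, p.DateOfBirth
    FROM Visit v
    JOIN Person p ON p.PersonId = v.PersonId
    JOIN Country c ON c.CountryId = v.CountryId
    ORDER BY c.CountryId
""").fetchall()
```

The join reproduces the original two-row listing, but the person's details are no longer repeated once per country.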
One-to-many relationships should be represented as two separate tables connected by a foreign key. If you try to shove a logical one-to-many relationship into a single table, then you are violating normalization which leads to dangerous problems.
Say you have a database of your friends and their cats. Since a person may have more than one cat, we have a one-to-many relationship between persons and cats. This calls for two tables:
Friends
Id | Name | Address
-------------------------
1 | John | The Road 1
2 | Bob | The Belltower
Cats
Id | Name | OwnerId
---------------------
1 | Kitty | 1
2 | Edgar | 2
3 | Howard | 2
(Cats.OwnerId is a foreign key to Friends.Id)
The above design is fully normalized and conforms to all known normalization levels.
But say I had tried to represent the above information in a single table like this:
Friends and cats
Id | Name | Address | CatName
-----------------------------------
1 | John | The Road 1 | Kitty
2 | Bob | The Belltower | Edgar
3 | Bob | The Belltower | Howard
(This is the kind of design I might have made if I was used to Excel-sheets but not relational databases.)
A single-table approach forces me to repeat some information if I want the data to be consistent. The problem with this design is that some facts, like the information that Bob's address is "The Belltower", are stored twice. This is redundant, makes it difficult to query and change data, and (worst of all) makes it possible to introduce logical inconsistencies.
E.g., if Bob moves I have to make sure I change the address in both rows. If Bob gets another cat, I have to be sure to repeat the name and address exactly as typed in the other two rows. If I make a typo in Bob's address in one of the rows, suddenly the database has inconsistent information about where Bob lives. The un-normalized database cannot prevent the introduction of inconsistent and self-contradictory data, and hence the database is not reliable. This is clearly not acceptable.
Normalization cannot prevent you from entering wrong data. What normalization prevents is the possibility of inconsistent data.
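Here is a small sketch of that update anomaly using Python's sqlite3 module. The rows are the ones from the tables above; the new address 'New Street 5' and the table name FriendsAndCats are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-table design: Bob's address is stored once per cat.
    CREATE TABLE FriendsAndCats (
        Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT, CatName TEXT);
    INSERT INTO FriendsAndCats VALUES
        (1, 'John', 'The Road 1', 'Kitty'),
        (2, 'Bob', 'The Belltower', 'Edgar'),
        (3, 'Bob', 'The Belltower', 'Howard');

    -- Normalized design: the address is stored once per friend.
    CREATE TABLE Friends (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Cats (Id INTEGER PRIMARY KEY, Name TEXT,
                       OwnerId INTEGER REFERENCES Friends(Id));
    INSERT INTO Friends VALUES (1, 'John', 'The Road 1'), (2, 'Bob', 'The Belltower');
    INSERT INTO Cats VALUES (1, 'Kitty', 1), (2, 'Edgar', 2), (3, 'Howard', 2);
""")

# Bob moves, but the update reaches only one of his two rows.
conn.execute("UPDATE FriendsAndCats SET Address = 'New Street 5' WHERE Id = 2")
bob_addresses_flat = [r[0] for r in conn.execute(
    "SELECT DISTINCT Address FROM FriendsAndCats WHERE Name = 'Bob' ORDER BY Address")]

# In the normalized design there is only one row to update.
conn.execute("UPDATE Friends SET Address = 'New Street 5' WHERE Id = 2")
bob_addresses_normalized = [r[0] for r in conn.execute(
    """SELECT DISTINCT f.Address
       FROM Cats c JOIN Friends f ON f.Id = c.OwnerId
       WHERE f.Name = 'Bob'""")]
```

The single table now reports two different addresses for Bob, while the normalized pair cannot disagree with itself because the address exists in only one place.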
It is important to note that normalization depends on business decisions. If you have a customer database, and you decide to only record a single address per customer, then the table design (#CustomerID, CustomerName, CustomerAddress) is fine. If however you decide that you allow each customer to register more than one address, then the same table design is not normalized, because you now have a one-to-many relationship between customer and address. Therefore you cannot just look at a database to determine if it is normalized, you have to understand the business model behind the database.
This is what I ask interviewees:
Why don't we use a single table for an application instead of using multiple tables ?
The answer is of course normalization. As already said, it's to avoid redundancy and thereby update anomalies.
This is not a thorough explanation, but one goal of normalization is to allow for growth without awkwardness.
For example, if you've got a user table, and every user is going to have one and only one phone number, it's fine to have a phonenumber column in that table.
However, if each user is going to have a variable number of phone numbers, it would be awkward to have columns like phonenumber1, phonenumber2, etc. This is for two reasons:
If your columns go up to phonenumber3 and someone needs to add a fourth number, you have to add a column to the table.
For all the users with fewer than 3 phone numbers, there are empty columns on their rows.
Instead, you'd want to have a phonenumber table, where each row contains a phone number and a foreign key reference to which row in the user table it belongs to. No blank columns are needed, and each user can have as few or many phone numbers as necessary.
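A quick sketch of the separate phone number table, using Python's sqlite3 module. The column names and sample users are assumptions; the answer only names the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per phone number, pointing back at its user.
    CREATE TABLE phonenumber (
        user_id INTEGER NOT NULL REFERENCES user(id),
        number  TEXT NOT NULL,
        PRIMARY KEY (user_id, number)
    );
    INSERT INTO user VALUES (1, 'Ann'), (2, 'Ben');
    -- Ann has four numbers; no schema change was needed for the fourth.
    INSERT INTO phonenumber VALUES (1, '555-0100'), (1, '555-0101'),
                                   (1, '555-0102'), (1, '555-0103');
""")

# Ben has no numbers and therefore no rows, rather than empty columns.
counts = conn.execute("""
    SELECT u.name, COUNT(p.number)
    FROM user u LEFT JOIN phonenumber p ON p.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
```

Adding a fifth number for Ann is just another INSERT, which is the "growth without awkwardness" mentioned above.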
One side point to note about normalization: A fully normalized database is space efficient, but is not necessarily the most time efficient arrangement of data depending on use patterns.
Skipping around to multiple tables to look up all the pieces of info from their normalized locations takes time. In high load situations (millions of rows per second flying around, thousands of concurrent clients, like say credit card transaction processing) where time is more valuable than storage space, appropriately denormalized tables can give better response times than fully normalized tables.
For more info on this, look for SQL books written by Ken Henderson.
I would say that normalization is like keeping notes to do things efficiently, so to speak:
If you had a note that said you had to go shopping for ice cream without normalization, you would then have another note, saying you have to go shopping for ice cream, just one in each pocket.
Now, in real life, you would never do this, so why do it in a database?
For the designing and implementing part, that's when you can move back to "the lingo" and keep it away from layman terms, but I suppose you could simplify. You would say what you needed to at first, and then when normalization comes into it, you say you'll make sure of the following:
There must be no repeating groups of information within a table
No table should contain data that is not functionally dependent on that table's primary key
For 3NF I like Bill Kent's take on it: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.
I think it may be more impressive if you speak of denormalization as well, and the fact that you cannot always have the best structure AND be in normal forms.
Normalization is a set of rules used to design tables that are connected through relationships.
It helps in avoiding repetitive entries, reducing required storage space, preventing the need to restructure existing tables to accommodate new data, and, in some cases, increasing the speed of queries.
First Normal Form: Data should be broken up into the smallest units. Tables should not contain repeating groups of columns. Each row is identified by a primary key made up of one or more columns.
For example, if there is a column named 'Name' in a 'Customer' table, it should be broken into 'First Name' and 'Last Name'. Also, 'Customer' should have a column named 'CustomerID' to identify a particular customer.
Second Normal Form: Each non-key column should be directly related to the entire primary key.
For example, if a 'Customer' table has a column named 'City', the city should have a separate table with a primary key and city name defined; in the 'Customer' table, replace the 'City' column with 'CityID' and make 'CityID' a foreign key in the table.
Third normal form: Each non-key column should not depend on other non-key columns.
For example, in an order table, the column 'Total' depends on 'Unit price' and 'Quantity', so the 'Total' column should be removed.
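One way to drop the stored 'Total' without losing it is to derive it on demand. This is a sketch with Python's sqlite3 module; the OrderLine table, the OrderLineTotal view, and storing prices in cents are all assumptions of mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderLine (
        OrderLineId INTEGER PRIMARY KEY,
        UnitPrice   INTEGER NOT NULL,  -- in cents, to avoid float rounding
        Quantity    INTEGER NOT NULL
    );
    INSERT INTO OrderLine VALUES (1, 250, 4), (2, 999, 1);

    -- The total is computed from the other columns instead of being stored.
    CREATE VIEW OrderLineTotal AS
        SELECT OrderLineId, UnitPrice * Quantity AS Total FROM OrderLine;
""")

# Changing the quantity cannot leave a stale total behind.
conn.execute("UPDATE OrderLine SET Quantity = 5 WHERE OrderLineId = 1")
totals = conn.execute(
    "SELECT OrderLineId, Total FROM OrderLineTotal ORDER BY OrderLineId").fetchall()
```

Because the total is never stored, there is no second copy that could drift out of sync with the price and quantity.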
I teach normalization in my Access courses and break it down a few ways.
After discussing the precursors to storyboarding or planning out the database, I then delve into normalization. I explain the rules like this:
Each field should contain the smallest meaningful value:
I write a name field on the board and then place a first name and last name in it like Bill Lumbergh. We then query the students and ask them what we will have problems with, when the first name and last name are all in one field. I use my name as an example, which is Jim Richards. If the students do not lead me down the road, then I yank their hand and take them with me. :) I tell them that my name is a tough name for some, because I have what some people would consider 2 first names and some people call me Richard. If you were trying to search for my last name then it is going to be harder for a normal person (without wildcards), because my last name is buried at the end of the field. I also tell them that they will have problems with easily sorting the field by last name, because again my last name is buried at the end.
I then let them know that meaningful is based upon the audience who is going to be using the database as well. We, at our job will not need a separate field for apartment or suite number if we are storing people's addresses, but shipping companies like UPS or FEDEX might need it separated out to easily pull up the apartment or suite of where they need to go when they are on the road and running from delivery to delivery. So it is not meaningful to us, but it is definitely meaningful to them.
Avoiding Blanks:
I use an analogy to explain to them why they should avoid blanks. I tell them that Access and most databases do not store blanks like Excel does. Excel does not care if you have nothing typed out in the cell and will not increase the file size, but Access will reserve that space until the point in time that you actually use the field. So even if a field is blank, it will still be using up space, and I explain to them that it also slows their searches down.
The analogy I use is empty shoe boxes in the closet. If you have shoe boxes in the closet and you are looking for a pair of shoes, you will need to open up and look in each of the boxes for a pair of shoes. If there are empty shoe boxes, then you are just wasting space in the closet and also wasting time when you need to look through them for that certain pair of shoes.
Avoiding redundancy in data:
I show them a table that has lots of repeated values for customer information and then tell them that we want to avoid duplicates, because I have sausage fingers and will mistype values if I have to type in the same thing over and over again. This “fat-fingering” of data will lead to my queries not finding the correct data. Instead, we will break the data out into a separate table and create a relationship using a primary and foreign key field. This way we save space, because we are not typing the customer's name, address, etc. multiple times and instead are just using the customer's ID number in a field for the customer. We then discuss drop-down lists/combo boxes/lookup lists or whatever else Microsoft wants to name them later on. :) You as a user will not want to look up and type out the customer's number each time in that customer field, so we will set up a drop-down list that gives you a list of customers, where you can select their name and it will fill in the customer’s ID for you. This will be a 1-to-many relationship, where 1 customer will have many different orders.
Avoiding repeated groups of fields:
I demonstrate this when talking about many-to-many relationships. First, I draw 2 tables, 1 that will hold employee information and 1 that will hold project information. The tables are laid similar to this.
(Table1)
tblEmployees
* EmployeeID
First
Last
(Other Fields)….
Project1
Project2
Project3
Etc.
**********************************
(Table2)
tblProjects
* ProjectNum
ProjectName
StartDate
EndDate
…..
I explain to them that this would not be a good way of establishing a relationship between an employee and all of the projects that they work on. First, if we have a new employee, then they will not have any projects, so we will be wasting all of those fields, second if an employee has been here a long time then they might have worked on 300 projects, so we would have to include 300 project fields. Those people that are new and only have 1 project will have 299 wasted project fields. This design is also flawed because I will have to search in each of the project fields to find all of the people that have worked on a certain project, because that project number could be in any of the project fields.
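The usual alternative to the Project1, Project2, Project3 fields is a third table that links the two. Here is a rough sketch in Python's sqlite3 module; tblEmployees and tblProjects follow the layout above, while tblEmployeeProjects, the helper employees_on, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tblEmployees (EmployeeID INTEGER PRIMARY KEY, First TEXT, Last TEXT);
    CREATE TABLE tblProjects (ProjectNum INTEGER PRIMARY KEY, ProjectName TEXT);

    -- Hypothetical junction table: one row per (employee, project) pairing.
    CREATE TABLE tblEmployeeProjects (
        EmployeeID INTEGER NOT NULL REFERENCES tblEmployees(EmployeeID),
        ProjectNum INTEGER NOT NULL REFERENCES tblProjects(ProjectNum),
        PRIMARY KEY (EmployeeID, ProjectNum)
    );

    INSERT INTO tblEmployees VALUES (1, 'Bill', 'Lumbergh'), (2, 'Jim', 'Richards'),
                                    (3, 'New', 'Hire');
    INSERT INTO tblProjects VALUES (100, 'Alpha'), (200, 'Beta');
    -- The new hire has no projects, so there are simply no rows for them.
    INSERT INTO tblEmployeeProjects VALUES (1, 100), (2, 100), (2, 200);
""")

def employees_on(project_num):
    """Return last names of everyone on a project, searching a single column."""
    return [r[0] for r in conn.execute(
        """SELECT e.Last
           FROM tblEmployeeProjects ep
           JOIN tblEmployees e ON e.EmployeeID = ep.EmployeeID
           WHERE ep.ProjectNum = ?
           ORDER BY e.EmployeeID""", (project_num,))]

on_alpha = employees_on(100)
on_beta = employees_on(200)
```

An employee with 300 projects gets 300 rows, a new employee gets none, and finding everyone on a project means checking one column instead of every Project field.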
I covered a fair amount of the basic concepts. Let me know if you have other questions or need help with clarification or breaking it down in plain English. The wiki page did not read as plain English and might be daunting for some.
I've read the wiki links on normalization many times but I have found a better overview of normalization from this article. It is a simple easy to understand explanation of normalization up to fourth normal form. Give it a read!
Preview:
What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
http://databases.about.com/od/specificproducts/a/normalization.htm
Database normalization is a formal process of designing your database to eliminate redundant data. The design consists of:
planning what information the database will store
outlining what information users will request from it
documenting the assumptions for review
Use a data-dictionary or some other metadata representation to verify the design.
The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item, such as a user profile. Don't worry about normalizing data in tables that will have records inserted but not updated, such as history logs or financial transactions.
+1 for the analogy of talking to your wife. I find talking to anyone without a tech mind needs some ease into this type of conversation.
but...
To add to this conversation, there is the other side of the coin (which can be important when in an interview).
When normalizing, you have to watch how the databases are indexed and how the queries are written.
In a truly normalized database, I have found that in some situations it's easier to write queries that are slow because of bad join operations, bad indexing on the tables, and plain bad design of the tables themselves.
Bluntly, it's easier to write bad queries against highly normalized tables.
I think for every application there is a middle ground. At some point you want the ease of getting everything out of a few tables, without having to join to a ton of tables to get one data set.
I'm just wondering what the optimal solution is here.
Say I have a normalized database. The primary key of the whole system is a varchar. What I'm wondering is, should I relate this varchar to an int for normalization or leave it? It's simpler to leave it as a varchar, but relating it to an int might be more efficient.
For instance I can have
People
======================
name varchar(10)
DoB DateTime
Height int
Phone_Number
======================
name varchar(10)
number varchar(15)
Or I could have
People
======================
id int Identity
name varchar(10)
DoB DateTime
Height int
Phone_Number
======================
id int
number varchar(15)
Add several other one-to-many relationships of course.
What do you all think? Which is better and why?
I believe that the majority of people who have developed any significant sized real world database applications will tell you that surrogate keys are the only realistic solution.
I know the academic community will disagree but that is the difference between theoretical purity and practicality.
Any reasonable sized query that has to do joins between tables that use non-surrogate keys where some tables have composite primary keys quickly becomes unmaintainable.
Can you really use names as primary keys? Isn't there a high risk of several people with the same name?
If you really are so lucky that your name attribute can be used as primary key, then - by all means - use that. Often, though, you will have to make something up, like a customer_id, etc.
And finally: "NAME" is a reserved word in at least one DBMS, so consider using something else, e.g. fullname.
Using any kind of non-synthetic data (i.e. anything from the user, as opposed to generated by the application) as a PK is problematic; you have to worry about culture/localization differences, case sensitivity (and other issues depending on DB collation), can result in data problems if/when that user-entered data ever changes, etc.
Using non-user-generated data (Sequential GUIDs (or non-sequential if your DB doesn't support them or you don't care about page splits) or identity ints (if you don't need GUIDs)) is much easier and much safer.
Regarding duplicate data: I don't see how using non-synthetic keys protects you from that. You still have issues where the user enters "Bob Smith" instead of "Bob K. Smith" or "Smith, Bob" or "bob smith" etc. Duplication management is necessary (and pretty much identical) regardless of whether your key is synthetic or non-synthetic, and non-synthetic keys have a host of other potential issues that synthetic keys neatly avoid.
Many projects don't need to worry about that (tightly constrained collation choices avoid many of them, for example) but in general I prefer synthetic keys. This is not to say you can't be successful with organic keys, clearly you can, but for many projects they're not the better choice.
I think if your VARCHAR was larger you would notice you're duplicating quite a bit of data throughout the database. Whereas if you went with a numeric ID column, you're not duplicating nearly the same amount of data when adding foreign key columns to other tables.
Moreover, textual data is a royal pain in terms of comparisons, your life is much easier when you're doing WHERE id = user_id versus WHERE name LIKE inputname (or something similar).
If the "name" field really is appropriate as a primary key, then do it. The database will not get more normalized by creating a surrogate key in that case. You will get some duplicate strings for foreign keys, but that is not a normalization issue, since the FK constraint guarantees integrity on strings just as it would on surrogate keys.
However, you are not explaining what the "name" is. In practice it is very seldom that a string is appropriate as a primary key. If it is the name of a person, it won't work as a PK, since more than one person can have the same name, people can change names, and so on.
One thing that others don't seem to have mentioned is that joins on int fields tend to perform better than joins on varchar fields.
And I definitely would always use a surrogate key over using names (of people or businesses) because they are never unique over time. In our database, for instance, we have 164 names with over 100 instances of the same name. This clearly shows the dangers of considering using name as a key field.
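A tiny sketch of why names fail as keys, using Python's sqlite3 module. Both tables are simplified versions of the People table from the question, and the two "John Smith" rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PeopleByName (name TEXT PRIMARY KEY, DoB TEXT);
    CREATE TABLE PeopleById (id INTEGER PRIMARY KEY, name TEXT NOT NULL, DoB TEXT);
""")

# Two different people who happen to share a name.
people = [("John Smith", "1980-05-01"), ("John Smith", "1992-11-23")]

rejected = 0
for name, dob in people:
    try:
        conn.execute("INSERT INTO PeopleByName VALUES (?, ?)", (name, dob))
    except sqlite3.IntegrityError:
        rejected += 1  # the second John Smith cannot be stored

# With a surrogate key, both rows are accepted.
conn.executemany("INSERT INTO PeopleById (name, DoB) VALUES (?, ?)", people)
stored_by_id = conn.execute("SELECT COUNT(*) FROM PeopleById").fetchone()[0]
```

The name-keyed table has to turn the second person away, while the id-keyed table stores both and leaves duplicate detection to the application.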
The original question is not one of normalization. If you have a normalized database, as you stated, then you do not need to change it for normalization reasons.
There are really two issues in your question. The first is whether ints or varchars are preferable for use as primary keys and foreign keys. The second is whether you can use the natural keys given in the problem definition, or whether you should generate a synthetic key (surrogate key) to take the place of the natural key.
ints are a little more concise than varchars, and a little more efficient for such things as index processing. But the difference is not overwhelming. You should probably not make your decision on this basis alone.
The question of whether the natural key provided really works as a natural key or not is much more significant. The problem of duplicates in a "name" column is not the only problem. There is also the problem of what happens when a person changes her name. This problem probably doesn't surface in the example you've given, but it does surface in lots of other database applications. An example would be the transcript over four years of all the courses taken by a student. A woman might get married and change her name in the course of four years, and now you're stuck.
You either have to leave the name unchanged, in which case it no longer agrees with the real world, or update it retroactively in all the courses the person took, which makes the database disagree with the printed rosters made at the time.
If you do decide on a synthetic key, you now have to decide whether or not the application is going to reveal the value of the synthetic key to the user community. That's another whole can of worms, and beyond the scope of this discussion.