I got a Person table, each Person can visit several countries. The countries visited by each Person is stored in table CountryVisit
Person:
PersonId,
Name
CountryVisit:
CountryVisitId (primary key)
PersonId (foreign key to 'Person.PersonId')
CountryName
VisitDate
For the CountryVisit Table, my primary key is CountryVisitId which is an identity column. This design will result in that a Person can have only 1 CountryVisit but the CountryVisitId can be 40 for example. Is it a better practice to create another surrogate key column to act as an identity column while the CountryVisitId be a natural key that is unique for each PersonId ?
It is pretty good. I would suggest that you have a separate table for countries, with one row per country. Then the CountryVisits table would have:
CountryVisitId PrimaryKey,
PersonId ForeignKey,
CountryId ForeignKey,
VisitDate
This will ensure that the country name is always spelled correctly and consistently. If you want a list of countries to get started, check out this Wikipedia page. Also note that your definition of country may be different from the standard list of countries (there are actually several out there), so you should use your own auto-incremented primary key, rather than using the country code.
And, you should relax the requirement and remove the unique or primary key on PersonId, CountryId, unless you want to enforce only one visit per country.
Related
I've heard that having an autogenerated primary key is a convention. However, I'm trying to understand its benefits in the following scenario:
CREATE TABLE countries
(
countryID int(11) PRIMARY KEY AUTO_INCREMENT,
countryName varchar(128) NOT NULL
);
CREATE TABLE students
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin int(11) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries (countryID)
);
INSERT INTO countries (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
If I want to insert something into the students table, I need to lookup the countryIDs in the countries table:
INSERT INTO students (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', (SELECT countryID FROM countries WHERE countryName = 'Germany')),
('Erik Jakobsson', (SELECT countryID FROM countries WHERE countryName = 'Sweden')),
('Manuel Verdi', (SELECT countryID FROM countries WHERE countryName = 'Italy')),
('Min Lin', (SELECT countryID FROM countries WHERE countryName = 'China'));
In a different scenario, as I know that the countryNames in the countries table are unique and not null, I could to the following:
CREATE TABLE countries2
(
countryName varchar(128) PRIMARY KEY
);
CREATE TABLE students2
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin varchar(128) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries2 (countryName)
);
INSERT INTO countries2 (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
Now, inserting data into the students2 table is simpler:
INSERT INTO students2 (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', 'Germany'),
('Erik Jakobsson', 'Sweden'),
('Manuel Verdi', 'Italy'),
('Min Lin', 'China');
So why should the first option be the better one, given that countryNames are unique and are never going to change?
There are two apects involved here:
natural keys vs. surrogate keys
autoincremented values
You are wondering why to have to deal with some arbitrary number for a country, when a country can be uniquely identified by its name. Well, imagine you use the country names in several tables to relate rows to each other. Then at some point you are told that you misspelled a country. You want to correct this, but have to do this in every table the country occurs in. In big databases you usually don't have cascading updates in order to avoid updates that unexpectedly take hours instead of mere minutes or seconds. So you must do this manually, but the foreign key constraints get in your way. You cannot change the parent table's key, because there are child tables using this, and you cannot change the key in the child tables first, because that key has to exist in the parent table. You'll have to work with a new row in the parent table and start from there. Quite some task. And even if you have no spelling issue, at some point someone might say "we need the official country names; you have China, but it must be the People's Republic of China instead" and again you must look up and change that contry in all those tables. And what about partial backups? A table gets totally messed up due to some programming error and must be replaced by last week's backup, because this is the best you have. But suddenly some keys don't match any more. You never want a table's key to change.
You say "country names are unique and are never going to change". Think again :-)
It is easier to have your database use a technical arbitrary ID instead. Then the country name only exists in the country table. And if that name must get changed, you change it just in that one place, and all relations stay intact. This, however, doesn't mean that natural keys are worse than technical IDs. They are not. But it's more difficult with them to set up a database correctly. In case of countries a good natural key would be a country ISO code, because this uniquely identifies a country and doesn't change. This would be my choice here.
With students it's the same. Students usually have a student number or student ID in real world, so you can simply use this number to uniquely identifiy a student in the database. But then, how do we get these unique student IDs? At a large university, two secretaries may want to enter new students at the same time. They ask the system what the last student's ID was. It was #11223, so they both want to issue #11224, which causes a conflict of course, because only one student can be given that number. In order to avoid this, DBMS offer sequences of which numbers are taken. Thus one of the secretaries pulls #11224 and the other #11225. Auto-incremented IDs work this way. Both secretaries enter their new student, the rows get inserted into the student table and result in the two different IDs that get reported back to the secretaries. This makes sequences and auto incrementing IDs a great and safe tool to work with.
Convention can be a useful guide. It isn't necessarily the best option in all situations.
There are usually tradeoffs involved, like space, convenience, etc.
While you showed one method of resolving / inserting the proper country key value, there's a slightly less wordy option supported by standard SQL (and many databases).
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
SELECT *
FROM (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
) AS x
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
and a little less wordy again:
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
Here's a test case with MariaDB 10.5:
Working test case (updated)
On a database I have the following:
USERS
Id (PK)
Name
DOCTORS
Id (PK)
UserId (FK)
CurriculumVitae
Birthdate
...
All doctors I also Users of the application but have extra columns and I am displaying all doctors on a page.
Does it make sense to have Id and UserId on DOCTORS table?
Or should I make UserId both the PK and FK of table DOCTORS:
DOCTORS
UserId (PK, FK)
CurriculumVitae
Birthdate
...
In your opinion when should I go for this option?
Yes, you can make UserId (as you have defined it) both the primary key and foreign key. The primary key specifies that it is unique within the table. The foreign key maps it back to the Users table.
This is an example of a subsetting relationship.
I would actually call the columns Users.UserId and Doctors.DoctorId. I think that is less confusing nomenclature that captures the most important aspect of the two columns. However, the actual names are not important for your question (and yours are reasonable).
If you did not make it a PK also then you could have duplicates which is probably not what you want.
Lets say I have a database with 4 types of things in it: Neighborhood, City, Houses, and Comment, like so
City:
name
ID
Neighborhood:
name
ID
CityID (foreign key to City:ID)
House:
name
ID
NeighborhoodID (foreign key to Neighborhood:ID)
CityID (foreign key to City:ID)
Comment:
ID
Text
???? (key to the subject of the comment)
I want users to be able to comment on a City, a Neighborhood or a House. How do I express this relationship in SQL?
One idea I had was to create 3 one to many relationship tables:
CommentToCity:
commentID
cityID
then when fetching the list of Cities, I could do a join on this table as well to get the related comments. I would then create a similar situation for House and Neighborhood.
Another idea would be to have globally unique identifiers in City, House and Neighborhood, and then have that global ID be the foreign key in the comment. Then when fetching the City it would do a join on comments looking for that global ID.
Are either of these a good way? Is there a better way?
Add a commentID to City, Neighborhood And House. (foreign key to comment:Id)
This way you just create an empty comment and add its Id if there is no comment on something.
I know that a primary key must be unique, but is it okay for a primary key to be equal to a different column in the same table by coincidence?
For instance, I have 2 tables. One table is called person that holds information about a person (ID, email, telephone, address, name). The other table is staff (ID, pID(person ID), salary, position).
In staff the ID column is the primary key and is used to uniquely identify a staff member. The number is from 1 - 100. However, the pID (person ID) may be equal to the ID. For instance the staff ID may be 1 and the pID that it references to may be equal to 1.
Is that okay?
The job of the primary key is to uniquely and reliably identify each row - therefore, it must be unique and NOT NULL - anything else is irrelevant.
If you just happen to have a second column with the exact same values - I'd be wondering why that is the case - but that doesn't in any way affect the primary key negatively.
Primary key of a table must be unique and not null. There are no restrictions on uniquity between tables. It's 100% up to you.
Yes. There's no checking of relationships between different columns in a table.
The restriction you're worried about doesn't even make sense. Suppose you had a table for persons with columns ID, name, and year_of_birth. It wouldn't allow someone who was born in 1975 to have ID = 1975.
Trying to design part of a database to hold addresses, companies and contacts. I had one design of it in which I've now got the job of 'cleaning' it due to poor design.
Bought a copy of Joe Celko's SQL Programmer Style for reference as I'm coming from a programming angle so I ended up with...
Addresses
street_1_adr varchar(80) primary key
street_2_adr varchar(80)
street_3_adr varchar(80)
zip_code varchar(10) foreign key/primary key > Regions.zip_code
With a check to ensure all addresses are unique to prevent duplicates.
Regions
city varchar(80)
region varchar(80)
zip_code varchar(10) primary key
country_nbr integer foreign key/primary key > Countries.country_nbr
With a check to ensure all regions are unique to prevent duplicates.
Countries
country_nbr integer primary key
country_nm varchar(80)
country_code char(3)
With a check to ensure that only one record exists for all the information.
Companies
company_nm varchar(80) primary key
street_1_adr varchar(80) foreign key > Addresses.street_1_adr
zip_code varchar(10) foreign key > Addresses.zip_code
Extra information
With a check to ensure that only one company with that name can exist at the address specified
Contacts
company_nm varchar(80) primary key/foreign key > Companies.company_nm
first_nm varchar(80) primary key
last_nm varchar(80) primary key
Extra information
But this means that if I want to hook, as an example, an order onto a contact I need to do it with three fields.
Does this look right or have I completetly missed the point?
Firstly, I recommend using integer values for your primary keys
(if using mysql auto_increment is a handy feature, too)
When using your PK (primary key) as a FK (foreign key) in an different table, use the same datatype and don't save names.
You seem to save the company_name in "Contacts" even though you could simply save the ID of the company and get the name via a join-select.
IN your case it is OK, since the name is the primary key (varchar), but what happens when you get the same company name twice (eg Mc Donalds has more than one location)
ERP systems deploy those kind of structures mostly as or near as:
company (id and name)
site (id, name, FK company, additional information like address)
address (mostly referenced directly in site and sometime part of site)
region + country (all of them are "basic" data and referenced by ID in address table)
company table mostly only saves the ID and Name of an company.
site table (with foreign key relation to company) gives the "company" its adresses, legal information, etc.
A couple of thoughts:
First of all, a zip code can represent multiple cities/towns in the same state. Also, one city can have multiple zip codes.
Usually you to not find an address table separate from the entity. In other words, your company table should carry the full address.
The primary keys for the tables are usually unique identifiers or auto-increment numbers separate from the actual names. That way, if a company or contact changes it's name, or a typo was entered and corrected, you do not need to cascade the change to other tables.
You may want to future proof your design by allowing for many addresses and contacts to be added to a company. What you would do is create a many to many relationship by using a junction table (http://en.wikipedia.org/wiki/Junction_table)
Company
--------------
CompanyID (PK)
...
Address
--------------
AddressID (PK)
...
CompanyAddress
--------------
CompanyID (PK)
AddressID (PK)
The CompanyAddress table will allow you to have multiple addresses for each company. You can also do the same for contacts, depending if the contact is associated with the company or the address. Below is another link that talks about how to create the many to many relationship.
http://www.tomjewett.com/dbdesign/dbdesign.php?page=manymany.php