SQL - Insert valid data based on predefined values - sql

I have two tables city and suburb. Postgresql code:
CREATE TABLE city
(
id uuid PRIMARY KEY,
name character varying NOT NULL,
CONSTRAINT city_id_key UNIQUE (id),
CONSTRAINT city_name_key UNIQUE (name)
)
CREATE TABLE suburb
(
id uuid PRIMARY KEY,
city_id uuid NOT NULL,
name character varying NOT NULL,
CONSTRAINT fk_suburb_city FOREIGN KEY (city_id) REFERENCES city (id),
CONSTRAINT suburb_id_key UNIQUE (id),
CONSTRAINT suburb_name_key UNIQUE (name)
)
I want to create a table called address for storing city + suburb pairs. Here is the DDL for address table:
CREATE TABLE address
(
id uuid NOT NULL,
city_name character varying NOT NULL,
suburb_name character varying NOT NULL
)
I want to make sure that only redundant copies of information is inserted into address. Here is an example:
I want to allow inserting into address all the city_name suburb_name pairs:
SELECT c.name AS city_name, s.name AS suburb_name
FROM city c, suburb s
WHERE c.id = s.city_id
Result:
A - B
A - C
X - Y
For the data above I want to allow all the pairs:
A - B
A - C
X - Y
But if someone wants to insert A - Y pair into address, I want the DBMS to raise an error/exception.
Questions:
Does it make sense to check a constraint like this?
If it is a valid idea, what is the best solution for this? Trigger, stored procedure, some kind of constraint?
I prefer DBSM independent solutions. I'm more interested in the basic idea of the suggested solution not in a postgresql specific solution.
Reflecting to #Yuri G's answer: I don't want any join when I read form address. I want to store real values in it not ids. Slow insert into address is not problem. Fast read is important for me. Change in city or suburb table is not a problem after insertion in address. So no need for update in address table. I just want to make sure that the data I insert into address is a valid city - suburb pair (according to city and suburb tables).
My plan is to upload city and suburb tables with lots of data and use them for validating insertions in address table. I don't want to allow my users to insert into address for example: "New York - Fatima bint Mubarak St" because Fatima bint Mubarak St. is in Abu Dhabi.
Thank you for the answers.

Let the software on the client side operate the record identifiers for City & Suburb, not their values.
And do so on the server side too:
CREATE TABLE address
(
id uuid NOT NULL,
city_id character varying NOT NULL,
suburb_id character varying NOT NULL,
CONSTRAINT fk_city FOREIGN KEY (city_id) REFERENCES city (id),
CONSTRAINT fk_suburb FOREIGN KEY (suburb_id) REFERENCES suburb (id),
)
Of course, you'll need 2 lookup operations prior to insert, basically two selects by name of the city/suburb, to retrieve these IDs (or deny operation).
Though this way you'd be keeping data integrity most simple & efficient way, I believe.

Related

Benefits of using an autogenerated primary key instead of a constant unique name

I've heard that having an autogenerated primary key is a convention. However, I'm trying to understand its benefits in the following scenario:
CREATE TABLE countries
(
countryID int(11) PRIMARY KEY AUTO_INCREMENT,
countryName varchar(128) NOT NULL
);
CREATE TABLE students
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin int(11) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries (countryID)
);
INSERT INTO countries (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
If I want to insert something into the students table, I need to lookup the countryIDs in the countries table:
INSERT INTO students (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', (SELECT countryID FROM countries WHERE countryName = 'Germany')),
('Erik Jakobsson', (SELECT countryID FROM countries WHERE countryName = 'Sweden')),
('Manuel Verdi', (SELECT countryID FROM countries WHERE countryName = 'Italy')),
('Min Lin', (SELECT countryID FROM countries WHERE countryName = 'China'));
In a different scenario, as I know that the countryNames in the countries table are unique and not null, I could to the following:
CREATE TABLE countries2
(
countryName varchar(128) PRIMARY KEY
);
CREATE TABLE students2
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin varchar(128) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries2 (countryName)
);
INSERT INTO countries2 (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
Now, inserting data into the students2 table is simpler:
INSERT INTO students2 (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', 'Germany'),
('Erik Jakobsson', 'Sweden'),
('Manuel Verdi', 'Italy'),
('Min Lin', 'China');
So why should the first option be the better one, given that countryNames are unique and are never going to change?
There are two apects involved here:
natural keys vs. surrogate keys
autoincremented values
You are wondering why to have to deal with some arbitrary number for a country, when a country can be uniquely identified by its name. Well, imagine you use the country names in several tables to relate rows to each other. Then at some point you are told that you misspelled a country. You want to correct this, but have to do this in every table the country occurs in. In big databases you usually don't have cascading updates in order to avoid updates that unexpectedly take hours instead of mere minutes or seconds. So you must do this manually, but the foreign key constraints get in your way. You cannot change the parent table's key, because there are child tables using this, and you cannot change the key in the child tables first, because that key has to exist in the parent table. You'll have to work with a new row in the parent table and start from there. Quite some task. And even if you have no spelling issue, at some point someone might say "we need the official country names; you have China, but it must be the People's Republic of China instead" and again you must look up and change that contry in all those tables. And what about partial backups? A table gets totally messed up due to some programming error and must be replaced by last week's backup, because this is the best you have. But suddenly some keys don't match any more. You never want a table's key to change.
You say "country names are unique and are never going to change". Think again :-)
It is easier to have your database use a technical arbitrary ID instead. Then the country name only exists in the country table. And if that name must get changed, you change it just in that one place, and all relations stay intact. This, however, doesn't mean that natural keys are worse than technical IDs. They are not. But it's more difficult with them to set up a database correctly. In case of countries a good natural key would be a country ISO code, because this uniquely identifies a country and doesn't change. This would be my choice here.
With students it's the same. Students usually have a student number or student ID in real world, so you can simply use this number to uniquely identifiy a student in the database. But then, how do we get these unique student IDs? At a large university, two secretaries may want to enter new students at the same time. They ask the system what the last student's ID was. It was #11223, so they both want to issue #11224, which causes a conflict of course, because only one student can be given that number. In order to avoid this, DBMS offer sequences of which numbers are taken. Thus one of the secretaries pulls #11224 and the other #11225. Auto-incremented IDs work this way. Both secretaries enter their new student, the rows get inserted into the student table and result in the two different IDs that get reported back to the secretaries. This makes sequences and auto incrementing IDs a great and safe tool to work with.
Convention can be a useful guide. It isn't necessarily the best option in all situations.
There are usually tradeoffs involved, like space, convenience, etc.
While you showed one method of resolving / inserting the proper country key value, there's a slightly less wordy option supported by standard SQL (and many databases).
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
SELECT *
FROM (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
) AS x
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
and a little less wordy again:
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
Here's a test case with MariaDB 10.5:
Working test case (updated)

How to enforce a unique constraint across multiple tables?

I have created the following two tables to map out students and teachers :
CREATE TABLE students(
student_id SERIAL PRIMARY KEY,
first_name NOT NULL VARCHAR(50),
last_name NOT NULL VARCHAR(50),
phone VARCHAR(15) UNIQUE NOT NULL CHECK (phone NOT LIKE '%[^0-9]%'),
email VARCHAR(30) UNIQUE NOT NULL CHECK (email NOT LIKE '%#%'),
graduationYear SMALLINT CHECK (graduationYear > 1900)
);
CREATE TABLE teachers(
teacher_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
departament VARCHAR(40) NOT NULL,
email VARCHAR(30) UNIQUE NOT NULL CHECK (email NOT LIKE '%#%'),
phone VARCHAR(15) UNIQUE NOT NULL CHECK (phone NOT LIKE '%[^0-9]%')
);
As you can see, both tables have a column for phone and email. I want these two to be unique to each individual.
How can I introduce a constraint that will check if the phone number/email introduced, for example, in the students table doesn't already exist in the teachers table?
Is there any kind of keyword that works like UNIQUE but on multiple tables or should I take another approach?
Edit: as #a_horse_with_no_name pointed out, LIKE doesn't support regular expressions. I should have used SIMILAR TO.
I would create a single table person that contains all attributes that are common to both including a type column that identifies teachers and students. Then you can create unique constraints (or indexes) on the phone and email columns.
To store the "type specific" attributes (graduation year, department) you can either have nullable columns in the person table and only put in values depending on the type. If you do not expect to have more "type specific" attributes apart from those two, this is probably the easiest solution
If you expect more "type specific" attributes, additional tables (student and teacher) with containing those can also be used. This is the traditional way of modelling inheritance in a relational database. As Postgres supports table inheritance, you could also create the teacher and student tables to inherit from the person table.

Can you make a foreign key from two tables?

So, I have three tables (mail,country and client). Table COUNTRY is parent table for table MAIL, so primary key od table MAIL is postcode from MAIL and countrycode from COUNTRY. I need to add a foreign key in table CLIENT that is made of postcode and countrycode but I want postcode in that foreign key to come from COUNTRY not MAIL. I only know how to make foreign key in CLIENT from MAIL:
CREATE TABLE COUNTRY (countrycode char(3),
CONSTRAINT pk_country PRIMARY KEY(countrycode))
CREATE TABLE MAIL (postcode numeric(5), countrycode char(3),
FOREIGN KEY(countrycode) REFERENCES country(countrycode),
CONSTRAINT pk_mail PRIMARY KEY(postcode,countrycode))
CREATE TABLE CLIENT (OIB numeric(11), postcode numeric(5), countrycode char(3),
FOREIGN KEY(postcode,countrycode) REFERENCES mail(postcode,countrycode),
CONSTRAINT pk_client PRIMARY KEY(OIB))
BUT I don't want that. I want my countrycode in table CLIENT to come from COUNTRY not from MAIL.
I've tryed:
FOREIGN KEY(postcode,countrycode) REFERENCES mail(postcode) AND country(countrycode)
but it doesn't work.
Is there a way to do that?
Don't use compound primary keys. Don't replicate data across multiple tables. So, look up the country based on the post code. I would suggest:
CREATE TABLE COUNTRIES (
countryid int identity(1, 1) primary key,
countrycode char(3) unique
);
CREATE TABLE PostCodes (
postcodeid int identity(1, 1) primary key,
countryid int references countries(countryid),
postcode nvarchar(25),
unique (countryid, postcode)
);
CREATE TABLE CLIENTS (
clientId int identity(1, 1) primary key,
OIB numeric(11) unique,
postcodeid int references postcodes(postcodeid)
);
Note some things:
You can look up the country via the postal code.
Postal codes are not always numeric.
Countries change over time (e.g. East Timor, South Sudan).
Postal codes can change (e.g. 10021 on the Upper East Side of Manhattan split into 10065 and 10075, once upon a time).
No, you can't have a single foreign key reference multiple tables, but you don't need to: You can have multiple foreign keys per table, so you can just reference both.
CREATE TABLE CLIENT (
OIB numeric(11),
postcode numeric(5),
countrycode char(3),
FOREIGN KEY(postcode,countrycode) REFERENCES mail(postcode,countrycode),
FOREIGN KEY(countrycode) REFERENCES country(countrycode),
CONSTRAINT pk_client PRIMARY KEY(OIB)
)
Other posts have presented a number of arguments regarding your existing table structures, some (imho) more valid than others. My reply is based solely on your existing structures, and does not attempt to second guess whether they are “right” or “wrong”. (Yes, I have opinions on the topic, but that’s not what you were asking about.)
On the face of it, there’s no need for the FK on column CountryCode in CLIENT to go all the way to COUNTRY to “validate itself”. The FK in MAIL ensures that any CountryCode in MAIL will also be in COUNTRY; so, an FK in CLIENT to MAIL on CountryCode is just as good at validating CountryCode as an FK directly to Country.
The once exception I can see to this is if you need to have a CLIENT with a valid CountryCode, but without a valid MAIL entry. (One thing unclear from your model is whether the columns are NULLable or not.) This might be the case if a CLIENT must have a COUNTRY, but does not have to have a MAIL. To do this, I’d use two FKs: one to MAIL on both columns, and one to COUNTRY on CountryCode… with column PostalCode set to allow NULLs. This way, CountryCode will always be validated against COUNTRY, and CountryCode + PostalCode will always be validated against MAIL--but only when PostalCode is NOT NULL.
Again, whether this is “right” or “wrong” architecture ultimately depends on what problems you are trying to solve—business, performance, storage (volume), and so on.

Implementing complex references without a trigger in PostgreSQL

I am creating a PostgreSQL database: Country - Province - City.
A city must belong to a country and can belong to a province.
A province must belong to a country.
A city can be capital of a country:
CREATE TABLE country (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL
);
CREATE TABLE province (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL,
country_id integer NOT NULL,
CONSTRAINT fk_province_country FOREIGN KEY (country_id) REFERENCES country(id)
);
CREATE TABLE city (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL,
province_id integer,
country_id integer,
CONSTRAINT ck_city_provinceid_xor_countryid
CHECK ((province_id is null and country_id is not null) or
(province_id is not null and country_id is null)),
CONSTRAINT fk_city_province FOREIGN KEY (province_id) REFERENCES province(id),
CONSTRAINT fk_city_country FOREIGN KEY (country_id) REFERENCES country(id)
);
CREATE TABLE public.capital (
country_id integer NOT NULL,
city_id integer NOT NULL,
CONSTRAINT pk_capital PRIMARY KEY (country_id, city_id),
CONSTRAINT fk_capital_country FOREIGN KEY (country_id) REFERENCES country(id),
CONSTRAINT fk_capital_city FOREIGN KEY (city_id) REFERENCES city(id)
);
For some (but not all) countries I will have province data, so a city will belong to a province, and the province to a country. For the rest, I shall just know that the city belongs to a country.
Issue #1: Concerning the countries that I do have province data, I was looking for a solution that will disallow a city to belong to a country and at the same time to a province of a different country.
I preferred to enforce through a check constraint that either province or country (but NOT both) are not null in city. Looks like a neat solution.
The alternative would be to keep both province and country info within the city and enforce consistency through a trigger.
Issue #2: I want to disallow that a city is a capital to a country to which it does not belong. That seems impossible without a trigger after my solution to issue #1 because there is no way to directly reference the country a city belongs to.
Maybe the alternative solution to issue #1 is better, it also simplifies future querying.
I would radically simplify your design:
CREATE TABLE country (
country_id serial PRIMARY KEY -- pk is not null automatically
,country text NOT NULL -- just use text
,capital int REFERENCES city -- simplified
);
CREATE TABLE province ( -- never use "id" as name
province_id serial PRIMARY KEY
,province text NOT NULL -- never use "name" as name
,country_id integer NOT NULL REFERENCES country -- references pk per default
);
CREATE TABLE city (
city_id serial PRIMARY KEY
,city text NOT NULL
,province_id integer NOT NULL REFERENCES province,
);
Since a country can only have one capitol, no n:m table is needed.
Never use "name" or "id" as column names. That's an anti-pattern of some ORMs. Once you join a couple of tables (which you do a lot in relational databases) you end up with multiple columns of the same non-descriptive name, causing all kinds of problems.
Just use text. No point in varchar(n). Avoid problem like this.
The PRIMARY KEY clause makes a column NOT NULL automatically. (NOT NULL sticks, even if you later remove the pk constraint.)
And most importantly:
A city only references one province in all cases. No direct reference to country. Therefore mismatches are impossible, on-disk storage is smaller and your whole design is much simpler. Queries are simpler.
For every country enter a single dummy-province with an empty string as name (''), representing the country "as a whole". (Possibly even with the same id, you could have provinces and countries draw from the same sequence ...). Do this automatically in a trigger. This trigger is optional, though.
I chose an empty string instead of NULL, so the column can still be NOT NULL and a unique index over (country_id, province) does its job. You can easily identify this province representing the whole country and deal with it as appropriate in your application.
I am using a similar design successfully in multiple instances.
I think you can actually implement all these constraints without using triggers. It does require a bit of restructuring of the data.
Start by enforcing the relationship (using foreign keys) of:
city --> province --> country
For countries with no province information, invent a province -- perhaps with the country name, perhaps some weird default name ("CountryProvince"). This allows you to have only one set of relationship between the three entities. It automatically ensures that cities and provinces are in the right country, because you would get the country through the province.
The final question is about capitals. There is a way that you can implement and enforce uniqueness with no triggers. Keep a flag in the cities table and use a unique filtered index to guarantee uniqueness:
create unique index on cities_capitalflag on cities(capitalflag) where capitalflag = 'Y';
EDIT:
You are right about the the filtered index needing the country. But that requires storing the country in that table, which, in turn, requires keeping the provinces and cities aligned with respect to country. So, this solution gets close to not needing triggers but it isn't there.

How to design relation between tables employee,client and phone Number?

I have a relational database with a Client table, containing id, name, and address, with many phone numbers
and I have an Employee table, also containing id, name, address, etc., and also with many phone numbers.
Is it more logical to create one "Phone Number" table and link the Clients and Employees, or to create two separate "Phone Number" tables, one for Clients and one for Employees?
If I choose to create one table, can I use one foreign key for both the Client and Employee or do I have to make two foreign keys?
If I choose to make one foreign key, will I have to make the Client ids start at 1 and increment by 5, and Employee ids start at 2 and increment by 5 so the two ids will not be the same?
If I create two foreign keys will one have a value and the other allow nulls?
The solution which I would go with would be:
CREATE TABLE Employees (
employee_id INT NOT NULL,
first_name VARCHAR(30) NOT NULL,
...
CONSTRAINT PK_Employees PRIMARY KEY (employee_id)
)
CREATE TABLE Customers (
customer_id INT NOT NULL,
customer_name VARCHAR(50) NOT NULL,
...
CONSTRAINT PK_Customers PRIMARY KEY (customer_id)
)
-- This is basic, only supports U.S. numbers, and would need to be changed to
-- support international phone numbers
CREATE TABLE Phone_Numbers (
phone_number_id INT NOT NULL,
area_code CHAR(3) NOT NULL,
prefix CHAR(3) NOT NULL,
line_number CHAR(4) NOT NULL,
extension VARCHAR(10) NULL,
CONSTRAINT PK_Phone_Numbers PRIMARY KEY (phone_number_id),
CONSTRAINT UI_Phone_Numbers UNIQUE (area_code, prefix, line_number, extension)
)
CREATE TABLE Employee_Phone_Numbers (
employee_id INT NOT NULL,
phone_number_id INT NOT NULL,
CONSTRAINT PK_Employee_Phone_Numbers PRIMARY KEY (employee_id, phone_number_id)
)
CREATE TABLE Customer_Phone_Numbers (
customer_id INT NOT NULL,
phone_number_id INT NOT NULL,
CONSTRAINT PK_Customer_Phone_Numbers PRIMARY KEY (customer_id, phone_number_id)
)
Of course, the model might changed based on a lot of different things. Can an employee also be a customer? If two employees share a phone number how will you handle it on the front end when the phone number for one employee is changed? Will it change the number for the other employee as well? Warn the user and ask what they want to do?
Those last few questions don't necessarily affect how the data is ultimately modeled, but will certainly affect how the front-end is coded and what kind of stored procedures you might need to support it.
"The Right Way", allowing you to use foreign keys for everything, would be to have a fourth table phoneNumberOwner(id) and have fields client.phoneNumberOwnerId and employee.phoneNumberOwnerId; thus, each client and each employee has its own record in the phoneNumberOwner table. Then, your phoneNumbers table becomes (phoneNumberOwnerId, phoneNumber), allowing you to attach multiple phone numbers to each phoneNumberOwner record.
Maybe you can somehow justify it, but to my way of thinking it is not logical to have employees and clients in the same table. It seems you wan to do this only so that your foreign keys (in the telephone-number table) all point to the same table. This is not a good reason for combining employees and clients.
Use three tables: employees, clients, and telephone-number. In the telephone table, you can have a field that indicates employee or client. As an aside, I don't see why telephone number needs to be a foreign key: that only adds complexity with very little benefit, imo.
Unless there are special business requirements I would expect a telephone number to be an attribute of an employee or client entity and not an entity in its own right.
If it were considered an entity in its own right it would be 'all key' i.e. its identifier is the compound of its attributes and has no attributes other than its identifier. If the sub-attributes aren't stored apart then it only has one attribute i.e. the telephone number itself! Therefore, it isn't usually 'interesting' enough to be an entity in its own right and a telephone numbers table, whether superclass or subclass, is usually overkill (as I say, barring special business requirements).