Related
I've heard that having an autogenerated primary key is a convention. However, I'm trying to understand its benefits in the following scenario:
CREATE TABLE countries
(
countryID int(11) PRIMARY KEY AUTO_INCREMENT,
countryName varchar(128) NOT NULL
);
CREATE TABLE students
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin int(11) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries (countryID)
);
INSERT INTO countries (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
If I want to insert something into the students table, I need to lookup the countryIDs in the countries table:
INSERT INTO students (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', (SELECT countryID FROM countries WHERE countryName = 'Germany')),
('Erik Jakobsson', (SELECT countryID FROM countries WHERE countryName = 'Sweden')),
('Manuel Verdi', (SELECT countryID FROM countries WHERE countryName = 'Italy')),
('Min Lin', (SELECT countryID FROM countries WHERE countryName = 'China'));
In a different scenario, as I know that the countryNames in the countries table are unique and not null, I could to the following:
CREATE TABLE countries2
(
countryName varchar(128) PRIMARY KEY
);
CREATE TABLE students2
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin varchar(128) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries2 (countryName)
);
INSERT INTO countries2 (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
Now, inserting data into the students2 table is simpler:
INSERT INTO students2 (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', 'Germany'),
('Erik Jakobsson', 'Sweden'),
('Manuel Verdi', 'Italy'),
('Min Lin', 'China');
So why should the first option be the better one, given that countryNames are unique and are never going to change?
There are two apects involved here:
natural keys vs. surrogate keys
autoincremented values
You are wondering why to have to deal with some arbitrary number for a country, when a country can be uniquely identified by its name. Well, imagine you use the country names in several tables to relate rows to each other. Then at some point you are told that you misspelled a country. You want to correct this, but have to do this in every table the country occurs in. In big databases you usually don't have cascading updates in order to avoid updates that unexpectedly take hours instead of mere minutes or seconds. So you must do this manually, but the foreign key constraints get in your way. You cannot change the parent table's key, because there are child tables using this, and you cannot change the key in the child tables first, because that key has to exist in the parent table. You'll have to work with a new row in the parent table and start from there. Quite some task. And even if you have no spelling issue, at some point someone might say "we need the official country names; you have China, but it must be the People's Republic of China instead" and again you must look up and change that contry in all those tables. And what about partial backups? A table gets totally messed up due to some programming error and must be replaced by last week's backup, because this is the best you have. But suddenly some keys don't match any more. You never want a table's key to change.
You say "country names are unique and are never going to change". Think again :-)
It is easier to have your database use a technical arbitrary ID instead. Then the country name only exists in the country table. And if that name must get changed, you change it just in that one place, and all relations stay intact. This, however, doesn't mean that natural keys are worse than technical IDs. They are not. But it's more difficult with them to set up a database correctly. In case of countries a good natural key would be a country ISO code, because this uniquely identifies a country and doesn't change. This would be my choice here.
With students it's the same. Students usually have a student number or student ID in real world, so you can simply use this number to uniquely identifiy a student in the database. But then, how do we get these unique student IDs? At a large university, two secretaries may want to enter new students at the same time. They ask the system what the last student's ID was. It was #11223, so they both want to issue #11224, which causes a conflict of course, because only one student can be given that number. In order to avoid this, DBMS offer sequences of which numbers are taken. Thus one of the secretaries pulls #11224 and the other #11225. Auto-incremented IDs work this way. Both secretaries enter their new student, the rows get inserted into the student table and result in the two different IDs that get reported back to the secretaries. This makes sequences and auto incrementing IDs a great and safe tool to work with.
Convention can be a useful guide. It isn't necessarily the best option in all situations.
There are usually tradeoffs involved, like space, convenience, etc.
While you showed one method of resolving / inserting the proper country key value, there's a slightly less wordy option supported by standard SQL (and many databases).
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
SELECT *
FROM (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
) AS x
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
and a little less wordy again:
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
Here's a test case with MariaDB 10.5:
Working test case (updated)
I'm trying to build school management system and I'm having trouble designing an optimal database structure. I have Students, Staff and Users tables for login. User table will have login information only (userNumber, password) and Students and Staff will contain personal information. I separated Students and Staff because they contain different personal data. But they both have a userNumber.
users(
id,
userNumber,
password
)
students(
id,
studentNumber,
name,
age
)
staff(
id,
staffNumber,
name,
age,
salary,
dateOfHiring,
staffType
)
Let's say I'm login in with a userNumber 98242, how can let the system know where should I look, in Students table or Staff table?
I would like some recommendations on database structures.
just add column userType to users table
You could do a few things. You could create a type in the users table and look that up. You could also join the both the tables and then on recieving a record check if the student id or staff id has been returned.
Then your query could be something like
SELECT users.id as user_id, students.id, staff.id FROM users
LEFT JOIN students ON users.id = students.id
LEFT JOIN staff ON users.id = staff.id
WHERE id = 98242
Inheritance:
create table persons (
id,
name,
age
);
create table users (
number,
password
) inherits (persons);
create table students (
) inherits (users);
create table stuff (
salary,
dateOfHiring,
staffType
) inherits (users);
Schematically, something like this.
Using tableoid system column you could to know the origin of the particular row:
select
*,
tableoid::regclass -- Prints the origin table name (users, students, stuff, ...)
from users
where number = 98242;
While there are only few separate columns for students and staff members, I would keep it simple:
CREATE TABLE person (
person_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, name text
, birthday date -- never age! bitrots in no time
, student_number int
, staff_number int
, salary numeric
, hired_at date
, staff_type text
, CONSTRAINT one_role_max CHECK (student_number IS NULL
OR (staff_number, salary, hired_at, staff_type) IS NULL)
, CONSTRAINT one_role_min CHECK (student_number IS NOT NULL
OR (staff_number, salary, hired_at, staff_type) IS NOT NULL)
);
CREATE TABLE users (
user_number int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
, person_id int NOT NULL REFERENCES person
, password text -- encrypted !
);
This way, one person can have 0-n user accounts - which is the typical reality. You can restrict to a single account per person by adding UNIQUE (person_id) to table users.
The CHECK constraint one_role_max enforces that either student columns or staff columns must stay NULL.
The CHECK constraint one_role_min enforces that at least one of both must have any values.
Adapt what must/can be filled in to your needs. The expressions work excellently for the current design. See:
NOT NULL constraint over a set of columns
While it's strictly "either/or" and the only student column is student_number, this query answers your question:
SELECT CASE WHEN student_number IS NULL THEN 'staff' ELSE 'student' END AS user_role
FROM person
WHERE person_id = (SELECT person_id FROM users WHERE user_number = 98242);
Or remove one or both CHECK constraints to allow the same person to be student and staff, or neither. Adapt above query accordingly.
You could use inheritance for this (like Abelisto demonstrates), but I'd rather stay away from it. There once was the idea of an object-relational DBMS. But the community has largely moved on. It works, but with caveats. Partitioning used to be a major use case. But declarative partitioning in Postgres 10 mostly superseded the inheritance-based implementation. There is not too much interest in it any more.
What about all those empty columns? Am I wasting a lot of space there? The opposite is the case. The disk footprint won't get much smaller than this. NULL storage is very cheap. See:
Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
i have a database for university so the students take many courses but each student have id as primary key also each course have an id as primary key its not allowed to have duplicate id at same table what should i use to make a table that same id but different courses id
In a normalized relational database, when you have a situation where items from one table can be associated with many items in a second table, and likewise those related items can be associated with many items in the first table, you should use a third table to store those relationships. In your case, that table might be called Enrollments.
You can use JOIN operations on that third table to relate, group, and aggregate across that relationship. You can also store any metadata associated with that relationship in the JOIN- table, for example you might store the enrollment date there.
Like vicatcu said, create a third table
Course_id | Student_id
The ID's of the courses and the students are uniq in their own tables. The relation between course < - > student is stored in that third table.
I'm struggling to find how to make a specific query to a set of tables that I have on my local database.
CREATE TABLE Gym (
eid INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
district VARCHAR(127),
area INT);
CREATE TABLE Trainer (
id INT PRIMARY KEY,
name VARCHAR(127),
birth_year INT,
year_credentials_expiry INT
);
CREATE TABLE Works (
eid INT,
id INT,
since INT,
FOREIGN KEY (eid) REFERENCES Gym (eid),
FOREIGN KEY (id) REFERENCES Trainer (id),
PRIMARY KEY (eid,id));
The question is the following: I have several gyms and there are some cases where two or more gyms are located on the same district. How can I know, by district, which is the gym with less trainers on it?
The only thing I managed to get is the number of trainers per gym. Considering that, I can only get the gym with the minimum trainers from all districts...
NOTE: I am NOT allowed to use subqueries (SELECT inside SELECT's; SELECT inside FROM's)
Thank you in advance.
If you can have subqueries in the where or having clauseclause, then you can approach it this way.
First look at your query that counts the number of trainers by district.
Now, write a query (using that as a subquery) that calculates the minimum number of trainers in a gym in a district.
Next, take your first query, add a having clause and correlate it to the second by district and number of trainers.
This does use a subquery in a subquery, but it is in the having clause. I'm not writing the query for you, since you need to learn how to do that yourself.
By the way, if you have window/analytic functions there may be other solutions.
So I have two tables in this simplified example: People and Houses. People can own multiple houses, so I have a People.Houses field which is a string with comma delimeters (eg: "House1, House2, House4"). Houses can have multiple people in them, so I have a Houses.People field, which works the same way ("Sam, Samantha, Daren").
I want to find all the rows in the People table corresponding to the the names of people in the given house, and vice versa for houses belong to people. But I can't figure out how to do that.
This is as close as I've come up with so far:
SELECT People.*
FROM Houses
LEFT JOIN People ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'
But I get some false positives (eg: Sam and Samantha might both get grabbed when I just want Samantha. And likewise with House3, House34, and House343, when I want House343).
I thought I might try and write a SplitString function so I could split a string (using a list of delimiters) into a set, and do some subquery on that set, but MySQL functions can't have tables as return values.
Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I can think of some different ways to get what I want but I'm wondering if there isn't a nice solution.
Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I hope that's not true. Representing "arrays" in SQL databases shouldn't be in a comma-delimited format, but the problem can be correctly solved by using a junction table. Comma-separated fields should have no place in relational databases, and they actually violates the very first normal form.
You'd want your table schema to look something like this:
CREATE TABLE people (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE houses (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
) ENGINE=INNODB;
Then searching for people will be as easy as this:
SELECT p.*
FROM houses h
JOIN people_houses ph ON ph.house_id = h.id
JOIN people p ON p.id = ph.person_id
WHERE h.name = 'SomeArbitraryHouseImInterestedIn';
No more false positives, and they all lived happily ever after.
The nice solution is to redesign your schema so that you have the following tables:
People
------
PeopleID (PK)
...
PeopleHouses
------------
PeopleID (PK) (FK to People)
HouseID (PK) (FK to Houses)
Houses
------
HouseID (PK)
...
Short Term Solution
For your immediate problem, the FIND_IN_SET function is what you want to use for joining:
For People
SELECT p.*
FROM PEOPLE p
JOIN HOUSES h ON FIND_IN_SET(p.name, h.people)
WHERE h.name = ?
For Houses
SELECT h.*
FROM HOUSES h
JOIN PEOPLE p ON FIND_IN_SET(h.name, p.houses)
WHERE p.name = ?
Long Term Solution
Is to properly model this by adding a table to link houses to people, because you're likely storing redundant relationships in both tables:
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
)
The problem is that you have to use another schema, like the one proposed by #RedFilter. You can see it as:
People table:
PeopleID
otherFields
Houses table:
HouseID
otherFields
Ownership table:
PeopleID
HouseID
otherFields
Hope that helps,
Hi you just change the table name places, left side is People and then right side is Houses:
SELECT People.*
FROM People
LEFT JOIN Houses ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'