Best way to join two relations in SQL?

I have two relations Employees and Tasks. In the relation Employees I save relevant information about the employee including the primary key employee_id. In the relation Tasks I save all relevant information about the tasks including the primary key task_id.
What would be the best way to join these two relations so you know which employee has which task? I assume an employee can have multiple tasks and a task can have multiple employees. Would it be best to add employee_id as a foreign key to Tasks, like this?
CREATE TABLE Employee(
Name TEXT,
Employee_id INT PRIMARY KEY
);
CREATE TABLE Tasks(
task_description TEXT,
Employee_id INT,
FOREIGN KEY (Employee_id) REFERENCES Employee(Employee_id),
task_id INT PRIMARY KEY
);
My problem with this is that since multiple employees can have the same task, it's not a key for the relation Tasks.
Another option I thought about was this. I don't know if this has the desired effect.
CREATE TABLE Employee(
Name TEXT,
Employee_id INT PRIMARY KEY
);
CREATE TABLE Tasks(
task_description TEXT,
Employee_id INT,
task_id INT,
PRIMARY KEY(Employee_id, task_id)
);
What other (better) options are there?

This is, by definition, a many-to-many relationship that requires a separate table:
CREATE TABLE EmployeeTasks (
Employee_id INT,
Task_id INT,
PRIMARY KEY (Employee_id, Task_id),
FOREIGN KEY (Employee_id) REFERENCES Employees(Employee_id),
FOREIGN KEY (Task_id) REFERENCES Tasks(Task_id));
CREATE INDEX EmployeeTasks_Task_id on EmployeeTasks(Task_id);
You should pluralize table names.
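To see the junction table at work, here is a minimal sketch that runs the design through Python's built-in sqlite3 module. The sample rows and the helper names build_db and tasks_of are made up for illustration; SQLite is only a stand-in for whatever database you actually use.

```python
import sqlite3


def build_db():
    """Create the three tables in an in-memory SQLite database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE Employees (
            Employee_id INTEGER PRIMARY KEY,
            Name TEXT
        );
        CREATE TABLE Tasks (
            Task_id INTEGER PRIMARY KEY,
            task_description TEXT
        );
        CREATE TABLE EmployeeTasks (
            Employee_id INT,
            Task_id INT,
            PRIMARY KEY (Employee_id, Task_id),
            FOREIGN KEY (Employee_id) REFERENCES Employees(Employee_id),
            FOREIGN KEY (Task_id) REFERENCES Tasks(Task_id)
        );
        CREATE INDEX EmployeeTasks_Task_id ON EmployeeTasks(Task_id);

        -- Hypothetical sample data: task 10 is shared by two employees.
        INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO Tasks VALUES (10, 'Write report'), (11, 'Review code');
        INSERT INTO EmployeeTasks VALUES (1, 10), (2, 10), (1, 11);
    """)
    return conn


def tasks_of(conn, employee_name):
    """Return the task descriptions assigned to one employee, ordered by task id."""
    rows = conn.execute(
        """
        SELECT t.task_description
        FROM Employees e
        JOIN EmployeeTasks et ON et.Employee_id = e.Employee_id
        JOIN Tasks t ON t.Task_id = et.Task_id
        WHERE e.Name = ?
        ORDER BY t.Task_id
        """,
        (employee_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling tasks_of(build_db(), "Ann") walks Employees -> EmployeeTasks -> Tasks. The composite primary key stops the same (employee, task) pair from being stored twice, and the two foreign keys reject pairs that point at a missing employee or task.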

Related

Enforce referential integrity in a ternary relation

Question
Employees are organized into teams. Each team can have multiple employees, and each employee can belong to multiple teams. This many-to-many relationship is represented by the team_membership table.
Each project is assigned to one team. Projects are subdivided into tasks, and each task is assigned to an employee.
Is it possible to guarantee that a task's employee is a member of the corresponding project's team, without adding triggers or redundant columns?
Example tables
CREATE TABLE employee
(
employee_id bigserial PRIMARY KEY,
employee_name text
);
CREATE TABLE team
(
team_id bigserial PRIMARY KEY,
team_name text
);
CREATE TABLE team_membership
(
team_id bigint NOT NULL REFERENCES team,
employee_id bigint NOT NULL REFERENCES employee,
PRIMARY KEY (team_id, employee_id)
);
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
team_id bigint NOT NULL REFERENCES team,
project_name text
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_id bigint NOT NULL REFERENCES project,
employee_id bigint NOT NULL REFERENCES employee
);
What I have already tried
Use a trigger to check validity when data changes. This would require writing similar trigger procedures for the employee and team_membership tables.
CREATE FUNCTION check_employee_member_of_team() RETURNS trigger AS $$
BEGIN
IF NOT new.employee_id IN (
SELECT employee_id FROM team_membership tm
JOIN project pr ON tm.team_id = pr.team_id
WHERE pr.project_id = new.project_id
) THEN
RAISE EXCEPTION 'Employee is not a member of project';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_or_update_task_trigger
BEFORE INSERT OR UPDATE ON task
FOR EACH ROW EXECUTE PROCEDURE check_employee_member_of_team();
Add a team_id column to the task table, and enforce the constraint using composite foreign keys. project.team_id and task.team_id are redundant.
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
team_id bigint NOT NULL REFERENCES team,
project_name text,
UNIQUE (project_id, team_id)
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_id bigint NOT NULL,
team_id bigint NOT NULL,
employee_id bigint NOT NULL REFERENCES employee,
FOREIGN KEY (project_id, team_id) REFERENCES project (project_id, team_id),
FOREIGN KEY (team_id, employee_id) REFERENCES team_membership (team_id, employee_id)
);
Add a redundant team_id to task and add the new column to the foreign key referencing project. This requires a redundant unique constraint on project(team_id, project_id).
Then create a foreign key constraint from task to team_membership. Then the task can only be assigned to a team member.
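As a rough check of this approach, the sketch below rebuilds the composite-key tables in an in-memory SQLite database (via Python's sqlite3 module) and tries a few assignments. bigserial is replaced by INTEGER PRIMARY KEY because SQLite has no bigserial; the sample rows and the helper names make_db and try_assign are assumptions for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
CREATE TABLE team_membership (
    team_id INTEGER NOT NULL REFERENCES team,
    employee_id INTEGER NOT NULL REFERENCES employee,
    PRIMARY KEY (team_id, employee_id)
);
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    team_id INTEGER NOT NULL REFERENCES team,
    project_name TEXT,
    UNIQUE (project_id, team_id)   -- redundant, but needed as an FK target
);
CREATE TABLE task (
    task_id INTEGER PRIMARY KEY,
    task_name TEXT,
    project_id INTEGER NOT NULL,
    team_id INTEGER NOT NULL,
    employee_id INTEGER NOT NULL REFERENCES employee,
    FOREIGN KEY (project_id, team_id) REFERENCES project (project_id, team_id),
    FOREIGN KEY (team_id, employee_id) REFERENCES team_membership (team_id, employee_id)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    # Hypothetical data: Ann is in team 1, Bob is in team 2, project 100 belongs to team 1.
    conn.executescript("""
        INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO team VALUES (1, 'core'), (2, 'ops');
        INSERT INTO team_membership VALUES (1, 1), (2, 2);
        INSERT INTO project VALUES (100, 1, 'Apollo');
    """)
    return conn


def try_assign(conn, project_id, team_id, employee_id):
    """Try to insert a task; return True if the constraints accept it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task (task_name, project_id, team_id, employee_id) "
                "VALUES ('demo', ?, ?, ?)",
                (project_id, team_id, employee_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this data, try_assign(conn, 100, 1, 1) succeeds, while try_assign(conn, 100, 1, 2) fails on the team_membership key (Bob is not in team 1) and try_assign(conn, 100, 2, 2) fails on the project key (project 100 does not belong to team 2).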
The following solution, a variation on that of @LaurenzAlbe, reduces the redundancy while maintaining all the constraints:
employee (empl_id, empl_name), primary key empl_id
team (team_id, team_name), primary key team_id
team_membership (empl_id, team_id),
primary key (empl_id, team_id),
foreign key empl_id references employee,
foreign key team_id references team
project (project_num, team_id, project_name)
primary key(project_num, team_id)
foreign key team_id references team
task (task_num, project_num, team_id, empl_id, task_name)
primary key(task_num, project_num, team_id)
foreign key (project_num, team_id) references project,
foreign key (empl_id, team_id) references team_membership
Note that project_num is a sequential number internal to each team (so different teams can have the same set of numbers for their projects), while task_num is a sequential number internal to each project.
Finally, note that, as pointed out in the comment by @DamirSudarevic, the foreign key on team_membership is enough to ensure that empl_id and team_id refer to an existing employee and team, respectively.
Normalisation to the n-th degree might not always be a good idea. Storage is far cheaper now than in the days of Boyce–Codd, and the cost of additional complexity and operations is higher by orders of magnitude. Avoiding triggers and similar native features will make the solution more portable: say that in a few years your business is super successful and you have to scale this out to a NoSQL solution or a platform where triggers are not supported.
It might be wise to drop the purist pursuit of too much normalisation and look for a more pragmatic solution. There are many ways to skin this cat using foreign keys alone, and the other answers are also good in this respect. One way I can think of is below, and I haven't denormalised too much yet:
CREATE TABLE employee
(
employee_id bigserial PRIMARY KEY,
employee_name text
);
CREATE TABLE team
(
team_id bigserial PRIMARY KEY,
team_name text
);
CREATE TABLE team_membership
(
team_id bigint NOT NULL REFERENCES team,
employee_id bigint NOT NULL REFERENCES employee,
-- PRIMARY KEY (team_id, employee_id)
team_membership_id bigserial PRIMARY KEY
);
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
-- team_id bigint NOT NULL REFERENCES team,
project_name text
);
CREATE TABLE project_team
(
project_team_id bigserial PRIMARY KEY,
project_id bigint REFERENCES project,
team_membership_id bigint REFERENCES team_membership
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_team_id bigint NOT NULL REFERENCES project_team
);
The changes are:
Add a surrogate primary key to team_membership.
Remove team_id from the project table.
Create a project_team mapping table, because probably not all employees in a team will be on a project, and modify the task table to reference it.
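To show how a task is resolved back to its employee under this surrogate-key design, here is a small sketch using Python's sqlite3 module. The surrogate column is spelled team_membership_id here, bigserial is replaced by INTEGER PRIMARY KEY, and the sample rows and the helper names build and task_assignment are hypothetical.

```python
import sqlite3


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
        CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
        CREATE TABLE team_membership (
            team_membership_id INTEGER PRIMARY KEY,
            team_id INTEGER NOT NULL REFERENCES team,
            employee_id INTEGER NOT NULL REFERENCES employee
        );
        CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
        CREATE TABLE project_team (
            project_team_id INTEGER PRIMARY KEY,
            project_id INTEGER REFERENCES project,
            team_membership_id INTEGER REFERENCES team_membership
        );
        CREATE TABLE task (
            task_id INTEGER PRIMARY KEY,
            task_name TEXT,
            project_team_id INTEGER NOT NULL REFERENCES project_team
        );

        -- Hypothetical data: Ann, a member of team 'core', works on project 'Apollo'.
        INSERT INTO employee VALUES (1, 'Ann');
        INSERT INTO team VALUES (1, 'core');
        INSERT INTO team_membership VALUES (1, 1, 1);
        INSERT INTO project VALUES (1, 'Apollo');
        INSERT INTO project_team VALUES (1, 1, 1);
        INSERT INTO task VALUES (1, 'Design', 1);
    """)
    return conn


def task_assignment(conn, task_id):
    """Follow task -> project_team -> team_membership to find who does the task."""
    return conn.execute(
        """
        SELECT t.task_name, p.project_name, tm_team.team_name, e.employee_name
        FROM task t
        JOIN project_team pt ON pt.project_team_id = t.project_team_id
        JOIN project p ON p.project_id = pt.project_id
        JOIN team_membership tm ON tm.team_membership_id = pt.team_membership_id
        JOIN team tm_team ON tm_team.team_id = tm.team_id
        JOIN employee e ON e.employee_id = tm.employee_id
        WHERE t.task_id = ?
        """,
        (task_id,),
    ).fetchone()
```

Because a task can only point at a project_team row, and that row can only point at an existing membership, a task cannot be given to someone who is not a team member. Note that nothing in this sketch stops one project from being mapped to memberships of different teams, so the "one team per project" rule would still need its own constraint.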

Create 2 tables having same column but not a primary key

(This question is for a college project; I'm stuck with the requirement.)
I want to create two tables in SQL Server, say 'table1' and 'table2', in the same database. Both should have a column, say 'col1', which is not a primary key.
How should I create them so that when I insert data into one table, the other gets updated automatically?
NOTE: Because this is a college project, we are required to use specific primary keys, so referencing a primary key is not an option. I need the same entity in two different tables; is it a good idea to reference one from the other somehow? Any ideas?
For example, an employee's project details will have his/her empID, and the dependants table will have empID as well. I cannot make it the primary key, since the primary key is already defined by the professor, but updating one should update the other as well. Does that make sense?
You can have two types of parent-child relationship.
Identifying relationship: Here, the child depends on the parent to identify itself, e.g. Project requires Employee to exist. Project should have EmployeeId as part of its primary key, or EmployeeId as its whole primary key. If EmployeeId is the primary key of Project, then an employee can have only one project.
CREATE TABLE Employee
(
EmployeeId INT,
EmployeeName VARCHAR(255) NOT NULL,
PRIMARY KEY(EmployeeId)
)
GO
CREATE TABLE EmployeeProject
(
EmployeeId INT,
EmployeeName VARCHAR(255) NOT NULL,
PRIMARY KEY(EmployeeId),
FOREIGN KEY (EmployeeId) REFERENCES Employee(EmployeeId),
)
GO
Non-identifying relationship: Here, the child does not depend on the parent to identify itself, e.g. Project can be defined without Employee. Project can have EmployeeId as a foreign key. If EmployeeId is a NOT NULL column, then it is mandatory to have an employee. If EmployeeId is a nullable column, then it is not mandatory to have an employee.
CREATE TABLE Employee
(
EmployeeId INT,
EmployeeName VARCHAR(255) NOT NULL,
PRIMARY KEY(EmployeeId)
)
GO
CREATE TABLE EmployeeProject
(
EmployeeProjectId INT,
EmployeeName VARCHAR(255) NOT NULL,
EmployeeId INT NOT NULL, -- Can be NULL, if it is not mandatory
PRIMARY KEY(EmployeeProjectId),
FOREIGN KEY (EmployeeId) REFERENCES Employee(EmployeeId),
)
GO
First, when creating the tables, make the column that will be referenced by the foreign key unique:
Table1:
[SIN] int NOT NULL UNIQUE,
Second, in the other table, where [SIN] is declared as a foreign key, specify the update and delete actions:
Table2:
[SIN] INT CONSTRAINT [SIN_FK1] FOREIGN KEY REFERENCES Employee([SIN]) ON DELETE SET NULL ON UPDATE CASCADE
With this in place, updating [SIN] in Table1 cascades the new value to the matching rows in Table2, and deleting a row from Table1 sets [SIN] to NULL in the matching rows of Table2.
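The effect of the two referential actions can be tried out with a short sketch. It uses Python's sqlite3 module rather than SQL Server, so the bracketed identifiers and the constraint name are dropped; the extra columns, the sample values and the function name demo_cascade are made up for illustration.

```python
import sqlite3


def demo_cascade():
    """Return the child's SIN after the parent is updated, then after it is deleted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        -- Parent: SIN is unique but is not the primary key.
        CREATE TABLE Table1 (
            id INTEGER PRIMARY KEY,
            SIN INTEGER NOT NULL UNIQUE
        );
        -- Child: follows SIN changes, and is set to NULL when the parent row goes away.
        CREATE TABLE Table2 (
            id INTEGER PRIMARY KEY,
            SIN INTEGER REFERENCES Table1 (SIN)
                ON DELETE SET NULL ON UPDATE CASCADE
        );
        INSERT INTO Table1 VALUES (1, 111);
        INSERT INTO Table2 VALUES (1, 111);
    """)

    conn.execute("UPDATE Table1 SET SIN = 222 WHERE SIN = 111")
    after_update = conn.execute("SELECT SIN FROM Table2 WHERE id = 1").fetchone()[0]

    conn.execute("DELETE FROM Table1 WHERE SIN = 222")
    after_delete = conn.execute("SELECT SIN FROM Table2 WHERE id = 1").fetchone()[0]

    return after_update, after_delete
```

The child row first follows the parent's new value and then falls back to NULL once the parent row is deleted. Keep in mind that this only propagates updates and deletes; a foreign key never inserts rows into the other table for you.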

Can I use one same primary key in two different tables?

CREATE TABLE Employee
(
employeeID INT (10) PRIMARY KEY,
Name CHAR (20)
);
CREATE TABLE SALARY
(
employeeID INT (10) PRIMARY KEY,
Salary INT (10)
);
Is it possible to use the same primary key in both tables?
Yes. You can have same column name as primary key in multiple tables.
Column names should be unique within a table. A table can have only one primary key, as it defines the Entity integrity.
If this question is about modelling a parent-child relationship, there are two types, and it is worth reading more about them.
Identifying relationship: the child identifies itself with the help of the parent. Here, Employee and Salary share the same primary key: the primary key of the parent table (EmployeeId) is also the primary key of the child table (Salary).
Non-identifying relationship: the child has its own identity. Here, Employee and Salary have different primary keys: the child table has its own primary key (say SalaryId) and holds the primary key of the parent (EmployeeId) as a foreign key.
If your question is whether you can use the same EMPLOYEEID column as the primary key on multiple tables, the answer is "YES, YOU CAN".
You can use the same column as the primary key on multiple tables, but you cannot have more than one primary key on a table.
Yes, you can. You would make salary.employeeid both the table's primary key and a foreign key to the employee table:
CREATE TABLE salary
(
employeeid INT (10) NOT NULL,
salary INT (10) NOT NULL,
CONSTRAINT pk_salary PRIMARY KEY (employeeid),
CONSTRAINT fk_salary_employeeid FOREIGN KEY (employeeid) REFERENCES employee (employeeid)
);
This creates a {1}:{0,1} relationship between the tables and ensures that you cannot store a salary without having stored the employee and that you cannot store more than one salary for one employee.
This is something we rarely do. (We would rather make the salary a column in the employee table.) The only advantage of a separate salary table I see is that you can grant rights on the employee table but revoke them on the salary table, so as to make the salary table invisible to some database users.
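Here is a small sketch of that {1}:{0,1} behaviour, run against an in-memory SQLite database through Python's sqlite3 module. The display width INT (10) is dropped because it is MySQL-specific; the sample rows and the helper names make_db and try_insert_salary are assumptions for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE employee (
            employeeid INTEGER PRIMARY KEY,
            name TEXT
        );
        -- employeeid is both the primary key and a foreign key to employee.
        CREATE TABLE salary (
            employeeid INTEGER NOT NULL,
            salary INTEGER NOT NULL,
            CONSTRAINT pk_salary PRIMARY KEY (employeeid),
            CONSTRAINT fk_salary_employeeid FOREIGN KEY (employeeid)
                REFERENCES employee (employeeid)
        );
        INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    """)
    return conn


def try_insert_salary(conn, employeeid, amount):
    """Return True if the salary row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO salary VALUES (?, ?)", (employeeid, amount))
        return True
    except sqlite3.IntegrityError:
        return False
```

The first salary for employee 1 is accepted, a second salary for employee 1 is rejected by the primary key, and a salary for an employee id that does not exist is rejected by the foreign key.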
You can do this; however, it is bad design.
I would suggest having EmployeeId as the PK on the Employee table and EmployeeId as a foreign key on the Salary table, with the Salary table having its own PK (most likely SalaryId).
I would also personally steer clear of the field name [Name], as "Name" is a keyword in some SQL dialects.
CREATE TABLE dbo.Employee
(
EmployeeId BIGINT IDENTITY(1,1)
,EmployeeName VARCHAR(20) NOT NULL
,CONSTRAINT PK_Emp PRIMARY KEY (EmployeeId)
);
GO
CREATE TABLE dbo.Salary
(
SalaryId BIGINT IDENTITY(1,1)
,EmployeeId BIGINT NOT NULL
,Salary INT NOT NULL
,CONSTRAINT PK_Sal PRIMARY KEY (SalaryId)
,CONSTRAINT FK_EmpSal FOREIGN KEY (EmployeeId)
REFERENCES Employee(EmployeeId)
);
GO
All of that said, with a little more thought about the db structure you would most likely end up with 3 tables. It is likely that many staff will have the same salary; let's say 5 employees are on 40,000 and 3 are on 50,000, etc.
You will end up storing the same Salary value multiple times.
A better way is to store that value once and have a third table that links an employee with a salary (in this case I have called it [Earnings]).
With this structure the salary of say 40,000 is stored 1 time in the db and you can link an employeeId to it multiple times.
CREATE TABLE dbo.Employee
(
Id BIGINT IDENTITY(1,1)
,EmployeeName VARCHAR(20) NOT NULL
,CONSTRAINT PK_Emp PRIMARY KEY (Id)
);
GO
CREATE TABLE dbo.Salary
(
Id BIGINT IDENTITY(1,1)
,Salary INT NOT NULL
,CONSTRAINT PK_Sal PRIMARY KEY (Id)
);
GO
CREATE TABLE dbo.Earnings
(
Id BIGINT IDENTITY(1,1)
,EmployeeId BIGINT NOT NULL
,SalaryId BIGINT NOT NULL
,CONSTRAINT PK_Ear PRIMARY KEY (Id)
,CONSTRAINT FK_EmpEar FOREIGN KEY (EmployeeId)
REFERENCES Employee(Id)
,CONSTRAINT FK_SalEar FOREIGN KEY (SalaryId)
REFERENCES Salary(Id)
);
GO
Technically, it is possible.
However, it is advisable to use Foreign key. This will:
-Avoid redundancy
-Help in maintaining db structure
-Improve readability
For this example, use:
CREATE TABLE Employee ( EmployeeID INT, Name CHAR (20), Primary Key (EmployeeID) );
CREATE TABLE SALARY ( SalaryId INT , EmployeeID INT , Salary INT (10), Primary Key (SalaryId), Foreign Key (EmployeeID) REFERENCES Employee (EmployeeID) );
An even better approach would be to add a named constraint instead of just declaring the key inline, like:
CREATE TABLE Employee(
EmployeeId INT NOT NULL,
Name CHAR(20)
);
ALTER TABLE Employee ADD CONSTRAINT PK_EmployeeId PRIMARY KEY (EmployeeId);
This can be done in different ways.
I have a "main" table named "Peoples" with "peopleId" as an identity primary key, and these "child" tables:
"PeopleDocs" with "peopleId" PK
"PeopleImages" with "peopleId" PK
"PeopleKeys" with "peopleId" PK.
So, when I create a record in "Peoples", I populate the child tables with empty records to be filled in later.
This design follows from a federal law in my country: some personal information has a life cycle and must be deleted when the life cycle ends or when the owner of the data demands it. So we split the sensitive personal data into child tables, so that deleting it does not lose all the business records, references and logs about that person.
Col1 as Primary key on both tables
SELECT
P.COL1,
P.COL2,
C.COL2,
C.COL3
FROM
TBL1 P, TBL2 C
WHERE
C.COL1 = P.COL1
Or using joins
SELECT
P.COL1,
P.COL2,
C.COL2,
C.COL3
FROM
TBL1 AS P
INNER JOIN
TBL2 AS C
ON
C.COL1 = P.COL1

SQL check constraint

I'm trying to create a constraint to check that a project can have only one employee whose role is project leader but at the same time can have other employees with different roles.
My table definition:
CREATE TABLE employee
( employee_id INT NOT NULL PRIMARY KEY
,employee_role VARCHAR(15) NOT NULL
, CHECK (employee_role in ('project_leader', 'administrator', 'member'))
)
CREATE TABLE project
( project_id INT NOT NULL PRIMARY KEY
, name VARCHAR(50)
, employee_id INT NOT NULL
, employee_role VARCHAR(15) NOT NULL
, CONSTRAINT employee_project_FK
FOREIGN KEY (employee_id, employee_role)
REFERENCES employee (employee_id, employee_role)
, CONSTRAINT only_one_project_leader
CHECK (employee_role = 'project_leader')
) ;
It's unclear to me how this can be expressed in a constraint and what I need to change. If anyone would inform me what I'm doing wrong, I'd appreciate it.
You are missing a table. Your data structure wants three tables:
Employee, which lists information about employees
Project, which lists information about projects
ProjectEmployee, which is an association table between the two
If you want a constraint that a project has only one leader, then you can simply add a column to Project called ProjectLeader. This will enforce the constraint, because there is only one slot per project for the leader. If you have to have a leader, then add in a check constraint to be sure this is not NULL.
A sign that something is wrong with the data model is that project_id is a primary key in the project table. This implies that for a given project_id, there is only one employee. I don't think that is what you intend.
EDIT:
The tables would look something like:
CREATE TABLE project
( project_id INT NOT NULL PRIMARY KEY,
name VARCHAR(50),
project_leader int references employee(employee_id)
) ;
CREATE TABLE projectemployee
( projectemployee_id INT NOT NULL PRIMARY KEY,
project_id int references project(project_id),
employee_id int references employee(employee_id),
employee_role VARCHAR(15) NOT NULL
) ;
There is only one slot for a leader in each project. You do not need a constraint to enforce the one-ness.
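A quick sketch of the "one slot" idea, using Python's sqlite3 module. The employee table, the sample rows and the helper names make_db and leader_of are made up for illustration, and project_leader is declared NOT NULL here, which is the "a project must have a leader" variant mentioned above.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE project (
            project_id INTEGER PRIMARY KEY,
            name VARCHAR(50),
            -- The single leader slot; NOT NULL makes a leader mandatory.
            project_leader INTEGER NOT NULL REFERENCES employee (employee_id)
        );
        CREATE TABLE projectemployee (
            projectemployee_id INTEGER PRIMARY KEY,
            project_id INTEGER REFERENCES project (project_id),
            employee_id INTEGER REFERENCES employee (employee_id),
            employee_role VARCHAR(15) NOT NULL
        );
        -- Hypothetical data: Ann leads project 1; Bob and Cy have other roles.
        INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO project VALUES (1, 'Apollo', 1);
        INSERT INTO projectemployee VALUES
            (1, 1, 2, 'member'),
            (2, 1, 3, 'administrator');
    """)
    return conn


def leader_of(conn, project_id):
    """Return the names of the project's leaders; the design allows exactly one."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM project p
        JOIN employee e ON e.employee_id = p.project_leader
        WHERE p.project_id = ?
        """,
        (project_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Changing the leader is an UPDATE of project.project_leader, which replaces the previous leader rather than adding a second one, so leader_of never returns more than one name per project.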

Enforce constraints between tables

How do you establish a constraint where, one column (not the primary key) has to have the same value as another table's column. I'm not quite sure how to phrase it so here's an example:
Ex:
I have four tables: Employee, Director, Division and Department.
The structure of the tables is as follows:
Employee
Id
Name
DirectorId (FK)
DepartmentID (FK)
Director
Id
Name
DepartmentID (FK)
Department
Id
Name
DivisionId (FK)
Division
Id
Name
Employees and directors both have departments, each department has a division but their divisions have to be the same. Is there a way to enforce this? (Hopefully without having to resort to triggers)
Create stored procedures that will have proper GRANTS and don't allow user to INSERT into table directly. Use stored procedures as interface to the database, and check required conditions in them before insertion.
First, your example tables are doing too much. There is a design principle that states that a single table should model an entity or a relationship between entities but not both. The relationships between departments, directors and employees should therefore go in tables of their own (I'm assuming that directors are not employees; I'm also omitting divisions for the moment).
Second, a table can have more than one key, known as candidate keys. Further, you can create a UNIQUE constraint by 'appending' a non-unique column to a key. For example, employees' names do not make for a good key, hence the reason for having an employee ID (I don't think the same can be said for departments i.e. department name in itself is a good enough key). If employee_ID is unique then it follows that (employee_name, employee_ID) will also be unique.
Third, a table can be referenced by any UNIQUE constraint, it doesn't have to be the table's 'primary' key (which partly explains why 'primary key' is a bit of a nonsense).
The great thing about the above is that one can model the required constraints using FOREIGN KEY and row-level CHECK constraints. SQL optimizers and programmers prefer declarative solutions to procedural code (triggers, stored procs, etc). This vanilla SQL DDL will port to most SQL products.
So, the department name can be combined with both the director key and the employee key respectively, and these compound keys can be referenced in a simple two-tier org chart table. Because both the employee's department and their director's department will appear in the same table, a simple row-level CHECK constraint can be used to test that they are the same, e.g.
Entity tables:
CREATE TABLE Departments
(
department_name VARCHAR(30) NOT NULL UNIQUE
);
CREATE TABLE Employees
(
employee_ID INTEGER NOT NULL UNIQUE,
employee_name VARCHAR(100) NOT NULL
);
CREATE TABLE Directors
(
director_ID INTEGER NOT NULL UNIQUE,
director_name VARCHAR(100) NOT NULL
);
Relationship tables:
CREATE TABLE EmployeeDepartments
(
employee_ID INTEGER NOT NULL UNIQUE
REFERENCES Employees (employee_ID),
employee_department_name VARCHAR(30) NOT NULL
REFERENCES Departments (department_name),
UNIQUE (employee_department_name, employee_ID)
);
CREATE TABLE DirectorDepartments
(
director_ID INTEGER NOT NULL UNIQUE
REFERENCES Directors (director_ID),
director_department_name VARCHAR(30) NOT NULL
REFERENCES Departments (department_name),
UNIQUE (director_department_name, director_ID)
);
CREATE TABLE OrgChart
(
employee_ID INTEGER NOT NULL UNIQUE,
employee_department_name VARCHAR(30) NOT NULL,
FOREIGN KEY (employee_department_name, employee_ID)
REFERENCES EmployeeDepartments
(employee_department_name, employee_ID),
director_ID INTEGER NOT NULL,
director_department_name VARCHAR(30) NOT NULL,
FOREIGN KEY (director_department_name, director_ID)
REFERENCES DirectorDepartments
(director_department_name, director_ID),
CHECK (employee_department_name = director_department_name)
);
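To try the department-level design, the sketch below loads the same tables into an in-memory SQLite database through Python's sqlite3 module. In OrgChart the column definitions are moved ahead of the table constraints, since SQLite does not accept them interleaved; the sample rows and the helper names make_db and try_orgchart are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Departments (department_name VARCHAR(30) NOT NULL UNIQUE);
CREATE TABLE Employees (
    employee_ID INTEGER NOT NULL UNIQUE,
    employee_name VARCHAR(100) NOT NULL
);
CREATE TABLE Directors (
    director_ID INTEGER NOT NULL UNIQUE,
    director_name VARCHAR(100) NOT NULL
);
CREATE TABLE EmployeeDepartments (
    employee_ID INTEGER NOT NULL UNIQUE REFERENCES Employees (employee_ID),
    employee_department_name VARCHAR(30) NOT NULL
        REFERENCES Departments (department_name),
    UNIQUE (employee_department_name, employee_ID)
);
CREATE TABLE DirectorDepartments (
    director_ID INTEGER NOT NULL UNIQUE REFERENCES Directors (director_ID),
    director_department_name VARCHAR(30) NOT NULL
        REFERENCES Departments (department_name),
    UNIQUE (director_department_name, director_ID)
);
CREATE TABLE OrgChart (
    employee_ID INTEGER NOT NULL UNIQUE,
    employee_department_name VARCHAR(30) NOT NULL,
    director_ID INTEGER NOT NULL,
    director_department_name VARCHAR(30) NOT NULL,
    FOREIGN KEY (employee_department_name, employee_ID)
        REFERENCES EmployeeDepartments (employee_department_name, employee_ID),
    FOREIGN KEY (director_department_name, director_ID)
        REFERENCES DirectorDepartments (director_department_name, director_ID),
    CHECK (employee_department_name = director_department_name)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    # Hypothetical data: Ann works in Sales; Dee directs Sales, Dan directs IT.
    conn.executescript("""
        INSERT INTO Departments VALUES ('Sales'), ('IT');
        INSERT INTO Employees VALUES (1, 'Ann');
        INSERT INTO Directors VALUES (10, 'Dee'), (11, 'Dan');
        INSERT INTO EmployeeDepartments VALUES (1, 'Sales');
        INSERT INTO DirectorDepartments VALUES (10, 'Sales'), (11, 'IT');
    """)
    return conn


def try_orgchart(conn, emp_id, emp_dept, dir_id, dir_dept):
    """Return True if OrgChart accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO OrgChart VALUES (?, ?, ?, ?)",
                (emp_id, emp_dept, dir_id, dir_dept),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Pairing Ann (Sales) with Dan (IT) is rejected by the CHECK constraint. Claiming that Dan directs Sales is rejected by the foreign key to DirectorDepartments. Only the pairing with Dee (Sales) is accepted.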
Now a slightly more interesting scenario would be when a director is assigned a division, rather than a specific department, and you had to test that the employee's department was in the same division as her director:
Entity tables:
CREATE TABLE Divisions
(
division_name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE Departments
(
department_name VARCHAR(30) NOT NULL UNIQUE,
division_name VARCHAR(20) NOT NULL
REFERENCES Divisions (division_name),
UNIQUE (division_name, department_name)
);
CREATE TABLE Employees
(
employee_ID INTEGER NOT NULL UNIQUE,
employee_name VARCHAR(100) NOT NULL
);
CREATE TABLE Directors
(
director_ID INTEGER NOT NULL UNIQUE,
director_name VARCHAR(100) NOT NULL
);
Relationship tables:
CREATE TABLE EmployeeDepartments
(
employee_ID INTEGER NOT NULL UNIQUE
REFERENCES Employees (employee_ID),
employee_department_name VARCHAR(30) NOT NULL
REFERENCES Departments (department_name),
UNIQUE (employee_department_name, employee_ID)
);
CREATE TABLE DirectorDivisions
(
director_ID INTEGER NOT NULL UNIQUE
REFERENCES directors (director_ID),
director_division_name VARCHAR(20) NOT NULL
REFERENCES divisions (division_name),
UNIQUE (director_division_name, director_ID)
);
CREATE TABLE OrgChart
(
employee_ID INTEGER NOT NULL UNIQUE,
employee_department_name VARCHAR(30) NOT NULL,
FOREIGN KEY (employee_department_name, employee_ID)
REFERENCES EmployeeDepartments
(employee_department_name, employee_ID),
employee_division_name VARCHAR(20) NOT NULL
REFERENCES divisions (division_name),
FOREIGN KEY (employee_division_name, employee_department_name)
REFERENCES Departments (division_name, department_name),
director_ID INTEGER NOT NULL,
director_division_name VARCHAR(20) NOT NULL,
FOREIGN KEY (director_division_name, director_ID)
REFERENCES DirectorDivisions
(director_division_name, director_ID),
CHECK (employee_division_name = director_division_name)
);
There is no limitation on creating foreign keys--there's nothing to stop you from defining a foreign key constraint on these tables:
EMPLOYEE
DIRECTOR
...associating them to the DEPARTMENT table. Though frankly, I don't see the need for a DIRECTOR table--that should be either a boolean indicator in the EMPLOYEE table, or possibly an EMPLOYEE_TYPE_CODE with its own foreign key constraint to distinguish between employees and directors.
Multiple Foreign Keys
The presence of a foreign key also doesn't stop you from putting a second (or third, etc.) constraint on the same column. Consider the scenario of TABLE_C.column having foreign key constraints to both TABLE_A.col and TABLE_B.col -- this is perfectly acceptable in the database, but it means that only values that exist in both TABLE_A.col and TABLE_B.col can exist in TABLE_C.column. For example:
TABLE_A
col
----
a
b
c
TABLE_B
col
----
c
Based on this example data, TABLE_C.column could only ever allow "c" as value to exist in the column if someone added two foreign key constraints to the TABLE_C.column, referencing TABLE_A.col and TABLE_B.col.
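The example data can be checked with a few lines of Python and the built-in sqlite3 module. The lower-case table names, the column name col in table_c (COLUMN is a keyword) and the helper name accepted_values are choices made for this sketch only.

```python
import sqlite3


def accepted_values(candidates):
    """Return the candidates that table_c accepts under both foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE table_a (col TEXT PRIMARY KEY);
        CREATE TABLE table_b (col TEXT PRIMARY KEY);
        -- Two independent foreign keys on the same column.
        CREATE TABLE table_c (
            col TEXT,
            FOREIGN KEY (col) REFERENCES table_a (col),
            FOREIGN KEY (col) REFERENCES table_b (col)
        );
        INSERT INTO table_a VALUES ('a'), ('b'), ('c');
        INSERT INTO table_b VALUES ('c');
    """)

    accepted = []
    for value in candidates:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO table_c VALUES (?)", (value,))
            accepted.append(value)
        except sqlite3.IntegrityError:
            pass  # rejected by at least one of the two foreign keys
    return accepted
```

Calling accepted_values(["a", "b", "c"]) keeps only the value present in both parent tables, so the pair of constraints behaves like an intersection.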