Normalizing and composite table structure

Normalizing and composite table structure - sql

I have difficulty to optimizing the database design in a situation
There is a table with 3000 employees and another one table with 300 departments.
Problem is there is a possibility to 3000 employees in 300 departments. Basically each employee will work in all department 300x3000 records (worse case). May be only 10 employees will work in 300 department (best case) and not a problem or have a composite table.
So what is the best way to structure
Please advice me
Thanks

In a relational database, the appropriate pattern for this is to store the fact that an employee works in a department is a relation:
create table employees(employee_id, ..., primary key(employee_id));
create table departments(department_id, ..., primary key(department_id));
create table works_in(
employee_id,
department_id,
primary key(employee_id, department_id),
foreign key(employee_id) references employees(employee_id),
foreign key(department_id) references departments(department_id)
);
/* Add column datatypes as needed */
This pattern is applicable in almost all cases, even for large relationship tables. If you are worried about space consumption, many databases can store this kind of table very efficiently, e.g., storing them physically as an index-organized table, e.g., storing them grouped by employee_id, etc.
However, even 300 x 3000 entries does not sound as if you would need such kind of physical database design optimization.

Related

How to implement ER Relationships :One to one, One To Many, Many to Many in Oracle?

Perhaps, this could seem like the basics of database design.But,certainly i find these concepts a lil tricky in the sense that how they relate to each other.
Theory as per my understanding.
One to One :
When each particular entity has one record.
One to Many :
When each particular entity has many record
Many to Many :
Multiple entities have multiple records.
As per the above if i could relate an example as
ONE TO ONE
Each employee having a unique passport number.
Table Employee
Empid(pk)
empname
passpordid(fk to passport)
Table passport
passportid(pk)
passportno
ONE TO MANY
An organisation having multiple employees
TABLE ORGANISATION
ORGID (PK)
ORGNAME
EMPID (FK TO EMPLOYEE)
TABLE EMPLOYEE
EMPID (PK)
EMPNAME
SALARY
This is the part that i want to know more that is many to many. I mean if we see one to many here. As a whole it could be said as many to many as many organisations having many employees but does that mean the whole relationship is many to many not one to many or one to many is a subset of many to many in the above example.
I wanna know the difference mainly between one to many and many to many both theoritically and by implementation.

An example of a many-to-many relationship would be EMPLOYEES and SKILLS, where SKILLS are things like "SQL", "Javascript", "Management" etc. An employee may have many skills (e.g. may know SQL and Javascript), and a particular skill by be possessed by many employees (e.g. Jack and Jill both know SQL).
In a database like Oracle, you need a third table to express the many-to-many relationship between EMPLOYEES and SKILLS:
create table EMPLOYEE_SKILLS
( empid references EMPLOYEES
, skillid references SKILLS
, constraint EMPLOYEE_SKILLS_PK primary key (empid, skillid)
);
Note that this third table has a foreign key to both of the other tables.
The table can also hold further information about the relationship - for example:
create table EMPLOYEE_SKILLS
( empid references EMPLOYEES
, skillid references SKILLS
, rating number
, date_certified date
, constraint EMPLOYEE_SKILLS_PK primary key (empid, skillid)
);

This is more a theory question that a programming one, but why not.
You could imagine a "many to many" as 1 table being a list of employees, and another table being a list of products sold.
Each employee is likely to "handle" more than 1 product, and each product could very well be handled by multiple employees, and it's quite possible that products A and B can be handled by, at the same time, employees C and D.

You seem to have the basic idea of one-to-one and one-to many down, in theory.
One-to-one: each item A has a separate and unique item B
In a database, you could include the new data in the same table. You could put it in a separate table, and enforce the one-to-one constraint by making the id into the remote table "unique" (indexing constraint), forcing the id for item B to not be repeated in the table. In your example, this would be a constraint added to the passportid.
One to many: each item A has some number of item Bs
In a database, you put the key in the table on the "many" side. So many Bs would have the same foreign key into table A. In your example, put an orgid as a foireign key into the employee table. Many employees can then point to a single organization, and an employee can belong to one organization.
Many to many: some number of As can be linked to any number of Bs and vice-versa
In a database, this is usually implemented as a link table, having just the IDs from both A and B. Following you example, consider that an employee can be part of multiple groups (or projects), multitasking across those groups. You could have a group table:
GROUPID (PK)
GROUPNAME
And the link table would then look like
LINKID (PK)
EMPID (FK TO EMPLOYEE)
GROUPID (FK TO GROUP)
This lets any employee be part of any number of groups, and groups to have any number of employees, AKA many-to-many.
A side comment:
Technically, you don't HAVE to have a LINKID in the link table. But it makes it easier to manage the data in that table, and is generally considered "best practice".

Primary Key in three tables

Just curious if I can have the same primary key in 3 different tables? I am going to create an Employee, FullTime and PartTime tables. I would like to make an EmployeeID the primary key for all 3. Any thoughts?

You can have the primary key EmployeeId in a table called Employees. This would have common information, such as date of hire and so on.
Then, each of your subtables can have an EmployeeId that is both a primary key in the table and a foreign key reference to Employees.EmployeeId. This is one way to implement a one-of relationship using relational tables.
Unfortunately, unless you use triggers, this mechanism doesn't prevent one employee from being in the two other tables, but that is not part of your question.

It sounds like your design is wrong.
The entity is the employee
An attribute of an employee is their [current^] employment status.
Therefore, in its simplest form, you need a single employee table, with a column that indicates their status.
To improve this further, the employee status column should have a foreign key relationship with another table that stores the possible employee statuses.
^ current status is a 1:1 relationship. If you wanted the history of changes, this is a 1:M and needs modelling differently.

Include a foreign key column in the table?

I have the following situation. My table is:
Table: CompanyEmployees
EmployeeID
Date of Birth
Date Joined
I also want to store the sales information for each employee. I have this:
Table: DealsCompleted
ID
EmployeeID
Deal Name
Deal Amount
My question is this- should there be a column in CompanyEmployees called "DealsCompletedID" which directly refers to the ID column in DealsCompleted, or is it acceptabe to just create a foreign key between the two Employee ID columns? Does this disadvantage the design or potentially distort the normalization?
I am unclear what the rule is as to whether I should include an extra column in CompanyEmployees or not.
EDIT Please assume there will only be one row in the deal table, per employee.

A FOREIGN KEY should point from one table to its referenced row in a parent table, the two tables should generally not reference each other (with foreign keys defined in both).
The FOREIGN KEY is most appropriately defined in the DealsCompleted table, which points back to CompanyEmployees.EmployeeID. Think about it this way - The CompanyEmployees table stores information about employees. Deals they completed do not really count as information about employees. However, the employee who completed a deal is a part of the information about a deal, so the key belongs there.
Having DealsCompleted.EmployeeID will allow for a proper one to many relationship between employees and deals. That is, one employee can have as many related rows in DealsCompleted as needed. Including a DealsCompleted column in the CompanyEmployees table on the other hand, would require you to either duplicate rows about employees, which breaks normalization, or include multiple DealCompletedID values in one column which is also incorrect.
Update after edit above Even if you plan for only a one-to-one relationship (one deal per employee), it is still more appropriate to reference the EmployeeID in DealsCompleted rather than the other way around (or both ways). ...And it allows you to expand to one-to-many, when the need arises.

Assuming that the relationship will always be one-to-one, as you state, then the answer depends on what is the primary entity within the Domain Model. If this database is at its core a database about Deals, and employee data is ancillary, then I would add an EmployeeId FK column in the Deal table. If otoh, this is a database about Employees, and Deals are ancillary, then eliminate the EmployeeId column in the Deal table, and add a DealId FK column to the Employeee table.

Dealing with circular reference when entering data in SQL

What kind of sql tricks you use to enter data into two tables with a circular reference in between.
Employees
EmployeeID <PK>
DepartmentID <FK> NOT NULL
Departments
DepartmentID <PK>
EmployeeID <FK> NOT NULL
The employee belongs to a department, a department has to have a manager (department head).
Do I have to disable constraints for the insert to happen?

I assume your Departments.EmployeeID is a department head. What I'd do is make that column nullable; then you can create the department first, then the employee.

Q: Do I have to disable constraints for the insert to happen?
A: In Oracle, no, not if the foreign key constraints are DEFERRABLE (see example below)
For Oracle:
SET CONSTRAINTS ALL DEFERRED;
INSERT INTO Departments values ('foo','dummy');
INSERT INTO Employees values ('bar','foo');
UPDATE Departments SET EmployeeID = 'bar' WHERE DepartmentID = 'foo';
COMMIT;
Let's unpack that:
(autocommit must be off)
defer enforcement of the foreign key constraint
insert a row to Department table with a "dummy" value for the FK column
insert a row to Employee table with FK reference to Department
replace "dummy" value in Department FK with real reference
re-enable enforcement of the constraints
NOTES: disabling a foreign key constraint takes effect for ALL sessions, DEFERRING a constraint is at a transaction level (as in the example), or at the session level (ALTER SESSION SET CONSTRAINTS=DEFERRED;)
Oracle has allowed for foreign key constraints to be defined as DEFERRABLE for at least a decade. I define all foreign key constraints (as a matter of course) to be DEFERRABLE INITIALLY IMMEDIATE. That keeps the default behavior as everyone expects, but allows for manipulation without requiring foreign keys to be disabled.
see AskTom: http://www.oracle.com/technology/oramag/oracle/03-nov/o63asktom.html
see AskTom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10954765239682
see also: http://www.idevelopment.info/data/Oracle/DBA_tips/Database_Administration/DBA_12.shtml
[EDIT]
A: In Microsoft SQL Server, you can't defer foreign key constraints like you can in Oracle. Disabling and re-enabling the foreign key constraint is an approach, but I shudder at the prospect of 1) performance impact (the foreign key constraint being checked for the ENTIRE table when the constraint is re-enabled), 2) handling the exception if (when?) the re-enable of the constraint fails. Note that disabling the constraint will affect all sessions, so while the constraint is disabled, other sessions could potentially insert and update rows which will cause the reenable of the constraint to fail.
With SQL Server, a better approach is to remove the NOT NULL constraint, and allow for a NULL as temporary placeholder while rows are being inserted/updated.
For SQL Server:
-- (with NOT NULL constraint removed from Departments.EmployeeID)
insert into Departments values ('foo',NULL)
go
insert into Employees values ('bar','foo')
go
update Departments set EmployeeID = 'bar' where DepartmentID = 'foo'
go
[/EDIT]

This problem could be solved with deferable constraints. Such constraints are checked when the whole transaction is commited, thus allowing you to insert both employee and department in the same transaction, referring to each other. (Assuming the data model makes sense)

Refactor the schema by removing the circular reference.
Delete an ID column from either of the table schema.
Departments.EmployeeID doesn't seem to belong there in my opinion.

I can't think of a non hackish way to do this. I think you will need to remove the constraint or do some type of silly dummy values that get updated after all the inserts.
I'd recommend refactoring the DB schema. I can't think of any reasons why you would want it to work this way.
Maybe something like, Employee, EmployeeDepartment (EmployeeId, DepartmentId) and Department would be a better way to accomplish the same goal.

You could create a row in the Department table for 'Unassigned'
To create a new department with a new Employee you then would
Create the Employee (EmployeeA) in the 'Unassigned' Department
Create the new department (DepartmentA) with the employee EmployeeA
Update EmployeeA to be in DepartmentA
This wouldn't invalidate your current schema, and you could set up a task to be run regularly to check there are no members of the Unassigned department.
You would also need to create a default employee to be the Employee of Unassigned
EDIT:
The solution proposed by chaos is much simpler though

There are a few good designs I've used. All involve removing the "manager" EmployeeID from the Department table and removing the DepartmentID from the Employee table. I've seen a couple answers which mention it, but I'll clarify how we used it:
I typically end up with an EmployeeDepartment relationship link table - many to many, usually with flags like IsManager, IsPrimaryManager, IsAdmin, IsBackupManager etc., which clarify the relationship Some may be constrained so that there is only one Primary Manager allowed per department (although a person can be a PrimaryManager of multiple departments). If you don't like the single table, then you can have multiple tables: EmployeeDepartment, ManagerDepartment, etc. but then you could have situations where a person is a manager but not an employee, etc.
We also typically allowed people to be members of multiple departments.
For simplified access, you can provide views which perform the join appropriately.

Yes, in this instance you will have to disable a foreign key.

You need to get rid of one or the other reference permanently . This is not a viable design structure. Which has to be entered first? Department or Employee? Unless your departments are all one employee big, the structure doesn't make sense anyway as each employee would have to have a distinct departmentid.

Constraints instead Triggers (Specific question)

I read here some reasons to use constraints instead of triggers. But I have a doubt. How can be assure (using only constraints), the coherence between SUPERCLASS tables and SUBCLASSES tables?
Whit a trigger is only a matter of check when INS.. UPD...
Is there a way to define that kinda relation by using only constraints (I'm newbie at this), thanks!

You can use constraints to ensure that every ContractEmployees row has a corresponding Employees row, and likewise for SalariedExployees. I don't know of a way to use constraints to enforce the opposite: making sure that for every Employees row, there is a row either in ContractEmployees or SalariedEmployees.
Backing up a bit... there are three main ways to model OO inheritance in a relational DB. The terminology is from Martin Fowler's Patterns of Enterprise Application Architecture:
Single table inheritance: everything is just in one big table, with lots of optional columns that apply only to certain subclasses. Easy to do but not very elegant.
Concrete table inheritance: one table for each concrete type. So if all employees are either salaried or contract, you'd have two tables: SalariedEmployees and ContractEmployees. I don't like this approach either, since it makes it harder to query all employees regardless of type.
Class table inheritance: one table for the base class and one per subclass. So three tables: Employees, SalariedEmployeees, and ContractEmployees.
Here is an example of class table inheritance with constraints (code for MS SQL Server):
CREATE TABLE Employees
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
FirstName VARCHAR(100) NOT NULL DEFAULT '',
LastName VARCHAR(100) NOT NULL DEFAULT ''
);
CREATE TABLE SalariedEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
Salary DECIMAL(12,2) NOT NULL
);
CREATE TABLE ContractEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
HourlyRate DECIMAL(12,2) NOT NULL
);
The "REFERENCES Employees(ID)" part on the two subclass tables defines a foreign key constraint. This ensures that there must be a row in Employees for every row in SalariedEmployees or ContractEmployees.
The ID column is what links everything together. In the subclass tables, the ID is both a primary key for that table, and a foreign key pointing at the base class table.

Here's how I'd model a contract vs salary employee setup:
EMPLOYEE_TYPE_CODE table
EMPLOYEE_TYPE_CODE, pk
DESCRIPTION
Examples:
EMPLOYEE_TYPE_CODE DESCRIPTION
-----------------------------------
CONTRACT Contractor
SALARY Salaried
WAGE_SLAVE I can't be fired - slaves are sold
EMPLOYEES table
EMPLOYEE_ID, pk
EMPLOYEE_TYPE_CODE, foreign key to the EMPLOYEE_TYPE_CODE table
firstname, lastname, etc..
If you're wanting to store a hierarchical relationship, say between employee and manager (who by definition is also an employee):
EMPLOYEES table
EMPLOYEE_ID, pk
EMPLOYEE_TYPE_CODE, foreign key to the EMPLOYEE_TYPE_CODE table
MANAGER_ID
The MANAGER_ID would be filled with the employee_id of the employee who is their manager. This setup assumes that an employee could only have one manager. If you worked in a place like what you see in the movie "Office Space", you need a different setup to allow for an employee record to associate with 2+ managers:
MANAGE_EMPLOYEES_XREF table
MANAGER_EMPLOYEE_ID, pk, fk to EMPLOYEES table
EMPLOYEE_ID, pk, fk to EMPLOYEES table

Databases are relational and constraints enforce relational dependencies pretty well, been doing so for some 30 years now. What is this super and sub class you talk about?
Update
Introducing the OO inheritance relationships in databases is actually quite problematic. To take your example, contract-employee and fulltime-employee. You can model this as 1) a single table with a discriminator field, as 2) two unrelated tables, or as 3) three tables (one with the common parts, one with contract specific info, one with fulltime specific info).
However if you approach the very same problem from a traditional normal form point of view, you may end up with a structure similar to 1) or 3), but never as 2). More often than not, you'll end up with something that looks like nothing you'd recommend from your OO design board.
The problem is that when this collision of requirements happens, today almost invariably the OO design will prevail. Often times, the relational model will not even be be on the table. Why I see this as a 'problem' is that most times databases far outlive their original application. All too often I see some design that can be traced back to a OO domain driven design session from an application long forgotten, and one can see in the database schema the places where, over time, the OO design was 'hammered' into place to fit what the relational engine underneath could support, scale and deliver. The tell sign for me is tables organized on a clustered index around a identity ID when no one ever is interrogating those tables for a specific ID.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas