Structure for 3 or more tables with relationship - sql

I need help about database structure,
so I need to join 3 tables but I'm confuse what structure is the right way.
Region_table
id region_name
1 Region1
2 Region2
Province
id province_name
1 Province1
City
id city_name
1 city1
I'm thinking to create new table that connects all.
like id, region_id, province_id, city_id
region_province_city
id region_id province_id city_id
1 1 1 1
2 1 1 2
or I can do like
Province
id province_name region_id
1 Province1 1
City
id city_name province_id
1 city1 1

I would generally suggest the hierarchical structure. That is, adding the parent id to each of the two tables.
Why? This guarantees that the region on a given province is always the same. That can be hard to guarantee in the 3-way junction table. After all, you don't have an independent relationship among the three entities. You have a hierarchical relationship and you would like to enforce that data integrity.
An example of an independent relationship would be:
Customers
Items
Payment Method
Presumably a customer could purchase any item using any payment method -- and even use different payment methods on the same item.
Would I always recommend this structure? Well, no. If the data is not being modified, then you are pretty safe with the three-way table. It works very well for looking up information about a "city". It can be verified when created. And it might provide some simplification (particularly if the hierarchies are deeper).
That said, it works best when your joins are to the lowest level. It is a little tricky to get the region for a province. A typical solution would be to augment the data with NULL values for city to get that information.

Related

Database table design approach

I have a question about proper table design. Imagine the situation:
You have two entities in the table (Company and Actor).
Actor could have several types (i.e. Shop, PoS, Individual etc.)
The relation between entities is many-to-many because one Actor can be assigned to multiple Companies and vice-versa (one Company can have multiple Actors)
To outline this relation, I can create a linking table called 'C2A' for instance. So, we'll end up with the structure like this:
| Company1 | Actor1(Shop)
| Company1 | Actor2(PoS)
| Company2 | Actor1(PoS)
etc.
This approach works just fine until requirement changes. Now we need to have a possibility to assign Actor to an Actor to build another sort of hierarchy, i.e. one Actor (Shop) might have multiple other Actors (PoS's) assigned to it and all this belong to a certain company. So, we'll end up with the structure like this:
| Company1 | Actor1(Shop) | NULL
| NULL | Actor1(Shop) | Actor1(PoS)
| NULL | Actor1(Shop) | Actor2(PoS)
etc.
I need to be able to express relations (between Company (C) and Actor(A)) like this: C - A - A; A - C - A; A - A - C
Having two similar fields in one table is not the best idea. How are the relationships like this normally designed?
Create two separate tables for entities Company and Actors.
Company( Id (PK), Name)
Actor(Id (PK), Name)
Now, if you are sure about many-many. You can go ahead with creating a separate table for this.
CompanyActorMap( MapId (PK), CompanyId (FK), ActorId (FK))
For Actor Heirarchy, use a separate table as it has nothing to do with how the hierarchy is related to the company, its about how the Actors are related to each other. Best approach for multiple level infinite hierarchy is to create a new table with two columns for Actor Id as Foreign Key
ActorHierarchy( HierarchyId (PK), ChildActorId (FK), ParentActorId(FK))
For an Actor that has no parent in hierarchy, there can be an entry in CompanyActorMap table denoting the head of hierarchy and the remaining chain can be derived from the ActorHierarchy table by checking the PArentActorId column and its childActorId.
This is obviously a rough draft. Hope this helps.

Purpose of Self-Joins

I am learning to program with SQL and have just been introduced to self-joins. I understand how these work, but I don't understand what their purpose is besides a very specific usage, joining an employee table to itself to neatly display employees and their respective managers.
This usage can be demonstrated with the following table:
EmployeeID | Name | ManagerID
------ | ------------- | ------
1 | Sam | 10
2 | Harry | 4
4 | Manager | NULL
10 | AnotherManager| NULL
And the following query:
select worker.employeeID, worker.name, worker.managerID, manager.name
from employee worker join employee manager
on (worker.managerID = manager.employeeID);
Which would return:
Sam AnotherManager
Harry Manager
Besides this, are there any other circumstances where a self-join would be useful? I can't figure out a scenario where a self-join would need to be performed.
Your example is a good one. Self-joins are useful whenever a table contains a foreign key into itself. An employee has a manager, and the manager is... another employee. So a self-join makes sense there.
Many hierarchies and relationship trees are a good fit for this. For example, you might have a parent organization divided into regions, groups, teams, and offices. Each of those could be stored as an "organization", with a parent id as a column.
Or maybe your business has a referral program, and you want to record which customer referred someone. They are both 'customers', in the same table, but one has a FK link to another one.
Hierarchies that are not a good fit for this would be ones where an entity might have more than one "parent" link. For example, suppose you had facebook-style data recording every user and friendship links to other users. That could be made to fit in this model, but then you'd need a new "user" row for every friend that a user had, and every row would be a duplicate except for the "relationshipUserID" column or whatever you called it.
In many-to-many relationships, you would probably have a separate "relationship" table, with a "from" and "to" column, and perhaps a column indicating the relationship type.
I found self joins most useful in situations like this:
Get all employees that work for the same manager as Sam. (This does not have to be hierarchical, this can also be: Get all employees that work at the same location as Sam)
select e2.employeeID, e2.name
from employee e1 join employee e2
on (e1.managerID = e2.managerID)
where e1.name = 'Sam'
Also useful to find duplicates in a table, but this can be very inefficient.
There are several great examples of using self-joins here. The one I often use relates to "timetables". I work with timetables in education, but they are relevant in other cases too.
I use self-joins to work out whether two items clash with one another, e.g. a student is scheduled for two lessons which happen at the same time, or a room is double booked. For example:
CREATE TABLE StudentEvents(
StudentId int,
EventId int,
EventDate date,
StartTime time,
EndTime time
)
SELECT
se1.StudentId,
se1.EventDate,
se1.EventId Event1Id,
se1.StartTime as Event1Start,
se1.EndTime as Event1End,
se2.StartTime as Event2Start,
se2.EndTime as Event2End,
FROM
StudentEvents se1
JOIN StudentEvents se2 ON
se1.StudentId = se2.StudentId
AND se1.EventDate = se2.EventDate
AND se1.EventId > se2.EventId
--The above line prevents (a) an event being seen as clashing with itself
--and (b) the same pair of events being returned twice, once as (A,B) and once as (B,A)
WHERE
se1.StartTime < se2.EndTime AND
se1.EndTime > se2.StartTime
Similar logic can be used to find other things in "timetable data", such as a pair of trains it is possible to take from A via B to C.
Self joins are useful whenever you want to compare records of the same table against each other. Examples are: Find duplicate addresses, find customers where the delivery address is not the same as the invoice address, compare a total in a daily report (saved as record) with the total of the previous day etc.

What is the best way to design this particular database/SQL issue?

Here's a tricky normalization/SQL/Database Design question that has been puzzling us. I hope I can state it correctly.
You have a set of activities. They are things that need to be done -- a glorified TODO list. Any given activity can be assigned to an employee.
Every activity also has an enitity for whom the activity is to be performed. Those activities are either a Contact (person) or a Customer (business). Each activity will then have either a Contact or a Customer for whom the activity will be done. For instance, the activity might be "Send a thank you card to Spacely Sprockets (a customer)" or "Send marketing literature to Tony Almeida (a Contact)".
From that structure, we then need to be able to query to find all the activities a given employee has to do, listing them in a single relation that would be something like this in it simplest form:
-----------------------------------------------------
| Activity | Description | Recipient of Activity |
-----------------------------------------------------
The idea here is to avoid having two columns for Contact and Customer with one of them null.
I hope I've described this correctly, as this isn't as obvious as it might seem at first glance.
So the question is: What is the "right" design for the database and how would you query it to get the information asked for?
It sounds like a basic many-to-many relationship and I'd model it as such.
The "right" design for this database is to have one column for each, which you say you are trying to avoid. This allows for a proper foreign key relationship to be defined between those two columns and their respective tables. Using the same column for a key that refers to two different tables will make queries ugly and you can't enforce referential integrity.
Activities table should have foreign keys ContactID, CustomerID
To show activities for employee:
SELECT ActivityName, ActivityDescription, CASE WHEN a.ContactID IS NOT NULL THEN cn.ContactName ELSE cu.CustomerName END AS Recipient
FROM activity a
LEFT JOIN contacts cn ON a.ContactID=cn.ContactID
LEFT JOIN customers cu ON a.CustomerID=cu.CustomerID
It's not clear to me why you are defining Customers and Contacts as separate entities, when they seem to be versions of the same entity. It seems to me that Customers are Contacts with additional information. If at all possible, I'd create one table of Contacts and then mark the ones that are Customers either with a field in that table, or by adding their ids to a table Customers that has the extended singleton customer information in it.
If you can't do that (because this is being built on top of an existing system the design of which is fixed) then you have several choices. None of the choices are good because they can't really work around the original flaw, which is storing Customers and Contacts separately.
Use two columns, one NULL, to allow referential integrity to work.
Build an intermediate table ActivityContacts with its own PK and two columns, one NULL, to point to the Customer or Contact. This allows you to build a "clean" Activity system, but pushes the ugliness into that intermediate table. (It does provide a possible benefit, which is that it allows you to limit the target of activities to people added to the intermediate table, if that's an advantage to you).
Carry the original design flaw into the Activities system and (I'm biting my tongue here) have parallel ContactActivity and CustomerActivity tables. To find all of an employee's assigned tasks, UNION those two tables together into one in a VIEW. This allows you to maintain referential integrity, does not require NULL columns, and provides you with a source from which to get your reports.
Here is my stab at it:
Basically you need activities to be associated to 1 (contact or Customer) and 1 employee that is to be a responsible person for the activity. Note you can handle referential constraint in a model like this.
Also note I added a businessEntity table that connects all People and places. (sometimes useful but not necessary). The reason for putting the businessEntity table is you could simple reference the ResponsiblePerson and the Recipient on the activity to the businessEntity and now you can have activities preformed and received by any and all people or places.
If I've read the case right, Recipients is a generalization of Customers and Contacts.
The gen-spec design pattern is well understood.
Data modeling question
You would have something like follows:
Activity | Description | Recipient Type
Where Recipient Type is one of Contact or Customer
You would then execute a SQL select statement as follows:
Select * from table where Recipient_Type = 'Contact';
I realize there needs to be more information.
We will need an additional table that is representative of Recipients(Contacts and Customers):
This table should look as follows:
ID | Name| Recipient Type
Recipient Type will be a key reference to the table initially mentioned earlier in this post. Of course there will need to be work done to handle cascades across these tables, mostly on updates and deletes. So to quickly recap:
Recipients.Recipient_Type is a FK to Table.Recipient_Type

			
				
[ActivityRecipientRecipientType]
ActivityId
RecipientId
RecipientTypeCode
||| ||| |||_____________________________
| | |
| -------------------- |
| | |
[Activity] [Recipient] [RecipientType]
ActivityId RecipientId RecipientTypeCode
ActivityDescription RecipientName RecipeintTypeName
select
[Activity].ActivityDescription
, [Recipient].RecipientName
from
[Activity]
join [ActivityRecipientRecipientType] on [Activity].ActivityId = [ActivityRecipientRecipientType].ActivityId
join [Recipient] on [ActivityRecipientRecipientType].RecipientId = [Recipient].RecipientId
join [RecipientType] on [ActivityRecipientRecipientType].RecipientTypeCode = [RecipientType].RecipientTypeCode
where [RecipientType].RecipientTypeName = 'Contact'
Actions
Activity_ID | Description | Recipient ID
-------------------------------------
11 | Don't ask questions | 0
12 | Be cool | 1
Activities
ID | Description
----------------
11 | Shoot
12 | Ask out
People
ID | Type | email | phone | GPS |....
-------------------------------------
0 | Troll | troll#hotmail.com | 232323 | null | ...
1 | hottie | hottie#hotmail.com | 2341241 | null | ...
select at.description,a.description, p.* from Activities at, Actions a, People p
where a."Recipient ID" = p.ID
and at.ID=a.activity_id
result:
Shoot | Don't ask questions | 0 | Troll | troll#hotmail.com | 232323 | null | ...
Ask out | Be cool | 1 | hottie | hottie#hotmail.com | 2341241 |null | ...
Model another Entity: ActivityRecipient, which will be inherited by ActivityRecipientContact and ActivityRecipientCustomer, which will hold the proper Customer/Contact ID.
The corresponding tables will be:
Table: Activities(...., RecipientID)
Table: ActivityRecipients(RecipientID, RecipientType)
Table: ActivityRecipientContacts(RecipientID, ContactId, ...,ExtraContactInfo...)
Table: ActivityRecipientCustomers(RecipentID, CustomerId, ...,ExtraCustomerInfo...)
This way you can also have different other columns for each recipient type
I would revise that definition of Customer and Contact. A customer can be either an person or a business, right? In Brazil, there's the terms 'pessoa jurídica' and 'pessoa física' - which in a direct (and mindless) translation become 'legal person' (business) and 'physical person' (individual). A better translation was suggested by Google: 'legal entity' and 'individual'.
So, we get an person table and have an 'LegalEntity' and 'Individual' tables (if there's enough attributes to justify it - here there's plenty). And the receiver become an FK to Person table.
And where has gone the contacts? They become an table that links to person. Since a contact is a person that is contact of another person (example: my wife is my registered contact to some companies I'm customer). People can have contacts.
Note: I used the word 'Person' but you can call it 'Customer' to name that base table.

Dynamic Tables?

I have a database that has different grades per course (i.e. three homeworks for Course 1, two homeworks for Course 2, ... ,Course N with M homeworks). How should I handle this as far as database design goes?
CourseID HW1 HW2 HW3
1 100 99 100
2 100 75 NULL
EDIT
I guess I need to rephrase my question. As of right now, I have two tables, Course and Homework. Homework points to Course through a foreign key. My question is how do I know how many homeworks will be available for each class?
No, this is not a good design. It's an antipattern that I called Metadata Tribbles. You have to keep adding new columns for each homework, and they propagate out of control.
It's an example of repeating groups, which violates the First Normal Form of relational database design.
Instead, you should create one table for Courses, and another table for Homeworks. Each row in Homeworks references a parent row in Courses.
My question is how do I know how many homeworks will be available for each class?
You'd add rows for each homework, then you can count them as follows:
SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
FROM Homeworks
GROUP BY CourseId
Of course this only counts the homeworks after you have populated the table with rows. So you (or the course designers) need to do that.
Decompose the table into three different tables. One holds the courses, the second holds the homeworks, and the third connects them and stores the result.
Course:
CourseID CourseName
1 Foo
Homework:
HomeworkID HomeworkName HomeworkDescription
HW1 Bar ...
Result:
CourseID HomeworkID Result
1 HW1 100

Should I split this table into two?

I am trying to wrap my head around database normalization. This is my first time trying to create a working database so please forgive me for my ignorance. I am trying to create an automated grad Check system for a class project. The following table keeps track of all options for a major for a set number of catalog years. The table is as follows
PID Title Dept Courses Must_have
Some options give the user a choice of a set number of classes out of the total listed (hence the Must_have attribute). A completed row would look like this:
PID Title Dept Courses Must_have
--------------------------------------------
1 bis acct 201|202 NULL
Title is the name of the option that can come with the major. If bis (business information systems) had a choice of classes, one row would have a number in the Must_have for only one row.
My question is should I split this table into two different tables? I know the way I currently have it seems somewhat... well wrong. Any help would be greatly appreciated.
I would break dept into a separate table and associate it with a numeric ID. Then break your "courses" field into a "join table". Something like this:
majors
Id Title DepartmentID
major_courses
Id MajorId CourseId MustHave
departments
Id Title
So that, you may have a major like:
1 bis 1
a major_course like:
1 1 201 0
1 1 202 0
1 1 203 1 -- must have 203
then departments like:
1 bis
So now, to get a list of courses for the first major you can do this:
SELECT major_courses.CourseId, major_courses.MustHave, departments.Title
FROM majors
RIGHT JOIN major_courses ON major_courses.CourseId = majors.Id
INNER JOIN departments ON departments.Id = majors.DepartmendID
WHERE major.id = 1
I would split it into three tables. The first would be majors and would contain PID, Title, Dept, the second would be courses, containing the course ID, course name and any other information, and the last would be a mapping between majors and courses (perhaps named courses_majors). The courses_majors table would contain the ID of the major, the ID of a course and a flag to show whether or not it is required by that major.
(This is assuming that one course could be used in multiple majors)