Should I split this table into two? - sql

I am trying to wrap my head around database normalization. This is my first time trying to create a working database, so please forgive my ignorance. I am trying to create an automated grad-check system for a class project. The following table keeps track of all options for a major for a set number of catalog years. The table is as follows:
PID Title Dept Courses Must_have
Some options give the user a choice of a set number of classes out of the total listed (hence the Must_have attribute). A completed row would look like this:
PID Title Dept Courses Must_have
--------------------------------------------
1 bis acct 201|202 NULL
Title is the name of the option that can come with the major. If bis (business information systems) offered a choice of classes, only the row for that option would have a number in Must_have.
My question is should I split this table into two different tables? I know the way I currently have it seems somewhat... well wrong. Any help would be greatly appreciated.

I would break dept into a separate table and associate it with a numeric ID. Then break your "courses" field into a "join table". Something like this:
majors
Id Title DepartmentID
major_courses
Id MajorId CourseId MustHave
departments
Id Title
So that, you may have a major like:
1 bis 1
a major_course like:
1 1 201 0
2 1 202 0
3 1 203 1 -- must have 203
then departments like:
1 acct
So now, to get a list of courses for the first major you can do this:
SELECT major_courses.CourseId, major_courses.MustHave, departments.Title
FROM majors
INNER JOIN major_courses ON major_courses.MajorId = majors.Id
INNER JOIN departments ON departments.Id = majors.DepartmentID
WHERE majors.Id = 1

I would split it into three tables. The first would be majors and would contain PID, Title, Dept, the second would be courses, containing the course ID, course name and any other information, and the last would be a mapping between majors and courses (perhaps named courses_majors). The courses_majors table would contain the ID of the major, the ID of a course and a flag to show whether or not it is required by that major.
(This is assuming that one course could be used in multiple majors)
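A minimal sketch of that three-table layout, using Python's built-in sqlite3 module so it can be run as-is. The column names, the course names, and the helper courses_for_major are assumptions made for illustration; they are not from the original post.

```python
import sqlite3

# In-memory database holding the hypothetical majors / courses / mapping tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE majors  (id INTEGER PRIMARY KEY, title TEXT, dept TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses_majors (
    major_id  INTEGER REFERENCES majors(id),
    course_id INTEGER REFERENCES courses(id),
    required  INTEGER NOT NULL DEFAULT 0,   -- 1 = required by this major
    PRIMARY KEY (major_id, course_id)
);
INSERT INTO majors  VALUES (1, 'bis', 'acct');
-- Course names are placeholders.
INSERT INTO courses VALUES (201, 'Course 201'), (202, 'Course 202'), (203, 'Course 203');
INSERT INTO courses_majors VALUES (1, 201, 0), (1, 202, 0), (1, 203, 1);
""")

def courses_for_major(major_id):
    """Return (course_id, required) pairs for one major, ordered by course id."""
    return conn.execute(
        "SELECT cm.course_id, cm.required "
        "FROM courses_majors AS cm "
        "JOIN majors AS m ON m.id = cm.major_id "
        "WHERE m.id = ? ORDER BY cm.course_id",
        (major_id,),
    ).fetchall()

print(courses_for_major(1))  # [(201, 0), (202, 0), (203, 1)]
```

Because each course is one row in the mapping table, adding or removing a course for a major is a single INSERT or DELETE rather than an edit to a delimited string.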

Related

Problem with running Efficient search on a DB table based on multiple permutations

I have a table mapping StudentId to subject, and below are the two possible schemas.
DB: RDBMS
schema1:
StudentId Subject
1 a
1 b
2 a
2 c
3 b
4 b
4 c
schema2:
StudentId Subjects
1 a, b
2 a, c
3 b
4 b, c
Constraints:
A student can subscribe to at most 10 subjects. The total number of subjects available is ~50.
Use Case:
I would like to filter students subscribed to a list of subjects, only those requested subjects and nothing outside.
E.g., if I look for students subscribed to subjects a,b, then the result should include only students subscribed to a,b or a or b, not a,c or b,c.
If I run this search in the sample schema provided above then
Expected result: StudentId 1 and 3. Note that 2 and 4 are not part of the result because of c
Solution1:
If I go with schema1 then I have to query the table with "where subject in ('a', 'b')", which will return all studentIds, and I will need additional logic after fetching the result to discard studentId 2 and 4.
The same applies to schema2, where I run LIKE "%a%" or LIKE "%b%", which also returns every row.
I can also create a list of possible combinations to query, like 'a' or 'b' or 'a,b', but this set of combinations will grow huge when requesting multiple subjects.
Problem:
Always returns a huge result set which will need additional processing.
Solution2:
select distinct(StudentId) from schema1 where subject NOT IN ('c');
select StudentId from schema2 where subjects NOT LIKE "%c%";
Here I can directly fetch only the expected studentIds, but this solution has the following problems:
It requires me to know all available subjects
I have to run a query filter with all subjects except the requested ones
The subject list can grow over time, and the query becomes inefficient
Solution3:
select distinct(StudentId)
from schema1 where StudentId NOT IN (
select distinct(StudentId)
from schema1 where subject NOT IN ('a', 'b')
);
Is there an efficient way to run this query? Or a better schema design that would help with the use case? Are there other database technologies that are better at solving such problems efficiently?
There is no consideration at all. A table represents an entity. Your data has two entities: students and subjects.
These entities are connected by an n-m relationship: any student can have multiple subjects; any subject can have multiple students.
So, you want three tables:
students
subjects
studentSubjects
In relational databases, you do not want to store multiple items in a single column, particularly in a string column. Of your two options, the first is basically correct, but I would recommend a subjects table and numeric ids for studentIds and subjectIds.
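With one row per student/subject pair, the "only these subjects and nothing outside" filter can be written as a single grouped query. The sketch below is one possible approach, not the only one; the table name student_subjects and the helper students_within are assumptions, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_subjects (
    student_id INTEGER,
    subject    TEXT,
    PRIMARY KEY (student_id, subject)
);
INSERT INTO student_subjects VALUES
    (1, 'a'), (1, 'b'), (2, 'a'), (2, 'c'), (3, 'b'), (4, 'b'), (4, 'c');
""")

def students_within(subjects):
    """Students whose subjects all fall inside the requested (non-empty) list."""
    placeholders = ",".join("?" for _ in subjects)
    sql = (
        "SELECT student_id FROM student_subjects "
        "GROUP BY student_id "
        # Count rows outside the requested list; keep students with none.
        f"HAVING SUM(CASE WHEN subject IN ({placeholders}) THEN 0 ELSE 1 END) = 0 "
        "ORDER BY student_id"
    )
    return [row[0] for row in conn.execute(sql, list(subjects))]

print(students_within(["a", "b"]))  # [1, 3]
```

The query only needs the requested subjects, so it does not have to change when new subjects are added to the catalog.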

How to remove duplicate rows in SQL when joining a table to itself

How to remove duplicate values a = b and b = a?
with a as(select w.id , w.doc, w.org
, d.name_s, d.name_f, d.name_p, d.spec
, o.name, o.extid
from crm_s_workplaces w
join crm_s_docs d on d.id=w.doc
join crm_s_orgs o on o.id=w.org
where d.active=1 and d.cst='NY' and w.active=1 and w.cst='NY' and o.active=1
and
o.cst='NY')
select a1.doc, a2.doc,
a1.org,a1.name_s,a1.name_f,a1.name_p,a2.name_s,a2.name_f,a2.name_p from a a1
join a a2 on
a1.name_s=a2.name_s and
substr(a1.name_f,1,1)=substr(a2.name_f,1,1) and
substr(a1.name_p,1,1)=substr(a2.name_p,1,1) and
a1.org=a2.org and
a1.spec<>a2.spec
order by a1.name_s
ER model diagram:
Repeat example:
Sometimes comes across a1.spec > a2.spec:
What you are calling "duplicates" are actually not duplicates in your database.
You basically have multiple doc records for what could be the same person or not. Note that even their names do not always match. For instance,
doc_id 1131385 has NAME_F = "Gabr" while
doc_id 1447530 has NAME_F = "Gabor"
In your database these are two different entities, and you cannot match them using primary key. You can try to join on the first, middle and last names, but as you can see in the above example with Gabor/Gabr, even that would not work.
Can you change the schema of the db? If so, you need to separate the docs into one table, with 1 record per doctor, and keep the specialization in a separate table with the following columns:
spec_id (int, PK)
doc_id (foreign key to Doc table)
specialization
That way, if you have 1 doctor with 3 specs, he/she will show up only once in the doc table and multiple times in the spec table.
I just noticed something else. You have a spec field in the workplaces table. Why? If you meant to say that Doc Gabor works as an admin in hospital 1 but as a therapist in hospital 2, you can do that. However, you then have to remove the spec field from the doc table and only use the spec in the workplaces table.
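A rough sketch of the separate specialization table described above, run through Python's sqlite3 so it is self-contained. The table names docs and doc_specs, the name columns, and the sample rows are invented for illustration and do not come from the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (
    doc_id INTEGER PRIMARY KEY,
    name_s TEXT,
    name_f TEXT
);
CREATE TABLE doc_specs (
    spec_id        INTEGER PRIMARY KEY,
    doc_id         INTEGER NOT NULL REFERENCES docs(doc_id),
    specialization TEXT NOT NULL
);
-- One doctor row (sample data), three specialization rows.
INSERT INTO docs VALUES (1, 'Example', 'Gabor');
INSERT INTO doc_specs (doc_id, specialization)
VALUES (1, 'admin'), (1, 'therapist'), (1, 'surgeon');
""")

doc_count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
specs = [
    row[0]
    for row in conn.execute(
        "SELECT specialization FROM doc_specs WHERE doc_id = ? "
        "ORDER BY specialization",
        (1,),
    )
]
print(doc_count, specs)  # 1 ['admin', 'surgeon', 'therapist']
```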

Purpose of Self-Joins

I am learning to program with SQL and have just been introduced to self-joins. I understand how these work, but I don't understand what their purpose is besides a very specific usage, joining an employee table to itself to neatly display employees and their respective managers.
This usage can be demonstrated with the following table:
EmployeeID | Name | ManagerID
------ | ------------- | ------
1 | Sam | 10
2 | Harry | 4
4 | Manager | NULL
10 | AnotherManager| NULL
And the following query:
select worker.employeeID, worker.name, worker.managerID, manager.name
from employee worker join employee manager
on (worker.managerID = manager.employeeID);
Which would return:
Sam AnotherManager
Harry Manager
Besides this, are there any other circumstances where a self-join would be useful? I can't figure out a scenario where a self-join would need to be performed.
Your example is a good one. Self-joins are useful whenever a table contains a foreign key into itself. An employee has a manager, and the manager is... another employee. So a self-join makes sense there.
Many hierarchies and relationship trees are a good fit for this. For example, you might have a parent organization divided into regions, groups, teams, and offices. Each of those could be stored as an "organization", with a parent id as a column.
Or maybe your business has a referral program, and you want to record which customer referred someone. They are both 'customers', in the same table, but one has a FK link to another one.
Hierarchies that are not a good fit for this would be ones where an entity might have more than one "parent" link. For example, suppose you had facebook-style data recording every user and friendship links to other users. That could be made to fit in this model, but then you'd need a new "user" row for every friend that a user had, and every row would be a duplicate except for the "relationshipUserID" column or whatever you called it.
In many-to-many relationships, you would probably have a separate "relationship" table, with a "from" and "to" column, and perhaps a column indicating the relationship type.
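As an illustration of that separate relationship table, here is a small sketch using Python's sqlite3. The table and column names (users, relationships, from_user, to_user, kind) and the sample users are assumptions. Each row is a directed link, so a mutual friendship would need two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE relationships (
    from_user INTEGER REFERENCES users(id),
    to_user   INTEGER REFERENCES users(id),
    kind      TEXT,                        -- relationship type
    PRIMARY KEY (from_user, to_user, kind)
);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO relationships VALUES
    (1, 2, 'friend'), (1, 3, 'friend'), (2, 3, 'friend');
""")

def friends_of(user_id):
    """Names of users that user_id has an outgoing 'friend' link to."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT u.name FROM relationships AS r "
            "JOIN users AS u ON u.id = r.to_user "
            "WHERE r.from_user = ? AND r.kind = 'friend' "
            "ORDER BY u.name",
            (user_id,),
        )
    ]

print(friends_of(1))  # ['Bob', 'Cy']
```

Each user is stored once, and any number of links can point at the same user without duplicating the user row.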
I found self joins most useful in situations like this:
Get all employees that work for the same manager as Sam. (This does not have to be hierarchical, this can also be: Get all employees that work at the same location as Sam)
select e2.employeeID, e2.name
from employee e1 join employee e2
on (e1.managerID = e2.managerID)
where e1.name = 'Sam'
Also useful to find duplicates in a table, but this can be very inefficient.
There are several great examples of using self-joins here. The one I often use relates to "timetables". I work with timetables in education, but they are relevant in other cases too.
I use self-joins to work out whether two items clash with one another, e.g. a student is scheduled for two lessons which happen at the same time, or a room is double booked. For example:
CREATE TABLE StudentEvents(
StudentId int,
EventId int,
EventDate date,
StartTime time,
EndTime time
);
SELECT
se1.StudentId,
se1.EventDate,
se1.EventId as Event1Id,
se2.EventId as Event2Id,
se1.StartTime as Event1Start,
se1.EndTime as Event1End,
se2.StartTime as Event2Start,
se2.EndTime as Event2End
FROM
StudentEvents se1
JOIN StudentEvents se2 ON
se1.StudentId = se2.StudentId
AND se1.EventDate = se2.EventDate
AND se1.EventId > se2.EventId
--The above line prevents (a) an event being seen as clashing with itself
--and (b) the same pair of events being returned twice, once as (A,B) and once as (B,A)
WHERE
se1.StartTime < se2.EndTime AND
se1.EndTime > se2.StartTime
Similar logic can be used to find other things in "timetable data", such as a pair of trains it is possible to take from A via B to C.
Self joins are useful whenever you want to compare records of the same table against each other. Examples are: Find duplicate addresses, find customers where the delivery address is not the same as the invoice address, compare a total in a daily report (saved as record) with the total of the previous day etc.
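The "compare with the previous day" case could look roughly like the sketch below, again via Python's sqlite3. The daily_reports table, its columns, and the figures are made up for the example, and SQLite's date() function is used to step back one day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_reports (report_date TEXT PRIMARY KEY, total INTEGER);
INSERT INTO daily_reports VALUES
    ('2024-01-01', 100), ('2024-01-02', 130), ('2024-01-03', 120);
""")

# Self-join: pair each day's row with the row for the day before it.
changes = conn.execute("""
    SELECT cur.report_date, cur.total - prev.total AS change
    FROM daily_reports AS cur
    JOIN daily_reports AS prev
      ON prev.report_date = date(cur.report_date, '-1 day')
    ORDER BY cur.report_date
""").fetchall()

print(changes)  # [('2024-01-02', 30), ('2024-01-03', -10)]
```

The first day has no previous row, so the inner join drops it; a LEFT JOIN would keep it with a NULL change.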

Returning duplicated values only once from a join query

I'm trying to extract info from a table in my database based on a person's job. In one table I have all the clients' info; in another table, linked by ID_no, I have their job title and the branches they're associated with. The problem I'm having is that when I join both tables I get some duplicates, because a person can be associated with more than one branch.
I would like to know how to return the duplicated values only once, because all I care about for the moment is the person's ID number and what their job title is.
SELECT *
FROM dbo.employeeinfo AS ll
LEFT OUTER JOIN employeeJob AS lly
ON ll.id_no = lly.id_no
WHERE lly.job_category = 'cle'
I know Select Distinct will not work in this situation since the duplicated values return different branches.
Any help would be appreciated. Thanks
I'm using SQL Server 2008, by the way.
*edit to show result i would like
------ ll. ll. lly. lly.
rec_ID --employeeID---Name-----JobTitle---Branch------
1 JX100 John cle london
2 JX100 John cle manchester
3 JX690 Matt 89899 london
4 JX760 Steve 12345 london
I would like the second record not to display, because I'm not interested in the branch. I just need to know the employee ID and his job title, but because of how the tables are structured it's returning JX100 twice, since he's recorded as working in 2 different branches.
You must use SELECT DISTINCT and select ONLY the person's ID number and job title.
I don't know your exact field names, but I think something like this could work:
SELECT DISTINCT ll.id_no AS person_id_number,
lly.job AS person_job
FROM dbo.employeeinfo AS ll LEFT OUTER JOIN
employeeJob AS lly ON ll.id_no = lly.id_no
WHERE lly.job_category = 'cle'
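To see why DISTINCT works once the branch column is left out of the select list, here is a small reproduction using Python's sqlite3 rather than SQL Server. The column names name and branch are guesses, and the rows are taken from the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employeeinfo (id_no TEXT PRIMARY KEY, name TEXT);
CREATE TABLE employeeJob  (id_no TEXT, job_category TEXT, branch TEXT);
INSERT INTO employeeinfo VALUES ('JX100', 'John'), ('JX690', 'Matt');
INSERT INTO employeeJob VALUES
    ('JX100', 'cle', 'london'),
    ('JX100', 'cle', 'manchester'),
    ('JX690', '89899', 'london');
""")

base = (
    "FROM employeeinfo AS ll "
    "JOIN employeeJob AS lly ON ll.id_no = lly.id_no "
    "WHERE lly.job_category = 'cle'"
)

# Without DISTINCT, John appears once per branch.
with_duplicates = conn.execute(
    "SELECT ll.id_no, lly.job_category " + base
).fetchall()

# DISTINCT collapses rows that are identical in the selected columns.
deduplicated = conn.execute(
    "SELECT DISTINCT ll.id_no, lly.job_category " + base
).fetchall()

print(len(with_duplicates), deduplicated)  # 2 [('JX100', 'cle')]
```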

SQL Merging Associated Records

Let's say we have a database with a table that has many other associated tables. If you diagrammed the database, this would be the table at the center with many foreign key relationships spiraling out of it.
To make it more concrete, let's say the two records in this central table are Initech and Contoso. Initech and Contoso are both associated with many other records in associated tables like Employees, AccountingTransactions, etc. Let's say the two merged (Initech bought Contoso) and from a data standpoint, it really is as simple as merging all the records. What's the easiest way to take all of Contoso's related records, make them point to Initech and then delete Contoso?
UPDATE with CASCADE comes tantalizingly close, but it obviously can't work without turning off constraints and then turning them back on (yuck).
Is there a nice generic way to do this without hunting down every single linked table and migrating them one by one? This has to be a common requirement. It's come up in two places in this project and can be summed up with: Entity A needs to control everything Entity B currently controls. How can I make it happen?
Before Merge:
Companies
ID Name
1 Contoso
2 Initech
Employees
ID Name CompanyId
1 Bob 1
2 Ted 2
After Merge:
Companies
ID Name
2 Initech
Employees
ID Name CompanyId
1 Bob 2
2 Ted 2
All my attempts at searching only turned up questions about merging separate databases... so sorry if this has been asked before.
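The exact syntax is vendor-dependent, but in MySQL you can run one UPDATE per linked table inside a transaction (a single multi-table UPDATE with OR would cross-join the tables and could overwrite rows belonging to other companies):
START TRANSACTION;
UPDATE Employees SET CompanyId = 2 WHERE CompanyId = 1;
UPDATE Cars SET CompanyId = 2 WHERE CompanyId = 1;
UPDATE OtherEntity SET CompanyId = 2 WHERE CompanyId = 1;
DELETE FROM Companies WHERE ID = 1;
COMMIT;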
Succinctly, no; there isn't a generic way to do it.
Consider your sample database with tables Companies, Employees, Departments, and AccountingTransactions.
You need to delete one of the company records (because after the merger, you will only record the current state of affairs).
You need to alter the employee records to change the employing company. However, it is quite possible that there is an employee number N in both companies, and one of those (presumably Contoso's) will have to be assigned a new employee number.
You probably face the problem that department 1 in Contoso's data is Engineering, but in Initech's it is Finance. So, you need to worry about how you are going to map the department numbers between the two companies, and then you face the problem of assigning Contoso's employees to Initech's departments.
For the historical accounting transactions, you probably have to keep Contoso's historical accounting records in Contoso's name, while some (of the most recent) transactions will need to be migrated to Initech's name. So maybe you won't be deleting the Contoso record from the table of companies after all, but you won't be able to use it to identify any new records.
These are just a small sampling of the reasons why such mappings cannot readily be automated.
No, there's no simple generic way of merging rows and cascading those changes throughout your system. You can script it all - which may be the best way, depending on your scenario - or devise a workaround.
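If you do script it, the script can be as small as a hand-maintained list of the columns that point at the central table. The sketch below uses Python's sqlite3 with the Companies/Employees sample from the question; LINKED_COLUMNS and merge_company are hypothetical names, and the list has to be kept up to date manually, which is exactly the "hunting down every linked table" work the question hoped to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK so ordering mistakes fail loudly
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employees (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT,
    CompanyId INTEGER REFERENCES Companies(ID)
);
INSERT INTO Companies VALUES (1, 'Contoso'), (2, 'Initech');
INSERT INTO Employees VALUES (1, 'Bob', 1), (2, 'Ted', 2);
""")

# Hypothetical, hand-maintained list of (table, column) pairs referencing Companies.
# These are trusted constants, never user input, so building SQL from them is acceptable here.
LINKED_COLUMNS = [("Employees", "CompanyId")]

def merge_company(conn, old_id, new_id):
    """Repoint every linked row from old_id to new_id, then delete the old company."""
    with conn:  # one transaction: commits on success, rolls back on error
        for table, column in LINKED_COLUMNS:
            conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                (new_id, old_id),
            )
        conn.execute("DELETE FROM Companies WHERE ID = ?", (old_id,))

merge_company(conn, 1, 2)
companies = conn.execute("SELECT ID, Name FROM Companies ORDER BY ID").fetchall()
employees = conn.execute("SELECT ID, Name, CompanyId FROM Employees ORDER BY ID").fetchall()
print(companies)  # [(2, 'Initech')]
print(employees)  # [(1, 'Bob', 2), (2, 'Ted', 2)]
```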
One workaround might be to implement a parenting pattern on your central table (or abstract it to another table). You would then end up with something like
Companies
ID ParentID Name
1 2 Contoso
2 null Initech
or
Companies
ID ParentID Name
1 3 Contoso
2 3 Initech
3 null MegaInitech
and all your queries that join onto this central Companies table now check ID and ParentID;
SELECT *
FROM Employees
WHERE CompanyId IN (SELECT ID FROM Companies WHERE ID = @id OR ParentID = @id)
Abstract this away to a view or function:
CREATE FUNCTION dbo.fn_IsMemberOf
(
@companyId INT,
@parentId INT
)
RETURNS BIT
AS
BEGIN
-- Returns 1 when the company is the parent itself or one of its children
DECLARE @result BIT = 0
SELECT @result = 1 FROM Companies
WHERE ID = @companyId
AND COALESCE(ParentID, ID) = @parentId
RETURN @result
END
SELECT *
FROM Employees
WHERE dbo.fn_IsMemberOf(CompanyId, 2) = 1 -- 2 = Initech, the parent in the first sample table
(haven't tested this but you get the idea)
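For what it's worth, the ID/ParentID lookup can be tried out quickly with Python's sqlite3. This sketch uses the first sample Companies table (Contoso parented to Initech) and an assumed helper named employees_of; it only handles one level of parenting, like the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, ParentID INTEGER, Name TEXT);
CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Name TEXT, CompanyId INTEGER);
INSERT INTO Companies VALUES (1, 2, 'Contoso'), (2, NULL, 'Initech');
INSERT INTO Employees VALUES (1, 'Bob', 1), (2, 'Ted', 2);
""")

def employees_of(company_id):
    """Employees of the company itself plus its direct child companies."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT Name FROM Employees "
            "WHERE CompanyId IN ("
            "    SELECT ID FROM Companies WHERE ID = ? OR ParentID = ?"
            ") ORDER BY Name",
            (company_id, company_id),
        )
    ]

print(employees_of(2))  # ['Bob', 'Ted']
print(employees_of(1))  # ['Bob']
```

No Employees rows are rewritten here, so Contoso's history stays attached to Contoso while queries against Initech still pick it up.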