SQL normalization - sql

Right now, I have a table:
Id - CollegeName - CourseName
This table is not normalized, so each college appears in many rows, one per course.
I need to normalize this into two tables:
Colleges: CollegeID - CollegeName
Courses: CourseID - CollegeID - CourseName
Is there an easy way to do this?
Thank you

CREATE TABLE dbo.College
(
CollegeId int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
CollegeName nvarchar(100) NOT NULL
)
CREATE TABLE dbo.Course
(
CourseId int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
CollegeId int NOT NULL,
CourseName nvarchar(100) NOT NULL
)
ALTER TABLE dbo.Course
ADD CONSTRAINT FK_Course_College FOREIGN KEY (CollegeId)
REFERENCES dbo.College (CollegeId)
--- add colleges
INSERT INTO dbo.College (CollegeName)
SELECT DISTINCT CollegeName FROM SourceTable
--- add courses
INSERT INTO dbo.Course (CollegeId, CourseName)
SELECT
College.CollegeId,
SourceTable.CourseName
FROM
SourceTable
INNER JOIN
dbo.College ON SourceTable.CollegeName = College.CollegeName

If you create the two new tables with Colleges.CollegeID and Courses.CourseID as auto-numbered fields, you can use:
INSERT INTO Colleges (CollegeName)
SELECT DISTINCT CollegeName
FROM OldTable ;
INSERT INTO Courses (CollegeID, CourseName)
SELECT Colleges.CollegeID, OldTable.CourseName
FROM OldTable
JOIN Colleges
ON OldTable.CollegeName = Colleges.CollegeName ;
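The two-step migration above can be tried end to end. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sample rows are made up for illustration, and AUTOINCREMENT stands in for the auto-numbered fields (IDENTITY in SQL Server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical denormalized source table with made-up sample rows.
    CREATE TABLE OldTable (Id INTEGER PRIMARY KEY, CollegeName TEXT, CourseName TEXT);
    INSERT INTO OldTable (CollegeName, CourseName) VALUES
        ('Durham', 'Computing Science'),
        ('Durham', 'Physics'),
        ('Harvard', 'Computer Science');

    CREATE TABLE Colleges (
        CollegeID INTEGER PRIMARY KEY AUTOINCREMENT,
        CollegeName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Courses (
        CourseID INTEGER PRIMARY KEY AUTOINCREMENT,
        CollegeID INTEGER NOT NULL REFERENCES Colleges (CollegeID),
        CourseName TEXT NOT NULL
    );

    -- Step 1: one row per distinct college name.
    INSERT INTO Colleges (CollegeName)
    SELECT DISTINCT CollegeName FROM OldTable;

    -- Step 2: look up each course's new CollegeID by college name.
    INSERT INTO Courses (CollegeID, CourseName)
    SELECT Colleges.CollegeID, OldTable.CourseName
    FROM OldTable
    JOIN Colleges ON OldTable.CollegeName = Colleges.CollegeName;
""")

college_count = conn.execute("SELECT COUNT(*) FROM Colleges").fetchone()[0]
courses_per_college = dict(conn.execute("""
    SELECT Colleges.CollegeName, COUNT(*)
    FROM Courses JOIN Colleges ON Courses.CollegeID = Colleges.CollegeID
    GROUP BY Colleges.CollegeName
"""))
print(college_count, courses_per_college)
```

Once the row counts in the new tables look right, the old table can be dropped.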

I agree with @Andomar's first comment: remove the seemingly redundant Id column and your CollegeName, CourseName table is already in 5NF.
What I suspect you need is a further table to give courses attributes so that you can model the fact that, say, Durham University's B.Sc. in Computing Science is comparable with Harvard's A.B. in Computer Science (via attributes 'computing major', 'undergraduate', 'country=US', 'country=UK', etc).

Sure.
Create a College table with a college_id (primary key) column, and a college_name column which is used as a unique index column.
Just refer to the college_id column, not college_name, in the Course table.

Related

Update and renew data based on data in other tables

There are three tables, student, takes, and course, defined as follows:
CREATE TABLE student
(
ID varchar(5),
name varchar(20) NOT NULL,
dept_name varchar(20),
tot_cred numeric(3,0) CHECK (tot_cred >= 0),
PRIMARY KEY (ID),
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE SET NULL
)
CREATE TABLE takes
(
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4,0),
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id, semester, year),
FOREIGN KEY (course_id, sec_id, semester, year) REFERENCES section
ON DELETE CASCADE,
FOREIGN KEY (ID) REFERENCES student
ON DELETE CASCADE
)
CREATE TABLE course
(
course_id varchar(8),
title varchar(50),
dept_name varchar(20),
credits numeric(2,0) CHECK (credits > 0),
PRIMARY KEY (course_id),
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE SET NULL
)
The tot_cred column in the student table currently holds random (incorrect) values. I want a query that recomputes it from the credits of the courses each student has taken. Courses in which a student received an F grade should be excluded, and students who didn't take any course should be assigned 0 as tot_cred.
I came up with two approaches. The first is:
UPDATE student
SET tot_cred = (SELECT SUM(credits)
FROM takes, course
WHERE takes.course_id = course.course_id
AND student.ID = takes.ID
AND takes.grade <> 'F'
AND takes.grade IS NOT NULL)
This query meets all my needs, except that it assigns NULL instead of 0 to students who didn't take any course.
The second uses CASE WHEN:
UPDATE student
SET tot_cred = (select sum(credits)
case
when sum(credits) IS NOT NULL then sum(credits)
else 0 end
FROM takes as t, course as c
WHERE t.course_id = c.course_id
AND t.grade<>'F' and t.grade IS NOT NULL
)
But it assigned 0 to all students. Is there any way to achieve the above requirement?
If the first query meets your requirements and the only problem is that it returns NULL for students who did not take any course, the easiest solution is to replace the SUM() aggregate function with TOTAL() (an SQLite aggregate), which returns 0 instead of NULL when there are no rows to sum:
UPDATE student AS s
SET tot_cred = (
SELECT TOTAL(c.credits)
FROM takes t INNER JOIN course c
ON t.course_id = c.course_id
WHERE t.ID = s.ID AND t.grade <> 'F' AND t.grade IS NOT NULL
);
The same could be done with COALESCE():
SELECT COALESCE(SUM(credits), 0)...
Also, use a proper join with an ON clause and aliases for the tables to improve readability.
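To see the COALESCE() variant behave as described, here is a minimal sketch using Python's built-in sqlite3 module. The tables are trimmed-down versions of the ones in the question, and the sample students, courses, and grades are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down tables from the question; the sample data is hypothetical.
    CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT NOT NULL, tot_cred INTEGER);
    CREATE TABLE course (course_id TEXT PRIMARY KEY, credits INTEGER);
    CREATE TABLE takes (ID TEXT, course_id TEXT, grade TEXT);

    INSERT INTO student VALUES ('1', 'Ann', 99), ('2', 'Bob', 99), ('3', 'Cy', 99);
    INSERT INTO course VALUES ('CS101', 4), ('CS102', 3);
    INSERT INTO takes VALUES
        ('1', 'CS101', 'A'),
        ('1', 'CS102', 'F'),    -- failed: excluded from the sum
        ('2', 'CS102', NULL);   -- no grade yet: excluded from the sum

    -- Correlated subquery: one sum per student row, with NULL mapped to 0.
    UPDATE student
    SET tot_cred = (
        SELECT COALESCE(SUM(c.credits), 0)
        FROM takes t INNER JOIN course c ON t.course_id = c.course_id
        WHERE t.ID = student.ID AND t.grade <> 'F' AND t.grade IS NOT NULL
    );
""")

totals = dict(conn.execute("SELECT ID, tot_cred FROM student"))
print(totals)
```

Student '3' has no rows in takes at all, and student '2' has only an ungraded course; both end up with 0 rather than NULL.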

For each ‘CptS’ course, find the percentage of the students who failed the course. Assume a passing grade is 2 or above

CREATE TABLE Course (
courseno VARCHAR(7),
credits INTEGER NOT NULL,
enroll_limit INTEGER,
classroom VARCHAR(10),
PRIMARY KEY(courseNo) );
CREATE TABLE Student (
sID CHAR(8),
sName VARCHAR(30),
major VARCHAR(10),
trackcode VARCHAR(10),
PRIMARY KEY(sID),
FOREIGN KEY (major,trackcode) REFERENCES Tracks(major,trackcode) );
CREATE TABLE Enroll (
courseno VARCHAR(7),
sID CHAR(8),
grade FLOAT NOT NULL,
PRIMARY KEY (courseNo, sID),
FOREIGN KEY (courseNo) REFERENCES Course(courseNo),
FOREIGN KEY (sID) REFERENCES Student(sID) );
So far I've been able to create two separate queries: one that counts the number of people who failed, and another that counts the number of people who passed. I'm having trouble combining these to produce the number of people passed / number of people failed for each course.
SELECT course.courseno, COUNT(*) FROM course inner join enroll on enroll.courseno = course.courseno
WHERE course.courseno LIKE 'CptS%' and enroll.grade < 2
GROUP BY course.courseno;
SELECT course.courseno, COUNT(*) FROM course inner join enroll on enroll.courseno = course.courseno
WHERE course.courseno LIKE 'CptS%' and enroll.grade > 2
GROUP BY course.courseno;
The end result should look something like
courseno passrate
CptS451 100
CptS323 100
CptS423 66
You can do a conditional average for this, grouped by course:
select
courseno,
-- a grade of exactly 2 counts as a pass
avg(case when grade >= 2 then 100.0 else 0 end) passrate
from enroll
where courseno like 'CptS%'
group by courseno
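The conditional average can be checked on a small data set. This is a sketch using Python's built-in sqlite3 module; the enrollment rows are made up, and the failrate column is an addition of mine to show that the same pattern also answers the "percentage who failed" wording of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Enroll (courseno TEXT, sID TEXT, grade REAL NOT NULL,
                         PRIMARY KEY (courseno, sID));
    -- Hypothetical sample data.
    INSERT INTO Enroll VALUES
        ('CptS451', 's1', 3.0), ('CptS451', 's2', 2.0),
        ('CptS423', 's1', 4.0), ('CptS423', 's2', 1.0),
        ('CptS423', 's3', 3.5), ('CptS423', 's4', 0.0),
        ('MATH171', 's1', 1.0);
""")

rows = conn.execute("""
    SELECT courseno,
           -- each student contributes 100 or 0, so the average is a percentage
           AVG(CASE WHEN grade >= 2 THEN 100.0 ELSE 0 END) AS passrate,
           AVG(CASE WHEN grade <  2 THEN 100.0 ELSE 0 END) AS failrate
    FROM Enroll
    WHERE courseno LIKE 'CptS%'
    GROUP BY courseno
    ORDER BY courseno
""").fetchall()
print(rows)
```

Because every enrolled student contributes exactly one value, no separate pass and fail counts need to be joined together.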

SQLite, aggregation query as where clause

Given the schema:
CREATE TABLE Student (
studentID INT PRIMARY KEY NOT NULL,
studentName TEXT NOT NULL,
major TEXT,
class TEXT CHECK (class IN ("Freshman", "Sophomore", "Junior", "Senior")),
gpa FLOAT CHECK (gpa IS NULL OR (gpa >= 0 AND gpa <= 4)),
FOREIGN KEY (major) REFERENCES Dept(deptID) ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE Dept (
deptID TEXT PRIMARY KEY NOT NULL CHECK (LENGTH(deptID) <= 4),
NAME TEXT NOT NULL UNIQUE,
building TEXT
);
CREATE TABLE Course (
courseNum INT NOT NULL,
deptID TEXT NOT NULL,
courseName TEXT NOT NULL,
location TEXT,
meetDay TEXT NOT NULL CHECK (meetDay IN ("MW", "TR", "F")),
meetTime INT NOT NULL CHECK (meetTime >= '07:00' AND meetTime <= '17:00'),
PRIMARY KEY (courseNum, deptID),
FOREIGN KEY (deptID) REFERENCES Dept(deptID) ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE Enroll (
courseNum INT NOT NULL,
deptID TEXT NOT NULL,
studentID INT NOT NULL,
PRIMARY KEY (courseNum, deptID, studentID),
FOREIGN KEY (courseNum, deptID) REFERENCES Course ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (studentID) REFERENCES Student(studentID) ON UPDATE CASCADE ON DELETE CASCADE
);
I'm attempting to find the names, IDs, and the number of courses they are taking, for the students who are taking the highest number of courses. The SELECT to retrieve the names and IDs is simple enough; however, I'm having trouble figuring out how to select the number of courses each student is taking, and then find the max of that and use it in a WHERE clause.
This is what I have so far:
SELECT Student.studentName, Student.studentID, COUNT(*) AS count
FROM Enroll
INNER JOIN Student ON Enroll.studentID=Student.studentID
GROUP BY Enroll.studentID
So first you get the count of enrolled classes per student:
SELECT COUNT(*) AS num
FROM Enroll
GROUP BY studentID
Taking the maximum of those counts gives the highest number of enrollments. You can then check that against your existing query using HAVING to get your final query:
SELECT Student.studentName, Student.studentID, COUNT(*) AS count
FROM Enroll
INNER JOIN Student ON Enroll.studentID=Student.studentID
GROUP BY Enroll.studentID
HAVING COUNT(*) = (SELECT MAX(num) FROM (SELECT COUNT(*) AS num FROM Enroll GROUP BY studentID));
So to recap, this gets the number that represents the highest number of enrollments for any student, then gets all students whose enrollment count equals that number: all students with the highest, or equal highest, number of enrollments.
We use HAVING because WHERE is applied before the GROUP BY, so aggregate functions such as COUNT() can't be used in a WHERE clause; HAVING is applied after grouping and can filter on them.
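The recap above can be verified on a few rows. This is a minimal sketch using Python's built-in sqlite3 module with trimmed-down Student and Enroll tables and made-up data; it computes the highest per-student count with MAX() over a derived table, which is one of several ways to get that number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (studentID INT PRIMARY KEY NOT NULL, studentName TEXT NOT NULL);
    CREATE TABLE Enroll (courseNum INT NOT NULL, deptID TEXT NOT NULL, studentID INT NOT NULL,
                         PRIMARY KEY (courseNum, deptID, studentID));
    -- Hypothetical sample data: Ann and Bob take two courses each, Cy takes one.
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Enroll VALUES
        (101, 'CS', 1), (102, 'CS', 1),
        (101, 'CS', 2), (201, 'MATH', 2),
        (101, 'CS', 3);
""")

rows = conn.execute("""
    SELECT Student.studentName, Student.studentID, COUNT(*) AS num
    FROM Enroll
    INNER JOIN Student ON Enroll.studentID = Student.studentID
    GROUP BY Student.studentID, Student.studentName
    -- keep only the groups whose count equals the highest per-student count
    HAVING COUNT(*) = (
        SELECT MAX(num)
        FROM (SELECT COUNT(*) AS num FROM Enroll GROUP BY studentID)
    )
    ORDER BY Student.studentID
""").fetchall()
print(rows)
```

Both students tied for the highest count are returned, which a plain ORDER BY ... LIMIT 1 on the outer query would not do.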

SQL DML Query AVG and COUNT

I am a beginner at SQL and I am trying to create a query.
I have these tables:
CREATE TABLE Hospital (
hid INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
country VARCHAR(127),
area INT
);
CREATE TABLE Doctor (
ic INT PRIMARY KEY,
name VARCHAR(127),
date_of_birth INT
);
CREATE TABLE Work (
hid INT,
ic INT,
since INT,
FOREIGN KEY (hid) REFERENCES Hospital (hid),
FOREIGN KEY (ic) REFERENCES Doctor (ic),
PRIMARY KEY (hid,ic)
);
The query is: What is the average in each country of the number of doctors working in hospitals of that country (1st column: each country, 2nd column: average)? Thanks.
You first need to write a query that counts the doctors per hospital
select w.hid, count(w.ic)
from work w
group by w.hid;
Based on that query, you can retrieve the average number of doctors per country:
with doctor_count as (
select w.hid, count(w.ic) as cnt
from work w
group by w.hid
)
select h.country, avg(dc.cnt)
from hospital h
join doctor_count dc on h.hid = dc.hid
group by h.country;
If you have an old DBMS that does not support common table expressions, the above can be rewritten with a derived table:
select h.country, avg(dc.cnt)
from hospital h
join (
select w.hid, count(w.ic) as cnt
from work w
group by w.hid
) dc on h.hid = dc.hid
group by h.country;
Here is an SQLFiddle demo: http://sqlfiddle.com/#!12/9ff79/1
Btw: storing date_of_birth as an integer is a bad choice. You should use a real DATE column.
And work is a reserved word in SQL. You shouldn't use that for a table name.
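The two-level aggregation can be tried on sample data. The sketch below uses Python's built-in sqlite3 module with trimmed-down tables and made-up rows. It also runs a LEFT JOIN variant, which is my own assumption about a possible requirement rather than part of the answer above: with an inner join, hospitals that have no doctors at all drop out of the average instead of counting as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hospital (hid INT PRIMARY KEY, name TEXT UNIQUE, country TEXT);
    CREATE TABLE Work (hid INT, ic INT, PRIMARY KEY (hid, ic));
    -- Hypothetical sample data; hospital 4 has no doctors.
    INSERT INTO Hospital VALUES (1, 'A', 'UK'), (2, 'B', 'UK'), (3, 'C', 'US'), (4, 'D', 'US');
    INSERT INTO Work VALUES (1, 10), (1, 11), (1, 12), (2, 13), (3, 14), (3, 15);
""")

# Inner join: averages only over hospitals that appear in Work.
inner_avg = conn.execute("""
    WITH doctor_count AS (
        SELECT hid, COUNT(ic) AS cnt FROM Work GROUP BY hid
    )
    SELECT h.country, AVG(dc.cnt)
    FROM Hospital h
    JOIN doctor_count dc ON h.hid = dc.hid
    GROUP BY h.country
    ORDER BY h.country
""").fetchall()

# Left join: hospitals without doctors count as 0.
left_avg = conn.execute("""
    WITH doctor_count AS (
        SELECT hid, COUNT(ic) AS cnt FROM Work GROUP BY hid
    )
    SELECT h.country, AVG(COALESCE(dc.cnt, 0))
    FROM Hospital h
    LEFT JOIN doctor_count dc ON h.hid = dc.hid
    GROUP BY h.country
    ORDER BY h.country
""").fetchall()
print(inner_avg, left_avg)
```

Which of the two results is wanted depends on whether an empty hospital should pull its country's average down.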

How do I design a database constraint so two entities can only have a many to many relationship if two field values within them match?

I have a database with four tables as follows:
Addressbook
--------------------
id
more fields
Contact
---------------------
id
addressbook id
more fields
Group
---------------------
id
addressbook id
more fields
Group to Contact
---------------------
Composite key
Group id
Contact id
My relationships are one to many for addressbook > contact, one to many for addressbook > group and many to many between contact and groups.
So in summary, I have an addressbook. Contacts and groups can be stored within it and they cannot be stored in more than one addressbook. Furthermore as many contacts that are needed can be added to as many groups as are needed.
My question is as follows. I wish to add the constraint that a contact can only be a member of a group if both of them have the same addressbook id.
As I am not a database person, this is boggling my brain. Does this mean I have designed my table structure wrong? Or does this mean that I have to add a check somewhere before inserting into the group to contact table? This seems wrong to me because I would want it to be impossible for SQL queries to link contacts to groups if they do not have the same addressbook id.
You should be able to accomplish this by adding an addressbook_id column to your Group to Contact bridge table, then using a compound foreign key to both the Contacts and Groups tables.
In PostgreSQL (but easily adaptable to any DB, or at least any DB that supports compound FKs):
CREATE TABLE group_to_contact (
contact_id INT,
group_id INT,
addressbook_id INT,
CONSTRAINT contact_fk FOREIGN KEY (contact_id,addressbook_id)
REFERENCES contacts(id,addressbook_id),
CONSTRAINT groups_fk FOREIGN KEY (group_id,addressbook_id)
REFERENCES groups(id,addressbook_id)
)
By using the same addressbook_id column in both constraints, you are of course enforcing that they are the same in both referenced tables.
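The effect of the shared addressbook_id column can be demonstrated with a small sketch using Python's built-in sqlite3 module. The table names contact and contact_group and the sample ids are hypothetical. Two details are assumptions about the dialect rather than part of the answer above: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the referenced column pair needs a UNIQUE (or primary key) constraint in the parent table, which PostgreSQL also requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        addressbook_id INTEGER NOT NULL,
        UNIQUE (id, addressbook_id)          -- target of the compound foreign key
    );
    CREATE TABLE contact_group (
        id INTEGER PRIMARY KEY,
        addressbook_id INTEGER NOT NULL,
        UNIQUE (id, addressbook_id)
    );
    CREATE TABLE group_to_contact (
        contact_id INTEGER NOT NULL,
        group_id INTEGER NOT NULL,
        addressbook_id INTEGER NOT NULL,
        PRIMARY KEY (group_id, contact_id),
        FOREIGN KEY (contact_id, addressbook_id) REFERENCES contact (id, addressbook_id),
        FOREIGN KEY (group_id, addressbook_id) REFERENCES contact_group (id, addressbook_id)
    );
    -- Contact 1 and group 10 live in address book 100; contact 2 lives in 200.
    INSERT INTO contact VALUES (1, 100), (2, 200);
    INSERT INTO contact_group VALUES (10, 100);
    INSERT INTO group_to_contact VALUES (1, 10, 100);   -- same address book: accepted
""")

try:
    # Contact 2 belongs to address book 200, so no value of addressbook_id
    # can satisfy both foreign keys for this pair.
    conn.execute("INSERT INTO group_to_contact VALUES (2, 10, 100)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

linked = conn.execute("SELECT COUNT(*) FROM group_to_contact").fetchone()[0]
print(mismatch_rejected, linked)
```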
OK - the Many to Many is governed by the GroupToContact table.
So the constraints are between Group and GroupToContact and between Contact and GroupToContact (GTC)
Namely
[Group].groupId = GTC.GroupId AND [Group].AddressBookid = GTC.AddressBookId
And
Contact.ContactId = GTC.ContactID AND Contact.AddressBookId = GTC.AddressBookId
So you will need to add AddressBookId to GroupToContact table
One further note - you should not define any relationship between Contact and Group directly - instead you just define the OneToMany relationships each has with the GroupToContact table.
As BonyT suggests:
Addressbook
---------------
*id*
...more fields
PRIMARY KEY (id)
Contact
-----------
*id*
addressbook_id
...more fields
PRIMARY KEY (id)
FOREIGN KEY (addressbook_id)
REFERENCES Addressbook(id)
Group
---------
*id*
addressbook_id
...more fields
PRIMARY KEY (id)
FOREIGN KEY (addressbook_id)
REFERENCES Addressbook(id)
Group to Contact
--------------------
*group_id*
*contact_id*
addressbook_id
PRIMARY KEY (group_id, contact_id)
FOREIGN KEY (addressbook_id, contact_id)
REFERENCES Contact(addressbook_id, id)
FOREIGN KEY (addressbook_id, group_id)
REFERENCES Group(addressbook_id, id)
A CHECK constraint can't include sub-queries, so it cannot enforce this rule.
You could instead create a trigger that checks that the group and contact have the same addressbookid and generates an error if they do not.
Note that a trigger defined to enforce an integrity rule does not check the data already in the table, so I would recommend using a trigger only when the rule cannot be enforced by an integrity constraint.
CREATE TRIGGER tr_Group_to_Contact_InsertOrUpdate ON Group_to_Contact
FOR INSERT, UPDATE AS
-- Fail if any new row links a group and a contact from different address books
IF EXISTS (SELECT 1 FROM inserted i
INNER JOIN [Group] g ON g.groupid = i.groupid
INNER JOIN Contact c ON c.contactid = i.contactid
WHERE g.addressbookid <> c.addressbookid)
BEGIN
RAISERROR('Address Book Mismatch', 16, 1)
ROLLBACK TRAN
END
Note:(This is from memory so probably not syntactically correct)
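For comparison, the same trigger idea can be sketched in a form that runs as written, using Python's built-in sqlite3 module. This is SQLite syntax, not the SQL Server syntax used above; the table names contact and contact_group and the sample ids are hypothetical, and only the INSERT case is covered (a similar trigger would be needed for UPDATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, addressbook_id INTEGER NOT NULL);
    CREATE TABLE contact_group (id INTEGER PRIMARY KEY, addressbook_id INTEGER NOT NULL);
    CREATE TABLE group_to_contact (
        group_id INTEGER NOT NULL REFERENCES contact_group (id),
        contact_id INTEGER NOT NULL REFERENCES contact (id),
        PRIMARY KEY (group_id, contact_id)
    );

    -- Abort the insert when the group and the contact are in different address books.
    CREATE TRIGGER tr_group_to_contact_insert
    BEFORE INSERT ON group_to_contact
    WHEN (SELECT addressbook_id FROM contact_group WHERE id = NEW.group_id)
         IS NOT (SELECT addressbook_id FROM contact WHERE id = NEW.contact_id)
    BEGIN
        SELECT RAISE(ABORT, 'Address Book Mismatch');
    END;

    INSERT INTO contact VALUES (1, 100), (2, 200);
    INSERT INTO contact_group VALUES (10, 100);
    INSERT INTO group_to_contact VALUES (10, 1);   -- same address book: accepted
""")

try:
    conn.execute("INSERT INTO group_to_contact VALUES (10, 2)")  # 100 vs 200
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM group_to_contact").fetchone()[0]
print(rejected, link_count)
```

Unlike the compound foreign key approach, this does not require an addressbook id column in the bridge table, but it also does nothing about rows that were already there or about later changes to a contact's address book.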
In your E-R (Entity-Relationship) model, the entities Group and Contact are (or should be) "dependent entities", which is to say that the existence of a Group or Contact is predicated upon that of 1 or more other entities, in this case AddressBook, that contributes to the identity of the dependent entity. The primary key of a dependent entity is composite and includes foreign keys to the entity(ies) upon which it is dependent.
The primary key of both Contact and Group include the primary key of the AddressBook to which they belong. Once you do that, everything falls into place:
create table AddressBook
(
id int not null ,
... ,
primary key (id)
)
create table Contact
(
address_book_id int not null ,
id int not null ,
... ,
primary key ( address_book_id , id ) ,
foreign key ( address_book_id ) references AddressBook ( id )
)
create table Group
(
address_book_id int not null ,
id int not null ,
... ,
primary key ( address_book_id , id ) ,
foreign key ( address_book_id ) references AddressBook ( id )
)
create table GroupContact
(
address_book_id int not null ,
contact_id int not null ,
group_id int not null ,
primary key ( address_book_id , contact_id , group_id ) ,
foreign key ( address_book_id , contact_id ) references Contact ( address_book_id , id ) ,
foreign key ( address_book_id , group_id ) references Group ( address_book_id , id )
)
Cheers.