How to index a one-way relation table? - sql

I'm not a DB guy, so this may be a trivial question...
Suppose:
1) I have a relation table (I think that's what it's called), student_class, which holds a student_id and a class_id (representing a many-to-many relationship between a student table and a class table).
2) I run various queries that produce a student_id (perhaps among other things), and the results are then LEFT OUTER JOINed to student_class and LEFT OUTER JOINed again to the class table to get the associated class information.
3) I do that a lot, but I never need to find the students in a given class, or anything else you might think is common in the context of students and classes.
4) I have tens of thousands of students but only about 100 classes.
5a) 99% of the students are not enrolled in any class (what a great school) and the rest are enrolled in one and only one class.
5b) Alternatively to 5a, on average every student is enrolled in about 2 classes.
So how many and which of the indices below should I create on the student_class table for this sole purpose, and is the answer different for 5a and 5b?
a. index on student_id
b. index on both student_id and class_id
c. index on class_id

I would create one index for each column.
Nothing in your question argues for indexing only class_id. Your joins match rows on student_id and on class_id, so I think it's reasonable to index them both.
Additionally, your index needs don't change if more students are enrolled in each class.
You make use of the indexes in the same way in both cases (5a and 5b).
And for the number of records you have, the indexes are going to be relatively small.
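One way to check which index a given join actually uses is to ask the engine for its query plan. The sketch below is only an illustration: it assumes an in-memory SQLite database (the question does not name an engine, and other planners may choose differently), and the index names are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption; the
# question does not say which engine is used). Index names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_class (
        student_id INTEGER NOT NULL REFERENCES student(id),
        class_id   INTEGER NOT NULL REFERENCES class(id)
    );
    -- One index per column, as suggested in the answer above.
    CREATE INDEX idx_student_class_student ON student_class (student_id);
    CREATE INDEX idx_student_class_class   ON student_class (class_id);
""")

# The access pattern from the question: start from a student_id, then
# LEFT OUTER JOIN to student_class and on to class.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT s.id, c.name
    FROM student s
    LEFT OUTER JOIN student_class sc ON sc.student_id = s.id
    LEFT OUTER JOIN class c          ON c.id = sc.class_id
    WHERE s.id = ?
""", (1,)).fetchall()

# The last column of each plan row is a readable description of one step.
plan_steps = [row[-1] for row in plan]
for step in plan_steps:
    print(step)
```

Reading the printed steps shows, for each table in the join, whether it is reached through an index or a primary key. Running the same check against the real engine and real data volumes is the more reliable guide.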

Related

SQL Recursive Query | multiple Tables Foreign Keys

Scenario
I have a few tables; each table represents an entity of a unique type. For example, let's go with:
School, Subject, Class, Teacher (listed in order, Parent -> Child).
Schema
Each table has:
ID: UUID
Name: CHAR VARYING
{parent}_id: UUID <-- for example, Class would have Subject_id, and Teacher would have Class_id.
The {parent}_id is the foreign key in each table.
Problem
I want to write a query that lists all the teachers of a given school. To do this in this schema, I need to first query Subject by School_id, then Class by Subject_id, and finally Teacher by Class_id.
A recursive function makes sense to me, but all the tutorials I find do this within a single table and with ids that don't change with each recursion. In my example, each recursion needs to search on a different id.
Question
How do you go about doing this? I could make an array of the ids, keep an index, increment it, and use it to look up the id in the array. However, this seems like a common query, so I believe there might be a more elegant solution.
Note: I am using PostgreSQL
Edit for Comment
I am using PostgreSQL DB and PGAdmin
Why would UUID not work? It has worked up to this point with no problems; even works with cascading delete using foreign keys.
I can show the actual schema. However, here is a fictitious layout. Quite straightforward, I hope.
School
  ID
  Name
Subject
  ID
  Name
  School_ID
Class
  ID
  Name
  Subject_ID
Teacher
  ID
  Name
  Class_ID
Expected output
Teacher_ID, Teacher_Name, Class_Name, Subject_Name, School_Name
Something like this?
select
    teacher.id   as teacher_id,
    teacher.name as teacher_name,
    class.name   as class_name,
    subject.name as subject_name,
    school.name  as school_name
from school
join subject on subject.school_id = school.id
join class   on class.subject_id = subject.id
join teacher on teacher.class_id = class.id
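To show that a plain chain of joins covers the "teachers of a given school" case without recursion, here is a small sketch. It assumes an in-memory SQLite database in place of PostgreSQL, short text ids in place of UUIDs, and invented sample rows; the WHERE filter on the school id is added for illustration.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; text ids stand in for UUIDs.
# All sample rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE school  (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id TEXT PRIMARY KEY, name TEXT, school_id  TEXT REFERENCES school(id));
    CREATE TABLE class   (id TEXT PRIMARY KEY, name TEXT, subject_id TEXT REFERENCES subject(id));
    CREATE TABLE teacher (id TEXT PRIMARY KEY, name TEXT, class_id   TEXT REFERENCES class(id));

    INSERT INTO school  VALUES ('s1', 'North High'), ('s2', 'South High');
    INSERT INTO subject VALUES ('sub1', 'Math', 's1'), ('sub2', 'Art', 's2');
    INSERT INTO class   VALUES ('c1', 'Algebra', 'sub1'), ('c2', 'Painting', 'sub2');
    INSERT INTO teacher VALUES ('t1', 'Ada', 'c1'), ('t2', 'Bob', 'c1'), ('t3', 'Cy', 'c2');
""")

# Each join follows one parent -> child foreign key; the WHERE clause
# restricts the whole chain to a single school.
rows = conn.execute("""
    SELECT teacher.id, teacher.name, class.name, subject.name, school.name
    FROM school
    JOIN subject ON subject.school_id = school.id
    JOIN class   ON class.subject_id = subject.id
    JOIN teacher ON teacher.class_id = class.id
    WHERE school.id = ?
    ORDER BY teacher.id
""", ("s1",)).fetchall()

for row in rows:
    print(row)
```

Recursion is usually reserved for hierarchies of unknown depth inside one table; with a fixed number of levels spread across separate tables, one join per level is enough.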

SQL question: how to find rows that share all of the same rows in a composite table?

I'm working on my SQL project using the Oracle database for class, and I'm asked a question that I see far too often.
You have three tables:
STUDENT: SNO, SNAME
CLASS: CNO, CNAME
ATTENDANCE: SNO, CNO, Grade
The question I keep running into is of a similar type: find the names of the students who attend all of the classes that "John" (or anyone else) attends.
John attends three classes, so I have to find the students that also attend those three classes (could be more, but those three must be there). However, I won't always know how many classes John (or whoever) attends, so it can't be hardcoded like that.
SELECT jclass.CNO
FROM attendance jclass
INNER JOIN student on jclass.SNO = student.SNO
WHERE student.SNAME = 'John';
This gets me the classes that John attends. I tried to add the identifier for the other students:
SELECT student.SNAME
FROM student
INNER JOIN attendance on student.SNO = attendance.SNO
INNER JOIN class on attendance.CNO = class.CNO
WHERE student.SNAME <> 'John'
AND class.CNO IN (SELECT jclass.CNO
                  FROM attendance jclass
                  INNER JOIN student on jclass.SNO = student.SNO
                  WHERE student.SNAME = 'John');
However, this only gets me the students that appear in at least one of John's classes, rather than all of them. I can see why it's doing this, but I'm not sure how to fix it. It's the one big struggle I'm having with SQL.
Here is one way - assuming SNO is primary key in the first table, CNO is primary key in the second table, and (SNO, CNO) is (composite) primary key in the third table, and that the input student is given by a unique identifier (first name is distinctly NOT a unique identifier, so the problem stated in terms of giving "John" as the input makes no sense). Here I assume the "special" student is identified by SNO = 1001; you can make 1001 into a variable, or change it to a subquery that selects a (unique!!) SNO based on some other inputs.
I didn't try to make the query as efficient as possible, or use features you most likely haven't seen in your class. Rather, I tried to make it as elementary and as readable as possible.
select sno
from attendance
where cno in (select cno from attendance where sno = 1001)
group by sno
having count(*) = (select count(*) from attendance where sno = 1001)
;
The strategy is simple: the subquery in the in condition finds the classes attended by the "special" student, then from the attendance table we select only rows for those classes. Group by student, and count. Keep only the students for whom the count is equal to the total count for the "special" student. Note the last condition is about groups, not about input rows, so it belongs in the having clause.
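The counting strategy can be tried on a tiny data set. The sketch below assumes an in-memory SQLite database rather than Oracle and invented rows, and it keeps the answer's assumption that (SNO, CNO) is the primary key of ATTENDANCE. It also wraps the query so that names are returned and the "special" student is left out of the result.

```python
import sqlite3

# SQLite stands in for Oracle; the rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student    (sno INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE class      (cno INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE attendance (sno INTEGER, cno INTEGER, grade INTEGER,
                             PRIMARY KEY (sno, cno));

    INSERT INTO student VALUES (1001, 'John'), (1002, 'Mary'), (1003, 'Tom');
    INSERT INTO class   VALUES (1, 'Math'), (2, 'Art'), (3, 'Music');
    -- John attends classes 1 and 2; Mary attends all three; Tom only class 1.
    INSERT INTO attendance VALUES
        (1001, 1, 80), (1001, 2, 70),
        (1002, 1, 90), (1002, 2, 85), (1002, 3, 75),
        (1003, 1, 60);
""")

# Inner query: the answer's query. Outer query: turn SNOs into names
# and drop the special student.
names = [row[0] for row in conn.execute("""
    SELECT s.sname
    FROM student s
    WHERE s.sno <> :target
      AND s.sno IN (
          SELECT sno
          FROM attendance
          WHERE cno IN (SELECT cno FROM attendance WHERE sno = :target)
          GROUP BY sno
          HAVING COUNT(*) = (SELECT COUNT(*) FROM attendance WHERE sno = :target)
      )
    ORDER BY s.sname
""", {"target": 1001})]

print(names)
```

Without the outer filter, the special student would also appear, since that student trivially attends all of their own classes.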

Can I skip a join in my select?

While we were learning about joins, our instructor said not to skip tables.
For example, let's write a query that selects Last_Name, First_Name, and Numeric_Grade.
I would write
Select Last_Name, First_Name, Numeric_Grade
From Student
Join Grade
Using(Student_id)
He says to write
Select Last_Name, First_Name, Numeric_Grade
From Student
Join Enrollment
Using(Student_id)
Join Grade
Using(Student_id)
I'm confused because as long as I can link the tables through matching fields, I don't see the point of going through Enrollment.
He has not given me a reason for going through Enrollment, other than that it's what the diagram shows: "Follow the diagram."
Do I have to go through Enrollment? Is it the safe way to do it, or does it not matter because Grade and Student both have a Student_id key?
Quoting Alice Rischert in Oracle SQL By Example, lab 7.2:
The second choice is to join the STUDENT_ID from the GRADE table directly to the STUDENT_ID of the STUDENT table, thus skipping the ENROLLMENT table entirely. - - This shortcut is perfectly acceptable, even if it does not follow the primary key/foreign key relationship path. In this case, you can be sure not to build a Cartesian product because you can guarantee only one STUDENT_ID in the STUDENT table for every STUDENT_ID in the GRADE table. In addition, it also eliminates a join; thus, the query executes a little faster and requires fewer resources. The effect is probably fairly negligible with this small result set.
The only reason to go through the Enrollment table would be if you need information (fields) from that table. If both the Enrollment and Grade table have a Student_id field then you wouldn't need to go through Enrollment to get there.
In your example it looks like you are looking for First and Last Name, which should both come from the Student table and Numeric_Grade which should come from the Grade table. In this instance, there would be no need for the Enrollment table. If there were a WHERE clause that required something from the Enrollment table then yes you would need to include it, but your example I would say it is not needed.
If this is a question on a test or assignment and the teacher is requesting you go through the Enrollment table too I would do it just to appease him, but knowing that you don't actually need to do it to get the information that you require.
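The two queries from the question can be compared side by side. The sketch below assumes an in-memory SQLite database rather than Oracle, stripped-down tables holding only the columns the queries mention, and toy data with exactly one enrollment row per student; the real schema has more columns and rows.

```python
import sqlite3

# SQLite stands in for Oracle. The tables are cut down to the columns the
# two queries use, and each student has exactly one enrollment row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student    (student_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE enrollment (student_id INTEGER REFERENCES student(student_id));
    CREATE TABLE grade      (student_id INTEGER REFERENCES student(student_id), numeric_grade INTEGER);

    INSERT INTO student    VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Ray');
    INSERT INTO enrollment VALUES (1), (2);
    INSERT INTO grade      VALUES (1, 90), (2, 85);
""")

# The student's version: join Student straight to Grade.
direct = conn.execute("""
    SELECT last_name, first_name, numeric_grade
    FROM student
    JOIN grade USING (student_id)
    ORDER BY last_name
""").fetchall()

# The instructor's version: go through Enrollment first.
via_enrollment = conn.execute("""
    SELECT last_name, first_name, numeric_grade
    FROM student
    JOIN enrollment USING (student_id)
    JOIN grade USING (student_id)
    ORDER BY last_name
""").fetchall()

print(direct)
print(via_enrollment)
```

With this toy data both versions return the same rows, which matches the point that Enrollment contributes nothing when none of its columns are needed.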
It depends on your tables. Sometimes you can, but sometimes you can't.
For example, imagine Enrollment has a column like student_quit_course.
Then you may only want the grades of students who actually finished the course, and you need all three tables.
In this particular case, you will have a GRADE for several section_id values, but to know what each section is, you need [Section] and [Course], both joined through [Enrollment].

writing sql query between tables

Which SQL query could I write to satisfy this need:
"List the names of the students who take a course from an instructor named John."
Not sure that you can, from the depicted relations.
You can identify the instructors by selecting InstructorID and filtering on Instructor.FirstName.
You can join that subset to the courses via the InstructorCourses join table: join on InstructorID, then join the result to Courses using CourseID.
In this way, Instructor.InstructorID -> (InstructorCourses.InstructorID, InstructorCourses.CourseID) -> Courses.CourseID.
This lets you find information about the courses taught by instructors filtered on their name.
You don't present any link between students and courses in your diagram. I suspect you're missing a relation StudentCourses, which ought to be similar to InstructorCourses, but rather links students to courses. With that data in the mix, you can extend the join to match students to the courses from the relationship you already have.
Your diagram implies a relation between Student and InstructorCourses, which seems incorrect - both because there is no key to join on, and also because the logical relationship would not be correct. I think this is probably an error.
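If the missing link were added, the full join might look like the sketch below. Everything about StudentCourses is hypothetical (it is the relation the answer suspects is missing), as are the Student column names, the sample rows, and the use of an in-memory SQLite database; the original diagram is not available here.

```python
import sqlite3

# StudentCourses is a HYPOTHETICAL table, modelled on InstructorCourses.
# Student column names and all rows are also assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instructor        (InstructorID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Courses           (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
    CREATE TABLE Student           (StudentID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE InstructorCourses (InstructorID INTEGER, CourseID INTEGER);
    CREATE TABLE StudentCourses    (StudentID INTEGER, CourseID INTEGER);

    INSERT INTO Instructor VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO Courses    VALUES (1, 'Databases'), (2, 'Networks');
    INSERT INTO Student    VALUES (1, 'Amy'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO InstructorCourses VALUES (1, 1), (2, 2);
    INSERT INTO StudentCourses    VALUES (1, 1), (2, 2), (3, 1), (3, 2);
""")

# Instructor -> InstructorCourses -> StudentCourses -> Student.
# DISTINCT avoids listing a student twice if John teaches several of
# that student's courses.
student_names = [row[0] for row in conn.execute("""
    SELECT DISTINCT st.FirstName
    FROM Instructor i
    JOIN InstructorCourses ic ON ic.InstructorID = i.InstructorID
    JOIN StudentCourses sc    ON sc.CourseID = ic.CourseID
    JOIN Student st           ON st.StudentID = sc.StudentID
    WHERE i.FirstName = 'John'
    ORDER BY st.FirstName
""")]

print(student_names)
```

The Courses table itself is not needed in the join unless course details are wanted, because both join tables already carry CourseID.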
The query you need cannot be written because your design does not allow it: there is no relationship between the two tables Student and InstructorCourses.

Dynamic Tables?

I have a database that has different grades per course (i.e. three homeworks for Course 1, two homeworks for Course 2, ... ,Course N with M homeworks). How should I handle this as far as database design goes?
CourseID  HW1  HW2  HW3
1         100  99   100
2         100  75   NULL
EDIT
I guess I need to rephrase my question. As of right now, I have two tables, Course and Homework. Homework points to Course through a foreign key. My question is how do I know how many homeworks will be available for each class?
No, this is not a good design. It's an antipattern that I called Metadata Tribbles. You have to keep adding new columns for each homework, and they propagate out of control.
It's an example of repeating groups, which violates the First Normal Form of relational database design.
Instead, you should create one table for Courses, and another table for Homeworks. Each row in Homeworks references a parent row in Courses.
My question is how do I know how many homeworks will be available for each class?
You'd add rows for each homework, then you can count them as follows:
SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
FROM Homeworks
GROUP BY CourseId
Of course this only counts the homeworks after you have populated the table with rows. So you (or the course designers) need to do that.
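A minimal sketch of the two-table layout and the count query follows. It assumes an in-memory SQLite database, and the column names other than CourseId, as well as the sample rows, are invented for the example.

```python
import sqlite3

# Two tables: one row per course, one row per homework.
# Names other than CourseId and the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses   (CourseId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Homeworks (
        HomeworkId INTEGER PRIMARY KEY,
        CourseId   INTEGER NOT NULL REFERENCES Courses(CourseId),
        Title      TEXT
    );

    INSERT INTO Courses VALUES (1, 'Course 1'), (2, 'Course 2');
    -- Three homeworks for course 1, two for course 2: rows, not columns.
    INSERT INTO Homeworks (CourseId, Title) VALUES
        (1, 'HW1'), (1, 'HW2'), (1, 'HW3'),
        (2, 'HW1'), (2, 'HW2');
""")

counts = conn.execute("""
    SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
    FROM Homeworks
    GROUP BY CourseId
    ORDER BY CourseId
""").fetchall()

print(counts)
```

Adding a fourth homework to a course is then an INSERT into Homeworks, with no change to the table definition.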
Decompose the table into three different tables. One holds the courses, the second holds the homeworks, and the third connects them and stores the result.
Course:
CourseID  CourseName
1         Foo
Homework:
HomeworkID  HomeworkName  HomeworkDescription
HW1         Bar           ...
Result:
CourseID  HomeworkID  Result
1         HW1         100
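The three-table decomposition could be sketched as follows, again assuming an in-memory SQLite database. The second homework row and the key choices are additions for illustration, not part of the answer.

```python
import sqlite3

# Course and Homework hold the entities; Result connects them and stores
# the score. The HW2 row and the key definitions are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course   (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
    CREATE TABLE Homework (HomeworkID TEXT PRIMARY KEY, HomeworkName TEXT,
                           HomeworkDescription TEXT);
    CREATE TABLE Result (
        CourseID   INTEGER REFERENCES Course(CourseID),
        HomeworkID TEXT    REFERENCES Homework(HomeworkID),
        Result     INTEGER,
        PRIMARY KEY (CourseID, HomeworkID)
    );

    INSERT INTO Course   VALUES (1, 'Foo');
    INSERT INTO Homework VALUES ('HW1', 'Bar', '...'), ('HW2', 'Baz', '...');
    INSERT INTO Result   VALUES (1, 'HW1', 100), (1, 'HW2', 99);
""")

# Join through Result to list every score with its course and homework.
results = conn.execute("""
    SELECT c.CourseName, h.HomeworkName, r.Result
    FROM Result r
    JOIN Course c   ON c.CourseID = r.CourseID
    JOIN Homework h ON h.HomeworkID = r.HomeworkID
    ORDER BY h.HomeworkID
""").fetchall()

print(results)
```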