How to make this sub-sub-query work? - sql

I am trying to do this in one query. I asked a similar question a few days ago but my personal requirements have changed.
I have a game-type website where users can attend "classes".
I am using MySQL. I have four tables; the first three are:
hl_classes (int id, int professor, varchar class, text description)
hl_classes_lessons (int id, int class_id, varchar lessonTitle, varchar lexiconLink, text lessonData)
hl_classes_answers (int id, int lesson_id, int student, text submit_answer, int percent)
hl_classes stores all of the classes on the website.
The lessons are the individual lessons for each class. A class can have infinite lessons. Each lesson is available in a specific term.
hl_classes_terms stores a list of all the terms and the current term has the field active = '1'.
When a user submits their answers to a lesson it is stored in hl_classes_answers. A user can only answer each lesson once. Lessons have to be answered sequentially. All users attend all "classes".
What I am trying to do is grab the next lesson for each user to do in each class. When the users start they are in term 1. When they complete all 10 lessons in each class they move on to term 2. When they finish lesson 20 for each class they move on to term 3. Let's say we know the term the user is in by the PHP variable $term.
This is the query I am currently trying to massage into shape, but it doesn't work, specifically because hC.id is unknown in the WHERE clause of the innermost subquery:
SELECT hC.id, hC.class, (SELECT MIN(output.id) as nextLessonID
FROM ( SELECT id, class_id
FROM hl_classes_lessons hL
WHERE hL.class_id = hC.id
ORDER BY hL.id
LIMIT $term,10 ) as output
WHERE output.id NOT IN (SELECT lesson_id FROM hl_classes_answers WHERE student = $USER_ID)) as nextLessonID
FROM hl_classes hC
My logic behind this query is: for each class, first select all of the lessons in the term the current user is in. From these, filter out the lessons the user has already done and grab the MINIMUM id of the lessons yet to be done. This will be the lesson the user has to do next.
I hope I have made my question clear enough.

My assumption is that you want a query that will tell what the next lesson is for a particular student for a given term, or null if there are no further classes for that student in that term. The result should be one row or null.
In order to do that with any efficiency (and IMHO, sanity) you need to revisit your table structure and assumptions about your data first. I am assuming from the table structures that you provided and how you described the lesson numbers, that there would be, for example, class 1, lessons 1, 2, 3, ..., 10, 11, ... 20, 21, ..., 30, and then class 2, lessons 1...30, and then class 3, lessons 1...30, etc. Further, lessons 1-10 for each class correspond to term 1, 11-20, to term 2, and 21-30 to term 3. Finally, terms are completed in order--class 3 lesson 10 is completed before class 1 lesson 11.
First, rather than using your class number as both a unique identifier and an ordering number (class 1 happens before class 2, etc), I would suggest a unique id field (probably an auto-increment), and a separate class_num field for the ordering number. (This is less critical for the classes table than it is for the lessons table, described next.)
Next, and similarly, lessons should get a unique id field separate from its lesson number field. The id would be the PK. This unique id is necessary to greatly simplify the query you want, as well as any other queries you might need. Without it you are dealing with a two-field composite key that makes many joins and subqueries nightmarishly complicated. You would probably want an additional unique index on class_id and lesson_num so that a lesson number is not re-used for a class. Also, this table should contain the term_num (or term_id) that a particular lesson for a particular class is assigned to. This will keep you from having to calculate what term a lesson is in using an overcomplicated MOD formula. That would be overkill. Just store the term number with the lesson information, and you can organize terms however you want.
Next, the answers table's id field should be a unique auto-increment. If it is important, you might also want a unique index on lesson_id and student_id (although this means either no retakes, or retake overwrites).
So I now have:
hl_classes (int id, int class_num, professor, class_name, description) PK: id, autoinc
hl_classes_lessons (int id, int class_id, int lesson_num, int term_num, l_title, l_link, l_data) PK: id, autoinc; Unique Key: class_id, lesson_num
hl_classes_answers (int id, int lesson_id, int student, ans, pct) PK: id, autoinc; Unique Key: lesson_id, student
With that, I came up with:
select hC.id as next_class_id, hL.id as next_lesson_id, hC.class_name as next_class, hL.term_num as term_num, hC.class_num as next_class_num, hL.lesson_num AS next_lesson_num
from hl_classes hC
left join hl_classes_lessons hL on hL.class_id = hC.id
where hL.term_num = $TERM_NUM
and hL.id not in (
select hA.lesson_id
from hl_classes_answers hA
where student = $USER_ID
)
order by hC.class_num, hL.lesson_num
limit 1;
This will give you back either one row containing the relevant information about the next lesson for that student in that term, or no row at all: the WHERE condition on hL.term_num discards the all-NULL rows the LEFT JOIN would otherwise produce. Note that the ids are not for display, as they could be any ol' number. You would display the _num fields.
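To see the revised schema and query behave as described, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The sample rows and the helper name next_lesson are made up for illustration, and bound parameters stand in for $TERM_NUM and $USER_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hl_classes (id INTEGER PRIMARY KEY, class_num INT, class_name TEXT);
CREATE TABLE hl_classes_lessons (
    id INTEGER PRIMARY KEY, class_id INT, lesson_num INT, term_num INT,
    UNIQUE (class_id, lesson_num));
CREATE TABLE hl_classes_answers (
    id INTEGER PRIMARY KEY, lesson_id INT, student INT,
    UNIQUE (lesson_id, student));

-- Hypothetical sample data; ids are deliberately out of order so that
-- only the _num columns determine the sequence.
INSERT INTO hl_classes VALUES (10, 1, 'Class A'), (20, 2, 'Class B');
INSERT INTO hl_classes_lessons VALUES
    (104, 10, 1, 1), (101, 10, 2, 1),
    (103, 20, 1, 1), (102, 20, 2, 1),
    (105, 10, 11, 2);
-- Student 7 has answered class 1, lesson 1 only.
INSERT INTO hl_classes_answers (lesson_id, student) VALUES (104, 7);
""")

NEXT_LESSON_SQL = """
SELECT hC.class_num, hL.lesson_num, hL.id
FROM hl_classes hC
JOIN hl_classes_lessons hL ON hL.class_id = hC.id
WHERE hL.term_num = :term
  AND hL.id NOT IN (SELECT hA.lesson_id
                    FROM hl_classes_answers hA
                    WHERE hA.student = :student)
ORDER BY hC.class_num, hL.lesson_num
LIMIT 1
"""

def next_lesson(term, student):
    """Return (class_num, lesson_num, lesson_id) or None if the term is done."""
    return conn.execute(NEXT_LESSON_SQL,
                        {"term": term, "student": student}).fetchone()

print(next_lesson(1, 7))
print(next_lesson(3, 7))
```

Binding :term and :student rather than splicing PHP variables into the SQL text also avoids injection problems; PDO and mysqli offer the same facility through prepared statements.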

I am not sure what you want the end result to look like, and why you have the LIMIT $term in your query, but if you want to get all the classes and the next lesson (if available) for the user you can use this:
SELECT c.*, l.*
FROM hl_classes c JOIN (
SELECT l.class_id, MIN(l.id) NextLessonID
FROM hl_classes_lessons l LEFT JOIN (
-- hl_classes_answers has no class_id, so look it up through the lesson
SELECT sl.class_id, MAX(sca.lesson_id) MaxID
FROM hl_classes_answers sca
JOIN hl_classes_lessons sl ON sl.id = sca.lesson_id
WHERE sca.student = $USER_ID
GROUP BY sl.class_id
) cm ON l.class_id = cm.class_id
-- filter in WHERE, not ON: a LEFT JOIN keeps unmatched lessons regardless of the ON condition
WHERE cm.class_id IS NULL OR l.id > cm.MaxID
GROUP BY l.class_id
) nid ON c.id = nid.class_id
JOIN hl_classes_lessons l ON c.id = l.class_id AND l.id = nid.NextLessonID
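The idea behind this query (find the highest answered lesson id per class, then take the smallest lesson id above it) can be tried out with the sketch below. It runs on SQLite through Python's sqlite3 rather than MySQL, and it makes two assumptions: lesson ids increase in lesson order, and the class of an answer is found through hl_classes_lessons, since hl_classes_answers has no class_id column. The sample rows and the helper name next_lessons are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hl_classes (id INTEGER PRIMARY KEY, class TEXT);
CREATE TABLE hl_classes_lessons (id INTEGER PRIMARY KEY, class_id INT, lessonTitle TEXT);
CREATE TABLE hl_classes_answers (id INTEGER PRIMARY KEY, lesson_id INT, student INT);

-- Hypothetical sample data.
INSERT INTO hl_classes VALUES (1, 'Class A'), (2, 'Class B');
INSERT INTO hl_classes_lessons VALUES
    (1, 1, 'A1'), (2, 1, 'A2'), (3, 1, 'A3'), (4, 2, 'B1'), (5, 2, 'B2');
-- Student 7 has finished the first two lessons of class 1.
INSERT INTO hl_classes_answers (lesson_id, student) VALUES (1, 7), (2, 7);
""")

NEXT_PER_CLASS_SQL = """
SELECT c.id, c.class, l.id, l.lessonTitle
FROM hl_classes c
JOIN (
    SELECT l.class_id, MIN(l.id) AS NextLessonID
    FROM hl_classes_lessons l
    LEFT JOIN (
        -- highest answered lesson per class for this student
        SELECT sl.class_id, MAX(sca.lesson_id) AS MaxID
        FROM hl_classes_answers sca
        JOIN hl_classes_lessons sl ON sl.id = sca.lesson_id
        WHERE sca.student = :student
        GROUP BY sl.class_id
    ) cm ON cm.class_id = l.class_id
    -- keep lessons of untouched classes, or lessons after the last answered one
    WHERE cm.class_id IS NULL OR l.id > cm.MaxID
    GROUP BY l.class_id
) nid ON nid.class_id = c.id
JOIN hl_classes_lessons l ON l.id = nid.NextLessonID
ORDER BY c.id
"""

def next_lessons(student):
    """Return one (class_id, class, lesson_id, lessonTitle) row per unfinished class."""
    return conn.execute(NEXT_PER_CLASS_SQL, {"student": student}).fetchall()

for row in next_lessons(7):
    print(row)
```

A class in which the student has answered every lesson simply drops out of the result, because no lesson id above MaxID remains.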

Related

SQL Recursive Query | multiple Tables Foreign Keys

Scenario
I have a few tables, each of which represents an entity of a unique type. For example, let's go with:
School, Subject, Class, Teacher. Listed in order as Parent -> Child
Schema
Each table has:
ID: UUID
Name: CHAR VARYING
{parent}_id: UUID<-- example, class would have Subject_id, or Teacher would have Class_id.
The {parent}_id is the foreign id for each table.
Problem
I want to make a query that lists all the teachers of a given school. In order to do this in this Schema, I need to first query Subject by School_id, then Class by subject_id and then finally teacher by class_id.
A recursive function makes sense to me, but all the tutorials I find do this within a single table and by ids which don't change with each recursion. In my example, each recursion would need to search for a different ID.
Question
How do you go about doing this? I could make an array of the ids and make an index, increase index and use that to access the id in the array. This however seems like a common query so I believe there might be a more elegant solution.
Note: I am using PostgreSQL
Edit for Comment
I am using PostgreSQL DB and PGAdmin
Why would UUID not work? It has worked up to this point with no problems; even works with cascading delete using foreign keys.
I can show the actual schema. However, here is a fictitious layout. Quite straightforward, I hope.
School: ID, Name
Subject: ID, Name, School_ID
Class: ID, Name, Subject_ID
Teacher: ID, Name, Class_ID
Expected output
Teacher_ID, Teacher_Name, Class_Name, Subject_Name, School_Name
Something like?:
select
teacher.id as Teacher_ID, teacher.name as Teacher_Name, class.name as Class_Name, subject.name as Subject_Name, school.name as School_Name
from
school
join
subject
on
school.id = subject.school_id
join
class
on
class.subject_id = subject.id
join
teacher
on
teacher.class_id = class.id
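No recursion is needed here, because every level lives in its own table and the chain has a fixed length; plain joins walk it. The sketch below demonstrates this on SQLite via Python's sqlite3 (not PostgreSQL), storing UUIDs as text. The sample names, the add helper and the teachers_of function are all hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE school  (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE subject (id TEXT PRIMARY KEY, name TEXT, school_id  TEXT REFERENCES school(id));
CREATE TABLE class   (id TEXT PRIMARY KEY, name TEXT, subject_id TEXT REFERENCES subject(id));
CREATE TABLE teacher (id TEXT PRIMARY KEY, name TEXT, class_id   TEXT REFERENCES class(id));
""")

def add(table, name, parent_id=None):
    """Insert one row with a fresh UUID key and return that key."""
    new_id = str(uuid.uuid4())
    if parent_id is None:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (new_id, name))
    else:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (new_id, name, parent_id))
    return new_id

# Hypothetical sample data: one subject, class and teacher per school.
north = add("school", "North")
south = add("school", "South")
add("teacher", "Ann", add("class", "Algebra", add("subject", "Math", north)))
add("teacher", "Bob", add("class", "Drawing", add("subject", "Art", south)))

TEACHERS_SQL = """
SELECT teacher.name, class.name, subject.name, school.name
FROM school
JOIN subject ON subject.school_id = school.id
JOIN class   ON class.subject_id  = subject.id
JOIN teacher ON teacher.class_id  = class.id
WHERE school.id = ?
ORDER BY teacher.name
"""

def teachers_of(school_id):
    """List (teacher, class, subject, school) names for one school."""
    return conn.execute(TEACHERS_SQL, (school_id,)).fetchall()

print(teachers_of(north))
```

The WHERE school.id = ? filter is what restricts the result to the given school; without it the query lists the teachers of every school.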

SQL question: how to find rows that share all of the same rows in a composite table?

I'm working on my SQL project using the Oracle database for class, and I'm asked a question that I see far too often.
You have three tables:
STUDENT: SNO, SNAME
CLASS: CNO, CNAME
ATTENDANCE: SNO, CNO, Grade
The question I keep finding is of a similar type: Find the names of the students that attend in all of the classes that "John" (or anyone else) attends.
John attends three classes, so I have to find the students that also attend those three classes (could be more, but those three must be there). However, I won't always know how many classes John (or whoever) attends, so it can't be hardcoded like that.
SELECT jclass.CNO
FROM attendance jclass
INNER JOIN student on jclass.SNO = student.SNO
WHERE student.SNAME = 'John';
This gets me the classes that John attends. I tried to add the identifier for the other students:
SELECT student.SNAME
FROM student
INNER JOIN attendance on student.SNO = attendance.SNO
INNER JOIN class on attendance.CNO = class.CNO
WHERE student.SNAME <> 'John'
AND class.CNO IN (SELECT jclass.CNO
FROM attendance jclass
INNER JOIN student on jclass.SNO = student.SNO
WHERE student.SNAME = 'John');
However, this only gets me the students that appear in at least one of John's classes, rather than all of them. I can see why it's doing this, but I'm not sure how to fix it. It's the one big struggle I'm having with SQL.
Here is one way - assuming SNO is primary key in the first table, CNO is primary key in the second table, and (SNO, CNO) is (composite) primary key in the third table, and that the input student is given by a unique identifier (first name is distinctly NOT a unique identifier, so the problem stated in terms of giving "John" as the input makes no sense). Here I assume the "special" student is identified by SNO = 1001; you can make 1001 into a variable, or change it to a subquery that selects a (unique!!) SNO based on some other inputs.
I didn't try to make the query as efficient as possible, or use features you most likely haven't seen in your class. Rather, I tried to make it as elementary and as readable as possible.
select sno
from attendance
where cno in (select cno from attendance where sno = 1001)
group by sno
having count(*) = (select count(*) from attendance where sno = 1001)
;
The strategy is simple: the subquery in the in condition finds the classes attended by the "special" student, then from the attendance table we select only rows for those classes. Group by student, and count. Keep only the students for whom the count is equal to the total count for the "special" student. Note the last condition is about groups, not about input rows, so it belongs in the having clause.
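The count-and-compare strategy can be checked on a small data set. The sketch below uses SQLite through Python's sqlite3 rather than Oracle; the sample rows are invented, and it adds two things the query above does not have: a join to student to return names, and a condition that leaves the "special" student out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sno INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE attendance (sno INT, cno INT, PRIMARY KEY (sno, cno));

-- Hypothetical sample data.
INSERT INTO student VALUES (1001, 'John'), (1002, 'Mary'), (1003, 'Omar'), (1004, 'Lena');
INSERT INTO attendance VALUES
    (1001, 1), (1001, 2), (1001, 3),             -- the "special" student
    (1002, 1), (1002, 2), (1002, 3), (1002, 4),  -- covers all three, plus one more
    (1003, 1), (1003, 2),                        -- covers only two
    (1004, 4);                                   -- covers none
""")

COVERING_SQL = """
SELECT s.sno, s.sname
FROM attendance a
JOIN student s ON s.sno = a.sno
WHERE a.cno IN (SELECT cno FROM attendance WHERE sno = :sno)
  AND a.sno <> :sno
GROUP BY s.sno, s.sname
HAVING COUNT(*) = (SELECT COUNT(*) FROM attendance WHERE sno = :sno)
ORDER BY s.sno
"""

def students_covering(sno):
    """Students who attend every class that student `sno` attends."""
    return conn.execute(COVERING_SQL, {"sno": sno}).fetchall()

print(students_covering(1001))
```

The comparison of counts is only valid because (sno, cno) is unique in attendance; with duplicate rows, COUNT(DISTINCT cno) would be needed on both sides.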

SQL Query to return a table of specific matching values based on a criteria

I have 3 tables in PostgreSQL database:
person (id, first_name, last_name, age)
interest (id, title, person_id REFERENCES person)
location (id, city, state text NOT NULL, country, person_id REFERENCES person)
city can be null, but state and country cannot.
A person can have many interests but only one location. My challenge is to return a table of people who share the same interest and location.
All ID's are serialized and thus created automatically.
Let's say I have 4 people living in "TX". They each have two interests apiece, BUT only persons 1 and 3 share a similar interest, let's say "Guns" (because it's Texas after all). I need to select all people from the person table where the person's interest title (because the id is auto-generated, two "Guns" interests would result in two different ID keys) equals that of another person's interest title AND the city or state is also equal.
I was looking at the answer to this question here Select Rows with matching columns from SQL Server and I feel like the logic is sort of similar to my question, the difference is he has two tables, to join together where I have three.
return a table of people who share the same interest and location.
I'll interpret this as "all rows from table person where another row exists that shares at least one matching row in interest and a matching row in location, in no particular order."
A simple solution with a window function in a subquery:
SELECT p.*
FROM (
SELECT person_id AS id, i.title, l.city, l.state, l.country
, count(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
FROM interest i
JOIN location l USING (person_id)
) x
JOIN person p USING (id)
WHERE x.ct > 1;
This treats NULL values as "equal". (You did not specify clearly.)
Depending on undisclosed cardinalities, there may be faster query styles. (Like reducing to duplicative interests and / or locations first.)
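Here is a small sketch of the window-function approach, run on SQLite through Python's sqlite3 instead of PostgreSQL (this assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived). The sample people are made up. Persons 1 and 3 share "Guns" and both have a NULL city, which illustrates that PARTITION BY puts NULLs in the same partition. DISTINCT is added here so a person matching on several interests is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE interest (id INTEGER PRIMARY KEY, title TEXT, person_id INT REFERENCES person);
CREATE TABLE location (id INTEGER PRIMARY KEY, city TEXT, state TEXT NOT NULL,
                       country TEXT NOT NULL, person_id INT UNIQUE REFERENCES person);

-- Hypothetical sample data.
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO location (city, state, country, person_id) VALUES
    (NULL, 'TX', 'US', 1), ('Austin', 'TX', 'US', 2),
    (NULL, 'TX', 'US', 3), ('Austin', 'TX', 'US', 4);
INSERT INTO interest (title, person_id) VALUES
    ('Guns', 1), ('Chess', 1),      -- Ann
    ('Golf', 2), ('Chess', 2),      -- Bob: shares Chess with Ann, but not the city
    ('Guns', 3), ('Cooking', 3),    -- Cid: shares Guns and location with Ann
    ('Hiking', 4), ('Knitting', 4); -- Dee: shares a city with Bob, but no interest
""")

SHARED_SQL = """
SELECT DISTINCT p.id, p.first_name
FROM (
    SELECT i.person_id AS id,
           count(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
    FROM interest i
    JOIN location l ON l.person_id = i.person_id
) x
JOIN person p ON p.id = x.id
WHERE x.ct > 1
ORDER BY p.id
"""

shared = conn.execute(SHARED_SQL).fetchall()
print(shared)
```

Note that the count is per row, not per person: if one person could list the same interest title twice, the partition would need to count distinct person ids instead.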
Asides 1:
It's almost always better to have a column birthday (or year_of_birth) than age, which starts to bit-rot immediately.
Asides 2:
A person can have [...] only one location.
You might at least add a UNIQUE constraint on location.person_id to enforce that. (If you cannot make it the PK or just append location columns to the person table.)

SQL query to select records that have one value in a column but missing another value

I have a database that has two tables. One holds the name of the classes a student can take (class_id, class_desc) and the other has the students information (id, fname, lname, class_id). I join the tables to get a roster of who is taking what class by joining on the class_id. How do I go about getting a roster of students that are taking class 'cl_2055' but not taking class 'cl_6910'?
You really need to change your schema if you can. You should have three tables: Student, Class and StudentClass (StudentClass contains a pointer to each of the other tables).
But if you insist on using the schema you have...
SELECT
*
FROM
students
WHERE
class_id = 'cl_2055'
AND id NOT IN (SELECT id FROM students where class_id = 'cl_6910')
This assumes the id is not unique, and you are using ID to represent students. If you are using ID to represent records, then you will need the second approach:
SELECT
*
FROM
students students_in_2055
WHERE
class_id = 'cl_2055'
AND NOT EXISTS (
SELECT 1
FROM students students_in_6910
WHERE students_in_6910.class_id = 'cl_6910'
AND students_in_2055.fname = students_in_6910.fname
AND students_in_2055.lname = students_in_6910.lname
)
I like to use aggregation for this purpose
select si.fname, si.lname
from studentinformation si
where si.class_id in ('cl_2055', 'cl_6910')
group by si.fname, si.lname
having min(si.class_id) = 'cl_2055' and max(si.class_id) = 'cl_2055';
First, this assumes that the class identifiers are the ids and not the description (seems logical to me). If they are the descriptions, then you need to join in the class information.
How does this work? The where clause filters down to rows for either of the two classes. The group by aggregates to a single row per student. And the having keeps students who are only in "cl_2055".
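The NOT IN and aggregation approaches can be compared side by side on the single-table schema. This is a sketch on SQLite via Python's sqlite3, assuming the table is called students and that id identifies the student (so it repeats, once per class taken). The rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INT, fname TEXT, lname TEXT, class_id TEXT);

-- Hypothetical sample data: one row per (student, class).
INSERT INTO students VALUES
    (1, 'Ann', 'Lee', 'cl_2055'), (1, 'Ann', 'Lee', 'cl_6910'),  -- takes both
    (2, 'Bob', 'Ray', 'cl_2055'), (2, 'Bob', 'Ray', 'cl_1000'),  -- 2055 but not 6910
    (3, 'Cy',  'Orr', 'cl_6910');                                -- 6910 only
""")

NOT_IN_SQL = """
SELECT id, fname, lname
FROM students
WHERE class_id = 'cl_2055'
  AND id NOT IN (SELECT id FROM students WHERE class_id = 'cl_6910')
ORDER BY id
"""

AGGREGATE_SQL = """
SELECT id, fname, lname
FROM students
WHERE class_id IN ('cl_2055', 'cl_6910')
GROUP BY id, fname, lname
HAVING MIN(class_id) = 'cl_2055' AND MAX(class_id) = 'cl_2055'
ORDER BY id
"""

via_not_in = conn.execute(NOT_IN_SQL).fetchall()
via_aggregate = conn.execute(AGGREGATE_SQL).fetchall()
print(via_not_in, via_aggregate)
```

Both return the same roster here. The aggregation version relies on the WHERE clause leaving only the two class ids in play, so MIN and MAX agree exactly when the student has rows for cl_2055 alone.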

How to refactor complicated SQL query which is broken

Here is the simplified model of the domain
In a nutshell, a unit grants documents to a customer. There are two types of units: main units and their child units. Both belong to the same province, and multiple cities may belong to one province. A document has numerous events (processing history). A customer belongs to one city and province.
I have to write query, which returns random set of documents, given a target main unit code. Here is the criteria:
Return 10 documents where the newest event_code = 10
Each document must belong to a different customer living in any city of the unit's region (prefer different cities)
Return the Customers newest Document which meets the criteria
There must be both document types present in the result
Result (customers chosen) should be random with each query
But...
If there's not enough customers, try to use multiple documents of the same customer as a last resort
If there aren't enough documents either, return as much as possible
If there's not a single instance of another document type, then return all the same
There may be millions of rows, and the query must be as fast as possible, as it is executed frequently.
I'm not sure how to structure this kind of complex query in a sane manner. I'm using Oracle and PL/SQL. Here is something I tried, but it isn't working as expected (returns wrong data). How should I refactor this query and get the random result, and also honor all those borderline rules? I'm also worried about the performance regarding the joins and wheres.
CURSOR c_documents IS
WITH documents_cte AS (
SELECT d.document_id AS document_id, d.create_dt AS create_dt,
c.customer_id
FROM documents d
JOIN customers c ON (c.customer_id = d.customer_id AND
c.province_id = (SELECT region_id FROM unit WHERE unit_code = 1234))
WHERE exists (
SELECT 1
FROM event
where document_id = d.document_id AND
event_code = 10
AND create_dt = (
SELECT MAX(create_dt)
FROM event
WHERE document_id = d.document_id))
)
SELECT * FROM documents_cte d
WHERE create_dt = (SELECT MAX(create_dt)
from documents_cte
WHERE customer_id = d.customer_id)
How to correctly make this query with efficiency, randomness in mind? I'm not asking for exact solution, but guidelines at least.
I'd avoid hierarchic tables whenever possible. In your case you are using a hierarchic table to allow for unlimited depth, but in the end you store just two levels: provinces and their cities. That would be better as two tables: one for provinces and one for cities. Not a big deal, but it would make your data model simpler and easier to query.
Below I am starting with a WITH clause to get a city table, as such doesn't exist. Then I go step by step: get the customers belonging to the unit, then get their documents and rank them. At last I select the ranked documents and randomly take 10 of the best ranked ones.
with cities as
(
select
c.region_id as city_id,
p.region_id as province_id
from region c
join region p on p.region_id = c.parent_region_id
)
, unit_customers as
(
select customer_id
from customer
where city_id in
(
select city_id
from cities
where
(
select region_id
from unit
where unit_code = 1234
) in (city_id, province_id)
)
)
, ranked_documents as
(
select
document.*,
row_number() over (partition by customer_id order by create_dt desc) as rn
from document
where customer_id in -- customers belonging to the unit
(
select customer_id
from unit_customers
)
and document_id in -- documents with latest event code = 10
(
select document_id
from event
group by document_id
having max(event_code) keep (dense_rank last order by create_dt) = 10
)
)
select *
from ranked_documents
order by rn, dbms_random.value
fetch first 10 rows only;
This doesn't take into account to get both document types, as this contradicts the rule to get the latest documents per customer.
FETCH FIRST is available as of Oracle 12c. In earlier versions you would use one more subquery and another ROW_NUMBER instead.
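The rank-then-sample step can be tried on a toy data set. The sketch below is not Oracle: it runs on SQLite (3.25 or newer) through Python's sqlite3, so the KEEP (DENSE_RANK LAST ...) test is replaced by a correlated subquery that reads the newest event, dbms_random.value by RANDOM(), and FETCH FIRST by LIMIT. The region and unit filtering is left out, and the rows and the pick helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (document_id INTEGER PRIMARY KEY, customer_id INT, create_dt TEXT);
CREATE TABLE event (document_id INT, event_code INT, create_dt TEXT);

-- Hypothetical sample data.
INSERT INTO document VALUES
    (1, 1, '2020-01-01'), (2, 1, '2020-02-01'),  -- customer 1 has two documents
    (3, 2, '2020-01-15'),                        -- customer 2
    (4, 3, '2020-01-20');                        -- customer 3
INSERT INTO event VALUES
    (1, 10, '2020-01-02'),
    (2, 5,  '2020-02-02'), (2, 10, '2020-02-03'),
    (3, 10, '2020-01-16'),
    (4, 10, '2020-01-21'), (4, 20, '2020-01-22');  -- newest event is not 10
""")

PICK_SQL = """
WITH eligible AS (
    -- documents whose newest event has code 10
    SELECT d.*
    FROM document d
    WHERE (SELECT e.event_code
           FROM event e
           WHERE e.document_id = d.document_id
           ORDER BY e.create_dt DESC
           LIMIT 1) = 10
),
ranked AS (
    -- rn = 1 is each customer's newest eligible document
    SELECT document_id, customer_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY create_dt DESC) AS rn
    FROM eligible
)
SELECT document_id, customer_id, rn
FROM ranked
ORDER BY rn, RANDOM()  -- best rank first, random order within a rank
LIMIT :n
"""

def pick(n):
    """Return up to n (document_id, customer_id, rn) rows."""
    return conn.execute(PICK_SQL, {"n": n}).fetchall()

print(pick(2))
```

Ordering by rn first means a customer's second document is only reached once every customer's newest document has been taken, which is the "last resort" rule from the question.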
As to speed, I'd recommend these indexes for the query:
create index idx_r1 on region(region_id); -- already exists for region_id = primary key
create index idx_r2 on region(parent_region_id, region_id);
create index idx_u1 on unit(unit_code, region_id);
create index idx_c1 on customer(city_id, customer_id);
create index idx_e1 on event(document_id, create_dt, event_code);
create index idx_d1 on document(document_id, customer_id, create_dt);
create index idx_d2 on document(customer_id, document_id, create_dt);
One of the last two will be used, the other not. Check which with EXPLAIN PLAN and drop the unused one.