SQL - Tracking student exam records as they move between schools - sql

I'd like to pick some of your glorious minds for an optimal solution to my dilemma.
Scenario:
Schools have children and children take tests.
The tests point to the child, not the school.
If the child moves school, the test records are taken to the new school and the previous school has no record of the test being done as they are linked to the child.
Obviously, this isn't ideal and is the result of the database not being designed with this in mind. What would the correct course of action be; I’ve currently identified the 3 possibilities listed below which would solve the current problem. However, i cannot be sure which is best for the issue at hand - and if any better solutions exist.
Have each test store the school & student within the test records (requiring current records to be updated & increasing the size of the database)
Create a new child record, duplicating the existing data for the new school with a new ID so the test remains linked to the previous school (complicating the ability to identify previous test scores)
Separately keep track of moves to other schools, then use this additional table to identify current and previous using the timestamps (increased complexity and computational requirements)
EDIT:
So i tried to use a basic example, but requests for the task at hand have been requested.
Here's the DB Schema for the tables (simplified for problem, note: Postnatal is not important):
Patients: ID, MidwifeID, TeamID
Midwives: ID
Groups: ID
GroupsMidwives: MidwifeID, GroupsID
PatientObservations: ID, MidwifeID, PatientID
Using a query as follows:
SELECT Some Information
from Postnatals
JOIN Midwives on Postnatals.MidwifeID = Midwives.ID
JOIN Patients on Patients.PatientID = Postnatals.PatientID
JOIN GroupsMidwives on GroupsMidwives.MidwifeID = Midwives.ID
JOIN Groups on Groups.ID = GroupsMidwives.GroupID
JOIN PatientObservations on PatientObservations.PatientID =
Postnatals.PatientID
WHERE groups.Name = ?
*some extra checks*
GROUP BY Midwives.Firstname, Midwives.Surname, Midwives.ID
However, in the event that a midwife is moved to a different team, the data associated with the previous team is now owned by the newly assigned team. As described in the example detailed previously.
Thus a modification (which modification is yet to be realised) is required to make the data submitted - prior to a team change - assigned to the previous team, as of current, because of the way the records are owned by the midwife, this is not possible.

You should below suggestion as per you concern.
Step 1 ) You need to create School Master Table
ID | School | IsActive
1 | ABC | 1
2 | XYZ | 1
Step 2 ) You need to create Children Master having school id as foreign key
ID | School | Children Name| IsActive
1 | 2 | Mak | 1
2 | 2 | Jak | 1
Step 3 ) You need to create test table having children id as foreign key
ID | Children_id | Test Name | IsActive
1 | 2 | Math | 1
2 | 2 | Eng | 1
Now whenever child moves school then make child record inactive and create another active record with new school. This will help you to bifurcate the old test and new test.
do let me know in case morehelp required

Related

How to make this very complicated query from three connected models with Django ORM?

Good day, everyone. Hope you're doing well. I'm a Django newbie, trying to learn the basics of RESTful development while helping in a small app project. We currently want some of our models to update accordingly based on the data we submit to them, by using the Django ORM and the fields that some of them share wih OneToMany relationsips. Currently, there's a really difficult query that I must do for one of my fields to update automatically given that filter. First, let me explain the models. This are not real, but a doppleganger that should work the same:
First we have a Report model that is a teacher's report of a student:
class Report(models.Model):
status = models.CharField(max_length=32, choices=Statuses.choices, default=Statuses.created,)
student = models.ForeignKey(Student, on_delete=models.CASCADE,)
headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
# Various dates
results_date = models.DateTimeField(null=True, blank=True)
report_created = models.DateTimeField(null=True, blank=True)
.
#Other fields that don't matter
Here we have two related models, which are student and headroom_teacher. It's not necessary to show their models, but their relationship with the next two models is really important. We also have an Exams model:
class Exams(models.Model):
student = models.ForeignKey(student, on_delete=models.CASCADE,)
headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
# Various dates
results_date = models.DateTimeField(null=True, blank=True)
initial_exam_date = models.DateTimeField(null=True, blank=True)
.
#Other fields that don't matter
As you can see, the purpose of this app is akin to reporting on the performance of students after completing some exams, and every Report is made by a teacher for specific student on how he did on those exams. Finally we have a final model called StudentMood that aims to show how should an student be feeling depending on the status of their exams:
class StudentMood(models.Model):
report = models.ForeignKey(Report, on_delete=models.CASCADE,)
student_status = models.CharField(
max_length=32, choices=Status.choices,
default=None, null=True, blank=False)
headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
And with these three models is that we arrive to the crux of the issue. One of our possible student_status options is called Anxious for results, which we believe a student will feel during the time when he already has done an exam and is waiting for the results.
I want to automatically set my student_status to that, using a custom manager that takes into account the date that the report has been done or the day the data has been entered. I believe this can be done by making a query taking into account initial_exam_date.
I already have my custom manager set up, and the only thing missing is this query. I have no choice but to do it with Django's ORM. However, I've come up with an approximate raw SQL query, that I'm not sure if it's ok:
SELECT student_mood.id AS student_mood_id FROM
school_student_mood LEFT JOIN
school_reports report
ON student_mood.report_id = report.id AND student_mood.headroom_teacher_id = report.headroom_teacher_id
JOIN school_exams exams
ON report.headroom_teacher_id = exams.headroom_teacher_id
AND report.student_id = exams.student_id
AND exams.results_date > date where the student_mood or report data is entered, I guess
And that's what I've come to ask for help. Could someone shed some light into how to transfer this into a single query?
Without having an environment setup or really knowing exactly what you want out of the data. This is a good start.
Generally speaking, the Django ORM is not great for these types of queries, and trying to use select_related or prefetches results in really complex and inefficient queries.
I've found the best way to achieve these types of queries in Django is to break each piece of your puzzle down into a query that returns a "list" of ids that you can then use in a subquery.
Then you keep working down until you have your final output
from django.db.models import Subquery
# Grab the students of any exams where the result_date is greater than a specific date.
student_exam_subquery = Exam.objects.filter(
results_date__gt=timezone.now()
).values_list('student__id', flat=True)
# Grab all the student moods related to reports that relate to our "exams" where the student is anxious
student_mood_subquery = StudentMood.objects.filter(
student_status='anxious',
reports__student__in=Subquery(student_exam_subquery)
).values_list('report__id', flat=True)
# Get the final list of students
Student.objects.values_list('id', flat=True).filter(
reports__id__in=Subquery(student_mood_subquery)
)
Now I doubt this will work out of the box, but it's really to give you an understanding of how you might go about solving this in a way that is readable to future devs and the most efficient (db wise).
So, the issue I was running into, is that the school has exam cycles each period, and it was difficult to retrieve only the students' report for this cycle. Let's assume we have the following database:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 1 | 1 | Done | 01/01/2020 | 02/01/2020 |
| Student 2 | 2 | 2 | Done | 01/01/2020 | 02/01/2020 |
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
And Obviously, I wanted to limit my query to just this cycle, like this:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
Now, your answer, trent, was really useful, but I'm still having issues retrieving in the shape of the above:
qs_exams = Exams.objects.filter(initial_exam_date__gt=now()).values_list('student__id', flat=True)
qs_report = Report.objects.filter(student__id__in=qs_exams).values_list('id', flat=True)
qs_mood = StudentMood.objects.select_related('report') \
.filter(report__id__in=qs_report).order_by('report__student_id', '-created').distinct()
But this query is still giving me all the StudentMoods throughout the school year. Sooooo, any ideas?

Keeping an updated tally of changing records

I have a list of students and their subjects:
id | student | subject
---|---------|--------
1 | adam | math
2 | bob | english
3 | charlie | math
4 | dan | english
5 | erik | math
And I create a tally from the above list aggregating how many students are there in each subject:
id | subject | students
---|---------|--------
1 | math | 3
2 | english | 2
The student list will keep on expanding and this aggregation will be done at regular intervals.
The reason I'm keeping the Tally in a separate table in the first place is because the original table is supposed to be massive (this is just a simplification of my original problem) and so querying the original table for a current tally on-the-fly is unfeasible to do quickly enough.
Anyways, so the aggregating is pretty straight forward as long as the students don't change their subject.
But now I want to add a feature to allow students to change their subject.
My previous approach was this: while updating the Tally, I keep a counter variable up to which row of students I've already accounted for. Next time I only consider records added after that row.
Also the reason why I keep a counter is because the Students table is massive, and I don't want to scan the whole table every time as it won't scale well.
It works fine if all students are unique and no one changes their subject.
But it breaks apart now because I can no longer account for rows that come before the counter and were updated.
My second approach was using a updated_at field (instead of counter) and keep track of newly modified rows that way.
But still I don't know how to actually update the Tally accurately.
Say, Erik changes his subject from "math" to "english" in the above scenario. When I run the script to update the Tally, while it does finds the newly updated row but it simply says {"erik": "english"}. How would I know what it changed from? I need to know this to correctly decrement "math" in the Tally table while incrementing "english".
Is there a way this can be solved?
To summarize my question again, I want to find a way to be able to update the Tally table accurately (a process that runs at regular interval) with the updated/modified rows in the Student table.
I'm using NodeJS and PostgreSQL if it matters.
Why don't you do it when student add subject, remove subject, or change subject.
When student add new subject: Just increase UPDATE tbl_tally SET student = student + 1 WHERE subject = :subject;
When student remove subject: Just decrease UPDATE tbl_tally SET student = student - 1 WHERE subject = :subject;
When student change subject: Just increase new subject by one and decrease old subject by one
UPDATE tbl_tally SET student = student - 1 WHERE subject = :old_subject;
UPDATE tbl_tally SET student = student + 1 WHERE subject = :new_subject;
I am not familiar with PostgreSQL, but in MySQL, you can even do it with trigger. I think PostgreSQL also has trigger.

How to create two JOIN-tables so that I can compare attributes within?

I take a Database course in which we have listings of AirBnBs and need to be able to do some SQL queries in the Relationship-Model we made from the data, but I struggle with one in particular :
I have two tables that we are interested in, Billing and Amenities. The first one have the id and price of listings, the second have id and wifi (let's say, to simplify, that it equals 1 if there is Wifi, 0 otherwise). Both have other attributes that we don't really care about here.
So the query is, "What is the difference in the average price of listings with and without Wifi ?"
My idea was to build to JOIN-tables, one with listings that have wifi, the other without, and compare them easily :
SELECT avg(B.price - A.price) as averagePrice
FROM (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 0
) A, (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 1) B
WHERE A.id = B.id;
Obviously this doesn't work... I am pretty sure that there is a far easier solution to it tho, what do I miss ?
(And by the way, is there a way to compute the absolute between the difference of price ?)
I hope that I was clear enough, thank you for your time !
Edit : As mentionned in the comments, forgot to say that, but both tables have idas their primary key, so that there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
Amenities a
ON b.id = a.id
WHERE a.wifi IN (0, 1);
You can use a - if you want the difference instead of the specific values.
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing
+------------+---------+
| listing_id | price |
+------------+---------+
| 1 | 1500.00 |
| 2 | 1700.00 |
| 3 | 1800.00 |
| 4 | 1900.00 |
+------------+---------+
Amenities
+------------+------+
| listing_id | wifi |
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
+------------+------+
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyways). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (Listings 1 and 2)
Listings without wifi (just 3) average 1800).
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
Amenities.wifi AS has_wifi,
AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
+----------+-----------------------+
| has_wifi | avg_cost |
+----------+-----------------------+
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
+----------+-----------------------+
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
WITH
avg_by_wifi(has_wifi, avg_cost) AS
(
SELECT Amenities.wifi, AVG(Billing.price)
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
)
SELECT
ABS(SUM
(
CASE
WHEN has_wifi = 1 THEN avg_cost
ELSE -1 * avg_cost
END
))
FROM avg_by_wifi
which gives us the expected value of 200.
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.) So you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit tricker, but definitely still doable.

Check if a value exists in the child-parent tree

I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory e.g. a person or a company.
Each user has an UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table is like this:
UserTypeID | UserTypeParentID | Name
1 NULL Person
2 NULL Company
3 2 IT
4 3 Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1 1 Jenny Smith
2 1 Malcolm Brown
3 2 Wall Mart
4 3 Microsoft
5 4 Sage
My SQL (so far) is very simple: (excuse the pseudo-code style)
DECLARE #UserTypeID int
SELECT
*
FROM
dbo.Users u
INNER JOIN
dbo.UserType ut
WHERE
ut.UserTypeID = #UserTypeID
The problem is here is that when people want to search for companies they will enter in '2' as the UserTypeID. But both Microsoft and Sage won't show up because their UserTypeIDs are 3 and 4 respectively. But its the final UserTypeParentID which tells me that they're both Companies.
How could I rewrite the SQL to ask it to return to return records where the UserTypeID = #UserTypeID or where its final UserTypeParentID is also equal to #UserTypeID. Or am I going about this the wrong way?
Schema Change
I would suggest you to break it down this schema a little bit more, to make your queries and life simpler, with this current schema you will end up writing a recursive query every time you want to get simplest data from your Users table, and trust me you dont want to do this to yourself.
I would break down this schema of these tables as follow:
dbo.Users
UserID | UserName
1 | Jenny
2 | Microsoft
3 | Sage
dbo.UserTypes_Type
TypeID | TypeName
1 | Person
2 | IT
3 | Compnay
4 | Accounting Software
dbo.UserTypes
UserID | TypeID
1 | 1
2 | 2
2 | 3
3 | 2
3 | 3
3 | 4
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacently List model which is great for inserts (and deletes) but a bugger for selects because the query has to effectively reconstruct the hierarchy path. By the way an Adjacency List is the model almost everyone goes for on their first attempt.
Everything is a trade off so you should decide what queries will be most common - selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008, there is a native HeirachyID datatype (see this) which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.

Creating new table from data of other tables

I'm very new to SQL and I hope someone can help me with some SQL syntax. I have a database with these tables and fields,
DATA: data_id, person_id, attribute_id, date, value
PERSONS: person_id, parent_id, name
ATTRIBUTES: attribute_id, attribute_type
attribute_type can be "Height" or "Weight"
Question 1
Give a person's "Name", I would like to return a table of "Weight" measurements for each children. Ie: if John has 3 children names Alice, Bob and Carol, then I want a table like this
| date | Alice | Bob | Carol |
I know how to get a long list of children's weights like this:
select d.date,
d.value
from data d,
persons child,
persons parent,
attributes a
where parent.name='John'
and child.parent_id = parent.person_id
and d.attribute_id = a.attribute_id
and a.attribute_type = "Weight';
but I don't know how to create a new table that looks like:
| date | Child 1 name | Child 2 name | ... | Child N name |
Question 2
Also, I would like to select the attributes to be between a certain range.
Question 3
What happens if the dates are not consistent across the children? For example, suppose Alice is 3 years older than Bob, then there's no data for Bob during the first 3 years of Alice's life. How does the database handle this if we request all the data?
1) It might not be so easy. MS SQL Server can PIVOT a table on an axis, but dumping the resultset to an array and sorting there (assuming this is tied to some sort of program) might be the simpler way right now if you're new to SQL.
If you can manage to do it in SQL it still won't be enough info to create a new table, just return the data you'd use to fill it in, so some sort of external manipulation will probably be required. But you can probably just use INSERT INTO [new table] SELECT [...] to fill that new table from your select query, at least.
2) You can join on attributes for each unique attribute:
SELECT [...] FROM data AS d
JOIN persons AS p ON d.person_id = p.person_id
JOIN attributes AS weight ON p.attribute_id = weight.attribute_id
HAVING weight.attribute_type = 'Weight'
JOIN attributes AS height ON p.attribute_id = height.attribute_id
HAVING height.attribute_type = 'Height'
[...]
(The way you're joining in the original query is just shorthand for [INNER] JOIN .. ON, same thing except you'll need the HAVING clause in there)
3) It depends on the type of JOIN you use to match parent/child relationships, and any dates you're filtering on in the WHERE, if I'm reading that right (entirely possible I'm not). I'm not sure quite what you're looking for, or what kind of database you're using, so no good answer. If you're new enough to SQL that you don't know the different kinds of JOINs and what they can do, it's very worthwhile to learn them - they put the R in RDBMS.
when you do a select, you need to specify the exact columns you want. In other words you can't return the Nth child's name. Ie this isn't possible:
1/2/2010 | Child_1_name | Child_2_name | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name | Child_2_name
Each record needs to have the same amount of columns. So you might be able to make a select that does this:
1/2/2010 | Child_1_name
1/2/2010 | Child_2_name
1/2/2010 | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name
1/4/2010 | Child_2_name
And then in a report remap it to how you want it displayed