Help with SQL aggregate functions - sql

I've been learning SQL for about a day now and I've run into a road bump. Please help me with the following questions:
STUDENT (**StudentNumber**, StudentName, TutorialNumber)
TUTORIAL (**TutorialNumber**, Day, Time, Room, TutorInCharge)
ASSESSMENT (**AssessmentNumber**, AssessmentTitle, MarkOutOf)
MARK (**AssessmentNumber**, **StudentNumber**, RawMark)
PK and FK are identified within "**". I need to generate queries that:
1) List of assessment tasks results showing: Assessment Number, Assessment Title, and average Raw Mark. I know how to use the avg function for a single column, but to display something for multiple columns... a little unsure here.
My attempt:
SELECT RawMark, AssessmentNumber, AsessmentTitle
FROM MARK, ASSESSMENT
WHERE RawMark = (SELECT (RawMark) FROM MARK)
AND MARK.AssessmentNumber = ASSESSMENT.AssessmentNumber;
2) Report on tutorial enrollment showing: Tutorial Number, Day, Room, Tutor in Charge and number of students enrolled. Same as the avg function, now for the count function. Would this require 2 queries?
3) List each student's Raw Mark in each of the assessment tasks showing: Assessment Number, Assessment Title, Student Number, Student Name, Raw Mark, Tutor in Charge and Time. Sort on Tutor in Charge, Day and Time.

Here is an example for the first one, just take the logic and see if you can expand it to the other questions. I find that these things can be hard to lear if you can't find any solid examples but once you get the hang of it you'll sort it out pretty quick.
1)
SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT a LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
GROUP BY a.AssessmentNumber, a.AssessmentTitle
OR not using a left join or alias table names
SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT,MARK
WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle

Related

Removing repeating values from sql

I have a question for my homework in class that goes as such:
The Professor wants to review information about questions on quizzes that appear to be difficult.
Create a view named HardQuizzes that contains the quiz number, quiz date, average score, question id, number of students who chose A, number of students who chose B, number of students who chose C, and number of students who chose D for each question on a quiz where the average score is less than 15. Verify that the view has been created correctly. Confirm the change.
And I think I came up with a way to get the answer with this:
CREATE VIEW HardQuizzes
AS
SELECT DISTINCT
QQ.QuizNum,
QQ.QuizDate,
AvgScore,
NumChoseA,
NumChoseB,
NumChoseC,
NumChoseD
FROM
QuizQuestions QQ,
Quizzes Q
WHERE
QQ.QuizNum = Q.QuizNum
AND AvgScore < 15
But when I do this it creates the view but so many of the values repeat and I cant figure out a way to stop them repeating. Is there a way do that?
This is a screenshot of the view when I make it
Since it's an assignment, hoping to just nudge you in the right direction -
What other data is available in the original tables that you can investigate?
Are there multiple classes taking each of these quizzes?
If there are multiple classes, would the intent of the question lead you to believe the records should be combined?
If that's the case this could be accomplished with GROUP BY and aggregation functions over the NumChose columns.

Query complex in Oracle SQL

I have the following tables and their fields
They ask me for a query that seems to me quite complex, I have been going around for two days and trying things, it says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And I got this out
By adding the person.birthdate field instead of leaving 18 records of the 18 people who have a medal, I'm going to many more records.
Apart from that, I still have to draw the average age with SYSDATE and TRUNC that I try in many ways but I do not get it.
I see it very complicated or I'm a bit saturated from so much spinning, I need some help.
Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, see if it helps.
select p.gender,
((sysdate - p.birthday) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
when two dates are subtracted, the result is number of days. Dividing it by 365, you - roughly - get number of years (as each year has 365 days - that's for simplicity, of course, as not all years have that many days (hint: leap years)). The result is usually a decimal number, e.g. 23.912874918724. In order to avoid that, you were told to remove decimals, so - use TRUNC and get 23 as the result
although data model contains 5 tables, you don't have to use all of them in a query. Maybe the best approach is to go step-by-step. The first one would be to simply select all female athletes and calculate their age:
select p.gender,
((sysdate - p.birthday) / 365 age
from person p
where p.gender = 'F'
Note that I've used a table alias - I'd suggest you to use them too, as they make queries easier to read (table names can have really long names which don't help in readability). Also, always use table aliases to avoid confusion (which column belongs to which table)
Once you're satisfied with that result, move on to another table - athlete It is here just as a joining mechanism with the score table that contains ... well, scores. Note that I've used outer join for the score table because not all athletes have won the medal. I presume that this is what the task you've been given says:
... even when there is no data of any specific value for the set of records displayed by the query.
It is suggested that we - as developers - use explicit table joins which let you to see all joins separated from filters (which should be part of the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time - don't skip steps. Move on to another one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.

SQL movie database querying

I'm having trouble solving the following SQL requests:
Give the names of the actors that have acted in more films than 'sara allgood' and who have acted in films that won the 'cannes film festival'. Also, give the filmname.
Get the percentage of movies who won awards out of all movies produced between the years 1970 and 1990.
There are several tables but I'm assuming that only 4 are needed:
'films','remakes','casts', 'awtypes'
'films' attributes: filmid, filmname, year, director, studio, award
'remakes' attributes: filmid, title, year, priorfilm, prioryear
'casts' attributes: filmid, filmname, actor, award(10)
'awtypes' attributes: award(10), org(100), country, colloquial(50), year
It's a bit unclear to me how to match the award to the 'Cannes film festival' in the first query since the award field is only 10 characters meaning it is a reference to the awtypes table but I don't know which field in the awtypes table contains the name of the award and I don't have access to the database at the moment so it's either org or colloquial.
As for the second I don't know how I could compute the percentage but it seems that it should be solved using a union operator for the movies produced between 1970 and 1990 and the films that have won an award (I don't know how to place a condition for having at least one award).
A few hints, I hope they help you!
Give the names of the actors that have acted in more films than 'sara
allgood' and who have acted in films that won the 'cannes film
festival'. Also, give the filmname.
Based on the attributes you're stating, I would say that you can get to the right awtypes attribute via the casts table. They both contain the award(10) column. Given your data, I would expect the org(100) column to contain something on the organization that provides the prizes, so that would be my guess in this case for the cannes film festival content. But you would have to try it out and see what results you get. Unfortunately, as in this case, it is often quite hard to guess the contents of a column based only on column names.
Get the percentage of movies who won awards out of all movies produced
between the years 1970 and 1990.
Based on the info stated in your question, I would go with a guess that the award column in the films table contains a boolean or something that states if the movie won an award or not. You'd have to try this out. If that's the case, you can use a COUNT(*) on all movies between 1970 and 1990 and a COUNT(*) on all movies WHERE award = 1 (or something) to get the total numbers.
You could indeed combine these in a computation query with a UNION. Example that might help you:
SELECT SUM(cnt1) / SUM(cnt2) ... do the right computation here ...
FROM ( SELECT COUNT(*) AS cnt1
,0 AS cnt2
FROM table1
UNION ALL
SELECT 0 AS cnt1
,COUNT(*) AS cnt2
FROM table2) AS sub

Using distinct and then an aggregate function in Postgresql?

This is a pretty basic problem and for whatever reason I can't find a reasonable solution. I'll do my best to explain.
Say you have an event ticket (section, row, seat #). Each ticket belongs to an attendee. Multiple tickets can belong to the same attendee. Each attendee has a worth (ex: Attendee #1 is worth $10,000). That said, here's what I want to do:
1. Group the tickets by their section
2. Get number of tickets (count)
3. Get total worth of the attendees in that section
Here's where I'm having problems: If Attendee #1 is worth $10,000 and is using 4 tickets, sum(attendees.worth) is returning $40,000. Which is not accurate. The worth should be $10,000. Yet when I make the result distinct on the attendee, the count is not accurate. In an ideal world it'd be nice to do something like
select
tickets.section,
count(tickets.*) as count,
sum(DISTINCT ON (attendees.id) attendees.worth) as total_worth
from
tickets
INNER JOIN
attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section
Obviously this query doesn't work. How can I accomplish this same thing in a single query? OR is it even possible? I'd prefer to stay away from sub queries too because this is part of a much larger solution where I would need to do this across multiple tables.
Also, the worth should follow the ticket divided evenly. Ex: $10,000 / 4. Each ticket has an attendee worth of $5,000. So if the tickets are in different sections, they take their prorated worth with them.
Thanks for your help.
You need to aggregate the tickets before the attendees:
select ta.section, sum(ta.numtickets) as count, sum(a.worth) as total_worth
from (select attendee_id, section, count(*) as numtickets
from tickets
group by attendee_id, section
) ta INNER JOIN
attendees a
ON a.id = ta.attendee_id
GROUP BY ta.section
You still have a problem of a single attendee having seats in multiple sections. However, you do not specify how to solve that (apportion the worth? randomly choose one section? attribute it to all sections? canonically choose a section?)
Using jsonb_object_agg:
select
tickets.section,
count(tickets.*) as count,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(attendees.id, attendees.worth))
) as total_worth
from
tickets
INNER JOIN
attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section

minimizing runtime of calculating several ranks (based on grades) on large sql tables, how short can it get (ms access)

I've been stuck with the rather famous problem of ranking students by grade for a couple weeks now, and while I've learned much, I still haven't solved my problem (the ranks are generated, but the process is too slow):
I have a large table (320,000 rows) that contains the student codes (serves as an identifier, instead of their names), the students classroom, the test , the tests date, the subject, the question number and the students grade on that question. This table is the base for everything else that is calculated and its size makes all these calculations very very slow, to the point where I find me almost breaking everything here at work.
First, some intel on the school (very little info, required to understand the problem)
Here at the school we have weekly tests over several subjects. The school is also separated in classrooms with different purposes (one is focused on math, physics and chemistry, another one is focused on biology, and the last one focuses on history, Portuguese and geography). But they all do the same tests every week.
What we want to do is calculate the standard deviation for each question for everyone in the school (not per-classroom) and the average grade per question (also for everyone in the school), and then generate the following ranks (all of them per date):
-Rank per subject per classroom (with "raw" grades), Rank per subject considering the whole school (with "raw" grades) and Rank per subject considering the whole school (using normalized grades, with the standard deviation per question and the average grade per question information)
-The same ranks that were mentioned above, but not per Subject, considering instead all subjects
As you can see, after calculating the average grades and the standard deviations, we still need to calculate the sums of the grades on each question, and rank according to these sums (the actual subject/test grades). I've attacked this problem in a few ways:
1)
Created two tables, one with the grades per student per subject (fields: Students code, Students classroom, Date of test, Subject, Grade, Normalized Grade, Rank in Classroom, Rank in School, Rank in School using normalized grades) and another with the grades per student per test (all subjects taken into account, fields: Students code, Students classroom, Date of test, Grade, Normalized Grade, Rank in Classroom, Rank in School, Rank in School using normalized grades).
The insertion of data in these tables takes about 50 seconds
Then, I tried using SQL to rank, however, I ran into some problems:
-Access has no ROW_NUMBER or RANK functions, and thus I have to use queries with COUNT, like (below is just a simplified version):
SELECT 1+(SELECT Count(*) FROM grades_table_per_subject t2 WHERE
t2.Grade > t1.Grade AND t1.Date=t2.Date AND t1.Subject=t2.Subject) AS [Global Rank],
1+(SELECT Count(*) FROM grades_table_per_subject t3 WHERE t3.Grade > t1.Grade AND
t3.Date=t1.Date AND t3.Subject=t1.Subject AND t3.Classroom=t1.Classroom) AS
[Rank in classroom] FROM grades_table_per_subject;
There still is the rank with the normalized grades in the query above, but I omitted it.
The table grades_table_per_subject has about 45,000 lines and this query takes more than 15 minutes here, even with indexing (tried many different index combinations, even some odd ones when I saw that the ones that should work didn't).
I also tried to ORDER BY Count() DESC the inner selects, but I hit ctrl+break after 7 minutes and no results.
2)
Added the following fields to the tables above:
Rank in Classroom, Rank in School, Rank in School using normalized grades
Then I tried using VBA with DAO and manually update the Rank fields, running the following code (simplified version):
Set rs = CurrentDb.OpenRecordset("SELECT Classroom, Date, Subject, Grade, [Rank in classroom] FROM
grades_table_per_subject ORDER BY Date, Classroom, Subject, Grade DESC;", dbOpenDynaset)
...
...
rs.movefirst
i=1
While Not rs.eof
'Verifies if there was a change on either one of Subject, Classroom, Date and if so:
...
i = 1
...
rs.Edit
rs![Rank in classroom]=i
rs.Update
i = i + 1
rs.movenext
Wend
rs.close
This obviously builds only one of the ranks (in this case per subject per classroom), and it takes alone 3min 10sec.
I verified that it takes so long due to the writes on the table (rs.Edit and rs.Update are the culprits, commenting them makes the whole thing run in only 4 seconds), but I need the ranks written to the table to generate an access report later.
FINALLY:
I could generate all the ranks once and make ways for the users to access all the data very quickly, but the idea is that everything should be calculated on-the-fly. The times we have achieved, however, make this impossible.
Overall, the question to be asked is the following:
-Is there a way to calculate the ranks shown above through an Access Query under 10 seconds, or to use VBA and calculate-insert these ranks to the table in a similar time considering the size of the tables used here?
Also, I would love to see a list of efficient ranking algorithms, so that even if I can't do everything quickly, I can improve it as much as possible.
I could generate all the ranks once and make ways for the users to access all the data very quickly, but the idea is that everything should be calculated on-the-fly.
Why?
Why bother regenerating the same data over and over? It's most likely preferable to generate these statistics when the data changes and just look them up every other time. Redoing work you've already done whenever somebody wants to check something is just silly.
I just saw you say ms access only
so ignore this answer -- or consider moving to a real DB if you want to be able to do this type of power processing.
original answer below
I don't have access to your test data, but how fast does this run?
SELECT RANK () OVER (PARTITION BY [Date],[Subject] ORDER BY Grade) AS [Global Rank],
RANK () OVER (PARTITION BY [Date],[Subject], Classroom ORDER BY Grade) AS [Rank in classroom]
FROM grades_table_per_subject
My guess is you are not going to be able to beat SQL Servers ranking speed in VBA, if this is not fast enough then you need to look in the profiler and see what indexes it suggests you make.