Counting Just One Record Per Pupil Though Multiple Are Matched - sql

I've set up a SQL Fiddle to illustrate the question...
I have a database of pupils (referenced by PupilId) who have assessments (AssessmentLevelId) recorded in various subjects (NCSubjectId) at various periods (PeriodId).
Not every possible period may have an assessment in it.
PupilId | PeriodId | NCSubjectId | AssessmentLevelId
-----------------------------------------------------
100 | 1 | 10 | 1
100 | 3 | 10 | 2
200 | 1 | 10 | 1
300 | 1 | 10 | 1
400 | 1 | 10 | 1
100 | 5 | 10 | 2
300 | 7 | 10 | 2
100 | 15 | 10 | 2
I want to find the number of pupils who have a particular assessment level by a particular PeriodId.
So far I have this:
SELECT PupilId, COUNT(1) FROM NCAssessment
WHERE AssessmentLevelId = 2
AND NCSubjectId=10
AND PeriodId <= 10
GROUP BY PupilId
Which finds the pupil ids, but pupil 100 has a count of 2. I guess I need to wrap this in another query but am stumped. Any suggestions?
This is using Azure SQL.
Thanks.

If I understand your question correctly, I think this might be what you are looking for:
AssessmentLevelId = 2 has been removed from the query, because some Periods may not have an assessment.
SELECT AssessmentLevelID, PeriodID, COUNT(DISTINCT PupilID)
FROM NCAssessment
WHERE NCSubjectId=10 AND
PeriodId <= 10
GROUP BY AssessmentLevelID, PeriodID
If this isn't correct, could you please post a sample result you are expecting. Thanks!

If you want the number of distinct pupils that match, then use count(distinct):
SELECT COUNT(DISTINCT PupilId) as NumMatchingPupils, COUNT(*) as NumMatchingAssessments
FROM NCAssessment
WHERE AssessmentLevelId = 2 AND NCSubjectId = 10 AND PeriodId <= 10;
COUNT(DISTINCT PupilId) counts each pupil once, regardless of the number of matching rows. COUNT(*) or COUNT(1) counts the number of assessments that match.
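To see the difference between the two counts on the sample rows from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Azure SQL; the table layout is copied from the question, everything else is illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the Azure SQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NCAssessment ("
    "PupilId INTEGER, PeriodId INTEGER, NCSubjectId INTEGER, AssessmentLevelId INTEGER)"
)
# Sample rows from the question.
rows = [
    (100, 1, 10, 1), (100, 3, 10, 2), (200, 1, 10, 1), (300, 1, 10, 1),
    (400, 1, 10, 1), (100, 5, 10, 2), (300, 7, 10, 2), (100, 15, 10, 2),
]
conn.executemany("INSERT INTO NCAssessment VALUES (?, ?, ?, ?)", rows)

# One pass over the matching rows gives both numbers.
num_pupils, num_assessments = conn.execute(
    """
    SELECT COUNT(DISTINCT PupilId), COUNT(*)
    FROM NCAssessment
    WHERE AssessmentLevelId = 2 AND NCSubjectId = 10 AND PeriodId <= 10
    """
).fetchone()

print(num_pupils, num_assessments)
```

Pupil 100 matches twice (periods 3 and 5) and pupil 300 once (period 7), so the distinct count is 2 while the row count is 3.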

Related

Sum of two tables using SQL

I'm trying to get the sum of two columns, but it seems to be adding incorrectly. I have a table Tbl_Booths and another table called Tbl_Extras.
In the Tbl_Booths:
BoothId | ExhId | BoothPrice
1 | 1 | 400
2 | 1 | 500
3 | 2 | 400
4 | 3 | 600
So totalBoothPrice for ExhId = 1 is 900
Tbl_Extras:
ExtraId | ExhId | Item | ItemCost
1 | 1 | PowerSupply | 400
2 | 2 | PowerSupply | 400
3 | 1 | Lights | 600
4 | 3 | PowerSupply | 400
5 | 4 | Lights | 400
So totalItemCost for ExhId = 1 is 1000
I need to find a way to get the sum of totalBoothPrice + totalItemCost
The value should of course be 900 + 1000 = 1900
I'm a total beginner to SQL so please have patience :-)
Thank you in advance for any input you can give me, since I'm going mad here!
It is used in a Caspio database system.
You can use union all to combine the two tables and then aggregate:
select exhid, sum(price)
from ((select exhid, boothprice as price
from tbl_booths
) union all
(select exhid, itemcost as price
from tbl_extras
)
) e
group by exhid;
This returns the sum for all exhid values. If you want to filter them, then you can use a where clause in either the outer query or both subqueries.
Here is a db<>fiddle.
Booth totals:
select exhid, sum(boothprice) as total_booth_price
from tbl_booths
group by exhid;
Extra totals:
select exhid, sum(itemcost) as total_item_cost
from tbl_extras
group by exhid;
Joined:
select
exhid,
b.total_booth_price,
e.total_item_cost,
b.total_booth_price + e.total_item_cost as total
from
(
select exhid, sum(boothprice) as total_booth_price
from tbl_booths
group by exhid
) b
join
(
select exhid, sum(itemcost) as total_item_cost
from tbl_extras
group by exhid
) e using (exhid)
order by exhid;
This only shows exhids that have both booths and extras, though. If only one side can be missing, use a left outer join from the side that is always present. If either side can be missing, you'd want a full outer join, which MySQL doesn't support.
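The difference between the two approaches shows up on the sample data, where ExhId 4 has an extra but no booth. The following is a rough sketch using Python's sqlite3 module as a stand-in for the real database (an assumption; Caspio itself is not involved). The table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_booths (boothid INTEGER, exhid INTEGER, boothprice INTEGER);
    CREATE TABLE tbl_extras (extraid INTEGER, exhid INTEGER, item TEXT, itemcost INTEGER);
    INSERT INTO tbl_booths VALUES (1, 1, 400), (2, 1, 500), (3, 2, 400), (4, 3, 600);
    INSERT INTO tbl_extras VALUES
        (1, 1, 'PowerSupply', 400), (2, 2, 'PowerSupply', 400),
        (3, 1, 'Lights', 600), (4, 3, 'PowerSupply', 400), (5, 4, 'Lights', 400);
    """
)

# UNION ALL stacks both price lists, then one GROUP BY sums them per exhid.
union_totals = conn.execute(
    """
    SELECT exhid, SUM(price)
    FROM (SELECT exhid, boothprice AS price FROM tbl_booths
          UNION ALL
          SELECT exhid, itemcost AS price FROM tbl_extras)
    GROUP BY exhid
    ORDER BY exhid
    """
).fetchall()

# Inner join of the two per-exhid totals: exhids missing from either side vanish.
join_totals = conn.execute(
    """
    SELECT b.exhid, b.total_booth_price + e.total_item_cost
    FROM (SELECT exhid, SUM(boothprice) AS total_booth_price
          FROM tbl_booths GROUP BY exhid) b
    JOIN (SELECT exhid, SUM(itemcost) AS total_item_cost
          FROM tbl_extras GROUP BY exhid) e USING (exhid)
    ORDER BY b.exhid
    """
).fetchall()

print(union_totals)
print(join_totals)
```

The UNION ALL version reports ExhId 4 with a total of 400, while the inner join drops it because there is no matching booth row.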

SQL GROUP BY and differences on same field (for MS Access)

Hi, I have the following style of table under MS Access (I didn't make the table and can't change it):
Date_r | Id_Person |Points |Position
25/05/2015 | 120 | 2000 | 1
25/05/2015 | 230 | 1500 | 2
25/05/2015 | 100 | 500 | 3
21/12/2015 | 120 | 2200 | 1
21/12/2015 | 230 | 2000 | 4
21/12/2015 | 100 | 200 | 20
What I am trying to do is to get a list of players (identified by Id_Person) ordered by the points difference between 2 dates.
So for example if I pick date1=25/05/2015 and date2=21/12/2015 I would get:
Id_Person |Points_Diff
230 | 500
120 | 200
100 |-300
I think I need to make something like
SELECT Id_Person , MAX(Points)-MIN(Points)
FROM Table
WHERE date_r = #25/05/2015# or date_r = #21/12/2015#
GROUP BY Id_Person
ORDER BY MAX(Points)-MIN(Points) DESC
But my problem is that I don't really want to order by (MAX(Points)-MIN(Points)) but rather by (points at date2 - points at date1), which can be different because points can decrease over time.
One method is to use FIRST and LAST. However, this can sometimes produce strange results, so I think that conditional aggregation is best:
SELECT Id_Person,
(MAX(IIF(date_r = #21/12/2015#, Points, 0)) -
MAX(IIF(date_r = #25/05/2015#, Points, 0))
) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY (MAX(IIF(date_r = #21/12/2015#, Points, 0)) -
MAX(IIF(date_r = #25/05/2015#, Points, 0))
) DESC;
Because you have two dates, this is more easily written as a single sum that adds the points at the later date and subtracts the points at the earlier one:
SELECT Id_Person,
SUM(IIF(date_r = #21/12/2015#, Points, -Points)) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY SUM(IIF(date_r = #21/12/2015#, Points, -Points)) DESC;
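As a sanity check of the conditional-aggregation idea against the expected output in the question, here is a small sketch. It uses Python's sqlite3 module instead of MS Access (an assumption), so IIF is written as a CASE expression, dates are stored as ISO strings, and the table name scores is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "scores" is a hypothetical table name; the columns follow the question.
conn.execute("CREATE TABLE scores (date_r TEXT, id_person INTEGER, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    ("2015-05-25", 120, 2000), ("2015-05-25", 230, 1500), ("2015-05-25", 100, 500),
    ("2015-12-21", 120, 2200), ("2015-12-21", 230, 2000), ("2015-12-21", 100, 200),
])

date1, date2 = "2015-05-25", "2015-12-21"

# Points at date2 count positively, points at date1 negatively,
# so the sum per person is (points at date2 - points at date1).
diffs = conn.execute(
    """
    SELECT id_person,
           SUM(CASE WHEN date_r = :d2 THEN points ELSE -points END) AS points_diff
    FROM scores
    WHERE date_r IN (:d1, :d2)
    GROUP BY id_person
    ORDER BY points_diff DESC
    """,
    {"d1": date1, "d2": date2},
).fetchall()

print(diffs)
```

On the sample rows this yields 500 for player 230, 200 for player 120 and -300 for player 100, in that order, which matches the table the question asks for.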

Select the difference of two consecutive columns

I have a table car that looks like this:
| mileage | carid |
------------------
| 30 | 1 |
| 50 | 1 |
| 100 | 1 |
| 0 | 2 |
| 70 | 2 |
I would like to get the average difference for each car. So for example for car 1 I would like to get ((50-30)+(100-50))/2 = 35. So I created the following query
SELECT AVG(diff),carid FROM (
SELECT (mileage-
(SELECT Max(mileage) FROM car Where mileage<mileage AND carid=carid GROUP BY carid))
AS diff,carid
FROM car GROUP BY carid)
But this doesn't work as I'm not able to use current row for the other column. And I'm quite clueless on how to actually solve this in a different way.
So how would I be able to obtain the value of the next row somehow?
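The average difference is the maximum minus the minimum, divided by one less than the count. When you add up the consecutive differences, every intermediate value cancels out, so only the largest and smallest mileage remain (you can do the arithmetic to convince yourself this is true).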
Hence:
select carid,
( (max(mileage) - min(mileage)) / nullif(count(*) - 1, 0)) as avg_diff
from car
group by carid;
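The cancellation argument can be checked numerically. The sketch below assumes Python's sqlite3 module as the database and compares the (max - min) / (count - 1) shortcut with an explicit average of consecutive differences. The multiplication by 1.0 is an addition here to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (mileage INTEGER, carid INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [(30, 1), (50, 1), (100, 1), (0, 2), (70, 2)],
)

# Shortcut: (max - min) / (count - 1); NULLIF guards cars with a single row.
sql_avg = dict(conn.execute(
    """
    SELECT carid, (MAX(mileage) - MIN(mileage)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
    FROM car
    GROUP BY carid
    """
).fetchall())

# Cross-check: average the consecutive differences explicitly in Python.
by_car = {}
for mileage, carid in conn.execute("SELECT mileage, carid FROM car ORDER BY carid, mileage"):
    by_car.setdefault(carid, []).append(mileage)
manual_avg = {
    carid: sum(b - a for a, b in zip(ms, ms[1:])) / (len(ms) - 1)
    for carid, ms in by_car.items()
    if len(ms) > 1
}

print(sql_avg, manual_avg)
```

For car 1 both computations give ((50-30)+(100-50))/2 = 35, and for car 2 both give 70.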

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query is about associating a spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show they are attending. The query looks like this:
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: this answer may be of limited use, as it was written in the context of SQL Server (the tag was present at the time).
There is probably a better way to do it, but you could do it with the STUFF function combined with FOR XML PATH. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround: each id has to be cast to a string first. Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
You wrote: "Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too."
In other words, you want a list of all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one).
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
This assumes that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in queries 2 and 3 below, also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
In query 1, @> is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Query 2 is faster than query 1 because only relevant shows are considered.
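The counting idea behind query 2 does not depend on PostgreSQL arrays, so it can be tried out in a small sandbox. The following is only a sketch: it assumes Python's sqlite3 module instead of PostgreSQL, the rows are made-up illustration data, and fellow_spectators is a hypothetical helper name. As above, it relies on (id_spectator, id_show) being unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "id_spectator INTEGER, id_show INTEGER, UNIQUE (id_spectator, id_show))"
)
# Made-up data: spectators 2 and 3 saw every show spectator 1 saw; 4 did not.
conn.executemany("INSERT INTO reservations VALUES (?, ?)", [
    (1, 1), (1, 11),
    (2, 1), (2, 11), (2, 5),
    (3, 1), (3, 11),
    (4, 1),
])

def fellow_spectators(conn, prime):
    """Return ids of spectators who attended every show that `prime` attended."""
    rows = conn.execute(
        """
        WITH shows AS (              -- all shows of the prime spectator
            SELECT id_show FROM reservations WHERE id_spectator = :prime
        )
        SELECT r.id_spectator
        FROM shows s
        JOIN reservations r ON r.id_show = s.id_show
        WHERE r.id_spectator <> :prime
        GROUP BY r.id_spectator
        HAVING COUNT(*) = (SELECT COUNT(*) FROM shows)  -- matched every show
        ORDER BY r.id_spectator
        """,
        {"prime": prime},
    ).fetchall()
    return [row[0] for row in rows]

print(fellow_spectators(conn, 1))
```

With this data, spectator 1 is paired with 2 and 3. Spectator 2 has no match, because nobody else saw show 5.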
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got looks like this:
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

Counting Only Most Recent Entries Matching Multiple Conditions

I have a database table similar to this (but many more entries):
PupilId | PeriodId | Assessment
-------------------------------
1 | 10 | 7
1 | 30 | 7
1 | 50 | 7
2 | 20 | 7
3 | 10 | 7
3 | 20 | 8
I want to find the number of pupils (i.e. distinct PupilId) who got a given assessment at some point up to and including a given PeriodId. Only the most recent assessment before or on the given PeriodId should be used.
For instance:
Number of pupils who got 7 on or before PeriodId 100 = 2 (PupilId 1 and 2)
Number of pupils who got a 7 on or before PeriodId 10 = 2 (PupilId 1 and 3)
Number of pupils who got 8 on or before PeriodId 30 = 1 (PupilId 3)
This is for SQL Azure.
Many thanks.
OK, no answers so here's what I came up with after help from another source:
SELECT COUNT(1)
FROM (
SELECT PupilId AS pupil_id, MAX(PeriodId) AS max_period
FROM steph1
WHERE PeriodId <= 100
GROUP BY PupilId
) steph2
JOIN steph1
ON steph1.PupilId = steph2.pupil_id
AND steph1.PeriodId = steph2.max_period
WHERE steph1.Assessment = 7
Hope that helps somebody else with the same issue.
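To check the query against the three examples in the question, here is a minimal sketch that parameterises the assessment and the period. It assumes Python's sqlite3 module as a stand-in for SQL Azure; count_pupils is a hypothetical helper name and the join is written with explicit JOIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steph1 (PupilId INTEGER, PeriodId INTEGER, Assessment INTEGER)")
conn.executemany("INSERT INTO steph1 VALUES (?, ?, ?)", [
    (1, 10, 7), (1, 30, 7), (1, 50, 7), (2, 20, 7), (3, 10, 7), (3, 20, 8),
])

def count_pupils(conn, assessment, max_period):
    """Count pupils whose latest assessment on or before max_period equals assessment."""
    return conn.execute(
        """
        SELECT COUNT(*)
        FROM (
            -- latest period per pupil, ignoring anything after the cut-off
            SELECT PupilId AS pupil_id, MAX(PeriodId) AS max_period
            FROM steph1
            WHERE PeriodId <= :period
            GROUP BY PupilId
        ) latest
        JOIN steph1 s
          ON s.PupilId = latest.pupil_id AND s.PeriodId = latest.max_period
        WHERE s.Assessment = :assessment
        """,
        {"period": max_period, "assessment": assessment},
    ).fetchone()[0]

print(count_pupils(conn, 7, 100), count_pupils(conn, 7, 10), count_pupils(conn, 8, 30))
```

The three calls correspond to the three examples in the question (2, 2 and 1 pupils respectively).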