SQL aggregate functions with groupings

I need to create some checks to make sure that students are enrolled in the correct courses with the correct number of units. Here is my SQL at the moment.
SELECT StudentID
,AssessmentCode
,BoardCode
,BoardCategory
,BoardUnits
,sum(cast(boardunits as int)) over (partition by studentid,boardcategory) as UnitCount
,Count(boardcategory) over (partition by studentid) as SubjectCount
FROM uvNCStudentSubjectDetails
where fileyear = 2015
and filesemester = 1
and studentyearlevel = 11
and StudentIBFlag = 0
order by Studentnameinternal,BoardCategory
This gives me the following info...
StudentID AssessmentCode BoardCode BoardCategory BoardUnits UnitCount SubjectCount
61687 11TECDAT 11080 A 2 11 7
61687 11PRS1U 11350 A 1 11 7
61687 11MATGEN 11235 A 2 11 7
61687 11LANGRB 11870 A 2 11 7
61687 11ENGSTD 11130 A 2 11 7
61687 11GEOGEO 11190 A 2 11 7
64549 11TECIND 11200 A 2 10 7
64549 11SCIPHY 11310 A 2 10 7
64549 11SCIEAE 11100 A 2 10 7
64549 11MATGEN 11235 A 2 10 7
64549 11ENGSTD 11130 A 2 10 7
64549 11TECHOS 26501 B 2 2 7
64549 11MUSDRS 63212 C 1 1 7
45461 11ECOECO 11110 A 2 13 7
45461 11ENGADV 11140 A 2 13 7
45461 11HISMOD 11270 A 2 13 7
45461 11HISLST 11220 A 2 13 7
45461 11MATMAT 11240 A 2 13 7
45461 11PRS1U 11350 A 1 13 7
45461 11SCIBIO 11030 A 2 13 7
Note that the first student has 11 Category A units in total; he is only doing Category A subjects. The second student has 10 units of Category A subjects, plus one Category B subject worth 2 units and one Category C subject worth 1 unit. The final student just has 13 Category A units.
What I would really like is something like this:
StudentID | Sum A Units | Sum B Units | Sum C Units | Sum A + B Units | Count of Subjects
61687 | 11 | 0 | 0 | 11 | 7
64549 | 10 | 2 | 1 | 12 | 7
45461 | 13 | 0 | 0 | 13 | 7
So I would like aggregate functions that put each student on a single row, with the sum of each category's units as a separate field. I would also like a field that sums the Category A and B units, and a field that counts the total number of subjects the student is doing. I could then use this data to set up warning messages if a student is not doing the correct number of A or B units, etc.
I have played around with common table expressions, subqueries, etc., but I'm not really sure what I am doing or which is the correct way to get the data into the form I want.
Is anyone able to help?

SELECT
STUDENTID,
-- sum the units (not 1), so each column holds units rather than a subject count
SUM(CASE WHEN BOARDCATEGORY = 'A' THEN CAST(BOARDUNITS AS INT) ELSE 0 END) AS SUM_A_UNITS,
SUM(CASE WHEN BOARDCATEGORY = 'B' THEN CAST(BOARDUNITS AS INT) ELSE 0 END) AS SUM_B_UNITS,
SUM(CASE WHEN BOARDCATEGORY = 'C' THEN CAST(BOARDUNITS AS INT) ELSE 0 END) AS SUM_C_UNITS,
-- an alias cannot contain "+", so the A + B total gets its own name
SUM(CASE WHEN BOARDCATEGORY IN ('A', 'B') THEN CAST(BOARDUNITS AS INT) ELSE 0 END) AS SUM_AB_UNITS,
COUNT(BOARDCODE) AS COUNT_OF_SUBJECTS
FROM (
SELECT StudentID
,AssessmentCode
,BoardCode
,BoardCategory
,BoardUnits
,sum(cast(boardunits as int)) over (partition by studentid,boardcategory) as UnitCount
,Count(boardcategory) over (partition by studentid) as SubjectCount
FROM uvNCStudentSubjectDetails
where fileyear = 2015
and filesemester = 1
and studentyearlevel = 11
and StudentIBFlag = 0
-- ORDER BY removed: SQL Server does not allow it in a derived table without TOP
) AS x -- a derived table needs an alias
GROUP BY STUDENTID;
I wrapped your SQL statement in the solution so that you can see what the solution does straight away.
The trick is to combine SUM with CASE, i.e. SUM only when a condition is met.
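The sketch below checks the SUM/CASE pattern against the sample rows in the question. It uses Python's standard-library sqlite3 module and is a sketch under stated assumptions, not the asker's environment. It assumes an in-memory table named `subjects`, a hypothetical name standing in for uvNCStudentSubjectDetails. The table holds only StudentID, BoardCategory and BoardUnits, with BoardUnits stored as text to mirror the CAST in the query. The posted sample shows only six rows for student 61687, so that student's subject count comes out as 6 here rather than the 7 in the question's output.

```python
import sqlite3

# (StudentID, BoardCategory, BoardUnits) copied from the question's sample output.
ROWS = [
    (61687, "A", "2"), (61687, "A", "1"), (61687, "A", "2"),
    (61687, "A", "2"), (61687, "A", "2"), (61687, "A", "2"),
    (64549, "A", "2"), (64549, "A", "2"), (64549, "A", "2"),
    (64549, "A", "2"), (64549, "A", "2"), (64549, "B", "2"),
    (64549, "C", "1"),
    (45461, "A", "2"), (45461, "A", "2"), (45461, "A", "2"),
    (45461, "A", "2"), (45461, "A", "2"), (45461, "A", "1"),
    (45461, "A", "2"),
]

# Conditional aggregation: each SUM only adds the units of rows that match its CASE.
QUERY = """
SELECT StudentID,
       SUM(CASE WHEN BoardCategory = 'A' THEN CAST(BoardUnits AS INT) ELSE 0 END) AS SumA,
       SUM(CASE WHEN BoardCategory = 'B' THEN CAST(BoardUnits AS INT) ELSE 0 END) AS SumB,
       SUM(CASE WHEN BoardCategory = 'C' THEN CAST(BoardUnits AS INT) ELSE 0 END) AS SumC,
       SUM(CASE WHEN BoardCategory IN ('A', 'B')
                THEN CAST(BoardUnits AS INT) ELSE 0 END) AS SumAB,
       COUNT(*) AS SubjectCount
FROM subjects
GROUP BY StudentID
ORDER BY StudentID
"""


def unit_summary():
    """Return one row per student: (id, A units, B units, C units, A+B units, subjects)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subjects (StudentID INTEGER, BoardCategory TEXT, BoardUnits TEXT)"
    )
    conn.executemany("INSERT INTO subjects VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in unit_summary():
    print(row)
```

Each category gets its own column because rows from other categories fall into the ELSE branch and contribute 0 to that SUM.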

Related

How to rewrite a query that counts consecutive occurrences of a value so that some values pause the count and others reset it?

I have a query that gives me the number of consecutive grade 5s for every student (as long as the student doesn't get any other grade in between):
select distinct on (student, class) scg.*
from (select student, class, grade, count(*) as cnt,
min(gradeDate) as min_gradeDate, max(gradeDate) as max_gradeDate
from (select t.*,
row_number() over (partition by student, class, grade order by gradeDate) as seqnum_scg,
row_number() over (partition by student, class order by gradeDate) as seqnum_sc
from t
) t
where grade = 5
group by student, class, grade, (seqnum_sc - seqnum_scg)
) scg
order by student, class, cnt desc;
The original problem is explained here:
How to count data with specific values and for specific user/person (in row)?
But now I want to extend this query with one more feature. Currently the counter gives me the longest unbroken run of 5s, and any grade of 4/3/2/1 breaks that run. Now I want it to:
pause counting if the student gets a grade of 4 or 3, and resume (continuing from the previous count) when the student gets another 5.
What I mean:
Current query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 3
New query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 5, because 4 and 3 pause the counter, which resumes when the student gets another 5
stop counting if the student gets a grade of 2 or 1 (and give me the max value reached before the 2/1 grade). This is what the query does now for every grade except 5, but I want it only for 2 and lower (a threshold I can specify in the query).
Can someone help me rewrite the second query given by @Gordon Linoff to work like that and tell me what changed?
Edit: examples as requested:
id student grade class gradeDate
1 1 5 1 2017-03-03
2 1 5 1 2017-03-04
3 1 1 1 2017-03-05
4 1 5 1 2017-03-06
5 1 5 1 2017-03-07
6 1 5 1 2017-03-08
7 1 1 1 2017-03-09
8 2 5 2 2017-03-03
9 3 5 3 2017-03-03
10 4 5 4 2017-03-03
11 4 5 4 2017-03-04
12 4 4 4 2017-03-05
13 4 3 4 2017-03-06
14 4 5 4 2017-03-07
15 4 5 4 2017-03-08
16 5 5 5 2017-03-01
17 5 5 5 2017-03-03
18 5 5 5 2017-03-04
19 5 5 5 2017-03-05
20 5 5 5 2017-03-06
21 5 2 5 2017-03-07
22 5 5 5 2017-03-08
23 5 5 5 2017-03-09
Student one: max = 3
Student two: max = 1
Student three: max = 1
Student four: max = 4 (grades 4 and 3 pause the counter but don't reset it)
Student five: max = 5 (grade 2 resets the counter; the missing grade on 2017-03-02 is not a problem for the counter)
One method is to use two subqueries and one analytic (window) function:
Demo: http://sqlfiddle.com/#!15/74b71/10
SELECT student, max( xxx )
FROM (
SELECT student, grp_nbr, count(CASE WHEN grade = 5 THEN 1 END) As xxx
FROM (
SELECT *,
SUM ( CASE WHEN grade in (1,2)
THEN 1 ELSE 0
END
) OVER (Partition by student Order By gradeDate ) As grp_nbr
FROM table1
) x
GROUP BY student, grp_nbr
) y
GROUP BY student
ORDER BY student
| student | max |
|---------|-----|
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | 4 |
| 5 | 5 |
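Here is a minimal Python/sqlite3 cross-check that runs the same two-level query over the example rows. The inner window SUM is a running count of resetting grades, so each reset starts a new group. Counting only the 5s inside each group lets 4s and 3s pass without resetting anything. The sketch assumes SQLite 3.25 or later (for window functions) and a hypothetical table name `grades`. It makes one change of my own: the reset condition `grade IN (1, 2)` becomes `grade <= :reset_at`, so the threshold the question wants to specify is a query parameter.

```python
import sqlite3

# (id, student, grade, class, gradeDate) from the question's example data.
GRADES = [
    (1, 1, 5, 1, "2017-03-03"), (2, 1, 5, 1, "2017-03-04"),
    (3, 1, 1, 1, "2017-03-05"), (4, 1, 5, 1, "2017-03-06"),
    (5, 1, 5, 1, "2017-03-07"), (6, 1, 5, 1, "2017-03-08"),
    (7, 1, 1, 1, "2017-03-09"), (8, 2, 5, 2, "2017-03-03"),
    (9, 3, 5, 3, "2017-03-03"), (10, 4, 5, 4, "2017-03-03"),
    (11, 4, 5, 4, "2017-03-04"), (12, 4, 4, 4, "2017-03-05"),
    (13, 4, 3, 4, "2017-03-06"), (14, 4, 5, 4, "2017-03-07"),
    (15, 4, 5, 4, "2017-03-08"), (16, 5, 5, 5, "2017-03-01"),
    (17, 5, 5, 5, "2017-03-03"), (18, 5, 5, 5, "2017-03-04"),
    (19, 5, 5, 5, "2017-03-05"), (20, 5, 5, 5, "2017-03-06"),
    (21, 5, 2, 5, "2017-03-07"), (22, 5, 5, 5, "2017-03-08"),
    (23, 5, 5, 5, "2017-03-09"),
]

QUERY = """
SELECT student, MAX(fives) AS max_run
FROM (
    -- count only the 5s inside each reset-delimited group; 4s and 3s add nothing
    SELECT student, grp_nbr, COUNT(CASE WHEN grade = 5 THEN 1 END) AS fives
    FROM (
        -- running count of resetting grades: each reset starts a new group
        SELECT *,
               SUM(CASE WHEN grade <= :reset_at THEN 1 ELSE 0 END)
                   OVER (PARTITION BY student ORDER BY gradeDate) AS grp_nbr
        FROM grades
    ) AS x
    GROUP BY student, grp_nbr
) AS y
GROUP BY student
ORDER BY student
"""


def longest_runs(reset_at=2):
    """Return (student, max count of 5s between resets) for each student."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grades (id INTEGER, student INTEGER, grade INTEGER,"
        " class INTEGER, gradeDate TEXT)"
    )
    conn.executemany("INSERT INTO grades VALUES (?, ?, ?, ?, ?)", GRADES)
    rows = conn.execute(QUERY, {"reset_at": reset_at}).fetchall()
    conn.close()
    return rows


print(longest_runs())
```

With `reset_at=4`, grades 4 and 3 also reset the count, so student four's result drops from 4 to 2.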

In a game show database scenario, how do I fetch the average total episode score per season in a single query?

Pardon the title gore. I'm having trouble finding a good way to express my question, which is endemic to the problem.
The Tables
season
id name
---- ------
1 Season 1
2 Season 2
3 Season 3
episode
id season_id number title
---- ----------- -------- ---------------------------------------
1 1 1 Pilot
2 1 2 1x02 - We Got Picked Up
3 1 3 1x03 - This is the Third Episode
4 2 1 2x01 - We didn't get cancelled.
5 2 2 2x02 - We're running out of ideas!
6 3 1 3x01 - We're still here.
7 3 2 3x02 - Okay, this game show is dying.
8 3 3 3x03 - Untitled
score
id episode_id score contestant_id (table not given)
---- ------------ ------- ---------------------------------
1 1 35 1
2 1 -12 2
3 1 8 3
4 1 5 4
5 2 13 1
6 2 -2 5
7 2 3 3
8 2 -14 6
9 3 -14.5 1
10 3 -3 2
11 3 1.5 7
12 3 9.5 5
13 4 22.8 1
14 4 -3 8
15 5 2 1
16 5 13.5 9
17 5 7 3
18 6 13 1
19 6 -84 10
20 6 12 11
21 7 3 1
22 7 10 2
23 8 29 1
24 8 1 5
As you can see, you have multiple episodes per season, and multiple scores per episode (one score per contestant). Contestants can reappear in later episodes (irrelevant), scores are floating point values, and there can be an arbitrary number of scores per episode.
So what am I looking for?
I'd like to get the average total episode score per season, where the total episode score is the sum of all the scores in an episode. Mathematically, this comes out to be the sum of all scores in a season divided by the number of episodes. Easy enough to comprehend, but I have had trouble doing it in a single query and getting the correct result. I'd like an output like the following:
name average_total_episode_score
---------- -----------------------------
Season 1 9.83
Season 2 21.15
Season 3 -5.33
The top-level query needs to be on the season table as it will be combined with other, similar queries on the same table. It's easy enough to do this with an aggregate in a subquery, but an aggregation executes the subquery, failing my single-query requirement. Can this be done in a single query?
I hope this works:
Select s.id, avg(score)
FROM Season S,
Episode e,
Score sc
WHERE s.id = e.season_id
AND e.id = sc.episode_id
Group by s.id
Okay, just figured it out. As usual, I had to write and post a whole book before the simple solution descended upon me.
The problem in my query (which I didn't give in the question) was the lack of a DISTINCT count. Here is a working query:
SELECT
"season"."id",
"season"."name",
-- NULLIF turns a zero episode count into NULL instead of a division-by-zero error
(SUM("score"."score") / NULLIF(COUNT(DISTINCT "episode"."id"), 0)) AS "average_total_episode_score"
FROM "season"
LEFT OUTER JOIN "episode"
ON ("season"."id" = "episode"."season_id")
LEFT OUTER JOIN "score"
ON ("episode"."id" = "score"."episode_id")
GROUP BY "season"."id", "season"."name"
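This minimal Python/sqlite3 sketch loads the three tables from the question and runs the DISTINCT-count approach. Episode titles and contestant IDs are omitted because the query does not use them. Three details are my own choices, not part of the answer: identifiers are unquoted, a NULLIF guards the division, and `name` is added to GROUP BY for portability. SQLite itself would simply return NULL on division by zero.

```python
import sqlite3

SEASONS = [(1, "Season 1"), (2, "Season 2"), (3, "Season 3")]
# (id, season_id) for each episode
EPISODES = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3), (8, 3)]
# (episode_id, score) for each row of the score table
SCORES = [
    (1, 35), (1, -12), (1, 8), (1, 5),
    (2, 13), (2, -2), (2, 3), (2, -14),
    (3, -14.5), (3, -3), (3, 1.5), (3, 9.5),
    (4, 22.8), (4, -3),
    (5, 2), (5, 13.5), (5, 7),
    (6, 13), (6, -84), (6, 12),
    (7, 3), (7, 10),
    (8, 29), (8, 1),
]

# Sum every score in the season, then divide by the number of distinct episodes.
QUERY = """
SELECT s.id, s.name,
       SUM(sc.score) / NULLIF(COUNT(DISTINCT e.id), 0) AS average_total_episode_score
FROM season AS s
LEFT JOIN episode AS e ON s.id = e.season_id
LEFT JOIN score AS sc ON e.id = sc.episode_id
GROUP BY s.id, s.name
ORDER BY s.id
"""


def season_averages():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE season (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE episode (id INTEGER PRIMARY KEY, season_id INTEGER);"
        # REAL affinity stores integer scores as floating point, so SUM is a float
        "CREATE TABLE score (id INTEGER PRIMARY KEY, episode_id INTEGER, score REAL);"
    )
    conn.executemany("INSERT INTO season VALUES (?, ?)", SEASONS)
    conn.executemany("INSERT INTO episode VALUES (?, ?)", EPISODES)
    conn.executemany("INSERT INTO score (episode_id, score) VALUES (?, ?)", SCORES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for _, name, average in season_averages():
    print(f"{name}: {average:.2f}")
```

The DISTINCT matters because joining to the score table produces one row per score. A plain COUNT or AVG would therefore divide by the number of scores instead of the number of episodes.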
select se.id as Season_Id, sum(sc.score) as season_score, avg(sc.score)
from score sc
join episode e on sc.episode_id = e.id
join season se on se.id = e.season_id
group by se.id

MDX: iif condition on the value of dimension

I have one virtual cube that consists of two cubes.
Example fact table of the first cube:
id object_id time_id date_id state
1 10 2 1 0
2 11 5 1 0
3 10 7 1 1
4 10 3 1 0
5 11 4 1 0
6 11 7 1 1
7 10 8 1 0
8 11 5 1 0
9 10 7 1 1
10 10 9 1 2
Where State: 0 - Ok, 1 - Down, 2 - Unknown
For this cube I have one measure, StateCount, which should count the states for each object_id.
For this example, the result is:
for 10: 3 times Ok, 2 times Down, 1 time Unknown
for 11: 3 times Ok, 1 time Down
The second cube looks like this:
id object_id time_id date_id status
1 10 2 1 0
2 11 5 1 0
3 10 7 1 1
4 10 3 1 1
5 11 4 1 1
Where Status: 0 - out, 1 - in. I keep this in StatusDim.
In this table I keep records that should not be counted: if an object has status 1, I have to exclude it from the count.
If we intersect these tables and use StateCount, we should receive this result:
for 10: 2 times Ok, 1 time Down, 1 time Unknown
for 11: 2 times Ok, 1 time Down
As far as I know, I must use a calculated member with an IIF condition. Currently I'm trying something like this:
WITH MEMBER [Measures].[StateTimeCountDown] AS(
iif(
[StatusDimDown.DowntimeHierarchy].[DowntimeStatus].CurrentMember.MemberValue
<> "in"
, [Measures].[StateTimeCount]
, null )
)
The multidimensional way to do this would be to make attributes from your state and status columns (ideally with user-understandable members, i.e. using "Ok" and not "0"). Then you can just use a normal count measure on the fact tables and slice by these attributes. There is no need for complex calculation definitions.
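The expected counts in the question can be cross-checked relationally. This is not MDX and not how the cube would compute them. It is a minimal Python/sqlite3 sketch that assumes the two fact tables are matched on `id`. The question does not say which key links them, but matching on `id` reproduces the expected numbers. Matching on object_id/time_id/date_id would also exclude row 9 of the first table. The table names `state_fact` and `status_fact` are hypothetical.

```python
import sqlite3

# First cube's fact table: (id, object_id, time_id, date_id, state)
# state: 0 = Ok, 1 = Down, 2 = Unknown
STATES = [
    (1, 10, 2, 1, 0), (2, 11, 5, 1, 0), (3, 10, 7, 1, 1), (4, 10, 3, 1, 0),
    (5, 11, 4, 1, 0), (6, 11, 7, 1, 1), (7, 10, 8, 1, 0), (8, 11, 5, 1, 0),
    (9, 10, 7, 1, 1), (10, 10, 9, 1, 2),
]
# Second cube's fact table: (id, object_id, time_id, date_id, status)
# status: 0 = out, 1 = in (1 means "exclude from the count")
STATUSES = [
    (1, 10, 2, 1, 0), (2, 11, 5, 1, 0), (3, 10, 7, 1, 1),
    (4, 10, 3, 1, 1), (5, 11, 4, 1, 1),
]

QUERY = """
SELECT f.object_id,
       SUM(CASE WHEN f.state = 0 THEN 1 ELSE 0 END) AS ok_count,
       SUM(CASE WHEN f.state = 1 THEN 1 ELSE 0 END) AS down_count,
       SUM(CASE WHEN f.state = 2 THEN 1 ELSE 0 END) AS unknown_count
FROM state_fact AS f
LEFT JOIN status_fact AS s ON s.id = f.id  -- assumed matching key
WHERE :exclude_in = 0 OR s.status IS NULL OR s.status <> 1
GROUP BY f.object_id
ORDER BY f.object_id
"""


def state_counts(exclude_in=True):
    """Return (object_id, Ok, Down, Unknown), optionally dropping status-1 rows."""
    conn = sqlite3.connect(":memory:")
    cols = "(id INTEGER, object_id INTEGER, time_id INTEGER, date_id INTEGER, {} INTEGER)"
    conn.execute("CREATE TABLE state_fact " + cols.format("state"))
    conn.execute("CREATE TABLE status_fact " + cols.format("status"))
    conn.executemany("INSERT INTO state_fact VALUES (?, ?, ?, ?, ?)", STATES)
    conn.executemany("INSERT INTO status_fact VALUES (?, ?, ?, ?, ?)", STATUSES)
    rows = conn.execute(QUERY, {"exclude_in": int(exclude_in)}).fetchall()
    conn.close()
    return rows


print(state_counts(exclude_in=False))
print(state_counts())
```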

How to find count from two joined tables

We need to find the number of risks in each category for each impact level, as shown in the result at the end.
Risk Table
RiskID RiskName
----------------------
1 Risk1
2 Risk2
3 Risk3
4 Risk4
5 Risk5
6 Risk6
7 Risk7
8 Risk8
9 Risk9
10 Risk10
11 Risk11
Category Table
Cat_ID Cat_Name
--------------------------
1 Design
2 Operation
3 Technical
Risk_Category table
Risk_ID Category_ID
------------------------
1 1
1 2
2 1
3 1
3 3
4 1
5 2
6 1
7 3
8 1
9 3
10 3
Risk_Impact_Assessment table
Risk_ID Impact_Level Impact_Score
---------------------------------------------
1 High 20
2 Medium 15
3 High 20
4 Low 10
5 High 20
6 High 20
7 High 20
8 Low 10
9 Medium 15
10 Low 15
11 Medium 15
The result should look like this:
Cat_Name Impact_Level_High Impact_Level_Medium Impact_Level_Low
-------------------------------------------------------------------------------------
Design 1 1 2
Operation 2
Technical 2 2 1
You probably want to use the GROUP BY clause along with CASE, e.g.:
select
Cat_Name,
sum(case when Impact_Level = 'High' then 1 else 0 end) as [Impact_Level_High],
sum(case when Impact_Level = 'Medium' then 1 else 0 end) as [Impact_Level_Medium],
sum(case when Impact_Level = 'Low' then 1 else 0 end) as [Impact_Level_Low]
from [Risk_Impact_Assessment]
...
group by Cat_Name;
(I left out all the joins; I assume you can write these without any problem.)
You can use this trick to accomplish a lot of cool things, including parametric sorting and (just like here) complicated aggregate functions with little work.
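One way to write the omitted joins, as a minimal Python/sqlite3 sketch with the sample data. The join conditions are my assumption from the column names: Risk_Category links Category.Cat_ID to Risk_Impact_Assessment.Risk_ID. The Risk table is not needed because only Risk_ID is used. With this data the query gives Design 3/1/2, Operation 2/0/0 and Technical 2/1/1 (High/Medium/Low). The Design "High" and Technical "Medium" cells in the question's expected table therefore do not match the sample rows.

```python
import sqlite3

CATEGORIES = [(1, "Design"), (2, "Operation"), (3, "Technical")]
# (Risk_ID, Category_ID)
RISK_CATEGORY = [
    (1, 1), (1, 2), (2, 1), (3, 1), (3, 3), (4, 1),
    (5, 2), (6, 1), (7, 3), (8, 1), (9, 3), (10, 3),
]
# (Risk_ID, Impact_Level, Impact_Score)
IMPACT = [
    (1, "High", 20), (2, "Medium", 15), (3, "High", 20), (4, "Low", 10),
    (5, "High", 20), (6, "High", 20), (7, "High", 20), (8, "Low", 10),
    (9, "Medium", 15), (10, "Low", 15), (11, "Medium", 15),
]

QUERY = """
SELECT c.Cat_Name,
       SUM(CASE WHEN ria.Impact_Level = 'High' THEN 1 ELSE 0 END) AS Impact_Level_High,
       SUM(CASE WHEN ria.Impact_Level = 'Medium' THEN 1 ELSE 0 END) AS Impact_Level_Medium,
       SUM(CASE WHEN ria.Impact_Level = 'Low' THEN 1 ELSE 0 END) AS Impact_Level_Low
FROM Category AS c
JOIN Risk_Category AS rc ON rc.Category_ID = c.Cat_ID         -- assumed link
JOIN Risk_Impact_Assessment AS ria ON ria.Risk_ID = rc.Risk_ID -- assumed link
GROUP BY c.Cat_ID, c.Cat_Name
ORDER BY c.Cat_ID
"""


def impact_matrix():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE Category (Cat_ID INTEGER PRIMARY KEY, Cat_Name TEXT);"
        "CREATE TABLE Risk_Category (Risk_ID INTEGER, Category_ID INTEGER);"
        "CREATE TABLE Risk_Impact_Assessment"
        " (Risk_ID INTEGER, Impact_Level TEXT, Impact_Score INTEGER);"
    )
    conn.executemany("INSERT INTO Category VALUES (?, ?)", CATEGORIES)
    conn.executemany("INSERT INTO Risk_Category VALUES (?, ?)", RISK_CATEGORY)
    conn.executemany("INSERT INTO Risk_Impact_Assessment VALUES (?, ?, ?)", IMPACT)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in impact_matrix():
    print(row)
```

Inner joins drop categories with no risks. Use LEFT JOINs from Category if such categories should still appear with zeros.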

SQL for MS Access: Another question about COUNT, JOIN, 0s and Dates

I asked a question about joins yesterday. Although the answer solved my initial problem, I'm having more problems.
I have a telephony table
ID | Date | Grade
1 07/19/2010 Grade 1
2 07/19/2010 Grade 1
3 07/20/2010 Grade 1
4 07/20/2010 Grade 2
5 07/21/2010 Grade 3
I also have a Grade table
ID | Name
1 Grade 1
2 Grade 2
3 Grade 3
4 Grade 4
5 Grade 5
6 Grade 6
7 Grade 7
8 Grade 8
9 Grade 9
10 Grade 10
11 Grade 11
12 Grade 12
I use the following query to get the COUNT of every grade in the telephony table; it works great.
SELECT grade.ID, Count(telephony.Grade) AS Total
FROM grade LEFT JOIN telephony ON grade.ID=telephony.Grade
GROUP BY grade.ID
ORDER BY 1;
This returns
ID | Total
1 3
2 1
3 1
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
However, what I'm trying to do is the following:
Group by date and only return results between two dates
SELECT telephony.Date, grade.ID, Count(telephony.Grade) AS Total
FROM grade LEFT JOIN telephony ON grade.ID=telephony.Grade
WHERE telephony.Date BETWEEN #07/19/2010# AND #07/23/2010#
GROUP BY telephony.Date, grade.ID
ORDER BY 1;
I'm getting the following
Date | ID | Total
07/19/2010 1 2
07/20/2010 1 1
07/20/2010 2 1
07/21/2010 3 1
It's not returning all the grades with 0 entries between the two dates, only the entries that exist for those dates. What I'm looking for is something like this:
Date | ID | Total
07/19/2010 1 2
07/19/2010 2 0
07/19/2010 3 0
07/19/2010 4 0
07/19/2010 5 0
07/19/2010 6 0
07/19/2010 7 0
07/19/2010 8 0
07/19/2010 9 0
07/19/2010 10 0
07/19/2010 11 0
07/19/2010 12 0
07/20/2010 1 1
07/20/2010 2 1
07/20/2010 3 0
07/20/2010 4 0
07/20/2010 5 0
07/20/2010 6 0
07/20/2010 7 0
07/20/2010 8 0
07/20/2010 9 0
07/20/2010 10 0
07/20/2010 11 0
07/20/2010 12 0
07/21/2010 1 2
07/21/2010 2 0
07/21/2010 3 1
07/21/2010 4 0
07/21/2010 5 0
07/21/2010 6 0
07/21/2010 7 0
07/21/2010 8 0
07/21/2010 9 0
07/21/2010 10 0
07/21/2010 11 0
07/21/2010 12 0
07/22/2010 1 2
07/22/2010 2 0
07/22/2010 3 0
07/22/2010 4 0
07/22/2010 5 0
07/22/2010 6 0
07/22/2010 7 0
07/22/2010 8 0
07/22/2010 9 0
07/22/2010 10 0
07/22/2010 11 0
07/22/2010 12 0
07/23/2010 1 2
07/23/2010 2 0
07/23/2010 3 0
07/23/2010 4 0
07/23/2010 5 0
07/23/2010 6 0
07/23/2010 7 0
07/23/2010 8 0
07/23/2010 9 0
07/23/2010 10 0
07/23/2010 11 0
07/23/2010 12 0
I hope someone can help. I'm using Microsoft Access 2003.
Cheers
Create a separate query on telephony which uses your BETWEEN #07/19/2010# AND #07/23/2010# constraint.
qryTelephonyDateRange:
SELECT *
FROM telephony
WHERE [Date] BETWEEN #07/19/2010# AND #07/23/2010#;
Then, in your original query, use:
LEFT JOIN qryTelephonyDateRange ON grade.ID=qryTelephonyDateRange.Grade
instead of
LEFT JOIN telephony ON grade.ID=telephony.Grade
You could use a subquery instead of a separate named query for qryTelephonyDateRange.
Note that Date is a reserved word, so I bracketed the name to avoid ambiguity; Access's database engine will then understand that it should look for a field named Date instead of calling the VBA Date() function. However, if it were my project, I would rename the field to avoid the ambiguity altogether, e.g. to something like tDate.
Update: You asked to see a subquery approach. Try this:
SELECT g.ID, t.[Date], Count(t.Grade) AS Total
FROM
grade AS g
LEFT JOIN (
SELECT Grade, [Date]
FROM telephony
WHERE [Date] BETWEEN #07/19/2010# AND #07/23/2010#
) AS t
ON g.ID=t.Grade
GROUP BY g.ID, t.[Date]
ORDER BY 1, 2;
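Filter placement matters here. A WHERE condition on the right-hand table of a LEFT JOIN discards the NULL-extended rows for grades without calls, which makes the outer join behave like an inner join. Moving the date filter into the joined subquery keeps those rows. The minimal Python/sqlite3 sketch below compares the two queries. It assumes telephony.Grade holds the numeric grade ID, as the join implies. Dates are stored as ISO text and compared with BETWEEN instead of Access's #...# literals. The date column is renamed CallDate, a hypothetical name that follows the advice to avoid the reserved word.

```python
import sqlite3

# (ID, CallDate, Grade) from the question's telephony table, with Grade as the grade ID.
TELEPHONY = [
    (1, "2010-07-19", 1), (2, "2010-07-19", 1), (3, "2010-07-20", 1),
    (4, "2010-07-20", 2), (5, "2010-07-21", 3),
]

# The question's query: the WHERE on t removes grades that have no calls.
QUESTION_QUERY = """
SELECT t.CallDate, g.ID, COUNT(t.Grade) AS Total
FROM grade AS g LEFT JOIN telephony AS t ON g.ID = t.Grade
WHERE t.CallDate BETWEEN '2010-07-19' AND '2010-07-23'
GROUP BY t.CallDate, g.ID
ORDER BY 1, 2
"""

# The subquery version: filter first, then outer-join, so every grade survives.
SUBQUERY_QUERY = """
SELECT g.ID, t.CallDate, COUNT(t.Grade) AS Total
FROM grade AS g
LEFT JOIN (
    SELECT Grade, CallDate FROM telephony
    WHERE CallDate BETWEEN '2010-07-19' AND '2010-07-23'
) AS t ON g.ID = t.Grade
GROUP BY g.ID, t.CallDate
ORDER BY 1, 2
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grade (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE telephony (ID INTEGER PRIMARY KEY, CallDate TEXT, Grade INTEGER)")
    conn.executemany("INSERT INTO grade VALUES (?, ?)", [(i, f"Grade {i}") for i in range(1, 13)])
    conn.executemany("INSERT INTO telephony VALUES (?, ?, ?)", TELEPHONY)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(QUESTION_QUERY))
print(run(SUBQUERY_QUERY))
```

The subquery version keeps all 12 grades. Grades with no calls in the range come back as a single row with a NULL date and a total of 0, not one row per date, so the full date-by-grade grid still needs a table of dates.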
Try this:
SELECT grade.ID, Count(telephony.Grade) AS Total
FROM grade LEFT JOIN telephony ON grade.ID=telephony.Grade
GROUP BY grade.ID
HAVING COUNT(telephony.Grade) > 0
ORDER BY grade.ID;
That's completely different.
You want a range of individual dates joined with your first table, and the between clause isn't going to do that for you.
I think you'll need a table with all the dates you want, say from 1/1/2010 to 12/31/2010, or whatever range you need to support: one column, with 365 (or however many) rows holding one date value each.
Then join that table with the tables holding the dates and grades, limit the result to your date range,
and then do the aggregation to count.
Take it one step at a time and it will be easier to figure out.
The way I got it to work was to:
Create a table named Dates with a single primary key date/time field named MyDate (I'm with HansUp on not using reserved words like "Date" for field names).
Fill the table with the date values I wanted (7/19/2010 to 7/23/2010, as in your example).
Write a query with the following SQL statement:
SELECT x.MyDate AS [Date], x.ID, Count(t.ID) AS Total
FROM (SELECT Dates.MyDate, Grade.ID FROM Dates, Grade) AS x
LEFT JOIN Telephony AS t ON (x.MyDate = t.[Date]) AND (x.ID = t.Grade)
GROUP BY x.MyDate, x.ID;
That should get the results you asked for.
The subquery statement in the SQL creates a cross-join to get you every combination of date in the Dates table and grade in the Grade table.
(SELECT Dates.MyDate, Grade.ID FROM Dates, Grade) AS x
Once you have that, then it's just an outer join to the Telephony table to do the rest.
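Here is a minimal Python/sqlite3 sketch of the Dates-table approach. It fills a Dates table for 2010-07-19 to 2010-07-23, cross-joins it with the 12 grades, and outer-joins the telephony rows. Dates are ISO text, and the telephony date column is named CallDate, a hypothetical rename that avoids the reserved word. Grade is assumed to hold the grade ID. With the posted sample data, grade 1 has calls only on 07/19 and 07/20, so the later dates show 0 for grade 1, unlike the illustrative 2s in the question.

```python
import sqlite3
from datetime import date, timedelta

# (ID, CallDate, Grade) from the question's telephony table, with Grade as the grade ID.
TELEPHONY = [
    (1, "2010-07-19", 1), (2, "2010-07-19", 1), (3, "2010-07-20", 1),
    (4, "2010-07-20", 2), (5, "2010-07-21", 3),
]
# One row per day in the wanted range.
DATES = [(date(2010, 7, 19) + timedelta(days=i)).isoformat() for i in range(5)]

# x is every (date, grade) pair; the outer join then counts matching calls.
QUERY = """
SELECT x.MyDate, x.ID, COUNT(t.ID) AS Total
FROM (SELECT Dates.MyDate, Grade.ID FROM Dates, Grade) AS x
LEFT JOIN Telephony AS t ON x.MyDate = t.CallDate AND x.ID = t.Grade
GROUP BY x.MyDate, x.ID
ORDER BY x.MyDate, x.ID
"""


def daily_grade_totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE Dates (MyDate TEXT PRIMARY KEY);"
        "CREATE TABLE Grade (ID INTEGER PRIMARY KEY, Name TEXT);"
        "CREATE TABLE Telephony (ID INTEGER PRIMARY KEY, CallDate TEXT, Grade INTEGER);"
    )
    conn.executemany("INSERT INTO Dates VALUES (?)", [(d,) for d in DATES])
    conn.executemany("INSERT INTO Grade VALUES (?, ?)", [(i, f"Grade {i}") for i in range(1, 13)])
    conn.executemany("INSERT INTO Telephony VALUES (?, ?, ?)", TELEPHONY)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in daily_grade_totals()[:13]:
    print(row)
```

Every (date, grade) pair exists in x before the outer join, so COUNT(t.ID) counts only matched calls and yields 0 for pairs with none.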