SQL Group by - Aggregate a column based on custom logic


I have come across a case where I need to aggregate the table based on custom logic.
Student_id  Subject  Result
123         Maths    FAILED
123         English  PASS
456         Maths    PASS
456         English  PASS
Now I want the result in the following way:
Student_id  overall_result
123         FAILED (if a student failed in any subject, the overall result is FAILED)
456         PASS
Any idea how I can do this using SQL?
EDITED: There can also be other values, like GRACE or SUPPLYMENT. A student has passed if and only if all values are PASS; otherwise the overall result is FAILED.

If your field "Result" allows only "PASS" and "FAILED" values, you can use the aggregate function MIN to retrieve the smaller of the two. Strings are compared alphabetically, so "FAILED" sorts before "PASS": if there is at least one "FAILED", it is the value that gets returned.
SELECT Student_id, MIN(Result)
FROM tab
GROUP BY Student_id
If other values are possible too, you can map any value other than "PASS" to "FAILED", then take the MIN as in the previous query.
SELECT Student_id,
MIN(CASE WHEN Result = 'PASS' THEN 'PASS'
ELSE 'FAILED'
END) AS Result
FROM tab
GROUP BY Student_id
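As a quick check of the MIN-over-CASE idea, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The rows for students 123 and 456 come from the question; student 789 with a GRACE result is a hypothetical addition to exercise the "any other value" case.

```python
import sqlite3

# Rows from the question, plus a hypothetical student 789 with a GRACE result.
rows = [
    (123, "Maths", "FAILED"),
    (123, "English", "PASS"),
    (456, "Maths", "PASS"),
    (456, "English", "PASS"),
    (789, "Maths", "GRACE"),
    (789, "English", "PASS"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Student_id INTEGER, Subject TEXT, Result TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

# Anything that is not 'PASS' is mapped to 'FAILED'; MIN then picks
# 'FAILED' whenever it is present, because 'FAILED' sorts before 'PASS'.
query = """
SELECT Student_id,
       MIN(CASE WHEN Result = 'PASS' THEN 'PASS' ELSE 'FAILED' END) AS overall_result
FROM tab
GROUP BY Student_id
ORDER BY Student_id
"""
overall = dict(conn.execute(query).fetchall())
print(overall)
```

Under these assumptions, students 123 and 789 come out as FAILED and 456 as PASS.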

We can use COUNT or SUM with CASE.
If the number of FAILED rows for a student is greater than 0, the overall result is FAILED; otherwise it is PASS.
SELECT Student_id,
CASE
WHEN COUNT(CASE WHEN Result = 'FAILED' THEN 1 END) > 0
THEN 'FAILED'
ELSE 'PASS' END AS overall_result
FROM tab
GROUP BY Student_id;
If we use MySQL, the above query can be simplified to:
SELECT Student_id,
CASE
WHEN SUM(Result = 'FAILED') > 0
THEN 'FAILED'
ELSE 'PASS' END AS overall_result
FROM tab
GROUP BY Student_id;
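Note that the question's edit says values such as GRACE should also count as a fail. The sketch below, run on SQLite from Python, compares the COUNT/CASE query with the condition `Result = 'FAILED'` against a variant that uses `Result <> 'PASS'`. The helper `overall_results` and student 789 are hypothetical, added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Student_id INTEGER, Subject TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (123, "Maths", "FAILED"),
        (123, "English", "PASS"),
        (456, "Maths", "PASS"),
        (456, "English", "PASS"),
        (789, "Maths", "GRACE"),  # hypothetical extra student, not in the question
        (789, "English", "PASS"),
    ],
)


def overall_results(condition):
    """Return {Student_id: overall_result}; a student fails if any row matches condition."""
    query = f"""
    SELECT Student_id,
           CASE WHEN COUNT(CASE WHEN {condition} THEN 1 END) > 0
                THEN 'FAILED' ELSE 'PASS' END AS overall_result
    FROM tab
    GROUP BY Student_id
    """
    return dict(conn.execute(query).fetchall())


only_failed = overall_results("Result = 'FAILED'")  # condition used in the answer
not_pass = overall_results("Result <> 'PASS'")      # stricter variant
print(only_failed)
print(not_pass)
```

With the first condition the GRACE student is reported as PASS; with the second, any non-PASS value makes the student FAILED, which seems closer to the edited requirement.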

Related

SQL Count Distinct returning one extra count

How is it possible that these two methods return different results?
Method 1 (returns correct count):
SELECT COUNT(DISTINCT contact_id)
FROM Traffic_Action
WHERE action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost')
Method 2 (returns one extra count):
SELECT COUNT(DISTINCT CASE WHEN action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost') THEN contact_id ELSE 0 END)
FROM Traffic_Action

Remove the ELSE part: the 0 it returns is a regular value, so it is counted as one more distinct value.
SELECT COUNT(DISTINCT CASE WHEN
action_type in ('Schedule a Tour','Schedule Follow-up','Lost') THEN contact_id END)
FROM Traffic_Action
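To see the effect of the ELSE branch, here is a small sketch on SQLite from Python. The table contents and the `scalar` helper are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Traffic_Action (contact_id INTEGER, action_type TEXT)")
conn.executemany(
    "INSERT INTO Traffic_Action VALUES (?, ?)",
    [
        (1, "Schedule a Tour"),
        (1, "Lost"),
        (2, "Schedule Follow-up"),
        (3, "Other"),  # hypothetical action type that does not match the list
    ],
)

types = "('Schedule a Tour', 'Schedule Follow-up', 'Lost')"


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Method 1: filter first, then count distinct contacts.
filtered = scalar(
    f"SELECT COUNT(DISTINCT contact_id) FROM Traffic_Action WHERE action_type IN {types}"
)
# Method 2: non-matching rows become 0, which is counted as a distinct value.
with_else = scalar(
    f"SELECT COUNT(DISTINCT CASE WHEN action_type IN {types} THEN contact_id ELSE 0 END) "
    "FROM Traffic_Action"
)
# Fix: without ELSE, non-matching rows become NULL, which COUNT ignores.
without_else = scalar(
    f"SELECT COUNT(DISTINCT CASE WHEN action_type IN {types} THEN contact_id END) "
    "FROM Traffic_Action"
)
print(filtered, with_else, without_else)
```

On this sample data the version with ELSE 0 reports one more than the other two.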

It is no surprise that you are getting two different results.
First query:
It returns the distinct count of contact_id for records whose action_type is Schedule a Tour, Schedule Follow-up or Lost.
SELECT COUNT(DISTINCT contact_id) FROM Traffic_Action
WHERE action_type IN ('Schedule a Tour','Schedule Follow-up','Lost')
Second query:
Here, every row whose action_type is not Schedule a Tour, Schedule Follow-up or Lost is mapped to 0 by the CASE expression. COUNT(DISTINCT ...) then counts that 0 as one additional distinct value.
SELECT COUNT(DISTINCT CASE WHEN action_type IN ('Schedule a Tour','Schedule Follow-up','Lost')
THEN contact_id ELSE 0 END) FROM Traffic_Action
In simple words:
In the first query you filter on only three values.
In the second query you have no filter, only a CASE expression on the three values with an ELSE branch that returns 0 for non-matching rows.

Normally, COUNT() ignores NULL values. In your second query, rows that do not match the listed action types do not end up as NULL (which would be ignored); the ELSE branch turns them into 0, and that 0 is counted as one more distinct value. That should be why you see a difference.
You can quickly see for yourself how NULL is ignored in this example. It returns 2, although there are 3 records:
select count(distinct a.col1)
from (
select 1 as Col1
union select 2
union select NULL
) a

SUM CASE WHEN (SQL)

I want to apply a simple SUM CASE WHEN to a table I have created from CSV (in DB Browser for SQLite). At the moment I have the below code:
Select user_id,
CASE Overall_Result
WHEN 'Fail' then 0
WHEN 'Pass' then 1 else 0
END as "Case When"
from NewTable
This code gives me:
[image of the query result, not available]
However, I now want to amend the above code so that it sums the values in the created CASE WHEN column for each user_id. As you can see in, e.g., rows 1 and 2 of the image, the same user_id appears twice, each time with its own value in the second column. I want to combine these rows so that each user_id appears only once in column 1, with the sum of all its corresponding values in the second column. GROUP BY on its own doesn't seem to do this.

Do you just want aggregation?
select user_id,
SUM(CASE Overall_Result WHEN 'Fail' then 0 WHEN 'Pass' then 1 else 0
END) as sum_value
from NewTable
group by user_id;
Or, if you just want the number of passes, you can use the shorthand:
SUM(Overall_Result = 'Pass')
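The question uses SQLite, where a comparison evaluates to 1 or 0, so the shorthand can be checked directly. A minimal sketch with made-up rows (only the table and column names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable (user_id INTEGER, Overall_Result TEXT)")
conn.executemany(
    "INSERT INTO NewTable VALUES (?, ?)",
    [(1, "Pass"), (1, "Fail"), (1, "Pass"), (2, "Fail"), (3, "Pass")],  # hypothetical data
)

# Explicit CASE form.
with_case = conn.execute(
    """
    SELECT user_id,
           SUM(CASE Overall_Result WHEN 'Pass' THEN 1 ELSE 0 END) AS sum_value
    FROM NewTable
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

# Shorthand: the comparison itself yields 1 (true) or 0 (false) in SQLite.
shorthand = conn.execute(
    """
    SELECT user_id, SUM(Overall_Result = 'Pass') AS sum_value
    FROM NewTable
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

print(with_case)
print(shorthand)
```

Both forms give one row per user_id with the number of passes. They can differ when Overall_Result is NULL, since the comparison then yields NULL instead of 0.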

Just add SUM() and the matching GROUP BY, e.g.:
Select user_id,
sum( CASE Overall_Result
WHEN 'Fail' then 0
WHEN 'Pass' then 1 else 0
END ) as "Case When"
from NewTable
group by user_id

SQL query syntax in CASE WHEN ELSE END to count

I am writing a query to find the number of ED visits that were discharged from non-ED units.
The dep.ADT_UNIT_TYPE_C column stores 1 if the unit was an ED unit.
Assume NULL values are non-ED units for the purpose of this query.
Which of the following produces this number?
I think it is A, because to my mind that looks like the correct syntax:
COUNT(CASE WHEN ... THEN ... ELSE ... END) is the standard format, and A has that.
B doesn't have the THEN, so is it incorrect syntax?
Please help me understand the nuances between these choices.
A.)
COUNT( CASE WHEN dep.ADT_UNIT_TYPE_C is NULL OR dep.ADT_UNIT_TYPE_C <> 1 THEN NULL
ELSE 1
END )
B.)
COUNT( CASE WHEN dep.ADT_UNIT_TYPE_C is NULL or dep.ADT_UNIT_TYPE_C <> 1
ELSE NULL
END)
C.)
CASE WHEN dep.ADT_UNIT_TYPE_C Is NULL or dep.ADT_UNIT_TYPE_C <> 1 THEN COUNT (NULL)
ELSE COUNT (1)
END
D.)
CASE WHEN dep.ADT_UNIT_TYPE_C is NULL or dep.ADT_UNIT_TYPE_C <> 1 THEN COUNT(1)
ELSE COUNT(NULL)
END

You can count the returned records with COUNT(*) and put the condition in the WHERE clause.
If you are using Oracle, you can use NVL to treat NULL as a non-ED value.
The sample below is for Oracle; in MySQL the equivalent function is IFNULL, in SQL Server it is ISNULL, and COALESCE works in all three.
SELECT COUNT(*) FROM dep WHERE NVL(ADT_UNIT_TYPE_C, 0) != 1
It looks, however, like you are joining this to another table, probably a visit table, so you want to count visits. The visit table probably stores some kind of department id or another way to join it to departments.
Something like this:
SELECT COUNT(*) FROM visit v, departments d WHERE v.dep_id = d.dep_id AND NVL(d.ADT_UNIT_TYPE_C, 0) !=1
If you want the entire list as shown above, use a GROUP BY. This will show you the number of visits for each department type.
SELECT d.ADT_UNIT_TYPE_C, COUNT(*) FROM visit v, departments d WHERE v.dep_id = d.dep_id GROUP BY d.ADT_UNIT_TYPE_C
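To relate the WHERE-based count to the COUNT(CASE ...) forms in the question, here is a sketch on SQLite from Python. The dep table contents and the dep_id column are hypothetical, and COALESCE stands in for NVL, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dep (dep_id INTEGER, ADT_UNIT_TYPE_C INTEGER)")
conn.executemany(
    "INSERT INTO dep VALUES (?, ?)",
    [(1, 1), (2, 1), (3, None), (4, 2), (5, None)],  # two ED rows, three non-ED rows
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Filter in WHERE, treating NULL as non-ED.
where_count = scalar(
    "SELECT COUNT(*) FROM dep WHERE COALESCE(ADT_UNIT_TYPE_C, 0) != 1"
)
# Same count with a CASE inside COUNT: non-ED rows yield 1, everything else NULL.
case_count = scalar(
    "SELECT COUNT(CASE WHEN ADT_UNIT_TYPE_C IS NULL OR ADT_UNIT_TYPE_C <> 1 "
    "THEN 1 END) FROM dep"
)
# Shape of option A: non-ED rows yield NULL, so COUNT sees only the ED rows.
option_a_count = scalar(
    "SELECT COUNT(CASE WHEN ADT_UNIT_TYPE_C IS NULL OR ADT_UNIT_TYPE_C <> 1 "
    "THEN NULL ELSE 1 END) FROM dep"
)
print(where_count, case_count, option_a_count)
```

COUNT only counts non-NULL values, so which branch of the CASE returns NULL decides which rows are counted.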

count boolean column, and average another column based on boolean column

CREATE TABLE test (
calculate_time int4 NULL,
status bool NULL
);
INSERT INTO test (calculate_time,status) VALUES
(10,true)
,(15,true)
,(20,true)
,(20,true)
,(5,false)
,(10,false)
,(15,false)
,(100,NULL)
,(200,NULL)
,(300,NULL)
;
This query averages all calculate_time values. Is there a way to tell it to average only the rows where status = true? I tried adding a WHERE clause, but that makes the failed and suspended counts come out as 0.
select
avg(calculate_time) as cal_time,
count(case when status = true then 1 end) as completed,
count(case when status = false then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;

You seem to understand the concept of conditional aggregation. You can also use a CASE expression for the average, just as you did for the other terms in your select:
select
avg(case when status then calculate_time end) as cal_time,
count(case when status then 1 end) as completed,
count(case when not status then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;
This works because the AVG function, like most other aggregate functions, ignores NULL values. A CASE expression with no ELSE returns NULL, so for records whose status is not true the calculate_time values are effectively ignored and do not influence the overall average.
Another side note: you may use boolean values in a Postgres query directly, without comparing them to true. That is, the following two CASE expressions are equivalent, with the second one being more terse:
avg(case when status = true then calculate_time end) as cal_time,
avg(case when status then calculate_time end) as cal_time,
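The arithmetic can be checked with a small sketch. It uses SQLite from Python rather than Postgres, so the boolean column is stored as 1/0/NULL (an assumption made only to keep the example self-contained); the data matches the INSERT in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no dedicated boolean type, so status is stored as 1, 0 or NULL.
conn.execute("CREATE TABLE test (calculate_time INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        (10, 1), (15, 1), (20, 1), (20, 1),
        (5, 0), (10, 0), (15, 0),
        (100, None), (200, None), (300, None),
    ],
)

cal_time, completed, failed, suspended = conn.execute(
    """
    SELECT AVG(CASE WHEN status THEN calculate_time END)  AS cal_time,
           COUNT(CASE WHEN status THEN 1 END)             AS completed,
           COUNT(CASE WHEN NOT status THEN 1 END)         AS failed,
           COUNT(CASE WHEN status IS NULL THEN 1 END)     AS suspended
    FROM test
    """
).fetchone()
print(cal_time, completed, failed, suspended)
```

Only the four rows with a true status contribute to the average, (10 + 15 + 20 + 20) / 4 = 16.25, while the three counts still see all ten rows.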

Adding to @Tim's answer: since Postgres 9.4 you can add a FILTER clause to aggregate function calls, which may save you some of the boilerplate of writing your own CASE expressions:
select
avg(calculate_time) filter (where status) as cal_time,
count(*) filter (where status) as completed,
count(*) filter (where not status) as failed,
count(*) filter (where status is null) as suspended
from test;

SQL Multiple Rows to Single Row Multiple Columns

I am including a SQLFiddle to show where I currently am. In the example image you can see that with a simple GROUP BY you get up to two lines per user, depending on which statuses they have.
http://sqlfiddle.com/#!3/9aa649/2
The way I want it to come out is shown in the image below: a single line per user with two total columns, one for Fail Total and one for Pass Total. I have been able to come close, but since BOB only has Fails and no Passes, this query leaves BOB out of the results. I want BOB to show up as well, with his 6 Fail and 0 Pass.
select a.PersonID,a.Name,a.Totals as FailTotal,b.Totals as PassTotals from (
select PersonID,Name,Status, COUNT(*) as Totals from UserReport
where Status = 'Fail'
group by PersonID,Name,Status) a
join
(
select PersonID,Name,Status, COUNT(*) as Totals from UserReport
where Status = 'Pass'
group by PersonID,Name,Status) b
on a.PersonID=b.PersonID
The below picture is what I want it to look like. Here is another SQL Fiddle that shows the above query in action
http://sqlfiddle.com/#!3/9aa649/13

Use conditional aggregation if the number of values for status column is fixed.
select PersonID,Name,
sum(case when "status" = 'Fail' then 1 else 0 end) as failedtotal,
sum(case when "status" = 'Pass' then 1 else 0 end) as passedtotals
from UserReport
group by PersonID,Name

Use conditional aggregation:
select PersonID, Name,
sum(case when Status = 'Fail' then 1 else 0 end) as FailedTotal,
sum(case when Status = 'Pass' then 1 else 0 end) as PassedTotal
from UserReport
group by PersonID, Name;
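A minimal sketch of this on SQLite from Python, showing that a user with only Fail rows still gets a line. BOB's six Fail rows follow the question; ALICE and her rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserReport (PersonID INTEGER, Name TEXT, Status TEXT)")
rows = [(1, "BOB", "Fail")] * 6  # BOB has 6 Fail and no Pass, as in the question
rows += [(2, "ALICE", "Pass"), (2, "ALICE", "Fail"), (2, "ALICE", "Pass")]  # hypothetical user
conn.executemany("INSERT INTO UserReport VALUES (?, ?, ?)", rows)

# One pass over the table: each SUM counts only the rows with its own status,
# so no join is needed and users with a single status are not dropped.
totals = conn.execute(
    """
    SELECT PersonID, Name,
           SUM(CASE WHEN Status = 'Fail' THEN 1 ELSE 0 END) AS FailedTotal,
           SUM(CASE WHEN Status = 'Pass' THEN 1 ELSE 0 END) AS PassedTotal
    FROM UserReport
    GROUP BY PersonID, Name
    ORDER BY PersonID
    """
).fetchall()
print(totals)
```

Because every user appears in the single grouped scan, BOB is listed with 6 and 0 instead of being filtered out by an inner join.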

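With conditional aggregation (COALESCE turns the NULL that SUM returns for a user with no matching rows into 0):
select PersonID,
Name,
coalesce(sum(case when Status = 'Fail' then 1 end), 0) as Failed,
coalesce(sum(case when Status = 'Pass' then 1 end), 0) as Passed
from UserReport
group by PersonID, Name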
