count boolean column, and average another column based on boolean column - sql

CREATE TABLE test (
calculate_time int4 NULL,
status bool NULL
);
INSERT INTO test (calculate_time,status) VALUES
(10,true)
,(15,true)
,(20,true)
,(20,true)
,(5,false)
,(10,false)
,(15,false)
,(100,NULL)
,(200,NULL)
,(300,NULL)
;
This query averages all calculate_time values. Is there a way to tell it to average only the rows where status = true? I tried adding a WHERE clause, but that makes failed and suspended come out as 0.
select
avg(calculate_time) as cal_time,
count(case when status = true then 1 end) as completed,
count(case when status = false then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;

You seem to understand the concept of conditional aggregation. You can use a CASE expression for the average as well, just as you did for the other terms in your select:
select
avg(case when status then calculate_time end) as cal_time,
count(case when status then 1 end) as completed,
count(case when not status then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;
This works because a CASE expression without an ELSE branch returns NULL when no condition matches, and the AVG function, like most other aggregate functions, ignores NULL values. So for the records whose status is not true, the calculate_time values are ignored and do not influence the overall average.
Side note: You may use boolean values in a Postgres query directly without comparing them to true. That is, the following two CASE expressions are equivalent, with the second one being more terse:
avg(case when status = true then calculate_time end) as cal_time,
avg(case when status then calculate_time end) as cal_time,
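To see the NULL-skipping behaviour on the question's own data, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for Postgres, with the boolean column stored as 1/0/NULL; the SQL itself mirrors the query above.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; booleans are stored as 1/0/NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (calculate_time INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO test (calculate_time, status) VALUES (?, ?)",
    [(10, 1), (15, 1), (20, 1), (20, 1),
     (5, 0), (10, 0), (15, 0),
     (100, None), (200, None), (300, None)],
)

# The first column is the unconditional average from the question;
# the remaining columns use conditional aggregation.
result = conn.execute("""
    SELECT avg(calculate_time)                            AS avg_all,
           avg(CASE WHEN status THEN calculate_time END)  AS cal_time,
           count(CASE WHEN status THEN 1 END)             AS completed,
           count(CASE WHEN NOT status THEN 1 END)         AS failed,
           count(CASE WHEN status IS NULL THEN 1 END)     AS suspended
    FROM test
""").fetchone()

print(result)  # (69.5, 16.25, 4, 3, 3)
```

The unconditional average is pulled up by the suspended rows, while the conditional one only sees the four rows where status is true.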

Adding to @Tim's answer: since Postgres 9.4 you can add a FILTER clause to aggregate function calls, which may save you some of the boilerplate of writing your own CASE expressions:
select
avg(calculate_time) filter (where status) as cal_time,
count(*) filter (where status) as completed,
count(*) filter (where not status) as failed,
count(*) filter (where status is null) as suspended
from test;
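As a quick check that the FILTER form and the CASE form agree, here is a small sketch. It assumes SQLite as a stand-in for Postgres (SQLite accepts FILTER on aggregates from version 3.30, so an older bundled library would reject the first query).

```python
import sqlite3

# Assumption: SQLite >= 3.30 stands in for Postgres >= 9.4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (calculate_time INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO test (calculate_time, status) VALUES (?, ?)",
    [(10, 1), (15, 1), (20, 1), (20, 1),
     (5, 0), (10, 0), (15, 0),
     (100, None), (200, None), (300, None)],
)

# Aggregates restricted with FILTER (WHERE ...).
filtered = conn.execute("""
    SELECT avg(calculate_time) FILTER (WHERE status),
           count(*) FILTER (WHERE status),
           count(*) FILTER (WHERE NOT status),
           count(*) FILTER (WHERE status IS NULL)
    FROM test
""").fetchone()

# The same aggregates written with CASE expressions.
cased = conn.execute("""
    SELECT avg(CASE WHEN status THEN calculate_time END),
           count(CASE WHEN status THEN 1 END),
           count(CASE WHEN NOT status THEN 1 END),
           count(CASE WHEN status IS NULL THEN 1 END)
    FROM test
""").fetchone()

print(filtered == cased)  # True
```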

Related

SQL Group by - Aggregate a column based on custom logic

I have come across a case where I need to aggregate the table based on custom logic.
Student_id | Subject | Result
-----------+---------+--------
123        | Maths   | FAILED
123        | English | PASS
456        | Maths   | PASS
456        | English | PASS
Now I want the result in the following way:
Student_id | overall_result
-----------+----------------
123        | FAILED (if a student failed in any subject, the overall result is FAILED)
456        | PASS
Any idea how I can do this using SQL?
EDITED: There can be other values too, like GRACE, SUPPLYMENT. A student passes if and only if all values are PASS; otherwise the result is FAILED.
If your field "Result" allows only "PASS" and "FAILED" values, you can use the aggregate function MIN to retrieve the smaller of the two. "FAILED" sorts before "PASS" alphabetically, so if there's at least one "FAILED", it will be returned.
SELECT Student_id, MIN(Result)
FROM tab
GROUP BY Student_id
If other values are possible too, you can map any value other than "PASS" to "FAILED", then take the MIN as in the previous query.
SELECT Student_id,
MIN(CASE WHEN Result = 'PASS' THEN 'PASS'
ELSE 'FAILED'
END) AS Result
FROM tab
GROUP BY Student_id
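The difference between the two MIN queries shows up as soon as a third value appears. The sketch below assumes SQLite as the engine and adds a hypothetical student 789 with a GRACE result, which is not part of the original sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Student_id INTEGER, Subject TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(123, "Maths", "FAILED"), (123, "English", "PASS"),
     (456, "Maths", "PASS"), (456, "English", "PASS"),
     # Hypothetical extra student, added only to exercise the GRACE case.
     (789, "Maths", "PASS"), (789, "English", "GRACE")],
)

# Plain MIN relies on alphabetical order, so GRACE leaks through.
plain = conn.execute("""
    SELECT Student_id, MIN(Result)
    FROM tab GROUP BY Student_id ORDER BY Student_id
""").fetchall()

# Mapping every non-PASS value to FAILED first gives the intended result.
mapped = conn.execute("""
    SELECT Student_id,
           MIN(CASE WHEN Result = 'PASS' THEN 'PASS' ELSE 'FAILED' END)
    FROM tab GROUP BY Student_id ORDER BY Student_id
""").fetchall()

print(plain)   # [(123, 'FAILED'), (456, 'PASS'), (789, 'GRACE')]
print(mapped)  # [(123, 'FAILED'), (456, 'PASS'), (789, 'FAILED')]
```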
We can use COUNT or SUM with CASE.
If the count of FAILED rows is greater than 0, the overall result is FAILED; otherwise it is PASS.
SELECT Student_id,
CASE
WHEN COUNT(CASE WHEN Result = 'FAILED' THEN 1 END) > 0
THEN 'FAILED'
ELSE 'PASS' END AS overall_result
FROM tab
GROUP BY Student_id;
If we use MySQL, the above query can be simplified to:
SELECT Student_id,
CASE
WHEN SUM(Result = 'FAILED') > 0
THEN 'FAILED'
ELSE 'PASS' END AS overall_result
FROM tab
GROUP BY Student_id;

Add a category column to dataset based on some statuses in another column

I have a table with a 'status' column for every 'workflow'. Status can be one of 'Ran', 'Successful', 'Failed'. Note that every workflow that runs will have a 'Ran' row, and there will be separate rows in the table describing the end result of the workflow, i.e. whether it was 'Failed' or 'Successful'. It's even possible that nothing was captured as the final status of the workflow.
I only want to keep the rows that have 'Ran' status, and add one more column to those rows that divides the 'Ran' rows into three categories: 'Successful', 'Failed', 'Unknown'. I am only interested in the right proportion of 'Ran' rows being assigned to each category. Since all the statuses are captured in the same column, I am having a hard time writing a query for this. Note that I only want to use the status column, because the other columns are unreliable.
Sample Table Data:
status
------
Ran
Ran
Ran
Ran
Ran
Successful
Failed
Ran
Ran
Failed
Desired Output:
status | category
-------+--------------
Ran    | 'Failed'
Ran    | 'Failed'
Ran    | 'Successful'
Ran    | 'Unknown'
Ran    | 'Unknown'
Ran    | 'Unknown'
Ran    | 'Unknown'
As you can see above, I keep only the rows that had 'Ran' status and add another column. The number of 'Failed' statuses in the original data (2) leads to 2 rows getting the 'Failed' category, and likewise for 'Successful'. Any 'Ran' rows left over after filling 'Failed' and 'Successful' are assigned the 'Unknown' category. It does not matter to me which row gets which category, as long as the number of rows per category is right. I know for a fact that the number of 'Ran' rows will always be greater than the sum of 'Failed' and 'Successful' rows.
No idea about the performance but this may work:
SELECT status,
case when r <= (select count(*) from runs where status = 'Failed') then 'Failed'
when (r > (select count(*) from runs where status = 'Failed') and r <= (select count(*) from runs where status in ('Failed','Successful'))) then 'Successful'
else 'Unknown'
end as category
FROM (
SELECT
ROW_NUMBER() OVER () AS R,
runs.*
FROM runs
WHERE status = 'Ran'
) as result
Tested with Apache Derby, but I hope you get the idea. ROW_NUMBER also exists in Redshift.
My output:
STATUS CATEGORY
Ran Failed
Ran Failed
Ran Successful
Ran Unknown
Ran Unknown
Ran Unknown
Ran Unknown
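The row-numbering idea can be tried out with the sample data from the question. This is only a sketch under the assumption that SQLite (3.25 or later, for window functions) behaves like Derby or Redshift here; the table name runs is taken from the query above.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (status TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?)",
    [("Ran",)] * 5 + [("Successful",), ("Failed",)] + [("Ran",)] * 2 + [("Failed",)],
)

# Number the 'Ran' rows, then hand out categories by comparing the row
# number with the counts of 'Failed' and 'Successful' rows.
rows = conn.execute("""
    SELECT status,
           CASE
             WHEN r <= (SELECT count(*) FROM runs WHERE status = 'Failed')
               THEN 'Failed'
             WHEN r <= (SELECT count(*) FROM runs
                        WHERE status IN ('Failed', 'Successful'))
               THEN 'Successful'
             ELSE 'Unknown'
           END AS category
    FROM (SELECT ROW_NUMBER() OVER () AS r, status
          FROM runs
          WHERE status = 'Ran') AS ranked
""").fetchall()

per_category = dict(Counter(category for _, category in rows))
print(per_category)
```

The second WHEN does not need the explicit lower bound, because CASE stops at the first branch that matches.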
You can use window functions. Let me assume you have two more columns, workflowid and timestamp. You want this information per workflow based on the timestamp.
Then:
select t.*,
(case when min(case when status in ('Successful', 'Failed') then timestamp end) over (partition by workflowid order by timestamp) is null
then 'Failed'
when timestamp >= min(case when status = 'Successful' then timestamp end) over (partition by workflowid) and
timestamp < min(case when status = 'Failed' then timestamp end) over (partition by workflowid)
then 'Successful'
else 'Unknown'
end) as new_status
from t

SQL Count Distinct returning one extra count

How is this possible that these two methods are returning different results?
Method 1 (returns correct count):
SELECT COUNT(DISTINCT contact_id)
FROM Traffic_Action
WHERE action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost')
Method 2 (returns one extra count):
SELECT COUNT(DISTINCT CASE WHEN action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost') THEN contact_id ELSE 0 END)
FROM Traffic_Action
Remove the ELSE part, because the 0 it returns is also counted as a distinct value:
SELECT COUNT(DISTINCT CASE WHEN
action_type in ('Schedule a Tour','Schedule Follow-up','Lost') THEN contact_id END)
FROM Traffic_Action
No wonder you are getting two different results.
First query:
Provides you the distinct count of records where action_type in Schedule a Tour, Schedule Follow-up and Lost
SELECT COUNT(DISTINCT contact_id) FROM Traffic_Action WHERE action_type in
('Schedule a Tour','Schedule Follow-up','Lost')
Second query:
In this query, every row whose action_type is not Schedule a Tour, Schedule Follow-up or Lost is turned into 0 by the CASE expression, and after DISTINCT that 0 contributes one extra value to the count
SELECT COUNT(DISTINCT CASE WHEN action_type in ('Schedule a Tour','Schedule Follow-up','Lost')
THEN contact_id ELSE 0 END) FROM Traffic_Action
In simple words,
In first query you are filtering only three values
In second query you have no filters, but case statement on three values and else condition to return 0 for non matching criteria
Normally, COUNT() ignores NULL values, but 0 is not NULL. Your second query turns every row whose action_type does not match into 0 via the "ELSE" branch, and that 0 is counted as one extra distinct value (assuming no matching contact_id is itself 0). That should be why you see a difference.
You can quickly see for yourself how NULL is ignored in this example. This will return 2 although there are 3 records:
select count(distinct a.col1)
from (
select 1 as Col1
union select 2
union select NULL
) a
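The three variants discussed in this thread can be compared side by side. The sketch below assumes SQLite as the engine; the rows, including the non-matching action type 'Email', are hypothetical and only serve to reproduce the off-by-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Traffic_Action (contact_id INTEGER, action_type TEXT)")
conn.executemany(
    "INSERT INTO Traffic_Action VALUES (?, ?)",
    # Hypothetical rows: contacts 1 and 2 match, contact 3 does not.
    [(1, "Lost"), (1, "Schedule a Tour"), (2, "Lost"), (3, "Email")],
)

wanted = "('Schedule a Tour', 'Schedule Follow-up', 'Lost')"

# Method 1: filter first, then count distinct contacts.
with_where = conn.execute(
    f"SELECT COUNT(DISTINCT contact_id) FROM Traffic_Action "
    f"WHERE action_type IN {wanted}"
).fetchone()[0]

# Method 2: ELSE 0 turns non-matching rows into a countable value.
with_else_zero = conn.execute(
    f"SELECT COUNT(DISTINCT CASE WHEN action_type IN {wanted} "
    f"THEN contact_id ELSE 0 END) FROM Traffic_Action"
).fetchone()[0]

# Fix: without ELSE, non-matching rows become NULL and are ignored.
without_else = conn.execute(
    f"SELECT COUNT(DISTINCT CASE WHEN action_type IN {wanted} "
    f"THEN contact_id END) FROM Traffic_Action"
).fetchone()[0]

print(with_where, with_else_zero, without_else)  # 2 3 2
```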

SQL Multiple Rows to Single Row Multiple Columns

I am including a SQLFiddle to show where I am currently at. In the example image you can see that with simple grouping you get up to two lines per user, depending on which statuses they have.
http://sqlfiddle.com/#!3/9aa649/2
I want the output to look like the image below: a single line per user with two total columns, one for Fail Total and one for Pass Total. I have been able to come close, but since BOB only has Fails and no Passes, this query leaves BOB out of the results. I want BOB to show up as well, with his 6 Fail and 0 Pass.
select a.PersonID,a.Name,a.Totals as FailTotal,b.Totals as PassTotals from (
select PersonID,Name,Status, COUNT(*) as Totals from UserReport
where Status = 'Fail'
group by PersonID,Name,Status) a
join
(
select PersonID,Name,Status, COUNT(*) as Totals from UserReport
where Status = 'Pass'
group by PersonID,Name,Status) b
on a.PersonID=b.PersonID
The below picture is what I want it to look like. Here is another SQL Fiddle that shows the above query in action
http://sqlfiddle.com/#!3/9aa649/13
Use conditional aggregation if the number of values for status column is fixed.
select PersonID,Name,
sum(case when "status" = 'Fail' then 1 else 0 end) as failedtotal,
sum(case when "status" = 'Pass' then 1 else 0 end) as passedtotals
from UserReport
group by PersonID,Name
Use conditional aggregation:
select PersonID, Name,
sum(case when Status = 'Fail' then 1 else 0 end) as FailedTotal,
sum(case when Status = 'Pass' then 1 else 0 end) as PassedTotal
from UserReport
group by PersonID, Name;
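With conditional aggregation:
select PersonID,
Name,
-- without an ELSE branch, sum() returns NULL (not 0) when no row matches;
-- wrap it in coalesce(..., 0) if you need a 0 instead
sum(case when Status = 'Fail' then 1 end) as Failed,
sum(case when Status = 'Pass' then 1 end) as Passed
from UserReport
group by PersonID, Name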
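To make the BOB problem concrete, here is a sketch that runs both the inner-join query from the question and the conditional aggregation on a small data set. It assumes SQLite instead of SQL Server, and the user ALICE with her counts is hypothetical; only BOB's 6 Fail and 0 Pass come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserReport (PersonID INTEGER, Name TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO UserReport VALUES (?, ?, ?)",
    # ALICE is a hypothetical user with both statuses; BOB only has Fails.
    [(1, "ALICE", "Fail")] * 2 + [(1, "ALICE", "Pass")] * 3
    + [(2, "BOB", "Fail")] * 6,
)

# The inner join needs a row on both sides, so BOB disappears.
joined = conn.execute("""
    SELECT a.PersonID, a.Name, a.Totals, b.Totals
    FROM (SELECT PersonID, Name, COUNT(*) AS Totals FROM UserReport
          WHERE Status = 'Fail' GROUP BY PersonID, Name) a
    JOIN (SELECT PersonID, Name, COUNT(*) AS Totals FROM UserReport
          WHERE Status = 'Pass' GROUP BY PersonID, Name) b
      ON a.PersonID = b.PersonID
    ORDER BY a.PersonID
""").fetchall()

# Conditional aggregation keeps one row per user and reports 0 for BOB.
pivoted = conn.execute("""
    SELECT PersonID, Name,
           SUM(CASE WHEN Status = 'Fail' THEN 1 ELSE 0 END) AS FailedTotal,
           SUM(CASE WHEN Status = 'Pass' THEN 1 ELSE 0 END) AS PassedTotal
    FROM UserReport
    GROUP BY PersonID, Name
    ORDER BY PersonID
""").fetchall()

print(joined)   # [(1, 'ALICE', 2, 3)]
print(pivoted)  # [(1, 'ALICE', 2, 3), (2, 'BOB', 6, 0)]
```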

Rails aggregate query counting rows that satisfy certain conditions

Let's say that I have a table called bets. I want to run an aggregate SQL query that counts rows satisfying certain conditions. For example, I want to return a count of all bets won, all bets lost, etc. I also want these counts to be grouped by several different columns. I tried a few different queries, but I am not getting the results I'd expect. For example:
Bet.select("user_id, event_id, bet_line_id, pick, COUNT(state = 'won') AS bets_won,
COUNT(state = 'lost') AS bets_lost, COUNT(state = 'pushed') AS bets_pushed").
group('user_id, event_id, bet_line_id, pick')
This just gives me the result of "1" for bets_won, bets_lost, or bets_pushed for every record returned. I am using Rails 3.2 + Postgres.
You have to pass a CASE expression that returns NULL for the rows you do not want counted; COUNT then counts only the matching rows (and returns a bigint).
Bet.select("user_id, event_id, bet_line_id, pick,
COUNT(CASE WHEN state = 'won' then 1 ELSE null END) AS bets_won,
COUNT(CASE WHEN state = 'lost' then 1 ELSE null END) AS bets_lost,
COUNT(CASE WHEN state = 'pushed' then 1 ELSE null END) AS bets_pushed").
group('user_id, event_id, bet_line_id, pick')
count(expression) is defined to count the "number of input rows for which the value of expression is not null". The state = 'won' expression evaluates to NULL only when state is null; otherwise it will be one of the boolean values TRUE or FALSE. The result is that count(state = 'won') is actually counting the number of rows where state is not null, and that's not what you're trying to do.
You can use Paritosh Piplewar's solution. Another common approach is to use sum and case:
sum(case state when 'won' then 1 else 0 end) as bets_won,
sum(case state when 'lost' then 1 else 0 end) as bets_lost,
sum(case state when 'pushed' then 1 else 0 end) as bets_pushed
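The difference between the three spellings is easy to check on a handful of rows. This sketch assumes SQLite rather than Postgres (both count only non-NULL values), and the five bets are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bets (user_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO bets VALUES (?, ?)",
    # Hypothetical bets for one user; the last one has no state yet.
    [(1, "won"), (1, "won"), (1, "lost"), (1, "pushed"), (1, None)],
)

counts = conn.execute("""
    SELECT COUNT(state = 'won')                             AS non_null_states,
           COUNT(CASE WHEN state = 'won' THEN 1 END)        AS won_by_count,
           SUM(CASE state WHEN 'won' THEN 1 ELSE 0 END)     AS won_by_sum
    FROM bets
    GROUP BY user_id
""").fetchone()

# COUNT(state = 'won') counts every row with a non-NULL state (4),
# while the two CASE forms count only the bets that were won (2).
print(counts)  # (4, 2, 2)
```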