Multiple Aggregate functions with a group by clause - sql

I have the following data:
WorkflowID FK_UA DateApprobation
----------- -------------------- -----------------------
1 3 NULL
2 1 NULL
3 1 NULL
4 2 2013-05-31 09:22:33.000
What I'm looking to do is get several aggregate fields: approbated workflows, non-approbated workflows, and all workflows.
I can tell them apart by whether the "DateApprobation" field is NULL or has a value.
The thing is, I want to group the results by "FK_UA", and I don't know how to have three aggregate functions (COUNT) with a GROUP BY clause.
I'm looking for a query that can achieve that. I've tried a couple of similar cases I found, and they returned some weird values.
I tried this:
SELECT
FK_UA
,COUNT(WorkflowID) AS TOTAL
,COUNT(CASE when DateApprobation is not null then 1 else 0 end) AS APPROVED
,COUNT(CASE when DateApprobation is null then 1 else 0 end) AS NOT_APPROVED
FROM Workflow
GROUP BY
FK_UA
but it always returns the same value for all three columns!

SELECT
FK_UA,
SUM(CASE WHEN [DateApprobation] IS NOT NULL THEN 1 ELSE 0 END) as [Approbated count],
SUM(CASE WHEN [DateApprobation] IS NULL THEN 1 ELSE 0 END) as [Non-Approbated count],
COUNT(*) as [Total]
FROM YourTable
GROUP BY FK_UA
If I understood you correctly...

A standard SQL solution using COUNT()
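You could also have used COUNT(), but make sure you turn the values you don't want to count into NULL, not 0, because aggregate functions skip NULL values in SQL. COUNT(expression) counts every row where the expression is not NULL, and 0 is not NULL, which is why your query returns the group total three times.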
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 END) AS approved,
COUNT(CASE WHEN DateApprobation IS NULL THEN 1 END) AS not_approved
FROM Workflow
GROUP BY fk_ua
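A minimal sketch of why ELSE 0 breaks the count, using Python's built-in sqlite3 module with SQLite standing in for the asker's database (an assumption, chosen only so the sketch runs anywhere) and the four sample rows from the question:

```python
import sqlite3

# In-memory stand-in for the Workflow table, holding the four sample rows
# from the question (DateApprobation is stored as plain TEXT here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Workflow (WorkflowID INTEGER, FK_UA INTEGER, DateApprobation TEXT)")
conn.executemany(
    "INSERT INTO Workflow VALUES (?, ?, ?)",
    [(1, 3, None), (2, 1, None), (3, 1, None), (4, 2, "2013-05-31 09:22:33.000")],
)

# The question's query: ELSE 0 makes every CASE result non-NULL,
# so each COUNT(...) counts every row in the group.
broken = conn.execute("""
    SELECT FK_UA,
           COUNT(WorkflowID),
           COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 ELSE 0 END),
           COUNT(CASE WHEN DateApprobation IS NULL THEN 1 ELSE 0 END)
    FROM Workflow
    GROUP BY FK_UA
    ORDER BY FK_UA
""").fetchall()

# The fix: without ELSE, non-matching rows yield NULL, which COUNT skips.
fixed = conn.execute("""
    SELECT FK_UA,
           COUNT(WorkflowID),
           COUNT(CASE WHEN DateApprobation IS NOT NULL THEN 1 END),
           COUNT(CASE WHEN DateApprobation IS NULL THEN 1 END)
    FROM Workflow
    GROUP BY FK_UA
    ORDER BY FK_UA
""").fetchall()

print(broken)  # [(1, 2, 2, 2), (2, 1, 1, 1), (3, 1, 1, 1)]
print(fixed)   # [(1, 2, 0, 2), (2, 1, 1, 0), (3, 1, 0, 1)]
```

With ELSE 0, every column just repeats the group size, which matches the symptom described in the question.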
In fact, you could take this one step further in your case, because COUNT(DateApprobation) already counts only the rows where DateApprobation is NOT NULL:
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(DateApprobation) AS approved,
COUNT(WorkflowID) - COUNT(DateApprobation) AS not_approved
FROM Workflow
GROUP BY fk_ua
Or alternatively:
SELECT fk_ua, total, approved, total - approved AS not_approved
FROM (
SELECT
fk_ua,
COUNT(WorkflowID) AS total,
COUNT(DateApprobation) AS approved
FROM Workflow
GROUP BY fk_ua
) t
For large data sets, this might be slightly faster as your database should be able to recognise that there are only 2 distinct COUNT(...) expressions. Most commercial databases do.
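The same kind of in-memory SQLite setup (still an assumption, not the asker's DBMS) can confirm that the COUNT(column) subtraction and the derived-table form return identical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Workflow (WorkflowID INTEGER, FK_UA INTEGER, DateApprobation TEXT)")
conn.executemany(
    "INSERT INTO Workflow VALUES (?, ?, ?)",
    [(1, 3, None), (2, 1, None), (3, 1, None), (4, 2, "2013-05-31 09:22:33.000")],
)

# COUNT(column) skips NULLs, so COUNT(DateApprobation) is the approved count
# and subtracting it from the total gives the non-approved count.
direct = conn.execute("""
    SELECT fk_ua,
           COUNT(WorkflowID) AS total,
           COUNT(DateApprobation) AS approved,
           COUNT(WorkflowID) - COUNT(DateApprobation) AS not_approved
    FROM Workflow
    GROUP BY fk_ua
    ORDER BY fk_ua
""").fetchall()

# Derived-table form: each COUNT is written once, the difference is taken outside.
derived = conn.execute("""
    SELECT fk_ua, total, approved, total - approved AS not_approved
    FROM (
        SELECT fk_ua, COUNT(WorkflowID) AS total, COUNT(DateApprobation) AS approved
        FROM Workflow
        GROUP BY fk_ua
    ) t
    ORDER BY fk_ua
""").fetchall()

print(direct)  # [(1, 2, 0, 2), (2, 1, 1, 0), (3, 1, 0, 1)]
print(derived == direct)  # True
```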
A standard SQL solution using FILTER
Some SQL dialects, e.g. PostgreSQL, implement the standard FILTER clause, which you can use to make things a bit more readable. Your query would then read:
SELECT
fk_ua,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE DateApprobation IS NOT NULL) AS approved,
COUNT(*) FILTER (WHERE DateApprobation IS NULL) AS not_approved
FROM Workflow
GROUP BY fk_ua

Related

Proportion request sql

There is a table of accidents, and I need to output the share of accidents with reason 2 among all accidents. I wrote this code, but I cannot make it work:
select ((select count("ID") from "DTP" where "REASON"=2)/count("REASON"))
from "DTP"
group by "ID"
Something like this (not tested):
select id, count(case reason when 2 then 1 end) * 1.0 / count(*) as proportion -- * 1.0 avoids integer division
from your_table
-- where ... (if you need to filter, for example by date)
group by id
;
count(*) counts all the rows in a group (that is, all the rows for each separate id). The case expression returns 1 when the reason is 2 and it returns null otherwise; count counts only non-null values, so it will count the rows where the reason is 2.
You can use avg():
select "ID",
avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP"
group by "ID"
This produces the ratio for each id -- based on your sample query. If you only want one row for all the data, then:
select avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP";
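A small sketch of the integer-division pitfall that affects the COUNT(...)/COUNT(*) form, using SQLite and a few made-up accident rows (hypothetical data, since the question shows none). SQLite, like PostgreSQL and SQL Server, divides two integers as integers:

```python
import sqlite3

# Hypothetical accident rows; only the REASON column matters here.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "DTP" ("ID" INTEGER, "REASON" INTEGER)')
conn.executemany('INSERT INTO "DTP" VALUES (?, ?)', [(1, 2), (2, 1), (3, 2), (4, 3)])

# Both COUNTs are integers, so 2 / 4 truncates to 0.
int_div = conn.execute(
    'SELECT COUNT(CASE "REASON" WHEN 2 THEN 1 END) / COUNT(*) FROM "DTP"'
).fetchone()[0]

# Multiplying by 1.0 first forces floating-point division.
ratio = conn.execute(
    'SELECT COUNT(CASE "REASON" WHEN 2 THEN 1 END) * 1.0 / COUNT(*) FROM "DTP"'
).fetchone()[0]

# AVG over a 1.0/0 indicator gives the same proportion directly.
avg_ratio = conn.execute(
    'SELECT AVG(CASE WHEN "REASON" = 2 THEN 1.0 ELSE 0 END) FROM "DTP"'
).fetchone()[0]

print(int_div, ratio, avg_ratio)  # 0 0.5 0.5
```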

Equivalent of string contains in google bigquery

I have a table like the one shown below.
I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in PostgreSQL and Google BigQuery.
I tried the following, but it doesn't work:
select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%')
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%')
then 1 else 0 end as aspirin,
from db.Team01.Table_1
SELECT
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to look like the table shown below.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
subject_id,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
Applied to the sample data from your question, the result is:
Row subject_id steroids aspirin
1 1 3 1
2 2 1 1
Note: instead of a plain LIKE, which would need lengthy and redundant text, I am using "LIKE on steroids", i.e. REGEXP_CONTAINS.
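BigQuery can't be run locally, so this sketch imitates the REGEXP_CONTAINS approach in SQLite: its X REGEXP Y operator calls an application-defined regexp(Y, X) function, registered here from Python's re module. The table name Table_1 and its rows are hypothetical, since the question's sample data isn't included here:

```python
import re
import sqlite3

# SQLite has no built-in regex function; "X REGEXP Y" calls a user-defined
# regexp(Y, X). This Python function stands in for BigQuery's REGEXP_CONTAINS.
def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Hypothetical rows; "Table_1" stands in for db.Team01.Table_1.
conn.execute("CREATE TABLE Table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO Table_1 VALUES (?, ?)", [
    (1, "Cortisol"), (1, "Hydrocortisone cream"), (1, "Paracetamol"),
    (2, "Dexamethasone"), (2, "Insulin"),
])

# One alternation pattern per output column replaces a chain of LIKEs.
rows = conn.execute("""
    SELECT subject_id,
           SUM(CASE WHEN lower(drug) REGEXP 'cortisol|cortisone|dexamethasone' THEN 1 ELSE 0 END) AS steroids,
           SUM(CASE WHEN lower(drug) REGEXP 'peptide|paracetamol' THEN 1 ELSE 0 END) AS aspirin
    FROM Table_1
    GROUP BY subject_id
    ORDER BY subject_id
""").fetchall()

print(rows)  # [(1, 2, 1), (2, 1, 0)]
```

"Hydrocortisone cream" is counted as a steroid because the pattern matches the substring "cortisone", which is the same substring behaviour the LIKE '%...%' attempt was after.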
In Postgres, I would recommend using the filter clause:
select subject_id,
count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;
In BigQuery, I would recommend countif():
select subject_id,
countif(regexp_contains(lower(drug), 'cortisol|cortisone|dexamethasone')) as steroids,
countif(regexp_contains(lower(drug), 'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;
You can use sum(case when . . . end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.
Use conditional aggregation. This is a solution that works across most (if not all) RDBMS:
SELECT
subject_id,
MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) steroids,
MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) aspirin
FROM db.Team01.Table_1
GROUP BY subject_id
NB: it is unclear why you are using LIKE, since you seem to have exact matches, so I turned the LIKE conditions into IN lists.
You are missing a GROUP BY:
select subject_id,
sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
then 1 else 0 end) as steroids,
sum(case when lower(drug) in ('peptide','paracetamol')
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
Or, using the LIKE keyword:
select subject_id,
sum(case when lower(drug) like '%cortisol%'
or lower(drug) like '%cortisone%'
or lower(drug) like '%dexamethasone%'
then 1 else 0 end) as steroids,
sum(case when lower(drug) like '%peptide%'
or lower(drug) like '%paracetamol%'
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
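To illustrate the difference between the IN and LIKE answers, and between summing and taking the MAX, here is a sketch on hypothetical rows in SQLite (an assumption, for portability). IN needs exact matches, LIKE '%...%' matches substrings, and MAX yields a 0/1 flag per subject rather than a count:

```python
import sqlite3

# Hypothetical rows: "hydrocortisone cream" contains "cortisone" but is not an exact name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO Table_1 VALUES (?, ?)", [
    (1, "cortisol"), (1, "hydrocortisone cream"), (2, "paracetamol"),
])

rows = conn.execute("""
    SELECT subject_id,
           -- exact matches only
           SUM(CASE WHEN lower(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
                    THEN 1 ELSE 0 END) AS steroids_in,
           -- substring matches, one LIKE per pattern joined with OR
           SUM(CASE WHEN lower(drug) LIKE '%cortisol%'
                      OR lower(drug) LIKE '%cortisone%'
                      OR lower(drug) LIKE '%dexamethasone%'
                    THEN 1 ELSE 0 END) AS steroids_like,
           -- MAX collapses the per-row 1/0 values into one flag per subject
           MAX(CASE WHEN lower(drug) LIKE '%cortisol%'
                      OR lower(drug) LIKE '%cortisone%'
                      OR lower(drug) LIKE '%dexamethasone%'
                    THEN 1 ELSE 0 END) AS had_steroid
    FROM Table_1
    GROUP BY subject_id
    ORDER BY subject_id
""").fetchall()

print(rows)  # [(1, 1, 2, 1), (2, 0, 0, 0)]
```

Which form is right depends on whether the drug column holds clean names (IN is enough) or free text (LIKE or a regex is needed), and on whether "binary column" means a flag (MAX) or a count (SUM).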
Another, potentially more intuitive, solution would be to use BigQuery's CONTAINS_SUBSTR function, which returns boolean results.
I haven't used BigQuery myself but have been reading its documentation while researching it. I came across this while looking into the impact of choosing a collation at the design stage.
Either I'm wrong, or this is a new feature added since the answers above were written.
CONTAINS_SUBSTR
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr
Performs a normalized, case-insensitive search to see if a value exists as a substring in an expression. Returns TRUE if the value exists, otherwise returns FALSE.
Before values are compared, they are normalized and case folded with NFKC normalization. Wildcard searches are not supported.

How to sum up all rows total count?

I'm using the following query. I need to show a grand total count, but it throws this error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
SELECT
ISNULL(OQ.GroupID,'') GroupName,
CONVERT(VARCHAR, ISNULL(COUNT(CASE WHEN RequestStatusKey IN ( 1, 2 ) THEN OrderRecordID END), 0)) TotalRecord,
SUM(COUNT(CASE WHEN RequestStatusKey IN ( 1, 2 ) THEN OrderRecordID END)) AS GrandTotal
FROM dbo.tblDesk OQ
WHERE OQ.RequestStatusKey IN ( 1, 2 )
AND OQ.OrderTypeKey <> 1
AND OQ.GroupID IS NOT NULL
GROUP BY OQ.GroupID
ORDER BY OQ.GroupID
I just need the grand total.
As the error message says, you cannot use an aggregate function inside another aggregate function.
To count the OrderRecordID rows where RequestStatusKey IN (1, 2), you can use SUM without COUNT, like this:
SUM(CASE WHEN RequestStatusKey IN (1,2) THEN 1 ELSE 0 END) AS GrandTotal
However, as Tim suggested, since you have already used RequestStatusKey IN (1,2) in your WHERE clause you don't need to use conditional SUM. Just use COUNT without condition:
COUNT(OrderRecordId) AS GrandTotal
UPDATE:
Since you want to show the sum of all the rows' counts in the same result, you can use ROLLUP:
SELECT
ISNULL(OQ.GroupID,'Grand Total') GroupName,
CONVERT(VARCHAR, COUNT(OrderRecordID)) TotalRecord
FROM tblDesk OQ
WHERE OQ.RequestStatusKey IN ( 1, 2 )
AND OQ.OrderTypeKey <> 1
AND OQ.GroupID IS NOT NULL
GROUP BY ROLLUP (OQ.GroupID)
ORDER BY GROUPING(OQ.GroupID), OQ.GroupID
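SQLite has no ROLLUP, so as a portable sketch under that assumption, the grand-total row can be emulated with UNION ALL. The tblDesk rows below are hypothetical:

```python
import sqlite3

# Hypothetical tblDesk rows; only the columns used by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblDesk (
    OrderRecordID INTEGER, GroupID TEXT, RequestStatusKey INTEGER, OrderTypeKey INTEGER)""")
conn.executemany("INSERT INTO tblDesk VALUES (?, ?, ?, ?)", [
    (1, "A", 1, 2), (2, "A", 2, 2), (3, "B", 1, 2),
    (4, "B", 3, 2),  # filtered out: status not in (1, 2)
    (5, "C", 1, 1),  # filtered out: OrderTypeKey = 1
])

conditions = "RequestStatusKey IN (1, 2) AND OrderTypeKey <> 1 AND GroupID IS NOT NULL"

# Per-group counts plus one grand-total row; UNION ALL plays the role of ROLLUP,
# and is_total keeps the grand total at the bottom.
rows = conn.execute(f"""
    SELECT GroupName, TotalRecord FROM (
        SELECT GroupID AS GroupName, COUNT(OrderRecordID) AS TotalRecord, 0 AS is_total
        FROM tblDesk WHERE {conditions}
        GROUP BY GroupID
        UNION ALL
        SELECT 'Grand Total', COUNT(OrderRecordID), 1
        FROM tblDesk WHERE {conditions}
    ) t
    ORDER BY is_total, GroupName
""").fetchall()

print(rows)  # [('A', 2), ('B', 1), ('Grand Total', 3)]
```

No aggregate is nested inside another: the grand total is a second, independent COUNT over the same filtered rows.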

How to Count Distinct on Case When?

I have been building a query today and I have got stuck. I have two unique IDs that identify whether an order is internal or web. I have been able to split this out so that it counts how many times each appears, but unfortunately it is not giving me the intended result. Based on my research, I have tried creating a COUNT DISTINCT CASE WHEN statement to get the results.
Please see below where I have broken down what it is doing and how I expect it to be.
Original data looks like:
Company Name Order Date Order Items Orders Value REF
-------------------------------------------------------------------------------
CompanyA 03/01/2019 Item1 Order1 170 INT1
CompanyA 03/01/2019 Item2 Order1 0 INT1
CompanyA 03/01/2019 Item3 Order2 160 WEB2
CompanyA 03/01/2019 Item4 Order2 0 WEB2
How I expect it to be:
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 1 1
What currently comes out
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 2 2
As you can see from my current result, it is counting every line even though the lines share the same reference. It is not a hard and fast rule that it is always doubled up, which is why I think I need a COUNT DISTINCT CASE WHEN. Below is the query I am currently using. It pulls from a Progress V10 database over ODBC, which I connect to through Excel. Unfortunately I do not have SSMS, and Microsoft Query is just useless.
My Current SQL:
SELECT
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
, Count(DISTINCT SopOrder_0.SooOrderNumber) AS 'Orders'
, SUM(CASE WHEN SopOrder_0.SooOrderNumber IS NOT NULL THEN 1 ELSE 0 END) AS 'Order Items'
, SUM(SopOrderItem_0.SoiValue) AS 'Order Value'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN 1 ELSE 0 END) AS 'INT'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'WEB%' THEN 1 ELSE 0 END) AS 'WEB'
FROM
SBS.PUB.Company Company_0
, SBS.PUB.SopOrder SopOrder_0
, SBS.PUB.SopOrderItem SopOrderItem_0
WHERE
SopOrder_0.SopOrderID = SopOrderItem_0.SopOrderID
AND Company_0.CompanyID = SopOrder_0.CompanyID
AND SopOrder_0.SooOrderDate > '2019-01-01'
GROUP BY
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
I have tried using the following line, but it errors when importing:
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE 0 END) AS 'INT'
For reference, the error I currently get when importing is: syntax error at or about "CASE WHEN sopOrder_0.SooParentOrderRefer" (10713)
Try removing the ELSE:
COUNT(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference END) AS num_int
You don't specify the error, but the problem is probably that the THEN is returning a string and the ELSE a number -- so there is an attempt to convert the string values to a number.
Also, learn to use proper, explicit, standard JOIN syntax. Simple rule: Never use commas in the FROM clause.
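Applied to the question's four sample rows (flattened into one hypothetical SQLite table, items, for illustration), COUNT(DISTINCT CASE ... END) without an ELSE returns 1 for each reference type, while a plain SUM of the CASE still counts item rows:

```python
import sqlite3

# The question's sample rows, flattened into one hypothetical table
# (one row per order item).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE items (
    company TEXT, order_date TEXT, item TEXT, order_no TEXT, value INTEGER, ref TEXT)""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", [
    ("CompanyA", "03/01/2019", "Item1", "Order1", 170, "INT1"),
    ("CompanyA", "03/01/2019", "Item2", "Order1", 0, "INT1"),
    ("CompanyA", "03/01/2019", "Item3", "Order2", 160, "WEB2"),
    ("CompanyA", "03/01/2019", "Item4", "Order2", 0, "WEB2"),
])

row = conn.execute("""
    SELECT company, order_date,
           COUNT(*) AS order_items,
           COUNT(DISTINCT order_no) AS orders,
           SUM(value) AS order_value,
           -- counts item rows, so a reference is counted once per item
           SUM(CASE WHEN ref LIKE 'WEB%' THEN 1 ELSE 0 END) AS web_rows,
           -- no ELSE: non-matching rows give NULL, which COUNT(DISTINCT) ignores
           COUNT(DISTINCT CASE WHEN ref LIKE 'WEB%' THEN ref END) AS web_orders,
           COUNT(DISTINCT CASE WHEN ref LIKE 'INT%' THEN ref END) AS int_orders
    FROM items
    GROUP BY company, order_date
""").fetchone()

print(row)  # ('CompanyA', '03/01/2019', 4, 2, 330, 2, 1, 1)
```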
Count distinct on SooOrderNumber or SooParentOrderReference, whichever makes more sense for you.
If you are using COUNT, you need to make NULL the value for the rows you are not counting. I prefer to include an ELSE in the CASE because it is more consistent and complete.
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE null END) AS 'INT'
Gordon Linoff is correct about the source of your error, i.e. a data type mismatch between the THEN value and the ELSE value of the CASE expression. Using NULL removes (or should remove) this ambiguity; I'd need to double-check.
Editing my earlier answer...
Even though it looks, as you say, like count distinct is not supported in Pervasive PSQL, CTEs are supported. So you can do something like...
This is what you are trying to do but it is not supported...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
select id
,count(distinct col1) as col_count
from dups
group by id;
Stick another CTE in the query to de-duplicate the data first. Then count as normal. That should work...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
,de_dup as
(
select id
,col1
from dups
group by id
,col1
)
select id
,count(col1) as col_count
from de_dup
group by id;
These 2 versions should give the same result set.
There is always a way!!
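A quick check of that claim in SQLite (which does support COUNT(DISTINCT ...)), running both versions on the same dups data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same inline test data as in the answer.
dups = ("select 1 as id, 'A' as col1 union all select 1, 'A' "
        "union all select 1, 'B' union all select 2, 'B'")

# Version 1: COUNT(DISTINCT ...) directly.
direct = conn.execute(f"""
    with dups as ({dups})
    select id, count(distinct col1) as col_count
    from dups group by id order by id
""").fetchall()

# Version 2: de-duplicate with GROUP BY in a second CTE, then a plain COUNT.
dedup = conn.execute(f"""
    with dups as ({dups}),
    de_dup as (select id, col1 from dups group by id, col1)
    select id, count(col1) as col_count
    from de_dup group by id order by id
""").fetchall()

print(direct)  # [(1, 2), (2, 1)]
print(dedup)   # [(1, 2), (2, 1)]
```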
I cannot explain the error you are getting. You are mistakenly using single quotes for alias names, but I don't actually think this is causing the error.
Anyway, I suggest you aggregate your order items per order first and only then join:
SELECT
c.coacompanyname
, so.sooorderdate
, COUNT(*) AS orders
, SUM(soi.itemcount) AS order_items
, SUM(soi.ordervalue) AS order_value
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'INT%' THEN 1 END) AS int
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'WEB%' THEN 1 END) AS web
FROM sbs.pub.company c
JOIN sbs.pub.soporder so ON so.companyid = c.companyid
JOIN
(
SELECT soporderid, COUNT(*) AS itemcount, SUM(soivalue) AS ordervalue
FROM sbs.pub.soporderitem
GROUP BY soporderid
) soi ON soi.soporderid = so.soporderid
GROUP BY c.coacompanyname, so.sooorderdate
ORDER BY c.coacompanyname, so.sooorderdate;
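A sketch of the aggregate-then-join approach on minimal, hypothetical versions of the three tables in SQLite (schema prefixes like sbs.pub dropped, and the int/web aliases renamed to int_orders/web_orders), loaded with the question's sample data:

```python
import sqlite3

# Hypothetical minimal tables holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyid INTEGER, coacompanyname TEXT);
    CREATE TABLE soporder (soporderid INTEGER, companyid INTEGER,
                           sooorderdate TEXT, sooparentorderreference TEXT);
    CREATE TABLE soporderitem (soporderid INTEGER, soivalue INTEGER);
    INSERT INTO company VALUES (1, 'CompanyA');
    INSERT INTO soporder VALUES (10, 1, '03/01/2019', 'INT1'), (11, 1, '03/01/2019', 'WEB2');
    INSERT INTO soporderitem VALUES (10, 170), (10, 0), (11, 160), (11, 0);
""")

# Items are aggregated per order in the derived table soi, so the outer
# query sees exactly one row per order and plain COUNTs are not inflated.
rows = conn.execute("""
    SELECT c.coacompanyname, so.sooorderdate,
           COUNT(*) AS orders,
           SUM(soi.itemcount) AS order_items,
           SUM(soi.ordervalue) AS order_value,
           COUNT(CASE WHEN so.sooparentorderreference LIKE 'INT%' THEN 1 END) AS int_orders,
           COUNT(CASE WHEN so.sooparentorderreference LIKE 'WEB%' THEN 1 END) AS web_orders
    FROM company c
    JOIN soporder so ON so.companyid = c.companyid
    JOIN (
        SELECT soporderid, COUNT(*) AS itemcount, SUM(soivalue) AS ordervalue
        FROM soporderitem
        GROUP BY soporderid
    ) soi ON soi.soporderid = so.soporderid
    GROUP BY c.coacompanyname, so.sooorderdate
""").fetchall()

print(rows)  # [('CompanyA', '03/01/2019', 2, 4, 330, 1, 1)]
```

Because the join no longer multiplies order rows by item rows, no DISTINCT is needed at all.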

count boolean column, and average another column based on boolean column

CREATE TABLE test (
calculate_time int4 NULL,
status bool NULL
);
INSERT INTO test (calculate_time,status) VALUES
(10,true)
,(15,true)
,(20,true)
,(20,true)
,(5,false)
,(10,false)
,(15,false)
,(100,NULL)
,(200,NULL)
,(300,NULL)
;
With this query, it averages all calculate_time values. Is there a way to make it average only the rows where status = true? I tried adding a WHERE clause, but that made the failed and suspended counts come out as 0.
select
avg(calculate_time) as cal_time,
count(case when status = true then 1 end) as completed,
count(case when status = false then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;
You seem to understand the concept of conditional aggregation. You can just also use a CASE expression for the average as you did for the other terms in your select:
select
avg(case when status then calculate_time end) as cal_time,
count(case when status then 1 end) as completed,
count(case when not status then 1 end) as failed,
count(case when status is null then 1 end) as suspended
from test;
This works because the AVG function, like most other aggregate functions, ignores NULL values. For records where status is not true, the CASE expression yields NULL, so their calculate_time values are effectively ignored and do not influence the average.
Side note: you can use boolean values directly in a Postgres query without comparing them to true. That is, the following two CASE expressions are equivalent, with the second one being more concise:
avg(case when status = true then calculate_time end) as cal_time,
avg(case when status then calculate_time end) as cal_time,
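Running the CASE-based query on the question's test data in SQLite (an assumption for portability; SQLite stores the booleans as 1/0, so CASE WHEN status behaves the same way) shows the filtered average next to the unfiltered one:

```python
import sqlite3

# SQLite has no separate boolean storage class; Python's True/False are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (calculate_time INTEGER, status BOOLEAN)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    (10, True), (15, True), (20, True), (20, True),
    (5, False), (10, False), (15, False),
    (100, None), (200, None), (300, None),
])

row = conn.execute("""
    SELECT avg(calculate_time) AS all_time,
           -- NULL for non-true rows, so AVG only sees the completed ones
           avg(CASE WHEN status THEN calculate_time END) AS cal_time,
           count(CASE WHEN status THEN 1 END) AS completed,
           -- NOT NULL is NULL, so suspended rows are not counted as failed
           count(CASE WHEN NOT status THEN 1 END) AS failed,
           count(CASE WHEN status IS NULL THEN 1 END) AS suspended
    FROM test
""").fetchone()

print(row)  # (69.5, 16.25, 4, 3, 3)
```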
Adding to @Tim's answer: since Postgres 9.4, you can add a FILTER clause to aggregate function calls, which may save you some of the boilerplate of writing your own CASE expressions:
select
avg(calculate_time) filter (where status) as cal_time,
count(*) filter (where status) as completed,
count(*) filter (where not status) as failed,
count(*) filter (where status is null) as suspended
from test;