Oracle SQL Conditional aggregate function in group by query - sql

I'm trying to write a query that will aggregate data in a table according to a user-supplied table that drives the aggregations. I got it to work fine when I just used a sum statement, but when I put the sum inside of a case statement to allow the user to specify sum, count, mean, etc., I get group by errors.
I replaced:
sum(column)
with:
CASE b.calculationtype
WHEN 'SUM' THEN SUM(column)
WHEN 'MEAN' THEN AVG(column)
WHEN 'COUNT' THEN COUNT(column)
WHEN 'VARIANCE' THEN VARIANCE(column)
WHEN 'STANDARD DEVIATION' THEN STDDEV(column)
END
Does Oracle see beyond the CASE statement when evaluating the GROUP BY clause, or am I out of luck trying to make the actual aggregation function change based on the value in table b?
I could always brute force it the long way and move the calculationtype logic outside of the actual query, but that seems a little painful in that I'd have 5 identical queries with different aggregate functions that are called depending on the calculation type field.
select b.REPORT,
case b.AGG_VARIABLE_A_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_A
end,
case b.AGG_VARIABLE_B_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_B
end,
--<<< problem starts >>>
case b.CALCULATIONTYPE
when 'SUM' then sum(a.column1)
when 'MEAN' then avg(a.column1)
when 'COUNT' then count(a.column1)
when 'VARIANCE' then variance(a.column1)
when 'STANDARD DEVIATION' then stddev(a.column1)
end,
case b.CALCULATIONTYPE
when 'SUM' then sum(a.column2)
when 'MEAN' then avg(a.column2)
when 'COUNT' then count(a.column2)
when 'VARIANCE' then variance(a.column2)
when 'STANDARD DEVIATION' then stddev(a.column2)
end
--<<< problem ends >>>
from DATA_TABLE a
cross join CONTROL_TABLE b
where a.ID = bind_variable_id
and a.SOURCEARRAY = b.SOURCEARRAY
and b.CALCULATIONTYPE <> 'INTERNAL'
group by b.REPORT,
case b.AGG_VARIABLE_A_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_A
end,
case b.AGG_VARIABLE_B_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_B
end

Add b.calculationtype to the GROUP BY clause. Since it comes from the control table, grouping by it doesn't change the result rows; it just satisfies the GROUP BY rules for the CASE expression.
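To illustrate, here's the corrected pattern run against made-up sample tables in SQLite via Python. SQLite won't raise Oracle's ORA-00979 error for the unfixed query, but the corrected query shape works the same way in both engines (VARIANCE/STDDEV are omitted since base SQLite lacks them, and the join is simplified to an inner join for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE data_table (sourcearray TEXT, column1 REAL);
INSERT INTO data_table VALUES ('A', 10), ('A', 20), ('B', 5);
CREATE TABLE control_table (report TEXT, sourcearray TEXT, calculationtype TEXT);
INSERT INTO control_table VALUES
  ('R1', 'A', 'SUM'), ('R2', 'A', 'MEAN'), ('R3', 'B', 'COUNT');
""")

rows = cur.execute("""
SELECT b.report,
       CASE b.calculationtype
           WHEN 'SUM'   THEN SUM(a.column1)
           WHEN 'MEAN'  THEN AVG(a.column1)
           WHEN 'COUNT' THEN COUNT(a.column1)
       END AS result
FROM data_table a
JOIN control_table b ON a.sourcearray = b.sourcearray
GROUP BY b.report, b.calculationtype  -- the CASE key is part of the grouping
ORDER BY b.report
""").fetchall()
print(rows)  # [('R1', 30.0), ('R2', 15.0), ('R3', 1)]
```

Each report row picks a different aggregate of the same grouped data, which is exactly what the control-table design calls for.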

Related

SQL Query - To split the value into 2 column and group by same customer

I'm new to SQL. I would like to split the value into two columns and group by the same customer. Below is my current table:
I have tried the query:
Select *
,Case when [Devices] = 'RF' THEN (Select [Lines] From table_name Else '0' )End As [RF]
,Case when [Devices] = 'Desktop' THEN (Select [Lines] From table_name Else '0') End As [Desktop]
From table_name
But it gives me the error : This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Please advise if anything wrong with the query.
Thank you!!
Customer | Lines | Devices
---------+-------+--------
A        | 3     | RF
A        | 4     | Desktop
What I expected to see:
Customer | RF | Desktop
---------+----+--------
A        | 3  | 4
First problem: don't use *; provide an explicit column selection.
Second issue: you don't need sub-queries in your CASE WHEN construct. The correct syntax for the query you've tried would be:
SELECT customer,
CASE WHEN devices = 'RF' THEN lines ELSE 0 END AS RF,
CASE WHEN devices = 'Desktop' THEN lines ELSE 0 END AS Desktop
FROM table_name;
Third point: This will produce two rows: One row RF 3, Desktop 0 and one row RF 0, Desktop 4. But the expected outcome according to your description is one row only. To achieve this, you need to SUM your values and GROUP BY customer:
SELECT customer,
SUM(CASE WHEN devices = 'RF' THEN lines ELSE 0 END) AS RF,
SUM(CASE WHEN devices = 'Desktop' THEN lines ELSE 0 END) AS Desktop
FROM table_name
GROUP BY customer;
Especially when using the second option, I recommend checking whether the ELSE clause of your CASE WHEN is really required: SUM() ignores NULLs, so the ELSE 0 only matters when no rows match at all, in which case you get 0 instead of NULL.
In these simple use cases, such queries work correctly. If your table is more complex and you need to cover more cases, have a look at PIVOT instead of writing lots of CASE WHEN constructs.
A last note: All these queries assume that your column "lines" has a numeric data type. If this isn't the case, you need to convert it. The exact syntax how to do this depends on your DB.
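Here's the final query end-to-end with the sample data from the question, run through SQLite via Python (any engine with CASE and GROUP BY behaves the same for this part):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE table_name (customer TEXT, lines INTEGER, devices TEXT);
INSERT INTO table_name VALUES ('A', 3, 'RF'), ('A', 4, 'Desktop');
""")

rows = cur.execute("""
SELECT customer,
       SUM(CASE WHEN devices = 'RF'      THEN lines ELSE 0 END) AS RF,
       SUM(CASE WHEN devices = 'Desktop' THEN lines ELSE 0 END) AS Desktop
FROM table_name
GROUP BY customer
""").fetchall()
print(rows)  # [('A', 3, 4)] -- one row per customer, as expected
```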

SQL problem with data conversions and divide by zero (pulling my hair out)

I have been googling my butt off but I can't solve this. So I reach for those more experienced than myself. Thank you and please!
I have a table with a few important columns:
AvailableQty which is varchar(200)
hardAllocQty which is varchar(200)
Value which is numeric(19,4)
I am trying to add the quantities in the two qty columns and then divide by the value. But I'm getting "Error converting data type nvarchar to numeric." and sometimes other errors depending on how I try to troubleshoot it such as "Divide by zero error encountered". Here is the code. Any help on how to make this work would be great. I think the issue is somewhere in the first half of the isnull()
select distinct part.Part_Num, part.[Part Description], part.[Part Type],
isnull(
(select (sum(cast(stock.AvailableQty as bigint))+sum(cast(stock.HardAllocQty as bigint)))/
sum(cast(stock.Value as bigint)) from dbo.StockAvailability Stock
WHERE stock.PartNo = part.Part_Num),
(SELECT
CASE
WHEN Avg(cast(NULLIF(po.cost,0) as BIGINT)) IS NULL THEN format(Avg(po.cost),'c')
ELSE format(Avg(cast(NULLIF(po.cost,0) as BIGINT)),'c')
END
from dbo.tblPO_DETAIL_DATA PO WHERE PO.[Part Number] = part.Part_Num AND
cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
)
as 'Avg Cost'
from dbo.UNIQUE_PARTS_LIST part
tl;dr:
On first inspection, this CASE expression looks like the offender:
CASE
WHEN Avg(cast(NULLIF(po.cost,0) as BIGINT)) IS NULL
THEN format(Avg(po.cost),'c')
ELSE format(Avg(cast(NULLIF(po.cost,0) as BIGINT)),'c')
END
I can't justify it and suspect it's a cut-and-paste issue or just half formed: when the zero-excluded, cast average comes back NULL (i.e. every cost is 0 or NULL), it falls back to averaging the raw values without the cast; otherwise it averages the cast values with zeros excluded. If the conversion is needed at all, it is needed in both branches.
Do you want the 0s for po.cost to be included in the average cost or do you want to exclude them?
I suspect it is the format(Avg(po.cost),'c') that is causing the problems, mainly because every other reference to po.cost uses a cast and this does not.
Quick fix if you want to exclude the 0s from the Avg Cost (and po.cost already IS numeric):
(SELECT FORMAT(AVG(NULLIF(po.cost,0)),'c')
FROM dbo.tblPO_DETAIL_DATA PO
WHERE PO.[Part Number] = part.Part_Num
AND cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
as 'Avg Cost'
If po.Cost is NOT numeric, then use this instead:
(SELECT FORMAT(AVG(
CASE
WHEN ISNUMERIC(po.cost) = 1
THEN CAST(po.cost AS BIGINT)
ELSE 0
END
),'c')
FROM dbo.tblPO_DETAIL_DATA PO
WHERE PO.[Part Number] = part.Part_Num
AND cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
as 'Avg Cost'
But let's explore this in detail...
Before adding aggregates into the mix, take a step back and make sure your source data is correct, run the following queries to validate your conversions:
select cast(stock.AvailableQty as bigint)
, cast(stock.HardAllocQty as bigint)
, cast(stock.Value as bigint)
from dbo.StockAvailability Stock
select cast(po.cost as bigint)
, cast(po.[PO Date] as date)
from dbo.tblPO_DETAIL_DATA po
If the data from those two queries is OK, then the conversions shouldn't be the issue. If there is an issue, try to isolate the rows that have invalid values. The first option I would try is an ISNUMERIC() check, in one of two ways: 1) just exclude the non-numerics, or 2) query for the non-numerics so you can inspect the results and come up with a better solution. Run the following to identify the issues:
select stock.AvailableQty, stock.HardAllocQty, stock.Value
from dbo.StockAvailability Stock
WHERE ISNUMERIC(stock.AvailableQty) = 0 OR ISNUMERIC(stock.HardAllocQty) = 0 OR ISNUMERIC(stock.Value) = 0
-- Finding the number of invalid values might help drive the solution?
select po.cost, COUNT(*)
from dbo.tblPO_DETAIL_DATA po
WHERE ISNUMERIC(po.cost) = 0
GROUP BY po.cost
If the invalid value should be counted as a zero, we can use CASE logic to resolve the value; if there is a specific value that should be treated as NULL, use NULLIF(); if that specific value should become 0, use ISNULL(NULLIF()).
Why zero vs null matters: zeros won't affect SUM() aggregates, but they will be counted in the record count when computing AVG().
If your recordset commonly has 'N/A' stored in columnA instead of NULL and you want the whole record excluded from the recordset, an easy option is to add that to the WHERE clause:
WHERE columnA <> 'N/A'
However if you want the rest of the record, but just need this value interpreted as a NULL so that it will be excluded from the aggregate calculations you can use NULLIF():
My personal preference is to alias the column with the same name when an alias is required; the user or next operation doesn't need access to the original value after we have sanitized it like this:
SELECT NULLIF(columnA, 'N/A') AS columnA
Again, if you know a specific value that you want to treat as a zero in aggregates, we can use ISNULL(NULLIF):
SELECT ISNULL(NULLIF(columnA, 'N/A'),0) AS columnA
If you have many different values to treat this way, you can nest NULLIF() calls, but it quickly gets unmanageable:
SELECT ISNULL(NULLIF(NULLIF(columnA, 'N/A'), '-'), 0) AS columnA
You can get more control over this type of logic or short-circuit it with a call to ISNUMERIC() in a CASE statement:
-- output as null if non-numeric
SELECT CASE WHEN ISNUMERIC(columnA) = 1 THEN columnA END AS columnA
...
-- output as 0 if non-numeric
SELECT CASE WHEN ISNUMERIC(columnA) = 1 THEN columnA ELSE 0 END AS columnA
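Here's a runnable sketch of the exclude-vs-zero choice, using SQLite via Python. SQLite has no ISNUMERIC(), so this sketch sanitizes a made-up 'N/A' marker with NULLIF/COALESCE instead (COALESCE is the portable spelling of ISNULL); the table and values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE raw_costs (cost TEXT);
INSERT INTO raw_costs VALUES ('10'), ('20'), ('N/A');
""")

# Treat 'N/A' as NULL so AVG skips that row: (10 + 20) / 2 = 15
avg_excluding = cur.execute(
    "SELECT AVG(CAST(NULLIF(cost, 'N/A') AS REAL)) FROM raw_costs"
).fetchone()[0]

# Treat 'N/A' as zero so it counts in the denominator: (10 + 20 + 0) / 3 = 10
avg_as_zero = cur.execute(
    "SELECT AVG(CAST(COALESCE(NULLIF(cost, 'N/A'), '0') AS REAL)) FROM raw_costs"
).fetchone()[0]

print(avg_excluding, avg_as_zero)  # 15.0 10.0
```

The two averages differ, which is exactly the zero-vs-null point above: the zero doesn't change the SUM, but it inflates AVG's row count.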
Once you have identified the offending columns and decided whether they should be excluded or treated as zeros, you could re-write your inline sub-queries as a more set-based operation. This process and the reasoning are explained in the article How to Use GROUP BY with Distinct Aggregates and Derived tables, and it applies when there are many parts in the source list.
A DISTINCT in a result set that returns aggregates is a red flag pointing to inefficiencies or flaws in the aggregation logic. A DISTINCT is often put in as a quick fix on a query; where aggregates are concerned, better scoping or a GROUP BY is usually more efficient.
In MS SQL Server, CROSS APPLY is a useful mechanism for evaluating conversions and other functions. It gives us a clean way to separate the steps involved in formatting data, evaluating repetitive logic once and reusing the result instead of re-evaluating the same expression many times in a query.
That is the main point of structuring the query with CROSS APPLY: by defining each expression once, you won't accidentally leave some expressions in their original (and potentially invalid) state when you later alter the logic.
CTEs are used here to separate the concerns. You could write them as inline sub-query joins, but IMO that gets harder to read and interpret:
;WITH StockByPart as (
SELECT Stock.PartNo
, (SUM(numerics.AvailableQty)+SUM(numerics.HardAllocQty))/SUM(numerics.Value) AS AvgPrice
FROM dbo.StockAvailability Stock
CROSS APPLY (SELECT
CASE WHEN ISNUMERIC(stock.AvailableQty) = 1 THEN CAST(stock.AvailableQty as bigint) END AS AvailableQty
, CASE WHEN ISNUMERIC(stock.HardAllocQty) = 1 THEN CAST(stock.HardAllocQty as bigint) END AS HardAllocQty
, CASE WHEN ISNUMERIC(stock.Value) = 1 THEN CAST(stock.Value as bigint) END AS [Value]
) Numerics
GROUP BY Stock.PartNo
)
, POByPart as (
SELECT PO.[Part Number]
, AVG(numerics.cost) AS AvgCost
FROM dbo.tblPO_DETAIL_DATA PO
CROSS APPLY (SELECT
CASE WHEN ISNUMERIC(po.cost) = 1 THEN CAST(po.cost as bigint) ELSE 0 END AS cost
) Numerics
GROUP BY PO.[Part Number]
)
SELECT DISTINCT
part.Part_Num, part.[Part Description], part.[Part Type]
, Stock.AvgPrice
, PO.AvgCost as [Avg Cost]
FROM dbo.UNIQUE_PARTS_LIST part
LEFT OUTER JOIN StockByPart Stock ON stock.PartNo = part.Part_Num
LEFT OUTER JOIN POByPart PO ON PO.[Part Number] = part.Part_Num

Should I use a subquery in this situation?

I'm using Apache Hive and I have a query like this:
SELECT CASE type WHEN 'a' THEN 'A'
WHEN 'b' THEN 'B'
ELSE 'C'
END AS map_type
,COUNT(user_id) AS count
FROM user_types
GROUP BY CASE type WHEN 'a' THEN 'A'
WHEN 'b' THEN 'B'
ELSE 'C'
END
;
As you can see, I need to group the result by the map_type field, which is calculated in a complex way. In my case, will the CASE WHEN parts in SELECT and GROUP BY be calculated twice? And if I used a subquery like below, will it be more efficient or not?
SELECT map_type
,COUNT(user_id) AS count
FROM (
SELECT CASE type WHEN 'a' THEN 'A'
WHEN 'b' THEN 'B'
ELSE 'C'
END AS map_type
,user_id
FROM user_types
) a
GROUP BY map_type;
The second query (involving the sub-query) might be more performant. This is based on interpretation from Hive's explain plan, and running these queries a few times.
The explain plan for query 1 (without the sub-query) has this section:
Group By Operator [GBY_2]
aggregations:["count(user_id)"]
keys:CASE (type) WHEN ('a') THEN ('A') WHEN ('b') THEN ('B') ELSE ('C') END (type: string)
On the other hand, the same section for query 2 (with the sub-query) has this:
Group By Operator [GBY_3]
aggregations:["count(_col1)"]
keys:_col0 (type: string)
Based on the plan, it looks like query 2 is doing slightly less work.
Also ran a test on dummy data, and got these execution times.
Query 1: (1st time) 6.43 s, (2nd time) 5.92 s, (3rd time): 4.30s
Query 2: (1st time) 0.82 s, (2nd time) 1.29 s, (3rd time): 1.03s
Query 2 completed faster in all cases.
The expense of doing an aggregation involves reading lots and lots of data. Then either sorting it or hashing it to bring the keys together. Then the engine needs to process the data and calculate the count.
Whether a case expression is called once or twice is pretty meaningless in the context of all the data movement. Don't worry about it. If there is extra work, it is trivial compared to everything else that needs to be done for the query.
I also think that Hive supports column aliases in the GROUP BY, but I might be mistaken.
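For what it's worth, alias resolution in GROUP BY is easy to check in an engine that supports it; SQLite, for one, accepts a select-list alias as the grouping key, so the query can be written once without repeating the CASE (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE user_types (user_id INTEGER, type TEXT);
INSERT INTO user_types VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'z');
""")

rows = cur.execute("""
SELECT CASE type WHEN 'a' THEN 'A'
                 WHEN 'b' THEN 'B'
                 ELSE 'C'
       END AS map_type,
       COUNT(user_id) AS count
FROM user_types
GROUP BY map_type   -- alias accepted, no need to repeat the CASE
ORDER BY map_type
""").fetchall()
print(rows)  # [('A', 2), ('B', 1), ('C', 1)]
```

Whether Hive accepts this depends on the Hive version and `hive.groupby.orderby.position.alias`-style settings, so verify on your cluster before relying on it.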
The CASE expression is not harmful in your case, but using a subquery might increase the time.
you can continue with
SELECT CASE type WHEN 'a' THEN 'A'
WHEN 'b' THEN 'B'
ELSE 'C'
END AS map_type
,COUNT(user_id) AS count
FROM user_types
GROUP BY CASE type WHEN 'a' THEN 'A'
WHEN 'b' THEN 'B'
ELSE 'C'
END
;

Wrapping case statements within a window function

I have the below sql and would like to add a filter clause within the window function. Is this possible?
select ROUND(SUM(M.CHRG_RATE/M.CONTRACTUAL_RATE) OVER
(PARTITION BY M.PROGRAM),0) AS BILLED_MEMBERS_PER_MONTH2
from tableA M
where 1=1
I was thinking I could wrap a case statement?
Something like:
You can use case to leave rows out of the sum. For example, this only sums rows where the flag column equals 'Q':
SUM(CASE WHEN M.FLAG = 'Q' THEN M.CHRG_RATE/M.CONTRACTUAL_RATE END)
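A runnable version of this conditional window sum, using SQLite (3.25+, which has window functions) via Python; the FLAG column and sample rates are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tableA (program TEXT, flag TEXT, chrg_rate REAL, contractual_rate REAL);
INSERT INTO tableA VALUES
  ('P1', 'Q', 2.0, 4.0),   -- ratio 0.5, included
  ('P1', 'X', 9.0, 3.0),   -- CASE yields NULL, so SUM skips it
  ('P1', 'Q', 3.0, 2.0);   -- ratio 1.5, included
""")

rows = cur.execute("""
SELECT program, flag,
       SUM(CASE WHEN flag = 'Q' THEN chrg_rate / contractual_rate END)
           OVER (PARTITION BY program) AS q_only_sum
FROM tableA
""").fetchall()
print(rows)  # every P1 row carries q_only_sum = 0.5 + 1.5 = 2.0
```

Since there is no ELSE, the non-matching row contributes NULL, which SUM ignores; every row in the partition still gets the windowed total.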

Sql Server equivalent of a COUNTIF aggregate function

I'm building a query with a GROUP BY clause that needs the ability to count records based only on a certain condition (e.g. count only records where a certain column value is equal to 1).
SELECT UID,
COUNT(UID) AS TotalRecords,
SUM(ContractDollars) AS ContractDollars,
(COUNTIF(MyColumn, 1) / COUNT(UID) * 100) -- percentage of rows where MyColumn = 1
FROM dbo.AD_CurrentView
GROUP BY UID
HAVING SUM(ContractDollars) >= 500000
The COUNTIF() line obviously fails since there is no native SQL function called COUNTIF, but the idea here is to determine the percentage of all rows that have the value '1' for MyColumn.
Any thoughts on how to properly implement this in a MS SQL 2005 environment?
You could use a SUM (not COUNT!) combined with a CASE statement, like this:
SELECT SUM(CASE WHEN myColumn=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
Note: in my own test NULLs were not an issue, though this can be environment dependent. You could handle nulls such as:
SELECT SUM(CASE WHEN ISNULL(myColumn,0)=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
I usually do what Josh recommended, but brainstormed and tested a slightly hokey alternative that I felt like sharing.
You can take advantage of the fact that COUNT(ColumnName) doesn't count NULLs, and use something like this:
SELECT COUNT(NULLIF(0, myColumn))
FROM AD_CurrentView
NULLIF - returns NULL if the two passed in values are the same.
Advantage: Expresses your intent to COUNT rows instead of having the SUM() notation.
Disadvantage: Not as clear how it is working ("magic" is usually bad).
I would use this syntax. It achieves the same as Josh's and Chris's suggestions, but with the advantage that it is ANSI compliant and not tied to a particular database vendor.
select count(case when myColumn = 1 then 1 else null end)
from AD_CurrentView
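All three spellings so far (SUM over CASE, COUNT over CASE, and the NULLIF trick) agree on a 0/1 column; here's a quick side-by-side check in SQLite via Python (table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE AD_CurrentView (uid INTEGER, myColumn INTEGER);
INSERT INTO AD_CurrentView VALUES (1, 1), (1, 0), (1, 1), (2, 0);
""")

rows = cur.execute("""
SELECT uid,
       SUM(CASE WHEN myColumn = 1 THEN 1 ELSE 0 END) AS sum_style,
       COUNT(CASE WHEN myColumn = 1 THEN 1 END)      AS count_style,
       COUNT(NULLIF(0, myColumn))                    AS nullif_style
FROM AD_CurrentView
GROUP BY uid
ORDER BY uid
""").fetchall()
print(rows)  # [(1, 2, 2, 2), (2, 0, 0, 0)]
```

Note the NULLIF version really counts non-zero values, which only coincides with counting 1s because the column holds nothing but 0 and 1.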
How about
SELECT id, COUNT(IF status=42 THEN 1 ENDIF) AS cnt
FROM table
GROUP BY id
Shorter than CASE :)
Works because COUNT() doesn't count null values, and IF/CASE return null when condition is not met and there is no ELSE.
I think it's better than using SUM().
Adding on to Josh's answer,
SELECT COUNT(CASE WHEN myColumn=1 THEN AD_CurrentView.PrimaryKeyColumn ELSE NULL END)
FROM AD_CurrentView
Worked well for me (in SQL Server 2012) without changing the 'count' to a 'sum' and the same logic is portable to other 'conditional aggregates'. E.g., summing based on a condition:
SELECT SUM(CASE WHEN myColumn=1 THEN AD_CurrentView.NumberColumn ELSE 0 END)
FROM AD_CurrentView
It's 2022 and latest SQL Server still doesn't have COUNTIF (along with regex!). Here's what I use:
-- Count if MyColumn = 42
SELECT SUM(IIF(MyColumn = 42, 1, 0))
FROM MyTable
IIF is a shortcut for CASE WHEN MyColumn = 42 THEN 1 ELSE 0 END.
Not product-specific, but the SQL standard provides
SELECT COUNT(*) FILTER (WHERE <condition-1>),
COUNT(*) FILTER (WHERE <condition-2>), ...
FROM ...
for this purpose, or something that closely resembles it; I don't know the exact form off the top of my head.
And of course vendors will prefer to stick with their proprietary solutions.
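That said, the FILTER clause is no longer purely theoretical: PostgreSQL and SQLite (3.30+) both implement it. A quick check via Python's sqlite3, assuming the bundled SQLite is new enough:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE AD_CurrentView (myColumn INTEGER);
INSERT INTO AD_CurrentView VALUES (1), (0), (1), (2);
""")

# Standard SQL conditional count: only rows passing the FILTER are counted
n = cur.execute(
    "SELECT COUNT(*) FILTER (WHERE myColumn = 1) FROM AD_CurrentView"
).fetchone()[0]
print(n)  # 2
```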
Why not like this?
SELECT count(1)
FROM AD_CurrentView
WHERE myColumn=1
I had to use a COUNTIF()-style calculation as part of my SELECT columns to mimic a % of the number of times each item appeared in my results.
So I used this...
SELECT COL1, COL2, ... ETC,
(1 / (SELECT a.vcount
FROM (SELECT vm2.visit_id, count(*) AS vcount
FROM dbo.visitmanifests AS vm2
WHERE vm2.inactive = 0 AND vm2.visit_id = vm.Visit_ID
GROUP BY vm2.visit_id) AS a)) AS [No of Visits],
COL xyz
FROM etc etc
Of course you will need to format the result according to your display requirements.
SELECT COALESCE(IF(myColumn = 1,COUNT(DISTINCT NumberColumn),NULL),0) column1,
COALESCE(CASE WHEN myColumn = 1 THEN COUNT(DISTINCT NumberColumn) ELSE NULL END,0) AS column2
FROM AD_CurrentView