I have the below sql and would like to add a filter clause within the window function. Is this possible?
select ROUND(SUM(M.CHRG_RATE/M.CONTRACTUAL_RATE) OVER
(PARTITION BY M.PROGRAM),0) AS BILLED_MEMBERS_PER_MONTH2
from tableA
where 1=1
I was thinking I could wrap a case statement?
Something like:
You can use case to leave rows out of the sum. For example, this only sums rows where the flag column equals 'Q':
SUM(CASE WHEN M.FLAG = 'Q' THEN M.CHRG_RATE/M.CONTRACTUAL_RATE END)
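To see how the CASE fits inside the windowed SUM from the question, here is a minimal sketch run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sample rows and the FLAG column values are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (PROGRAM TEXT, FLAG TEXT, CHRG_RATE REAL, CONTRACTUAL_RATE REAL)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        ("P1", "Q", 100.0, 50.0),   # ratio 2.0, included
        ("P1", "X", 300.0, 100.0),  # ratio 3.0, left out because FLAG is not 'Q'
        ("P1", "Q", 40.0, 40.0),    # ratio 1.0, included
        ("P2", "X", 10.0, 10.0),    # P2 has no 'Q' rows at all
    ],
)

# The CASE has no ELSE, so non-matching rows yield NULL and SUM skips them.
rows = conn.execute(
    """
    SELECT M.PROGRAM,
           ROUND(SUM(CASE WHEN M.FLAG = 'Q'
                          THEN M.CHRG_RATE / M.CONTRACTUAL_RATE END)
                 OVER (PARTITION BY M.PROGRAM), 0) AS BILLED_MEMBERS_PER_MONTH2
    FROM tableA M
    ORDER BY M.PROGRAM
    """
).fetchall()
print(rows)
```

Every row is still returned; only the rows feeding the sum are filtered. A partition with no matching rows gets NULL rather than 0, so wrap the SUM in COALESCE if a zero is preferred.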
I am trying to perform two sum functions from a query. However, I want to only perform one of the sum functions if it meets a certain condition without affecting the other sum function.
What I was thinking is to use something similar to select x where condition = 1 from AC which is however not possible.
Here is the sample query where I want the second [sum(t.match)] selection to only calculate if the result in the subquery: match = 1 while still getting the total sum of all qqty.
select
sum(t.qqty), sum(t.qqty)
from
(select
car, cqty, qqty,
case when cqty = qqty then 1 else 0 end as match,
location, state) t
Use conditional aggregation -- that is case as the argument to the sum():
select sum(t.qqty), sum(case when condition = 1 then t.qqty else 0 end)
from t;
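As a rough illustration of conditional aggregation on the question's columns, the sketch below uses Python's sqlite3 module. The source table name `cars` and the data are hypothetical, and the `match` column is renamed `is_match` here to stay clear of the SQL keyword MATCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table; the question does not show the real one.
conn.execute("CREATE TABLE cars (car TEXT, cqty INTEGER, qqty INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("A", 5, 5), ("B", 3, 7), ("C", 2, 2)],
)

# First SUM covers every row; second SUM only adds qqty where is_match = 1.
rows = conn.execute(
    """
    SELECT SUM(t.qqty) AS total_qqty,
           SUM(CASE WHEN t.is_match = 1 THEN t.qqty ELSE 0 END) AS matched_qqty
    FROM (SELECT car, cqty, qqty,
                 CASE WHEN cqty = qqty THEN 1 ELSE 0 END AS is_match
          FROM cars) t
    """
).fetchall()
print(rows)
```

The two sums are computed in one pass over the same rows, so the condition on the second does not affect the first.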
Simplified example:
In hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN '0'
When value>0.9*sum(value) over () THEN '1'
ELSE '2'
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute it only once? Added twist: I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN '0'
When value>0.9*(select * from total) THEN '1'
ELSE '2'
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables ?
Don't worry about that. Let the optimizer worry about it. But, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
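Note that the order of the WHEN clauses in the CASE expression is changed. As originally written, the second WHEN can never succeed: with a positive total, any value greater than 0.9 of the total is also greater than 0.5 of it, so the first WHEN always catches it.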
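The effect of the WHEN order can be checked on a small table. This is a sketch using Python's sqlite3 module (SQLite 3.25+ for the window function); the table `t` and its rows are invented so that one value exceeds 0.9 of the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: total is 100, so Ann's 95 exceeds both thresholds.
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [("Ann", 95), ("Ben", 3), ("Cy", 2)]
)

# The total is computed once in the subquery and reused in both CASE expressions.
rows = conn.execute(
    """
    SELECT s.name,
           CASE WHEN s.value > 0.5 * s.total THEN '0'
                WHEN s.value > 0.9 * s.total THEN '1'
                ELSE '2' END AS original_order,
           CASE WHEN s.value > 0.9 * s.total THEN '1'
                WHEN s.value > 0.5 * s.total THEN '0'
                ELSE '2' END AS swapped_order
    FROM (SELECT t.*, SUM(value) OVER () AS total FROM t) s
    ORDER BY s.name
    """
).fetchall()
print(rows)
```

With the original order Ann is labelled '0', because the 0.5 test fires first; with the stricter test first she is labelled '1'.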
Since I/O is the major factor that slows down Hive queries, we should strive to reduce the number of stages to get better performance.
So it's better not to use a sub-query or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case window clause is the recommended way to reduce repetition of code.
Both the windowing and the sum aggregation will be computed only once. You can run explain select..., confirming that only ONE meaningful MR stage will be launched.
Edit:
1. A simple select clause on a subquery is not something to worry about. It can be pushed down to the last phase of the subquery, which avoids an additional MR stage.
2. Two identical aggregations residing in the same query block will only be evaluated once. So don’t worry about potential repeated calculation.
I'm trying to write a query that will aggregate data in a table according to a user-supplied table that drives the aggregations. I got it to work fine when I just used a sum statement, but when I put the sum inside of a case statement to allow the user to specify sum, count, mean, etc., I get group by errors.
I replaced:
sum(column)
with:
CASE b.calculationtype
WHEN 'SUM' THEN SUM(column)
WHEN 'MEAN' THEN AVG(column)
WHEN 'COUNT' THEN COUNT(column)
WHEN 'VARIANCE' THEN VARIANCE(column)
WHEN 'STANDARD DEVIATION' THEN STDDEV(column)
END
Does Oracle see beyond the case statement when evaluating the group by function or am I out of luck trying to make the actual aggregation function change based on the value in table b?
I could always brute force it the long way and move the calculationtype logic outside of the actual query, but that seems a little painful in that I'd have 5 identical queries with different aggregate functions that are called depending on the calculation type field.
select b.REPORT,
case b.AGG_VARIABLE_A_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_A
end,
case b.AGG_VARIABLE_B_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_B
end,
--<<< problem starts >>>
case b.CALCULATIONTYPE
when 'SUM' then sum(a.column1)
when 'MEAN' then avg(a.column1)
when 'COUNT' then count(a.column1)
when 'VARIANCE' then variance(a.column1)
when 'STANDARD DEVIATION' then stddev(a.column1)
end,
case b.CALCULATIONTYPE
when 'SUM' then sum(a.column2)
when 'MEAN' then avg(a.column2)
when 'COUNT' then count(a.column2)
when 'VARIANCE' then variance(a.column2)
when 'STANDARD DEVIATION' then stddev(a.column2)
end
--<<< problem ends >>>
from DATA_TABLE a
cross join CONTROL_TABLE b
where a.ID = bind_variable_id
and a.SOURCEARRAY = b.SOURCEARRAY
and b.CALCULATIONTYPE <> 'INTERNAL'
group by b.REPORT,
case b.AGG_VARIABLE_A_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_A
end,
case b.AGG_VARIABLE_B_FLAG
when 'N' then null
when 'Y' then a.AGG_VARIABLE_B
end
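Add b.CALCULATIONTYPE to the GROUP BY clause. The CASE selects on b.CALCULATIONTYPE outside of any aggregate function, so that column has to be one of the grouping expressions.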
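A cut-down version of the corrected query can be tried with Python's sqlite3 module. This is only a sketch under several assumptions: the rows and report names are invented, the VARIANCE and STANDARD DEVIATION branches are left out because SQLite has no such built-in aggregates, and SQLite is lenient about ungrouped columns, so it would not reproduce Oracle's group by error in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; table and column names follow the question.
conn.executescript(
    """
    CREATE TABLE DATA_TABLE (SOURCEARRAY TEXT, column1 INTEGER);
    CREATE TABLE CONTROL_TABLE (REPORT TEXT, SOURCEARRAY TEXT, CALCULATIONTYPE TEXT);
    INSERT INTO DATA_TABLE VALUES ('S1', 10), ('S1', 20), ('S1', 30);
    INSERT INTO CONTROL_TABLE VALUES
        ('R_sum', 'S1', 'SUM'), ('R_mean', 'S1', 'MEAN'), ('R_count', 'S1', 'COUNT');
    """
)

# CALCULATIONTYPE is a grouping column, so the CASE can pick an aggregate per group.
rows = conn.execute(
    """
    SELECT b.REPORT,
           CASE b.CALCULATIONTYPE
               WHEN 'SUM'   THEN SUM(a.column1)
               WHEN 'MEAN'  THEN AVG(a.column1)
               WHEN 'COUNT' THEN COUNT(a.column1)
           END AS result
    FROM DATA_TABLE a
    JOIN CONTROL_TABLE b ON a.SOURCEARRAY = b.SOURCEARRAY
    GROUP BY b.REPORT, b.CALCULATIONTYPE
    ORDER BY b.REPORT
    """
).fetchall()
print(rows)
```

Each control row drives a different aggregate over the same data rows, which is the behaviour the question is after.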
I am having some trouble with aggregate functions inside a case statement. I want to write a query that labels a row JN if its Field_A equals the minimum date in Field_A, or JP otherwise.
Sample Code:
SELECT *, CASE
WHEN Field_A = MIN(FIELD_A) THEN 'JN'
ELSE 'JP'
END AS JUDI
FROM TABLE_1
GROUP BY *
I am not sure why the command runs but does not execute correctly. It labels all rows as JN in the JUDI Field. How can I fix this?
I am running SQL Server 7. What I want to achieve is that in a list of rows with different dates it labels those with the earliest date with JN and subsequent dates with JP.
Use window functions:
SELECT t.*,
(CASE WHEN t.Field_A = MIN(t.FIELD_A) OVER () THEN 'JN'
ELSE 'JP'
END) AS JUDI
FROM TABLE_1 t;
You don't need aggregation over the entire table for this.
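Here is a small sketch of the window-function approach, run with Python's sqlite3 module (SQLite 3.25+). The `id` column and the dates are hypothetical, and dates are stored as ISO strings so that they compare in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; two of them share the earliest date.
conn.execute("CREATE TABLE TABLE_1 (id INTEGER, Field_A TEXT)")
conn.executemany(
    "INSERT INTO TABLE_1 VALUES (?, ?)",
    [(1, "2020-01-05"), (2, "2020-01-01"), (3, "2020-01-01"), (4, "2020-02-10")],
)

# MIN(...) OVER () is the table-wide minimum, repeated on every row without grouping.
rows = conn.execute(
    """
    SELECT t.id, t.Field_A,
           CASE WHEN t.Field_A = MIN(t.Field_A) OVER () THEN 'JN'
                ELSE 'JP'
           END AS JUDI
    FROM TABLE_1 t
    ORDER BY t.id
    """
).fetchall()
print(rows)
```

In the question's GROUP BY version each row forms its own group, so every row equals its own minimum and gets JN; the window version compares against the minimum of the whole table instead.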
I'm building a query with a GROUP BY clause that needs the ability to count records based only on a certain condition (e.g. count only records where a certain column value is equal to 1).
SELECT UID,
COUNT(UID) AS TotalRecords,
SUM(ContractDollars) AS ContractDollars,
(COUNTIF(MyColumn, 1) / COUNT(UID) * 100) -- Get the average of all records that are 1
FROM dbo.AD_CurrentView
GROUP BY UID
HAVING SUM(ContractDollars) >= 500000
The COUNTIF() line obviously fails since there is no native SQL function called COUNTIF, but the idea here is to determine the percentage of all rows that have the value '1' for MyColumn.
Any thoughts on how to properly implement this in a MS SQL 2005 environment?
You could use a SUM (not COUNT!) combined with a CASE statement, like this:
SELECT SUM(CASE WHEN myColumn=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
Note: in my own test NULLs were not an issue, though this can be environment dependent. You could handle nulls such as:
SELECT SUM(CASE WHEN ISNULL(myColumn,0)=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
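Applied to the grouped query from the question, the SUM(CASE ...) idea might look like the sketch below, run with Python's sqlite3 module on invented rows. The `PctOnes` alias is made up, and the `100.0` factor is an addition to keep the division from truncating to an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; column names follow the question.
conn.execute(
    "CREATE TABLE AD_CurrentView (UID INTEGER, ContractDollars INTEGER, MyColumn INTEGER)"
)
conn.executemany(
    "INSERT INTO AD_CurrentView VALUES (?, ?, ?)",
    [
        (1, 300000, 1),
        (1, 300000, 0),
        (1, 100000, 1),
        (1, 100000, None),  # NULL falls into the ELSE branch and adds 0
        (2, 100000, 1),     # UID 2 is dropped by the HAVING clause
    ],
)

rows = conn.execute(
    """
    SELECT UID,
           COUNT(UID) AS TotalRecords,
           SUM(ContractDollars) AS ContractDollars,
           100.0 * SUM(CASE WHEN MyColumn = 1 THEN 1 ELSE 0 END) / COUNT(UID) AS PctOnes
    FROM AD_CurrentView
    GROUP BY UID
    HAVING SUM(ContractDollars) >= 500000
    """
).fetchall()
print(rows)
```

Multiplying by 100.0 before dividing matters on engines that truncate integer division, SQL Server among them; dividing two integer counts first would yield 0 for any percentage below 100.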
I usually do what Josh recommended, but brainstormed and tested a slightly hokey alternative that I felt like sharing.
You can take advantage of the fact that COUNT(ColumnName) doesn't count NULLs, and use something like this:
SELECT COUNT(NULLIF(0, myColumn))
FROM AD_CurrentView
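NULLIF - returns NULL if the two passed in values are the same. Here that means rows where myColumn is 0 are skipped and every other row is counted, including rows where myColumn is NULL, so the result equals the number of 1s only when the column holds nothing but 0 and 1.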
Advantage: Expresses your intent to COUNT rows instead of having the SUM() notation.
Disadvantage: Not as clear how it is working ("magic" is usually bad).
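One way to see what the NULLIF variant counts is to run it next to the CASE form on a column that holds more than 0 and 1. This is a sketch with Python's sqlite3 module on invented rows; other engines are expected to treat NULLIF the same way, but that is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column values: two 1s, a 0, a NULL and a 2.
conn.execute("CREATE TABLE AD_CurrentView (myColumn INTEGER)")
conn.executemany(
    "INSERT INTO AD_CurrentView VALUES (?)", [(1,), (1,), (0,), (None,), (2,)]
)

# NULLIF(0, myColumn) is NULL only when myColumn = 0; every other row is counted.
rows = conn.execute(
    """
    SELECT COUNT(NULLIF(0, myColumn)) AS nullif_count,
           SUM(CASE WHEN myColumn = 1 THEN 1 ELSE 0 END) AS case_count
    FROM AD_CurrentView
    """
).fetchall()
print(rows)
```

The two counts agree only when the column is restricted to 0 and 1 with no NULLs, which is worth checking before relying on the NULLIF form.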
I would use this syntax. It achieves the same as Josh's and Chris's suggestions, but with the advantage that it is ANSI compliant and not tied to a particular database vendor.
select count(case when myColumn = 1 then 1 else null end)
from AD_CurrentView
How about
SELECT id, COUNT(IF status=42 THEN 1 ENDIF) AS cnt
FROM table
GROUP BY id
Shorter than CASE :)
Works because COUNT() doesn't count null values, and IF/CASE return null when condition is not met and there is no ELSE.
I think it's better than using SUM().
Adding on to Josh's answer,
SELECT COUNT(CASE WHEN myColumn=1 THEN AD_CurrentView.PrimaryKeyColumn ELSE NULL END)
FROM AD_CurrentView
Worked well for me (in SQL Server 2012) without changing the 'count' to a 'sum' and the same logic is portable to other 'conditional aggregates'. E.g., summing based on a condition:
SELECT SUM(CASE WHEN myColumn=1 THEN AD_CurrentView.NumberColumn ELSE 0 END)
FROM AD_CurrentView
It's 2022 and latest SQL Server still doesn't have COUNTIF (along with regex!). Here's what I use:
-- Count if MyColumn = 42
SELECT SUM(IIF(MyColumn = 42, 1, 0))
FROM MyTable
IIF is a shortcut for CASE WHEN MyColumn = 42 THEN 1 ELSE 0 END.
Not product-specific, but the SQL standard provides
SELECT COUNT(*) FILTER (WHERE <condition-1>),
COUNT(*) FILTER (WHERE <condition-2>), ...
FROM ...
for this purpose. Or something that closely resembles it; I don't know the exact syntax off the top of my head.
And of course vendors will prefer to stick with their proprietary solutions.
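Some engines do implement the standard FILTER clause, PostgreSQL and SQLite among them. The sketch below tries it through Python's sqlite3 module and assumes the bundled SQLite is version 3.30 or later; the rows and column aliases are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column values: two 1s, a 0 and a NULL.
conn.execute("CREATE TABLE AD_CurrentView (myColumn INTEGER)")
conn.executemany(
    "INSERT INTO AD_CurrentView VALUES (?)", [(1,), (1,), (0,), (None,)]
)

# Each FILTER restricts the rows fed to that one aggregate only.
rows = conn.execute(
    """
    SELECT COUNT(*) FILTER (WHERE myColumn = 1) AS ones,
           COUNT(*) FILTER (WHERE myColumn = 0) AS zeros,
           COUNT(*) AS total
    FROM AD_CurrentView
    """
).fetchall()
print(rows)
```

On engines without FILTER, the COUNT(CASE WHEN ... THEN 1 END) form gives the same counts.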
Why not like this?
SELECT count(1)
FROM AD_CurrentView
WHERE myColumn=1
I had to use COUNTIF() in my case as part of my SELECT columns AND to mimic a % of the number of times each item appeared in my results.
So I used this...
SELECT COL1, COL2, ... ETC
-- 1.0 rather than 1 so that the division is not truncated to an integer
(1.0 / (SELECT a.vcount
FROM (SELECT vm2.visit_id, count(*) AS vcount
FROM dbo.visitmanifests AS vm2
WHERE vm2.inactive = 0 AND vm2.visit_id = vm.Visit_ID
GROUP BY vm2.visit_id) AS a)) AS [No of Visits],
COL xyz
FROM etc etc
Of course you will need to format the result according to your display requirements.
SELECT COALESCE(IF(myColumn = 1,COUNT(DISTINCT NumberColumn),NULL),0) column1,
COALESCE(CASE WHEN myColumn = 1 THEN COUNT(DISTINCT NumberColumn) ELSE NULL END,0) AS column2
FROM AD_CurrentView