I am having some trouble with aggregate functions inside a CASE expression. I want to write a query that labels a row 'JN' if Field_A equals the minimum date in Field_A, or 'JP' otherwise.
Sample Code:
SELECT *, CASE
WHEN Field_A = MIN(FIELD_A) THEN 'JN'
ELSE 'JP'
END AS JUDI
FROM TABLE_1
GROUP BY *
I am not sure why the command runs but does not execute correctly. It labels all rows as JN in the JUDI Field. How can I fix this?
I am running SQL Server 7. What I want to achieve is that in a list of rows with different dates it labels those with the earliest date with JN and subsequent dates with JP.
Use window functions:
SELECT t.*,
(CASE WHEN t.Field_A = MIN(t.FIELD_A) OVER () THEN 'JN'
ELSE 'JP'
END) AS JUDI
FROM TABLE_1 t;
You don't need GROUP BY for this. The window function compares each row against the minimum over the whole table without collapsing the rows.
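To check the window-function version on a small dataset, here is a minimal sketch that runs the same CASE expression through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, which is when SQLite gained window functions. The id column, the ISO-formatted date strings and the sample rows are made up for illustration; only TABLE_1, Field_A and JUDI come from the question.

```python
import sqlite3


def label_earliest(rows):
    """Label each row 'JN' if Field_A equals the table-wide minimum, else 'JP'."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: an id for stable ordering plus the date column.
    conn.execute("CREATE TABLE TABLE_1 (id INTEGER, Field_A TEXT)")
    conn.executemany("INSERT INTO TABLE_1 VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id,
               t.Field_A,
               CASE WHEN t.Field_A = MIN(t.Field_A) OVER () THEN 'JN'
                    ELSE 'JP'
               END AS JUDI
        FROM TABLE_1 t
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample: two rows share the earliest date.
sample = [
    (1, "2020-01-05"),
    (2, "2020-01-01"),
    (3, "2020-01-01"),
    (4, "2020-02-10"),
]
labelled = label_earliest(sample)
for row in labelled:
    print(row)
```

Both rows that share the earliest date come back as JN, and every later date comes back as JP.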
Related
I have the following table in a SQLite database.
Item | Result
A | Pass
B | Pass
A | Fail
B | Fail
I want to turn the table above into the one below with a query.
Item | Total | Accept | Reject
A | 2 | 1(50%) | 1(50%)
B | 2 | 1(50%) | 1(50%)
How should I construct this query?
You can try PIVOT() if your DBMS supports it. Then use CONCAT or the || operator, depending on the DBMS.
Query:
SELECT
item,
total,
SUM(Pass)||'('|| CAST((SUM(Pass)*1.0/total*1.0)*100.0 AS DECIMAL)||'%)' AS Accept,
SUM(Fail)||'('|| CAST((SUM(Fail)*1.0/total*1.0)*100.0 AS DECIMAL)||'%)' AS Reject
FROM
(
SELECT
Item,
result,
COUNT(result) OVER(PARTITION BY item ORDER BY result ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total,
CASE
WHEN Result = 'Pass' then 1
ELSE 0
END AS Pass,
CASE
WHEN Result = 'Fail' then 1
ELSE 0
END AS Fail
FROM t
) AS j
GROUP BY item, total
Query explanation:
Since SQLite does not support PIVOT, we create the Pass and Fail flags manually with CASE expressions.
To calculate total, COUNT is used as an analytic (window) function. It is a shortcut that computes the count per item and places it on every row of that item.
Then, in the outer query, we sum the flags, compute each percentage of the total, and use || to concatenate the count with its percentage.
See demo in db<>fiddle
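For comparison, the same report can probably be produced with a plain GROUP BY and no window function. This is only a sketch, run through Python's sqlite3 module so it is self-contained; the table name t matches the answer above, while the rounding via ROUND and CAST AS INTEGER is my own assumption about how the percentage should be displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Item TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "Pass"), ("B", "Pass"), ("A", "Fail"), ("B", "Fail")],
)

# Conditional aggregation: each CASE acts as a 0/1 flag that SUM adds up per item.
report = conn.execute(
    """
    SELECT Item,
           COUNT(*) AS Total,
           SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END)
             || '('
             || CAST(ROUND(100.0 * SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END)
                           / COUNT(*)) AS INTEGER)
             || '%)' AS Accept,
           SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END)
             || '('
             || CAST(ROUND(100.0 * SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END)
                           / COUNT(*)) AS INTEGER)
             || '%)' AS Reject
    FROM t
    GROUP BY Item
    ORDER BY Item
    """
).fetchall()

for row in report:
    print(row)
```

Grouping by Item alone is enough here because COUNT(*) already gives the per-item total inside each group.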
Simplified example:
In Hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN '0'
When value>0.9*sum(value) over () THEN '1'
ELSE '2'
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute it only once? Added twist: I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN '0'
When value>0.9*(select * from total) THEN '1'
ELSE '2'
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables?
Don't worry about that; let the optimizer handle it. That said, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
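Also change the order of the WHEN clauses in the CASE expression. As originally written, the second branch can never succeed: any value greater than 0.9 times the total is also greater than 0.5 times the total, so the first branch always catches it.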
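The effect of the WHEN order is easy to see on a toy dataset. The sketch below runs both orderings in SQLite via Python's sqlite3 module rather than in Hive, so it only illustrates the CASE logic. The row for Ann is hypothetical; I added it because none of the three sample values exceeds half of the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    # Bob, Betty and Robb come from the question; Ann is a made-up row.
    [("Bob", 2), ("Betty", 4), ("Robb", 3), ("Ann", 91)],
)

# Order used in the question: the 0.5 test shadows the 0.9 test.
ORIGINAL_ORDER = """
SELECT t.name,
       CASE WHEN t.value > 0.5 * tt.total THEN '0'
            WHEN t.value > 0.9 * tt.total THEN '1'
            ELSE '2'
       END AS var
FROM t CROSS JOIN (SELECT SUM(value) AS total FROM t) tt
"""

# Stricter test first, so both branches are reachable.
FIXED_ORDER = """
SELECT t.name,
       CASE WHEN t.value > 0.9 * tt.total THEN '1'
            WHEN t.value > 0.5 * tt.total THEN '0'
            ELSE '2'
       END AS var
FROM t CROSS JOIN (SELECT SUM(value) AS total FROM t) tt
"""

original = dict(conn.execute(ORIGINAL_ORDER).fetchall())
fixed = dict(conn.execute(FIXED_ORDER).fetchall())
print(original["Ann"], fixed["Ann"])
```

With a total of 100, Ann's value of 91 passes both tests, so the label depends entirely on which branch is checked first.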
Since I/O is the major factor that slows down Hive queries, we should strive to reduce the number of stages to get better performance.
So it's better not to use a subquery or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case, a window clause is the recommended way to reduce code repetition.
Both the windowing and the sum aggregation are computed only once. You can run explain select... to confirm that only one meaningful MR stage is launched.
Edit:
1. A simple select on top of a subquery is nothing to worry about. It can be pushed down into the last phase of the subquery, which avoids an additional MR stage.
2. Two identical aggregations in the same query block are evaluated only once, so don’t worry about repeated calculation.
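As a rough check that a named window behaves the same as repeating OVER () inline, here is a sketch in SQLite (3.25 or later assumed) through Python's sqlite3 module. It says nothing about Hive's execution plan or MR stages; it only compares results. The row for Ann is hypothetical, and the stricter 0.9 test is placed first so that both branches are reachable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    # Ann is a made-up row so that one value exceeds 0.9 of the total.
    [("Bob", 2), ("Betty", 4), ("Robb", 3), ("Ann", 91)],
)

# The window is defined once in the WINDOW clause and referenced by name.
NAMED_WINDOW = """
SELECT name,
       CASE WHEN value > 0.9 * SUM(value) OVER w THEN '1'
            WHEN value > 0.5 * SUM(value) OVER w THEN '0'
            ELSE '2'
       END AS var
FROM my_table
WINDOW w AS (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY name
"""

# The same query with the window specification repeated inline.
INLINE_WINDOW = """
SELECT name,
       CASE WHEN value > 0.9 * SUM(value) OVER () THEN '1'
            WHEN value > 0.5 * SUM(value) OVER () THEN '0'
            ELSE '2'
       END AS var
FROM my_table
ORDER BY name
"""

named = conn.execute(NAMED_WINDOW).fetchall()
inline = conn.execute(INLINE_WINDOW).fetchall()
print(named == inline)
```

The named window is purely a way to avoid repeating the window specification; both queries return the same labels.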
I am trying to get the STDEV of MCW_NM column but I want it to be STDEV of all rows not per group by BLADEID. But in Variance_Blade_MCW I need it to be grouped by BLADEID. I have tried over() but I get this error:
Column 'ENG.DBO.MCW_BCL_WEDGE.MCW_NM' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Can anyone help me? Below is my query.
PS: I am having difficulty explaining the problem so please bear with me. Let me know if you have clarifications! thanks a lot!
SELECT
BladeID,
Total_Sigma_MCW = STDEV(MCW_NM) OVER (),
CountD_Blade = COUNT(BLADEID) OVER (),
Variance_Blade_MCW = SQUARE(STDEV(MCW_NM))
FROM
ENG.DBO.MCW_BCL_WEDGE
WHERE
TESTDATE > GETDATE() - 6
GROUP BY
BLADEID
HAVING
COUNT(BladeID) >= 5000
I don't have access to SQL Server at the moment, but this might work. The inner query computes what I think are the per-BladeID aggregates you want, using window functions. Because window functions always return one row for each source row, the outer query uses DISTINCT to flatten the result to one row per BladeID.
SELECT DISTINCT
BladeID,
Total_Sigma_MCW = STDEV(MCW_NM) OVER (PARTITION BY 1),
Variance_Blade_MCW,
CountD_Blade
FROM
(
SELECT
BladeID,
MCW_NM,
CountD_Blade = COUNT(*) OVER (PARTITION BY BladeID),
Variance_Blade_MCW = SQUARE(STDEV(MCW_NM) OVER (PARTITION BY BLADEID))
FROM
ENG.DBO.MCW_BCL_WEDGE
WHERE
TESTDATE > GETDATE() - 6
) q
WHERE CountD_Blade >= 5000
It may be more efficient to create two queries, one to group by BladeID and one over the full dataset and join them.
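The two-query idea could look roughly like the sketch below: one grouped subquery per BladeID, one ungrouped subquery over the full dataset, joined together. It runs on SQLite through Python's sqlite3 module, so several things are assumptions: SQLite has no built-in STDEV, so a Python aggregate stands in for it; the table name wedge, the sample values and the threshold of 3 rows (instead of 5000) are made up; and the date filter is left out. Note that the overall sigma here covers all rows, including blades below the threshold.

```python
import sqlite3
import statistics


class SampleStdev:
    """Stand-in for SQL Server's STDEV (sample standard deviation)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Ignore NULLs, as SQL aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A sample standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, SampleStdev)
conn.execute("CREATE TABLE wedge (BladeID TEXT, MCW_NM REAL)")
conn.executemany(
    "INSERT INTO wedge VALUES (?, ?)",
    [
        ("B1", 1.0), ("B1", 3.0), ("B1", 5.0),
        ("B2", 2.0), ("B2", 4.0), ("B2", 6.0),
        ("B3", 100.0),  # too few rows, dropped by HAVING
    ],
)

MIN_ROWS = 3  # hypothetical threshold, stands in for 5000

rows = conn.execute(
    """
    SELECT g.BladeID,
           a.Total_Sigma_MCW,
           g.CountD_Blade,
           g.Variance_Blade_MCW
    FROM (SELECT BladeID,
                 COUNT(*) AS CountD_Blade,
                 STDEV(MCW_NM) * STDEV(MCW_NM) AS Variance_Blade_MCW
          FROM wedge
          GROUP BY BladeID
          HAVING COUNT(*) >= ?) g
    CROSS JOIN (SELECT STDEV(MCW_NM) AS Total_Sigma_MCW FROM wedge) a
    ORDER BY g.BladeID
    """,
    (MIN_ROWS,),
).fetchall()

for row in rows:
    print(row)
```

Because the overall subquery returns exactly one row, the cross join simply attaches the same Total_Sigma_MCW to every blade.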
I have the below sql and would like to add a filter clause within the window function. Is this possible?
select ROUND(SUM(M.CHRG_RATE/M.CONTRACTUAL_RATE) OVER
(PARTITION BY M.PROGRAM),0) AS BILLED_MEMBERS_PER_MONTH2
from tableA
where 1=1
I was thinking I could wrap a case statement?
Something like:
You can use case to leave rows out of the sum. For example, this only sums rows where the flag column equals 'Q':
SUM(CASE WHEN M.FLAG = 'Q' THEN M.CHRG_RATE/M.CONTRACTUAL_RATE END)
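Put back inside the window function, the CASE approach might look like the sketch below. It runs on SQLite (3.25 or later assumed) through Python's sqlite3 module; the id column, the FLAG values and the sample rates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA ("
    "id INTEGER, PROGRAM TEXT, FLAG TEXT, CHRG_RATE REAL, CONTRACTUAL_RATE REAL)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?, ?)",
    [
        (1, "P1", "Q", 10.0, 5.0),  # counted: 2.0
        (2, "P1", "X", 30.0, 5.0),  # skipped by the CASE
        (3, "P1", "Q", 15.0, 5.0),  # counted: 3.0
        (4, "P2", "X", 20.0, 4.0),  # skipped, so P2 has nothing to sum
    ],
)

# Rows that fail the CASE condition yield NULL, which SUM ignores.
billed = conn.execute(
    """
    SELECT M.PROGRAM,
           ROUND(SUM(CASE WHEN M.FLAG = 'Q'
                          THEN M.CHRG_RATE / M.CONTRACTUAL_RATE
                     END) OVER (PARTITION BY M.PROGRAM), 0)
             AS BILLED_MEMBERS_PER_MONTH2
    FROM tableA M
    ORDER BY M.PROGRAM, M.id
    """
).fetchall()

for row in billed:
    print(row)
```

Every row is still returned, including the ones excluded from the sum. A partition with no matching rows gets NULL rather than 0, so wrap the SUM in COALESCE if a zero is preferred.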
I can do this in SQL Server:
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
but in Access the same query gives me the following error:
Query input must contain at least one table or query
So what would be the best way to emulate this? Writing the query against some unrelated table looks dirty to me.
EDIT 1: HR.employees may contain no data, but I still want to show the constants ('HERRAMIENTA ELÉCTRICA', 0) and 0 in the third column, perhaps using ISNULL. That part is not the problem here.
Why not select directly:
select 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
IIF(ISNULL(sum(empid)), 0, sum(empid)) AS STOCK
from HR.employees
This simply doesn't work in Access. You need a FROM clause.
So you need to have a dummy table with one record, even if you don't use a single field from that table.
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
FROM Dummy_Table
Using this as an example of an empty table:
with employ as
(select 2 as col from dual
minus
select 2 as col from dual)
The query is this one:
select 'HERRAM' as tipo,
0 as deprec,
coalesce(sum(col), 0) as STOCK
from employ;
coalesce(x, value) returns value when x is NULL, so STOCK comes back as 0 even when the table is empty.
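The behaviour on an empty table can be checked with a small sketch. It uses SQLite through Python's sqlite3 module rather than Oracle or Access, and the table name employees is a stand-in, so take it only as an illustration of how SUM and COALESCE interact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Deliberately left empty to mimic a table with no data.
conn.execute("CREATE TABLE employees (empid INTEGER)")

# An aggregate without GROUP BY still returns one row on an empty table,
# but SUM over zero rows is NULL.
without_coalesce = conn.execute(
    "SELECT 'HERRAM' AS tipo, 0 AS deprec, SUM(empid) AS stock FROM employees"
).fetchone()

# COALESCE replaces that NULL with 0.
with_coalesce = conn.execute(
    "SELECT 'HERRAM' AS tipo, 0 AS deprec, COALESCE(SUM(empid), 0) AS stock "
    "FROM employees"
).fetchone()

print(without_coalesce, with_coalesce)
```

The constants appear in both cases because the aggregate guarantees exactly one output row; only the third column changes from NULL to 0.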
In Access, you can use a system table, and Val and Nz for the zero value:
SELECT TOP 1
'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
Val(Nz((select sum(empid) from HR.employees), 0)) AS STOCK
FROM
MSysObjects