Repurposing IIF statement for Redshift CASE - sql

I'm working with a query I was given by a client, but we use different SQL dialects. We use Redshift, which doesn't include the iif function, and frankly, I've never used it. I know it's basically another way of writing a CASE expression, right? Here is the query:
select
*
,iif(datediff(day,
lag(event_date, 1, '1900-01-01') over (partition by client_id, error_id order by event_date),
event_date) <= 1
,'yes', 'no') flag
from table.a
I thought this would work but it keeps firing back an error:
select
*,
CASE WHEN datediff(day, lag(event_date, 1, '1900-01-01')) OVER (PARTITION BY client_id, errord_id ORDER BY event_date) <= 1 THEN 'YES' ELSE 'NO' END flag
from dsa.sas_days
Can someone help me in reconfiguring this?

In Redshift, LAG takes only two parameters, value_expr and offset; there is no third argument for a default value:
LAG (value_expr [, offset ])
[ IGNORE NULLS | RESPECT NULLS ]
OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering )
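So drop the default, and note that your datediff call was closed before the OVER clause and was missing its third argument (event_date). For the first row of each partition LAG returns NULL, the comparison evaluates to NULL, and CASE falls through to ELSE, so the flag is still 'NO'. You can try this: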
select
*,CASE WHEN
datediff(day, lag(event_date, 1) OVER (PARTITION BY client_id, errord_id ORDER BY event_date),event_date) <= 1
THEN 'YES'
ELSE 'NO'
END flag
from dsa.sas_days
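The NULL behaviour described above can be checked on a small, made-up data set. This is only a sketch: SQLite (via Python's sqlite3 module) stands in for Redshift, the sample rows are hypothetical, and a julianday() difference is used as an assumed substitute for datediff(day, ...), which SQLite does not have.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sas_days (client_id INT, error_id INT, event_date TEXT)")
conn.executemany(
    "INSERT INTO sas_days VALUES (?, ?, ?)",
    [(1, 10, "2021-03-01"), (1, 10, "2021-03-02"), (1, 10, "2021-03-05")],
)

# The julianday() difference plays the role of datediff(day, previous, current).
# LAG has no default, so the first row of each partition compares against NULL.
rows = conn.execute("""
    SELECT event_date,
           CASE WHEN julianday(event_date)
                     - julianday(LAG(event_date, 1) OVER (
                           PARTITION BY client_id, error_id ORDER BY event_date))
                     <= 1
                THEN 'YES' ELSE 'NO' END AS flag
    FROM sas_days
    ORDER BY event_date
""").fetchall()

for row in rows:
    print(row)
```

The first row has no previous row, so it ends up in the ELSE branch without needing the '1900-01-01' default.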

Related

SQL calculation with previous row + current row

I want to make a calculation based on the Excel file. I managed to obtain the first 2 records with LAG (as you can check in the 2nd screenshot). I'm out of ideas on how to proceed from here and need help. I just need the Calculation column to take its previous value, and I want it calculated automatically over all the dates. I also tried applying LAG to the calculation manually, but the result was one more row of data instead of NULL. This is a headache.
LAG(Data ingested, 1) OVER ( ORDER BY DATE ASC ) AS LAG
You seem to want cumulative sums:
select t.*,
(sum(reconciliation + aves - microa) over (order by date) -
first_value(aves - microa) over (order by date)
) as calculation
from CalcTable t;
Here is a SQL Fiddle.
EDIT:
Based on your comment, you just need to define a group:
select t.*,
(sum(reconciliation + aves - microa) over (partition by grp order by date) -
first_value(aves - microa) over (partition by grp order by date)
) as calculation
from (select t.*,
count(nullif(reconciliation, 0)) over (order by date) as grp
from CalcTable t
) t
order by date;
Imo this could be solved using a "gaps and islands" approach. When Reconciliation > 0, create a gap. SUM(gap) OVER converts the gaps into island groupings. In the outer query, the 'sum_over' column (which corresponds to 'Calculation') is a cumulative sum partitioned by the island groupings.
with
gap_cte as (
select *, case when [Reconciliation]>0 then 1 else 0 end gap
from CalcTable),
grp_cte as (
select *, sum(gap) over (order by [Date]) grp
from gap_cte)
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;
[EDIT]
The CASE statement could be CROSS APPLY'ed instead
with
grp_cte as (
select c.*, v.gap, sum(v.gap) over (order by [Date]) grp
from #CalcTable c
cross apply (values (case when [Reconciliation]>0 then 1 else 0 end)) v(gap))
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;
Here is a fiddle
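To see the gaps-and-islands idea end to end, here is a minimal sketch on invented numbers. The original Excel data is not available, so the table name calc, the column dt, and every value below are assumptions; SQLite is used only because it runs anywhere Python does.

```python
import sqlite3

# Hypothetical rows: a positive reconciliation starts a new island.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calc (dt TEXT, reconciliation INT, aves INT, microa INT)")
conn.executemany(
    "INSERT INTO calc VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", 100, 5, 2),
        ("2021-01-02", 0, 10, 3),
        ("2021-01-03", 0, 4, 1),
        ("2021-01-04", 50, 9, 9),
        ("2021-01-05", 0, 6, 2),
    ],
)

running = conn.execute("""
    WITH gap_cte AS (
        -- Mark the rows where the running total restarts.
        SELECT *, CASE WHEN reconciliation > 0 THEN 1 ELSE 0 END AS gap
        FROM calc
    ),
    grp_cte AS (
        -- A running sum of the markers numbers the islands.
        SELECT *, SUM(gap) OVER (ORDER BY dt) AS grp
        FROM gap_cte
    )
    SELECT dt,
           SUM(reconciliation
               + CASE WHEN gap = 1 THEN 0 ELSE aves END
               - CASE WHEN gap = 1 THEN 0 ELSE microa END)
               OVER (PARTITION BY grp ORDER BY dt) AS sum_over
    FROM grp_cte
    ORDER BY dt
""").fetchall()

for row in running:
    print(row)
```

With these numbers the total restarts at 100 on the first day and at 50 on the fourth, and accumulates aves - microa in between.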

Window function is not supported in partition by clauses

select t1.lease_number ,t2.rec_bal, to_date(t2.date_dim_id,'YYYYMMDD') as issue_date,t2.paid_to as pay_date,
(case when pay_date <= lag(pay_date) over (order by issue_date) then null when pay_date > lag(pay_date) over (order by issue_date) then issue_date end) as payment_date,
dense_rank() over (partition by pay_date order by issue_date) as row_numbers,
(case when row_numbers = max(row_numbers) over (partition by payment_date) then payment_date else null end) as paymentmade_date,
remain_months_upd,remaining_pymt_periods, t2.dealer_dim_id, t2.lease_contract_dim_id
from dm_business_ops_tcci.v_tcci_lease_contract_dim t1
, dm_business_ops_tcci.v_tcci_lease_transaction_fact t2
where t1.lease_contract_dim_id=t2.lease_contract_dim_id
and t2.date_dim_id >=20210301 -- can be changed to latest business date
and lease_number in (1633014)
order by issue_date
I am trying to partition by a column I created using a window function, and I can't do it. The error comes from the line "(case when row_numbers = max(row_numbers) over (partition by payment_date) then payment_date else null end) as paymentmade_date". payment_date is calculated using a window function on a prior line. Is there a workaround for this?
You will need to materialize the values of your window functions before you perform any sort of filtering, partitioning, or conditional operations on that value.
There are a few ways to go about doing this, and the appropriate one for your use case will depend on factors outside the scope of this question.
You may accomplish this using a view, CTE, temp table, or a table variable prior to attempting this partitioning operation. This is not an exhaustive list.
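As one illustration of the CTE route, the sketch below computes a window-function column in a CTE and only then partitions by it in the outer query. The payments table, its rows, and the rows_in_group column are all hypothetical, and SQLite stands in for the asker's database.

```python
import sqlite3

# Hypothetical, simplified stand-in for the lease transaction data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (issue_date TEXT, pay_date TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("2021-03-01", "2021-03-15"),
        ("2021-03-02", "2021-03-15"),
        ("2021-03-03", "2021-04-15"),
        ("2021-03-04", "2021-04-15"),
    ],
)

result = conn.execute("""
    WITH staged AS (
        -- Step 1: materialize the window-function result as a plain column.
        SELECT issue_date,
               CASE WHEN pay_date > LAG(pay_date) OVER (ORDER BY issue_date)
                    THEN issue_date END AS payment_date
        FROM payments
    )
    -- Step 2: payment_date is now an ordinary column, so it can be used
    -- inside PARTITION BY.
    SELECT issue_date,
           payment_date,
           COUNT(*) OVER (PARTITION BY payment_date) AS rows_in_group
    FROM staged
    ORDER BY issue_date
""").fetchall()

for row in result:
    print(row)
```

The same two-step shape applies with a view or a temp table; the CTE is just the most compact way to show it.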

SQL LAG IN CASE STATEMENT

I would appreciate any pointers on what is wrong with my CASE statement. If the current CLUSTERn = the previous CLUSTERn, then add the previous PRODCAT to the current line as PREVCAT...
ORA-30484: missing window specification for this function
30484. 00000 - "missing window specification for this function"
*Cause: All window functions should be followed by window specification,
like <function>(<argument list>) OVER (<window specification>)
*Action:
Error at Line: 11 Column: 30
SELECT CLUSTERn,
MEMBERn,
COUNT(*) OVER ( PARTITION BY CLUSTERn ORDER BY MEMBERn, PRODCAT, STARTd, ENDd ) AS NEWRANK,
CASE WHEN CLUSTERn = LAG(CLUSTERn) THEN LAG(PRODCAT) ELSE 'New' END AS PREVCAT,
STATUS,
PRODCAT,
JOINTYPE,
JOINRANK,
CSP,
PROGID,
PROMNAME,
PROMOID,
COHORT,
FWEEK,
STARTd,
ENDd,
SOURCE
FROM(
I'm not sure what the confusion is. You have:
(CASE WHEN CLUSTERn = LAG(CLUSTERn)
THEN LAG(PRODCAT)
ELSE 'New'
END) AS PREVCAT,
You are missing the OVER clause -- pretty fundamental for all window functions.
Without sample data it is pretty hard to figure out what you really want. Perhaps:
(CASE WHEN CLUSTERn = LAG(CLUSTERn) OVER (ORDER BY MEMBERn, PRODCAT, STARTd, ENDd)
THEN LAG(PRODCAT) OVER (ORDER BY MEMBERn, PRODCAT, STARTd, ENDd)
ELSE 'New'
END) AS PREVCAT,
It is also possible that no CASE is required. LAG() has a three-argument form that allows you to specify a default value:
LAG(PRODCAT, 1, 'NEW') OVER (PARTITION BY ClusterN ORDER BY STARTd, ENDd)
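A quick way to convince yourself that the three-argument form replaces the CASE is to run it on a few rows. The sketch below uses SQLite rather than Oracle, and the members table and its contents are made up for illustration.

```python
import sqlite3

# Hypothetical rows: two clusters, ordered by start date within each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (clustern INT, startd TEXT, prodcat TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [(1, "2021-01-01", "A"), (1, "2021-02-01", "B"), (2, "2021-01-15", "C")],
)

# PARTITION BY restarts LAG for every cluster, so the first row of each
# cluster has no previous row and receives the default 'New'.
prev = conn.execute("""
    SELECT clustern,
           prodcat,
           LAG(prodcat, 1, 'New') OVER (
               PARTITION BY clustern ORDER BY startd) AS prevcat
    FROM members
    ORDER BY clustern, startd
""").fetchall()

for row in prev:
    print(row)
```

Because the partition already guarantees that the previous row belongs to the same cluster, no explicit CLUSTERn comparison is needed.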

SUM of time spent in a State

Please consider the table below for call center agent states.
What I need is to calculate the sum of time Bryan spent in "Break" for the whole day.
This is what I'm trying to execute but it returns some inaccurate values:
select sum (CASE
WHEN State = 'Not Working' and Reason = 'Break'
THEN Datediff(SECOND, [Time_Stamp], CURRENT_TIMESTAMP)
else '' END) as Break_Overall
from MyTable
where Agent = 'Bryan'
Use lead():
select agent,
sum(datediff(second, time_stamp, next_timestamp))
from (select t.*,
lead(time_stamp) over (partition by agent order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
If the agent can currently be on break, you might want a default value:
select agent,
sum(datediff(second, time_stamp, next_timestamp))
from (select t.*,
lead(time_stamp, 1, current_timestamp) over (partition by agent
order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
I'm a little uncomfortable with this logic, because current_timestamp has a date component, but your times don't.
EDIT:
In SQL Server 2008, which predates lead() (it was added in SQL Server 2012), you can use outer apply instead:
select agent,
sum(datediff(second, time_stamp, coalesce(next_timestamp, current_timestamp)))
from (select t.*, t2.time_stamp as next_timestamp
from mytable t outer apply
(select top 1 t2.*
from mytable t2
where t2.agent = t.agent and t2.time_stamp > t.time_stamp
order by t2.time_stamp
) t2
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
As it is, you're getting the difference between the record's Time_Stamp and CURRENT_TIMESTAMP. That's probably not correct - you probably want to get the difference between the record's Time_Stamp and the next Time_Stamp for the same "Agent".
(Note that "Agent" will also present problems if you have multiple Agents with the same name; you probably want to store Agents in a different table and use a unique identifier as a foreign key.)
So, for Bryan, you'd get the sum of the "total time" for both the 8:30:21 record and the 11:34:58 record, which is right, except that you're calculating "total time" incorrectly, so instead you'd get the sum of the time elapsed since 8:30:21 and since 11:34:58.
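The "difference to the next Time_Stamp" idea can be sketched as follows. Only the two break start times (8:30:21 and 11:34:58) come from the question; the date, the other rows, and the break lengths are invented, and SQLite's strftime('%s', ...) is an assumed stand-in for SQL Server's datediff(second, ...).

```python
import sqlite3

# Hypothetical state log for one agent; each row starts a new state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (agent TEXT, time_stamp TEXT, state TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?, ?)",
    [
        ("Bryan", "2021-03-01 08:00:00", "Working", None),
        ("Bryan", "2021-03-01 08:30:21", "Not Working", "Break"),
        ("Bryan", "2021-03-01 08:45:21", "Working", None),
        ("Bryan", "2021-03-01 11:34:58", "Not Working", "Break"),
        ("Bryan", "2021-03-01 11:44:58", "Working", None),
    ],
)

# LEAD must run over all rows first; filtering to breaks happens in the
# outer query so that next_ts is the start of whatever state came next.
totals = conn.execute("""
    SELECT agent,
           SUM(CAST(strftime('%s', next_ts) AS INTEGER)
               - CAST(strftime('%s', time_stamp) AS INTEGER)) AS break_seconds
    FROM (SELECT s.*,
                 LEAD(time_stamp) OVER (
                     PARTITION BY agent ORDER BY time_stamp) AS next_ts
          FROM states s) t
    WHERE state = 'Not Working' AND reason = 'Break'
    GROUP BY agent
""").fetchall()

print(totals)
```

Here the two breaks last 15 and 10 minutes, so the total is 1500 seconds regardless of when the query is run.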

Oracle SQL LAG Function

I'd appreciate some help with this code; I'm getting a 'missing keyword' error. I've never used the LAG function before, so hopefully I'm using it correctly. Thanks for your help. Gav
CREATE VIEW GS_Date AS
SELECT
DATE_DATE,
DATE_FLAG,
CASE WHEN LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '1' THEN DATE_STEP = ( LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) ) + '1'
WHEN LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '0' AND LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '-1' THEN DATE_STEP = ( LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) ) + '1'
ELSE DATE_STEP = LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) END AS DATE_STEP
FROM DATE_GROUP
The problem is with the CASE expression; you were using LAG correctly.
Other points: Don't add strings like '1' and '-1' to numbers. Add numbers - you don't need the single quotes.
Also, if in a computation something is common and only the "last part" is different, you can use the CASE expression "at the end". Like below:
Note: On re-reading the original post, the formula needs to be more complicated (I didn't get it exactly right). Not changing the answer, since it still illustrates the same ideas I meant to share. BUT: Looking at the original post, there is a condition "when LAG = 0 and LAG = -1" - that can never be true. What was meant is probably "OR" instead of "AND". In the formula I wrote below, this means one more WHEN...THEN... branch.
LAG(DATE_FLAG) OVER (ORDER BY DATE_DATE)
+ CASE LAG(DATE_FLAG) OVER (ORDER BY DATE_DATE) WHEN 1 THEN 1
WHEN 0 THEN -1
ELSE 0 END AS DATE_STEP
Further edit: Looking at it again, it seems that when the flag is 1, 0 or -1 we must add 1, and otherwise add 0. In that case it's easier to use a "searched CASE expression" instead of the "simple CASE expression" I used above. Something like:
LAG(...) ...
+ CASE WHEN LAG(...) ... IN (-1, 0, 1) THEN 1
ELSE 0 END AS DATE_STEP
Try like this
CREATE VIEW GS_Date AS
SELECT DATE_DATE,
DATE_FLAG,
CASE
WHEN LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '1' THEN
(LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)) + '1'
WHEN LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '0' AND LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '-1' THEN
(LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)) + '1'
ELSE
LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)
END AS DATE_STEP
FROM DATE_GROUP
So that you don't have to keep repeating LAG( ... ) OVER ( ... ), get the LAG value in a subquery and then use CASE or DECODE in the outer query:
CREATE VIEW GS_Date AS
SELECT DATE_DATE,
DATE_FLAG,
DECODE(
DATE_STEP,
1, 2,
0, 1,
-1, 0,
DATE_STEP
) AS DATE_STEP
FROM (
SELECT DATE_DATE,
DATE_FLAG,
LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) AS DATE_STEP
FROM DATE_GROUP
)
Also, your second WHEN clause will never be true:
WHEN LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) = '0'
AND LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) = '-1'
THEN ...
The value can never be both -1 and 0 at the same time, so I've assumed you meant to use OR rather than AND.
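Putting the two points together (compute LAG once in a subquery, and treat the second condition as OR), a minimal sketch might look like this. It runs on SQLite rather than Oracle, the rows in date_group are invented, and the prev_flag alias is a hypothetical name; CASE ... IN is used because SQLite has no DECODE.

```python
import sqlite3

# Hypothetical flags, including values outside -1, 0 and 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_group (date_date TEXT, date_flag INT)")
conn.executemany(
    "INSERT INTO date_group VALUES (?, ?)",
    [
        ("2021-01-01", 1),
        ("2021-01-02", 0),
        ("2021-01-03", -1),
        ("2021-01-04", 5),
        ("2021-01-05", 2),
    ],
)

steps = conn.execute("""
    SELECT date_date,
           -- Add 1 when the previous flag is -1, 0 or 1; otherwise keep it.
           prev_flag + CASE WHEN prev_flag IN (-1, 0, 1) THEN 1 ELSE 0 END
               AS date_step
    FROM (
        -- LAG is evaluated once here and reused by name above.
        SELECT date_date,
               LAG(date_flag) OVER (ORDER BY date_date) AS prev_flag
        FROM date_group
    )
    ORDER BY date_date
""").fetchall()

for row in steps:
    print(row)
```

The first row has no previous flag, so its date_step stays NULL (None in Python), which matches what the DECODE version would return.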