SQL LAG IN CASE STATEMENT - sql

I would appreciate any pointers on what is wrong with my CASE statement. If the current CLUSTERn equals the previous CLUSTERn, then add the previous PRODCAT to the current line as PREVCAT.
ORA-30484: missing window specification for this function
30484. 00000 - "missing window specification for this function"
*Cause: All window functions should be followed by window specification,
like <function>(<argument list>) OVER (<window specification>)
*Action:
Error at Line: 11 Column: 30
SELECT CLUSTERn,
MEMBERn,
COUNT(*) OVER ( PARTITION BY CLUSTERn ORDER BY MEMBERn, PRODCAT, STARTd, ENDd ) AS NEWRANK,
CASE WHEN CLUSTERn = LAG(CLUSTERn) THEN LAG(PRODCAT) ELSE 'New' END AS PREVCAT,
STATUS,
PRODCAT,
JOINTYPE,
JOINRANK,
CSP,
PROGID,
PROMNAME,
PROMOID,
COHORT,
FWEEK,
STARTd,
ENDd,
SOURCE
FROM(

I'm not sure what the confusion is. You have:
(CASE WHEN CLUSTERn = LAG(CLUSTERn)
THEN LAG(PRODCAT)
ELSE 'New'
END) AS PREVCAT,
You are missing the OVER clause -- pretty fundamental for all window functions.
Without sample data it is pretty hard to figure out what you really want. Perhaps:
(CASE WHEN CLUSTERn = LAG(CLUSTERn) OVER (ORDER BY MEMBERn, PRODCAT, STARTd, ENDd)
THEN LAG(PRODCAT) OVER (ORDER BY MEMBERn, PRODCAT, STARTd, ENDd)
ELSE 'New'
END) AS PREVCAT,
It is also possible that no CASE is required. LAG() has a three-argument form that allows you to specify a default value:
LAG(PRODCAT, 1, 'New') OVER (PARTITION BY ClusterN ORDER BY STARTd, ENDd)
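
To see that the two forms agree, here is a minimal sketch that runs both against a tiny in-memory table. It uses Python's sqlite3 module as a stand-in for Oracle (an assumption: SQLite 3.25 or later, which supports window functions). The `members` table and its rows are invented for illustration, and the ORDER BY columns are simplified.

```python
import sqlite3

# Hypothetical sample data; the real table and columns are not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (CLUSTERn INTEGER, MEMBERn INTEGER, PRODCAT TEXT, STARTd TEXT);
INSERT INTO members VALUES
  (1, 10, 'TV',    '2020-01-01'),
  (1, 10, 'Phone', '2020-02-01'),
  (2, 20, 'TV',    '2020-01-15'),
  (2, 20, 'Net',   '2020-03-01');
""")

# Form 1: CASE with an OVER clause on every LAG call.
case_rows = conn.execute("""
SELECT CLUSTERn, PRODCAT,
       CASE WHEN CLUSTERn = LAG(CLUSTERn) OVER (ORDER BY CLUSTERn, STARTd)
            THEN LAG(PRODCAT) OVER (ORDER BY CLUSTERn, STARTd)
            ELSE 'New'
       END AS PREVCAT
FROM members
ORDER BY CLUSTERn, STARTd
""").fetchall()

# Form 2: three-argument LAG, partitioned by cluster, no CASE needed.
default_rows = conn.execute("""
SELECT CLUSTERn, PRODCAT,
       LAG(PRODCAT, 1, 'New') OVER (PARTITION BY CLUSTERn ORDER BY STARTd) AS PREVCAT
FROM members
ORDER BY CLUSTERn, STARTd
""").fetchall()

print(case_rows)
print(default_rows)
```

On this sample both queries mark the first row of each cluster as 'New' and carry the previous PRODCAT forward on the others.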

Related

Window function is not supported in partition by clauses

select t1.lease_number ,t2.rec_bal, to_date(t2.date_dim_id,'YYYYMMDD') as issue_date,t2.paid_to as pay_date,
(case when pay_date <= lag(pay_date) over (order by issue_date) then null when pay_date > lag(pay_date) over (order by issue_date) then issue_date end) as payment_date,
dense_rank() over (partition by pay_date order by issue_date) as row_numbers,
(case when row_numbers = max(row_numbers) over (partition by payment_date) then payment_date else null end) as paymentmade_date,
remain_months_upd,remaining_pymt_periods, t2.dealer_dim_id, t2.lease_contract_dim_id
from dm_business_ops_tcci.v_tcci_lease_contract_dim t1
, dm_business_ops_tcci.v_tcci_lease_transaction_fact t2
where t1.lease_contract_dim_id=t2.lease_contract_dim_id
and t2.date_dim_id >=20210301 -- can be changed to latest business date
and lease_number in (1633014)
order by issue_date
I am trying to partition by a column I created using a window function, and I can't do it. The error comes from the line "(case when row_numbers = max(row_numbers) over (partition by payment_date) then payment_date else null end) as paymentmade_date". Payment_date is calculated using a window function in a prior line. Is there a workaround for this?
You need to materialize the values of your window functions before you filter on them, partition by them, or use them in conditional expressions.
There are a few ways to do this, and the right one for your use case depends on factors outside the scope of this question.
You can compute the window function in a view, CTE, temp table, or table variable, and then partition by the resulting column in a later step. This is not an exhaustive list.
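
As a rough illustration of the CTE route, the sketch below computes the LAG-based column in one step and only partitions by it in a later step. It runs on Python's sqlite3 module as a stand-in for the asker's database (an assumption), and the `payments` table, its rows, and the `rows_in_group` column are hypothetical.

```python
import sqlite3

# Hypothetical, simplified version of the lease payment data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (issue_date TEXT, pay_date TEXT);
INSERT INTO payments VALUES
  ('2021-03-01', '2021-04-01'),
  ('2021-03-02', '2021-04-01'),
  ('2021-03-03', '2021-05-01'),
  ('2021-03-04', '2021-05-01');
""")

rows = conn.execute("""
WITH step1 AS (
    -- Materialize the window function result as an ordinary column.
    SELECT issue_date, pay_date,
           LAG(pay_date) OVER (ORDER BY issue_date) AS prev_pay_date
    FROM payments
),
step2 AS (
    -- Derive payment_date from the materialized column.
    SELECT issue_date, pay_date,
           CASE WHEN pay_date > prev_pay_date THEN issue_date END AS payment_date
    FROM step1
)
-- payment_date is now a plain column, so it can appear in PARTITION BY.
SELECT issue_date, payment_date,
       COUNT(*) OVER (PARTITION BY payment_date) AS rows_in_group
FROM step2
ORDER BY issue_date
""").fetchall()

for r in rows:
    print(r)
```

Each CTE acts as the materialization step: by the time the final SELECT runs, `payment_date` is just another column.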

how to select first and last row in 1 query after Filtering and then carry out calculation between the values of two values in one query

I'm using T-SQL 2014
Suppose I have a stock price chart as follows:
I want to write efficient code for a stored function that displays the Open price at the start, the Close price at the end, and the difference between Close and Open. Is it possible to do that in one query? The query seems easy, but it turned out to be extremely difficult. My first problem is displaying the first row and the last row in one query.
My attempt looks like this:
create function GetVolatilityRank(@from date, @to date)
returns table as
return(
with Price_Selected_Time as (select * from Price where [date] between @from and @to)
select
(select top 1([Open]) from Price_Selected_Time) as 'Open',
(select top 1([Close]) from Price_Selected_Time order by date desc) as 'Close',
[Close] - [Open] as 'Difference'
);
I feel this code is very clumsy. It also doesn't compile, because 'Open' and 'Close' are not defined yet at the point where I subtract them.
Is there any way to query this in one select?
Thank you
We can handle this via a regular query using ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY Date) rn_start,
ROW_NUMBER() OVER (ORDER BY Date DESC) rn_end
FROM Price
)
SELECT
MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS OpenStart,
MAX(CASE WHEN rn_end = 1 THEN [Close] END) AS CloseEnd,
MAX(CASE WHEN rn_end = 1 THEN [Close] END) -
MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS diff
FROM cte;
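
The query above reads the whole Price table. A minimal sketch of the same ROW_NUMBER technique restricted to a date range, as the question asks, might look like the following. It uses Python's sqlite3 module instead of SQL Server (an assumption), and the sample prices and the helper name `open_close_diff` are hypothetical.

```python
import sqlite3

# Hypothetical price data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Price ("Date" TEXT, "Open" REAL, "Close" REAL);
INSERT INTO Price VALUES
  ('2020-01-01', 10.0, 11.0),
  ('2020-01-02', 11.0, 12.5),
  ('2020-01-03', 12.5, 12.0),
  ('2020-01-04', 12.0, 14.0);
""")

def open_close_diff(conn, date_from, date_to):
    """Return (first Open, last Close, Close - Open) for the date range."""
    return conn.execute("""
    WITH cte AS (
        SELECT "Open", "Close",
               ROW_NUMBER() OVER (ORDER BY "Date")      AS rn_start,
               ROW_NUMBER() OVER (ORDER BY "Date" DESC) AS rn_end
        FROM Price
        WHERE "Date" BETWEEN ? AND ?   -- filter before numbering the rows
    )
    SELECT MAX(CASE WHEN rn_start = 1 THEN "Open" END),
           MAX(CASE WHEN rn_end = 1 THEN "Close" END),
           MAX(CASE WHEN rn_end = 1 THEN "Close" END)
             - MAX(CASE WHEN rn_start = 1 THEN "Open" END)
    FROM cte
    """, (date_from, date_to)).fetchone()

result = open_close_diff(conn, '2020-01-02', '2020-01-03')
print(result)
```

Filtering inside the CTE matters: the row numbers are assigned after the WHERE clause, so `rn_start = 1` and `rn_end = 1` pick the first and last rows of the selected range rather than of the whole table.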

Repurposing IIF statement for Redshift CASE

I'm working with a query I was given by a client, but we use different SQL dialects. We use Redshift, which doesn't include the iif function, and frankly, I've never used it. I know it's basically another way of writing a CASE statement, right? Here is the query:
select
*
,iif(datediff(day,
lag(event_date, 1, '1900-01-01') over (partition by client_id, error_id order by event_date),
event_date) <= 1
,'yes', 'no') flag
from table.a
I thought this would work but it keeps firing back an error:
select
*,
CASE WHEN datediff(day, lag(event_date, 1, '1900-01-01')) OVER (PARTITION BY client_id, errord_id ORDER BY event_date) <= 1 THEN 'YES' ELSE 'NO' END flag
from dsa.sas_days
Can someone help me in reconfiguring this?
In Redshift, LAG takes only two parameters, value_expr and offset; there is no third argument for a default value.
LAG (value_expr [, offset ])
[ IGNORE NULLS | RESPECT NULLS ]
OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering )
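So you can try the following. Dropping the default does not change the result here: for the first row in each partition LAG returns NULL, the DATEDIFF comparison is then not true, and the CASE falls through to 'NO', just as it did with the '1900-01-01' default.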
select
*,CASE WHEN
datediff(day, lag(event_date, 1) OVER (PARTITION BY client_id, errord_id ORDER BY event_date),event_date) <= 1
THEN 'YES'
ELSE 'NO'
END flag
from dsa.sas_days
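
When the third LAG argument is removed, the first row of each partition is the case worth checking. The sketch below compares the two-argument and three-argument forms on a few invented rows. It runs on Python's sqlite3 module rather than Redshift (an assumption), so `julianday` differences stand in for `datediff(day, ...)`, and the `events` table is hypothetical.

```python
import sqlite3

# Hypothetical event rows for one client/error pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (client_id INTEGER, error_id INTEGER, event_date TEXT);
INSERT INTO events VALUES
  (1, 7, '2022-01-01'),
  (1, 7, '2022-01-02'),
  (1, 7, '2022-01-05');
""")

# Two-argument LAG: the first row gets NULL, so the comparison is not true.
without_default = conn.execute("""
SELECT event_date,
       CASE WHEN julianday(event_date)
                 - julianday(LAG(event_date, 1) OVER (
                       PARTITION BY client_id, error_id ORDER BY event_date)) <= 1
            THEN 'YES' ELSE 'NO' END AS flag
FROM events
ORDER BY event_date
""").fetchall()

# Three-argument LAG with the original '1900-01-01' default, for comparison.
with_default = conn.execute("""
SELECT event_date,
       CASE WHEN julianday(event_date)
                 - julianday(LAG(event_date, 1, '1900-01-01') OVER (
                       PARTITION BY client_id, error_id ORDER BY event_date)) <= 1
            THEN 'YES' ELSE 'NO' END AS flag
FROM events
ORDER BY event_date
""").fetchall()

print(without_default)
print(with_default)
```

Both versions produce the same flags on this sample, which suggests the default was only there to force the first row to 'no'.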

Row_number() and group by together not working

I wrote the query below and need to know what I am doing wrong. After adding row_number(), the output is always this error:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 22 Column: 32
The SQL Developer tooltip suggested appending row_number() to the GROUP BY alongside Is_Data_Picked. But as far as I know, row_number() is an analytic function that numbers each row, and it can't be used in a GROUP BY clause.
When I use row_number() inside GROUP BY, it shows the error below:
ORA-30484: missing window specification for this function
30484. 00000 - "missing window specification for this function"
*Cause: All window functions should be followed by window specification, like <function>(<argument list>) OVER (<window specification>)
*Action:
Error at Line: 26 Column: 26
I want to use both GROUP BY and ROW_NUMBER() in my query.
Please help me fix this issue and suggest a solution.
Query:
SELECT *
FROM
(SELECT
COUNT(DISTINCT Emp_Code) totalEmployees,
SUM(CASE WHEN pay_code = 999 THEN AMOUNT ELSE '0' END) net_salary,
SUM(CASE WHEN pay_code = 997 THEN AMOUNT ELSE '0' END) gross_earning,
SUM(CASE WHEN pay_code = 998 THEN AMOUNT ELSE '0' END) gross_deduction,
Is_Data_Picked,
ROW_NUMBER() OVER (ORDER BY (Emp_Code)) AS ROW_NUM
FROM
Xxmpcd_Salary_Detail_Table
WHERE
Prayas_Erp_Org_Id LIKE '302-%'
AND Yyyymm = '201805'
GROUP BY
Is_Data_Picked, ROW_NUMBER()) mytbl
WHERE
ROW_NUM < 600 AND ROW_NUM > 0
This is the relevant part of your subquery:
SELECT . . .
ROW_NUMBER() OVER (ORDER BY (Emp_Code)) AS ROW_NUM
FROM Xxmpcd_Salary_Detail_Table
WHERE Prayas_Erp_Org_Id LIKE '302-%' AND Yyyymm = '201805'
GROUP BY Is_Data_Picked, ROW_NUMBER()
You have an error in the first ROW_NUMBER() because Emp_Code is not in the GROUP BY. You have an error in the second because ROW_NUMBER() is an analytic function: it requires an OVER clause, and analytic functions are not allowed in GROUP BY in any case.
I could speculate that you intend:
SELECT . . .
ROW_NUMBER() OVER (ORDER BY Emp_Code) AS ROW_NUM
FROM Xxmpcd_Salary_Detail_Table
WHERE Prayas_Erp_Org_Id LIKE '302-%' AND Yyyymm = '201805'
GROUP BY Is_Data_Picked, Emp_Code
If you don't want to aggregate by Emp_Code, then you might intend:
SELECT . . .
ROW_NUMBER() OVER (ORDER BY MIN(Emp_Code)) AS ROW_NUM
FROM Xxmpcd_Salary_Detail_Table
WHERE Prayas_Erp_Org_Id LIKE '302-%' AND Yyyymm = '201805'
GROUP BY Is_Data_Picked
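
An equivalent and more explicit way to combine the two is to aggregate in a subquery and number the grouped rows in the outer query. The sketch below shows that variant on Python's sqlite3 module as a stand-in for Oracle (an assumption). The `salary_detail` table, its rows, and the `first_emp_code` alias are hypothetical.

```python
import sqlite3

# Hypothetical, cut-down version of the salary detail table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary_detail (emp_code INTEGER, pay_code INTEGER, amount REAL, is_data_picked TEXT);
INSERT INTO salary_detail VALUES
  (101, 999, 1000, 'Y'),
  (102, 999, 1500, 'Y'),
  (103, 999,  800, 'N'),
  (103, 997,  900, 'N');
""")

rows = conn.execute("""
SELECT is_data_picked, total_employees, net_salary,
       -- Number the already-aggregated rows; no GROUP BY at this level.
       ROW_NUMBER() OVER (ORDER BY first_emp_code) AS row_num
FROM (
    SELECT is_data_picked,
           COUNT(DISTINCT emp_code) AS total_employees,
           SUM(CASE WHEN pay_code = 999 THEN amount ELSE 0 END) AS net_salary,
           MIN(emp_code) AS first_emp_code
    FROM salary_detail
    GROUP BY is_data_picked
) AS grouped
ORDER BY row_num
""").fetchall()

for r in rows:
    print(r)
```

Separating the two steps makes the evaluation order visible: the GROUP BY produces one row per Is_Data_Picked value, and ROW_NUMBER() then numbers those rows.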

SUM of time spent in a State

Please consider the table below for call center agent states.
What I need is to calculate the sum of time Bryan spent in "Break" for the whole day.
This is what I'm trying to execute but it returns some inaccurate values:
select sum (CASE
WHEN State = 'Not Working' and Reason = 'Break'
THEN Datediff(SECOND, [Time_Stamp], CURRENT_TIMESTAMP)
else '' END) as Break_Overall
from MyTable
where Agent = 'Bryan'
Use lead():
select agent,
sum(datediff(second, time_stamp, next_timestamp))
from (select t.*,
lead(time_stamp) over (partition by agent order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
If the agent can currently be on break, you might want a default value:
select agent,
sum(datediff(second, time_stamp, next_timestamp))
from (select t.*,
lead(time_stamp, 1, current_timestamp) over (partition by agent
order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
I'm a little uncomfortable with this logic, because current_timestamp has a date component, but your times don't.
EDIT:
In SQL Server 2008, you can do:
select agent,
sum(datediff(second, time_stamp, coalesce(next_timestamp, current_timestamp)))
from (select t.*, t2.time_stamp as next_timestamp
from mytable t outer apply
(select top 1 t2.*
from mytable t2
where t2.agent = t.agent and t2.time_stamp > t.time_stamp
order by t2.time_stamp -- earliest later row for the same agent
) t2
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
As it is, you're getting the difference between the record's Time_Stamp and CURRENT_TIMESTAMP. That's probably not correct - you probably want to get the difference between the record's Time_Stamp and the next Time_Stamp for the same "Agent".
(Note that "Agent" will also present problems if you have multiple Agents with the same name; you probably want to store Agents in a different table and use a unique identifier as a foreign key.)
So, for Bryan, you'd sum the "total time" for both the 8:30:21 record and the 11:34:58 record, which is the right set of rows. But because "total time" is measured against CURRENT_TIMESTAMP, what you actually get is the sum of the time elapsed since 8:30:21 and the time elapsed since 11:34:58.
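
The difference-to-next-timestamp idea can be sketched as follows, using Python's sqlite3 module in place of SQL Server (an assumption), with `strftime('%s', ...)` standing in for `datediff(second, ...)`. The 8:30:21 and 11:34:58 break start times come from the discussion above. The `agent_states` table name, the date, and all other rows are invented for illustration.

```python
import sqlite3

# Hypothetical state log; only the two break start times come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_states (agent TEXT, time_stamp TEXT, state TEXT, reason TEXT);
INSERT INTO agent_states VALUES
  ('Bryan', '2020-01-01 08:00:00', 'Working',     NULL),
  ('Bryan', '2020-01-01 08:30:21', 'Not Working', 'Break'),
  ('Bryan', '2020-01-01 08:45:21', 'Working',     NULL),
  ('Bryan', '2020-01-01 11:34:58', 'Not Working', 'Break'),
  ('Bryan', '2020-01-01 11:44:58', 'Working',     NULL);
""")

row = conn.execute("""
SELECT agent,
       -- Each break lasts until the agent's next state change.
       SUM(CAST(strftime('%s', next_time_stamp) AS INTEGER)
           - CAST(strftime('%s', time_stamp) AS INTEGER)) AS break_seconds
FROM (
    SELECT s.*,
           LEAD(time_stamp) OVER (PARTITION BY agent ORDER BY time_stamp) AS next_time_stamp
    FROM agent_states AS s
) AS t
WHERE state = 'Not Working' AND reason = 'Break'
GROUP BY agent
""").fetchone()

print(row)
```

With these sample rows the two breaks last 15 and 10 minutes, so the total is 1500 seconds. LEAD has to run in the inner query, before the WHERE filter, so that each break row can see the non-break row that ends it.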