Window function or Recursive query in Redshift - sql

I try to classify customer based on monthly sales. Logic for calculation:
new customer – it has sales and appears for first time or has sales and returned after being lost (after 4 month period, based on sales_date)
active customer - not new and not lost.
lost customer - no sales and period (based on sales_date) more than 4 months
This is the desired output I'm trying to achieve:
The below Window function in Redshift classify however it is not correct.
It classified lost when difference between month > 4 in one row, however it did not classify lost if it was previous lost and revenue 0 until new status appear. How it can be updated?
with customer_status (
select customer_id,customer_name,sales_date,sum(sales_amount) as sales_amount_calc,
nvl(DATEDIFF(month, LAG(reporting_date) OVER (PARTITION BY customer_id ORDER by sales_date ASC), sales_date),0) AS months_between_sales
from customer
GROUP BY customer_id,customer_name,sales_date
)
select *,
case
WHEN months_between_sales = 0 THEN 'New'
WHEN months_between_sales > 0 THEN 'Active'
WHEN months_between_sales > 0 AND months_between_sales <= 4 and sales_amount_calc = 0 THEN 'Active'
WHEN /* months_between_sales > 0 and */ months_between_sales > 4 and sales_amount_calc = 0 THEN 'Lost'
ELSE 'Unknown'
END AS status
from customer_status
One way to solve to get cumulative sales amount partitioned on subset of potentially lost customers ( sales_amount = 0).
Cumulative amount for the customer partitioned
sum(months_between_sales) over (PARTITION BY customer_id ORDER by sales_date ASC rows unbounded preceding) as cumulative_amount,
How can it be changed to get sub-partitioned, for example when sales amount= 0 , in order to get lost correctly?
Does someone have ideas to translate the above logic into an
recursive query in Redshift?
Thank you

Related

Ranking a record based on a pre-calculated field

I have this query and I am trying to figure out how to rank
in the DENSE_RANK area by 'Status'. Status is a row from a case when statement when the current month measure is greater than the 6 mos average.
Basically I want to rank the largest to smallest 'GoalAchieved' status partitioned by territory by Order by sum(revenue). Same with the inprogress status.
So each territory should have a rank of 1,2,3,4 for goalachieved status and the same thing for 'Inprogress' Status.
Any help would be awesome.
SELECT DISTINCT a.ClinLocationID, Terrioty,
ClinLocationName, Round(Sum(revenue),2)
CASE WHEN CM > LastSixmosAvg THEN 'GoalAchieved'
ELSE 'InProgress'
END as Status,
--3 month test
CASE WHEN CM > Last3MosAvg THEN 'Winner'
ELSE 'Loser'
END as ThreeMonthStatus,
CASE WHEN CM > LastSixmosAvg THEN RANK () OVER (partition by Terrioty order by Sum(Revenue) desc)

SQL count records depending on criteria

I have a table which has account information for each individual working day. Each record shows the balance on the account for that particular day. I need to create a field that has a running count on how many days it has been since the balance on that particular account was zero.
For example: I have an account which is number 00000001. It was opened a week ago. The database has created a record for the account for last Tuesday, Wednesday, Thursday, Friday and Monday. The account did not have a balance until Friday and the balance was the same for Monday. I want the field to show '2' as a result. Further to this if the balance drops to '0' today, I need the count to reset on the next record and start again if the account has a balance the following day. There are several accounts in this table so I need this to work for each one individually.
Below is as far as I have got:
SELECT
pos.EffDate,
cus.CNA1,
pos.ACNO,
pos.CCY,
pos.LDBL,
pos.LDBLUSD,
MIN(pos3.effdate) AS 'First Post Date',
pos3.Balcount
FROM[dbo].[Account] AS pos
JOIN [dbo].[Customer] AS cus ON ((pos.CNUM=cus.CUST_NO) AND (cus.effdate=pos.effdate))
LEFT JOIN (SELECT pos2.effdate, pos2.ACNO, SUM(CASE pos2.LDBL WHEN 0 THEN 0 ELSE 1 END) AS 'Balcount' FROM [dbo].[Account] AS pos2 GROUP BY pos2.ACNO, pos2.Effdate HAVING pos2.effdate BETWEEN pos2.effdate AND MIN(pos2.effdate)) pos3 ON pos3.ACNO = pos.ACNO
WHERE pos.effdate >='1 Dec 2015' AND pos.Effdate <'30 Dec 2015'
AND pos.srcsys = 'MP_UK'
AND pos.LDBL <= 0
AND pos.CNUM <> '000020'
AND pos.intbear <> 'N'
AND pos.blockdeb IS null
GROUP BY pos.EffDate,cus.CNA1,pos.ACNO,pos.CCY,pos.LDBL,pos.LDBLUSD, pos3.Balcount
ORDER BY 1,2,3
If you need me to clarify anything please let me know. All help is greatly appreciated.
Thanks,
Ben
Basically, you want to group the data, based on the number of time the account is zero before a given row. Then, within each group, you want to enumerate the records.
In SQL Server 2012+, you can do these things with window functions. I am not sure exactly what your sample query has to do with the question, but here is the basic idea:
select a.*, row_number() over (partition by cust_no, grp order by eff_date) as seqnum
from (select a.*,
sum(case when balance = 0 then 1 else 0 end) over
(partition by cust_no order by eff_date) as grp
from Account a
) a;
In earlier versions of SQL Server, you can do something very similar using apply.

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
In table B, there should be no of days to be shown when balance is negative and
the date one which it has gone into negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, no of days should be the days when the balance has gone negative in recent time and
not from the old entry.
I need to write a SQL query to get the no of days passed and the start date from when the
balance has gone negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
Even the first_value function was given a try but not getting how to partition in it based on negative value.
Any help or direction on this will be really helpful.
Here's a way to achive this using analytical functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as analytical function to assign group number to consecutive balances less than 0.
LAST_VALUE function is then used to find the last -ve balance in each group
Finally, the result is aggregated based on each group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two gives number of days.
Demo at sqlfiddle.
Try this and use gone_negative to computing specified column value for insert into another table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY account_no) prev_balance,
lag(on_date, 1) OVER(partition by account_no ORDER BY account_no) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's on way to do it.
Select all records from my_table where the balance is positive.
Do a self-join and get all the records that have a as_on_date is greater than the current row, but the amounts are in negative
Once we get these, we cut-off the rows WHERE the date difference between the current and the previous row for as_on_date is > 1. We then filter the results a outer sub query
The Final select just groups the rows and gets the min, max values for the filtered rows which are grouped.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have a added a few more rows in the FIDDLE, just to test it out

Sum Until Value Reached - Teradata

In Teradata, I need a query to first identify all members in the MEM TABLE that currently have a negative balance, let's call that CUR_BAL. Then, for all of those members only, sum all transactions from the TRAN TABLE in order by date until the sum of those transactions is equal to the CUR_BAL.
Editing to add a third ADJ table that contains MEM_NBR, ADJ_DT and ADJ_AMT that need to be included in the running total in order to capture all of the records.
I would like the outcome to include the MEM.MEM_NBR, MEM.CUR_BAL, TRAN.TRAN_DATE OR ADJ.ADJ_DT (date associated with the transaction that resulted in the running total to equal CUR_BAL), MEM.LST_UPD_DT. I don't need to know if the balance is negative as a result of a transaction or adjustment, just the date that it went negative.
Thank you!
select
mem_nbr,
cur_bal,
tran_date,
tran_type
from (
select
a.mem_nbr,
a.cur_bal,
b.tran_date,
b.tran_type,
a.lst_upd_dt,
sum(b.tran_amt) over (partition by b.mem_nbr order by b.tran_date rows between unbounded preceding and current row) as cumulative_bal
from mem a
inner join (
select
mem_nbr,
tran_date,
tran_amt,
'Tran' as tran_type
from tran
union all
select
mem_nbr,
adj_date,
adj_amt,
'Adj' as tran_type
from adj
) b
on a.mem_nbr = b.mem_nbr
where a.cur_bal < 0
qualify cumulative_bal < 0
) z
qualify rank() over (partition by mem_nbr order by tran_date) = 1
The subquery picks up all instances where the cumulative balance is negative, then the outer query picks up the earliest instance of it. If you want the latest, add desc after tran_date in the final qualify line.

SQL query to identify seasonal sales items

I need a SQL query that will identify seasonal sales items.
My table has the following structure -
ProdId WeekEnd Sales
234 23/04/09 543.23
234 30/04/09 12.43
432 23/04/09 0.00
etc
I need a SQL query that will return all ProdId's that have 26 weeks consecutive 0 sales. I am running SQL server 2005. Many thanks!
Update: A colleague has suggested a solution using rank() - I'm looking at it now...
Here's my version:
DECLARE #NumWeeks int
SET #NumWeeks = 26
SELECT s1.ProdID, s1.WeekEnd, COUNT(*) AS ZeroCount
FROM Sales s1
INNER JOIN Sales s2
ON s2.ProdID = s1.ProdID
AND s2.WeekEnd >= s1.WeekEnd
AND s2.WeekEnd <= DATEADD(WEEK, #NumWeeks + 1, s1.WeekEnd)
WHERE s1.Sales > 0
GROUP BY s1.ProdID, s1.WeekEnd
HAVING COUNT(*) >= #NumWeeks
Now, this is making a critical assumption, namely that there are no duplicate entries (only 1 per product per week) and that new data is actually entered every week. With these assumptions taken into account, if we look at the 27 weeks after a non-zero sales week and find that there were 26 total weeks with zero sales, then we can deduce logically that they had to be 26 consecutive weeks.
Note that this will ignore products that had zero sales from the start; there has to be a non-zero week to anchor it. If you want to include products that had no sales since the beginning, then add the following line after `WHERE s1.Sales > 0':
OR s1.WeekEnd = (SELECT MIN(WeekEnd) FROM Sales WHERE ProdID = s1.ProdID)
This will slow the query down a lot but guarantees that the first week of "recorded" sales will always be taken into account.
SELECT DISTINCT
s1.ProdId
FROM (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s1
INNER JOIN (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s2
ON s1.ProdId = s2.ProdId
AND s1.rownum + 1 = s2.rownum
AND DateAdd(WEEK, 26, s1.WeekEnd) = s2.WeekEnd;