Incremental count - sql

I have a table with a list of Customer Numbers and Order Dates and want to add a count against each Customer Number, restarting from 1 each time the customer number changes. I've sorted the table into Customer then Date order, and need to add an order count column.
CASE WHEN 'Customer Number' on This row = 'Customer Number' on Previous Row then ( Count = Count on Previous Row + 1 )
Else Count = 1
What is the best way to approach this?
Customer and Dates in Customer then Date order:
Customer Date Count
0001 01/05/18 1
0001 02/05/18 2
0001 03/05/18 3
0002 03/05/18 1 <- back to one here as Customer changed
0002 04/05/18 2
0003 05/05/18 1 <- back to one again
I've just tried COUNT(*) OVER (PARTITION BY Customer) as COUNT, but for some reason it doesn't restart from 1 when the Customer changes.

It's hard to tell what you want, but "to add a count against each Customer number, restarting from 1 each time the customer number changes" sounds as if you simply want:
count(*) over (partition by customer_number)
or maybe that should be the count "up-to" the date of the row:
count(*) over (partition by customer_number order by order_date)

It sounds like you just want an analytic row_number() call:
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num
from your_table
order by customer_number,
order_date
Using an analytic count also works, as @horse_with_no_name suggested:
count(*) over (partition by customer_number order by order_date) as num
Quick demo showing both, with your sample data in a CTE:
with your_table (customer_number, order_date) as (
select '0001', date '2018-05-01' from dual
union all select '0001', date '2018-05-03' from dual
union all select '0001', date '2018-05-02' from dual
union all select '0002', date '2018-05-03' from dual
union all select '0002', date '2018-05-04' from dual
union all select '0003', date '2018-05-05' from dual
)
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num1,
count(*) over (partition by customer_number order by order_date) as num2
from your_table
order by customer_number,
order_date
/
CUST ORDER_DATE NUM1 NUM2
---- ---------- ---------- ----------
0001 2018-05-01 1 1
0001 2018-05-02 2 2
0001 2018-05-03 3 3
0002 2018-05-03 1 1
0002 2018-05-04 2 2
0003 2018-05-05 1 1
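To make the difference between the two window functions concrete, here is a minimal Python sketch (not part of the original answer) that mimics what each one computes per customer. It assumes the default window frame for count(*) with an ORDER BY, which includes all peer rows sharing the same order date, so the two columns only differ when a customer has two orders on the same date. The function name `number_orders` is hypothetical, and dates are ISO strings so that they sort correctly as text.

```python
from itertools import groupby

def number_orders(rows):
    """rows: iterable of (customer_number, order_date) tuples.

    Returns (customer, date, row_number, running_count) tuples,
    ordered by customer and then by date.
    """
    out = []
    ordered = sorted(rows)  # customer first, then date
    for _customer, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for position, (cust, day) in enumerate(group, start=1):
            # row_number(): plain position inside the partition.
            # count(*) ... order by: rows up to and including all
            # rows with the same date (the "peers" of this row).
            running = sum(1 for _, d in group if d <= day)
            out.append((cust, day, position, running))
    return out

sample = [
    ("0001", "2018-05-01"), ("0001", "2018-05-03"), ("0001", "2018-05-02"),
    ("0002", "2018-05-03"), ("0002", "2018-05-04"),
    ("0003", "2018-05-05"),
]
for row in number_orders(sample):
    print(*row)
```

With the sample data both numbers agree on every row. If a customer had two orders on the same date, the position would still give 1 and 2 while the running count would give 2 for both rows, which is why row_number() is usually the safer choice for a strict sequence.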

Related

SQL to Calculate Customer Tenure/Service Start Date using a cooldown period logic

The business scenario here is to calculate the customer tenure with the service provider. Customer tenure is calculated according to the following rules:
The oldest account start date is taken for the tenure calculation
One customer can have more than one active account at a given time
The cooldown period is 6 months, i.e., to remain a customer, s/he must open a new account with the provider within 6 months of closing an account, or must already have another account open before closing
If the customer opens an account after more than 6 months, the tenure is calculated from the new account's open date
We can better understand this with an example: (values in bold are Customer since/tenure-start date)
Customer_ID  ACCT_SERIAL_NUM  ACCT_STRT_DT  ACCT_END_DT  COMMENTS
11111        Account1         2000-01-20    (null)       Customer already had an active account before closing the existing account
11111        Account2         2002-12-10    2021-09-22
11111        Account3         2021-10-22    (null)
Customer_ID  ACCT_SERIAL_NUM  ACCT_STRT_DT  ACCT_END_DT  COMMENTS
11112        Account1         2000-01-20    2002-08-10   Account closed but customer opened another account within cooling period of 6months
11112        Account2         2002-12-10    2021-09-22
11112        Account3         2021-10-22    (null)
Customer_ID  ACCT_SERIAL_NUM  ACCT_STRT_DT  ACCT_END_DT  COMMENTS
11113        Account1         2000-01-20    2002-05-10   Account closed but customer didn't open another account within cooling period of 6months
11113        Account2         2002-12-10    2021-09-22   Hence this is the new customer tenure start date
11113        Account3         2021-10-22    (null)
The query I was trying (below) could possibly help me if the events occur sequentially (like in above 3 scenarios)
With dataset as (
SELECT Customer_ID, ACCT_SERIAL_NUM, ACCT_STRT_DT, ACCT_END_DT, COMMENTS,
CASE WHEN NVL(LEAD(ACCT_STRT_DT, 1) OVER(PARTITION BY Customer_ID ORDER BY ACCT_STRT_DT asc ) , SYSDATE-1 ) < ADD_MONTHS(nvl(acct_end_dt, SYSDATE), 6)
THEN 'Y' ELSE 'N' END as ACTV_FLG
FROM calc_customer_tenure ct
order by Customer_ID, ACCT_STRT_DT asc )
SELECT
Customer_ID, MIN(CASE WHEN FLAG = 'Y' THEN ACCT_STRT_DT ELSE NULL END) as CUST_TNUR
FROM (
SELECT ds.*,
CASE WHEN ACCT_END_DT is NULL
THEN 'Y' ELSE MIN(ACTV_FLG) OVER (PARTITION BY Customer_ID ORDER BY ACCT_STRT_DT asc ROWS between current row and unbounded following)
END as FLAG
from dataset ds )
GROUP BY Customer_ID ORDER BY Customer_ID ;
but it fails for the scenario below (which is an ideal real-world scenario).
Unfortunately the above code takes Account3 as the start date instead of Account1:
Customer_ID  ACCT_SERIAL_NUM  ACCT_STRT_DT  ACCT_END_DT  COMMENTS
11114        Account1         2000-01-20    2021-08-22   Customer has closed this account(1) after subsequent account(2) is closed. But then has opened an account(3) within 6 months of closing the account(1) hence this is the tenure start date
11114        Account2         2002-12-10    2003-12-10
11114        Account3         2021-10-22    (null)
Thanks to Akina I was able to re-write the query to fit as required! Also thanks to P3Consulting for contributing! Really appreciate the support!
Re-posting the final SQL here for the Oracle which helped with my use case:
Below is using Recursive CTEs
WITH cte1 as (
SELECT customer_id, ACCT_STRT_DT, ACCT_END_DT,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ACCT_STRT_DT) rn
FROM calc_customer_tenure
), cte2 (customer_id, ACCT_STRT_DT, ACCT_END_DT, rn, tenure_start_date, tenure_end_date) AS (
SELECT customer_id, ACCT_STRT_DT, ACCT_END_DT, rn,
ACCT_STRT_DT tenure_start_date,
ACCT_END_DT tenure_end_date
FROM cte1
WHERE rn = 1
UNION ALL
SELECT cte1.customer_id, cte1.ACCT_STRT_DT, cte1.ACCT_END_DT, cte1.rn,
CASE WHEN cte1.ACCT_STRT_DT > ADD_MONTHS(cte2.tenure_end_date, 6)
THEN cte1.ACCT_STRT_DT
ELSE cte2.tenure_start_date
END,
CASE WHEN cte1.ACCT_STRT_DT > ADD_MONTHS(cte2.tenure_end_date, 6)
THEN cte1.ACCT_END_DT
ELSE GREATEST(cte1.ACCT_END_DT, cte2.tenure_end_date)
END
FROM cte1
JOIN cte2 ON cte1.customer_id = cte2.customer_id AND cte1.rn = cte2.rn + 1
)
SELECT customer_id, CASE WHEN ADD_MONTHS(NVL(tenure_end_date, SYSDATE), 6) < SYSDATE THEN NULL ELSE tenure_start_date END AS CUSTOMER_TENURE_START_DATE FROM (
SELECT
cte2.*, row_number() over (partition by customer_id order by rn desc) as rank_derv
FROM cte2 ) subset
WHERE rank_derv = 1
ORDER BY 1,2 ;
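For readers who find the recursion hard to follow, the same walk over one customer's accounts can be sketched procedurally. This is a rough Python illustration, not the original author's code: the names `add_months` and `tenure_start` and the explicit `today` parameter are hypothetical, and `add_months` is a simplified stand-in for Oracle's ADD_MONTHS that only clamps the day of the month (it ignores Oracle's end-of-month rule). A missing end date is represented as None and treated as "still open", mirroring how the NULL comparisons and GREATEST behave in the query.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Simplified ADD_MONTHS: shift by n months, clamping the day."""
    y, m = divmod(d.year * 12 + d.month - 1 + n, 12)
    m += 1
    return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))

def tenure_start(accounts, today, cooldown_months=6):
    """accounts: list of (start_date, end_date or None) for ONE customer.

    Returns the tenure start date, or None if the customer has lapsed.
    """
    start = end = None
    first = True
    for acct_start, acct_end in sorted(accounts, key=lambda a: a[0]):
        # A new tenure begins when this account starts after the
        # cooldown following the latest end date seen so far.
        lapsed = (not first and end is not None
                  and acct_start > add_months(end, cooldown_months))
        if first or lapsed:
            start, end = acct_start, acct_end
        elif end is not None:
            # Extend the running tenure; an open account keeps it open.
            end = None if acct_end is None else max(end, acct_end)
        first = False
    if end is not None and add_months(end, cooldown_months) < today:
        return None  # everything closed and the cooldown has expired
    return start

customer_11114 = [
    (date(2000, 1, 20), date(2021, 8, 22)),
    (date(2002, 12, 10), date(2003, 12, 10)),
    (date(2021, 10, 22), None),
]
print(tenure_start(customer_11114, today=date(2022, 1, 1)))
```

The important detail is that the running end date is the latest end seen so far, not the end of the previous row, which is what makes the overlapping scenario for customer 11114 resolve to Account1.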
I am also posting one which may work in case of Oracle only (since it uses hierarchical query syntax):
WITH dataset_rnkd as (
SELECT CT.*, row_number() over (partition by customer_id order by ACCT_STRT_DT DESC) as row_rnk
from calc_customer_tenure CT
)
SELECT customer_id, MIN(ACCT_STRT_DT) as CUSTOMER_TENURE FROM (
SELECT * FROM dataset_rnkd
START WITH ADD_MONTHS(NVL(ACCT_END_DT, SYSDATE), 6) >= SYSDATE
CONNECT BY NOCYCLE PRIOR customer_id = customer_id AND PRIOR ACCT_STRT_DT <= ADD_MONTHS(NVL(ACCT_END_DT, SYSDATE), 6)
) DS
GROUP BY customer_id
ORDER BY customer_id ;
Try this; it gives the correct results for the dataset you supplied, but you should test more scenarios:
with data(customer_id,acct_serial_num,acct_strt_dt,acct_end_dt,comments) as
(
select '11111', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), cast(NULL AS DATE), 'Customer already had an active account before closing the existing account' from dual union all
select '11111', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), '' from dual union all
select '11111', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11112', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2002-08-10','yyyy-mm-dd'), 'Account closed but customer opened another account within cooling period of 6months' from dual union all
select '11112', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), '' from dual union all
select '11112', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11113', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2002-05-10','yyyy-mm-dd'), 'Account closed but customer didn''t open another account within cooling period of 6months' from dual union all
select '11113', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), 'Hence this is the new customer tenure start date' from dual union all
select '11113', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11114', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2021-08-22','yyyy-mm-dd'), 'Customer has closed this account(1) after subsequent account(2) is closed. But then has opened an account(3) within 6 months of closing the account(1) hence this is the tenure start date' from dual union all
select '11114', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2003-12-10','yyyy-mm-dd'), '' from dual union all
select '11114', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual
),
datawc as (
select d.customer_id,acct_serial_num,acct_strt_dt, nvl(d.acct_end_dt, to_date('2999-12-31','yyyy-mm-dd')) as acct_end_dt,
nvl(add_months(acct_end_dt,6),to_date('2999-12-31','yyyy-mm-dd')) as cooldown_end_dt,
case when
acct_strt_dt < add_months(lag(acct_end_dt,1) over(partition by customer_id order by acct_strt_dt),6)
then 1 else 0 end as prev_within_cooldown,
case when
add_months(nvl(acct_end_dt, to_date('2999-12-31','yyyy-mm-dd')),6) > lead(acct_strt_dt,1) over(partition by customer_id order by acct_strt_dt)
then 1 else 0 end as next_within_cooldown,
d.comments
from data d
),
mergeddata as (
select customer_id, acct_serial_num, acct_strt_dt, acct_end_dt, prev_within_cooldown, next_within_cooldown, cooldown_end_dt, comments
from datawc d
match_recognize(
partition by customer_id
order by acct_strt_dt,acct_end_dt
measures first(acct_serial_num) as acct_serial_num, cooldown_end_dt as cooldown_end_dt,
first(prev_within_cooldown) as prev_within_cooldown, first(next_within_cooldown) as next_within_cooldown,
comments as comments, first(acct_strt_dt) as acct_strt_dt, max(acct_end_dt) as acct_end_dt
pattern( merged* str)
define merged as acct_end_dt >= next(acct_strt_dt)
)
)
select d.customer_id, min(acct_strt_dt) as tenure_dt
from mergeddata d
where next_within_cooldown = 1
group by d.customer_id
;
CUSTO TENURE_DT
----- -----------
11111 20-JAN-2000
11112 20-JAN-2000
11113 10-DEC-2002
11114 20-JAN-2000

Get orders for each customer after a specific date for each customer

Forgive me if I word this poorly.
And sorry if it has already been asked, but I was not able to find an answer here.
I'm using Snowflake to try and do the below.
Basically, I'm trying to do a piece of work to find out how many times a customer has placed an order after a specific date, where that date is different for each customer.
Scenario:
We want to see if customers continue to shop with us after they have been short-shipped (received 1 or more items less than they ordered).
So for example:
Customer 1 places an order on 01/01/2020 and this was a short-shipment.
They then go on to place orders on 06/06/2020 and 02/02/2021.
So this customer has a total of 2 additional orders since they were short-shipped on 01/01/2020.
Customer 2 places an order on 02/03/2020 and this was short-shipped.
Customer 2 has not placed an order since, so they will have 0 additional orders.
Data available:
cust_id  ord_id  order_date
1        0123    01/01/2020
1        0456    06/06/2020
1        0789    02/02/2021
2        1011    01/01/2020
Desired output:
cust_id  number_of_orders
1        2
2        0
So using a boosted version of your data:
with data_cte( cust_id, ord_id, order_date, short_order_flg) as (
select * from values
(1, '1', '2018-06-06'::date, false),
(1, '2', '2019-01-01'::date, true),
(1, '3', '2019-06-06'::date, false),
(1, '4', '2019-12-02'::date, false),
(1, '5', '2020-01-01'::date, true),
(1, '6', '2020-06-06'::date, false),
(1, '7', '2021-02-02'::date, false),
(2, '8', '2020-01-01'::date, true)
)
which shows a "valid" purchase, multiple "short ships" and how to batch them
SELECT
cust_id,
min(order_date) as short_date,
count(*) -1 as follow_count
FROM (
select
cust_id
,order_date
,CONDITIONAL_TRUE_EVENT(short_order_flg) over(partition by cust_id order by order_date ) as edge
from data_cte
)
where edge > 0
group by 1, edge
order by 1,2;
gives:
CUST_ID  SHORT_DATE  FOLLOW_COUNT
1        2019-01-01  2
1        2020-01-01  2
2        2020-01-01  0
The key thing to note is that CONDITIONAL_TRUE_EVENT increases each time the event happens, which makes (cust_id, edge) a batch key. Rows before the first event have an edge of zero, hence the WHERE filter.
The last thing is that each "post short" batch contains at least one row, the short-shipped order that starts it, so we need to subtract one from the count.
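The batching idea can be sketched outside SQL as well. The following Python is only an illustration of the logic described above, not Snowflake's implementation: the function name `follow_counts` is hypothetical, and the per-customer counter plays the role of the edge column.

```python
from collections import defaultdict

def follow_counts(orders):
    """orders: iterable of (cust_id, order_date, short_order_flg).

    Returns sorted (cust_id, short_date, follow_count) tuples,
    one per short-shipped order.
    """
    batches = defaultdict(list)   # (cust_id, edge) -> order dates
    edge = defaultdict(int)       # running count of short ships
    for cust, day, short in sorted(orders, key=lambda o: (o[0], o[1])):
        if short:
            edge[cust] += 1       # a new batch starts here
        if edge[cust] > 0:        # skip orders before the first short ship
            batches[(cust, edge[cust])].append(day)
    # Each batch includes the short-shipped order itself, hence the -1.
    return sorted((cust, min(days), len(days) - 1)
                  for (cust, _edge), days in batches.items())

data = [
    (1, "2018-06-06", False), (1, "2019-01-01", True),
    (1, "2019-06-06", False), (1, "2019-12-02", False),
    (1, "2020-01-01", True), (1, "2020-06-06", False),
    (1, "2021-02-02", False), (2, "2020-01-01", True),
]
for row in follow_counts(data):
    print(*row)
```

Note that in this scheme a follow-up order is attributed only to the most recent short shipment before it, so the counts for two short ships of the same customer never overlap.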
Try this
with CTE as (
select 1 as cust_id, '0123' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg union all
select 1 as cust_id, '0456' as ord_id, '2020-06-06'::date as order_date, 0 as short_order_flg union all
select 1 as cust_id, '0789' as ord_id, '2021-02-02'::date as order_date, 0 as short_order_flg union all
select 2 as cust_id, '1011' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg
),
following_orders as (
select cust_id, short_order_flg, count(ord_id) over (partition by cust_id order by order_date rows between current row and unbounded following) - 1 as number_of_orders
from cte
order by cust_id, order_date
)
select cust_id, number_of_orders
from following_orders
where short_order_flg = 1
;
I added column short_order_flg to indicate which record represents the short order. Then I used window function count(ord_id) over(...) to calculate the number of orders following each order, subtracting 1 to exclude the current record itself. Finally, I applied a filter to select only the short order records.
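As a cross-check of that reasoning, here is a small Python sketch of the same "count the rows that follow" idea. It is an assumption-laden illustration rather than a translation of the query: `orders_after_short` is a hypothetical name, and dates are ISO strings so that they sort as text.

```python
def orders_after_short(orders):
    """orders: list of (cust_id, order_date, short_order_flg).

    For every short-shipped order, count the same customer's orders
    that come after it in date order.
    """
    ordered = sorted(orders, key=lambda o: (o[0], o[1]))
    result = []
    for i, (cust, _day, short) in enumerate(ordered):
        if short:
            following = sum(1 for c, _, _ in ordered[i + 1:] if c == cust)
            result.append((cust, following))
    return result

orders = [
    (1, "2020-01-01", 1),
    (1, "2020-06-06", 0),
    (1, "2021-02-02", 0),
    (2, "2020-01-01", 1),
]
print(orders_after_short(orders))
```

Unlike the batching approach, this counts every later order for each short shipment, so a customer with two short ships would have the later orders counted under both.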

MIN value from data set and sum

I have data:
ID DUE AMT
4 2018-03-10 335.75
3 2018-04-10 334.75
1 2018-05-10 333.75
2 2018-06-10 332.75
I need to extract:
least due (03-10)
amt for least due (335.75)
sum of amt column.
Could it be done in single query?
Try keep dense rank:
with tt as (
select 4 id, date '2018-03-10' due, 335.75 amt from dual union all
select 3 id, date '2018-04-10' due, 334.75 amt from dual union all
select 1 id, date '2018-05-10' due, 333.75 amt from dual union all
select 2 id, date '2018-06-10' due, 332.75 amt from dual
)
select min(due) least_due,
min(amt) keep (dense_rank first order by due) amt_for_least_due,
sum(amt) sum_amt
from tt
We can try using analytic functions here:
WITH cte AS (
SELECT ID, DUE, AMT,
SUM(AMT) OVER () AS TOTALAMOUNT,
ROW_NUMBER() OVER (ORDER BY DUE) rn
FROM yourTable
)
SELECT ID, DUE, AMT, TOTALAMOUNT
FROM cte
WHERE rn = 1;
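Both queries compute the same three values in one pass over the table: the earliest due date, the amount on that row, and the total of all amounts. A minimal Python sketch of that result, using the sample data from the question (the variable names are hypothetical):

```python
from datetime import date

rows = [  # (id, due, amt)
    (4, date(2018, 3, 10), 335.75),
    (3, date(2018, 4, 10), 334.75),
    (1, date(2018, 5, 10), 333.75),
    (2, date(2018, 6, 10), 332.75),
]

# The row with the smallest due date carries both the date and its amount.
earliest = min(rows, key=lambda r: r[1])

summary = {
    "least_due": earliest[1],
    "amt_for_least_due": earliest[2],
    "sum_amt": sum(r[2] for r in rows),
}
print(summary)
```

If several rows share the earliest due date, this sketch simply takes the first one it meets, much like ROW_NUMBER() picks an arbitrary one; MIN(amt) KEEP (DENSE_RANK FIRST ORDER BY due) instead returns the smallest amount among those tied rows.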

Insert the table data based on grouping of two columns

I have an Oracle table with the following format.
For example:
JLID Dcode SID TDT QTY
8295783 3119255 9842 3/5/2018 14
8269771 3119255 9842 3/6/2018 11
8302211 3119255 1126 3/1/2018 19
Here I have different SIDs for the same Dcode, and I need to get the SID with the maximum total Qty. I.e., for SID 9842 the total is 14+11=25 and for SID 1126 it is 19, so the result should be SID 9842. Our query should return the following result:
JLID Dcode START_DT END_DT SID
111 3119255 3/1/2018 3/31/2018 12:00 9842
The start date and end date should be calculated from TDT, i.e. the start date is the first day of the month and the end date is the last day of the month.
Can anyone please suggest some ideas for how to do this?
It might be as simple as this:
SELECT Dcode, start_date, end_date, SID FROM (
SELECT Dcode, SID, TRUNC(start_date, 'MONTH') AS start_date
, LAST_DAY(end_date) AS end_date
, ROW_NUMBER() OVER ( PARTITION BY Dcode ORDER BY total_qty DESC ) AS rn
FROM (
SELECT Dcode, SID, MIN(TDT) AS start_date, MAX(TDT) AS end_date
, SUM(QTY) AS total_qty
FROM mytable
GROUP BY Dcode, SID
)
) WHERE rn = 1
In the innermost subquery I aggregate to get the range of dates and the total quantity for particular values of Dcode and SID. Then I use an analytic (window) function to get the row for which the total quantity is the greatest. (You would want to use RANK() in place of ROW_NUMBER() in the event you want to return more than one value of SID with the same quantity.)
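The two steps described above (aggregate per Dcode and SID, then keep the SID with the largest total per Dcode) can be sketched in Python as follows. This is only an illustration under the stated assumptions: `top_sid_per_dcode` is a hypothetical name, and ties on total quantity are broken arbitrarily, as ROW_NUMBER() would.

```python
import calendar
from collections import defaultdict
from datetime import date

def top_sid_per_dcode(rows):
    """rows: iterable of (dcode, sid, tdt, qty).

    Returns (dcode, start_date, end_date, sid) for the SID with the
    largest total quantity within each dcode.
    """
    # Step 1: total qty, earliest and latest date per (dcode, sid).
    totals = defaultdict(lambda: [0, None, None])
    for dcode, sid, tdt, qty in rows:
        t = totals[(dcode, sid)]
        t[0] += qty
        t[1] = tdt if t[1] is None else min(t[1], tdt)
        t[2] = tdt if t[2] is None else max(t[2], tdt)
    # Step 2: keep the group with the greatest total for each dcode.
    best = {}
    for (dcode, sid), (qty, first, last) in totals.items():
        if dcode not in best or qty > best[dcode][0]:
            best[dcode] = (qty, sid, first, last)
    # Step 3: widen the date range to whole months.
    out = []
    for dcode, (_qty, sid, first, last) in sorted(best.items()):
        start = first.replace(day=1)
        end = last.replace(day=calendar.monthrange(last.year, last.month)[1])
        out.append((dcode, start, end, sid))
    return out

rows = [
    (3119255, 9842, date(2018, 3, 5), 14),
    (3119255, 9842, date(2018, 3, 6), 11),
    (3119255, 1126, date(2018, 3, 1), 19),
]
print(top_sid_per_dcode(rows))
```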
Here's one option which doesn't contain JLID = 111 in the final result as I have no idea where you took it from.
with test (jlid, dcode, sid, tdt, qty) as
(select 8295783, 3119255, 9842, date '2018-03-05', 14 from dual union
select 8269771, 3119255, 9842, date '2018-08-22', 11 from dual union
select 8302211, 3119255, 1126, date '2018-03-01', 19 from dual union
--
select 1234567, 1112223, 1000, date '2018-06-16', 88 from dual
)
select dcode,
min (trunc (tdt, 'mm')) start_dt, --> MIN
max (last_day (tdt)) end_dt, --> MAX
sid
from (select dcode,
sid,
tdt,
sqty,
rank () over (partition by dcode order by sqty desc) rnk
from (select dcode,
sid,
tdt,
sum (qty) over (partition by dcode, sid) sqty
from test))
where rnk = 1
group by dcode, sid; --> GROUP BY
DCODE START_DT END_DT SID
---------- ---------------- ---------------- ----------
1112223 01.06.2018 00:00 30.06.2018 00:00 1000
3119255 01.03.2018 00:00 31.08.2018 00:00 9842

Row number custom coded in sql

I am using BigQuery #standardsql to work on a table. The table should mark a conversion (1) for a user who purchased something in both month 9 and month 10. A user who did not purchase in month 10 should have only 0 in their rows.
So far, this is the query for custom_coded:
(case when row_number()
over (partition by customer_id order by purchase_date asc) =
count(*) over (partition by customer_id)
then 1 else 0 END) AS custom_coded
and this is the result so far
What I expect is that customer_id = 288 has only 0 in custom_coded, since he did not purchase in the next month (month 10), and that customer_id = 879 has 1 on his latest purchase_date, since he has a purchase record in month 10.
This is the expected result
I previously asked in this thread (Decode maximum number in rows for sql), however the dataset there didn't fit the analysis that I'm going to run.
Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id, item_purchased, purchase_date,
(CASE WHEN
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
COUNT(*) OVER (PARTITION BY customer_id)
AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
OVER(PARTITION BY customer_id) = 3
THEN 1 ELSE 0
END) AS custom_coded
FROM `project.dataset.table`
You can test / play with above using dummy data from your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT 288 customer_id, 'Rice' item_purchased, DATE '2017-09-02' purchase_date UNION ALL
SELECT 288, 'Rice', DATE '2017-09-02' UNION ALL
SELECT 288, 'Rice', DATE '2017-09-06' UNION ALL
SELECT 879, 'Plate', DATE '2017-09-01' UNION ALL
SELECT 879, 'Plate', DATE '2017-09-25' UNION ALL
SELECT 879, 'Plate', DATE '2017-10-25' UNION ALL
SELECT 879, 'Plate', DATE '2017-10-27'
)
SELECT customer_id, item_purchased, purchase_date,
(CASE WHEN
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
COUNT(*) OVER (PARTITION BY customer_id)
AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
OVER(PARTITION BY customer_id) = 3
THEN 1 ELSE 0
END) AS custom_coded
FROM `project.dataset.table`
ORDER BY customer_id, purchase_date
result is
customer_id item_purchased purchase_date custom_coded
288 Rice 2017-09-02 0
288 Rice 2017-09-02 0
288 Rice 2017-09-06 0
879 Plate 2017-09-01 0
879 Plate 2017-09-25 0
879 Plate 2017-10-25 0
879 Plate 2017-10-27 1
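The rule the query encodes is: flag a row only if it is the customer's last purchase and the customer bought in both September and October 2017. A minimal Python sketch of that rule, with the same dummy data, is shown below. The function name `custom_coded` and the `months` parameter are hypothetical; the membership check replaces the SUM(DISTINCT ...) = 3 trick, which serves the same purpose of testing that both months are present.

```python
from collections import defaultdict
from datetime import date

def custom_coded(purchases, months=((2017, 9), (2017, 10))):
    """purchases: list of (customer_id, purchase_date).

    Returns (customer_id, purchase_date, flag) tuples ordered by
    customer and date; flag is 1 only on the last purchase of a
    customer who bought in every month listed in `months`.
    """
    by_customer = defaultdict(list)
    for cust, day in purchases:
        by_customer[cust].append(day)
    out = []
    for cust in sorted(by_customer):
        days = sorted(by_customer[cust])
        seen = {(d.year, d.month) for d in days}
        converted = all(m in seen for m in months)
        for i, d in enumerate(days):
            is_last = i == len(days) - 1
            out.append((cust, d, 1 if converted and is_last else 0))
    return out

purchases = [
    (288, date(2017, 9, 2)), (288, date(2017, 9, 2)), (288, date(2017, 9, 6)),
    (879, date(2017, 9, 1)), (879, date(2017, 9, 25)),
    (879, date(2017, 10, 25)), (879, date(2017, 10, 27)),
]
for row in custom_coded(purchases):
    print(*row)
```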