Lag query pulling in incorrect data


I have the table below (labelled Original Table) with the following columns:
BU_Code (store code), contact_key (customer ID), Bu_key (store number), TXN_Mth (month of transaction in 2021), Fragrance/Cosmetics/Personal flag (flag for type of product bought).
[image: Original Table]
I am trying to create a new table based on this one which lists the previous month the customer shopped in (Pre_txn_mth) and uses a CASE statement to determine whether they are a new customer (no previous transaction before 2021), returning (shopped within the last 12 months) or reactivated (last shop more than 12 months ago).
However, when I create the table it lists future transactions as the previous transaction. Below is an image from the new table, in which Contact_key 1196 is pulled correctly but 1443 is not.
[image: Error in Table example]
This is the code; I have tried different variations of it but always get the same error:
CREATE TABLE TPS_TABLE_B AS
(
    SELECT
        B.*
        , LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH) PRE_TXN_MTH
        --, TXN_MTH - 100
        , CASE
            WHEN LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH) IS NULL THEN 'NEW'
            WHEN TXN_MTH - 100 < (LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH)) THEN 'RETURNING'
            WHEN TXN_MTH - 100 >= (LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH)) THEN 'REATIVATED'--REACTIVATED IS NO TRANSACTION IN PAST 12 MONTHS
            ELSE 'OTHER'
          END AS CUST_TYPE
    FROM
    (
        SELECT
            CONTACT_KEY
            , BU_CODE
            , BU_KEY
            , TXN_MTH
            , FRAGRANCE_FLAG
            , COSMETICS_FLAG
            , PERSONALCARE_FLAG
        FROM TPS_TABLE_A
    ) B
)
;

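The LAG() calls in your query have no PARTITION BY, so the window runs over all rows ordered by CONTACT_KEY, TXN_MTH. The first row of each customer therefore picks up the last month of the customer sorted just before it, which can be a later month than the row's own TXN_MTH (for example, the 202106 row of 1443 would receive 202109 from 1259). Partition the window by CONTACT_KEY to avoid that.
You can create a CTE (or use this code as a subquery) to derive the values that the main query needs.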
WITH
    c_months AS
    (
        Select
            BU_CODE, BU_KEY, CONTACT_KEY,
            TXN_MTH, FRAGRANCE_FLAG, COSMETICS_FLAG, PERSONALCARE_FLAG,
            -- calendar month before TXN_MTH (202101 gives 202012)
            CASE WHEN SubStr(TXN_MTH - 1, -2) = '00' THEN TXN_MTH - 1 - 88 ELSE TXN_MTH - 1 END "MTH_BEFORE",
            -- previous transaction month of the same customer
            LAG(TXN_MTH) OVER(Partition By CONTACT_KEY Order By TXN_MTH) "MTH_PREV_TXN",
            -- true month count; plain YYYYMM subtraction is wrong across a year boundary (202201 - 202112 = 89)
            MONTHS_BETWEEN(TO_DATE(TO_CHAR(TXN_MTH), 'YYYYMM'), TO_DATE(TO_CHAR(LAG(TXN_MTH) OVER(Partition By CONTACT_KEY Order By TXN_MTH)), 'YYYYMM')) "MTHS_SINCE_PREV_TXN",
            Min(TXN_MTH) OVER(Partition By CONTACT_KEY) "FIRST_MTH",
            Max(TXN_MTH) OVER(Partition By CONTACT_KEY) "LAST_MTH",
            -- span in YYYYMM units; a real month count only within one calendar year
            Max(TXN_MTH) OVER(Partition By CONTACT_KEY) - Min(TXN_MTH) OVER(Partition By CONTACT_KEY) + 1 "TOTAL_MTHS"
        From
            tbl
    )
/* R e s u l t :
BU_CODE     BU_KEY CONTACT_KEY    TXN_MTH FRAGRANCE_FLAG COSMETICS_FLAG PERSONALCARE_FLAG MTH_BEFORE MTH_PREV_TXN MTHS_SINCE_PREV_TXN  FIRST_MTH   LAST_MTH TOTAL_MTHS
------- ---------- ----------- ---------- -------------- -------------- ----------------- ---------- ------------ ------------------- ---------- ---------- ----------
TPS             16        1196     202108              1              0                 0     202107                                      202108     202112          5
TPS             16        1196     202111              1              0                 0     202110       202108                   3     202108     202112          5
TPS             16        1196     202112              1              0                 0     202111       202111                   1     202108     202112          5
TPS             16        1259     202109              1              0                 0     202108                                      202109     202109          1
TPS             16        1443     202106              1              0                 0     202105                                      202106     202109          4
TPS             16        1443     202109              1              0                 0     202108       202106                   3     202106     202109          4
TPS             16        1478     202107              0              0                 0     202106                                      202107     202107          1
TPS             16        1570     202108              1              0                 0     202107                                      202108     202108          1
TPS             16        1637     202105              1              0                 0     202104                                      202105     202109          5
TPS             16        1637     202106              1              0                 0     202105       202105                   1     202105     202109          5
TPS             16        1637     202107              1              0                 0     202106       202106                   1     202105     202109          5
TPS             16        1637     202109              1              0                 0     202108       202107                   2     202105     202109          5
TPS             16        1675     202106              1              0                 0     202105                                      202106     202106          1
*/
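One thing worth checking with YYYYMM values such as TXN_MTH: subtracting them directly only gives a month count while both values sit in the same calendar year. Below is a minimal Python sketch of month arithmetic on such values. The function names are illustrative and not part of the original query; it assumes TXN_MTH is always a valid YYYYMM integer.

```python
def months_between(later: int, earlier: int) -> int:
    """Whole calendar months from `earlier` to `later`, both YYYYMM integers."""
    later_year, later_month = divmod(later, 100)
    earlier_year, earlier_month = divmod(earlier, 100)
    return (later_year - earlier_year) * 12 + (later_month - earlier_month)


def month_before(yyyymm: int) -> int:
    """Calendar month preceding `yyyymm` (same idea as the MTH_BEFORE column)."""
    year, month = divmod(yyyymm, 100)
    return (year - 1) * 100 + 12 if month == 1 else yyyymm - 1


print(202201 - 202112)                 # raw subtraction: 89
print(months_between(202201, 202112))  # 1
print(month_before(202101))            # 202012
```

Within 2021 both approaches agree (202111 - 202108 and months_between(202111, 202108) are both 3), which is why the sample data does not expose the difference.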
Now we have more than we need to get your expected result. Using the CTE's result set as below should hopefully answer your question.
Select
    BU_CODE, BU_KEY, CONTACT_KEY,
    TXN_MTH,
    FRAGRANCE_FLAG, COSMETICS_FLAG, PERSONALCARE_FLAG,
    MTH_PREV_TXN,
    CASE
        WHEN MTH_PREV_TXN Is Null THEN 'NEW'
        WHEN Nvl(MTHS_SINCE_PREV_TXN, 0) > 12 THEN 'REACTIVATED'
        ELSE 'RETURNING'
    END "CUST_TYPE"
From
    c_months
With your sample data (13 rows from the question) the result is:
BU_CODE  BU_KEY  CONTACT_KEY  TXN_MTH  FRAGRANCE_FLAG  COSMETICS_FLAG  PERSONALCARE_FLAG  MTH_PREV_TXN  CUST_TYPE
-------  ------  -----------  -------  --------------  --------------  -----------------  ------------  ---------
TPS      16      1196         202108   1               0               0                                NEW
TPS      16      1196         202111   1               0               0                  202108        RETURNING
TPS      16      1196         202112   1               0               0                  202111        RETURNING
TPS      16      1259         202109   1               0               0                                NEW
TPS      16      1443         202106   1               0               0                                NEW
TPS      16      1443         202109   1               0               0                  202106        RETURNING
TPS      16      1478         202107   0               0               0                                NEW
TPS      16      1570         202108   1               0               0                                NEW
TPS      16      1637         202105   1               0               0                                NEW
TPS      16      1637         202106   1               0               0                  202105        RETURNING
TPS      16      1637         202107   1               0               0                  202106        RETURNING
TPS      16      1637         202109   1               0               0                  202107        RETURNING
TPS      16      1675         202106   1               0               0                                NEW
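To see why the partition matters, here is a small Python sketch that mimics LAG() with and without PARTITION BY CONTACT_KEY and then applies the same NEW / RETURNING / REACTIVATED rule. It is only an illustration under assumptions: the helper names are made up, the rows are a subset of the sample data, and the month difference is computed from the year and month parts of YYYYMM.

```python
from collections import defaultdict

# (CONTACT_KEY, TXN_MTH) pairs taken from the sample rows
rows = [(1196, 202108), (1196, 202111), (1196, 202112),
        (1259, 202109), (1443, 202106), (1443, 202109)]


def months_between(later, earlier):
    return (later // 100 - earlier // 100) * 12 + (later % 100 - earlier % 100)


# LAG without PARTITION BY: simply the previous row in the global ordering
ordered = sorted(rows)
unpartitioned = [(ck, mth, ordered[i - 1][1] if i else None)
                 for i, (ck, mth) in enumerate(ordered)]


def classify(rows):
    by_customer = defaultdict(list)
    for contact_key, txn_mth in rows:
        by_customer[contact_key].append(txn_mth)
    result = []
    for contact_key in sorted(by_customer):
        previous = None  # reset for every customer: PARTITION BY CONTACT_KEY
        for txn_mth in sorted(by_customer[contact_key]):  # ORDER BY TXN_MTH
            if previous is None:
                cust_type = "NEW"
            elif months_between(txn_mth, previous) > 12:
                cust_type = "REACTIVATED"
            else:
                cust_type = "RETURNING"
            result.append((contact_key, txn_mth, previous, cust_type))
            previous = txn_mth
    return result


print(unpartitioned[4])   # (1443, 202106, 202109): a "future" previous month
print(classify(rows)[4])  # (1443, 202106, None, 'NEW')
```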

Related

Finding most recent startdate, and endDate from consecutive dates

I have a table like below:
user_id  store_id  stock  date
116      2         0      2021-10-18
116      2         0      2021-10-19
116      2         0      2021-10-20
116      2         0      2021-08-16
116      2         0      2021-08-15
116      2         0      2021-07-04
116      2         0      2021-07-03
389      2         0      2021-07-02
389      2         0      2021-07-01
389      2         0      2021-10-27
52       6         0      2021-10-28
52       6         0      2021-10-29
52       6         0      2021-10-30
116      38        0      2021-05-02
116      38        0      2021-05-03
116      38        0      2021-05-04
116      38        0      2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_Id  store_id  startDate   endDate
116      2         2021-10-18  2021-10-20
116      38        2021-05-02  2021-05-04
389      2         2021-07-01  2021-07-02
52       6         2021-10-28  2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?

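Here is how you can do it. Subtracting a row number from the date gives the same value (grp) for every row of a consecutive run, so ranking by grp in descending order and keeping rank 1 selects the most recent run:
select user_id, store_id, min(date) startdate, max(date) enddate
from (
    select *, rank() over (partition by user_id, store_id order by grp desc) rn
    from (
        select *, date - row_number() over (partition by user_id, store_id order by date) * interval '1 day' grp
        from tablename
    ) t
) t
where rn = 1
group by user_id, store_id, grp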
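The same "date minus row number" idea can be sketched in Python for a single user/store pair. This is only an illustration: latest_streak is a made-up helper, and it assumes at most one row per day.

```python
from datetime import date, timedelta
from itertools import groupby


def latest_streak(dates):
    """Return (start, end) of the most recent run of consecutive days."""
    ordered = sorted(set(dates))
    # date minus its position is constant inside a run of consecutive days
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    runs = [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda p: p[0])]
    last = runs[-1]  # runs come out in ascending date order
    return last[0], last[-1]


# out-of-stock days for user 116, store 2
out_of_stock = [date(2021, 10, 18), date(2021, 10, 19), date(2021, 10, 20),
                date(2021, 8, 16), date(2021, 8, 15),
                date(2021, 7, 4), date(2021, 7, 3)]
print(latest_streak(out_of_stock))
```

For these dates the function returns 2021-10-18 as the start and 2021-10-20 as the end, matching the expected row for user 116, store 2.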

finding the max number of consecutive days of absence for any person who has more than a specified number of days

I am working on an absence report and am having a hard time figuring out how to obtain the number of consecutive days an employee has off anytime they have 6 or more absences consecutively. I am able to get the max number for the employee but if an employee has more than one instance of this occurring within the given start and end date parameters, this max number of absences will only give me the highest number of absences. The following data set shows what I mean:
ClientID EmplID Date AbsentFlag NumOfDays RowNum
10 2587 2019-07-14 Y 1 4
10 2587 2019-07-15 Y 2 5
10 2587 2019-07-16 Y 3 6
10 2587 2019-07-19 Y 4 7
10 2587 2019-07-20 Y 5 8
10 2587 2019-07-21 Y 6 9
10 2587 2019-07-22 Y 7 10
10 2587 2019-07-23 Y 8 11
10 2587 2019-07-26 Y 9 12
10 2587 2019-07-27 Y 10 13
10 2587 2019-07-28 Y 11 14
10 2587 2019-07-29 Y 12 15
10 2587 2019-07-30 Y 13 16
10 2587 2019-08-03 Y 1 17
10 2587 2019-08-04 Y 2 18
10 2587 2019-08-05 Y 3 19
10 2587 2019-08-06 Y 4 20
10 2587 2019-08-09 Y 5 21
10 2587 2019-08-10 Y 6 22
10 2587 2019-08-11 Y 7 23
10 2587 2019-08-12 Y 8 24
10 2587 2019-08-13 Y 9 25
This employee, for example, has 13 consecutive days of absence (more than 6), as well as 9 consecutive days of absence (also more than 6). In my report, I need to include the first 6 dates of absence, as well as the total number of absences for each consecutive streak. So for the results, I would expect this:
ClientID EmplID Days Date1 Date2 Date3 Date4 Date5 Date6
10 2587 13 2019-07-14 2019-07-15 2019-07-16 2019-07-19 2019-07-20 2019-07-21
10 2587 9 2019-08-03 2019-08-04 2019-08-05 2019-08-06 2019-08-09 2019-08-10
Currently, I am getting this:
ClientID EmplID Days Date1 Date2 Date3 Date4 Date5 Date6
10 2587 13 2019-07-14 2019-07-15 2019-07-16 2019-07-19 2019-07-20 2019-07-21
10 2587 13 2019-08-03 2019-08-04 2019-08-05 2019-08-06 2019-08-09 2019-08-10
Let me know if I can provide anything else to help solve this issue. Thanks.

You can identify each streak of absences using the difference between a sequence number and numofdays, which stays constant within a streak. Then aggregate and filter:
select clientid, emplid, max(numofdays) as days,
       max(case when numofdays = 1 then date end) as day_1,
       max(case when numofdays = 2 then date end) as day_2,
       max(case when numofdays = 3 then date end) as day_3,
       max(case when numofdays = 4 then date end) as day_4,
       max(case when numofdays = 5 then date end) as day_5,
       max(case when numofdays = 6 then date end) as day_6
from (select t.*,
             row_number() over (partition by clientid, emplid order by date) as seqnum
      from t
     ) t
group by clientid, emplid, (seqnum - numofdays)
having max(numofdays) >= 6
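As a rough check of the grouping key, the Python sketch below applies seqnum - numofdays to the dates and NumOfDays values from the question. The streaks function is a made-up helper, and the sketch assumes NumOfDays restarts at 1 for each streak, as it does in the sample.

```python
from collections import defaultdict

dates = ["2019-07-14", "2019-07-15", "2019-07-16", "2019-07-19", "2019-07-20",
         "2019-07-21", "2019-07-22", "2019-07-23", "2019-07-26", "2019-07-27",
         "2019-07-28", "2019-07-29", "2019-07-30",
         "2019-08-03", "2019-08-04", "2019-08-05", "2019-08-06", "2019-08-09",
         "2019-08-10", "2019-08-11", "2019-08-12", "2019-08-13"]
num_of_days = list(range(1, 14)) + list(range(1, 10))  # 1..13, then 1..9


def streaks(dates, num_of_days, min_days=6):
    groups = defaultdict(list)
    for seqnum, (day, n) in enumerate(zip(dates, num_of_days), start=1):
        groups[seqnum - n].append((n, day))  # key is constant within a streak
    report = []
    for members in groups.values():
        length = max(n for n, _ in members)
        if length >= min_days:  # same filter as HAVING max(numofdays) >= 6
            first_six = [day for n, day in sorted(members) if n <= 6]
            report.append((length, first_six))
    return report


for length, first_six in streaks(dates, num_of_days):
    print(length, first_six)
```

This yields one entry per streak (13 days starting 2019-07-14 and 9 days starting 2019-08-03), each with its own length rather than the overall maximum.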

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.

As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
       case when running_amount = 0 then
           lead(date_) over (partition by id order by date_) - date_
       end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
    select id, date_, amount, running_amount,
           case when running_amount = 0 then
               lead(date_) over (partition by id order by date_) - date_
           end as diff
    from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
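For comparison, the same lead-then-median logic can be sketched in Python for ID 10. This is an illustration only: median_days_at_zero is a made-up helper, and the rows are the dates and running amounts from the question.

```python
from datetime import datetime
from statistics import median

FMT = "%d/%m/%Y %H:%M"

# (DATE, RUNNING AMOUNT) rows for ID 10
rows_id_10 = [("27/06/2019 14:30", 100), ("29/06/2019 15:26", 0),
              ("03/07/2019 01:56", 83), ("04/07/2019 17:53", 98),
              ("05/07/2019 15:09", 0), ("05/07/2019 15:53", 98.98),
              ("05/07/2019 19:54", 0), ("07/07/2019 01:36", 90.97),
              ("07/07/2019 13:02", 0), ("07/07/2019 16:32", 39.88),
              ("08/07/2019 13:41", 89.88)]


def median_days_at_zero(rows):
    parsed = sorted((datetime.strptime(d, FMT), balance) for d, balance in rows)
    gaps = []
    # pair every row with the one after it, like lead(date_) over (order by date_)
    for (when, balance), (next_when, _) in zip(parsed, parsed[1:]):
        if balance == 0:
            gaps.append((next_when - when).total_seconds() / 86400)  # in days
    return median(gaps)


value = median_days_at_zero(rows_id_10)
print(value, round(value))
```

The unrounded value is about 0.6917 days, which rounds to 1, in line with the query output for ID 10.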

How to Calculate Current Customer Summary by Month and Display Data in Power Bi

I have a list of customers, each with a joining date and a leaving date.
For each month of each year I need to know how many joined, how many left, and the net total.
id Join left
1 01/01/2017 08/03/2017
2 02/01/2017 25/03/2017
3 03/01/2017 06/03/2017
4 04/01/2017
5 30/01/2017
6 31/01/2017 05/05/2017
7 01/02/2017
8 02/02/2017 22/03/2017
9 04/02/2017 29/04/2017
10 05/02/2017 09/04/2017
11 06/02/2017 08/04/2017
12 07/02/2017 13/03/2017
13 04/03/2017 21/05/2017
14 05/03/2017
15 06/03/2017
16 07/03/2017
17 09/03/2017
18 10/03/2017 03/06/2017
19 11/03/2017 14/04/2017
20 12/03/2017 31/05/2017
21 07/04/2017 06/07/2017
22 08/04/2017 16/06/2017
23 09/04/2017 10/05/2017
24 04/03/2018 26/05/2018
25 24/03/2018 01/06/2018
26 25/03/2018 15/06/2018
27 26/03/2018 05/05/2018
28 27/03/2018 02/07/2018
29 04/04/2018
30 05/04/2018 13/06/2018
This is the desired result:
total left join month year
6 0 6 1 2017
6 0 6 2
3 5 8 3
-1 4 3 4
-4 4 0 5
-2 2 0 6
-1 1 0 7
3 2 5 3 2018
2 0 2 4
0 0 0 5
-3 3 0 6
-1 1 0 7

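You can try this if your database is SQL Server; the square-bracket identifiers and CONVERT(DATETIME, ..., 103) are SQL Server syntax. For other databases you can reuse the logic (MySQL, for example, would need STR_TO_DATE and backtick-quoted names instead).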
SELECT
    SUM(CASE WHEN Type = 'J' THEN C ELSE 0 END) - SUM(CASE WHEN Type = 'L' THEN C ELSE 0 END) AS [Total],
    SUM(CASE WHEN Type = 'L' THEN C ELSE 0 END) AS [left],
    SUM(CASE WHEN Type = 'J' THEN C ELSE 0 END) AS [join],
    M Month,
    Y Year
FROM
(
    SELECT 'J' AS [Type],
        MONTH(CONVERT(DATETIME, [Join], 103)) M,
        YEAR(CONVERT(DATETIME, [Join], 103)) Y,
        COUNT(ID) C
    FROM customers
    GROUP BY MONTH(CONVERT(DATETIME, [Join], 103)), YEAR(CONVERT(DATETIME, [Join], 103))
    UNION ALL
    SELECT 'L',
        MONTH(CONVERT(DATETIME, [left], 103)) M,
        YEAR(CONVERT(DATETIME, [left], 103)) Y,
        COUNT(ID) C
    FROM customers
    GROUP BY MONTH(CONVERT(DATETIME, [left], 103)), YEAR(CONVERT(DATETIME, [left], 103))
) A
WHERE Y IS NOT NULL
GROUP BY M, Y
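The counting logic is independent of the database. Here is a small Python sketch of the same idea: count joins and leaves per (year, month), then subtract. The monthly_summary helper is made up for illustration, the five customers are a subset of the question's list, and None stands for a missing leaving date.

```python
from collections import Counter
from datetime import datetime

# (join, left) pairs in dd/mm/yyyy; None means the customer has not left
customers = [("01/01/2017", "08/03/2017"), ("02/01/2017", "25/03/2017"),
             ("04/01/2017", None), ("01/02/2017", None),
             ("04/03/2017", "21/05/2017")]


def monthly_summary(customers):
    joined, left = Counter(), Counter()
    for join_date, left_date in customers:
        for value, counter in ((join_date, joined), (left_date, left)):
            if value:  # skip missing leaving dates
                d = datetime.strptime(value, "%d/%m/%Y")
                counter[(d.year, d.month)] += 1
    months = sorted(set(joined) | set(left))
    # one tuple per month: (total, left, join, month, year)
    return [(joined[m] - left[m], left[m], joined[m], m[1], m[0]) for m in months]


for row in monthly_summary(customers):
    print(row)
```

Note that, like the SQL above, this only lists months in which at least one customer joined or left.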

find nonbreaking period with condition

There are quotas for hotels per day in a table. How can I get, for each date, the number of consecutive days from that date on which the hotel is available?
q_id q_hotel q_date q_value
1 1 2013-02-01 1
2 1 2013-02-02 1
3 1 2013-02-03 1
4 1 2013-02-04 0
5 1 2013-02-05 2
6 1 2013-02-06 3
7 1 2013-02-07 3
8 1 2013-02-08 2
9 1 2013-02-09 0
10 1 2013-02-10 0
11 1 2013-02-11 1
12 1 2013-02-12 1
Wanted output:
q_hotel q_date days_available
1 2013-02-01 3
1 2013-02-02 2
1 2013-02-03 1
1 2013-02-04 0
1 2013-02-05 4
1 2013-02-06 3
1 2013-02-07 2
1 2013-02-08 1
1 2013-02-09 0
1 2013-02-10 0
1 2013-02-11 2
1 2013-02-12 1
For now I can get the number of days only if a zero-quota day exists after the date in question: I find the closest unavailable day and calculate the date difference.
http://sqlfiddle.com/#!12/1a64c/14
select q_hotel
,q_date
,(select extract(day from (min(B.q_date)-A.q_date)) from Table1 B where B.q_date>A.q_date
and B.q_value=0 and A.q_value<>0)
from Table1 A
But this fails when there is no closing zero-quota date after the date in question.

Here is a solution:
select
    a.q_date
  , a.q_hotel
  , case
      when a.q_value = 0 then 0
      else (
        -- first day b on or after a whose following day is not available
        select extract(day from min(b.q_date) - a.q_date + interval '1 day')
        from table1 b
        where b.q_date >= a.q_date
          and b.q_hotel = a.q_hotel
          and not exists (
            select 1
            from table1 c
            where c.q_date = b.q_date + interval '1 day'
              and c.q_hotel = b.q_hotel
              and c.q_value <> 0
          )
      )
    end as days_available
from table1 a
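To double-check the expected numbers, the same result can be computed by walking the days backwards and counting the current run of non-zero quotas. This is only a Python sketch of the logic, not a replacement for the query: days_available is a made-up helper, and it assumes one row per day with no missing dates for a single hotel.

```python
def days_available(quota):
    """quota: daily q_value for one hotel on consecutive days, oldest first."""
    result = [0] * len(quota)
    run = 0
    for i in range(len(quota) - 1, -1, -1):  # walk backwards from the last day
        run = run + 1 if quota[i] != 0 else 0
        result[i] = run
    return result


print(days_available([1, 1, 1, 0, 2, 3, 3, 2, 0, 0, 1, 1]))
# [3, 2, 1, 0, 4, 3, 2, 1, 0, 0, 2, 1]
```

The output matches the days_available column of the wanted output, including the last two days, which have no closing zero-quota date.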
