I have the table below (labelled Original Table) with the following columns:
BU_Code (store code), contact_key (customer ID), Bu_key (store number), TXN_Mth (month of transaction in 2021), Fragrance/Cosmetics/Personal flag (flag for type of product bought).
[Image: Original Table]
I am trying to create a new table based on this that lists the previous month the customer shopped in (Pre_txn_mth) and uses a CASE statement to classify each customer as new (no previous transaction before 2021), returning (shopped within the last 12 months) or reactivated (last shopped more than 12 months ago).
However, when I create the table it lists future transactions as the previous transaction. In the new table below, Contact_key 1196 is pulled correctly but 1443 is not.
[Image: Error in Table example]
This is the code; I have tried different variations of it but get the same error:
CREATE TABLE TPS_TABLE_B AS
(
SELECT
B.*
, LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH) PRE_TXN_MTH
--, TXN_MTH - 100
, CASE
WHEN LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH) IS NULL THEN 'NEW'
WHEN TXN_MTH - 100 < (LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH)) THEN 'RETURNING'
WHEN TXN_MTH - 100 >= (LAG(TXN_MTH) OVER (ORDER BY CONTACT_KEY, TXN_MTH)) THEN 'REATIVATED'--REACTIVATED IS NO TRANSACTION IN PAST 12 MONTHS
ELSE 'OTHER'
END AS CUST_TYPE
FROM
(
SELECT
CONTACT_KEY
, BU_CODE
, BU_KEY
, TXN_MTH
, FRAGRANCE_FLAG
, COSMETICS_FLAG
, PERSONALCARE_FLAG
FROM TPS_TABLE_A
) B
)
;
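The problem is that LAG() has no PARTITION BY, so the rows of all customers form one sequence ordered by CONTACT_KEY, TXN_MTH. The first row of customer 1443 therefore picks up the last month of the customer before it (1259, 202109), which looks like a future transaction. Partition the window by CONTACT_KEY. You can then create a CTE (or use this code as a subquery) to derive the values used in the main query.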
WITH
c_months AS
(
Select
BU_CODE, BU_KEY, CONTACT_KEY,
TXN_MTH, FRAGRANCE_FLAG, COSMETICS_FLAG, PERSONALCARE_FLAG,
-- calendar month before TXN_MTH (YYYY00 - 88 gives December of the previous year)
CASE WHEN SubStr(TXN_MTH - 1, -2) = '00' THEN TXN_MTH - 1 - 88 ELSE TXN_MTH - 1 END "MTH_BEFORE",
-- previous transaction month of the same customer only
LAG(TXN_MTH) OVER(Partition By CONTACT_KEY Order By TXN_MTH) "MTH_PREV_TXN",
-- MONTHS_BETWEEN counts calendar months correctly across year boundaries
MONTHS_BETWEEN(TO_DATE(TO_CHAR(TXN_MTH), 'YYYYMM'),
               TO_DATE(TO_CHAR(LAG(TXN_MTH) OVER(Partition By CONTACT_KEY Order By TXN_MTH)), 'YYYYMM')) "MTHS_SINCE_PREV_TXN",
Min(TXN_MTH) OVER(Partition By CONTACT_KEY) "FIRST_MTH",
Max(TXN_MTH) OVER(Partition By CONTACT_KEY) "LAST_MTH",
MONTHS_BETWEEN(TO_DATE(TO_CHAR(Max(TXN_MTH) OVER(Partition By CONTACT_KEY)), 'YYYYMM'),
               TO_DATE(TO_CHAR(Min(TXN_MTH) OVER(Partition By CONTACT_KEY)), 'YYYYMM')) + 1 "TOTAL_MTHS"
From
TPS_TABLE_A
)
/* R e s u l t :
BU_CODE BU_KEY CONTACT_KEY TXN_MTH FRAGRANCE_FLAG COSMETICS_FLAG PERSONALCARE_FLAG MTH_BEFORE MTH_PREV_TXN MTHS_SINCE_PREV_TXN FIRST_MTH LAST_MTH TOTAL_MTHS
------- ---------- ----------- ---------- -------------- -------------- ----------------- ---------- ------------ ------------------- ---------- ---------- ----------
TPS 16 1196 202108 1 0 0 202107 202108 202112 5
TPS 16 1196 202111 1 0 0 202110 202108 3 202108 202112 5
TPS 16 1196 202112 1 0 0 202111 202111 1 202108 202112 5
TPS 16 1259 202109 1 0 0 202108 202109 202109 1
TPS 16 1443 202106 1 0 0 202105 202106 202109 4
TPS 16 1443 202109 1 0 0 202108 202106 3 202106 202109 4
TPS 16 1478 202107 0 0 0 202106 202107 202107 1
TPS 16 1570 202108 1 0 0 202107 202108 202108 1
TPS 16 1637 202105 1 0 0 202104 202105 202109 5
TPS 16 1637 202106 1 0 0 202105 202105 1 202105 202109 5
TPS 16 1637 202107 1 0 0 202106 202106 1 202105 202109 5
TPS 16 1637 202109 1 0 0 202108 202107 2 202105 202109 5
TPS 16 1675 202106 1 0 0 202105 202106 202106 1
*/
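TXN_MTH is stored as a YYYYMM number, so subtracting two values only counts months when both fall in the same year: across a year boundary 202201 - 202112 gives 89 rather than 1, which would wrongly push a customer past the 12-month threshold. The sample data is all from 2021, so the result above is unaffected. A minimal Python sketch of the safer arithmetic (convert to a running month count first); the helper names are hypothetical, and in Oracle MONTHS_BETWEEN over TO_DATE(TO_CHAR(TXN_MTH), 'YYYYMM') serves the same purpose.

```python
def month_index(yyyymm: int) -> int:
    """Convert a YYYYMM integer (e.g. 202108) to a running month count."""
    year, month = divmod(yyyymm, 100)
    return year * 12 + (month - 1)


def from_month_index(idx: int) -> int:
    """Convert a running month count back to a YYYYMM integer."""
    year, month0 = divmod(idx, 12)
    return year * 100 + month0 + 1


def months_between(later: int, earlier: int) -> int:
    """Number of calendar months from `earlier` to `later` (both YYYYMM)."""
    return month_index(later) - month_index(earlier)


def month_before(yyyymm: int) -> int:
    """Previous calendar month; same result as the '- 1 - 88' trick in MTH_BEFORE."""
    return from_month_index(month_index(yyyymm) - 1)


# Plain subtraction only works inside a single year:
print(202201 - 202112)                  # 89
print(months_between(202201, 202112))   # 1
print(month_before(202101))             # 202012
```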
The CTE now provides everything needed for your expected result. Using its result set as below should answer your question:
Select
BU_CODE, BU_KEY, CONTACT_KEY,
TXN_MTH,
FRAGRANCE_FLAG, COSMETICS_FLAG, PERSONALCARE_FLAG,
MTH_PREV_TXN,
CASE
WHEN MTH_PREV_TXN Is Null THEN 'NEW'
WHEN Nvl(MTHS_SINCE_PREV_TXN, 0) > 12 THEN 'REACTIVATED'
ELSE 'RETURNING'
END "CUST_TYPE"
From
c_months
With your sample data (the 13 rows from the question) it would result in:
BU_CODE BU_KEY CONTACT_KEY TXN_MTH FRAGRANCE_FLAG COSMETICS_FLAG PERSONALCARE_FLAG MTH_PREV_TXN CUST_TYPE
------- ------ ----------- ------- -------------- -------------- ----------------- ------------ ---------
TPS     16     1196        202108  1              0              0                              NEW
TPS     16     1196        202111  1              0              0                 202108       RETURNING
TPS     16     1196        202112  1              0              0                 202111       RETURNING
TPS     16     1259        202109  1              0              0                              NEW
TPS     16     1443        202106  1              0              0                              NEW
TPS     16     1443        202109  1              0              0                 202106       RETURNING
TPS     16     1478        202107  0              0              0                              NEW
TPS     16     1570        202108  1              0              0                              NEW
TPS     16     1637        202105  1              0              0                              NEW
TPS     16     1637        202106  1              0              0                 202105       RETURNING
TPS     16     1637        202107  1              0              0                 202106       RETURNING
TPS     16     1637        202109  1              0              0                 202107       RETURNING
TPS     16     1675        202106  1              0              0                              NEW
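To see the difference PARTITION BY makes, here is a small runnable sketch using Python's built-in sqlite3 module (SQLite 3.25 or later supports LAG) as a stand-in for Oracle, so the syntax is illustrative rather than a drop-in replacement. The table name tbl and the flag-less columns are simplifications, and customer 9999 is a hypothetical extra customer added only to exercise the REACTIVATED branch.

```python
import sqlite3

# Sample rows from the question (flags omitted), plus one hypothetical
# customer 9999 whose 17-month gap exercises the REACTIVATED branch.
rows = [
    (1196, 202108), (1196, 202111), (1196, 202112), (1259, 202109),
    (1443, 202106), (1443, 202109), (1478, 202107), (1570, 202108),
    (1637, 202105), (1637, 202106), (1637, 202107), (1637, 202109),
    (1675, 202106), (9999, 202001), (9999, 202106),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (contact_key INTEGER, txn_mth INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# No PARTITION BY: LAG() runs over the whole table in CONTACT_KEY order.
no_partition = con.execute("""
    SELECT contact_key, txn_mth,
           LAG(txn_mth) OVER (ORDER BY contact_key, txn_mth) AS prev
    FROM tbl ORDER BY contact_key, txn_mth
""").fetchall()

# PARTITION BY contact_key: LAG() restarts for every customer.
# YYYYMM -> year * 12 + month before subtracting, so gaps are real months.
classified = con.execute("""
    WITH m AS (
        SELECT contact_key, txn_mth,
               LAG(txn_mth) OVER (PARTITION BY contact_key
                                  ORDER BY txn_mth) AS prev
        FROM tbl
    )
    SELECT contact_key, txn_mth, prev,
           CASE
               WHEN prev IS NULL THEN 'NEW'
               WHEN (txn_mth / 100) * 12 + txn_mth % 100
                    - ((prev / 100) * 12 + prev % 100) > 12 THEN 'REACTIVATED'
               ELSE 'RETURNING'
           END AS cust_type
    FROM m ORDER BY contact_key, txn_mth
""").fetchall()

print(no_partition[4])  # 1443's first row picks up 1259's month
for row in classified:
    print(row)
```

Without the partition, the first row of 1443 reports 202109 as its previous month; with it, that row gets NULL and is classified as NEW.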
I have a column called kyc_updated_at (type datetime, e.g. 2022-06-28 18:45:00.000 +0000). I would like to split this column into 15-minute intervals in my final output, something like:
KYC_Updated_at  Average_decision_time  cumulative_average_decision_time
8-815           0.15                   0.15
815-830         0.38                   0.265
830-845         0.45                   0.3266666667
845-9           0.63                   0.4025
9-915           0.7                    0.462
So truncation is what you want.
This multi-column SQL shows the answer being built up step by step:
select column1 as original
,hour(column1) || minute(column1) as pre_parts
,hour(column1) || trunc(minute(column1)/15)*15 as post_parts
from values
('2022-06-28 18:46:00'::timestamp),
('2022-06-28 18:45:00'::timestamp),
('2022-06-28 18:44:00'::timestamp)
ORIGINAL                 PRE_PARTS  POST_PARTS
2022-06-28 18:46:00.000  1846       1845
2022-06-28 18:45:00.000  1845       1845
2022-06-28 18:44:00.000  1844       1830
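Gluing hour and minute together without zero-padding can give two different intervals the same label: 01:30 becomes 130, and so does 13:05 (13 followed by 0). A small Python sketch, not Snowflake code, of a zero-padded HH:MM-HH:MM label closer to the 8-815 style in the question; the function names are made up for illustration, and it assumes the interval length divides 60 evenly.

```python
from datetime import datetime, timedelta


def interval_label(ts: datetime, minutes: int = 15) -> str:
    """Zero-padded 'HH:MM-HH:MM' label for the interval containing ts.

    Assumes `minutes` divides 60 evenly (15 does).
    """
    floor = ts.replace(minute=ts.minute - ts.minute % minutes,
                       second=0, microsecond=0)
    end = floor + timedelta(minutes=minutes)
    return f"{floor:%H:%M}-{end:%H:%M}"


def concat_label(ts: datetime) -> str:
    """Mimics hour || trunc(minute/15)*15, i.e. digits glued without padding."""
    return f"{ts.hour}{ts.minute // 15 * 15}"


for text in ("2022-06-28 18:46:00", "2022-06-28 01:30:00", "2022-06-28 13:05:00"):
    ts = datetime.fromisoformat(text)
    print(text, interval_label(ts), concat_label(ts))
```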
Truncating the full timestamp:
select column1 as original
,hour(column1) || trunc(minute(column1)/15)*15 as pre_parts
,to_timestamp(trunc(date_part(epoch_second, column1)/(15*60))*(15*60)) as full_trunc
from values
('2022-06-28 18:46:00'::timestamp),
('2022-06-28 18:45:00'::timestamp),
('2022-06-28 18:44:00'::timestamp)
ORIGINAL                 PRE_PARTS  FULL_TRUNC
2022-06-28 18:46:00.000  1845       2022-06-28 18:45:00.000
2022-06-28 18:45:00.000  1845       2022-06-28 18:45:00.000
2022-06-28 18:44:00.000  1830       2022-06-28 18:30:00.000
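The epoch trick and the question's cumulative average column can both be checked outside Snowflake. This rough Python sketch floors a timestamp to its 15-minute slice the same way (seconds since the epoch, truncated to a multiple of 900), assuming UTC timestamps like the +0000 example. It also computes a running mean of the per-interval averages, which is how the expected third column is built (0.265 = (0.15 + 0.38) / 2). In SQL such a running mean is typically written as AVG(...) OVER (ORDER BY <interval> ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The names below are hypothetical.

```python
from datetime import datetime, timezone
from itertools import accumulate

SLICE = 15 * 60  # interval length in seconds


def floor_to_slice(ts: datetime) -> datetime:
    """Same idea as to_timestamp(trunc(epoch_second / 900) * 900), in UTC."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % SLICE, tz=timezone.utc)


def cumulative_average(values):
    """Running mean of the per-interval values, in interval order."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


ts = datetime(2022, 6, 28, 18, 44, tzinfo=timezone.utc)
print(floor_to_slice(ts))  # 2022-06-28 18:30:00+00:00
print(cumulative_average([0.15, 0.38, 0.45, 0.63, 0.7]))
```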
I have a SAS table like the one below:
ID      Grp  Month
        A    202201
        A    202203
1234    A    202204
        B    202201
        B    202203
AB1234  B    202204
        C    202201
        C    202203
3333    C    202204
3333    C    202205
4444    C    202206
        T    202204
        T    202205
        T    202206
        D    202201
        D    202203
A555    D    202204
        D    202205
6666    D    202206
I require the following output SAS dataset:
ID      Grp  Month
1234    A    202201
1234    A    202203
1234    A    202204
AB1234  B    202201
AB1234  B    202203
AB1234  B    202204
3333    C    202201
3333    C    202203
3333    C    202204
3333    C    202205
4444    C    202206
        T    202204
        T    202205
        T    202206
A555    D    202201
A555    D    202203
A555    D    202204
6666    D    202205
6666    D    202206
Can someone please help?
Thanks in advance
If the data were sorted by descending MONTH values it would be a lot easier. It is much easier to remember a value you have already seen than to predict what value you might see in the future.
First let's convert your listing into an actual dataset we can use to work with.
data have ;
input ID $ Grp $ Month ;
cards;
. A 202201
. A 202203
1234 A 202204
. B 202201
. B 202203
AB1234 B 202204
. C 202201
. C 202203
3333 C 202204
3333 C 202205
4444 C 202206
. T 202204
. T 202205
. T 202206
. D 202201
. D 202203
A555 D 202204
. D 202205
6666 D 202206
;
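Now sort it by GRP and descending MONTH, and you can use the UPDATE statement to do a last observation carried forward. UPDATE does not overwrite a value with a missing value from the transaction dataset, so ID keeps the last non-missing value seen within each GRP, and the OUTPUT statement writes every observation.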
proc sort data=have;
by grp descending month;
run;
data want;
update have(obs=0) have;
by grp;
output;
run;
If you want, you can re-sort to get ascending MONTH values.
proc sort data=want;
by grp month;
run;
Results:
Obs ID Grp Month
1 1234 A 202201
2 1234 A 202203
3 1234 A 202204
4 AB1234 B 202201
5 AB1234 B 202203
6 AB1234 B 202204
7 3333 C 202201
8 3333 C 202203
9 3333 C 202204
10 3333 C 202205
11 4444 C 202206
12 A555 D 202201
13 A555 D 202203
14 A555 D 202204
15 6666 D 202205
16 6666 D 202206
17 T 202204
18 T 202205
19 T 202206
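The same last-observation-carried-forward logic, outside SAS: a Python sketch that sorts by Grp and descending Month, carries the most recently seen non-missing ID within each group, then re-sorts. None stands for a missing ID and carry_back is a made-up name; it only illustrates what the UPDATE step does.

```python
from itertools import groupby

# (ID, Grp, Month) rows as listed in the question; None is a missing ID.
have = [
    (None, "A", 202201), (None, "A", 202203), ("1234", "A", 202204),
    (None, "B", 202201), (None, "B", 202203), ("AB1234", "B", 202204),
    (None, "C", 202201), (None, "C", 202203), ("3333", "C", 202204),
    ("3333", "C", 202205), ("4444", "C", 202206),
    (None, "T", 202204), (None, "T", 202205), (None, "T", 202206),
    (None, "D", 202201), (None, "D", 202203), ("A555", "D", 202204),
    (None, "D", 202205), ("6666", "D", 202206),
]


def carry_back(rows):
    """Fill a missing ID with the ID of the next later month in the same Grp."""
    filled = []
    # Sort by Grp, then by descending Month (like PROC SORT ... DESCENDING).
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        carried = None  # start each Grp with no ID, as UPDATE does per BY group
        for id_, grp, month in group:
            if id_ is not None:
                carried = id_  # a non-missing ID replaces the carried one
            filled.append((carried, grp, month))
    # Re-sort to ascending Month within Grp.
    return sorted(filled, key=lambda r: (r[1], r[2]))


want = carry_back(have)
for row in want:
    print(row)
```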
If you really have to work with the data in the order shown, you could use a double DOW loop: the first loop reads ahead to the next non-missing ID value (or the end of the group), and the second re-reads the same observations, sets ID to that value and writes them out.
data want;
if 0 then set have;
do _n_=1 by 1 until(last.grp or not missing(id));
set have ;
by grp notsorted;
end;
_id = id;
do _n_=1 to _n_;
set have;
id = _id;
output;
end;
drop _id;
run;
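The double DOW loop reads ahead to the next non-missing ID and then writes out the observations it passed. Python has no second SET pointer into the same data, so this rough sketch keeps the read-ahead rows in a list instead of re-reading them. It uses a subset of the question's rows (groups C, T and D) in their original order, and the function name is hypothetical.

```python
# Rows in the original (not re-sorted) order; None is a missing ID.
have = [
    (None, "C", 202201), (None, "C", 202203), ("3333", "C", 202204),
    ("3333", "C", 202205), ("4444", "C", 202206),
    (None, "T", 202204), (None, "T", 202205), (None, "T", 202206),
    (None, "D", 202201), (None, "D", 202203), ("A555", "D", 202204),
    (None, "D", 202205), ("6666", "D", 202206),
]


def fill_from_next(rows):
    """Give each row the next non-missing ID in its consecutive Grp block."""
    out, pending = [], []  # pending: rows whose ID is not known yet
    for i, (id_, grp, month) in enumerate(rows):
        pending.append((grp, month))
        # Like last.grp with BY grp NOTSORTED: the next row starts a new block.
        last_in_grp = i + 1 == len(rows) or rows[i + 1][1] != grp
        if id_ is not None or last_in_grp:
            # End of the read-ahead: flush the buffered rows with this ID
            # (still None if the block ended without any ID).
            out.extend((id_, g, m) for g, m in pending)
            pending = []
    return out


for row in fill_from_next(have):
    print(row)
```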
I need to build a query that calculates the totals of the product_id and Product_Name fields based on Released_Date, revision_date and IS_UPDATED, and displays the output as shown below. I would also appreciate it if the output showed the start and end dates of each week instead of Week-1, Week-2.
The table structure and insert script are below.
create table products (
Released_Date varchar(40)
, product_id varchar(40)
, Product_Name varchar(40)
, revision_date varchar(40)
, IS_UPDATED varchar(2)
);
insert into products values('2018-04-25 00:00:00','Pega','Pega5.0','2018-04-27 00:00:00','N');
insert into products values('2018-05-11 00:00:00','Oracle','orace11g','2018-05-13 00:00:00','Y');
insert into products values('2018-04-04 00:00:00','Oracle',' OracleBPM','2018-04-06 00:00:00','Y');
insert into products values('2018-06-05 00:00:00','Ibm','Cognos','2018-06-08 00:00:00','Y');
insert into products values('2018-05-03 00:00:00','Microsoft','C++','2018-05-05 00:00:00','Y');
insert into products values('2018-05-21 00:00:00','Microsoft',' C#','2018-05-25 00:00:00','Y');
insert into products values('2018-04-10 00:00:00','Salesforce','CPQ','2018-04-13 00:00:00','Y');
insert into products values('2018-03-12 00:00:00','Java',' Struts','2018-03-15 00:00:00','Y');
insert into products values('2018-04-12 00:00:00','Salesforce','Analytics','2018-04-13 00:00:00','Y');
insert into products values('2018-05-09 00:00:00','Microsoft','Asp','2018-05-11 00:00:00','Y');
insert into products values('2018-05-28 00:00:00','Salesforce','Marketing','2018-05-31 00:00:00','N');
insert into products values('2018-04-11 00:00:00','ETL',' Informatica','2018-04-12 00:00:00',' Y');
insert into products values('2018-03-26 00:00:00','Oracle',' orace11g','2018-03-30 00:00:00','Y');
insert into products values('2018-04-19 00:00:00','Oracle',' obiee','2018-04-20 00:00:00','Y');
insert into products values('2018-04-16 00:00:00','Ibm','Datastage','2018-04-17 00:00:00','N');
insert into products values('2018-06-18 00:00:00','Microsoft','C#','2018-06-21 00:00:00','Y');
insert into products values('2018-06-19 00:00:00','ETL',' Informatica','2018-06-24 00:00:00','Y');
insert into products values('2018-06-22 00:00:00','Microsoft','WCF','2018-06-23 00:00:00','Y');
insert into products values('2018-04-19 00:00:00','Hadoop',' Hive','2018-04-20 00:00:00',' Y');
insert into products values('2018-04-16 00:00:00','Testing','Database','2018-04-20 00:00:00','N');
insert into products values('2018-04-24 00:00:00','Ibm','Cognos',' 2018-04-27 00:00:00','Y');
insert into products values('2018-06-07 00:00:00','Microsoft','C#','2018-06-08 00:00:00','Y');
insert into products values('2018-04-02 00:00:00','Java','Struts','2018-04-05 00:00:00','Y');
insert into products values('2018-05-01 00:00:00','Microsoft','C++','2018-05-04 00:00:00','Y');
insert into products values('2018-04-10 00:00:00','ETL',' Datastage','2018-04-14 00:00:00','N');
insert into products values('2018-04-23 00:00:00','Ibm','AI','2018-04-25 00:00:00','Y');
insert into products values('2018-04-03 00:00:00','JAVA','Struts','2018-04-04 00:00:00','N');
insert into products values('2018-04-23 00:00:00','Pega','Pega5.4','2018-04-25 00:00:00','N');
insert into products values('2018-05-28 00:00:00','Java',' Jasperreports','2018-05-30 00:00:00','Y');
insert into products values('2018-05-28 00:00:00','IBM','Watson','2018-05-29 00:00:00','Y');
insert into products values('2018-05-30 00:00:00','Salesforce','Paradot','2018-05-31 00:00:00','Y');
insert into products values('2018-05-10 00:00:00','Oracle',' orace12c','2018-05-11 00:00:00','Y');
insert into products values('2018-06-11 00:00:00','Ibm','Cognos',' 2018-06-13 00:00:00','Y');
insert into products values('2018-06-13 00:00:00','Ibm','Datastage','2018-06-17 00:00:00','Y');
Created_Date product_id Product_Name Released_Date IS_UPDATED
2018-04-25 00:00:00 Pega Pega5.0 2018-04-27 00:00:00 N
2018-05-11 00:00:00 Oracle orace11g 2018-05-13 00:00:00 Y
2018-04-04 00:00:00 Oracle OracleBPM 2018-04-06 00:00:00 Y
2018-06-05 00:00:00 Ibm Cognos 2018-06-08 00:00:00 Y
2018-05-03 00:00:00 Microsoft C++ 2018-05-05 00:00:00 Y
2018-05-21 00:00:00 Microsoft C# 2018-05-25 00:00:00 Y
2018-04-10 00:00:00 Salesforce CPQ 2018-04-13 00:00:00 Y
2018-03-12 00:00:00 Java Struts 2018-03-15 00:00:00 Y
2018-04-12 00:00:00 Salesforce Analytics 2018-04-13 00:00:00 Y
2018-05-09 00:00:00 Microsoft Asp 2018-05-11 00:00:00 Y
2018-05-28 00:00:00 Salesforce Marketing 2018-05-31 00:00:00 N
2018-04-11 00:00:00 ETL Informatica 2018-04-12 00:00:00 Y
2018-03-26 00:00:00 Oracle orace11g 2018-03-30 00:00:00 Y
2018-04-19 00:00:00 Oracle obiee 2018-04-20 00:00:00 Y
2018-04-16 00:00:00 Ibm Datastage 2018-04-17 00:00:00 N
2018-06-18 00:00:00 Microsoft C# 2018-06-21 00:00:00 Y
2018-06-19 00:00:00 ETL Informatica 2018-06-24 00:00:00 Y
2018-06-22 00:00:00 Microsoft WCF 2018-06-23 00:00:00 Y
2018-04-19 00:00:00 Hadoop Hive 2018-04-20 00:00:00 Y
2018-04-16 00:00:00 Testing Database 2018-04-20 00:00:00 N
2018-04-24 00:00:00 Ibm Cognos 2018-04-27 00:00:00 Y
2018-06-07 00:00:00 Microsoft C# 2018-06-08 00:00:00 Y
2018-04-02 00:00:00 Java Struts 2018-04-05 00:00:00 Y
2018-05-01 00:00:00 Microsoft C++ 2018-05-04 00:00:00 Y
2018-04-10 00:00:00 ETL Datastage 2018-04-14 00:00:00 N
2018-04-23 00:00:00 Ibm AI 2018-04-25 00:00:00 Y
2018-04-03 00:00:00 JAVA Struts 2018-04-04 00:00:00 N
2018-04-23 00:00:00 Pega Pega5.4 2018-04-25 00:00:00 N
2018-05-28 00:00:00 Java Jasperreports 2018-05-30 00:00:00 Y
2018-05-28 00:00:00 IBM Watson 2018-05-29 00:00:00 Y
2018-05-30 00:00:00 Salesforce Paradot 2018-05-31 00:00:00 Y
2018-05-10 00:00:00 Oracle orace12c 2018-05-11 00:00:00 Y
2018-06-11 00:00:00 Ibm Cognos 2018-06-13 00:00:00 Y
2018-06-13 00:00:00 Ibm Datastage 2018-06-17 00:00:00 Y
Required output, based on the conditions below:
For Total_productIds: Created_Date > 2018-04-01 00:00:00 and Created_Date < 2018-06-30 00:00:00.
For Total_ProductNames: Created_Date > 2018-04-01 00:00:00 and Released_Date < 2018-06-30 00:00:00.
For Total_IS_Updated: Created_Date > 2018-04-01 00:00:00 and Created_Date < 2018-06-30 00:00:00 and IS_UPDATED = 'Y'.
WEEK NO. Total_productIds Total_ProductNames Total_IS_Updated(if 'Y')
Firstweek(2018-04-01) 0 0 0
Secondweek(2018-04-02 to 2018-04-08) 3 2 2
Thirdweek(2018-04-09 to 2018-04-15) 3 5 4
Fourthweek(2018-04-16 to 2018-04-22) 4 4 2
Fifthweek(2018-04-23 to 2018-04-29) 3 4 2
Firstweek(2018-05-01 to 2018-05-06) 1 2 2
Secondweek(2018-05-07 to 2018-05-13) 2 3 3
Thirdweek(2018-05-14 to 2018-05-20) 0 0 0
Fourthweek(2018-05-21 to 2018-05-27) 1 1 0
Fifthweek(2018-05-28 to 2018-05-31) 3 4 3
Firstweek(2018-06-01 to 2018-06-03) 0 0 0
Secondweek(2018-06-04 to 2018-06-10) 2 2 2
Thirdweek(2018-06-11 to 2018-06-17) 1 2 2
Fourthweek(2018-06-18 to 2018-06-24) 2 3 3
Fifthweek(2018-06-25 to 2018-06-30) 0 0 0
Since you gave fixed interval conditions, I have hard-coded them. This query fetches the data and groups it by week.
To keep it simple, I have changed the WEEK_NO format from Firstweek(2018-04-01) to week 1 of 04/2018.
SET DATEFIRST 1; -- weeks start on Monday, as in the expected output
SELECT
'week ' + CAST(week_of_month AS varchar(2)) + ' of '
+ RIGHT('0' + CAST(mon AS varchar(2)), 2) + '/' + CAST(yr AS varchar(4)) AS WEEK_NO, --- will result week 1 of 04/2018
sum(CASE
WHEN Created_Date > '2018-04-01 00:00:00'
AND Created_Date < '2018-06-30 00:00:00'
THEN 1
ELSE 0
END) AS Total_productIds, sum(CASE
WHEN Created_Date > '2018-04-01 00:00:00'
AND Released_Date < '2018-06-30 00:00:00'
THEN 1
ELSE 0
END) AS Total_ProductNames, sum(CASE
WHEN Created_Date > '2018-04-01 00:00:00'
AND Created_Date < '2018-06-30 00:00:00'
AND IS_UPDATED = 'Y'
THEN 1
ELSE 0
END) AS Total_IS_Updated
FROM (
SELECT p.*
, YEAR(Created_Date) AS yr
, MONTH(Created_Date) AS mon
-- week number of the date minus week number of the 1st of its month, plus 1
, DATEPART(wk, Created_Date)
- DATEPART(wk, DATEADD(day, 1, EOMONTH(Created_Date, -1))) + 1 AS week_of_month
FROM products p
) w
GROUP BY yr, mon, week_of_month
ORDER BY yr, mon, week_of_month;
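The question also asked for each week to be shown as start and end dates instead of a week number. A Python sketch (not T-SQL) of that bucketing, assuming Monday-to-Sunday weeks clipped to the calendar month, which is what the expected output shows (2018-04-01 on its own, then 2018-04-02 to 2018-04-08). As in the query, the first column of the displayed table is treated as Created_Date and the fourth as Released_Date; the strip() calls tolerate the stray leading spaces in the sample inserts, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Hard-coded bounds from the question (exclusive on both ends).
LOW, HIGH = datetime(2018, 4, 1), datetime(2018, 6, 30)


def week_bounds(d: date):
    """Monday-to-Sunday week containing d, clipped to d's calendar month."""
    first = d.replace(day=1)
    last = (first + timedelta(days=32)).replace(day=1) - timedelta(days=1)
    monday = d - timedelta(days=d.weekday())
    return max(monday, first), min(monday + timedelta(days=6), last)


def week_of_month(d: date) -> int:
    """1 for the (possibly partial) week holding the 1st, Monday start."""
    return (d.day + d.replace(day=1).weekday() - 1) // 7 + 1


def weekly_totals(rows):
    """rows: (created, released, is_updated) strings as in the sample table.

    Returns {(week_start, week_end): [Total_productIds,
    Total_ProductNames, Total_IS_Updated]}, sorted by week.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for created, released, updated in rows:
        c = datetime.fromisoformat(created.strip())
        r = datetime.fromisoformat(released.strip())
        counts = totals[week_bounds(c.date())]
        if LOW < c < HIGH:
            counts[0] += 1
            if updated.strip() == "Y":
                counts[2] += 1
        if c > LOW and r < HIGH:
            counts[1] += 1
    return dict(sorted(totals.items()))


sample = [
    ("2018-04-04 00:00:00", "2018-04-06 00:00:00", "Y"),
    ("2018-04-03 00:00:00", "2018-04-04 00:00:00", "N"),
    ("2018-05-28 00:00:00", "2018-05-31 00:00:00", " Y"),
]
for (start, end), counts in weekly_totals(sample).items():
    print(f"week {week_of_month(start)} ({start} to {end})", counts)
```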
I'd like to solve this issue:
I have a table or data like this:
id date
1 2014-07-01 00:00:00.000
2 2014-07-01 00:00:00.000
3 2014-07-01 00:00:00.000
4 2014-07-03 00:00:00.000
5 2014-07-03 00:00:00.000
6 2014-07-03 00:00:00.000
7 2014-07-03 00:00:00.000
8 2014-07-05 00:00:00.000
9 2014-07-05 00:00:00.000
10 2014-07-05 00:00:00.000
11 2014-07-05 00:00:00.000
12 2014-07-05 00:00:00.000
13 2014-07-05 00:00:00.000
I'd like to group by date and then split each date's group by percentage. Probably the best idea would be to add an additional column so that it looks like this:
id date group
1 2014-07-01 00:00:00.000 1
2 2014-07-01 00:00:00.000 1
3 2014-07-01 00:00:00.000 2
4 2014-07-03 00:00:00.000 1
5 2014-07-03 00:00:00.000 1
6 2014-07-03 00:00:00.000 2
7 2014-07-03 00:00:00.000 2
8 2014-07-05 00:00:00.000 1
9 2014-07-05 00:00:00.000 1
10 2014-07-05 00:00:00.000 1
11 2014-07-05 00:00:00.000 2
12 2014-07-05 00:00:00.000 2
13 2014-07-05 00:00:00.000 2
Could anyone help me with that? Thank you.
Assuming you want to put half of the rows for each date in a separate group:
DECLARE @SampleData TABLE(id int, date date);
INSERT INTO @SampleData VALUES
(1 , '2014-07-01 00:00:00.000')
,(2 , '2014-07-01 00:00:00.000')
,(3 , '2014-07-01 00:00:00.000')
,(4 , '2014-07-03 00:00:00.000')
,(5 , '2014-07-03 00:00:00.000')
,(6 , '2014-07-03 00:00:00.000')
,(7 , '2014-07-03 00:00:00.000')
,(8 , '2014-07-05 00:00:00.000')
,(9 , '2014-07-05 00:00:00.000')
,(10 , '2014-07-05 00:00:00.000')
,(11 , '2014-07-05 00:00:00.000')
,(12 , '2014-07-05 00:00:00.000')
,(13 , '2014-07-05 00:00:00.000');
SELECT
id
,date
-- NTILE(2) splits each date's rows into two halves in id order;
-- with an odd count the first half gets the extra row (3 rows -> 1, 1, 2)
,NTILE(2) OVER (PARTITION BY date ORDER BY id) AS [DateGroup]
FROM @SampleData;
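NTILE(2) over each date, ordered by id, produces exactly the group column from the question (1, 1, 2 for three rows; 1, 1, 2, 2 for four; 1, 1, 1, 2, 2, 2 for six). A quick runnable check with Python's sqlite3 module, which has supported NTILE since SQLite 3.25; the table name sample_data stands in for the T-SQL table and is an assumption of this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sample_data (id INTEGER, date TEXT)")
dates = ["2014-07-01"] * 3 + ["2014-07-03"] * 4 + ["2014-07-05"] * 6
con.executemany("INSERT INTO sample_data VALUES (?, ?)",
                [(i, d) for i, d in enumerate(dates, start=1)])

# NTILE(2) splits each date's rows into two halves in id order; when the
# count is odd, the first half gets the extra row.
rows = con.execute("""
    SELECT id, date,
           NTILE(2) OVER (PARTITION BY date ORDER BY id) AS date_group
    FROM sample_data
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

NTILE(n) generalises to n roughly equal buckets; an uneven percentage split would instead need to compare ROW_NUMBER() with a fraction of COUNT(*) OVER (PARTITION BY date).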