Reverse track forced records relationships based on user-defined tagging - sql

I have a table where the tagging column [Tag_To] is populated by an algorithm based on Year and period of coverage. My current task (the subject of this question) is to update the Status for a given Year.
ID  Year  Method  Period_From  Period_To   SeqNo  Tag_To  Status
-----------------------------------------------------------------
10  2019  A       2019-01-01   2019-12-31  1
11  2019  B       2019-01-01   2019-06-30  2      1
12  2019  B       2019-07-01   2019-12-31  3      1
13  2019  C       2019-01-01   2019-06-30  4      2
14  2020  A       2020-01-01   2020-12-31  1
15  2020  B       2020-01-01   2020-06-30  2      1
16  2020  B       2020-07-01   2020-12-31  3      1
17  2020  C       2020-01-01   2020-12-31  4      2,3
18  2021  A       2021-01-01   2021-12-31  1
19  2021  B       2021-01-01   2021-12-31  2      1
20  2021  C       2021-07-01   2021-12-31  3      2
The SeqNo is assigned per Year, and Tag_To holds the SeqNo of the parent record(s) within the same Year, chosen by period of coverage.
11 and 12 are tagged to 10 since B follows A and their periods fall within 10's period of coverage.
13 is tagged to 11 since C follows B and the period...
15 and 16 to 14
Also note that 17 is tagged to 15 and 16 (2,3) because 17's coverage spans across the 2 periods of 15 and 16 combined
and so on...
The objective is to update the Status by Year such that each path is considered Closed if the path already has Methods A, B and C (there are actually more methods, but to simplify). Status should be Open for paths that haven't completed the methods.
From the example above, there are 5 paths:
10(A)-->11(B)-->13(C) = Closed
10(A)-->12(B)-->??? = Open
14(A)-->15(B)-->17(C) = Closed
14(A)-->16(B)-->17(C) = Closed
18(A)-->19(B)-->20(C) = Closed
Therefore the status update should be:
ID  Year  Method  Period_From  Period_To   SeqNo  Tag_To  Status
-----------------------------------------------------------------
10  2019  A       2019-01-01   2019-12-31  1              Open
11  2019  B       2019-01-01   2019-06-30  2      1       Closed
12  2019  B       2019-07-01   2019-12-31  3      1       Open
13  2019  C       2019-01-01   2019-06-30  4      2       Closed
14  2020  A       2020-01-01   2020-12-31  1              Closed
15  2020  B       2020-01-01   2020-06-30  2      1       Closed
16  2020  B       2020-07-01   2020-12-31  3      1       Closed
17  2020  C       2020-01-01   2020-12-31  4      2,3     Closed
18  2021  A       2021-01-01   2021-12-31  1              Closed
19  2021  B       2021-01-01   2021-12-31  2      1       Closed
20  2021  C       2021-07-01   2021-12-31  3      2       Closed
I hope I have explained everything clearly. Would really appreciate if anyone could help.

Just to update viewers: I have managed to solve this on my own. The solution is not dynamic at all and quite inefficient, but it did the job for me. Here's what I did.
-- Step 1: mark a B row Open if no C row is tagged to it, and an A row Open
-- if no B row is tagged to it; everything else starts out Closed.
UPDATE Table SET
    Status =
        CASE WHEN Method = 'B'
                  AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
                                   (
                                       SELECT VALUE AS Tag_To
                                       FROM Table AV
                                       CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                       WHERE AV.Method = 'C'
                                   ) C ON P.Sequence_No = C.Tag_To
                                   WHERE P.ID = AValue.ID
                                 )
             THEN 'Open'
             WHEN Method = 'A'
                  AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
                                   (
                                       SELECT VALUE AS Tag_To
                                       FROM Table AV
                                       CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                       WHERE AV.Method = 'B'
                                   ) C ON P.Sequence_No = C.Tag_To
                                   WHERE P.ID = AValue.ID
                                 )
             THEN 'Open'
             ELSE 'Closed'
        END
FROM Table AValue
WHERE Year = @Year

-- Step 2: spread the Open status to the other rows that share the same
-- per-Method row number (SN) as a row already marked Open.
;WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY A.Method ORDER BY A.Sequence_No ASC) SN,
        A.ID,
        A.Method,
        A.Sequence_No,
        A.Tag_To,
        A.Period_From,
        A.Period_To,
        A.Status
    FROM Table A
    LEFT JOIN
    (
        SELECT VALUE AS Tag_To
        FROM Table AV
        CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
        WHERE Year = @Year
    ) B ON A.Sequence_No = B.Tag_To
    WHERE Year = @Year
),
CTE2 AS
(
    SELECT DISTINCT SN FROM CTE
    WHERE Status = 'Open'
)
UPDATE Table SET
    Status = 'Open'
FROM Table
INNER JOIN CTE ON Table.ID = CTE.ID
INNER JOIN CTE2 ON CTE.SN = CTE2.SN
Yeah, it's ugly but, hey, it did the job! :)
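For comparison, the path rule from the question can be checked outside SQL. The following is only a sketch in Python, not the method used above: it assumes Tag_To holds comma-separated SeqNo values of parent rows within the same Year, that rows with an empty Tag_To start a path, and that a row is Open when any path through it is missing a required method. The names ROWS, REQUIRED_METHODS and compute_status are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (ID, Year, Method, SeqNo, Tag_To)
ROWS = [
    (10, 2019, "A", 1, ""),
    (11, 2019, "B", 2, "1"),
    (12, 2019, "B", 3, "1"),
    (13, 2019, "C", 4, "2"),
    (14, 2020, "A", 1, ""),
    (15, 2020, "B", 2, "1"),
    (16, 2020, "B", 3, "1"),
    (17, 2020, "C", 4, "2,3"),
    (18, 2021, "A", 1, ""),
    (19, 2021, "B", 2, "1"),
    (20, 2021, "C", 3, "2"),
]

# Assumption: a path is complete once it contains all of these methods.
REQUIRED_METHODS = {"A", "B", "C"}


def compute_status(rows, required=REQUIRED_METHODS):
    """Return (paths, status) where status maps ID -> 'Open' or 'Closed'."""
    method = {}
    seq_to_id = {}
    for row_id, year, meth, seq, _ in rows:
        method[row_id] = meth
        seq_to_id[(year, seq)] = row_id

    # Build parent -> children links from the Tag_To lists.
    children = defaultdict(list)
    roots = []
    for row_id, year, _, _, tag_to in rows:
        parents = [int(s) for s in tag_to.split(",") if s.strip()]
        if not parents:
            roots.append(row_id)
        for parent_seq in parents:
            children[seq_to_id[(year, parent_seq)]].append(row_id)

    # Enumerate every root-to-leaf path with a depth-first walk.
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if not children[node]:
            paths.append(path)
        for child in children[node]:
            walk(child, path)

    for root in roots:
        walk(root, [])

    # A row is Open if it lies on at least one incomplete path.
    status = {row_id: "Closed" for row_id in method}
    for path in paths:
        if not required <= {method[n] for n in path}:
            for n in path:
                status[n] = "Open"
    return paths, status


paths, status = compute_status(ROWS)
```

On the sample data this finds the five paths listed in the question and marks only 10 and 12 as Open. Adding more methods would only require extending REQUIRED_METHODS, under the same assumptions.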

Related

R - get a vector that tells me if a value of another vector is the first appearance or not

I have a data frame of sales with three columns: the customer code, the month in which the customer bought the item, and the year.
A customer can buy something in September and then make another purchase in December, and so appear twice. But I'm interested in the genuinely new customers by month and year.
My idea was to iterate with some checks using the %in% operator to build a boolean vector that tells me whether a customer is new, and then count by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
    date   cust  month  new_customer
1   14975  25    1      TRUE
2   14976  30    1      TRUE
3   14977  22    1      TRUE
4   14978  4     1      TRUE
5   14979  25    1      FALSE
6   14980  11    1      TRUE
7   14981  17    1      TRUE
8   14982  17    1      FALSE
9   14983  18    1      TRUE
10  14984  7     1      TRUE
11  14985  24    1      TRUE
12  14986  22    1      FALSE
To put it more simply: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and then bought something again four days later, so the second row is not a new customer. The same can be seen with customers 17 and 22.
I created dummy data myself with id, a numeric month, and year:
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then group by id and arrange by year and month (the order matters), and keep the first row of each group with filter and row_number(). This needs the dplyr package.
library(dplyr)

dat %>%
  group_by(id) %>%
  arrange(year, month) %>%
  filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
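The filter above returns the first purchase per customer rather than the TRUE/FALSE column the question asks for. In R, `!duplicated(x)` is the usual way to flag first occurrences. The same idea can be sketched in Python with a set of already-seen values; the function name first_appearance is hypothetical, and the input is assumed to be sorted by date already, as in the question.

```python
def first_appearance(values):
    """Return a list of booleans: True where a value occurs for the first time."""
    seen = set()
    flags = []
    for v in values:
        flags.append(v not in seen)  # new only if never seen before
        seen.add(v)
    return flags


# The cust column from the question, in date order.
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]
new_customer = first_appearance(cust)
```

For the question's data this yields FALSE only for the repeat purchases of customers 25, 17 and 22, matching the new_customer column shown there.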
Sample Code
You can adapt your code according to this logic.
Create the table (Customer_Id is a string here because the sample IDs look like C_01):
CREATE TABLE PURCHASE(Posting_Date DATE,Customer_Id VARCHAR(10),Customer_Name VARCHAR(15));
Insert data into the table:
Posting_Date  Customer_Id  Customer_Name
2018-01-01    C_01         Jack
2018-02-01    C_01         Jack
2018-03-01    C_01         Jack
2018-04-01    C_02         James
2019-04-01    C_01         Jack
2019-05-01    C_01         Jack
2019-05-01    C_03         Gill
2020-01-01    C_02         James
2020-01-01    C_04         Jones
Code
WITH Date_CTE (PostingDate, CustomerID, FirstYear)
AS
(
    SELECT MIN(Posting_Date) AS [Date],
           Customer_Id,
           YEAR(MIN(Posting_Date)) AS [F_Purchase_Year]
    FROM PURCHASE
    GROUP BY Customer_Id
)
SELECT T.[ActualYear],
       (CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
    SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
           T2.Customer_Id,
           (CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
    FROM Date_CTE AS T1
    LEFT OUTER JOIN PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear], T.[Customer Status]
Final Result
ActualYear  New Customer
2018        2
2019        1
2020        1
2019        NULL
2020        NULL
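The NULL rows in that result come from the groups of returning ('old') customers. One way to avoid them, sketched here with Python's built-in sqlite3 module rather than the SQL Server syntax above, is to compute each customer's first purchase year and count customers per first year. The variable new_per_year and the use of strftime are choices made for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PURCHASE (Posting_Date TEXT, Customer_Id TEXT, Customer_Name TEXT)"
)
conn.executemany(
    "INSERT INTO PURCHASE VALUES (?, ?, ?)",
    [
        ("2018-01-01", "C_01", "Jack"),
        ("2018-02-01", "C_01", "Jack"),
        ("2018-03-01", "C_01", "Jack"),
        ("2018-04-01", "C_02", "James"),
        ("2019-04-01", "C_01", "Jack"),
        ("2019-05-01", "C_01", "Jack"),
        ("2019-05-01", "C_03", "Gill"),
        ("2020-01-01", "C_02", "James"),
        ("2020-01-01", "C_04", "Jones"),
    ],
)

# Inner query: first purchase year per customer.
# Outer query: number of customers whose first purchase falls in each year.
new_per_year = conn.execute(
    """
    SELECT first_year, COUNT(*) AS new_customers
    FROM (
        SELECT Customer_Id,
               CAST(strftime('%Y', MIN(Posting_Date)) AS INTEGER) AS first_year
        FROM PURCHASE
        GROUP BY Customer_Id
    ) AS firsts
    GROUP BY first_year
    ORDER BY first_year
    """
).fetchall()
```

Because each customer contributes exactly one row to the inner query, a year with no new customers simply does not appear instead of showing up as NULL.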

Need help joining incremental data to a fact table in an incremental manner

TableA
ID  Counter  Value
1   1        10
1   2        28
1   3        34
1   4        22
1   5        80
2   1        15
2   2        50
2   3        39
2   4        33
2   5        99
TableB
StartDate   EndDate
2020-01-01  2020-01-11
2020-01-02  2020-01-12
2020-01-03  2020-01-13
2020-01-04  2020-01-14
2020-01-05  2020-01-15
2020-01-06  2020-01-16
TableC (output)
ID  Counter  StartDate   EndDate     Val
1   1        2020-01-01  2020-01-11  10
2   1        2020-01-01  2020-01-11  15
1   2        2020-01-02  2020-01-12  28
2   2        2020-01-02  2020-01-12  50
1   3        2020-01-03  2020-01-13  34
2   3        2020-01-03  2020-01-13  39
1   4        2020-01-04  2020-01-14  22
2   4        2020-01-04  2020-01-14  33
1   5        2020-01-05  2020-01-15  80
2   5        2020-01-05  2020-01-15  99
1   1        2020-01-06  2020-01-16  10
2   1        2020-01-06  2020-01-16  15
I am attempting to come up with some SQL to create TableC. TableC takes the rows from TableB in chronological order and, for each ID in TableA, assigns the next Counter in the sequence to that Start/End date combination. When it reaches the last Counter, it starts back at 1.
Is something like this even possible with SQL?
Yes, this is possible. Try the following:
1. Calculate the maximum value of Counter in TableA into max_counter using SELECT MAX(Counter) ....
2. Add a row number to each row in TableB, so that it can be matched to a Counter value, using SELECT ROW_NUMBER() OVER() ....
3. Relate the row number in TableB to Counter in TableA like this: ... FROM TableB JOIN TableA ON COALESCE(NULLIF(TableB.row_number % max_counter, 0), max_counter) = TableA.Counter. The NULLIF/COALESCE pair turns a remainder of 0 into max_counter, so the sequence runs 1..max_counter and then wraps around.
4. Then combine these queries into one query using CTEs (Common Table Expressions), as the official documentation shows.
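The steps above can be sketched in plain Python to check the wrap-around arithmetic against the sample data. This is an illustration only, not the SQL itself: build_table_c, table_a and table_b are hypothetical names, and the expression (row_number - 1) % max_counter + 1 is an equivalent way of writing the COALESCE/NULLIF mapping from step 3.

```python
# TableA rows: (ID, Counter, Value)
table_a = [
    (1, 1, 10), (1, 2, 28), (1, 3, 34), (1, 4, 22), (1, 5, 80),
    (2, 1, 15), (2, 2, 50), (2, 3, 39), (2, 4, 33), (2, 5, 99),
]

# TableB rows: (StartDate, EndDate)
table_b = [
    ("2020-01-01", "2020-01-11"),
    ("2020-01-02", "2020-01-12"),
    ("2020-01-03", "2020-01-13"),
    ("2020-01-04", "2020-01-14"),
    ("2020-01-05", "2020-01-15"),
    ("2020-01-06", "2020-01-16"),
]


def build_table_c(table_a, table_b):
    """Pair each date range with the next Counter, wrapping after the maximum."""
    max_counter = max(counter for _, counter, _ in table_a)
    result = []
    # sorted() orders the ISO date strings chronologically.
    for row_number, (start, end) in enumerate(sorted(table_b), start=1):
        counter = (row_number - 1) % max_counter + 1  # 1..max_counter, then wraps
        for id_, c, value in sorted(table_a):
            if c == counter:
                result.append((id_, counter, start, end, value))
    return result


table_c = build_table_c(table_a, table_b)
```

With five counters and six date ranges, the sixth range wraps back to Counter 1, which is what the last two rows of TableC show.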
Consider the approach below:
select id, counter, StartDate, EndDate, value
from tableA
join (
  -- 5 is the highest Counter in tableA; the row number wraps back to 1 after it
  select *, mod(row_number() over(order by StartDate) - 1, 5) + 1 as counter
  from tableB
)
using (counter)
If applied to the sample data in your question, the output matches TableC above.

How to query data and its count in multiple range at same time

I have a table like the one below:
id  number  date
1   23      2020-01-01
2   12      2020-03-02
3   23      2020-09-02
4   11      2019-03-04
5   12      2019-03-23
6   23      2019-04-12
I want to know how many times each number appears per year, such as:
number  2019  2020
23      1     2
12      1     1
11      1     0
I'm kind of stuck. I tried a left join and a single select, but still cannot figure out how to do it. Please help, thank you!
-- ELSE 0 (rather than NULL) makes a year with no matches show 0, as in the expected output
SELECT C.NUMBER,
       SUM
       (
           CASE
               WHEN C.DATE BETWEEN '20190101' AND '20191231'
               THEN 1 ELSE 0
           END
       ) AS A_2019,
       SUM
       (
           CASE
               WHEN C.DATE BETWEEN '20200101' AND '20201231'
               THEN 1 ELSE 0
           END
       ) AS A_2020
FROM I_have_a_table_like_below AS C
GROUP BY C.NUMBER
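To try the conditional-aggregation idea on the sample rows, here is a small sketch using Python's built-in sqlite3 module. The table name readings and the column aliases y2019 and y2020 are invented for the example, and strftime is used to extract the year because SQLite stores these dates as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, number INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, 23, "2020-01-01"),
        (2, 12, "2020-03-02"),
        (3, 23, "2020-09-02"),
        (4, 11, "2019-03-04"),
        (5, 12, "2019-03-23"),
        (6, 23, "2019-04-12"),
    ],
)

# One SUM(CASE ...) per year; ELSE 0 keeps years without matches at 0.
counts = conn.execute(
    """
    SELECT number,
           SUM(CASE WHEN strftime('%Y', date) = '2019' THEN 1 ELSE 0 END) AS y2019,
           SUM(CASE WHEN strftime('%Y', date) = '2020' THEN 1 ELSE 0 END) AS y2020
    FROM readings
    GROUP BY number
    ORDER BY number DESC
    """
).fetchall()
```

Each additional year needs its own SUM(CASE ...) column, so this pattern suits a small, known set of years.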

Return product if there is no match in other table [duplicate]

I have two tables:
Product_Table
ProductID Name Date
1 ABC 2020-02-14
2 XYZ 2020-03-05
Productbreak_Table
BreakID Product_id Begin End
34 1 2020-01-01 2020-01-30
35 1 2020-02-01 2020-02-20
36 2 2020-01-15 2020-01-31
37 2 2020-02-15 2020-03-01
My goal is to get just the products whose Date does not fall between the Begin and End dates of any of their rows in Productbreak_Table.
Result should be:
ProductID Name
2 XYZ
You would use not exists (table and column names adjusted to match the question):
select p.*
from Product_Table p
where not exists (select 1
                  from Productbreak_Table pb
                  where pb.Product_id = p.ProductID and
                        p.Date between pb.Begin and pb.End
                 );
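The NOT EXISTS filter can be tried on the sample rows with Python's built-in sqlite3 module. This is a sketch under a few assumptions: dates are stored as ISO text, and "Begin" and "End" are double-quoted because they are keywords in SQLite. The variable name products_outside_breaks is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_Table (ProductID INTEGER, Name TEXT, Date TEXT)")
conn.execute(
    'CREATE TABLE Productbreak_Table '
    '(BreakID INTEGER, Product_id INTEGER, "Begin" TEXT, "End" TEXT)'
)
conn.executemany(
    "INSERT INTO Product_Table VALUES (?, ?, ?)",
    [(1, "ABC", "2020-02-14"), (2, "XYZ", "2020-03-05")],
)
conn.executemany(
    "INSERT INTO Productbreak_Table VALUES (?, ?, ?, ?)",
    [
        (34, 1, "2020-01-01", "2020-01-30"),
        (35, 1, "2020-02-01", "2020-02-20"),
        (36, 2, "2020-01-15", "2020-01-31"),
        (37, 2, "2020-02-15", "2020-03-01"),
    ],
)

# Keep a product only if none of its break periods contains its Date.
products_outside_breaks = conn.execute(
    """
    SELECT p.ProductID, p.Name
    FROM Product_Table p
    WHERE NOT EXISTS (
        SELECT 1
        FROM Productbreak_Table pb
        WHERE pb.Product_id = p.ProductID
          AND p.Date BETWEEN pb."Begin" AND pb."End"
    )
    ORDER BY p.ProductID
    """
).fetchall()
```

Product 1 is dropped because 2020-02-14 falls inside break 35, while product 2's date lies outside both of its breaks.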

I need to show the monthly inventory data

I have a table like the following for inventory details.
InventoryTable:
InventoryTableID DateCreated quantity ItemName
-------------------------------------------------
1 2010-02-04 12 abc
2 2010-03-10 4 abc
3 2010-03-13 5 xyz
4 2010-03-13 19 def
5 2010-03-17 15 abc
6 2010-03-29 15 abc
7 2010-04-01 22 xyz
8 2010-04-13 5 abc
9 2010-04-15 6 def
From the above table, if my admin wants to know the inventory details for April 2010 (i.e. Apr 1st 2010 - Apr 30th 2010),
I need the output shown below: for each item, the most recent row on or before the given date.
inventory as on Apr 1st 2010
ItemName Datecreated qty
----------------------------
abc 2010-03-29 15
xyz 2010-04-01 22
def 2010-03-13 19
inventory as on Apr 30th 2010
ItemName Datecreated qty
---------------------------
abc 2010-04-13 5
xyz 2010-04-01 22
def 2010-04-15 6
For your first result set, run with @YourDataParam = '2010-04-01'. For the second set, use '2010-04-30'.
;with cteMaxDate as (
    -- latest entry date per item on or before the requested date
    select it.ItemName, max(it.DateCreated) as MaxDate
    from InventoryTable it
    where it.DateCreated <= @YourDataParam
    group by it.ItemName
)
select it.ItemName, it.DateCreated, it.quantity as qty
from cteMaxDate c
inner join InventoryTable it
    on c.ItemName = it.ItemName
    and c.MaxDate = it.DateCreated
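The same "latest row per item as of a date" query can be exercised on the sample data with Python's built-in sqlite3 module. This is a sketch, not the original SQL Server code: the helper name inventory_as_of is hypothetical, the date parameter is passed as a ? placeholder instead of a T-SQL variable, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InventoryTable "
    "(InventoryTableID INTEGER, DateCreated TEXT, quantity INTEGER, ItemName TEXT)"
)
conn.executemany(
    "INSERT INTO InventoryTable VALUES (?, ?, ?, ?)",
    [
        (1, "2010-02-04", 12, "abc"),
        (2, "2010-03-10", 4, "abc"),
        (3, "2010-03-13", 5, "xyz"),
        (4, "2010-03-13", 19, "def"),
        (5, "2010-03-17", 15, "abc"),
        (6, "2010-03-29", 15, "abc"),
        (7, "2010-04-01", 22, "xyz"),
        (8, "2010-04-13", 5, "abc"),
        (9, "2010-04-15", 6, "def"),
    ],
)


def inventory_as_of(conn, as_of):
    """Return (ItemName, DateCreated, quantity) for each item's latest row <= as_of."""
    return conn.execute(
        """
        WITH cteMaxDate AS (
            SELECT ItemName, MAX(DateCreated) AS MaxDate
            FROM InventoryTable
            WHERE DateCreated <= ?
            GROUP BY ItemName
        )
        SELECT it.ItemName, it.DateCreated, it.quantity
        FROM cteMaxDate c
        JOIN InventoryTable it
          ON it.ItemName = c.ItemName
         AND it.DateCreated = c.MaxDate
        ORDER BY it.ItemName
        """,
        (as_of,),
    ).fetchall()


start_of_april = inventory_as_of(conn, "2010-04-01")
end_of_april = inventory_as_of(conn, "2010-04-30")
```

If an item ever had two rows on the same DateCreated, the join would return both, so a tie-breaker such as InventoryTableID might be needed in that case.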