Reverse track forced records relationships based on user-defined tagging - sql
I have this table where the tagging [Tag_To] is updated by an algorithm based on Year and Period of coverage. My current task (in question) is to update the Status given the Year.
ID  Year  Method  Period_From  Period_To   SeqNo  Tag_To  Status
-----------------------------------------------------------------
10  2019  A       2019-01-01   2019-12-31  1
11  2019  B       2019-01-01   2019-06-30  2      1
12  2019  B       2019-07-01   2019-12-31  3      1
13  2019  C       2019-01-01   2019-06-30  4      2
14  2020  A       2020-01-01   2020-12-31  1
15  2020  B       2020-01-01   2020-06-30  2      1
16  2020  B       2020-07-01   2020-12-31  3      1
17  2020  C       2020-01-01   2020-12-31  4      2,3
18  2021  A       2021-01-01   2021-12-31  1
19  2021  B       2021-01-01   2021-12-31  2      1
20  2021  C       2021-07-01   2021-12-31  3      2
The SeqNo is assigned per Year, and Tag_To is set based on the period of coverage.
11 and 12 are tagged to 10 since B follows A and their periods fall within 10's period of coverage.
13 is tagged to 11 since C follows B and the period...
15 and 16 to 14
Also note that 17 is tagged to 15 and 16 (2,3) because 17's coverage spans across the 2 periods of 15 and 16 combined
and so on...
The objective is to update the Status by Year such that each path is considered Closed if the path already has Methods A, B and C (there are actually more methods, but to simplify). Status should be Open for paths that haven't completed the methods.
From the example above, there are 5 paths:
10(A)-->11(B)-->13(C) = Closed
10(A)-->12(B)-->??? = Open
14(A)-->15(B)-->17(C) = Closed
14(A)-->16(B)-->17(C) = Closed
18(A)-->19(B)-->20(C) = Closed
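The path list above can be reproduced with a short sketch. This is Python rather than SQL, purely to illustrate the logic; the row tuples are copied from the sample table, while `ROWS`, `LAST_METHOD`, `build_children` and `enumerate_paths` are hypothetical names, and treating C as the final method is the simplification stated in the question.

```python
from collections import defaultdict

# (ID, Year, Method, SeqNo, Tag_To) copied from the sample table
ROWS = [
    (10, 2019, "A", 1, ""), (11, 2019, "B", 2, "1"), (12, 2019, "B", 3, "1"),
    (13, 2019, "C", 4, "2"), (14, 2020, "A", 1, ""), (15, 2020, "B", 2, "1"),
    (16, 2020, "B", 3, "1"), (17, 2020, "C", 4, "2,3"), (18, 2021, "A", 1, ""),
    (19, 2021, "B", 2, "1"), (20, 2021, "C", 3, "2"),
]
LAST_METHOD = "C"  # assumption: a path is complete once it reaches method C


def build_children(rows):
    """Map each parent ID to the IDs tagged to it (SeqNo is resolved per Year)."""
    id_by_seq = {(year, seq): id_ for id_, year, _, seq, _ in rows}
    children = defaultdict(list)
    for id_, year, _, _, tag_to in rows:
        for tag in filter(None, tag_to.split(",")):
            children[id_by_seq[(year, int(tag))]].append(id_)
    return children


def enumerate_paths(rows):
    """Return (path, status) for every path from an untagged row down to a leaf."""
    children = build_children(rows)
    method = {id_: m for id_, _, m, _, _ in rows}
    results = []

    def walk(path):
        kids = children.get(path[-1], [])
        if not kids:
            # A leaf closes the path only if it is the final method.
            status = "Closed" if method[path[-1]] == LAST_METHOD else "Open"
            results.append((path, status))
        for kid in kids:
            walk(path + [kid])

    for id_, _, _, _, tag_to in rows:
        if not tag_to:  # rows with no Tag_To start a path
            walk([id_])
    return results


for path, status in enumerate_paths(ROWS):
    print("-->".join(str(i) for i in path), "=", status)
```

Resolving Tag_To through a (Year, SeqNo) lookup matters because SeqNo restarts every year.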
Therefore the status update should be:
ID  Year  Method  Period_From  Period_To   SeqNo  Tag_To  Status
-----------------------------------------------------------------
10  2019  A       2019-01-01   2019-12-31  1              Open
11  2019  B       2019-01-01   2019-06-30  2      1       Closed
12  2019  B       2019-07-01   2019-12-31  3      1       Open
13  2019  C       2019-01-01   2019-06-30  4      2       Closed
14  2020  A       2020-01-01   2020-12-31  1              Closed
15  2020  B       2020-01-01   2020-06-30  2      1       Closed
16  2020  B       2020-07-01   2020-12-31  3      1       Closed
17  2020  C       2020-01-01   2020-12-31  4      2,3     Closed
18  2021  A       2021-01-01   2021-12-31  1              Closed
19  2021  B       2021-01-01   2021-12-31  2      1       Closed
20  2021  C       2021-07-01   2021-12-31  3      2       Closed
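One reading of the rule that reproduces this table: a row is Closed if it is the final method, or if at least one row is tagged to it and every row tagged to it is Closed. Below is a minimal Python sketch of that reading (not SQL, and not necessarily how the real data behaves with more methods). `ROWS`, `LAST_METHOD` and `compute_status` are hypothetical names, and the data is copied from the sample.

```python
from collections import defaultdict

# (ID, Year, Method, SeqNo, Tag_To) copied from the sample table
ROWS = [
    (10, 2019, "A", 1, ""), (11, 2019, "B", 2, "1"), (12, 2019, "B", 3, "1"),
    (13, 2019, "C", 4, "2"), (14, 2020, "A", 1, ""), (15, 2020, "B", 2, "1"),
    (16, 2020, "B", 3, "1"), (17, 2020, "C", 4, "2,3"), (18, 2021, "A", 1, ""),
    (19, 2021, "B", 2, "1"), (20, 2021, "C", 3, "2"),
]
LAST_METHOD = "C"  # assumption: C is the final method, as in the simplified example


def compute_status(rows):
    """Return {ID: 'Open' | 'Closed'} under the rule described above."""
    id_by_seq = {(year, seq): id_ for id_, year, _, seq, _ in rows}
    method = {id_: m for id_, _, m, _, _ in rows}
    children = defaultdict(list)
    for id_, year, _, _, tag_to in rows:
        for tag in filter(None, tag_to.split(",")):
            children[id_by_seq[(year, int(tag))]].append(id_)

    def is_closed(id_):
        if method[id_] == LAST_METHOD:
            return True
        kids = children.get(id_, [])
        # Open if nothing is tagged to this row, or if any follower is still Open.
        return bool(kids) and all(is_closed(k) for k in kids)

    return {id_: "Closed" if is_closed(id_) else "Open" for id_ in method}


for id_, status in compute_status(ROWS).items():
    print(id_, status)
```

Under this rule, row 10 comes out Open because one of its two followers (12) has no C tagged to it.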
I hope I have explained everything clearly. I would really appreciate it if anyone could help.
Just to update viewers: I managed to solve this on my own. The solution is not dynamic at all and quite inefficient, but it did the job for me. Here's what I did.
-- Pass 1: a B row with no C tagged to it, or an A row with no B tagged to it,
-- is Open; everything else is Closed.
-- SeqNo restarts every year, so tags are matched within @Year only.
UPDATE Table SET
    Status =
        CASE WHEN Method = 'B'
                  AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
                                   (
                                       SELECT VALUE AS Tag_To
                                       FROM Table AV
                                       CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                       WHERE AV.Method = 'C'
                                         AND AV.Year = @Year
                                   ) C ON P.Sequence_No = C.Tag_To
                                   WHERE P.ID = AValue.ID
                                 )
             THEN 'Open'
             WHEN Method = 'A'
                  AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
                                   (
                                       SELECT VALUE AS Tag_To
                                       FROM Table AV
                                       CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                       WHERE AV.Method = 'B'
                                         AND AV.Year = @Year
                                   ) C ON P.Sequence_No = C.Tag_To
                                   WHERE P.ID = AValue.ID
                                 )
             THEN 'Open'
             ELSE 'Closed'
        END
FROM Table AValue
WHERE Year = @Year
-- Pass 2: push Open back to the earlier methods by lining rows up
-- on a per-method row number (SN).
;WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY A.Method ORDER BY A.Sequence_No ASC) SN,
        A.ID,
        A.Method,
        A.Sequence_No,
        A.Tag_To,
        A.Period_From,
        A.Period_To,
        A.Status
    FROM Table A
    LEFT JOIN
    (
        SELECT VALUE AS Tag_To
        FROM Table AV
        CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
        WHERE Year = @Year
    ) B ON A.Sequence_No = B.Tag_To
    WHERE Year = @Year
),
CTE2 AS
(
    SELECT DISTINCT SN FROM CTE
    WHERE Status = 'Open'
)
UPDATE Table SET
    Status = 'Open'
FROM Table
INNER JOIN CTE ON Table.ID = CTE.ID
INNER JOIN CTE2 ON CTE.SN = CTE2.SN
Yeah, it's ugly but, hey, it did the job! :)
Related
R - get a vector that tells me if a value of another vector is the first appearance or not
I have a data frame of sales with three columns: the customer code, the month the customer bought the item, and the year. A customer can buy something in September and then make another purchase in December, and so appear twice. But I'm interested in the completely new customers by month and year.
My idea was to iterate with some checks using the %in% function, build a boolean vector that tells me whether a customer is new, and then count by month and year with SQL using this new vector. But I'm wondering if there's a specific function or a better way to do that. This is an example of the data I would like to have:
    date   cust  month  new_customer
1   14975  25    1      TRUE
2   14976  30    1      TRUE
3   14977  22    1      TRUE
4   14978  4     1      TRUE
5   14979  25    1      FALSE
6   14980  11    1      TRUE
7   14981  17    1      TRUE
8   14982  17    1      FALSE
9   14983  18    1      TRUE
10  14984  7     1      TRUE
11  14985  24    1      TRUE
12  14986  22    1      FALSE
To put it more simply: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and bought again four days later, so on that later row it is not a new customer. The same can be seen with customers 17 and 22.
I created dummy data myself with id, month (numeric), and year:
library(dplyr)
dat <- data.frame(
  id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
  month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
  year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
   id month year
1   1     1 2019
2   2     6 2019
3   3     7 2019
4   4     8 2019
5   5     2 2019
6   6     3 2020
7   7     4 2020
8   8     8 2020
9   1    11 2020
10  3     1 2020
11  4    10 2021
12  5     9 2021
13  1     1 2021
14  2    12 2021
15  2     2 2021
Then group by id and arrange by year and month (the order is meaningful), and keep the first row of each group with filter and row_number():
dat %>%
  group_by(id) %>%
  arrange(year, month) %>%
  filter(row_number() == 1)
     id month  year
  <dbl> <dbl> <dbl>
1     1     1  2019
2     5     2  2019
3     2     6  2019
4     3     7  2019
5     4     8  2019
6     6     3  2020
7     7     4  2020
8     8     8  2020
Sample code. You can adapt your code to this logic.
Create the table (Customer_Id is a character column here because the sample IDs look like C_01):
CREATE TABLE PURCHASE(Posting_Date DATE, Customer_Id VARCHAR(10), Customer_Name VARCHAR(15));
Insert data into the table:
Posting_Date  Customer_Id  Customer_Name
2018-01-01    C_01         Jack
2018-02-01    C_01         Jack
2018-03-01    C_01         Jack
2018-04-01    C_02         James
2019-04-01    C_01         Jack
2019-05-01    C_01         Jack
2019-05-01    C_03         Gill
2020-01-01    C_02         James
2020-01-01    C_04         Jones
Code:
WITH Date_CTE (PostingDate, CustomerID, FirstYear) AS
(
    SELECT MIN(Posting_Date) AS [Date],
           Customer_Id,
           YEAR(MIN(Posting_Date)) AS [F_Purchase_Year]
    FROM PURCHASE
    GROUP BY Customer_Id
)
SELECT T.[ActualYear],
       (CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM
(
    SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
           T2.Customer_Id,
           (CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
    FROM Date_CTE AS T1
    LEFT OUTER JOIN PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear], T.[Customer Status]
Final result (the NULL rows are the 'old' groups):
ActualYear  New Customer
2018        2
2019        1
2020        1
2019        NULL
2020        NULL
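The new_customer vector from the question boils down to "have I seen this value before?". As a language-neutral illustration, here is a minimal Python sketch of that idea. The function name `first_appearance_flags` is hypothetical, and the `cust` values are copied from the question's example.

```python
def first_appearance_flags(customers):
    """Return a list of booleans: True where a customer code appears for the first time."""
    seen = set()
    flags = []
    for cust in customers:
        flags.append(cust not in seen)
        seen.add(cust)
    return flags


# cust column from the question, already sorted by date
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]
print(first_appearance_flags(cust))
```

The input must already be in date order, since "first" is decided by position.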
Need help joining incremental data to a fact table in an incremental manner
TableA
ID  Counter  Value
1   1        10
1   2        28
1   3        34
1   4        22
1   5        80
2   1        15
2   2        50
2   3        39
2   4        33
2   5        99
TableB
StartDate   EndDate
2020-01-01  2020-01-11
2020-01-02  2020-01-12
2020-01-03  2020-01-13
2020-01-04  2020-01-14
2020-01-05  2020-01-15
2020-01-06  2020-01-16
TableC (output)
ID  Counter  StartDate   EndDate     Val
1   1        2020-01-01  2020-01-11  10
2   1        2020-01-01  2020-01-11  15
1   2        2020-01-02  2020-01-12  28
2   2        2020-01-02  2020-01-12  50
1   3        2020-01-03  2020-01-13  34
2   3        2020-01-03  2020-01-13  39
1   4        2020-01-04  2020-01-14  22
2   4        2020-01-04  2020-01-14  33
1   5        2020-01-05  2020-01-15  80
2   5        2020-01-05  2020-01-15  99
1   1        2020-01-06  2020-01-16  10
2   1        2020-01-06  2020-01-16  15
I am attempting to come up with some SQL to create TableC. TableC takes the rows of TableB in chronological order and, for each ID in TableA, finds the next counter in the sequence and assigns it to that Start/End date combination for that ID. When it reaches the end of the counter, it starts back at 1. Is something like this even possible with SQL?
Yes, this is possible. Try the following:
1. Calculate the maximum value of Counter in TableA into max_counter, using SELECT MAX(Counter) ....
2. Add a row_number identifier to each row in TableB, using SELECT ROW_NUMBER() OVER() ..., so that each row can be matched to a Counter value.
3. Relate the row number in TableB to Counter in TableA like this: ... FROM TableB JOIN TableA ON (COALESCE(NULLIF(TableB.row_number % max_counter, 0), max_counter)) = TableA.Counter.
4. Gather all these queries into one query using CTEs (Common Table Expressions), as the official documentation shows.
Consider the approach below:
select id, counter, StartDate, EndDate, value
from tableA
join (
  select *, mod(row_number() over(order by StartDate) - 1, 5) + 1 as counter
  from tableB
) using (counter)
Applied to the sample data in your question, this returns the same rows as TableC. The 5 is the maximum Counter in tableA.
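Both answers rely on the same wrap-around arithmetic: the n-th date range (1-based) gets counter (n - 1) mod max_counter + 1. A small Python sketch of that mapping follows. It uses the sample data from the question, and `build_table_c` is a hypothetical helper name.

```python
# (ID, Counter, Value) rows of TableA and (StartDate, EndDate) rows of TableB
table_a = [
    (1, 1, 10), (1, 2, 28), (1, 3, 34), (1, 4, 22), (1, 5, 80),
    (2, 1, 15), (2, 2, 50), (2, 3, 39), (2, 4, 33), (2, 5, 99),
]
table_b = [
    ("2020-01-01", "2020-01-11"), ("2020-01-02", "2020-01-12"),
    ("2020-01-03", "2020-01-13"), ("2020-01-04", "2020-01-14"),
    ("2020-01-05", "2020-01-15"), ("2020-01-06", "2020-01-16"),
]


def build_table_c(table_a, table_b):
    """Assign counters to date ranges cyclically and join them to TableA."""
    max_counter = max(counter for _, counter, _ in table_a)
    rows = []
    # ISO date strings sort chronologically, so a plain sort is enough here.
    for n, (start, end) in enumerate(sorted(table_b), start=1):
        counter = (n - 1) % max_counter + 1  # 1, 2, ..., max_counter, 1, 2, ...
        for id_, c, value in table_a:
            if c == counter:
                rows.append((id_, counter, start, end, value))
    return rows


for row in build_table_c(table_a, table_b):
    print(row)
```

Computing max_counter from the data avoids hard-coding the 5, at the cost of one extra pass over TableA.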
How to query data and its count in multiple range at same time
I have a table like below:
id  number  date
1   23      2020-01-01
2   12      2020-03-02
3   23      2020-09-02
4   11      2019-03-04
5   12      2019-03-23
6   23      2019-04-12
I want to know how many times each number appears per year, such as:
number  2019  2020
23      1     2
12      1     1
11      1     0
I'm kind of stuck. I tried a left join and a single select, but I still cannot figure out how to do it. Please help, thank you!
SELECT C.NUMBER,
       -- ELSE 0 so that a number with no rows in a year shows 0 rather than NULL
       SUM(CASE WHEN C.DATE BETWEEN '20190101' AND '20191231' THEN 1 ELSE 0 END) AS A_2019,
       SUM(CASE WHEN C.DATE BETWEEN '20200101' AND '20201231' THEN 1 ELSE 0 END) AS A_2020
FROM I_have_a_table_like_below AS C
GROUP BY C.NUMBER
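The conditional-sum pivot above is a count per (number, year) pair laid out with one column per year. For comparison, a minimal Python sketch of the same counting. The `rows` are copied from the question, and `count_by_year` is a hypothetical name.

```python
from collections import Counter

# (id, number, date) rows from the question
rows = [
    (1, 23, "2020-01-01"), (2, 12, "2020-03-02"), (3, 23, "2020-09-02"),
    (4, 11, "2019-03-04"), (5, 12, "2019-03-23"), (6, 23, "2019-04-12"),
]


def count_by_year(rows, years):
    """Return {number: [count for each year in years]}, with 0 for missing pairs."""
    counts = Counter((number, date[:4]) for _, number, date in rows)
    numbers = dict.fromkeys(number for _, number, _ in rows)  # keeps first-seen order
    # A Counter returns 0 for keys it has never seen, which gives the 0 cells.
    return {n: [counts[(n, str(y))] for y in years] for n in numbers}


for number, per_year in count_by_year(rows, [2019, 2020]).items():
    print(number, *per_year)
```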
Return product if there is no match in other table [duplicate]
I have two tables:
Product_Table
ProductID  Name  Date
1          ABC   2020-02-14
2          XYZ   2020-03-05
Productbreak_Table
BreakID  Product_id  Begin       End
34       1           2020-01-01  2020-01-30
35       1           2020-02-01  2020-02-20
36       2           2020-01-15  2020-01-31
37       2           2020-02-15  2020-03-01
My goal is to get just the products whose Date is not between the Begin and End dates of any of their rows in Productbreak_Table. The result should be:
ProductID  Name
2          XYZ
You would use not exists:
select p.*
from products p
where not exists (select 1
                  from productbreak pb
                  where pb.productid = p.productid and
                        p.date between pb.begin and pb.end
                 );
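The NOT EXISTS test keeps a product only when none of its break rows contains the product's date. A minimal Python sketch of that check on the question's sample data follows. `products_outside_breaks` is a hypothetical name, and the dates are compared as ISO strings, which order the same way as the dates themselves.

```python
# (ProductID, Name, Date) and (BreakID, Product_id, Begin, End) from the question
products = [(1, "ABC", "2020-02-14"), (2, "XYZ", "2020-03-05")]
breaks = [
    (34, 1, "2020-01-01", "2020-01-30"),
    (35, 1, "2020-02-01", "2020-02-20"),
    (36, 2, "2020-01-15", "2020-01-31"),
    (37, 2, "2020-02-15", "2020-03-01"),
]


def products_outside_breaks(products, breaks):
    """Keep products whose Date falls inside none of their own break periods."""
    return [
        (pid, name)
        for pid, name, day in products
        if not any(
            break_pid == pid and begin <= day <= end  # inclusive, like BETWEEN
            for _, break_pid, begin, end in breaks
        )
    ]


print(products_outside_breaks(products, breaks))
```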
I need to show the monthly inventory data
I have a table like the following for inventory details.
InventoryTable
InventoryTableID  DateCreated  quantity  ItemName
-------------------------------------------------
1                 2010-02-04   12        abc
2                 2010-03-10   4         abc
3                 2010-03-13   5         xyz
4                 2010-03-13   19        def
5                 2010-03-17   15        abc
6                 2010-03-29   15        abc
7                 2010-04-01   22        xyz
8                 2010-04-13   5         abc
9                 2010-04-15   6         def
From the above table, if my admin wants to know the inventory details for April 2010 (i.e. Apr 1st 2010 - Apr 30th 2010), I need the output as shown below.
Inventory as on Apr 1st 2010
ItemName  Datecreated  qty
----------------------------
abc       2010-03-29   15
xyz       2010-04-01   22
def       2010-03-13   19
Inventory as on Apr 30th 2010
ItemName  Datecreated  qty
---------------------------
abc       2010-04-13   5
xyz       2010-04-01   22
def       2010-04-15   6
For your first result set, run with @YourDataParam = '2010-04-01'. For the second set, use '2010-04-30'.
;with cteMaxDate as (
    -- latest DateCreated per item on or before the requested date
    select it.ItemName, max(it.DateCreated) as MaxDate
    from InventoryTable it
    where it.DateCreated <= @YourDataParam
    group by it.ItemName
)
select it.ItemName, it.DateCreated, it.quantity as qty
from cteMaxDate c
inner join InventoryTable it
    on c.ItemName = it.ItemName
    and c.MaxDate = it.DateCreated
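The query picks, for each item, the most recent row on or before the requested date. The same selection is sketched below in Python on the question's sample rows. `inventory_as_of` is a hypothetical name, and unlike the SQL join, this sketch keeps only one row if an item has two rows on the same latest date.

```python
# (InventoryTableID, DateCreated, quantity, ItemName) from the question
inventory = [
    (1, "2010-02-04", 12, "abc"), (2, "2010-03-10", 4, "abc"),
    (3, "2010-03-13", 5, "xyz"), (4, "2010-03-13", 19, "def"),
    (5, "2010-03-17", 15, "abc"), (6, "2010-03-29", 15, "abc"),
    (7, "2010-04-01", 22, "xyz"), (8, "2010-04-13", 5, "abc"),
    (9, "2010-04-15", 6, "def"),
]


def inventory_as_of(rows, as_of):
    """Return {ItemName: (DateCreated, quantity)} for the latest row on or before as_of."""
    latest = {}
    for _, created, qty, item in rows:
        # ISO date strings compare in chronological order.
        if created <= as_of and (item not in latest or created > latest[item][0]):
            latest[item] = (created, qty)
    return latest


print(inventory_as_of(inventory, "2010-04-01"))
print(inventory_as_of(inventory, "2010-04-30"))
```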