Return product if there is no match in other table [duplicate]

I have two tables:
Product_Table
ProductID Name Date
1 ABC 2020-02-14
2 XYZ 2020-03-05
Productbreak_Table
BreakID Product_id Begin End
34 1 2020-01-01 2020-01-30
35 1 2020-02-01 2020-02-20
36 2 2020-01-15 2020-01-31
37 2 2020-02-15 2020-03-01
My goal is to get only the products whose Date does not fall between the Begin and End dates of any of their rows in Productbreak_Table.
Result should be:
ProductID Name
2 XYZ

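You would use NOT EXISTS:
select p.ProductID, p.Name
from Product_Table p
where not exists (select 1
                  from Productbreak_Table pb
                  where pb.Product_id = p.ProductID and
                        -- Begin and End are reserved words in many databases; quote them as your DBMS requires
                        p.Date between pb."Begin" and pb."End"
                 );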
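To try the NOT EXISTS query without a database server, the sketch below loads the sample rows into an in-memory SQLite database using Python's standard sqlite3 module. It assumes the dates are stored as ISO 8601 text (which compares correctly with BETWEEN) and quotes Begin and End because they are SQLite keywords. The function name is illustrative only.

```python
import sqlite3


def products_outside_breaks():
    """Run the NOT EXISTS query on the question's sample data (SQLite in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product_Table (ProductID INTEGER, Name TEXT, Date TEXT);
        CREATE TABLE Productbreak_Table (BreakID INTEGER, Product_id INTEGER,
                                         "Begin" TEXT, "End" TEXT);
        INSERT INTO Product_Table VALUES (1, 'ABC', '2020-02-14'), (2, 'XYZ', '2020-03-05');
        INSERT INTO Productbreak_Table VALUES
            (34, 1, '2020-01-01', '2020-01-30'),
            (35, 1, '2020-02-01', '2020-02-20'),
            (36, 2, '2020-01-15', '2020-01-31'),
            (37, 2, '2020-02-15', '2020-03-01');
    """)
    rows = conn.execute("""
        SELECT p.ProductID, p.Name
        FROM Product_Table p
        WHERE NOT EXISTS (SELECT 1
                          FROM Productbreak_Table pb
                          WHERE pb.Product_id = p.ProductID
                            AND p.Date BETWEEN pb."Begin" AND pb."End")
        ORDER BY p.ProductID
    """).fetchall()
    conn.close()
    return rows


print(products_outside_breaks())  # [(2, 'XYZ')]
```

Product 1 is excluded because 2020-02-14 falls inside break 35; product 2's date falls after its last break ends.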

Related

How to select max date from table for distinct values [duplicate]

I have a table that looks like this:
date account asset amount
01-01-2022 1 A 12
01-01-2022 1 B 100
02-01-2022 1 A 14
02-01-2022 1 B 98
01-01-2022 2 A 15
01-01-2022 2 C 230
02-01-2022 2 A 13
02-01-2022 2 B 223
03-01-2022 2 A 17
03-01-2022 2 B 237
I want to be able to get the last values (i.e. max date) for each account. So the result should look like this:
date account asset amount
02-01-2022 1 A 14
02-01-2022 1 B 98
03-01-2022 2 A 17
03-01-2022 2 B 237
How can this be done in SQL?
EDIT: Notice that the max dates for the different accounts are not the same.
You can do it by first selecting the max date for each account and then joining that result back to the table on both account and date, as in the following query:
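SELECT
    tab.*
FROM
    (
        -- latest date per account; assumes date is a DATE column
        -- (MAX over 'DD-MM-YYYY' strings would not sort chronologically)
        SELECT
            MAX(date) AS date,
            account
        FROM
            tab
        GROUP BY
            account
    ) max_date_per_account
INNER JOIN
    tab
ON
    tab.date = max_date_per_account.date
    AND tab.account = max_date_per_account.account;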
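A quick way to check the join is an in-memory SQLite database via Python's sqlite3 module. This sketch assumes the question's dates are day-month-year and stores them as ISO 8601 text ('2022-01-02'), because MAX over 'DD-MM-YYYY' strings would not sort chronologically in general. The ROWS constant and function name are illustrative.

```python
import sqlite3

# Sample rows with dates rewritten as ISO text (assumption: source is DD-MM-YYYY)
ROWS = [
    ("2022-01-01", 1, "A", 12), ("2022-01-01", 1, "B", 100),
    ("2022-01-02", 1, "A", 14), ("2022-01-02", 1, "B", 98),
    ("2022-01-01", 2, "A", 15), ("2022-01-01", 2, "C", 230),
    ("2022-01-02", 2, "A", 13), ("2022-01-02", 2, "B", 223),
    ("2022-01-03", 2, "A", 17), ("2022-01-03", 2, "B", 237),
]


def latest_rows_per_account():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (date TEXT, account INTEGER, asset TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute("""
        SELECT tab.date, tab.account, tab.asset, tab.amount
        FROM (SELECT MAX(date) AS date, account
              FROM tab
              GROUP BY account) max_date_per_account
        INNER JOIN tab
            ON tab.date = max_date_per_account.date
           AND tab.account = max_date_per_account.account
        ORDER BY tab.account, tab.asset
    """).fetchall()
    conn.close()
    return rows


for row in latest_rows_per_account():
    print(row)
```

Because each account gets its own max date, account 1 returns its 2022-01-02 rows while account 2 returns its 2022-01-03 rows, matching the EDIT in the question.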

Need help joining incremental data to a fact table in an incremental manner

TableA
ID Counter Value
1 1 10
1 2 28
1 3 34
1 4 22
1 5 80
2 1 15
2 2 50
2 3 39
2 4 33
2 5 99
TableB
StartDate EndDate
2020-01-01 2020-01-11
2020-01-02 2020-01-12
2020-01-03 2020-01-13
2020-01-04 2020-01-14
2020-01-05 2020-01-15
2020-01-06 2020-01-16
TableC (output)
ID Counter StartDate EndDate Val
1 1 2020-01-01 2020-01-11 10
2 1 2020-01-01 2020-01-11 15
1 2 2020-01-02 2020-01-12 28
2 2 2020-01-02 2020-01-12 50
1 3 2020-01-03 2020-01-13 34
2 3 2020-01-03 2020-01-13 39
1 4 2020-01-04 2020-01-14 22
2 4 2020-01-04 2020-01-14 33
1 5 2020-01-05 2020-01-15 80
2 5 2020-01-05 2020-01-15 99
1 1 2020-01-06 2020-01-16 10
2 1 2020-01-06 2020-01-16 15
I am attempting to write SQL that creates TableC. TableC takes the rows of TableB in chronological order and, for each ID in TableA, assigns the next Counter in the sequence to that Start/End date combination; after the last Counter, it starts back at 1.
Is something like this even possible with SQL?
Yes this is possible. Try to do the following:
Calculate the maximum value of Counter in TableA using SELECT MAX(Counter) ... into max_counter.
Add a row number to each row in TableB, in chronological order, so that each row can be matched to a Counter value, using SELECT ROW_NUMBER() OVER(ORDER BY StartDate) ....
Relate the row number in TableB to Counter in TableA like this: ... FROM TableB JOIN TableA ON COALESCE(NULLIF(TableB.row_number % max_counter, 0), max_counter) = TableA.Counter.
Then combine all these queries into one query using CTEs (common table expressions), as the official documentation shows.
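The steps above can be combined into one statement with CTEs. Here is a sketch run against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The CTE names max_counter and numbered, the column name rn, and the function name are illustrative choices, not from the answer.

```python
import sqlite3


def build_table_c():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (ID INTEGER, Counter INTEGER, Value INTEGER);
        CREATE TABLE TableB (StartDate TEXT, EndDate TEXT);
        INSERT INTO TableA VALUES
            (1, 1, 10), (1, 2, 28), (1, 3, 34), (1, 4, 22), (1, 5, 80),
            (2, 1, 15), (2, 2, 50), (2, 3, 39), (2, 4, 33), (2, 5, 99);
        INSERT INTO TableB VALUES
            ('2020-01-01', '2020-01-11'), ('2020-01-02', '2020-01-12'),
            ('2020-01-03', '2020-01-13'), ('2020-01-04', '2020-01-14'),
            ('2020-01-05', '2020-01-15'), ('2020-01-06', '2020-01-16');
    """)
    rows = conn.execute("""
        WITH max_counter AS (            -- step 1: highest Counter
            SELECT MAX(Counter) AS n FROM TableA
        ),
        numbered AS (                    -- step 2: number TableB rows by date
            SELECT StartDate, EndDate,
                   ROW_NUMBER() OVER (ORDER BY StartDate) AS rn
            FROM TableB
        )
        SELECT a.ID, a.Counter, b.StartDate, b.EndDate, a.Value AS Val
        FROM numbered b
        CROSS JOIN max_counter m
        JOIN TableA a                    -- step 3: wrap rn onto 1..n
          ON COALESCE(NULLIF(b.rn % m.n, 0), m.n) = a.Counter
        ORDER BY b.StartDate, a.ID
    """).fetchall()
    conn.close()
    return rows


for row in build_table_c():
    print(row)
```

The sixth TableB row gets rn = 6, and 6 % 5 = 1, so it wraps back to Counter 1, as in TableC.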
Consider the approach below:
select id, counter, StartDate, EndDate, value
from tableA
join (
select *, mod(row_number() over(order by StartDate) - 1, 5) + 1 as counter -- 5 = number of Counter values per ID in the sample
from tableB
) b
using (counter)
If applied to the sample data in your question, the output matches TableC (row order aside).
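Both answers map a 1-based row number onto the repeating Counter range, with different expressions. A quick Python check (an illustration, not code from either answer) shows they agree for every row number of 1 or more, which is all ROW_NUMBER() produces. The second answer's literal 5 would need to become MAX(Counter) if the number of counters can change.

```python
def counter_mod(rn: int, n: int) -> int:
    """mod(rn - 1, n) + 1, as in the second answer (n hard-coded to 5 there)."""
    return (rn - 1) % n + 1


def counter_nullif(rn: int, n: int) -> int:
    """COALESCE(NULLIF(rn % n, 0), n), the first answer's join condition."""
    remainder = rn % n
    return n if remainder == 0 else remainder


# Six TableB rows cycling through five counters, as in the sample data
print([counter_mod(rn, 5) for rn in range(1, 7)])  # [1, 2, 3, 4, 5, 1]
```

The `- 1 ... + 1` form avoids the NULLIF/COALESCE pair by shifting to a 0-based remainder and back.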

Reverse track forced records relationships based on user-defined tagging

I have this table where the tagging [Tag_To] is updated by an algorithm based on Year and Period of coverage. My current task (in question) is to update the Status given the Year.
ID Year Method Period_From Period_To SeqNo Tag_To Status
-----------------------------------------------------------------------------------
10 2019 A 2019-01-01 2019-12-31 1
11 2019 B 2019-01-01 2019-06-30 2 1
12 2019 B 2019-07-01 2019-12-31 3 1
13 2019 C 2019-01-01 2019-06-30 4 2
14 2020 A 2020-01-01 2020-12-31 1
15 2020 B 2020-01-01 2020-06-30 2 1
16 2020 B 2020-07-01 2020-12-31 3 1
17 2020 C 2020-01-01 2020-12-31 4 2,3
18 2021 A 2021-01-01 2021-12-31 1
19 2021 B 2021-01-01 2021-12-31 2 1
20 2021 C 2021-07-01 2021-12-31 3 2
The SeqNo is assigned per Year, and Tag_To holds the SeqNo(s) of the parent row(s) in the same Year, based on period of coverage.
11 and 12 are tagged to 10 since B follows A and their periods fall within 10's period of coverage.
13 is tagged to 11 since C follows B and the period...
15 and 16 are tagged to 14.
Also note that 17 is tagged to 15 and 16 (2,3) because 17's coverage spans the two periods of 15 and 16 combined
and so on...
The objective is to update the Status by Year such that a path is considered Closed if it already has Methods A, B and C (there are actually more methods, but I have simplified). Status should be Open for paths that haven't completed all the methods.
From the example above, there are 5 paths:
10(A)-->11(B)-->13(C) = Closed
10(A)-->12(B)-->??? = Open
14(A)-->15(B)-->17(C) = Closed
14(A)-->16(B)-->17(C) = Closed
18(A)-->19(B)-->20(C) = Closed
Therefore the status update should be:
ID Year Method Period_From Period_To SeqNo Tag_To Status
-----------------------------------------------------------------------------------
10 2019 A 2019-01-01 2019-12-31 1 Open
11 2019 B 2019-01-01 2019-06-30 2 1 Closed
12 2019 B 2019-07-01 2019-12-31 3 1 Open
13 2019 C 2019-01-01 2019-06-30 4 2 Closed
14 2020 A 2020-01-01 2020-12-31 1 Closed
15 2020 B 2020-01-01 2020-06-30 2 1 Closed
16 2020 B 2020-07-01 2020-12-31 3 1 Closed
17 2020 C 2020-01-01 2020-12-31 4 2,3 Closed
18 2021 A 2021-01-01 2021-12-31 1 Closed
19 2021 B 2021-01-01 2021-12-31 2 1 Closed
20 2021 C 2021-07-01 2021-12-31 3 2 Closed
I hope I have explained everything clearly. I would really appreciate any help.
Just to update viewers: I managed to solve this on my own. The solution is very non-dynamic and quite inefficient, but it did the job for me. Here's what I did:
-- YourTable is a placeholder name: TABLE is a reserved word in T-SQL.
-- STRING_SPLIT requires SQL Server 2016 or later.
DECLARE @Year int = 2019;

-- Pass 1: a B row is Open if no C row of the same Year tags it;
-- an A row is Open if no B row of the same Year tags it.
UPDATE AValue SET
    Status =
        CASE WHEN Method = 'B'
             AND NOT EXISTS ( SELECT * FROM YourTable P INNER JOIN
                              (
                                  SELECT value AS Tag_To
                                  FROM YourTable AV
                                  CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                  WHERE AV.Method = 'C'
                                    AND AV.Year = @Year
                              ) C ON P.SeqNo = C.Tag_To
                              WHERE P.ID = AValue.ID
                            )
             THEN 'Open'
             WHEN Method = 'A'
             AND NOT EXISTS ( SELECT * FROM YourTable P INNER JOIN
                              (
                                  SELECT value AS Tag_To
                                  FROM YourTable AV
                                  CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
                                  WHERE AV.Method = 'B'
                                    AND AV.Year = @Year
                              ) C ON P.SeqNo = C.Tag_To
                              WHERE P.ID = AValue.ID
                            )
             THEN 'Open'
             ELSE 'Closed'
        END
FROM YourTable AValue
WHERE AValue.Year = @Year;

-- Pass 2: number the rows of each Method in SeqNo order (SN);
-- every row sharing an SN with an Open row is set to Open.
WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY A.Method ORDER BY A.SeqNo ASC) SN,
        A.ID,
        A.Method,
        A.SeqNo,
        A.Tag_To,
        A.Period_From,
        A.Period_To,
        A.Status
    FROM YourTable A
    LEFT JOIN
    (
        SELECT value AS Tag_To
        FROM YourTable AV
        CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
        WHERE AV.Year = @Year
    ) B ON A.SeqNo = B.Tag_To
    WHERE A.Year = @Year
),
CTE2 AS
(
    SELECT DISTINCT SN FROM CTE
    WHERE Status = 'Open'
)
UPDATE T SET
    Status = 'Open'
FROM YourTable T
INNER JOIN CTE ON T.ID = CTE.ID
INNER JOIN CTE2 ON CTE.SN = CTE2.SN;
Yeah, it's ugly but, hey, it did the job! :)
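For comparison, the path rule from the question can be sketched procedurally. The Python below is neither the poster's solution nor SQL. It rebuilds the paths per Year from Tag_To, read as parent SeqNo values as in the sample, and marks a path Closed when it contains every method in a hypothetical REQUIRED set. A row stays Open if any path through it is Open. Only paths that start at a Method A row are walked.

```python
from collections import defaultdict

REQUIRED = {"A", "B", "C"}  # assumption: the simplified method list from the question

# (ID, Year, Method, SeqNo, Tag_To) from the sample; Tag_To lists parent SeqNos
ROWS = [
    (10, 2019, "A", 1, ""), (11, 2019, "B", 2, "1"), (12, 2019, "B", 3, "1"),
    (13, 2019, "C", 4, "2"), (14, 2020, "A", 1, ""), (15, 2020, "B", 2, "1"),
    (16, 2020, "B", 3, "1"), (17, 2020, "C", 4, "2,3"), (18, 2021, "A", 1, ""),
    (19, 2021, "B", 2, "1"), (20, 2021, "C", 3, "2"),
]


def path_status(rows):
    by_key = {(year, seq): (id_, method) for id_, year, method, seq, _ in rows}
    children = defaultdict(list)
    for _, year, _, seq, tag_to in rows:
        for parent_seq in filter(None, tag_to.split(",")):
            children[(year, int(parent_seq))].append((year, seq))

    status = {}

    def walk(key, path):
        path = path + [key]
        kids = children.get(key, [])
        if not kids:  # end of a path: judge it by the methods it contains
            methods = {by_key[k][1] for k in path}
            path_state = "Closed" if REQUIRED <= methods else "Open"
            for k in path:
                id_ = by_key[k][0]
                if status.get(id_) != "Open":  # Open on any path wins
                    status[id_] = path_state
            return
        for kid in kids:
            walk(kid, path)

    for _, year, method, seq, _ in rows:
        if method == "A":
            walk((year, seq), [])
    return status


print(dict(sorted(path_status(ROWS).items())))
```

Row 10 ends up Open because the 10-12 path never reaches a C row, which matches the expected Status column.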

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id | user_name | user_joining_date
1 | Steve | 2013-01-04
2 | Adam | 2012-11-01
3 | John | 2013-05-05
4 | Tony | 2012-01-01
5 | Dan | 2010-01-01
6 | Alex | 2019-01-01
7 | Kim | 2019-01-01
bundle_dim
bundle_id | bundle_name | bundle_type | bundle_cost_per_day
101 | movies and TV | prime | 5.5
102 | TV and sports | prime | 6.5
103 | Cooking | prime | 7
104 | Sports and news | prime | 5
105 | kids movie | extra | 2
106 | kids educative | extra | 3.5
107 | spanish news | extra | 2.5
108 | Spanish TV and sports | extra | 3.5
109 | Travel | extra | 2
plans_fact
user_id | bundle_id | bundle_start_date | bundle_end_date
1 | 101 | 2019-10-10 | 2020-10-10
2 | 107 | 2020-01-15 | (null)
2 | 106 | 2020-01-15 | 2020-12-31
2 | 101 | 2020-01-15 | (null)
2 | 103 | 2020-01-15 | 2020-02-15
1 | 101 | 2020-10-11 | (null)
1 | 107 | 2019-10-10 | 2020-10-10
1 | 105 | 2019-10-10 | 2020-10-10
4 | 101 | 2021-01-01 | 2021-02-01
3 | 104 | 2020-02-17 | 2020-03-17
2 | 108 | 2020-01-15 | (null)
4 | 102 | 2021-01-01 | (null)
4 | 103 | 2021-01-01 | (null)
4 | 108 | 2021-01-01 | (null)
5 | 103 | 2020-01-15 | (null)
5 | 101 | 2020-01-15 | 2020-02-15
6 | 101 | 2021-01-01 | 2021-01-17
6 | 101 | 2021-01-20 | (null)
6 | 108 | 2021-01-01 | (null)
7 | 104 | 2020-02-17 | (null)
7 | 103 | 2020-01-17 | 2020-01-18
1 | 102 | 2020-12-11 | (null)
2 | 106 | 2021-01-01 | (null)
7 | 107 | 2020-01-15 | (null)
Note: a NULL bundle_end_date means the subscription is still active.
A user's active days for a given bundle can be calculated as bundle_end_date - bundle_start_date.
Total revenue per user can be calculated as total number of active days * bundle rate per day.
I am looking to write a query to find the revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to split each row that spans multiple years into its separate years. For each year, you also need to recalculate the start and end dates; that's what the yearParsed CTE in the code below does. I hard-code the years into the join that creates y. You will probably do it differently, but any way of getting those values will work.
After that, sum as you did before, adding the year column to the grouping.
Aside from that, all I did was move the NULL-coalescing logic into the CTE to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
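The yearParsed idea can be tried out on SQLite with Python's built-in sqlite3 module. This sketch is a variation, not the answer's exact query. It generates the years with a recursive CTE, like the year-generating script further down in this answer, and replaces getdate() with a fixed, hypothetical AS_OF date so the numbers are reproducible. It also uses half-open year ranges [1 Jan, next 1 Jan), so that the per-year day counts add up to the single DATEDIFF over the whole subscription; with '-12-31' year ends, each year boundary a subscription crosses loses one day. Only user 1's rows are loaded.

```python
import sqlite3

AS_OF = "2021-01-01"  # hypothetical stand-in for getdate(), fixed for reproducibility

SQL = """
WITH RECURSIVE years(y) AS (
    -- first year with a subscription, then one row per year up to AS_OF's year
    SELECT CAST(strftime('%Y', MIN(bundle_start_date)) AS INTEGER) FROM plans_fact
    UNION ALL
    SELECT y + 1 FROM years WHERE y < CAST(strftime('%Y', :as_of) AS INTEGER)
),
bounds AS (
    -- half-open year range [y_start, y_end)
    SELECT y, printf('%d-01-01', y) AS y_start, printf('%d-01-01', y + 1) AS y_end
    FROM years
),
clipped AS (
    -- clip each subscription to each year it overlaps; NULL end = still active
    SELECT pf.user_id, pf.bundle_id, b.y,
           MAX(pf.bundle_start_date, b.y_start) AS s,
           MIN(COALESCE(pf.bundle_end_date, :as_of), b.y_end) AS e
    FROM plans_fact pf
    JOIN bounds b
      ON pf.bundle_start_date < b.y_end
     AND COALESCE(pf.bundle_end_date, :as_of) > b.y_start
)
SELECT c.user_id, c.y,
       SUM((julianday(c.e) - julianday(c.s)) * bd.bundle_cost_per_day) AS revenue
FROM clipped c
JOIN bundle_dim bd ON bd.bundle_id = c.bundle_id
GROUP BY c.user_id, c.y
ORDER BY c.user_id, c.y
"""


def revenue_per_user_year(as_of=AS_OF):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bundle_dim (bundle_id INTEGER, bundle_cost_per_day REAL);
        CREATE TABLE plans_fact (user_id INTEGER, bundle_id INTEGER,
                                 bundle_start_date TEXT, bundle_end_date TEXT);
        INSERT INTO bundle_dim VALUES (101, 5.5), (102, 6.5), (105, 2), (107, 2.5);
        -- user 1's rows from the question's plans_fact
        INSERT INTO plans_fact VALUES
            (1, 101, '2019-10-10', '2020-10-10'),
            (1, 101, '2020-10-11', NULL),
            (1, 107, '2019-10-10', '2020-10-10'),
            (1, 105, '2019-10-10', '2020-10-10'),
            (1, 102, '2020-12-11', NULL);
    """)
    rows = conn.execute(SQL, {"as_of": as_of}).fetchall()
    conn.close()
    return rows


print(revenue_per_user_year())  # [(1, 2019, 830.0), (1, 2020, 3417.5)]
```

The two yearly amounts add up to 4247.5, the same total the overall per-user query gives for user 1 with the same AS_OF date.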
Now, if this is common, you should probably create a base table for your 'year' table. If it's not common, but you don't want to keep coming back to hard-code the year information into the y table for this report, you can do this:
declare @yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
-- start at the year of the earliest subscription...
select year = year(min(pf.bundle_start_date))
from plans_fact pf
union all
-- ...and add one year at a time up to the current year
select year + 1
from y
where year < year(getdate())
)
insert @yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
This creates the appropriate years for you. Still, you can see why a base table may be preferable if you have this or a similar need often.

How to group set of records in log table based on status changes

I have a large activity table that contains all actions taken on a case. Some of these actions change the status of the case. Actions that do not change the status have a NULL status and should essentially take the status of the previous non-null record.
Sample:
caseID | datetime | action | status
1 1/1/2020 a OPEN
1 1/2/2020 B NULL
1 1/3/2020 G CLOSED
1 1/5/2020 T REOPEN
1 1/6/2020 H NULL
1 1/7/2020 H NULL
1 1/9/2020 G CLOSED
1 1/10/2020 J CLOSED
1 1/15/2020 P CLOSED
The output I am trying to achieve groups the rows and attaches a "session" number to each set of dates running from OPEN or REOPEN to CLOSED. If dateto is NULL, that row is the current status:
CaseID | status | datefrom | dateto | session
1 OPEN 1/1/2020 1/3/2020 1
1 CLOSED 1/3/2020 1/5/2020 1
1 REOPEN 1/5/2020 1/9/2020 2
1 CLOSED 1/9/2020 NULL 2
I am using SQL Server 2014 Enterprise Edition and have been racking my brain on this for days. Any help would be much appreciated. I have found some hints on Stack Overflow, but nothing that fully produces the needed output.
EDIT: Here is a better example of the data:
caseID | datetime | action | status
1 1/1/2020 a OPEN
1 1/2/2020 B REOPEN
1 1/3/2020 G CLOSED
1 1/5/2020 T REOPEN
1 1/6/2020 H NULL
1 1/7/2020 H NULL
1 1/9/2020 G CLOSED
1 1/10/2020 J CLOSED
1 1/15/2020 P CLOSED
1 1/16/2020 P WORKABLE
1 1/17/2020 P NULL
1 1/18/2020 P WORKABLE
1 1/19/2020 P WORKABLE
1 1/20/2020 P CLOSED
1 2/1/2020 o NULL
EXPECTED OUTPUT:
CaseID | status | datefrom | dateto | session
1 OPEN 1/1/2020 1/3/2020 1
1 CLOSED 1/3/2020 1/5/2020 1
1 REOPEN 1/5/2020 1/9/2020 2
1 CLOSED 1/9/2020 1/16/2020 2
1 WORKABLE 1/16/2020 1/20/2020 3
1 CLOSED 1/20/2020 NULL 3
This answers the original version of the question.
I'm not sure if this meets all your requirements, but it produces the results you specify:
Filter out the NULL status values.
Assign a session based on the number of OPEN or REOPEN rows up to and including each row.
Aggregate by case, session and status, using LEAD() to get each group's end date.
So:
select caseid, session, status, min(datetime) as datefrom,
       lead(min(datetime)) over (partition by caseid order by min(datetime)) as dateto
from (select t.*,
             sum(case when status in ('OPEN', 'REOPEN') then 1 else 0 end) over (partition by caseid order by datetime) as session
      from t
      where status is not null
     ) t
group by caseid, session, status
order by caseid, min(datetime);
Here is a db<>fiddle illustrating that this interpretation works for the data you have provided.
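The answer's query can also be checked on SQLite (3.25+ for window functions) with Python's sqlite3 module. The sketch below runs it on the original sample. It also includes a hypothetical gaps-and-islands variant for the edited sample, which is not part of the answer. The variant starts a new group whenever a row flips between CLOSED and not CLOSED, starts a new session at each non-CLOSED group, and labels each group with the status of its first row. Dates are stored as ISO text, the unused action column is left out, and caseid is fixed to 1.

```python
import sqlite3

# Original sample (caseid 1), dates as ISO text
ORIGINAL = [
    ("2020-01-01", "OPEN"), ("2020-01-02", None), ("2020-01-03", "CLOSED"),
    ("2020-01-05", "REOPEN"), ("2020-01-06", None), ("2020-01-07", None),
    ("2020-01-09", "CLOSED"), ("2020-01-10", "CLOSED"), ("2020-01-15", "CLOSED"),
]
# Edited sample: 1/2 becomes REOPEN and more rows follow
EDITED = ORIGINAL[:1] + [("2020-01-02", "REOPEN")] + ORIGINAL[2:] + [
    ("2020-01-16", "WORKABLE"), ("2020-01-17", None), ("2020-01-18", "WORKABLE"),
    ("2020-01-19", "WORKABLE"), ("2020-01-20", "CLOSED"), ("2020-02-01", None),
]

ANSWER_SQL = """
select caseid, session, status, min(datetime) as datefrom,
       lead(min(datetime)) over (partition by caseid order by min(datetime)) as dateto
from (select t.*,
             sum(case when status in ('OPEN', 'REOPEN') then 1 else 0 end)
                 over (partition by caseid order by datetime) as session
      from t
      where status is not null
     ) t
group by caseid, session, status
order by caseid, min(datetime)
"""

# Hypothetical variant: islands of CLOSED vs. not-CLOSED rows
ISLANDS_SQL = """
WITH nn AS (
    SELECT caseid, datetime, status,
           CASE WHEN status = 'CLOSED' THEN 1 ELSE 0 END AS is_closed
    FROM t WHERE status IS NOT NULL
),
flagged AS (
    SELECT nn.*,
           CASE WHEN LAG(is_closed) OVER (PARTITION BY caseid ORDER BY datetime) IS NULL
                  OR LAG(is_closed) OVER (PARTITION BY caseid ORDER BY datetime) <> is_closed
                THEN 1 ELSE 0 END AS new_island
    FROM nn
),
islands AS (
    SELECT flagged.*,
           SUM(new_island) OVER (PARTITION BY caseid ORDER BY datetime) AS island,
           SUM(CASE WHEN new_island = 1 AND is_closed = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY caseid ORDER BY datetime) AS session
    FROM flagged
)
SELECT caseid,
       MAX(CASE WHEN new_island = 1 THEN status END) AS status,
       MIN(datetime) AS datefrom,
       LEAD(MIN(datetime)) OVER (PARTITION BY caseid ORDER BY MIN(datetime)) AS dateto,
       session
FROM islands
GROUP BY caseid, island, session
ORDER BY caseid, MIN(datetime)
"""


def run(sql, events):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (caseid INTEGER, datetime TEXT, status TEXT)")
    conn.executemany("INSERT INTO t VALUES (1, ?, ?)", events)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in run(ISLANDS_SQL, EDITED):
    print(row)
```

On the edited data, the answer's query would start a second session at the 1/2 REOPEN, which is why the variant keys sessions on the CLOSED/not-CLOSED transitions instead.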