Selecting records from slowly changing table with a set of dates

Selecting records from slowly changing table with a set of dates - sql

I have a slowly changing table,a new row is created each time any of the source fields are changed. Some metadata is added to show when that version was valid. This is a simplified example(dates are dd/mm/yyyy format) that doesn't show the fields which have changed.
Startdate
Enddate
Currentrecord
unique id
serial_number
15/12/2020
31/12/2020
0
1
2345
15/12/2020
8/3/2021
0
2
1234
19/9/2020
15/2/2021
0
3
2345
15/12/2020
8/3/2021
0
4
3456
9/3/2021
10/3/2021
0
5
3456
16/2/2021
10/3/2021
0
6
2345
9/3/2021
26/3/2021
0
7
1234
27/3/2021
2/5/2021
0
8
1234
11/3/2021
17/5/2021
0
9
3456
3/3/2021
27/4/2021
0
10
4567
20/1/2021
7/4/2021
0
11
5678
3/5/2021
30/6/2021
1
12
1234
25/5/2021
31/5/2021
0
13
2345
8/4/2021
22/5/2021
0
14
5678
1/6/2021
26/6/2021
0
15
2345
18/5/2021
3/6/2021
0
16
3456
27/6/2021
2/8/2021
0
17
2345
28/4/2021
28/6/2021
0
18
4567
23/5/2021
6/9/2021
0
19
5678
4/6/2021
28/6/2021
0
20
3456
29/6/2021
25/7/2021
0
21
3456
3/8/2021
31/12/9999
1
22
2345
26/7/2021
31/12/9999
1
23
3456
15/10/2021
31/12/9999
1
24
4567
7/9/2021
1/11/2021
0
25
5678
22/9/2021
10/11/2021
0
26
6789
2/11/2021
16/11/2021
0
27
5678
17/11/2021
21/11/2021
0
28
5678
15/7/2021
31/12/9999
1
29
7891
22/11/2021
31/12/9999
1
30
5678
26/11/2021
31/12/9999
1
31
6789
15/6/2021
31/12/9999
1
32
8912
There is only one record for each serial_number for any given point in time (i.e. the dates ranges will not overlap for identical serial_numbers) but there might be gaps between episodes for a some serial_numbers (representing something leaving and returning after a gap in service).
I want to supply an arbitrary list of datetimes, say midnight on 01/01/2021, 15/03/2021, 27/05/2021. 23/10/2021. I want to return a set of records, containing every record which was in effect on each of the dates, with each row labelled with the date it was selected by. So the above example should return this.
date
unique id
serial_number
1/1/2021
2
1234
1/1/2021
3
2345
1/1/2021
4
3456
15/3/2021
7
1234
15/3/2021
9
3456
15/3/2021
10
4567
15/3/2021
11
5678
27/5/2021
12
1234
27/5/2021
13
2345
27/5/2021
16
3456
27/5/2021
18
4567
27/5/2021
19
5678
23/10/2021
22
2345
23/10/2021
23
3456
23/10/2021
24
4567
23/10/2021
25
5678
23/10/2021
26
6789
23/10/2021
29
7891
23/10/2021
32
8912
I can see how to do this with a cursor, stepping through each date putting them into a variable and using something like
select #date, [unique id], serial_number
from example
where #date between start_date and end_date
to get the rows.
I can’t work out a pattern that would do it in a set based approach. My preferred SQL version is TSQL. Sorry as this is almost certainly a repeat, but I can't find a form of words that hits a worked example.

You can use a temporary table to accomplish this.
CREATE TABLE #RequestedDates([Date] DATE)
You insert your dates you want into a temporary table.
INSERT INTO #RequestedDates([Date])
VALUES ('2021-01-01'), ('2021-03-15') /*Other dates*/
And then you join with the temporary table and use the between clause to get the valid results.
SELECT rd.[Date]
, t.UniqueId
, t.SerialNumber
FROM MyTable t
INNER JOIN #RequestedDates rd on rd.[Date] BETWEEN t.StartDate AND t.EndDate
ORDER BY rd.[Date]
, t.UniqueId
, t.SerialNumber

You can join to VALUES with the dates you need.
Then join the datetimes on the range.
SELECT
datetimes.dt as [date]
, t.[unique id]
, t.serial_number
FROM example t
JOIN (VALUES
(cast('2021-01-01 00:00:00' as datetime)),
('2021-03-15 00:00:00'),
('2021-05-27 00:00:00'),
('2021-10-23 00:00:00')
) datetimes(dt)
ON datetimes.dt >= t.start_date
AND datetimes.dt <= t.end_date
ORDER BY datetimes.dt, t.[unique id], t.serial_number

Related

Access SQL Query to Count Unique Occurrences of One Field Matching Multiple Parameters/Rows, Some Identical

Struggling with ms-access's flavor of SQL queries still, though I've made some progress (thanks to y'all). I have an event log table like this:
Logs Table
logID (auto#)
modID (str)
relID (str)
DateTime (date)
TxType (short)
1
1234
22.3
10/1/22 0800
6
2
1234
22.3
10/1/22 0900
7
3
1234
22.3
10/1/22 1000
13
4
1234
22.3
10/1/22 1100
15
5
4321
22.3
10/1/22 0830
1
6
4321
22.3
10/1/22 0930
13
7
4321
22.3
10/1/22 1030
15
8
4321
22.3
10/1/22 1130
13
9
1234
23.1
11/1/22 0800
1
10
1234
23.1
11/1/22 0900
15
11
1234
23.1
11/1/22 1000
13
12
1234
23.1
11/1/22 1100
15
13
4321
23.1
11/1/22 0830
13
14
4321
23.1
11/1/22 0930
7
15
4321
23.1
11/1/22 1030
13
16
4321
23.1
11/1/22 1130
15
What I need to do is:
filter the table by relID, then
count the number of modID's that have a 15 txType as the last/most recent chronological event in their rows.
So ideally I'd filter e.g. by relID=23.1 and get these results (but not logID # 10 for example) and then count them:
logID (auto#)
modID (str)
relID (str)
DateTime (date)
TxType (short)
12
1234
23.1
11/1/22 1100
15
16
4321
23.1
11/1/22 1130
15
As part of another function I have been able to count any modID's having a single txType successfully using
SELECT COUNT(*)
FROM (
SELECT DISTINCT Logs.modID, Logs.relID
FROM Logs
WHERE ((Logs.relID='23.1') AND ((Logs.TxType=13)))
);
Another stackoverflow user (exception - thanks!) showed me how to get the last event type for a given modID, relID combination using
SELECT TOP 1 TxType
FROM Logs
WHERE (((Logs.modID=[EnterModID])) AND ((Logs.relID=[EnterRelID])))
ORDER BY DateTime DESC;
But I'm having trouble combining these two. I know I can combine COUNT and GROUP BY but Access treats GROUP BY very particularly, and I'm not sure how to use SELECT TOP to get the latest events for each modID rather than just the latest events in the table, period.

This should give you the logID from the row with the latest DateTime for each combination of modIDand your target relID:
PARAMETERS which_relID Text(255);
SELECT DISTINCT
(
SELECT TOP 1 logID
FROM Logs
WHERE modID=l.modID AND relID=l.relID
ORDER BY [DateTime] DESC
) AS latest_modID
FROM Logs AS l
WHERE l.relID=[which_relID]
Use it as a subquery which you INNER JOIN to your Logs table. Note the subquery evaluates rows regardless of TxType. So have the parent query select only rows whose TxType = 15
PARAMETERS which_relID Text(255);
SELECT l2.*
FROM
Logs AS l2
INNER JOIN
(
SELECT DISTINCT
(
SELECT TOP 1 logID
FROM Logs
WHERE modID=l.modID AND relID=l.relID
ORDER BY [DateTime] DESC
) AS latest_modID
FROM Logs AS l
WHERE l.relID=[which_relID]
) AS sub
ON l2.logID=sub.latest_modID
WHERE l2.TxType=15;
Note I moved the PARAMETERS clause into the parent query. But you can eliminate it altogether if you believe it's causing trouble.
DateTime is a reserved word. I enclosed it in square brackets to ensure Access understands we mean the name of an object.
Using your sample data, I get these 2 rows when I supply 23.1 for the query parameter:
logID
modID
relID
DateTime
TxType
12
1234
23.1
11/1/2022 11:00:00 AM
15
16
4321
23.1
11/1/2022 11:30:00 AM
15
I get a single row with 22.3 for the parameter:
logID
modID
relID
DateTime
TxType
4
1234
22.3
10/1/2022 11:00:00 AM
15

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id
user_name
user_joining_date
1
Steve
2013-01-04
2
Adam
2012-11-01
3
John
2013-05-05
4
Tony
2012-01-01
5
Dan
2010-01-01
6
Alex
2019-01-01
7
Kim
2019-01-01
bundle_dim
bundle_id
bundle_name
bundle_type
bundle_cost_per_day
101
movies and TV
prime
5.5
102
TV and sports
prime
6.5
103
Cooking
prime
7
104
Sports and news
prime
5
105
kids movie
extra
2
106
kids educative
extra
3.5
107
spanish news
extra
2.5
108
Spanish TV and sports
extra
3.5
109
Travel
extra
2
plans_fact
user_id
bundle_id
bundle_start_date
bundle_end_date
1
101
2019-10-10
2020-10-10
2
107
2020-01-15
(null)
2
106
2020-01-15
2020-12-31
2
101
2020-01-15
(null)
2
103
2020-01-15
2020-02-15
1
101
2020-10-11
(null)
1
107
2019-10-10
2020-10-10
1
105
2019-10-10
2020-10-10
4
101
2021-01-01
2021-02-01
3
104
2020-02-17
2020-03-17
2
108
2020-01-15
(null)
4
102
2021-01-01
(null)
4
103
2021-01-01
(null)
4
108
2021-01-01
(null)
5
103
2020-01-15
(null)
5
101
2020-01-15
2020-02-15
6
101
2021-01-01
2021-01-17
6
101
2021-01-20
(null)
6
108
2021-01-01
(null)
7
104
2020-02-17
(null)
7
103
2020-01-17
2020-01-18
1
102
2020-12-11
(null)
2
106
2021-01-01
(null)
7
107
2020-01-15
(null)
note: NULL bundle_end_date refers to active subscription.
user active days can be calculated as: bundle_end_date - bundle_start_date (for the given bundle)
total revenue per user could be calculated as : total no. of active days * bundle rate per day
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.price_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;

You need a 'year' table to help parse out each multi-year spanning row into it's seperate years. For each year, you need to also recalculate the start and end dates. That's what I do in the yearParsed cte in the code below. I hard code the years into the join statement that creates y. You probably will do it different but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is common, you should probably create a base-table for your 'year' table. But if it's not common, but for this report you don't want to have to keep coming back to hard-code the year information into the y table, you can do this:
declare #yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
select year = year(min(pf.bundle_start_date))
from #plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert #yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.

How to write the query to make report by month in sql

I have the receiving and sending data for whole year. so i want to built the monthly report base on that data with the rule is Fisrt in first out. It means is the first receiving will be sent out first ...
DECLARE #ReceivingTbl AS TABLE(Id INT,ProId int, RecQty INT,ReceivingDate DateTime)
INSERT INTO #ReceivingTbl
VALUES (1,1001,210,'2019-03-12'),
(2,1001,315,'2019-06-15'),
(3,2001,500,'2019-04-01'),
(4,2001,10,'2019-06-15'),
(5,1001,105,'2019-07-10')
DECLARE #SendTbl AS TABLE(Id INT,ProId int, SentQty INT,SendMonth int)
INSERT INTO #SendTbl
VALUES (1,1001,50,3),
(2,1001,100,4),
(3,1001,80,5),
(4,1001,80,6),
(5,2001,200,6)
SELECT * FROM #ReceivingTbl ORDER BY ProId,ReceivingDate
SELECT * FROM #SendTbl ORDER BY ProId,SendMonth
Id ProId RecQty ReceivingDate
1 1001 210 2019-03-12
2 1001 315 2019-06-15
5 1001 105 2019-07-10
3 2001 500 2019-04-01
4 2001 10 2019-06-15
Id ProId SentQty SendMonth
1 1001 50 3
2 1001 100 4
3 1001 80 5
4 1001 80 6
5 2001 200 6
--- And the below is what i want:
Id ProId RecQty ReceivingDate ... Mar Apr May Jun
1 1001 210 2019-03-12 ... 50 100 60 0
2 1001 315 2019-06-15 ... 0 0 20 80
5 1001 105 2019-07-10 ... 0 0 0 0
3 2001 500 2019-04-01 ... 0 0 0 200
4 2001 10 2019-06-15 ... 0 0 0 0
Thanks!

Your question is not clear to me.
If you want to purely use the FIFO approach, therefore ignore any data the table contains, you necessarely need to order by ID, which in your example you are providing, and looks like it is in order of insert.
The first line inserted should be also the first line appearing in the select (FIFO), in order to do so you have to use:
ORDER BY Id ASC
Which will place the lower value of the ID first (1, 2, 3, ...)
To me though, this doesn't make much sense, so pay attention to the meaning o the data you actually have and leverage dates like ReceivingDate, and order by that, maybe even filtering by month of the date, below an example for January data:
WHERE MONTH(ReceivingDate) = 1

SQL query to allow NULL for a highest_bid column when no bid has been placed yet

For school I need to make a function on an auction website. For this I need to join a couple of tables in a VIEW. This worked just fine, until I needed to add a filter for price range. Seems easy enough but the query result needs to allow a NULL when no bid has been placed.
The Statement for the View:
SELECT I.itemID, I.title, I.startPrice, B.highestBid, Cfi.category, I.endDate
FROM dbo.Items AS I INNER JOIN dbo.category_for_item AS Cfi ON V.itemID = Vir.itemID
LEFT OUTER JOIN dbo.Bid AS B ON V.itemID = B.itemID
This would get the following Table:
itemID title startPrice highestBid category endDate
1 1234 Alfa 25 26 PC 2018-09-22
2 1234 Alfa 25 NULL PC 2018-09-22
3 5678 Bravo 9 20 Console 2018-07-03
4 5678 Bravo 9 15 Console 2018-07-03
5 5678 Bravo 9 NULL Console 2018-07-03
6 9876 Charlie 84 100 Stamps 2018-06-14
7 9876 Charlie 84 90 Stamps 2018-06-14
8 9876 Charlie 84 85 Stamps 2018-06-14
9 9876 Charlie 84 NULL Stamps 2018-06-14
10 1470 Delta 98 100 Fashion 2018-06-15
11 1470 Delta 98 99 Fashion 2018-06-15
12 1470 Delta 98 NULL Fashion 2018-06-15
13 9631 Echo 56 65 Cars 2018-06-25
14 9631 Echo 56 NULL Cars 2018-06-25
15 7856 Foxtrot 98 NULL Dolls 2018-12-26
After looking around for answers I got a query for joining the VIEW on itself with only showing the highest bid instead of all bids:
SELECT VW.itemID, VW.title, VW.startPrice, VW.highestBid, VW.category, VW.endDate
FROM VW_SEARCH AS VW
INNER JOIN (SELECT itemID, MAX(highestBid) AS MaxBid
FROM VW_SEARCH
GROUP BY itemID) VJ
ON VW.itemID = VJ.itemID AND VW.highestBid = VJ.MaxBid
This gave the next results:
itemID title startPrice highestBid category endDate
1 1234 Alfa 25 26 PC 2018-09-22
2 5678 Bravo 9 20 Console 2018-07-03
3 9876 Charlie 84 85 Stamps 2018-06-14
4 1470 Delta 98 100 Fashion 2018-06-15
5 9631 Echo 56 65 Cars 2018-06-25
As I expected the result only showed the items with at least one bid on them. I tried added one extra condition on the subQuery and Joining RIGHT OUTER to make sure I would not get doubles of an itemID.
SELECT VW.itemID, VW.title, VW.startPrice, VW.highestBid, VW.category, VW.endDate
FROM VW_SEARCH AS VW
RIGHT OUTER JOIN (SELECT itemID, MAX(highestBid) AS MaxBid
FROM VW_SEARCH
WHERE highestBid > 0 OR highestBid IS NULL
GROUP BY itemID) VJ
ON VW.itemID = VJ.itemID AND VW.highestBid = VJ.MaxBid
This gave the following results (did not add result 5 - 1199 because it is all the same as result 4, this would happen in the actual table not the example table from above):
itemID title startPrice highestBid category endDate
1 1234 Alfa 25 26 PC 2018-09-22
2 5678 Bravo 9 20 Console 2018-07-03
3 9876 Charlie 84 85 Stamps 2018-06-14
4 NULL NULL NULL NULL NULL NULL
1200 1470 Delta 98 100 Fashion 2018-06-15
1201 9631 Echo 56 65 Cars 2018-06-25
While this is technicly allowing a NULL in the colums I need to get a result in the likes of :
itemID title startPrice highestBid catgory endDate
1 1234 Alfa 25 26 PC 2018-09-22
2 5678 Bravo 9 20 Console 2018-07-03
3 9876 Charlie 84 85 Stamps 2018-06-14
4 1470 Delta 98 100 Fashion 2018-06-15
5 9631 Echo 56 65 Cars 2018-06-25
6 7856 Foxtrot 98 NULL Dolls 2018-12-26
How do I get the desired result, or is it just impossible?
Also if the query could be written better, please say so.
Thanks in advance.

Solve the problem using a left join:
SELECT VW.itemID, VW.title, VW.startPrice, VW.highestBid, VW.category, VW.endDate
FROM VW_SEARCH VW LEFT JOIN
(SELECT itemID, MAX(highestBid) AS MaxBid
FROM VW_SEARCH
GROUP BY itemID
) VJ
ON VW.itemID = VJ.itemID AND VW.highestBid = VJ.MaxBid;
Or, use the ANSI-standard ROW_NUMBER() function:
select vw.*
from (select vw.*,
row_number() over (partition by itemID
order by highestBid nulls last
) as seqnum
from vw_search vw
) vw
where seqnum = 1;
This guarantees one row per item.
Note: Not all databases support NULLS LAST. This may not even be necessary, but you can also implement it using a case expression.

Can you give the definition of the view at least? Maybe the table definition too.

I would go only with subquery as because identity column :
select vw.*
from vw_search vw
where id = (select vm1.id
from vw_search vm1
where vm.itemID = vw1.itemID and vm1.highestBid is not null
order by vm1.highestBid desc
limit 1
);
However, some DBMS has not support LIMIT clause such as SQL Server if so, then you can use TOP clause instead.

Incremental count in SQL Server 2005

I am working with a Raiser's Edge database using SQL Server 2005. I have written SQL that will produce a temporary table containing details of direct debit instalments. Below is a small table containing the key variables for the question I'm going to ask, with some fictional data:
Donor_ID Instalment_ID Instalment_Date Amount
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00
1234 1116 01/06/2011 £0.00
2345 2111 01/01/2011 £0.00
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00
2345 2115 01/05/2011 £0.00
2345 2116 01/06/2011 £0.00
As you will see, some of the values in the Amount column are £0.00. This can occur when a donor has insufficient funds in their account, for example.
What I'd like to do is write a SQL query that will create a field containing an incremental count of consecutive £0.00 payments that resets after a non-£0.00 payment or after a change in Donor_ID. I have reproduced the above data below, with the field I'd like to see.
Donor_ID Instalment_ID Instalment_Date Amount New_Field
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00 1
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00 1
1234 1116 01/06/2011 £0.00 2
2345 2111 01/01/2011 £0.00 1
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00 1
2345 2115 01/05/2011 £0.00 2
2345 2116 01/06/2011 £0.00 3
To help clarify what I'm looking for, I think what I'm looking to do would be similar to a winning streak field on a list of a football team's results. For example:
Opponent Score Winning_Streak
Arsenal 1-0 1
Liverpool 0-0
Swansea 3-1 1
Chelsea 2-1 2
Fulham 4-0 3
Stoke 0-0
Man Utd 1-3
Reading 2-1 1
I've considered various options, but have made no progress. Unless I've missed something obvious, I think that a solution more advanced than my current SQL programming level might be required.

If I am thinking about this problem correctly, I believe that you want a row number when the Amount is 0.00 pounds.
Select 0 as As InsufficientCount
, Donor_ID
, Installment_ID
, Amount
From [Table]
Where Amount > 0.00
Union
Select Row_Number() Over (Partition By Donor_ID Order By Installment_ID)
, Donor_ID
, Installment_ID
, Amount
From [Table]
Where Amount = 0.00
This union select should only give you 'ranks' where the Amount equals 0.

Am calling your new field streakAmount
ALTER TABLE instalments ADD streakAmount int NULL;
Then, to update the value:
UPDATE instalments
SET streakAmount =
(SELECT
COUNT(*)
FROM
instalments streak
WHERE
streak.donor_id = instalments.donor_id
AND
streak.instalment_date <= instalments.instalment_date
AND
(streak.instalment_date >
-- find previous instalment date, if any exists
COALESCE(
(
SELECT
MAX(instalment_date)
FROM
instalments prev
WHERE
prev.donor_id = instalments.donor_id
AND
prev.amount > 0
AND
prev.instalment_date < instalments.instalment_date
)
-- otherwise min date
, cast('1753-1-1' AS date))
)
)
WHERE
amount = 0;
http://sqlfiddle.com/#!6/a571f/18

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Selecting records from slowly changing table with a set of dates - sql

Related

Access SQL Query to Count Unique Occurrences of One Field Matching Multiple Parameters/Rows, Some Identical

Query to find active days per year to find revenue per user per year

How to write the query to make report by month in sql

SQL query to allow NULL for a highest_bid column when no bid has been placed yet

Incremental count in SQL Server 2005

Categories

Resources