Selecting top n matches without matching the same rows twice - sql

I am given two tables. Table 1 contains a list of appointment entries and Table 2 contains a list of date ranges, where each date range has an acceptable number of appointments it can be matched with.
I need to match an appointment from table 1 (starting with an appointment with the lowest date) to a date range in table 2. Once we've matched N appointments (where N = Allowed Appointments), we can no longer consider that date range.
Moreover, once we've matched an appointment from table 1 we can no longer consider that appointment for other matches.
Based on the matches I return table 3, with a bit column telling me if there was a match.
I can do this successfully with a cursor, but that solution does not scale well to larger datasets. I tried matching the top n per group using row_number(); however, that allows the same appointment to be matched multiple times, which is not what I'm looking for.
Would anyone have suggestions on how to perform this matching using a set-based approach?
Table 1
ApptID | ApptDate
-------+-----------
1      | 01-01-2022
2      | 01-04-2022
3      | 01-05-2022
4      | 01-20-2022
5      | 01-21-2022
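Table 2
DateRangeId | Date From  | Date To    | Allowed Num Appointments
------------+------------+------------+-------------------------
1           | 01-01-2020 | 01-05-2020 | 2
2           | 01-06-2020 | 01-11-2020 | 1
3           | 01-12-2020 | 01-18-2020 | 2
4           | 01-20-2020 | 01-25-2020 | 1
5           | 01-20-2020 | 01-26-2020 | 1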
Table 3 (Expected Output):
ApptID | ApptDate   | Matched | DateRangeId
-------+------------+---------+------------
1      | 01-01-2022 | 1       | 1
2      | 01-04-2022 | 1       | 1
3      | 01-05-2022 | 0       | NULL
4      | 01-20-2022 | 1       | 4
5      | 01-21-2022 | 1       | 5

Here's a set-based, iterative solution. Depending on the size of your data it might benefit from indexing on the temp table. It works by filling in appointment slots in order of appointment id and range id. You should be able to adjust that if something more optimal is important.
declare @r int = 0;

create table #T3 (ApptID int, ApptDate date, DateRangeId int, UsedSlot int);

insert into #T3 (ApptID, ApptDate, DateRangeId, UsedSlot)
select ApptID, ApptDate, null, 0
from T1;
set @r = @@rowcount;

-- repeat until a pass assigns no further appointments
while @r > 0
begin
    with ranges as (
        -- slots already used per range
        select r.DateRangeId, r.DateFrom, r.DateTo, s.ApptID, r.Allowed,
            coalesce(max(s.UsedSlot) over (partition by r.DateRangeId), 0) as UsedSlots
        from T2 r left outer join #T3 s on s.DateRangeId = r.DateRangeId
    ), appts as (
        -- appointments that are still unmatched
        select ApptID, ApptDate from #T3 where DateRangeId is null
    ), candidates as (
        select
            a.ApptID, r.DateRangeId, r.Allowed,
            UsedSlots + row_number() over (partition by r.DateRangeId
                                           order by a.ApptID) as CandidateSlot
        from appts a inner join ranges r
            on a.ApptDate between r.DateFrom and r.DateTo
        where r.UsedSlots < r.Allowed
    ), culled as (
        -- keep only candidates that fit, and rank the ranges per appointment
        select ApptID, DateRangeId, CandidateSlot,
            row_number() over (partition by ApptID order by DateRangeId)
                as CandidateSequence
        from candidates
        where CandidateSlot <= Allowed
    )
    update #T3
    set DateRangeId = culled.DateRangeId,
        UsedSlot = culled.CandidateSlot
    from #T3 inner join culled on culled.ApptID = #T3.ApptID
    where culled.CandidateSequence = 1;
    set @r = @@rowcount;
end

select ApptID, ApptDate,
    case when DateRangeId is null then 0 else 1 end as Matched, DateRangeId
from #T3 order by ApptID;
https://dbfiddle.uk/-5nUzx6Q
It also has occurred to me that you don't really need to store the UsedSlot column. Since it's looking for the maximum in the ranges CTE you might as well just use count(*) over . But it might still have some benefit in making sense of what's going on.
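To sanity-check a set-based query on small inputs, the matching rule from the question can also be written as a plain loop. The following is a minimal Python sketch, not the method used in the answer above: it walks the appointments in date order and gives each one the lowest-numbered range that contains its date and still has a free slot. The names `match_appointments`, `appts` and `ranges` are made up for illustration, and the range dates are assumed to be in 2022 so that they overlap the appointment dates.

```python
from datetime import date

def match_appointments(appts, ranges):
    """appts: list of (appt_id, appt_date).
    ranges: list of (range_id, date_from, date_to, allowed).
    Returns rows shaped like Table 3: (appt_id, appt_date, matched, range_id)."""
    remaining = {rid: allowed for rid, _, _, allowed in ranges}
    result = []
    # Earliest appointment first; ApptID breaks ties.
    for appt_id, appt_date in sorted(appts, key=lambda a: (a[1], a[0])):
        matched = None
        for rid, d_from, d_to, _ in sorted(ranges):
            if d_from <= appt_date <= d_to and remaining[rid] > 0:
                remaining[rid] -= 1  # consume one slot of this range
                matched = rid
                break
        result.append((appt_id, appt_date, 1 if matched is not None else 0, matched))
    return result

# Sample data from the question; range years are ASSUMED to be 2022.
appts = [(1, date(2022, 1, 1)), (2, date(2022, 1, 4)), (3, date(2022, 1, 5)),
         (4, date(2022, 1, 20)), (5, date(2022, 1, 21))]
ranges = [(1, date(2022, 1, 1), date(2022, 1, 5), 2),
          (2, date(2022, 1, 6), date(2022, 1, 11), 1),
          (3, date(2022, 1, 12), date(2022, 1, 18), 2),
          (4, date(2022, 1, 20), date(2022, 1, 25), 1),
          (5, date(2022, 1, 20), date(2022, 1, 26), 1)]

for row in match_appointments(appts, ranges):
    print(row)
```

Under that assumption, the sample data yields the Matched and DateRangeId values shown in Table 3, which makes the loop usable as a reference when testing the SQL.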

Related

SQL from per day table to date range table transformation

I need to transform the following input table to the output table where output table will have ranges instead of per day data.
Input:
Asin day is_instock
--------------------
A1 1 0
A1 2 0
A1 3 1
A1 4 1
A1 5 0
A2 3 0
A2 4 0
Output:
asin start_day end_day is_instock
---------------------------------
A1 1 2 0
A1 3 4 1
A1 5 5 0
A2 3 4 0
This is known as the "gaps and islands" problem. Searching for that term will turn up a fair number of articles and references.
Solution below:
/*Data setup*/
DROP TABLE IF EXISTS #Stock
CREATE TABLE #Stock ([Asin] Char(2),[day] int,is_instock bit)
INSERT INTO #Stock
VALUES
('A1',1,0)
,('A1',2,0)
,('A1',3,1)
,('A1',4,1)
,('A1',5,0)
,('A2',3,0)
,('A2',4,0);
/*Solution*/
WITH cte_Prev AS (
    SELECT *
        /*Compare previous day's stock status with current row's status. Every time it changes, return 1*/
        ,StockStatusChange = CASE WHEN is_instock = LAG(is_instock) OVER (PARTITION BY [Asin] ORDER BY [day]) THEN 0 ELSE 1 END
    FROM #Stock
)
,cte_Groups AS (
    /*Cumulative sum so every time stock status changes, add 1 from StockStatusChange to begin the next group*/
    SELECT GroupID = SUM(StockStatusChange) OVER (PARTITION BY [Asin] ORDER BY [day])
        ,*
    FROM cte_Prev
)
SELECT [Asin]
    ,start_day = MIN([day])
    ,end_day = MAX([day])
    ,is_instock
FROM cte_Groups
GROUP BY [Asin],GroupID,is_instock
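The grouping idea behind the query can also be seen outside SQL. Here is a small Python sketch, assuming the rows fit in memory; `pack_ranges` is a hypothetical helper name. It sorts by Asin and day and collapses each consecutive run with the same status into one range. Like the query above, it does not check for missing days inside a run.

```python
from itertools import groupby

def pack_ranges(rows):
    """rows: iterable of (asin, day, is_instock).
    Returns (asin, start_day, end_day, is_instock) for each consecutive run."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    packed = []
    # groupby only merges adjacent rows, so a status change starts a new group.
    for (asin, status), run in groupby(ordered, key=lambda r: (r[0], r[2])):
        days = [r[1] for r in run]
        packed.append((asin, days[0], days[-1], status))
    return packed

stock = [("A1", 1, 0), ("A1", 2, 0), ("A1", 3, 1), ("A1", 4, 1),
         ("A1", 5, 0), ("A2", 3, 0), ("A2", 4, 0)]

for row in pack_ranges(stock):
    print(row)
```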
You are looking for an operator described in the temporal data literature, and "best known" as PACK.
This operator was not made part of the SQL standard (SQL:2011) that introduced the temporal features of the literature into the language, so there's extremely little chance you're going to find anything to support you in any SQL product/dialect.
It boils down to this: you'll have to write out the algorithm to do the PACKing yourself.

Update based on order

Is it possible to update data based on a priority defined in a column?
I have input data like this
id Start_date active_flag
1 21-03-2013 N
1 23-03-2013 N
1 22-02-2013 N
1 20-02-2013 N
We have to maintain SCD2 in our data and keep the row with the latest date (i.e. 23-03-2013 here) as active in our database.
We receive files daily, but in some cases a file contains combined data for 2 days. I have to make sure all the history is maintained and that the row with the latest date is the active one.
My target data will look like
id Start_date active_flag
1 21-03-2013 N
1 23-03-2013 Y
1 22-02-2013 N
1 20-02-2013 N
How can I write an update that sets the flag for each id based on the order of Start_date?
Thanks in advance
CREATE TABLE #tst(id int,start_data datetime,active_flage varchar(2))
insert into #tst
SELECT 1,'2013-03-21','' UNION
SELECT 1,'2013-03-23','' UNION
SELECT 1,'2013-03-20','' UNION
SELECT 1,'2013-03-19','' UNION
SELECT 1,'2013-03-18',''
UPDATE #tst set active_flage=CASE when r_Id=1 then 'Y' ELSE 'N' END
FROM #tst a JOIN
(SELECT ROW_NUMBER() over(PARTITION by id order by start_data desc) as r_Id,* from #tst)b
ON a.Id=b.Id AND a.start_data=b.start_data
select * from #tst
This assumes there are no duplicate dates for the same id.
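As a quick cross-check of the expected flags, the rule "latest Start_date per id is active" can be sketched in Python. `flag_latest` is a hypothetical name, and the sample rows are the ones from the question, read as dd-mm-yyyy.

```python
from datetime import date

def flag_latest(rows):
    """rows: list of (id, start_date).
    Returns (id, start_date, flag), with 'Y' on the latest date per id."""
    latest = {}
    for row_id, start in rows:
        if row_id not in latest or start > latest[row_id]:
            latest[row_id] = start
    return [(row_id, start, "Y" if start == latest[row_id] else "N")
            for row_id, start in rows]

rows = [(1, date(2013, 3, 21)), (1, date(2013, 3, 23)),
        (1, date(2013, 2, 22)), (1, date(2013, 2, 20))]

for row in flag_latest(rows):
    print(row)
```

If an id had two rows with the same latest date, both would be flagged 'Y', which is the same caveat as for the ROW_NUMBER join above.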

Show data from table even if there is no data!! Oracle

I have a query which shows count of messages received based on dates.
For Eg:
1 | 1-May-2012
3 | 3-May-2012
4 | 6-May-2012
7 | 7-May-2012
9 | 9-May-2012
5 | 10-May-2012
1 | 12-May-2012
As you can see on some dates there are no messages received. What I want is it should show all the dates and if there are no messages received it should show 0 like this
1 | 1-May-2012
0 | 2-May-2012
3 | 3-May-2012
0 | 4-May-2012
0 | 5-May-2012
4 | 6-May-2012
7 | 7-May-2012
0 | 8-May-2012
9 | 9-May-2012
5 | 10-May-2012
0 | 11-May-2012
1 | 12-May-2012
How can I achieve this when there are no rows in the table?
First, it sounds like your application would benefit from a calendar table. A calendar table is a list of dates and information about the dates.
Second, you can do this without using temporary tables. Here is the approach:
with constants as (
    select min(<thedate>) as firstdate from <table>
),
dates as (
    select firstdate + seqnum - 1 as thedate
    from (select constants.firstdate, rownum as seqnum
          from <table> cross join constants
          where rownum < sysdate - constants.firstdate + 1
         ) seq
)
select dates.thedate, count(t.date)
from dates left outer join
     <table> t
     on t.date = dates.thedate
group by dates.thedate
Here is the idea. The alias constants records the earliest date in your table. The alias dates then creates a sequence of dates. The inner subquery calculates a sequence of integers, using rownum, and then adds these to the first date. Note this assumes that you have on average at least one transaction per date. If not, you can use a bigger table.
The final part is the join that is used to bring back information about the dates. Note the use of count(t.date) instead of count(*). This counts the number of records in your table, which should be 0 for dates with no data.
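The "generate every date, then left join the counts" idea can be illustrated with a short Python sketch. `daily_counts` is a hypothetical helper, and the sample data is rebuilt from the counts shown in the question.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(message_dates, first, last):
    """Return (day, count) for every day from first to last inclusive.
    Days without messages get a count of 0."""
    counts = Counter(message_dates)  # a missing key reads as 0
    n_days = (last - first).days + 1
    calendar = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, counts[day]) for day in calendar]

# Rebuild one date per message from the question's day-of-May -> count data.
sample = {1: 1, 3: 3, 6: 4, 7: 7, 9: 9, 10: 5, 12: 1}
message_dates = [date(2012, 5, d) for d, n in sample.items() for _ in range(n)]

for day, n in daily_counts(message_dates, date(2012, 5, 1), date(2012, 5, 12)):
    print(n, "|", day.isoformat())
```

The list `calendar` plays the role of the dates CTE, and the `Counter` lookup plays the role of the left outer join with `count(t.date)`.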
You don't need a separate table for this, you can create what you need in the query. This works for May:
WITH month_may AS (
    select to_date('2012-05-01', 'yyyy-mm-dd') + level - 1 AS the_date
    from dual
    connect by level <= 31
)
SELECT *
FROM month_may mm
LEFT JOIN mytable t ON t.some_date = mm.the_date
The date range will depend on how exactly you want to do this and what your range is.
You could achieve this with a left outer join IF you had another table to join to that contains all possible dates.
One option might be to generate the dates in a temp table and join that to your query.
Something like this might do the trick.
CREATE TABLE #TempA (Col1 DateTime)
DECLARE @start DATETIME = convert(datetime, convert(nvarchar(10), getdate(), 121))
SELECT @start
DECLARE @counter INT = 0
WHILE @counter < 50
BEGIN
    INSERT INTO #TempA (Col1) VALUES (@start)
    SET @start = DATEADD(DAY, 1, @start)
    SET @counter = @counter+1
END
That will create a TempTable to hold the dates... I've just generated 50 of them starting from today.
SELECT
    a.Col1,
    COUNT(b.MessageID)
FROM
    #TempA a
    LEFT OUTER JOIN YOUR_MESSAGE_TABLE b
        ON a.Col1 = b.DateColumn
GROUP BY
    a.Col1
The query above left joins your message counts to the generated dates.

Best way to interpolate values in SQL

I have a table with rate at certain date :
Rates
Id | Date | Rate
----+---------------+-------
1 | 01/01/2011 | 4.5
2 | 01/04/2011 | 3.2
3 | 04/06/2011 | 2.4
4 | 30/06/2011 | 5
I want to get the output rate based on a simple linear interpolation.
So if I enter 17/06/2011:
Date Rate
---------- -----
01/01/2011 4.5
01/04/2011 3.2
04/06/2011 2.4
17/06/2011
30/06/2011 5.0
The linear interpolation is (5 + 2.4) / 2 = 3.7
Is there a way to do this with a simple query (SQL Server 2005), or does this kind of thing need to be done programmatically (C#...)?
Something like this (corrected):
SELECT CASE WHEN next.Date IS NULL THEN prev.Rate
            WHEN prev.Date IS NULL THEN next.Rate
            WHEN next.Date = prev.Date THEN prev.Rate
            ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate
                 + DATEDIFF(d, @InputDate, next.Date) * prev.Rate
                 ) / DATEDIFF(d, prev.Date, next.Date)
       END AS interpolationRate
FROM
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
CROSS JOIN
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
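For comparison, here is the same day-weighted formula as a Python sketch. `interpolate` is a hypothetical name, and the dates from the question are read as dd/mm/yyyy. One deliberate difference, stated as an assumption: outside the known date range this sketch returns the nearest known rate, which is what the CASE branches describe, rather than returning no row.

```python
from datetime import date

def interpolate(rates, target):
    """rates: list of (date, rate) with unique dates.
    Linear interpolation weighted by day counts; outside the covered
    range the nearest known rate is returned (an assumption)."""
    prev = max((r for r in rates if r[0] <= target), default=None)
    nxt = min((r for r in rates if r[0] >= target), default=None)
    if prev is None and nxt is None:
        raise ValueError("no rates available")
    if nxt is None:
        return prev[1]
    if prev is None:
        return nxt[1]
    if prev[0] == nxt[0]:  # exact hit on a known date
        return prev[1]
    span = (nxt[0] - prev[0]).days
    return ((target - prev[0]).days * nxt[1]
            + (nxt[0] - target).days * prev[1]) / span

rates = [(date(2011, 1, 1), 4.5), (date(2011, 4, 1), 3.2),
         (date(2011, 6, 4), 2.4), (date(2011, 6, 30), 5.0)]

print(round(interpolate(rates, date(2011, 6, 17)), 4))
```

17/06/2011 lies 13 days after 04/06/2011 and 13 days before 30/06/2011, so the weights are equal and the result agrees with the (5 + 2.4) / 2 figure from the question.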
As @Mark already pointed out, the CROSS JOIN has its limitations. As soon as the target value falls outside the range of defined values, no records will be returned.
Also, the above solution is limited to one result only. For my project I needed an interpolation for a whole list of x values and came up with the following solution. Maybe it is of interest to other readers too.
-- generate some grid data values in table #ddd:
CREATE TABLE #ddd (id int,x float,y float, PRIMARY KEY(id,x));
INSERT INTO #ddd VALUES (1,3,4),(1,4,5),(1,6,3),(1,10,2),
(2,1,4),(2,5,6),(2,6,5),(2,8,2);
SELECT * FROM #ddd;
-- target x-values in table #vals (results are to go into column yy):
CREATE TABLE #vals (xx float PRIMARY KEY,yy float null, itype int);
INSERT INTO #vals (xx) VALUES (1),(3),(4.3),(9),(12);
-- do the actual interpolation
WITH valstyp AS (
SELECT id ii,xx,
CASE WHEN min(x)<xx THEN CASE WHEN max(x)>xx THEN 1 ELSE 2 END ELSE 0 END flag,
min(x) xmi,max(x) xma
FROM #vals INNER JOIN #ddd ON id=1 GROUP BY xx,id
), ipol AS (
SELECT v.*,(b.x-xx)/(b.x-a.x) f,a.y ya,b.y yb
FROM valstyp v
INNER JOIN #ddd a ON a.id=ii AND a.x=(SELECT max(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x=xmi OR flag=1 AND x<xx OR flag=2 AND x<xma))
INNER JOIN #ddd b ON b.id=ii AND b.x=(SELECT min(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x>xmi OR flag=1 AND x>xx OR flag=2 AND x=xma))
)
UPDATE v SET yy=ROUND(f*ya+(1-f)*yb,8),itype=flag FROM #vals v INNER JOIN ipol i ON i.xx=v.xx;
-- list the interpolated results table:
SELECT * FROM #vals
When running the above script you will get the following data grid points in table #ddd
id x y
-- -- -
1 3 4
1 4 5
1 6 3
1 10 2
2 1 4
2 5 6
2 6 5
2 8 2
(The table contains grid points for two identities, id=1 and id=2. In my example I referenced only the id=1 group, by using ON id=1 in the valstyp CTE. This can be changed to suit your requirements.)
and the results table #vals with the interpolated data in column yy:
xx yy itype
--- ---- -----
1 2 0
3 4 0
4.3 4.7 1
9 2.25 1
12 1.5 2
The last column itype indicates the type of interpolation/extrapolation that was used to calculate the value:
0: extrapolation to lower end
1: interpolation within given data range
2: extrapolation to higher end
This working example can be found here.
The catch with CROSS JOIN here is that it won't return any records if either of the subqueries returns no rows (1 * 0 = 0), so the query may break. A better way is to use a FULL OUTER JOIN with an inequality condition (to avoid getting more than one row):
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
FULL OUTER JOIN
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
ON (prev.Date <> next.Date) -- or Rate, depending on what is unique

Retrieve/update rows with a minimal deviation in a certain column value

I have a database table with one column being dates. However, some of the rows should share the same date but due to lag on insertion there's a one second difference between them. The insert part has been fixed already but the current data in the table needs to be fixed as well.
As an example the following data is present:
2008-10-08 12:23:01 1 1 x
2008-10-08 12:23:01 1 2 y
2008-10-08 12:23:02 1 3 z
Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.
The best way I can think of is writing an external script to do that. It's tricky to determine which columns are correct and which should be updated without having more control over the grouping. Pseudo-code:
all_rows = SELECT * FROM table ORDER BY date
last_date = NULL
rows_to_update = []
for row in all_rows:
    if last_date is NULL or row.date - last_date > X seconds:
        set date to last_date for all rows from rows_to_update
        last_date = row.date
        rows_to_update = []
    else if row.date != last_date:
        rows_to_update += row
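A runnable variant of this idea, as a minimal Python sketch working on a plain list of timestamps rather than on database rows. `snap_dates` and `tolerance` are hypothetical names. Instead of collecting rows to update, it returns the corrected timestamp for every input, so the last group needs no separate flush step.

```python
from datetime import datetime, timedelta

def snap_dates(dates, tolerance=timedelta(seconds=1)):
    """Return (original, corrected) pairs in ascending order.
    A timestamp within `tolerance` of the first timestamp of its group
    is snapped to that first timestamp; otherwise it starts a new group."""
    pairs = []
    anchor = None
    for d in sorted(dates):
        if anchor is None or d - anchor > tolerance:
            anchor = d  # start of a new group
        pairs.append((d, anchor))
    return pairs

sample = [datetime(2008, 10, 8, 12, 23, 1),
          datetime(2008, 10, 8, 12, 23, 1),
          datetime(2008, 10, 8, 12, 23, 2)]

for original, corrected in snap_dates(sample):
    if original != corrected:
        print("update", original, "->", corrected)
```

Only the pairs where the two values differ need an UPDATE, which keeps the number of writes small.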
Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.
UPDATE
    tbl t,
    (SELECT
        t.date,
        (SELECT min(date)
         FROM tbl
         WHERE timestampdiff(SECOND,date,t.date) BETWEEN 1 AND 3) AS new_date
     FROM tbl t) t2
SET t.date=t2.new_date
WHERE t.date=t2.date AND t2.new_date IS NOT NULL
For all rows:
update yourtable set date_added=date_added-'01';
For a specific row, add a where clause.
"due to lag in insertion"
Why don't you get the date before inserting/updating the first row and use that for all the other rows?
Assuming you have this structure:
create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')
This query will only show the last item of each set that's 1 second late:
select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
Using that in conjunction with an UPDATE statement, you get this:
update tbl set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
And that updates the last record of each set to the date above it, giving the results:
1 2009-10-08 12:23:01.000
2 2009-10-08 12:23:01.000
3 2009-10-08 12:23:01.000
4 2009-10-08 12:23:05.000
5 2009-10-08 12:23:05.000
6 2009-10-08 12:23:05.000
It's quick and dirty and unoptimized, but for a one-off data scrub it should work.
Remember to back up!