Best way to interpolate values in SQL - sql

I have a table with a rate at certain dates:
Rates
Id | Date | Rate
----+---------------+-------
1 | 01/01/2011 | 4.5
2 | 01/04/2011 | 3.2
3 | 04/06/2011 | 2.4
4 | 30/06/2011 | 5
I want to get the output rate based on a simple linear interpolation.
So if I enter 17/06/2011:
Date Rate
---------- -----
01/01/2011 4.5
01/04/2011 3.2
04/06/2011 2.4
17/06/2011
30/06/2011 5.0
the linear interpolation is (5 + 2.4) / 2 = 3.7 (17/06/2011 lies exactly halfway between 04/06/2011 and 30/06/2011).
Is there a way to do this with a simple query (SQL Server 2005), or does this kind of thing need to be done programmatically (C#...)?

Something like this (corrected):
SELECT CASE WHEN next.Date IS NULL THEN prev.Rate
            WHEN prev.Date IS NULL THEN next.Rate
            WHEN next.Date = prev.Date THEN prev.Rate
            ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate
                 + DATEDIFF(d, @InputDate, next.Date) * prev.Rate
                 ) / DATEDIFF(d, prev.Date, next.Date)
       END AS interpolationRate
FROM
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
  CROSS JOIN
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
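To check the arithmetic of the CASE expression outside the database, here is a minimal Python sketch of the same day-weighted formula, using the sample rows from the question. The names `RATES` and `interpolate_rate` are made up for illustration. One difference is deliberate and is an assumption about intent: when the input date falls outside the stored range, the sketch returns the nearest rate, as the first two CASE branches suggest.

```python
from datetime import date

# Sample rows from the question: (date, rate)
RATES = [
    (date(2011, 1, 1), 4.5),
    (date(2011, 4, 1), 3.2),
    (date(2011, 6, 4), 2.4),
    (date(2011, 6, 30), 5.0),
]


def interpolate_rate(rates, input_date):
    """Day-weighted linear interpolation between the neighbouring rows."""
    by_date = lambda row: row[0]
    # Closest row on or before the input date, and closest row on or after it.
    prev = max((r for r in rates if r[0] <= input_date), key=by_date, default=None)
    nxt = min((r for r in rates if r[0] >= input_date), key=by_date, default=None)
    if prev is None and nxt is None:
        return None                      # empty table
    if nxt is None:
        return prev[1]                   # after the last stored date
    if prev is None:
        return nxt[1]                    # before the first stored date
    if prev[0] == nxt[0]:
        return prev[1]                   # exact hit on a stored date
    span = (nxt[0] - prev[0]).days
    days_from_prev = (input_date - prev[0]).days
    days_to_next = (nxt[0] - input_date).days
    return (days_from_prev * nxt[1] + days_to_next * prev[1]) / span


print(interpolate_rate(RATES, date(2011, 6, 17)))
```

For 17/06/2011 both neighbours are 13 days away, so the weights are equal and the result is the plain average of 2.4 and 5.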

As @Mark already pointed out, the CROSS JOIN has its limitations. As soon as the target value falls outside the range of defined values, no records will be returned.
Also, the above solution is limited to one result only. For my project I needed an interpolation for a whole list of x values and came up with the following solution. Maybe it is of interest to other readers too?
-- generate some grid data values in table #ddd:
CREATE TABLE #ddd (id int,x float,y float, PRIMARY KEY(id,x));
INSERT INTO #ddd VALUES (1,3,4),(1,4,5),(1,6,3),(1,10,2),
(2,1,4),(2,5,6),(2,6,5),(2,8,2);
SELECT * FROM #ddd;
-- target x-values in table #vals (results are to go into column yy):
CREATE TABLE #vals (xx float PRIMARY KEY,yy float null, itype int);
INSERT INTO #vals (xx) VALUES (1),(3),(4.3),(9),(12);
-- do the actual interpolation
WITH valstyp AS (
SELECT id ii,xx,
CASE WHEN min(x)<xx THEN CASE WHEN max(x)>xx THEN 1 ELSE 2 END ELSE 0 END flag,
min(x) xmi,max(x) xma
FROM #vals INNER JOIN #ddd ON id=1 GROUP BY xx,id
), ipol AS (
SELECT v.*,(b.x-xx)/(b.x-a.x) f,a.y ya,b.y yb
FROM valstyp v
INNER JOIN #ddd a ON a.id=ii AND a.x=(SELECT max(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x=xmi OR flag=1 AND x<xx OR flag=2 AND x<xma))
INNER JOIN #ddd b ON b.id=ii AND b.x=(SELECT min(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x>xmi OR flag=1 AND x>xx OR flag=2 AND x=xma))
)
UPDATE v SET yy=ROUND(f*ya+(1-f)*yb,8),itype=flag FROM #vals v INNER JOIN ipol i ON i.xx=v.xx;
-- list the interpolated results table:
SELECT * FROM #vals
When running the above script you will get the following data grid points in table #ddd
id x y
-- -- -
1 3 4
1 4 5
1 6 3
1 10 2
2 1 4
2 5 6
2 6 5
2 8 2
The table contains grid points for two identities (id=1 and id=2). In my example I referenced only the id=1 group via the join condition ON id=1 in the valstyp CTE. This can be changed to suit your requirements.
and the results table #vals with the interpolated data in column yy:
xx yy itype
--- ---- -----
1 2 0
3 4 0
4.3 4.7 1
9 2.25 1
12 1.5 2
The last column itype indicates the type of interpolation/extrapolation that was used to calculate the value:
0: extrapolation to lower end
1: interpolation within given data range
2: extrapolation to higher end
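The same three cases can be sketched in a few lines of Python to cross-check the numbers in the results table. This is only an illustration under stated assumptions: `interpolate` is a hypothetical helper, the points are the id=1 group from #ddd, and the weight `f` is defined as in the ipol CTE. One difference to be aware of: for a target that equals an interior grid point, the strict comparisons `x<xx` and `x>xx` in the SQL pick the two neighbouring points, whereas this sketch uses the segment that ends at that point and so returns the stored y value.

```python
from bisect import bisect_left


def interpolate(points, xx):
    """points: list of (x, y) sorted by x with at least two entries.

    Returns (yy, itype) where itype is 0 (extrapolated below the range),
    1 (interpolated inside the range) or 2 (extrapolated above the range).
    """
    xs = [p[0] for p in points]
    if xx <= xs[0]:
        i, itype = 0, 0                    # extend the first segment
    elif xx >= xs[-1]:
        i, itype = len(xs) - 2, 2          # extend the last segment
    else:
        i, itype = bisect_left(xs, xx) - 1, 1
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    f = (x1 - xx) / (x1 - x0)              # weight of the left-hand point
    return round(f * y0 + (1 - f) * y1, 8), itype


grid = [(3, 4), (4, 5), (6, 3), (10, 2)]   # id=1 rows of #ddd
for target in (1, 3, 4.3, 9, 12):
    print(target, interpolate(grid, target))
```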
This working example can be found here.

The catch with the CROSS JOIN here is that it won't return any rows if either subquery returns none (1 * 0 = 0), so the query may break. A better way is to use a FULL OUTER JOIN with an inequality condition (to avoid getting more than one row):
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
  FULL OUTER JOIN
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
  ON (prev.Date <> next.Date) -- or Rate, depending on what is unique

Related

Selecting top n matches without matching the same rows twice

I am given two tables. Table 1 contains a list of appointment entries and Table 2 contains a list of date ranges, where each date range has an acceptable number of appointments it can be matched with.
I need to match an appointment from table 1 (starting with an appointment with the lowest date) to a date range in table 2. Once we've matched N appointments (where N = Allowed Appointments), we can no longer consider that date range.
Moreover, once we've matched an appointment from table 1 we can no longer consider that appointment for other matches.
Based on the matches I return table 3, with a bit column telling me if there was a match.
I am able to successfully perform this using a cursor, however this solution is not scaling well with larger datasets. I tried to match top n groups using row_count() however, this allows the same appointment to be matched multiple times which is not what I'm looking for.
Would anyone have suggestions in how to perform this matching using a set based approach?
Table 1
ApptID | ApptDate
-------+------------
1      | 01-01-2022
2      | 01-04-2022
3      | 01-05-2022
4      | 01-20-2022
5      | 01-21-2022
Table 2
DateRangeId | Date From  | Date To    | Allowed Num Appointments
------------+------------+------------+-------------------------
1           | 01-01-2020 | 01-05-2020 | 2
2           | 01-06-2020 | 01-11-2020 | 1
3           | 01-12-2020 | 01-18-2020 | 2
4           | 01-20-2020 | 01-25-2020 | 1
5           | 01-20-2020 | 01-26-2020 | 1
Table 3 (Expected Output):
ApptID | ApptDate   | Matched | DateRangeId
-------+------------+---------+------------
1      | 01-01-2022 | 1       | 1
2      | 01-04-2022 | 1       | 1
3      | 01-05-2022 | 0       | NULL
4      | 01-20-2022 | 1       | 4
5      | 01-21-2022 | 1       | 5
Here's a set-based, iterative solution. Depending on the size of your data it might benefit from indexing on the temp table. It works by filling in appointment slots in order of appointment id and range id. You should be able to adjust that if something more optimal is important.
declare @r int = 0;
create table #T3 (ApptID int, ApptDate date, DateRangeId int, UsedSlot int);
insert into #T3 (ApptID, ApptDate, DateRangeId, UsedSlot)
select ApptID, ApptDate, null, 0
from T1;
set @r = @@rowcount;
while @r > 0
begin
with ranges as (
select r.DateRangeId, r.DateFrom, r.DateTo, s.ApptID, r.Allowed,
coalesce(max(s.UsedSlot) over (partition by r.DateRangeId), 0) as UsedSlots
from T2 r left outer join #T3 s on s.DateRangeId = r.DateRangeId
), appts as (
select ApptID, ApptDate from #T3 where DateRangeId is null
), candidates as (
select
a.ApptID, r.DateRangeId, r.Allowed,
UsedSlots + row_number() over (partition by r.DateRangeId
order by a.ApptID) as CandidateSlot
from appts a inner join ranges r
on a.ApptDate between r.DateFrom and r.DateTo
where r.UsedSlots < r.Allowed
), culled as (
select ApptID, DateRangeId, CandidateSlot,
row_number() over (partition by ApptID order by DateRangeId)
as CandidateSequence
from candidates
where CandidateSlot <= Allowed
)
update #T3
set DateRangeId = culled.DateRangeId,
UsedSlot = culled.CandidateSlot
from #T3 inner join culled on culled.ApptID = #T3.ApptID
where culled.CandidateSequence = 1;
set @r = @@rowcount;
end
select ApptID, ApptDate,
case when DateRangeId is null then 0 else 1 end as Matched, DateRangeId
from #T3 order by ApptID;
https://dbfiddle.uk/-5nUzx6Q
It has also occurred to me that you don't really need to store the UsedSlot column. Since the ranges CTE only looks for its maximum, you might as well use a count(*) over (...) window instead. But the column might still be of some benefit in making sense of what's going on.
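For cross-checking the output of any set-based version on small inputs, the matching rules from the question can be simulated sequentially. The sketch below is not the query above translated; it is a plain cursor-style reference with hypothetical names (`match_appointments`), and it assumes appointments are taken in date order and ranges are tried in DateRangeId order. The sample dates are all placed in 2022, which is an assumption, because the question lists the ranges in 2020 and the appointments in 2022.

```python
from datetime import date


def match_appointments(appts, ranges):
    """appts: list of (appt_id, appt_date).
    ranges: list of (range_id, date_from, date_to, allowed).
    Returns rows (appt_id, appt_date, matched, range_id)."""
    remaining = {rid: allowed for rid, _, _, allowed in ranges}
    ordered_ranges = sorted(ranges)                 # lowest DateRangeId first
    result = []
    for appt_id, appt_date in sorted(appts, key=lambda a: (a[1], a[0])):
        chosen = None
        for rid, d_from, d_to, _ in ordered_ranges:
            # Take the first range that covers the date and still has a free slot.
            if d_from <= appt_date <= d_to and remaining[rid] > 0:
                remaining[rid] -= 1
                chosen = rid
                break
        result.append((appt_id, appt_date, 1 if chosen is not None else 0, chosen))
    return result


appts = [(1, date(2022, 1, 1)), (2, date(2022, 1, 4)), (3, date(2022, 1, 5)),
         (4, date(2022, 1, 20)), (5, date(2022, 1, 21))]
ranges = [(1, date(2022, 1, 1), date(2022, 1, 5), 2),
          (2, date(2022, 1, 6), date(2022, 1, 11), 1),
          (3, date(2022, 1, 12), date(2022, 1, 18), 2),
          (4, date(2022, 1, 20), date(2022, 1, 25), 1),
          (5, date(2022, 1, 20), date(2022, 1, 26), 1)]
for row in match_appointments(appts, ranges):
    print(row)
```

With those aligned years the rows agree with the expected output in Table 3: appointment 3 stays unmatched because range 1 is already full.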

SQL from per day table to date range table transformation

I need to transform the following input table to the output table where output table will have ranges instead of per day data.
Input:
Asin day is_instock
--------------------
A1 1 0
A1 2 0
A1 3 1
A1 4 1
A1 5 0
A2 3 0
A2 4 0
Output:
asin start_day end_day is_instock
---------------------------------
A1 1 2 0
A1 3 4 1
A1 5 5 0
A2 3 4 0
This is what is referred to as the "gaps and islands" problem. There's a fair amount of articles and references you can find if you use that search term.
Solution below:
/*Data setup*/
DROP TABLE IF EXISTS #Stock
CREATE TABLE #Stock ([Asin] Char(2),[day] int,is_instock bit)
INSERT INTO #Stock
VALUES
('A1',1,0)
,('A1',2,0)
,('A1',3,1)
,('A1',4,1)
,('A1',5,0)
,('A2',3,0)
,('A2',4,0);
/*Solution*/
WITH cte_Prev AS (
SELECT *
/*Compare previous day's stock status with current row's status. Every time it changes, return 1*/
,StockStatusChange = CASE WHEN is_instock = LAG(is_instock) OVER (PARTITION BY [Asin] ORDER BY [day]) THEN 0 ELSE 1 END
FROM #Stock
)
,cte_Groups AS (
/*Cumulative sum so everytime stock status changes, add 1 from StockStatusChange to begin the next group*/
SELECT GroupID = SUM(StockStatusChange) OVER (PARTITION BY [Asin] ORDER BY [day])
,*
FROM cte_Prev
)
SELECT [Asin]
,start_day = MIN([day])
,end_day = MAX([day])
,is_instock
FROM cte_Groups
GROUP BY [Asin],GroupID,is_instock
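The grouping idea (start a new group whenever the status changes within an Asin) can also be illustrated outside SQL. The following Python sketch is just a cross-check of the expected output, with a hypothetical function name `pack_ranges`; `itertools.groupby` groups consecutive rows with the same key, which plays the role of the running-sum GroupID. Like the query above, it does not split a range when a day number is missing.

```python
from itertools import groupby

# Sample rows from the question: (asin, day, is_instock)
rows = [
    ("A1", 1, 0), ("A1", 2, 0), ("A1", 3, 1), ("A1", 4, 1), ("A1", 5, 0),
    ("A2", 3, 0), ("A2", 4, 0),
]


def pack_ranges(rows):
    """Collapse per-day rows into (asin, start_day, end_day, is_instock)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    packed = []
    # Consecutive rows with the same (asin, status) form one island.
    for (asin, status), island in groupby(ordered, key=lambda r: (r[0], r[2])):
        days = [r[1] for r in island]
        packed.append((asin, days[0], days[-1], status))
    return packed


for line in pack_ranges(rows):
    print(line)
```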
You are looking for an operator described in the temporal data literature, and "best known" as PACK.
This operator was not made part of the SQL standard (SQL:2011) that introduced the temporal features of the literature into the language, so there's extremely little chance you're going to find anything to support you in any SQL product/dialect.
It boils down to this: you'll have to write out the algorithm to do the PACKing yourself.

Aggregated product generation on runtime for SQL Server 2008 R2

I have a large amount of data. I need to implement a product aggregation on each value. Let me explain with example to make it clear.
This is a sample data-
/*SampleTable*/
|ID|Date |Value |
| 1|201401|25 |
| 1|201402|-30 |
| 1|201403|-15 |
| 1|201404|50 |
| 1|201405|70 |
| 2|201010|1.15 |
| 2|201011|1.79 |
| 2|201012|0.82 |
| 2|201101|1.8 |
| 2|201102|1.67 |
Have to make this table-
/*ResultTable*/
|ID|Date |Aggregated Value |
| 1|201312|100 |
| 1|201401|125 |
| 1|201402|87.5 |
| 1|201403|74.375 |
| 1|201404|111.563 |
| 1|201405|189.657 |
| 2|201009|100 |
| 2|201010|101.15 |
| 2|201011|102.960 |
| 2|201012|103.804 |
| 2|201101|105.673 |
| 2|201102|107.438 |
-- Note: The 100 values are separately inserted for each ID at the month before first date
-- of previous table
Here for each ID, I have a Value (Column 2) given with corresponding Date (YYYYMM format). I have to implement the following formula to calculate the Aggregated Value column Grouped By each ID -
current_Aggregated_Value = previous_aggregated_value * ((current_value/100) + 1)
There was no easy solution for this. I have to take aggregated value of previous row, which is also a generated value by the same query (except 100, it has been manually added), to calculate aggregated value for current row. As it is not possible to take a generated value in runtime for SQL, I had to implement a product aggregate function described here.
so 2nd aggregated_value (125) was derived by (100 * ((25 / 100) + 1)) = 125
3rd aggregated_value (87.5) was derived by (125 * ((-30 / 100) + 1)) = 87.5
But as we cannot take the generated '125' value in runtime, I had to take the product aggregate of the all previous value, 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) = 87.5
similarly 4th value (74.375) comes from, 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) * ((-15 / 100) + 1) = 74.375
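The recurrence and the worked numbers above can be verified with a short Python sketch. It is only a check of the arithmetic, not a replacement for the query; the function name `aggregated_values` is made up, and the starting value of 100 is passed in explicitly as in the result table.

```python
from itertools import accumulate


def aggregated_values(pct_changes, start=100.0):
    """Running product: each step multiplies the previous result by (value/100 + 1)."""
    return list(accumulate(pct_changes,
                           lambda prev, value: prev * (value / 100 + 1),
                           initial=start))


# Values for ID 1 from the sample table
print(aggregated_values([25, -30, -15, 50, 70]))
```

The first four results are 100, 125, 87.5 and 74.375, matching the derivation above (up to floating-point rounding).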
Giving a sample query below -
INSERT INTO ResultTable (ID, [Date], [Aggregated Value])
SELECT temps.ID, temps.[Date],
CASE
WHEN temps.min_val = 0 THEN 0
WHEN temps.is_negative % 2 = 1 THEN -1 * EXP(temps.abs_multiplier) * 100
ELSE EXP(temps.abs_multiplier) * 100
END AS value
FROM
(
SELECT st1.ID, st1.[Date],
-- Multiplication by taking all +ve values
SUM(LOG(ABS(NULLIF(((st2.Value / 100) + 1), 0)))) AS abs_multiplier,
-- Count of -ve values, final result is -ve if count is odd
SUM(SIGN(CASE WHEN ((st2.Value / 100) + 1) < 0 THEN 1 ELSE 0 END)) AS is_negative,
-- If any value in the multipliers is 0 the whole multiplication result will be 0
MIN(ABS((st2.Value / 100) + 1)) AS min_val
FROM SampleTable AS st1
INNER JOIN SampleTable AS st2 ON (st2.ID = st1.ID AND st2.[Date] <= st1.[Date])
GROUP BY st1.id, st1.[Date]
) AS temps;
Basically, it takes the product aggregate over all values of previous dates, for each row, to calculate the desired value.
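The trick the query relies on is that a product can be rebuilt from a sum: EXP(SUM(LOG(ABS(x)))) gives the magnitude, the count of negative factors gives the sign, and any zero factor forces the result to zero. A minimal Python sketch of that identity, with a hypothetical helper name `product_via_logs`:

```python
import math


def product_via_logs(factors):
    """Product of the factors computed as exp(sum(log|f|)) with separate sign handling."""
    if any(f == 0 for f in factors):
        return 0.0                              # a single zero makes the product zero
    negatives = sum(1 for f in factors if f < 0)
    magnitude = math.exp(sum(math.log(abs(f)) for f in factors))
    # An odd number of negative factors makes the product negative.
    return -magnitude if negatives % 2 == 1 else magnitude


# Factors (value/100 + 1) for the first three months of ID 1, scaled by the base 100
print(100 * product_via_logs([1.25, 0.7, 0.85]))
```

The printed value is 74.375 up to floating-point rounding, the same as the 4th aggregated value derived by hand.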
Well, it is as messy as it sounds and looks, and "h-word" slow! But I couldn't find any better solution for this kind of problem in SQL Server 2008 R2 (unless you can give me one).
So, I wanna know 2 things-
1. Is it possible to do it without joining the same table like I did there?
2. Is there any better way to do product aggregation on SQL Server 2008 R2? (I know there is one way in Server 2012, but that is not an option for me)
Sorry for the L-O-N-G question! But Thanks in advance!
I've run several reports that make heavy use of recursion and the results have usually been very acceptable and not slow at all. Give this solution a shot:
-- http://stackoverflow.com/questions/30437219/aggregated-product-generation-on-runtime-for-sql-server-2008-r2
-- Create temp table to hold sample data
Create table #sampleTable
( ID int
, YrMnth date not null
, CurrentValue numeric(13,3)
);
-- Insert sample data into the temp table
-- Date values have an added '01' at the end to make them compatible with the "date" datatype
insert into #sampleTable
values (1,'20131201',100)
, (1,'20140101',25)
, (1,'20140201',-30)
, (1,'20140301',-15)
, (1,'20140401',50)
, (1,'20140501',70)
, (2,'20100901',100)
, (2,'20101001',1.15)
, (2,'20101101',1.79)
, (2,'20101201',0.82)
, (2,'20110101',1.8)
, (2,'20110201',1.67);
-- Declare recursive CTE which loads the first values for each ID as the anchor
With CTE
as
(
Select 0 as lvl
, minID.ID
, minID.YrMnth
, s.CurrentValue
From #sampleTable s
inner join (select ID
, min(YrMnth) as 'YrMnth'
from #sampleTable
group by ID) as minID
on s.ID = minID.ID
and s.YrMnth = minID.YrMnth
union all
-- Add the recursive part which unions on the same ID and +1 month for the date
-- Note that the cast in the calculation is required to prevent datatype errors between anchor and recursive member
select cte.lvl + 1 as lvl
, CTE.ID
, S2.YrMnth
, cast(CTE.CurrentValue * ((s2.CurrentValue / 100) + 1) as numeric(13,3))
--, s2.CurrentValue
from #sampleTable s2
inner join CTE
on s2.ID = CTE.ID
and S2.YrMnth = dateadd(month,1,cte.YrMnth)
)
-- Select final result set
Select *
from CTE
order by ID
, YrMnth
, lvl;
-- Clean up temp table
drop table #sampleTable;
I had to add a day part to your date values so I could treat them as a date datatype. This allows you to join the recursive member on "month + 1". The "lvl" column was added by me just to check the recursion results but I left it in as it's useful to see how many recursions a particular record has gone through.
It will depend on your total data size how fast this will run, but I'm pretty sure it will be faster than your original solution. Note that this solution assumes your dates are sequential for a given ID with no missing months.

What's the most efficient way to match values between 2 tables based on most recent prior date?

I've got two tables in MS SQL Server:
dailyt - which contains daily data:
date val
---------------------
2014-05-22 10
2014-05-21 9.5
2014-05-20 9
2014-05-19 8
2014-05-18 7.5
etc...
And periodt - which contains data coming in at irregular periods:
date val
---------------------
2014-05-21 2
2014-05-18 1
Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:
addt
date val
---------------------
2014-05-22 12 <- add 2 from 2014-05-21
2014-05-21 11.5 <- add 2 from 2014-05-21
2014-05-20 10 <- add 1 from 2014-05-18
2014-05-19 9 <- add 1 from 2014-05-18
2014-05-18 8.5 <- add 1 from 2014-05-18
I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date, number the rows with ROW_NUMBER() OVER (PARTITION BY dailyt.date ORDER BY periodt.date DESC), and then filter with a WHERE condition on the row number = 1.
Is there another way to do this that would be more efficient? Or is this pretty much optimal?
I think using APPLY would be the most efficient way:
SELECT d.Val,
p.Val,
NewVal = d.Val + ISNULL(p.Val, 0)
FROM Dailyt AS d
OUTER APPLY
( SELECT TOP 1 Val
FROM Periodt p
WHERE p.Date <= d.Date
ORDER BY p.Date DESC
) AS p;
Example on SQL Fiddle
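What the TOP 1 ... ORDER BY p.Date DESC subquery does per daily row is a "latest entry on or before this date" lookup. As an illustration of that lookup only (not of SQL Server's execution plan), here is a Python sketch using a binary search over the sorted period dates; `add_latest_period` is a hypothetical name and the data is the sample from the question plus one earlier day with no matching period row.

```python
from bisect import bisect_right
from datetime import date


def add_latest_period(daily, period):
    """For each (date, val) in daily, add the val of the latest period row
    dated on or before it, or 0 when there is none (like ISNULL(p.Val, 0))."""
    period = sorted(period)
    period_dates = [d for d, _ in period]
    adjusted = []
    for day, val in daily:
        i = bisect_right(period_dates, day)     # number of period dates <= day
        extra = period[i - 1][1] if i else 0
        adjusted.append((day, val + extra))
    return adjusted


dailyt = [(date(2014, 5, 22), 10), (date(2014, 5, 21), 9.5), (date(2014, 5, 20), 9),
          (date(2014, 5, 19), 8), (date(2014, 5, 18), 7.5), (date(2014, 5, 17), 6.5)]
periodt = [(date(2014, 5, 21), 2), (date(2014, 5, 18), 1)]
for row in add_latest_period(dailyt, periodt):
    print(row)
```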
If there are relatively few periodt rows, then there is an option that may prove quite efficient.
Convert periodt into a From/To ranges table using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.
;WITH PIds AS (
    SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
    FROM @periodt
),
PRange AS (
    SELECT f.PDate AS FromDate, t.PDate as ToDate, f.PVal
    FROM PIds f
    LEFT OUTER JOIN PIds t ON
        t.RN = f.RN + 1
)
SELECT d.*, p.PVal
FROM @dailyt d
LEFT OUTER JOIN PRange p ON
    d.DDate >= p.FromDate
    AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC
If you want to try the query, the following produces the sample data using table variables. Note I added an extra row to dailyt to demonstrate no periodt entries with a smaller date.
DECLARE @dailyt table (
    DDate date NOT NULL,
    DVal float NOT NULL
)
INSERT INTO @dailyt(DDate, DVal)
SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5
DECLARE @periodt table (
    PDate date NOT NULL,
    PVal int NOT NULL
)
INSERT INTO @periodt
SELECT '20140521', 2
UNION ALL SELECT '20140518', 1

Show data from table even if there is no data!! Oracle

I have a query which shows count of messages received based on dates.
For Eg:
1 | 1-May-2012
3 | 3-May-2012
4 | 6-May-2012
7 | 7-May-2012
9 | 9-May-2012
5 | 10-May-2012
1 | 12-May-2012
As you can see on some dates there are no messages received. What I want is it should show all the dates and if there are no messages received it should show 0 like this
1 | 1-May-2012
0 | 2-May-2012
3 | 3-May-2012
0 | 4-May-2012
0 | 5-May-2012
4 | 6-May-2012
7 | 7-May-2012
0 | 8-May-2012
9 | 9-May-2012
5 | 10-May-2012
0 | 11-May-2012
1 | 12-May-2012
How can I achieve this when there are no rows in the table?
First, it sounds like your application would benefit from a calendar table. A calendar table is a list of dates and information about the dates.
Second, you can do this without using temporary tables. Here is the approach:
with constants as (select min(<thedate>) as firstdate from <table>),
     dates as (select firstdate + rn - 1 as thedate
               from (select rownum as rn, firstdate
                     from <table> cross join constants
                     where rownum < sysdate - firstdate + 1
                    ) seq
              )
select dates.thedate, count(t.date)
from dates left outer join
     <table> t
     on t.date = dates.thedate
group by dates.thedate
Here is the idea. The alias constants records the earliest date in your table. The alias dates then creates a sequence of dates. The inner subquery calculates a sequence of integers, using rownum, and then adds these to the first date. Note this assumes that you have on average at least one transaction per date. If not, you can use a bigger table.
The final part is the join that is used to bring back information about the dates. Note the use of count(t.date) instead of count(*). This counts the number of records in your table, which should be 0 for dates with no data.
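The shape of that approach (build the full list of dates, then left-join the counts and default to 0) can be sketched in Python to make the expected result concrete. This is an illustration only; `counts_per_day` is a hypothetical name and the message dates are rebuilt from the counts shown in the question.

```python
from collections import Counter
from datetime import date, timedelta


def counts_per_day(message_dates, first, last):
    """Return (day, count) for every day from first to last inclusive,
    with 0 for days that have no messages."""
    counts = Counter(message_dates)
    total_days = (last - first).days + 1
    calendar = [first + timedelta(days=i) for i in range(total_days)]
    return [(day, counts.get(day, 0)) for day in calendar]


# One list entry per received message, matching the counts in the question
sample = {1: 1, 3: 3, 6: 4, 7: 7, 9: 9, 10: 5, 12: 1}
message_dates = [date(2012, 5, d) for d, n in sample.items() for _ in range(n)]
for day, n in counts_per_day(message_dates, date(2012, 5, 1), date(2012, 5, 12)):
    print(n, "|", day)
```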
You don't need a separate table for this, you can create what you need in the query. This works for May:
WITH month_may AS (
select to_date('2012-05-01', 'yyyy-mm-dd') + level - 1 AS the_date
from dual
connect by level <= 31
)
SELECT *
FROM month_may mm
LEFT JOIN mytable t ON t.some_date = mm.the_date
The date range will depend on how exactly you want to do this and what your range is.
You could achieve this with a left outer join IF you had another table to join to that contains all possible dates.
One option might be to generate the dates in a temp table and join that to your query.
Something like this might do the trick.
CREATE TABLE #TempA (Col1 DateTime)
DECLARE @start DATETIME = convert(datetime, convert(nvarchar(10), getdate(), 121))
SELECT @start
DECLARE @counter INT = 0
WHILE @counter < 50
BEGIN
    INSERT INTO #TempA (Col1) VALUES (@start)
    SET @start = DATEADD(DAY, 1, @start)
    SET @counter = @counter+1
END
That will create a TempTable to hold the dates... I've just generated 50 of them starting from today.
SELECT
a.Col1,
COUNT(b.MessageID)
FROM
#TempA a
LEFT OUTER JOIN YOUR_MESSAGE_TABLE b
ON a.Col1 = b.DateColumn
GROUP BY
a.Col1
Then you can left join your message counts to that.