Left join with complex join clause - sql

I have two tables and want to left join them.
I want all entries from the account table, but only those rows from the right table that match a criterion. If no row matches, I want only the account.
The following does not work as expected:
SELECT * FROM Account a
LEFT JOIN
Entries ef ON ef.account_id = a.account_id AND
(ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.entry_period_end_date IS NULL
OR
ef.forecast_period_end IS NULL
)
It also returns rows from the Entries table that are outside the requested period.
Example Data:
Account Table
AccountID | AccountName
1         | Test
2         | Foobar
3         | Test1
4         | Foobar2
Entries Table
id | AccountID | entry_period_end_date | forecast_period_end | amount
1  | 1         | 12/31/2009            | 12/31/2009          | 100
2  | 1         | NULL                  | 10/31/2009          | 150
3  | 2         | NULL                  | NULL                | 200
4  | 3         | 10/31/2009            | NULL                | 250
5  | 4         | 10/31/2009            | 10/31/2009          | 300
So the query should return (when I set startDate = 12/01/2009 and endDate = 12/31/2009):
AccountID | id
1         | 1
2         | NULL
3         | NULL
4         | NULL
Thx,
Martin

If either entry_period_end_date or forecast_period_end is NULL, the row will be returned, even if your other, non-NULL column is not within the period.
Probably you meant this:
SELECT *
FROM Account a
LEFT JOIN
Entries ef
ON ef.account_id = a.account_id
AND
(
entry_period_end_date BETWEEN …
OR forecast_period_end BETWEEN …
)
This returns all rows that have either entry_period_end_date or forecast_period_end within the given period.
Update:
A test script:
CREATE TABLE account (AccountID INT NOT NULL, AccountName VARCHAR(100) NOT NULL);
INSERT
INTO account
VALUES
(1, 'Test'),
(2, 'Foobar'),
(3, 'Test1'),
(4, 'Foobar1');
CREATE TABLE Entries (id INT NOT NULL, AccountID INT NOT NULL, entry_period_end_date DATETIME, forecast_period_end DATETIME, amount FLOAT NOT NULL);
INSERT
INTO Entries
VALUES
(1, 1, '2009-12-31', '2009-12-31', 100),
(2, 1, NULL, '2009-10-31', 100),
(3, 2, NULL, NULL, 100),
(4, 3, '2009-10-31', NULL, 100),
(5, 4, '2009-10-31', '2009-10-31', 100);
SELECT a.*, ef.id
FROM Account a
LEFT JOIN
Entries ef
ON ef.accountID = a.accountID
AND
(
entry_period_end_date BETWEEN '2009-12-01' AND '2009-12-31'
OR forecast_period_end BETWEEN '2009-12-01' AND '2009-12-31'
);
returns the following:
1, 'Test', 1
2, 'Foobar', NULL
3, 'Test1', NULL
4, 'Foobar1', NULL
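The test script above can be replayed locally. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) instead of MySQL, stores the dates as ISO text, and uses a hypothetical helper name accounts_with_entries; it is an illustration, not the original poster's setup.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (AccountID INTEGER NOT NULL, AccountName TEXT NOT NULL);
INSERT INTO Account VALUES (1, 'Test'), (2, 'Foobar'), (3, 'Test1'), (4, 'Foobar1');
CREATE TABLE Entries (
    id INTEGER NOT NULL,
    AccountID INTEGER NOT NULL,
    entry_period_end_date TEXT,   -- ISO dates stored as text
    forecast_period_end TEXT,
    amount REAL NOT NULL
);
INSERT INTO Entries VALUES
    (1, 1, '2009-12-31', '2009-12-31', 100),
    (2, 1, NULL,         '2009-10-31', 100),
    (3, 2, NULL,         NULL,         100),
    (4, 3, '2009-10-31', NULL,         100),
    (5, 4, '2009-10-31', '2009-10-31', 100);
""")

# The period filter lives in the ON clause, so accounts without a matching
# entry are kept and get NULL for the entry columns.
QUERY = """
SELECT a.AccountID, a.AccountName, ef.id
FROM Account a
LEFT JOIN Entries ef
       ON ef.AccountID = a.AccountID
      AND (ef.entry_period_end_date BETWEEN ? AND ?
           OR ef.forecast_period_end BETWEEN ? AND ?)
ORDER BY a.AccountID
"""

def accounts_with_entries(start, end):
    """Hypothetical helper: every account, plus entry ids inside the period."""
    return conn.execute(QUERY, (start, end, start, end)).fetchall()

rows = accounts_with_entries("2009-12-01", "2009-12-31")
for row in rows:
    print(row)
```

Because a comparison against NULL is never true, entries 2, 3 and 4 do not satisfy the ON condition, and their accounts appear once with a NULL entry id.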

Edited to fix logic so end date logic is grouped together, then forecast period logic...
Now it should check for a "good" end date (null or within range), then check for a "good" forecast date (null or within range)
Since all the logic is on the Entries table, narrow it down first, then join
SELECT a.*,temp.id FROM Account a
LEFT JOIN
(
SELECT id, account_id
FROM Entries ef
WHERE
(ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.entry_period_end_date IS NULL
)
AND
(ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.forecast_period_end IS NULL
)
) temp
ON a.account_id = temp.account_id

Related

join equal or max

I want to join two tables on 'EmployeeId' and 'CalendarMonthId', but if no row with an equal CalendarMonthId exists, I want to join to the record with the biggest CalendarMonthId. I use an Azure Synapse SQL pool.
Example data and code
create table emp_contr (
EmployeeId int,
CalendarMonthId int,
IsDeleted int
--login varchar(20)
)
create table contr (
ContractId int,
EmployeeId int,
CalendarMonthId int,
value int
)
insert into emp_contr values (1, 202201, 0)
insert into emp_contr values (1, 202202, 0)
insert into emp_contr values (1, 202205, 0)
insert into emp_contr values (1, 202206, 0)
insert into emp_contr values (2, 202202, 0)
insert into emp_contr values (2, 202203, 0)
insert into contr values (1, 1, 202201, 5)
insert into contr values (2, 1, 202202, 2)
insert into contr values (40, 2, 202202, 2)
insert into contr values (50, 2, 202203, 0)
Based on this data, my problem is joining the emp_contr row with EmployeeId 1 and CalendarMonthId 202205 to the contr row for the same employee with the maximum CalendarMonthId, 202202. I have a query like this, but it doesn't work:
select *
from emp_contr ec
join contr c
on ec.EmployeeId = c.EmployeeId
and (ec.CalendarMonthId = c.CalendarMonthId or ec.CalendarMonthId > max(c.CalendarMonthId))
order by ec.EmployeeId
I expect result like this:
EmployeeId | Emp_Contr_CalendarMonthId | Contr_CalendarMonthId | Value | ContractId
1          | 202206                    | 202202                | 2     | 2
1          | 202205                    | 202202                | 2     | 2
1          | 202202                    | 202202                | 2     | 2
1          | 202201                    | 202201                | 5     | 1
2          | 202203                    | 202203                | 0     | 50
2          | 202202                    | 202202                | 2     | 40
I don't know what I should do. Should I perhaps add rows to a temporary table based on contr, with the same values as the row with the biggest CalendarMonthId but a different CalendarMonthId?
TL;DR:
select ec.EmployeeId, ec.CalendarMonthId Emp_Contr_CalendarMonthId,
coalesce(c.CalendarMonthId,c2.CalendarMonthId) Contr_CalendarMonthId,
coalesce(c.value,c2.value) value,
coalesce(c.ContractId,c2.ContractId) ContractId,
-- additional columns for debugging
c.CalendarMonthId, c.value, c.ContractId,
c2.CalendarMonthId, c2.value, c2.ContractId
from emp_contr ec
left join contr c
on ec.EmployeeId = c.EmployeeId
and ec.CalendarMonthId = c.CalendarMonthId
join (
select *
from (
select EmployeeId,
CalendarMonthId,
value,
ContractId,
row_number() over (partition by EmployeeId order by CalendarMonthId desc) rn
from contr
) x
where rn = 1
) c2 on ec.EmployeeId = c2.EmployeeId
order by ec.EmployeeId, ec.CalendarMonthId desc
The coalesce assumes all relevant columns are NOT NULL. If this is not the case you'll have to replace them with CASE-expressions in which you check if c.CalendarMonthId is null.
Explanation
The statement joins contr twice.
The first join is an outer join that picks up the exactly matching contr row when one exists, without losing emp_contr rows that have no match.
The second join is an inner join, which assumes there is at least one contr row per employee in emp_contr. It joins an inline view that uses the window function row_number() to keep only the latest row per employee when ordered by CalendarMonthId.
The last step combines the results: take the values from the first join if present, otherwise the ones from the second join.
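The two-join approach can be tried against the sample data from the question. This is a sketch that assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module rather than an Azure Synapse SQL pool, and it leaves out the debugging columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Synapse (assumption)
conn.executescript("""
CREATE TABLE emp_contr (EmployeeId INTEGER, CalendarMonthId INTEGER, IsDeleted INTEGER);
CREATE TABLE contr (ContractId INTEGER, EmployeeId INTEGER, CalendarMonthId INTEGER, value INTEGER);
INSERT INTO emp_contr VALUES
    (1, 202201, 0), (1, 202202, 0), (1, 202205, 0), (1, 202206, 0),
    (2, 202202, 0), (2, 202203, 0);
INSERT INTO contr VALUES
    (1, 1, 202201, 5), (2, 1, 202202, 2), (40, 2, 202202, 2), (50, 2, 202203, 0);
""")

rows = conn.execute("""
SELECT ec.EmployeeId,
       ec.CalendarMonthId                              AS Emp_Contr_CalendarMonthId,
       COALESCE(c.CalendarMonthId, c2.CalendarMonthId) AS Contr_CalendarMonthId,
       COALESCE(c.value, c2.value)                     AS value,
       COALESCE(c.ContractId, c2.ContractId)           AS ContractId
FROM emp_contr ec
-- first join: exact month match, optional
LEFT JOIN contr c
       ON c.EmployeeId = ec.EmployeeId
      AND c.CalendarMonthId = ec.CalendarMonthId
-- second join: the latest contr row per employee, used as the fallback
JOIN (
    SELECT EmployeeId, CalendarMonthId, value, ContractId
    FROM (
        SELECT EmployeeId, CalendarMonthId, value, ContractId,
               ROW_NUMBER() OVER (PARTITION BY EmployeeId
                                  ORDER BY CalendarMonthId DESC) AS rn
        FROM contr
    ) x
    WHERE rn = 1
) c2 ON c2.EmployeeId = ec.EmployeeId
ORDER BY ec.EmployeeId, ec.CalendarMonthId DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that the fallback is always the employee's latest contr row overall, which matches the expected result in the question; it is not restricted to contr months earlier than the emp_contr month.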

Use select in group by statement in Firebird

I'm using a Firebird database which has the following tables:
ARTICULOS
ProductId | longSKU
1         | A22121000125
2         | A22121000138
3         | A22123001508
4         | A22124002001
TALLESPORARTICULOS
ProductId | position | Sizes
1         | 1        | Small
1         | 2        | Medium
1         | 3        | Large
1         | 4        | Xtra Large
1         | 5        | XXtra Large
2         | 1        | Small
2         | 2        | Medium
2         | 3        | Large
2         | 4        | Xtra Large
2         | 5        | XXtra Large
3         | 1        | 02
3         | 2        | 04
3         | 3        | 06
3         | 4        | 08
and
RANGOSTALLE
ProductId | FromPosition | ToPosition | Price
1         | 1            | 3          | 500
1         | 4            | 5          | 600
2         | 1            | 3          | 500
2         | 4            | 5          | 600
3         | 1            | 4          | 200
I want to be able to group by a substring (shortSKU) of the longSKU and be able to get for each shortSKU the corresponding ranges and prices.
like this example:
ShortSKU   | SizeFrom   | SizeTo      | Price
A221210001 | small      | large       | 500
A221210001 | xtra large | xxtra large | 600
A221230015 | 02         | 08          | 200
I'm using the following code but I get the error:
Dynamic SQL Error.
SQL error code = -104.
Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).
CREATE OR ALTER VIEW RANGOSPARACOSTOSYPRECIOS(
SHORTSKU,
SIZEFROM,
SIZETO,
PRICE ) AS select substring(ar.codigoparticular from 1 for 10) AS SHORTSKU,
( Select TAL.SIZE
From tallesporarticulos TAL
Where TAL.productid=Ar.productid
and TAL.position= RT.FromPosition) as SIZEFROM,
( Select TAL.SIZE
From tallesporarticulos TAL
Where TAL.productid=Ar.productid
and TAL.position= RT.ToPosition) as SIZETO,
max(RT.PRICE)
from Articulos Ar
Inner Join tallesporarticulos TA On Ar.productId = TA.productId
Inner Join rangostalle RT On AR.productId = RT.productId
GROUP BY SHORTSKU, SIZEFROM, SIZETO ;
The following code works, but I need to replace the "fromposition" and "ToPosition" values with the size value like the code above, and that's when I get the error message.
CREATE OR ALTER VIEW RANGOSPARACOSTOSYPRECIOS(
SHORTSKU,
SIZEFROM,
SIZETO,
PRICE ) AS select substring(ar.codigoparticular from 1 for 10) AS SHORTSKU,
RT.FromPosition as SIZEFROM,
RT.ToPosition as SIZETO,
max(RT.PRICE)
from Articulos Ar
Inner Join tallesporarticulos TA On Ar.productId = TA.productId
Inner Join rangostalle RT On AR.productId = RT.productId
GROUP BY SHORTSKU, SIZEFROM, SIZETO ;
For anyone interested in helping, here you have the insert data from the tables above.
CREATE TABLE articulos (
ProductId INTEGER PRIMARY KEY,
LongSKU varchar(12) NOT NULL
);
INSERT INTO articulos VALUES (1, 'A22121000125');
INSERT INTO articulos VALUES (2, 'A22121000138');
INSERT INTO articulos VALUES (3, 'A22123001508');
INSERT INTO articulos VALUES (4, 'A22124002001');
CREATE TABLE TALLESPORARTICULOS (
ProductId INTEGER NOT NULL,
Position INTEGER NOT NULL,
Sizes varchar(12) NOT NULL
);
INSERT INTO TALLESPORARTICULOS (ProductId, position, Sizes) VALUES
(1, 1, 'SMALL'),
(1, 2, 'MEDIUM'),
(1, 3, 'LARGE'),
(1, 4, 'XTRALARGE'),
(1, 1, 'XXTRALARGE'),
(2, 2, 'SMALL'),
(2, 3, 'MEDIUM'),
(2, 4, 'LARGE'),
(2, 5, 'XTRALARGE'),
(2, 5, 'XXTRALARGE'),
(3, 1, '02'),
(3, 2, '03'),
(3, 3, '04'),
(3, 4, '05');
CREATE TABLE RANGOSTALLE (
ProductId INTEGER NOT NULL,
FromPosition INTEGER NOT NULL,
ToPosition INTEGER NOT NULL,
Price double not null
);
INSERT INTO RANGOSTALLE (ProductId,FromPosition,ToPosition,Price) VALUES
(1, 1,3,500),
(1, 4,5,600),
(2, 1,3,500),
(2, 4,5,600),
(3, 1,4,200);
Your script contains quite a few errors. After fixing them the query is rather trivial:
select substring(LongSKU from 1 for 10), low.sizes, high.sizes, avg(price)
from articulos join RANGOSTALLE on articulos.ProductId = RANGOSTALLE.ProductId
join TALLESPORARTICULOS low on RANGOSTALLE.ProductId = low.ProductId and RANGOSTALLE.FromPosition = low.Prodposition
join TALLESPORARTICULOS high on RANGOSTALLE.ProductId = high.ProductId and RANGOSTALLE.ToPosition = high.Prodposition
group by 1,2,3
https://dbfiddle.uk/?rdbms=firebird_3.0&fiddle=ae54a7d897da4604396775e3ddc4b764
This query can be optimized by moving grouping into a derived table but such optimization highly depends on the real table structure and query requirements.
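To see the double join on TALLESPORARTICULOS in action, here is a sketch that assumes SQLite (Python's sqlite3 module) in place of Firebird, so substring(... from 1 for 10) becomes substr(..., 1, 10). The data is taken from the tables shown in the question rather than from the insert script, and the column name Prodposition follows the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Firebird (assumption)
conn.executescript("""
CREATE TABLE articulos (ProductId INTEGER PRIMARY KEY, LongSKU TEXT NOT NULL);
INSERT INTO articulos VALUES
    (1, 'A22121000125'), (2, 'A22121000138'), (3, 'A22123001508'), (4, 'A22124002001');
CREATE TABLE tallesporarticulos (
    ProductId INTEGER NOT NULL, Prodposition INTEGER NOT NULL, Sizes TEXT NOT NULL);
INSERT INTO tallesporarticulos VALUES
    (1, 1, 'Small'), (1, 2, 'Medium'), (1, 3, 'Large'), (1, 4, 'Xtra Large'), (1, 5, 'XXtra Large'),
    (2, 1, 'Small'), (2, 2, 'Medium'), (2, 3, 'Large'), (2, 4, 'Xtra Large'), (2, 5, 'XXtra Large'),
    (3, 1, '02'), (3, 2, '04'), (3, 3, '06'), (3, 4, '08');
CREATE TABLE rangostalle (
    ProductId INTEGER NOT NULL, FromPosition INTEGER NOT NULL,
    ToPosition INTEGER NOT NULL, Price REAL NOT NULL);
INSERT INTO rangostalle VALUES
    (1, 1, 3, 500), (1, 4, 5, 600), (2, 1, 3, 500), (2, 4, 5, 600), (3, 1, 4, 200);
""")

# The size table is joined twice: once for the lower bound of the range
# (alias low) and once for the upper bound (alias high).
rows = conn.execute("""
SELECT substr(a.LongSKU, 1, 10) AS ShortSKU,
       low.Sizes                AS SizeFrom,
       high.Sizes               AS SizeTo,
       AVG(r.Price)             AS Price
FROM articulos a
JOIN rangostalle r
  ON r.ProductId = a.ProductId
JOIN tallesporarticulos low
  ON low.ProductId = r.ProductId AND low.Prodposition = r.FromPosition
JOIN tallesporarticulos high
  ON high.ProductId = r.ProductId AND high.Prodposition = r.ToPosition
GROUP BY substr(a.LongSKU, 1, 10), low.Sizes, high.Sizes
ORDER BY ShortSKU, Price
""").fetchall()

for row in rows:
    print(row)
```

Product 4 has no price range, so the inner joins drop it, and products 1 and 2 collapse into a single ShortSKU group per size range.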

How do I get a correct average number of appointments per day?

I want to see the average number of appointments per day for each appointment type. Basically I have the following tables and columns:
Table 1 - Dates
-----------
Date date (primary key)
Table 2 - Appointments
-----------
AppointmentStart Datetime
ApptId Numeric
FacilityId Numeric
ApptKind Numeric
Appointmentid Numeric
Table 3 AppointmentType
-----------
ApptTypeId Numeric
Name Varchar
Sample Data
============
Table 1 Date
---------------
date
1/1/2017
1/2/2017
...
Table 2 Appointment
----------------
ApptStart | ApptTypeId | FacilityId | ApptKind | ApptId
2017-1-1 9:00:00 1 2 1 2385525
2017-1-1 9:15:00 3 2 1 2385526
2017-1-1 9:30:00 2 2 1 2385527
...
Table 3 ApptType
-----------------
ApptTypeId | Name
1 Walk-in
2 MAT
3 Acute
...
There are about 30 different appointment types and not all of them occur every day. So far I have created a table that lists every date in the time range that I want then I do a left join with the count of appointments (nulls equal 0). I also remove Saturdays and Sundays. This works really well for one appointment type but when I do this with multiple appointment types zeroes only show up for the days where there are no appointments.
My solution:
Somehow insert each appointment type next to each day then do the left join with the NULL = 0 part although I don't know how to get the list to repeat for each day in the table.
Example:
At the end I want
EndResult
----------
Average(Count(appts)) | ApptType.Name
OR
EndResult
---------
Count(apptid) | ApptType.Name | Date
5 Acute 1/1/2017
0 MAT 1/1/2017
4 Walk-in 1/1/2017
0 Other 1/1/2017
Then repeat for the next day with the same appointment type names
This is how I would write a query that gets you to
End Result #2:
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
-- every date paired with every appointment type
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join on A so the type name is also available for rows with no appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
This assumes that ApptTypeID is indeed part of Table2. You can wrap this result up further to get your End Result #1:
-- * 1.0 avoids an integer average of the integer counts
SELECT Avg(D.ApptCount * 1.0), D.ApptTypeName
FROM (
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join on A so the type name is also available for rows with no appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
) AS D
GROUP BY D.ApptTypeName
First we declare and populate table variables for example data.
DECLARE @Dates TABLE (
Date DATE
)
INSERT @Dates
VALUES
('2017-01-01')
,('2017-01-02')
DECLARE @Appointments TABLE (
AppointmentStart DATETIME
,ApptId INT
,FacilityId INT
,ApptKind INT
,Appointmentid INT
)
INSERT @Appointments
VALUES
('2017-01-01 09:00:00.000', 1, 2, 1, 2385525)
,('2017-01-01 09:15:00.000', 3, 2, 1, 2385526)
,('2017-01-01 09:30:00.000', 2, 2, 1, 2385527)
DECLARE @ApptType TABLE (
ApptTypeId INT
,Name VARCHAR(32)
)
INSERT @ApptType
VALUES
(1, 'Walk-in')
,(2, 'MAT')
,(3, 'Acute')
This shows us the cartesian product of a full outer join of Dates and ApptType.
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1
We can use the cartesian product as our left data set, and count the number of items in our right data set (#Appointments). By doing this with a left join, we ensure that every date/appointment type combination is included, even if there were no appointments of that type on that date.
SELECT
A.[Date]
,A.[Name]
,COUNT(B.Appointmentid)
FROM (
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1) AS A
LEFT JOIN @Appointments AS B
ON A.[ApptTypeId] = B.[ApptId]
AND A.[Date] = CAST(B.[AppointmentStart] AS DATE)
GROUP BY
A.[Date]
,A.[Name]
ORDER BY
A.[Date]
,A.[Name]
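The same idea, first building every date and type combination and then left joining the appointments, can be sketched outside SQL Server. The example below assumes SQLite through Python's sqlite3 module, uses CROSS JOIN where the answer uses FULL OUTER JOIN ... ON 1 = 1, and names the type column on the appointments table ApptTypeId as in the question's sample data. It also adds the per-type average the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE Dates (Date TEXT PRIMARY KEY);
INSERT INTO Dates VALUES ('2017-01-01'), ('2017-01-02');
CREATE TABLE ApptType (ApptTypeId INTEGER, Name TEXT);
INSERT INTO ApptType VALUES (1, 'Walk-in'), (2, 'MAT'), (3, 'Acute');
CREATE TABLE Appointments (
    AppointmentStart TEXT, ApptTypeId INTEGER,
    FacilityId INTEGER, ApptKind INTEGER, ApptId INTEGER);
INSERT INTO Appointments VALUES
    ('2017-01-01 09:00:00', 1, 2, 1, 2385525),
    ('2017-01-01 09:15:00', 3, 2, 1, 2385526),
    ('2017-01-01 09:30:00', 2, 2, 1, 2385527);
""")

# One row per (date, type); COUNT over the nullable right side yields 0
# when no appointment of that type exists on that date.
PER_DAY = """
SELECT d.Date AS Date, t.Name AS Name, COUNT(a.ApptId) AS ApptCount
FROM Dates d
CROSS JOIN ApptType t
LEFT JOIN Appointments a
       ON a.ApptTypeId = t.ApptTypeId
      AND date(a.AppointmentStart) = d.Date
GROUP BY d.Date, t.ApptTypeId, t.Name
"""

per_day = conn.execute(
    f"SELECT Date, Name, ApptCount FROM ({PER_DAY}) AS p ORDER BY Date, Name"
).fetchall()

# Averaging the per-day counts includes the zero days in the denominator.
averages = conn.execute(
    f"SELECT Name, AVG(ApptCount) FROM ({PER_DAY}) AS p GROUP BY Name ORDER BY Name"
).fetchall()

print(per_day)
print(averages)
```

With three appointments on the first day and none on the second, each type averages 0.5 appointments per day; without the zero rows the average would wrongly come out as 1.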

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can I select the top count of TABLE1 (the most used departure id and the most used arrival id) and combine it with the respective ICAO code from TABLE2, so that from the provided example data I get:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what I try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
To inspect how the query is executed, enable the toolbar option that includes the actual execution plan before you run the query. [Screenshot of the toolbar button not available.]
In the graphical execution plan for the query [image not available], each icon represents an operation that will be performed by the SQL Server engine. The arrows represent data flows. The direction of flow is from right to left, so the result is the leftmost icon.
Try this one:
select
(select icao
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select icao
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures
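Both answers rely on two independent scalar subqueries, one per column. A small sketch of that pattern follows, assuming SQLite via Python's sqlite3 module, so TOP 1 is written as LIMIT 1; the table names Icao and Flight follow the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE Icao (id INTEGER PRIMARY KEY, icao TEXT NOT NULL);
CREATE TABLE Flight (
    dep_id INTEGER NOT NULL REFERENCES Icao(id),
    arr_id INTEGER NOT NULL REFERENCES Icao(id));
INSERT INTO Icao VALUES
    (1, 'LPPT'), (2, 'LPFR'), (3, 'LPMA'), (4, 'LPPR'),
    (5, 'LLGB'), (6, 'LEPA'), (7, 'LEMD');
INSERT INTO Flight VALUES
    (1, 2), (6, 2), (6, 2), (6, 2), (6, 2), (3, 2),
    (3, 2), (3, 2), (3, 4), (3, 4), (3, 6), (3, 6);
""")

# Each scalar subquery finds the most frequent id on its own,
# then looks up the matching ICAO code.
result = conn.execute("""
SELECT
    (SELECT i.icao FROM Icao i
      WHERE i.id = (SELECT arr_id FROM Flight
                    GROUP BY arr_id ORDER BY COUNT(*) DESC LIMIT 1)) AS most_arrivals,
    (SELECT i.icao FROM Icao i
      WHERE i.id = (SELECT dep_id FROM Flight
                    GROUP BY dep_id ORDER BY COUNT(*) DESC LIMIT 1)) AS most_departures
""").fetchone()

print(result)
```

If two airports tie for first place, which one is returned is not defined by this query; adding a secondary sort key would make the choice deterministic.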

Calculating a field's value according to the values of the previous and next fields

For clarity assume that I have a table with a carID, a mileage and a date. The dates are always months (eg 01/02/2015, 01/03/2015, ...). Each carID has a row for each month, but not each row has values for the mileage field, some are NULL.
Example table:
carID mileage date
-----------------------------------------
1 400 01/01/2015
2 NULL 01/02/2015
3 NULL 01/03/2015
4 1050 01/04/2015
If such a field is NULL I need to calculate what value it should have by looking at the previous and next values (these aren't necessarily the next or previous month, they can be months apart).
I want to do this by taking the difference between the previous and next values, calculating the time between them, and scaling the value according to the elapsed time. However, I have no idea how to do this.
I have already used a bit of code to look at the next value before, it looks like this:
, carKMcombiDiffList as (
select ml.*,
(ml.KM - mlprev.KM) as diff
from carKMcombilist ml outer apply
(select top 1 ml2.*
from carKMcombilist ml2
where ml2.FK_CarID = ml.FK_CarID and
ml2.beginmonth < ml.beginmonth
order by ml2.beginmonth desc
) mlprev
)
What this does is check whether the current value is larger than the previous value. I assume I can use this to look at the previous value in my current problem as well; I just don't know how to add the next value AND all the logic I need for the calculations.
Assumption: CarID and date are always a unique combination
This is what I came up with:
select with_dates.*,
prev_mileage.mileage as prev_mileage,
next_mileage.mileage as next_mileage,
next_mileage.mileage - prev_mileage.mileage as mileage_delta,
datediff(month,prev_d,next_d) as month_delta,
-- * 1.0 avoids integer division in case mileage is an integer column
prev_mileage.mileage + (next_mileage.mileage - prev_mileage.mileage) * 1.0 / datediff(month,prev_d,next_d) * datediff(month,prev_d,with_dates.d) as estimated_mileage
from (select *,
(select top 1 d
from mileage as prev
where carid = c.carid
and prev.d < c.d
and prev.mileage is not null
order by d desc ) as prev_d,
(select top 1 d
from mileage as next_rec
where carid = c.carid
and next_rec.d > c.d
and next_rec.mileage is not null
order by d asc) as next_d
from mileage as c
where mileage is null) as with_dates
join mileage as prev_mileage
on prev_mileage.carid = with_dates.carid
and prev_mileage.d = with_dates.prev_d
join mileage as next_mileage
on next_mileage.carid = with_dates.carid
and next_mileage.d = with_dates.next_d
Logic:
First, for every record where mileage is null, I select the previous and next dates where mileage is not null. After this I just join the rows based on carid and date and do some simple math to approximate the value.
Hope this helps, it was quite fun.
The following query obtains the previous and next available mileages for a record.
with data as --test data
(
select * from (VALUES
(0, null, getdate()),
(1, 400, '20150101'),
(1, null, '20150201'),
(1, null, '20150301'),
(1, 1050, '20150401'),
(2, 300, '20150101'),
(2, null, '20150201'),
(2, null, '20150301'),
(2, 1235, '20150401'),
(2, null, '20150501'),
(2, 1450, '20150601'),
(3, 200, '20150101'),
(3, null, '20150201')
) as v(carId, mileage, [date])
where v.carId != 0
)
-- replace 'data' with your table name
select d.*,
(select top 1 mileage from data dprev where dprev.mileage is not null and dprev.carId = d.carId and dprev.[date] <= d.date order by dprev.[date] desc) as 'Prev available mileage',
(select top 1 mileage from data dnext where dnext.mileage is not null and dnext.carId = d.carId and dnext.[date] >= d.date order by dnext.[date] asc) as 'Next available mileage'
from data d
Note that these columns can still be null if there is no data available before/after a specific date.
From here it's up to you on how you use these values. Probably you want to interpolate values for records where mileage is missing.
Edit
In order to interpolate the values for missing mileages I had to compute three auxiliary columns:
ri - index of record in a continuous group where mileage is missing
gi - index of a continuous group where mileage is missing per car
gc - count of records per continuous group where mileage is missing
The limit columns from the query above were renamed to pa (Previous Available) and na (Next Available).
The query is not compact and I am sure it can be improved but the good part of the cascading CTEs is that you can easily check intermediary results and understand each step.
SQL Fiddle: SO 29363187
with data as --test data
(
select * from (VALUES
(0, null, getdate()),
(1, 400, '20150101'),
(1, null, '20150201'),
(1, null, '20150301'),
(1, 1050, '20150401'),
(2, 300, '20150101'),
(2, null, '20150201'),
(2, null, '20150301'),
(2, 1235, '20150401'),
(2, null, '20150501'),
(2, 1450, '20150601'),
(3, 200, '20150101'),
(3, null, '20150201')
) as v(carId, mileage, [date])
where v.carId != 0
),
-- replace 'data' with your table name
limits AS
(
select d.*,
(select top 1 mileage from data dprev where dprev.mileage is not null and dprev.carId = d.carId and dprev.[date] <= d.date order by dprev.[date] desc) as pa,
(select top 1 mileage from data dnext where dnext.mileage is not null and dnext.carId = d.carId and dnext.[date] >= d.date order by dnext.[date] asc) as na
from data d
),
t1 as
(
SELECT l.*,
case when mileage is not null
then null
else row_number() over (partition by l.carId, l.pa, l.na order by l.carId, l.[date])
end as ri, -- index of record in a continuous group where mileage is missing
case when mileage is not null
then null
else dense_rank() over (partition by carId order by l.carId, l.pa, l.na)
end as gi -- index of a continuous group where mileage is missing per car
from limits l
),
t2 as
(
select *,
(select count(*) from t1 tm where tm.carId = t.carId and tm.gi = t.gi) gc --count of records per continuous group where mileage is missing
FROM t1 t
)
select *,
case when mileage is NULL
then pa + (na - pa) / (gc + 1.0) * ri -- also converts from integer to decimal
else NULL
end as 'Interpolated value'
from t2
order by carId, [date]
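The interpolation expression pa + (na - pa) / (gc + 1.0) * ri can be checked by hand outside the database. Below is a minimal sketch in plain Python, assuming one car's readings are already sorted by date with one row per month; the function name interpolate_gaps is hypothetical.

```python
def interpolate_gaps(mileages):
    """Fill None entries that lie between two known readings.

    Uses pa + (na - pa) / (gc + 1) * ri, where pa and na are the previous and
    next available readings, gc is the number of missing rows in the gap and
    ri is the 1-based position of the row inside the gap.
    """
    result = list(mileages)
    prev_idx = None  # index of the most recent known reading
    for idx, value in enumerate(mileages):
        if value is None:
            continue
        if prev_idx is not None and idx - prev_idx > 1:
            pa, na = mileages[prev_idx], value
            gc = idx - prev_idx - 1
            for ri in range(1, gc + 1):
                result[prev_idx + ri] = pa + (na - pa) / (gc + 1) * ri
        prev_idx = idx
    return result


# Test data for cars 1, 2 and 3 from the query above.
car1 = interpolate_gaps([400, None, None, 1050])
car2 = interpolate_gaps([300, None, None, 1235, None, 1450])
car3 = interpolate_gaps([200, None])  # no next reading: the gap stays None
print(car1, car2, car3)
```

As in the SQL version, a missing value with no later reading stays empty, since there is no upper bound to interpolate towards.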