Find closest match to value in another table - sql

I have a table_a with many rows and columns for each timestamp in PostgreSQL 13. I'm trying to find the row where the value in column X is closest to a benchmark value obtained from another table.
This second table has only a single benchmark value for each timestamp. For each timestamp, I need to return most of the columns of table_a. The query below works fine when supplying the value for the benchmark directly.
How can I get the benchmark value from table_b to use in this query?
Simply substituting table_b.benchmark with (SELECT benchmark FROM table_b WHERE table_a.timestamp = table_b.timestamp) results in a 'relation "t1" does not exist' error.
Could not figure out a working join either.
table_a:
+-----------------+-----+---------------+
| timestamp | x | other_columns |
+-----------------+-----+---------------+
| 2020-01-01 8:00 | 100 | |
| 2020-01-01 8:00 | 200 | |
| 2020-01-01 8:00 | 300 | |
| 2020-01-01 8:00 | 400 | |
| 2020-01-01 8:00 | 500 | |
| ... | | |
| 2020-01-01 9:00 | 100 | |
| 2020-01-01 9:00 | 200 | |
| 2020-01-01 9:00 | 300 | |
| 2020-01-01 9:00 | 400 | |
| 2020-01-01 9:00 | 500 | |
| ... | | |
+-----------------+-----+---------------+
table_b:
+-----------------+-----------+
| timestamp | benchmark |
+-----------------+-----------+
| 2020-01-01 8:00 | 340 |
| 2020-01-01 9:00 | 380 |
| ... | |
+-----------------+-----------+
Expected result:
+-----------------+-----+
| timestamp | x |
+-----------------+-----+
| 2020-01-01 8:00 | 300 |
| 2020-01-01 9:00 | 400 |
| ... | |
+-----------------+-----+
SQL query:
WITH date_filter AS (
SELECT *
FROM table_a
WHERE timestamp >= {start_date} and timestamp < {end_date}
)
SELECT DISTINCT t1.timestamp, t1.x, t1.etc
FROM date_filter AS t1
INNER JOIN (
SELECT timestamp, MIN(ABS(x - (table_b.benchmark))) AS target_value
FROM t1
GROUP BY timestamp
) AS t2
ON t2.timestamp = t1.timestamp AND t2.target_value = ABS(x - (table_b.benchmark))
ORDER BY timestamp ASC;

One option uses a lateral join:
select b.timestamp, a.x
from table_b b
cross join lateral (
select a.*
from table_a a
where a.timestamp = b.timestamp
order by abs(a.x - b.benchmark)
limit 1
) a
You can also use distinct on:
select distinct on (b.timestamp) b.timestamp, a.x
from table_b b
inner join table_a a on a.timestamp = b.timestamp
order by b.timestamp, abs(a.x - b.benchmark)

I would suggest a lateral join:
select b.*, a.x
from table_b b left join lateral
(select a.*
from table_a a
where a.timestamp = b.timestamp
order by abs(a.x - b.benchmark)
limit 1
) a
on 1=1;
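Not every engine has LATERAL or DISTINCT ON; SQLite, for instance, has neither. Below is a minimal sketch of the same per-timestamp nearest match that swaps in ROW_NUMBER() ordered by ABS(x - benchmark) and keeps rank 1. It runs against an in-memory SQLite database from Python using the question's sample data. Assumptions: the timestamp column is renamed ts (illustrative), timestamps are stored as zero-padded ISO text, ties go to the smaller x, and the bundled SQLite is 3.25 or newer, which window functions require.

```python
import sqlite3


def closest_per_timestamp(conn: sqlite3.Connection) -> list:
    """Return (ts, x) for the table_a row whose x is nearest to table_b's benchmark at that ts."""
    sql = """
        SELECT ts, x
        FROM (
            SELECT a.ts, a.x,
                   -- rank the rows of one timestamp by distance to that timestamp's benchmark;
                   -- a.x breaks ties so the result is deterministic
                   ROW_NUMBER() OVER (
                       PARTITION BY a.ts
                       ORDER BY ABS(a.x - b.benchmark), a.x
                   ) AS rn
            FROM table_a AS a
            JOIN table_b AS b ON b.ts = a.ts
        )
        WHERE rn = 1
        ORDER BY ts
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (ts TEXT, x INTEGER)")
conn.execute("CREATE TABLE table_b (ts TEXT, benchmark INTEGER)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [(ts, x) for ts in ("2020-01-01 08:00", "2020-01-01 09:00") for x in (100, 200, 300, 400, 500)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [("2020-01-01 08:00", 340), ("2020-01-01 09:00", 380)],
)
print(closest_per_timestamp(conn))  # [('2020-01-01 08:00', 300), ('2020-01-01 09:00', 400)]
```

Here the derived table does the work of the lateral subquery. The benchmark is joined once per table_a row, and the outer WHERE rn = 1 plays the role of LIMIT 1 (or DISTINCT ON) in the PostgreSQL answers.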

Related

Replace zero values with last available value in join

I'm using SQL Server.
The join statement:
select t1.A_NR, t1.V_DATE, t1.AMOUNT T1_AMOUNT, t2.AMOUNT T2_AMOUNT
from Table_1 t1
left join Table_2 t2 on t1.A_NR = t2.A_NR and t1.V_DATE = t2.V_DATE
returns this table, with null values in the T2_AMOUNT column:
+------+----------------------+-----------+-----------+
| A_NR | V_DATE | T1_AMOUNT | T2_AMOUNT |
+------+----------------------+-----------+-----------+
| 1 | 2020-01-01T00:00:00Z | 100 | 100 |
| 1 | 2020-01-02T00:00:00Z | 101 | (null) |
| 1 | 2020-01-03T00:00:00Z | 102 | (null) |
| 2 | 2020-01-01T00:00:00Z | 200 | 200 |
| 2 | 2020-01-02T00:00:00Z | 201 | (null) |
| 2 | 2020-01-03T00:00:00Z | 202 | (null) |
+------+----------------------+-----------+-----------+
I want to replace these values with the last available values from Table_2 like this:
+------+----------------------+-----------+-----------+
| A_NR | V_DATE | T1_AMOUNT | T2_AMOUNT |
+------+----------------------+-----------+-----------+
| 1 | 2020-01-01T00:00:00Z | 100 | 100 |
| 1 | 2020-01-02T00:00:00Z | 101 | 100 | --> value from 01.01.2020
| 1 | 2020-01-03T00:00:00Z | 102 | 100 | --> value from 01.01.2020
| 2 | 2020-01-01T00:00:00Z | 200 | 200 |
| 2 | 2020-01-02T00:00:00Z | 201 | 200 | --> value from 01.01.2020
| 2 | 2020-01-03T00:00:00Z | 202 | 200 | --> value from 01.01.2020
+------+----------------------+-----------+-----------+
One option uses a lateral join (OUTER APPLY in SQL Server); a correlated subquery would also work:
select t1.a_nr, t1.v_date, t1.amount as t1_amount, t2.*
from table_1 t1
outer apply (
select top (1) t2.amount as t2_amount
from table_2 t2
where t2.a_nr = t1.a_nr and t2.v_date <= t1.v_date
order by t2.v_date desc
) t2
An alternative is a gaps-and-islands technique: a window count puts each unmatched record in a group with the latest matched record before it, and a window max over each group then recovers the value we want:
select a_nr, v_date, amount as t1_amount,
max(t2_amount) over(partition by a_nr, grp) as t2_amount
from (
select t1.*, t2.amount as t2_amount,
count(t2.amount) over(partition by t1.a_nr order by t1.v_date) as grp
from table_1 t1
left join table_2 t2 on t2.a_nr = t1.a_nr and t2.v_date = t1.v_date
) t
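The gaps-and-islands query uses only standard window functions, so it can be tried out in SQLite 3.25 or newer. Below is a minimal sketch from Python with the question's rows. v_date is simplified to ISO date text, and the inner amount is renamed matched_amount (illustrative) so the two amount columns are easy to tell apart. Rows that come before the first match for an A_NR fall into group 0, whose MAX is NULL, so they correctly stay empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (a_nr INTEGER, v_date TEXT, amount INTEGER);
    CREATE TABLE table_2 (a_nr INTEGER, v_date TEXT, amount INTEGER);
    INSERT INTO table_1 VALUES
        (1, '2020-01-01', 100), (1, '2020-01-02', 101), (1, '2020-01-03', 102),
        (2, '2020-01-01', 200), (2, '2020-01-02', 201), (2, '2020-01-03', 202);
    INSERT INTO table_2 VALUES (1, '2020-01-01', 100), (2, '2020-01-01', 200);
""")

rows = conn.execute("""
    SELECT a_nr, v_date, amount AS t1_amount,
           -- each group holds one matched row plus the unmatched rows after it,
           -- so the group's MAX is the last available table_2 amount
           MAX(matched_amount) OVER (PARTITION BY a_nr, grp) AS t2_amount
    FROM (
        SELECT t1.a_nr, t1.v_date, t1.amount, t2.amount AS matched_amount,
               -- COUNT skips NULLs, so the running count only steps up on matched rows
               COUNT(t2.amount) OVER (PARTITION BY t1.a_nr ORDER BY t1.v_date) AS grp
        FROM table_1 AS t1
        LEFT JOIN table_2 AS t2 ON t2.a_nr = t1.a_nr AND t2.v_date = t1.v_date
    )
    ORDER BY a_nr, v_date
""").fetchall()

for row in rows:
    print(row)
```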

SQL: How to return just 1 previous date for a record, not all previous dates

I have a very simple table of IDs and sign-in dates, and I want to use SQL to make a column that shows the previous sign-in date:
Table: SIGNIN
| ID | Sign-in Date |
| A | 01/01/19 |
| B | 01/01/19 |
| C | 02/01/19 |
| A | 02/01/19 |
| A | 03/01/19 |
| B | 03/01/19 |
| A | 04/01/19 |
| C | 04/01/19 |
| B | 05/01/19 |
I've tried joining the table to itself, but it shows all previous sign-in dates rather than just the most recent one.
SELECT [SIGNIN].ID,
[SIGNIN].SignInDate
FROM [SIGNIN]
INNER JOIN [SIGNIN] as [Prev] on [SIGNIN].ID = [Prev].ID
and [SIGNIN].SignInDate < [Prev].SignInDate
ORDER BY [SIGNIN].ID, [SIGNIN].SignInDate
The result I want:
Table: SIGNIN
| ID | Sign-in Date | Previous |
| A | 01/01/19 | NULL |
| B | 01/01/19 | NULL |
| C | 02/01/19 | NULL |
| A | 02/01/19 | 01/01/19 |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 03/01/19 |
What I'm getting:
| ID | Sign-in Date | Previous |
| A | 01/01/19 | NULL |
| B | 01/01/19 | NULL |
| C | 02/01/19 | NULL |
| A | 02/01/19 | 01/01/19 |
| A | 03/01/19 | 01/01/19 |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 01/01/19 |
| A | 04/01/19 | 02/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 01/01/19 |
| B | 05/01/19 | 03/01/19 |
I'm certain this has been answered elsewhere before, but the biggest problem I'm having is not knowing how to word my problem!
EDIT: Really helpful responses so far, but is there a solution where I can change the date "cut-off"? For example:
Cut off: 03/01/19
Table: The same
Desired result:
| ID | Sign-in Date | Previous |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 03/01/19 |
If you need to do that, I think it's better to create an ordering column, like:
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]
So the end result would be like:
SELECT t.ID, t.SignInDate [Sign-In Date], t2.SignInDate as Previous
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]) t
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]) t2 ON t.ID = t2.ID AND t.O = t2.O+1
Which should give something akin to:
A 2019-01-01 NULL
A 2019-01-04 2019-01-01
A 2019-02-01 2019-01-04
B 2019-01-01 NULL
B 2019-01-05 2019-01-01
C 2019-01-01 NULL
Hope this helps.
Try using LAG, assuming you're on SQL Server 2012 or later.
SELECT [SIGNIN].ID,
[SIGNIN].SignInDate,
LAG([SIGNIN].SignInDate) OVER (PARTITION BY [SIGNIN].ID ORDER BY [SIGNIN].SignInDate) AS Previous
FROM [SIGNIN]
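LAG returns the previous sign-in only when the window is ordered oldest first. With ORDER BY ... DESC it returns the next sign-in instead. Below is a minimal sketch in SQLite (3.25 or newer, for window functions) run from Python. It assumes the question's dates are dd/mm/yy and stores them as ISO-8601 text so they sort chronologically. The table and column names (signin, sign_in_date) are simplified for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signin (id TEXT, sign_in_date TEXT)")
# The question's dd/mm/yy dates, stored as ISO-8601 text so they sort chronologically.
conn.executemany("INSERT INTO signin VALUES (?, ?)", [
    ("A", "2019-01-01"), ("B", "2019-01-01"), ("C", "2019-01-02"),
    ("A", "2019-01-02"), ("A", "2019-01-03"), ("B", "2019-01-03"),
    ("A", "2019-01-04"), ("C", "2019-01-04"), ("B", "2019-01-05"),
])

rows = conn.execute("""
    SELECT id, sign_in_date,
           -- the row just before this one for the same id, oldest first;
           -- NULL for each id's first sign-in
           LAG(sign_in_date) OVER (PARTITION BY id ORDER BY sign_in_date) AS previous
    FROM signin
    ORDER BY sign_in_date, id
""").fetchall()

for row in rows:
    print(row)
```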
Or use a self-join with MAX:
SELECT [SIGNIN].ID,
[SIGNIN].SignInDate,
MAX([Prev].SignInDate) as Previous
FROM [SIGNIN]
LEFT JOIN [SIGNIN] as [Prev] on [SIGNIN].ID = [Prev].ID
and [SIGNIN].SignInDate > [Prev].SignInDate
GROUP BY [SIGNIN].ID, [SIGNIN].SignInDate
ORDER BY [SIGNIN].ID, [SIGNIN].SignInDate
Try something like this:
SELECT
ID, SignInDate,
LAG(SignInDate, 1) OVER (PARTITION BY ID ORDER BY SignInDate) AS Previous
FROM SIGNIN
The following will give you almost what you are looking for, just without the nulls.
You should probably do a left outer or right outer join in the inner query, and some extra maneuver to add the null rows as well. I am a lit
select id, max(prev) as prev, signindate from
(
SELECT SIGNIN.ID,
SIGNIN.SignInDate as prev,
Prev.SignInDate as signindate
FROM SIGNIN
JOIN SIGNIN as Prev on SIGNIN.ID = Prev.ID
and SIGNIN.SignInDate < Prev.SignInDate
) a
group by id, signindate
order by id, signindate
I like the APPLY solution because you can add any number of columns from the matching row(s):
DECLARE @CutOffDate DATE = '2019-01-03';
SELECT
S.ID,
S.SignInDate,
PreviousSignInDate = R.SignInDate
FROM
[SIGNIN] AS S
OUTER APPLY (
SELECT TOP 1
P.* -- Can incorporate many columns (will also have to add them to the outermost SELECT list)
FROM
SIGNIN AS P
WHERE
S.ID = P.ID AND
P.SignInDate < S.SignInDate
ORDER BY
P.SignInDate DESC
) AS R
WHERE
S.SignInDate >= @CutOffDate
ORDER BY
S.SignInDate,
S.ID
In this case, you can use TOP 1 + ORDER BY ... DESC to fetch the previous row, as long as you correlate on S.ID = P.ID and make sure that P.SignInDate < S.SignInDate.
Also get used to writing dates in the YYYY-MM-DD format, since 03/01/19 is ambiguous (3 January or 1 March?).
A correlated subquery is a very simple solution:
SELECT ID, SignInDate,
(SELECT top 1 SignInDate
FROM SIGNIN as S2
WHERE S2.ID = S1.ID and S2.SignInDate < S1.SignInDate
ORDER BY S2.SignInDate desc) as Previous
FROM SIGNIN as S1
ORDER BY S1.ID, S1.SignInDate
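The OUTER APPLY and correlated-subquery answers both pick the latest earlier sign-in with TOP 1 ... ORDER BY ... DESC. That approach also makes the cut-off from the question's edit straightforward: the cut-off filters only the outer rows, so the first sign-in on or after it still finds its previous date. SQLite has no TOP or APPLY, so this minimal sketch uses ORDER BY ... LIMIT 1 in a correlated scalar subquery instead. The table and column names (signin, sign_in_date) and the function previous_sign_ins are illustrative. The question's dates are read as dd/mm/yy and stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signin (id TEXT, sign_in_date TEXT)")
conn.executemany("INSERT INTO signin VALUES (?, ?)", [
    ("A", "2019-01-01"), ("B", "2019-01-01"), ("C", "2019-01-02"),
    ("A", "2019-01-02"), ("A", "2019-01-03"), ("B", "2019-01-03"),
    ("A", "2019-01-04"), ("C", "2019-01-04"), ("B", "2019-01-05"),
])


def previous_sign_ins(conn: sqlite3.Connection, cutoff: str) -> list:
    """(id, sign_in_date, previous) for sign-ins on or after cutoff (ISO date text)."""
    return conn.execute("""
        SELECT s.id, s.sign_in_date,
               -- latest earlier sign-in for the same id; this searches the whole table,
               -- so a date before the cut-off can still be reported as "previous"
               (SELECT p.sign_in_date
                  FROM signin AS p
                 WHERE p.id = s.id AND p.sign_in_date < s.sign_in_date
                 ORDER BY p.sign_in_date DESC
                 LIMIT 1) AS previous
        FROM signin AS s
        WHERE s.sign_in_date >= ?  -- the cut-off only limits which rows are reported
        ORDER BY s.sign_in_date, s.id
    """, (cutoff,)).fetchall()


for row in previous_sign_ins(conn, "2019-01-03"):
    print(row)
```

If the cut-off were applied before the previous date is looked up (for example, by filtering a derived table that LAG then runs over), the first sign-in of each ID after the cut-off would lose its previous date.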

Get value from previous row data for next datas by dates

I have two transaction tables, say table A and table B. In some cases I need to transfer rows from table B into A as a new table, under several conditions:
The price of a transferred row follows the price of the previous row (by date) in table A for the same product.
Rows whose product and date already exist in table A are not added to the results.
Rows with no previous data in table A are not added to the results.
For example:
-----------Table A------------- ----------Table B----------
product | Date | Price | | Product | Date |
A | 2019-01-01 | 10 | | A | 2018-11-05 |
A | 2019-01-15 | 15 | | A | 2019-01-10 |
A | 2019-01-25 | 20 | | A | 2019-01-12 |
A | 2019-05-01 | 25 | | A | 2019-01-27 |
A | 2019-07-02 | 30 | | B | 2019-02-10 |
B | 2019-02-05 | 40 | | B | 2019-04-22 |
B | 2019-04-22 | 50 | | B | 2019-05-13 |
B | 2019-05-12 | 40 |
Result:
-----------Table C-------------
product | Date | Price |
A | 2019-01-01 | 10 |
A | 2019-01-10 | 10 | *The prices follow the data in the previous date (2019-01-01)
A | 2019-01-12 | 10 | *The prices follow the data in the previous date (2019-01-01)
A | 2019-01-15 | 15 |
A | 2019-01-25 | 20 |
A | 2019-01-27 | 20 | *The prices follow the data in the previous date (2019-01-25)
A | 2019-05-01 | 25 |
A | 2019-07-02 | 30 |
B | 2019-02-05 | 40 |
B | 2019-02-10 | 40 | *The prices follow the data in the previous date (2019-02-05)
B | 2019-04-22 | 50 |
B | 2019-05-12 | 40 |
B | 2019-05-13 | 40 | *The prices follow the data in the previous date (2019-05-12)
NOTE:
Product A's row in table B dated 2018-11-05 is not included in the results because table A has no earlier data for that product.
Product B's row in table B dated 2019-04-22 is not included in the results because table A already has a row with the same product and date.
I'd like to avoid a looping mechanism because my data runs to millions of rows, but I can't figure out how to do it otherwise.
One way is to use GROUP BY in a CTE and then UNION:
WITH cte AS(
SELECT b.product,
b.[Date],
MAX(a.[Date]) AS [DateValue]
FROM TableA AS a
INNER JOIN TableB AS b ON a.product = b.product
WHERE a.[Date] <= b.[Date]
GROUP BY b.product, b.[Date]
)
SELECT *
FROM dbo.TableA AS a
UNION
SELECT b.product,
b.[Date],
a.Price
FROM cte AS c
INNER JOIN dbo.TableB AS b ON b.product = c.product AND b.[Date] = c.[Date]
INNER JOIN dbo.TableA AS a ON a.product = c.product AND a.[Date] = c.[DateValue]
ORDER BY product, [Date]
One method uses union and cross apply:
select ab.product, ab.date, p.price
from ((select a.product, a.date
from a
) union -- intentional to remove duplicates
(select b.product, b.date
from b
)
) ab cross apply
(select top (1) a.price
from a
where a.product = ab.product and a.date <= ab.date
order by a.date desc
) p;
Note that cross apply will eliminate the rows from b that have no price.
If SQL Server supported the IGNORE NULLS option on either LAST_VALUE() or LAG(), this would be more naturally written with a full join:
select coalesce(a.product, b.product) as product,
coalesce(a.date, b.date) as date,
coalesce(a.price,
lag(a.price) ignore nulls over (partition by coalesce(a.product, b.product) order by coalesce(a.date, b.date))) as price
from a full join
b
on a.product = b.product and a.date = b.date;
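Alas, SQL Server did not support that when this answer was written (SQL Server 2022 later added IGNORE NULLS to LAG() and LAST_VALUE()). On older versions, you can make it work with a bit of effort and additional subqueries.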
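The union + cross apply idea can be tried out in SQLite. SQLite has neither CROSS APPLY nor TOP, but it accepts a correlated scalar subquery with ORDER BY ... LIMIT 1. Below is a minimal sketch with the question's data. Filtering out NULL prices afterwards mimics how CROSS APPLY drops B rows with no earlier price, and UNION (not UNION ALL) drops B rows that repeat a product and date already in A. The lowercase table names a and b follow that answer, and dates are assumed to be stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (product TEXT, date TEXT, price INTEGER);
    CREATE TABLE b (product TEXT, date TEXT);
    INSERT INTO a VALUES
        ('A', '2019-01-01', 10), ('A', '2019-01-15', 15), ('A', '2019-01-25', 20),
        ('A', '2019-05-01', 25), ('A', '2019-07-02', 30),
        ('B', '2019-02-05', 40), ('B', '2019-04-22', 50), ('B', '2019-05-12', 40);
    INSERT INTO b VALUES
        ('A', '2018-11-05'), ('A', '2019-01-10'), ('A', '2019-01-12'), ('A', '2019-01-27'),
        ('B', '2019-02-10'), ('B', '2019-04-22'), ('B', '2019-05-13');
""")

rows = conn.execute("""
    SELECT product, date, price
    FROM (
        SELECT ab.product, ab.date,
               -- price of the latest A row for this product on or before this date
               (SELECT a.price
                  FROM a
                 WHERE a.product = ab.product AND a.date <= ab.date
                 ORDER BY a.date DESC
                 LIMIT 1) AS price
        FROM (
            -- UNION removes B rows whose (product, date) already exists in A
            SELECT product, date FROM a
            UNION
            SELECT product, date FROM b
        ) AS ab
    )
    WHERE price IS NOT NULL  -- like CROSS APPLY: drop rows with no earlier price
    ORDER BY product, date
""").fetchall()

for row in rows:
    print(row)
```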
SQL MERGE is a very powerful tool for performing "CRUD" operations based on a condition.
Please follow the links below for more details about this feature.
http://www.sqlservertutorial.net/sql-server-basics/sql-server-merge/
https://www.essentialsql.com/introduction-merge-statement/
Please feel free to ask if you have any questions.

SQL JOIN with multiple date condition

We have the valuetable table, and the query should check whether there is a next younger (that is, the latest earlier) corrdatetime in the correctiontable table and, if so, add its corrvalue and corrdatetime.
My problem query:
SELECT * FROM valuetable vt
LEFT JOIN correctiontable corr ON corr.value_id = vt.id WHERE vt.datetime <= corr.corrdatetime
only returns the last corrdatetime.
To clarify the results:
Row1 id1 should be NULL as the valuetable datetime is younger than the correction datetime
Row2 id2 should be 01/08/2017 00:00:00 as the datetime in valuetable is older but younger than the 01/12/2017 10:00:00 corrdatetime
Row3 id2 got its correction on 01/12/2017 10:00:00
Row4 id3 is NULL, there is no corrdatetime in correctiontable for it
Thank you all!
+----------------------------------+
| valuetable |
+----------------------------------+
| id | datetime | value |
+----+---------------------+-------+
| 1 | 22/07/2017 13:00:00 | 123 |
+----+---------------------+-------+
| 2 | 10/08/2017 09:00:00 | 456 |
+----+---------------------+-------+
| 2 | 05/12/2017 20:00:00 | 789 |
+----+---------------------+-------+
| 3 | 11/11/2017 11:11:11 | 012 |
+----+---------------------+-------+
+-------------------------------------------------+
| correctiontable |
+-------------------------------------------------+
| id | value_id | corrdatetime | corrvalue |
+----+----------+---------------------+-----------+
| 1 | 2 | 01/08/2017 00:00:00 | 888 |
+----+----------+---------------------+-----------+
| 2 | 2 | 01/12/2017 10:00:00 | 999 |
+----+----------+---------------------+-----------+
| 3 | 1 | 01/08/2017 20:00:00 | 111 |
+----+----------+---------------------+-----------+
+--------------------------------------------------------------------+
| Result (as it should be) |
+--------------------------------------------------------------------+
| id | datetime | corrdatetime | value | corrvalue |
+----+---------------------+---------------------+-------+-----------+
| 1 | 22/07/2017 13:00:00 | NULL | 123 | NULL |
+----+---------------------+---------------------+-------+-----------+
| 2 | 10/08/2017 09:00:00 | 01/08/2017 00:00:00 | 456 | 888 |
+----+---------------------+---------------------+-------+-----------+
| 2 | 05/12/2017 20:00:00 | 01/12/2017 10:00:00 | 789 | 999 |
+----+---------------------+---------------------+-------+-----------+
| 3 | 11/11/2017 11:11:11 | NULL | 012 | NULL |
+----+---------------------+---------------------+-------+-----------+
Assuming "younger" means "logically less than", this should work for you.
select *
from valuetable a
outer apply (
select top 1 *
from correctiontable y
where y.value_id = a.id
and y.corrdatetime < a.datetime
order by y.corrdatetime desc
) b
Many thanks to @KindaTechy for pointing me down the right path!
I've created two queries, one for MySQL and one for Oracle 12.1 and later.
For MySQL:
SELECT *
FROM valuetable vt
LEFT JOIN correctiontable ON correctiontable.id
=
(SELECT corr.id
FROM correctiontable corr
WHERE vt.id = corr.value_id
AND corr.corrdatetime <= vt.datetime
ORDER BY corr.corrdatetime DESC
LIMIT 1)
For Oracle:
select *
from valuetable vt
outer apply (
select *
from correctiontable corr
where corr.value_id = vt.id
and corr.corrdatetime < vt.datetime
order by corr.corrdatetime desc
FETCH FIRST 1 ROWS ONLY
) b;
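The MySQL query's idea also runs in SQLite: inside the ON clause, pick the id of the latest correction dated at or before the value's datetime. Below is a minimal sketch with the question's data. It assumes datetimes are stored as ISO-8601 text, because the question's dd/mm/yyyy strings would not compare correctly as text. value is kept as text to preserve the leading zero in 012. This sketch uses <=, so a correction with exactly the same datetime counts; the Oracle version's < would exclude it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE valuetable (id INTEGER, datetime TEXT, value TEXT);
    CREATE TABLE correctiontable (id INTEGER, value_id INTEGER, corrdatetime TEXT, corrvalue INTEGER);
    INSERT INTO valuetable VALUES
        (1, '2017-07-22 13:00:00', '123'),
        (2, '2017-08-10 09:00:00', '456'),
        (2, '2017-12-05 20:00:00', '789'),
        (3, '2017-11-11 11:11:11', '012');
    INSERT INTO correctiontable VALUES
        (1, 2, '2017-08-01 00:00:00', 888),
        (2, 2, '2017-12-01 10:00:00', 999),
        (3, 1, '2017-08-01 20:00:00', 111);
""")

rows = conn.execute("""
    SELECT vt.id, vt.datetime, corr.corrdatetime, vt.value, corr.corrvalue
    FROM valuetable AS vt
    LEFT JOIN correctiontable AS corr
      -- join only the single latest correction dated at or before this value row
      ON corr.id = (
          SELECT c.id
            FROM correctiontable AS c
           WHERE c.value_id = vt.id AND c.corrdatetime <= vt.datetime
           ORDER BY c.corrdatetime DESC
           LIMIT 1)
    ORDER BY vt.id, vt.datetime
""").fetchall()

for row in rows:
    print(row)
```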
I found a working query, but your id column of valuetable should be unique, because otherwise you get a cross product.
SELECT vt.id, vt.datetime, corr.corrdatetime, vt.value, corr.corrvalue
FROM valuetable vt
LEFT JOIN correctiontable corr
ON corr.value_id = vt.id
AND vt.datetime >= corr.corrdatetime
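Moving the date condition from the WHERE clause to the ON clause means it only restricts which correction rows are joined. In the WHERE clause it would filter out valuetable rows without a match, effectively turning the LEFT JOIN into an inner join.
I made a sample for you: http://sqlfiddle.com/#!9/301a6/4/0
If you need id to be non-unique, the query must be improved, and so must the test data set.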

SQL Query to Join Two Tables Based On Closest Timestamp

I need to retrieve the records from dbo.transaction (transactions of all users, with more than one transaction per user) whose timestamp is closest to the time in dbo.bal (current balance details of each user, with only one record per user).
That is, the number of result rows should equal the number of rows in dbo.bal.
I tried the query below, but I only get records whose timestamp is earlier than the time in dbo.bal, whereas some records have a timestamp later than, and closer to, dbo.bal.time.
SELECT dbo.bal.uid,
dbo.bal.userId,
dbo.bal.balance,
dbo.bal.time,
(SELECT TOP 1 transactionBal
FROM dbo.transaction
WHERE TIMESTAMP <= dbo.bal.time
ORDER BY TIMESTAMP DESC) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC
Here is my table structure:
dbo.transaction
---------------
| uid| userId | description| timestamp | credit | transactionBal
-------------------------------------------------------------------------
| 1 | 101 | buy credit1| 2012-01-25 03:23:31.624 | 100 | 500
| 2 | 102 | buy credit5| 2012-01-18 03:13:12.657 | 500 | 700
| 3 | 103 | buy credit3| 2012-01-15 02:16:34.667 | 300 | 300
| 4 | 101 | buy credit2| 2012-01-13 05:34:45.637 | 200 | 300
| 5 | 101 | buy credit1| 2012-01-12 07:45:21.457 | 100 | 100
| 6 | 102 | buy credit2| 2012-01-01 08:18:34.677 | 200 | 200
dbo.bal
-------
| uid| userId | balance | time |
-----------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 |
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 |
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 |
And the result should look like this:
| Id | userId | balance | time | credit | transactionBal
-----------------------------------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 | 200 | 300
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 | 200 | 200
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 | 300 | 300
Please help me. Any help is much appreciated. Thank you.
It would be helpful if you posted your table structures, but ...
I think your inner query needs a join condition on the user; it is missing from the query in your question.
The ORDER BY clause in your inner query could then be ABS(DATEDIFF(ms, dbo.bal.time, TIMESTAMP)), which puts the transaction with the smallest difference between the two times first.
Does that help ?
Based on the following SQL Fiddle http://sqlfiddle.com/#!3/7a900/15 I came up with:
SELECT
bal.uid,
bal.userId,
bal.balance,
bal.time,
trn.timestamp,
trn.description,
datediff(ms, bal.time, trn.timestamp)
FROM
money_balances bal
JOIN money_transaction trn on
trn.userid = bal.userid and
trn.uid =
(
select top 1 uid
from money_transaction trn2
where trn2.userid = trn.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
)
WHERE
bal.time IS NOT NULL
ORDER BY
bal.time DESC
I cannot vouch for its performance because I know nothing of your data, but I believe it works.
I have simplified my answer; I believe what you need is:
SELECT
bal.uid as baluid,
(
select top 1 uid
from money_transaction trn2
where trn2.userid = bal.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
) as tranuid
FROM
money_balances bal
From that you can derive all the datasets you need. For example:
with matched_credits as
(
SELECT
bal.uid as baluid,
(
select top 1 uid
from money_transaction trn2
where trn2.userid = bal.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
) as tranuid
FROM
money_balances bal
)
select
*
from
matched_credits mc
join money_balances mb on
mb.uid = mc.baluid
join money_transaction trn on
trn.uid = mc.tranuid
Try:
SELECT dbo.bal.uid,
dbo.bal.userId,
dbo.bal.balance,
dbo.bal.time,
(SELECT TOP 1 transactionBal
FROM dbo.transaction
WHERE userId = dbo.bal.userId
ORDER BY abs(datediff(ms, dbo.bal.time, TIMESTAMP))) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC