How can I join two tables with a many-to-one relationship ordered by date? - sql

I need to join two tables and get the most recent record only. Here is the basic form:
table1.id | table1.region | table1.important_col1
1 | NORTH AMERICA | abc
2 | CHINA | def
2 | NORTH AMERICA | hij
table2.id | table2.region | table2.transaction_date | table2.important_col2
1 | NORTH AMERICA | 2/13/2019 | xyz
1 | NORTH AMERICA | 1/13/2019 | zzz
1 | NORTH AMERICA | 12/13/2018 | xxx
desired result:
1 | NORTH AMERICA | 2/13/2019 | abc | xyz
I wanted to use this answer but it seems like I can't use it if I need to group by and then order by descending date. I will need information in multiple columns on the right hand side, but do not want duplicate rows on the left hand side.
The right hand side may have up to 100s of records per id, but I just need something that works for now. Thanks in advance.
edit: I also need to filter the right hand side on other criteria so a simple MAX(table2.transaction_date) won't work.

You can filter your table using internal window function, I used LAG for this example, but you can use ROW_NUMBER and filter several records. Using sliding windows does not change the number of records or counted as SQL aggregation, i.e. you filter using where rather than with having.
SELECT
t1.id
,t2.transaction_date
,t1.region
,t1.col1
,t2.important_col2
FROM table1 AS t1
OUTER APPLY (
SELECT
id
,transaction_date
,LAG(transaction_date,1) over (partition by id order by transaction_date desc) as prev_td
,important_col2
FROM table2
-- WHERE filter_by_col=1 -- additonal "right side" filtering
) as t2
where t1.id = t2.id
and t2.prev_td is null
Output:
1 2019-02-13 00:00:00.000 NORTH AMERICA abc xyz
I used this to test the above query:
create table table1
(id int,
region varchar(30),
col1 varchar(100));
insert into table1
values (1 ,'NORTH AMERICA' ,'abc'),
(2,'CHINA','def'),
(2,'NORTH AMERICA','hij');
create table table2
(id int,
region varchar(30),
transaction_date datetime,
important_col2 varchar(100))
insert into table2
values
(1 ,'NORTH AMERICA',convert(datetime, '02/13/19', 1),'xyz'),
(1 ,'NORTH AMERICA',convert(datetime, '01/13/19',1),'zzz'),
(1 ,'NORTH AMERICA',convert(datetime, '12/13/18',1),'xxx')

Try in this way:
select table11.id, table1.region, max(table2.transaction_date) transaction_date
from table1
inner join table2
on table1.id = table2.id
group by table1.id, table1.region

If there are more columns in table2 (other than transaction date) that you want to display as well, then aggregation alone cannot solve your question.
In MySQL 8.0 you can use window function ROW_NUMBER() to identify the most recent transaction record, as follows :
SELECT x.*
FROM (
SELECT
t1.*,
t2.*,
ROW_NUMBER() OVER(PARTITION BY t2.region ORDER BY t2.transaction_date DESC) rn
FROM table1 t1
INNER JOIN table2 t2 ON t1.region = t2.region
) x
WHERE x.rn = 1
In earlier versions of MySQL, one solution is to add a NOT EXISTS with a correlated subquery that ensures that we are joining with the most recent transaction for the current region :
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2
ON t1.region = t2.region
AND NOT EXISTS (
SELECT 1
FROM table2
WHERE region = t2.region AND transaction_date > t2.transaction_date
)

Related

Compare columns from 2 different tables with only last inserted values in table_2 in SQL Server

If I have two different tables in a SQL Server 2019 database as follows:
Table1
|id | name |
+-----+--------+
| 1 | rose |
| 2 | peter |
| 3 | ann |
| 4 | rose |
| 5 | ann |
Table2
| name2 |
+--------+
|rose |
|ann |
I would like to retrieve only the last tow ids from table1 (which in this case 4 and 5) that match name2 in table2. In other words, match happens only once on the last added names in table1, furthermore, the ids (4, 5) to be inserted in table2.
How to do that using SQL?
Thank you
You can use row_number()
select name,id from
(
select *, row_number() over(partition by t.name order by id desc) as rn
from table1 t join table2 t1 on t.name=t1.name2
)A where rn=1
Your question is vague, so there could be many answers here. My first thought is that you simply want an inner join. This will fetch ONLY the data that both tables share.
SELECT Table1.*
FROM Table1
INNER JOIN Table2 on Table1.name = Table2.name2
You seem to be describing:
select . . . -- whatever columns you want
from (select top (2) t1.*
from table1 t1
order by t1.id desc
) t1 join
table2 t2
on t2.name2 = t1.name;
This doesn't seem particularly useful for the data you have provided, but it does what you describe.
EDIT:
If you want only the most recent rows that match, use row_number():
select . . . -- whatever columns you want
from (select t1.*,
row_number() over (partition by name order by id desc) as seqnum
from table1 t1
) t1 join
table2 t2
on t2.name2 = t1.name and t1.seqnum = 1;

Find the nearest future or equal to date from a table of dates with sql stmt

I have two tables
Table 1
ID | T1_Date
---+-------------
1 | 09/08/2020
2 | 09/30/2020
Table 2
T2_Date | Label
-----------+-----
08/31/2020 | Aug-20
09/20/2020 | Sep-20
10/25/2020 | Oct-20
I'm trying to have the result link the nearest future date label from table 2 with each record in table 1. So my output would look like:
ID | T1_Date | Label
---+------------+--------
1 | 09/08/2020 | Sep-20
2 | 09/30/2020 | Oct-20
So far I can only return all the records that are greater than the T1_Date value, so it repeats all the labels.
Is there a way to just grab the nearest future date label or the equal to label?
One method is a correlated subquery:
select t1.*,
(select max(t2.label) keep (dense_rank first t2.date asc)
from t2
where t2.date > t1.date
)
from t1;
there are many ways, you can solve this problem
One way which is pretty simple but not optimal:
select
t1.*
,(select Label from table2 where t2_date=(select MIN(T2_Date)from table2
where T2_Date>=T1_Date))Label
from table1 t1
-----------------------------------------------------------
The second Way:
with TempTable as
(
select T1_Date,min(T2_Date)T2_Date
from table2
left join table1 on T2_Date>=T1_Date
group by T1_Date
)
select t1.T1_Date,t2.Label from TempTable temp
left join table1 t1 on t1.T1_Date=temp.T1_Date
left join table2 t2 on t2.T2_Date=temp.T2_Date

Oracle Efficiently joining tables with subquery in FROM

Table 1:
| account_no | **other columns**...
+------------+-----------------------
| 1 |
| 2 |
| 3 |
| 4 |
Table 2:
| account_no | TX_No | Balance | History |
+------------+-------+---------+------------+
| 1 | 123 | 123 | 12.01.2011 |
| 1 | 234 | 2312 | 01.03.2011 |
| 3 | 232 | 212 | 19.02.2011 |
| 4 | 117 | 234 | 24.01.2011 |
I have multiple join query, one of the tables(Table 2) inside a query is problematic as it is a view which computes many other things, that is why each query to that table is costly. From Table 2, for each account_no in Table 1 I need the whole row with the greatest TX_NO, this is how I do it:
SELECT * FROM TABLE1 A LEFT JOIN
( SELECT
X.ACCOUNT_NO,
HISTORY,
X.BALANCE
FROM TABLE2 X INNER JOIN
(SELECT
ACCOUNT_NO,
MAX(TX_NO) AS TX_NO
FROM TABLE2
GROUP BY ACCOUNT_NO) Y ON X.ACCOUNT_NO = Y.ACCOUNT_NO) B
ON B.ACCOUNT_NO = A.ACCOUNT_NO
As I understand at first it will make the inner join for all the rows in Table2 and after that left join needed account_no's with Table1 which is what I would like to avoid.
My question: Is there a way to find the max(TX_NO) for only those accounts that are in Table1 instead of going through all? I think it will help to increase the speed of the query.
I think you are on the right track, but I don't think that you need to, and would not myself, nest the subqueries the way you have done. Instead, if you want to get each record from table 1 and the matching max record from table 2, you can try the following:
SELECT * FROM TABLE1 t1
LEFT JOIN
(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY account_no ORDER BY TX_No DESC) rn
FROM TABLE2 t
) t2
ON t1.account_no = t2.account_no AND
t2.rn = 1
If you want to continue with your original approach, this is how I would do it:
SELECT *
FROM TABLE1 t1
LEFT JOIN TABLE2 t2
ON t1.account_no = t2.account_no
INNER JOIN
(
SELECT account_no, MAX(TX_No) AS max_tx_no
FROM TABLE2
GROUP BY account_no
) t3
ON t2.account_no = t3.account_no AND
t2.TX_No = t3.max_tx_no
Instead of using a window function to find the greatest record per account in TABLE2, we use a second join to a subquery instead. I would expect the window function approach to perform better than this double join approach, and once you get used to it can even easier to read.
If table1 is comparatiely less expensive then you could think of doing a left outer join first which would considerable decrease the resultset and from that pick the latest transaction id records alone
select <required columns> from
(
select f.<required_columns),row_number() over (partition by account_no order by tx_id desc ) as rn
from
(
a.*,b.tx_id,b.balance,b.History
from table1 a left outer join table2 b
on a.account_no=b.account_no
)f
)g where g.rn=1

Where clause to check against two columns in another table

I am struggling to get this answer for some reason.
I have two tables, table1 and table2 which look like this:
Table1:
ID Location Warehouse
1 London Narnia
2 Cyprus Metro
3 Norway Neck
4 Paris Triumph
Table2:
ID Area Code
1 London Narnia
2 Cyprus Metro
3 Norway Triumph
4 Paris Neck
I need to first select everything from table1 where table1.Location is in table2.Area AND table1.Warehouse is in table2.Code GIVEN THAT table1.Location is in table2.Area. I.e. I want:
ID Location Warehouse
1 London Narnia
2 Cyprus Metro
I have got to:
select
1.location
, 1.warehouse
from table1 1
where 1.location in (select area from table2)
and 1.warehouse in (select code from table2)
But this won't work because I need the second where clause to be executed based on the first where clause holding true.
I have also tried similar queries with joins to no avail.
Is there a simple way to do this?
Use exists:
select t.location, t.warehouse
from table1 t
where exists (select 1
from table2 t2
where t.location = t2.area and t.warehouse = t2.code
);
I should point out that some databases support row constructors with in. That allows you to do:
select t.location, t.warehouse
from table1 t
where(t1.location, t1.warehouse) in (select t2.area, t2.code from table2 t2);
Maybe I'm missing something, but a simple join on the two conditions would give you the result in your example:
select t1.*
from table1 t1
join table2 t2 on t1.Location = t2.Area
and t1.Warehouse = t2.Code;
Result:
| ID | Location | Warehouse |
|----|----------|-----------|
| 1 | London | Narnia |
| 2 | Cyprus | Metro |
Sample SQL Fiddle
You need to use JOIN.
I'll design the query in a while :)
EDIT:
SELECT
1.location
, 1.warehouse
FROM table1 1
JOIN table2 2 ON 1.location = 2.area AND 1.warehouse = 2.code

Joining two sql tables with a one to many relationship, but want the max of the second table

I am trying to join two tables one is a unique feature the seconds is readings taken on several dates that relate to the unique features. I want all of the records in the first table plus the most recent reading. I was able to get the results I was looking for before adding the shape field. By using the code
SELECT
Table1.Name, Table1.ID, Table1.Shape,
Max(Table2.DATE) as Date
FROM
Table1
LEFT OUTER JOIN
Table2 ON Table1.ID = table2.ID
GROUP BY
Table1.Name, Table1.ID, Table1.Shape
The shape field is a geometry type and I get the error
'The type "Geometry" is not comparable. It can not be use in the Group By Clause'
So I need to go about it a different way, but not sure how.
Below is a sample of the two tables and the desired results.
Table1
Name| ID |Shape
AA1 | 1 | X
BA2 | 2 | Y
CA1 | 3 | Z
CA2 | 4 | Q
Table2
ID | Date
1 | 5/27/2013
1 | 6/27/2014
2 | 5/27/2013
2 | 6/27/2014
3 | 5/27/2013
3 | 6/27/2014
My Desired Result is
Name| ID |Shape |Date
AA1 | 1 | X | 6/27/2014
BA2 | 2 | Y | 6/27/2014
CA1 | 3 | Z | 6/27/2014
CA2 | 4 | Q | Null
You can do the aggregation on Table2 in a CTE, finding the MAX(DATE) for each ID, and then join that result to Table1:
WITH AggregatedTable2(ID, MaxDate) AS
(
SELECT
ID, MAX(DATE)
FROM
Table2
GROUP BY
ID
)
SELECT
t1.ID, t1.Name, t1.Shape, t2.MaxDate
FROM
Table1 t1
LEFT JOIN
AggregatedTable2 t2 ON t1.ID = t2.ID
Try casting geometry as a varchar.
Select Table1.Name, Table1.ID, cast(Table1.Shape as varchar(1)) AS Shape, Max(Table2.DATE) as Date
FROM Table1 LEFT OUTER JOIN
Table2 ON Table1.ID = table2.ID
Group By Table1.Name, Table1.ID, cast(Table1.Shape as varchar(1))
Try this:
SELECT t1.Name
, t1.ID
, t1.Shape
, MAX(t2.Date) As Date
FROM Table1 AS t1
LEFT JOIN Table2 AS t2
ON t2.ID = t1.ID
GROUP
BY t1.Name
, t1.ID
, t1.Shape