SQL - Join two tables without unique fields


This is my first post in this forum, so please be understanding.
I have the following issue.
I want to join two tables:
Table1:
Product | Start Date | End Date
-------------------------------------
Product1 | 01/01/2014 | 01/05/2015
Product2 | 01/03/2014 | 01/01/2015
Table2:
Product | Start Date | End Date | Value
--------------------------------------------
Product1 | 01/01/2014 | 01/02/2015 | 10
Product1 | 02/02/2014 | 01/04/2015 | 15
Product1 | 02/04/2014 | 01/05/2015 | 15
Product2 | 01/03/2014 | 04/05/2014 | 5
Product2 | 05/05/2014 | 01/01/2015 | 5
To have a table with latest value like:
Product | Start Date | End Date | Value
------------------------------------------------
Product1 | 02/04/2014 | 01/05/2015 | 15
Product2 | 05/05/2014 | 01/01/2015 | 5
I need to join them and not use just the second table because both of them have more unique columns that I need to use.
I was thinking about first using some kind of IF function on the second table to get one row per product (the one with the latest start date) and then simply joining it with the first table. But I have no idea how to do the first part.
I am really looking forward to your help.
Regards,
Matt

Just use WHERE NOT EXISTS to filter out everything but the latest date from TABLE2 (I am assuming that you are asking for the latest STARTDATE from TABLE2; also I add 'SomeOtherField' to Table1, because otherwise you could just query Table2):
SELECT t1.Product, t1.SomeOtherField, t2.StartDate, t2.EndDate, t2.Value
FROM Table1 t1
JOIN (SELECT a.Product, a.StartDate, a.EndDate, a.Value FROM Table2 a
WHERE NOT EXISTS (SELECT * FROM Table2 b
WHERE b.Product = a.Product AND b.StartDate > a.StartDate)) t2
ON (t2.Product = t1.Product)
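To try the NOT EXISTS filter without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the dates are stored as ISO-8601 text (so string comparison follows date order) and it reads the question's dates as DD/MM/YYYY; the SomeOtherField values 'x' and 'y' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Product TEXT, SomeOtherField TEXT);
    CREATE TABLE Table2 (Product TEXT, StartDate TEXT, EndDate TEXT, Value INTEGER);
    -- SomeOtherField values are hypothetical placeholders
    INSERT INTO Table1 VALUES ('Product1', 'x'), ('Product2', 'y');
    -- Dates from the question, rewritten as ISO-8601 text (assumption)
    INSERT INTO Table2 VALUES
        ('Product1', '2014-01-01', '2015-02-01', 10),
        ('Product1', '2014-02-02', '2015-04-01', 15),
        ('Product1', '2014-04-02', '2015-05-01', 15),
        ('Product2', '2014-03-01', '2014-05-04', 5),
        ('Product2', '2014-05-05', '2015-01-01', 5);
""")

# Keep a Table2 row only if no other row of the same product starts later.
query = """
    SELECT t1.Product, t1.SomeOtherField, t2.StartDate, t2.EndDate, t2.Value
    FROM Table1 t1
    JOIN (SELECT a.Product, a.StartDate, a.EndDate, a.Value
          FROM Table2 a
          WHERE NOT EXISTS (SELECT 1 FROM Table2 b
                            WHERE b.Product = a.Product
                              AND b.StartDate > a.StartDate)) t2
      ON t2.Product = t1.Product
    ORDER BY t1.Product
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

With this sample data the query returns one row per product, the one with the latest StartDate.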

This is possible. The query involves three steps:
1. Find the max start date for each product in table 2. Hint: use GROUP BY.
2. Join table 2 with the result from #1 to get the Value.
3. Join table 1 with the result from #2 to filter out products that are not in table 1.
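The three steps above could be sketched roughly like this, with one CTE per step, run through Python's sqlite3 module. The CTE names (latest, latest_rows), the ISO-8601 date strings, and the extra Product3 row are assumptions added for illustration; Product3 exists only in Table2 so that step 3 has something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Product TEXT);
    CREATE TABLE Table2 (Product TEXT, StartDate TEXT, Value INTEGER);
    INSERT INTO Table1 VALUES ('Product1'), ('Product2');
    INSERT INTO Table2 VALUES
        ('Product1', '2014-01-01', 10),
        ('Product1', '2014-04-02', 15),
        ('Product2', '2014-05-05', 5),
        ('Product3', '2014-06-01', 7);  -- hypothetical: not present in Table1
""")

query = """
    WITH latest AS (              -- step 1: max start date per product
        SELECT Product, MAX(StartDate) AS MaxStart
        FROM Table2
        GROUP BY Product
    ),
    latest_rows AS (              -- step 2: join back to Table2 to get Value
        SELECT t2.Product, t2.StartDate, t2.Value
        FROM Table2 t2
        JOIN latest l
          ON l.Product = t2.Product AND l.MaxStart = t2.StartDate
    )
    SELECT t1.Product, lr.StartDate, lr.Value   -- step 3: keep Table1 products
    FROM Table1 t1
    JOIN latest_rows lr ON lr.Product = t1.Product
    ORDER BY t1.Product
"""
step_rows = conn.execute(query).fetchall()
for row in step_rows:
    print(row)
```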

Not sure you need Table1 at all in your example; you just need to aggregate Table2 to find the MAX([Start Date]) for each Product:
SELECT a.*
FROM Table2 a
JOIN (SELECT Product,MAX([Start Date]) AS Mx_Start_Dt
FROM Table2
GROUP BY Product
) b
ON a.Product = b.Product
AND a.[Start Date] = b.Mx_Start_Dt
If you do need to bring in fields from Table1, you can just add another JOIN:
SELECT a.*,c.*
FROM Table1 a
JOIN (SELECT a.*
FROM Table2 a
JOIN (SELECT Product,MAX([Start Date]) AS Mx_Start_Dt
FROM Table2
GROUP BY Product
) b
ON a.Product = b.Product
AND a.[Start Date] = b.Mx_Start_Dt
) c
ON a.Product = c.Product
If using a database that supports analytic functions, you can make it cleaner via the ROW_NUMBER() function:
;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY Product ORDER BY [Start Date] DESC) AS RN
FROM Table2
)
SELECT *
FROM Table1 a
JOIN cte b
ON a.Product = b.Product
AND b.RN = 1

Here is a solution using ROW_NUMBER in SQL Server:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Product ORDER BY StartDate DESC) RN
FROM #T
) Results
WHERE RN = 1
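One difference between the ROW_NUMBER and MAX + JOIN approaches shows up when two rows of a product share the same latest start date. The sketch below is only an illustration with made-up tie data, run through Python's sqlite3 module; it assumes an SQLite build of 3.25 or later (needed for window functions) and ISO-8601 date strings. The Value DESC tie-breaker is an assumption added to make the ROW_NUMBER result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (Product TEXT, StartDate TEXT, Value INTEGER);
    -- Hypothetical data: Product1 has two rows on its latest start date
    INSERT INTO Table2 VALUES
        ('Product1', '2014-01-01', 10),
        ('Product1', '2014-04-02', 15),
        ('Product1', '2014-04-02', 20),
        ('Product2', '2014-05-05', 5);
""")

# ROW_NUMBER keeps exactly one row per product, even when dates tie.
row_number_rows = conn.execute("""
    SELECT Product, StartDate, Value
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Product
                                    ORDER BY StartDate DESC, Value DESC) AS RN
          FROM Table2) AS ranked
    WHERE RN = 1
    ORDER BY Product
""").fetchall()

# MAX + JOIN keeps every row that carries the latest date, so ties survive.
max_join_rows = conn.execute("""
    SELECT a.Product, a.StartDate, a.Value
    FROM Table2 a
    JOIN (SELECT Product, MAX(StartDate) AS MaxStart
          FROM Table2
          GROUP BY Product) b
      ON a.Product = b.Product AND a.StartDate = b.MaxStart
    ORDER BY a.Product, a.Value
""").fetchall()

print(row_number_rows)
print(max_join_rows)
```

If the start date is unique per product, both approaches return the same rows.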

Related

How to JOIN 2 tables and keep only the most recent records

I have 2 tables I want to join. One table is a historical record of inventory that has a "last updated" date associated with each "piece" of inventory. The other table has the prices for each of those pieces. I want to join the tables so that I get the historical records with each of their prices. eg.
TABLE 1
Date Item Location QTY
06/01/2020 ABC 123 10
06/01/2020 DEF 234 12
06/02/2020 ABC 345 13
06/06/2020 ABC 123 10
TABLE 2
ITEM Price
ABC 34.5
DEF 52.12
-----------------> result table ------------------>
Date Item Location QTY Price
06/01/2020 DEF 234 12 52.12
06/02/2020 ABC 345 13 34.5
06/06/2020 ABC 123 10 34.5
Where the result table filters so that it only keeps the most recent records. Eg. TABLE1 updates every minute to show new inventory levels. The item + location combination is "unique" in the sense that table1 is at the item/location level of granularity. However, there can be many of the same item/location combinations as the table updates and creates new entries (it is a historical table, so older entries with the same item + location combination remain in the table). Sometimes the date is different, sometimes the date is the same day.
The query I wrote to try to do this is:
SELECT DISTINCT
TB1.DATE
,TB1.ITEM
,TB1.LOCATION
,TB1.QTY
,TB2.ITEM_COST
FROM
(
SCHEMA_1.TABLE1 AS TB1
JOIN SCHEMA_1.TABLE2 AS TB2
ON TB1.ITEM = TB2.ITEM
JOIN (
SELECT ITEM AS ITM,
LOCATION AS LOC,
MAX(DATE) AS MAXDATE
FROM SCHEMA_1.TABLE1
GROUP BY ITEM, LOCATION
)TB3
ON TB1.ITEM = TB3.ITM AND TB1.LOCATION= TB3.LOC AND TB1.DATE= TB3.MAXDATE
)
This query does execute but it gives me duplicates and definitely does not filter for the most recent records only. Not sure what I'm doing wrong here.

A good old subselect should work, too.
This assumes a unique Date per Item, Location pair.
SELECT TB1.*, TB2.Price
FROM SCHEMA_1.TABLE1 AS TB1
JOIN SCHEMA_1.TABLE2 AS TB2 ON TB1.Item = TB2.Item
WHERE TB1.Date = (SELECT MAX(TB3.Date) FROM SCHEMA_1.TABLE1 AS TB3
WHERE TB1.Item = TB3.Item
AND TB1.Location = TB3.Location)
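As a rough check of the correlated subselect against the sample data from the question, here is a sketch using Python's sqlite3 module. The schema prefix is dropped, the dates are assumed to be MM/DD/YYYY and are stored as ISO-8601 text, and Location is stored as text; all of that is an assumption for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (Date TEXT, Item TEXT, Location TEXT, QTY INTEGER);
    CREATE TABLE TABLE2 (Item TEXT, Price REAL);
    INSERT INTO TABLE1 VALUES
        ('2020-06-01', 'ABC', '123', 10),
        ('2020-06-01', 'DEF', '234', 12),
        ('2020-06-02', 'ABC', '345', 13),
        ('2020-06-06', 'ABC', '123', 10);
    INSERT INTO TABLE2 VALUES ('ABC', 34.5), ('DEF', 52.12);
""")

# For each TABLE1 row, keep it only if its Date is the latest one
# for the same Item/Location pair, then attach the price.
query = """
    SELECT TB1.Date, TB1.Item, TB1.Location, TB1.QTY, TB2.Price
    FROM TABLE1 AS TB1
    JOIN TABLE2 AS TB2 ON TB1.Item = TB2.Item
    WHERE TB1.Date = (SELECT MAX(TB3.Date)
                      FROM TABLE1 AS TB3
                      WHERE TB3.Item = TB1.Item
                        AND TB3.Location = TB1.Location)
    ORDER BY TB1.Date, TB1.Item
"""
inventory_rows = conn.execute(query).fetchall()
for row in inventory_rows:
    print(row)
```

The older ABC/123 entry from 2020-06-01 is dropped, and three rows remain. If the table can hold several rows with the same Item, Location and Date, this filter alone still returns all of them.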

I would suggest:
SELECT t1.*, t2.ITEM_PRICE
FROM SCHEMA_1.TABLE1 t1 JOIN
(SELECT t2.ITEM, t2.LOCATION,
MAX(t2.ITEM_PRICE) KEEP (DENSE_RANK FIRST ORDER BY t2.DATE DESC) as ITEM_PRICE
FROM SCHEMA_1.TABLE2 t2
GROUP BY t2.ITEM, t2.LOCATION
) t2
USING (ITEM, LOCATION);
Oracle has the convenient functionality to get the "first" or "last" value within a group. KEEP isn't the simplest syntax for this endeavor, but it does exactly what you want.

The column names are changed (dte = Date, loc = Location), but you can try this simple query to get the results:
Select dte dates, item, loc Locations, price, qty from
(Select a.dte, a.item, a.loc, b.price, a.qty,
max(a.dte) OVER (PARTITION BY a.item, a.loc) latest_dt
from table1 a LEFT JOIN table2 b ON a.item = b.item) where dte = latest_dt
order by 1;
Output:
+-----------+------+-----------+-------+-----+
| DATES | ITEM | LOCATIONS | PRICE | QTY |
+-----------+------+-----------+-------+-----+
| 01-JUN-20 | DEF | 234 | 52.12 | 12 |
+-----------+------+-----------+-------+-----+
| 02-JUN-20 | ABC | 345 | 34.5 | 13 |
+-----------+------+-----------+-------+-----+
| 06-JUN-20 | ABC | 123 | 34.5 | 10 |
+-----------+------+-----------+-------+-----+
You can also get Latest date as : max(a.dte) KEEP (DENSE_RANK FIRST order by dte desc) OVER (PARTITION BY a.item, a.loc )

MAX plus other data with JOIN

This should be a simple one I reckon (at least I thought it would be when I started doing it a couple hours ago).
I am trying to select the MAX value from one table and join it with another to get the pertinent data.
I have two tables: ACCOUNTS & ACCOUNT_BALANCES
Here they are:
ACCOUNTS
ACC_ID | NAME | IMG_LOCATION
------------------------------------
0 | Cash | images/cash.png
500 | MyBank | images/mybank.png
and
ACCOUNT_BALANCES
ACC_ID | BALANCE | UPDATE_DATE
-------------------------------
500 | 100 | 2017-11-10
500 | 250 | 2018-01-11
0 | 100 | 2018-01-05
I would like the end result to look like:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 250 | 2018-01-11
I thought I could select the MAX(UPDATE_DATE) from the ACCOUNT_BALANCES table, and join with the ACCOUNTS table to get the account name (as displayed above), but having to group by means my end result includes all records from the ACCOUNT_BALANCES table.
I can use this query to select only the records from ACCOUNT_BALANCES with the max UPDATE_DATE, but I can't include the balance.
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID
) x
ON
a.ACC_ID = x.ACC_ID
If I include ACCOUNT_BALANCES.BALANCE in the above query (like so):
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.BALANCE,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
b.BALANCE,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID, b.BALANCE
) x
ON
a.ACC_ID = x.ACC_ID
The results returned looks like this:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 100 | 2017-11-10
500 | MyBank | images/mybank.png | 250 | 2018-01-11
Which is obviously the case, since I'm grouping by BALANCE in the subquery.
I'm super hesitant to post this, as it seems like exactly the kind of question that's been answered n times, but I've searched around a lot and couldn't find anything that really helped me.
This one has a really good answer, but isn't exactly what I'm looking for
I'm obviously missing something really simple and even pointers in the right direction will help. Thank you.

Try this:
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY AC.ACC_ID ORDER BY AB.UPDATE_DATE DESC),
AC.ACC_ID,
NAME,
IMG_LOCATION,
BALANCE,
UPDATE_DATE
FROM ACCOUNTS AC
INNER JOIN ACCOUNT_BALANCES AB
ON AC.ACC_ID = AB.ACC_ID
)
SELECT
*
FROM CTE
WHERE RN = 1

I think the simplest method is outer apply:
select a.*, ab.*
from accounts a outer apply
(select top 1 ab.*
from account_balances ab
where ab.acc_id = a.acc_id
order by ab.update_date desc
) ab;
apply implements what is technically known as a "lateral join". This is a very powerful type of join -- something like a generalization of correlated subqueries.

The accepted answer will work. However, row_number() can be slow on large data sets, so it may be worth avoiding there. Instead, you can try something like this (assuming update_date is part of a unique constraint, so that each account has only one row on its latest date):
SELECT
a.Acc_Id,
a.Name,
a.Img_Location,
bDetail.Balance,
bDetail.Update_Date
FROM
#accounts AS a LEFT JOIN
(
SELECT Acc_Id, MAX(Update_Date) AS Update_Date
FROM #account_balances AS b
GROUP BY Acc_Id
) AS maxDate ON a.Acc_Id = maxDate.Acc_Id
LEFT JOIN #account_balances AS bDetail ON maxDate.Acc_Id = bDetail.Acc_Id AND
maxDate.Update_Date = bDetail.Update_Date
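Here is a small sketch of that double LEFT JOIN against the sample data from the question, run through Python's sqlite3 module. Regular tables stand in for the #temp tables, and account 900 ('Empty') is a hypothetical extra row with no balances, added only to show what the LEFT JOINs do with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (Acc_Id INTEGER, Name TEXT, Img_Location TEXT);
    CREATE TABLE account_balances (Acc_Id INTEGER, Balance INTEGER, Update_Date TEXT);
    INSERT INTO accounts VALUES
        (0,   'Cash',   'images/cash.png'),
        (500, 'MyBank', 'images/mybank.png'),
        (900, 'Empty',  'images/empty.png');  -- hypothetical: no balance rows
    INSERT INTO account_balances VALUES
        (500, 100, '2017-11-10'),
        (500, 250, '2018-01-11'),
        (0,   100, '2018-01-05');
""")

# First find the latest date per account, then join back to fetch the balance.
query = """
    SELECT a.Acc_Id, a.Name, a.Img_Location, bDetail.Balance, bDetail.Update_Date
    FROM accounts AS a
    LEFT JOIN (SELECT Acc_Id, MAX(Update_Date) AS Update_Date
               FROM account_balances
               GROUP BY Acc_Id) AS maxDate
           ON a.Acc_Id = maxDate.Acc_Id
    LEFT JOIN account_balances AS bDetail
           ON maxDate.Acc_Id = bDetail.Acc_Id
          AND maxDate.Update_Date = bDetail.Update_Date
    ORDER BY a.Acc_Id
"""
balance_rows = conn.execute(query).fetchall()
for row in balance_rows:
    print(row)
```

The account without any balance rows is kept, with NULL (None in Python) for Balance and Update_Date.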

Oracle Efficiently joining tables with subquery in FROM

Table 1:
| account_no | **other columns**...
+------------+-----------------------
| 1 |
| 2 |
| 3 |
| 4 |
Table 2:
| account_no | TX_No | Balance | History |
+------------+-------+---------+------------+
| 1 | 123 | 123 | 12.01.2011 |
| 1 | 234 | 2312 | 01.03.2011 |
| 3 | 232 | 212 | 19.02.2011 |
| 4 | 117 | 234 | 24.01.2011 |
I have a query with multiple joins. One of the tables in it (Table 2) is problematic: it is a view that computes many other things, so each query against it is costly. From Table 2, for each account_no in Table 1, I need the whole row with the greatest TX_NO. This is how I do it:
SELECT * FROM TABLE1 A LEFT JOIN
( SELECT
X.ACCOUNT_NO,
HISTORY,
X.BALANCE
FROM TABLE2 X INNER JOIN
(SELECT
ACCOUNT_NO,
MAX(TX_NO) AS TX_NO
FROM TABLE2
GROUP BY ACCOUNT_NO) Y ON X.ACCOUNT_NO = Y.ACCOUNT_NO AND X.TX_NO = Y.TX_NO) B
ON B.ACCOUNT_NO = A.ACCOUNT_NO
As I understand it, this first performs the inner join over all the rows in Table2 and only afterwards left joins the needed account_no's with Table1, which is what I would like to avoid.
My question: Is there a way to find the max(TX_NO) for only those accounts that are in Table1 instead of going through all? I think it will help to increase the speed of the query.

I think you are on the right track, but I don't think that you need to, and would not myself, nest the subqueries the way you have done. Instead, if you want to get each record from table 1 and the matching max record from table 2, you can try the following:
SELECT * FROM TABLE1 t1
LEFT JOIN
(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY account_no ORDER BY TX_No DESC) rn
FROM TABLE2 t
) t2
ON t1.account_no = t2.account_no AND
t2.rn = 1
If you want to continue with your original approach, this is how I would do it:
SELECT *
FROM TABLE1 t1
LEFT JOIN
(
TABLE2 t2
INNER JOIN
(
SELECT account_no, MAX(TX_No) AS max_tx_no
FROM TABLE2
GROUP BY account_no
) t3
ON t2.account_no = t3.account_no AND
t2.TX_No = t3.max_tx_no
)
ON t1.account_no = t2.account_no
Instead of using a window function to find the greatest record per account in TABLE2, we use a second join to a subquery. I would expect the window function approach to perform better than this double join approach, and once you get used to it, it can even be easier to read.

If table1 is comparatively less expensive, you could do the left outer join first, which would considerably decrease the result set, and then pick only the record with the latest transaction number for each account:
select <required columns> from
(
select f.*, row_number() over (partition by f.account_no order by f.tx_no desc) as rn
from
(
select a.*, b.tx_no, b.balance, b.history
from table1 a left outer join table2 b
on a.account_no = b.account_no
) f
) g where g.rn = 1

Joining Table A and B to get elements of both

I have two tables:
Table 'bookings':
id | date | hours
--------------------------
1 | 06/01/2016 | 2
1 | 06/02/2016 | 1
2 | 06/03/2016 | 2
3 | 06/03/2016 | 4
Table 'lookupCalendar':
date
-----
06/01/2016
06/02/2016
06/03/2016
I want to join them together so that I have a date for each booking so that the results look like this:
Table 'results':
id | date | hours
--------------------------
1 | 06/01/2016 | 2
1 | 06/02/2016 | 1
1 | 06/03/2016 | 0 <-- Added by query
2 | 06/01/2016 | 0 <-- Added by query
2 | 06/02/2016 | 0 <-- Added by query
2 | 06/03/2016 | 2
3 | 06/01/2016 | 0 <-- Added by query
3 | 06/02/2016 | 0 <-- Added by query
3 | 06/03/2016 | 4
I have tried doing a cross-apply, but that doesn't get me there, neither does a full join. The FULL JOIN just gives me nulls in the id column and the cross-apply gives me too much data.
Is there a query that can give me the results table above?
More Information
It might be beneficial to note that I am doing this so that I can calculate an average hours booked over a period of time, not just the number of records in the table.
Ideally, I'd be able to do
SELECT AVG(hours) AS my_average, id
FROM bookings
GROUP BY id
But since that would average over the number of records instead of the number of days, I want to cross apply it with the dates. Then I think I can just run the query above against the results table.

select i.id, c.date, coalesce(b.hours, 0) as hours
from lookupCalendar c
cross join (select distinct id from bookings) i
left join bookings b
on b.id = i.id
and b.date = c.date
order by i.id, c.date
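To see what this cross join produces, and how it feeds the per-id average the question is after, here is a sketch with the sample data in Python's sqlite3 module. The dates are assumed to be MM/DD/YYYY and are stored as ISO-8601 text; the averaging query is an extension of the answer's query, not part of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (id INTEGER, date TEXT, hours INTEGER);
    CREATE TABLE lookupCalendar (date TEXT);
    INSERT INTO bookings VALUES
        (1, '2016-06-01', 2),
        (1, '2016-06-02', 1),
        (2, '2016-06-03', 2),
        (3, '2016-06-03', 4);
    INSERT INTO lookupCalendar VALUES
        ('2016-06-01'), ('2016-06-02'), ('2016-06-03');
""")

# Every (id, date) combination, with 0 hours where no booking exists.
filled_rows = conn.execute("""
    SELECT i.id, c.date, COALESCE(b.hours, 0) AS hours
    FROM lookupCalendar c
    CROSS JOIN (SELECT DISTINCT id FROM bookings) i
    LEFT JOIN bookings b
           ON b.id = i.id AND b.date = c.date
    ORDER BY i.id, c.date
""").fetchall()

# Same join, aggregated: average hours per calendar day for each id.
averages = conn.execute("""
    SELECT i.id, AVG(COALESCE(b.hours, 0)) AS my_average
    FROM lookupCalendar c
    CROSS JOIN (SELECT DISTINCT id FROM bookings) i
    LEFT JOIN bookings b
           ON b.id = i.id AND b.date = c.date
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()

for row in filled_rows:
    print(row)
print(averages)
```

With three calendar days, id 1 averages (2 + 1 + 0) / 3 = 1.0 hours per day rather than the 1.5 that averaging over its two booking rows would give.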

Try this:
select c.date, b.id, isnull(b.hours, 0)
from lookupCalendar c
left join bookings b on b.date = c.date
LookupCalendar is your main table because you want the bookings against each date, irrespective of whether there was a booking on that date or not, so a left join is required.
I am not sure if you need to include b.id to solve your actual problem though. Wouldn't you just want to get the total number of hours booked against each date like this, to then calculate the average?:
select c.date, sum(isnull(b.hours, 0))
from lookupCalendar c
left join bookings b on b.date = c.date
group by c.date

You can try joining all the combinations of IDs and dates and then left joining the data:
WITH Booking AS (SELECT *
FROM (VALUES
( 1 , '06/01/2016', 2 )
, ( 1 , '06/02/2016', 1 )
, ( 2 , '06/03/2016', 2 )
, ( 3 , '06/03/2016', 4 )
) x (id, date, hours)
)
, lookupid AS (
SELECT DISTINCT id FROM Booking
)
, lookupCalender AS (
SELECT DISTINCT date FROM Booking
)
SELECT ID.id, Cal.Date, ISNULL(B.Hours,0) AS hours
FROM lookupid id
INNER JOIN lookupCalender Cal
ON 1 = 1
LEFT JOIN Booking B
ON id.id = B.id
AND Cal.date = B.Date
ORDER BY ID.id, Cal.Date

SQL Query - Get count of two columns from two tables

Table 1:
TicketNumber | Rules
---------------------------
PR123 | rule_123
PR123 | rule_234
PR123 | rule_456
PR999 | rule_abc
PR999 | rule_xyz
Table2:
TicketNumber | Rules
---------------------------
PR123 | rule_123
PR123 | rule_234
PR999 | rule_abc
NOTE: Both tables have the same structure: same column names but different count.
NOTE: Both tables have same set of TicketNumber values
CASE 1:
If I need ticket and rules count of each ticket from table1, the query is:
Select [TicketNo], COUNT([TicketNo]) AS Rules_Count from [Table1] group by TicketNo
This will give me output in format :
ticketNumber | Rules_Count
---------------------------
PR123 | 3
PR999 | 2
CASE 2: (NEED HELP WITH THIS)
Now, the previous query gets the ticket and the count of the ticket of only 1 table. I need the count of the same ticket (since both have same set of tkt nos) in table2 also.
I need result in this way:
ticketNumber | Count(ticketNumber) of table1 | Count(ticketNumber) of table2
---------------------------------------------------------------------------------
PR123 | 3 | 2
PR999 | 2 | 1
Both Table1 and table2 have the same set of ticket nos but different counts
How do I get the result shown above?

A simpler solution from a "statement point of view" (without COALESCE, which may not be so easy to understand).
Pay attention to performance, though:
Select T1.TicketNumber,T1.Rules_Count_1,T2.Rules_Count_2
FROM
(
Select [TicketNumber], COUNT([TicketNumber]) AS Rules_Count_1
from [Table1] T1
group by TicketNumber) T1
INNER JOIN
(
Select [TicketNumber], COUNT([TicketNumber]) AS Rules_Count_2
from [Table2] T2
group by TicketNumber
) T2
on T1.TicketNumber = T2.TicketNumber

You can do this with a full outer join after aggregation (or an inner join if you really know that both tables have the same tickets):
select coalesce(t1.TicketNo, t2.TicketNo) as TicketNo,
coalesce(t1.Rules_Count, 0) as t1_Rules_Count,
coalesce(t2.Rules_Count, 0) as t2_Rules_Count
from (Select [TicketNo], COUNT([TicketNo]) AS Rules_Count
from [Table1]
group by TicketNo
) t1 full outer join
(Select [TicketNo], COUNT([TicketNo]) AS Rules_Count
from [Table2]
group by TicketNo
) t2
on t1.TicketNo = t2.TicketNo;
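The aggregate-then-join idea can be tried out with Python's sqlite3 module, as in the sketch below. Two things differ from the answer and are assumptions of this sketch: it uses the TicketNumber column name from the table listing, and it uses a LEFT JOIN from Table1 instead of a FULL OUTER JOIN, because older SQLite builds do not support the latter. That means it only covers tickets missing from Table2, not the other way round. Ticket PR777 is a made-up row that exists only in Table1, to show COALESCE turning the missing count into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (TicketNumber TEXT, Rules TEXT);
    CREATE TABLE Table2 (TicketNumber TEXT, Rules TEXT);
    INSERT INTO Table1 VALUES
        ('PR123', 'rule_123'), ('PR123', 'rule_234'), ('PR123', 'rule_456'),
        ('PR999', 'rule_abc'), ('PR999', 'rule_xyz'),
        ('PR777', 'rule_only_in_table1');  -- hypothetical extra ticket
    INSERT INTO Table2 VALUES
        ('PR123', 'rule_123'), ('PR123', 'rule_234'),
        ('PR999', 'rule_abc');
""")

# Count per ticket in each table first, then join the two small results.
query = """
    SELECT t1.TicketNumber,
           t1.Rules_Count              AS t1_Rules_Count,
           COALESCE(t2.Rules_Count, 0) AS t2_Rules_Count
    FROM (SELECT TicketNumber, COUNT(*) AS Rules_Count
          FROM Table1
          GROUP BY TicketNumber) t1
    LEFT JOIN (SELECT TicketNumber, COUNT(*) AS Rules_Count
               FROM Table2
               GROUP BY TicketNumber) t2
           ON t1.TicketNumber = t2.TicketNumber
    ORDER BY t1.TicketNumber
"""
count_rows = conn.execute(query).fetchall()
for row in count_rows:
    print(row)
```

Aggregating before joining matters here: joining the raw tables on TicketNumber first would multiply the rows and inflate both counts.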

SELECT A.center,
A.total_1st,
B.total_2nd
FROM (SELECT a.center,
Count (a.dose1) AS Total_1st
FROM table_1 a
GROUP BY a.center) A
INNER JOIN (SELECT b.center,
Count (b.dose2) AS Total_2nd
FROM table_2 b
GROUP BY b.center) B
ON a.center = b.center
ORDER BY A.center
