Linking tables on multiple criteria - sql

I've got myself in a bit of a mess trying to link two tables together on multiple criteria.
I want to link one table to the other using these rules, in this order of priority:
1. The main link is where orderid matches between the two tables.
2. Only records from table2 where valid = 'y' count.
3. Of those valid records, I want the ones with the highest seqn1, and of those, the one with the highest seqn2.
table1
orderid | date | otherinfo
223344 | 22/10/2020 | okokkokokooeodijjf
table2
orderid | seqn1 | seqn2 | valid | additonaldata
223344 | 1 | 3 | y | sdfsfsf
223344 | 2 | 1 | y | sffferfr
223344 | 2 | 2 | y | sfrfrefr -- This row
223344 | 2 | 3 | n | rfrg66rr
223344 | 2 | 4 | n | adwere
223344 | 3 | 4 | n | adwere
So I would want the final record to be:
orderid | date | otherinfo | seqn1 | seqn2 | valid | additonaldata
223344 | 22/10/2020 | okokkokokooeodijjf | 2 | 2 | y | sfrfrefr
I started off with the code below, but I'm not sure I'm doing it right, and I can't get it to pay attention to the valid flag when I try to add it in.
SELECT * FROM table1
left JOIN table2
ON table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid)
AND table2.seqn2 = (SELECT MAX(table2.seqn2) FROM table2 WHERE table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid))
Could someone please help me amend the code?

Use the ROW_NUMBER() analytic function, partitioned by orderid and ordered by the seqn columns in the order you need. There is no need for multiple subselects. To add more selection criteria for the single row, use CASE to map your values to numbers and order by them as well.
with l as (
    select *,
           row_number() over(partition by orderid order by seqn1 desc, seqn2 desc) as rn
    from table2
    where valid = 'y'
)
select *
from table1 as h
join l
  on h.orderid = l.orderid
 and l.rn = 1
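To check the ROW_NUMBER() idea against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); table and column names (including the "additonaldata" spelling) are copied from the question. The second query is one possible reading of the CASE suggestion: instead of filtering on valid, it sorts valid rows first, which means an order with no valid rows would still get its best invalid row.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (orderid INTEGER, date TEXT, otherinfo TEXT);
CREATE TABLE table2 (orderid INTEGER, seqn1 INTEGER, seqn2 INTEGER,
                     valid TEXT, additonaldata TEXT);
INSERT INTO table1 VALUES (223344, '22/10/2020', 'okokkokokooeodijjf');
INSERT INTO table2 VALUES
  (223344, 1, 3, 'y', 'sdfsfsf'),
  (223344, 2, 1, 'y', 'sffferfr'),
  (223344, 2, 2, 'y', 'sfrfrefr'),
  (223344, 2, 3, 'n', 'rfrg66rr'),
  (223344, 2, 4, 'n', 'adwere'),
  (223344, 3, 4, 'n', 'adwere');
""")

# Number the valid rows per order (highest seqn1, then highest seqn2 first)
# and keep only row number 1.
rows = conn.execute("""
WITH l AS (
  SELECT orderid, seqn1, seqn2, valid, additonaldata,
         ROW_NUMBER() OVER (PARTITION BY orderid
                            ORDER BY seqn1 DESC, seqn2 DESC) AS rn
  FROM table2
  WHERE valid = 'y'
)
SELECT h.orderid, h.date, h.otherinfo, l.seqn1, l.seqn2, l.valid, l.additonaldata
FROM table1 AS h
JOIN l ON h.orderid = l.orderid AND l.rn = 1
""").fetchall()

# CASE variant: map valid = 'y' to 0 so valid rows sort ahead of invalid ones.
rows_case = conn.execute("""
WITH l AS (
  SELECT orderid, seqn1, seqn2, valid,
         ROW_NUMBER() OVER (PARTITION BY orderid
                            ORDER BY CASE valid WHEN 'y' THEN 0 ELSE 1 END,
                                     seqn1 DESC, seqn2 DESC) AS rn
  FROM table2
)
SELECT orderid, seqn1, seqn2, valid FROM l WHERE rn = 1
""").fetchall()

print(rows)
print(rows_case)
```

Both queries pick the (2, 2) row for order 223344, as the question expects.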

How about something like this:
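;with max_seqn1 as
(
    -- highest valid seqn1 per order
    SELECT orderid
          ,MAX(seqn1) as seqn1
    FROM table2
    WHERE valid = 'y'
    GROUP BY orderid
),
cte_table2 as
(
    -- highest valid seqn2 within that seqn1 (taking both MAXes in one
    -- GROUP BY could pair values that come from different rows)
    SELECT t.orderid
          ,t.seqn1
          ,MAX(t.seqn2) as seqn2
    FROM table2 t
    INNER JOIN max_seqn1 m on m.orderid = t.orderid and m.seqn1 = t.seqn1
    WHERE t.valid = 'y'
    GROUP BY t.orderid, t.seqn1
)
SELECT
     t1.*
    ,t3.seqn1
    ,t3.seqn2
    ,t3.valid
    ,t3.additonaldata
    --,t3.[OtherFields]
from table1 t1
inner join cte_table2 t2 on t1.orderid = t2.orderid -- first match on id
left join table2 t3 on t3.orderid = t2.orderid and t3.seqn1 = t2.seqn1
                   and t3.seqn2 = t2.seqn2 and t3.valid = 'y'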
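One thing worth checking with a group-then-join-back approach: MAX(seqn1) and MAX(seqn2) computed side by side in a single GROUP BY do not have to come from the same row. The sketch below (Python's sqlite3 with the question's sample data; no window functions needed) shows that for order 223344 they give (2, 3), a pair that only exists as an invalid row, while taking the highest valid seqn2 within the highest valid seqn1 gives the expected (2, 2).

```python
import sqlite3

# Only table2 from the question is needed to show the difference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (orderid INTEGER, seqn1 INTEGER, seqn2 INTEGER,
                     valid TEXT, additonaldata TEXT);
INSERT INTO table2 VALUES
  (223344, 1, 3, 'y', 'sdfsfsf'),
  (223344, 2, 1, 'y', 'sffferfr'),
  (223344, 2, 2, 'y', 'sfrfrefr'),
  (223344, 2, 3, 'n', 'rfrg66rr'),
  (223344, 2, 4, 'n', 'adwere'),
  (223344, 3, 4, 'n', 'adwere');
""")

# Independent MAXes: seqn2 = 3 comes from the (1, 3) row, not from seqn1 = 2.
independent = conn.execute("""
SELECT orderid, MAX(seqn1), MAX(seqn2)
FROM table2 WHERE valid = 'y' GROUP BY orderid
""").fetchall()

# Two stages: highest valid seqn1 first, then highest valid seqn2 within it,
# then join back to table2 to fetch the rest of that row.
two_stage = conn.execute("""
WITH max_seqn1 AS (
  SELECT orderid, MAX(seqn1) AS seqn1
  FROM table2 WHERE valid = 'y' GROUP BY orderid
),
max_seqn2 AS (
  SELECT t.orderid, t.seqn1, MAX(t.seqn2) AS seqn2
  FROM table2 t
  JOIN max_seqn1 m ON m.orderid = t.orderid AND m.seqn1 = t.seqn1
  WHERE t.valid = 'y'
  GROUP BY t.orderid, t.seqn1
)
SELECT t3.orderid, t3.seqn1, t3.seqn2, t3.additonaldata
FROM max_seqn2 t2
JOIN table2 t3 ON t3.orderid = t2.orderid AND t3.seqn1 = t2.seqn1
              AND t3.seqn2 = t2.seqn2 AND t3.valid = 'y'
""").fetchall()

print(independent)
print(two_stage)
```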

Related

Tie-breaking multiple matches on MAX() in SQL

I have a table that looks like this:
| client_id | program_id | provider_id | date_of_service | data_entry_date | data_entry_time |
| --------- | ---------- | ----------- | --------------- | --------------- | --------------- |
| 2 | 5 | 6 | 02/02/2022 | 02/02/2022 | 0945 |
| 2 | 5 | 6 | 02/02/2022 | 02/07/2022 | 0900 |
| 2 | 5 | 6 | 02/04/2022 | 02/04/2022 | 1000 |
| 2 | 5 | 6 | 02/04/2022 | 02/04/2022 | 1700 |
| 2 | 5 | 6 | 02/04/2022 | 02/05/2022 | 0800 |
| 2 | 5 | 6 | 02/04/2022 | 02/05/2022 | 0900 |
I need to get the most recent date_of_service entered. From the table above, the desired result/row is:
date_of_service = 02/04/2022, data_entry_date = 02/05/2022, data_entry_time = 0900
This resulting date_of_service will be left joined to the master table.
This query mostly works:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
LEFT JOIN
(SELECT client_id, program_id, provider_id, date_of_service
FROM table2) as t2
ON t2.client_id = t1.client_id
AND t2.program_id = t1.program_id
AND t2.provider_id = t1.provider_id
AND t2.date_of_service =
(SELECT MAX(date_of_service)
FROM table2 as t3
WHERE t3.client_id = t1.client_id
AND t3.program_id = t1.program_id
AND t3.provider_id = t1.provider_id
)
WHERE t1.provider_id = '6'
But it also returns multiple rows whenever there is more than one match on the max(date_of_service).
To solve this, I need to use the max data_entry_date to break any ties whenever there is more than one row that matches the max(date_of_service). Likewise, I also need to use the max data_entry_time to break any ties whenever there is more than one row that also matches the max data_entry_date.
I tried the following:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
LEFT JOIN
(SELECT TOP(1) client_id, program_id, provider_id, date_of_service, data_entry_date, data_entry_time
FROM table2
ORDER BY date_of_service DESC, data_entry_date DESC, data_entry_time DESC
) as t2
ON t2.client_id = t1.client_id
AND t2.program_id = t1.program_id
AND t2.provider_id = t1.provider_id
WHERE t1.provider_id = '6'
But I can only get it to return null values for the date_of_service.
Likewise, this:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
LEFT JOIN
(
SELECT TOP(1) client_id AS client_id2, program_id AS program_id2, provider_id AS provider_id2, date_of_service, data_entry_date, data_entry_time
FROM table2 AS t3
JOIN
(SELECT
MAX(date_of_service) AS max_date_of_service
,MAX(data_entry_date) AS max_data_entry_date
FROM table1
WHERE date_of_service = (SELECT MAX(date_of_service) FROM table2)
) AS t4
ON t3.date_of_service = t4.max_date_of_service
AND t3.data_entry_date = t4.max_data_entry_date
ORDER BY data_entry_time
) AS t2
ON t2.client_id2 = t1.client_id
AND t2.program_id2 = t1.program_id
AND t2.provider_id2 = t1.provider_id
WHERE t1.provider_id = '6'
... works (meaning it doesn't throw any errors), but it only seems to return null values for me.
I've tried various combinations of MAX, ORDER BY, and multiple variations of JOINs, but haven't found one that works yet.
I don't know what version my SQL database is, but it doesn't appear to handle window functions like OVER and PARTITION or other things like COALESCE. I've been using DBeaver 22.2.0 to test the SQL scripts.
Based on what you've provided, it looks like you can simply query table2:
SELECT client_id, program_id, provider_id, MAX(date_of_service), MAX(data_entry_date), MAX(data_entry_time)
FROM table2
GROUP BY client_id, program_id, provider_id
If you need to join this result set to table1, just JOIN to the statement above on client_id, program_id and provider_id.
Try the query below. It uses only joins and a subquery.
-- TOP 1 returns a single row overall, so this suits one client/program/provider at a time
SELECT TOP 1 * FROM table2 t1
JOIN (
    SELECT
        MAX(date_of_Service) AS Max_date_of_Service
       ,MAX(data_entry_date) AS Max_data_entry_date
    FROM table2
    WHERE date_of_Service = (SELECT MAX(date_of_Service) FROM table2)
) t2
  ON t1.date_of_Service = t2.Max_date_of_Service
 AND t1.data_entry_date = t2.Max_data_entry_date
ORDER BY data_entry_time DESC
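Since the question says window functions appear to be unavailable, one alternative is a NOT EXISTS anti-join that keeps a row only when no other row for the same client/program/provider sorts after it on (date_of_service, data_entry_date, data_entry_time). Note that taking MAX() of each column separately, as in the GROUP BY answer above, can combine values from different rows: with the sample data, MAX(data_entry_date) is 02/07/2022, which belongs to a 02/02/2022 service. The sketch below runs both through Python's sqlite3. It assumes the date columns hold ISO 'YYYY-MM-DD' values or real DATE types (MM/DD/YYYY strings do not sort chronologically across years), and client 3 is a hypothetical row with no history, added to show the LEFT JOIN. Rows that tie on all three columns would still both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
# Client 3 in table1 is hypothetical and has no rows in table2.
conn.executescript("""
CREATE TABLE table1 (client_id INTEGER, program_id INTEGER, provider_id INTEGER);
CREATE TABLE table2 (client_id INTEGER, program_id INTEGER, provider_id INTEGER,
                     date_of_service TEXT, data_entry_date TEXT, data_entry_time TEXT);
INSERT INTO table1 VALUES (2, 5, 6), (3, 5, 6);
INSERT INTO table2 VALUES
  (2, 5, 6, '2022-02-02', '2022-02-02', '0945'),
  (2, 5, 6, '2022-02-02', '2022-02-07', '0900'),
  (2, 5, 6, '2022-02-04', '2022-02-04', '1000'),
  (2, 5, 6, '2022-02-04', '2022-02-04', '1700'),
  (2, 5, 6, '2022-02-04', '2022-02-05', '0800'),
  (2, 5, 6, '2022-02-04', '2022-02-05', '0900');
""")

# Independent MAXes: the combination below matches no actual row.
independent = conn.execute("""
SELECT client_id, program_id, provider_id,
       MAX(date_of_service), MAX(data_entry_date), MAX(data_entry_time)
FROM table2
GROUP BY client_id, program_id, provider_id
""").fetchall()

# Keep t2 only if no row in the same group sorts after it on the three columns.
rows = conn.execute("""
SELECT t1.client_id, t1.program_id, t1.provider_id,
       t2.date_of_service, t2.data_entry_date, t2.data_entry_time
FROM table1 AS t1
LEFT JOIN table2 AS t2
  ON  t2.client_id = t1.client_id
  AND t2.program_id = t1.program_id
  AND t2.provider_id = t1.provider_id
  AND NOT EXISTS (
    SELECT 1 FROM table2 AS t3
    WHERE t3.client_id = t2.client_id
      AND t3.program_id = t2.program_id
      AND t3.provider_id = t2.provider_id
      AND (t3.date_of_service > t2.date_of_service
           OR (t3.date_of_service = t2.date_of_service
               AND (t3.data_entry_date > t2.data_entry_date
                    OR (t3.data_entry_date = t2.data_entry_date
                        AND t3.data_entry_time > t2.data_entry_time)))))
WHERE t1.provider_id = 6
ORDER BY t1.client_id
""").fetchall()

print(independent)
print(rows)
```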

How to Count total rows on a shipment

I'm trying to count the total number of lines on each shipment:
SELECT Shipments.ShipmentId,
SalesOrders.SalesOrderId as OrderNumber,
Count(SalesOrderItems.SalesOrderItem) as NumberOfLines
FROM SalesOrders
INNER JOIN SalesOrderItems on SalesOrders.SalesOrder = SalesOrderItems.SalesOrder
INNER JOIN Shipments on SalesOrderItems.SalesOrder = Shipments.SalesOrder
GROUP BY SalesOrderItems.SalesOrderItem, SalesOrders.SalesOrderId, Shipments.ShipmentId
ORDER BY Shipments.ShipmentID ASC
Currently I'm getting:
ShipmentID | OrderNumber | NumberOfLines
SH00000001 | SO-0000001 | 1
SH00000001 | SO-0000001 | 1
SH00000002 | SO-0000007 | 1
SH00000003 | SO-0000006 | 1
SH00000003 | SO-0000006 | 1
And I should be getting:
ShipmentID | OrderNumber | NumberOfLines
SH00000001 | SO-0000001 | 1
SH00000001 | SO-0000001 | 2
SH00000002 | SO-0000007 | 1
SH00000003 | SO-0000006 | 1
SH00000003 | SO-0000006 | 2
Remove SalesOrderItems.SalesOrderItem from your GROUP BY clause; you don't want it there (as can be deduced from its absence in your sample result set).
Your GROUP BY clause should match the unaggregated columns in the SELECT:
SELECT s.ShipmentId,
so.SalesOrderId as OrderNumber,
Count(soi.SalesOrderItem) as NumberOfLines
FROM SalesOrders so INNER JOIN
SalesOrderItems soi
ON so.SalesOrder = soi.SalesOrder INNER JOIN
Shipments s
ON soi.SalesOrder = s.SalesOrder
GROUP BY so.SalesOrderId, s.ShipmentId
ORDER BY s.ShipmentID ASC;
Note that I've added table aliases. These make the queries easier to write and to read.
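A quick way to see the effect of the GROUP BY change is to run both versions side by side. The question doesn't show the table definitions, so the column layout below (SalesOrders(SalesOrder, SalesOrderId), SalesOrderItems(SalesOrder, SalesOrderItem), Shipments(ShipmentId, SalesOrder)) and the rows are hypothetical, chosen to mirror the order and shipment numbers in the question; it runs through Python's sqlite3.

```python
import sqlite3

# Hypothetical schema and rows, inferred from the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrders (SalesOrder INTEGER, SalesOrderId TEXT);
CREATE TABLE SalesOrderItems (SalesOrder INTEGER, SalesOrderItem INTEGER);
CREATE TABLE Shipments (ShipmentId TEXT, SalesOrder INTEGER);
INSERT INTO SalesOrders VALUES (1, 'SO-0000001'), (6, 'SO-0000006'), (7, 'SO-0000007');
INSERT INTO SalesOrderItems VALUES (1, 10), (1, 11), (6, 60), (6, 61), (7, 70);
INSERT INTO Shipments VALUES ('SH00000001', 1), ('SH00000002', 7), ('SH00000003', 6);
""")

JOINS = """
FROM SalesOrders so
JOIN SalesOrderItems soi ON so.SalesOrder = soi.SalesOrder
JOIN Shipments s ON soi.SalesOrder = s.SalesOrder
"""

# Original grouping: each item is its own group, so every count is 1.
per_item = conn.execute(
    "SELECT s.ShipmentId, so.SalesOrderId, COUNT(soi.SalesOrderItem)" + JOINS +
    "GROUP BY soi.SalesOrderItem, so.SalesOrderId, s.ShipmentId ORDER BY s.ShipmentId"
).fetchall()

# Grouping only by the non-aggregated SELECT columns counts lines per shipment.
rows = conn.execute(
    "SELECT s.ShipmentId, so.SalesOrderId, COUNT(soi.SalesOrderItem)" + JOINS +
    "GROUP BY s.ShipmentId, so.SalesOrderId ORDER BY s.ShipmentId"
).fetchall()

print(per_item)
print(rows)
```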

postgresql - How to get one row with the min value

I have a table (t_image) with these columns:
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
A | 2 | 20170213
A | 3 | 20170214
B | 4 | 20170201
B | 5 | 20170202
The desired result is:
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
B | 4 | 20170201
From the table above, I want to retrieve one row per datacd: the one with the minimum indexdate.
Here is my query, but it returns 2 rows for datacd A:
select *
from (
select datacd, min(indexdate) as indexdate
from t_image
group by datacd
) as t1 inner join t_image as t2 on t2.datacd = t1.datacd and t2.indexdate = t1.indexdate;
The Postgres proprietary distinct on () operator is typically the fastest solution for greatest-n-per-group queries:
select distinct on (datacd) *
from t_image
order by datacd, indexdate, imagecode; -- imagecode breaks ties on indexdate
One option uses ROW_NUMBER():
SELECT t.datacd,
t.imagecode,
t.indexdate
FROM
(
SELECT datacd, imagecode, indexdate,
ROW_NUMBER() OVER (PARTITION BY datacd ORDER BY indexdate, imagecode) rn -- imagecode breaks ties
FROM t_image
) t
WHERE t.rn = 1
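DISTINCT ON is PostgreSQL-specific, but the ROW_NUMBER() version can be checked anywhere window functions exist. The sketch below runs it through Python's sqlite3 (assuming SQLite 3.25 or later) on the question's rows. Because datacd A has two rows on 20170213, it adds imagecode to the ORDER BY; that tie-breaker is an assumption chosen to match the desired output (imagecode 1). Without it, which of the tied rows comes back is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_image (datacd TEXT, imagecode INTEGER, indexdate TEXT);
INSERT INTO t_image VALUES
  ('A', 1, '20170213'), ('A', 2, '20170213'), ('A', 3, '20170214'),
  ('B', 4, '20170201'), ('B', 5, '20170202');
""")

# Earliest indexdate per datacd; imagecode decides between rows on the same date.
rows = conn.execute("""
SELECT t.datacd, t.imagecode, t.indexdate
FROM (
  SELECT datacd, imagecode, indexdate,
         ROW_NUMBER() OVER (PARTITION BY datacd
                            ORDER BY indexdate, imagecode) AS rn
  FROM t_image
) t
WHERE t.rn = 1
ORDER BY t.datacd
""").fetchall()

print(rows)
```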

How do I join to another table and return only the most recent matching row?

I have a table that stores the lines on a contract. Each contract line has its own unique ID, and it also has the ID of its parent contract. Example:
+-------------+---------+
| contract_id | line_id |
+-------------+---------+
| 1111 | 100 |
| 1111 | 101 |
| 1111 | 102 |
+-------------+---------+
I have another table that stores the historical changes to contract lines. For example, every time the number of units on a contract line is changed a new row is added to the table. Example:
+-------------+---------+--------------+-------+
| contract_id | line_id | date_changed | units |
+-------------+---------+--------------+-------+
| 1111 | 100 | 2016-01-01 | 1 |
| 1111 | 100 | 2016-02-01 | 2 |
| 1111 | 100 | 2016-03-01 | 3 |
+-------------+---------+--------------+-------+
As you can see the contract line with ID 100 belonging to the contract with ID 1111 has been edited 3 times over 3 months. The current value is 3 units.
I'm running a query against the contract lines table to select all data. I want to join to the historical data table and select the most recent row for each contract line and show the units in my results. How do I do this?
Expected results (there would be single results for 101 and 102 as well):
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 3 |
+-------------+---------+-------+
I've tried the query below with a left join but it returns 3 rows instead of 1.
Query:
SELECT *, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id, units) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
Actual results:
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 1 |
| 1111 | 100 | 2 |
| 1111 | 100 | 3 |
+-------------+---------+-------+
An extra join back to contract_history on maxdate will work:
SELECT contract_lines.*,T2.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id) AS T1
JOIN contract_history T2 ON
T1.contract_id=T2.contract_id and
T1.line_id= T2.line_id and
T1.maxdate=T2.date_changed
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
This is my preferred style because it doesn't require a self-join and cleanly expresses your intent. It also competes very well with the ROW_NUMBER() method in terms of performance.
select a.*
, b.units
from contract_lines as a
join (
select a.contract_id
, a.line_id
, a.units
, a.date_changed
, Max(a.date_changed) over(partition by a.contract_id, a.line_id) as max_date_changed
from contract_history as a
) as b
on a.contract_id = b.contract_id
and a.line_id = b.line_id
and b.date_changed = b.max_date_changed;
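The window-MAX approach can be tried out with Python's sqlite3 (assuming SQLite 3.25 or later for window functions). Note that the subquery has to return date_changed as well, since the outer ON clause compares it with max_date_changed. Line 100's history is from the question; the rows for lines 101 and 102 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Line 100 history is from the question; lines 101 and 102 are hypothetical.
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (contract_id INTEGER, line_id INTEGER,
                               date_changed TEXT, units INTEGER);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102);
INSERT INTO contract_history VALUES
  (1111, 100, '2016-01-01', 1),
  (1111, 100, '2016-02-01', 2),
  (1111, 100, '2016-03-01', 3),
  (1111, 101, '2016-01-01', 5),
  (1111, 101, '2016-04-01', 4),
  (1111, 102, '2016-02-01', 7);
""")

# Attach each history row's per-line latest date, then keep rows on that date.
rows = conn.execute("""
SELECT a.contract_id, a.line_id, b.units
FROM contract_lines AS a
JOIN (
  SELECT contract_id, line_id, units, date_changed,
         MAX(date_changed) OVER (PARTITION BY contract_id, line_id) AS max_date_changed
  FROM contract_history
) AS b
  ON a.contract_id = b.contract_id
 AND a.line_id = b.line_id
 AND b.date_changed = b.max_date_changed
ORDER BY a.line_id
""").fetchall()

print(rows)
```

Because this uses an inner join, lines with no history at all are left out; if two changes share the latest date for a line, both rows come back.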
Another possible solution: this one uses RANK() to sort and filter. It's similar to what you did, just a different tack.
SELECT contract_lines.*, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units,
RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS [rank]
FROM contract_history) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
AND T1.rank = 1
WHERE T1.units IS NOT NULL
You could change this to an INNER JOIN and remove the IS NOT NULL from the WHERE clause if you expect history data to be present for every line.
Glad you figured it out!
Try this simple query:
SELECT T0.*, T1.units
FROM contract_lines T0
OUTER APPLY (
    SELECT TOP 1 h.units          -- latest change for this line only
    FROM contract_history h
    WHERE h.contract_id = T0.contract_id
      AND h.line_id = T0.line_id
    ORDER BY h.date_changed DESC
) T1
As always seems to be the way, after spending an hour looking at it (and shouting at StackOverflow for having a rare period of maintenance), I solved my own problem not long after posting the question.
In an effort to help anyone else who's stuck I'll show what I found. It might not be an efficient way to achieve this so if someone has a better suggestion I'm all ears.
I adapted the answer from here: T-SQL Subquery Max(Date) and Joins
SELECT *,
Units = (SELECT TOP 1 units
FROM contract_history
WHERE contract_lines.contract_id = contract_history.contract_id
AND contract_lines.line_id = contract_history.line_id
ORDER BY date_changed DESC
)
FROM ....
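The correlated-subquery idea can be sketched in SQLite through Python's sqlite3. SQLite has no TOP, so LIMIT 1 stands in for TOP 1, and this version needs no window functions. The outer FROM is assumed to be contract_lines. Line 100's history is from the question; lines 101 to 103 are hypothetical, with 103 given no history to show that it still appears, with NULL units.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Line 100 history is from the question; lines 101-103 are hypothetical.
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (contract_id INTEGER, line_id INTEGER,
                               date_changed TEXT, units INTEGER);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102), (1111, 103);
INSERT INTO contract_history VALUES
  (1111, 100, '2016-01-01', 1),
  (1111, 100, '2016-02-01', 2),
  (1111, 100, '2016-03-01', 3),
  (1111, 101, '2016-01-01', 5),
  (1111, 101, '2016-04-01', 4),
  (1111, 102, '2016-02-01', 7);
""")

# For each line, take units from its most recent history row (LIMIT 1 ~ TOP 1).
rows = conn.execute("""
SELECT contract_id, line_id,
       (SELECT h.units
        FROM contract_history h
        WHERE h.contract_id = contract_lines.contract_id
          AND h.line_id = contract_lines.line_id
        ORDER BY h.date_changed DESC
        LIMIT 1) AS units
FROM contract_lines
ORDER BY line_id
""").fetchall()

print(rows)
```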

SQL return max value from child for each parent row

I have 2 tables - 1 with parent records, 1 with child records. For each parent record, I'm trying to return a single child record with the MAX(SalesPriceEach).
Additionally I'd like to only return a value when there is more than 1 child record.
parent - SalesTransactions table:
+-------------------+---------+
|SalesTransaction_ID| text |
+-------------------+---------+
| 1 | Blah |
| 2 | Blah2 |
| 3 | Blah3 |
+-------------------+---------+
child - SalesTransactionLines table
+--+-------------------+---------+--------------+
|id|SalesTransaction_ID|StockCode|SalesPriceEach|
+--+-------------------+---------+--------------+
| 1| 1 | 123 | 99 |
| 2| 1 | 35 | 50 |
| 3| 2 | 15 | 75 |
+--+-------------------+---------+--------------+
desired results
+-------------------+---------+--------------+
|SalesTransaction_ID|StockCode|SalesPriceEach|
+-------------------+---------+--------------+
| 1 | 123 | 99 |
| 2 | 15 | 75 |
+-------------------+---------+--------------+
I found a very similar question here, and based my query on the answer but am not seeing the results I expect.
WITH max_feature AS (
SELECT c.StockCode,
c.SalesTransaction_ID,
MAX(c.SalesPriceEach) as feature
FROM SalesTransactionLines c
GROUP BY c.StockCode, c.SalesTransaction_ID)
SELECT p.SalesTransaction_ID,
mf.StockCode,
mf.feature
FROM SalesTransactions p
LEFT JOIN max_feature mf ON mf.SalesTransaction_ID = p.SalesTransaction_ID
The results from this query are returning multiple rows for each parent, and not even the highest value first!
select stl.SalesTransaction_ID, stl.StockCode, ss.MaxSalesPriceEach
from SalesTransactionLines stl
inner join
(
select stl2.SalesTransaction_ID, max(stl2.SalesPriceEach) MaxSalesPriceEach
from SalesTransactionLines stl2
group by stl2.SalesTransaction_ID
having count(*) > 1
) ss on (ss.SalesTransaction_ID = stl.SalesTransaction_ID and
ss.MaxSalesPriceEach = stl.SalesPriceEach)
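The join-to-aggregate query can be checked with Python's sqlite3 on the question's child rows. Note that HAVING COUNT(*) > 1 drops transaction 2, which has only one line: that matches the "more than 1 child record" requirement but not the desired-results table, so drop the HAVING clause if transaction 2 should appear. The helper function and its min_lines parameter are hypothetical, added only to run the query both ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesTransactionLines (id INTEGER, SalesTransaction_ID INTEGER,
                                    StockCode INTEGER, SalesPriceEach INTEGER);
INSERT INTO SalesTransactionLines VALUES
  (1, 1, 123, 99), (2, 1, 35, 50), (3, 2, 15, 75);
""")

def best_lines(min_lines):
    """Hypothetical helper: highest-priced line per transaction that has
    at least min_lines lines (min_lines=2 is the answer's COUNT(*) > 1)."""
    return conn.execute("""
    SELECT stl.SalesTransaction_ID, stl.StockCode, ss.MaxSalesPriceEach
    FROM SalesTransactionLines stl
    JOIN (
      SELECT stl2.SalesTransaction_ID, MAX(stl2.SalesPriceEach) AS MaxSalesPriceEach
      FROM SalesTransactionLines stl2
      GROUP BY stl2.SalesTransaction_ID
      HAVING COUNT(*) >= ?
    ) ss ON ss.SalesTransaction_ID = stl.SalesTransaction_ID
        AND ss.MaxSalesPriceEach = stl.SalesPriceEach
    ORDER BY stl.SalesTransaction_ID
    """, (min_lines,)).fetchall()

with_having = best_lines(2)
without_having = best_lines(1)
print(with_having)
print(without_having)
```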
Or, alternatively:
SELECT stl1.*
FROM SalesTransactionLines AS stl1
LEFT OUTER JOIN SalesTransactionLines AS stl2
ON (stl1.SalesTransaction_ID = stl2.SalesTransaction_ID
AND stl1.SalesPriceEach < stl2.SalesPriceEach)
WHERE stl2.SalesPriceEach IS NULL;
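The self-join alternative keeps a line only when no other line in the same transaction has a higher price. A minimal check with Python's sqlite3 follows; the two rows for transaction 3 are hypothetical, added to show that lines tied on the maximum price are all returned, which is the case the ROW_NUMBER() answer handles by picking just one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows 1-3 are from the question; rows 4 and 5 (a price tie) are hypothetical.
conn.executescript("""
CREATE TABLE SalesTransactionLines (id INTEGER, SalesTransaction_ID INTEGER,
                                    StockCode INTEGER, SalesPriceEach INTEGER);
INSERT INTO SalesTransactionLines VALUES
  (1, 1, 123, 99), (2, 1, 35, 50), (3, 2, 15, 75),
  (4, 3, 40, 20), (5, 3, 41, 20);
""")

# Anti-join: a line survives when the LEFT JOIN finds no pricier sibling.
rows = conn.execute("""
SELECT stl1.*
FROM SalesTransactionLines AS stl1
LEFT OUTER JOIN SalesTransactionLines AS stl2
  ON stl1.SalesTransaction_ID = stl2.SalesTransaction_ID
 AND stl1.SalesPriceEach < stl2.SalesPriceEach
WHERE stl2.SalesPriceEach IS NULL
ORDER BY stl1.id
""").fetchall()

print(rows)
```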
I know I'm a year late to this party, but I always prefer using ROW_NUMBER() in these situations. It handles the case where two rows meet your MAX criteria and makes sure that only one row is returned:
with z as (
select
st.SalesTransaction_ID
,row=ROW_NUMBER() OVER(PARTITION BY st.SalesTransaction_ID ORDER BY stl.SalesPriceEach DESC)
,stl.StockCode
,stl.SalesPriceEach
from
SalesTransactions st
inner join SalesTransactionLines stl on stl.SalesTransaction_ID = st.SalesTransaction_ID
)
select * from z where row = 1
SELECT SalesTransactions.SalesTransaction_ID,
SalesTransactionLines.StockCode,
MAX(SalesTransactionLines.SalesPriceEach)
FROM SalesTransactions RIGHT JOIN SalesTransactionLines
ON SalesTransactions.SalesTransaction_ID = SalesTransactionLines.SalesTransaction_ID
GROUP BY SalesTransactions.SalesTransaction_ID, SalesTransactionLines.StockCode;
select a.SalesTransaction_ID, a.StockCode, a.SalesPriceEach
from SalesTransactionLines as a
inner join (select SalesTransaction_ID, MAX(SalesPriceEach) as SalesPriceEach
from SalesTransactionLines group by SalesTransaction_ID) as b
on a.SalesTransaction_ID = b.SalesTransaction_ID
and a.SalesPriceEach = b.SalesPriceEach
The subquery returns each transaction ID with its maximum SalesPriceEach, so just join it back to the transaction lines table on those two values.