Fetch most recent records as part of Joins - sql

I am joining 2 tables customer & profile. Both the tables are joined by a specific column cust_id. In profile table, I have more than 1 entry. I want to select the most recent entry by start_ts (column) when joining both the tables. As a result I would like 1 row - row from customer and most recent row from profile in the resultset. Is there a way to do this ORACLE SQL?

I would use window functions:
select . . .
from customer c join
(select p.*,
row_number() over (partition by cust_id order by start_ts desc) as seqnum
from profile
) p
on c.cust_id = p.cust_id and p.seqnum = 1;
You can use a left join if you like to get customers that don't have profiles as well.

One way (which works for all DB engines) is to join the tables you want to select data from and then join against the specific max-record of profile to filter out the data
select c.*, p.*
from customer c
join profile p on c.cust_id = p.cust_id
join
(
select cust_id, max(start_ts) as maxts
from profile
group by cust_id
) p2 on p.cust_id = p2.cust_id and p.start_ts = p2.maxts

Here is another way (if there exists no newer entry then it's the newest):
select
c.*,
p.*
from
customer c inner join
profile p on p.cust_id = c.cust_id and not exists(
select *
from profile
where cust_id = c.cust_id and start_ts > p.start_ts
)

Related

JOIN only one row from second table and if no rows exist return null

In this query I need to show all records from the left table and only the records from the right table where the result is the highest date.
Current query:
SELECT a.*, c.*
FROM users a
INNER JOIN payments c
ON a.id = c.user_ID
INNER JOIN
(
SELECT user_ID, MAX(date) maxDate
FROM payments
GROUP BY user_ID
) b ON c.user_ID = b.user_ID AND
c.date = b.maxDate
WHERE a.package = 1
This returns all records where the join is valid, but I need to show all users and if they didn't make a payment yet the fields from the payments table should be null.
I could use a union to show the other rows:
SELECT a.*, c.*
FROM users a
INNER JOIN payments c
ON a.id = c.user_ID
INNER JOIN
(
SELECT user_ID, MAX(date) maxDate
FROM payments
GROUP BY user_ID
) b ON c.user_ID = b.user_ID AND
c.date = b.maxDate
WHERE a.package = 1
union
SELECT a.*, c.*
FROM users a
--here I would need to join with payments table to get the columns from the payments table,
but where the user doesn't have a payment yet
WHERE a.package = 1
The option to use the union doesn't seem like a good solution, but that's what I tried.
So, in other words, you want a list of users and the last payment for each.
You can use OUTER APPLY instead of INNER JOIN to get the last payment for each user. The performance might be better and it will work the way you want regarding users with no payments.
SELECT a.*, b.*
FROM users a
OUTER APPLY ( SELECT * FROM payments c
WHERE c.user_id = a.user_id
ORDER BY c.date DESC
FETCH FIRST ROW ONLY ) b
WHERE a.package = 1;
Here is a generic version of the same concept that does not require your tables (for other readers). It gives a list of database users and the most recently modified object for each user. You can see it properly includes users that have no objects.
SELECT a.*, b.*
FROM all_users a
OUTER APPLY ( SELECT * FROM all_objects b
WHERE b.owner = a.username
ORDER BY b.last_ddl_time desc
FETCH FIRST ROW ONLY ) b
I like the answer from #Matthew McPeak but OUTER APPLY is 12c or higher and isn't very idiomatic Oracle, historically anyway. Here's a straight LEFT OUTER JOIN version:
SELECT *
FROM users a
LEFT OUTER JOIN
(
-- retrieve the list of payments for just those payments that are the maxdate per user
SELECT payments.*
FROM payments
JOIN (SELECT user_id, MAX(date) maxdate
FROM payments
GROUP BY user_id
) maxpayment_byuser
ON maxpayment_byuser.maxdate = payments.date
AND maxpayment_byuser.user_id = payments.user_id
) b ON a.ID = b.user_ID
If performance is an issue, you may find the following more performant but for simplicity you'll end up with an extra "maxdate" column.
SELECT *
FROM users a
LEFT OUTER JOIN
(
-- retrieve the list of payments for just those payments that are the maxdate per user
SELECT *
FROM (
SELECT payments.*,
MAX(date) OVER (PARTITION BY user_id) maxdate
FROM payments
) max_payments
WHERE date = maxdate
) b ON a.ID = b.user_ID
A generic approach using row_number() is very useful for "highest date" or "most recent" or similar conditions:
SELECT
*
FROM users a
LEFT OUTER JOIN (
-- determine the row corresponding to "most recent"
SELECT
payments.*
, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) is_recent
FROM payments
) b ON a.ID = b.user_ID
AND b.is_recent = 1
(reversing the ORDER BY within the over clause also enables "oldest")

SQL strategy to fetch maximum

Suppose I have these three tables:
I want to get, for all products, it's product_id and the client that bougth it most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?
the inner query will give you the ranking. The outer query will give you the client that puchase the most for a product
SELECT *
(
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1
I was going to submit pretty much the same thing as #Squirrell only with a Common Table Expression [CTE] rather than a derived table. So I wont duplicate that but there are some learning points concerning your query. First is IMPLICIT JOINS such as FROM Bill_Item, Bill are really easy to have uintended consequences (one of many questions: Queries that implicit SQL joins can't do?) Next for the Calculated column you can actually do this in a OUTER APPLY or CROSS APPLY which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
And to show you how squirell's answer can still include products that have never been sold all you need to do is join Products and LEFT JOIN to other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Techniques used in some of these that are useful.
CTE
APPLY (Outer & Cross)
Window Functions
Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE Bill_Item.product_id = p.product_id
GROUP BY client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.

Join SQL Statements optimization

I have 2 Tables:
Customer
ID
Customer_ID
Name
Sir_Name
Phone
Email
and
Table Invoice
Manager_Name
Manaer_First_Name
Customer_ID1
Customer_ID2
Customer_ID3
There is only one Customer.Customer_ID for each Customer or a Customer has no Customer_ID
In Invoice.Customer_ID1 i have the same Customer_ID.Customer_ID several times.
I Like to get all Records in Customer Table Join Invoice Table - check if the Customer_ID = Customer_ID1 if not check in Customer_ID = Customer_ID2 Or Customer_ID = Customer_ID2
If customer_ID is found in one of rows stop the search.
Probably the best way to write the query is:
select . . .
from customer c join
invoice i
on c.customer_id = coalesce(i.customer_id1, i.customer_id2, i.customer_id3);
This should be able to take advantage of an index on customer(customer_id). If this is not efficient, then another alternative is left join:
select . . ., coalesce(c1.col1, c2.col1, c3.col1) as col1, . . .
from invoice i left join
customer c1
on c1.customer_id = i.customer_id1 left join
customer c2
on c2.customer_id = i.customer_id2 left join
customer c3
on c3.customer_id = i.customer_id3;
The left join can take advantage of an index on customer(customer_id). You need to use coalesce() in the select to choose the field from the right table.
select
*
from [Table Invoice] A
JOIN [Customer] B
ON B.Customer_ID = A.Customer_ID1 OR (B.Customer_ID <> A.Customer_ID1 AND B.Customer_ID = A.Customer_ID2) OR (B.Customer_ID = A.Customer_ID3 AND B.Customer_ID <> A.Customer_ID2 AND B.Customer_ID <> A.Customer_ID1)
this would return you all the Invoices for all of the Customers. In case you need Invoices just for one customer - add
WHERE B.Customer_ID = #YourCustomerID
statement. If you need only one, first invoice, add 'TOP 1' to select statement:
SELECT TOP 1
Could a inner join on or clause
select Customer.*, Invocie.*
from Customer
inner join Invoice on ( Customer.Customer_ID = Invoce.Customer_ID1
OR Customer.Customer_ID = Invoce.Customer_ID2
OR Customer.Customer_ID = Invoce.Customer_ID3)
This is how I understand your request: You want all customers that have at least one entry in the invoice table. But per customer you want the "best" invoice record only; with ID1 match better than ID2 match and ID2 match better than ID3 match.
So join the tables to get all matches and then rank your matches with row_number giving the best matching record #1. Then only keep those rows ranked #1.
select *
from
(
select
c.*,
i.*,
row_number() over
(
partition by c.customer_id order by
case c.customer_id
when i.customer_id1 then 1
when i.customer_id2 then 2
when i.customer_id3 then 3
end
) as rn
from customer c
join invoice i on c.customer_id in (i.customer_id1, i.customer_id2, i.customer_id3)
)
where rn = 1;

Join two tables but only get most recent associated record

I am having a hard time constructing an sql query that gets all the associated data with respect to another (associated) table and loops over into that set of data on which are considered as latest (or most recent).
The image below describes my two tables (Inventory and Sales), the Inventory table contains all the item and the Sales table contains all the transaction records. The Inventory.Id is related to Sales.Inventory_Id. And the Wanted result is the output that I am trying to work on to.
My objective is to associate all the sales record with respect to inventory but only get the most recent transaction for each item.
Using a plain join (left, right or inner) doesn't produce the result that I am looking into for I don't know how to add another category in which you can filter the most recent data to join to. Is this doable or should I change my table schema?
Thanks.
You can use APPLY
Select Item,Sales.Price
From Inventory I
Cross Apply(Select top 1 Price
From Sales S
Where I.id = S.Inventory_Id
Order By Date Desc) as Sales
WITH Sales_Latest AS (
SELECT *,
MAX(Date) OVER(PARTITION BY Inventory_Id) Latest_Date
FROM Sales
)
SELECT i.Item, s.Price
FROM Inventory i
INNER JOIN Sales_Latest s ON (i.Id = s.Inventory_Id)
WHERE s.Date = s.Latest_Date
Think carefully about what results you expect if there are two prices in Sales for the same date.
I would just use a correlated subquery:
select Item, Price
from Inventory i
inner join Sales s
on i.id = s.Inventory_Id
and s.Date = (select max(Date) from Sales where Inventory_Id = i.id)
select * from
(
select i.name,
row_number() over (partition by i.id order by s.date desc) as rownum,
s.price,
s.date
from inventory i
left join sales s on i.id = s.inventory_id
) tmp
where rownum = 1
SQLFiddle demo

sql min date returning more than 1 record

I am trying to return 1 record per customer, along with the first record from another table (product). The tables are joined with an intersection table, and the date that I am using the min (date) on is a user input date.
My sql would work fine except i have noticed there a few customer which have more than one record in the product table with the same date, so they are being returned more than once. I want to just be able to return 1 product per customer. Database is oracle so I have tried using rownum but then only returning 1 record for the whole query so i'm obviously not using it correctly. This is my sql
SELECT cust.ROW_ID prod.NAME, prod.DATE
FROM cust INNER JOIN ProdCust on cust.ROW_ID = ProdCust.CUST_ID
INNER JOIN prod on ProdCust .PROD_ID = Prod.ROW_ID
INNER JOIN
(SELECT ProdCust.CUST_ID, MIN (Prod.DATE) minDate
FROM ProdCust, Prod
WHERE ProdCust.PROD_ID = Prod.ROW_ID
GROUP BY CUST_ID
) ProdCustMin on ProdCust.CUST_ID = ProdCustMin.CUST_ID AND prod.DATE = ProdCustMin.minDate
In Oracle, you can use row_number() to resolve ties:
SELECT c.ROW_ID
, p.NAME
, p.DATE
FROM Cust c
JOIN (
SELECT row_number() over (partition by pc.CUST_ID order by p.DATE) rn
, pc.CUST_ID
, p.NAME
, p.DATE
FROM Prod p
JOIN ProdCust pc
ON pc.PROD_ID = p.ROW_ID
) p
ON c.ROW_ID = p.CUST_ID
AND p.rn = 1 -- First row only