How do I select Just one row for each row in a Left Join - sql

So I have two Tables: Customers and Calls.
There is a one to many relationship between these tables. i.e. One Customer can have Many Calls
I am trying to create a left join so that I have an output where the Customers are listed only once, with the most recent CallDate from the Calls table.
I have constructed the following SQL statement:
Select Customers.*, Calls.CallDate
From Customers
Left Join Calls
on Customers.Id=Calls.CustomerId
But this gives me a separate Customer row for each Call.
How do I get just one row for each Customer based on the most recent CallDate?

A simple way is to use Outer Apply:
Select c.*, ca.*
From Customers c outer apply
(select top 1 ca.*
from Calls ca
where c.id = ca.CustomerId
order by CallDate desc
) ca;
However, if you just want the most recent call date, then aggregation is the typical approach. One method:
select c.*, max_callDate
from customers c left join
(select CustomerId, max(CallDate) as max_callDate
from calls
group by CustomerId
) ca
on c.id = ca.CustomerId;
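The aggregation approach can be tried out with a minimal sketch like the one below. It uses Python's built-in sqlite3 module rather than SQL Server, and the Name column and the sample rows are hypothetical; only the table names and the join columns come from the question.

```python
import sqlite3

# Hypothetical sample data; Name is an invented column for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Calls (Id INTEGER PRIMARY KEY, CustomerId INTEGER, CallDate TEXT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Calls (CustomerId, CallDate) VALUES
        (1, '2024-01-05'), (1, '2024-03-01'), (2, '2024-02-10');
""")

# Collapse Calls to one row per customer first, then LEFT JOIN so that
# customers without any calls are kept with a NULL date.
rows = conn.execute("""
    SELECT c.Id, c.Name, ca.max_callDate
    FROM Customers c
    LEFT JOIN (
        SELECT CustomerId, MAX(CallDate) AS max_callDate
        FROM Calls
        GROUP BY CustomerId
    ) ca ON c.Id = ca.CustomerId
    ORDER BY c.Id
""").fetchall()

for row in rows:
    print(row)
```

Because the derived table has at most one row per CustomerId, the join cannot multiply customer rows; the customer with no calls still appears, with None as the date.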

You can use ROW_NUMBER window function:
Select Customers.*, c.CallDate
From Customers
Left Join (
SELECT CustomerId, CallDate,
ROW_NUMBER() OVER (PARTITION BY CustomerId
ORDER BY CallDate DESC) AS rn
FROM Calls
) AS c on Customers.Id = c.CustomerId AND c.rn = 1
ROW_NUMBER with a PARTITION BY clause enumerates records within CustomerId partitions. Number 1 is assigned to the record having the maximum CallDate value, due to ORDER BY CallDate DESC clause.
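To see the numbering itself, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Calls (CustomerId INTEGER, CallDate TEXT);
    INSERT INTO Calls VALUES
        (1, '2024-01-05'), (1, '2024-03-01'), (1, '2024-02-14'),
        (2, '2024-02-10');
""")

# Number the calls inside each CustomerId partition, newest first.
numbered = conn.execute("""
    SELECT CustomerId, CallDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerId
                              ORDER BY CallDate DESC) AS rn
    FROM Calls
    ORDER BY CustomerId, rn
""").fetchall()

# Keeping only rn = 1 leaves the most recent call per customer.
latest = [(cust, date) for cust, date, rn in numbered if rn == 1]

for row in numbered:
    print(row)
print(latest)
```

The numbering restarts at 1 for every customer, which is why filtering on rn = 1 in the join condition yields exactly one call per customer.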

You can use outer apply
Select Customers.*, Calls.CallDate
From Customers
outer apply (select top 1 * from Calls c where Customers.Id=c.CustomerId order by c.CallDate desc ) as Calls

Since you only ever want one result per customer, you can use CROSS APPLY:
Select Customers.*, c.CallDate
From Customers
CROSS APPLY (SELECT TOP 1 * FROM Calls
WHERE Customers.Id=Calls.CustomerId ORDER BY CallDate DESC) c
CROSS APPLY drops customers that have no calls. If you expect some customers to have no calls and want to keep them (as an OUTER JOIN would), use OUTER APPLY instead of CROSS APPLY.

Related

All joined subquery results return null

I am trying to get all customers with their latest payment transaction, including customers without any transaction:
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON c.customer_id = p.customer_id
The above query returns NULL for p.transaction_no, p.amount, p.transaciton_datetime in every row, even though I have verified that tbl_payment_transactions contains transactions made by customers.
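The subquery is evaluated only once, so LIMIT 1 leaves a single row for the whole table and at most one customer can match it. You want the subquery to run once for each row of the driving table tbl_customers. This is called a lateral subquery and takes the form: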
SELECT
c.customer_id, c.phone_number, c.email,
p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN LATERAL (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions t
WHERE c.customer_id = t.customer_id
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON true
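The difference between the two forms can be reproduced with a small sketch in Python's sqlite3 module. SQLite has no LATERAL, so the per-customer version below uses a correlated subquery in the ON clause as a stand-in; that substitution, the trimmed column list, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customers (customer_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE tbl_payment_transactions (
        payment_transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO tbl_customers VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO tbl_payment_transactions VALUES
        (10, 1, 5.0), (11, 2, 7.5), (12, 1, 9.0);
""")

# Uncorrelated: the derived table holds ONE row for the whole table,
# so only the customer who owns that row gets non-NULL values.
uncorrelated = conn.execute("""
    SELECT c.customer_id, p.payment_transaction_id
    FROM tbl_customers c
    LEFT JOIN (
        SELECT customer_id, payment_transaction_id
        FROM tbl_payment_transactions
        ORDER BY payment_transaction_id DESC
        LIMIT 1
    ) p ON c.customer_id = p.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Correlated stand-in for LATERAL: the subquery runs per customer row.
correlated = conn.execute("""
    SELECT c.customer_id, p.payment_transaction_id
    FROM tbl_customers c
    LEFT JOIN tbl_payment_transactions p
        ON p.payment_transaction_id = (
            SELECT t.payment_transaction_id
            FROM tbl_payment_transactions t
            WHERE t.customer_id = c.customer_id
            ORDER BY t.payment_transaction_id DESC
            LIMIT 1
        )
    ORDER BY c.customer_id
""").fetchall()

print(uncorrelated)
print(correlated)
```

In the first result only customer 1 has a transaction id; in the second, customers 1 and 2 each get their own latest transaction and customer 3, who has none, stays NULL.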
The Impaler provided the correct form with a LATERAL subquery.
Alternatively, you can use DISTINCT ON in a subquery and a plain LEFT JOIN.
Performance of the latter can be better while retrieving all (or most) customers, and if there are only few transactions per customer and/or you don't have a multicolumn index on (customer_id, payment_transaction_id) or (customer_id, payment_transaction_id DESC):
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY customer_id, payment_transaction_id DESC
) p USING (customer_id);
About performance aspects:
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group?

Better way in SQL to fetch most recent customer record?

Imagine we have two tables: customers and purchases.
Purchases has a customerID, purchaseDateTime, etc.
What is the best way to select the most recent purchase for all customers in Hive or Impala SQL?
I've seen this query:
With recent as (
select customerID, max(purchaseDateTime) as dt
from purchases group by customerID
)
Select *
from customer c
join recent r
on c.customerID = r.customerID
join purchases p
on r.customerId = p.customerid and
p.purchaseDateTime = dt
Seems like that's not as efficient as it could be...
I would use row_number():
Select c.*, p.*
from customer c join
(select p.*,
row_number() over (partition by p.customerid order by p.purchaseDateTime desc) as seqnum
from purchases p
) p
on c.customerId = p.customerid
where seqnum = 1;
row_number() is part of the ANSI SQL standard. In general, it should be faster than doing an explicit group by and join.
One difference is that -- in the event of ties -- this returns one row. Your query will return multiple rows. If you want that behavior, change the row_number() to rank().
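The tie behaviour is easy to check with a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The purchaseID column and the sample rows are hypothetical, added only so the tied rows can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (
        purchaseID INTEGER PRIMARY KEY,   -- hypothetical column
        customerID INTEGER,
        purchaseDateTime TEXT
    );
    INSERT INTO purchases VALUES
        (1, 1, '2024-03-01 10:00'),
        (2, 1, '2024-03-01 10:00'),   -- ties with purchase 1
        (3, 1, '2024-01-01 09:00'),
        (4, 2, '2024-02-02 12:00');
""")

def latest_per_customer(ranking_function):
    """Return (customerID, purchaseID) rows ranked first by the given function."""
    sql = f"""
        SELECT customerID, purchaseID
        FROM (
            SELECT p.*,
                   {ranking_function}() OVER (PARTITION BY customerID
                                              ORDER BY purchaseDateTime DESC) AS seqnum
            FROM purchases p
        )
        WHERE seqnum = 1
        ORDER BY customerID, purchaseID
    """
    return conn.execute(sql).fetchall()

with_row_number = latest_per_customer("ROW_NUMBER")  # one row per customer
with_rank = latest_per_customer("RANK")              # keeps both tied rows

print(with_row_number)
print(with_rank)
```

With ROW_NUMBER, which of the two tied purchases is kept for customer 1 is not defined unless a tie-breaker is added to the ORDER BY; RANK returns both.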

Fetch most recent records as part of Joins

I am joining two tables, customer and profile, on the column cust_id. In the profile table, I have more than one entry. I want to select the most recent entry by start_ts (column) when joining the tables. As a result I would like one row: the row from customer and the most recent row from profile. Is there a way to do this in Oracle SQL?
I would use window functions:
select . . .
from customer c join
(select p.*,
row_number() over (partition by cust_id order by start_ts desc) as seqnum
from profile p
) p
on c.cust_id = p.cust_id and p.seqnum = 1;
You can use a left join if you like to get customers that don't have profiles as well.
One way (which works for all DB engines) is to join the tables you want to select data from and then join against the max record of profile for each customer to filter the data:
select c.*, p.*
from customer c
join profile p on c.cust_id = p.cust_id
join
(
select cust_id, max(start_ts) as maxts
from profile
group by cust_id
) p2 on p.cust_id = p2.cust_id and p.start_ts = p2.maxts
Here is another way (if there exists no newer entry then it's the newest):
select
c.*,
p.*
from
customer c inner join
profile p on p.cust_id = c.cust_id and not exists(
select *
from profile
where cust_id = c.cust_id and start_ts > p.start_ts
)
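A minimal sketch of the NOT EXISTS form, run through Python's sqlite3 module. The name and tier columns and the sample rows are hypothetical; the inner table gets the alias newer here only to make the comparison easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profile (cust_id INTEGER, start_ts TEXT, tier TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO profile VALUES
        (1, '2023-01-01', 'basic'), (1, '2024-06-01', 'pro'),
        (2, '2022-05-05', 'basic');
""")

# A profile row is the newest one if no other row for the same customer
# has a later start_ts.
rows = conn.execute("""
    SELECT c.cust_id, c.name, p.start_ts, p.tier
    FROM customer c
    INNER JOIN profile p
        ON p.cust_id = c.cust_id
       AND NOT EXISTS (
            SELECT 1
            FROM profile newer
            WHERE newer.cust_id = c.cust_id
              AND newer.start_ts > p.start_ts
       )
    ORDER BY c.cust_id
""").fetchall()

for row in rows:
    print(row)
```

If two profile rows for one customer share the same latest start_ts, both survive the NOT EXISTS test, so a tie-breaker would be needed in that case.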

SQL strategy to fetch maximum

Suppose I have these three tables:
I want to get, for all products, its product_id and the client that bought it the most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?
The inner query gives you the ranking. The outer query gives you the client that purchased a product the most.
SELECT *
FROM
(
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1
I was going to submit pretty much the same thing as @Squirrel, only with a Common Table Expression (CTE) rather than a derived table, so I won't duplicate that. There are some learning points concerning your query, though. First, implicit joins such as FROM Bill_Item, Bill can easily have unintended consequences (one of many questions on this: Queries that implicit SQL joins can't do?). Next, instead of the calculated column you can do this in an OUTER APPLY or CROSS APPLY, which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
To show how Squirrel's answer can still include products that have never been sold, all you need to do is start from Product and LEFT JOIN to the other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Useful techniques used in some of these queries:
CTE
APPLY (Outer & Cross)
Window Functions
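The CTE plus window function combination can be sketched with Python's sqlite3 module (assuming SQLite 3.25 or newer). The sample rows are invented, and this version deliberately splits counting and ranking into two CTEs, which is a simplification for readability rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE Bill (bill_id INTEGER PRIMARY KEY, client_id INTEGER);
    CREATE TABLE Bill_Item (bill_id INTEGER, product_id INTEGER);
    INSERT INTO Product VALUES (1), (2), (3);   -- product 3 is never sold
    INSERT INTO Bill VALUES (100, 7), (101, 7), (102, 8);
    INSERT INTO Bill_Item VALUES (100, 1), (101, 1), (102, 1), (102, 2);
""")

rows = conn.execute("""
    WITH counts AS (
        -- purchases per (product, client); unsold products keep a NULL client
        SELECT p.product_id, b.client_id, COUNT(b.client_id) AS purchases
        FROM Product p
        LEFT JOIN Bill_Item bi ON p.product_id = bi.product_id
        LEFT JOIN Bill b ON bi.bill_id = b.bill_id
        GROUP BY p.product_id, b.client_id
    ),
    ranked AS (
        SELECT product_id, client_id,
               ROW_NUMBER() OVER (PARTITION BY product_id
                                  ORDER BY purchases DESC) AS RowNumber
        FROM counts
    )
    SELECT product_id, client_id
    FROM ranked
    WHERE RowNumber = 1
    ORDER BY product_id
""").fetchall()

for row in rows:
    print(row)
```

Product 1 was bought twice by client 7 and once by client 8, so client 7 wins; product 3 has no sales and comes back with a NULL client because of the LEFT JOINs.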
Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE bi.product_id = p.product_id
GROUP BY client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.

SQL Statement Help - Select latest Order for each Customer

Say I have 2 tables: Customers and Orders. A Customer can have many Orders.
Now, I need to show each Customer with their latest Order. This means that if a Customer has more than one Order, only the Order with the latest Entry Time should be shown.
This is how far I managed on my own:
SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC
This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.
For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.
UPDATE: After going through a few answers here (there's a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.
UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.
I don't think you do want to use MAX() as you don't want to group the OrderID. What you need is an ordered sub query with a SELECT TOP 1.
select *
from Customers
inner join Orders
on Customers.CustomerID = Orders.CustomerID
and OrderID = (
SELECT TOP 1 subOrders.OrderID
FROM Orders subOrders
WHERE subOrders.CustomerID = Orders.CustomerID
ORDER BY subOrders.OrderDate DESC
)
Something like this should do it:
SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
) Y ON X.Customer = Y.Customer
This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.
Hope this helps!
UPDATE: This shows only customers with orders:
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
While I see that you've already accepted an answer, I think this one is a bit more intuitive:
select a.*
,b.Id
from customer a
inner join Order b
on b.CustomerID = a.Id
where b.EntryTime = ( select max(o.EntryTime)
from Order o
where o.CustomerId = a.Id
);
The subquery is correlated on o.CustomerId = a.Id because you want the max EntryTime of all of that customer's orders, not of the whole table.
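A short sketch in Python's sqlite3 module shows why the filter inside the MAX subquery has to compare the inner table's customer column with the outer customer. The inner alias o, the Name column, and the sample rows are assumptions for illustration; Order is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerID INTEGER, EntryTime TEXT);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO "Order" VALUES
        (10, 1, '2024-01-01'), (11, 1, '2024-04-01'), (12, 2, '2024-02-01');
""")

# Correlated: the max is computed per customer.
correlated = conn.execute("""
    SELECT a.Id, b.Id
    FROM Customer a
    INNER JOIN "Order" b ON b.CustomerID = a.Id
    WHERE b.EntryTime = (SELECT MAX(o.EntryTime)
                         FROM "Order" o
                         WHERE o.CustomerID = a.Id)
    ORDER BY a.Id
""").fetchall()

# Not correlated: the max is computed over the whole table, so only the
# customer who placed the globally latest order is returned.
uncorrelated = conn.execute("""
    SELECT a.Id, b.Id
    FROM Customer a
    INNER JOIN "Order" b ON b.CustomerID = a.Id
    WHERE b.EntryTime = (SELECT MAX(EntryTime) FROM "Order")
    ORDER BY a.Id
""").fetchall()

print(correlated)
print(uncorrelated)
```

The correlated form returns one order per customer, while the uncorrelated form silently loses customer 2.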
I would have to compare execution plans to be sure of the difference, but since TOP is applied after the fact and ORDER BY can be expensive, I believe that using max(EntryTime) would be the best way to run this.
You can use a window function.
SELECT *
FROM (SELECT a.*, b.*,
ROW_NUMBER () OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
b.ID DESC) rn
FROM customer a, ORDER b
WHERE a.ID = b.custid)
WHERE rn = 1
For each customer (a.id) it sorts all orders and discards everything but the latest.
ORDER BY clause includes both order date and entry id, in case there are multiple orders on the same date.
Generally, window functions tend to be faster than look-ups using MAX() on a large number of records, though this depends on the database and the available indexes.
This query is much faster than the accepted answer:
SELECT c.id as customer_id,
(SELECT co.id FROM customer_order co WHERE
co.customer_id=c.id
ORDER BY some_date_column DESC limit 1) as last_order_id
FROM customer c;
SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID =
(SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)
Something like:
SELECT
a.*
FROM
Customer a
INNER JOIN Order b
ON b.CustomerID = a.Id
INNER JOIN (SELECT CustomerID, max(EntryTime) as EntryTime FROM Order GROUP BY CustomerID) met
ON
b.EntryTime = met.EntryTime and b.CustomerID = met.CustomerID
One approach that I haven't seen above yet:
SELECT
C.*,
O1.ID
FROM
dbo.Customers C
INNER JOIN dbo.Orders O1 ON
O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
O2.CustomerID = C.ID AND
O2.EntryTime > O1.EntryTime
WHERE
O2.ID IS NULL
This (as well as the other solutions, I believe) assumes that no two orders for the same customer can have the exact same entry time. If they can, you have to decide what determines which one is the "latest". If that's a concern, post a comment and I can expand the query to account for it.
The general approach of the query is to find the order for a customer where there is not another order for the same customer with a later date. It is then the latest order by definition. This approach often gives better performance than the use of derived tables or subqueries.
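The anti-join pattern, including what happens on a tie, can be sketched with Python's sqlite3 module. The sample rows are invented, and breaking ties on the higher order ID is only one possible choice, shown here as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER, EntryTime TEXT);
    INSERT INTO Customers VALUES (1), (2);
    INSERT INTO Orders VALUES
        (10, 1, '2024-01-01'), (11, 1, '2024-04-01'),
        (20, 2, '2024-03-03'), (21, 2, '2024-03-03');   -- same entry time
""")

# O2 looks for a "later" order; rows where none exists (O2.ID IS NULL)
# are the latest ones. The {later} placeholder holds the comparison.
ANTI_JOIN = """
    SELECT C.ID, O1.ID
    FROM Customers C
    INNER JOIN Orders O1 ON O1.CustomerID = C.ID
    LEFT OUTER JOIN Orders O2
        ON O2.CustomerID = C.ID
       AND {later}
    WHERE O2.ID IS NULL
    ORDER BY C.ID, O1.ID
"""

# Comparing on EntryTime alone keeps both tied orders for customer 2.
by_time = conn.execute(
    ANTI_JOIN.format(later="O2.EntryTime > O1.EntryTime")
).fetchall()

# Assumed tie-breaker: among equal entry times, the higher ID counts as later.
tie_broken = conn.execute(
    ANTI_JOIN.format(
        later="(O2.EntryTime > O1.EntryTime"
              " OR (O2.EntryTime = O1.EntryTime AND O2.ID > O1.ID))"
    )
).fetchall()

print(by_time)
print(tie_broken)
```

With the plain comparison customer 2 appears twice; with the tie-breaker each customer appears exactly once.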
A simple max and "group by" is sufficient.
select c.customer_id, max(o.order_date)
from customers c
inner join orders o on o.customer_id = c.customer_id
group by c.customer_id;
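No subselect is needed, which avoids the overhead it would add. Note that this returns only the latest order date per customer, not the other columns of that order.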