All joined subquery results return null - sql

I am trying to get all customers with their latest payment transaction, including customers without any transaction:
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON c.customer_id = p.customer_id
The above query returns NULL for p.transaction_no, p.amount and p.transaciton_datetime in every row, but I am sure that there are transactions made by customers in tbl_payment_transactions.

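Your subquery is evaluated only once for the whole table, so LIMIT 1 keeps just the single newest transaction overall. At most one customer can match it, and every other customer gets NULLs. You want the subquery to run once for each row of the driving table tbl_customers. This is called a lateral subquery and takes the form: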
SELECT
c.customer_id, c.phone_number, c.email,
p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN LATERAL (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions t
WHERE c.customer_id = t.customer_id
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON true
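SQLite has no LATERAL, but the same per-customer evaluation can be sketched with a correlated subquery in the ON clause. The minimal sketch below uses an in-memory SQLite database with made-up rows (table and column names follow the question, trimmed to a few columns). It contrasts the question's uncorrelated LIMIT 1 subquery with a correlated stand-in for the lateral join; it is an emulation for illustration, not Postgres' LATERAL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_customers (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE tbl_payment_transactions (
    payment_transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    transaction_no TEXT,
    amount REAL
);
INSERT INTO tbl_customers VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
-- customer 3 has no transactions; transaction 4 is the newest overall
INSERT INTO tbl_payment_transactions VALUES
    (1, 1, 'T1', 10.0), (2, 1, 'T2', 20.0), (3, 2, 'T3', 30.0), (4, 2, 'T4', 40.0);
""")

# Question's form: the subquery runs once, so LIMIT 1 keeps only T4 for everyone.
broken = conn.execute("""
SELECT c.customer_id, p.transaction_no
FROM tbl_customers c
LEFT JOIN (SELECT customer_id, transaction_no
           FROM tbl_payment_transactions
           ORDER BY payment_transaction_id DESC
           LIMIT 1) p
  ON c.customer_id = p.customer_id
ORDER BY c.customer_id
""").fetchall()

# Lateral-style emulation: the correlated subquery is evaluated per customer row.
fixed = conn.execute("""
SELECT c.customer_id, t.transaction_no
FROM tbl_customers c
LEFT JOIN tbl_payment_transactions t
  ON t.payment_transaction_id = (
       SELECT t2.payment_transaction_id
       FROM tbl_payment_transactions t2
       WHERE t2.customer_id = c.customer_id
       ORDER BY t2.payment_transaction_id DESC
       LIMIT 1)
ORDER BY c.customer_id
""").fetchall()

print(broken)  # [(1, None), (2, 'T4'), (3, None)]
print(fixed)   # [(1, 'T2'), (2, 'T4'), (3, None)]
```

Only the owner of the single newest transaction can match in the first query; the correlated version finds each customer's own latest row and still keeps customer 3 with NULLs.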

The Impaler provided the correct form with a LATERAL subquery.
Alternatively, you can use DISTINCT ON in a subquery with a plain LEFT JOIN.
This can perform better when you retrieve all (or most) customers, when there are only a few transactions per customer, and/or when you don't have a multicolumn index on (customer_id, payment_transaction_id) or (customer_id, payment_transaction_id DESC):
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY customer_id, payment_transaction_id DESC
) p USING (customer_id);
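DISTINCT ON keeps the first row of each set of rows that share the DISTINCT ON expressions, where "first" is decided by the ORDER BY. The plain-Python sketch below mimics that rule on a few hypothetical transaction rows; it illustrates the semantics only, not how Postgres executes the query.

```python
from itertools import groupby
from operator import itemgetter

# (payment_transaction_id, customer_id, transaction_no): hypothetical sample rows
transactions = [
    (1, 1, "T1"),
    (2, 1, "T2"),
    (3, 2, "T3"),
    (4, 2, "T4"),
    (5, 1, "T5"),
]

def distinct_on_latest(rows):
    """Mimic DISTINCT ON (customer_id) ... ORDER BY customer_id, payment_transaction_id DESC."""
    # ORDER BY customer_id, payment_transaction_id DESC
    ordered = sorted(rows, key=lambda r: (r[1], -r[0]))
    # DISTINCT ON (customer_id): keep only the first row of each customer_id group
    return [next(group) for _, group in groupby(ordered, key=itemgetter(1))]

latest = distinct_on_latest(transactions)
print(latest)  # [(5, 1, 'T5'), (4, 2, 'T4')]
```

This is why the ORDER BY must start with customer_id: the groups have to be contiguous, and the DESC sort key decides which row of each group comes first.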
For the performance aspects, see:
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group?

Related

Calculating the average of order value without using a WITH statement

I am trying to add a new column to my table which will be the average value calculated as the division of two existing columns. Therefore Average value = Total Sales / Number of Orders.
I don't understand why Example Code A does not work but Example Code B does. Please can someone explain?
Example Code A
%%sql
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
order_num / total_sales avg_order_value
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
Example Code B
%%sql
WITH
customer_sales AS
(
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC
)
SELECT
country,
customer_num,
order_num,
total_sales,
total_sales / order_num avg_order_value
FROM customer_sales;
Thank you!
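It depends on the DBMS. A few let you reference a column alias from another expression in the same SELECT list, but most (like standard SQL) do not, because all expressions in a SELECT list are logically evaluated together, so an alias does not exist yet while its neighbours are being computed. In those you must either compute the expression in an outer query (as the CTE in Example Code B does) or repeat the aggregate expressions themselves, such as the counts and sums. Note also that Example Code A divides order_num by total_sales; the average order value is total_sales / order_num.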
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
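-- repeat the aggregate expressions instead of the aliases; NULLIF avoids dividing by zero
ROUND(SUM(i.total),2) / NULLIF(count(i.invoice_id), 0) avg_order_value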
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
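A small SQLite sketch of the "repeat the aggregates" workaround, with made-up rows in tables shaped like the question's customer and invoice tables (only the columns the query needs). It divides sales by orders, and NULLIF guards against division by zero for a country whose customers have no invoices; SQLite itself would just return NULL there, but many other DBMSs raise an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile');
-- the only Chilean customer has no invoices
INSERT INTO invoice VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 30.0);
""")

rows = conn.execute("""
SELECT
    c.country,
    count(DISTINCT c.customer_id) customer_num,
    count(i.invoice_id) order_num,
    ROUND(SUM(i.total), 2) total_sales,
    -- repeat the aggregates instead of referring to the aliases above
    ROUND(SUM(i.total), 2) / NULLIF(count(i.invoice_id), 0) avg_order_value
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC
""").fetchall()

for row in rows:
    print(row)
# ('USA', 2, 3, 60.0, 20.0)
# ('Chile', 1, 0, None, None)
```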

How do I select just one row for each row in a LEFT JOIN?

So I have two Tables: Customers and Calls.
There is a one-to-many relationship between these tables, i.e. one Customer can have many Calls.
I am trying to create a left join so that I have an output where the Customers are listed only once, with the most recent CallDate from the Calls table.
I have constructed the following SQL statement:
Select Customers.*, Calls.CallDate
From Customers
Left Join Calls
on Customers.Id=Calls.CustomerId
But this gives me a separate Customer row for each Call
How do I get just one row for each Customer based on the most recent CallDate?
A simple way is to use Outer Apply:
Select c.*, ca.*
From Customers c outer apply
(select top 1 ca.*
from Calls ca
where c.id = ca.CustomerId
order by CallDate desc
) ca;
However, if you just want the most recent call date, then aggregation is the typical approach. One method:
select c.*, max_callDate
from customers c left join
(select CustomerId, max(CallDate) as max_callDate
from calls
group by CustomerId
) ca
on c.id = ca.CustomerId;
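A quick check of the aggregation approach, with SQLite standing in for SQL Server and hypothetical rows. Storing CallDate as ISO-8601 text is an assumption of this sketch; such strings sort chronologically, so MAX picks the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Calls (Id INTEGER PRIMARY KEY, CustomerId INTEGER, CallDate TEXT);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Cy has no calls
INSERT INTO Calls VALUES
    (1, 1, '2024-01-05'), (2, 1, '2024-03-01'), (3, 2, '2024-02-10');
""")

rows = conn.execute("""
SELECT c.Id, c.Name, ca.max_callDate
FROM Customers c
LEFT JOIN (SELECT CustomerId, MAX(CallDate) AS max_callDate
           FROM Calls
           GROUP BY CustomerId) ca
  ON c.Id = ca.CustomerId
ORDER BY c.Id
""").fetchall()
print(rows)  # [(1, 'Ann', '2024-03-01'), (2, 'Bob', '2024-02-10'), (3, 'Cy', None)]
```

Each customer appears once, and the LEFT JOIN keeps customers without calls with a NULL date.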
You can use ROW_NUMBER window function:
Select Customers.*, c.CallDate
From Customers
Left Join (
SELECT CustomerId, CallDate,
ROW_NUMBER() OVER (PARTITION BY CustomerId
ORDER BY CallDate DESC) AS rn
FROM Calls
) AS c on Customers.Id = c.CustomerId AND c.rn = 1
ROW_NUMBER with a PARTITION BY clause numbers the records within each CustomerId partition. Because of the ORDER BY CallDate DESC clause, number 1 is assigned to the record with the latest CallDate.
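Unlike MAX(CallDate), the ROW_NUMBER version can return other columns of the winning call. The sketch below runs it on SQLite (window functions need SQLite 3.25 or newer) with hypothetical data; the Notes column is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Calls (Id INTEGER PRIMARY KEY, CustomerId INTEGER, CallDate TEXT, Notes TEXT);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Calls VALUES
    (1, 1, '2024-01-05', 'intro'), (2, 1, '2024-03-01', 'follow-up'),
    (3, 2, '2024-02-10', 'support');
""")

rows = conn.execute("""
SELECT Customers.Id, Customers.Name, c.CallDate, c.Notes
FROM Customers
LEFT JOIN (
    SELECT CustomerId, CallDate, Notes,
           ROW_NUMBER() OVER (PARTITION BY CustomerId
                              ORDER BY CallDate DESC) AS rn
    FROM Calls
) AS c ON Customers.Id = c.CustomerId AND c.rn = 1  -- rn filter in ON keeps Cy
ORDER BY Customers.Id
""").fetchall()
print(rows)
# [(1, 'Ann', '2024-03-01', 'follow-up'), (2, 'Bob', '2024-02-10', 'support'), (3, 'Cy', None, None)]
```

Putting c.rn = 1 in the ON clause rather than in a WHERE clause is what keeps customers without calls in the result.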
You can use OUTER APPLY:
Select Customers.*, Calls.CallDate
From Customers
outer apply (select top 1 * from Calls c where Customers.Id=c.CustomerId order by c.CallDate desc ) as Calls
Since you only ever want one call per customer, you can use CROSS APPLY:
Select Customers.*, c.CallDate
From Customers
CROSS APPLY (SELECT TOP 1 * FROM Calls
WHERE Customers.Id=Calls.CustomerId ORDER BY CallDate DESC) c
CROSS APPLY drops customers that have no calls. If you still want those customers listed (as with an outer join), use OUTER APPLY instead of CROSS APPLY.
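The difference between the two APPLY forms can be sketched in plain Python: APPLY runs a table-valued expression once per left-hand row. CROSS APPLY drops left rows for which it returns nothing, while OUTER APPLY keeps them with None (NULL) columns. The data and the helper top1_call are hypothetical stand-ins for the Calls subquery.

```python
# Hypothetical data: customer ids, and (customer_id, call_date) pairs
customers = [1, 2, 3]
calls = [(1, "2024-01-05"), (1, "2024-03-01"), (2, "2024-02-10")]

def top1_call(customer_id):
    """Like SELECT TOP 1 CallDate FROM Calls WHERE CustomerId = ? ORDER BY CallDate DESC."""
    dates = sorted((d for cid, d in calls if cid == customer_id), reverse=True)
    return dates[:1]  # zero or one rows

def cross_apply(left_rows, fn):
    # Inner-join semantics: left rows with no result are dropped.
    return [(row, r) for row in left_rows for r in fn(row)]

def outer_apply(left_rows, fn):
    # Outer-join semantics: left rows with no result are kept, padded with None.
    out = []
    for row in left_rows:
        results = fn(row) or [None]
        out.extend((row, r) for r in results)
    return out

print(cross_apply(customers, top1_call))  # [(1, '2024-03-01'), (2, '2024-02-10')]
print(outer_apply(customers, top1_call))  # [(1, '2024-03-01'), (2, '2024-02-10'), (3, None)]
```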

Postgres - Get all clients and latest order

I have a feeling I'll feel stupid when this is answered. I have a table of clients and a table of orders. I want a query that gives me a list of all clients, and their last order info if there is one, sorted by client name.
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c LEFT JOIN(
SELECT client_id, max(order_time) AS order_time
FROM orders GROUP BY client_id
) AS o
ON(c.id = o.client_id)
ORDER BY UPPER(c.name);
My issue is I get the rows I want if I remove o.item_name but the query as written isn't valid because there's no way to get o.item_name without putting it in the GROUP BY. That, of course, causes it to return multiple rows per client. Hopefully my intent is clear.
You can do this using a window function:
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c
LEFT JOIN (
SELECT client_id,
item_name,
order_time,
row_number() over (partition by client_id order by order_time desc) as rn
FROM orders
) AS o ON c.id = o.client_id and o.rn = 1
ORDER BY UPPER(c.name);
Another option is Postgres' DISTINCT ON clause, which is often faster than a solution using window functions:
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c
LEFT JOIN (
SELECT distinct on (client_id) client_id,
item_name,
order_time
FROM orders
order by client_id, order_time desc
) AS o ON c.id = o.client_id
ORDER BY UPPER(c.name);
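In Postgres, you can also use DISTINCT ON directly on the join. The DISTINCT ON expressions must match the leftmost ORDER BY expressions, and c.id is included so that two clients with the same name are not collapsed into one row:
SELECT DISTINCT ON (UPPER(c.name), c.id) c.id, c.name, o.order_time, o.item_name
FROM clients c LEFT JOIN
orders o
ON c.id = o.client_id
ORDER BY UPPER(c.name), c.id, o.order_time DESC;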
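Why the per-client key matters: the SQLite sketch below uses ROW_NUMBER as a stand-in for DISTINCT ON (SQLite has no DISTINCT ON), with hypothetical data in which two different clients are both named "Ann". Keeping one row per name loses one of them; keeping one row per c.id does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, order_time TEXT, item_name TEXT);
INSERT INTO clients VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
-- Bob has no orders
INSERT INTO orders VALUES
    (1, 1, '2024-01-01', 'pen'), (2, 1, '2024-02-01', 'ink'), (3, 2, '2024-03-01', 'pad');
""")

# {key} is filled only from the fixed literals below, never from user input.
QUERY = """
SELECT id, name, order_time, item_name
FROM (SELECT c.id, c.name, o.order_time, o.item_name,
             ROW_NUMBER() OVER (PARTITION BY {key}
                                ORDER BY o.order_time DESC) AS rn
      FROM clients c LEFT JOIN orders o ON c.id = o.client_id) AS t
WHERE rn = 1
ORDER BY UPPER(name), id
"""

per_client = conn.execute(QUERY.format(key="c.id")).fetchall()
per_name = conn.execute(QUERY.format(key="c.name")).fetchall()
print(per_client)  # [(1, 'Ann', '2024-02-01', 'ink'), (2, 'Ann', '2024-03-01', 'pad'), (3, 'Bob', None, None)]
print(per_name)    # [(2, 'Ann', '2024-03-01', 'pad'), (3, 'Bob', None, None)]
```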

Use COUNT() in a SQL Server query independent of the WHERE clause

I am trying to use the COUNT function in SQL Server to count the number of orders per customer, independent of the condition at the end.
select u.FullName, u.Id, o.FullAddress,
Price, Payment, o.Created, StartDelivery as Delivering,
[Status], o.Id,
(select COUNT(Orders.Id)
from Orders
full outer join Users
on Users.Id = Orders.CustomerId) as CountOfOrders
from orders o
full outer join users u
on o.CustomerId = u.Id
where [Status] = 0 and Payment = 1;
With this query I get what appears to be the total number of orders, and the same number is in every row. I would like it grouped by customer, but not filtered by the condition at the end. I need the total number of orders per customer.
And with this query:
select u.FullName, u.Id, o.FullAddress,
Price, Payment, o.Created, StartDelivery as Delivering,
[Status], o.Id, COUNT(*) OVER (PARTITION BY o.CustomerId) AS CountOfOrders
from orders o
full outer join users u
on o.CustomerId = u.Id
where [Status] = 2;
The count of orders is grouped, but it is filtered by the condition at the end. I need the total number of orders per customer regardless of what the condition at the end is. Apologies for repeating myself, I just want to make sure I am clear :)
Thanks a lot for any input on this and for your time!
Window functions are evaluated after the WHERE clause, so compute the count in a subquery and apply the filter outside it:
select ou.*
from (select u.FullName, u.Id, o.FullAddress,
Price, Payment, o.Created, StartDelivery as Delivering,
[Status], o.Id AS OrderId, -- a derived table cannot expose two columns named Id
COUNT(*) OVER (PARTITION BY o.CustomerId) AS CountOfOrders
from orders o full outer join
users u
on o.CustomerId = u.Id
) ou
where [Status] = 2
By the way, you definitely do not need a full outer join. If your tables are set up with proper foreign key relationships, then an inner join should suffice. Do you really have orders where the CustomerId field is not a valid value in users?
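A small SQLite sketch of the evaluation order, with hypothetical rows and an inner join (as suggested above) in place of the full outer join. The first query counts only the rows that survive WHERE; the second counts all of a customer's orders before filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (Id INTEGER PRIMARY KEY, FullName TEXT);
CREATE TABLE orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Status INTEGER);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
-- Ann has 3 orders, only one with Status = 2; Bob has 1 order with Status = 2
INSERT INTO orders VALUES (10, 1, 0), (11, 1, 2), (12, 1, 1), (13, 2, 2);
""")

# Window function in the same SELECT as the WHERE: counts only the filtered rows.
filtered = conn.execute("""
SELECT o.Id, u.FullName, COUNT(*) OVER (PARTITION BY o.CustomerId) AS CountOfOrders
FROM orders o JOIN users u ON o.CustomerId = u.Id
WHERE o.Status = 2
ORDER BY o.Id
""").fetchall()

# Count in a subquery first, then filter: counts all of the customer's orders.
unfiltered = conn.execute("""
SELECT OrderId, FullName, CountOfOrders
FROM (SELECT o.Id AS OrderId, u.FullName, o.Status,
             COUNT(*) OVER (PARTITION BY o.CustomerId) AS CountOfOrders
      FROM orders o JOIN users u ON o.CustomerId = u.Id) AS ou
WHERE Status = 2
ORDER BY OrderId
""").fetchall()

print(filtered)    # [(11, 'Ann', 1), (13, 'Bob', 1)]
print(unfiltered)  # [(11, 'Ann', 3), (13, 'Bob', 1)]
```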

SQL Statement Help - Select latest Order for each Customer

Say I have 2 tables: Customers and Orders. A Customer can have many Orders.
Now, I need to show each Customer with his latest Order. This means that if a Customer has more than one Order, only the Order with the latest Entry Time should be shown.
This is how far I managed on my own:
SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC
This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.
For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.
UPDATE: After going through a few answers here (there's a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.
UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.
I don't think you want MAX() here, since you don't want to group by OrderID. What you need is an ordered subquery with SELECT TOP 1:
select *
from Customers
inner join Orders
on Customers.CustomerID = Orders.CustomerID
and OrderID = (
SELECT TOP 1 subOrders.OrderID
FROM Orders subOrders
WHERE subOrders.CustomerID = Orders.CustomerID
ORDER BY subOrders.OrderDate DESC
)
Something like this should do it:
SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
) Y ON X.Customer = Y.Customer
This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.
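A quick SQLite check of the tie-breaking, with hypothetical data in which two of Ann's orders share the newest EntryTime. The table is named Orders here because ORDER is a reserved word in SQL; that renaming is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Customer INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer INTEGER, EntryTime TEXT);
INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Ann's two newest orders share the same EntryTime; Cy has no orders
INSERT INTO Orders VALUES
    (1, 1, '2024-01-01 09:00'), (2, 1, '2024-02-01 09:00'), (3, 1, '2024-02-01 09:00'),
    (4, 2, '2024-01-15 12:00');
""")

rows = conn.execute("""
SELECT X.Customer, X.Name, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
    SELECT A.Customer, MAX(A.OrderID) LatestOrderId
    FROM Orders A
    JOIN (SELECT Customer, MAX(EntryTime) MaxEntryTime
          FROM Orders GROUP BY Customer) B
      ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
    GROUP BY A.Customer  -- MAX(OrderID) breaks the EntryTime tie
) Y ON X.Customer = Y.Customer
ORDER BY X.Customer
""").fetchall()
print(rows)  # [(1, 'Ann', 3), (2, 'Bob', 4), (3, 'Cy', None)]
```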
Hope this helps!
--
UPDATE :-) This shows only customers with orders:
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
While I see that you've already accepted an answer, I think this one is a bit more intuitive:
select a.*
,b.Id
from customer a
inner join Order b
on b.CustomerID = a.Id
where b.EntryTime = ( select max(EntryTime)
from Order o2
where o2.CustomerID = a.Id
);
The subquery is correlated on o2.CustomerID = a.Id, so it returns the max EntryTime among the orders of that customer (a.Id) only.
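The correlation condition decides which rows MAX sees. The SQLite sketch below (hypothetical data; table renamed Orders because ORDER is reserved) runs the correlated-MAX query twice: once correlated on the subquery's own rows, and once with a condition that only mentions outer columns, which is true for every joined row and therefore takes the MAX over all orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerID INTEGER, EntryTime TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES
    (10, 1, '2024-01-01'), (11, 1, '2024-03-01'), (12, 2, '2024-02-01');
""")

def latest(correlation):
    # correlation is the subquery's WHERE condition (fixed literals below only)
    return conn.execute(f"""
        SELECT a.Id, b.Id
        FROM customer a
        INNER JOIN Orders b ON b.CustomerID = a.Id
        WHERE b.EntryTime = (SELECT MAX(EntryTime) FROM Orders o2 WHERE {correlation})
        ORDER BY a.Id
    """).fetchall()

per_customer = latest("o2.CustomerID = a.Id")  # MAX per customer
global_only = latest("a.Id = b.CustomerID")    # always true: MAX over all orders
print(per_customer)  # [(1, 11), (2, 12)]
print(global_only)   # [(1, 11)]
```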
I would have to compare the execution plans to be sure, but since TOP is applied only after the rows are sorted and ORDER BY can be expensive, I believe using max(EntryTime) is the better way to run this.
You can use a window function.
SELECT *
FROM (SELECT a.*, b.ID AS order_id, b.orderdate,
ROW_NUMBER () OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
b.ID DESC) rn
FROM customer a, ORDER b
WHERE a.ID = b.custid) t
WHERE rn = 1
For each customer (a.id) it sorts all orders and discards everything but the latest.
The ORDER BY clause includes both the order date and the order ID, in case there are multiple orders on the same date.
On large numbers of records, window functions are often faster than correlated look-ups using MAX().
This query can be much faster than the accepted answer:
SELECT c.id as customer_id,
(SELECT co.id FROM customer_order co WHERE
co.customer_id=c.id
ORDER BY some_date_column DESC limit 1) as last_order_id
FROM customer c
SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID =
(SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)
Something like:
SELECT
a.*, b.Id
FROM
Customer a
INNER JOIN Order b
ON b.CustomerID = a.Id
INNER JOIN (SELECT CustomerID, max(EntryTime) as EntryTime FROM Order GROUP BY CustomerID) met
ON
b.EntryTime = met.EntryTime and b.CustomerID = met.CustomerID
One approach that I haven't seen above yet:
SELECT
C.*,
O1.ID
FROM
dbo.Customers C
INNER JOIN dbo.Orders O1 ON
O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
O2.CustomerID = C.ID AND
O2.EntryTime > O1.EntryTime
WHERE
O2.ID IS NULL
This (like the other solutions, I believe) assumes that no two orders for the same customer have exactly the same entry time. If that's a concern, you would have to decide what determines which one is the "latest"; post a comment and I can expand the query to account for that.
The general approach of the query is to find the order of a customer for which no other order of the same customer has a later entry time. By definition, that is the latest order. This approach often performs better than derived tables or subqueries.
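The anti-join can be checked on SQLite with hypothetical data in which two of Bob's orders share an EntryTime. The first query is the one above; the second adds one possible tie-breaker (the higher ID wins), which is an illustration of how the query could be expanded rather than the author's own version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER, EntryTime TEXT);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Bob's two orders share an EntryTime; Cy has none
INSERT INTO Orders VALUES
    (1, 1, '2024-01-01'), (2, 1, '2024-02-01'),
    (3, 2, '2024-03-01'), (4, 2, '2024-03-01');
""")

def latest_orders(break_ties):
    later = "O2.EntryTime > O1.EntryTime"
    if break_ties:
        # hypothetical tie-breaker: among equal EntryTimes, the higher ID counts as later
        later = f"({later} OR (O2.EntryTime = O1.EntryTime AND O2.ID > O1.ID))"
    return conn.execute(f"""
        SELECT C.ID, O1.ID
        FROM Customers C
        INNER JOIN Orders O1 ON O1.CustomerID = C.ID
        LEFT OUTER JOIN Orders O2 ON O2.CustomerID = C.ID AND {later}
        WHERE O2.ID IS NULL  -- no later order exists, so O1 is the latest
        ORDER BY C.ID, O1.ID
    """).fetchall()

with_ties = latest_orders(break_ties=False)
tie_broken = latest_orders(break_ties=True)
print(with_ties)   # [(1, 2), (2, 3), (2, 4)]
print(tie_broken)  # [(1, 2), (2, 4)]
```

Without a tie-breaker, Bob appears twice; Cy is absent in both cases because of the INNER JOIN.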
A simple max and "group by" is sufficient.
select c.customer_id, max(o.order_date)
from customers c
inner join orders o on o.customer_id = c.customer_id
group by c.customer_id;
No subselect is needed, which could slow things down. Note that this returns only the latest order date, not the other columns of that order.