I have a feeling I'll feel stupid when this is answered. I have a table of clients and a table of orders. I want a query that gives me a list of all clients, and their last order info if there is one, sorted by client name.
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c LEFT JOIN(
SELECT client_id, max(order_time) AS order_time
FROM orders GROUP BY client_id
) AS o
ON(c.id = o.client_id)
ORDER BY UPPER(c.name)
My issue is I get the rows I want if I remove o.item_name but the query as written isn't valid because there's no way to get o.item_name without putting it in the GROUP BY. That, of course, causes it to return multiple rows per client. Hopefully my intent is clear.
You can do this using a window function:
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c
LEFT JOIN (
SELECT client_id,
item_name,
order_time,
row_number() over (partition by client_id order by order_time desc) as rn
FROM orders
) AS o ON c.id = o.client_id and o.rn = 1
ORDER BY UPPER(c.name);
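To check the row_number() query above on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later); the sample rows are made up for illustration, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER,
                     order_time TEXT, item_name TEXT);
INSERT INTO clients VALUES (1, 'bob'), (2, 'Alice'), (3, 'carol');
INSERT INTO orders VALUES
  (1, 1, '2020-01-01 10:00', 'pen'),
  (2, 1, '2020-02-01 10:00', 'ink'),
  (3, 2, '2020-01-15 09:00', 'paper');
""")

# Number each client's orders from newest to oldest and keep only rn = 1.
# The LEFT JOIN keeps clients that have no orders at all (carol).
latest_orders = conn.execute("""
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c
LEFT JOIN (
  SELECT client_id, item_name, order_time,
         row_number() OVER (PARTITION BY client_id
                            ORDER BY order_time DESC) AS rn
  FROM orders
) AS o ON c.id = o.client_id AND o.rn = 1
ORDER BY UPPER(c.name)
""").fetchall()

for row in latest_orders:
    print(row)
```

Each client appears exactly once, and the client without orders gets NULL (None in Python) for the order columns.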
Another option is Postgres' DISTINCT ON clause, which is usually faster than a solution using window functions:
SELECT c.id, c.name, o.order_time, o.item_name
FROM clients AS c
LEFT JOIN (
SELECT distinct on (client_id) client_id,
item_name,
order_time
FROM orders
order by client_id, order_time desc
) AS o ON c.id = o.client_id
ORDER BY UPPER(c.name);
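In Postgres, you can use DISTINCT ON directly on the join. The DISTINCT ON expressions must match the leading ORDER BY expressions, and including c.id keeps two clients with the same name apart:
SELECT DISTINCT ON (UPPER(c.name), c.id) c.id, c.name, o.order_time, o.item_name
FROM clients c LEFT JOIN
orders o
ON c.id = o.client_id
ORDER BY UPPER(c.name), c.id, o.order_time DESC;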
I am trying to get all customers with their latest payment transaction, including customers without any transaction:
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON c.customer_id = p.customer_id
The above query returns NULL for p.transaction_no, p.amount, and p.transaciton_datetime in every row, but I have confirmed that tbl_payment_transactions contains transactions made by these customers.
You want the subquery to run once for each row of the driving table tbl_customers. This is called a lateral subquery and takes the form:
SELECT
c.customer_id, c.phone_number, c.email,
p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN LATERAL (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions t
WHERE c.customer_id = t.customer_id
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON true
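The reason the original query comes back with NULLs can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module with made-up rows. SQLite has no LATERAL, so the per-customer evaluation is imitated here with a correlated scalar subquery, which is a swapped-in technique and not the LATERAL form itself.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_customers (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE tbl_payment_transactions (
  payment_transaction_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  amount REAL);
INSERT INTO tbl_customers VALUES
  (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO tbl_payment_transactions VALUES
  (10, 1, 5.0), (11, 2, 7.5), (12, 1, 9.0);
""")

# Uncorrelated derived table: LIMIT 1 keeps a single row for the whole
# table, so at most one customer can ever find a match.
uncorrelated = conn.execute("""
SELECT c.customer_id, p.amount
FROM tbl_customers c
LEFT JOIN (
  SELECT customer_id, amount
  FROM tbl_payment_transactions
  ORDER BY payment_transaction_id DESC
  LIMIT 1
) p ON c.customer_id = p.customer_id
ORDER BY c.customer_id
""").fetchall()

# Correlated subquery: evaluated once per customer row, which is the
# behaviour LATERAL provides for whole rows in Postgres.
per_customer = conn.execute("""
SELECT c.customer_id,
       (SELECT t.amount
        FROM tbl_payment_transactions t
        WHERE t.customer_id = c.customer_id
        ORDER BY t.payment_transaction_id DESC
        LIMIT 1) AS amount
FROM tbl_customers c
ORDER BY c.customer_id
""").fetchall()

print(uncorrelated)
print(per_customer)
```

In the first result only the customer who happens to own the newest transaction gets a value. In the second, every customer with transactions gets their own latest amount.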
The Impaler provided the correct form with a LATERAL subquery.
Alternatively, you can use DISTINCT ON in a subquery and a plain LEFT JOIN.
Performance of the latter can be better when retrieving all (or most) customers, and if there are only a few transactions per customer and/or you don't have a multicolumn index on (customer_id, payment_transaction_id) or (customer_id, payment_transaction_id DESC):
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY customer_id, payment_transaction_id DESC
) p USING (customer_id);
About performance aspects:
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group?
For example, let's say that I have a table of purchases, and I want to return a list of all purchases that were made on the same day as the 1st purchase, grouping by individual customers. I don't believe that I can use min(purchase_date), and group by customer, since that will return just one row. How would I go about doing this?
Here's an example. I believe this would return only 1 row, whereas I want to return all orders that fall on the initial purchase date.
select c.name, min(o.purchase_date)
from customers c
join orders o on c.id = o.customer_id
group by c.name
QUALIFY can come in handy, in dialects that support it (for example Snowflake, BigQuery, or Teradata):
select c.name, o.purchase_date
from customers c
join orders o on c.id = o.customer_id
qualify o.purchase_date = min(o.purchase_date) over (partition by c.name)
select c.name, t.purchase_date
from(
select o.customer_id, o.purchase_date,
rank()
over(partition by o.customer_id order by o.purchase_date) as date_rank
from orders o
)t
join customers c on c.id = t.customer_id
where t.date_rank = 1
Using QUALIFY and RANK:
SELECT *
FROM customers c
JOIN orders o
ON c.id = o.customer_id
QUALIFY RANK() OVER(PARTITION BY c.name ORDER BY o.purchase_date) = 1
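For engines without QUALIFY, the same filter can be written with RANK() in a derived table. The following is a minimal sketch run through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The sample rows are invented, and purchase_date is assumed to hold a plain date with no time part.

```python
import sqlite3

# Hypothetical sample data; purchase_date is assumed to be a date only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     purchase_date TEXT);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO orders VALUES
  (1, 1, '2021-03-01'), (2, 1, '2021-03-01'), (3, 1, '2021-04-10'),
  (4, 2, '2021-05-05'), (5, 2, '2021-06-01');
""")

# RANK() gives every order on a customer's earliest date the rank 1,
# so filtering on it keeps all first-day orders, not just one.
first_day_orders = conn.execute("""
SELECT c.name, t.id, t.purchase_date
FROM (
  SELECT o.*,
         rank() OVER (PARTITION BY o.customer_id
                      ORDER BY o.purchase_date) AS date_rank
  FROM orders o
) t
JOIN customers c ON c.id = t.customer_id
WHERE t.date_rank = 1
ORDER BY c.name, t.id
""").fetchall()

for row in first_day_orders:
    print(row)
```

Ann placed two orders on her first purchase date, so both are returned. Ben has a single first-day order.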
Imagine we have two tables: customers and purchases.
Purchases has a customerID, purchaseDateTime, etc.
What is the best way to select the most recent purchase for all customers in hive or impala SQL?
I've seen this query:
With recent as (
select customerID, max(purchaseDateTime) as dt
from purchases group by customerID
)
Select *
from customer c
join recent r
on c.customerID = r.customerID
join purchases p
on r.customerId = p.customerid and
p.purchaseDateTime = dt
Seems like that's not as efficient as it could be...
I would use row_number():
Select c.*, p.*
from customer c join
(select p.*,
row_number() over (partition by p.customerid order by p.purchaseDateTime desc) as seqnum
from purchases p
) p
on c.customerId = p.customerid
where seqnum = 1;
row_number() is part of the ANSI SQL standard. In general, it should be faster than doing an explicit group by and join.
One difference is that, in the event of ties, this returns one row per customer, whereas your query returns multiple rows. If you want that behavior, change row_number() to rank().
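The difference on ties is easy to see with a small experiment. This sketch uses Python's sqlite3 module in place of Hive or Impala (an assumption made purely so the example is runnable) and counts how many rows per customer survive the seqnum = 1 filter for each function.

```python
import sqlite3

# Hypothetical sample data: customer 1 has two purchases with the same
# latest timestamp, customer 2 has a single purchase.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (id INTEGER PRIMARY KEY, customerID INTEGER,
                        purchaseDateTime TEXT);
INSERT INTO purchases VALUES
  (1, 1, '2022-01-01 08:00'),
  (2, 1, '2022-01-05 12:00'),
  (3, 1, '2022-01-05 12:00'),
  (4, 2, '2022-02-01 09:00');
""")

def rows_kept(ranking_function):
    """Count the rows per customer that get seqnum = 1."""
    # ranking_function is only ever one of the two fixed names below.
    sql = f"""
    SELECT customerID, COUNT(*)
    FROM (
      SELECT p.*,
             {ranking_function}() OVER (PARTITION BY p.customerID
                                        ORDER BY p.purchaseDateTime DESC) AS seqnum
      FROM purchases p
    ) ranked
    WHERE seqnum = 1
    GROUP BY customerID
    ORDER BY customerID
    """
    return conn.execute(sql).fetchall()

with_row_number = rows_kept("row_number")
with_rank = rows_kept("rank")
print(with_row_number)
print(with_rank)
```

With row_number() each customer keeps exactly one row, and which of the tied rows wins is arbitrary unless the ORDER BY includes a tie-breaker. With rank() both tied rows are kept.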
There are two tables:
Clients (id, name)
Order (id, id_client, name), where id_client - foreign key.
Write a query that selects the identifier and name of the first table and the number of records in the second table, associated with them. The result should be sorted by surname in descending order.
I've tried
SELECT
Clients.id, Clients.name, count(id)
FROM clients
INNER JOIN Order ON Clients.id = Order.id_client
GROUP BY
Clients.id, Clients.name
ORDER BY
Clients.name DESC
But it doesn't work. What is wrong?
SELECT
c.ID,
c.Name,
COUNT(o.ID)
FROM
Clients c
LEFT JOIN [Order] o
ON
o.id_client = c.id
GROUP BY
c.ID,
c.Name
ORDER BY
c.Name DESC
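What the LEFT JOIN in the query above changes can be checked on sample data. The sketch below uses Python's sqlite3 module with invented rows. Since "Order" is a reserved word, it is written with standard double quotes here where the query above uses SQL Server's square brackets.

```python
import sqlite3

# Hypothetical sample data; client 2 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "Order" (id INTEGER PRIMARY KEY, id_client INTEGER, name TEXT);
INSERT INTO Clients VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
INSERT INTO "Order" VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 3, 'z');
""")

# The same query is run twice, changing only the join type.
query = """
SELECT c.id, c.name, COUNT(o.id)
FROM Clients c
{join} JOIN "Order" o ON o.id_client = c.id
GROUP BY c.id, c.name
ORDER BY c.name DESC
"""

left_counts = conn.execute(query.format(join="LEFT")).fetchall()
inner_counts = conn.execute(query.format(join="INNER")).fetchall()
print(left_counts)
print(inner_counts)
```

With the LEFT JOIN, the client without orders is listed with a count of 0, because COUNT(o.id) ignores the NULL produced by the outer join. With the INNER JOIN that client disappears from the result.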
SELECT Clients.id, Clients.name, count(Clients.id) FROM clients INNER JOIN Order on Clients.id=Order.id_client GROUP BY Clients.id, Clients.name ORDER BY Clients.name DESC
Change count(id) to count(Clients.id) or count(Order.id).
Both tables have an id column, so an unqualified count(id) is ambiguous. I don't know which table you want to count from, but I hope this shows where the issue is.
SELECT
c.ID,
c.Name,
COUNT(o.ID)
FROM
Clients c,
Order o
WHERE o.id_client = c.id
GROUP BY
c.ID,
c.Name
I'm trying to do something like:
SELECT c.id, c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
However, SQL will not allow the COUNT function. The error given at execution is that c.Id is not valid in the select list because it isn't in the group by clause or isn't aggregated.
I think I know the problem, COUNT just counts all the rows in the orders table. How can I make a count for each customer?
EDIT
Full query, but it's in dutch... This is what I tried:
select k.ID,
Naam,
Voornaam,
Adres,
Postcode,
Gemeente,
Land,
Emailadres,
Telefoonnummer,
count(*) over (partition by k.id) as 'Aantal bestellingen',
Kredietbedrag,
Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
from klanten k
join bestellingen on bestellingen.klantId = k.id
No errors, but no results either.
When using an aggregate function like that, you need to group by any columns that aren't aggregates:
SELECT c.id, c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
GROUP BY c.id, c.name
If you really want to be able to select all of the columns in Customers without specifying the names (please read this blog post in full for reasons to avoid this, and easy workarounds), then you can do this lazy shorthand instead:
;WITH o AS
(
SELECT CustomerID, OrderCount = COUNT(*)
FROM dbo.Orders GROUP BY CustomerID
)
SELECT c.*, o.OrderCount
FROM dbo.Customers AS c
INNER JOIN o
ON c.id = o.CustomerID;
EDIT for your real query
SELECT
k.ID,
k.Naam,
k.Voornaam,
k.Adres,
k.Postcode,
k.Gemeente,
k.Land,
k.Emailadres,
k.Telefoonnummer,
[Aantal bestellingen] = o.klantCount,
k.Kredietbedrag,
k.Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
FROM klanten AS k
INNER JOIN
(
SELECT klantId, klantCount = COUNT(*)
FROM dbo.bestellingen
GROUP BY klantId
) AS o
ON k.id = o.klantId;
I think this solution is much cleaner than grouping by all of the columns. Grouping on the orders table first and then joining once to each customer row is likely to be much more efficient than joining first and then grouping.
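As a quick sanity check that both shapes return the same counts, here is a minimal sketch using Python's sqlite3 module. The tables, the extra city column, and the rows are hypothetical, and standard AS aliases replace the SQL Server alias = expression syntax. It only compares results and says nothing about relative speed.

```python
import sqlite3

# Hypothetical sample data with one extra customer column (city).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
INSERT INTO customers VALUES (1, 'Ann', 'Ghent'), (2, 'Ben', 'Bruges');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Join first, then group: every selected customer column must be repeated
# in the GROUP BY.
grouped_after_join = conn.execute("""
SELECT c.id, c.name, c.city, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
GROUP BY c.id, c.name, c.city
ORDER BY c.id
""").fetchall()

# Group the orders first, then join once per customer: the outer query
# needs no GROUP BY at all.
grouped_before_join = conn.execute("""
SELECT c.*, o.order_count
FROM customers c
JOIN (
  SELECT customerId, COUNT(*) AS order_count
  FROM orders
  GROUP BY customerId
) o ON c.id = o.customerId
ORDER BY c.id
""").fetchall()

print(grouped_after_join)
print(grouped_before_join)
```

The second form keeps the customer column list in one place, which is the readability gain described above.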
The following will count the orders per customer without the need to group the overall query by customer.id. But this also means that for customers with more than one order, that count will be repeated for each order.
SELECT c.id, c.name, COUNT(o.id) over (partition by c.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
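The repetition can be seen on a few rows. This sketch runs the windowed count through Python's sqlite3 module (assuming SQLite 3.25 or later) on invented data, and then adds DISTINCT as one possible way to collapse the duplicates. The DISTINCT variant is an illustration and was not part of the answer above.

```python
import sqlite3

# Hypothetical sample data: Ann has three orders, Ben has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Window count: one output row per joined order row.
windowed = conn.execute("""
SELECT c.id, c.name, COUNT(o.id) OVER (PARTITION BY c.id) AS order_count
FROM customers c
JOIN orders o ON o.customerId = c.id
ORDER BY c.id
""").fetchall()

# DISTINCT is applied after the window function, so it removes the
# repeated (id, name, count) rows.
deduplicated = conn.execute("""
SELECT DISTINCT c.id, c.name,
       COUNT(o.id) OVER (PARTITION BY c.id) AS order_count
FROM customers c
JOIN orders o ON o.customerId = c.id
ORDER BY c.id
""").fetchall()

print(windowed)
print(deduplicated)
```

Ann's count of 3 appears three times in the first result and once in the second.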