I have the following database structure:
And I have this task: select the buyer's name, the order Id and the book's name for orders that contain fewer than 3 books. I solved it like this:
SELECT b.Name, O.OrderId, bk.Name
FROM Orders O
JOIN Buyers b ON b.Id = O.BuyerId
JOIN BooksInOrder bo ON bo.OrderId = O.OrderId
JOIN Books bk ON bk.Id = bo.BookId
WHERE O.OrderId IN
(
SELECT OrderId
FROM BooksInOrder
GROUP BY ORDERID
HAVING COUNT(*) < 3
)
Is my SQL the most optimal way to perform what I am trying to achieve?
In general, the only way to understand performance is to run queries on your system with your data. That said, some things reliably hurt performance (such as an unnecessary SELECT DISTINCT, or UNION where UNION ALL would do). Your query doesn't have those, however.
One obvious simplification is to move the where clause to the from clause:
SELECT b.Name, O.OrderId, bk.Name
FROM (SELECT OrderId
      FROM BooksInOrder
      GROUP BY OrderId
      HAVING COUNT(*) < 3
     ) small_orders JOIN
     Orders O
     ON O.OrderId = small_orders.OrderId JOIN
     Buyers b
     ON b.Id = O.BuyerId JOIN
     BooksInOrder bo
     ON bo.OrderId = O.OrderId JOIN
     Books bk
     ON bk.Id = bo.BookId;
Do note that although this might look simpler, the performance may not be better.
Similarly, you could use window functions:
SELECT Name, OrderId, BookName
FROM (SELECT b.Name, O.OrderId, bk.Name as BookName,
             count(*) over (partition by O.OrderId) as cnt
      FROM Orders O JOIN
           Buyers b
           ON b.Id = O.BuyerId JOIN
           BooksInOrder bo
           ON bo.OrderId = O.OrderId JOIN
           Books bk
           ON bk.Id = bo.BookId
     ) bb
WHERE cnt < 3;
But your original formulation may still work best.
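To check that the formulations agree, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The sample rows are hypothetical and the columns are inferred from the queries above, since the original diagram is missing; the window version needs SQLite 3.25 or later. Note that the window count counts joined rows, so it equals the number of books only while each join matches at most one row.

```python
import sqlite3

# Hypothetical sample data; columns are inferred from the queries in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Buyers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Books (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, BuyerId INTEGER);
CREATE TABLE BooksInOrder (OrderId INTEGER, BookId INTEGER);
INSERT INTO Buyers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Books VALUES (1, 'Dune'), (2, 'Emma'), (3, 'Ulysses');
INSERT INTO Orders VALUES (10, 1), (20, 2);
-- Order 10 holds 2 books (qualifies); order 20 holds 3 books (does not).
INSERT INTO BooksInOrder VALUES (10, 1), (10, 2), (20, 1), (20, 2), (20, 3);
""")

# Original formulation: filter orders with an IN subquery.
in_subquery = conn.execute("""
SELECT b.Name, O.OrderId, bk.Name
FROM Orders O
JOIN Buyers b ON b.Id = O.BuyerId
JOIN BooksInOrder bo ON bo.OrderId = O.OrderId
JOIN Books bk ON bk.Id = bo.BookId
WHERE O.OrderId IN (SELECT OrderId FROM BooksInOrder
                    GROUP BY OrderId HAVING COUNT(*) < 3)
ORDER BY O.OrderId, bk.Name
""").fetchall()

# Window-function formulation: count joined rows per order, then filter.
windowed = conn.execute("""
SELECT Name, OrderId, BookName
FROM (SELECT b.Name, O.OrderId, bk.Name AS BookName,
             COUNT(*) OVER (PARTITION BY O.OrderId) AS cnt
      FROM Orders O
      JOIN Buyers b ON b.Id = O.BuyerId
      JOIN BooksInOrder bo ON bo.OrderId = O.OrderId
      JOIN Books bk ON bk.Id = bo.BookId) bb
WHERE cnt < 3
ORDER BY OrderId, BookName
""").fetchall()

print(in_subquery)
```

Both queries return only the rows of order 10, the order with two books.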
Related
Suppose I have these three tables:
I want to get, for every product, its product_id and the client that bought it the most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?
The inner query gives you the ranking; the outer query gives you the client that purchased each product the most.
SELECT *
FROM (
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1
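A minimal sketch of this ranking approach, run with Python's sqlite3 module against hypothetical sample data (window functions need SQLite 3.25+). Here the per-client count is computed in a nested derived table before ranking; SQL Server also accepts COUNT(*) directly inside the OVER clause, as in the answer.

```python
import sqlite3

# Hypothetical data: client 100 bought product 1 on two bills, client 200 bought
# products 1 and 2 on one bill, and product 3 was never sold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY);
CREATE TABLE Bill (bill_id INTEGER PRIMARY KEY, client_id INTEGER);
CREATE TABLE Bill_Item (bill_id INTEGER, product_id INTEGER);
INSERT INTO Product VALUES (1), (2), (3);
INSERT INTO Bill VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO Bill_Item VALUES (1, 1), (2, 1), (3, 1), (3, 2);
""")

# Count purchases per (product, client), rank clients within each product,
# then keep the top-ranked client.
top_clients = conn.execute("""
SELECT product_id, client_id
FROM (
    SELECT product_id, client_id,
           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY n DESC) AS r
    FROM (
        SELECT i.product_id, b.client_id, COUNT(*) AS n
        FROM Bill b
        INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
        GROUP BY i.product_id, b.client_id
    ) counts
) d
WHERE r = 1
ORDER BY product_id
""").fetchall()

print(top_clients)  # product 3 is absent because it was never sold
```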
I was going to submit pretty much the same thing as @Squirrel, only with a Common Table Expression (CTE) rather than a derived table, so I won't duplicate that, but there are some learning points concerning your query. First, implicit joins such as FROM Bill_Item, Bill make it really easy to get unintended consequences (see, among many questions, "Queries that implicit SQL joins can't do?"). Next, for the calculated column you can use an OUTER APPLY or CROSS APPLY, which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
And to show how Squirrel's answer can still include products that have never been sold: all you need to do is start from Product and LEFT JOIN the other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Useful techniques used in some of these queries:
CTE
APPLY (Outer & Cross)
Window Functions
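A sketch of the LEFT JOIN variant under the same assumptions (Python's sqlite3, SQLite 3.25+, hypothetical data). The count is again computed in a nested step before ranking; the unsold product comes back with a NULL client, which sqlite3 returns as None.

```python
import sqlite3

# Hypothetical data: product 3 has no Bill_Item rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY);
CREATE TABLE Bill (bill_id INTEGER PRIMARY KEY, client_id INTEGER);
CREATE TABLE Bill_Item (bill_id INTEGER, product_id INTEGER);
INSERT INTO Product VALUES (1), (2), (3);
INSERT INTO Bill VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO Bill_Item VALUES (1, 1), (2, 1), (3, 1), (3, 2);
""")

# Starting from Product and LEFT JOINing keeps unsold products (client_id NULL).
best_clients = conn.execute("""
WITH cte AS (
    SELECT product_id, client_id,
           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY n DESC) AS RowNumber
    FROM (
        SELECT p.product_id, b.client_id, COUNT(*) AS n
        FROM Product p
        LEFT JOIN Bill_Item bi ON p.product_id = bi.product_id
        LEFT JOIN Bill b ON bi.bill_id = b.bill_id
        GROUP BY p.product_id, b.client_id
    ) counts
)
SELECT product_id, client_id
FROM cte
WHERE RowNumber = 1
ORDER BY product_id
""").fetchall()

print(best_clients)
```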
Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE bi.product_id = p.product_id
GROUP BY b.client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.
I have tables for "branches", "orders" and "products". Each order and product is connected to a branch with branch_id. I need an SQL statement to get a list of all branches with a field for how many orders and a field for how many products.
This works:
SELECT b.*, COUNT(o.id) AS orderCount FROM branches b
LEFT JOIN orders o ON (o.branch_id = b.id) GROUP BY b.id
but it only gets the amount of orders, not products.
If I change it to add the number of products, the counts are wrong because it's getting the number of orders multiplied by the number of products.
How can I get the amount of both the orders and the products in the same SQL statement?
Something like this should work (on SQL Server at least; you didn't specify your engine).
SELECT
b.id
,COUNT(distinct o.id) AS orderCount
,COUNT(distinct p.id) AS productCount
FROM branches b
LEFT JOIN orders o
ON (o.branch_id = b.id)
left join products p
on p.branch_id = b.id
GROUP BY
b.id
Please try:
select
*,
(select COUNT(*) from Orders o where o.branch_id=b.id) OrderCount,
(select COUNT(*) from Products p where p.branch_id=b.id) ProductCount
From
branches b
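A small sketch comparing the naive double LEFT JOIN with both answers, using Python's sqlite3 module and hypothetical branch data. It shows the multiplication problem from the question, and that COUNT(DISTINCT ...) and correlated subqueries both give the intended per-branch counts.

```python
import sqlite3

# Hypothetical data: branch 1 has 3 orders and 2 products,
# branch 2 has 1 order and no products, branch 3 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branches (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, branch_id INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, branch_id INTEGER);
INSERT INTO branches VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO products VALUES (1, 1), (2, 1);
""")

# Joining both child tables fans out: 3 orders x 2 products = 6 rows for branch 1.
naive = conn.execute("""
SELECT b.id, COUNT(o.id), COUNT(p.id)
FROM branches b
LEFT JOIN orders o ON o.branch_id = b.id
LEFT JOIN products p ON p.branch_id = b.id
GROUP BY b.id ORDER BY b.id
""").fetchall()

# COUNT(DISTINCT ...) counts each order and product once despite the fan-out.
distinct_counts = conn.execute("""
SELECT b.id, COUNT(DISTINCT o.id), COUNT(DISTINCT p.id)
FROM branches b
LEFT JOIN orders o ON o.branch_id = b.id
LEFT JOIN products p ON p.branch_id = b.id
GROUP BY b.id ORDER BY b.id
""").fetchall()

# Correlated subqueries count each child table separately, so nothing fans out.
subquery_counts = conn.execute("""
SELECT b.id,
       (SELECT COUNT(*) FROM orders o WHERE o.branch_id = b.id) AS OrderCount,
       (SELECT COUNT(*) FROM products p WHERE p.branch_id = b.id) AS ProductCount
FROM branches b ORDER BY b.id
""").fetchall()

print(naive, distinct_counts, subquery_counts)
```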
I'm trying to do something like:
SELECT c.id, c.name, COUNT(orders.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
However, SQL will not allow the COUNT function. The error given at execution is that c.Id is not valid in the select list because it isn't in the group by clause or isn't aggregated.
I think I know the problem: COUNT just counts all the rows in the orders table. How can I make a count for each customer?
EDIT
Full query, but it's in Dutch (klanten = customers, bestellingen = orders, Aantal bestellingen = number of orders). This is what I tried:
select k.ID,
Naam,
Voornaam,
Adres,
Postcode,
Gemeente,
Land,
Emailadres,
Telefoonnummer,
count(*) over (partition by k.id) as 'Aantal bestellingen',
Kredietbedrag,
Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
from klanten k
join bestellingen on bestellingen.klantId = k.id
No errors, but no results either.
When using an aggregate function like that, you need to group by any columns that aren't aggregates:
SELECT c.id, c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
GROUP BY c.id, c.name
If you really want to be able to select all of the columns in Customers without specifying the names (please read this blog post in full for reasons to avoid this, and easy workarounds), then you can do this lazy shorthand instead:
;WITH o AS
(
SELECT CustomerID, OrderCount = COUNT(*)
FROM dbo.Orders GROUP BY CustomerID
)
SELECT c.*, o.OrderCount
FROM dbo.Customers AS c
INNER JOIN o
ON c.id = o.CustomerID;
EDIT for your real query
SELECT
k.ID,
k.Naam,
k.Voornaam,
k.Adres,
k.Postcode,
k.Gemeente,
k.Land,
k.Emailadres,
k.Telefoonnummer,
[Aantal bestellingen] = o.klantCount,
k.Kredietbedrag,
k.Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
FROM klanten AS k
INNER JOIN
(
SELECT klantId, klantCount = COUNT(*)
FROM dbo.bestellingen
GROUP BY klantId
) AS o
ON k.id = o.klantId;
I think this solution is much cleaner than grouping by all of the columns. Grouping on the orders table first and then joining once to each customer row is likely to be much more efficient than joining first and then grouping.
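A minimal sketch contrasting GROUP BY on the joined rows with aggregating the orders first and then joining, using Python's sqlite3 and hypothetical data. SQLite does not support the T-SQL `alias = expression` syntax, so `AS` is used instead. With an INNER JOIN, customers without orders are dropped in both versions; a LEFT JOIN would keep them with a NULL count.

```python
import sqlite3

# Hypothetical data: Ann has 2 orders, Bob 1, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# Join first, then group by every selected customer column.
grouped = conn.execute("""
SELECT c.id, c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

# Aggregate orders once per customer, then join one row per customer.
pre_aggregated = conn.execute("""
WITH o AS (
    SELECT customerId, COUNT(*) AS OrderCount
    FROM orders
    GROUP BY customerId
)
SELECT c.*, o.OrderCount
FROM customers c
JOIN o ON c.id = o.customerId
ORDER BY c.id
""").fetchall()

print(grouped, pre_aggregated)
```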
The following counts the orders per customer without grouping the overall query by customer.id. However, for customers with more than one order, the count will be repeated on each order's row.
SELECT c.id, c.name, COUNT(o.id) over (partition by c.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
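A sketch of that repetition, using Python's sqlite3 (SQLite 3.25+ for window functions) and hypothetical data: Ann has two orders, so her row and count appear twice.

```python
import sqlite3

# Hypothetical data: Ann has 2 orders, Bob 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# The window count is attached to every joined row; no GROUP BY is needed.
window_counts = conn.execute("""
SELECT c.id, c.name, COUNT(o.id) OVER (PARTITION BY c.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
ORDER BY c.id, o.id
""").fetchall()

print(window_counts)
```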
I have some problems with an SQL statement. I need to find the next DeliveryDate for each Customer in the following setup.
Tables
Customer (id)
DeliveryOrder (id, deliveryDate)
DeliveryOrderCustomer (customerId, deliveryOrderId)
Each Customer may have several DeliveryOrders on the same deliveryDate. I just can't figure out how to only get one deliveryDate for each customer. The date should be the next upcoming DeliveryDate after today. I feel like I would need some sort of "for each" here but I don't know how to solve it in SQL.
Another, simpler version:
select c.id, min(o.deliveryDate)
from customer c
inner join deliveryordercustomer co on co.customerId = c.id
inner join deliveryorder o on co.deliveryOrderId = o.id and o.deliveryDate > getdate()
group by c.id
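A minimal sketch of the MIN/GROUP BY approach with Python's sqlite3 module. The sample data is hypothetical, dates are stored as ISO-8601 text so they compare correctly, and a fixed `today` parameter stands in for GETDATE()/CURRENT_DATE so the result is repeatable.

```python
import sqlite3

# Hypothetical data. Customer 1 has two orders on 2024-06-05;
# customer 3 has only a past delivery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY);
CREATE TABLE DeliveryOrder (id INTEGER PRIMARY KEY, deliveryDate TEXT);
CREATE TABLE DeliveryOrderCustomer (customerId INTEGER, deliveryOrderId INTEGER);
INSERT INTO Customer VALUES (1), (2), (3);
INSERT INTO DeliveryOrder VALUES
  (1, '2024-05-20'), (2, '2024-06-05'), (3, '2024-06-05'),
  (4, '2024-06-10'), (5, '2024-06-03'), (6, '2024-05-01');
INSERT INTO DeliveryOrderCustomer VALUES
  (1, 1), (1, 2), (1, 3), (1, 4), (2, 5), (3, 6);
""")

today = "2024-06-01"  # fixed reference date instead of GETDATE()/CURRENT_DATE

# Keep only future deliveries, then take the earliest per customer.
next_delivery = conn.execute("""
SELECT c.id, MIN(o.deliveryDate)
FROM Customer c
INNER JOIN DeliveryOrderCustomer co ON co.customerId = c.id
INNER JOIN DeliveryOrder o ON co.deliveryOrderId = o.id AND o.deliveryDate > ?
GROUP BY c.id
ORDER BY c.id
""", (today,)).fetchall()

print(next_delivery)  # customer 3 has no upcoming delivery, so it is absent
```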
This gives the expected results using a subselect. Note that current_date may be RDBMS-specific; it works in Oracle.
select distinct c.id, o.deliveryDate
from customer c
inner join deliveryordercustomer co on co.customerId = c.id
inner join deliveryorder o on co.deliveryOrderId = o.id
where o.deliveryDate =
(select min(o2.deliveryDate)
from deliveryordercustomer co2
inner join deliveryorder o2 on o2.id = co2.deliveryOrderId
where co2.customerId = c.id and o2.deliveryDate > current_date)
You need to use a GROUP BY. There are a lot of ways to do this; here's my solution, which takes into account multiple orders on the same day for a customer and lets you query different delivery slots (first, second, etc.). This assumes SQL Server 2005 or later.
;with CustomerDeliveries as
(
Select c.id, do.deliveryDate, Rank()
over (Partition BY c.id order by do.deliveryDate) as DeliverySlot
From Customer c
inner join DeliveryOrderCustomer doc on c.id = doc.customerId
inner join DeliveryOrder do on do.id = doc.deliveryOrderId
Where do.deliveryDate>GETDATE()
Group By c.id, do.deliveryDate
)
Select id, deliveryDate
From CustomerDeliveries
Where DeliverySlot = 1
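A sketch of the delivery-slot idea under the same kind of assumptions (Python's sqlite3, SQLite 3.25+, hypothetical data, a fixed `today`). Here the de-duplication by date is done with SELECT DISTINCT in a separate CTE before ranking, rather than with GROUP BY in the same query, and the alias `do` is renamed to `dor` because DO is also an SQLite keyword. The `slot` parameter picks the first, second, etc. upcoming date.

```python
import sqlite3

# Hypothetical data: customer 1 has two orders on 2024-06-05 and one on 2024-06-10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY);
CREATE TABLE DeliveryOrder (id INTEGER PRIMARY KEY, deliveryDate TEXT);
CREATE TABLE DeliveryOrderCustomer (customerId INTEGER, deliveryOrderId INTEGER);
INSERT INTO Customer VALUES (1), (2), (3);
INSERT INTO DeliveryOrder VALUES
  (1, '2024-05-20'), (2, '2024-06-05'), (3, '2024-06-05'),
  (4, '2024-06-10'), (5, '2024-06-03'), (6, '2024-05-01');
INSERT INTO DeliveryOrderCustomer VALUES
  (1, 1), (1, 2), (1, 3), (1, 4), (2, 5), (3, 6);
""")

today = "2024-06-01"  # fixed reference date instead of GETDATE()


def delivery_slot(slot):
    """Return (customer id, date) for each customer's slot-th upcoming delivery date."""
    return conn.execute("""
    WITH CustomerDates AS (
        -- One row per customer and date, however many orders share that date.
        SELECT DISTINCT c.id, dor.deliveryDate
        FROM Customer c
        INNER JOIN DeliveryOrderCustomer doc ON c.id = doc.customerId
        INNER JOIN DeliveryOrder dor ON dor.id = doc.deliveryOrderId
        WHERE dor.deliveryDate > :today
    ),
    CustomerDeliveries AS (
        SELECT id, deliveryDate,
               RANK() OVER (PARTITION BY id ORDER BY deliveryDate) AS DeliverySlot
        FROM CustomerDates
    )
    SELECT id, deliveryDate
    FROM CustomerDeliveries
    WHERE DeliverySlot = :slot
    ORDER BY id
    """, {"today": today, "slot": slot}).fetchall()


first = delivery_slot(1)
second = delivery_slot(2)
print(first, second)
```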
Say I have 2 tables: Customers and Orders. A Customer can have many Orders.
Now, I need to show each Customer with his latest Order. This means that if a Customer has more than one Order, show only the Order with the latest Entry Time.
This is how far I managed on my own:
SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC
This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.
For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.
UPDATE: After going through a few answers here (there are a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.
UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.
I don't think you want to use MAX(), as you don't want to group by OrderID. What you need is an ordered subquery with SELECT TOP 1.
select *
from Customers
inner join Orders
on Customers.CustomerID = Orders.CustomerID
and OrderID = (
SELECT TOP 1 subOrders.OrderID
FROM Orders subOrders
WHERE subOrders.CustomerID = Orders.CustomerID
ORDER BY subOrders.OrderDate DESC
)
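A minimal sketch of the correlated TOP 1 subquery, run with Python's sqlite3 module; SQLite has no TOP, so ORDER BY ... LIMIT 1 plays the same role. The data is hypothetical and chosen so that the latest order by date is not the one with the highest OrderID.

```python
import sqlite3

# Hypothetical data: Ann's latest order is 2 although order 3 has the higher ID;
# Cy has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Orders VALUES
  (1, 1, '2024-01-01'), (2, 1, '2024-03-01'), (3, 1, '2024-02-01'),
  (4, 2, '2024-01-15');
""")

# For each joined order, keep it only if it is that customer's most recent one.
latest = conn.execute("""
SELECT c.CustomerID, o.OrderID
FROM Customers c
INNER JOIN Orders o
  ON c.CustomerID = o.CustomerID
 AND o.OrderID = (
     SELECT sub.OrderID
     FROM Orders sub
     WHERE sub.CustomerID = o.CustomerID
     ORDER BY sub.OrderDate DESC
     LIMIT 1)
ORDER BY c.CustomerID
""").fetchall()

print(latest)
```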
Something like this should do it:
SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
) Y ON X.Customer = Y.Customer
This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.
Hope this helps!
--
UPDATE :-) This shows only customers with orders:
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
While I see that you've already accepted an answer, I think this one is a bit more intuitive:
select a.*
,b.Id
from customer a
inner join Order b
on b.CustomerID = a.Id
where b.EntryTime = ( select max(o2.EntryTime)
from Order o2
where o2.CustomerId = a.Id
);
The subquery is correlated on o2.CustomerId = a.Id because you want the max EntryTime of all orders (o2) for the customer (a.Id). Correlating only outer columns, as in a.Id = b.CustomerId, would leave the subquery unfiltered and return the latest EntryTime across all customers.
I would have to run something like this through an execution plan to see the difference, but since TOP is applied only after the rows are sorted and ORDER BY can be expensive, I believe using max(EntryTime) is the better way to run this.
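A sketch showing why the subquery must be correlated on its own table, using Python's sqlite3 and hypothetical data. The table is named `Orders` here because ORDER is a reserved word, and EntryTime is a plain integer for brevity.

```python
import sqlite3

# Hypothetical data: Ann's latest order is 2 (EntryTime 30), Bob's is 3 (EntryTime 20).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, EntryTime INTEGER);
INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (1, 1, 10), (2, 1, 30), (3, 2, 20);
""")

# Correlated on the inner table: max EntryTime among this customer's orders.
correlated = conn.execute("""
SELECT a.Id, b.Id
FROM Customer a
INNER JOIN Orders b ON b.CustomerId = a.Id
WHERE b.EntryTime = (SELECT MAX(o2.EntryTime)
                     FROM Orders o2
                     WHERE o2.CustomerId = a.Id)
ORDER BY a.Id
""").fetchall()

# Comparing only outer columns leaves o2 unfiltered, so the subquery returns
# the global maximum (30) and Bob disappears.
uncorrelated = conn.execute("""
SELECT a.Id, b.Id
FROM Customer a
INNER JOIN Orders b ON b.CustomerId = a.Id
WHERE b.EntryTime = (SELECT MAX(o2.EntryTime)
                     FROM Orders o2
                     WHERE a.Id = b.CustomerId)
ORDER BY a.Id
""").fetchall()

print(correlated, uncorrelated)
```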
You can use a window function.
SELECT *
FROM (SELECT a.*, b.ID AS OrderID,
             ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
                                b.ID DESC) rn
      FROM customer a
      JOIN ORDER b ON a.ID = b.custid) latest
WHERE rn = 1
For each customer (a.id) it sorts all orders and discards everything but the latest.
The ORDER BY clause includes both the order date and the order ID, as a tiebreaker in case a customer has multiple orders on the same date.
Generally, window functions tend to be much faster than MAX() look-ups on a large number of records.
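A minimal sketch of the ROW_NUMBER approach with the date/ID tiebreaker, using Python's sqlite3 (SQLite 3.25+). The data is hypothetical, the order table is named `orders` because ORDER is reserved, and only the customer and order IDs are selected so the derived table's column names stay unique.

```python
import sqlite3

# Hypothetical data: orders 1 and 2 share Ann's latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (ID INTEGER PRIMARY KEY, custid INTEGER, orderdate TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES
  (1, 1, '2024-03-01'), (2, 1, '2024-03-01'), (3, 1, '2024-01-01'),
  (4, 2, '2024-02-01');
""")

# Number each customer's orders newest first; the ID breaks date ties.
latest = conn.execute("""
SELECT custid, OrderID
FROM (SELECT a.ID AS custid, b.ID AS OrderID,
             ROW_NUMBER() OVER (PARTITION BY a.ID
                                ORDER BY b.orderdate DESC, b.ID DESC) AS rn
      FROM customer a
      JOIN orders b ON a.ID = b.custid) latest
WHERE rn = 1
ORDER BY custid
""").fetchall()

print(latest)  # the tie on Ann's latest date resolves to the higher ID, 2
```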
This query is much faster than the accepted answer:
SELECT c.id as customer_id,
(SELECT co.id FROM customer_order co WHERE
co.customer_id=c.id
ORDER BY some_date_column DESC limit 1) as last_order_id
FROM customer c
SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID =
(SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)
Something like:
SELECT
a.*, b.Id
FROM
Customer a
INNER JOIN Order b
ON b.CustomerID = a.Id
INNER JOIN (SELECT CustomerID, max(EntryTime) as EntryTime FROM Order GROUP BY CustomerID) met
ON
b.CustomerID = met.CustomerID and b.EntryTime = met.EntryTime
One approach that I haven't seen above yet:
SELECT
C.*,
O1.ID
FROM
dbo.Customers C
INNER JOIN dbo.Orders O1 ON
O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
O2.CustomerID = C.ID AND
O2.EntryTime > O1.EntryTime
WHERE
O2.ID IS NULL
This (as well as the other solutions I believe) assumes that no two orders for the same customer can have the exact same entry time. If that's a concern then you would have to make a choice as to what determines which one is the "latest". If that's a concern post a comment and I can expand the query if needed to account for that.
The general approach of the query is to find the order for a customer where there is no other order for the same customer with a later date. It is then the latest order by definition. This approach often gives better performance than the use of derived tables or subqueries.
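A sketch of this anti-join with Python's sqlite3 and hypothetical data, including the tie case the answer mentions. The second query shows one possible way to break ties (treating the higher ID as later); that rule is an assumption, not part of the answer.

```python
import sqlite3

# Hypothetical data: Bob's two orders share the same EntryTime.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER, EntryTime INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (1, 1, 10), (2, 1, 30), (3, 1, 20), (4, 2, 5), (5, 2, 5);
""")

# Keep O1 only when no later order O2 exists for the same customer.
anti_join = conn.execute("""
SELECT C.ID, O1.ID
FROM Customers C
INNER JOIN Orders O1 ON O1.CustomerID = C.ID
LEFT OUTER JOIN Orders O2
  ON O2.CustomerID = C.ID AND O2.EntryTime > O1.EntryTime
WHERE O2.ID IS NULL
ORDER BY C.ID, O1.ID
""").fetchall()

# Assumed tiebreaker: among equal EntryTimes, the higher ID counts as later.
with_tiebreak = conn.execute("""
SELECT C.ID, O1.ID
FROM Customers C
INNER JOIN Orders O1 ON O1.CustomerID = C.ID
LEFT OUTER JOIN Orders O2
  ON O2.CustomerID = C.ID
 AND (O2.EntryTime > O1.EntryTime
      OR (O2.EntryTime = O1.EntryTime AND O2.ID > O1.ID))
WHERE O2.ID IS NULL
ORDER BY C.ID
""").fetchall()

print(anti_join, with_tiebreak)  # Bob appears twice in the first result
```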
A simple max and "group by" is sufficient.
select c.customer_id, max(o.order_date)
from customers c
inner join orders o on o.customer_id = c.customer_id
group by c.customer_id;
No subselect is needed, and subselects can slow things down. Note that this returns each customer's latest order date, not the order itself.