Let's say I have 2 tables:
tOrder          tOrderLine
------          ----------
OrderID         OrderID
Other Fields    ProductID
                Other Fields
I want to get back all orders that have a certain 2 products on the same order. Currently, I'm doing:
select o.OrderID
from tOrder o
inner join tOrderLine ol on o.OrderID = ol.OrderID
inner join tOrderLine ol2 on o.OrderID = ol2.OrderID
where ol.ProductID = 67
and ol2.ProductID = 68
This works and gives me the results I need, but it seems a little hokey. Also, if I wanted to get all orders that have a set of 10 items, the query is going to get monstrous.
Is there a better way to do what I'm looking for?
There are 2 simple ways I can think of to do it:
1. Inner join to tOrderLine twice, exactly how you did it. I would never have thought of that; it was an interesting solution when I saw it.
2. EXISTS clauses. For each tOrder row, check whether a corresponding row exists in tOrderLine for each of the products.
For example:
select o.OrderID
from tOrder o
where exists (select 1 from tOrderLine ol where ol.OrderID = o.OrderID and ol.ProductID = 67)
and exists (select 1 from tOrderLine ol where ol.OrderID = o.OrderID and ol.ProductID = 68)
This is how I would write it, simply because it's more obvious what the query is doing. On the other hand, yours seems easier for the optimizer to handle.
I'd be curious to know whether the execution plans of the two are the same. The correlated subqueries could be rewritten internally as inner joins.
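A minimal sketch of that comparison, run on SQLite through Python's sqlite3 module rather than on the asker's database. The sample rows are made up, and EXPLAIN QUERY PLAN is SQLite's own plan viewer, so the plans will not match what another engine produces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tOrder (OrderID INTEGER PRIMARY KEY);
    CREATE TABLE tOrderLine (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO tOrder VALUES (1), (2), (3), (4);
    -- order 1 has both products, orders 2 and 3 have only one of them,
    -- order 4 has both but lists product 67 on two separate lines
    INSERT INTO tOrderLine VALUES
        (1, 67), (1, 68), (2, 67), (3, 68), (3, 70),
        (4, 67), (4, 67), (4, 68);
""")

join_sql = """
    SELECT o.OrderID
    FROM tOrder o
    INNER JOIN tOrderLine ol  ON o.OrderID = ol.OrderID
    INNER JOIN tOrderLine ol2 ON o.OrderID = ol2.OrderID
    WHERE ol.ProductID = 67 AND ol2.ProductID = 68
"""
exists_sql = """
    SELECT o.OrderID
    FROM tOrder o
    WHERE EXISTS (SELECT 1 FROM tOrderLine ol
                  WHERE ol.OrderID = o.OrderID AND ol.ProductID = 67)
      AND EXISTS (SELECT 1 FROM tOrderLine ol
                  WHERE ol.OrderID = o.OrderID AND ol.ProductID = 68)
"""

join_ids = sorted(row[0] for row in conn.execute(join_sql))
exists_ids = sorted(row[0] for row in conn.execute(exists_sql))

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
join_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
exists_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + exists_sql)]

print("join:  ", join_ids)
print("exists:", exists_ids)
print("join plan:  ", join_plan)
print("exists plan:", exists_plan)
```

One difference shows up in the results themselves: when an order lists the same product on more than one line (order 4 here), the double join returns that order once per matching combination, while the EXISTS version returns it once.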
You could try to solve it using a distinct count in a HAVING clause:
select t.OrderId
, count(*)
from tOrder t
join tOrderLine o
on t.OrderID = o.OrderID
where o.ProductId in (67, 68)
group
by t.OrderId
having count(distinct o.ProductId) = 2
Try this
Select Distinct a.*
From tOrder a
inner join tOrderLine b on a.orderId = b.orderId
Where a.orderId in
(
Select orderId
From tOrderLine
Where
productId in (68,69)
--replace with set of products you want
Group by orderId
Having Count(Distinct productId) = 2
---replace with 10 if you want to return orders with 10 items
)
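For a larger set of products, the GROUP BY / HAVING form can be generated from a list instead of being edited by hand. A rough sketch using Python's sqlite3 module; the helper name orders_with_all and the sample rows are made up for illustration:

```python
import sqlite3

def orders_with_all(conn, product_ids):
    """Return the IDs of orders that contain every product in product_ids."""
    wanted = sorted(set(product_ids))
    # One "?" placeholder per product, so the values are bound, not pasted in.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT OrderID
        FROM tOrderLine
        WHERE ProductID IN ({placeholders})
        GROUP BY OrderID
        HAVING COUNT(DISTINCT ProductID) = ?
        ORDER BY OrderID
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tOrderLine (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO tOrderLine VALUES
        (1, 67), (1, 68), (1, 69),   -- order 1: all three products
        (2, 67), (2, 68),            -- order 2: 67 and 68
        (3, 67), (3, 67),            -- order 3: 67 on two lines, no 68
        (4, 68), (4, 69);            -- order 4: 68 and 69
""")

print(orders_with_all(conn, [67, 68]))
print(orders_with_all(conn, [67, 68, 69]))
```

COUNT(DISTINCT ...) is what keeps order 3 out: it has two matching lines, but only one distinct product.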
I'm practicing questions from the book "SQL Practice Problems: 57 beginning, intermediate, and advanced challenges for you to solve using a “learn-by-doing” approach". Question 31 is:
Customers with no orders for EmployeeID 4
One employee (Margaret Peacock, EmployeeID 4) has placed the most orders. However, there are some customers who've never placed an order with her. Show only those customers who have never placed an order with her.
The solution I came up with defines a CTE named "cte" and joins it with an existing table. It's not pretty or readable:
with cte as
(Select Customers.CustomerID
from customers
where CustomerID not in
(select Orders.CustomerID from orders where orders.EmployeeID = '4'))
select *
from cte left join
(select CustomerID from Orders where Orders.EmployeeID = '4') O
on cte.CustomerID = O.CustomerID
I found the following solution online -
SELECT c.CustomerID, o.CustomerID
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID AND o.EmployeeID = 4
WHERE o.CustomerID IS NULL;
Which is nicer.
My question - when can I use OR, AND clauses in a JOIN? What are the advantages? Is it the fact that a JOIN is executed before the where clause?
Thanks,
Asaf
A JOIN condition can contain any boolean comparison, even subqueries using EXISTS and correlated subqueries. There is no limitation on what can be expressed.
Just a note, however. = and AND are good for performance. Inequalities tend to be performance killers.
As for your particular problem, I think the following is a more direct interpretation of the question:
SELECT c.CustomerID
FROM Customers c
WHERE NOT EXISTS (SELECT 1
FROM Orders o
WHERE o.CustomerID = c.CustomerID AND
o.EmployeeID = 4
);
That is, get all customers for whom no order exists with employee 4.
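To see why the EmployeeID filter sits in the ON clause of the LEFT JOIN version, here is a small sketch on SQLite via Python's sqlite3 module. The three customers and their orders are made-up sample data, not the book's database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID TEXT, EmployeeID INTEGER);
    INSERT INTO Customers VALUES ('A'), ('B'), ('C');
    -- A ordered with employees 4 and 1, B only with employee 1, C never ordered
    INSERT INTO Orders VALUES (1, 'A', 4), (2, 'A', 1), (3, 'B', 1);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Filter in ON: only employee-4 orders can match, unmatched customers keep NULLs.
filter_in_on = ids("""
    SELECT c.CustomerID
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID AND o.EmployeeID = 4
    WHERE o.CustomerID IS NULL
""")

# Filter in WHERE: it runs after the join and rejects the NULL-extended rows,
# so the two conditions can never both hold.
filter_in_where = ids("""
    SELECT c.CustomerID
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    WHERE o.EmployeeID = 4 AND o.CustomerID IS NULL
""")

not_exists = ids("""
    SELECT c.CustomerID
    FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM Orders o
                      WHERE o.CustomerID = c.CustomerID AND o.EmployeeID = 4)
""")

print(filter_in_on, filter_in_where, not_exists)
```

For an inner join the placement makes no difference to the result; for an outer join, conditions in ON decide which rows match, while conditions in WHERE filter the joined result afterwards.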
Generally I'd recommend that you always choose the most readable version of the query unless you can actually measure a performance difference with realistic data. The cost based optimiser should pick a good way of executing the query to return the results you want in this case.
For me the JOIN is a lot more readable than the CTE.
Here is another solution:
SELECT * FROM(
(SELECT Customers.CustomerID AS Customers_ID
FROM Customers) AS P
LEFT JOIN
(Select Orders.CustomerID from Orders
where Orders.EmployeeID=4) as R
on R.CustomerID = P.Customers_ID
)
WHERE R.CustomerID IS NULL
ORDER BY P.Customers_ID DESC
I know similar questions like this have been asked before, but I have not seen one for more than 2 tables. And there seems to be a difference.
I have three tables from which I need fields: customers, from which I need customerID and orderID; orders, from which I get customerID and orderID; and lineitems, from which I get orderID and quantity (the quantity ordered).
I want to find out how many customers bought more than 2 of the same item, so basically quantity > 2. I tried:
SELECT COUNT(DISTINCT custID)
FROM customers
WHERE EXISTS(
SELECT *
FROM customers C, orders O, lineitems L
WHERE C.custID = O.custID AND O.orderID = L.orderID AND L.quantity > 2
);
I do not understand why it is returning me the count of all rows. I am correlating the subqueries before checking the >2 condition, am I not?
I am a beginner at SQL, so I'd be thankful if you could explain it to me fundamentally, if necessary. Thanks.
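Don't repeat the customers table inside the EXISTS subquery. Because your subquery brings in its own copy of customers (C), it never refers to the row of the outer query, so it is not correlated: it is true for every outer row as soon as any customer has an order line with quantity > 2, and you get the count of all customers. The idea of correlation is to reference the table of the outer query from inside the subquery: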
SELECT COUNT(DISTINCT custID)
FROM customers c
WHERE EXISTS(
SELECT *
FROM orders O
JOIN lineitems L ON O.orderID = L.orderID
WHERE C.custID = O.custID AND L.quantity > 2
);
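A small sketch of the difference, run on SQLite through Python's sqlite3 module with three made-up customers, only one of whom has an order line with quantity > 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custID INTEGER PRIMARY KEY);
    CREATE TABLE orders (orderID INTEGER PRIMARY KEY, custID INTEGER);
    CREATE TABLE lineitems (orderID INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (10, 1), (11, 2);      -- customer 3 has no orders
    INSERT INTO lineitems VALUES (10, 5), (11, 1);   -- only order 10 has quantity > 2
""")

# The subquery has its own "customers C", so it ignores the outer row.
uncorrelated = conn.execute("""
    SELECT COUNT(DISTINCT custID)
    FROM customers
    WHERE EXISTS (SELECT *
                  FROM customers C, orders O, lineitems L
                  WHERE C.custID = O.custID
                    AND O.orderID = L.orderID
                    AND L.quantity > 2)
""").fetchone()[0]

# The subquery refers to the outer alias c, so it is evaluated per customer.
correlated = conn.execute("""
    SELECT COUNT(DISTINCT custID)
    FROM customers c
    WHERE EXISTS (SELECT *
                  FROM orders O
                  JOIN lineitems L ON O.orderID = L.orderID
                  WHERE c.custID = O.custID
                    AND L.quantity > 2)
""").fetchone()[0]

print(uncorrelated, correlated)
```

The first query counts every customer, including the one with no orders at all; the second counts only the customer who actually has a qualifying line.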
I would approach this as two aggregations:
select count(distinct custID)
from (select o.custID, l.itemid, count(*) as cnt
from lineitems l join
orders o
on o.orderID = l.orderID
group by o.custID, l.itemid
) ol
where cnt > 2;
The inner query counts the order lines for each customer and item. The outer query counts the customers who have more than 2 such lines for some item.
EDIT:
I may have misunderstood the question for the above answer. If you just want quantity > 2, then that is much easier:
select count(distinct o.custID)
from lineitems l join
orders o
on o.orderID = l.orderID
where l.quantity > 2;
This is probably the simplest way to express the query.
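The two readings of the question give different answers, and so does counting rows instead of distinct customers. A sketch on SQLite via Python's sqlite3 module; the itemID column and all sample rows are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderID INTEGER PRIMARY KEY, custID INTEGER);
    CREATE TABLE lineitems (orderID INTEGER, itemID INTEGER, quantity INTEGER);
    INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2), (30, 3);
    INSERT INTO lineitems VALUES
        (10, 100, 3), (11, 100, 5),                 -- customer 1: two large lines
        (20, 200, 1), (21, 200, 1), (22, 200, 1),   -- customer 2: three lines of 1
        (30, 300, 1);                               -- customer 3: one line of 1
""")

# Reading 1: a single line with quantity > 2.
by_quantity = [r[0] for r in conn.execute("""
    SELECT DISTINCT o.custID
    FROM lineitems l JOIN orders o ON o.orderID = l.orderID
    WHERE l.quantity > 2
    ORDER BY o.custID
""")]

# Same filter, but counting joined rows rather than distinct customers.
row_count = conn.execute("""
    SELECT COUNT(*)
    FROM lineitems l JOIN orders o ON o.orderID = l.orderID
    WHERE l.quantity > 2
""").fetchone()[0]

# Reading 2: more than 2 lines for the same item, whatever the quantities.
by_line_count = [r[0] for r in conn.execute("""
    SELECT DISTINCT custID
    FROM (SELECT o.custID, l.itemID, COUNT(*) AS cnt
          FROM lineitems l JOIN orders o ON o.orderID = l.orderID
          GROUP BY o.custID, l.itemID) ol
    WHERE cnt > 2
    ORDER BY custID
""")]

print(by_quantity, row_count, by_line_count)
```

Customer 1 has two qualifying lines, so COUNT(*) reports 2 where COUNT(DISTINCT custID) would report 1.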
I suggest using joins. Try this:
select
count(distinct o.custID)
From
orders o
inner join
lineitems l
on
l.orderID = o.orderID
where
l.quantity > 2
I'm trying to minimize the cost of this query as much as possible without creating any indexes.
This is the original query with a cost of 599:
SELECT DISTINCT OL.PRODUCT_ID
FROM ORDERS O JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
JOIN CUSTOMERS C ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
WHERE C.CUSTOMER_ID = 474871
OR UPPER(C.FIRST_NAME) = 'EDGAR';
This is what I've done so far. The cost is now 344:
SELECT OL.PRODUCT_ID
FROM ORDER_LINES OL
WHERE EXISTS
(SELECT ORDER_ID
FROM ORDERS
WHERE CUSTOMER_ID = 474871
AND ORDER_ID = OL.ORDER_ID)
OR EXISTS
(SELECT ORDER_ID
FROM ORDERS
WHERE CUSTOMER_ID IN
(SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE UPPER(FIRST_NAME) = 'EDGAR')
AND ORDER_ID = OL.ORDER_ID);
Is there anything that stands out that I may try to drive down the cost more?
Here is a screen shot of the explain plan:
Screenshot of ERD:
Looking at the cost is misleading and may lead you to make changes that aren't actually beneficial. To quote Tom Kyte, "You cannot compare the cost of 2 queries with each other. ... they might as well be random numbers."
The best way to check query performance is to actually time the query, ideally with realistic data. You should also be wary of premature optimization. Your first query is pretty straight-forward; I would stick with it unless a performance issue manifests.
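As a rough illustration of timing instead of comparing costs, here is a sketch that times both formulations on SQLite through Python's sqlite3 module. It is not Oracle, the data is synthetic, customer ID 7 stands in for the ID in the question, and DISTINCT is added to the EXISTS version so that both queries return the same set:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, FIRST_NAME TEXT);
    CREATE TABLE PRODUCTS (PRODUCT_ID INTEGER PRIMARY KEY);
    CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
    CREATE TABLE ORDER_LINES (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
""")
# Synthetic, deterministic sample data (made up for this sketch).
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?)",
                 [(c, "Edgar" if c % 50 == 0 else f"Name{c}") for c in range(1, 201)])
conn.executemany("INSERT INTO PRODUCTS VALUES (?)", [(p,) for p in range(1, 51)])
conn.executemany("INSERT INTO ORDERS VALUES (?, ?)",
                 [(o, o % 200 + 1) for o in range(1, 1001)])
conn.executemany("INSERT INTO ORDER_LINES VALUES (?, ?)",
                 [(o, (o * 7 + k) % 50 + 1) for o in range(1, 1001) for k in range(3)])

join_sql = """
    SELECT DISTINCT OL.PRODUCT_ID
    FROM ORDERS O
    JOIN ORDER_LINES OL ON O.ORDER_ID = OL.ORDER_ID
    JOIN PRODUCTS P ON OL.PRODUCT_ID = P.PRODUCT_ID
    JOIN CUSTOMERS C ON C.CUSTOMER_ID = O.CUSTOMER_ID
    WHERE C.CUSTOMER_ID = ? OR UPPER(C.FIRST_NAME) = 'EDGAR'
"""
exists_sql = """
    SELECT DISTINCT OL.PRODUCT_ID
    FROM ORDER_LINES OL
    WHERE EXISTS (SELECT 1 FROM ORDERS O
                  WHERE O.ORDER_ID = OL.ORDER_ID AND O.CUSTOMER_ID = ?)
       OR EXISTS (SELECT 1 FROM ORDERS O
                  WHERE O.ORDER_ID = OL.ORDER_ID
                    AND O.CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUSTOMERS
                                          WHERE UPPER(FIRST_NAME) = 'EDGAR'))
"""

def time_query(sql, params, repeats=5):
    """Run the query several times; return (rows, best wall-clock seconds)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best

CUSTOMER = 7  # stand-in for the customer ID used in the question
join_rows, join_secs = time_query(join_sql, (CUSTOMER,))
exists_rows, exists_secs = time_query(exists_sql, (CUSTOMER,))

print(f"join:   {len(join_rows)} rows, best of 5: {join_secs:.6f}s")
print(f"exists: {len(exists_rows)} rows, best of 5: {exists_secs:.6f}s")
```

Checking that both queries return the same rows before comparing timings matters as much as the timing itself; on the real system the same idea applies with production-like data volumes.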
SELECT OL.PRODUCT_ID
FROM ORDER_LINES OL
WHERE OL.ORDER_ID IN
(SELECT ORDER_ID FROM ORDERS
WHERE CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUSTOMERS
WHERE CUSTOMER_ID = 474871 OR UPPER(FIRST_NAME) = 'EDGAR')
);
I assume there is an index on ORDER_LINES.ORDER_ID (at the moment the plan shows a full scan of ORDER_LINES).
I'm curious whether the engine is smart enough to apply the WHERE clause before the joins. If it applies it after the joins, the intermediate results it has to scan are larger than they need to be. What happens if you move the limiting criteria into the join, so that they are evaluated as part of the join? (I fully expect the cost to be 599 or less; I just don't know whether it will be less.)
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
AND (C.customer_ID = 474871 OR upper(C.First_name) = 'EDGAR')
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
GROUP BY OL.Product_ID
I wonder if the OR is causing the problem.
If you run it without the OR, how much is the cost reduced?
And what happens if you UNION the two sets instead of using an OR?
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
WHERE C.customer_ID = 474871
UNION
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
WHERE upper(C.First_name) = 'EDGAR'
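Before comparing costs, it is worth confirming that the UNION form returns the same products as the OR form. A small sketch on SQLite via Python's sqlite3 module, with made-up rows and customer ID 2 standing in for the ID in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, FIRST_NAME TEXT);
    CREATE TABLE PRODUCTS (PRODUCT_ID INTEGER PRIMARY KEY);
    CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
    CREATE TABLE ORDER_LINES (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
    INSERT INTO CUSTOMERS VALUES (1, 'Edgar'), (2, 'Ann'), (3, 'edgar'), (4, 'Bob');
    INSERT INTO PRODUCTS VALUES (100), (101), (102), (103), (104);
    INSERT INTO ORDERS VALUES (10, 1), (11, 2), (12, 3), (13, 4);
    INSERT INTO ORDER_LINES VALUES
        (10, 100), (10, 101), (11, 101), (11, 102), (12, 103), (13, 104);
""")

# Shared join; each query below appends its own WHERE clause.
base = """
    SELECT OL.PRODUCT_ID
    FROM CUSTOMERS C
    INNER JOIN ORDERS O ON C.CUSTOMER_ID = O.CUSTOMER_ID
    INNER JOIN ORDER_LINES OL ON O.ORDER_ID = OL.ORDER_ID
    INNER JOIN PRODUCTS P ON OL.PRODUCT_ID = P.PRODUCT_ID
"""
by_id = base + " WHERE C.CUSTOMER_ID = 2"
by_name = base + " WHERE UPPER(C.FIRST_NAME) = 'EDGAR'"

or_sql = base + " WHERE C.CUSTOMER_ID = 2 OR UPPER(C.FIRST_NAME) = 'EDGAR'"
or_sql += " GROUP BY OL.PRODUCT_ID"

def product_ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

or_ids = product_ids(or_sql)
union_ids = product_ids(by_id + " UNION " + by_name)
union_all_ids = product_ids(by_id + " UNION ALL " + by_name)

print(or_ids)
print(union_ids)
print(union_all_ids)
```

UNION removes duplicates across both branches, which is what makes it equivalent to the grouped OR query here; UNION ALL would return product 101 twice because it is bought by both customer 2 and an Edgar.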
I have the following database structure:
And I have the task: select the buyer's name, the order ID, and the book's name for orders that contain fewer than 3 books. I solved it like this:
SELECT b.Name, O.OrderId, bk.Name
FROM Orders O
JOIN Buyers b ON b.Id = O.BuyerId
JOIN BooksInOrder bo ON bo.OrderId = O.OrderId
JOIN Books bk ON bk.Id = bo.BookId
WHERE O.OrderId IN
(
SELECT OrderId
FROM BooksInOrder
GROUP BY ORDERID
HAVING COUNT(*) < 3
)
Is my SQL the most optimal way to perform what I am trying to achieve?
In general, the only way to understand performance is to run queries on your system with your data. Okay, there are some things that hurt performance (such as an unnecessary select distinct or union instead of union all). Your query doesn't have those, however.
One obvious simplification is to move the subquery from the where clause to the from clause:
SELECT b.Name, O.OrderId, bk.Name
FROM (SELECT OrderId
FROM BooksInOrder
GROUP BY OrderId
HAVING COUNT(*) < 3
) s JOIN
Orders O
ON O.OrderId = s.OrderId JOIN
Buyers b
ON b.Id = O.BuyerId JOIN
BooksInOrder bo
ON bo.OrderId = O.OrderId JOIN
Books bk
ON bk.Id = bo.BookId;
Do note that although this might look simpler, the performance may not be better.
Similarly, you could use window functions:
SELECT Name, OrderId, BookName
FROM (SELECT b.Name, O.OrderId, bk.Name as BookName, count(*) over (partition by O.OrderId) as cnt
FROM Orders O JOIN
Buyers b
ON b.Id = O.BuyerId JOIN
BooksInOrder bo
ON bo.OrderId = O.OrderId JOIN
Books bk
ON bk.Id = bo.BookId
) bb
WHERE cnt < 3;
But your original formulation may still work best.
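A quick way to check that the three formulations agree is to run them side by side on a small data set. This sketch uses SQLite via Python's sqlite3 module; the table layout is inferred from the queries above, the rows are made up, and the window-function variant is only run when the bundled SQLite is 3.25 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Buyers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Books (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, BuyerId INTEGER);
    CREATE TABLE BooksInOrder (OrderId INTEGER, BookId INTEGER);
    INSERT INTO Buyers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
    INSERT INTO Orders VALUES (1, 1), (2, 2), (3, 1);
    -- order 1 has 2 books, order 2 has 3 books, order 3 has 1 book
    INSERT INTO BooksInOrder VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 3);
""")

# Joins shared by all three variants.
joins = """
    JOIN Buyers b ON b.Id = o.BuyerId
    JOIN BooksInOrder bo ON bo.OrderId = o.OrderId
    JOIN Books bk ON bk.Id = bo.BookId
"""
small_orders = "SELECT OrderId FROM BooksInOrder GROUP BY OrderId HAVING COUNT(*) < 3"

in_sql = f"""
    SELECT b.Name, o.OrderId, bk.Name
    FROM Orders o {joins}
    WHERE o.OrderId IN ({small_orders})
"""
derived_sql = f"""
    SELECT b.Name, o.OrderId, bk.Name
    FROM ({small_orders}) s
    JOIN Orders o ON o.OrderId = s.OrderId {joins}
"""
window_sql = f"""
    SELECT BuyerName, OrderId, BookName
    FROM (SELECT b.Name AS BuyerName, o.OrderId AS OrderId, bk.Name AS BookName,
                 COUNT(*) OVER (PARTITION BY o.OrderId) AS cnt
          FROM Orders o {joins}) bb
    WHERE cnt < 3
"""

in_rows = sorted(conn.execute(in_sql).fetchall())
derived_rows = sorted(conn.execute(derived_sql).fetchall())
# Window functions were added in SQLite 3.25; skip that variant on older builds.
window_rows = None
if sqlite3.sqlite_version_info >= (3, 25, 0):
    window_rows = sorted(conn.execute(window_sql).fetchall())

print(in_rows)
print(derived_rows)
print(window_rows)
```

Once the results are known to match, the remaining question is only which form runs best on the real system, which has to be measured there.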
I have the following tables:
Orders, Notes, Permits
The following columns are in each table:
Orders = ID
Notes = ID, RelatedID, Note, Timestamp
Permits = ID, OrderId
I have the following query
SELECT o.id
, op.id
, n.timestamp
FROM [tblOrders] o
INNER JOIN [tblNotes] n ON n.[RelatedID] = o.[ID]
INNER JOIN [tblPermits] op ON o.[id] = op.[OrderID]
WHERE n.[Text] LIKE 'Line item is created%'
An order has 1 to many permits, and an order has 1 to many notes.
The problem is that the notes relate to the order and not to the individual permit. When you join o.id with n.RelatedID and the order has more than 1 permit, the query returns 4 records instead of 2, because every permit row is joined to every matching note for the same order. How can I get this to return only 2 records?
The issue is that using JOINs risks duplication in the result set, because there will be a row for each matching record in tblNotes. My first recommendation is to rewrite the query so that tblNotes isn't joined:
Using EXISTS:
SELECT o.id,
p.id
FROM tblorders o
JOIN tblpermits p ON p.orderid = o.id
WHERE EXISTS(SELECT NULL
FROM tblnotes n
WHERE n.[Text] LIKE 'Line item is created%'
AND n.relatedid = o.id)
Using IN:
SELECT o.id,
p.id
FROM tblorders o
JOIN tblpermits p ON p.orderid = o.id
WHERE o.id IN (SELECT n.relatedid
FROM tblnotes n
WHERE n.[Text] LIKE 'Line item is created%')
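The row multiplication is easy to reproduce. A minimal sketch on SQLite via Python's sqlite3 module, with one made-up order that has two permits and two matching notes (column names follow the query in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblOrders (ID INTEGER PRIMARY KEY);
    CREATE TABLE tblPermits (ID INTEGER PRIMARY KEY, OrderID INTEGER);
    CREATE TABLE tblNotes (ID INTEGER PRIMARY KEY, RelatedID INTEGER,
                           Text TEXT, Timestamp TEXT);
    INSERT INTO tblOrders VALUES (1);
    INSERT INTO tblPermits VALUES (100, 1), (101, 1);
    INSERT INTO tblNotes VALUES
        (1, 1, 'Line item is created: A', '2020-01-01'),
        (2, 1, 'Line item is created: B', '2020-01-02'),
        (3, 1, 'Something else',          '2020-01-03');
""")

# Joining notes and permits to the same order pairs every note with every permit.
join_rows = sorted(conn.execute("""
    SELECT o.ID, p.ID
    FROM tblOrders o
    JOIN tblNotes n ON n.RelatedID = o.ID
    JOIN tblPermits p ON p.OrderID = o.ID
    WHERE n.Text LIKE 'Line item is created%'
""").fetchall())

# EXISTS only asks whether a matching note exists, so it adds no rows.
exists_rows = sorted(conn.execute("""
    SELECT o.ID, p.ID
    FROM tblOrders o
    JOIN tblPermits p ON p.OrderID = o.ID
    WHERE EXISTS (SELECT NULL
                  FROM tblNotes n
                  WHERE n.Text LIKE 'Line item is created%'
                    AND n.RelatedID = o.ID)
""").fetchall())

in_rows = sorted(conn.execute("""
    SELECT o.ID, p.ID
    FROM tblOrders o
    JOIN tblPermits p ON p.OrderID = o.ID
    WHERE o.ID IN (SELECT n.RelatedID
                   FROM tblNotes n
                   WHERE n.Text LIKE 'Line item is created%')
""").fetchall())

print(len(join_rows), exists_rows, in_rows)
```

Note that the EXISTS and IN versions no longer return n.timestamp. If a timestamp is still needed, the query has to say which note's timestamp to show per order, for example the earliest one.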
One way would be
SELECT DISTINCT
o.id
,op.id
.....
....
SELECT o.id ,op.id
FROM [tblOrders] o
JOIN [tblPermits] op
ON op.[OrderID] = o.[id]
WHERE o.id IN
(
SELECT n.[RelatedID]
FROM tblNotes n
WHERE n.[Text] LIKE 'Line item is created%'
)