SQL - INNER JOIN with AND vs using sub-query - sql

I'm practicing questions for the book "SQL Practice Problems: 57 beginning, intermediate, and advanced challenges for you to solve using a “learn-by-doing” approach ". Question 31 is -
Customers with no orders for EmployeeID 4
One employee (Margaret Peacock, EmployeeID 4) has placed the most orders. However, there are some customers who've never placed an order with her. Show only those customers who have never placed an order with her.
The solution I did creates a temporary table "cte" and joints it with an existing table. Not pretty or readable actually -
with cte as
(Select Customers.CustomerID
from customers
where CustomerID not in
(select Orders.CustomerID from orders where orders.EmployeeID = '4'))
select *
from cte left join
(select CustomerID from Orders where Orders.EmployeeID = '4') O
on cte.CustomerID = O.CustomerID
I found the following solution online -
SELECT c.CustomerID, o.CustomerID
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID AND o.EmployeeID = 4
WHERE o.CustomerID IS NULL;
Which is nicer.
My question - when can I use OR, AND clauses in a JOIN? What are the advantages? Is it the fact that a JOIN is executed before the where clause?
Thanks,
Asaf

A JOIN condition can contain any boolean comparison, even subqueries using EXISTS and correlated subqueries. There is no limitation on what can be expressed.
Just a note, however. = and AND are good for performance. Inequalities tend to be performance killers.
As for your particular problem, I think the following is a more direct interpretation of the question:
SELECT c.CustomerID
FROM Customers c
WHERE NOT EXISTS (SELECT 1
FROM Orders o
WHERE o.CustomerID = c.CustomerID AND
o.EmployeeID = 4
);
That is, get all customers for whom no order exists with employee 4.

Generally I'd recommend that you always choose the most readable version of the query unless you can actually measure a performance difference with realistic data. The cost based optimiser should pick a good way of executing the query to return the results you want in this case.
For me the JOIN is a lot more readable than the CTE.

Here's is another Solution
SELECT * FROM(
(SELECT Customers.CustomerID AS Customers_ID
FROM Customers) AS P
LEFT JOIN
(Select Orders.CustomerID from Orders
where Orders.EmployeeID=4) as R
on R.CustomerID = P.Customers_ID
)
WHERE R.CustomerID IS NULL
ORDER BY R.CustomerID DESC

Related

Find customer who bought least on W3schools SQL

I'm new to SQL Server and I'm trying to do some exercises. I want to find customers who bought least on W3schools database. My solution for this case is:
Join Customers with OrderDetails via CustomerID
Select CustomerNames that have least OrderID appeared after using JOIN.
Here is my query:
SELECT COUNT(OrderID), CustomerID
FROM Orders
GROUP BY CustomerID
ORDER BY COUNT(CustomerID) ASC
HAVING COUNT(OrderID) = '1'
When I ran this query, message says "Syntax error near "Having". What happened with my query?
Please help me to figure out.
My solution for this case is:
Join Customers with OrderDetails via CustomerID
Select CustomerNames that have least OrderID appeared after using JOIN.
As #thorsten-kettner lamented:
You say in your explanation that you join and then show the customer
name. Your query does neither of the two things...
Furthermore, your question has severe grammatical errors making it hard to decipher.
I want to find customers who bought least on W3schools database.
Nonetheless,
The Try-SQL Editor at w3schools.com
To get the list of customers who have at least 1 order:
SELECT C.CustomerName FROM [Customers] AS C
JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
ORDER BY C.CustomerName
To get the list of customers who have exactly 1 order:
SELECT C.CustomerName FROM [Customers] AS C
JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
HAVING COUNT(O.OrderID) = 1
ORDER BY C.CustomerName
To get the customer who made the least number of orders:
Including the ones who made no order. Use JOIN instead of LEFT JOIN if you only want to consider the ones who made at least one order.
You can remove LIMIT 1 to get the whole list sorted by the number of orders placed.
SELECT C.CustomerName, COUNT(O.OrderID) FROM [Customers] AS C
LEFT JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
ORDER BY COUNT(O.OrderID), C.CustomerName
LIMIT 1;
Addendum
As commented by #sticky-bit ,
The ORDER BY clause has to come after the HAVING clause.
You want a TOP 1 WITH TIES query, something like this:
SELECT TOP 1 WITH TIES CustomerID
FROM Orders
GROUP BY CustomerID
ORDER BY COUNT(OrderID);
In case you are using MySQL, try the following version:
SELECT CustomerID
FROM Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) = (
SELECT COUNT(OrderID)
FROM ORDERS
GROUP BY CustomerID
ORDER BY COUNT(OrderID)
LIMIT 1
);

How to find the date of the last item sold sql server

Hi Guys I am having a bit of trouble writing the most efficient and optimized query for this question:
Find the order ID and date of the last discontinued item sold.
I have my code below as well as the metadata for the tables. I am not sure if my code will produce the correct output because I have no way of testing it and I am not sure if my code will be the best way to complete this query. Any advice would help.
Select
orders.orderid,
Max(orders.orderdate)
from orders
inner join order_details on orders.orderid = order_details.orderid
inner join products on order_details.productid = products.productid
where discontinued = 1
group by orders.orderid ```
Using row_number is the easiest way:
select orderID,OrderDate from (
select o.orderID,o.OrderDate,rn = row_number() over (order by orderdate desc)
from products p
join orderDetails od on od.productID=p.productID
join orders o on o.orderID=od.orderID
where p.discontinued = 1) sub
where sub.rn = 1
your query is pretty much already what you want, the fastest way is to simply order by the required column and select top 1
select top (1) o.orderid, o.orderdate
from orders o
join order_details od on od.orderid = o.orderid
join products p on p.productid = od.productid
where p.discontinued = 1
order by o.orderdate desc
This will be more performant than using a window function to number all the rows before selecting row one.
I have a very similar arrangement of tables with the ubiquitous orders/orderitems/products arrangement including a similar deleted flag for products so it's easy to test both side by side, this query is a bit more performant than using the row_number equivalent, using a table of 13.5m orders and 6m products. Execution times for both were sub-second but this query was slightly faster.

Not in aggregate function or group by clause: org.hsqldb.Expression#59bcb2b6 in statement

I'm trying to group SUM(OrderDetails.Quantity) but keep getting the error Not in aggregate function or group by clause: org.hsqldb.Expression#59bcb2b6 in statement but since I already have an GROUP BY part I don't know what I'm missing
SQL Statement:
SELECT OrderDetails.CustomerID, Customers.CompanyName, Customers.ContactName, SUM(OrderDetails.Quantity)
FROM OrderDetails INNER JOIN Customers ON OrderDetails.CustomerID = Customers.CustomerID
WHERE OrderDetails.CustomerID = Customers.CustomerID
GROUP BY OrderDetails.CustomerID
ORDER BY OrderDetails.CustomerID ASC
I'm trying to create a table that shows customers and the amount of products they ordered, while also showing their CompanyName and ContactName.
Write this:
GROUP BY OrderDetails.CustomerID, Customers.CompanyName, Customers.ContactName
Unlike in MySQL, PostgreSQL, and standard SQL, in most other SQL dialects, it is not sufficient to group only by the primary key if you also want to project functionally dependent columns in the SELECT clause, or elsewhere. You have to explicitly GROUP BY all of the columns that you want to project.
Don't take the customer id from the orders table. Take it from the customers table. If you do so, this might work in your database:
SELECT c.CustomerID, c.CompanyName, c.ContactName, SUM(od.Quantity)
FROM OrderDetails od INNER JOIN
Customers c
ON od.CustomerID = c.CustomerID
GROUP BY c.CustomerID
ORDER BY c.CustomerID ASC;
Note that the WHERE clause does not need to repeat the conditions in the ON clause.
Your version won't work in standard SQL because od.CustomerId is not unique in OrderDetails. Many databases don't support this, so in these you need the additional columns:
SELECT c.CustomerID, c.CompanyName, c.ContactName, SUM(od.Quantity)
FROM OrderDetails od INNER JOIN
Customers c
ON od.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CompanyName, c.ContactName
ORDER BY c.CustomerID ASC;
Even so, it is much, much better to take all columns from the same table. That would allow the SQL optimizer to use indexes on Customers.

SQL - How can you use WHERE instead of LEFT/RIGHT JOIN?

since I am a bit rusty, I was practicing SQL on this link and was trying to replace the LEFT JOIN completly with WHERE. How can i do this so it does the same thing as the premade function in the website?
What I tried so far is:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID OR Customers.CustomerID != Orders.CustomerID
Order by Customers.CustomerName;
Thanks in advance for your help.
You are trying to replace
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
with
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers, Orders
WHERE ???
this is doomed to failure. Consider Customers has two rows and Orders has zero. The outer join will return two rows.
The cross join (FROM Customers, Orders) will return zero rows.
In standard SQL a WHERE clause can only reduce the rows from that - not increase them so there is nothing you can put for ??? that will give your desired results.
Before ANSI-92 joins were introduced some systems used to have proprietary operators for this, such as *= in SQL Server but this was removed from the product.
This may work for you.
SELECT
c.CustomerName,
o.OrderID
FROM Customers c
LEFT JOIN Orders o
on c.CustomerID = o.CustomerID
Order by c.CustomerName;
If you are trying to replace this:
SELECT c.CustomerName, o.OrderID
FROM Customers c LEFT JOIN
Orders o
ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerName;
Then you can use UNION ALL:
SELECT c.CustomerName, o.OrderID
FROM Customers c JOIN
Orders o
ON c.CustomerID = o.CustomerID
UNION ALL
SELECT c.CustomerName, o.OrderID
FROM Customers c
WHERE NOT EXIST (SELECT 1 FROM Orders o WHERE c.CustomerID = o.CustomerID)
ORDER BY CustomerName
However, the LEFT JOIN is really a much better way to go.

SQL query with subquery and without subquery comparison

I'd like someone who can explain me the logic difference between these two queries. Maybe you can explain performance difference also. (DB is Microsoft Northwind).
-- Join
select distinct c.CustomerID, c.CompanyName, c.ContactName from orders as o inner join customers as c
on o.CustomerID = c.CustomerID
-- SubQuery
select customerid, companyname, contactname, country from customers
where customerid in (select distinct customerid from orders)
Thanks in advance.
The first generates an intermediate result set with all orders for all customers. It then reduces them using select distinct.
The second just selects the customers without having to reduce them later. It should be much more efficient. However, the select distinct is not needed in the subquery (it is done automatically with in).
I would write the logic as:
select c.customerid, c.companyname, c.contactname, c.country
from customers c
where exists (select 1
from orders o
where o.customerid = c.customerid
);
This can readily make use of an index on orders(customerid).