Query returns cartesian product when not expected - sql

Task: Select all orders having products belonging to ‘Sea Food’ category.
Result: OrderNo, OrderDate, Product Name
I wrote this query, but it returns a Cartesian product.
select o.orderid, o.orderdate as "Order Date", p.productname , ct.categoryname from orders o,
order_details od , products p , customers c ,categories ct
where
od.orderid = o.orderid and p.productid = od.productid and ct.categoryid = p.categoryid
and ct.categoryname = 'Seafood';
Question: What is wrong with my query ?

You're doing a CROSS JOIN on the customers table because you forgot to specify a join condition for it. This is why you should use explicit JOIN syntax rather than the old syntax with commas in the FROM clause and join conditions in the WHERE clause.
After translating your query into explicit syntax, you will see that there is no WHERE condition involving customers table:
select
o.orderid,
o.orderdate as "Order Date",
p.productname,
ct.categoryname
from
orders o
inner join order_details od on od.orderid = o.orderid
inner join products p on p.productid = od.productid
inner join categories ct on ct.categoryid = p.categoryid
cross join customers c -- either you don't need this table, or you need to specify conditions
where
ct.categoryname = 'Seafood'
Basically the reason you got it was that your where clause omitted join condition involving customers table, so you were left with:
from (...), customers -- cross join when joining condition not applied in where clause
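To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module with a tiny made-up data set (the rows and the trimmed-down table definitions are assumptions for illustration, not the real schema). It compares the row count with and without the unconstrained customers table:

```python
import sqlite3

# Hypothetical, trimmed-down tables: just enough columns to show the effect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY);
    CREATE TABLE order_details (orderid INTEGER, productid INTEGER);
    CREATE TABLE customers (customerid INTEGER PRIMARY KEY);
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO order_details VALUES (1, 10), (2, 11);
    INSERT INTO customers VALUES (100), (101), (102);
""")

# Comma join with no condition on customers:
# every matched order row is paired with every customer row.
with_customers = conn.execute("""
    SELECT COUNT(*)
    FROM orders o, order_details od, customers c
    WHERE od.orderid = o.orderid
""").fetchone()[0]

# Same query without the customers table.
without_customers = conn.execute("""
    SELECT COUNT(*)
    FROM orders o
    INNER JOIN order_details od ON od.orderid = o.orderid
""").fetchone()[0]

print(with_customers, without_customers)  # 6 2
```

With 2 matching order rows and 3 customers, the unconstrained table multiplies the result to 2 x 3 = 6 rows.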

Related

Find Top 5 Customers for Beverages based on their total purchase value SQL

Here is the link to the Data Set.
https://www.w3schools.com/sql/trysql.asp?filename=trysql_asc
I have been trying to solve this but couldn't find a way to get the total purchase value while grouping with the customer table
I would recommend using a Common Table Expression (CTE) as, in my experience, it helps with scalability/maintenance down the road and easily enables you to see what the data is under the hood if you wanted to simply run the CTE itself.
I join the Customer to the Order to get the OrderID
I join the Order to OrderDetails to get the ProductID and Order Quantity
I join the OrderDetails to Products to get the Price
I join the Categories to filter for just Beverages
All this is wrapped as a CTE (similar to a subquery), on top of which I can now aggregate at the Customer level and sequence by Order Value in a descending fashion.
with beverage_orders_cte as(
SELECT c.CustomerName, o.OrderID
, od.OrderDetailID, od.ProductID, od.Quantity
, p.ProductName, p.Price
, od.Quantity * p.Price as OrderVal
,cat.CategoryName FROM Customers c
inner join Orders o
on c.CustomerID = o.CustomerID
inner join OrderDetails od
on o.OrderID = od.OrderID
inner join Products p
on od.ProductID = p.ProductID
inner join Categories cat
on p.CategoryID = cat.CategoryID and cat.CategoryID = 1
)
select CustomerName, SUM(OrderVal) as Revenue
From beverage_orders_cte
Group by CustomerName
Order by Revenue desc
Limit 5
Hope this helps, good luck.
Something like that?
SELECT c.customerid,
Sum(p.price)
FROM customers AS c
INNER JOIN orders AS o
ON o.customerid = c.customerid
INNER JOIN orderdetails AS od
ON od.orderid = o.orderid
INNER JOIN products AS p
ON p.productid = od.productid
GROUP BY c.customerid
ORDER BY Sum(p.price) DESC
LIMIT 5
Just following on from your quantity comment...
SELECT c.customerid,
Sum(p.price),
Sum(p.price * od.quantity)
FROM customers AS c
INNER JOIN orders AS o
ON o.customerid = c.customerid
INNER JOIN orderdetails AS od
ON od.orderid = o.orderid
INNER JOIN products AS p
ON p.productid = od.productid
GROUP BY c.customerid
ORDER BY Sum(p.price * od.quantity) DESC
LIMIT 5
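Whether quantity is included matters for the ranking, not just for the displayed number. A small sketch with Python's sqlite3 and invented rows (two customers, two products; all values are assumptions for illustration) shows the order flipping depending on the expression used in ORDER BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER, customerid INTEGER);
    CREATE TABLE orderdetails (orderid INTEGER, productid INTEGER, quantity INTEGER);
    CREATE TABLE products (productid INTEGER, price REAL);
    -- customer 1 buys 5 units of a cheap product,
    -- customer 2 buys 1 unit of an expensive one
    INSERT INTO orders VALUES (1, 1), (2, 2);
    INSERT INTO products VALUES (10, 10.0), (11, 30.0);
    INSERT INTO orderdetails VALUES (1, 10, 5), (2, 11, 1);
""")

# Same query twice; only the ORDER BY expression changes.
base = """
    SELECT o.customerid
    FROM orders AS o
    INNER JOIN orderdetails AS od ON od.orderid = o.orderid
    INNER JOIN products AS p ON p.productid = od.productid
    GROUP BY o.customerid
    ORDER BY {expr} DESC
"""
by_price = [row[0] for row in conn.execute(base.format(expr="SUM(p.price)"))]
by_value = [row[0] for row in conn.execute(base.format(expr="SUM(p.price * od.quantity)"))]

print(by_price)  # [2, 1]
print(by_value)  # [1, 2]
```

Customer 1 spends 50 in total but is ranked second when only unit prices are summed.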
I think this is the best optimized code.
Please try with this.
SELECT CustomerID, SUM(Quantity * Price) AS Total
FROM Orders, OrderDetails, Products
Where Orders.OrderID = OrderDetails.OrderID AND Products.ProductID = OrderDetails.ProductID
Group by CustomerID
ORDER BY Total DESC
LIMIT 5

How to Organize Multiple Joins SQL

In SQL, how should I be joining tables together when I do multiple joins in one query. Should I join on only one table - in this case the Customers table or is it okay to do what I have done (joining on different tables as new keys are needed)?
SELECT O.OrderID, O.OrderDate, C.City, C.Country, C.PostalCode, C.ContactName, O.CustomerID, O.ShipperID, D.ProductID, COUNT(D.ProductID) ProductCount, S.SupplierID
FROM Customers C
INNER JOIN Orders O
ON O.CustomerID = C.CustomerID
INNER JOIN OrderDetails D
ON O.OrderID = D.OrderID
INNER JOIN Products P
ON D.ProductID = P.ProductID
INNER JOIN Suppliers S
ON S.SupplierID = P.SupplierID
WHERE 1 = 1
GROUP BY O.OrderID
ORDER BY OrderDate DESC
I am using W3Schools SQL TryIt editor to test this, not sure what DB engine it is!
Thanks!
Of course you can join on multiple tables in a query. That is a big part of the power of SQL.
In your particular case, you don't need the join to the Suppliers table, because the column is already in Products.
Also, you need to be careful about your SELECT and GROUP BY clauses. In general, you should put all non-aggregated columns in the GROUP BY:
SELECT O.OrderID, O.OrderDate, C.City, C.Country, C.PostalCode, C.ContactName,
O.CustomerID, O.ShipperID, D.ProductID,
COUNT(D.ProductID) as ProductCount,
P.SupplierID
FROM Customers C INNER JOIN
Orders O
ON O.CustomerID = C.CustomerID INNER JOIN
OrderDetails D
ON O.OrderID = D.OrderID INNER JOIN
Products P
ON D.ProductID = P.ProductID
GROUP BY O.OrderID, O.OrderDate, C.City, C.Country, C.PostalCode, C.ContactName,
O.CustomerID, O.ShipperID, D.ProductID, P.SupplierId
ORDER BY OrderDate DESC;
The WHERE 1=1 is also unnecessary.
I wonder if this query really does what you want. However, you don't state what you actually want the query to do, so I'm merely speculating.
The way you have done it is fine. Don't forget that with each additional inner join, your record set may shrink, because rows without a matching key in the joined table are dropped.
You could also just use the JOIN syntax.
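The shrinking effect of chained inner joins can be sketched with Python's sqlite3 and a few invented rows (the data is an assumption for illustration): three customers, of whom two have orders and only one has order details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Customers VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Orders VALUES (10, 1), (11, 2);      -- customer 3 has no order
    INSERT INTO OrderDetails VALUES (10, 100);       -- order 11 has no details
""")

def count_customers(from_clause):
    """Count distinct customers that survive the given FROM clause."""
    sql = "SELECT COUNT(DISTINCT C.CustomerID) FROM " + from_clause
    return conn.execute(sql).fetchone()[0]

all_customers = count_customers("Customers C")
after_orders = count_customers(
    "Customers C INNER JOIN Orders O ON O.CustomerID = C.CustomerID")
after_details = count_customers(
    "Customers C INNER JOIN Orders O ON O.CustomerID = C.CustomerID "
    "INNER JOIN OrderDetails D ON D.OrderID = O.OrderID")

print(all_customers, after_orders, after_details)  # 3 2 1
```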

Left Join in Oracle SQL

I was going through an example of LEFT JOIN on w3schools.com.
http://www.w3schools.com/sql/sql_join_left.asp
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
The above query returns all customers with no orders (with a NULL OrderID) plus all customers having orders, with their order IDs.
How should I modify this query so that it returns all customers with no orders plus all customers having orders with the order date '1996-09-18'?
Thanks in advance.
If you want customers with no orders and those with a specific order date, then you want a WHERE clause:
SELECT c.CustomerName, o.OrderID
FROM Customers c LEFT JOIN
Orders o
ON c.CustomerID = o.CustomerID
WHERE (o.CustomerID is NULL) OR (o.OrderDate = DATE '1996-09-18')
ORDER BY c.CustomerName;
If you wanted all customers with their order on that date (if they have one), then you would move the condition to the ON clause:
SELECT c.CustomerName, o.OrderID
FROM Customers c LEFT JOIN
Orders o
ON c.CustomerID = o.CustomerID AND o.OrderDate = DATE '1996-09-18'
ORDER BY c.CustomerName;
Note the difference: the first filters the customers. The second only affects what order gets shown (and NULL will often be shown).
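The difference between the two placements can be checked with a small sketch in Python's sqlite3. The three customers and their orders are invented for illustration, and because SQLite has no `DATE '...'` literal, dates are compared as plain ISO strings here (an assumption of this sketch, not Oracle syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has no orders, Bob ordered on the target date, Cy on another date
    INSERT INTO Orders VALUES (1, 2, '1996-09-18'), (2, 3, '1996-10-01');
""")

# Condition in WHERE: filters customers after the join.
where_rows = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c
    LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE o.CustomerID IS NULL OR o.OrderDate = '1996-09-18'
    ORDER BY c.CustomerName
""").fetchall()

# Condition in ON: keeps every customer, only limits which orders match.
on_rows = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c
    LEFT JOIN Orders o
        ON c.CustomerID = o.CustomerID AND o.OrderDate = '1996-09-18'
    ORDER BY c.CustomerName
""").fetchall()

print(where_rows)  # [('Ann', None), ('Bob', 1)]
print(on_rows)     # [('Ann', None), ('Bob', 1), ('Cy', None)]
```

Cy, who only has an order on a different date, disappears from the WHERE version but stays (with a NULL OrderID) in the ON version.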

SQL query with w3schools db

I should have asked multiple questions in my other post. Thanks to all who have helped, I am now stuck on another one..
Using the w3schools db, List SupplierID, SupplierName and ItemSupplied (count of number of items supplied by a supplier), sort the list first by number of items supplied (descending) and then by supplier name (ascending)
SELECT supplierid,
suppliername,
p.productname,
Count(s.supplierid) AS itemssupplied
FROM [Suppliers] AS s
INNER JOIN [Products] AS p
ON p.supplierid = s.supplierid
GROUP BY p.productid,
p.productname
ORDER BY Count (p.productid, p.productname) DESC
order BY s.suppliername
It's giving me an error, then again I am ordering by multiple ones. I think there's something I am not quite understanding here.
My other question is
List customers for each category and the total of orders placed by that customer in a given category. In the query show three columns: CategoryName, CustomerName, and TotalOrders (which is price * quantity for orders for a given customer in a given category). Sort this data in descending order by TotalOrders.
SELECT cg.CategoryName,
c.CustomerName,
Sum(p.Price * od.Quantity) AS TotalOrders
FROM [products] AS p
INNER JOIN [orderdetails] AS od
ON od.ProductID = p.ProductID
INNER JOIN [orders] AS o
ON o.OrderID = od.OrderID
INNER JOIN [customers] AS c
ON c.customerID = o.CustomerID
INNER JOIN [categories] AS cg
ON cg.CategoryID = p.CategoryID
GROUP BY c.CustomerName
ORDER BY TotalOrders DESC
Can someone please check if my query is correct? Thank you once again!
Question 1
You are really close but you only need to state ORDER BY once (also make sure to include all shown fields in your GROUP BY unless you are aggregating them):
SELECT s.SupplierID, s.SupplierName, p.ProductName, COUNT(s.SupplierID) AS ItemsSupplied
FROM [Suppliers] AS s
INNER JOIN [Products] AS p ON p.SupplierID = s.SupplierID
GROUP BY p.ProductID, p.ProductName, s.SupplierID, s.SupplierName -- Added SupplierID, SupplierName
ORDER BY COUNT(s.SupplierID) DESC, s.SupplierName
Notice that you just place multiple sorts on the same line with a comma separating them.
Question 2
You're almost there but you need to group by any field that is not being aggregated. So in order not to get a parsing error, I added the cg.CategoryName to the GROUP BY line.
SELECT cg.CategoryName, c.CustomerName, Sum(p.Price*od.Quantity) AS TotalOrders
FROM [Products] AS p
INNER JOIN [OrderDetails] AS od ON od.ProductID = p.ProductID
INNER JOIN [Orders] AS o ON o.OrderID = od.OrderID
INNER JOIN [Customers] AS c ON c.customerID = o.CustomerID
INNER JOIN [Categories] AS cg ON cg.CategoryID = p.CategoryID
GROUP BY c.CustomerName, cg.CategoryName --Added CategoryName
ORDER BY TotalOrders DESC
You have several problems with the first query:
You're grouping by ProductID and ProductName even though you want the number of items supplied by a supplier, which means that you want to group by SupplierID and SupplierName.
You're supplying too many arguments to the COUNT function, which takes a single column name or *.
You've included a ProductName column in your results, which is not called for.
You need to ORDER BY both the number of products supplied and the SupplierName.
With those points in mind:
SELECT
s.SupplierID,
s.SupplierName,
COUNT(p.ProductID) AS ItemsSupplied
FROM
[Suppliers] AS s
INNER JOIN [Products] AS p ON p.SupplierID = s.SupplierID
GROUP BY
s.SupplierID, s.SupplierName
ORDER BY
ItemsSupplied DESC,
s.SupplierName ASC
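As a quick check of the two-key sort, here is a sketch in Python's sqlite3 with three invented suppliers (names and product counts are assumptions for illustration). Two suppliers tie on the count, so the second sort key decides their order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER, SupplierName TEXT);
    CREATE TABLE Products (ProductID INTEGER, SupplierID INTEGER);
    INSERT INTO Suppliers VALUES (1, 'Beta'), (2, 'Alpha'), (3, 'Gamma');
    -- Beta and Alpha supply two products each, Gamma supplies one
    INSERT INTO Products VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
""")

rows = conn.execute("""
    SELECT s.SupplierID, s.SupplierName, COUNT(p.ProductID) AS ItemsSupplied
    FROM Suppliers AS s
    INNER JOIN Products AS p ON p.SupplierID = s.SupplierID
    GROUP BY s.SupplierID, s.SupplierName
    ORDER BY ItemsSupplied DESC, s.SupplierName ASC
""").fetchall()

print(rows)  # [(2, 'Alpha', 2), (1, 'Beta', 2), (3, 'Gamma', 1)]
```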
Your second query is quite close, you're just missing one point, which is that you're looking for total of order placed by that customer in a given category. This means that in addition to grouping by c.CustomerName, you need to group by cg.CategoryID:
SELECT
cg.CategoryName,
c.CustomerName,
SUM(p.Price*od.Quantity) AS TotalOrders
FROM
[Products] AS p
INNER JOIN [OrderDetails] AS od ON od.ProductID = p.ProductID
INNER JOIN [Orders] AS o ON o.OrderID = od.OrderID
INNER JOIN [Customers] AS c ON c.customerID = o.CustomerID
INNER JOIN [Categories] AS cg ON cg.CategoryID = p.CategoryID
GROUP BY
c.CustomerName, cg.CategoryID, cg.CategoryName
ORDER BY
TotalOrders DESC
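The effect of the extra grouping column can be sketched in Python's sqlite3 with one invented customer who buys from two categories (all rows are assumptions for illustration). Note that SQLite tolerates a non-aggregated, non-grouped column in the SELECT list, which is why the under-grouped query runs here at all; stricter engines reject it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT);
    CREATE TABLE Products (ProductID INTEGER, CategoryID INTEGER, Price REAL);
    CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
    INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Seafood');
    INSERT INTO Products VALUES (1, 1, 10.0), (2, 2, 5.0);
    INSERT INTO Customers VALUES (1, 'Ann');
    INSERT INTO Orders VALUES (100, 1);
    INSERT INTO OrderDetails VALUES (100, 1, 2), (100, 2, 1);
""")

query = """
    SELECT cg.CategoryName, c.CustomerName,
           SUM(p.Price * od.Quantity) AS TotalOrders
    FROM Products AS p
    INNER JOIN OrderDetails AS od ON od.ProductID = p.ProductID
    INNER JOIN Orders AS o ON o.OrderID = od.OrderID
    INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    INNER JOIN Categories AS cg ON cg.CategoryID = p.CategoryID
    GROUP BY {group_by}
    ORDER BY TotalOrders DESC
"""

# Grouping by customer only merges both categories into a single total.
per_customer = conn.execute(query.format(group_by="c.CustomerName")).fetchall()

# Grouping by customer and category yields one total per pair.
per_pair = conn.execute(
    query.format(group_by="c.CustomerName, cg.CategoryID, cg.CategoryName")
).fetchall()

print(len(per_customer), per_customer[0][2])  # 1 25.0
print(per_pair)  # [('Beverages', 'Ann', 20.0), ('Seafood', 'Ann', 5.0)]
```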
The first one has two order by clauses
ORDER BY COUNT (p.productID, p.ProductName) DESC
and
ORDER BY s.SupplierName
Also, some databases will complain when the ORDER BY columns of a GROUP BY query are not included in the selected columns.

Conditionally joining tables

I want to create conditional join to a table in my T-SQL query. Tables used in this example are from Northwind database (with only one additional table ProductCategories)
Table Products and table Categories have many-to-many relationship, hence table ProductCategories comes into picture.
I need the sum of Quantity column on table OrderDetails for each of the products falling under certain category. So I have a query like one below
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
Join ProductCategories pc On pc.ProductID = p.ProductID
And pc.CategoryID = @CategoryID
Group By p.ProductName
@CategoryID is an optional parameter. If it is not supplied, the join to table ProductCategories is not required,
and the query should look like the one below
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
Group By p.ProductName
I want to achieve this without repeating the whole query with If conditions (as below)
If @CategoryID Is Null
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
Group By p.ProductName
Else
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
Join ProductCategories pc On pc.ProductID = p.ProductID
And pc.CategoryID = @CategoryID
Group By p.ProductName
This is a simplified version of the query, which has many other tables and conditions like ProductCategories. It would require multiple If conditions and repetition of the query. I have also tried dynamically generating the query. It works, but the query is not readable at all.
Any solutions?
Thank you.
Try this. If you use a properly parameterized query, there should be no performance penalty, and there may even be a gain:
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
WHERE @CategoryID IS NULL OR EXISTS (SELECT * FROM ProductCategories WHERE CategoryID = @CategoryID AND ProductID = p.ProductID)
Group By p.ProductName
Also, in your original query, if there can be multiple rows in ProductCategories for one row in OrderDetails, then od.Quantity is counted more than once in your SUM. Is that the intended behavior?
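Both points can be sketched with Python's sqlite3 and invented rows: one product that belongs to two categories (an assumption for illustration). SQLite uses a named placeholder `:CategoryID` where T-SQL would use a variable, and passing `None` stands in for "parameter not supplied".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE OrderDetails (ProductID INTEGER, Quantity INTEGER);
    CREATE TABLE ProductCategories (ProductID INTEGER, CategoryID INTEGER);
    INSERT INTO Products VALUES (1, 'Chai');
    INSERT INTO OrderDetails VALUES (1, 10);
    INSERT INTO ProductCategories VALUES (1, 1), (1, 2);  -- two categories
""")

# Optional filter: the EXISTS test only matters when a category is given.
exists_sql = """
    SELECT p.ProductName, SUM(od.Quantity) AS Qty
    FROM Products p
    JOIN OrderDetails od ON od.ProductID = p.ProductID
    WHERE :CategoryID IS NULL
       OR EXISTS (SELECT 1 FROM ProductCategories pc
                  WHERE pc.CategoryID = :CategoryID
                    AND pc.ProductID = p.ProductID)
    GROUP BY p.ProductName
"""
no_filter = conn.execute(exists_sql, {"CategoryID": None}).fetchall()
category_1 = conn.execute(exists_sql, {"CategoryID": 1}).fetchall()
category_3 = conn.execute(exists_sql, {"CategoryID": 3}).fetchall()

# Plain join without a category filter: one row per category, so the
# quantity is summed once per matching ProductCategories row.
joined = conn.execute("""
    SELECT p.ProductName, SUM(od.Quantity) AS Qty
    FROM Products p
    JOIN OrderDetails od ON od.ProductID = p.ProductID
    JOIN ProductCategories pc ON pc.ProductID = p.ProductID
    GROUP BY p.ProductName
""").fetchall()

print(no_filter, category_1, category_3)  # [('Chai', 10)] [('Chai', 10)] []
print(joined)                             # [('Chai', 20)]
```

The EXISTS form never multiplies rows, so the quantity stays at 10, while the unfiltered join reports 20.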
I believe you can left join the tables vs the implicit inner join you are doing.
In an inner join it matches the key on the source table with each instance of the key on the destination table.
Each instance of a match on the destination table generates a set of rows displaying the match types.
With an outer join it will display the source row EVEN IF there is no matching row in the other table. If there is it will do essentially the same as an inner join and you'll get a row back for each instance of match. So you're getting the data you need when it's available and not getting what data is unavailable.
Take this as an example
select * from Products
Left join ProductAndCategory on ProductAndCategory.ProductID = Products.ProductID
left join Categories on Categories.CategoryID = ProductAndCategory.CategoryID
Here I have a simple Products table with a ProductID and a ProductName, a ProductAndCategory table with a ProductID and a CategoryID, and a Categories table with a CategoryID and a CategoryName.
It will show the rows that have categories with the joined categories and the rows that don't have categories will just show the one row with null for the values that don't exist.
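A runnable sketch of that example, using Python's sqlite3 and two invented products (one categorized, one not; the rows are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE ProductAndCategory (ProductID INTEGER, CategoryID INTEGER);
    CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Tofu');
    INSERT INTO ProductAndCategory VALUES (1, 1);   -- Tofu has no category
    INSERT INTO Categories VALUES (1, 'Beverages');
""")

rows = conn.execute("""
    SELECT Products.ProductName, Categories.CategoryName
    FROM Products
    LEFT JOIN ProductAndCategory
        ON ProductAndCategory.ProductID = Products.ProductID
    LEFT JOIN Categories
        ON Categories.CategoryID = ProductAndCategory.CategoryID
    ORDER BY Products.ProductID
""").fetchall()

print(rows)  # [('Chai', 'Beverages'), ('Tofu', None)]
```

The uncategorized product still appears, with NULL (None in Python) in the category column.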
I'm not very familiar with T-SQL, so I don't know if this will cut it, but in MySQL you could do something like
Select p.ProductName, Sum(od.Quantity) As Qty
From Products p
Join OrderDetails od On od.ProductID = p.ProductID
Join ProductCategories pc On pc.ProductID = p.ProductID
And pc.CategoryID = if(@CategoryID is null, pc.CategoryID, @CategoryID)
Group By p.ProductName
Yes, an if() still remains, but it is just one query, if that's what you're looking for.
At the same time, you say you were able to dynamically generate a single query. If so, is the readability that much of an issue? If your generated query performs better than what I suggested above, I'd go with that. It's generated; you won't be manually tweaking or reading the result.