Cartesian products and selects in the from clause - sql

I need to use a select in the from clause but I keep getting a Cartesian product.
select
customer.customer_name
,orders.order_date
,order_line.num_ordered
,order_line.quoted_price
,part.descript
,amt_billed
from (select order_line.num_ordered*part.price as amt_billed
from order_line
join part
on order_line.part_num = part.part_num
) billed
,customer
join orders
on customer.customer_num = orders.customer_num
join order_line
on orders.order_num = order_line.order_num
join part
on order_line.part_num = part.part_num;
Don't bother looking at the rest too hard. I already know that if I remove both the subselect in the from clause and amt_billed in the select clause I don't get the Cartesian product. What am I doing wrong that's causing the Cartesian product?

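The reason for the Cartesian product is that you never join the subselect to the orders or part table. The comma after billed is an implicit cross join, so every amt_billed row is paired with every row of the customer/orders/order_line/part join.
You don't need that subselect at all; compute amt_billed directly in the select list: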
SELECT customer.customer_name,
orders.order_date,
order_line.num_ordered,
order_line.quoted_price,
part.descript,
order_line.num_ordered * part.price AS amt_billed
FROM customer
JOIN orders
ON customer.customer_num = orders.customer_num
JOIN order_line
ON orders.order_num = order_line.order_num
JOIN part
ON order_line.part_num = part.part_num;
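To see the row multiplication concretely, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the question; the sample rows are made up. The derived table has 3 rows and the four-table join has 3 rows, so the original shape returns 3 x 3 = 9 rows instead of 3.

```python
import sqlite3

# In-memory database with made-up rows; only the table and column
# names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_num INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, order_date TEXT, customer_num INTEGER);
CREATE TABLE part (part_num INTEGER PRIMARY KEY, descript TEXT, price REAL);
CREATE TABLE order_line (order_num INTEGER, part_num INTEGER,
                         num_ordered INTEGER, quoted_price REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, '2024-01-05', 1), (11, '2024-01-06', 2);
INSERT INTO part VALUES (100, 'Widget', 2.50), (101, 'Gadget', 4.00);
INSERT INTO order_line VALUES (10, 100, 4, 2.40), (10, 101, 1, 3.90), (11, 100, 2, 2.50);
""")

# Original shape: the comma after "billed" cross joins the 3-row derived
# table with the 3-row join, so every pairing is returned.
cartesian = conn.execute("""
SELECT customer.customer_name, billed.amt_billed
FROM (SELECT order_line.num_ordered * part.price AS amt_billed
      FROM order_line
      JOIN part ON order_line.part_num = part.part_num) billed,
     customer
JOIN orders ON customer.customer_num = orders.customer_num
JOIN order_line ON orders.order_num = order_line.order_num
JOIN part ON order_line.part_num = part.part_num
""").fetchall()

# Fixed shape: compute the amount per order line in the select list.
fixed = conn.execute("""
SELECT customer.customer_name, order_line.num_ordered * part.price AS amt_billed
FROM customer
JOIN orders ON customer.customer_num = orders.customer_num
JOIN order_line ON orders.order_num = order_line.order_num
JOIN part ON order_line.part_num = part.part_num
""").fetchall()

print(len(cartesian), len(fixed))
```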

Related

Row is listed twice in simple query

One row appears twice and I can't figure out why. I tried using GROUP BY but couldn't get that to work either. I'm using a LEFT OUTER JOIN to list suppliers who have a discontinued product in the Northwind database:
Select *
From Suppliers s
Left Outer Join products p
On s.SupplierID = p.SupplierID
Where p.Discontinued = 1
You have a supplier with two discontinued products, and the join produces one row for each row in Products that matches both the join condition and the Discontinued = 1 predicate. You want something like this:
SELECT * FROM Suppliers s
WHERE EXISTS (SELECT 1
FROM Products p
WHERE p.SupplierID = s.SupplierID
AND p.Discontinued = 1)
One of those tables has two matching rows for that supplier. You can find out which by querying each table on its own for that SupplierID.
As for your query: putting p.Discontinued in the WHERE clause effectively turns the left join into an inner join, because rows where no product matched have p.Discontinued = NULL and fail the test. So either change it to an inner join, or move that condition into the ON clause (which keeps every supplier, with NULL product columns for suppliers that have no discontinued products).
To get suppliers with discontinued products, you can do this:
Select * from Suppliers where SupplierID in (
select SupplierID from Products
where Discontinued = 1)
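The effect of where the Discontinued filter sits can be checked with a tiny sketch using Python's sqlite3 module and an in-memory database. The supplier and product rows are invented; only the Northwind column names are real. Supplier 1 still appears twice in both versions, because it has two matching product rows: moving the filter only changes whether suppliers without a discontinued product survive.

```python
import sqlite3

# Invented rows: supplier 1 has two discontinued products,
# supplier 2 has none, supplier 3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER,
                       Discontinued INTEGER);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products VALUES (1, 1, 1), (2, 1, 1), (3, 2, 0), (4, 3, 1);
""")

# Filter in WHERE: rows with no matching product have p.Discontinued NULL,
# fail the predicate, and disappear, so this behaves like an INNER JOIN.
where_rows = conn.execute("""
SELECT s.SupplierID, p.ProductID
FROM Suppliers s
LEFT OUTER JOIN Products p ON s.SupplierID = p.SupplierID
WHERE p.Discontinued = 1
ORDER BY s.SupplierID, p.ProductID
""").fetchall()

# Filter in ON: every supplier is kept; supplier 2 gets NULL product columns.
on_rows = conn.execute("""
SELECT s.SupplierID, p.ProductID
FROM Suppliers s
LEFT OUTER JOIN Products p
  ON s.SupplierID = p.SupplierID AND p.Discontinued = 1
ORDER BY s.SupplierID, p.ProductID
""").fetchall()

print(where_rows)
print(on_rows)
```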
There is clearly a supplier that has multiple discontinued products.
If you want suppliers with at least one discontinued product, then use EXISTS:
select s.*
from suppliers s
where exists (select 1
from products p
where p.supplierid = s.supplierid and
p.Discontinued = 1
);
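A quick comparison of the plain join, IN and EXISTS on made-up data, sketched with Python's sqlite3 module: a supplier with two discontinued products comes back twice from the join but only once from IN and EXISTS, because those only test whether a matching product exists.

```python
import sqlite3

# Invented Northwind-style rows: supplier 1 has two discontinued
# products, supplier 2 has none, supplier 3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER,
                       Discontinued INTEGER);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products VALUES (1, 1, 1), (2, 1, 1), (3, 2, 0), (4, 3, 1);
""")

def supplier_ids(sql):
    """Run a query whose first column is SupplierID and return that column."""
    return [row[0] for row in conn.execute(sql)]

# A join emits one row per matching product, so supplier 1 appears twice.
join_ids = supplier_ids("""
SELECT s.SupplierID FROM Suppliers s
JOIN Products p ON p.SupplierID = s.SupplierID
WHERE p.Discontinued = 1
ORDER BY s.SupplierID""")

# IN and EXISTS only test for a match, so each supplier appears once.
in_ids = supplier_ids("""
SELECT SupplierID FROM Suppliers
WHERE SupplierID IN (SELECT SupplierID FROM Products WHERE Discontinued = 1)
ORDER BY SupplierID""")

exists_ids = supplier_ids("""
SELECT s.SupplierID FROM Suppliers s
WHERE EXISTS (SELECT 1 FROM Products p
              WHERE p.SupplierID = s.SupplierID AND p.Discontinued = 1)
ORDER BY s.SupplierID""")

print(join_ids, in_ids, exists_ids)
```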
If you want the list of suppliers with the number of discontinued products, use join:
select s.*, p.num_discontinued
from suppliers s join
(select p.supplierid, count(*) as num_discontinued
from products p
where p.Discontinued = 1
group by p.supplierid
) p
on p.SupplierID = s.SupplierID;
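The aggregate-then-join pattern can be sketched the same way with Python's sqlite3 module on invented rows. Because the derived table has one row per supplier, the join can no longer duplicate suppliers; suppliers with no discontinued products drop out because it is an inner join.

```python
import sqlite3

# Invented rows: supplier 1 has two discontinued products,
# supplier 2 has none, supplier 3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER,
                       Discontinued INTEGER);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products VALUES (1, 1, 1), (2, 1, 1), (3, 2, 0), (4, 3, 1);
""")

# Aggregate first (one row per supplier), then join to Suppliers.
rows = conn.execute("""
SELECT s.SupplierID, s.CompanyName, p.num_discontinued
FROM Suppliers s
JOIN (SELECT p.SupplierID, COUNT(*) AS num_discontinued
      FROM Products p
      WHERE p.Discontinued = 1
      GROUP BY p.SupplierID) p
  ON p.SupplierID = s.SupplierID
ORDER BY s.SupplierID
""").fetchall()

print(rows)
```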
If you want the list of discontinued products with their suppliers, then use your query but change the left join to an inner join. An outer join is not necessary.

SQL Oracle (using AND clause)

When I use the code below, I get the data of customers who ordered 'planned' or 'obsolete' products, but I want the data of customers who ordered both types. Changing OR to AND does not work... please help.
SELECT DISTINCT customers.CUST_EMAIL
,ORDERS.ORDER_ID
,PRODUCT_INFORMATION.PRODUCT_NAME
,PRODUCT_INFORMATION.PRODUCT_STATUS
FROM PRODUCT_INFORMATION
INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned'
OR PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete'
ORDER BY CUSTOMERS.CUST_EMAIL;
I'm guessing that you want the following. If you want a correct answer rather than guesses, please provide a good representative set of sample data and your expected result based on that sample data.
The first part of the query returns customers that ordered planned products; the second part returns customers that ordered obsolete products. The INTERSECT operator returns only the rows that appear in both, i.e. customers that ordered both planned and obsolete products.
You don't need an explicit DISTINCT any more, because INTERSECT removes duplicates anyway.
I've removed PRODUCT_INFORMATION.PRODUCT_STATUS from the list of returned columns, because with it the result set would always be empty: every row from the first part has 'planned' and every row from the second part has 'obsolete', so no row could appear in both.
I removed ORDERS.ORDER_ID and PRODUCT_INFORMATION.PRODUCT_NAME from the result as well. I don't know exactly what the correct query should be, but it is likely that the INTERSECT should be done just on CUSTOMER_ID; once you have the list of IDs, you can join other tables to it to fetch related details if needed.
The performance of this method is beyond the scope of the question.
SELECT
CUSTOMERS.CUSTOMER_ID
,customers.CUST_EMAIL
FROM
PRODUCT_INFORMATION
INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned'
INTERSECT
SELECT
CUSTOMERS.CUSTOMER_ID
,customers.CUST_EMAIL
FROM
PRODUCT_INFORMATION
INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete'
ORDER BY CUST_EMAIL
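The following sketch uses Python's sqlite3 module with an in-memory database and made-up rows (table and column names from the question). It shows why changing OR to AND returns nothing, since AND is tested against each joined row and no single product row has two statuses, and what the INTERSECT version returns instead. SQLite supports INTERSECT, so the answer's query runs essentially unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_INFORMATION (PRODUCT_ID INTEGER PRIMARY KEY,
                                  PRODUCT_NAME TEXT, PRODUCT_STATUS TEXT);
CREATE TABLE ORDER_ITEMS (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, CUST_EMAIL TEXT);
INSERT INTO PRODUCT_INFORMATION VALUES
  (1, 'Gizmo', 'planned'), (2, 'Relic', 'obsolete'), (3, 'Thing', 'orderable');
INSERT INTO CUSTOMERS VALUES
  (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
-- Customer 1: planned and obsolete in separate orders; customer 2: planned
-- only; customer 3: both statuses in a single order.
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO ORDER_ITEMS VALUES (10, 1), (11, 2), (12, 1), (13, 1), (13, 2);
""")

JOINS = """
FROM PRODUCT_INFORMATION
INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
"""

# AND is evaluated per joined row, and no row has both statuses.
and_rows = conn.execute(
    "SELECT DISTINCT CUSTOMERS.CUST_EMAIL" + JOINS +
    "WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned' "
    "AND PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete'"
).fetchall()

# INTERSECT keeps only customers present in both result sets.
both_rows = conn.execute(
    "SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUST_EMAIL" + JOINS +
    "WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned' "
    "INTERSECT "
    "SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUST_EMAIL" + JOINS +
    "WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete' "
    "ORDER BY CUST_EMAIL"
).fetchall()

print(and_rows)
print(both_rows)
```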
Without the scripts for your tables it's difficult to build a test case and a working query; I'll try this approach:
select customer_id from (
SELECT customers.CUSTOMER_ID
,sum(decode(PRODUCT_INFORMATION.PRODUCT_STATUS, 'obsolete', 1, 0)) obsolete
,sum(decode(PRODUCT_INFORMATION.PRODUCT_STATUS, 'planned', 1, 0)) planned
FROM PRODUCT_INFORMATION
INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
WHERE PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned'
OR PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete'
group by customers.CUSTOMER_ID)
where obsolete >= 1 and planned >= 1
This query should return every customer ID that has ordered items with both product statuses (the two statuses may be in different orders). If you want orders that contain products with both statuses, replace customers.CUSTOMER_ID with orders.ORDER_ID in both select lists and in the GROUP BY. If you provide a script with sample data, we can give a better answer.
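The counting approach can be sketched the same way with Python's sqlite3 module. SQLite has no DECODE, so this sketch swaps in the standard CASE expression, which computes the same 1/0 flags here. The data is invented: customers 1 and 3 ordered both statuses, customer 2 only planned, and only order 13 contains both statuses, so the order-level variant returns just that order. The helper name ids_with_both is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_INFORMATION (PRODUCT_ID INTEGER PRIMARY KEY,
                                  PRODUCT_NAME TEXT, PRODUCT_STATUS TEXT);
CREATE TABLE ORDER_ITEMS (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, CUST_EMAIL TEXT);
INSERT INTO PRODUCT_INFORMATION VALUES
  (1, 'Gizmo', 'planned'), (2, 'Relic', 'obsolete'), (3, 'Thing', 'orderable');
INSERT INTO CUSTOMERS VALUES
  (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO ORDER_ITEMS VALUES (10, 1), (11, 2), (12, 1), (13, 1), (13, 2);
""")

def ids_with_both(level):
    """Return the values of `level` (a grouping column) that have at least
    one 'planned' and at least one 'obsolete' product."""
    sql = f"""
    SELECT id FROM (
      SELECT {level} AS id,
             SUM(CASE WHEN PRODUCT_INFORMATION.PRODUCT_STATUS = 'obsolete'
                      THEN 1 ELSE 0 END) AS obsolete,
             SUM(CASE WHEN PRODUCT_INFORMATION.PRODUCT_STATUS = 'planned'
                      THEN 1 ELSE 0 END) AS planned
      FROM PRODUCT_INFORMATION
      INNER JOIN ORDER_ITEMS ON PRODUCT_INFORMATION.PRODUCT_ID = ORDER_ITEMS.PRODUCT_ID
      INNER JOIN ORDERS ON ORDER_ITEMS.ORDER_ID = ORDERS.ORDER_ID
      INNER JOIN CUSTOMERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
      WHERE PRODUCT_INFORMATION.PRODUCT_STATUS IN ('planned', 'obsolete')
      GROUP BY {level}
    )
    WHERE obsolete >= 1 AND planned >= 1
    ORDER BY id"""
    return [row[0] for row in conn.execute(sql)]

customer_ids = ids_with_both("CUSTOMERS.CUSTOMER_ID")
order_ids = ids_with_both("ORDERS.ORDER_ID")
print(customer_ids, order_ids)
```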

Trying to understand NULL operator in Query

I'm looking to see a breakdown of the total dollar business that each vendor has done (indirectly via the distributor) with each customer, where I'm trying not to use the Inner Join Syntax. I basically don't understand the difference between the two outputs produced by the two queries shown below:
Query1
select customers.cust_id, vendors.vend_id, sum(OrderItems.item_price*OrderItems.quantity) as total_business from
(((Vendors left outer join products
on vendors.vend_id = products.prod_id)
left outer join OrderItems
on products.prod_id = OrderItems.prod_id)
left outer join Orders
on OrderItems.order_num = Orders.order_num)
left outer join Customers
on Orders.cust_id = Customers.cust_id
group by Customers.cust_id, vendors.vend_id
order by total_business
I get the following output:
Query2
select customers.cust_id, Vendors.vend_id, sum(quantity*item_price) as total_business from
(((Vendors left outer join Products
on Products.vend_id = Vendors.vend_id)
left outer join OrderItems --No inner joins allowed
on OrderItems.prod_id = Products.prod_id)
left outer join Orders
on Orders.order_num = OrderItems.order_num)
left outer join Customers
on Customers.cust_id = Orders.cust_id
where Customers.cust_id is not null -- THE ONLY DIFFERENCE BETWEEN QUERY1 AND QUERY2
group by Customers.cust_id, Vendors.vend_id
order by total_business
I don't understand why the 1st output has only NULL cust_ids when the 2nd output has some non-NULL cust_ids. Why doesn't the 1st output include these non-NULL cust_ids?
Thank You
Query One is joining Vendors and Products incorrectly:
on vendors.vend_id = products.prod_id -- Vend_ID = Prod_ID
Query Two is joining Vendors and Products correctly:
on Products.vend_id = Vendors.vend_id -- Vend_ID = Vend_ID
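Here is a minimal sketch, using Python's sqlite3 module with invented IDs and rows (table and column names follow the question), of why Query 1 produces only NULL cust_ids: vendor IDs never equal product IDs, so the first LEFT JOIN finds no product for any vendor, and every column further down the chain is NULL. The run helper is hypothetical, and the results are ordered by vendor rather than by total_business so the comparison is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendors (vend_id TEXT PRIMARY KEY);
CREATE TABLE Products (prod_id TEXT PRIMARY KEY, vend_id TEXT);
CREATE TABLE Customers (cust_id TEXT PRIMARY KEY);
CREATE TABLE Orders (order_num INTEGER PRIMARY KEY, cust_id TEXT);
CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT,
                         quantity INTEGER, item_price REAL);
INSERT INTO Vendors VALUES ('V1'), ('V2');
INSERT INTO Products VALUES ('P1', 'V1'), ('P2', 'V1');
INSERT INTO Customers VALUES ('C1'), ('C2');
INSERT INTO Orders VALUES (1, 'C1'), (2, 'C2');
INSERT INTO OrderItems VALUES (1, 'P1', 2, 5.0), (2, 'P2', 1, 10.0);
""")

def run(vendor_product_condition):
    # Same LEFT JOIN chain as the question; only the first ON clause varies.
    sql = f"""
    SELECT Customers.cust_id, Vendors.vend_id,
           SUM(OrderItems.item_price * OrderItems.quantity) AS total_business
    FROM Vendors
    LEFT OUTER JOIN Products ON {vendor_product_condition}
    LEFT OUTER JOIN OrderItems ON Products.prod_id = OrderItems.prod_id
    LEFT OUTER JOIN Orders ON OrderItems.order_num = Orders.order_num
    LEFT OUTER JOIN Customers ON Orders.cust_id = Customers.cust_id
    GROUP BY Customers.cust_id, Vendors.vend_id
    ORDER BY Vendors.vend_id, Customers.cust_id"""
    return conn.execute(sql).fetchall()

wrong = run("Vendors.vend_id = Products.prod_id")  # Query 1's condition
right = run("Products.vend_id = Vendors.vend_id")  # Query 2's condition
print(wrong)
print(right)
```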
Once that is fixed, you'll get the same IDs in both queries. Then I suggest you read Dan's answer to understand why adding a WHERE filter on a column from the last table in the chain cancels out what you were trying to achieve by eliminating INNER JOIN from the query.
When you left join to a table and then filter on that table in the WHERE clause, the join effectively becomes an inner join. The workaround is to apply the filter as part of the join condition.
In your second query, all you have to do is change the word "where" to "and", which moves the filter into the ON clause of the Customers join.

Understand Sub-Queries

I was initially looking to see a breakdown of the total dollar business that each vendor has done (indirectly via the distributor) with each customer, where I'm trying not to use the Inner Join Syntax and used the Query below for this purpose:
select customers.cust_id, Vendors.vend_id, sum(quantity*item_price) as total_business from
(((Vendors left outer join Products
on Products.vend_id = Vendors.vend_id)
left outer join OrderItems --No inner joins allowed
on OrderItems.prod_id = Products.prod_id)
left outer join Orders
on Orders.order_num = OrderItems.order_num)
left outer join Customers
on Customers.cust_id = Orders.cust_id
where Customers.cust_id is not null
group by Customers.cust_id, Vendors.vend_id
order by total_business
Now I am trying to see the output for all vendor-customer combinations, including combinations where no business was transacted, and I'm trying to write this as a single SQL query. My teacher provided this solution, but I honestly cannot understand the logic at all, as I've never come across subqueries.
select
customers.cust_id,
Vendors.vend_id,
sum(OrderItems.quantity*orderitems.item_price)
from
(
customers
inner join
Vendors on 1 = 1
)
left outer join --synthetic product using joins
(
orders
join
orderitems on orders.order_num = OrderItems.order_num
join
Products on orderitems.prod_id = products.prod_id
) on
Vendors.vend_id = Products.vend_id and
customers.cust_id = orders.cust_id
group by customers.cust_id, vendors.vend_id
order by customers.cust_id
Thanks a lot
I would write this query as:
select c.cust_id, v.vend_id, coalesce(cv.total, 0)
from Customers c cross join
Vendors v left outer join
(select o.cust_id, p.vend_id, sum(oi.quantity * oi.item_price) as total
from orders o join
orderitems oi
on o.order_num = oi.order_num join
Products p
on oi.prod_id = p.prod_id
group by o.cust_id, p.vend_id
) cv
on cv.vend_id = v.vend_id and
cv.cust_id = c.cust_id
order by c.cust_id;
The structure is quite similar. Both versions start by creating a cross product of all customers and vendors, which produces every row of the output result set. Next, the aggregation needs to be calculated at this customer/vendor level. In the query above, this is done explicitly in a subquery that aggregates the values to the customer/vendor level. (In the original query, it is done in the outer query.)
The final step is joining these together.
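The cross-join-then-aggregate idea can be checked on a small invented dataset with Python's sqlite3 module (table and column names follow the question; the rows are made up). Every customer/vendor pair appears exactly once, and COALESCE turns pairs with no business into 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendors (vend_id TEXT PRIMARY KEY);
CREATE TABLE Products (prod_id TEXT PRIMARY KEY, vend_id TEXT);
CREATE TABLE Customers (cust_id TEXT PRIMARY KEY);
CREATE TABLE Orders (order_num INTEGER PRIMARY KEY, cust_id TEXT);
CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT,
                         quantity INTEGER, item_price REAL);
INSERT INTO Vendors VALUES ('V1'), ('V2');
INSERT INTO Products VALUES ('P1', 'V1'), ('P2', 'V2');
INSERT INTO Customers VALUES ('C1'), ('C2'), ('C3');
INSERT INTO Orders VALUES (1, 'C1'), (2, 'C2');
INSERT INTO OrderItems VALUES (1, 'P1', 2, 5.0), (1, 'P2', 1, 3.0), (2, 'P1', 1, 5.0);
""")

rows = conn.execute("""
SELECT c.cust_id, v.vend_id, COALESCE(cv.total, 0) AS total
FROM Customers c
CROSS JOIN Vendors v                 -- every customer/vendor pair
LEFT OUTER JOIN (
    -- business already aggregated to the customer/vendor level
    SELECT o.cust_id, p.vend_id, SUM(oi.quantity * oi.item_price) AS total
    FROM Orders o
    JOIN OrderItems oi ON o.order_num = oi.order_num
    JOIN Products p ON oi.prod_id = p.prod_id
    GROUP BY o.cust_id, p.vend_id
) cv ON cv.vend_id = v.vend_id AND cv.cust_id = c.cust_id
ORDER BY c.cust_id, v.vend_id
""").fetchall()

for row in rows:
    print(row)
```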
Your teacher should be encouraging you to use table aliases, particularly table abbreviations. You should also be encouraged to use the proper join type: although you can express a cross join as an inner join with on 1 = 1, CROSS JOIN is part of the SQL language, not a hack.
Similarly, parentheses in the from clause can make the logic harder to follow. Explicit subqueries are more easily read.

SQL Query Showing 4x Records

The following statement runs, but it shows each record 4 times. I know the relationship is wrong but have no idea how to fix it. Apologies if this is simple and I've missed it.
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers, Plants, Orders, Staff
INNER JOIN Orders AS t2 ON t2.Order_ID = Staff.Order_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
You are generating a Cartesian product between the tables, since you have not provided join conditions between any of them:
SELECT c.First_Name, c.Last_Name,
p.Common_Name, p.Flower_Colour, p.Flowering_Season,
s.First_Name, s.Last_Name
FROM Customers c
INNER JOIN Orders o
on c.customerId = o.customer_id
INNER JOIN Plants p
on o.plant_id = p.plant_id
INNER JOIN Staff s
ON o.Order_ID = s.Order_ID
WHERE o.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Note: I am guessing on column names for the joins
Here is a great visual explanation of joins that can help in learning the correct syntax
In the FROM clause you are doing a cross join, combining every customer with every plant with every order with every staff member.
You should name only one table in the FROM clause and then connect the others with INNER JOINs so that you only get related records.
I don't know exactly what your database looks like, but something like this:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name,
Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
INNER JOIN Orders ON Orders.Customer_ID = Customers.Customer_ID
INNER JOIN Staff ON Staff.Staff_ID = Orders.Staff_ID
INNER JOIN Plants ON Plants.Plants_ID = Orders.Plants_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
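To see where the repeated rows come from, here is a sketch with Python's sqlite3 module using the column names guessed in the answer above (Customer_ID, Staff_ID and Plants_ID on Orders are assumptions, as are all the rows). With two rows in each of the four tables, the comma-separated FROM returns 2 x 2 x 2 x 2 = 16 rows for just two orders, while the joined version returns one row per order. The exact multiplier in the question depends on its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT);
CREATE TABLE Staff (Staff_ID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT);
CREATE TABLE Plants (Plants_ID INTEGER PRIMARY KEY, Common_Name TEXT,
                     Flower_Colour TEXT, Flowering_Season TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER,
                     Staff_ID INTEGER, Plants_ID INTEGER, Order_Date TEXT);
INSERT INTO Customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO Staff VALUES (1, 'Sam', 'Fox'), (2, 'Kim', 'Day');
INSERT INTO Plants VALUES (1, 'Rose', 'Red', 'Summer'), (2, 'Tulip', 'Yellow', 'Spring');
INSERT INTO Orders VALUES (1, 1, 1, 1, '2012/05/01'), (2, 2, 2, 2, '2012/06/01');
""")

# No join conditions: every row of each table is paired with every other.
cartesian_count = conn.execute("""
SELECT COUNT(*)
FROM Customers, Plants, Orders, Staff
WHERE Orders.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
""").fetchone()[0]

# Each table is linked through Orders, so there is one row per order.
joined = conn.execute("""
SELECT Customers.First_Name, Plants.Common_Name, Staff.First_Name
FROM Customers
INNER JOIN Orders ON Orders.Customer_ID = Customers.Customer_ID
INNER JOIN Staff ON Staff.Staff_ID = Orders.Staff_ID
INNER JOIN Plants ON Plants.Plants_ID = Orders.Plants_ID
WHERE Orders.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
ORDER BY Orders.Order_ID
""").fetchall()

print(cartesian_count, joined)
```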
This is because you are selecting from four tables without any join conditions between them, and you are also joining Orders twice. As a result, you get a Cartesian product.
Here is how you should fix it: rewrite the implicit (comma-separated) join using ANSI JOIN syntax, and provide proper join conditions:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
JOIN Orders ON ...
JOIN Plants ON ...
JOIN Staff ON ...
WHERE Orders.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Replace ... with proper join conditions; this should make the results look as expected.