SQL Query on SQL Server 2008 - sql

I'm trying to get only customers that ordered both a "Gas Range" and a "Washer". I'm getting Customers who ordered a "Gas Range" and not a "Washer" and customers with both. I need the customer that meets both conditions. I'm close but a little stuck. Below is the query that I have so far. Please let me know if you need more information.
My Tables - CUSTOMER(CUST_NUM, CUST_NAME), ORDER_LINE(ORDER_NUM, PART_NUM), ORDERS(ORDER_NUM, CUST_NUM), PART(PART_NUM, PART_DESCRIPTION)
SELECT C.CUST_NAME AS [Customer(s) that ordered a Gas Range and Washer]
FROM CUSTOMER C
INNER JOIN ORDERS O
ON C.CUST_NUM = O.CUST_NUM
INNER JOIN ORDER_LINE OL
ON O.ORDER_NUM = OL.ORDER_NUM
INNER JOIN PART P
ON OL.PART_NUM = P.PART_NUM
WHERE P.PART_DESCRIPTION IN ('GasRange','Washer')
GROUP BY C.CUST_NAME

Try the following:
SELECT C.CUST_NAME AS [Customer(s) that ordered a Gas Range and Washer]
FROM CUSTOMER C
INNER JOIN ORDERS O
ON C.CUST_NUM = O.CUST_NUM
INNER JOIN ORDER_LINE OL
ON O.ORDER_NUM = OL.ORDER_NUM
INNER JOIN PART P
ON OL.PART_NUM = P.PART_NUM
INNER JOIN ORDERS O2
ON C.CUST_NUM = O2.CUST_NUM
INNER JOIN ORDER_LINE OL2
ON O2.ORDER_NUM = OL2.ORDER_NUM
INNER JOIN PART P2
ON OL2.PART_NUM = P2.PART_NUM
WHERE P.PART_DESCRIPTION IN ('GasRange') and P2.PART_DESCRIPTION IN ('Washer')
GROUP BY C.CUST_NAME
EDIT: I had a further look, and I'm afraid this can't be simplified other than by using WITH and more complicated aggregate functions, which I would say is more complicated than this. I don't think the other suggested solution, the one using WITH, will work, because it joins incorrectly. You definitely can't remove ORDER_LINE, and you have to use ORDERS twice as well: if it were used once, the query would only cover the case where the customer ordered both items within a single order, which is not what you wanted ;)

Try this...
So basically you need to join your PART table again to ensure the same customer ordered both a "Gas Range" and a "Washer". An IN, as in your current query, functions as an OR, which is why you are not getting the expected result.
WITH CTE AS (
SELECT DISTINCT O.CUST_NUM FROM ORDERS O
INNER JOIN ORDER_LINE OL
ON O.ORDER_NUM = OL.ORDER_NUM
INNER JOIN PART P
ON OL.PART_NUM = P.PART_NUM
INNER JOIN PART P2
ON OL.PART_NUM = P2.PART_NUM
WHERE P.PART_DESCRIPTION IN ('GasRange')
AND P2.PART_DESCRIPTION IN ('Washer')
)
SELECT C.CUST_NAME AS [Customer(s) that ordered a Gas Range and Washer]
FROM CUSTOMER C
INNER JOIN CTE O
ON C.CUST_NUM = O.CUST_NUM
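
For comparison, one commonly used alternative for this kind of "ordered both" question keeps the original IN filter and adds a HAVING clause that counts the distinct matching descriptions per customer. The sketch below is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module, and the customer names, part numbers and order rows are made up. The HAVING COUNT(DISTINCT ...) pattern itself is standard SQL, but it has not been tested here on SQL Server 2008.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUST_NUM INTEGER PRIMARY KEY, CUST_NAME TEXT);
CREATE TABLE ORDERS (ORDER_NUM INTEGER PRIMARY KEY, CUST_NUM INTEGER);
CREATE TABLE PART (PART_NUM TEXT PRIMARY KEY, PART_DESCRIPTION TEXT);
CREATE TABLE ORDER_LINE (ORDER_NUM INTEGER, PART_NUM TEXT);

INSERT INTO CUSTOMER VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO PART VALUES ('P1', 'GasRange'), ('P2', 'Washer'), ('P3', 'Dryer');
-- Alice orders the two items in separate orders, Bob only a gas range,
-- Carol a washer and a dryer.
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO ORDER_LINE VALUES
    (10, 'P1'), (11, 'P2'), (12, 'P1'), (13, 'P2'), (13, 'P3');
""")

# Shared part of both queries: the joins and IN filter from the question.
JOINS = """
FROM CUSTOMER C
INNER JOIN ORDERS O ON C.CUST_NUM = O.CUST_NUM
INNER JOIN ORDER_LINE OL ON O.ORDER_NUM = OL.ORDER_NUM
INNER JOIN PART P ON OL.PART_NUM = P.PART_NUM
WHERE P.PART_DESCRIPTION IN ('GasRange', 'Washer')
GROUP BY C.CUST_NUM, C.CUST_NAME
"""

# IN alone behaves like OR: anyone with at least one of the two parts qualifies.
either = [r[0] for r in conn.execute(
    "SELECT C.CUST_NAME" + JOINS + "ORDER BY C.CUST_NAME")]

# Requiring two distinct matching descriptions keeps only customers with both.
both = [r[0] for r in conn.execute(
    "SELECT C.CUST_NAME" + JOINS +
    "HAVING COUNT(DISTINCT P.PART_DESCRIPTION) = 2 ORDER BY C.CUST_NAME")]

print(either)  # ['Alice', 'Bob', 'Carol']
print(both)    # ['Alice']
```

Grouping by CUST_NUM as well as CUST_NAME is a deliberate choice in this sketch, so that two different customers who happen to share a name are not merged.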

How to inner join with multiple tables for 3NF Normalization in SQL

I am trying to normalize this database to third normal form (3NF). However, when I execute the query, the result is empty. Below are my tables and diagram.
Here is the relationship that we want to bring into 3NF:
Torder->(Customer-Food-Carrier-Waiter)-(Ternary Relationship)-(Customer_id,Food_id,Carrier_id,Waiter_id)
Here is my query
SELECT
Customers.ID AS CustomerID,
Food.ID AS FoodID,
Carrier.ID AS CarrierID,
Waiter.ID AS WaiterID,
tOrder.ID AS TORDERID
FROM
tOrder
INNER JOIN
Customers ON Customers.ID = tOrder.Customer_id
INNER JOIN
Food ON Food.ID = tOrder.Food_id
INNER JOIN
Carrier ON Carrier.ID = tOrder.Carrier_id
INNER JOIN
Waiter ON Waiter.ID = tOrder.Waiter_id
ORDER BY tOrder.ID;
Clearly, you have an issue where some of the tables are empty or the ids do not match. You can use LEFT JOIN to keep all orders and see what is happening:
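SELECT o.ID AS TORDERID, c.ID AS CustomerID, f.ID AS FoodID,
ca.ID AS CarrierID, w.ID AS WaiterID
FROM tOrder o LEFT JOIN
Customers c
ON c.ID = o.Customer_id LEFT JOIN
Food f
ON f.ID = o.Food_id LEFT JOIN
Carrier ca
ON ca.ID = o.Carrier_id LEFT JOIN
Waiter w
ON w.ID = o.Waiter_id
ORDER BY o.ID;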
If there are no matches, then the orders are still in the results, with NULL for the values in the table with no matches.
Note that this also introduces table aliases, so the query is easier to write and to read.
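
To make the diagnostic concrete, here is a small sketch using Python's built-in sqlite3 module. The rows are invented for illustration (an empty Waiter table and one order pointing at a Food id that does not exist); only the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY);
CREATE TABLE Food (ID INTEGER PRIMARY KEY);
CREATE TABLE Carrier (ID INTEGER PRIMARY KEY);
CREATE TABLE Waiter (ID INTEGER PRIMARY KEY);
CREATE TABLE tOrder (ID INTEGER PRIMARY KEY, Customer_id INTEGER,
                     Food_id INTEGER, Carrier_id INTEGER, Waiter_id INTEGER);

-- Hypothetical data: Waiter is left empty, and order 2 references Food 9,
-- which does not exist.
INSERT INTO Customers VALUES (1);
INSERT INTO Food VALUES (1);
INSERT INTO Carrier VALUES (1);
INSERT INTO tOrder VALUES (1, 1, 1, 1, 1), (2, 1, 9, 1, 1);
""")

QUERY = """
SELECT o.ID, c.ID, f.ID, ca.ID, w.ID
FROM tOrder o
{j} JOIN Customers c ON c.ID = o.Customer_id
{j} JOIN Food f ON f.ID = o.Food_id
{j} JOIN Carrier ca ON ca.ID = o.Carrier_id
{j} JOIN Waiter w ON w.ID = o.Waiter_id
ORDER BY o.ID
"""

# INNER JOIN drops every order, because no order has a matching Waiter row.
inner_rows = conn.execute(QUERY.format(j="INNER")).fetchall()

# LEFT JOIN keeps every order; None (NULL) marks the table with no match.
left_rows = conn.execute(QUERY.format(j="LEFT")).fetchall()

print(inner_rows)  # []
print(left_rows)   # [(1, 1, 1, 1, None), (2, 1, None, 1, None)]
```

In the LEFT JOIN output, a column that is NULL on every row points at an empty or mismatched table, while a NULL on a single row points at one bad id.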

Query that will result in a list of customer names and the average order value made by each customer

I have a database and I need to create a query that will retrieve a list of customer names and the average order value made by each customer.
I tried:
SELECT c.customer_name, AVG(COUNT(o.order_id)*f.price) AS 'avgorderprice'
FROM Customers c
LEFT JOIN Orders o ON o.customer_id = c.customer_id
INNER JOIN F_in_Or fio ON o.order_id = fio.order_id
INNER JOIN Films f ON fio.film_id = f.film_id;
This is my database structure:
But I get an error, what can be wrong?
You are trying to use one aggregate function, COUNT, inside another, AVG, which is what raises the error:
misuse of aggregate function COUNT()
Additionally, you are not grouping your results, so you would receive one overall average rather than a per-customer average.
Aggregate functions work on a group; by default that group is all rows, unless a GROUP BY clause places the rows into separate groups.
Instead of AVG(COUNT(o.order_id)*f.price) you could use sum(f.price) / count(*). However, there is already an average aggregate function, avg, so avg(f.price) is simpler.
Additionally, as you want an average per customer, you want a GROUP BY c.customer_name clause.
Thus you could use :-
SELECT
c.customer_name,
avg(f.price) AS 'avgorderprice' --<<<<< CHANGED
FROM Customers c
LEFT JOIN Orders o ON o.customer_id = c.customer_id
INNER JOIN F_in_Or fio ON o.order_id = fio.order_id
INNER JOIN Films f ON fio.film_id = f.film_id
GROUP BY c.customer_name --<<<<< ADDED
;
This would result in something like :-
I think this should work
SELECT c.customer_name, AVG(f.price) AS 'avgorderprice'
FROM Customers c
LEFT JOIN Orders o ON o.customer_id = c.customer_id
LEFT JOIN F_in_Or fio ON o.order_id = fio.order_id
LEFT JOIN Films f ON fio.film_id = f.film_id
GROUP BY c.customer_name;
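
Note that avg(f.price) averages over individual films, not over orders. If "average order value" is meant as the mean of each order's total (an assumption about the question's intent), one possible approach is to total each order in a subquery first and then average those totals per customer. The sketch below uses Python's sqlite3 module with made-up rows; the column names follow the queries above, and the alias avg_order_value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE Films (film_id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE F_in_Or (order_id INTEGER, film_id INTEGER);

-- Hypothetical data: Ann has two orders worth 30 each, Ben one worth 10.
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Films VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 2);
INSERT INTO F_in_Or VALUES (100, 1), (100, 2), (101, 3), (102, 1);
""")

# Average price per film line, as in the answers above.
per_film = conn.execute("""
SELECT c.customer_name, AVG(f.price)
FROM Customers c
JOIN Orders o ON o.customer_id = c.customer_id
JOIN F_in_Or fio ON fio.order_id = o.order_id
JOIN Films f ON f.film_id = fio.film_id
GROUP BY c.customer_id, c.customer_name
ORDER BY c.customer_name
""").fetchall()

# Total each order first, then average the order totals per customer.
per_order = conn.execute("""
SELECT c.customer_name, AVG(t.order_total) AS avg_order_value
FROM Customers c
JOIN (
    SELECT o.order_id, o.customer_id, SUM(f.price) AS order_total
    FROM Orders o
    JOIN F_in_Or fio ON fio.order_id = o.order_id
    JOIN Films f ON f.film_id = fio.film_id
    GROUP BY o.order_id, o.customer_id
) t ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY c.customer_name
""").fetchall()

print(per_film)   # [('Ann', 20.0), ('Ben', 10.0)]
print(per_order)  # [('Ann', 30.0), ('Ben', 10.0)]
```

The two results differ for Ann because her first order contains two films: the per-film average is (10 + 20 + 30) / 3, while the per-order average is (30 + 30) / 2.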

LEFT JOIN vs Stacked Left Join

I wanted to ask what the difference is between these two queries:
SELECT
Customers.CustomerID, Customers.CustomerName, Orders.OrderID,
OrderDetails.Quantity, Products.ProductName
FROM
Customers
LEFT JOIN
(Orders
LEFT JOIN
(OrderDetails
LEFT JOIN
Products ON Products.ProductID = OrderDetails.ProductID
) ON OrderDetails.OrderID = Orders.OrderID
) ON Customers.CustomerID = Orders.CustomerID
GROUP BY
Customers.CustomerName;
Vs
SELECT
Customers.CustomerID, Customers.CustomerName, Orders.OrderID,
OrderDetails.Quantity, Products.ProductName
FROM
Customers
LEFT JOIN
Orders ON Orders.CustomerID = Customers.CustomerID
LEFT JOIN
OrderDetails ON OrderDetails.OrderID = Orders.OrderID
LEFT JOIN
Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY
Customers.CustomerName;
Tested here
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_join
From what I can see, one selects the first of multiple entries and the other selects the last, but is that all?
From my point of view, the non-nested LEFT JOIN is much easier to read and understand. Is there any downside to using it?
Your problem is the incorrect use of GROUP BY. The only unaggregated columns in the SELECT should be in the GROUP BY.
The rest of this answer addresses the point about joins.
Your second query is interpreted as:
FROM (((Customers c LEFT JOIN
Orders o
ON o.CustomerID = c.CustomerID
) LEFT JOIN
OrderDetails od
ON od.OrderID = o.OrderID
) LEFT JOIN
Products p
ON p.ProductID = od.ProductID
)
The parentheses can affect the interpretation. But what effect? Essentially, you have:
(((c left join o) left join od) left join p)
versus
(c left join (o left join (od left join p)))
Both keep all records in c, regardless of whether there are matches in the other tables. In this case, the two versions do the same thing, but for a particular reason: the ON conditions are strictly chained (that is, c to o, o to od, od to p). If p were joined to o instead of od, then subtle differences could occur.
What are the subtle differences? Two things can differ:
Whether columns from a particular table are NULL or have values.
Whether rows get duplicated, due to multiple matches between two tables.
In practice, I don't find parentheses particularly useful. If I care about JOIN order, I use an explicit subquery or CTE.

Can this oracle query be tuned anymore?

I'm trying to minimize the cost of this query as much as possible without creating any indexes.
This is the original query with a cost of 599:
SELECT DISTINCT OL.PRODUCT_ID
FROM ORDERS O JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
JOIN CUSTOMERS C ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
WHERE C.CUSTOMER_ID = 474871
OR UPPER(C.FIRST_NAME) = 'EDGAR';
This is what I've done so far. The cost is now 344:
SELECT OL.PRODUCT_ID
FROM ORDER_LINES OL
WHERE EXISTS
(SELECT ORDER_ID
FROM ORDERS
WHERE CUSTOMER_ID = 474871
AND ORDER_ID = OL.ORDER_ID)
OR EXISTS
(SELECT ORDER_ID
FROM ORDERS
WHERE CUSTOMER_ID IN
(SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE UPPER(FIRST_NAME) = 'EDGAR')
AND ORDER_ID = OL.ORDER_ID);
Is there anything that stands out that I may try to drive down the cost more?
Here is a screen shot of the explain plan:
Screenshot of ERD:
Looking at the cost is misleading and may lead you to make changes that aren't actually beneficial. To quote Tom Kyte, "You cannot compare the cost of 2 queries with each other. ... they might as well be random numbers."
The best way to check query performance is to actually time the query, ideally with realistic data. You should also be wary of premature optimization. Your first query is pretty straight-forward; I would stick with it unless a performance issue manifests.
SELECT OL.PRODUCT_ID
FROM ORDER_LINES OL
WHERE OL.ORDER_ID IN
(SELECT ORDER_ID FROM ORDERS
WHERE CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUSTOMERS
WHERE CUSTOMER_ID = 474871 OR UPPER(FIRST_NAME) = 'EDGAR')
);
I suppose there is an index on OL.ORDER_ID (now you have FULL SCAN of ORDER_LINES)
I'm curious whether the engine is smart enough to apply the WHERE clause before the joins. If it applies it after the joins, the intermediate result it has to scan is larger than it needs to be. What happens if you move the limiting criteria into the join, so that it has to be evaluated before the join occurs? (I fully expect this to cost 599 or less; I just don't know whether it will be less.)
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
AND (C.customer_ID = 474871 OR upper(C.First_name) = 'EDGAR')
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
GROUP BY OL.Product_ID
I wonder if the OR is causing the problem.
If you run it without the OR, how much is the cost reduced?
And then what happens if you UNION the two sets instead of using an OR?
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
WHERE C.customer_ID = 474871
UNION
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O
ON (C.CUSTOMER_ID = O.CUSTOMER_ID)
INNER JOIN ORDER_LINES OL ON (O.ORDER_ID = OL.ORDER_ID)
INNER JOIN PRODUCTS P ON (OL.PRODUCT_ID = P.PRODUCT_ID)
WHERE upper(C.First_name) = 'EDGAR';
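
As a quick sanity check that the UNION rewrite returns the same products as the original OR query, here is a sketch using Python's sqlite3 module. The rows are hypothetical, the PRODUCTS join is left out because it does not change which product ids come back in this toy data, and SQLite's planner is not Oracle's, so this says nothing about cost or timing on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, FIRST_NAME TEXT);
CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
CREATE TABLE ORDER_LINES (ORDER_ID INTEGER, PRODUCT_ID INTEGER);

-- Hypothetical data: one customer matches by id, one by first name.
INSERT INTO CUSTOMERS VALUES (474871, 'Maria'), (2, 'Edgar'), (3, 'Zoe');
INSERT INTO ORDERS VALUES (1, 474871), (2, 2), (3, 3);
INSERT INTO ORDER_LINES VALUES (1, 100), (1, 101), (2, 101), (2, 102), (3, 103);
""")

BASE = """
SELECT OL.PRODUCT_ID
FROM CUSTOMERS C
INNER JOIN ORDERS O ON C.CUSTOMER_ID = O.CUSTOMER_ID
INNER JOIN ORDER_LINES OL ON O.ORDER_ID = OL.ORDER_ID
"""

# Original shape: a single query with OR, de-duplicated with DISTINCT.
or_rows = conn.execute(
    BASE.replace("SELECT", "SELECT DISTINCT", 1) +
    "WHERE C.CUSTOMER_ID = 474871 OR UPPER(C.FIRST_NAME) = 'EDGAR'"
).fetchall()

# Rewritten shape: one branch per condition; UNION removes duplicates.
union_rows = conn.execute(
    BASE + "WHERE C.CUSTOMER_ID = 474871 UNION " +
    BASE + "WHERE UPPER(C.FIRST_NAME) = 'EDGAR'"
).fetchall()

or_ids = sorted(r[0] for r in or_rows)
union_ids = sorted(r[0] for r in union_rows)
print(or_ids)     # [100, 101, 102]
print(union_ids)  # [100, 101, 102]
```

Product 101 appears in both branches, and UNION (unlike UNION ALL) removes the duplicate, which is why no DISTINCT or GROUP BY is needed on the rewritten query.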

SQL Query Showing 4x Records

The following statement works properly but shows each record 4 times. I know the relationship is wrong, but I have no idea how to fix it. Apologies if this is simple and I've missed it.
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers, Plants, Orders, Staff
INNER JOIN Orders AS t2 ON t2.Order_ID = Staff.Order_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
You are generating a Cartesian product between the tables since you have not provided join syntax between any of the tables:
SELECT c.First_Name, c.Last_Name,
p.Common_Name, p.Flower_Colour, p.Flowering_Season,
s.First_Name, s.Last_Name
FROM Customers c
INNER JOIN Orders o
on c.customerId = o.customer_id
INNER JOIN Plants p
on o.plant_id = p.plant_id
INNER JOIN Staff s
ON o.Order_ID = s.Order_ID
WHERE o.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Note: I am guessing on column names for the joins
Here is a great visual explanation of joins that can help in learning the correct syntax
In the FROM... clause you are doing a cross join - combining every customer with every plant with every order with every staff.
You should only mention one table in the FROM clause and then connect the other ones with INNER JOINS to only get related records.
I don't know exactly what your database looks like, but something like this:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name,
Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
INNER JOIN Orders ON Orders.Customer_ID = Customers.Customer_ID
INNER JOIN Staff ON Staff.Staff_ID = Orders.Staff_ID
INNER JOIN Plants ON Plants.Plants_ID = Orders.Plants_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
This is because you are selecting from four tables without any joins between them, and also because you are joining Orders twice. As a result, a Cartesian product is made.
Here is how you should fix it: rewrite the theta join using the ANSI syntax, and provide proper join conditions:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
JOIN Plants ON ...
JOIN Orders ON ...
JOIN Staff ON ...
INNER JOIN Orders AS t2 ON t2.Order_ID = Staff.Order_ID
WHERE Orders.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Replace ... with proper join conditions; this should make the results look as expected.
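
The row multiplication is easy to reproduce on a toy schema. The sketch below uses Python's sqlite3 module; the key columns (Customer_ID, Plant_ID, Order_ID) and all rows are assumptions made for illustration, since the real schema is not shown in the question, and the Staff table is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, First_Name TEXT);
CREATE TABLE Plants (Plant_ID INTEGER PRIMARY KEY, Common_Name TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY,
                     Customer_ID INTEGER, Plant_ID INTEGER);

-- Hypothetical data: two customers, two plants, one order each.
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Plants VALUES (1, 'Rose'), (2, 'Tulip');
INSERT INTO Orders VALUES (1, 1, 1), (2, 2, 2);
""")

# Comma-separated tables with no join conditions: every customer is paired
# with every plant and every order (2 * 2 * 2 rows).
cross = conn.execute("""
SELECT Customers.First_Name, Plants.Common_Name
FROM Customers, Plants, Orders
""").fetchall()

# Explicit joins with conditions: one row per order.
joined = conn.execute("""
SELECT Customers.First_Name, Plants.Common_Name
FROM Orders
INNER JOIN Customers ON Customers.Customer_ID = Orders.Customer_ID
INNER JOIN Plants ON Plants.Plant_ID = Orders.Plant_ID
ORDER BY Orders.Order_ID
""").fetchall()

print(len(cross))  # 8
print(joined)      # [('Ann', 'Rose'), ('Ben', 'Tulip')]
```

With two orders the unjoined version returns four times as many rows as the joined one, the same kind of multiplication described in the question.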