Understand Sub-Queries - sql

I was initially looking for a breakdown of the total dollar business that each vendor has done (indirectly, via the distributor) with each customer. I am trying not to use the inner join syntax, and used the query below for this purpose:
select customers.cust_id, Vendors.vend_id, sum(quantity*item_price) as total_business from
(((Vendors left outer join Products
on Products.vend_id = Vendors.vend_id)
left outer join OrderItems --No inner joins allowed
on OrderItems.prod_id = Products.prod_id)
left outer join Orders
on Orders.order_num = OrderItems.order_num)
left outer join Customers
on Customers.cust_id = Orders.cust_id
where Customers.cust_id is not null -- THE ONLY DIFFERENCE BETWEEN QUERY1 AND QUERY2
group by Customers.cust_id, Vendors.vend_id
order by total_business
Now, I am trying to see the query output results for all vendor-customer combinations, including those combinations where there was no business transacted and am trying to write this via a single SQL Query. My teacher provided this solution, but I honestly cannot understand the logic at all, as I've never come across Sub-queries.
select
customers.cust_id,
Vendors.vend_id,
sum(OrderItems.quantity*orderitems.item_price)
from
(
customers
inner join
Vendors on 1 = 1
)
left outer join --synthetic product using joins
(
orders
join
orderitems on orders.order_num = OrderItems.order_num
join
Products on orderitems.prod_id = products.prod_id
) on
Vendors.vend_id = Products.vend_id and
customers.cust_id = orders.cust_id
group by customers.cust_id, vendors.vend_id
order by customers.cust_id
Thanks a lot

I would write this query as:
select c.cust_id, v.vend_id, coalesce(cv.total, 0)
from Customers c cross join
Vendors v left outer join
(select o.cust_id, p.vend_id, sum(oi.quantity * oi.item_price) as total
from orders o join
orderitems oi
on o.order_num = oi.order_num join
Products p
on oi.prod_id = p.prod_id
group by o.cust_id, p.vend_id
) cv
on cv.vend_id = v.vend_id and
cv.cust_id = c.cust_id
order by c.cust_id;
The structure is quite similar. Both versions start by creating a cross product between all customers and all vendors. This creates every row of the output result set. Next, the totals need to be aggregated at the customer/vendor level. In the above query, this is done explicitly in a subquery that aggregates the values to that level. (In the original query, this is done in the outer query.)
The final step is joining these together.
Your teacher should be encouraging you to use table aliases, particularly table abbreviations. You should also be encouraged to use the proper join. So, although you can express a cross join as an inner join with on 1=1, a cross join is part of the SQL language, not a hack.
Similarly, parentheses in the from clause can make the logic harder to follow. Explicit subqueries are more easily read.
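To see the cross join plus pre-aggregated subquery pattern produce a row for every customer/vendor pair, here is a minimal sketch using Python's built-in sqlite3 module. The table layout is a cut-down guess at the schema in the question and the rows are invented for illustration; only the shape of the query is taken from the answer above.

```python
import sqlite3

# Hypothetical miniature of the schema discussed above; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers  (cust_id TEXT PRIMARY KEY);
    CREATE TABLE Vendors    (vend_id TEXT PRIMARY KEY);
    CREATE TABLE Products   (prod_id TEXT PRIMARY KEY, vend_id TEXT);
    CREATE TABLE Orders     (order_num INTEGER PRIMARY KEY, cust_id TEXT);
    CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT,
                             quantity INTEGER, item_price REAL);

    INSERT INTO Customers  VALUES ('C1'), ('C2');
    INSERT INTO Vendors    VALUES ('V1'), ('V2');
    INSERT INTO Products   VALUES ('P1', 'V1'), ('P2', 'V2');
    INSERT INTO Orders     VALUES (1, 'C1');
    INSERT INTO OrderItems VALUES (1, 'P1', 2, 5.0), (1, 'P1', 1, 5.0);
""")

rows = conn.execute("""
    SELECT c.cust_id, v.vend_id, COALESCE(cv.total, 0) AS total
    FROM Customers c
    CROSS JOIN Vendors v              -- every customer/vendor pair
    LEFT OUTER JOIN (                 -- totals only for pairs that traded
        SELECT o.cust_id, p.vend_id,
               SUM(oi.quantity * oi.item_price) AS total
        FROM Orders o
        JOIN OrderItems oi ON o.order_num = oi.order_num
        JOIN Products p    ON oi.prod_id = p.prod_id
        GROUP BY o.cust_id, p.vend_id
    ) cv ON cv.vend_id = v.vend_id AND cv.cust_id = c.cust_id
    ORDER BY c.cust_id, v.vend_id
""").fetchall()

for row in rows:
    print(row)
```

With two customers and two vendors the result has four rows. Only the C1/V1 pair has a non-zero total; the other three pairs come from the cross join alone and get 0 from COALESCE.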

Related

Multiple joins with group by (Sum)

When I use multiple JOINs, I want to get the sum of a column from each of the joined tables.
SELECT
A.*,
SUM(C.purchase_price) AS purchase_total,
SUM(D.sales_price) AS sales_total,
B.user_name
FROM
PROJECT AS A
LEFT JOIN
USER AS B ON A.user_idx = B.user_idx
LEFT JOIN
PURCHASE AS C ON A.project_idx = C.project_idx
LEFT JOIN
SALES AS D ON A.project_idx = D.project_idx
GROUP BY
????
You need to aggregate in subqueries, as follows:
SELECT A.project_idx,
A.project_name,
A.project_category,
sum(C.purchase_price) AS purchase_total,
sum(D.sales_price) AS sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_price
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_price
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
GROUP BY A.project_idx, A.project_name, A.project_category, B.user_name
I am not sure, but you may be able to use an inner join between PROJECT and USER instead of a left join.
SELECT A.project_idx,
A.project_name,
A.project_category,
C.purchase_total,
D.sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_total
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_total
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
This is working correctly on MS-SQL Server.
Thanks to Popeye
You are attempting to aggregate over two unrelated dimensions, and that throws off all the calculations.
Correlated subqueries are an alternative:
SELECT p.*,
(SELECT SUM(pu.purchase_price)
FROM PURCHASE pu
WHERE p.project_idx = pu.project_idx
) as purchase_total,
(SELECT SUM(s.sales_price)
FROM SALES s
WHERE p.project_idx = s.project_idx
) as sales_total,
u.user_name
FROM PROJECT p LEFT JOIN
USER u
ON p.user_idx = u.user_idx ;
Note that this uses meaningful table aliases so the query is easier to read. Arbitrary letters are really no better (and perhaps worse) than using the entire table name.
Correlated subqueries avoid the outer aggregation as well -- and let you select all the columns from the first table, which is what you want. They also often have better performance with the right indexes.
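To make the "two unrelated dimensions" problem concrete, here is a small sketch with Python's sqlite3 module. The tables are stripped down to the columns that matter and the rows are invented; the point is only to compare a plain double join against the correlated subqueries.

```python
import sqlite3

# Hypothetical, simplified tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project  (project_idx INTEGER PRIMARY KEY, project_name TEXT);
    CREATE TABLE purchase (project_idx INTEGER, purchase_price INTEGER);
    CREATE TABLE sales    (project_idx INTEGER, sales_price INTEGER);

    INSERT INTO project  VALUES (1, 'demo');
    INSERT INTO purchase VALUES (1, 10), (1, 20);
    INSERT INTO sales    VALUES (1, 100), (1, 200), (1, 300);
""")

# Joining both child tables pairs each purchase row with each sales row
# (2 x 3 = 6 rows), so both sums are inflated.
naive = conn.execute("""
    SELECT p.project_idx, SUM(pu.purchase_price), SUM(s.sales_price)
    FROM project p
    LEFT JOIN purchase pu ON pu.project_idx = p.project_idx
    LEFT JOIN sales s     ON s.project_idx = p.project_idx
    GROUP BY p.project_idx
""").fetchone()

# Each correlated subquery sums one child table independently of the other.
correlated = conn.execute("""
    SELECT p.project_idx,
           (SELECT SUM(pu.purchase_price) FROM purchase pu
             WHERE pu.project_idx = p.project_idx),
           (SELECT SUM(s.sales_price) FROM sales s
             WHERE s.project_idx = p.project_idx)
    FROM project p
""").fetchone()

print("double join:", naive)       # (1, 90, 1200)
print("correlated: ", correlated)  # (1, 30, 600)
```

The double join multiplies the purchase total by the number of sales rows and the sales total by the number of purchase rows, whereas the correlated version returns the true totals of 30 and 600.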

SQL Command not properly ended trying to join 3 tables using query

Hello, I am trying to add a Translated_Name column from the Product_descriptions table to my current query, which already joins two tables. However, the translated_name column is of type NVARCHAR2. Should I be using an inner join for it, or am I completely wrong?
select order_mode,customer_id,product_id from ORDERS
inner join ORDER_items on order_items.ORDER_ID=Orders.ORDER_ID
where exists(select customer_id from customers where orders.customer_id=customers.customer_id)
inner join product_descriptions on product_descriptions.translated_name = Orders.Customer_id
The where clause goes after the joins:
select
o.order_mode,
o.customer_id,
oi.product_id
from orders o
inner join order_items oi
on oi.order_id = o.order_id
inner join product_descriptions pd
on pd.translated_name = o.customer_id
where exists(
select 1
from customers c
where o.customer_id = c.customer_id
)
Notes:
table aliases make the query easier to read and write
you should qualify each column with the alias of the table it belongs to
I am quite suspicious about the join condition on product_descriptions, which involves customer_id; you might need to review that (without knowing your table structures, it is not possible to tell what the correct condition is)
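The clause-ordering rule can be checked quickly with Python's sqlite3 module. This is only a sketch: the tables are minimal stand-ins with invented rows, the product_descriptions join is left out because its condition is in doubt, and SQLite words the error differently from Oracle, but the cause is the same.

```python
import sqlite3

# Hypothetical minimal tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders      (order_id INTEGER, customer_id INTEGER, order_mode TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    CREATE TABLE customers   (customer_id INTEGER);
    INSERT INTO orders      VALUES (1, 7, 'online');
    INSERT INTO order_items VALUES (1, 42);
    INSERT INTO customers   VALUES (7);
""")

where_before_join = """
    SELECT o.order_mode, o.customer_id, oi.product_id
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id)
    INNER JOIN order_items oi ON oi.order_id = o.order_id
"""
where_after_join = """
    SELECT o.order_mode, o.customer_id, oi.product_id
    FROM orders o
    INNER JOIN order_items oi ON oi.order_id = o.order_id
    WHERE EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id)
"""

# A JOIN that follows the WHERE clause is a syntax error.
try:
    conn.execute(where_before_join)
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)

# With WHERE moved after the joins the statement parses and runs.
rows = conn.execute(where_after_join).fetchall()
print(rows)
```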

WHERE clause in an SQL query

I THINK what is happening with this query is if there are no records in the GenericAttribute table associated with the Product, then that product is not displayed. See line below in WHERE clause: "AND GenericAttribute.KeyGroup = 'Product'"
Is there a way to reword the query so that this part of the WHERE clause is ignored when there is no associated record in the GenericAttribute table?
Also, looking at my ORDER BY clause, will a record from the product table still show up if it has no associated record in the Pvl_AdDates table?
Thanks!
SELECT DISTINCT Product_Category_Mapping.CategoryId, Product.Id, Product.Name, Product.ShortDescription, Pvl_AdDates.Caption, Pvl_AdDates.EventDateTime, convert(varchar(25), Pvl_AdDates.EventDateTime, 120) AS TheDate, Pvl_AdDates.DisplayOrder, Pvl_Urls.URL, [Address].FirstName, [Address].LastName, [Address].Email, [Address].Company, [Address].City, [Address].Address1, [Address].Address2, [Address].ZipPostalCode, [Address].PhoneNumber
FROM [Address]
RIGHT JOIN (GenericAttribute
RIGHT JOIN (Pvl_Urls RIGHT JOIN (Pvl_AdDates
RIGHT JOIN (Product_Category_Mapping
LEFT JOIN Product
ON Product_Category_Mapping.ProductId = Product.Id)
ON Pvl_AdDates.ProductId = Product.Id)
ON Pvl_Urls.ProductId = Product.Id)
ON GenericAttribute.EntityId = Product.Id)
ON Address.Id = convert(int, GenericAttribute.Value)
WHERE
Product_Category_Mapping.CategoryId=12
AND GenericAttribute.KeyGroup = 'Product'
AND Product.Published=1
AND Product.Deleted=0
AND Product.AvailableStartDateTimeUtc <= getdate()
AND Product.AvailableEndDateTimeUtc >= getdate()
ORDER BY
Pvl_AdDates.EventDateTime DESC,
Product.Id,
Pvl_AdDates.DisplayOrder
I strongly encourage you to not mix left join and right join. I have written many SQL queries and cannot think of an occasion when that was necessary.
In fact, just stick to left join.
If you want all products (or at least all products not filtered out by the where clause), then start with the products table and go from there:
FROM Product p LEFT JOIN
Product_Category_Mapping pcm
ON pcm.ProductId = p.Id LEFT JOIN
Pvl_AdDates ad
ON ad.ProductId = p.id LEFT JOIN
Pvl_Urls u
ON u.ProductId = p.id LEFT JOIN
GenericAttribute ga
ON ga.EntityId = p.id LEFT JOIN
Address a
ON a.Id = convert(int, ga.Value)
Note that I added table aliases. These make queries easier to write and to read.
I would add a caution. It looks like you are combining data along different dimensions. You are likely to get a Cartesian product of the dimension attributes for each dimension. Perhaps that is what you want or the WHERE clause takes care of the additional rows.
Yes. Put constraints (restrictions) on the tables on the outer side of an outer join in the ON condition of that join, not in the WHERE clause. Conditions in the WHERE clause are not evaluated until after the outer joins have been evaluated, so where there is no matching record in the outer-joined table, its columns are NULL, the predicate is not true, and the entire row is eliminated, undoing the outer-ness. Conditions in the join are evaluated during the join, before the unmatched rows from the other table are added back in, so the result set will still include them.
Second: formatting, formatting, formatting! Stick to one direction of join (left is easier) and use aliases for table names!
SELECT DISTINCT m.CategoryId, p.Id,
p.Name, p.ShortDescription, d.Caption, d.EventDateTime,
convert(varchar(25), d.EventDateTime, 120) TheDate,
d.DisplayOrder, u.URL, a.FirstName, a.LastName,
a.Email, a.Company, a.City, a.Address1, a.Address2,
a.ZipPostalCode, a.PhoneNumber
FROM Product_Category_Mapping m
left join Product p on p.Id = m.ProductId
and p.Published=1
and p.Deleted=0
and p.AvailableStartDateTimeUtc <= getdate()
and p.AvailableEndDateTimeUtc >= getdate()
left join Pvl_AdDates d ON d.ProductId = p.Id
left join Pvl_Urls u ON u.ProductId = p.Id
left join GenericAttribute g ON g.EntityId = p.Id
and g.KeyGroup = 'Product'
left join [Address] a ON a.Id = convert(int, g.Value)
WHERE m.CategoryId=12
ORDER BY d.EventDateTime DESC, p.Id, d.DisplayOrder
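The difference between filtering in WHERE and filtering in ON is easy to check on a toy example. The sketch below uses Python's sqlite3 module; the two tables are simplified stand-ins for Product and GenericAttribute, with invented column names and rows.

```python
import sqlite3

# Hypothetical simplified stand-ins for Product and GenericAttribute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE generic_attribute (entity_id INTEGER, key_group TEXT, value TEXT);
    INSERT INTO product VALUES (1, 'with attribute'), (2, 'without attribute');
    INSERT INTO generic_attribute VALUES (1, 'Product', '10');
""")

# Filter in WHERE: product 2 has NULL in g.key_group after the left join,
# the comparison is not true, and the row is dropped.
filter_in_where = conn.execute("""
    SELECT p.id, g.value
    FROM product p
    LEFT JOIN generic_attribute g ON g.entity_id = p.id
    WHERE g.key_group = 'Product'
    ORDER BY p.id
""").fetchall()

# Filter in ON: the condition only decides which attribute rows match,
# so product 2 is kept with NULL attribute columns.
filter_in_on = conn.execute("""
    SELECT p.id, g.value
    FROM product p
    LEFT JOIN generic_attribute g ON g.entity_id = p.id
                                 AND g.key_group = 'Product'
    ORDER BY p.id
""").fetchall()

print(filter_in_where)  # [(1, '10')]
print(filter_in_on)     # [(1, '10'), (2, None)]
```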

Trying to understand NULL operator in Query

I'm looking to see a breakdown of the total dollar business that each vendor has done (indirectly via the distributor) with each customer, where I'm trying not to use the Inner Join Syntax. I basically don't understand the difference between the two outputs produced by the two queries shown below:
Query1
select customers.cust_id, vendors.vend_id, sum(OrderItems.item_price*OrderItems.quantity) as total_business from
(((Vendors left outer join products
on vendors.vend_id = products.prod_id)
left outer join OrderItems
on products.prod_id = OrderItems.prod_id)
left outer join Orders
on OrderItems.order_num = Orders.order_num)
left outer join Customers
on Orders.cust_id = Customers.cust_id
group by Customers.cust_id, vendors.vend_id
order by total_business
I get the following output:
Query2
select customers.cust_id, Vendors.vend_id, sum(quantity*item_price) as total_business from
(((Vendors left outer join Products
on Products.vend_id = Vendors.vend_id)
left outer join OrderItems --No inner joins allowed
on OrderItems.prod_id = Products.prod_id)
left outer join Orders
on Orders.order_num = OrderItems.order_num)
left outer join Customers
on Customers.cust_id = Orders.cust_id
where Customers.cust_id is not null -- THE ONLY DIFFERENCE BETWEEN QUERY1 AND QUERY2
group by Customers.cust_id, Vendors.vend_id
order by total_business
I don't understand how there are only NULL cust_ids in the 1st output when the 2nd output contains some non-NULL cust_ids. Why doesn't the 1st output include these non-NULL cust_ids?
Thank You
Query One is joining Vendors and Products incorrectly:
on vendors.vend_id = products.prod_id -- Vend_ID = Prod_ID
Query Two is joining Vendors and Products correctly:
on Products.vend_id = Vendors.vend_id -- Vend_ID = Vend_ID
Once that is fixed, you'll get the same IDs in both queries. Then I suggest you read Dan's answer to understand why what you were trying to do in eliminating INNER JOIN from the query is cancelled out by adding a WHERE filter to a column from the last table in the chain.
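The effect of the mismatched join key can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module with one invented row per table; the Customers table is left out and cust_id is read straight from Orders to keep it short.

```python
import sqlite3

# Hypothetical one-row-per-table data set; all values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors    (vend_id TEXT);
    CREATE TABLE Products   (prod_id TEXT, vend_id TEXT);
    CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT);
    CREATE TABLE Orders     (order_num INTEGER, cust_id TEXT);
    INSERT INTO Vendors    VALUES ('V1');
    INSERT INTO Products   VALUES ('P1', 'V1');
    INSERT INTO OrderItems VALUES (1, 'P1');
    INSERT INTO Orders     VALUES (1, 'C1');
""")

# Same chain of left joins; only the Vendors/Products condition varies.
template = """
    SELECT v.vend_id, o.cust_id
    FROM Vendors v
    LEFT OUTER JOIN Products p    ON {vendor_product_condition}
    LEFT OUTER JOIN OrderItems oi ON oi.prod_id = p.prod_id
    LEFT OUTER JOIN Orders o      ON o.order_num = oi.order_num
"""

# A vendor id never equals a product id, so no product matches and every
# table further down the chain contributes only NULLs.
wrong = conn.execute(
    template.format(vendor_product_condition="v.vend_id = p.prod_id")
).fetchall()

# Matching on vend_id lets the chain reach the order and its customer.
right = conn.execute(
    template.format(vendor_product_condition="p.vend_id = v.vend_id")
).fetchall()

print(wrong)  # [('V1', None)]
print(right)  # [('V1', 'C1')]
```

Because each later join depends on the row found by the previous one, a single failed match at the first join leaves cust_id NULL for every vendor.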
When you left join to a table, then filter on that table in the where clause, the join effectively changes to an inner join. The workaround is to apply the filter as a joining condition.
In your second query, all you have to do is change the word "where" to "and".

SQL Query Showing 4x Records

The following statement runs, but each record is repeated 4 times. I know the relationship is wrong, but I have no idea how to fix it. Apologies if this is simple and I've missed it.
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers, Plants, Orders, Staff
INNER JOIN Orders AS t2 ON t2.Order_ID = Staff.Order_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
You are generating a Cartesian product between the tables since you have not provided join syntax between any of the tables:
SELECT c.First_Name, c.Last_Name,
p.Common_Name, p.Flower_Colour, p.Flowering_Season,
s.First_Name, s.Last_Name
FROM Customers c
INNER JOIN Orders o
on c.customerId = o.customer_id
INNER JOIN Plants p
on o.plant_id = p.plant_id
INNER JOIN Staff s
ON o.Order_ID = s.Order_ID
WHERE o.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Note: I am guessing on column names for the joins
Here is a great visual explanation of joins that can help in learning the correct syntax
In the FROM... clause you are doing a cross join - combining every customer with every plant with every order with every staff.
You should only mention one table in the FROM clause and then connect the other ones with INNER JOINS to only get related records.
I don't know exactly what your database looks like, but something like this:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name,
Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
INNER JOIN Orders ON Orders.Customer_ID = Customers.Customer_ID
INNER JOIN Staff ON Staff.Staff_ID = Orders.Staff_ID
INNER JOIN Plants ON Plants.Plants_ID = Orders.Plants_ID
WHERE Orders.Order_Date
BETWEEN '2011/01/01'
AND '2013/03/01'
This is because you are selecting from four tables without any joins between them, and also because you are joining Orders twice. As the result, a Cartesian product is made.
Here is how you should fix it: rewrite the implicit (comma-separated) joins using the ANSI JOIN syntax, join Orders only once, and provide proper join conditions:
SELECT Customers.First_Name, Customers.Last_Name, Plants.Common_Name, Plants.Flower_Colour, Plants.Flowering_Season, Staff.First_Name, Staff.Last_Name
FROM Customers
JOIN Plants ON ...
JOIN Orders ON ...
JOIN Staff ON ...
WHERE Orders.Order_Date BETWEEN '2011/01/01' AND '2013/03/01'
Replace ... with proper join conditions; this should make the results look as expected.
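How quickly a comma-separated FROM clause multiplies rows can be seen on two tiny tables. This is a sketch with Python's sqlite3 module; the table and column names are invented, since the real schema is not shown in the question.

```python
import sqlite3

# Hypothetical two-table schema; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders    VALUES (10, 1), (11, 1), (12, 2);
""")

# Comma-separated tables with no condition: every customer is paired
# with every order (2 x 3 = 6 rows).
comma_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c, orders o"
).fetchall()

# An explicit join condition keeps only the related pairs (3 rows).
joined_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(len(comma_rows))  # 6
print(joined_rows)      # [('Ann', 10), ('Ann', 11), ('Bob', 12)]
```

Each additional table listed without a join condition multiplies the row count again by that table's size, which is how a handful of rows per table turns into every record being repeated.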