I have written a query like below using NorthWind.
select COUNT(o.OrderId) as Orders
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
The Orders table has 830 rows. However, when I join Orders to Order Details, the query returns 2155, which is the number of rows in the Order Details table.
Why is the query result not 830?
select COUNT( distinct o.OrderId) as Orders
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
It's because of the join. Orders to Order Details is a one-to-many relationship, so each OrderID is repeated once per matching detail row in the joined result. Every repetition is counted, which inflates the count to match the row count of Order Details. You can avoid this either by dropping the join or by using a distinct count of OrderID, as in the query above.
If you need any columns from Order Details, or you want to exclude orders without details from your count, then you need the join. Otherwise I'd remove both the join and the DISTINCT, as they only add overhead and cost to getting your results.
Additionally, if you have orders without details and you want them included, change the join to a LEFT JOIN; a plain JOIN is an inner join and will exclude orders without details. If you don't want those orders in your count, an inner join is appropriate.
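As a rough illustration of the three cases described above, here is a small self-contained sketch. The sample rows and the table name OrderDetails (without the space) are made up for the example; they are not the real Northwind data.

```sql
-- Made-up sample rows: three orders, one of which (order 3) has no detail rows.
WITH Orders (OrderID) AS (
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
),
OrderDetails (OrderID, ProductID) AS (
    SELECT 1, 10 UNION ALL
    SELECT 1, 11 UNION ALL
    SELECT 2, 10
)
SELECT
    COUNT(o.OrderID)           AS JoinedRows,        -- 4: order 1 is counted twice
    COUNT(DISTINCT o.OrderID)  AS AllOrders,         -- 3: every order, with or without details
    COUNT(DISTINCT od.OrderID) AS OrdersWithDetails  -- 2: what an inner join + DISTINCT would give
FROM Orders o
LEFT JOIN OrderDetails od ON od.OrderID = o.OrderID;
```

With an inner join instead of the LEFT JOIN, order 3 disappears from the joined result, so the first two counts drop to 3 and 2.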
Related
Hi Guys I am having a bit of trouble writing the most efficient and optimized query for this question:
Find the order ID and date of the last discontinued item sold.
I have my code below as well as the metadata for the tables. I am not sure if my code will produce the correct output because I have no way of testing it and I am not sure if my code will be the best way to complete this query. Any advice would help.
Select
orders.orderid,
Max(orders.orderdate)
from orders
inner join order_details on orders.orderid = order_details.orderid
inner join products on order_details.productid = products.productid
where discontinued = 1
group by orders.orderid
Using row_number is the easiest way:
select orderID,OrderDate from (
select o.orderID,o.OrderDate,rn = row_number() over (order by orderdate desc)
from products p
join orderDetails od on od.productID=p.productID
join orders o on o.orderID=od.orderID
where p.discontinued = 1) sub
where sub.rn = 1
Your query is already pretty much what you want. The fastest way is to simply order by the required column and select the top 1 row:
select top (1) o.orderid, o.orderdate
from orders o
join order_details od on od.orderid = o.orderid
join products p on p.productid = od.productid
where p.discontinued = 1
order by o.orderdate desc
This should be more performant than using a window function to number all the rows before selecting row one.
I have a very similar set of tables, with the ubiquitous orders/orderitems/products arrangement and a similar deleted flag for products, so it was easy to test both side by side. On a table of 13.5m orders and 6m products, both queries ran in under a second, but this one was slightly faster than the row_number equivalent.
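One thing worth keeping in mind with TOP (1): if two qualifying orders share the latest date, which one comes back is not guaranteed unless the ORDER BY breaks the tie. A minimal sketch with made-up rows, assuming the same table and column names as the query above, and adding orderid as a tie-breaker (that choice is only an assumption about what "last" should mean):

```sql
-- Made-up sample rows; orders 101 and 102 share the latest qualifying date.
WITH products (productid, discontinued) AS (
    SELECT 1, 1 UNION ALL SELECT 2, 0 UNION ALL SELECT 3, 1
),
orders (orderid, orderdate) AS (
    SELECT 100, '2024-01-01' UNION ALL
    SELECT 101, '2024-03-15' UNION ALL
    SELECT 102, '2024-03-15' UNION ALL
    SELECT 103, '2024-04-01'
),
order_details (orderid, productid) AS (
    SELECT 100, 1 UNION ALL
    SELECT 101, 3 UNION ALL
    SELECT 102, 1 UNION ALL
    SELECT 103, 2      -- newest order, but product 2 is not discontinued
)
SELECT TOP (1) o.orderid, o.orderdate
FROM orders o
JOIN order_details od ON od.orderid = o.orderid
JOIN products p ON p.productid = od.productid
WHERE p.discontinued = 1
ORDER BY o.orderdate DESC, o.orderid DESC;  -- orderid breaks the tie on date
-- returns order 102, dated 2024-03-15
```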
SELECT ORDERS.ORDERID,
ORDERS.CUSTOMERID,
ORDERS.EMPLOYEEID,
ORDERDETAILS.PRODUCTID,
ORDERDETAILS.UNITPRICE,
ORDERDETAILS.QUANTITY,
COUNT(ORDERS.ORDERID)
FROM ORDERS
LEFT JOIN ORDERDETAILS ON ORDERS.ORDERID=ORDERDETAILS.ORDERID
GROUP BY ORDERDETAILS.ORDERID
ERROR:Column 'ORDERS.OrderID' is invalid in the select list because it
is not contained in either an aggregate function or the GROUP BY
clause.
When you use an aggregate function, every non-aggregated column in the SELECT list must also be included in the GROUP BY clause:
SELECT
ORDERS.ORDERID,ORDERS.CUSTOMERID,ORDERS.EMPLOYEEID,ORDERDETAILS.PRODUCTID,ORDERDETAILS.UNITPRICE,ORDERDETAILS.QUANTITY,
COUNT(ORDERS.ORDERID)
FROM ORDERS LEFT JOIN ORDERDETAILS ON
ORDERS.ORDERID=ORDERDETAILS.ORDERID
GROUP BY ORDERS.ORDERID,ORDERS.CUSTOMERID,ORDERS.EMPLOYEEID,ORDERDETAILS.PRODUCTID,ORDERDETAILS.UNITPRICE,ORDERDETAILS.QUANTITY
Presumably, you intend this:
SELECT o.ORDERID, o.CUSTOMERID, o.EMPLOYEEID,
COUNT(od.ORDERID) as NUM_PRODUCTS
FROM ORDERS o LEFT JOIN
ORDERDETAILS od
ON o.ORDERID = od.ORDERID
GROUP BY o.ORDERID, o.CUSTOMERID, o.EMPLOYEEID;
This produces one row per ORDERID with a count of the number of products (or more specifically orderdetails rows) in each order.
Notes:
All unaggregated columns in the SELECT should be GROUP BY keys.
You don't want to include unaggregated columns from ORDERDETAILS in the SELECT, because then an ORDER might have multiple rows in the result set.
You do want to use table aliases, so the query is easier to write and to read.
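To make the notes above concrete, here is a small sketch with made-up rows (the customer codes and IDs are placeholders, not real data). It also shows why COUNT(od.OrderID) rather than COUNT(*) is the useful count after a LEFT JOIN: an order with no detail rows still produces one joined row.

```sql
-- Made-up sample rows; order 3 has no detail rows.
WITH Orders (OrderID, CustomerID, EmployeeID) AS (
    SELECT 1, 'ALFKI', 5 UNION ALL
    SELECT 2, 'ANATR', 3 UNION ALL
    SELECT 3, 'ALFKI', 5
),
OrderDetails (OrderID, ProductID) AS (
    SELECT 1, 10 UNION ALL
    SELECT 1, 11 UNION ALL
    SELECT 2, 10
)
SELECT o.OrderID, o.CustomerID, o.EmployeeID,
       COUNT(od.OrderID) AS NumProducts,  -- counts only matched detail rows
       COUNT(*)          AS JoinedRows    -- counts the NULL-extended row too
FROM Orders o
LEFT JOIN OrderDetails od ON od.OrderID = o.OrderID
GROUP BY o.OrderID, o.CustomerID, o.EmployeeID  -- every unaggregated column is a key
ORDER BY o.OrderID;
-- OrderID  CustomerID  EmployeeID  NumProducts  JoinedRows
-- 1        ALFKI       5           2            2
-- 2        ANATR       3           1            1
-- 3        ALFKI       5           0            1
```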
I am trying to find the average dollar amount of an order. I have calculated the average order Total but I need an average that takes into account the fact that not all Orders have a corresponding OrderItems.
This is a homework question and it is as follows:
What is the average $$ value of an order? To get the answer, you need
to add up all the order values and divide this by the
number of orders. There are two possible averages on this question,
because not all of the order numbers in the ORDERS table are in the
ORDERITEMS table... You will calculate and display both averages.
I have written the one that ignores orders with no OrderItems, but I am not sure how to go about the second case.
SELECT SUM(OrderItems.qty*INVENTORY.price) / COUNT(*) AS dollarValue
FROM Orders, OrderItems, Inventory
WHERE ORDERS.orderid = OrderItems.orderid AND OrderItems.partid = Inventory.partid
The Avg function will not replace NULL with zero; it will exclude NULL from its calculation. If you have Order rows which have no OrderItem, you need to use Left Joins. A trick you can use in SQL Server is to nest the joins like so (note the parentheses):
Select Avg(OI.Qty * I.Price)
From Orders As O
Left Join (OrderItems As OI
Join Inventory As I
On I.PartId = OI.PartId)
On OI.OrderId = O.OrderId
This will join the Inventory table to the OrderItems table before it Left Joins that result to the Orders table. In this way, OI.Qty and I.Price will both return NULL for Orders that have no OrderItems, and those rows will be excluded from the calculation. An equivalent approach to the above would be to use two Left Joins:
Select Avg(OI.Qty * I.Price)
From Orders As O
Left Join OrderItems As OI
On OI.OrderId = O.OrderId
Left Join Inventory As I
On I.PartId = OI.PartId
If you want to count Orders with no OrderItems as zero, then you need to convert those NULLs to zero using Coalesce:
Select Avg(OI.Qty * I.Price) As Avg_ExcludingNull
, Avg( Coalesce(OI.Qty * I.Price,0) ) As Avg_NullAsZero
From Orders As O
Left Join (OrderItems As OI
Join Inventory As I
On I.PartId = OI.PartId)
On OI.OrderId = O.OrderId
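Note that Avg(OI.Qty * I.Price) averages individual item lines. If the assignment's "order value" means the total of all lines in an order, one possible approach is to sum per order first and then average those totals. The sketch below assumes that reading; the sample rows and prices are made up.

```sql
-- Made-up sample rows; order 3 has no items.
WITH Orders (OrderID) AS (
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
),
OrderItems (OrderID, PartID, Qty) AS (
    SELECT 1, 100, 2 UNION ALL
    SELECT 1, 101, 1 UNION ALL
    SELECT 2, 100, 3
),
Inventory (PartID, Price) AS (
    SELECT 100, 10.00 UNION ALL
    SELECT 101, 20.00
),
OrderTotals AS (
    -- One row per order; OrderTotal is NULL for orders without items.
    SELECT O.OrderID, SUM(OI.Qty * I.Price) AS OrderTotal
    FROM Orders AS O
    LEFT JOIN OrderItems AS OI ON OI.OrderID = O.OrderID
    LEFT JOIN Inventory AS I ON I.PartID = OI.PartID
    GROUP BY O.OrderID
)
SELECT AVG(OrderTotal)              AS Avg_ExcludingEmptyOrders,  -- (40 + 30) / 2 = 35
       AVG(COALESCE(OrderTotal, 0)) AS Avg_EmptyOrdersAsZero      -- (40 + 30 + 0) / 3, about 23.33
FROM OrderTotals;
```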
SQL has an aggregate function for calculating the average: AVG()
SELECT AVG(OrderItems.qty*INVENTORY.price) AS dollarValue
FROM Orders, OrderItems, Inventory
WHERE ORDERS.orderid = OrderItems.orderid AND OrderItems.partid = Inventory.partid
While we're here, may I suggest you use the more modern JOIN syntax:
SELECT AVG(OrderItems.qty*INVENTORY.price) AS dollarValue
FROM Orders
JOIN OrderItems ON ORDERS.orderid = OrderItems.orderid
JOIN Inventory ON OrderItems.partid = Inventory.partid
This is the code I did
SELECT TOP 5 ContactName FROM Customers
INNER JOIN [Order Details]ON OrderId =
CustomerID
INNER JOIN Orders ON ProductID = OrderID
WHERE UnitPrice >= 25000
ORDER BY ContactName ASC
But this is the error I am getting
Msg 209, Level 16, State 1, Line 5
Ambiguous column name 'orderID'
Can someone explain to me why I am getting this error.
What I am trying to do is show the five most recent orders that were purchased by a customer who has spent more than $25,000.
So I am assuming I need to use the order, product, and customer tables.
The column OrderID exists in both tables.
There is probably an OrderID column in both your Order Details and your Orders table, and SQL Server doesn't know which one to take.
Solution: specify which one you want to use by putting the table name in front of it:
Orders.OrderID instead of just OrderID
So your query would look like this then:
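SELECT TOP 5 ContactName FROM Customers
-- Orders is joined first so that Orders.OrderID is in scope for the next join
INNER JOIN Orders ON Orders.CustomerID = Customers.CustomerID
INNER JOIN [Order Details] ON [Order Details].OrderID = Orders.OrderID
WHERE UnitPrice >= 25000
ORDER BY ContactName ASC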
Almost certainly you have the field orderID in both the Details and the Orders table.
Clarify it with either Orders.orderID or Details.orderID.
There are 2 OrderID columns across the tables.
You can remove the ambiguity with aliases (like this) or use Orders.OrderID
SELECT TOP 5 C.ContactName
FROM
Customers C
INNER JOIN
[Order Details] OD ON C.OrderId = OD.CustomerID
INNER JOIN
Orders O ON OD.ProductID = O.OrderID
WHERE O.UnitPrice >= 25000 -- or OD?
ORDER BY C.ContactName ASC
Note: did you really mean to join Customers and [Order Details] using Customers.OrderId?
When you JOIN multiple tables in the same query, you need to differentiate any columns which have the same name in multiple tables. Otherwise, how would the query engine know which one you're talking about?
You can do this by prefixing the column name with either <table name>. or <table alias>.
For example:
SELECT TOP 5
C.ContactName
FROM
Customers C -- Customers is aliased as "C"
INNER JOIN [Order Details] OD ON
OD.OrderId = C.CustomerID
INNER JOIN Orders O ON
OD.ProductID = O.OrderID
WHERE
OD.UnitPrice >= 25000
ORDER BY
C.ContactName ASC
Another important question... are you sure that you're joining on the correct columns there? It looks really wrong.
Finally, if this is for a homework assignment, please make sure that you tag it as such with the "homework" tag.
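On the join columns: in the standard Northwind schema, Orders.CustomerID references Customers.CustomerID and [Order Details].OrderID references Orders.OrderID. Under that assumption, a rough sketch of the stated goal (the five most recent orders from customers whose total spend exceeds $25,000) might look like the following. The sample rows are made up, the detail table is named OrderDetails here to keep the example self-contained, and Discount is ignored.

```sql
-- Made-up sample rows: customer AAA has spent 26000 in total, BBB only 5000.
WITH Customers (CustomerID, ContactName) AS (
    SELECT 'AAA', 'Ann' UNION ALL
    SELECT 'BBB', 'Bob'
),
Orders (OrderID, CustomerID, OrderDate) AS (
    SELECT 1, 'AAA', '2024-01-05' UNION ALL
    SELECT 2, 'AAA', '2024-02-10' UNION ALL
    SELECT 3, 'BBB', '2024-03-01'
),
OrderDetails (OrderID, ProductID, UnitPrice, Quantity) AS (
    SELECT 1, 10, 1000.00, 20 UNION ALL   -- 20000
    SELECT 2, 11,  600.00, 10 UNION ALL   --  6000
    SELECT 3, 10, 1000.00,  5             --  5000
),
BigSpenders AS (
    -- Customers whose summed line totals exceed 25000.
    SELECT o.CustomerID
    FROM Orders o
    JOIN OrderDetails od ON od.OrderID = o.OrderID
    GROUP BY o.CustomerID
    HAVING SUM(od.UnitPrice * od.Quantity) > 25000
)
SELECT TOP 5 c.ContactName, o.OrderID, o.OrderDate
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID   -- every column is qualified by its alias
WHERE c.CustomerID IN (SELECT CustomerID FROM BigSpenders)
ORDER BY o.OrderDate DESC;
-- returns Ann's orders only: order 2 (2024-02-10), then order 1 (2024-01-05)
```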
In your references to OrderId, you need to figure out which table you are pulling the orderId column from. (In some cases, you can just pick either table.) Let's call that table <table name>.
In your query, replace orderId with <table name>.orderId.
I've been trying to find out how to write this query in sql.
What I need is to find the productnames (in the products table) that have 50 or more orders (which are in the order table).
only one orderid is matched up to a productname at a time so when I try to count the orderid's it counts all of them.
I can get distinct productnames but once i add in the orderid's then it goes back to having multiple productnames.
I also need to count the number of customers (in the order table) that have ordered those products.
I need some serious help ASAP! if anyone could help me figure out how to figure this out that would be awesome!
Table: Products
`productname` in the form of a text like 'GrannySmith'
Table: Orders
`orderid` in the form of '10222'..etc
`custid` in the form of something like 'SMITH'
Assuming the orders table has a field named ProductId that relates back to the products table, the SQL would translate to:
SELECT p.ProductName, Count(*)
FROM Orders o
JOIN Products p
on o.ProductId = p.ProductId
GROUP BY p.ProductName HAVING COUNT(*) >= 50
The key is in the having component of the Group By clause. I hope this helps.
You might be missing an "Order Details" table. Typically, an order has several order details, and each order detail maps to a product, as in the Northwind sample database.
In that case, your SQL query would be something like this: join the [Order Details] table to both the [Orders] and [Products] tables, group by the product ID and name, and count the OrderIDs:
select
p.ProductID, p.ProductName, count(o.OrderID)
from
[order details] od
inner join
orders o on od.OrderID = o.OrderID
inner join
products p ON od.productID = p.ProductID
group by
p.ProductID, p.ProductName
having
count(o.OrderID) >= 50
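The question also asks for the number of customers who ordered each product. One way to get that is COUNT(DISTINCT ...) in the same grouped query. This is only a sketch: it assumes an OrderDetails link table as described above, uses made-up rows, and lowers the threshold to 2 so the tiny sample returns something.

```sql
-- Made-up sample rows; the OrderDetails link table is an assumption.
WITH Products (ProductID, ProductName) AS (
    SELECT 1, 'GrannySmith' UNION ALL
    SELECT 2, 'Fuji'
),
Orders (OrderID, CustID) AS (
    SELECT 10222, 'SMITH' UNION ALL
    SELECT 10223, 'SMITH' UNION ALL
    SELECT 10224, 'JONES'
),
OrderDetails (OrderID, ProductID) AS (
    SELECT 10222, 1 UNION ALL
    SELECT 10223, 1 UNION ALL
    SELECT 10224, 1 UNION ALL
    SELECT 10224, 2
)
SELECT p.ProductName,
       COUNT(DISTINCT o.OrderID) AS NumOrders,
       COUNT(DISTINCT o.CustID)  AS NumCustomers  -- SMITH ordered twice but counts once
FROM OrderDetails od
JOIN Orders o   ON o.OrderID = od.OrderID
JOIN Products p ON p.ProductID = od.ProductID
GROUP BY p.ProductID, p.ProductName
HAVING COUNT(DISTINCT o.OrderID) >= 2;  -- use >= 50 on real data
-- returns: GrannySmith, 3 orders, 2 customers
```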