So I'm trying to produce a little table from the AdventureWorks database, with the sales of all Silver Mountain Bikes sold in October 2013. I need the table to be ordered by Territory, and have a column of total sales and a column of sales that were the bikes only.
Where is my query going wrong?
Note, I haven't done any aliasing yet, just to keep things sensible in my own head.
select distinct sales.salesorderheader.SalesOrderID, sales.salesorderheader.OrderDate, Color, sales.SalesTerritory.Name, Quantity
from sales.SalesOrderHeader
right join sales.salesTerritory
on sales.salesOrderHeader.TerritoryID = sales.salesTerritory.TerritoryID
right join sales.SalesOrderDetail
on sales.salesOrderHeader.SalesOrderID = sales.SalesOrderDetail.SalesOrderID
right join production.product
on sales.SalesOrderDetail.ProductID = production.Product.ProductID
right join production.transactionHistory
on production.Product.ProductID = production.transactionHistory.ProductID
where production.product.ProductSubcategoryID = '1'
and color = 'Silver'
and sales.SalesOrderDetail.ModifiedDate between '2013-10-01' and '2013-10-31'
and production.TransactionHistory.TransactionType = 'S'
and sales.SalesOrderHeader.OnlineOrderFlag = 'False'
Due to the schema of the database, I've had to daisy-chain a few different tables together through a series of joins, but I've found these joins are grouping data up and seem to be adding a bunch of stuff together. I figure I can run a count of the Quantity column to get my total orders, and a sum of the Quantity column to see how many bikes were sold (some orders will have more than one bike!)
I hoped throwing in a 'distinct' at the top would narrow things down for me, but that's not getting me the count I need either, as I'm trying to replicate the results of a table I've been given.
I'm stumped in trying to complete a problem I was tasked in getting solved.
I'm tasked with doing the following: Calculate the average cost for different products and group them by the category of the product.
How can I do this against different tables?
I see. You want the "average" per "product", not over all the products. So, you need to calculate this yourself using COUNT(DISTINCT):
SELECT p.category AS category_id,
SUM(s.selling_price::numeric) / COUNT(DISTINCT p.product_id)
FROM product p JOIN
supply s
ON p.product_id = s.product_id
GROUP BY p.category;
i have a complex query to be written but cannot figure it out
here are my tables
Sales --one row for each sale made in the system
SaleProducts --one row for each line in the invoice (similar to OrderDetails in NW)
Deals --a list of possible deals/offers that a sale may be entitled to
DealProducts --a list of quantities of products that must be purchased in order to get a deal
now im trying to make a query which will tell me for each sale which deals he may get
the relevant fields are:
Sales: SaleID (PK)
SaleProducts: SaleID (FK), ProductID (FK)
Deals: DealID (PK)
DealProducts: DealID(FK), ProductID(FK), Mandatories (int) for required qty
i believe that i should be able to use some sort of cross join or outer join, but it aint working
here is one sample (of about 30 things i tried)
SELECT DealProducts.DealID, DealProducts.ProductID, DealProducts.Mandatories,
viwSaleProductCount.SaleID, viwSaleProductCount.ProductCount
FROM DealProducts
LEFT OUTER JOIN viwSaleProductCount
ON DealProducts.ProductID = viwSaleProductCount.ProductID
GROUP BY DealProducts.DealID, DealProducts.ProductID, DealProducts.Mandatories,
viwSaleProductCount.SaleID, viwSaleProductCount.ProductCount
The problem is that it doesn't show any product deals that are not fulfilled (probably because of the ProductID join). i need that also sales that don't have the requirements show up, then I can filter out any SaleID that exists in this query where AmountBought < Mandatories etc
Thank you for your help
I'm not sure how well I follow your question (where does viwSaleProductCount fit in?) but it sounds like you will want an outer join to a subquery that returns a list of deals along with their associated products. I think it would go something like this:
Select *
From Sales s Inner Join SaleProducts sp on s.SaleID = sp.SaleID
Left Join (
Select *
From Deals d Inner Join DealProducts dp on d.DealID = dp.DealId
) as sub on sp.ProductID = sub.ProductID
You may need to add logic to ensure that deals don't appear twice, and of course replace * with the specific column names you'd need in all cases.
edit: if you don't actually need any information from the sale or deal tables, something like this could be used:
Select sp.SaleID, sp.ProductID, sp.ProductCount, dp.DealID, dp.Mandatories
From SaleProducts sp
Left Join DealProducts as dp on sp.ProductID = dp.ProductID
If you need to do grouping/aggregation on this result you will need to be careful to ensure that deals aren't counted multiple times for a given sale (Count Distinct may be appropriate, depending on your grouping). Because it is a Left Join, you don't need to worry about excluding sales that don't have a match in DealProducts.
So, I have three tables with the following rows: Customers (Customer_id and Name), Orders (Customer_id, Product_id, Quantity) and Products (Price).
How do I write a query which shows all customers who spent more than 1000$? Do I have to join the tables?
Because this is homework, I'm not going to give you the answer, just information on how to arrive at the answer.
You need to include the customers table, because that's what your looking for, customers. You'll need to join in aggregate with the orders table, so you can find out how many of each product they've ordered, and you'll need to join with the products table to find the prices of all those items in order to total them up to determine if they've spent more than $1000.
Go to it. Tell us how you get on.
SELECT * FROM customer c JOIN (
SELECT a.customer_id,SUM(a.quantity*b.price) spent FROM orders a
JOIN products b ON b.product_id=a.product_id GROUP BY a.customer_id
) d ON d.customer_id=c.customer_id WHERE d.spent>1000
I was once given this task to do in an RDBMS:
Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.
For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.
After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.
I regard this a symptom of one or more of the following:
I'm a lazy idiot and should have seen how to do it in SQL
Set operations are not as expressive as procedural operations
SQL is not as expressive as it should be
Did I do the right thing? Did I have other options?
You definitely should be able to do this exercise without doing the work equivalent to a JOIN in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one. JOIN is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.
One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc. so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try do all your computation in a single line of code, or even in a single function (hopefully).
Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).
To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:
SELECT o.*, l.*, p.*
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
ORDER BY o.order_date;
It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.
Year of first purchase is available from the previous query, if you sort by the order_date and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:
SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;
Sum of product purchases for the last 12 months is best calculated by a separate query:
SELECT SUM(l.quantity * p.price)
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
AND o.order_date > CURDATE() - INTERVAL 1 YEAR;
edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:
SELECT o1.order_date
FROM Orders o1
LEFT OUTER JOIN Orders o2
ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date
OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
WHERE o1.customer_id = ?
GROUP BY o1.order_id
HAVING COUNT(*) <= 3;
If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase TOP n, or MySQL/PostgreSQL LIMIT:
SELECT TOP 3 order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC;
SELECT order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC
LIMIT 3;
Set operations are not as expressive as procedural operations
Perhaps more like: "Set operations are not as familiar as procedural operations to a developer used to procedural languages" ;-)
Doing it iteratively as you have done now is fine for small sets of data, but simply doesn't scale the same way. The answer to whether you did the right thing depends on whether you are satisfied with the performance right now and/or don't expect the amount of data to increase much.
If you could provide some sample code, we might be able to help you find a set-based solution, which will be faster to begin with and scale far, far better. As GalacticCowboy mentioned, techniques such as temporary tables can help make the statements far more readable while largely retaining the performance benefits.
In most RDBMS you have the option of temporary tables or local table variables that you can use to break up a task like this into manageable chunks.
I don't see any way to easily do this as a single query (without some nasty subqueries), but it still should be doable without dropping out to procedural code, if you use temp tables.
This problem may not have been solvable by one query. I see several distinct parts...
For one customer
Get a list of all products ordered (with product name)
Get year of first purchase
Get dates of last three purchases
Get comment on latest order
Get sum of product purchases for the last 12 months
Your procedure is steps 1 - 5 and SQL gets you the data.
Sounds like a data warehouse project to me. If you need things like "three most recent things" and "sum of something over the last 12 months" then store them i.e. denormalize.
EDIT: This is a completely new take on the solution, using no temp tables or strange sub-sub-sub queries. However, it will ONLY work on SQL 2005 or newer, as it uses the "pivot" command that is new in that version.
The fundamental problem is the desired pivot from a set of rows (in the data) into columns in the output. While noodling on the issue, I recalled that SQL Server now has a "pivot" operator to deal with this.
This works on SQL 2005 only, using the Northwind sample data.
-- This could be a parameter to a stored procedure
-- I picked this one because he has products that he ordered 4 or more times
declare #customerId nchar(5)
set #customerId = 'ERNSH'
select c.CustomerID, p.ProductName, products_ordered_by_cust.FirstOrderYear,
latest_order_dates_pivot.LatestOrder1 as LatestOrderDate,
latest_order_dates_pivot.LatestOrder2 as SecondLatestOrderDate,
latest_order_dates_pivot.LatestOrder3 as ThirdLatestOrderDate,
'If I had a comment field it would go here' as LatestOrderComment,
isnull(last_year_revenue_sum.ItemGrandTotal, 0) as LastYearIncome
from
-- Find all products ordered by customer, along with first year product was ordered
(
select c.CustomerID, od.ProductID,
datepart(year, min(o.OrderDate)) as FirstOrderYear
from Customers c
join Orders o on o.CustomerID = c.CustomerID
join [Order Details] od on od.OrderID = o.OrderID
group by c.CustomerID, od.ProductID
) products_ordered_by_cust
-- Find the grand total for product purchased within last year - note fudged date below (Northwind)
join (
select o.CustomerID, od.ProductID,
sum(cast(round((od.UnitPrice * od.Quantity) - ((od.UnitPrice * od.Quantity) * od.Discount), 2) as money)) as ItemGrandTotal
from
Orders o
join [Order Details] od on od.OrderID = o.OrderID
-- The Northwind database only contains orders from 1998 and earlier, otherwise I would just use getdate()
where datediff(yy, o.OrderDate, dateadd(year, -10, getdate())) = 0
group by o.CustomerID, od.ProductID
) last_year_revenue_sum on last_year_revenue_sum.CustomerID = products_ordered_by_cust.CustomerID
and last_year_revenue_sum.ProductID = products_ordered_by_cust.ProductID
-- THIS is where the magic happens. I will walk through the individual pieces for you
join (
select CustomerID, ProductID,
max([1]) as LatestOrder1,
max([2]) as LatestOrder2,
max([3]) as LatestOrder3
from
(
-- For all orders matching the customer and product, assign them a row number based on the order date, descending
-- So, the most recent is row # 1, next is row # 2, etc.
select o.CustomerID, od.ProductID, o.OrderID, o.OrderDate,
row_number() over (partition by o.CustomerID, od.ProductID order by o.OrderDate desc) as RowNumber
from Orders o join [Order Details] od on o.OrderID = od.OrderID
) src
-- Now, produce a pivot table that contains the first three row #s from our result table,
-- pivoted into columns by customer and product
pivot
(
max(OrderDate)
for RowNumber in ([1], [2], [3])
) as pvt
group by CustomerID, ProductID
) latest_order_dates_pivot on products_ordered_by_cust.CustomerID = latest_order_dates_pivot.CustomerID
and products_ordered_by_cust.ProductID = latest_order_dates_pivot.ProductID
-- Finally, join back to our other tables to get more details
join Customers c on c.CustomerID = products_ordered_by_cust.CustomerID
join Orders o on o.CustomerID = products_ordered_by_cust.CustomerID and o.OrderDate = latest_order_dates_pivot.LatestOrder1
join [Order Details] od on od.OrderID = o.OrderID and od.ProductID = products_ordered_by_cust.ProductID
join Products p on p.ProductID = products_ordered_by_cust.ProductID
where c.CustomerID = #customerId
order by CustomerID, p.ProductID
SQL queries return results in the form of a single "flat" table of rows and columns. Reporting requirements are often more complex than this, demanding a "jagged" set of results like your example. There is nothing wrong with "going procedural" to solve such requirements, or using a reporting tool that sits on top of the database. However, you should use SQL as far as possible to get the best performance from the database.