Trying to Find MAX value of a SUM query in SQL - sql

I have 2 tables:
Product(ProductID, ProductName, ProductPrice, VendorID, CategoryID)
SoldVia(ProductID, TID, NoOfItems)
I need to display the productID for the product that has been sold in the highest quantity. I can easily come up with the list sorted in descending order with this query:
SELECT distinct productid, sum(noofitems)
From soldvia
Group By productid
Order By sum(noofitems) DESC
My question is, how do I show only the top value of the list, using the MAX function? I can't use LIMIT or TOP for this assignment, but whenever I use MAX, I run into various issues with aggregates.
After I'm done with that, how do I show the product name for the best selling product?
Thank you!

Give this a try:
SELECT prd.ProductId
FROM Product prd
INNER JOIN SoldVia sld ON prd.ProductId = sld.ProductId
WHERE sld.NoOfItems = (SELECT MAX(NoOfItems) FROM SoldVia) -- Check for item that has max # items sold
This will return the product(s) with the largest single NoOfItems value; it does not sum NoOfItems per product.
Update
I didn't know you were on Teradata. That makes life much much easier :)
SELECT ProductName
FROM Product prd
INNER JOIN (
SELECT ProductId, SUM(NoOfItems) AS TotalItemsSold
FROM SoldVia
GROUP BY ProductId
QUALIFY RANK() OVER(ORDER BY TotalItemsSold DESC) = 1 -- Only return ProductId(s) with largest TotalItemsSold value (includes ties)
) agg ON prd.ProductId = agg.ProductId -- Get aggregate # items sold (if any)
This will only return rows if there are matching rows in both tables.
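QUALIFY is Teradata-specific. On engines without it, the same idea can be sketched by ranking inside a derived table and filtering in the outer query. The snippet below is only an illustration run against SQLite (3.25 or later, for window functions) from Python; the sample rows and product names are made up, and the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE SoldVia (ProductID INTEGER, TID INTEGER, NoOfItems INTEGER);
INSERT INTO Product VALUES (1, 'Tent'), (2, 'Stove'), (3, 'Lamp');
INSERT INTO SoldVia VALUES (1, 10, 2), (1, 11, 3), (2, 10, 5), (3, 12, 1);
""")

# Rank the per-product totals in a derived table, then keep rank 1 outside it.
# RANK() keeps ties, so two products with the same total are both returned.
best_sellers = conn.execute("""
SELECT prd.ProductName, agg.TotalItemsSold
FROM Product prd
INNER JOIN (
    SELECT ProductID,
           SUM(NoOfItems) AS TotalItemsSold,
           RANK() OVER (ORDER BY SUM(NoOfItems) DESC) AS rnk
    FROM SoldVia
    GROUP BY ProductID
) agg ON prd.ProductID = agg.ProductID
WHERE agg.rnk = 1
ORDER BY prd.ProductName
""").fetchall()

print(best_sellers)
```

With this sample data 'Tent' and 'Stove' both total 5 items, so both come back.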

This is a little simpler, and I think it should still work for you:
select productid, sum(noofitems) as itemsum
from soldvia
group by productid
having sum(noofitems) =
(select max(itemsum)
from
(select sum(noofitems) as itemsum
from soldvia
group by productid) t)
;

Based on @ravioli's answer, without a subselect.
Logically I would prefer the subselect (it reduces the number of rows early), but EXPLAIN shows that the subselect version uses one more step. I expect that to be different for a larger number of rows.
select
S.ProductID
, P.ProductName
, sum(NoOfItems) as TotalItemsSold
from SoldVia as S
inner join Product as P
on S.ProductID = P.ProductID
group by S.ProductID, P.ProductName
QUALIFY RANK() OVER(ORDER BY TotalItemsSold DESC) = 1 -- Only return ProductId(s) with largest TotalItemsSold
;

Related

SQL Query number of units in each category

Need help with an SQL Server query to get below results.
An SQL query to report how many units in each category have been ordered on each day of the week
This is current syntax
SELECT TOP 3 ProductID , ProductQty
FROM OrderDetails
ORDER BY ProductQty DESC;
Here is the image from the database
Any help is much appreciated.
Thank you
You can get the required result by splitting the logic into two parts.
First, get the total quantity per product from the order details.
Second, assign a category-wise rank to the products from highest to lowest and pull the first 3 products from each category.
;WITH CTE_Data AS (
SELECT PD.PrdCategory AS PrdCategory, OD.ProductID AS ProductID, SUM(ProductQty) AS ProductQty
FROM OrderDetails OD (NOLOCK)
INNER JOIN ProductDetails PD (NOLOCK) ON OD.ProductID = PD.PrdId
GROUP BY PD.PrdCategory, ProductID
)
, CTE_Data2 AS(
SELECT PrdCategory, ProductID, ProductQty, ROW_NUMBER() OVER(PARTITION BY PrdCategory ORDER BY ProductQty DESC) AS RowNo
FROM CTE_Data
)
SELECT ProductID, ProductQty
FROM CTE_Data2
WHERE RowNo IN (1,2,3)
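As a rough check of the two-step approach, here is a sketch run against SQLite from Python. The rows are invented, and the NOLOCK hints are dropped because they are SQL Server specific. Note that the window is partitioned by category alone: partitioning by product as well would put every row in its own partition and give each one row number 1.

```python
import sqlite3

# Hypothetical sample data; table/column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductDetails (PrdId INTEGER PRIMARY KEY, PrdCategory TEXT);
CREATE TABLE OrderDetails (ProductID INTEGER, ProductQty INTEGER);
INSERT INTO ProductDetails VALUES (1,'A'),(2,'A'),(3,'A'),(4,'A'),(5,'B'),(6,'B');
INSERT INTO OrderDetails VALUES (1,5),(1,5),(2,7),(3,1),(4,3),(5,2),(6,9);
""")

top3 = conn.execute("""
WITH totals AS (
    -- Step 1: total quantity per product, tagged with its category
    SELECT PD.PrdCategory, OD.ProductID, SUM(OD.ProductQty) AS ProductQty
    FROM OrderDetails OD
    INNER JOIN ProductDetails PD ON OD.ProductID = PD.PrdId
    GROUP BY PD.PrdCategory, OD.ProductID
), ranked AS (
    -- Step 2: number the products inside each category, largest total first
    SELECT PrdCategory, ProductID, ProductQty,
           ROW_NUMBER() OVER (PARTITION BY PrdCategory
                              ORDER BY ProductQty DESC) AS RowNo
    FROM totals
)
SELECT PrdCategory, ProductID, ProductQty
FROM ranked
WHERE RowNo <= 3
ORDER BY PrdCategory, RowNo
""").fetchall()

for row in top3:
    print(row)
```

Category A has four products in this sample, so its smallest seller (product 3) is dropped; category B has only two and keeps both.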

Select top 10 products sold in each year

I have two tables :
Sales
columns: (Sales_id, Date , Customer_id, Product_id, Purchase_amount):
Product
columns: ( Product_id, Product_Name, Brand_id,Brand_name)
I have to write a query to find the top 10 products sold every year. The query I have right now is :
WITH PH AS
(SELECT P.Product_Name, LEFT(S.Date,4) "SYEAR", COUNT(S.Product_id) "Product_Count"
FROM Sales S LEFT JOIN Product P
ON S.Product_Id=P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date,4))
SELECT Product_Name, "SYEAR", "Product_Count"
FROM (SELECT Product_Name, "SYEAR", "Product_Count",
RANK() OVER (PARTITION BY "SYEAR" ORDER BY "Product_Count" DESC) "TEMP"
FROM PH
) R
WHERE "TEMP"<=10
This doesn't seem like the most optimized query. Can you please help me with that? Can there be an alternate version to obtain the required result?
Notes
The main reason for the repetition of the code is to enable grouping by the year. There's no field for the year in the given table.
The date format is: YYYYMMDD (example: 20200630)
Any help will be appreciated. Thanks in advance
You can combine the window functions with the aggregation:
SELECT PY.*
FROM (SELECT P.Product_Name, LEFT(S.Date,4) AS YEAR, COUNT(*) AS CNT,
RANK() OVER (PARTITION BY LEFT(S.Date, 4) ORDER BY COUNT(*) DESC) AS SEQNUM
FROM Sales S LEFT JOIN
Product P
ON S.Product_Id = P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date, 4)
) PY
WHERE SEQNUM <= 10;
From a performance perspective, this probably generates an execution plan very similar to your query. It is however simpler to follow.
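To see the combined aggregation and window function at work, here is a small sketch against SQLite from Python. It is an illustration only: SQLite has no LEFT(), so substr(Date, 1, 4) stands in for it, the sample rows are invented, and the cut-off is 2 instead of 10 to keep the data small.

```python
import sqlite3

# Hypothetical sample data; Date is stored as YYYYMMDD text, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE Sales (Sales_id INTEGER PRIMARY KEY, Date TEXT, Product_id INTEGER);
INSERT INTO Product VALUES (1,'Pen'),(2,'Ink'),(3,'Pad');
INSERT INTO Sales (Date, Product_id) VALUES
    ('20190105',1),('20190210',1),('20190311',2),
    ('20200630',3),('20200701',3),('20200702',2),
    ('20200703',1),('20200704',2),('20200705',3);
""")

top_n = 2  # the question asks for 10; 2 is enough for this tiny sample

# The window function runs after GROUP BY, so it can rank the COUNT(*) values.
top_per_year = conn.execute("""
SELECT py.sale_year, py.Product_Name, py.cnt
FROM (
    SELECT P.Product_Name,
           substr(S.Date, 1, 4) AS sale_year,
           COUNT(*) AS cnt,
           RANK() OVER (PARTITION BY substr(S.Date, 1, 4)
                        ORDER BY COUNT(*) DESC) AS seqnum
    FROM Sales S
    LEFT JOIN Product P ON S.Product_id = P.Product_id
    GROUP BY P.Product_Name, substr(S.Date, 1, 4)
) py
WHERE py.seqnum <= ?
ORDER BY py.sale_year, py.seqnum
""", (top_n,)).fetchall()

for row in top_per_year:
    print(row)
```

Because RANK() keeps ties, a year can return more than top_n rows when several products share the last qualifying count.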

Fixing Nested aggregated function

I am trying to display the productid for the product that has been sold the most (i.e, that has been sold in the highest quantity)
I have tried multiple different versions of code but every time it says cannot nest aggregated operations
SELECT productid
FROM soldvia
GROUP BY productid
WHERE productid IN (SELECT MAX(SUM(noofitems)) FROM soldvia GROUP BY productid);
I expect the output to be
PRODUCTID
3x3
4x4
You can't nest aggregations.
Use ORDER BY with TOP :
SELECT TOP 1 productid
FROM soldvia
GROUP BY productid
ORDER BY SUM(noofitems) DESC
Please try the query below for your exact answer.
select productid, sum(noofitems) as max_sold,
convert(varchar,productid) +'x'+ convert(varchar,sum(noofitems)) as
output_sold from soldvia group by productid order by sum(noofitems) desc
Output will be
productid max_sold output_sold
1 7 1x7
2 4 2x4
3 1 3x1
In Teradata, you can use the qualify clause:
SELECT productid
FROM soldvia
GROUP BY productid
QUALIFY ROW_NUMBER() OVER (ORDER BY SUM(noofitems) DESC) = 1;
This is handy. You can get ties by changing ROW_NUMBER() to RANK(). Actually, RANK() is more consistent with the code in your question.
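The difference between the two functions only shows up when totals tie. A minimal sketch, run against SQLite from Python with invented rows (QUALIFY is replaced by a derived table, since SQLite does not have it):

```python
import sqlite3

# Hypothetical data: '3x3' and '4x4' both total 6 items, '5x5' totals 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soldvia (productid TEXT, tid INTEGER, noofitems INTEGER);
INSERT INTO soldvia VALUES ('3x3',1,4),('3x3',2,2),('4x4',1,6),('5x5',1,1);
""")

def top_products(rank_fn):
    """Return the productids ranked first by total items sold."""
    # rank_fn is checked against a fixed list before being placed in the SQL.
    if rank_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError(rank_fn)
    sql = f"""
    SELECT productid FROM (
        SELECT productid,
               {rank_fn}() OVER (ORDER BY SUM(noofitems) DESC) AS rnk
        FROM soldvia
        GROUP BY productid
    ) ranked
    WHERE rnk = 1
    ORDER BY productid
    """
    return [row[0] for row in conn.execute(sql)]

with_rank = top_products("RANK")              # both tied products
with_row_number = top_products("ROW_NUMBER")  # exactly one, chosen arbitrarily

print(with_rank, with_row_number)
```

RANK() returns both tied products, while ROW_NUMBER() returns exactly one of them, and which one is not defined unless the ORDER BY gets a tie-breaker.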
The answer by @forpas is probably the way to go but this one is a little closer to yours:
SELECT productid
FROM soldvia
GROUP BY productid
HAVING SUM(noofitems) = (
SELECT MAX(items)
FROM (
SELECT SUM(noofitems) AS items
FROM soldvia
GROUP BY productid
) x
)
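For anyone who wants to try this without a Teradata system, here is a sketch against SQLite from Python. It first shows that a nested aggregate is rejected there too, then runs the HAVING version. The rows are invented to match the expected output in the question.

```python
import sqlite3

# Hypothetical data: '3x3' and '4x4' tie on the highest total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soldvia (productid TEXT, tid INTEGER, noofitems INTEGER);
INSERT INTO soldvia VALUES ('3x3',1,4),('3x3',2,2),('4x4',1,6),('5x5',1,1);
""")

# MAX(SUM(...)) in a single query level is rejected by SQLite as well.
try:
    conn.execute("SELECT MAX(SUM(noofitems)) FROM soldvia GROUP BY productid")
    nested_aggregate_allowed = True
except sqlite3.OperationalError:
    nested_aggregate_allowed = False

# Compute the sums in a derived table, take their MAX one level up,
# and compare every product's own sum against it in HAVING.
best = [row[0] for row in conn.execute("""
SELECT productid
FROM soldvia
GROUP BY productid
HAVING SUM(noofitems) = (
    SELECT MAX(items)
    FROM (
        SELECT SUM(noofitems) AS items
        FROM soldvia
        GROUP BY productid
    ) x
)
ORDER BY productid
""")]

print(nested_aggregate_allowed, best)
```

This version needs neither TOP, LIMIT nor window functions, and it keeps ties.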

SQL query for table with multiple keys?

I am sorry if this seems too easy but I was asked this question and I couldn't answer even after preparing SQL thoroughly :(. Can someone answer this?
There's a table - Seller id, product id, warehouse id, quantity of products at each warehouse for each product as per each seller.
We have to list the Product Ids with Seller Id who has highest number of products for that product and the total number of units he has for that product.
I think I got confused because there were 3 keys in the table.
It's not quite clear which DBMS you are using currently. The below should work if your DBMS support window functions.
You can find count of rows for each product and seller, rank each seller within each product using window function rank and then use filter to get only top ranked sellers in each product along with count of units.
select
product_id,
seller_id,
no_of_products
from (
select
product_id,
seller_id,
count(*) no_of_products,
rank() over (partition by product_id order by count(*) desc) rnk
from your_table
group by
product_id,
seller_id
) t where rnk = 1;
If window functions are not supported, you can use correlated query to achieve the same effect:
select
product_id,
seller_id,
count(*) no_of_products
from your_table a
group by
product_id,
seller_id
having count(*) = (
select max(cnt)
from (
select count(*) cnt
from your_table b
where b.product_id = a.product_id
group by seller_id
) t
);
Don't know why having id columns would mess you up... group by the right columns, sum up the totals and just return the first row:
select *
from (
select sellerid, productid, sum(quantity) as total_sold
from theres_a_table
group by sellerid, productid
) x
order by total_sold desc
fetch first 1 row only
If I do not think about optimization, the straightforward answer is like this:
select *
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqo
inner join
(
select product_id, max(seller_prod_qty) as max_prod_qty
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqi
group by product_id
) pmaxq
on spqo.product_id = pmaxq.product_id
and spqo.seller_prod_qty = pmaxq.max_prod_qty
Both spqi (inner) and spqo (outer) give you seller, product, and the sum of quantity across warehouses. pmaxq gives you the maximum of those sums for each product, and the final inner join keeps a seller's total only if that seller has the highest (max) total for the product (there could be multiple sellers with the same quantity). I think this is the answer you are looking for. However, I'm sure the query can be improved, since what I'm posting is the "conceptual" one :)
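The same result can be sketched with a window function over the summed quantities. This is only an illustration against SQLite from Python: the table name stock, its column names and the rows are all assumptions, since the question does not give them.

```python
import sqlite3

# Hypothetical table and data: one row per seller, product and warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (seller_id INTEGER, product_id INTEGER,
                    warehouse_id INTEGER, quantity INTEGER);
INSERT INTO stock VALUES
    (1, 10, 1, 5), (1, 10, 2, 5),   -- seller 1 holds 10 units of product 10
    (2, 10, 1, 8),                  -- seller 2 holds 8
    (1, 20, 1, 1),
    (2, 20, 1, 2), (2, 20, 2, 2),   -- seller 2 holds 4 units of product 20
    (3, 20, 3, 4);                  -- seller 3 also holds 4 (a tie)
""")

# Sum across warehouses per (product, seller), rank the sums within each
# product, and keep the top-ranked seller(s).
top_sellers = conn.execute("""
SELECT product_id, seller_id, total_units
FROM (
    SELECT product_id,
           seller_id,
           SUM(quantity) AS total_units,
           RANK() OVER (PARTITION BY product_id
                        ORDER BY SUM(quantity) DESC) AS rnk
    FROM stock
    GROUP BY product_id, seller_id
) t
WHERE rnk = 1
ORDER BY product_id, seller_id
""").fetchall()

for row in top_sellers:
    print(row)
```

The warehouse column never appears in the GROUP BY, which is what collapses the third key: each seller's units are summed over all warehouses before the ranking happens.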

SQL question about GROUP BY

I've been using SQL for a few years, and this type of problem comes up here and there, and I haven't found an answer. But perhaps I've been looking in the wrong places - I'm not really sure what to call it.
For the sake of brevity, let's say I have a table with 3 columns: Customer, Order_Amount, Order_Date. Each customer may have multiple orders, with one row for each order with the amount and date.
My Question: Is there a simple way in SQL to get the DATE of the maximum order per customer?
I can get the amount of the maximum order for each customer (and which customer made it) by doing something like:
SELECT Customer, MAX(Order_Amount) FROM orders GROUP BY Customer;
But I also want to get the date of the max order, which I haven't figured out a way to easily get. I would have thought that this would be a common type of question for a database, and would therefore be easy to do in SQL, but I haven't found an easy way to do it yet. Once I add Order_Date to the list of columns to select, I need to add it to the Group By clause, which I don't think will give me what I want.
Using a self-join you can do:
SELECT o1.Customer, o1.Order_Amount, o1.Order_Date
FROM orders o1 JOIN orders o2 ON o1.Customer = o2.Customer
GROUP BY o1.Customer, o1.Order_Amount, o1.Order_Date
HAVING o1.Order_Amount = MAX(o2.Order_Amount);
There's a good article reviewing various approaches.
And in Oracle, DB2, Sybase, SQL Server 2005+ you would use RANK() OVER.
SELECT * FROM (
SELECT o.*,
RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) r
FROM orders o) ranked
WHERE r = 1;
Note: If Customer has more than one order with maximum Order_Amount (i.e. ties), using RANK() function would get you all such orders; to get only first one, replace RANK() with ROW_NUMBER().
There's no short-cut... the easiest way is probably to join to a sub-query:
SELECT
*
FROM
orders JOIN
(
SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
FROM orders
GROUP BY Customer
) maxOrder
ON maxOrder.Customer = orders.Customer
AND maxOrder.Max_Order_Amount = orders.Order_Amount
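Here is a small sketch of that join, run against SQLite from Python, to show that the date comes along with the maximum and what happens on ties. The customers, amounts and dates are invented.

```python
import sqlite3

# Hypothetical data: bob has two orders sharing his maximum amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT);
INSERT INTO orders VALUES
    ('alice', 100, '2020-01-01'),
    ('alice', 250, '2020-02-01'),
    ('bob',    75, '2020-03-01'),
    ('bob',    75, '2020-04-01');
""")

# The subquery finds each customer's maximum; joining back on both the
# customer and the amount recovers the full row, including Order_Date.
max_orders = conn.execute("""
SELECT o.Customer, o.Order_Amount, o.Order_Date
FROM orders o
JOIN (
    SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
    FROM orders
    GROUP BY Customer
) maxOrder
  ON maxOrder.Customer = o.Customer
 AND maxOrder.Max_Order_Amount = o.Order_Amount
ORDER BY o.Customer, o.Order_Date
""").fetchall()

for row in max_orders:
    print(row)
```

A customer with several orders at the maximum amount gets one row per such order, so add a tie-breaker if exactly one row per customer is required.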
You will want to join on the same table:
SELECT o.Customer, o.order_date, o2.amt
FROM orders o,
( SELECT Customer, MAX(Order_Amount) amt FROM orders GROUP BY Customer ) o2
WHERE o.customer = o2.customer
AND o.order_amount = o2.amt
;
Another approach for the collection:
WITH tempquery AS
(
SELECT
Customer
,Order_Amount
,Order_Date
,row_number() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS rn
FROM
orders
)
SELECT
Customer
,Order_Amount
,Order_Date
FROM
tempquery
WHERE
rn = 1
If your DB Supports CROSS APPLY you can do this as well, but it doesn't handle ties correctly
SELECT [....]
FROM Customer c
CROSS APPLY
(SELECT TOP 1 [...]
FROM Orders o
WHERE c.customerID = o.CustomerID
ORDER BY o.Order_Amount DESC) o
See this data.SE query
You could try something like this:
SELECT Customer, MAX(Order_Amount), Order_Date
FROM orders O
WHERE ORDER_AMOUNT = (SELECT MAX(ORDER_AMOUNT) FROM orders WHERE CUSTOMER = O.CUSTOMER)
GROUP BY CUSTOMER, Order_Date
with t as
(
select Customer, Order_Date, Order_Amount,
max(Order_Amount) over (partition by Customer) as max_amount
from orders
)
select * from t where t.Order_Amount=max_amount