Select top 10 products sold in each year - sql

I have two tables:
Sales
columns: (Sales_id, Date, Customer_id, Product_id, Purchase_amount)
Product
columns: (Product_id, Product_Name, Brand_id, Brand_name)
I have to write a query to find the top 10 products sold every year. The query I have right now is:
WITH PH AS
(SELECT P.Product_Name, LEFT(S.Date, 4) AS SYEAR, COUNT(S.Product_id) AS Product_Count
FROM Sales S LEFT JOIN Product P
ON S.Product_Id = P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date, 4))
SELECT Product_Name, SYEAR, Product_Count
FROM (SELECT Product_Name, SYEAR, Product_Count,
RANK() OVER (PARTITION BY SYEAR ORDER BY Product_Count DESC) AS TEMP
FROM PH
) R
WHERE TEMP <= 10
This doesn't seem like the most optimized query. Can you please help me with that? Is there an alternative query that produces the required result?
Notes
The main reason for the repetition of the code is to enable grouping by the year. There's no field for the year in the given table.
The date format is: YYYYMMDD (example: 20200630)
Any help will be appreciated. Thanks in advance.

You can combine the window functions with the aggregation:
SELECT PY.*
FROM (SELECT P.Product_Name, LEFT(S.Date,4) AS YEAR, COUNT(*) AS CNT,
RANK() OVER (PARTITION BY LEFT(S.Date, 4) ORDER BY COUNT(*) DESC) AS SEQNUM
FROM Sales S LEFT JOIN
Product P
ON S.Product_Id = P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date, 4)
) PY
WHERE SEQNUM <= 10;
From a performance perspective, this probably generates an execution plan very similar to your query's. It is, however, simpler to follow.
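To see the combined window-and-aggregate query run end to end, here is a minimal Python sketch that uses the standard-library sqlite3 module on a tiny in-memory copy of the two tables. It assumes SQLite 3.25 or later (needed for window functions). SQLite has no LEFT(), so substr(S.Date, 1, 4) extracts the year instead. The tables are trimmed to the columns the query uses, and the sample rows are invented for illustration.

```python
import sqlite3

def top_products_per_year(conn, n):
    """Return (syear, product_name, cnt, seqnum) for the top-n products of each year."""
    sql = """
        SELECT syear, product_name, cnt, seqnum
        FROM (SELECT P.Product_Name AS product_name,
                     substr(S.Date, 1, 4) AS syear,     -- YYYYMMDD -> YYYY
                     COUNT(*) AS cnt,
                     RANK() OVER (PARTITION BY substr(S.Date, 1, 4)
                                  ORDER BY COUNT(*) DESC) AS seqnum
              FROM Sales S
              LEFT JOIN Product P ON S.Product_Id = P.Product_Id
              GROUP BY P.Product_Name, substr(S.Date, 1, 4)) PY
        WHERE seqnum <= ?
        ORDER BY syear, seqnum, product_name
    """
    return conn.execute(sql, (n,)).fetchall()

# Hypothetical sample data containing only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (Product_id INTEGER, Product_Name TEXT);
    CREATE TABLE Sales (Sales_id INTEGER, Date TEXT, Product_id INTEGER);
    INSERT INTO Product VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
    INSERT INTO Sales VALUES
        (1, '20190105', 1), (2, '20190210', 1), (3, '20190311', 2),
        (4, '20190412', 3), (5, '20200630', 2), (6, '20200701', 2),
        (7, '20200702', 3);
""")

for row in top_products_per_year(conn, 2):
    print(row)
```

Because RANK() gives tied products the same rank, a year can return more than n rows (here 2019 returns three rows for n = 2). ROW_NUMBER() would cap each year at exactly n rows.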

Related

Trying to Find MAX value of a SUM query in SQL

I have 2 tables:
Product(ProductID, ProductName, ProductPrice, VendorID, CategoryID)
SoldVia(ProductID, TID, NoOfItems)
I need to display the productID for the product that has been sold in the highest quantity. I can easily come up with the list sorted in descending order with this query:
SELECT distinct productid, sum(noofitems)
From soldvia
Group By productid
Order By sum(noofitems) DESC
My question is: how do I show only the top value of the list using the MAX function? I can't use LIMIT or TOP for this assignment, but whenever I use MAX, I run into various issues with aggregates.
After I'm done with that, how do I show the product name for the best selling product?
Thank you!
Give this a try:
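SELECT prd.ProductId, prd.ProductName
FROM Product prd
INNER JOIN SoldVia sld ON prd.ProductId = sld.ProductId
GROUP BY prd.ProductId, prd.ProductName
HAVING SUM(sld.NoOfItems) = ( -- Compare each product's total with the largest total
SELECT MAX(TotalItems)
FROM (SELECT SUM(NoOfItems) AS TotalItems FROM SoldVia GROUP BY ProductId) t
)
This will return the product(s) with the highest aggregate value of NoOfItems (ties included).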
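The MAX-over-totals idea can be checked with a small Python sketch on an in-memory SQLite database (standard-library sqlite3 module). The Product table is trimmed to two columns, and the sample rows are invented so that the largest single NoOfItems row (Ink, 6) does not belong to the product with the largest total (Pen, 3 + 4 = 7). That is exactly the case where comparing each row against MAX(NoOfItems) would go wrong.

```python
import sqlite3

# Best seller(s) without LIMIT/TOP: keep the groups whose total equals
# the maximum of all per-product totals (ties are all returned).
BEST_SELLERS = """
    SELECT p.ProductID, p.ProductName, SUM(s.NoOfItems) AS TotalItemsSold
    FROM Product p
    JOIN SoldVia s ON s.ProductID = p.ProductID
    GROUP BY p.ProductID, p.ProductName
    HAVING SUM(s.NoOfItems) = (
        SELECT MAX(total)
        FROM (SELECT SUM(NoOfItems) AS total
              FROM SoldVia
              GROUP BY ProductID) totals
    )
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE SoldVia (ProductID INTEGER, TID INTEGER, NoOfItems INTEGER);
    INSERT INTO Product VALUES (1, 'Pen'), (2, 'Pad'), (3, 'Ink');
    INSERT INTO SoldVia VALUES (1, 100, 3), (1, 101, 4), (2, 102, 5), (3, 103, 6);
""")

print(conn.execute(BEST_SELLERS).fetchall())
```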
Update
I didn't know you were on Teradata. That makes life much, much easier :)
SELECT ProductName
FROM Product prd
INNER JOIN (
SELECT ProductId, SUM(NoOfItems) AS TotalItemsSold
FROM SoldVia
GROUP BY ProductId
QUALIFY RANK() OVER(ORDER BY TotalItemsSold DESC) = 1 -- Only return ProductId(s) with largest TotalItemsSold value (includes ties)
) agg ON prd.ProductId = agg.ProductId -- Get aggregate # items sold (if any)
This will only return rows if there are matching rows in both tables.
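This is a little simpler (it skips the join to Product), and I think it should still work for you:
select productid, itemsum
from (select productid, sum(noofitems) as itemsum
from soldvia
group by productid) t
where itemsum = (select max(itemsum)
from (select sum(noofitems) as itemsum
from soldvia
group by productid) t2);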
Based on @ravioli's answer, but without a subselect.
Logically I would prefer the subselect (it reduces the number of rows early), but EXPLAIN shows that the subselect version uses one more step. I expect this to be different for larger numbers of rows.
select
S.ProductID
, P.ProductName
, sum(NoOfItems) as TotalItemsSold
from SoldVia as S
inner join Product as P
on S.ProductID = P.ProductID
group by S.ProductID, P.ProductName
QUALIFY RANK() OVER(ORDER BY TotalItemsSold DESC) = 1 -- Only return ProductId(s) with largest TotalItemsSold
;
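SQLite, used here only to make the idea testable, has no QUALIFY clause. This Python sketch (standard-library sqlite3, in-memory database) therefore moves the RANK() filter into an outer query, which is the usual portable equivalent of the Teradata query above. The hypothetical data gives Pen and Pad the same total (7) to show that RANK() = 1 keeps both tied products.

```python
import sqlite3

# Portable stand-in for QUALIFY: rank in a derived table, filter outside it.
TOP_RANKED = """
    SELECT ProductID, ProductName, TotalItemsSold
    FROM (SELECT S.ProductID AS ProductID,
                 P.ProductName AS ProductName,
                 SUM(S.NoOfItems) AS TotalItemsSold,
                 RANK() OVER (ORDER BY SUM(S.NoOfItems) DESC) AS rnk
          FROM SoldVia AS S
          JOIN Product AS P ON S.ProductID = P.ProductID
          GROUP BY S.ProductID, P.ProductName) ranked
    WHERE rnk = 1
    ORDER BY ProductID
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE SoldVia (ProductID INTEGER, TID INTEGER, NoOfItems INTEGER);
    INSERT INTO Product VALUES (1, 'Pen'), (2, 'Pad'), (3, 'Ink');
    INSERT INTO SoldVia VALUES (1, 100, 3), (1, 101, 4), (2, 102, 7), (3, 103, 6);
""")

print(conn.execute(TOP_RANKED).fetchall())
```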

SQL query for table with multiple keys?

I am sorry if this seems too easy but I was asked this question and I couldn't answer even after preparing SQL thoroughly :(. Can someone answer this?
There's a table with the columns seller id, product id, warehouse id, and the quantity of each product that each seller holds at each warehouse.
We have to list each product ID together with the ID of the seller who has the most units of that product, and the total number of units that seller has for it.
I think I got confused because there were 3 keys in the table.
It's not quite clear which DBMS you are currently using. The query below should work if your DBMS supports window functions.
You can count the rows for each product and seller, rank each seller within each product using the window function rank(), and then filter to keep only the top-ranked sellers for each product along with their count.
select
product_id,
seller_id,
no_of_products
from (
select
product_id,
seller_id,
count(*) no_of_products,
rank() over (partition by product_id order by count(*) desc) rnk
from your_table
group by
product_id,
seller_id
) t where rnk = 1;
If window functions are not supported, you can use a correlated subquery to achieve the same effect:
select
product_id,
seller_id,
count(*) no_of_products
from your_table a
group by
product_id,
seller_id
having count(*) = (
select max(cnt)
from (
select count(*) cnt
from your_table b
where b.product_id = a.product_id
group by seller_id
) t
);
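The question asks for total units, so this Python sketch of the window-function approach sums a quantity column rather than counting rows. The names your_table, seller_id, product_id, warehouse_id and quantity are assumptions, since the question doesn't give them. It runs on an in-memory SQLite database via the standard-library sqlite3 module, and the sample data is invented so that product 20 has two tied sellers.

```python
import sqlite3

TOP_SELLERS = """
    SELECT product_id, seller_id, total_units
    FROM (SELECT product_id, seller_id,
                 SUM(quantity) AS total_units,    -- units across all warehouses
                 RANK() OVER (PARTITION BY product_id
                              ORDER BY SUM(quantity) DESC) AS rnk
          FROM your_table
          GROUP BY product_id, seller_id) t
    WHERE rnk = 1
    ORDER BY product_id, seller_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (seller_id INTEGER, product_id INTEGER,
                             warehouse_id INTEGER, quantity INTEGER);
    INSERT INTO your_table VALUES
        (1, 10, 100, 5), (1, 10, 101, 5),   -- seller 1, product 10: 2 rows, 10 units
        (2, 10, 100, 12),                   -- seller 2, product 10: 1 row, 12 units
        (1, 20, 100, 2), (2, 20, 101, 2),   -- product 20: sellers 1 and 2 tie
        (3, 20, 100, 1);
""")

print(conn.execute(TOP_SELLERS).fetchall())
```

Counting rows instead, as count(*) does, would pick seller 1 for product 10, because that seller has two warehouse rows but fewer units.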
I don't know why having ID columns would confuse you... Group by the right columns, sum up the totals, and return just the first row:
select *
from (
select sellerid, productid, sum(quantity) as total_sold
from theres_a_table
group by sellerid, productid
) x
order by total_sold desc
fetch first 1 row only
If I don't think about optimization, the straightforward answer is this:
select *
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqo
inner join
(
select product_id, max(seller_prod_qty) as max_prod_qty
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqi
group by product_id
) pmaxq
on spqo.product_id = pmaxq.product_id
and spqo.seller_prod_qty = pmaxq.max_prod_qty
Both spqi (inner) and spqo (outer) give you seller, product, and the sum of quantity across warehouses. pmaxq gives the maximum per-seller total for each product, again across warehouses, and the final inner join then picks up the sum of quantities if the seller has the highest (max) amount of the product (there could be multiple sellers with the same quantity). I think this is the answer you are looking for. However, I'm sure the query can be improved, since what I'm posting is the "conceptual" one :)
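The conceptual query computes the seller/product totals twice (spqi and spqo). One possible tidy-up is sketched here in Python against an in-memory SQLite database (standard-library sqlite3). It names that subquery once in a CTE and joins it to the per-product maximum, so the join-on-max logic is unchanged. The table and column names (your_table, product_qty) follow the answer, and the rows are made up.

```python
import sqlite3

TOP_SELLERS_JOIN = """
    WITH seller_totals AS (          -- plays the role of spqi/spqo
        SELECT seller_id, product_id, SUM(product_qty) AS seller_prod_qty
        FROM your_table
        GROUP BY seller_id, product_id
    ),
    product_max AS (                 -- plays the role of pmaxq
        SELECT product_id, MAX(seller_prod_qty) AS max_prod_qty
        FROM seller_totals
        GROUP BY product_id
    )
    SELECT st.seller_id, st.product_id, st.seller_prod_qty
    FROM seller_totals st
    JOIN product_max pm
      ON st.product_id = pm.product_id
     AND st.seller_prod_qty = pm.max_prod_qty
    ORDER BY st.product_id, st.seller_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (seller_id INTEGER, product_id INTEGER,
                             warehouse_id INTEGER, product_qty INTEGER);
    INSERT INTO your_table VALUES
        (1, 10, 100, 5), (1, 10, 101, 5), (2, 10, 100, 8),
        (1, 20, 100, 2), (2, 20, 101, 2), (3, 20, 100, 1);
""")

print(conn.execute(TOP_SELLERS_JOIN).fetchall())
```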

Nested query missing expression

SELECT customer_id, company_code
FROM customer, commercial_cust
WHERE commercial_cust.FK_customer_id = customer.customer_id
(
SELECT payment_method, payment_date
FROM payment, cust_order
WHERE payment_link.FK_order_id = cust_order.order_id
(
SELECT order_id, payment_date, SUM(payment_ammount) payment_ammount
FROM cust_order, payment_link, payment
WHERE cust_order.FK_customer_id = customer.customer_id AND payment_link.FK_payment_id = payment.payment_id
)
GROUP BY payment_ammount.DESC
)
WHERE ROWNUM <=(SELECT COUNT(*) FROM cust_order)/4;
For a database assignment I have been asked to display a list of the 25% most lucrative commercial customers. I have written this out, but I keep getting a "missing expression" error and I'm not sure where I am meant to put the semicolon (if it is the semicolon). I've tried moving it around and removing parts of the script, but it doesn't seem to work.
Any help would be hugely appreciated. The rest of the code is correct in terms of names etc.
You should have shown what your tables contain, how they are related, and should have given sample data and expected result. So my answer may not match your requirement completely.
Let's simply look at how much a customer ordered: sum the order amount per customer, make sure the customer is a "commercial customer" and divide the results into 4 blocks keeping only the first (i.e. highest ranking) block.
select customer_id, sum_amount
from
(
select
fk_customer_id as customer_id,
sum(order_amount) as sum_amount,
ntile(4) over (order by sum(order_amount) desc) as block
from cust_order
where fk_customer_id in (select fk_customer_id from commercial_cust)
group by fk_customer_id
)
where block = 1
order by sum_amount desc;
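To illustrate what ntile(4) does, here is a Python sketch that runs the first query on SQLite (standard-library sqlite3; window functions need SQLite 3.25+). The order_amount column follows the answer's own assumption. The data is invented: eight commercial customers split into four blocks of two, plus a non-commercial customer whose large order must be ignored.

```python
import sqlite3

TOP_QUARTER = """
    SELECT customer_id, sum_amount
    FROM (SELECT fk_customer_id AS customer_id,
                 SUM(order_amount) AS sum_amount,
                 NTILE(4) OVER (ORDER BY SUM(order_amount) DESC) AS block
          FROM cust_order
          WHERE fk_customer_id IN (SELECT fk_customer_id FROM commercial_cust)
          GROUP BY fk_customer_id) t
    WHERE block = 1                  -- first of four blocks = top 25%
    ORDER BY sum_amount DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commercial_cust (fk_customer_id INTEGER, company_code TEXT);
    CREATE TABLE cust_order (order_id INTEGER, fk_customer_id INTEGER,
                             order_amount INTEGER);
    INSERT INTO commercial_cust VALUES
        (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'),
        (5, 'E'), (6, 'F'), (7, 'G'), (8, 'H');
    INSERT INTO cust_order VALUES
        (1, 1, 10), (2, 2, 20), (3, 3, 30), (4, 4, 40),
        (5, 5, 50), (6, 6, 60), (7, 7, 70), (8, 8, 80),
        (9, 8, 5),          -- customer 8 ordered twice: 85 in total
        (10, 99, 1000);     -- customer 99 is not commercial
""")

print(conn.execute(TOP_QUARTER).fetchall())
```

When the number of customers isn't divisible by four, ntile puts the extra rows into the lower-numbered blocks, so block 1 can be slightly larger than a quarter.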
If you want to use the payments instead, then do the same but join the payments to the orders and use that amount:
select customer_id, sum_amount
from
(
select
o.fk_customer_id as customer_id,
sum(p.payment_ammount) as sum_amount,
ntile(4) over (order by sum(p.payment_ammount) desc) as block
from cust_order o
join payment_link pl on pl.fk_order_id = o.order_id
join payment p on p.payment_id = pl.fk_payment_id
where o.fk_customer_id in (select fk_customer_id from commercial_cust)
group by o.fk_customer_id
)
where block = 1
order by sum_amount desc;
Try this query instead; your original has a lot of semantic and syntax errors:
SELECT customer_id, company_code, total_paid
FROM (
SELECT customer.customer_id, commercial_cust.company_code, SUM(payment.payment_ammount) AS total_paid
FROM customer, commercial_cust, cust_order, payment_link, payment
WHERE commercial_cust.FK_customer_id = customer.customer_id
AND cust_order.FK_customer_id = customer.customer_id
AND payment_link.FK_order_id = cust_order.order_id
AND payment_link.FK_payment_id = payment.payment_id
GROUP BY customer.customer_id, commercial_cust.company_code
ORDER BY total_paid DESC
)
-- keep the top 25% of commercial customers
WHERE ROWNUM <= (SELECT COUNT(*) FROM commercial_cust) / 4;

Summing the most recent rows, grouped by the id

SELECT distinct on (prices.item_id) *
FROM prices
ORDER BY prices.item_id, prices.updated_at DESC
The above query retrieves the most recent prices, how would I get the total sum of all the current prices?
Is it possible without using a subselect?
This is trivial using a subquery:
select sum(p.price)
from (select distinct on (p.item_id) p.*
from prices p
order by p.item_id, p.updated_at desc
) p
If you don't mind repeated rows, you might try the following. Be aware, though, that window functions are evaluated before DISTINCT ON, so sum(p.price) over () adds up every row in prices, not just the most recent ones:
select distinct on (p.item_id) sum(p.price) over ()
from prices p
order by p.item_id, p.updated_at desc
A limit clause would not change that total. By the way, I would write this as:
select sum(p.price)
from (select p.*,
row_number() over (partition by p.item_id order by updated_at desc) as seqnum
from prices p
order by p.item_id, p.updated_at desc
) p
where seqnum = 1
ROW_NUMBER() is standard SQL. The DISTINCT ON clause is specific to Postgres.
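SQLite has no DISTINCT ON, so this Python sketch (standard-library sqlite3, in-memory database) exercises only the portable ROW_NUMBER() version. The prices rows are invented, and updated_at is stored as ISO-8601 text so that sorting the strings also sorts the dates.

```python
import sqlite3

LATEST_TOTAL = """
    SELECT SUM(latest.price)
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY p.item_id
                                    ORDER BY p.updated_at DESC) AS seqnum
          FROM prices p) latest
    WHERE latest.seqnum = 1          -- keep only the newest row per item
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (item_id INTEGER, price INTEGER, updated_at TEXT);
    INSERT INTO prices VALUES
        (1, 10, '2024-01-01'), (1, 12, '2024-02-01'),
        (2, 5, '2024-01-15'),
        (3, 7, '2024-01-01'), (3, 6, '2024-03-01');
""")

total = conn.execute(LATEST_TOTAL).fetchone()[0]
print(total)  # 23, i.e. 12 + 5 + 6
```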

SQL question about GROUP BY

I've been using SQL for a few years, and this type of problem comes up here and there, and I haven't found an answer. But perhaps I've been looking in the wrong places - I'm not really sure what to call it.
For the sake of brevity, let's say I have a table with 3 columns: Customer, Order_Amount, Order_Date. Each customer may have multiple orders, with one row for each order with the amount and date.
My Question: Is there a simple way in SQL to get the DATE of the maximum order per customer?
I can get the amount of the maximum order for each customer (and which customer made it) by doing something like:
SELECT Customer, MAX(Order_Amount) FROM orders GROUP BY Customer;
But I also want to get the date of the max order, which I haven't figured out a way to easily get. I would have thought that this would be a common type of question for a database, and would therefore be easy to do in SQL, but I haven't found an easy way to do it yet. Once I add Order_Date to the list of columns to select, I need to add it to the Group By clause, which I don't think will give me what I want.
One option is a self-join:
SELECT o1.Customer, o1.Order_Amount, o1.Order_Date
FROM orders o1 JOIN orders o2 ON o1.Customer = o2.Customer
GROUP BY o1.Customer, o1.Order_Amount, o1.Order_Date
HAVING o1.Order_Amount = MAX(o2.Order_Amount);
There's a good article reviewing various approaches.
In Oracle, Db2, Sybase, and SQL Server 2005+ you would use RANK() OVER:
SELECT * FROM (
SELECT orders.*,
RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) r
FROM orders) o
WHERE r = 1;
Note: If a customer has more than one order with the maximum Order_Amount (i.e. ties), RANK() returns all such orders; to get only one of them, replace RANK() with ROW_NUMBER().
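The difference between RANK() and ROW_NUMBER() on ties can be seen with a small Python sketch on an in-memory SQLite database (standard-library sqlite3). The data is invented: Alice has two orders of 100, her maximum. Because ROW_NUMBER() must pick one of the tied rows, the sketch adds Order_Date as a tie-breaker (an assumption: the earliest order wins). Without a tie-breaker, the chosen row is arbitrary.

```python
import sqlite3

RANKED = """
    SELECT Customer, Order_Amount, Order_Date
    FROM (SELECT orders.*,
                 RANK() OVER (PARTITION BY Customer
                              ORDER BY Order_Amount DESC) AS r
          FROM orders) o
    WHERE r = 1
    ORDER BY Customer, Order_Date
"""

NUMBERED = """
    SELECT Customer, Order_Amount, Order_Date
    FROM (SELECT orders.*,
                 ROW_NUMBER() OVER (PARTITION BY Customer
                                    ORDER BY Order_Amount DESC,
                                             Order_Date  -- assumed tie-breaker
                                   ) AS r
          FROM orders) o
    WHERE r = 1
    ORDER BY Customer
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT);
    INSERT INTO orders VALUES
        ('Alice', 100, '2020-01-01'), ('Alice', 100, '2020-02-01'),
        ('Alice', 50, '2020-03-01'), ('Bob', 70, '2020-01-05');
""")

print(conn.execute(RANKED).fetchall())    # both of Alice's 100 orders
print(conn.execute(NUMBERED).fetchall())  # one row per customer
```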
There's no short-cut... the easiest way is probably to join to a sub-query:
SELECT
*
FROM
orders JOIN
(
SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
FROM orders
GROUP BY Customer
) maxOrder
ON maxOrder.Customer = orders.Customer
AND maxOrder.Max_Order_Amount = orders.Order_Amount
You will want to join on the same table:
SELECT o.Customer, o.Order_Date, o2.amt
FROM orders o,
( SELECT Customer, MAX(Order_Amount) amt FROM orders GROUP BY Customer ) o2
WHERE o.Customer = o2.Customer
AND o.Order_Amount = o2.amt
;
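Both join-based answers follow the same pattern: compute each customer's maximum in a grouped subquery, then join back to orders on customer and amount to recover the date. A minimal Python sketch on an in-memory SQLite database (standard-library sqlite3, invented rows) shows that, unlike TOP 1 or ROW_NUMBER(), this pattern returns every tied order.

```python
import sqlite3

MAX_ORDER_DATES = """
    SELECT o.Customer, o.Order_Amount, o.Order_Date
    FROM orders o
    JOIN (SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
          FROM orders
          GROUP BY Customer) maxOrder
      ON maxOrder.Customer = o.Customer              -- same customer
     AND maxOrder.Max_Order_Amount = o.Order_Amount  -- and that customer's max
    ORDER BY o.Customer, o.Order_Date
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT);
    INSERT INTO orders VALUES
        ('Alice', 100, '2020-01-01'), ('Alice', 100, '2020-02-01'),
        ('Alice', 50, '2020-03-01'), ('Bob', 70, '2020-01-05');
""")

print(conn.execute(MAX_ORDER_DATES).fetchall())
```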
Another approach for the collection:
WITH tempquery AS
(
SELECT
Customer
,Order_Amount
,Order_Date
,row_number() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS rn
FROM
orders
)
SELECT
Customer
,Order_Amount
,Order_Date
FROM
tempquery
WHERE
rn = 1
If your DBMS supports CROSS APPLY, you can do this as well, but it doesn't handle ties correctly (TOP 1 keeps only one order per customer):
SELECT [....]
FROM Customer c
CROSS APPLY
(SELECT TOP 1 [...]
FROM Orders o
WHERE c.customerID = o.CustomerID
ORDER BY o.Order_Amount DESC) o
See this data.SE query
You could try something like this:
SELECT Customer, MAX(Order_Amount), Order_Date
FROM orders O
WHERE ORDER_AMOUNT = (SELECT MAX(ORDER_AMOUNT) FROM orders WHERE CUSTOMER = O.CUSTOMER)
GROUP BY CUSTOMER, Order_Date
with t as
(
select Customer, Order_Date, Order_Amount,
max(Order_Amount) over (partition by Customer) as max_amount
from orders
)
select * from t where t.Order_Amount = max_amount