Top selling product by category

I am looking to write a query that pulls the top-selling product per category from a schema. A simplified view of the schema looks like this:
Category | Orderid | Revenue
Food | 12as | 234
Sport | 421bb | 3434
Steel | 35366cd | 12355
Food | 3421ww | 362
Sport | 546421qw | 436456
etc.
I am using Amazon Redshift. I want to find each distinct category, its top-selling order ID, and the sum of the revenue.
Select distinct category, orderid, sum(revenue) as rev from XXX
I've got the start of the query, but I'm not sure where to go from here.

You can use the ROW_NUMBER window function, partitioning by your grouping column (Category) and ordering by Revenue descending, then keep only the top-selling row of each partition:
SELECT Category,Orderid,Revenue
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Category ORDER BY Revenue desc) rn
FROM XXX
) t1
WHERE rn = 1
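A minimal runnable sketch of the ROW_NUMBER approach, using Python's built-in sqlite3 module with an in-memory database as a stand-in for Redshift. It assumes the bundled SQLite is 3.25 or newer, which is the version that added window functions. The table name orders is hypothetical; its rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the Redshift table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, orderid TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Food", "12as", 234),
        ("Sport", "421bb", 3434),
        ("Steel", "35366cd", 12355),
        ("Food", "3421ww", 362),
        ("Sport", "546421qw", 436456),
    ],
)

# Number the rows inside each category by descending revenue and keep row 1.
top_per_category = conn.execute(
    """
    SELECT category, orderid, revenue
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
        FROM orders
    ) AS t1
    WHERE rn = 1
    ORDER BY category
    """
).fetchall()
print(top_per_category)
```

ROW_NUMBER picks exactly one row per category even when two orders tie on revenue. If you want every tied order instead, RANK() is the usual substitute.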

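See if the following works for you. It uses a window function to find the OrderId with the highest revenue, then aggregates per category. Redshift expects an explicit frame clause when FIRST_VALUE has an ORDER BY, so the query includes one:
select Category, Sum(Revenue) TotalRevenue, Max(TopSelling) TopSelling
from (
select *,
First_Value(OrderId) over(partition by category order by Revenue desc
rows between unbounded preceding and unbounded following) TopSelling
from t
) t
group by Category;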
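Here is a runnable sketch of the FIRST_VALUE-then-aggregate idea. It uses Python's sqlite3 module in place of Redshift and assumes SQLite 3.25+ for window functions. The table name t follows the answer, and the rows are the question's sample data. Each category's total revenue is returned alongside its top-selling order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, orderid TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("Food", "12as", 234),
        ("Sport", "421bb", 3434),
        ("Steel", "35366cd", 12355),
        ("Food", "3421ww", 362),
        ("Sport", "546421qw", 436456),
    ],
)

rows = conn.execute(
    """
    SELECT category, SUM(revenue) AS total_revenue, MAX(top_selling) AS top_selling
    FROM (
        -- every row carries the orderid of its category's highest-revenue order
        SELECT *,
               FIRST_VALUE(orderid) OVER (
                   PARTITION BY category ORDER BY revenue DESC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS top_selling
        FROM t
    ) AS t2
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(rows)
```

MAX(top_selling) only collapses identical values here, because every row in a category already carries the same top orderid.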

If I understand correctly what your result should look like, you can use PostgreSQL's DISTINCT ON expression. This is not to be confused with DISTINCT: it "groups" by the specified columns and returns the first row of each group, and you control which row comes first with ORDER BY. Note that DISTINCT ON is PostgreSQL-specific and is not supported by Amazon Redshift.
create temporary table orders (
category varchar,
order_id varchar,
revenue int
);
insert into orders (category, order_id, revenue) values
('Food', '12as', 234),
('Sport', '421bb', 3434),
('Steel', '35366cd', 12355),
('Food', '3421ww', 362),
('Sport', '546421qw', 436456);
select distinct on (category)
category,
order_id,
revenue
from orders
order by category, revenue DESC;
/*
category | order_id | revenue
----------+----------+---------
Food | 3421ww | 362
Sport | 546421qw | 436456
Steel | 35366cd | 12355
*/
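SQLite (and Redshift) have no DISTINCT ON, so the following pure-Python sketch only illustrates what the clause does: sort by the ORDER BY keys, then keep the first row of each DISTINCT ON group. The helper distinct_on_first is hypothetical and exists only for this illustration.

```python
from itertools import groupby
from operator import itemgetter

orders = [
    ("Food", "12as", 234),
    ("Sport", "421bb", 3434),
    ("Steel", "35366cd", 12355),
    ("Food", "3421ww", 362),
    ("Sport", "546421qw", 436456),
]


def distinct_on_first(rows, key, order):
    """Mimic SELECT DISTINCT ON (key) ... ORDER BY ...: sort the rows,
    then keep only the first row of each run of equal keys."""
    ordered = sorted(rows, key=order)
    return [next(group) for _, group in groupby(ordered, key=key)]


result = distinct_on_first(
    orders,
    key=itemgetter(0),               # DISTINCT ON (category)
    order=lambda r: (r[0], -r[2]),   # ORDER BY category, revenue DESC
)
print(result)
```

As in PostgreSQL, the leading ORDER BY key must match the DISTINCT ON column. Otherwise equal keys would not be adjacent and the "first row" would be ill-defined.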

Related

Just another SQL case (GROUP BY)

I'm stuck on an SQL problem that I don't know how to solve.
Let's say I have a table like this (concerning estimations on house prices):
estimationID | estimationDate | userID | cityID
1 | '2020-01-01' | 123456 | 987654
2 | '2020-12-01' | 135790 | 975310
...
Here estimationDate is the date when the estimation was made, userID is the ID of the user who made the estimation, and cityID is the ID of the city where the estimation was made.
I need to get the maximum number of estimations made by one user (I don't care which one, I don't need an ID) for each city.
Something like
SELECT cityID,*maximum number of estimations made by one user from this city* FROM estimationsTable GROUP BY cityID
Any idea?
Step by step:
Get the number of estimations per user and city.
Get the maximum of these numbers per city.
The query:
select cityid, max(cnt)
from
(
select cityid, userid, count(*) as cnt
from estimationstable
group by cityid, userid
) counted
group by cityid
order by cityid;
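The two steps can be checked with Python's sqlite3 module. The sample rows are hypothetical: user 123456 makes three estimations in city 987654, and user 135790 makes two in city 975310.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE estimationstable "
    "(estimationid INTEGER, estimationdate TEXT, userid INTEGER, cityid INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO estimationstable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 123456, 987654),
        (2, "2020-12-01", 135790, 975310),
        (3, "2020-02-01", 123456, 987654),
        (4, "2020-03-01", 111111, 987654),
        (5, "2020-04-01", 135790, 975310),
        (6, "2020-05-01", 222222, 975310),
        (7, "2020-06-01", 123456, 987654),
    ],
)

max_per_city = conn.execute(
    """
    SELECT cityid, MAX(cnt) AS max_estimations
    FROM (
        -- step 1: number of estimations per (city, user)
        SELECT cityid, userid, COUNT(*) AS cnt
        FROM estimationstable
        GROUP BY cityid, userid
    ) AS counted
    -- step 2: the largest of those counts per city
    GROUP BY cityid
    ORDER BY cityid
    """
).fetchall()
print(max_per_city)
```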
Try something like this:
with cte as (
select userid,cityid,count(*) as cnt
from table_name group by userid,cityid
)
, cte2 as (
select *,
row_number() over(partition by cityid order by cnt desc) rn
from cte
) select * from cte2 where rn=1
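The CTE version also tells you which user holds the maximum. The sketch below runs it in SQLite 3.25+ through Python's sqlite3 module, with hypothetical sample rows and the table name estimationstable taken from the question. The tie-breaker on userid is an addition, so that ties resolve deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE estimationstable (estimationid INTEGER, userid INTEGER, cityid INTEGER)")
# Hypothetical rows: 123456 has 3 estimations in 987654, 135790 has 2 in 975310.
conn.executemany(
    "INSERT INTO estimationstable VALUES (?, ?, ?)",
    [(1, 123456, 987654), (2, 135790, 975310), (3, 123456, 987654),
     (4, 111111, 987654), (5, 135790, 975310), (6, 222222, 975310),
     (7, 123456, 987654)],
)

top_users = conn.execute(
    """
    WITH cte AS (
        SELECT userid, cityid, COUNT(*) AS cnt
        FROM estimationstable
        GROUP BY userid, cityid
    ),
    cte2 AS (
        -- userid breaks ties so the chosen row is deterministic
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY cityid ORDER BY cnt DESC, userid) AS rn
        FROM cte
    )
    SELECT userid, cityid, cnt FROM cte2 WHERE rn = 1 ORDER BY cityid
    """
).fetchall()
print(top_users)
```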
sol 1:
SELECT cityid, MAX(cnt) AS maximum_number_of_estimations
FROM (SELECT cityid, userid, COUNT(*) AS cnt
FROM estimationstable
GROUP BY cityid, userid) AS final_query
GROUP BY cityid
sol 2:
Use ORDER BY COUNT(*) DESC together with GROUP BY.
Something like this should work. The idea is that you count all the occurrences in the inner query, grouping by your IDs, and use an outer query to get the maximum of those counts. Alternatively, use ORDER BY [Field] DESC with GROUP BY, which puts the highest counts at the top. Note that this sorts the whole result rather than picking one row per city.
In BigQuery, I think you can do this without a subquery by applying window functions to the grouped counts:
select distinct cityid,
first_value(userid) over (partition by cityid order by count(*) desc, userid) as userid,
max(count(*)) over (partition by cityid) as cnt
from estimationstable
group by cityid, userid

Getting the latest entry grouped by ID

I have a table with stock for products. The problem is that every time the stock changes, a new row is stored with the new Quantity. Example:
ProductID | Quantity | LastUpdate
1 | 123 | 2019.01.01
2 | 234 | 2019.01.01
1 | 444 | 2019.01.02
2 | 222 | 2019.01.02
I therefore need to get the latest stock update for every product and return this:
ProductID | Quantity
1 | 444
2 | 222
The following SQL works, but is slow.
SELECT ProductID, Quantity
FROM (
SELECT ProductID, Quantity
FROM Stock
WHERE LastUpdate
IN (SELECT MAX(LastUpdate) FROM Stock GROUP BY ProductID)
) AS s
Since the query is slow and is supposed to be left-joined into another query, I would really like some input on how to do this better.
Is there another way?
Use analytic functions; row_number can be used in this case.
SELECT ProductID, Quantity
FROM (SELECT ProductID, Quantity, row_number() over(partition by ProductID order by LastUpdate desc) as rnum
FROM Stock
) s
WHERE rnum = 1
Or with first_value:
SELECT DISTINCT ProductID, FIRST_VALUE(Quantity) OVER(partition by ProductID order by LastUpdate desc) as quantity
FROM Stock
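Both analytic variants can be run side by side with Python's sqlite3 module (SQLite 3.25+ assumed for window functions). One assumption in this sketch: LastUpdate is stored as ISO-8601 text ('2019-01-02'), which sorts chronologically as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (ProductID INTEGER, Quantity INTEGER, LastUpdate TEXT)")
conn.executemany(
    "INSERT INTO Stock VALUES (?, ?, ?)",
    [(1, 123, "2019-01-01"), (2, 234, "2019-01-01"),
     (1, 444, "2019-01-02"), (2, 222, "2019-01-02")],
)

# Variant 1: number rows per product, newest first, and keep row 1.
latest_rn = conn.execute(
    """
    SELECT ProductID, Quantity
    FROM (
        SELECT ProductID, Quantity,
               ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY LastUpdate DESC) AS rnum
        FROM Stock
    ) AS s
    WHERE rnum = 1
    ORDER BY ProductID
    """
).fetchall()

# Variant 2: FIRST_VALUE repeats the newest quantity on every row; DISTINCT collapses them.
latest_fv = conn.execute(
    """
    SELECT DISTINCT ProductID,
           FIRST_VALUE(Quantity) OVER (PARTITION BY ProductID ORDER BY LastUpdate DESC) AS quantity
    FROM Stock
    ORDER BY ProductID
    """
).fetchall()
print(latest_rn, latest_fv)
```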
Another option is using TOP 1 WITH TIES in concert with Row_Number().
Full disclosure: Vamsi's answer will be slightly more performant.
Example
Select Top 1 with ties *
From YourTable
Order by Row_Number() over (Partition By ProductID Order by LastUpdate Desc)
Returns
ProductID Quantity LastUpdate
1 444 2019-01-02
2 222 2019-01-02
You could also use a CTE (common table expression).
Base Data:
SELECT 1 AS ProductID
,123 AS Quantity
,'2019-01-01' as LastUpdate
INTO #table
UNION
SELECT 2 AS ProductID
,234 AS Quantity
,'2019-01-01' as LastUpdate
UNION
SELECT 1 AS ProductID
,444 AS Quantity
,'2019-01-02' as LastUpdate
UNION
SELECT 2 AS ProductID
,222 AS Quantity
,'2019-01-02' as LastUpdate
Here is the code using a Common Table Expression.
WITH CTE (ProductID, Quantity, LastUpdate, Rnk)
AS
(
SELECT ProductID
,Quantity
,LastUpdate
,ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY LastUpdate DESC) AS Rnk
FROM #table
)
SELECT ProductID, Quantity, LastUpdate
FROM CTE
WHERE rnk = 1
Returns
ProductID Quantity LastUpdate
1 444 2019-01-02
2 222 2019-01-02
You could then join the CTE to whatever table you need.
The row_number() function might be the most efficient option, but the big slowdown in your query is the IN predicate on a subquery. It's a little bit of a tricky one, but a join is usually faster. The join also matches each row against its own product's latest date, which the IN version does not. This query should get what you want and be faster.
SELECT
a.ProductID
,a.Quantity
FROM stock as a
INNER JOIN (
SELECT
ProductID
,MAX(LastUpdate) as LastUpdate
FROM stock
GROUP BY ProductID
) b
ON a.ProductID = b.ProductId AND
a.LastUpdate = b.LastUpdate
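A runnable sketch of the join-to-MAX approach, using Python's sqlite3 module and the sample stock rows. LastUpdate is stored as ISO-8601 text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (ProductID INTEGER, Quantity INTEGER, LastUpdate TEXT)")
conn.executemany(
    "INSERT INTO Stock VALUES (?, ?, ?)",
    [(1, 123, "2019-01-01"), (2, 234, "2019-01-01"),
     (1, 444, "2019-01-02"), (2, 222, "2019-01-02")],
)

latest = conn.execute(
    """
    SELECT a.ProductID, a.Quantity
    FROM Stock AS a
    INNER JOIN (
        -- newest update date per product
        SELECT ProductID, MAX(LastUpdate) AS LastUpdate
        FROM Stock
        GROUP BY ProductID
    ) AS b
      ON a.ProductID = b.ProductID
     AND a.LastUpdate = b.LastUpdate
    ORDER BY a.ProductID
    """
).fetchall()
print(latest)
```

Unlike ROW_NUMBER, this join returns more than one row for a product if two rows share the same latest LastUpdate.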

Calculating the product of two columns from each row and generate sum result total using Microsoft SQL

I'm looking for something nice and sweet that will enable me to take the product of two columns per row and then take each of these products and sum them together for a final total.
The solution must be compatible with MS SQL 2008/2012.
Example of table.
|---------------------|------------------|
| Qty | Price |
|---------------------|------------------|
| 12 | 34.25 |
|---------------------|------------------|
| 44 | 11.05 |
|---------------------|------------------|
The result of this table should be calculated like this:
Row 1:
12 * 34.25 = 411
Row 2:
44 * 11.05 = 486.20
FINAL CALCULATED RESULT:
897.20
Thank you,
First multiply each quantity by its price.
SELECT
T.*,
TotalByRow = Qty * Price
FROM
YourTable AS T
Then sum all the results.
SELECT
Total = SUM(Qty * Price)
FROM
YourTable AS T
If you need the result per product (or per any other column), add a GROUP BY clause. In the previous example, SUM() is used without a GROUP BY, so the whole table is treated as one group.
SELECT
T.ProductID,
Total = SUM(Qty * Price)
FROM
YourTable AS T
GROUP BY
T.ProductID
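The per-row products and the grand total can be checked in SQLite through Python's sqlite3 module. One assumption: Price is stored as REAL here, so the results carry ordinary floating-point rounding. In SQL Server a decimal column would give exact cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Qty INTEGER, Price REAL)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(12, 34.25), (44, 11.05)])

# Product of the two columns for each row.
per_row = conn.execute(
    "SELECT Qty, Price, Qty * Price AS TotalByRow FROM YourTable ORDER BY Qty"
).fetchall()

# Sum of those products over the whole table (no GROUP BY, so one group).
(total,) = conn.execute("SELECT SUM(Qty * Price) FROM YourTable").fetchone()
print(per_row, round(total, 2))
```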
Use GROUPING SETS to get the grand total.
Query
select [Qty], [Price], sum([Qty] * [Price]) as [Total]
from [your_table_name]
group by grouping sets(([Qty], [Price]), ());
I always calculate the grand total in my client, but you can do it in SQL too, like this:
declare @table table (qty int, price decimal(16,2))
insert into @table(qty, price) values (12, 34.25), (44, 11.05)
select qty,
price,
sum(qty * price) as total
from @table
group by grouping sets((qty, price), ())
The result is:
qty price total
--- ----- -----
12 34,25 411
44 11,05 486,2
897,2
More info on grouping sets here https://technet.microsoft.com/en-us/library/bb522495(v=sql.105).aspx
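SQLite has no GROUPING SETS, so this sketch swaps in the equivalent UNION ALL. The first branch is the (qty, price) grouping; the second is the empty grouping (), which yields the grand total with NULL in the grouped columns. It runs through Python's sqlite3 module, and the table name your_table is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (qty INTEGER, price REAL)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [(12, 34.25), (44, 11.05)])

rows = conn.execute(
    """
    SELECT qty, price, total
    FROM (
        -- grouping set (qty, price): one row per detail line
        SELECT qty, price, SUM(qty * price) AS total
        FROM your_table
        GROUP BY qty, price
        UNION ALL
        -- grouping set (): the grand total
        SELECT NULL, NULL, SUM(qty * price)
        FROM your_table
    ) AS sets
    ORDER BY qty IS NULL, qty   -- detail rows first, grand total last
    """
).fetchall()
for row in rows:
    print(row)
```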
Try this (the windowed SUM with ORDER BY requires SQL Server 2012 or later):
SELECT TOP(1) TOTAL_AMOUNT
FROM (
SELECT QTY
, PRICE
, QTY*PRICE AS AMOUNT
, SUM(QTY*PRICE) OVER (ORDER BY QTY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TOTAL_AMOUNT
FROM PRODUCT ) AS GET_SUM
ORDER BY GET_SUM.TOTAL_AMOUNT DESC
--SECOND SOLUTION:
SELECT SUM(QTY*PRICE) AS AMOUNT FROM PRODUCT
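A sketch comparing the running-total approach with the plain SUM, using Python's sqlite3 module (SQLite 3.25+ assumed). SQLite uses LIMIT 1 where SQL Server uses TOP(1). The table name product comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (qty INTEGER, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(12, 34.25), (44, 11.05)])

# Largest running total; LIMIT 1 plays the role of TOP(1).
(top_total,) = conn.execute(
    """
    SELECT total_amount
    FROM (
        SELECT qty, price, qty * price AS amount,
               SUM(qty * price) OVER (
                   ORDER BY qty ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS total_amount
        FROM product
    ) AS get_sum
    ORDER BY total_amount DESC
    LIMIT 1
    """
).fetchone()

# Second solution: a plain aggregate over the whole table.
(plain_total,) = conn.execute("SELECT SUM(qty * price) FROM product").fetchone()
print(top_total, plain_total)
```

The largest running total equals the grand total only when no row's amount is negative. The plain SUM has no such condition, so it is the simpler choice.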

SQL query for table with multiple keys?

I am sorry if this seems too easy, but I was asked this question and couldn't answer it even after preparing SQL thoroughly :(. Can someone answer this?
There's a table with seller id, product id, warehouse id, and the quantity of each product that each seller has at each warehouse.
We have to list each product ID with the seller ID that has the highest number of units of that product, and the total number of units that seller has for it.
I think I got confused because there were 3 keys in the table.
It's not quite clear which DBMS you are currently using. The query below should work if your DBMS supports window functions.
You can compute the total units for each product and seller, rank each seller within each product using the rank window function, and then filter to keep only the top-ranked sellers for each product along with their unit totals.
select
product_id,
seller_id,
total_units
from (
select
product_id,
seller_id,
sum(quantity) total_units,
rank() over (partition by product_id order by sum(quantity) desc) rnk
from your_table
group by
product_id,
seller_id
) t where rnk = 1;
If window functions are not supported, you can use a correlated subquery to achieve the same effect:
select
product_id,
seller_id,
sum(quantity) total_units
from your_table a
group by
product_id,
seller_id
having sum(quantity) = (
select max(tot)
from (
select sum(quantity) tot
from your_table b
where b.product_id = a.product_id
group by seller_id
) t
);
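A runnable sketch of the ranking approach in SQLite 3.25+ through Python's sqlite3 module. The rows and the quantity column name are hypothetical. The sketch computes the per-seller totals in an inner subquery before ranking, which is equivalent to ranking by sum(quantity) directly. Product p3 has a deliberate tie to show that rank keeps every tied seller.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (seller_id TEXT, product_id TEXT, warehouse_id TEXT, quantity INTEGER)"
)
# Hypothetical rows; for p3, sellers s1 and s2 tie with 2 units each.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("s1", "p1", "w1", 5), ("s1", "p1", "w2", 7), ("s2", "p1", "w1", 10),
        ("s2", "p2", "w1", 3), ("s3", "p2", "w2", 3), ("s3", "p2", "w3", 1),
        ("s1", "p3", "w1", 2), ("s2", "p3", "w2", 2),
    ],
)

top_sellers = conn.execute(
    """
    SELECT product_id, seller_id, total_units
    FROM (
        SELECT product_id, seller_id, total_units,
               RANK() OVER (PARTITION BY product_id ORDER BY total_units DESC) AS rnk
        FROM (
            -- total units per seller and product, summed across warehouses
            SELECT product_id, seller_id, SUM(quantity) AS total_units
            FROM your_table
            GROUP BY product_id, seller_id
        ) AS totals
    ) AS ranked
    WHERE rnk = 1
    ORDER BY product_id, seller_id
    """
).fetchall()
print(top_sellers)
```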
I don't know why having ID columns would mess you up. Group by the right columns, sum up the totals, and just return the first row:
select *
from (
select sellerid, productid, sum(quantity) as total_sold
from theres_a_table
group by sellerid, productid
) x
order by total_sold desc
fetch first 1 row only
Without thinking about optimization, a straightforward answer looks like this:
select *
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqo
inner join
(
select product_id, max(seller_prod_qty) as max_prod_qty
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqi
group by product_id
) pmaxq
on spqo.product_id = pmaxq.product_id
and spqo.seller_prod_qty = pmaxq.max_prod_qty
Both spqi (inner) and spqo (outer) give you the seller, the product, and the sum of quantity across warehouses. pmaxq gives you, for each product, the maximum of those per-seller sums, and the final inner join keeps a seller's sum only if that seller has the highest (max) quantity of the product (there could be multiple sellers with the same quantity). I think this is the answer you are looking for. However, I'm sure the query can be improved, since what I'm posting is the "conceptual" one :)
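The conceptual join can be run as written in SQLite through Python's sqlite3 module. The rows are hypothetical and the column product_qty follows the answer. Product p3 contains a tie, and the join keeps both tied sellers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (seller_id TEXT, product_id TEXT, warehouse_id TEXT, product_qty INTEGER)"
)
# Hypothetical rows; for p3, sellers s1 and s2 tie with 2 units each.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("s1", "p1", "w1", 5), ("s1", "p1", "w2", 7), ("s2", "p1", "w1", 10),
        ("s2", "p2", "w1", 3), ("s3", "p2", "w2", 3), ("s3", "p2", "w3", 1),
        ("s1", "p3", "w1", 2), ("s2", "p3", "w2", 2),
    ],
)

rows = conn.execute(
    """
    SELECT spqo.product_id, spqo.seller_id, spqo.seller_prod_qty
    FROM (
        -- per seller and product, summed across warehouses
        SELECT seller_id, product_id, SUM(product_qty) AS seller_prod_qty
        FROM your_table
        GROUP BY seller_id, product_id
    ) AS spqo
    INNER JOIN (
        -- largest per-seller sum for each product
        SELECT product_id, MAX(seller_prod_qty) AS max_prod_qty
        FROM (
            SELECT seller_id, product_id, SUM(product_qty) AS seller_prod_qty
            FROM your_table
            GROUP BY seller_id, product_id
        ) AS spqi
        GROUP BY product_id
    ) AS pmaxq
      ON spqo.product_id = pmaxq.product_id
     AND spqo.seller_prod_qty = pmaxq.max_prod_qty
    ORDER BY spqo.product_id, spqo.seller_id
    """
).fetchall()
print(rows)
```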

T-Sql find duplicate row values

I want to write a stored procedure.
In that stored procedure, I want to find duplicate rows in a table, sum their values, and write the result back to the same table.
Let's say I have a CustomerSales table:
ID SalesRepresentative Customer Quantity
1 Michael CustA 55
2 Michael CustA 10
and I need to turn table to...
ID SalesRepresentative Customer Quantity
1 Michael CustA 65
2 Michael CustA 0
When rows have the same SalesRepresentative and the same Customer, I want to sum the Quantity values of those rows, assign the total to the first row, and set the other rows to 0.
Could you help me?
To aggregate duplicates into one row:
SELECT min(ID) AS ID, SalesRepresentative, Customer
,sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
ORDER BY min(ID)
Or, if you actually want those extra rows with 0 as Quantity in the result:
SELECT ID, SalesRepresentative, Customer
,CASE
WHEN (count(*) OVER (PARTITION BY SalesRepresentative,Customer)) = 1
THEN Quantity
WHEN (row_number() OVER (PARTITION BY SalesRepresentative,Customer
ORDER BY ID)) = 1
THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative,Customer)
ELSE 0
END AS Quantity
FROM CustomerSales
ORDER BY ID
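A runnable check of the window-function version, using Python's sqlite3 module (SQLite 3.25+ assumed) in place of SQL Server. The two Michael/CustA rows come from the question. The Anna/CustB row is a hypothetical addition that shows a non-duplicated row keeping its own Quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerSales (ID INTEGER, SalesRepresentative TEXT, Customer TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerSales VALUES (?, ?, ?, ?)",
    [(1, "Michael", "CustA", 55), (2, "Michael", "CustA", 10), (3, "Anna", "CustB", 7)],
)

rows = conn.execute(
    """
    SELECT ID, SalesRepresentative, Customer,
           CASE
               -- not a duplicate: keep the original quantity
               WHEN (COUNT(*) OVER (PARTITION BY SalesRepresentative, Customer)) = 1
                   THEN Quantity
               -- first duplicate by ID: gets the summed quantity
               WHEN (ROW_NUMBER() OVER (PARTITION BY SalesRepresentative, Customer
                                        ORDER BY ID)) = 1
                   THEN SUM(Quantity) OVER (PARTITION BY SalesRepresentative, Customer)
               -- remaining duplicates: zeroed out
               ELSE 0
           END AS Quantity
    FROM CustomerSales
    ORDER BY ID
    """
).fetchall()
print(rows)
```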
This makes heavy use of window functions.
Alternative version without window functions:
SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
UNION ALL
SELECT c.ID, c.SalesRepresentative, c.Customer, 0 AS Quantity
FROM CustomerSales c
LEFT JOIN (
SELECT min(ID) AS ID
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
) x ON (x.ID = c.ID)
WHERE x.ID IS NULL
ORDER BY ID
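This sketch runs the UNION ALL version without window functions in SQLite through Python's sqlite3 module. The first branch returns one summed row per group at its lowest ID. The second branch uses an anti-join (LEFT JOIN ... WHERE x.ID IS NULL) to return every other row with Quantity 0. The Anna/CustB row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerSales (ID INTEGER, SalesRepresentative TEXT, Customer TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerSales VALUES (?, ?, ?, ?)",
    [(1, "Michael", "CustA", 55), (2, "Michael", "CustA", 10), (3, "Anna", "CustB", 7)],
)

rows = conn.execute(
    """
    -- one summed row per (SalesRepresentative, Customer), at the lowest ID
    SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
    FROM CustomerSales
    GROUP BY SalesRepresentative, Customer
    UNION ALL
    -- every row whose ID is not the lowest in its group, with Quantity 0
    SELECT c.ID, c.SalesRepresentative, c.Customer, 0 AS Quantity
    FROM CustomerSales c
    LEFT JOIN (
        SELECT min(ID) AS ID
        FROM CustomerSales
        GROUP BY SalesRepresentative, Customer
    ) x ON (x.ID = c.ID)
    WHERE x.ID IS NULL
    ORDER BY ID
    """
).fetchall()
print(rows)
```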