how to write a query to get product ids contributing to top 80% sales? - sql

Suppose I have a product id and a sales column:
product id sales
1 1000
2 10000
3 50000
4 12000
5 8000
write an sql query to get all product ids that contribute to top 80 % of sales?

For this, you want a cumulative sum. Presumably you want the top-selling products, so:
select p.*
from (select p.*,
             sum(sales) over (order by sales desc) as running_sales,
             sum(sales) over () as total_sales
      from products p
     ) p
where running_sales - sales < 0.8 * total_sales;
This returns the top-selling products up to and including the one that reaches or first exceeds 80% of the total sales.
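As a quick check of the cumulative-sum approach, here is a minimal sketch that runs the query against the sample rows from the question using Python's built-in sqlite3 module. The in-memory database, the products(product_id, sales) layout, and the helper name top_sellers are assumptions for illustration, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

# Cumulative-sum query; :share is the fraction of total sales to cover.
QUERY = """
select product_id
from (select p.*,
             sum(sales) over (order by sales desc) as running_sales,
             sum(sales) over () as total_sales
      from products p
     ) p
where running_sales - sales < :share * total_sales
order by sales desc
"""

def top_sellers(rows, share=0.8):
    """Return product ids, best seller first, up to and including
    the product whose running total crosses `share` of all sales."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table products (product_id integer, sales integer)")
    conn.executemany("insert into products values (?, ?)", rows)
    ids = [r[0] for r in conn.execute(QUERY, {"share": share})]
    conn.close()
    return ids

# Sample data from the question: total sales = 81000, 80% = 64800.
sample = [(1, 1000), (2, 10000), (3, 50000), (4, 12000), (5, 8000)]
print(top_sellers(sample))  # [3, 4, 2]
```

Products 3 and 4 together reach 62000, which is still below 64800, so product 2 is included as the one that crosses the threshold.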

Related

SQL How to take the minimum for multiple fields?

Consider the following data set that records the product sold, year, and revenue from that particular product in thousands of dollars. This data table (YEARLY_PRODUCT_REVENUE) is stored in SQL and has many more rows.
Year | Product | Revenue
2000 Table 100
2000 Chair 200
2000 Bed 150
2010 Table 120
2010 Chair 190
2010 Bed 390
Using SQL, for every year I would like to find the product that has the maximum revenue.
That is, I would like my output to be the following:
Year | Product | Revenue
2000 Chair 200
2010 Bed 390
My attempt so far has been this:
SELECT year, product, MIN(revenue)
FROM YEARLY_PRODUCT_REVENUE
GROUP BY article, month;
But when I do this, I get multiple-year values for distinct products. For instance, I'm getting the output below which is an error. I'm not entirely sure what the error here is. Any help would be much appreciated!
Year | Product | Revenue
2000 Table 100
2000 Bed 150
2010 Table 120
2010 Chair 190
You don't mention the database so I'll assume it's PostgreSQL. You can do:
select distinct on (year) * from t order by year, revenue desc
You want filtering rather than aggregation. We can use window functions (which most databases support) to rank yearly product sales, and then retain only the top selling product per year.
select *
from (
select r.*, rank() over(partition by year order by revenue desc) rn
from yearly_product_revenue r
) r
where rn = 1;
Here is a shorter solution if your database supports the standard WITH TIES clause:
select *
from yearly_product_revenue r
order by rank() over(partition by year order by revenue desc)
fetch first row with ties
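To see the rank() filter at work on the question's sample rows, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database the asker uses, and the helper name top_product_per_year is hypothetical.

```python
import sqlite3

def top_product_per_year(rows):
    """Return (year, product, revenue) for the top-revenue product(s) of each year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yearly_product_revenue (year integer, product text, revenue integer)"
    )
    conn.executemany("insert into yearly_product_revenue values (?, ?, ?)", rows)
    query = """
        select year, product, revenue
        from (select r.*,
                     rank() over (partition by year order by revenue desc) as rn
              from yearly_product_revenue r
             ) r
        where rn = 1          -- keep only the top-ranked row(s) per year
        order by year, product
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    (2000, "Table", 100), (2000, "Chair", 200), (2000, "Bed", 150),
    (2010, "Table", 120), (2010, "Chair", 190), (2010, "Bed", 390),
]
print(top_product_per_year(sample))  # [(2000, 'Chair', 200), (2010, 'Bed', 390)]
```

Because rank() assigns the same rank to ties, two products with equal top revenue in a year would both be returned; row_number() would pick just one.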

Calculating percentages per product using aggregate functions and group by

I have a table with 5 rows representing two different products. The columns include the product, sales, discount. I'm trying to calculate the percentage of sales per product that included a discount. The table looks like this:
product | sales | discount
1 10 0
1 10 5
2 20 10
2 20 0
2 20 10
My results should look like the below (which I know because I've calculated this in Excel):
product | perc_discount
1 50.00
2 66.67
For each of the two products we are calculating the count of sales with discount divided by the total count of sales, so for product 1 it would be (1/2)*100 = 50.
My SQL code looks like the below:
SELECT
product,
(SELECT COUNT(*) FROM sales WHERE discount >0)/COUNT(*)*100 AS perc_discount
FROM sales
GROUP BY product
However, the result I'm getting is:
product | perc_discount
1 150.0
2 100.0
It seems to be calculating the total count of discounted sales in the table and dividing it by the count of each product, and I can't seem to figure out how to change it. Any ideas on how I can improve this?
Thanks.
How about conditional sum?
select product,
       round(sum(case when discount > 0 then 1 else 0 end) / count(*) * 100, 2) perc_discount
from sales
group by product;

PRODUCT PERC_DISCOUNT
---------- -------------
1 50
2 66,67
So: add 1 for every discounted row per product. Divide that sum by the total number of rows per product (that's count). Round the result to 2 decimals (so that it looks prettier).
You can use conditional aggregation. For example:
select
product,
100.0 * count(case when discount <> 0 then 'x' end) / count(*) as perc_discount
from sales
group by product
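The asker's subquery is not correlated with the outer product, so it counts all 3 discounted rows for every group (3/2 and 3/3, hence 150 and 100). A minimal sketch of the conditional-aggregation fix, run through Python's built-in sqlite3 module on the sample rows; the helper name discount_share and the use of SQLite are assumptions for illustration.

```python
import sqlite3

def discount_share(rows):
    """Return (product, percentage of rows with a discount) per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (product integer, sales integer, discount integer)")
    conn.executemany("insert into sales values (?, ?, ?)", rows)
    query = """
        select product,
               -- count() skips NULLs, so only discounted rows are counted;
               -- 100.0 forces floating-point division
               round(100.0 * count(case when discount <> 0 then 1 end) / count(*), 2)
                   as perc_discount
        from sales
        group by product
        order by product
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, 10, 0), (1, 10, 5), (2, 20, 10), (2, 20, 0), (2, 20, 10)]
for product, perc in discount_share(sample):
    print(product, perc)
```

Product 1 has 1 discounted row out of 2 and product 2 has 2 out of 3, which should reproduce the 50.00 and 66.67 worked out in Excel.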

Select the best selling product ID

What if I have a table like this and I want to select the best-selling product_id?
id | transaction_id | product_id | qty_sold
1 21 2 5
2 22 3 2
3 23 4 2
3 24 2 1
3 25 2 4
I want the best selling product_id with the highest qty_sold
Using SQL Server, you can group by product_id, add up the quantity sold, and order by the total descending. If we also take the minimum transaction ID per product, then when two products come out with the same total quantity, the minimum transaction ID breaks the tie:
SELECT TOP 1 product_id, SUM(qty_sold) as sellcount, MIN(transaction_id) as firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
Once you're happy the sums are right, you can remove SUM(qty_sold) as sellcount and MIN(transaction_id) as firsttran from the SELECT if you only need the product ID.
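For a quick check of the grouping and tie-break logic on the sample rows, here is a sketch using Python's built-in sqlite3 module. Note the assumption: SQLite has no TOP, so LIMIT 1 stands in for SQL Server's TOP 1; the helper name best_seller is hypothetical.

```python
import sqlite3

def best_seller(rows):
    """Return (product_id, total qty sold, first transaction id) for the best seller."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id integer, transaction_id integer, "
        "product_id integer, qty_sold integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    query = """
        select product_id,
               sum(qty_sold) as sellcount,
               min(transaction_id) as firsttran
        from t
        group by product_id
        order by sum(qty_sold) desc, min(transaction_id)  -- earliest transaction wins ties
        limit 1                                            -- SQLite equivalent of TOP 1
    """
    row = conn.execute(query).fetchone()
    conn.close()
    return row

sample = [(1, 21, 2, 5), (2, 22, 3, 2), (3, 23, 4, 2), (3, 24, 2, 1), (3, 25, 2, 4)]
print(best_seller(sample))  # (2, 10, 21)
```

Product 2 sold 5 + 1 + 4 = 10 units, against 2 each for products 3 and 4.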

SQL - select top xx% rows

I have a table, sales, which is ordered by descending TotalSales
user_id | TotalSales
----------------------
4 10
2 1.5
5 0.99
3 0.5
1 0.33
What I would like to do is find the percentage of the sum of all sales that the xx% most important sales represent.
For example if I wanted to do it for top 40% sales, here I would get (10+1.5)/(10+1.5+0.99+0.5+0.33)= 86%
But right now I haven't been able to select "top xx% rows".
Edit: DB management system can be MySQL or Vertica or Hive
select Sum(TotalSales) as s from sales where TotalSales >= x
select Sum(TotalSales) as b from sales
your result is s/b
and x = the cutoff you set each time (the smallest TotalSales value that still falls within the top xx% of rows)
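One way to select the "top xx% rows" directly is to number the rows by descending TotalSales and compare each row number with the total row count. The sketch below is an assumption-laden illustration rather than the answer's method: it uses window functions (row_number and count over the whole set), runs in SQLite through Python's sqlite3 module, and the helper name top_share is hypothetical. Whether the same query runs unchanged on a given MySQL, Vertica, or Hive version would need checking.

```python
import sqlite3

def top_share(rows, pct):
    """Return the fraction of total sales contributed by the top `pct` percent of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (user_id integer, TotalSales real)")
    conn.executemany("insert into sales values (?, ?)", rows)
    query = """
        select sum(case when rn * 100 <= :pct * cnt then TotalSales else 0 end)
               / sum(TotalSales)
        from (select TotalSales,
                     row_number() over (order by TotalSales desc) as rn,
                     count(*) over () as cnt
              from sales
             ) ranked
    """
    (share,) = conn.execute(query, {"pct": pct}).fetchone()
    conn.close()
    return share

sample = [(4, 10), (2, 1.5), (5, 0.99), (3, 0.5), (1, 0.33)]
print(round(top_share(sample, 40), 2))  # 0.86
```

With 5 rows and 40%, rows 1 and 2 qualify, giving (10 + 1.5) / 13.32, the 86% from the question. Comparing rn * 100 with pct * cnt keeps the row cutoff in integer arithmetic.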

Creating averages and detecting increases higher than x% in SQL

I want to create the following in SQL Server 2012: (I've found that the best way to explain it is with tables).
I have the date of purchase, the customer id and the price the customer paid in a table like this:
DateOnly Customer Price
2012/01/01 1 50
2012/01/01 2 60
2012/01/01 3 80
2012/01/02 4 40
2012/01/02 5 30
2012/01/02 1 55
2012/01/03 6 80
2012/01/04 2 90
What I need to do then is to keep a register of the average price paid by a customer. Which would be as follows:
DateOnly Customer Price AveragePrice
2012/01/01 1 50 50
2012/01/01 2 60 60
2012/01/01 3 80 80
2012/01/02 4 40 40
2012/01/02 5 30 30
2012/01/02 1 55 52.5
2012/01/03 6 80 80
2012/01/04 2 90 75
And finally, I need to select the rows which have caused an increase higher than 10% in the averageprice paid by a customer.
In this case, the second order of customer 2 should be the only one to be selected, as it introduced an increase higher than 10% in the average price paid by this customer.
Hence, the resulting table should be as follows:
DateOnly Customer Price AveragePrice
2012/01/04 2 90 75
Thanks in advance for your help.
The first CTE prepares the data: it assigns a row number to each customer's purchases, which the later join relies on.
The second CTE is recursive and does the actual work. The anchor part takes each customer's first purchase; the recursive part joins to the next purchase and calculates TotalPrice, AveragePrice and Increase.
At the end, just select the rows with an Increase of more than 10%.
WITH CTE_Prep AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY DateOnly) RN
FROM Table1
)
,CTE_Calc AS
(
SELECT *, Price AS TotalPrice, CAST(Price AS DECIMAL(18,2)) AS AveragePrice, CAST (0 AS DECIMAL(18,2)) AS Increase
FROM CTE_Prep WHERE RN = 1
UNION ALL
SELECT p.*
, c.TotalPrice + p.Price AS TotalPrice
, CAST(CAST(c.TotalPrice + p.Price AS DECIMAL(18,2)) / p.RN AS DECIMAL(18,2)) AS AveragePrice
, CAST(CAST(CAST(c.TotalPrice + p.Price AS DECIMAL(18,2)) / p.RN AS DECIMAL(18,2)) / c.AveragePrice AS DECIMAL(18,2)) AS Increase
FROM CTE_Calc c
INNER JOIN CTE_Prep p ON c.RN + 1 = p.RN AND p.Customer = c.Customer
)
SELECT * FROM CTE_Calc
WHERE Increase > 1.10
Interesting problem.
You can get the average without the current purchase by subtracting the price on each row from the sum of all prices for that customer and dividing by the count minus one. This observation, in combination with window functions, gives the information needed to get the rows you are looking for:
select *
from (select t.*,
avg(price) over (partition by customer) as avgprice,
sum(price) over (partition by customer) as sumprice,
count(price) over (partition by customer) as cntprice
from table1 t
) t
where avgprice > 1.1 * (case when cntprice > 1
                             then (sumprice - price) / (cntprice - 1)
                        end);
Note the use of the case in the where clause. There is a potential divide-by-zero problem when a customer has only one purchase. SQL Server guarantees that the when part of the case will be evaluated before the then part (in this situation), so the query is safe from that problem.
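For comparison, here is a sketch of a third route that follows the question's own table: compute the running average per customer with a framed window, then compare it with the previous running average via LAG. This is a different technique from both answers above, not a restatement of either. It runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server 2012, and the helper name rows_with_increase is hypothetical.

```python
import sqlite3

QUERY = """
with running as (
    select DateOnly, Customer, Price,
           avg(Price) over (partition by Customer
                            order by DateOnly
                            rows between unbounded preceding and current row)
               as AveragePrice
    from Table1
),
compared as (
    select r.*,
           lag(AveragePrice) over (partition by Customer order by DateOnly)
               as PrevAverage
    from running r
)
select DateOnly, Customer, Price, AveragePrice
from compared
where PrevAverage is not null            -- first purchases have nothing to compare with
  and AveragePrice > 1.10 * PrevAverage  -- average rose by more than 10%
order by DateOnly, Customer
"""

def rows_with_increase(rows):
    """Return purchases that raised the customer's running average price by more than 10%."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Table1 (DateOnly text, Customer integer, Price integer)")
    conn.executemany("insert into Table1 values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

sample = [
    ("2012/01/01", 1, 50), ("2012/01/01", 2, 60), ("2012/01/01", 3, 80),
    ("2012/01/02", 4, 40), ("2012/01/02", 5, 30), ("2012/01/02", 1, 55),
    ("2012/01/03", 6, 80), ("2012/01/04", 2, 90),
]
print(rows_with_increase(sample))  # [('2012/01/04', 2, 90, 75.0)]
```

Customer 1's average moves from 50 to 52.5 (5%), so only customer 2's second order, which lifts the average from 60 to 75, is returned. The sketch assumes at most one purchase per customer per date; otherwise the ordering within a day would need a tie-breaker column.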