SQL - select top xx% rows

I have a table, sales, which is ordered by descending TotalSales
user_id | TotalSales
----------------------
4 10
2 1.5
5 0.99
3 0.5
1 0.33
What I would like to do is find the percentage of the sum of all sales that the xx% most important sales represent.
For example if I wanted to do it for top 40% sales, here I would get (10+1.5)/(10+1.5+0.99+0.5+0.33)= 86%
But right now I haven't been able to select "top xx% rows".
Edit: DB management system can be MySQL or Vertica or Hive

select sum(TotalSales) as s from sales where TotalSales >= x;
select sum(TotalSales) as b from sales;
Your result is s/b, where x is the TotalSales cutoff that matches the percentage you choose each time (x = 1.5 for the top 40% in the example above).
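The top xx% of rows can also be selected by row position, without choosing a threshold by hand. The following is a minimal sketch, assuming window functions are available (MySQL 8.0 or later, Vertica and Hive should all qualify). The names `ranked`, `rn`, `n` and `top_share` are illustrative and not from the question, and 0.4 stands for the chosen percentage.

```sql
-- Share of total sales held by the top 40% of rows (by TotalSales).
SELECT SUM(CASE WHEN rn <= 0.4 * n THEN TotalSales ELSE 0 END)
       / SUM(TotalSales) AS top_share
FROM (
    SELECT TotalSales,
           ROW_NUMBER() OVER (ORDER BY TotalSales DESC) AS rn,  -- 1 = largest sale
           COUNT(*) OVER () AS n                                -- total number of rows
    FROM sales
) ranked;
```

With the five sample rows, 0.4 * n is 2, so the rows with TotalSales 10 and 1.5 are kept and the share is 11.5 / 13.32, roughly 86%.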

Related

Get Decile of Values and # of records between deciles Presto SQL

I have a table that looks like this
User ID | Income
1 4.00
2 5.00
1 7.00
3 10.00
4 80.00
1 40.00
5 7.00
6 4.00
I need a Presto SQL query that breaks the range of "Income" (e.g. 4.00-80.00) up into deciles, irrespective of the frequency of each "Income" value. I also need the number of unique "User ID" values that fall beneath each decile (e.g. 10th percentile -> X users, 20th percentile -> Y users).
You can calculate the decile for each user-income, and then join the cte with itself (to account for the repeated users, since count distinct over() is not allowed in Presto).
WITH user_deciles_cte AS (
    SELECT user_id,
           NTILE(10) OVER (ORDER BY income) AS deciles
    FROM table
),
join_users_and_deciles_cte AS (
    SELECT DISTINCT dec.deciles,
           users.user_id
    FROM user_deciles_cte dec
    LEFT JOIN user_deciles_cte users
           ON users.deciles <= dec.deciles
)
SELECT deciles,
       COUNT(DISTINCT user_id) AS users
FROM join_users_and_deciles_cte
GROUP BY 1
ORDER BY 1 ASC
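Note that NTILE(10) splits the rows into ten groups of roughly equal size, so the result depends on how often each income value occurs. If the deciles should instead be ten equal-width slices of the income range, one possible sketch uses Presto's `width_bucket` and `sequence` functions. The table name `incomes` and the aliases `bounds`, `bucketed`, `lo` and `hi` are assumptions made for illustration.

```sql
WITH bounds AS (
    SELECT MIN(income) AS lo, MAX(income) AS hi
    FROM incomes
),
bucketed AS (
    -- width_bucket returns 11 when income equals the upper bound,
    -- so LEAST clamps that value into bucket 10.
    SELECT i.user_id,
           LEAST(width_bucket(i.income, b.lo, b.hi, 10), 10) AS decile
    FROM incomes i
    CROSS JOIN bounds b
)
SELECT d.decile,
       COUNT(DISTINCT u.user_id) AS users   -- distinct users at or below this decile
FROM UNNEST(sequence(1, 10)) AS d (decile)
LEFT JOIN bucketed u
       ON u.decile <= d.decile
GROUP BY d.decile
ORDER BY d.decile;
```

Generating the ten buckets with `sequence` keeps empty deciles in the output. The sketch assumes the minimum and maximum income differ, since `width_bucket` needs two distinct bounds.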

SQL How to take the minimum for multiple fields?

Consider the following data set that records the product sold, year, and revenue from that particular product in thousands of dollars. This data table (YEARLY_PRODUCT_REVENUE) is stored in SQL and has many more rows.
Year | Product | Revenue
2000 Table 100
2000 Chair 200
2000 Bed 150
2010 Table 120
2010 Chair 190
2010 Bed 390
Using SQL, for every year I would like to find the product that has the maximum revenue.
That is, I would like my output to be the following:
Year | Product | Revenue
2000 Chair 200
2010 Bed 390
My attempt so far has been this:
SELECT year, product, MIN(revenue)
FROM YEARLY_PRODUCT_REVENUE
GROUP BY article, month;
But when I do this, I get multiple-year values for distinct products. For instance, I'm getting the output below which is an error. I'm not entirely sure what the error here is. Any help would be much appreciated!
Year | Product | Revenue
2000 Table 100
2000 Bed 150
2010 Table 120
2010 Chair 190
You don't mention the database so I'll assume it's PostgreSQL. You can do:
select distinct on (year) * from t order by year, revenue desc
You want filtering rather than aggregation. We can use window functions (which most databases support) to rank yearly product sales, and then retain only the top selling product per year.
select *
from (
select r.*, rank() over(partition by year order by revenue desc) rn
from yearly_product_revenue r
) r
where rn = 1;
Here is a shorter solution if your database supports the standard WITH TIES clause:
select *
from yearly_product_revenue r
order by rank() over(partition by year order by revenue desc)
fetch first row with ties
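For a database without window functions or WITH TIES, the same filtering idea can be sketched with a plain join against the yearly maximum. This is an alternative to the answers above rather than part of them, and the aliases `m` and `max_revenue` are illustrative.

```sql
SELECT r.year, r.product, r.revenue
FROM yearly_product_revenue r
JOIN (
    -- one row per year holding that year's highest revenue
    SELECT year, MAX(revenue) AS max_revenue
    FROM yearly_product_revenue
    GROUP BY year
) m
  ON m.year = r.year
 AND m.max_revenue = r.revenue
ORDER BY r.year;
```

On the sample rows this keeps Chair (200) for 2000 and Bed (390) for 2010. Like the rank() version, it returns every tied product when two products share the top revenue in a year.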

Calculating percentages per product using aggregate functions and group by

I have a table with 5 rows representing two different products. The columns include the product, sales, discount. I'm trying to calculate the percentage of sales per product that included a discount. The table looks like this:
product | sales | discount
1 10 0
1 10 5
2 20 10
2 20 0
2 20 10
My results should look like the below (which I know because I've calculated this in Excel):
product | perc_discount
1 50.00
2 66.67
For each of the two products we are calculating the count of sales with discount divided by the total count of sales, so for product 1 it would be (1/2)*100 = 50.
My SQL code looks like the below:
SELECT
product,
(SELECT COUNT(*) FROM sales WHERE discount >0)/COUNT(*)*100 AS perc_discount
FROM sales
GROUP BY product
However, the result I'm getting is:
product | perc_discount
1 150.0
2 100.0
It seems to be calculating the total count of discounted sales in the table and dividing it by the count of each product, and I can't seem to figure out how to change it. Any ideas on how I can improve this?
Thanks.
How about conditional sum?
select product,
       round(sum(case when discount > 0 then 1 else 0 end) / count(*) * 100, 2) perc_discount
from sales
group by product;

PRODUCT PERC_DISCOUNT
---------- -------------
1 50
2 66,67
So: add 1 for every discount row per product. Divide that sum by the total number of rows per product (that's count). Round the result to 2 decimals (so that it looks prettier).
You can use conditional aggregation. For example:
select
product,
100.0 * count(case when discount <> 0 then 'x' end) / count(*) as perc_discount
from sales
group by product
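The numbers in the question follow from the subquery not being tied to the current group: it counts all 3 discounted rows in the table, which gives 3/2*100 = 150 for product 1 and 3/3*100 = 100 for product 2. As a further sketch of the same conditional aggregation, the percentage can be written as the average of a 0/1 flag. This is a variant for illustration, not taken from either answer.

```sql
SELECT product,
       -- AVG of a 1.0/0.0 flag is the fraction of rows with a discount
       ROUND(100.0 * AVG(CASE WHEN discount > 0 THEN 1.0 ELSE 0.0 END), 2) AS perc_discount
FROM sales
GROUP BY product;
```

For the sample table the flag averages 1/2 for product 1 and 2/3 for product 2, which matches the expected 50.00 and 66.67.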

how to write a query to get product ids contributing to top 80% sales?

Suppose I have a product id and a sales column:
product id sales
1 1000
2 10000
3 50000
4 12000
5 8000
Write an SQL query to get all product ids that contribute to the top 80% of sales.
For this, you want a cumulative sum. Presumably, you want the top selling such products, so:
select p.*
from (select p.*,
             sum(sales) over (order by sales desc) as running_sales,
             sum(sales) over () as total_sales
      from products p
     ) p
where running_sales - sales < 0.8 * total_sales;
This returns the top-selling products up to and including the one that reaches or first exceeds 80% of the total sales.
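One detail worth checking is ties: with only ORDER BY in the window, the default frame gives products with equal sales the same running total. The sketch below adds a tiebreaker and an explicit ROWS frame, and also shows the cumulative percentage. The column name `product_id` and the aliases `t` and `running_pct` are assumptions made for illustration.

```sql
SELECT product_id,
       sales,
       running_sales,
       ROUND(100.0 * running_sales / total_sales, 1) AS running_pct
FROM (
    SELECT p.product_id,
           p.sales,
           -- the tiebreaker plus ROWS frame gives each row its own running total
           SUM(p.sales) OVER (ORDER BY p.sales DESC, p.product_id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sales,
           SUM(p.sales) OVER () AS total_sales
    FROM products p
) t
WHERE running_sales - sales < 0.8 * total_sales
ORDER BY running_sales;
```

On the sample data the total is 81000, so the cutoff is 64800. The query keeps products 3, 4 and 2, with running totals of 50000, 62000 and 72000 (61.7%, 76.5% and 88.9%).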

Deciling by partitions in Teradata SQL

I have a table in Teradata which contains Sales Information per store pertaining to each region.
StoreID RegionID Sales
1 A 200
2 A 150
3 A 210
4 B 400
5 B 420
How can I find out the stores in top 2 deciles by sales for each region?
There's the QUANTILE function, but this is old, deprecated syntax. The top 2 deciles are the top 20 percent, and you can simply use PERCENT_RANK for this:
QUALIFY
PERCENT_RANK()
OVER (PARTITION BY RegionID
ORDER BY Sales DESC) <= 0.2
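Placed in a complete statement, the QUALIFY clause above could look like the following sketch. The table name `StoreSales` is hypothetical, since the question does not name the table.

```sql
SELECT StoreID, RegionID, Sales
FROM StoreSales
QUALIFY PERCENT_RANK()
        OVER (PARTITION BY RegionID
              ORDER BY Sales DESC) <= 0.2;
```

PERCENT_RANK is (rank - 1) / (rows in partition - 1), so small regions behave coarsely: the three stores in region A get 0, 0.5 and 1, and only the top store in each region of the sample data (store 3 for A, store 5 for B) passes the filter.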