SQL Server - How to calculate # of entities to hit 80% of sum total? - sql

I have a list of companies, their industry, and their annual revenue.
I need to partition the list by industry and figure out how many companies in each industry it takes to account for 80% of the industry's total revenue.
I can run the partition, I can figure out what 80% of each industry's revenue is, but I have zero idea how to figure out how many companies it takes to hit 80%. My only idea is to pull a list for each industry, sort revenue high to low, and sum down until I hit the 80% number.
Are there any built-in functions or clever approaches that can help me here?
Thanks!

I would use window functions:
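select industry, count(*) as num_companies
from (select t.*,
             sum(revenue) over (partition by industry
                                order by revenue desc
                                rows between unbounded preceding and current row
                               ) as running_revenue,
             sum(revenue) over (partition by industry) as total_revenue
      from t
     ) t
where running_revenue - revenue < 0.8 * total_revenue
group by industry;
The where clause keeps a company as long as the companies ranked above it have not yet reached 80% of the industry total, so it includes every company up to and including the first one that crosses the threshold. The explicit rows frame matters when companies tie on revenue: with the default range frame, tied rows all receive the same running total, which can throw the count off.
There are other functions such as ntile() and percentile_cont() that can be used. I find it simplest to do the calculation directly using sum().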
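As a quick sanity check, the query can be run against a few made-up rows. The sketch below uses Python's built-in sqlite3 module rather than SQL Server (an assumption made only so the example is self-contained), the table contents are hypothetical, and the running sum carries an explicit rows frame so that tied revenues are accumulated one row at a time.

```python
import sqlite3

# Hypothetical sample data: (company, industry, revenue).
rows = [
    ("A", "tech", 70), ("B", "tech", 12), ("C", "tech", 12), ("D", "tech", 6),
    ("E", "retail", 50), ("F", "retail", 25), ("G", "retail", 15), ("H", "retail", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (company text, industry text, revenue integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select industry, count(*) as num_companies
from (select t.*,
             sum(revenue) over (partition by industry
                                order by revenue desc
                                rows between unbounded preceding and current row
                               ) as running_revenue,
             sum(revenue) over (partition by industry) as total_revenue
      from t
     ) t
where running_revenue - revenue < 0.8 * total_revenue
group by industry
order by industry
"""

# Map each industry to the number of top companies needed to reach 80%.
companies_needed = dict(conn.execute(query).fetchall())
print(companies_needed)
```

In this data, tech needs two companies (70 + 12 = 82 of 100) and retail needs three (50 + 25 + 15 = 90 of 100), which is what the query reports.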

Related

SQL bucket based on totals

I have a list of retailers and their total sales. I want to bucket them in 4 categories based on their total sales. I want to show 10% of retailers cover 70% of sales.
In the example below, I am trying to divide the retailers into 4 quantiles.
In this case, total sales for all 10 retailers are 4500. To divide the retailers into 4 quantiles, I sorted the data by sales from high to low and assigned each retailer a quantile.
The sum of sales for the retailers in each quantile is roughly 4500 / 4 = 1125.
How can I replicate this logic in SQL?
Here's the sample data:
If I understand correctly, you can use cumulative sums and some arithmetic. I think this does what you want.
select t.*,
       ceiling(running_total * 4.0 / total_total) as quantile
from (select t.*,
             sum(total) over (order by total desc) as running_total,
             sum(total) over () as total_total
      from t
     ) t;
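To see what the cumulative-sum arithmetic produces, here is a small Python sketch of the same calculation. The ten sales figures are hypothetical (they only share the total of 4500 mentioned in the question), the function name is made up for illustration, and the sketch assumes there are no tied totals.

```python
import math

def assign_quantiles(totals, buckets=4):
    """Pair each total with ceiling(running_total * buckets / grand_total),
    walking the totals from highest to lowest."""
    ordered = sorted(totals, reverse=True)
    grand_total = sum(ordered)
    running_total = 0
    result = []
    for value in ordered:
        running_total += value
        result.append((value, math.ceil(running_total * buckets / grand_total)))
    return result

# Hypothetical sales for 10 retailers, summing to 4500.
sales = [1100, 1000, 700, 500, 400, 300, 200, 150, 100, 50]

for value, quantile in assign_quantiles(sales):
    print(value, quantile)
```

A retailer lands in the bucket that its running total falls into, so a single very large retailer can skip bucket 1 entirely; whether that is acceptable depends on how the buckets are meant to be read.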

How to NTILE over distinct values in BigQuery?

I have a query that I'm trying to put together in Google BigQuery that would decile sales for each customer. The problem I'm running into is that if a decile breaks at the point where many customers have the same sales value, they could end up in different deciles despite having the same sales.
For example, if there were twenty customers in total, and one spent $100, 18 spent $50, and one spent $25, the 18 customers who spent $50 will still be broken out across all the deciles due to equal groups being created, whereas in reality I would want them to be placed in the same decile.
The data that I'm using is obviously a bit more complex -- there are about 10 million customers, and the sales are deciled within a particular group to which each customer belongs.
Example code:
NTILE(10) OVER (PARTITION BY customer_group ORDER BY yearly_sales asc) as current_sales_decile
The NTILE function works, but I just run into the problem described above and haven't figured out how to fix it. Any suggestions welcome.
Calculate the ntile yourself:
select ceiling(rank() over (partition by customer_group order by yearly_sales) * 10.0 /
               count(*) over (partition by customer_group)
              ) as current_sales_decile
This gives you more control over how the tiles are formed. In particular, all rows with the same value go in the same tile.
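The twenty-customer example from the question can be worked through in a few lines of Python. This is only a sketch of the rank-based formula for a single customer group; the function name is hypothetical, and rank is taken to mean the position of the first row with a given value, as rank() does in SQL.

```python
import math

def rank_based_tiles(values, tiles=10):
    """Return (value, tile) pairs using ceiling(rank * tiles / count),
    where tied values share the rank of their first occurrence."""
    ordered = sorted(values)
    count = len(ordered)
    first_rank = {}
    for position, value in enumerate(ordered, start=1):
        first_rank.setdefault(value, position)  # keep the lowest position per value
    return [(value, math.ceil(first_rank[value] * tiles / count)) for value in ordered]

# One customer spent 100, eighteen spent 50, one spent 25.
yearly_sales = [100] + [50] * 18 + [25]
tiles = rank_based_tiles(yearly_sales)

print(sorted(set(tiles)))
```

All eighteen customers who spent 50 share rank 2 and therefore the same tile, while the tiles in between stay empty. That is the trade-off of this approach: tiles are no longer equal in size.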

Redshift Rounding Off Issue

I have a table which has a numeric(23,2) field that I need to divide by a constant.
My baseline is this aggregation
select site, sum(sales) / 1.07 as sales from sales group by site;
But when I add another grouping column and then compare the grand total across all rows, I notice that some decimals are dropped:
select site, product, sum(sales) / 1.07 as sales from sales group by site, product;
Is there a proper way to handle this in Redshift?
I would suggest dividing before doing the sum:
select site, product, sum(sales / 1.07) as sales
from sales
group by site, product;
Mathematically, the two forms are equivalent. However, because numeric values have finite precision and every division result is rounded, dividing before summing may address your issue.
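The effect can be illustrated with Python's decimal module. This sketch does not model Redshift's actual rules for numeric precision and scale; it simply assumes every quotient is rounded to two decimal places, and the rows and helper names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

DIVISOR = Decimal("1.07")
SCALE = Decimal("0.01")  # assumed result scale, chosen only for illustration

def divide(amount):
    """Divide by the constant and round the quotient to the assumed scale."""
    return (amount / DIVISOR).quantize(SCALE, rounding=ROUND_HALF_UP)

# Hypothetical rows: (site, product, sales).
rows = [
    ("s1", "p1", Decimal("10.00")),
    ("s1", "p2", Decimal("20.00")),
    ("s1", "p3", Decimal("30.00")),
]

def by_site(row):
    return row[0]

def by_site_product(row):
    return (row[0], row[1])

def total_sum_then_divide(rows, key):
    """Grand total of sum(sales) / 1.07 computed per group."""
    group_sums = defaultdict(Decimal)
    for row in rows:
        group_sums[key(row)] += row[2]
    return sum((divide(s) for s in group_sums.values()), Decimal("0"))

def total_divide_then_sum(rows, key):
    """Grand total of sum(sales / 1.07) computed per group."""
    group_sums = defaultdict(Decimal)
    for row in rows:
        group_sums[key(row)] += divide(row[2])
    return sum(group_sums.values(), Decimal("0"))

print(total_sum_then_divide(rows, by_site))          # one rounding per site
print(total_sum_then_divide(rows, by_site_product))  # one rounding per site and product
print(total_divide_then_sum(rows, by_site))
print(total_divide_then_sum(rows, by_site_product))
```

Under these assumptions, summing first gives 56.07 when grouped by site but 56.08 when grouped by site and product, because the number of rounding steps differs. Dividing first rounds once per row, so the grand total is 56.08 regardless of how the rows are grouped.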

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute the average daily sales per vendor for the year 2019 as a single number, which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendors' average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
    -- Get average number of daily sales for each vendor
    SELECT vendor_daily_totals.vendorid,
           AVG(vendor_daily_totals.cnt) avg_daily_sales
    FROM (
        -- Get number of sales per vendor per day
        SELECT vendorid, COUNT(*) cnt
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid, month, day
    ) vendor_daily_totals
    GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
SELECT vendorid,
COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid
) t
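For each vendor, this takes the total number of sales and divides it by the number of distinct days on which that vendor had at least one sale, which is the same as averaging the daily counts. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.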
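The equivalence of the two formulations can be checked on a handful of rows. The Python sketch below mirrors both queries in plain code; the sales rows and function names are hypothetical, and, like both queries, it only counts days on which a vendor had at least one sale.

```python
from collections import Counter, defaultdict

# Hypothetical rows, one per sale in 2019: (vendorid, month, day).
sales = [
    (1, 1, 1), (1, 1, 1), (1, 1, 2),             # vendor 1: 3 sales over 2 days
    (2, 3, 5), (2, 3, 5), (2, 3, 5), (2, 3, 5),  # vendor 2: 4 sales on 1 day
]

def nested_average(sales):
    """Mirror of the nested query: daily counts, then per-vendor average,
    then the average of those averages."""
    daily_counts = Counter(sales)  # (vendorid, month, day) -> number of sales
    counts_by_vendor = defaultdict(list)
    for (vendor, _month, _day), cnt in daily_counts.items():
        counts_by_vendor[vendor].append(cnt)
    vendor_avgs = [sum(c) / len(c) for c in counts_by_vendor.values()]
    return sum(vendor_avgs) / len(vendor_avgs)

def one_shot_average(sales):
    """Mirror of the single GROUP BY: total sales / distinct days per vendor."""
    total_sales = Counter(vendor for vendor, _month, _day in sales)
    distinct_days = defaultdict(set)
    for vendor, month, day in sales:
        distinct_days[vendor].add((month, day))
    vendor_avgs = [total_sales[v] / len(distinct_days[v]) for v in total_sales]
    return sum(vendor_avgs) / len(vendor_avgs)

print(nested_average(sales), one_shot_average(sales))
```

Vendor 1 averages 1.5 sales per active day and vendor 2 averages 4, so both functions return 2.75.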

Issue with finding out a percentage from the average in Postgres

Before I introduce my issue, I must specify that I am a beginner with SQL and Postgres.
I've made a database in Postgres as part of a project, and I need to query it. The database is about a firm which sells fertilizer.
One of the requirements is a query that returns the stores with sales of 25% of the average of the total sales.
I have worked out the average of the sales using the following query:
SELECT StoreID
FROM Sales
WHERE Price < (SELECT ROUND(AVG(Price)) FROM Sales);
Now, I don't know what I should put in the query to get the result.
Can anyone guide me?
If you mean sales with price below 25% of the average:
select storeid
from (
select storeid, price, avg(price) over() as avg_price
from sales
) s
where price < 0.25 * avg_price
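For a quick check of the filter, the same query can be run on a few made-up rows. The sketch uses Python's built-in sqlite3 module instead of Postgres (an assumption made only to keep the example self-contained), and the four sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (storeid integer, price real)")
# Hypothetical rows: the average price is 155, so 25% of it is 38.75.
conn.executemany(
    "insert into sales values (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 300.0), (4, 20.0)],
)

query = """
select storeid
from (
    select storeid, price, avg(price) over () as avg_price
    from sales
) s
where price < 0.25 * avg_price
"""

# Collect the store ids whose price is below a quarter of the overall average.
cheap_stores = [row[0] for row in conn.execute(query)]
print(cheap_stores)
```

Only store 4, with a price of 20, falls below the 38.75 cut-off.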