Percentage distribution of column occurrence? - sql

I am looking for a way to calculate the distribution of column values in my table.
e.g. if I have two rows containing "red" and "blue", each should have 50%.
I want to count the occurrences of each value in the column and compare that to the total number of rows.
My attempt:
SELECT
log_domain,
count(log_domain),
count(log_domain) over(),
ROUND(
COUNT(log_domain)
/
COUNT(*) OVER()
,2) AS percentage
FROM logs
GROUP BY log_domain
Any help? Thank you!

Just pay attention to integer division. I often just multiply by 1.0:
SELECT log_domain, COUNT(*), COUNT(*) OVER (),
ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) as ratio
FROM logs
GROUP BY log_domain;
I also notice that the denominator needs to be SUM(COUNT(*)) rather than COUNT(*). Your version just divides by the number of rows in the result set -- that is, the number of values of log_domain.
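To see the difference between the two denominators, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. The sample data and the helper name domain_ratios are assumptions made for illustration, and SQLite 3.25 or later is assumed for window-function support.

```python
import sqlite3

def domain_ratios(domains):
    """Return (log_domain, n, n_groups, n_rows, ratio) per distinct domain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (log_domain TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?)", [(d,) for d in domains])
    rows = conn.execute("""
        SELECT log_domain,
               COUNT(*)               AS n,         -- rows in this group
               COUNT(*) OVER ()       AS n_groups,  -- rows in the result set (one per group)
               SUM(COUNT(*)) OVER ()  AS n_rows,    -- rows in the whole table
               ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) AS ratio
        FROM logs
        GROUP BY log_domain
        ORDER BY log_domain
    """).fetchall()
    conn.close()
    return rows

# Hypothetical sample: three "red" rows and one "blue" row.
rows = domain_ratios(["red", "red", "red", "blue"])
for row in rows:
    print(row)
```

With four rows in two groups, COUNT(*) OVER () reports 2 while SUM(COUNT(*)) OVER () reports 4, which is why only the latter gives a share of the total.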

Related

Running calculations on different percent amounts of data - SQL

I'm brainstorming on ways to find trends over a dataset containing transaction amounts that spans a year.
I'd like to run an average of the top 25% of observations and the bottom 75% of observations, and vice versa.
If the entire dataset contains 1000 observations, I'd like to run:
An average of the top 25% and then separately, an average of the bottom 75% and find the resulting average of this.
Inversely, top 75% average, then bottom 25%, then the average of the 2.
For the overall average I have: avg(transaction_amount)
I am aware that in order for the sectioning averages to be useful, I will have to order the data according to the date which I already have accounted for in my SQL code:
select avg(transaction_amount)
from example.table
order by transaction_date
I am now struggling to find a way to split the data between 25% and 75% based on the number of observations.
Thanks.
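If you're using MSSQL, it's fairly simple, depending on exactly the output you're looking for. Select the top 25 percent of rows in a subquery, then average them in the outer query:
SELECT
AVG(sub.transaction_amount) AS avg_amt
FROM (
SELECT TOP 25 PERCENT
transaction_amount
FROM example.table
ORDER BY transaction_amount DESC
) AS sub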
Use PERCENT_RANK in order to see which percentage block a row belongs to. Then use this to group your data:
with data as
(
select t.*, percent_rank() over (order by transaction_amount) as pr
from example.table t
)
select
case when pr <= 0.75 then '0-75%' else '75-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
group by case when pr <= 0.75 then '0-75%' else '75-100%' end
union all
select
case when pr <= 0.25 then '0-25%' else '25-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
group by case when pr <= 0.25 then '0-25%' else '25-100%' end;
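As a rough check of the PERCENT_RANK approach, the sketch below runs the 75% split against SQLite via Python's sqlite3 module. The table name transactions, the helper split_averages, the bucket labels and the five sample amounts are all assumptions for illustration; SQLite 3.25 or later is assumed.

```python
import sqlite3

def split_averages(amounts):
    """Average the rows at or below the 75% rank and the rows above it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (transaction_amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?)", [(a,) for a in amounts])
    rows = conn.execute("""
        WITH data AS (
            SELECT transaction_amount,
                   PERCENT_RANK() OVER (ORDER BY transaction_amount) AS pr
            FROM transactions
        )
        SELECT CASE WHEN pr <= 0.75 THEN '0-75%' ELSE '75-100%' END AS bucket,
               AVG(transaction_amount)               AS avg_amount,
               AVG(AVG(transaction_amount)) OVER ()  AS avg_of_avg
        FROM data
        GROUP BY CASE WHEN pr <= 0.75 THEN '0-75%' ELSE '75-100%' END
        ORDER BY bucket
    """).fetchall()
    conn.close()
    return rows

# Hypothetical sample: PERCENT_RANK yields 0, 0.25, 0.5, 0.75 and 1 for these rows.
split_rows = split_averages([10, 20, 30, 40, 50])
for row in split_rows:
    print(row)
```

PERCENT_RANK is computed as (rank - 1) / (rows - 1), so the lowest row always gets 0 and the highest gets 1; on small tables the boundary row can shift the split noticeably.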

How to split a list of number into ranges with a fixed interval with SQL?

Let's say I have a table like this
I want to calculate the frequency ( How many times that product exists in that price range ), in intervals of "50"
So eventually it will give me a table like
The interval for each range is fixed; let's say 50.
We don't know the highest and lowest price of each product.
So I will run the query and it will give a table as shown above.
You can use arithmetic and aggregation:
select product, count(*) as frequency,
floor(price / 50)*50 as range_start, floor(price / 50)*50 + 50 as range_end
from t
group by product, floor(price / 50)
order by product, min(price)
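The bucketing arithmetic can be tried out with the sketch below, which uses SQLite through Python's sqlite3 module. The sample products and prices and the helper price_buckets are made up for illustration. Because some SQLite builds lack floor(), the sketch substitutes CAST(price / 50 AS INTEGER), which matches floor only for non-negative prices.

```python
import sqlite3

def price_buckets(items, width=50):
    """Count rows per product and per fixed-width price range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (product TEXT, price INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", items)
    # width is interpolated as an int, not user text, so this is safe here.
    w = int(width)
    rows = conn.execute(f"""
        SELECT product,
               CAST(price / {w} AS INTEGER) * {w}       AS range_start,
               CAST(price / {w} AS INTEGER) * {w} + {w} AS range_end,
               COUNT(*)                                 AS frequency
        FROM t
        GROUP BY product, CAST(price / {w} AS INTEGER)
        ORDER BY product, range_start
    """).fetchall()
    conn.close()
    return rows

# Hypothetical sample data.
bucket_rows = price_buckets([("A", 10), ("A", 40), ("A", 60), ("A", 120),
                             ("B", 55), ("B", 70)])
for row in bucket_rows:
    print(row)
```

Each range is half-open: a price of exactly 50 lands in the 50 to 100 bucket, not in 0 to 50. Ranges with no rows do not appear in the output.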

Redshift - Find % as compared to total value

I have a table with count by product. I am trying to add a new column that would find % as compared to sum of all rows in that column.
prod_name,count
prod_a,100
prod_b,50
prod_c,150
For example, I want to find % of prod_a as compared to the total count and so on.
Expected output:
prod_name,count,%
prod_a,100,0.33
prod_b,50,0.167
prod_c,150,0.5
Edit on SQL:
select count(*),ratio_to_report(prod_name)
over (partition by count(*))
from sales
group by prod_name;
Using window functions.
select t.*,100.0*cnt_by_prod/sum(cnt_by_prod) over() as pct
from tbl t
Edit: Based on the OP's change to the question, to compute the counts and then the percentage, use:
select prod_name,100.0*count(*)/sum(count(*)) over()
from tbl
group by prod_name

Calculate percentage between two columns in SQL Query as another column

I have a table with two columns, number of maximum number of places (capacity) and number of places available (availablePlaces)
I want to calculate the availablePlaces as a percentage of the capacity.
availablePlaces capacity
1 20
5 18
4 15
Desired Result:
availablePlaces capacity Percent
1 20 5.0
5 18 27.8
4 15 26.7
Any ideas of a SELECT SQL query that will allow me to do this?
Try this:
SELECT availablePlaces, capacity,
ROUND(availablePlaces * 100.0 / capacity, 1) AS Percent
FROM mytable
You have to multiply by 100.0 instead of 100, so as to avoid integer division. Also, you have to use ROUND to round to the first decimal digit.
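The effect of 100 versus 100.0 can be seen in the following sketch, run against SQLite via Python's sqlite3 module with the three rows from the question. The helper name percent_available and the extra truncated column are assumptions added for comparison.

```python
import sqlite3

def percent_available(places):
    """Compare integer division with decimal division for the percentage."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (availablePlaces INTEGER, capacity INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", places)
    rows = conn.execute("""
        SELECT availablePlaces,
               capacity,
               availablePlaces * 100 / capacity             AS truncated,  -- integer division
               ROUND(availablePlaces * 100.0 / capacity, 1) AS percent     -- decimal division
        FROM mytable
        ORDER BY rowid
    """).fetchall()
    conn.close()
    return rows

# The rows from the question.
pct_rows = percent_available([(1, 20), (5, 18), (4, 15)])
for row in pct_rows:
    print(row)
```

Integer division drops the fractional part (27 instead of 27.8), whereas the 100.0 literal forces a decimal result that ROUND can then trim to one digit.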
The following SQL query will do this for you. Multiplying by 100.0 converts the ratio to a percentage and avoids integer division:
SELECT availablePlaces, capacity, (availablePlaces * 100.0 / capacity) as Percent
from table_name;
Why not use a number formatting function such as format_number (or an equivalent one in your database) to format a double as a percentage? This example is generalized. The returned value is a string.
WITH t
AS
(
SELECT count(*) AS num_rows, count(foo) as num_foo
FROM mytable
)
SELECT *, format_number(num_foo/num_rows, '#.#%') AS pct_grade_rows
FROM t
This avoids the use of round and multiplying the numerator by 100.

Percentage calculation of specific number of row over total

I'm looking for a SELECT statement to calculate the percentage of a specific number of rows over the total number of rows.
For example, let's say I have a FRUIT table like this:
I want to calculate the percentage of rows whose name is not peach, over the total number of rows. I tried this statement:
SELECT CAST((select count(name) from fruit WHERE name !='peach')
as FLOAT) /
(select count(name)from fruit)*100.0 as percentage ;
but it doesn't give me the correct number. I also need a statement that calculates the percentage of each fruit by grouping them with GROUP BY.
I'm very new to SQL and I keep trying but can't find the right syntax. Please help me.
I think the easiest way to do this is using conditional aggregation with average:
select avg(case when name <> 'peach' then 100.0 else 0.0 end)
from fruit;
In Postgres, you can use the shorthand:
select 100*avg((name <> 'peach')::int)
from fruit;
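The question also asks for a percentage per fruit using GROUP BY. A minimal sketch of both calculations follows, run against SQLite via Python's sqlite3 module. It uses the question's table fruit and column name; the sample rows and the helper fruit_percentages are assumptions, and SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

def fruit_percentages(names):
    """Return (percent of rows that are not peach, per-fruit percentages)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruit (name TEXT)")
    conn.executemany("INSERT INTO fruit VALUES (?)", [(n,) for n in names])
    # Conditional aggregation: each row contributes 100 or 0 to the average.
    not_peach = conn.execute("""
        SELECT AVG(CASE WHEN name <> 'peach' THEN 100.0 ELSE 0.0 END)
        FROM fruit
    """).fetchone()[0]
    # Per-group share: group count divided by the sum of all group counts.
    per_fruit = conn.execute("""
        SELECT name,
               100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percentage
        FROM fruit
        GROUP BY name
        ORDER BY name
    """).fetchall()
    conn.close()
    return not_peach, per_fruit

# Hypothetical sample rows.
not_peach, per_fruit = fruit_percentages(["apple", "apple", "peach", "banana"])
print(not_peach)
print(per_fruit)
```

Rows with a NULL name would count as 0 in the conditional average, because the comparison with 'peach' is neither true nor false for them.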