Calculating percentage within a group - sql

Given a table for which the following commands:
select sex, count(*) from my_table group by sex;
select sex, employed, count(*) from my_table group by sex, employed;
give:
sex | count
-------+------
male | 1960
female | 1801
and:
sex | employed | count
---------+----------+-------
male | f | 1523
male | t | 437
female | f | 1491
female | t | 310
I'm having difficulty writing a query that calculates the percentage of employed within each sex group. The output should look like this:
sex | employed | count | percent
---------+----------+--------+-----------
male | f | 1523 | 77.7% (1523/1960)
male | t | 437 | 22.3% (437/1960)
female | f | 1491 | 82.8% (1491/1801)
female | t | 310 | 17.2% (310/1801)

It may be too late, but for future searchers, a possible solution could be:
select sex, employed, COUNT(*) / CAST( SUM(count(*)) over (partition by sex) as float)
from my_table
group by sex, employed
Judging by IO statistics, this seems to be the most efficient solution, although that may depend on the number of rows being queried (tested on the numbers above).
The same approach could be used to get the male / female percentage:
select sex, COUNT(*) / CAST( SUM(count(*)) over () as float)
from my_table
group by sex
Regards,
Jan
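For anyone who wants to try this without a server, here is a minimal sketch using Python's built-in sqlite3 module (an assumption on my part, the question does not name an engine; window functions need SQLite 3.25 or newer). The table contents are generated to match the counts in the question, and multiplying by 100.0 stands in for the CAST to float.

```python
import sqlite3

# Hypothetical in-memory copy of my_table, sized to match the counts in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (sex TEXT, employed TEXT)")
groups = [("male", "f", 1523), ("male", "t", 437),
          ("female", "f", 1491), ("female", "t", 310)]
for sex, employed, n in groups:
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(sex, employed)] * n)

# The window SUM runs over the grouped counts, one partition per sex.
# Multiplying by 100.0 forces floating-point division.
percent_rows = conn.execute("""
    SELECT sex, employed, COUNT(*) AS cnt,
           ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY sex), 1) AS percent
    FROM my_table
    GROUP BY sex, employed
    ORDER BY sex DESC, employed
""").fetchall()

for row in percent_rows:
    print(row)
```

Each row carries the group count and its share of the per-sex total, so the two rows for one sex add up to 100.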

You can do it with a sub-select and a join:
SELECT t1.sex, employed, count(*) AS `count`, count(*) / t2.total AS percent
FROM my_table AS t1
JOIN (
SELECT sex, count(*) AS total
FROM my_table
GROUP BY sex
) AS t2
ON t1.sex = t2.sex
GROUP BY t1.sex, employed;
I can't think of other approaches off the top of my head.
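A caveat when porting the join version: the backticks and the plain count(*) / t2.total are MySQL-flavoured. In engines where dividing two integers truncates (PostgreSQL, SQL Server, SQLite), that expression returns 0 for every row. Below is a sketch of the same join run through Python's sqlite3 module (my choice of engine, not the answerer's), with 100.0 added to avoid the truncation and t2.total added to the GROUP BY for stricter engines.

```python
import sqlite3

# Hypothetical in-memory copy of my_table, sized to match the counts in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (sex TEXT, employed TEXT)")
groups = [("male", "f", 1523), ("male", "t", 437),
          ("female", "f", 1491), ("female", "t", 310)]
for sex, employed, n in groups:
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(sex, employed)] * n)

# The derived table t2 holds one total per sex; joining it back lets each
# (sex, employed) group divide its own count by that total.
join_rows = conn.execute("""
    SELECT t1.sex, t1.employed, COUNT(*) AS cnt,
           ROUND(100.0 * COUNT(*) / t2.total, 1) AS percent
    FROM my_table AS t1
    JOIN (SELECT sex, COUNT(*) AS total
          FROM my_table
          GROUP BY sex) AS t2
      ON t1.sex = t2.sex
    GROUP BY t1.sex, t1.employed, t2.total
    ORDER BY t1.sex DESC, t1.employed
""").fetchall()

for row in join_rows:
    print(row)
```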

Related

Postgresql: how to calculate the percentage correctly?

With this very simple query I obtain the count(total) and genre of movies
from my db.
select genre,count(*) from titles group by genre order by count desc;
genre | count
-----------------+-------
Drama | 529
Comic | 393
Martial arts | 276
History | 269
Action | 237
My question is: how to get a percentage?
I want something like this
select ??????;
genre | percentage
-----------------+-------
Drama | 30%
Comic | 20%
Martial arts | 20%
History | 15%
Action | 11%
Other | 4%
I tried a lot of code taken from Google and Stack Exchange before asking, as you can see from this psql history,
but every attempt fails with an error or gives very strange results.
SELECT
title,
ROUND( AVG( genre ), 2 ) percentage
FROM
titles
INNER JOIN genre
USING(id_genre)
GROUP BY
title
ORDER BY
genre DESC;
select title, round ( AVG( genre ), 2 ) percentage from titles;
SELECT round((count(genre) *100)::numeric / NULLIF(count(*), 0), 2) AS percentage;
Use window functions:
select genre,
count(*) as cnt,
count(*) * 1.0 / sum(count(*)) over ()
from titles
group by genre
order by cnt desc
Thanks to Gordon for the correct answer. Using this query I obtain a perfect result (except for the 0% results, which are values like 0.0001... rounded by the round function, but that is not a problem):
select genre, count(*) as total, round(count(*) * 100 / sum(count(*)) over ()) || ' %' as percent from titles group by genre order by total desc;
genre | total | percent
-----------------+--------+-------------
Drama | 529 | 17 %
Comic | 393 | 13 %
Martial arts | 276 | 9 %
History | 269 | 9 %
Action | 237 | 8 %
...
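A small sketch of the same window-function idea, run through Python's sqlite3 module on made-up data (the genres and counts here are hypothetical, not the asker's). One thing to watch when moving the query between engines: in PostgreSQL sum(count(*)) is numeric, so count(*) * 100 / sum(...) keeps its decimals, but in SQLite both sides are integers, so the sketch multiplies by 100.0 instead.

```python
import sqlite3

# Hypothetical sample data: 5 dramas, 3 comics, 2 action titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT, genre TEXT)")
sample = [("Drama", 5), ("Comic", 3), ("Action", 2)]
for genre, n in sample:
    conn.executemany("INSERT INTO titles VALUES (?, ?)",
                     [(f"{genre} {i}", genre) for i in range(n)])

# SUM(COUNT(*)) OVER () is the grand total across all genre groups.
# CAST(... AS INTEGER) drops the trailing ".0" that SQLite's ROUND leaves behind.
genre_rows = conn.execute("""
    SELECT genre, COUNT(*) AS total,
           CAST(ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ()) AS INTEGER) || ' %' AS percent
    FROM titles
    GROUP BY genre
    ORDER BY total DESC
""").fetchall()

for row in genre_rows:
    print(row)
```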

There is a table with country and city columns, as shown in the input table below. I need the output shown below.

I need a SQL query to get the desired output from the input table.
You can do this with a UNION query, first selecting the distinct country names, and then each of the cities for that country. The output is then ordered by the country; whether the value is a country or a city; and then by the value:
SELECT DISTINCT country AS data, country, 1 AS ctry
FROM cities
UNION ALL
SELECT city, country, 0
FROM cities
ORDER BY country, ctry DESC, data
Output:
data country ctry
India India 1
BNG India 0
CHN India 0
HYD India 0
Sweden Sweden 1
GOTH Sweden 0
STOCK Sweden 0
VAXO Sweden 0
Demo on dbfiddle
It really looks like you want to interleave the records, with each country followed by its related cities.
The actual solution heavily depends on your database, so let me assume that yours supports window functions, the row constructor values() and lateral joins (SQL Server and Postgres are two candidates).
In SQL Server, you could do:
select distinct rn, idx, val
from (
select t.*, dense_rank() over(order by country) rn
from mytable t
) t
cross apply (values (t.country, 1), (t.city, 2)) as v(val, idx)
order by rn, idx, val
Demo on DB Fiddle:
rn | idx | val
-: | --: | :-----
1 | 1 | INDIA
1 | 2 | BNG
1 | 2 | CHN
1 | 2 | HYD
2 | 1 | SWEDEN
2 | 2 | STOCK
2 | 2 | VAXO
In Postgres, you would just replace cross apply with cross join lateral: Demo.

Unique values with string_agg on more than 1 column

I am trying to group by and get list of values for multiple columns. Here is an example:
City | State | Income
-------+-------+--------
Salem | OH | 40000
Salem | OH | 45000
Mason | OH | 50000
Dayton | OH | 60000
Salem | MA | 40000
Mason | MA | 45000
Mason | MA | 50000
Dayton | MA | 70000
Salem | PA | 45000
Mason | PA | 50000
Dayton | PA | 60000
The result I am looking for is:
City | States | Income
-------+------------+--------------
Salem | OH,MA,PA | 40000,45000
Mason | OH,MA,PA | 50000,45000
Dayton | OH,MA,PA | 60000,70000
I managed to get this far:
City | States | Income
-------+------------+-------------------------
Salem | OH,MA,PA | 40000,40000,45000,45000
Mason | OH,MA,PA | 50000,50000,50000,45000
Dayton | OH,MA,PA | 60000,70000,60000
How do I go from here to the result set?
City | States | Income
-------+------------+-------------------------
Salem | OH,MA,PA | 40000,45000
Mason | OH,MA,PA | 50000,45000
Dayton | OH,MA,PA | 60000,70000
Alas, SQL Server's string_agg() does not support distinct. But you can use conditional aggregation:
select city,
string_agg(case when seqnum_state = 1 then state end, ',') as states,
string_agg(case when seqnum_income = 1 then income end, ',') as incomes
from (select t.*,
row_number() over (partition by city, state order by state) as seqnum_state,
row_number() over (partition by city, income order by income) as seqnum_income
from t
) t
group by city;
Here is a db<>fiddle.
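The same row_number() trick can be tried outside SQL Server. The sketch below runs it through Python's sqlite3 module, swapping string_agg for SQLite's group_concat (an assumption made purely to get something runnable; it also skips NULLs, which is what the CASE without ELSE relies on). SQLite does not promise any order inside the concatenated string, so the results are compared as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, state TEXT, income INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("Salem", "OH", 40000), ("Salem", "OH", 45000), ("Mason", "OH", 50000),
    ("Dayton", "OH", 60000), ("Salem", "MA", 40000), ("Mason", "MA", 45000),
    ("Mason", "MA", 50000), ("Dayton", "MA", 70000), ("Salem", "PA", 45000),
    ("Mason", "PA", 50000), ("Dayton", "PA", 60000),
])

# Only the first row of each (city, state) and (city, income) partition
# survives the CASE, so each distinct value is concatenated exactly once.
agg_rows = conn.execute("""
    SELECT city,
           GROUP_CONCAT(CASE WHEN seqnum_state = 1 THEN state END, ',') AS states,
           GROUP_CONCAT(CASE WHEN seqnum_income = 1 THEN income END, ',') AS incomes
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY city, state ORDER BY state) AS seqnum_state,
                 ROW_NUMBER() OVER (PARTITION BY city, income ORDER BY income) AS seqnum_income
          FROM t) AS numbered
    GROUP BY city
""").fetchall()

# Split the strings back into lists so duplicates would be easy to spot.
distinct_by_city = {city: (states.split(","), incomes.split(","))
                    for city, states, incomes in agg_rows}
for city, (states, incomes) in distinct_by_city.items():
    print(city, states, incomes)
```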
You can perform separate group by on (City, state) and (City, Income) to remove duplicates, then you can separately build the (States) and (Incomes) aggregated strings and finally you can join the results in a single table:
CREATE TABLE #tmp (City VARCHAR(100), State VARCHAR(100), Income int);
INSERT INTO #tmp
VALUES ('Salem' ,'OH', 40000) ,('Salem' ,'OH', 45000) ,('Mason' ,'OH', 50000)
,('Dayton','OH', 60000) ,('Salem' ,'MA', 40000) ,('Mason' ,'MA', 45000)
,('Mason' ,'MA', 50000) ,('Dayton','MA', 70000) ,('Salem' ,'PA', 45000)
,('Mason' ,'PA', 50000) ,('Dayton','PA', 60000)
;with States as(
select City, state
from #tmp
group by City, state
),
incomes as(
select City, Income
from #tmp
group by City, Income
)
, states_g as (
select city, STRING_AGG(state,',') as States
from states
group by city
)
, incomes_g as (
select city, STRING_AGG(Income,',') as Incomes
from incomes
group by city
)
select
s.City, s.States, i.Incomes
from
states_g as s
inner join
incomes_g as i
on i.City = s.City
Results:
Here is one more way of doing it (db fiddle):
select city,
(select string_agg(value,', ') from (select distinct value from string_split(string_agg(state, ','),',')) t) as states,
(select string_agg(value,', ') from (select distinct value from string_split(string_agg(income, ','),',')) t) as incomes
from t
group by city;
You may easily convert the splitting and merging part into a reusable scalar valued function.
Experts are welcome to comment on performance.

Sum Decode statement SQL

I am trying to sum a few Decode statements and column names, but am having difficulties.
Currently the output looks like this:
rank | name | points
----------------------
0 | john | 0
0 | john | 40
1 | john | 30
2 | tom | 22
0 | tom | 0
I expect to have this result:
rank | name | points
----------------------
1 | john | 70
2 | tom | 22
Query:
Select Rank, Name, Code, Points
From
(select
decode(Table.name, 'condition1', Table.value) As Points,
decode(Table.name, 'Condition2', Table.value) As Rank,
Employee.name as Name,
Employee.GA1 as Code
from Table
inner Join Employee
on Empolyee.positionseq = name.positionseq
where Table.name IN ('Condition1', 'Condition2')
);
Select MAX(Rank), Name, Code, SUM(Points)
From
(select
decode(Table.name, 'Condition1', Table.value) As Points
,decode(Table.name, 'Condition2', Table.value) As Rank
,Employee.name as Name
,Employee.GA1 as Code
from Table
inner Join Employee
on Employee.positionseq = name.positionseq
where Table.name IN( 'Condition1', 'Condition2'))
GROUP BY Name, Code;
I added the SUM, MAX (for rank) and GROUP BY statements. Also corrected some misspellings (Empolyee)
I may be misunderstanding your question, but it seems like you are trying to do the following (omitting the inner join for simplicity):
Select MAX(rank), name, SUM(points)
FROM UserRanks
GROUP BY name
Based on your data set above, you should get the following results:
rank name points
1 john 70
2 tom 22
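A quick way to check that result is to load the rows from the question into a throwaway table. The sketch below uses Python's sqlite3 module; the UserRanks table is the hypothetical stand-in from the answer, and rank is quoted because it doubles as a function name in many engines.

```python
import sqlite3

# The five rows the asker currently gets, loaded into a stand-in table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE UserRanks ("rank" INTEGER, name TEXT, points INTEGER)')
conn.executemany("INSERT INTO UserRanks VALUES (?, ?, ?)", [
    (0, "john", 0), (0, "john", 40), (1, "john", 30),
    (2, "tom", 22), (0, "tom", 0),
])

# MAX keeps the one non-zero rank per person; SUM adds up all their points.
rank_rows = conn.execute("""
    SELECT MAX("rank") AS "rank", name, SUM(points) AS points
    FROM UserRanks
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in rank_rows:
    print(row)
```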

Crosstab query in PostgreSQL

I would like the result of this query in a crosstab:
SELECT district, sex ,count(sex)
FROM table1
GROUP BY sex, district
ORDER BY district;
district | sex | count
---------+-----+-----
dis_1 | M | 2
dis_1 | F | 4
dis_1 | NI | 1
dis_2 | M | 5
dis_2 | F | 2
Like this:
district | M | F | NI
---------+---+---+---
dis_1 | 2 | 4 | 1
dis_2 | 5 | 2 | 0
I tried a few queries without success, such as the one below:
SELECT row_name AS district,
category_1::varchar(10) AS m,
category_2::varchar(10) AS f,
category_3::varchar(10) AS ni,
category_4::int AS count
FROM crosstab('select district, sex, count(*)
from table1 group by district, sex')
AS ct (row_name varchar(27),
category_1 varchar(10),
category_2 varchar(10),
category_3 varchar(10),
category_4 int);
This crosstab function produces exactly what you asked for (except for simplified data types):
SELECT *
FROM crosstab('
SELECT district, sex, count(*)::int
FROM table1
GROUP BY 1,2
ORDER BY 1,2'
,$$VALUES ('M'), ('F'), ('NI')$$)
AS ct (district text
,"M" int
,"F" int
,"NI" int);
You had a couple of errors in your attempt.
Find details and explanation in this closely related answer:
PostgreSQL Crosstab Query
You can use an aggregate function with a CASE expression to get the result in columns:
select district,
sum(case when sex ='M' then 1 else 0 end) M,
sum(case when sex ='F' then 1 else 0 end) F,
sum(case when sex ='NI' then 1 else 0 end) NI
from table1
group by district
order by district
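The CASE version needs no extension, so it is easy to try anywhere. Here is a minimal sketch with Python's sqlite3 module, using rows invented to match the counts in the question. Note that it yields 0 for the missing dis_2 / NI combination because of the ELSE 0.

```python
import sqlite3

# Hypothetical rows for table1, generated to match the counts in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (district TEXT, sex TEXT)")
counts = [("dis_1", "M", 2), ("dis_1", "F", 4), ("dis_1", "NI", 1),
          ("dis_2", "M", 5), ("dis_2", "F", 2)]
for district, sex, n in counts:
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(district, sex)] * n)

# Each SUM counts only the rows whose sex matches its column.
pivot_rows = conn.execute("""
    SELECT district,
           SUM(CASE WHEN sex = 'M'  THEN 1 ELSE 0 END) AS m,
           SUM(CASE WHEN sex = 'F'  THEN 1 ELSE 0 END) AS f,
           SUM(CASE WHEN sex = 'NI' THEN 1 ELSE 0 END) AS ni
    FROM table1
    GROUP BY district
    ORDER BY district
""").fetchall()

for row in pivot_rows:
    print(row)
```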