SQL / PostgreSQL: count multiple columns with conditions

I have a simple table of the form:
id
gender
a_feature (bool)
b_feature (bool)
...
xyz_feature (bool)
and I want to sum over all feature columns dependent on gender.
metric        male   female
a_feature     345    3423
b_feature     65     143
...           ...    ...
xyz_feature   133    5536
Is there a simple way to do this, e.g. using the information_schema?
I found only the solution below, but it is very ugly:
select
'a_feature' as feature_name,
count(case when a_feature and gender = 'male' then 1 end) as male,
count(case when a_feature and gender = 'female' then 1 end) as female
from table
union
select
'b_feature' as feature_name,
count(case when b_feature and gender = 'male' then 1 end) as male,
count(case when b_feature and gender = 'female' then 1 end) as female
from table
.
.
.
union
select
'xyz_feature' as feature_name,
count(case when xyz_feature and gender = 'male' then 1 end) as male,
count(case when xyz_feature and gender = 'female' then 1 end) as female
from table

You can unpivot and aggregate. One method is:
select name,
sum(case when feature and gender = 'male' then 1 else 0 end) as num_male,
sum(case when feature and gender = 'female' then 1 else 0 end) as num_female
from ((select 'a_feature' as name, a_feature as feature, gender
from t
) union all
(select 'b_feature' as name, b_feature, gender
from t
) union all
. . .
) f
group by name;
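The unpivot-and-aggregate idea can be tried on a few rows. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in database, two feature columns only, and made-up sample rows; the booleans are stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer, gender text, a_feature integer, b_feature integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "male", 1, 0),
        (2, "male", 1, 1),
        (3, "female", 0, 1),
        (4, "female", 1, 1),
    ],
)

# Unpivot each feature column into (name, feature, gender) rows,
# then count the true values per feature and gender.
query = """
select name,
       sum(case when feature and gender = 'male' then 1 else 0 end) as num_male,
       sum(case when feature and gender = 'female' then 1 else 0 end) as num_female
from (select 'a_feature' as name, a_feature as feature, gender from t
      union all
      select 'b_feature' as name, b_feature as feature, gender from t) f
group by name
order by name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each source row appears once per feature in the derived table f, so the cost grows with the number of feature columns.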
In Postgres, you would unpivot using a lateral join:
select name,
sum(case when feature and gender = 'male' then 1 else 0 end) as num_male,
sum(case when feature and gender = 'female' then 1 else 0 end) as num_female
from t cross join lateral
(values ('a_feature', a_feature),
('b_feature', b_feature),
. . .
) v(name, feature)
group by name;
You can generate the list for values() using information_schema.columns if you are reluctant to type it all in.
EDIT:
You can construct the values clause using something like this:
select string_agg(format('(%L, %I)', column_name, column_name), ', ')
from information_schema.columns
where table_name = ?
and data_type = 'boolean'
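The same list can also be assembled on the client side once the column names are known. The helper below is a hypothetical sketch (the name build_values_clause is not from the answer): it only does string building, emitting each name once as a quoted string literal and once as a double-quoted identifier.

```python
def build_values_clause(column_names):
    """Build the body of a VALUES list, e.g. ('a_feature', "a_feature"), ..."""
    pairs = []
    for name in column_names:
        literal = "'" + name.replace("'", "''") + "'"     # SQL string literal
        identifier = '"' + name.replace('"', '""') + '"'  # SQL quoted identifier
        pairs.append(f"({literal}, {identifier})")
    return ", ".join(pairs)


clause = build_values_clause(["a_feature", "b_feature"])
print(clause)
```

The result is meant to be pasted between "(values" and ") v(name, feature)" in the lateral-join query.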

When you use this in Postgres, what does t stand for in "from t cross join lateral" (I assume it is the table), and what does v mean in "v(name, feature)"?

Related

print out the number of two values in the same column

I have a database of people, each with a gender. How would I count males and females separately?
SELECT count(id_osb)
from ds_osebe
where spol = 'M'
or spol = 'Z';
This is how I get the number of males and females combined.
I do not know how to count them separately; it's my second day learning this.
You need to use grouping (group by)
SELECT spol, count(id_osb)
from ds_osebe
where spol = 'M'
or spol = 'Z'
group by spol
NOTE: replace spol if it doesn't denote sex
You can use a CASE expression.
Query
select SUM(case spol when 'M' then 1 else 0 end) as male_cnt,
SUM(case spol when 'Z' then 1 else 0 end) as female_cnt
from ds_osebe;
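Alternatively, use a window function (distinct collapses the per-row results to one row per gender):
select distinct count(1) over (partition by spol) as qty, spol
from ds_osebe
/* the filter is needed only if you have more than 2 gender options */
where spol in ('Z', 'M');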
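To compare the shapes of the GROUP BY and CASE results, here is a small sketch, assuming SQLite via Python's sqlite3 module and five invented rows in ds_osebe. GROUP BY yields one row per gender, while the CASE form yields a single row with one column per gender.

```python
import sqlite3

# Hypothetical sample data: three 'M' rows and two 'Z' rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table ds_osebe (id_osb integer primary key, spol text)")
conn.executemany(
    "insert into ds_osebe (spol) values (?)",
    [("M",), ("M",), ("Z",), ("M",), ("Z",)],
)

# One result row per gender.
per_row = conn.execute(
    "select spol, count(id_osb) from ds_osebe "
    "where spol in ('M', 'Z') group by spol order by spol"
).fetchall()

# One result row, one column per gender.
per_column = conn.execute(
    "select sum(case spol when 'M' then 1 else 0 end) as male_cnt, "
    "sum(case spol when 'Z' then 1 else 0 end) as female_cnt "
    "from ds_osebe"
).fetchone()

print(per_row)
print(per_column)
```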

SQL works in Athena Engine v1 but not v2

I have a SQL query embedded into a system that has worked successfully until now in Athena with engine version 1. However it fails in engine version 2 and I haven't been able to work out why.
Here is a generalised version of the SQL. It counts the people in 3 groups: adults, NY residents, and the overlap of the two (NY adults).
In version 1 this works, but in v2 I get the error "column z.id_field cannot be resolved":
WITH BASE AS (SELECT person_id, age, state
FROM people
WHERE gender = 'male'
)
,group_a as (
SELECT distinct (person_id) as id_field
FROM BASE
WHERE age > 17
),
group_b as (
SELECT distinct (person_id) as id_field
FROM BASE
WHERE state = 'NY'
)
SELECT CASE WHEN z.id_field is null then 'group_b_only' WHEN r.id_field is null then 'group_a_only' ELSE 'Overlap' END as group
, COUNT (coalesce (z.id_field, r.id_field)) as count
FROM group_a AS z FULL OUTER JOIN group_b as r USING (id_field)
GROUP BY 1;
As a note, in any database this would be simpler as an aggregation and probably faster too:
SELECT grp, COUNT(*)
FROM (SELECT person_id,
(CASE WHEN MAX(age) > 17 AND MAX(state) = 'NY' THEN 'Both'
WHEN MAX(age) > 17 THEN 'Age Only'
ELSE 'State Only'
END) as grp
FROM people
WHERE gender = 'male' AND
(age > 17 OR state = 'NY')
GROUP BY person_id
) x
GROUP BY grp;
The above assumes that person_id can be repeated in people. If that is not the case, then this can be simplified to:
SELECT (CASE WHEN age > 17 AND state = 'NY' THEN 'Both'
WHEN age > 17 THEN 'Age Only'
ELSE 'State Only'
END) as grp, COUNT(*)
FROM people
WHERE gender = 'male' AND
(age > 17 OR state = 'NY')
GROUP BY grp;
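The simplified single-pass version can be checked on a handful of rows. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on Athena, the sample people are invented, and person_id is assumed unique.

```python
import sqlite3

# Hypothetical sample data; one row per person.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table people (person_id integer, age integer, state text, gender text)"
)
conn.executemany(
    "insert into people values (?, ?, ?, ?)",
    [
        (1, 30, "NY", "male"),    # adult and NY resident
        (2, 40, "CA", "male"),    # adult only
        (3, 15, "NY", "male"),    # NY resident only
        (4, 10, "CA", "male"),    # in neither group, filtered out
        (5, 25, "NY", "female"),  # filtered out by gender
        (6, 50, "NY", "male"),    # adult and NY resident
    ],
)

query = """
select (case when age > 17 and state = 'NY' then 'Both'
             when age > 17 then 'Age Only'
             else 'State Only'
        end) as grp,
       count(*)
from people
where gender = 'male' and (age > 17 or state = 'NY')
group by grp
order by grp
"""
groups = conn.execute(query).fetchall()
print(groups)
```

The WHERE clause is what makes the ELSE branch safe: every surviving row is an adult, an NY resident, or both.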

Compare the count of data in same column in same table and display the larger value

I want to count the occurrences of each value of the same field, for example:
user{user_id, gender}
Gender can be male or female.
I want to get the count of all males and all females, i.e.
COUNT(male) COUNT(female)
4 16
but I'm confused because both counts come from the same gender column.
Also, I want the result to display only the higher count, like:
COUNT(female)
16
Try the following, using a CASE expression:
select
sum(case when gender = 'male' then 1 else 0 end) as total_male,
sum(case when gender = 'female' then 1 else 0 end) as total_female
from user
If you are using MySQL, use the following:
select
sum(gender = 'male') as total_male,
sum(gender = 'female') as total_female
from user
If you are using PostgreSQL, use FILTER (user is a reserved word there, so the table name must be quoted):
select
count(1) filter (where gender = 'male') as total_male,
count(1) filter (where gender = 'female') as total_female
from "user"
You can get the final result with the following query:
select
case
when total_male < total_female then total_female
else total_male
end as total_count
from
(
select
count(1) filter (where gender = 'male') as total_male,
count(1) filter (where gender = 'female') as total_female
from "user"
) t
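As a quick check of the "larger of the two counts" query, here is a sketch using the question's example numbers (4 males, 16 females). It assumes SQLite via Python's sqlite3 module and a table named users, and it uses the portable SUM(CASE ...) form in place of FILTER.

```python
import sqlite3

# Hypothetical sample data matching the counts in the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table users (user_id integer primary key, gender text)")
conn.executemany(
    "insert into users (gender) values (?)",
    [("male",)] * 4 + [("female",)] * 16,
)

# Inner query: both counts in one row. Outer query: keep the larger one.
query = """
select case
         when total_male < total_female then total_female
         else total_male
       end as total_count
from (select sum(case when gender = 'male' then 1 else 0 end) as total_male,
             sum(case when gender = 'female' then 1 else 0 end) as total_female
      from users) t
"""
total_count = conn.execute(query).fetchone()[0]
print(total_count)
```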

Combining results before grouping in SQL

I want to work out the male/female split of my customers based on each person's title (Mr, Mrs, etc.).
To do this I need to combine the result for the Miss/Mrs/Ms into a 'female' field.
The query below gets the totals per title but I think I need a sub query to return the combined female figure.
Any help would be greatly appreciated.
Query:
SELECT c.Title, COUNT(*) as Count
FROM
Customers c
GROUP BY Title
ORDER By [Count] DESC
Result:
Mr 1903
Miss 864
Mrs 488
Ms 108
You could do it like this
SELECT
[Gender] = CASE [Title] WHEN 'Mr' THEN 'M' ELSE 'F' END,
COUNT(*) as Count
FROM
Customers c
GROUP BY
CASE [Title] WHEN 'Mr' THEN 'M' ELSE 'F' END
ORDER By
[Count] DESC
Demo at http://sqlfiddle.com/#!3/05c74/4
You can use CASE to project the new groups for the Titles:
SELECT SUM(CASE WHEN Title IN ('Mr') THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN Title IN ('Miss', 'Ms', 'Mrs') THEN 1 ELSE 0 END) AS Female
FROM
Customers c;
Try this:
SELECT (CASE WHEN c.Title = 'Mr' THEN 'Male'
WHEN c.Title IN ('Mrs', 'Miss', 'Ms') THEN 'Female'
ELSE 'NA'
END) AS title,
COUNT(1) AS PeopleCount
FROM Customers c
GROUP BY (CASE WHEN c.Title = 'Mr' THEN 'Male'
WHEN c.Title IN ('Mrs', 'Miss', 'Ms') THEN 'Female'
ELSE 'NA'
END)
ORDER By PeopleCount DESC;
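The grouped CASE approach can be exercised on a tiny table. This is a sketch under assumptions: SQLite via Python's sqlite3 module instead of SQL Server, invented customer rows, and a 'Dr' title added to show the 'NA' bucket.

```python
import sqlite3

# Hypothetical sample data: 3 x Mr, 4 female titles, 1 title that maps to 'NA'.
conn = sqlite3.connect(":memory:")
conn.execute("create table Customers (Title text)")
conn.executemany(
    "insert into Customers values (?)",
    [("Mr",), ("Mr",), ("Mr",), ("Miss",), ("Miss",), ("Mrs",), ("Ms",), ("Dr",)],
)

# Map each title to a gender bucket, then group by that derived value.
query = """
select (case when Title = 'Mr' then 'Male'
             when Title in ('Mrs', 'Miss', 'Ms') then 'Female'
             else 'NA'
        end) as gender,
       count(1) as PeopleCount
from Customers
group by (case when Title = 'Mr' then 'Male'
               when Title in ('Mrs', 'Miss', 'Ms') then 'Female'
               else 'NA'
          end)
order by PeopleCount desc
"""
split = conn.execute(query).fetchall()
print(split)
```

Unlike a plain "ELSE 'F'", the explicit 'NA' bucket keeps titles that do not imply a gender out of the female count.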

Sql Nested group by

The following query returns, for each first name, the number of people with that name where gender = Male.
select lookup_name.firstname,count(lookup_name.firstname)
from lookup_name
where gender='M'
group by firstname
Similarly, the query below returns, for each first name, the number of people with that name where gender = Female.
select lookup_name.firstname,count(lookup_name.firstname)
from lookup_name
where gender='F'
group by firstname
I need a query that, for each name, tells which gender (male or female) has the greater count, i.e. whether a given name in the database is more likely to be male or female.
SELECT firstname, Male, Female,
case when Male=Female then 'indeterminate'
when Male>Female then 'probably male'
else 'probably female' end MostProbablySex
FROM (
select firstname,
SUM(case when gender='M' then 1 else 0 end) Male,
SUM(case when gender='F' then 1 else 0 end) Female
from lookup_name
group by firstname
) X;
Or a single pass:
select firstname,
CASE SIGN(2.0 * SUM(case when gender='M' then 1 else 0 end) / COUNT(*) - 1)
WHEN -1 then 'probably female'
WHEN 0 then 'indeterminate'
WHEN 1 then 'probably male'
END
from lookup_name
group by firstname;
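The arithmetic behind the single-pass version: 2.0 * males / total - 1 maps the male share from the range [0, 1] onto [-1, 1], so its sign is negative below 50%, zero at exactly 50%, and positive above. A small sketch of that mapping follows; the function name classify is hypothetical, and it assumes total > 0 and that every row is either 'M' or 'F'.

```python
def classify(male_count, total_count):
    """Mirror CASE SIGN(2.0 * male / total - 1) from the single-pass query."""
    score = 2.0 * male_count / total_count - 1  # in [-1, 1]; 0 means an even split
    sign = (score > 0) - (score < 0)            # -1, 0 or 1, like SQL SIGN()
    labels = {-1: "probably female", 0: "indeterminate", 1: "probably male"}
    return labels[sign]


print(classify(3, 4))  # 75% male
print(classify(2, 4))  # even split
print(classify(1, 4))  # 25% male
```

If gender can hold other values or NULL, those rows still count toward the total, so the single-pass query and the Male/Female subquery version can disagree.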