How to select count of 0s, 1s, and both 0s and 1s in a Postgres table column?

Say there's a table that has columns named binary_value, name, and created_at along with the id column.
Here's the SQL Fiddle for this question: http://sqlfiddle.com/#!15/d15d1/36
What would be an efficient query to get a result like the following?
ones_count | zeros_count | total
3 | 1 | 4
So far, I've got:
with cte2(count_type, counted) as (
with cte as (
select binary_value,
sum(case when binary_value = 1 then 1 else 0 end) as ones_count,
sum(case when binary_value = 0 then 1 else 0 end) as zeros_count
from infos
where name = 'me'
and created_at >= '2020-03-10 21:13:01.319677'
and created_at <= '2020-03-10 21:13:01.619677'
group by binary_value
)
select 'ones_count', ones_count from cte where binary_value = 1
union
select 'zeros_count', zeros_count from cte where binary_value = 0
union
select 'total', sum(ones_count + zeros_count) as total from cte
)
select * from cte2;
Which gives it in column form:
count_type | counted
zeros_count | 1
total | 4
ones_count | 3
How can we get the result as a single row? Perhaps there's a different approach altogether than a common table expression? I'm starting to look at crosstab, which is Postgres-specific (it comes from the tablefunc extension), and so I'm wondering if all this is overkill.
Including DDL and data here, too:
create table infos (
id serial primary key,
name character varying not null,
binary_value integer not null,
created_at timestamp without time zone not null
);
insert into infos ("binary_value", "name", "created_at") values
(1, 'me', '2020-03-10 21:13:01.319677'),
(1, 'me', '2020-03-10 21:13:01.419677'),
(0, 'me', '2020-03-10 21:13:01.519677'),
(1, 'me', '2020-03-10 21:13:01.619677');

I think you just want conditional aggregation:
select count(*) filter (where binary_value = 0) as num_0s,
count(*) filter (where binary_value = 1) as num_1s,
count(*) as total
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677';
The date comparison looks rather, uh, specific. I assume that you really intend a range there.
Note: If you are really using Postgres 9.3, then you can't use the filter clause (alas). Instead:
select sum( (binary_value = 0)::int ) as num_0s,
sum( (binary_value = 1)::int ) as num_1s,
count(*) as total
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677';
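To check the single-row shape without a Postgres server, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (an assumption: the question targets Postgres). It loads the question's four sample rows and runs the portable SUM(CASE ...) form, since FILTER needs Postgres 9.4+ or SQLite 3.30+.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table infos (
        id integer primary key,
        name text not null,
        binary_value integer not null,
        created_at text not null
    )
""")
# Sample rows from the question; timestamps are ISO-8601 text, which compares
# correctly as strings because every value has the same fixed format.
conn.executemany(
    "insert into infos (binary_value, name, created_at) values (?, ?, ?)",
    [
        (1, "me", "2020-03-10 21:13:01.319677"),
        (1, "me", "2020-03-10 21:13:01.419677"),
        (0, "me", "2020-03-10 21:13:01.519677"),
        (1, "me", "2020-03-10 21:13:01.619677"),
    ],
)

# Conditional aggregation: one pass over the filtered rows, one output row.
row = conn.execute("""
    select sum(case when binary_value = 0 then 1 else 0 end) as num_0s,
           sum(case when binary_value = 1 then 1 else 0 end) as num_1s,
           count(*) as total
    from infos
    where name = 'me'
      and created_at >= '2020-03-10 21:13:01.319677'
      and created_at <= '2020-03-10 21:13:01.619677'
""").fetchone()
print(row)  # (1, 3, 4)
```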
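Also, if you wanted the results in three separate rows (one per value, plus a grand-total row whose binary_value is NULL), a simpler query is the following. Note that GROUPING SETS requires Postgres 9.5 or later: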
select binary_value, count(*)
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677'
group by grouping sets ( (binary_value), () );
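SQLite, like Postgres before 9.5, has no GROUPING SETS. The following sketch again uses Python's sqlite3 module as a stand-in and produces the same three rows with UNION ALL. One branch is grouped by binary_value and the other is an ungrouped grand total. The SQL itself should also run on older Postgres versions.

```python
import sqlite3

# In-memory SQLite stand-in; only the columns this query needs.
conn = sqlite3.connect(":memory:")
conn.execute("create table infos (name text not null, binary_value integer not null)")
conn.executemany(
    "insert into infos (name, binary_value) values (?, ?)",
    [("me", 1), ("me", 1), ("me", 0), ("me", 1)],
)

# Equivalent of GROUPING SETS ((binary_value), ()):
# the first branch gives one row per value, the second a grand total
# whose binary_value is NULL, matching what GROUPING SETS returns.
rows = conn.execute("""
    select binary_value, count(*) from infos
    where name = 'me'
    group by binary_value
    union all
    select null, count(*) from infos
    where name = 'me'
    order by 1
""").fetchall()
print(rows)  # [(None, 4), (0, 1), (1, 3)]  (SQLite sorts NULL first)
```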

Much simpler:
select
sum(case when binary_value = 1 then 1 else 0 end) as ones_count,
sum(case when binary_value = 0 then 1 else 0 end) as zeroes_count,
count(*) as total
from infos;

Related

SQL aggregate function inside an aggregate function

I know it's not possible to nest aggregate functions, but I want to achieve something like this and am quite confused about how to do it without compromising performance.
SELECT
date,
count(CASE WHEN SUM(active_time) > 5 THEN user_id END) AS total_active_users,
count(CASE WHEN SUM(active_time) > 5 AND is_admin = true THEN user_id END) AS total_active_admin_users
FROM
(
SELECT date, user_id, user_name, active_time, is_admin FROM users
)
GROUP BY date
I'd really appreciate it if someone could suggest a way to achieve this.
Perhaps you want something like this:
select date,
sum(case when sum_active_time > 5 then 1 else 0 end) as total_active_users,
sum(case when sum_active_time > 5 and is_admin then 1 else 0 end) as total_active_admin_users
from (select u.*, sum(active_time) over (partition by user_id) as sum_active_time
from users
) u
group by date;
However, I would expect user_id to be unique in a table called users. That makes me wonder why you need to do a count or sum at all. So, you might want:
select date,
sum(case when active_time > 5 then 1 else 0 end) as total_active_users,
sum(case when active_time > 5 and is_admin then 1 else 0 end) as total_active_admin_users
from users
group by date;
SELECT date,
COUNT(user_id) as total_active_users,
COUNT(CASE WHEN is_admin = true THEN user_id END) as total_active_admin_users
FROM (
SELECT date, is_admin, user_id
FROM users
GROUP BY date, is_admin, user_id
HAVING SUM(active_time) > 5
) t
GROUP BY date
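The last answer's pattern can be tried with a minimal sketch using Python's sqlite3 module. It aggregates per user in an inner query, filters with HAVING, and then aggregates again in the outer query. Some assumptions apply:

* The users rows are hypothetical.
* is_admin is stored as 0/1 because SQLite has no boolean type.
* Each user's is_admin value is the same on every row. Otherwise the user would appear once per is_admin value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical users table: one row per activity record, booleans as 0/1.
conn.execute(
    "create table users (date text, user_id integer, active_time integer, is_admin integer)"
)
conn.executemany(
    "insert into users values (?, ?, ?, ?)",
    [
        ("2020-01-01", 1, 3, 0),
        ("2020-01-01", 1, 4, 0),  # user 1 totals 7 on this day
        ("2020-01-01", 2, 6, 1),  # admin, totals 6
        ("2020-01-01", 3, 2, 0),  # totals 2, not active
        ("2020-01-02", 1, 2, 0),  # totals 2, not active
        ("2020-01-02", 2, 9, 1),  # admin, totals 9
    ],
)

# Inner query: the "inner aggregate" SUM(active_time) per user per day,
# filtered with HAVING. Outer query: the "outer aggregate" counts per day.
rows = conn.execute("""
    select date,
           count(*) as total_active_users,
           sum(case when is_admin = 1 then 1 else 0 end) as total_active_admin_users
    from (
        select date, user_id, is_admin
        from users
        group by date, user_id, is_admin
        having sum(active_time) > 5
    ) t
    group by date
    order by date
""").fetchall()
print(rows)  # [('2020-01-01', 2, 1), ('2020-01-02', 1, 1)]
```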

How can I group by and count in PostgreSQL to prevent empty cells in the result?

I have a table in a PostgreSQL database.
I need to calculate the SUM of counts for each event_type (for example, for 4 and 1).
When I use a query like this:
SELECT account_id, date,
CASE
WHEN event_type = 1 THEN SUM(count)
ELSE null
END AS shows,
CASE
WHEN event_type = 4 THEN SUM(count)
ELSE null
END AS clicks
FROM widgetstatdaily WHERE account_id = 272 AND event_type = 1 OR event_type = 4 GROUP BY account_id, date, event_type ORDER BY date
I receive this table, with <null> fields. That's because I have event_type in the SELECT list, so I need to GROUP BY it.
How can I write the query so the result is grouped by account_id and date, without NULLs in the cells? Like this (first row):
272 2018-03-28 00:00:00.000000 57 2
Maybe I can group it after receiving the result?
You need conditional aggregation and some other fixes. Try this:
SELECT account_id, date,
SUM(CASE WHEN event_type = 1 THEN count END) as shows,
SUM(CASE WHEN event_type = 4 THEN count END) as clicks
FROM widgetstatdaily
WHERE account_id = 272 AND
event_type IN (1, 4)
GROUP BY account_id, date
ORDER BY date;
Notes:
The CASE expression should be an argument to the SUM().
The ELSE NULL is redundant. The default without an ELSE is NULL.
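The logic in the WHERE clause is probably not what you intend. AND binds more tightly than OR, so account_id = 272 AND event_type = 1 OR event_type = 4 means (account_id = 272 AND event_type = 1) OR event_type = 4, which returns event_type 4 rows for every account. Using IN fixes that.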
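Here is a minimal sketch of both points, using Python's sqlite3 module with hypothetical rows. The rows are chosen so account 272 reproduces the 57 shows and 2 clicks from the question. The unparenthesized WHERE picks up another account's click row, while IN plus conditional aggregation yields one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; "count" is quoted because it is also a function name.
conn.execute(
    'create table widgetstatdaily (account_id integer, date text, event_type integer, "count" integer)'
)
conn.executemany(
    "insert into widgetstatdaily values (?, ?, ?, ?)",
    [
        (272, "2018-03-28", 1, 50),
        (272, "2018-03-28", 1, 7),
        (272, "2018-03-28", 4, 2),
        (999, "2018-03-28", 4, 5),  # another account's clicks
    ],
)

# Without parentheses, AND binds first, so account 999's click row slips in.
loose = conn.execute(
    "select count(*) from widgetstatdaily "
    "where account_id = 272 and event_type = 1 or event_type = 4"
).fetchone()[0]
strict = conn.execute(
    "select count(*) from widgetstatdaily "
    "where account_id = 272 and event_type in (1, 4)"
).fetchone()[0]

# Conditional aggregation: the CASE sits inside SUM, and event_type
# is no longer in GROUP BY, so each date collapses to one row.
rows = conn.execute("""
    select account_id, date,
           sum(case when event_type = 1 then "count" end) as shows,
           sum(case when event_type = 4 then "count" end) as clicks
    from widgetstatdaily
    where account_id = 272 and event_type in (1, 4)
    group by account_id, date
    order by date
""").fetchall()
print(loose, strict, rows)  # 4 3 [(272, '2018-03-28', 57, 2)]
```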
Try this. The ELSE 0 makes each SUM return 0 rather than NULL when a date has no rows of that event type:
SELECT account_id, date,
SUM(CASE WHEN event_type = 1 THEN count else 0 END) as shows,
SUM(CASE WHEN event_type = 4 THEN count else 0 END) as clicks
FROM widgetstatdaily
WHERE account_id = 272 AND
event_type IN (1, 4)
GROUP BY account_id, date
ORDER BY date;

counting events over flexible ranges

I am trying to count events (which are rows in the event_table) in the year before and the year after a particular target date for each person. For example, say I have person 100 with a target date of 10/01/2012. I would like to count events in 9/30/2011-9/30/2012 and in 10/02/2012-9/30/2013.
My query looks like:
select *
from (
select id, target_date
from subsample_table
) as i
left join (
select id, event_date, count(*) as N
, case when event_date between target_date-365 and target_date-1 then 0
when event_date between target_date+1 and target_date+365 then 1
else 2 end as after
from event_table
group by id, target_date, period
) as h
on i.id = h.id
and i.target_date = h.event_date
The output should look something like:
id target_date after N
100 10/01/2012 0 1000
100 10/01/2012 1 0
It's possible that some people do not have any events in the before or after periods (or both), and it would be nice to have zeros in that case. I don't care about the events outside the 730 days.
Any suggestions would be greatly appreciated.
I think the following may approach what you are trying to accomplish.
select i.id
, i.target_date
, count(h.id) as N -- all of this person's events, any date
, SUM(case when h.event_date between i.target_date-365 and i.target_date-1
then 1
else 0
end) AS Prior_
, SUM(case when h.event_date between i.target_date+1 and i.target_date+365
then 1
else 0
end) as After_
from subsample_table i
left join
event_table h
on i.id = h.id -- match on person only; the date windows are tested inside the SUMs
group by i.id, i.target_date
This is a generic answer. I don't know what date functions Teradata has, so I will use SQL Server syntax.
-- yourtable, yourdate and whatever are placeholders for your own table, event-date column and filter
select id, target_date
, sum(case when yourdate >= dateadd(year, -1, target_date)
and yourdate < target_date then 1 else 0 end) as before_count
, sum(case when yourdate <= dateadd(year, 1, target_date)
and yourdate > target_date then 1 else 0 end) as after_count
, sum(case when yourdate = target_date then 1 else 0 end) as righton
from yourtable
where whatever
group by id, target_date
This answer assumes that an id can have more than one target date.
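Here is a minimal sketch of the before/after counts with zeros, using Python's sqlite3 module as a stand-in for Teradata or SQL Server. SQLite's date(value, '-365 days') modifier replaces target_date-365 / dateadd here, and the table contents are hypothetical. The key points are joining on id only, and using LEFT JOIN plus ELSE 0 so that a person with no events still gets zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows; dates stored as ISO-8601 text.
conn.execute("create table subsample_table (id integer, target_date text)")
conn.execute("create table event_table (id integer, event_date text)")
conn.executemany(
    "insert into subsample_table values (?, ?)",
    [(100, "2012-10-01"), (200, "2012-06-15")],  # person 200 has no events
)
conn.executemany(
    "insert into event_table values (?, ?)",
    [
        (100, "2012-05-01"),  # before window
        (100, "2012-09-30"),  # before window
        (100, "2012-10-01"),  # the target date itself: in neither window
        (100, "2013-01-15"),  # after window
        (100, "2010-01-01"),  # outside both windows
    ],
)

# LEFT JOIN keeps people with no events; their single NULL-padded row
# falls into the ELSE 0 branches, so both counts come out as 0.
rows = conn.execute("""
    select i.id, i.target_date,
           sum(case when h.event_date between date(i.target_date, '-365 days')
                                          and date(i.target_date, '-1 days')
                    then 1 else 0 end) as before_n,
           sum(case when h.event_date between date(i.target_date, '+1 days')
                                          and date(i.target_date, '+365 days')
                    then 1 else 0 end) as after_n
    from subsample_table i
    left join event_table h on h.id = i.id
    group by i.id, i.target_date
    order by i.id
""").fetchall()
print(rows)  # [(100, '2012-10-01', 2, 1), (200, '2012-06-15', 0, 0)]
```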

Multiple queries on different tables

I have two tables, an invalid table and a valid table, with the same columns:
valid table: id, status, date
invalid table: id, status, date
I have to produce a report with this output:
date on-time late total valid invalid1 invalid2 total rate
--------- ------- ---- ----- ----- -------- -------- ----- ----
9/10/2011 4 10 14 3 3 3 6
date: the field both tables have in common, used to group by (each row counts the records on that day)
on-time: count of all the ids in the valid table
late: count of all the records (ids) in the invalid table
total: total of on-time and late
valid: count of ids in the valid table with the "valid" status
invalid1: count of ids in the invalid table with the "invalid1" status
invalid2: count of ids in the invalid table with the "invalid2" status
total: total of valid, invalid1 and invalid2
rate: average of totals
It's basically multiple queries against different tables. How can I achieve it?
Something like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid1', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
valid = 1
FROM
Valid
UNION ALL
SELECT
date,
status,
valid = 0
FROM
InValid ) AS data
GROUP BY
date) AS result
SELECT date, ontime, late, ontime + late AS total, valid, invalid1, invalid2, valid + invalid1 + invalid2 AS total
FROM
(SELECT date,
COUNT(*) AS late,
COUNT(CASE WHEN status = 'invalid1' THEN 1 END) AS invalid1,
COUNT(CASE WHEN status = 'invalid2' THEN 1 END) AS invalid2
FROM invalid
GROUP BY date
) i JOIN (
SELECT date,
COUNT(*) AS ontime,
COUNT(CASE WHEN status = 'valid' THEN 1 END) AS valid
FROM valid
GROUP BY date
) v USING (date)
First of all, it seems that you are holding exactly the same information in 2 tables. I would recommend merging those tables and adding a boolean column called valid to record whether each row is valid.
The query on your existent DB structure might look something like this:
SELECT unioned.date,
SUM(unioned.valid) AS valid,
SUM(unioned.invalid1) AS invalid1,
SUM(unioned.invalid2) AS invalid2
FROM (
SELECT v.date AS date, COUNT(v.id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v WHERE v.status = 'valid' GROUP BY v.date
UNION ALL
SELECT i.date, 0, COUNT(i.id), 0 FROM invalid i WHERE i.status = 'invalid1' GROUP BY i.date
UNION ALL
SELECT i.date, 0, 0, COUNT(i.id) FROM invalid i WHERE i.status = 'invalid2' GROUP BY i.date
) AS unioned GROUP BY unioned.date
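Here is a minimal sketch of the first answer's approach, run through Python's sqlite3 module with hypothetical rows. It stacks both tables with a source flag and then uses conditional aggregation. The flag is named is_valid to avoid clashing with the valid output column, and the ambiguous rate column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows for the two tables described in the question.
conn.execute("create table valid (id integer, status text, date text)")
conn.execute("create table invalid (id integer, status text, date text)")
conn.executemany(
    "insert into valid values (?, ?, ?)",
    [(1, "valid", "2011-09-10"), (2, "valid", "2011-09-10"), (3, "other", "2011-09-10")],
)
conn.executemany(
    "insert into invalid values (?, ?, ?)",
    [(4, "invalid1", "2011-09-10"), (5, "invalid2", "2011-09-10"), (6, "invalid2", "2011-09-10")],
)

# Stack both tables with a 1/0 source flag, then count per date with
# conditional aggregation; a date present in only one table still appears.
rows = conn.execute("""
    select date,
           sum(is_valid) as ontime,
           sum(1 - is_valid) as late,
           count(*) as total,
           sum(case when is_valid = 1 and status = 'valid' then 1 else 0 end) as valid,
           sum(case when is_valid = 0 and status = 'invalid1' then 1 else 0 end) as invalid1,
           sum(case when is_valid = 0 and status = 'invalid2' then 1 else 0 end) as invalid2
    from (
        select date, status, 1 as is_valid from valid
        union all
        select date, status, 0 as is_valid from invalid
    ) as data
    group by date
    order by date
""").fetchall()
print(rows)  # [('2011-09-10', 3, 3, 6, 2, 1, 2)]
```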

change rows to columns and count

How do I calculate counts based on rows?
Source table (each employee can take 2 days off):
Employee | First_Day_Off | Second_Day_Off
1 | 10/21/2009 | 12/6/2009
2 | 09/3/2009 | 12/6/2009
3 | 09/3/2009 | NULL
4
5
...
Now I need a table that shows each date and the number of people taking that day off:
Date | First_Day_Off | Second_Day_Off
10/21/2009 | 1 | 0
12/06/2009 | 1 | 1
09/3/2009 | 2 | 0
Any ideas?
Oracle 9i+, using Subquery Factoring (WITH):
WITH sample AS (
SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL)
SELECT s.day_off AS off_date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM sample s
GROUP BY s.day_off
Version without subquery factoring (an inline view instead of WITH):
SELECT s.day_off AS off_date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM (SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL) s
GROUP BY s.day_off
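Here is a minimal sketch of the same unpivot-then-aggregate query using Python's sqlite3 module. It uses the question's sample rows, with the dates rewritten as ISO-8601 text (an assumption, so they sort correctly). With the rows exactly as listed, 2009-12-06 comes out as 0 first-day and 2 second-day offs, because both employees list it as their second day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The question's sample rows, dates as ISO-8601 text; employee 4 has no days off.
conn.execute(
    "create table your_table (employee integer, first_day_off text, second_day_off text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        (1, "2009-10-21", "2009-12-06"),
        (2, "2009-09-03", "2009-12-06"),
        (3, "2009-09-03", None),
        (4, None, None),
    ],
)

# Same shape as the Oracle answer: unpivot the two columns with UNION ALL,
# tagging each row with which column it came from, then pivot back by date.
rows = conn.execute("""
    with sample as (
        select employee, first_day_off as day_off, 1 as day_number
        from your_table where first_day_off is not null
        union all
        select employee, second_day_off, 2
        from your_table where second_day_off is not null
    )
    select day_off,
           sum(case when day_number = 1 then 1 else 0 end) as first_day_off,
           sum(case when day_number = 2 then 1 else 0 end) as second_day_off
    from sample
    group by day_off
    order by day_off
""").fetchall()
print(rows)  # [('2009-09-03', 2, 0), ('2009-10-21', 1, 0), ('2009-12-06', 0, 2)]
```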
It is a bit awkward to handle these queries, since you have days off stored in different columns. A better layout would be something like:
EMPLOYEE_ID DAY_OFF
Then you would have multiple rows if an employee took multiple days off:
EMPLOYEE_ID DAY_OFF
1 10/21/2009
1 12/6/2009
2 09/3/2009
2 12/6/2009
3 09/3/2009
...
In that case, you could find out how many days off each person took by using the following query:
SELECT EMPLOYEE_ID, COUNT(*) AS NUM_DAYS_OFF FROM DAYS_OFF_TABLE GROUP BY EMPLOYEE_ID
And the number of people who took days off on each date like this:
SELECT DAY_OFF, COUNT(*) AS NUM_PEOPLE FROM DAYS_OFF_TABLE GROUP BY DAY_OFF
But I digress...
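If you do switch to the one-row-per-day-off layout, here is a minimal sketch using Python's sqlite3 module. It shows a hypothetical INSERT ... SELECT migration from the two-column table, after which both counts are plain GROUP BY queries. Table names follow the answers, and the dates are the question's sample rows in ISO-8601 form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table (employee integer, first_day_off text, second_day_off text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        (1, "2009-10-21", "2009-12-06"),
        (2, "2009-09-03", "2009-12-06"),
        (3, "2009-09-03", None),
    ],
)

# Hypothetical migration into the one-row-per-day-off layout.
conn.execute("create table days_off_table (employee_id integer, day_off text)")
conn.execute("""
    insert into days_off_table (employee_id, day_off)
    select employee, first_day_off from your_table where first_day_off is not null
    union all
    select employee, second_day_off from your_table where second_day_off is not null
""")

# Both questions become a plain GROUP BY on the normalized table.
per_employee = conn.execute(
    "select employee_id, count(*) from days_off_table group by employee_id order by employee_id"
).fetchall()
per_day = conn.execute(
    "select day_off, count(*) from days_off_table group by day_off order by day_off"
).fetchall()
print(per_employee)  # [(1, 2), (2, 2), (3, 1)]
print(per_day)       # [('2009-09-03', 2), ('2009-10-21', 1), ('2009-12-06', 2)]
```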
You can try using an SQL CASE expression to help with this:
SELECT Employee, CASE
WHEN First_Day_Off is NULL AND Second_Day_Off is NULL THEN 0
WHEN First_Day_Off is NOT NULL AND Second_Day_Off is NULL THEN 1
WHEN First_Day_Off is NULL AND Second_Day_Off is NOT NULL THEN 1
ELSE 2
END AS NUM_DAYS_OFF
FROM DAYS_OFF_TABLE
(Note that you may need to adjust the syntax slightly depending on your database.)
Getting each date and the number of people who took that day off might be more complicated.
I don't know if this would work, but you can try it:
SELECT
Date_Off,
SUM(Num_People) AS Num_People
FROM
(SELECT
First_Day_Off AS Date_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE First_Day_Off IS NOT NULL GROUP BY First_Day_Off
UNION ALL
SELECT Second_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE Second_Day_Off IS NOT NULL GROUP BY Second_Day_Off) t
GROUP BY
Date_Off