How can I group by and count in PostgreSQL to avoid empty cells in the result? - sql

I have a table in a PostgreSQL database.
I need to calculate the SUM of count for each event_type (for example, 4 and 1).
When I use a query like this:
SELECT account_id, date,
CASE
WHEN event_type = 1 THEN SUM(count)
ELSE null
END AS shows,
CASE
WHEN event_type = 4 THEN SUM(count)
ELSE null
END AS clicks
FROM widgetstatdaily
WHERE account_id = 272 AND event_type = 1 OR event_type = 4
GROUP BY account_id, date, event_type
ORDER BY date
I get a result with <null> fields. That's because event_type is in the SELECT list, so I have to GROUP BY it.
How can I write the query so that the result is grouped by account_id and date, with no nulls in the cells? For example (first row):
272 2018-03-28 00:00:00.000000 57 2
Maybe I can group it after receiving the result?

You need conditional aggregation and some other fixes. Try this:
SELECT account_id, date,
SUM(CASE WHEN event_type = 1 THEN count END) as shows,
SUM(CASE WHEN event_type = 4 THEN count END) as clicks
FROM widgetstatdaily
WHERE account_id = 272 AND
event_type IN (1, 4)
GROUP BY account_id, date
ORDER BY date;
Notes:
The CASE expression should be an argument to the SUM().
The ELSE NULL is redundant. The default without an ELSE is NULL.
The logic in the WHERE clause is probably not what you intend. That is fixed using IN.
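To see why the WHERE clause matters, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are made up (only the totals 57 and 2 come from the question), and account 999 is a hypothetical second account. Because AND binds tighter than OR, the original condition also lets in event_type = 4 rows from other accounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgetstatdaily (account_id INT, date TEXT, event_type INT, count INT)"
)
conn.executemany(
    "INSERT INTO widgetstatdaily VALUES (?, ?, ?, ?)",
    [
        (272, "2018-03-28", 1, 50),
        (272, "2018-03-28", 1, 7),
        (272, "2018-03-28", 4, 2),
        (999, "2018-03-28", 4, 5),  # hypothetical other account
    ],
)

# Original filter, parsed as (account_id = 272 AND event_type = 1) OR event_type = 4
loose = conn.execute(
    "SELECT DISTINCT account_id FROM widgetstatdaily "
    "WHERE account_id = 272 AND event_type = 1 OR event_type = 4 "
    "ORDER BY account_id"
).fetchall()

# Conditional aggregation with the IN filter from the answer
fixed = conn.execute(
    """
    SELECT account_id, date,
           SUM(CASE WHEN event_type = 1 THEN count END) AS shows,
           SUM(CASE WHEN event_type = 4 THEN count END) AS clicks
    FROM widgetstatdaily
    WHERE account_id = 272 AND event_type IN (1, 4)
    GROUP BY account_id, date
    ORDER BY date
    """
).fetchall()

print(loose)  # [(272,), (999,)]  account 999 leaks in
print(fixed)  # [(272, '2018-03-28', 57, 2)]
```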

Try this:
SELECT account_id, date,
SUM(CASE WHEN event_type = 1 THEN count else 0 END) as shows,
SUM(CASE WHEN event_type = 4 THEN count else 0 END) as clicks
FROM widgetstatdaily
WHERE account_id = 272 AND
event_type IN (1, 4)
GROUP BY account_id, date
ORDER BY date;
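The only difference between the two answers is ELSE 0, which matters when a day has no rows of one event type. A rough sketch of that difference, again using sqlite3 as a stand-in for PostgreSQL with a made-up day that has shows but no clicks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgetstatdaily (account_id INT, date TEXT, event_type INT, count INT)"
)
# Hypothetical day with shows (event_type = 1) but no clicks (event_type = 4)
conn.execute("INSERT INTO widgetstatdaily VALUES (272, '2018-03-29', 1, 10)")

rows = conn.execute(
    """
    SELECT date,
           SUM(CASE WHEN event_type = 4 THEN count END)        AS clicks_null,
           SUM(CASE WHEN event_type = 4 THEN count ELSE 0 END) AS clicks_zero
    FROM widgetstatdaily
    WHERE account_id = 272 AND event_type IN (1, 4)
    GROUP BY date
    ORDER BY date
    """
).fetchall()

print(rows)  # [('2018-03-29', None, 0)]
```

Without ELSE the sum over no matching rows is NULL; with ELSE 0 it is 0. Which one you want depends on how the report should show "no events".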

Related

Combining multiple rows of data to one row per id

I have raw data with multiple dates per category, and I use code like case when category = 'referral' then min(date) end as date_referral to get the earliest date of each category per id.
However, this does not return the data in a single row; it creates one row per category, like this:
id date_entered date_referral date_reply date_final
-------------------------------------------------------------------------
1 2020-12-20 null null null
1 2020-12-20 2020-12-21 null null
1 2020-12-20 null 2020-12-21 null
1 2020-12-20 null null 2020-12-24
I tried enforcing single rows by using distinct or group by (separately and together):
select distinct id
, date_entered
, case when category = 'referral' then min(date) end as date_referral
, case when category = 'reply' then min(date) end as date_reply
, case when category = 'final' then min(date) end as date_final
from data
group by id
, date_entered
, category
but it keeps returning multiple rows, each holding the earliest date for one category. I also tried wrapping this query in a CTE and selecting distinct id, date_entered, date_referral, date_reply, date_final from it, but that still returns multiple rows.
How can I combine these rows so that a single row is returned?
You should not group by category.
Use conditional aggregation like this:
select id, date_entered,
min(case when category = 'referral' then date end) as date_referral,
min(case when category = 'reply' then date end) as date_reply,
min(case when category = 'final' then date end) as date_final
from data
group by id, date_entered
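A quick way to check this is a sketch with Python's sqlite3 module (a stand-in for the real database). The dates mirror the output in the question, plus one made-up later referral date to show that min() picks the earliest:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INT, date_entered TEXT, category TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "2020-12-20", "referral", "2020-12-21"),
        (1, "2020-12-20", "referral", "2020-12-23"),  # made-up later referral
        (1, "2020-12-20", "reply", "2020-12-21"),
        (1, "2020-12-20", "final", "2020-12-24"),
    ],
)

# The CASE sits inside min(), and category is not in GROUP BY,
# so each id/date_entered pair collapses to one row.
rows = conn.execute(
    """
    SELECT id, date_entered,
           MIN(CASE WHEN category = 'referral' THEN date END) AS date_referral,
           MIN(CASE WHEN category = 'reply'    THEN date END) AS date_reply,
           MIN(CASE WHEN category = 'final'    THEN date END) AS date_final
    FROM data
    GROUP BY id, date_entered
    """
).fetchall()

print(rows)  # [(1, '2020-12-20', '2020-12-21', '2020-12-21', '2020-12-24')]
```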

How to select count of 0s, 1s, and both 0s and 1s in a postgres table column?

Say there's a table that has columns named binary_value, name, and created_at along with the id column.
Here's the SQL Fiddle for this question: http://sqlfiddle.com/#!15/d15d1/36
What would be an efficient query to get a result like the following?
ones_count | zeros_count | total
3 | 1 | 4
So far, I've got:
with cte2(count_type, counted) as (
with cte as (
select binary_value,
sum(case when binary_value = 1 then 1 else 0 end) as ones_count,
sum(case when binary_value = 0 then 1 else 0 end) as zeros_count
from infos
where name = 'me'
and created_at >= '2020-03-10 21:13:01.319677'
and created_at <= '2020-03-10 21:13:01.619677'
group by binary_value
)
select 'ones_count', ones_count from cte where binary_value = 1
union
select 'ones_count', zeros_count from cte where binary_value = 0
union
select 'total', sum(ones_count + zeros_count) as total from cte
)
select * from cte2;
Which gives it in column form:
count_type | counted
ones_count | 1
total | 4
ones_count | 3
How can we get the result in a row? Perhaps there's a different approach altogether than Common Table Expression? I'm starting to look at crosstab, which is postgres-specific, and so wondering if all this is overkill.
Including DDL and data here, too:
create table infos (
id serial primary key,
name character varying not null,
binary_value integer not null,
created_at timestamp without time zone not null
);
insert into infos ("binary_value", "name", "created_at") values
(1, 'me', '2020-03-10 21:13:01.319677'),
(1, 'me', '2020-03-10 21:13:01.419677'),
(0, 'me', '2020-03-10 21:13:01.519677'),
(1, 'me', '2020-03-10 21:13:01.619677');
I think you just want conditional aggregation:
select count(*) filter (where binary_value = 0) as num_0s,
count(*) filter (where binary_value = 1) as num_1s,
count(*)
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677';
The date comparison looks rather, uh, specific. I assume that you really intend a range there.
Here is a SQL Fiddle.
Note: If you are really using Postgres 9.3, then you can't use the filter clause (alas). Instead:
select sum( (binary_value = 0)::int ) as num_0s,
sum( (binary_value = 1)::int ) as num_1s,
count(*)
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677';
Also, if you wanted the results in three separate rows, a simpler query is:
select binary_value, count(*)
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677'
group by grouping sets ( (binary_value), () );
Much simpler:
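select
sum(case when binary_value = 1 then 1 else 0 end) as ones_count,
sum(case when binary_value = 0 then 1 else 0 end) as zeroes_count,
count(*) as total
from infos
where name = 'me' and
created_at >= '2020-03-10 21:13:01.319677' and
created_at <= '2020-03-10 21:13:01.619677';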
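As a sanity check, here is a sketch that loads the sample rows from the question into an in-memory SQLite database (standing in for Postgres, with timestamps stored as text) and runs the CASE-based form with the question's filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE infos ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "binary_value INTEGER NOT NULL, created_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO infos (binary_value, name, created_at) VALUES (?, ?, ?)",
    [
        (1, "me", "2020-03-10 21:13:01.319677"),
        (1, "me", "2020-03-10 21:13:01.419677"),
        (0, "me", "2020-03-10 21:13:01.519677"),
        (1, "me", "2020-03-10 21:13:01.619677"),
    ],
)

# One pass over the table, one output row with three counts
result = conn.execute(
    """
    SELECT SUM(CASE WHEN binary_value = 1 THEN 1 ELSE 0 END) AS ones_count,
           SUM(CASE WHEN binary_value = 0 THEN 1 ELSE 0 END) AS zeros_count,
           COUNT(*) AS total
    FROM infos
    WHERE name = 'me'
      AND created_at >= '2020-03-10 21:13:01.319677'
      AND created_at <= '2020-03-10 21:13:01.619677'
    """
).fetchone()

print(result)  # (3, 1, 4)
```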

SQL aggregate function inside an aggregate function

I know it's not possible to nest aggregate functions. But I want to achieve something like this, and I'm quite confused about how to do it without compromising performance.
SELECT
date,
count(CASE WHEN SUM(active_time) > 5 THEN user_id END) AS total_active_users,
count(CASE WHEN SUM(active_time) > 5 AND is_admin = true THEN user_id END) AS total_active_admin_users
FROM
(
SELECT date, user_id, user_name, active_time, is_admin FROM users
)
GROUP BY date
I'd really appreciate it if someone could suggest a way to achieve this.
Perhaps you want something like this:
select date,
sum(case when sum_active_time > 5 then 1 else 0 end) as total_active_users,
sum(case when sum_active_time > 5 and is_admin then 1 else 0 end) as total_active_admin_users
from (select u.*, sum(active_time) over (partition by user_id) as sum_active_time
from users
) u
group by date;
However, I would expect user_id to be unique in a table called users. That makes me wonder why you need to do a count or sum at all. So, you might want:
select date,
sum(case when active_time > 5 then 1 else 0 end) as total_active_users,
sum(case when active_time > 5 and is_admin then 1 else 0 end) as total_active_admin_users
from users
group by date;
Alternatively, aggregate per user first, filter with HAVING, and then count:
SELECT date,
COUNT(user_id) as total_active_users,
COUNT(CASE WHEN is_admin THEN user_id END) as total_active_admin_users
FROM (
SELECT date, is_admin, user_id
FROM users
GROUP BY date, is_admin, user_id
HAVING SUM(active_time) > 5
) t
GROUP BY date;
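The two-level approach (sum per user in a subquery, then count in the outer query) can be tried out with a sketch like the following. It uses sqlite3 in place of PostgreSQL, stores is_admin as 0/1, and the rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (date TEXT, user_id INT, active_time INT, is_admin INT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", 1, 3, 1),
        ("2024-01-01", 1, 4, 1),  # user 1 totals 7 on this day (admin)
        ("2024-01-01", 2, 6, 0),
        ("2024-01-01", 3, 2, 0),  # below the threshold
        ("2024-01-02", 1, 1, 1),  # below the threshold
        ("2024-01-02", 2, 9, 0),
    ],
)

rows = conn.execute(
    """
    SELECT date,
           COUNT(user_id) AS total_active_users,
           COUNT(CASE WHEN is_admin THEN user_id END) AS total_active_admin_users
    FROM (
        -- inner level: one row per user per day, kept only if the sum exceeds 5
        SELECT date, is_admin, user_id
        FROM users
        GROUP BY date, is_admin, user_id
        HAVING SUM(active_time) > 5
    ) t
    GROUP BY date
    ORDER BY date
    """
).fetchall()

print(rows)  # [('2024-01-01', 2, 1), ('2024-01-02', 1, 0)]
```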

Funnel query with Amazon Redshift / PostgreSQL

I'm trying to analyze a funnel using event data in Redshift and have difficulties finding an efficient query to extract that data.
For example, in Redshift I have:
timestamp action user id
--------- ------ -------
2015-05-05 12:00 homepage 1
2015-05-05 12:01 product page 1
2015-05-05 12:02 homepage 2
2015-05-05 12:03 checkout 1
I would like to extract the funnel statistics. For example:
homepage_count product_page_count checkout_count
-------------- ------------------ --------------
100 50 25
Where homepage_count represents the number of distinct users who visited the homepage, product_page_count represents the number of distinct users who visited the product page after visiting the homepage, and checkout_count represents the number of users who checked out after visiting the homepage and the product page.
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
I think the best method might be to add flags to the data for the first visit of each type for each user and then use these for aggregation logic:
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select userid,
min(case when action = 'homepage' then timestamp end) as ts_homepage,
min(case when action = 'product page' then timestamp end) as ts_productpage,
min(case when action = 'checkout' then timestamp end) as ts_checkout
from table t
group by userid
) t
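Here is a minimal sketch of that idea run against the four sample rows from the question. It uses Python's sqlite3 module rather than Redshift, and the table name events is an assumption (the answer uses a placeholder table name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (timestamp TEXT, action TEXT, userid INT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2015-05-05 12:00", "homepage", 1),
        ("2015-05-05 12:01", "product page", 1),
        ("2015-05-05 12:02", "homepage", 2),
        ("2015-05-05 12:03", "checkout", 1),
    ],
)

funnel = conn.execute(
    """
    SELECT SUM(CASE WHEN ts_homepage IS NOT NULL THEN 1 ELSE 0 END) AS homepage_count,
           SUM(CASE WHEN ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS productpage_count,
           SUM(CASE WHEN ts_checkout > ts_productpage
                     AND ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS checkout_count
    FROM (
        -- one row per user holding the first timestamp of each step
        SELECT userid,
               MIN(CASE WHEN action = 'homepage'     THEN timestamp END) AS ts_homepage,
               MIN(CASE WHEN action = 'product page' THEN timestamp END) AS ts_productpage,
               MIN(CASE WHEN action = 'checkout'     THEN timestamp END) AS ts_checkout
        FROM events
        GROUP BY userid
    ) t
    """
).fetchone()

print(funnel)  # (2, 1, 1)
```

User 2 only reached the homepage, so the comparisons against its NULL timestamps fall through to ELSE 0.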
The above answer is correct. I have adapted it for people using AWS Mobile Analytics with Redshift.
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select client_id,
min(case when event_type = 'App Launch' then event_timestamp end) as ts_homepage,
min(case when event_type = 'SignUp Success' then event_timestamp end) as ts_productpage,
min(case when event_type = 'Start Quiz' then event_timestamp end) as ts_checkout
from awsma.v_event
group by client_id
) ts;
A more precise model may be needed when the product page can be opened twice: the first time before the home page and the second time after it. This case should usually be counted as a conversion as well.
Redshift SQL query:
SELECT
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL
THEN user_id END
) Step1,
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL
THEN user_id END
) Step2,
COUNT(
DISTINCT CASE WHEN
cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL AND cur_checkout_time IS NOT NULL
THEN user_id END
) Step3
FROM (
SELECT
user_id,
timestamp,
COALESCE(homepage_time,
LAG(homepage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_homepage_time,
COALESCE(productpage_time,
LAG(productpage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_productpage_time,
COALESCE(checkout_time,
LAG(checkout_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_checkout_time
FROM
(
SELECT
timestamp,
user_id,
(CASE WHEN event = 'homepage'
THEN timestamp END) homepage_time,
(CASE WHEN event = 'product page'
THEN timestamp END) productpage_time,
(CASE WHEN event = 'checkout'
THEN timestamp END) checkout_time
FROM events
WHERE timestamp > '2016-05-01' AND timestamp < '2017-01-01'
ORDER BY user_id, timestamp
) event_times
ORDER BY user_id, timestamp
) event_windows
This query fills each row's cur_homepage_time, cur_productpage_time and cur_checkout_time with the most recent timestamp at which that event occurred. So if an event has occurred at or before a given time (that is, a given row), the corresponding column is not NULL.
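The carry-forward logic can be hard to picture from the SQL alone, so here is a rough plain-Python emulation of it. This is not the Redshift query itself: the function name funnel_steps and the tuple layout are made up for illustration, and like the query it only checks that the earlier steps have been seen by some row, not the order in which they occurred.

```python
from collections import defaultdict


def funnel_steps(events, steps):
    """Count users per funnel step.

    events: iterable of (timestamp, action, user_id) tuples.
    steps:  ordered list of action names, e.g. homepage first.
    """
    by_user = defaultdict(list)
    for ts, action, user_id in events:
        by_user[user_id].append((ts, action))

    counts = [0] * len(steps)
    for rows in by_user.values():
        last_seen = {}  # action -> most recent timestamp so far (the "cur_*" columns)
        reached = 0
        for ts, action in sorted(rows):
            if action in steps:
                last_seen[action] = ts
            # number of leading steps whose carried-forward value is filled on this row
            filled = 0
            for step in steps:
                if step not in last_seen:
                    break
                filled += 1
            reached = max(reached, filled)
        for i in range(reached):
            counts[i] += 1
    return counts


sample = [
    ("2015-05-05 12:00", "homepage", 1),
    ("2015-05-05 12:01", "product page", 1),
    ("2015-05-05 12:02", "homepage", 2),
    ("2015-05-05 12:03", "checkout", 1),
]
counts = funnel_steps(sample, ["homepage", "product page", "checkout"])
print(counts)  # [2, 1, 1]
```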
More info here.

Multiple Queries in different table

(Also posted here.)
So I have two tables, one is invalid table and the other is valid table.
valid table:
id
status
date
invalid table:
id
status
date
I have to produce a report with this output:
date on-time late total valid invalid1 invalid2 total rate
--------- ------- ---- ----- ----- -------- -------- ----- ----
9/10/2011 4 10 14 3 3 3 6
date: the field common to both tables; the report is grouped by it, one row per day
on-time: count of all the id on the valid table
late: count of all the records(id) on the invalid table
total: total of on-time and late
valid: count of id on the valid table with the "valid" status
invalid1: count of id on the invalid table with "invalid1" status
invalid2: count of id on the invalid table with "invalid2" status
total: total of valid, invalid1, invalid2
rate: average of totals
It's basically multiple queries with different table. How can I achieve it?
Something like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid1', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
1 AS valid
FROM
Valid
UNION ALL
SELECT
date,
status,
0 AS valid
FROM
InValid ) AS data
GROUP BY
date) AS result
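The core of this approach is to stack both tables with UNION ALL, tag each row with the table it came from, and then aggregate conditionally. A small sketch with sqlite3 and invented rows (the flag column is named is_valid here purely for readability, and the 'pending' and 'other' statuses are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid (id INT, status TEXT, date TEXT)")
conn.execute("CREATE TABLE invalid (id INT, status TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO valid VALUES (?, ?, ?)",
    [(1, "valid", "2011-09-10"), (2, "valid", "2011-09-10"), (3, "pending", "2011-09-10")],
)
conn.executemany(
    "INSERT INTO invalid VALUES (?, ?, ?)",
    [
        (4, "invalid1", "2011-09-10"),
        (5, "invalid2", "2011-09-10"),
        (6, "invalid2", "2011-09-10"),
        (7, "other", "2011-09-10"),
    ],
)

report = conn.execute(
    """
    SELECT date,
           SUM(CASE WHEN is_valid = 1 THEN 1 ELSE 0 END) AS ontime,
           SUM(CASE WHEN is_valid = 0 THEN 1 ELSE 0 END) AS late,
           COUNT(*) AS total,
           SUM(CASE WHEN is_valid = 1 AND status = 'valid'    THEN 1 ELSE 0 END) AS valid,
           SUM(CASE WHEN is_valid = 0 AND status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
           SUM(CASE WHEN is_valid = 0 AND status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2
    FROM (
        SELECT date, status, 1 AS is_valid FROM valid
        UNION ALL
        SELECT date, status, 0 AS is_valid FROM invalid
    ) AS data
    GROUP BY date
    """
).fetchall()

print(report)  # [('2011-09-10', 3, 4, 7, 2, 1, 2)]
```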
Or aggregate each table separately and join the results on date:
SELECT date, ontime, late, ontime+late total, valid, invalid1, invalid2, valid+invalid1+invalid2 total
FROM
(SELECT date,
COUNT(*) late,
COUNT(IIF(status = 'invalid1', 1, NULL)) invalid1,
COUNT(IIF(status = 'invalid2', 1, NULL)) invalid2
FROM invalid
GROUP BY date
) i JOIN (
SELECT date,
COUNT(*) ontime,
COUNT(IIF(status = 'valid', 1, NULL)) valid
FROM valid
GROUP BY date
) v USING (date)
First of all, it seems that you are holding exactly the same information in two tables. I would recommend merging those tables and adding a boolean column called valid to record whether each row is valid.
A query on your existing DB structure might look something like this:
SELECT unioned.date,
SUM(unioned.valid) AS valid,
SUM(unioned.invalid1) AS invalid1,
SUM(unioned.invalid2) AS invalid2
FROM (
( SELECT v.date AS date, COUNT(v.id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v WHERE v.status = 'valid' GROUP BY v.date)
UNION ALL
( SELECT i1.date AS date, 0 AS valid, COUNT(i1.id) AS invalid1, 0 AS invalid2 FROM invalid i1 WHERE i1.status = 'invalid1' GROUP BY i1.date)
UNION ALL
( SELECT i2.date AS date, 0 AS valid, 0 AS invalid1, COUNT(i2.id) AS invalid2 FROM invalid i2 WHERE i2.status = 'invalid2' GROUP BY i2.date)
) AS unioned GROUP BY unioned.date