This seems like a simple query, but I'm struggling with it.
Here's a sampling of my data.
user_id dated
463 2016-01-01
463 2016-01-02
1456 2016-01-01
1456 2016-01-02
1398 2015-12-01
1398 2015-12-02
I want to get the number of unique users in two different time periods. I'd like to combine the output of the following two queries into a single row with two columns.
-- 60
SELECT COUNT(DISTINCT(tld.user_id)) count_active_users_60
FROM table tld
WHERE tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days')
-- 30
SELECT COUNT(DISTINCT(tld.user_id)) count_active_users_30
FROM table tld
WHERE tld.dated >= NOW() - INTERVAL '30 days'
I'd like an output that looks like this:
count_active_users_60 count_active_users_30
1 2
I've been messing with various CASE statements, and sub-selects, but the distinct clause is throwing me off.
SELECT COUNT(DISTINCT(rar.user_id))
FROM
(
SELECT user_id,
COUNT(CASE WHEN tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days') THEN 1 ELSE NULL END) AS count_active_users_60,
COUNT(CASE WHEN tld.dated >= NOW() - INTERVAL '30 days' THEN 1 ELSE NULL END) AS count_active_users_30
FROM testing_login_duration tld
GROUP BY user_id
) rar;
Use conditional aggregation:
SELECT COUNT(DISTINCT CASE WHEN tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days')
THEN tld.user_id
END) count_active_users_60,
COUNT(DISTINCT CASE WHEN tld.dated >= NOW() - INTERVAL '30 days'
THEN tld.user_id
END) count_active_users_30
FROM table tld
WHERE tld.dated >= NOW() - INTERVAL '60 days';
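To check the idea against the sample data from the question, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The table name `logins`, the function name `active_user_counts`, and the fixed reference date are assumptions made for illustration; the date boundaries are computed in Python because SQLite has no INTERVAL syntax.

```python
import sqlite3
from datetime import date, timedelta

# Sample rows copied from the question: (user_id, dated)
SAMPLE_ROWS = [
    (463, "2016-01-01"), (463, "2016-01-02"),
    (1456, "2016-01-01"), (1456, "2016-01-02"),
    (1398, "2015-12-01"), (1398, "2015-12-02"),
]

def active_user_counts(rows, today):
    """Return (count_active_users_60, count_active_users_30) in one query."""
    bounds = {
        "d60": (today - timedelta(days=60)).isoformat(),
        "d30": (today - timedelta(days=30)).isoformat(),
    }
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; ISO date strings compare correctly as text.
    conn.execute("CREATE TABLE logins (user_id INTEGER, dated TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT COUNT(DISTINCT CASE WHEN dated BETWEEN :d60 AND :d30
                                   THEN user_id END) AS count_active_users_60,
               COUNT(DISTINCT CASE WHEN dated >= :d30
                                   THEN user_id END) AS count_active_users_30
        FROM logins
        WHERE dated >= :d60
        """,
        bounds,
    ).fetchone()
    conn.close()
    return result

print(active_user_counts(SAMPLE_ROWS, date(2016, 1, 15)))
```

With a reference date of 2016-01-15 this returns (1, 2), matching the desired output. Note that BETWEEN is inclusive at both ends, so a row dated exactly 30 days back is counted in both columns.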
Hey pros,
I'm far from having good knowledge of SQL, and would like to ask you for some hints.
Currently we aggregate our data with Python, and I would like to switch this to SQL (PostgreSQL server) where possible.
My goal is to have one statement that generates an average for two separate columns for specific time intervals (1 hour, 1 day, 1 week, overall); all events in each period should also be counted.
I can create four single statements, one per interval, but I struggle to combine these four selects into one result set.
select
count(id) as hour_count,
camera_name,
round(avg("pconf")) as hour_p_conf,
round(avg("dconf")) as hour_d_conf
from camera_events where timestamp between NOW() - interval '1 HOUR' and NOW() group by camera_name;
select
count(id) as day_count,
camera_name,
round(avg("pconf")) as day_p_conf,
round(avg("dconf")) as day_d_conf
from camera_events where timestamp between NOW() - interval '1 DAY' and NOW() group by camera_name;
select
count(id) as week_count,
camera_name,
round(avg("pconf")) as week_p_conf,
round(avg("dconf")) as week_d_conf
from camera_events where timestamp between NOW() - interval '1 WEEK' and NOW() group by camera_name;
select
count(id) as overall_count,
camera_name,
round(avg("pconf")) as overall_p_conf,
round(avg("dconf")) as overall_d_conf
from camera_events group by camera_name;
If possible, the result should look like the data in the image.
Some hints would be great, thank you.
Consider conditional aggregation by moving the WHERE logic into CASE expressions in the SELECT list. Alternatively, in PostgreSQL, use FILTER clauses:
select
camera_name,
count(id) filter (where timestamp between NOW() - interval '1 HOUR' and NOW()) as hour_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 HOUR' and NOW())) as hour_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 HOUR' and NOW())) as hour_d_conf,
count(id) filter (where timestamp between NOW() - interval '1 DAY' and NOW()) as day_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 DAY' and NOW())) as day_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 DAY' and NOW())) as day_d_conf,
count(id) filter (where timestamp between NOW() - interval '1 WEEK' and NOW()) as week_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 WEEK' and NOW())) as week_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 WEEK' and NOW())) as week_d_conf,
count(id) as overall_count,
round(avg("pconf")) as overall_p_conf,
round(avg("dconf")) as overall_d_conf
from camera_events
group by camera_name;
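The CASE variant mentioned at the top of this answer is not spelled out, so here is a rough sketch of it. It is written in Python with the standard-library sqlite3 module standing in for PostgreSQL, so it can be run without a server. The sample events, the column name `ts`, the function name `camera_stats`, and the fixed "now" are all made up for illustration; only the hour and day windows and `pconf` are shown (`dconf` and the week window would follow the same pattern), and the upper bound `and NOW()` is dropped on the assumption that no events lie in the future.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Made-up events: (id, camera_name, ts, pconf, dconf)
SAMPLE_EVENTS = [
    (1, "cam1", "2024-01-08 11:30:00", 80, 60),
    (2, "cam1", "2024-01-08 02:00:00", 60, 40),
    (3, "cam1", "2024-01-01 00:00:00", 40, 20),
    (4, "cam2", "2024-01-07 13:00:00", 90, 70),
]

def camera_stats(events, now):
    """One row per camera with hour, day and overall aggregates."""
    params = {
        "hour_ago": (now - timedelta(hours=1)).strftime(FMT),
        "day_ago": (now - timedelta(days=1)).strftime(FMT),
    }
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE camera_events "
        "(id INTEGER, camera_name TEXT, ts TEXT, pconf REAL, dconf REAL)"
    )
    conn.executemany("INSERT INTO camera_events VALUES (?, ?, ?, ?, ?)", events)
    # A CASE without ELSE yields NULL, and COUNT/AVG skip NULLs,
    # so each aggregate only sees the rows inside its own window.
    rows = conn.execute(
        """
        SELECT camera_name,
               COUNT(CASE WHEN ts >= :hour_ago THEN id END)           AS hour_count,
               ROUND(AVG(CASE WHEN ts >= :hour_ago THEN pconf END))   AS hour_p_conf,
               COUNT(CASE WHEN ts >= :day_ago THEN id END)            AS day_count,
               ROUND(AVG(CASE WHEN ts >= :day_ago THEN pconf END))    AS day_p_conf,
               COUNT(id)                                              AS overall_count,
               ROUND(AVG(pconf))                                      AS overall_p_conf
        FROM camera_events
        GROUP BY camera_name
        ORDER BY camera_name
        """,
        params,
    ).fetchall()
    conn.close()
    return rows

for row in camera_stats(SAMPLE_EVENTS, datetime(2024, 1, 8, 12, 0, 0)):
    print(row)
```

A camera with no events in a window still gets a row: its count is 0 and its average is NULL (None in Python), which is also how the FILTER version behaves.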
The simplest way is to join them. For example:
select
coalesce(h.camera_name, d.camera_name, w.camera_name) as camera_name,
h.hour_count, h.hour_p_conf, h.hour_d_conf,
d.day_count, d.day_p_conf, d.day_d_conf,
w.week_count, w.week_p_conf, w.week_d_conf
from (
-- hourly query here
) h
full join (
-- daily query here
) d on d.camera_name = h.camera_name
full join (
-- weekly query here
) w on w.camera_name = coalesce(h.camera_name, d.camera_name)
I'm working with PostgreSQL and bookshelf and trying to run a simple SQL query in order to get multiple counts in a single query.
This query looks like:
SELECT SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months,
SUM(CASE WHEN date_last_check > (now() - interval '7 DAY') THEN 1 ELSE 0 END) as since_one_week,
SUM(CASE WHEN date_last_check > (now() - interval '1 DAY') THEN 1 ELSE 0 END) as since_one_days
FROM myTable;
It seems impossible to use a CASE expression inside a sum() function in bookshelf. I tried:
return myTable.query(function(qb:any){
qb.sum("(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months")
})
And this returns the following query:
select sum("(SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END)") as "since_two_months" from "myTable"
This does not work because of the quotes after the sum(").
Does anyone know how to make this work without using a raw query?
I found a poor solution, which is to use knex raw inside the bookshelf query:
return myTable.query(function(qb:any){
qb.select(bookshelf.knex.raw("SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_one_month"));
})
Rather use modern syntax for conditional aggregates: the aggregate FILTER clause:
SELECT count(*) FILTER (WHERE date_last_check > now() - interval '1 month') AS since_two_months -- one_month?
, count(*) FILTER (WHERE date_last_check > now() - interval '7 days') AS since_one_week
, count(*) FILTER (WHERE date_last_check > now() - interval '1 day') AS since_one_day
FROM mytable;
See:
Aggregate columns with additional (distinct) filters
I have created the following query, which returns 3 values for 1 day ('20170731'). What I am struggling to figure out is how to run this query for every day in a series from 30 days ago to 60 days from now and return a row for each day.
SELECT DATE_TRUNC('day', '20170731'::TIMESTAMP),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions
WHERE DATE_TRUNC('day', run) >= '20170731'
AND DATE_TRUNC('day', created_at) <= '20170731'
ORDER BY 1
You can use generate_series() to generate the dates. The idea is:
SELECT gs.dte,
SUM( (state NOT IN ('unsub','skipped', 'error'))::int) AS a,
SUM( (state IN ('unsub') AND DATE_TRUNC('month', unsub_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS b,
SUM( (state IN ('skipped') AND DATE_TRUNC('month', skipped_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS c
FROM subscriptions s CROSS JOIN
generate_series(current_date - interval '30 day',
current_date + interval '60 day',
interval '1 day'
) gs(dte)
WHERE DATE_TRUNC('day', run) >= gs.dte AND
DATE_TRUNC('day', created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY 1;
I switched the query to cast the booleans as integers -- I just find that easier to follow.
See Set Returning Functions. The generate_series function is what you want.
First check this, so you know what it does:
SELECT
*
FROM
generate_series(
'2017-07-31'::TIMESTAMP - INTERVAL '30 days',
'2017-07-31'::TIMESTAMP + INTERVAL '60 days',
INTERVAL '1 day');
Then your query could look something like this:
SELECT DATE_TRUNC('day', stamp),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions,
generate_series('2017-07-31'::TIMESTAMP - INTERVAL '30 days', '2017-07-31'::TIMESTAMP + INTERVAL '60 days', INTERVAL '1 day') AS stamp
WHERE DATE_TRUNC('day', run) >= stamp
AND DATE_TRUNC('day', created_at) <= stamp
GROUP BY 1
ORDER BY 1
Just add generate_series function as you would do with plain input table (alias it AS stamp), JOIN with subscriptions (cartesian product) and use stamp value instead of hard-coded '20170731'.
I am trying to pull daily data using PostgreSQL. My problem is that the day seems to 'reset' at around 5 PM (Los Angeles time). Is there a workaround for this problem? Here is my query:
SELECT COUNT(distinct be.booking_id) AS "Number of Bookings Today"
FROM booking_events be
WHERE be.event IN ('approve') AND
be.created_at >= current_date AND
be.created_at < current_date + interval '1 day';
You can use hours to offset the current date. I think the logic is:
SELECT COUNT(distinct be.booking_id) as "Number of Bookings Today"
FROM booking_events be
WHERE be.event IN ('approve') AND
be.created_at >= current_date - interval '7 hour' AND
be.created_at < current_date + interval '1 day' - interval '7 hour';
I have to list items that have not been updated for a multiple of two years after their last update. This is to run as a cron job once a day.
I know I can do this with something ugly like:
SELECT art_id, art_update FROM items
WHERE art_update = now()::date - interval '2 years'
OR art_update = now()::date - interval '4 years'
OR art_update = now()::date - interval '6 years'
OR art_update = now()::date - interval '8 years'
OR art_update = now()::date - interval '10 years';
Is there any way to avoid this by checking for a modulo interval? Or some other generalised way to express this?
select art_id, art_update
from items
where art_update in (
(now() - interval '2 years')::date,
(now() - interval '4 years')::date,
(now() - interval '6 years')::date,
(now() - interval '8 years')::date,
(now() - interval '10 years')::date
);
or
select art_id, art_update
from items
where art_update in (
select d::date
from generate_series (
now() - interval '2 years',
now() - interval '10 years',
- interval '2 years'
) d(d)
);
http://www.postgresql.org/docs/current/static/functions-srf.html
You can try this.
SELECT art_id, art_update
FROM items
WHERE int4(date_part('year', art_update)) % 2 = 0;
You can generate a series of dates at 2 year intervals going back from today (to 10 years ago in the below) and join this back to your table:
SELECT i.art_id, i.art_update
FROM items i
INNER JOIN generate_series(2, 10, 2) s (years)
ON i.art_update = now()::date - interval '1 years' * s.years;
Example on SQL Fiddle
N.B. This appears to be marginally faster if you generate the dates in the series, rather than numbers:
SELECT i.art_id, i.art_update
FROM items i
INNER JOIN generate_series(now() - interval '10 years',
now() - interval '2 years',
interval '2 years') d (d)
ON art_update = d.d::date;
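Since this runs from a daily cron job, the set of dates that the generate_series answers build can also be sketched outside the database, for example to sanity-check which dates are considered due. The following is only an illustration in Python: the function names are hypothetical, the 10-year cap mirrors the queries above, and clamping 29 February to 28 February is my assumption about how the interval subtraction behaves in PostgreSQL.

```python
from datetime import date

def years_before(day, years):
    """Move `day` back by `years`; 29 Feb falls back to 28 Feb if needed."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # 29 Feb and the target year is not a leap year
        return day.replace(year=day.year - years, day=28)

def anniversary_dates(today, step=2, max_years=10):
    """Dates exactly 2, 4, ..., max_years years before `today`."""
    return [years_before(today, n) for n in range(step, max_years + 1, step)]

def is_due(art_update, today, step=2, max_years=10):
    """True if `art_update` lies a multiple of `step` years before `today`."""
    return art_update in anniversary_dates(today, step, max_years)

print(anniversary_dates(date(2023, 7, 31)))
print(is_due(date(2019, 7, 31), date(2023, 7, 31)))
```

An item last updated on 2019-07-31 is due on 2023-07-31 (four years later), while one updated on 2020-07-31 is not.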