getting first column blank postgres - sql

SELECT CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",COUNT(*) FROM tickets where created_at < now()
GROUP BY CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm' WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END;
time window | count
-------------+-------
| 6
9am-3pm | 69
Is it possible to break the counts down by date as well as by time window, so that my result set looks like this?
Date | time window | count
------------+-------------+-------
12-01-2020 | 9am-3pm| 6
12-01-2020 | 3pm-6pm| 69
13-01-2020 | 9am-3pm| 12
13-01-2020 | 3pm-6pm| 14

We can handle this requirement using a calendar table approach:
WITH dates AS (
SELECT DATE '2020-01-12' AS created_at UNION ALL
SELECT DATE '2020-01-13'
),
tw AS (
SELECT '9am-3pm' AS "time window" UNION ALL
SELECT '3pm-6pm'
),
cte AS (
SELECT
created_at::date AS created_at,
CASE WHEN DATE_PART('hour', created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN DATE_PART('hour', created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",
COUNT(*) AS cnt
FROM tickets
WHERE created_at < NOW()
GROUP BY 1, 2
)
SELECT
d.created_at,
tw."time window",
COALESCE(t.cnt, 0) AS count
FROM dates d
CROSS JOIN tw
LEFT JOIN cte t
ON d.created_at = t.created_at AND tw."time window" = t."time window"
ORDER BY
d.created_at,
tw."time window";
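The calendar-table idea can be sketched outside SQL as well. The following is a minimal Python illustration, not the answer's actual implementation: the `counts` dictionary stands in for the aggregated `cte`, and its values are hypothetical.

```python
from itertools import product

# Hypothetical pre-aggregated counts, keyed by (date, time window).
counts = {
    ("2020-01-12", "9am-3pm"): 6,
    ("2020-01-12", "3pm-6pm"): 69,
    ("2020-01-13", "9am-3pm"): 12,
}

dates = ["2020-01-12", "2020-01-13"]
windows = ["9am-3pm", "3pm-6pm"]

# CROSS JOIN dates x windows, then LEFT JOIN the counts and COALESCE to 0.
report = [(d, w, counts.get((d, w), 0)) for d, w in product(dates, windows)]

for row in report:
    print(row)
```

Every date and window combination appears in `report`, including the one with no matching tickets, which is reported as 0.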

You are actually asking two questions:
The "empty space" (really an SQL NULL) is there because some rows have a created_at hour that does not fall within any of the time ranges, so the CASE expression returns NULL for them. You can exclude them with an additional WHERE condition.
To get the date part as well, add
CAST (created_at AS date)
to the SELECT list and the GROUP BY clause.
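To illustrate both points, here is a small Python sketch under stated assumptions: `time_window` is a hypothetical helper that mirrors the CASE expression from the question, and the `created` timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def time_window(hour):
    """Mirror the CASE expression: first matching branch wins, else None (SQL NULL)."""
    if 3 <= hour <= 15:
        return "9am-3pm"
    if 15 <= hour <= 18:
        return "3pm-6pm"
    return None


# Hypothetical created_at values.
created = [
    datetime(2020, 1, 12, 4, 30),
    datetime(2020, 1, 12, 16, 0),
    datetime(2020, 1, 12, 23, 15),  # outside both windows
    datetime(2020, 1, 13, 10, 0),
]

# Group by (date, window); the "is not None" test plays the role of the extra WHERE.
counts = Counter(
    (ts.date().isoformat(), time_window(ts.hour))
    for ts in created
    if time_window(ts.hour) is not None
)
```

Without the filter, the 23:15 row would be counted under a `None` window, which is the blank first column seen in the question.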

Related

SQL 30 day active user query

I have a table of users and how many events they fired on a given date:
DATE       | USERID | EVENTS
-----------+--------+-------
2021-08-27 | 1      | 5
2021-07-25 | 1      | 7
2021-07-23 | 2      | 3
2021-07-20 | 3      | 9
2021-06-22 | 1      | 9
2021-05-05 | 1      | 4
2021-05-05 | 2      | 2
2021-05-05 | 3      | 6
2021-05-05 | 4      | 8
2021-05-05 | 5      | 1
I want to create a table showing number of active users for each date with active user being defined as someone who has fired an event on the given date or in any of the preceding 30 days.
DATE       | ACTIVE_USERS
-----------+-------------
2021-08-27 | 1
2021-07-25 | 3
2021-07-23 | 2
2021-07-20 | 2
2021-06-22 | 1
2021-05-05 | 5
I tried the following query which returned only the users who were active on the specified date:
SELECT COUNT(DISTINCT USERID), DATE
FROM table
WHERE DATE >= (CURRENT_DATE() - interval '30 days')
GROUP BY 2 ORDER BY 2 DESC;
I also tried using a window function with rows between but seems to end up getting the same result:
SELECT
DATE,
SUM(ACTIVE_USERS) AS ACTIVE_USERS
FROM
(
SELECT
DATE,
CASE
WHEN SUM(EVENTS) OVER (PARTITION BY USERID ORDER BY DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) >= 1 THEN 1
ELSE 0
END AS ACTIVE_USERS
FROM table
)
GROUP BY 1
ORDER BY 1
I'm using SQL:ANSI on Snowflake. Any suggestions would be much appreciated.
This is tricky to do with window functions, because count(distinct) is not permitted there. You can use a self-join:
select t1.date, count(distinct t2.userid)
from table t1 join
table t2
on t2.date <= t1.date and
t2.date > t1.date - interval '30 day'
group by t1.date;
However, that can be expensive. One solution is to "unpivot" the data. That is, do an incremental count per user of going "in" and "out" of active states and then do a cumulative sum:
with d as ( -- calculate the dates with "ins" and "outs"
select userid, date, +1 as inc
from table
union all
select userid, date + interval '30 day', -1 as inc
from table
),
d2 as ( -- accumulate to get the net actives per day
select date, userid, sum(inc) as change_on_day,
sum(sum(inc)) over (partition by userid order by date) as running_inc
from d
group by date, userid
),
d3 as ( -- summarize into active periods; a period ends on the row where running_inc returns to 0
select userid, min(date) as start_date, max(date) as end_date
from (select d2.*,
-- number of periods completed strictly before this row
sum(case when running_inc = 0 then 1 else 0 end) over (partition by userid order by date)
- case when running_inc = 0 then 1 else 0 end as active_period
from d2
) d2
group by userid, active_period
)
select d.date, count(d3.userid)
from (select distinct date from table) d left join
d3
on d.date >= d3.start_date and d.date < d3.end_date
group by d.date;
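As a cross-check of the 30-day definition, here is a minimal Python sketch that applies the same predicate as the self-join (event date later than the reporting date minus 30 days, and no later than the reporting date) to the sample rows from the question. The function name `active_users` is an assumption made for illustration.

```python
from datetime import date, timedelta

# (date, userid) pairs from the sample table in the question.
events = [
    (date(2021, 8, 27), 1), (date(2021, 7, 25), 1), (date(2021, 7, 23), 2),
    (date(2021, 7, 20), 3), (date(2021, 6, 22), 1), (date(2021, 5, 5), 1),
    (date(2021, 5, 5), 2), (date(2021, 5, 5), 3), (date(2021, 5, 5), 4),
    (date(2021, 5, 5), 5),
]


def active_users(events, days=30):
    """Count distinct users per reporting date over a trailing window."""
    window = timedelta(days=days)
    result = {}
    for d in sorted({d for d, _ in events}, reverse=True):
        # Same predicate as the self-join: d - 30 days < event date <= d.
        users = {u for event_date, u in events if d - window < event_date <= d}
        result[d] = len(users)
    return result


for d, n in active_users(events).items():
    print(d, n)
```

On this sample the counts agree with the expected output in the question. The loop is quadratic in the number of rows, which is the same cost concern raised about the self-join.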

Finding multiple consecutive dates (datetime) in Ruby on Rails / Postgresql

How can we find X consecutive dates (using by hour) that meet a condition?
EDIT: here is the SQL fiddle http://sqlfiddle.com/#!17/44928/1
Example:
Find 3 consecutive dates where aa < 2 and bb < 6 and cc < 7
Given this table called weather:
timestamp        | aa | bb | cc
-----------------+----+----+----
01/01/2000 00:00 | 1  | 5  | 5
01/01/2000 01:00 | 5  | 5  | 5
01/01/2000 02:00 | 1  | 5  | 5
01/01/2000 03:00 | 1  | 5  | 5
01/01/2000 04:00 | 1  | 5  | 5
01/01/2000 05:00 | 1  | 5  | 5
Answer should return the 3 records from 02:00, 03:00, 04:00.
How can we do this in Ruby on Rails - or directly in SQL if that is better?
I started working on a method based on this answer:
Detect consecutive dates ranges using SQL
def consecutive_dates
the_query = "WITH t AS (
SELECT timestamp d,ROW_NUMBER() OVER(ORDER BY timestamp) i
FROM #d
GROUP BY timestamp
)
SELECT MIN(d),MAX(d)
FROM t
GROUP BY DATEDIFF(hour,i,d)"
ActiveRecord::Base.connection.execute(the_query)
end
But I was unable to get it working.
Assuming that you have one row every hour, then an easy way to get the first hour where this occurs uses lead():
select t.*
from (select t.*,
lead(timestamp, 2) over (order by timestamp) as timestamp_2
from t
where aa < 2 and bb < 6 and cc < 7
) t
where timestamp_2 = timestamp + interval '2 hour';
This filters on the conditions and looks at the row two rows ahead. If that row is exactly two hours later, then three consecutive rows match the conditions. Note: the above will return both 2000-01-01 02:00 and 2000-01-01 03:00, because each of them starts a run of three matching hours.
From your question you only seem to want the earliest. To handle that, use lag() as well:
select t.*
from (select t.*,
lag(timestamp) over (order by timestamp) as prev_timestamp,
lead(timestamp, 2) over (order by timestamp) as timestamp_2
from t
where aa < 2 and bb < 6 and cc < 7
) t
where timestamp_2 = timestamp + interval '2 hour' and
(prev_timestamp is null or prev_timestamp < timestamp - interval '1' hour);
You can generate the additional hours using generate_series() if you really need the original rows:
select t.timestamp + n.n * interval '1 hour', aa, bb, cc
from (select t.*,
lead(timestamp, 2) over (order by timestamp) as timestamp_2
from t
where aa < 2 and bb < 6 and cc < 7
) t cross join lateral
generate_series(0, 2) n
where timestamp_2 = timestamp + interval '2 hour';
Your data seems to have precise timestamps based on the question, so the timestamp equalities will work. If the real data has more fuzziness, then the queries can be tweaked to take this into account.
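The lead()/lag() logic can be traced by hand on the sample data. This Python sketch is only an illustration, assuming one row per hour as the answer does: `zip(matching, matching[2:])` plays the role of `lead(timestamp, 2)`, and the membership test stands in for the `lag()` check.

```python
from datetime import datetime, timedelta

# (timestamp, aa, bb, cc) rows from the weather table in the question.
rows = [
    (datetime(2000, 1, 1, 0), 1, 5, 5),
    (datetime(2000, 1, 1, 1), 5, 5, 5),
    (datetime(2000, 1, 1, 2), 1, 5, 5),
    (datetime(2000, 1, 1, 3), 1, 5, 5),
    (datetime(2000, 1, 1, 4), 1, 5, 5),
    (datetime(2000, 1, 1, 5), 1, 5, 5),
]

# WHERE aa < 2 AND bb < 6 AND cc < 7
matching = sorted(ts for ts, aa, bb, cc in rows if aa < 2 and bb < 6 and cc < 7)

# lead(timestamp, 2): pair each matching row with the one two positions ahead.
starts = [
    ts for ts, ts_2 in zip(matching, matching[2:])
    if ts_2 == ts + timedelta(hours=2)
]

# lag(): keep a start only if the previous hour is not itself a matching row.
first_starts = [ts for ts in starts if ts - timedelta(hours=1) not in matching]

print(starts)
print(first_starts)
```

Here `starts` holds 02:00 and 03:00, while `first_starts` keeps only 02:00, the beginning of the run.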
This is a gaps-and-islands problem. Islands are runs of adjacent records that match the condition, and you want islands that are at least 3 records long.
Here is one approach that defines the groups with a window count that increments every time a row that does not match the condition is encountered. We can then count how many rows there are in each group and use that information to filter.
select *
from (
select t.*, count(*) over(partition by a, grp) cnt
from (
select t.*,
count(*) filter(where b <= 4) over(partition by a order by timestamp) grp
from mytable t
) t
) t
where cnt >= 3
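For comparison, the islands idea can be sketched in Python with `itertools.groupby`, which groups adjacent rows that share a key. This is a minimal sketch that uses the weather rows and conditions from the question rather than the column names in the query above, and it assumes the rows are already sorted with no missing hours.

```python
from itertools import groupby

# (timestamp, aa, bb, cc) rows from the question, already in time order.
rows = [
    ("00:00", 1, 5, 5),
    ("01:00", 5, 5, 5),
    ("02:00", 1, 5, 5),
    ("03:00", 1, 5, 5),
    ("04:00", 1, 5, 5),
    ("05:00", 1, 5, 5),
]


def matches(row):
    """Condition from the question: aa < 2 and bb < 6 and cc < 7."""
    _, aa, bb, cc = row
    return aa < 2 and bb < 6 and cc < 7


# Each run of adjacent matching rows is one island.
islands = [list(group) for ok, group in groupby(rows, key=matches) if ok]

# Keep only islands with at least 3 rows.
long_islands = [island for island in islands if len(island) >= 3]

for island in long_islands:
    print([ts for ts, *_ in island])
```

On the sample data there are two islands (00:00 alone, and 02:00 through 05:00), and only the second one is long enough to keep.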

SQL not returning a value if no row exist for time queried

I'm writing this SQL query which returns the number of records created in an hour in last 24 hours. I'm getting the result for only those hours that have a non zero value. If no records were created, it doesn't return anything at all.
Here's my query:
SELECT HOUR(timeStamp) as hour, COUNT(*) as count
FROM `events`
WHERE timeStamp > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY HOUR(timeStamp)
ORDER BY HOUR(timeStamp)
The output of current Query:
+-----------------+----------+
| hour | count |
+-----------------+----------+
| 14 | 6 |
| 15 | 5 |
+-----------------+----------+
But I'm expecting 0 for hours in which no records were created. Where am I going wrong?
One solution is to generate a table of numbers from 0 to 23 and left join it with your original table.
Here is a query that uses a recursive query to generate the list of hours (if you are running MySQL, this requires version 8.0):
with recursive hours as (
select 0 hr
union all select hr + 1 from hours where hr < 23
)
select h.hr, count(e.eventID) as cnt
from hours h
left join events e
on e.timestamp > now() - interval 1 day
and hour(e.timestamp) = h.hr
group by h.hr
If your RDBMS does not support recursive CTEs, then one option is to use an explicit derived table:
select h.hr, count(e.eventID) as cnt
from (
select 0 hr union all select 1 union all select 2 ... union all select 23
) h
left join events e
on e.timestamp > now() - interval 1 day
and hour(e.timestamp) = h.hr
group by h.hr
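The same zero-filling pattern, sketched in Python for illustration: `range(24)` plays the role of the hours table, and the lookup with a default of 0 corresponds to counting over a left join. The `event_hours` list is hypothetical, chosen to reproduce the output shown in the question.

```python
from collections import Counter

# Hypothetical hour-of-day values for events in the last 24 hours.
event_hours = [14] * 6 + [15] * 5

counts = Counter(event_hours)

# One row per hour 0..23; hours without events get 0 instead of disappearing.
report = [(hr, counts.get(hr, 0)) for hr in range(24)]

for hr, cnt in report:
    print(hr, cnt)
```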

Add Missing monthly dates in a timeseries data in Postgresql

I have monthly time series data in a table where each date is the last day of a month. Some of the dates are missing from the data. I want to insert those dates with a zero value for the other attributes.
Table is as follows:
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-04-30 34
2 2014-05-31 45
2 2014-08-31 47
I want to convert this table to
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-03-31 0
1 2015-04-30 34
2 2014-05-31 45
2 2014-06-30 0
2 2014-07-31 0
2 2014-08-31 47
Is there any way we can do this in Postgresql?
Currently we are doing this in Python. Our data is growing day by day, and it is not efficient to incur the extra I/O just for this one task.
Thank you
You can do this using generate_series() to generate the dates and then left join to bring in the values:
with m as (
select id, min(report_date) as minrd, max(report_date) as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
from m
) m left join
t
on m.report_date = t.report_date and m.id = t.id;
EDIT:
Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.
This is easily fixed:
with t as (
select 1 as id, date '2012-01-31' as report_date, 10 as price union all
select 1 as id, date '2012-04-30', 20
), m as (
select id, min(report_date) + interval '1 day' as minrd, max(report_date) + interval '1 day' as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) - interval '1 day' as report_date
from m
) m left join
t
on m.report_date = t.report_date and m.id = t.id;
The first CTE is just to generate sample data.
This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.
Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.
with minmax as (
select id, min(report_date) as mindt, max(report_date) as maxdt
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select *,
generate_series(date_trunc('MONTH',mindt+interval '1' day),
date_trunc('MONTH',maxdt+interval '1' day),
interval '1' month) - interval '1 day' as report_date
from minmax
) m
left join t on m.report_date = t.report_date and m.id = t.id
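Since the question mentions that this is currently done in Python, here is a small sketch of the same month-end generation for comparison. It is an illustration only: `month_ends` is a hypothetical helper, and `prices` holds the rows for id 1 from the question.

```python
import calendar
from datetime import date


def month_ends(start, end):
    """Yield the last day of every month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Rows for id 1 from the question.
prices = {date(2015, 1, 31): 40, date(2015, 2, 28): 56, date(2015, 4, 30): 34}

# Left join the generated month ends against the known prices, defaulting to 0.
filled = [(d, prices.get(d, 0)) for d in month_ends(min(prices), max(prices))]

for d, price in filled:
    print(d, price)
```

Computing the last day directly avoids the pitfall described above, where adding a month to a month-end date does not always land on the next month end.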

How to GROUP BY several days in PostgreSQL?

The following code generates dates and counts records by day.
SELECT ts, COUNT(DISTINCT(user_id)) FROM
( SELECT current_date + s.ts FROM generate_series(-20,0,1) AS s(ts) )
AS series(ts)
LEFT JOIN messages
ON messages.created_at::date = ts
GROUP BY ts
ORDER BY ts
The output looks like:
2011-07-07 0
2011-07-08 0
2011-07-09 0
2011-07-10 0
2011-07-11 0
2011-07-12 94
2011-07-13 56
2011-07-14 35
2011-07-15 56
2011-07-16 0
2011-07-17 13
How would you modify it to group by 2 days, so that the results overlap? Instead of counting the distinct user_id's for each day, it would count the distinct user_id's for each 2 day period.
This is different from summing the counts of the 2 days, as the user_id should be counted only once for each 2 day period.
Working in PostgreSQL 8.3.
Thanks.
SELECT ts, COUNT(DISTINCT(user_id)) FROM
( SELECT current_date + s.ts FROM generate_series(-20,0,1) AS s(ts) )
AS series(ts)
LEFT JOIN messages
ON messages.created_at::date between ts - 1 and ts -- JOIN on a range
GROUP BY ts
ORDER BY ts
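The range join gives overlapping two-day windows: each day counts the distinct users seen on that day or the day before. A minimal Python sketch of that behaviour follows, with made-up `messages` rows and a hypothetical function name.

```python
from datetime import date, timedelta

# Hypothetical (created_at date, user_id) rows.
messages = [
    (date(2011, 7, 12), 1), (date(2011, 7, 12), 2),
    (date(2011, 7, 13), 2), (date(2011, 7, 13), 3),
    (date(2011, 7, 15), 1),
]


def distinct_users_2day(messages, days):
    """For each day, count distinct users seen on that day or the day before."""
    out = {}
    for ts in days:
        # JOIN on a range: created_at between ts - 1 and ts.
        users = {u for d, u in messages if ts - timedelta(days=1) <= d <= ts}
        out[ts] = len(users)
    return out


days = [date(2011, 7, 11) + timedelta(days=i) for i in range(6)]
result = distinct_users_2day(messages, days)

for ts, n in result.items():
    print(ts, n)
```

User 2 posts on both 2011-07-12 and 2011-07-13 but is counted once in the window ending on the 13th, which is the difference from summing the two daily counts.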
Try this:
SELECT ts, COUNT(DISTINCT(user_id))
FROM
( SELECT current_date + s.ts
FROM generate_series(-20,0,2) AS s(ts) ) AS series(ts)
LEFT JOIN messages
ON messages.created_at::date = ts or messages.created_at::date = ts + 1
GROUP BY ts
ORDER BY ts