PostgreSQL: sum of days in range - sql

I have a table as follows:
Table "public.fish_schedule"
Column | Type | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
name | character varying(255) | | not null |
tr | timerange | | not null |
mr | int4range | | not null |
Example data:
name | tr | mr
--------------+---------------------+--------
ray | [04:00:00,21:00:00) | [8,12)
moray eel | [00:00:00,24:00:00) | [8,11)
yellow perch | [00:00:00,24:00:00) | [1,4)
(3 rows)
The field mr represents the month range. I would like to add up the total number of days in the range; for example, for moray eel that would be August through October.
I've only managed to get the following SQL working so far, and haven't got the faintest idea how to write a function to do what I need.
SELECT generate_series(1,12) AS n,
generate_series('2020-01-01'::date,'2020-12-01'::date,'1 month'::interval)+ '1 month'::interval - generate_series('2020-01-01'::date,'2020-12-01'::date,'1 month'::interval) as m;
Here's the output.
n | m
----+---------
1 | 31 days
2 | 29 days
3 | 31 days
4 | 30 days
5 | 31 days
6 | 30 days
7 | 31 days
8 | 31 days
9 | 30 days
10 | 31 days
11 | 30 days
12 | 31 days
(12 rows)
So, the function would have to add up the days in August (31), September (30), and October (31), based on the range in the mr field.
Would appreciate any guidance or pointers.
UPDATE: Here is the solution for the curious.
WITH feeding(name, the_hours, start_schedule, end_schedule) AS
(SELECT name,
EXTRACT(HOUR FROM upper(tr)-lower(tr)),
make_date(extract(year from now())::int4,lower(mr),1)::timestamp,
make_date(extract(year from now())::int4,upper(mr)-1,1)::timestamp
+ interval '1 month' - interval '1 day'
from fish_schedule
)
SELECT name, SUM(the_hours * (EXTRACT (days from (end_schedule - start_schedule)) + 1)) "total_hours"
FROM feeding
GROUP by name
ORDER by total_hours;
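The arithmetic behind this query can be sketched outside the database. Here is a minimal Python check of the same logic (assuming a fixed year of 2020, whereas the query uses the current year from now()): sum the days of the months covered by mr (exclusive upper bound, like int4range) and multiply by the daily feeding hours.

```python
import calendar

def total_hours(fish, year=2020):
    """fish: (name, daily_hours, month_range) with an exclusive upper bound,
    mirroring int4range semantics such as [8,11) = August through October."""
    name, daily_hours, (lo, hi) = fish
    # monthrange(year, m)[1] is the number of days in month m
    days = sum(calendar.monthrange(year, m)[1] for m in range(lo, hi))
    return name, daily_hours * days

rows = [
    ("ray", 17, (8, 12)),         # [04:00,21:00) -> 17 hours/day
    ("moray eel", 24, (8, 11)),   # [00:00,24:00) -> 24 hours/day
    ("yellow perch", 24, (1, 4)),
]
for fish in rows:
    print(total_hours(fish))
```

For moray eel this gives 24 hours x (31 + 30 + 31) days = 2208, which is what the SQL above produces.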

You do not need the number of days in each month, since for any given row the time frame is contiguous. The following converts the mr column into timestamps in a CTE: from the 1st day of the month at the lower end of the range to the last day of the month at the upper end. The main query then extracts the number of days from the difference.
with feeding( name, start_schedule, end_schedule) as
( select name
, make_date(extract(year from now())::int4,lower(mr),1)::timestamp
, make_date(extract(year from now())::int4,upper(mr)-1,1)::timestamp
+ interval '1 month' - interval '1 day'
from fish_schedule
)
select name, extract(days from (end_schedule - start_schedule)) + 1 "# of days"
from feeding;
Note: a slight matter of opinion here. The "- interval '1 day'" and the "+ 1" in the main query can be eliminated and still produce the same result. IMO the above more clearly shows the intent, but omitting them makes the query slightly shorter and infinitesimally faster.
PS. It also handles @Vesa's point about leap years.

To get the number of days in a year, you could do:
select sum(d) from (
SELECT
date_part('month',generate_series) as n,
generate_series as startOfMonth,
date_trunc('month',generate_series) + '1 month'::interval - '1 day'::interval as endOfMonth,
date_part('days', date_trunc('month',generate_series) + '1 month'::interval - '1 day'::interval) as d
FROM generate_series('2020-01-01'::date,'2020-12-01'::date,'1 month'::interval)
) x
;
This returns 366 days; the inner query returns:
n | startofmonth | endofmonth | d
----+------------------------+------------------------+----
1 | 2020-01-01 00:00:00+01 | 2020-01-31 00:00:00+01 | 31
2 | 2020-02-01 00:00:00+01 | 2020-02-29 00:00:00+01 | 29
3 | 2020-03-01 00:00:00+01 | 2020-03-31 00:00:00+02 | 31
4 | 2020-04-01 00:00:00+02 | 2020-04-30 00:00:00+02 | 30
5 | 2020-05-01 00:00:00+02 | 2020-05-31 00:00:00+02 | 31
6 | 2020-06-01 00:00:00+02 | 2020-06-30 00:00:00+02 | 30
7 | 2020-07-01 00:00:00+02 | 2020-07-31 00:00:00+02 | 31
8 | 2020-08-01 00:00:00+02 | 2020-08-31 00:00:00+02 | 31
9 | 2020-09-01 00:00:00+02 | 2020-09-30 00:00:00+02 | 30
10 | 2020-10-01 00:00:00+02 | 2020-10-31 00:00:00+01 | 31
11 | 2020-11-01 00:00:00+01 | 2020-11-30 00:00:00+01 | 30
12 | 2020-12-01 00:00:00+01 | 2020-12-31 00:00:00+01 | 31
(12 rows)
I hope this helps you change your query to get the correct results.

SELECT
name,
to_date('2020-' || upper(mr) || '-01', 'yyyy-mm-dd')
- to_date ('2020-' || lower(mr) || '-01', 'yyyy-mm-dd')
FROM
fish_schedule;
Do you have to be careful about leap days? 2020 is a leap year.
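Subtracting the two month-start dates directly gives the same count, and a quick Python sketch (with 2020 hardcoded like the query; note that a range whose exclusive upper bound is 13 would need special handling in both versions) shows that the leap year only matters when the range spans February:

```python
from datetime import date

def days_in_range(lo, hi, year=2020):
    # mirrors to_date('<year>-' || upper(mr) || '-01') - to_date('<year>-' || lower(mr) || '-01')
    return (date(year, hi, 1) - date(year, lo, 1)).days

print(days_in_range(8, 11))       # moray eel: Aug 1 -> Nov 1 = 92 days
print(days_in_range(1, 4, 2020))  # yellow perch, leap year: includes Feb 29 -> 91
print(days_in_range(1, 4, 2021))  # non-leap year: one day fewer -> 90
```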

Related

Calculating values from the same table but a different period

I have a readings table with the following definition.
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('readings_id_seq'::regclass)
created_at | timestamp without time zone | | not null |
type | character varying(50) | | not null |
device | character varying(100) | | not null |
value | numeric | | not null |
It has data such as:
id | created_at | type | device | value
----+---------------------+--------+--------+-------
1 | 2021-05-11 04:00:00 | weight | 1 | 100
2 | 2021-05-10 03:00:00 | weight | 2 | 120
3 | 2021-05-10 04:00:00 | weight | 1 | 120
4 | 2021-05-10 03:00:00 | weight | 1 | 124
5 | 2021-05-01 22:43:47 | weight | 1 | 130
6 | 2021-05-01 15:00:48 | weight | 1 | 140
7 | 2021-05-01 13:00:48 | weight | 2 | 160
Desired Output
Given a device and a type, I would like the max and min value from the past 7 days for each matched row (the active row excluded). If there's nothing in the past 7 days, then it should be 0.
id | created_at | type | device | value | min | max
----+---------------------+--------+--------+-------+-----+-----
1 | 2021-05-11 04:00:00 | weight | 1 | 100 | 120 | 124
3 | 2021-05-10 04:00:00 | weight | 1 | 120 | 124 | 124
4 | 2021-05-10 03:00:00 | weight | 1 | 124 | 0 | 0
5 | 2021-05-01 22:43:47 | weight | 1 | 130 | 140 | 140
6 | 2021-05-01 15:00:48 | weight | 1 | 140 | 0 | 0
I have created a db-fiddle.
You can use a lateral left join for your requirement, like below:
select
t1.id,
t1.created_at,
t1.type,
t1.device,
t1.value,
min(coalesce(t2.value,0)),
max(coalesce(t2.value,0))
from
readings t1
left join lateral
( select *
from readings
where id!=t1.id and created_at between t1.created_at- interval '7 day' and t1.created_at and device=t1.device and t1.type=type
) t2 on true
where t1.device='1' -- Change the device
and t1.type='weight' -- Change the type
group by 1,2,3,4,5
order by 1
DEMO
Considering the comments, here is the SQL:
select readings.id, readings.type, readings.device, readings.created_at, readings.value,
min(COALESCE(m_readings.value,0)) min, max(COALESCE(m_readings.value,0)) max
from readings LEFT JOIN readings m_readings
ON m_readings.type =readings.type
AND m_readings.device =readings.device
AND m_readings.id > readings.id
AND date( m_readings.created_at) between (date(readings.created_at)-7) and date(readings.created_at)
group by readings.id, readings.type, readings.device, readings.created_at, readings.value
order by readings.id;
Explanation: we make a LEFT JOIN between each record of readings and the other records of readings that have the same type and device but a different id, keeping only the records from the last 7 days. Then, for each type/device, we group to get the max and min value over those 7 days.
You should be using window functions for this!
select r.*,
max(value) over (partition by device, type
order by created_at
range between interval '7 day' preceding and interval '1 second' preceding
),
min(value) over (partition by device, type
order by created_at
range between interval '7 day' preceding and interval '1 second' preceding
)
from readings r;
The above returns NULL values when there are no values -- and that makes more sense to me than 0. But if you really want 0, just use COALESCE():
select r.*,
coalesce(max(value) over (partition by device, type
order by created_at
range between interval '7 day' preceding and interval '1 second' preceding
), 0),
coalesce(min(value) over (partition by device, type
order by created_at
range between interval '7 day' preceding and interval '1 second' preceding
), 0)
from readings r;
In addition to being more concise, this is easier to read and should have better performance than other methods.
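The windowed range logic can be checked procedurally. A small Python sketch (sample rows hardcoded from the question) that, for each row, scans the strictly earlier rows of the same device/type within the preceding 7 days and defaults to 0 when none exist:

```python
from datetime import datetime, timedelta

readings = [  # (id, created_at, type, device, value)
    (1, datetime(2021, 5, 11, 4, 0), "weight", "1", 100),
    (3, datetime(2021, 5, 10, 4, 0), "weight", "1", 120),
    (4, datetime(2021, 5, 10, 3, 0), "weight", "1", 124),
    (5, datetime(2021, 5, 1, 22, 43, 47), "weight", "1", 130),
    (6, datetime(2021, 5, 1, 15, 0, 48), "weight", "1", 140),
]

def window_min_max(rows):
    out = {}
    for rid, ts, typ, dev, _ in rows:
        # same device/type, strictly earlier, and at most 7 days back
        prior = [v for i, t, ty, d, v in rows
                 if i != rid and ty == typ and d == dev
                 and ts - timedelta(days=7) <= t < ts]
        out[rid] = (min(prior), max(prior)) if prior else (0, 0)
    return out

print(window_min_max(readings))
```

This reproduces the desired output table: row 1 sees rows 3 and 4 (min 120, max 124), row 4 has no prior reading within 7 days, so it gets (0, 0).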

Generating counts of open tickets over time, given opened and closed dates

I have a set of data for some tickets, with datetime of when they were opened and closed (or NULL if they are still open).
+------------------+------------------+
| opened_on | closed_on |
+------------------+------------------+
| 2019-09-01 17:00 | 2020-01-01 13:37 |
| 2020-04-14 11:00 | 2020-05-14 14:19 |
| 2020-03-09 10:00 | NULL |
+------------------+------------------+
We would like to generate a table of data showing the total count of tickets that were open through time, grouped by date. Something like the following:
+------------------+------------------+
| date | num_open |
+------------------+------------------+
| 2019-09-01 00:00 | 1 |
| 2019-09-02 00:00 | 1 |
| etc... | |
| 2020-01-01 00:00 | 0 |
| 2020-01-02 00:00 | 0 |
| etc... | |
| 2020-03-08 00:00 | 0 |
| 2020-03-09 00:00 | 1 |
| etc... | |
| 2020-04-14 00:00 | 2 |
+------------------+------------------+
Note that I am not sure how num_open should be counted for a given date - should it be considered from the point of view of the end of the date or the start of it, i.e. if a ticket opened and closed on the same date, should that count as 0?
This is in Postgres, so I thought about using window functions, but trying to truncate by date makes it complex. I have tried using generate_series to create the date series to join onto, but when I use the aggregate functions I've "lost" access to the individual ticket datetimes.
You can use generate_series() to build the list of dates, and then a left join on inequality conditions to bring the table:
select s.dt, count(t.opened_on) num_open
from generate_series(date '2019-09-01', date '2020-09-01', '1 day') s(dt)
left join mytable t
on s.dt >= t.opened_on and s.dt < coalesce(t.closed_on, 'infinity')
group by s.dt
Actually, this seems a bit closer to what you want:
select s.dt, count(t.opened_on) num_open
from generate_series(date '2019-09-01', date '2020-09-01', '1 day') s(dt)
left join mytable t
on s.dt >= t.opened_on::date and s.dt < coalesce(t.closed_on::date, 'infinity')
group by s.dt
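The date-cast join condition can be mirrored in Python to sanity-check the counts (sample tickets from the question, with None standing in for NULL, just as coalesce(..., 'infinity') treats open tickets):

```python
from datetime import date

tickets = [  # (opened_on, closed_on) truncated to dates, as in the second query
    (date(2019, 9, 1), date(2020, 1, 1)),
    (date(2020, 4, 14), date(2020, 5, 14)),
    (date(2020, 3, 9), None),  # still open
]

def num_open(day):
    # open if opened on or before `day` and not yet closed as of `day`
    return sum(1 for opened, closed in tickets
               if day >= opened and (closed is None or day < closed))

for d in (date(2019, 9, 1), date(2020, 1, 1), date(2020, 4, 14)):
    print(d, num_open(d))
```

This matches the desired table: 1 open on 2019-09-01, 0 on the 2020-01-01 closing date itself, and 2 on 2020-04-14.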

How to group date by week in PostgreSQL?

I have a pretty simple table with 2 columns. The first one shows the time (timestamp), the second one shows the speed of a car at that time (float8).
| DATE_TIME | SPEED |
|---------------------|-------|
| 2018-11-09 00:00:00 | 256 |
| 2018-11-09 01:00:00 | 659 |
| 2018-11-09 02:00:00 | 256 |
| other dates | xxx |
| 2018-11-21 21:00:00 | 651 |
| 2018-11-21 22:00:00 | 515 |
| 2018-11-21 23:00:00 | 849 |
Let's say we have the period from 9 November to 21 November. How do I group that period by week? In fact I want this result:
| DATE_TIME | AVG_SPEED |
|---------------------|-----------|
| 9-11 November | XXX |
| 12-18 November | YYY |
| 19-21 November | ZZZ |
I use PostgreSQL 10.4.
I use such SQL Statement to know the number of the week of the certain date:
SELECT EXTRACT(WEEK FROM TIMESTAMP '2018-11-09 00:00:00');
EDIT:
@tim-biegeleisen when I set the period from '2018-11-01' to '2018-11-13' your sql statement returns 2 results:
In fact I need such result:
2018-11-01 00:00:00 | 2018-11-04 23:00:00
2018-11-05 00:00:00 | 2018-11-11 23:00:00
2018-11-12 00:00:00 | 2018-11-13 05:00:00
As you can see in the calendar, there are 3 weeks in that period.
We can do this using a calendar table. This answer assumes that a week begins with the first date in your data set. You could also do this with a different assumption, e.g. a standard ISO week.
WITH dates AS (
SELECT date_trunc('day', dd)::date AS dt
FROM generate_series
( '2018-11-09'::timestamp
, '2018-11-21'::timestamp
, '1 day'::interval) dd
),
cte AS (
SELECT t1.dt, t2.DATE_TIME, t2.SPEED,
EXTRACT(week from t1.dt) week
FROM dates t1
LEFT JOIN yourTable t2
ON t1.dt = t2.DATE_TIME::date
)
SELECT
MIN(dt)::text || '-' || MAX(dt) AS DATE_TIME,
AVG(SPEED) AS AVG_SPEED
FROM cte
GROUP BY
week
ORDER BY
MIN(dt);
Demo
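In PostgreSQL, EXTRACT(WEEK ...) returns the ISO 8601 week number, and Python's date.isocalendar() does the same. A sketch (dates hardcoded from the question) showing how 9-21 November falls into three weekly buckets, each tracked as (first day, last day):

```python
from datetime import date, timedelta

start, end = date(2018, 11, 9), date(2018, 11, 21)
buckets = {}
d = start
while d <= end:
    week = d.isocalendar()[1]  # ISO week number, same as EXTRACT(WEEK ...)
    lo, hi = buckets.get(week, (d, d))
    buckets[week] = (min(lo, d), max(hi, d))
    d += timedelta(days=1)
print(buckets)
```

The period splits into week 45 (Nov 9-11), week 46 (Nov 12-18), and week 47 (Nov 19-21), matching the desired three-row result; averaging the speeds per bucket is then straightforward.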

Split rows on different days if summing hours value to given day exceeds midnight

I have a structure like this
+-----+-----+------------+----------+------+----------------------+---+
| Row | id | date | time | hour | description | |
+-----+-----+------------+----------+------+----------------------+---+
| 1 | foo | 2018-03-02 | 19:00:00 | 8 | across single day | |
| 2 | bar | 2018-03-02 | 23:00:00 | 1 | end at midnight | |
| 3 | qux | 2018-03-02 | 10:00:00 | 3 | inside single day | |
| 4 | quz | 2018-03-02 | 23:15:00 | 2 | with minutes | |
+-----+-----+------------+----------+------+----------------------+---+
(I added the description column only to give context; it is useless for analysis purposes)
Here is the statement to generate table
WITH table AS (
SELECT "foo" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8 AS hour
UNION ALL
SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_DATE(), TIME(10,0,0), 3
UNION ALL
SELECT "quz", CURRENT_DATE(), TIME(23,15,0), 2
)
SELECT * FROM table
Adding the hour value to the given time, I need to split the row into multiple ones if the sum runs into the next day.
Jumps over multiple days are NOT to be considered, like +27 hours (this simplifies the scenario).
My initial idea was to start by adding the hours value to a datetime field, in order to obtain the start and end limits of the interval
SELECT
id,
DATETIME(date, time) AS date_start,
DATETIME_ADD(DATETIME(date, time), INTERVAL hour HOUR) AS date_end
FROM table
here is the result
+-----+-----+---------------------+---------------------+---+
| Row | id | date_start | date_end | |
+-----+-----+---------------------+---------------------+---+
| 1 | foo | 2018-03-02T19:00:00 | 2018-03-03T03:00:00 | |
| 2 | bar | 2018-03-02T23:00:00 | 2018-03-03T00:00:00 | |
| 3 | qux | 2018-03-02T10:00:00 | 2018-03-02T13:00:00 | |
| 4 | quz | 2018-03-02T23:15:00 | 2018-03-03T01:15:00 | |
+-----+-----+---------------------+---------------------+---+
but now I'm stuck on how to proceed considering the existing interval.
Starting from this table, the rows should be split if the day changes, like
+-----+-----+------------+-------------+----------+-------+--+
| Row | id | date | hour_start | hour_end | hours | |
+-----+-----+------------+-------------+----------+-------+--+
| 1 | foo | 2018-03-02 | 19:00:00 | 00:00:00 | 5 | |
| 2 | foo | 2018-03-03 | 00:00:00 | 03:00:00 | 3 | |
| 3 | bar | 2018-03-02 | 23:00:00 | 00:00:00 | 1 | |
| 4 | qux | 2018-03-02 | 10:00:00 | 13:00:00 | 3 | |
| 5 | quz | 2018-03-02 | 23:15:00 | 00:00:00 | 0.75 | |
| 6 | quz | 2018-03-03 | 00:00:00 | 01:15:00 | 1.25 | |
+-----+-----+------------+-------------+----------+-------+--+
I tried to adapt a similar, already analyzed scenario, but I was unable to make it handle the day component as well.
My final scenario will combine both this approach and the one analyzed in the other question (split on single days and then split on given breaks of hours), but I can approach these 2 themes separately: first split by day (this question) and then split on time breaks (the other question).
Interesting problem ... I tried the following:
Create a second table containing all the new rows starting at midnight
UNION ALL it with the source table while correcting the hours of the old rows accordingly
Commented Result:
WITH table AS (
SELECT "foo" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8 AS hour
UNION ALL
SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_DATE(), TIME(10,0,0), 3
)
,table2 AS (
SELECT
id,
-- create datetime, add hours, then cast as date again
CAST( datetime_add( datetime(date, time), INTERVAL hour HOUR) AS date) date,
time(0,0,0) AS time -- losing minutes and seconds
-- substract hours to midnight
,hour - (24-EXTRACT(HOUR FROM time)) hour
FROM
table
WHERE
date != CAST( datetime_add( datetime(date,time), INTERVAL hour HOUR) AS date) )
SELECT
id
,date
,time
-- correct hour if midnight split
,IF(EXTRACT(hour from time)+hour > 24,24-EXTRACT(hour from time),hour) hour
FROM
table
UNION ALL
SELECT
*
FROM
table2
Hope, it makes sense.
Of course, if you need to consider jumps over multiple days, the correction fails :)
Here is a possible solution I came up with, starting from @Martin Weitzmann's approach.
I used 2 different paths:
ids where there is a "jump" to the next day
ids that stay within the same day
and a final UNION ALL of the two sets.
I forgot to mention earlier that the hours value of the input can be a float (fractions of hours), so I added that too.
#standardSQL
WITH
input AS (
-- change of day
SELECT "bap" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8.0 AS hour UNION ALL
-- end at midnight
SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1.0 UNION ALL
-- inside single day
SELECT "foo", CURRENT_DATE(), TIME(10,0,0), 3.0 UNION ALL
-- change of day with minutes and float hours
SELECT "qux", CURRENT_DATE(), TIME(23,15,0), 2.5 UNION ALL
-- start from midnight
SELECT "quz", CURRENT_DATE(), TIME(0,0,0), 4.5
),
-- Calculate end_date and end_time summing hours value
table AS (
SELECT
id,
date AS start_date,
time AS start_time,
EXTRACT(DATE FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_date,
EXTRACT(TIME FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_time
FROM input
),
-- portion that starts at start_time and ends at midnight
start_to_midnight AS (
SELECT
id,
start_time,
start_date,
TIME(23,59,59) as end_time,
start_date as end_date
FROM
table
WHERE end_date > start_date
),
-- portion that starts at midnight and ends at end_time
midnight_to_end AS (
SELECT
id,
TIME(0,0,0) as start_time,
end_date as start_date,
end_time,
end_date
FROM
table
WHERE
end_date > start_date
-- Avoid rows that starts from 0:0:0 and ends to 0:0:0 (original row ends at 0:0:0)
AND end_time != TIME(0,0,0)
)
-- Union of the 3 tables
SELECT
id,
start_date,
start_time,
end_time
FROM (
SELECT id, start_time, end_time, start_date FROM table WHERE start_date = end_date
UNION ALL
SELECT id, start_time, end_time, start_date FROM start_to_midnight
UNION ALL
SELECT id, start_time, end_time, start_date FROM midnight_to_end
)
ORDER BY id,start_date,start_time
Here is the resulting output
+-----+-----+------------+------------+----------+---+
| Row | id | start_date | start_time | end_time | |
+-----+-----+------------+------------+----------+---+
| 1 | bap | 2018-03-03 | 19:00:00 | 23:59:59 | |
| 2 | bap | 2018-03-04 | 00:00:00 | 03:00:00 | |
| 3 | bar | 2018-03-03 | 23:00:00 | 23:59:59 | |
| 4 | foo | 2018-03-03 | 10:00:00 | 13:00:00 | |
| 5 | qux | 2018-03-03 | 23:15:00 | 23:59:59 | |
| 6 | qux | 2018-03-04 | 00:00:00 | 01:45:00 | |
| 7 | quz | 2018-03-03 | 00:00:00 | 04:30:00 | |
+-----+-----+------------+------------+----------+---+
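The split-at-midnight bookkeeping from both answers can be expressed compactly in Python (a sketch using timedelta arithmetic; unlike the second answer it keeps exact midnight as the cut point rather than 23:59:59, and like the question it does not handle jumps over multiple days):

```python
from datetime import datetime, timedelta, date, time

def split_at_midnight(day, start, hours):
    """Yield (date, start_time, end_time, hours) pieces, splitting once at midnight."""
    begin = datetime.combine(day, start)
    end = begin + timedelta(hours=hours)
    next_midnight = datetime.combine(day + timedelta(days=1), time(0, 0))
    if end <= next_midnight:
        # fits inside a single day (or ends exactly at midnight)
        yield (day, start, end.time(), hours)
    else:
        first = (next_midnight - begin).total_seconds() / 3600
        yield (day, start, time(0, 0), first)
        yield (end.date(), time(0, 0), end.time(), hours - first)

for row in split_at_midnight(date(2018, 3, 2), time(19, 0), 8):
    print(row)
```

For "foo" (19:00 + 8 hours) this yields the 5-hour and 3-hour pieces from the desired table, and fractional inputs like "quz" (23:15 + 2.5 hours) come out as 0.75 and 1.75 hours.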

Transform log table into another table with values spread over every month

I have a table with rows consisting of values and timestamps. A row is inserted whenever the value has changed, and the timestamp indicates when the change has happened. For example, something like this:
id | value | timestamp
-----+-------+----------------------------
1 | 736 | 2014-03-18 16:38:22.20548
2 | 531 | 2014-06-18 16:38:22.664324
3 | 24 | 2014-07-18 16:38:22.980137
4 | 530 | 2014-09-22 10:01:36.13856
5 | 529 | 2014-09-23 10:01:27.202026
I need a query in Postgresql which generates a table with one row for every month of the year. Every row has a timestamp (first day of the month) and a value. The value is the last value of the first table which was inserted before the beginning of the given month. The value should be 0 if there are no matching rows in the first table. Something like this:
id | value | timestamp
-----+-------+----------------------------
1 | 0 | 2014-01-01 00:00:00.000000
2 | 0 | 2014-02-01 00:00:00.000000
3 | 0 | 2014-03-01 00:00:00.000000
4 | 736 | 2014-04-01 00:00:00.000000
5 | 736 | 2014-05-01 00:00:00.000000
6 | 736 | 2014-06-01 00:00:00.000000
7 | 531 | 2014-07-01 00:00:00.000000
8 | 24 | 2014-08-01 00:00:00.000000
9 | 24 | 2014-09-01 00:00:00.000000
10 | 529 | 2014-10-01 00:00:00.000000
11 | 529 | 2014-11-01 00:00:00.000000
12 | 529 | 2014-12-01 00:00:00.000000
I tried for a while, but I didn't manage to get the full result. I guess I need to generate a list of months like this.
SELECT
*
FROM
generate_series('2014-01-01 00:00'::timestamp, now(), '1 month') AS months
And then do something like this to get the last occurrency before a month:
SELECT
*
FROM first_table
WHERE timestamp < --current_month_selection--
ORDER BY timestamp desc
LIMIT 1;
I guess one needs an OUTER JOIN and a CASE conditional...
Unfortunately I didn't manage to put it all together. Can somebody help me?
Actually, I think I solved this by myself, it was quite trivial:
SELECT
month,
COALESCE((SELECT value
FROM first_table
WHERE timestamp < month
ORDER BY timestamp DESC
LIMIT 1),0)
FROM
generate_series('2014-01-01 00:00'::timestamp, now(), '1 month') AS month
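The correlated subquery is an "as of" lookup: for each month start, take the last value changed strictly before it, defaulting to 0. The same logic in Python (sample rows hardcoded from the question, kept ordered by timestamp):

```python
from datetime import datetime

changes = [  # (value, timestamp) ordered by timestamp
    (736, datetime(2014, 3, 18, 16, 38, 22)),
    (531, datetime(2014, 6, 18, 16, 38, 22)),
    (24,  datetime(2014, 7, 18, 16, 38, 22)),
    (530, datetime(2014, 9, 22, 10, 1, 36)),
    (529, datetime(2014, 9, 23, 10, 1, 27)),
]

def value_at(month_start):
    # last change strictly before the month start, else 0
    before = [v for v, ts in changes if ts < month_start]
    return before[-1] if before else 0

for m in range(1, 13):
    print(datetime(2014, m, 1), value_at(datetime(2014, m, 1)))
```

This reproduces the expected table: months before the first change give 0, April through June give 736, and October onwards give 529.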