Summing counts based on overlapping intervals in postgres - sql

I want to sum the column for every two minute interval (so it would be the sum of 1,2 and 2,3 and 3,4, etc...), but I'm not exactly sure how to go about doing that.
My data looks something like:
minute | source | count
2018-01-01 10:00 | a | 7
2018-01-01 10:01 | a | 5
2018-01-01 10:02 | a | 10
2018-01-01 10:00 | b | 20
2018-01-01 10:05 | a | 12
What I want
(e.g. row1+row2, row2+3, row3, row4, row5)
minute | source | count
2018-01-01 10:00 | a | 12
2018-01-01 10:01 | a | 15
2018-01-01 10:02 | a | 10
2018-01-01 10:00 | b | 20
2018-01-01 10:05 | a | 12

You can use a correlated subquery that selects the sum of the counts for the records in the interval that share the source. (I assume that matching the source is a requirement. If it is not, just remove that comparison from the WHERE clause.)
SELECT "t1"."minute",
"t1"."source",
(SELECT sum("t2"."count")
FROM "elbat" "t2"
WHERE "t2"."source" = "t1"."source"
AND "t2"."minute" >= "t1"."minute"
AND "t2"."minute" <= "t1"."minute" + INTERVAL '1 MINUTE') "count"
FROM "elbat" "t1";
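To check the correlated subquery against the sample data without a PostgreSQL server, here is a minimal sketch that runs the same logic in SQLite through Python's sqlite3 module. The table name elbat comes from the answer above; the switch to SQLite, the strftime call standing in for + INTERVAL '1 MINUTE', and the column alias total are assumptions made only to get a runnable example.

```python
import sqlite3

# Sample rows from the question: (minute, source, count).
rows = [
    ("2018-01-01 10:00", "a", 7),
    ("2018-01-01 10:01", "a", 5),
    ("2018-01-01 10:02", "a", 10),
    ("2018-01-01 10:00", "b", 20),
    ("2018-01-01 10:05", "a", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE elbat (minute TEXT, source TEXT, "count" INTEGER)')
conn.executemany("INSERT INTO elbat VALUES (?, ?, ?)", rows)

# For each row, sum the counts of the same source whose minute lies in
# [minute, minute + 1 minute]. strftime(...) replaces the PostgreSQL
# interval arithmetic (assumption: timestamps are stored as 'YYYY-MM-DD HH:MM').
query = """
SELECT t1.minute,
       t1.source,
       (SELECT sum(t2."count")
          FROM elbat t2
         WHERE t2.source = t1.source
           AND t2.minute >= t1.minute
           AND t2.minute <= strftime('%Y-%m-%d %H:%M', t1.minute, '+1 minute')
       ) AS total
  FROM elbat t1
 ORDER BY t1.source, t1.minute
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The totals match the desired output in the question (12, 15, 10 and 12 for source a, 20 for source b), only ordered by source and minute.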

The answer above assumes all the timestamps are truncated to the minute. If you want to check every 2 minutes throughout the day, you can use the generate_series function. Because BETWEEN includes both the begin time and the end time of each interval, a row that falls exactly on a boundary is counted in two intervals; with the sample data, source b ends up with 2 rows in the results.
For example:
select begintime,
endtime,
source,
sum(count)
from mytable
inner join (
select begintime, endtime
from (
select lag(time, 1) over (order by time) as begintime,
time as endtime
from (
select *
from generate_series('2018-01-01 00:00:00', '2018-01-02 00:00:00', interval '2 minutes') time
) q
) q2
where begintime is not null
) times on minute between begintime and endtime
group by begintime, endtime, source
order by begintime, endtime, source
You can change 'minute between begintime and endtime' to 'minute > begintime and minute <= endtime' if you don't want that overlap.
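The effect of the two join conditions can be seen on the question's sample data. This is a rough sketch in SQLite via Python's sqlite3 module, not the PostgreSQL query itself: a recursive CTE stands in for generate_series, and the 09:58 to 10:06 range is chosen only to keep the output short. Both are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (minute TEXT, source TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2018-01-01 10:00", "a", 7),
        ("2018-01-01 10:01", "a", 5),
        ("2018-01-01 10:02", "a", 10),
        ("2018-01-01 10:00", "b", 20),
        ("2018-01-01 10:05", "a", 12),
    ],
)

# Recursive CTE standing in for generate_series: consecutive 2-minute buckets.
BUCKETS = """
WITH RECURSIVE times(begintime, endtime) AS (
    SELECT '2018-01-01 09:58', '2018-01-01 10:00'
    UNION ALL
    SELECT endtime, strftime('%Y-%m-%d %H:%M', endtime, '+2 minutes')
      FROM times
     WHERE endtime < '2018-01-01 10:06'
)
SELECT begintime, endtime, source, sum("count")
  FROM mytable
  JOIN times ON {condition}
 GROUP BY begintime, endtime, source
 ORDER BY begintime, endtime, source
"""

# Closed intervals: a row on a bucket boundary joins two buckets.
closed = conn.execute(
    BUCKETS.format(condition="minute BETWEEN begintime AND endtime")
).fetchall()

# Half-open intervals (begintime, endtime]: every row joins exactly one bucket.
half_open = conn.execute(
    BUCKETS.format(condition="minute > begintime AND minute <= endtime")
).fetchall()

print("closed:   ", closed)
print("half-open:", half_open)
```

With the closed condition, source b appears in both the 09:58 to 10:00 and the 10:00 to 10:02 bucket; with the half-open condition it appears once, and the bucket sums add up to the total of the raw counts.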

Related

SQLite: Sum of differences between two dates group by every date

I have a SQLite database with start and stop datetimes
With the following SQL query I get the difference hours between start and stop:
SELECT starttime, stoptime, cast((strftime('%s',stoptime)-strftime('%s',starttime)) AS real)/60/60 AS diffHours FROM tracktime;
I need a SQL query, which delivers the sum of multiple timestamps, grouped by every day (also whole dates between timestamps).
The result should be something like this:
2018-08-01: 12 hours
2018-08-02: 24 hours
2018-08-03: 12 hours
2018-08-04: 0 hours
2018-08-05: 1 hours
2018-08-06: 14 hours
2018-08-07: 8 hours
You can try this: use a recursive CTE to build a calendar table holding the start and end time of every date covered by each range, then compute the hours per date.
Schema (SQLite v3.18)
CREATE TABLE tracktime(
id int,
starttime timestamp,
stoptime timestamp
);
insert into tracktime values
(11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
insert into tracktime values
(12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
Query #1
WITH RECURSIVE cte AS (
select id,starttime,date(starttime,'+1 day') totime,stoptime
from tracktime
UNION ALL
SELECT id,
date(starttime,'+1 day'),
date(totime,'+1 day'),
stoptime
FROM cte
WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime),(strftime('%s',CASE
WHEN totime > stoptime THEN stoptime
ELSE totime
END) -strftime('%s',starttime))/3600 diffHour
FROM cte;
| strftime('%Y-%m-%d', starttime) | diffHour |
| ------------------------------- | -------- |
| 2018-08-01 | 12 |
| 2018-09-05 | 1 |
| 2018-08-02 | 24 |
| 2018-08-03 | 12 |
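The query above returns one row per range and day but does not yet add up several ranges that touch the same day, which the question asks for. A possible extension, sketched here with Python's sqlite3 module, wraps the same recursive CTE in a GROUP BY. The row with id 13 is a hypothetical addition to the sample data so that one day receives two contributions, and dividing by 3600.0 (to keep fractional hours) is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracktime (id INT, starttime TIMESTAMP, stoptime TIMESTAMP)")
conn.executemany(
    "INSERT INTO tracktime VALUES (?, ?, ?)",
    [
        (11, "2018-08-01 12:00:00", "2018-08-03 12:00:00"),
        (12, "2018-09-05 18:00:00", "2018-09-05 19:00:00"),
        # Hypothetical extra row, not part of the original sample data.
        (13, "2018-08-01 08:00:00", "2018-08-01 10:00:00"),
    ],
)

# Same recursive CTE as in the answer: one row per (range, day), where
# totime is the following midnight. The outer query clips each slice at
# stoptime and sums the hours per day.
query = """
WITH RECURSIVE cte AS (
    SELECT id, starttime, date(starttime, '+1 day') AS totime, stoptime
      FROM tracktime
    UNION ALL
    SELECT id, date(starttime, '+1 day'), date(totime, '+1 day'), stoptime
      FROM cte
     WHERE date(starttime, '+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime) AS day,
       sum((strftime('%s', CASE WHEN totime > stoptime THEN stoptime ELSE totime END)
            - strftime('%s', starttime)) / 3600.0) AS hours
  FROM cte
 GROUP BY day
 ORDER BY day
"""
result = conn.execute(query).fetchall()
for day, hours in result:
    print(day, hours)
```

Days on which nothing was tracked do not appear in this result; listing them with 0 hours would need a separate calendar of all dates to join against.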

how to get data when using where clause of time stamp in select query

I'm trying to get data with a SELECT query that has a timestamp in the WHERE clause.
My data table looks like this:
| startTime | endTime | TimeID
----------------------------------
| 07:00:00 | 15:00:00 | 1
| 15:00:00 | 23:00:00 | 2
| 23:00:00 | 07:00:00 | 3
---------------------------------
this is my query statement:
SELECT TimeID
FROM myTable
WHERE StartTime >= 'Current_TIME' AND EndTime < 'Current_TIME'
If the time is somewhere between 07:00 and 23:00, I get the correct answer; otherwise I don't.
For example:
If the current time is 02:00:00, the first condition is false because 02 is not larger than 23, while the second condition holds because 02 is smaller than 07.
I tried using BETWEEN, CASE WHEN and ISNULL, but the query always returns an empty result in the scenario above.
This will work, but it is not the best performance-wise:
SELECT TimeID
FROM myTable
WHERE
(
(StartTime <= 'Current_TIME' AND EndTime > 'Current_TIME') or
(StartTime <= 'Current_TIME' AND StartTime > EndTime) or
(EndTime > 'Current_TIME' AND StartTime > EndTime)
)
It would probably be better to split the row that spans midnight into 2 separate rows, if possible.
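The three conditions can be exercised against the shift table from the question. This is only a sketch: it uses SQLite through Python's sqlite3 module, stores the times as 'HH:MM:SS' text, and passes the current time as a bound parameter named :now instead of the quoted 'Current_TIME' placeholder. The helper shift_for is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (StartTime TEXT, EndTime TEXT, TimeID INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("07:00:00", "15:00:00", 1),
        ("15:00:00", "23:00:00", 2),
        ("23:00:00", "07:00:00", 3),  # this shift spans midnight
    ],
)

# First condition: ordinary shift containing :now.
# Second and third: shift that wraps past midnight (StartTime > EndTime),
# matched either before midnight or after it.
QUERY = """
SELECT TimeID
  FROM myTable
 WHERE (StartTime <= :now AND EndTime > :now)
    OR (StartTime <= :now AND StartTime > EndTime)
    OR (EndTime > :now AND StartTime > EndTime)
"""

def shift_for(now):
    """Return the TimeID of the shift containing `now` ('HH:MM:SS'), or None."""
    row = conn.execute(QUERY, {"now": now}).fetchone()
    return row[0] if row else None

for t in ("02:00:00", "07:00:00", "15:00:00", "23:30:00"):
    print(t, shift_for(t))
```

Zero-padded 'HH:MM:SS' strings sort in chronological order, which is why plain text comparison is enough here.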

How to scale timestamp from one range to another range?

I've got table with timestamps of item changes:
ID: INT
item: INT - foreign key
created: TIMESTAMP
I need for certain item to squeeze created timestamps to some range. Ie.:
src range: 1.10.2015 00:00 - 4.10.2015 23:59
dst range: 1.10.2015 00:00 - 1.10.2015 23:59
so, sample TIMESTAMP would look like:
2.10.2015 00:00 -> 1.10.2015 6:00
3.10.2015 00:00 -> 1.10.2015 12:00
4.10.2015 00:00 -> 1.10.2015 18:00
4.10.2015 12:00 -> 1.10.2015 21:00
I need to keep year, month, day, hour, minute, seconds part of resulting timestamp. Must not round seconds/minutes to zeroes.
Scale precision doesn't matter much, but order of changes by created timestamp must be kept same as it was before scaling. I don't care much about other scaling details: whether it's start/final time, inclusive/exclusive, or whether it scales to range of 6:00/5:59 hours.
I can do it with some external application that would transform it on its own, then updates timestamps. However, I now need to do it using SQL only. Is it possible? It may be postgres specific.
You may assume, there won't be collision after scaling applies.
something like this?
select id, item, (' ['||created||', '||created_l- '00:00.000001'::time||') ' )::tsrange as some_range from (
select id, item, created , lead(created) over(order by created) created_l from my_table order by created) q1
with r (src, dst) as ( values
(
tsrange('2015-10-01', '2015-10-05', '[)'),
tsrange('2015-10-01', '2015-10-02', '[)')
),
(
tsrange('2015-10-06', '2015-10-10', '[)'),
tsrange('2015-10-02', '2015-10-03', '[)')
)
), t (id, src) as ( values
(1,'2015-10-02 00:00'::timestamp),
(2,'2015-10-03 00:00'),
(3,'2015-10-04 00:00'),
(4,'2015-10-04 12:00'),
(5,'2015-10-06 11:00'),
(6,'2015-10-07 15:00')
)
select
t.id, t.src,
lower(r.dst) + rank() over(
partition by r.dst order by t.src
) * interval '1 second' as squeezed
from
t
inner join
r on t.src <@ r.src
;
id | src | squeezed
----+---------------------+---------------------
1 | 2015-10-02 00:00:00 | 2015-10-01 00:00:01
2 | 2015-10-03 00:00:00 | 2015-10-01 00:00:02
3 | 2015-10-04 00:00:00 | 2015-10-01 00:00:03
4 | 2015-10-04 12:00:00 | 2015-10-01 00:00:04
5 | 2015-10-06 11:00:00 | 2015-10-02 00:00:01
6 | 2015-10-07 15:00:00 | 2015-10-02 00:00:02
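Note that the query above preserves the order by ranking the rows and spacing them one second apart; it does not reproduce the proportional mapping shown in the question (2.10.2015 00:00 -> 1.10.2015 6:00 and so on). For comparison, the proportional mapping is plain linear interpolation. The sketch below shows the arithmetic in Python rather than SQL; the function name squeeze and its parameters are hypothetical.

```python
from datetime import datetime

def squeeze(ts, src_start, src_end, dst_start, dst_end):
    """Map ts from [src_start, src_end) onto [dst_start, dst_end) linearly."""
    # Ratio of the two range lengths; timedelta / timedelta yields a float.
    ratio = (dst_end - dst_start) / (src_end - src_start)
    # Offset inside the source range, scaled down and re-anchored.
    return dst_start + (ts - src_start) * ratio

src = (datetime(2015, 10, 1), datetime(2015, 10, 5))
dst = (datetime(2015, 10, 1), datetime(2015, 10, 2))

for ts in (
    datetime(2015, 10, 2, 0, 0),
    datetime(2015, 10, 3, 0, 0),
    datetime(2015, 10, 4, 0, 0),
    datetime(2015, 10, 4, 12, 0),
):
    print(ts, "->", squeeze(ts, *src, *dst))
```

The mapping never reorders timestamps, but two source values that lie very close together can land on the same microsecond after scaling, so the question's assumption that there are no collisions still matters.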

Filling Out & Filtering Irregular Time Series Data

Using Postgresql 9.4, I am trying to craft a query on time series log data that logs new values whenever the value updates (not on a schedule). The log can update anywhere from several times a minute to once a day.
I need the query to accomplish the following:
Filter too much data by just selecting the first entry for the timestamp range
Fill in sparse data by using the last reading for the log value. For example, if I am grouping the data by hour and there was an entry at 8am with a log value of 10. Then the next entry isn't until 11am with a log value of 15, I would want the query to return something like this:
Timestamp | Value
2015-07-01 08:00 | 10
2015-07-01 09:00 | 10
2015-07-01 10:00 | 10
2015-07-01 11:00 | 15
I have got a query that accomplishes the first of these goals:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
)
select
time_range.hour,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;
But I can't figure out how to fill in the nulls where there is no value. I tried using the lag() feature of Postgresql's Window functions, but it didn't work when there were multiple nulls in a row.
Here's a SQLFiddle that demonstrates the issue:
http://sqlfiddle.com/#!15/f4d13/5/0
Your columns are log_hour and first_value.
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
),
base as (
select
time_range.hour lh,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1)
SELECT
log_hour, log_val, value_partition, first_value(log_val) over (partition by value_partition order by log_hour)
FROM (
SELECT
date_trunc('hour', base.lh) as log_hour,
log_val,
sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
FROM base) as q
UPDATE
This is what your query returns:
Timestamp | Value
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null
2015-07-01 04:00 | 15
2015-07-01 05:00 | null
2015-07-01 06:00 | 19
2015-07-01 08:00 | 13
I want this result set to be split into groups like this, where every non-null value starts a new group:
2015-07-01 01:00 | 10 <-- group 1 starts
2015-07-01 02:00 | null
2015-07-01 03:00 | null
2015-07-01 04:00 | 15 <-- group 2 starts
2015-07-01 05:00 | null
2015-07-01 06:00 | 19 <-- group 3 starts
2015-07-01 08:00 | 13 <-- group 4 starts
and to assign to every row in a group the value of the first row of that group (done by the last select).
In this case, one way to obtain the grouping is to create a column that holds the number of
non-null values counted up to the current row and to partition by this value (the sum(case) expression).
value | sum(case)
| 10 | 1 |
| null | 1 |
| null | 1 |
| 15 | 2 | <-- new not null, increment
| null | 2 |
| 19 | 3 | <-- new not null, increment
| 13 | 4 | <-- new not null, increment
and now I can partition by sum(case)
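The grouping trick is independent of SQL, so it can be traced in a few lines of Python. This is only an illustration of the idea on the values from the table above, not a replacement for the query; the variable names are made up for the example.

```python
from itertools import accumulate

# Values in time order, with None standing in for SQL NULL.
values = [10, None, None, 15, None, 19, 13]

# Running count of non-null values: the equivalent of
# sum(case when log_val is null then 0 else 1 end) over (order by ...).
partition = list(accumulate(0 if v is None else 1 for v in values))

# First value of each partition: the equivalent of
# first_value(log_val) over (partition by value_partition order by ...).
first_in_group = {}
for group, value in zip(partition, values):
    if value is not None:
        first_in_group.setdefault(group, value)

# Rows before the first non-null value fall into group 0 and stay None.
filled = [first_in_group.get(group) for group in partition]

print(partition)
print(filled)
```

Each null inherits the group number of the last non-null value before it, so taking the first value of the group is the same as carrying the last known reading forward.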

T-SQL Query for counting ongoing events per given intervals

We have events with a date range:
Event | Begin | End
------|-------|------
a | 11:30 | 12:15
b | 10:30 | 13:15
c | 11:30 | 13:30
Visualized as a timetable:
a)          |---|
b)    |---------------|
c)          |-----------|
   |-----|-----|-----|-----|
 10:00 11:00 12:00 13:00 14:00
We want an efficient query for counting ongoing events on given timestamps. In this example we want them per hour. Like this:
Time | OnGoing
-----------------|--------
2014-02-06 10:00 | 0
2014-02-06 11:00 | 1
2014-02-06 12:00 | 3
2014-02-06 13:00 | 2
2014-02-06 14:00 | 0
You can create a driver table with all hours, then join to that:
;WITH cal AS (SELECT CAST('10:00:00' AS TIME) dt
UNION ALL
SELECT DATEADD(hour,1,dt)
FROM cal
WHERE dt < '14:00:00')
SELECT dt, COUNT(DISTINCT Event) OnGoing
FROM cal a
LEFT JOIN Table1 b
ON a.dt BETWEEN b.[Begin] AND b.[End]
GROUP BY dt
Adjust the range in the cal cte to fit your preferences. I notice your sample output shows a datetime, so you could cast begin and end as TIME in your join and add a DATE portion to your select and group by, or you could alter the cte to be full datetime. If spanning more than 100 units in your cte, you'll need to add OPTION (MAXRECURSION 0) to the very end of your query.
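The driver-table approach can be tried on the sample events without SQL Server. The following sketch uses SQLite through Python's sqlite3 module, so DATEADD is replaced by strftime with a '+1 hour' modifier and the times are stored as 'HH:MM' text; both substitutions are assumptions made for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 (Event TEXT, "Begin" TEXT, "End" TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("a", "11:30", "12:15"), ("b", "10:30", "13:15"), ("c", "11:30", "13:30")],
)

# cal is the driver table: one row per hour from 10:00 to 14:00.
# The LEFT JOIN keeps hours without any ongoing event, and COUNT ignores
# the NULL Event those rows carry, so they report 0.
query = """
WITH RECURSIVE cal(dt) AS (
    SELECT '10:00'
    UNION ALL
    SELECT strftime('%H:%M', dt, '+1 hour')
      FROM cal
     WHERE dt < '14:00'
)
SELECT cal.dt, count(DISTINCT b.Event) AS OnGoing
  FROM cal
  LEFT JOIN Table1 b ON cal.dt BETWEEN b."Begin" AND b."End"
 GROUP BY cal.dt
 ORDER BY cal.dt
"""
result = conn.execute(query).fetchall()
for dt, ongoing in result:
    print(dt, ongoing)
```

The counts agree with the expected output in the question: 0, 1, 3, 2 and 0 for 10:00 through 14:00.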