Querying for a 'run' of consecutive columns in Postgres


I have a table:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
I would like to construct a statement that given an event could return the length of the 'run' of events starting with that event. A run is defined by:
Two events are in a run together if they are within 30 seconds of one another.
If A and B are in a run together, and B and C are in a run together, then A is in a run with C.
However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.
Any ideas? I'm stumped.

Here is a recursive CTE solution. (Gaps-and-islands problems lend themselves naturally to recursive CTEs.)
WITH RECURSIVE runrun AS (
SELECT event_id, event_time
, event_time - ('30 sec'::interval) AS low_time
, event_time + ('30 sec'::interval) AS high_time
FROM table1
UNION
SELECT t1.event_id, t1.event_time
, LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
, GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
FROM table1 t1
JOIN runrun rr ON t1.event_time >= rr.low_time
AND t1.event_time < rr.high_time
)
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
;
Result:
event_id | event_time | low_time | high_time
----------+---------------------+---------------------+---------------------
2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)

Could look like this:
WITH x AS (
SELECT event_time
,row_number() OVER w AS rn
,lead(event_time) OVER w AS next_time
FROM table1
WHERE event_id >= <start_id>
WINDOW w AS (ORDER BY event_time, event_id)
)
SELECT COALESCE(
(SELECT x.rn
FROM x
WHERE (x.event_time + interval '30s') < x.next_time
ORDER BY x.rn
LIMIT 1)
,(SELECT count(*) FROM x)
) AS run_length
This version does not rely on a gap-less sequence of IDs, but on event_time only.
Rows with identical event_time values are additionally sorted by event_id to make the order unambiguous.
Read about the window functions row_number() and lead() and CTE (With clause) in the manual.
Edit
If we cannot assume that a bigger event_id has a later (or equal) event_time, substitute this for the first WHERE clause:
WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = <start_id>)
Rows with the same event_time as the starting row but a smaller event_id will still be ignored.
In the special case where the run continues to the last row, no gap is found and the first subquery returns no row. COALESCE then returns the count of all rows instead.
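To make the forward-only walk concrete, here is a minimal Python sketch of the same idea, using the sample rows from the question. The helper name `run_length` and the `EVENTS` list are made up for illustration; this is not a translation of the SQL above, only a trace of its logic (sort by time, then id, and stop at the first gap longer than 30 seconds).

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time).
EVENTS = [
    (1, "2011-01-01 00:00:00"),
    (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"),
    (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"),
    (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"),
    (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"),
    (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]


def run_length(events, start_id, max_gap=timedelta(seconds=30)):
    """Count the events in the run that starts at start_id (hypothetical helper)."""
    # Order by time, then id, like the WINDOW clause in the query.
    rows = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), eid) for eid, ts in events
    )
    # Find the starting event and only look forward from there.
    start = next(i for i, (_, eid) in enumerate(rows) if eid == start_id)
    length = 1
    for (prev_time, _), (cur_time, _) in zip(rows[start:], rows[start + 1:]):
        if cur_time - prev_time > max_gap:
            break  # the first gap longer than max_gap ends the run
        length += 1
    return length


print(run_length(EVENTS, 2))
```

If the loop never hits a gap, the count simply runs to the last row, which corresponds to the COALESCE fallback in the query.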

You can join a table to itself on a date-difference condition. Since this is Postgres, a simple minus works.
This subquery finds all records that are a 'start event', that is, all event records that do not have another event record occurring within 30 seconds before them:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) startevent
With a few changes...same logic, except picking up an 'end' event:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
Now we can join these together to associate which start event goes to which end event:
(still writing...there's a couple ways at going on this. I'm assuming only the example has linear ID numbers, so you'll want to join the start event time to the end event time having the smallest positive difference on the event times).
Here's my end result...kinda nested a lot of subselects
select a.start_id, case when a.event_id is null then t1.event_id::varchar else 'single event' end as end_id
from
(select start_event.event_id as start_id, start_event.event_time as start_time, last_event.event_id, min(end_event.event_time - start_event.event_time) as min_interval
from
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) start_event
inner join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
on end_event.event_time > start_event.event_time
--check for only event
left join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) last_event
on start_event.event_id = last_event.event_id
group by 1,2,3) a
left join table1 t1 on t1.event_time = a.start_time + a.min_interval
Results as start_id, end_Id:
1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"
I had to use a third left join to pick out single events as a method of detecting events that were both start events and end events. End result is in ID's and can be linked back to your original table if you want different information than just the ID. Unsure how this solution will scale, if you've got millions of events...could be an issue.

Related

SQL - Combine two rows if difference is below threshhold

I have a table like this in SQL Server:
id start_time end_time
1 10:00:00 10:34:00
2 10:38:00 10:52:00
3 10:53:00 11:23:00
4 11:24:00 11:56:00
5 14:20:00 14:40:00
6 14:41:00 14:59:00
7 15:30:00 15:40:00
What I would like to have is a query that outputs consolidated records based on the time difference between two consecutive records (end_time of row n and start_time row n+1) . All records where the time difference is less than 2 minutes should be combined into one time entry and the ID of the first record should be kept. This should also combine more than two records if multiple consecutive records have a time difference less than 2 minutes.
This would be the expected output:
id start_time end_time
1 10:00:00 10:34:00
2 10:38:00 11:56:00
5 14:20:00 14:59:00
7 15:30:00 15:40:00
Thanks in advance for any tips how to build the query.
Edit:
I started with following code to calculate the lead_time and the time difference but do not know how to group and consolidate.
WITH rows AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS rn
FROM #temp
)
SELECT mc.id, mc.start_time, mc.end_time, mp.start_time lead_time, DATEDIFF(MINUTE, mc.[end_time], mp.[start_time]) as DiffToNewSession
FROM rows mc
LEFT JOIN rows mp
ON mc.rn = mp.rn - 1

Window functions in T-SQL can handle this kind of calculation, for example:
create table #temp(id int identity(1,1), start_time time, end_time time)
insert into #temp(start_time, end_time)
values ('10:00:00', '10:34:00')
, ('10:38:00', '10:52:00')
, ('10:53:00', '11:23:00')
, ('11:24:00', '11:56:00')
, ('14:20:00', '14:40:00')
, ('14:41:00', '14:59:00')
, ('15:30:00', '15:40:00')
;with c0 as(
select *, LAG(end_time,1,'00:00:00') over (order by id) as lag_time
from #temp
), c1 as(
select *, case when DATEDIFF(MI, lag_time, start_time) <= 2 then 1 else 0 end as gflag
from c0
), c2 as(
select *, SUM(case when gflag=0 then 1 else 0 end) over(order by id) as gid
from c1
)
select MIN(id) as id, MIN(start_time) as start_time, MAX(end_time) as end_time
from c2
group by gid
To show how the data is built up step by step, I used c0, c1, c2... as separate levels; you can merge some of them and optimize.
If you can't use id as the sort key, change the ORDER BY parts of the statement above accordingly.
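The flag-then-group logic above can be traced with a small Python sketch on the question's sample rows. The names `merge_sessions` and `ROWS` are invented for illustration. One assumption to note: the sketch compares exact durations, whereas DATEDIFF(MI, ...) counts minute boundaries crossed, so results can differ for gaps close to the threshold.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, start_time, end_time).
ROWS = [
    (1, "10:00:00", "10:34:00"),
    (2, "10:38:00", "10:52:00"),
    (3, "10:53:00", "11:23:00"),
    (4, "11:24:00", "11:56:00"),
    (5, "14:20:00", "14:40:00"),
    (6, "14:41:00", "14:59:00"),
    (7, "15:30:00", "15:40:00"),
]


def parse(value):
    return datetime.strptime(value, "%H:%M:%S")


def merge_sessions(rows, max_gap=timedelta(minutes=2)):
    """Merge consecutive rows whose gap to the previous row is at most max_gap."""
    merged = []
    prev_end = None
    for row_id, start, end in sorted(rows):  # ordered by id, like the query
        if prev_end is not None and parse(start) - parse(prev_end) <= max_gap:
            # Same group: keep the first id and start, extend the end time.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            # Gap too large (or first row): open a new group.
            merged.append([row_id, start, end])
        prev_end = end
    return [tuple(group) for group in merged]


for group in merge_sessions(ROWS):
    print(group)
```

Comparing the end times as strings works here only because they are zero-padded HH:MM:SS values.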

You can use a recursive CTE to get the result you want. This method simply compares the current end_time with the next start_time. If the difference is within the 2-minute threshold, the row keeps the same grp_start as the previous row. At the end, simply GROUP BY grp_start.
with rcte as
(
-- anchor member
select *, grp_start = start_time
from tbl
where id = 1
union all
-- recursive member
select t.id, t.start_time, t.end_time,
grp_start = case when datediff(second, r.end_time, t.start_time) <= 120
then r.grp_start
else t.start_time
end
from tbl t
inner join rcte r on t.id = r.id + 1
)
select id = min(id), grp_start as start_time, max(end_time) as end_time
from rcte
group by grp_start

I think this should do the trick without recursion. Again, I used several CTEs to make the solution a bit easier to read; it can probably be reduced a little.
INSERT INTO T1 VALUES
(1,'10:00:00','10:34:00')
,(2,'10:38:00','10:52:00')
,(3,'10:53:00','11:23:00')
,(4,'11:24:00','11:56:00')
,(5,'14:20:00','14:40:00')
,(6,'14:41:00','14:59:00')
,(7,'15:30:00','15:40:00')
GO
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (ORDER BY id) AS rn
,DATEDIFF(MINUTE, ISNULL(LAG(endtime) OVER (ORDER BY id), starttime), starttime) AS diffMin
,COUNT(*) OVER (PARTITION BY (SELECT 1)) as maxRn
FROM T1
),
cteFirst AS(
SELECT *
FROM cte
WHERE rn = 1 OR diffMin > 2
),
cteGrp AS(
SELECT *
,ISNULL(LEAD(rn) OVER (ORDER BY id), maxRn+1) AS nextRn
FROM cteFirst
)
SELECT f.id, f.starttime, MAX(ISNULL(n.endtime, f.endtime)) AS endtime
FROM cteGrp f
LEFT JOIN cte n ON n.rn >= f.rn AND n.rn < f.nextRn
GROUP BY f.id, f.starttime

Count overlapping intervals by ID BigQuery

I want to count how many overlapping intervals I have per ID.
WITH table AS (
SELECT 1001 as id, 1 AS start_time, 10 AS end_time UNION ALL
SELECT 1001, 2, 5 UNION ALL
SELECT 1002, 3, 4 UNION ALL
SELECT 1003, 5, 8 UNION ALL
SELECT 1003, 6, 8 UNION ALL
SELECT 1001, 6, 20
)
In this case the desired result should be:
2 overlapping for ID=1001
1 overlapping for ID=1003
0 overlapping for ID=1002
TOT OVERLAPPING = 3
Whenever there is an overlap (even a partial one) I need to count it as such.
How can I achieve this in BigQuery?

Below is a version for BigQuery Standard SQL. It is a straightforward self-join that checks for and counts overlaps:
#standardSQL
SELECT a.id,
COUNTIF(
a.start_time BETWEEN b.start_time AND b.end_time
OR a.end_time BETWEEN b.start_time AND b.end_time
OR b.start_time BETWEEN a.start_time AND a.end_time
OR b.end_time BETWEEN a.start_time AND a.end_time
) overlaps
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.id = b.id AND TO_JSON_STRING(a) < TO_JSON_STRING(b)
GROUP BY id
Applied to the sample data in your question, the result is:
Row id overlaps
1 1001 2
2 1002 0
3 1003 1
Another option (avoiding the self-join in favor of analytic functions):
#standardSQL
SELECT id,
SUM((SELECT COUNT(1) FROM y.arr x
WHERE y.start_time BETWEEN x.start_time AND x.end_time
OR y.end_time BETWEEN x.start_time AND x.end_time
OR x.start_time BETWEEN y.start_time AND y.end_time
OR x.end_time BETWEEN y.start_time AND y.end_time
)) overlaps
FROM (
SELECT id, start_time, end_time,
ARRAY_AGG(STRUCT(start_time, end_time))
OVER(PARTITION BY id ORDER BY TO_JSON_STRING(t)
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) arr
FROM `project.dataset.table` t
) y
GROUP BY id
This gives the same result as the previous version.

The logic for all overlaps compares the start and end times:
SELECT t1.id,
COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM `project.dataset.table` t1 LEFT JOIN
`project.dataset.table` t2
ON t1.id = t2.id
GROUP BY t1.id;
That is not exactly what you want, because this compares every interval to every other interval, including itself. Removing the "same" one basically requires a unique identifier. We can get this using row_number().
Further, you don't seem to want to count overlaps twice. So:
with t as (
select t.*, row_number() over (partition by id order by start_time) as seqnum
from `project.dataset.table` t
)
SELECT t1.id,
COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM t t1 LEFT JOIN
t t2
ON t1.id = t2.id AND t1.seqnum < t2.seqnum
GROUP BY t1.id;
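As a cross-check of the pairwise logic, here is a short Python sketch that counts each overlapping pair once per id, using the sample data from the question. `count_overlaps` and `INTERVALS` are hypothetical names. The sketch assumes strict inequalities, so intervals that merely touch at an endpoint are not counted; the BETWEEN-based queries earlier on this page would count those.

```python
from collections import defaultdict
from itertools import combinations

# Sample data from the question: (id, start_time, end_time).
INTERVALS = [
    (1001, 1, 10),
    (1001, 2, 5),
    (1002, 3, 4),
    (1003, 5, 8),
    (1003, 6, 8),
    (1001, 6, 20),
]


def count_overlaps(intervals):
    """Return {id: number of overlapping interval pairs within that id}."""
    by_id = defaultdict(list)
    for key, start, end in intervals:
        by_id[key].append((start, end))
    counts = {}
    for key, spans in by_id.items():
        # combinations() visits each unordered pair once, which plays the
        # role of the t1.seqnum < t2.seqnum join condition.
        counts[key] = sum(
            1
            for (s1, e1), (s2, e2) in combinations(spans, 2)
            if s1 < e2 and s2 < e1  # standard interval-overlap test
        )
    return counts


result = count_overlaps(INTERVALS)
print(result, "total:", sum(result.values()))
```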

compare next record in the same table sql

I have a table having 2 columns trans_date and amount.
I want a query that gives me the amount if the difference in trans_date between a record and the next record is 1 day or less (or the same day).
explanation:
AMOUNT TRANS_DATE
2645 2011-05-11 20:57:27.000
2640 2011-05-12 00:00:00.000
2645 2011-05-15 18:01:11.000
2645 2011-06-15 18:27:45.000
2645 2011-06-16 17:06:33.000
2645 2011-06-18 15:19:19.000
2645 2011-06-23 15:42:18.000
the query should show me only
AMOUNT TRANS_DATE
2645 2011-05-11 20:57:27.000
2640 2011-05-12 00:00:00.000
2645 2011-05-15 18:01:11.000
2645 2011-06-15 18:27:45.000
2645 2011-06-16 17:06:33.000
All I have tried is:
select DATEDIFF(DAY,a.TRANS_DATE,b.TRANS_DATE) from FIN_AP_PAYMENTS a inner join ( select * from (select a.*,rank() over (order by id) as ra from FIN_AP_PAYMENTS a, FIN_AP_PAYMENTS b )tbl )
select a.TRANS_DATE,b.TRANS_DATE,rank() over (order by a.id) as ra1,rank() over (order by b.id) as ra2 from FIN_AP_PAYMENTS a,FIN_AP_PAYMENTS b
select DATEDIFF(day,tbl.TRANS_DATE,tbl2.TRANS_DATE) from (select a.*,rank() over (order by id) as ra from FIN_AP_PAYMENTS a) tbl inner join (select a.*,rank() over (order by a.id) as ra1 from FIN_AP_PAYMENTS a ) tbl2 on tbl.id=tbl2.id

Use lead() and lag() to get the next and previous values. Then check the timing between them for filtering:
select t.amount, t.trans_date
from (select t.*, lead(trans_date) over (order by trans_date) as next_td,
lag(trans_date) over (order by trans_date) as prev_td
from FIN_AP_PAYMENTS t
) t
where datediff(second, prev_td, trans_date) < 24*60*60 or
datediff(second, trans_date, next_td) < 24*60*60;
EDIT:
In SQL Server 2008, you can do this using outer apply:
select t.amount, t.trans_date
from (select t.*, tlead.trans_date as next_td,
tlag.trans_date as prev_td
from FIN_AP_PAYMENTS t outer apply
(select top 1 t2.*
from FIN_AP_PAYMENTS t2
where t2.trans_date < t.trans_date
order by trans_date desc
) tlag outer apply
(select top 1 t2.*
from FIN_AP_PAYMENTS t2
where t2.trans_date > t.trans_date
order by trans_date asc
) tlead
) t
where datediff(second, prev_td, trans_date) < 24*60*60 or
datediff(second, trans_date, next_td) < 24*60*60;
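The previous/next comparison can be sketched in Python as follows, using the sample rows from the question. The function name `near_neighbours` and the `PAYMENTS` list are assumptions for illustration. Like the datediff(second, ...) filter, it keeps a row when either neighbour in time order is less than 24 hours away.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (amount, trans_date).
PAYMENTS = [
    (2645, "2011-05-11 20:57:27.000"),
    (2640, "2011-05-12 00:00:00.000"),
    (2645, "2011-05-15 18:01:11.000"),
    (2645, "2011-06-15 18:27:45.000"),
    (2645, "2011-06-16 17:06:33.000"),
    (2645, "2011-06-18 15:19:19.000"),
    (2645, "2011-06-23 15:42:18.000"),
]


def near_neighbours(rows, window=timedelta(days=1)):
    """Keep rows whose previous or next row (by time) is less than window away."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), amount, ts)
        for amount, ts in rows
    )
    keep = []
    for i, (when, amount, ts) in enumerate(parsed):
        # Equivalent of lag(): the row before, if any.
        close_prev = i > 0 and when - parsed[i - 1][0] < window
        # Equivalent of lead(): the row after, if any.
        close_next = i + 1 < len(parsed) and parsed[i + 1][0] - when < window
        if close_prev or close_next:
            keep.append((amount, ts))
    return keep


for row in near_neighbours(PAYMENTS):
    print(row)
```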

Pre SQL Server 2012 you can use a combination of ROW_NUMBER and self joins instead of LEAD and LAG.
Example
WITH Example AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY Trans_Date) AS rn,
r.*
FROM
(
VALUES
(2645, '2011-05-11 20:57:27.000'),
(2640, '2011-05-12 00:00:00.000'),
(2645, '2011-05-15 18:01:11.000'),
(2645, '2011-06-15 18:27:45.000'),
(2645, '2011-06-16 17:06:33.000'),
(2645, '2011-06-18 15:19:19.000'),
(2645, '2011-06-23 15:42:18.000')
) AS r(Amount, Trans_Date)
)
SELECT
curr.*
FROM
Example AS curr
LEFT OUTER JOIN Example AS prv ON prv.rn = curr.rn - 1
LEFT OUTER JOIN Example AS nxt ON nxt.rn = curr.rn + 1
WHERE
DATEDIFF(DAY, curr.Trans_Date, nxt.Trans_Date) IN (0, 1)
OR DATEDIFF(DAY, prv.Trans_Date, curr.Trans_Date) IN (0, 1)
;
The CTE allows you to reuse the row number multiple times. The row number provides a sequence for the self joins. The joins allow to you see the previous and next values on the same row, for comparison.
The output of this query doesn't match your example, see my question in the comments for more on this.
I'm not sure that telling people, who are giving up their time to help you, what you are / are not here to discuss is a good idea. It certainly made me think twice before posting.

SQL: grouping by number of entries and entry date

I have the following table log:
event_time | name |
-------------------------
2014-07-16 11:40 Bob
2014-07-16 10:00 John
2014-07-16 09:20 Bob
2014-07-16 08:20 Bob
2014-07-15 11:20 Bob
2014-07-15 10:20 John
2014-07-15 09:00 Bob
I would like to generate a report, where I can group data by number of entries per day and by entry day. So the resulting report for the table above would be something like this:
event_date | 0-2 | 3 | 4-99 |
-------------------------------
2014-07-16 1 1 0
2014-07-15 2 0 0
I tried the following approaches to solve it:
Select with grouping in range
How to select the count of values grouped by ranges
If I find answer before anybody post it here, I will share it.
Added
I would like to count the number of daily entries for each name. Then I check which column this value belongs to, and I add 1 to that column.

I took it in two steps. Inner query gets the base counts. The outer query uses case statements to sum counts.
select event_date,
sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
sum(case when cnt = 3 then 1 else 0 end) as "3",
sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from
(select cast(event_time as date) as event_date,
name,
count(1) as cnt
from log
group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date
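The two aggregation steps can also be traced outside the database. Below is a minimal Python sketch with the question's sample rows; `bucket_report` and `LOG` are names invented for this example. Step 1 counts entries per (day, name), and step 2 counts how many of those per-name totals fall into each bucket.

```python
from collections import Counter

# Sample rows from the question: (event_time, name).
LOG = [
    ("2014-07-16 11:40", "Bob"),
    ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"),
    ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"),
    ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]


def bucket_report(log):
    """Return {day: {"0-2": n, "3": n, "4-99": n}} for the given log rows."""
    # Step 1 (inner query): entries per day and name.
    per_day_name = Counter((ts[:10], name) for ts, name in log)
    # Step 2 (outer query): one increment per (day, name) in the matching bucket.
    report = {}
    for (day, _name), cnt in per_day_name.items():
        row = report.setdefault(day, {"0-2": 0, "3": 0, "4-99": 0})
        if cnt <= 2:
            row["0-2"] += 1
        elif cnt == 3:
            row["3"] += 1
        else:
            row["4-99"] += 1
    return report


for day, row in sorted(bucket_report(LOG).items(), reverse=True):
    print(day, row)
```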

Try it like this:
select da,sum(case when c<3 then 1 else 0 end) as "0-2",
sum(case when c=3 then 1 else 0 end) as "3",
sum(case when c>3 then 1 else 0 end) as "4-99" from (
select cast(event_time as date) as da,count(*) as c from
log group by cast(event_time as date),name) as aa group by da

First aggregate in two steps:
SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2;
Names should be completely irrelevant according to your description.
Then take the query and run it through crosstab():
SELECT *
FROM crosstab(
$$SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
) AS f (day date, "0-2" int, "3" int, "4_or_more" int);
crosstab() is supplied by the additional module tablefunc. Details and instructions in this related answer:
PostgreSQL Crosstab Query

This is a variation on a PIVOT query (although PostgreSQL supports this via the crosstab(...) table functions). The existing answers cover the basic technique, I just prefer to construct queries without the use of CASE, where possible.
To get started, we need a couple of things. The first is essentially a Calendar Table, or entries from one (if you don't already have one, they're among the most useful dimension tables). If you don't have one, the entries for the specified dates can easily be generated:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
CAST('2014-08-01' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay))
This is primarily used to create the first step in the double aggregate, like so:
SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name
Remember that most aggregate columns with a nullable expression (here, COUNT(Log.name)) will ignore null values (not count them). This is also one of the few times it's acceptable to not include a grouped-by column in the SELECT list (normally it makes the results ambiguous). For the actual queries I'll put this into a subquery, but it would also work as a CTE.
We also need a way to construct our COUNT ranges. That's pretty easy too:
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
We'll be querying these as "exclusive upper-bound" as well.
We now have all the pieces we need to do the query. We can actually use these virtual tables to make queries in both veins of the current answers.
First, the SUM(CASE...) style.
For this query, we'll take advantage of the null-ignoring qualities of aggregate functions again:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT startOfDay,
COUNT(Zero_To_Two.text) AS Zero_To_Two,
COUNT(Three.text) AS Three,
COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
ON Zero_To_Two.text = '0 - 2'
AND Entry_Count.count >= Zero_To_Two.start
AND Entry_Count.count < Zero_To_Two.next
LEFT JOIN Count_Range Three
ON Three.text = '3'
AND Entry_Count.count >= Three.start
AND Entry_Count.count < Three.next
LEFT JOIN Count_Range Four_And_Up
ON Four_And_Up.text = '4+'
AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay
The other option is of course the crosstab query, where the CASE was being used to segment the results. We'll use the Count_Range table to decode the values for us:
SELECT startOfDay, "0 - 2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT Entry_Count.startOfDay, Count_Range.text, COUNT(*) AS count
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
JOIN Count_Range
ON Entry_Count.count >= Count_Range.start
AND (Entry_Count.count < Count_Range.next OR Count_Range.next IS NULL)
GROUP BY Entry_Count.startOfDay, Count_Range.text
ORDER BY Entry_Count.startOfDay, Count_Range.text$$,
$$VALUES ('0 - 2'), ('3'), ('4+')$$) Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)
(I believe this is correct, but don't have a way to test it - Fiddle doesn't seem to have the crosstab functionality loaded. In particular, CTEs probably must go inside the function itself, but I'm not sure....)

Discrete Derivative in SQL

I've got sensor data in a table in the form:
Time Value
10 100
20 200
36 330
46 440
I'd like to pull the change in values for each time period. Ideally, I'd like to get:
Starttime Endtime Change
10 20 100
20 36 130
36 46 110
My SQL skills are pretty rudimentary, so my inclination is to pull all the data out to a script that processes it and then push it back to the new table, but I thought I'd ask if there was a slick way to do this all in the database.

Select a.Time as StartTime
, b.time as EndTime
, b.time-a.time as TimeChange
, b.value-a.value as ValueChange
FROM YourTable a
Left outer Join YourTable b ON b.time>a.time
Left outer Join YourTable c ON c.time<b.time AND c.time > a.time
Where c.time is null
Order By a.time

Select a.Time as StartTime, b.time as EndTime, b.time-a.time as TimeChange, b.value-a.value as ValueChange
FROM YourTable a, YourTable b
WHERE b.time = (Select MIN(c.time) FROM YourTable c WHERE c.time>a.time)
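For comparison, pairing each row with the next one by time is a one-liner once the rows are sorted. This Python sketch uses the sensor readings from the question; `deltas` and `SAMPLES` are illustrative names only.

```python
# Sensor readings from the question: (time, value).
SAMPLES = [(10, 100), (20, 200), (36, 330), (46, 440)]


def deltas(samples):
    """Return (start_time, end_time, change) for each pair of consecutive readings."""
    ordered = sorted(samples)  # order by time
    # zip(ordered, ordered[1:]) pairs every row with its successor,
    # which is what the MIN(c.time) > a.time subquery selects in SQL.
    return [
        (t0, t1, v1 - v0)
        for (t0, v0), (t1, v1) in zip(ordered, ordered[1:])
    ]


for row in deltas(SAMPLES):
    print(row)
```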

You could use a SQL window function. Below is an example based on BigQuery syntax.
SELECT
LAG(time, 1) OVER (ORDER BY time) AS start_time,
time AS end_time,
-- relative change; drop "/value" for the plain difference asked for in the question
(value - LAG(value, 1) OVER (ORDER BY time))/value AS Change
from data

First off, I would add an id column to the table so that you have something that predictably increases from row to row.
Then, I would try the following query:
SELECT t1.Time AS 'Starttime', t2.Time AS 'Endtime',
(t2.Value - t1.Value) AS 'Change'
FROM SensorData t1
INNER JOIN SensorData t2 ON (t2.id - 1) = t1.id
ORDER BY t1.Time ASC
I'm going to create a test table to try this for myself so I don't know if it works yet but it's worth a shot!
Update
Fixed with one minor issue (CHANGE is a protected word and had to be quoted) but tested it and it works! It produces exactly the results defined above.

Does this work?
WITH T AS
(
SELECT [Time]
, Value
, RN1 = ROW_NUMBER() OVER (ORDER BY [Time])
, RN2 = ROW_NUMBER() OVER (ORDER BY [Time]) - 1
FROM SensorData
)
SELECT
StartTime = ISNULL(t1.[time], t2.[time])
, EndTime = ISNULL(t2.[time], 0)
, Change = t2.value - t1.value
FROM T t1
LEFT OUTER JOIN
T t2
ON t1.RN1 = t2.RN2
