Time between SQL row state changes - sql

I have a sqlite3 table that records the state of my heating system and furnace every 30 seconds. The table looks like this:
TABLE CLIMATESYSTEM (
id INTEGER PRIMARY KEY AUTOINCREMENT,
Timestamp INT,
FAN INT,
SYSTEM INT
);
Timestamp is the number of seconds since the epoch (int(time.time()) in Python).
A few rows of the table look like this:
5577|1452049280|1|1
5578|1452049339|1|1
5579|1452049399|1|1
5580|1452049459|1|1
5581|1452049520|0|0
5582|1452049580|0|0
5583|1452049644|1|1
5584|1452049700|1|1
5585|1452049760|1|1
5586|1452049820|0|0
What I am trying to do is count the seconds between each state transition: from the moment the system turns on (1) until it turns off (0), and from then until it next turns on.
Example: count the seconds between #5577 and #5581 -> add to TIME_SYS_ON
Example: count the seconds between #5581 and #5583 -> add to TIME_SYS_OFF
My goal is to measure the total time in a 24-hour period that my heating system is running.
Any ideas on a starting point?
Thanks

First, look up the next timestamp for each row:
SELECT Timestamp,
(SELECT MIN(Timestamp)
FROM ClimateSystem AS CS2
WHERE CS2.Timestamp > ClimateSystem.Timestamp
) AS NextTimestamp,
System
FROM ClimateSystem;
Then use that to compute the length of each interval:
SELECT NextTimestamp - Timestamp,
System
FROM (SELECT Timestamp,
(SELECT MIN(Timestamp)
FROM ClimateSystem AS CS2
WHERE CS2.Timestamp > ClimateSystem.Timestamp
) AS NextTimestamp,
System
FROM ClimateSystem);
Then add filters as needed:
SELECT SUM(NextTimestamp - Timestamp),
System
FROM (SELECT Timestamp,
(SELECT MIN(Timestamp)
FROM ClimateSystem AS CS2
WHERE CS2.Timestamp > ClimateSystem.Timestamp
) AS NextTimestamp,
System
FROM ClimateSystem)
WHERE Timestamp BETWEEN :StartOfDay AND :EndOfDay
GROUP BY System;
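As a rough check, the final query can be run from Python against the ten sample rows in the question. This is only a sketch: the function name seconds_by_state, the in-memory database and the UTC day boundaries are assumptions made for the example, not part of the original answer.

```python
import sqlite3

# The ten sample rows from the question: (id, Timestamp, FAN, SYSTEM).
ROWS = [
    (5577, 1452049280, 1, 1), (5578, 1452049339, 1, 1),
    (5579, 1452049399, 1, 1), (5580, 1452049459, 1, 1),
    (5581, 1452049520, 0, 0), (5582, 1452049580, 0, 0),
    (5583, 1452049644, 1, 1), (5584, 1452049700, 1, 1),
    (5585, 1452049760, 1, 1), (5586, 1452049820, 0, 0),
]

# Same query as in the answer, with the columns reordered so that each
# result row is a (System, seconds) pair.
QUERY = """
SELECT System,
       SUM(NextTimestamp - Timestamp) AS Seconds
FROM (SELECT Timestamp,
             (SELECT MIN(Timestamp)
              FROM ClimateSystem AS CS2
              WHERE CS2.Timestamp > ClimateSystem.Timestamp
             ) AS NextTimestamp,
             System
      FROM ClimateSystem)
WHERE Timestamp BETWEEN :StartOfDay AND :EndOfDay
GROUP BY System
"""


def seconds_by_state(rows, start_of_day, end_of_day):
    """Return {system_state: total_seconds} for rows inside the given window."""
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    conn.execute(
        "CREATE TABLE ClimateSystem ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "Timestamp INT, FAN INT, SYSTEM INT)"
    )
    conn.executemany("INSERT INTO ClimateSystem VALUES (?, ?, ?, ?)", rows)
    params = {"StartOfDay": start_of_day, "EndOfDay": end_of_day}
    result = dict(conn.execute(QUERY, params))
    conn.close()
    return result


# 1452038400 is 2016-01-06 00:00:00 UTC, the day the sample rows fall on.
START_OF_DAY = 1452038400
totals = seconds_by_state(ROWS, START_OF_DAY, START_OF_DAY + 86399)
print(totals)
```

With the sample rows this gives 416 seconds for state 1 and 124 seconds for state 0. The last row (#5586) has no following row, so its NextTimestamp is NULL and SUM ignores it.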

Related

How to iterate over table and delete rows based on specific condition on previous row - PostgreSQL

I have a table of ships, which consists of:
row id (number)
ship id (character varying)
timestamp (timestamp in yyyy-mm-dd hh:mm:ss format)
Timestamp is the time that the specific ship (ship id) emitted a signal during its course. The table looks like this:
What I need to do (in PostgreSQL - pgAdmin) is for every ship_id, find if a signal has been emitted 5 seconds or less after another signal from the same ship, and then delete the row with the latter.
In the example table shown above, for the ship "foo" the signals are almost 9 minutes apart so it's all good, but for the ship "bar" the signal with row_id 4 was emitted 3 seconds after the previous one with row_id 3, so it needs to go.
Thanks a lot in advance.
Windowing functions Lag/Lead in this case will do the trick.
Add a LAG to calculate the difference between timestamps for the same ships. This will allow you to calculate the time difference for the same ship and its most recent posting.
Use that to filter out what to delete
SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG(TIMESTAMP, 1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF
FROM SHIP_TABLE;
-- THEN SOMETHING LIKE THIS TO DELETE THE ROWS THAT FOLLOW TOO CLOSELY
DELETE FROM SHIP_TABLE WHERE ROW_ID IN
(SELECT ROW_ID FROM
(SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG(TIMESTAMP, 1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF
FROM SHIP_TABLE) SUB_1
WHERE SECONDS_DIFF <= 5 -- THRESHOLD IN SECONDS
);
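The LAG-then-delete idea can be tried out locally with Python's sqlite3 module. This is a sketch under several assumptions: the sample table in the question did not survive, so the four rows below are hypothetical values that merely match its description; timestamps are stored as integer epoch seconds instead of PostgreSQL timestamps; and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical rows (row_id, ship_id, epoch seconds) matching the description:
# "foo" signals almost 9 minutes apart, "bar" signals 3 seconds apart.
SIGNALS = [
    (1, "foo", 1000),
    (2, "foo", 1535),
    (3, "bar", 2000),
    (4, "bar", 2003),
]


def delete_close_signals(rows, threshold_seconds=5):
    """Delete each signal sent within threshold_seconds of the previous
    signal from the same ship; return the remaining row ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ship_table ("
        "row_id INTEGER PRIMARY KEY, ship_id TEXT, ts INTEGER)"
    )
    conn.executemany("INSERT INTO ship_table VALUES (?, ?, ?)", rows)
    # LAG yields NULL for the first signal of each ship, and NULL <= 5 is
    # not true, so first signals are never deleted.
    conn.execute(
        """
        DELETE FROM ship_table
        WHERE row_id IN (
            SELECT row_id
            FROM (SELECT row_id,
                         ts - LAG(ts) OVER (PARTITION BY ship_id
                                            ORDER BY ts) AS seconds_diff
                  FROM ship_table)
            WHERE seconds_diff <= :threshold)
        """,
        {"threshold": threshold_seconds},
    )
    remaining = [
        row[0]
        for row in conn.execute("SELECT row_id FROM ship_table ORDER BY row_id")
    ]
    conn.close()
    return remaining


print(delete_close_signals(SIGNALS))
```

Only row 4 is removed here. Note that the differences are computed once, before anything is deleted, so in a run of signals each 3 seconds apart every signal except the first would go.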

Get totals from difference between rows

I have a table, with the following structure:
(
id SERIAL PRIMARY KEY,
user_id integer NOT NULL REFERENCES user(id) ON UPDATE CASCADE,
status text NOT NULL,
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL
)
Example data:
"id","user_id","status","created_at","updated_at"
416,38,"ONLINE","2018-08-07 14:40:51.813+00","2018-08-07 14:40:51.813+00"
417,39,"ONLINE","2018-08-07 14:45:00.717+00","2018-08-07 14:45:00.717+00"
418,38,"OFFLINE","2018-08-07 15:43:22.678+00","2018-08-07 15:43:22.678+00"
419,38,"ONLINE","2018-08-07 16:21:30.725+00","2018-08-07 16:21:30.725+00"
420,38,"OFFLINE","2018-08-07 16:49:10.3+00","2018-08-07 16:49:10.3+00"
421,38,"ONLINE","2018-08-08 11:37:53.639+00","2018-08-08 11:37:53.639+00"
422,38,"OFFLINE","2018-08-08 12:29:08.234+00","2018-08-08 12:29:08.234+00"
423,39,"ONLINE","2018-08-14 15:22:00.539+00","2018-08-14 15:22:00.539+00"
424,39,"OFFLINE","2018-08-14 15:22:02.092+00","2018-08-14 15:22:02.092+00"
When a user on my application goes online, a new row is inserted with status ONLINE. When they go offline, a row with status OFFLINE is inserted. There are other entries created to record different events, but for this query only OFFLINE and ONLINE are important.
I want to produce a chart, showing the total number of users online over a time period (e.g 5 minutes), within a date range. If a user is online for any part of that period they should be counted.
Example:
datetime, count
2019-05-22T12:00:00+0000, 53
2019-05-22T12:05:00+0000, 47
2019-05-22T12:10:00+0000, 49
2019-05-22T12:15:00+0000, 55
2019-05-22T12:20:00+0000, 59
2019-05-22T12:25:00+0000, 56
I'm able to produce a similar chart for an individual user by fetching all status rows within the date range then processing manually, however this approach won't scale to all users.
I believe something like this could be accomplished with window functions, but I'm not really sure where to start
As your question is very vague, nobody can really help you 100%. You can probably achieve what you want with a combination of "with" clauses and window functions. With the "with" clause you can easily break big problems down into small parts. The following query (written without regard to performance) may help; replace public.tbl_test with your table:
with temp_online as (
select
*
from public.tbl_test
where public.tbl_test.status ilike 'online'
order by created_at
),
temp_offline as (
select
*
from public.tbl_test
where public.tbl_test.status ilike 'offline'
order by created_at
),
temp_change as (
select
* ,
(
select temp_offline.created_at from temp_offline where temp_offline.created_at > temp_online.created_at and temp_offline.user_id = temp_online.user_id order by created_at asc limit 1
) as go_offline
from temp_online
),
temp_result as
(
select *,
go_offline - created_at as online_duration
from temp_change
),
temp_series as
(
SELECT (generate_series || ' minute')::interval + '2019-05-22 00:00:00'::timestamp as temp_date
FROM generate_series(0, 1440,5)
)
select
temp_series.temp_date,
(select count(*) from temp_result where temp_result.created_at <= temp_series.temp_date and temp_result.go_offline >= temp_series.temp_date) as count_users
from
temp_series
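The same pairing-and-bucketing logic can be sketched in plain Python to check the numbers on a small sample. The function names are made up for this example, the events are a subset of the rows in the question with fractional seconds dropped, and datetimes are treated as naive UTC. Unlike the query above, which counts users online at the instant each bucket starts, this sketch counts a user if a session overlaps any part of the bucket, as the question asks.

```python
from datetime import datetime, timedelta

# Subset of the question's rows: (user_id, status, created_at).
EVENTS = [
    (38, "ONLINE", datetime(2018, 8, 7, 14, 40, 51)),
    (38, "OFFLINE", datetime(2018, 8, 7, 15, 43, 22)),
    (38, "ONLINE", datetime(2018, 8, 7, 16, 21, 30)),
    (38, "OFFLINE", datetime(2018, 8, 7, 16, 49, 10)),
    (39, "ONLINE", datetime(2018, 8, 14, 15, 22, 0)),
    (39, "OFFLINE", datetime(2018, 8, 14, 15, 22, 2)),
]


def sessions_from_events(events):
    """Pair each ONLINE event with the next OFFLINE event of the same user.

    Users who never go offline are left out in this simplified version.
    """
    open_since = {}
    sessions = []
    for user_id, status, ts in sorted(events, key=lambda e: e[2]):
        if status == "ONLINE":
            open_since.setdefault(user_id, ts)
        elif status == "OFFLINE" and user_id in open_since:
            sessions.append((user_id, open_since.pop(user_id), ts))
    return sessions


def online_counts(sessions, range_start, range_end, step):
    """Count distinct users with a session overlapping each bucket."""
    counts = []
    bucket_start = range_start
    while bucket_start < range_end:
        bucket_end = bucket_start + step
        users = {
            user_id
            for user_id, start, end in sessions
            if start < bucket_end and end > bucket_start
        }
        counts.append((bucket_start, len(users)))
        bucket_start = bucket_end
    return counts


sessions = sessions_from_events(EVENTS)
counts = online_counts(
    sessions,
    datetime(2018, 8, 7, 15, 40),
    datetime(2018, 8, 7, 15, 50),
    timedelta(minutes=5),
)
for bucket_start, count in counts:
    print(bucket_start.isoformat(), count)
```

User 38 goes offline at 15:43:22, so the 15:40 bucket counts one user and the 15:45 bucket counts none.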

Analytics in sql

I have a table with the following structure:
use_id (int) - event (str) - time (timestamp) - value (int)
Event can take several values : install, login, buy, etc.
I need to get all user records before updating the application.
For example, the new version of my application was released on 1 January 2019, but users may install the new version on any day.
How can I get sum(value) for the first and second versions?
I tried a self-join on the table, but I think that this is not the best solution.
Help me, please.
Here is the definition of your table (as I understood it from your comments and description):
CREATE TABLE user_events (
user_id integer,
event varchar,
time timestamp without time zone,
value integer
);
Here is the query you asked for:
SELECT
COUNT(user_id),
SUM(value)
FROM (
SELECT
DISTINCT ON (user_id)
user_id,time,value
FROM user_events
WHERE event='install'
ORDER BY user_id, time DESC
) last_installations
WHERE
time BETWEEN date '2018-01-01' AND date '2019-01-01';
Some explanations:
inner query ( last_installations ) selects last install events for each user
outer query filters out only installations of first and second versions, and calculates SUM(value) (as you asked) and COUNT(user_id) (I added for clarity - how many users are using 1 and 2 versions now)
UPDATE
sum value for all events by version
SELECT
event,
CASE
WHEN time >= date '2018-01-01' AND time < date '2018-06-01' THEN 1
WHEN time >= date '2018-06-01' AND time < date '2019-01-01' THEN 2
WHEN time >= date '2019-01-01' THEN 3
ELSE 0 -- unknown version
END AS version,
SUM(value)
FROM user_events
GROUP BY 1,2

Every 10th row based on timestamp

I have a table with signal name, value and timestamp. These signals were recorded at a sampling rate of 1 sample/sec. Now I want to plot a graph of the values over several months, and it is becoming too heavy for the system to do within seconds. So my question is: is there any way to view 1 value per minute? In other words, I want to see every 60th row.
You can use the row_number() function to enumerate the rows, and then use modulo arithmetic to get the rows:
select signalname, value, timestamp
from (select t.*,
row_number() over (order by timestamp) as seqnum
from table t
) t
where seqnum % 60 = 0;
If your data really is regular, you can also extract the seconds value and check when that is 0:
select signalname, value, timestamp
from table t
where datepart(second, timestamp) = 0
This assumes that timestamp is stored in an appropriate date/time format.
Instead of sampling, you could use the one minute average for your plot:
select name
, min(timestamp)
, avg(value)
from Yourtable
group by
name
, datediff(minute, '2013-01-01', timestamp)
If you are charting months, even the hourly average might be detailed enough.
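For a quick experiment with the per-minute average, here is a sketch using Python's sqlite3 module. It assumes the timestamp is stored as integer epoch seconds, so integer division by 60 stands in for datediff(minute, ...); the table name signals, the column names and the generated samples are all hypothetical.

```python
import sqlite3


def minute_averages(samples):
    """Return (name, first ts in the minute, average value) per minute."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signals (name TEXT, value REAL, ts INTEGER)")
    conn.executemany("INSERT INTO signals VALUES (?, ?, ?)", samples)
    rows = conn.execute(
        """
        SELECT name, MIN(ts) AS minute_start, AVG(value) AS avg_value
        FROM signals
        GROUP BY name, ts / 60   -- integer division: one group per minute
        ORDER BY name, minute_start
        """
    ).fetchall()
    conn.close()
    return rows


# Three minutes of hypothetical data at one sample per second.
SAMPLES = [("temp", float(i), i) for i in range(180)]
averages = minute_averages(SAMPLES)
print(averages)
```

The 180 samples collapse to three rows, one per minute. Dividing by 3600 instead would give hourly averages.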

How do I produce a time interval query in SQLite?

I have an events-based table from which I would like to produce a query giving, for each minute, the number of events that were occurring.
For example, I have an event table like:
CREATE TABLE events (
session_id TEXT,
event TEXT,
time_stamp DATETIME
)
Which I have transformed into the following type of table:
CREATE TABLE sessions (
session_id TEXT,
start_ts DATETIME,
end_ts DATETIME,
duration INTEGER
);
Now I want to create a query that would group the sessions by a count of those that were active during a particular minute. Where I would essentially get back something like:
TIME_INTERVAL ACTIVE_SESSIONS
------------- ---------------
18:00 1
18:01 5
18:02 3
18:03 0
18:04 2
Ok, I think I got more what I wanted. It doesn't account for intervals that are empty, but it is good enough for what I need.
select strftime('%Y-%m-%dT%H:%M:00.000',start_ts) TIME_INTERVAL,
(select count(session_id)
from sessions s2
where strftime('%Y-%m-%dT%H:%M:00.000',s1.start_ts) between s2.start_ts and s2.end_ts) ACTIVE_SESSIONS
from sessions s1
group by strftime('%Y-%m-%dT%H:%M:00.000',start_ts);
This generates a row for each minute in which at least one session started, with a count of the sessions that had started (start_ts) but hadn't finished (end_ts) at that minute.
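If the empty intervals do matter, one possible approach is to generate the minutes with a recursive CTE and LEFT JOIN the sessions onto them. The sketch below uses Python's sqlite3 module; the three sessions, the dates and the function name are hypothetical, and it assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text so that string comparison matches time order.

```python
import sqlite3

# Hypothetical sessions: (session_id, start_ts, end_ts).
SESSIONS = [
    ("a", "2024-01-01 18:00:10", "2024-01-01 18:01:30"),
    ("b", "2024-01-01 18:01:05", "2024-01-01 18:02:20"),
    ("c", "2024-01-01 18:04:00", "2024-01-01 18:04:30"),
]


def active_per_minute(sessions, first_minute, last_minute):
    """Return (minute, active session count) for every minute in the range,
    including minutes with no active session."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (session_id TEXT, start_ts TEXT, end_ts TEXT)"
    )
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", sessions)
    rows = conn.execute(
        """
        WITH RECURSIVE minutes(m) AS (
            SELECT :first
            UNION ALL
            SELECT datetime(m, '+1 minute') FROM minutes WHERE m < :last
        )
        SELECT m, COUNT(s.session_id)
        FROM minutes
        LEFT JOIN sessions AS s
               ON s.start_ts < datetime(m, '+1 minute')  -- started before the minute ends
              AND s.end_ts >= m                          -- still running when it begins
        GROUP BY m
        ORDER BY m
        """,
        {"first": first_minute, "last": last_minute},
    ).fetchall()
    conn.close()
    return rows


per_minute = active_per_minute(
    SESSIONS, "2024-01-01 18:00:00", "2024-01-01 18:04:00"
)
for minute, active in per_minute:
    print(minute, active)
```

COUNT(s.session_id) counts only matched rows, so the 18:03 minute, where nothing is active, appears with a count of 0 instead of being dropped.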
PostgreSQL allows the following query.
In contrast to your example, this returns an additional column for the day, and it omits the minutes where nothing happened (count=0).
select
day, hour, minute, count(*)
from
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),
(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),
(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),
(50),(51),(52),(53),(54),(55),(56),(57),(58),(59))
as minutes (minute),
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23))
as hours (hour),
(select distinct cast(start_ts as date) from sessions
union
select distinct cast(end_ts as date) from sessions)
as days (day),
sessions
where
(day,hour,minute)
between (cast(start_ts as date),extract(hour from start_ts),extract(minute from start_ts))
and (cast(end_ts as date), extract(hour from end_ts), extract(minute from end_ts))
group by
day, hour, minute
order by
day, hour, minute;
This isn't exactly your query, but I think it could help. Did you look into the SQLite R-Tree module? This would allow you to create a virtual index on the start/stop time:
CREATE VIRTUAL TABLE sessions_index USING rtree (id, start, end);
Then you could search via:
SELECT * FROM sessions_index WHERE end >= <first minute> AND start <= <last minute>;