Aggregate data based on unix timestamp in CrateDB - SQL

I'm very new to SQL and time series databases. I'm using CrateDB (which I think is PostgreSQL-compatible). I want to aggregate the data by hour, day, week and month. A unix timestamp is used to store the data. The following is my sample table:
| sensorid | timestamp  | reading |
====================================
| 1        | 1604192522 | 10      |
| 1        | 1604192702 | 9.65    |
| 2        | 1605783723 | 8.1     |
| 2        | 1601514122 | 9.6     |
| 2        | 1602292210 | 10      |
| 2        | 1602291611 | 12      |
| 2        | 1602291615 | 10      |
I tried an SQL query using FROM_UNIXTIME, but it is not supported. Please help me.
I'm looking for the hourly data to come out like this:
sensorid | reading | timestamp
1 | 19.65 (10+9.65) | 1604192400 (starting hour unix time)
2 | 8.1 | 1605783600 (starting hour unix time)
2 | 9.6 | 1601514000 (starting hour unix time)
2 | 32 (10+12+10) | 1602291600 (starting hour unix time)
The monthly data should look like this:
sensorid | reading | timestamp
1 | 24.61 (10+9.65+8.1) | 1604192400 (starting month unix time)
2 | 41.6 (9.6+10+12+10) | 1601510400 (starting month unix time)

A straight-forward approach is:
SELECT
(date '1970-01-01' + unixtime * interval '1 second')::date as date,
extract(hour from date '1970-01-01' + unixtime * interval '1 second') AS hour,
count(c.user) AS count
FROM core c
GROUP BY 1,2
If you are content with having the date and time in the same column (which would seem more helpful to me), you can use date_trunc():
select
date_trunc('hour', date '1970-01-01' + unixtime * interval '1 second') as date_hour,
count(c.user) AS count
FROM core c
GROUP BY 1

You can convert a unix timestamp to a date/time value using to_timestamp(). You can aggregate along multiple dimensions at the same time using grouping sets. So, you might want:
select date_trunc('year', v.ts) as year,
date_trunc('month', v.ts) as month,
date_trunc('week', v.ts) as week,
date_trunc('day', v.ts) as day,
date_trunc('hour', v.ts) as hour,
count(*), avg(reading), sum(reading)
from t cross join lateral
(values (to_timestamp(timestamp))) v(ts)
group by grouping sets ( (year), (month), (week), (day), (hour) );
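Adapted to the sample table in the question, a per-sensor hourly rollup could be sketched as below. It assumes a table named sensor_data (the name is made up for illustration) with the columns shown above, and PostgreSQL-style to_timestamp()/extract(epoch ...); CrateDB's function support may differ, since it stores timestamps as milliseconds:
select sensorid,
       sum(reading) as reading,
       -- start of the hour, converted back to unix seconds
       extract(epoch from date_trunc('hour', to_timestamp("timestamp"))) as hour_start
from sensor_data
group by 1, 3
order by 1, 3;
Swapping 'hour' for 'day', 'week' or 'month' in date_trunc() gives the other rollups the question asks for.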

Related

how to query time-series data in postgresql to find spikes

I have a table called cpu_usages and I'm trying to find spikes in CPU usage. My table has 4 columns:
id serial
at timestamp
cpu_usage float
cpu_core int
The at column stores a timestamp for every minute of every day.
For each row, I want to look at the next 3 minutes of timestamps; if any of them has a cpu_usage at least 3% higher than that starting row's value, the starting row should be returned.
So for example if I have these rows:
id | at | cpu_usage | cpu_core
1 | 2019-01-01-00:00|1|0
2 | 2019-01-01-00:01|1|0
3 | 2019-01-01-00:02|4|0
4 | 2019-01-01-00:03|1|0
5 | 2019-01-01-00:04|1|0
6 | 2019-01-01-00:05|1|0
7 | 2019-01-01-00:06|1|0
8 | 2019-01-01-00:07|1|0
9 | 2019-01-01-00:08|6|0
10 | 2019-01-01-00:00|1|1
11 | 2019-01-01-00:01|1|1
12| 2019-01-01-00:02|4|1
13 | 2019-01-01-00:03|1|1
14 | 2019-01-01-00:04|1|1
15 | 2019-01-01-00:05|1|1
16 | 2019-01-01-00:06|1|1
17 | 2019-01-01-00:07|1|1
18 | 2019-01-01-00:08|6|1
It would return rows:
1,2,6,7,8
I am not sure how to do this because it sounds like it needs some sort of nested joins.
Can anyone assist me with this?
This answers the original version of the question.
Just use window functions. Assuming you want the larger value, you want to look back, not forward:
select t.*
from (select t.*,
             max(cpu_value) over (order by timestamp
                                  range between interval '3 minute' preceding and interval '1 second' preceding
                                 ) as previous_max
      from t
     ) t
where previous_max * 1.03 < cpu_value;
EDIT:
Looking forward, to return the starting rows before a spike, this would be:
select t.*
from (select t.*,
             max(cpu_value) over (order by timestamp
                                  range between interval '1 second' following and interval '3 minute' following
                                 ) as next_max
      from t
     ) t
where next_max > cpu_value * 1.03;
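Applied to the cpu_usages table from the question, with each core treated separately, the forward-looking version might look like the sketch below. The column names at, cpu_usage and cpu_core come from the question; partitioning by cpu_core is an assumption, and RANGE with interval offsets needs PostgreSQL 11 or later:
select u.*
from (select u.*,
             -- highest usage on the same core within the next 3 minutes
             max(cpu_usage) over (partition by cpu_core
                                  order by at
                                  range between interval '1 second' following
                                            and interval '3 minute' following
                                 ) as next_max
      from cpu_usages u
     ) u
where next_max > cpu_usage * 1.03;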

Group By day for custom time interval

I'm very new to SQL and time series databases. I'm using CrateDB. I want to aggregate the data by day, but I want each day to start at 9 AM rather than 12 AM.
The time interval is 9 AM to 11:59 PM.
A unix timestamp is used to store the data. The following is my sample table:
| sensorid | timestamp  | reading |
====================================
| 1        | 1616457600 | 10      |
| 1        | 1616461200 | 100     |
| 2        | 1616493600 | 1       |
| 2        | 1616493601 | 10      |
Currently I group using the following command, but it gives the start time as 12 AM:
select date_trunc('day', v.timestamp) as day,sum(reading)
from sensor1 v(timestamp)
group by (DAY)
From the above table, I want the sum of the 1616493600 and 1616493601 rows (11 is the result), because 1616457600 and 1616461200 fall before 9 AM.
You want to add nine hours to midnight:
date_trunc('day', v.timestamp) + interval '9' hour
Edit: If you want to exclude hours before 9:00 from the data you add up, you must add a WHERE clause:
where extract(hour from v.timestamp) >= 9
Here is a complete query with all relevant data:
select
date_trunc('day', v.timestamp) as day,
date_trunc('day', v.timestamp) + interval '9' hour as day_start,
min(v.timestamp) as first_data,
max(v.timestamp) as last_data,
sum(reading) as total_reading
from sensor1 v(timestamp)
where extract(hour from v.timestamp) >= 9
group by day
order by day;
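If the timestamp column really holds raw unix seconds, as the sample data suggests, the same idea applies once the value is converted first. A sketch assuming PostgreSQL-style to_timestamp() (CrateDB may instead need a cast to its timestamp type); note that to_timestamp() returns a timestamp with time zone, so the session time zone decides where the 9 AM boundary falls:
select
    date_trunc('day', to_timestamp("timestamp")) + interval '9' hour as day_start,
    sum(reading) as total_reading
from sensor1
where extract(hour from to_timestamp("timestamp")) >= 9
group by 1
order by 1;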

Get a rolling count of timestamps in SQL

I have a table (in an Oracle DB) that looks something like what is shown below, with about 4000 records. This is just an example of how the table is designed; the timestamps span several years.
| Time | Action |
| 9/25/2019 4:24:32 PM | Yes |
| 9/25/2019 4:28:56 PM | No |
| 9/28/2019 7:48:16 PM | Yes |
| .... | .... |
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval. I would like this done by looking at each timestamp and getting a count of timestamps that appear within 15 minutes of that timestamp.
My goal would be to have something like:
| Interval | Count |
| 9/25/2019 4:24:00 PM - 9/25/2019 4:39:00 | 2 |
| 9/25/2019 4:25:00 PM - 9/25/2019 4:40:00 | 2 |
| ..... | ..... |
| 9/25/2019 4:39:00 PM - 9/25/2019 4:54:00 | 0 |
I am not sure how I would be able to do this, if at all. Any ideas or advice would be much appreciated.
If you want any 15 minute interval in the data, then you can use:
select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t;
If you want the maximum, then use rank() on this:
select t.*
from (select t.*, rank() over (order by cnt_15 desc) as seqnum
from (select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t
) t
) t
where seqnum = 1;
This doesn't produce exactly the results you show in the question, but it does answer the stated goal:
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval.
You could enumerate the minutes with a recursive query, then bring in the table with a left join:
with cte (start_dt, max_dt) as (
    select trunc(min(time), 'mi'), max(time) from mytable
    union all
    select start_dt + interval '1' minute, max_dt from cte where start_dt < max_dt
)
select
    c.start_dt,
    c.start_dt + interval '15' minute end_dt,
    count(t.time) cnt
from cte c
left join mytable t
    on t.time >= c.start_dt
    and t.time < c.start_dt + interval '15' minute
group by c.start_dt
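If only the busiest 15-minute window is wanted from that enumeration, the same query can be ordered by the count and limited to the top row. A sketch; FETCH FIRST needs Oracle 12c or later, and WITH TIES keeps any intervals that share the maximum:
with cte (start_dt, max_dt) as (
    select trunc(min(time), 'mi'), max(time) from mytable
    union all
    select start_dt + interval '1' minute, max_dt from cte where start_dt < max_dt
)
select
    c.start_dt,
    c.start_dt + interval '15' minute as end_dt,
    count(t.time) as cnt
from cte c
left join mytable t
    on t.time >= c.start_dt
    and t.time < c.start_dt + interval '15' minute
group by c.start_dt
order by cnt desc
fetch first 1 row with ties;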

Easy subtraction of year's values

I have the following database table containing timestamps in unix format as well as the total (cumulative) yield of my solar panels every 5 minutes:
| Timestamp | TotalYield |
|------------|------------|
| 1321423500 | 1 |
| 1321423800 | 5 |
| ... | |
| 1573888800 | 44094536 |
Now I would like to calculate how much energy was produced each year. I thought of reading the first and last timestamp of each year using a UNION:
SELECT strftime('%d.%m.%Y',datetime(TimeStamp,'unixepoch')), TotalYield FROM PascalsDayData WHERE TimeStamp IN (
SELECT MAX(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
UNION
SELECT MIN(TimeStamp) FROM DayData GROUP BY strftime('%Y',datetime(TimeStamp,'unixepoch'))
)
This works fine, but I need to do some post-processing to subtract each year's first value from its last. There must be a more elegant way to do this in SQL, right?
Thanks,
Anton
You can aggregate by year and subtract the min and max value:
SELECT MAX(TotalYield) - MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
This assumes that TotalYield does not decrease -- which your question implies.
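To see which year each figure belongs to, the same grouping expression can simply be selected as well (a sketch against the same PascalsDayData table; SQLite accepts the alias in GROUP BY):
SELECT strftime('%Y', datetime(TimeStamp, 'unixepoch')) AS year,
       MAX(TotalYield) - MIN(TotalYield) AS yearly_yield
FROM PascalsDayData
GROUP BY year
ORDER BY year;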
If you actually want the next year's value, you can use LEAD():
SELECT LEAD(MIN(TotalYield), 1, MAX(TotalYield)) OVER (ORDER BY MIN(TimeStamp)) -
       MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))

Populating a table with all dates in a given range in Google BigQuery

Is there any convenient way to populate a table with all dates in a given range in Google BigQuery? What I need are all dates from 2015-06-01 till CURRENT_DATE(), so something like this:
+------------+
| date |
+------------+
| 2015-06-01 |
| 2015-06-02 |
| 2015-06-03 |
| ... |
| 2016-07-11 |
+------------+
Optimally, the next step would be to also get all weeks between the two dates, i.e.:
+---------+
| week |
+---------+
| 2015-23 |
| 2015-24 |
| 2015-25 |
| ... |
| 2016-28 |
+---------+
I've been fiddling around with the following answers I found, but I can't get them to work, mostly because core functions aren't supported and I can't find proper ways to replace them.
Easiest way to populate a temp table with dates between and including 2 date parameters
Generate Dates between date ranges
Your help is very much appreciated!
Best,
Max
Mikhail's answer works perfectly for BigQuery's legacy SQL syntax. This solution is a slightly easier one if you're using the standard SQL syntax.
BigQuery standard SQL syntax actually has a built in function, GENERATE_DATE_ARRAY for creating an array from a date range. It takes a start date, end date and INTERVAL. For example:
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
If you wanted the week and year, you could use:
SELECT EXTRACT(YEAR FROM day), EXTRACT(WEEK FROM day)
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 WEEK)
) AS day
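To get strings like 2015-23 directly, the generated days can be formatted. A sketch using FORMAT_DATE; the %W element counts weeks with Monday as the first day, so the numbering convention may need adjusting to match your definition of a week:
SELECT DISTINCT FORMAT_DATE('%Y-%W', day) AS week
FROM UNNEST(
  GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
ORDER BY week;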
all dates from 2015-06-01 till CURRENT_DATE()
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
all weeks between the two dates
SELECT YEAR(DAY) AS y, WEEK(DAY) AS w
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
)
GROUP BY y, w