Get Data at the interval of 10 minutes - sql

I have a table which has the below structure.
+ ----------------------+--------------+--------+---------+
| timeStamp | value | type | id |
+ ----------------------+--------------+--------+---------+
| '2010-01-14 00:00:00' | '11787.3743' | 'mean' | 1 |
| '2018-04-03 00:07:21' | '9.9908' | 'std' | 1 |
| '2018-04-03 00:10:00' | '11787.3743' | 'min' | 1 |
+ ----------------------+--------------+--------+---------+
Now I want to write a select query that fetches the data on the basis of type.
As shown below, I want the data at 10-minute intervals only, and the 'mean_type'/'min_type'/'std_type' columns should be named dynamically, using something like concat(id,'_','mean').
+ ----------------------+--------------+-------------+----------+
| timeStamp | 1_mean | 1_min | 1_std |
+ ----------------------+--------------+-------------+----------+
| '2010-01-14 00:00:00' | '11787.3743' | | |
| '2018-04-03 00:10:00' | | '11787.3743' | |
+ ----------------------+--------------+-------------+----------+
I have used the query below, but it is not working:
select
to_timestamp(floor((extract('epoch' from m.timeStamp) / 600 )) * 600)
AT TIME ZONE 'UTC' as t,
case type when 'mean' then value end as concat(id,'mean'),
case type when 'min' then value end as concat(id,'min'),
case type when 'std' then value end as concat(id,'std'),
from measure m
where id=1
GROUP by t
order by t;
I am using postgres DB.
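For what it's worth, the query above fails for three reasons: an alias cannot be an expression such as concat(id,'mean'), there is a trailing comma before FROM, and the CASE columns are neither aggregated nor listed in GROUP BY. Below is a minimal sketch of one way around this, with the alias names generated in client code before the query is sent. It uses Python's sqlite3 module purely so that it runs anywhere; in PostgreSQL the bucket expression from the query above would take the place of the strftime/datetime pair. The helper name build_pivot_sql, the fixed list of types, and the choice of MAX as the aggregate are assumptions made for illustration, and "10-minute interval" is read here as flooring each timestamp to its 10-minute bucket.

```python
import sqlite3


def build_pivot_sql(measure_id, types=("mean", "min", "std")):
    """Build a pivot query whose aliases look like "1_mean" (hypothetical helper)."""
    measure_id = int(measure_id)  # only ever interpolate a validated integer
    # Aliases must be fixed identifiers when the query is parsed, so they are
    # generated here. `types` must come from a trusted, hard-coded list.
    columns = ",\n           ".join(
        f"MAX(CASE type WHEN '{t}' THEN value END) AS \"{measure_id}_{t}\""
        for t in types
    )
    return f"""
    SELECT datetime(CAST(strftime('%s', timeStamp) AS INTEGER) / 600 * 600,
                    'unixepoch') AS t,
           {columns}
    FROM measure
    WHERE id = ?
    GROUP BY t
    ORDER BY t
    """


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measure (timeStamp TEXT, value REAL, type TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO measure VALUES (?, ?, ?, ?)",
    [
        ("2010-01-14 00:00:00", 11787.3743, "mean", 1),
        ("2018-04-03 00:07:21", 9.9908, "std", 1),
        ("2018-04-03 00:10:00", 11787.3743, "min", 1),
    ],
)

cursor = conn.execute(build_pivot_sql(1), (1,))
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(column_names)
for row in rows:
    print(row)
```

With this reading, the 00:07:21 row lands in the 00:00:00 bucket of the same day. If only rows that sit exactly on a 10-minute boundary are wanted, a filter on the epoch modulo 600 would be needed instead.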

Related

SQL SUM label summary

Is it possible in SQL to add summary rows, for example a sum and a mean, below the data? Something like this:
| 2021-01 | 16 |
| 2020-12 | 15 |
| -------- | -------------- |
| SUM | 31 |
| Mean | 15.5 |
My code:
proc sql;
create table diff as
select today.policy_vintage
, today.number_policy as POLICY_TODAY
, prior.number_policy as POLICY_PRIOR
, today.number_policy - prior.number_policy as DIFFERENCE
, avg(prior.number_policy) as POLICY_MEAN_PRIOR
, today.number_policy - mean(prior.number_policy) as DIFFRENCE_MEAN
from policy_vintage_weekly today
LEFT JOIN
(select *
from _work.POLICY_VINTAGE_WEEKLY
where run_date < today()
having run_date = max(run_date)
) prior
ON today.policy_vintage = prior.policy_vintage
;
quit;
If your table contains:
+------------+-------+
| Date       | value |
+------------+-------+
| 2021-01-00 | 16    |
| 2020-12-00 | 15    |
+------------+-------+
Then this query will get you the result you want:
SELECT * FROM test.test
union all
select 'SUM', sum(value) from test.test
union all
select 'Mean', avg(value) from test.test;
+------------+---------+
| date | value |
+------------+---------+
| 2021-01-00 | 16.0000 |
| 2020-12-00 | 15.0000 |
| SUM | 31.0000 |
| Mean | 15.5000 |
+------------+---------+
4 rows in set (0.000 sec)
Tested on Mariadb 10.6.4
That said, this kind of summary is usually better calculated in whatever client software you are using.
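As a rough illustration of the client-side route, here is a minimal Python sketch that appends the two summary rows after the data has been fetched. The function name with_summary and the hard-coded rows are assumptions; in practice the rows would come from whatever database driver you use.

```python
from statistics import mean


def with_summary(rows):
    """Return the (label, value) rows followed by SUM and Mean rows."""
    values = [value for _, value in rows]
    return rows + [("SUM", sum(values)), ("Mean", mean(values))]


# Stand-in for the result of "SELECT date, value FROM test.test".
data = [("2021-01", 16), ("2020-12", 15)]
report = with_summary(data)

for label, value in report:
    print(f"{label:<8} {value}")
```

Doing it this way keeps the label rows out of the query result, so the date column stays a proper date type in SQL.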

30 day rolling count of distinct IDs

So after looking at what seems to be a common question being asked and not being able to get any solution to work for me, I decided I should ask for myself.
I have a data set with two columns: session_start_time, uid
I am trying to generate a rolling 30-day tally of unique uids.
It is simple enough to query for the number of unique uids over the last 30 days:
SELECT
COUNT(DISTINCT(uid))
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '30 days'
It is also relatively simple to calculate the daily unique uids over a date range:
SELECT
DATE_TRUNC('day',session_start_time) AS "date"
,COUNT(DISTINCT uid) AS "count"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date(session_start_time)
I then tried several ways to do a rolling 30-day unique count over a time interval:
SELECT
DATE(session_start_time) AS "running30day"
,COUNT(distinct(
case when date(session_start_time) >= running30day - interval '30 days'
AND date(session_start_time) <= running30day
then uid
end)
) AS "unique_30day"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '3 months'
GROUP BY date(session_start_time)
Order BY running30day desc
I really thought this would work, but when looking into the results, it appears I'm getting the same results as the daily unique count rather than the unique count over 30 days.
I am writing this query from Metabase using the SQL query editor. The underlying tables are in Redshift.
If you read this far, thank you, your time has value and I appreciate the fact that you have spent some of it to read my question.
EDIT:
As rightfully requested, I added an example of the data set I'm working with and the desired outcome.
+-----+-------------------------------+
| UID | SESSION_START_TIME            |
+-----+-------------------------------+
| 10  | 2020-01-13T01:46:07.000-05:00 |
| 5   | 2020-01-13T01:46:07.000-05:00 |
| 3   | 2020-01-18T02:49:23.000-05:00 |
| 9   | 2020-03-06T18:18:28.000-05:00 |
| 2   | 2020-03-06T18:18:28.000-05:00 |
| 8   | 2020-03-31T23:13:33.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 2   | 2020-08-28T18:23:15.000-04:00 |
| 9   | 2020-08-28T18:23:15.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 8   | 2020-09-15T16:40:29.000-04:00 |
| 3   | 2020-09-21T20:49:09.000-04:00 |
| 1   | 2020-11-05T21:31:48.000-05:00 |
| 6   | 2020-11-05T21:31:48.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 5   | 2020-12-12T04:42:00.000-05:00 |
+-----+-------------------------------+
Below is the result I would like:
+------------+---------------------+
| DATE       | UNIQUE 30 DAY COUNT |
+------------+---------------------+
| 2020-01-13 | 3                   |
| 2020-01-18 | 1                   |
| 2020-03-06 | 3                   |
| 2020-03-31 | 1                   |
| 2020-08-28 | 4                   |
| 2020-09-15 | 2                   |
| 2020-09-21 | 1                   |
| 2020-11-05 | 2                   |
| 2020-12-12 | 2                   |
+------------+---------------------+
Thank you
You can approach this by keeping a counter of when users are counted and then uncounted, 30 (or perhaps 31) days later. Then determine the "islands" of being counted, and aggregate. This involves:
1. Unpivoting the data to get an "enters" count and a "leaves" count for each session.
2. Accumulating the count so that, on each day and for each user, you know whether they are counted or not.
3. Using this to define "islands" of counting, then determining where the islands start and stop and getting rid of all the detritus in between.
4. Doing a cumulative sum over the dates to get the 30-day count.
In SQL, this looks like:
with t as (
select uid, date_trunc('day', session_start_time) as s_day, 1 as inc
from users_sessions
union all
select uid, date_trunc('day', session_start_time) + interval '31 day' as s_day, -1
from users_sessions
),
tt as ( -- increment the ins and outs to determine whether a uid is in or out on a given day
select uid, s_day, sum(inc) as day_inc,
sum(sum(inc)) over (partition by uid order by s_day rows between unbounded preceding and current row) as running_inc
from t
group by uid, s_day
),
ttt as ( -- find the beginning and end of the islands
select tt.uid, tt.s_day,
(case when running_inc > 0 then 1 else -1 end) as in_island
from (select tt.*,
lag(running_inc) over (partition by uid order by s_day) as prev_running_inc,
lead(running_inc) over (partition by uid order by s_day) as next_running_inc
from tt
) tt
where running_inc > 0 and (prev_running_inc = 0 or prev_running_inc is null) or
running_inc = 0 and (next_running_inc > 0 or next_running_inc is null)
)
select s_day,
sum(sum(in_island)) over (order by s_day rows between unbounded preceding and current row) as active_30
from ttt
group by s_day;
Here is a db<>fiddle.
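To make the counter idea easier to follow, here is a minimal pure-Python sketch of the same logic: +1 when a session starts, -1 a fixed number of days later, island boundaries per user, then a running total over the dates. It is only an illustration of the technique, not a translation of the query line by line; the function name active_counts, the 31-day default and the four sample sessions are assumptions.

```python
from datetime import date, timedelta


def active_counts(sessions, window_days=31):
    """Return (day, active users) at every day where the active count changes."""
    # Step 1: unpivot each session into an "enters" and a "leaves" event.
    events = {}
    for uid, day in sessions:
        per_day = events.setdefault(uid, {})
        per_day[day] = per_day.get(day, 0) + 1
        leave = day + timedelta(days=window_days)
        per_day[leave] = per_day.get(leave, 0) - 1

    # Steps 2 and 3: accumulate per user and keep only the island boundaries,
    # i.e. the days where the running count goes 0 -> positive or positive -> 0.
    deltas = {}
    for per_day in events.values():
        running = 0
        for day in sorted(per_day):
            before = running
            running += per_day[day]
            if before == 0 and running > 0:
                deltas[day] = deltas.get(day, 0) + 1
            elif before > 0 and running == 0:
                deltas[day] = deltas.get(day, 0) - 1

    # Step 4: cumulative sum over the boundary dates.
    result, total = [], 0
    for day in sorted(deltas):
        total += deltas[day]
        result.append((day, total))
    return result


sample = [
    (10, date(2020, 1, 13)),
    (5, date(2020, 1, 13)),
    (3, date(2020, 1, 18)),
    (10, date(2020, 1, 20)),
]
counts = active_counts(sample)
for day, active in counts:
    print(day, active)
```

Note that the output includes the days on which users drop out of the window, not only the days that have sessions.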
I'm pretty sure the easier way to do this is with a join. The query below builds a list of the distinct users who had a session on each day and a list of all distinct dates in the data. It then joins the user list to the date list one-to-many and counts the distinct users. The key is the join condition, which matches a range of dates to a single date through a pair of inequalities.
with users as
(select
distinct uid,
date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01'),
dates as
(select
distinct date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01')
select
count(distinct uid),
dates.dt
from users
join
dates
on users.dt >= dates.dt - interval '29 days'
and users.dt <= dates.dt
group by dates.dt
order by dt desc
;
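Here is a small runnable sketch of the range join, using Python's sqlite3 module so it can be tried without a Redshift cluster. The date arithmetic is written with SQLite's date() function, which is an assumption of this sketch; on Redshift you would keep date_trunc and interval arithmetic. The data is the first six rows of the sample from the question, with the UTC offsets dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_sessions (uid INTEGER, session_start_time TEXT)")
conn.executemany(
    "INSERT INTO users_sessions VALUES (?, ?)",
    [
        (10, "2020-01-13 01:46:07"),
        (5, "2020-01-13 01:46:07"),
        (3, "2020-01-18 02:49:23"),
        (9, "2020-03-06 18:18:28"),
        (2, "2020-03-06 18:18:28"),
        (8, "2020-03-31 23:13:33"),
    ],
)

ROLLING_SQL = """
WITH users AS (
    SELECT DISTINCT uid, date(session_start_time) AS dt FROM users_sessions
),
dates AS (
    SELECT DISTINCT date(session_start_time) AS dt FROM users_sessions
)
SELECT dates.dt, COUNT(DISTINCT users.uid) AS unique_30day
FROM dates
JOIN users
  -- each date is matched with every session in the 30 days ending on it
  ON users.dt BETWEEN date(dates.dt, '-29 days') AND dates.dt
GROUP BY dates.dt
ORDER BY dates.dt
"""

rolling = conn.execute(ROLLING_SQL).fetchall()
for dt, unique_30day in rolling:
    print(dt, unique_30day)
```

Only dates that have at least one session appear in the output. If every calendar day is needed, the dates list would have to come from a calendar table or a generated series instead.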

SQL Count In Range

How can I count rows that fall within ranges defined in a configuration table?
Something like this:
CAR_AVBL
+--------+-----------+
| CAR_ID | DATE_AVBL |
+--------+-----------+
| JJ01 | 1 |
| JJ02 | 1 |
| JJ03 | 3 |
| JJ04 | 10 |
| JJ05 | 13 |
| JJ06 | 4 |
| JJ07 | 10 |
| JJ08 | 1 |
| JJ09 | 23 |
| JJ10 | 11 |
| JJ11 | 20 |
| JJ12 | 3 |
| JJ13 | 19 |
| JJ14 | 22 |
| JJ15 | 7 |
+--------+-----------+
ZONE_CFG
+--------+------------+
| DATE | ZONE_DESCR |
+--------+------------+
| 15 | GREEN_ZONE |
| 25 | YELLOW_ZONE|
| 30 | RED_ZONE |
+--------+------------+
Table ZONE_CFG is configurable, so I cannot use static values for this.
The DATE column holds the maximum date for each zone.
This is the result I expect:
+------------+----------+
| ZONE_DESCR | AVBL_CAR |
+------------+----------+
| GREEN_ZONE | 11 |
| YELLOW_ZONE| 4 |
| RED_ZONE | 0 |
+------------+----------+
Could someone please help me with this?
You can use LAG and GROUP BY as follows:
SELECT
ZC.ZONE_DESCR,
COUNT(1) AS AVBL_CAR
FROM
CAR_AVBL CA
JOIN ( SELECT
ZONE_DESCR,
COALESCE(LAG(DATE) OVER(ORDER BY DATE) + 1, 0) AS START_DATE,
DATE AS END_DATE
FROM ZONE_CFG ) ZC
ON ( CA.DATE_AVBL BETWEEN ZC.START_DATE AND ZC.END_DATE )
GROUP BY
ZC.ZONE_DESCR;
Note: Don't use Oracle reserved keywords (DATE, in your case) as column names. Try changing it to something like DATE_ or DATE_START.
Cheers!!
If you want the zero counts as well, I might suggest a correlated subquery instead:
select z.*,
(select count(*)
from car_avbl c
where c.date_avbl > z.start_date and
c.date_avbl <= z.date
) as avbl_car
from (select z.*,
lag(date, 1, 0) over (order by date) as start_date
from zone_cfg z
) z;
In Oracle 12C, you can phrase this using a lateral join:
select z.*,
(c.cnt - lag(c.cnt, 1, 0) over (order by z.date)) as cnt
from zone_cfg z left join lateral
(select count(*) as cnt
from car_avbl c
where c.date_avbl <= z.date
) c
on 1=1
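To check the range logic against the data in the question, here is a minimal sketch run through Python's sqlite3 module (SQLite 3.25 or later is needed for LAG). It uses a LEFT JOIN from the zones so that RED_ZONE shows up with a count of 0. The column name max_date, used in place of the reserved word DATE, is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_avbl (car_id TEXT, date_avbl INTEGER)")
conn.execute("CREATE TABLE zone_cfg (max_date INTEGER, zone_descr TEXT)")

days = [1, 1, 3, 10, 13, 4, 10, 1, 23, 11, 20, 3, 19, 22, 7]
conn.executemany(
    "INSERT INTO car_avbl VALUES (?, ?)",
    [(f"JJ{i:02d}", d) for i, d in enumerate(days, start=1)],
)
conn.executemany(
    "INSERT INTO zone_cfg VALUES (?, ?)",
    [(15, "GREEN_ZONE"), (25, "YELLOW_ZONE"), (30, "RED_ZONE")],
)

ZONE_SQL = """
WITH zones AS (
    SELECT zone_descr,
           max_date,
           -- upper bound of the previous zone, 0 for the first zone
           LAG(max_date, 1, 0) OVER (ORDER BY max_date) AS prev_max
    FROM zone_cfg
)
SELECT z.zone_descr, COUNT(c.car_id) AS avbl_car
FROM zones z
LEFT JOIN car_avbl c
       ON c.date_avbl > z.prev_max AND c.date_avbl <= z.max_date
GROUP BY z.zone_descr, z.max_date
ORDER BY z.max_date
"""

zone_counts = conn.execute(ZONE_SQL).fetchall()
for zone_descr, avbl_car in zone_counts:
    print(zone_descr, avbl_car)
```

COUNT(c.car_id) rather than COUNT(*) is what keeps the empty zone at 0, because the unmatched row produced by the LEFT JOIN has a NULL car_id.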

Arranging the data on the basis of column value

I have a table which has the below structure.
+ ----------------------+--------------+--------+
| timeStamp | value | type |
+ ----------------------+--------------+--------+
| '2010-01-14 00:00:00' | '11787.3743' | 'mean' |
| '2018-04-03 14:19:21' | '9.9908' | 'std' |
| '2018-04-03 14:19:21' | '11787.3743' | 'min' |
+ ----------------------+--------------+--------+
Now I want to write a select query that fetches the data on the basis of type.
+ ----------------------+--------------+-------------+----------+
| timeStamp | mean_type | min_type | std_type |
+ ----------------------+--------------+-------------+----------+
| '2010-01-14 00:00:00' | '11787.3743' | | |
| '2018-04-03 14:19:21' | | | '9.9908' |
| '2018-04-03 14:19:21' | | '11787.3743' | |
+ ----------------------+--------------+-------------+----------+
How can I do this with a query in PostgreSQL? I also want to get the data at 10-minute intervals only.
Use CASE ... WHEN ...:
with my_table(timestamp, value, type) as (
values
('2010-01-14 00:00:00', 11787.3743, 'mean'),
('2018-04-03 14:19:21', 9.9908, 'std'),
('2018-04-03 14:19:21', 11787.3743, 'min')
)
select
timestamp,
case type when 'mean' then value end as mean_type,
case type when 'min' then value end as min_type,
case type when 'std' then value end as std_type
from my_table;
timestamp | mean_type | min_type | std_type
---------------------+------------+------------+----------
2010-01-14 00:00:00 | 11787.3743 | |
2018-04-03 14:19:21 | | | 9.9908
2018-04-03 14:19:21 | | 11787.3743 |
(3 rows)

SQL to group time series meeting a criteria, according to start and end time

I am analyzing power systems time series data, and I am trying to find the contiguous data points that meet a certain boolean flag.
I would like to query this table and return the start and end times corresponding to the inflection points where the value changed from 1 to 0 and from 0 to 1.
How should I go about implementing the pseudo-SQL code below?
SELECT Time
FROM InputTable
WHERE InputTable.Value = 1
INTO OutputTable??, TimeStart??, TimeEnd??;
Input:
+-------+---------+------+
| Index | Time | Value|
+-------+---------+------+
| 0 | 00:00:01| 1 |
| 1 | 00:00:02| 1 |
| 2 | 00:00:03| 1 |
| 3 | 00:00:04| 0 |
| 4 | 00:00:05| 1 |
| 5 | 00:00:06| 1 |
| 6 | 00:00:07| 0 |
| 7 | 00:00:08| 1 |
+-------+---------+------+
Output:
+-------+-----------+----------+
| Index | TimeStart | TimeEnd |
+-------+-----------+----------+
| 0 | 00:00:01 | 00:00:03 |
| 1 | 00:00:05 | 00:00:06 |
| 2 | 00:00:08 | 00:00:08 |
+-------+-----------+----------+
You need to group the values based on adjacent "1"s. This is tricky in MS Access. One method that can be used in Access is to count the number of "0"s (or non-"1" values) before each row.
select ind, min(time), max(time)
from (select t.*,
(select 1 + count(*)
from inputtable as t2
where t2.value = 0 and t2.time < t.time
) as ind
from inputtable as t
) as t
where value = 1
group by ind
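As a quick check of the grouping idea, the sketch below runs the same query shape against the sample input using Python's sqlite3 module. SQLite stands in for Access here, which is an assumption of this sketch, and the Index column is left out because it plays no part in the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputtable (time TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO inputtable VALUES (?, ?)",
    [
        ("00:00:01", 1), ("00:00:02", 1), ("00:00:03", 1), ("00:00:04", 0),
        ("00:00:05", 1), ("00:00:06", 1), ("00:00:07", 0), ("00:00:08", 1),
    ],
)

ISLANDS_SQL = """
SELECT ind, MIN(time) AS time_start, MAX(time) AS time_end
FROM (SELECT t.*,
             -- rows between the same pair of zeros share the same count
             (SELECT 1 + COUNT(*)
              FROM inputtable AS t2
              WHERE t2.value = 0 AND t2.time < t.time) AS ind
      FROM inputtable AS t) AS t
WHERE value = 1
GROUP BY ind
ORDER BY ind
"""

islands = conn.execute(ISLANDS_SQL).fetchall()
for ind, time_start, time_end in islands:
    print(ind, time_start, time_end)
```

The group numbers start at 1 here because of the "1 +" in the subquery; drop it if the output index should start at 0 as in the question.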