SQL - Recursive average based on preceding row (AR model)

I was wondering how to use either While or Recursion to create an AR(1) model.
In my database I have the following variables in one table (Y is a value):
Period   Values
------   ------
20171    Y_0
20172    Y_1
20173    Y_2
20174    Y_3
20181    Y_4
I'm trying to create a query that will create a new column AR which is defined as:
Period   Value   AR
------   -----   ----------------
20171    Y_0     Y_0
20172    Y_1     AVG(AR_0 & Y_1)
20173    Y_2     AVG(AR_1 & Y_2)
such as the following:
[Image of desired dataflow from Excel]
I tried the following:
SELECT Period, Values, Values AS AR
INTO #Beginning
FROM table
WHERE Period = (SELECT MIN(Period) FROM table)

SELECT Period, Values, Values AS AR
FROM #Beginning
UNION ALL
SELECT Period, Values, NULL AS AR
FROM table
WHERE Period > (SELECT MIN(Period) FROM table)
This results in a table whose first row matches the desired output. However, I can't seem to fill in the rest of the AR column, since those values depend on one another; at the moment they are NULL.
Is it possible to use recursion in SQL to create a column where each row depends on one column in the same row and one column in the preceding row?

You would use window functions. For instance:
select period, value,
avg(value) over (order by period rows between 1 preceding and current row)
from t;
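Note that the rolling average above averages the raw values, while the AR column in the question is defined recursively: each row averages the previous AR with the current value. If you need that exact recurrence, a recursive CTE can walk the rows in order. A minimal sketch, assuming a table t(Period, Value) as in the question:

WITH ordered AS (
    SELECT Period, Value,
           ROW_NUMBER() OVER (ORDER BY Period) AS rn
    FROM t
),
ar AS (
    -- anchor: the first period seeds AR with its own value (AR = Y_0)
    SELECT Period, Value, rn, CAST(Value AS float) AS AR
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- each later row averages the previous row's AR with its own value
    SELECT o.Period, o.Value, o.rn, (ar.AR + o.Value) / 2
    FROM ar
    JOIN ordered o ON o.rn = ar.rn + 1
)
SELECT Period, Value, AR
FROM ar
ORDER BY Period;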

Related

Return column with the second largest date in the dataset

I have a simple query that returns the dataset along with a column holding the highest date found:
select *, max(date) over () as date_max
from table
I need to increment this query and insert a new column that returns the second highest date found (it will still contain that column with the highest date). Any idea how to do this?
You can just query for the second largest, like this:
select max(date) from table where date < (select max(date) from table)
You can use this in a subquery in the main query for example. Alternatively, you can get the max date first, and then use it to get the second largest value, also using a window function like you did, like so:
with top1 as (
    select i, max(i) over () as largest
    from d
)
select i, largest
       -- ignore the largest value when aggregating
     , max(case when i = largest then null else i end) over () as second_largest
from top1
Keep in mind both of them will return the actual second largest value, duplicates excluded (so 100, 100, 99 would return 99 as the second largest).
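For completeness, a sketch of the subquery variant mentioned above, keeping the question's placeholder names (table, date):

select *,
       max(date) over () as date_max,
       -- second highest: the max among dates strictly below the overall max
       (select max(t2.date)
        from table t2
        where t2.date < (select max(t3.date) from table t3)) as date_second_max
from table;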

How to compare the value of one row with the upper row in one column of an ordered table?

I have a table in PostgreSQL that contains GPS points from cell phones. It has an integer column that stores the epoch (the number of seconds since 1970). I want to order the table by time (the epoch column), then break the trips into sub trips wherever there is no GPS record for more than 2 minutes.
I did it with GeoPandas; however, it is too slow, and I want to do it inside PostgreSQL. How can I compare each row of the ordered table with the previous row (to see if the epochs differ by 2 minutes or more)?
In short, I do not know how to compare each row with the row above it.
You can use lag():
select t.*
from (select t.*,
             lag(timestamp_epoch) over (partition by trip order by timestamp_epoch) as last_timestamp_epoch
      from t
     ) t
where last_timestamp_epoch < timestamp_epoch - 120
I want to order the table based on time (the epoch column), then break the trips into sub trips when there is no GPS record for more than 2 minutes.
After comparing to the previous (or next) row with the window function lag() (or lead()), form groups based on the gaps to get sub trip numbers:
SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
FROM (
   SELECT *
        , (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
   FROM tbl
   ) sub;
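From there, a usage sketch (same assumed tbl and columns as above) that summarizes each sub trip once the numbers are assigned:

SELECT trip, sub_trip
     , min(timestamp_epoch) AS start_epoch   -- first GPS fix of the sub trip
     , max(timestamp_epoch) AS end_epoch     -- last GPS fix of the sub trip
     , count(*)             AS gps_points
FROM (
   SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
   FROM (
      SELECT *
           , (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
      FROM tbl
      ) gaps
   ) numbered
GROUP BY trip, sub_trip
ORDER BY trip, sub_trip;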
Further reading:
Select longest continuous sequence

Count rows with equal values in a window function

I have a time series in a SQLite Database and want to analyze it.
The important part of the time series consists of a column with different but not unique string values.
I want to do something like this:
Value   concat   countValue
-----   ------   ----------
A       A        1
A       A,A      2
B       A,A,B    1
B       A,B,B    2
B       B,B,B    3
C       B,B,C    1
B       B,C,B    2
I don't know how to get the countValue column. It should count how many values in the window equal the current row's Value.
I tried this, but it just counts all values in the window rather than the ones equal to the current row's Value:
SELECT
    Value,
    group_concat(Value) OVER wind AS concat,
    Sum(Case When Value Like Value Then 1 Else 0 End) OVER wind AS countValue
FROM TimeSeries
WINDOW
    wind AS (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY
    date;
The query also needs to satisfy these constraints:
The query should work with any amount of unique Values
The query should work with any Partition Size (ROWS BETWEEN n PRECEDING AND CURRENT ROW)
Is this even possible using only SQL?
Here is an approach using string functions: removing every occurrence of the current value from the concatenated window shortens the string by length(value) per occurrence, so the length difference divided by length(value) yields the count:
select
    value,
    group_concat(value) over wind as concat,
    (length(group_concat(value) over wind)
     - length(replace(group_concat(value) over wind, value, ''))
    ) / length(value) as cnt_value
from timeseries
window wind as (order by date rows between 2 preceding and current row)
order by date;
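One caveat: the string arithmetic above can miscount when one value is a substring of another (e.g. 'A' and 'AB'). A substring-safe sketch, under the assumption (hypothetical) that the table has a gap-free integer key id that follows the date order:

SELECT t.Value,
       (SELECT count(*)
        FROM TimeSeries p
        WHERE p.id BETWEEN t.id - 2 AND t.id  -- same 2-preceding window, but by key
          AND p.Value = t.Value) AS countValue
FROM TimeSeries t
ORDER BY t.id;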

Generate a random value in a row based on a value from another table

I want to create a large amount of mock data in a table (in PostgreSQL). The schema of the table looks like this:
price float,
id id,
period timestamptz
For price, this will be a random float between 1 and 5.
For id, this will be a value drawn from another table that contains all values in its id column (which may hold many ids).
For period, this will be a random datetime value within a specific range of time.
I want a single query that generates as many rows as there are ids, within a specific range of time that I select.
E.g.
Let's say I have 3 ids (a, b, c) in another table, and I want to generate a time series between 2019-08-01 00:00:00+00 and 2019-08-05 00:00:00+00.
The query should then generate values like this:
price   id   period
-----   --   ----------------------
3.4     b    2019-08-03 10:01:22+00
2.5     a    2019-08-04 05:44:31+00
4.8     c    2019-08-04 14:51:10+00
The price and id are random, and so is the period, though within the specific range. The key thing is that every id needs to appear.
Generating a random number and datetime is not hard, but how can I create a query that generates rows based on all the ids gathered from another table?
P.S. I have edited the example, as the original might have been misleading.
This answers a reasonable interpretation of the original question.
Getting a random value from a second table can be a little tricky. If the second table is not too big, then this works: distinct on (gs.ts) keeps one row per generated timestamp, and ordering by random() within each timestamp picks a random id:
select distinct on (gs.ts)
       gs.ts, ids.id, cast(random() * 4.1 + 1 as numeric(2, 1))
from generate_series('2019-08-01 00:00:00+00'::timestamp,
                     '2019-08-05 00:00:00+00'::timestamp,
                     interval '30 minute') gs(ts)
cross join ids
order by gs.ts, random();
Use the function make_timestamptz, generating a random integer for each part except year and month; this creates random timestamps. As for getting the id from another table, just select from that table.
/*
   Function to generate random integers. (Lots of them are needed.)
*/
create or replace function utl_gen_random_integer(
    int1_in integer,
    int2_in integer)
returns integer
language sql volatile strict
as
$$
    /* Return a random integer between two integers, inclusive;
       the relative order of the two arguments does not matter. */
    with ord as ( select greatest(int1_in, int2_in) as hi
                       , least(int1_in, int2_in)    as low
                )
    select floor(random()*(hi - low + 1) + low)::integer from ord;
$$;
-- create the id source table and populate it
create table id_source( id text );
insert into id_source( id )
with id_range as ( select 'abcdefgh'::text as idl )
select substring(idl, utl_gen_random_integer(1, length(idl)), 1)
from id_range, generate_series(1, 20);
And the generation query:
select trunc((utl_gen_random_integer(1,4) + utl_gen_random_integer(0,100)/100.0), 2) as price
     , id
     , make_timestamptz( 2019                                      -- year
                       , 8                                         -- month
                       , utl_gen_random_integer(1,5)               -- day
                       , utl_gen_random_integer(1,24) - 1          -- hours
                       , utl_gen_random_integer(1,60) - 1          -- minutes
                       , (utl_gen_random_integer(1,60) - 1)::float -- seconds
                       , '+00'
                       ) as period
from id_source;
The result generates the time at UTC (+00); however, Postgres will display it converted to the session's local time zone, with an offset. To view it in UTC, append "at time zone 'UTC'" to the query.
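A minimal sketch of that display conversion, reusing the helper function defined above:

-- AT TIME ZONE 'UTC' converts the timestamptz to a plain timestamp shown at UTC,
-- regardless of the session time zone
select make_timestamptz( 2019, 8
                       , utl_gen_random_integer(1,5)
                       , utl_gen_random_integer(1,24) - 1
                       , utl_gen_random_integer(1,60) - 1
                       , (utl_gen_random_integer(1,60) - 1)::float
                       , '+00'
                       ) at time zone 'UTC' as period_utc;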

Calculate stdev over a variable range in SQL Server

Table format is as follows:
Date       ID    subID   value
------------------------------
7/1/1996   100   1        .0543
7/1/1996   100   2        .0023
7/1/1996   200   1       -.0410
8/1/1996   100   1       -.0230
8/1/1996   200   1        .0121
I'd like to apply STDEV to the value column where date falls within a specified range, grouping on the ID column.
Desired output would look something like this:

DateRange   ID    std_v
-----------------------
1           100   .0232
2           100   .0323
1           200   .0423
One idea I've had that works, but is clunky, involves creating an additional column (which I've called 'partition') to identify a group of values over which STDEV is taken (by applying the OVER function with PARTITION BY to the 'partition' and 'ID' variables).
Creating the partition variable involves a prior CASE statement, where a given record is assigned a partition based on its date falling within a given range, i.e.:
...
, partition = CASE
WHEN date BETWEEN '7/1/1996' AND '10/1/1996' THEN 1
WHEN date BETWEEN '10/1/1996' AND '1/1/1997' THEN 2
...
Ideally, I'd be able to apply STDEV with the OVER function, partitioning on the ID variable and a variable date range (e.g., a trailing 3 months from a given reference date). Once this works for the 3-month period described above, I'd like to make the date range itself variable, creating an additional '#dateRange' variable at the start of the program so I can run this for 2-, 3-, 6-, etc. month ranges.
I ended up finding a solution to my question.
You can join the original table to a second table consisting of a unique list of the dates in the first table, applying a BETWEEN clause to specify the desired range.
Sample query below.
Initial table, with columns (#excessRet):
Date, ID, subID, value
Second table, a unique list of the dates in the first, with one column (#dates):
Date
select d.date, er.id, STDEV(er.value)
from #dates d
inner join #excessRet er
    on er.date between DATEADD(m, -36, d.date) and d.date
group by d.date, er.id
order by er.id, d.date
To achieve the desired next step referenced above (making the range variable), simply create a variable at the outset and replace "36" with it.
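A minimal sketch of that parameterization, using a hypothetical @monthsBack variable against the same temp tables:

DECLARE @monthsBack int = 3;  -- trailing window in months: 2, 3, 6, etc.

select d.date, er.id, STDEV(er.value) as std_v
from #dates d
inner join #excessRet er
    on er.date between DATEADD(m, -@monthsBack, d.date) and d.date
group by d.date, er.id
order by er.id, d.date;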