Rolling Average in sqlite - sql

I want to calculate a rolling average in a table and keep track of the starting time of each calculated window frame.
My problem is, that I expect result count reduced compared of the rows in the table. But my query retuns the exact same row number. I think I understand why it does not work, but I don't know the remedy.
Let's say I have a table with example data that looks like this:
+------+-------+
| Tick | Value |
+------+-------+
| 1 | 1 |
| 2 | 3 |_
| 3 | 5 |
| 4 | 7 |_
| 5 | 9 |
| 6 | 11 |_
| 7 | 13 |
| 8 | 15 |_
| 9 | 17 |
| 10 | 19 |_
+------+-------+
I want to calculate the average of every nth item, for example of two rows (see marks above) so that I get an result of:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 3 | 6 |
| 5 | 10 |
| 7 | 14 |
| 9 | 18 |
+--------------+--------------+
I tried that with
SELECT
FIRST_VALUE(Tick) OVER (
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) OccurredTick,
AVG(Value) OVER (
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) ValueAverage
FROM TableName;
What I get in return is:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| 7 | 14 |
| 8 | 16 |
| 9 | 18 |
| 10 | 19 |
+--------------+--------------+

You could use aggregation. If tick is always increasing with no gaps:
select min(tick), avg(value) avg_value
from mytable
group by cast((tick - 1) / 2 as integer)
You can change 2 to whatever group size suits to best.
If tick are not sequentially increasing, we can generate a sequence with row_number()
select min(tick), avg(value) avg_value
from (
select t.*, row_number() over(order by tick) rn
from mytable t
) t
group by cast((rn - 1) / 2 as integer)

Related

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
lag(sample) over (order by row) as prev_sample
from t
) t;

hive - split a row into multiple rows between the range of values

I have a table below and would like to split the rows by the range from start to end columns.
i.e id and value should repeat for each value between start & end(both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in java/python to get this result but would like to check if I can implement in Hive SQL using any existing hive UDFs
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id,t.value,n.num as current
from tbl t
join numbers n on n.num>=t.start and n.num<=t.end
You can do using posexplode() UDF.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is its Output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+

SQL Windowing accumulative sum with grouping

I've got a table like this
|week_no|value|attribute|
-------------------------
| 1 | 3 | a |
| 2 | 3 | a |
| 3 | 3 | a |
| 1 | 4 | b |
| 2 | 4 | b |
| 3 | 4 | b |
I'd like to have an accumulative account of the value column
|week_no|value|attribute|accum_value|
-------------------------------------
| 1 | 3 | a | 3 |
| 2 | 3 | a | 6 |
| 3 | 3 | a | 9 |
| 1 | 4 | b | 4 |
| 2 | 4 | b | 8 |
| 3 | 4 | b | 12 |
I've attempted doing the above by using this windowing function though it doesn't seem to be returning the correct values
SUM(value) OVER(ORDER BY 1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accum_value
The correct window function would use partition by:
SUM(value) OVER (PARTITION BY attribute ORDER BY week_no
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS accum_value

Alternation of positive and negative values

thank you for attention.
I have a table called "PROD_COST" with 5 fields
(ID,Duration,Cost,COST_NEXT,COST_CHANGE).
I need extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day=1row).
Cost = product price in this day.
Cost_next = lead(cost,1,0).
Cost_change = Cost_next - Cost.
Now i need to group by Cost_change. It can be
positive,negative or 0 values.
+----+---+------+------+------+
| 1 | 1 | 10 | 8,5 | -1,5 |
| 2 | 1 | 8,5 | 12,2 | 3,7 |
| 3 | 1 | 12,2 | 5,3 | -6,9 |
| 4 | 1 | 5,3 | 4,2 | 1,2 |
| 5 | 1 | 4,2 | 6,2 | 2 |
| 6 | 1 | 6,2 | 9,2 | 3 |
| 7 | 1 | 9,2 | 7,5 | -2,7 |
| 8 | 1 | 7,5 | 6,2 | -1,3 |
| 9 | 1 | 6,2 | 6,3 | 0,1 |
| 10 | 1 | 6,3 | 7,2 | 0,9 |
| 11 | 1 | 7,2 | 7,5 | 0,3 |
| 12 | 1 | 7,5 | 0 | 7,5 |
+----+---+------+------+------+`
I need to make a query, which will group it by first negative or positive value (+ - + - + -). Last one field is what i want.
Sorry for my English `
+----+---+------+------+------+---+
| 1 | 1 | 10 | 8,5 | -1,5 | 1 |
| 2 | 1 | 8,5 | 12,2 | 3,7 | 2 |
| 3 | 1 | 12,2 | 5,3 | -6,9 | 3 |
| 4 | 1 | 5,3 | 4,2 | 1,2 | 4 |
| 5 | 1 | 4,2 | 6,2 | 2 | 4 |
| 6 | 1 | 6,2 | 9,2 | 3 | 4 |
| 7 | 1 | 9,2 | 7,5 | -2,7 | 5 |
| 8 | 1 | 7,5 | 6,2 | -1,3 | 5 |
| 9 | 1 | 6,2 | 6,3 | 0,1 | 6 |
| 10 | 1 | 6,3 | 7,2 | 0,9 | 6 |
| 11 | 1 | 7,2 | 7,5 | 0,3 | 6 |
| 12 | 1 | 7,5 | 0 | 7,5 | 6 |
+----+---+------+------+------+---+`
If you're in SQL Server 2012 you can use the window functions to do this:
select
id, COST_CHANGE, sum(GRP) over (order by id asc) +1
from
(
select
*,
case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
over (order by id asc),COST_CHANGE)) then 1 else 0 end as GRP
from
PROD_COST
) X
Lag will get the value from previous row, check the sign of it and compare it to the current row. If the values don't match, the case will return 1. The outer select will do a running total of these numbers, and every time there is 1, it will increase the total.
It is possible to use the same logic with older versions too, you'll just have to fetch the previous row from the table using the id and do running total by re-calculating all rows before the current one.
Example in SQL Fiddle
James's answer is close but it doesn't handle the zero value correctly. This is a pretty easy modification. One tricky approximation uses differences between product changes:
select id, COST_CHANGE, sum(IsNewGroup) over (order by id asc) +1
from (select pc.*,
(case when sign(cost_change) - sign(lag(cost_change) over (order by id)) between -1 and 1
then 0
else 1 -- `NULL` intentionally goes here
end) IsNewGroup
from Prod_Cost
) pc
For clarity, here is a SQL Fiddle with zero values. From my understanding of the question, the OP only wants an actual sign change.
This may still not be correct. The OP simply is not clear about what to do about 0 values.

Oracle rank function issue

Iam experiencing an issue in oracle analytic functions
I want the rank in oracle to be displayed sequentialy but require a cyclic fashion.But this ranking should happen within a group.
Say I have 10 groups
In 10 groups each group must be ranked in till 9. If greater than 9 the rank value must start again from 1 and then end till howmuch so ever
emp id date1 date 2 Rank
123 13/6/2012 13/8/2021 1
123 14/2/2012 12/8/2014 2
.
.
123 9/10/2013 12/12/2015 9
123 16/10/2013 15/10/2013 1
123 16/3/2014 15/9/2015 2
In the above example the for the group of rows of the empid 123 i have split the rank in two subgroup fashion.Sequentially from 1 to 9 is one group and for the rest of the rows the rank again starts from 1.How to achieve this in oracle rank functions.
as per suggestion from Egor Skriptunoff above:
select
empid, date1, date2
, row_number() over(order by date1, date2) as "rank"
, mod(row_number() over(order by date1, date2)-1, 9)+1 as "cycle_9"
from yourtable
example result
| empid | date1 | date2 | rn | ranked |
|-------|----------------------|----------------------|----|--------|
| 72232 | 2016-10-26T00:00:00Z | 2017-03-07T00:00:00Z | 1 | 1 |
| 04365 | 2016-11-03T00:00:00Z | 2017-07-29T00:00:00Z | 2 | 2 |
| 79203 | 2016-12-15T00:00:00Z | 2017-05-16T00:00:00Z | 3 | 3 |
| 68638 | 2016-12-18T00:00:00Z | 2017-02-08T00:00:00Z | 4 | 4 |
| 75784 | 2016-12-24T00:00:00Z | 2017-11-18T00:00:00Z | 5 | 5 |
| 72836 | 2016-12-24T00:00:00Z | 2018-09-10T00:00:00Z | 6 | 6 |
| 03679 | 2017-01-24T00:00:00Z | 2017-10-14T00:00:00Z | 7 | 7 |
| 43527 | 2017-02-12T00:00:00Z | 2017-01-15T00:00:00Z | 8 | 8 |
| 03138 | 2017-02-26T00:00:00Z | 2017-01-30T00:00:00Z | 9 | 9 |
| 89758 | 2017-03-29T00:00:00Z | 2018-04-12T00:00:00Z | 10 | 1 |
| 86377 | 2017-04-14T00:00:00Z | 2018-10-07T00:00:00Z | 11 | 2 |
| 49169 | 2017-04-28T00:00:00Z | 2017-04-21T00:00:00Z | 12 | 3 |
| 45523 | 2017-05-03T00:00:00Z | 2017-05-07T00:00:00Z | 13 | 4 |
SQL Fiddle