How to calculate a value based on the previous row's value - sql

Having the following fields in a table...
+---------+---+---+
| myTime | x | y |
+---------+---+---+
| 13:00 | 0 | 0 |
| 13:05 | 2 | 1 |
| 13:10 | 4 | 2 |
| 13:15 | 1 | 3 |
+---------+---+---+
I need to generate a third one (z) as follows...
+---------+---+---+---+
| myTime | x | y | z |
+---------+---+---+---+
| 13:00 | 0 | 0 | 0 |
| 13:05 | 2 | 1 | 1 |
| 13:10 | 4 | 2 | 3 |
| 13:15 | 1 | 3 | 1 |
+---------+---+---+---+
In the first row z is 0; in every later row z is calculated as x - y + (previous row's) z.
I've tried using the row number for each record and LAG to try reading values from previous rows...
WITH rows_sorted AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY myTime) AS row_num
FROM table)
SELECT myTime, x, y,
IF(row_num = 1, 0, x - y + LAG(z, 1) OVER (ORDER BY row_num)) AS z
FROM rows_sorted
ORDER BY row_num
...but this evidently doesn't work, because z has not been generated yet when LAG(z, 1) is evaluated.
Any suggestions on how this can be done? I'm using standard SQL in Google BigQuery.
Thanks in advance
Since the text above oversimplifies the real calculation, here's a closer approach to what I need to achieve:
+---------+----+----+----+
| myTime | x | y | z |
+---------+----+----+----+
| 13:00 | 15 | 22 | 0 |
| 13:05 | 7 | 21 | 0 |
| 13:10 | 7 | 5 | 2 |
| 13:15 | 9 | 16 | 0 |
| 13:20 | 14 | 5 | 9 |
+---------+----+----+----+
Where z for each row is calculated as follows:
WHEN row_number() = 1 THEN z = 0 (already achieved thanks to the answer below)
WHEN x+(previous row's)z < y THEN z = 0
WHEN x+(previous row's)z >= y THEN z = x+(previous row's)z - y

Hmmm . . . You can get what you want using:
select t.*,
sum(x - y) over (order by mytime) as z
from t;
The first row has values of 0 for all the columns, so this works for your sample data. If you wanted to explicitly set it to 0, then:
select t.*,
(case when row_number() over (order by mytime) = 1
then 0
else sum(x - y) over (order by mytime) - first_value(x - y) over (order by mytime)
end) as z
from t;
This subtracts out the value from the first row from the cumulative sum. However, that seems unnecessary.
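For the updated rule in the question, where z is clamped at 0, a plain cumulative sum is no longer enough, because each z depends on the previous z after clamping. The Python sketch below is not BigQuery code; it only replays both rules row by row so the expected z columns can be checked. The function names running_z and clamped_z are made up for this illustration.

```python
from itertools import accumulate

def running_z(xs, ys):
    """z = x - y + previous z, i.e. the cumulative sum of (x - y)."""
    return list(accumulate(x - y for x, y in zip(xs, ys)))

def clamped_z(xs, ys):
    """First row is 0; afterwards z = max(0, x + previous z - y)."""
    zs = []
    prev = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        z = 0 if i == 0 else max(0, x + prev - y)
        zs.append(z)
        prev = z
    return zs

# First sample table from the question.
simple = running_z([0, 2, 4, 1], [0, 1, 2, 3])
# Second sample table (the "closer approach").
clamped = clamped_z([15, 7, 7, 9, 14], [22, 21, 5, 16, 5])
print(simple)
print(clamped)
```

Both lists match the z columns in the question. The clamped variant carries state from row to row, which is why a single window aggregate does not express it directly.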

Related

How to split 2 numbers into the equal ranges in PostgreSQL?

In PostgreSQL database I have table called layers. It looks like this:
| ID | TOTAL_SUBSCRIBERS | DENSITY |
|----|-------------------|---------|
| 1 | 34440 | |
| 2 | 41994 | |
| 3 | 102824 | |
| 4 | 19608 | |
| 5 | 1287 | |
| 6 | 4944 | |
I found max and min values of the TOTAL_SUBSCRIBERS column.
select
MIN(total_subscribers),
MAX(total_subscribers)
from
layers;
Now I need to split the interval between the min and max into 6 ranges and determine which range each TOTAL_SUBSCRIBERS value falls into. The number of that range should be written to the DENSITY column.
For example in this table max value is 102824, min value is 1287.
RANGES:
102824 - 1287 = 101537
101537 / 6 = 16922.8333 ~ 16923
1 range: [1287-18210]
2 range: [18211-35133]
3 range: [35134-52056]
4 range: [52057-68979]
5 range: [68980-85902]
6 range: [85903-102825]
FINAL RESULT:
| ID | TOTAL_SUBSCRIBERS | DENSITY |
|----|-------------------|---------|
| 1 | 34440 | 3 | < 34440 in 3 range
| 2 | 41994 | 3 | < 41994 in 3 range
| 3 | 102824 | 6 | < 102824 in 6 range
| 4 | 19608 | 2 | < 19608 in 2 range
| 5 | 1287 | 1 | < 1287 in 1 range
| 6 | 4944 | 1 | < 4944 in 1 range
In a CTE calculate the min and max of TOTAL_SUBSCRIBERS and also the length of each interval and then cross join to the table to make the calculation:
with cte as (
select
min(TOTAL_SUBSCRIBERS) minsub,
((max(TOTAL_SUBSCRIBERS) - min(TOTAL_SUBSCRIBERS)) + 1) / 6 dist
from layers
)
select l.*,
(l.TOTAL_SUBSCRIBERS - c.minsub) / c.dist + 1 DENSITY
from layers l cross join (select * from cte) c
See the demo.
Results:
| id | total_subscribers | density |
| --- | ----------------- | ------- |
| 1 | 34440 | 2 |
| 2 | 41994 | 3 |
| 3 | 102824 | 6 |
| 4 | 19608 | 2 |
| 5 | 1287 | 1 |
| 6 | 4944 | 1 |
In your expected results the row with id = 1 should have DENSITY = 2, right?
Also your ranges should be:
1 range: [1287-18209]
2 range: [18210-35132]
3 range: [35133-52055]
4 range: [52056-68978]
5 range: [68979-85901]
6 range: [85902-102824]
so they are equally distributed.
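The arithmetic of the query above can be replayed outside the database. This is a minimal Python sketch, assuming integer division as PostgreSQL applies it to integer columns; the helper names density and ranges are hypothetical.

```python
def density(values, buckets=6):
    """Bucket number per value: (value - min) // width + 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo + 1) // buckets  # same as the "dist" column in the CTE
    return [(v - lo) // width + 1 for v in values]

def ranges(values, buckets=6):
    """Inclusive [start, end] bounds of each bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo + 1) // buckets
    return [(lo + k * width, lo + (k + 1) * width - 1) for k in range(buckets)]

subscribers = [34440, 41994, 102824, 19608, 1287, 4944]
result = density(subscribers)
bounds = ranges(subscribers)
print(result)
print(bounds)
```

In this sample the span of 101538 values divides evenly by 6. When it does not, the maximum can fall just past the last bucket, so the result may need to be capped at the bucket count.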
I believe a combination of generate_series and an int4range containment operation might be what you're looking for. The following code is tested on PostgreSQL 11 - see db fiddle, but should also work with 9.4+.
Sample data
CREATE TEMPORARY TABLE layers (id SERIAL, total_subscribers INT);
INSERT INTO layers (total_subscribers)
VALUES (34440),(41994),(102824),(19608),(1287),(4944);
Query
SELECT id, total_subscribers,range_id AS density,range_min,range_max
FROM layers,(
WITH j AS (
SELECT min(total_subscribers) AS min_value,
max(total_subscribers) AS max_value,
(max(total_subscribers)-min(total_subscribers))/count(*) AS var
FROM layers)
SELECT
generate_series(1,(SELECT count(*) FROM layers)) AS range_id,
generate_series(j.min_value, j.max_value-var, var)::INT AS range_min,
generate_series(j.min_value+var, j.max_value+var, var+min_value)::INT AS range_max
FROM j) j
WHERE layers.total_subscribers <@ int4range(j.range_min, range_max)
ORDER BY id;
id | total_subscribers | density | range_min | range_max
----+-------------------+---------+-----------+-----------
1 | 34440 | 2 | 18209 | 36418
2 | 41994 | 3 | 35131 | 54627
3 | 102824 | 6 | 85897 | 109254
4 | 19608 | 2 | 18209 | 36418
5 | 1287 | 1 | 1287 | 18209
6 | 4944 | 1 | 1287 | 18209
(6 rows)
Further reading: Common Table Expressions (CTE)
First select the minimum value and the width of each range, (max - min) / 6, from layers as a derived table b.
select min(TOTAL_SUBSCRIBERS) as M,
(max(TOTAL_SUBSCRIBERS)-min(TOTAL_SUBSCRIBERS))/6 as R from layers b
Then select all rows from layers, subtract the minimum from TOTAL_SUBSCRIBERS, divide by the range width and add 1 to get the range (1-6) each TOTAL_SUBSCRIBERS belongs to. With integer division the maximum value would land in range 7 here (101537 / 16922 = 6, plus 1), so the result is capped at 6 with least().
select a.*, least(((a.TOTAL_SUBSCRIBERS-b.M)/b.R)+1, 6) as DENSITY from(
select layers.ID ,layers.TOTAL_SUBSCRIBERS from layers )a,
(select min(TOTAL_SUBSCRIBERS) as M,
(max(TOTAL_SUBSCRIBERS)-min(TOTAL_SUBSCRIBERS))/6 as R from layers) b

Multiply with Previous Value in Oracle SQL

It's easy to multiply (or sum/divide/etc.) with the previous row in an Excel spreadsheet; however, I have not managed to do it in Oracle SQL so far.
A B C
199901 3.81 51905
199902 -6.09 48743.9855
199903 4.75 51059.32481
199904 6.39 54322.01567
199905 -2.35 53045.4483
199906 2.65 54451.15268
199907 1.1 55050.11536
199908 -1.45 54251.88869
199909 0 54251.88869
199910 4.37 56622.69622
Above, column B is static and column C has the formula as:
((B2/100)+1)*C1
((B3/100)+1)*C2
((B4/100)+1)*C3
Example: 51905 from row 1 multiplied with -6.09 from row 2:
((-6.09/100)+1)*51905
I have been trying analytic and window functions, but have not succeeded yet. The LAG function can return the previous row's value in the current row, but not the previous row's calculated value.
This can be done with the help of the MODEL clause:
select *
FROM (
SELECT t.*,
row_number() over (order by a) as rn
from table1 t
)
MODEL
DIMENSION BY (rn)
MEASURES ( A, B, 0 c )
RULES (
c[rn=1] = 51905, -- value in the first row
c[rn>1] = round( c[cv()-1] * (b[cv()]/100 +1), 6 )
)
;
Demo: http://sqlfiddle.com/#!4/9756ed/11
| RN | A | B | C |
|----|--------|-------|--------------|
| 1 | 199901 | 3.81 | 51905 |
| 2 | 199902 | -6.09 | 48743.9855 |
| 3 | 199903 | 4.75 | 51059.324811 |
| 4 | 199904 | 6.39 | 54322.015666 |
| 5 | 199905 | -2.35 | 53045.448298 |
| 6 | 199906 | 2.65 | 54451.152678 |
| 7 | 199907 | 1.1 | 55050.115357 |
| 8 | 199908 | -1.45 | 54251.888684 |
| 9 | 199909 | 0 | 54251.888684 |
| 10 | 199910 | 4.37 | 56622.696219 |
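The rule c[n] = c[n-1] * (b[n]/100 + 1) is a running product seeded with the first row's value. The Python sketch below only reproduces the arithmetic of the table above so the numbers can be checked; it is not Oracle code, it skips the per-row rounding to 6 decimals, and the variable names are chosen for this illustration.

```python
from itertools import accumulate

# Column B from the question; the first entry is not used because
# the first row of C is given as the starting value.
b = [3.81, -6.09, 4.75, 6.39, -2.35, 2.65, 1.1, -1.45, 0, 4.37]
start = 51905

# Each step multiplies the previous C by (1 + B/100) of the current row.
c = list(accumulate(b[1:], lambda prev, pct: prev * (1 + pct / 100), initial=start))

for value in c:
    print(round(value, 6))
```

Because nothing is rounded between steps here, the last digits can differ slightly from the MODEL output.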

Oracle SQL - Summing a time series partition based on month

I am working with a data set that looks like this:
MTD | ID | Active
-----------------------
01-APR-16 | A | y
01-MAY-16 | A | y
01-JUN-16 | A | n
01-JUL-16 | A | y
01-AUG-16 | A | n
01-APR-16 | B | n
01-MAY-16 | B | y
01-JUN-16 | B | y
01-JUL-16 | B | y
01-AUG-16 | B | y
I would like to add a count column to the data set that counts the number of times an ID has been active ('y') AFTER the current MTD. The desired output is:
MTD | ID | Active | COUNT
-------------------------------
01-APR-16 | A | y | 2
01-MAY-16 | A | y | 1
01-JUN-16 | A | n | 1
01-JUL-16 | A | y | 0
01-AUG-16 | A | n | 0
01-APR-16 | B | n | 4
01-MAY-16 | B | y | 3
01-JUN-16 | B | y | 2
01-JUL-16 | B | y | 1
01-AUG-16 | B | y | 0
The query I am thinking of is:
SELECT
MTD,
ID,
ACTIVE,
SUM(CASE WHEN MTD > (current records MTD)
AND ACTIVE = 'y' THEN 1 ELSE 0 END)
OVER (PARTITION BY ID)
as COUNT
I'm not sure how to compare each record's MTD to the current record's MTD in the window sum. How can I amend the first line of the case statement?
Thank you,
Ryan Barker
Use count() over() with a range specification so you look at the rows following the current row (for each id) for an active flag y and only count them. This assumes mtd is a date column for the ordering to work.
SELECT
MTD,
ID,
ACTIVE,
COUNT(case when active='y' then 1 end) OVER(partition by ID order by mtd range between 1 following and unbounded following)
FROM your_table
Sample Demo
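To me, it looks like you want to count the rows with a "y" in reverse order and leave out the current row. Something like this:
select t.*,
sum(case when active = 'y' then 1 else 0 end) over (partition by id order by mtd desc)
- (case when active = 'y' then 1 else 0 end) as cnt
from t;
The running sum includes the current row, so the current row's own contribution is subtracted again. Your idea is quite close. You just need an order by in the window specification.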
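To check the desired COUNT column independently of any window syntax, the rule "number of 'y' rows strictly after the current MTD, per ID" can be replayed in a few lines. This is a minimal Python sketch; it assumes one row per (MTD, ID), uses ISO date strings so that string order equals date order, and active_after is a hypothetical helper name.

```python
from collections import defaultdict

rows = [
    ("2016-04-01", "A", "y"), ("2016-05-01", "A", "y"), ("2016-06-01", "A", "n"),
    ("2016-07-01", "A", "y"), ("2016-08-01", "A", "n"),
    ("2016-04-01", "B", "n"), ("2016-05-01", "B", "y"), ("2016-06-01", "B", "y"),
    ("2016-07-01", "B", "y"), ("2016-08-01", "B", "y"),
]

def active_after(rows):
    """For each row, count the 'y' rows of the same id with a later mtd."""
    by_id = defaultdict(list)
    for mtd, id_, active in rows:
        by_id[id_].append((mtd, active))
    result = {}
    for id_, items in by_id.items():
        items.sort()
        later_active = 0
        # Walk from the latest month backwards, like ORDER BY mtd DESC.
        for mtd, active in reversed(items):
            result[(mtd, id_)] = later_active
            if active == "y":
                later_active += 1
    return [result[(mtd, id_)] for mtd, id_, _ in rows]

counts = active_after(rows)
print(counts)
```

The count is recorded before the current row's flag is added, which is what excludes the current row.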

Window running function except current row

I have a theoretical question, so I'm not interested in alternative solutions. Sorry.
Q: Is it possible to get the window running function values for all previous rows, except current?
For example:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over (partition by x order by i) - y as sum,
max(y) over (partition by x order by i) as max,
count(*) filter (where y > 2) over (partition by x order by i) as cnt
from
t;
Actual result is
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | 1 | 0
2 | 1 | 3 | 1 | 3 | 1
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | 4 | 1
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 8 | 2
(6 rows)
I want the max and cnt columns to behave like the sum column, so the result should be:
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | | 0
2 | 1 | 3 | 1 | 1 | 0
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | | 0
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 4 | 1
(6 rows)
It can be achieved using simple subquery like
select t.*, lag(y,1) over (partition by x order by i) as yy from t
but is it possible using only window function syntax, without subqueries?
Yes, you can. This does the trick:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over w as sum,
max(y) over w as max,
count(*) filter (where y > 2) over w as cnt
from t
window w as (partition by x order by i
rows between unbounded preceding and 1 preceding);
The frame_clause selects just those rows from the window frame that you are interested in.
Note that in the sum column you'll get null rather than 0 because of the frame clause: the first row in the frame has no row before it. You can coalesce() this away if needed.
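What the frame "rows between unbounded preceding and 1 preceding" means can be illustrated without a database: each row aggregates only the rows seen before it in its partition. This is a minimal Python sketch of that behaviour, with None standing in for SQL null; previous_rows_aggregates is a hypothetical name.

```python
from itertools import groupby

rows = [(1, 1, 1), (2, 1, 3), (3, 1, 2), (4, 2, 4), (5, 2, 2), (6, 2, 8)]

def previous_rows_aggregates(rows):
    """Per partition x, ordered by i: sum, max and count(y > 2) over earlier rows only."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        seen = []  # y values of the rows preceding the current row
        for i, x, y in group:
            out.append((
                i, x, y,
                sum(seen) if seen else None,    # sum over an empty frame is null
                max(seen) if seen else None,    # max over an empty frame is null
                sum(1 for v in seen if v > 2),  # count over an empty frame is 0
            ))
            seen.append(y)
    return out

result = previous_rows_aggregates(rows)
for row in result:
    print(row)
```

The first row of each partition has an empty frame, which is where the null sum and max come from, while the count stays at 0.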
SQLFiddle

BigQuery: running last value and table join

Table_1 is my Sales table:
Time | item | ...
-----------------
1 | X | ...
1 | Y | ...
2 | X | ...
4 | X | ...
6 | X | ...
6 | Y | ...
Table_2 is my Cost table
Time | item | Cost
-----------------
1 | X | a
1 | Y | b
3 | X | c
4 | X | d
4 | Y | e
5 | X | f
What I'm trying to achieve is:
For each row in Table_1, get the latest Cost value from Table_2 (i.e. the one with the greatest Time that is at most the Table_1 row's Time).
The result should look like this:
Time | item | ... | Cost
------------------------
1 | X | ... | a
1 | Y | ... | b
2 | X | ... | a
4 | X | ... | d
6 | X | ... | f
6 | Y | ... | e
(I know it's straightforward in traditional SQL using a subquery in the SELECT list or inequality joins, but BigQuery doesn't allow it)
Try below:
SELECT sales.time AS [time], sales.item AS item, cost
FROM (
SELECT sales.item, sales.time, cost,
cost.time - sales.time AS delta,
ROW_NUMBER() OVER(PARTITION BY sales.item, sales.time ORDER BY delta DESC) AS win
FROM Table_1 as sales
LEFT JOIN Table_2 as cost
ON sales.item = cost.item
WHERE cost.time - sales.time <= 0
)
WHERE win = 1
ORDER BY 1, 2
This should give you exactly the result you expect:
time item cost
1 X a
1 Y b
2 X a
4 X d
6 X f
6 Y e
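The lookup itself, "the cost with the greatest time not after the sale", can be sketched in Python to confirm the expected table. This is an illustration only, not BigQuery code; latest_cost is a hypothetical helper, and unlike the query above it keeps sales that have no earlier cost and reports None for them.

```python
from bisect import bisect_right
from collections import defaultdict

sales = [(1, "X"), (1, "Y"), (2, "X"), (4, "X"), (6, "X"), (6, "Y")]
costs = [(1, "X", "a"), (1, "Y", "b"), (3, "X", "c"),
         (4, "X", "d"), (4, "Y", "e"), (5, "X", "f")]

def latest_cost(sales, costs):
    """For each sale, return the cost with the greatest time <= the sale's time."""
    times = defaultdict(list)
    values = defaultdict(list)
    for t, item, cost in sorted(costs):  # ascending time within each item
        times[item].append(t)
        values[item].append(cost)
    result = []
    for t, item in sales:
        pos = bisect_right(times[item], t)  # number of cost rows with time <= t
        result.append((t, item, values[item][pos - 1] if pos else None))
    return result

joined = latest_cost(sales, costs)
for row in joined:
    print(row)
```

Sorting the costs once and using a binary search per sale avoids comparing every sale with every cost row, which is what the join plus ROW_NUMBER() filter does.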