Dealing with consecutive-row calculations - SQL

Assume the following situation:
Week 1:
0 previous cases
10 new cases
3 resolved cases
Week 2:
7 previous cases
13 new cases
15 resolved cases
Week 3:
5 previous cases
6 new cases
7 resolved cases
This information is stored in a summary table of this sort:
RESUME_TABLE:
WEEK | TOTAL_NEW | TOTAL_SOLVED
1 | 10 | 3
2 | 13 | 15
3 | 6 | 7
I am having a hard time building a query to obtain the following result:
REPORT_TABLE:
WEEK | PREV_TOTAL | NEW_CASES | SOLVED_CASES | NEW_TOTAL
1 | 0 | 10 | 3 | 7
2 | 7 | 13 | 15 | 5
3 | 5 | 6 | 7 | 4
The idea seems pretty trivial (NEW_TOTAL = PREV_TOTAL + NEW_CASES - SOLVED_CASES), but I have been struggling with how to carry PREV_TOTAL over to the next row in order to go on.
I am trying to do it using a view over the RESUME table (Oracle 11g).
Can someone help me with some example code?

Pretty simple and neat with analytic functions:
select week
      ,lag(total_cases_by_now - total_solved_by_now) over (order by week) prev_total
      ,total_new    new_cases
      ,total_solved solved_cases
      ,total_cases_by_now - total_solved_by_now new_total
from (
      select week
            ,total_new
            ,total_solved
            ,sum(total_new)    over (order by week asc) as total_cases_by_now
            ,sum(total_solved) over (order by week asc) as total_solved_by_now
      from resume_table
     );

WEEK  PREV_TOTAL  NEW_CASES  SOLVED_CASES  NEW_TOTAL
----  ----------  ---------  ------------  ---------
   1                     10             3          7
   2           7         13            15          5
   3           5          6             7          4

3 rows selected.
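Note that LAG() has no prior row for week 1, so PREV_TOTAL comes back NULL there rather than the 0 shown in the desired REPORT_TABLE. If a zero is wanted, a minimal tweak is to wrap the LAG() call in NVL():
,nvl(lag(total_cases_by_now - total_solved_by_now) over (order by week), 0) prev_total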

You can solve this with the MODEL clause:
with resume_table as
(
select 1 week, 10 total_new, 3 total_solved from dual union all
select 2 week, 13 total_new, 15 total_solved from dual union all
select 3 week, 6 total_new, 7 total_solved from dual
)
select week, prev_total, total_new, total_solved, new_total
from resume_table
model
dimension by (week)
measures (total_new, total_solved, 0 prev_total, 0 new_total)
rules sequential order
(
new_total[any] order by week =
nvl(new_total[cv(week)-1], 0) + total_new[cv()] - total_solved[cv()]
,prev_total[any] order by week = nvl(new_total[cv(week)-1], 0)
)
order by week;
Note that this assumes WEEK is always a consecutive number. If that's not true, you will want to add a row_number(), as in the sketch below; otherwise the -1 may not reference the previous row.
See this SQL Fiddle.
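If the weeks can have gaps, a sketch of that row_number() variant might look like the following (untested; rn is simply a generated consecutive number used as the MODEL dimension in place of week):
with numbered as
(
  select r.*, row_number() over (order by week) rn
  from resume_table r
)
select week, prev_total, total_new, total_solved, new_total
from numbered
model
dimension by (rn)
measures (week, total_new, total_solved, 0 prev_total, 0 new_total)
rules sequential order
(
new_total[any] order by rn =
nvl(new_total[cv(rn)-1], 0) + total_new[cv()] - total_solved[cv()]
,prev_total[any] order by rn = nvl(new_total[cv(rn)-1], 0)
)
order by week;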

Add one column in RESUME_TABLE (or create a view, which I think may be better):
RESUME_LEFT
WEEK | CASES_LEFT
1 | 7
2 | -2
3 | -1
Something like this:
CREATE VIEW resume_left AS
SELECT week, total_new - total_solved AS cases_left
FROM resume_table;
So in REPORT_TABLE, you can have a definition like this:
PREV_TOTAL = (SELECT SUM(cases_left) FROM RESUME_LEFT WHERE week < REPORT_TABLE.week)
Edit
OK, the view is unnecessary:
PREV_TOTAL=(SELECT sum(total_new)-sum(total_solved)
FROM resume_table
WHERE week<REPORT_TABLE.week)
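Putting it together, a sketch of the whole report as one query over RESUME_TABLE (untested; NEW_TOTAL just extends the same correlated sum to include the current week):
SELECT r.week,
       NVL((SELECT SUM(r2.total_new) - SUM(r2.total_solved)
            FROM resume_table r2
            WHERE r2.week < r.week), 0) AS prev_total,
       r.total_new AS new_cases,
       r.total_solved AS solved_cases,
       (SELECT SUM(r3.total_new) - SUM(r3.total_solved)
        FROM resume_table r3
        WHERE r3.week <= r.week) AS new_total
FROM resume_table r
ORDER BY r.week;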

Related

How to combine and sum consecutive values until a new value in a column

I need some help with summing subsequent values of a column based on the category in another column, until that category changes to a new value. Here's what my data looks like:
id | site_id | date_id | hour_id | location_id | status | status_minutes
1 1 20210101 1 1 Offline 60
2 1 20210101 2 1 Offline 57
3 1 20210101 2 1 Available 3
4 1 20210101 3 1 Available 20
5 1 20210101 3 1 Offline 40
... ... ... ... ... ... ...
25 1 20210101 23 1 Offline 60
26 1 20210102 0 1 Offline 23
As you can see, the above data is at an hourly level, so if the status_minutes column equals 60, there is just one row for that hour. If not, the status minutes are spread across rows that add up to 60, as in rows 2 and 3, and in rows 4 and 5.
Now, my goal is to measure the stretches of time each status was going on until the next status kicked in. So the output for the example above would be:
site_id | date_id | location_id | status | status_minutes
1 20210101 1 Offline 117
1 20210101 1 Available 23
1 20210101 1 Offline 40
... ... ... ... ...
1 20210101 1 Offline 60
1 20210102 1 Offline 23
The important part is that this operation should be confined within each day, as seen in the last two rows of the example and the output. So the summing happens only within a given day, and then starts again with the 0th hour of the next day.
This is a gaps-and-islands problem. The section_num below is used to determine the groups before summing status_minutes.
You may try the following:
SELECT
site_id,
date_id,
location_id,
status,
SUM(status_minutes) as status_minutes
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY site_id,date_id,location_id
ORDER BY hour_id
) - ROW_NUMBER() OVER (
PARTITION BY site_id,date_id,location_id,status
ORDER BY hour_id
) as section_num
FROM
my_table
) t
GROUP BY
site_id,
date_id,
location_id,
status,
section_num
ORDER BY
site_id,
date_id,
location_id,
section_num
View working demo on db fiddle
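One caveat: rows 2 and 3 of the sample share the same hour_id, so ordering by hour_id alone leaves the two ROW_NUMBER() calls free to break that tie differently. If the table has an id column like the sample, adding it as a tiebreaker keeps the numbering deterministic, for example:
ROW_NUMBER() OVER (
    PARTITION BY site_id,date_id,location_id
    ORDER BY hour_id, id
) - ROW_NUMBER() OVER (
    PARTITION BY site_id,date_id,location_id,status
    ORDER BY hour_id, id
) as section_num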

calculate avg(value) for last 10 records postgresql

I have a tricky task.
Let's assume we have a table "Racings" with columns TRACK, CAR, CIRCLE_TIME.
Here is an example of what the data could look like:
id | track | car | circle_time
10 | 1 | 10 | 15
9 | 1 | 10 | 14
8 | 1 | 10 | 16
7 | 1 | 10 | 15
6 | 1 | 10 | 13
5 | 2 | 10 | 7
4 | 2 | 10 | 4
3 | 2 | 10 | 5
2 | 3 | 10 | 8
1 | 3 | 10 | 10
What I need is to add one more column, avg3_circle_time, which will show me the average of the last 3 circle_time values for each track. Example:
id | track | car | circle_time | avg3_circle_time
10 | 1 | 10 | 15 | 15
9 | 1 | 10 | 14 | 15
8 | 1 | 10 | 16 | 14.6
7 | 1 | 10 | 15 | null
6 | 1 | 10 | 13 | null
5 | 2 | 10 | 7 | 5.3
4 | 2 | 10 | 4 | null
3 | 2 | 10 | 5 | null
2 | 3 | 10 | 8 | null
1 | 3 | 10 | 10 | null
I know how it could work in Oracle; you could use something like ROWID. But in PostgreSQL I don't know. I have a draft like .....avg(circle_time) OVER(PARTITION BY track,car.....) as avg3_circle_time..... Help me to solve that task, please.
You can use window functions to calculate moving averages:
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
PARTITION BY track
ORDER BY id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
FROM t
ORDER BY track, id
Depending on your definition of previous three, the window could be ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
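For example, if "last 3" should exclude the current row, only the frame changes (a sketch against the same table t):
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
    PARTITION BY track
    ORDER BY id
    ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
) AS avg3_circle_time
FROM t
ORDER BY track, id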
If you want values only when at least 3 circles are available:
select *
, case when lag(id, 2) over(partition by TRACK, CAR order by id) is not null then
avg(CIRCLE_TIME) over(partition by TRACK, CAR order by id rows between 2 preceding and current row) end a
from Racing
order by id desc;
db<>fiddle
Output
id track car circle_time a
10 1 10 15 15.0000000000000000
9 1 10 14 15.0000000000000000
8 1 10 16 14.6666666666666667
7 1 10 15 null
6 1 10 13 null
5 2 10 7 5.3333333333333333
4 2 10 4 null
3 2 10 5 null
2 3 10 8 null
1 3 10 10 null
Use LEAD(), then check whether either of the next 2 rows is NULL. Then sum the three values to calculate the average.
-- PostgreSQL
SELECT *
, CASE WHEN next_circle_time IS NULL OR next_next_circle_time IS NULL
THEN NULL
ELSE ((t.circle_time + COALESCE(next_circle_time, 0) + COALESCE(next_next_circle_time, 0)) / 3 :: DECIMAL) :: DECIMAL(10, 1)
END avg_circle_time
FROM (SELECT *
, LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) next_circle_time
, LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) next_next_circle_time
FROM Racings) t
Another way: use AVG().
SELECT *
, CASE WHEN LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
OR LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
THEN NULL
ELSE AVG(circle_time) OVER (PARTITION BY track ORDER BY id DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END :: DECIMAL(10, 2) avg_circle_time
FROM Racings
Both queries can be checked at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f0cd868623725a1b92bf988cfb2deba3
Several of the posted answers end up repeating the window definition. You can avoid this with the window clause:
select *,
case when row_number() over(track_window) > 2
then trunc(avg(CIRCLE_TIME) over(track_window rows 2 preceding), 1)
end a
from Racing
window track_window as (partition by track order by id)
order by id desc
Note how, in this sample, track_window is defined once, then reused for both row_number and avg. In the latter case, the window clause is embellished with a frame as well (rows 2 preceding).

Using 'partition by' and window functions to return more than one row in Postgres?

:)
I'm learning PostgreSQL, working with version 11.4.
There are rows in my table that relate to other rows by a common id (let's call the column common_id) and by direction (1 or 2).
Each common_id must have exactly one row with direction 1, and it can have 0 to 10 rows with direction 2.
The table is huge, so I don't want to join it to itself. I have two scenarios; I solved one but didn't solve the other.
So let's say I have the following table:
common_id | direction | price | time
1 1 0 1
2 1 1 4
2 2 2.5 5
3 1 5 8
3 2 7 10
3 2 10 12
The first scenario is to connect the row with direction 1 to the direction-2 row with the newest time, all within the same common_id.
Here I can just do PARTITION BY common_id ORDER BY direction, time RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: first_value() is the direction-1 row, and last_value() is the direction-2 row if count(1) is greater than 1; if not, there is no direction 2.
This is the result:
common_id | price_1 | time_1 | price_2 | time_2
1 0 1 null null
2 1 4 2.5 5
3 5 8 10 12
This works blazing fast and I'm happy; for every common_id I get one line. But what happens when I need more than one line per partition?
The second scenario is that, for each common_id, I need to get all the lines with direction 2 and connect each of them to the direction-1 row of the same common_id.
Here the expected result should be:
common_id | price_1 | time_1 | price_2 | time_2
2 1 4 2.5 5
3 5 8 7 10
3 5 8 10 12
I would love it if there were a way to solve this using PARTITION BY, and if not, some other solution; but I cannot use another join to the same table because of the performance issue, since it's a really huge table.
I hope I explained myself properly.
thank you
Try this (untested) query:
SELECT common_id,
price_1,
time_1,
price_2,
time_2
FROM (SELECT common_id,
direction,
first_value(price) OVER w AS price_1,
first_value(time) OVER w AS time_1,
price AS price_2,
time AS time_2
FROM atable
WINDOW w AS (PARTITION BY common_id
ORDER BY direction
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) q
WHERE direction = 2;
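Note the design choice here: the direction = 2 filter has to live in the outer query. Window functions are evaluated after the WHERE clause of their own query level, so filtering to direction = 2 inside the inner query would remove the direction-1 row before first_value() could see it; pushing the filter one level up keeps the full partition visible to the window while still returning only the direction-2 rows.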

sql best strategy to partition same values based on temporal sequence

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID LEVEL DATE
1 10 10/1/2000
1 10 11/20/2001
1 10 12/01/2001
1 30 02/15/2002
1 30 02/15/2002
1 20 05/17/2002
1 20 01/04/2003
1 30 07/20/2003
1 30 03/16/2004
1 30 04/15/2004
I want to get a count per ID/LEVEL/DATE block that looks like this:
ID LEVEL COUNT
1 10 3
1 30 2
1 20 2
1 30 3
The problem is that if I use the COUNT window function and partition by level, it groups all the 30s together regardless of the temporal sequence. I want the counts for level 30 before and after the 20 block to be distinct. Does anyone know how to do that?
A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS set_ordinal,
ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
FROM
yourData
)
SELECT
id,
level,
set_ordinal - grp_ordinal,
MIN(date),
COUNT(*)
FROM
ordered
GROUP BY
id,
level,
set_ordinal - grp_ordinal
ORDER BY
id,
MIN(date)
Visualising the effect of the two row numbers...
ID LEVEL DATE set_ordinal grp_ordinal set-grp GROUP
-- ----- ---------- ----------- ----------- ------- --------
1 10 10/01/2000 1 1 0 1,10,0
1 10 11/20/2001 2 2 0 1,10,0
1 10 12/01/2001 3 3 0 1,10,0
1 30 02/15/2002 4 1 3 1,30,3
1 30 02/15/2002 5 2 3 1,30,3
1 20 05/17/2002 6 1 5 1,20,5
1 20 01/04/2003 7 2 5 1,20,5
1 30 07/20/2003 8 3 5 1,30,5
1 30 03/16/2004 9 4 5 1,30,5
1 30 04/15/2004 10 5 5 1,30,5

Single SQL query to display aggregate data while grouping by 3 fields

I have a table that contains basic info:
CREATE TABLE testing.testtable
(
recordId serial NOT NULL,
nameId integer,
teamId integer,
countryId integer,
goals integer,
outs integer,
assists integer,
win integer,
sys_time timestamp with time zone NOT NULL DEFAULT now(),
CONSTRAINT testtable_pkey PRIMARY KEY (recordid)
)
I want one single SQL query (with one record per person-team-country) to display the following data. Note that I want it to group by nameId, teamId, and countryId:
Name, Team, and Country
Goal/out ratio (G/O)
Goal + Assist / out ratio (GA/O)
Win percentage (Win%)
The difference between the current goal/out ratio and what it was one month ago (rDif)
The difference between the current goal+assist/out ratio and what it was one month ago (fDif)
The difference between the current win % and what it was one month ago (winDif)
Example Table with all records:
Id nameId teamId countryId goals outs assists win sys_time
1 1 3 5 2 4 11 1 2013-01-01
2 1 3 5 9 4 19 1 2013-01-01
3 1 3 4 10 2 1 0 2013-01-01
4 1 3 4 11 50 14 1 2013-01-01
5 2 2 2 10 5 4 1 2013-01-01
6 2 3 5 4 7 15 0 2013-01-01
7 1 3 5 4 8 22 0 2014-07-01
8 1 3 4 11 3 5 1 2014-07-01
9 3 1 4 44 1 4 1 2014-07-01
Example desired output record (1-3-5):
nameId teamId countryId G/O GA/O Win% rDif fDif winDif
1 3 5 0.938 4.19 66 0.44 0.94 -0.34
The ratios are easy enough to retrieve. For the differences, I've done the following:
select tt.nameid,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid
order by change desc
This works if I want the differences for only the nameIds. But I want it to pull one record for each combination of name-team-country. I can't seem to get that working.
You can group by multiple fields:
select tt.nameid, tt.teamID, tt.countryID,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid, tt.teamID, tt.countryID
order by change desc
Just off the top of my head, I think it would work for you to use
group by tt.nameid, tt.teamId, tt.countryId
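Building on that grouping, a sketch of what the full query might look like in PostgreSQL (untested; it assumes each ratio is a per-group sum over sum, uses NULLIF to guard against division by zero, and takes the "one month ago" figures from rows older than a month using the same CASE pattern as above; GA/O, fDif and winDif would follow the same shapes as g_o and rdif):
SELECT tt.nameid, tt.teamid, tt.countryid,
       SUM(tt.goals)::numeric / NULLIF(SUM(tt.outs), 0) AS g_o,
       (SUM(tt.goals) + SUM(tt.assists))::numeric / NULLIF(SUM(tt.outs), 0) AS ga_o,
       100.0 * SUM(tt.win) / COUNT(*) AS win_pct,
       -- rDif: current G/O minus the G/O computed only from rows older than one month
       SUM(tt.goals)::numeric / NULLIF(SUM(tt.outs), 0)
         - SUM(CASE WHEN tt.sys_time < date_trunc('day', now() - interval '1 month') THEN tt.goals END)::numeric
           / NULLIF(SUM(CASE WHEN tt.sys_time < date_trunc('day', now() - interval '1 month') THEN tt.outs END), 0) AS rdif
FROM testing.testtable tt
GROUP BY tt.nameid, tt.teamid, tt.countryid
ORDER BY tt.nameid, tt.teamid, tt.countryid;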