How to combine and sum consecutive values until a new value appears in a column - sql

I need some help summing consecutive values of a column, based on the category in another column, until that category changes to a new value. Here's what my data looks like:
id  | site_id | date_id  | hour_id | location_id | status    | status_minutes
1   | 1       | 20210101 | 1       | 1           | Offline   | 60
2   | 1       | 20210101 | 2       | 1           | Offline   | 57
3   | 1       | 20210101 | 2       | 1           | Available | 3
4   | 1       | 20210101 | 3       | 1           | Available | 20
5   | 1       | 20210101 | 3       | 1           | Offline   | 40
... | ...     | ...      | ...     | ...         | ...       | ...
25  | 1       | 20210101 | 23      | 1           | Offline   | 60
26  | 1       | 20210102 | 0       | 1           | Offline   | 23
As you can see, the data is at hourly level: if status_minutes equals 60, there is just one row for that hour. Otherwise, the hour is split across several rows whose status_minutes add up to 60, as in rows 2 and 3 and in rows 4 and 5.
Now, my goal is to find out how long each status lasted before the next status kicked in. So the output for the example above would be:
site_id | date_id  | location_id | status    | status_minutes
1       | 20210101 | 1           | Offline   | 117
1       | 20210101 | 1           | Available | 23
1       | 20210101 | 1           | Offline   | 40
...     | ...      | ...         | ...       | ...
1       | 20210101 | 1           | Offline   | 60
1       | 20210102 | 1           | Offline   | 23
The important part is that this operation must be confined to each day, as seen in the last two rows of the example and the output. The summing happens only within a given day, and then starts again with hour 0 of the next day.

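This is a gaps-and-islands problem. section_num, the difference between two row numbers, stays constant within each run of consecutive rows that share a status, so it identifies the groups to sum status_minutes over.
Several rows can share the same hour_id (rows 2 and 3, for example), so id is added to the ORDER BY as a tiebreaker; this assumes id follows chronological order, as it does in the sample. The final result is ordered by MIN(id) because section_num alone is not guaranteed to be chronological.
You may try the following:
SELECT
    site_id,
    date_id,
    location_id,
    status,
    SUM(status_minutes) AS status_minutes
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id
            ORDER BY hour_id, id
        ) - ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id, status
            ORDER BY hour_id, id
        ) AS section_num
    FROM
        my_table
) t
GROUP BY
    site_id,
    date_id,
    location_id,
    status,
    section_num
ORDER BY
    site_id,
    date_id,
    location_id,
    MIN(id)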
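As a sanity check, the row-number-difference technique can be run against the sample rows from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions). Using id as a tiebreaker for rows that share an hour_id is an assumption that id follows chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER, site_id INTEGER, date_id INTEGER, hour_id INTEGER,
        location_id INTEGER, status TEXT, status_minutes INTEGER
    )
""")
# Only the rows shown explicitly in the question (the elided rows are left out).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 20210101, 1, 1, "Offline", 60),
        (2, 1, 20210101, 2, 1, "Offline", 57),
        (3, 1, 20210101, 2, 1, "Available", 3),
        (4, 1, 20210101, 3, 1, "Available", 20),
        (5, 1, 20210101, 3, 1, "Offline", 40),
        (26, 1, 20210102, 0, 1, "Offline", 23),
    ],
)

islands = conn.execute("""
    SELECT site_id, date_id, location_id, status,
           SUM(status_minutes) AS status_minutes
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY site_id, date_id, location_id
                   ORDER BY hour_id, id          -- id breaks ties (assumption)
               ) - ROW_NUMBER() OVER (
                   PARTITION BY site_id, date_id, location_id, status
                   ORDER BY hour_id, id
               ) AS section_num
        FROM my_table
    ) t
    GROUP BY site_id, date_id, location_id, status, section_num
    ORDER BY site_id, date_id, location_id, MIN(id)
""").fetchall()

for row in islands:
    print(row)
```

With these six rows the first day collapses into three stretches (Offline 117, Available 23, Offline 40) and the second day starts a new stretch of its own.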

Related

How to select rows with conditional values of one column in SQL

Say I have this table:
id | timeline
1  | BASELINE
1  | MIDTIME
1  | ENDTIME
2  | BASELINE
2  | MIDTIME
3  | BASELINE
4  | BASELINE
5  | BASELINE
5  | MIDTIME
5  | ENDTIME
6  | MIDTIME
6  | ENDTIME
7  | RISK
7  | RISK
This is what the data looks like, except that the real data has more observations (a few thousand).
How do I get the output so that it will look like this:
id | timeline
1  | BASELINE
1  | MIDTIME
2  | BASELINE
2  | MIDTIME
5  | BASELINE
5  | MIDTIME
How do I select the first two rows of each id that has both of two specific timeline values (BASELINE and MIDTIME)? Notice that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.
I used
SELECT *
FROM df
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)
and got IDs with two timeline values (output below) but don't know how to get rows with only BASELINE and MIDTIME.
id | timeline
---|---------
1  | BASELINE
1  | MIDTIME
2  | BASELINE
2  | MIDTIME
5  | BASELINE
5  | MIDTIME
6  | MIDTIME   -- don't want this
6  | ENDTIME   -- don't want this
7  | RISK      -- don't want this
7  | RISK      -- don't want this
Many Thanks.
You can try using exists:
select *
from t t1
where timeline in ('BASELINE', 'MIDTIME')
  and exists (
      select 1
      from t t2
      where t1.id = t2.id
        and t2.timeline in ('BASELINE', 'MIDTIME')
      group by t2.id
      having count(distinct t2.timeline) = 2
  )
OUTPUT:
id timeline
1 BASELINE
1 MIDTIME
2 BASELINE
2 MIDTIME
5 BASELINE
5 MIDTIME
I think this query should give you the result you want.
NOTE: As I understand it, you don't want any id that has an "ENDTIME" row, yet in your sample data id 1 has one. I assumed this was an error, so I wrote a query that excludes every id containing "ENDTIME" (or "RISK"). Be aware that it also returns ids that have only a BASELINE row, such as 3 and 4 in the sample.
WITH CTE AS
(
    SELECT
        id
    FROM
        df
    WHERE
        timeline IN ('ENDTIME', 'RISK')
)
SELECT
    id,
    timeline
FROM
    df
WHERE
    id NOT IN (SELECT id FROM CTE);
There are probably a number of ways to do this. Here's one that picks up the BASELINE and MIDTIME rows for ids where both exist, and ensures there are only 2 rows per returned id. Without knowing the ordering of timeline, I don't think it's possible to go further:
SELECT
    id
  , timeline
FROM (
    SELECT
        *
      , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
      , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
  AND subquery.MidtimeCount > 0
GROUP BY
    id
  , timeline
;
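To compare the approaches above against the expected output, the sample rows can be loaded into an in-memory database. The sketch below is only illustrative: it assumes Python's built-in sqlite3 module, uses the table name df from the question, and rewrites the filter as an IN subquery with HAVING COUNT(DISTINCT timeline) = 2, which is a variant of the question's own attempt rather than any one answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [
        (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
        (2, "BASELINE"), (2, "MIDTIME"),
        (3, "BASELINE"),
        (4, "BASELINE"),
        (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
        (6, "MIDTIME"), (6, "ENDTIME"),
        (7, "RISK"), (7, "RISK"),
    ],
)

wanted = conn.execute("""
    SELECT id, timeline
    FROM df
    WHERE timeline IN ('BASELINE', 'MIDTIME')
      AND id IN (
          -- ids that have BOTH values, not just two rows of anything
          SELECT id
          FROM df
          WHERE timeline IN ('BASELINE', 'MIDTIME')
          GROUP BY id
          HAVING COUNT(DISTINCT timeline) = 2
      )
    ORDER BY id, timeline
""").fetchall()

for row in wanted:
    print(row)
```

Counting distinct timeline values, instead of COUNT(*), is what rules out id 7 (two RISK rows) and id 6 (only one of the two wanted values).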

sql best strategy to partition same values based on temporal sequence

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID LEVEL DATE
1 10 10/1/2000
1 10 11/20/2001
1 10 12/01/2001
1 30 02/15/2002
1 30 02/15/2002
1 20 05/17/2002
1 20 01/04/2003
1 30 07/20/2003
1 30 03/16/2004
1 30 04/15/2004
I want to acquire a count per each ID/LEVEL/DATE block that looks like this:
ID LEVEL COUNT
1 10 3
1 30 2
1 20 2
1 30 3
The problem is that if I use the count windows function and partition by level, it groups 30 together regardless of the temporal sequence. I want the count for level 30 both before and after 20 to be distinct. Does anyone know how to do that?
A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS set_ordinal,
        ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
    FROM
        yourData
)
SELECT
    id,
    level,
    set_ordinal - grp_ordinal,
    MIN(date),
    COUNT(*)
FROM
    ordered
GROUP BY
    id,
    level,
    set_ordinal - grp_ordinal
ORDER BY
    id,
    MIN(date)
Visualising the effect of the two row numbers...
ID  LEVEL  DATE        set_ordinal  grp_ordinal  set-grp  GROUP
--  -----  ----------  -----------  -----------  -------  ------
1   10     10/01/2000  1            1            0        1,10,0
1   10     11/20/2001  2            2            0        1,10,0
1   10     12/01/2001  3            3            0        1,10,0
1   30     02/15/2002  4            1            3        1,30,3
1   30     02/15/2002  5            2            3        1,30,3
1   20     05/17/2002  6            1            5        1,20,5
1   20     01/04/2003  7            2            5        1,20,5
1   30     07/20/2003  8            3            5        1,30,5
1   30     03/16/2004  9            4            5        1,30,5
1   30     04/15/2004  10           5            5        1,30,5
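The grouping shown in the table can be reproduced end to end on the question's sample data. This is a rough sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). The dates are stored as ISO strings so that they sort chronologically, which is a choice made here and not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourData (id INTEGER, level INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yourData VALUES (?, ?, ?)",
    [
        (1, 10, "2000-10-01"), (1, 10, "2001-11-20"), (1, 10, "2001-12-01"),
        (1, 30, "2002-02-15"), (1, 30, "2002-02-15"),
        (1, 20, "2002-05-17"), (1, 20, "2003-01-04"),
        (1, 30, "2003-07-20"), (1, 30, "2004-03-16"), (1, 30, "2004-04-15"),
    ],
)

counts = conn.execute("""
    WITH ordered AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date)        AS set_ordinal,
               ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
        FROM yourData
    )
    SELECT id, level, COUNT(*) AS cnt
    FROM ordered
    GROUP BY id, level, set_ordinal - grp_ordinal   -- one group per island
    ORDER BY id, MIN(date)
""").fetchall()

for row in counts:
    print(row)
```

Level 30 appears twice in the result, once with a count of 2 and once with a count of 3, because the two runs have different set_ordinal - grp_ordinal values.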

Oracle: Get the smaller values and the first greater value

I have a table like this;
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
4 Sample4 40
I would like to get all of the rows that contain smaller values, plus the first row that contains a greater value.
For example, when I pass 25 as the parameter for the Value column, I want the following result:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
I'm stuck at this point, thanks in advance.
Analytic functions to the rescue!
create table your_table (
    id    number,
    value number);
insert into your_table
select level, level * 10
from dual
connect by level <= 5;
select * from your_table;
id | value
----+------
1 | 10
2 | 20
3 | 30
4 | 40
5 | 50
Ok, now we use lag(). Specify field, offset and the default value (for the first row that has no previous one).
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table
id | value | previous_value
---+-------+---------------
1 | 10 | 10
2 | 20 | 10
3 | 30 | 20
4 | 40 | 30
5 | 50 | 40
Now apply where.
select id, value
from (
    select id, value, lag(value, 1, value) over (order by value) previous_value
    from your_table)
where previous_value < 25
Works for me.
id | value
----+------
1 | 10
2 | 20
3 | 30
Of course, you need some policy on ties. For example, if two rows share the same value and both are the first greater one, do you want to keep both or only one of them? Or maybe you have some other criterion for breaking the tie (say, sort by id). But the idea is fairly simple.
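One possible tie policy is to order by value and then by id, so that only the lowest id among equal values counts as "first". The sketch below illustrates that, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) instead of Oracle. The function name rows_up_to_first_greater is hypothetical, and the NULL check replaces the lag default from the answer so that the first row is still returned when it is already above the limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "Sample1", 10), (2, "Sample2", 20), (3, "Sample3", 30), (4, "Sample4", 40)],
)

def rows_up_to_first_greater(limit):
    """Return rows with value < limit plus the first row at or above it."""
    return conn.execute(
        """
        SELECT id, name, value
        FROM (
            SELECT id, name, value,
                   -- previous value in (value, id) order; NULL for the first row
                   LAG(value) OVER (ORDER BY value, id) AS previous_value
            FROM your_table
        )
        WHERE previous_value IS NULL OR previous_value < ?
        ORDER BY value, id
        """,
        (limit,),
    ).fetchall()

for row in rows_up_to_first_greater(25):
    print(row)
```

A row is kept whenever the row before it is still below the limit, which is why exactly one row at or above the limit survives.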
You can try a query like this (note that TOP is SQL Server syntax and is not available in Oracle):
SELECT * FROM YourTableName WHERE Value < 25 OR ID IN (SELECT TOP 1 ID FROM YourTableName WHERE Value >= 25 ORDER BY Value)
In Oracle, you can try this (but see the answer by "That Young Man", which I think is better than mine):
SELECT * FROM (
    SELECT ID, NAME, VALUE, 1 AS RN
    FROM YT
    WHERE VALUE < 25
    UNION ALL
    SELECT ID, NAME, VALUE, ROW_NUMBER() OVER (ORDER BY VALUE) AS RN
    FROM YT
    WHERE VALUE > 25
) A
WHERE RN = 1;

Update duplicate latitude values by iteratively increasing margin

I have lat and long columns in an Oracle database table, stored as regular numbers, and some of the (lat, long) pairs are duplicates. I'd like a way to add a very small margin to either column to eliminate the duplication. The problem is that the number of duplicate records varies from one identical pair to the next, so I have to adjust the margin I add iteratively for each pair.
example:
ID | LAT | LONG
==================
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
In this case, I'd like to add a margin of .0003 to either column to eliminate the duplication, but I can't just blindly add that .0003 to IDs 2 and 3 because they would still be duplicates of each other. So I have to do original_value + (margin*i) for i in (0...number of duplicate rows),
so I'd like to end up with something like this:
ID | LAT | LONG
1 | 1 | 1
2 | 1.0003 | 1
3 | 1.0006 | 1
How do I do this in SQL? I can mimic imperative programming apparently with cursors but it does not seem to be the SQL way. Can I somehow achieve this with INSERT INTO SELECT?
I don't know what your exact data looks like, but suppose you have this table, called tbl:
ID LAT LON
---------- ---------- ----------
1 20 25
2 30 33
3 30 33
4 55 60
5 55 60
6 55 60
You could run the following:
select id,
       case when rn > 1 then lat+rn-1 else lat end as lat,
       lon
from (
    select t.*,
           row_number() over(partition by lat, lon order by id) as rn
    from tbl t
) x;
To get:
ID LAT LON
---------- ---------- ----------
1 20 25
2 30 33
3 31 33
4 55 60
5 56 60
6 57 60
Notice how IDs 2 and 3 were dups, and IDs 4, 5, and 6, were dups. They are no longer exact dups because the lat value has increased, sequentially, to make the rows not duplicates. They go up by one for each next duplicate.
Fiddle: http://sqlfiddle.com/#!4/ef959/1/0
Edit (based on your edit)
select id,
       case when rn > .0003 then lat+rn-.0003 else lat end as lat,
       lon
from (
    select t.*,
           row_number() over(partition by lat, lon order by id)*.0003 as rn
    from tbl t
) x;
The above will ascend by .0003 rather than 1.
See new fiddle here: http://sqlfiddle.com/#!4/21506/6/0
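The same idea can be written with the margin as a parameter, which matches the original_value + (margin*i) formula from the question more directly. This is a sketch only, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) instead of Oracle; the constant MARGIN and the extra non-duplicate row with id 4 are illustrative additions.

```python
import sqlite3

MARGIN = 0.0003  # step added per extra duplicate, taken from the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1.0, 1.0), (2, 1.0, 1.0), (3, 1.0, 1.0), (4, 2.0, 5.0)],
)

adjusted = conn.execute(
    """
    SELECT id,
           lat + (rn - 1) * :margin AS lat,  -- rn = 1 leaves the first row as is
           lon
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY lat, lon ORDER BY id) AS rn
        FROM tbl t
    ) x
    ORDER BY id
    """,
    {"margin": MARGIN},
).fetchall()

for row in adjusted:
    print(row)
```

Note that a shifted value could in principle collide with another row that already sits at that exact coordinate, so a second pass may be needed on real data.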

Dealing with consecutive rows calculations

Assume the following situation:
Week 1:
0 previous cases
10 new cases
3 resolved cases
Week 2:
7 previous cases
13 new cases
15 resolved cases
Week 3:
5 previous cases
6 new cases
7 resolved cases
This information is stored in a summary table like this:
RESUME_TABLE:
WEEK | TOTAL_NEW | TOTAL_SOLVED
1 | 10 | 3
2 | 13 | 15
3 | 6 | 7
I am having a hard time building a query to obtain the following result:
REPORT_TABLE:
WEEK | PREV_TOTAL | NEW_CASES | SOLVED_CASES | NEW_TOTAL
1 | 0 | 10 | 3 | 7
2 | 7 | 13 | 15 | 5
3 | 5 | 6 | 7 | 4
The idea seems pretty trivial, NEW_TOTAL = PREV_TOTAL + NEW_CASES - SOLVED_CASES, though I have been struggling with the idea of carrying the PREV_TOTAL to the next row in order to go on.
I am trying to do it using a view over the RESUME table (Oracle 11g).
Can someone help me with some example code?
Pretty simple and neat with analytic functions:
select week
      ,lag(total_cases_by_now - total_solved_by_now) over (order by week) prev_total
      ,total_new new_cases
      ,total_solved solved_cases
      ,total_cases_by_now - total_solved_by_now new_total
from (
      select week
            ,total_new
            ,total_solved
            ,sum(total_new) over (order by week asc) as total_cases_by_now
            ,sum(total_solved) over (order by week asc) as total_solved_by_now
      from resume_table
     )
WEEK PREV_TOTAL NEW_CASES SOLVED_CASES NEW_TOTAL
---- ---------- --------- ------------ ---------
   1                   10            3         7
   2          7        13           15         5
   3          5         6            7         4
3 rows selected.
PREV_TOTAL is NULL for week 1 because lag() has no previous row to read; lag(total_cases_by_now - total_solved_by_now, 1, 0) returns 0 there instead.
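The running-total approach can be checked against the REPORT_TABLE from the question. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle 11g, and it passes 0 as the default for lag() so that week 1 shows 0 previous cases. The column aliases running_new and running_solved are names chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resume_table (week INTEGER, total_new INTEGER, total_solved INTEGER)"
)
conn.executemany(
    "INSERT INTO resume_table VALUES (?, ?, ?)",
    [(1, 10, 3), (2, 13, 15), (3, 6, 7)],
)

report = conn.execute("""
    SELECT week,
           -- open cases at the end of the previous week; 0 when there is none
           LAG(running_new - running_solved, 1, 0) OVER (ORDER BY week) AS prev_total,
           total_new    AS new_cases,
           total_solved AS solved_cases,
           running_new - running_solved AS new_total
    FROM (
        SELECT week, total_new, total_solved,
               SUM(total_new)    OVER (ORDER BY week) AS running_new,
               SUM(total_solved) OVER (ORDER BY week) AS running_solved
        FROM resume_table
    )
    ORDER BY week
""").fetchall()

for row in report:
    print(row)
```

Because the totals are cumulative sums ordered by week, this version does not depend on week numbers being consecutive.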
You can solve this with the MODEL clause:
with resume_table as
(
    select 1 week, 10 total_new, 3 total_solved from dual union all
    select 2 week, 13 total_new, 15 total_solved from dual union all
    select 3 week, 6 total_new, 7 total_solved from dual
)
select week, prev_total, total_new, total_solved, new_total
from resume_table
model
    dimension by (week)
    measures (total_new, total_solved, 0 prev_total, 0 new_total)
    rules sequential order
    (
        new_total[any] order by week =
            nvl(new_total[cv(week)-1], 0) + total_new[cv()] - total_solved[cv()]
       ,prev_total[any] order by week = nvl(new_total[cv(week)-1], 0)
    )
order by week;
Although this makes the assumption that WEEK is always a consecutive number. If that's not true, you will want to add a row_number(). Otherwise, the -1 may not reference the previous value.
Add one column in RESUME_TABLE (or create a view, which I think may be better):
RESUME_LEFT
WEEK | LEFT
1 | 7
2 | -2
3 | -1
Something like this (the column name is quoted in lowercase, so it must be quoted the same way wherever it is referenced):
CREATE VIEW resume_left AS
SELECT week, total_new - total_solved AS "left" FROM resume_table;
So in REPORT_TABLE, you can have a definition like this:
PREV_TOTAL = (SELECT sum("left") FROM resume_left WHERE week < REPORT_TABLE.week)
Edit
OK, the view is unnecessary:
PREV_TOTAL = (SELECT sum(total_new) - sum(total_solved)
              FROM resume_table
              WHERE week < REPORT_TABLE.week)