Oracle - Split records based on overlapping ranges - sql

Data sample:
id lowerlimit upperlimit
1 5 10 ---Master Record
2 8 12
3 3 8
4 8 9
5 11 15
In the above table, let us assume the record with id = 1 is the Master Record. I want to compare the other records with this master record in the same table, split each record where it overlaps the master range, and assign a flag: 'Y' if the piece overlaps, 'N' if it does not.
If a record overlaps only partially, split it into two rows: one for the overlapping range and one for the non-overlapping range.
id lowerlimit upperlimit flag
2 8 10 y
2 10 12 n
3 3 5 n
3 5 8 y
4 8 9 y
5 11 15 n

I added more "test data" for testing and illustration. The main computation is to break down the input ranges into up to three pieces (some of those don't make sense and are eliminated in the final stage - the maximum of three is reached when the input range is strictly overlapping the Master Record in both directions).
For efficiency, it is best if each input row is accessed just once. So, instead of using union all (the easiest route), I prefer to create all three subranges and corresponding flags simultaneously, with the result having nine columns instead of three for the subranges and flags. Then I use unpivot to put them into separate rows.
with
  test_data ( id, lowerlimit, upperlimit ) as (
    select 1,  5, 10 from dual union all   ---Master Record
    select 2,  8, 12 from dual union all
    select 3,  3,  8 from dual union all
    select 4,  8,  9 from dual union all
    select 5, 11, 15 from dual union all
    select 6,  2,  5 from dual union all
    select 7,  1, 14 from dual
  )
-- end of test data (not part of the solution)
-- SQL query begins BELOW THIS LINE (use your actual table name)
select id, lowerlimit, upperlimit, flag
from (
       select id,
              t.lowerlimit                 as x1, least(t.upperlimit, m.ll) as y1, 'n' as f1,
              greatest(t.lowerlimit, m.ll) as x2, least(t.upperlimit, m.ul) as y2, 'y' as f2,
              greatest(t.lowerlimit, m.ul) as x3, t.upperlimit              as y3, 'n' as f3
       from   test_data t cross join
              ( select lowerlimit ll, upperlimit ul
                from   test_data
                where  id = 1
              ) m
       where  t.id != 1
     )
unpivot ( ( lowerlimit, upperlimit, flag )
          for ( x, y, f ) in ( (x1, y1, f1), (x2, y2, f2), (x3, y3, f3) ) )
where lowerlimit < upperlimit
order by id, lowerlimit   -- if needed
;
Output:
ID LOWERLIMIT UPPERLIMIT FLAG
-- ---------- ---------- ----
2 8 10 y
2 10 12 n
3 3 5 n
3 5 8 y
4 8 9 y
5 11 15 n
6 2 5 n
7 1 5 n
7 5 10 y
7 10 14 n
10 rows selected.

One method is to split this into three conditions, essentially "before", "during", and "after" the master range. Because you want multiple rows for each existing row, you can do this using union all:
-- piece before the master range
select t.id, t.lowerlimit,
       least(tm.lowerlimit, t.upperlimit) as upperlimit,
       'n' as overlaps
from   t join
       t tm
       on t.id <> 1 and tm.id = 1 and
          t.lowerlimit < tm.lowerlimit
union all
-- piece overlapping the master range
select t.id,
       greatest(t.lowerlimit, tm.lowerlimit),
       least(t.upperlimit, tm.upperlimit), 'y' as overlaps
from   t join
       t tm
       on t.id <> 1 and tm.id = 1 and
          t.lowerlimit <= tm.upperlimit and
          t.upperlimit >= tm.lowerlimit
union all
-- piece after the master range
select t.id, greatest(tm.upperlimit, t.lowerlimit),
       t.upperlimit, 'n'
from   t join
       t tm
       on t.id <> 1 and tm.id = 1 and
          t.upperlimit > tm.upperlimit;

Related

Oracle SQL to Update rows in repeating pattern

How can I update rows with a given repeating number sequence? My table is as follows:
line_type  line_val  line_pattern
A          1         null
A          2         null
B          5         null
B          6         null
C          3         null
C          4         null
Now I want to update the line_pattern column with the repeating pattern 8, 5, 3, 2.
So the table after the update will look like:
line_type  line_val  line_pattern
A          1         8
A          2         5
B          5         3
B          6         2
C          3         8
C          4         5
How can I achieve this in an update statement?
With the data you have provided it is not possible to satisfy your requirement: rows in a table are not stored in any particular order, so if you want the order to be guaranteed in a select statement, you need to provide an ORDER BY clause.
In the code below there is an additional column, ORDER_BY, to specify the order in which the records need to be processed. The repeating pattern is calculated by using the MOD function to convert the row number into a repeating sequence of 4 numbers; CASE then maps each of those numbers to its respective position in the pattern.
WITH test_data (order_by, line_type, line_val)
AS
(
  SELECT 1, 'A', 1 FROM DUAL UNION ALL
  SELECT 2, 'A', 2 FROM DUAL UNION ALL
  SELECT 3, 'B', 5 FROM DUAL UNION ALL
  SELECT 4, 'B', 6 FROM DUAL UNION ALL
  SELECT 5, 'C', 3 FROM DUAL UNION ALL
  SELECT 6, 'C', 4 FROM DUAL
)
SELECT
  CASE MOD(ROW_NUMBER() OVER (ORDER BY order_by), 4)
    WHEN 1 THEN 8
    WHEN 2 THEN 5
    WHEN 3 THEN 3
    WHEN 0 THEN 2
  END AS line_pattern,
  t.*
FROM
  test_data t
LINE_PATTERN ORDER_BY L LINE_VAL
------------ ---------- - ----------
8 1 A 1
5 2 A 2
3 3 B 5
2 4 B 6
8 5 C 3
5 6 C 4
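If the real table does have such an ordering column, the computed pattern can be applied back to the table as an actual update with a MERGE on that column. This is only a sketch: it assumes a table named mytable with a unique ORDER_BY column alongside the LINE_TYPE, LINE_VAL and LINE_PATTERN columns shown above.
MERGE INTO mytable t
USING (
        SELECT order_by,
               CASE MOD(ROW_NUMBER() OVER (ORDER BY order_by), 4)
                 WHEN 1 THEN 8
                 WHEN 2 THEN 5
                 WHEN 3 THEN 3
                 WHEN 0 THEN 2
               END AS line_pattern
        FROM   mytable
      ) s
ON (t.order_by = s.order_by)
WHEN MATCHED THEN UPDATE SET t.line_pattern = s.line_pattern;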
If you don't care about the order then use this form:
UPDATE mytable
SET line_pattern =
CASE MOD (ROWNUM, 4)
WHEN 1 THEN 8
WHEN 2 THEN 5
WHEN 3 THEN 3
WHEN 0 THEN 2
END

Oracle SQL intersection of 2 comma separated strings

What function can I apply as f() here?
Query:
Select x, f(y) from table where y like '%ab%cd%ef';
Sample table (y is sorted alphabetically):
x  y
1 ab
2 ab,cd
3 cd,ef
4 ab,ef,gh,yu
5 de,ef,rt
Expected output:
x y
1 ab
2 ab,cd
3 cd,ef
4 ab,ef
5 ef
Use the regexp_substr function with connect by level expressions, as in:
with tab (x, y) as
(
  select 1, 'ab'          from dual union all
  select 2, 'ab,cd'       from dual union all
  select 3, 'cd,ef'       from dual union all
  select 4, 'ab,ef,gh,yu' from dual union all
  select 5, 'de,ef,rt'    from dual
), tab2 as
(
  select x, regexp_substr(y, '[^,]+', 1, level) as y
  from tab
  connect by level <= regexp_count(y, ',') + 1
         and prior x = x
         and prior sys_guid() is not null
), tab3 as
(
  select x, y
  from tab2
  where y like '%ab%'
     or y like '%cd%'
     or y like '%ef%'
)
select x, listagg(y, ',') within group (order by y) as y
from tab3
group by x;
X Y
1 ab
2 ab,cd
3 cd,ef
4 ab,ef
5 ef
Follow comments written within the code.
with test (x, y) as
  -- your sample table
  (select 1, 'ab'          from dual union all
   select 2, 'ab,cd'       from dual union all
   select 3, 'cd,ef'       from dual union all
   select 4, 'ab,ef,gh,yu' from dual union all
   select 5, 'de,ef,rt'    from dual
  ),
srch (val) as
  -- a search string, which is to be compared to the sample table's Y column values
  (select 'ab,cd,ef' from dual),
--
srch_rows as
  -- split search string into rows
  (select regexp_substr(val, '[^,]+', 1, level) val
   from srch
   connect by level <= regexp_count(val, ',') + 1
  ),
test_rows as
  -- split sample values into rows
  (select x,
          regexp_substr(y, '[^,]+', 1, column_value) y
   from test,
        table(cast(multiset(select level from dual
                            connect by level <= regexp_count(y, ',') + 1
                           ) as sys.odcinumberlist))
  )
-- the final result
select t.x, listagg(t.y, ',') within group (order by t.y) result
from test_rows t join srch_rows s on s.val = t.y
group by t.x
order by t.x;
X RESULT
---------- --------------------
1 ab
2 ab,cd
3 cd,ef
4 ab,ef
5 ef

Flag individuals that share common features with Oracle SQL

Consider the following table:
ID Feature
1 1
1 2
1 3
2 3
2 4
2 6
3 5
3 10
3 12
4 12
4 18
5 10
5 30
I would like to group the individuals based on overlapping features. If two of these groups again have overlapping features, I would consider both as one group. This process should be repeated until there are no overlapping features between groups. The result of this procedure on the table above would be:
ID Feature Flag
1 1 A
1 2 A
1 3 A
2 3 A
2 4 A
2 6 A
3 5 B
3 10 B
3 12 B
4 12 B
4 18 B
5 10 B
5 30 B
So actually the problem I am trying to solve is finding connected components in a graph; here [1,2,3] is the set of features of the node with ID 1 (see https://en.wikipedia.org/wiki/Connectivity_(graph_theory)). This is the standard connected-components problem, but I would like to solve it with Oracle SQL.
Here is one way to do this, using a hierarchical ("connect by") query. The first step is to extract the initial relationships from the base data; the hierarchical query is built on the result from this first step. I added one more row to the inputs to illustrate a node that is a connected component by itself.
You marked the connected components as A and B - of course, that won't work if you have, say, 30,000 connected components. In my solution, I use the minimum node name as the marker for each connected component.
with
  sample_data (id, feature) as (
    select 1,  1 from dual union all
    select 1,  2 from dual union all
    select 1,  3 from dual union all
    select 2,  3 from dual union all
    select 2,  4 from dual union all
    select 2,  6 from dual union all
    select 3,  5 from dual union all
    select 3, 10 from dual union all
    select 3, 12 from dual union all
    select 4, 12 from dual union all
    select 4, 18 from dual union all
    select 5, 10 from dual union all
    select 5, 30 from dual union all
    select 6, 40 from dual
  )
-- select * from sample_data; /*
, initial_rel (id_base, id_linked) as (
    select distinct s1.id, s2.id
    from   sample_data s1 join sample_data s2
           on s1.feature = s2.feature and s1.id <= s2.id
  )
-- select * from initial_rel; /*
select   id_linked as id, min(connect_by_root(id_base)) as id_group
from     initial_rel
start with id_base <= id_linked
connect by nocycle prior id_linked = id_base and id_base < id_linked
group by id_linked
order by id_group, id
;
Output:
ID ID_GROUP
------- ----------
1 1
2 1
3 3
4 3
5 3
6 6
Then, if you need to add the ID_GROUP as a FLAG to the base data, you can do so with a trivial join.
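For illustration, that join might look like the sketch below; it assumes the base data is in a table (or view) called sample_data, as in the test data above, and simply wraps the hierarchical query in a CTE.
with
  initial_rel (id_base, id_linked) as (
    select distinct s1.id, s2.id
    from   sample_data s1 join sample_data s2
           on s1.feature = s2.feature and s1.id <= s2.id
  ),
  comps (id, id_group) as (
    -- the same hierarchical query as above: one row per id
    select   id_linked, min(connect_by_root(id_base))
    from     initial_rel
    start with id_base <= id_linked
    connect by nocycle prior id_linked = id_base and id_base < id_linked
    group by id_linked
  )
select s.id, s.feature, g.id_group as flag
from   sample_data s join comps g on g.id = s.id
order  by g.id_group, s.id, s.feature;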

BigQuery: select the nth smallest value in window, ordered by another value

My table has two integer columns: a and b. For each row, I want to select the nth smallest value of b among the rows with smaller a values. Here's a sample input/output, with n=2.
Input:
a | b
-------
1 | 4
2 | 2
3 | 5
4 | 3
5 | 9
6 | 1
7 | 7
8 | 6
9 | 0
Output:
a | 2nd min b
-------------
1 | null ← only 1 element in [4], no 2nd min
2 | 4 ← 2nd min between [4,2]
3 | 4 ← 2nd min between [4,2,5]
4 | 3 ← 2nd min between [4,2,5,3]
5 | 3 ← etc.
6 | 2
7 | 2
8 | 2
9 | 1
I used n=2 here to keep it simple, but in practice, I want the 2000th smallest value (or some other large-ish constant). The column a can be assumed to contain distinct integers (and even 1, 2, 3, … if that's easier).
The problem is that if I use ORDER BY b in my window clause and NTH_VALUE, it just computes the answer on the wrong set of values:
WITH data AS (
SELECT 1 AS a, 4 AS b
UNION ALL SELECT 2 AS a, 2 AS b
UNION ALL SELECT 3 AS a, 5 AS b
UNION ALL SELECT 4 AS a, 3 AS b
UNION ALL SELECT 5 AS a, 9 AS b
UNION ALL SELECT 6 AS a, 1 AS b
)
SELECT nth_value(b, 2) over (order by a)
from data
returns [null, 2, 2, 2, 2, 2]: the values are ordered by a (so in the same order as they appear), so the value b=2 is always the one in second place. I want to order by a and then take the nth smallest value of b. Any idea how to write this in BigQuery (preferably standard SQL)?
Below is for BigQuery Standard SQL; it produces the correct result for the given example.
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 a, 4 b UNION ALL
  SELECT 2, 2 UNION ALL
  SELECT 3, 5 UNION ALL
  SELECT 4, 3 UNION ALL
  SELECT 5, 9 UNION ALL
  SELECT 6, 1 UNION ALL
  SELECT 7, 7 UNION ALL
  SELECT 8, 6 UNION ALL
  SELECT 9, 0
)
SELECT
  a,
  (SELECT b
   FROM (SELECT b FROM UNNEST(c) b ORDER BY b LIMIT 2)
   ORDER BY b DESC LIMIT 1
  ) b2
FROM (
  SELECT a, IF(ARRAY_LENGTH(c) > 1, c, [NULL]) c
  FROM (
    SELECT a, ARRAY_AGG(b) OVER (ORDER BY a) c
    FROM `project.dataset.table`
  )
)
-- ORDER BY a
with the expected result as below:
Row a b2
1 1 null
2 2 4
3 3 4
4 4 3
5 5 3
6 6 2
7 7 2
8 8 2
9 9 1
Note: to make it work for the 2000th element you would change 2 to 2000 in LIMIT 2.
Meanwhile, I admit it looks a little ugly/messy to me and I am not sure about scalability, but you can give it a shot.
Quick update
Below is a slightly less ugly-looking version (same output, of course):
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 a, 4 b UNION ALL
  SELECT 2, 2 UNION ALL
  SELECT 3, 5 UNION ALL
  SELECT 4, 3 UNION ALL
  SELECT 5, 9 UNION ALL
  SELECT 6, 1 UNION ALL
  SELECT 7, 7 UNION ALL
  SELECT 8, 6 UNION ALL
  SELECT 9, 0
)
SELECT a, c[SAFE_ORDINAL(2)] b2
FROM (
  SELECT x.a, ARRAY_AGG(y.b ORDER BY y.b LIMIT 2) c
  FROM `project.dataset.table` x
  CROSS JOIN `project.dataset.table` y
  WHERE y.a <= x.a
  GROUP BY x.a
)
-- ORDER BY a
For the 2000th element, replace 2 with 2000 in both LIMIT 2 and SAFE_ORDINAL(2), as shown below.
There is still potentially the same scalability issue because of the (now explicit) CROSS JOIN.
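For example, here is what the n = 2000 variant looks like (a sketch; only the two constants change):
#standardSQL
SELECT a, c[SAFE_ORDINAL(2000)] b2
FROM (
  SELECT x.a, ARRAY_AGG(y.b ORDER BY y.b LIMIT 2000) c
  FROM `project.dataset.table` x
  CROSS JOIN `project.dataset.table` y
  WHERE y.a <= x.a
  GROUP BY x.a
)
-- ORDER BY a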

Oracle SQL running total on change of field (SUM on column only when field changes)

I have a question regarding how to SUM a column only when a field changes.
Take for example the table below:
Note that columns A and B come from different tables, i.e. A was selected from table X and B was selected from table Y:
SELECT X.A, Y.B
FROM X
INNER JOIN Y
  ON X.DATE = Y.DATE AND X.VAL1 = Y.VAL1 AND X.VAL2 = Y.VAL2
A B
123 5
123 5
456 10
789 15
789 15
I need to sum column B on change of field on column A:
I.e. the query should return 5 + 10 + 15 = 30: 5 the first time because the value in column A is 123; 10 the second time because column A changed from 123 to 456 (note that the second row was skipped because column A still contained 123, hence the change-of-field logic); and so on.
I can't do a simple SUM(B) because that would return 50. I also cannot do SUM(B) OVER (PARTITION BY A) because that would do a running total by group, not by change of field.
My output needs to look like this:
A B X
123 5 5
123 5 5
456 10 15
789 15 30
789 15 30
I am trying to do this within a simple query. Is there a particular function I can use to do this?
For the simple data set provided, the following should work. You will, of course, want to review the ORDER BY clauses for correctness in your exact use case.
SELECT a
,b
,SUM(CASE WHEN a = prev_a THEN 0 ELSE b END) OVER (ORDER BY a RANGE UNBOUNDED PRECEDING) AS x
FROM (
SELECT a
,b
,LAG(a) OVER (ORDER BY a) AS prev_a
FROM {your_query}
)
This solution makes use of the LAG function, which returns the specified column from the previous row. The outer query's SUM then adds the value only when the previous row didn't have the same value in A. The windowing clause (RANGE UNBOUNDED PRECEDING) is included in the SUM because you specified that you need a running total.
Ta-daaa?
with test (a, b) as
  (select 123,  5 from dual union all
   select 123,  5 from dual union all
   select 456, 10 from dual union all
   select 789, 15 from dual union all
   select 789, 15 from dual
  ),
proba as (
  select a, b,
         case when a <> nvl(lag(a) over (order by a), 0) then 'Y' else 'N' end switch
  from test
)
select a, b,
       sum(decode(switch, 'Y', b, 0)) over (partition by null order by a) x
from proba
order by a;
A B X
---------- ---------- ----------
123 5 5
123 5 5
456 10 15
789 15 30
789 15 30
You can also create a function and use it; see the sample below.
create or replace package test_pkg123
as
  a     number;  -- last value of A seen (package state)
  r_sum number;  -- running sum so far (package state)
  function get_r_sum(p_a number, p_val number, rown number) return number;
end;
/
create or replace package body test_pkg123
as
  function get_r_sum(p_a number, p_val number, rown number) return number
  is
  begin
    -- first row: (re)start the running sum
    if rown = 1 then
      r_sum := p_val;
      a := p_a;
      return r_sum;
    end if;
    -- add the value only when A changes from the previous row
    if p_a != a then
      r_sum := nvl(r_sum, 0) + nvl(p_val, 0);
    end if;
    a := p_a;
    return r_sum;
  end;
end;
/
with test (a, b) as
(select 123, 5 from dual union all
select 123, 5 from dual union all
select 456, 10 from dual union all
select 789, 15 from dual union all
select 789, 15 from dual union all
select 789, 15 from dual union all
select 123, 2 from dual
)
select a, b, test_pkg123.get_r_sum(a, b, rownum) r_sum
from test;
Output:
A B R_SUM
123 5 5
123 5 5
456 10 15
789 15 30
789 15 30
789 15 30
123 2 32
7 rows selected