I am trying to "flatten" a list of ranges in a defined order (alphabetically by name in the examples provided) into a single merged result. Newer ranges overwrite the values of older ranges. Conceptually it looks like this, with "e" being the newest range:
0   1   2   3   4   5   6   7
|-------------a-------------|
        |---b---|
    |---c---|
                |---d---|
            |---e---|
|-a-|---c---|---e---|-d-|-a-| <-- expected result
To prevent further confusion: the expected result here is indeed correct. The values 0-7 are just the ranges' values, not a progression in time. I use integers for simplicity here, but the values may be continuous rather than discrete.
Note that b is completely overshadowed and not relevant anymore.
The data may be modeled like this in SQL:
create table ranges (
name varchar(1),
range_start integer,
range_end integer
);
insert into ranges (name, range_start, range_end) values ('a', 0, 7);
insert into ranges (name, range_start, range_end) values ('b', 2, 4);
insert into ranges (name, range_start, range_end) values ('c', 1, 3);
insert into ranges (name, range_start, range_end) values ('d', 4, 6);
insert into ranges (name, range_start, range_end) values ('e', 3, 5);
-- assume alphabetical order by name
It would be perfect if there were a way to directly query the result in SQL, e.g. like this:
select *magic* from ranges;
-- result:
+------+-------------+-----------+
| name | range_start | range_end |
+------+-------------+-----------+
| a    |           0 |         1 |
| c    |           1 |         3 |
| e    |           3 |         5 |
| d    |           5 |         6 |
| a    |           6 |         7 |
+------+-------------+-----------+
But I suspect that is not realistically feasible, therefore I need to at least filter out all ranges that are completely overshadowed by newer ones, as is the case for b in the example above. Otherwise the query would need to transfer more and more irrelevant data as the database grows and new ranges overshadow older ones. For the example above, such a query could return all entries except b, e.g.:
select *magic* from ranges;
-- result:
+------+-------------+-----------+
| name | range_start | range_end |
+------+-------------+-----------+
| a    |           0 |         7 |
| c    |           1 |         3 |
| d    |           4 |         6 |
| e    |           3 |         5 |
+------+-------------+-----------+
I was unable to construct such a filter in SQL. The only thing I managed to do is query all data and then calculate the result in code, for example in Java using the Google Guava library:
import com.google.common.collect.Range;
import com.google.common.collect.RangeMap;
import com.google.common.collect.TreeRangeMap;

final RangeMap<Integer, String> rangeMap = TreeRangeMap.create();
rangeMap.put(Range.closedOpen(0, 7), "a");
rangeMap.put(Range.closedOpen(2, 4), "b");
rangeMap.put(Range.closedOpen(1, 3), "c");
rangeMap.put(Range.closedOpen(4, 6), "d");
rangeMap.put(Range.closedOpen(3, 5), "e");
System.out.println(rangeMap);
// result: [[0..1)=a, [1..3)=c, [3..5)=e, [5..6)=d, [6..7)=a]
Or by hand in Python:
import re
from collections import namedtuple
from typing import Optional, List

Range = namedtuple("Range", ["name", "start", "end"])

def overlap(lhs: Range, rhs: Range) -> Optional[Range]:
    """Return the combined bounds of two intersecting ranges, or None."""
    if lhs.end <= rhs.start or rhs.end <= lhs.start:
        return None
    return Range(None, min(lhs.start, rhs.start), max(lhs.end, rhs.end))

def range_from_str(str_repr: str) -> Range:
    """Parse a range from its ASCII representation (4 columns per unit)."""
    name = re.search(r"[a-z]+", str_repr).group(0)
    start = str_repr.index("|") // 4
    end = str_repr.rindex("|") // 4
    return Range(name, start, end)

if __name__ == '__main__':
    ranges: List[Range] = [
        # 0   1   2   3   4   5   6   7
        range_from_str("|-------------a-------------|"),
        range_from_str("        |---b---|            "),
        range_from_str("    |---c---|                "),
        range_from_str("                |---d---|    "),
        range_from_str("            |---e---|        "),
        # result: |-a-|---c---|---e---|-d-|-a-|
    ]
    result: List[Range] = []
    for rng in ranges:
        # Each newer range cuts away the part of any older piece it overlaps,
        # leaving at most two remainders (left and right of the new range).
        for i, res in enumerate(result[:]):
            o = overlap(rng, res)
            if o:
                result.append(Range(res.name, o.start, rng.start))
                result.append(Range(res.name, rng.end, o.end))
                result[i] = Range(res.name, 0, 0)  # mark the old piece as empty
        result.append(rng)
    # Drop empty pieces and sort by start.
    result = sorted(filter(lambda r: r.start < r.end, result), key=lambda r: r.start)
    print(result)
    # result: [Range(name='a', start=0, end=1), Range(name='c', start=1, end=3), Range(name='e', start=3, end=5), Range(name='d', start=5, end=6), Range(name='a', start=6, end=7)]
The following simple query returns all elementary (smallest) intervals, each with its top (newest) name:
with
all_points(x) as (
select range_start from ranges
union
select range_end from ranges
)
,all_ranges(range_start, range_end) as (
select *
from (select
x as range_start,
lead(x) over(order by x) as range_end
from all_points)
where range_end is not null
)
select *
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
order by 1,2;
Results:
RANGE_START RANGE_END RANGE_NAME
----------- ---------- ----------
0 1 a
1 2 c
2 3 c
3 4 e
4 5 e
5 6 d
6 7 a
So we need to merge connected intervals with the same names:
Final query, without newer Oracle-specific features:
with
all_points(x) as (
select range_start from ranges
union
select range_end from ranges
)
,all_ranges(range_start, range_end) as (
select *
from (select
x as range_start,
lead(x) over(order by x) as range_end
from all_points)
where range_end is not null
)
select
grp,range_name,min(range_start) as range_start,max(range_end) as range_end
from (
select
sum(start_grp_flag) over(order by range_start) grp
,range_start,range_end,range_name
from (
select
range_start,range_end,range_name,
case when range_name = lag(range_name)over(order by range_start) then 0 else 1 end start_grp_flag
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
)
)
group by grp,range_name
order by 1;
Results:
GRP RANGE_NAME RANGE_START RANGE_END
---------- ---------- ----------- ----------
1 a 0 1
2 c 1 3
3 e 3 5
4 d 5 6
5 a 6 7
Or, using actual Oracle-specific features (UNPIVOT and MATCH_RECOGNIZE):
with
all_ranges(range_start, range_end) as (
select * from (
select
x as range_start,
lead(x) over(order by x) as range_end
from (
select distinct x
from ranges
unpivot (x for r in (range_start,range_end))
))
where range_end is not null
)
select *
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
match_recognize(
order by range_start
measures
first(range_start) as r_start,
last(range_end) as r_end,
last(range_name) as r_name
pattern(STRT A*)
define
A as prev(range_name)=range_name and prev(range_end) = range_start
);
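If MATCH_RECOGNIZE is unfamiliar, here is a minimal sketch of just the merge step, run against hardcoded elementary intervals (the elementary CTE and its rows are written out here purely for illustration; assumes Oracle 12c+):
with elementary(range_start, range_end, range_name) as (
  select 0, 1, 'a' from dual union all
  select 1, 2, 'c' from dual union all
  select 2, 3, 'c' from dual union all
  select 3, 4, 'e' from dual union all
  select 4, 5, 'e' from dual union all
  select 5, 6, 'd' from dual union all
  select 6, 7, 'a' from dual
)
select *
from elementary
match_recognize(
  order by range_start
  measures
    first(range_start) as r_start,
    last(range_end) as r_end,
    last(range_name) as r_name
  pattern(STRT A*)
  define
    A as prev(range_name) = range_name and prev(range_end) = range_start
);
-- returns: (0,1,a), (1,3,c), (3,5,e), (5,6,d), (6,7,a)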
Here is a recursive query that would give you the desired output:
WITH ranges(NAME, range_start, range_end) AS
(SELECT 'a', 0, 7 FROM dual UNION ALL
SELECT 'b', 2, 4 FROM dual UNION ALL
SELECT 'c', 1, 3 FROM dual UNION ALL
SELECT 'd', 4, 6 FROM dual UNION ALL
SELECT 'e', 3, 5 FROM dual UNION ALL
SELECT 'f', -3, -2 FROM dual UNION ALL
SELECT 'g', 8, 20 FROM dual UNION ALL
SELECT 'h', 12, 14 FROM dual)
, rm (NAME, range_start, range_end) AS
(SELECT r.*
FROM (SELECT r.NAME
, r.range_start
, NVL(r2.range_start, r.range_end) range_end
FROM ranges r
OUTER apply (SELECT *
FROM ranges
WHERE range_start BETWEEN r.range_start AND r.range_end
AND NAME > r.NAME
ORDER BY range_start, NAME DESC
FETCH FIRST 1 ROWS ONLY) r2
ORDER BY r.range_start, r.NAME desc
FETCH FIRST 1 ROWS ONLY) r
UNION ALL
SELECT r2.NAME
, r2.range_start
, r2.range_end
FROM rm
CROSS apply (SELECT r.NAME
, GREATEST(rm.range_end, r.range_start) range_start
, NVL(r2.range_start, r.range_end) range_end
FROM ranges r
OUTER apply (SELECT *
FROM ranges
WHERE range_start BETWEEN GREATEST(rm.range_end, r.range_start) AND r.range_end
AND NAME > r.NAME
ORDER BY range_start, NAME DESC
FETCH FIRST 1 ROWS ONLY) r2
WHERE r.range_end > rm.range_end
AND NOT EXISTS (SELECT 1 FROM ranges r3
WHERE r3.range_end > rm.range_end
AND (GREATEST(rm.range_end, r3.range_start) < GREATEST(rm.range_end, r.range_start)
OR (GREATEST(rm.range_end, r3.range_start) = GREATEST(rm.range_end, r.range_start)
AND r3.NAME > r.NAME)))
FETCH FIRST 1 ROWS ONLY) r2)
CYCLE NAME, range_start, range_end SET cycle TO 1 DEFAULT 0
SELECT * FROM rm
First you get the first entry ordered by range_start, name desc, which gives you the earliest-starting range and, on ties, the most recent (highest) name.
Then you search for a range with a higher name that intersects this range. If there is one, the range_start of that interval becomes the range_end of your final interval.
Starting from that point, you search for the next entry with more or less the same conditions.
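For illustration, step 1 in isolation could look like this sketch against the question's original table (assumes Oracle 12c+ for FETCH FIRST):
select r.name, r.range_start, r.range_end
from ranges r
order by r.range_start, r.name desc
fetch first 1 rows only;
-- with the question's data this returns: a, 0, 7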
There is also another, less efficient but easier and shorter approach: generate all points and just aggregate them.
For example this simple query will generate all intermediate points:
select x,max(name)
from ranges,
xmltable('xs:integer($A) to xs:integer($B)'
passing range_start as a
,range_end as b
columns x int path '.'
)
group by x
Results:
X M
---------- -
0 a
1 c
2 c
3 e
4 e
5 e
6 d
7 a
Then we can merge them:
select *
from (
select x,max(name) name
from ranges,
xmltable('xs:integer($A) to xs:integer($B)-1'
passing range_start as a
,range_end as b
columns x int path '.'
)
group by x
order by 1
)
match_recognize(
order by x
measures
first(x) as r_start,
last(x)+1 as r_end,
last(name) as r_name
pattern(STRT A*)
define
A as prev(name)=name and prev(x)+1 = x
);
Results:
R_START R_END R
---------- ---------- -
0 1 a
1 3 c
3 5 e
5 6 d
6 7 a
I don't understand your results -- as I've explained in the comments. The "b" should be present, because it is most recent at time 2.
That said, the idea is to unpivot the times and figure out the most recent name at each time -- both beginnings and ends. Then, combine these using gaps-and-islands ideas. This is what the query looks like:
with r as (
select name, range_start as t
from ranges
union all
select null, range_end as t
from ranges
),
r2 as (
select r.*,
(select r2.name
from ranges r2
where r2.range_start <= r.t and
r2.range_end >= r.t
order by r2.range_start desc
fetch first 1 row only
) as imputed_name
from (select distinct t
from r
) r
)
select imputed_name, t,
lead(t) over (order by t)
from (select r2.*,
lag(imputed_name) over ( order by t) as prev_imputed_name
from r2
) r2
where prev_imputed_name is null or prev_imputed_name <> imputed_name;
Basically the same code should run in Postgres as well.
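For example, the correlated subquery that imputes the name runs unchanged in Postgres; here it is in isolation with t = 2 hardcoded (a sketch; LIMIT 1 would work just as well as FETCH FIRST there). Note that it returns b, which matches this answer's reading of "newest" as the latest range_start:
select r2.name
from ranges r2
where r2.range_start <= 2
  and r2.range_end >= 2
order by r2.range_start desc
fetch first 1 row only;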
Related
Is there any way in Bigquery to group by not the absolute value but a range of values?
I have a query that looks in a product table with 4 different numeric GROUP BYs.
What I am looking for is an efficient way to group by a range of values, like:
group by "A±1000" etc., or "A±10% of A".
You can generate a column as a "named range" then group by the column. As an example for your A+-1000 case:
with data as (
select 100 as v union all
select 200 union all
select 2000 union all
select 2100 union all
select 2200 union all
select 4100 union all
select 8000 union all
select 8000
)
select count(v), ARRAY_AGG(v), ranges
FROM data, unnest([0, 2000, 4000, 6000, 8000]) ranges
WHERE data.v >= ranges - 1000 AND data.v < ranges + 1000
GROUP BY ranges
Output:
+-----+------------------------+--------+
| f0_ | f1_ | ranges |
+-----+------------------------+--------+
| 2 | ["100","200"] | 0 |
| 3 | ["2000","2100","2200"] | 2000 |
| 1 | ["4100"] | 4000 |
| 2 | ["8000","8000"] | 8000 |
+-----+------------------------+--------+
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
)
SELECT
CASE
WHEN price > 0 AND price <= 100 THEN ' 0 - 100'
WHEN price > 100 AND price <= 200 THEN '100 - 200'
ELSE '200+'
END AS range_group,
COUNT(1) AS cnt
FROM `project.dataset.example`
GROUP BY range_group
-- ORDER BY range_group
with the result:
Row  range_group  cnt
1    0 - 100      2
2    100 - 200    3
3    200+         1
As you can see, in the above solution you need to construct a CASE statement to reflect your ranges; if you have many of them, this can get quite tedious. So below is a more generic (but more verbose) solution, which uses the recently introduced RANGE_BUCKET function.
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
), ranges AS (
SELECT [100.0, 200.0] ranges_array
), temp AS (
SELECT OFFSET, IF(prev_val = val, CONCAT(prev_val, ' - '), CONCAT(prev_val, ' - ', val)) rng FROM (
SELECT OFFSET, IFNULL(CAST(LAG(val) OVER(ORDER BY OFFSET) AS STRING), '') prev_val, CAST(val AS STRING) AS val
FROM ranges, UNNEST(ARRAY_CONCAT(ranges_array, [ARRAY_REVERSE(ranges_array)[OFFSET(0)]])) val WITH OFFSET
)
)
SELECT
RANGE_BUCKET(price, ranges_array) range_group,
rng,
COUNT(1) AS cnt
FROM `project.dataset.example`, ranges
JOIN temp ON RANGE_BUCKET(price, ranges_array) = OFFSET
GROUP BY range_group, rng
-- ORDER BY range_group
with the result:
Row  range_group  rng        cnt
1    0            - 100      2
2    1            100 - 200  3
3    2            200 -      1
As you can see, in the second solution you only need to define your ranges in the ranges CTE as a simple array listing the boundaries: SELECT [100.0, 200.0] ranges_array.
The temp CTE then does all the needed calculation of the range labels.
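For reference, a minimal sketch of what RANGE_BUCKET itself returns for these boundaries (the probe values are made up for illustration):
#standardSQL
SELECT price, RANGE_BUCKET(price, [100.0, 200.0]) AS range_group
FROM UNNEST([15.0, 125.0, 250.0]) AS price
-- 15.0 -> 0, 125.0 -> 1, 250.0 -> 2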
You can do math operations on the GROUP BY, creating groups by any arbitrary criteria.
For example:
WITH data AS (
SELECT repo.name, COUNT(*) price
FROM `githubarchive.month.201909`
GROUP BY 1
HAVING price>100
)
SELECT FORMAT('range %i-%i', MIN(price), MAX(price)) price_range, COUNT(*) c
FROM data
GROUP BY CAST(LOG(price) AS INT64)
ORDER BY MIN(price)
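The same trick can be aimed directly at the original "A±1000" ask; this sketch (with made-up values) buckets each value to its nearest multiple of 2000 using integer arithmetic:
#standardSQL
SELECT FORMAT('%d±1000', DIV(v + 1000, 2000) * 2000) AS price_range, COUNT(*) AS c
FROM UNNEST([100, 200, 2000, 2100, 2200, 4100, 8000, 8000]) AS v
GROUP BY price_range
-- 0±1000: 2, 2000±1000: 3, 4000±1000: 1, 8000±1000: 2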
Suppose I have a column A and the currently fetched value of A is null. I need to go back to previous rows and find the first non-null value of column A. Then I need to find the sum of another column B from the point where that non-null value is seen up to the current point. After that I need to add the sum of B to A, which will be the new value of A.
For finding the non-null value of column A I have written the query as
nvl(last_value(nullif(A,0)) ignore nulls over (order by A),0)
But I need to do the calculation of B as mentioned above.
Can anyone please help me out?
Sample data:
A     B   date
null  20  14/06/2019
null  40  13/06/2019
10    50  12/06/2019
Here the value of A on 14/06/2019 should be replaced by the sum of B down to 12/06/2019 plus the value of A on 12/06/2019 (which is the 1st non-null value of A) = 20+40+50+10 = 120.
If you have version 12c or higher:
with t( A,B, dte ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select * from t
match_recognize(
order by dte desc
measures
nvl(
first(a),
y.a + sum(b)
) as a,
first(b) as b,
first(dte) as dte
after match skip to next row
pattern(x* y{0,1})
define x as a is null,
y as a is not null
);
A    B   DTE
---  --  ----------
120  20  2019-06-14
100  40  2019-06-13
10   50  2019-06-12
Use a conditional count to divide the data into separate groups, then use this group for the analytic calculation:
select a, b, dt, grp, sum(nvl(a, 0) + nvl(b, 0)) over (partition by grp order by dt) val
from (
select a, b, dt, count(case when a is not null then 1 end) over (order by dt) grp
from t order by dt desc)
order by dt desc
Sample result:
A   B   DT          GRP  VAL
    20  2019-06-14  4    120
    40  2019-06-13  4    100
10  50  2019-06-12  4    60
5   2   2019-06-11  3    7
6   1   2019-06-10  2    7
    3   2019-06-09  1    14
7   4   2019-06-08  1    11
I think what you want can be handled by using sum(<column>) over (...) together with the last_value(...) over (...) function, as below:
with t( A,B, "date" ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select nvl(a,sum(b) over (order by 1)+
last_value(a) ignore nulls
over (order by 1 desc)
) as a,
b, "date"
from t;
A B date
--- -- ----------
120 20 14.06.2019
120 40 13.06.2019
10 50 12.06.2019
The data in my table is stored by effective date. Can you please help me with an Oracle SQL statement that replicates the 8/1 data onto 8/2 through 8/6 and repeats the 8/7 values after that (up to sysdate - 1)?
DATE      VALUE1  VALUE2
8/1/2017  x       1
8/1/2017  x       2
8/7/2017  y       4
8/7/2017  x       3
Desired output:
DATE      VALUE1  VALUE2
8/1/2017  x       1
8/1/2017  x       2
8/2/2017  x       1
8/2/2017  x       2
... (repeat to 8/6)
8/7/2017  y       4
8/7/2017  x       3
8/8/2017  y       4
8/8/2017  x       3
... (repeat to sysdate - 1)
Here is one way to do this. It's not the most elegant or efficient, but it is the most elementary way I could think of (short of really inefficient things like correlated subqueries, which can't easily be unwound into joins).
In the first subquery, aliased as a, I create all the needed dates. In the second subquery, b, I create the date ranges for which we will need to repeat specific rows (in the test data, I allow the number of rows which must be repeated to be variable, to make one of the subtleties of the problem more evident).
With these in hand, it's easy to get the result by joining these two subqueries and the original data. Alas, this approach requires reading the base table three times; hopefully you don't have too much data to process.
with
inputs ( dt, val1, val2 ) as (
select date '2017-08-14', 'x', 1 from dual union all
select date '2017-08-14', 'x', 2 from dual union all
select date '2017-08-17', 'y', 4 from dual union all
select date '2017-08-17', 'x', 3 from dual union all
select date '2017-08-19', 'a', 5 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- Use your actual table and column names in the SQL query below.
select a.dt, i.val1, i.val2
from (
select min_dt + level - 1 as dt
from ( select min(dt) as min_dt from inputs )
connect by level <= sysdate - min_dt
) a
join
(
select dt, lead(dt, 1, sysdate) over (order by dt) as lead_dt
from (select distinct dt from inputs)
) b
on a.dt >= b.dt and a.dt < b.lead_dt
join
inputs i on i.dt = b.dt
order by dt, val1, val2
;
Output:
DT VAL1 VAL2
---------- ---- ----
2017-08-14 x 1
2017-08-14 x 2
2017-08-15 x 1
2017-08-15 x 2
2017-08-16 x 1
2017-08-16 x 2
2017-08-17 x 3
2017-08-17 y 4
2017-08-18 x 3
2017-08-18 y 4
2017-08-19 a 5
2017-08-20 a 5
You want to make use of the LAST_VALUE analytic function, something like this:
select
  fakedate,
  CASE
    WHEN flip=1 THEN
      LAST_VALUE(yourvalue1rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
    ELSE
      LAST_VALUE(yourvalue1rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
  END as lastvalue1,
  CASE
    WHEN flip=1 THEN
      LAST_VALUE(yourvalue2rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
    ELSE
      LAST_VALUE(yourvalue2rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
  END as lastvalue2
from
(
  select
    fakedate, flip,
    CASE WHEN rown = 1 THEN yourvalue1 END as yourvalue1rown1,
    CASE WHEN rown = 2 THEN yourvalue1 END as yourvalue1rown2,
    CASE WHEN rown = 1 THEN yourvalue2 END as yourvalue2rown1,
    CASE WHEN rown = 2 THEN yourvalue2 END as yourvalue2rown2
  from
    (select (sysdate - 100) + trunc(rownum/2) fakedate, mod(rownum, 2)+1 as flip
     from dual connect by level <= 100) fakedates
  left outer join
    (select yt.*, row_number() over(partition by yourdate order by yourvalue1) as rown
     from yourtable yt) yourtable
  on fakedate = yourdate and flip = rown
)
You'll have to adjust the column names to match your table. You'll also have to adjust the 100 to reflect how many days back you need to go to get to the start of your date data.
Please note this is untested (SQLFiddle is having some Oracle issues for me at the moment), so if you get any syntax errors or other minor things you can't fix, comment and I'll address them.
I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.
create table data
( a_sno integer not null,
b_sno integer not null,
PRIMARY KEY (a_sno,b_sno)
);
insert into data (a_sno,b_sno) values
( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);
As you can see from the first 6 rows, the values 4, 5, 6 and 7 in the two columns intersect/overlap and need to be partitioned into one group. The same goes for rows 7-16 and rows 17-18, which will be labeled as groups 2 and 3 respectively.
The resulting output should look like this:
group | value
------+------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
Assuming that all pairs exist in their mirrored combination as well, like (4,5) and (5,4). But the following solutions work without mirrored dupes just as well.
Simple case
If all connections can be lined up in a single ascending sequence and complications like the ones I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:
I start by getting minimum a_sno per group, with the minimum associated b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp
, a_sno, min(b_sno) AS b_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno;
This only needs a single query level since a window function can be built on an aggregate:
Get the distinct sum of a joined table column
Result:
grp a_sno b_sno
1 4 5
2 9 10
3 11 15
I avoid branches and duplicated (multiplied) rows - potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE:
Create a unique index on a non-unique column
Key to performance is a matching index, which is already present, provided by the PK constraint PRIMARY KEY (a_sno, b_sno) - not the other way round ((b_sno, a_sno)):
Is a composite index also good for queries on the first field?
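In isolation, the correlated subquery that this index serves looks like the following sketch (sno = 4 hardcoded for illustration; the PK index on (a_sno, b_sno) lets it read the single smallest greater neighbor straight out of the index):
SELECT b_sno
FROM data
WHERE a_sno = 4  -- c.sno in the rCTE below
  AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1;
-- returns 5 for the sample data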
WITH RECURSIVE t AS (
SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
, a_sno, min(b_sno) AS b_sno -- the smallest one
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp
, (SELECT b_sno -- correlated subquery
FROM data
WHERE a_sno = c.sno
AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1)
FROM cte c
WHERE c.sno IS NOT NULL
)
SELECT * FROM cte
WHERE sno IS NOT NULL -- eliminate row with NULL
UNION ALL -- no duplicates
SELECT grp, a_sno FROM t
ORDER BY grp, sno;
Less simple case
All nodes can be reached in ascending order with one or more branches from the root (smallest sno).
This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:
WITH RECURSIVE t AS (
SELECT rank() OVER (ORDER BY d.a_sno) AS grp
, a_sno, b_sno -- get all rows for smallest a_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp, d.b_sno
FROM cte c
JOIN data d ON d.a_sno = c.sno
AND d.a_sno < d.b_sno -- join to all connected rows
)
SELECT grp, sno FROM cte
UNION -- eliminate duplicates
SELECT grp, a_sno FROM t -- add first rows
ORDER BY grp, sno;
Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).
Both should perform very well - especially with long chains / many branches. Result as desired:
SQL Fiddle (with added rows to demonstrate difficulty).
Undirected graph
If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.
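A hypothetical example of such a local minimum (these extra rows are made up, not part of the question's data):
insert into data (a_sno, b_sno) values
  (2, 5)
, (5, 2)
, (3, 5)
, (5, 3);
-- {2, 3, 5} is one connected component, but an ascending-only walk from
-- the root 2 reaches only {2, 5}; 3 has no smaller partner, so the rCTEs
-- above would treat it as a second root and split the group.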
Here is another way that may be useful; you can do it in 2 steps:
1. Take the max(sno) per group:
select q.sno,
row_number() over(order by q.sno) gn
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
result:
sno gn
7 1
14 2
15 3
2. Use a recursive CTE to find all related members in the groups:
with recursive cte(sno,gn,path,cycle)as(
select q.sno,
row_number() over(order by q.sno) gn,
array[q.sno],false
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
union all
select d.a_sno,c.gn,
d.a_sno || c.path,
d.a_sno=any(c.path)
from data d
join cte c on d.b_sno=c.sno
where not cycle
)
select distinct gn,sno from cte
order by gn,sno
Result:
gn sno
1 4
1 5
1 6
1 7
2 9
2 10
2 13
2 14
3 11
3 15
Here is a start that may give some ideas on an approach. The recursive query starts with a_sno of each record and then tries to follow the path of b_sno until it reaches the end or forms a cycle. The path is represented by an array of sno integers.
The unnest function will break the array into rows, so a sno value mapped to the path array such as:
4, {6, 5, 4}
will be transformed to a row for each value in the array:
4, 6
4, 5
4, 4
The array_agg then reverses the operation by aggregating the values back into a path, but getting rid of the duplicates and ordering.
Now each a_sno is associated with a path and the path forms the grouping. dense_rank can be used to map the grouping (cluster) to a numeric.
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
Output:
cluster | sno
--------------+-----
{4,5,6,7} | 4
{4,5,6,7} | 5
{4,5,6,7} | 6
{4,5,6,7} | 7
{9,10,13,14} | 9
{9,10,13,14} | 10
{9,10,13,14} | 13
{9,10,13,14} | 14
{11,15} | 11
{11,15} | 15
(10 rows)
Wrap it one more time for the ranking:
SELECT dense_rank() OVER(order by cluster) AS rank
,sno
FROM (
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
) z
Output:
rank | sno
------+-----
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
(10 rows)
Is it possible to solve this situation with a SQL query in Oracle?
I have a table like this:
TYPE UNIT
A 230
B 225
C 60
D 45
E 5
F 2
I need to separate the units into three (variable) 'same' (equally sized) intervals and for each figure out the count. It means something like this:
0 - 77 -> 4
78 - 154 -> 0
155 - 230 -> 2
You can use the maximum value and a connect-by query to generate the upper and lower values for each range:
select ceil((level - 1) * int) as int_from,
floor(level * int) - 1 as int_to
from (select round(max(unit) / 3) as int from t42)
connect by level <= 3;
INT_FROM INT_TO
---------- ----------
0 76
77 153
154 230
And then do a left outer join to your original table to do the count for each range, so you get the zero value for the middle range:
with intervals as (
select ceil((level - 1) * int) as int_from,
floor(level * int) - 1 as int_to
from (select round(max(unit) / 3) as int from t42)
connect by level <= 3
)
select i.int_from || '-' || i.int_to as range,
count(t.unit)
from intervals i
left join t42 t
on t.unit between i.int_from and i.int_to
group by i.int_from, i.int_to
order by i.int_from;
RANGE COUNT(T.UNIT)
---------- -------------
0-76 4
77-153 0
154-230 2
Yes, this can be done in Oracle. The hard part is the definition of the bounds. You can use the maximum value and some arithmetic on a sequence with values of 1, 2, and 3.
After that, the rest is just a cross join and aggregation:
with bounds as (
select (case when n = 1 then 0
when n = 2 then trunc(maxu / 3)
else trunc(2 * maxu / 3)
end) as lowerbound,
(case when n = 1 then trunc(maxu / 3)
when n = 2 then trunc(2*maxu / 3)
else maxu
end) as upperbound
from (select 1 as n from dual union all select 2 from dual union all select 3 from dual
) n cross join
(select max(unit) as maxu from atable t)
)
select b.lowerbound || '-' || b.upperbound,
sum(case when t.unit between b.lowerbound and b.upperbound then 1 else 0 end)
from atable t cross join
bounds b
group by b.lowerbound || '-' || b.upperbound;