SQL grouping interescting/overlapping rows - sql

I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.
create table data
( a_sno integer not null,
b_sno integer not null,
PRIMARY KEY (a_sno,b_sno)
);
insert into data (a_sno,b_sno) values
( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);
As you can see from the first 6 rows data values 4,5,6 and 7 in the two columns intersects/overlaps that need to partitioned to a group. Same goes for rows 7-16 and rows 17-18 which will be labeled as group 2 and 3 respectively.
The resulting output should look like this:
group | value
------+------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15

Assuming that all pairs exists in their mirrored combination as well (4,5) and (5,4). But the following solutions work without mirrored dupes just as well.
Simple case
All connections can be lined up in a single ascending sequence and complications like I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:
I start by getting minimum a_sno per group, with the minimum associated b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp
, a_sno, min(b_sno) AS b_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno;
This only needs a single query level since a window function can be built on an aggregate:
Get the distinct sum of a joined table column
Result:
grp a_sno b_sno
1 4 5
2 9 10
3 11 15
I avoid branches and duplicated (multiplicated) rows - potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE.
Create a unique index on a non-unique column
Key to performance is a matching index, which is already present provided by the PK constraint PRIMARY KEY (a_sno,b_sno): not the other way round (b_sno, a_sno):
Is a composite index also good for queries on the first field?
WITH RECURSIVE t AS (
SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
, a_sno, min(b_sno) AS b_sno -- the smallest one
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp
, (SELECT b_sno -- correlated subquery
FROM data
WHERE a_sno = c.sno
AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1)
FROM cte c
WHERE c.sno IS NOT NULL
)
SELECT * FROM cte
WHERE sno IS NOT NULL -- eliminate row with NULL
UNION ALL -- no duplicates
SELECT grp, a_sno FROM t
ORDER BY grp, sno;
Less simple case
All nodes can be reached in ascending order with one or more branches from the root (smallest sno).
This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:
WITH RECURSIVE t AS (
SELECT rank() OVER (ORDER BY d.a_sno) AS grp
, a_sno, b_sno -- get all rows for smallest a_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp, d.b_sno
FROM cte c
JOIN data d ON d.a_sno = c.sno
AND d.a_sno < d.b_sno -- join to all connected rows
)
SELECT grp, sno FROM cte
UNION -- eliminate duplicates
SELECT grp, a_sno FROM t -- add first rows
ORDER BY grp, sno;
Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).
Both should perform very well - especially with long chains / many branches. Result as desired:
SQL Fiddle (with added rows to demonstrate difficulty).
Undirected graph
If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.

I want to say another way, it may be useful, you can do it in 2 steps:
1. take the max(sno) per each group:
select q.sno,
row_number() over(order by q.sno) gn
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
result:
sno gn
7 1
14 2
15 3
2. use a recursive cte to find all related members in groups:
with recursive cte(sno,gn,path,cycle)as(
select q.sno,
row_number() over(order by q.sno) gn,
array[q.sno],false
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
union all
select d.a_sno,c.gn,
d.a_sno || c.path,
d.a_sno=any(c.path)
from data d
join cte c on d.b_sno=c.sno
where not cycle
)
select distinct gn,sno from cte
order by gn,sno
Result:
gn sno
1 4
1 5
1 6
1 7
2 9
2 10
2 13
2 14
3 11
3 15
here is the demo of what I did.

Here is a start that may give some ideas on an approach. The recursive query starts with a_sno of each record and then tries to follow the path of b_sno until it reaches the end or forms a cycle. The path is represented by an array of sno integers.
The unnest function will break the array into rows, so a sno value mapped to the path array such as:
4, {6, 5, 4}
will be transformed to a row for each value in the array:
4, 6
4, 5
4, 4
The array_agg then reverses the operation by aggregating the values back into a path, but getting rid of the duplicates and ordering.
Now each a_sno is associated with a path and the path forms the grouping. dense_rank can be used to map the grouping (cluster) to a numeric.
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
Output:
cluster | sno
--------------+-----
{4,5,6,7} | 4
{4,5,6,7} | 5
{4,5,6,7} | 6
{4,5,6,7} | 7
{9,10,13,14} | 9
{9,10,13,14} | 10
{9,10,13,14} | 13
{9,10,13,14} | 14
{11,15} | 11
{11,15} | 15
(10 rows)
Wrap it one more time for the ranking:
SELECT dense_rank() OVER(order by cluster) AS rank
,sno
FROM (
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
) z
Output:
rank | sno
------+-----
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
(10 rows)

Related

T-SQL Select all combinations of ranges that meet aggregate criteria

Problem restated per comments
Say we have the following integer id's and counts...
id count
1 0
2 10
3 0
4 0
5 0
6 1
7 9
8 0
We also have a variable #id_range int.
Given a value for #id_range, how can we select all combinations of id ranges, without using while loops or cursors, that meet the following criteria?
1) No two ranges in a combination can overlap (min and max of each range are inclusive)
2) sum(count) for a combination of ranges must equal sum(count) of the initial data set (20 in this case)
3) Only include ranges where sum(count) > 0
The simplest case would be when #id_range = max(id) - min(id), or 7 given the above data. In this case, there's only one solution:
minId maxId count
---------------------
1 8 20
But if #id_range = 1 for example, there would be 4 possible solutions:
Solution 1:
minId maxId count
---------------------
1 2 10
5 6 1
7 8 9
Solution 2:
minId maxId count
---------------------
1 2 10
6 7 10
Solution 3:
minId maxId count
---------------------
2 3 10
5 6 1
7 8 9
Solution 4:
minId maxId count
---------------------
2 3 10
6 7 10
The end goal is to identify which solutions have the fewest number of ranges (solution # 2 and 4, in above example where #id_range = 1).
this solution does not list all possible combination but just try to get group it in smallest possible no of rows.
Hopefully it will cover all possible scenario
-- create the sample table
declare #sample table
(
id int,
[count] int
)
-- insert some sample data
insert into #sample select 1, 0
insert into #sample select 2, 10
insert into #sample select 3, 0
insert into #sample select 4, 0
insert into #sample select 5, 0
insert into #sample select 6, 1
insert into #sample select 7, 9
insert into #sample select 8, 0
-- the #id_range
declare #id_range int = 1
-- the query
; with
cte as
(
-- this cte identified those rows with count > 0 and group them together
-- sign(0) gives 0, sign(+value) gives 1
-- basically it is same as case when [count] > 0 then 1 else 0 end
select *,
grp = row_number() over (order by id)
- dense_rank() over(order by sign([count]), id)
from #sample
),
cte2 as
(
-- for each grp in cte, assign a sub group (grp2). each sub group
-- contains #id_range number of rows
select *,
grp2 = (row_number() over (partition by grp order by id) - 1)
/ (#id_range + 1)
from cte
where count > 0
)
select MinId = min(id),
MaxId = min(id) + #id_range,
[count] = sum(count)
from cte2
group by grp, grp2

SQL query to find counts of numbers in running total

Suppose the table has 1 column ID and the values are as below:
ID
5
5
5
6
5
5
6
6
the output should be
ID count
5 3
6 1
5 2
6 2
How can we do that in a single SQL query.
If you want to find the Total count of the Records you have you can write like
select count(*) from database_name order by column_name;
In relational databases data in the table has no any order, see this: https://en.wikipedia.org/wiki/Table_(database)
the database system does not guarantee any ordering of the rows unless
an ORDER BY clause is specified in the SELECT statement that queries
the table.
therefore, in order to get desired results, you must have an additional colum in the table that defines an order of rows (and can by used in ORDER BY clause).
In the below examle cn column defines such an order:
select * from tab123 ORDER BY rn;
RN ID
---------- -------
1 5
2 5
3 5
4 6
5 5
6 5
7 6
8 6
Starting from Oracle version 12c new MATCH_REGOGNIZE clause can be used:
select * from tab123
match_recognize(
order by rn
measures
strt.id as id,
count(*) as cnt
one row per match
after match skip past last row
pattern( strt ss* )
define ss as ss.id = prev( ss.id )
);
On earlier versions that support windows function (Oracle 10 and above) you can use two windows functions: LAG ... over and SUM ... over, in this way
select max( id ) as id, count(*) as cnt
FROM (
select id, sum( xxx ) over (order by rn ) as yyy
from (
select t.*,
case lag( id ) over (order by rn )
when id then 0 else 1 end as xxx
from tab123 t
)
)
GROUP BY yyy
ORDER BY yyy;

How to get closest n rows for specific row in table?

I have a table foo with its primary key id and some other columns.
My goal is to find for instance rows with id=3 and id=4 and rows with id=6 and id=7 for row with id=5 - in case I would like to find 2 closest previous and next rows.
In case there is only one or no such rows (e.g. for id=2 there is only previous row) I would like to get only possible ones.
The problem is there can be some rows missing.
Is there a common practice to make such queries?
I would try the following:
SELECT * FROM table WHERE id > ? ORDER BY id ASC LIMIT 2
followed by
SELECT * FROM table WHERE id <= ? ORDER BY id DESC LIMIT 2
You may be able to combine the above into the following:
SELECT * FROM table WHERE id > ? ORDER BY id ASC LIMIT 2
UNION
SELECT * FROM table WHERE id <= ? ORDER BY id DESC LIMIT 2
I think this would fit your description.
Select * from table where id between #n-2 and #n+2 and id <> #n
One way is this:
with your_table(id) as(
select 1 union all
select 2 union all
select 4 union all
select 5 union all
select 10 union all
select 11 union all
select 12 union all
select 13 union all
select 14
)
select * from (
(select * from your_table where id <= 10 order by id desc limit 3+1)
union all
(select * from your_table where id > 10 order by id limit 3)
) t
order by id
(Here 10 is start point and 3 is n rows you want)
This is a possible solution by numbering all the records and fetching those where row number is 2 rows greater or lower than the selected ID.
create table foo(id int);
insert into foo values (1),(2),(4),(6),(7),(8),(11),(12);
-- using ID = 6
with rnum as
(
select id, row_number() over (order by id) rn
from foo
)
select *
from rnum
where rn >= (select rn from rnum where id = 6) - 2
and rn <= (select rn from rnum where id = 6) + 2;
id | rn
-: | -:
2 | 2
4 | 3
6 | 4
7 | 5
8 | 6
-- using ID = 2
with rnum as
(
select id, row_number() over (order by id) rn
from foo
)
select *
from rnum
where rn >= (select rn from rnum where id = 2) - 2
and rn <= (select rn from rnum where id = 2) + 2;
id | rn
-: | -:
1 | 1
2 | 2
4 | 3
6 | 4
dbfiddle here

SQL: first row of group by after join and order

Assuming we got two below tables:
TravelTimes
OriginId DestinationId TotalJourneyTime
1 1 10
1 2 20
2 2 30
2 3 40
1 3 50
Destinations
DestinationId Name
1 Destination 1
2 Destination 2
3 Destination 3
How do I find the quickest journey between each origin and destination?
I want to join TravelTimes with Destinations by DestinationId and then group them by OriginId and sort each group by TotalJourneyTime and select the first row of each group.
I did try joining and grouping, but it seems group by is not the solution for my case as I don't have any aggregation column in the output.
Expected output
OriginId DestinationId DestinationName TotalJourneyTime
1 1 Destination 1 10
2 3 Destination 3 40
Use a RANK to rank each journey partitioned by the origin and destination and ordered by the travel time
WITH RankedTravelTimes
AS
(
select originid,
destinationId,
totaljourneytime,
rank() over (partition by originid,destinationid order by totaljourneytime ) as r
from traveltimes
)
SELECT rtt.*, d.name
FROM RankedTravelTimes rtt
INNER JOIN Destinations d
ON rtt.destinationId = d.id
WHERE rtt.r=1
The above will include both the journey from 1-2 and 2-2 as separate. If you're only interested in the destination you can remove originId out of the partition.
Not sure I see the problem here with just joining and grouping the data with a MIN on the journey time:
CREATE TABLE #Traveltimes
(
[OriginId] INT ,
[DestinationId] INT ,
[TotalJourneyTime] INT
);
INSERT INTO #Traveltimes
( [OriginId], [DestinationId], [TotalJourneyTime] )
VALUES ( 1, 1, 10 ),
( 1, 2, 20 ),
( 2, 2, 30 ),
( 2, 3, 40 ),
( 2, 3, 50 );
CREATE TABLE #Destinations
(
[DestinationId] INT ,
[Name] VARCHAR(13)
);
INSERT INTO #Destinations
( [DestinationId], [Name] )
VALUES ( 1, 'Destination 1' ),
( 2, 'Destination 2' ),
( 3, 'Destination 3' );
SELECT d.DestinationId ,
d.Name ,
tt.OriginId ,
MIN(tt.TotalJourneyTime) MinTime
FROM #Destinations d
INNER JOIN #Traveltimes tt ON tt.DestinationId = d.DestinationId
GROUP BY tt.OriginId ,
d.DestinationId ,
d.Name
DROP TABLE #Destinations
DROP TABLE #Traveltimes
Gives you:
DestinationId Name OriginId MinTime
1 Destination 1 1 10
2 Destination 2 1 20
2 Destination 2 2 30
3 Destination 3 2 40
Note: why do you travel from destination 1 to itself?
I think you want the following:
;with cte as(select *, row_number() over(partition by DestinationId order by TotalJourneyTime) rn
from TravelTimes)
select * from cte c
join Destinations d on c.DestinationId = d.DestinationId
where c.rn = 1

SELECT records until new value SQL

I have a table
Val | Number
08 | 1
09 | 1
10 | 1
11 | 3
12 | 0
13 | 1
14 | 1
15 | 1
I need to return the last values where Number = 1 (however many that may be) until Number changes, but do not need the first instances where Number = 1. Essentially I need to select back until Number changes to 0 (15, 14, 13)
Is there a proper way to do this in MSSQL?
Based on following:
I need to return the last values where Number = 1
Essentially I need to select back until Number changes to 0 (15, 14,
13)
Try (Fiddle demo ):
select val, number
from T
where val > (select max(val)
from T
where number<>1)
EDIT: to address all possible combinations (Fiddle demo 2)
;with cte1 as
(
select 1 id, max(val) maxOne
from T
where number=1
),
cte2 as
(
select 1 id, isnull(max(val),0) maxOther
from T
where val < (select maxOne from cte1) and number<>1
)
select val, number
from T cross join
(select maxOne, maxOther
from cte1 join cte2 on cte1.id = cte2.id
) X
where val>maxOther and val<=maxOne
I think you can use window functions, something like this:
with cte as (
-- generate two row_number to enumerate distinct groups
select
Val, Number,
row_number() over(partition by Number order by Val) as rn1,
row_number() over(order by Val) as rn2
from Table1
), cte2 as (
-- get groups with Number = 1 and last group
select
Val, Number,
rn2 - rn1 as rn1, max(rn2 - rn1) over() as rn2
from cte
where Number = 1
)
select Val, Number
from cte2
where rn1 = rn2
sql fiddle demo
DEMO: http://sqlfiddle.com/#!3/e7d54/23
DDL
create table T(val int identity(8,1), number int)
insert into T values
(1),(1),(1),(3),(0),(1),(1),(1),(0),(2)
DML
; WITH last_1 AS (
SELECT Max(val) As val
FROM t
WHERE number = 1
)
, last_non_1 AS (
SELECT Coalesce(Max(val), -937) As val
FROM t
WHERE EXISTS (
SELECT val
FROM last_1
WHERE last_1.val > t.val
)
AND number <> 1
)
SELECT t.val
, t.number
FROM t
CROSS
JOIN last_1
CROSS
JOIN last_non_1
WHERE t.val <= last_1.val
AND t.val > last_non_1.val
I know it's a little verbose but I've deliberately kept it that way to illustrate the methodolgy.
Find the highest val where number=1.
For all values where the val is less than the number found in step 1, find the largest val where the number<>1
Finally, find the rows that fall within the values we uncovered in steps 1 & 2.
select val, count (number) from
yourtable
group by val
having count(number) > 1
The having clause is the key here, giving you all the vals that have more than one value of 1.
This is a common approach for getting rows until some value changes. For your specific case use desc in proper spots.
Create sample table
select * into #tmp from
(select 1 as id, 'Alpha' as value union all
select 2 as id, 'Alpha' as value union all
select 3 as id, 'Alpha' as value union all
select 4 as id, 'Beta' as value union all
select 5 as id, 'Alpha' as value union all
select 6 as id, 'Gamma' as value union all
select 7 as id, 'Alpha' as value) t
Pull top rows until value changes:
with cte as (select * from #tmp t)
select * from
(select cte.*, ROW_NUMBER() over (order by id) rn from cte) OriginTable
inner join
(
select cte.*, ROW_NUMBER() over (order by id) rn from cte
where cte.value = (select top 1 cte.value from cte order by cte.id)
) OnlyFirstValueRecords
on OriginTable.rn = OnlyFirstValueRecords.rn and OriginTable.id = OnlyFirstValueRecords.id
On the left side we put an original table. On the right side we put only rows whose value is equal to the value in first line.
Records in both tables will be same until target value changes. After line #3 row numbers will get different IDs associated because of the offset and will never be joined with original table:
LEFT RIGHT
ID Value RN ID Value RN
1 Alpha 1 | 1 Alpha 1
2 Alpha 2 | 2 Alpha 2
3 Alpha 3 | 3 Alpha 3
----------------------- result set ends here
4 Beta 4 | 5 Alpha 4
5 Alpha 5 | 7 Alpha 5
6 Gamma 6 |
7 Alpha 7 |
The ID must be unique. Ordering by this ID must be same in both ROW_NUMBER() functions.