How can I stop my Postgres recusive CTE from indefinitely looping? - sql

Background
I'm running Postgres 11 on CentOS 7.
I recently learned the basics of recursive CTEs in Postgres thanks to S-Man's answer to my recent question.
The problem
While working on a closely related issue (counting parts sold within bundles and assemblies) and using this recursive CTE, I ran into a problem where the query looped indefinitely and never completed.
I tracked this down to the presence of non-spurious 'self-referential' entries in the relator table, i.e. rows with the same value for parent_name and child_name.
I know that these are the source of the problem because when I recreated the situation with test tables and data, the undesired looping behavior occurred when these rows were present, and disappeared when these rows were absent or when UNION (which excludes duplicate returned rows) was used in the CTE rather than UNION ALL .
I think the data model itself probably needs adjusting so that these 'self-referential' rows aren't necessary, but for now, what I need to do is get this query to return the desired data on completion and stop looping.
How can I achieve this result? All guidance much appreciated!
Tables and test data
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
/* NOTE: listing_name below is like an alias of a relator.parent_name as it appears in a catalog,
required to know because it is these listing_names that are reflected by sales.sold_name */
CREATE TABLE the_schema.catalog_listings (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
listing_name TEXT NOT NULL,
parent_name TEXT NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
CREATE VIEW the_schema.relationships_with_child_category AS (
SELECT
c.listing_name,
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.catalog_listings c
INNER JOIN
the_schema.relator r
ON c.parent_name = r.parent_name
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('child2', 'assembly'),('subChild1', 'component'),
('subChild2', 'component'), ('subChild3', 'component');
INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('listing1', 'parent1'), ('parent1', 'child1'), ('parent1','child2'), ('child1', 'child1'), ('child2', 'child2');
INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('parent1', 'child1'), ('parent1','child2');
/* note the two 'self-referential' entries */
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 1), ('child1', 'subChild2', 1)
('parent1', 'child2', 1),('child2', 'subChild1', 1), ('child2', 'subChild3', 1), ('child1', 'child1', 1), ('child2', 'child2', 1);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2), ('listing1', 1);
The present query, loops indefinitely with the required UNION ALL
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category as category
FROM
the_schema.sales s
JOIN the_schema.relationships_with_child_category r
ON s.sold_name = r.listing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category
FROM cte
JOIN the_schema.relationships_with_child_category r
ON cte.child_name = r.parent_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
;

In catalog_listings table listing_name and parent_name is same for child1 and child2
In relator table parent_name and child_name is also same for child1 and child2
These rows are creating cycling recursion.
Just remove those two rows from both the tables:
delete from catalog_listings where id in (4,5)
delete from relator where id in (7,8)
Then your desired output will be as below:
child_name
sum
subChild2
8
subChild3
8
subChild1
16
Is this the result you are looking for?
If you can't delete the rows you can use below add parent_name<>child_name condition to avoid those rows:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category as category
FROM
the_schema.sales s
JOIN the_schema.relationships_with_child_category r
ON s.sold_name = r.listing_name and r.parent_name <>r.child_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category
FROM cte
JOIN the_schema.relationships_with_child_category r
ON cte.child_name = r.parent_name and r.parent_name <>r.child_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name ;

You may be able to avoid infinite recursion simply by using UNION instead of UNION ALL.
The documentation describes the implementation:
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
"Getting rid of the duplicates" should cause the intermediate table to be empty at some point, which ends the iteration.

Related

How to efficiently compute in step 1 differences between columns and in step 2 aggregate those differences?

This is a follow-up question to this Is there a way to use functions that can take multiple columns as input (e.g. `GREATEST`) without typing out every column name?, where I asked only about the second part of my problem. The feedback was that the data model is most likely not appropriate.
I was thinking again about the data model but still, have trouble figuring out a good way to do what I want.
The complete problem is as follows:
I got time series data for multiple technical devices with columns like energy_consumption and voltage.
Furthermore, I got columns with sensitivities towards multiple external factors for each device which I just added as additional columns (denoted with the cc_ in the example).
There are queries where I want to operate on the raw sensitivities. However, there are also queries for which I need to take first some differences such as cc_a - cc_b and cc_b -cc_c and then compute the max of those differences. The combinations for which the differences are to be computed are a predefined subset (around 30) of all possible combinations. The set of combinations that is of interest might change in the future so that for different time intervals different combinations have to be applied (e.g. from 2022-01-01 to 2024-12-31 take combination set A and from 2025-01-01 to ... take combination set B). However, it is very unlikely that the combination change very often.
Here is an example of how I am doing it at the moment
CREATE TEMP TABLE foo (device_id int, voltage int, energy_consumption int, cc_a int, cc_b int, cc_c int);
INSERT INTO foo VALUES (3, 12, 5, '1', '2', '3'), (4, 6, 3, '15', '4', '100');
WITH diff_table AS (
SELECT
id,
(cc_a - cc_b) as diff_ab,
(cc_a - cc_c) as diff_ac,
(cc_b - cc_c) as diff_bc
FROM foo
)
SELECT
id,
GREATEST(diff_ab, diff_ab, diff_bc) as max_cc
FROM diff_table
Since I got more than 100 sensitivities and also differences I am looking for a way how to do this efficiently, both computationally and in terms of typical query length.
What would be a good data model to perform such operations?
The solution I type below assumes all pairings are considered, and the you don't want the points where these are reached.
CREATE TABLE sources (
source_id int
,source_name varchar(10)
,PRIMARY KEY(source_id))
CREATE TABLE foo_values(
device_id int not null --device_id for "foo"
,source_id int -- you may change that with a foreign key
,value int
,CONSTRAINT fk_source_id
FOREIGN KEY(source_id )
REFERENCES sources(source_id ) )
With the exemple set you gave
INSERT INTO sources ( source_id, source_name ) VALUES
(1,'cc_a')
,(2,'cc_b')
,(3,'cc_c')
-- and so on
INSERT INTO foo_values (device_id,source_id, value ) VALUES
(3,1,1),(3,2,2),(3,3,3)
,(4,1,15),(4,2,4),(4,2,100)
doing this way, the query will be
SELECT device_id
, MAX(value)-MIN(value) as greatest_diff
FROM foo_values
group by device_id
Bonus : with such a schema, you can even tell where the maximum and minimum are reached
WITH ranked as (
SELECT
f.device_id
,f.value
,f.source_id
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value ) as low_first
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value DESC) as high_first
FROM foo_values as f)
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as greatest_diff
FROM ranked l
INNER JOIN ranked h
on l.device_id = h.device_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
WHERE l.low_first =1 AND h.high_first = 1
Here is a fiddle for this solution.
EDIT : since you need to control the pairings, you must add a table that list them:
CREATE TABLE high_low_source
(high_source_id int
,low_source_id int
, constraint fk_low
FOREIGN KEY(low_source_id )
REFERENCES sources(source_id )
,constraint fk_high
FOREIGN KEY(high_source_id )
REFERENCES sources(source_id )
);
INSERT INTO high_low_source VALUES
(1,2),(1,3),(2,3)
The query looking for the greatest difference becomes :
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as my_diff
, RANK() OVER (PARTITION BY h.device_id ORDER BY (h.value - l.value) DESC) as greatest_first
FROM foo_values l
INNER JOIN foo_values h
on l.device_id = h.device_id
INNER JOIN high_low_source hl
on hl.low_source_id = l.source_id
AND hl.high_source_id = h.source_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
ORDER BY device_id, greatest_first
With the values you have listed, there is a tie for device 3.
Extended fiddle

How can I write a SQL query to calculate the quantity of components sold with their parent assemblies? (Postgres 11/recursive CTE?)

My goal
To calculate the sum of components sold as part of their parent assemblies.
I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.
Background
I'm running Postgres 11 on CentOS 7.
I have some tables like as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way.
Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use this it for my case.
Aping the example from E. Brandstetter's answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!
Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
My entire query looks like this (which only differs in the details I mentioned above from yours):
demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category as category
FROM
sales s
JOIN relator r
ON s.sold_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category
FROM cte
JOIN relator r ON cte.child_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
Note: I didn't use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that's just the way I personally like it :)
Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed :
SELECT
cte.child_name,
sum(cte.total)
FROM
(
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte ) AS cte
INNER JOIN
the_schema.relationships_with_child_catetgory r1
ON cte.child_name = r1.child_name
WHERE r1.child_category = 'component'
GROUP BY cte.child_name
;
which gives the desired rows:
child_name | sum
------------+-----
subChild2 | 6
subChild1 | 30
(2 rows)
Which is good and probably enough for the actual case at hand-- but I suspect there's a clearner way to go about this, so I'll be eager to read all other offered answers.

Adding a LEFT JOIN on a INSERT INTO....RETURNING

My query Inserts a value and returns the new row inserted
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING comment_id, date_posted, e_id, created_by, parent_id, body, num_likes, thread_id
I want to join the created_by with the user_id from my user's table.
SELECT * from users WHERE user_id = created_by
Is it possible to join that new returning row with another table row?
Consider using a WITH structure to pass the data from the insert to a query that can then be joined.
Example:
-- Setup some initial tables
create table colors (
id SERIAL primary key,
color VARCHAR UNIQUE
);
create table animals (
id SERIAL primary key,
a_id INTEGER references colors(id),
animal VARCHAR UNIQUE
);
-- provide some initial data in colors
insert into colors (color) values ('red'), ('green'), ('blue');
-- Store returned data in inserted_animal for use in next query
with inserted_animal as (
-- Insert a new record into animals
insert into animals (a_id, animal) values (3, 'fish') returning *
) select * from inserted_animal
left join colors on inserted_animal.a_id = colors.id;
-- Output
-- id | a_id | animal | id | color
-- 1 | 3 | fish | 3 | blue
Explanation:
A WITH query allows a record returned from an initial query, including data returned from a RETURNING clause, which is stored in a temporary table that can be accessed in the expression that follows it to continue work on it, including using a JOIN expression.
You were right, I misunderstood
This should do it:
DECLARE mycreated_by event_comments.created_by%TYPE;
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING created_by into mycreated_by
SELECT * from users WHERE user_id = mycreated_by

Postgres union query returns strange result

I have the following tables:
CREATE TABLE geodat(
vessel UUID NOT NULL,
trip UUID NOT NULL,
geom geometry(LineString,4326),
PRIMARY KEY(vessel,trip)
);
CREATE TABLE areas(
gid SERIAL NOT NULL,
/* --other columns of little interest-- */
geom geometry(MultiPolygon,3035),
PRIMARY KEY(gid)
);
The following query is supposed to return the area that has been crossed the least, as well as how many times it was crossed and by which vessels.
SELECT vessel,MIN(cnt) as min_crossing,gid
FROM (
SELECT vessel,COUNT(*) as cnt, gid
FROM (
SELECT vessel, null as geo1, geom as geo2, null as gid
FROM geodat
UNION ALL
SELECT null,geom,null,gid FROM areas ) as P
WHERE ST_Crosses(geo1,geo2) AND geo1 IS NOT NULL AND geo2 IS NOT NULL
GROUP BY gid,vessel) as P1
GROUP BY gid,vessel
Theoretically, this query should solve the question above. The problem is that I am getting (0 rows) as an answer, although I have been assured as to the opposite. I discovered it has something to do with the null values the UNION produced, but I don't have a clue how to fix this.
Any ideas?
NOTE: The two tables have 31822 rows and 27308 rows respectively which makes a JOIN impractical.
You have the condition
WHERE ST_Crosses(geo1,geo2) AND geo1 IS NOT NULL AND geo2 IS NOT NULL
However, in the union all you are explicitly setting geo1 to null and geo2 to null. Hence the query is returning 0 rows.
You can change the where condition above to or, which would return rows.
WHERE ST_Crosses(geo1,geo2) AND geo1 IS NOT NULL OR geo2 IS NOT NULL

Recursive MySQL query

My database schema looks like this:
Table t1:
id
valA
valB
Table t2:
id
valA
valB
What I want to do, is, for a given set of rows in one of these tables, find rows in both tables that have the same valA or valB (comparing valA with valA and valB with valB, not valA with valB). Then, I want to look for rows with the same valA or valB as the rows in the result of the previous query, and so on.
Example data:
t1 (id, valA, valB):
1, a, B
2, b, J
3, d, E
4, d, B
5, c, G
6, h, J
t2 (id, valA, valB):
1, b, E
2, d, H
3, g, B
Example 1:
Input: Row 1 in t1
Output:
t1/4, t2/3
t1/3, t2/2
t2/1
...
Example 2:
Input: Row 6 in t1
Output:
t1/2
t2/1
I would like to have the level of the search at that the row was found in the result (e.g. in Example 1: Level 1 for t1/2 and t2/1, level 2 for t1/5, ...) A limited depth of recursion is okay. Over time, I maybe want to include more tables following the same schema into the query. It would be nice if it was easy to extend the query for that purpose.
But what matters most, is the performance. Can you tell me the fastest possible way to accomplish this?
Thanks in advance!
try this although it's not fully tested but looked like it was working :P (http://pastie.org/1140339)
drop table if exists t1;
create table t1
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;
drop table if exists t2;
create table t2
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;
drop view if exists t12;
create view t12 as
select 1 as tid, id, valA, valB from t1
union
select 2 as tid, id, valA, valB from t2;
insert into t1 (valA, valB) values
('a','B'),
('b','J'),
('d','E'),
('d','B'),
('c','G'),
('h','J');
insert into t2 (valA, valB) values
('b','E'),
('d','H'),
('g','B');
drop procedure if exists find_children;
delimiter #
create procedure find_children
(
in p_tid tinyint unsigned,
in p_id int unsigned
)
proc_main:begin
declare done tinyint unsigned default 0;
declare dpth smallint unsigned default 0;
create temporary table children(
tid tinyint unsigned not null,
id int unsigned not null,
valA char(1) not null,
valB char(1) not null,
depth smallint unsigned default 0,
primary key (tid, id, valA, valB)
)engine = memory;
insert into children select p_tid, t.id, t.valA, t.valB, dpth from t12 t where t.tid = p_tid and t.id = p_id;
create temporary table tmp engine=memory select * from children;
/* http://dec.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
while done <> 1 do
if exists(
select 1 from t12 t
inner join tmp on tmp.valA = t.valA or tmp.valB = t.valB and tmp.depth = dpth) then
insert ignore into children
select
t.tid, t.id, t.valA, t.valB, dpth+1
from t12 t
inner join tmp on tmp.valA = t.valA or tmp.valB = t.valB and tmp.depth = dpth;
set dpth = dpth + 1;
truncate table tmp;
insert into tmp select * from children where depth = dpth;
else
set done = 1;
end if;
end while;
select * from children order by depth;
drop temporary table if exists children;
drop temporary table if exists tmp;
end proc_main #
delimiter ;
call find_children(1,1);
call find_children(1,6);
You can do it with stored procedures (see listings 7 and 7a):
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
You just need to figure out a query for the step of the recursion - taking the already-found rows and finding some more rows.
If you had a database which supported SQL-99 recursive common table expressions (like PostgreSQL or Firebird, hint hint), you could take the same approach as in the above link, but using a rCTE as the framework, so avoiding the need to write a stored procedure.
EDIT: I had a go at doing this with an rCTE in PostgreSQL 8.4, and although i can find the rows, i can't find a way to label them with the depth at which they were found. First, i create a a view to unify the tables:
create view t12 (tbl, id, vala, valb) as (
(select 't1', id, vala, valb from t1)
union
(select 't2', id, vala, valb from t2)
)
Then do this query:
with recursive descendants (tbl, id, vala, valb) as (
(select *
from t12
where tbl = 't1' and id = 1) -- the query that identifies the seed rows, here just t1/1
union
(select c.*
from descendants p, t12 c
where (p.vala = c.vala or p.valb = c.valb)) -- the recursive term
)
select * from descendants;
You would imagine that capturing depth would be as simple as adding a depth column to the rCTE, set to zero in the seed query, then somehow incremented in the recursive step. However, i couldn't find any way to do that, given that you can't write subqueries against the rCTE in the recursive step (so nothing like select max(depth) + 1 from descendants in the column list), and you can't use an aggregate function in the column list (so no max(p.depth) + 1 in the column list coupled with a group by c.* on the select).
You would also need to add a restriction to the query to exclude already-selected rows; you don't need to do that in the basic version, because of the distincting effect of the union, but if you add a count column, then a row can be included in the results more than once with different counts, and you'll get a Cartesian explosion. But you can't easily prevent it, because you can't have subqueries against the rCTE, which means you can't say anything like and not exists (select * from descendants d where d.tbl = c.tbl and d.id = c.id)!
I know all this stuff about recursive queries is of no use to you, but i find it riveting, so please do excuse me.