Build Traversal path for each BST node - sql

Code to create the table:
create table BST
(
N Int,
P Int
)
insert into BST values
(1,3),
(3,8),
(4,6),
(6,3),
(7,6),
(8,NULL),
(10,8),
(13,14),
(14,10)
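The BST hierarchy looks like this: 8 is the root with children 3 and 10; 3 has children 1 and 6; 6 has children 4 and 7; 10 has the child 14; and 14 has the child 13.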
I am trying to build a query that shows, for each node, the traversal path from the root to that node.
I tried a recursive CTE, but I am not sure whether I applied it correctly.
WITH NodeCTE (N, P, [Level])
AS
(
SELECT N,
P,
1
FROM BST
WHERE P IS NULL
UNION ALL
SELECT BST.N,
BST.P,
NodeCTE.[Level] + 1
FROM BST
JOIN NodeCTE ON BST.P = NodeCTE.N
)
SELECT CTE1.N AS Node,
CTE1.[Level]
FROM NodeCTE CTE1
LEFT JOIN NodeCTE CTE2 ON CTE1.P = CTE2.N
From what I found by googling, I will need STRING_AGG at the end to format the data, but I cannot figure out how to get the data into the required shape before applying STRING_AGG.
Expected Output:
| N | TraversalPath |
|-------|----------------|
|1 |8->3->1 |
|3 |8->3 |
|4 |8->3->6->4 |
|6 |8->3->6 |
|7 |8->3->6->7 |
|8 |8 |
|10 |8->10 |
|13 |8->10->14->13 |
|14 |8->10->14 |

You did much of the dirty work, so the additions are minimal:
WITH NodeCTE (N, P, [Level], [path])
AS
(
SELECT N,
P,
1,
convert(NVARCHAR(MAX),N)
FROM BST
WHERE P IS NULL
UNION ALL
SELECT BST.N,
BST.P,
NodeCTE.[Level] + 1,
NodeCTE.[path] + '->' + convert(NVARCHAR(MAX),BST.N)
FROM BST
JOIN NodeCTE ON BST.P = NodeCTE.N
)
SELECT N,
P,
[Level],
[path] AS TraversalPath
FROM NodeCTE
ORDER BY N
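For anyone without a SQL Server instance at hand, the same recursion can be tried with Python's built-in sqlite3 module. This is only a sketch under assumptions: an in-memory SQLite table stands in for BST, and the syntax is SQLite's (WITH RECURSIVE, CAST, ||) rather than T-SQL's (CONVERT, +).

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (N INTEGER, P INTEGER)")
conn.executemany(
    "INSERT INTO BST VALUES (?, ?)",
    [(1, 3), (3, 8), (4, 6), (6, 3), (7, 6), (8, None), (10, 8), (13, 14), (14, 10)],
)

# Start at the root (P IS NULL) and append each child to its parent's path.
query = """
WITH RECURSIVE NodeCTE (N, P, Level, path) AS (
    SELECT N, P, 1, CAST(N AS TEXT)
    FROM BST
    WHERE P IS NULL
    UNION ALL
    SELECT BST.N, BST.P, NodeCTE.Level + 1,
           NodeCTE.path || '->' || CAST(BST.N AS TEXT)
    FROM BST
    JOIN NodeCTE ON BST.P = NodeCTE.N
)
SELECT N, path FROM NodeCTE ORDER BY N
"""
paths = dict(conn.execute(query).fetchall())
for node, path in paths.items():
    print(node, path)
```

Each row's path is its parent's path plus '->' and its own N, which is why the root has to be the anchor member of the CTE.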


Create recursive CTE for this table [duplicate]

I have a table like this:
|id |name |parent|
+-------+----------+------+
|1 |iran | |
|2 |iraq | |
|3 |tehran |1 |
|4 |tehran |3 |
|5 |Vaiasr St |4 |
|6 |Fars |1 |
|7 |shiraz |6 |
It's about addresses, from country down to street. I want to build the address with a recursive CTE like this:
with cte_address as
(
select
ID, [Name], parent
from
[Address]
where
Parent is null
union all
select
a.ID, a.[name], a.Parent
from
address a
inner join
cte_address c on a.parent = c.id
)
select *
from cte_address
But I get an error:
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Add option (maxrecursion 0) at the end of your select query. Maxrecursion 0 removes the limit on the number of recursion levels:
with cte_address as
(
...
...
)
select * from cte_address
option (maxrecursion 0)
Note:
By default SQL Server stops a recursive CTE after 100 recursion levels, which prevents an infinite loop caused by a poorly designed recursive CTE query. With maxrecursion 0 that protection is gone, so make sure the data cannot recurse endlessly.
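The sample rows in the question are only a few levels deep, so the real table presumably differs from the sample. A minimal sketch with Python's sqlite3 module shows how to measure the depth and cap it explicitly. Assumptions: SQLite has no maxrecursion option, so a depth counter column is swapped in as the guard, and the names MAX_DEPTH and depth are hypothetical.

```python
import sqlite3

MAX_DEPTH = 100  # hypothetical cap, mirroring SQL Server's default limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        (1, "iran", None), (2, "iraq", None), (3, "tehran", 1), (4, "tehran", 3),
        (5, "Vaiasr St", 4), (6, "Fars", 1), (7, "shiraz", 6),
    ],
)

query = """
WITH RECURSIVE cte_address (id, name, parent, depth) AS (
    SELECT id, name, parent, 1 FROM address WHERE parent IS NULL
    UNION ALL
    SELECT a.id, a.name, a.parent, c.depth + 1
    FROM address a
    JOIN cte_address c ON a.parent = c.id
    WHERE c.depth < ?          -- stop descending once the cap is reached
)
SELECT id, name, depth FROM cte_address ORDER BY id
"""
rows = conn.execute(query, (MAX_DEPTH,)).fetchall()
deepest = max(depth for _, _, depth in rows)
print(rows)
print("deepest level:", deepest)
```

If the deepest level reported for the real data comes out equal to the cap, the hierarchy is either genuinely that deep or the rows lead the recursion in circles, and the data is worth inspecting before the limit is removed.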

Recursive split of path with H2 DB and SQL

I have path names of the following general form (the path depth is not limited):
/a/b/c/d/e/...
Example
/a/b/c/d/e
Expected result
What I'd like to achieve now is to split the path into a table containing the folder and the respective parent:
|parent   |folder|
|---------|------|
|/a/b/c/d/|e     |
|/a/b/c/  |d     |
|/a/b/    |c     |
|/a/      |b     |
|/        |a     |
The capabilities of the H2 db are a bit limited when it comes to splitting strings, thus my assumption was it must be solved recursively (especially since the path depth is not limited).
Any help would be appreciated :)
You need to use a recursive query, for example:
WITH RECURSIVE CTE(S, F, T) AS (
SELECT '/a/b/c/d/e', 0, 1
UNION ALL
SELECT S, T, LOCATE('/', S, T + 1)
FROM CTE
WHERE T <> 0
)
SELECT
SUBSTRING(S FROM 1 FOR F) PARENT,
SUBSTRING(S FROM F + 1 FOR
CASE T WHEN 0 THEN CHARACTER_LENGTH(S) ELSE T - F - 1 END) FOLDER
FROM CTE WHERE F > 0;
It produces
|PARENT   |FOLDER|
|---------|------|
|/        |a     |
|/a/      |b     |
|/a/b/    |c     |
|/a/b/c/  |d     |
|/a/b/c/d/|e     |
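The scan that the recursive query performs (remember the current slash, locate the next one, cut out what lies between) can be sketched outside the database as well. This is a plain Python illustration, not H2 code; the helper name split_path is hypothetical, and it assumes an absolute path without a trailing slash.

```python
def split_path(path):
    """Return (parent, folder) pairs for every level of an absolute path."""
    pairs = []
    start = 0  # index of the slash that opens the current folder name
    while True:
        end = path.find("/", start + 1)  # next slash, or -1 at the last folder
        folder = path[start + 1:] if end == -1 else path[start + 1:end]
        pairs.append((path[:start + 1], folder))
        if end == -1:
            return pairs
        start = end


for parent, folder in split_path("/a/b/c/d/e"):
    print(parent, folder)
```

Here start plays the role of column F and end the role of column T in the CTE, with -1 instead of 0 marking "no further slash".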
Do something like this:
with recursive
p(p) as (select '/a/b/c/d/e' as p),
t(path, parent, folder, i) as (
select
p,
REGEXP_REPLACE(p, '(.*)/\w+', '$1'),
REGEXP_REPLACE(p, '.*/(\w+)', '$1'),
1
from p
union
select
t.parent,
REGEXP_REPLACE(t.parent, '(.*)/\w+', '$1'),
REGEXP_REPLACE(t.parent, '.*/(\w+)', '$1'),
t.i + 1
from t
where t.parent != ''
)
select *
from t;
resulting in
|PATH |PARENT |FOLDER|I |
|----------|--------|------|---|
|/a/b/c/d/e|/a/b/c/d|e |1 |
|/a/b/c/d |/a/b/c |d |2 |
|/a/b/c |/a/b |c |3 |
|/a/b |/a |b |4 |
|/a | |a |5 |
Not sure if you're really interested in trailing / characters, but you can easily fix the query according to your needs.

Looking to find duplicates using DIFFERENCE() among 2+ columns

I'm trying to write a SQL Select query that uses the DIFFERENCE() function to find similar names in a database to identify duplicates.
The short version of the code I'm using is:
SELECT *
FROM (SELECT *, DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY SOUNDEX(FirstName))) AS d
FROM Clients) AS t
WHERE d >= 3
The problem is my database has additional columns that include middle names and nicknames. So if I have a customer who has multiple names they go by, they might be in the database multiple times, and I need to compare a variety of columns against each other.
Sample Data:
+----+--------+--------+--------+--------+
|ID |First |Middle |AKA1 |AKA2 |
+----+--------+--------+--------+--------+
|1 |Sally |Ann |NULL |NULL |
|2 |Ann |NULL |NULL |NULL |
|3 |Sue |NULL |NULL |NULL |
|4 |Suzy |NULL |NULL |NULL |
|5 |Patricia|NULL |Trish |Patty |
|6 |Patty |NULL |Patricia|Trish |
|7 |Trish |NULL |Patty |Patricia|
+----+--------+--------+--------+--------+
In the above, rows 1+2 are duplicates of each other, as are 3+4, and 5+6+7.
So I'm not sure the best way to get what I want. Here's the longer version of the code I'm actually using:
WITH A AS (SELECT *,
SOUNDEX(FirstName) AS "FirstSoundex",
SOUNDEX(LastName) AS "LastSoundex",
LAG (SOUNDEX(FirstName)) OVER (ORDER BY SOUNDEX(FirstName)) AS "PreviousFirstSoundex",
LAG (SOUNDEX(LastName)) OVER (ORDER BY SOUNDEX(LastName)) AS "PreviousLastSoundex"
FROM Clients),
B AS (
SELECT *,
ISNULL(DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY FirstSoundex)),0) AS "FirstScore",
ISNULL(DIFFERENCE(LastName, LEAD(LastName) OVER (ORDER BY LastSoundex)),0) AS "LastScore"
FROM A),
C AS (
SELECT *,
ISNULL(LAG (FirstScore) OVER (ORDER BY FirstSoundex),0) AS "PreviousFirstScore",
ISNULL(LAG (LastScore) OVER (ORDER BY LastSoundex),0) AS "PreviousLastScore"
FROM B
),
D AS (
SELECT *,
(CASE WHEN (PreviousFirstScore >=3 AND PreviousLastScore >=3) THEN (PreviousFirstSoundex + PreviousLastSoundex)
WHEN (FirstScore >= 3 AND LastScore >=3) THEN (FirstSoundex + LastSoundex)
END) AS "GroupName"
FROM C
WHERE ((PreviousFirstScore >=3 AND PreviousLastScore >=3) OR (FirstScore >= 3 AND LastScore >=3))
)
SELECT *,
LAG(GroupName) OVER (ORDER BY GroupName) AS "PreviousGroup",
LEAD(GroupName) OVER (ORDER BY GroupName) AS "NextGroup"
FROM D
WHERE (D.GroupName = D.PreviousGroup OR D.GroupName = D.NextGroup)
This lets me group together bundles of potential duplicates and it works well for me. However, I now want to add in a way to check against multiple columns, and I don't know how to do that.
I was thinking about creating a union, something like:
SELECT ClientID,
LastName,
FirstName AS "TempName"
FROM Clients
UNION
SELECT ClientID,
LastName,
MiddleName AS "TempName"
FROM Clients
WHERE MiddleName IS NOT NULL
...etc
But then my LAG() and LEAD() wouldn't work because I'd have multiple rows with the same ClientID. I don't want to identify a single Client as a duplicate of itself.
Anyways, any suggestions? Thanks in advance.

Iterating over groups in table

I have the following data:
cte1
===========================
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
I have to perform the following operation:
1) Extract p_id by the following algorithm:
1. Take all rows with the same m_ids as one group.
2. In each group:
2.I. Group records by p_id
2.II. Order records by level, descending
2.III. Select the p_id whose count equals the length of m_ids and that has the highest level
So far I have failed to write this algorithm completely, but I wrote the following for the last part of it (the array_length part is probably wrong):
SELECT id
FROM grouped_cte1
GROUP BY id,
level
HAVING Count(*) = array_length(grouped_cte1.m_ids, 1)
ORDER BY level DESC
LIMIT 1
where grouped_cte1 for m_ids={123} is
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
and for m_ids={432,222} is
m_ids |p_id |level
---------|-----------|-----
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
etc.
2) Combine the query from step 1 with the following one, which extracts the p_id with level=1 for each m_ids:
select m_ids, p_id from cte1 where level=1 --also selecting m_ids for joining later
which results in the following:
m_ids |p_id
---------|----
{123} |98
{432,222}|215
Desirable result:
m_ids |result_1 |result_2
---------|-----------|--------
{123} |111 |98
{432,222}|240 |215
So could anyone please help me solve the first part of algorithm and (optionally) combine it in a single query with the second part?
EDIT: So far I fail at:
1. Breaking the presented table into subtables by m_ids while iterating over it.
2. Performing computation of array_length(grouped_cte1.m_ids, 1) for corresponding rows in query.
For the first part of the query you're on the right track, but you need to change the grouping logic: group by m_ids and p_id, then join the result back to the table and keep only the row with the highest level per m_ids. A DISTINCT ON clause combined with the proper sort order does that:
select
distinct on (t.m_ids)
t.m_ids, t.p_id, t.level
from cte1 t
join (
select
m_ids,
p_id
from cte1
group by m_ids, p_id
having count(*) = array_length(m_ids, 1)
) as g using (m_ids, p_id)
order by t.m_ids, t.level DESC;
This would give you:
m_ids | p_id | level
-----------+------+-------
{123} | 111 | 2
{432,222} | 240 | 2
Then combine it with the second query. I used a FULL JOIN for display purposes (so that an m_ids still shows up when the first query finds no matching p_id), and I added DISTINCT to the second query because there can be (and in fact is) more than one first-level record for an (m_ids, p_id) pair:
select
coalesce(r1.m_ids, r2.m_ids) as m_ids,
r1.p_id AS result_1,
r2.p_id AS result_2
from (
select
distinct on (t.m_ids)
t.m_ids, t.p_id, t.level
from cte1 t
join (
select
m_ids,
p_id
from cte1
group by m_ids, p_id
having count(*) = array_length(m_ids, 1)
) as g using (m_ids, p_id)
order by t.m_ids, t.level DESC
) r1
full join (
select distinct m_ids, p_id
from cte1
where level = 1
) r2 on r1.m_ids = r2.m_ids
giving you result:
m_ids | result_1 | result_2
-----------+----------+----------
{123} | 111 | 98
{432,222} | 240 | 215
that looks different from what you've expected but from my understanding of the logic it is the correct one. If I misunderstood anything, please let me know.
To explain the logic with one example:
Why does m_ids = {123} return 111 for result_1?
For the group m_ids = {123} there are two distinct p_id values.
Both 98 and 111 satisfy the condition that their row count equals the length of m_ids.
p_id = 111 has the higher level, so it is chosen for result_1.
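The selection rule can also be walked through in plain Python to check it against the sample rows. This is a sketch, not PostgreSQL code: tuples stand in for the m_ids arrays, and the function name pick_p_ids is hypothetical.

```python
from collections import Counter, defaultdict

# Rows of cte1 from the question: (m_ids, p_id, level).
rows = [
    ((123,), 98, 1), ((123,), 111, 2),
    ((432, 222), 215, 1), ((432, 222), 215, 1),
    ((432, 222), 240, 2), ((432, 222), 240, 2),
    ((432, 222), 437, 3), ((432, 222), 275, 3),
]


def pick_p_ids(rows):
    """For each m_ids, pick the highest-level p_id that occurs len(m_ids) times."""
    counts = defaultdict(Counter)  # m_ids -> {p_id: number of rows}
    top_level = {}                 # (m_ids, p_id) -> highest level seen
    for m_ids, p_id, level in rows:
        counts[m_ids][p_id] += 1
        key = (m_ids, p_id)
        top_level[key] = max(level, top_level.get(key, level))
    result = {}
    for m_ids, counter in counts.items():
        # Same filter as HAVING count(*) = array_length(m_ids, 1).
        candidates = [p for p, n in counter.items() if n == len(m_ids)]
        if candidates:
            # Highest level wins; on a tie the first candidate seen is kept.
            result[m_ids] = max(candidates, key=lambda p: top_level[(m_ids, p)])
    return result


print(pick_p_ids(rows))
```

For {432,222} the p_id values 437 and 275 have the highest level but occur only once each, so they are filtered out before the level comparison and 240 is chosen.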

Aggregate multiple select statements without replicating data

How do I aggregate 2 select clauses without replicating data?
For instance, suppose I have tab_a that contains the data from 1 to 10:
|id|
|1 |
|2 |
|3 |
|. |
|. |
|10|
Then I want to generate the combinations of tab_b and tab_c, making sure that the result has 10 rows, and add the column of tab_a to each result tuple.
Script:
SELECT tab_b.id, tab_c.id, tab_a.id
from tab_b, tab_c, tab_a;
However, this replicates the data from tab_a for each combination of tab_b and tab_c. I only want one row of tab_a added to each combination of tab_b x tab_c.
Example of data from tab_b
|id|
|1 |
|2 |
Example of data from tab_c
|id|
|1 |
|2 |
|3 |
|4 |
|5 |
I would like to get this output:
|tab_b.id|tab_c.id|tab_a.id|
|1 |1 |1 |
|2 |1 |2 |
|1 |2 |3 |
|... |... |... |
|2 |5 |10 |
Your question includes an unstated, invalid assumption: that the position of the values in the table (the row number) is meaningful in SQL. It's not. In SQL, rows have no order. All joins -- everything, in fact -- are based on values. To join tables, you have to supply the values the DBMS should use to determine which rows go together.
You got a hint of that with your attempted join: from tab_b, tab_c, tab_a. You didn't supply any basis for joining the rows, which in SQL means there's no restriction: all rows are "the same" for the purpose of this join. They all match, and voila, you get them all!
To do what you want, redesign your tables with at least one more column: the key that serves to identify the value. It could be a number; for example, your source data might be an array. More commonly each value has a name of some kind.
Once you have tables with keys, I think you'll find the join easier to write and understand.
Perhaps you're new to SQL, but this is generally not the way things are done with RDBMSs. Anyway, if this is what you need, PostgreSQL can deal with it nicely, using different strategies:
Window Functions:
with
tab_a (id) as (select generate_series(1,10)),
tab_b (id) as (select generate_series(1,2)),
tab_c (id) as (select generate_series(1,5))
select tab_b_id, tab_c_id, tab_a.id
from (select *, row_number() over (order by id) from tab_a) as tab_a
left join (
select tab_b.id as tab_b_id, tab_c.id as tab_c_id,
-- number the combinations explicitly; a bare over () guarantees no particular order
row_number() over (order by tab_c.id, tab_b.id)
from tab_b, tab_c
) tabs_b_c ON (tabs_b_c.row_number = tab_a.row_number)
order by tab_a.id;
Arrays:
with
tab_a (id) as (select generate_series(1,10)),
tab_b (id) as (select generate_series(1,2)),
tab_c (id) as (select generate_series(1,5))
select bc[s][1], bc[s][2], a[s]
from (
select array(
select id
from tab_a
order by 1
) a,
array(
select array[tab_b.id, tab_c.id]
from tab_b, tab_c
order by tab_c.id, tab_b.id
) bc
) arr
join lateral generate_subscripts(arr.a, 1) s on true
If I understand your question correctly, maybe this is what you are looking for:
SELECT bctable.b_id, bctable.c_id, atable.a_id
FROM (SELECT a_id, ROW_NUMBER () OVER () AS arnum FROM a) atable
JOIN (SELECT p.b_id, p.c_id, ROW_NUMBER () OVER () AS bcrnum
FROM ( SELECT b.b_id, c.c_id
FROM b CROSS JOIN c
ORDER BY c.c_id, b.b_id) p) bctable
ON atable.arnum = bctable.bcrnum
Please check the SQLFiddle .
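The row-number pairing can be tried locally with Python's sqlite3 module. This is a hedged sketch: it uses the tab_a, tab_b and tab_c tables from the question rather than the a, b and c tables of the query above, it assumes SQLite 3.25 or later (the first version with window functions), and it puts an explicit ORDER BY inside each OVER clause so that the numbering does not depend on the physical row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tab_a holds 1..10, tab_b holds 1..2, tab_c holds 1..5, as in the question.
for name, size in (("tab_a", 10), ("tab_b", 2), ("tab_c", 5)):
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(1, size + 1)])

# Number the rows of tab_a and the tab_b x tab_c combinations, then join on that number.
query = """
SELECT bc.b_id, bc.c_id, a.id
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM tab_a) AS a
JOIN (
    SELECT tab_b.id AS b_id, tab_c.id AS c_id,
           ROW_NUMBER() OVER (ORDER BY tab_c.id, tab_b.id) AS rn
    FROM tab_b CROSS JOIN tab_c
) AS bc ON bc.rn = a.rn
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The generated row number is the key that the tables lack on their own: it states which tab_a row belongs to which combination instead of leaving that to chance.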