Get missing nodes in hierarchy column - sql

I have a table with hierarchy column which consists of numbers separated by colons, as well as number of current node and its parent:
id = '3:234:657:978'
currNode = 978
parent = 657
I also have a query which returns id's and some other columns from other tables, but some of the links are missing, for example 2 rows are returned, one with id of 3:234 and another one of id 3:234:567:890. I need the row with 3:234:567 id to form a hierarchy, but it`s not returned.
How can I join the table so I get the missing nodes (with fields other than id being NULL), but only the missing ones (excluding the ones which are not needed to form the hierarchy, e.g. are below the tree of the returned results)?
EDIT:
Sample data:
CREATE TABLE ids (
id VARCHAR(100)
, currNode INT PRIMARY KEY
, parent INT
, name VARCHAR(50)
);
CREATE TABLE someotherdata (
data VARCHAR(10)
, currnode INT
);
INSERT ALL
INTO ids(id, currnode, parent, name)
VALUES('3', 3, NULL, 'Node1')
INTO ids(id, currnode, parent, name)
VALUES('3:4', 4, 3, 'Node2')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5', 5, 4, 'Node3')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5:6', 6, 5, 'Node4')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5:6:7', 7, 6, 'Node5')
SELECT * FROM dual; COMMIT;
INSERT ALL
INTO someotherdata (name, id)
VALUES('data1', '3:4')
INTO someotherdata (name, id)
VALUES('data2', '3:4:5:6')
SELECT * FROM dual; COMMIT;
Desired result (id is given as parameter to the query, here it equals to '3'):
id name data
3 Node1 NULL
3:4 Node2 data1
3:4:5 Node3 NULL
3:4:5:6 Node4 data2
(3:4:5:6:7 is excluded from the result since it is not needed to form hierarchy with records that return data)

This is not so nice. but seems to work:
SELECT it.id, it.name, ost.data
FROM
(SELECT DISTINCT t.id, t.name
FROM ids t JOIN someotherdata st
ON instr(':'||st.currnode||':', ':'||t.currnode||':') >0) it LEFT JOIN someotherdata ost
ON it.id = ost.currnode
Edit ok, this is nicer:
select distinct t.id, t.name, st.data
from ids t left outer join someotherdata st on t.id = st.currnode
start with t.id in (select ist.currnode from someotherdata ist)
connect by prior t.parent = t.currnode
order by t.id
Here is a sqlfiddle demo

Related

Adding a LEFT JOIN on a INSERT INTO....RETURNING

My query Inserts a value and returns the new row inserted
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING comment_id, date_posted, e_id, created_by, parent_id, body, num_likes, thread_id
I want to join the created_by with the user_id from my user's table.
SELECT * from users WHERE user_id = created_by
Is it possible to join that new returning row with another table row?
Consider using a WITH structure to pass the data from the insert to a query that can then be joined.
Example:
-- Setup some initial tables
create table colors (
id SERIAL primary key,
color VARCHAR UNIQUE
);
create table animals (
id SERIAL primary key,
a_id INTEGER references colors(id),
animal VARCHAR UNIQUE
);
-- provide some initial data in colors
insert into colors (color) values ('red'), ('green'), ('blue');
-- Store returned data in inserted_animal for use in next query
with inserted_animal as (
-- Insert a new record into animals
insert into animals (a_id, animal) values (3, 'fish') returning *
) select * from inserted_animal
left join colors on inserted_animal.a_id = colors.id;
-- Output
-- id | a_id | animal | id | color
-- 1 | 3 | fish | 3 | blue
Explanation:
A WITH query allows a record returned from an initial query, including data returned from a RETURNING clause, which is stored in a temporary table that can be accessed in the expression that follows it to continue work on it, including using a JOIN expression.
You were right, I misunderstood
This should do it:
DECLARE mycreated_by event_comments.created_by%TYPE;
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING created_by into mycreated_by
SELECT * from users WHERE user_id = mycreated_by

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write a sql to get an output attached with this question along with sample data.
There are two table, one with distinct ID's (pk) with their current flag.
another with Active ID (fk to the pk from the first table) and Inactive ID (fk to the pk from the first table)
Final output should return two columns, first column consist of all distinct ID's from the first table and second column should contain Active ID from the 2nd table.
Below is the sql:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This sql partially works. It doesn't work, where the id was once active then gets inactive.
Please note:
the active ID should return the last most active ID
the ID which doesn't have any active ID should either be null or the ID itself
ID where the current = 0, in those cases active ID should be the ID current in tb_ID
ID's may get interchanged. For example there are two ID's 6 and 7, when 6 is active 7 is inactive and vice versa. the only way to know the most current active state is by the update date
Attached sample might be easy to understand
Looks like I might have to use recursive cte for achieiving the results. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
declare #tb_id table (id bigint, [current] bit);
declare #tb_merges table (active_id bigint, inactive_id bigint, update_dt datetime2);
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6

SQL Tree / Hierarchial Data

This is my first post, I am trying to make a sql tree table that traverses. For example, If a person clicks on a drop down list called Categories, it will display Electric, and InterC. Then, if the user clicks on electric, it will drop down relays and switches, next if the person clicks on relays it will drop down X relays and if the person clicks on switches it will drop down Y switches. I have attempted below , but the part i don't understand is if i have another category InterC, how do I make that another level of drop downs ?
Table Category
insert test select 1, 0,'Electric'
insert test select 2, 1,'Relays'
insert test select 3, 1,'Switches'
insert test select 5, 2,'X Relays'
insert test select 6, 2,'Y Switches'
insert test select 7, 0,'InterC'
insert test select 8, 1,'x Sockets'
insert test select 9, 1,'y Sockets'
insert test select 10, 2,'X Relays'
insert test select 11, 2,'Y Relays'
;
create table test(id int,parentId int,name varchar(50))
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test WHERE parentid = 0
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Your hierarchical T-SQL query should return all the records in the table, both those under Electric and InterC.
However, you should make parentId nullable and have the root records have a null rather than 0. That will let you add a foreign key that protects your data integrity (it won't be possible to add orphaned records by mistake).
You hierarchy query returns all of your records, I'm guessing that you want to return just one at a time - for that add a where condition to the starting query.
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test
WHERE name = #category AND
parentId is null
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Then set #category to 'Electric' or'InterC' to get one or the other hierarchy.

To find infinite recursive loop in CTE

I'm not a SQL expert, but if anybody can help me.
I use a recursive CTE to get the values as below.
Child1 --> Parent 1
Parent1 --> Parent 2
Parent2 --> NULL
If data population has gone wrong, then I'll have something like below, because of which CTE may go to infinite recursive loop and gives max recursive error. Since the data is huge, I cannot check this bad data manually. Please let me know if there is a way to find it out.
Child1 --> Parent 1
Parent1 --> Child1
or
Child1 --> Parent 1
Parent1 --> Parent2
Parent2 --> Child1
With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
Recursive query:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
and c.root_id = p.root_id
)
select *
from tree;
Update for Postgres 14
Postgres 14 introduced the (standard compliant) CYCLE option to detect cycles:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
Something like this will show if/where there are loops in the recursive cte. Look at the column Loop. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.
In the end the recursive cte creates a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks if the current ID is already in that "list". If yes, it sets the Loop column to 1. This column is checked in the recursive join (the ABD R.Loop = 0).
The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.
A little more complex, that generates a "better" report:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.
Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where nodes can only have one parent which can be enforced with a unique constraint on the child column (id in the table below). This works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0 then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):
-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (
with cte(ancestor, child, lev, cycle) as (
select id, id, 0, 0 from Table1
union all
select cte.ancestor
, Table1.id
, case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
, case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
from Table1
join cte
on cte.child = Table1.PARENT_ID
where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
ancestor
, child
, lev
, max(cycle) over (partition by ancestor) cycle
from cte
Given the following adjacency list for Table1:
| parent_id | id |
|-----------|----|
| (null) | 1 |
| (null) | 2 |
| 1 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
| 6 | 7 |
| 7 | 8 |
| 9 | 10 |
| 10 | 11 |
| 11 | 9 |
The above query which works on SQL Sever (and Oracle, PostgreSQL and MySQL 8 when modified as directed) rightly detects that nodes 9, 10, and 11 participate in a cycle of length 3.
SQL(/DB) Fiddles demonstrating this in various DBs can be found below:
Oracle 11gR2
SQL Server 2017
PostgeSQL 9.5
MySQL 8
You can use the same approach described by Knuth for detecting a cycle in a linked list here. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.
For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.
If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.
SQL Server sample:
declare #LinkTable table (Parent int, Child int);
insert into #LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
with cte as (
select lt1.Parent, lt1.Child, lt2.Child as Grandchild
from #LinkTable lt1
inner join #LinkTable lt2 on lt2.Parent = lt1.Child
union all
select cte.Parent, lt1.Child, lt3.Child as Grandchild
from cte
inner join #LinkTable lt1 on lt1.Parent = cte.Child
inner join #LinkTable lt2 on lt2.Parent = cte.Grandchild
inner join #LinkTable lt3 on lt3.Parent = lt2.Child
where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;
Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.
Try to limit the recursive result
WITH EMP_CTE AS
(
SELECT
0 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT
[LEVEL] + 1 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees e
INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId
AND s.LEVEL < 100 --RECURSION LIMIT
)
SELECT * FROM EMP_CTE WHERE [Level] = 100
Here is the solution for SQL Server:
Table Insert script:
CREATE TABLE MyTable
(
[ID] INT,
[ParentID] INT,
[Name] NVARCHAR(255)
);
INSERT INTO MyTable
(
[ID],
[ParentID],
[Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');
Script to find the exact records which are the culprit:
;WITH RecursiveCTE
AS (
-- Get all parents:
-- Any record in MyTable table could be an Parent
-- We don't know here yet which record can involve in an infinite recursion.
SELECT ParentID AS StartID,
ID,
CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM MyTable
UNION ALL
-- Recursively try finding all the childrens of above parents
-- Keep on finding it until this child become parent of above parent.
-- This will bring us back in the circle to parent record which is being
-- keep in the StartID column in recursion
SELECT RecursiveCTE.StartID,
t.ID,
CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM RecursiveCTE
INNER JOIN MyTable AS t
ON t.ParentID = RecursiveCTE.ID
WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
-- FInd the ones which causes the infinite recursion
SELECT StartID,
[ParentChildRelationPath],
RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);
Output of above query:

postgres - with recursive

I expected the following to return all the tuples, resolving each parent in the hierarchy up to the top, but it only returns the lowest levels (whose ID is specified in the query). How do I return the whole tree for a given level_id?
create table level(
level_id int,
level_name text,
parent_level int);
insert into level values (197,'child',177), ( 177, 'parent', 3 ), ( 2, 'grandparent', null );
WITH RECURSIVE recursetree(level_id, levelparent) AS (
SELECT level_id, parent_level
FROM level
where level_id = 197
UNION ALL
SELECT t.level_id, t.parent_level
FROM level t, recursetree rt
WHERE rt.level_id = t.parent_level
)
SELECT * FROM recursetree;
First of all, your (2, 'grandparent', null) should be (3, 'grandparent', null) if it really is a grandparent. Secondly, your (implicit) join condition in the recursive half of your query is backwards, you want to get the parent out of rt.levelparent rather than t.parent_level:
WITH RECURSIVE recursetree(level_id, levelparent) AS (
SELECT level_id, parent_level
FROM level
WHERE level_id = 197
UNION ALL
SELECT t.level_id, t.parent_level
FROM level t JOIN recursetree rt ON rt.levelparent = t.level_id
-- join condition fixed and ANSI-ified above
)
SELECT * FROM recursetree;