Creating cte statement to traverse a tree created with a bridge table - sql

I will preface this by saying that I am completely new to cte statements.
I have 2 tables: an items table and a linker table. The linker table contains the parent id and child id of each relationship.
I have seen lots of examples of how to traverse a tree with what I believe is called an adjacency list, where the parent id is stored on the child record instead of in a bridge table. But I cannot seem to extrapolate this to my scenario.
I believe I need to have a bridge table like this because any single item could have multiple parents and each parent will most likely have multiple children.
I hacked together my own way of traversing the tree but it was within another language, xojo, and I really did not like how it turned out. Essentially I was using recursive functions to dig down into each tree and queried the database each time I needed a child.
Now I am trying to create a cte statement that does the same thing. Keeping the descendants ordered below the parents is not a huge deal.
So I have created a sample database and other materials to describe my issue:
This diagram shows visually what the relationships are:
(diagram not included)
This is what I would like returned from the database (some items show up in multiple places):
1 : Audio
    2 : Speaker
    3 : Microphone
    4 : Mic Pack
        3 : Microphone
        5 : Di
            6 : Passive Di
                11 : Rapco Di
                13 : Dbx Di
7 : Lighting
    9 : Safety
        12 : Small Safety
8 : Rigging
    10 : Lighting Rigging
        9 : Safety
            12 : Small Safety
An example table:
CREATE TABLE items ( id INTEGER, name TEXT, PRIMARY KEY(id) );
CREATE TABLE `linker` ( `parent` INTEGER, `child` INTEGER, PRIMARY KEY(`parent`,`child`) );
Insert Into items(id, name) Values(1, 'Audio');
Insert Into items(id, name) Values(2, 'Speaker');
Insert Into items(id, name) Values(3, 'Microphone');
Insert Into items(id, name) Values(4, 'Mic Pack');
Insert Into items(id, name) Values(5, 'Di');
Insert Into items(id, name) Values(6, 'Passive Di');
Insert Into items(id, name) Values(7, 'Lighting');
Insert Into items(id, name) Values(8, 'Rigging');
Insert Into items(id, name) Values(9, 'Safety');
Insert Into items(id, name) Values(10, 'Lighting Rigging');
Insert Into items(id, name) Values(11, 'Rapco Di');
Insert Into items(id, name) Values(12, 'Small Safety');
Insert Into items(id, name) Values(13, 'Dbx Di');
Insert Into linker(parent, child) Values(1, 2);
Insert Into linker(parent, child) Values(1, 4);
Insert Into linker(parent, child) Values(1, 3);
Insert Into linker(parent, child) Values(4, 3);
Insert Into linker(parent, child) Values(4, 5);
Insert Into linker(parent, child) Values(5, 6);
Insert Into linker(parent, child) Values(6, 11);
Insert Into linker(parent, child) Values(6, 13);
Insert Into linker(parent, child) Values(7, 9);
Insert Into linker(parent, child) Values(9, 12);
Insert Into linker(parent, child) Values(8, 10);
Insert Into linker(parent, child) Values(10, 9);
This is the cte that I came up with that I believe came the closest, but it's probably still pretty far off:
with cte As
(
Select
id,
name,
0 as level,
Cast(name as varchar(255)) as sort
From items i
Left outer Join
linker li
On i.id = li.child
And li.parent is Null
Union All
Select
id,
name,
cte.level + 1,
Cast(cte.sort + '.' + i.name As Varchar(255)) as sort
From cte
Left Outer Join linker li
on li.child = cte.id
Inner Join items i
On li.parent = i.id
)
Select
id,
name,
level,
sort
From cte
Order By Sort;
Thanks for any help in advance. I am very open to the idea that everything I am doing from the data structure up is wrong, so keep that in mind when you are answering.
Edit: It is probably worth noting that the results don't need to be in order. I plan on creating an ancestry path field in the cte statement, and using that path to populate my tree.
Edit: oops, I copied and pasted the wrong bit of cte code. I am on mobile so I did my best to change it to what I was doing on my desktop. Once I have a chance I will double check the cte statement against my notes.

Alright, seems like my brain just needed some rest. I have figured out a solution to my issue with much less frustration than yesterday.
My cte statement was:
with cte As (
    -- Anchor: root items, i.e. items that never appear as a child in linker
    Select
        i.id,
        i.name,
        li.parent,
        li.child,
        Cast(i.name as Varchar(255)) as ancestory
    From items i
    Left Outer Join linker li
        On i.id = li.child
    Where li.parent is null
    Union All
    -- Recursive step: attach every child of the rows found so far
    Select
        i.id,
        i.name,
        li.parent,
        li.child,
        Cast(cte.ancestory || '.' || i.name as Varchar(255)) as ancestory
    From cte
    Inner Join linker li
        On cte.id = li.parent
    Inner Join items i
        On li.child = i.id
)
select * from cte;
The results come back as:
id  name              parent  child  ancestory
1   Audio                            Audio
7   Lighting                         Lighting
8   Rigging                          Rigging
2   Speaker           1       2      Audio.Speaker
3   Microphone        1       3      Audio.Microphone
4   Mic Pack          1       4      Audio.Mic Pack
9   Safety            7       9      Lighting.Safety
10  Lighting Rigging  8       10     Rigging.Lighting Rigging
3   Microphone        4       3      Audio.Mic Pack.Microphone
5   Di                4       5      Audio.Mic Pack.Di
12  Small Safety      9       12     Lighting.Safety.Small Safety
9   Safety            10      9      Rigging.Lighting Rigging.Safety
6   Passive Di        5       6      Audio.Mic Pack.Di.Passive Di
12  Small Safety      9       12     Rigging.Lighting Rigging.Safety.Small Safety
11  Rapco Di          6       11     Audio.Mic Pack.Di.Passive Di.Rapco Di
13  Dbx Di            6       13     Audio.Mic Pack.Di.Passive Di.Dbx Di
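If you want the rows back in the nested order from the question, one option is to carry a path of ids instead of names and sort on that on the client side. This is only a minimal sketch, assuming SQLite driven from Python's built-in sqlite3 module; the id_path column, the NOT EXISTS anchor and the load_tree helper are my own names for illustration, not part of the original query.

```python
import sqlite3

# Sample data from the question: items plus the parent/child bridge table.
ITEMS = [(1, "Audio"), (2, "Speaker"), (3, "Microphone"), (4, "Mic Pack"),
         (5, "Di"), (6, "Passive Di"), (7, "Lighting"), (8, "Rigging"),
         (9, "Safety"), (10, "Lighting Rigging"), (11, "Rapco Di"),
         (12, "Small Safety"), (13, "Dbx Di")]
LINKS = [(1, 2), (1, 4), (1, 3), (4, 3), (4, 5), (5, 6), (6, 11), (6, 13),
         (7, 9), (9, 12), (8, 10), (10, 9)]

# id_path is an addition for illustration: a '/'-separated list of ids from
# the root down to the row, which is easy to split and sort in Python.
TREE_SQL = """
WITH RECURSIVE cte(id, name, id_path) AS (
    -- Anchor: items that never appear as a child
    SELECT i.id, i.name, CAST(i.id AS TEXT)
    FROM items i
    WHERE NOT EXISTS (SELECT 1 FROM linker li WHERE li.child = i.id)
    UNION ALL
    -- Recursive step: every child of a row already in the result
    SELECT i.id, i.name, cte.id_path || '/' || i.id
    FROM cte
    JOIN linker li ON li.parent = cte.id
    JOIN items i ON i.id = li.child
)
SELECT id, name, id_path FROM cte
"""

def load_tree():
    """Run the CTE and return rows sorted so descendants follow parents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE linker (parent INTEGER, child INTEGER, "
                 "PRIMARY KEY (parent, child))")
    conn.executemany("INSERT INTO items VALUES (?, ?)", ITEMS)
    conn.executemany("INSERT INTO linker VALUES (?, ?)", LINKS)
    rows = conn.execute(TREE_SQL).fetchall()
    conn.close()
    # Compare paths as lists of ints, not as strings, so 10 sorts after 9.
    return sorted(rows, key=lambda r: [int(p) for p in r[2].split("/")])

tree = load_tree()
for item_id, name, id_path in tree:
    indent = "    " * id_path.count("/")
    print(f"{indent}{item_id} : {name}")
```

Sorting on the id path rather than the name path avoids surprises when a name contains the separator character or sorts oddly next to it. Siblings come out in id order here; if you need another sibling order you would have to put a different key into the path.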

Related

oracle correlated subquery using distinct listagg

I have an interesting query I'm trying to figure out. I have a view which is getting a column added to it. This column is pivoted data coming from other tables, to form into a single row. Now, I need to wipe out duplicate entries in this pivoted data. Listagg is great for getting the data to a single row, but I need to make it unique. While I know how to make it unique, I'm tripping up on the fact that correlated sub-queries only go 1 level deep. So... not really sure how to get a distinct list of values. I can get it to work if I don't do the distinct just fine. Anyone out there able to work some SQL magic?
Sample data:
drop table test;
drop table test_widget;
create table test (id number, description Varchar2(20));
create table test_widget (widget_id number, test_fk number, widget_type varchar2(20));
insert into test values(1, 'cog');
insert into test values(2, 'wheel');
insert into test values(3, 'spring');
insert into test_widget values(1, 1, 'A');
insert into test_widget values(2, 1, 'A');
insert into test_widget values(3, 1, 'B');
insert into test_widget values(4, 1, 'A');
insert into test_widget values(5, 2, 'C');
insert into test_widget values(6, 2, 'C');
insert into test_widget values(7, 2, 'B');
insert into test_widget values(8, 3, 'A');
insert into test_widget values(9, 3, 'C');
insert into test_widget values(10, 3, 'B');
insert into test_widget values(11, 3, 'B');
insert into test_widget values(12, 3, 'A');
commit;
Here is an example of the query that works, but shows duplicate data:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM TEST_WIDGET
WHERE TEST_FK = A.ID) widget_types
FROM TEST A
Here is an example of what does NOT work due to the depth of where I try to reference the ID:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM (SELECT DISTINCT WIDGET_TYPE
FROM TEST_WIDGET
WHERE TEST_FK = A.ID))
WIDGET_TYPES
FROM TEST A
Here is what I want displayed:
1 cog A, B
2 wheel B, C
3 spring A, B, C
If anyone knows off the top of their head, that would be fantastic! Otherwise, I can post up some sample create statements to help you with dummy data to figure out the query.
You can apply the distinct in a subquery, which also has the join - avoiding the level issue:
SELECT ID
, DESCRIPTION
, LISTAGG (WIDGET_TYPE, ', ')
WITHIN GROUP (ORDER BY WIDGET_TYPE) AS widget_types
FROM (
SELECT DISTINCT A.ID, A.DESCRIPTION, B.WIDGET_TYPE
FROM TEST A
JOIN TEST_WIDGET B
ON B.TEST_FK = A.ID
)
GROUP BY ID, DESCRIPTION
ORDER BY ID;
ID DESCRIPTION WIDGET_TYPES
---------- -------------------- --------------------
1 cog A, B
2 wheel B, C
3 spring A, B, C
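The same distinct-first, aggregate-second shape can be tried without an Oracle instance. The sketch below is an assumption-laden stand-in, not the Oracle query itself: it uses SQLite through Python's sqlite3 module, with group_concat swapped in for LISTAGG. SQLite does not promise an order inside group_concat, so the pieces are sorted in Python afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER, description TEXT);
CREATE TABLE test_widget (widget_id INTEGER, test_fk INTEGER, widget_type TEXT);
INSERT INTO test VALUES (1, 'cog'), (2, 'wheel'), (3, 'spring');
INSERT INTO test_widget VALUES
    (1, 1, 'A'), (2, 1, 'A'), (3, 1, 'B'), (4, 1, 'A'),
    (5, 2, 'C'), (6, 2, 'C'), (7, 2, 'B'),
    (8, 3, 'A'), (9, 3, 'C'), (10, 3, 'B'), (11, 3, 'B'), (12, 3, 'A');
""")

# De-duplicate in the inner query (which also does the join), then
# aggregate one level up, so no correlated reference is needed.
rows = conn.execute("""
    SELECT id, description, group_concat(widget_type, ', ') AS widget_types
    FROM (
        SELECT DISTINCT a.id, a.description, b.widget_type
        FROM test a
        JOIN test_widget b ON b.test_fk = a.id
    )
    GROUP BY id, description
    ORDER BY id
""").fetchall()
conn.close()

# group_concat (unlike LISTAGG ... WITHIN GROUP) has no guaranteed order
# here, so sort the pieces on the Python side before using them.
widget_types = {desc: sorted(types.split(", ")) for _, desc, types in rows}
print(widget_types)
```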
I was in a unique situation using the Pentaho reports writer and some inconsistent data. The Pentaho writer uses Oracle to query data, but has limitations. The data pieces were unique but not classified in a consistent manner, so I created a nested listagg inside of a left join to present the data the way I wanted to:
left join
(
select staff_id, listagg(thisThing, ' --- '||chr(10) ) within group (order by this) as SCHED_1 from
(
SELECT
staff_id, RPT_STAFF_SHIFTS.ORGANIZATION||': '||listagg(
RPT_STAFF_SHIFTS.DAYS_OF_WEEK
, ',' ) within group (order by BEGIN_DATE desc)
as thisThing
FROM "RPT_STAFF_SHIFTS" where "RPT_STAFF_SHIFTS"."END_DATE" is null
group by staff_id, organization)
group by staff_id
) schedule_1 on schedule_1.staff_id = "RPT_STAFF"."STAFF_ID"
where "RPT_STAFF"."STAFF_ID" ='555555'
This is a different approach than using the nested query, but in some situations it might work better by taking into account the level issue when developing the query and taking an extra step to fully concatenate the results.

Flatten the tree path in SQL server Hierarchy ID

I am using SQL Hierarchy data type to model a taxonomy structure in my application.
The taxonomy can have the same name at different levels.
During setup this data needs to be uploaded via an Excel sheet.
Before inserting any node I would like to check whether a node at that particular path already exists so that I don't duplicate entries.
What is the easiest way to check whether a node at a particular absolute path already exists?
For example, before inserting "Retail" under "Bank 2", I should be able to check that "/Bank 2/Retail" does not already exist.
Is there any way to provide a flattened representation of the entire tree structure so that I can check for the absolute path and then proceed?
Yes, you can do it using a recursive CTE.
In each iteration of the query you can append a new level of the hierarchy name.
There are lots of examples of this technique on the internet.
For example, with this sample data:
CREATE TABLE Test
(id INT,
parent_id INT null,
NAME VARCHAR(50)
)
INSERT INTO Test VALUES(1, NULL, 'L1')
INSERT INTO Test VALUES(2, 1, 'L1-A')
INSERT INTO Test VALUES(3, 2, 'L1-A-1')
INSERT INTO Test VALUES(4, 2, 'L1-A-2')
INSERT INTO Test VALUES(5, 1, 'L1-B')
INSERT INTO Test VALUES(6, 5, 'L1-B-1')
INSERT INTO Test VALUES(7, 5, 'L1-B-2')
you can write a recursive CTE like this:
WITH H AS
(
-- Anchor: the first level of the hierarchy
SELECT id, parent_id, name, CAST(name AS NVARCHAR(300)) AS path
FROM Test
WHERE parent_id IS NULL
UNION ALL
-- Recursive: join the original table to the anchor, and combine data from both
SELECT T.id, T.parent_id, T.name, CAST(H.path + '\' + T.name AS NVARCHAR(300))
FROM Test T INNER JOIN H ON T.parent_id = H.id
)
-- You can query H as if it was a normal table or View
SELECT * FROM H
WHERE PATH = 'L1\L1-A' -- for example to see if this exists
The result of the query (without the where filter) looks like this:
id  parent_id  name    path
1   NULL       L1      L1
2   1          L1-A    L1\L1-A
5   1          L1-B    L1\L1-B
6   5          L1-B-1  L1\L1-B\L1-B-1
7   5          L1-B-2  L1\L1-B\L1-B-2
3   2          L1-A-1  L1\L1-A\L1-A-1
4   2          L1-A-2  L1\L1-A\L1-A-2
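For the "does this path exist before I insert" check itself, the flattened path can be wrapped in a small helper. This is a rough sketch under assumptions: it runs against SQLite via Python's sqlite3 rather than SQL Server, so string concatenation is || instead of +, and path_exists is a hypothetical helper name of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (id INT, parent_id INT NULL, name VARCHAR(50));
INSERT INTO Test VALUES
    (1, NULL, 'L1'), (2, 1, 'L1-A'), (3, 2, 'L1-A-1'), (4, 2, 'L1-A-2'),
    (5, 1, 'L1-B'), (6, 5, 'L1-B-1'), (7, 5, 'L1-B-2');
""")

# Same recursive shape as the CTE above: start at the roots and append
# one name per level, separated by a backslash.
PATH_EXISTS_SQL = r"""
WITH RECURSIVE H(id, path) AS (
    SELECT id, name FROM Test WHERE parent_id IS NULL
    UNION ALL
    SELECT T.id, H.path || '\' || T.name
    FROM Test T JOIN H ON T.parent_id = H.id
)
SELECT EXISTS (SELECT 1 FROM H WHERE path = ?)
"""

def path_exists(path):
    """Return True if a node with this absolute path is already stored."""
    (found,) = conn.execute(PATH_EXISTS_SQL, (path,)).fetchone()
    return bool(found)

print(path_exists("L1\\L1-A"))   # an existing node
print(path_exists("L1\\L1-C"))   # a node that has not been inserted yet
```

Passing the path as a bound parameter keeps names with quotes in them from breaking the query. Note that this rebuilds every path on each call, which is fine for a setup-time upload but would be worth revisiting for a large tree.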

Is the outer WHERE clause optimized in a Recursive CTE?

With the following table definition:
CREATE TABLE Nodes(id INTEGER, child INTEGER);
INSERT INTO Nodes(id, child) VALUES(1, 10);
INSERT INTO Nodes(id, child) VALUES(1, 11);
INSERT INTO Nodes(id, child) VALUES(1, 12);
INSERT INTO Nodes(id, child) VALUES(10, 100);
INSERT INTO Nodes(id, child) VALUES(10, 101);
INSERT INTO Nodes(id, child) VALUES(10, 102);
INSERT INTO Nodes(id, child) VALUES(2, 20);
INSERT INTO Nodes(id, child) VALUES(2, 21);
INSERT INTO Nodes(id, child) VALUES(2, 22);
INSERT INTO Nodes(id, child) VALUES(20, 200);
INSERT INTO Nodes(id, child) VALUES(20, 201);
INSERT INTO Nodes(id, child) VALUES(20, 202);
With the following query:
WITH RECURSIVE members(base, id, level) AS (
SELECT n1.id, n1.id, 0
FROM Nodes n1
LEFT OUTER JOIN Nodes n2 ON n2.child = n1.id
WHERE n2.id IS NULL
UNION
SELECT m.base, n.child, m.level + 1
FROM members m
INNER JOIN Nodes n ON m.id=n.id
)
SELECT m.id, m.level
FROM members m
WHERE m.base IN (1)
Is the outer WHERE clause optimized in a Recursive CTE? An alternate that I have considered using is:
WITH RECURSIVE members(id, level) AS (
VALUES (1, 0)
UNION
SELECT n.child, m.level + 1
FROM members m
INNER JOIN Nodes n ON m.id=n.id
)
SELECT m.id, m.level
FROM members m
but it has the problem of not being able to create a view out of it. Therefore, if the performance difference between the two is minimal, I'd prefer to create a view out of the recursive CTE and then just query that.
To be able to apply the WHERE clause to the queries inside the CTE, the database would be required to prove that
1. all values in the first column are unchanged by the recursion and go back to the base query, and, in general, that
2. it is not possible for any filtered-out row to have any children that could show up in the result of the query, or affect the CTE in any other way.
Such a prover does not exist.
See restriction 22 of Subquery flattening.
To see why your first query is non-optimal, try running both with UNION ALL instead of just UNION. With the sample data given, the first will return 21 rows while the second returns only 7.
The duplicate rows in the actual first query are subsequently eliminated by performing a sort and duplicate elimination, while this step is not necessary in the actual second query.
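The row counts are easy to check for yourself. Here is a small sketch using Python's built-in sqlite3 module that runs both queries from the question with UNION swapped for UNION ALL; the variable names are mine, everything else is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Nodes (id INTEGER, child INTEGER)")
conn.executemany(
    "INSERT INTO Nodes (id, child) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (10, 100), (10, 101), (10, 102),
     (2, 20), (2, 21), (2, 22), (20, 200), (20, 201), (20, 202)],
)

# First query from the question, with UNION replaced by UNION ALL.
FILTER_OUTSIDE = """
WITH RECURSIVE members(base, id, level) AS (
    SELECT n1.id, n1.id, 0
    FROM Nodes n1
    LEFT OUTER JOIN Nodes n2 ON n2.child = n1.id
    WHERE n2.id IS NULL
    UNION ALL
    SELECT m.base, n.child, m.level + 1
    FROM members m
    INNER JOIN Nodes n ON m.id = n.id
)
SELECT m.id, m.level FROM members m WHERE m.base IN (1)
"""

# Second query from the question, again with UNION ALL.
SEED_INSIDE = """
WITH RECURSIVE members(id, level) AS (
    VALUES (1, 0)
    UNION ALL
    SELECT n.child, m.level + 1
    FROM members m
    INNER JOIN Nodes n ON m.id = n.id
)
SELECT m.id, m.level FROM members m
"""

outside_rows = conn.execute(FILTER_OUTSIDE).fetchall()
inside_rows = conn.execute(SEED_INSIDE).fetchall()
conn.close()

print(len(outside_rows), len(inside_rows))
# Both hold the same distinct (id, level) pairs; only the duplicates differ.
print(set(outside_rows) == set(inside_rows))
```

The extra rows in the first query come from the anchor: the join against Nodes yields the root once per outgoing edge, and each copy then recurses on its own.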

Naming Category and SubCategory Tables

I'm trying to create a bunch of lookup tables in a database but am stuck when it comes to naming them. The tables are like this:
1. dbo.AccountType (this is the highest level category)
2. dbo.AccountSubType (this is a 2nd level category)
3. dbo.AccountSubSubType (this is a 3rd level category)
The above naming convention breaks easily. So perhaps this is better:
1. dbo.AccountType1 (highest level)
2. dbo.AccountType2 (second level)
3. dbo.AccountType3 (third level)
4. dbo.AccountType-N (and so on...)
I know naming conventions are opinion based, but surely there has to be some logical way to do this that is scalable and not confusing to developers.
Example of how the data looks in the dbo.AccountType2 table using the second solution:
AccountTypeID (FK) | AccountType1ID (FK) | AccountType2ID (PK) | AccountType2
=============================================================================
1                  | 4                   | 1                   | Credit Card
1                  | 5                   | 2                   | Savings
Is there any better way to store hierarchical data in a database and name the tables correctly?
This would probably be better represented as a single table with a hierarchical relationship:
E.g.
CREATE TABLE [dbo].[AccountType] (
    Id int NOT NULL
    ,ParentId int NULL
        CONSTRAINT [FK_AccountType_AccountType_Parent] REFERENCES [dbo].[AccountType] (Id)
    ,Name nvarchar(200) NOT NULL
    ,CONSTRAINT [PK_AccountType] PRIMARY KEY CLUSTERED ([Id])
)
Then populate it with data as follows:
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (1, NULL, 'Credit Card')
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (2, 1, 'Credit Card Sub-Type')
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (3, 2, 'Credit Card Sub-Sub-Type')
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (4, NULL, 'Savings')
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (5, 4, 'Savings Sub-Type')
INSERT INTO dbo.AccountType (Id, ParentId, Name) VALUES (6, 5, 'Savings Sub-Sub-Type')
Anything with a ParentId of NULL is a root value, otherwise it is a child of the specified parent...
Edit: To query you'd use a CTE. E.g.
WITH ParentAccountType ( Id, ParentId, Name, ParentName )
AS
(
SELECT Id, ParentId, Name, CAST('N/A' AS nvarchar(200)) AS ParentName
FROM AccountType
WHERE ParentId IS NULL
UNION ALL
SELECT c.Id, c.ParentId, c.Name, p.Name AS ParentName
FROM
AccountType c
INNER JOIN ParentAccountType p ON c.ParentId = p.Id
)
SELECT ParentName, Name
FROM ParentAccountType
GO
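If you still need to know whether a row is a "type", "sub-type" or "sub-sub-type", the level can be computed rather than baked into table names. A minimal sketch, assuming SQLite through Python's sqlite3 instead of SQL Server; the Depth column and the TypeLevel name are hypothetical additions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AccountType (
    Id INTEGER PRIMARY KEY,
    ParentId INTEGER NULL REFERENCES AccountType (Id),
    Name TEXT NOT NULL
);
INSERT INTO AccountType (Id, ParentId, Name) VALUES
    (1, NULL, 'Credit Card'),
    (2, 1, 'Credit Card Sub-Type'),
    (3, 2, 'Credit Card Sub-Sub-Type'),
    (4, NULL, 'Savings'),
    (5, 4, 'Savings Sub-Type'),
    (6, 5, 'Savings Sub-Sub-Type');
""")

# Depth is computed during the recursion: 1 for a root type, 2 for a
# sub-type, and so on for however many levels the data contains.
levels = conn.execute("""
    WITH RECURSIVE TypeLevel (Id, Name, Depth) AS (
        SELECT Id, Name, 1 FROM AccountType WHERE ParentId IS NULL
        UNION ALL
        SELECT c.Id, c.Name, p.Depth + 1
        FROM AccountType c
        JOIN TypeLevel p ON c.ParentId = p.Id
    )
    SELECT Name, Depth FROM TypeLevel ORDER BY Id
""").fetchall()
conn.close()

for name, depth in levels:
    print(depth, name)
```

Adding a fourth or fifth level is then just more rows, with no new table and no new name to argue about.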

Selecting leaf id + root name from a table in oracle

I have a table that is self referencing, with id, parentid (referencing id), name, ordering as columns.
What I want to do is to select the first leaf node of each root and have a pairing of the id of the leaf node with the name of the root node.
The data can have unbounded levels, and siblings have an order (assigned by the "ordering" column). "First leaf node" means the first child's first child's first child's (etc..) child.
The data looks something like this, siblings ordered by ordering:
A
--a
--b
----b.1
----b.2
----b.3
B
--c
----c.1
----c.2
--d
C
--e
----e.1
------e.1.1
I want to be able to produce a mapping as follows:
name of A, id of a
name of B, id of c.1
name of C, id of e.1.1
This is the sql I'm using to achieve this, but I'm not too sure if it will recurse correctly for unbounded levels:
select id,
connect_by_root name name
from table
where connect_by_isleaf = 1
and ((level = 2 and ordering = 1)
or (level > 2 and ordering = 1 and prior ordering = 1))
start with parentid is null
connect by prior id = parentid;
Is there any way I can rewrite the SQL to make it unbounded?
I would use a subquery:
SELECT root_name, MIN(leaf_name) first_leaf
  FROM (SELECT id, connect_by_root(r.NAME) root_name, r.NAME leaf_name
          FROM recurse r
         WHERE connect_by_isleaf = 1
         START WITH parentid IS NULL
       CONNECT BY PRIOR id = parentid)
 GROUP BY root_name;

ROOT_NAME  FIRST_LEAF
---------- ----------
A          a
B          c.1
C          e.1.1
This will give you the first leaf (ordered by the leaf name) for each root.
Update
This is the script I used to generate your data:
CREATE TABLE recurse (
ID NUMBER PRIMARY KEY,
name VARCHAR2(10),
parentid NUMBER REFERENCES recurse (ID));
INSERT INTO recurse VALUES (1, 'A', '');
INSERT INTO recurse VALUES (3, 'b', 1);
INSERT INTO recurse VALUES (4, 'b.1', 3);
INSERT INTO recurse VALUES (5, 'b.2', 3);
INSERT INTO recurse VALUES (6, 'b.3', 3);
INSERT INTO recurse VALUES (7, 'B', '');
INSERT INTO recurse VALUES (8, 'c', 7);
INSERT INTO recurse VALUES (9, 'c.1', 8);
INSERT INTO recurse VALUES (10, 'c.2', 8);
INSERT INTO recurse VALUES (11, 'd', 7);
INSERT INTO recurse VALUES (12, 'C', '');
INSERT INTO recurse VALUES (13, 'e', 12);
INSERT INTO recurse VALUES (14, 'e.2', 13);
INSERT INTO recurse VALUES (15, 'e.1', 13);
INSERT INTO recurse VALUES (16, 'a', 1);
INSERT INTO recurse VALUES (20, 'e.1.1', 15);
As you can see, I anticipated that your ordering would not be by name (this is really unclear from your question though).
Now suppose you want to order by ID (or really any other column, it doesn't matter). In that case you can use analytic functions, for example:
SELECT DISTINCT root_name,
       first_value(leaf_name)
          over(PARTITION BY root_name ORDER BY ID) AS first_leaf_name
  FROM (SELECT id, connect_by_root(r.NAME) root_name, r.NAME leaf_name
          FROM recurse r
         WHERE connect_by_isleaf = 1
         START WITH parentid IS NULL
       CONNECT BY PRIOR id = parentid)
 ORDER BY root_name;

ROOT_NAME  FIRST_LEAF_NAME
---------- ---------------
A          b.1
B          c.1
C          e.2
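If "first leaf" really means following the first child at every level, as the question describes, another way to look at it is to walk down only through the lowest-ordered child of each node. The sketch below is not the Oracle approach above: it swaps in a SQLite recursive CTE run from Python's sqlite3 module, and the ordering values are my assumption (1, 2, 3 among siblings), since the script above has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recurse (
    id INTEGER PRIMARY KEY,
    name TEXT,
    parentid INTEGER REFERENCES recurse (id),
    ordering INTEGER  -- assumed: position among siblings, starting at 1
);
INSERT INTO recurse VALUES
    (1, 'A', NULL, 1), (16, 'a', 1, 1), (3, 'b', 1, 2),
    (4, 'b.1', 3, 1), (5, 'b.2', 3, 2), (6, 'b.3', 3, 3),
    (7, 'B', NULL, 2), (8, 'c', 7, 1), (11, 'd', 7, 2),
    (9, 'c.1', 8, 1), (10, 'c.2', 8, 2),
    (12, 'C', NULL, 3), (13, 'e', 12, 1),
    (15, 'e.1', 13, 1), (14, 'e.2', 13, 2), (20, 'e.1.1', 15, 1);
""")

# Start at each root and, at every step, descend only into the child
# with the smallest ordering. The row with no children ends the walk.
first_leaves = conn.execute("""
    WITH RECURSIVE first_path(root_name, id) AS (
        SELECT name, id FROM recurse WHERE parentid IS NULL
        UNION ALL
        SELECT fp.root_name, c.id
        FROM first_path fp
        JOIN recurse c ON c.parentid = fp.id
        WHERE c.ordering = (SELECT MIN(s.ordering) FROM recurse s
                            WHERE s.parentid = fp.id)
    )
    SELECT fp.root_name, fp.id
    FROM first_path fp
    WHERE NOT EXISTS (SELECT 1 FROM recurse k WHERE k.parentid = fp.id)
    ORDER BY fp.root_name
""").fetchall()
conn.close()

for root_name, leaf_id in first_leaves:
    print(root_name, leaf_id)
```

Because each step only looks one level down, the depth of the tree does not matter. If two siblings share the same ordering value, both branches are followed, so this relies on ordering being unique among siblings.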