Reference CTE inside CTE? - sql

I found this stored proc in our codebase:
ALTER PROCEDURE [dbo].[MoveNodes]
(
@id bigint,
@left bigint,
@right bigint,
@parentid bigint,
@offset bigint,
@caseid bigint,
@userid bigint
)
AS
BEGIN
WITH q AS
(
SELECT id, parent, lft, rgt, title, type, caseid, userid, 0 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [dbo].DM_FolderTree hc
WHERE id = @id and caseid = @caseid
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.title, hc.type, hc.caseid, hc.userid, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [dbo].DM_FolderTree hc
ON hc.parent = q.id
)
UPDATE [dbo].DM_FolderTree
SET lft = ((-lft) + @offset), rgt = ((-rgt) + @offset), userid = @userid
WHERE id in (select id from q) AND lft <= (-(@left)) AND rgt >= (-(@right)) AND caseid = @caseid;
UPDATE [dbo].DM_FolderTree SET parent = @parentid, userid = @userid WHERE id = @id AND caseid = @caseid;
END
Notice that the CTE q is referenced inside its own definition, after the UNION ALL. What exactly are we calling here? Everything before the UNION, or the whole CTE? What exactly is happening here?
I'm assuming that this code is legal, since it's been in production for quite some time (FLW, I know). But still, I have no idea what's happening here.

This is a recursive query. The first SELECT (the anchor) runs once; the SELECT after UNION ALL then runs repeatedly, each time joining DM_FolderTree to the rows q produced on the previous pass, until a pass returns no new rows. In other words, it walks the tree below the starting id.
Think about the nesting of folders in a directory. This query simply walks all the directories to build the final "file path" for every folder under the starting one.
Notice how level starts at 0 and is then incremented: on the first recursive pass it becomes 1, then 2, then 3, and so on.
To better understand:
Grab the CTE portion (WITH q AS ...), replace the UPDATE with SELECT * FROM q, and run it, just so you can see what it does. Recursion is a bit rough to learn at first, but walking through an example this way will help.
Specific answers to questions:
What exactly are we calling here?
You're building a baseline that denotes the root(s) you wish to start with and then traversing all the levels under that root/folder. So in essence you're crawling the entire structure through hc.parent = q.id.
Everything before the UNION, the whole CTE?
The whole CTE: q refers to itself, and on each pass that reference sees the rows found by the pass before it. Recursion is powerfully cool stuff!
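To see what the self-reference does, here is a minimal sketch that runs the same anchor-plus-recursive-member shape against a tiny in-memory tree. It uses Python's sqlite3 module in place of SQL Server so that it runs without a server; SQLite wants the keyword WITH RECURSIVE where T-SQL uses plain WITH. The table folder_tree and its four rows are made up for illustration and are not the real DM_FolderTree.

```python
import sqlite3


def walk_subtree(root_id):
    """Return (id, level) for root_id and every node below it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE folder_tree (id INTEGER PRIMARY KEY, parent INTEGER)")
    # Hypothetical data: 1 is the root, 2 and 3 are its children, 4 sits under 2.
    conn.executemany(
        "INSERT INTO folder_tree (id, parent) VALUES (?, ?)",
        [(1, None), (2, 1), (3, 1), (4, 2)],
    )
    rows = conn.execute(
        """
        WITH RECURSIVE q AS (
            -- Anchor member: runs once and selects the starting node.
            SELECT id, parent, 0 AS level
            FROM folder_tree
            WHERE id = ?
            UNION ALL
            -- Recursive member: joins the table to rows q has already produced,
            -- adding the children of each node found so far.
            SELECT hc.id, hc.parent, q.level + 1
            FROM q
            JOIN folder_tree hc ON hc.parent = q.id
        )
        SELECT id, level FROM q ORDER BY level, id
        """,
        (root_id,),
    ).fetchall()
    conn.close()
    return rows


print(walk_subtree(1))  # [(1, 0), (2, 1), (3, 1), (4, 2)]
print(walk_subtree(2))  # [(2, 0), (4, 1)]
```

Starting from node 2 only returns node 2 and its descendant, which is why the stored procedure's UPDATE touches just the subtree under @id.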

Related

Recursive CTE for URL path generation

We have a parent child relationship for pages in our content management system. The table is as follows
ContentID  Parent  Title
1          0       This Page
2          1       That Page
3          0       Another Page
4          3       Child of Another
5          4       Child of child
A parent of 0 indicates the ultimate parent.
I want to output just the parents of a given contentid in a path. As an example, the output for ContentID = 5 would be:
/another-page/child-of-another/
I've tried many recursive CTE examples that generate breadcrumbs and such like, but they always output the whole path:
/another-page/child-of-another/child-of-child/
I have a function to replace the spaces in the URL, so I'm just interested in the SQL to achieve the above.
One method, using an rCTE and STRING_AGG:
CREATE TABLE dbo.YourTable (ContentID int IDENTITY(1,1),
Parent int,
Title varchar(50));
INSERT INTO dbo.YourTable (Parent, Title)
VALUES(0,'This Page'),
(1,'That Page'),
(0,'Another Page'),
(3,'Child of Another'),
(4,'Child of child');
GO
DECLARE @ID int = 5;
WITH rCTE AS(
SELECT ContentID,
Parent,
Title,
1 AS Level
FROM dbo.YourTable
WHERE ContentID = @ID
UNION ALL
SELECT YT.ContentID,
YT.Parent,
YT.Title,
r.level+1 AS Level
FROM dbo.YourTable YT
JOIN rCTE r ON YT.ContentID = r.Parent)
SELECT '/' + STRING_AGG(REPLACE(Title,' ','-'),'/') WITHIN GROUP (ORDER BY Level DESC) + '/' AS Path
FROM rCTE
WHERE ContentID != @ID;
GO
DROP TABLE dbo.YourTable;
An alternative method, without using STRING_AGG, and instead using an additional JOIN in the rCTE:
DECLARE @ID int = 5;
WITH rCTE AS(
SELECT YT.ContentID,
YT.Parent,
CONVERT(varchar(8000),'/' + YT.Title) AS Title,
1 AS Level
FROM dbo.YourTable YT
JOIN dbo.YourTable P ON YT.ContentID = P.Parent
WHERE P.ContentID = @ID
UNION ALL
SELECT YT.ContentID,
YT.Parent,
CONVERT(varchar(8000),'/' + YT.Title + r.Title),
r.Level + 1
FROM dbo.YourTable YT
JOIN rCTE r ON YT.ContentID = r.Parent)
SELECT TOP 1 REPLACE(Title,' ','-') + '/' AS Path
FROM rCTE
ORDER BY Level DESC;
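The second method (start at the parent, then prepend each further ancestor) can be tried without a SQL Server instance. The sketch below is an assumption-laden port to SQLite through Python's sqlite3 module: the table name pages is made up, || replaces +, LIMIT 1 replaces TOP 1, and lower() is added only so the output matches the lowercase path in the question.

```python
import sqlite3


def parent_path(content_id):
    """Return the URL path made of the ancestors of content_id, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (ContentID INTEGER PRIMARY KEY, Parent INTEGER, Title TEXT)"
    )
    # Same rows as the sample table in the question.
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            (1, 0, "This Page"),
            (2, 1, "That Page"),
            (3, 0, "Another Page"),
            (4, 3, "Child of Another"),
            (5, 4, "Child of child"),
        ],
    )
    row = conn.execute(
        """
        WITH RECURSIVE r AS (
            -- Anchor: start at the parent of the requested page,
            -- so the page itself never enters the path.
            SELECT p.ContentID, p.Parent, '/' || p.Title AS Path, 1 AS Level
            FROM pages p
            JOIN pages c ON c.Parent = p.ContentID
            WHERE c.ContentID = ?
            UNION ALL
            -- Recursive member: prepend each further ancestor.
            SELECT p.ContentID, p.Parent, '/' || p.Title || r.Path, r.Level + 1
            FROM pages p
            JOIN r ON p.ContentID = r.Parent
        )
        -- The deepest level holds the complete path.
        SELECT lower(replace(Path, ' ', '-')) || '/'
        FROM r
        ORDER BY Level DESC
        LIMIT 1
        """,
        (content_id,),
    ).fetchone()
    conn.close()
    return row[0] if row else None


print(parent_path(5))  # /another-page/child-of-another/
print(parent_path(1))  # None, because a top-level page has no parents
```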

Conversion of CTE to temp table in SQL Server to get All Possible Parents

I have one user table in which I maintain parent child relationship and I want to generate the result with all user id along with its parentid and all possible hierarchical parents as comma separated strings.
My table structure is as follows.
CREATE TABLE [hybarmoney].[Users](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[USERID] [nvarchar](100) NULL,
[REFERENCEID] [bigint] NULL
)
and I am getting the result using the below CTE
;WITH Hierarchy (
ChildId
,ChildName
,ParentId
,Parents
)
AS (
SELECT Id
,USERID
,REFERENCEID
,CAST('' AS VARCHAR(MAX))
FROM hybarmoney.Users AS FirtGeneration
WHERE REFERENCEID = 0
UNION ALL
SELECT NextGeneration.ID
,NextGeneration.UserID
,Parent.ChildId
,CAST(CASE
WHEN Parent.Parents = ''
THEN (CAST(NextGeneration.REFERENCEID AS VARCHAR(MAX)))
ELSE (Parent.Parents + ',' + CAST(NextGeneration.REFERENCEID AS VARCHAR(MAX)))
END AS VARCHAR(MAX))
FROM hybarmoney.Users AS NextGeneration
INNER JOIN Hierarchy AS Parent ON NextGeneration.REFERENCEID = Parent.ChildId
)
SELECT *
FROM Hierarchy
ORDER BY ChildId
OPTION (MAXRECURSION 0)
But I have the limitation of MAXRECURSION, and when I googled I learned that temp tables are an alternative, but I was not able to make that work.
Also, I don't want all possible top parents; for my purpose I want to find 15 levels of hierarchical parents for each user. Is it possible to use temp tables for this, and if so, how?
What you could do, in order to get only N levels out of your CTE, is to add a column that keeps track of the level.
;WITH Hierarchy (
ChildId
,ChildName
,ParentId
,LEVEL
,Parents
)
AS (
SELECT Id
,USERID
,REFERENCEID
,0 AS LEVEL
,CAST('' AS VARCHAR(MAX))
FROM hybarmoney.Users AS FirtGeneration
WHERE REFERENCEID = 0
UNION ALL
SELECT NextGeneration.ID
,NextGeneration.UserID
,Parent.ChildId
,LEVEL+1 AS LEVEL
,CAST(CASE
WHEN Parent.Parents = ''
THEN (CAST(NextGeneration.REFERENCEID AS VARCHAR(MAX)))
ELSE (Parent.Parents + ',' + CAST(NextGeneration.REFERENCEID AS VARCHAR(MAX)))
END AS VARCHAR(MAX))
FROM hybarmoney.Users AS NextGeneration
INNER JOIN Hierarchy AS Parent ON NextGeneration.REFERENCEID = Parent.ChildId
)
SELECT *
FROM Hierarchy
WHERE LEVEL <= 15
ORDER BY ChildId
OPTION (MAXRECURSION 0)
This works, assuming I understood correctly your following statement: "for my purpose I want to find 15 levels of hierarchical parents for each users", where you actually meant 15 levels of hierarchical parents for a single user (in your case REFERENCEID=0).
If you want this to generate a list of 15 hierarchical parents for each user in your hybarmoney.Users table, then move your CTE into a table-valued function and implement a similar solution as explained here.
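One detail worth checking: a filter such as WHERE LEVEL <= 15 in the outer SELECT only trims the output, so the recursion still runs to full depth. Putting the cap inside the recursive member stops the recursion itself. The sketch below illustrates that idea for the "parents of each user" reading of the question. It is not the answer's query: it uses SQLite through Python's sqlite3 module, a made-up four-user chain, and hypothetical names (users, up, ancestors_per_user).

```python
import sqlite3


def ancestors_per_user(max_levels):
    """Map each user id to a comma-separated list of up to max_levels parents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, referenceid INTEGER)")
    # Hypothetical chain: 1 is the top (referenceid 0), 2 was referred by 1,
    # 3 by 2, and 4 by 3.
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 0), (2, 1), (3, 2), (4, 3)])
    rows = conn.execute(
        """
        WITH RECURSIVE up AS (
            -- Anchor: every user paired with its direct parent at level 1.
            SELECT id AS child_id, referenceid AS ancestor_id, 1 AS level
            FROM users
            WHERE referenceid <> 0
            UNION ALL
            -- Recursive member: climb one more parent, but only while below
            -- the cap, so the recursion stops instead of being filtered later.
            SELECT up.child_id, u.referenceid, up.level + 1
            FROM up
            JOIN users u ON u.id = up.ancestor_id
            WHERE u.referenceid <> 0 AND up.level < ?
        )
        SELECT child_id, ancestor_id FROM up ORDER BY child_id, level
        """,
        (max_levels,),
    ).fetchall()
    conn.close()
    parents = {}
    for child_id, ancestor_id in rows:
        parents.setdefault(child_id, []).append(str(ancestor_id))
    return {child: ",".join(ids) for child, ids in parents.items()}


print(ancestors_per_user(15))  # {2: '1', 3: '2,1', 4: '3,2,1'}
print(ancestors_per_user(2))   # {2: '1', 3: '2,1', 4: '3,2'}
```

Because the depth is bounded by the cap, a T-SQL version written this way would stay well under the default MAXRECURSION limit of 100 when the cap is 15.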
I got the same result using the following query, though there may be a better solution:
Create PROC UspUpdateUserAndFiftenLevelIDs (@UserID BIGINT)
AS
BEGIN
DECLARE @REFERENCEIDString NVARCHAR(max)
SET @REFERENCEIDString = ''
DECLARE @ReferenceID BIGINT
SET @ReferenceID = @UserID
DECLARE @Count INT
SET @Count = 0
WHILE (@count < 15)
BEGIN
SELECT @ReferenceID = U.REFERENCEID
,@REFERENCEIDString = @REFERENCEIDString + CASE
WHEN @REFERENCEIDString = ''
THEN (CAST(U.REFERENCEID AS VARCHAR(100)))
ELSE (',' + CAST(U.REFERENCEID AS VARCHAR(MAX)))
END
FROM hybarmoney.Users U
WHERE U.ID = @ReferenceID
SET @Count = @Count + 1
END
SELECT @UserID
,@REFERENCEIDString
END

Is this select field really necessary if its not labeled/referenced?

In our code stack I came across this stored proc:
ALTER PROCEDURE [dbo].[MoveNodes]
(
@id bigint,
@left bigint,
@right bigint,
@parentid bigint,
@offset bigint,
@caseid bigint,
@userid bigint
)
AS
BEGIN
WITH q AS
(
SELECT id, parent, lft, rgt, title, type, caseid, userid, 0 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [dbo].DM_FolderTree hc
WHERE id = @id and caseid = @caseid
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.title, hc.type, hc.caseid, hc.userid, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [dbo].DM_FolderTree hc
ON hc.parent = q.id
)
UPDATE [dbo].DM_FolderTree
SET lft = ((-lft) + @offset), rgt = ((-rgt) + @offset), userid = @userid
WHERE id in (select id from q)
AND lft <= (-(@left)) AND rgt >= (-(@right))
AND caseid = @caseid;
UPDATE [dbo].DM_FolderTree
SET parent = @parentid, userid = @userid
WHERE id = @id AND caseid = @caseid;
END
Please, note the field labeled bc and how that variable is used below in the second CAST function, which we'll call for this discussion the unlabeled field, UF.
Two questions pop out at me from the above.
UF is not used anywhere in the CTE, q, other than to select the field...does it serve any function?
Since bc is only used to generate UF, is bc necessary at all, either?
Looks like several fields are not being used.
I would question whether the update is doing what is really needed, or whether the query could be simplified and still give the same results in the fields you are actually using. There is no point in doing work you don't need to do. But only you can know which one is wrong, based on your local business requirements. All I can do is tell you that one of them is likely wrong.
If you want to check whether your simplified version of the CTE gives the same results in the fields you use, do this:
DECLARE @id bigint,
@left bigint,
@right bigint,
@parentid bigint,
@offset bigint,
@caseid bigint,
@userid bigint;
-- then set the variables to typical values
-- Original CTE: current lft/rgt next to the values the UPDATE would write
WITH q AS
(
SELECT id, parent, lft, rgt, title, type, caseid, userid, 0 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [dbo].DM_FolderTree hc
WHERE id = @id and caseid = @caseid
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.title, hc.type, hc.caseid, hc.userid, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [dbo].DM_FolderTree hc
ON hc.parent = q.id
)
SELECT dft.id, dft.lft, ((-q.lft) + @offset) AS new_lft, dft.rgt, ((-q.rgt) + @offset) AS new_rgt
FROM [dbo].DM_FolderTree dft
JOIN q on q.id = dft.id
WHERE dft.lft <= (-(@left))
AND dft.rgt >= (-(@right))
AND dft.caseid = @caseid;
-- Simplified CTE: same check with only the columns that are actually used
WITH q2 AS
(
SELECT id, parent, lft, rgt
FROM [dbo].DM_FolderTree hc
WHERE id = @id and caseid = @caseid
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt
FROM q2
JOIN [dbo].DM_FolderTree hc
ON hc.parent = q2.id
)
SELECT dft.id, dft.lft, ((-q2.lft) + @offset) AS new_lft, dft.rgt, ((-q2.rgt) + @offset) AS new_rgt
FROM [dbo].DM_FolderTree dft
JOIN q2 on q2.id = dft.id
WHERE dft.lft <= (-(@left))
AND dft.rgt >= (-(@right))
AND dft.caseid = @caseid;
If your updated CTE gives the same results in both queries, then the change is good. To do the check I used the id field, the values that are currently in the dft table, and the values the update would replace them with. This can also tell you whether the update results are what you were expecting. You can return any other fields that might help you make that determination. Not knowing what your data means, I don't know what else might be useful to you.
It may just be me, but I would not use the IN subquery in the update either; I would rather see it as a join, similar to what I used in the check queries.
bc is a column alias in the "anchor" member of the recursive CTE, not a variable. Basically, it is the ID forced into a 10-character string by padding it with zeros on the right.
1 = 1000000000
217 = 2170000000
etc.
In the recursive member of the CTE, the parent's bc is concatenated with the child's padded ID, separated by a period (.). For example, node 217 directly under node 1 gets:
1000000000.2170000000
The reason it is unnamed (UF) is that the field takes the name created in the first SELECT of a UNION. If you ran the CTE on its own, you'd see it labeled bc.
It doesn't look like it serves any useful purpose other than to designate the anchor record. This might be a holdover from development or part of a design feature that was never fleshed out.
I'd move the comma onto the same line and just comment it out for now. Be careful removing too many fields though. Unlike this one, the rest may be required to provide a distinct enough record for the recursive count to work properly.
SELECT id, parent, lft, rgt, title, type, caseid, userid, 0 AS level
--, CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [dbo].DM_FolderTree hc
WHERE id = @id and caseid = @caseid
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.title, hc.type, hc.caseid, hc.userid, level + 1
--, CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [dbo].DM_FolderTree hc
ON hc.parent = q.id
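To make the bc expression concrete, here is a small Python imitation of what the two CAST expressions compute. It is only a sketch: pad_id and breadcrumb are made-up helper names, and the 30-character cut reflects SQL Server's default length for CAST(... AS VARCHAR) when no length is given.

```python
def pad_id(node_id):
    """Imitate LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10)."""
    return (str(node_id) + "0" * 10)[:10]


def breadcrumb(ids_from_anchor):
    """Imitate how bc grows from the anchor row down one branch.

    The anchor contributes its padded id; each recursive level appends
    '.' plus the child's padded id. CAST(... AS VARCHAR) without a length
    means VARCHAR(30), so the value is cut to 30 characters at each level.
    """
    bc = ""
    for depth, node_id in enumerate(ids_from_anchor):
        if depth == 0:
            bc = pad_id(node_id)
        else:
            bc = (bc + "." + pad_id(node_id))[:30]
    return bc


print(pad_id(1))                # 1000000000
print(pad_id(217))              # 2170000000
print(breadcrumb([1, 217]))     # 1000000000.2170000000
print(breadcrumb([1, 217, 5]))  # 1000000000.2170000000.50000000
```

Two things stand out in this imitation: ids 1, 10 and 100 all pad to the same string, and the third level is already truncated. Both suggest the column was not relied on for anything, which fits the reading that it can be commented out.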

Querying up tree for a particular value

I'm a bit of a SQL novice, so I could definitely use some assistance hashing out the general design of a particular query. I'll be giving a SQL example of what I'm trying to do below. It may contain some syntax errors, and I do apologize for that- I'm just trying to get the design down before I go running and testing it!
Side note- I have 0 control over the design scheme, so redesign is not an option. My example tables may have an error due to oversight on my part, but the overall design scheme of bottom-up value searching will remain the same. I'm querying an existing database filled with tons of data already in it.
The scenario is this: There is a tree of elements. Each element has an ID and a parent ID (table layouts below). Parent ID is a recursive foreign key to itself. There is a second table that contains values. Each value has an elementID that is a foreign key to the element table. So to get the value of a particular variable for a particular element, you must join the two tables.
The variable hierarchy goes bottom-up by way of inheritance. If you have an element and want to get its variable value, you first look at that element. If it doesn't have a value, then check the element's parent. If the parent doesn't have one either, check the parent's parent, and so on all the way to the top. Every variable is guaranteed to have a value by the time you reach the top! (If I search for variableID 21, I know that 21 will exist: if not at the bottom, then definitely at the top.) The lowest element on the tree gets priority, though: if the bottom element has a value for that variable, don't go any farther up!
The tables would look roughly like this:
Element_Table
--------------
elementID (PK)
ParentID (FK to elementID)
Value_Table
--------------
valueID (PK)
variableID
value (the value that we're looking for)
elementID (FK to Element_Table.elementID)
So, what I'm looking to do is create a function that cleanly (key word here: nice, clean and efficient code) searches bottom-up across the tree for a variable value. Once I find it, return that value and move on!
Here is an example of what I'm thinking:
CREATE FUNCTION FindValueInTreeBottomUp
(@variableID int, @element varchar(50))
RETURNS varchar(50)
AS
BEGIN
DECLARE @result varchar(50)
DECLARE @ID int
DECLARE @parentID int
SELECT @result = NULL, @ID = @element
WHILE (@result IS NULL AND @ID IS NOT NULL)
BEGIN
-- reset so a missing row cannot leave the previous parent in place
SET @parentID = NULL
-- LEFT JOIN so the parent is still found when this element has no value
SELECT @result = vals.value, @parentID = eles.ParentID
FROM Element_Table eles
LEFT JOIN Value_Table vals
ON vals.elementID = eles.elementID AND vals.variableID = @variableID
WHERE eles.elementID = @ID
-- no value here: move one level up (a NULL parent ends the loop)
IF (@result IS NULL)
SET @ID = @parentID
END
RETURN @result
END
Again, I apologize if there are any syntactical errors. Still a SQL novice and haven't run this yet! I'm especially a novice at functions- I can query all day, but functions/sprocs are still rather new to me.
So, SQL gurus out there- can you think of a better way to do this? The design of the tables won't be changing; I have NO control over that. All I can do is produce the query to check the already existing design.
I think you could do something like this (it's untested, have to try it in sql fiddle):
;with cte1 as (
select e.elementID, e.parentID, v.value
from Element_Table as e
left outer join Value_Table as v on v.elementID = e.elementID and v.variableID = @variableID
), cte2 as (
select v.value, v.parentID, 1 as aDepth
from cte1 as v
where v.elementID = @elementID
union all
select v.value, v.parentID, c.aDepth + 1
from cte2 as c
inner join cte1 as v on v.elementID = c.ParentID
where c.value is null
)
select top 1 value
from cte2
where value is not null
order by aDepth
test infrastructure:
declare @Elements table (ElementID int, ParentID int)
declare @Values table (VariableID int, ElementID int, Value nvarchar(128))
declare @variableID int, @elementID int
select @variableID = 1, @elementID = 2
insert into @Elements
select 1, null union all
select 2, 1
insert into @Values
select 1, 1, 'test'
;with cte1 as (
select e.elementID, e.parentID, v.value
from @Elements as e
left outer join @Values as v on v.elementID = e.elementID and v.variableID = @variableID
), cte2 as (
select v.value, v.parentID, 1 as aDepth
from cte1 as v
where v.elementID = @elementID
union all
select v.value, v.parentID, c.aDepth + 1
from cte2 as c
inner join cte1 as v on v.elementID = c.ParentID
where c.value is null
)
select top 1 value
from cte2
where value is not null
order by aDepth
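For anyone without a SQL Server instance at hand, the same bottom-up lookup can be sketched against SQLite through Python's sqlite3 module. This is an illustrative port, not the answer's exact query: the three-element tree, the sample values, and the function name find_value are made up, and LIMIT 1 stands in for TOP 1.

```python
import sqlite3


def find_value(variable_id, element_id):
    """Return the nearest value for variable_id at element_id or above it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE element_table (elementID INTEGER PRIMARY KEY, parentID INTEGER);
        CREATE TABLE value_table (variableID INTEGER, elementID INTEGER, value TEXT);
        -- Hypothetical tree: 1 is the root, 2 is under 1, 3 is under 2.
        INSERT INTO element_table VALUES (1, NULL), (2, 1), (3, 2);
        -- Variable 21 is set on the root and overridden on element 2.
        INSERT INTO value_table VALUES (21, 1, 'root value'), (21, 2, 'override');
        """
    )
    row = conn.execute(
        """
        WITH RECURSIVE up AS (
            -- Anchor: the starting element, with its value if it has one.
            SELECT e.elementID, e.parentID, v.value, 1 AS depth
            FROM element_table e
            LEFT JOIN value_table v
                ON v.elementID = e.elementID AND v.variableID = :var
            WHERE e.elementID = :elem
            UNION ALL
            -- Recursive member: climb to the parent only while no value was found.
            SELECT e.elementID, e.parentID, v.value, up.depth + 1
            FROM up
            JOIN element_table e ON e.elementID = up.parentID
            LEFT JOIN value_table v
                ON v.elementID = e.elementID AND v.variableID = :var
            WHERE up.value IS NULL
        )
        SELECT value FROM up WHERE value IS NOT NULL ORDER BY depth LIMIT 1
        """,
        {"var": variable_id, "elem": element_id},
    ).fetchone()
    conn.close()
    return row[0] if row else None


print(find_value(21, 3))  # override (inherited from element 2)
print(find_value(21, 1))  # root value
```

The WHERE up.value IS NULL condition is what makes the lowest element win: once a value is found, no further parents are visited.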

What is the most efficient way to concatenate a string from all parent rows using T-SQL?

I have a table that has a self-referencing foreign key that represents its parent row. To illustrate the problem in its simplest form we'll use this table:
CREATE TABLE Folder(
id int IDENTITY(1,1) NOT NULL, --PK
parent_id int NULL, --FK
folder_name varchar(255) NOT NULL)
I want to create a scalar-valued function that would return a concatenated string of the folder's name and all its parent folder names all the way to the root folder, which would be designated by a null parent_id value.
My current solution is a procedural approach which I assume is not ideal. Here is what I'm doing:
CREATE FUNCTION dbo.GetEntireLineage
(@folderId INT)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @lineage VARCHAR(MAX)
DECLARE @parentFolderId INT
SELECT @lineage = folder_name, @parentFolderId = parent_id FROM Folder WHERE id = @folderId
WHILE @parentFolderId IS NOT NULL
BEGIN
-- append the parent's name first, then move one level up
SET @lineage = @lineage + '-' + (SELECT folder_name FROM Folder WHERE id = @parentFolderId)
SET @parentFolderId = (SELECT parent_id FROM Folder WHERE id = @parentFolderId)
END
RETURN @lineage
END
Is there a better way to do this? I'm an experienced programmer, but T-SQL is not a familiar world to me, and I know these problems generally require a different approach due to the nature of set-based data. Any help finding a solution, or any other tips and tricks for dealing with T-SQL, would be much appreciated.
To know for sure about performance you need to test. I have done some testing using your version (slightly modified) and the recursive CTE versions suggested by others.
I used your sample table with 2048 rows, all in one single folder hierarchy, so when passing 2048 as the parameter to the function there are 2048 concatenations done.
The loop version:
create function GetEntireLineage1 (@id int)
returns varchar(max)
as
begin
declare @ret varchar(max)
select @ret = folder_name,
@id = parent_id
from Folder
where id = @id
while @@rowcount > 0
begin
select @ret = @ret + '-' + folder_name,
@id = parent_id
from Folder
where id = @id
end
return @ret
end
Statistics:
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 122 ms.
The recursive CTE version:
create function GetEntireLineage2(@id int)
returns varchar(max)
begin
declare @ret varchar(max);
with cte(id, name) as
(
select f.parent_id,
cast(f.folder_name as varchar(max))
from Folder as f
where f.id = @id
union all
select f.parent_id,
c.name + '-' + f.folder_name
from Folder as f
inner join cte as c
on f.id = c.id
)
select @ret = name
from cte
where id is null
option (maxrecursion 0)
return @ret
end
Statistics:
SQL Server Execution Times:
CPU time = 187 ms, elapsed time = 183 ms.
So between these two it is the loop version that is more efficient, at least on my test data. You need to test on your actual data to be sure.
Edit
Recursive CTE with for xml path('') trick.
create function [dbo].[GetEntireLineage4](@id int)
returns varchar(max)
begin
declare @ret varchar(max) = '';
with cte(id, lvl, name) as
(
select f.parent_id,
1,
f.folder_name
from Folder as f
where f.id = @id
union all
select f.parent_id,
lvl + 1,
f.folder_name
from Folder as f
inner join cte as c
on f.id = c.id
)
select @ret = (select '-'+name
from cte
order by lvl
for xml path(''), type).value('.', 'varchar(max)')
option (maxrecursion 0)
return stuff(@ret, 1, 1, '')
end
Statistics:
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 37 ms.
Use a recursive query to traverse the parents and then this method for concatenating into a string.
A hierarchyid is often overkill unless you have a really deep hierarchy or very large sets of data that can take advantage of the indexing. This is as fast as you can get without changing your schema.
with recursiveCTE (parent_id, concatenated_name) as (
-- cast so the anchor and recursive parts have the same type
select parent_id, cast(folder_name as varchar(max))
from folder
union all
select f.parent_id, r.concatenated_name + f.folder_name
from folder f
inner join recursiveCTE r on r.parent_id = f.id
)
select concatenated_name from recursiveCTE
This should work for you:
with cte (Parent_id, Path) as
(
-- cast so the anchor and recursive parts have the same type
select Parent_Id, cast(Folder_Name as varchar(max))
from folder
union all
select f.Parent_Id, c.Path + '\' + f.Folder_Name
from Folder as f
inner join cte as c on c.Parent_Id = f.Id
)
select Path from cte