Populate the record links using Recursive CTE - sql

I have the contact table records which has a link of other contact record or the contact record is not linked to anything (null)
As per below example id 21 is a parent for contact 1
I need to populate the temptable using T-SQL records (Using the recursive CTE) with all the contact links for the each and every contact id in contact table as below
As one contact id is associated with multiple contact ids, the Link1,Link2,link3 columns should be dynamically created if possible.
Could anybody please help me with this script

Try this (necessary remarks in comments):
--data definition
declare #contactTable table (contactId int, linkContactId int)
insert into #contactTable values
(1,21),
(2,null),
(3,450),
(4,1),
(5,900),
(6,5),
(7,3),
(8,1)
--recursive cte
;with cte as (
(select 1 n, contactId from #contactTable
where linkContactId = 1
union
select 1, linkContactId from #contactTable
where contactId = 1)
union all
--this part might seem confusing, I tried writing recursive part similairly as anchor part,
--but it needed to joins, which isn't allowed in recursive part of cte, so I worked around it
select n + 1,
case when cte.n + 1 = t.contactId then t.linkContactId else t.contactId end
from cte join #contactTable [t] on
(cte.n + 1 = t.contactId or cte.n + 1 = t.linkContactId)
)
--grouping results by contactId concatenating all linkContacts
select n [contactId],
(select distinct cast(contactId as varchar(5)) + ',' from cte where n = c.n for xml path(''), type).value('(.)[1]', 'varchar(100)') [linkContactId]
from cte [c]
group by n

As per your above script i was able to nearly get the results
As 4,8 have already been included in the first row, it should not be shown as seperate record/records
Can you please adjust your query and please provide me the skipping script

Related

How can run second query based on first query?

Im using two query's, 1st separated one column to two columns and inserted one table and second query (PIVOT) fetching based on inserted table,
1st Query,
SELECT A.MDDID, A.DeviceNumber,
Split.a.value('.', 'VARCHAR(100)') AS MetReading
FROM (
SELECT MDDID,DeviceNumber,
CAST ('<M>' + REPLACE(Httpstring, ':', '</M><M>') + '</M>' AS XML) AS MetReading
FROM [IOTDBV1].[dbo].[MDASDatas] E
Where E.MDDID = 49101
) AS A CROSS APPLY MetReading.nodes ('/M') AS Split(a);
2nd Query
SELECT * FROM
(
Select ID,MDDID,DeviceNumber,ReceivedDate
, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS ID2
, SPLT.MR.value('.','VARCHAR(MAX)') AS LIST FROM (
Select ID,MDDID,DeviceNumber,ReceivedDate
, CAST( '<M>'+REPLACE(MeterReading,',','</M><M>')+'</M>' AS XML) AS XML_MR
From [dbo].[PARSEMDASDatas] E
Where E.MeterReading is Not Null
)E
CROSS APPLY E.XML_MR.nodes('/M') AS SPLT(MR)
)A
PIVOT
(
MAX(LIST) FOR ID2 IN ([1],[2],[3],[4],[5],[6],[7],[8])
)PV
I want 2nd query run based on first query no need to require table.
any help would be appreciated.
Your question is not very clear... And it is a very good example, why you always should add a MCVE, including DDL, sample data, own attempts, wrong output and expected output. This time I do this for you, please try to prepare such a MCVE the next time yourself...
If I get this correctly, your source table includes a CSV column with up to 8 (max?) values. This might be solved much easier, no need to break this up in two queries, no need for an intermediate table and not even for PIVOT.
--create a mockup-table to simulate your situation (slightly shortened for brevity)
DECLARE #YourTable TABLE(ID INT,MDDID INT, DeviceNumber VARCHAR(100),MetReading VARCHAR(2000));
INSERT INTO #YourTable VALUES
(2,49101,'NKLDEVELOPMENT02','DCPL,981115,247484,9409') --the character code and some numbers
,(3,49101,'NKLDEVELOPMENT02','SPPL,,,,,,,,') --eigth empty commas
,(4,49101,'NKLDEVELOPMENT02','BLAH,,,999,,'); --A value somewhere in the middle
--The cte will return the table as is. The only difference is a cast to XML (as you did it too)
WITH Splitted AS
(
SELECT ID
,MDDID
,DeviceNumber
,CAST('<x>' + REPLACE(MetReading,',','</x><x>') + '</x>' AS XML) AS Casted
FROM #YourTable t
)
SELECT s.ID
,s.MDDID
,s.DeviceNumber
,s.Casted.value('/x[1]','varchar(100)') AS [1]
,s.Casted.value('/x[2]','varchar(100)') AS [2]
,s.Casted.value('/x[3]','varchar(100)') AS [3]
,s.Casted.value('/x[4]','varchar(100)') AS [4]
,s.Casted.value('/x[5]','varchar(100)') AS [5]
,s.Casted.value('/x[6]','varchar(100)') AS [6]
,s.Casted.value('/x[7]','varchar(100)') AS [7]
,s.Casted.value('/x[8]','varchar(100)') AS [8]
FROM Splitted s;
the result
ID MDDID DeviceNumber 1 2 3 4 5 6 7 8
2 49101 NKLDEVELOPMENT02 DCPL 981115 247484 9409 NULL NULL NULL NULL
3 49101 NKLDEVELOPMENT02 SPPL
4 49101 NKLDEVELOPMENT02 BLAH 999 NULL NULL
The idea in short:
Each CSV is tranformed to a XML similar to this:
<x>DCPL</x>
<x>981115</x>
<x>247484</x>
<x>9409</x>
Using a position predicate in the XPath, we can call the first, the second, the third <x> easily.
CTE: WITH common_table_expression is answer
you can prepare some data in first query and user in second
WITH cte_table AS
(
SELECT *
FROM sys.objects
)
SELECT *
FROM cte_table
where name like 'PK%'

Remove duplicated subsets from very large table

The data I'm working with is fairly complicated, so I'm just going to provide a simpler example so I can hopefully expand that out to what I'm working on.
Note: I've already found a way to do it, but it's extremely slow and not scalable. It works great on small datasets, but if I applied it to the actual tables it needs to run on, it would take forever.
I need to remove entire duplicate subsets of data within a table. Removing duplicate rows is easy, but I'm stuck finding an efficient way to remove duplicate subsets.
Example:
GroupID Subset Value
------- ---- ----
1 a 1
1 a 2
1 a 3
1 b 1
1 b 3
1 b 5
1 c 1
1 c 3
1 c 5
2 a 1
2 a 2
2 a 3
2 b 4
2 b 5
2 b 6
2 c 1
2 c 3
2 c 6
So in this example, from GroupID 1, I would need to remove either subset 'b' or subset 'c', doesn't matter which since both contain Values 1,2,3. For GroupID 2, none of the sets are duplicated, so none are removed.
Here's the code I used to solve this on a small scale. It works great, but when applied to 10+ Million records...you can imagine it would be very slow (I was later informed of the number of records, the sample data I was given was much smaller)...:
DECLARE #values TABLE (GroupID INT NOT NULL, SubSet VARCHAR(1) NOT NULL, [Value] INT NOT NULL)
INSERT INTO #values (GroupID, SubSet, [Value])
VALUES (1,'a',1),(1,'a',2),(1,'a',3) ,(1,'b',1),(1,'b',3),(1,'b',5) ,(1,'c',1),(1,'c',3),(1,'c',5),
(2,'a',1),(2,'a',2),(2,'a',3) ,(2,'b',2),(2,'b',4),(2,'b',6) ,(2,'c',1),(2,'c',3),(2,'c',6)
SELECT *
FROM #values v
ORDER BY v.GroupID, v.SubSet, v.[Value]
SELECT x.GroupID, x.NameValues, MIN(x.SubSet)
FROM (
SELECT t1.GroupID, t1.SubSet
, NameValues = (SELECT ',' + CONVERT(VARCHAR(10), t2.[Value]) FROM #values t2 WHERE t1.GroupID = t2.GroupID AND t1.SubSet = t2.SubSet ORDER BY t2.[Value] FOR XML PATH(''))
FROM #values t1
GROUP BY t1.GroupID, t1.SubSet
) x
GROUP BY x.GroupID, x.NameValues
All I'm doing here is grouping by GroupID and Subset and concatenating all of the values into a comma delimited string...and then taking that and grouping on GroupID and Value list, and taking the MIN subset.
I'd go with something like this:
;with cte as
(
select v.GroupID, v.SubSet, checksum_agg(v.Value) h, avg(v.Value) a
from #values v
group by v.GroupID, v.SubSet
)
delete v
from #values v
join
(
select c1.GroupID, case when c1.SubSet > c2.SubSet then c1.SubSet else c2.SubSet end SubSet
from cte c1
join cte c2 on c1.GroupID = c2.GroupID and c1.SubSet <> c2.SubSet and c1.h = c2.h and c1.a = c2.a
)x on v.GroupID = x.GroupID and v.SubSet = x.SubSet
select *
from #values
From Checksum_Agg:
The CHECKSUM_AGG result does not depend on the order of the rows in
the table.
This is because it is a sum of the values: 1 + 2 + 3 = 3 + 2 + 1 = 3 + 3 = 6.
HashBytes is designed to produce a different value for two inputs that differ only in the order of the bytes, as well as other differences. (There is a small possibility that two inputs, perhaps of wildly different lengths, could hash to the same value. You can't take an arbitrary input and squeeze it down to an absolutely unique 16-byte value.)
The following code demonstrates how to use HashBytes to return for each GroupId/Subset.
-- Thanks for the sample data!
DECLARE #values TABLE (GroupID INT NOT NULL, SubSet VARCHAR(1) NOT NULL, [Value] INT NOT NULL)
INSERT INTO #values (GroupID, SubSet, [Value])
VALUES (1,'a',1),(1,'a',2),(1,'a',3) ,(1,'b',1),(1,'b',3),(1,'b',5) ,(1,'c',1),(1,'c',3),(1,'c',5),
(2,'a',1),(2,'a',2),(2,'a',3) ,(2,'b',2),(2,'b',4),(2,'b',6) ,(2,'c',1),(2,'c',3),(2,'c',6);
SELECT *
FROM #values v
ORDER BY v.GroupID, v.SubSet, v.[Value];
with
DistinctGroups as (
select distinct GroupId, Subset
from #Values ),
GroupConcatenatedValues as (
select GroupId, Subset, Convert( VarBinary(256), (
select Convert( VarChar(8000), Cast( Value as Binary(4) ), 2 ) AS [text()]
from #Values as V
where V.GroupId = DG.GroupId and V.SubSet = DG.SubSet
order by Value
for XML Path('') ), 2 ) as GroupedBinary
from DistinctGroups as DG )
-- To see the intermediate results from the CTE you can use one of the
-- following two queries instead of the last select :
-- select * from DistinctGroups;
-- select * from GroupConcatenatedValues;
select GroupId, Subset, GroupedBinary, HashBytes( 'MD4', GroupedBinary ) as Hash
from GroupConcatenatedValues
order by GroupId, Subset;
You can use checksum_agg() over a set of rows. If the checksums are the same, this is strong evidence that the 'values' columns are equal within the grouped fields.
In the 'getChecksums' cte below, I group by the group and subset, with a checksum based on your 'value' column.
In the 'maybeBadSubsets' cte, I put a row_number over each aggregation just to identify the 2nd+ row in the event the checksums match.
Finally, I delete any subgroups so identified.
with
getChecksums as (
select groupId,
subset,
cs = checksum_agg(value)
from #values v
group by groupId,
subset
),
maybeBadSubsets as (
select groupId,
subset,
cs,
deleteSubset =
case
when row_number() over (
partition by groupId, cs
order by subset
) > 1
then 1
end
from getChecksums
)
delete v
from #values v
where exists (
select 0
from maybeBadSubsets mbs
where v.groupId = mbs.groupId
and v.SubSet = mbs.subset
and mbs.deleteSubset = 1
);
I don't know what the exact likelihood is for checksums to match. If you're not comfortable with the false positive rate, you can still use it to eliminate some branches in a more algorithmic approach in order to vastly improve performance.
Note: CTE's can have a quirk performance-wise. If you find that the query engine is running 'maybeBadSubsets' for each row of #values, you may need to put its results into a temp table or table variable before using it. But I believe with 'exists' you're okay as far at that goes.
EDIT:
I didn't catch it, but as the OP noticed, checksum_agg seems to perform very poorly in terms of false hits/misses. I suspect it might be due to the simplicity of the input. I changed
cs = checksum_agg(value)
above to
cs = checksum_agg(convert(int,hashbytes('md5', convert(char(1),value))))
and got better results. But I don't know how it would perform on larger datasets.

Active Directory: Convert canonicalName node value from string to integer

Are there any methods available to convert the string text value contained within the AD canonicalName attribute to an incremented integer value? Or, does this need to be performed manually?
For example:
canonicalName (what I am getting) hierarchyNode (what I need)
\domain.com\ /1/
\domain.com\Corporate /1/1/
\domain.com\Corporate\Hr /1/1/1/
\domain.com\Corporate\Accounting /1/1/2/
\domain.com\Users\ /1/2/
\domain.com\Users\Sales /1/2/1/
\domain.com\Users\Whatever /1/2/2/
\domain.com\Security\ /1/3/
\domain.com\Security\Servers /1/3/1/
\domain.com\Security\Administrative /1/3/2/
\domain.com\Security\Executive /1/3/3/
I am extracting user objects into a SQL Server database for reporting purposes. The user objects are spread throughout multiple OU's in the forest. So, by identifying the highest node on the tree that contains users, I can then utilize the SQL Server GetDescendent() method to quickly retrieve users recursively without having to write 1 + n number of sub-selects.
For reference: https://learn.microsoft.com/en-us/sql/t-sql/data-types/hierarchyid-data-type-method-reference
UPDATE:
I am able to convert the canonicalName from string to integer (see below using SQL Server 2014). However, this doesn't seem to solve my problem. I have only built the branches of the tree by stripping off the leafs, that way I can get IsDescendant() by tree branch. But now, I cannot insert the leafs in batch as it appears I need to GetDescendant(), which appears to be built for handling inserts one at a time.
How can I build the Active Directory directory tree, which resembles file system paths, as a SQL Hierarchy? All examples treat the hierarchy as an immediately parent/child relationship and use a recursive CTE to build from the root, which requires the parent child relationship to already be know. In my case, the parent child relationship is known only through the '/' delimiter.
-- Drop and re-create temp table(s) that are used by this procedure.
IF OBJECT_ID(N'Tempdb.dbo.#TEMP_TreeHierarchy', N'U') IS NOT NULL
BEGIN
DROP TABLE #TEMP_TreeHierarchy
END;
-- Drop and re-create temp table(s) that are used by this procedure.
IF OBJECT_ID(N'Tempdb.dbo.#TEMP_AdTreeHierarchyNodeNames', N'U') IS NOT NULL
BEGIN
DROP TABLE #TEMP_AdTreeHierarchyNodeNames
END;
-- CREATE TEMP TABLE(s)
CREATE TABLE #TEMP_TreeHierarchy(
TreeHierarchyKey INT IDENTITY(1,1) NOT NULL
,TreeHierarchyId hierarchyid NULL
,TreeHierarchyNodeLevel int NULL
,TreeHierarchyNode varchar(255) NULL
,TreeCanonicalName varchar(255) NOT NULL
PRIMARY KEY CLUSTERED
(
TreeCanonicalName ASC
))
CREATE TABLE #TEMP_AdTreeHierarchyNodeNames (
TreeCanonicalName VARCHAR(255) NOT NULL
,TreeHierarchyNodeLevel INT NOT NULL
,TreeHierarchyNodeName VARCHAR(255) NOT NULL
,IndexValueByLevel INT NULL
PRIMARY KEY CLUSTERED
(
TreeCanonicalName ASC
,TreeHierarchyNodeLevel ASC
,TreeHierarchyNodeName ASC
))
-- Step 1.) INSERT the DISTINCT list of CanonicalName values into #TEMP_TreeHierarchy.
-- Remove the reserved character '/' that has been escaped '\/'. Note: '/' is the delimiter.
-- Remove all of the leaves from the tree, leaving only the root and the branches/nodes.
;WITH CTE1 AS (SELECT CanonicalNameParseReserveChar = REPLACE(A.CanonicalName, '\/', '') -- Remove the reserved character '/' that has been escaped '\/'.
FROM dbo.AdObjects A
)
-- Remove CN from end of string in order to get the distinct list (i.e., remove all of the leaves from the tree, leaving only the root and the branches/nodes).
-- INSERT the records INTO #TEMP_TreeHierarchy
INSERT INTO #TEMP_TreeHierarchy (TreeCanonicalName)
SELECT DISTINCT
CanonicalNameTree = REVERSE(SUBSTRING(REVERSE(C1.CanonicalNameParseReserveChar), CHARINDEX('/', REVERSE(C1.CanonicalNameParseReserveChar), 0) + 1, LEN(C1.CanonicalNameParseReserveChar) - CHARINDEX('/', REVERSE(C1.CanonicalNameParseReserveChar), 0)))
FROM CTE1 C1
-- Step 2.) Get NodeLevel and NodeName (i.e., key/value pair).
-- Get the nodes for each entry by splitting out the '/' delimiter, which provides both the NodeLevel and NodeName.
-- This table will be used as scratch to build the HierarchyNodeByLvl,
-- which is where the heavy lifting of converting the canonicalName value from string to integer occurs.
-- Note: integer is required for the node name - string values are not allowed. Thus this bridge must be build dynamically.
-- Achieve dynamic result by using CROSS APPLY to convert a single delimited row into 1 + n rows, based on the number of nodes.
-- INSERT the key/value pair results INTO a temp table.
-- Use ROW_NUMBER() to identify each NodeLevel, which is the key.
-- Use the string contained between the delimiter, which is the value.
-- Combined, these create a unique identifier that will be used to roll-up the HierarchyNodeByLevel, which is a RECURSIVE key/value pair of NodeLevel and IndexValueByLevel.
-- The rolled-up value contained in HierarchyNodeByLevel is what the SQL Server hierarchyid::Parse() function requires in order to create the hierarchyid.
-- https://blog.sqlauthority.com/2015/04/21/sql-server-split-comma-separated-list-without-using-a-function/
INSERT INTO #TEMP_AdTreeHierarchyNodeNames (TreeCanonicalName, TreeHierarchyNodeLevel, TreeHierarchyNodeName)
SELECT TreeCanonicalName
,TreeHierarchyNodeLevel = ROW_NUMBER() OVER(PARTITION BY TreeCanonicalName ORDER BY TreeCanonicalName)
,TreeHierarchyNodeName = LTRIM(RTRIM(m.n.value('.[1]','VARCHAR(MAX)')))
FROM (SELECT TH.TreeCanonicalName
,x = CAST('<XMLRoot><RowData>' + REPLACE(TH.TreeCanonicalName,'/','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML)
FROM #TEMP_TreeHierarchy TH
) SUB1
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)
-- Step 3.) Get the IndexValueByLevel RECURSIVE key/value pair
-- Get the DISTINCT list of TreeHierarchyNodeLevel, TreeHierarchyNodeName first
-- Use TreeHierarchyNodeLevel is the key
-- Use ROW_NUMBER() to identify each IndexValueByLevel, which is the value.
-- Since the IndexValueByLevel exists for each level, the value for each level must be concatenated together to create the final value that is stored in TreeHierarchyNode
;WITH CTE1 AS (SELECT DISTINCT TreeHierarchyNodeLevel, TreeHierarchyNodeName
FROM #TEMP_AdTreeHierarchyNodeNames
),
CTE2 AS (SELECT C1.*
,IndexValueByLevel = ROW_NUMBER() OVER(PARTITION BY C1.TreeHierarchyNodeLevel ORDER BY C1.TreeHierarchyNodeName)
FROM CTE1 C1
)
UPDATE TMP1
SET TMP1.IndexValueByLevel = C2.IndexValueByLevel
FROM #TEMP_AdTreeHierarchyNodeNames TMP1
INNER JOIN CTE2 C2
ON TMP1.TreeHierarchyNodeLevel = C2.TreeHierarchyNodeLevel
AND TMP1.TreeHierarchyNodeName = C2.TreeHierarchyNodeName
-- Step 4.) Build the TreeHierarchyNodeByLevel.
-- Use FOR XML to roll up all duplicate keys in order to concatenate their values into one string.
-- https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
;WITH CTE1 AS (SELECT DISTINCT TreeCanonicalName
,TreeHierarchyNodeByLevel =
(SELECT '/' + CAST(IndexValueByLevel AS VARCHAR(10))
FROM #TEMP_AdTreeHierarchyNodeNames TMP1
WHERE TMP1.TreeCanonicalName = TMP2.TreeCanonicalName
FOR XML PATH(''))
FROM #TEMP_AdTreeHierarchyNodeNames TMP2
),
CTE2 AS (SELECT C1.TreeCanonicalName
,C1.TreeHierarchyNodeByLevel
,TreeHierarchyNodeLevel = MAX(TMP1.TreeHierarchyNodeLevel)
FROM CTE1 C1
INNER JOIN #TEMP_AdTreeHierarchyNodeNames TMP1
ON TMP1.TreeCanonicalName = C1.TreeCanonicalName
GROUP BY C1.TreeCanonicalName, C1.TreeHierarchyNodeByLevel
)
UPDATE TH
SET TH.TreeHierarchyNodeLevel = C2.TreeHierarchyNodeLevel
,TH.TreeHierarchyNode = C2.TreeHierarchyNodeByLevel + '/'
,TH.TreeHierarchyId = hierarchyid::Parse(C2.TreeHierarchyNodeByLevel + '/')
FROM #TEMP_TreeHierarchy TH
INNER JOIN CTE2 C2
ON TH.TreeCanonicalName = C2.TreeCanonicalName
INSERT INTO AD.TreeHierarchy (EffectiveStartDate, EffectiveEndDate, TreeCanonicalName, TreeHierarchyNodeLevel, TreeHierarchyNode, TreeHierarchyId)
SELECT EffectiveStartDate = CAST(GETDATE() AS DATE)
,EffectiveEndDate = '12/31/9999'
,TH.TreeCanonicalName
,TH.TreeHierarchyNodeLevel
,TH.TreeHierarchyNode
,TH.TreeHierarchyId
FROM #TEMP_TreeHierarchy TH
ORDER BY TH.TreeHierarchyKey
---- For testing purposes only.
SELECT * FROM AD.TreeHierarchy TH
SELECT * FROM #TEMP_AdTreeHierarchyNodeNames
SELECT * FROM #TEMP_TreeHierarchy
-- Clean-up. DROP TEMP TABLE(s).
DROP TABLE #TEMP_TreeHierarchy
DROP TABLE #TEMP_AdTreeHierarchyNodeNames
This is where my thinking takes me
I gave you 9 levels, but the pattern is easy to see and expand
Without a proper sequence I defaulted to alphabetical by node.
It also supports multiple root nodes as well
Example
Select A.*
,Nodes = concat('/',dense_rank() over (Order By N1),'/'
,left(nullif(dense_rank() over (Partition By N1 Order By N2)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2 Order By N3)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3 Order By N4)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3,N4 Order By N5)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3,N4,N5 Order By N6)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3,N4,N5,N6 Order By N7)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3,N4,N5,N6,N7 Order By N8)-1,0),5)+'/'
,left(nullif(dense_rank() over (Partition By N1,N2,N3,N4,N5,N6,N7,N8 Order By N9)-1,0),5)+'/'
)
From YourTable A
Cross Apply (
Select N1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,N2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,N3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,N4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,N5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,N6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,N7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,N8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,N9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(stuff([canonicalName],1,1,''),'\','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
) B
Order By 1
Returns
canonicalName Nodes
\domain.com\ /1/
\domain.com\Corporate /1/1/
\domain.com\Corporate\Accounting /1/1/1/
\domain.com\Corporate\Hr /1/1/2/
\domain.com\Security\ /1/2/
\domain.com\Security\Administrative /1/2/1/
\domain.com\Security\Executive /1/2/2/
\domain.com\Security\Servers /1/2/3/
\domain.com\Users\ /1/3/
\domain.com\Users\Sales /1/3/1/
\domain.com\Users\Whatever /1/3/2/

Is it possible to concatenate column values into a string using CTE?

Say I have the following table:
id|myId|Name
-------------
1 | 3 |Bob
2 | 3 |Chet
3 | 3 |Dave
4 | 4 |Jim
5 | 4 |Jose
-------------
Is it possible to use a recursive CTE to generate the following output:
3 | Bob, Chet, Date
4 | Jim, Jose
I've played around with it a bit but haven't been able to get it working. Would I do better using a different technique?
I do not recommend this, but I managed to work it out.
Table:
CREATE TABLE [dbo].[names](
[id] [int] NULL,
[myId] [int] NULL,
[name] [char](25) NULL
) ON [PRIMARY]
Data:
INSERT INTO names values (1,3,'Bob')
INSERT INTO names values 2,3,'Chet')
INSERT INTO names values 3,3,'Dave')
INSERT INTO names values 4,4,'Jim')
INSERT INTO names values 5,4,'Jose')
INSERT INTO names values 6,5,'Nick')
Query:
WITH CTE (id, myId, Name, NameCount)
AS (SELECT id,
myId,
Cast(Name AS VARCHAR(225)) Name,
1 NameCount
FROM (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e
WHERE id = 1
UNION ALL
SELECT e1.id,
e1.myId,
Cast(Rtrim(CTE.Name) + ',' + e1.Name AS VARCHAR(225)) AS Name,
CTE.NameCount + 1 NameCount
FROM CTE
INNER JOIN (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e1
ON e1.id = CTE.id + 1
AND e1.myId = CTE.myId)
SELECT myID,
Name
FROM (SELECT myID,
Name,
(Row_number() OVER (PARTITION BY myId ORDER BY namecount DESC)) AS id
FROM CTE) AS p
WHERE id = 1
As requested, here is the XML method:
SELECT myId,
STUFF((SELECT ',' + rtrim(convert(char(50),Name))
FROM namestable b
WHERE a.myId = b.myId
FOR XML PATH('')),1,1,'') Names
FROM namestable a
GROUP BY myId
A CTE is just a glorified derived table with some extra features (like recursion). The question is, can you use recursion to do this? Probably, but it's using a screwdriver to pound in a nail. The nice part about doing the XML path (seen in the first answer) is it will combine grouping the MyId column with string concatenation.
How would you concatenate a list of strings using a CTE? I don't think that's its purpose.
A CTE is just a temporarily-created relation (tables and views are both relations) which only exists for the "life" of the current query.
I've played with the CTE names and the field names. I really don't like reusing fields names like id in multiple places; I tend to think those get confusing. And since the only use for names.id is as a ORDER BY in the first ROW_NUMBER() statement, I don't reuse it going forward.
WITH namesNumbered as (
select myId, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY id
) as nameNum
FROM names
)
, namesJoined(myId, Name, nameCount) as (
SELECT myId,
Cast(Name AS VARCHAR(225)),
1
FROM namesNumbered nn1
WHERE nameNum = 1
UNION ALL
SELECT nn2.myId,
Cast(
Rtrim(nc.Name) + ',' + nn2.Name
AS VARCHAR(225)
),
nn.nameNum
FROM namesJoined nj
INNER JOIN namesNumbered nn2 ON nn2.myId = nj.myId
and nn2.nameNum = nj.nameCount + 1
)
SELECT myId, Name
FROM (
SELECT myID, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY nameCount DESC
) AS finalSort
FROM namesJoined
) AS tmp
WHERE finalSort = 1
The first CTE, namesNumbered, returns two fields we care about and a sorting value; we can't just use names.id for this because we need, for each myId value, to have values of 1, 2, .... names.id will have 1, 2 ... for myId = 1 but it will have a higher starting value for subsequent myId values.
The second CTE, namesJoined, has to have the field names specified in the CTE signature because it will be recursive. The base case (part before UNION ALL) gives us records where nameNum = 1. We have to CAST() the Name field because it will grow with subsequent passes; we need to ensure that we CAST() it large enough to handle any of the outputs; we can always TRIM() it later, if needed. We don't have to specify aliases for the fields because the CTE signature provides those. The recursive case (after the UNION ALL) joins the current CTE with the prior one, ensuring that subsequent passes use ever-higher nameNum values. We need to TRIM() the prior iterations of Name, then add the comma and the new Name. The result will be, implicitly, CAST()ed to a larger field.
The final query grabs only the fields we care about (myId, Name) and, within the subquery, pointedly re-sorts the records so that the highest namesJoined.nameCount value will get a 1 as the finalSort value. Then, we tell the WHERE clause to only give us this one record (for each myId value).
Yes, I aliased the subquery as tmp, which is about as generic as you can get. Most SQL engines require that you give a subquery an alias, even if it's the only relation visible at that point.

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
this will work for the initial recursive link, but might not work for longer links
DECLARE #Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO #Table (ID,PARENTID) SELECT 1, 2
INSERT INTO #Table (ID,PARENTID) SELECT 2, 1
INSERT INTO #Table (ID,PARENTID) SELECT 3, 1
INSERT INTO #Table (ID,PARENTID) SELECT 4, 3
INSERT INTO #Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM #Table
DECLARE #ID INT
SELECT #ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM #Table
WHERE PARENTID = #ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM #Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> #UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when ##rowcount=0!
Simple eh?
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
DECLARE #Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE #UserID INT
SELECT #UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
where [User_ID] = #UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(#User_ID INT)
RETURNS #SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
INSERT INTO #SubordinateUsers (User_ID, Distance) VALUES ( #User_ID, 0)
DECLARE #Distance INT, #Finished BIT
SELECT #Distance = 1, #Finished = 0
WHILE #Finished = 0
BEGIN
INSERT INTO #SubordinateUsers
SELECT S.User_ID, #Distance
FROM Users AS S
JOIN #SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN #SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF ##RowCount = 0
SET #Finished = 1
SET #Distance = #Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(#User_ID INT)
RETURNS #User TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
DECLARE #Manager_ID INT
SELECT #Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = #User_ID
INSERT INTO #UserClasses (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(#Manager_ID)
INSERT INTO #User (User_ID, Distance) VALUES (#User_ID, 0)
RETURN
END
You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.