Determine the hierarchy of records in a SQL database - sql

I've got a problem I was wondering if there's an elegant solution to. It is a real business problem and not a class assignment!
I have a table with thousands of records, some of which are groups related to each other.
The database is SQL 2005.
ID is the primary key. If the record replaced an earlier record, the ID of that record is in the REP_ID column.
ID REP_ID
E D
D B
C B
B A
A NULL
So in this example, A was the original row, B replaced A, C replaced B unsuccessfully, D replaced B successfully and finally E replaced D.
I'd like to be able to display all the records in this table in a grid.
Then, I'd like for the user to be able to right click any record in any
group, and for the system to locate all the related records and display them
in a some sort of tree.
Now I can obviously brute force a solution to this but I'd like to ask the
community if they can see a more elegant answer.

It's a recursive CTE you need, something like (untested)
;WITH myCTE AS
(
SELECT
ID
FROM
myTable
WHERE
REP_ID IS NULL
UNION ALL
SELECT
ID
FROM
myTable T
JOIN
myCTE C ON T.REP_ID = C.ID
)
SELECT
*
FROM
myCTE
However, the links C->B and D->B
So you want the C->B or both?
Do you want a ranking?
etc?

Use a CTE to build your hierarchy. Something like
CREATE TABLE #test(ID CHAR(1), REP_ID CHAR(1) NULL)
INSERT INTO #test VALUES('E','D')
INSERT INTO #test VALUES('D','B')
INSERT INTO #test VALUES('C','B')
INSERT INTO #test VALUES('B','A')
INSERT INTO #test VALUES('A',NULL)
WITH tree( ID,
REP_ID,
Depth
)
AS
(
SELECT
ID,
REP_ID,
1 AS [Depth]
FROM
#test
WHERE
REP_ID IS NULL
UNION ALL
SELECT
[test].ID,
[test].REP_ID,
tree.[Depth] + 1 AS [Depth]
FROM
#test [test]
INNER JOIN
tree
ON
[test].REP_ID = tree.ID
)
SELECT * FROM tree

You probably already considered it but have you looked into simply adding a row to store the "original_id"? That'd make your queries lightning fast compared to building a tree of who inherited from whom.
Barring that, just google for "SQL tree DFS".
Just make sure you have an optimization for your DFS as follows: if you know most records only have <=3 revisions, you can start with a 3-way joint to find A, B and C right away.

Related

Is there a better way to get the last level of groups from a table?

I have a Group table and within that table, there is a ParentId column that denotes a groups parent within the Group table. The purpose is to build a dynamic menu from these groups. I know I can loop and grab the last child and construct a result set, but I'm curious if there's a more SQL-y way of accomplishing this.
The table has Id, ParentId and Title fields of int, int and varchar.
Basically, a hierarchy may be constructed this way (People is the base group):
People -> Male -> Boy
-> Man
-> Female
I want to grab the last child(ren) of each branch. So, {Boy, Man, Female} in this case.
As I mentioned, getting that info isn't a problem. I'm just looking for a better way of getting it without having to write a bunch of unions and loops where I can basically change the base group and traverse the entire hierarchy outward, dynamically. I'm not really a Db guy, so I don't know if there's a slick way of doing this or not.
To get the leaf levels for one of many hierarchies, you can use a Recursive Common Table Expressions (CTEs) to enumerate the hierarchy, and then check which members are not the parent of another group to filter to the leaves:
Declare #RootID int = 1
;with cte as (
select
Id,
ParentId,
Title
From
Groups
Where
Id = #RootID
Union All
Select
g.Id,
g.ParentId,
g.Title
From
cte c
Inner Join
Groups g
On c.Id = g.ParentID
)
Select
*
From
cte g
Where
Not Exists (
Select
'x'
From
Groups g2
Where
g2.ParentID = g.Id
);
You can also do this with a left join rather than a not exists
http://sqlfiddle.com/#!6/8f1aa/9
Since you are using SQL Server 2012 you could take advantage of hierarchyid; here is an example following Laurence's schema:
CREATE TABLE Groups
(
Id INT NOT NULL
PRIMARY KEY
, Title VARCHAR(20)
, HID HIERARCHYID
)
INSERT INTO Groups
VALUES ( 1, 'People', '/' ),
( 2, 'Male', '/1/' ),
( 3, 'Female', '/2/' ),
( 4, 'Boy', '/1/1/' ),
( 5, 'Man', '/1/2/' );
SELECT Id
, Title
FROM Groups
WHERE HID NOT IN ( SELECT HID.GetAncestor(1)
FROM Groups
WHERE HID.GetAncestor(1) IS NOT NULL )
http://sqlfiddle.com/#!6/00330/1/0
Results:
ID TITLE
3 Female
4 Boy
5 Man

Database design - Multiple 1 to many relationships

What would be the best way to model 1 table with multiple 1 to many relatiionships.
With the above schema if Report contains 1 row, Grant 2 rows and Donation 12. When I join the three together I end up with a Cartesian product and result set of 24. Report joins to Grant and creates 2 rows, then Donation joins on that to make 24 rows.
Is there a better way to model this to avoid the caresian product?
example code
DECLARE #Report
TABLE (
ReportID INT,
Name VARCHAR(50)
)
INSERT
INTO #Report
(
ReportID,
Name
)
SELECT 1,'Report1'
DECLARE #Grant
TABLE (
GrantID INT IDENTITY(1,1) PRIMARY KEY(GrantID),
GrantMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO #Grant
(
GrantMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',999,1
DECLARE #Donation
TABLE (
DonationID INT IDENTITY(1,1) PRIMARY KEY(DonationID),
DonationMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO #Donation
(
DonationMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',3434,1
UNION ALL
SELECT 'Grantmaker3',45645,1
UNION ALL
SELECT 'Grantmaker4',3,1
UNION ALL
SELECT 'Grantmaker5',34,1
UNION ALL
SELECT 'Grantmaker6',23,1
UNION ALL
SELECT 'Grantmaker7',67,1
UNION ALL
SELECT 'Grantmaker8',78,1
UNION ALL
SELECT 'Grantmaker9',98,1
UNION ALL
SELECT 'Grantmaker10',43,1
UNION ALL
SELECT 'Grantmaker11',107,1
UNION ALL
SELECT 'Grantmaker12',111,1
SELECT *
FROM #Report r
INNER JOIN
#Grant g
ON r.ReportID = g.ReportID
INNER JOIN
#Donation d
ON r.ReportID = d.ReportID
Update 1 2011-03-07 15:20
Cheers for the feedback so far, to add to this scenario there are also 15 other 1 to many relationships coming from the one report table. These tables can't for various business reasons be grouped together.
Is there any relationship at all between Grants and Donations? If there isn't, does it make sense to pull back a query that shows a pseudo relationship between them?
I'd do one query for grants:
SELECT r.*, g.*
FROM #Report r
JOIN #Grant g ON r.ReportID = g.ReportID
And another for donations:
SELECT r.*, d.*
FROM #Report r
JOIN #Donation d ON r.ReportID = d.ReportID
Then let your application show the appropriate data.
However, if Grants and Donations are similar, then just make a more generic table such as Contributions.
Contributions
-------------
ContributionID (PK)
Maker
Amount
Type
ReportID (FK)
Now your query is:
SELECT r.*, c.*
FROM #Report r
JOIN #Contribution c ON r.ReportID = c.ReportID
WHERE c.Type = 'Grant' -- or Donation, depending on the application
If you're going to join on ReportID, then no, you can't avoid a lot of rows. When you omit the table "Report", and just join "Donation" to "Grant" on ReportId, you still get 24 rows.
SELECT *
FROM Grant g
INNER JOIN
Donation d
ON g.ReportID = d.ReportID
But the essential point is that it doesn't make sense in the real world to match up donations and grants. They're completely independent things that essentially have nothing to do with each other.
In the database, the statement immediately above will join each row in Grants to every matching row in Donation. The resulting 24 rows really shouldn't surprise you.
When you need to present independent things to the user, you should use a report writer or web application (for example) that selects the independent things, well, independently. Select donations and put them into one section of a report or web page, then select grants and put them into another section of the report or web page, and so on.
If the table "Report" is supposed to help you record which sections go into a particular report, then you need a structure more like this:
create table reports (
reportid integer primary key,
report_name varchar(35) not null unique
);
create table report_sections (
reportid integer not null references reports (reportid),
section_name varchar(35), -- Might want to reference a table of section names
section_order integer not null,
primary key (reportid, section_name)
);
The donation and grant tables look almost identical. You could make them one table and add a column that is something like DonationType. Would reduce complexity by 1 table. Now if donations and grants are completely different and have different subtables associated with them then keeping them seperate and only joining on one at a time would be ideal.

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principal -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
1000 lines each, hence finding somewhere else to host.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what may have been expected the CTE will likely not be materialised which would have ensured consistency for the self join and thus you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you read dirty entries. (Someone else deletes/insertes new data)
try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
i've tried this code (seems equal to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70

How to select only one full row per group in a "group by" query?

In SQL Server, I have a table where a column A stores some data. This data can contain duplicates (ie. two or more rows will have the same value for the column A).
I can easily find the duplicates by doing:
select A, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)
Now, I want to retrieve the values of other columns, let's say B and C. Of course, those B and C values can be different even for the rows sharing the same A value, but it doesn't matter for me. I just want any B value and any C one, the first, the last or the random one.
If I had a small table and one or two columns to retrieve, I would do something like:
select A, count(A) as CountDuplicates, (
select top 1 child.B from TableName as child where child.A = base.A) as B
)
from TableName as base group by A having (count(A) > 1)
The problem is that I have much more rows to get, and the table is quite big, so having several children selects will have a high performance cost.
So, is there a less ugly pure SQL solution to do this?
Not sure if my question is clear enough, so I give an example based on AdventureWorks database. Let's say I want to list available States, and for each State, get its code, a city (any city) and an address (any address). The easiest, and the most inefficient way to do it would be:
var q = from c in data.StateProvinces select new { c.StateProvinceCode, c.Addresses.First().City, c.Addresses.First().AddressLine1 };
in LINQ-to-SQL and will do two selects for each of 181 States, so 363 selects. I my case, I am searching for a way to have a maximum of 182 selects.
The ROW_NUMBER function in a CTE is the way to do this. For example:
DECLARE #mytab TABLE (A INT, B INT, C INT)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS
(
SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
FROM #mytab AS m
)
SELECT *
FROM numbered
WHERE rn=1
As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with the PersonID being the "A" column, and distinct contact records (say, Home and Word), you might wind up returning the person's home city, but their office ZIP code -- and that's clearly asking for trouble.
The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.
In my sample I've define a CTE that tacks on an extra column rn (standing for "row number") to the table, that itself groups by the A column. A SELECT on that result, filtering to only those having a row number of 1 (i.e., the first record found for that value of A), returns a "solid" record for each A group -- in my example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.
It concerns me that you want any old value for fields b and c. If they are to be meaningless why are you returning them?
If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for b and c don't even have to be from the same record, group by with the use of mon or max is the way to go. It's more complicated if you want the values for a particular record for all fields.
select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base
group by A
having (count(A) > 1)
you can do some thing like this if you have id as primary key in your table
select id,b,c from tablename
inner join
(
select id, count(A) as CountDuplicates
from TableName as base group by A,id having (count(A) > 1)
)d on tablename.id= d.id

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
this will work for the initial recursive link, but might not work for longer links
DECLARE #Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO #Table (ID,PARENTID) SELECT 1, 2
INSERT INTO #Table (ID,PARENTID) SELECT 2, 1
INSERT INTO #Table (ID,PARENTID) SELECT 3, 1
INSERT INTO #Table (ID,PARENTID) SELECT 4, 3
INSERT INTO #Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM #Table
DECLARE #ID INT
SELECT #ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM #Table
WHERE PARENTID = #ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM #Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> #UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when ##rowcount=0!
Simple eh?
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
DECLARE #Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE #UserID INT
SELECT #UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
where [User_ID] = #UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(#User_ID INT)
RETURNS #SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
INSERT INTO #SubordinateUsers (User_ID, Distance) VALUES ( #User_ID, 0)
DECLARE #Distance INT, #Finished BIT
SELECT #Distance = 1, #Finished = 0
WHILE #Finished = 0
BEGIN
INSERT INTO #SubordinateUsers
SELECT S.User_ID, #Distance
FROM Users AS S
JOIN #SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN #SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF ##RowCount = 0
SET #Finished = 1
SET #Distance = #Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(#User_ID INT)
RETURNS #User TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
DECLARE #Manager_ID INT
SELECT #Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = #User_ID
INSERT INTO #UserClasses (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(#Manager_ID)
INSERT INTO #User (User_ID, Distance) VALUES (#User_ID, 0)
RETURN
END
You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.