Different result with * and explicit field list? - sql

I was exploring another question when I hit this behaviour in SQL Server 2005. This query would exhaust the maximum recursion:
with foo(parent_id,child_id) as (
select parent_id,child_id
from #bar where parent_id in (1,3)
union all
select #bar.* -- Line that changed
from #bar
join foo on #bar.parent_id = foo.child_id
)
select * from foo
But this would work fine:
with foo(parent_id,child_id) as (
select parent_id,child_id
from #bar where parent_id in (1,3)
union all
select #bar.parent_id, #bar.child_id -- Line that changed
from #bar
join foo on #bar.parent_id = foo.child_id
)
select * from foo
Is this a bug in SQL Server, or am I overlooking something?
Here's the table definition:
if object_id('tempdb..#bar') is not null
drop table #bar
create table #bar (
child_id int,
parent_id int
)
insert into #bar (parent_id,child_id) values (1,2)
insert into #bar (parent_id,child_id) values (1,5)
insert into #bar (parent_id,child_id) values (2,3)
insert into #bar (parent_id,child_id) values (2,6)
insert into #bar (parent_id,child_id) values (6,4)

Edit
I think I know what's going on, and it is a great example of why to avoid select * in the first place.
You defined your table with child_id first and then parent_id, but the CTE foo expects parent_id and then child_id.
So when you say select #bar.* you are really saying select child_id, parent_id, but you are putting that into (parent_id, child_id). The two columns are swapped in the recursive member, so the join back to foo keeps matching the same rows and the recursion never terminates.
So this is not a bug in SQL Server.
Moral of the story: avoid select * and save yourself headaches.
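The swap can be seen in isolation. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; the table is named bar rather than #bar). It shows that the CTE column list is matched by position, not by name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same column order as in the question: child_id first, parent_id second.
conn.execute("create table bar (child_id int, parent_id int)")
conn.execute("insert into bar (parent_id, child_id) values (1, 2)")

# select * returns (child_id, parent_id), which lands in foo(parent_id, child_id).
swapped = conn.execute(
    "with foo(parent_id, child_id) as (select * from bar) "
    "select parent_id, child_id from foo"
).fetchall()

# An explicit column list keeps the intended order.
explicit = conn.execute(
    "with foo(parent_id, child_id) as (select parent_id, child_id from bar) "
    "select parent_id, child_id from foo"
).fetchall()

print(swapped)   # [(2, 1)]
print(explicit)  # [(1, 2)]
```

With select *, the edge 1->2 comes back as 2->1, which is the mismatch that sends the recursive query in circles.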

I would say it was probably a deliberate choice by the programmers as a safety valve, because you have a union and all parts of a union must have the same number of fields. I never use select * in a union, so I can't say for sure. I tried on SQL Server 2008 and got this message:
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
That seems to support my theory.
Select * is a poor technique to use in any production code, so I don't see why it would be a problem to simply specify the fields.

Related

Trying to filter on a union of 3 queries

I have a union of 3 queries that summarizes like this...
Select param1 As 'example1' And .... Where...
Union All
Select param1 As 'example2' And .... Where...
Union All
Select param1 As 'example3' And .... Where...
Is there any way to wrap this in a Select and create an optional parameter that filters on example1/example2/example3?
Any help would be greatly appreciated. Thanks!
Sorry everyone...when I was brainstorming in my head, I had my query wrong and I'm not very good at sql anyway. But what I want to do was filter on the created column of AccountStatus as an optional parameter. Is there some way to capture all this and then add an optional parameter to filter on the created column?
Select 'Red Account' As AccountStatus And .... Where OverdueDays >= 30
Union All
Select 'Yellow Account' As AccountStatus And .... Where 10 < OverdueDays < 30
Union All
Select 'Green Account' As AccountStatus And .... Where OverdueDays <= 10
Is there any way to wrap this in a Select and create an optional parameter that filters on example1/example2/example3?
Assuming that I understood your need then you can use common table expression (CTE).
In the following document you can read more about the option of using CTE:
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15
In general, you wrap your code into a CTE (in the following example I will name my CTE MyCTE but you can use any name), and then in the outside query you can add the filter which you want.
Do not confuse a CTE with a variable or temporary table. A CTE is not a physical entity; it exists only logically for the sake of the query, and the server parses it as inline code.
For example:
;With MyCTE as ( <here comes the code you want to wrap> )
SELECT <choose columns to select>
FROM MyCTE
<here you can add order by or filtering with where>
In your case it might look like:
;With MyCTE as (
-- when you use UNION then the name of the columns in the result is configured by the names in the first query, so you do not need all the "as..." for the other queries.
-- not clear why you use "And" in the name of the columns you select. If you need more than one column then you should use comma "," between the columns
Select param1 As [example1], column2, column3... Where... <this filter only this specific query>
Union All
Select param1, column2, column3.... Where...
Union All
Select param1, column2, column3.... Where...
)
SELECT param1, column2, column3....
FROM MyCTE -- we select from the logical CTE as if it were a table
WHERE <here you can add condition to filter the result of the UNION>
Yes, it is possible. And here is where you should make the effort to provide consumable information that you and everyone else can use as a basis for a solution. Below is the code in this fiddle, which you can play with.
-- "parameters"
--declare @p1 varchar(20) = null;
declare @p1 varchar(20) = 'waggle';
-- some crap data
declare @x table (id int, trandate date);
insert @x (id, trandate) values
(1, '20190101'), (2, '20190425'),
(4, '20200311'), (5, '20200630'), (11, '20200801'),
(12, '20210101'), (13, '20210710');
with crap_union as (
select 'wiggle' as [when], id, trandate
from @x where year(trandate) = 2019
union all
select 'waggle' as [when], id, trandate
from @x where year(trandate) = 2020
union all
select 'waddle' as [when], id, trandate
from @x where year(trandate) = 2021
)
select * from crap_union
where [when] = @p1 or @p1 is null
order by [when], id, trandate
;
Same idea as generically expressed by Ronen. You add some sort of identifier to each query in the UNION. That allows you to "see" which row comes from which query. You stuff that UNION into a CTE or a derived table (same logical effect) and then select from that CTE (or derived table) with the desired filter.
You might consider reading Erland's discussion on dynamic search conditions to understand the performance considerations of this approach. Bookmark Erland's website since it has much useful information.

UNION two SELECT queries but result set is smaller than one of them

In a SQL Server statement there is
SELECT id, book, acnt, prod, category from Table1 <where clause...>
UNION
SELECT id, book, acnt, prod, category from Table2 <where clause...>
The first query returned 131,972 rows; the second one, 147,692 rows. I didn't notice any rows shared between these two tables, so I expected the result set after UNION to be the sum, 131,972 + 147,692 = 279,664.
However, the result set after UNION has 133,857 rows. Even if there are overlapping rows that I accidentally missed, the result should be at least as large as the larger of the two result sets. I can't figure out where the number 133,857 comes from.
Is my understanding about SQL UNION correct? I use SQL server in this case.
To expand comment given under the question, which I think states what you already know:
UNION also removes duplicates that occur within a single table, not only rows that appear in both.
Just take a look at an example:
SETUP:
create table tbl1 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(3,4);
create table tbl2 (col1 int, col2 int);
insert into tbl2 values
(1,2),
(1,2),
(1,2),
(3,4);
Query
select * from tbl1
union
select * from tbl2;
will produce output
col1 | col2
-----|------
1 | 2
3 | 4
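The same effect can be checked by counting rows. This is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption for portability), with the four-row insert loaded into tbl2. It compares UNION against UNION ALL and against tbl2 on its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tbl1 (col1 int, col2 int);
    insert into tbl1 values (1,2), (3,4);
    create table tbl2 (col1 int, col2 int);
    insert into tbl2 values (1,2), (1,2), (1,2), (3,4);
""")

def row_count(query):
    # Count the rows a query returns by wrapping it in a subquery.
    return conn.execute(f"select count(*) from ({query})").fetchone()[0]

tbl2_rows = row_count("select * from tbl2")
union_rows = row_count("select * from tbl1 union select * from tbl2")
union_all_rows = row_count("select * from tbl1 union all select * from tbl2")

print(tbl2_rows, union_rows, union_all_rows)  # 4 2 6
```

UNION returns fewer rows than tbl2 alone because the repeated (1,2) rows inside tbl2 collapse into one. If the full sum of both inputs is what you expect, UNION ALL is the operator that keeps every row.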

Select values that don't occur in a table

I'm sure this has been asked somewhere, but I found it difficult to search for.
If I want to get all records where a column value equals one in a list, I'd use the IN operator.
SELECT idSparePart, SparePartName
FROM tabSparePart
WHERE SparePartName IN (
'1234-2043','1237-8026','1238-1036','1238-1039','1223-5172'
)
Suppose this SELECT returns 4 rows although the list has 5 items. How can I select the value that does not occur in the table?
Thanks in advance.
select t.* from (
select '1234-2043' as sparePartName
union select '1237-8026'
union select '1238-1036'
union select '1238-1039'
union select '1223-5172'
) t
where not exists (
select 1 from tabSparePart p WHERE p.SparePartName = t.sparePartName
)
As soon as you mentioned that I have to create a temp table, I remembered my Split-function.
Sorry for answering my own question, but this might be the best/simplest way for me:
SELECT PartNames.Item
FROM dbo.Split('1234-2043,1237-8026,1238-1036,1238-1039,1223-5172', ',') AS PartNames
LEFT JOIN tabSparePart ON tabSparePart.SparePartName = PartNames.Item
WHERE idSparePart IS NULL
My Split-function:
Help with a sql search query using a comma delimitted parameter
Thank you all anyway.
Update: I misunderstood the question. I guess in that case I would select the values into a temp table, then select the values which are not in that table. Not ideal, I know -- the problem is that you need to get your list of part names to SQL Server somehow (either via IN or putting them in a temp table) but the semantics of IN don't do what you want.
Something like this:
CREATE TABLE tabSparePart
(
SparePartName nvarchar(50)
)
insert into tabSparePart values('1234-2043')
CREATE TABLE #tempSparePartName
(
SparePartName nvarchar(50)
)
insert into #tempSparePartName values('1234-2043')
insert into #tempSparePartName values('1238-1036')
insert into #tempSparePartName values('1237-8026')
select * from #tempSparePartName
where SparePartName not in (select SparePartName from tabSparePart)
With output:
SparePartName
1238-1036
1237-8026
Original (wrong) answer:
You can just use "not in":
SELECT * from tabSparePart WHERE SparePartName NOT in(
'1234-2043','1237-8026','1238-1036','1238-1039','1223-5172'
)
You could try something like this....
declare @test as table
(
items varchar(50)
)
insert into @test
values('1234-2043')
insert into @test
values('1234-2043')
insert into @test
values('1237-8026')
-- the rest of the values --
select * from @test
where items not in (
select theItemId from SparePartName
)
for fun check this out...
http://blogs.microsoft.co.il/blogs/itai/archive/2009/02/01/t-sql-split-function.aspx
It shows you how to take delimited data and return it from a table valued function as separate "rows"... which may make the process of creating the table to select from easier than inserting into a #table or doing a giant select union subquery.

Prevent recursive CTE visiting nodes multiple times

Consider the following simple DAG:
1->2->3->4
And a table, #bar, describing this (I'm using SQL Server 2005):
parent_id child_id
1 2
2 3
3 4
//... other edges, not connected to the subgraph above
Now imagine that I have some other arbitrary criteria that select the first and last edges, i.e. 1->2 and 3->4. I want to use these to find the rest of my graph.
I can write a recursive CTE as follows (I'm using terminology from MSDN):
with foo(parent_id,child_id) as (
-- anchor member that happens to select first and last edges:
select parent_id,child_id from #bar where parent_id in (1,3)
union all
-- recursive member:
select #bar.* from #bar
join foo on #bar.parent_id = foo.child_id
)
select parent_id,child_id from foo
However, this results in edge 3->4 being selected twice:
parent_id child_id
1 2
3 4
2 3
3 4 // 2nd appearance!
How can I prevent the query from recursing into subgraphs that have already been described? I could achieve this if, in the "recursive member" part of the query, I could reference all data retrieved by the recursive CTE so far (and supply a predicate in the recursive member that excludes nodes already visited). However, I think I can access only the data returned by the last iteration of the recursive member.
This will not scale well when there is a lot of such repetition. Is there a way of preventing this unnecessary additional recursion?
Note that I could use "select distinct" in the last line of my statement to achieve the desired results, but this seems to be applied after all the (repeated) recursion is done, so I don't think this is an ideal solution.
Edit - hainstech suggests stopping recursion by adding a predicate to exclude recursing down paths that were explicitly in the starting set, i.e. recurse only where foo.child_id not in (1,3). That works for the case above only because it is simple - all the repeated sections begin within the anchor set of nodes. It doesn't solve the general case where they may not. E.g., consider adding edges 1->4 and 4->5 to the above set. Edge 4->5 will be captured twice, even with the suggested predicate. :(
The CTE's are recursive.
When your CTE's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.
In your example, the recursion stacks will go as follows:
(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition
(3) - second condition
(3, 4)
(3) - no more children, returning
As you can see, these recursion stacks do not intersect.
You could probably record the visited values in a temporary table, JOIN each value with the temp table and not follow the value if it is found there, but SQL Server does not support these things.
So you just use SELECT DISTINCT.
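To see the duplicate and the SELECT DISTINCT workaround side by side, here is a small sketch that runs the question's CTE through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption; it needs the keyword recursive and a regular table named bar), but the recursive member has the same shape as in the question:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table bar (parent_id int, child_id int);
    insert into bar values (1,2), (2,3), (3,4);
""")

cte = """
    with recursive foo(parent_id, child_id) as (
        -- anchor member: first and last edges
        select parent_id, child_id from bar where parent_id in (1,3)
        union all
        -- recursive member
        select bar.parent_id, bar.child_id
        from bar join foo on bar.parent_id = foo.child_id
    )
"""
all_rows = conn.execute(cte + "select parent_id, child_id from foo").fetchall()
distinct_rows = conn.execute(
    cte + "select distinct parent_id, child_id from foo"
).fetchall()

print(Counter(all_rows)[(3, 4)])  # 2
print(sorted(distinct_rows))      # [(1, 2), (2, 3), (3, 4)]
```

The edge 3->4 is produced once by the anchor and once by walking down from 1->2. DISTINCT removes the repeat only after the recursion has already done the extra work, which is the scaling concern raised in the question.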
This is the approach I used. It has been tested against several methods and was the most performant. It combines the temp table idea suggested by Quassnoi and the use of both distinct and a left join to eliminate redundant paths to the recursion. The level of the recursion is also included.
I left the failed CTE approach in the code so you could compare results.
If someone has a better idea, I'd love to know it.
create table #bar (unique_id int identity(10,10), parent_id int, child_id int)
insert #bar (parent_id, child_id)
SELECT 1,2 UNION ALL
SELECT 2,3 UNION ALL
SELECT 3,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 2,5 UNION ALL
SELECT 5,6
SET NOCOUNT ON
;with foo(unique_id, parent_id,child_id, ord, lvl) as (
-- anchor member that happens to select first and last edges:
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
union all
-- recursive member:
select b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), foo.lvl+1
from #bar b
join foo on b.parent_id = foo.child_id
)
select unique_id, parent_id,child_id, ord, lvl from foo
/***********************************
Manual Recursion
***********************************/
Declare @lvl as int
Declare @rows as int
DECLARE @foo as Table(
unique_id int,
parent_id int,
child_id int,
ord int,
lvl int)
--Get anchor condition
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
set @rows=@@ROWCOUNT
set @lvl=0
--Do recursion
WHILE @rows > 0
BEGIN
set @lvl = @lvl + 1
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
SELECT DISTINCT b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), @lvl
FROM #bar b
inner join @foo f on b.parent_id = f.child_id
--might be multiple paths to this recursion so eliminate duplicates
left join @foo dup on dup.unique_id = b.unique_id
WHERE f.lvl = @lvl-1 and dup.child_id is null
set @rows=@@ROWCOUNT
END
SELECT * from @foo
DROP TABLE #bar
Do you happen to know which of the two edges is on a deeper level in the tree? Because in that case, you could make edge 3->4 the anchor member and start walking up the tree until you find edge 1->2.
Something like this:
with foo(parent_id, child_id)
as
(
select parent_id, child_id
from #bar
where parent_id = 3
union all
select b.parent_id, b.child_id
from #bar b
inner join foo f on b.child_id = f.parent_id
where b.parent_id <> 1
)
select *
from foo
(I'm no expert on graphs, just exploring a bit)
The DISTINCT will guarantee each row is distinct, but it won't eliminate graph routes that don't end up in your last edge. Take this graph:
insert into #bar (parent_id,child_id) values (1,2)
insert into #bar (parent_id,child_id) values (1,5)
insert into #bar (parent_id,child_id) values (2,3)
insert into #bar (parent_id,child_id) values (2,6)
insert into #bar (parent_id,child_id) values (6,4)
The results of the query here include (1,5), which is not part of the route from the first edge (1,2) to the last edge (6,4).
You could try something like this, to find only routes that start with (1,2) and end with (6,4):
with foo(parent_id, child_id, route) as (
select parent_id, child_id,
cast(cast(parent_id as varchar) +
cast(child_id as varchar) as varchar(128))
from #bar
union all
select #bar.parent_id, #bar.child_id,
cast(route + cast(#bar.child_id as varchar) as varchar(128))
from #bar
join foo on #bar.parent_id = foo.child_id
)
select * from foo where route like '12%64'
Is this what you want to do?
create table #bar (parent_id int, child_id int)
insert #bar values (1,2)
insert #bar values (2,3)
insert #bar values (3,4)
declare @start_node table (parent_id int)
insert @start_node values (1)
insert @start_node values (3)
;with foo(parent_id,child_id) as (
select
parent_id
,child_id
from #bar where parent_id in (select parent_id from @start_node)
union all
select
#bar.*
from #bar
join foo on #bar.parent_id = foo.child_id
where foo.child_id not in (select parent_id from @start_node)
)
select parent_id,child_id from foo
Edit - @bacar - I don't think this is the temp table solution Quassnoi was proposing. I believe they were suggesting that you basically duplicate the entire recursive member contents during each recursion and use that as a join to prevent reprocessing (and that this is not supported in ss2k5). My approach is supported, and the only change to your original is the predicate in the recursive member, which excludes recursing down paths that were explicitly in your starting set. I only added the table variable so that you would define the starting parent_ids in one location; you could just as easily have used this predicate with your original query:
where foo.child_id not in (1,3)
EDIT -- This doesn't work at all. This is a method to stop chasing triangle routes. It doesn't do what the OP wanted.
Or you can use a recursive token separated string.
I'm at home on my laptop ( no sql server ) so this might not be completely right but here goes.....
; WITH NodeNetwork AS (
-- Anchor Definition
SELECT
b.[parent_Id] AS [Parent_ID]
, b.[child_Id] AS [Child_ID]
, CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS [NodePath]
FROM
#bar AS b
-- Recursive Definition
UNION ALL SELECT
b.[Parent_Id]
, b.[child_Id]
, CAST(nn.[NodePath] + '-' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS VARCHAR(MAX))
FROM
NodeNetwork AS nn
JOIN #bar AS b ON b.[Parent_Id] = nn.[Child_ID]
WHERE
nn.[NodePath] NOT LIKE '%[-]' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) + '%'
)
SELECT * FROM NodeNetwork
Or similar. Sorry, it's late and I can't test it. I'll check on Monday morning. Credit for this must go to Peter Larsson (Peso).
The idea was generated here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subordinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I thought it was pretty obvious, but here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (and maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
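The application-side filtering described above could look roughly like the following. The answer used C#; this sketch is in Python, and the function name, the (user_id, manager_id) row shape and the sample data are all assumptions for illustration, not the author's code. A visited set is what makes loops in the data harmless:

```python
from collections import defaultdict

def collect_subordinates(rows, boss_id):
    """Return boss_id plus everyone reporting to them, directly or indirectly.

    rows is an iterable of (user_id, manager_id) pairs, e.g. dumped from the
    employee table. IDs already seen are skipped, so loops cannot recurse forever.
    """
    reports = defaultdict(list)
    for user_id, manager_id in rows:
        reports[manager_id].append(user_id)

    seen = set()

    def visit(user_id):
        if user_id in seen:
            return  # already collected: this is where a loop gets cut
        seen.add(user_id)
        for subordinate in reports.get(user_id, []):
            visit(subordinate)

    visit(boss_id)
    return seen

# Hypothetical sample: users 1 and 2 are each other's boss.
rows = [(1, 2), (2, 1), (3, 1), (4, 3), (5, 2)]
print(sorted(collect_subordinates(rows, 1)))  # [1, 2, 3, 4, 5]
```

For very deep hierarchies the recursion could hit Python's recursion limit; an explicit stack would avoid that at the cost of a little more code.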
this will work for the initial recursive link, but might not work for longer links
DECLARE @Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO @Table (ID,PARENTID) SELECT 1, 2
INSERT INTO @Table (ID,PARENTID) SELECT 2, 1
INSERT INTO @Table (ID,PARENTID) SELECT 3, 1
INSERT INTO @Table (ID,PARENTID) SELECT 4, 3
INSERT INTO @Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM @Table
DECLARE @ID INT
SELECT @ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM @Table
WHERE PARENTID = @ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM @Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
What I would recommend is to use a while loop, and only insert links into a temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> @UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well, it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't, since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? Oh boy, it sometimes happens).
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0!
Simple eh?
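The shape of that loop, sketched in Python rather than T-SQL so it can be run without a server. The dictionary plays the role of the temporary table, the frontier set is "the most recent results", and an empty frontier corresponds to @@rowcount = 0. The function name, the max_levels guard value and the sample rows are assumptions for illustration:

```python
def subordinates_by_level(rows, root_id, max_levels=100):
    """Walk the hierarchy one level per iteration.

    rows holds (user_id, manager_id) pairs. Returns {user_id: level}, with the
    root at level 0. IDs already recorded are never inserted again, so loops in
    the data cannot keep the iteration going.
    """
    level_of = {root_id: 0}   # stands in for the temp table
    frontier = {root_id}      # rows added by the previous iteration
    level = 0
    while frontier:           # stop when an iteration adds nothing
        level += 1
        frontier = {
            user_id
            for user_id, manager_id in rows
            if manager_id in frontier and user_id not in level_of
        }
        if frontier and level > max_levels:
            raise RuntimeError("hierarchy deeper than expected; possible loop")
        for user_id in frontier:
            level_of[user_id] = level
    return level_of

# Hypothetical sample: users 1 and 2 are each other's boss.
rows = [(1, 2), (2, 1), (3, 1), (4, 3), (5, 2)]
print(subordinates_by_level(rows, 1))
```

The number of iterations equals the depth of the hierarchy, not the number of employees, which is why the approach stays cheap.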
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and check in the CTE condition whether the user ID is already in the path; if it is, it won't be processed again. Hope this helps.
Jose
DECLARE @Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE @UserID INT
SELECT @UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from @Table a
where [User_ID] = @UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from @Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]+'/') = 0 -- trailing '/' so the last ID in the path is matched too
)
select * from UserTbl
Basically, if you have loops like this in the data, you'll have to do the retrieval logic yourself.
You could use one CTE to get only subordinates and another to get bosses.
Another idea is to have a dummy row as a boss to both company owners so they wouldn't be each other's bosses, which is ridiculous. This is my preferred option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = @UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and fewer records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintenance nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to the whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = @UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferable solution is to clean up the data and to make sure you do not have any loops in the future; that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
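As a rough illustration of what such a trigger or constraint has to verify, the sketch below checks whether assigning a new manager would close a loop by walking up the chain of managers. It is written in Python for readability; the function name and the manager_of mapping are assumptions, not part of the linked article:

```python
def would_create_cycle(manager_of, user_id, new_manager_id):
    """Return True if making new_manager_id the boss of user_id closes a loop.

    manager_of maps user_id -> manager_id (None for the top of the hierarchy).
    The walk goes upward from the proposed manager; reaching user_id means the
    user would end up as their own indirect boss.
    """
    seen = set()
    current = new_manager_id
    while current is not None and current not in seen:
        if current == user_id:
            return True
        seen.add(current)  # guards against loops that already exist in the data
        current = manager_of.get(current)
    return False

# Hypothetical chain: 3 reports to 2, 2 reports to 1, 1 is the top.
manager_of = {1: None, 2: 1, 3: 2}
print(would_create_cycle(manager_of, 1, 3))  # True: 1 -> 3 -> 2 -> 1
print(would_create_cycle(manager_of, 3, 1))  # False
```

Rejecting the update whenever this returns True keeps the table loop-free, so the plain recursive CTE remains safe to use.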
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(@User_ID INT)
RETURNS @SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF @User_ID IS NULL
RETURN
INSERT INTO @SubordinateUsers (User_ID, Distance) VALUES ( @User_ID, 0)
DECLARE @Distance INT, @Finished BIT
SELECT @Distance = 1, @Finished = 0
WHILE @Finished = 0
BEGIN
INSERT INTO @SubordinateUsers
SELECT S.User_ID, @Distance
FROM Users AS S
JOIN @SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN @SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF @@RowCount = 0
SET @Finished = 1
SET @Distance = @Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(@User_ID INT)
RETURNS @User TABLE (User_ID INT, Distance INT) AS BEGIN
IF @User_ID IS NULL
RETURN
DECLARE @Manager_ID INT
SELECT @Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = @User_ID
INSERT INTO @User (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(@Manager_ID)
INSERT INTO @User (User_ID, Distance) VALUES (@User_ID, 0)
RETURN
END
You need some method to prevent your recursive query from adding user IDs already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van), you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.