This SELECT query takes 180 seconds to finish - sql

UPDATE:
Just to mention it on a more visible place. When I changed IN for =, the query execution time went from 180 down to 0.00008 seconds. Ridiculous speed difference.
This SQL query takes 180 seconds to finish! How is that possible? is there a way to optimize it to be faster?
SELECT IdLawVersionValidFrom
FROM question_law_version
WHERE IdQuestionLawVersion IN
(
SELECT MAX(IdQuestionLawVersion)
FROM question_law_version
WHERE IdQuestionLaw IN
(
SELECT MIN(IdQuestionLaw)
FROM question_law
WHERE IdQuestion=236 AND IdQuestionLaw>63
)
)
There are only about 5000 rows in each table so it shouldn't be so slow.

(Posting my comment as an answer as apparently it did make a difference!)
Any difference if you change the IN
to =?
If anyone wants to investigate this further I've just done a test and found it very easy to reproduce.
Create Table
CREATE TABLE `filler` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
)
Create Procedure
CREATE PROCEDURE `prc_filler`(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT
INTO filler
SELECT _cnt;
SET _cnt = _cnt + 1;
END WHILE;
END
Populate Table
call prc_filler(5000)
Query 1
SELECT id
FROM filler
WHERE id = (SELECT MAX(id) FROM filler WHERE id =
( SELECT MIN(id)
FROM filler
WHERE id between 2000 and 3000
)
)
Equals Explain Output http://img689.imageshack.us/img689/5592/equals.png
Query 2 (same problem)
SELECT id
FROM filler
WHERE id in (SELECT MAX(id) FROM filler WHERE id in
( SELECT MIN(id)
FROM filler
WHERE id between 2000 and 3000
)
)
In Explain Output http://img291.imageshack.us/img291/8129/52037513.png

Here is a good explanation why = is better than IN
Mysql has problems with inner queries - not well using indexes (if at all).
Make sure you have indexes on all the fields in the join/where/order etc.
get those Max and MIN values in a separate query (use stored procedure for this entire thing if you want to skip the multiple requests overhead Or just do a request with multiple queries.
Anyway:
SELECT
IdLawVersionValidFrom
FROM
question_law_version
JOIN
question_law
ON
question_law_version.IdQuestionLaw = question_law.IdQuestionLaw
WHERE
question_law.IdQuestion=236
AND
question_law.IdQuestionLaw>63
ORDER BY
IdQuestionLawVersion DESC,
question_law.IdQuestionLaw ASC
LIMIT 1

You can use EXPLAIN to find out how is it possible for a query to execute so slow.
MySQL does not really like nested subselects so probably what happens is that it goes and does sorts on disk to get min and max and fail to reuse results.
Rewriting as joins would probably help it.
If just looking for a quick fix try:
SET #temp1 =
(
SELECT MIN(IdQuestionLaw)
FROM question_law
WHERE IdQuestion = 236 AND IdQuestionLaw > 63
)
SET #temp2 =
(
SELECT MAX(IdQuestionLawVersion)
FROM question_law_version
WHERE IdQuestionLaw = #temp1
)
SELECT IdLawVersionValidFrom
FROM question_law_version
WHERE IdQuestionLawVersion = #temp2

Related

Query performance of T-SQL in SQL Server

I have two ad-hoc query in SQL Server like below,
select *
from Product(nolock)
where id = '12345' and name = 'ABC';
select *
from Product(nolock)
where name = 'ABC' and id = '12345';
We have clustered index on id column and no index on name column. Which query will be faster? And why?
The performance will be the same.
The SQL Server query optimizer is smart enough to see that the conditions are the same - just in different order.
Ordering of the conditions in the WHERE clause isn't relevant.
SQL Server will select foremost by the id, and then in a second step sequentially by name.
In this case, it is irrelevant:
I created a products table with primary key clustered on ID and replicated your test case
Enabling execution plan will give you the answer :)
Code for the test:
drop table if exists products
Create table products
(
id int primary key clustered,
Value1 float
)
;with randowvalues
as(
select 1 id, CAST(RAND(CHECKSUM(NEWID()))*100 as varchar(100)) randomnumber
--select 1 id, RAND(CHECKSUM(NEWID()))*100 randomnumber
union all
select id + 1, CAST(RAND(CHECKSUM(NEWID()))*100 as varchar(100)) randomnumber
from randowvalues
where
id < 1000
)
insert into products
select *
from randowvalues
OPTION(MAXRECURSION 0)
select *
from products
where id = 9 and value1 = 75.6648
select *
from products
where value1 = 75.6648 and id = 9

Doing a join only if count is greater than one

I wonder if the following a bit contrived example is possible without using intermediary variables and a conditional clause.
Consider an intermediary query which can produce a result set that contain either no rows, one row or multiple rows. Most of the time it produces just one row, but when multiple rows, one should join the resulting rows to another table to prune it down to either one or no rows. After this if there is one row (as opposed to no rows), one would want to return multiple columns as produced by the original intermediary query.
I have in my mind something like following, but it won't obviously work (multiple columns in switch-case, no join etc.), but maybe it illustrates the point. What I would like to have is to just return what is currently in the SELECT clause in case ##ROWCOUNT = 1 or in case it is greater, do a INNER JOIN to Auxilliary, which prunes down x to either one row or no rows and then return that. I don't want to search Main more than once and Auxilliary only when x here contains more than one row.
SELECT x.MainId, x.Data1, x.Data2, x.Data3,
CASE
WHEN ##ROWCOUNT IS NOT NULL AND ##ROWCOUNT = 1 THEN
1
WHEN ##ROWCOUNT IS NOT NULL AND ##ROWCOUNT > 1 THEN
-- Use here #id or MainId to join to Auxilliary and there
-- FilteringCondition = #filteringCondition to prune x to either
-- one or zero rows.
END
FROM
(
SELECT
MainId,
Data1,
Data2,
Data3
FROM Main
WHERE
MainId = #id
) AS x;
CREATE TABLE Main
(
-- This Id may introduce more than row, so it is joined to
-- Auxilliary for further pruning with the given conditions.
MainId INT,
Data1 NVARCHAR(MAX) NOT NULL,
Data2 NVARCHAR(MAX) NOT NULL,
Data3 NVARCHAR(MAX) NOT NULL,
AuxilliaryId INT NOT NULL
);
CREATE TABLE Auxilliary
(
AuxilliaryId INT IDENTITY(1, 1) PRIMARY KEY,
FilteringCondition NVARCHAR(1000) NOT NULL
);
Would this be possible in one query without a temporary table variable and a conditional? Without using a CTE?
Some sample data would be
INSERT INTO Auxilliary(FilteringCondition)
VALUES
(N'SomeFilteringCondition1'),
(N'SomeFilteringCondition2'),
(N'SomeFilteringCondition3');
INSERT INTO Main(MainId, Data1, Data2, Data3, AuxilliaryId)
VALUES
(1, N'SomeMainData11', N'SomeMainData12', N'SomeMainData13', 1),
(1, N'SomeMainData21', N'SomeMainData22', N'SomeMainData23', 2),
(2, N'SomeMainData31', N'SomeMainData32', N'SomeMainData33', 3);
And a sample query, which actually behaves as I'd like it to behave with the caveat I'd want to do the join only if querying Main directly with the given ID produces more than one result.
DECLARE #id AS INT = 1;
DECLARE #filteringCondition AS NVARCHAR(1000) = N'SomeFilteringCondition1';
SELECT *
FROM
Main
INNER JOIN Auxilliary AS aux ON aux.AuxilliaryId = Main.AuxilliaryId
WHERE MainId = #id AND aux.FilteringCondition = #filteringCondition;
You don't usually use a join to reduce the result set of the left table. To limit a result set you'd use the where clause instead. In combination with another table this would be WHERE [NOT] EXISTS.
So let's say this is your main query:
select * from main where main.col1 = 1;
It returns one of the following results:
no rows, then we are done
one row, then we are also done
more than one row, then we must extend the where clause
The query with the extended where clause:
select * from main where main.col1 = 1
and exists (select * from other where other.col2 = main.col3);
which returns one of the following results:
no rows, which is okay
one row, which is okay
more than one row - you say this is not possible
So the task is to do this in one step instead. I count records and look for a match in the other table for every record. Then ...
if the count is zero we get no result anyway
if it is one I take that row
if it is greater than one, I take the row for which exists a match in the other table or none when there is no match
Here is the full query:
select *
from
(
select
main.*,
count(*) over () as cnt,
case when exists (select * from other where other.col2 = main.col3) then 1 else 0 end
as other_exists
from main
where main.col1 = 1
) counted_and_checked
where cnt = 1 or other_exists = 1;
UPDATE: I understand that you want to avoid unnecessary access to the other table. This is rather difficult to do however.
In order to only use the subquery when necessary, we could move it outside:
select *
from
(
select
main.*,
count(*) over () as cnt
from main
where main.col1 = 1
) counted_and_checked
where cnt = 1 or exists (select * from other where other.col2 = main.col3);
This looks much better in my opinion. However there is no precedence among the two expressions left and right of an OR. So the DBMS may still execute the subselect on every record before evaluating cnt = 1.
The only operation that I know of using left to right precedence, i.e. doesn't look further once a condition on the left hand side is matched is COALESCE. So we could do the following:
select *
from
(
select
main.*,
count(*) over () as cnt
from main
where main.col1 = 1
) counted_and_checked
where coalesce( case when cnt = 1 then 1 else null end ,
(select count(*) from other where other.col2 = main.col3)
) > 0;
This may look a bit strange, but should prevent the subquery from being executed, when cnt is 1.
You may try something like
select * from Main m
where mainId=#id
and #filteringCondition = case when(select count(*) from Main m2 where m2.mainId=#id) >1
then (select filteringCondition from Auxilliary a where a.AuxilliaryId = m.AuxilliaryId) else #filteringCondition end
but it's hardly very fast query. I'd better use temp table or just if and two queries.

Numeric Overflow in Recursive Query : Teradata

I'm new to teradata. I want to insert numbers 1 to 1000 into the table test_seq, which is created as below.
create table test_seq(
seq_id integer
);
After searching on this site, I came up with recusrive query to insert the numbers.
insert into test_seq(seq_id)
with recursive cte(id) as (
select 1 from test_dual
union all
select id + 1 from cte
where id + 1 <= 1000
)
select id from cte;
test_dual is created as follows and it contains just a single value. (something like DUAL in Oracle)
create table test_dual(
test_dummy varchar(1)
);
insert into test_dual values ('X');
But, when I run the insert statement, I get the error, Failure 2616 Numeric overflow occurred during computation.
What did I do wrong here? Isn't the integer datatype enough to hold numeric value 1000?
Also, is there a way to write the query so that i can do away with test_dual table?
When you simply write 1 the parser assigns the best matching datatype to it, which is a BYTEINT. The valid range of values for BYTEINT is -128 to 127, so just add a typecast to INT :-)
Usually you don't need a dummy DUAL table in Teradata, "SELECT 1;" is valid, but in some cases the parser still insists on a FROM (don't ask me why). This trick should work:
SEL * FROM (SELECT 1 AS x) AS dt;
You can create a view on this:
REPLACE VIEW oDUAL AS SELECT * FROM (SELECT 'X' AS dummy) AS dt;
Explain "SELECT 1 FROM oDUAL;" is a bit stupid, so a real table might be better. But to get efficient access (= single AMP/single row) it must be defined as follows:
CREATE TABLE dual_tbl(
dummy VARCHAR(1) CHECK ( dummy = 'X')
) UNIQUE PRIMARY INDEX(dummy); -- i remember having fun when you inserted another row in Oracle's DUAL :_)
INSERT INTO dual_tbl VALUES ('X');
REPLACE VIEW oDUAL AS SELECT dummy FROM dual_tbl WHERE dummy = 'X';
insert into test_seq(seq_id)
with recursive cte(id) as (
select cast(1 as int) from oDUAL
union all
select id + 1 from cte
where id + 1 <= 1000
)
select id from cte;
But recursion is not an appropriate way to get a range of numbers as it's sequential and always an "all-AMP step" even if it the data resides on a single AMP like in this case.
If it's less than 73414 values (201 years) better use sys_calendar.calendar (or any other table with a known sequence of numbers) :
SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000;
Otherwise use CROSS joins, e.g. to get numbers from 1 to 1,000,000:
WITH cte (i) AS
( SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000
)
SELECT
(t2.i - 1) * 1000 + t1.i
FROM cte AS t1 CROSS JOIN cte AS t2;

Make SQL Select same row multiple times

I need to test my mail server. How can I make a Select statement
that selects say ID=5469 a thousand times.
If I get your meaning then a very simple way is to cross join on a derived query on a table with more than 1000 rows in it and put a top 1000 on that. This would duplicate your results 1000 times.
EDIT: As an example (This is MSSQL, I don't know if Access is much different)
SELECT
MyTable.*
FROM
MyTable
CROSS JOIN
(
SELECT TOP 1000
*
FROM
sysobjects
) [BigTable]
WHERE
MyTable.ID = 1234
You can use the UNION ALL statement.
Try something like:
SELECT * FROM tablename WHERE ID = 5469
UNION ALL
SELECT * FROM tablename WHERE ID = 5469
You'd have to repeat the SELECT statement a bunch of times but you could write a bit of VB code in Access to create a dynamic SQL statement and then execute it. Not pretty but it should work.
Create a helper table for this purpose:
JUST_NUMBER(NUM INT primary key)
Insert (with the help of some (VB) script) numbers from 1 to N. Then execute this unjoined query:
SELECT MYTABLE.*
FROM MYTABLE,
JUST_NUMBER
WHERE MYTABLE.ID = 5469
AND JUST_NUMBER.NUM <= 1000
Here's a way of using a recursive common table expression to generate some empty rows, then to cross join them back onto your desired row:
declare #myData table (val int) ;
insert #myData values (666),(888),(777) --some dummy data
;with cte as
(
select 100 as a
union all
select a-1 from cte where a>0
--generate 100 rows, the max recursion depth
)
,someRows as
(
select top 1000 0 a from cte,cte x1,cte x2
--xjoin the hundred rows a few times
--to generate 1030301 rows, then select top n rows
)
select m.* from #myData m,someRows where m.val=666
substitute #myData for your real table, and alter the final predicate to suit.
easy way...
This exists only one row into the DB
sku = 52 , description = Skullcandy Inkd Green ,price = 50,00
Try to relate another table in which has no constraint key to the main table
Original Query
SELECT Prod_SKU , Prod_Descr , Prod_Price FROM dbo.TB_Prod WHERE Prod_SKU = N'52'
The Functional Query ...adding a not related table called 'dbo.TB_Labels'
SELECT TOP ('times') Prod_SKU , Prod_Descr , Prod_Price FROM dbo.TB_Prod,dbo.TB_Labels WHERE Prod_SKU = N'52'
In postgres there is a nice function called generate_series. So in postgreSQL it is as simple as:
select information from test_table, generate_series(1, 1000) where id = 5469
In this way, the query is executed 1000 times.
Example for postgreSQL:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; --To be able to use function uuid_generate_v4()
--Create a test table
create table test_table (
id serial not null,
uid UUID NOT NULL,
CONSTRAINT uid_pk PRIMARY KEY(id));
-- Insert 10000 rows
insert into test_table (uid)
select uuid_generate_v4() from generate_series(1, 10000);
-- Read the data from id=5469 one thousand times
select id, uid, uuid_generate_v4() from test_table, generate_series(1, 1000) where id = 5469;
As you can see in the result below, the data from uid is read 1000 times as confirmed by the generation of a new uuid at every new row.
id |uid |uuid_generate_v4
----------------------------------------------------------------------------------------
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"5630cd0d-ee47-4d92-9ee3-b373ec04756f"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"ed44b9cb-c57f-4a5b-ac9a-55bd57459c02"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"3428b3e3-3bb2-4e41-b2ca-baa3243024d9"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"7c8faf33-b30c-4bfa-96c8-1313a4f6ce7c"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"b589fd8a-fec2-4971-95e1-283a31443d73"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"8b9ab121-caa4-4015-83f5-0c2911a58640"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"7ef63128-b17c-4188-8056-c99035e16c11"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"5bdc7425-e14c-4c85-a25e-d99b27ae8b9f"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"9bbd260b-8b83-4fa5-9104-6fc3495f68f3"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"c1f759e1-c673-41ef-b009-51fed587353c"
5469|"10791df5-ab72-43b6-b0a5-6b128518e5ee"|"4a70bf2b-ddf5-4c42-9789-5e48e2aec441"
Of course other DBs won't necessarily have the same function but it could be done:
See here.
If your are doing this in sql Server
declare #cnt int
set #cnt = 0
while #cnt < 1000
begin
select '12345'
set #cnt = #cnt + 1
end
select '12345' can be any expression
Repeat rows based on column value of TestTable. First run the Create table and insert statement, then run the following query for the desired result.
This may be another solution:
CREATE TABLE TestTable
(
ID INT IDENTITY(1,1),
Col1 varchar(10),
Repeats INT
)
INSERT INTO TESTTABLE
VALUES ('A',2), ('B',4),('C',1),('D',0)
WITH x AS
(
SELECT TOP (SELECT MAX(Repeats)+1 FROM TestTable) rn = ROW_NUMBER()
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
SELECT * FROM x
CROSS JOIN TestTable AS d
WHERE x.rn <= d.Repeats
ORDER BY Col1;
This trick helped me in my requirement.
here, PRODUCTDETAILS is my Datatable
and orderid is my column.
declare #Req_Rows int = 12
;WITH cte AS
(
SELECT 1 AS Number
UNION ALL
SELECT Number + 1 FROM cte WHERE Number < #Req_Rows
)
SELECT PRODUCTDETAILS.*
FROM cte, PRODUCTDETAILS
WHERE PRODUCTDETAILS.orderid = 3
create table #tmp1 (id int, fld varchar(max))
insert into #tmp1 (id, fld)
values (1,'hello!'),(2,'world'),(3,'nice day!')
select * from #tmp1
go
select * from #tmp1 where id=3
go 1000
drop table #tmp1
in sql server try:
print 'wow'
go 5
output:
Beginning execution loop
wow
wow
wow
wow
wow
Batch execution completed 5 times.
The easy way is to create a table with 1000 rows. Let's call it BigTable. Then you would query for the data you want and join it with the big table, like this:
SELECT MyTable.*
FROM MyTable, BigTable
WHERE MyTable.ID = 5469

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
this will work for the initial recursive link, but might not work for longer links
DECLARE #Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO #Table (ID,PARENTID) SELECT 1, 2
INSERT INTO #Table (ID,PARENTID) SELECT 2, 1
INSERT INTO #Table (ID,PARENTID) SELECT 3, 1
INSERT INTO #Table (ID,PARENTID) SELECT 4, 3
INSERT INTO #Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM #Table
DECLARE #ID INT
SELECT #ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM #Table
WHERE PARENTID = #ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM #Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> #UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when ##rowcount=0!
Simple eh?
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
DECLARE #Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE #UserID INT
SELECT #UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
where [User_ID] = #UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(#User_ID INT)
RETURNS #SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
INSERT INTO #SubordinateUsers (User_ID, Distance) VALUES ( #User_ID, 0)
DECLARE #Distance INT, #Finished BIT
SELECT #Distance = 1, #Finished = 0
WHILE #Finished = 0
BEGIN
INSERT INTO #SubordinateUsers
SELECT S.User_ID, #Distance
FROM Users AS S
JOIN #SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN #SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF ##RowCount = 0
SET #Finished = 1
SET #Distance = #Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(#User_ID INT)
RETURNS #User TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
DECLARE #Manager_ID INT
SELECT #Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = #User_ID
INSERT INTO #UserClasses (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(#Manager_ID)
INSERT INTO #User (User_ID, Distance) VALUES (#User_ID, 0)
RETURN
END
You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.