Improving performance of Union All query - sql

I have a pretty extended Union All query with Union All used 34 times as follows:
INSERT INTO #Temp2
(RowNumber,ValFromUser,ColumnName,ValFromFunc,FuncWeight,percentage)
SELECT RowNumber,#firstname,'firstname',
PercentMatch,fw1.FunctionWeight,
PercentMatch * fw1.FunctionWeight
FROM
dbo.MatchFirstName(#firstname, #checkbool) mfn
CROSS JOIN
dbo.FunctionWeights fw1
WHERE
#firstname is not null and fw1.FunctionId = 1
UNION ALL
SELECT RowNumber, #MiddleName,'Middlename',
PercentMatch, fw2.FunctionWeight,
PercentMatch * fw2.FunctionWeight
FROM
dbo.MatchMiddleName(#MiddleName, #checkbool) mmn
CROSS JOIN
dbo.FunctionWeights fw2
WHERE
#MiddleName IS NOT NULL AND
fw2.FunctionId = 2
UNION ALL
SELECT RowNumber, #LastName,'Lastname',
PercentMatch, fw3.FunctionWeight,
PercentMatch * fw3.FunctionWeight
FROM
dbo.MatchLastName(#LastName, #checkbool) mln
CROSS JOIN
dbo.FunctionWeights fw3
WHERE
#LastName IS NOT NULL AND
fw3.FunctionId = 3
UNION ALL...........
and so on.
dbo.MatchFirstName is an inline TVF which looks like this :
CREATE FUNCTION MatchFirstName (#FirstNameFromUser nvarchar(20) , #checkbool int)
RETURNS TABLE AS RETURN
select RowId As RowNumber, PercentMatch from dbo.MATCH(#FirstNameFromUser, 'firstname' , #checkbool )
GO
It is calling a dbo.Match function which is a multiline UDF and has some logic contained in it for calculating the percentage match based on field names passed as parameters to it.
34 Union All's means collaboration of results from 35 Inline TVF's using select query based on 35 different columns individually in them.All these belong to one table.
Now since I can not afford to have 35 indexes but yet I want to avoid table scans as this query is taking pretty long.
Is there a work around to improve performance here ?

Related

SQL Server: ORDER BY parameters in IN statement

I have a SQL statement that is the following:
SELECT A.ID, A.Name
FROM Properties A
WHERE A.ID IN (110, 105, 104, 106)
When I run this SQL statement, the output is ordered according to the IN list by ID automatically and returns
104 West
105 East
106 North
110 South
I want to know if it is possible to order by the order the parameters are listed within the IN clause. so it would return
110 South
105 East
104 West
106 North
I think the easiest way in SQL Server is to use a JOIN with VALUES:
SELECT p.ID, p.Name
FROM Properties p JOIN
(VALUES (110, 1), (105, 2), (104, 3), (106, 4)) ids(id, ordering)
ON p.id = a.id
ORDER BY ids.ordering;
Sure...
just add an Order clause with a case in it
SELECT A.ID, A.Name
FROM Properties A
WHERE A.ID IN (110,105,104,106)
Order By case A.ID
when 110 then 0
when 105 then 1
when 104 then 2
when 106 then 3 end
With the help of a parsing function which returns the sequence as well
SELECT B.Key_PS
, A.ID
, A.Name
FROM Properties A
Join (Select * from [dbo].[udf-Str-Parse]('110,105,104,106',',')) B on A.ID=B.Key_Value
WHERE A.ID IN (110,105,104,106)
Order by Key_PS
The UDF if you need
CREATE FUNCTION [dbo].[udf-Str-Parse] (#String varchar(max),#Delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
-- Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
-- Select * from [dbo].[udf-Str-Parse]('id26,id46|id658,id967','|')
-- Select * from [dbo].[udf-Str-Parse]('hello world. It. is. . raining.today','.')
Returns #ReturnTable Table (Key_PS int IDENTITY(1,1), Key_Value varchar(max))
As
Begin
Declare #XML xml;Set #XML = Cast('<x>' + Replace(#String,#Delimeter,'</x><x>')+'</x>' as XML)
Insert Into #ReturnTable Select Key_Value = ltrim(rtrim(String.value('.', 'varchar(max)'))) FROM #XML.nodes('x') as T(String)
Return
End
The Parser alone would return
Select * from [dbo].[udf-Str-Parse]('110,105,104,106',',')
Key_PS Key_Value
1 110
2 105
3 104
4 106
What you could potentially do is:
Create a TVF that would split string and keep original order.
This questions seems to have this function already written: MS SQL: Select from a split string and retain the original order (keep in mind that there might be other approaches, not only those, covered in this question, I just gave it as an example to understand what function should do)
So now if you'd run this query:
SELECT *
FROM dbo.Split('110,105,104,106', ',') AS T;
It would bring back this table as a result.
items rownum
------------
110 1
105 2
104 3
106 4
Following that, you could simply query your table, join with this TVF passing your IDs as a parameter:
SELECT P.ID, P.Name
FROM Properties AS P
INNER JOIN dbo.Split('110,105,104,106', ',') AS T
ON T.items = P.ID
ORDER BY T.rownum;
This should retain order of parameters.
If you need better performance, I'd advice to put records from TVF into hash table, index it and then join with actual table. See query below:
SELECT T.items AS ID, T.rownum AS SortOrder
INTO #Temporary
FROM dbo.Split('110,105,104,106', ',') AS T;
CREATE CLUSTERED INDEX idx_Temporary_ID
ON #Temporary(ID);
SELECT P.ID, P.Name
FROM Properties AS P
INNER JOIN #Temporary AS T
ON T.ID = P.ID
ORDER BY T.SortOrder;
This should work better on larger data sets and equally well on small ones.
Here is a solution that does not rely on hard codes values or dynamic sql (to eliminate hard coding values).
I would build a table (maybe temp or variable) with OrderByValue and OrderBySort and insert from the application.
OrderByValue OrderBySort
110 1
105 2
104 3
106 4
Then I would join on the value and sort by the sort. The join will be the same as the In clause.
SELECT A.ID, A.Name
FROM Properties A
JOIN TempTable B On A.ID = B.OrderByValue
Order By B.OrderBySort
Another solution for this problem is prepare a temporary table for IN clause like
declare #InTable table (ID int, SortOrder int not null identity(1,1));
We can fill this temp table with your data in order you want
insert into #InTable values (110), (105), (104), (106);
At last we need to modify your question to use this temp table like this
select A.ID, A.Name
from Properties A
inner join #InTable as Sort on A.ID = Sort.ID
order by Sort.SortOrder
On the output you can see this
ID Name
110 South
105 East
104 West
106 North
In this solution you don't need to provide order in special way. You just need to insert values in order you want.

how to get some records in first row in sql

I have q query like this:
Select WarehouseCode from [tbl_VW_Epicor_Warehse]
my output looks like this
WarehouseCode
Main
Mfg
SP
W01
W02
W03
W04
W05
But sometimes I want to get W04 as the first record, sometimes I want to get W01 as the first record .
How I can write a query to get some records in first row??
Any help is appreciated
you could try and select the row with the code you want to appear first by specifying a where condition to select that row alone then you can union all another select with all other rows that doesn't have this name
as follows
SELECT WarehouseCode FROM Table WHERE WarehouseCode ='W04'
UNION ALL
SELECT WarehouseCode FROM Table WHERE WarehouseCode <>'W04'
Use a parameter to choose the top row, which can be passed to your query as required, and sort by a column calculated on whether the value matches the parameter; something like the ORDER BY clause in the following:
DECLARE #Warehouses TABLE (Id INT NOT NULL, Label VARCHAR(3))
INSERT #Warehouses VALUES
(1,'W01')
,(2,'W02')
,(3,'W03')
DECLARE #TopRow VARCHAR(3) = 'W02'
SELECT *
FROM #Warehouses
ORDER BY CASE Label WHEN #TopRow THEN 1 ELSE 0 END DESC
May be you don't need to store this list in table? And you want something like this?
SELECT * FROM (VALUES ('WarehouseCode'),
('Main'),
('Mfg'),
('SP'),
('W01'),
('W02'),
('W03'),
('W04'),
('W05')) as v(s)
Here you can change order manually as you want.
As commented by #ankit bajpai
You are looking for Custom sorting that is achieve by CASE with ORDER BY statement
Whenever you want WAo4 on top you can use
ORDER BY Case When col = 'W04' THEN 1 ELSE 2 END
Example below:
Select col from
(
select 'Main' col union ALL
select 'Mfg' union ALL
select 'SP' union ALL
select 'W01' union ALL
select 'W02' union ALL
select 'W03' union ALL
select 'W04' union ALL
select 'W05'
) randomtable
ORDER BY Case When col = 'W04' THEN 1 ELSE 2 END
EDIT: AFTER MARKED AS ANSWER
IN support of #Maha Khairy because that IS MARKED AS ANSWER and the only answer which is DIFFRENT
rest all are pushing OP to use "ORDER by with case statements"
let`s use UNION ALL APPROCH
create table #testtable (somedata varchar(10))
insert into #testtable
Select col from
(
select 'W05' col union ALL
select 'Main' union ALL
select 'Mfg' union ALL
select 'SP' union ALL
select 'W01' union ALL
select 'W02' union ALL
select 'W03' union ALL
select 'W04'
) randomtable
Select * From #testtable where somedata = 'W04'
Union ALL
Select * From #testtable where somedata <> 'W04'
The result set is rendering data to the grid as requested the OP
idia is to get first all rows where equal to 'W04' is and then
not equal to 'W04' and then concatinate the result. so that rows 'W04'
will always be on the top because its used in the query first, fair enough.
, but that is not the only point to use (custom sorting/sorting) in that fasion there is one another
and a major one that is PERFORMANCE
yes
"case with order by" will never able to take advantages of KEY but Union ALL will be, to explore it more buld the test table
and check the diffrence
CREATE TABLE #Orders
(
OrderID integer NOT NULL IDENTITY(1,1),
CustID integer NOT NULL,
StoreID integer NOT NULL,
Amount float NOT NULL,
makesrowfat nchar(4000)
);
GO
-- Sample data
WITH
Cte0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
Cte1 AS (SELECT 1 AS C FROM Cte0 AS A, Cte0 AS B),--4 rows
Cte2 AS (SELECT 1 AS C FROM Cte1 AS A ,Cte1 AS B),--16 rows
Cte3 AS (SELECT 1 AS C FROM Cte2 AS A ,Cte2 AS B),--256 rows
Cte4 AS (SELECT 1 AS C FROM Cte3 AS A ,Cte3 AS B),--65536 rows
Cte5 AS (SELECT 1 AS C FROM Cte4 AS A ,Cte2 AS B),--1048576 rows
FinalCte AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS Number FROM Cte5)
INSERT #Orders
(CustID, StoreID, Amount)
SELECT
CustID = Number / 10,
StoreID = Number % 4,
Amount = 1000 * RAND(Number)
FROM FinalCte
WHERE
Number <= 1000000;
GO
lets now do the same for custid "93190"
Create NONclustered Index IX_CustID_Orders ON #Orders (CustID)
INCLUDE (OrderID ,StoreID, Amount ,makesrowfat )
WARM CHACHE RESULTS
SET STATISTICS TIME ON
DECLARE #OrderID integer
DECLARE #CustID integer
DECLARE #StoreID integer
DECLARE #Amount float
DECLARE #makesrowfat nchar(4000)
Select #OrderID =OrderID ,
#CustID =CustID ,
#StoreID =StoreID ,
#Amount =Amount ,
#makesrowfat=makesrowfat
FROM
(
Select * From #Orders where custid =93190
Union ALL
Select * From #Orders where custid <>93190
)TLB
**
--elapsed time =2571 ms.
**
DECLARE #OrderID integer
DECLARE #CustID integer
DECLARE #StoreID integer
DECLARE #Amount float
DECLARE #makesrowfat nchar(4000)
Select #OrderID =OrderID ,
#CustID =CustID ,
#StoreID =StoreID ,
#Amount =Amount ,
#makesrowfat=makesrowfat
From #Orders
ORDER BY Case When custid = 93190 THEN 1 ELSE 2 END
elapsed time = 70616 ms
**
UNION ALL performance 2571 ms. ORDER BY CASE performance
70616 ms
**
UNION ALL is a clear winner ORDER BY IS not ever nearby in performance
BUT we forgot that SQL is declarative language,
we have no direct control over how data has fetch by the sql, there is a software code ( that changes with the releases)
IN between
user and sql server database engine, which is SQL SERVER OPTIMIZER that is coded to get the data set
as specified by the USER and its has responsibility to get the data with least amount of resources. so there are chances
that you wont get ALWAYS the result in order until you specify ORDER BY
some other references:
#Damien_The_Unbeliever
Why would anyone offer an ordering guarantee except when an ORDER BY clause is included? -
there's an obvious opportunity for parallelism (if sufficient resources are available) to compute each result set in parallel and serve each result row
(from the parallel queries) to the client in whatever order each individual result row becomes available. –
Conor Cunningham:
If you need order in your query results, put in an ORDER BY. It's that simple. Anything else is like riding in a car without a seatbelt.
ALL COMMENTS AND EDIT ARE WELCOME

More efficient way of doing multiple joins to the same table and a "case when" in the select

At my organization clients can be enrolled in multiple programs at one time. I have a table with a list of all of the programs a client has been enrolled as unique rows in and the dates they were enrolled in that program.
Using an External join I can take any client name and a date from a table (say a table of tests that the clients have completed) and have it return all of the programs that client was in on that particular date. If a client was in multiple programs on that date it duplicates the data from that table for each program they were in on that date.
The problem I have is that I am looking for it to only return one program as their "Primary Program" for each client and date even if they were in multiple programs on that date. I have created a hierarchy for which program should be selected as their primary program and returned.
For Example:
1.)Inpatient
2.)Outpatient Clinical
3.)Outpatient Vocational
4.)Outpatient Recreational
So if a client was enrolled in Outpatient Clinical, Outpatient Vocational, Outpatient Recreational at the same time on that date it would only return "Outpatient Clinical" as the program.
My way of thinking for doing this would be to join to the table with the previous programs multiple times like this:
FROM dbo.TestTable as TestTable
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms1
ON TestTable.date = PreviousPrograms1.date AND PreviousPrograms1.type = 'Inpatient'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms2
ON TestTable.date = PreviousPrograms2.date AND PreviousPrograms2.type = 'Outpatient Clinical'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms3
ON TestTable.date = PreviousPrograms3.date AND PreviousPrograms3.type = 'Outpatient Vocational'
LEFT OUTER JOIN dbo.PreviousPrograms as PreviousPrograms4
ON TestTable.date = PreviousPrograms4.date AND PreviousPrograms4.type = 'Outpatient Recreational'
and then do a condition CASE WHEN in the SELECT statement as such:
SELECT
CASE
WHEN PreviousPrograms1.name IS NOT NULL
THEN PreviousPrograms1.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NOT NULL
THEN PreviousPrograms2.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NULL AND PreviousPrograms3.name IS NOT NULL
THEN PreviousPrograms3.name
WHEN PreviousPrograms1.name IS NULL AND PreviousPrograms2.name IS NULL AND PreviousPrograms3.name IS NOT NULL AND PreviousPrograms4.name IS NOT NULL
THEN PreviousPrograms4.name
ELSE NULL
END as PrimaryProgram
The bigger problem is that in my actual table there are a lot more than just four possible programs it could be and the CASE WHEN select statement and the JOINs are already cumbersome enough.
Is there a more efficient way to do either the SELECTs part or the JOIN part? Or possibly a better way to do it all together?
I'm using SQL Server 2008.
You can simplify (replace) your CASE by using COALESCE() instead:
SELECT
COALESCE(PreviousPrograms1.name, PreviousPrograms2.name,
PreviousPrograms3.name, PreviousPrograms4.name) AS PreviousProgram
COALESCE() returns the first non-null value.
Due to your design, you still need the JOINs, but it would be much easier to read if you used very short aliases, for example PP1 instead of PreviousPrograms1 - it's just a lot less code noise.
You can simplify the Join by using a bridge table containing all the program types and their priority (my sql server syntax is a bit rusty):
create table BridgeTable (
programType varchar(30),
programPriority smallint
);
This table will hold all the program types and the program priority will reflect the priority you've specified in your question.
As for the part of the case, that will depend on the number of records involved. One of the tricks that I usually do is this (assuming programPriority is a number between 10 and 99 and no type can have more than 30 bytes, because I'm being lazy):
Select patient, date,
substr( min(cast(BridgeTable.programPriority as varchar) || PreviousPrograms.type), 3, 30)
From dbo.TestTable as TestTable
Inner Join dbo.BridgeTable as BridgeTable
Left Outer Join dbo.PreviousPrograms as PreviousPrograms
on PreviousPrograms.type = BridgeTable.programType
and TestTable.date = PreviousPrograms.date
Group by patient, date
You can achieve this using sub-queries, or you could refactor it to use CTEs, take a look at the following and see if it makes sense:
DECLARE #testTable TABLE
(
[id] INT IDENTITY(1, 1),
[date] datetime
)
DECLARE #previousPrograms TABLE
(
[id] INT IDENTITY(1,1),
[date] datetime,
[type] varchar(50)
)
INSERT INTO #testTable ([date])
SELECT '2013-08-08'
UNION ALL SELECT '2013-08-07'
UNION ALL SELECT '2013-08-06'
INSERT INTO #previousPrograms ([date], [type])
-- a sample user as an inpatient
SELECT '2013-08-08', 'Inpatient'
-- your use case of someone being enrolled in all 3 outpation programs
UNION ALL SELECT '2013-08-07', 'Outpatient Recreational'
UNION ALL SELECT '2013-08-07', 'Outpatient Clinical'
UNION ALL SELECT '2013-08-07', 'Outpatient Vocational'
-- showing our workings, this is what we'll join to
SELECT
PPP.[date],
PPP.[type],
ROW_NUMBER() OVER (PARTITION BY PPP.[date] ORDER BY PPP.[Priority]) AS [RowNumber]
FROM (
SELECT
[type],
[date],
CASE
WHEN [type] = 'Inpatient' THEN 1
WHEN [type] = 'Outpatient Clinical' THEN 2
WHEN [type] = 'Outpatient Vocational' THEN 3
WHEN [type] = 'Outpatient Recreational' THEN 4
ELSE 999
END AS [Priority]
FROM #previousPrograms
) PPP -- Previous Programs w/ Priority
SELECT
T.[date],
PPPO.[type]
FROM #testTable T
LEFT JOIN (
SELECT
PPP.[date],
PPP.[type],
ROW_NUMBER() OVER (PARTITION BY PPP.[date] ORDER BY PPP.[Priority]) AS [RowNumber]
FROM (
SELECT
[type],
[date],
CASE
WHEN [type] = 'Inpatient' THEN 1
WHEN [type] = 'Outpatient Clinical' THEN 2
WHEN [type] = 'Outpatient Vocational' THEN 3
WHEN [type] = 'Outpatient Recreational' THEN 4
ELSE 999
END AS [Priority]
FROM #previousPrograms
) PPP -- Previous Programs w/ Priority
) PPPO -- Previous Programs w/ Priority + Order
ON T.[date] = PPPO.[date] AND PPPO.[RowNumber] = 1
Basically we have our deepest sub-select giving all PreviousPrograms a priority based on type, then our wrapping sub-select gives them row numbers per date so we can select only the ones with a row number of 1.
I am guessing you would need to include a UR Number or some other patient identifier, simply add that as an output to both sub-selects and change the join.

Can this ORDER BY on a CASE clause be made faster?

I'm selecting results from a table of ~350 million records, and it's running extremely slowly - around 10 minutes. The culprit seems to be the ORDER BY, as if I remove it the query only takes a moment. Here's the gist:
SELECT TOP 100
(columns snipped)
FROM (
SELECT
CASE WHEN (e2.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
ORDER BY p1.RecordExists
Basically, I'm ordering the results by whether Files have a corresponding Record, as those without need to be handled first. I could run two queries with WHERE clauses, but I'd rather do it in a single query if possible.
Is there any way to speed this up?
The ultimate issue is that the use of CASE in the sub-query introduces an ORDER BY over something that is not being used in a sargable manner. Thus the entire intermediate result-set must first be ordered to find the TOP 100 - this is all 350+ million records!2
In this particular case, moving the CASE to the outside SELECT and use a DESC ordering (to put NULL values, which means "0" in the current RecordExists, first) should do the trick1. It's not a generic approach, though .. but the ordering should be much, much faster iff Files.ID is indexed. (If the query is still slow, consult the query plan to find out why ORDER BY is not using an index.)
Another alternative might be to include a persisted computed column for RecordExists (that is also indexed) that can be used as an index in the ORDER BY.
Once again, the idea is that the ORDER BY works over something sargable, which only requires reading sequentially inside the index (up to the desired number of records to match the outside limit) and not ordering 350+ million records on-the-fly :)
SQL Server is then able to push this ordering (and limit) down into the sub-query, instead of waiting for the intermediate result-set of the sub-query to come up. Look at the query plan differences based on what is being ordered.
1 Example:
SELECT TOP 100
-- If needed
CASE WHEN (p1.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM (
SELECT
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
-- Hopefully ID is indexed, DESC makes NULLs (!RecordExists) go first
ORDER BY p1.ID DESC
2 Actually, it seems like it could hypothetically just stop after the first 100 0's without a full-sort .. at least under some extreme query planner optimization under a closed function range, but that depends on when the 0's are encountered in the intermediate result set (in the first few thousand or not until the hundreds of millions or never?). I highly doubt SQL Server accounts for this extreme case anyway; that is, don't count on this still non-sargable behavior.
Give this form a try
SELECT TOP(100) *
FROM (
SELECT TOP(100)
0 AS RecordExists
--,(columns snipped)
FROM dbo.Files AS e1
WHERE NOT EXISTS (SELECT * FROM dbo.Records e2 WHERE e1.FID = e2.FID)
ORDER BY SecondaryOrderColumn
) X
UNION ALL
SELECT * FROM (
SELECT TOP(100)
1 AS RecordExists
--,(columns snipped)
FROM dbo.Files AS e1
INNER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
ORDER BY SecondaryOrderColumn
) X
ORDER BY SecondaryOrderColumn
Key indexes:
Records (FID)
Files (FID, SecondaryOrdercolumn)
Well the reason it is much slower is because it is really a very different query without the order by clause.
With the order by clause:
Find all matching records out of the entire 350 million rows. Then sort them.
Without the order by clause:
Find the first 100 matching records. Stop.
Q: If you say the only difference is "with/outout" the "order by", then could you somehow move the "top 100" into the inner select?
EXAMPLE:
SELECT
(columns snipped)
FROM (
SELECT TOP 100
CASE WHEN (e2.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
ORDER BY p1.RecordExists
In SQL Server, null values collate lower than any value in the domain. Given these two tables:
create table dbo.foo
(
id int not null identity(1,1) primary key clustered ,
name varchar(32) not null unique nonclustered ,
)
insert dbo.foo ( name ) values ( 'alpha' )
insert dbo.foo ( name ) values ( 'bravo' )
insert dbo.foo ( name ) values ( 'charlie' )
insert dbo.foo ( name ) values ( 'delta' )
insert dbo.foo ( name ) values ( 'echo' )
insert dbo.foo ( name ) values ( 'foxtrot' )
go
create table dbo.bar
(
id int not null identity(1,1) primary key clustered ,
foo_id int null foreign key references dbo.foo(id) ,
name varchar(32) not null unique nonclustered ,
)
go
insert dbo.bar( foo_id , name ) values( 1 , 'golf' )
insert dbo.bar( foo_id , name ) values( 5 , 'hotel' )
insert dbo.bar( foo_id , name ) values( 3 , 'india' )
insert dbo.bar( foo_id , name ) values( 5 , 'juliet' )
insert dbo.bar( foo_id , name ) values( 6 , 'kilo' )
go
The query
select *
from dbo.foo foo
left join dbo.bar bar on bar.foo_id = foo.id
order by bar.foo_id, foo.id
yields the following result set:
id name id foo_id name
-- ------- ---- ------ -------
2 bravo NULL NULL NULL
4 delta NULL NULL NULL
1 alpha 1 1 golf
3 charlie 3 3 india
5 echo 2 5 hotel
5 echo 4 5 juliet
6 foxtrot 5 6 kilo
(7 row(s) affected)
This should allow the query optimizer to use a suitable index (if such exists); however, it does not guarantee than any such index would be used.
Can you try this?
SELECT TOP 100
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
ORDER BY e2.ID ASC
This should give you where e2.ID is null first. Also, make sure Records.ID is indexed. This should give you the ordering you were wanting.

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
this will work for the initial recursive link, but might not work for longer links
DECLARE #Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO #Table (ID,PARENTID) SELECT 1, 2
INSERT INTO #Table (ID,PARENTID) SELECT 2, 1
INSERT INTO #Table (ID,PARENTID) SELECT 3, 1
INSERT INTO #Table (ID,PARENTID) SELECT 4, 3
INSERT INTO #Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM #Table
DECLARE #ID INT
SELECT #ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM #Table
WHERE PARENTID = #ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM #Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> #UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when ##rowcount=0!
Simple eh?
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
DECLARE #Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE #UserID INT
SELECT #UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
where [User_ID] = #UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(#User_ID INT)
RETURNS #SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
INSERT INTO #SubordinateUsers (User_ID, Distance) VALUES ( #User_ID, 0)
DECLARE #Distance INT, #Finished BIT
SELECT #Distance = 1, #Finished = 0
WHILE #Finished = 0
BEGIN
INSERT INTO #SubordinateUsers
SELECT S.User_ID, #Distance
FROM Users AS S
JOIN #SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN #SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF ##RowCount = 0
SET #Finished = 1
SET #Distance = #Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(#User_ID INT)
RETURNS #User TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
DECLARE #Manager_ID INT
SELECT #Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = #User_ID
INSERT INTO #UserClasses (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(#Manager_ID)
INSERT INTO #User (User_ID, Distance) VALUES (#User_ID, 0)
RETURN
END
You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.