NOT EXISTS returning no rows - sql

This is not a question about the difference between NOT EXISTS and IN.
I just can't seem to get this sql working right.
The logic seems ok, but I'm missing something.
And I have searched everywhere, even in my years and years of notes.
One table named CompanyAccountantRef has 3 fields. ID, CompanyID, and AccountantID.
Currently in the table:
ID CompanyID AccountantID
8 6706 346388
9 6706 346256
10 6706 26263
11 363392 358951
Then this sql is not bringing back the correct row:
DECLARE #CompanyID INT = 363392
DECLARE #AccountIDs TABLE (ID INT)
INSERT INTO #AccountIDs (ID) VALUES (358951)
INSERT INTO #AccountIDs (ID) VALUES (26263)
SELECT #CompanyID AS CompanyID, a.ID
FROM #AccountIDs a
WHERE NOT EXISTS(
SELECT *
FROM CompanyAccountantRef
WHERE CompanyID = #CompanyID
AND AccountantID IN (SELECT ID FROM #AccountIDs))
It should bring back
CompanyID AccountantID
363392 26263
And yes, an accountant can have more than one company.
What am I missing here? Is it the use of the IN that breaks it?
I've tried several different way including joins with no luck.
Thanks.

I'm not sure why you are confused. Consider the subquery:
SELECT *
FROM CompanyAccountantRef
WHERE CompanyID = #CompanyID AND
AccountantID IN (SELECT ID FROM #AccountIDs)
This returns one row, because CompanyAccountantRef has one matching row and its accountant id is 358951. There is a match in #AccountIds. Hence, the query returns one row.
Hence, the NOT EXISTS is false. And no rows are returned.
I suspect that you want a correlated subquery of some sort. I'm not sure what logic you are looking for. The logic you have doesn't seem particularly useful (either it returns all rows in #AccountIds or none).
If I had to guess, you a looking for something like:
SELECT #CompanyID AS CompanyID, a.ID
FROM #AccountIDs a
WHERE NOT EXISTS (SELECT 1
FROM CompanyAccountantRef car
WHERE car.CompanyID = #CompanyID AND
car.AccountantID = a.ID
);
At the very least, this returns the row you are looking for.

select #CompanyID, ID
from AccountIDs
except
select #CompanyID, AccountantID
from CompanyAccountantRef
where CompanyID = #CompanyID
and AccountantID in (SELECT ID FROM #AccountIDs)

Related

SQL: create a view comparing different rows

I have data for movies that looks something like this:
cast_id | cast_name | movie_id
1 A 11
2 B 11
3 C 11
4 D 12
5 E 12
1 A 13
I want to create a view where I compare two different cast members so that I will start with something like this:
CREATE VIEW compare(cast_id_1, cast_id_2, num_movies);
SELECT * FROM compare LIMIT 1;
(1,2,2)
where I am looking at actor A and actor B, who have a total of 2 movies between the two of them.
Not sure how to compare the two different rows and my searchers so far have been unsuccessful. Any help is much appreciated!
That's a self-join:
create view myview as
select t1.cast_id cast_id_1, t2.cast_id cast_id_2, count(*) num_movies
from mytable t1
inner join mytable t2 on t2.movie_id = t1.movie_id and t1.cast_id < t2.cast_id
group by t1.cast_id, t2.cast_id
Thives generates all combinations of cast members that once appeared in the same movie, with the total number of movies. Join condition t1.cast_id < t2.cast_id is there to avoid "mirror" records.
You can then query the view. If you want members that have two common movies (which is actually not showing in your sample data...):
select * from myview where num_movies = 2
I'm thinking a procedure might be helpful. This stored procedure takes the 2 cast_id's and num_movies as input parameters. It selects the movie_id's of the movies the two cast_id's have appeared in together. Then based on whether or not that number exceeds the num_movies parameter: either 1) a list of movies (release dates, director, etc.) is returned, else the message 'Were not in 2 movies together' is returned.
drop proc if exists TwoMovieActors;
go
create proc TwoMovieActors
#cast_id_1 int,
#cast_id_2 int,
#num_movies int
as
set nocount on;
declare #m table(movie_id int unique not null,
rn int not null);
declare #rows int;
with
cast_cte as (
select *, row_number() over (partition by movie_id order by cast_name) rn
from movie_casts mc
where cast_id in(#cast_id_1, #cast_id_2))
insert #m
select movie_id, row_number() over (order by movie_id) rn
from cast_cte
where rn=2
select #rows=##rowcount;
if #rows<#num_movies
select concat('Were not in ', cast(#num_movies as varchar(11)), ' movies together');
else
select m.movie_id, mv.movie_name, mv.release_date, mv.director
from #m m
join movies mv on m.movie_id=mv.movie_id;
To execute it would be something like
exec TwoMovieActors 1, 2, 2;

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principal -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
1000 lines each, hence finding somewhere else to host.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what may have been expected the CTE will likely not be materialised which would have ensured consistency for the self join and thus you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you read dirty entries. (Someone else deletes/insertes new data)
try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
i've tried this code (seems equal to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70

Updating Uncommitted data to a cell with in an UPDATE statement

I want to convert a table storing in Name-Value pair data to relational form in SQL Server 2008.
Source table
Strings
ID Type String
100 1 John
100 2 Milton
101 1 Johny
101 2 Gaddar
Target required
Customers
ID FirstName LastName
100 John Milton
101 Johny Gaddar
I am following the strategy given below,
Populate the Customer table with ID values in Strings Table
INSERT INTO CUSTOMERS SELECT DISTINCT ID FROM Strings
You get the following
Customers
ID FirstName LastName
100 NULL NULL
101 NULL NULL
Update Customers with the rest of the attributes by joining it to Strings using ID column. This way each record in Customers will have corresponding 2 matching records.
UPDATE Customers
SET FirstName = (CASE WHEN S.Type=1 THEN S.String ELSE FirstName)
LastName = (CASE WHEN S.Type=2 THEN S.String ELSE LastName)
FROM Customers
INNER JOIN Strings ON Customers.ID=Strings.ID
An intermediate state will be llike,
ID FirstName LastName ID Type String
100 John NULL 100 1 John
100 NULL Milton 100 2 Milton
101 Johny NULL 101 1 Johny
101 NULL Gaddar 101 2 Gaddar
But this is not working as expected. Because when assigning the values in the SET clause it is setting only the committed values instead of the uncommitted. Is there anyway to set uncommitted values (with in the processing time of query) in UPDATE statement?
PS: I am not looking for alternate solutions but make my approach work by telling SQL Server to use uncommitted data for UPDATE.
The easiest way to do it would be to split the update into two:
UPDATE Customers
SET FirstName = Strings.String
FROM Customers
INNER JOIN Strings ON Customers.ID=Strings.ID AND Strings.Type = 1
And then:
UPDATE Customers
SET LastName = Strings.String
FROM Customers
INNER JOIN Strings ON Customers.ID=Strings.ID AND Strings.Type = 2
There are probably ways to do it in one query such as a derived table, but unless that's a specific requirement I'd just use this approach.
Have a look at this, it should avoid all the steps you had
DECLARE #Table TABLE(
ID INT,
Type INT,
String VARCHAR(50)
)
INSERT INTO #Table (ID,[Type],String) SELECT 100 ,1 ,'John'
INSERT INTO #Table (ID,[Type],String) SELECT 100 ,2 ,'Milton'
INSERT INTO #Table (ID,[Type],String) SELECT 101 ,1 ,'Johny'
INSERT INTO #Table (ID,[Type],String) SELECT 101 ,2 ,'Gaddar'
SELECT IDs.ID,
tName.String NAME,
tSur.String Surname
FROM (
SELECT DISTINCT ID
FROM #Table
) IDs LEFT JOIN
#Table tName ON IDs.ID = tName.ID AND tName.[Type] = 1 LEFT JOIN
#Table tSur ON IDs.ID = tSur.ID AND tSur.[Type] = 2
OK, i do not think that you will find a solution to what you are looking for. From UPDATE (Transact-SQL) it states
Using UPDATE with the FROM Clause
The results of an UPDATE statement are
undefined if the statement includes a
FROM clause that is not specified in
such a way that only one value is
available for each column occurrence
that is updated, that is if the UPDATE
statement is not deterministic. For
example, in the UPDATE statement in
the following script, both rows in
Table1 meet the qualifications of the
FROM clause in the UPDATE statement;
but it is undefined which row from
Table1 is used to update the row in
Table2.
USE AdventureWorks;
GO
IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL
DROP TABLE dbo.Table1;
GO
IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL
DROP TABLE dbo.Table2;
GO
CREATE TABLE dbo.Table1
(ColA int NOT NULL, ColB decimal(10,3) NOT NULL);
GO
CREATE TABLE dbo.Table2
(ColA int PRIMARY KEY NOT NULL, ColB decimal(10,3) NOT NULL);
GO
INSERT INTO dbo.Table1 VALUES(1, 10.0), (1, 20.0), (1, 0.0);
GO
UPDATE dbo.Table2
SET dbo.Table2.ColB = dbo.Table2.ColB + dbo.Table1.ColB
FROM dbo.Table2
INNER JOIN dbo.Table1
ON (dbo.Table2.ColA = dbo.Table1.ColA);
GO
SELECT ColA, ColB
FROM dbo.Table2;
Astander is correct (I am accepting his answer). The update is not happening because of a read UNCOMMITTED issue but because of the multiple rows returned by the JOIN. I have verified this. UPDATE picks only the first row generated from the multiple records to update the original table. This is the behavior for MSSQL, Sybase and such RDMBMSs but Oracle does not allow this kind of an update an d it throws an error. I have verified this thing for MSSQL.
And again MSSQL does not support updating a cell with UNCOMMITTED data. Don't know the status with other RDBMSs. And I have no idea if anyRDBMS provides with in the query ISOLATION level management.
An alternate solution will be to do it in two steps, Aggregate to unpivot and then insert. This has lesser scans compared to methods given in above answers.
INSERT INTO Customers
SELECT
ID
,MAX(CASE WHEN Type = 1 THEN String ELSE NULL END) AS FirstName
,MAX(CASE WHEN Type = 2 THEN String ELSE NULL END) AS LastName
FROM Strings
GROUP BY ID
Thanks to my friend Roji Thomas for helping me with this.

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
this will work for the initial recursive link, but might not work for longer links
DECLARE #Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO #Table (ID,PARENTID) SELECT 1, 2
INSERT INTO #Table (ID,PARENTID) SELECT 2, 1
INSERT INTO #Table (ID,PARENTID) SELECT 3, 1
INSERT INTO #Table (ID,PARENTID) SELECT 4, 3
INSERT INTO #Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM #Table
DECLARE #ID INT
SELECT #ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM #Table
WHERE PARENTID = #ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM #Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> #UserID
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when ##rowcount=0!
Simple eh?
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
DECLARE #Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO #Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE #UserID INT
SELECT #UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
where [User_ID] = #UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from #Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = #UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(#User_ID INT)
RETURNS #SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
INSERT INTO #SubordinateUsers (User_ID, Distance) VALUES ( #User_ID, 0)
DECLARE #Distance INT, #Finished BIT
SELECT #Distance = 1, #Finished = 0
WHILE #Finished = 0
BEGIN
INSERT INTO #SubordinateUsers
SELECT S.User_ID, #Distance
FROM Users AS S
JOIN #SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN #SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF ##RowCount = 0
SET #Finished = 1
SET #Distance = #Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(#User_ID INT)
RETURNS #User TABLE (User_ID INT, Distance INT) AS BEGIN
IF #User_ID IS NULL
RETURN
DECLARE #Manager_ID INT
SELECT #Manager_ID = Manager_ID
FROM UserClasses WITH (NOLOCK)
WHERE User_ID = #User_ID
INSERT INTO #UserClasses (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(#Manager_ID)
INSERT INTO #User (User_ID, Distance) VALUES (#User_ID, 0)
RETURN
END
You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = #UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.

Select products where the category belongs to any category in the hierarchy

I have a products table that contains a FK for a category, the Categories table is created in a way that each category can have a parent category, example:
Computers
Processors
Intel
Pentium
Core 2 Duo
AMD
Athlon
I need to make a select query that if the selected category is Processors, it will return products that is in Intel, Pentium, Core 2 Duo, Amd, etc...
I thought about creating some sort of "cache" that will store all the categories in the hierarchy for every category in the db and include the "IN" in the where clause. Is this the best solution?
The best solution for this is at the database design stage. Your categories table needs to be a Nested Set. The article Managing Hierarchical Data in MySQL is not that MySQL specific (despite the title), and gives a great overview of the different methods of storing a hierarchy in a database table.
Executive Summary:
Nested Sets
Selects are easy for any depth
Inserts and deletes are hard
Standard parent_id based hierarchy
Selects are based on inner joins (so get hairy fast)
Inserts and deletes are easy
So based on your example, if your hierarchy table was a nested set your query would look something like this:
SELECT * FROM products
INNER JOIN categories ON categories.id = products.category_id
WHERE categories.lft > 2 and categories.rgt < 11
the 2 and 11 are the left and right respectively of the Processors record.
Looks like a job for a Common Table Expression.. something along the lines of:
with catCTE (catid, parentid)
as
(
select cat.catid, cat.catparentid from cat where cat.name = 'Processors'
UNION ALL
select cat.catid, cat.catparentid from cat inner join catCTE on cat.catparentid=catcte.catid
)
select distinct * from catCTE
That should select the category whose name is 'Processors' and any of it's descendents, should be able to use that in an IN clause to pull back the products.
I have done similar things in the past, first querying for the category ids, then querying for the products "IN" those categories. Getting the categories is the hard bit, and you have a few options:
If the level of nesting of categories is known or you can find an upper bound: Build a horrible-looking SELECT with lots of JOINs. This is fast, but ugly and you need to set a limit on the levels of the hierarchy.
If you have a relatively small number of total categories, query them all (just ids, parents), collect the ids of the ones you care about, and do a SELECT....IN for the products. This was the appropriate option for me.
Query up/down the hierarchy using a series of SELECTs. Simple, but relatively slow.
I believe recent versions of SQLServer have some support for recursive queries, but haven't used them myself.
Stored procedures can help if you don't want to do this app-side.
What you want to find is the transitive closure of the category "parent" relation. I suppose there's no limitation to the category hierarchy depth, so you can't formulate a single SQL query which finds all categories. What I would do (in pseudocode) is this:
categoriesSet = empty set
while new.size > 0:
new = select * from categories where parent in categoriesSet
categoriesSet = categoriesSet+new
So just keep on querying for children until no more are found. This behaves well in terms of speed unless you have a degenerated hierarchy (say, 1000 categories, each a child of another), or a large number of total categories. In the second case, you could always work with temporary tables to keep data transfer between your app and the database small.
Maybe something like:
select *
from products
where products.category_id IN
(select c2.category_id
from categories c1 inner join categories c2 on c1.category_id = c2.parent_id
where c1.category = 'Processors'
group by c2.category_id)
[EDIT] If the category depth is greater than one this would form your innermost query. I suspect that you could design a stored procedure that would drill down in the table until the ids returned by the inner query did not have children -- probably better to have an attribute that marks a category as a terminal node in the hierarchy -- then perform the outer query on those ids.
CREATE TABLE #categories (id INT NOT NULL, parentId INT, [name] NVARCHAR(100))
INSERT INTO #categories
SELECT 1, NULL, 'Computers'
UNION
SELECT 2, 1, 'Processors'
UNION
SELECT 3, 2, 'Intel'
UNION
SELECT 4, 2, 'AMD'
UNION
SELECT 5, 3, 'Pentium'
UNION
SELECT 6, 3, 'Core 2 Duo'
UNION
SELECT 7, 4, 'Athlon'
SELECT *
FROM #categories
DECLARE #id INT
SET #id = 2
; WITH r(id, parentid, [name]) AS (
SELECT id, parentid, [name]
FROM #categories c
WHERE id = #id
UNION ALL
SELECT c.id, c.parentid, c.[name]
FROM #categories c JOIN r ON c.parentid=r.id
)
SELECT *
FROM products
WHERE p.productd IN
(SELECT id
FROM r)
DROP TABLE #categories
The last part of the example isn't actually working if you're running it straight like this. Just remove the select from the products and substitute with a simple SELECT * FROM r
This should recurse down all the 'child' catagories starting from a given catagory.
DECLARE #startingCatagoryId int
DECLARE #current int
SET #startingCatagoryId = 13813 -- or whatever the CatagoryId is for 'Processors'
CREATE TABLE #CatagoriesToFindChildrenFor
(CatagoryId int)
CREATE TABLE #CatagoryTree
(CatagoryId int)
INSERT INTO #CatagoriesToFindChildrenFor VALUES (#startingCatagoryId)
WHILE (SELECT count(*) FROM #CatagoriesToFindChildrenFor) > 0
BEGIN
SET #current = (SELECT TOP 1 * FROM #CatagoriesToFindChildrenFor)
INSERT INTO #CatagoriesToFindChildrenFor
SELECT ID FROM Catagory WHERE ParentCatagoryId = #current AND Deleted = 0
INSERT INTO #CatagoryTree VALUES (#current)
DELETE #CatagoriesToFindChildrenFor WHERE CatagoryId = #current
END
SELECT * FROM #CatagoryTree ORDER BY CatagoryId
DROP TABLE #CatagoriesToFindChildrenFor
DROP TABLE #CatagoryTree
i like to use a stack temp table for hierarchal data.
here's a rough example -
-- create a categories table and fill it with 10 rows (with random parentIds)
CREATE TABLE Categories ( Id uniqueidentifier, ParentId uniqueidentifier )
GO
INSERT
INTO Categories
SELECT NEWID(),
NULL
GO
INSERT
INTO Categories
SELECT TOP(1)NEWID(),
Id
FROM Categories
ORDER BY Id
GO 9
DECLARE #lvl INT, -- holds onto the level as we move throught the hierarchy
#Id Uniqueidentifier -- the id of the current item in the stack
SET #lvl = 1
CREATE TABLE #stack (item UNIQUEIDENTIFIER, [lvl] INT)
-- we fill fill this table with the ids we want
CREATE TABLE #tmpCategories (Id UNIQUEIDENTIFIER)
-- for this example we’ll just select all the ids
-- if we want all the children of a specific parent we would include it’s id in
-- this where clause
INSERT INTO #stack SELECT Id, #lvl FROM Categories WHERE ParentId IS NULL
WHILE #lvl > 0
BEGIN -- begin 1
IF EXISTS ( SELECT * FROM #stack WHERE lvl = #lvl )
BEGIN -- begin 2
SELECT #Id = [item]
FROM #stack
WHERE lvl = #lvl
INSERT INTO #tmpCategories
SELECT #Id
DELETE FROM #stack
WHERE lvl = #lvl
AND item = #Id
INSERT INTO #stack
SELECT Id, #lvl + 1
FROM Categories
WHERE ParentId = #Id
IF ##ROWCOUNT > 0
BEGIN -- begin 3
SELECT #lvl = #lvl + 1
END -- end 3
END -- end 2
ELSE
SELECT #lvl = #lvl - 1
END -- end 1
DROP TABLE #stack
SELECT * FROM #tmpCategories
DROP TABLE #tmpCategories
DROP TABLE Categories
there is a good explanation here link text
My answer to another question from a couple days ago applies here... recursion in SQL
There are some methods in the book which I've linked which should cover your situation nicely.