Select all hierarchy level and below SQL Server - sql

I am having a difficult time with this one. I have seen a few examples on how to obtain all child records from a self referencing table given a parent and even how to get the parents of child records.
What I am trying to do is return a record and all child records given the ID.
To put this into context - I have a corporate hierarchy. Where:
#Role Level#
--------------------
Corporate 0
Region 1
District 2
Rep 3
What I need is a procedure that (1) figures out what level the record is and (2) retrieves that record and all children records.
The idea being a Region can see all districts and reps in a district, Districts can see their reps. Reps can only see themselves.
I have table:
ID ParentId Name
-------------------------------------------------------
1 Null Corporate HQ
2 1 South Region
3 1 North Region
4 1 East Region
5 1 West Region
6 3 Chicago District
7 3 Milwaukee District
8 3 Minneapolis District
9 6 Gold Coast Dealer
10 6 Blue Island Dealer
How do I do this:
CREATE PROCEDURE GetPositions
#id int
AS
BEGIN
--What is the most efficient way to do this--
END
GO
For example the expected result for #id = 3, I would want to return:
3, 6, 7, 8, 9, 10
I'd appreciate any help or ideas on this.

You could do this via a recursive CTE:
DECLARE #id INT = 3;
WITH rCTE AS(
SELECT *, 0 AS Level FROM tbl WHERE Id = #id
UNION ALL
SELECT t.*, r.Level + 1 AS Level
FROM tbl t
INNER JOIN rCTE r
ON t.ParentId = r.ID
)
SELECT * FROM rCTE OPTION(MAXRECURSION 0);
ONLINE DEMO

Assuming that you're on a reasonably modern version of SQL Server, you can use the hierarchyid datatype with a little bit of elbow grease. First, the setup:
alter table [dbo].[yourTable] add [path] hierarchyid null;
Next, we'll populate the new column:
with cte as (
select *, cast(concat('/', ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable]
where [ParentID] is null
union all
select child.*,
cast(concat(parent.path, child.ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable] as child
join cte as parent
on child.ParentID = parent.ID
)
update t
set path = c.path
from [dbo].[yourTable] as t
join cte as c
on t.ID = c.ID;
This is just a bog standard recursive table expression with one calculated column that represents the hierarchy. That's the hard part. Now, your procedure can look something like this:
create procedure dbo.GetPositions ( #id int ) as
begin
declare #h hierarchyid
set #h = (select Path from [dbo].[yourTable] where ID = #id);
select ID, ParentID, Name
from [dbo].[yourTable]
where Path.IsDescendentOf(#h) = 1;
end
So, to wrap up, all you're doing with the hierarchyid is storing the lineage for a given row so that you don't have to calculate it on the fly at select time.

Related

How to group hierarchical relationships together in SQL Server

I have a column name Parent and Child in table Example and Below is the Table Data
| Parent | Child |
|---------------------|------------------|
| 100 | 101 |
|---------------------|------------------|
| 101 | 102 |
|---------------------|------------------|
| 200 | 201 |
|---------------------|------------------|
| 103 | 102 |
|---------------------|------------------|
| 202 | 201 |
|---------------------|------------------|
If i give the input as 100 i should get the result as 100,101,102,103 Since 100->101->102->103 and also if i give the input as 102 then it should give the same above result. 102->101->100 and 102->103. I need to achieve this using stored Procedure only.
Below is the sample Code which i am trying
CREATE PROCEDURE GetAncestors(#thingID varchar(MAX))
AS
BEGIN
SET NOCOUNT ON;
WITH
CTE
AS
(
SELECT
Example.Parent, Example.Child
FROM Example
WHERE Parent = #thingID or Child = #thingID
UNION ALL
SELECT
Example.Parent, Example.Child
FROM
CTE
INNER JOIN Example ON Example.Parent = CTE.Child
)
SELECT
Parent AS Result
FROM CTE
UNION
SELECT
Child AS Result
FROM CTE
;
END
GO
The problem with your attempt is filtering at the start. If I'm right, your want to cluster your data (group them all together) by their relationships, either ascendant or descendant, or a mix of them. For example ID 100 has child 101, which has another child 102, but 102 has a parent 103 and you want the result to be these four (100, 101, 102, 103) for any input that is in that set. This is why you can't filter at the start, since you don't have any means of knowing which relationship will be chained throughout another relationship.
Solving this isn't as simple as it seems and you won't be able to solve it with just 1 recursion.
The following is a solution I made long time ago to group all these relationships together. Keep in mind that, for large datasets (over 100k), it might take a while to calculate, since it has to identify all groups first, and select the result at the end.
CREATE PROCEDURE GetAncestors(#thingID INT)
AS
BEGIN
SET NOCOUNT ON
-- Load your data
IF OBJECT_ID('tempdb..#TreeRelationship') IS NOT NULL
DROP TABLE #TreeRelationship
CREATE TABLE #TreeRelationship (
RelationID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
Parent INT,
Child INT,
GroupID INT)
INSERT INTO #TreeRelationship (
Parent,
Child)
SELECT
Parent = D.Parent,
Child = D.Child
FROM
Example AS D
UNION -- Data has to be loaded in both ways (direct and reverse) for algorithm to work correctly
SELECT
Parent = D.Child,
Child = D.Parent
FROM
Example AS D
-- Start algorithm
IF OBJECT_ID('tempdb..#FirstWork') IS NOT NULL
DROP TABLE #FirstWork
CREATE TABLE #FirstWork (
Parent INT,
Child INT,
ComponentID INT)
CREATE CLUSTERED INDEX CI_FirstWork ON #FirstWork (Parent, Child)
INSERT INTO #FirstWork (
Parent,
Child,
ComponentID)
SELECT DISTINCT
Parent = T.Parent,
Child = T.Child,
ComponentID = ROW_NUMBER() OVER (ORDER BY T.Parent, T.Child)
FROM
#TreeRelationship AS T
IF OBJECT_ID('tempdb..#SecondWork') IS NOT NULL
DROP TABLE #SecondWork
CREATE TABLE #SecondWork (
Component1 INT,
Component2 INT)
CREATE CLUSTERED INDEX CI_SecondWork ON #SecondWork (Component1)
DECLARE #v_CurrentDepthLevel INT = 0
WHILE #v_CurrentDepthLevel < 100 -- Relationships depth level can be controlled with this value
BEGIN
SET #v_CurrentDepthLevel = #v_CurrentDepthLevel + 1
TRUNCATE TABLE #SecondWork
INSERT INTO #SecondWork (
Component1,
Component2)
SELECT DISTINCT
Component1 = t1.ComponentID,
Component2 = t2.ComponentID
FROM
#FirstWork t1
INNER JOIN #FirstWork t2 on
t1.child = t2.parent OR
t1.parent = t2.parent
WHERE
t1.ComponentID <> t2.ComponentID
IF (SELECT COUNT(*) FROM #SecondWork) = 0
BREAK
UPDATE #FirstWork SET
ComponentID = CASE WHEN items.ComponentID < target THEN items.ComponentID ELSE target END
FROM
#FirstWork items
INNER JOIN (
SELECT
Source = Component1,
Target = MIN(Component2)
FROM
#SecondWork
GROUP BY
Component1
) new_components on new_components.source = ComponentID
UPDATE #FirstWork SET
ComponentID = target
FROM #FirstWork items
INNER JOIN(
SELECT
source = component1,
target = MIN(component2)
FROM
#SecondWork
GROUP BY
component1
) new_components ON new_components.source = ComponentID
END
;WITH Groupings AS
(
SELECT
parent,
child,
group_id = DENSE_RANK() OVER (ORDER BY ComponentID DESC)
FROM
#FirstWork
)
UPDATE FG SET
GroupID = IT.group_id
FROM
#TreeRelationship FG
INNER JOIN Groupings IT ON
IT.parent = FG.parent AND
IT.child = FG.child
-- Select the proper result
;WITH IdentifiedGroup AS
(
SELECT TOP 1
T.GroupID
FROM
#TreeRelationship AS T
WHERE
T.Parent = #thingID
)
SELECT DISTINCT
Result = T.Parent
FROM
#TreeRelationship AS T
INNER JOIN IdentifiedGroup AS I ON T.GroupID = I.GroupID
END
You will see that for #thingID of value 100, 101, 102 and 103 the result are these four, and for values 200, 201 and 202 the results are these three.
I'm pretty sure this isn't an optimal solution, but it gives the correct output and I never had the need to tune it up since it works fast for my requirements.
Here is a cut-down version of the query from a more generic question How to find all connected subgraphs of an undirected graph
The main idea is to treat (Parent,Child) pairs as edges in a graph and traverse all connected edges starting from a given node.
Since the graph is undirectional we build a list of pairs in both directions in CTE_Pairs at first.
CTE_Recursive follows the edges of a graph and stops when it detects a loop. It builds a path of visited nodes as a string in IDPath and stops the recursion if the new node is in the path (has been visited before).
Final CTE_CleanResult puts all found nodes in one simple list.
CREATE PROCEDURE GetAncestors(#thingID varchar(8000))
AS
BEGIN
SET NOCOUNT ON;
WITH
CTE_Pairs
AS
(
SELECT
CAST(Parent AS varchar(8000)) AS ID1
,CAST(Child AS varchar(8000)) AS ID2
FROM Example
WHERE Parent <> Child
UNION
SELECT
CAST(Child AS varchar(8000)) AS ID1
,CAST(Parent AS varchar(8000)) AS ID2
FROM Example
WHERE Parent <> Child
)
,CTE_Recursive
AS
(
SELECT
ID1 AS AnchorID
,ID1
,ID2
,CAST(',' + ID1 + ',' + ID2 + ',' AS varchar(8000)) AS IDPath
,1 AS Lvl
FROM
CTE_Pairs
WHERE ID1 = #thingID
UNION ALL
SELECT
CTE_Recursive.AnchorID
,CTE_Pairs.ID1
,CTE_Pairs.ID2
,CAST(CTE_Recursive.IDPath + CTE_Pairs.ID2 + ',' AS varchar(8000)) AS IDPath
,CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.ID2 = CTE_Pairs.ID1
WHERE
CTE_Recursive.IDPath NOT LIKE '%,' + CTE_Pairs.ID2 + ',%'
)
,CTE_RecursionResult
AS
(
SELECT AnchorID, ID1, ID2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorID, ID1 AS ID
FROM CTE_RecursionResult
UNION
SELECT AnchorID, ID2 AS ID
FROM CTE_RecursionResult
)
SELECT ID
FROM CTE_CleanResult
ORDER BY ID
OPTION(MAXRECURSION 0);
END;
you can simply use graph processing introduced in SQL‌ Server 2017.
here is an example
https://www.red-gate.com/simple-talk/sql/t-sql-programming/sql-graph-objects-sql-server-2017-good-bad/

What is the query to get parent records against specific child or get child records against parent?

Scenario:
I have data in the following hierarchy format in my table:
PERSON_ID Name PARENT_ID
1 Azeem 1
2 Farooq 2
3 Ahsan 3
4 Waqas 1
5 Adnan 1
6 Talha 2
7 Sami 2
8 Arshad 2
9 Hassan 8
E.g
Hassan is child of parent_id 8 which is (Arshad)
and Arshad is child of parent_id 2 which is (Farooq)
What I want:
First of all, I want to find all parent of parent of specific parent_id.
For Example: If I want to find the parent of Hassan then I also get the Parent of Hassan and also get its parent (Hassan -> Arshad -> Farooq)
Second, I want to find all child of Farooq like (Farooq -> Arshad -> Hassan)
Third, If Azeem is also have same parent like (Azeem -> Azeem) then show me this record.
What I've tried yet:
DECLARE #id INT
SET #id = 9
;WITH T AS (
SELECT p.PERSON_ID,p.Name, p.PARENT_ID
FROM hierarchy p
WHERE p.PERSON_ID = #id AND p.PERSON_ID != p.PARENT_ID
UNION ALL
SELECT c.PERSON_ID,c.Name, c.PARENT_ID
FROM hierarchy c
JOIN T h ON h.PARENT_ID = c.PERSON_ID)
SELECT h.PERSON_ID,h.Name FROM T h
and Its shows me below error:
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
If I understand your question correctly that you don't want to insert null values in Parent_ID column then you should replace NULL with 0 and your updated code will be like:
;WITH DATA AS (
SELECT p.PERSON_ID,p.Name, p.PARENT_ID
FROM hierarchy p
WHERE p.PERSON_ID = 9
UNION ALL
SELECT c.PERSON_ID,c.Name, c.PARENT_ID
FROM hierarchy c
JOIN DATA h
ON c.PERSON_ID = h.PARENT_ID
)
select * from DATA;
You have an infinite loop in your data: Azeem is his own parent. You need to either make his value NULL or change your condition to WHERE p.parent_id = #id AND p.parent_id != p.child_id.
Also, I feel you have your columns named the wrong way around - the primary-key should be named person_id instead of parent_id and your column named child_id actually points to that person's parent, so it should be named parent_id instead.
Well I found a way for my above case which is:
If I have below table structure:
PERSON_ID Name PARENT_ID
1 Azeem NULL
2 Farooq NULL
3 Ahsan NULL
4 Waqas 1
5 Adnan 1
6 Talha 2
7 Sami 2
8 Arshad 2
9 Hassan 8
Then I tried below query which working fine in case when Parent_ID have NULL values means there is no more parent of that record.
DECLARE #id INT
SET #id = 2
Declare #Table table(
PERSON_ID bigint,
Name varchar(50),
PARENT_ID bigint
);
;WITH T AS (
SELECT p.PERSON_ID,p.Name, p.PARENT_ID
FROM hierarchy p
WHERE p.PERSON_ID = #id AND p.PERSON_ID != p.PARENT_ID
UNION ALL
SELECT c.PERSON_ID,c.Name, c.PARENT_ID
FROM hierarchy c
JOIN T h ON h.PARENT_ID = c.PERSON_ID)
insert into #table
select * from T;
IF exists(select * from #table)
BEGIN
select PERSON_ID,Name from #table
End
Else
Begin
select PERSON_ID,Name from Hierarchy
where PERSON_ID = #id
end
Above query show me the desire output when I set the parameter value #id = 1
Above query show me the desire output when I set the parameter value #id = 9
Issue:
I don't want to insert null values in Parent_ID like if there is no more Parent of that Person then I insert same Person_ID in Parent_ID column.
If I replace null values with there person_id then I got below error.
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

Query to List all hierarchical parents and siblings and their childrens, but not list own childrens

I've a basic SQL table with a simple heirarchial connection between each rows. That is there is a ParentID for Every rows and using that its connecting with another row. Its as follows
AccountID | AccountName | ParentID
---------------------------------------
1 Mathew 0
2 Philip 1
3 John 2
4 Susan 2
5 Anita 1
6 Aimy 1
7 Elsa 3
8 Anna 7
.............................
.................................
45 Kristoff 8
Hope the structure is clear
But my requirement of listng these is a little weird. That is when we pass an AccountID, it should list all its parents and siblings and siblings childrens. But it never list any child of that AccountID to any level. I can explain that in little more detail with a picture. Sorry for the clarity of the picture.. mine is an old phone cam..
When we pass the AccountID 4, it should list all Parents and its siblings, but it should not list 4,6,7,8,9,10. That means that account and any of it childrens should be avoid in the result (Based on the picture tree elements). Hope the explanation is clear.
If I've got it right and you need to output whole table except 4 and all of it's descendants then try this recursive query:
WITH CT AS
(
SELECT * FROM T WHERE AccountID=4
UNION ALL
SELECT T.* FROM T
JOIN CT ON T.ParentID = CT.AccountId
)
SELECT * FROM T WHERE AccountID
NOT IN (SELECT AccountID FROM CT)
SQLFiddle demo
Answering to the question in the comment:
So it will not traverse to the top. It only traverse to specified
account. For example if I pass 4 as first parameter and 2 as second
parameter, the result should be these values 2,5,11,12
You should start from the ID=2 and travel to the bottom exclude ID=4 so you cut whole subtree after ID=4:
WITH CT AS
(
SELECT * FROM T WHERE AccountID=2
UNION ALL
SELECT T.* FROM T
JOIN CT ON T.ParentID = CT.AccountId
WHERE T.AccountId<>4
)
SELECT * FROM CT
Try this:
;with cte as
(select accountid,parentid, 0 as level from tbl
where parentid = 0
union all
select t.accountid,t.parentid,(level+1) from
cte c inner join tbl t on c.accountid= t.parentid
)
select * from cte
where level < (select level from cte where accountid = #accountid)
When you pass in the parameter #accountid this will return the accountid values of all nodes on levels before that of the parameter.
If you want to return everything on the same level as input except input itself, you can change the where clause to;
where level <=(select level from cte where accountid= #accountid )
and accountid <> #accountid
In your example, if #accountid = 4, this will return the values 1,2,3 (ancestors) as well as 5,13,14 (siblings).
Does this return what you are after?
declare #AccountID int
set #AccountID = 4
;with parents
as (
select AccountID, AccountName, ParentID
from Account
where AccountID = (select ParentID from Account Where AccountID = #AccountID)
union all
select A.AccountID, A.AccountName, A.ParentID
from Account as A
join parents as P
on P.ParentID = A.AccountID
),
children
as (
select AccountID, AccountName, ParentID
from parents
union all
select A.AccountID, A.AccountName, A.ParentID
from Account as A
join children as C
on C.AccountID = A.ParentID
where A.AccountID <> #AccountID
)
select distinct AccountID, AccountName, ParentID
from children
order by AccountID
For me it sounds like you want to go up in the tree. So considering this test data
DECLARE #tbl TABLE(AccountID INT,AccountName VARCHAR(100),ParentID INT)
INSERT INTO #tbl
VALUES
(1,'Mathew',0),
(2,'Philip',1),
(3,'John',2),
(4,'Susan',2),
(5,'Anita',1),
(6,'Aimy',1),
(7,'Elsa',3),
(8,'Anna',7)
The I would write a query like this:
DECLARE #AcountID INT=4
;WITH CTE
AS
(
SELECT
tbl.AccountID,
tbl.AccountName,
tbl.ParentID
FROM
#tbl AS tbl
WHERE
tbl.AccountID=#AcountID
UNION ALL
SELECT
tbl.AccountID,
tbl.AccountName,
tbl.ParentID
FROM
#tbl AS tbl
JOIN CTE
ON CTE.ParentID=tbl.AccountID
)
SELECT
*
FROM
CTE
WHERE
NOT CTE.AccountID=#AcountID
This will return a result like this:
2 Philip 1
1 Mathew 0

Need query to select direct and indirect customerID aliases

I need a query that will return all related alias id's from either column. Shown here are some alias customer ids, among thousands of other rows. If the input parameter to a query is id=7, I need a query that would return 5 rows (1,5,7,10,22). That is because they are all aliases of one-another. For example, 22 and 10 are indirect aliases of 7.
CustomerAlias
--------------------------
AliasCuID AliasCuID2
--------------------------
1 5
1 7
5 7
10 5
22 1
Here is an excerpt from the customer table.
Customer
----------------------------------
CuID CuFirstName CuLastName
----------------------------------
1 Mike Jones
2 Fred Smith
3 Jack Jackson
4 Emily Simpson
5 Mike Jones
6 Beth Smith
7 Mike jones
8 Jason Robard
9 Emilie Jiklonmie
10 Michael jones
11 Mark Lansby
12 Scotty Slash
13 Emilie Jiklonmy
22 mike jones
I've been able to come close, but I cannot seem to select the indirectly related aliases correctly. Given this query:
SELECT DISTINCT Customer.CuID, Customer.CuFirstName, Customer.CuLastName
FROM Customer WHERE
(Customer.CuID = 7) OR (Customer.CuID IN
(SELECT AliasCuID2
FROM CustomerAlias AS CustomerAlias_2
WHERE (AliasCuID = 7))) OR (Customer.CuID IN
(SELECT AliasCuID
FROM CustomerAlias AS CustomerAlias_1
WHERE (AliasCuID2 = 7)))
Returns 3 out of 5 of the desired ids of course. This lacks the indirectly related aliased id of 10 and 22 in the result rows.
1 Mike Jones
5 Mike Jones
7 Mike jones
* Based on suggestions below, I am trying a CTE hierarchical query.
I have this now after following some suggestions. It works for some, as long as the records in the table reference enough immediate ids. But, if the query uses id=10, then it still comes up short, just by the nature of the data.
DECLARE #id INT
SET #id = 10;
DECLARE #tmp TABLE ( a1 INT, a2 INT, Lev INT );
WITH Results (AliasCuID, AliasCuID2, [Level]) AS (
SELECT AliasCuID,
AliasCuID2,
0 as [Level]
FROM CustomerAlias
WHERE AliasCuID = #id OR AliasCuID2 = #id
UNION ALL
-- Recursive step
SELECT a.AliasCuID,
a.AliasCuID2,
r.[Level] + 1 AS [Level]
FROM CustomerAlias a
INNER JOIN Results r ON a.AliasCuID = r.AliasCuID2 )
INSERT INTO #tmp
SELECT * FROM Results;
WITH Results3 (AliasCuID, AliasCuID2, [Level]) AS (
SELECT AliasCuID,
AliasCuID2,
0 as [Level]
FROM CustomerAlias
WHERE AliasCuID = #id OR AliasCuID2 = #id
UNION ALL
-- Recursive step
SELECT a.AliasCuID,
a.AliasCuID2,
r.[Level] + 1 AS [Level]
FROM CustomerAlias a
INNER JOIN Results3 r ON a.AliasCuID2 = r.AliasCuID )
INSERT INTO #tmp
SELECT * FROM Results3;
SELECT DISTINCT a1 AS id FROM #tmp
UNION ALL
SELECT DISTINCT a2 AS id FROM #tmp
ORDER BY id
Note that this is a simplified the query to just give a list of related ids.
---
id
---
5
5
7
10
But, it is still unable to pull in ids 1 and 22.
This is not an easy problem to solve unless you have some idea of the depth of your search (https://stackoverflow.com/a/7569520/1803682) - which it looks like you do not - and take a brute force approach to it.
Assuming you do not know the depth you will need to write a stored proc. I followed this approach for a nearly identical problem: https://dba.stackexchange.com/questions/7147/find-highest-level-of-a-hierarchical-field-with-vs-without-ctes/7161#7161
UPDATE
If you don't care about the chain of how the alias's were created - I would run a script recursively to make them all refer to a single (master?) record. Then you can easily do the search and it will be quick - not a solution if you care about how the alias's get traversed though.
I created a SQL Fiddle for SQL Server 2012. Please let me know if you can or cannot access it.
My thought here was that you'd want to just keep checking the left and right branches recursively, separately. This logic probably falls apart if the relationships bounce between left and right. You could set up a third CTE to reference the first two, but joining on left to right and right to left, but ain't nobody got time for that.
The code is below as well.
CREATE TABLE CustomerAlias
(
AliasCuID INT,
AliasCuID2 INT
)
GO
INSERT INTO CustomerAlias
SELECT 1,5
UNION SELECT 1, 7
UNION SELECT 5, 7
UNION SELECT 10, 5
UNION SELECT 22, 1
GO
DECLARE #Value INT
SET #Value = 7
; WITH LeftAlias AS
(
SELECT AliasCuID
, AliasCuID2
FROM CustomerAlias
WHERE AliasCuID2 = #Value
UNION ALL
SELECT a.AliasCuID
, a.AliasCuID2
FROM CustomerAlias a
JOIN LeftAlias b
ON a.AliasCuID = b.AliasCuID2
)
, RightAlias AS
(
SELECT AliasCuID
, AliasCuID2
FROM CustomerAlias
WHERE AliasCuID = #Value
UNION ALL
SELECT a.AliasCuID
, a.AliasCuID2
FROM CustomerAlias a
JOIN LeftAlias b
ON a.AliasCuID2 = b.AliasCuID
)
SELECT DISTINCT A
FROM
(
SELECT A = AliasCuID
FROM LeftAlias
UNION ALL
SELECT A = AliasCuID2
FROM LeftAlias
UNION ALL
SELECT A = AliasCuID
FROM RightAlias
UNION ALL
SELECT A = AliasCuID2
FROM RightAlias
) s
ORDER BY A

Select products where the category belongs to any category in the hierarchy

I have a products table that contains a FK for a category, the Categories table is created in a way that each category can have a parent category, example:
Computers
Processors
Intel
Pentium
Core 2 Duo
AMD
Athlon
I need to make a select query that if the selected category is Processors, it will return products that is in Intel, Pentium, Core 2 Duo, Amd, etc...
I thought about creating some sort of "cache" that will store all the categories in the hierarchy for every category in the db and include the "IN" in the where clause. Is this the best solution?
The best solution for this is at the database design stage. Your categories table needs to be a Nested Set. The article Managing Hierarchical Data in MySQL is not that MySQL specific (despite the title), and gives a great overview of the different methods of storing a hierarchy in a database table.
Executive Summary:
Nested Sets
Selects are easy for any depth
Inserts and deletes are hard
Standard parent_id based hierarchy
Selects are based on inner joins (so get hairy fast)
Inserts and deletes are easy
So based on your example, if your hierarchy table was a nested set your query would look something like this:
SELECT * FROM products
INNER JOIN categories ON categories.id = products.category_id
WHERE categories.lft > 2 and categories.rgt < 11
the 2 and 11 are the left and right respectively of the Processors record.
Looks like a job for a Common Table Expression.. something along the lines of:
with catCTE (catid, parentid)
as
(
select cat.catid, cat.catparentid from cat where cat.name = 'Processors'
UNION ALL
select cat.catid, cat.catparentid from cat inner join catCTE on cat.catparentid=catcte.catid
)
select distinct * from catCTE
That should select the category whose name is 'Processors' and any of it's descendents, should be able to use that in an IN clause to pull back the products.
I have done similar things in the past, first querying for the category ids, then querying for the products "IN" those categories. Getting the categories is the hard bit, and you have a few options:
If the level of nesting of categories is known or you can find an upper bound: Build a horrible-looking SELECT with lots of JOINs. This is fast, but ugly and you need to set a limit on the levels of the hierarchy.
If you have a relatively small number of total categories, query them all (just ids, parents), collect the ids of the ones you care about, and do a SELECT....IN for the products. This was the appropriate option for me.
Query up/down the hierarchy using a series of SELECTs. Simple, but relatively slow.
I believe recent versions of SQLServer have some support for recursive queries, but haven't used them myself.
Stored procedures can help if you don't want to do this app-side.
What you want to find is the transitive closure of the category "parent" relation. I suppose there's no limitation to the category hierarchy depth, so you can't formulate a single SQL query which finds all categories. What I would do (in pseudocode) is this:
categoriesSet = empty set
while new.size > 0:
new = select * from categories where parent in categoriesSet
categoriesSet = categoriesSet+new
So just keep on querying for children until no more are found. This behaves well in terms of speed unless you have a degenerated hierarchy (say, 1000 categories, each a child of another), or a large number of total categories. In the second case, you could always work with temporary tables to keep data transfer between your app and the database small.
Maybe something like:
select *
from products
where products.category_id IN
(select c2.category_id
from categories c1 inner join categories c2 on c1.category_id = c2.parent_id
where c1.category = 'Processors'
group by c2.category_id)
[EDIT] If the category depth is greater than one this would form your innermost query. I suspect that you could design a stored procedure that would drill down in the table until the ids returned by the inner query did not have children -- probably better to have an attribute that marks a category as a terminal node in the hierarchy -- then perform the outer query on those ids.
CREATE TABLE #categories (id INT NOT NULL, parentId INT, [name] NVARCHAR(100))
INSERT INTO #categories
SELECT 1, NULL, 'Computers'
UNION
SELECT 2, 1, 'Processors'
UNION
SELECT 3, 2, 'Intel'
UNION
SELECT 4, 2, 'AMD'
UNION
SELECT 5, 3, 'Pentium'
UNION
SELECT 6, 3, 'Core 2 Duo'
UNION
SELECT 7, 4, 'Athlon'
SELECT *
FROM #categories
DECLARE #id INT
SET #id = 2
; WITH r(id, parentid, [name]) AS (
SELECT id, parentid, [name]
FROM #categories c
WHERE id = #id
UNION ALL
SELECT c.id, c.parentid, c.[name]
FROM #categories c JOIN r ON c.parentid=r.id
)
SELECT *
FROM products
WHERE p.productd IN
(SELECT id
FROM r)
DROP TABLE #categories
The last part of the example isn't actually working if you're running it straight like this. Just remove the select from the products and substitute with a simple SELECT * FROM r
This should recurse down all the 'child' catagories starting from a given catagory.
DECLARE #startingCatagoryId int
DECLARE #current int
SET #startingCatagoryId = 13813 -- or whatever the CatagoryId is for 'Processors'
CREATE TABLE #CatagoriesToFindChildrenFor
(CatagoryId int)
CREATE TABLE #CatagoryTree
(CatagoryId int)
INSERT INTO #CatagoriesToFindChildrenFor VALUES (#startingCatagoryId)
WHILE (SELECT count(*) FROM #CatagoriesToFindChildrenFor) > 0
BEGIN
SET #current = (SELECT TOP 1 * FROM #CatagoriesToFindChildrenFor)
INSERT INTO #CatagoriesToFindChildrenFor
SELECT ID FROM Catagory WHERE ParentCatagoryId = #current AND Deleted = 0
INSERT INTO #CatagoryTree VALUES (#current)
DELETE #CatagoriesToFindChildrenFor WHERE CatagoryId = #current
END
SELECT * FROM #CatagoryTree ORDER BY CatagoryId
DROP TABLE #CatagoriesToFindChildrenFor
DROP TABLE #CatagoryTree
i like to use a stack temp table for hierarchal data.
here's a rough example -
-- create a categories table and fill it with 10 rows (with random parentIds)
CREATE TABLE Categories ( Id uniqueidentifier, ParentId uniqueidentifier )
GO
INSERT
INTO Categories
SELECT NEWID(),
NULL
GO
INSERT
INTO Categories
SELECT TOP(1)NEWID(),
Id
FROM Categories
ORDER BY Id
GO 9
DECLARE #lvl INT, -- holds onto the level as we move throught the hierarchy
#Id Uniqueidentifier -- the id of the current item in the stack
SET #lvl = 1
CREATE TABLE #stack (item UNIQUEIDENTIFIER, [lvl] INT)
-- we fill fill this table with the ids we want
CREATE TABLE #tmpCategories (Id UNIQUEIDENTIFIER)
-- for this example we’ll just select all the ids
-- if we want all the children of a specific parent we would include it’s id in
-- this where clause
INSERT INTO #stack SELECT Id, #lvl FROM Categories WHERE ParentId IS NULL
WHILE #lvl > 0
BEGIN -- begin 1
IF EXISTS ( SELECT * FROM #stack WHERE lvl = #lvl )
BEGIN -- begin 2
SELECT #Id = [item]
FROM #stack
WHERE lvl = #lvl
INSERT INTO #tmpCategories
SELECT #Id
DELETE FROM #stack
WHERE lvl = #lvl
AND item = #Id
INSERT INTO #stack
SELECT Id, #lvl + 1
FROM Categories
WHERE ParentId = #Id
IF ##ROWCOUNT > 0
BEGIN -- begin 3
SELECT #lvl = #lvl + 1
END -- end 3
END -- end 2
ELSE
SELECT #lvl = #lvl - 1
END -- end 1
DROP TABLE #stack
SELECT * FROM #tmpCategories
DROP TABLE #tmpCategories
DROP TABLE Categories
there is a good explanation here link text
My answer to another question from a couple days ago applies here... recursion in SQL
There are some methods in the book which I've linked which should cover your situation nicely.