SQL Tree / Hierarchical Data - sql

This is my first post. I am trying to make a SQL tree table that can be traversed. For example, if a person clicks on a drop-down list called Categories, it will display Electric and InterC. Then, if the user clicks on Electric, it will drop down Relays and Switches; next, if the person clicks on Relays it will drop down X Relays, and if the person clicks on Switches it will drop down Y Switches. My attempt is below, but the part I don't understand is this: if I have another category, InterC, how do I make that another level of drop-downs?
Table Category
create table test(id int,parentId int,name varchar(50));
insert test select 1, 0,'Electric'
insert test select 2, 1,'Relays'
insert test select 3, 1,'Switches'
insert test select 5, 2,'X Relays'
insert test select 6, 2,'Y Switches'
insert test select 7, 0,'InterC'
insert test select 8, 1,'x Sockets'
insert test select 9, 1,'y Sockets'
insert test select 10, 2,'X Relays'
insert test select 11, 2,'Y Relays'
;WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test WHERE parentid = 0
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid

Your hierarchical T-SQL query should return all the records in the table, both those under Electric and InterC.
However, you should make parentId nullable and give the root records a NULL rather than 0. That will let you add a foreign key that protects your data integrity (it won't be possible to add orphaned records by mistake).
Your hierarchy query returns all of your records. I'm guessing that you want to return just one hierarchy at a time; for that, add a WHERE condition to the anchor (starting) query.
DECLARE @category varchar(50) = 'Electric';
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test
WHERE name = @category AND
parentId is null
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Then set @category to 'Electric' or 'InterC' to get one hierarchy or the other.
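A minimal sketch of the suggested layout, run through Python's built-in sqlite3 module instead of SQL Server (an assumption made so the example is self-contained). Roots have a NULL parentId, a self-referencing foreign key rejects orphans, and a named parameter plays the role of @category. The sample rows are our reading of the intended tree (sockets under InterC, Y Switches under Switches), not the question's exact data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.execute("""
    CREATE TABLE test (
        id       INTEGER PRIMARY KEY,
        parentId INTEGER REFERENCES test (id),  -- NULL marks a root
        name     VARCHAR(50)
    )""")
# Parents are inserted before their children so every FK check passes.
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, None, "Electric"), (2, 1, "Relays"), (3, 1, "Switches"),
    (5, 2, "X Relays"), (6, 3, "Y Switches"),
    (7, None, "InterC"), (8, 7, "x Sockets"), (9, 7, "y Sockets"),
])

TREE_SQL = """
WITH RECURSIVE tree (id, parentId, level, name) AS (
    SELECT id, parentId, 0, name
    FROM test
    WHERE name = :category AND parentId IS NULL   -- anchor: one root only
    UNION ALL
    SELECT c.id, c.parentId, tree.level + 1, c.name
    FROM test c
    JOIN tree ON tree.id = c.parentId
)
SELECT id, level, name FROM tree ORDER BY id
"""

def subtree(category):
    """Return (id, level, name) for the named root and all of its descendants."""
    return conn.execute(TREE_SQL, {"category": category}).fetchall()

def orphan_rejected():
    """True if the foreign key refuses a row whose parent does not exist."""
    try:
        conn.execute("INSERT INTO test VALUES (99, 42, 'Orphan')")
    except sqlite3.IntegrityError:
        return True
    return False

print(subtree("InterC"))
```

Each drop-down level could then be fed from the rows with the matching level value.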


To find infinite recursive loop in CTE

I'm not a SQL expert, so I'd appreciate any help.
I use a recursive CTE to get values like the following:
Child1 --> Parent 1
Parent1 --> Parent 2
Parent2 --> NULL
If the data population has gone wrong, I'll have something like the following, which can send the CTE into an infinite recursive loop and raise the maximum recursion error. Since the data is huge, I cannot check for this bad data manually. Is there a way to find it?
Child1 --> Parent 1
Parent1 --> Child1
or
Child1 --> Parent 1
Parent1 --> Parent2
Parent2 --> Child1
With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
Recursive query:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
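SQLite has no array type, so a rough equivalent of the all_parents trick can be sketched with a delimited string instead. This illustration uses Python's sqlite3 module and the setup rows above; the '/id/' delimiters are our choice and keep id 3 from matching inside 13. The second query lists the parent/child links that were cut because they would close a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (id INTEGER, parent_id INTEGER)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (3, 5)])  # (3, 5) closes a loop

# all_parents is a string such as '/1/3/4/' instead of a Postgres array.
TREE_CTE = """
WITH RECURSIVE tree (id, parent_id, all_parents, root_id) AS (
    SELECT id, parent_id, '/' || id || '/', id
    FROM hierarchy
    WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, c.parent_id, p.all_parents || c.id || '/', p.root_id
    FROM hierarchy c
    JOIN tree p ON c.parent_id = p.id
    WHERE instr(p.all_parents, '/' || c.id || '/') = 0   -- skip ids already on the path
)
"""

def walk():
    """Every reachable row, with loops cut off."""
    sql = TREE_CTE + "SELECT id, all_parents, root_id FROM tree ORDER BY all_parents"
    return conn.execute(sql).fetchall()

def loop_links():
    """(parent, child) links skipped because the child was already an ancestor."""
    sql = TREE_CTE + """
    SELECT p.id, c.id
    FROM tree p
    JOIN hierarchy c ON c.parent_id = p.id
    WHERE instr(p.all_parents, '/' || c.id || '/') > 0
    """
    return conn.execute(sql).fetchall()

print(walk())
print(loop_links())
```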
Update for Postgres 14
Postgres 14 introduced the (standard compliant) CYCLE option to detect cycles:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
Something like this will show if and where there are loops in the recursive CTE. Look at the Loop column. With the data as is, there are no loops; the comments show how to change the values to create one.
In the end, the recursive CTE builds a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks whether the current ID is already in that "list". If it is, it sets the Loop column to 1. This column is checked in the recursive join (the AND R.Loop = 0), so the walk stops there.
The final query uses MAX() OVER (PARTITION BY StartingID) to set the Loop column to 1 for a whole "block" (chain).
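The same idea can be sketched outside SQL Server, here with Python's sqlite3 module (our substitution): walk upward from every row, carry the |id| list, and flag a row once the next parent is already in it. GROUP BY with MAX stands in for MAX() OVER (PARTITION BY ...). The sample data applies the first comment's suggestion (row 1 gets parent 3) and adds a hypothetical loop-free branch (ids 4 and 5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent_id INTEGER, description TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 3, "Parent"),      # parent changed from NULL to 3: 1 -> 3 -> 2 -> 1 is a loop
    (2, 1, "Child"),
    (3, 2, "SubChild"),
    (4, None, "Root"),     # hypothetical loop-free rows
    (5, 4, "Leaf"),
])

LOOP_SQL = """
WITH RECURSIVE walk (starting_id, lvl, parents, is_loop, id, parent_id) AS (
    SELECT id, 1, '|' || id || '|', 0, id, parent_id
    FROM my_table
    UNION ALL
    SELECT w.starting_id, w.lvl + 1,
           w.parents || t.id || '|',
           CASE WHEN w.parents LIKE '%|' || t.id || '|%' THEN 1 ELSE 0 END,
           t.id, t.parent_id
    FROM my_table t
    JOIN walk w ON w.parent_id = t.id AND w.is_loop = 0   -- stop after a repeat
)
SELECT starting_id, MAX(is_loop) FROM walk
GROUP BY starting_id
ORDER BY starting_id
"""

loop_flags = dict(conn.execute(LOOP_SQL).fetchall())
print(loop_flags)
```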
A slightly more complex version that generates a "better" report:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" children (elements that don't have any children
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.
Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where each node can have only one parent, which can be enforced with a unique constraint on the child column (id in the table below). It works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0, then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):
-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (
with cte(ancestor, child, lev, cycle) as (
select id, id, 0, 0 from Table1
union all
select cte.ancestor
, Table1.id
, case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
, case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
from Table1
join cte
on cte.child = Table1.PARENT_ID
where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
ancestor
, child
, lev
, max(cycle) over (partition by ancestor) cycle
from cte
Given the following adjacency list for Table1:
| parent_id | id |
|-----------|----|
| (null) | 1 |
| (null) | 2 |
| 1 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
| 6 | 7 |
| 7 | 8 |
| 9 | 10 |
| 10 | 11 |
| 11 | 9 |
The above query, which works on SQL Server (and on Oracle, PostgreSQL and MySQL 8 when modified as directed), correctly detects that nodes 9, 10, and 11 participate in a cycle of length 3.
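For a quick local check, the closure-table query can be sketched with Python's sqlite3 module (SQLite wants the RECURSIVE keyword, like PostgreSQL and MySQL 8). The rows are the adjacency list above; the cycle column is renamed cycle_len, our choice, because when a cycle is found it holds the number of steps taken to get back to the ancestor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (parent_id INTEGER, id INTEGER UNIQUE)")  # one parent per node
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    (None, 1), (None, 2), (1, 3), (3, 4), (1, 5), (2, 6),
    (6, 7), (7, 8), (9, 10), (10, 11), (11, 9),
])

CLOSURE_SQL = """
WITH RECURSIVE cte (ancestor, child, lev, cycle_len) AS (
    SELECT id, id, 0, 0 FROM Table1          -- every node is its own ancestor at level 0
    UNION ALL
    SELECT cte.ancestor,
           Table1.id,
           CASE WHEN cte.ancestor = Table1.id THEN 0 ELSE cte.lev + 1 END,
           CASE WHEN cte.ancestor = Table1.id THEN cte.lev + 1 ELSE 0 END
    FROM Table1
    JOIN cte ON cte.child = Table1.parent_id
    WHERE cte.cycle_len = 0                  -- stop expanding once a cycle is seen
)
SELECT ancestor, MAX(cycle_len)
FROM cte
GROUP BY ancestor
HAVING MAX(cycle_len) > 0
ORDER BY ancestor
"""

print(conn.execute(CLOSURE_SQL).fetchall())
```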
You can use the same approach described by Knuth for detecting a cycle in a linked list. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.
For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.
If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.
SQL Server sample:
declare @LinkTable table (Parent int, Child int);
insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
with cte as (
select lt1.Parent, lt1.Child, lt2.Child as Grandchild
from @LinkTable lt1
inner join @LinkTable lt2 on lt2.Parent = lt1.Child
union all
select cte.Parent, lt1.Child, lt3.Child as Grandchild
from cte
inner join @LinkTable lt1 on lt1.Parent = cte.Child
inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
inner join @LinkTable lt3 on lt3.Parent = lt2.Child
where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;
Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.
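The same two-speed walk can be tried with Python's sqlite3 module (our substitution for the table variable; SQLite needs WITH RECURSIVE). The helper runs the query on the sample links, and a second call drops the (7, 1) link to show the cycle report going empty. Like the linked-list original, it assumes each child has a single parent.

```python
import sqlite3

LINKS = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1)]

FLOYD_SQL = """
WITH RECURSIVE cte (parent, child, grandchild) AS (
    -- anchor: child is one link below parent, grandchild one link further
    SELECT lt1.parent, lt1.child, lt2.child
    FROM link lt1
    JOIN link lt2 ON lt2.parent = lt1.child
    UNION ALL
    -- each pass: child advances one link, grandchild advances two
    SELECT cte.parent, lt1.child, lt3.child
    FROM cte
    JOIN link lt1 ON lt1.parent = cte.child
    JOIN link lt2 ON lt2.parent = cte.grandchild
    JOIN link lt3 ON lt3.parent = lt2.child
    WHERE cte.child <> cte.grandchild        -- stop once they meet
)
SELECT DISTINCT parent, child FROM cte WHERE child = grandchild
"""

def meeting_points(links):
    """Rows where the slow and fast walkers meet, i.e. evidence of a cycle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (parent INTEGER, child INTEGER)")
    conn.executemany("INSERT INTO link VALUES (?, ?)", links)
    return conn.execute(FLOYD_SQL).fetchall()

print(meeting_points(LINKS))
print(meeting_points([link for link in LINKS if link != (7, 1)]))
```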
Try limiting the recursion depth:
WITH EMP_CTE AS
(
SELECT
0 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT
c.[LEVEL] + 1 AS [LEVEL],
e.ManagerId, e.EmployeeId, e.Name
FROM Employees e
INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId
AND c.[LEVEL] < 100 --RECURSION LIMIT
)
SELECT * FROM EMP_CTE WHERE [Level] = 100
Here is the solution for SQL Server:
Table Insert script:
CREATE TABLE MyTable
(
[ID] INT,
[ParentID] INT,
[Name] NVARCHAR(255)
);
INSERT INTO MyTable
(
[ID],
[ParentID],
[Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');
Script to find the exact records which are the culprit:
;WITH RecursiveCTE
AS (
-- Get all parents:
-- Any record in MyTable could be a parent;
-- we don't know yet which records are involved in an infinite recursion.
SELECT ParentID AS StartID,
ID,
CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM MyTable
UNION ALL
-- Recursively find all the children of the parents above.
-- Keep going until a child turns out to be the parent we started from:
-- that brings us back around the circle to the parent record kept
-- in the StartID column during the recursion.
SELECT RecursiveCTE.StartID,
t.ID,
CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM RecursiveCTE
INNER JOIN MyTable AS t
ON t.ParentID = RecursiveCTE.ID
WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
-- Find the ones that cause the infinite recursion
SELECT StartID,
[ParentChildRelationPath],
RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);
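To check the output without a SQL Server instance, the same query can be sketched with Python's sqlite3 module (an assumption; SQLite has no OPTION (MAXRECURSION) hint, so it is left out). With the sample rows, only ids 7 and 8 come back, each with the path that leads back to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, ParentID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, None, "A root"), (2, None, "Another root"), (3, 1, "Child of 1"),
    (4, 3, "Grandchild of 1"), (5, 4, "Great grandchild of 1"), (6, 1, "Child of 1"),
    (7, 8, "Child of 8"), (8, 7, "Child of 7"),   # 7 <-> 8 is the infinite loop
    (9, 1, "Child of 1"),
])

CULPRIT_SQL = """
WITH RECURSIVE rc (StartID, ID, Path) AS (
    -- every row's parent is a potential start of a loop
    SELECT ParentID, ID, Name FROM MyTable
    UNION ALL
    -- walk down the children until we arrive back at StartID
    SELECT rc.StartID, t.ID, rc.Path || ' -> ' || t.Name
    FROM rc
    JOIN MyTable t ON t.ParentID = rc.ID
    WHERE rc.StartID <> rc.ID     -- NULL StartID (root rows) is never expanded
)
SELECT StartID, Path, ID FROM rc WHERE StartID = ID ORDER BY StartID
"""

for row in conn.execute(CULPRIT_SQL):
    print(row)
```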

Flatten the tree path in SQL server Hierarchy ID

I am using the SQL Server hierarchyid data type to model a taxonomy structure in my application.
The taxonomy can have the same name at different levels.
During setup, this data needs to be uploaded via an Excel sheet.
Before inserting any node, I would like to check whether a node at that particular path already exists, so that I don't duplicate entries.
What is the easiest way to check whether a node at a particular absolute path already exists?
For example, before inserting "Retail" under "Bank 2", I should be able to check that "/Bank 2/Retail" does not exist yet.
Is there any way to provide a flattened representation of the entire tree structure so that I can check for the absolute path and then proceed?
Yes, you can do it using a recursive CTE.
In each iteration of the query you can append a new level of the hierarchy name.
There are lots of examples of this technique on the internet.
For example, with this sample data:
CREATE TABLE Test
(id INT,
parent_id INT null,
NAME VARCHAR(50)
)
INSERT INTO Test VALUES(1, NULL, 'L1')
INSERT INTO Test VALUES(2, 1, 'L1-A')
INSERT INTO Test VALUES(3, 2, 'L1-A-1')
INSERT INTO Test VALUES(4, 2, 'L1-A-2')
INSERT INTO Test VALUES(5, 1, 'L1-B')
INSERT INTO Test VALUES(6, 5, 'L1-B-1')
INSERT INTO Test VALUES(7, 5, 'L1-B-2')
you can write a recursive CTE like this:
WITH H AS
(
-- Anchor: the first level of the hierarchy
SELECT id, parent_id, name, CAST(name AS NVARCHAR(300)) AS path
FROM Test
WHERE parent_id IS NULL
UNION ALL
-- Recursive: join the original table to the anchor, and combine data from both
SELECT T.id, T.parent_id, T.name, CAST(H.path + '\' + T.name AS NVARCHAR(300))
FROM Test T INNER JOIN H ON T.parent_id = H.id
)
-- You can query H as if it was a normal table or View
SELECT * FROM H
WHERE PATH = 'L1\L1-A' -- for example to see if this exists
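The result of the query (without the WHERE filter) looks like this:
| id | parent_id | name   | path           |
|----|-----------|--------|----------------|
| 1  | NULL      | L1     | L1             |
| 2  | 1         | L1-A   | L1\L1-A        |
| 5  | 1         | L1-B   | L1\L1-B        |
| 6  | 5         | L1-B-1 | L1\L1-B\L1-B-1 |
| 7  | 5         | L1-B-2 | L1\L1-B\L1-B-2 |
| 3  | 2         | L1-A-1 | L1\L1-A\L1-A-1 |
| 4  | 2         | L1-A-2 | L1\L1-A\L1-A-2 |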
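A small sketch of the existence check the question asks for, using Python's sqlite3 module and the answer's parent_id table (SQLite has no hierarchyid type, so this does not cover that data type). Paths are built with a leading / to match the question's "/Bank 2/Retail" format; that separator choice is ours, not the answer's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (id INT, parent_id INT, name VARCHAR(50))")
conn.executemany("INSERT INTO Test VALUES (?, ?, ?)", [
    (1, None, "L1"), (2, 1, "L1-A"), (3, 2, "L1-A-1"), (4, 2, "L1-A-2"),
    (5, 1, "L1-B"), (6, 5, "L1-B-1"), (7, 5, "L1-B-2"),
])

PATH_SQL = """
WITH RECURSIVE H (id, parent_id, name, path) AS (
    SELECT id, parent_id, name, '/' || name
    FROM Test
    WHERE parent_id IS NULL
    UNION ALL
    SELECT T.id, T.parent_id, T.name, H.path || '/' || T.name
    FROM Test T
    JOIN H ON T.parent_id = H.id
)
SELECT EXISTS (SELECT 1 FROM H WHERE path = ?)
"""

def path_exists(path):
    """True if a node already sits at this absolute path, e.g. '/L1/L1-A'."""
    return bool(conn.execute(PATH_SQL, (path,)).fetchone()[0])

print(path_exists("/L1/L1-A"), path_exists("/L1/L1-C"))
```

The loader would call path_exists before each insert and skip or report rows whose path is already taken.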

Get missing nodes in hierarchy column

I have a table with a hierarchy column that consists of numbers separated by colons, as well as the number of the current node and its parent:
id = '3:234:657:978'
currNode = 978
parent = 657
I also have a query that returns ids and some other columns from other tables, but some of the links are missing. For example, two rows are returned, one with id 3:234 and another with id 3:234:567:890. I need the row with id 3:234:567 to form a hierarchy, but it's not returned.
How can I join the table so I get the missing nodes (with fields other than id being NULL), but only the missing ones (excluding the ones which are not needed to form the hierarchy, e.g. are below the tree of the returned results)?
EDIT:
Sample data:
CREATE TABLE ids (
id VARCHAR(100)
, currNode INT PRIMARY KEY
, parent INT
, name VARCHAR(50)
);
CREATE TABLE someotherdata (
data VARCHAR(10)
, currnode VARCHAR(100) -- holds the colon-separated id (e.g. '3:4'), as the answers below assume
);
INSERT ALL
INTO ids(id, currnode, parent, name)
VALUES('3', 3, NULL, 'Node1')
INTO ids(id, currnode, parent, name)
VALUES('3:4', 4, 3, 'Node2')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5', 5, 4, 'Node3')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5:6', 6, 5, 'Node4')
INTO ids(id, currnode, parent, name)
VALUES('3:4:5:6:7', 7, 6, 'Node5')
SELECT * FROM dual; COMMIT;
INSERT ALL
INTO someotherdata (data, currnode)
VALUES('data1', '3:4')
INTO someotherdata (data, currnode)
VALUES('data2', '3:4:5:6')
SELECT * FROM dual; COMMIT;
Desired result (id is given as a parameter to the query; here it equals '3'):
| id      | name  | data  |
|---------|-------|-------|
| 3       | Node1 | NULL  |
| 3:4     | Node2 | data1 |
| 3:4:5   | Node3 | NULL  |
| 3:4:5:6 | Node4 | data2 |
(3:4:5:6:7 is excluded from the result since it is not needed to form hierarchy with records that return data)
This is not very elegant, but it seems to work:
SELECT it.id, it.name, ost.data
FROM
(SELECT DISTINCT t.id, t.name
FROM ids t JOIN someotherdata st
ON instr(':'||st.currnode||':', ':'||t.currnode||':') >0) it LEFT JOIN someotherdata ost
ON it.id = ost.currnode
Edit: OK, this is nicer:
select distinct t.id, t.name, st.data
from ids t left outer join someotherdata st on t.id = st.currnode
start with t.id in (select ist.currnode from someotherdata ist)
connect by prior t.parent = t.currnode
order by t.id
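Where CONNECT BY is not available, the second query can be sketched as a recursive CTE that climbs from each node with data up to the root. This uses Python's sqlite3 module, stores someotherdata.currnode as the colon-separated id (as both queries above assume), and does not apply the question's '3' root parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids (id TEXT, currnode INT PRIMARY KEY, parent INT, name TEXT)")
conn.execute("CREATE TABLE someotherdata (data TEXT, currnode TEXT)")
conn.executemany("INSERT INTO ids VALUES (?, ?, ?, ?)", [
    ("3", 3, None, "Node1"), ("3:4", 4, 3, "Node2"), ("3:4:5", 5, 4, "Node3"),
    ("3:4:5:6", 6, 5, "Node4"), ("3:4:5:6:7", 7, 6, "Node5"),
])
conn.executemany("INSERT INTO someotherdata VALUES (?, ?)",
                 [("data1", "3:4"), ("data2", "3:4:5:6")])

UP_SQL = """
WITH RECURSIVE up (id, currnode, parent, name) AS (
    -- start at every node that has data attached
    SELECT t.id, t.currnode, t.parent, t.name
    FROM ids t
    WHERE t.id IN (SELECT currnode FROM someotherdata)
    UNION   -- UNION (not UNION ALL) drops ancestors shared by several start nodes
    -- climb one parent at a time, like CONNECT BY PRIOR t.parent = t.currnode
    SELECT p.id, p.currnode, p.parent, p.name
    FROM ids p
    JOIN up ON p.currnode = up.parent
)
SELECT up.id, up.name, s.data
FROM up
LEFT JOIN someotherdata s ON s.currnode = up.id
ORDER BY up.id
"""

for row in conn.execute(UP_SQL):
    print(row)
```

Node5 (3:4:5:6:7) never appears, because the walk only goes upward from rows that have data.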

SQL Server 2008, how to check if multi records exist in the DB?

I have 3 tables:
recipe:
id, name
ingredient:
id, name
recipeingredient:
id, recipeId, ingredientId, quantity
Every time a customer creates a new recipe, I need to check the recipeingredient table to verify whether this recipe already exists. If the ingredientId and quantity values are exactly the same, I will tell the customer the recipe already exists. Since I need to check multiple rows, I need help writing this query.
Knowing your ingredients and quantities, you can do something like this:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 --must match # of ingredients in WHERE clause
I originally thought that the following query would find pairs of recipes that have exactly the same ingredients:
select ri1.recipeId, ri2.recipeId
from RecipeIngredient ri1 full outer join
RecipeIngredient ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having count(ri1.id) = count(ri2.id) and -- same number of ingredients
count(ri1.id) = count(*) and -- all r1 ingredients are present
count(*) = count(ri2.id) -- all r2 ingredients are present
However, this query doesn't count things correctly, because the mismatches don't have the right pairs of ids. Alas.
The following does do the correct comparison. It counts the ingredients in each recipe before the join, so this value can just be compared on all matching rows.
select ri1.recipeId, ri2.recipeId
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri1 full outer join
(select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having max(ri1.numingredients) = max(ri2.numingredients) and
max(ri1.numingredients) = count(*)
The HAVING clause guarantees that both recipes have the same number of ingredients, and that the number of matching ingredients is the total. This time, I've tested it on the following data:
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
insert into #recipeingredient select 2, 3, 10
insert into #recipeingredient select 3, 1, 1
insert into #recipeingredient select 4, 1, 1
insert into #recipeingredient select 4, 3, 10
insert into #recipeingredient select 5, 1, 1
insert into #recipeingredient select 5, 2, 10
If you have a new recipe, you can modify this query to just look for the recipe in one of the tables (say ri1) using an additional condition on the on clause.
If you place the ingredients in a temporary table, you can substitute one of these tables, say ri1, with the new table.
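A sketch of that idea, with the new recipe held in its own temp table (new_recipe is a hypothetical name) and run through Python's sqlite3 module on the test rows above. A stored recipe counts as a duplicate only if every new ingredient matches and it has no extra ingredients of its own. It assumes an ingredient appears at most once per recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipeingredient (recipeId INT, ingredientId INT, quantity INT)")
conn.executemany("INSERT INTO recipeingredient VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 2, 10), (2, 1, 1), (2, 2, 10), (2, 3, 10),
    (3, 1, 1), (4, 1, 1), (4, 3, 10), (5, 1, 1), (5, 2, 10),
])
# The candidate recipe lives in its own temp table, not in recipeingredient.
conn.execute("CREATE TEMP TABLE new_recipe (ingredientId INT, quantity INT)")

MATCH_SQL = """
SELECT ri.recipeId
FROM recipeingredient ri
JOIN new_recipe n
  ON n.ingredientId = ri.ingredientId AND n.quantity = ri.quantity
GROUP BY ri.recipeId
HAVING COUNT(*) = (SELECT COUNT(*) FROM new_recipe)            -- every new ingredient matched
   AND COUNT(*) = (SELECT COUNT(*) FROM recipeingredient r2
                   WHERE r2.recipeId = ri.recipeId)            -- and no extras in the old one
ORDER BY ri.recipeId
"""

def existing_recipes(ingredients):
    """Ids of stored recipes with exactly these (ingredientId, quantity) pairs."""
    conn.execute("DELETE FROM new_recipe")
    conn.executemany("INSERT INTO new_recipe VALUES (?, ?)", ingredients)
    return [row[0] for row in conn.execute(MATCH_SQL)]

print(existing_recipes([(1, 1), (2, 10)]))
```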
You might try something like this to find if you have a duplicate:
-- Setup test data
declare @recipeingredient table (
id int not null primary key identity
, recipeId int not null
, ingredientId int not null
, quantity int not null
)
insert into @recipeingredient select 1, 1, 1
insert into @recipeingredient select 1, 2, 10
insert into @recipeingredient select 2, 1, 1
insert into @recipeingredient select 2, 2, 10
-- Actual Query
if exists (
select *
from @recipeingredient old
full outer join @recipeingredient new
on old.recipeId != new.recipeId -- Different recipes
and old.ingredientId = new.ingredientId -- but same ingredients
and old.quantity = new.quantity -- and same quantities
where old.id is null -- Match not found
or new.id is null -- Match not found
)
begin
select cast(0 as bit) as IsDuplicateRecipe
end
else begin
select cast(1 as bit) as IsDuplicateRecipe
end
Since this is really only searching for a duplicate, you might want to substitute a temp table or pass a table variable for the "new" table. This way you wouldn't have to insert the new records before doing your search. You could also insert into the base tables, wrap the whole thing in a transaction and rollback based upon the results.

Select products where the category belongs to any category in the hierarchy

I have a products table that contains a FK to a category. The Categories table is designed so that each category can have a parent category, for example:
Computers
    Processors
        Intel
            Pentium
            Core 2 Duo
        AMD
            Athlon
I need to make a select query so that, if the selected category is Processors, it returns the products that are in Intel, Pentium, Core 2 Duo, AMD, etc.
I thought about creating some sort of "cache" that will store all the categories in the hierarchy for every category in the db and include the "IN" in the where clause. Is this the best solution?
The best solution for this is at the database design stage. Your categories table needs to be a Nested Set. The article Managing Hierarchical Data in MySQL is not that MySQL specific (despite the title), and gives a great overview of the different methods of storing a hierarchy in a database table.
Executive summary:
Nested sets
    - Selects are easy for any depth
    - Inserts and deletes are hard
Standard parent_id-based hierarchy
    - Selects are based on inner joins (so they get hairy fast)
    - Inserts and deletes are easy
So based on your example, if your hierarchy table was a nested set your query would look something like this:
SELECT * FROM products
INNER JOIN categories ON categories.id = products.category_id
WHERE categories.lft > 2 and categories.rgt < 13
The 2 and 13 are the left and right values, respectively, of the Processors record (numbering the example tree depth-first, with Computers = 1).
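To see where lft and rgt come from, here is a minimal sketch with Python's sqlite3 module that numbers the question's category tree depth-first (one number on entering a node, the next free number on leaving it) and then runs the interval query. The products rows and names are hypothetical. With this numbering Processors gets the interval (2, 13), and every descendant's interval nests strictly inside it.

```python
import sqlite3

# (id, parent_id, name) for the example tree in the question
CATEGORIES = [
    (1, None, "Computers"), (2, 1, "Processors"), (3, 2, "Intel"),
    (4, 2, "AMD"), (5, 3, "Pentium"), (6, 3, "Core 2 Duo"), (7, 4, "Athlon"),
]

def nested_set_bounds(rows):
    """Depth-first walk: lft is assigned on entering a node, rgt on leaving it."""
    children = {}
    for cid, pid, _ in rows:
        children.setdefault(pid, []).append(cid)
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (lft, counter)
        counter += 1

    for root in children.get(None, []):
        visit(root)
    return bounds

bounds = nested_set_bounds(CATEGORIES)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INT, name TEXT, lft INT, rgt INT)")
conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)",
                 [(cid, name, *bounds[cid]) for cid, _, name in CATEGORIES])
conn.execute("CREATE TABLE products (name TEXT, category_id INT)")  # hypothetical
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    ("Monitor", 1), ("Pentium 4", 5), ("Core 2 Duo E6600", 6), ("Athlon 64", 7),
])

def products_under(category):
    """Products whose category lies strictly inside the given category's interval."""
    sql = """
        SELECT p.name
        FROM products p
        JOIN categories c ON c.id = p.category_id
        JOIN categories target ON target.name = ?
        WHERE c.lft > target.lft AND c.rgt < target.rgt
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, (category,))]

print(bounds[2], products_under("Processors"))
```

The walk also shows why inserts are the expensive part: adding a node shifts the numbers of everything to its right.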
Looks like a job for a common table expression, something along the lines of:
with catCTE (catid, parentid)
as
(
select cat.catid, cat.catparentid from cat where cat.name = 'Processors'
UNION ALL
select cat.catid, cat.catparentid from cat inner join catCTE on cat.catparentid=catcte.catid
)
select distinct * from catCTE
That should select the category whose name is 'Processors' and all of its descendants; you should be able to use that in an IN clause to pull back the products.
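Putting the CTE and the IN clause together, a runnable sketch with Python's sqlite3 module might look like this. The cat table keeps the answer's column names; the products table and its rows are hypothetical. Unlike the strict nested-set interval query, the anchor row here includes Processors itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cat (catid INT, catparentid INT, name TEXT)")
conn.executemany("INSERT INTO cat VALUES (?, ?, ?)", [
    (1, None, "Computers"), (2, 1, "Processors"), (3, 2, "Intel"), (4, 2, "AMD"),
    (5, 3, "Pentium"), (6, 3, "Core 2 Duo"), (7, 4, "Athlon"),
])
conn.execute("CREATE TABLE products (name TEXT, catid INT)")  # hypothetical
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    ("Monitor", 1), ("Generic CPU", 2), ("Pentium 4", 5), ("Athlon 64", 7),
])

PRODUCTS_SQL = """
WITH RECURSIVE catCTE (catid) AS (
    SELECT catid FROM cat WHERE name = ?              -- the chosen category
    UNION ALL
    SELECT cat.catid
    FROM cat
    JOIN catCTE ON cat.catparentid = catCTE.catid     -- plus every descendant
)
SELECT name FROM products
WHERE catid IN (SELECT catid FROM catCTE)
ORDER BY name
"""

def products_in(category):
    """Products filed under the category or any of its descendants."""
    return [row[0] for row in conn.execute(PRODUCTS_SQL, (category,))]

print(products_in("Processors"))
```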
I have done similar things in the past, first querying for the category ids, then querying for the products "IN" those categories. Getting the categories is the hard bit, and you have a few options:
1. If the level of nesting of categories is known, or you can find an upper bound: build a horrible-looking SELECT with lots of JOINs. This is fast, but ugly, and you need to set a limit on the levels of the hierarchy.
2. If you have a relatively small number of total categories, query them all (just ids and parents), collect the ids of the ones you care about, and do a SELECT ... IN for the products. This was the appropriate option for me.
3. Query up/down the hierarchy using a series of SELECTs. Simple, but relatively slow.
I believe recent versions of SQL Server have some support for recursive queries, but I haven't used them myself.
Stored procedures can help if you don't want to do this app-side.
What you want to find is the transitive closure of the category "parent" relation. I suppose there's no limit on the depth of the category hierarchy, so you can't formulate a single non-recursive SQL query that finds all categories. What I would do (in pseudocode) is this:
categoriesSet = {startCategory}
new = {startCategory}
while new.size > 0:
    new = select * from categories where parent in new
    categoriesSet = categoriesSet + new
So just keep querying for children until no more are found. This performs well unless you have a degenerate hierarchy (say, 1000 categories, each a child of another) or a large number of total categories. In the second case, you could always work with temporary tables to keep data transfer between your app and the database small.
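The loop translates almost line for line into application code. This sketch uses Python's sqlite3 module with a hypothetical categories(id, parent_id, name) table; only ids not seen before go into the next round, so a parent loop in bad data cannot keep it running forever.

```python
import sqlite3

def descendant_ids(conn, start_id):
    """Return start_id plus all of its descendants, issuing one query per tree level."""
    found = {start_id}
    frontier = {start_id}
    while frontier:
        placeholders = ",".join(["?"] * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM categories WHERE parent_id IN ({placeholders})",
            tuple(frontier),
        ).fetchall()
        frontier = {row[0] for row in rows} - found   # only ids we have not seen yet
        found |= frontier
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INT, parent_id INT, name TEXT)")
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, None, "Computers"), (2, 1, "Processors"), (3, 2, "Intel"), (4, 2, "AMD"),
    (5, 3, "Pentium"), (6, 3, "Core 2 Duo"), (7, 4, "Athlon"),
])
print(sorted(descendant_ids(conn, 2)))
```

The resulting id set can then go into the SELECT ... IN for the products.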
Maybe something like:
select *
from products
where products.category_id IN
(select c2.category_id
from categories c1 inner join categories c2 on c1.category_id = c2.parent_id
where c1.category = 'Processors'
group by c2.category_id)
[EDIT] If the category depth is greater than one this would form your innermost query. I suspect that you could design a stored procedure that would drill down in the table until the ids returned by the inner query did not have children -- probably better to have an attribute that marks a category as a terminal node in the hierarchy -- then perform the outer query on those ids.
CREATE TABLE #categories (id INT NOT NULL, parentId INT, [name] NVARCHAR(100))
INSERT INTO #categories
SELECT 1, NULL, 'Computers'
UNION
SELECT 2, 1, 'Processors'
UNION
SELECT 3, 2, 'Intel'
UNION
SELECT 4, 2, 'AMD'
UNION
SELECT 5, 3, 'Pentium'
UNION
SELECT 6, 3, 'Core 2 Duo'
UNION
SELECT 7, 4, 'Athlon'
SELECT *
FROM #categories
DECLARE @id INT
SET @id = 2
; WITH r(id, parentid, [name]) AS (
SELECT id, parentid, [name]
FROM #categories c
WHERE id = @id
UNION ALL
SELECT c.id, c.parentid, c.[name]
FROM #categories c JOIN r ON c.parentid=r.id
)
SELECT p.*
FROM products p -- assumes a products table with a categoryId column (not created in this script)
WHERE p.categoryId IN
(SELECT id
FROM r)
DROP TABLE #categories
The last part of the example won't run as-is, because the script never creates a products table. To try it, replace the SELECT from products with a simple SELECT * FROM r.
This should recurse down through all the 'child' categories, starting from a given category.
DECLARE @startingCatagoryId int
DECLARE @current int
SET @startingCatagoryId = 13813 -- or whatever the CatagoryId is for 'Processors'
CREATE TABLE #CatagoriesToFindChildrenFor
(CatagoryId int)
CREATE TABLE #CatagoryTree
(CatagoryId int)
INSERT INTO #CatagoriesToFindChildrenFor VALUES (@startingCatagoryId)
WHILE (SELECT count(*) FROM #CatagoriesToFindChildrenFor) > 0
BEGIN
SET @current = (SELECT TOP 1 CatagoryId FROM #CatagoriesToFindChildrenFor)
INSERT INTO #CatagoriesToFindChildrenFor
SELECT ID FROM Catagory WHERE ParentCatagoryId = @current AND Deleted = 0
INSERT INTO #CatagoryTree VALUES (@current)
DELETE #CatagoriesToFindChildrenFor WHERE CatagoryId = @current
END
SELECT * FROM #CatagoryTree ORDER BY CatagoryId
DROP TABLE #CatagoriesToFindChildrenFor
DROP TABLE #CatagoryTree
I like to use a stack temp table for hierarchical data.
Here's a rough example:
-- create a categories table and fill it with 10 rows (with random parentIds)
CREATE TABLE Categories ( Id uniqueidentifier, ParentId uniqueidentifier )
GO
INSERT
INTO Categories
SELECT NEWID(),
NULL
GO
INSERT
INTO Categories
SELECT TOP(1)NEWID(),
Id
FROM Categories
ORDER BY Id
GO 9
DECLARE @lvl INT, -- holds onto the level as we move through the hierarchy
@Id UNIQUEIDENTIFIER -- the id of the current item in the stack
SET @lvl = 1
CREATE TABLE #stack (item UNIQUEIDENTIFIER, [lvl] INT)
-- we will fill this table with the ids we want
CREATE TABLE #tmpCategories (Id UNIQUEIDENTIFIER)
-- for this example we'll just select all the ids
-- if we want all the children of a specific parent, we would include its id in
-- this where clause
INSERT INTO #stack SELECT Id, @lvl FROM Categories WHERE ParentId IS NULL
WHILE @lvl > 0
BEGIN -- begin 1
IF EXISTS ( SELECT * FROM #stack WHERE lvl = @lvl )
BEGIN -- begin 2
SELECT @Id = [item]
FROM #stack
WHERE lvl = @lvl
INSERT INTO #tmpCategories
SELECT @Id
DELETE FROM #stack
WHERE lvl = @lvl
AND item = @Id
INSERT INTO #stack
SELECT Id, @lvl + 1
FROM Categories
WHERE ParentId = @Id
IF @@ROWCOUNT > 0
BEGIN -- begin 3
SELECT @lvl = @lvl + 1
END -- end 3
END -- end 2
ELSE
SELECT @lvl = @lvl - 1
END -- end 1
DROP TABLE #stack
SELECT * FROM #tmpCategories
DROP TABLE #tmpCategories
DROP TABLE Categories
My answer to another question from a couple of days ago ("recursion in SQL") applies here.
There are some methods in the book which I've linked which should cover your situation nicely.