Find level of categories in a hierarchy - sql

I have a script that finds the hierarchy for a list of categories. However, I need to know the depth of the hierarchy in advance to find all corresponding categories. Is there a better way to write the query so that it drills down to the lowest level without my having to specify how deep to go? In the example below, I had to know that the categories go 5 levels deep to find them all.
WITH rCTE AS
(
SELECT
*,
0 AS Level
FROM dbo.inv_category ic
WHERE ic.Primary_org_id = 56392
UNION ALL
SELECT
t.*,
r.Level + 1 AS Level
FROM dbo.inv_category t
INNER JOIN rCTE r ON t.Parent_id = r.Category_id
)
SELECT DISTINCT
c1.Parent_id, c1.Category_id, c1.Category,
c2.Category, c2.Category_id, c2.Parent_id,
c3.Category, c3.Category_id, c3.Parent_id,
c4.Category, c4.Category_id, c4.Parent_id,
c5.Category, c5.Category_id, c5.Parent_id
FROM
rCTE c1
LEFT OUTER JOIN
rCTE c2 ON c1.Category_id = c2.Parent_id
LEFT OUTER JOIN
rCTE c3 ON c2.Category_id = c3.Parent_id
LEFT OUTER JOIN
rCTE c4 ON c3.Category_id = c4.Parent_id
LEFT OUTER JOIN
rCTE c5 ON c4.Category_id = c5.Parent_id
WHERE
c1.Parent_id = 0
ORDER BY
c1.Category, c2.Category

Your problem is that you want the parent categories in different columns, which makes the task more complicated than it needs to be.
One option to shorten the query and avoid multiple joins uses conditional aggregation:
with rcte as (...)
select
max(case when level = 0 then parent_id end) parent_id_0,
max(case when level = 0 then category_id end) category_id_0,
max(case when level = 0 then category end) category_0,
max(case when level = 1 then parent_id end) parent_id_1,
max(case when level = 1 then category_id end) category_id_1,
max(case when level = 1 then category end) category_1,
...
from rcte
You can add more triplets of conditional expressions to handle as many levels as needed; once the actual hierarchy is exhausted, the remaining columns come out empty.
Another option is string aggregation. This generates a single column for each original column, with all values concatenated in the order in which they appear in the hierarchy:
with rcte as (...)
select
string_agg(parent_id, ' > ') within group(order by level) parent_ids,
string_agg(category_id, ' > ') within group(order by level) category_ids,
string_agg(category, ' > ') within group(order by level) categories
from rcte
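Since the original goal was to avoid hard-coding the depth, a related variation is to build the path inside the recursive CTE itself, so each row carries its own level and breadcrumb however deep the tree goes. The sketch below is only an illustration under assumptions: the sample rows are made up, the anchor starts from parent_id = 0 (as in the question's final WHERE clause), and it runs on SQLite through Python's sqlite3 module so that it can be executed standalone. In SQL Server you would drop the RECURSIVE keyword, concatenate with + and most likely CAST the path to varchar(max).

```python
import sqlite3

# Hypothetical sample data; the column names mirror dbo.inv_category from the question.
ROWS = [
    (1, 0, "Hardware"),
    (2, 1, "Tools"),
    (3, 2, "Hand tools"),
    (4, 3, "Hammers"),
    (5, 0, "Garden"),
    (6, 5, "Seeds"),
]

QUERY = """
WITH RECURSIVE rcte (category_id, parent_id, category, level, path) AS (
    -- anchor: top-level categories (parent_id = 0)
    SELECT category_id, parent_id, category, 0, category
    FROM inv_category
    WHERE parent_id = 0
    UNION ALL
    -- recursive step: append each child to its parent's path
    SELECT t.category_id, t.parent_id, t.category, r.level + 1,
           r.path || ' > ' || t.category
    FROM inv_category t
    JOIN rcte r ON t.parent_id = r.category_id
)
SELECT category_id, level, path FROM rcte ORDER BY path
"""


def category_paths(rows):
    """Load the rows into an in-memory table and return (id, level, path) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE inv_category ("
        "category_id INTEGER PRIMARY KEY, parent_id INTEGER, category TEXT)"
    )
    con.executemany("INSERT INTO inv_category VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


paths = category_paths(ROWS)
for row in paths:
    print(row)
```

The deepest row here comes out as (4, 3, 'Hardware > Tools > Hand tools > Hammers') without the query ever mentioning how many levels exist.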

Related

Using SQL recursive query as AND statement

I have the following flowchart. Hopefully, it's self-explanatory
At the top of the hierarchy there is a request that is the base parent of all the requests below it. Each request has the fields 'id', 'parent_id' and 'state'.
My final goal is to find all parent ids that satisfy all the AND conditions, including the last one (the hierarchical query). However, I don't know how to use that query as an AND condition.
The hierarchical query looks like this:
with cte
as (select id, state
from tbl_request as rH
WHERE id = /* each id from the very first select */
UNION ALL
select rH.id, rH.state
from tbl_request as rH
join cte
on rH.parent_id = cte.id
and (cte.state is null or cte.state NOT IN('not-legit'))
)
select case when exists(select 1 from cte where cte.state IN('not-legit'))
then 1 else 0 end
As expected, it does what it's supposed to.
The solution was suggested in the question
Return true/false in recursive SQL query based on condition
For your convenience, here's a SQL Fiddle
I think I've worked out what you want.
You need to recurse through all the nodes and their children, returning each node's state together with the id of its ultimate root parent.
Then aggregate by that id and exclude any group that contains a row with state = 'not-legit'. In other words, flip the logic to a double negative.
WITH cte AS (
SELECT rH.id, rH.state, rH.id AS top_parent
FROM tbl_request as rH
WHERE (rH.state is null or rH.state <> 'not-legit')
AND rH.parent_id IS NULL
UNION ALL
SELECT rH.id, rH.state, cte.top_parent
FROM tbl_request as rH
JOIN cte
ON rH.parent_id = cte.id
)
SELECT top_parent
FROM cte
GROUP BY
cte.top_parent
HAVING COUNT(CASE WHEN cte.state = 'not-legit' THEN 1 END) = 0;
You could also change the logic back to a positive, but it would need to look like this:
HAVING COUNT(CASE WHEN cte.state is null or cte.state <> 'not-legit' THEN 1 END) = COUNT(*)
In other words, there are the same number of these filtered rows as there are all rows.
This feels more complex than what I have put above.
SQL Fiddle
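To see the top_parent approach on concrete rows, here is a minimal sketch with made-up data, run on SQLite through Python's sqlite3 module (an assumption for the sake of a runnable example; the question's own engine is not stated, and SQLite needs the RECURSIVE keyword where SQL Server does not). Root 1 has a 'not-legit' grandchild, root 7 is itself 'not-legit', and roots 4 and 8 are clean.

```python
import sqlite3

# Hypothetical rows: (id, parent_id, state)
REQUESTS = [
    (1, None, None),
    (2, 1, "ok"),
    (3, 2, "not-legit"),   # taints root 1
    (4, None, "ok"),
    (5, 4, None),
    (6, 5, "ok"),
    (7, None, "not-legit"),  # filtered out by the anchor
    (8, None, None),
]

QUERY = """
WITH RECURSIVE cte (id, state, top_parent) AS (
    -- anchor: legit root requests, each one is its own top_parent
    SELECT id, state, id
    FROM tbl_request
    WHERE (state IS NULL OR state <> 'not-legit')
      AND parent_id IS NULL
    UNION ALL
    -- recursive step: children inherit the root id
    SELECT r.id, r.state, cte.top_parent
    FROM tbl_request r
    JOIN cte ON r.parent_id = cte.id
)
SELECT top_parent
FROM cte
GROUP BY top_parent
HAVING COUNT(CASE WHEN state = 'not-legit' THEN 1 END) = 0
ORDER BY top_parent
"""


def legit_roots(rows):
    """Return ids of roots whose whole subtree contains no 'not-legit' row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_request (id INTEGER, parent_id INTEGER, state TEXT)")
    con.executemany("INSERT INTO tbl_request VALUES (?, ?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY)]
    con.close()
    return result


roots = legit_roots(REQUESTS)
print(roots)
```

With these rows only roots 4 and 8 survive. Any further AND conditions from the outer query would go into the anchor's WHERE clause.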
Replace your
WHERE id = /* each id from the very first select */
by
WHERE id in (
SELECT r.id FROM tbl_request AS r
/* there's also an INNER JOIN (hopefully, it won't be an obstacle) */
WHERE r.parent_id is null
/* a lot of AND statements */
)
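Also, you should use UNION instead of UNION ALL, since there is no point in keeping duplicate (id, state) tuples in this case. Note that this only works where the engine accepts UNION inside a recursive CTE; SQL Server, for one, requires UNION ALL between the anchor and the recursive member.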
To summarize, your query should look like this one
with cte
as (select id, state
from tbl_request as rH
WHERE id in (
SELECT r.id
FROM tbl_request AS r
/* there's also an INNER JOIN (hopefully, it won't be an obstacle) */
WHERE r.parent_id is null
/* a lot of AND statements */
) UNION
select rH.id, rH.state
from tbl_request as rH
join cte
on rH.parent_id = cte.id
and (cte.state is null or cte.state NOT IN('not-legit'))
)
Your subquery can contain any inner joins and any number of AND operators you need. As long as it returns a single column (select r.id), it will work fine.

How to distinct column by starting from earliest/latest row with SQL query?

How can I select the distinct values of a column so that, for duplicates, the earlier rows are removed and the last occurrence is kept?
I have tried several approaches, but none of them worked.
Below is the raw column that I want to work with:
parent_item_id
------------------------------------
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
Below are the approaches I tried.
Using the default behaviour of DISTINCT, like this:
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT (parent_item_id) from tree
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
Using row_number, like this. By my logic this should change the order, so why is the final result the same as with the first approach?
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT(parent_item_id) FROM
(
SELECT t.parent_item_id, [row_number]=ROW_NUMBER() OVER(ORDER BY (SELECT 1)) FROM tree t ORDER BY [row_number] DESC OFFSET 0 ROWS
) r
group by r.parent_item_id, r.[row_number]
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
The result I want is this:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
From your comments, this is what I think should happen:
You need to establish a parent-child item view, or a product-source view. It would look like this:
create view v_ProductSourceMap as
SELECT ic.parent_item_id as item_id, p.item_id as source_id
FROM dbo.item_combination ic left join dbo.product p on ic.child_item_id = p.item_id
group by ic.parent_item_id, p.item_id
Check that the view covers all items derived from other items; for new items, source_id will be null.
select * from v_ProductSourceMap order by item_id, source_id
Now use a recursive query to traverse the mapping:
WITH tree AS (
SELECT item_id, source_id, 1 as depth, cast(ic.item_id as varchar(max)) as bc
FROM v_ProductSourceMap ic
WHERE source_id is null
UNION ALL
SELECT ic.item_id, ic.source_id, t.depth + 1, t.bc + '>' + cast(ic.item_id as varchar(max))
FROM tree t, v_ProductSourceMap ic WHERE ic.source_id=t.item_id
)
select * from tree
From here, look at the pattern in the depth and/or the breadcrumbs to figure out what your sort order could be.
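As one possible reading of "look at the depth": if an item can be reached at several depths, you could keep only its deepest occurrence and sort by that. Whether the deepest occurrence really is the "last" one you want is an assumption, as are the short ids and the sample mapping below. The sketch runs on SQLite via Python's sqlite3 module (so it uses WITH RECURSIVE and || instead of the T-SQL syntax above), with a plain table standing in for the view.

```python
import sqlite3

# Hypothetical contents of v_ProductSourceMap: (item_id, source_id)
MAPPING = [
    ("A", None),  # new item, no source
    ("B", "A"),
    ("C", "A"),   # C is derived from A directly ...
    ("C", "B"),   # ... and also via B, so it shows up at two depths
]

QUERY = """
WITH RECURSIVE tree (item_id, source_id, depth, bc) AS (
    SELECT item_id, source_id, 1, item_id
    FROM product_source_map
    WHERE source_id IS NULL
    UNION ALL
    SELECT m.item_id, m.source_id, t.depth + 1, t.bc || '>' || m.item_id
    FROM tree t
    JOIN product_source_map m ON m.source_id = t.item_id
)
-- one row per item: keep the deepest level at which it appears
SELECT item_id, MAX(depth) AS max_depth
FROM tree
GROUP BY item_id
ORDER BY max_depth, item_id
"""


def items_by_deepest_level(mapping):
    """Return (item_id, max_depth) pairs ordered from shallowest to deepest."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product_source_map (item_id TEXT, source_id TEXT)")
    con.executemany("INSERT INTO product_source_map VALUES (?, ?)", mapping)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


ordered = items_by_deepest_level(MAPPING)
print(ordered)
```

Here C appears at depth 2 (A>C) and depth 3 (A>B>C); grouping keeps the depth-3 occurrence, so C sorts after B.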

Troubles isolating target cell in recursive sql query

I have a table, let's say it looks like this:
c | p
=====
|1|3|
|2|1|
|7|5|
c stands for current and p stands for parent
Given a c value of 2, I want to return its topmost ancestor (the one that has no parent); here that value is 3. Since this is a self-referencing table, I figured a CTE would be the best method; however, I am very new to using it. Nevertheless, I gave it a shot:
WITH Tree(this, parent) AS
( SELECT c ,p
FROM myTable
WHERE c = '2'
UNION ALL
SELECT M.c ,M.p
FROM myTable M
JOIN Tree T ON T.parent = M.c )
SELECT parent
FROM Tree
However this returns:
1
3
I only want 3 though. I have tried adding WHERE T.parent <> M.c, but that doesn't entirely make sense. Needless to say, I am a little confused about how to isolate the grandparent.
DECLARE @Table AS TABLE (Child INT, Parent INT)
INSERT INTO @Table VALUES (1,3),(2,1),(7,5)
;WITH cteRecursive AS (
SELECT
OriginalChild = Child
,Child
,Parent
,Level = 0
FROM
@Table
WHERE
Child = 2
UNION ALL
SELECT
c.OriginalChild
,t.Child
,t.Parent
,Level + 1
FROM
cteRecursive c
INNER JOIN @Table t
ON c.Parent = t.Child
)
SELECT TOP 1 TopAncestor = Parent
FROM
cteRecursive
ORDER BY
Level DESC
Use a recursive CTE to recurse up the tree until you can go no further. Keep track of the level of recursion, then take the parent from the last level of recursion and you have the top ancestor.
And just because I wrote it, I will add how to find the top ancestor of every child. The concept is still the same, but you need to introduce a row_number() to find the last level that was recursed.
DECLARE @Table AS TABLE (Child INT, Parent INT)
INSERT INTO @Table VALUES (1,3),(2,1),(7,5),(5,9)
;WITH cteRecursive AS (
SELECT
OriginalChild = Child
,Child
,Parent
,Level = 0
FROM
@Table
UNION ALL
SELECT
c.OriginalChild
,t.Child
,t.Parent
,Level + 1
FROM
cteRecursive c
INNER JOIN @Table t
ON c.Parent = t.Child
)
, cteTopAncestorRowNum AS (
SELECT
*
,TopAncestorRowNum = ROW_NUMBER() OVER (PARTITION BY OriginalChild ORDER BY Level DESC)
FROM
cteRecursive
)
SELECT
Child = OriginalChild
,TopMostAncestor = Parent
FROM
cteTopAncestorRowNum
WHERE
TopAncestorRowNum = 1
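For anyone who wants to check the result without a SQL Server instance, the same idea can be sketched on SQLite through Python's sqlite3 module. This is a port under assumptions, not the answer's exact code: the table and column names are lower-cased stand-ins, SQLite needs WITH RECURSIVE, and ROW_NUMBER() requires SQLite 3.25 or later. The rows are the ones from the answer.

```python
import sqlite3

# Same (child, parent) pairs as in the answer above.
PAIRS = [(1, 3), (2, 1), (7, 5), (5, 9)]

QUERY = """
WITH RECURSIVE walk (original_child, child, parent, level) AS (
    -- anchor: every row starts a walk from its own child value
    SELECT child, child, parent, 0
    FROM my_table
    UNION ALL
    -- recursive step: move one level up while the parent has its own row
    SELECT w.original_child, t.child, t.parent, w.level + 1
    FROM walk w
    JOIN my_table t ON w.parent = t.child
),
ranked AS (
    SELECT original_child, parent,
           ROW_NUMBER() OVER (
               PARTITION BY original_child ORDER BY level DESC
           ) AS rn
    FROM walk
)
SELECT original_child, parent AS top_ancestor
FROM ranked
WHERE rn = 1
ORDER BY original_child
"""


def top_ancestors(pairs):
    """Return (child, topmost ancestor) for every child in the table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (child INTEGER, parent INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", pairs)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


ancestors = top_ancestors(PAIRS)
print(ancestors)
```

Children 1 and 2 both resolve to 3, and children 5 and 7 both resolve to 9.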

Getting active record based on column value

I have a database table named BusinessAssociate, and for the sake of simplicity let's say that table has 2 columns:
BusinessAssociateKey int
AmalgamatedIntoBAKey int
Using the BusinessAssociateKey we can join on other tables, and one of those tables (BACorporateStatus) tells us if that BusinessAssociate is active or amalgamated.
Let's assume that BusinessAssociateKey 123456 is amalgamated into BusinessAssociateKey 987654. In the same table there will be a row with a BusinessAssociateKey of 987654, and this row may well be amalgamated too, for example into BusinessAssociateKey 283746.
Is there a way, per BusinessAssociateKey, to find the active (not amalgamated) Business Associate?
The length of the chain is unknown; it could be zero or it could be n.
Edit: Here is a SQL Fiddle, http://sqlfiddle.com/#!9/1e886/1 and in this example BusinessAssociateKey 56781 is not amalgamated, so for BusinessAssociateKey 123 the surviving/active BA Key is 56781.
Do a self join with the table. Here I have added row number to get last records using self join.
Select F.Nbr, F.BusinessAssociateKey, F.AmalgamatedIntoBAKey
From
(Select row_number() Over(order by (select 1)) as Nbr, E.BusinessAssociateKey, E.AmalgamatedIntoBAKey
From BusinessAssociate E
) F
LEFT OUTER JOIN
(Select row_number() Over(order by (select 1)) as Nbr, E.BusinessAssociateKey, E.AmalgamatedIntoBAKey
From BusinessAssociate E
) K
ON F.AmalgamatedIntoBAKey = K.BusinessAssociateKey
where K.Nbr IS NULL
http://sqlfiddle.com/#!6/88b53/26
Recursion:
;with rec_cte as(
select b1.BusinessAssociateKey, b1.AmalgamatedIntoBAKey, 1 as rn
from BusinessAssociate b1 left outer join BusinessAssociate b2 on b1.BusinessAssociateKey = b2.AmalgamatedIntoBAKey
where b2.BusinessAssociateKey is null
union all
select c.BusinessAssociateKey, b.AmalgamatedIntoBAKey, c.rn + 1
from rec_cte c inner join BusinessAssociate b on c.AmalgamatedIntoBAKey = b.BusinessAssociateKey
where b.AmalgamatedIntoBAKey is not null),
cte as(
select BusinessAssociateKey, max(rn) as rn
from rec_cte
group by BusinessAssociateKey)
select r.BusinessAssociateKey, r.AmalgamatedIntoBAKey
from rec_cte r inner join cte c on r.BusinessAssociateKey = c.BusinessAssociateKey and r.rn = c.rn
option (maxdop 0)
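A slightly different way to express the same walk is to start a chain at every key and follow AmalgamatedIntoBAKey until it runs out. The sketch below rests on several assumptions: "active" is taken to mean AmalgamatedIntoBAKey IS NULL (the question says the real status lives in BACorporateStatus), the chains contain no cycles, and the sample rows are invented rather than copied from the fiddle. It runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

# Hypothetical rows: (BusinessAssociateKey, AmalgamatedIntoBAKey)
ASSOCIATES = [
    (123, 456),
    (456, 56781),
    (56781, None),  # end of the chain: not amalgamated
    (999, None),    # never amalgamated at all
]

QUERY = """
WITH RECURSIVE chain (start_key, current_key, next_key) AS (
    -- anchor: every key starts its own chain
    SELECT BusinessAssociateKey, BusinessAssociateKey, AmalgamatedIntoBAKey
    FROM BusinessAssociate
    UNION ALL
    -- recursive step: hop to the associate this one was amalgamated into
    SELECT c.start_key, b.BusinessAssociateKey, b.AmalgamatedIntoBAKey
    FROM chain c
    JOIN BusinessAssociate b ON b.BusinessAssociateKey = c.next_key
)
-- the row with no next hop is the surviving associate
SELECT start_key, current_key AS active_key
FROM chain
WHERE next_key IS NULL
ORDER BY start_key
"""


def active_keys(rows):
    """Map each BusinessAssociateKey to the key at the end of its chain."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE BusinessAssociate ("
        "BusinessAssociateKey INTEGER, AmalgamatedIntoBAKey INTEGER)"
    )
    con.executemany("INSERT INTO BusinessAssociate VALUES (?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


active = active_keys(ASSOCIATES)
print(active)
```

With these rows, keys 123 and 456 both end at 56781, while 999 and 56781 map to themselves.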

Using CTE with hierarchical data and 'cumulative' values

I'm experimenting with SQL Common Table Expressions using a sample hierarchy of cities, countries and continents, recording which have been visited and which haven't.
The table t_hierarchy looks like this:
(NOTE: The visited column is deliberately NULL for non-cities because I want this to be a dynamically calculated percentage.)
I have then used the following SQL to create a recursive result set based on the data in t_hierarchy:
WITH myCTE (ID, name, type, parentID, visited, Depth)
AS
(
Select ID, name, type, parentID, visited, 0 as Depth From t_hierarchy where parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, t_hierarchy.visited, Depth + 1
From t_hierarchy
inner join myCte on t_hierarchy.parentID = myCte.ID
)
Select ID, name, type, parentID, Depth, cnt.numDirectChildren, visited
FROM myCTE
LEFT JOIN (
SELECT theID = parentID, numDirectChildren = COUNT(*)
FROM myCTE
GROUP BY parentID
) cnt ON cnt.theID = myCTE.ID
order by ID
The result looks like this:
What I would like to do now, which I am struggling with, is to create a column, e.g. visitedPercentage to show the percentage of cities visited for each 'level' of the hierarchy (treating cities differently to countries and continents). To explain, working our way up the 'tree':
Madrid would be 100% because it has been visited (visited = 1)
Barcelona would be 0% because it has not been visited (visited = 0)
Spain would therefore be 50% because it has 2 direct children and one is 100% and the other is 0%
Europe would therefore be 50% because Spain is 50%, France is 100% (Paris has been visited), and Germany is 0% (Berlin has not been visited)
I hope this makes sense. I kind of want to say "if it's not a city, work out the visitedPercentage of THIS level based on the visitedPercentage of all direct children, otherwise just show 100% or 0%". Any guidance is much appreciated.
UPDATE:
I've managed to progress it a bit further using Daniel Gimenez's suggestion to the point where I've got France 100, Spain 50 etc. But the top level items (e.g. Europe) are still 0, like this:
I think this is because the calculation is being done after the recursive part of the query, rather than within it. I.e. this line:
SELECT... , visitPercent = SUM(CAST(visited AS int)) / COUNT(*) FROM myCTE GROUP BY parentID
is saying "look at the visited column for child objects, calculate the SUM of the values, and show the result as visitPercent", when it should be saying "look at the existing visitPercent value from the previous calculation", if that makes sense. I've no idea where to go from here! :)
I think I've done it, using two CTEs. In the end it was easier to get the total number of descendants for each level (children, grandchildren etc.) and use that to calculate the overall percentage.
That was painful. At one point typing 'CATS' instead of 'CAST' had me puzzled for about 10 minutes.
with cte1 (ID,parentID,type,name,visited,Lvl) as (
select t.ID, t.parentID, t.type, t.name, t.visited, 0 as [Lvl]
from t_hierarchy t
where t.parentID is not null
union all
select c.ID, t.parentID, c.type, c.name, c.visited, c.Lvl + 1
from t_hierarchy t
inner join cte1 c on c.parentID = t.ID
where t.parentID is not null
),
cte2 (ID,name,type,parentID,parentName_for_reference,visited,Lvl) as (
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, p.name as parentName_for_reference, t_hierarchy.visited, 0 as Lvl
From t_hierarchy
left join t_hierarchy p ON p.ID = t_hierarchy.parentID
where t_hierarchy.parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID,p.name as parentName_for_reference, t_hierarchy.visited, Lvl + 1
From t_hierarchy
inner join cte2 on t_hierarchy.parentID = cte2.ID
inner join t_hierarchy p ON p.ID = t_hierarchy.parentID
)
select cte2.ID,cte2.name,cte2.type,cte2.parentID,cte2.parentName_for_reference,cte2.visited,cte2.Lvl
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(cnt.totalDescendents as varchar) END AS totalDescendents
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(COALESCE(cnt2.totalDescendentsVisited,0) as varchar) END AS totalDescendentsVisited
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST((CAST(ROUND(CAST(COALESCE(cnt2.totalDescendentsVisited,0) as float)/CAST(cnt.totalDescendents as float),2) AS numeric(36,2))*100) as varchar) END as asPercentage
from cte2
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendents
FROM cte1
WHERE type = 'city'
GROUP BY parentID
) cnt ON cnt.theID = cte2.ID
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendentsVisited
FROM cte1
WHERE type = 'city' AND visited = 1
GROUP BY parentID
) cnt2 ON cnt2.theID = cte2.ID
ORDER BY ID
These posts were helpful:
Keeping it simple and how to do multiple CTE in a query
CTE to get all children (descendants) of a parent
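The descendant-counting idea can also be sketched more compactly: let every city "report" its visited flag to each of its ancestors, then aggregate per node. The table contents below are a guess reconstructed from the names mentioned in the question (the original table image is not available), and the code runs on SQLite via Python's sqlite3 module rather than SQL Server. Note that, like the query above, this weights by number of cities, which is not the same as averaging the children's percentages when countries have different numbers of cities.

```python
import sqlite3

# Hypothetical t_hierarchy rows: (ID, name, type, parentID, visited)
HIERARCHY = [
    (1, "Europe", "continent", None, None),
    (2, "Spain", "country", 1, None),
    (3, "France", "country", 1, None),
    (4, "Germany", "country", 1, None),
    (5, "Madrid", "city", 2, 1),
    (6, "Barcelona", "city", 2, 0),
    (7, "Paris", "city", 3, 1),
    (8, "Berlin", "city", 4, 0),
]

QUERY = """
WITH RECURSIVE city_anc (node_id, visited) AS (
    -- anchor: each city counts for itself
    SELECT ID, visited FROM t_hierarchy WHERE type = 'city'
    UNION ALL
    -- recursive step: pass the city's flag up to the parent of the current node
    SELECT t.parentID, c.visited
    FROM city_anc c
    JOIN t_hierarchy t ON t.ID = c.node_id
    WHERE t.parentID IS NOT NULL
)
SELECT h.name, ROUND(100.0 * SUM(c.visited) / COUNT(*), 1) AS visitedPercentage
FROM city_anc c
JOIN t_hierarchy h ON h.ID = c.node_id
GROUP BY h.ID, h.name
ORDER BY h.ID
"""


def visited_percentages(rows):
    """Return {name: percentage of descendant cities visited}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t_hierarchy ("
        "ID INTEGER, name TEXT, type TEXT, parentID INTEGER, visited INTEGER)"
    )
    con.executemany("INSERT INTO t_hierarchy VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


percentages = visited_percentages(HIERARCHY)
print(percentages)
```

On this data Spain and Europe come out at 50.0, France at 100.0 and Germany at 0.0, matching the figures described in the question.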