Using CTE with hierarchical data and 'cumulative' values - sql

I'm experimenting with SQL Common Table Expressions using a sample hierarchy of cities, countries and continents, some of which have been visited and some of which haven't.
The table t_hierarchy looks like this:
(NOTE: The visited column is deliberately NULL for non-cities because I want this to be a dynamically calculated percentage.)
I have then used the following SQL to create a recursive result set based on the data in t_hierarchy:
WITH myCTE (ID, name, type, parentID, visited, Depth)
AS
(
Select ID, name, type, parentID, visited, 0 as Depth From t_hierarchy where parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, t_hierarchy.visited, Depth + 1
From t_hierarchy
inner join myCte on t_hierarchy.parentID = myCte.ID
)
Select ID, name, type, parentID, Depth, cnt.numDirectChildren, visited
FROM myCTE
LEFT JOIN (
SELECT theID = parentID, numDirectChildren = COUNT(*)
FROM myCTE
GROUP BY parentID
) cnt ON cnt.theID = myCTE.ID
order by ID
The result looks like this:
What I would like to do now, which I am struggling with, is to create a column, e.g. visitedPercentage to show the percentage of cities visited for each 'level' of the hierarchy (treating cities differently to countries and continents). To explain, working our way up the 'tree':
Madrid would be 100% because it has been visited (visited = 1)
Barcelona would be 0% because it has not been visited (visited = 0)
Spain would therefore be 50% because it has 2 direct children and one is 100% and the other is 0%
Europe would therefore be 50% because Spain is 50%, France is 100% (Paris has been visited), and Germany is 0% (Berlin has not been visited)
I hope this makes sense. I kind of want to say "if it's not a city, work out the visitedPercentage of THIS level based on the visitedPercentage of all direct children, otherwise just show 100% or 0%". Any guidance is much appreciated.
UPDATE:
I've managed to progress it a bit further using Daniel Gimenez's suggestion to the point where I've got France 100, Spain 50 etc. But the top level items (e.g. Europe) are still 0, like this:
I think this is because the calculation is being done after the recursive part of the query, rather than within it. I.e. this line:
SELECT... , visitPercent = SUM(CAST(visited AS int)) / COUNT(*) FROM myCTE GROUP BY parentID
is saying "look at the visited column for child objects, calculate the SUM of the values, and show the result as visitPercent", when it should be saying "look at the existing visitPercent value from the previous calculation", if that makes sense. I've no idea where to go from here! :)

I think I've done it, using two CTEs. In the end it was easier to get the total number of descendants for each level (children, grandchildren, etc.) and use that to calculate the overall percentage.
That was painful. At one point typing 'CATS' instead of 'CAST' had me puzzled for about 10 minutes.
with cte1 (ID,parentID,type,name,visited,Lvl) as (
select t.ID, t.parentID, t.type, t.name, t.visited, 0 as [Lvl]
from t_hierarchy t
where t.parentID is not null
union all
select c.ID, t.parentID, c.type, c.name, c.visited, c.Lvl + 1
from t_hierarchy t
inner join cte1 c on c.parentID = t.ID
where t.parentID is not null
),
cte2 (ID,name,type,parentID,parentName_for_reference,visited,Lvl) as (
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, p.name as parentName_for_reference, t_hierarchy.visited, 0 as Lvl
From t_hierarchy
left join t_hierarchy p ON p.ID = t_hierarchy.parentID
where t_hierarchy.parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID,p.name as parentName_for_reference, t_hierarchy.visited, Lvl + 1
From t_hierarchy
inner join cte2 on t_hierarchy.parentID = cte2.ID
inner join t_hierarchy p ON p.ID = t_hierarchy.parentID
)
select cte2.ID,cte2.name,cte2.type,cte2.parentID,cte2.parentName_for_reference,cte2.visited,cte2.Lvl
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(cnt.totalDescendents as varchar) END AS totalDescendents
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(COALESCE(cnt2.totalDescendentsVisited,0) as varchar) END AS totalDescendentsVisited
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST((CAST(ROUND(CAST(COALESCE(cnt2.totalDescendentsVisited,0) as float)/CAST(cnt.totalDescendents as float),2) AS numeric(36,2))*100) as varchar) END as asPercentage
from cte2
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendents
FROM cte1
WHERE type = 'city'
GROUP BY parentID
) cnt ON cnt.theID = cte2.ID
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendentsVisited
FROM cte1
WHERE type = 'city' AND visited = 1
GROUP BY parentID
) cnt2 ON cnt2.theID = cte2.ID
ORDER BY ID
These posts were helpful:
Keeping it simple and how to do multiple CTE in a query
CTE to get all children (descendants) of a parent
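The descendant-counting idea in this answer can be tried out in a small, self-contained form. The sketch below is an assumption-laden illustration, not the answer's own query: it uses Python's built-in sqlite3 module and SQLite's WITH RECURSIVE instead of SQL Server, and the sample rows are invented to match the cities named in the question (the original table image is not available). The CTE name pairs and the column ancestorID are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed sample data: one continent, three countries, four cities.
conn.executescript("""
CREATE TABLE t_hierarchy (ID INTEGER PRIMARY KEY, parentID INTEGER,
                          name TEXT, type TEXT, visited INTEGER);
INSERT INTO t_hierarchy VALUES
 (1, NULL, 'Europe',    'continent', NULL),
 (2, 1,    'Spain',     'country',   NULL),
 (3, 1,    'France',    'country',   NULL),
 (4, 1,    'Germany',   'country',   NULL),
 (5, 2,    'Madrid',    'city',      1),
 (6, 2,    'Barcelona', 'city',      0),
 (7, 3,    'Paris',     'city',      1),
 (8, 4,    'Berlin',    'city',      0);
""")

rows = conn.execute("""
WITH RECURSIVE pairs (ancestorID, ID, type, visited) AS (
    -- anchor: every node paired with its direct parent
    SELECT parentID, ID, type, visited
    FROM t_hierarchy
    WHERE parentID IS NOT NULL
    UNION ALL
    -- step: pair the same node with the next ancestor up the tree
    SELECT t.parentID, p.ID, p.type, p.visited
    FROM pairs p
    JOIN t_hierarchy t ON t.ID = p.ancestorID
    WHERE t.parentID IS NOT NULL
)
-- for each non-city node, count its city descendants and how many were visited
SELECT h.name, 100.0 * SUM(p.visited) / COUNT(*) AS pct
FROM t_hierarchy h
JOIN pairs p ON p.ancestorID = h.ID AND p.type = 'city'
GROUP BY h.ID, h.name
ORDER BY h.ID
""").fetchall()

visited_pct = dict(rows)
print(visited_pct)
```

Note that this computes "visited cities / all cities below this node", which matches the answer's approach. It is not the same as averaging the percentages of direct children, although the two happen to agree for this sample data.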

How to distinct column by starting from earliest/latest row with SQL query?

How can I apply DISTINCT to a column so that the earliest duplicates are removed and the last occurrence of each value is retained?
I have tried a few approaches, but none of them worked.
Below is the raw column that I want to work with:
parent_item_id
------------------------------------
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
Below are the approaches I tried.
First, using the default behaviour of DISTINCT:
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT (parent_item_id) from tree
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
Second, using ROW_NUMBER. By my logic this should change the order, so why is the final result the same as in the first attempt?
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT(parent_item_id) FROM
(
SELECT t.parent_item_id, [row_number]=ROW_NUMBER() OVER(ORDER BY (SELECT 1)) FROM tree t ORDER BY [row_number] DESC OFFSET 0 ROWS
) r
group by r.parent_item_id, r.[row_number]
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
The result I want is this:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
From your comments, this is what I think should happen:
You need to establish a parent-child item view or a product-source view. This would look like:
create view v_ProductSourceMap as
SELECT ic.parent_item_id as item_id, p.item_id as source_id
FROM dbo.item_combination ic left join dbo.product p on ic.child_item_id = p.item_id
group by ic.parent_item_id, p.item_id
Check that the view represents all items derived from other items and for new items, source_id will be null.
select * from v_ProductSourceMap order by item_id, source_id
Now use a recursive query to traverse the mapping
WITH tree AS (
SELECT item_id, source_id, 1 as depth, cast(ic.item_id as varchar(max)) as bc
FROM v_ProductSourceMap ic
WHERE source_id is null
UNION ALL
SELECT ic.item_id, ic.source_id, t.depth + 1, t.bc + '>' + cast(ic.item_id as varchar(max))
FROM tree t, v_ProductSourceMap ic WHERE ic.source_id=t.item_id
)
select * from tree
From here, look at the pattern in the depth and/or the breadcrumbs to figure out what your sort order could be.
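To see what the depth and breadcrumb columns look like in practice, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite rather than SQL Server (so string concatenation is || and the CTE needs WITH RECURSIVE), and the table product_source_map with its four rows is a made-up stand-in for v_ProductSourceMap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mapping: A is a new item, B and D derive from A, C derives from B.
conn.executescript("""
CREATE TABLE product_source_map (item_id TEXT, source_id TEXT);
INSERT INTO product_source_map VALUES
 ('A', NULL), ('B', 'A'), ('C', 'B'), ('D', 'A');
""")

rows = conn.execute("""
WITH RECURSIVE tree (item_id, source_id, depth, bc) AS (
    -- anchor: items with no source
    SELECT item_id, source_id, 1, item_id
    FROM product_source_map
    WHERE source_id IS NULL
    UNION ALL
    -- step: items whose source is already in the tree
    SELECT m.item_id, m.source_id, t.depth + 1, t.bc || '>' || m.item_id
    FROM tree t
    JOIN product_source_map m ON m.source_id = t.item_id
)
SELECT item_id, depth, bc FROM tree ORDER BY bc
""").fetchall()

for row in rows:
    print(row)
```

Ordering by the breadcrumb keeps each item directly after its source; ordering by depth instead would group items level by level. Which one fits the desired output depends on the real data.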

Find level of categories in a hierarchy

I have a script to find the hierarchy of a list of categories. However, I need to know the depth of the hierarchy in advance to find all corresponding categories. Is there a better way to write the code so that it drills down to the lowest level without my having to specify how deep to go? In the example below, I had to know that the categories go five levels deep to find them all.
WITH rCTE AS
(
SELECT
*,
0 AS Level
FROM dbo.inv_category ic
WHERE ic.Primary_org_id = 56392
UNION ALL
SELECT
t.*,
r.Level + 1 AS Level
FROM dbo.inv_category t
INNER JOIN rCTE r ON t.Parent_id = r.Category_id
)
SELECT DISTINCT
c1.Parent_id, c1.Category_id, c1.Category,
c2.Category, c2.Category_id, c2.Parent_id,
c3.Category, c3.Category_id, c3.Parent_id,
c4.Category, c4.Category_id, c4.Parent_id,
c5.Category, c5.Category_id, c5.Parent_id
FROM
rCTE c1
LEFT OUTER JOIN
rCTE c2 ON c1.Category_id = c2.Parent_id
LEFT OUTER JOIN
rCTE c3 ON c2.Category_id = c3.Parent_id
LEFT OUTER JOIN
rCTE c4 ON c3.Category_id = c4.Parent_id
LEFT OUTER JOIN
rCTE c5 ON c4.Category_id = c5.Parent_id
WHERE
c1.Parent_id = 0
ORDER BY
c1.Category, c2.Category
Your problem is that you want the parent categories in different columns, which makes the task more complicated than it could be.
One option to shorten the query and avoid multiple joins uses conditional aggregation:
with rcte as (...)
select
max(case when level = 0 then parent_id end) parent_id_0,
max(case when level = 0 then category_id end) category_id_0,
max(case when level = 0 then category end) category_0,
max(case when level = 1 then parent_id end) parent_id_1,
max(case when level = 1 then category_id end) category_id_1,
max(case when level = 1 then category end) category_1,
...
from rcte
You can add more triplets of conditional expressions to handle as many levels as needed; once the actual hierarchy is exhausted, the remaining columns come out empty.
Another option is string aggregation. This generates a unique column for each original column, with all values concatenated in the order in which they appear in the hierarchy:
with rcte as (...)
select
string_agg(parent_id, ' > ') within group(order by level) parent_ids,
string_agg(category_id, ' > ') within group(order by level) category_ids,
string_agg(category, ' > ') within group(order by level) categories
from rcte

How to detect cyclical reference in SQL Server Query - SQL Server 2017

I have a recursive WITH query in SQL Server 2017:
;WITH rec AS (
SELECT
col1 AS root_order
,col1
,col2
,col3
,col4
,col5
,col6
,col7
,col8
,col9
FROM
TableA
UNION ALL
SELECT
rec.root_order,
TableA.col2,
TableA.col3,
TableA.col4,
TableA.col5,
TableA.col6,
TableA.col7,
TableA.col8,
TableA.col9,
rec.the_level
FROM
rec
INNER JOIN TableA on rec.Details = TableA.Orders
)
SELECT DISTINCT * FROM rec
This yields the error: The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I have tried:
OPTION (maxrecursion 0) to let it continue. But when I do that, the query infinitely loops, so that doesn't work.
In Oracle, I can use CONNECT BY ROOT and CONNECT BY PRIOR and NOCYCLE, but I know things like that aren't available in SQL Server. So I found this MSDN link which suggests something of the form:
with hierarchy
as
(
select
child,
parent,
0 as cycle,
CAST('.' as varchar(max)) + LTRIM(child) + '.' as [path]
from
#hier
where
parent is null
union all
select
c.child,
c.parent,
case when p.[path] like '%.' + LTRIM(c.child) + '.%' then 1 else 0 end as cycle,
p.[path] + LTRIM(c.child) + '.' as [path]
from
hierarchy as p
inner join
#hier as c
on p.child = c.parent
and p.cycle = 0
)
select
child,
parent,
[path]
from
hierarchy
where
cycle = 1;
go
For finding the cycles (or avoiding them). I cannot seem to take my current query and edit it in that fashion. How can I edit my current SQL to perform the cyclic reference detection like in the MSDN article?
Some sample data as requested here in SQL FIDDLE.
What I normally do is pretty simple. In the anchor query (the first part of the CTE), I include a value "1 AS Level" in the select list. Then in the bottom query, I select Level + 1 as the Level, so I know what depth I'm up to. Then I can just put a sanity clause into the bottom query to limit the depth i.e. WHERE LEVEL <= 10 or whatever depth you want. But yes, you still need MAXRECURSION set to 0 if you want to go above 100 levels.
Here's an example based on AdventureWorks:
WITH Materials (BillOfMaterialsID, ProductName, ProductAssemblyID, ComponentID, [Level])
AS
(
SELECT bom.BillOfMaterialsID,
p.[Name],
bom.ProductAssemblyID,
bom.ComponentID,
1
FROM Production.BillOfMaterials AS bom
INNER JOIN Production.Product AS p
ON bom.ComponentID = p.ProductID
AND bom.EndDate IS NULL
WHERE bom.ProductAssemblyID IS NULL
UNION ALL
SELECT bom.BillOfMaterialsID,
p.[Name],
bom.ProductAssemblyID,
bom.ComponentID,
m.[Level] + 1
FROM Production.BillOfMaterials AS bom
INNER JOIN Production.Product AS p
ON bom.ComponentID = p.ProductID
INNER JOIN Materials AS m
ON bom.ProductAssemblyID = m.ComponentID
WHERE m.[Level] <= 5
)
SELECT m.BillOfMaterialsID,
m.ProductName,
m.ProductAssemblyID,
m.ComponentID,
m.[Level]
FROM Materials AS m
ORDER BY m.[Level], m.BillOfMaterialsID;
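The path-based cycle check quoted in the question and the depth cap described in this answer can be combined. The following is only a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server 2017, the table hier and its rows are invented, and the column names is_cycle and lvl are hypothetical. The data contains a deliberate loop (2 is a parent of 3, and 3 is a parent of 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: 1 is the root; 2 -> 3 -> 2 forms a cycle.
conn.executescript("""
CREATE TABLE hier (child INTEGER, parent INTEGER);
INSERT INTO hier VALUES (1, NULL), (2, 1), (3, 2), (2, 3);
""")

cycles = conn.execute("""
WITH RECURSIVE hierarchy (child, parent, is_cycle, path, lvl) AS (
    -- anchor: start the path at each root, level 1
    SELECT child, parent, 0, '.' || child || '.', 1
    FROM hier
    WHERE parent IS NULL
    UNION ALL
    SELECT c.child,
           c.parent,
           -- flag the row if this child already appears in the path so far
           CASE WHEN p.path LIKE '%.' || c.child || '.%' THEN 1 ELSE 0 END,
           p.path || c.child || '.',
           p.lvl + 1
    FROM hierarchy p
    JOIN hier c ON c.parent = p.child
    WHERE p.is_cycle = 0      -- do not expand rows that closed a cycle
      AND p.lvl < 10          -- depth cap as a second safety net
)
SELECT child, parent, path FROM hierarchy WHERE is_cycle = 1
""").fetchall()

print(cycles)
```

Because flagged rows are never expanded, the recursion terminates on its own; the level cap only matters if the path check is removed or the data is deeper than expected.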

Troubles isolating target cell in recursive sql query

I have a table, let's say it looks like this:
c | p
=====
|1|3|
|2|1|
|7|5|
c stands for current and p stands for parent
Given a c value of 2, I want to return its topmost ancestor (the one that has no parent), which here is 3. Since this is a self-referencing table, I figured using a CTE would be the best method; however, I am very new to using it. Nevertheless, I gave it a shot:
WITH Tree(this, parent) AS
( SELECT c ,p
FROM myTable
WHERE c = '2'
UNION ALL
SELECT M.c ,M.p
FROM myTable M
JOIN Tree T ON T.parent = M.c )
SELECT parent
FROM Tree
However this returns:
1
3
I only want 3 though. I have tried putting WHERE T.parent <> M.c but that doesn't entirely make sense. Needless to say, I am a little confused about how to isolate the grandparent.
DECLARE @Table AS TABLE (Child INT, Parent INT)
INSERT INTO @Table VALUES (1,3),(2,1),(7,5)
;WITH cteRecursive AS (
SELECT
OriginalChild = Child
,Child
,Parent
,Level = 0
FROM
@Table
WHERE
Child = 2
UNION ALL
SELECT
c.OriginalChild
,t.Child
,t.Parent
,Level + 1
FROM
cteRecursive c
INNER JOIN @Table t
ON c.Parent = t.Child
)
SELECT TOP 1 TopAncestor = Parent
FROM
cteRecursive
ORDER BY
Level DESC
Use a recursive CTE to recurse up the tree until you cannot go any further. Keep track of the level of recursion, then take the parent from the last level of recursion and you have the top ancestor.
And since I had already written it, here is how you would find the top ancestor of every child. The concept is the same, but you need to introduce ROW_NUMBER() to find the last level that was recursed.
DECLARE @Table AS TABLE (Child INT, Parent INT)
INSERT INTO @Table VALUES (1,3),(2,1),(7,5),(5,9)
;WITH cteRecursive AS (
SELECT
OriginalChild = Child
,Child
,Parent
,Level = 0
FROM
@Table
UNION ALL
SELECT
c.OriginalChild
,t.Child
,t.Parent
,Level + 1
FROM
cteRecursive c
INNER JOIN @Table t
ON c.Parent = t.Child
)
, cteTopAncestorRowNum AS (
SELECT
*
,TopAncestorRowNum = ROW_NUMBER() OVER (PARTITION BY OriginalChild ORDER BY Level DESC)
FROM
cteRecursive
)
SELECT
Child = OriginalChild
,TopMostAncestor = Parent
FROM
cteTopAncestorRowNum
WHERE
TopAncestorRowNum = 1
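For readers without a SQL Server instance at hand, the same climb-then-rank idea can be checked with Python's sqlite3 module. This is a sketch, assuming SQLite 3.25 or newer (needed for ROW_NUMBER) and using the four sample rows from the second query above; the table name my_table and the CTE names climb and ranked are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (c INTEGER, p INTEGER);
INSERT INTO my_table VALUES (1, 3), (2, 1), (7, 5), (5, 9);
""")

top_ancestors = conn.execute("""
WITH RECURSIVE climb (original_child, child, parent, lvl) AS (
    -- anchor: every row is a starting point at level 0
    SELECT c, c, p, 0 FROM my_table
    UNION ALL
    -- step: move from the current parent to that parent's own row
    SELECT k.original_child, t.c, t.p, k.lvl + 1
    FROM climb k
    JOIN my_table t ON t.c = k.parent
),
ranked AS (
    -- the deepest level reached for each starting child gets rn = 1
    SELECT original_child, parent,
           ROW_NUMBER() OVER (PARTITION BY original_child
                              ORDER BY lvl DESC) AS rn
    FROM climb
)
SELECT original_child, parent
FROM ranked
WHERE rn = 1
ORDER BY original_child
""").fetchall()

print(top_ancestors)
```

Each starting child keeps its own identity in original_child while the child and parent columns move up the tree, which is what lets the final ranking pick one row per starting child.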

SQL Elaborate Joins Query

I'm trying to solve the below problem.
I feel like it is possible, but I can't seem to get it.
Here's the scenario:
Table 1 (Assets)
1 Asset-A
2 Asset-B
3 Asset-C
4 Asset-D
Table 2 (Attributes)
1 Asset-A Red
2 Asset-A Hard
3 Asset-B Red
4 Asset-B Hard
5 Asset-B Heavy
6 Asset-C Blue
7 Asset-C Hard
If I am looking for something having the same attributes as Asset-A, then it should identify Asset-B, since Asset-B has all the same attributes as Asset-A (it should disregard Heavy, since Asset-A doesn't specify anything different or similar). Also, if I wanted only the attributes that Asset-A AND Asset-B have in common, how would I get that?
Seems simple, but I can't nail it...
The actual table I am using, is almost precisely Table2, simply an association of an AssetId, and an AttributeId so:
PK: Id
int: AssetId
int: AttributeId
I only included the idea of the asset table to simplify the question.
SELECT ato.id, ato.value
FROM (
SELECT id
FROM assets a
WHERE NOT EXISTS
(
SELECT NULL
FROM attributes ata
LEFT JOIN
attributes ato
ON ato.id = a.id
AND ato.value = ata.value
WHERE ata.id = 1
AND ato.id IS NULL
)
) ao
JOIN attributes ato
ON ato.id = ao.id
JOIN attributes ata
ON ata.id = 1
AND ata.value = ato.value
Or, in SQL Server 2005 (with sample data to check):
WITH assets AS
(
SELECT 1 AS id, 'A' AS name
UNION ALL
SELECT 2 AS id, 'B' AS name
UNION ALL
SELECT 3 AS id, 'C' AS name
UNION ALL
SELECT 4 AS id, 'D' AS name
),
attributes AS
(
SELECT 1 AS id, 'Red' AS value
UNION ALL
SELECT 1 AS id, 'Hard' AS value
UNION ALL
SELECT 2 AS id, 'Red' AS value
UNION ALL
SELECT 2 AS id, 'Hard' AS value
UNION ALL
SELECT 2 AS id, 'Heavy' AS value
UNION ALL
SELECT 3 AS id, 'Blue' AS value
UNION ALL
SELECT 3 AS id, 'Hard' AS value
)
SELECT ato.id, ato.value
FROM (
SELECT id
FROM assets a
WHERE a.id <> 1
AND NOT EXISTS
(
SELECT ata.value
FROM attributes ata
WHERE ata.id = 1
EXCEPT
SELECT ato.value
FROM attributes ato
WHERE ato.id = a.id
)
) ao
JOIN attributes ato
ON ato.id = ao.id
JOIN attributes ata
ON ata.id = 1
AND ata.value = ato.value
I don't completely understand the first part of your question, identifying assets based on their attributes.
Making some assumptions about column names, the following query would yield the common attributes between Asset-A and Asset-B:
SELECT t.Name
FROM [Table 2] t
JOIN [Table 1] a ON a.ID = t.AssetID
WHERE a.Name IN ('Asset-A', 'Asset-B')
GROUP BY t.Name
HAVING COUNT(DISTINCT a.Name) = 2
Select * From Assets A
Where Exists
(Select * From Assets
Where AssetId <> A.AssetID
And (Select Count(*)
From Attributes At1 Join Attributes At2
On At1.AssetId <> At2.AssetId
And At1.attribute <> At2.Attribute
Where At1.AssetId = A.AssetId) = 0 )
And AssetId = 'Asset-A'
select at2.asset, count(*)
from attribute at1
inner join attribute at2 on at1.value = at2.value
where at1.asset = "Asset-A"
and at2.asset != "Asset-A"
group by at2.asset
having count(*) = (select count(*) from attribute where asset = "Asset-A");
Find all assets that have every attribute that "A" has (but may also have additional attributes):
SELECT Other.ID
FROM Assets Other
WHERE
Other.AssetID <> 'Asset-A' -- do not return Asset A as a match to itself
AND NOT EXISTS (SELECT NULL FROM Attributes AttA WHERE
AttA.AssetID='Asset-A'
AND NOT EXISTS (SELECT NULL FROM Attributes AttOther WHERE
AttOther.AssetID=Other.ID AND AttOther.AttributeID = AttA.AttributeID
)
)
I.e., "find any asset where there is no attribute of A that is not also an attribute of this asset".
Find all assets that have exactly the same attributes as "A":
SELECT Other.ID
FROM Assets Other
WHERE
Other.AssetID <> 'Asset-A' -- do not return Asset A as a match to itself
AND NOT EXISTS (SELECT NULL FROM Attributes AttA WHERE
AttA.AssetID='Asset-A'
AND NOT EXISTS (SELECT NULL FROM Attributes AttOther WHERE
AttOther.AssetID=Other.ID
AND AttOther.AttributeID = AttA.AttributeID
)
)
AND NOT EXISTS (SELECT NULL FROM Attributes AttaOther WHERE
AttaOther.AssetID=Other.ID
AND NOT EXISTS (SELECT NULL FROM Attributes AttaA WHERE
AttaA.AssetID='Asset-A'
AND AttaA.AttributeID = AttaOther.AttributeID
)
)
I.e., "find any asset where there is no attribute of A that is not also an attribute of this asset, and where there is no attribute of this asset that is not also an attribute of A."
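The double NOT EXISTS pattern can be checked against the sample data from the question. The sketch below makes several assumptions: it runs on SQLite through Python's sqlite3 module, it stores the asset name directly in a single attributes table with hypothetical columns asset and name, and it uses INTERSECT for the common-attributes part of the question as one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows taken from Table 2 in the question.
conn.executescript("""
CREATE TABLE attributes (asset TEXT, name TEXT, UNIQUE (asset, name));
INSERT INTO attributes VALUES
 ('Asset-A', 'Red'),  ('Asset-A', 'Hard'),
 ('Asset-B', 'Red'),  ('Asset-B', 'Hard'), ('Asset-B', 'Heavy'),
 ('Asset-C', 'Blue'), ('Asset-C', 'Hard');
""")

# Assets for which no attribute of Asset-A is missing.
supersets = conn.execute("""
SELECT DISTINCT o.asset
FROM attributes o
WHERE o.asset <> 'Asset-A'
  AND NOT EXISTS (
      SELECT 1 FROM attributes a
      WHERE a.asset = 'Asset-A'
        AND NOT EXISTS (
            SELECT 1 FROM attributes x
            WHERE x.asset = o.asset AND x.name = a.name))
ORDER BY o.asset
""").fetchall()

# Attributes shared by Asset-A and Asset-B.
common = conn.execute("""
SELECT name FROM attributes WHERE asset = 'Asset-A'
INTERSECT
SELECT name FROM attributes WHERE asset = 'Asset-B'
ORDER BY name
""").fetchall()

print(supersets, common)
```

Asset-C is rejected because it lacks Red, while Asset-B passes even though it has the extra attribute Heavy.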
This solution works as prescribed, thanks for the input.
WITH Atts AS
(
SELECT
DISTINCT
at1.[Attribute]
FROM
Attribute at1
WHERE
at1.[Asset] = 'Asset-A'
)
SELECT
DISTINCT
Asset,
(
SELECT
COUNT(ta2.[Attribute])
FROM
Attribute ta2
INNER JOIN
Atts b
ON
b.[Attribute] = ta2.[attribute]
WHERE
ta2.[Asset] = ta.Asset
)
AS [Count]
FROM
Atts a
INNER JOIN
Attribute ta
ON
a.[Attribute] = ta.[Attribute]
Find all assets that have all the same attributes as asset-a:
select att2.Asset from attribute att1
inner join attribute att2 on att2.Attribute = att1.Attribute and att1.Asset <> att2.Asset
where att1.Asset = 'Asset-A'
group by att2.Asset, att1.Asset
having COUNT(*) = (select COUNT(*) from attribute where Asset=att1.Asset)
I thought maybe I can do this with LINQ and then work my way backwards with:
var result = from productsNotA in DevProducts
where productsNotA.Product != "A" &&
(
from productsA in DevProducts
where productsA.Product == "A"
select productsA.Attribute
).Except
(
from productOther in DevProducts
where productOther.Product == productsNotA.Product
select productOther.Attribute
).Single() == null
select new {productsNotA.Product};
result.Distinct()
I thought that translating this back to SQL with LinqPad would result into a pretty SQL query. However it didn't :). DevProducts is my testtable with a column Product and Attribute. I thought I'd post the LINQ query anyways, might be useful to people who are playing around with LINQ.
If you can optimize the LINQ query above, please let me know (it might result in better SQL ;))
I'm using following DDL
CREATE TABLE Attributes (
Asset VARCHAR(100)
, Name VARCHAR(100)
, UNIQUE(Asset, Name)
)
The second question is easy:
SELECT Name
FROM Attributes
WHERE Name IN (SELECT Name FROM Attributes WHERE Asset = 'A')
AND Asset = 'B'
The first question is not much harder:
SELECT Asset
FROM Attributes
WHERE Name IN (SELECT Name FROM Attributes WHERE Asset = 'A')
GROUP BY Asset
HAVING COUNT(*) = (SELECT COUNT(*) FROM Attributes WHERE Asset = 'A')
Edit:
I left AND Asset != 'A' out of the WHERE clause of the second snippet for brevity