How to detect a cyclical reference in a SQL Server query (SQL Server 2017)

I have a recursive WITH query in SQL Server 2017:
;WITH rec AS (
SELECT
col1 AS root_order
,col1
,col2
,col3
,col4
,col5
,col6
,col7
,col8
,col9
FROM
TableA
UNION ALL
SELECT
rec.root_order,
TableA.col2,
TableA.col3,
TableA.col4,
TableA.col5,
TableA.col6,
TableA.col7,
TableA.col8,
TableA.col9,
rec.the_level
FROM
rec
INNER JOIN TableA on rec.Details = TableA.Orders
)
SELECT DISTINCT * FROM rec
This fails with the error: The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I have tried OPTION (maxrecursion 0) to let it continue, but then the query loops forever, so that doesn't work.
In Oracle, I can use CONNECT BY ROOT, CONNECT BY PRIOR and NOCYCLE, but I know nothing like that is available in SQL Server. I found this MSDN link, which suggests something of the form:
with hierarchy
as
(
select
child,
parent,
0 as cycle,
CAST('.' as varchar(max)) + LTRIM(child) + '.' as [path]
from
#hier
where
parent is null
union all
select
c.child,
c.parent,
case when p.[path] like '%.' + LTRIM(c.child) + '.%' then 1 else 0 end as cycle,
p.[path] + LTRIM(c.child) + '.' as [path]
from
hierarchy as p
inner join
#hier as c
on p.child = c.parent
and p.cycle = 0
)
select
child,
parent,
[path]
from
hierarchy
where
cycle = 1;
go
That query finds the cycles (or lets you avoid them), but I cannot work out how to adapt my current query in the same way. How can I edit my current SQL to perform cyclic reference detection like the MSDN article does?
Sample data, as requested, is available in this SQL Fiddle.

What I normally do is pretty simple. In the anchor query (the first part of the CTE), I include the value 1 AS [Level] in the select list. In the recursive query (the second part), I select [Level] + 1 as the level, so I always know what depth I'm at. I can then add a sanity check to the recursive query to limit the depth, e.g. WHERE [Level] <= 10 or whatever depth you want. This bounds the recursion rather than detecting the cycle, so rows inside a cycle can still repeat until the limit is reached. And yes, you still need to raise MAXRECURSION (or set it to 0) if you want to go deeper than the default of 100 levels.
Here's an example based on AdventureWorks:
WITH Materials (BillOfMaterialsID, ProductName, ProductAssemblyID, ComponentID, [Level])
AS
(
SELECT bom.BillOfMaterialsID,
p.[Name],
bom.ProductAssemblyID,
bom.ComponentID,
1
FROM Production.BillOfMaterials AS bom
INNER JOIN Production.Product AS p
ON bom.ComponentID = p.ProductID
AND bom.EndDate IS NULL
WHERE bom.ProductAssemblyID IS NULL
UNION ALL
SELECT bom.BillOfMaterialsID,
p.[Name],
bom.ProductAssemblyID,
bom.ComponentID,
m.[Level] + 1
FROM Production.BillOfMaterials AS bom
INNER JOIN Production.Product AS p
ON bom.ComponentID = p.ProductID
INNER JOIN Materials AS m
ON bom.ProductAssemblyID = m.ComponentID
WHERE m.[Level] <= 5
)
SELECT m.BillOfMaterialsID,
m.ProductName,
m.ProductAssemblyID,
m.ComponentID,
m.[Level]
FROM Materials AS m
ORDER BY m.[Level], m.BillOfMaterialsID;
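For what it's worth, here is a minimal sketch of the same depth guard on data that really does contain a cycle. It is written in Python against an in-memory SQLite database purely so that it runs standalone; the edges table, its rows and MAX_DEPTH are made up for illustration, and SQLite spells the CTE WITH RECURSIVE where SQL Server uses plain WITH.

```python
import sqlite3

# Hypothetical edge table containing a cycle: 1 -> 2 -> 3 -> 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent INTEGER, child INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (3, 1)])

MAX_DEPTH = 5  # arbitrary cap chosen for this illustration

rows = conn.execute(
    """
    WITH RECURSIVE rec (root, node, level) AS (
        -- anchor: direct children of node 1, at level 1
        SELECT parent, child, 1 FROM edges WHERE parent = 1
        UNION ALL
        -- recursive step: follow one more edge and increment the level
        SELECT rec.root, e.child, rec.level + 1
        FROM rec
        JOIN edges AS e ON e.parent = rec.node
        WHERE rec.level < ?   -- depth guard: the cycle cannot recurse past the cap
    )
    SELECT root, node, level FROM rec ORDER BY level
    """,
    (MAX_DEPTH,),
).fetchall()

for row in rows:
    print(row)
```

The query terminates, but nodes 2 and 3 each come back twice because the walk goes around the cycle until the cap stops it. The guard bounds the work; it does not say where the cycle is.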

Related

Why does ROW_NUMBER in a view give a nullable column?

I have a view that uses a CTE, and I want to use a row number to simulate a key for my EDMX in Visual Studio:
ALTER VIEW [dbo].[ViewLstTypesArticle]
AS
WITH cte (IdTypeArticle, IdTypeArticleParent, Logo, Libelle, FullLibelle, Racine) AS
(
SELECT
f.Id AS IdTypeArticle, NULL AS IdParent,
f.Logo, f.Libelle,
CAST(f.Libelle AS varchar(MAX)) AS Expr1,
f.Id AS Racine
FROM
dbo.ArticleType AS f
LEFT OUTER JOIN
dbo.ArticleTypeParent AS p ON p.IdTypeArticle = f.Id
WHERE
(p.IdTypeArticleParent IS NULL)
AND (f.Affichable = 1)
UNION ALL
SELECT
f.Id AS IdTypeArticle, p.IdTypeArticleParent,
f.Logo, f.Libelle,
CAST(parent.Libelle + ' / ' + f.Libelle AS varchar(MAX)) AS Expr1,
parent.Racine
FROM
dbo.ArticleTypeParent AS p
INNER JOIN
cte AS parent ON p.IdTypeArticleParent = parent.IdTypeArticle
INNER JOIN
dbo.ArticleType AS f ON f.Id = p.IdTypeArticle
)
SELECT
*,
ROW_NUMBER() OVER (ORDER BY FullLibelle) AS Id
FROM
(SELECT
IdTypeArticle, IdTypeArticleParent, Logo, Libelle,
FullLibelle, Racine
FROM cte) AS CTE1
When I look at the column properties, I see Id bigint ... NULL.
My EDMX excludes this view because it cannot find a column that can be used as a key.
When I query the view, Id is never NULL; every row has its row number.
If someone has encountered this problem and resolved it, thanks in advance.
SQL Server generally thinks that columns are NULL-able in views (and when using SELECT INTO).
You can convince SQL Server that this is not the case by using ISNULL():
select *,
ISNULL(ROW_NUMBER() over(ORDER BY FullLibelle), 0) as Id
from . . .
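Note: This works with ISNULL() but not with COALESCE(), which otherwise has very similar functionality. SQL Server treats the result of ISNULL() with a non-nullable replacement value as NOT NULL, whereas a COALESCE() expression is still considered nullable.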

How to get distinct values of a column, keeping the earliest or latest occurrence, with a SQL query?

How can I get the distinct values of a column so that, for duplicates, the earlier rows are removed and the last occurrence is retained?
I have tried several approaches, but none of them worked.
Below is the raw column that I want to work with:
parent_item_id
------------------------------------
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
Below are the approaches I tried.
Way 1: using the default behaviour of DISTINCT.
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT (parent_item_id) from tree
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
Way 2: using ROW_NUMBER. By my logic this should change the order, so why is the final result the same as way 1?
query:
WITH tree AS (SELECT distinct(ic.parent_item_id) FROM dbo.item_combination ic, dbo.product p WHERE ic.child_item_id != p.item_id
UNION ALL
SELECT ic.parent_item_id FROM tree t, dbo.item_combination ic WHERE t.parent_item_id=ic.child_item_id
)
SELECT DISTINCT(parent_item_id) FROM
(
SELECT t.parent_item_id, [row_number]=ROW_NUMBER() OVER(ORDER BY (SELECT 1)) FROM tree t ORDER BY [row_number] DESC OFFSET 0 ROWS
) r
group by r.parent_item_id, r.[row_number]
result:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
The result I want is this:
parent_item_id
--
9B3E7A72-D36A-42D3-A04C-186DEC409F93
E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B
942E1854-9EB4-4C19-8A1E-4FCC4953B50C
From your comments, this is what I think should happen:
You need to establish a parent-child item view (a product-source view). It would look like this:
create view v_ProductSourceMap as
SELECT ic.parent_item_id as item_id, p.item_id as source_id
FROM dbo.item_combination ic left join dbo.product p on ic.child_item_id = p.item_id
group by ic.parent_item_id, p.item_id
Check that the view lists every item derived from another item; for new items, source_id will be NULL.
select * from v_ProductSourceMap order by item_id, source_id
Now use a recursive query to traverse the mapping
WITH tree AS (
SELECT item_id, source_id, 1 as depth, cast(ic.item_id as varchar(max)) as bc
FROM v_ProductSourceMap ic
WHERE source_id is null
UNION ALL
SELECT ic.item_id, ic.source_id, t.depth + 1, t.bc + '>' + cast(ic.item_id as varchar(max))
FROM tree t, v_ProductSourceMap ic WHERE ic.source_id=t.item_id
)
select * from tree
From here, look at the pattern in the depth and/or the breadcrumbs to figure out what your sort order could be.
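Once there is a column that fixes the order (the depth above could play that role), "distinct, keeping the last occurrence" can be written as a GROUP BY ordered by each value's highest position. The sketch below is only an illustration: it uses Python with an in-memory SQLite database so it runs standalone, and the tree_rows table and its seq column are hypothetical stand-ins for whatever ordering the real query yields. The five ids are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column recording the order in which the rows were produced.
conn.execute("CREATE TABLE tree_rows (seq INTEGER PRIMARY KEY, parent_item_id TEXT)")
conn.executemany(
    "INSERT INTO tree_rows (seq, parent_item_id) VALUES (?, ?)",
    [
        (1, "9B3E7A72-D36A-42D3-A04C-186DEC409F93"),
        (2, "942E1854-9EB4-4C19-8A1E-4FCC4953B50C"),
        (3, "E75C7294-F0C4-4C6E-8C12-DF5FBC93FA3B"),
        (4, "942E1854-9EB4-4C19-8A1E-4FCC4953B50C"),
        (5, "942E1854-9EB4-4C19-8A1E-4FCC4953B50C"),
    ],
)

last_kept = [
    row[0]
    for row in conn.execute(
        """
        SELECT parent_item_id
        FROM tree_rows
        GROUP BY parent_item_id   -- one row per distinct value
        ORDER BY MAX(seq)         -- sorted by where each value last occurs
        """
    )
]

for item_id in last_kept:
    print(item_id)
```

Ordering by MIN(seq) instead would keep the first occurrence. Without an explicit ordering column, a table has no "earliest" or "latest" row, which is why ROW_NUMBER() OVER (ORDER BY (SELECT 1)) did not change anything.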

Query to fetch all referenced entities recursively

I have a data model that consists of 'Claims', which (to keep things simple for Stack Overflow) have only an OpenAmount field. There are two other tables, 'ClaimCoupling' and 'ClaimEntryReference'.
The ClaimCoupling table references the Claim table directly, and a ClaimEntryReference is effectively the booking of a received amount, which can be booked over multiple claims (see ClaimEntry_ID). See this diagram:
For simplicity I've removed all amounts, as that's not what I am currently struggling with.
What I want is a query that starts at the Claim table and fetches every claim with an OpenAmount <> 0. However, I want to be able to print an accurate report of how this OpenAmount came to be, which means I also need to print any claims coupled to this claim. To make it even more interesting, the same applies to the bookings: if a booking was made on claims X and Y and only X has an open amount, I want to fetch both X and Y so I can show the payment that was booked as a whole.
I attempted to do this with a recursive CTE, but it (rightfully) blows up on the circular references. I figured I'd fix that with a simple WHERE clause that only recursively adds records which are not yet part of the CTE, but this is not allowed:
WITH coupledClaims AS (
--Get all unique combinations
SELECT cc.SubstractedFromClaim_ID AS Claim_ID,
cc.AddedToClaim_ID AS Linked_Claim_ID FROM dbo.ClaimCoupling cc
UNION
SELECT cc.AddedToClaim_ID AS Claim_ID,
cc.SubstractedFromClaim_ID AS Linked_Claim_ID FROM dbo.ClaimCoupling cc
),
MyClaims as
(
SELECT * FROM Claim WHERE OpenAmount <> 0
UNION ALL
SELECT c.* FROM coupledClaims JOIN MyClaims mc ON coupledClaims.claim_id = mc.ID JOIN claim c ON c.ID = coupledClaims.linked_Claim_ID
WHERE c.ID NOT IN (SELECT ID FROM MyClaims)
)
SELECT * FROM MyClaims
After fiddling with that for way too long, I decided to do it with an actual loop, checking @@ROWCOUNT and manually adding the rows to a table variable. But as I was writing that solution (which I'm sure I can get to work), I figured I'd ask here first, because I don't like writing loops in T-SQL; they always feel ugly and inefficient.
See the following SQL Fiddle for the data model and some test data (I commented out the recursive part, as otherwise I was not allowed to create a link):
http://sqlfiddle.com/#!6/129ad5/7/0
I'm hoping someone here has a great way of handling this problem (likely I'm doing something wrong with the recursive CTE). For completeness, this is on MS SQL Server 2016.
Here is what I've learned and done so far, thanks to the comment by habo, which refers to the question "Infinite loop in CTE when parsing self-referencing table".
First, I decided to at least 'solve' my problem and wrote some manual recursion. This works, but it is not as 'pretty' as the CTE solution, which I was hoping would be easier to read and would also outperform the manual recursion.
Manual Recursion
/****************************/
/* CLAIMS AND PAYMENT LOGIC */
/****************************/
DECLARE @rows as INT = 0
DECLARE @relevantClaimIds as Table(
Debtor_ID INT,
Claim_ID int
)
SET NOCOUNT ON
--Get anchor condition
INSERT INTO @relevantClaimIds (Debtor_ID, Claim_ID)
select Debtor_ID, ID
from Claim c
WHERE OpenAmount <> 0
--Do recursion
WHILE @rows <> (SELECT COUNT(*) FROM @relevantClaimIds)
BEGIN
set @rows = (SELECT COUNT(*) FROM @relevantClaimIds)
--Subtracted
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM claim c
inner join claimcoupling cc on cc.SubstractedFromClaim_ID = c.ID
JOIN @relevantClaimIds rci on rci.Claim_ID = cc.AddedToClaim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
--Added
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM claim c
inner join claimcoupling cc on cc.AddedToClaim_ID = c.ID
JOIN @relevantClaimIds rci on rci.Claim_ID = cc.SubstractedFromClaim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
--Payments
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM @relevantClaimIds f
join ClaimEntryReference cer on f.Claim_ID = cer.Claim_ID
JOIN ClaimEntryReference cer_linked on cer.ClaimEntry_ID = cer_linked.ClaimEntry_ID AND cer.ID <> cer_linked.ID
JOIN Claim c on c.ID = cer_linked.Claim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
END
END
After reading the comment, I decided to try the CTE solution, which looks like this:
CTE Recursion
with Tree as
(
select Debtor_ID, ID AS Claim_ID, CAST(ID AS VARCHAR(MAX)) AS levels
from Claim c
WHERE OpenAmount <> 0
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM claim c
inner join claimcoupling cc on cc.SubstractedFromClaim_ID = c.ID
JOIN Tree t on t.Claim_ID = cc.AddedToClaim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM claim c
inner join claimcoupling cc on cc.AddedToClaim_ID = c.ID
JOIN Tree t on t.Claim_ID = cc.SubstractedFromClaim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM Tree t
join ClaimEntryReference cer on t.Claim_ID = cer.Claim_ID
JOIN ClaimEntryReference cer_linked on cer.ClaimEntry_ID = cer_linked.ClaimEntry_ID AND cer.ID <> cer_linked.ID
JOIN Claim c on c.ID = cer_linked.Claim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
)
select DISTINCT Tree.Debtor_ID, Tree.Claim_ID
from Tree
This solution is indeed a lot shorter and easier on the eyes, but does it actually perform better?
Performance differences
Manual: CPU 16, Reads 1793, Duration 13
CTE: CPU 47, Reads 4001, Duration 48
Conclusion
I'm not sure whether it's due to the varchar cast that the CTE solution requires, or because it has to do one extra iteration before completing its recursion, but the CTE actually requires more resources on all fronts than the manual recursion.
In the end it is possible with a CTE, but looks aren't everything (thank god ;-)); performance-wise, sticking with the manual recursion seems the better route.
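The shape of the manual recursion is easier to see on a toy data set: seed a set with the anchor rows, then keep inserting linked rows that are not in the set yet until a pass adds nothing. The sketch below is only an illustration of that idea, in Python against an in-memory SQLite database so it runs standalone. The simplified claim, coupling and relevant tables and their rows are invented, and only the coupling link (not the payment link) is followed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE claim (id INTEGER PRIMARY KEY, open_amount INTEGER);
    CREATE TABLE coupling (subtracted_from INTEGER, added_to INTEGER);
    CREATE TABLE relevant (claim_id INTEGER PRIMARY KEY);
    -- Hypothetical data: claims 1, 2 and 3 are coupled in a cycle; claim 4 is unrelated.
    INSERT INTO claim VALUES (1, 50), (2, 0), (3, 0), (4, 0);
    INSERT INTO coupling VALUES (1, 2), (2, 3), (3, 1);
    -- Anchor: claims that still have an open amount.
    INSERT INTO relevant SELECT id FROM claim WHERE open_amount <> 0;
    """
)


def relevant_count():
    """Number of claims collected so far."""
    return conn.execute("SELECT COUNT(*) FROM relevant").fetchone()[0]


previous = 0
while previous != relevant_count():  # stop once a pass adds nothing new
    previous = relevant_count()
    conn.execute(
        """
        INSERT INTO relevant (claim_id)
        SELECT DISTINCT linked
        FROM (
            -- follow each coupling in both directions
            SELECT added_to AS linked, subtracted_from AS known FROM coupling
            UNION
            SELECT subtracted_from, added_to FROM coupling
        )
        WHERE known IN (SELECT claim_id FROM relevant)
          AND linked NOT IN (SELECT claim_id FROM relevant)   -- never revisit a claim
        """
    )

relevant_ids = [row[0] for row in conn.execute("SELECT claim_id FROM relevant ORDER BY claim_id")]
print(relevant_ids)
```

Because each claim is inserted at most once, the loop visits every reachable claim a single time regardless of how many paths lead to it. The path-based CTE, by contrast, carries one row per distinct path, which may be part of why it needed more reads in the measurements above.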

Using CTE with hierarchical data and 'cumulative' values

I'm experimenting with SQL Common Table Expressions using a sample hierarchy of cities, countries and continents and which have been visited and which haven't.
The table t_hierarchy looks like this:
(NOTE: The visited column is deliberately NULL for non-cities because I want this to be a dynamically calculated percentage.)
I have then used the following SQL to create a recursive result set based on the data in t_hierarchy:
WITH myCTE (ID, name, type, parentID, visited, Depth)
AS
(
Select ID, name, type, parentID, visited, 0 as Depth From t_hierarchy where parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, t_hierarchy.visited, Depth + 1
From t_hierarchy
inner join myCte on t_hierarchy.parentID = myCte.ID
)
Select ID, name, type, parentID, Depth, cnt.numDirectChildren, visited
FROM myCTE
LEFT JOIN (
SELECT theID = parentID, numDirectChildren = COUNT(*)
FROM myCTE
GROUP BY parentID
) cnt ON cnt.theID = myCTE.ID
order by ID
The result looks like this:
What I would like to do now, which I am struggling with, is to create a column, e.g. visitedPercentage to show the percentage of cities visited for each 'level' of the hierarchy (treating cities differently to countries and continents). To explain, working our way up the 'tree':
Madrid would be 100% because it has been visited (visited = 1)
Barcelona would be 0% because it has not been visited (visited = 0)
Spain would therefore be 50% because it has 2 direct children and one is 100% and the other is 0%
Europe would therefore be 50% because Spain is 50%, France is 100% (Paris has been visited), and Germany is 0% (Berlin has not been visited)
I hope this makes sense. I kind of want to say "if it's not a city, work out the visitedPercentage of THIS level based on the visitedPercentage of all direct children; otherwise just show 100% or 0%". Any guidance is much appreciated.
UPDATE:
I've managed to progress it a bit further using Daniel Gimenez's suggestion to the point where I've got France 100, Spain 50 etc. But the top level items (e.g. Europe) are still 0, like this:
I think this is because the calculation is being done after the recursive part of the query, rather than within it. I.e. this line:
SELECT... , visitPercent = SUM(CAST(visited AS int)) / COUNT(*) FROM myCTE GROUP BY parentID
is saying "look at the visited column for child objects, calculate the SUM of the values, and show the result as visitPercent", when it should be saying "look at the existing visitPercent value from the previous calculation", if that makes sense. I've no idea where to go from here! :)
I think I've done it, using two CTEs. In the end it was easier to get the total number of descendants for each level (children, grandchildren, etc.) and use that to calculate the overall percentage.
That was painful. At one point typing 'CATS' instead of 'CAST' had me puzzled for about 10 minutes.
with cte1 (ID,parentID,type,name,visited,Lvl) as (
select t.ID, t.parentID, t.type, t.name, t.visited, 0 as [Lvl]
from t_hierarchy t
where t.parentID is not null
union all
select c.ID, t.parentID, c.type, c.name, c.visited, c.Lvl + 1
from t_hierarchy t
inner join cte1 c on c.parentID = t.ID
where t.parentID is not null
),
cte2 (ID,name,type,parentID,parentName_for_reference,visited,Lvl) as (
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, p.name as parentName_for_reference, t_hierarchy.visited, 0 as Lvl
From t_hierarchy
left join t_hierarchy p ON p.ID = t_hierarchy.parentID
where t_hierarchy.parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID,p.name as parentName_for_reference, t_hierarchy.visited, Lvl + 1
From t_hierarchy
inner join cte2 on t_hierarchy.parentID = cte2.ID
inner join t_hierarchy p ON p.ID = t_hierarchy.parentID
)
select cte2.ID,cte2.name,cte2.type,cte2.parentID,cte2.parentName_for_reference,cte2.visited,cte2.Lvl
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(cnt.totalDescendents as varchar) END AS totalDescendents
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST(COALESCE(cnt2.totalDescendentsVisited,0) as varchar) END AS totalDescendentsVisited
,CASE WHEN type = 'city' THEN 'N/A' ELSE CAST((CAST(ROUND(CAST(COALESCE(cnt2.totalDescendentsVisited,0) as float)/CAST(cnt.totalDescendents as float),2) AS numeric(36,2))*100) as varchar) END as asPercentage
from cte2
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendents
FROM cte1
WHERE type = 'city'
GROUP BY parentID
) cnt ON cnt.theID = cte2.ID
left JOIN (
SELECT theID = parentID, COUNT(*) as totalDescendentsVisited
FROM cte1
WHERE type = 'city' AND visited = 1
GROUP BY parentID
) cnt2 ON cnt2.theID = cte2.ID
ORDER BY ID
These posts were helpful:
Keeping it simple and how to do multiple CTE in a query
CTE to get all children (descendants) of a parent
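The descendant-counting idea can be boiled down to a single recursive CTE that pairs every city with each of its ancestors and then aggregates per ancestor. The following is a rough sketch of that, in Python against an in-memory SQLite database so it runs standalone. The rows are reconstructed from the Madrid/Barcelona/Paris/Berlin example in the question, and the ids and the ancestry CTE name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_hierarchy ("
    "id INTEGER PRIMARY KEY, name TEXT, type TEXT, parent_id INTEGER, visited INTEGER)"
)
conn.executemany(
    "INSERT INTO t_hierarchy VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Europe", "continent", None, None),
        (2, "Spain", "country", 1, None),
        (3, "France", "country", 1, None),
        (4, "Germany", "country", 1, None),
        (5, "Madrid", "city", 2, 1),
        (6, "Barcelona", "city", 2, 0),
        (7, "Paris", "city", 3, 1),
        (8, "Berlin", "city", 4, 0),
    ],
)

percent_by_name = dict(
    conn.execute(
        """
        WITH RECURSIVE ancestry (ancestor_id, city_id, visited) AS (
            -- anchor: every city paired with its direct parent
            SELECT parent_id, id, visited FROM t_hierarchy WHERE type = 'city'
            UNION ALL
            -- climb one level: pair the same city with the next ancestor up
            SELECT h.parent_id, a.city_id, a.visited
            FROM ancestry AS a
            JOIN t_hierarchy AS h ON h.id = a.ancestor_id
            WHERE h.parent_id IS NOT NULL
        )
        SELECT h.name, 100.0 * SUM(a.visited) / COUNT(*)
        FROM ancestry AS a
        JOIN t_hierarchy AS h ON h.id = a.ancestor_id
        GROUP BY h.id, h.name
        """
    ).fetchall()
)

for name, percent in sorted(percent_by_name.items()):
    print(name, percent)
```

Note that this is the share of descendant cities visited, not the average of the direct children's percentages described in the question. The two agree for this data (Europe is 50 either way) but can differ when countries have different numbers of cities.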

Recursive Query on a self referential table (not hierarchical)

I am creating a state chart of sorts with the data being stored in a simple self referencing table (JobPath)
JobId - ParentJobId
I was using a standard SQL CTE to get the data out which was working perfectly until I ended up with the following data
JobId - ParentId
1 2
2 3
3 4
4 2
Now as you can see Job 4 links to Job 2 which goes to Job 3 and then to Job 4 and so on.
Is there any way I can tell my query not to pull out data it already has?
Here is my current query
WITH JobPathTemp (JobId, ParentId, Level)
AS
(
-- Anchor member definition
SELECT j.JobId, jp.ParentJobId, 1 AS Level
FROM Job AS j
LEFT OUTER JOIN dbo.JobPath AS jp
ON j.JobId = jp.JobId
where j.JobId=1516
UNION ALL
-- Recursive member definition
SELECT j.JobId, jp.ParentJobId, Level + 1
FROM dbo.Job as j
INNER JOIN dbo.JobPath AS jp
ON j.JobId = jp.JobId
INNER JOIN JobPathTemp AS jpt
ON jpt.ParentId = jp.JobId
WHERE jp.ParentJobId <> jpt.JobId
)
-- Statement that executes the CTE
SELECT * FROM JobPathTemp
If you are not dealing with a large number of entries, the following solution might be suitable. The idea is to build the complete "id path" for each row and make sure the "current id" (in the recursive part) is not already in the path being processed:
(I removed the join to jobpath for testing purposes but the basic pattern should be the same)
WITH JobPathTemp (JobId, ParentId, Level, id_path)
AS
(
SELECT jobid,
parentid,
1 as level,
-- delimit both sides of every id so that '|1|' cannot match inside '|15|'
'|' + cast(jobid as varchar(max)) + '|' as id_path
FROM job
WHERE jobid = 1
UNION ALL
SELECT j.JobId,
j.parentid,
Level + 1,
jpt.id_path + cast(j.jobid as varchar(max)) + '|'
FROM Job as j
INNER JOIN JobPathTemp AS jpt ON j.jobid = jpt.parentid
-- skip any job that already appears in the path (this breaks the cycle)
AND charindex('|' + cast(j.jobid as varchar(max)) + '|', jpt.id_path) = 0
)
SELECT *
FROM JobPathTemp
;
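To see the id-path idea stop on the sample data from the question (1 -> 2 -> 3 -> 4 -> 2), here is a small sketch in Python against an in-memory SQLite database, used only so the example runs standalone. The single job table with a parentid column mirrors the simplified test setup above; SQLite's instr() and || stand in for charindex() and +, and every id is wrapped in delimiters on both sides so that one id cannot match part of another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (jobid INTEGER PRIMARY KEY, parentid INTEGER)")
# Sample rows from the question: job 4 points back at job 2, closing a cycle.
conn.executemany("INSERT INTO job VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 2)])

path_rows = conn.execute(
    """
    WITH RECURSIVE job_path (jobid, parentid, level, id_path) AS (
        -- anchor: start at job 1 with the path '|1|'
        SELECT jobid, parentid, 1, '|' || jobid || '|'
        FROM job
        WHERE jobid = 1
        UNION ALL
        -- recursive step: move to the parent job and append its id to the path
        SELECT j.jobid, j.parentid, jp.level + 1, jp.id_path || j.jobid || '|'
        FROM job AS j
        JOIN job_path AS jp ON j.jobid = jp.parentid
        -- skip any job whose delimited id already appears in the path
        WHERE instr(jp.id_path, '|' || j.jobid || '|') = 0
    )
    SELECT jobid, level, id_path FROM job_path ORDER BY level
    """
).fetchall()

for row in path_rows:
    print(row)
```

The walk visits jobs 1, 2, 3 and 4 once each and stops when job 4's parent (job 2) is found in the path, so no recursion limit is needed.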
Note: the solution below doesn't work in SQL Server, which only allows UNION ALL (not UNION) between the anchor and the recursive term. Since you can't refer to the recursive CTE there except in the join, I don't see any alternative to using a stored function.
You didn't post your query, but I tried this in Postgres, which works in much the same way: if you use UNION (not UNION ALL) in the recursive term, it should automatically remove duplicate rows:
with /*recursive*/ jobs as
(select jobpath.jobid, jobpath.parentjobid from jobpath where jobid = 1
union
select jobpath.jobid, jobpath.parentjobid
from jobpath
join jobs on jobs.parentjobid = jobpath.jobid
)
select jobpath.* from jobpath join jobs on jobpath.jobid = jobs.jobid;
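The same behaviour can be reproduced in SQLite, which also accepts UNION in a recursive CTE. The sketch below is only a demonstration of the deduplication, in Python with an in-memory database; it says nothing new about SQL Server. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobpath (jobid INTEGER, parentjobid INTEGER)")
# Sample rows from the question: 1 -> 2 -> 3 -> 4 -> 2 (a cycle).
conn.executemany("INSERT INTO jobpath VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 2)])

visited = conn.execute(
    """
    WITH RECURSIVE jobs (jobid, parentjobid) AS (
        SELECT jobid, parentjobid FROM jobpath WHERE jobid = 1
        UNION   -- UNION (not UNION ALL) discards rows that were already produced
        SELECT jp.jobid, jp.parentjobid
        FROM jobpath AS jp
        JOIN jobs ON jobs.parentjobid = jp.jobid
    )
    SELECT jobid, parentjobid FROM jobs ORDER BY jobid
    """
).fetchall()

for row in visited:
    print(row)
```

The recursion ends when the walk reaches (2, 3) a second time, because that exact row has already been emitted. The trick only works while whole rows repeat: adding a level counter or a path column makes every row unique, and the query loops again.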