Terminating recursive query with a condition on entire level of recursion - Postgres - sql

Say I have the following table, called graph:
_id  relates
1    {2, 3}
2    {4}
3    {5, 6}
4    {3, 7}
5    {}
6    {}
7    {}
Here it is in graph form:
[Figure: the graph, with directed edges 1→2, 1→3, 2→4, 3→5, 3→6, 4→3, 4→7]
My problem is to write a recursive query that finds the shortest path between two nodes. In this example I will look for the shortest path from node 1 to node 3.
The following query works fine and solves the problem.
with recursive pathFrom1to3(_id, relates, lvl, _path) as
(
select _id, relates, 1 as lvl, ARRAY[_id]
from graph
where _id = 1
union
select g._id, g.relates, p.lvl+1 as lvl, _path || g._id
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles (not applicable for this example)
)
select * from pathFrom1To3
where _id = 3
limit 1
The recursive query actually finds every possible path starting from node 1 and only stops when it cannot find any more paths, so once the recursion is over we are left with this table:
_id  relates  lvl  _path
1    {2, 3}   1    {1}
2    {4}      2    {1, 2}
3    {5, 6}   2    {1, 3}
4    {3, 7}   3    {1, 2, 4}
5    {}       3    {1, 3, 5}
6    {}       3    {1, 3, 6}
3    {5, 6}   4    {1, 2, 4, 3}
7    {}       4    {1, 2, 4, 7}
5    {}       5    {1, 2, 4, 3, 5}
6    {}       5    {1, 2, 4, 3, 6}
After we filter for _id = 3 we get:
_id  relates  lvl  _path
3    {5, 6}   2    {1, 3}
3    {5, 6}   4    {1, 2, 4, 3}
The shortest path is always the path that we hit first, since the edges have no weight. With the logic of our query that corresponds to the earliest returned record, so we can use LIMIT for this: limit 1
_id  relates  lvl  _path
3    {5, 6}   2    {1, 3}
And there is the shortest path from node 1 to 3.
My issue is the fact that this query computes all paths from node 1, making it awfully inefficient if the target node is near the start of the search. My goal is to make the query stop searching (completely) right after it hits the target node, since we know there will not be any other shortest path to come.
I need a way to terminate the recursion completely when node 3 is reached.
If I try to add a terminator for the node itself, as in the query below, the recursion continues for the other nodes at the same level, because the condition only stops expansion from node 3 itself:
with recursive pathFrom1to3(_id, relates, lvl, _path) as
(
select _id, relates, 1 as lvl, ARRAY[_id]
from graph
where _id = 1
union
select g._id, g.relates, p.lvl+1 as lvl, _path || g._id
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles (not applicable for this example)
and not p._id = 3 -- added terminator: do not expand from node 3
)
select * from pathFrom1To3
produces:
_id  relates  lvl  _path
1    {2, 3}   1    {1}
2    {4}      2    {1, 2}
3    {5, 6}   2    {1, 3}
4    {3, 7}   3    {1, 2, 4}
3    {5, 6}   4    {1, 2, 4, 3}
7    {}       4    {1, 2, 4, 7}
5    {}       5    {1, 2, 4, 3, 5}
6    {}       5    {1, 2, 4, 3, 6}
Notice that the recursion stops where node 3 is reached the first time: nodes 5 and 6 don't get searched from there because node 3 was hit. But the recursion continues for node 4 (from node 2), because p._id there is 2, not 3.
We need a way to terminate the recursion when it reaches node 3 for the entire level.
My idea is to create a reached column which is 0 when the _id is not 3 and 1 when the _id is 3.
Then we can use SUM to add up the reached values for the entire level and terminate if the sum is not 0, but I am struggling to write this as a query.
Here is my attempt at writing it:
with recursive pathFrom1to3(_id, relates, lvl, _path, reached, lvlReached) as
(
select _id, relates, 1 as lvl, ARRAY[_id], 0 as reached, 0 as lvlReached
from graph
where _id = 1
union
select _id, relates, lvl, _path, reached, lvlReached from
(
select g._id, g.relates, p.lvl+1 as lvl, _path || g._id,
case when 3 = any(g.relates) then 1 else 0 end as reached
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles
) mainRecursion
join
(
select lvl, sum(reached) as lvlReached
from pathFrom1To3
group by lvl
) lvlReached
on mainRecursion.lvl = lvlReach.lvl
where lvlReached = 0
)
select * from pathFrom1To3
This gives me the error:
recursive reference to query "pathFrom1to3" must not appear more than once.

We can try a window function to hold the status indicating the solution is found, causing all further recursive iterations to be pruned.
WITH RECURSIVE cte(id, vals, relates, done) AS (
SELECT g._id, array[g._id], relates, null::int
FROM graph AS g
WHERE _id = 1
UNION ALL
SELECT g._id, vals||g._id, g.relates
, MAX(CASE WHEN 3 IN (done, g._id) THEN 3 END) OVER ()
FROM cte AS c
JOIN graph AS g
ON g._id = any(c.relates)
AND NOT g._id = any(c.vals) -- Avoid cycles
AND done IS NULL -- wait until we're done
)
SELECT * FROM cte
WHERE id = 3
;
The result:
id  vals   relates  done
3   {1,3}  {5,6}    3
By commenting out the id = 3 logic in the last query expression, we see all the generated rows:
id  vals   relates  done
1   {1}    {2,3}    null
2   {1,2}  {4}      3
3   {1,3}  {5,6}    3
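The pruning idea above (expand one whole level, then stop everything once any row on that level is the target) can be sketched outside SQL as well. The following is a minimal Python illustration, not the Postgres query itself; the names GRAPH and shortest_paths are made up for this sketch, and the adjacency lists are copied from the graph table in the question.

```python
# Sketch: breadth-first expansion one level at a time, stopping after the
# first level that contains the target (the role of the "done" flag).
GRAPH = {1: [2, 3], 2: [4], 3: [5, 6], 4: [3, 7], 5: [], 6: [], 7: []}


def shortest_paths(graph, start, target):
    """Return every shortest path from start to target, or [] if none."""
    level = [[start]]  # all paths discovered at the current depth
    while level:
        hits = [path for path in level if path[-1] == target]
        if hits:
            # The whole level has been checked: nothing is expanded further.
            return hits
        level = [
            path + [nxt]
            for path in level
            for nxt in graph[path[-1]]
            if nxt not in path  # avoid cycles
        ]
    return []


print(shortest_paths(GRAPH, 1, 3))  # [[1, 3]]
```

Because the check is made on the level as a whole, the sibling path through node 2 is dropped together with everything else as soon as node 3 shows up at depth 2.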

You can search from the bottom up, working backwards from 3 to 1, thus decreasing the search space:
with recursive cte(id, vals) as (
select g._id, array[3, g._id] from graph g where 3 = any(g.relates)
union all
select g._id, vals||g._id from cte c join graph g
on c.id = any(g.relates) and not g._id = any(c.vals)
)
select * from cte where id = 1
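To make the backward search concrete, here is a rough Python sketch of the same walk: start at the target, follow edges in reverse, and keep the paths that arrive at the start node. It is only an illustration of the idea, not a translation of the query; GRAPH and paths_backward are hypothetical names, and the data is the graph table from the question.

```python
GRAPH = {1: [2, 3], 2: [4], 3: [5, 6], 4: [3, 7], 5: [], 6: [], 7: []}


def paths_backward(graph, start, target):
    """Walk from target towards start over reversed edges."""
    # Invert the adjacency lists: for each node, which nodes point at it?
    parents = {node: [] for node in graph}
    for node, children in graph.items():
        for child in children:
            parents[child].append(node)

    found = []
    frontier = [[target]]
    while frontier:
        next_frontier = []
        for path in frontier:
            for parent in parents[path[-1]]:
                if parent in path:  # avoid cycles
                    continue
                new_path = path + [parent]
                if parent == start:
                    found.append(new_path[::-1])  # report in start-to-target order
                next_frontier.append(new_path)
        frontier = next_frontier
    return found


print(paths_backward(GRAPH, 1, 3))  # [[1, 3], [1, 2, 4, 3]]
```

Only nodes that can actually reach the target are visited, which is where the smaller search space comes from; the paths are produced shortest first because the walk proceeds level by level.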

Related

How to get all substring occurrences between some characters?

What I'm trying to get is the part of a column's text that sits between certain characters ($$ to be exact). The catch is that those delimiters can occur more than twice (always in pairs, as in $$xxx$$ ... $$yyy$$), and I need to get each enclosed part separately.
When I try this, it works fine as long as the pattern occurs only once:
regexp_substr(txt,'\$\$(.*)\$\$',1,1,null,1)
But let's say the column text is: $$xxx$$ ... $$yyy$$
Then it gives me: xxx$$ ... $$yyy
What I need is to get them on separate lines, like:
xxx
yyy
I couldn't get that to work, so how can it be done?
You could use a recursive query that matches the first occurrence and then removes that from the string for the next iteration of the recursive query.
Assuming your table and column are called tbl and txt:
with cte(match, txt) as (
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from tbl
where regexp_like(txt,'\$\$(.*?)\$\$')
union all
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from cte
where regexp_like(txt,'\$\$(.*?)\$\$')
)
select match from cte
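The "match the first occurrence, remove it, repeat" loop can be illustrated with a short Python sketch. This is not Oracle code; it only mirrors the logic of the recursive query using Python's re module, and extract_all is a hypothetical helper name. It also shows why the non-greedy (.*?) matters compared with the greedy (.*) from the question.

```python
import re

# Non-greedy group: stop at the first closing $$ instead of the last one.
PATTERN = re.compile(r"\$\$(.*?)\$\$")


def extract_all(text):
    """Collect each $$...$$ element by matching once and then removing it."""
    matches = []
    while True:
        found = PATTERN.search(text)
        if found is None:
            return matches
        matches.append(found.group(1))
        # Drop only the first occurrence, like regexp_replace(..., 1, 1).
        text = PATTERN.sub("", text, count=1)


print(extract_all("$$xxx$$ ... $$yyy$$"))  # ['xxx', 'yyy']

# The greedy pattern from the question swallows everything in between.
greedy = re.search(r"\$\$(.*)\$\$", "$$xxx$$ ... $$yyy$$").group(1)
print(greedy)  # xxx$$ ... $$yyy
```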
One could also use CONNECT BY to "loop" through the elements surrounded by the double dollar signs, returning the data inside (the 2nd grouping). This method handles NULL elements (ID 7, element 2) and since the dollar signs are consumed as the regex moves from left to right, characters in between the groups are not falsely matched.
SQL> with tbl(id, txt) as (
select 1, '$$xxx$$' from dual union all
select 2, '$$xxx$$ ... $$yyy$$' from dual union all
select 3, '' from dual union all
select 4, '$$xxx$$abc$$yyy$$' from dual union all
select 5, '$$xxx$$ ... $$yyy$$ ... $$www$$ ... $$zzz$$' from dual union all
select 6, '$$aaa$$$$bbb$$$$ccc$$$$ddd$$' from dual union all
select 7, '$$aaa$$$$$$$$ccc$$$$ddd$$' from dual
)
select id, level, regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level,null,2) element
from tbl
connect by regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level) is not null
and prior txt = txt
and prior sys_guid() is not null
order by id, level;
ID LEVEL ELEMENT
---------- ---------- -------------------------------------------
1 1 xxx
2 1 xxx
2 2 yyy
3 1
4 1 xxx
4 2 yyy
5 1 xxx
5 2 yyy
5 3 www
5 4 zzz
6 1 aaa
6 2 bbb
6 3 ccc
6 4 ddd
7 1 aaa
7 2
7 3 ccc
7 4 ddd
18 rows selected.
SQL>
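As a rough cross-check of the occurrence-by-occurrence approach, the Python sketch below numbers each match the way LEVEL does in the query above. It is an illustration only, with a hypothetical helper name; note two differences from the Oracle output: an empty element comes back as an empty string rather than NULL, and an empty input yields no rows at all instead of one row with a NULL element.

```python
import re


def elements_with_level(text):
    """Return (level, element) pairs, scanning left to right."""
    pattern = re.compile(r"\$\$(.*?)\$\$")
    return [
        (level, match.group(1))
        for level, match in enumerate(pattern.finditer(text), start=1)
    ]


print(elements_with_level("$$aaa$$$$$$$$ccc$$$$ddd$$"))
# [(1, 'aaa'), (2, ''), (3, 'ccc'), (4, 'ddd')]
```

Because each match consumes its own delimiters before the scan continues, text sitting between two elements (such as abc in $$xxx$$abc$$yyy$$) is never reported as an element.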

Bigquery SQL to Generate a mixed-level factorial designs array- Cartesian Product

Suppose I have Table1 that contains a variable number of tasks. Each task may have different options in Table2.
For this example,
Task id = 1 has two possible options
Task id = 2 has two possible options
Task id = 3 has only 1 possible option
Task id = 4 has only 1 possible option
How to build a bigquery SQL to obtain the cartesian product of all possible combinations?
The output would be as shown in the Output table below, where Run is the number of a possible combination between the Tasks options.
Table1
Taskid Name
1 A
2 B
3 C
4 D
Table2
Taskid Optionid Attribute1 Attribute2 Attribute3
1 1 5 7 9
1 2 2 4 6
2 1 4 6 8
2 2 2 4 8
3 1 1 4 9
4 1 4 7 10
Output Table
Run Taskid Name Optionid Attribute1 Attribute2 Attribute3
1 1 A 1 5 7 9
1 2 B 1 4 6 8
1 3 C 1 1 4 9
1 4 D 1 4 7 10
2 1 A 1 5 7 9
2 2 B 2 2 4 8
2 3 C 1 1 4 9
2 4 D 1 4 7 10
3 1 A 2 2 4 6
3 2 B 1 4 6 8
3 3 C 1 1 4 9
3 4 D 1 4 7 10
4 1 A 2 2 4 6
4 2 B 2 2 4 8
4 3 C 1 1 4 9
4 4 D 1 4 7 10
I have managed to create the combinations with an MS Access query using pivot; the MS Access queries that generate the combinations are included below. However, this is limited to 4 tasks. Is there a way to scale it to any number of tasks and do it all in BigQuery?
MS Access query #1 (CTQ):
TRANSFORM First(VariableT.[Optionid]) AS [FirstOfOption id]
SELECT Table2.[Optionid]
FROM VariableT
GROUP BY Table2.[Optionid]
PIVOT Table2.[Taskid];
MS Access query #2 (this generates the combinations):
SELECT CTQ.[1], CTQ_1.[2], CTQ_2.[3], CTQ_3.[4]
FROM CTQ, CTQ AS CTQ_1, CTQ AS CTQ_2, CTQ AS CTQ_3
GROUP BY CTQ.[1], CTQ_1.[2], CTQ_2.[3], CTQ_3.[4]
HAVING (((CTQ.[1]) Is Not Null) AND ((CTQ_1.[2]) Is Not Null)
AND ((CTQ_2.[3]) Is Not Null) AND ((CTQ_3.[4]) Is Not Null));
I really enjoyed this "exercise"
It ended up being relatively slim and simple, but with limitations because of the use of a JS UDF. For big sets it might have memory-related issues.
#standardSQL
CREATE TEMPORARY FUNCTION generateCombinations(taskOptions ARRAY<STRUCT<Taskid INT64, opts ARRAY<INT64>>>)
RETURNS STRING
LANGUAGE js AS """
var arr = [];
for (i = 0; i < taskOptions.length; i++) {
arr.push(taskOptions[i].opts);
}
return cartesianProduct(arr).join('|');
function cartesianProduct(arr) {
return arr.reduce((a, b) =>
a.map(x => b.map(y => x.concat(y)))
.reduce((a, b) => a.concat(b), []), [[]]);
}
""";
WITH combinations AS (
SELECT generateCombinations(ARRAY_AGG(STRUCT<Taskid INT64, opts ARRAY<INT64>>(TaskId, opts) ORDER BY Taskid)) arr
FROM (SELECT Taskid, ARRAY_AGG(Optionid) opts FROM `yourproject.youdataset.table2` GROUP BY Taskid)
), runs AS (
SELECT Run + 1 Run, combination
FROM combinations, UNNEST(SPLIT(arr, '|')) combination WITH OFFSET Run
)
SELECT Run, t1.Taskid, Name, Optionid, Attribute1, Attribute2, Attribute3
FROM runs, UNNEST(SPLIT(combination)) oid WITH OFFSET tid
JOIN `yourproject.youdataset.table2` t2 ON t2.Taskid = tid + 1 AND CAST(Optionid AS STRING) = oid
JOIN `yourproject.youdataset.table1` t1 ON t1.Taskid = tid + 1
-- ORDER BY Run, Taskid
You can test or play with the above using the dummy data from your question:
#standardSQL
CREATE TEMPORARY FUNCTION generateCombinations(taskOptions ARRAY<STRUCT<Taskid INT64, opts ARRAY<INT64>>>)
RETURNS STRING
LANGUAGE js AS """
var arr = [];
for (i = 0; i < taskOptions.length; i++) {
arr.push(taskOptions[i].opts);
}
return cartesianProduct(arr).join('|');
function cartesianProduct(arr) {
return arr.reduce((a, b) =>
a.map(x => b.map(y => x.concat(y)))
.reduce((a, b) => a.concat(b), []), [[]]);
}
""";
WITH `yourproject.youdataset.table1` AS (
SELECT 1 Taskid, 'A' Name UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'C' UNION ALL
SELECT 4, 'D'
), `yourproject.youdataset.table2` AS (
SELECT 1 Taskid, 1 Optionid, 5 Attribute1, 7 Attribute2, 9 Attribute3 UNION ALL
SELECT 1, 2, 2, 4, 6 UNION ALL
SELECT 2, 1, 4, 6, 8 UNION ALL
SELECT 2, 2, 2, 4, 8 UNION ALL
SELECT 3, 1, 1, 4, 9 UNION ALL
SELECT 4, 1, 4, 7, 10
), combinations AS (
SELECT generateCombinations(ARRAY_AGG(STRUCT<Taskid INT64, opts ARRAY<INT64>>(TaskId, opts) ORDER BY Taskid)) arr
FROM (SELECT Taskid, ARRAY_AGG(Optionid) opts FROM `yourproject.youdataset.table2` GROUP BY Taskid)
), runs AS (
SELECT Run + 1 Run, combination
FROM combinations, UNNEST(SPLIT(arr, '|')) combination WITH OFFSET Run
)
SELECT Run, t1.Taskid, Name, Optionid, Attribute1, Attribute2, Attribute3
FROM runs, UNNEST(SPLIT(combination)) oid WITH OFFSET tid
JOIN `yourproject.youdataset.table2` t2 ON t2.Taskid = tid + 1 AND CAST(Optionid AS STRING) = oid
JOIN `yourproject.youdataset.table1` t1 ON t1.Taskid = tid + 1
ORDER BY Run, Taskid
The output here is exactly the expected output from your question.
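For readers who want to check the combination logic independently of BigQuery, the same Cartesian product can be sketched in a few lines of Python with itertools.product. This is only an illustration of what the UDF computes; TABLE2 holds the (Taskid, Optionid) pairs from the question, and build_runs is a hypothetical helper name.

```python
from itertools import product

# (Taskid, Optionid) pairs taken from Table2 in the question.
TABLE2 = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (4, 1)]


def build_runs(task_options):
    """Return (Run, Taskid, Optionid) rows, one Run per option combination."""
    by_task = {}
    for task_id, option_id in task_options:
        by_task.setdefault(task_id, []).append(option_id)
    task_ids = sorted(by_task)

    rows = []
    combos = product(*(by_task[task_id] for task_id in task_ids))
    for run, combo in enumerate(combos, start=1):
        for task_id, option_id in zip(task_ids, combo):
            rows.append((run, task_id, option_id))
    return rows


for row in build_runs(TABLE2):
    print(row)
```

With the sample data this yields 4 runs of 4 rows each, in the same order as the Output Table in the question (the last task varies fastest).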
I have run your query with 19 tasks having multiple option levels for up to 5 options and the query returned "Error: Request timed out."
As I mentioned, the main limitation here is that the output has to be approximately 5MB or less.
Of course, with 19 tasks and up to 5 options for each, the output will be much larger than 5MB if you try to process them in one run. By the way, the most I was able to process with the above approach in one run was 7 tasks with 5 options each, or 10 tasks with 2 to 5 options each.
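To see how quickly the output grows, the number of combinations is simply the product of the option counts per task. The small Python calculation below is a back-of-the-envelope sketch (combination_count is a made-up name); the 19-task figure is an upper bound that assumes every task has the full 5 options.

```python
from math import prod


def combination_count(option_counts):
    """Number of runs = product of the number of options of each task."""
    return prod(option_counts)


print(combination_count([2, 2, 1, 1]))  # 4, the example from the question
print(combination_count([5] * 7))       # 78125
print(combination_count([5] * 19))      # 19073486328125 (upper bound)
```

Each run additionally produces one output row per task, so the final table has the combination count multiplied by the number of tasks.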
So the "workaround" for you would be: instead of processing all tasks in one run, split your tasks into, say, four groups of up to 5 tasks with up to 5 options each. Then run the above for each group with a separate destination table (yourproject.youdataset.run1, yourproject.youdataset.run2, yourproject.youdataset.run3, yourproject.youdataset.run4), thus materializing the virtual table "runs" from the initial answer once per group.
#standardSQL
CREATE TEMPORARY FUNCTION generateCombinations(taskOptions ARRAY<STRUCT<Taskid INT64, opts ARRAY<INT64>>>)
RETURNS STRING
LANGUAGE js AS """
var arr = [];
for (i = 0; i < taskOptions.length; i++) {
arr.push(taskOptions[i].opts);
}
return cartesianProduct(arr).join('|');
function cartesianProduct(arr) {
return arr.reduce((a, b) =>
a.map(x => b.map(y => x.concat(y)))
.reduce((a, b) => a.concat(b), []), [[]]);
}
""";
WITH combinations AS (
SELECT generateCombinations(ARRAY_AGG(STRUCT<Taskid INT64, opts ARRAY<INT64>>(TaskId, opts) ORDER BY Taskid)) arr
FROM (SELECT Taskid, ARRAY_AGG(Optionid) opts FROM `yourproject.youdataset.table2` GROUP BY Taskid)
), runs AS (
SELECT Run + 1 Run, combination
FROM combinations, UNNEST(SPLIT(arr, '|')) combination WITH OFFSET Run
)
SELECT *
FROM runs
And finally, you can cross join these four tables to get the final set of all combinations. This gives you the final materialized table for "runs": yourproject.youdataset.run. Once you have it, you can apply the rest of the query:
#standardSQL
WITH runs AS (
SELECT Run, combination
FROM `yourproject.youdataset.run`
)
SELECT Run, t1.Taskid, Name, Optionid, Attribute1, Attribute2, Attribute3
FROM runs, UNNEST(SPLIT(combination)) oid WITH OFFSET tid
JOIN `yourproject.youdataset.table2` t2 ON t2.Taskid = tid + 1 AND CAST(Optionid AS STRING) = oid
JOIN `yourproject.youdataset.table1` t1 ON t1.Taskid = tid + 1
Note: the above is an outline of the workaround. It is unlikely, but you might need to make some minor adjustments along the way.

Postgres Group and order by range (specific values)

I have a table like this:
id simcard simcard_order
80769 56407503370245588410 1
80788 66329183922439284822 2
80803 20993658565113174305 0
80804 81781641934100313243 4
80852 71560493627263868232 3
80784 23739383536995189713 1
80793 42702512646659519628 2
80805 17990699721985463506 0
80832 08525531276567944345 4
80854 74478849586042090832 3
80786 22535328208807554315 1
80812 34317440773382930807 0
80826 36103390459816949722 2
80858 15439885499080289130 3
80862 26786481240939036248 4
80792 59566921916027957512 1
80813 98968026512101636608 0
80835 65834894114116066528 2
80864 17764015687751814947 4
80882 41427844162545991837 3
80887 41587969946566907740 4
80891 46059625228552654737 3
80824 76381392106884963712 1
80863 77385361462191701926 2
80868 46607630719285200008 0
80892 08860583551940471945 4
80899 85443153649210377733 3
80934 90908807112484733323 2
80937 25660906025678471304 0
80967 34298088103509862330 3
The column simcard_order has repeating values from 0 to 4.
I want to order the table like this:
id simcard simcard_order
80769 56407503370245588410 0
80788 66329183922439284822 1
80803 20993658565113174305 2
80804 81781641934100313243 3
80852 71560493627263868232 4
80784 23739383536995189713 0
80793 42702512646659519628 1
80805 17990699721985463506 2
80832 08525531276567944345 3
80854 74478849586042090832 4
80786 22535328208807554315 0
80812 34317440773382930807 1
80826 36103390459816949722 2
80858 15439885499080289130 3
80862 26786481240939036248 4
....
and so on... So in this case I have 3 groups of (0, 1, 2, 3, 4)
The order must always be 0, 1, 2, 3, 4.
I have used this sql, but it does not work properly:
SELECT id, simcard, simcard_order
FROM tmp_pending_simcards
WHERE tmp_pending_simcards.simcard_order IN (0, 1, 2, 3, 4)
ORDER BY (0, 1, 2, 3, 4)
If I understand correctly:
SELECT id, simcard, simcard_order
FROM tmp_pending_simcards tps
WHERE tps.simcard_order IN (0, 1, 2, 3, 4)
ORDER BY ROW_NUMBER() OVER (PARTITION BY tps.simcard_order),
tps.simcard_order;
Usually, you would have an ORDER BY as part of ROW_NUMBER(), but Postgres does not require it.
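The ordering trick can be checked with a small Python sketch: number the rows within each simcard_order value, then sort by that number first and by simcard_order second. This is an illustration rather than the Postgres query; interleave is a hypothetical name, and numbering the rows by id is an assumption made here, since the query leaves the numbering order unspecified. ROWS holds the first ten (id, simcard_order) pairs from the question.

```python
from collections import defaultdict

ROWS = [
    (80769, 1), (80788, 2), (80803, 0), (80804, 4), (80852, 3),
    (80784, 1), (80793, 2), (80805, 0), (80832, 4), (80854, 3),
]


def interleave(rows):
    """Order rows so simcard_order cycles 0, 1, 2, 3, 4, 0, 1, ..."""
    seen = defaultdict(int)
    keyed = []
    for row_id, order in sorted(rows):  # assumption: number rows by id
        seen[order] += 1  # ROW_NUMBER() within this simcard_order value
        keyed.append((seen[order], order, row_id))
    # Sort by row number first, then by simcard_order.
    return [(row_id, order) for _, order, row_id in sorted(keyed)]


print(interleave(ROWS))
```

The first pass through the values uses row number 1 of every group, the second pass uses row number 2, and so on, which produces the repeating 0 to 4 sequence.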

Getting a count from multiple columns with set values in SQL and MS Access

I have a table of information with multiple columns, but each column can hold only 1 of 3 values (0, 1, 2). It is generated from user-entered choices of Yes, No, Maybe. I want to count the occurrences of each value in each column and get those counts back.
EG Table:
ID Coffee Tea Water Hot_Choc
1 1 0 2 2
2 0 2 0 1
3 1 2 0 2
4 2 0 1 1
5 1 1 2 2
6 0 1 2 1
7 2 1 0 1
8 1 2 1 2
I'd like to query the data to an output like this:
Coffee Tea Water Hot_Choc
0 2 2 3 0
1 4 3 2 4
2 2 3 3 4
I tried running a basic count script:
SELECT coffee, count(*) FROM Drinks GROUP BY coffee
Which works fine.
and then tried evolving into:
SELECT coffee,tea,water,hot_choc, count(*) from Drinks group by coffee,tea,water,hot_choc
But with that I get a count for each distinct combination of values, e.g. when coffee is 0, what the counts for tea, water and hot_choc are.
I also tried a sum(iif(...)) clause, because I'm running this in Access:
select sum(iif(coffee=0,1,0)) as coffee_maybe,
sum(iif(coffee=1,1,0)) as coffee_no,
sum(iif(coffee=2,1,0)) as coffee_yes from drinks;
etc.
Which gets me this:
Coffee_maybe coffee_no coffee_yes tea_maybe tea_no tea_yes etc...
2 4 2 2 3 2
So I'm wondering if anyone has other thoughts as to how to get the count in the above format.
Thanks so much for the help, sorry it's been a long read I wanted to give as much context as possible.
I think the following will work in Access:
select 0, sum(iif(coffee = 0, 1, 0)) as coffee, sum(iif(tea = 0, 1, 0)) as tea,
sum(iif(water = 0, 1, 0)) as water, sum(iif(hot_choc = 0, 1, 0)) as hot_choc
from drinks d
union all
select 1, sum(iif(coffee = 1, 1, 0)) as coffee, sum(iif(tea = 1, 1, 0)) as tea,
sum(iif(water = 1, 1, 0)) as water, sum(iif(hot_choc = 1, 1, 0)) as hot_choc
from drinks d
union all
select 2, sum(iif(coffee = 2, 1, 0)) as coffee, sum(iif(tea = 2, 1, 0)) as tea,
sum(iif(water = 2, 1, 0)) as water, sum(iif(hot_choc = 2, 1, 0)) as hot_choc
from drinks d;
There are slightly easier ways to do this in other databases.
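As a quick way to verify the expected numbers, the same "one row per value, one count per column" layout can be computed in Python. This is only a sketch for checking the logic, not an Access query; DRINKS is the sample table from the question and count_by_value is a made-up name.

```python
from collections import Counter

COLUMNS = ["Coffee", "Tea", "Water", "Hot_Choc"]
DRINKS = [  # (ID, Coffee, Tea, Water, Hot_Choc)
    (1, 1, 0, 2, 2), (2, 0, 2, 0, 1), (3, 1, 2, 0, 2), (4, 2, 0, 1, 1),
    (5, 1, 1, 2, 2), (6, 0, 1, 2, 1), (7, 2, 1, 0, 1), (8, 1, 2, 1, 2),
]


def count_by_value(rows, values=(0, 1, 2)):
    """Map each value to its per-column counts, in COLUMNS order."""
    # One Counter per drink column (index 0 of each row is the ID).
    counters = [Counter(row[i + 1] for row in rows) for i in range(len(COLUMNS))]
    return {value: [counter[value] for counter in counters] for value in values}


for value, counts in count_by_value(DRINKS).items():
    print(value, counts)
# 0 [2, 2, 3, 0]
# 1 [4, 3, 2, 4]
# 2 [2, 3, 3, 4]
```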

Counting number of children in hierarchical SQL data

For a simple data structure like this:
ID parentID Text Price
1 Root
2 1 Flowers
3 1 Electro
4 2 Rose 10
5 2 Violet 5
6 4 Red Rose 12
7 3 Television 100
8 3 Radio 70
9 8 Webradio 90
For reference, the hierarchy tree looks like this:
ID Text Price
1 Root
|2 Flowers
|-4 Rose 10
| |-6 Red Rose 12
|-5 Violet 5
|3 Electro
|-7 Television 100
|-8 Radio 70
|-9 Webradio 90
I'd like to count the number of children (all descendants) of each node. So I would get a new column "NoOfChildren" like so:
ID parentID Text Price NoOfChildren
1 Root 8
2 1 Flowers 3
3 1 Electro 3
4 2 Rose 10 1
5 2 Violet 5 0
6 4 Red Rose 12 0
7 3 Television 100 0
8 3 Radio 70 1
9 8 Webradio 90 0
I read a few things about hierarchical data, but I somehow get stuck on the multiple inner joins on the parentIDs. Maybe someone could help me out here.
Using a CTE would get you what you want.
Recursively go through all children, remembering the root.
COUNT the items for each root.
JOIN these again with your original table to produce the results.
Test Data
DECLARE @Data TABLE (
ID INTEGER PRIMARY KEY
, ParentID INTEGER
, Text VARCHAR(32)
, Price INTEGER
)
INSERT INTO @Data
SELECT 1, Null, 'Root', NULL
UNION ALL SELECT 2, 1, 'Flowers', NULL
UNION ALL SELECT 3, 1, 'Electro', NULL
UNION ALL SELECT 4, 2, 'Rose', 10
UNION ALL SELECT 5, 2, 'Violet', 5
UNION ALL SELECT 6, 4, 'Red Rose', 12
UNION ALL SELECT 7, 3, 'Television', 100
UNION ALL SELECT 8, 3, 'Radio', 70
UNION ALL SELECT 9, 8, 'Webradio', 90
SQL Statement
;WITH ChildrenCTE AS (
SELECT RootID = ID, ID
FROM @Data
UNION ALL
SELECT cte.RootID, d.ID
FROM ChildrenCTE cte
INNER JOIN @Data d ON d.ParentID = cte.ID
)
SELECT d.ID, d.ParentID, d.Text, d.Price, cnt.Children
FROM @Data d
INNER JOIN (
SELECT ID = RootID, Children = COUNT(*) - 1
FROM ChildrenCTE
GROUP BY RootID
) cnt ON cnt.ID = d.ID
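The three steps (pair every node with itself as a root, keep extending each pair to the children, then count per root and subtract one) can be traced in a short Python sketch. It follows the shape of the CTE but is only an illustration; ROWS holds the (ID, parentID) pairs from the question and descendant_counts is a hypothetical name.

```python
from collections import Counter

ROWS = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4), (7, 3), (8, 3), (9, 8)]


def descendant_counts(rows):
    """Return {ID: number of descendants} for (ID, parentID) rows."""
    # Anchor member: every node starts as its own root.
    pairs = [(node, node) for node, _ in rows]
    frontier = pairs
    while frontier:
        # Recursive member: step from each reached node to its children,
        # remembering the root the walk started from.
        frontier = [
            (root, child)
            for root, node in frontier
            for child, parent in rows
            if parent == node
        ]
        pairs = pairs + frontier
    counts = Counter(root for root, _ in pairs)
    # Each root was also paired with itself, hence the "- 1".
    return {root: total - 1 for root, total in counts.items()}


print(descendant_counts(ROWS))
# {1: 8, 2: 3, 3: 3, 4: 1, 5: 0, 6: 0, 7: 0, 8: 1, 9: 0}
```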
Consider using a modified preorder tree traversal way of storing the hierarchical data. See http://www.sitepoint.com/hierarchical-data-database/
Determining number of children for any node then becomes a simple:
SELECT (right-left-1) / 2 AS num_children FROM ...
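To illustrate where that formula comes from, the Python sketch below assigns left and right numbers with a depth-first walk and then applies (right - left - 1) / 2. The numbering scheme is the usual nested-set convention and is an assumption here, as are the helper names; ROWS is the (ID, parentID) data from the question.

```python
ROWS = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4), (7, 3), (8, 3), (9, 8)]


def nested_set_bounds(rows):
    """Assign (left, right) numbers to each node via a preorder walk."""
    children = {node: [] for node, _ in rows}
    for node, parent in rows:
        if parent is not None:
            children[parent].append(node)

    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter  # number handed out on the way down
        for child in children[node]:
            visit(child)
        counter += 1
        bounds[node] = (left, counter)  # right number on the way back up

    for node, parent in rows:
        if parent is None:
            visit(node)
    return bounds


def num_descendants(left, right):
    # Every descendant consumes exactly two numbers between left and right.
    return (right - left - 1) // 2


BOUNDS = nested_set_bounds(ROWS)
print({node: num_descendants(*BOUNDS[node]) for node in sorted(BOUNDS)})
# {1: 8, 2: 3, 3: 3, 4: 1, 5: 0, 6: 0, 7: 0, 8: 1, 9: 0}
```

The trade-off of this storage scheme is that the left and right values have to be renumbered whenever nodes are inserted or moved.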