SQL - Summing at every level by conditions in hierarchy

I had a question I am trying to solve in regards to summing from the lowest child in a hierarchy to the highest level provided (noted by an ID). Essentially, I am attempting to input an ID of a large container into a recursive function that calculates the values of the lower nodes and then traces them back up for that particular ID.
I was working on the hierarchy in a recursive CTE and was able to get to the point shown below:
ALTER FUNCTION [dbo].[tree_from_id](@myid INT) RETURNS TABLE AS
RETURN (
WITH tree_cte (id, ID_PARENT, VOLUME, nlevel, treepath)
as
(SELECT i.id, i.id_parent, i.VOLUME, nlevel = 1, cast(i.id as nvarchar(255)) as treepath
FROM item i
WHERE id = @myid
UNION ALL
SELECT i.ID, i.ID_PARENT, i.VOLUME, tc.nlevel + 1, cast(tc.treepath+' <- ' + cast(i.id as nvarchar(255))as nvarchar(255)) as treepath
FROM tree_cte tc
INNER JOIN
item i ON i.id_parent = tc.id
)
SELECT * FROM tree_cte
)
GO
This allowed me to get the level of each item and display their hierarchy, done by calling the function like:
SELECT * FROM tree_from_id(215548)
where 215548 is the particular item's ID.
I was looking to implement something similar for the item volumes, but I am unsure how to go about aggregating from the lower child and rolling up to their parents. I tried the same method as shown above, but, understandably, it started from the parent and added downward to the children.
Also, is there a way to add cases that decide whether to include an item's volume, or to use a different value for its volume? For example, if the item has no children, take different dimensions for the volume?
Thanks! I am trying to learn more about SQL and found this to be a particularly difficult problem for me.

I'm not sure if it's volume that you want to accumulate, but can't you just sum it as you go?
Partial snippet....
UNION ALL
SELECT i.ID, i.ID_PARENT, i.VOLUME+tc.VOLUME, tc.nlevel + 1, cast(tc.treepath+' <- ' + cast(i.id as nvarchar(255))as nvarchar(255)) as treepath
FROM tree_cte tc
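To get a total at every level rather than a running sum down each path, one option is to pair every node with all of its descendants and then group by the ancestor. The sketch below is only an illustration, run through Python's sqlite3 so that it is self-contained. The leaf_volume column, the sample rows, and the rule "items without children use leaf_volume" are assumptions invented for the example; they are not part of the original schema. The SQL should port to SQL Server with minor changes (drop the RECURSIVE keyword and pass @myid instead of :root).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- leaf_volume is a hypothetical column standing in for "different dimensions"
CREATE TABLE item (id INTEGER PRIMARY KEY, id_parent INTEGER,
                   volume REAL, leaf_volume REAL);
INSERT INTO item VALUES
  (1, NULL, 10, 0),
  (2, 1,     5, 0),
  (3, 1,     2, 7),
  (4, 2,     1, 3);
""")

ROLLUP_SQL = """
WITH RECURSIVE
-- every node at or below the requested root
subtree(id) AS (
    SELECT id FROM item WHERE id = :root
    UNION ALL
    SELECT i.id FROM item i JOIN subtree s ON i.id_parent = s.id
),
-- (ancestor, descendant) pairs, each node paired with itself first
pairs(ancestor, descendant) AS (
    SELECT id, id FROM subtree
    UNION ALL
    SELECT p.ancestor, i.id
    FROM item i JOIN pairs p ON i.id_parent = p.descendant
),
-- the volume each node contributes: leaves use leaf_volume (assumed rule)
effective(id, vol) AS (
    SELECT i.id,
           CASE WHEN EXISTS (SELECT 1 FROM item c WHERE c.id_parent = i.id)
                THEN i.volume ELSE i.leaf_volume END
    FROM item i
)
SELECT p.ancestor, SUM(e.vol) AS total_volume
FROM pairs p JOIN effective e ON e.id = p.descendant
GROUP BY p.ancestor
ORDER BY p.ancestor
"""

totals = conn.execute(ROLLUP_SQL, {"root": 1}).fetchall()
for item_id, total in totals:
    print(item_id, total)
```

Each row of the result is a node in the subtree together with the sum of its own contribution and that of everything beneath it, so the root row carries the grand total.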

Related

Recursive query within recursive query

I would like to solve a problem consisting of 2 recursions.
In one of the 2 recursions I find out the answer to one question which is "What is the leaf member of a specific input (template)?" This is already solved.
In a second recursion I would like to run this query for a number of other inputs (templates).
1st part of the problem:
I have a tree and would like to find the leaf of it. This part of the recursion can be solved using this query:
with recursive full_tree as (
select id, "previousVersionId", 1 as level
from template
where
template."id" = '5084520a-bb07-49e8-b111-3ea8182dc99f'
union all
select c.id, c."previousVersionId", p.level + 1
from template c
inner join full_tree p on c."previousVersionId" = p.id
)
select * from full_tree
order by level desc
limit 1
The query output is one record including the leaf id I'm interested in. This is fine.
2nd part of the query:
Here's the problem. I would like to run the first query n times.
Currently I can run the query only for a single id ('5084520a-bb07-49e8-b111-3ea8182dc99f' in the example). But what if I have a list of 100 such ids?
My ultimate goal is to get one id response (the leaf id) to each of the 100 template ids in the list.
In theory, a query that allows me to run above query for each of my e.g. 100 template ids would solve my problem.
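One possible approach, sketched here rather than offered as a definitive answer, is to seed the recursion with the whole list of ids at once and carry the starting id along as an extra column, so that the deepest row can be picked per starting id afterwards. The example runs against SQLite through Python's sqlite3 so it is self-contained; the short ids ('t1', 'u1', ...) and the root_id column name are made up for illustration. The SQL uses only standard features and should also run on PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE template (id TEXT PRIMARY KEY, "previousVersionId" TEXT);
-- t1 -> t2 -> t3 is one version chain; u1 has no newer version
INSERT INTO template VALUES ('t1', NULL), ('t2', 't1'), ('t3', 't2'), ('u1', NULL);
""")

start_ids = ["t1", "u1"]                    # hypothetical input list
marks = ",".join("?" * len(start_ids))      # one placeholder per id

leaves = conn.execute(f"""
WITH RECURSIVE full_tree(root_id, id, level) AS (
    -- anchor: every requested template is the root of its own walk
    SELECT id, id, 1 FROM template WHERE id IN ({marks})
    UNION ALL
    -- step: follow the chain, remembering which root the walk started from
    SELECT p.root_id, c.id, p.level + 1
    FROM template c
    JOIN full_tree p ON c."previousVersionId" = p.id
)
SELECT ft.root_id, ft.id
FROM full_tree ft
WHERE ft.level = (SELECT MAX(x.level) FROM full_tree x WHERE x.root_id = ft.root_id)
ORDER BY ft.root_id
""", start_ids).fetchall()

print(leaves)
```

If a template can have more than one successor, several rows may tie at the deepest level for the same root_id, and this sketch returns all of them.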

Get child count in binary tree in Sql server

I have a legacy system based on the following structure:
there is a userid, name, parent, and side. Here side is tinyint (0 = left , 1 = right) which means the side on which current user is located.
Now I just wanted to have a count of children of a specific node.
I have a recursive solution, but it stalls as the amount of data grows.
WITH MyCTE AS (
SELECT node.username, node.referenceID
FROM Members as node
WHERE node.referenceID = 'humansuceess10'
UNION ALL SELECT parent.username, parent.referenceID
FROM Members as parent , MyCTE as x
WHERE x.username= parent.referenceID
) SELECT COUNT(username) FROM MyCTE option (maxrecursion 0) ;
I tested this query on 15,000 records; unfortunately it got stuck and ended in a timeout error.
I'm looking for a non-recursive solution.
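A non-recursive alternative can be sketched as a level-by-level loop driven from the client: fetch the children of the current frontier, count them, and repeat until a level comes back empty. The example below is a minimal illustration using Python's sqlite3 with invented sample rows; the index name and the count_descendants helper are hypothetical. It assumes the data contains no cycles (otherwise the loop never ends), and a very wide level could exceed the database's limit on bound parameters, in which case the frontier would have to be batched or kept in a temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Members (username TEXT PRIMARY KEY, referenceID TEXT);
-- an index on the parent column keeps each level lookup cheap
CREATE INDEX ix_members_reference ON Members (referenceID);
INSERT INTO Members VALUES
  ('root', NULL), ('l1', 'root'), ('r1', 'root'),
  ('l2', 'l1'),   ('r2', 'l1'),   ('l3', 'r2');
""")

def count_descendants(conn, root):
    """Count all members below `root`, one tree level per query."""
    frontier = [root]
    total = 0
    while frontier:
        marks = ",".join("?" * len(frontier))
        # children of every node in the current level
        frontier = [row[0] for row in conn.execute(
            f"SELECT username FROM Members WHERE referenceID IN ({marks})",
            frontier)]
        total += len(frontier)
    return total

print(count_descendants(conn, "root"))
```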

Retrieve hierarchical groups ... with infinite recursion

I've a table like this which contains links :
key_a key_b
--------------
a b
b c
g h
a g
c a
f g
not really tidy & infinite recursion ...
key_a = parent
key_b = child
Require a query which will recompose and attribute a number for each hierarchical group (parent + direct children + indirect children) :
key_a key_b nb_group
--------------------------
a b 1
a g 1
b c 1
**c a** 1
f g 2
g h 2
**link responsible for infinite loop**
Because we have
A-B-C-A
-> Only want to show simply the link as shown.
Any idea ?
Thanks in advance
The problem is that you aren't really dealing with strict hierarchies; you're dealing with directed graphs, where some graphs have cycles. Notice that your nbgroup #1 doesn't have any canonical root-- it could be a, b, or c due to the cyclic reference from c-a.
The basic way of dealing with this is to think in terms of graph techniques, not recursion. In fact, an iterative approach (not using a CTE) is the only solution I can think of in SQL. The basic approach is explained here.
Here is a SQL Fiddle with a solution that addresses both the cycles and the shared-leaf case. Notice it uses iteration (with a failsafe to prevent runaway processes) and table variables to operate; I don't think there's any getting around this. Note also the changed sample data (a-g changed to a-h; explained below).
If you dig into the SQL you'll notice that I changed some key things from the solution given in the link. That solution was dealing with undirected edges, whereas your edges are directed (if you used undirected edges the entire sample set is a single component because of the a-g connection).
This gets to the heart of why I changed a-g to a-h in my sample data. Your specification of the problem is straightforward if only leaf nodes are shared; that's the specification I coded to. In this case, a-h and g-h can both get bundled off to their proper components with no problem, because we're concerned about reachability from parents (even given cycles).
However, when you have shared branches, it's not clear what you want to show. Consider the a-g link: given this, g-h could exist in either component (a-g-h or f-g-h). You put it in the second, but it could have been in the first instead, right? This ambiguity is why I didn't try to address it in this solution.
Edit: To be clear, in my solution above, if shared branches ARE encountered, it treats the whole set as a single component. That is not what you described above, but it will have to be changed after the problem is clarified. Hopefully this gets you close.
You should use a recursive query. In the first part we select all records which are top level nodes (have no parents) and using ROW_NUMBER() assign them group ID numbers. Then in the recursive part we add to them children one by one and use parent's groups Id numbers.
with CTE as
(
select t1.parent,t1.child,
ROW_NUMBER() over (order by t1.parent) rn
from t t1 where
not exists (select 1 from t where child=t1.parent)
union all
select t.parent,t.child, CTE.rn
from t
join CTE on t.parent=CTE.Child
)
select * from CTE
order by RN,parent
SQLFiddle demo
Painful problem of graph walking using recursive CTEs. This is the problem of finding connected subgraphs in a graph. The challenge with using recursive CTEs is to prevent unwarranted recursion, that is, infinite loops. In SQL Server, that typically means storing the visited nodes in a string.
The idea is to get a list of all pairs of nodes that are connected (and a node is connected with itself). Then, take the minimum from the list of connected nodes and use this as an id for the connected subgraph.
The other idea is to walk the graph in both directions from a node. This ensures that all possible nodes are visited. The following is a query that accomplishes this:
with fullt as (
select keyA, keyB
from t
union
select keyB, keyA
from t
),
CTE as (
select t.keyA, t.keyB, t.keyB as last, 1 as level,
','+cast(keyA as varchar(max))+','+cast(keyB as varchar(max))+',' as path
from fullt t
union all
select cte.keyA, cte.keyB,
(case when t.keyA = cte.last then t.keyB else t.keyA
end) as last,
1 + level,
cte.path+t.keyB+','
from fullt t join
CTE
on t.keyA = CTE.last or
t.keyB = cte.keyA
where cte.path not like '%,'+t.keyB+',%'
) -- select * from cte where 'g' in (keyA, keyB)
select t.keyA, t.keyB,
dense_rank() over (order by min(cte.Last)) as grp,
min(cte.Last)
from t join
CTE
on (t.keyA = CTE.keyA and t.keyB = cte.keyB) or
(t.keyA = CTE.keyB and t.keyB = cte.keyA)
where cte.path like '%,'+t.keyA+',%' or
cte.path like '%,'+t.keyB+',%'
group by t.id, t.keyA, t.keyB
order by t.id;
The SQLFiddle is here.
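The "store the visited nodes in a string" guard can be seen in isolation in a much smaller query. The sketch below is not the query above; it only walks the directed links from one starting node ('a') and refuses to step onto a node that already appears in the path. It uses the sample links from the question and runs on SQLite through Python's sqlite3 purely so that it is self-contained; in SQL Server the || operator would be + and instr would be replaced by a LIKE test as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (key_a TEXT, key_b TEXT);
INSERT INTO t VALUES ('a','b'), ('b','c'), ('g','h'), ('a','g'), ('c','a'), ('f','g');
""")

rows = conn.execute("""
WITH RECURSIVE walk(node, path) AS (
    -- the path is a comma-delimited list of the nodes visited so far
    SELECT 'a', ',a,'
    UNION ALL
    SELECT t.key_b, walk.path || t.key_b || ','
    FROM t JOIN walk ON t.key_a = walk.node
    -- cycle guard: skip a link whose target is already on the path
    WHERE instr(walk.path, ',' || t.key_b || ',') = 0
)
SELECT DISTINCT node FROM walk ORDER BY node
""").fetchall()

reachable = [r[0] for r in rows]
print(reachable)
```

Without the WHERE clause the c-a link would send the walk back to 'a' and the recursion would never terminate.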
you might want to check with COMMON TABLE EXPRESSIONS
here's the link

SQL Server 2008: Recursive query where hierarchy isn't strict

I'm dealing with a large multi-national corp. I have a table (oldtir) that shows ownership of subsidiaries. The fields for this problem are:
cID - PK for this table
dpm_sub - FK for the subsidiary company
dpm_pco - FK for the parent company
year - the year in which this is the relationship (because they change over time)
There are other fields, but not relevant to this problem. (Note that there are no records to specifically indicate the top-level companies, so we have to figure out which they are by having them not appear as subsidiaries.)
I've written the query below:
with CompanyHierarchy([year], dpm_pco, dpm_sub, cID)
as (select distinct oldtir.[year], cast(' ' as nvarchar(5)) as dpm_pco, oldtir.dpm_pco as dpm_sub, cast(0 as float) as cID
from oldtir
where oldtir.dpm_pco not in
(select dpm_sub from oldtir oldtir2
where oldtir.[year] = oldtir2.[year]
and oldtir2.dpm_sub <> oldtir2.dpm_pco)
and oldtir.[year] = 2011
union all
select oldtir.[year], oldtir.dpm_pco, oldtir.dpm_sub, oldtir.cID
from oldtir
join CompanyHierarchy
on CompanyHierarchy.dpm_sub = oldtir.dpm_pco
and CompanyHierarchy.[year] = oldtir.[year]
where oldtir.[year] = 2011
)
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
It fails with msg 530: "The maximum recursion 100 has been exhausted before statement completion."
I believe the problem is that the relationships in the table aren't strictly hierarchical. Specifically, one subsidiary can be owned by more than one company, and you can even have the situation where A owns B and part of C, and B also owns part of C. (One of the other fields indicates percent of ownership.)
For the time being, I've solved the problem by adding a field to track level, and arbitrarily stopping after a few levels. But this feels kludgy to me, since I can't be sure of the maximum number of levels.
Any ideas how to do this generically?
Thanks,
Tamar
Thanks to the commenters. They made me go back and look more closely at the data. There were, in fact, errors in the data, which led to infinite recursion. Fixed the data and the query worked just fine.
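For anyone hunting for the same kind of data error, a depth-capped walk can list the companies that end up owning themselves through a chain of links. This is only a sketch under assumptions: the sample rows are invented, the year column is left out for brevity, MAX_DEPTH is an arbitrary cap, and the example runs on SQLite through Python's sqlite3 so that it is self-contained. Rows where a company is recorded as its own parent would also be reported, so filter them out first if they are legitimate in your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oldtir (dpm_pco TEXT, dpm_sub TEXT);
-- A -> B -> C -> A is a deliberate cycle; D -> E is clean
INSERT INTO oldtir VALUES ('A','B'), ('B','C'), ('C','A'), ('D','E');
""")

MAX_DEPTH = 10  # arbitrary safety cap for this sketch

cyclic = [r[0] for r in conn.execute("""
WITH RECURSIVE walk(origin, node, depth) AS (
    SELECT dpm_pco, dpm_sub, 1 FROM oldtir
    UNION ALL
    SELECT w.origin, o.dpm_sub, w.depth + 1
    FROM oldtir o JOIN walk w ON o.dpm_pco = w.node
    -- stop once the walk is back at its origin or the cap is reached
    WHERE w.node <> w.origin AND w.depth < :max_depth
)
SELECT DISTINCT origin FROM walk WHERE node = origin ORDER BY origin
""", {"max_depth": MAX_DEPTH})]

print(cyclic)
```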
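Add the OPTION clause and see if it makes a difference. MAXRECURSION 0 removes the recursion limit entirely (the default is 100, and the largest explicit value is 32,767), so if the data really is cyclic the query will simply never finish.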
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
option (maxrecursion 0)

How do I remember which way round PRIOR should go in CONNECT BY queries

I've a terrible memory. Whenever I do a CONNECT BY query in Oracle - and I do mean every time - I have to think hard and usually through trial and error work out on which argument the PRIOR should go.
I don't know why I don't remember - but I don't.
Does anyone have a handy mnemonic so I always remember?
For example:
To go down a tree from a node - obviously I had to look this up :) - you do something like:
select
*
from
node
connect by
prior node_id = parent_node_id
start with
node_id = 1
So - I start with a node_id of 1 (the top of the branch) and the query looks for all nodes where the parent_node_id = 1 and then iterates down to the bottom of the tree.
To go up the tree the prior goes on the parent:
select
*
from
node
connect by
node_id = prior parent_node_id
start with
node_id = 10
So starting somewhere down a branch (node_id = 10 in this case), Oracle first gets the node whose node_id equals the parent_node_id of the row with node_id = 10, and then repeats that step up the tree.
EDIT: I still get this wrong so thought I'd add a clarifying edit to expand on the accepted answer - here's how I remember it now:
select
*
from
node
connect by
prior node_id = parent_node_id
start with
node_id = 1
The 'english language' version of this SQL I now read as...
In NODE, starting with the row in
which node_id = 1, the next row
selected has its parent_node_id
equal to node_id from the previous
(prior) row.
EDIT: Quassnoi makes a great point - the order you write the SQL makes things a lot easier.
select
*
from
node
start with
node_id = 1
connect by
parent_node_id = prior node_id
This feels a lot clearer to me - the "start with" gives the first row selected and the "connect by" gives the next row(s) - in this case the children of node_id = 1.
I always try to put the expressions in JOIN's in the following order:
joined.column = leading.column
This query:
SELECT t.value, d.name
FROM transactions t
JOIN
dimensions d
ON d.id = t.dimension
can be treated either like "for each transaction, find the corresponding dimension name", or "for each dimension, find all corresponding transaction values".
So, if I search for a given transaction, I put the expressions in the following order:
SELECT t.value, d.name
FROM transactions t
JOIN
dimensions d
ON d.id = t.dimension
WHERE t.id = :myid
, and if I search for a dimension, then:
SELECT t.value, d.name
FROM dimensions d
JOIN
transactions t
ON t.dimension = d.id
WHERE d.id = :otherid
The former query will most probably use index scans first on (t.id), then on (d.id), while the latter will use index scans first on (d.id), then on (t.dimension), and you can easily see it in the query itself: the searched fields are on the left.
The driving and driven tables may be not so obvious in a JOIN, but it's as clear as a bell for a CONNECT BY query: the PRIOR row is driving, the non-PRIOR is driven.
That's why this query:
SELECT *
FROM hierarchy
START WITH
id = :root
CONNECT BY
parent = PRIOR id
means "find all rows whose parent is a given id". This query builds a hierarchy.
This can be treated like this:
connect_by(row) {
add_to_rowset(row);
/* parent = PRIOR id */
/* PRIOR id is an rvalue */
index_on_parent.searchKey = row->id;
foreach child_row in index_on_parent.search {
connect_by(child_row);
}
}
And this query:
SELECT *
FROM hierarchy
START WITH
id = :leaf
CONNECT BY
id = PRIOR parent
means "find the rows whose id is a given parent". This query builds an ancestry chain.
Always put PRIOR in the right part of the expression.
Think of the PRIOR column as a constant that all your rows will be searched against.
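The pseudocode above can be turned into a small runnable model to make the "PRIOR row drives, the other row is driven" reading concrete. This is a toy simulation in Python, not how Oracle is implemented; the sample rows and the matches callback are invented for illustration.

```python
# Toy hierarchy: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
rows = [
    {"id": 1, "parent": None},
    {"id": 2, "parent": 1},
    {"id": 3, "parent": 1},
    {"id": 4, "parent": 2},
]
by_id = {r["id"]: r for r in rows}

def connect_by(prior, matches):
    """Emit `prior`, then recurse into every row for which matches(prior, row) holds."""
    result = [prior["id"]]
    for row in rows:
        if matches(prior, row):
            result.extend(connect_by(row, matches))
    return result

# START WITH id = 1 CONNECT BY parent = PRIOR id  (walks down: builds the hierarchy)
descendants = connect_by(by_id[1], lambda prior, row: row["parent"] == prior["id"])

# START WITH id = 4 CONNECT BY id = PRIOR parent  (walks up: builds the ancestry chain)
ancestors = connect_by(by_id[4], lambda prior, row: row["id"] == prior["parent"])

print(descendants)
print(ancestors)
```

In both calls the value taken from the prior row is fixed before the search starts, and the other side of the comparison is what gets looked up in the table.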
Think about the order in which the records are going to be selected: the link-back column on each record must match the link-forward column on the PRIOR record selected.
Part of the reason that this is difficult to visualise each time is that sometimes the data is modelled as id + child_id, and sometimes it is modelled as id + parent_id. Depending on which way round your data is modelled, you have to place your PRIOR keyword on the opposite side.
The trick I find, is always to look at the world through the eyes of the leaf-node of the tree you're trying to build.
So (speaking as the leaf node) when I look up, I see my parent node, which I may also refer to as the PRIOR node. Ergo, my ID is the same as his CHILD_ID. So in this case PRIOR child_id = id.
When the data is modelled the other way around (i.e. when I hold my parent's ID rather than him holding mine), my parent_id is the same value as his id. Ergo in this scenario PRIOR id = parent_id.