Get child count in binary tree in Sql server - sql

I have a legacy system based on the following structure: each row has a userid, name, parent, and side. Here side is a tinyint (0 = left, 1 = right) indicating which side of its parent the current user is on.
Now I just want a count of the children of a specific node.
I have a recursive solution, but it stalls as the data grows.
WITH MyCTE AS (
    SELECT node.username, node.referenceID
    FROM Members AS node
    WHERE node.referenceID = 'humansuceess10'
    UNION ALL
    SELECT parent.username, parent.referenceID
    FROM Members AS parent
    INNER JOIN MyCTE AS x ON x.username = parent.referenceID
)
SELECT COUNT(username) FROM MyCTE OPTION (MAXRECURSION 0);
I tested this query on 15,000 records; unfortunately it got stuck and ended with a timeout error.
I'm looking for a non-recursive solution.
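One non-recursive option is to read the (username, referenceID) pairs once and walk the tree iteratively with an explicit queue in application code. The following is only a minimal sketch, assuming the rows fit in memory; the sample rows and the count_descendants helper are hypothetical and not part of the original schema:

```python
from collections import defaultdict, deque

def count_descendants(rows, root):
    """Count every node below `root`, given (username, referenceID) pairs."""
    children = defaultdict(list)
    for username, reference_id in rows:
        children[reference_id].append(username)

    seen = {root}            # guards against accidental cycles in the data
    queue = deque([root])
    count = 0
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
                count += 1
    return count

# Hypothetical sample: (username, referenceID)
rows = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "b"), ("f", "c")]
print(count_descendants(rows, "a"))
```

Each row is visited a bounded number of times, so the cost grows roughly linearly with the table size. An index on referenceID may also help the original CTE, since every recursion step looks rows up by that column.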

Related

Recursive query within recursive query

I would like to solve a problem consisting of 2 recursions.
In one of the 2 recursions I find out the answer to one question which is "What is the leaf member of a specific input (template)?" This is already solved.
In a second recursion I would like to run this query for a number of other inputs (templates).
1st part of the problem:
I have a tree and would like to find the leaf of it. This part of the recursion can be solved using this query:
with recursive full_tree as (
select id, "previousVersionId", 1 as level
from template
where
template."id" = '5084520a-bb07-49e8-b111-3ea8182dc99f'
union all
select c.id, c."previousVersionId", p.level + 1
from template c
inner join full_tree p on c."previousVersionId" = p.id
)
select * from full_tree
order by level desc
limit 1
The query output is one record including the leaf id I'm interested in. This is fine.
2nd part of the query:
Here's the problem. I would like to run the first query n times.
Currently I can run the query only for a single id ('5084520a-bb07-49e8-b111-3ea8182dc99f' in the example). But what if I have a list of 100 such ids?
My ultimate goal is to get one id response (the leaf id) to each of the 100 template ids in the list.
In theory, a query that allows me to run above query for each of my e.g. 100 template ids would solve my problem.
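One way to avoid running the query once per id is to carry the starting id through the recursion, so that a single query returns one leaf per starting template. The sketch below runs on SQLite through Python's sqlite3 module purely so that it is runnable; the short ids and the two version chains are hypothetical, and the root_id column is an illustrative name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE template (id TEXT PRIMARY KEY, previousVersionId TEXT);
    -- Two hypothetical version chains: t1 -> t2 -> t3 and u1 -> u2
    INSERT INTO template VALUES
        ('t1', NULL), ('t2', 't1'), ('t3', 't2'),
        ('u1', NULL), ('u2', 'u1');
""")

query = """
WITH RECURSIVE full_tree AS (
    -- Anchor: every requested template starts its own chain
    SELECT id AS root_id, id, 1 AS level
    FROM template
    WHERE id IN ('t1', 'u1')
    UNION ALL
    -- Step: follow the chain, carrying the starting id along
    SELECT p.root_id, c.id, p.level + 1
    FROM template c
    JOIN full_tree p ON c.previousVersionId = p.id
)
SELECT f.root_id, f.id AS leaf_id
FROM full_tree f
JOIN (SELECT root_id, MAX(level) AS max_level
      FROM full_tree GROUP BY root_id) m
  ON m.root_id = f.root_id AND m.max_level = f.level
ORDER BY f.root_id
"""
leaves = conn.execute(query).fetchall()
print(leaves)
```

This assumes each template has at most one successor; with branching versions, several rows can share the maximum level for one root_id. In Postgres, SELECT DISTINCT ON (root_id) with ORDER BY root_id, level DESC would be a more compact way to pick the deepest row per starting id.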

SQL - Summing at every level by conditions in hierarchy

I had a question I am trying to solve in regards to summing from the lowest child in a hierarchy to the highest level provided (noted by an ID). Essentially, I am attempting to input an ID of a large container into a recursive function that calculates the values of the lower nodes and then traces them back up for that particular ID.
I was working on the hierarchy in a recursive CTE and was able to get to the point shown below:
ALTER FUNCTION [dbo].[tree_from_id](@myid INT) RETURNS TABLE AS
RETURN (
WITH tree_cte (id, ID_PARENT, VOLUME, nlevel, treepath)
as
(SELECT i.id, i.id_parent, i.VOLUME, nlevel = 1, cast(i.id as nvarchar(255)) as treepath
FROM item i
WHERE id = @myid
UNION ALL
SELECT i.ID, i.ID_PARENT, i.VOLUME, tc.nlevel + 1, cast(tc.treepath+' <- ' + cast(i.id as nvarchar(255))as nvarchar(255)) as treepath
FROM tree_cte tc
INNER JOIN
item i ON i.id_parent = tc.id
)
SELECT * FROM tree_cte
)
GO
This allowed me to get the level of each item and display their hierarchy, done by calling the function like:
SELECT * FROM tree_from_id(215548)
where 215548 is the particular item's ID.
I was looking to implement something similar for the item volumes, but I am unsure how to go about aggregating from the lowest children and rolling up to their parents. I tried the same method as shown above, but, understandably, it started from the parent and added downward to the children.
Also, is there a way to add cases that define whether or not to add an item's volume, or to use a different value for its volume? For example, if the item has no children, take different dimensions for the volume?
Thanks! I am trying to learn more about SQL and found this to be a particularly difficult problem for me.
I'm not sure if it's volume that you want to accumulate, but can't you just sum it as you go?
Partial snippet....
UNION ALL
SELECT i.ID, i.ID_PARENT, i.VOLUME+tc.VOLUME, tc.nlevel + 1, cast(tc.treepath+' <- ' + cast(i.id as nvarchar(255))as nvarchar(255)) as treepath
FROM tree_cte tc
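A different technique from the snippet above, for rolling volumes up from children to every ancestor, is to build (ancestor, descendant) pairs with a recursive CTE and then sum the descendants' volumes per ancestor. This is only a sketch under assumptions: it runs on SQLite via Python's sqlite3 so it can be executed as-is, the four sample items are hypothetical, and closure is an illustrative CTE name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, id_parent INTEGER, volume REAL);
    -- Hypothetical container tree: 1 holds 2 and 3; 3 holds 4
    INSERT INTO item VALUES (1, NULL, 10), (2, 1, 4), (3, 1, 5), (4, 3, 1);
""")

query = """
WITH RECURSIVE closure (ancestor, descendant) AS (
    -- Every item is its own descendant at distance zero
    SELECT id, id FROM item
    UNION ALL
    -- Extend each pair one level down the tree
    SELECT c.ancestor, i.id
    FROM closure c
    JOIN item i ON i.id_parent = c.descendant
)
SELECT c.ancestor, SUM(i.volume) AS total_volume
FROM closure c
JOIN item i ON i.id = c.descendant
GROUP BY c.ancestor
ORDER BY c.ancestor
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Filtering on ancestor gives the rolled-up total for a single container. A CASE expression inside the SUM is one place where a different per-item value (for example, for items without children) could be substituted.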

SQL Server - Multiplying row values for a given column value [duplicate]

I'm looking for something like SELECT PRODUCT(table.price) FROM table GROUP BY table.sale, similar to how SUM works.
Have I missed something on the documentation, or is there really no PRODUCT function?
If so, why not?
Note: I looked for the function in postgres, mysql and mssql and found none so I assumed all sql does not support it.
For MSSQL you can use this. It can be adapted for other platforms: it's just maths and aggregates on logarithms.
SELECT
GrpID,
CASE
WHEN MinVal = 0 THEN 0
WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
ELSE EXP(ABSMult)
END
FROM
(
SELECT
GrpID,
--log of +ve row values
SUM(LOG(ABS(NULLIF(Value, 0)))) AS ABSMult,
--count of -ve values. Even = +ve result.
SUM(SIGN(CASE WHEN Value < 0 THEN 1 ELSE 0 END)) AS Neg,
--anything * zero = zero
MIN(ABS(Value)) AS MinVal
FROM
Mytable
GROUP BY
GrpID
) foo
Taken from my answer here: SQL Server Query - groupwise multiplication
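The same three ingredients (sum of logs of absolute values, parity of the negative count, and a zero check) can be sketched outside SQL to see how they fit together. The product_via_logs name is hypothetical and the function is only an illustration of the idea, not a translation of the query's exact NULL handling:

```python
import math

def product_via_logs(values):
    """Multiply numbers using logs, a sign count, and a zero check."""
    if any(v == 0 for v in values):
        return 0.0                       # anything * zero = zero
    negatives = sum(1 for v in values if v < 0)
    magnitude = math.exp(sum(math.log(abs(v)) for v in values))
    # An odd number of negative factors makes the product negative
    return -magnitude if negatives % 2 == 1 else magnitude

print(product_via_logs([2, 5, 7, 8]))
print(product_via_logs([-2, -3, -5]))
print(product_via_logs([1, 0, 2]))
```

Because the result goes through floating-point exp and log, it is generally close to, but not guaranteed to equal, the exact product.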
I don't know why there isn't one, but (take more care over negative numbers) you can use logs and exponents to do:-
select exp (sum (ln (table.price))) from table ...
There is no PRODUCT set function in the SQL Standard. It would appear to be a worthy candidate, though (unlike, say, a CONCATENATE set function: it's not a good fit for SQL e.g. the resulting data type would involve multivalues and pose a problem as regards first normal form).
The SQL Standards aim to consolidate functionality across SQL products circa 1990 and to provide 'thought leadership' on future development. In short, they document what SQL does and what SQL should do. The absence of PRODUCT set function suggests that in 1990 no vendor thought it worthy of inclusion and there has been no academic interest in introducing it into the Standard.
Of course, vendors always have sought to add their own functionality, these days usually as extensions to Standards rather than tangentially. I don't recall seeing a PRODUCT set function (or even demand for one) in any of the SQL products I've used.
In any case, the work around is fairly simple using log and exp scalar functions (and logic to handle negatives) with the SUM set function; see @gbn's answer for some sample code. I've never needed to do this in a business application, though.
In conclusion, my best guess is that there is no demand from SQL end users for a PRODUCT set function; further, that anyone with an academic interest would probably find the workaround acceptable (i.e. would not value the syntactic sugar a PRODUCT set function would provide).
Out of interest, there is indeed demand in SQL Server Land for new set functions but for those of the window function variety (and Standard SQL, too). For more details, including how to get involved in further driving demand, see Itzik Ben-Gan's blog.
You can perform a product aggregate function, but you have to do the maths yourself, like this...
SELECT
Exp(Sum(IIf(Abs([Num])=0,0,Log(Abs([Num])))))*IIf(Min(Abs([Num]))=0,0,1)*(1-2*(Sum(IIf([Num]>=0,0,1)) Mod 2)) AS P
FROM
Table1
Source: http://productfunctionsql.codeplex.com/
There is a neat trick in T-SQL (not sure if it's ANSI) that lets you concatenate string values from a set of rows into one variable. It looks like it works for multiplying as well:
declare @Floats as table (value float)
insert into @Floats values (0.9)
insert into @Floats values (0.9)
insert into @Floats values (0.9)
declare @multiplier float = null
select
@multiplier = isnull(@multiplier, 1) * value
from @Floats
select @multiplier
This can potentially be more numerically stable than the log/exp solution.
I think that is because no numbering system is able to accommodate many products. As databases are designed for large number of records, a product of 1000 numbers would be super massive and in case of floating point numbers, the propagated error would be huge.
Also note that using log can be a dangerous solution. Although mathematically log(a*b) = log(a)+log(b), that may not hold exactly on a computer, because floating-point values are not real numbers. If you calculate exp(log(a)+log(b)) instead of a*b, you may get unexpected results. For example:
SELECT 9999999999*99999999974482, EXP(LOG(9999999999)+LOG(99999999974482))
in Sql Server returns
999999999644820000025518, 9.99999999644812E+23
So my point is: when you are computing a product, do it carefully and test it heavily.
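The same comparison can be reproduced outside the database. This small sketch uses the two numbers from the example above; Python integers are arbitrary precision, so the first value is exact, while the second goes through floating-point log and exp:

```python
import math

a, b = 9999999999, 99999999974482

exact = a * b                                   # Python integers are arbitrary precision
approx = math.exp(math.log(a) + math.log(b))    # the log/exp route, in floating point

print(exact)
print(approx)
print(abs(approx - exact) / exact)              # relative error of the log/exp result
```

The relative error is tiny, but the low-order digits of a product this large cannot survive a round trip through a 64-bit float, which matters whenever an exact integer result is expected.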
One way to deal with this problem (if you are working in a scripting language) is to use the group_concat function.
For example, SELECT group_concat(table.price) FROM table GROUP BY table.sale
This will return a string with all prices for the same sale value, separated by a comma.
Then with a parser you can get each price, and do a multiplication. (In php you can even use the array_reduce function, in fact in the php.net manual you get a suitable example).
Cheers
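As a rough illustration of the same idea in Python rather than PHP: the sketch below uses SQLite's group_concat through the sqlite3 module and multiplies the parsed values with math.prod (Python 3.8+). The sales table and its rows are hypothetical:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale TEXT, price INTEGER);
    INSERT INTO sales VALUES ('a', 2), ('a', 5), ('a', 7), ('b', -3), ('b', 4);
""")

products = {}
for sale, prices in conn.execute(
    "SELECT sale, group_concat(price) FROM sales GROUP BY sale"
):
    # prices is a comma-separated string such as '2,5,7'
    products[sale] = math.prod(int(p) for p in prices.split(","))

print(products)
```

Since the multiplication happens in the client, zeros and negative values need no special handling, at the cost of shipping every value out of the database as text.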
Another approach based on fact that the cardinality of cartesian product is product of cardinalities of particular sets ;-)
⚠ WARNING: This example is just for fun and is rather academic, don't use it in production! (apart from the fact it's just for positive and practically small integers)⚠
with recursive t(c) as (
select unnest(array[2,5,7,8])
), p(a) as (
select array_agg(c) from t
union all
select p.a[2:]
from p
cross join generate_series(1, p.a[1])
)
select count(*) from p where cardinality(a) = 0;
The problem can be solved using modern SQL features such as window functions and CTEs. Everything is standard SQL and, unlike logarithm-based solutions, it requires neither switching from the integer world to the floating-point world nor special handling of nonpositive numbers. Just number the rows and evaluate the product in a recursive query until no rows remain:
with recursive t(c) as (
select unnest(array[2,5,7,8])
), r(c,n) as (
select t.c, row_number() over () from t
), p(c,n) as (
select c, n from r where n = 1
union all
select r.c * p.c, r.n from p join r on p.n + 1 = r.n
)
select c from p where n = (select max(n) from p);
As your question involves grouping by the sale column, things get a little more complicated, but it's still solvable:
with recursive t(sale,price) as (
select 'multiplication', 2 union
select 'multiplication', 5 union
select 'multiplication', 7 union
select 'multiplication', 8 union
select 'trivial', 1 union
select 'trivial', 8 union
select 'negatives work', -2 union
select 'negatives work', -3 union
select 'negatives work', -5 union
select 'look ma, zero works too!', 1 union
select 'look ma, zero works too!', 0 union
select 'look ma, zero works too!', 2
), r(sale,price,n,maxn) as (
select t.sale, t.price, row_number() over (partition by sale), count(1) over (partition by sale)
from t
), p(sale,price,n,maxn) as (
select sale, price, n, maxn
from r where n = 1
union all
select p.sale, r.price * p.price, r.n, r.maxn
from p
join r on p.sale = r.sale and p.n + 1 = r.n
)
select sale, price
from p
where n = maxn
order by sale;
Result:
sale,price
"look ma, zero works too!",0
multiplication,560
negatives work,-30
trivial,8
Tested on Postgres.
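Here is an Oracle solution for anyone who needs it
with data(id, val) as (
  select 1, 1.0 from dual union all
  select 2, -2.0 from dual union all
  select 3, 1.0 from dual union all
  select 4, 2.0 from dual
),
neg(val, modifier) as (
  -- product of the absolute values of the negatives, plus the resulting sign;
  -- nvl covers the case where there are no negative rows at all
  select nvl(exp(sum(ln(abs(val)))), 1), case when mod(count(*), 2) = 0 then 1 else -1 end
  from data
  where val < 0
),
pos(val) as (
  -- product of the strictly positive values (ln(0) is undefined)
  select nvl(exp(sum(ln(val))), 1)
  from data
  where val > 0
),
zero(cnt) as (
  -- any zero makes the whole product zero
  select count(*) from data where val = 0
)
select case when (select cnt from zero) > 0 then 0
            else (select val * modifier from neg) * (select val from pos)
       end as product
from dual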

Retrieve hierarchical groups ... with infinite recursion

I've a table like this which contains links :
key_a key_b
--------------
a b
b c
g h
a g
c a
f g
not really tidy & infinite recursion ...
key_a = parent
key_b = child
Require a query which will recompose and attribute a number for each hierarchical group (parent + direct children + indirect children) :
key_a key_b nb_group
--------------------------
a b 1
a g 1
b c 1
**c a** 1
f g 2
g h 2
**link responsible of infinite loop**
Because we have
A-B-C-A
-> Only want to show simply the link as shown.
Any idea ?
Thanks in advance
The problem is that you aren't really dealing with strict hierarchies; you're dealing with directed graphs, where some graphs have cycles. Notice that your nbgroup #1 doesn't have any canonical root-- it could be a, b, or c due to the cyclic reference from c-a.
The basic way of dealing with this is to think in terms of graph techniques, not recursion. In fact, an iterative approach (not using a CTE) is the only solution I can think of in SQL. The basic approach is explained here.
Here is a SQL Fiddle with a solution that addresses both the cycles and the shared-leaf case. Notice it uses iteration (with a failsafe to prevent runaway processes) and table variables to operate; I don't think there's any getting around this. Note also the changed sample data (a-g changed to a-h; explained below).
If you dig into the SQL you'll notice that I changed some key things from the solution given in the link. That solution was dealing with undirected edges, whereas your edges are directed (if you used undirected edges the entire sample set is a single component because of the a-g connection).
This gets to the heart of why I changed a-g to a-h in my sample data. Your specification of the problem is straightforward if only leaf nodes are shared; that's the specification I coded to. In this case, a-h and g-h can both get bundled off to their proper components with no problem, because we're concerned about reachability from parents (even given cycles).
However, when you have shared branches, it's not clear what you want to show. Consider the a-g link: given this, g-h could exist in either component (a-g-h or f-g-h). You put it in the second, but it could have been in the first instead, right? This ambiguity is why I didn't try to address it in this solution.
Edit: To be clear, in my solution above, if shared branches ARE encountered, it treats the whole set as a single component. Not what you described above, but it will have to be changed after the problem is clarified. Hopefully this gets you close.
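The SQL Fiddle itself is not reproduced here, so the following is not that solution. It is only a sketch of the simpler undirected variant mentioned above: iterate over the edges, propagate the smallest label, and stop when nothing changes (with a pass limit as a failsafe). The label_components name and max_passes parameter are hypothetical:

```python
def label_components(edges, max_passes=100):
    """Give every node the smallest node name reachable over undirected edges."""
    label = {node: node for edge in edges for node in edge}
    for _ in range(max_passes):          # failsafe against runaway loops
        changed = False
        for a, b in edges:
            low = min(label[a], label[b])
            if label[a] != low or label[b] != low:
                label[a] = label[b] = low
                changed = True
        if not changed:
            break
    return label

# The links from the question, as (key_a, key_b)
links = [("a", "b"), ("b", "c"), ("g", "h"), ("a", "g"), ("c", "a"), ("f", "g")]

all_links = label_components(links)
without_a_g = label_components([e for e in links if e != ("a", "g")])
print(sorted(set(all_links.values())))
print(sorted(set(without_a_g.values())))
```

With the original links everything collapses into one component because of a-g, which matches the observation above; dropping that one link leaves the two groups {a, b, c} and {f, g, h}.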
You should use a recursive query. In the first part we select all records which are top-level nodes (have no parents) and, using ROW_NUMBER(), assign them group ID numbers. Then in the recursive part we add their children one by one and use the parent's group ID number.
with CTE as
(
select t1.parent,t1.child,
ROW_NUMBER() over (order by t1.parent) rn
from t t1 where
not exists (select 1 from t where child=t1.parent)
union all
select t.parent,t.child, CTE.rn
from t
join CTE on t.parent=CTE.Child
)
select * from CTE
order by RN,parent
SQLFiddle demo
Painful problem of graph walking using recursive CTEs. This is the problem of finding connected subgraphs in a graph. The challenge with using recursive CTEs is to prevent unwarranted recursion, that is, infinite loops. In SQL Server, that typically means storing the visited nodes in a string.
The idea is to get a list of all pairs of nodes that are connected (and a node is connected with itself). Then, take the minimum from the list of connected nodes and use this as an id for the connected subgraph.
The other idea is to walk the graph in both directions from a node. This ensures that all possible nodes are visited. The following is a query that accomplishes this:
with fullt as (
select keyA, keyB
from t
union
select keyB, keyA
from t
),
CTE as (
select t.keyA, t.keyB, t.keyB as last, 1 as level,
','+cast(keyA as varchar(max))+','+cast(keyB as varchar(max))+',' as path
from fullt t
union all
select cte.keyA, cte.keyB,
(case when t.keyA = cte.last then t.keyB else t.keyA
end) as last,
1 + level,
cte.path+t.keyB+','
from fullt t join
CTE
on t.keyA = CTE.last or
t.keyB = cte.keyA
where cte.path not like '%,'+t.keyB+',%'
) -- select * from cte where 'g' in (keyA, keyB)
select t.keyA, t.keyB,
dense_rank() over (order by min(cte.Last)) as grp,
min(cte.Last)
from t join
CTE
on (t.keyA = CTE.keyA and t.keyB = cte.keyB) or
(t.keyA = CTE.keyB and t.keyB = cte.keyA)
where cte.path like '%,'+t.keyA+',%' or
cte.path like '%,'+t.keyB+',%'
group by t.id, t.keyA, t.keyB
order by t.id;
The SQLFiddle is here.
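The path-string guard on its own can be seen in a much smaller form. The sketch below is a simplification, not the query above: it walks the links in one direction only, and it runs on SQLite through Python's sqlite3 (using instr where T-SQL would use LIKE or CHARINDEX) so that it can be executed directly. The walk CTE and its column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (keyA TEXT, keyB TEXT);
    INSERT INTO t VALUES ('a','b'), ('b','c'), ('g','h'), ('a','g'), ('c','a'), ('f','g');
""")

query = """
WITH RECURSIVE walk (start, last, path) AS (
    SELECT keyA, keyB, ',' || keyA || ',' || keyB || ','
    FROM t
    UNION ALL
    SELECT w.start, t.keyB, w.path || t.keyB || ','
    FROM walk w
    JOIN t ON t.keyA = w.last
    -- Only step to nodes that are not already on the path
    WHERE instr(w.path, ',' || t.keyB || ',') = 0
)
SELECT DISTINCT last FROM walk WHERE start = 'a' ORDER BY last
"""
reachable = [row[0] for row in conn.execute(query)]
print(reachable)
```

Without the WHERE clause the a-b-c-a cycle would recurse forever; with it, every path contains each node at most once, so the recursion is guaranteed to end.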
you might want to check with COMMON TABLE EXPRESSIONS
here's the link

Flattening a Recursive Hierarchy Into a Dimension with SSIS

I have a recursive hierarchy in a relational database, this reflects teams and their position within a hierarchy.
I wish to flatten this hierarchy into a dimension for data warehousing, it's a SQL Server database, using SSIS to SSAS.
I have a table, teams:
teamid  Teamname
1       Team 1
2       Team 2
And a table teamhierarchymapping:
Teamid  hierarchyid
1       4
2       2
And a table hierarchy:
sequenceid  parentsequenceid  Name
1           null              root
2           1                 Level 1.1
3           1                 Level 1.2
4           3                 Level 1.2 1
Giving
        Level 1.1 (Contains Team 2)
root <
        Level 1.2 <
                    Level 1.2 1 (Contains Team 1)
I want to flatten this to a dimension like:
Team Name  Level 1  Level 2    Level 3
Team 1     Root     Level 1.1  [None]
Team 2     Root     Level 1.2  Level 1.2 1
I've tried various nasty sets of SQL to try and bring that together, and various piping around in SSIS (which I am just starting to pick up), and I'm not finding a solution that brings it together.
Can anyone help?
(Edit corrected issue with sample data, I think)
Do you have an error in your sample data? I can't see how the hierarchy mapping connects to the hierarchy table to get the results you want, unless the hierarchy mapping is teamid 1 => hierid 2 and teamid 2 => hierid 4.
SSIS may not be able to do it (easily), so it may be better to create an OLE DB Source that executes SQL of the following form. Note that this relies on the PIVOT operator, which SQL Server has supported since the 2005 release...
WITH hier AS (
SELECT parentseqid, sequenceid, hiername as parentname, hiername FROM TeamHierarchy
UNION ALL
SELECT hier.parentseqid, TH.sequenceid, hier.parentname, TH.hiername FROM hier
INNER JOIN TeamHierarchy TH ON TH.parentseqid = hier.sequenceid
),
teamhier AS (
SELECT T.*, THM.hierarchyid FROM Teams T
INNER JOIN TeamHierarchyMapping THM ON T.teamid = THM.teamid
)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY teamname ORDER BY teamname, sequenceid, parentseqid) AS 'Depth', hier.parentname, teamhier.teamname
FROM hier
INNER JOIN teamhier ON hier.sequenceid = teamhier.hierarchyid
) as t1
PIVOT (MAX(parentname) FOR Depth IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) AS pvtTable
ORDER BY teamname;
There are a few different elements to this, and there may be a better way to do it, but for flattening hierarchies, CTEs are ideal.
Two CTEs are created: 'hier', which takes care of flattening the hierarchy, and 'teamhier', which is just a helper "view" to make the joins later on simpler. If you just take the hier CTE and run it, you'll get your flattened view:
WITH hier AS (
SELECT parentseqid, sequenceid, hiername as parentname, hiername FROM TeamHierarchy
UNION ALL
SELECT hier.parentseqid, TH.sequenceid, hier.parentname, TH.hiername FROM hier
INNER JOIN TeamHierarchy TH ON TH.parentseqid = hier.sequenceid
)
SELECT * FROM hier ORDER BY parentseqid, sequenceid
The next part of it basically takes this flattened view, joins it to your team tables (to get the team name) and uses SQL Server's PIVOT to rotate it round and get everything aligned as you want it. More information on PIVOT is available on the MSDN.
If you're using SQL Server 2005, then you can just take the hierarchy flattening bit and you should be able to use SSIS's native 'PIVOT' transformation block to hopefully do the dirty pivoting work.
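To make the target shape concrete, here is a small sketch of the flattening step in plain Python rather than SQL or SSIS. It uses the hierarchy and the team mapping exactly as posted in the question (so the result differs from the question's hand-written example rows, as noted above); the flatten helper, the depth of 3, and the "[None]" filler are illustrative assumptions:

```python
hierarchy = {            # sequenceid -> (parentsequenceid, name), as in the question
    1: (None, "root"),
    2: (1, "Level 1.1"),
    3: (1, "Level 1.2"),
    4: (3, "Level 1.2 1"),
}
team_to_node = {"Team 1": 4, "Team 2": 2}   # teamhierarchymapping as posted

def flatten(node_id, depth=3, filler="[None]"):
    """Return the names from the root down to node_id, padded to `depth` columns."""
    path = []
    while node_id is not None:
        parent_id, name = hierarchy[node_id]
        path.append(name)
        node_id = parent_id
    path.reverse()
    return path + [filler] * (depth - len(path))

dimension = {team: flatten(node) for team, node in team_to_node.items()}
for team, levels in dimension.items():
    print(team, levels)
```

Walking up from the team's node and reversing the path gives the Level 1..n columns directly, which is the same information the recursive CTE plus PIVOT produces inside the database.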
Is there a reason why you need to flatten the hierarchy? Consider the parent-child hierarchy dimension type in SSAS: it handles variable-depth hierarchies and allows extra functions/features that a flattened hierarchy would not:
http://msdn.microsoft.com/en-us/library/ms174846.aspx
http://msdn.microsoft.com/en-us/library/ms174935.aspx