Recursive CTE - Compute Parent Value based on child values - sql

Here is a tree in which I want to compute the total price at each level using a recursive T-SQL query (presumably a CTE), with the expected results below.
At each level, the total price equals the sum of the children's values times the parent's coefficient.
Parent1 (id:1 ; coef: 3)
|
|-SubParent2 (id:2 ; coef: 0.5)
| |
| |- Child (id:4) price=10
| |- Child (id:5) price=15
|
|
|-SubParent3 (id:3; coef: 2)
| |
| |- Child (id:6) price=12
| |- Child (id:7) price=13
DESIRED Results
---------------
ID | Totalprice
1 | 187.5 (Totalprice2[=12.5] + Totalprice3[=50]) * coef1[=3]
2 | 12.5 (price4[=10]+price5[=15]) * coef2[= 0.5]
3 | 50 (price6[=12]+price7[=13]) * coef3[= 2]
4 | 10
5 | 15
6 | 12
7 | 13
I am sure I can do it using a recursive CTE, but I can't work out how, since I cannot use GROUP BY in the recursive part of the CTE...
Creation Script
CREATE TABLE [dbo].[Instances] (
[ID] int NOT NULL,
[Coef] float NULL,
[price] float NULL,
[ParentID] int NULL);
INSERT INTO Instances
Values
(1,3,NULL,NULL),
(2,0.5,NULL,1),
(3,2,NULL,1),
(4,1,10,2),
(5,1,15,2),
(6,1,12,3),
(7,1,13,3)
Thank you for your help

This was a bit tricky, but a common table expression does the job.
The idea is to first select all the leaves (the records that have a price), then walk upwards one step at a time, multiplying the running total by each parent's coef until the root is reached. After that, a SUM with GROUP BY collapses the per-leaf rows into the final result.
My result matches your expected output.
;with leafs as (
select *,ins.Coef*ins.price [total] from Instances ins where price is not null
union all
select ins.*,leafs.total*ins.Coef [total] from leafs
inner join Instances ins on ins.ID=leafs.ParentID
)
select ID,sum(total) Totalprice from leafs
group by ID
order by ID
The result of the query above is:
ID Totalprice
1 187.5
2 12.5
3 50
4 10
5 15
6 12
7 13
Hope this helps.
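For anyone who wants to trace the arithmetic outside SQL Server, here is a minimal Python sketch of the same leaf-to-root idea: each leaf's price is pushed up through its ancestors, multiplied by every coef on the way, and the contributions are summed per node. The dictionary layout and the name total_prices are assumptions made for illustration; they are not part of the original answer.

```python
from collections import defaultdict

# Rows copied from the question's Instances table: id -> (coef, price, parent_id)
instances = {
    1: (3.0, None, None),
    2: (0.5, None, 1),
    3: (2.0, None, 1),
    4: (1.0, 10.0, 2),
    5: (1.0, 15.0, 2),
    6: (1.0, 12.0, 3),
    7: (1.0, 13.0, 3),
}


def total_prices(rows):
    """Push every leaf price up to the root, scaling by each node's coef."""
    totals = defaultdict(float)
    for node_id, (coef, price, parent_id) in rows.items():
        if price is None:
            continue  # only rows with a price seed the upward walk
        contribution = coef * price
        totals[node_id] += contribution
        # Walk up the ancestor chain, like the recursive member of the CTE
        while parent_id is not None:
            parent_coef, _, grandparent_id = rows[parent_id]
            contribution *= parent_coef
            totals[parent_id] += contribution
            parent_id = grandparent_id
    return dict(totals)


for node_id, total in sorted(total_prices(instances).items()):
    print(node_id, total)
```

The per-node accumulation plays the role of the final GROUP BY: the recursion itself never aggregates, it only produces one row per leaf per ancestor.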

Related

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help and I am interested in the best method to go about manipulating the data in accordance to this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to make things easier. The format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row. For the second row the difference is created. In this case only item_id 3 has a single quantity remaining.
Note, if a 2nd row needs to be created that total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24. But, there is no known limit of the quantity associated for each item_id.
Here's an approach that comes a bit from left field.
One approach would have been to do a recursive CTE, building the rows one-by-one.
Instead, I've taken an approach where I:
1. Create a new (virtual) table with 1 row per item (so if there are 6 items, there will be 6 rows)
2. Group those items into groups of 5 (I've called these rn_batches)
3. Pivot those (based on counts per item per rn_batch)
For these, processing is relatively simple
Creating one row per item is done using INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 items, rn_batch = 2 for the next 5 items, etc - until there are no more items left for that order (based on cust_id/pack_id).
Here is the code
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
Here's a db<>fiddle with
additional data in the #Orders table
the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables), similar to the db<>fiddle example.
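The expand-then-batch logic described in this answer can also be sketched in Python, which may help when checking the SQL output by hand. This is only an illustration under assumptions: the function name pack_rows, the tuple layout and the max_per_row parameter are hypothetical, and the example data is the five rows from the question.

```python
from collections import Counter
from itertools import groupby

# (cust_id, pack_id, item_id, qty) rows from the question's example
orders = [
    (1, "A", 1, 1),
    (1, "A", 2, 1),
    (1, "A", 3, 4),
    (1, "A", 4, 0),
    (1, "A", 5, 0),
]


def pack_rows(rows, max_per_row=5, item_ids=(1, 2, 3, 4, 5)):
    """Return (cust_id, pack_id, rn_batch, count per item_id...) tuples."""
    out = []
    by_pack = lambda r: (r[0], r[1])
    for (cust_id, pack_id), group in groupby(sorted(rows), key=by_pack):
        # Step 1: one entry per unit, the job of the numbers-table join
        units = [item_id for _, _, item_id, qty in group for _ in range(qty)]
        # Step 2: cut the units into batches of max_per_row (rn_batch)
        for start in range(0, len(units), max_per_row):
            counts = Counter(units[start:start + max_per_row])
            # Step 3: pivot, one count column per item_id
            out.append((cust_id, pack_id, start // max_per_row + 1,
                        *[counts[i] for i in item_ids]))
    return out


for row in pack_rows(orders):
    print(row)
```

Because the units are sorted by item_id before batching, an item's quantity spills into the next row only once the current row is full, which is the behaviour the ROW_NUMBER ordering gives in the query.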

Use SQL statements to construct an aggregator to calculate the data sum of a matrix

I have recently tried to construct an aggregator to calculate the sum of the entire table data. Do you have any opinions?
Following is my table structure and data:
CREATE TABLE matrix (
id INT NOT NULL PRIMARY KEY UNIQUE,
room INT NOT NULL,
salary INT NOT NULL,
time INT NOT NULL);
id | room | salary | time
----+------+--------+------
1 | 101 | 100 | 7
2 | 205 | 150 | 8
3 | 304 | 160 | 7
4 | 106 | 200 | 8
I want to calculate the total of the entire table data by constructing an aggregator so that its output looks like this:
sum
------
1366
Are there any good solutions?
You didn't tell us the DBMS you are using, but in standard SQL you could "unpivot" the columns using a lateral join and then sum the result:
select sum(t.val)
from matrix
cross join lateral ( values (id), (room), (salary), (time) ) as t(val);
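As a quick way to check the expected figure, here is a small sketch using Python's built-in sqlite3 module. SQLite has no LATERAL join, so this swaps in a plainer technique, adding the columns row by row inside SUM, which works here because every column is declared NOT NULL. Note that the expected 1366 is reached only when the id column is counted too; room, salary and time alone add up to 1356.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE matrix ('
    ' id INT NOT NULL PRIMARY KEY,'
    ' room INT NOT NULL,'
    ' salary INT NOT NULL,'
    ' "time" INT NOT NULL)'
)
conn.executemany(
    "INSERT INTO matrix VALUES (?, ?, ?, ?)",
    [(1, 101, 100, 7), (2, 205, 150, 8), (3, 304, 160, 7), (4, 106, 200, 8)],
)

# Sum of every cell in the table, id column included
total_with_id = conn.execute(
    'SELECT SUM(id + room + salary + "time") FROM matrix'
).fetchone()[0]

# Same sum without the id column, for comparison
total_without_id = conn.execute(
    'SELECT SUM(room + salary + "time") FROM matrix'
).fetchone()[0]

print(total_with_id, total_without_id)
conn.close()
```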

How to write a CTE to aggregate hierarchical values

I want to write expressions in sqlite to process a tree of items, starting with the leaf nodes (the bottom) and proceeding back to their parents all the way to the root node (the top), such that each parent node is updated based on the content of its children. I've been able to write a CTE that does something similar, but isn't yet totally correct.
I have a simple table "test1" containing some nested values:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | NULL
2 | 1 | NULL | NULL
3 | 2 | NULL | NULL
4 | 3 | 50 | NULL
5 | 3 | 50 | NULL
6 | 2 | 60 | NULL
7 | 6 | 90 | NULL
8 | 6 | 60 | NULL
Rows may have children who reference their parent via their parent field. Rows may have a value of their own as well as child rows, or they may simply be parents without values (ie. "wrappers"). The leafs would be the rows without any children.
For each row I'd like to calculate the total as the average of the row's value (if not null) and its children's totals. This should start with the leaf nodes and proceed up the tree to their parents, all the way to the root node at the top of the data hierarchy.
I've tried a number of variations of CTE's but am having difficulty writing one that will recursively calculate these totals from the bottom up.
Currently, I have:
UPDATE test1 SET total = (
WITH RECURSIVE cte(cte_id,cte_parent,cte_value,cte_total) AS (
SELECT test1.id, test1.parent, test1.value, test1.total
UNION ALL
select t.id, t.parent, t.value, t.total from test1 t, cte
WHERE cte.cte_id=t.parent
) SELECT AVG(cte_value) FROM cte
);
which produces:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | 62
2 | 1 | NULL | 62
3 | 2 | NULL | 50
4 | 3 | 50 | 50
5 | 3 | 50 | 50
6 | 2 | 60 | 70
7 | 6 | 90 | 90
8 | 6 | 60 | 60
Looking at the top-most rows, this is not quite right, since it takes an average not only of the row's immediate children, but of all the row's descendants. This causes row 2, for example, to have a total of 62 instead of 60. The expected results should set row 2's total to 60, as the average of its immediate child rows 3 and 6. Row 1's total would be 60 as well.
How can I calculate a "total" value for each row based on an average of the row's value and the values of its immediate children only, while ensuring the upper levels of the hierarchy are correctly populated based on the calculated totals of their children?
It turns out that a very similar question and solution was posted here:
How can I traverse a tree bottom-up to calculate a (weighted) average of node values in PostgreSQL?
Since sqlite3 doesn't let you create functions, the example using a recursive CTE applies:
with recursive cte(id, parent, value, level, total) as (
select
t.id, t.parent, t.value,
0,
t.value as total
from test1 t
where not exists (
select id
from test1
where parent = t.id)
union all
select
t.id, t.parent, t.value,
c.level+1,
case when t.value is null then c.total else t.value end
from test1 t
join cte c on t.id=c.parent
)
select id, parent, value, avg(total) total from (
select
id, parent, value, level, avg(total) total
from cte
group by id,parent,level
)
group by id, parent
order by id
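To make the rule from the question concrete (a row's total is the average of its own value, if any, and the totals of its immediate children), here is a minimal Python sketch that evaluates it bottom-up on the sample data. It is meant as a reference for checking a query's output, not as a replacement for the CTE; the name compute_totals and the dictionary layout are assumptions for illustration.

```python
# id -> (parent, value), copied from the question's test1 table
test1 = {
    1: (None, None),
    2: (1, None),
    3: (2, None),
    4: (3, 50),
    5: (3, 50),
    6: (2, 60),
    7: (6, 90),
    8: (6, 60),
}


def compute_totals(rows):
    """Average each row's own value with its immediate children's totals."""
    children = {}
    for node_id, (parent, _) in rows.items():
        children.setdefault(parent, []).append(node_id)

    totals = {}

    def total(node_id):
        if node_id not in totals:
            # Children are resolved first, so the walk is effectively bottom-up
            parts = [total(child) for child in children.get(node_id, [])]
            parts = [p for p in parts if p is not None]
            value = rows[node_id][1]
            if value is not None:
                parts.append(value)
            totals[node_id] = sum(parts) / len(parts) if parts else None
        return totals[node_id]

    for node_id in rows:
        total(node_id)
    return totals


for node_id, value in sorted(compute_totals(test1).items()):
    print(node_id, value)
```

On this data row 6 comes out as 70 (the average of 60, 90 and 60), row 3 as 50, and rows 2 and 1 as 60, which is what the question describes.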

Find best match in tree given a combination of multiple keys

I have a structure / tree that looks similar to this.
CostType is mandatory and can exist by itself, but it can have a parent ProfitType or Unit and other CostTypes as children.
Only Units can be duplicated; the other values cannot appear more than once in the structure.
| ID | name | parent_id | ProfitType | CostType | Unit |
| -: | ------------- | --------: | ---------: | -------: | ---: |
| 1 | Root | (NULL) | | | |
| 2 | 1 | 1 | 300 | | |
| 3 | 1-1 | 2 | | 111 | |
| 4 | 1-1-1 | 3 | | | 8 |
| 5 | 1-2 | 2 | | 222 | |
| 6 | 1-2-1 | 5 | | 333 | |
| 7 | 1-2-1-1 | 6 | | | 8 |
| 8 | 1-2-1-2 | 6 | | | 9 |
Parameters | should RETURN |
(300,111,8) | 4 |
(null,111,8) | 4 |
(null,null,8) | first match, 4 |
(null,222,8) | best match, 5 |
(null,333,null) | 6 |
I am at a loss on how I could create a function that receives (ProfitType, CostType, Unit) and return the best matching ID from the structure.
This isn't giving exactly the answers you provided as example, but see my comment above - if (null,222,8) should be 7 to match how (null,333,8) returns 4 then this is correct.
Also note that I wrote this using temp tables instead of as a function. I don't want to trip a schema change audit, so I posted what I have as temp tables; I can rewrite it as a function on Monday when my DBA is available, but I thought you might need it before the weekend. Just edit the "DECLARE @ProfitType int = ..." lines to the values you want to test.
I also put in quite a few comments because the logic is tricky, but if they aren't enough leave a comment and I can expand my explanation
/*
ASSUMPTIONS:
A tree can be of arbitrary depth, but will not exceed the recursion limit (defaults to 100)
All trees will include at least 1 CostType
All trees will have at most 1 ProfitType
CostType can appear multiple times in a traversal from root to leaf (can units?)
*/
SELECT *
INTO #Temp
FROM (VALUES (1,'Root',NULL, NULL, NULL, NULL)
, (2,'1', 1, 300, NULL, NULL)
, (3,'1-1', 2, NULL, 111, NULL)
, (4,'1-1-1', 3, NULL, NULL, 8)
, (5,'1-2', 2, NULL, 222, NULL)
, (6,'1-2-1', 5, NULL, 333, NULL)
, (7,'1-2-1-1', 6, NULL, NULL, 8)
, (8,'1-2-1-2', 6, NULL, NULL, 9)
) as TempTable(ID, RName, Parent_ID, ProfitType, CostType, UnitID)
--SELECT * FROM #Temp
DECLARE @ProfitType int = NULL--300
DECLARE @CostType INT = 333 --NULL --111
DECLARE @UnitID INT = NULL--8
--SELECT * FROM #Temp
;WITH cteMatches as (
--Start with all nodes that match at least one criterion, default a score of 100
SELECT N.ID as ReportID, *, 100 as Score, 1 as Depth
FROM #Temp AS N
WHERE N.CostType= @CostType OR N.ProfitType=@ProfitType OR N.UnitID = @UnitID
), cteEval as (
--This is a recursive CTE, it has a (default) limit of 100 recursions
--, but that can be raised if your trees are deeper than 100 nodes
--Start with the base case
SELECT M.ReportID, M.RName, M.ID ,M.Parent_ID, M.Score
, M.Depth, M.ProfitType , M.CostType , M.UnitID
FROM cteMatches as M
UNION ALL
--This is the recursive part, add to the list of matches the match when
--its immediate parent is also considered. For that match increase the score
--if the parent contributes another match. Also update the ID of the match
--to the parent's IDs so recursion can keep adding if more matches are found
SELECT M.ReportID, M.RName, N.ID ,N.Parent_ID
, M.Score + CASE WHEN N.CostType= @CostType
OR N.ProfitType=@ProfitType
OR N.UnitID = @UnitID THEN 100 ELSE 0 END as Score
, M.Depth + 1, N.ProfitType , N.CostType , N.UnitID
FROM cteEval as M INNER JOIN #Temp AS N on M.Parent_ID = N.ID
)SELECT TOP 1 * --Drop the "TOP 1 *" to see debugging info (runners up)
FROM cteEval
ORDER BY SCORE DESC, DEPTH
DROP TABLE #Temp
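The scoring logic in the query above can be sketched in Python to make it easier to try different parameter combinations. This is a rough equivalent under assumptions: the names matches and best_match and the dictionary layout are hypothetical, a NULL parameter is treated as never matching (as in SQL), and ties are broken by the lowest ID, whereas TOP 1 in the query leaves ties unspecified.

```python
# ID -> (parent_id, profit_type, cost_type, unit_id), copied from #Temp
nodes = {
    1: (None, None, None, None),
    2: (1, 300, None, None),
    3: (2, None, 111, None),
    4: (3, None, None, 8),
    5: (2, None, 222, None),
    6: (5, None, 333, None),
    7: (6, None, None, 8),
    8: (6, None, None, 9),
}


def matches(node, wanted):
    """True if any non-NULL parameter equals the node's corresponding column."""
    return any(w is not None and w == have for w, have in zip(wanted, node[1:]))


def best_match(tree, profit_type=None, cost_type=None, unit_id=None):
    wanted = (profit_type, cost_type, unit_id)
    candidates = []  # (-score, depth, report_id), so min() picks the best row
    for report_id, node in tree.items():
        if not matches(node, wanted):
            continue  # only matching nodes start a walk (cteMatches)
        score, depth, current = 100, 1, node
        candidates.append((-score, depth, report_id))
        # Walk up the ancestors, adding 100 for each one that also matches
        while current[0] is not None:
            current = tree[current[0]]
            depth += 1
            if matches(current, wanted):
                score += 100
            candidates.append((-score, depth, report_id))
    return min(candidates)[2] if candidates else None


print(best_match(nodes, 300, 111, 8))
print(best_match(nodes, cost_type=333))
```

With the sample tree, (null,222,8) resolves to 7 rather than 5 under this scoring, which is the discrepancy with the question's table noted at the top of this answer.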
I'm sorry I don't have enough rep to comment.
You'll have to define "best answer" (for example, why isn't the answer to null,222,8 7 or null instead of 5?), but here's the approach I'd use:
Derive a new table where ProfitType and CostType are listed explicitly instead of only by inheritance. I would approach that by using a cursor (how awful, I know) and following the parent_id until a ProfitType and CostType is found -- or the root is reached. This presumes an unlimited amount of child/grandchild levels for parent_id. If there is a limit, then you can instead use N self joins where N is the number of parent_id levels allowed.
Then you run multiple queries against the derived table. The first query would be for an exact match (and then exit if found). Then next query would be for the "best" partial match (then exit if found), followed by queries for 2nd best, 3rd best, etc. until you've exhausted your "best" match criteria.
If you need nested parent CostTypes to be part of the "best match" criteria, then I would make duplicate entries in the derived table for each row that has multiple CostTypes with a CostType "level". level 1 is the actual CostType. level 2 is that CostType's parent, level 3 etc. Then your best match queries would return multiple rows and you'd need to pick the row with the lowest level (which is the closest parent/grandparent).

Select records by parents and childs

I have a table in which some records are parents of others.
I have a parent column: a value of zero means the record is a parent. When the record is a child, the column holds the ID of the parent record.
What I need is to list the records in order, each parent followed by its children.
My table TOOLS:
ID | order | parent | name
100 | 1 | 0 | X
200 | 2 | 0 | Y
150 | 0 | 100 | X.1
300 | 0 | 200 | Y.1
I need the following result:
ID | order | parent | name
100 | 1 | 0 | X
150 | 0 | 100 | X.1
200 | 2 | 0 | Y
300 | 0 | 200 | Y.1
How can I order this?
If I use this query
select t.*
from t
order by (case when parent = 0 then id else parent end), order desc;
The result is this:
ID | order | parent | name
200 | 2 | 0 | Y
300 | 0 | 200 | Y.1
100 | 1 | 0 | X
150 | 0 | 100 | X.1
And if I change it to order asc, it puts the records with order = 0 at the top...
See this example.
Thank you
You need to construct a value that provides the desired order. One way to do that is to construct a materialized path that encodes your order criteria. Since you are using SQL Server you could use its hierarchyid data type for this; however, the real magic is in building the materialized path itself, of which I'll show two versions, one used with hierarchyid (path) and one without (path2):
with cte as (
-- Anchor Part
select id, [order], parent, name, 1 level
, cast('/'+cast(tools.id as varchar(10))+'/' as varchar(4000)) path
, cast(cast(tools.id as binary(4)) as varbinary(4000)) path2
from TOOLS
where parent = 0
union all
-- Recursive Part
select tools.*
, level+1
, cast(path+cast(tools.id as varchar(10))+'/' as varchar(4000))
, cast(path2+cast(tools.id as binary(4)) as varbinary(4000))
from tools
join cte
on tools.parent = cte.id
)
select id
, [order]
, parent
, name
, level
, path
, cast(path as hierarchyid) hid
, path2
from cte order by hid
In the above query, a recursive common table expression is used to walk the tree and build the two example paths. The first path, while it would work as a sort column on its own with your current IDs, would begin to fail as soon as you had IDs with a length other than three digits, as the levels' most significant digits would not necessarily align correctly. The cast to hierarchyid in the final query resolves that issue. The second path can be used directly as a sort key since it casts each level of the path to a four-byte binary value; as such it can handle IDs up to 4,294,967,295 and a path length (tree depth) of 1,000 levels, and does not need a final cast to any other data type.
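The materialized-path idea is independent of SQL Server, so a short Python sketch may help show why it sorts correctly. Here each row's sort key is the tuple of IDs from its root down to itself; comparing tuples of integers sidesteps the digit-alignment problem of the string path in the same spirit as the fixed-width binary path2. The function name sort_by_path and the tuple layout are assumptions for illustration.

```python
# (id, order, parent, name) rows from the question's TOOLS table
tools = [
    (100, 1, 0, "X"),
    (200, 2, 0, "Y"),
    (150, 0, 100, "X.1"),
    (300, 0, 200, "Y.1"),
]


def sort_by_path(rows):
    """Sort rows so that every parent is followed by its descendants."""
    by_id = {row[0]: row for row in rows}

    def path(row):
        # Collect IDs from the row up to its root (parent = 0), then reverse
        ids = [row[0]]
        while row[2] != 0:
            row = by_id[row[2]]
            ids.append(row[0])
        return tuple(reversed(ids))

    return sorted(rows, key=path)


for row in sort_by_path(tools):
    print(row)
```

A parent's path is a prefix of each child's path, and a shorter tuple sorts before any longer tuple that starts with it, so parents always come first within their subtree.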
THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
In this case, you can use order by:
select t.*
from t
order by (case when parent = 0 then id else parent end), order desc;
Here is a SQL Fiddle that demonstrates that this works on your original question.