SQL Server (terminal result) hierarchy map - sql

In SQL Server 2016, I have a table with the following chaining structure:
dbo.Item
OriginalItem  ItemID
------------  ------
NULL          7
1             2
NULL          1
5             6
3             4
NULL          8
NULL          5
9             11
2             3
EDIT NOTE: Bold numbers were added in response to @lemon's comments below.
Importantly, this example is a trivialized version of the real data, and the neatly ascending entries are not something that is present in the actual data; I'm just doing that to make the example easier to follow.
I've constructed a query to get what I'm calling the TerminalItemID, which in this example is ItemID 4, 6, 7, 8, and 11, and populated the results into a temporary table #TerminalItems, whose result set looks like this:
#TerminalItems
TerminalItemID
4
6
7
8
11
What I need is a final mapping table that would look something like this (using the above example; note that it also maps each terminal item, such as 4, 6, and 7, to itself, which the business logic requires):
#Mapping
ItemID  TerminalItemID
------  --------------
1       4
2       4
3       4
4       4
5       6
6       6
7       7
8       8
9       11
11      11
What I need help with is how to build this last #Mapping table. Any assistance in this direction is greatly appreciated!

This should do:
with MyTbl as (
    select *
    from (values
         (NULL, 1)
        ,(1, 2)
        ,(2, 3)
        ,(3, 4)
        ,(NULL, 5)
        ,(5, 6)
        ,(NULL, 7)
    ) T(OriginalItem, ItemID)
)
, TerminalItems as (
    /* Find all leaf level items: those not appearing under OriginalItem column */
    select LeafItem=ItemId, ImmediateOriginalItem=M.OriginalItem
    from MyTbl M
    where M.ItemId not in
        (select distinct OriginalItem
         from MyTbl AllParn
         where OriginalItem is not null
        )
), AllLevels as (
    /* Use a recursive CTE to find and report all parents */
    select ThisItem=LeafItem, ParentItem=ImmediateOriginalItem
    from TerminalItems
    union all
    select ThisItem=AL.ThisItem, M.OriginalItem
    from AllLevels AL
    inner join MyTbl M
        on M.ItemId=AL.ParentItem
)
select ItemId=coalesce(ParentItem,ThisItem), TerminalItemId=ThisItem
from AllLevels
order by 1,2
Beware of the MAXRECURSION setting: by default SQL Server stops a recursive CTE after 100 levels of recursion, which means the depth of your tree (the number of nodes between a terminal item and its ultimate original item) can be at most 100. You can raise the limit with OPTION (MAXRECURSION nnn), adjusting nnn as needed. Setting it to 0 removes the limit entirely, but this is not recommended because cyclic data would then cause an infinite loop.
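To see what the recursion limit protects against, here is a minimal Python sketch (not T-SQL) that follows each chain to its terminal item, with a step cap playing the role of MAXRECURSION. The names terminal_map and max_steps are made up for illustration, and the sketch assumes every item has at most one successor, as in the question's sample data.

```python
# Rows of dbo.Item from the question: (OriginalItem, ItemID).
rows = [(None, 7), (1, 2), (None, 1), (5, 6), (3, 4),
        (None, 8), (None, 5), (9, 11), (2, 3)]


def terminal_map(rows, max_steps=100):
    """Map every item to the last item of its chain.

    max_steps is a hypothetical cap that plays the role of MAXRECURSION:
    a chain needing more hops than that is reported instead of looping forever.
    """
    # Assumption: each OriginalItem has at most one successor (linear chains).
    next_item = {orig: item for orig, item in rows if orig is not None}
    # Include ids that only ever appear as OriginalItem (orphans such as 9).
    all_ids = {item for _, item in rows} | set(next_item)

    mapping = {}
    for start in sorted(all_ids):
        current, steps = start, 0
        while current in next_item:
            steps += 1
            if steps > max_steps:
                raise RuntimeError(f"chain from {start} exceeds {max_steps} steps")
            current = next_item[current]
        mapping[start] = current
    return mapping


print(terminal_map(rows))
```

With the sample rows this reproduces the #Mapping table from the question, including the row mapping 9 to 11. Cyclic data such as (1, 2), (2, 1) raises the error instead of looping, which is the situation MAXRECURSION 0 would leave unguarded.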

This is a typical gaps-and-islands problem and can also be carried out without recursion in three steps:
assign 1 at the beginning of each partition
compute a running sum over your flag value (generated at step 1)
extract the max "ItemID" on your partition (generated at step 2)
WITH cte1 AS (
SELECT *, CASE WHEN OriginalItem IS NULL THEN 1 ELSE 0 END AS changepartition
FROM Item
), cte2 AS (
SELECT *, SUM(changepartition) OVER(ORDER BY ItemID) AS parts
FROM cte1
)
SELECT ItemID, MAX(ItemID) OVER(PARTITION BY parts) AS TerminalItemID
FROM cte2
Assumption: when the rows are ordered by "ItemID", each terminal item is the "ItemID" value immediately preceding a row with a NULL "OriginalItem" value (or the last row).
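The three steps can be traced outside the database with a small Python sketch. islands_map is a made-up name, and the sketch inherits the answer's assumption that each chain occupies a contiguous run of ItemID values.

```python
from itertools import accumulate


def islands_map(rows):
    """rows are (OriginalItem, ItemID) pairs; returns {ItemID: TerminalItemID}."""
    ordered = sorted(rows, key=lambda r: r[1])                  # ORDER BY ItemID
    flags = [1 if orig is None else 0 for orig, _ in ordered]   # step 1: changepartition
    parts = list(accumulate(flags))                             # step 2: running sum
    max_per_part = {}
    for part, (_, item) in zip(parts, ordered):                 # step 3: MAX(ItemID) per partition
        max_per_part[part] = max(item, max_per_part.get(part, item))
    return {item: max_per_part[part] for part, (_, item) in zip(parts, ordered)}


neat = [(None, 1), (1, 2), (2, 3), (3, 4), (None, 5), (5, 6), (None, 7)]
print(islands_map(neat))

# An orphaned row (9 -> 11, with no row for item 9) lands in the partition opened by item 8.
with_orphan = neat + [(None, 8), (9, 11)]
print(islands_map(with_orphan)[8])
```

On the neat data every item maps to 4, 6, or 7 as expected. With the orphaned row, item 8 is reported as ending at 11, which shows how a missing row breaks the assumption.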
EDIT: "Fixing orphaned records."
The query works correctly when records are not orphaned. The only way to deal with orphaned records is to restore the missing records, so that the query can run on the full data.
This is carried out by an extra subquery (done at the beginning), that will apply a UNION ALL between:
the available records of the original table
the missing records
WITH fix_orphaned_records AS(
SELECT * FROM Item
UNION ALL
SELECT NULL AS OriginalItem,
i1.OriginalItem AS ItemID
FROM Item i1
LEFT JOIN Item i2 ON i1.OriginalItem = i2.ItemID
WHERE i1.OriginalItem IS NOT NULL AND i2.ItemID IS NULL
), cte AS (
...
Missing records correspond to "OriginalItem" values that are never found within the "ItemID" field. A self left join will uncover these missing records.

You can use a recursive CTE to compute the last item in the sequence. For example:
with
n (orig_id, curr_id, lvl) as (
select itemid, itemid, 1 from item
union all
select n.orig_id, i.itemid, n.lvl + 1
from n
join item i on i.originalitem = n.curr_id
)
select *
from (
select *, row_number() over(partition by orig_id order by lvl desc) as rn from n
) x
where rn = 1
Result:
orig_id curr_id lvl rn
-------- -------- ---- --
1 4 4 1
2 4 3 1
3 4 2 1
4 4 1 1
5 6 2 1
6 6 1 1
7 7 1 1

Related

Postgresql query to get sum of tree

Hi guys, I would like to ask about PostgreSQL: what would be the best query to get the sum of a column when you have a table of elements that have descendants at several levels, i.e.:
id id_parent value
1 null 3
2 null 4
3 1 2
4 2 3
5 3 4
6 3 2
7 4 5
8 4 7
so the result would be rows with the sum of their whole tree, as follows:
the values of ids 5 and 6 together are 6, plus the value of their parent makes 8, plus the value of his parent makes 11; the same goes for the items with ids 7 and 8, so the grandparent with id=2 would have the value 19
id id_parent value
1 null 11
2 null 19
thanks in advance
Use recursive CTEs:
with recursive cte as (
select t.id, t.value, t.id as ultimate_parent
from t
where id_parent is null
union all
select t.id, t.value, cte.ultimate_parent
from cte join
t
on t.id_parent = cte.id
)
select ultimate_parent, sum(value)
from cte
group by ultimate_parent;
The recursive part starts with the ultimate parents -- the records whose parent is NULL. It then brings in lower levels, step-by-step, keeping the id of the ultimate parent.
The final aggregation just sums the values together.
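As a cross-check on the expected totals, here is a minimal Python sketch using the question's sample rows. Note that it is a different route to the same numbers: the CTE walks down from the roots, while this sketch walks up from each row to its root. tree_sums is a made-up name.

```python
# (id, id_parent, value) rows from the question.
rows = [(1, None, 3), (2, None, 4), (3, 1, 2), (4, 2, 3),
        (5, 3, 4), (6, 3, 2), (7, 4, 5), (8, 4, 7)]


def tree_sums(rows):
    """Return {ultimate_parent_id: sum of value over the whole tree}."""
    parent = {row_id: parent_id for row_id, parent_id, _ in rows}
    totals = {}
    for row_id, _, value in rows:
        root = row_id
        while parent[root] is not None:   # climb until the row with a NULL parent
            root = parent[root]
        totals[root] = totals.get(root, 0) + value
    return totals


print(tree_sums(rows))
```

For the sample data this gives 11 for id 1 and 19 for id 2, matching the result the question asks for.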

Pre-order sorting of parents and children

Given the following data:
id | parent | sort
--------------------
1 | null | 0
2 | null | 1
3 | 1 | 0
4 | 1 | 1
5 | 3 | 0
6 | 5 | 0
7 | 2 | 0
How do I do a pre-order sort, meaning parents first, then children, then grandchildren, etc...?
The sorted result I'm looking for is: 1, 3, 5, 6, 4, 2, 7
If at all possible, I'd like to do this without using a CTE (or with a CTE I can understand). The way I'm doing it now is just selecting every record and checking "upwards" to see if there are any parents, grandparents and great-grandparents. It makes more sense to start from the records that don't have parents (top items) and keep going until there are no more children, right?
I just can't wrap my head around this...
This is an oversimplification of my actual query, but what I'm doing now is along the lines of:
SELECT ..some columns ..
FROM table t
LEFT JOIN table tparent ON tparent.ID = t.Parent
LEFT JOIN table tgrandparent ON tgrandparent.ID = tparent.Parent
LEFT JOIN table tgreatgrandparent ON tgreatgrandparent.ID = tgrandparent.Parent
This does use CTEs, but hopefully I can explain their usage:
;With ExistingQuery (id,parent,sort) as (
select 1,null,0 union all
select 2,null,1 union all
select 3,1 ,0 union all
select 4,1 ,1 union all
select 5,3 ,0 union all
select 6,5 ,0 union all
select 7,2 ,0
), Reord as (
select *,ROW_NUMBER() OVER (ORDER BY parent,sort) as rn from ExistingQuery
), hier as (
select id,parent,'/' + CONVERT(varchar(max),rn)+'/' as part
from Reord
union all
select h.id,r.parent,'/' + CONVERT(varchar(max),r.rn) + part
from hier h
inner join
Reord r
on
h.parent = r.id
)
select
e.*
from
hier h
inner join
ExistingQuery e
on
h.id = e.id
where
h.parent is null
order by CONVERT(hierarchyid,h.part)
ExistingQuery is just whatever you've currently got for your query. You should be able to just place your existing query in there (possibly with an expanded column list) and everything should just work.
Reord addresses a concern of mine, but it may not be needed: if the id values in your actual data are already in the right order, so that sort can be ignored, remove Reord and replace the references to rn with id. Otherwise, this CTE does the work of making sure that the children of each parent respect the sort column.
Finally, the hier CTE is the meat of this solution: for every row, it builds up a hierarchyid path for that row, starting from the child and working back up the tree until it hits the root.
And once the CTEs are done with, we join back to ExistingQuery so that we're just getting the data from there, but can use the hierarchyid to perform proper sorting - that type already knows how to correctly sort hierarchical data.
Result:
id parent sort
----------- ----------- -----------
1 NULL 0
3 1 0
5 3 0
6 5 0
4 1 1
2 NULL 1
7 2 0
And the result showing the part column from hier, which may help you see what that CTE constructed:
id parent sort part
----------- ----------- ----------- --------------
1 NULL 0 /1/
3 1 0 /1/3/
5 3 0 /1/3/6/
6 5 0 /1/3/6/7/
4 1 1 /1/4/
2 NULL 1 /2/
7 2 0 /2/5/
(You may also want to change the final SELECT to just SELECT * from hier to also get a feel for how that CTE works)
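The path idea can also be tried outside SQL. The following Python sketch builds, for each row, the sequence of (sort, id) pairs from the root down to that row and sorts on it. It uses (sort, id) pairs where the query uses rn and hierarchyid, which is a simplification chosen for this sketch; preorder is a made-up name.

```python
# (id, parent, sort) rows from the question.
rows = [(1, None, 0), (2, None, 1), (3, 1, 0), (4, 1, 1),
        (5, 3, 0), (6, 5, 0), (7, 2, 0)]


def preorder(rows):
    """Return the ids in pre-order: each parent first, then its subtree."""
    info = {row_id: (parent, sort) for row_id, parent, sort in rows}

    def path(row_id):
        # Walk from the row up to the root, collecting one key per level.
        keys = []
        while row_id is not None:
            parent, sort = info[row_id]
            keys.append((sort, row_id))
            row_id = parent
        return tuple(reversed(keys))      # root first, like '/1/3/6/'

    # Tuples compare level by level, and a prefix sorts before its extensions,
    # so a parent always comes before its descendants.
    return sorted(info, key=path)


print(preorder(rows))
```

For the sample data this produces 1, 3, 5, 6, 4, 2, 7, the order requested in the question.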
I finally dove into CTE and got it working, here is the base of the query if anyone else may come across it. It's important to note that sort is a padded string, starting at 0000000001 and counting upwards.
WITH recursive_CTE (id, parentId, sort)
AS
(
-- CTE ANCHOR SELECTS ROOTS --
SELECT t.ID AS id, t.Parent as parentId, t.sort AS sort
FROM table t
WHERE t.Parent IS NULL
UNION ALL
-- CTE RECURSIVE SELECTION --
SELECT t.ID AS id, t.Parent as parentId, cte.sort + t.sort AS sort
FROM table t
INNER JOIN recursive_CTE cte ON cte.id = t.Parent
)
SELECT * FROM recursive_CTE
ORDER BY sort
I believe this is the main part needed to make this kind of query work. It's actually pretty fast if you make sure you're hitting the necessary indices.
Sort is built up by expanding a string.
So a parent would have sort '0000000001', his direct child will have '00000000010000000001' and his grandchild will have '000000000100000000010000000001' etc. His sibling starts at '0000000002' and so comes after all the 01 records.
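Why the padding matters can be shown with a short Python sketch that compares padded and unpadded concatenated keys. The sort values below are invented for illustration, and sort_key and naive_key are made-up names.

```python
def sort_key(sort_values, width=10):
    """Concatenate zero-padded sort values from the root down to the row."""
    return "".join(str(v).zfill(width) for v in sort_values)


def naive_key(sort_values):
    """Same concatenation without padding, for comparison."""
    return "".join(str(v) for v in sort_values)


# Each entry lists the sort values along the path from a root to a row (hypothetical data).
paths = [[2], [1, 10], [1, 2], [1], [1, 1]]

padded_order = sorted(paths, key=sort_key)
naive_order = sorted(paths, key=naive_key)
print(padded_order)
print(naive_order)
```

With padding, the tenth child sorts after the second child of the same parent; without it, the string "110" sorts before "12" and the tenth child jumps ahead.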

Is there a way to update groups of rows with separate incrementing values in one query

Let's say you have the following table:
Id Index
1 3
1 1
2 1
3 3
1 5
what I would like to have is the following:
Id Index
1 0
1 1
2 0
3 0
1 2
As you might notice, the goal is for every row where Id is the same, to incrementally update the Index column, starting from zero.
Now, I know this is fairly simple with using cursors, but out of curiosity is there a way to do this with single UPDATE query, somehow combining with temp tables, common table expressions or something similar?
Yes, assuming that you don't really care about the order of the new index values. SQL Server offers updatable CTEs and window functions that do exactly what you want:
with toupdate as (
select t.*, row_number() over (partition by id order by (select NULL)) - 1 as newindex
from table t
)
update toupdate
set [index] = newindex;
If you want them in a specific order, then you need another column to specify the ordering. The existing index column doesn't work.
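What the per-group numbering does can be sketched in a few lines of Python. One caveat: the sketch numbers rows in list order, whereas with ORDER BY (SELECT NULL) SQL Server is free to pick any order within each Id. renumber is a made-up name.

```python
from collections import defaultdict


def renumber(rows):
    """Give each (Id, Index) row a zero-based counter within its Id."""
    counters = defaultdict(int)
    result = []
    for row_id, _old_index in rows:
        result.append((row_id, counters[row_id]))   # ROW_NUMBER() - 1 for this Id
        counters[row_id] += 1
    return result


rows = [(1, 3), (1, 1), (2, 1), (3, 3), (1, 5)]
print(renumber(rows))
```

On the sample rows this yields the Index values 0, 1, 0, 0, 2, matching the desired table in the question.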
With ROW_NUMBER() - 1 and a CTE you can write it as:
CREATE TABLE #temp1(
Id int,
[Index] int)
INSERT INTO #temp1 VALUES (1,3),(1,1),(2,1),(3,3),(1,5);
--select * from #temp1;
With CTE as
(
select t.*, row_number() over (partition by id order by (select null))-1 as newindex
from #temp1 t
)
Update CTE
set [Index] = newindex;
select * from #temp1;
I'm not sure why you would want to do this really, but I had fun figuring it out!
This solution relies on your table having a primary key for the self join, but you could always create an auto-increment column if none exists and this is a one-off job. This also has the added benefit of getting you to think about the precise ordering you want, as currently there is no way of saying in which order the rows for each [ID] will get their [Index].
UPDATE a
SET [Index] = b.newIndex
FROM dbo.Example a
INNER JOIN (
    select
        z.ID,
        z.[Index],
        -- subtract 1 so that numbering starts at zero
        row_number() over (partition by z.ID order by (select NULL)) - 1 as newIndex
    from dbo.Example z
) b ON a.ID = b.ID AND a.[Index] = b.[Index] --Is this a unique self join for your table?.. no PK provided. You might need to make an index first.
Probably, this is what you want
SELECT *,RANK() OVER(PARTITION BY Id ORDER BY [Index])-1 AS NewIndex FROM
(
SELECT 1 AS Id,3 [Index]
UNION
SELECT 1,1
UNION
SELECT 2,1
UNION
SELECT 3,3
UNION
SELECT 1,5
) AS T
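For the sample data this yields NewIndex 0, 1, 2 for Id 1 (ordered by Index) and 0 for Ids 2 and 3.
If you want to update the table, run this script (a window function cannot be used directly in a SET clause, so it goes through an updatable CTE):
WITH Ranked AS (
SELECT [Index], RANK() OVER(PARTITION BY Id ORDER BY [Index])-1 AS NewIndex FROM tblname
)
UPDATE Ranked SET [Index] = NewIndex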
In case I am missing something or any further assistance is required please let me know.
CREATE TABLE #temp1(
Id int,
Value int)
INSERT INTO #temp1 VALUES (1,2),(1,3),(2,3),(4,5)
SELECT
Id
,Value
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) AS [Count]
FROM #temp1
Start with this :)
Gave me results like
Id Value Count
1 2 1
1 3 2
1 2 3
1 3 4
1 2 5
1 3 6
1 2 7
1 3 8
2 3 1
2 4 2
2 5 3
2 3 4
2 4 5
2 5 6
2 4 7
2 5 8
2 3 9
2 3 10
3 4 1
4 5 1
4 5 2
4 5 3
4 5 4

SQL - Add value with previous row only

I have a table named myvals with the following fields:
ID number
-- -------
1 7
2 3
3 4
4 0
5 9
Starting on 2nd row, I would like to add the number with the previous row number. So, my end result would look like this
ID number
-- ------
1 7
2 10
3 7
4 4
5 9
You could use the LAG analytic function
SELECT ID, number + LAG(number, 1, 0) OVER (ORDER BY ID) AS number FROM myvals
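The effect of LAG with an offset of 1 and a default of 0 can be checked with a tiny Python sketch. add_previous is a made-up name, and the list is assumed to be already ordered by ID.

```python
def add_previous(numbers, default=0):
    """Add to each value the value of the previous row (default for the first row)."""
    previous = [default] + numbers[:-1]          # what LAG(number, 1, 0) returns per row
    return [n + p for n, p in zip(numbers, previous)]


numbers = [7, 3, 4, 0, 9]                        # the number column of myvals, ordered by ID
print(add_previous(numbers))
```

For the sample values this gives 7, 10, 7, 4, 9, the result requested in the question.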
First things first: you can't add to NULL, so ID 1 must have a value.
create table #temp
(
month_type datetime,
value int
)
insert into #temp
Select '2015/01/01',1
union
Select '2015/02/01',2
union
Select '2015/03/01',3
union
Select '2015/04/01',4
SELECT t.value,t1.value,(t.value+t1.value)/2 FROM #temp t1
left join #temp t on t.month_type=Dateadd(MONTH,-1,t1.month_type)

SQL-Query - finding pattern of another table

I have a table with colors:
COLORS
idColor Name
------- ------
4 Yellow
5 Green
6 Red
And I have another table with data:
PRODUCTS
idProduct idCategory idColor
--------- ---------- -------
1 1 4
2 1 5
3 1 6
4 2 10
5 2 11
6 2 12
7 3 4
8 3 5
9 3 8
10 4 4
11 4 5
12 4 6
13 5 4
14 6 4
15 6 5
I want to return the idCategory values from Products whose rows match the idColor values from the Colors table (4, 5, 6) exactly: the category must have exactly 3 rows, and their idColor values must be 4, 5, and 6.
For this example, The query should return:
IdCategory
----------
1
4
Try this:
SELECT idCategory
FROM PRODUCTS
GROUP BY idCategory
HAVING COUNT(*) = 3
AND COUNT(DISTINCT CASE WHEN idColor IN (4,5,6) THEN idColor END) = 3
UPDATED
If you want to dynamically filter the results depending on the values of the table COLORS:
SELECT idCategory
FROM PRODUCTS P
LEFT JOIN (SELECT idColor, COUNT(*) OVER() TotalColors
FROM COLORS) C
ON P.idColor = C.idColor
GROUP BY idCategory
HAVING COUNT(*) = MIN(C.TotalColors)
AND COUNT(DISTINCT C.idColor) = MIN(C.TotalColors)
You can use aggregates to make sure it has all 3 colors, and also to make sure it DOESN'T have any other colors. Something like this:
SELECT *
FROM
(
SELECT idCategory
, SUM(CASE WHEN idColor IN (4, 5, 6) THEN 1 ELSE 0 END) AS GoodColors
, SUM(CASE WHEN idColor NOT IN (4, 5, 6) THEN 1 ELSE 0 END) AS BadColors
FROM Products
GROUP BY idCategory
) t0
WHERE GoodColors = 3 AND BadColors = 0
Note, if the 4, 5, 6 is found more than once per idCategory then a different technique must be employed. But from your example, it doesn't appear that way.
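The matching rule (all wanted colors, nothing else, no repeats) can be stated compactly in Python. This is only a sketch of the rule, not a replacement for the query; matching_categories is a made-up name. It compares sorted color lists, so a category in which a wanted color appears twice does not match.

```python
from collections import defaultdict

# (idProduct, idCategory, idColor) rows from the question.
products = [(1, 1, 4), (2, 1, 5), (3, 1, 6), (4, 2, 10), (5, 2, 11),
            (6, 2, 12), (7, 3, 4), (8, 3, 5), (9, 3, 8), (10, 4, 4),
            (11, 4, 5), (12, 4, 6), (13, 5, 4), (14, 6, 4), (15, 6, 5)]


def matching_categories(products, wanted):
    """Return the categories whose colors are exactly the wanted colors."""
    by_category = defaultdict(list)
    for _product, category, color in products:
        by_category[category].append(color)
    target = sorted(wanted)
    return sorted(cat for cat, colors in by_category.items()
                  if sorted(colors) == target)


print(matching_categories(products, [4, 5, 6]))
```

For the sample data this returns categories 1 and 4, the result expected in the question.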
I am guessing that you would like to perform this task based on data in a table, rather than hardcoding the values 4, 5, and 6 (like in some of the answers given). To that end, in my solution I created a dbo.ColorSets table that you can fill with as many different sets of colors as you want, then run the query and see all the product Categories that match those color Sets. The reason I didn't just use your dbo.Color table is that it appeared to be the lookup table, complete with color names, so it didn't seem like the right one to be picking out a particular set of colors rather than the entire list possible.
I used a technique that will maintain good performance even on huge amounts of data, as compared to other query methods that use aggregates exclusively. No matter what method one uses, this task will pretty much always require a scan of the entire Products table because you can't compare all the rows without, well, comparing all the rows. But the JOIN is on indexable columns and is only for the candidates that have a very good chance of being proper matches, so the amount of work required is greatly reduced.
Here's what the ColorSets table looks like:
CREATE TABLE dbo.ColorSets (
idSet int NOT NULL,
idColor int NOT NULL,
CONSTRAINT PK_ColorSet PRIMARY KEY CLUSTERED (idSet, idColor)
);
INSERT dbo.ColorSets
VALUES
(1, 4),
(1, 5),
(1, 6), -- your color set: yellow, green, and red
(2, 4),
(2, 5),
(2, 8) -- an additional color set: yellow, green, and purple
;
And the query:
WITH Sets AS (
SELECT
idSet,
Grp = Checksum_Agg(idColor)
FROM
dbo.ColorSets
GROUP BY
idSet
), Categories AS (
SELECT
idCategory,
Grp = Checksum_Agg(idColor)
FROM
dbo.Products
GROUP BY
idCategory
)
SELECT
S.idSet,
C.idCategory
FROM
Sets S
INNER JOIN Categories C
ON S.Grp = C.Grp
WHERE
NOT EXISTS (
SELECT *
FROM
(
SELECT *
FROM dbo.ColorSets CS
WHERE CS.idSet = S.idSet
) CS
FULL JOIN (
SELECT *
FROM dbo.Products P
WHERE P.idCategory = C.idCategory
) P
ON CS.idColor = P.idColor
WHERE
CS.idColor IS NULL
OR P.idColor IS NULL
)
;
Result:
idSet idCategory
1 1
2 3
1 4
If I understand your question, this should do it
select distinct idCategory
from Products
where idColor in (4,5,6)