Querying Parent-Child relationship in a consecutive way - sql

I'm trying to write an import tool to convert my database from one schema to another.
I've come across a table that uses a parent-child relationship (primary key ID, foreign key ParentID), and I want to select all records so that every parent comes before its children.
The risk is that I might try to import a child row whose parent has not been imported yet. Such a row would fail to import, which I want to avoid.
The query I have so far is as follows:
SELECT * FROM Table a INNER JOIN Table b ON (b.ParentID=a.ID and a.ID= b.ParentID)
Unfortunately that doesn't work (it doesn't return all the records in the table). I need a query that returns every row in the table, ordered so that parents precede their children, which I can then simply loop over to import.
Can someone point me in the right direction?

What you're looking for is a recursive common table expression (CTE), which is documented at this link:
http://technet.microsoft.com/en-us/library/ms186243%28v=sql.105%29.aspx
You can use the Depth column it computes to tell your downstream ETL the order in which rows should be loaded: all rows with depth 1 go first, then depth 2, and so on.
DECLARE @Table TABLE (
ID INT,
ParentId INT)
INSERT INTO @Table
VALUES
(1, 0),
(2, 1),
(3, 1),
(4, 0),
(5, 4),
(6, 4),
(7, 1),
(8, 7)
;WITH cte_Recursive AS (
-- Anchor query: selects the top-level records
SELECT ID, ParentId, 1 [Depth]
FROM @Table
WHERE ParentId = 0
UNION ALL
-- Recursive member: children of rows already found, one level deeper
SELECT T.ID, T.ParentId
,R.Depth + 1 [Depth]
FROM @Table T
INNER JOIN cte_Recursive R ON R.ID = T.ParentId
)
SELECT *
FROM cte_Recursive
-- Sorting by depth guarantees that every parent is listed before its children
ORDER BY Depth, ID
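To check the idea outside SQL Server, here is a minimal sketch that runs the same recursive CTE against an in-memory SQLite database from Python. The table name t, the function name load_order, and the use of SQLite are assumptions made for illustration; the sample rows are the ones from the answer. Sorting by depth is what ensures a parent is always emitted before its children.

```python
import sqlite3

def load_order(rows):
    """Return (id, parent_id, depth) tuples ordered so parents precede children."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE r(id, parent_id, depth) AS (
            -- anchor: top-level rows (parent_id = 0, as in the sample data)
            SELECT id, parent_id, 1 FROM t WHERE parent_id = 0
            UNION ALL
            -- recursive member: children of rows already found
            SELECT t.id, t.parent_id, r.depth + 1
            FROM t JOIN r ON r.id = t.parent_id
        )
        SELECT id, parent_id, depth FROM r ORDER BY depth, id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 4), (6, 4), (7, 1), (8, 7)]
for row in load_order(sample):
    print(row)
```

Looping over the result in this order imports ids 1 and 4 first, then their children, then id 8. Rows whose parent chain never reaches a top-level row are not returned at all, so comparing the row count against the table is a cheap sanity check.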

Related

Selecting X amount of rows from one table depending on value of column from another joined table

I am trying to join several tables. To simplify the situation, there is a table called Boxes which has a foreign key column for another table, Requests. This means that with a simple join I can get all the boxes that can be used to fulfill a request. But the Requests table also has a column called BoxCount which limits the number of boxes that is needed.
Is there a way to structure the query in such a way that when I join the two tables, I will only get the number of rows from Boxes that is specified in the BoxCount column of the given Request, rather than all of the rows from Boxes that have a matching foreign key?
Script to initialize sample data:
CREATE TABLE Requests (
Id int NOT NULL PRIMARY KEY,
BoxCount Int NOT NULL);
CREATE TABLE Boxes (
Id int NOT NULL PRIMARY KEY,
Label varchar,
RequestId INT FOREIGN KEY REFERENCES Requests(Id));
INSERT INTO Requests (Id, BoxCount)
VALUES
(1, 2),
(2, 3);
INSERT INTO Boxes (Id, Label, RequestId)
VALUES
(1, 'A', 1),
(2, 'B', 1),
(3, 'C', 1),
(4, 'D', 2),
(5, 'E', 2),
(6, 'F', 2),
(7, 'G', 2);
So, for example, when the hypothetical query is run, it should return boxes A and B (because the first Request only needs 2 boxes), but not C. Similarly, it should include boxes D, E and F, but not box G, because the second request only requires 3 boxes.
Here is another approach using ROW_NUMBER, a common and useful technique that every SQL writer should master. The idea is to number the boxes sequentially within each request and compare that number to the request's box count when filtering.
with boxord as (select *,
ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY Id) as rno
from dbo.Boxes
)
select req.*, boxord.Label, boxord.rno
from dbo.Requests as req inner join boxord on req.Id = boxord.RequestId
where req.BoxCount >= boxord.rno
order by req.Id, boxord.rno
;
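As a quick way to verify the ROW_NUMBER approach against the sample data from the question, here is a minimal sketch using an in-memory SQLite database from Python. The function name boxes_for_requests is hypothetical, and SQLite (3.25 or later, for window functions) stands in for SQL Server, so the dbo. schema prefix is dropped.

```python
import sqlite3

def boxes_for_requests():
    """Return (request id, box label, row number), limited by each request's BoxCount."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Requests (Id INTEGER PRIMARY KEY, BoxCount INTEGER NOT NULL);
        CREATE TABLE Boxes (Id INTEGER PRIMARY KEY, Label TEXT,
                            RequestId INTEGER REFERENCES Requests(Id));
        INSERT INTO Requests VALUES (1, 2), (2, 3);
        INSERT INTO Boxes VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                                 (4, 'D', 2), (5, 'E', 2), (6, 'F', 2), (7, 'G', 2);
    """)
    query = """
        WITH boxord AS (
            -- number the boxes 1, 2, 3, ... within each request
            SELECT *, ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY Id) AS rno
            FROM Boxes
        )
        SELECT req.Id, boxord.Label, boxord.rno
        FROM Requests AS req
        JOIN boxord ON req.Id = boxord.RequestId
        WHERE req.BoxCount >= boxord.rno   -- keep only the first BoxCount boxes
        ORDER BY req.Id, boxord.rno
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in boxes_for_requests():
    print(row)
```

Request 1 keeps boxes A and B, and request 2 keeps D, E and F, which matches the outcome described in the question.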
The INNER JOIN keyword selects records that have matching values in both tables
SELECT (cols) FROM Boxes
INNER JOIN Request on Boxes.(FK_column) = request.id
WHERE Request.BoxCount = (value)
select r.id,
r.boxcount,
b.id,
b.label
from requests r
cross apply (
select top (r.BoxCount)
id, label
from boxes
where requestid = r.id
order by id
) b;

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write a SQL query that produces the output attached to this question, along with the sample data.
There are two tables: one with distinct IDs (PK), each with a current flag,
and another with an Active ID and an Inactive ID (both FKs to the PK of the first table).
The final output should have two columns: the first contains all distinct IDs from the first table, and the second contains the Active ID from the second table.
Below is the SQL:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an ID was once active and then became inactive.
Please note:
the active ID returned should be the most recent active ID
an ID that doesn't have any active ID should return either null or the ID itself
for an ID where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, with two IDs 6 and 7, when 6 is active 7 is inactive, and vice versa. The only way to know the most current active state is by the update date
The attached sample might make this easier to understand.
It looks like I might have to use a recursive CTE to achieve these results. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
declare @tb_id table (id bigint, [current] bit);
declare @tb_merges table (active_id bigint, inactive_id bigint, update_dt datetime2);
insert @tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert @tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
@tb_merges M
inner join @tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
@tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6

SQL Hierarchy - Resolve full path for all ancestors of a given node

I have a hierarchy described by an adjacency list. There is not necessarily a single root element, but I do have data to identify the leaf (terminal) items in the hierarchy. So, a hierarchy that looked like this ...
1
- 2
- - 4
- - - 7
- 3
- - 5
- - 6
8
- 9
... would be described by a table, like this. NOTE: I don't have the ability to change this format.
id parentid isleaf
--- -------- ------
1 null 0
2 1 0
3 1 0
4 2 0
5 3 1
6 3 1
7 4 1
8 null 0
9 8 1
here is the sample table definition and data:
CREATE TABLE [dbo].[HiearchyTest](
[id] [int] NOT NULL,
[parentid] [int] NULL,
[isleaf] [bit] NOT NULL
)
GO
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (1, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (2, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (3, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (4, 2, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (5, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (6, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (7, 4, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (8, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (9, 8, 1)
GO
From this, I need to provide any id and get a list of all its ancestors, each paired with all of its descendants along that path. So, if I provided the input of id = 6, I would expect the following:
id descendentid
-- ------------
1 1
1 3
1 6
3 3
3 6
6 6
id 6 just has itself
its parent, id 3, would have descendants of 3 and 6
its parent, id 1, would have descendants of 1, 3, and 6
I will be using this data to provide roll-up calculations at each level in the hierarchy. This works well, assuming I can get the dataset above.
I have accomplished this using two recursive CTEs: one to get the "terminal" item for each node in the hierarchy, and a second one where I get the full ancestry of my selected node (so 6 resolves to 6, 3, 1) and walk up to get the full set. I'm hoping that I'm missing something and that this can be accomplished in one round. Here is the example double-recursion code:
declare @test int = 6;
with cte as (
-- leaf nodes
select id, parentid, id as terminalid
from HiearchyTest
where isleaf = 1
union all
-- walk up - preserve "terminal" item for all levels
select h.id, h.parentid, c.terminalid
from HiearchyTest as h
inner join
cte as c on h.id = c.parentid
)
, cte2 as (
-- get all ancestors of our test value
select id, parentid, id as descendentid
from cte
where terminalid = @test
union all
-- and walkup from each to complete the set
select h.id, h.parentid, c.descendentid
from HiearchyTest h
inner join cte2 as c on h.id = c.parentid
)
-- final selection - order by is just for readability of this example
select id, descendentid
from cte2
order by id, descendentid
Additional detail: the "real" hierarchy will be much larger than the example. It can technically have infinite depth, but realistically it would rarely go more than 10 levels deep.
In summary, my question is if I can accomplish this with a single recursive cte instead of having to recurse over the hierarchy twice.
Because your data is a tree structure, we can use the hierarchyid data type to meet your needs (despite your saying that you can't in the comments). First, the easy part - generating the hierarchyid with a recursive cte
with cte as (
select id, parentid,
cast(concat('/', id, '/') as varchar(max)) as [path]
from [dbo].[HiearchyTest]
where ParentID is null
union all
select child.id, child.parentid,
cast(concat(parent.[path], child.id, '/') as varchar(max))
from [dbo].[HiearchyTest] as child
join cte as parent
on child.parentid = parent.id
)
select id, parentid, cast([path] as hierarchyid) as [path]
into h
from cte;
Next, a little table-valued function I wrote:
-- Relies on a numbers (tally) table dbo.Numbers with an integer column n, which must already exist.
create function dbo.GetAllAncestors(@h hierarchyid, @ReturnSelf bit)
returns table
as return
select @h.GetAncestor(n.n) as h
from dbo.Numbers as n
where n.n <= @h.GetLevel()
or (@ReturnSelf = 1 and n.n = 0)
union all
select @h
where @ReturnSelf = 1;
Armed with that, getting your desired result set isn't too bad:
declare @h hierarchyid;
set @h = (
select [path]
from h
where id = 6
);
with cte as (
select *
from h
where [path].IsDescendantOf(@h) = 1
or @h.IsDescendantOf([path]) = 1
)
select h.id as parent, c.id as descendentid
from cte as c
cross apply dbo.GetAllAncestors([path], 1) as a
join h
on a.h = h.[path]
order by h.id, c.id;
Of course, you're missing out on a lot of the benefit of using a hierarchyid by not persisting it (you'll either have to keep it up to date in the side table or generate it every time). But there you go.
Okay, this has been bothering me since I read the question, and I just came back to think about it again. Why do you need to recurse back down to get all of the descendants? You have asked for ancestors, not descendants, and your result set is not trying to get other siblings, grandchildren, etc. It is getting a parent and a grandparent in this case. Your first CTE gives you everything you need to know except when an ancestor id is also the parentid. So with a union and a little magic to set up the originating ancestor, you have everything you need without a second recursion.
declare @test int = 6;
with cte as (
-- leaf nodes
select id, parentid, id as terminalid
from HiearchyTest
where isleaf = 1
union all
-- walk up - preserve "terminal" item for all levels
select h.id, h.parentid, c.terminalid
from HiearchyTest as h
inner join
cte as c on h.id = c.parentid
)
, cteAncestors AS (
SELECT DISTINCT
id = IIF(parentid IS NULL, @test, id)
,parentid = IIF(parentid IS NULL,id,parentid)
FROM
cte
WHERE
terminalid = @test
UNION
SELECT DISTINCT
id
,parentid = id
FROM
cte
WHERE
terminalid = @test
)
SELECT
id = parentid
,DecendentId = id
FROM
cteAncestors
ORDER BY
id
,DecendentId
The result set from your first CTE gives you your two ancestors and self, each related to its ancestor, except in the case of the originating ancestor, whose parentid is null. That null is a special case I will deal with in a minute.
Remember that at this point your query is producing ancestors, not descendants. What it doesn't give you is self-references, meaning grandparent = grandparent, parent = parent, self = self. All you have to do to get those is add a row for every id with the parentid equal to the id, hence the union. Now your result set is almost completely shaped up.
Now for the special case of the null parentid. The null parentid identifies the originating ancestor, meaning that ancestor has no other ancestor in your dataset, and you can use that to your advantage. Because you started your initial recursion at the leaf level, there is no direct tie from the id you started with to the originating ancestor, but there is at every other level. Simply hijack that null parentid and flip the values around, and you now have an ancestor for your leaf.
Then, in the end, if you want it to be a descendants table, switch the columns and you are finished. One last note: the DISTINCTs are there in case the id is repeated with an additional parentid, e.g. 6 | 3 and another record for 6 | 4.
I'm not sure if this performs better, or even produces the proper results in all cases, but you could capture a node list, then use xml functionality to parse it out and cross apply to the id list:
declare @test int = 6;
;WITH cte AS (SELECT id, parentid, CAST(id AS VARCHAR(MAX)) as IDlist
FROM HiearchyTest
WHERE isleaf = 1
UNION ALL
SELECT h.id, h.parentid , CAST(CONCAT(c.IDlist,',',h.id) AS VARCHAR(MAX))
FROM HiearchyTest as h
JOIN cte as c
ON h.id = c.parentid
)
,cte2 AS (SELECT *, CAST ('<M>' + REPLACE(IDlist, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM cte
WHERE IDlist LIKE '%'+CAST(@test AS VARCHAR(50))+'%'
)
SELECT id,Split.a.value('.', 'VARCHAR(100)') AS descendentid
FROM cte2 a
CROSS APPLY Data.nodes ('/M') AS Split(a);
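One more possibility for the single-recursion goal, offered as a sketch rather than a tested production query: walk up from the chosen node once, recording how many levels each ancestor sits above it, then self-join that short list so each ancestor is paired with every node at or below it on the path. The level column lvl, the CTE name up, and the function name ancestor_pairs are made up for this example, and it runs on in-memory SQLite from Python rather than SQL Server. The table name and rows are the ones from the question.

```python
import sqlite3

ROWS = [(1, None, 0), (2, 1, 0), (3, 1, 0), (4, 2, 0), (5, 3, 1),
        (6, 3, 1), (7, 4, 1), (8, None, 0), (9, 8, 1)]

def ancestor_pairs(node_id):
    """Return (id, descendentid) pairs for node_id and all of its ancestors."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE HiearchyTest (id INTEGER, parentid INTEGER, isleaf INTEGER)")
    con.executemany("INSERT INTO HiearchyTest VALUES (?, ?, ?)", ROWS)
    query = """
        WITH RECURSIVE up(id, parentid, lvl) AS (
            -- start at the requested node (level 0)
            SELECT id, parentid, 0 FROM HiearchyTest WHERE id = ?
            UNION ALL
            -- walk up one parent at a time
            SELECT h.id, h.parentid, up.lvl + 1
            FROM HiearchyTest AS h JOIN up ON h.id = up.parentid
        )
        -- pair each ancestor with every path node at the same level or below it
        SELECT a.id, d.id AS descendentid
        FROM up AS a JOIN up AS d ON d.lvl <= a.lvl
        ORDER BY a.id, d.id
    """
    result = con.execute(query, (node_id,)).fetchall()
    con.close()
    return result

for pair in ancestor_pairs(6):
    print(pair)
```

For id = 6 this yields the six rows listed in the question. The recursion only touches the path from the node to its root, so the self-join is over at most a handful of rows at the depths mentioned, although an engine may still evaluate the CTE once per reference.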

Ordering parent rows by date descending with child rows ordered independently beneath each

This is a contrived version of my table schema to illustrate my problem:
QuoteID, Details, DateCreated, ModelQuoteID
Where QuoteID is the primary key and ModelQuoteID is a nullable foreign key back onto this table to represent a quote which has been modelled off another quote (and may have subsequently had its Details column etc changed).
I need to return a list of quotes ordered by DateCreated descending with the exception of modelled quotes, which should sit beneath their parent quote, ordered by date descending within any other sibling quotes (quotes can only be modelled one level deep).
So for example if I have these 5 quote rows:
1, 'Fix the roof', '01/01/2012', null
2, 'Clean the drains', '02/02/2012', null
3, 'Fix the roof and door', '03/03/2012', 1
4, 'Fix the roof, door and window', '04/04/2012', 1
5, 'Mow the lawn', '05/05/2012', null
Then I need to get the results back in this order:
5 - Mow the lawn
2 - Clean the drains
1 - Fix the roof
4 - -> Fix the roof, door and window
3 - -> Fix the roof and door
I'm also passing in search criteria such as keywords for Details, and I'm returning modelled quotes even if they don't contain the search term but their parent quote does. I've got that part working using a common table expression to get the original quotes, unioned with a join for modelled ones.
That works nicely but currently I'm having to do the rearrangement of the modelled quotes into the correct order in code. That's not ideal because my next step is to implement paging in the SQL, and if the rows are not grouped properly at that time then I won't have the children present in the current page to do the re-ordering in code. Generally speaking they will be naturally grouped together anyway, but not always. You could create a model quote today for a quote from a month back.
I've spent quite some time on this, can any SQL gurus help? Much appreciated.
EDIT: Here is a contrived version of my SQL to fit my contrived example :-)
;with originals as (
select
q.*
from
Quote q
where
Details like @details
)
select
*
from
(
select
o.*
from
originals o
union
select
q2.*
from
Quote q2
join
originals o on q2.ModelQuoteID = o.QuoteID
)
as combined
order by
combined.DateCreated desc
Watching the Olympics -- just skimmed your post -- looks like you want to control the sort at each level (root and one level in), and make sure the data is returned with the children directly beneath its parent (so you can page the data...). We do this all the time. You can add an order by to each inner query and create a sort column. I contrived a slightly different example that should be easy for you to apply to your circumstance. I sorted the root ascending and level one descending just to illustrate how you can control each part.
declare @tbl table (id int, parent int, name varchar(10))
insert into @tbl (id, parent, name)
values (1, null, 'def'), (2, 1, 'this'), (3, 1, 'is'), (4, 1, 'a'), (5, 1, 'test'),
(6, null, 'abc'), (7, 6, 'this'), (8, 6, 'is'), (9, 6, 'another'), (10, 6, 'test')
;with cte (id, parent, name, sort) as (
select id, parent, name, cast(right('0000' + cast(row_number() over (order by name) as varchar(4)), 4) as varchar(1024))
from @tbl
where parent is null
union all
select t.id, t.parent, t.name, cast(cte.sort + right('0000' + cast(row_number() over (order by t.name desc) as varchar(4)), 4) as varchar(1024))
from @tbl t inner join cte on t.parent = cte.id
)
select * from cte
order by sort
This produces these results:
id parent name sort
---- -------- ------- ----------
6 NULL abc 0001
7 6 this 00010001
10 6 test 00010002
8 6 is 00010003
9 6 another 00010004
1 NULL def 0002
2 1 this 00020001
5 1 test 00020002
3 1 is 00020003
4 1 a 00020004
You can see that the root nodes are sorted ascending and the inner nodes are sorted descending.
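Because the quotes in the question are only ever modelled one level deep, the same grouping can also be had without recursion: join each row to its parent (if any) and sort on the parent's date first. The sketch below tries this on the five sample quotes using in-memory SQLite from Python. The ISO date strings, the function name ordered_quotes, and the use of SQLite are assumptions for illustration.

```python
import sqlite3

def ordered_quotes():
    """Return QuoteIDs with parents newest-first and children under their parent."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Quote (QuoteID INTEGER PRIMARY KEY, Details TEXT,
                            DateCreated TEXT, ModelQuoteID INTEGER);
        INSERT INTO Quote VALUES
            (1, 'Fix the roof', '2012-01-01', NULL),
            (2, 'Clean the drains', '2012-02-02', NULL),
            (3, 'Fix the roof and door', '2012-03-03', 1),
            (4, 'Fix the roof, door and window', '2012-04-04', 1),
            (5, 'Mow the lawn', '2012-05-05', NULL);
    """)
    query = """
        SELECT q.QuoteID
        FROM Quote AS q
        LEFT JOIN Quote AS p ON p.QuoteID = q.ModelQuoteID
        ORDER BY COALESCE(p.DateCreated, q.DateCreated) DESC,  -- date of the parent quote
                 COALESCE(p.QuoteID, q.QuoteID),               -- keep each family together
                 q.ModelQuoteID IS NOT NULL,                   -- parent row before its children
                 q.DateCreated DESC                            -- newest child first
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result

print(ordered_quotes())
```

The returned order is 5, 2, 1, 4, 3, matching the listing in the question. Since the grouping lives entirely in the ORDER BY, paging can be applied on top of it without splitting a parent from its children arbitrarily, other than at page boundaries.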

What is an efficient SQL query to join to a double maximum in a sub query?

Sorry the title is crap - I could not think of a better way to phrase it. This is my data structure:
Widget WidgetTransition
------ ----------------
Id, WidgetId,
... TransitionTypeId,
Cost,
...
I want a query that will give me a list of details for each widget along with the details of the most expensive (max WidgetTransition.Cost) transition. If a widget has two or more transitions that are 'tied' for the most expensive transition, the details of the most recent transition (max WidgetTransition.WidgetId) should be used. If the widget has no transitions, it should not appear in the results. This is the best that I could come up with:
SELECT *
FROM Widget
JOIN WidgetTransition
ON WidgetTransition.WidgetId = Widget.Id
AND WidgetTransition.Cost = (
SELECT Max(MostExpensiveTransition.Cost)
FROM WidgetTransition MostExpensiveTransition
WHERE MostExpensiveTransition.WidgetId = Widget.Id
)
This almost works, but has two problems.
Doesn't deal with tied transitions properly. If a widget has two or more tied transitions, each transition will appear in the results, instead of the most recent.
With large data sets, the query is sloooow. The Sybase database that I am running it on will do two table scans (WidgetTransition.Cost is not in the index) on WidgetTransition for each widget. Presumably one is for the join and one to find the max cost.
Is there a better way to write this query that fixes the tied problem and/or runs more efficiently? I want to avoid using T-SQL or a stored procedure.
If you are using a database product that supports ranking functions and common table expressions such as SQL Server 2005 and later (or recent versions of Sybase), you can do something like:
With Data As
(
Select WidgetId, TransitionTypeId
, Row_Number() Over ( Partition By WidgetId
Order By Cost Desc, WidgetId Desc ) As Rnk
From WidgetTransition
)
Select ...
From Widget As W
Join Data As D
On D.WidgetId = W.Id
Where D.Rnk = 1
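To see the ranking approach pick a single row per widget, here is a minimal sketch on in-memory SQLite (3.25 or later) from Python. It assumes ties on Cost are broken by the highest TransitionTypeId, which is a guess at what "most recent" means; substitute whichever column actually identifies recency. The function name top_transitions and the sample rows are illustrative.

```python
import sqlite3

def top_transitions():
    """Return (widget id, transition type, cost) for each widget's costliest transition."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Widget (Id INTEGER PRIMARY KEY);
        CREATE TABLE WidgetTransition (WidgetId INTEGER, TransitionTypeId INTEGER, Cost INTEGER);
        INSERT INTO Widget VALUES (1), (2), (3);
        INSERT INTO WidgetTransition VALUES
            (1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 12), (2, 3, 12);
    """)
    query = """
        WITH ranked AS (
            SELECT WidgetId, TransitionTypeId, Cost,
                   ROW_NUMBER() OVER (
                       PARTITION BY WidgetId
                       ORDER BY Cost DESC, TransitionTypeId DESC  -- assumed tie-breaker
                   ) AS rnk
            FROM WidgetTransition
        )
        SELECT w.Id, r.TransitionTypeId, r.Cost
        FROM Widget AS w
        JOIN ranked AS r ON r.WidgetId = w.Id
        WHERE r.rnk = 1
        ORDER BY w.Id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in top_transitions():
    print(row)
```

Widget 2 has two transitions tied at cost 12 and only the one with the higher TransitionTypeId is returned, while widget 3, which has no transitions, drops out because of the inner join.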
Have you tried creating the 'MaxValues' like below:
SELECT *
FROM Widget
JOIN WidgetTransition
ON WidgetTransition.WidgetId = Widget.Id
JOIN (SELECT MostExpensiveTransition.WidgetId, Max(MostExpensiveTransition.Cost) Cost
FROM WidgetTransition MostExpensiveTransition
GROUP BY MostExpensiveTransition.WidgetId
) MaxValues
ON MaxValues.WidgetId = Widget.Id
AND WidgetTransition.Cost = MaxValues.Cost
I'm going from memory here as I don't have a sql database to play with at the moment, so sorry if it doesn't quite work.
If I understand you correctly, this will do what you want.
I don't know about performance. You will have to test that and work out which indexes are helpful.
declare @Widget table (ID int)
declare @WidgetTransition table (WidgetID int, TransitionTypeID int, Cost int)
insert into @Widget values (1)
insert into @Widget values (2)
insert into @Widget values (3)
insert into @WidgetTransition values (1, 1, 1)
insert into @WidgetTransition values (1, 2, 2)
insert into @WidgetTransition values (2, 1, 2)
insert into @WidgetTransition values (2, 2, 12)
insert into @WidgetTransition values (2, 3, 12)
select *
from @Widget as Widget
inner join @WidgetTransition as WidgetTransition
on Widget.ID = WidgetTransition.WidgetID
inner join
( select Widget.ID, max(WT.TransitionTypeID) as TransitionTypeID
from @Widget as Widget
inner join
( select WidgetID, max(Cost) as Cost
from @WidgetTransition
group by WidgetID
) as MaxCost
on Widget.ID = MaxCost.WidgetID
inner join @WidgetTransition as WT
on Widget.ID = WT.WidgetID and
MaxCost.Cost = WT.Cost
group by Widget.ID
) as MaxTransitionTypeID
on Widget.ID = MaxTransitionTypeID.ID and
WidgetTransition.TransitionTypeID = MaxTransitionTypeID.TransitionTypeID