Pre-order sorting of parents and children - sql

Given the following data:
id | parent | sort
--------------------
1 | null | 0
2 | null | 1
3 | 1 | 0
4 | 1 | 1
5 | 3 | 0
6 | 5 | 0
7 | 2 | 0
How do I do a pre-order sort, meaning parents first, then children, then grandchildren, etc...?
The sorted result I'm looking for is: 1, 3, 5, 6, 4, 2, 7
If at all possible, I'd like to do this without using a CTE (or with a CTE I can understand). The way I'm doing it now is just selecting every record and checking "upwards" to see if there are any parents, grandparents and great-grandparents. It makes more sense to start from the records that don't have a parent (the top-level items) and continue downwards until there are no more children, right?
I just can't wrap my head around this...
This is an oversimplification of my actual query, but what I'm doing now is along the lines of:
SELECT ..some columns ..
FROM table t
LEFT JOIN table tparent ON tparent.ID = t.Parent
LEFT JOIN table tgrandparent ON tgrandparent.ID = tparent.Parent
LEFT JOIN table tgreatgrandparent ON tgreatgrandparent.ID = tgrandparent.Parent

This does use CTEs, but hopefully I can explain their usage:
;With ExistingQuery (id,parent,sort) as (
select 1,null,0 union all
select 2,null,1 union all
select 3,1 ,0 union all
select 4,1 ,1 union all
select 5,3 ,0 union all
select 6,5 ,0 union all
select 7,2 ,0
), Reord as (
select *,ROW_NUMBER() OVER (ORDER BY parent,sort) as rn from ExistingQuery
), hier as (
select id,parent,'/' + CONVERT(varchar(max),rn)+'/' as part
from Reord
union all
select h.id,r.parent,'/' + CONVERT(varchar(max),r.rn) + part
from hier h
inner join
Reord r
on
h.parent = r.id
)
select
e.*
from
hier h
inner join
ExistingQuery e
on
h.id = e.id
where
h.parent is null
order by CONVERT(hierarchyid,h.part)
ExistingQuery is just whatever you've currently got for your query. You should be able to just place your existing query in there (possibly with an expanded column list) and everything should just work.
Reord addresses a concern of mine, but it may not be needed: if the id values in your actual data are already in the right order, so that sort can be ignored, remove Reord and replace the references to rn with id. Otherwise, this CTE does the work of making sure that the children of each parent respect the sort column.
Finally, the hier CTE is the meat of this solution: for every row, it builds up a hierarchyid for that row, starting from the child and working back up the tree until we hit the root.
And once the CTEs are done with, we join back to ExistingQuery so that we're just getting the data from there, but can use the hierarchyid to perform proper sorting - that type already knows how to correctly sort hierarchical data.
Result:
id parent sort
----------- ----------- -----------
1 NULL 0
3 1 0
5 3 0
6 5 0
4 1 1
2 NULL 1
7 2 0
And the result showing the part column from hier, which may help you see what that CTE constructed:
id parent sort part
----------- ----------- ----------- --------------
1 NULL 0 /1/
3 1 0 /1/3/
5 3 0 /1/3/6/
6 5 0 /1/3/6/7/
4 1 1 /1/4/
2 NULL 1 /2/
7 2 0 /2/5/
(You may also want to change the final SELECT to just SELECT * from hier to also get a feel for how that CTE works)
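To see the idea outside of SQL, here is a minimal Python sketch of the same technique: build a root-to-node key for every row by walking up the parents, then sort on that key. Python tuples compare element by element, which is roughly how hierarchyid orders paths. The names rows and path_key are made up for this illustration, and the key uses (sort, id) pairs where the query above uses rn.

```python
# Rows from the question: id -> (parent, sort)
rows = {
    1: (None, 0),
    2: (None, 1),
    3: (1, 0),
    4: (1, 1),
    5: (3, 0),
    6: (5, 0),
    7: (2, 0),
}

def path_key(node_id):
    """Walk from a node up to its root, collecting one (sort, id) pair per level."""
    key = []
    while node_id is not None:
        parent, sort = rows[node_id]
        key.append((sort, node_id))  # id breaks ties between equal sort values
        node_id = parent
    key.reverse()  # root first, like the '/1/3/6/' strings built by hier
    return tuple(key)

# A parent's key is a prefix of its children's keys, so it sorts before them
pre_order = sorted(rows, key=path_key)
print(pre_order)
```

Like the hier CTE, this walks up from every row, so the work grows with the depth of the tree.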

I finally dove into CTEs and got it working. Here is the base of the query, in case anyone else comes across this. It's important to note that sort is a padded string, starting at 0000000001 and counting upwards.
WITH recursive_CTE (id, parentId, sort)
AS
(
-- CTE ANCHOR SELECTS ROOTS --
SELECT t.ID AS id, t.Parent as parentId, t.sort AS sort
FROM table t
WHERE t.Parent IS NULL
UNION ALL
-- CTE RECURSIVE SELECTION --
SELECT t.ID AS id, t.Parent as parentId, cte.sort + t.sort AS sort
FROM table t
INNER JOIN recursive_CTE cte ON cte.id = t.Parent
)
SELECT * FROM recursive_CTE
ORDER BY sort
I believe this is the main part needed to make this kind of query work. It's actually pretty fast if you make sure you're hitting the necessary indices.
Sort is built up by expanding a string.
So a parent would have sort '0000000001', his direct child will have '00000000010000000001' and his grandchild will have '000000000100000000010000000001' etc. His sibling starts at '0000000002' and so comes after all the 01 records.
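For anyone who wants to try this without a SQL Server instance, the following is a rough sketch of the same roots-down approach using Python's built-in sqlite3 module. It assumes the integer sort values from the question and pads them inside the query with SQLite's printf, rather than storing pre-padded strings as I do; the table name item and the column name path are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, parent INTEGER, sort INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, None, 0), (2, None, 1), (3, 1, 0), (4, 1, 1), (5, 3, 0), (6, 5, 0), (7, 2, 0)],
)

query = """
WITH RECURSIVE tree (id, parent, path) AS (
    -- anchor: the roots, each starting its own path
    SELECT id, parent, printf('%010d', sort)
    FROM item
    WHERE parent IS NULL
    UNION ALL
    -- recursive step: append the child's padded sort to the parent's path
    SELECT i.id, i.parent, tree.path || printf('%010d', i.sort)
    FROM item AS i
    JOIN tree ON tree.id = i.parent
)
SELECT id, path FROM tree ORDER BY path
"""
ordered = conn.execute(query).fetchall()
ordered_ids = [row[0] for row in ordered]
print(ordered_ids)
```

The fixed width matters: without the padding, a sort value of 10 would sort before 2 as a string.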

Related

SQL Server (terminal result) hierarchy map

In SQL Server 2016, I have a table with the following chaining structure:
dbo.Item
OriginalItem | ItemID
---------------------
NULL | 7
1 | 2
NULL | 1
5 | 6
3 | 4
NULL | 8
NULL | 5
9 | 11
2 | 3
EDIT NOTE: Bold numbers were added in response to @lemon's comments below.
Importantly, this example is a trivialized version of the real data. The neatly ascending entries are not present in the actual data; I've only arranged them that way to make the example easier to follow.
I've constructed a query to get what I'm calling the TerminalItemID, which in this example case is ItemID 4, 6, and 7, and populated that into a temporary table #TerminalItems, the resultset of which would look like:
#TerminalItems
TerminalItemID
--------------
4
6
7
8
11
What I need, is a final mapping table that would look something like this (using the above example -- note that it also contains for 4, 6, and 7 mapping to themselves, this is needed by the business logic):
#Mapping
ItemID | TerminalItemID
-----------------------
1 | 4
2 | 4
3 | 4
4 | 4
5 | 6
6 | 6
7 | 7
8 | 8
9 | 11
11 | 11
What I need help with is how to build this last #Mapping table. Any assistance in this direction is greatly appreciated!
This should do:
with MyTbl as (
select *
from (values
(NULL, 1 )
,(1, 2 )
,(2, 3 )
,(3, 4 )
,(NULL, 5 )
,(5, 6 )
,(NULL, 7 )
) T(OriginalItem, ItemID)
)
, TerminalItems as (
/* Find all leaf level items: those not appearing under OriginalItem column */
select LeafItem=ItemId, ImmediateOriginalItem=M.OriginalItem
from MyTbl M
where M.ItemId not in
(select distinct OriginalItem
from MyTbl AllParn
where OriginalItem is not null
)
), AllLevels as (
/* Use a recursive CTE to find and report all parents */
select ThisItem=LeafItem, ParentItem=ImmediateOriginalItem
from TerminalItems
union all
select ThisItem=AL.ThisItem, M.OriginalItem
from AllLevels AL
inner join
MyTbl M
on M.ItemId=AL.ParentItem
)
select ItemId=coalesce(ParentItem,ThisItem), TerminalItemId=ThisItem
from AllLevels
order by 1,2
Beware of the MAXRECURSION setting: by default, SQL Server stops a recursive CTE with an error after 100 levels of recursion, which means your tree can be at most 100 levels deep (the maximum number of nodes between a terminal item and its ultimate original item). The limit can be raised with OPTION (MAXRECURSION nnn), where nnn is adjusted as needed. It can also be removed entirely by using 0, but this is not recommended because cyclic data would then cause an infinite loop.
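To illustrate the point about cyclic data, here is a small Python sketch (not T-SQL) that builds the same kind of mapping by following each chain forward and stops with an error when it revisits an item. It uses the rows from the question's dbo.Item table and assumes every item has at most one successor; the names successor and terminal_of are invented for this example.

```python
# (OriginalItem, ItemID) pairs from the question's dbo.Item sample
item_rows = [
    (None, 7), (1, 2), (None, 1), (5, 6), (3, 4),
    (None, 8), (None, 5), (9, 11), (2, 3),
]

# successor[x] is the item that replaced x (assumes at most one per item)
successor = {orig: item for orig, item in item_rows if orig is not None}

def terminal_of(item_id):
    """Follow the chain forward until an item has no successor."""
    seen = set()
    while item_id in successor:
        if item_id in seen:  # guard against cyclic data instead of looping forever
            raise ValueError(f"cycle detected at item {item_id}")
        seen.add(item_id)
        item_id = successor[item_id]
    return item_id

# Include originals that never appear as an ItemID (9 in the sample)
all_ids = {item for _, item in item_rows} | set(successor)
mapping = {item_id: terminal_of(item_id) for item_id in sorted(all_ids)}
print(mapping)
```

The seen set plays the role that a finite MAXRECURSION limit plays in the query: bad data produces an error rather than a loop that never ends.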
This is a typical gaps-and-islands problem and can also be solved without recursion in three steps:
1. assign 1 at the beginning of each partition
2. compute a running sum over that flag value (generated in step 1)
3. extract the max "ItemID" within each partition (generated in step 2)
WITH cte1 AS (
SELECT *, CASE WHEN OriginalItem IS NULL THEN 1 ELSE 0 END AS changepartition
FROM Item
), cte2 AS (
SELECT *, SUM(changepartition) OVER(ORDER BY ItemID) AS parts
FROM cte1
)
SELECT ItemID, MAX(ItemID) OVER(PARTITION BY parts) AS TerminalItemID
FROM cte2
Assumption: Your terminal id items correspond to the "ItemID" value preceding a NULL "OriginalItem" value.
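The three steps can be traced by hand in a few lines of Python. This is only a sketch of the flag, running-sum and max-per-partition logic on the simplified 1 to 7 sample, and it relies on the same assumption that ItemID order follows chain order; the variable names are illustrative.

```python
from itertools import accumulate

# (OriginalItem, ItemID) rows; the technique assumes ItemID order follows chain order
sample = [(None, 1), (1, 2), (2, 3), (3, 4), (None, 5), (5, 6), (None, 7)]
sample.sort(key=lambda r: r[1])  # ORDER BY ItemID

# Step 1: flag the start of each partition
flags = [1 if orig is None else 0 for orig, _ in sample]

# Step 2: a running sum of the flags gives every row a partition number
parts = list(accumulate(flags))

# Step 3: the terminal item is the largest ItemID within each partition
max_per_part = {}
for part, (_, item_id) in zip(parts, sample):
    max_per_part[part] = max(max_per_part.get(part, item_id), item_id)

islands = {item_id: max_per_part[part] for part, (_, item_id) in zip(parts, sample)}
print(islands)
```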
EDIT: "Fixing orphaned records."
The query works correctly when records are not orphaned. The only way to deal with orphaned records is to add the missing records back, so that the query can work on the full data.
This is done by an extra subquery (placed at the beginning) that applies a UNION ALL between:
- the available records of the original table
- the missing records
WITH fix_orphaned_records AS(
SELECT * FROM Item
UNION ALL
SELECT NULL AS OriginalItem,
i1.OriginalItem AS ItemID
FROM Item i1
LEFT JOIN Item i2 ON i1.OriginalItem = i2.ItemID
WHERE i1.OriginalItem IS NOT NULL AND i2.ItemID IS NULL
), cte AS (
...
Missing records correspond to "OriginalItem" values that are never found within the "ItemID" field. A self left join will uncover these missing records.
You can use a recursive CTE to compute the last item in the sequence. For example:
with
n (orig_id, curr_id, lvl) as (
select itemid, itemid, 1 from item
union all
select n.orig_id, i.itemid, n.lvl + 1
from n
join item i on i.originalitem = n.curr_id
)
select *
from (
select *, row_number() over(partition by orig_id order by lvl desc) as rn from n
) x
where rn = 1
Result:
orig_id curr_id lvl rn
-------- -------- ---- --
1 4 4 1
2 4 3 1
3 4 2 1
4 4 1 1
5 6 2 1
6 6 1 1
7 7 1 1
See running example at db<>fiddle.

How to write a recursive CTE query to find the parent record?

I am having trouble writing a query which would return the parental id.
I have a table that looks like this:
CREATE TABLE csx
(
id INT UNSIGNED PRIMARY KEY,
cd VARCHAR(11),
category VARCHAR(20),
lvl INT,
parent_id INT
);
INSERT INTO csx
VALUES
(1,'ab-00-00-00','ab',1,null),
(2,'ac-00-00-00','ac',1,null),
(3,'ac-01-00-00','ac',2,2),
(4,'ac-01-00-01','ac',3,3),
(5,'ac-01-00-02','ac',3,3),
(6,'ac-03-00-00','ac',2,2),
(7,'ac-03-00-01','ac',3,6),
(8,'ac-03-00-02','ac',3,6),
(9,'ac-02-00-00','ac',2,2),
(10,'ac-02-00-01','ac',3,9);
I want to check whether parent_id (referencing the id of the entry) is correct.
I am new to recursive CTEs (I think those have to be used). Could you please shed some light on the correct way to implement the CTE which would return parental ids?
Instead of trying to verify values after the fact, ensure there can never be an invalid parent_id by adding a foreign key:
CREATE TABLE csx
(
id INT PRIMARY KEY,
cd VARCHAR(11),
category VARCHAR(20),
lvl INT,
parent_id INT references csx(id)
)
If you want to check for invalid parent_id values in an existing table, a simple LEFT JOIN is enough to find all problems:
SELECT t1.*
from csx t1 left join csx t2 on t1.parent_id=t2.id
where t1.parent_id is not null and t2.id is null
This will return all rows whose parent_id refers to a non-existent row. The FOREIGN KEY, on the other hand, ensures there won't be any invalid values in the first place.
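As a quick way to try the check, here is a sketch that runs the same kind of LEFT JOIN against an in-memory SQLite database from Python. The row with id 11 and parent_id 99 is a deliberately broken row added for the demonstration and is not part of the question's data. The query also filters out rows whose parent_id is NULL, on the assumption that roots are legitimate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE csx (id INTEGER PRIMARY KEY, cd TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO csx VALUES (?, ?, ?)",
    [
        (1, "ab-00-00-00", None),
        (2, "ac-00-00-00", None),
        (3, "ac-01-00-00", 2),
        (4, "ac-01-00-01", 3),
        (11, "zz-01-00-00", 99),  # hypothetical bad row: there is no id 99
    ],
)

orphans = conn.execute(
    """
    SELECT t1.id, t1.parent_id
    FROM csx AS t1
    LEFT JOIN csx AS t2 ON t1.parent_id = t2.id
    WHERE t1.parent_id IS NOT NULL  -- roots legitimately have no parent
      AND t2.id IS NULL             -- no matching parent row was found
    """
).fetchall()
print(orphans)
```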
To calculate levels and paths you can use a recursive CTE. A CTE is more or less a subquery that can be referenced by name and used in multiple places. A recursive CTE is a CTE that refers to itself.
To get all root items and their children, the following CTE first selects all roots, then joins the actual table with itself to retrieve the children:
with cte as (
select csx.* ,
1 as Level,
cast(ID as varchar(200)) as Path
from csx
where parent_id is null
union all
select csx.* ,
cte.Level+1 as Level,
cast(CONCAT_WS('/',cte.Path, csx.ID) as varchar(200)) As Path
from csx inner join cte on cte.ID=csx.parent_id
)
select * from cte
order by path
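id | cd | category | lvl | parent_id | Level | Path
---------------------------------------------------
1 | ab-00-00-00 | ab | 1 | NULL | 1 | 1
2 | ac-00-00-00 | ac | 1 | NULL | 1 | 2
3 | ac-01-00-00 | ac | 2 | 2 | 2 | 2/3
4 | ac-01-00-01 | ac | 3 | 3 | 3 | 2/3/4
5 | ac-01-00-02 | ac | 3 | 3 | 3 | 2/3/5
6 | ac-03-00-00 | ac | 2 | 2 | 2 | 2/6
7 | ac-03-00-01 | ac | 3 | 6 | 3 | 2/6/7
8 | ac-03-00-02 | ac | 3 | 6 | 3 | 2/6/8
9 | ac-02-00-00 | ac | 2 | 2 | 2 | 2/9
10 | ac-02-00-01 | ac | 3 | 9 | 3 | 2/9/10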
The first query selects the roots and sets the root values for the Level (1) and Path (ID). The next query joins the table with the CTE to match roots and children.
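One thing the computed Level makes possible is a sanity check against the stored lvl column. The sketch below is an assumption about what "correct" could mean here, not something the question spells out: it runs a trimmed-down version of the CTE in SQLite from Python and lists the rows where the stored and computed levels disagree. The column name depth is used to avoid clashing with lvl.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE csx (id INTEGER PRIMARY KEY, cd TEXT, lvl INTEGER, parent_id INTEGER)"
)
conn.executemany("INSERT INTO csx VALUES (?, ?, ?, ?)", [
    (1, "ab-00-00-00", 1, None), (2, "ac-00-00-00", 1, None),
    (3, "ac-01-00-00", 2, 2), (4, "ac-01-00-01", 3, 3),
    (5, "ac-01-00-02", 3, 3), (6, "ac-03-00-00", 2, 2),
    (7, "ac-03-00-01", 3, 6), (8, "ac-03-00-02", 3, 6),
    (9, "ac-02-00-00", 2, 2), (10, "ac-02-00-01", 3, 9),
])

levels = conn.execute("""
    WITH RECURSIVE cte (id, lvl, depth) AS (
        -- roots are at depth 1
        SELECT id, lvl, 1 FROM csx WHERE parent_id IS NULL
        UNION ALL
        -- each child is one level below the row its parent_id points at
        SELECT c.id, c.lvl, cte.depth + 1
        FROM csx AS c JOIN cte ON cte.id = c.parent_id
    )
    SELECT id, lvl, depth FROM cte ORDER BY id
""").fetchall()

# Rows whose stored lvl disagrees with the depth implied by parent_id
mismatches = [row for row in levels if row[1] != row[2]]
print(mismatches)
```

Rows that the CTE never reaches (because their parent_id points nowhere) would simply be missing from levels, so comparing the row count against the table is a useful second check.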
The question's cd column isn't a path though. It looks like a row number inside each parent's direct children. Calculating row numbers is the job of the ROW_NUMBER function. Since we're counting inside a parent's children, we can use ROW_NUMBER() OVER(PARTITION BY Parent_ID ORDER BY ID).
with cte as (
select csx.* ,
1 as Level,
cast(category as varchar(200)) as Path
from csx
where parent_id is null
union all
select csx.* ,
cte.Level+1 as Level,
cast(CONCAT_WS('-',cte.Path, ROW_NUMBER() OVER(PARTITION BY csx.Parent_ID ORDER BY csx.ID)) as varchar(200)) As Path
from csx inner join cte on cte.ID=csx.parent_id
)
select * from cte
order by path;
This produces
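id | cd | category | lvl | parent_id | Level | Path
---------------------------------------------------
1 | ab-00-00-00 | ab | 1 | NULL | 1 | ab
2 | ac-00-00-00 | ac | 1 | NULL | 1 | ac
3 | ac-01-00-00 | ac | 2 | 2 | 2 | ac-1
4 | ac-01-00-01 | ac | 3 | 3 | 3 | ac-1-1
5 | ac-01-00-02 | ac | 3 | 3 | 3 | ac-1-2
6 | ac-03-00-00 | ac | 2 | 2 | 2 | ac-2
7 | ac-03-00-01 | ac | 3 | 6 | 3 | ac-2-1
8 | ac-03-00-02 | ac | 3 | 6 | 3 | ac-2-2
9 | ac-02-00-00 | ac | 2 | 2 | 2 | ac-3
10 | ac-02-00-01 | ac | 3 | 9 | 3 | ac-3-1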
Unfortunately, the values don't match. With ORDER BY ID, the row with ID 6 will have a path ac-2 instead of ac-3. Changing the order will break all other rows. There's no other indicator that could be used to determine the row number of the children, at least not in this table.
This means that either rows 6-10 are all wrong, or that cd can't be used to determine whether Parent_ID is wrong. cd doesn't contain identifiers but calculated values. The only way to tell whether the data match these values is to try to reproduce them. Unfortunately, there's not enough information in the table to do so.

Oracle SQL query to count "children" in current query set

I have got an SQL query in Oracle with a multilevel subquery for generating my website navigation in the database. This query has a multilevel subquery because for each user I have to check whether they have the right to access this part of the navigation. The result looks kind of like the following:
ID | ID_PARENT | NAME | LINK
------------------------------------------
1 Main ~/
2 1 Sub1 ~/Sub1
3 1 Sub2 ~/Sub2
4 2 Sub1.1 ~/Sub1.1
5 2 Sub1.2 ~/Sub1.2
6 2 Sub1.3 ~/Sub1.3
The ID_PARENT column refers to the ID column of another row in the same table.
Now what I need is a query that, for each row, gives me the number of rows in the current query set that have the current ID as ID_PARENT, so it basically counts the children. It has to work on the current query set because there are other navigation entries that some users do not have the rights to, and I want to avoid running the same subquery twice. With the example above, the result I need should look like the following:
ID | ID_PARENT | NAME | LINK | CHILDREN
---------------------------------------------------------
1 Main ~/ 2
2 1 Sub1 ~/Sub1 3
3 1 Sub2 ~/Sub2 0
4 2 Sub1.1 ~/Sub1.1 0
5 2 Sub1.2 ~/Sub1.2 0
6 2 Sub1.3 ~/Sub1.3 0
I have tried a fair share of SQL queries, but none of them get me the result I need. Can anybody help me with this?
You can count the rows for each ID_PARENT separately and then join that result with your main query. Something like this:
SELECT A.*, COALESCE(B.RC ,0) AS CHILDREN_NUMBER
FROM YOURTABLE A
LEFT JOIN ( SELECT ID_PARENT,COUNT(*) AS RC FROM YOURTABLE GROUP BY ID_PARENT) B ON A.ID = B.ID_PARENT;
Output:
ID ID_PARENT NAME LINK CHILDREN_NUMBER
1 NULL Main / 2
2 1 SUB1 /Sub1 3
3 1 SUB2 /Sub2 0
4 2 SUB1.1 /Sub1.1 0
5 2 SUB1.2 /Sub1.2 0
6 2 SUB1.3 /Sub1.3 0
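If you want to experiment with the join before wiring it into the real navigation query, this is a self-contained sketch that runs it in SQLite from Python on the sample rows. The table name nav is a stand-in for your own query or table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nav (id INTEGER PRIMARY KEY, id_parent INTEGER, name TEXT, link TEXT)"
)
conn.executemany("INSERT INTO nav VALUES (?, ?, ?, ?)", [
    (1, None, "Main", "~/"), (2, 1, "Sub1", "~/Sub1"), (3, 1, "Sub2", "~/Sub2"),
    (4, 2, "Sub1.1", "~/Sub1.1"), (5, 2, "Sub1.2", "~/Sub1.2"), (6, 2, "Sub1.3", "~/Sub1.3"),
])

child_counts = conn.execute("""
    SELECT a.id, COALESCE(b.rc, 0) AS children
    FROM nav AS a
    LEFT JOIN (
        -- one row per parent with the number of rows pointing at it
        SELECT id_parent, COUNT(*) AS rc
        FROM nav
        GROUP BY id_parent
    ) AS b ON a.id = b.id_parent
    ORDER BY a.id
""").fetchall()
print(child_counts)
```

The COALESCE is what turns "no matching group" into 0 for the leaf entries.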
For example
with q(ID, ID_PARENT, NAME, LINK) as (
-- original query
)
select ID, ID_PARENT, NAME, LINK
,(select count(*) from q q2 where q2.ID_PARENT = q.ID) CHILDREN
from q
Try it like this; it is the same approach as the answer above by etsa.
select
n.id,n.parent_id,n.name,n.link,coalesce(b.children,0)
from navigation n
left join (select
parent_id as parent,count(id) as children
from navigation group by parent_id) b
on n.id=b.parent;

Selecting the top 5 unique rows, sorted randomly in SQL Server 2005?

I have a table in SQL Server 2005 that looks like
1 | bob | joined
2 | bob | left
3 | john | joined
4 | steve | joined
5 | andy | joined
6 | kyle | joined
What I want is to give someone the ability to pull up the activity of 5 random users (showing their latest activity)
ex: I want to return results 1, 3, 4, 5, 6 - or - 2, 3, 4, 5, 6 - but never - 1, 2, 3, 4, 5 (because 1 and 2 are activities from the same user, and I don't want him showing up twice at the expense of a different unique user that could have their activity displayed)
I'm trying something like SELECT TOP(5) FROM table ORDER BY NEWID() to get the top 5 and the random aspect going, but when I try to incorporate UNIQUE or DISTINCT anywhere (to avoid getting back both rows 1 and 2) I get SQL errors, and I have no idea how to proceed.
select top 5 name, id from (
select top 99.999 PERCENT name,id, NEWID() dummy from sysobjects
order by dummy) dummyName
This works; just replace the column and table names with the ones you want.
Using a CTE:
WITH cte AS (
SELECT t.id,
t.name,
t.txt,
ROW_NUMBER() OVER(PARTITION BY t.name
ORDER BY NEWID()) AS rank
FROM TABLE t)
SELECT TOP 5
c.id,
c.name,
c.txt
FROM cte c
WHERE c.rank = 1
ORDER BY NEWID()
Non-CTE equivalent:
SELECT TOP 5
c.id,
c.name,
c.txt
FROM (SELECT t.id,
t.name,
t.txt,
ROW_NUMBER() OVER(PARTITION BY t.name
ORDER BY NEWID()) AS rank
FROM TABLE t) c
WHERE c.rank = 1
ORDER BY NEWID()
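The logic of both queries is "reduce to one row per user, then pick five of those at random". Here is a rough Python sketch of that two-stage idea for checking expectations outside the database. Unlike the queries above, which keep a random row per user, it keeps each user's row with the highest id, on the assumption that a higher id means more recent activity; random_users is a made-up helper name.

```python
import random

# (id, name, activity) rows from the question
activity = [
    (1, "bob", "joined"), (2, "bob", "left"), (3, "john", "joined"),
    (4, "steve", "joined"), (5, "andy", "joined"), (6, "kyle", "joined"),
]

def random_users(rows, k, rng=random):
    """Pick up to k distinct users at random, one row (highest id) per user."""
    latest = {}
    for row in rows:
        row_id, name, _ = row
        # keep only the most recent row seen for each user
        if name not in latest or row_id > latest[name][0]:
            latest[name] = row
    return rng.sample(list(latest.values()), min(k, len(latest)))

picked = random_users(activity, 5)
print(picked)
```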

Recursive SQL CTE's and Custom Sort Ordering

Imagine you are creating a DB schema for a threaded discussion board. Is there an efficient way to select a properly sorted list for a given thread? The code I have written works but does not sort the way I would like it to.
Let's say you have this data:
ID | ParentID
-----------------
1 | null
2 | 1
3 | 2
4 | 1
5 | 3
So the structure is supposed to look like this:
1
|- 2
| |- 3
| | |- 5
|- 4
Ideally, in the code, we want the result set to appear in the following order: 1, 2, 3, 5, 4
PROBLEM: With the CTE I wrote it is actually being returned as: 1, 2, 4, 3, 5
I know this would be easy to group/order by using LINQ but I am reluctant to do this in memory. It seems like the best solution at this point though...
Here is the CTE I am currently using:
with Replies as (
select c.CommentID, c.ParentCommentID, 1 as Level
from Comment c
where ParentCommentID is null and CommentID = @ParentCommentID
union all
select c.CommentID, c.ParentCommentID, r.Level + 1 as Level
from Comment c
inner join Replies r on c.ParentCommentID = r.CommentID
)
select * from Replies
Any help would be appreciated; Thanks!
I'm new to SQL and had not heard about hierarchyid datatype before. After reading about it from this comment I decided I may want to incorporate this into my design. I will experiment with this tonight and post more information if I have success.
Update
Returned result from my sample data, using dance2die's suggestion:
ID | ParentID | Level | DenseRank
-------------------------------------
15 NULL 1 1
20 15 2 1
21 20 3 1
17 22 3 1
22 15 2 2
31 15 2 3
32 15 2 4
33 15 2 5
34 15 2 6
35 15 2 7
36 15 2 8
I am sure that you will love this.
I recently found out about the DENSE_RANK() function, which is for "ranking within the partition of a result set", according to MSDN.
Check out the code below and how "CommentID" is sorted.
As far as I understand, you are trying to partition your result set by ParentCommentID.
Pay attention to "denserank" column.
with Replies (CommentID, ParentCommentID, Level) as
(
select c.CommentID, c.ParentCommentID, 1 as Level
from Comment c
where ParentCommentID is null and CommentID = 1
union all
select c.CommentID, c.ParentCommentID, r.Level + 1 as Level
from Comment c
inner join Replies r on c.ParentCommentID = r.CommentID
)
select *,
denserank = dense_rank() over (partition by ParentCommentID order by CommentID)
from Replies
order by denserank
Result below
You have to use hierarchyid (SQL Server 2008 or later) or a bunch of string (or byte) concatenation.
Hmmmm - I am not sure your structure is the best suited for this problem. Off the top of my head, I cannot think of any way to sort the data as you want it within the above query.
The best I can think of is if you have a parent table that ties your comments together (e.g. a topic table). If you do, you should be able to simply join your replies onto that (you will need to include the correct column, obviously), and then you can sort by the topicID, Level to get the sort order you are after (or whatever other info on the topic table represents a good value for sorting).
Consider storing the entire hierarchy (with triggers to update it if it changes) in a field.
This field in your example would have:
1
1.2
1.2.3
1.2.3.5
1.4
then you just have to sort on that field, try this and see:
create table #temp (test varchar (10))
insert into #temp (test)
select '1'
union select '1.2'
union select '1.2.3'
union select '1.2.3.5'
union select '1.4'
select * from #temp order by test asc
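One caveat with sorting on such a field: it is compared as text, so it only gives the right order while every ID has the same number of digits. The Python sketch below uses Python's own string ordering as a stand-in for the varchar comparison (the exact result in SQL Server depends on the collation) and adds a made-up reply with ID 10 to show the problem, along with two possible ways around it.

```python
# Paths for the thread in the question, plus "1.10" as a hypothetical extra reply
paths = ["1", "1.2", "1.2.3", "1.2.3.5", "1.4", "1.10"]

# Plain text comparison: "1.10" lands before "1.2"
as_text = sorted(paths)

# Comparing the numeric segments keeps 1.10 after 1.4
as_numbers = sorted(paths, key=lambda p: tuple(int(part) for part in p.split(".")))

# Zero-padding every segment makes plain text comparison agree with the numeric order
padded = sorted(".".join(part.zfill(4) for part in p.split(".")) for p in paths)

print(as_text)
print(as_numbers)
print(padded)
```

Storing the padded form in the field keeps the simple ORDER BY; the padding width (4 here) is an arbitrary choice and has to be large enough for the biggest ID you expect.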