Recursive SQL CTEs and Custom Sort Ordering - sql

Imagine you are creating a DB schema for a threaded discussion board. Is there an efficient way to select a properly sorted list for a given thread? The code I have written works but does not sort the way I would like it to.
Let's say you have this data:
ID | ParentID
-----------------
1 | null
2 | 1
3 | 2
4 | 1
5 | 3
So the structure is supposed to look like this:
1
|- 2
| |- 3
| | |- 5
|- 4
Ideally, in the code, we want the result set to appear in the following order: 1, 2, 3, 5, 4
PROBLEM: With the CTE I wrote it is actually being returned as: 1, 2, 4, 3, 5
I know this would be easy to group/order by using LINQ but I am reluctant to do this in memory. It seems like the best solution at this point though...
Here is the CTE I am currently using:
with Replies as (
select c.CommentID, c.ParentCommentID, 1 as Level
from Comment c
where ParentCommentID is null and CommentID = @ParentCommentID
union all
select c.CommentID, c.ParentCommentID, r.Level + 1 as Level
from Comment c
inner join Replies r on c.ParentCommentID = r.CommentID
)
select * from Replies
Any help would be appreciated; Thanks!
I'm new to SQL and had not heard about hierarchyid datatype before. After reading about it from this comment I decided I may want to incorporate this into my design. I will experiment with this tonight and post more information if I have success.
Update
Returned result from my sample data, using dance2die's suggestion:
ID | ParentID | Level | DenseRank
-------------------------------------
15 | NULL | 1 | 1
20 | 15 | 2 | 1
21 | 20 | 3 | 1
17 | 22 | 3 | 1
22 | 15 | 2 | 2
31 | 15 | 2 | 3
32 | 15 | 2 | 4
33 | 15 | 2 | 5
34 | 15 | 2 | 6
35 | 15 | 2 | 7
36 | 15 | 2 | 8

I am sure that you will love this.
I recently found out about the Dense_Rank() function, which is for "ranking within the partition of a result set" according to MSDN.
Check out the code below and how "CommentID" is sorted.
As far as I understand, you are trying to partition your result set by ParentCommentID.
Pay attention to "denserank" column.
with Replies (CommentID, ParentCommentID, Level) as
(
select c.CommentID, c.ParentCommentID, 1 as Level
from Comment c
where ParentCommentID is null and CommentID = 1
union all
select c.CommentID, c.ParentCommentID, r.Level + 1 as Level
from Comment c
inner join Replies r on c.ParentCommentID = r.CommentID
)
select *,
denserank = dense_rank() over (partition by ParentCommentID order by CommentID)
from Replies
order by denserank
Result below

You have to use hierarchyid (SQL Server 2008 and later) or a bunch of string (or byte) concatenation.
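As a minimal sketch of the string concatenation route, the recursive CTE can carry a zero-padded path of ancestor IDs and sort on it. This sketch runs against SQLite through Python's sqlite3 module purely so it is self-contained; the SortPath column, the padding width of 10 and the thread_order helper are assumptions for illustration, not part of the original schema.

```python
import sqlite3

# Sample rows from the question: (CommentID, ParentCommentID).
ROWS = [(1, None), (2, 1), (3, 2), (4, 1), (5, 3)]

# Each row's SortPath is its parent's path plus its own zero-padded ID,
# so ordering by SortPath yields a depth-first (threaded) order.
QUERY = """
with recursive Replies(CommentID, ParentCommentID, Level, SortPath) as (
    select CommentID, ParentCommentID, 1, printf('%010d', CommentID)
    from Comment
    where ParentCommentID is null and CommentID = ?
    union all
    select c.CommentID, c.ParentCommentID, r.Level + 1,
           r.SortPath || '/' || printf('%010d', c.CommentID)
    from Comment c
    inner join Replies r on c.ParentCommentID = r.CommentID
)
select CommentID, Level from Replies order by SortPath
"""


def thread_order(root_id):
    """Return (CommentID, Level) rows of one thread in display order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Comment (CommentID integer primary key, ParentCommentID integer)"
    )
    conn.executemany("insert into Comment values (?, ?)", ROWS)
    rows = conn.execute(QUERY, (root_id,)).fetchall()
    conn.close()
    return rows


print([comment_id for comment_id, _ in thread_order(1)])
```

The padding matters because the path is compared as text; without it, an ID such as 10 would sort before 2.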

Hmmmm - I am not sure if your structure is the best suited for this problem. Off the top of my head I cannot think of any way to sort the data as you want it within the above query.
The best I can think of is if you have a parent table that ties your comments together (eg. a topic table). If you do you should be able to simply join your replies onto that (you will need to include the correct column obviously), and then you can sort by the topicID, Level to get the sort order you are after (or whatever other info on the topic table represents a good value for sorting).

Consider storing the entire hierarchy (with triggers to update it if it changes) in a field.
This field in your example would have:
1
1.2
1.2.3
1.2.3.5
1.4
Then you just have to sort on that field. Try this and see:
create table #temp (test varchar (10))
insert into #temp (test)
select '1'
union select '1.2'
union select '1.2.3'
union select '1.2.3.5'
union select '1.4'
select * from #temp order by test asc
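One caveat with sorting a dotted path as plain text is that multi-digit IDs compare character by character. The short sketch below, written in Python only for illustration, contrasts a plain text sort with a numeric sort and with a zero-padded form of the same paths; the sample paths, the path_key and pad_path helpers and the padding width are hypothetical.

```python
def path_key(path):
    """Split '1.2.10' into (1, 2, 10) so segments compare numerically."""
    return tuple(int(part) for part in path.split("."))


def pad_path(path, width=5):
    """Zero-pad each segment so a plain text sort matches the numeric order."""
    return ".".join(part.zfill(width) for part in path.split("."))


paths = ["1", "1.2", "1.10", "1.2.3", "1.4"]

plain = sorted(paths)                    # text order: '1.10' lands before '1.2'
numeric = sorted(paths, key=path_key)    # intended hierarchy order
padded = sorted(paths, key=pad_path)     # text order on the padded form

print(plain)
print(numeric)
print(padded)
```

If the path is stored in the table, storing it already padded would let a plain ORDER BY on the column give the intended order.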

Related

Merge Multiple Rows to One Row having Same value swapped between 2 columns In SQL Server

I am working on a simple API-based chat module. I am trying to get the chat conversations for a particular user, but because the same pair of values can appear swapped between two columns, my data is duplicated.
I want to merge rows having the same values swapped between 2 columns and the merged row should be based on the latest entry inserted in the database.
The data looks like this :
Id To From Message ConversationTime
1 1 2 hello 11:00AM
2 3 1 hi 12:00PM
3 1 3 how are you? 12:15PM
4 3 1 I am fine. 12:30PM
5 4 5 Hi! 04:30PM
6 5 4 Hello 04:35PM
7 1 5 Hola! 06:30PM
So, for example, for the user with Id 1 my result needs to look like this:
Id To From Message ConversationTime
1 1 2 hello 11:00AM
4 3 1 I am fine. 12:30PM
7 1 5 Hola! 06:30PM
If Id is 5 then result would be like this:
Id To From Message ConversationTime
6 5 4 Hello 04:35PM
7 1 5 Hola! 06:30PM
My result set looks like this:
Id To From Message ConversationTime
1 1 2 hello 11:00AM
3 1 3 how are you? 12:15PM
4 3 1 I am fine. 12:30PM
7 1 5 Hola! 06:30PM
Any help would be appreciated. Thanks in advance!
The idea is the same as the linked duplicate, Get top 1 row of each group; just use a CASE expression to get the ID of the other user:
DECLARE @ID int = 1;
WITH RNs AS(
SELECT ID,
[To], --TO is a reserved keyword and should not be used for object names
[From], --FROM is a reserved keyword and should not be used for object names
Message,
ConversationTime, --I assume this is a time
ROW_NUMBER() OVER (PARTITION BY CASE [To] WHEN @ID THEN [From] ELSE [To] END ORDER BY ConversationTime DESC) AS RN --TO and FROM are reserved keywords and should not be used for object names
FROM dbo.YourTable
WHERE @ID IN ([To],[From])) --TO and FROM are reserved keywords and should not be used for object names
SELECT ID,
[To], --TO is a reserved keyword and should not be used for object names
[From], --FROM is a reserved keyword and should not be used for object names
Message,
ConversationTime --I assume this is a time
FROM RNs
WHERE RN = 1;
SQL Server allows you to do this without a CASE expression by unpivoting the data and then using window functions:
select t.*
from (select t.*,
row_number() over (partition by v.user_other order by t.conversationTime desc) as seqnum
from t cross apply
(values (t.[to], t.[from]), (t.[from], t.[to])
) v([user], user_other)
where v.[user] = 1
) t
where seqnum = 1;
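To check the "latest row per conversation partner" idea against the sample data, here is a small sketch that runs the CASE-based query on SQLite through Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the table name Chat, the column names ToUser and FromUser, and the 24-hour time strings are assumptions made so the example sorts correctly and avoids reserved words.

```python
import sqlite3

# Sample data from the question; times rewritten as 24-hour strings (assumption)
# so that text ordering matches chronological ordering.
MESSAGES = [
    (1, 1, 2, "hello", "11:00"),
    (2, 3, 1, "hi", "12:00"),
    (3, 1, 3, "how are you?", "12:15"),
    (4, 3, 1, "I am fine.", "12:30"),
    (5, 4, 5, "Hi!", "16:30"),
    (6, 5, 4, "Hello", "16:35"),
    (7, 1, 5, "Hola!", "18:30"),
]

# Partition by "the other user" in each row, then keep the newest row per partition.
QUERY = """
with ranked as (
    select Id, ToUser, FromUser, Message, ConversationTime,
           row_number() over (
               partition by case when ToUser = :uid then FromUser else ToUser end
               order by ConversationTime desc
           ) as rn
    from Chat
    where :uid in (ToUser, FromUser)
)
select Id from ranked where rn = 1 order by Id
"""


def latest_per_partner(user_id):
    """Return the Id of the most recent message in each of the user's conversations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Chat (Id integer primary key, ToUser integer, "
        "FromUser integer, Message text, ConversationTime text)"
    )
    conn.executemany("insert into Chat values (?, ?, ?, ?, ?)", MESSAGES)
    ids = [row[0] for row in conn.execute(QUERY, {"uid": user_id})]
    conn.close()
    return ids


print(latest_per_partner(1))
print(latest_per_partner(5))
```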

Oracle SQL query to count "children" in current query set

I have got an SQL query in Oracle with a multilevel subquery for generating my website navigation in the database. This query has a multilevel subquery because for each user I have to check whether they have the right to access this part of the navigation. The result looks kind of like the following:
ID | ID_PARENT | NAME | LINK
------------------------------------------
1 Main ~/
2 1 Sub1 ~/Sub1
3 1 Sub2 ~/Sub2
4 2 Sub1.1 ~/Sub1.1
5 2 Sub1.2 ~/Sub1.2
6 2 Sub1.3 ~/Sub1.3
The ID_PARENT column refers to the ID column of another row in the same table.
Now what I need is a query that, for each row, gives me the number of rows in the current query set that have the current ID as ID_PARENT, so it basically counts the children. The count has to be taken over the current query set because there are other navigation entries that some users do not have the rights to, and I want to avoid running the same subquery twice. With the example above, the result I need should look like the following:
ID | ID_PARENT | NAME | LINK | CHILDREN
---------------------------------------------------------
1 Main ~/ 2
2 1 Sub1 ~/Sub1 3
3 1 Sub2 ~/Sub2 0
4 2 Sub1.1 ~/Sub1.1 0
5 2 Sub1.2 ~/Sub1.2 0
6 2 Sub1.3 ~/Sub1.3 0
I have tried a fair share of SQL queries, but none of them get me the result I need. Can anybody help me with this?
You can count the records for each ID_PARENT separately and then join that result to your main query. Something like this:
SELECT A.*, COALESCE(B.RC ,0) AS CHILDREN_NUMBER
FROM YOURTABLE A
LEFT JOIN ( SELECT ID_PARENT,COUNT(*) AS RC FROM YOURTABLE GROUP BY ID_PARENT) B ON A.ID = B.ID_PARENT;
Output:
ID ID_PARENT NAME LINK CHILDREN_NUMBER
1 NULL Main / 2
2 1 SUB1 /Sub1 3
3 1 SUB2 /Sub2 0
4 2 SUB1.1 /Sub1.1 0
5 2 SUB1.2 /Sub1.2 0
6 2 SUB1.3 /Sub1.3 0
For example
with q(ID, ID_PARENT, NAME, LINK) as (
-- original query
)
select ID, ID_PARENT, NAME, LINK
,(select count(*) from q q2 where q2.ID_PARENT = q.ID) CHILDREN
from q
Try it like this; it is the same approach as the answer above by etsa.
select
n.id,n.parent_id,n.name,n.link,coalesce(b.children,0)
from navigation n
left join (select
parent_id as parent,count(id) as children
from navigation group by parent_id) b
on n.id=b.parent;
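The answers above can be combined into one runnable sketch: wrap the permission-filtered query in a CTE once, then count children from that same CTE. The example uses SQLite through Python's sqlite3 module rather than Oracle, and the Navigation table name and the child_counts helper are assumptions; the body of the CTE q stands in for the original multilevel subquery.

```python
import sqlite3

# Sample navigation rows from the question: (ID, ID_PARENT, NAME, LINK).
NAV = [
    (1, None, "Main", "~/"),
    (2, 1, "Sub1", "~/Sub1"),
    (3, 1, "Sub2", "~/Sub2"),
    (4, 2, "Sub1.1", "~/Sub1.1"),
    (5, 2, "Sub1.2", "~/Sub1.2"),
    (6, 2, "Sub1.3", "~/Sub1.3"),
]

# q is written once and referenced twice, so the children are counted
# only among the rows that q itself returns.
QUERY = """
with q as (
    select ID, ID_PARENT, NAME, LINK from Navigation  -- placeholder for the original query
)
select q.ID, coalesce(c.CHILDREN, 0) as CHILDREN
from q
left join (
    select ID_PARENT, count(*) as CHILDREN from q group by ID_PARENT
) c on c.ID_PARENT = q.ID
order by q.ID
"""


def child_counts():
    """Return {ID: number of direct children within the query set}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Navigation (ID integer primary key, ID_PARENT integer, "
        "NAME text, LINK text)"
    )
    conn.executemany("insert into Navigation values (?, ?, ?, ?)", NAV)
    counts = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return counts


print(child_counts())
```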

PostgreSQL Order by stepping numbers

I need to order records from a table by a column. The old system the customer was using manually selected level 1 items, then all the children of level 1 items for level 2, then so on and so forth through level 5. That is horrible IMHO, as it requires hundreds of queries and calls to the DB.
So in the new DB structure I'm trying to make it all one query to the DB if possible and have it order it correctly the first time. The customer wants it displayed to them this way so I have no choice but to figure out a way to order this way.
This is an example of the items and their level codes (1 being the single digit codes, 2 the 2 digit codes, 3 for 4 digit codes, 4 for 6 digit codes and level 5 for 8 digit codes):
The ordering is supposed to work like this: everything that starts with a 5 goes under code 5, and everything that starts with 51 goes under code 51. The n_mad_id column holds the ID of the code's "mother", so code 51's mother is code 5, code 5101's mother is code 51, code 5201's mother is code 52, and so on.
Then the n_nivel column is the level that the code belongs to. Each code has a level and a mother. The top level codes (i.e. 1, 2, 3, 4, 5) are all level 1 since they are only one digit.
I was hoping that there might be an easy ORDER BY way to do this. I've been playing with it for two days and can't seem to get it to obey.
The absolutely simplest way would be to cast the n_cod field to text and then order on that:
SELECT *
FROM mytable
WHERE left(n_cod::text, 1) = '5' -- optional
ORDER BY n_cod::text;
Not pretty, but functional.
You could consider changing your table definition to make n_cod of type char(8) because you do not use it as a number anyway (in the sense of performing calculations). That would avoid the cast in the ORDER BY clause and would let an index on n_cod support the sort.
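A quick way to see why ordering on the text form of the code works is to run it on a handful of codes. The sketch below uses SQLite through Python's sqlite3 module instead of PostgreSQL, so the cast is spelled CAST(n_cod AS TEXT); the sample codes and the ordered_codes helper are hypothetical.

```python
import sqlite3

# Hypothetical codes following the scheme in the question (1, 2, 4 and 6 digits).
CODES = [5201, 5, 510101, 41, 52, 4, 5101, 51]


def ordered_codes():
    """Return the codes ordered by their text form, which groups children under parents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (n_cod integer)")
    conn.executemany("insert into mytable values (?)", [(c,) for c in CODES])
    rows = conn.execute(
        "select n_cod from mytable order by cast(n_cod as text)"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(ordered_codes())
```

Because each child code starts with its mother's code, the text comparison places every code directly before its descendants, whereas a numeric sort would interleave the levels.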
Interesting task. As I understand it, you want to get the result in an order like
n_id n_cod n_nivel n_mad_id
10 5 1 0
11 51 2 10
12 5101 3 11
14 510101 4 12
...
13 52 2 10
...
?
If so, this may do the trick:
with recursive
tt(n_id, n_mad_id, n_cod, x) as (
select t.n_id, t.n_mad_id, t.n_cod, array[t.n_id]
from yourtable t where t.n_mad_id = 0
union all
select t.n_id, t.n_mad_id, t.n_cod, x || t.n_id
from tt join yourtable t on t.n_mad_id = tt.n_id)
select * from tt order by x;
Here is my original test query:
create table t(id, parent) as values
(1, null),
(3, 1),
(7, 3),
(5, 3),
(6, 5),
(2, null),
(8, 2),
(4, 2);
with recursive
tt(id, parent, x) as (
select t.id, t.parent, array[t.id] from t where t.parent is null
union all
select t.id, t.parent, x || t.id from tt join t on t.parent = tt.id)
select * from tt order by x;
and its result:
id | parent | x
----+--------+-----------
1 | (null) | {1}
3 | 1 | {1,3}
5 | 3 | {1,3,5}
6 | 5 | {1,3,5,6}
7 | 3 | {1,3,7}
2 | (null) | {2}
4 | 2 | {2,4}
8 | 2 | {2,8}
(8 rows)
Read about recursive queries.

Pre-order sorting of parents and children

Given the following data:
id | parent | sort
--------------------
1 | null | 0
2 | null | 1
3 | 1 | 0
4 | 1 | 1
5 | 3 | 0
6 | 5 | 0
7 | 2 | 0
How do I do a pre-order sort, meaning parents first, then children, then grandchildren, etc...?
The sorted result I'm looking for is: 1, 3, 5, 6, 4, 2, 7
If at all possible, I'd like to do this without using a CTE (or with a CTE I can understand). The way I'm doing it now is just selecting every record and checking "upwards" to see if there are any parents, grandparents and great-grandparents. It makes more sense to start with the records that don't have a parent (the top items) and keep going until there are no more children, right?
I just can't wrap my head around this...
This is an oversimplification of my actual query, but what I'm doing now is along the lines of:
SELECT ..some columns ..
FROM table t
LEFT JOIN table tparent ON tparent.ID = t.Parent
LEFT JOIN table tgrandparent ON tgrandparent.ID = tparent.Parent
LEFT JOIN table tgreatgrandparent ON tgreatgrandparent.ID = tgrandparent.Parent
This does use CTEs, but hopefully I can explain their usage:
;With ExistingQuery (id,parent,sort) as (
select 1,null,0 union all
select 2,null,1 union all
select 3,1 ,0 union all
select 4,1 ,1 union all
select 5,3 ,0 union all
select 6,5 ,0 union all
select 7,2 ,0
), Reord as (
select *,ROW_NUMBER() OVER (ORDER BY parent,sort) as rn from ExistingQuery
), hier as (
select id,parent,'/' + CONVERT(varchar(max),rn)+'/' as part
from Reord
union all
select h.id,r.parent,'/' + CONVERT(varchar(max),r.rn) + part
from hier h
inner join
Reord r
on
h.parent = r.id
)
select
e.*
from
hier h
inner join
ExistingQuery e
on
h.id = e.id
where
h.parent is null
order by CONVERT(hierarchyid,h.part)
ExistingQuery is just whatever you've currently got for your query. You should be able to just place your existing query in there (possibly with an expanded column list) and everything should just work.
Reord addresses a concern of mine, but it may not be needed: if your actual data is such that the id values are already in the right order and sort can be ignored, then remove Reord and replace the references to rn with id. As written, this CTE does the work of making sure that the children of each parent respect the sort column.
Finally, the hier CTE is the meat of this solution. For every row, it builds up a hierarchyid for that row, starting from the child and working back up the tree until we hit the root.
And once the CTEs are done with, we join back to ExistingQuery so that we're just getting the data from there, but can use the hierarchyid to perform proper sorting - that type already knows how to correctly sort hierarchical data.
Result:
id parent sort
----------- ----------- -----------
1 NULL 0
3 1 0
5 3 0
6 5 0
4 1 1
2 NULL 1
7 2 0
And the result showing the part column from hier, which may help you see what that CTE constructed:
id parent sort part
----------- ----------- ----------- --------------
1 NULL 0 /1/
3 1 0 /1/3/
5 3 0 /1/3/6/
6 5 0 /1/3/6/7/
4 1 1 /1/4/
2 NULL 1 /2/
7 2 0 /2/5/
(You may also want to change the final SELECT to just SELECT * from hier to also get a feel for how that CTE works)
I finally dove into CTEs and got it working; here is the base of the query in case anyone else comes across this. It's important to note that sort is a padded string, starting at 0000000001 and counting upwards.
WITH recursive_CTE (id, parentId, sort)
AS
(
-- CTE ANCHOR SELECTS ROOTS --
SELECT t.ID AS id, t.Parent as parentId, t.sort AS sort
FROM table t
WHERE t.Parent IS NULL
UNION ALL
-- CTE RECURSIVE SELECTION --
SELECT t.ID AS id, t.Parent as parentId, cte.sort + t.sort AS sort
FROM table t
INNER JOIN recursive_CTE cte ON cte.id = t.Parent
)
SELECT * FROM recursive_CTE
ORDER BY sort
I believe this is the main part needed to make this kind of query work. It's actually pretty fast if you make sure you're hitting the necessary indices.
Sort is built up by expanding a string.
So a parent would have sort '0000000001', his direct child will have '00000000010000000001' and his grandchild will have '000000000100000000010000000001' etc. His sibling starts at '0000000002' and so comes after all the 01 records.
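The padded-string approach described above can be tried on the sample data from the question. This is a sketch on SQLite through Python's sqlite3 module, not the original query: the items table name and the preorder_ids helper are assumptions, and because the sample sort values are integers, the padding is applied inside the query with printf instead of being stored in the column. It also assumes sort values are unique among siblings.

```python
import sqlite3

# Sample rows from the question: (id, parent, sort).
ROWS = [
    (1, None, 0),
    (2, None, 1),
    (3, 1, 0),
    (4, 1, 1),
    (5, 3, 0),
    (6, 5, 0),
    (7, 2, 0),
]

# Roots start the path with their own padded sort value; each child appends
# its padded sort value to the path inherited from its parent.
QUERY = """
with recursive tree(id, parent, sort_path) as (
    select id, parent, printf('%010d', sort)
    from items
    where parent is null
    union all
    select i.id, i.parent, t.sort_path || printf('%010d', i.sort)
    from items i
    inner join tree t on t.id = i.parent
)
select id from tree order by sort_path
"""


def preorder_ids():
    """Return the ids in pre-order: parent, then its children, depth first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table items (id integer primary key, parent integer, sort integer)")
    conn.executemany("insert into items values (?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(preorder_ids())
```

A parent's path is always a prefix of its descendants' paths, which is why the parent sorts first and its whole subtree follows before the next sibling.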

In SQL, find duplicates in one column with unique values for another column

So I have a table of aliases linked to record ids. I need to find duplicate aliases with unique record ids. To explain better:
ID Alias Record ID
1 000123 4
2 000123 4
3 000234 4
4 000123 6
5 000345 6
6 000345 7
The result of a query on this table should be something to the effect of
000123 4 6
000345 6 7
Indicating that both record 4 and 6 have an alias of 000123 and both record 6 and 7 have an alias of 000345.
I was looking into using GROUP BY but if I group by alias then I can't select record id and if I group by both alias and record id it will only return the first two rows in this example where both columns are duplicates. The only solution I've found, and it's a terrible one that crashed my server, is to do two different selects for all the data and then join them
ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]
Are there any solutions out there that would work better? As in, not crash my server when run on a few hundred thousand records?
It looks as if you have two requirements:
Identify all aliases that have more than one record id, and
List the record ids for these aliases horizontally.
The first is a lot easier to do than the second. Here's some SQL that ought to get you where you want with the first:
WITH A -- Get a list of unique combinations of Alias and [Record ID]
AS (
SELECT Distinct
Alias
, [Record ID]
FROM T1
)
, B -- Get a list of all those Alias values that have more than one [Record ID] associated
AS (
SELECT Alias
FROM A
GROUP BY
Alias
HAVING COUNT(*) > 1
)
SELECT A.Alias
, A.[Record ID]
FROM A
JOIN B
ON A.Alias = B.Alias
Now, as for the second. If you're satisfied with the data in this form:
Alias Record ID
000123 4
000123 6
000345 6
000345 7
... you can stop there. Otherwise, things get tricky.
The PIVOT command will not necessarily help you, because it's trying to solve a different problem than the one you have.
I am assuming that you can't necessarily predict how many duplicate Record ID values you have per Alias, and thus don't know how many columns you'll need.
If you have only two, then displaying each of them in a column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? A web page? Excel?) might be able to do a better job of displaying them horizontally than SQL Server can do in returning them arranged horizontally.
Perhaps what you want is just the min() and max() of RecordId:
select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)
You can also count the number of distinct values, using count(distinct):
select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;
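As a runnable check of the count(distinct) query on the sample data from the question, here is a sketch using SQLite through Python's sqlite3 module. The table name Aliases and the shared_aliases helper are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (ID, Alias, RecordID).
ROWS = [
    (1, "000123", 4),
    (2, "000123", 4),
    (3, "000234", 4),
    (4, "000123", 6),
    (5, "000345", 6),
    (6, "000345", 7),
]

# Keep only aliases that point at more than one distinct record id.
QUERY = """
select Alias, count(distinct RecordID) as NumRecordIds, min(RecordID), max(RecordID)
from Aliases
group by Alias
having count(distinct RecordID) > 1
order by Alias
"""


def shared_aliases():
    """Return (Alias, NumRecordIds, min RecordID, max RecordID) per shared alias."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Aliases (ID integer primary key, Alias text, RecordID integer)")
    conn.executemany("insert into Aliases values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(shared_aliases())
```

Since this is a single grouped scan rather than a self-join, it avoids pairing every row with every other row that has the same alias.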
This will give all repeated values:
select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);
I agree with Ann L's answer but would like to show how you can use window functions with CTEs, as you may prefer the readability.
(Re: how to pivot horizontally, I again agree with Ann)
create temporary table things (
id serial primary key,
alias varchar,
record_id int
);
insert into things (alias, record_id) values
('000123', 4),
('000123', 4),
('000234', 4),
('000123', 6),
('000345', 6),
('000345', 7);
with
things_with_distinct_aliases_and_record_ids as (
select distinct on (alias, record_id)
id,
alias,
record_id
from things
),
things_with_unique_record_id_counts_per_alias as (
select *,
COUNT(*) OVER(PARTITION BY alias) as unique_record_ids_count
from things_with_distinct_aliases_and_record_ids
)
select * from things_with_unique_record_id_counts_per_alias
where unique_record_ids_count > 1
The first CTE gets all the unique alias/record id combinations. E.g.
id | alias | record_id
----+--------+-----------
1 | 000123 | 4
4 | 000123 | 6
3 | 000234 | 4
5 | 000345 | 6
6 | 000345 | 7
The second CTE simply creates a new column for the above and adds the count of record ids for each alias. This allows you to filter only those aliases which have more than one record id associated with them.
id | alias | record_id | unique_record_ids_count
----+--------+-----------+-------------------------
1 | 000123 | 4 | 2
4 | 000123 | 6 | 2
3 | 000234 | 4 | 1
5 | 000345 | 6 | 2
6 | 000345 | 7 | 2
SELECT A.CitationId,B.CitationId, A.CitationName, A.LoaderID, A.PrimaryReferenceLoaderID,B.SecondaryReference1LoaderID, A.SecondaryReference1LoaderID, A.SecondaryReference2LoaderID,
A.SecondaryReference3LoaderID, A.SecondaryReference4LoaderID, A.CreatedOn, A.LastUpdatedOn
FROM CitationMaster A, CitationMaster B
WHERE A.PrimaryReferenceLoaderID= B.SecondaryReference1LoaderID and Isnull(A.PrimaryReferenceLoaderID,'') != '' and Isnull(B.SecondaryReference1LoaderID,'') !=''