Combine values into string inside recursive CTE - sql

I am using a recursive CTE to navigate down a tree structure of id values. At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far. At first glance it seems like I would need a group by or an aggregate function (like string_agg()) to accomplish this, but these are not allowed in the recursive part of a recursive CTE. My question seems to be similar to the one asked in "recursive CTE to combine values", but I am hoping for a slightly more straightforward answer that relates to the sample data I have added below.
This first table is essentially the top of the tree. There could be one or multiple nodes at the top depending on how many records are in this first table. Here there is just 1 record:
name_id  group_id
-------  --------
100      15
where name_id you could think of as the name of the tree, which is essentially irrelevant to this example since there is only one name_id, and group_id is the top of the tree which will then branch out based on the following table:
group_id  group_id_member
--------  ---------------
10        15
11        15
4         10
11        4
3         11
10        3
where the group_id_member is saying that, for example, the group_id from the first table, 15, is a member of the group_id 10 and also 11. So the tree would look like 15 at the top, with a branch down for 10 and another branch for 11. Then off of 10 comes a branch of 4, and off of 11 comes a branch of 3. Then off of 4 comes a branch of 11, and off of 3 comes a branch of 10. Thus, this is an infinite loop since those branches are already in the table.
Essentially, the goal of this recursive query is to return all the branch nodes but not return any of them more than once. This recursive cte will run forever unless we do a check, like checking whether or not the group_id value is already in rpath. To check that, I am wanting to create a column which is a list of all nodes that have gotten hit from current and previous iterations (rpath column), and append to that column at each iteration, making the result table look something like:
name_id  group_id  group_id_member  iter  rpath
-------  --------  ---------------  ----  ---------------------
100      15        null             1     /15/
100      10        15               2     /15/10/11/
100      11        15               2     /15/10/11/
100      4         10               3     /15/10/11/4/3/
100      3         11               3     /15/10/11/4/3/
100      10        3                4     /15/10/11/4/3/10/11/
100      11        4                4     /15/10/11/4/3/10/11/
Where I included the 4th iteration in the table, which shows that the group_id values 10 and 11 are now in the rpath two times since this loop has been fully completed. Ultimately I would want to terminate this recursion at the 3rd iteration since at the 4th, for both group_id values, they appear in the rpath list already.
So my primary question is, what is the best way to create this rpath string inside the recursive part of the cte? See below for code to create the temp tables, and my code which attempts to solve this but is returning an error since I am trying to use string_agg() in the recursive part of the cte which is not allowed. Thanks!
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (group_id int, group_id_member int)
INSERT into #t1 VALUES
(10,15),
(11, 15),
(4, 10),
(11, 4),
(3, 11),
(10, 3);
if object_id('tempdb..#t2') is not null drop table #t2
CREATE TABLE #t2 (name_id int, group_id int)
INSERT into #t2 VALUES
(100, 15)
; with rec as (
select name_id,
group_id,
cast(null as int) as group_id_member,
1 as iter,
convert(varchar(128),concat('/', str_agg.agg, '/')) as rpath
from #t2
cross apply (select string_agg(group_id, '/') as agg from #t2) as str_agg -- this does work since it is not in the recursive part of the cte
union all
select rec.name_id,
t1.group_id,
t1.group_id_member,
iter + 1 as iter,
convert(varchar(128), concat(rec.rpath, str_agg1.agg1, '/')) as rpath -- trying to create the string that represents all of the nodes that have been visited
from #t1 t1
inner join rec
on t1.group_id_member = rec.group_id
cross apply (select string_agg(t1.group_id_member ,'/') as agg1 from #t1 t1 where t1.group_id_member = rec.group_id) str_agg1 -- cross apply to create the string_agg, if this was allowed in recursive cte...
where rec.rpath not like '%/' + convert(varchar(128), t1.group_id) + '/%'
)
select * from rec order by iter asc

At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far
As I understand your question, you don’t need aggregation here. The recursive query processes iteratively, so you can just accumulate the ids of the visited nodes along the way, using concat.
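with rec as (
    -- anchor: start the path at the root node, delimited on both sides
    select
        name_id,
        group_id,
        cast(null as int) as group_id_member,
        1 as iter,
        concat('/', convert(varchar(max), group_id), '/') as rpath
    from #t2
    union all
    -- recursive step: append each newly visited node to its parent's path
    select
        r.name_id,
        t1.group_id,
        t1.group_id_member,
        r.iter + 1 as iter,
        concat(r.rpath, t1.group_id, '/')
    from #t1 t1
    inner join rec r
        on t1.group_id_member = r.group_id
    -- skip nodes already on this path; the trailing '/' lets the last node on the path match too
    where r.rpath not like concat('%/', t1.group_id, '/%')
)
select * from rec order by iter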
Demo on DB Fiddle
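Note that a path check of this kind only stops a branch from revisiting its own ancestors, so a node that is reachable through two different branches (10 and 11 in the sample data) can still come back more than once. A minimal sketch of one way to collapse the result to one row per node, assuming a delimited path like the one above. The table variables @t1 and @t2 and the column name first_iter are stand-ins introduced here for illustration, not part of the original answer:

```sql
-- Sample data copied from the question, held in table variables (hypothetical names)
declare @t1 table (group_id int, group_id_member int);
insert into @t1 values (10, 15), (11, 15), (4, 10), (11, 4), (3, 11), (10, 3);

declare @t2 table (name_id int, group_id int);
insert into @t2 values (100, 15);

with rec as (
    -- anchor: the path starts at the root and is delimited on both sides
    select name_id,
           group_id,
           1 as iter,
           concat('/', convert(varchar(max), group_id), '/') as rpath
    from @t2
    union all
    -- recursive step: extend the path, skipping nodes already on it
    select r.name_id,
           t1.group_id,
           r.iter + 1,
           concat(r.rpath, t1.group_id, '/')
    from @t1 t1
    inner join rec r
        on t1.group_id_member = r.group_id
    where r.rpath not like concat('%/', t1.group_id, '/%')
)
-- aggregation is allowed here because it sits outside the recursive part
select name_id,
       group_id,
       min(iter) as first_iter   -- first iteration at which the node was reached
from rec
group by name_id, group_id
order by first_iter, group_id;
```

With the sample rows this should return the nodes 15, 10, 11, 4 and 3 once each, together with the iteration at which each was first reached.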


How can I get a random number generated in a CTE not to change in JOIN?

The problem
I'm generating a random number for each row in a table #Table_1 in a CTE, using this technique. I'm then joining the results of the CTE on another table, #Table_2. Instead of getting a random number for each row in #Table_1, I'm getting a new random number for every resulting row in the join!
CREATE TABLE #Table_1 (Id INT)
CREATE TABLE #Table_2 (MyId INT, ParentId INT)
INSERT INTO #Table_1
VALUES (1), (2), (3)
INSERT INTO #Table_2
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (1, 3)
;WITH RandomCTE AS
(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId
The results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 2
1 3 0
1 4 3
2 1 4
2 2 0
2 3 0
3 1 3
The desired results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 1
1 3 1
1 4 1
2 1 4
2 2 4
2 3 4
3 1 3
What I tried
I tried to obscure the logic of the random number generation from the optimizer by casting the random number to VARCHAR, but that did not work.
What I don't want to do
I'd like to avoid using a temporary table to store the results of the CTE.
How can I generate a random number for a table and preserve that random number in a join without using temporary storage?
This seems to do the trick:
WITH CTE AS(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1),
RandomCTE AS(
SELECT Id,
RandomNumber
FROM CTE
GROUP BY ID, RandomNumber)
SELECT *
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId;
It looks like SQL Server treats RandomNumber, even outside the CTE, as just NEWID() with some additional functions wrapped around it (DB<>Fiddle), and hence it still evaluates it for each row of the join. The GROUP BY clause in the second CTE therefore forces the data engine to compute a value for RandomNumber so that it can perform the grouping.
Per the quote in this answer
The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-established tenet. It's the
fundamental 'leeway' that allows the optimizer enough freedom to gain
significant improvements in query-plan execution.
If it is important for your application that the random number be evaluated once and only once you should calculate it up front and store it into a temp table.
Anything else is not guaranteed and so is irresponsible to add into your application's code base - as even if it works now it may break as a result of a schema change/execution plan change/version upgrade/CU install.
For example Lamu's answer breaks if a unique index is added to #Table_1 (Id)
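A minimal sketch of that up-front approach, using the sample tables from the question. The name #RandomNumbers is hypothetical, and DROP TABLE IF EXISTS assumes SQL Server 2016 or later. Because SELECT ... INTO writes the values to a table before the join runs, every joined row for a given Id reads the same stored number:

```sql
-- Start from a clean slate (SQL Server 2016+ syntax)
DROP TABLE IF EXISTS #Table_1, #Table_2, #RandomNumbers;

CREATE TABLE #Table_1 (Id INT);
CREATE TABLE #Table_2 (MyId INT, ParentId INT);

INSERT INTO #Table_1 VALUES (1), (2), (3);
INSERT INTO #Table_2
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (1, 3);

-- Evaluate the random expression exactly once per row of #Table_1 and persist it
SELECT Id,
       ABS(CHECKSUM(NEWID())) % 5 AS RandomNumber
INTO #RandomNumbers
FROM #Table_1;

-- The join now reads stored values, so the number is stable per Id
SELECT r.Id, t.MyId, r.RandomNumber
FROM #RandomNumbers r
INNER JOIN #Table_2 t
    ON r.Id = t.ParentId
ORDER BY r.Id, t.MyId;
```

This does use temporary storage, which the question hoped to avoid, but it does not depend on how the optimizer chooses to evaluate NEWID().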
How about not using a real random number at all? Use rand() with a seed:
WITH RandomCTE AS (
    SELECT Id,
           CONVERT(INT, RAND(ROW_NUMBER() OVER (ORDER BY NEWID()) * 999999) * 5) as RandomNumber
    FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r INNER JOIN
     #Table_2 t
     ON r.Id = t.ParentId;
The seed argument to rand() is pretty awful. Values of the seed near each other produce similar initial values, which is the reason for the multiplication.
Here is the db<>fiddle.

SQL - List all pages in between record while maintaining ID key

I'm trying to come up with a useful way to list all pages between the first and last page of a document as new rows while maintaining the ID number as a key, or cross reference. I have a few ways of getting the pages in between, but I'm not exactly sure how to maintain the key in a programmatic way.
Example Input:
First Page  Last Page  ID
ABC_001     ABC_004    1
ABC_005     ABC_005    2
ABC_006     ABC_010    3
End Result:
All Pages ID
ABC_001 1
ABC_002 1
ABC_003 1
ABC_004 1
ABC_005 2
ABC_006 3
ABC_007 3
ABC_008 3
ABC_009 3
ABC_010 3
Any help is much appreciated. I'm using SQL Server Management Studio.
One approach would be to set up a numbers table, that contains a list of numbers that you may possibly find in the column content:
CREATE TABLE numbers( idx INTEGER);
INSERT INTO numbers VALUES(1);
INSERT INTO numbers VALUES(2);
...
INSERT INTO numbers VALUES(10);
Now, assuming that all page values have 7 characters, with the last 3 being digits, we can JOIN the original table with the numbers table to generate the missing records:
SELECT
    CONCAT(
        SUBSTRING(t.First_Page, 1, 4),
        REPLICATE('0', 3 - LEN(n.idx)),
        n.idx
    ) AS [All Pages],
    t.id
FROM
    mytable t
    INNER JOIN numbers n
        ON n.idx >= CAST(SUBSTRING(t.First_Page, 5, 3) AS int)
        AND n.idx <= CAST(SUBSTRING(t.Last_Page, 5, 3) AS int)
This demo on DB Fiddle with your sample data returns:
All Pages | id
:-------- | -:
ABC_001 | 1
ABC_002 | 1
ABC_003 | 1
ABC_004 | 1
ABC_005 | 2
ABC_006 | 3
ABC_007 | 3
ABC_008 | 3
ABC_009 | 3
ABC_010 | 3
To find all pages from First Page to Last Page per Book ID, CAST your page numbers from STRING to INTEGER, then add +1 to each page number until you reach the Last Page.
First, turn your original table into a table variable with integer data types using TRY_CAST.
DECLARE @Book TABLE (
    [ID] INT
    ,[FirstPage] INT
    ,[LastPage] INT
)
INSERT INTO @Book
SELECT [ID]
    ,TRY_CAST(RIGHT([FirstPage], 3) AS int) AS [FirstPage]
    ,TRY_CAST(RIGHT([LastPage], 3) AS int) AS [LastPage]
FROM [YourOriginalTable]
Set the maximum page that your pages will increment to using a variable. This will cap your results at the correct number of pages. Otherwise your table would have many more rows than you need.
DECLARE @LastPage INT
SELECT @LastPage = MAX([LastPage]) FROM @Book
Turning a three-column table (ID, First Page, Last Page) into a two-column table (ID, Page) will require an UNPIVOT.
We're tucking that UNPIVOT into a CTE (Common Table Expression), which is basically a smart version of a temporary table (like a #TempTable or @TableVariable) that you can only use once and that is a little more efficient in certain circumstances.
In addition to the UNPIVOT of your [FirstPage] and [LastPage] columns into a tall table, we're going to append every other combination of page number per ID using a UNION ALL.
;WITH BookCTE AS (
    SELECT [ID]
        ,[Page]
    FROM (SELECT [ID]
            ,[FirstPage]
            ,[LastPage]
          FROM @Book) AS bp
    UNPIVOT
    (
        [Page] FOR [Pages] IN ([FirstPage], [LastPage])
    ) AS up
    UNION ALL
    SELECT [ID], [Page] + 1 FROM BookCTE WHERE [Page] + 1 < @LastPage
)
Now that your data is held in a table format using a CTE with all combinations of [ID] and [Page] up to the maximum page in your @Book table, it's time to join your CTE with the @Book table.
SELECT DISTINCT
    cte.ID
    ,cte.Page
FROM BookCTE AS cte
INNER JOIN @Book AS bk
    ON bk.ID = cte.ID
WHERE cte.Page <= bk.[LastPage]
ORDER BY
    cte.ID
    ,cte.Page
OPTION (MAXRECURSION 10000)
See also:
How to generate a range of numbers between two numbers (I based my code off of @Jayvee's answer)
Assigning variables using SET vs SELECT
SQL Server UNPIVOT
SQL Server CTE Basics
Recursive CTEs Explained
Note: will update with re-integrating string portion of FirstPage and LastPage (which I assume is based on book title). Stand by.
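One possible way to put the string prefix back, sketched under the assumption (also made in the earlier answer) that every page value ends in exactly three digits. @Pages, Prefix, PageNo and LastNo are names invented for this illustration, and the recursion here walks each row from its own first page to its own last page instead of using UNPIVOT:

```sql
-- Hypothetical sample table mirroring the question's input
DECLARE @Pages TABLE (FirstPage varchar(20), LastPage varchar(20), ID int);
INSERT INTO @Pages VALUES
    ('ABC_001', 'ABC_004', 1),
    ('ABC_005', 'ABC_005', 2),
    ('ABC_006', 'ABC_010', 3);

WITH Expanded AS (
    -- anchor: split each row into its text prefix and numeric page bounds
    SELECT ID,
           LEFT(FirstPage, LEN(FirstPage) - 3) AS Prefix,
           CAST(RIGHT(FirstPage, 3) AS int)    AS PageNo,
           CAST(RIGHT(LastPage, 3) AS int)     AS LastNo
    FROM @Pages
    UNION ALL
    -- recursive step: advance one page at a time until the row's last page
    SELECT ID, Prefix, PageNo + 1, LastNo
    FROM Expanded
    WHERE PageNo < LastNo
)
-- re-attach the prefix and left-pad the page number to three digits
SELECT CONCAT(Prefix, RIGHT('000' + CAST(PageNo AS varchar(10)), 3)) AS [All Pages],
       ID
FROM Expanded
ORDER BY ID, PageNo
OPTION (MAXRECURSION 1000);
```

The MAXRECURSION hint only matters when a single document spans more than 100 pages, which is the default recursion limit.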

create a table of duplicated rows of another table using the select statement

I have a table with one column containing different integers.
For each integer in the table, I would like to duplicate it as many times as it has digits.
For example:
12345 (5 digits):
1. 12345
2. 12345
3. 12345
4. 12345
5. 12345
I thought of doing it using with recursive t (...) as (), but I didn't manage, since I don't really understand how it works and what is happening "behind the scenes".
I don't want to use insert because I want it to be scalable and automatic for as many integers as needed in a table.
Any thoughts and an explanation would be great.
The easiest way is to join to a table with numbers from 1 to n in it.
SELECT n, x
FROM yourtable
JOIN
(
SELECT day_of_calendar AS n
FROM sys_calendar.CALENDAR
WHERE n BETWEEN 1 AND 12 -- maximum number of digits
) AS dt
ON n <= CHAR_LENGTH(TRIM(ABS(x)))
In my example I abused TD's builtin calendar, but that's not a good choice, as the optimizer doesn't know how many rows will be returned and as the plan must be a Product Join it might decide to do something stupid. So better use a number table...
Create a numbers table that will contain the integers from 1 to the maximum number of digits that the numbers in your table will have (I went with 6):
create table numbers(num int)
insert numbers
select 1 union select 2 union select 3 union select 4 union select 5 union select 6
You already have your table (but here's what I was using to test):
create table your_table(num int)
insert your_table
select 12345 union select 678
Here's the query to get your results:
select ROW_NUMBER() over(partition by b.num order by b.num) row_num,
       b.num,
       LEN(cast(b.num as char)) num_digits
into #temp
from your_table b
cross join numbers n
select t.num
from #temp t
where t.row_num <= t.num_digits
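The same result can probably be had without the intermediate #temp table by joining directly on the digit count. A small sketch in the same T-SQL dialect as the query above, using table variables (@numbers, @your_table) and a copy_no column that are introduced here purely for illustration:

```sql
-- Numbers 1..6, as in the answer above (6 = assumed maximum digit count)
declare @numbers table (num int);
insert into @numbers values (1), (2), (3), (4), (5), (6);

-- Test data matching the answer above
declare @your_table table (num int);
insert into @your_table values (12345), (678);

-- Each integer is paired with every number up to its digit count,
-- so it appears once per digit
select n.num as copy_no,
       t.num
from @your_table t
join @numbers n
    on n.num <= len(cast(abs(t.num) as varchar(20)))
order by t.num, n.num;
```

For the two test values this yields three rows for 678 and five rows for 12345.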
I found a nice way to perform this action. Here goes:
with recursive t (num,num_as_char,char_n)
as
(
select num
,cast (num as varchar (100)) as num_as_char
,substr (num_as_char,1,1)
from numbers
union all
select num
,substr (t.num_as_char,2) as num_as_char2
,substr (num_as_char2,1,1)
from t
where char_length (num_as_char2) > 0
)
select *
from t
order by num,char_length (num_as_char) desc

Returning rows that had no matches

I've read and read and read but I haven't found a solution to my problem.
I'm doing something like:
SELECT a
FROM t1
WHERE t1.b IN (<external list of values>)
There are other conditions of course, but this is the gist of it.
My question is: is there a way to show which of the manually entered values didn't find a match? I've looked but I can't find anything, and I'm going in circles.
Create a temp table with the external list of values, then you can do:
select item
from tmptable t
where t.item not in ( select b from t1 )
If the list is short enough, you can do something like:
with t as (
    select case when t1.b = 'FIRSTITEM' then 1 else 0 end firstfound,
           case when t1.b = '2NDITEM' then 1 else 0 end secondfound,
           case when t1.b = '3RDITEM' then 1 else 0 end thirdfound
           ...
    from t1 where t1.b in ('FIRSTITEM', '2NDITEM', '3RDITEM', ...)
)
select sum(firstfound), sum(secondfound), sum(thirdfound), ...
from t
But with proper rights, I would use Nicholas' answer.
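One caveat on the temp-table query above: NOT IN returns no rows at all if the subquery produces a NULL, so a single NULL in t1.b would hide every unmatched item. A hedged sketch of the same check written with NOT EXISTS, which is not affected by NULLs. The table definitions and sample rows below are made up for illustration, in Oracle-style syntax to match the other answers in this thread:

```sql
-- Hypothetical sample data; t1.b deliberately contains a NULL
create table t1 (b varchar2(20));
insert into t1 values ('A');
insert into t1 values (null);

create table tmptable (item varchar2(20));
insert into tmptable values ('A');
insert into tmptable values ('B');

-- Items from the list that have no matching row in t1
select t.item
from tmptable t
where not exists (
    select 1
    from t1
    where t1.b = t.item
);
```

With these rows the query returns 'B', whereas the NOT IN form would return nothing because of the NULL in t1.b.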
To display which values in the list of values haven't found a match, as one of the approaches, you could create a nested table SQL(schema object) data type:
-- assuming that the values in the list
-- are of number datatype
create type T_NumList as table of number;
and use it as follows:
-- sample of data. generates numbers from 1 to 11
with t1(col) as (
    select level
    from dual
    connect by level <= 11
)
select s.column_value as without_match
from table(t_NumList(1, 2, 15, 50, 23)) s -- here goes your list of values
left join t1 t
    on (s.column_value = t.col)
where t.col is null
;
Result:
WITHOUT_MATCH
-------------
15
50
23
SQLFiddle Demo
There is no easy way to convert an "externally provided" list into a table that can be used to do the comparison. One way is to use one of the (undocumented) system types to generate a table on the fly based on the values supplied:
with value_list (id) as (
select column_value
from table(sys.odcinumberlist (1, 2, 3)) -- this is the list of values
)
select l.id as missing_id
from value_list l
left join t1 on t1.id = l.id
where t1.id is null;
There are ways to get what you have described, but they have requirements which exceed the statement of the problem. From the minimal description provided, there's no way to have the SQL return the list of the manually-entered values that did not match.
For example, if it's possible to insert the manually-entered values into a separate table - let's call it matchtbl, with the column named b - then the following should do the job:
SELECT matchtbl.b
FROM matchtbl
WHERE matchtbl.b NOT IN (SELECT distinct b
FROM t1)
Of course, if the data is being processed by a programming language, it should be relatively easy to keep track of the set of values returned by the original query, by adding the b column to the output, and then perform the set difference.
Putting the list in an in clause makes this hard. If you can put the list in a table, then the following works:
with list as (
    select val1 as value from dual union all
    select val2 from dual union all
    . . .
    select valn from dual
)
select list.value, count(t1.b)
from list left outer join
t1
on t1.b = list.value
group by list.value;

Ordering parent rows by date descending with child rows ordered independently beneath each

This is a contrived version of my table schema to illustrate my problem:
QuoteID, Details, DateCreated, ModelQuoteID
Where QuoteID is the primary key and ModelQuoteID is a nullable foreign key back onto this table to represent a quote which has been modelled off another quote (and may have subsequently had its Details column etc changed).
I need to return a list of quotes ordered by DateCreated descending with the exception of modelled quotes, which should sit beneath their parent quote, ordered by date descending within any other sibling quotes (quotes can only be modelled one level deep).
So for example if I have these 5 quote rows:
1, 'Fix the roof', '01/01/2012', null
2, 'Clean the drains', '02/02/2012', null
3, 'Fix the roof and door', '03/03/2012', 1
4, 'Fix the roof, door and window', '04/04/2012', 1
5, 'Mow the lawn', '05/05/2012', null
Then I need to get the results back in this order:
5 - Mow the lawn
2 - Clean the drains
1 - Fix the roof
4 - -> Fix the roof, door and window
3 - -> Fix the roof and door
I'm also passing in search criteria such as keywords for Details, and I'm returning modelled quotes even if they don't contain the search term but their parent quote does. I've got that part working using a common table expression to get the original quotes, unioned with a join for modelled ones.
That works nicely but currently I'm having to do the rearrangement of the modelled quotes into the correct order in code. That's not ideal because my next step is to implement paging in the SQL, and if the rows are not grouped properly at that time then I won't have the children present in the current page to do the re-ordering in code. Generally speaking they will be naturally grouped together anyway, but not always. You could create a model quote today for a quote from a month back.
I've spent quite some time on this, can any SQL gurus help? Much appreciated.
EDIT: Here is a contrived version of my SQL to fit my contrived example :-)
;with originals as (
select
q.*
from
Quote q
where
Details like @details
)
select
*
from
(
select
o.*
from
originals o
union
select
q2.*
from
Quote q2
join
originals o on q2.ModelQuoteID = o.QuoteID
)
as combined
order by
combined.DateCreated desc
Watching the Olympics -- just skimmed your post -- looks like you want to control the sort at each level (root and one level in), and make sure the data is returned with the children directly beneath its parent (so you can page the data...). We do this all the time. You can add an order by to each inner query and create a sort column. I contrived a slightly different example that should be easy for you to apply to your circumstance. I sorted the root ascending and level one descending just to illustrate how you can control each part.
declare @tbl table (id int, parent int, name varchar(10))
insert into @tbl (id, parent, name)
values (1, null, 'def'), (2, 1, 'this'), (3, 1, 'is'), (4, 1, 'a'), (5, 1, 'test'),
       (6, null, 'abc'), (7, 6, 'this'), (8, 6, 'is'), (9, 6, 'another'), (10, 6, 'test')
;with cte (id, parent, name, sort) as (
    -- root rows: sort key is a zero-padded rank by name, ascending
    select id, parent, name, cast(right('0000' + cast(row_number() over (order by name) as varchar(4)), 4) as varchar(1024))
    from @tbl
    where parent is null
    union all
    -- child rows: parent's sort key followed by the child's own rank by name, descending
    select t.id, t.parent, t.name, cast(cte.sort + right('0000' + cast(row_number() over (order by t.name desc) as varchar(4)), 4) as varchar(1024))
    from @tbl t inner join cte on t.parent = cte.id
)
select * from cte
order by sort
This produces these results:
id parent name sort
---- -------- ------- ----------
6 NULL abc 0001
7 6 this 00010001
10 6 test 00010002
8 6 is 00010003
9 6 another 00010004
1 NULL def 0002
2 1 this 00020001
5 1 test 00020002
3 1 is 00020003
4 1 a 00020004
You can see that the root nodes are sorted ascending and the inner nodes are sorted descending.
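Because the question states that quotes are only modelled one level deep, the same ordering can probably also be reached without recursion, by self-joining each quote to its parent and sorting on the parent's date first. This is a different technique from the sort-column approach above, sketched against the question's schema. The table variable @Quote and the ISO date literals are assumptions made for the example:

```sql
-- Sample rows from the question, with dates written in ISO format
declare @Quote table (
    QuoteID      int primary key,
    Details      varchar(100),
    DateCreated  date,
    ModelQuoteID int null
);
insert into @Quote values
    (1, 'Fix the roof',                  '2012-01-01', null),
    (2, 'Clean the drains',              '2012-02-02', null),
    (3, 'Fix the roof and door',         '2012-03-03', 1),
    (4, 'Fix the roof, door and window', '2012-04-04', 1),
    (5, 'Mow the lawn',                  '2012-05-05', null);

select q.QuoteID, q.Details
from @Quote q
left join @Quote p
    on p.QuoteID = q.ModelQuoteID                    -- p is the parent quote, if any
order by
    coalesce(p.DateCreated, q.DateCreated) desc,     -- families ordered by the parent's date
    coalesce(p.QuoteID, q.QuoteID),                  -- keeps each family together on date ties
    case when q.ModelQuoteID is null then 0 else 1 end,  -- parent before its children
    q.DateCreated desc;                              -- children newest first
```

With the five sample rows this returns the quotes in the order 5, 2, 1, 4, 3, which matches the order requested in the question, and since each family is contiguous the result can be paged directly.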