Dynamic table created from CTE (parent/child) - sql

If I have a very simple table called tree
create table if not exists tree (id int primary key, parent int, name text);
And a few rows of data
insert into tree values (1, null, 'A');
insert into tree values (2, 1, 'B');
insert into tree values (3, 1, 'C');
insert into tree values (4, 2, 'D');
insert into tree values (5, 2, 'E');
insert into tree values (6, 3, 'F');
insert into tree values (7, 3, 'G');
I can easily run a recursive CTE on it and produce output with a path for each row, like this:
with recursive R(id, level, path, name) as (
select id,1,name,name from tree where parent is null
union select tree.id, level + 1, path || '.' || tree.name, tree.name from tree join R on R.id=tree.parent
) select level,path,name from R;
Which gives the output
level | path | name
-------+-------+------
1 | A | A
2 | A.B | B
2 | A.C | C
3 | A.B.D | D
3 | A.B.E | E
3 | A.C.F | F
3 | A.C.G | G
What I'm wondering is whether it is possible to project this output into another table, dynamically creating columns based on level (level1, level2, level3, etc.), giving me something like this in return:
id | level1 | level2 | level3
---+--------+--------+-------
1 | A | |
2 | A | B |
3 | A | C |
4 | A | B | D
5 | A | B | E
6 | A | C | F
7 | A | C | G
Any help would be appreciated.

If you know the maximum depth of your tree, I'd keep your approach and simplify it using array concatenation to produce the desired output.
So for a five-level tree, that would look like this:
WITH RECURSIVE R(id, path) AS (
SELECT id, ARRAY[name::text] FROM tree WHERE parent IS NULL
UNION SELECT tree.id, path || tree.name FROM tree JOIN R ON R.id=tree.parent
)
SELECT id,
path[1] AS l1,
path[2] AS l2,
path[3] AS l3,
path[4] AS l4,
path[5] AS l5
FROM R;
PS: Sorry for not commenting on Ziggy's answer, which is very close, but I don't have enough reputation to do so. I don't see why you would need a window function here.

PostgreSQL requires the names and types of the output columns to be known before the query runs, so you can't have the levelX columns produced dynamically. However, you can do the following:
with recursive
R(id, path) as (
select id,ARRAY[name::text] from tree where parent is null
union
select tree.id, path || tree.name::text from tree join R on R.id=tree.parent
)
select row_number() over (order by cardinality(path), path), id,
path[1] as level1, path[2] as level2, path[3] as level3
from R
order by 1
In the example above, the row_number column happens to match id, but that probably wouldn't happen with your real data.
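Since the column list has to be fixed in the SQL itself, one way to get a truly dynamic number of level columns is to build them on the client. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, the helper name level_columns is hypothetical, and it assumes names never contain the '.' separator.

```python
import sqlite3

def level_columns(conn):
    """Return (header, rows) with one levelN column per tree depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE r(id, path) AS (
            SELECT id, name FROM tree WHERE parent IS NULL
            UNION ALL
            SELECT tree.id, r.path || '.' || tree.name
            FROM tree JOIN r ON r.id = tree.parent
        )
        SELECT id, path FROM r ORDER BY id
        """
    ).fetchall()
    # Assumption: names never contain '.', so splitting the path is safe.
    split = [(row_id, path.split(".")) for row_id, path in rows]
    depth = max((len(parts) for _, parts in split), default=0)
    header = ["id"] + [f"level{n}" for n in range(1, depth + 1)]
    # Pad shorter paths with None so every row has the same width.
    table = [[row_id] + parts + [None] * (depth - len(parts))
             for row_id, parts in split]
    return header, table

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INT PRIMARY KEY, parent INT, name TEXT)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?)",
    [(1, None, "A"), (2, 1, "B"), (3, 1, "C"), (4, 2, "D"),
     (5, 2, "E"), (6, 3, "F"), (7, 3, "G")],
)
header, table = level_columns(conn)
for row in [header] + table:
    print(row)
```

The number of level columns follows the deepest path found in the data, so nothing has to be hard-coded in the query.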


What is the best possible implementation for the following recursively query?

I got a table with the following struct representing a file system.
Every item, might be a file or a folder, has a unique id. If it is a category(folder), it contains other files.
level indicates the directory depth.
|id |parent_id|is_category|level|
|:-:|:-:|:-:|:-:|
|0 | -1 | true | 0 |
|1 | 0 | true | 1 |
|2 | 0 | true | 1 |
|3 | 1 | true | 2 |
|4 | 2 | false | 2 |
|5 | 3 | true | 3 |
|6 | 5 | false | 4 |
|7 | 5 | false | 4 |
|8 | 5 | true | 4 |
|9 | 5 | false | 4 |
Task:
Fetch all sub-items with level <= 3 in the folder with id == 1.
The resulting ids should be [1,3,5].
My current implementation queries recursively: for the example above, my program fetches id == 1 first and then finds all items with is_category == true and level <= 3.
It doesn't feel like an efficient way.
Any advice will be appreciated.
You don't mention the database you are using so I'll assume PostgreSQL.
You can retrieve the rows you want using a single query that uses a "Recursive CTE". Recursive CTEs are implemented by several database engines, such as Oracle, DB2, PostgreSQL, SQL Server, MariaDB, MySQL, HyperSQL, H2, Teradata, etc.
The query should take a form similar to:
with recursive x as (
select * from t where id = 1
union all
select t.*
from x
join t on t.parent_id = x.id and t.level <= 3
)
select id from x
For the record, the data script I used to test it is:
create table t (
id int,
parent_id int,
level int
);
insert into t (id, parent_id, level) values (0, -1, 0);
insert into t (id, parent_id, level) values (1, 0, 1);
insert into t (id, parent_id, level) values (2, 0, 1);
insert into t (id, parent_id, level) values (3, 1, 2);
insert into t (id, parent_id, level) values (4, 2, 2);
insert into t (id, parent_id, level) values (5, 3, 3);
insert into t (id, parent_id, level) values (6, 5, 4);
insert into t (id, parent_id, level) values (7, 5, 4);
insert into t (id, parent_id, level) values (8, 5, 4);
insert into t (id, parent_id, level) values (9, 5, 4);
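The recursive CTE above can be tried out without a database server. Below is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than PostgreSQL. The function name fetch_folder_ids is hypothetical, and the sketch adds the is_category column and filter from the question, which the test script above leaves out.

```python
import sqlite3

def fetch_folder_ids(conn, start_id, max_level):
    """Ids of the start folder plus its category descendants up to max_level."""
    sql = """
        WITH RECURSIVE x(id) AS (
            SELECT id FROM t WHERE id = ?
            UNION ALL
            SELECT t.id
            FROM x
            JOIN t ON t.parent_id = x.id
            WHERE t.is_category = 1 AND t.level <= ?
        )
        SELECT id FROM x ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (start_id, max_level))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, parent_id INT, is_category INT, level INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(0, -1, 1, 0), (1, 0, 1, 1), (2, 0, 1, 1), (3, 1, 1, 2), (4, 2, 0, 2),
     (5, 3, 1, 3), (6, 5, 0, 4), (7, 5, 0, 4), (8, 5, 1, 4), (9, 5, 0, 4)],
)
print(fetch_folder_ids(conn, 1, 3))
```

The whole subtree comes back from a single statement, so the program no longer issues one query per folder.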
As others have said, recursive CTEs are a fast and typically efficient way to retrieve the data you're looking for. If you want to avoid recursive CTEs, since they aren't infinitely scalable and can behave erratically in certain use cases, you could take a more direct approach and implement the recursive search with a WHILE loop. Note that this is not more efficient than the recursive CTE, but it gives you more control over what happens in the recursion. My sample uses Transact-SQL.
First, the setup code, like @The Impaler provided:
drop table if exists
dbo.folder_tree;
create table dbo.folder_tree
(
id int not null constraint [PK_folder_tree] primary key clustered,
parent_id int not null,
fs_level int not null,
is_category bit not null constraint [DF_folder_tree_is_category] default(0),
constraint [UQ_folder_tree_parent_id] unique(parent_id, id)
);
insert into dbo.folder_tree
(id, parent_id, fs_level, is_category)
values
(0, -1, 0, 1), --|0 | -1 | true | 0 |
(1, 0, 1, 1), --|1 | 0 | true | 1 |
(2, 0, 1, 1), --|2 | 0 | true | 1 |
(3, 1, 2, 1), --|3 | 1 | true | 2 |
(4, 2, 2, 0), --|4 | 2 | false | 2 |
(5, 3, 3, 1), --|5 | 3 | true | 3 |
(6, 5, 4, 0), --|6 | 5 | false | 4 |
(7, 5, 4, 0), --|7 | 5 | false | 4 |
(8, 5, 4, 1), --|8 | 5 | true | 4 |
(9, 5, 4, 0); --|9 | 5 | false | 4 |
And then the code for implementing a recursive search of the table via WHILE loop:
drop function if exists
dbo.folder_traverse;
go
create function dbo.folder_traverse
(
    @start_id int,
    @max_level int = null
)
returns @result table
(
    id int not null primary key,
    parent_id int not null,
    fs_level int not null,
    is_category bit not null
)
as
begin
    -- Seed the result with the starting folder.
    insert into
        @result
    select
        id,
        parent_id,
        fs_level,
        is_category
    from
        dbo.folder_tree
    where
        id = @start_id;
    -- Keep adding children until an iteration inserts no new rows.
    while @@ROWCOUNT > 0
    begin
        insert into
            @result
        select
            f.id,
            f.parent_id,
            f.fs_level,
            f.is_category
        from
            @result r
            inner join dbo.folder_tree f on
                r.id = f.parent_id
        where
            f.is_category = 1 and
            (
                @max_level is null or
                f.fs_level <= @max_level
            )
        -- Exclude rows that are already in the result.
        except
        select
            id,
            parent_id,
            fs_level,
            is_category
        from
            @result;
    end;
    return;
end;
go
In closing, the only reason I'd recommend this approach is if you have a large number of recursive members, or need to add logging or some other process in between actions. This approach is slower in most use cases, and adds complexity to the code, but is an alternative to the recursive CTE and meets your required criteria.
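The same level-by-level idea can also live in application code, which is one place where logging between steps is easy to add. Here is a rough sketch, assuming SQLite through Python's sqlite3 module in place of SQL Server; the function name traverse is hypothetical, and the table mirrors dbo.folder_tree from the setup code.

```python
import sqlite3

def traverse(conn, start_id, max_level=None):
    """Breadth-first walk: one query per tree level, like the WHILE loop."""
    seen = set()
    frontier = [r[0] for r in conn.execute(
        "SELECT id FROM folder_tree WHERE id = ?", (start_id,))]
    while frontier:
        seen.update(frontier)
        marks = ",".join("?" * len(frontier))
        sql = (
            f"SELECT id FROM folder_tree WHERE parent_id IN ({marks}) "
            "AND is_category = 1 AND (? IS NULL OR fs_level <= ?)"
        )
        params = (*frontier, max_level, max_level)
        # Skip ids already collected, mirroring the EXCEPT in the function.
        frontier = [r[0] for r in conn.execute(sql, params) if r[0] not in seen]
        # Per-level logging or other processing could be added here.
    return sorted(seen)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folder_tree "
    "(id INT PRIMARY KEY, parent_id INT, fs_level INT, is_category INT)"
)
conn.executemany(
    "INSERT INTO folder_tree VALUES (?, ?, ?, ?)",
    [(0, -1, 0, 1), (1, 0, 1, 1), (2, 0, 1, 1), (3, 1, 2, 1), (4, 2, 2, 0),
     (5, 3, 3, 1), (6, 5, 4, 0), (7, 5, 4, 0), (8, 5, 4, 1), (9, 5, 4, 0)],
)
print(traverse(conn, 1, 3))
```

Each loop iteration costs one round trip to the database, which is the trade-off against the single-statement recursive CTE.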

Count Based on Columns in SQL Server

I have 3 tables:
SELECT id, letter
FROM As
+--------+--------+
| id | letter |
+--------+--------+
| 1 | A |
| 2 | B |
+--------+--------+
SELECT id, letter
FROM Xs
+--------+------------+
| id | letter |
+--------+------------+
| 1 | X |
| 2 | Y |
| 3 | Z |
+--------+------------+
SELECT id, As_id, Xs_id
FROM A_X
+--------+-------+-------+
| id | As_id | Xs_id |
+--------+-------+-------+
| 9 | 1 | 1 |
| 10 | 1 | 2 |
| 11 | 2 | 3 |
| 12 | 1 | 2 |
| 13 | 2 | 3 |
| 14 | 1 | 1 |
+--------+-------+-------+
I can count all As and Bs with group by. But I want to count As and Bs based on X,Y and Z. What I want to get is below:
+-------+
| X,Y,Z |
+-------+
| 2,2,0 |
| 0,0,2 |
+-------+
X,Y,Z
A 2,2,0
B 0,0,2
What is the best way to do this in SQL Server? Is using a foreach loop, for example, an efficient way?
Edit: This is not a duplicate, because I wanted to know the efficient way, not just any way.
Without knowing what is inefficient about your current code (none was provided), a PIVOT is the best fit for what you're trying to do. There are plenty of resources online and here in the Stack Overflow Q/A forums to find what you need. This is probably the simplest explanation of a pivot, which I frequently revisit to remind myself of its complicated syntax.
To answer your question specifically, this code shows how the link above applies to it.
First, the tables need to be created:
DECLARE @AS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE @XS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE @XA AS TABLE (ID INT, AsID INT, XsID INT)
Values were added to the tables:
INSERT INTO @AS (ID, Letter)
SELECT 1,'A'
UNION
SELECT 2,'B'
INSERT INTO @XS (ID, Letter)
SELECT 1,'X'
UNION
SELECT 2,'Y'
UNION
SELECT 3,'Z'
INSERT INTO @XA (ID, ASID, XSID)
SELECT 9,1,1
UNION
SELECT 10,1,2
UNION
SELECT 11,2,3
UNION
SELECT 12,1,2
UNION
SELECT 13,2,3
UNION
SELECT 14,1,1
Then the query that does the pivot is constructed:
SELECT LetterA, [X],[Y],[Z]
FROM (SELECT A.LETTER AS LetterA
            ,B.LETTER AS LetterX
            ,C.ID
      FROM @XA C
      JOIN @AS A
        ON A.ID = C.ASID
      JOIN @XS B
        ON B.ID = C.XSID
     ) Src
PIVOT (COUNT(ID)
       FOR LetterX IN ([X],[Y],[Z])
      ) AS PVT
When executed, your results are as follows:
LetterA X Y Z
A 2 2 0
B 0 0 2
As I said in the comments, just join and do a simple pivot:
if object_id('tempdb..#AAs') is not null drop table #AAs
create table #AAs(id int, letter nvarchar(5))
if object_id('tempdb..#XXs') is not null drop table #XXs
create table #XXs(id int, letter nvarchar(5))
if object_id('tempdb..#A_X') is not null drop table #A_X
create table #A_X(id int, AAs int, XXs int)
insert into #AAs (id, letter) values (1, 'A'), (2, 'B')
insert into #XXs (id, letter) values (1, 'X'), (2, 'Y'), (3, 'Z')
insert into #A_X (id, AAs, XXs)
values (9, 1, 1),
(10, 1, 2),
(11, 2, 3),
(12, 1, 2),
(13, 2, 3),
(14, 1, 1)
select LetterA,
ISNULL([X], 0) [X],
ISNULL([Y], 0) [Y],
ISNULL([Z], 0) [Z]
from (
select distinct a.letter [LetterA], x.letter [LetterX],
count(*) over (partition by a.letter, x.letter order by a.letter) [Counted]
from #A_X ax
join #AAs A on ax.AAs = A.ID
join #XXs X on ax.XXs = X.ID
)src
PIVOT
(
MAX ([Counted]) for LetterX in ([X], [Y], [Z])
) piv
You get the result you asked for:
LetterA X Y Z
A 2 2 0
B 0 0 2
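For comparison, the same counts can be produced with conditional aggregation, which works in engines that have no PIVOT operator. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than SQL Server; the table names letters_a and letters_x are hypothetical stand-ins for As and Xs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE letters_a (id INT, letter TEXT);
    CREATE TABLE letters_x (id INT, letter TEXT);
    CREATE TABLE a_x (id INT, a_id INT, x_id INT);
    INSERT INTO letters_a VALUES (1, 'A'), (2, 'B');
    INSERT INTO letters_x VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
    INSERT INTO a_x VALUES (9, 1, 1), (10, 1, 2), (11, 2, 3),
                           (12, 1, 2), (13, 2, 3), (14, 1, 1);
""")

# One SUM(CASE ...) per output column replaces the PIVOT clause.
pivot = conn.execute("""
    SELECT la.letter,
           SUM(CASE WHEN lx.letter = 'X' THEN 1 ELSE 0 END) AS x_count,
           SUM(CASE WHEN lx.letter = 'Y' THEN 1 ELSE 0 END) AS y_count,
           SUM(CASE WHEN lx.letter = 'Z' THEN 1 ELSE 0 END) AS z_count
    FROM a_x ax
    JOIN letters_a la ON la.id = ax.a_id
    JOIN letters_x lx ON lx.id = ax.x_id
    GROUP BY la.letter
    ORDER BY la.letter
""").fetchall()
print(pivot)
```

As with PIVOT, the output columns are listed explicitly, so a new letter in the Xs table needs a new SUM(CASE ...) line.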

save parent child relationship with comma separated

I am working in SQL Server 2008 R2.
I have a table like below
+-----+-------+----------+---------+
| UID | Value | ParentID | Path    |
+-----+-------+----------+---------+
| 1   | A     | NULL     | |1|     |
| 2   | B     | 1        | |1|2|   |
| 3   | C     | 2        | |1|2|3| |
| 4   | D     | 1        | |1|4|   |
| 5   | E     | NULL     | |5|     |
+-----+-------+----------+---------+
While inserting the first three columns above, the fourth column, Path, should save the relation using the '|' separator. How can this be achieved?
Thanks in advance.
As D Stanley mentions, you don't want to store the Path in your table. Better to just store the UID, Value, and ParentID, and then query the path data from that if and when you need it. If you need it often, you might consider defining a view on the table for that purpose. One way to perform the query is with a recursive CTE. Something like this should work:
-- Sample data from the question.
declare @x table ([UID] bigint, [Value] nchar(1), [ParentID] bigint);
insert @x values
    (1, N'A', null),
    (2, N'B', 1),
    (3, N'C', 2),
    (4, N'D', 1),
    (5, N'E', null);
with [PathCTE] as
(
    -- Base case: any entry with no parent is its own path.
    select
        *,
        [Path] = convert(nvarchar(max), N'|' + convert(nvarchar, [UID]) + N'|')
    from
        @x
    where
        [ParentID] is null
    union all
    -- Recursive case: for any entry whose parent is already in the result set,
    -- we can construct the path by appending a single value to the parent path.
    select
        [Child].*,
        [Path] = convert(nvarchar(max), [Parent].[Path] + convert(nvarchar, [Child].[UID]) + N'|')
    from
        @x [Child]
        inner join [PathCTE] [Parent] on [Child].[ParentID] = [Parent].[UID]
)
select * from [PathCTE] order by [UID];
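The view idea mentioned above could look roughly like this. It is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server 2008 R2; the names node and node_path are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (uid INTEGER PRIMARY KEY, value TEXT, parent_id INTEGER);
    INSERT INTO node VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2),
                            (4, 'D', 1), (5, 'E', NULL);

    -- The path is derived on demand instead of being stored in the table.
    CREATE VIEW node_path AS
    WITH RECURSIVE p(uid, value, parent_id, path) AS (
        SELECT uid, value, parent_id, '|' || uid || '|'
        FROM node
        WHERE parent_id IS NULL
        UNION ALL
        SELECT c.uid, c.value, c.parent_id, p.path || c.uid || '|'
        FROM node c
        JOIN p ON c.parent_id = p.uid
    )
    SELECT uid, value, parent_id, path FROM p;
""")

paths = dict(conn.execute("SELECT uid, path FROM node_path"))
print(paths)
```

Because the path is computed when the view is read, moving a node to a different parent never leaves a stale stored value behind.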

SQL Ignore duplicate primary keys

Imagine you have a string of results from a SELECT statement:
ID (pk) Name Address
1 a b
1 c d
1 e f
2 a b
3 a d
2 a d
Is it possible to alter the SQL statement to get one record ONLY for the record with ID 1?
I have a SELECT statement that displays multiple values which can have the same primary key. I want to only take one of those records, if say, I have 5 records with the same primary key.
SQL: http://pastebin.com/cFCBA2Uy
Screenshot: http://i.imgur.com/UlMBZhC.png
What I want is to show only one file which is for e.g. File Number: 925, 890
You stated that it doesn't matter which row is chosen when several rows share the same id; you just want one row per id.
The following query does what you asked for:
DECLARE @T table
(
    id int,
    name varchar(50),
    address varchar(50)
)
INSERT INTO @T VALUES
    (1, 'a', 'b'),
    (1, 'c', 'd'),
    (1, 'e', 'f'),
    (2, 'a', 'b'),
    (3, 'a', 'd'),
    (2, 'a', 'd');
WITH A AS
(
    SELECT
        t.id, t.name, t.address,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY (SELECT NULL)) AS RowNumber
    FROM
        @T t
)
SELECT
    A.id, A.name, A.address
FROM
    A
WHERE
    A.RowNumber = 1
But I think there should be a criterion. If you find one, express it as the ORDER BY inside the OVER clause.
EDIT:
Here you have the result:
+----+------+---------+
| id | name | address |
+----+------+---------+
| 1 | a | b |
| 2 | a | b |
| 3 | a | d |
+----+------+---------+
Disclaimer: the query I wrote is non-deterministic; different conditions (indexes, statistics, etc.) might lead to different results.
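To make the choice repeatable, the ORDER BY inside the OVER clause can name real columns. The sketch below assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server, and it picks "smallest name, then smallest address" purely as an illustrative criterion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "a", "b"), (1, "c", "d"), (1, "e", "f"),
     (2, "a", "b"), (3, "a", "d"), (2, "a", "d")],
)

# Illustrative criterion: keep the row with the smallest (name, address) per id.
rows = conn.execute("""
    WITH ranked AS (
        SELECT id, name, address,
               ROW_NUMBER() OVER (
                   PARTITION BY id ORDER BY name, address
               ) AS rn
        FROM t
    )
    SELECT id, name, address FROM ranked WHERE rn = 1 ORDER BY id
""").fetchall()
print(rows)
```

With an explicit ordering, the same row is chosen for each id regardless of indexes or statistics, as long as the ordering columns break all ties.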

CONCAT(column) OVER(PARTITION BY...)? Group-concatenating rows without grouping the result itself

I need a way to make a concatenation of all rows (per group) in a kind of window function like how you can do COUNT(*) OVER(PARTITION BY...) and the aggregate count of all rows per group will repeat across each particular group. I need something similar but a string concatenation of all values per group repeated across each group.
Here is some example data and my desired result to better illustrate my problem:
grp | val
------------
1 | a
1 | b
1 | c
1 | d
2 | x
2 | y
2 | z
And here is what I need (the desired result):
grp | val | groupcnct
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
Here is the really tricky part of this problem:
My particular situation prevents me from being able to reference the same table twice (I'm actually doing this within a recursive CTE, so I can't do a self-join of the CTE or it will throw an error).
I'm fully aware that one can do something like:
SELECT a.*, b.groupcnct
FROM tbl a
CROSS APPLY (
SELECT STUFF((
SELECT '' + aa.val
FROM tbl aa
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '') AS groupcnct
) b
But as you can see, that is referencing tbl two times in the query.
I can only reference tbl once, hence why I'm wondering if windowing the group-concatenation is possible (I'm a bit new to TSQL since I come from a MySQL background, so not sure if something like that can be done).
Create Table:
CREATE TABLE tbl
(grp int, val varchar(1));
INSERT INTO tbl
(grp, val)
VALUES
(1, 'a'),
(1, 'b'),
(1, 'c'),
(1, 'd'),
(2, 'x'),
(2, 'y'),
(2, 'z');
In SQL Server 2017 you can use the STRING_AGG function:
SELECT STRING_AGG(T.val, ',') AS val
     , T.grp
FROM @tbl AS T
GROUP BY T.grp
I tried a pure CTE approach, thinking it would be faster: Which is the best way to form the string value using column from a Table with rows having same ID?
But the benchmark says otherwise: it's better to use a subquery (or CROSS APPLY) with FOR XML PATH, as it is faster: Which is the best way to form the string value using column from a Table with rows having same ID?
DECLARE @tbl TABLE
(
    grp INT
   ,val VARCHAR(1)
);
BEGIN
    INSERT INTO @tbl(grp, val)
    VALUES
        (1, 'a'),
        (1, 'b'),
        (1, 'c'),
        (1, 'd'),
        (2, 'x'),
        (2, 'y'),
        (2, 'z');
END;
----------- Your Required Query
SELECT ST2.grp,
       SUBSTRING(
           (
               SELECT ','+ST1.val AS [text()]
               FROM @tbl ST1
               WHERE ST1.grp = ST2.grp
               ORDER BY ST1.grp
               FOR XML PATH ('')
           ), 2, 1000
       ) groupcnct
FROM @tbl ST2
Is it possible for you to just put your STUFF in the SELECT instead, or do you run into the same issue? (I replaced 'tbl' with 'TEMP.TEMP123'.)
Select
A.*
, [GROUPCNT] = STUFF((
SELECT '' + aa.val
FROM TEMP.TEMP123 AA
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '')
from TEMP.TEMP123 A
This worked for me -- wanted to see if this worked for you too.
I know this post is old, but in case someone is still wondering, you can create a scalar function that concatenates row values.
IF OBJECT_ID('dbo.fnConcatRowsPerGroup','FN') IS NOT NULL
DROP FUNCTION dbo.fnConcatRowsPerGroup
GO
CREATE FUNCTION dbo.fnConcatRowsPerGroup
    (@grp as int) RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @val AS VARCHAR(MAX)
    SELECT @val = COALESCE(@val,'')+val
    FROM tbl
    WHERE grp = @grp
    RETURN @val;
END
GO
select *, dbo.fnConcatRowsPerGroup(grp)
from tbl
Here is the result set I got from querying a sample table:
grp | val | (No column name)
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
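For readers who want to see what a true windowed concatenation looks like, some engines do allow it. The sketch below is not T-SQL: it assumes SQLite (3.25 or later) through Python's sqlite3 module, where group_concat can be used as a window function. SQL Server's STRING_AGG does not accept an OVER clause, so this shows the concept rather than a drop-in answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (grp INT, val TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (1, "d"), (2, "x"), (2, "y"), (2, "z")],
)

# The frame spans the whole partition, so every row sees the full string.
rows = conn.execute("""
    SELECT grp, val,
           group_concat(val, '') OVER (
               PARTITION BY grp
               ORDER BY val
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS groupcnct
    FROM tbl
    ORDER BY grp, val
""").fetchall()
for row in rows:
    print(row)
```

The table is referenced only once, which is the property the question was after.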