What is the best possible implementation for the following recursive query? - sql

I have a table with the following structure, representing a file system.
Every item, which may be a file or a folder, has a unique id. If it is a category (folder), it can contain other items.
level indicates the directory depth.
|id |parent_id|is_category|level|
|:-:|:-:|:-:|:-:|
|0 | -1 | true | 0 |
|1 | 0 | true | 1 |
|2 | 0 | true | 1 |
|3 | 1 | true | 2 |
|4 | 2 | false | 2 |
|5 | 3 | true | 3 |
|6 | 5 | false | 4 |
|7 | 5 | false | 4 |
|8 | 5 | true | 4 |
|9 | 5 | false | 4 |
Task:
Fetch all sub-items with level <= 3 in the folder with id == 1.
The resulting ids should be [1,3,5].
My current implementation issues queries recursively: for the example above, my program fetches id == 1 first and then finds all items with is_category == true and level <= 3.
That doesn't feel like an efficient approach.
Any advice will be appreciated.

You don't mention the database you are using so I'll assume PostgreSQL.
You can retrieve the rows you want using a single query that uses a "Recursive CTE". Recursive CTEs are implemented by several database engines, such as Oracle, DB2, PostgreSQL, SQL Server, MariaDB, MySQL, HyperSQL, H2, Teradata, etc.
The query should take a form similar to:
with recursive x as (
select * from t where id = 1
union all
select t.*
from x
join t on t.parent_id = x.id and t.level <= 3
)
select id from x
For the record, the data script I used to test it is:
create table t (
id int,
parent_id int,
level int
);
insert into t (id, parent_id, level) values (0, -1, 0);
insert into t (id, parent_id, level) values (1, 0, 1);
insert into t (id, parent_id, level) values (2, 0, 1);
insert into t (id, parent_id, level) values (3, 1, 2);
insert into t (id, parent_id, level) values (4, 2, 2);
insert into t (id, parent_id, level) values (5, 3, 3);
insert into t (id, parent_id, level) values (6, 5, 4);
insert into t (id, parent_id, level) values (7, 5, 4);
insert into t (id, parent_id, level) values (8, 5, 4);
insert into t (id, parent_id, level) values (9, 5, 4);
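The CTE above can also be tried without a database server. Below is a minimal sketch that runs the same shape of query in SQLite through Python's sqlite3 module. The function name subfolder_ids, the bound parameters, and the extra is_category column (which the test script above leaves out) are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Rows from the question: (id, parent_id, is_category, level).
ROWS = [
    (0, -1, 1, 0), (1, 0, 1, 1), (2, 0, 1, 1), (3, 1, 1, 2), (4, 2, 0, 2),
    (5, 3, 1, 3), (6, 5, 0, 4), (7, 5, 0, 4), (8, 5, 1, 4), (9, 5, 0, 4),
]

QUERY = """
with recursive x as (
    select id from t where id = :start_id
    union all
    select t.id
    from x
    join t on t.parent_id = x.id
          and t.level <= :max_level
          and t.is_category = 1
)
select id from x order by id
"""


def subfolder_ids(start_id, max_level):
    """Return the start folder plus all nested folders down to max_level."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id int, parent_id int, is_category int, level int)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    params = {"start_id": start_id, "max_level": max_level}
    ids = [row[0] for row in conn.execute(QUERY, params)]
    conn.close()
    return ids
```

With the question's data, subfolder_ids(1, 3) gives the ids 1, 3 and 5. The whole tree walk happens in one round trip, which is the main gain over issuing one query per folder from the application.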

As others have said, recursive CTEs are a fast and typically efficient way to retrieve the data you're looking for. If you want to avoid recursive CTEs, since they don't scale indefinitely and can behave erratically in certain use cases, you could take a more direct approach and implement the recursive search with a WHILE loop. Note that this is not more efficient than the recursive CTE, but it gives you more control over what happens at each step of the recursion. My sample uses Transact-SQL.
First, the setup code, similar to what @The Impaler provided:
drop table if exists
dbo.folder_tree;
create table dbo.folder_tree
(
id int not null constraint [PK_folder_tree] primary key clustered,
parent_id int not null,
fs_level int not null,
is_category bit not null constraint [DF_folder_tree_is_category] default(0),
constraint [UQ_folder_tree_parent_id] unique(parent_id, id)
);
insert into dbo.folder_tree
(id, parent_id, fs_level, is_category)
values
(0, -1, 0, 1), --|0 | -1 | true | 0 |
(1, 0, 1, 1), --|1 | 0 | true | 1 |
(2, 0, 1, 1), --|2 | 0 | true | 1 |
(3, 1, 2, 1), --|3 | 1 | true | 2 |
(4, 2, 2, 0), --|4 | 2 | false | 2 |
(5, 3, 3, 1), --|5 | 3 | true | 3 |
(6, 5, 4, 0), --|6 | 5 | false | 4 |
(7, 5, 4, 0), --|7 | 5 | false | 4 |
(8, 5, 4, 1), --|8 | 5 | true | 4 |
(9, 5, 4, 0); --|9 | 5 | false | 4 |
And then the code for implementing a recursive search of the table via WHILE loop:
drop function if exists
dbo.folder_traverse;
go
create function dbo.folder_traverse
(
@start_id int,
@max_level int = null
)
returns @result table
(
id int not null primary key,
parent_id int not null,
fs_level int not null,
is_category bit not null
)
as
begin
insert into
@result
select
id,
parent_id,
fs_level,
is_category
from
dbo.folder_tree
where
id = @start_id;
while @@ROWCOUNT > 0
begin
insert into
@result
select
f.id,
f.parent_id,
f.fs_level,
f.is_category
from
@result r
inner join dbo.folder_tree f on
r.id = f.parent_id
where
f.is_category = 1 and
(
@max_level is null or
f.fs_level <= @max_level
)
except
select
id,
parent_id,
fs_level,
is_category
from
@result;
end;
return;
end;
go
In closing, the only reason I'd recommend this approach is if you have a large number of recursive members, or need to add logging or some other process in between actions. This approach is slower in most use cases, and adds complexity to the code, but is an alternative to the recursive CTE and meets your required criteria.
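To illustrate the "more control between steps" point outside of T-SQL, here is a rough sketch of the same level-by-level loop driven from application code, using Python's sqlite3 module. The names make_demo_db, traverse_folders and the on_level callback are hypothetical, and the sketch assumes the tree has no cycles and that a single level never holds more ids than SQLite allows as bound parameters.

```python
import sqlite3

# Same rows as the setup script: (id, parent_id, fs_level, is_category).
ROWS = [
    (0, -1, 0, 1), (1, 0, 1, 1), (2, 0, 1, 1), (3, 1, 2, 1), (4, 2, 2, 0),
    (5, 3, 3, 1), (6, 5, 4, 0), (7, 5, 4, 0), (8, 5, 4, 1), (9, 5, 4, 0),
]


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table folder_tree (id int primary key, parent_id int, "
        "fs_level int, is_category int)"
    )
    conn.executemany("insert into folder_tree values (?, ?, ?, ?)", ROWS)
    return conn


def traverse_folders(conn, start_id, max_level=None, on_level=None):
    """Collect folder ids one tree level at a time (one query per level)."""
    found = []
    frontier = [r[0] for r in conn.execute(
        "select id from folder_tree where id = ?", (start_id,))]
    while frontier:
        if on_level is not None:
            on_level(frontier)  # hook for logging or other work between levels
        found.extend(frontier)
        marks = ", ".join("?" for _ in frontier)
        sql = (
            "select id from folder_tree "
            f"where parent_id in ({marks}) and is_category = 1 "
            "and (? is null or fs_level <= ?) order by id"
        )
        params = (*frontier, max_level, max_level)
        frontier = [r[0] for r in conn.execute(sql, params)]
    return found
```

Calling traverse_folders(make_demo_db(), 1, 3) returns [1, 3, 5]. As with the WHILE loop, this costs one query per level, so it only pays off when something useful has to happen between levels.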

Related

Dynamic table created from CTE (parent/child)

If I have a very simple table called tree
create table if not exists tree (id int primary key, parent int, name text);
And a few rows of data
insert into tree values (1, null, 'A');
insert into tree values (2, 1, 'B');
insert into tree values (3, 1, 'C');
insert into tree values (4, 2, 'D');
insert into tree values (5, 2, 'E');
insert into tree values (6, 3, 'F');
insert into tree values (7, 3, 'G');
I can easily run CTEs on it, and produce an output giving me path like this
with recursive R(id, level, path, name) as (
select id,1,name,name from tree where parent is null
union select tree.id, level + 1, path || '.' || tree.name, tree.name from tree join R on R.id=tree.parent
) select level,path,name from R;
Which gives the output
level | path | name
-------+-------+------
1 | A | A
2 | A.B | B
2 | A.C | C
3 | A.B.D | D
3 | A.B.E | E
3 | A.C.F | F
3 | A.C.G | G
What I'm wondering, is it possible to somehow project this output into another table, dynamically creating columns based on level (level1, level2, level3 etc), giving me something like this in return
id | level1 | level2 | level3
---+--------+--------+-------
1 | A | |
2 | A | B |
3 | A | C |
4 | A | B | D
5 | A | B | E
6 | A | C | F
7 | A | C | G
Any help would be appreciated.
If you know the maximum depth of your tree, I'd keep your approach and simplify it using array concatenation to produce the desired output.
So for a 5-level tree, that would look like this:
WITH RECURSIVE R(id, path) AS (
SELECT id, ARRAY[name::text] FROM tree WHERE parent IS NULL
UNION SELECT tree.id, path || tree.name FROM tree JOIN R ON R.id=tree.parent
)
SELECT id,
path[1] AS l1,
path[2] AS l2,
path[3] AS l3,
path[4] AS l4,
path[5] AS l5
FROM R;
PS: sorry for not commenting on Ziggy's answer, which is very close, but I don't have enough reputation to do so. I don't see why you would need a window function here.
PostgreSQL requires the output columns and their types to be known up front, so the levelX columns can't be produced dynamically. However, you can do the following:
with recursive
R(id, path) as (
select id,ARRAY[name::text] from tree where parent is null
union
select tree.id, path || tree.name::text from tree join R on R.id=tree.parent
)
select row_number() over (order by cardinality(path), path), id,
path[1] as level1, path[2] as level2, path[3] as level3
from R
order by 1
In the example above, the column row_number happens to match id, but probably that wouldn't happen with your real data.
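If the number of level columns really has to follow the data, one option is to let the database build the path and do the final spreading into columns on the client. The following is only a sketch of that idea using Python's sqlite3 module. The names make_tree_db, level_columns and SEP are made up for the example, and it assumes the separator character never occurs in a name.

```python
import sqlite3

SEP = "\x1f"  # unit separator; assumed never to appear inside a name


def make_tree_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tree (id int primary key, parent int, name text)")
    conn.executemany(
        "insert into tree values (?, ?, ?)",
        [(1, None, "A"), (2, 1, "B"), (3, 1, "C"), (4, 2, "D"),
         (5, 2, "E"), (6, 3, "F"), (7, 3, "G")],
    )
    return conn


def level_columns(conn):
    """Return (header, rows) with one levelN column per level found in the data."""
    rows = conn.execute(
        """
        with recursive r(id, path) as (
            select id, name from tree where parent is null
            union all
            select tree.id, r.path || ? || tree.name
            from tree join r on r.id = tree.parent
        )
        select id, path from r order by id
        """,
        (SEP,),
    ).fetchall()
    split = [(node_id, path.split(SEP)) for node_id, path in rows]
    depth = max((len(parts) for _, parts in split), default=0)
    header = ["id"] + [f"level{n}" for n in range(1, depth + 1)]
    # Pad shorter paths with None so every row has the same width.
    table = [[node_id] + parts + [None] * (depth - len(parts))
             for node_id, parts in split]
    return header, table
```

For the sample data the header comes out as id, level1, level2, level3, and the row for id 4 is [4, 'A', 'B', 'D'].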

Count Based on Columns in SQL Server

I have 3 tables:
SELECT id, letter
FROM As
+--------+--------+
| id | letter |
+--------+--------+
| 1 | A |
| 2 | B |
+--------+--------+
SELECT id, letter
FROM Xs
+--------+------------+
| id | letter |
+--------+------------+
| 1 | X |
| 2 | Y |
| 3 | Z |
+--------+------------+
SELECT id, As_id, Xs_id
FROM A_X
+--------+-------+-------+
| id | As_id | Xs_id |
+--------+-------+-------+
| 9 | 1 | 1 |
| 10 | 1 | 2 |
| 11 | 2 | 3 |
| 12 | 1 | 2 |
| 13 | 2 | 3 |
| 14 | 1 | 1 |
+--------+-------+-------+
I can count all As and Bs with group by. But I want to count As and Bs based on X,Y and Z. What I want to get is below:
+-------+
| X,Y,Z |
+-------+
| 2,2,0 |
| 0,0,2 |
+-------+
X,Y,Z
A 2,2,0
B 0,0,2
What is the best way to do this in MSSQL? Is using a foreach an efficient way, for example?
edit: It is not a duplicate because I wanted to know the efficient way, not just any way.
For what you're trying to do, and without knowing what is inefficient in your current code (none was provided), a PIVOT is best. There are plenty of resources online and here in the Stack Overflow Q&A to find what you need. This is probably the simplest explanation of a pivot, and I frequently need it to remind myself of the complicated syntax.
To specifically answer your question, this is the code that shows how the link above applies to your question.
First, the tables need to be created:
DECLARE @AS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE @XS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE @XA AS TABLE (ID INT, AsID INT, XsID INT)
Then values are added to the tables:
INSERT INTO @AS (ID, Letter)
SELECT 1,'A'
UNION
SELECT 2,'B'
INSERT INTO @XS (ID, Letter)
SELECT 1,'X'
UNION
SELECT 2,'Y'
UNION
SELECT 3,'Z'
INSERT INTO @XA (ID, ASID, XSID)
SELECT 9,1,1
UNION
SELECT 10,1,2
UNION
SELECT 11,2,3
UNION
SELECT 12,1,2
UNION
SELECT 13,2,3
UNION
SELECT 14,1,1
Then the query which does the pivot is constructed:
SELECT LetterA, [X],[Y],[Z]
FROM (SELECT A.LETTER AS LetterA
,B.LETTER AS LetterX
,C.ID
FROM @XA C
JOIN @AS A
ON A.ID = C.ASID
JOIN @XS B
ON B.ID = C.XSID
) Src
PIVOT (COUNT(ID)
FOR LetterX IN ([X],[Y],[Z])
) AS PVT
When executed, your results are as follows:
Letter X Y Z
A 2 2 0
B 0 0 2
As I said in the comments: just join and do a simple pivot.
if object_id('tempdb..#AAs') is not null drop table #AAs
create table #AAs(id int, letter nvarchar(5))
if object_id('tempdb..#XXs') is not null drop table #XXs
create table #XXs(id int, letter nvarchar(5))
if object_id('tempdb..#A_X') is not null drop table #A_X
create table #A_X(id int, AAs int, XXs int)
insert into #AAs (id, letter) values (1, 'A'), (2, 'B')
insert into #XXs (id, letter) values (1, 'X'), (2, 'Y'), (3, 'Z')
insert into #A_X (id, AAs, XXs)
values (9, 1, 1),
(10, 1, 2),
(11, 2, 3),
(12, 1, 2),
(13, 2, 3),
(14, 1, 1)
select LetterA,
ISNULL([X], 0) [X],
ISNULL([Y], 0) [Y],
ISNULL([Z], 0) [Z]
from (
select distinct a.letter [LetterA], x.letter [LetterX],
count(*) over (partition by a.letter, x.letter order by a.letter) [Counted]
from #A_X ax
join #AAs A on ax.AAs = A.ID
join #XXs X on ax.XXs = X.ID
)src
PIVOT
(
MAX ([Counted]) for LetterX in ([X], [Y], [Z])
) piv
You get result as you asked for
LetterA X Y Z
A 2 2 0
B 0 0 2
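For engines without a PIVOT operator, the same counts can be produced with conditional aggregation: one SUM(CASE ...) per target column. The sketch below shows that technique in SQLite through Python's sqlite3 module and builds the column list from the letters found in the data. The table names a_items, x_items and a_x and the function names are stand-ins chosen for the example (As is awkward as an identifier), not names from the question.

```python
import sqlite3


def make_counts_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table a_items (id int, letter text);
        create table x_items (id int, letter text);
        create table a_x (id int, a_id int, x_id int);
        insert into a_items values (1, 'A'), (2, 'B');
        insert into x_items values (1, 'X'), (2, 'Y'), (3, 'Z');
        insert into a_x values (9, 1, 1), (10, 1, 2), (11, 2, 3),
                               (12, 1, 2), (13, 2, 3), (14, 1, 1);
        """
    )
    return conn


def count_matrix(conn):
    """Return (column letters, rows of (a_letter, count per column letter))."""
    letters = [r[0] for r in conn.execute(
        "select letter from x_items order by id")]
    # One conditional sum per letter; the letter itself is a bound parameter.
    select_list = ["a.letter"] + [
        "sum(case when x.letter = ? then 1 else 0 end)" for _ in letters
    ]
    sql = (
        f"select {', '.join(select_list)} "
        "from a_items a "
        "left join a_x ax on ax.a_id = a.id "
        "left join x_items x on x.id = ax.x_id "
        "group by a.letter order by a.letter"
    )
    return letters, conn.execute(sql, letters).fetchall()
```

On the sample rows this returns the letters X, Y, Z and the rows ('A', 2, 2, 0) and ('B', 0, 0, 2), matching the pivot output above.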

How can I write a procedure that gets levels i through j of a tree like this?

I have a table like
Users
-------------------------
id | ancestor_id | ....
-------------------------
1 | NULL | ....
2 | 1 | ....
3 | 1 | ....
4 | 3 | ....
5 | 3 | ....
that would represent a tree like
level 1 1
/ \
level 2 2 3
/ \
level 3 4 5
and I want to create a procedure that returns the ith through jth generation of descendants of a given user:
CREATE PROCEDURE DescendantsLevel
@user_id INT,
@i INT,
@j INT
AS
....
If @j is NULL, however, it returns all descendants beginning from generation @i.
Examples:
EXEC DescendantsLevel @user_id=1,@i=2,@j=NULL
would return
-------------------------
id | ancestor_id | ....
-------------------------
1 | NULL | ....
2 | 1 | ....
3 | 1 | ....
4 | 3 | ....
5 | 3 | ....
and
EXEC DescendantsLevel @user_id=1,@i=1,@j=2
would return
Users
-------------------------
id | ancestor_id | ....
-------------------------
1 | NULL | ....
2 | 1 | ....
3 | 1 | ....
Several questions, I have:
Is there a better value than NULL to represent some concept of "infinity" in SQL?
How can I implement the procedure I've described?
Is there a better way of designing the database in order to simplify the procedure?
Using a recursive CTE:
DECLARE @test TABLE (id INT NOT NULL, ancestor_id INT NULL)
DECLARE
@id INT = 1,
@i INT = 1,
@j INT = 2
INSERT INTO @test (id, ancestor_id)
VALUES
(1, NULL),
(2, 1),
(3, 1),
(4, 3),
(5, 3)
;WITH CTE_Tree AS
(
SELECT
id,
ancestor_id,
1 AS lvl,
id AS base
FROM
@test
WHERE
id = @id
UNION ALL
SELECT
C.id,
C.ancestor_id,
P.lvl + 1 AS lvl,
P.base AS base
FROM
CTE_Tree P
INNER JOIN @test C ON C.ancestor_id = P.id
WHERE
lvl <= COALESCE(@j, 9999)
)
SELECT
id,
ancestor_id
FROM
CTE_Tree
WHERE
lvl BETWEEN @i AND COALESCE(@j, 9999)
This relies on the tree having no more than 9999 levels. In practice the default recursion limit in SQL Server is 100, so a tree deeper than that raises an error unless you raise the limit with OPTION (MAXRECURSION n).
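On the question of a better "infinity" than NULL: one alternative to a sentinel such as 9999 is to keep NULL and test for it explicitly in the query. Here is a minimal sketch of that variant in SQLite through Python's sqlite3 module. The function names and the users table are assumptions for the example, and levels are numbered as in the answer above, with the starting user at level 1.

```python
import sqlite3


def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (id int primary key, ancestor_id int)")
    conn.executemany(
        "insert into users values (?, ?)",
        [(1, None), (2, 1), (3, 1), (4, 3), (5, 3)],
    )
    return conn


def descendants_between(conn, user_id, i, j=None):
    """Rows on levels i..j below user_id; j=None means no upper bound."""
    sql = """
        with recursive tree(id, ancestor_id, lvl) as (
            select id, ancestor_id, 1 from users where id = :user_id
            union all
            select c.id, c.ancestor_id, p.lvl + 1
            from tree p
            join users c on c.ancestor_id = p.id
            where :j is null or p.lvl < :j   -- stop descending at level j
        )
        select id, ancestor_id
        from tree
        where lvl >= :i and (:j is null or lvl <= :j)
        order by id
    """
    return conn.execute(sql, {"user_id": user_id, "i": i, "j": j}).fetchall()
```

With the sample rows, descendants_between(conn, 1, 1, 2) returns the rows for ids 1, 2 and 3, and passing j=None with i=2 returns ids 2 through 5.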

Get id of max value in group

I have a table and I would like to get, for each group, the id of the item with the max value in a column, but I have a problem.
SELECT group_id, MAX(time)
FROM mytable
GROUP BY group_id
This way I get the correct rows, but I need the id:
SELECT id,group_id,MAX(time)
FROM mytable
GROUP BY id,group_id
This way I get all the rows. How can I get the id of the row with the max time in each group?
Sample Data
id = 1, group_id = 1, time = 2014.01.03
id = 2, group_id = 1, time = 2014.01.04
id = 3, group_id = 2, time = 2014.01.04
id = 4, group_id = 2, time = 2014.01.02
id = 5, group_id = 3, time = 2014.01.01
and from that I should get ids: 2,3,5
Thanks!
Use your working query as a sub-query, like this:
SELECT `id`
FROM `mytable`
WHERE (`group_id`, `time`) IN (
SELECT `group_id`, MAX(`time`) as `time`
FROM `mytable`
GROUP BY `group_id`
)
Have a look at the below demo
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(id INT , group_id INT , time_st DATE);
INSERT INTO mytable VALUES(1, 1, '2014-01-03'),(2, 1, '2014-01-04'),(3, 2, '2014-01-04'),(4, 2, '2014-01-02'),(5, 3, '2014-01-01');
/** Check all data **/
SELECT * FROM mytable;
+------+----------+------------+
| id | group_id | time_st |
+------+----------+------------+
| 1 | 1 | 2014-01-03 |
| 2 | 1 | 2014-01-04 |
| 3 | 2 | 2014-01-04 |
| 4 | 2 | 2014-01-02 |
| 5 | 3 | 2014-01-01 |
+------+----------+------------+
/** Query for Actual output**/
SELECT
id
FROM
mytable
JOIN
(
SELECT group_id, MAX(time_st) as max_time
FROM mytable GROUP BY group_id
) max_time_table
ON mytable.group_id = max_time_table.group_id AND mytable.time_st = max_time_table.max_time;
+------+
| id |
+------+
| 2 |
| 3 |
| 5 |
+------+
When multiple groups may contain the same value, you could use a window function:
SELECT subq.id
FROM (SELECT id,
time,
MAX(time) OVER (PARTITION BY group_id) as max_time
FROM mytable) as subq
WHERE subq.time = subq.max_time
The subquery here generates a new column (max_time) that contains the maximum time per group. We can then filter on time and max_time being identical. Note that this still returns multiple rows per group if the maximum value occurs multiple times within the same group.
Full example:
CREATE TABLE test (
id INT,
group_id INT,
value INT
);
INSERT INTO test (id, group_id, value) VALUES (1, 1, 100);
INSERT INTO test (id, group_id, value) VALUES (2, 1, 200);
INSERT INTO test (id, group_id, value) VALUES (3, 1, 300);
INSERT INTO test (id, group_id, value) VALUES (4, 2, 100);
INSERT INTO test (id, group_id, value) VALUES (5, 2, 300);
INSERT INTO test (id, group_id, value) VALUES (6, 2, 200);
INSERT INTO test (id, group_id, value) VALUES (7, 3, 300);
INSERT INTO test (id, group_id, value) VALUES (8, 3, 200);
INSERT INTO test (id, group_id, value) VALUES (9, 3, 100);
select * from test;
id | group_id | value
----+----------+-------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 300
4 | 2 | 100
5 | 2 | 300
6 | 2 | 200
7 | 3 | 300
8 | 3 | 200
9 | 3 | 100
(9 rows)
SELECT subq.id
FROM (SELECT id,
value,
MAX(value) OVER (partition by group_id) as max_value
FROM test) as subq
WHERE subq.value = subq.max_value;
id
----
3
5
7
(3 rows)
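If exactly one id per group is needed even when the maximum is tied, a tie-breaker has to be part of the comparison. The sketch below shows one way to do that with NOT EXISTS, run in SQLite through Python's sqlite3 module. The choice of "highest id wins" as the tie-breaker and the names make_times_db and latest_id_per_group are assumptions for the example.

```python
import sqlite3

# Sample data from the question, with dates stored as ISO text.
SAMPLE = [
    (1, 1, "2014-01-03"), (2, 1, "2014-01-04"), (3, 2, "2014-01-04"),
    (4, 2, "2014-01-02"), (5, 3, "2014-01-01"),
]


def make_times_db(extra_rows=()):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id int, group_id int, time_st text)")
    conn.executemany(
        "insert into mytable values (?, ?, ?)", [*SAMPLE, *extra_rows]
    )
    return conn


def latest_id_per_group(conn):
    """One id per group: latest time_st, ties broken by the highest id."""
    sql = """
        select m.id
        from mytable m
        where not exists (
            select 1
            from mytable o
            where o.group_id = m.group_id
              and (o.time_st > m.time_st
                   or (o.time_st = m.time_st and o.id > m.id))
        )
        order by m.group_id
    """
    return [row[0] for row in conn.execute(sql)]
```

For the question's sample data this gives 2, 3 and 5. If a second row with the same latest date is added to a group, only the one with the higher id is returned.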

Oracle SQL recursive query to find parent of level one

Suppose I have a table service:
Name | ID | PARENT_ID | LEVEL |
-------------------------------------------
s1 | 1 | null | 0 |
s2 | 2 | 1 | 1 |
s3 | 3 | 1 | 2 |
s4 | 4 | 2 | 2 |
s5 | 5 | 3 | 3 |
s6 | 6 | 4 | 3 |
and I want to get the level-1 parent of s6 (id=6), which should return s2. Is there a way to run a recursive query until a given level is reached?
You can go UP the tree instead of going down - from leaf (id = 6) to root (which in this reverse case itself would be a leaf, connect_by_isleaf = 1), and take a "parent" of that leaf using prior operator.
upd: Misunderstood your requirement about LEVEL (in Oracle hierarchical queries it is a dynamic pseudocolumn specifying hierarchical depth of a row). If you want to limit your result set to rows with a specific value of your custom pre-populated LEVEL column - you can just add it to where condition.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE t
("NAME" varchar2(2), "ID" int, "PARENT_ID" int, "LVL" int)
;
INSERT ALL
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s1', 1, NULL, 0)
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s2', 2, 1, 1)
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s3', 3, 1, 2)
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s4', 4, 2, 2)
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s5', 5, 3, 3)
INTO t ("NAME", "ID", "PARENT_ID", "LVL")
VALUES ('s6', 6, 4, 3)
SELECT * FROM dual
;
Query 1:
select id as id, name as name from t
where lvl = 1
connect by id = prior parent_id
start with id = 6
Results:
| ID | NAME |
|----|------|
| 2 | s2 |
This is possible with a hierarchical query:
create table tq84_h (
id number,
parent_id number,
level_ number
);
insert into tq84_h values (1, null, 0);
insert into tq84_h values (2, 1 , 1);
insert into tq84_h values (3, 1 , 2);
insert into tq84_h values (4, 2 , 2);
insert into tq84_h values (5, 3 , 3);
insert into tq84_h values (6, 4 , 3);
select
parent_id
from
tq84_h
where
level_ = 2
start with
id = 6
connect by
prior parent_id = id and
level_>1
;
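CONNECT BY is Oracle-specific. On engines that only offer recursive CTEs, the same upward walk can be written as below. This is a sketch in SQLite through Python's sqlite3 module, not Oracle syntax, and the names make_service_db and ancestor_at_level are made up for the example. It uses the stored level column (called lvl here) exactly as given in the question.

```python
import sqlite3


def make_service_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table service (name text, id int primary key, "
        "parent_id int, lvl int)"
    )
    conn.executemany(
        "insert into service values (?, ?, ?, ?)",
        [("s1", 1, None, 0), ("s2", 2, 1, 1), ("s3", 3, 1, 2),
         ("s4", 4, 2, 2), ("s5", 5, 3, 3), ("s6", 6, 4, 3)],
    )
    return conn


def ancestor_at_level(conn, start_id, target_level):
    """Walk from start_id towards the root; return (id, name) at target_level."""
    sql = """
        with recursive up(id, name, parent_id, lvl) as (
            select id, name, parent_id, lvl from service where id = :start_id
            union all
            select s.id, s.name, s.parent_id, s.lvl
            from service s
            join up on s.id = up.parent_id
        )
        select id, name from up where lvl = :target_level
    """
    params = {"start_id": start_id, "target_level": target_level}
    return conn.execute(sql, params).fetchone()
```

For the sample table, ancestor_at_level(conn, 6, 1) returns (2, 's2'). Starting from id 5 it returns None, because the path 5, 3, 1 has no row stored with level 1.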