In the following example:
TABLE
ID  NAME  ATTR
--------------
1   A1    ROOT
2   A2
3   A3    VALX
4   A4
5   A5
6   A6
RELATIONSHIP
ID  CHILD_ID  PARENT_ID
-----------------------
1   6         4
2   5         4
3   4         3
4   3         1
5   2         1
I need a query that returns the ATTR value of the nearest ancestor whose ATTR is not null, climbing the hierarchy level by level until the first match.
For example with ID 6:
ID  NAME  NAME_PARENT  ATTR_PARENT
----------------------------------
6   A6    A3           VALX
I have tried with:
select T.ID, T.NAME, T2.NAME PARENT_NAME, T2.ATTR ATTR_PARENT
from TABLE T
INNER JOIN RELATIONSHIP R
ON R.CHILD_ID = T.ID
INNER JOIN TABLE T2
ON T2.ID = R.PARENT_ID
WHERE T2.ATTR IS NOT NULL
START WITH T.ID = 6
CONNECT BY T.ID = PRIOR R.PARENT_ID
--and R.PARENT_ID != prior T.ID
Sorry for my bad English.
Instead of using the [mostly obsolete] CONNECT BY clause you can use standard Recursive SQL CTEs (Common Table Expressions).
For example:
with
n (id, name, name_parent, attr_parent, parent_id, lvl) as (
select t.id, t.name, b.name, b.attr, r.parent_id, 1
from t
join r on t.id = r.child_id
join t b on b.id = r.parent_id
where t.id = 6 -- starting node
union all
select n.id, n.name, b.name, b.attr, r.parent_id, lvl + 1
from n
join r on r.child_id = n.parent_id
join t b on b.id = r.parent_id
where n.attr_parent is null
)
select id, name, name_parent, attr_parent
from n
where lvl = (select max(lvl) from n)
Result:
ID NAME NAME_PARENT ATTR_PARENT
-- ---- ----------- -----------
6 A6 A3 VALX
For reference, the data script I used is:
create table t (
id number(6),
name varchar2(10),
attr varchar2(10)
);
insert into t (id, name, attr) values (1, 'A1', 'ROOT');
insert into t (id, name, attr) values (2, 'A2', null);
insert into t (id, name, attr) values (3, 'A3', 'VALX');
insert into t (id, name, attr) values (4, 'A4', null);
insert into t (id, name, attr) values (5, 'A5', null);
insert into t (id, name, attr) values (6, 'A6', null);
create table r (
id number(6),
child_id number(6),
parent_id number(6)
);
insert into r (id, child_id, parent_id) values (1, 6, 4);
insert into r (id, child_id, parent_id) values (2, 5, 4);
insert into r (id, child_id, parent_id) values (3, 4, 3);
insert into r (id, child_id, parent_id) values (4, 3, 1);
insert into r (id, child_id, parent_id) values (5, 2, 1);
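The recursive CTE above can also be tried without an Oracle instance. The following is only a sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in: the column types are adapted, the keyword recursive is added because SQLite expects it, and the helper name first_parent_attr is hypothetical. The query logic itself is unchanged.

```python
import sqlite3

# Schema and rows follow the data script above; types are adapted to SQLite.
SETUP = """
create table t (id integer, name text, attr text);
insert into t values (1,'A1','ROOT'),(2,'A2',null),(3,'A3','VALX'),
                     (4,'A4',null),(5,'A5',null),(6,'A6',null);
create table r (id integer, child_id integer, parent_id integer);
insert into r values (1,6,4),(2,5,4),(3,4,3),(4,3,1),(5,2,1);
"""

QUERY = """
with recursive n (id, name, name_parent, attr_parent, parent_id, lvl) as (
  select t.id, t.name, b.name, b.attr, r.parent_id, 1
  from t
  join r on t.id = r.child_id
  join t b on b.id = r.parent_id
  where t.id = :start              -- starting node
  union all
  select n.id, n.name, b.name, b.attr, r.parent_id, n.lvl + 1
  from n
  join r on r.child_id = n.parent_id
  join t b on b.id = r.parent_id
  where n.attr_parent is null      -- keep climbing only while nothing was found
)
select id, name, name_parent, attr_parent
from n
where lvl = (select max(lvl) from n)
"""

def first_parent_attr(conn, start_id):
    """Hypothetical helper: run the CTE for one starting node.

    Returns (id, name, name_parent, attr_parent), or None when the
    starting node has no parent row in r.
    """
    return conn.execute(QUERY, {"start": start_id}).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(first_parent_attr(conn, 6))  # (6, 'A6', 'A3', 'VALX')
```

Because the recursive member only continues while attr_parent is null, the deepest row (the one with the highest lvl) is either the first match or the last ancestor reached.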
Here is how you can do the whole thing in a single pass of connect by - using the various features available for this kind of query (including the connect_by_isleaf flag and the connect_by_root pseudo-column):
select connect_by_root(r.child_id) as id,
connect_by_root(t.name) as name,
t.name as name_parent,
t.attr as attribute_parent
from r join t on r.child_id = t.id
where connect_by_isleaf = 1
start with r.child_id = 6
connect by prior r.parent_id = r.child_id and prior t.attr is null
;
ID NAME NAME_PARENT ATTRIBUTE_PARENT
---------- ---------- ----------- ----------------
6 A6 A3 VALX
Note that this will still return a row with a null ATTRIBUTE_PARENT if the entire tree is walked without ever finding an ancestor with a non-null ATTR. If in fact you only want to show something in the output when an ancestor has a non-null ATTR (and allow the output to have no rows if there is no such ancestor), you can change the where clause to where t.attr is not null. In most cases, though, you would probably want the behavior as I coded it.
I used the tables and data as posted in @TheImpaler's answer (thank you for the create table and insert statements!)
As I commented under his answer: recursive with clause is in the SQL Standard, so it has some advantages over connect by. However, whenever the same job can be done with connect by, it's worth at least testing it that way too. In many cases, due to numerous optimizations Oracle has come up with over time, connect by will be much faster.
One reason some developers avoid connect by is that they don't spend the time to learn the various features (like the ones I used here). Not a good reason, in my opinion.
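The two output behaviors discussed above (return a row with a null attribute, or return nothing when no ancestor matches) can be illustrated outside the database. This is only a sketch of the walk-up idea in plain Python, not an emulation of connect by: the dictionaries parent_of and attr_of and the function first_ancestor_with_attr are hypothetical in-memory stand-ins for the two tables.

```python
# Hypothetical in-memory copies of the relationship and attribute data.
parent_of = {6: 4, 5: 4, 4: 3, 3: 1, 2: 1}   # child_id -> parent_id
attr_of = {1: "ROOT", 3: "VALX"}             # ids missing here have a null attr

def first_ancestor_with_attr(start, parent_of, attr_of, require_match=False):
    """Climb from start towards the root, one parent at a time.

    Returns (ancestor_id, attr) for the first ancestor with a non-null attr.
    If none is found, returns (last_ancestor_id, None), or None when
    require_match is set or when start has no parent at all.
    """
    node = parent_of.get(start)
    last = None
    while node is not None:
        last = node
        if attr_of.get(node) is not None:
            return node, attr_of[node]
        node = parent_of.get(node)
    if require_match or last is None:
        return None
    return last, None

print(first_ancestor_with_attr(6, parent_of, attr_of))  # (3, 'VALX')
```

Setting require_match=True corresponds to the stricter variant, where no output at all is preferred over an output with a null attribute.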
This is my first post. I am trying to build a SQL tree table that can be traversed. For example, if a person clicks on a drop-down list called Categories, it will display Electric and InterC. Then, if the user clicks on Electric, it will drop down Relays and Switches; next, if the person clicks on Relays it will drop down X Relays, and if the person clicks on Switches it will drop down Y Switches. My attempt is below, but the part I don't understand is this: if I have another category, InterC, how do I make that another level of drop-downs?
Table Category
create table test(id int,parentId int,name varchar(50))
insert test select 1, 0,'Electric'
insert test select 2, 1,'Relays'
insert test select 3, 1,'Switches'
insert test select 5, 2,'X Relays'
insert test select 6, 2,'Y Switches'
insert test select 7, 0,'InterC'
insert test select 8, 1,'x Sockets'
insert test select 9, 1,'y Sockets'
insert test select 10, 2,'X Relays'
insert test select 11, 2,'Y Relays'
;
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test WHERE parentid = 0
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Your hierarchical T-SQL query should return all the records in the table, both those under Electric and InterC.
However, you should make parentId nullable and have the root records have a null rather than 0. That will let you add a foreign key that protects your data integrity (it won't be possible to add orphaned records by mistake).
Your hierarchy query returns all of your records. I'm guessing that you want to return just one hierarchy at a time; for that, add a where condition to the starting query.
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test
WHERE name = @category AND
parentId is null
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Then set @category to 'Electric' or 'InterC' to get one or the other hierarchy.
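To see the nullable-parent advice and the per-category filter working together, here is a minimal sketch that uses SQLite through Python's sqlite3 module instead of SQL Server. The reduced data set (with the sockets placed under InterC) and the helper names subtree and try_insert_orphan are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    create table test (
        id       integer primary key,
        parentId integer references test(id),  -- NULL marks a root category
        name     varchar(50)
    )""")
# Assumed, reduced data: roots have a NULL parent instead of 0.
conn.executemany("insert into test values (?, ?, ?)", [
    (1, None, "Electric"), (2, 1, "Relays"), (3, 1, "Switches"),
    (5, 2, "X Relays"), (7, None, "InterC"), (8, 7, "x Sockets"),
])

def subtree(category):
    """Return (id, level, name) rows for one root category and its descendants."""
    sql = """
    with recursive tree (id, parentId, level, name) as (
        select id, parentId, 0, name
        from test
        where name = ? and parentId is null
        union all
        select c2.id, c2.parentId, tree.level + 1, c2.name
        from test c2
        join tree on tree.id = c2.parentId
    )
    select id, level, name from tree order by level, id"""
    return conn.execute(sql, (category,)).fetchall()

def try_insert_orphan():
    """Try to add a row whose parent does not exist; report whether it succeeded."""
    try:
        conn.execute("insert into test values (99, 42, 'Orphan')")
        return True
    except sqlite3.IntegrityError:
        return False

print(subtree("InterC"))    # [(7, 0, 'InterC'), (8, 1, 'x Sockets')]
print(try_insert_orphan())  # False
```

The foreign key is what rejects the orphaned row; with 0 as the root marker that constraint could not be declared, because no row with id 0 exists.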
I have a hierarchical table in the format
CREATE TABLE tree_hierarchy (
id NUMBER (20)
,parent_id NUMBER (20)
);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (4, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (9, 4);
When I run this query:
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
ORDER SIBLINGS BY ID;
the result I get looks like this:
"ID" "PARENT_ID" "LEAF" "LEVEL" "PATH" "PARENT_PATH"
9 4 1 3 "/2/4/9" "/1/2/4"
9 4 1 2 "/4/9" "/2/4"
9 4 1 1 "/9" "/4"
But I need an Oracle SQL query that returns only this:
"ID" "PARENT_ID" "LEAF" "LEVEL" "PATH" "PARENT_PATH"
9 4 1 3 "/2/4/9" "/1/2/4"
This is a simplified example; I have more than 1000 records like this. When I run the above query, it generates many duplicates. Can anyone give me a generic query that returns the complete path from leaf to root without duplicates? Thanks in advance for the help.
The root node of a finite hierarchy can always be identified.
According to the definition (http://en.wikipedia.org/wiki/Tree_structure), the root node is a node that has no parent.
To check whether a given node is a root node, take its parent_id and check whether the table contains a record with that id; if it does not, the node is a root.
The query might look like this:
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy th
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
START WITH not exists (
select 1 from tree_hierarchy th1
where th1.id = th.parent_id
)
ORDER SIBLINGS BY ID;
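The same idea (start only from rows whose parent_id has no row of its own, then keep only the leaves) can be sketched with a recursive CTE. This is an illustration under assumptions, not the Oracle query itself: it runs on SQLite through Python's sqlite3 module, and the CTE name walk and its path columns are hypothetical replacements for LEVEL and SYS_CONNECT_BY_PATH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tree_hierarchy (id integer, parent_id integer);
insert into tree_hierarchy values (2, 1), (4, 2), (9, 4);
""")

LEAF_PATHS = """
with recursive walk (id, parent_id, lvl, path, parent_path) as (
    -- anchor: rows whose parent_id does not exist as an id are the tree tops
    select th.id, th.parent_id, 1, '/' || th.id, '/' || th.parent_id
    from tree_hierarchy th
    where not exists (select 1 from tree_hierarchy th1
                      where th1.id = th.parent_id)
    union all
    -- step: attach each child and extend both paths
    select c.id, c.parent_id, w.lvl + 1,
           w.path || '/' || c.id, w.parent_path || '/' || c.parent_id
    from tree_hierarchy c
    join walk w on c.parent_id = w.id
)
select id, parent_id, lvl, path, parent_path
from walk
-- keep leaves only: rows that are nobody's parent
where not exists (select 1 from tree_hierarchy c where c.parent_id = walk.id)
order by path
"""

rows = conn.execute(LEAF_PATHS).fetchall()
print(rows)  # [(9, 4, 3, '/2/4/9', '/1/2/4')]
```

Restricting the anchor to the tree tops is what removes the shorter duplicate paths: a walk that starts in the middle of the tree is never generated.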
You should explicitly specify the id to build the path from. As written, your query starts a path at every row and keeps each one that ends in a leaf. You need to use START WITH. Try it like this:
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
START WITH id = 2
ORDER SIBLINGS BY ID;
I have a table with a list of files. Each row has id_folder, id_parrent_folder, and size (the file size):
create table sample_data (
id_folder bigint ,
id_parrent_folder bigint,
size bigint
);
I would like to know how many files are in each folder, including all of its subfolders, starting with a given folder. Given the sample data posted below, I expect the following output:
id_folder files
100623 35
100624 14
Sample data:
insert into sample_data values (100623,58091,60928);
insert into sample_data values (100623,58091,59904);
insert into sample_data values (100623,58091,54784);
insert into sample_data values (100623,58091,65024);
insert into sample_data values (100623,58091,25600);
insert into sample_data values (100623,58091,31744);
insert into sample_data values (100623,58091,27648);
insert into sample_data values (100623,58091,39424);
insert into sample_data values (100623,58091,30720);
insert into sample_data values (100623,58091,71168);
insert into sample_data values (100623,58091,68608);
insert into sample_data values (100623,58091,34304);
insert into sample_data values (100623,58091,46592);
insert into sample_data values (100623,58091,35328);
insert into sample_data values (100623,58091,29184);
insert into sample_data values (100623,58091,38912);
insert into sample_data values (100623,58091,38400);
insert into sample_data values (100623,58091,49152);
insert into sample_data values (100623,58091,14444);
insert into sample_data values (100623,58091,33792);
insert into sample_data values (100623,58091,14789);
insert into sample_data values (100624,100623,16873);
insert into sample_data values (100624,100623,32768);
insert into sample_data values (100624,100623,104920);
insert into sample_data values (100624,100623,105648);
insert into sample_data values (100624,100623,31744);
insert into sample_data values (100624,100623,16431);
insert into sample_data values (100624,100623,46592);
insert into sample_data values (100624,100623,28160);
insert into sample_data values (100624,100623,58650);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
insert into sample_data values (100624,100623,162);
I've tried to use example from postgresql (postgresql docs), but it (obviously) can't work this way. Any help appreciated.
-- Edit
I've tried the following query:
WITH RECURSIVE included_files(id_folder, parrent_folder, dist_last_change) AS (
SELECT
id_folder,
id_parrent_folder,
size
FROM
sample_data p
WHERE
id_folder = 100623
UNION ALL
SELECT
p.id_folder,
p.id_parrent_folder,
p.size
FROM
included_files if,
sample_data p
WHERE
p.id_parrent_folder = if.id_folder
)
select * from included_files
This doesn't work because the parent folder appears once per file, so every row of a child folder is repeated for each parent row.
With your sample data, this returns what you want. I'm not 100% sure though that it will cover all possible anomalies in your tree:
with recursive folder_sizes as (
select id_folder, id_parent_folder, count(*) as num_files
from sample_data
group by id_folder, id_parent_folder
),
folder_tree as (
select id_folder, id_parent_folder, num_files as total_files
from folder_sizes
where id_parent_folder = 100623
union all
select c.id_folder, c.id_parent_folder, c.num_files + p.total_files as total_files
from folder_sizes c
join folder_tree p on p.id_parent_folder = c.id_folder
)
select id_folder, id_parent_folder, total_files
from folder_tree;
Here is a SQLFiddle demo: http://sqlfiddle.com/#!12/bb942/2
This only covers a single level hierarchy though (because of the id_parent_folder = 100623 condition). To cover any number of levels, I can only think of a two step approach, that first collects all sub-folders and then walks that tree up again, to calculate the total number of files.
Something like this:
with recursive folder_sizes as (
select id_folder, id_parent_folder, count(*) as num_files
from sample_data
group by id_folder, id_parent_folder
),
folder_tree_down as (
select id_folder, id_parent_folder, num_files, id_folder as root_folder, 1 as level
from folder_sizes
union all
select c.id_folder, c.id_parent_folder, c.num_files, p.root_folder, p.level + 1 as level
from folder_sizes c
join folder_tree_down p on p.id_folder = c.id_parent_folder
),
folder_tree_up as (
select id_folder, id_parent_folder, num_files as total_files, level
from folder_tree_down
where root_folder = 100623
union all
select c.id_folder, c.id_parent_folder, c.num_files + p.total_files as total_files, p.level
from folder_tree_down c
join folder_tree_up p on p.id_parent_folder = c.id_folder
)
select id_folder, id_parent_folder, total_files
from folder_tree_up
where level > 1;
That produces the same output as the first statement, but I think it should work with an unlimited number of levels.
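The two-step idea (first collect the sub-folders going down, then add the counts together going back up) can be sketched in plain Python. This is only an illustration: the function files_per_folder is hypothetical, the sample rows below reproduce just the row counts of the posted data (21 files in 100623, 14 in 100624) with placeholder sizes, and the folder structure is assumed to contain no cycles.

```python
from collections import Counter, defaultdict

def files_per_folder(rows, start_folder):
    """rows: iterable of (id_folder, id_parrent_folder, size), one row per file.

    Returns {folder: number of files in it and in all of its sub-folders}
    for start_folder and everything below it.
    """
    own = Counter()               # files stored directly in each folder
    children = defaultdict(set)   # parent folder -> its sub-folders
    for folder, parent, _size in rows:
        own[folder] += 1
        children[parent].add(folder)

    totals = {}

    def total(folder):
        # Step down into the sub-folders, then sum on the way back up.
        totals[folder] = own[folder] + sum(total(c) for c in children[folder])
        return totals[folder]

    total(start_folder)
    return totals

# Placeholder sizes (0); only the number of rows per folder matters here.
sample = [(100623, 58091, 0)] * 21 + [(100624, 100623, 0)] * 14
print(files_per_folder(sample, 100623))
```

For very deep trees the recursion would run into Python's recursion limit, so an explicit stack would be the safer choice there.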
Very nice problem to think about, I upvoted!
As I see it, 2 cases to think about:
multi-level paths and
multi-child nodes.
So far I've come up with the following query:
WITH RECURSIVE tree AS (
SELECT id_folder id, array[id_folder] arr
FROM sample_data sd
WHERE NOT EXISTS (SELECT 1 FROM sample_data s
WHERE s.id_parrent_folder=sd.id_folder)
UNION ALL
SELECT sd.id_folder,t.arr||sd.id_folder
FROM tree t
JOIN sample_data sd ON sd.id_folder IN (
SELECT id_parrent_folder FROM sample_data WHERE id_folder=t.id))
,ids AS (SELECT DISTINCT id, unnest(arr) ua FROM tree)
,agg AS (SELECT id_folder id,count(*) cnt FROM sample_data GROUP BY 1)
SELECT ids.id, sum(agg.cnt)
FROM ids JOIN agg ON ids.ua=agg.id
GROUP BY 1
ORDER BY 1;
I've added the following rows to the sample_data:
INSERT INTO sample_data VALUES (100625,100623,123);
INSERT INTO sample_data VALUES (100625,100623,456);
INSERT INTO sample_data VALUES (100625,100623,789);
INSERT INTO sample_data VALUES (100626,100625,1);
This query is not optimal, though, and will slow down as the number of rows grows.
Full-scale tests
To simulate the original situation, I wrote a small Python script that scans the filesystem and stores it in the database (hence the delay; I'm not yet good at Python scripting).
The following tables had been created:
CREATE TABLE fs_file(file_id bigserial, name text, type char(1), level int4);
CREATE TABLE fs_tree(file_id int8, parent_id int8, size int8);
Scanning whole filesystem of my MBP took 7.5 minutes and I have 870k entries in the fs_tree table, which is quite similar to the original task. After upload, the following was run:
CREATE INDEX i_fs_tree_1 ON fs_tree(file_id);
CREATE INDEX i_fs_tree_2 ON fs_tree(parent_id);
VACUUM ANALYZE fs_file;
VACUUM ANALYZE fs_tree;
I tried running my first query on this data and had to kill it after approximately 1 hour. The improved one takes around 2 minutes (on my MBP) to do the job on the whole filesystem. Here it is:
WITH RECURSIVE descent AS (
SELECT fs.file_id grp, fs.file_id, fs.size, 1 k, 0 AS lvl
FROM fs_tree fs
WHERE fs.parent_id = (SELECT file_id FROM fs_file WHERE name = '/')
UNION ALL
SELECT DISTINCT CASE WHEN k.k=0 THEN d.grp ELSE fs.file_id END AS grp,
fs.file_id, fs.size, k.k, d.lvl+1
FROM descent d
JOIN fs_tree fs ON d.file_id=fs.parent_id
CROSS JOIN generate_series(0,1) k(k))
/* the query */
SELECT grp, file_id, size, k, lvl
FROM descent
ORDER BY 1,2,3;
Query uses my table names, but it shouldn't be difficult to change it. It will build a set of groups for each file_id found in the fs_tree. To get the desired output, you can do something like:
SELECT grp AS file_id, count(*), sum(size)
FROM descent GROUP BY 1;
Some notes:
the query works only if there are no duplicates. I think this is the right way to go, because it is impossible to have 2 equally named entries in a single directory;
the query doesn't care about the depth or sibling count of the tree, though these do affect performance;
for me it was a good experience, as similar functionality is also needed for task planning systems (I'm working with one at the moment);
where tasks are concerned, a single entry can have multiple parents (but not otherwise) and the query will still work;
this problem can be solved in other ways too, like traversing the tree in ascending order, or using pre-calculated values to avoid the final grouping step, but this is getting a bit bigger than a simple question, so I leave it as an exercise for you.
Recommendations
To get this query to work, you should prepare your data by aggregating it:
WITH RECURSIVE
fs_tree AS (
SELECT id_folder file_id, id_parrent_folder parent_id,
sum(size) AS size, count(*) AS cnt
FROM sample_data GROUP BY 1,2)
,descent AS (
SELECT fs.file_id grp, fs.file_id, fs.size, fs.cnt, 1 k, 0 AS lvl
FROM fs_tree fs
WHERE fs.parent_id = 58091
UNION ALL
SELECT DISTINCT CASE WHEN k.k=0 THEN d.grp ELSE fs.file_id END AS grp,
fs.file_id, fs.size, fs.cnt, k.k, d.lvl+1
FROM descent d
JOIN fs_tree fs ON d.file_id=fs.parent_id
CROSS JOIN generate_series(0,1) k(k))
/* the query */
SELECT grp file_id, sum(size) size, sum(cnt) cnt
FROM descent
GROUP BY 1
ORDER BY 1,2,3;
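The grouping trick in the descent CTE (the cross join with generate_series(0,1)) may be easier to follow outside SQL. The sketch below mirrors it in plain Python under stated assumptions: the function group_totals is hypothetical, its input is the already aggregated data as (file_id, parent_id, cnt) tuples, and only the file counts are summed.

```python
from collections import defaultdict

def group_totals(folders, top_parent):
    """folders: list of (file_id, parent_id, cnt), one tuple per folder.

    Returns {group: total cnt}, where each folder's group holds the folder
    itself and everything below it.
    """
    cnt = {fid: c for fid, _parent, c in folders}
    children = defaultdict(list)
    for fid, parent, _c in folders:
        children[parent].append(fid)

    # Anchor: every folder directly under top_parent opens its own group.
    frontier = {(fid, fid) for fid in children[top_parent]}
    pairs = set(frontier)            # all (grp, file_id) pairs found so far
    while frontier:
        nxt = set()
        for grp, fid in frontier:
            for child in children[fid]:
                nxt.add((grp, child))    # k = 0: child counts toward the inherited group
                nxt.add((child, child))  # k = 1: child also opens a group of its own
        frontier = nxt - pairs
        pairs |= frontier

    totals = defaultdict(int)
    for grp, fid in pairs:
        totals[grp] += cnt[fid]
    return dict(totals)

aggregated = [(100623, 58091, 21), (100624, 100623, 14)]
print(group_totals(aggregated, 58091))
```

Each folder therefore appears once in its own group and once in the group of every ancestor, which is why the final GROUP BY on grp gives totals that include all sub-folders.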
In order to speed things up, you can implement Materialized Views and pre-calculate some metrics.
Sample data
Here's a small dump that will show the data inside the tables:
INSERT INTO fs_file VALUES (1, '/Users/viy/prj/logs', 'D', 0),
(2, 'jobs', 'D', 1),
(3, 'pg_csv_load', 'F', 2),
(4, 'pg_logs', 'F', 2),
(5, 'logs.sql', 'F', 1),
(6, 'logs.sql~', 'F', 1),
(7, 'pgfouine-1.2.tar.gz', 'F', 1),
(8, 'u.sql', 'F', 1),
(9, 'u.sql~', 'F', 1);
INSERT INTO fs_tree VALUES (1, NULL, 0),
(2, 1, 0),
(3, 2, 936),
(4, 2, 706),
(5, 1, 4261),
(6, 1, 4261),
(7, 1, 793004),
(8, 1, 491),
(9, 1, 491);
Note that I've slightly updated the create statements.
And this is the script I've used to scan the filesystem:
#!/usr/bin/python
import os
import psycopg2
import sys
from stat import *

def walk_tree(full, parent, level, call_back):
    '''recursively descend the directory tree rooted at top,
    calling the callback function for each regular file'''
    if not os.access(full, os.R_OK):
        return
    for f in os.listdir(full):
        path = os.path.join(full, f)
        if os.path.islink(path):
            # It's a link, register and continue
            e = entry(f, "L", level)
            call_back(parent, e, 0)
            continue
        mode = os.stat(path).st_mode
        if S_ISDIR(mode):
            e = entry(f, "D", level)
            call_back(parent, e, 0)
            # It's a directory, recurse into it
            try:
                walk_tree(path, e, level+1, call_back)
            except OSError:
                pass
        elif S_ISREG(mode):
            # It's a file, call the callback function
            call_back(parent, entry(f, "F", level), os.stat(path).st_size)
        else:
            # It's unknown, just register
            e = entry(f, "U", level)
            call_back(parent, e, 0)

def register(parent, entry, size):
    db_cur.execute("INSERT INTO fs_tree VALUES (%s,%s,%s)",
                   (entry, parent, size))

def entry(name, type, level):
    db_cur.execute("""INSERT INTO fs_file(name,type, level)
                      VALUES (%s, %s, %s) RETURNING file_id""",
                   (name, type, level))
    return db_cur.fetchone()[0]

db_con=psycopg2.connect("dbname=postgres")
db_cur=db_con.cursor()
if len(sys.argv) != 2:
    raise SyntaxError("Root directory expected!")
if not S_ISDIR(os.stat(sys.argv[1]).st_mode):
    raise SyntaxError("A directory is wanted!")
e=entry(sys.argv[1], "D", 0)
register(None, e, 0)
walk_tree(sys.argv[1], e, 1, register)
db_con.commit()
db_cur.close()
db_con.close()
This script is for Python 3.2 and is based on the example from official python documentation.
Hope this clarifies things for you.