PostgreSQL finding nearest parent value of a certain level - sql

I am trying to write a query in PostgreSQL that returns, for any given child value, the nearest parent that has reached a certain rank. Currently I have this query, which displays the entire hierarchical path for a given child value:
WITH RECURSIVE tree AS (
SELECT "ChildDisplayID",
"ParentID",
"Rank",
1 as level
FROM table1
WHERE "ChildDisplayID" = {{some ChildID}}
UNION ALL
SELECT t1."ChildDisplayID",
t1."ParentID",
t1."Rank",
t.level + 1
FROM table1 t1
JOIN tree t ON t."ParentID" = t1."ChildDisplayID"
)
SELECT *
FROM tree
What I want is a single row that displays the child ID and the ID of the nearest parent whose rank is "Partner". For example, here is the output I am currently getting:
| ChildID | ParentID | Rank | Level |
|---------|----------|------|-------|
| 6 | 5 |Associate Manager| 1 |
| 5 | 4 |Manager| 2 |
| 4 | 3 |Associate Partner| 3 |
| 3 | 2 |Partner| 4 |
| 2 | 1 |Partner| 5 |
| 1 | |CEO| 6 |
Here is the output I want:
|ChildID | Nearest Partner | Rank |
|--------|----------|------|
|6 |3 | Partner |
What is the best way to do this?

You can put a stop condition on the first matching partner in the recursion, then filter the result:
WITH RECURSIVE tree AS (
SELECT "ChildDisplayID" as initialid, "ChildDisplayID", "ParentID", "Rank", 1 as level
FROM table1
WHERE "ChildDisplayID" = {{some ChildID}}
UNION ALL
SELECT t.initialid, t1."ChildDisplayID", t1."ParentID", t1."Rank", t.level + 1
FROM table1 t1
INNER JOIN tree t ON t."ParentID" = t1."ChildDisplayID"
WHERE t."Rank" <> 'Partner'
)
SELECT *
FROM tree
WHERE "Rank" = 'Partner'
It seems like you have a hierarchy where each child has just one parent, so there should be only one match, or no match at all.
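The following lets you try the answer's query end to end. It is a minimal sketch that assumes SQLite 3.8.3 or later, which supports `WITH RECURSIVE`, as a stand-in for PostgreSQL, and it uses the rows from the question's output table. The `{{some ChildID}}` placeholder becomes a bound parameter, and the final SELECT projects only the three columns the question asks for. The CTE column names (`initialid`, `node_id`, `parent_id`, `rank_name`, `level`) and the helper `nearest_partner` are illustrative.

```python
import sqlite3

# Rows from the question's output table: (ChildDisplayID, ParentID, Rank).
ROWS = [
    (6, 5, "Associate Manager"),
    (5, 4, "Manager"),
    (4, 3, "Associate Partner"),
    (3, 2, "Partner"),
    (2, 1, "Partner"),
    (1, None, "CEO"),
]

# Walk upward from the given child. The recursive step only continues while
# the row just produced is not a Partner, so the walk stops at the first one.
NEAREST_PARTNER_SQL = """
WITH RECURSIVE tree(initialid, node_id, parent_id, rank_name, level) AS (
    SELECT "ChildDisplayID", "ChildDisplayID", "ParentID", "Rank", 1
    FROM table1
    WHERE "ChildDisplayID" = ?
    UNION ALL
    SELECT t.initialid, t1."ChildDisplayID", t1."ParentID", t1."Rank", t.level + 1
    FROM table1 AS t1
    JOIN tree AS t ON t.parent_id = t1."ChildDisplayID"
    WHERE t.rank_name <> 'Partner'
)
SELECT initialid, node_id, rank_name
FROM tree
WHERE rank_name = 'Partner'
"""


def nearest_partner(conn, child_id):
    """Return [(child_id, nearest_partner_id, 'Partner')], or [] if none."""
    return conn.execute(NEAREST_PARTNER_SQL, (child_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table1 ("ChildDisplayID" INTEGER PRIMARY KEY, '
    '"ParentID" INTEGER, "Rank" TEXT)'
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

print(nearest_partner(conn, 6))  # [(6, 3, 'Partner')]
```

The anchor row is never filtered. If you start from a row that is itself a Partner (for example 3), you get that same row back. If you start from the CEO, the result is an empty list.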

Related

How to write a CTE to aggregate hierarchical values

I want to write expressions in SQLite to process a tree of items. The processing should start with the leaf nodes (the bottom) and proceed back through their parents all the way to the root node (the top), so that each parent node is updated based on the content of its children. I've been able to write a CTE that does something similar, but it isn't yet fully correct.
I have a simple table "test1" containing some nested values:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | NULL
2 | 1 | NULL | NULL
3 | 2 | NULL | NULL
4 | 3 | 50 | NULL
5 | 3 | 50 | NULL
6 | 2 | 60 | NULL
7 | 6 | 90 | NULL
8 | 6 | 60 | NULL
Rows may have children, which reference their parent via their parent field. Rows may have a value of their own as well as child rows, or they may simply be parents without values (i.e. "wrappers"). The leaves are the rows without any children.
For each row I'd like to calculate the total as the average of the row's value (if not null) AND its children's totals. This should start with the leaf nodes and proceed up the tree to their parents, all the way to the root node at the top of the data hierarchy.
I've tried a number of variations of CTEs but am having difficulty writing one that will recursively calculate these totals from the bottom up.
Currently, I have:
UPDATE test1 SET total = (
WITH RECURSIVE cte(cte_id,cte_parent,cte_value,cte_total) AS (
SELECT test1.id, test1.parent, test1.value, test1.total
UNION ALL
select t.id, t.parent, t.value, t.total from test1 t, cte
WHERE cte.cte_id=t.parent
) SELECT AVG(cte_value) FROM cte
);
which produces:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | 62
2 | 1 | NULL | 62
3 | 2 | NULL | 50
4 | 3 | 50 | 50
5 | 3 | 50 | 50
6 | 2 | 60 | 70
7 | 6 | 90 | 90
8 | 6 | 60 | 60
Looking at the top-most rows, this is not quite right: it takes an average not only of the row's immediate children but of all the row's descendants. As a result, row 2, for example, has a total of 62 instead of 60. The expected result should set row 2's total to 60, the average of its immediate child rows 3 and 6. Row 1's total would be 60 as well.
How can I calculate a "total" value for each row based on an average of the row's value and the values of its immediate children only, while ensuring the upper levels of the hierarchy are correctly populated based on the calculated totals of their children?
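With the sample data, averaging all descendants gives a different result from averaging only immediate children (with the row's own value included where present):

$$
\begin{aligned}
\text{avg of all descendants of 2} &= \frac{50 + 50 + 60 + 90 + 60}{5} = 62 \\
\text{total}(3) &= \frac{50 + 50}{2} = 50 \\
\text{total}(6) &= \frac{60 + 90 + 60}{3} = 70 \\
\text{total}(2) &= \frac{\text{total}(3) + \text{total}(6)}{2} = \frac{50 + 70}{2} = 60 \\
\text{total}(1) &= \text{total}(2) = 60
\end{aligned}
$$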
It turns out that a very similar question and solution was posted here:
How can I traverse a tree bottom-up to calculate a (weighted) average of node values in PostgreSQL?
Since SQLite doesn't let you create functions in SQL (there is no CREATE FUNCTION), the example using a recursive CTE applies:
with recursive cte(id, parent, value, level, total) as (
select
t.id, t.parent, t.value,
0,
t.value as total
from test1 t
where not exists (
select id
from test1
where parent = t.id)
union all
select
t.id, t.parent, t.value,
c.level+1,
case when t.value is null then c.total else t.value end
from test1 t
join cte c on t.id=c.parent
)
select id, parent, value, avg(total) total from (
select
id, parent, value, level, avg(total) total
from cte
group by id,parent,level
)
group by id, parent
order by id
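I traced the query above by hand with the sample data. The CASE expression keeps a parent's own value in place of its children's totals whenever the parent has a value. As a result, node 6 comes out as 60 rather than the expected 70, and nodes 1 and 2 come out as 55 rather than 60.

One alternative is sketched below. It assumes you can drive the update from a host language (here Python's built-in `sqlite3` module). A recursive CTE computes each node's depth, and then a plain correlated UPDATE runs one level at a time, deepest first, so every child's total is final before its parent reads it. The function name `compute_totals` is illustrative.

```python
import sqlite3

# Sample data from the question: (id, parent, value); total starts as NULL.
ROWS = [
    (1, None, None), (2, 1, None), (3, 2, None), (4, 3, 50),
    (5, 3, 50), (6, 2, 60), (7, 6, 90), (8, 6, 60),
]

# Depth of every node, counted from the root(s) (rows whose parent IS NULL).
DEPTH_SQL = """
WITH RECURSIVE depth(id, d) AS (
    SELECT id, 0 FROM test1 WHERE parent IS NULL
    UNION ALL
    SELECT t.id, depth.d + 1 FROM test1 AS t JOIN depth ON t.parent = depth.id
)
SELECT id, d FROM depth
"""

# total = (own value if not NULL + sum of immediate children's totals)
#         / (number of those non-NULL terms); NULL when there are none.
UPDATE_SQL = """
UPDATE test1 SET total = (
    (COALESCE(value, 0)
     + (SELECT COALESCE(SUM(c.total), 0) FROM test1 AS c WHERE c.parent = test1.id))
    * 1.0
    / NULLIF((value IS NOT NULL)
             + (SELECT COUNT(c.total) FROM test1 AS c WHERE c.parent = test1.id), 0)
)
WHERE id = ?
"""


def compute_totals(conn):
    depths = conn.execute(DEPTH_SQL).fetchall()
    if not depths:
        return
    deepest = max(d for _, d in depths)
    # Deepest level first, so children are final before their parent is updated.
    for level in range(deepest, -1, -1):
        ids = [(node_id,) for node_id, d in depths if d == level]
        conn.executemany(UPDATE_SQL, ids)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test1 (id INTEGER PRIMARY KEY, parent INTEGER, "
    "value INTEGER, total REAL)"
)
conn.executemany("INSERT INTO test1 (id, parent, value) VALUES (?, ?, ?)", ROWS)
compute_totals(conn)
print(conn.execute("SELECT id, total FROM test1 ORDER BY id").fetchall())
```

The number of UPDATE passes equals the tree depth plus one. This keeps each statement simple, at the cost of a round trip per level.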

Select records by parents and children

I have a table in which some records are parents of others.
A parent column value of zero means the record is a parent; a child record holds the id of its parent record.
I need to list the records by order, with each parent followed by its children.
My table TOOLS:
ID | order | parent | name
100 | 1 | 0 | X
200 | 2 | 0 | Y
150 | 0 | 100 | X.1
300 | 0 | 200 | Y.1
I need the following result:
ID | order | parent | name
100 | 1 | 0 | X
150 | 0 | 100 | X.1
200 | 2 | 0 | Y
300 | 0 | 200 | Y.1
How can I order this?
If I use this query
select t.*
from t
order by (case when parent = 0 then id else parent end), order desc;
The result is this:
ID | order | parent | name
200 | 2 | 0 | Y
300 | 0 | 200 | Y.1
100 | 1 | 0 | X
150 | 0 | 100 | X.1
And if I change it to order asc, it puts the records with order = 0 at the top...
Thank you
You need to construct a value that provides the desired order. One way to do that is to construct a materialized path that encodes your order criteria. Since you are using SQL Server, you could use its hierarchyid data type for this; however, the real magic is in building the materialized path itself. I'll show two versions, one used with hierarchyid (path) and one without (path2):
with cte as (
-- Anchor Part
select id, [order], parent, name, 1 level
, cast('/'+cast(tools.id as varchar(10))+'/' as varchar(4000)) path
, cast(cast(tools.id as binary(4)) as varbinary(4000)) path2
from TOOLS
where parent = 0
union all
-- Recursive Part
select tools.*
, level+1
, cast(path+cast(tools.id as varchar(10))+'/' as varchar(4000))
, cast(path2+cast(tools.id as binary(4)) as varbinary(4000))
from tools
join cte
on tools.parent = cte.id
)
select id
, [order]
, parent
, name
, level
, path
, cast(path as hierarchyid) hid
, path2
from cte order by hid
In the above query, a recursive common table expression walks the tree and builds the two example paths. The first path would work as a sort column on its own with your current IDs. However, it would begin to fail as soon as you had IDs with a length other than three digits, because each level's most significant digits would no longer necessarily align. The cast to hierarchyid in the final query resolves that issue. The second path can be used directly as a sort key, since it casts each level of the path to a four-byte binary value. As such, it can handle IDs up to 4,294,967,295 and a path length (tree depth) of 1000 levels, and it does not need a final cast to any other data type.
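You can reproduce the alignment problem described above outside the database. This Python sketch assumes that Python's byte-string comparison behaves like the server's comparison of the `path2` values; that is an assumption, not something the answer states. It uses two hypothetical roots, 99 and 100, so that IDs of different lengths appear. `text_path` mimics the `path` column, and `binary_path` mimics `path2` by encoding each ID as 4 big-endian bytes.

```python
# Hypothetical tree: roots 99 and 100; 1000 is a child of 99, 150 of 100.
PARENT = {99: None, 100: None, 1000: 99, 150: 100}


def ancestry(node):
    """IDs from the root down to `node`."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain[::-1]


def text_path(node):
    """Like the answer's `path` column, e.g. '/100/150/'."""
    return "/" + "".join(f"{i}/" for i in ancestry(node))


def binary_path(node):
    """Like `path2`: every level takes exactly 4 bytes (big-endian)."""
    return b"".join(i.to_bytes(4, "big") for i in ancestry(node))


by_text = sorted(PARENT, key=text_path)
by_binary = sorted(PARENT, key=binary_path)
print(by_text)    # [100, 150, 99, 1000]  ('/1' sorts before '/9')
print(by_binary)  # [99, 1000, 100, 150]  (numeric order at every level)
```

In both cases a parent's path is a prefix of its children's paths, so each parent sorts immediately before its subtree. Only the fixed-width encoding also keeps siblings in numeric order.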
THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
In this case, you can use order by:
select t.*
from t
order by (case when parent = 0 then id else parent end), order desc;
Here is a SQL Fiddle that demonstrates that this works on your original question.

Best Way to Join One Column on Columns From Two Other Tables

I have a schema like the following in Oracle:
Section:
+--------+----------+
| sec_ID | group_ID |
+--------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+----------+
Section_to_Item:
+--------+---------+
| sec_ID | item_ID |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+--------+---------+
Item:
+---------+------+
| item_ID | data |
+---------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+---------+------+
Item_Version:
+---------+----------+--------+
| item_ID | start_ID | end_ID |
+---------+----------+--------+
| 1 | 1 | |
| 2 | 1 | 3 |
| 3 | 2 | |
| 4 | 1 | 2 |
+---------+----------+--------+
Section_to_Item has FK into Section and Item on the *_ID columns.
Item_Version is indexed on item_ID but has no FK to Item.item_ID (ran out of space in the snapshot group).
I have code that receives a list of version IDs and I want to get all items in sections in a given group that are valid for at least one of the versions passed in. If an item has no end_ID, it's valid for anything starting with start_ID. If it has an end_id, it's valid for anything up until (not including) end_ID.
What I currently have is:
SELECT Item.data
FROM Section, Section_to_Item, Item, Item_Version
WHERE Section.group_ID = 1
AND Section_to_Item.sec_ID = Section.sec_ID
AND Item.item_ID = Section_to_Item.item_ID
AND Item.item_ID = Item_Version.item_ID
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_versions.version)
)
Note that the UNION ALL statement is dynamically generated from the list of passed in versions.
This query currently does a Cartesian join and is very slow.
For some reason, if I change the query to join
AND Item_Version.item_ID = Section_to_Item.item_ID
which is not a FK, the query does not do the cartesian join and is much faster.
A) Can anyone explain why this is?
B) Is this the right way to be joining this sequence of tables? (I feel weird about joining Item.item_ID to two different tables.)
C) Is this the right way to get versions between start_ID and end_ID?
Edit
Same query with inner join syntax:
SELECT Item.data
FROM Item
INNER JOIN Section_to_Item ON Section_to_Item.item_ID = Item.item_ID
INNER JOIN Section ON Section.sec_ID = Section_to_Item.sec_ID
INNER JOIN Item_Version ON Item_Version.item_ID = Item.item_ID
WHERE Section.group_ID = 1
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_versions.version)
)
Note that in this case the performance difference comes from joining on Item_Version first and then joining Section_to_Item on Item_Version.item_ID.
In terms of table size, Section_to_Item, Item, and Item_Version should be similar (1000s) while Section should be small.
Edit
I just found out that the schema apparently has no FKs. The FKs specified in the schema configuration files are ignored; they're just there for documentation. So there's no difference between joining on an FK column or not. That said, by changing the joins into a cascade of SELECT ... IN subqueries, I'm able to avoid joining the entire Item table twice. I don't love the resulting query, and I don't really understand the difference, but the stats indicate it's much less work. The A-Rows returned from the innermost scan on Section drop from 656,000 to 488: it used to be 656k starts returning 1 row each, and now it's 488 starts returning 1 row each.
Edit
It turned out to be stale statistics. The two queries were equivalent the whole time, but with the incomplete statistics the optimizer happened to find the better plan only for the second one. After updating statistics, both queries generated the same plan.
I'm not sure if this is the best idea, but this seems to avoid the Cartesian join:
select data
from Item
where item_ID in (
select item_ID
from Item_Version
where item_ID in (
select item_ID
from Section_to_Item
where sec_ID in (
select sec_ID
from Section
where group_ID = 1
)
)
and exists (
select 1
from (
select 2 as version
from dual
union all
select 3 as version
from dual
) versions
where versions.version >= start_ID
and (end_ID is null or versions.version < end_ID)
)
)
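Here is a sketch of the same filter run against the question's sample data, with SQLite standing in for Oracle. That substitution is an assumption: the `WITH ... AS (VALUES ...)` list is SQLite syntax, and in Oracle the `passed_versions` part would keep the question's `SELECT ... FROM DUAL UNION ALL` form. The version list is not concatenated into the SQL text. Instead, one `(?)` placeholder is generated per version and the values are passed as bind parameters. `items_for_versions` is a hypothetical helper name.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Section (sec_ID INTEGER PRIMARY KEY, group_ID INTEGER);
CREATE TABLE Section_to_Item (sec_ID INTEGER, item_ID INTEGER);
CREATE TABLE Item (item_ID INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE Item_Version (item_ID INTEGER, start_ID INTEGER, end_ID INTEGER);
INSERT INTO Section VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO Section_to_Item VALUES (1, 1), (1, 2), (2, 3), (2, 4);
INSERT INTO Item VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO Item_Version VALUES (1, 1, NULL), (2, 1, 3), (3, 2, NULL), (4, 1, 2);
"""


def items_for_versions(conn, group_id, versions):
    """data of items in `group_id` that are valid for at least one version.

    Valid means start_ID <= version and (end_ID IS NULL or version < end_ID).
    """
    if not versions:
        return []
    placeholders = ", ".join("(?)" for _ in versions)
    sql = f"""
        WITH passed_versions(version) AS (VALUES {placeholders})
        SELECT i.data
        FROM Item AS i
        WHERE i.item_ID IN (
            SELECT si.item_ID
            FROM Section_to_Item AS si
            JOIN Section AS s ON s.sec_ID = si.sec_ID
            WHERE s.group_ID = ?
        )
        AND EXISTS (
            SELECT 1
            FROM Item_Version AS iv
            JOIN passed_versions AS pv
              ON iv.start_ID <= pv.version
             AND (iv.end_ID IS NULL OR pv.version < iv.end_ID)
            WHERE iv.item_ID = i.item_ID
        )
        ORDER BY i.item_ID
    """
    # Only "(?)" markers are formatted into the SQL; the values are bound.
    return [data for (data,) in conn.execute(sql, (*versions, group_id))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(items_for_versions(conn, 1, [2, 3]))  # ['a', 'b', 'c']
```

Item 4 is excluded for versions 2 and 3 because its end_ID of 2 makes it valid only for versions below 2.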

SQL Server: Select hierarchically related items from one table

Say, I have an organizational structure that is 5 levels deep:
CEO -> DeptHead -> Supervisor -> Foreman -> Worker
The hierarchy is stored in a table Position like this:
PositionId | PositionCode | ManagerId
1 | CEO | NULL
2 | DEPT01 | 1
3 | DEPT02 | 1
4 | SPRV01 | 2
5 | SPRV02 | 2
6 | SPRV03 | 3
7 | SPRV04 | 3
... | ... | ...
PositionId is uniqueidentifier. ManagerId is the ID of the employee's manager, referencing PositionId in the same table.
I need a SQL query to get the hierarchy tree going down from a position, provided as parameter, including the position itself. I managed to develop this:
-- Select the original position itself
SELECT
'Rank' = 0,
Position.PositionCode
FROM Position
WHERE Position.PositionCode = 'CEO' -- Parameter
-- Select the subordinates
UNION
SELECT DISTINCT
'Rank' =
CASE WHEN Pos2.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos3.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos4.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos5.PositionCode IS NULL THEN 0 ELSE 1
END
END
END
END,
'PositionCode' = RTRIM(ISNULL(Pos5.PositionCode, ISNULL(Pos4.PositionCode, ISNULL(Pos3.PositionCode, Pos2.PositionCode))))
FROM Position Pos1
LEFT JOIN Position Pos2
ON Pos1.PositionId = Pos2.ManagerId
LEFT JOIN Position Pos3
ON Pos2.PositionId = Pos3.ManagerId
LEFT JOIN Position Pos4
ON Pos3.PositionId = Pos4.ManagerId
LEFT JOIN Position Pos5
ON Pos4.PositionId = Pos5.ManagerId
WHERE Pos1.PositionCode = 'CEO' -- Parameter
ORDER BY Rank ASC
It works not only for 'CEO' but for any position, listing its subordinates. For 'CEO' it gives me the following output:
Rank | PositionCode
0 | CEO
... | ...
2 | SPRV55
2 | SPRV68
... | ...
3 | FRMN10
3 | FRMN12
... | ...
4 | WRKR01
4 | WRKR02
4 | WRKR03
4 | WRKR04
My problems are:
The output does not include intermediate nodes - it will only output end nodes, i.e. workers and intermediate managers which have no subordinates. I need all intermediate managers as well.
I have to manually UNION the row with the original position on top of the output. Is there a more elegant way to do this?
I want the output to be sorted in hierarchical tree order: not all DeptHeads, then all Supervisors, then all Foremen, then all Workers, but like this:
Rank | PositionCode
0 | CEO
1 | DEPT01
2 | SPRV01
3 | FRMN01
4 | WRKR01
4 | WRKR02
... | ...
3 | FRMN02
4 | WRKR03
4 | WRKR04
... | ...
Any help would be greatly appreciated.
Try a recursive CTE; I believe the example on TechNet is almost identical to your problem:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
Thanks to everyone who suggested a CTE. I got the following code, and it's working okay:
WITH HierarchyTree (PositionId, PositionCode, Rank)
AS
(
-- Anchor member definition
SELECT PositionId, PositionCode,
0 AS Rank
FROM Position AS e
WHERE PositionCode = 'CEO'
UNION ALL
-- Recursive member definition
SELECT e.PositionId, e.PositionCode,
Rank + 1
FROM Position AS e
INNER JOIN HierarchyTree AS d
ON e.ManagerId = d.PositionId
)
SELECT Rank, PositionCode
FROM HierarchyTree
GO
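The CTE above returns every subordinate, but not in tree order, which was the question's third requirement. The sketch below shows one way to add that. It assumes SQLite as a stand-in for SQL Server and the integer IDs from the sample table. A sort path of zero-padded `PositionId`s is carried down the recursion, and the result is ordered by it. With the `uniqueidentifier` keys the question mentions, zero-padding does not apply, and the path would need some other fixed-width per-level key (not shown). `SortPath` and `subtree` are illustrative names.

```python
import sqlite3

# Sample rows from the question: (PositionId, PositionCode, ManagerId).
ROWS = [
    (1, "CEO", None), (2, "DEPT01", 1), (3, "DEPT02", 1),
    (4, "SPRV01", 2), (5, "SPRV02", 2), (6, "SPRV03", 3), (7, "SPRV04", 3),
]

SUBTREE_SQL = """
WITH RECURSIVE HierarchyTree(PositionId, PositionCode, Rank, SortPath) AS (
    -- Anchor: the position passed in, at Rank 0.
    SELECT PositionId, PositionCode, 0, printf('%010d/', PositionId)
    FROM Position
    WHERE PositionCode = ?
    UNION ALL
    -- Recursive member: direct reports, extending the manager's path.
    SELECT e.PositionId, e.PositionCode, d.Rank + 1,
           d.SortPath || printf('%010d/', e.PositionId)
    FROM Position AS e
    JOIN HierarchyTree AS d ON e.ManagerId = d.PositionId
)
SELECT Rank, PositionCode
FROM HierarchyTree
ORDER BY SortPath
"""


def subtree(conn, position_code):
    """(Rank, PositionCode) rows for a position and its subordinates."""
    return conn.execute(SUBTREE_SQL, (position_code,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Position (PositionId INTEGER PRIMARY KEY, "
    "PositionCode TEXT, ManagerId INTEGER)"
)
conn.executemany("INSERT INTO Position VALUES (?, ?, ?)", ROWS)
print(subtree(conn, "CEO"))
```

The anchor row also covers the question's second point: the starting position comes back first, at Rank 0, without a separate UNION.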
I had a similar problem to yours on a recent project but with a variable recursion length - typically between 1 and 10 levels.
I wanted to simplify the SQL side of things, so I put some extra work into how the recursive elements are stored: each row stores a "hierarchical path" in addition to the direct manager Id.
So a very contrived example:
Employee
Id | JobDescription | Hierarchy | ManagerId
1 | DIRECTOR | 1\ | NULL
2 | MANAGER 1 | 1\2\ | 1
3 | MANAGER 2 | 1\3\ | 1
4 | SUPERVISOR 1 | 1\2\4\ | 2
5 | SUPERVISOR 2 | 1\3\5\ | 3
6 | EMPLOYEE 1 | 1\2\4\6\ | 4
7 | EMPLOYEE 2 | 1\3\5\7\ | 5
This means you can very quickly query any level of the tree and get all descendants by using a LIKE query on the Hierarchy column.
For example
SELECT * FROM dbo.Employee WHERE Hierarchy LIKE '1\2\%'
would return
MANAGER 1
SUPERVISOR 1
EMPLOYEE 1
Additionally you can also easily get one level of the tree by using the ManagerId column.
The downside to this approach is you have to construct the hierarchy when inserting or updating records but believe me when I say this storage structure saved me a lot of pain later on without the need for unnecessary query complexity.
One thing to note is that my approach gives you the raw data - I then parse the result set into a recursive strongly typed structure in my services layer. As a rule I don't tend to format output in SQL.
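Below is a small sketch of the stored-path idea. It assumes SQLite via Python's `sqlite3` module as a stand-in for SQL Server, and it assumes every Hierarchy value ends with a trailing `\`, so that a path such as `1\2\4\` can never be a prefix of `1\2\45\`. `add_employee` is a hypothetical helper that builds the path from the manager's path at insert time; that is the extra work the answer mentions. `descendants` is a hypothetical helper that runs the LIKE prefix query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER PRIMARY KEY, JobDescription TEXT, "
    "Hierarchy TEXT, ManagerId INTEGER)"
)


def add_employee(conn, emp_id, job, manager_id=None):
    """Insert a row, deriving Hierarchy as <manager's Hierarchy><Id>\\."""
    prefix = ""
    if manager_id is not None:
        (prefix,) = conn.execute(
            "SELECT Hierarchy FROM Employee WHERE Id = ?", (manager_id,)
        ).fetchone()
    conn.execute(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        (emp_id, job, f"{prefix}{emp_id}\\", manager_id),
    )


def descendants(conn, emp_id):
    """The row itself plus all its descendants, via one LIKE on the path."""
    (path,) = conn.execute(
        "SELECT Hierarchy FROM Employee WHERE Id = ?", (emp_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT JobDescription FROM Employee WHERE Hierarchy LIKE ? "
        "ORDER BY Hierarchy",
        (path + "%",),
    )
    return [job for (job,) in rows]


for emp in [(1, "DIRECTOR"), (2, "MANAGER 1", 1), (3, "MANAGER 2", 1),
            (4, "SUPERVISOR 1", 2), (5, "SUPERVISOR 2", 3),
            (6, "EMPLOYEE 1", 4), (7, "EMPLOYEE 2", 5)]:
    add_employee(conn, *emp)

print(descendants(conn, 2))  # ['MANAGER 1', 'SUPERVISOR 1', 'EMPLOYEE 1']
```

SQLite's LIKE has no escape character unless you declare one, so the backslashes in the pattern are matched literally. `%` and `_` are still wildcards, so IDs must not contain them.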

Value in one field as lookup from same table

I'm certain this is very easy, but I am very poor at database stuff...
I have the following table in Access 2003:
title | id
/root | 1
/root/x | 2
/root/x/y | 3
/root/x/y/z | 4
/root/x/a | 5
/root/x/a/b | 6
i.e. a bunch of nodes and ID numbers; you can see that /root/x is the parent of /root/x/y. I'd like to create another table that lists all the nodes along with the IDs of their parents, i.e.:
id | parent id
1 | -
2 | 1
3 | 2
4 | 3
5 | 2
6 | 5
The following gives me the id and the value of the parent:
select c.id, left(c.title, instrrev(c.title, "/")-1) as parentValue from nodeIDs c
yields
id | parentNode
1 |
2 | /root
3 | /root/x
4 | /root/x/y
5 | /root/x
6 | /root/x/a
What is the extra step needed to return the IDs of those parent nodes rather than their values, i.e. return '1' instead of '/root' in that last table?
Many thanks
Something like this perhaps:
select c.id,
left(c.title, instrrev(c.title, "/")-1) as parentValue
, p.id as parentID
from nodeIDs c
left join
nodeIDs p
on left(c.title, instrrev(c.title, "/")-1) = p.title
Something along these lines, I think.
select t1.id,
left(t1.title, instrrev(t1.title, "/")-1) as parentNode,
t2.id as parentID
from nodeIDs t1
inner join nodeIDs t2 on (left(t1.title, instrrev(t1.title, "/")-1)) = t2.title
I don't have any easy way to test this. But the basic idea is that, having derived the title of the parent node, you can do an inner join on it to get the associated id number.
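Both answers rely on the same idea: derive the parent's title, then look up its ID by title. That idea can be checked outside Access. The following is a Python sketch, not Access SQL. `str.rpartition` plays the role of `left(title, instrrev(title, "/") - 1)`, and a dictionary keyed by title stands in for the self-join. As in the LEFT JOIN version, a node whose derived parent title matches nothing (here `/root`) gets no parent ID. `parent_ids` is an illustrative name.

```python
# Titles and IDs from the question's table.
NODES = {
    "/root": 1, "/root/x": 2, "/root/x/y": 3,
    "/root/x/y/z": 4, "/root/x/a": 5, "/root/x/a/b": 6,
}


def parent_ids(nodes):
    """Map each node ID to its parent's ID (None when no parent is found)."""
    result = {}
    for title, node_id in nodes.items():
        parent_title, _, _ = title.rpartition("/")  # text before the last '/'
        result[node_id] = nodes.get(parent_title)   # the "join" on title
    return result


print(parent_ids(NODES))  # {1: None, 2: 1, 3: 2, 4: 3, 5: 2, 6: 5}
```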