Select next(sibling) and previous(sibling) from a SQLite DB tree representation

I've got the following sample tree structure in an SQLite DB:
id | idparent | order
1  | -1       | 0
2  | 1        | 0
3  | 2        | 0
4  | 1        | 1
Which represents the following tree:
1
|- 2
| |- 3
|- 4
I would like to create a select statement that returns, for every node, its next node, previous node, next sibling and previous sibling. For the sample above, such a statement would return:
id | idparent | next | previous | nextsibling | previoussibling
1  | -1       | 2    | NULL     | NULL        | NULL
2  | 1        | 3    | 1        | 4           | NULL
3  | 2        | 4    | 2        | NULL        | NULL
4  | 1        | NULL | 3        | NULL        | 2
I can get the nextsibling and previoussibling, but I'm stuck on next and previous:
SELECT node.id, node.idparent,
??? AS next,
??? AS previous,
(SELECT id FROM nodes WHERE idparent = node.idparent AND `order` > node.`order` ORDER BY `order` ASC LIMIT 1) AS nextsibling,
(SELECT id FROM nodes WHERE idparent = node.idparent AND `order` < node.`order` ORDER BY `order` DESC LIMIT 1) AS previoussibling
FROM nodes node;
I guess I need to use the WHERE NOT EXISTS clause but I can't figure out how I can achieve that. I should mention that changing the DB structure is not an option.
Thanks in advance for any help.

Your schema (what's called the "adjacency list model") is rather limited in terms of which operations it supports. Instead, try the nested set model: store bounds for each node rather than each node's parent. A node descends from every node whose bounds contain its own. The bounds also give the depth-first traversal of the tree, where the lower bound gives when the node is entered and the upper bound gives when the node is exited. Sorting the nodes by the left bound thus gives a pre-order traversal, and sorting by the right bound gives a post-order traversal.
CREATE TABLE `hier` (
`id` int(11) PRIMARY KEY AUTO_INCREMENT,
`left` int(11) NOT NULL,
`right` int(11) NOT NULL,
`data` varchar(128),
INDEX `bounds` (`left`,`right`),
INDEX `sdnuob` (`right`, `left`)
);
INSERT INTO HIER (id, `left`, `right`, data)
VALUES
(1, 1, 8, 'foo'),
(2, 2, 5, 'mane'),
(3, 3, 4, 'padme'),
(4, 6, 7, 'hum')
;
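The bounds in the INSERT above do not have to be assigned by hand. As a rough sketch (Python is my choice here, and the function name nested_set_bounds is made up for illustration), they can be derived from the question's id / idparent / order rows by numbering a depth-first walk, assuming the tree is shallow enough for plain recursion:

```python
from collections import defaultdict

# Rows from the question's table: (id, idparent, order); -1 marks "no parent".
ROWS = [(1, -1, 0), (2, 1, 0), (3, 2, 0), (4, 1, 1)]


def nested_set_bounds(rows, root_marker=-1):
    """Return {id: (left, right)} by numbering a depth-first walk."""
    children = defaultdict(list)
    for node_id, parent_id, order in rows:
        children[parent_id].append((order, node_id))

    bounds = {}
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        left = counter  # left bound: assigned when the node is entered
        for _, child_id in sorted(children[node_id]):
            visit(child_id)
        counter += 1
        bounds[node_id] = (left, counter)  # right bound: assigned on exit

    for _, root_id in sorted(children[root_marker]):
        visit(root_id)
    return bounds


if __name__ == "__main__":
    for node_id, (left, right) in sorted(nested_set_bounds(ROWS).items()):
        print(node_id, left, right)
```

For the sample tree this reproduces the (left, right) pairs used in the INSERT statement.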
SELECT h.id AS node,
p.id AS prev, p.`left` AS p_l, n.id AS `next`, n.`left` AS n_l,
ps.id AS prevSibling, ns.id AS nextSibling
FROM hier AS h
LEFT JOIN hier AS p ON h.`left` > p.`left`
LEFT JOIN hier AS pb ON h.`left` > pb.`left`
LEFT JOIN hier AS n ON h.`left`< n.`left`
LEFT JOIN hier AS nb ON h.`left`< nb.`left`
LEFT JOIN hier AS ps ON h.`left` = ps.`right`+1
LEFT JOIN hier AS ns ON h.`right`= ns.`left`-1
GROUP BY node, prevSibling, nextSibling, p.`left`, n.`left`
HAVING (p.`left` IS NULL OR p.`left` = MAX(pb.`left`))
AND (n.`left` IS NULL OR n.`left` = MIN(nb.`left`))
;
Result:
+------+------+------+------+------+-------------+-------------+
| node | prev | p_l | next | n_l | prevSibling | nextSibling |
+------+------+------+------+------+-------------+-------------+
| 1 | NULL | NULL | 2 | 2 | NULL | NULL |
| 2 | 1 | 1 | 3 | 3 | NULL | 4 |
| 3 | 2 | 2 | 4 | 6 | NULL | NULL |
| 4 | 3 | 3 | NULL | NULL | 2 | NULL |
+------+------+------+------+------+-------------+-------------+
If you really need to find a node's parent (or depth), you can use a view, or use the technique applied in the view to a query:
CREATE VIEW hier_depth AS
SELECT c.*, p.id AS parent, p.`left` AS p_left, COUNT(a.id) AS depth
FROM hier AS c
LEFT JOIN hier AS p ON p.`left` < c.`left` AND c.`right` < p.`right`
LEFT JOIN hier AS a ON a.`left` < c.`left` AND c.`right` < a.`right`
GROUP BY c.id, parent
HAVING p.`left` IS NULL OR p.`left` = MAX(a.`left`)
;

I don't think your schema supports a next query. IIUC, you might need to go up multiple levels to determine the next node.
I recommend adding a path column, which takes colon-separated paths as values, such as 1:2:3 or 1:4. The next node will then be the next one in path order.
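To illustrate the path idea without touching the schema, here is a small sketch in Python (the function name next_and_previous is hypothetical). It builds each node's path in memory from the question's rows and compares paths as tuples of (order, id) pairs rather than as colon-separated strings. That is my own variation: a plain string comparison would sort "1:10" before "1:2", and sorting by id alone would ignore the order column.

```python
# Rows from the question's table: (id, idparent, order); -1 marks "no parent".
ROWS = [(1, -1, 0), (2, 1, 0), (3, 2, 0), (4, 1, 1)]


def next_and_previous(rows):
    """Return {id: (next_id, previous_id)} in pre-order (path order)."""
    parent = {node_id: parent_id for node_id, parent_id, _ in rows}
    order = {node_id: sibling_order for node_id, _, sibling_order in rows}

    def path(node_id):
        # Collect (order, id) keys from the node up to the root, then reverse
        # so the root comes first. The walk ends at the -1 root marker.
        keys = []
        while node_id in parent:
            keys.append((order[node_id], node_id))
            node_id = parent[node_id]
        return tuple(reversed(keys))

    ordered = sorted(parent, key=path)
    result = {}
    for index, node_id in enumerate(ordered):
        previous_id = ordered[index - 1] if index > 0 else None
        next_id = ordered[index + 1] if index + 1 < len(ordered) else None
        result[node_id] = (next_id, previous_id)
    return result


if __name__ == "__main__":
    for node_id, (next_id, previous_id) in sorted(next_and_previous(ROWS).items()):
        print(node_id, next_id, previous_id)
```

A parent's path is a prefix of its children's paths, so the parent always sorts first, which is what makes path order a pre-order traversal.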

Related

PostgreSQL finding nearest parent value of a certain level

*I apologize: my tables displayed correctly while I was writing this question, but the formatting looks off after publishing. I'm trying to fix that now.
I am trying to write a query in PostgreSQL that returns, for any given child value, the nearest parent value that has reached a certain rank. Currently, I have this query, which displays the entire hierarchical path for any given child value:
WITH RECURSIVE tree AS (
SELECT "ChildDisplayID",
"ParentID",
"Rank",
1 as level
FROM table1
WHERE "ChildDisplayID" = {{some ChildID}}
UNION ALL
SELECT t1."ChildDisplayID",
t1."ParentID",
t1."Rank",
t.level + 1
FROM table1 t1
JOIN tree t ON t."ParentID" = t1."ChildDisplayID"
)
SELECT *
FROM tree
What I want is a single row that displays the child ID and the ID of the nearest parent whose rank is "Partner". For example, here is the output I am currently getting:
| ChildID | ParentID | Rank              | Level |
|---------|----------|-------------------|-------|
| 6       | 5        | Associate Manager | 1     |
| 5       | 4        | Manager           | 2     |
| 4       | 3        | Associate Partner | 3     |
| 3       | 2        | Partner           | 4     |
| 2       | 1        | Partner           | 5     |
| 1       |          | CEO               | 6     |
Here is the output I want:
| ChildID | Nearest Partner | Rank    |
|---------|-----------------|---------|
| 6       | 3               | Partner |
What is the best way to do this?
You can put a stop condition on the first matching partner in the recursion, then filter the result:
WITH RECURSIVE tree AS (
SELECT "ChildDisplayID" as initialid, "ChildDisplayID", "ParentID", "Rank", 1 as level
FROM table1
WHERE "ChildDisplayID" = {{some ChildID}}
UNION ALL
SELECT t.initialid, t1."ChildDisplayID", t1."ParentID", t1."Rank", t.level + 1
FROM table1 t1
INNER JOIN tree t ON t."ParentID" = t1."ChildDisplayID"
WHERE t."Rank" <> 'Partner'
)
SELECT *
FROM tree
WHERE "Rank" = 'Partner'
It seems like you have a hierarchy where each child has just one parent, so there should be only one match, or no match at all.
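The same stop-at-first-match walk can be sketched outside the database. The Python snippet below is only an illustration: the dictionary TABLE1 mirrors the sample output from the question, and the function name nearest_with_rank is made up. Like the query above, it returns the starting row itself if that row already has the wanted rank.

```python
# Sample rows from the question: ChildDisplayID -> (ParentID, Rank).
TABLE1 = {
    6: (5, "Associate Manager"),
    5: (4, "Manager"),
    4: (3, "Associate Partner"),
    3: (2, "Partner"),
    2: (1, "Partner"),
    1: (None, "CEO"),
}


def nearest_with_rank(table, child_id, rank="Partner"):
    """Walk up the parent chain and return the first ID whose rank matches."""
    current = child_id
    seen = set()  # guards against accidental cycles in the data
    while current is not None and current in table and current not in seen:
        seen.add(current)
        parent_id, current_rank = table[current]
        if current_rank == rank:
            return current  # stop condition: first match ends the walk
        current = parent_id
    return None  # reached the top without finding the rank


if __name__ == "__main__":
    print(nearest_with_rank(TABLE1, 6))
```

The comparison is an exact string match, so "Associate Partner" does not count as "Partner", the same as with the `<>` test in the recursive term.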

Find best match in tree given a combination of multiple keys

I have a structure / tree that looks similar to this.
CostType is mandatory and can exist by itself, but it can have a parent ProfitType or Unit and other CostTypes as children.
Only Units can be duplicated. The other values cannot appear more than once in the structure.
| ID | name    | parent_id | ProfitType | CostType | Unit |
| -: | ------- | --------: | ---------: | -------: | ---: |
| 1  | Root    | (NULL)    |            |          |      |
| 2  | 1       | 1         | 300        |          |      |
| 3  | 1-1     | 2         |            | 111      |      |
| 4  | 1-1-1   | 3         |            |          | 8    |
| 5  | 1-2     | 2         |            | 222      |      |
| 6  | 1-2-1   | 5         |            | 333      |      |
| 7  | 1-2-1-1 | 6         |            |          | 8    |
| 8  | 1-2-1-2 | 6         |            |          | 9    |
| Parameters      | should RETURN  |
| --------------- | -------------- |
| (300,111,8)     | 4              |
| (null,111,8)    | 4              |
| (null,null,8)   | first match, 4 |
| (null,222,8)    | best match, 5  |
| (null,333,null) | 6              |
I am at a loss on how I could create a function that receives (ProfitType, CostType, Unit) and return the best matching ID from the structure.
This isn't giving exactly the answers you provided as example, but see my comment above - if (null,222,8) should be 7 to match how (null,333,8) returns 4 then this is correct.
Also note that I wrote this using temp tables instead of as a function, because I don't want to trip a schema change audit. I can rewrite it as a function Monday when my DBA is available, but I thought you might need it before the weekend. Just edit the three DECLARE lines to the values you want to test.
I also put in quite a few comments because the logic is tricky, but if they aren't enough leave a comment and I can expand my explanation
/*
ASSUMPTIONS:
A tree can be of arbitrary depth, but will not exceed the recursion limit (defaults to 100)
All trees will include at least 1 CostType
All trees will have at most 1 ProfitType
CostType can appear multiple times in a traversal from root to leaf (can units?)
*/
SELECT *
INTO #Temp
FROM (VALUES (1,'Root',NULL, NULL, NULL, NULL)
, (2,'1', 1, 300, NULL, NULL)
, (3,'1-1', 2, NULL, 111, NULL)
, (4,'1-1-1', 3, NULL, NULL, 8)
, (5,'1-2', 2, NULL, 222, NULL)
, (6,'1-2-1', 5, NULL, 333, NULL)
, (7,'1-2-1-1', 6, NULL, NULL, 8)
, (8,'1-2-1-2', 6, NULL, NULL, 9)
) as TempTable(ID, RName, Parent_ID, ProfitType, CostType, UnitID)
--SELECT * FROM #Temp
DECLARE @ProfitType int = NULL--300
DECLARE @CostType INT = 333 --NULL --111
DECLARE @UnitID INT = NULL--8
--SELECT * FROM #Temp
;WITH cteMatches as (
--Start with all nodes that match one criteria, default a score of 100
SELECT N.ID as ReportID, *, 100 as Score, 1 as Depth
FROM #Temp AS N
WHERE N.CostType= @CostType OR N.ProfitType=@ProfitType OR N.UnitID = @UnitID
), cteEval as (
--This is a recursive CTE, it has a (default) limit of 100 recursions
--, but that can be raised if your trees are deeper than 100 nodes
--Start with the base case
SELECT M.ReportID, M.RName, M.ID ,M.Parent_ID, M.Score
, M.Depth, M.ProfitType , M.CostType , M.UnitID
FROM cteMatches as M
UNION ALL
--This is the recursive part, add to the list of matches the match when
--its immediate parent is also considered. For that match increase the score
--if the parent contributes another match. Also update the ID of the match
--to the parent's IDs so recursion can keep adding if more matches are found
SELECT M.ReportID, M.RName, N.ID ,N.Parent_ID
, M.Score + CASE WHEN N.CostType= @CostType
OR N.ProfitType=@ProfitType
OR N.UnitID = @UnitID THEN 100 ELSE 0 END as Score
, M.Depth + 1, N.ProfitType , N.CostType , N.UnitID
FROM cteEval as M INNER JOIN #Temp AS N on M.Parent_ID = N.ID
)SELECT TOP 1 * --Drop the "TOP 1 *" to see debugging info (runners up)
FROM cteEval
ORDER BY SCORE DESC, DEPTH
DROP TABLE #Temp
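For readers who want to trace the scoring logic outside SQL Server, here is a rough Python sketch of the same idea: every node that matches at least one parameter starts with a score of 100, and each matching ancestor adds another 100. The names NODES, node_matches and best_match are made up for this illustration. Ties are broken by the lowest starting ID, which is an assumption of mine; TOP 1 in the query above leaves ties unspecified.

```python
# id: (parent_id, profit_type, cost_type, unit_id), copied from the sample data.
NODES = {
    1: (None, None, None, None),
    2: (1, 300, None, None),
    3: (2, None, 111, None),
    4: (3, None, None, 8),
    5: (2, None, 222, None),
    6: (5, None, 333, None),
    7: (6, None, None, 8),
    8: (6, None, None, 9),
}


def node_matches(node, profit_type, cost_type, unit_id):
    """True if the node equals any supplied parameter (None never matches)."""
    _, node_profit, node_cost, node_unit = node
    return (
        (profit_type is not None and node_profit == profit_type)
        or (cost_type is not None and node_cost == cost_type)
        or (unit_id is not None and node_unit == unit_id)
    )


def best_match(nodes, profit_type=None, cost_type=None, unit_id=None):
    """Return the ID of the best-scoring starting node, or None."""
    candidates = []  # (-score, depth, start_id), so min() picks the best
    for start_id, node in nodes.items():
        if not node_matches(node, profit_type, cost_type, unit_id):
            continue
        score, depth, current = 100, 1, node
        candidates.append((-score, depth, start_id))
        while current[0] is not None:  # climb towards the root
            current = nodes[current[0]]
            depth += 1
            if node_matches(current, profit_type, cost_type, unit_id):
                score += 100
            candidates.append((-score, depth, start_id))
    return min(candidates)[2] if candidates else None


if __name__ == "__main__":
    print(best_match(NODES, 300, 111, 8))
    print(best_match(NODES, None, 222, 8))
```

With the sample data, (null,222,8) resolves to 7 rather than 5 under this scoring, which is the difference from the question's table that the answer mentions.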
I'm sorry I don't have enough rep to comment.
You'll have to define "best answer" (for example, why isn't the answer to null,222,8 7 or null instead of 5?), but here's the approach I'd use:
Derive a new table where ProfitType and CostType are listed explicitly instead of only by inheritance. I would approach that by using a cursor (how awful, I know) and following the parent_id until a ProfitType and CostType is found -- or the root is reached. This presumes an unlimited amount of child/grandchild levels for parent_id. If there is a limit, then you can instead use N self joins where N is the number of parent_id levels allowed.
Then you run multiple queries against the derived table. The first query would be for an exact match (and then exit if found). The next query would be for the "best" partial match (then exit if found), followed by queries for 2nd best, 3rd best, etc. until you've exhausted your "best" match criteria.
If you need nested parent CostTypes to be part of the "best match" criteria, then I would make duplicate entries in the derived table for each row that has multiple CostTypes with a CostType "level". level 1 is the actual CostType. level 2 is that CostType's parent, level 3 etc. Then your best match queries would return multiple rows and you'd need to pick the row with the lowest level (which is the closest parent/grandparent).

Recursive SELECT with stop condition in SQL?

My table called element looks like this:
id | successor | important
----------------------------
1 | NULL | 0
2 | 4 | 1
3 | 5 | 0
4 | 8 | 0
5 | 6 | 1
6 | 7 | 0
7 | NULL | 0
8 | 10 | 1
9 | 10 | 0
10 | NULL | 0
I start with an element’s ID. Each element may or may not have a succeeding element. So given any element ID I may build a chain of elements from 0..n elements depending on its successors and successor-successors, and so on.
Let’s say my starting ID is 2. This results in the following chain:
2 -> 4 -> 8 -> 10
Now I want to ask this question: Does a specific element chain contain at least one element where important == 1?
In pseudo-code, a function that does this without unnecessary checks might look like this:
boolean chainIsImportant(element)
{
if (element.important == 1) {
return true;
}
if (element.successor != NULL) {
return chainIsImportant(element.successor);
}
return false;
}
I guess this can be realized with WITH RECURSIVE, right? How can I stop recursion, once an element with important == 1 was found?
This is typically done by aggregating the columns in question and adding a condition on the join in the recursive part of the CTE:
with recursive all_elements as (
select id, successor, important, array[important] as important_elements
from element
where successor is null
union all
select c.id, c.successor, c.important, p.important_elements||c.important
from element c
join all_elements p on c.successor = p.id
where 1 <> all(p.important_elements)
)
select *
from all_elements;
Note that the condition is "flipped" because the where clause defines those rows that should be included.
CREATE TABLE booltree
( id INTEGER NOT NULL PRIMARY KEY
, successor INTEGER REFERENCES booltree(id)
, important Boolean NOT NULL
);
INSERT INTO booltree(id , successor , important) VALUES
( 1, NULL , False)
,(2, 4 , True)
,(3, 5 , False)
,(4, 8 , False)
,(5, 6 , True)
,(6, 7 , False)
,(7, NULL , False)
,(8, 10 , True)
,(9, 10 , False)
,(10, NULL , False)
;
-- SELECT * FROM booltree;
WITH RECURSIVE rec AS (
SELECT id, important
FROM booltree
WHERE successor IS NULL
UNION ALL
SELECT bt.id, GREATEST(rec.important, bt.important) AS important
FROM booltree bt
JOIN rec ON bt.successor = rec.id
)
SELECT id, important
FROM rec
ORDER BY important, id;
Result:
CREATE TABLE
INSERT 0 10
id | important
----+-----------
1 | f
6 | f
7 | f
9 | f
10 | f
2 | t
3 | t
4 | t
5 | t
8 | t
(10 rows)
Note: IMHO the recursion cannot be stopped once a True importance is found (basically, because LEFT JOINS are not allowed in RECURSIVE UNIONS)
But if you are looking for exactly one given id (or a set of them) then maybe you could use that as the start condition, and search the tree upwards.
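As a sketch of that last suggestion, the snippet below starts the recursion at one given id and only follows a successor while the row just reached is not important. It uses SQLite through Python's sqlite3 module purely so that it can be run without a server; I would expect the same CTE shape to work in PostgreSQL, but that is an assumption I have not verified here. The helper names make_db and chain_is_important are made up.

```python
import sqlite3

# (id, successor, important) rows from the question.
ROWS = [
    (1, None, 0), (2, 4, 1), (3, 5, 0), (4, 8, 0), (5, 6, 1),
    (6, 7, 0), (7, None, 0), (8, 10, 1), (9, 10, 0), (10, None, 0),
]

QUERY = """
WITH RECURSIVE chain(id, successor, important) AS (
    SELECT id, successor, important FROM element WHERE id = ?
    UNION ALL
    SELECT e.id, e.successor, e.important
    FROM element AS e
    JOIN chain AS c ON e.id = c.successor
    WHERE c.important = 0  -- do not extend the chain past an important row
)
SELECT id, important FROM chain
"""


def make_db():
    """Create an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE element ("
        "id INTEGER PRIMARY KEY, successor INTEGER, important INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO element VALUES (?, ?, ?)", ROWS)
    return conn


def chain_is_important(conn, start_id):
    """Return (is_important, visited_ids) for the chain starting at start_id."""
    rows = conn.execute(QUERY, (start_id,)).fetchall()
    visited = [row_id for row_id, _ in rows]
    return any(important for _, important in rows), visited


if __name__ == "__main__":
    connection = make_db()
    print(chain_is_important(connection, 4))
    print(chain_is_important(connection, 9))
```

Starting at 4, the walk visits only 4 and 8 and never reaches 10, because 8 is already important.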

PostgreSQL 9.3 - Compare two sets of data without duplicating values in first set

I have a group of tables that define some rules that need to be followed, for example:
CREATE TABLE foo.subrules (
subruleid SERIAL PRIMARY KEY,
ruleid INTEGER REFERENCES foo.rules(ruleid),
subrule INTEGER,
barid INTEGER REFERENCES foo.bars(barid)
);
INSERT INTO foo.subrules(ruleid,subrule,barid) VALUES
(1,1,1),
(1,1,2),
(1,2,2),
(1,2,3),
(1,2,4),
(1,3,3),
(1,3,4),
(1,3,5),
(1,3,6),
(1,3,7);
What this is defining is a set of "subrules" that need to be satisfied... if all "subrules" are satisfied then the rule is also satisfied.
In the above example, "subrule" 1 can be satisfied with a "barid" value of 1 or 2.
Additionally, "subrule" 2 can be satisfied with a "barid" value of 2, 3, or 4.
Likewise, "subrule" 3 can be satisfied with a "barid" value of 3, 4, 5, 6, or 7.
I also have a data set that looks like this:
primarykey | resource | barid
------------|------------|------------
1 | A | 1
2 | B | 2
3 | C | 8
The tricky part is that once a "subrule" is satisfied with a "resource", that "resource" can't satisfy any other "subrule" (even if the same "barid" would satisfy the other "subrule")
So, what I need is to evaluate and return the following results:
ruleid | subrule | barid | primarykey | resource
------------|------------|------------|------------|------------
1 | 1 | 1 | 1 | A
1 | 1 | 2 | NULL | NULL
1 | 2 | 2 | 2 | B
1 | 2 | 3 | NULL | NULL
1 | 2 | 4 | NULL | NULL
1 | 3 | 3 | NULL | NULL
1 | 3 | 4 | NULL | NULL
1 | 3 | 5 | NULL | NULL
1 | 3 | 6 | NULL | NULL
1 | 3 | 7 | NULL | NULL
NULL | NULL | NULL | 3 | C
Interestingly, if "primarykey" 3 had a "barid" value of 2 (instead of 8) the results would be identical.
I have tried several methods including a plpgsql function that performs a grouping by "subruleid" with ARRAY_AGG(barid) and building an array from barid and checking if each element in the barid array is in the "subruleid" group via a loop, but it just doesn't feel right.
Is a more elegant or efficient option available?
The following fragment finds solutions, if there are any. The number three (resources) is hardcoded. If only one solution is needed some symmetry-breaker should be added.
If the number of resources is not bounded, I think there could be a solution by enumerating all possible tableaux (Hilbert? mixed-radix?), and selecting from them after pruning the non-satisfying ones.
-- the data
CREATE TABLE subrules
( subruleid SERIAL PRIMARY KEY
, ruleid INTEGER -- REFERENCES foo.rules(ruleid),
, subrule INTEGER
, barid INTEGER -- REFERENCES foo.bars(barid)
);
INSERT INTO subrules(ruleid,subrule,barid) VALUES
(1,1,1), (1,1,2),
(1,2,2), (1,2,3), (1,2,4),
(1,3,3), (1,3,4), (1,3,5), (1,3,6), (1,3,7);
CREATE TABLE resources
( primarykey INTEGER NOT NULL PRIMARY KEY
, resrc varchar
, barid INTEGER NOT NULL
);
INSERT INTO resources(primarykey,resrc,barid) VALUES
(1, 'A', 1) ,(2, 'B', 2) ,(3, 'C', 8)
-- ################################
-- uncomment next line to find a (two!) solution(s)
-- ,(4, 'D', 7)
;
-- all matching pairs of subrules <--> resources
WITH pairs AS (
SELECT sr.subruleid, sr.ruleid, sr.subrule, sr.barid
, re.primarykey, re.resrc
FROM subrules sr
JOIN resources re ON re.barid = sr.barid
)
SELECT
p1.ruleid AS ru1 , p1.subrule AS sr1 , p1.resrc AS one
, p2.ruleid AS ru2 , p2.subrule AS sr2 , p2.resrc AS two
, p3.ruleid AS ru3 , p3.subrule AS sr3 , p3.resrc AS three
-- self-join the pairs, excluding the ones that
-- use the same subrule or resource
FROM pairs p1
JOIN pairs p2 ON p2.primarykey > p1.primarykey -- tie-breaker
JOIN pairs p3 ON p3.primarykey > p2.primarykey -- tie breaker
WHERE 1=1
AND p2.subruleid <> p1.subruleid
AND p2.subruleid <> p3.subruleid
AND p3.subruleid <> p1.subruleid
;
Result (after uncommenting the line with missing resource) :
ru1 | sr1 | one | ru2 | sr2 | two | ru3 | sr3 | three
-----+-----+-----+-----+-----+-----+-----+-----+-------
1 | 1 | A | 1 | 1 | B | 1 | 3 | D
1 | 1 | A | 1 | 2 | B | 1 | 3 | D
(2 rows)
The resources {A,B,C} could of course be hard-coded, but that would prevent the 'D' record (or any other) from serving as the missing link.
Since you are not clarifying the question, I am going with my own assumptions.
subrule numbers are ascending without gaps for each rule.
(subrule, barid) is UNIQUE in table subrules.
If there are multiple resources for the same barid, assignments are arbitrary among these peers.
As commented, the number of resources matches the number of subrules (which has no effect on my suggested solution).
The algorithm is as follows:
1. Pick the subrule with the smallest subrule number.
2. Assign a resource to the lowest barid possible (the first that has a matching resource), which consumes the resource.
3. After the first resource is matched, skip to the next higher subrule and repeat 2.
4. Append all remaining resources after the last subrule.
You can implement this with pure SQL using a recursive CTE:
WITH RECURSIVE cte AS ((
SELECT s.*, r.resourceid, r.resource
, CASE WHEN r.resourceid IS NULL THEN '{}'::int[]
ELSE ARRAY[r.resourceid] END AS consumed
FROM subrules s
LEFT JOIN resource r USING (barid)
WHERE s.ruleid = 1
ORDER BY s.subrule, r.barid, s.barid
LIMIT 1
)
UNION ALL (
SELECT s.*, r.resourceid, r.resource
, CASE WHEN r.resourceid IS NULL THEN c.consumed
ELSE c.consumed || r.resourceid END
FROM cte c
JOIN subrules s ON s.subrule = c.subrule + 1
LEFT JOIN resource r ON r.barid = s.barid
AND r.resourceid <> ALL (c.consumed)
ORDER BY r.barid, s.barid
LIMIT 1
))
SELECT ruleid, subrule, barid, resourceid, resource FROM cte
UNION ALL -- add unused rules
SELECT s.ruleid, s.subrule, s.barid, NULL, NULL
FROM subrules s
LEFT JOIN cte c USING (subruleid)
WHERE c.subruleid IS NULL
UNION ALL -- add unused resources
SELECT NULL, NULL, r.barid, r.resourceid, r.resource
FROM resource r
LEFT JOIN cte c USING (resourceid)
WHERE c.resourceid IS NULL
ORDER BY subrule, barid, resourceid;
Returns exactly the result you have been asking for.
Explain
It's basically an implementation of the algorithm laid out above.
Only take a single match on a single barid per subrule. Hence the LIMIT 1, which requires additional parentheses:
Sum results of a few queries and then find top 5 in SQL
Collect "consumed" resources in the array consumed and exclude them from repeated assignment with r.resourceid <> ALL (c.consumed). Note in particular how I avoid NULL values in the array, which would break the test.
The CTE only returns matched rows. Add rules and resources without match in the outer SELECT to get the complete result.
Or you open two cursors on the tables subrule and resource and implement the algorithm with any decent programming language (including PL/pgSQL).
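As an illustration of that procedural route, here is a minimal Python sketch of the greedy algorithm described above, using plain lists instead of cursors. The function name assign_resources is made up, and picking the lowest primarykey among resources with the same barid is my own tie-break, since the answer calls that choice arbitrary.

```python
# (ruleid, subrule, barid) rows from the question.
SUBRULES = [
    (1, 1, 1), (1, 1, 2),
    (1, 2, 2), (1, 2, 3), (1, 2, 4),
    (1, 3, 3), (1, 3, 4), (1, 3, 5), (1, 3, 6), (1, 3, 7),
]
# (primarykey, resource, barid) rows from the question.
RESOURCES = [(1, "A", 1), (2, "B", 2), (3, "C", 8)]


def assign_resources(subrules, resources, ruleid=1):
    """Greedily give each subrule at most one not-yet-consumed resource.

    Returns (assigned, unused): assigned maps (subrule, barid) to
    (primarykey, resource); unused lists the resources nobody consumed.
    """
    barids_by_subrule = {}
    for rule, subrule, barid in subrules:
        if rule == ruleid:
            barids_by_subrule.setdefault(subrule, []).append(barid)

    consumed = set()
    assigned = {}
    for subrule in sorted(barids_by_subrule):  # smallest subrule first
        for barid in sorted(barids_by_subrule[subrule]):  # lowest barid first
            free = [
                res for res in sorted(resources)
                if res[2] == barid and res[0] not in consumed
            ]
            if free:
                primarykey, name, _ = free[0]
                consumed.add(primarykey)  # a resource can be used only once
                assigned[(subrule, barid)] = (primarykey, name)
                break  # one match per subrule, then move to the next subrule
    unused = [res for res in resources if res[0] not in consumed]
    return assigned, unused


if __name__ == "__main__":
    print(assign_resources(SUBRULES, RESOURCES))
```

A subrule without any free matching resource is simply skipped, which corresponds to the LEFT JOIN keeping the recursion going in the CTE.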

Value in one field as lookup from same table

I'm certain this is very easy, but I am very poor at database stuff...
I have the following table in access 2003:
title | id
/root | 1
/root/x | 2
/root/x/y | 3
/root/x/y/z | 4
/root/x/a | 5
/root/x/a/b | 6
i.e. a bunch of nodes and id numbers - you can see that /root/x is the parent of /root/x/y. I'd like to create another table which has a list of all the nodes, along with the id's of their parents. i.e:
id | parent id
1 | -
2 | 1
3 | 2
4 | 3
5 | 2
6 | 5
The following will give me the id and the value of the parent:
select c.id, left(c.title, instrrev(c.title, "/")-1) as parentNode from nodeIDs c
yields
id | parentNode
1 |
2 | /root
3 | /root/x
4 | /root/x/y
5 | /root/x
6 | /root/x/a
What is the extra step needed to return the id's of those parent nodes, rather than their values, i.e, return '1' instead of '/root' in that last table?
Many thanks
Something like this perhaps:
select c.id,
left(c.title, instrrev(c.title, "/")-1) as parentValue
, p.id as parentID
from nodeIDs c
left join
nodeIDs p
on left(c.title, instrrev(c.title, "/")-1) = p.title
Something along these lines, I think.
select t1.id,
left(t1.title, instrrev(t1.title, "/")-1) as parentNode,
t2.id as parentID
from nodeIDs t1
inner join nodeIDs t2 on (left(t1.title, instrrev(t1.title, "/")-1)) = t2.title
I don't have any easy way to test this. But the basic idea is that, having derived the title of the parent node, you can do an inner join on it to get the associated id number.
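To make the idea concrete outside Access, here is a small Python sketch of the same lookup: derive the parent's title by cutting at the last "/", then look that title up to get its id. The dictionary NODES and the function name parent_ids are made up for this illustration. Like the left join version, it keeps the root and gives it no parent.

```python
# title -> id, copied from the question's table.
NODES = {
    "/root": 1,
    "/root/x": 2,
    "/root/x/y": 3,
    "/root/x/y/z": 4,
    "/root/x/a": 5,
    "/root/x/a/b": 6,
}


def parent_ids(nodes):
    """Return {id: parent_id}, with None where no parent title exists."""
    result = {}
    for title, node_id in nodes.items():
        # Everything before the last "/" is the parent's title;
        # for "/root" this is the empty string, which has no entry.
        parent_title = title.rpartition("/")[0]
        result[node_id] = nodes.get(parent_title)
    return result


if __name__ == "__main__":
    for node_id, parent_id in sorted(parent_ids(NODES).items()):
        print(node_id, parent_id)
```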