I have a postgres query that traverses a graph. Since it performs some operations on the nodes, I want to track the nodes that have already been visited, so that these operations are not performed twice.
The documentation shows how to partially solve the problem by tracking the path in the recursive query (e.g. https://www.postgresql.org/docs/9.2/static/queries-with.html). But let's take the following example:
parent_id | child_id
1 | 2
1 | 3
2 | 4
3 | 4
4 | 1
and the example in the documentation:
WITH RECURSIVE search_graph(id, param_1, param_2, param_etc, depth, path, cycle) AS (
SELECT g.id, g.param_1, g.param_2, g.param_etc, 1,
ARRAY[g.id],
false
FROM graph g
UNION ALL
SELECT g.id, g.param_1, g.param_2, g.param_etc, sg.depth + 1,
path || g.id,
g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.id AND NOT cycle
)
SELECT * FROM search_graph;
If I understand correctly, the query shown in the documentation will return two paths: [1, 2, 4, 1] and [1, 3, 4, 1]. Checking for cycles prevents double traversal within a path (node #1 will only be seen once) but not across paths: if I query all the nodes, both #1 and #4 will be seen twice (once in each path).
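The difference between a per-path check and a "seen" set shared across paths can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the documentation's query: it uses SQLite instead of PostgreSQL, stores the path as a comma-delimited string because SQLite has no arrays, and the table name graph and the helper functions are hypothetical. The first query skips a node only when it is already on the current path, so node 4 comes back once via 1 -> 2 -> 4 and once via 1 -> 3 -> 4. The second keeps nothing but the node id in each row and uses UNION instead of UNION ALL; SQLite and PostgreSQL both discard a row that duplicates an earlier result row in that case, which behaves like a visited set shared by all paths.

```python
import sqlite3

# Edge list from the question: (parent_id, child_id); 4 -> 1 closes a cycle.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 1)]


def make_graph():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE graph (parent_id INTEGER, child_id INTEGER)")
    conn.executemany("INSERT INTO graph VALUES (?, ?)", EDGES)
    return conn


def visits_per_path(conn, start):
    # A node is skipped only if it already appears on the *current* path,
    # so a node reachable along two different paths is emitted twice.
    sql = """
        WITH RECURSIVE walk(node, path) AS (
            SELECT ?, ',' || ? || ','
            UNION ALL
            SELECT g.child_id, w.path || g.child_id || ','
            FROM graph g JOIN walk w ON g.parent_id = w.node
            WHERE instr(w.path, ',' || g.child_id || ',') = 0
        )
        SELECT node FROM walk
    """
    return [row[0] for row in conn.execute(sql, (start, start))]


def visits_once(conn, start):
    # UNION (not UNION ALL) with only the node id in each row: a row that was
    # already produced is discarded, acting as a global "seen" set.
    sql = """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT g.child_id
            FROM graph g JOIN reach r ON g.parent_id = r.node
        )
        SELECT node FROM reach
    """
    return [row[0] for row in conn.execute(sql, (start,))]


conn = make_graph()
print(sorted(visits_per_path(conn, 1)))  # [1, 2, 3, 4, 4]
print(sorted(visits_once(conn, 1)))      # [1, 2, 3, 4]
```

The trade-off is that the deduplicating form cannot carry path or depth columns, because two rows with different paths are no longer duplicates; one option would be to run the expensive per-node operations on that deduplicated result afterwards.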
I would like to know whether a node has already been visited, independently of the path. I tried to track this by running the recursive query in a function to which an array "seen" is passed as an argument. The idea was to append to this array rather than reinitialize the path every time.
I tried this, without success
CREATE OR REPLACE FUNCTION recursive_function(id int, seen int[] DEFAULT '{}'::int[])
RETURNS TABLE (
id int,
param_1 text,
...
) AS
$func$
BEGIN
RETURN QUERY
WITH RECURSIVE search_graph(id, param_1, param_2, param_etc, path, cycle) AS (
SELECT g.id, g.param_1, g.param_2, g.param_etc,
-- here I try to update a variable that would be common to the loop
-- but it does not work
seen || ARRAY[g.id],
false
FROM computations_i_do_not_want_to_run_twice() g
UNION ALL
SELECT g.id, g.param_1, g.param_2, g.param_etc,
seen || g.id,
g.id = ANY(seen)
FROM computations_i_do_not_want_to_run_twice() g, search_graph sg
WHERE g.id = sg.id AND NOT cycle
)
SELECT * FROM search_graph;
END
$func$ LANGUAGE plpgsql;
SELECT
*
FROM
recursive_function(123);
Note that I cannot iterate the graph backwards, starting with the children. Any idea?
Related
I have two tables representing a graph. Nodes consist of an id, a type and properties. Edges consist of an id, properties, a type, an origin node id and a destination node id. I've been working on a query that basically takes a single node and returns the graph to a given depth; it is heavily based on (basically copied from) this article: https://www.alibabacloud.com/blog/postgresql-graph-search-practices---10-billion-scale-graph-with-millisecond-response_595039
However, whenever I run the query, my results always contain many duplicates of the rows from the first (root) query, as if it's not running the recursive part. I'm not sure what I'm doing wrong here.
Here is my query.
WITH RECURSIVE search_graph(
id,
origin_id,-- point 1
origin_metatype_id,
origin_properties,
destination_id, -- point 2
destination_metatype_id,
destination_properties,
relationship_pair_id,
properties, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.id,
g.origin_id,-- point 1
n1.metatype_id AS origin_metatype_id,
n1.properties AS origin_properties,
g.destination_id, -- point 2
n2.metatype_id AS destination_metatype_id,
n2.properties AS destination_properties,
g.relationship_pair_id,
g.properties, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.id] as path -- initial path
FROM current_edges AS g
LEFT JOIN nodes n1 ON n1.id = g.origin_id
LEFT JOIN nodes n2 ON n2.id = g.destination_id
WHERE
origin_id = ? -- ROOT node =?
UNION ALL
SELECT
g.id,-- recursive clause
g.origin_id,-- point 1
n1.metatype_id AS origin_metatype_id,
n1.properties AS origin_properties,
g.destination_id, -- point 2
n2.metatype_id AS destination_metatype_id,
n2.properties AS destination_properties,
g.relationship_pair_id,
g.properties, -- edge property
sg.depth + 1 as depth, -- depth + 1
sg.path || g.id as path -- add a new point to the path
FROM current_edges AS g
LEFT JOIN nodes n1 ON n1.id = g.origin_id
LEFT JOIN nodes n2 ON n2.id = g.destination_id,
search_graph AS sg -- circular INNER JOIN
WHERE
g.origin_id = sg.destination_id -- recursive JOIN condition
AND (g.id != ALL(sg.path)) -- prevent from cycling
AND sg.depth <= 10 -- search depth =?
)
SELECT * FROM search_graph;
I have the following records:
My goal is to check the SUM of the children for each parent and make sure it is 1 (or 100%).
In the example above, you have a first parent:
12043
It has 2 children:
12484 & 12485
Child (now parent) 12484 has child 12486. The child here (12486) has a percentage of 0.6 (which is NOT 100%). This is NOT OK.
Child (now parent) 12485 has child 12487. The child here (12487) has a percentage of 1 (or 100%). This is OK.
I need to sum the percentages of the children at each nested level and, if they don't add up to 100%, display a message. I'm having a hard time coming up with a query for this. Can someone give me a hand?
This is what I tried, and I'm getting the error "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
with cte
as (select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
0 as level
from dbo.tp_contract_payee cp
where cp.participant_id = 12067
and cp.payee_id = cp.parent_payee_id
union all
select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
c.level + 1 as level
from dbo.tp_contract_payee cp
inner join cte c
on cp.parent_payee_id = c.payee_id
where cp.participant_id = 12067
)
select *
from cte
I believe something like the following should work:
-- SQL Server syntax: plain WITH (no RECURSIVE keyword); integer ids are cast
-- to VARCHAR before concatenating with '>', and path is VARCHAR(MAX) in both
-- members because a recursive CTE requires matching column types.
WITH recCTE AS
(
SELECT
parent_payee_id AS parent,
payee_id AS child,
payee_pct,
1 AS depth,
CAST(CAST(parent_payee_id AS VARCHAR(20)) + '>' + CAST(payee_id AS VARCHAR(20)) AS VARCHAR(MAX)) AS path
FROM
dbo.tp_contract_payee
WHERE
--top most node
parent_payee_id = 12043
AND payee_id <> parent_payee_id --prevent endless recursion
UNION ALL
SELECT
cp.parent_payee_id AS parent,
cp.payee_id AS child,
cp.payee_pct,
recCTE.depth + 1 AS depth,
CAST(recCTE.path + '>' + CAST(cp.payee_id AS VARCHAR(20)) AS VARCHAR(MAX)) AS path
FROM
recCTE
INNER JOIN dbo.tp_contract_payee AS cp ON
recCTE.child = cp.parent_payee_id AND
recCTE.child <> cp.payee_id --again prevent records where parent is child
WHERE recCTE.depth < 15 --prevent endless cycles
)
SELECT parent
FROM recCTE
GROUP BY parent
HAVING SUM(payee_pct) <> 1;
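This differs from yours mostly in the WHERE clauses of both the recursive seed (the query before UNION) and the recursive term (the query after UNION). I believe yours is too restrictive, especially in the recursive term: you want to let records that are children of 12067 through, but then you only allow 12067 to be pulled in. It also matters that your seed rows are self-referencing (payee_id = parent_payee_id): the recursive term joins each such row back to itself on every level, which is what exhausts the recursion limit of 100, and the <> conditions above keep those rows out.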
Here, though, we pull every descendant of 12043 (from your example table) and its payee_pct. Then the final SELECT analyzes each parent and the sum of all its payee_pct values, which is essentially the sum(payee_pct) of that parent's direct children. If any of those sums is not 1, the parent is displayed in the output.
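A minimal sketch of the same idea (walk every descendant, then group by parent) using Python's sqlite3 module. Assumptions: SQLite stands in for SQL Server, the table name contract_payee is hypothetical, and the percentages for 12484 and 12485 are made-up values (0.5 each) because the original records were not included; only 0.6 for 12486 and 1 for 12487 come from the question. Because SQLite stores these as floating-point numbers, the HAVING clause compares against 1 with a small tolerance instead of <> 1; with a DECIMAL column in SQL Server the exact comparison should be fine.

```python
import sqlite3

ROWS = [
    (12043, 12484, 0.5),  # hypothetical percentage
    (12043, 12485, 0.5),  # hypothetical percentage
    (12484, 12486, 0.6),  # from the question: NOT OK
    (12485, 12487, 1.0),  # from the question: OK
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract_payee "
             "(parent_payee_id INTEGER, payee_id INTEGER, payee_pct REAL)")
conn.executemany("INSERT INTO contract_payee VALUES (?, ?, ?)", ROWS)

SQL = """
WITH RECURSIVE rec(parent, child, payee_pct, depth) AS (
    -- Seed: direct children of the top node, ignoring self-references.
    SELECT parent_payee_id, payee_id, payee_pct, 1
    FROM contract_payee
    WHERE parent_payee_id = ? AND payee_id <> parent_payee_id
    UNION ALL
    -- Recursive term: children of any row already found, with a depth cap.
    SELECT c.parent_payee_id, c.payee_id, c.payee_pct, r.depth + 1
    FROM contract_payee c JOIN rec r ON c.parent_payee_id = r.child
    WHERE c.payee_id <> c.parent_payee_id AND r.depth < 15
)
SELECT parent
FROM rec
GROUP BY parent
HAVING ABS(SUM(payee_pct) - 1.0) > 1e-9
ORDER BY parent
"""

bad_parents = [row[0] for row in conn.execute(SQL, (12043,))]
print(bad_parents)  # [12484]
```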
At any rate, between your query and mine, I imagine this is pretty close to the requirements, so if this doesn't do the trick, a few tweaks should get you exactly where you need to be.
Let's say I am using materialized paths to store management chains:
Table: User
id name management_chain
1 Senior VP {1}
2 Middle Manager {1,2}
3 Cubicle Slave {1,2,3}
4 Janitor {1,2,4}
How do I construct a query, given a user id, that returns all of that user's direct reports? E.g. given the Middle Manager, it should return Cubicle Slave and Janitor; given the Senior VP, it should return the Middle Manager. Put another way, what would be a good way to get all records where management_chain contains the queried id at the second-to-last position (given that the last item represents the user's own id)?
In other words, how do I represent the following SQL:
SELECT *
FROM USER u
WHERE u.management_chain #> {stored_variable, u.id}
My current JS:
var collection = Users.forge()
.query('where', 'management_chain', '#>', [req.user.id, id]);
Which errors out with
ReferenceError: id is not defined
Assuming management_chain is an integer array (int[]), you could do the following (in plain SQL):
select *
from (
select id,
name,
'/'||array_to_string(management_chain, '/') as path
from users
) t
where path like '%/2/%';
This works, because array_to_string() will not append the delimiter to the end of the string. Therefore if a path contains the sequence /2/ it means there are more nodes "below" that one. The nodes where 2 is the last id in the management_chain will end with /2 (no trailing /) and will not be included in the result.
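The pattern can be checked with a sketch using Python's sqlite3 module, where the path string is built in Python to emulate '/' || array_to_string(management_chain, '/'), since SQLite has no arrays. User 5 ("Intern") is a hypothetical extra row, added to show that '%/2/%' matches every user anywhere below 2, not only direct reports. The direct_reports query is an added variation, not part of the answer: it requires the path to end in '/<manager>/<own id>', which is the second-to-last check the question asks about. In PostgreSQL the same condition could also be written against the array as management_chain[array_length(management_chain, 1) - 1] = 2 (not exercised by this sketch).

```python
import sqlite3

# The question's users, plus a hypothetical user 5 reporting to user 3.
USERS = [
    (1, "Senior VP", [1]),
    (2, "Middle Manager", [1, 2]),
    (3, "Cubicle Slave", [1, 2, 3]),
    (4, "Janitor", [1, 2, 4]),
    (5, "Intern", [1, 2, 3, 5]),  # hypothetical, not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, path TEXT)")
# Emulate '/' || array_to_string(management_chain, '/') in Python.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, n, "/" + "/".join(map(str, chain))) for i, n, chain in USERS],
)


def below(manager_id):
    # The answer's pattern: '/<id>/' occurs only if nodes follow <id>.
    rows = conn.execute(
        "SELECT name FROM users WHERE path LIKE '%/' || ? || '/%' ORDER BY id",
        (manager_id,),
    )
    return [r[0] for r in rows]


def direct_reports(manager_id):
    # Second-to-last position: the path must end in '/<manager>/<own id>'.
    rows = conn.execute(
        "SELECT name FROM users WHERE path LIKE '%/' || ? || '/' || id ORDER BY id",
        (manager_id,),
    )
    return [r[0] for r in rows]


print(below(2))           # ['Cubicle Slave', 'Janitor', 'Intern']
print(direct_reports(2))  # ['Cubicle Slave', 'Janitor']
print(direct_reports(1))  # ['Middle Manager']
```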
The expression will not make use of an index, so this might not be feasible for large tables.
However I don't know how this would translate into that JS thing.
SQLFiddle example: http://sqlfiddle.com/#!15/75948/2
Look up WITH RECURSIVE.
As an example, take a look at this code:
CREATE VIEW
mvw_pre_import_cellpath_check
(
pkid_cell,
id_cell ,
id_parent,
has_child,
id_path ,
name_path,
string_path
) AS WITH RECURSIVE cell_paths
(
pkid_cell,
id_cell ,
id_parent,
id_path ,
name_path
) AS
(
SELECT
tbl_cell.pkid ,
tbl_cell.cell_id ,
tbl_cell.cell_parent_id ,
ARRAY[tbl_cell.cell_id] AS "array",
ARRAY[tbl_cell.cell_name] AS "array"
FROM
ufo.tbl_cell
WHERE
(((
tbl_cell.cell_parent_id IS NULL)
AND (
tbl_cell.reject_reason IS NULL))
AND (
tbl_cell.processed_dt IS NULL))
UNION ALL
SELECT
tbl_cell.pkid ,
tbl_cell.cell_id ,
tbl_cell.cell_parent_id ,
(cell_paths_1.id_path || tbl_cell.cell_id),
(cell_paths_1.name_path || tbl_cell.cell_name)
FROM
(cell_paths cell_paths_1
JOIN
ufo.tbl_cell
ON
((
tbl_cell.cell_parent_id = cell_paths_1.id_cell)))
WHERE
(((
NOT (
tbl_cell.cell_id = ANY (cell_paths_1.id_path)))
AND (
tbl_cell.reject_reason IS NULL))
AND (
tbl_cell.processed_dt IS NULL))
)
SELECT
cell_paths.pkid_cell,
cell_paths.id_cell ,
cell_paths.id_parent,
(
SELECT
COUNT(*) AS COUNT
FROM
ufo.tbl_cell x
WHERE
((
cell_paths.id_cell = x.cell_id)
AND (
EXISTS
(
SELECT
1
FROM
ufo.tbl_cell y
WHERE
(
x.cell_id = y.cell_parent_id))))) AS has_child,
cell_paths.id_path ,
cell_paths.name_path ,
array_to_string(cell_paths.name_path, ' -> '::text) AS string_path
FROM
cell_paths
ORDER BY
cell_paths.id_path;
There are plenty more examples to find on SO when looking for recursive CTE.
But unlike your example, the top-level cells (managers) in my example have parent_id = NULL. These are the starting points for the different branches.
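The anchor/recursive structure of that view can be sketched in a few lines with Python's sqlite3 module. Assumptions: the table name cell and its columns are hypothetical, the parent ids are derived from the management chains in the question, the reject_reason/processed_dt filters and the has_child column are left out, and because SQLite has no arrays the id path is a '/'-delimited string while the name path is concatenated directly with ' -> ' (the role array_to_string plays in the view). Roots are the rows with a NULL parent, and the instr() check stands in for NOT (cell_id = ANY (id_path)).

```python
import sqlite3

# (id, parent_id, name): parents derived from the question's management chains.
ROWS = [(1, None, "Senior VP"), (2, 1, "Middle Manager"),
        (3, 2, "Cubicle Slave"), (4, 2, "Janitor")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cell (id INTEGER, parent_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO cell VALUES (?, ?, ?)", ROWS)

SQL = """
WITH RECURSIVE cell_paths(id, id_path, name_path) AS (
    -- Anchor: every root (parent_id IS NULL) starts its own branch.
    SELECT id, '/' || id || '/', name
    FROM cell
    WHERE parent_id IS NULL
    UNION ALL
    -- Recursive step: append the child unless its id is already on the path.
    SELECT c.id, p.id_path || c.id || '/', p.name_path || ' -> ' || c.name
    FROM cell c JOIN cell_paths p ON c.parent_id = p.id
    WHERE instr(p.id_path, '/' || c.id || '/') = 0
)
SELECT id, name_path FROM cell_paths ORDER BY id
"""

for row in conn.execute(SQL):
    print(row)
```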
HTH
I am rather stuck; I've had a good look around, but I am not exactly sure how I can do this.
I've got to build a stored procedure (T-SQL) to bring back a navigation, but I am having a few issues ordering the navigation correctly.
Table Example
NavID OrderID ParentID NavName
1 1 0 Home
2 2 0 About
3 3 0 Contact Us
4 1 2 About Us Page
5 2 2 About Us Page 2
6 1 4 Another SubPage
All I need to bring back is the navigation above the given item, and one level below it.
So if I passed NavigationID 2 I would expect the results to come back like this
Home
About
About Us Page
About Us Page 2
Contact Us
If I passed in NavigationID 6 I would expect to see ..
Home
About
About Us Page
Another SubPage
About Us Page 2
Contact Us
As you can see, it takes the OrderID into account, but makes sure the children are in order first.
How can I achieve this?
Here's a complete script which does what you need (includes your test data):
DECLARE @nav TABLE (
NavID INT NOT NULL PRIMARY KEY,
OrderID INT NOT NULL,
ParentID INT,
NavName nvarchar(MAX) NOT NULL
);
INSERT @nav
SELECT 1, 1, 0, 'Home' UNION ALL
SELECT 2, 2, 0, 'About' UNION ALL
SELECT 3, 3, 0, 'Contact Us' UNION ALL
SELECT 4, 1, 2, 'About Us Page' UNION ALL
SELECT 5, 2, 2, 'About Us Page 2' UNION ALL
SELECT 6, 1, 4, 'Another SubPage';
DECLARE @NavigationID int;
SET @NavigationID = 2;
WITH Ancestors AS (
SELECT @NavigationID NavID
UNION ALL
SELECT n.ParentID
FROM @nav n
JOIN Ancestors a ON (n.NavID = a.NavID)
),
VisibleNav AS (
SELECT n.*, CONVERT(FLOAT, 1)/SUM(1) OVER (PARTITION BY n.ParentID) Mul, ROW_NUMBER() OVER (PARTITION BY n.ParentID ORDER BY n.OrderID)-1 Pos
FROM @nav n
JOIN Ancestors a ON n.ParentID = a.NavID
),
SortedNav AS (
SELECT vn.*, vn.Pos*vn.Mul Sort, 1 Depth
FROM VisibleNav vn
WHERE vn.ParentID = 0
UNION ALL
SELECT vn.NavID, vn.OrderID, vn.ParentID, vn.NavName, vn.Mul*sn.Mul, vn.Pos, vn.Pos*(vn.Mul*sn.Mul)+sn.Sort, sn.Depth + 1
FROM VisibleNav vn
JOIN SortedNav sn ON sn.NavID = vn.ParentID
)
SELECT sn.NavID, sn.OrderID, sn.ParentID, sn.NavName
FROM SortedNav sn
ORDER BY sn.Sort, sn.Depth;
Basically, I use a recursive CTE to create a list of all the parents that need to be used in your navigation, including the depth of each parent (so that the order does not depend on the IDs), and then I join the navigation entries on that.
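The first two CTEs of that script (Ancestors, and VisibleNav without its window-function columns) can be sketched in SQLite through Python's sqlite3 module; the final Mul/Pos/Sort ordering is left to the T-SQL script. The table name nav and the helper names are hypothetical. ParentID 0 has no row of its own, which is what stops the upward walk.

```python
import sqlite3

NAV = [(1, 1, 0, "Home"), (2, 2, 0, "About"), (3, 3, 0, "Contact Us"),
       (4, 1, 2, "About Us Page"), (5, 2, 2, "About Us Page 2"),
       (6, 1, 4, "Another SubPage")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nav (NavID INTEGER PRIMARY KEY, OrderID INTEGER, "
             "ParentID INTEGER, NavName TEXT)")
conn.executemany("INSERT INTO nav VALUES (?, ?, ?, ?)", NAV)

ANCESTORS = """
WITH RECURSIVE anc(NavID) AS (
    SELECT ?                       -- the selected entry itself
    UNION ALL
    SELECT n.ParentID              -- then its parent, grandparent, ...
    FROM nav n JOIN anc a ON n.NavID = a.NavID
)
"""


def ancestors(nav_id):
    sql = ANCESTORS + "SELECT NavID FROM anc"
    return sorted(row[0] for row in conn.execute(sql, (nav_id,)))


def visible(nav_id):
    # Every entry whose parent is the selected entry or one of its ancestors.
    sql = ANCESTORS + ("SELECT NavName FROM nav "
                       "WHERE ParentID IN (SELECT NavID FROM anc) ORDER BY NavID")
    return [row[0] for row in conn.execute(sql, (nav_id,))]


print(ancestors(6))  # [0, 2, 4, 6]
print(visible(2))    # ['Home', 'About', 'Contact Us', 'About Us Page', 'About Us Page 2']
```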
I'd be tempted to create a table variable, run the query for NavID <= the parent parameter where ParentID = 0, and add those rows to the table. Then get the child items and insert them into the table variable, and finally get the rows with NavID > the parent parameter where ParentID = 0.
A fudge, but it should work.
DECLARE @Table TABLE (NavName nvarchar(50), OrderID int)
INSERT @Table (NavName, OrderID) (SELECT NavName, OrderID FROM #Table1 WHERE (ParentID = 0) AND (NavID <= @ParentID))
INSERT @Table (NavName,OrderID) (SELECT NavName, OrderID+ 500 FROM #Table1 WHERE ParentID = @ParentID )
INSERT @Table (NavName,OrderID) (SELECT NavName, OrderID + 1000 FROM #Table1 WHERE ParentID = 0 AND NavID > @ParentID)
SELECT * FROM @Table ORDER BY OrderID
Common Table Expressions permit recursive queries. However, they can be very slow and can seriously confuse the query optimizer, particularly if you use parametrized queries (and you generally should). For small tables (such as navigation) that will not participate in large joins with other (large) tables, CTEs work fine. You can also cache the result of the common table expression in a table and rerun the caching query whenever the navigation table changes (generally not frequently, I'm guessing); the query optimizer deals with simple many-to-many relation tables much better.
However, there are also other ways of representing trees in SQL Server.
You could look at:
lft/rgt style columns, whereby an ordering is defined as the order of the lft column and a node X is a descendant of Y whenever Y.lft < X.lft < Y.rgt. See: http://articles.sitepoint.com/article/hierarchical-data-database/2
HierarchyIDs. SQL Server 2008 introduced a special data type for exactly this purpose; however, updating the tree structure isn't always trivial with these. See: http://msdn.microsoft.com/en-us/magazine/cc794278.aspx
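To illustrate the lft/rgt idea on the navigation data from the question, here is a small pure-Python sketch that numbers the tree with a depth-first walk (siblings in OrderID order) and then applies the Y.lft < X.lft < Y.rgt descendant test. It is an illustration only: in a real nested-set design the lft/rgt values would be stored as columns and maintained on every insert or move, which is where most of the complexity lies. The function names are hypothetical.

```python
# Navigation tree from the question: NavID -> (OrderID, ParentID).
NAV = {1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (1, 2), 5: (2, 2), 6: (1, 4)}


def nested_set(nav):
    """Assign (lft, rgt) by a depth-first walk, siblings in OrderID order."""
    children = {}
    for nav_id, (order_id, parent_id) in nav.items():
        children.setdefault(parent_id, []).append((order_id, nav_id))
    bounds, counter = {}, 0

    def walk(node):
        nonlocal counter
        counter += 1
        lft = counter
        for _, child in sorted(children.get(node, [])):
            walk(child)
        counter += 1
        bounds[node] = (lft, counter)

    for _, root in sorted(children.get(0, [])):
        walk(root)
    return bounds


def is_descendant(bounds, x, y):
    # X is a descendant of Y whenever Y.lft < X.lft < Y.rgt.
    return bounds[y][0] < bounds[x][0] < bounds[y][1]


bounds = nested_set(NAV)
order = sorted(bounds, key=lambda n: bounds[n][0])
print(order)  # [1, 2, 4, 6, 5, 3]
```

Sorting by lft alone gives the order expected for NavigationID 6, because a pre-order walk lists each entry's children right after it.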
Finally, when I need to use CTEs, I generally start with something along the lines of...
with nav_tree (ParentID, ChildID, depth_delta) as (
SELECT basenode.NavID, basenode.NavID, 0
FROM NavTable AS basenode
UNION ALL
SELECT treenode.ParentID, basenode.NavID, depth_delta+1
FROM NavTable AS basenode
JOIN nav_tree AS treenode ON basenode.ParentID=treenode.ChildID
)
--select statement here joining the nav_tree with the original table and whatnot
Note that your exact ordering requirements are quite tricky, particularly where you want the children of an element to be listed inline immediately after it, i.e. the "About Us Page, Another SubPage, About Us Page 2" segment of your second example. That means you cannot just order by depth_delta and secondarily by OrderID; you'll need a path-based sort. You might want to do this in code rather than in SQL, but you can construct a path in a CTE as follows:
with nav_tree (ParentID, ChildID, depth_delta,orderpath) as (
SELECT basenode.NavID, basenode.NavID, 0, convert(varchar(MAX), basenode.OrderID)
FROM NavTable AS basenode
UNION ALL
SELECT treenode.ParentID, basenode.NavID, depth_delta+1, treenode.orderpath+','+CONVERT(varchar(MAX), basenode.OrderID)
FROM NavTable AS basenode
JOIN nav_tree AS treenode ON basenode.ParentID=treenode.ChildID
)
...and then you can order by that orderpath. Since you also want "one level below" each ancestor-or-self unfolded, you'll still need to join with NavTable to get those.
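One caveat if the comma-separated orderpath is sorted as text: '1,10' sorts before '1,2'. If the sort is done in code, as suggested above, a numeric key avoids this. A minimal Python sketch; the helper name order_key and the sample paths are hypothetical.

```python
def order_key(orderpath):
    """Turn an orderpath such as '2,1,1' into (2, 1, 1) for numeric sorting."""
    return tuple(int(part) for part in orderpath.split(","))


paths = ["1,2", "1,10", "2", "1"]
print(sorted(paths))                 # ['1', '1,10', '1,2', '2']
print(sorted(paths, key=order_key))  # ['1', '1,2', '1,10', '2']
```

Tuples compare element by element and a shorter prefix sorts first, so a parent still lands before its children.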
However, given your sorting requirements, I'd recommend HierarchyIDs: they have your sorting semantics built in, and they also avoid the potential performance issues CTEs can expose.
I am a little unclear as to precisely which nodes you wish to display... but if I understand correctly the main question here is how to order the resulting nodes. The principal difficulty is that the ordering criteria are variable length: a node must be ordered based upon the entire sequence of OrderId values for the node and all of its ancestors. For example, the ordering sequence for node 6 is '2, 1, 1'.
SQL does not handle such variable-length sequences well. I propose that we use an NVARCHAR(MAX) value for the ordering sequence. For node 6, we will use '0002.0001.0001'. In this form, the nodes can be ordered trivially using string comparison. Note that the identifier values must be zero-padded in order to ensure correct ordering (I arbitrarily chose to pad to 4 digits -- the real application may require a different choice).
So this brings us to the nuts and bolts. We'll start by creating a table called NavigationData to hold our test data:
SELECT NULL AS NavId, NULL AS OrderId, NULL AS ParentId, NULL AS NavName
INTO NavigationData WHERE 1=0
UNION SELECT 1, 1, 0, 'Home'
UNION SELECT 2, 2, 0, 'About'
UNION SELECT 3, 3, 0, 'Contact Us'
UNION SELECT 4, 1, 2, 'About Us Page'
UNION SELECT 5, 2, 2, 'About Us Page 2'
UNION SELECT 6, 1, 4, 'Another SubPage'
Now, we'll create a helper view that, for every possible desired node, lists all of the related nodes along with their computed path strings. As I said at the beginning, I feel the criteria for selecting the related nodes are underspecified, so the desired/related JOIN expression may need to be adjusted for datasets with more nodes than the simple example. With that caveat, here is the view:
CREATE VIEW NavigationHierarchy AS
WITH
hierarchy AS (
SELECT
NavId AS RootId
, 1 AS Depth
, NavId
, RIGHT('0000' + CAST(OrderId AS NVARCHAR(MAX)), 4) AS Path
, ParentId
, NavName
FROM NavigationData
WHERE ParentId = 0
UNION ALL
SELECT
parent.RootId
, parent.Depth + 1 AS Depth
, child.NavId
, parent.Path + '.'
+ RIGHT('0000' + CAST(child.OrderId AS NVARCHAR(MAX)), 4) AS Path
, child.ParentId
, child.NavName
FROM hierarchy AS parent
INNER JOIN NavigationData AS child
ON child.ParentId = parent.NavId
)
SELECT
desired.NavId AS DesiredNavId
, related.*
FROM hierarchy AS desired
INNER JOIN hierarchy AS related
ON related.Depth <= desired.Depth + 1
AND related.RootId IN (desired.RootId, related.RootId)
Most of the query is a straightforward recursive descent of the hierarchy using a common table expression. The heart of the solution is the generation of the Path columns. Naturally, you may prefer to bake this query directly into a larger query or stored procedure rather than creating a view. The view is convenient for testing, however.
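For experimenting without SQL Server, the same recursive descent can be sketched in SQLite via Python's sqlite3 module, with substr('0000' || OrderId, -4) standing in for RIGHT('0000' + ..., 4). Assumptions: SQLite instead of T-SQL, and the helper name menu is hypothetical. The view's RootId condition compares related.RootId against a list that contains related.RootId itself, so it is always true for non-NULL values; the sketch therefore keeps only the depth condition.

```python
import sqlite3

NAV = [(1, 1, 0, "Home"), (2, 2, 0, "About"), (3, 3, 0, "Contact Us"),
       (4, 1, 2, "About Us Page"), (5, 2, 2, "About Us Page 2"),
       (6, 1, 4, "Another SubPage")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NavigationData "
             "(NavId INTEGER, OrderId INTEGER, ParentId INTEGER, NavName TEXT)")
conn.executemany("INSERT INTO NavigationData VALUES (?, ?, ?, ?)", NAV)

# substr(x, -4) keeps the last four characters, like RIGHT(x, 4) in T-SQL.
HIERARCHY = """
WITH RECURSIVE hierarchy(NavId, Depth, Path, NavName) AS (
    SELECT NavId, 1, substr('0000' || OrderId, -4), NavName
    FROM NavigationData WHERE ParentId = 0
    UNION ALL
    SELECT c.NavId, p.Depth + 1,
           p.Path || '.' || substr('0000' || c.OrderId, -4), c.NavName
    FROM hierarchy p JOIN NavigationData c ON c.ParentId = p.NavId
)
SELECT r.NavName, r.Path
FROM hierarchy d JOIN hierarchy r ON r.Depth <= d.Depth + 1
WHERE d.NavId = ?
ORDER BY r.Path
"""


def menu(nav_id):
    return [name for name, _ in conn.execute(HIERARCHY, (nav_id,))]


for name, path in conn.execute(HIERARCHY, (6,)):
    print(name, path)
```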
Armed with the view, we can now generate the desired results, in the requested order. I have included the generated path in the query result for illustrative purposes. Here is the query for node 2:
SELECT NavName, Path
FROM NavigationHierarchy
WHERE DesiredNavId = 2
ORDER BY Path
yielding:
Home 0001
About 0002
About Us Page 0002.0001
About Us Page 2 0002.0002
Contact Us 0003
and for node 6:
SELECT NavName, Path
FROM NavigationHierarchy
WHERE DesiredNavId = 6
ORDER BY Path
yielding:
Home 0001
About 0002
About Us Page 0002.0001
Another SubPage 0002.0001.0001
About Us Page 2 0002.0002
Contact Us 0003
I am working on an application that has to assign numeric codes to elements. These codes are not consecutive, and my idea is not to insert them into the database until I have the related element, but I would like to find the unassigned codes using SQL, and I don't know how to do it.
Any ideas?
Thanks!!!
Edit 1
The table can be as simple as this:
code | element
-----------------
3 | three
7 | seven
2 | two
And I would like to get something like this: 1, 4, 5, 6, without using any other table.
Edit 2
Thanks for the feedback, your answers have been very helpful.
This will return NULL if a code is not assigned:
SELECT assigned_codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE codes.code = @code
This will return all non-assigned codes:
SELECT codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE assigned_codes.code IS NULL
There is no pure SQL way to do exactly the thing you want.
In Oracle, you can do the following:
SELECT lvl
FROM (
SELECT level AS lvl
FROM dual
CONNECT BY
level <=
(
SELECT MAX(code)
FROM elements
)
)
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
In PostgreSQL, you can do the following:
SELECT lvl
FROM generate_series(
1,
(
SELECT MAX(code)
FROM elements
)) lvl
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
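Python's bundled SQLite typically has no generate_series, but a recursive CTE can produce the same 1..MAX(code) range, which keeps the LEFT OUTER JOIN / IS NULL idea intact. A sketch via Python's sqlite3 module using the sample rows from the question; the table name elements follows the queries above, and the column name element is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (code INTEGER PRIMARY KEY, element TEXT)")
conn.executemany("INSERT INTO elements VALUES (?, ?)",
                 [(3, "three"), (7, "seven"), (2, "two")])

# A recursive CTE stands in for generate_series(1, MAX(code)).
SQL = """
WITH RECURSIVE series(lvl) AS (
    SELECT 1
    UNION ALL
    SELECT lvl + 1 FROM series
    WHERE lvl < (SELECT MAX(code) FROM elements)
)
SELECT lvl
FROM series LEFT OUTER JOIN elements ON code = lvl
WHERE code IS NULL
ORDER BY lvl
"""

missing = [row[0] for row in conn.execute(SQL)]
print(missing)  # [1, 4, 5, 6]
```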
Contrary to the assertion that this cannot be done using pure SQL, here is a counter example showing how it can be done. (Note that I didn't say it was easy - it is, however, possible.) Assume the table's name is value_list with columns code and value as shown in the edits (why does everyone forget to include the table name in the question?):
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code - 1)) AS t,
(SELECT l1.code + 1 AS bottom
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
AND NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code >= b.bottom AND l2.code <= t.top);
The two parallel queries in the from clause generate values that are respectively at the top and bottom of a gap in the range of values in the table. The cross-product of these two lists is then restricted so that the bottom is not greater than the top, and such that there is no value in the original list in between the bottom and top.
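The claims about the sample data can be reproduced by running the query above unchanged (apart from an added ORDER BY) in SQLite through Python's sqlite3 module; SQLite here is only a convenient stand-in, and the helper name gaps is hypothetical.

```python
import sqlite3

GAPS = """
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code - 1)) AS t,
     (SELECT l1.code + 1 AS bottom
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
  AND NOT EXISTS (SELECT * FROM value_list l2
                  WHERE l2.code >= b.bottom AND l2.code <= t.top)
ORDER BY b.bottom
"""


def gaps(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE value_list (code INTEGER, value TEXT)")
    conn.executemany("INSERT INTO value_list VALUES (?, ?)", rows)
    return conn.execute(GAPS).fetchall()


sample = [(3, "three"), (7, "seven"), (2, "two")]
print(gaps(sample))                  # [(4, 6)]
print(gaps(sample + [(9, "nine")]))  # [(4, 6), (8, 8)]
```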
On the sample data, this produces the range 4-6. When I added an extra row (9, 'nine'), it also generated the range 8-8. Clearly, you also have two other possible ranges for a suitable definition of 'infinity':
-infinity .. MIN(code)-1
MAX(code)+1 .. +infinity
Note that:
If you are using this routinely, there will generally not be many gaps in your lists.
Gaps can only appear when you delete rows from the table (or you ignore the ranges returned by this query or its relatives when inserting data).
It is usually a bad idea to reuse identifiers, so in fact this effort is probably misguided.
However, if you want to do it, here is one way to do so.
This is the same idea that Quassnoi published.
I just linked all the ideas together in T-SQL-like code.
DECLARE
@series TABLE (n int)
DECLARE
@max_n int,
@i int
SET @i = 1
-- max value in elements table
SELECT
@max_n = (SELECT MAX(code) FROM elements)
-- fill @series table with numbers from 1 to @max_n
WHILE @i <= @max_n BEGIN
INSERT INTO @series (n) VALUES (@i)
SET @i = @i + 1
END
-- unassigned codes -- these without pair in elements table
SELECT
n
FROM
@series AS series
LEFT JOIN
elements
ON
elements.code = series.n
WHERE
elements.code IS NULL
EDIT:
This is, of course, not an ideal solution. If you have a lot of elements, or check for non-existent codes often, this could cause performance issues.