I have two tables representing a graph. Nodes consist of an id, a type and properties. Edges consist of an id, properties, a type, an origin node id and a destination node id. I've been working on a query that takes a single node and returns the graph to a given depth. It is heavily based on (basically copied from) an article here - https://www.alibabacloud.com/blog/postgresql-graph-search-practices---10-billion-scale-graph-with-millisecond-response_595039
However, whenever I run the query, the results always contain many duplicates of the first query, as if it's not running the recursive part. I'm not sure what I'm doing wrong here.
Here is my query:
WITH RECURSIVE search_graph(
id,
origin_id,-- point 1
origin_metatype_id,
origin_properties,
destination_id, -- point 2
destination_metatype_id,
destination_properties,
relationship_pair_id,
properties, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.id,
g.origin_id,-- point 1
n1.metatype_id AS origin_metatype_id,
n1.properties AS origin_properties,
g.destination_id, -- point 2
n2.metatype_id AS destination_metatype_id,
n2.properties AS destination_properties,
g.relationship_pair_id,
g.properties, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.id] as path -- initial path
FROM current_edges AS g
LEFT JOIN nodes n1 ON n1.id = g.origin_id
LEFT JOIN nodes n2 ON n2.id = g.destination_id
WHERE
origin_id = ? -- ROOT node =?
UNION ALL
SELECT
g.id,-- recursive clause
g.origin_id,-- point 1
n1.metatype_id AS origin_metatype_id,
n1.properties AS origin_properties,
g.destination_id, -- point 2
n2.metatype_id AS destination_metatype_id,
n2.properties AS destination_properties,
g.relationship_pair_id,
g.properties, -- edge property
sg.depth + 1 as depth, -- depth + 1
sg.path || g.id as path -- add a new point to the path
FROM current_edges AS g
LEFT JOIN nodes n1 ON n1.id = g.origin_id
LEFT JOIN nodes n2 ON n2.id = g.destination_id,
search_graph AS sg -- circular INNER JOIN
WHERE
g.origin_id = sg.destination_id -- recursive JOIN condition
AND (g.id != ALL(sg.path)) -- prevent from cycling
AND sg.depth <= 10 -- search depth =?
)
SELECT * FROM search_graph;
Hello friendly internet wizards.
I am attempting to extract a levelled bill of materials (BOM) from a dataset, running in DB2 on an AS400 server.
I have constructed most of the query (with a lot of help from online resources), and this is what I have so far;
#set item = '10984'
WITH BOM (origin, PMPRNO, PMMTNO, BOM_Level, BOM_Path, IsCycle, IsLeaf) AS
(SELECT CONNECT_BY_ROOT PMPRNO AS origin, PMPRNO, PMMTNO,
LEVEL AS BOM_Level,
SYS_CONNECT_BY_PATH(TRIM(PMMTNO), ' : ') BOM_Path,
CONNECT_BY_ISCYCLE IsCycle,
CONNECT_BY_ISLEAF IsLeaf
FROM MPDMAT
WHERE PMCONO = 405 AND PMFACI = 'M01' AND PMSTRT = 'STD'
START WITH PMPRNO = :item
CONNECT BY NOCYCLE PRIOR PMMTNO = PMPRNO)
SELECT 0 AS BOM_Level, '' AS BOM_Path, MMITNO AS Part_Number, MMITDS AS Part_Name,
IFSUNO AS Supplier_Number, IDSUNM AS Supplier_Name, IFSITE AS Supplier_Part_Number
FROM MITMAS
LEFT OUTER JOIN MITVEN ON MMCONO = IFCONO AND MMITNO = IFITNO AND IFSUNO <> 'ZGA'
LEFT OUTER JOIN CIDMAS ON MMCONO = IDCONO AND IDSUNO = IFSUNO
WHERE MMCONO = 405
AND MMITNO = :item
UNION ALL
SELECT BOM.BOM_Level, BOM_Path, BOM.PMMTNO AS Part_Number, MMITDS AS Part_Name,
IFSUNO AS Supplier_Number, IDSUNM AS Supplier_Name, IFSITE AS Supplier_Part_Number
FROM BOM
LEFT OUTER JOIN MITMAS ON MMCONO = 405 AND MMITNO = BOM.PMMTNO
LEFT OUTER JOIN MITVEN ON IFCONO = MMCONO AND IFITNO = MMITNO AND IFSUNO <> 'ZGA' AND MMMABU = '2'
LEFT OUTER JOIN CIDMAS ON MMCONO = IDCONO AND IDSUNO = IFSUNO
;
This is correctly extracting the components for a given item, as well as the sub-components (etc).
Current data looks like this (I have stripped out some columns that aren't relevant to the issue);
https://pastebin.com/LUnGKRqH
My issue is the order that the data is being presented in.
As you can see in the pastebin above, the first column is the 'level' of the component. This starts with the parent item at level 0, and can theoretically go down as far as 99 levels.
The path is also shown there, so for example the second component 853021 tells us that it's a 2nd level component whose path leads up to INST363 (shown later in the list as a 1st level), then up to the parent at level 0.
I would like the output to be shown in path order (for lack of a better term).
Therefore, after level 0, it should show the first level 1 component, then immediately go into its level 2 components and so on, until no further level is found. At that point, it goes back up the path to the next valid record.
I hope I have explained that adequately, but essentially the data should come out as;
Level | Path | Item
0 | | 10984
1 | : INST363 | INST363
2 | : INST363 : 853021 | 853021
1 | : 21907 | 21907
Any help that can be provided would be very much appreciated!
Thanks,
This is an interesting query. Frankly I am surprised it works as well as it does since it is not structured the way I usually structure queries with a recursive CTE. The main issue is that while you have the Union in there, it does not appear to be within the CTE portion of the query.
When I write a recursive CTE, it is generally structured like this:
with cte as (
priming select
union all
secondary select)
select * from cte
So to get a BOM from an Item Master that looks something like:
CREATE TABLE item (
ItemNo Char(10) PRIMARY KEY,
Description Char(50));
INSERT INTO item
VALUES ('Item0', 'Root Item'),
('Item1a', 'Second Level Item'),
('Item1b', 'Another Second Level Item'),
('Item2a', 'Third Level Item');
and a linkage table like this:
CREATE TABLE linkage (
PItem Char(10),
CItem Char(10),
Quantity Dec(5,0),
PRIMARY KEY (PItem, CItem));
INSERT INTO linkage
VALUES ('Item0', 'Item1a', 2),
('Item0', 'Item1b', 3),
('Item1b', 'Item2a', 5)
The recursive CTE to list a BOM for 'Item0' looks like this:
WITH bom (Level, ItemNo, Description, Quantity)
AS (
-- Load BOM with root item
SELECT 0,
ItemNo,
Description,
1
FROM Item
WHERE ItemNo = 'Item0'
UNION ALL
-- Retrieve all child items
SELECT a.Level + 1,
b.CItem,
c.Description,
a.Quantity * b.Quantity
FROM bom a
join linkage b ON b.pitem = a.itemno
join item c ON c.itemno = b.citem)
-- Set the list order
SEARCH DEPTH FIRST BY itemno SET seq
-- List BOM
SELECT * FROM bom
ORDER BY seq
Here are my results:
LEVEL | ITEMNO | DESCRIPTION | QUANTITY
0 | Item0 | Root Item | 1
1 | Item1a | Second Level Item | 2
1 | Item1b | Another Second Level Item | 3
2 | Item2a | Third Level Item | 15
Notice the SEARCH clause: it generates a column named seq, which you can use to sort the output either depth first or breadth first. Depth first is what you want here.
NOTE: This isn't necessarily an optimum query, since the description is in the CTE, and that increases the size of the CTE result set without adding anything that couldn't be added in the final select. But it does make things a bit simpler, since the 'priming query' retrieves the description.
Note also the column list that follows bom in the WITH clause. It is there to remove the confusion that DB2 had with the expected column list when the explicit column list was omitted. It is not always necessary, but if DB2 complains about an invalid column list, this will fix it.
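For anyone who wants to try the depth-first ordering without a DB2 system at hand, here is a rough sketch in Python using the built-in sqlite3 module. As far as I know SQLite has no SEARCH clause, so this sketch swaps in a different technique: it builds a sort path in the CTE and orders by that. The SortPath and lvl columns and the extra 'Item2b' row are my own additions (the extra row is hypothetical, added only so that depth-first and breadth-first order actually differ).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (ItemNo TEXT PRIMARY KEY, Description TEXT);
    INSERT INTO item VALUES
        ('Item0',  'Root Item'),
        ('Item1a', 'Second Level Item'),
        ('Item1b', 'Another Second Level Item'),
        ('Item2a', 'Third Level Item'),
        ('Item2b', 'Hypothetical Extra Item');  -- not in the original data
    CREATE TABLE linkage (PItem TEXT, CItem TEXT, Quantity INTEGER,
                          PRIMARY KEY (PItem, CItem));
    INSERT INTO linkage VALUES
        ('Item0',  'Item1a', 2),
        ('Item0',  'Item1b', 3),
        ('Item1b', 'Item2a', 5),
        ('Item1a', 'Item2b', 4);                -- hypothetical extra link
""")

BOM_SQL = """
WITH RECURSIVE bom (lvl, ItemNo, Quantity, SortPath) AS (
    -- priming select: the root item
    SELECT 0, ItemNo, 1, '/' || ItemNo
    FROM item
    WHERE ItemNo = ?
    UNION ALL
    -- secondary select: children of rows already in bom
    SELECT a.lvl + 1, b.CItem, a.Quantity * b.Quantity,
           a.SortPath || '/' || b.CItem
    FROM bom a
    JOIN linkage b ON b.PItem = a.ItemNo
)
SELECT bom.lvl, bom.ItemNo, item.Description, bom.Quantity
FROM bom
JOIN item ON item.ItemNo = bom.ItemNo
ORDER BY bom.SortPath   -- a parent's path is a prefix of its children's paths
"""

rows = conn.execute(BOM_SQL, ("Item0",)).fetchall()
for row in rows:
    print(row)
```

Sorting on a text path only gives a true depth-first order if no item number contains a character that sorts below the '/' separator, so treat this as an illustration rather than a drop-in replacement for SEARCH DEPTH FIRST.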
I have a Postgres query that traverses a graph. Since it performs some operations on the nodes, I want to track the nodes that have already been visited, so that these operations are not performed twice.
The documentation shows how to partially solve the problem by tracking the path in the recursive query (e.g. https://www.postgresql.org/docs/9.2/static/queries-with.html). But let's take the following example:
parent_id | child_id
1 | 2
1 | 3
2 | 4
3 | 4
4 | 1
and the example in the documentation:
WITH RECURSIVE search_graph(id, param_1, param_2, param_etc, depth, path, cycle) AS (
SELECT g.id, g.param_1, g.param_2, g.param_etc, 1,
ARRAY[g.id],
false
FROM graph g
UNION ALL
SELECT g.id, g.param_1, g.param_2, g.param_etc, sg.depth + 1,
path || g.id,
g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.id AND NOT cycle
)
SELECT * FROM search_graph;
If I understand correctly, the query shown in the documentation will return two paths: [1, 2, 4, 1] and [1, 3, 4, 1]. Checking for cycles prevents double traversal inside a path (the #1 node will only be seen once) but not across paths: if I query all the nodes, both #1 and #4 will be seen twice (one time in each path).
I would like to know if a node has already been visited, independently of the path. I tried to track it by running the recursive query in a function to which an array "seen" would be passed as an argument. The idea was to append to this array and not reinitialize the path every time.
I tried this, without success
CREATE OR REPLACE FUNCTION recursive_function(id int, seen int[] DEFAULT '{}'::int[])
RETURNS TABLE (
id int,
param_1 text,
...
) AS
$func$
BEGIN
RETURN QUERY
WITH RECURSIVE search_graph(id, param_1, param_2, param_etc, path, cycle) AS (
SELECT g.id, g.param_1, g.param_2, g.param_etc,
-- here I try to update a variable that would be common to the loop
-- but it does not work
seen || ARRAY[g.id],
false,
FROM computations_i_do_not_want_to_run_twice() g
UNION ALL
SELECT g.id, g.param_1, g.param_2, g.param_etc,
seen || g.id
g.id = ANY(seen)
FROM computations_i_do_not_want_to_run_twice() q, search_graph sg
WHERE g.id = sg.id AND NOT cycle
)
SELECT * FROM search_graph;
END
$func$ LANGUAGE plpgsql;
SELECT
*
FROM
recursive_function(123);
Note that I cannot iterate the graph backwards, starting with the children. Any idea?
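One possible direction is to carry only the node id through the recursion and combine the two parts with UNION instead of UNION ALL. This is sketched below in Python with the built-in sqlite3 module rather than PostgreSQL, under that stated assumption. UNION discards rows that are already in the result, so each node comes out once no matter how many paths lead to it, and the cycle 4 -> 1 stops on its own. The table name graph and the CTE name reachable are placeholders of mine, and the expensive per-node work would then be joined onto the de-duplicated id list afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graph (parent_id INTEGER, child_id INTEGER);
    INSERT INTO graph VALUES (1, 2), (1, 3), (2, 4), (3, 4), (4, 1);
""")

VISIT_ONCE_SQL = """
WITH RECURSIVE reachable(id) AS (
    SELECT ?                      -- start node
    UNION                         -- not UNION ALL: already-seen ids are dropped
    SELECT g.child_id
    FROM graph g
    JOIN reachable r ON g.parent_id = r.id
)
SELECT id FROM reachable ORDER BY id
"""

visited = [row[0] for row in conn.execute(VISIT_ONCE_SQL, (1,))]
print(visited)
```

The trade-off is that path and depth can no longer be carried in the CTE, because rows that differ only in those columns would stop counting as duplicates.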
I have the following records:
My goal is to check the SUM of the children for each parent and make sure it is 1 (or 100%).
In the example above, you have a first parent:
12043
It has 2 children:
12484 & 12485
Child (now parent) 12484 has child 12486. The child here (12486) has a percentage of 0.6 (which is NOT 100%). This is NOT OK.
Child (now parent) 12485 has child 12487. The child here (12487) has a percentage of 1 (or 100%). This is OK.
I need to sum the percentages of each set of nested children, and if a sum does not come to 100%, I have to display a message. I'm having a hard time coming up with a query for this. Can someone give me a hand?
This is what I tried and I'm getting the "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." error message.
with cte
as (select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
0 as level
from dbo.tp_contract_payee cp
where cp.participant_id = 12067
and cp.payee_id = cp.parent_payee_id
union all
select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
c.level + 1 as level
from dbo.tp_contract_payee cp
inner join cte c
on cp.parent_payee_id = c.payee_id
where cp.participant_id = 12067
)
select *
from cte
I believe something like the following should work:
-- SQL Server syntax: a recursive CTE is declared with plain WITH
WITH recCTE AS
(
SELECT
cp.parent_payee_id AS parent,
cp.payee_id AS child,
cp.payee_pct,
1 AS depth,
-- cast the ids to text so that + concatenates instead of adding
CAST(CAST(cp.parent_payee_id AS varchar(20)) + '>' + CAST(cp.payee_id AS varchar(20)) AS varchar(4000)) AS path
FROM
dbo.tp_contract_payee cp
WHERE
--top most node
cp.parent_payee_id = 12043
AND cp.payee_id <> cp.parent_payee_id --prevent endless recursion
UNION ALL
SELECT
cp.parent_payee_id AS parent,
cp.payee_id AS child,
cp.payee_pct,
recCTE.depth + 1 AS depth,
CAST(recCTE.path + '>' + CAST(cp.payee_id AS varchar(20)) AS varchar(4000)) AS path
FROM
recCTE
INNER JOIN dbo.tp_contract_payee cp ON
recCTE.child = cp.parent_payee_id AND
recCTE.child <> cp.payee_id --again prevent records where parent is child
WHERE recCTE.depth < 15 --prevent endless cycles
)
SELECT parent
FROM recCTE
GROUP BY parent
HAVING SUM(payee_pct) <> 1;
This differs from yours mostly in the WHERE clauses on both the recursive seed (query before UNION) and the recursive term (query after UNION). I believe yours is too restrictive, especially in the recursive term since you want to allow records that are children of 12067 through, but then you only allow 12067 as the parent id to pull in.
Here, though, we pull every descendant of 12043 (from your example table) and its payee_pct. Then we analyze each parent in the final SELECT and the sum of all its payee_pcts, which is essentially the sum(payee_pct) of that parent's direct children. If any of them are not a total of 1, then we display the parent in the output.
At any rate, between your query and mine, I would imagine this is pretty close to the requirements, so it should be tweaks to get you exactly where you need to be if this doesn't do the trick.
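To make the idea easier to try out, here is a small sketch in Python with the built-in sqlite3 module (so SQLite syntax, not SQL Server). The table name payee is a placeholder of mine, the 0.5 / 0.5 split for 12484 and 12485 is assumed because the sample records are not reproduced here, and the self-referencing root row is also an assumption based on the payee_id = parent_payee_id condition in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payee (parent_payee_id INTEGER, payee_id INTEGER, payee_pct REAL);
    INSERT INTO payee VALUES
        (12043, 12043, 1.0),  -- root row pointing at itself (assumed)
        (12043, 12484, 0.5),  -- assumed percentage
        (12043, 12485, 0.5),  -- assumed percentage
        (12484, 12486, 0.6),
        (12485, 12487, 1.0);
""")

CHECK_SQL = """
WITH RECURSIVE tree(parent, child, payee_pct, depth) AS (
    -- seed: direct children of the top-most node, skipping the self-reference
    SELECT parent_payee_id, payee_id, payee_pct, 1
    FROM payee
    WHERE parent_payee_id = ?
      AND payee_id <> parent_payee_id
    UNION ALL
    -- recursive term: children of every row found so far
    SELECT p.parent_payee_id, p.payee_id, p.payee_pct, t.depth + 1
    FROM tree t
    JOIN payee p ON p.parent_payee_id = t.child
                AND p.payee_id <> p.parent_payee_id
    WHERE t.depth < 15            -- safety cap against cycles
)
SELECT parent, SUM(payee_pct) AS total
FROM tree
GROUP BY parent
HAVING ABS(SUM(payee_pct) - 1.0) > 1e-9   -- tolerate floating point noise
ORDER BY parent
"""

bad_parents = conn.execute(CHECK_SQL, (12043,)).fetchall()
print(bad_parents)
```

With this assumed data only parent 12484 is reported, since its single child carries 0.6. Comparing against 1 with a small tolerance is a choice of mine; an exact <> 1 is fine if the column is a decimal type.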
I made a table to store a Binary Tree like below:
- NodeID
- NodeLeft
- NodeRight
NodeLeft stores the ID of the left node, and NodeRight stores the ID of the right node.
I need to write a procedure that, if I pass it a NodeID, counts how many child nodes there are on the left and how many on the right. It can be split into 2 procedures.
Try this:
WITH CTE_Node(
NodeID,
NodeRight,
NodeLeft,
Level,
RightOrLeft
)
AS
(
-- anchor: the starting node, marked 'P' (parent)
SELECT
NodeID,
NodeRight,
NodeLeft,
0 AS Level,
'P'
FROM Node
WHERE NodeID = 1
UNION ALL
-- recursive part: every node referenced by a row already in the CTE
SELECT
Node.NodeID,
Node.NodeRight,
Node.NodeLeft,
CTE_Node.Level + 1,
-- 'L' if this node is its parent's left child, otherwise 'R'
CASE WHEN CTE_Node.NodeLeft = Node.NodeID THEN 'L' ELSE 'R' END
FROM Node
INNER JOIN CTE_Node ON CTE_Node.NodeLeft = Node.NodeID
OR CTE_Node.NodeRight = Node.NodeID
)
SELECT DISTINCT RightOrLeft,
COUNT(NodeID) OVER(PARTITION BY RightOrLeft)
FROM CTE_Node
Here is an SQL Fiddle.
The Level column is only there to show how the recursion works. Maybe you can use it later.
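If what is needed is the size of the whole left subtree and the whole right subtree of the given node, a variation is to decide the side once, at the first level, and let deeper rows inherit it. Below is a rough sketch of that variation in Python with the built-in sqlite3 module (SQLite syntax). The six-node tree and the names sub and Side are made up for the example: node 1 has children 2 and 3, node 2 has children 4 and 5, and node 3 has only a right child 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, NodeLeft INTEGER, NodeRight INTEGER);
    INSERT INTO Node VALUES
        (1, 2, 3), (2, 4, 5), (3, NULL, 6),
        (4, NULL, NULL), (5, NULL, NULL), (6, NULL, NULL);
""")

COUNT_SQL = """
WITH RECURSIVE sub(NodeID, NodeLeft, NodeRight, Side) AS (
    -- first level: direct children of the start node, tagged by side
    SELECT c.NodeID, c.NodeLeft, c.NodeRight,
           CASE WHEN c.NodeID = p.NodeLeft THEN 'L' ELSE 'R' END
    FROM Node p
    JOIN Node c ON c.NodeID IN (p.NodeLeft, p.NodeRight)
    WHERE p.NodeID = ?
    UNION ALL
    -- deeper levels: descendants keep the side of their first-level ancestor
    SELECT c.NodeID, c.NodeLeft, c.NodeRight, s.Side
    FROM sub s
    JOIN Node c ON c.NodeID IN (s.NodeLeft, s.NodeRight)
)
SELECT Side, COUNT(*) FROM sub GROUP BY Side ORDER BY Side
"""

counts = dict(conn.execute(COUNT_SQL, (1,)).fetchall())
print(counts)
```

For this made-up tree the left side holds nodes 2, 4 and 5 and the right side holds 3 and 6. A side with no nodes simply does not appear in the result, so the caller has to treat a missing key as zero.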
I found this topic.
http://www.sqlservercentral.com/Forums/Topic1152543-392-1.aspx
The table structure is different from the table I designed, but it is a binary tree, so I can use it.
And the SQL Fiddle is very helpful.
I am working on an application that has to assign numeric codes to elements. These codes are not consecutive, and my idea is not to insert them in the database until I have the related element. I would like to find, in SQL, the codes that have not been assigned, and I don't know how to do it.
Any ideas?
Thanks!!!
Edit 1
The table can be so simple:
code | element
-----------------
3 | three
7 | seven
2 | two
And I would like something like this: 1, 4, 5, 6. Without any other table.
Edit 2
Thanks for the feedback, your answers have been very helpful.
This will return NULL if a code is not assigned:
SELECT assigned_codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE codes.code = #code
This will return all non-assigned codes:
SELECT codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE assigned_codes.code IS NULL
There is no pure SQL way to do exactly the thing you want.
In Oracle, you can do the following:
SELECT lvl
FROM (
SELECT level AS lvl
FROM dual
CONNECT BY
level <=
(
SELECT MAX(code)
FROM elements
)
)
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
In PostgreSQL, you can do the following:
SELECT lvl
FROM generate_series(
1,
(
SELECT MAX(code)
FROM elements
)) lvl
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
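The same number-series idea can be tried on an engine that has neither CONNECT BY nor generate_series by producing the series with a recursive CTE. Here is a minimal sketch in Python with the built-in sqlite3 module, assuming the elements table holds the three sample rows from the question. The CTE name series is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elements (code INTEGER PRIMARY KEY, element TEXT);
    INSERT INTO elements VALUES (3, 'three'), (7, 'seven'), (2, 'two');
""")

MISSING_SQL = """
WITH RECURSIVE series(lvl) AS (
    SELECT 1
    UNION ALL
    -- keep counting until the highest assigned code is reached
    SELECT lvl + 1 FROM series
    WHERE lvl < (SELECT MAX(code) FROM elements)
)
SELECT lvl
FROM series
LEFT OUTER JOIN elements ON code = lvl
WHERE code IS NULL
ORDER BY lvl
"""

# codes between 1 and MAX(code) that have no row in elements
missing = [row[0] for row in conn.execute(MISSING_SQL)]
print(missing)
```

On the sample rows this yields 1, 4, 5 and 6, which matches the list asked for in the question.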
Contrary to the assertion that this cannot be done using pure SQL, here is a counter example showing how it can be done. (Note that I didn't say it was easy - it is, however, possible.) Assume the table's name is value_list with columns code and value as shown in the edits (why does everyone forget to include the table name in the question?):
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code - 1)) AS t,
(SELECT l1.code + 1 AS bottom
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
AND NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code >= b.bottom AND l2.code <= t.top);
The two parallel queries in the from clause generate values that are respectively at the top and bottom of a gap in the range of values in the table. The cross-product of these two lists is then restricted so that the bottom is not greater than the top, and such that there is no value in the original list in between the bottom and top.
On the sample data, this produces the range 4-6. When I added an extra row (9, 'nine'), it also generated the range 8-8. Clearly, you also have two other possible ranges for a suitable definition of 'infinity':
-infinity .. MIN(code)-1
MAX(code)+1 .. +infinity
Note that:
- If you are using this routinely, there will generally not be many gaps in your lists.
- Gaps can only appear when you delete rows from the table (or you ignore the ranges returned by this query or its relatives when inserting data).
- It is usually a bad idea to reuse identifiers, so in fact this effort is probably misguided.
However, if you want to do it, here is one way to do so.
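The gap query can be checked quickly with Python's built-in sqlite3 module. This is only a test harness sketch: find_gaps is a helper name I made up, and it loads the given codes into a throwaway in-memory value_list table before running the query unchanged apart from an added ORDER BY.

```python
import sqlite3

GAPS_SQL = """
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code - 1)) AS t,
     (SELECT l1.code + 1 AS bottom
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
  AND NOT EXISTS (SELECT * FROM value_list l2
                  WHERE l2.code >= b.bottom AND l2.code <= t.top)
ORDER BY b.bottom
"""

def find_gaps(codes):
    """Load codes into a fresh in-memory value_list and return the (bottom, top) gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE value_list (code INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO value_list (code) VALUES (?)",
                     [(c,) for c in codes])
    return conn.execute(GAPS_SQL).fetchall()

gaps_sample = find_gaps([3, 7, 2])        # sample rows from the question
gaps_with_nine = find_gaps([3, 7, 2, 9])  # sample rows plus the extra code 9
print(gaps_sample, gaps_with_nine)
```

The first call returns the single range 4-6 and the second adds 8-8, in line with the results described above.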
This is the same idea that Quassnoi published.
I just linked all the ideas together in T-SQL-like code.
-- temp table holding the candidate codes
CREATE TABLE #series (n int)
DECLARE
@max_n int,
@i int
SET @i = 1
-- max value in elements table
SELECT
@max_n = MAX(code) FROM elements
-- fill #series table with numbers from 1 to max_n - 1 (max_n itself is always assigned)
WHILE @i < @max_n BEGIN
INSERT INTO #series (n) VALUES (@i)
SET @i = @i + 1
END
-- unassigned codes: those without a matching row in the elements table
SELECT
n
FROM
#series AS series
LEFT JOIN
elements
ON
elements.code = series.n
WHERE
elements.code IS NULL
EDIT:
This is, of course, not an ideal solution. If you have a lot of elements or check for non-existing codes often, this could cause performance issues.