Recursive query based on array column - sql

I have a table with parent and child columns. The parent column is an array because a child can have multiple parents. Could anyone suggest how to build a CTE (Postgres 11.2) that returns the parents and ancestors of a child? The CTE should return the correct parent irrespective of the array position. This is my table:
CREATE TABLE public.mytable (
id uuid NOT NULL,
parent_id uuid[],
name character varying(255) COLLATE pg_catalog."default",
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL,
CONSTRAINT mytable_pkey PRIMARY KEY (id)
);
The data is:
id | parentId
---+---------
1  | null
2  | null
3  | 1
4  | 1
5  | 1
6  | [3,4]
7  | 2
8  | 6
Expected output for the id=8:
parent
1
[3+4]
6

This works for your test case:
WITH RECURSIVE ancestry AS (
SELECT parent_id
FROM tbl
WHERE id = 8 -- your id here
UNION ALL
SELECT DISTINCT t.parent_id
FROM ancestry a
JOIN tbl t ON t.id = ANY (a.parent_id)
WHERE t.parent_id IS NOT NULL
)
TABLE ancestry;
The query does not, however, merge elements of multiple distinct arrays on the same level. Your test case is not revealing in this respect. You probably have to do more. But you'll first have to define exactly what's allowed in the data and how to deal with duplicates in multiple ancestry arrays on the same level.
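To make the "merge arrays on the same level" question concrete, here is a minimal Python sketch of one possible definition: collect all parents of the current level into a set, drop anything already seen, and move up. The function name `ancestors_by_level` and the `PARENTS` dictionary are made up for illustration, and integers stand in for the uuid values. Whether an ancestor reachable on two levels should be reported once or twice is exactly the kind of rule you would have to decide; this sketch reports it once, at the nearest level.

```python
from typing import Dict, List, Optional


def ancestors_by_level(parents: Dict[int, Optional[List[int]]],
                       start: int) -> List[List[int]]:
    """Return the ancestors of `start`, one sorted list per level, nearest first."""
    levels: List[List[int]] = []
    seen = {start}              # guards against cycles and repeated ancestors
    frontier = [start]
    while frontier:
        next_level = set()
        for node in frontier:
            # A NULL parent array is treated like an empty one.
            for parent in parents.get(node) or []:
                if parent not in seen:
                    next_level.add(parent)
        if not next_level:
            break
        seen |= next_level
        levels.append(sorted(next_level))
        frontier = levels[-1]
    return levels


# Sample data from the question (id -> parent array).
PARENTS = {1: None, 2: None, 3: [1], 4: [1], 5: [1], 6: [3, 4], 7: [2], 8: [6]}

print(ancestors_by_level(PARENTS, 8))
```

For id 8 this gives the levels 6, then 3 and 4, then 1, which matches the expected output in the question read from the bottom up.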

Related

How can I stop my Postgres recursive CTE from indefinitely looping?

Background
I'm running Postgres 11 on CentOS 7.
I recently learned the basics of recursive CTEs in Postgres thanks to S-Man's answer to my recent question.
The problem
While working on a closely related issue (counting parts sold within bundles and assemblies) and using this recursive CTE, I ran into a problem where the query looped indefinitely and never completed.
I tracked this down to the presence of non-spurious 'self-referential' entries in the relator table, i.e. rows with the same value for parent_name and child_name.
I know that these are the source of the problem because when I recreated the situation with test tables and data, the undesired looping behavior occurred when these rows were present, and disappeared when these rows were absent or when UNION (which excludes duplicate returned rows) was used in the CTE rather than UNION ALL.
I think the data model itself probably needs adjusting so that these 'self-referential' rows aren't necessary, but for now, what I need to do is get this query to return the desired data on completion and stop looping.
How can I achieve this result? All guidance much appreciated!
Tables and test data
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
/* NOTE: listing_name below is like an alias of a relator.parent_name as it appears in a catalog,
required to know because it is these listing_names that are reflected by sales.sold_name */
CREATE TABLE the_schema.catalog_listings (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
listing_name TEXT NOT NULL,
parent_name TEXT NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
CREATE VIEW the_schema.relationships_with_child_category AS (
SELECT
c.listing_name,
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.catalog_listings c
INNER JOIN
the_schema.relator r
ON c.parent_name = r.parent_name
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('child2', 'assembly'),('subChild1', 'component'),
('subChild2', 'component'), ('subChild3', 'component');
INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('listing1', 'parent1'), ('parent1', 'child1'), ('parent1','child2'), ('child1', 'child1'), ('child2', 'child2');
INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('parent1', 'child1'), ('parent1','child2');
/* note the two 'self-referential' entries */
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 1), ('child1', 'subChild2', 1),
('parent1', 'child2', 1),('child2', 'subChild1', 1), ('child2', 'subChild3', 1), ('child1', 'child1', 1), ('child2', 'child2', 1);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2), ('listing1', 1);
The present query, which loops indefinitely with the required UNION ALL:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category as category
FROM
the_schema.sales s
JOIN the_schema.relationships_with_child_category r
ON s.sold_name = r.listing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category
FROM cte
JOIN the_schema.relationships_with_child_category r
ON cte.child_name = r.parent_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
;
In the catalog_listings table, listing_name and parent_name are the same for child1 and child2.
In the relator table, parent_name and child_name are also the same for child1 and child2.
These rows cause the cyclic recursion.
Just remove those two rows from both tables:
delete from the_schema.catalog_listings where id in (4,5);
delete from the_schema.relator where id in (7,8);
Then your desired output will be as below:
child_name | sum
-----------+-----
subChild2  | 8
subChild3  | 8
subChild1  | 16
Is this the result you are looking for?
If you can't delete the rows, you can add a parent_name <> child_name condition to both joins to skip them:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category as category
FROM
the_schema.sales s
JOIN the_schema.relationships_with_child_category r
ON s.sold_name = r.listing_name and r.parent_name <> r.child_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
r.child_category
FROM cte
JOIN the_schema.relationships_with_child_category r
ON cte.child_name = r.parent_name and r.parent_name <> r.child_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name ;
You may be able to avoid infinite recursion simply by using UNION instead of UNION ALL.
The documentation describes the implementation:
1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
2. So long as the working table is not empty, repeat these steps:
   a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
   b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
"Getting rid of the duplicates" should cause the intermediate table to be empty at some point, which ends the iteration.
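The documented loop is easy to play through outside the database. Below is a small Python sketch of it, not Postgres code: `run_recursive`, `children_of`, and the `EDGES` list are names invented for this illustration, and the rows are reduced to a single name column. With `distinct=True` (modelling UNION) the self-referential row is discarded as a duplicate and the working table runs empty; with `distinct=False` (modelling UNION ALL) the same row is regenerated on every pass, so the sketch gives up after an arbitrary iteration cap.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, ...]

# Hypothetical edge list; ("child1", "child1") is a self-referential row.
EDGES = [("parent1", "child1"), ("child1", "subChild1"), ("child1", "child1")]


def children_of(working: Sequence[Row]) -> List[Row]:
    """Recursive term: join the working table to EDGES on the parent name."""
    return [(child,) for (name,) in working
            for parent, child in EDGES if parent == name]


def run_recursive(seed: Sequence[Row],
                  step: Callable[[Sequence[Row]], List[Row]],
                  distinct: bool,
                  max_iterations: int = 50) -> List[Row]:
    """distinct=True models UNION, distinct=False models UNION ALL."""
    seen = set()

    def keep(rows: Sequence[Row]) -> List[Row]:
        kept = []
        for row in rows:
            if distinct:
                if row in seen:
                    continue      # duplicate of an earlier result row
                seen.add(row)
            kept.append(row)
        return kept

    working = keep(seed)          # non-recursive term
    result = list(working)
    iterations = 0
    while working:                # ends once a pass produces nothing new
        if iterations == max_iterations:
            raise RuntimeError("working table still not empty after %d passes"
                               % max_iterations)
        iterations += 1
        working = keep(step(working))
        result.extend(working)
    return result


print(run_recursive([("parent1",)], children_of, distinct=True))
```

Calling it with `distinct=False` raises the `RuntimeError`, which is the sketch's stand-in for the query that never finishes. Note that this only shows why the iteration stops; whether dropping duplicate rows is acceptable for the quantity sums in the original query is a separate question.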

Postgres select distinct row per group without repeating

I have a query that generates matched data like this. For each parent I need to select a child without repeating the same combination, parent, or child. In the picture below, the black border shows the groups and the blue highlighted rows are the rows I want returned.
I also have the case where there are 6 parents and only 3 children. In this case I only want 3 rows max, the child and parent ids can't repeat. I just want the first matched children to parents.
So I did some research and found options that were close but didn't do the trick. I finally got exactly what I wanted.
WITH firstmatched AS (
SELECT parent,
child,
ROW_NUMBER() OVER(PARTITION BY child
ORDER BY parent DESC) AS rowkey1,
ROW_NUMBER() OVER(PARTITION BY parent
ORDER BY child DESC) AS rowkey2
FROM mytable)
SELECT *
FROM firstmatched where rowkey1 = rowkey2
Take the case where there are 6 parents and only 3 children. This query without the where clause (where rowkey1 = rowkey2) looks like this.
Then with the where clause added it reduces to this.
Hopefully this helps someone with a similar issue.
Rock&Roll-method: just make all assignments and skip the errors:
\i tmp.sql
-- The data [in non-graphical form]
CREATE TABLE tableau(
parent integer NOT NULL
, child integer NOT NULL
, PRIMARY KEY (parent,child)
) ;
INSERT INTO tableau(parent,child) VALUES
( 450759,450768) , ( 450759,450771) , ( 450759,450773)
, ( 450763,450768) , ( 450763,450771) , ( 450763,450773)
, ( 450765,450768) , ( 450765,450771) , ( 450765,450773)
;
-- Receptor table for the results:
CREATE TEMP TABLE assignment(
parent integer NOT NULL UNIQUE
, child integer NOT NULL UNIQUE
);
-- Just do it!
INSERT INTO assignment(parent,child)
SELECT parent,child
FROM tableau
ON CONFLICT DO NOTHING --<< MAGIC!
;
SELECT * FROM tableau;
SELECT * FROM assignment;
Results:
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 9
CREATE TABLE
INSERT 0 3
parent | child
--------+--------
450759 | 450768
450759 | 450771
450759 | 450773
450763 | 450768
450763 | 450771
450763 | 450773
450765 | 450768
450765 | 450771
450765 | 450773
(9 rows)
parent | child
--------+--------
450759 | 450768
450763 | 450771
450765 | 450773
(3 rows)
Note: this solution is greedy; it can fail to find an optimal solution on some types of tableau data.
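To see what "greedy" means here, the same idea can be written as a plain loop. This is a Python sketch rather than a translation of the SQL: `greedy_assign` is a made-up name, and it assumes the pairs are processed in the order given, which an INSERT ... SELECT without ORDER BY does not guarantee.

```python
from typing import Iterable, List, Tuple


def greedy_assign(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Take each pair unless its parent or child is already used."""
    used_parents = set()
    used_children = set()
    assignment = []
    for parent, child in pairs:
        if parent in used_parents or child in used_children:
            continue  # plays the role of ON CONFLICT DO NOTHING
        used_parents.add(parent)
        used_children.add(child)
        assignment.append((parent, child))
    return assignment


# The nine rows from the tableau above, in insertion order.
TABLEAU = [(p, c)
           for p in (450759, 450763, 450765)
           for c in (450768, 450771, 450773)]

print(greedy_assign(TABLEAU))

# A case where greedy falls short: only one pair is assigned,
# although (1, 20) and (2, 10) together would give two.
print(greedy_assign([(1, 10), (1, 20), (2, 10)]))
```

On the tableau it reproduces the three rows shown in the assignment table. The second call is the failure mode from the note: taking (1, 10) first blocks both remaining pairs.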

Given a parent / child key table, how can we recursively insert a copy of the structure into another table?

I have a recursive CTE which gives me a listing of a set of parent/child keys as follows. Let's say it's in a temp table called [#relationtree]:
Parent | Child
--------------
1 | 3
3 | 5
5 | 6
5 | 9
I want to create a copy of these relationships in a table with, let's say, the following structure:
CREATE TABLE [dbo].[Relations]
(
[Id] int identity(1,1),
[ParentId] int
)
How can I insert the above records but recursively obtain the previously inserted identity value to be able to insert that value as the ParentId column for each copy of a child I insert?
I would expect to have at the end of this in [dbo].[Relations] (given our current seed value is, say 50)
Id | ParentId
-------------
... other rows present before this query ...
50 | NULL
51 | 50
52 | 51
53 | 51
I'm not sure that scope_identity can work in this situation, or that creating a new temp table with a list of new IDs and inserting identity columns manually is the correct approach?
I could write a cursor / loop to do this, but there must be a nice way of doing some recursive select magic!
Since you're trying to put the tree into a segment of the table it looks like you're going to need to use SET IDENTITY_INSERT ON for the table anyway. You're going to need to make sure that there is room for the new tree. In this case, I'll assume that 49 is the current maximum id in your table so that we don't need to be concerned with overrunning a tree that's later in the table.
You'll need to be able to map the IDs from the old tree to the new tree. Unless there's some rule around the ids, the exact mapping should be irrelevant as long as it's accurate, so in that case, I'd just do something like this:
SET IDENTITY_INSERT dbo.Relations ON
;WITH CTE_MappedIDs AS
(
SELECT
old_id,
ROW_NUMBER() OVER(ORDER BY old_id) + 49 AS new_id
FROM
(
SELECT DISTINCT parent AS old_id FROM #relationtree
UNION
SELECT DISTINCT child AS old_id FROM #relationtree
) SQ
)
INSERT INTO dbo.Relations (Id, ParentId)
SELECT
CID.new_id,
PID.new_id
FROM
#relationtree RT
INNER JOIN CTE_MappedIDs PID ON PID.old_id = RT.parent
INNER JOIN CTE_MappedIDs CID ON CID.old_id = RT.child
-- We need to also add the root node
UNION ALL
SELECT
NID.new_id,
NULL
FROM
#relationtree RT2
INNER JOIN CTE_MappedIDs NID ON NID.old_id = RT2.parent
WHERE
RT2.parent NOT IN (SELECT DISTINCT child FROM #relationtree)
SET IDENTITY_INSERT dbo.Relations OFF
I haven't tested that, but if it doesn't work as expected then hopefully it will point you in the right direction.
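The old-to-new ID mapping is the core of this approach, so here is the same idea as a short Python sketch for checking the logic by hand. `copy_tree` and `first_new_id` are names invented for the illustration; the numbering mirrors ROW_NUMBER() ordered by old ID, starting at 50 as assumed above.

```python
from typing import List, Optional, Tuple


def copy_tree(edges: List[Tuple[int, int]],
              first_new_id: int) -> List[Tuple[int, Optional[int]]]:
    """Map every old node ID to a fresh ID and return (Id, ParentId) rows."""
    # Every ID that appears as a parent or a child, in ascending order.
    old_ids = sorted({node for edge in edges for node in edge})
    new_id = {old: first_new_id + i for i, old in enumerate(old_ids)}

    children = {child for _, child in edges}
    # Root nodes never appear as a child, so they get a NULL parent.
    rows = [(new_id[old], None) for old in old_ids if old not in children]
    rows += [(new_id[child], new_id[parent]) for parent, child in edges]
    return sorted(rows, key=lambda row: row[0])


# The #relationtree rows from the question, as (parent, child) pairs.
RELATION_TREE = [(1, 3), (3, 5), (5, 6), (5, 9)]

for row in copy_tree(RELATION_TREE, first_new_id=50):
    print(row)
```

The tree has five distinct nodes, so this yields five rows (IDs 50 to 54), with the root first and both children of old node 5 pointing at the same new parent.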
I know you already have a working answer, but I think you can accomplish the same thing a little more simply (not that there is anything at all wrong with Tom H's answer) using the LAG function to inspect the previous row, assuming you have SQL Server 2012 or later.
Setup:
CREATE TABLE #relationtree (
Parent INT,
Child INT
)
CREATE TABLE #relations (
Id INT IDENTITY(1,1),
ParentId INT
)
INSERT INTO #relationtree (Parent, Child) VALUES(1,3), (3,5), (5,6), (5,9)
INSERT INTO #relations (ParentId) values(1), (3), (5)
Solution:
DECLARE @offset INT = IDENT_CURRENT('#relations')
;WITH relationtreeids AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY Parent, Child) - 2 AS UnmodifiedParentId -- Simulate an identity field
FROM #relationtree
)
INSERT INTO #relations
-- The LAG window function allows you to inspect the previous row
SELECT CASE WHEN LAG(Parent) OVER(ORDER BY Parent) IS NULL
THEN NULL
WHEN LAG(Parent) OVER(ORDER BY Parent) = Parent
THEN UnmodifiedParentId + @offset ELSE UnmodifiedParentId + @offset + 1
END AS ParentId
FROM relationtreeids
Output:
Id ParentId
1 1
2 3
3 5
4 NULL
5 4
6 5
7 5

Recursive SQL select query in Oracle 11

I have 4 different tables, bommodule, bomitem and mapbomitemmodule, mapbomparentsubmodule.
bommodule-table:
CREATE TABLE "BOMMODULE"
(
"MODULEID" NUMBER(10,0) NOT NULL ENABLE,
"ISROOTMODULE" NUMBER(1,0) NOT NULL ENABLE,
"MODULENAME" VARCHAR2(255 CHAR),
.....
)
bomitem-table:
CREATE TABLE "BOMITEM"
(
"ITEMID" NUMBER(10,0) NOT NULL ENABLE,
....
)
mapbomitemmodule-table: (This table maps the items to one or more modules).
CREATE TABLE "SSIS2"."MAPBOMITEMMODULE"
(
"ITEMID" NUMBER(10,0) NOT NULL ENABLE,
"MODULEID" NUMBER(10,0) NOT NULL ENABLE,
CONSTRAINT "MAP_ITEM_MODULE_FK" FOREIGN KEY ("ITEMID") REFERENCES "BOMITEM" ("ITEMID") ENABLE,
CONSTRAINT "MAP_MODULE_ITEM_FK" FOREIGN KEY ("MODULEID") REFERENCES "BOMMODULE" ("MODULEID") ENABLE
)
mapbomparentsubmodule-table: (This table maps the module to submodules)
CREATE TABLE "MAPBOMPARENTSUBMODULE"
(
"PARENTMODULEID" NUMBER(10,0) NOT NULL ENABLE,
"SUBMODULEID" NUMBER(10,0) NOT NULL ENABLE,
CONSTRAINT "PARENTMODULE_SUBMODULE_FK" FOREIGN KEY ("SUBMODULEID") REFERENCES "BOMMODULE" ("MODULEID") ENABLE,
CONSTRAINT "SUBMODULE_PARENTMODULE_FK" FOREIGN KEY ("PARENTMODULEID") REFERENCES "BOMMODULE" ("MODULEID") ENABLE
)
So imagine a structure something like this.
root module
submodule 1
submodule 2
submodule 3
submodule 4
submodule 5
item 5
item 6
item 7
item 2
item 3
item 4
item 1
I need to find out all items that belong to a specific moduleId. All items on all submodule levels should be listed.
How can I do this? I am using Oracle 11 as database.
Thanks a lot for your help, much appreciate it!
Thanks for the interesting question
1 step. Prepare a table for the canonical Oracle hierarchy.
select PARENTMODULEID, SUBMODULEID, 1 HTYPE
from MAPBOMPARENTSUBMODULE
union all
select null, MODULEID, 1 --link root to null parent
from BOMMODULE B
where ISROOTMODULE = 1
union all
select MODULEID, ITEMID, 2
from MAPBOMITEMMODULE
2 step. Expand the hierarchy by using connect by
select PARENTMODULEID
,SUBMODULEID
,HTYPE
,level
,lpad('|', level*3, '|')
from
(
select PARENTMODULEID, SUBMODULEID, 1 HTYPE
from MAPBOMPARENTSUBMODULE
union all
select null, MODULEID, 1
from BOMMODULE B
where ISROOTMODULE = 1
union all
select MODULEID, ITEMID, 2
from MAPBOMITEMMODULE
) ALL_HIER
connect by prior SUBMODULEID = PARENTMODULEID
and prior HTYPE = 1 --parent is always module
start with ALL_HIER.PARENTMODULEID is null
order siblings by HTYPE
3 step. Last. Join with value tables.
select PARENTMODULEID
,SUBMODULEID
,HTYPE
,ALL_VAL.VAL
,level
,rpad('|', level * 3, ' ') || ALL_VAL.VAL
from
(
select PARENTMODULEID, SUBMODULEID, 1 HTYPE
from MAPBOMPARENTSUBMODULE
union all
select null, MODULEID, 1
from BOMMODULE B
where ISROOTMODULE = 1
union all
select MODULEID, ITEMID, 2
from MAPBOMITEMMODULE
) ALL_HIER
,(
select MODULEID VAL_ID, MODULENAME VAL, 1 VTYPE
from BOMMODULE
union all
select ITEMID, 'item '||ITEMID, 2
from BOMITEM
) ALL_VAL
where ALL_VAL.VAL_ID = ALL_HIER.SUBMODULEID
and ALL_VAL.VTYPE = ALL_HIER.HTYPE
connect by prior SUBMODULEID = PARENTMODULEID
and prior HTYPE = 1
start with ALL_HIER.PARENTMODULEID is null
order siblings by HTYPE
Start by investigating the use of a hierarchical query, using the CONNECT BY clause. It is designed for exactly this type of model.
The principle for multiple tables is the same as that for a single table:
i. You identify "starting rows".
ii. You define the logic by which you identify the next level of rows based on the "current" set.
In some cases it can help to define the hierarchical query in an inline view, particularly if a single table holds all the data required for identifying both the starting rows and the connection between levels.
These queries are always tricky, and as with many SQL issues it often helps to start off simple and build up the complexity.
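If it helps to see the two-part principle (starting rows, then a rule for the next level) without any Oracle syntax, here is a minimal Python sketch of the same traversal. The function `items_under` and the sample IDs are hypothetical; the two lists only mimic the shape of MAPBOMPARENTSUBMODULE and MAPBOMITEMMODULE.

```python
from typing import List, Tuple


def items_under(module_id: int,
                submodules: List[Tuple[int, int]],
                module_items: List[Tuple[int, int]]) -> List[int]:
    """Collect the items of a module and of all its submodules, at any depth.

    submodules   -- (parent_module_id, submodule_id) pairs
    module_items -- (item_id, module_id) pairs
    """
    found = set()
    visited = set()
    stack = [module_id]                       # i. the starting row
    while stack:
        current = stack.pop()
        if current in visited:
            continue                          # protects against cyclic data
        visited.add(current)
        found.update(item for item, module in module_items
                     if module == current)
        # ii. the next level: submodules of the current module
        stack.extend(sub for parent, sub in submodules if parent == current)
    return sorted(found)


# Made-up sample data: module 1 is the root, item 103 belongs to two modules.
SUBMODULES = [(1, 2), (1, 3), (3, 4)]
MODULE_ITEMS = [(101, 1), (102, 2), (103, 4), (104, 4), (103, 2)]

print(items_under(1, SUBMODULES, MODULE_ITEMS))
print(items_under(3, SUBMODULES, MODULE_ITEMS))
```

Using a set for the result means an item mapped to several modules is listed once, which is a choice of this sketch; the CONNECT BY queries above would return one row per path.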

Get the last children from database

My situation:
Table A
(
ID
Parent_Id
TimeStamp
)
The root has a null Parent_Id, and each child has the Id of its father in Parent_Id.
I simply want to get all the LAST children in Table A.
I don't want the fathers or the other children (only the last one).
Is it possible to build a SQL to get this?
PS: I'm on sql anywhere 11. Maybe an ansi sql can solve this, i'm not sure.
EDIT: (edited to give additional details)
I don't want the last children from an element.
Example:
Id | Parent
---+-------
1  | NULL
2  | 1
3  | 1 (the last child)
4  | NULL
5  | 4 (the last child)
I want to get:
Id 3
Id 5
Using stored function
create function LastChild(in parent integer)
returns integer
begin
declare res integer;
select top 1 id into res from TableA where parent_id = parent order by timeCol desc;
return res;
end
Select:
select Id, lastchild(id) from TAbleA where parent_id is null
I'll work on another solution without stored function.
EDIT: without stored function:
select Id, (select top 1 id from TableA childs where parent_id = TableA.id order by timeCol desc) from TableA where parent_id is null
If by "last children" you mean items that themselves have no children (and often referred to as leaf-level items), something like this should do:
SELECT ID
from A
where ID not in (select Parent_Id from A where Parent_Id is not null)
The correlated subquery version is a bit trickier to understand, but would work faster on large tables:
SELECT ID
from A OuterReference
where not exists (select 1 from A where Parent_Id = OuterReference.ID)
("OuterReference" is an alias for table A)
I use SQL Server, but this is pretty basic syntax and should work for you with minimal modification.
select * from a where id not in (select parent_id from a where parent_id is not null)
In other words, select everything from table a where the ID of the item is not the parent ID of any other item. This will give you all the leaf nodes of the graph.
EDIT:
Your edit is a bit confusing, and IDs aren't typically used as ordering mechanisms, but regardless, the example you give can be accomplished by this query:
SELECT MAX( id )
FROM a
WHERE id NOT IN
(SELECT parent_id
FROM a
WHERE parent_id IS NOT NULL
)
GROUP BY parent_id
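One thing worth checking on your own data is how NULL parent IDs interact with NOT IN: if the subquery returns even one NULL, the NOT IN test is never true and no rows come back. The Python script below is a rough check of that, using the standard-library sqlite3 module as a stand-in (an assumption on my part; SQL Anywhere is not involved, so verify the behavior there). The table `a` holds the five rows from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO a (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, None), (5, 4)],
)


def ids(sql: str) -> list:
    """Run a single-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# NOT IN against a list that contains NULL: no row qualifies.
naive = ids("SELECT id FROM a WHERE id NOT IN (SELECT parent_id FROM a)")

# Filtering the NULLs out gives the leaf rows.
leaves = ids(
    "SELECT id FROM a WHERE id NOT IN "
    "(SELECT parent_id FROM a WHERE parent_id IS NOT NULL)"
)

# NOT EXISTS is not affected by the NULLs.
not_exists = ids(
    "SELECT id FROM a AS o WHERE NOT EXISTS "
    "(SELECT 1 FROM a WHERE parent_id = o.id)"
)

# Highest leaf id per parent, as in the query above.
last_children = ids(
    "SELECT MAX(id) FROM a WHERE id NOT IN "
    "(SELECT parent_id FROM a WHERE parent_id IS NOT NULL) "
    "GROUP BY parent_id"
)

print(naive, leaves, not_exists, last_children)
```

The last query returns 3 and 5 for the example data, which is the result asked for in the question.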
I had to update the query a little to get only child categories, for Postgres 9.4
select count(id) from A as outer_ref where not exists(
select 1 from A where parent_id=outer_ref.id) and parent_id is not null;