Postgres: hierarchical one-to-many jsonb aggregation

I have a one-to-many relation between parent and child tables as follows:
Child table:
+----------+-----------+------------+--------------+
| table_id | parent_id | page_index | other_column |
+----------+-----------+------------+--------------+
| t1 | p1 | 1 | foo |
| t1 | p1 | 2 | bar |
| t2 | p2 | 1 | baz |
+----------+-----------+------------+--------------+
I want to get the final result as follows, i.e. group by parent_id and group by page_index:
+-----------+--------------------------------------------+
| parent_id | pages |
+-----------+--------------------------------------------+
| p1 | [{other_column: foo}, {other_column: bar}] |
| p2 | [{other_column: baz}] |
+-----------+--------------------------------------------+
I tried this query:
SELECT parent_table.parent_id, jsonb_agg(child_table.*) as pages
FROM parent_table
JOIN child_table ON child_table.parent_id = parent_table.parent_id
group by parent_table.parent_id, child_table.page_index
But I got the result containing three rows like:
+-----------+-----------------------+
| parent_id | pages |
+-----------+-----------------------+
| p1 | [{other_column: foo}] |
| p1 | [{other_column: bar}] |
| p2 | [{other_column: baz}] |
+-----------+-----------------------+
So I did another aggregation on top of that using a subquery and grouping by parent_id again as follows:
select sub_q.parent_id, jsonb_agg(sub_q.pages) as pages
from (
SELECT parent_table.parent_id, jsonb_agg(child_table.*) as pages
FROM parent_table
JOIN child_table ON child_table.parent_id = parent_table.parent_id
group by parent_table.parent_id, child_table.page_index
) as sub_q
group by sub_q.parent_id
but I ended up with
+-----------+------------------------------------------------+
| parent_id | pages |
+-----------+------------------------------------------------+
| p1 | [[{other_column: foo}], [{other_column: bar}]] |
| p2 | [{other_column: baz}] |
+-----------+------------------------------------------------+
How do I get the desired result above, with each row holding a one-dimensional array, using the most efficient query?
Would be great if the answer has a db fiddle!

You seem to be overcomplicating this. As far as shown in your sample data, you can get the information you want directly from the child table with simple aggregation:
select
parent_id,
jsonb_agg(jsonb_build_object('other_column', other_column) order by page_index) pages
from child_table
group by parent_id
Demo on DB Fiddle:
parent_id | pages
:-------- | :-------------------------------------------------
p1 | [{"other_column": "foo"}, {"other_column": "bar"}]
p2 | [{"other_column": "baz"}]
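To make the semantics of that aggregate concrete, here is a rough plain-Python emulation of what grouping by parent_id and aggregating in page_index order computes. It is only a sketch: the helper name aggregate_pages and the tuple layout are assumptions for illustration, and it is not how PostgreSQL implements jsonb_agg.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (table_id, parent_id, page_index, other_column).
# They are deliberately given out of page order to show that the sort matters.
child_rows = [
    ("t1", "p1", 2, "bar"),
    ("t1", "p1", 1, "foo"),
    ("t2", "p2", 1, "baz"),
]


def aggregate_pages(rows):
    """Hypothetical helper: one flat list of objects per parent_id."""
    # Sorting by (parent_id, page_index) makes each parent's rows contiguous
    # for groupby and fixes the order of the objects inside each list.
    ordered = sorted(rows, key=itemgetter(1, 2))
    return {
        parent_id: [{"other_column": row[3]} for row in group]
        for parent_id, group in groupby(ordered, key=itemgetter(1))
    }


pages = aggregate_pages(child_rows)
print(pages)
```

Grouping only once, by parent_id, is what keeps each array one-dimensional; grouping by page_index first and aggregating again is what produced the nested arrays in the question.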

Related

Get all IDs that do not associate with a specific parent ID

There is a specific child/parent table structure in my DB:
CHILD_TABLE:
| child_table |
|-------------|
| id |
| node_id |
A PARENT_TABLE:
| parent_table |
|--------------|
| id |
| node_id |
and an ASSOCIATION_TABLE:
| association_table |
|-------------------|
| parent_node |
| child_node |
(ManyToOne on both parent and child tables)
Let's say we load them with test data as:
-- child table
| id | node_id |
|----|---------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
-- parent table
| id | node_id |
|----|---------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
-- association table
| parent_id | child_id |
|-----------|----------|
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
Given a list of parent IDs and a single parent ID, I want to find all child IDs that are associated with those IDs but not the single one.
In the example data above,
List of parent IDs : (1, 2)
Single parent ID: 4
The result should be child.id = 2 because that entry has no connection with parent.id = 4 but there is at least one connection with the given "parent IDs".
EDIT
I managed to get something working by subtracting one result from the other:
SELECT child.id
FROM child_table child
WHERE child.node_id
IN (
SELECT assoc.child_node
FROM association_table assoc
WHERE assoc.parent_node
IN (
SELECT parent.node_id
FROM parent_table parent
WHERE parent.id IN (1, 2)
)
)
MINUS
SELECT child2.id
FROM child_table child2
WHERE child2.node_id
IN (
SELECT assoc2.child_node
FROM association_table assoc2
WHERE assoc2.parent_node
IN (
SELECT parent2.node_id
FROM parent_table parent2
WHERE parent2.id = 4
)
);
Is there an alternative/simpler way of doing the same thing?
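You only need the association table. Select every child linked to a parent in the given list, then use NOT EXISTS to discard any child that is also linked to the single parent ID. This uses the IDs as they appear in the sample association data; if your real table stores node IDs (parent_node, child_node), join back to the parent and child tables to translate them. (see demo)
select a1.child_id
from association_table a1
where a1.parent_id in (1,2)
and not exists ( select null
from association_table a2
where a1.child_id = a2.child_id
and a2.parent_id = 4
);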
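The NOT EXISTS approach can be checked against the sample data with a small script. This is a sketch that uses Python's built-in sqlite3 module as a stand-in for the real database, and it assumes the association table is called association_table with columns parent_id and child_id, as in the sample data. DISTINCT is added here because a child linked to several of the listed parents would otherwise be returned more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE association_table (parent_id INTEGER, child_id INTEGER)")
conn.executemany(
    "INSERT INTO association_table VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 3), (4, 1)],  # sample data from the question
)

query = """
SELECT DISTINCT a1.child_id
FROM association_table a1
WHERE a1.parent_id IN (1, 2)            -- children of the listed parents
  AND NOT EXISTS (                      -- ...that are not also children of parent 4
      SELECT 1
      FROM association_table a2
      WHERE a2.child_id = a1.child_id
        AND a2.parent_id = 4
  )
ORDER BY a1.child_id
"""
child_ids = [row[0] for row in conn.execute(query)]
print(child_ids)  # [2]
```

Child 1 is linked to parents 1 and 2 but also to parent 4, so it is filtered out; child 2 is linked only to parent 2 and is kept.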

Identifying hierarchical groupings from a Parent-Child association list in SQL

I am trying to identify groupings of accounts from a Parent-Child association table in SQL. Rather than a big hierarchy tree, I am dealing with many small trees and I need to identify each Tree as a unique Group in order to label related accounts.
I have two tables. One lists all unique IDs:
+------+-------+
| ID | Group |
+------+-------+
| A | NULL |
| B | NULL |
| C | NULL |
| etc. | NULL |
+------+-------+
And a table showing the parent-child associations between them:
+--------+-------+
| Parent | Child |
+--------+-------+
| A | D |
| A | E |
| B | F |
| B | G |
| B | C |
| C | H |
+--------+-------+
I need to fill the Group field of my first table so that I can identify all accounts that have a direct or indirect relationship, e.g.:
+----+-------+
| ID | Group |
+----+-------+
| A | 1 |
| B | 2 |
| C | 2 |
| D | 1 |
| E | 1 |
| F | 2 |
| G | 2 |
| H | 2 |
+----+-------+
Where I'm struggling is that a parent can itself be a child of another parent, e.g.:
Parent B -> Parent C -> Child H
These form a group, but there is no direct link between B and H, and I am struggling to find a reliable way to identify all associated IDs.
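This type of logic requires a recursive CTE. The idea is to start at the root parents (nodes that never appear as a child), number them, and work your way down the hierarchy so that every descendant inherits its root's group number. In the query, nodes(node, grp) stands for the ID table and pc(parent, child) for the association table. In PostgreSQL the statement must begin with "with recursive".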
with cte as (
select row_number() over (order by node) as grp,
n.node as ultimate_parent, n.node as node, 1 as lev
from nodes n
where not exists (select 1 from pc where pc.child = n.node)
union all
select cte.grp, cte.ultimate_parent, pc.child, lev + 1
from cte join
pc
on cte.node = pc.parent
)
update nodes
set grp = cte.grp
from cte
where cte.node = nodes.node;
Here is a db<>fiddle.
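The grouping logic can be tried end to end on the sample data. The sketch below uses Python's built-in sqlite3 module rather than the asker's database, so a few details are assumptions: the table names nodes and pc follow the answer, the root numbering is moved into a separate roots CTE, the result is written back with a plain parameterised UPDATE instead of UPDATE ... FROM, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (node TEXT PRIMARY KEY, grp INTEGER);
CREATE TABLE pc (parent TEXT, child TEXT);
INSERT INTO nodes (node) VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H');
INSERT INTO pc VALUES ('A','D'),('A','E'),('B','F'),('B','G'),('B','C'),('C','H');
""")

query = """
WITH RECURSIVE
roots AS (
    -- Roots are nodes that never appear as a child; number them 1, 2, ...
    SELECT n.node, ROW_NUMBER() OVER (ORDER BY n.node) AS grp
    FROM nodes n
    WHERE NOT EXISTS (SELECT 1 FROM pc WHERE pc.child = n.node)
),
cte(grp, node) AS (
    SELECT grp, node FROM roots
    UNION ALL
    -- Each child inherits the group number of the row that reached it.
    SELECT cte.grp, pc.child
    FROM cte
    JOIN pc ON pc.parent = cte.node
)
SELECT node, grp FROM cte ORDER BY node
"""
groups = dict(conn.execute(query).fetchall())

# Write the computed group numbers back to the nodes table.
conn.executemany(
    "UPDATE nodes SET grp = ? WHERE node = ?",
    [(grp, node) for node, grp in groups.items()],
)
stored = conn.execute("SELECT node, grp FROM nodes ORDER BY node").fetchall()
print(stored)
```

With the sample data, A and B are the only roots, so A, D and E end up in group 1 and B, C, F, G and H in group 2, including H, which is reached only through C.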

PostgreSQL can't make Self Join

I have a table:
| acctg_cath_id | parent | description |
| 1 | 20 | Bills |
| 9 | 20 | Invoices |
| 20 | | Expenses |
| 88 | 30 | |
| 89 | 30 | |
| 30 | | |
I want to create a self-join in order to group my items under their parent.
I have tried this, but it doesn't work:
SELECT
accounting.categories.acctg_cath_id,
accounting.categories.parent
FROM accounting.categories a1, accounting.categories a2
WHERE a1.acctg_cath_id=a2.parent
I get error: invalid reference to FROM-clause entry for table "categories"
When I try:
a.accounting.categories.acctg_cath_id
b.accounting.categories.acctg_cath_id
I get error: cross-database references are not implemented: a.accounting.categories.acctg_cath_id
Desired output:
Expenses (Parent 20)
Bills (Child 1)
Invoices (Child 9)
What am I doing wrong here?
It seems you merely want to sort the rows:
select *
from accounting.categories
order by coalesce(parent, acctg_cath_id), parent nulls first, acctg_cath_id;
Result:
+---------------+--------+-------------+
| acctg_cath_id | parent | description |
+---------------+--------+-------------+
| 20 | | Expenses |
| 1 | 20 | Bills |
| 9 | 20 | Invoices |
| 30 | | |
| 88 | 30 | |
| 89 | 30 | |
+---------------+--------+-------------+
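Your FROM clause uses the old comma-join syntax, and because both copies of the table are aliased, every column must be referenced through its alias (a1 or a2) rather than through accounting.categories: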
FROM accounting.categories a1, accounting.categories a2
Try the following:
SELECT
a2.acctg_cath_id,
a2.parent
FROM accounting.categories a1
JOIN accounting.categories a2 ON (a1.acctg_cath_id = a2.parent)
;
Examine the DBFiddle.
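As a quick check of the aliasing rule, the following sketch runs an aliased self-join on the sample rows using Python's built-in sqlite3 module. The accounting schema is dropped because SQLite has no schemas, and the aliases p (parent row) and c (child row) are names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "acctg_cath_id INTEGER PRIMARY KEY, parent INTEGER, description TEXT)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [
        (1, 20, "Bills"), (9, 20, "Invoices"), (20, None, "Expenses"),
        (88, 30, None), (89, 30, None), (30, None, None),
    ],
)

query = """
SELECT p.acctg_cath_id, p.description, c.acctg_cath_id, c.description
FROM categories p                              -- p: the parent row
JOIN categories c ON c.parent = p.acctg_cath_id  -- c: each of its children
ORDER BY p.acctg_cath_id, c.acctg_cath_id
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```

Every column is qualified with p or c; referring to categories.acctg_cath_id inside this query would be ambiguous, which is the kind of reference the error message in the question complains about.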
You don't need grouping, only a self-join:
select
c.acctg_cath_id parentid, c.description parent,
cc.acctg_cath_id childid, cc.description child
from (
select distinct parent
from categories
) p inner join categories c
on p.parent = c.acctg_cath_id
inner join categories cc on cc.parent = p.parent
where p.parent = 20
You can remove the WHERE clause if you want all the parents with all their children.
See the demo.
Results:
parentid | parent | childid | child
-------: | :------- | ------: | :-------
20 | Expenses | 1 | Bills
20 | Expenses | 9 | Invoices
You don't need a self-join. You don't need aggregation. You just need an order by clause:
SELECT ac.*
FROM accounting.categories ac
ORDER BY COALESCE(ac.parent, ac.acctg_cath_id),
(CASE WHEN ac.parent IS NULL THEN 1 ELSE 2 END),
ac.acctg_cath_id;
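The effect of that three-part sort key can be verified on the sample rows. This sketch again uses Python's built-in sqlite3 module and an unqualified categories table, both of which are assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "acctg_cath_id INTEGER PRIMARY KEY, parent INTEGER, description TEXT)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [
        (1, 20, "Bills"), (9, 20, "Invoices"), (20, None, "Expenses"),
        (88, 30, None), (89, 30, None), (30, None, None),
    ],
)

query = """
SELECT ac.acctg_cath_id, ac.parent, ac.description
FROM categories ac
ORDER BY COALESCE(ac.parent, ac.acctg_cath_id),            -- 1. family: the parent's id
         CASE WHEN ac.parent IS NULL THEN 1 ELSE 2 END,    -- 2. parent row before children
         ac.acctg_cath_id                                  -- 3. children by their own id
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)  # [20, 1, 9, 30, 88, 89]
```

COALESCE gives a parent and its children the same first sort value, and the CASE expression breaks the tie so the parent comes first without relying on how the database sorts NULLs.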

SQL to find linking column across tables without foreign keys

I am trying to find table links using duplicate column names. Say I have the following tables:
T1:
| Prod_ID | Cust_Id | Value |
| P1 | C1 | 1 |
| P2 | C2 | 2 |
| P3 | C3 | 3 |
| P4 | C4 | 4 |
| P5 | C5 | 5 |
T2:
| Prod_ID | Prod_Num |
| P1 | PN1 |
| P2 | PN2 |
| P3 | PN3 |
| P4 | PN4 |
| P5 | PN5 |
I rely on system tables to fetch table information. The data looks like
| tabname | colname |
| T1 | Prod_ID |
| T1 | Cust_Id |
| T1 | Value |
| T2 | Prod_ID |
| T2 | Prod_Num |
| T3 | .... |
If I want to find all tables with columns Prod_ID and Cust_Id, I could do so using:
SELECT tabname, count(*)
FROM syscat.columns
WHERE colname IN ('Prod_ID', 'Cust_Id')
GROUP BY tabname
HAVING count(*) > 1
Now, when I want to find how two columns across tables are linked, the query gets complex.
For example: To find how Cust_Id and Prod_Num are linked, the expected output would be something like
| tabname | colname |
| T1 | Cust_id |
| T1 | Prod_id |
| T2 | Prod_id |
| T2 | Prod_Num |
Suggesting that Prod_Id is contained in both tables and can be used to map Cust_Id and Prod_num. Is there a script for getting something like above?
I would use self-joins for that.
SELECT c1.tabname, c2.colname joinCol, c3.tabname
FROM syscat.columns c1
JOIN syscat.columns c2 ON c1.tabname = c2.tabname
JOIN syscat.columns c3 ON c3.tabname != c2.tabname and c3.colname = c2.colname
JOIN syscat.columns c4 ON c4.tabname = c3.tabname
WHERE c1.colname = 'Cust_Id' and c4.colname = 'Prod_Num'
The output is the following:
tabname joinCol tabname
---------------------------
T1 Prod_id T2
This means that table T1 can be joined to T2 on Prod_ID (Cust_Id and Prod_Num are the inputs, so there is no need to repeat them in the output).
demo - it is SQL Server, however, JOIN will work in DB2 as well ;)
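The chain of self-joins can be exercised without a DB2 instance. In the sketch below, a table named syscat_columns is a hypothetical stand-in for syscat.columns, populated with only the five rows shown in the question, and Python's built-in sqlite3 module runs the query; the output aliases from_table, join_col and to_table are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE syscat_columns (tabname TEXT, colname TEXT)")
conn.executemany(
    "INSERT INTO syscat_columns VALUES (?, ?)",
    [
        ("T1", "Prod_ID"), ("T1", "Cust_Id"), ("T1", "Value"),
        ("T2", "Prod_ID"), ("T2", "Prod_Num"),
    ],
)

query = """
SELECT c1.tabname AS from_table, c2.colname AS join_col, c3.tabname AS to_table
FROM syscat_columns c1                                   -- table holding the first column
JOIN syscat_columns c2 ON c2.tabname = c1.tabname        -- every column of that table
JOIN syscat_columns c3 ON c3.tabname <> c2.tabname       -- same column name elsewhere
                      AND c3.colname = c2.colname
JOIN syscat_columns c4 ON c4.tabname = c3.tabname        -- that table holds the second column
WHERE c1.colname = ? AND c4.colname = ?
"""
links = conn.execute(query, ("Cust_Id", "Prod_Num")).fetchall()
print(links)  # [('T1', 'Prod_ID', 'T2')]
```

This only finds links that go through a single shared column name between two tables; a link that needs an intermediate third table would require more joins or a recursive query.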

PostgreSQL select all from one table and join count from table relation

I have two tables, post_categories and posts. I'm trying to select * from post_categories;, but also return a temporary column with the count for each time a post category is used on a post.
Posts
| id | name | post_category_id |
| 1 | test | 1 |
| 2 | nest | 1 |
| 3 | vest | 2 |
| 4 | zest | 3 |
Post Categories
| id | name |
| 1 | cat_1 |
| 2 | cat_2 |
| 3 | cat_3 |
Basically, I'm trying to do this without subqueries and with joins instead. Something like this, but in real psql.
select * from post_categories some-type-of-join posts, count(*)
Resulting in this, ideally.
| id | name | count |
| 1 | cat_1 | 2 |
| 2 | cat_2 | 1 |
| 3 | cat_3 | 1 |
Your help is greatly appreciated :D
You can use a derived table that contains the counts per post_category_id and left join it to the post_categories table:
select p.*, coalesce(t1.p_count, 0) as count
from post_categories p
left join (
select post_category_id, count(*) p_count
from posts
group by post_category_id
) t1 on t1.post_category_id = p.id
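Alternatively, join posts directly and group by the category columns. Note that an inner join omits categories that have no posts:
select post_categories.id, post_categories.name, count(posts.id)
from post_categories
inner join posts
on posts.post_category_id = post_categories.id
group by post_categories.id, post_categories.name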
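The difference between a left join and an inner join here shows up only when a category has no posts. The sketch below uses Python's built-in sqlite3 module with the sample data plus one extra category, cat_4, which is a hypothetical row added purely to expose that difference; the aliases pc, p and post_count are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, post_category_id INTEGER);
-- cat_4 is hypothetical: a category that no post uses.
INSERT INTO post_categories VALUES (1,'cat_1'),(2,'cat_2'),(3,'cat_3'),(4,'cat_4');
INSERT INTO posts VALUES (1,'test',1),(2,'nest',1),(3,'vest',2),(4,'zest',3);
""")

left_join_sql = """
SELECT pc.id, pc.name, COUNT(p.id) AS post_count   -- COUNT(p.id) ignores NULLs
FROM post_categories pc
LEFT JOIN posts p ON p.post_category_id = pc.id
GROUP BY pc.id, pc.name
ORDER BY pc.id
"""
# Same query with an inner join, for comparison.
inner_join_sql = left_join_sql.replace("LEFT JOIN", "JOIN")

with_left = conn.execute(left_join_sql).fetchall()
with_inner = conn.execute(inner_join_sql).fetchall()
print(with_left)   # includes (4, 'cat_4', 0)
print(with_inner)  # cat_4 is missing
```

Counting p.id rather than using count(*) matters with the left join: an unmatched category still produces one row, and count(*) would report 1 for it instead of 0.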