PostgreSQL WITH RECURSIVE grabs whole table

I have the following PostgreSQL table structure:
\d brand_categories;
Table "public.brand_categories"
Column | Type | Modifiers
----------------------+---------+---------------------------------------------------------------
id | integer | not null default nextval('brand_categories_id_seq'::regclass)
category_code | text | not null
correlation_id | uuid | not null default uuid_generate_v4()
created_by_id | integer | not null
updated_by_id | integer | not null
parent_category_code | text |
I am trying to get all the parents and children of a category via WITH RECURSIVE, without picking up the category's siblings. I tried the following (inside Ruby code):
WITH RECURSIVE included_categories(category_code) AS (
SELECT category_code FROM brand_categories
WHERE category_code = 'beer'
UNION ALL
SELECT children.category_code FROM brand_categories AS parents, brand_categories AS children
WHERE parents.category_code = children.parent_category_code AND parents.category_code != 'alcohol'
UNION SELECT parents.category_code FROM brand_categories AS children, brand_categories AS parents
WHERE parents.category_code = children.parent_category_code
)
SELECT * from included_categories
The problem is that it takes the whole set of categories even though most are completely unrelated. Is there something wrong in this query?
Note that this is a simple categorization with a depth of 2 or 3.

My boss helped me solve the problem. It made more sense to do it in two parts:
Find all parents
Find all children
Here is the SQL:
WITH RECURSIVE children_of(category_code) AS (
SELECT category_code FROM brand_categories WHERE parent_category_code = 'alcohol'
UNION ALL
SELECT brand_categories.category_code FROM brand_categories
JOIN children_of ON brand_categories.parent_category_code = children_of.category_code
),
parents_of(parent_category_code) AS (
SELECT parent_category_code FROM brand_categories WHERE category_code = 'alcohol'
UNION
SELECT brand_categories.parent_category_code FROM parents_of
JOIN brand_categories ON brand_categories.category_code = parents_of.parent_category_code
)
SELECT category_code FROM (SELECT * FROM children_of UNION SELECT parent_category_code FROM parents_of) t0(category_code)
WHERE category_code IS NOT NULL
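The recursive part of the query in the question never references included_categories, so each branch joins brand_categories to itself and returns every parent/child pair in the table. The two-CTE approach above can be tried out with the sketch below. It is only an illustration: it uses Python's sqlite3 module as a stand-in for PostgreSQL, the sample rows (drinks, alcohol, beer, ...) are made up, and make_demo_db and related_categories are hypothetical helper names.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with made-up categories (assumption, not the asker's data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE brand_categories ("
        "category_code TEXT NOT NULL, parent_category_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO brand_categories VALUES (?, ?)",
        [
            ("drinks", None),
            ("alcohol", "drinks"),
            ("soda", "drinks"),
            ("beer", "alcohol"),
            ("wine", "alcohol"),
            ("lager", "beer"),
            ("cola", "soda"),
        ],
    )
    return conn


def related_categories(conn, code):
    """Return all ancestors and descendants of `code`, but not its siblings."""
    sql = """
    WITH RECURSIVE
    children_of(category_code) AS (
        SELECT category_code FROM brand_categories
        WHERE parent_category_code = :code
        UNION
        -- the recursive term must reference children_of itself
        SELECT bc.category_code
        FROM brand_categories AS bc
        JOIN children_of AS c ON bc.parent_category_code = c.category_code
    ),
    parents_of(category_code) AS (
        SELECT parent_category_code FROM brand_categories
        WHERE category_code = :code
        UNION
        SELECT bc.parent_category_code
        FROM brand_categories AS bc
        JOIN parents_of AS p ON bc.category_code = p.category_code
    )
    SELECT category_code FROM children_of
    UNION
    SELECT category_code FROM parents_of WHERE category_code IS NOT NULL
    ORDER BY category_code
    """
    return [row[0] for row in conn.execute(sql, {"code": code})]


if __name__ == "__main__":
    print(related_categories(make_demo_db(), "beer"))
```

With the made-up rows, asking for 'beer' should give alcohol, drinks and lager, while the sibling wine and the unrelated soda branch stay out.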

Related

How can I write a SQL query to calculate the quantity of components sold with their parent assemblies? (Postgres 11/recursive CTE?)

My goal
To calculate the sum of components sold as part of their parent assemblies.
I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.
Background
I'm running Postgres 11 on CentOS 7.
I have some tables as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way.
Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE (Common Table Expression), as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use it for my case.
Aping the example from E. Brandstetter's answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!
Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
My entire query looks like this (which only differs in the details I mentioned above from yours):
demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category as category
FROM
sales s
JOIN relator r
ON s.sold_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category
FROM cte
JOIN relator r ON cte.child_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
Note: I didn't use your view, because I found it handier to fetch the data directly from the tables instead of joining data I already have. But that's just the way I personally like it :)
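One thing to watch: the query above multiplies sold_quantity only by the last child_quantity in the chain. That matches the sample data because parent1 contains exactly one child1, but an intermediate quantity other than 1 would be ignored. A possible variation is to carry a running product through the recursion, as in this sketch. It runs on Python's sqlite3 module rather than Postgres 11, drops the schema prefix, and make_sales_db and component_totals are hypothetical helper names.

```python
import sqlite3


def make_sales_db():
    """Load the sample rows from the question into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE names_categories (thing_name TEXT, thing_category TEXT);
        CREATE TABLE relator (parent_name TEXT, child_name TEXT, child_quantity INTEGER);
        CREATE TABLE sales (sold_name TEXT, sold_quantity INTEGER);
        INSERT INTO names_categories VALUES
            ('parent1', 'bundle'), ('child1', 'assembly'),
            ('subChild1', 'component'), ('subChild2', 'component');
        INSERT INTO relator VALUES
            ('parent1', 'child1', 1), ('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
        INSERT INTO sales VALUES ('parent1', 1), ('parent1', 2);
        """
    )
    return conn


def component_totals(conn):
    """Sum the quantity of each component sold through any level of assembly."""
    sql = """
    WITH RECURSIVE cte(child_name, total) AS (
        -- anchor: direct children of each sold item
        SELECT r.child_name, s.sold_quantity * r.child_quantity
        FROM sales AS s
        JOIN relator AS r ON r.parent_name = s.sold_name
        UNION ALL
        -- step: multiply the running total by the next level's quantity
        SELECT r.child_name, cte.total * r.child_quantity
        FROM cte
        JOIN relator AS r ON r.parent_name = cte.child_name
    )
    SELECT cte.child_name, SUM(cte.total)
    FROM cte
    JOIN names_categories AS nc ON nc.thing_name = cte.child_name
    WHERE nc.thing_category = 'component'
    GROUP BY cte.child_name
    ORDER BY cte.child_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(component_totals(make_sales_db()))
```

Filtering and aggregating happen outside the recursive CTE, which is what avoids the "aggregate functions are not allowed" error from the question.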
Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed:
SELECT
cte.child_name,
sum(cte.total)
FROM
(
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte ) AS cte
INNER JOIN
the_schema.relationships_with_child_catetgory r1
ON cte.child_name = r1.child_name
WHERE r1.child_category = 'component'
GROUP BY cte.child_name
;
which gives the desired rows:
child_name | sum
------------+-----
subChild2 | 6
subChild1 | 30
(2 rows)
Which is good and probably enough for the actual case at hand, but I suspect there's a cleaner way to go about this, so I'll be eager to read all other offered answers.

Recursively get nested URLs from database

I have a Database table structured with nested URLs, using ParentID and ID to tell which piece of an URL belongs where.
Table structure looks like this:
+-----+----------+------------+-------------+
| ID | ParentID | Name | Url |
+-----+----------+------------+-------------+
| 1 | 0 | Categories | categories |
| 34 | 1 | Movies | movies |
| 281 | 34 | Star Wars | star-wars |
| 33 | 1 | Books | a-good-book |
+-----+----------+------------+-------------+
I want to walk recursively through all of the rows and, according to the ParentID, build every possible URL combination.
So, from the table above, I'd like to get the following output:
mysite.com/categories
mysite.com/categories/movies
mysite.com/categories/movies/star-wars
mysite.com/categories/books
mysite.com/categories/books/a-good-book
I've started writing a CTE, looking like this:
WITH CategoriesCTE AS
(
SELECT
Name,
Url,
ParentID,
ID
FROM myDB
WHERE ParentID = 1
UNION ALL
SELECT
a.Name,
a.Url,
a.ParentID,
a.ID
FROM myDB.a
INNER JOIN CategoriesCTE s on a.ParentID = s.ID
)
SELECT * FROM CategoriesCTE
The thing is, this query returns everything flat. For each step I would need to keep the URL built so far, and then build each row's URL according to its ParentID. Right now, formatting aside, my output is flat, something like:
mysite.com/categories
mysite.com/movies
mysite.com/star-wars
mysite.com/a-good-book
Which creates a lot of broken links.
Is there some way to do an action/select for each recursive step? How should I be approaching this problem?
Add a few new fields to your recursive CTE to track:
The depth of recursion (so you can find the record with the greatest depth)
The path, which is built up on each iteration by concatenating the latest value to it
The starting point of the recursion, so you know which record you started with
WITH CategoriesCTE AS
(
SELECT Name, Url, ParentID, ID, 1 AS depth, CAST(Url AS VARCHAR(500)) AS path, Url AS startingpoint
FROM myDB
WHERE ParentID = 1
UNION ALL
-- append the child's Url to the parent's path; the CAST keeps the column type identical to the anchor
SELECT a.Name, a.Url, a.ParentID, a.ID, s.depth + 1, CAST(s.path + '/' + a.Url AS VARCHAR(500)), s.startingpoint
FROM myDB a
INNER JOIN CategoriesCTE s ON a.ParentID = s.ID
)
SELECT * FROM CategoriesCTE
See what you think of this...
IF OBJECT_ID('tempdb..#SomeTable', 'U') IS NOT NULL
DROP TABLE #SomeTable;
CREATE TABLE #SomeTable (
ID INT NOT NULL,
ParentID INT NOT NULL,
FolderName VARCHAR(20) NOT NULL,
UrlPath VARCHAR(8000) NULL
);
INSERT #SomeTable (ID, ParentID, FolderName) VALUES
(1 , 0 , 'categories'),
(34 , 1 , 'movies'),
(281, 34, 'star-wars'),
(33 , 1 , 'a-good-book');
-- SELECT * FROM #SomeTable st;
WITH
cte_Categories AS (
SELECT
SitePath = CAST(CONCAT('mysite.com/', st.FolderName) AS VARCHAR(8000)),
st.ID,
NodeLevel = 1
FROM
#SomeTable st
WHERE
st.ParentID = 0
UNION ALL
SELECT
SitePath = CAST(CONCAT(c.SitePath, '/', st.FolderName) AS VARCHAR(8000)),
st.ID,
NodeLevel = c.NodeLevel + 1
FROM
cte_Categories c
JOIN #SomeTable st
ON c.ID = st.ParentID
)
SELECT
c.SitePath,
c.ID,
c.NodeLevel
FROM
cte_Categories c;
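For anyone who wants to experiment with the path-building idea without a SQL Server instance, here is a rough equivalent using Python's sqlite3 module. It is a sketch only: SQLite concatenates with || instead of CONCAT, the table name pages and the helpers make_pages_db and build_paths are made up for the example, and the rows are the ones from the question.

```python
import sqlite3


def make_pages_db():
    """Create the sample table from the question in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (ID INTEGER, ParentID INTEGER, Name TEXT, Url TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?, ?)",
        [
            (1, 0, "Categories", "categories"),
            (34, 1, "Movies", "movies"),
            (281, 34, "Star Wars", "star-wars"),
            (33, 1, "Books", "a-good-book"),
        ],
    )
    return conn


def build_paths(conn, site="mysite.com"):
    """Return the full URL of every row, built from the root downwards."""
    sql = """
    WITH RECURSIVE tree(id, path, depth) AS (
        -- anchor: root rows (ParentID = 0) start the path
        SELECT p.ID, :site || '/' || p.Url, 1
        FROM pages AS p
        WHERE p.ParentID = 0
        UNION ALL
        -- step: append the child's Url segment to its parent's path
        SELECT p.ID, t.path || '/' || p.Url, t.depth + 1
        FROM pages AS p
        JOIN tree AS t ON p.ParentID = t.id
    )
    SELECT path FROM tree ORDER BY path
    """
    return [row[0] for row in conn.execute(sql, {"site": site})]


if __name__ == "__main__":
    for url in build_paths(make_pages_db()):
        print(url)
```

Each row inherits the path of its parent, so the output is nested rather than flat.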

How do I trace former ids using a recursive query?

I have a table of provider information (providers) that contains the columns reporting_unit and predesessor. predesessor is either null or contains the reporting_unit that the row used to represent. I need to find the current reporting_unit for any provider. By that I mean: for any reporting_unit with a predesessor, that reporting_unit is the current_reporting_unit for the predesessor. I am trying to use a recursive CTE to accomplish this because sometimes there are multiple links.
The table looks like this:
CREATE TABLE providers (
reporting_unit TEXT,
predesessor TEXT
);
INSERT INTO providers
VALUES
(NULL, NULL),
('ARE88', NULL),
('99BX7', '99BX6'),
('99BX6', '99BX5'),
('99BX5', NULL)
;
The results I would like to get from that are:
reporting_unit | current_reporting_unit
---------------------------------------
'99BX5' | '99BX7'
'99BX6' | '99BX7'
My current query is:
WITH RECURSIVE current_ru AS (
SELECT reporting_unit, predesessor
FROM providers
WHERE predesessor IS NULL
UNION ALL
SELECT P.reporting_unit, P.predesessor
FROM providers P
JOIN current_ru CR
ON P.reporting_unit = CR.predesessor
)
SELECT *
FROM current_ru
;
But that isn't giving me the results I'm looking for. I have tried a number of variations on this query but they all seem to end up in an infinite loop. How
You should find relations in the reverse order. Add a depth column to find the deepest link:
with recursive current_ru (reporting_unit, predesessor, depth) as (
select reporting_unit, predesessor, 1
from providers
where predesessor is not null
union
select r.reporting_unit, p.predesessor, depth+ 1
from providers p
join current_ru r
on p.reporting_unit = r.predesessor
)
select *
from current_ru;
reporting_unit | predesessor | depth
----------------+-------------+-------
99BX7 | 99BX6 | 1
99BX6 | 99BX5 | 1
99BX6 | | 2
99BX7 | 99BX5 | 2
99BX7 | | 3
(5 rows)
Now switch the two columns, change their names, eliminate null rows and select the deepest links:
with recursive current_ru (reporting_unit, predesessor, depth) as (
select reporting_unit, predesessor, 1
from providers
where predesessor is not null
union
select r.reporting_unit, p.predesessor, depth+ 1
from providers p
join current_ru r
on p.reporting_unit = r.predesessor
)
select distinct on(predesessor)
predesessor reporting_unit,
reporting_unit current_reporting_unit
from current_ru
where predesessor is not null
order by predesessor, depth desc;
reporting_unit | current_reporting_unit
----------------+------------------------
99BX5 | 99BX7
99BX6 | 99BX7
(2 rows)
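SQLite has no DISTINCT ON, so the sketch below uses a different technique from the query above rather than a port of it: it starts from the units that are nobody's predecessor (the current ones) and pushes that value down the chain, which makes the depth column unnecessary. It assumes the chains contain no cycles, uses Python's sqlite3 module instead of PostgreSQL, and make_providers_db and current_units are hypothetical helper names.

```python
import sqlite3


def make_providers_db():
    """Load the sample rows from the question into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE providers (reporting_unit TEXT, predesessor TEXT)")
    conn.executemany(
        "INSERT INTO providers VALUES (?, ?)",
        [
            (None, None),
            ("ARE88", None),
            ("99BX7", "99BX6"),
            ("99BX6", "99BX5"),
            ("99BX5", None),
        ],
    )
    return conn


def current_units(conn):
    """Map every former reporting_unit to the unit that currently replaces it."""
    sql = """
    WITH RECURSIVE chain(old_unit, current_unit) AS (
        -- anchor: units that have a predecessor but are not a predecessor themselves
        SELECT p.predesessor, p.reporting_unit
        FROM providers AS p
        WHERE p.predesessor IS NOT NULL
          AND NOT EXISTS (
              SELECT 1 FROM providers AS n
              WHERE n.predesessor = p.reporting_unit
          )
        UNION ALL
        -- step: follow the chain one link further back, keeping the current unit
        SELECT p.predesessor, c.current_unit
        FROM providers AS p
        JOIN chain AS c ON p.reporting_unit = c.old_unit
        WHERE p.predesessor IS NOT NULL
    )
    SELECT old_unit, current_unit FROM chain ORDER BY old_unit
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for old, current in current_units(make_providers_db()):
        print(old, "->", current)
```

On the sample rows this should pair both 99BX5 and 99BX6 with 99BX7, the same result as the DISTINCT ON query.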

How to select parent ids

I have a table with this structure:
ElementId | ParentId
-------------------
1 | NULL
2 | 1
3 | 2
4 | 3
Let's say the current element has Id 4. I want to select all parent ids.
Result should be: 3, 2, 1
How can I do it? The DB is MSSQL.
You can use recursive queries for this: http://msdn.microsoft.com/en-us/library/aa175801(SQL.80).aspx
You can use it like this:
with Hierachy(ElementID, ParentID, Level) as (
select ElementID, ParentID, 0 as Level
from table t
where t.ElementID = X -- insert parameter here
union all
select t.ElementID, t.ParentID, th.Level + 1
from table t
inner join Hierachy th
on t.ElementID = th.ParentID -- step up: the next row is the parent of the row found so far
)
select ElementID, ParentID
from Hierachy
where Level > 0
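To check the direction of the join, the same walk up the tree can be run against the sample rows with Python's sqlite3 module. This is a sketch, not MSSQL: SQLite needs the RECURSIVE keyword, the table name elements is made up, and make_elements_db and ancestor_ids are hypothetical helpers.

```python
import sqlite3


def make_elements_db():
    """Create the four-row sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elements (ElementId INTEGER, ParentId INTEGER)")
    conn.executemany(
        "INSERT INTO elements VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 3)],
    )
    return conn


def ancestor_ids(conn, element_id):
    """Return the ids of all ancestors of `element_id`, nearest parent first."""
    sql = """
    WITH RECURSIVE hierarchy(ElementId, ParentId, lvl) AS (
        -- anchor: the element we start from
        SELECT ElementId, ParentId, 0
        FROM elements
        WHERE ElementId = ?
        UNION ALL
        -- step: join each found row to its parent row
        SELECT e.ElementId, e.ParentId, h.lvl + 1
        FROM elements AS e
        JOIN hierarchy AS h ON e.ElementId = h.ParentId
    )
    SELECT ElementId FROM hierarchy WHERE lvl > 0 ORDER BY lvl
    """
    return [row[0] for row in conn.execute(sql, (element_id,))]


if __name__ == "__main__":
    print(ancestor_ids(make_elements_db(), 4))
```

For element 4 this should return 3, 2, 1. Joining on the parent's id instead (e.ParentId = h.ElementId) would walk down to the descendants.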
I think it might be easiest to do the following:
while parent != NULL
get parent of current element
I can't think of any way of doing this in plain SQL that wouldn't cause issues on larger databases.
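The loop described above might look like this in application code. It is a sketch under stated assumptions: the table has already been loaded into a dict mapping each ElementId to its ParentId (parent_of is a made-up name), and the function guards against cycles, which the original pseudocode does not mention.

```python
def ancestors(parent_of, element_id):
    """Follow parent links from `element_id` until a row without a parent is reached."""
    result = []
    seen = {element_id}
    current = parent_of.get(element_id)
    while current is not None:
        if current in seen:
            # a cycle would otherwise make the loop run forever
            raise ValueError(f"cycle detected at element {current}")
        seen.add(current)
        result.append(current)
        current = parent_of.get(current)
    return result


if __name__ == "__main__":
    # ElementId -> ParentId, taken from the table in the question
    parent_of = {1: None, 2: 1, 3: 2, 4: 3}
    print(ancestors(parent_of, 4))
```

If each step is a separate database lookup instead of a dict access, this costs one round trip per level, which is the trade-off against the recursive query.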
If you want pure SQL, try:
select ParentId from myTable order by ParentId desc
That would work in MySQL; you might need to modify the desc (sort in descending order) part.

SQL Find all direct descendants in a tree

I have a tree in my database that is stored using parent id links.
A sample of what I have for data in the table is:
id | name | parent id
---+-------------+-----------
0 | root | NULL
1 | Node 1 | 0
2 | Node 2 | 0
3 | Node 1.1 | 1
4 | Node 1.1.1 | 3
5 | Node 1.1.2 | 3
Now I would like to get a list of all the direct descendants of a given node but if none exist I would like to have it just return the node itself.
I want the return for the query for children of id = 3 to be:
children
--------
4
5
Then the query for the children of id = 4 to be:
children
--------
4
I can change the way I am storing the tree to a nested set but I don't see how that would make the query I want possible.
In new PostgreSQL 8.4 you can do it with a CTE:
WITH RECURSIVE q AS
(
SELECT h, 1 AS level, ARRAY[id] AS breadcrumb
FROM t_hierarchy h
WHERE parent = 0
UNION ALL
SELECT hi, q.level + 1 AS level, breadcrumb || id
FROM q
JOIN t_hierarchy hi
ON hi.parent = (q.h).id
)
SELECT REPEAT(' ', level) || (q.h).id,
(q.h).parent,
(q.h).value,
level,
breadcrumb::VARCHAR AS path
FROM q
ORDER BY
breadcrumb
See this article in my blog for details:
PostgreSQL 8.4: preserving order for hierarchical query
In 8.3 or earlier, you'll have to write a function:
CREATE TYPE tp_hierarchy AS (node t_hierarchy, level INT);
CREATE OR REPLACE FUNCTION fn_hierarchy_connect_by(INT, INT)
RETURNS SETOF tp_hierarchy
AS
$$
SELECT CASE
WHEN node = 1 THEN
(t_hierarchy, $2)::tp_hierarchy
ELSE
fn_hierarchy_connect_by((q.t_hierarchy).id, $2 + 1)
END
FROM (
SELECT t_hierarchy, node
FROM (
SELECT 1 AS node
UNION ALL
SELECT 2
) nodes,
t_hierarchy
WHERE parent = $1
ORDER BY
id, node
) q;
$$
LANGUAGE 'sql';
and select from this function:
SELECT *
FROM fn_hierarchy_connect_by(4, 1)
The first parameter is the root id, the second should be 1.
See this article in my blog for more detail:
Hierarchical queries in PostgreSQL
Update:
To show only the first level children, or the node itself if the children do not exist, issue this query:
SELECT *
FROM t_hierarchy
WHERE parent = #start
UNION ALL
SELECT *
FROM t_hierarchy
WHERE id = #start
AND NOT EXISTS
(
SELECT NULL
FROM t_hierarchy
WHERE parent = #start
)
This is more efficient than a JOIN, since the second query will take two index scans at most: the first to find out whether a child exists, the second to select the parent row if no children exist.
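The "children, or the node itself" query can be tried against the sample tree from the question with Python's sqlite3 module. This is a sketch: it uses a named parameter :start in place of the start placeholder, keeps only the id column in the result, and make_tree_db and children_or_self are hypothetical helper names.

```python
import sqlite3


def make_tree_db():
    """Create the sample tree from the question in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_hierarchy (id INTEGER, name TEXT, parent INTEGER)")
    conn.executemany(
        "INSERT INTO t_hierarchy VALUES (?, ?, ?)",
        [
            (0, "root", None),
            (1, "Node 1", 0),
            (2, "Node 2", 0),
            (3, "Node 1.1", 1),
            (4, "Node 1.1.1", 3),
            (5, "Node 1.1.2", 3),
        ],
    )
    return conn


def children_or_self(conn, start):
    """Return the direct children of `start`, or `start` itself if it has none."""
    sql = """
    SELECT id FROM t_hierarchy WHERE parent = :start
    UNION ALL
    -- only contributes a row when the first branch found no children
    SELECT id FROM t_hierarchy
    WHERE id = :start
      AND NOT EXISTS (SELECT NULL FROM t_hierarchy WHERE parent = :start)
    ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, {"start": start})]


if __name__ == "__main__":
    conn = make_tree_db()
    print(children_or_self(conn, 3))
    print(children_or_self(conn, 4))
```

For id 3 this should return 4 and 5, and for the leaf id 4 just 4, matching the output asked for in the question.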
Found a query that works the way I wanted.
SELECT * FROM
( SELECT id FROM t_tree WHERE name = '' ) AS i,
t_tree g
WHERE
( ( i.id = g.id ) AND
NOT EXISTS ( SELECT * FROM t_tree WHERE parentid = i.id ) ) OR
( ( i.id = g.parentid ) AND
EXISTS ( SELECT * FROM t_tree WHERE parentid = i.id ) )