How do I trace former ids using a recursive query?

I have a table of provider information (providers) that contains the columns reporting_unit and predesessor. Predesessor is either null or contains the reporting_unit that the row used to represent. I need to find the current reporting_unit for any provider. By that I mean that for any reporting_unit with a predesessor, that reporting_unit is the current_reporting_unit for the predesessor.
I am trying to use a recursive CTE to accomplish this because sometimes there are multiple links in the chain.
The table looks like this:
CREATE TABLE providers (
reporting_unit TEXT,
predesessor TEXT
);
INSERT INTO providers
VALUES
(NULL, NULL),
('ARE88', NULL),
('99BX7', '99BX6'),
('99BX6', '99BX5'),
('99BX5', NULL)
;
The results I would like to get from that are:
reporting_unit | current_reporting_unit
---------------------------------------
'99BX5' | '99BX7'
'99BX6' | '99BX7'
My current query is :
WITH RECURSIVE current_ru AS (
SELECT reporting_unit, predesessor
FROM providers
WHERE predesessor IS NULL
UNION ALL
SELECT P.reporting_unit, P.predesessor
FROM providers P
JOIN current_ru CR
ON P.reporting_unit = CR.predesessor
)
SELECT *
FROM current_ru
;
But that isn't giving me the results I'm looking for. I have tried a number of variations on this query, but they all seem to end up in an infinite loop.

You should follow the relations in the reverse order. Add a depth column to find the deepest link:
with recursive current_ru (reporting_unit, predesessor, depth) as (
select reporting_unit, predesessor, 1
from providers
where predesessor is not null
union
select r.reporting_unit, p.predesessor, depth+ 1
from providers p
join current_ru r
on p.reporting_unit = r.predesessor
)
select *
from current_ru;
reporting_unit | predesessor | depth
----------------+-------------+-------
99BX7 | 99BX6 | 1
99BX6 | 99BX5 | 1
99BX6 | | 2
99BX7 | 99BX5 | 2
99BX7 | | 3
(5 rows)
Now switch the two columns, change their names, eliminate null rows and select the deepest links:
with recursive current_ru (reporting_unit, predesessor, depth) as (
select reporting_unit, predesessor, 1
from providers
where predesessor is not null
union
select r.reporting_unit, p.predesessor, depth+ 1
from providers p
join current_ru r
on p.reporting_unit = r.predesessor
)
select distinct on(predesessor)
predesessor reporting_unit,
reporting_unit current_reporting_unit
from current_ru
where predesessor is not null
order by predesessor, depth desc;
reporting_unit | current_reporting_unit
----------------+------------------------
99BX5 | 99BX7
99BX6 | 99BX7
(2 rows)
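For anyone who wants to try the idea without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the query above: SQLite has no DISTINCT ON, so the deepest link is picked with a correlated MAX(depth) subquery instead, and the null rows are filtered inside the recursive term rather than afterwards. The sample data is the table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE providers (reporting_unit TEXT, predesessor TEXT);
INSERT INTO providers VALUES
  (NULL, NULL), ('ARE88', NULL),
  ('99BX7', '99BX6'), ('99BX6', '99BX5'), ('99BX5', NULL);
""")

query = """
WITH RECURSIVE current_ru (reporting_unit, predesessor, depth) AS (
    -- start from every row that has a predecessor
    SELECT reporting_unit, predesessor, 1
    FROM providers
    WHERE predesessor IS NOT NULL
    UNION
    -- walk back one more link, keeping the newest reporting_unit
    SELECT r.reporting_unit, p.predesessor, r.depth + 1
    FROM providers p
    JOIN current_ru r ON p.reporting_unit = r.predesessor
    WHERE p.predesessor IS NOT NULL
)
-- stand-in for DISTINCT ON: keep the deepest link per predecessor
SELECT c.predesessor AS reporting_unit,
       c.reporting_unit AS current_reporting_unit
FROM current_ru c
WHERE c.depth = (SELECT MAX(d.depth)
                 FROM current_ru d
                 WHERE d.predesessor = c.predesessor)
ORDER BY c.predesessor
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows this should pair both 99BX5 and 99BX6 with 99BX7, the same result as the Postgres version.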

How can I write a SQL query to calculate the quantity of components sold with their parent assemblies? (Postgres 11/recursive CTE?)

My goal
To calculate the sum of components sold as part of their parent assemblies.
I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.
Background
I'm running Postgres 11 on CentOS 7.
I have some tables as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way.
Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but as near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE (Common Table Expression), as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use it for my case.
Aping the example from E. Brandstetter's answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!
Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
My entire query looks like this (which only differs in the details I mentioned above from yours):
demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
s.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category as category
FROM
sales s
JOIN relator r
ON s.sold_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
UNION ALL
SELECT
cte.sold_name,
cte.sold_quantity,
r.child_name,
r.child_quantity,
nc.thing_category
FROM cte
JOIN relator r ON cte.child_name = r.parent_name
JOIN names_categories nc
ON r.child_name = nc.thing_name
)
SELECT
child_name,
SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
Note: I didn't use your view, because I found it handier to fetch the data directly from the tables instead of joining data I already have. But that's just the way I personally like it :)
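One thing worth noting about the final SUM(sold_quantity * child_quantity): it multiplies the sold quantity by the quantity of the last link only, which gives the right numbers here because child1 has a quantity of 1 in parent1. A minimal sketch of a variant that carries a running total through every level is below. It runs on SQLite through Python's sqlite3 module rather than Postgres, drops the schema prefix and the id/created_at columns for brevity, and the alias `total` is borrowed from the question's own attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names_categories (thing_name TEXT, thing_category TEXT);
CREATE TABLE relator (parent_name TEXT, child_name TEXT, child_quantity INTEGER);
CREATE TABLE sales (sold_name TEXT, sold_quantity INTEGER);
INSERT INTO names_categories VALUES
  ('parent1', 'bundle'), ('child1', 'assembly'),
  ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO relator VALUES
  ('parent1', 'child1', 1), ('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO sales VALUES ('parent1', 1), ('parent1', 2);
""")

query = """
WITH RECURSIVE cte AS (
    -- first level: units of each direct child per sale
    SELECT r.child_name, s.sold_quantity * r.child_quantity AS total
    FROM sales s
    JOIN relator r ON s.sold_name = r.parent_name
    UNION ALL
    -- deeper levels: multiply the running total by the next quantity
    SELECT r.child_name, cte.total * r.child_quantity
    FROM cte
    JOIN relator r ON cte.child_name = r.parent_name
)
-- filter and aggregate outside the recursive part
SELECT cte.child_name, SUM(cte.total) AS sum_sold
FROM cte
JOIN names_categories nc ON nc.thing_name = cte.child_name
WHERE nc.thing_category = 'component'
GROUP BY cte.child_name
ORDER BY cte.child_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The filter and the SUM stay outside the recursive term, which is what avoids the "aggregate functions are not allowed" error from the question.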
Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed :
SELECT
cte.child_name,
sum(cte.total)
FROM
(
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte ) AS cte
INNER JOIN
the_schema.relationships_with_child_catetgory r1
ON cte.child_name = r1.child_name
WHERE r1.child_category = 'component'
GROUP BY cte.child_name
;
which gives the desired rows:
child_name | sum
------------+-----
subChild2 | 6
subChild1 | 30
(2 rows)
Which is good and probably enough for the actual case at hand, but I suspect there's a cleaner way to go about this, so I'll be eager to read any other answers.

SQL get top level object from joins

I'm working on a query where we want to understand which business is referring the most downstream orders for us. I've put together a very basic table for demonstration purposes here with 4 businesses listed. Bar and Donut were both ultimately referred by Foo, and I want to be able to show that Foo as a business has generated X number of orders. Obviously getting the single referral for Bar (from Foo) and for Donut (from Bar) is a simple join. But how do you go from Donut all the way back to Foo?
I'll add that I've done some more googling this morning and found a few very similar questions about the top level parent, and most of the responses suggest a recursive CTE. It's been a while since I've dug deep into SQL, but 8 years ago I know these were not overly popular. Is there another way around this? Would it perhaps be better to just store that parent ID on the order table at the time of the order?
+----+--------+--------------------+
| Id | Name | ReferralBusinessId |
+----+--------+--------------------+
| 1 | Foo | |
| 2 | Bar | 1 |
| 3 | Donut | 2 |
| 4 | Coffee | |
+----+--------+--------------------+
WITH RECURSIVE entity_hierarchy AS (
SELECT id, name, parent FROM entities WHERE name = 'Donut'
UNION
SELECT e.id, e.name, e.parent FROM entities e INNER JOIN entity_hierarchy eh on e.id = eh.parent
)
SELECT id, name, parent FROM entity_hierarchy;
SQL Fiddle Example
Assuming you're using SQL Server, you could use a query like the one below to generate a hierarchical Id path for a particular business.
-- table variable holding the sample data from the question
declare @tbl as table (Id int, Name varchar(30), ReferralBusinessId int)
insert into @tbl (Id, Name, ReferralBusinessId) values
    (1, 'Foo', null),
    (2, 'Bar', 1),
    (3, 'Donut', 2),
    (4, 'Coffee', null);
;WITH business AS (
    -- anchor: every business starts a path of its own
    SELECT Id, Name, ReferralBusinessId
        , 0 AS Level
        , CAST(Id AS VARCHAR(255)) AS Path
    FROM @tbl
    UNION ALL
    -- recursive step: append each referred business to its referrer's path
    SELECT R.Id, R.Name, R.ReferralBusinessId
        , Level + 1
        , CAST(Path + '.' + CAST(R.Id AS VARCHAR(255)) AS VARCHAR(255))
    FROM @tbl R
    INNER JOIN business b ON b.Id = R.ReferralBusinessId
)
SELECT * FROM business ORDER BY Path
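If what you need is just the top-level referrer for each business (so that orders can later be grouped by it), a rough sketch is to start the recursion only from businesses with no referrer and carry that root down the tree. The version below runs on SQLite through Python's sqlite3 module, not SQL Server; the table name `business` and the `root_id`/`root_name` columns are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (Id INTEGER, Name TEXT, ReferralBusinessId INTEGER);
INSERT INTO business VALUES
  (1, 'Foo', NULL), (2, 'Bar', 1), (3, 'Donut', 2), (4, 'Coffee', NULL);
""")

query = """
WITH RECURSIVE roots (id, name, root_id, root_name) AS (
    -- anchor: businesses nobody referred are their own root
    SELECT Id, Name, Id, Name
    FROM business
    WHERE ReferralBusinessId IS NULL
    UNION ALL
    -- each referred business inherits the root of its referrer
    SELECT b.Id, b.Name, r.root_id, r.root_name
    FROM business b
    JOIN roots r ON b.ReferralBusinessId = r.id
)
SELECT name, root_name FROM roots ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Joining an orders table to this result on the business id and grouping by root_name would then give the order count per top-level referrer.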

Recursive view that sum value from double tree structure SQL Server

First, sorry for the numerous reposts of my question; I'm new here and still getting used to asking questions properly and clearly.
I'm working on a recursive view that sums up values from a double tree structure.
I have researched around and found many questions about recursive sums, but none of their solutions seemed to work for my issue specifically.
As of now I have issues aggregating the values in the right cells. The logic is that I need the sum of each element per year in its parent, and also the sum of all the years for a given element.
Here is a fiddle of my tables and actual script:
SQL Fiddle
And here is a screenshot of the output I'm looking for:
My question is:
How can I get my view to aggregate the value from child to parent in this double tree structure?
If I understand your question correctly, you are trying to get an aggregation at 2 different levels to show in a single result set.
Clarification Scenario:
Below is an over-simplified sample data set for what I believe you are trying to achieve.
create table #agg_table
(
group_one int
, group_two int
, group_val int
)
insert into #agg_table
values (1, 1, 6)
, (1, 1, 7)
, (1, 2, 8)
, (1, 2, 9)
, (2, 3, 10)
, (2, 3, 11)
, (2, 4, 12)
, (2, 4, 13)
Given the sample data above, you want to see the following output:
+-----------+-----------+-----------+
| group_one | group_two | group_val |
+-----------+-----------+-----------+
| 1 | NULL | 30 |
| 1 | 1 | 13 |
| 1 | 2 | 17 |
| 2 | NULL | 46 |
| 2 | 3 | 21 |
| 2 | 4 | 25 |
+-----------+-----------+-----------+
This output can be achieved with the group by grouping sets syntax (example G. in the link) in SQL Server, as shown in the query below:
select a.group_one
, a.group_two
, sum(a.group_val) as group_val
from #agg_table as a
group by grouping sets
(
(
a.group_one
, a.group_two
)
,
(
a.group_one
)
)
order by a.group_one
, a.group_two
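To make it clearer what the two grouping sets compute, here is a small sketch that reproduces the same output as two ordinary GROUP BY queries glued together with UNION ALL. It runs on SQLite through Python's sqlite3 module (SQLite has no GROUPING SETS), so it is an illustration of the result rather than SQL Server code, and the table is named `agg_table` without the `#` prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agg_table (group_one INTEGER, group_two INTEGER, group_val INTEGER);
INSERT INTO agg_table VALUES
  (1, 1, 6), (1, 1, 7), (1, 2, 8), (1, 2, 9),
  (2, 3, 10), (2, 3, 11), (2, 4, 12), (2, 4, 13);
""")

query = """
-- first grouping set: (group_one, group_two)
SELECT group_one, group_two, SUM(group_val) AS group_val
FROM agg_table
GROUP BY group_one, group_two
UNION ALL
-- second grouping set: (group_one) only, group_two reported as NULL
SELECT group_one, NULL, SUM(group_val)
FROM agg_table
GROUP BY group_one
ORDER BY group_one, group_two
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

SQLite sorts NULL first in ascending order, so each subtotal row comes out ahead of its detail rows, as in the table above.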
What that means for your scenario is that I believe your recursive CTE is not the issue. The only thing that needs to change is the final select from the CTE.
Answer:
with Temp (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value)
as
(
SELECT E1.Id, E1.ParentId, E2.Id, E2.ParentId, VY.Year, VY.Value
FROM ValueYear AS VY
FULL OUTER JOIN EntityOne AS E1
ON VY.EntityOneId = E1.Id
FULL OUTER JOIN EntityTwo AS E2
ON VY.EntityTwoId = E2.Id
),
T (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value, Levels)
as
(
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
0 as Levels
From
Temp
As T1
Where
T1.EntityOneParentId is null
union all
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
T.Levels +1
From
Temp
AS T1
join
T
On T.EntityOneId = T1.EntityOneParentId
)
Select
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year,
sum(T.Value) as Value
from T
group by grouping sets
(
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year
)
,
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId
)
)
order by T.EntityOneID
, T.EntityOneParentID
, T.EntityTwoID
, T.EntityTwoParentID
, T.Year
FYI - I believe the sample data did not have the records necessary to match the expected output completely, but the last 20 records in the SQL Fiddle match the expected output perfectly.

Redshift split single dynamic column into multiple rows in new table

With a table like:
uid | segmentids
-------------------------+----------------------------------------
f9b6d54b-c646-4bbb-b0ec | 4454918|4455158|4455638|4455878|4455998
asd7a0s9-c646-asd7-b0ec | 1265899|1265923|1265935|1266826|1266596
gd3355ff-cjr8-assa-fke0 | 2237557|2237581|2237593
laksnfo3-kgi5-fke0-b0ec | 4454918|4455158|4455638|4455878
How to create a new table with:
uid | segmentids
-------------------------+---------------------------
f9b6d54b-c646-4bbb-b0ec | 4454918
f9b6d54b-c646-4bbb-b0ec | 1265899
f9b6d54b-c646-4bbb-b0ec | 2237557
f9b6d54b-c646-4bbb-b0ec | 4454918
f9b6d54b-c646-4bbb-b0ec | 4454918
asd7a0s9-c646-asd7-b0ec | 1265899
asd7a0s9-c646-asd7-b0ec | 1265923
asd7a0s9-c646-asd7-b0ec | 1265935
asd7a0s9-c646-asd7-b0ec | 1266826
asd7a0s9-c646-asd7-b0ec | 1266596
The number of segments is dynamic and can vary with each record.
I tried the split function with a delimiter, but it requires the index of the part within the string, which is dynamic here.
Any suggestions?
Here is a Redshift answer. It will work with up to 9,999 segment id values per row (the generated numbers run from 0 to 9999).
test data
create table test_split (uid varchar(50),segmentids varchar(max));
insert into test_split
values
('f9b6d54b-c646-4bbb-b0ec','4454918|4455158|4455638|4455878|4455998'),
('asd7a0s9-c646-asd7-b0ec','1265899|1265923|1265935|1266826|1266596'),
('asd7345s9-c646-asd7-b0ec','1235935|1263456|1265675696'),
('as345a0s9-c646-asd7-b0ec','12765899|12658883|12777935|144466826|1266226|12345')
;
code
with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
, generated_numbers AS
(
    -- combine four digits into every number from 0 to 9999
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
    FROM ten_numbers AS t1
    JOIN ten_numbers AS t2 ON 1 = 1
    JOIN ten_numbers AS t3 ON 1 = 1
    JOIN ten_numbers AS t4 ON 1 = 1
)
, splitter AS
(
    SELECT *
    FROM generated_numbers
WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(segmentids, '\\|') + 1)
FROM test_split)
)
--select * from splitter;
, expanded_input AS
(
SELECT
uid,
split_part(segmentids, '|', s.gen_num) AS segment
FROM test_split AS ts
JOIN splitter AS s ON 1 = 1
WHERE split_part(segmentids, '|', s.gen_num) <> ''
)
SELECT * FROM expanded_input;
The first two CTE steps (ten_numbers and generated_numbers) are used to generate a series of numbers; this is needed because generate_series is not supported.
The next step (splitter) keeps only as many numbers as the maximum number of delimiters + 1 (which is the maximum number of segments in any row).
Finally, we cross join splitter with the input data, take the corresponding value using split_part, and exclude the blank parts (which occur where a row has fewer than the maximum number of segments).
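If the three steps are hard to picture, the same logic can be mimicked in a few lines of plain Python. This is only an illustration of the technique, not Redshift code: the `split_part` helper below is a hand-written stand-in that, like the call in the query, is 1-based and yields an empty string when the index is past the last part.

```python
rows = [
    ("f9b6d54b-c646-4bbb-b0ec", "4454918|4455158|4455638|4455878|4455998"),
    ("asd7a0s9-c646-asd7-b0ec", "1265899|1265923|1265935|1266826|1266596"),
    ("asd7345s9-c646-asd7-b0ec", "1235935|1263456|1265675696"),
    ("as345a0s9-c646-asd7-b0ec", "12765899|12658883|12777935|144466826|1266226|12345"),
]


def split_part(text, delimiter, n):
    """Hypothetical stand-in for split_part: 1-based, '' when out of range."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


# splitter: the numbers 1 .. (max number of delimiters + 1)
max_parts = max(segmentids.count("|") + 1 for _, segmentids in rows)
splitter = range(1, max_parts + 1)

# expanded_input: cross join rows with splitter, drop the blank parts
expanded = [
    (uid, split_part(segmentids, "|", n))
    for uid, segmentids in rows
    for n in splitter
    if split_part(segmentids, "|", n) != ""
]

for uid, segment in expanded:
    print(uid, segment)
```

Every row is paired with every number up to the longest row's segment count, and the pairs that fall past the end of a shorter row are the blanks that get filtered out.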
You can iterate over the SUPER array returned by split_to_array -- see the "Unnesting and flattening" section of this post. Using the same test_split table as the previous answer:
WITH seg_array AS
(SELECT uid,
split_to_array(segmentids, '|') segs
FROM test_split)
SELECT uid,
segmentid::int
FROM seg_array a,
a.segs AS segmentid;
Redshift now has the SUPER data type and the split_to_array function, which is similar to PostgreSQL's string_to_array.
Redshift now also supports unnesting arrays through a syntax similar to a LATERAL JOIN in PostgreSQL.
Using these techniques, we may write the same transformation in 2022 as:
WITH split_up AS (
    SELECT
        uid
        , split_to_array(segmentids, '|') segment_array
    FROM test_split -- assumed: the table from the earlier answer (the FROM clause was missing here)
)
SELECT
    su.uid
    , CAST(sid AS VARCHAR) segmentid
FROM split_up su
JOIN su.segment_array sid ON TRUE;

postgresql with recursive grabs whole table

I have the following postgresql structure
\d brand_categories;
Table "public.brand_categories"
Column | Type | Modifiers
----------------------+---------+---------------------------------------------------------------
id | integer | not null default nextval('brand_categories_id_seq'::regclass)
category_code | text | not null
correlation_id | uuid | not null default uuid_generate_v4()
created_by_id | integer | not null
updated_by_id | integer | not null
parent_category_code | text |
I am trying to get all the parents and children of a category via WITH RECURSIVE, without taking the siblings of the category. I tried the following (inside Ruby code):
WITH RECURSIVE included_categories(category_code) AS (
SELECT category_code FROM brand_categories
WHERE category_code = 'beer'
UNION ALL
SELECT children.category_code FROM brand_categories AS parents, brand_categories AS children
WHERE parents.category_code = children.parent_category_code AND parents.category_code != 'alcohol'
UNION SELECT parents.category_code FROM brand_categories AS children, brand_categories AS parents
WHERE parents.category_code = children.parent_category_code
)
SELECT * from included_categories
The problem is that it takes the whole set of categories even though most are completely unrelated. Is there something wrong in this query?
Note that this is a simple categorization with a depth of 2 or 3.
My boss helped me solve the problem. It made more sense to do it in 2 parts:
Find all parents
Find all children
Here is the SQL:
WITH RECURSIVE children_of(category_code) AS (
SELECT category_code FROM brand_categories WHERE parent_category_code = 'alcohol'
UNION ALL
SELECT brand_categories.category_code FROM brand_categories
JOIN children_of ON brand_categories.parent_category_code = children_of.category_code
),
parents_of(parent_category_code) AS (
SELECT parent_category_code FROM brand_categories WHERE category_code = 'alcohol'
UNION
SELECT brand_categories.parent_category_code FROM parents_of
JOIN brand_categories ON brand_categories.category_code = parents_of.parent_category_code
)
SELECT category_code FROM (SELECT * FROM children_of UNION SELECT parent_category_code FROM parents_of) t0(category_code)
WHERE category_code IS NOT NULL
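Here is a small sketch for trying the two-part approach on SQLite through Python's sqlite3 module. The category rows are invented for the test (drinks > alcohol > beer > lager, plus wine under alcohol and soda under drinks), the category is passed as a bound parameter instead of a literal, and the `t0(category_code)` column alias list is dropped because SQLite does not accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brand_categories (category_code TEXT NOT NULL, parent_category_code TEXT);
-- hypothetical sample data, not from the original post
INSERT INTO brand_categories VALUES
  ('drinks', NULL), ('alcohol', 'drinks'), ('soda', 'drinks'),
  ('beer', 'alcohol'), ('wine', 'alcohol'), ('lager', 'beer');
""")

query = """
WITH RECURSIVE children_of(category_code) AS (
    -- direct children first, then their descendants
    SELECT category_code FROM brand_categories
    WHERE parent_category_code = :code
    UNION ALL
    SELECT bc.category_code FROM brand_categories bc
    JOIN children_of ON bc.parent_category_code = children_of.category_code
),
parents_of(parent_category_code) AS (
    -- direct parent first, then its ancestors
    SELECT parent_category_code FROM brand_categories
    WHERE category_code = :code
    UNION
    SELECT bc.parent_category_code FROM parents_of
    JOIN brand_categories bc ON bc.category_code = parents_of.parent_category_code
)
SELECT category_code
FROM (SELECT category_code FROM children_of
      UNION
      SELECT parent_category_code FROM parents_of)
WHERE category_code IS NOT NULL
ORDER BY category_code
"""
rows = [r[0] for r in conn.execute(query, {"code": "alcohol"})]
print(rows)
```

With this sample data the sibling 'soda' is left out, and so is 'alcohol' itself, since both CTEs start one step away from the starting category.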