Is the outer WHERE clause optimized in a Recursive CTE? - sql

With the following table definition:
CREATE TABLE Nodes(id INTEGER, child INTEGER);
INSERT INTO Nodes(id, child) VALUES(1, 10);
INSERT INTO Nodes(id, child) VALUES(1, 11);
INSERT INTO Nodes(id, child) VALUES(1, 12);
INSERT INTO Nodes(id, child) VALUES(10, 100);
INSERT INTO Nodes(id, child) VALUES(10, 101);
INSERT INTO Nodes(id, child) VALUES(10, 102);
INSERT INTO Nodes(id, child) VALUES(2, 20);
INSERT INTO Nodes(id, child) VALUES(2, 21);
INSERT INTO Nodes(id, child) VALUES(2, 22);
INSERT INTO Nodes(id, child) VALUES(20, 200);
INSERT INTO Nodes(id, child) VALUES(20, 201);
INSERT INTO Nodes(id, child) VALUES(20, 202);
With the following query:
WITH RECURSIVE members(base, id, level) AS (
SELECT n1.id, n1.id, 0
FROM Nodes n1
LEFT OUTER JOIN Nodes n2 ON n2.child = n1.id
WHERE n2.id IS NULL
UNION
SELECT m.base, n.child, m.level + 1
FROM members m
INNER JOIN Nodes n ON m.id=n.id
)
SELECT m.id, m.level
FROM members m
WHERE m.base IN (1)
Is the outer WHERE clause optimized in a recursive CTE? An alternative that I have considered is:
WITH RECURSIVE members(id, level) AS (
VALUES (1, 0)
UNION
SELECT n.child, m.level + 1
FROM members m
INNER JOIN Nodes n ON m.id=n.id
)
SELECT m.id, m.level
FROM members m
but the problem is that I cannot create a view out of it. Therefore, if the performance difference between the two is minimal, I'd prefer to create a view out of the recursive CTE and then just query that.

To be able to apply the WHERE clause to the queries inside the CTE, the database would be required to prove that
all values in the first column are unchanged by the recursion and go back to the base query, and, in general, that
it is not possible for any filtered-out row to have any children that could show up in the result of the query, or affect the CTE in any other way.
Such a prover does not exist.
See restriction 22 of Subquery flattening.

To see why your first query is non-optimal, try running both with UNION ALL instead of just UNION. With the sample data given, the first will return 21 rows while the second returns only 7.
The duplicate rows in the actual first query are subsequently eliminated by performing a sort and duplicate elimination, while this step is not necessary in the actual second query.
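The row counts mentioned above can be checked with a short script. This is only a sketch, assuming SQLite accessed through Python's sqlite3 module (the question does not name its engine); the helper count_rows and the two template names are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Nodes(id INTEGER, child INTEGER)")
conn.executemany(
    "INSERT INTO Nodes(id, child) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (10, 100), (10, 101), (10, 102),
     (2, 20), (2, 21), (2, 22), (20, 200), (20, 201), (20, 202)],
)

# First query: walk every tree, then filter on base in the outer SELECT.
FILTER_OUTSIDE = """
WITH RECURSIVE members(base, id, level) AS (
    SELECT n1.id, n1.id, 0
    FROM Nodes n1
    LEFT OUTER JOIN Nodes n2 ON n2.child = n1.id
    WHERE n2.id IS NULL
    {union}
    SELECT m.base, n.child, m.level + 1
    FROM members m
    INNER JOIN Nodes n ON m.id = n.id
)
SELECT m.id, m.level FROM members m WHERE m.base IN (1)
"""

# Second query: start the recursion from the single root of interest.
START_AT_ROOT = """
WITH RECURSIVE members(id, level) AS (
    VALUES (1, 0)
    {union}
    SELECT n.child, m.level + 1
    FROM members m
    INNER JOIN Nodes n ON m.id = n.id
)
SELECT m.id, m.level FROM members m
"""


def count_rows(template, union):
    """Run the query with the given set operator and count the result rows."""
    return len(conn.execute(template.format(union=union)).fetchall())


for union in ("UNION ALL", "UNION"):
    print(union, count_rows(FILTER_OUTSIDE, union), count_rows(START_AT_ROOT, union))
```

With the sample data this prints 21 and 7 for UNION ALL, and 7 and 7 for UNION, which shows how much duplicate work the first form produces before deduplication.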

Related

Can you sort the result in GROUP BY?

I have two tables. One is objects, with the attributes id and is_green. The other is object_closure, with the attributes ancestor_id, descendant_id, and created_at, i.e.
Objects: id, is_green
Object_closure: ancestor_id, descendant_id, created_at
There are more attributes in the Object table but not necessary to mention in this question.
I have a query like this:
-- create a table
CREATE TABLE objects (
id INTEGER PRIMARY KEY,
is_green boolean
);
CREATE TABLE object_Closure (
ancestor_id INTEGER ,
descendant_id INTEGER,
created_at date
);
-- insert some values
INSERT INTO objects VALUES (1, 1 );
INSERT INTO objects VALUES (2, 1 );
INSERT INTO objects VALUES (3, 1 );
INSERT INTO objects VALUES (4, 0 );
INSERT INTO objects VALUES (5, 1 );
INSERT INTO objects VALUES (6, 1 );
INSERT INTO object_Closure VALUES (1, 2, '2020-12-12');
INSERT INTO object_Closure VALUES (1, 3, '2020-12-13');
INSERT INTO object_Closure VALUES (2, 3, '2020-12-14');
INSERT INTO object_Closure VALUES (4, 5, '2020-12-15');
INSERT INTO object_Closure VALUES (4, 6, '2020-12-16');
INSERT INTO object_Closure VALUES (5, 6, '2020-12-17');
-- fetch some values
SELECT
O.id,
P.id,
group_concat(DISTINCT P.id ) as p_ids
FROM objects O
LEFT JOIN object_Closure OC on O.id=OC.descendant_id
LEFT JOIN objects P on OC.ancestor_id=P.id AND P.is_green=1
GROUP BY O.id
The result is:
[image: query result]
I would like P.id for O.id=6 to be 5 instead of null. After all, 5 is still a parent ID (P.id). More importantly, if there is more than one parent, I want the ID shown in P.id to be the one created first (see OC.created_at).
I understand why this happens: the first row the system picks is null, and the null was created by the join condition on is_green. However, I need P.id to contain only objects that are green.
I cannot do an inner join, because I need the other attributes of the table, and rows where both P.id and p_ids are null still need to show in the result. I cannot restructure the database; it is already there and cannot be changed. I also cannot just use a Min() or Max() aggregation, because I want the ID that is picked to be the first one created.
So is there a way to skip the null in the join?
or is there a way to filter the selection in the select clause?
or do an order by before the grouping?
P.S. My original code concatenates the P.id values in order of created_at. For some reason, I cannot replicate that in the online SQL simulator.

Referencing a calculated field in SQL statement

I have the following schema
CREATE TABLE QUOTE (id int, amount int);
CREATE TABLE QUOTE_LINE (id int, quote_id int, line_amount int);
INSERT INTO QUOTE VALUES(1, 100);
INSERT INTO QUOTE VALUES(2, 200);
INSERT INTO QUOTE VALUES(3, 100);
INSERT INTO QUOTE VALUES(4, 300);
INSERT INTO QUOTE_LINE VALUES(1, 1, 5);
INSERT INTO QUOTE_LINE VALUES(2, 1, 6);
INSERT INTO QUOTE_LINE VALUES(3, 1, 4);
INSERT INTO QUOTE_LINE VALUES(4, 1, 2);
INSERT INTO QUOTE_LINE VALUES(1, 2, 5);
INSERT INTO QUOTE_LINE VALUES(2, 2, 5);
INSERT INTO QUOTE_LINE VALUES(3, 2, 5);
INSERT INTO QUOTE_LINE VALUES(4, 2, 5);
And I need to run the following query:
SELECT QUOTE.id,
line_amount*12 AS amount,
amount*2 as amount_doubled
from QUOTE_LINE
LEFT JOIN QUOTE ON QUOTE_LINE.quote_id=QUOTE.id;
The third line of the query, amount*2 as amount_doubled, needs to reference the amount calculated in the prior line, i.e. line_amount*12 AS amount.
However, if I run this query, it picks the amount from the QUOTE table instead of the amount that was calculated. How can I make my query use the calculated amount without changing the name of the calculated field?
Here is the sqlfiddle for this:
http://sqlfiddle.com/#!17/914b2/1
Note: I understand that I can create a sub-query, CTE or a lateral join, but the tables I am working with are very wide, and the queries have many joins. As such, I need to keep the LEFT and INNER JOINs, and I don't always know whether a calculated field will be duplicated in a joined table or not. Table structures change.
Move the definition to the FROM clause using a LATERAL JOIN:
select q.id, v.amount, v.amount * 2 as amount_doubled
from QUOTE_LINE ql left join
QUOTE q
on ql.quote_id = q.id CROSS JOIN LATERAL
(values (ql.line_amount*12)) v(amount);
You can also use a subquery or CTE, but I like the lateral join method.
Note: I would expect QUOTE to be the first table in the LEFT JOIN.
Qualify all column names with the table name and use a subquery:
SELECT q.id,
q.amount,
q.amount * 2 AS amount_doubled
FROM (SELECT quote.id,
quote_line.line_amount * 12 AS amount
FROM quote_line
LEFT JOIN quote
ON quote_line.quote_id = quote.id
) AS q;
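As a quick check of the subquery approach, here is a minimal sketch that runs it against the question's sample data. It assumes SQLite through Python's sqlite3 module rather than the PostgreSQL instance behind the fiddle, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote (id INT, amount INT);
CREATE TABLE quote_line (id INT, quote_id INT, line_amount INT);
INSERT INTO quote VALUES (1, 100), (2, 200), (3, 100), (4, 300);
INSERT INTO quote_line VALUES
    (1, 1, 5), (2, 1, 6), (3, 1, 4), (4, 1, 2),
    (1, 2, 5), (2, 2, 5), (3, 2, 5), (4, 2, 5);
""")

# The inner query defines the calculated amount; the outer query can then
# refer to it by name without colliding with quote.amount.
rows = conn.execute("""
SELECT q.id, q.amount, q.amount * 2 AS amount_doubled
FROM (SELECT quote.id,
             quote_line.line_amount * 12 AS amount
      FROM quote_line
      LEFT JOIN quote ON quote_line.quote_id = quote.id) AS q
ORDER BY q.id, q.amount
""").fetchall()

for row in rows:
    print(row)
```

Every amount_doubled value here is twice the calculated amount (for example 24 and 48 for the line with line_amount 2), not twice quote.amount.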
Just a little simple algebra resolves the issue quite easily. It is clear that the calculated amount is 12 times the line_amount and that amount_doubled is 2 times that. So:
select q.id
, ql.line_amount*12 as amount
, ql.line_amount*12*2 as amount_doubled
from quote_line ql
left join quote q
on ql.quote_id = q.id;
However, a child left join parent seems strange, as it basically says "give me the quote line amounts even where there is no quote". One would hope a FK from line to quote would prevent that from happening.
If so, then an inner join would suffice. Further, if the id is the only column needed from quote, the join can be removed by taking quote_id from quote_line. So perhaps reducing to:
select ql.quote_id as id
, ql.line_amount*12 as amount
, ql.line_amount*24 as amount_doubled
from quote_line ql;

Creating cte statement to traverse a tree created with a bridge table

I will preface this by saying that I am completely new to cte statements.
I have two tables: an items table and a linker table. The linker table contains the parent id and child id of each relationship.
I have seen lots of examples of how to traverse a tree with what I believe is called an associative system, where the parent id is stored on the child record instead of in a bridge table. But I cannot seem to extrapolate this to my scenario.
I believe I need to have a bridge table like this because any single item could have multiple parents and each parent will most likely have multiple children.
I hacked together my own way of traversing the tree, but it was within another language, Xojo, and I really did not like how it turned out. Essentially I was using recursive functions to dig down into each tree and queried the database each time I needed a child.
Now I am trying to create a cte statement that does the same thing. Keeping the descendants ordered below the parents is not a huge deal.
So I have created a sample database and other materials to describe my issue:
This diagram shows visually what the relationships are:
[image: diagram]
This is what I would like returned from the database (some items show up in multiple places):
1 : Audio
2 : Speaker
3 : Microphone
4 : Mic Pack
3 : Microphone
5 : Di
6 : Passive Di
11 : Rapco Di
13 : Dbx Di
7 : Lighting
9 : Safety
12 : Small Safety
8 : Rigging
10 : Lighting Rigging
9 : Safety
12 : Small Safety
An example table:
CREATE TABLE items ( id INTEGER, name TEXT, PRIMARY KEY(id) );
CREATE TABLE `linker` ( `parent` INTEGER, `child` INTEGER, PRIMARY KEY(`parent`,`child`) );
Insert Into items(id, name) Values(1, 'Audio');
Insert Into items(id, name) Values(2, 'Speaker');
Insert Into items(id, name) Values(3, 'Microphone');
Insert Into items(id, name) Values(4, 'Mic Pack');
Insert Into items(id, name) Values(5, 'Di');
Insert Into items(id, name) Values(6, 'Passive Di');
Insert Into items(id, name) Values(7, 'Lighting');
Insert Into items(id, name) Values(8, 'Rigging');
Insert Into items(id, name) Values(9, 'Safety');
Insert Into items(id, name) Values(10, 'Lighting Rigging');
Insert Into items(id, name) Values(11, 'Rapco Di');
Insert Into items(id, name) Values(12, 'Small Safety');
Insert Into items(id, name) Values(13, 'Dbx Di');
Insert Into linker(parent, child) Values(1, 2);
Insert Into linker(parent, child) Values(1, 4);
Insert Into linker(parent, child) Values(1, 3);
Insert Into linker(parent, child) Values(4, 3);
Insert Into linker(parent, child) Values(4, 5);
Insert Into linker(parent, child) Values(5, 6);
Insert Into linker(parent, child) Values(6, 11);
Insert Into linker(parent, child) Values(6, 13);
Insert Into linker(parent, child) Values(7, 9);
Insert Into linker(parent, child) Values(9, 12);
Insert Into linker(parent, child) Values(8, 10);
Insert Into linker(parent, child) Values(10, 9);
This is the cte that I came up with that I believe came the closest, but it's probably still pretty far off:
with cte As
(
Select
id,
name,
0 as level,
Cast(name as varchar(255)) as sort
From items i
Left outer Join
linker li
On i.id = li.child
And li.parent is Null
Union All
Select
id,
name,
cte.level + 1,
Cast(cte.sort + '.' + i.name As Varchar(255)) as sort
From cte
Left Outer Join linker li
on li.child = cte.id
Inner Join items i
On li.parent = i.id
)
Select
id,
name,
level,
sort
From cte
Order By Sort;
Thanks for any help in advance. I am very open to the idea that everything I am doing from the data structure up is wrong, so keep that in mind when you are answering.
Edit: It is probably worth noting that the results don't need to be in order. I plan on creating an ancestry path field in the cte statement, and using that path to populate my tree.
Edit: oops, I copied and pasted the wrong bit of cte code. I am on mobile so I did my best to change it to what I was doing on my desktop. Once I have a chance I will double check the cte statement against my notes.
Alright, seems like my brain just needed some rest. I have figured out a solution to my issue with much less frustration than yesterday.
My cte statement was:
with cte As(
Select
id,
name,
li.parent,
li.child,
Cast(name as Varchar(255)) as ancestory
From items i
Left Outer Join linker li
On i.id = li.child
Where li.parent is null
Union All
Select
i.id,
i.name,
li.parent,
li.child,
Cast(cte.ancestory || '.' || i.name as Varchar(100)) as ancestory
From cte
Left join linker li
On cte.id = li.parent
Inner Join items i
On li.child = i.id
)
select * from cte
The results come back as:
id | name | parent | child | ancestory
1 | Audio | | | Audio
7 | Lighting | | | Lighting
8 | Rigging | | | Rigging
2 | Speaker | 1 | 2 | Audio.Speaker
3 | Microphone | 1 | 3 | Audio.Microphone
4 | Mic Pack | 1 | 4 | Audio.Mic Pack
9 | Safety | 7 | 9 | Lighting.Safety
10 | Lighting Rigging | 8 | 10 | Rigging.Lighting Rigging
3 | Microphone | 4 | 3 | Audio.Mic Pack.Microphone
5 | Di | 4 | 5 | Audio.Mic Pack.Di
12 | Small Safety | 9 | 12 | Lighting.Safety.Small Safety
9 | Safety | 10 | 9 | Rigging.Lighting Rigging.Safety
6 | Passive Di | 5 | 6 | Audio.Mic Pack.Di.Passive Di
12 | Small Safety | 9 | 12 | Rigging.Lighting Rigging.Safety.Small Safety
11 | Rapco Di | 6 | 11 | Audio.Mic Pack.Di.Passive Di.Rapco Di
13 | Dbx Di | 6 | 13 | Audio.Mic Pack.Di.Passive Di.Dbx Di
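For anyone who wants to reproduce this, the following is a rough sketch of the same traversal run from Python. It assumes SQLite via the sqlite3 module, trims the CTE down to an id and a path column (named ancestry here, which is my own choice), and uses single-quoted '.' as the separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE linker (parent INTEGER, child INTEGER, PRIMARY KEY (parent, child));
INSERT INTO items VALUES
    (1, 'Audio'), (2, 'Speaker'), (3, 'Microphone'), (4, 'Mic Pack'),
    (5, 'Di'), (6, 'Passive Di'), (7, 'Lighting'), (8, 'Rigging'),
    (9, 'Safety'), (10, 'Lighting Rigging'), (11, 'Rapco Di'),
    (12, 'Small Safety'), (13, 'Dbx Di');
INSERT INTO linker VALUES
    (1, 2), (1, 4), (1, 3), (4, 3), (4, 5), (5, 6),
    (6, 11), (6, 13), (7, 9), (9, 12), (8, 10), (10, 9);
""")

paths = [row[0] for row in conn.execute("""
WITH RECURSIVE cte(id, ancestry) AS (
    -- Roots: items that never appear as a child in the bridge table.
    SELECT i.id, i.name
    FROM items i
    LEFT JOIN linker li ON li.child = i.id
    WHERE li.parent IS NULL
    UNION ALL
    -- Step down one level: parent path plus the child's name.
    SELECT i.id, cte.ancestry || '.' || i.name
    FROM cte
    JOIN linker li ON li.parent = cte.id
    JOIN items i ON i.id = li.child
)
SELECT ancestry FROM cte ORDER BY ancestry
""")]

for path in paths:
    print(path)
```

Because an item with several parents is reached once per parent, it gets one row per path, so Safety and Small Safety each appear under both Lighting and Rigging.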

oracle correlated subquery using distinct listagg

I have an interesting query I'm trying to figure out. I have a view which is getting a column added to it. This column is pivoted data coming from other tables, to form into a single row. Now, I need to wipe out duplicate entries in this pivoted data. Listagg is great for getting the data to a single row, but I need to make it unique. While I know how to make it unique, I'm tripping up on the fact that correlated sub-queries only go 1 level deep. So... not really sure how to get a distinct list of values. I can get it to work if I don't do the distinct just fine. Anyone out there able to work some SQL magic?
Sample data:
drop table test;
drop table test_widget;
create table test (id number, description Varchar2(20));
create table test_widget (widget_id number, test_fk number, widget_type varchar2(20));
insert into test values(1, 'cog');
insert into test values(2, 'wheel');
insert into test values(3, 'spring');
insert into test_widget values(1, 1, 'A');
insert into test_widget values(2, 1, 'A');
insert into test_widget values(3, 1, 'B');
insert into test_widget values(4, 1, 'A');
insert into test_widget values(5, 2, 'C');
insert into test_widget values(6, 2, 'C');
insert into test_widget values(7, 2, 'B');
insert into test_widget values(8, 3, 'A');
insert into test_widget values(9, 3, 'C');
insert into test_widget values(10, 3, 'B');
insert into test_widget values(11, 3, 'B');
insert into test_widget values(12, 3, 'A');
commit;
Here is an example of the query that works, but shows duplicate data:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM TEST_WIDGET
WHERE TEST_FK = A.ID) widget_types
FROM TEST A
Here is an example of what does NOT work due to the depth of where I try to reference the ID:
SELECT A.ID
, A.DESCRIPTION
, (SELECT LISTAGG (WIDGET_TYPE, ', ') WITHIN GROUP (ORDER BY WIDGET_TYPE)
FROM (SELECT DISTINCT WIDGET_TYPE
FROM TEST_WIDGET
WHERE TEST_FK = A.ID))
WIDGET_TYPES
FROM TEST A
Here is what I want displayed:
1 cog A, B
2 wheel B, C
3 spring A, B, C
If anyone knows off the top of their head, that would be fantastic! Otherwise, I can post up some sample create statements to help you with dummy data to figure out the query.
You can apply the distinct in a subquery, which also has the join - avoiding the level issue:
SELECT ID
, DESCRIPTION
, LISTAGG (WIDGET_TYPE, ', ')
WITHIN GROUP (ORDER BY WIDGET_TYPE) AS widget_types
FROM (
SELECT DISTINCT A.ID, A.DESCRIPTION, B.WIDGET_TYPE
FROM TEST A
JOIN TEST_WIDGET B
ON B.TEST_FK = A.ID
)
GROUP BY ID, DESCRIPTION
ORDER BY ID;
ID DESCRIPTION WIDGET_TYPES
---------- -------------------- --------------------
1 cog A, B
2 wheel B, C
3 spring A, B, C
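The same shape (deduplicate in a joined subquery, then aggregate outside it) can be tried without an Oracle instance. The sketch below is an analogue only: it assumes SQLite through Python's sqlite3 module and swaps in SQLite's group_concat for LISTAGG, which is not the function the answer uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER, description TEXT);
CREATE TABLE test_widget (widget_id INTEGER, test_fk INTEGER, widget_type TEXT);
INSERT INTO test VALUES (1, 'cog'), (2, 'wheel'), (3, 'spring');
INSERT INTO test_widget VALUES
    (1, 1, 'A'), (2, 1, 'A'), (3, 1, 'B'), (4, 1, 'A'),
    (5, 2, 'C'), (6, 2, 'C'), (7, 2, 'B'),
    (8, 3, 'A'), (9, 3, 'C'), (10, 3, 'B'), (11, 3, 'B'), (12, 3, 'A');
""")

# DISTINCT is applied in the inner query, which also performs the join,
# so no correlated reference has to reach more than one level deep.
rows = conn.execute("""
SELECT id, description, group_concat(widget_type, ', ') AS widget_types
FROM (SELECT DISTINCT a.id, a.description, b.widget_type
      FROM test a
      JOIN test_widget b ON b.test_fk = a.id)
GROUP BY id, description
ORDER BY id
""").fetchall()

# group_concat does not promise an element order, so sort in Python.
result = {desc: sorted(types.split(", ")) for _, desc, types in rows}
print(result)
```

Sorting in Python stands in for the WITHIN GROUP (ORDER BY ...) clause, which this SQLite form does not have.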
I was in a unique situation using the Pentaho reports writer and some inconsistent data. The Pentaho writer uses Oracle to query data, but has limitations. The data pieces were unique but not classified in a consistent manner, so I created a nested listagg inside of a left join to present the data the way I wanted to:
left join
(
select staff_id, listagg(thisThing, ' --- '||chr(10) ) within group (order by thisThing) as SCHED_1 from
(
SELECT
staff_id, RPT_STAFF_SHIFTS.ORGANIZATION||': '||listagg(
RPT_STAFF_SHIFTS.DAYS_OF_WEEK
, ',' ) within group (order by BEGIN_DATE desc)
as thisThing
FROM "RPT_STAFF_SHIFTS" where "RPT_STAFF_SHIFTS"."END_DATE" is null
group by staff_id, organization)
group by staff_id
) schedule_1 on schedule_1.staff_id = "RPT_STAFF"."STAFF_ID"
where "RPT_STAFF"."STAFF_ID" ='555555'
This is a different approach from the nested query, but in some situations it might work better, because it takes the level issue into account when developing the query and adds an extra step to fully concatenate the results.

SQL: couple people who assisted to the same event

create table people(
id_pers int,
nom_pers char(25),
d_nais date,
d_mort date,
primary key(id_pers)
);
create table event(
id_evn int,
primary key(id_evn)
);
create table assisted_to(
id_pers int,
id_evn int,
foreign key (id_pers) references people(id_pers),
foreign key (id_evn) references event(id_evn)
);
insert into people(id_pers, nom_pers, d_nais, d_mort) values (1, 'A', current_date - integer '20', current_date);
insert into people(id_pers, nom_pers, d_nais, d_mort) values (2, 'B', current_date - integer '50', current_date - integer '20');
insert into people(id_pers, nom_pers, d_nais, d_mort) values (3, 'C', current_date - integer '25', current_date - integer '20');
insert into event(id_evn) values (1);
insert into event(id_evn) values (2);
insert into event(id_evn) values (3);
insert into event(id_evn) values (4);
insert into event(id_evn) values (5);
insert into assisted_to(id_pers, id_evn) values (1, 5);
insert into assisted_to(id_pers, id_evn) values (2, 5);
insert into assisted_to(id_pers, id_evn) values (2, 4);
insert into assisted_to(id_pers, id_evn) values (3, 5);
insert into assisted_to(id_pers, id_evn) values (3, 4);
insert into assisted_to(id_pers, id_evn) values (3, 3);
I need to find couples who assisted to the same event on any particular day.
I tried:
select p1.id_pers, p2.id_pers from people p1, people p2, assisted_event ae
where ae.id_pers = p1.id_pers
and ae.id_pers = p2.id_pers
But returns 0 rows.
What am I doing wrong?
Try this:
select distinct ae1.id_evn,
p1.nom_pers PersonA, p2.nom_pers PersonB
from assisted_to ae1
Join assisted_to ae2
On ae2.id_evn = ae1.id_evn
And ae2.id_pers > ae1.id_pers
Join people p1
On p1.id_pers = ae1.id_pers
Join people p2
On p2.id_pers = ae2.id_pers
This generates all pairs of people [couples] who assisted on the same event. With your schema, there is no way to restrict the results to cases where they assisted on the same day. The assumption is that if they assisted on the same event, then that event can only have occurred on one day.
You select two persons, so you need to select two assisted_to rows as well, because each person has their own row in the assisted_to table. The idea is to build a link between p1 and p2 through a pair of assisted_to rows sharing the same id_evn:
select p1.id_pers, p2.id_pers
from people p1, people p2
where exists (
select *
from assisted_to e1
join assisted_to e2 on e1.id_evn=e2.id_evn
where e1.id_pers=p1.id_pers and e2.id_pers=p2.id_pers
)
When re-phrased into ANSI JOIN syntax so I can read it, your query reads:
select p1.id_pers, p2.id_pers
from assisted_event ae
inner join people p1 ON (ae.id_pers = p1.id_pers)
inner join people p2 ON (ae.id_pers = p2.id_pers)
Since id_pers is the primary key of people, ae.id_pers can equal both p1.id_pers and p2.id_pers only when p1 and p2 are the same row, so this query can never pair two different people. You'll need to find another approach.
You don't need to join on people at all for this, though you'll probably want to in order to populate their details. You need to self-join the people-to-events join table not the people table in order to get the desired results, filtering the self-join to include only rows where the event ID is the same but the people are different. Using > rather than <> means you don't have to use another pass to filter out the (a,b) vs (b,a) pairings.
Something like:
select ae1.id_evn event_id, ae1.id_pers id_pers1, ae2.id_pers id_pers2
from assisted_to ae1
inner join assisted_to ae2
on (ae2.id_evn = ae1.id_evn and ae1.id_pers > ae2.id_pers)
You can now, if desired, add additional joins on the event and people tables to populate details. You'll need to join people twice with different aliases to populate the two different "sides". See Charles Bretana's example.
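To see what the self-join produces on the question's data, here is a small sketch. It assumes SQLite through Python's sqlite3 module and creates only the assisted_to table, since the pairing itself does not need people or event; the ORDER BY is added just to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assisted_to (id_pers INT, id_evn INT);
INSERT INTO assisted_to VALUES (1, 5), (2, 5), (2, 4), (3, 5), (3, 4), (3, 3);
""")

# Self-join the link table on the event; the > comparison keeps each
# pair once and drops rows that would pair a person with themselves.
pairs = conn.execute("""
SELECT ae1.id_evn, ae1.id_pers, ae2.id_pers
FROM assisted_to ae1
INNER JOIN assisted_to ae2
    ON ae2.id_evn = ae1.id_evn AND ae1.id_pers > ae2.id_pers
ORDER BY ae1.id_evn, ae1.id_pers, ae2.id_pers
""").fetchall()

print(pairs)  # [(4, 3, 2), (5, 2, 1), (5, 3, 1), (5, 3, 2)]
```

Event 3 has a single attendee and therefore yields no pair, while persons 2 and 3 show up twice because they share events 4 and 5.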