SQL nested loops are too costly

I have an SQL query like this:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w
left join c on c.id = w.id
left join b on b.id = w.id
left join a on a.id = w.id and a.date > sysdate-30
left join d on d.id = w.id
where w.id = '12345';
And its plan:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 849 |18896868| 00:01:14 |
| 1 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 2 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 3 | NESTED LOOPS OUTER | | 1 | 670 |18896868| 00:01:14 |
| 4 | NESTED LOOPS OUTER | | 1 | 596 |18896868| 00:01:14 |
| 5 | TABLE ACCESS STORAGE FULL | w | 1 | 415 | 20 | 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | c | 1 | 22 | 3 | 00:00:01 |
| 7 | INDEX UNIQUE SCAN |c_id_nd| 1 | | | 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | b | 1 | 66 | 2 | 00:00:01 |
| 9 | INDEX UNIQUE SCAN |b_id_nd| 1 | | | 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | a | 1 | 11 | 3 | 00:00:01 |
| 11 | INDEX UNIQUE |a_id_nd| 1 | | | 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | d | 1 | 25 | 1 | 00:00:01 |
| 13 | INDEX UNIQUE |d_id_nd| 1 | | | 00:00:01 |
-----------------------------------------------------------------------------------
It currently runs for about 15-18 seconds, which is too long. I am new to tuning and I don't know how to improve its performance. All tables have about 33-54 million rows and all id columns are indexed. Statistics have been gathered for the tables, and I'm not able to use the parallel hint.
What optimizations can I do?

For this query:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w left join
c
on c.id = w.id left join
b
on b.id = w.id left join
a
on a.id = w.id and a.date > sysdate-30 left join
d
on d.id = w.id
where w.id = '12345';
You want indexes on w(id), c(id), b(id), a(id, date), and d(id).
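To see what a composite index such as a(id, date) buys, here is a minimal sketch using Python's built-in sqlite3 module. It runs on SQLite, not Oracle, so the optimizer and the plan output differ from the question's; the column types and the index name a_id_date are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature version of table a with the suggested composite index.
conn.executescript("""
    CREATE TABLE a (id TEXT, date TEXT, product TEXT);
    CREATE INDEX a_id_date ON a (id, date);
""")

# Ask SQLite how it would execute an equality on id plus a range on date.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT product FROM a WHERE id = ? AND date > ?",
    ("12345", "2024-01-01"),
).fetchall()

# The human-readable description is the fourth column of each plan row.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

If the plan text names a_id_date, both predicates are being served by the one index rather than by a scan of the table.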

I guess there is nothing wrong with your query; I'd think a bad execution plan was generated initially and is still sitting in the cache. You can rewrite the query in a different way and you will probably get a better plan (e.g. with CTEs). You can also try to filter on id before joining. Try something like this, where the CTE names carry a suffix so that they do not clash with the table names:
with
w1 as (select id, name from w where id = '12345')
,c1 as (select id, address from c where id = '12345')
,b1 as (select id, salary from b where id = '12345')
,a1 as (select id, product from a where id = '12345' and date > sysdate - 30)
,d1 as (select id, contract_amount from d where id = '12345')
select w1.name, c1.address, b1.salary, a1.product, d1.contract_amount
from w1
left join c1 on c1.id = w1.id
left join b1 on b1.id = w1.id
left join a1 on a1.id = w1.id
left join d1 on d1.id = w1.id
Or this:
with
W1 as (select w.id, w.name from w where w.id = '12345')
,W2 as (select w1.* , c.address from W1 left outer join C on w1.id = c.id)
,W3 as (select w2.*, b.salary from W2 left outer join B on w2.id = b.id)
,W4 as (select w3.*, a.product from W3 left outer join A on w3.id = a.id and a.date > sysdate - 30)
Select w4.*, d.contract_amount from W4 left outer join D on w4.id = d.id

With 35 million records in the tables: are the tables partitioned? If so, does the query ensure partition pruning?

I think the problem lies in the cardinality estimator. Because of the multiple left joins from (probably) master to detail tables, the optimizer makes a wrong assumption about the number of rows returned, and a poor cardinality estimate may lead to a poor plan. I suggest trying the isolated selects proposed by Mike and comparing timings. I am not sure how well CTEs perform in Oracle, so I recommend isolated statements even if you have to use temporary or in-memory tables: select each table on its own using your id value and put the results into a temporary table, then perform the final select on those temporary tables.
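The temporary-table approach described above could look roughly like this. It is only a sketch in Python's built-in sqlite3 module, not Oracle; the table layouts, sample rows and tmp_ names are made up, and only three of the five tables are shown for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of three of the tables from the question.
conn.executescript("""
    CREATE TABLE w (id TEXT, name TEXT);
    CREATE TABLE c (id TEXT, address TEXT);
    CREATE TABLE b (id TEXT, salary INTEGER);
    INSERT INTO w VALUES ('12345', 'Ann'), ('99999', 'Bob');
    INSERT INTO c VALUES ('12345', 'Main St 1'), ('99999', 'Side St 2');
    INSERT INTO b VALUES ('99999', 1000);
""")

target_id = "12345"

# Step 1: select each table on its own by id and keep the result in a temp table.
for table in ("w", "c", "b"):
    # Create an empty temp table with the same columns as the source table.
    conn.execute(f"CREATE TEMP TABLE tmp_{table} AS SELECT * FROM {table} WHERE 0")
    # Copy only the rows for the requested id.
    conn.execute(
        f"INSERT INTO tmp_{table} SELECT * FROM {table} WHERE id = ?",
        (target_id,),
    )

# Step 2: run the final outer joins against the small temp tables only.
rows = conn.execute("""
    SELECT w.name, c.address, b.salary
    FROM tmp_w AS w
    LEFT JOIN tmp_c AS c ON c.id = w.id
    LEFT JOIN tmp_b AS b ON b.id = w.id
""").fetchall()
print(rows)
```

The final join touches only a handful of rows, so the join strategy chosen for it matters far less than in the original statement.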

Related

SQL convert from Oracle to ANSI JOIN

How to convert from old Oracle join type to ANSI joins and why?
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
--Should be this 1?
select * from
A
left outer join B on B.ID = A.ID1
left outer join C on C.ID = A.ID2 AND B.xxx = C.yyy
--or this 2?
select * from
A
left outer join C on C.ID = A.ID2
left outer join B on B.ID = A.ID1 AND B.xxx = C.yyy

You can also cheat by using Oracle SQL Developer.
Paste in your ANSI SQL query or Oracle SQL Query, select it, right click, and use the convert feature.
The SQL parser will rewrite the JOINS for you.
So this would be for any query, not just the one you have in your question/scenario.
You should of course check the execution plans and data returned by the queries to make sure they are functionally equivalent.

According to the Oracle documentation:
If the WHERE clause contains a condition that compares a column from table B with a constant, then the (+) operator must be applied to the column so that Oracle returns the rows from table A for which it has generated nulls for this column. Otherwise Oracle returns only the results of a simple join.
So there is an inner join between b and c. And because of the overall conditions, this is going to turn all the joins into INNER JOINs (there need to be valid values in b and c for that condition to work).
I think the equivalent logic is:
SELECT *
FROM a JOIN
b
ON b.id = a.id1 JOIN
c
ON c.id = a.id2 AND b.xxx = c.yyy;
That is, the simple equality turns the outer joins into inner joins.
Of course, you can test this.
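One way to test this is sketched below with Python's built-in sqlite3 module. SQLite has no (+) operator, so the old-style query is written here as two left joins plus a WHERE filter on b.xxx = c.yyy; that translation is an assumption of the sketch, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER, id2 INTEGER);
    CREATE TABLE b (id INTEGER, xxx TEXT);
    CREATE TABLE c (id INTEGER, yyy TEXT);
    INSERT INTO a VALUES (10, 100), (20, 200), (30, 300);
    INSERT INTO b VALUES (10, 'k'), (20, 'm');
    INSERT INTO c VALUES (100, 'k'), (200, 'z');
""")

# Outer joins followed by a filter that needs real values from both b and c.
filtered_outer = conn.execute("""
    SELECT a.id1, a.id2 FROM a
    LEFT JOIN b ON b.id = a.id1
    LEFT JOIN c ON c.id = a.id2
    WHERE b.xxx = c.yyy
    ORDER BY a.id1
""").fetchall()

# Plain inner joins with the same conditions.
inner = conn.execute("""
    SELECT a.id1, a.id2 FROM a
    JOIN b ON b.id = a.id1
    JOIN c ON c.id = a.id2 AND b.xxx = c.yyy
    ORDER BY a.id1
""").fetchall()

# The comparison moved into the ON clause keeps every row of a.
outer_on = conn.execute("""
    SELECT a.id1, a.id2 FROM a
    LEFT JOIN b ON b.id = a.id1
    LEFT JOIN c ON c.id = a.id2 AND b.xxx = c.yyy
    ORDER BY a.id1
""").fetchall()

print(filtered_outer, inner, len(outer_on))
```

With this data the first two queries return the same single row, while the third returns one row per row of a.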

Neither of your options:
select *
from A
left outer join B on B.ID = A.ID1
left outer join C on C.ID = A.ID2 AND B.xxx = C.yyy
Would be written as:
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy (+)
AND c.id (+) = a.id2
and:
select *
from A
left outer join C on C.ID = A.ID2
left outer join B on B.ID = A.ID1 AND B.xxx = C.yyy
Would be written as:
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx (+) = c.yyy
AND c.id (+) = a.id2
What you have is:
SELECT *
FROM a
INNER JOIN b ON (a.id1 = b.id)
INNER JOIN c ON (a.id2 = c.id AND b.xxx = c.yyy)
Why?
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
The line:
AND b.xxx = c.yyy
Requires that both a b row and a c row exist. A row in which the outer join filled b or c with NULLs can never satisfy it, so the joins are equivalent to inner joins and the query could be rewritten as:
SELECT *
FROM a, b, c
WHERE b.id = a.id1
AND b.xxx = c.yyy
AND c.id = a.id2
Written this way, it is clearer that all the joins are inner joins.
What you may have intended to write was:
select *
from A,
(
SELECT b.id AS b_id,
c.id AS c_id,
b.xxx,
c.yyy
FROM b, c
WHERE b.xxx = c.yyy
) bc
WHERE bc.b_id (+) = a.id1
AND bc.c_id (+) = a.id2
Which would be:
select *
from A
left outer join (
SELECT b.id AS b_id,
c.id AS c_id,
b.xxx,
c.yyy
FROM b
INNER JOIN c ON b.xxx = c.yyy
) bc
on bc.b_id = a.id1 AND bc.c_id = a.id2
or, using parentheses in the join to set the precedence of the joins:
SELECT *
FROM a
LEFT OUTER JOIN (
b
INNER JOIN c
ON b.xxx = c.yyy
)
ON b.id = a.id1 AND c.id = a.id2

If you run a simple explain plan and check the result, you can see that the old-style query generates two inner joins:
explain plan set statement_id = 'test' for
SELECT *
FROM a, b, c
WHERE
b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
select *
from table(dbms_xplan.display(null, 'test'))
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------- |
| Plan hash value: 1502482080 |
| |
| ---------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 2 | HASH JOIN | | 1 | 52 | 4 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL| A | 1 | 26 | 2 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| B | 1 | 26 | 2 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL | C | 1 | 26 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("B"."XXX"="C"."YYY" AND "C"."ID"="A"."ID2") |
| 2 - access("B"."ID"="A"."ID1") |
| |
| Note |
| ----- |
| - dynamic statistics used: dynamic sampling (level=2) |
explain plan set statement_id = 'test1' for
select *
from a
left join b
on b.id = a.id1
left join c
on c.id = a.id2
and b.xxx = c.yyy
select *
from table(dbms_xplan.display(null, 'test1'))
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------- |
| Plan hash value: 2316364204 |
| |
| ---------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 1 | HASH JOIN OUTER | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 2 | HASH JOIN OUTER | | 1 | 52 | 4 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL| A | 1 | 26 | 2 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| B | 1 | 26 | 2 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL | C | 1 | 26 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("C"."ID"(+)="A"."ID2" AND "B"."XXX"="C"."YYY"(+)) |
| 2 - access("B"."ID"(+)="A"."ID1") |
| |
| Note |
| ----- |
| - dynamic statistics used: dynamic sampling (level=2) |

SQL: CROSS JOIN over table partitions

I have the following table
session_id | page_viewed
1 | A
1 | B
1 | C
2 | B
2 | E
What I would like to do is a cross join of the page_viewed column with itself but where the cross join is done on the partitions from session_id. So, from the table above the query would return:
session_id | page_1 | page_2
1 | A | A
1 | A | B
1 | A | C
1 | B | A
1 | B | B
1 | B | C
1 | C | A
1 | C | B
1 | C | C
2 | B | B
2 | B | E
2 | E | B
2 | E | E
I have looked into window functions today trying to find a way around it but it seems join functions cannot be used. Can anyone help?

You may join giving only the session_id as the join criteria:
SELECT
t1.session_id,
t1.page_viewed AS page_1,
t2.page_viewed AS page_2
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.session_id = t2.session_id;
-- ORDER BY clause optional, if you need it here

Hmmm . . . you seem to want a self-join:
select t1.session_id, t1.page_viewed as page_1, t2.page_viewed as page_2
from t t1 join
t t2
on t1.session_id = t2.session_id
order by t1.session_id, t1.page_viewed, t2.page_viewed;
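The self-join can be tried on the sample data from the question. This is a small sketch using Python's built-in sqlite3 module; the table name t follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (session_id INTEGER, page_viewed TEXT);
    INSERT INTO t VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'B'), (2, 'E');
""")

# Joining the table to itself on session_id pairs every page with every
# page of the same session, i.e. a cross join within each session.
pairs = conn.execute("""
    SELECT t1.session_id, t1.page_viewed AS page_1, t2.page_viewed AS page_2
    FROM t AS t1
    JOIN t AS t2 ON t1.session_id = t2.session_id
    ORDER BY t1.session_id, t1.page_viewed, t2.page_viewed
""").fetchall()

for row in pairs:
    print(row)
```

Session 1 has three pages and session 2 has two, so the result has 3 x 3 + 2 x 2 rows.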

Is it possible to filter in a left join?

I have three tables: Clients, Bills and BillsStates. I would like to always get the client and, if it has bills, only the bills that can be modified. I am trying something like this:
select * from Clients
left join Bills on Bills.IDClient = Clients.IDClient
left join BillsStates on BillsStates.IDBillState = Bills.IDState
and BillsStates.AllowModify = 1
The problem is that I get all the bills of the client, whether or not they can be modified.
I have also tried a right join, but in that case I do not get any results.
Is this possible with joins, or do I need a subquery? I would prefer a solution with joins, but if that cannot be done, I would accept another solution.

select * from Clients
left join Bills
inner join BillsStates on BillsStates.IDBillState = Bills.IDState
on Bills.IDClient = Clients.IDClient
and BillsStates.AllowModify = 1
The problem you have is that you only cause the BillsStates record to be excluded, because your filter is only in its join condition. Instead, you can re-order and move it into Bills's join condition.

Your query simply replaces BillsStates.* with null values where BillsStates.AllowModify = 1 condition fails:
| IDClient | Name | IDClient | IDState | Name | IDBillState | AllowModify |
|----------|------|----------|---------|--------|-------------|-------------|
| 1 | John | 1 | 1 | Bill 1 | NULL | NULL |
| 1 | John | 1 | 2 | Bill 2 | 2 | 1 |
| 2 | Jane | NULL | NULL | NULL | NULL | NULL |
Rearrange the join type and condition to get the desired result:
SELECT *
FROM Clients
LEFT JOIN (Bills
INNER JOIN BillsStates ON BillsStates.IDBillState = Bills.IDState) ON Bills.IDClient = Clients.IDClient AND BillsStates.AllowModify = 1;
| IDClient | Name | IDClient | IDState | Name | IDBillState | AllowModify |
|----------|------|----------|---------|--------|-------------|-------------|
| 1 | John | 1 | 2 | Bill 2 | 2 | 1 |
| 2 | Jane | NULL | NULL | NULL | NULL | NULL |

You can try this:
select * from Clients
left join
( select * from Bills
inner join BillsStates on BillsStates.IDBillState = Bills.IDState
and BillsStates.AllowModify = 1
) B ON B.IDClient = Clients.IDClient
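The difference between the question's query and the derived-table version can be checked on a few rows. The sketch below uses Python's built-in sqlite3 module; the sample data mirrors the result tables shown earlier, and the alias mb and column alias BillName are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (IDClient INTEGER, Name TEXT);
    CREATE TABLE Bills (IDClient INTEGER, IDState INTEGER, Name TEXT);
    CREATE TABLE BillsStates (IDBillState INTEGER, AllowModify INTEGER);
    INSERT INTO Clients VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO Bills VALUES (1, 1, 'Bill 1'), (1, 2, 'Bill 2');
    INSERT INTO BillsStates VALUES (1, 0), (2, 1);
""")

# The question's query: the filter only blanks out the BillsStates columns.
original = conn.execute("""
    SELECT c.Name, b.Name, bs.AllowModify
    FROM Clients c
    LEFT JOIN Bills b ON b.IDClient = c.IDClient
    LEFT JOIN BillsStates bs ON bs.IDBillState = b.IDState
                            AND bs.AllowModify = 1
    ORDER BY c.IDClient, b.Name
""").fetchall()

# Inner join Bills to BillsStates first, then left join that result to Clients.
fixed = conn.execute("""
    SELECT c.Name, mb.BillName
    FROM Clients c
    LEFT JOIN (
        SELECT Bills.IDClient, Bills.Name AS BillName
        FROM Bills
        INNER JOIN BillsStates ON BillsStates.IDBillState = Bills.IDState
                              AND BillsStates.AllowModify = 1
    ) mb ON mb.IDClient = c.IDClient
    ORDER BY c.IDClient, mb.BillName
""").fetchall()

print(original)
print(fixed)
```

The first result still lists the bill that cannot be modified, while the second keeps both clients but only the modifiable bill.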

Use an inner join between Bills and BillsStates instead of a left join, and do a left join with Clients:
select c.* from Clients c
left join Bills b on b.IDClient = c.IDClient
inner join BillsStates bs
on bs.IDBillState = b.IDState and bs.AllowModify = 1

Rather than joining on a subquery, you can also re-arrange the order of the joins:
SELECT c.*, b.*, bs.* -- Todo: only the relevant columns
FROM Bills b
JOIN BillsStates bs ON bs.IDBillState = b.IDState AND bs.AllowModify = 1
RIGHT JOIN Clients c ON b.IDClient = c.IDClient

Three table join where middle table has duplicate foreign keys

I am working with a database with a structure similar to the illustration below (except with more columns). Basically, each person has a unique person_id and alt_id. However, the only thing connecting table A to table C is table B, and table B has one to many rows for each person/alt_id.
I need to get rows with a person_id, their alt id and their associated shapes.
I could do this:
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
However, that seems inefficient as it will take a Cartesian product of rows from B and C with the same alt_id before finally using DISTINCT to narrow the results down. What's the best/most efficient way to do this query?
Table A
+-----------+-------+
| person_id | color |
+-----------+-------+
| 10 | red |
| 11 | blue |
| 12 | green |
+-----------+-------+
Table B
+-----------+--------+
| person_id | alt_id |
+-----------+--------+
| 10 | 225 |
| 10 | 225 |
| 11 | 226 |
| 11 | 226 |
| 11 | 226 |
| 12 | 227 |
+-----------+--------+
Table C
+--------+----------+
| alt_id | shape |
+--------+----------+
| 225 | square |
| 226 | circle |
| 226 | rhombus |
| 226 | ellipse |
| 227 | triangle |
+--------+----------+

Join to (select distinct * from b) b rather than just the base table b.
SELECT
a.person_id, a.color, b.alt_id, c.shape
FROM
a
INNER JOIN (select distinct * from b) b
ON a.person_id = b.person_id
INNER JOIN c
ON b.alt_id = c.alt_id

You can get a distinct list of values from b before you do your joins.
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN (Select Distinct person_id, alt_id from b) b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
Note that because of indexes, and statistics, getting a DISTINCT list is not always a good idea. Look at the actual execution plan to evaluate how good this is, especially if you have a lot of data.
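To see the effect on row counts, here is a small sketch using Python's built-in sqlite3 module with the sample tables from the question. It only compares the number of rows that each form of the join produces; it says nothing about which plan a real optimizer would pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (person_id INTEGER, color TEXT);
    CREATE TABLE b (person_id INTEGER, alt_id INTEGER);
    CREATE TABLE c (alt_id INTEGER, shape TEXT);
    INSERT INTO a VALUES (10, 'red'), (11, 'blue'), (12, 'green');
    INSERT INTO b VALUES (10, 225), (10, 225), (11, 226), (11, 226),
                         (11, 226), (12, 227);
    INSERT INTO c VALUES (225, 'square'), (226, 'circle'), (226, 'rhombus'),
                         (226, 'ellipse'), (227, 'triangle');
""")

# Joining the raw tables multiplies rows by the duplicates in b.
plain = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN b ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
""").fetchall()

# De-duplicating b first means each person/alt_id pair is joined once.
deduped = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN (SELECT DISTINCT person_id, alt_id FROM b) b
      ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
""").fetchall()

print(len(plain), len(deduped))
```

With this data the plain join returns 12 rows that DISTINCT would then have to collapse, whereas the de-duplicated join returns the 5 wanted rows directly.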

You could use aggregation along with a common table expression (or subquery, but a CTE might be neater):
WITH ab AS (
SELECT a.person_id, a.color, MAX(b.alt_id) AS alt_id
FROM a INNER JOIN b
ON a.person_id = b.person_id
GROUP BY a.person_id, a.color
)
SELECT ab.person_id, ab.color, ab.alt_id, c.shape
FROM ab INNER JOIN c ON ab.alt_id = c.alt_id;

SQL join 3 tables (based on 2 criterias?)

I have 3 tables setup like this (a bit simplified):
time_tracking: id, tr_proj_id, tr_min, tr_type
time_projects: id, project_name
time_tasks: id, task_name
Basically, I want to retrieve either project_name or task_name based on tr_type which can be of value "project" or "task"
An example
time_tracking
+----+------------+--------+---------+
| id | tr_proj_id | tr_min | tr_type |
+----+------------+--------+---------+
| 1 | 3 | 60 | project |
| 2 | 3 | 360 | task |
| 3 | 1 | 120 | project |
| 4 | 2 | 30 | project |
| 5 | 2 | 30 | task |
| 6 | 1 | 90 | task |
+----+------------+--------+---------+
time_projects
+----+------------------------+
| id | project_name |
+----+------------------------+
| 1 | Make someone happy |
| 2 | Start a project |
| 3 | Jump out of the window |
+----+------------------------+
time_tasks
+----+---------------------+
| id | task_name |
+----+---------------------+
| 1 | drink a beer |
| 2 | drink a second beer |
| 3 | drink more |
+----+---------------------+
Desired output
+----+------------------------+------------+--------+---------+
| id | name | tr_proj_id | tr_min | tr_type |
+----+------------------------+------------+--------+---------+
| 1 | Jump out of the window | 3 | 60 | project |
| 2 | drink more | 3 | 360 | task |
| 3 | Make someone happy | 1 | 120 | project |
| 4 | Start a project | 2 | 30 | project |
| 5 | drink a second beer | 2 | 30 | task |
| 6 | drink a beer | 1 | 90 | task |
+----+------------------------+------------+--------+---------+
And being really bad at the whole JOIN thing, here's the only thing I've come up with so far (which doesn't work..):
SELECT tt.tr_proj_id, tt.tr_type, tt.tr_min, pp.project_name, pp.id, ta.task_name, ta.id
FROM time_tracking as tt, time_projects as pp, time_tasks as ta
WHERE ((tt.tr_type = 'project' AND pp.id = tt.tr_proj_id) OR (tt.tr_type = 'task' AND ta.id = tt.tr_proj_id))
AND tt.tr_min > 0
ORDER BY tt.tr_proj_id DESC
If anyone has an idea on how to do this, feel free to share!
Update: It looks like I forgot to specify that I'm using an Access database, which apparently doesn't accept things like CASE or COALESCE. Apparently there is IIF(), but I'm not quite sure how to use it in this case.

Use join clauses and move your join conditions from the where clause into the on clauses:
SELECT
tt.tr_proj_id,
tt.tr_type,
tt.tr_min,
pp.project_name,
pp.id,
ta.task_name,
ta.id
FROM time_tracking as tt
left join time_projects as pp on tt.tr_type = 'project' AND pp.id = tt.tr_proj_id
left join time_tasks as ta on tt.tr_type = 'task' AND ta.id = tt.tr_proj_id
WHERE tt.tr_min > 0
ORDER BY tt.tr_proj_id DESC,tt.tr_day ASC
I've used left join, which gives you a row from the main table even if one doesn't exist for the join (you get nulls from columns in the joined table if there's no join)
A key point here, that many SQL programmers do not realise, is that the ON clause may contain any conditions, even ones not from the joined table (as in this example). Many programmers assume that the conditions must be only those relating to the formal foreign key relationship.

Try this:
SELECT
tt.id,
CASE WHEN tt.tr_type = 'project' THEN pp.project_name
WHEN tt.tr_type = 'task' THEN ta.task_name END as name,
tt.tr_proj_id,
tt.tr_type,
tt.tr_min
FROM time_tracking as tt
left join time_projects as pp on pp.id = tt.tr_proj_id
left join time_tasks as ta on ta.id = tt.tr_proj_id
WHERE tt.tr_min > 0
ORDER BY tt.tr_proj_id DESC

perform a union on two joins:
select tt.id, tp.project_name name, tt.tr_proj_id, tt.tr_min, tt.tr_type
from time_tracking tt
inner join time_projects tp on tp.id = tt.tr_proj_id
where tt.tr_type = 'project'
union all
select tt.id, tk.task_name name, tt.tr_proj_id, tt.tr_min, tt.tr_type
from time_tracking tt
inner join time_tasks tk on tk.id = tt.tr_proj_id
where tt.tr_type = 'task'
That will give you the exact table results you want

SELECT
time_tracking.id,
time_tracking.tr_min,
time_tracking.tr_type,
coalesce(time_projects.project_name, time_tasks.task_name) as name
FROM time_tracking
LEFT OUTER JOIN time_projects on time_projects.id = time_tracking.tr_proj_id AND time_tracking.tr_type = 'project'
LEFT OUTER JOIN time_tasks on time_tasks.id = time_tracking.tr_proj_id AND time_tracking.tr_type = 'task'
WHERE time_tracking.tr_min > 0
ORDER BY time_tracking.id DESC -- ...
COALESCE is standard SQL and is available in MSSQL; other database technologies offer equivalents such as ISNULL.
The idea is you join to the tables and if the join fails, you'll get NULL where the join failed. Then you use COALESCE to pick out the successful join value.
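The conditional joins plus COALESCE can be checked against the sample data from the question. This is a sketch using Python's built-in sqlite3 module, so it shows the idea on SQLite rather than on the Access database mentioned in the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_tracking (id INTEGER, tr_proj_id INTEGER,
                                tr_min INTEGER, tr_type TEXT);
    CREATE TABLE time_projects (id INTEGER, project_name TEXT);
    CREATE TABLE time_tasks (id INTEGER, task_name TEXT);
    INSERT INTO time_tracking VALUES
        (1, 3, 60, 'project'), (2, 3, 360, 'task'), (3, 1, 120, 'project'),
        (4, 2, 30, 'project'), (5, 2, 30, 'task'), (6, 1, 90, 'task');
    INSERT INTO time_projects VALUES
        (1, 'Make someone happy'), (2, 'Start a project'),
        (3, 'Jump out of the window');
    INSERT INTO time_tasks VALUES
        (1, 'drink a beer'), (2, 'drink a second beer'), (3, 'drink more');
""")

# Each left join can only match when tr_type selects its table, so at most
# one of the two name columns is non-NULL and COALESCE picks that one.
rows = conn.execute("""
    SELECT tt.id,
           COALESCE(pp.project_name, ta.task_name) AS name,
           tt.tr_proj_id, tt.tr_min, tt.tr_type
    FROM time_tracking AS tt
    LEFT JOIN time_projects AS pp
           ON tt.tr_type = 'project' AND pp.id = tt.tr_proj_id
    LEFT JOIN time_tasks AS ta
           ON tt.tr_type = 'task' AND ta.id = tt.tr_proj_id
    WHERE tt.tr_min > 0
    ORDER BY tt.id
""").fetchall()

for row in rows:
    print(row)
```

The rows come back in id order with the project or task name in a single name column, matching the desired output in the question.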
