SQL convert from Oracle to ANSI JOIN - sql

How to convert from old Oracle join type to ANSI joins and why?
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
--Should be this 1?
select * from
A
left outer join B on B.ID = A.ID1
left outer join C on C.ID = A.ID2 AND B.xxx = C.yyy
--or this 2?
select * from
A
left outer join C on C.ID = A.ID2
left outer join B on B.ID = A.ID1 AND B.xxx = C.yyy

You can also cheat by using Oracle SQL Developer.
Paste in your ANSI SQL query or Oracle SQL Query, select it, right click, and use the convert feature.
The SQL parser will rewrite the JOINS for you.
So this would be for any query, not just the one you have in your question/scenario.
You should of course check the execution plans and data returned by the queries to make sure they are functionally equivalent.
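For the "check the data returned" part, a rough way to compare two queries is to treat each result as a multiset of rows. The sketch below is only an illustration: it uses Python's built-in sqlite3 rather than Oracle, and the helper name `same_rows` and the sample rows are made up.

```python
import sqlite3
from collections import Counter

def same_rows(conn, sql_a, sql_b):
    """Return True when both queries return the same multiset of rows."""
    rows_a = Counter(conn.execute(sql_a).fetchall())
    rows_b = Counter(conn.execute(sql_b).fetchall())
    return rows_a == rows_b

# Hypothetical sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER, id2 INTEGER);
    CREATE TABLE b (id INTEGER, xxx INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (1, 100);
""")

comma_join = "SELECT a.id1, b.xxx FROM a, b WHERE b.id = a.id1"
ansi_inner = "SELECT a.id1, b.xxx FROM a JOIN b ON b.id = a.id1"
ansi_left = "SELECT a.id1, b.xxx FROM a LEFT JOIN b ON b.id = a.id1"

inner_matches = same_rows(conn, comma_join, ansi_inner)
left_matches = same_rows(conn, comma_join, ansi_left)
print(inner_matches, left_matches)
```

Comparing multisets rather than sets means duplicate rows are counted too, which matters when a rewritten join accidentally multiplies rows.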

According to the Oracle documentation:
If the WHERE clause contains a condition that compares a column from table B with a constant, then the (+) operator must be applied to the column so that Oracle returns the rows from table A for which it has generated nulls for this column. Otherwise Oracle returns only the results of a simple join.
So there is an inner join between b and c. And because of the overall conditions, this is going to turn all the joins into INNER JOIN (there need to be valid values in b and c for that condition to work).
I think the equivalent logic is:
SELECT *
FROM a JOIN
b
ON b.id = a.id1 JOIN
c
ON c.id = a.id2 AND b.xxx = c.yyy;
That is, the simple equality turns the outer joins into inner joins.
Of course, you can test this.
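One way to test the claim without an Oracle instance is a small sqlite3 script. SQLite has no (+) operator, so this sketch assumes the closest ANSI equivalent: two LEFT JOINs with the unmarked `b.xxx = c.yyy` condition placed in WHERE. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER, id2 INTEGER);
    CREATE TABLE b (id INTEGER, xxx INTEGER);
    CREATE TABLE c (id INTEGER, yyy INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (1, 7), (2, 8);
    INSERT INTO c VALUES (10, 7), (20, 9);
""")

# Outer joins, but the WHERE condition needs real values from both b and c.
outer_with_filter = conn.execute("""
    SELECT a.id1, b.id, c.id
    FROM a
    LEFT JOIN b ON b.id = a.id1
    LEFT JOIN c ON c.id = a.id2
    WHERE b.xxx = c.yyy
    ORDER BY a.id1
""").fetchall()

# Plain inner joins with the same conditions.
inner = conn.execute("""
    SELECT a.id1, b.id, c.id
    FROM a
    JOIN b ON b.id = a.id1
    JOIN c ON c.id = a.id2 AND b.xxx = c.yyy
    ORDER BY a.id1
""").fetchall()

print(outer_with_filter)
print(inner)
```

Row 3 of a has no partner in b or c, and NULL = NULL is not true, so the filter removes it just as an inner join would.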

Neither of your options. The first one:
select *
from A
left outer join B on B.ID = A.ID1
left outer join C on C.ID = A.ID2 AND B.xxx = C.yyy
Would be written as:
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy (+)
AND c.id (+) = a.id2
and:
select *
from A
left outer join C on C.ID = A.ID2
left outer join B on B.ID = A.ID1 AND B.xxx = C.yyy
Would be written as:
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx (+) = c.yyy
AND c.id (+) = a.id2
What you have is:
SELECT *
FROM a
INNER JOIN b ON (a.id1 = b.id)
INNER JOIN c ON (a.id2 = c.id AND b.xxx = c.yyy)
Why? Look at your original query:
SELECT *
FROM a, b, c
WHERE b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
The line:
AND b.xxx = c.yyy
This requires that there is both a b row and a c row. A row that an outer join has padded with NULLs can never satisfy it, because a comparison with NULL is never true, so the outer joins behave like inner joins and the query could be rewritten as:
SELECT *
FROM a, b, c
WHERE b.id = a.id1
AND b.xxx = c.yyy
AND c.id = a.id2
Then it is clearer that all the joins are inner joins.
What you may have intended to write was:
select *
from A,
(
SELECT b.id AS b_id,
c.id AS c_id,
b.xxx,
c.yyy
FROM b, c
WHERE b.xxx = c.yyy
) bc
WHERE bc.b_id (+) = a.id1
AND bc.c_id (+) = a.id2
Which would be:
select *
from A
left outer join (
SELECT b.id AS b_id,
c.id AS c_id,
b.xxx,
c.yyy
FROM b
INNER JOIN c ON b.xxx = c.yyy
) bc
on bc.b_id = a.id1 AND bc.c_id = a.id2
or, using parentheses in the join to set the precedence of the joins:
SELECT *
FROM a
LEFT OUTER JOIN (
b
INNER JOIN c
ON b.xxx = c.yyy
)
ON b.id = a.id1 AND c.id = a.id2
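To see what the "join b and c first, then outer join the pair to a" form returns, here is a small sqlite3 sketch using the derived-table variant. The sample rows are hypothetical, and SQLite stands in for Oracle purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER, id2 INTEGER);
    CREATE TABLE b (id INTEGER, xxx INTEGER);
    CREATE TABLE c (id INTEGER, yyy INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (1, 7), (2, 8);
    INSERT INTO c VALUES (10, 7), (20, 9);
""")

# b and c are inner joined first; the pair is then outer joined to a,
# so every row of a survives even when no (b, c) pair matches.
rows = conn.execute("""
    SELECT a.id1, bc.b_id, bc.c_id
    FROM a
    LEFT JOIN (
        SELECT b.id AS b_id, c.id AS c_id
        FROM b
        JOIN c ON b.xxx = c.yyy
    ) bc ON bc.b_id = a.id1 AND bc.c_id = a.id2
    ORDER BY a.id1
""").fetchall()

print(rows)
```

With the same data, the all-inner-join reading of the original query would return only the first row.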

If you run a simple explain plan and check the result, you will see that it generates two inner joins:
explain plan set statement_id = 'test' for
SELECT *
FROM a, b, c
WHERE
b.id (+) = a.id1
AND b.xxx = c.yyy
AND c.id (+) = a.id2
select *
from table(dbms_xplan.display(null, 'test'))
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------- |
| Plan hash value: 1502482080 |
| |
| ---------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 2 | HASH JOIN | | 1 | 52 | 4 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL| A | 1 | 26 | 2 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| B | 1 | 26 | 2 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL | C | 1 | 26 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("B"."XXX"="C"."YYY" AND "C"."ID"="A"."ID2") |
| 2 - access("B"."ID"="A"."ID1") |
| |
| Note |
| ----- |
| - dynamic statistics used: dynamic sampling (level=2) |
explain plan set statement_id = 'test1' for
select *
from a
left join b
on b.id = a.id1
left join c
on c.id = a.id2
and b.xxx = c.yyy
select *
from table(dbms_xplan.display(null, 'test1'))
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------- |
| Plan hash value: 2316364204 |
| |
| ---------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 1 | HASH JOIN OUTER | | 1 | 78 | 6 (0)| 00:00:01 | |
| |* 2 | HASH JOIN OUTER | | 1 | 52 | 4 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL| A | 1 | 26 | 2 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| B | 1 | 26 | 2 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL | C | 1 | 26 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("C"."ID"(+)="A"."ID2" AND "B"."XXX"="C"."YYY"(+)) |
| 2 - access("B"."ID"(+)="A"."ID1") |
| |
| Note |
| ----- |
| - dynamic statistics used: dynamic sampling (level=2) |

Related

How to access TABLE C from TABLE A with intermediate values from TABLE B

We have table A, B and C
A
+----+-----+
| id | b_1 |
+----+-----+
| 1 | 51 |
| 2 | 52 |
| 3 | 53 |
| 4 | 54 |
+----+-----+
B
+----+-----+
| id | c_1 |
+----+-----+
| 51 | 71 |
| 52 | 72 |
| 53 | 73 |
| 54 | 74 |
+----+-----+
C
+----+--------+
| id | locked |
+----+--------+
| 71 | 1 |
| 72 | 0 |
| 73 | 0 |
| 74 | 1 |
+----+--------+
Now I want to do something like this:
SELECT * FROM A WHERE (SELECT locked FROM C WHERE id = (SELECT c_1 FROM B WHERE id = A.b_1)) = 0
So the result of this pseudo code should be all the values of table A with the value locked = 0 in table C. But for this I have to jump over B and get the id pairs.
How can I do this?
You can use INNER JOINs between those tables :
select a.*
from tableA a
join tableB b on b.id = a.b_1
join tableC c on c.id = b.c_1
where c.locked = 0;
id b_1
2 52
3 53
which returns only the column values of TableA.
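The two-step join can be checked against the sample data from the question. This is a minimal sketch with Python's sqlite3; the table names A, B and C follow the question rather than the tableA/tableB/tableC names used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, b_1 INTEGER);
    CREATE TABLE B (id INTEGER, c_1 INTEGER);
    CREATE TABLE C (id INTEGER, locked INTEGER);
    INSERT INTO A VALUES (1, 51), (2, 52), (3, 53), (4, 54);
    INSERT INTO B VALUES (51, 71), (52, 72), (53, 73), (54, 74);
    INSERT INTO C VALUES (71, 1), (72, 0), (73, 0), (74, 1);
""")

# A -> B via b_1, then B -> C via c_1, keeping only unlocked C rows.
unlocked = conn.execute("""
    SELECT a.id, a.b_1
    FROM A a
    JOIN B b ON b.id = a.b_1
    JOIN C c ON c.id = b.c_1
    WHERE c.locked = 0
    ORDER BY a.id
""").fetchall()

print(unlocked)
```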
Here's what you need.
SELECT *
FROM A
INNER JOIN B ON (B.id = A.b_1)
INNER JOIN C ON (C.id = B.c_1)
WHERE
c.locked=0
You should use a JOIN
SELECT *
FROM A
JOIN B ON (B.A_id = A.A_id)
JOIN C ON (C.A_id = A.A_id)
I used A, B, C to refer to the tables and A_id to refer to the key column you need to join on.
This is just an example you need to adapt that to your case.
Using JOIN should do the thing :
select *
from A
join B on b.id = a.b_1
join C on c.id = b.c_1
where c.locked = '0'
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=6ef50499b7bfed71f7ec9626ad196cba
Use
select A.*
if you just want table A elements.

Oracle SQL: Exclude IDs from another table without subquery join

I would like to know if the following is possible without joining the same table twice:
Table A:
+----+------+
| ID | ColA |
+----+------+
| 1 | A1 |
| 2 | A2 |
| 3 | A3 |
| 4 | A4 |
+----+------+
Table B:
+----+------+
| ID | ColB |
+----+------+
| 1 | B1 |
| 2 | B2 |
| 3 | B3 |
| 4 | B4 |
| 5 | B5 |
| 6 | B6 |
+----+------+
Table C:
+----+
| ID |
+----+
| 1 |
| 2 |
+----+
Desired result: (A LEFT JOIN B WITHOUT C)
+----+------+------+
| ID | ColA | ColB |
+----+------+------+
| 3 | A3 | B3 |
| 4 | A4 | B4 |
+----+------+------+
So basically I need to add Column B to Table A, hence left join, and exclude all IDs which occur in Table C.
Current solution:
SELECT a.id, a.ColA, b.ColB
FROM tableA a
LEFT JOIN tableB b ON a.id = b.id
WHERE a.id NOT IN(
SELECT a2.id FROM tableA a2
LEFT JOIN tableC c on a2.id = c.id)
What irritates me is that the exclusion of table C requires an additional left join of table A with table C. Isn't there a more straightforward approach, without having to join table A again as part of the subquery, if all I want to do is exclude IDs which occur in table C from the result set?
Thanks
Use a not exists:
SELECT a.id, a.ColA, b.ColB
FROM tableA a
LEFT JOIN tableB b ON a.id = b.id
where not exists(select 1 from tablec c where a.id = c.id)
The issues with using NOT IN with a subquery in Oracle are that:
a) it has to return the whole subquery dataset
b) if the subquery returns any NULL, NOT IN matches no rows at all
TOM link regarding these 2 issues
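The NULL problem is standard SQL three-valued logic, so it can be reproduced outside Oracle. The sketch below uses sqlite3 and deliberately adds a NULL id to tableC, which is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, ColA TEXT);
    CREATE TABLE tableC (id INTEGER);
    INSERT INTO tableA VALUES (1, 'A1'), (2, 'A2'), (3, 'A3'), (4, 'A4');
    -- The NULL row is an assumption added to show the failure mode.
    INSERT INTO tableC VALUES (1), (2), (NULL);
""")

# id NOT IN (1, 2, NULL) is never true: for ids 3 and 4 it evaluates to NULL.
not_in_rows = conn.execute("""
    SELECT a.id FROM tableA a
    WHERE a.id NOT IN (SELECT c.id FROM tableC c)
    ORDER BY a.id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT a.id FROM tableA a
    WHERE NOT EXISTS (SELECT 1 FROM tableC c WHERE c.id = a.id)
    ORDER BY a.id
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```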
Won't this work?
SELECT a.id, a.ColA, b.ColB
FROM tableA a
JOIN tableB b ON a.id = b.id
WHERE a.id NOT IN (SELECT c.Id FROM tableC c)
This can also be done with a join:
SELECT a.id, a.ColA, b.ColB
FROM tableA a
JOIN tableB b ON a.id = b.id
LEFT JOIN tableC C ON a.id = c.id
WHERE c.Id is null
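As a quick sanity check, the anti-join form and the NOT EXISTS form can be run against the question's sample data. This is a sketch with sqlite3, using LEFT JOIN to tableB as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, ColA TEXT);
    CREATE TABLE tableB (id INTEGER, ColB TEXT);
    CREATE TABLE tableC (id INTEGER);
    INSERT INTO tableA VALUES (1, 'A1'), (2, 'A2'), (3, 'A3'), (4, 'A4');
    INSERT INTO tableB VALUES (1, 'B1'), (2, 'B2'), (3, 'B3'),
                              (4, 'B4'), (5, 'B5'), (6, 'B6');
    INSERT INTO tableC VALUES (1), (2);
""")

# Anti-join: keep rows of A for which the LEFT JOIN to C found nothing.
anti_join = conn.execute("""
    SELECT a.id, a.ColA, b.ColB
    FROM tableA a
    LEFT JOIN tableB b ON a.id = b.id
    LEFT JOIN tableC c ON a.id = c.id
    WHERE c.id IS NULL
    ORDER BY a.id
""").fetchall()

not_exists = conn.execute("""
    SELECT a.id, a.ColA, b.ColB
    FROM tableA a
    LEFT JOIN tableB b ON a.id = b.id
    WHERE NOT EXISTS (SELECT 1 FROM tableC c WHERE c.id = a.id)
    ORDER BY a.id
""").fetchall()

print(anti_join)
print(not_exists)
```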

SQL nested loops are too costly

I have an sql query like this:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w
left join c on c.id = w.id
left join b on b.id = w.id
left join a on a.id = w.id and a.date > sysdate-30
left join d on d.id = w.id
where w.id = '12345';
And its plan:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 849 |18896868| 00:01:14 |
| 1 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 2 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 3 | NESTED LOOPS OUTER | | 1 | 670 |18896868| 00:01:14 |
| 4 | NESTED LOOPS OUTER | | 1 | 596 |18896868| 00:01:14 |
| 5 | TABLE ACCESS STORAGE FULL | w | 1 | 415 | 20 | 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | c | 1 | 22 | 3 | 00:00:01 |
| 7 | INDEX UNIQUE SCAN |c_id_nd| 1 | | | 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | b | 1 | 66 | 2 | 00:00:01 |
| 9 | INDEX UNIQUE SCAN |b_id_nd| 1 | | | 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | a | 1 | 11 | 3 | 00:00:01 |
| 11 | INDEX UNIQUE |a_id_nd| 1 | | | 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | d | 1 | 25 | 1 | 00:00:01 |
| 13 | INDEX UNIQUE |d_id_nd| 1 | | | 00:00:01 |
-----------------------------------------------------------------------------------
It currently runs for about 15-18 seconds, which is too long. I am new to tuning and I don't know how to improve its performance. All the tables have about 33-54 million rows and all id columns are indexed. Statistics have been gathered for the tables, and I'm not able to use the parallel hint.
What optimizations can I do?
For this query:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w left join
c
on c.id = w.id left join
b
on b.id = w.id left join
a
on a.id = w.id and a.date > sysdate-30 left join
d
on d.id = w.id
where w.id = '12345';
You want indexes on w(id), c(id), b(id), a(id, date), and d(id).
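To illustrate the composite index on a, here is a rough sqlite3 sketch. It is not Oracle and its planner behaves differently, so treat it only as a way to see the idea: the column is named `dt` instead of `date`, and the index name `idx_a_id_dt` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id TEXT, dt TEXT, product TEXT);
    -- Composite index: equality column first, range column second.
    CREATE INDEX idx_a_id_dt ON a (id, dt);
""")

# Ask SQLite how it would run the lookup; the last column of each
# plan row holds the human-readable description.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT product FROM a WHERE id = ? AND dt > ?",
    ("12345", "2024-01-01"),
).fetchall()
plan_text = " | ".join(str(row[-1]) for row in plan_rows)

print(plan_text)
```

Putting the equality column before the range column lets one index seek cover both conditions. In Oracle you would check the same thing with explain plan.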
I guess there is nothing wrong with your query. I'd think a bad execution plan was generated initially and is still sitting in the cache. You can rewrite the query in a different way and you'll probably get a better plan (e.g. with CTEs). You can also try to filter by id before joining. Try something like this:
with
w1 as (select id, name from w where w.id = '12345')
,c1 as (select id, address from c where c.id = '12345')
,b1 as (select id, salary from b where b.id = '12345')
,a1 as (select id, product from a where a.id = '12345' and a.date > sysdate - 30)
,d1 as (select id, contract_amount from d where d.id = '12345')
select w1.name, c1.address, b1.salary, a1.product, d1.contract_amount
from w1
left join c1 on c1.id = w1.id
left join b1 on b1.id = w1.id
left join a1 on a1.id = w1.id
left join d1 on d1.id = w1.id
Or this:
with
W1 as (select w.id, w.name from w where w.id = '12345')
,W2 as (select w1.* , c.address from W1 left outer join C on w1.id = c.id)
,W3 as (select w2.*, b.salary from W2 left outer join B on w2.id = b.id)
,W4 as (select w3.*, a.product from W3 left outer join A on w3.id = a.id and a.date > sysdate - 30)
Select w4.*, d.contract_amount from W4 left outer join D on w4.id = d.id
With 35 million records in the tables: are the tables partitioned? If so, does the query ensure partition pruning?
I think the problem lies in the cardinality estimator. With multiple left joins from what are probably master tables to detail tables, the optimizer makes a wrong assumption about the number of rows returned, and a poor cardinality estimate may lead to a poor plan. I suggest trying isolated selects as proposed by Mike and comparing timings. I am not sure how well CTEs perform in Oracle, so I would recommend isolated statements even if you have to use temp or memory tables: select from each table alone using your id value, put the results into a temporary table, then perform the final select on those temporary tables.

Postgres group by columns and within group select other columns by max aggregate

This is probably a standard problem, and I've keyed off some other greatest-n-per-group answers, but so far been unable to resolve my current problem.
A B C
+----+-------+ +----+------+ +----+------+-------+
| id | start | | id | a_id | | id | b_id | name |
+----+-------+ +----+------+ +----+------+-------+
| 1 | 1 | | 1 | 1 | | 1 | 1 | aname |
| 2 | 2 | | 2 | 1 | | 2 | 2 | aname |
+----+-------+ | 3 | 2 | | 3 | 3 | aname |
+----+------+ | 4 | 3 | bname |
+----+------+-------+
In English what I'd like to accomplish is:
For each c.name, select its newest entry based on the start time in a.start
The SQL I've tried is the following:
SELECT a.id, a.start, c.id, c.name
FROM a
INNER JOIN (
SELECT id, MAX(start) as start
FROM a
GROUP BY id
) a2 ON a.id = a2.id AND a.start = a2.start
JOIN b
ON a.id = b.a_id
JOIN c
on b.id = c.b_id
GROUP BY c.name;
It fails with errors such as:
ERROR: column "a.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8
To be useful I really need the ids from the query, but cannot group on them since they are unique. Here is an example of output I'd love for the first case above:
+------+---------+------+--------+
| a.id | a.start | c.id | c.name |
+------+---------+------+--------+
| 2 | 2 | 3 | aname |
| 2 | 2 | 4 | bname |
+------+---------+------+--------+
Here is a Sqlfiddle
Edit - removed second case
Case 1
select distinct on (c.name)
a.id, a.start, c.id, c.name
from
a
inner join
b on a.id = b.a_id
inner join
c on b.id = c.b_id
order by c.name, a.start desc
;
id | start | id | name
----+-------+----+-------
2 | 2 | 3 | aname
2 | 2 | 4 | bname
Case 2
select distinct on (c.name)
a.id, a.start, c.id, c.name
from
a
inner join
b on a.id = b.a_id
inner join
c on b.id = c.b_id
where
b.a_id in (
select a_id
from b
group by a_id
having count(*) > 1
)
order by c.name, a.start desc
;
id | start | id | name
----+-------+----+-------
1 | 1 | 1 | aname
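DISTINCT ON is Postgres-specific. If a portable form of Case 1 is needed, the same "newest row per c.name" result can be had with ROW_NUMBER(). The sketch below is a swapped-in technique, not the answer's query, and it assumes a database with window functions (here SQLite 3.25 or later via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, start INTEGER);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (id INTEGER, b_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 1), (2, 2);
    INSERT INTO b VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO c VALUES (1, 1, 'aname'), (2, 2, 'aname'),
                         (3, 3, 'aname'), (4, 3, 'bname');
""")

# Number the rows within each name, newest start first, and keep row 1.
newest = conn.execute("""
    SELECT a_id, start, c_id, name
    FROM (
        SELECT a.id AS a_id, a.start AS start, c.id AS c_id, c.name AS name,
               ROW_NUMBER() OVER (
                   PARTITION BY c.name ORDER BY a.start DESC
               ) AS rn
        FROM a
        JOIN b ON a.id = b.a_id
        JOIN c ON b.id = c.b_id
    ) ranked
    WHERE rn = 1
    ORDER BY name
""").fetchall()

print(newest)
```

As with DISTINCT ON, ties on a.start within a name are broken arbitrarily unless more ORDER BY columns are added.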

Left Outer join with same table as part of the outer join

I was wondering how I could go about writing an outer join query to get the required output (described below),
where the tables I am outer joining are also part of the other join conditions in the statement,
given the following datastructure where
- Table A is the main table containing some arbitrary objects
- Table B is referenced by A where A.TYPE_ID = B.ID
- Table C defines relations between the objects in Table A, where C.SOURCE_ID references A.ID and C.TARGET_ID references A.ID
This is how the schema is defined and I can't do anything about it (it's a legacy system)
TABLE_A
---------------------------
| ID | TYPE_ID | Name |
|-------------------------|
| 1 | 1 | Name 1 |
| 2 | 2 | Name 2 |
| 3 | 1 | Name 3 |
| 4 | 1 | Name 4 |
| 5 | 3 | Name 5 |
|-------------------------|
TABLE_B
----------------------
| ID | TYPE_NAME |
|--------------------|
| 1 | Type 1 |
| 2 | Type 2 |
| 3 | Type 3 |
| 4 | Type 4 |
|--------------------|
TABLE_C
-------------------------------
| PK | SOURCE_ID | TARGET_ID |
|-----------------------------|
| 11 | 2 | 1 |
| 12 | 2 | 3 |
| 13 | 5 | 1 |
| 13 | 5 | 4 |
-------------------------------
What I would like to get is all the objects in Table A of "Type 1", each with the name of the "Type 2" object it is associated with (null otherwise),
i.e. an outer join to get all the objects of Type 1 regardless of whether they have an association, but if they do then I need the name of the object.
Note that objects of Type 1 will always be the TARGET in the relationship.
The output for the above example would be
-------------------------------
| Target Name | Source Name |
|-----------------------------|
| Name 1 | Name 2 |
| Name 3 | Name 2 |
| Name 4 | (NULL) |
|-----------------------------|
My original join query is below (I couldn't get the outer join to work). This is the normal join, which does not show objects with no associations.
select atrgt.NAME, asrc.NAME
from TABLE_A atrgt
JOIN TABLE_B trgttype on atrgt.TYPE_ID = trgttype.ID
and trgttype.TYPE_NAME = 'Type 1'
JOIN TABLE_C assoc on atrgt.ID = assoc.TARGET_ID
JOIN TABLE_A asrc on asrc.ID = assoc.SOURCE_ID
JOIN TABLE_B srctype on asrc.TYPE_ID = srctype.ID
and srctype.TYPE_NAME = 'Type 2'
Basically in these situations I think the best approach is to subdivide the query into two normal joins, then do the outer join between those results sets. If you think of SQL as procedural code, you may think it looks inefficient, but the query optimizer will not necessarily run the two subjoins independently.
You didn't say what RDBMS you are using. In Oracle I would probably write it like this:
with
src_type_2 as (
select c.target_id, a.name
from table_c c
join table_a a on a.id = c.source_id
join table_b b on b.id = a.type_id
where b.type_name = 'Type 2'
),
all_type_1 as (
select a.id, a.name
from table_a a
join table_b b on b.id = a.type_id
where b.type_name = 'Type 1'
)
select tgt.name, src.name
from all_type_1 tgt
left join src_type_2 src on src.target_id = tgt.id
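The "two normal joins, then an outer join between them" idea can be tried on the question's sample data. This is a sketch using sqlite3 instead of Oracle, with table aliases written out explicitly and an ORDER BY added so the output is stable. The pk column carries no constraint because the sample data repeats the value 13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, type_id INTEGER, name TEXT);
    CREATE TABLE table_b (id INTEGER, type_name TEXT);
    CREATE TABLE table_c (pk INTEGER, source_id INTEGER, target_id INTEGER);
    INSERT INTO table_a VALUES (1, 1, 'Name 1'), (2, 2, 'Name 2'),
                               (3, 1, 'Name 3'), (4, 1, 'Name 4'),
                               (5, 3, 'Name 5');
    INSERT INTO table_b VALUES (1, 'Type 1'), (2, 'Type 2'),
                               (3, 'Type 3'), (4, 'Type 4');
    INSERT INTO table_c VALUES (11, 2, 1), (12, 2, 3), (13, 5, 1), (13, 5, 4);
""")

pairs = conn.execute("""
    WITH src_type_2 AS (
        -- associations whose source object is of Type 2
        SELECT c.target_id, a.name
        FROM table_c c
        JOIN table_a a ON a.id = c.source_id
        JOIN table_b b ON b.id = a.type_id
        WHERE b.type_name = 'Type 2'
    ),
    all_type_1 AS (
        -- every object of Type 1, associated or not
        SELECT a.id, a.name
        FROM table_a a
        JOIN table_b b ON b.id = a.type_id
        WHERE b.type_name = 'Type 1'
    )
    SELECT tgt.name, src.name
    FROM all_type_1 tgt
    LEFT JOIN src_type_2 src ON src.target_id = tgt.id
    ORDER BY tgt.id
""").fetchall()

print(pairs)
```

Name 4 is only associated with a Type 3 object, so it keeps its row and gets a NULL source name.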
Try
select atrgt.NAME, asrc.NAME
from TABLE_A atrgt
JOIN TABLE_B trgttype on atrgt.TYPE_ID = trgttype.ID
and trgttype.TYPE_NAME = 'Type 1'
LEFT JOIN (
TABLE_C assoc
JOIN TABLE_A asrc on asrc.ID = assoc.SOURCE_ID
JOIN TABLE_B srctype on asrc.TYPE_ID = srctype.ID
and srctype.TYPE_NAME = 'Type 2'
) on atrgt.ID = assoc.TARGET_ID
I think this should work:
SELECT
TGT.NAME, SRC.NAME
FROM TABLE_A TGT
JOIN TABLE_B TGT_TYPE ON TGT.TYPE_ID = TGT_TYPE.ID
LEFT JOIN TABLE_C REL ON TGT.ID = REL.TARGET_ID
LEFT JOIN TABLE_A SRC ON REL.SOURCE_ID = SRC.ID
LEFT JOIN TABLE_B SRC_TYPE ON SRC_TYPE.ID = SRC.TYPE_ID
WHERE TGT_TYPE.TYPE_NAME = 'Type 1' AND COALESCE(SRC_TYPE.TYPE_NAME, 'Type 2') = 'Type 2'
If you're using Oracle, you could replace the COALESCE with NVL(SRC_TYPE.TYPE_NAME, 'Type 2').