Simplifying a query with correlated subquery to simple joins


I need help simplifying the query below.
It checks for a count of 0 without using GROUP BY / HAVING COUNT, but it does so with a correlated subquery.
Now I've been asked to rewrite it using simple joins.
I tried merging everything into a single query, but the output differs.
Could you suggest another way to simplify this query, which checks for a count of 0?
select distinct tab1.col1
from tab1
where tab1.col2 = 'A'
And 0 = (select count(tab2.col1)
from tab2
where tab2.col2 = 'B'
and tab2.col1 = tab1.col1)

This sort of thing would normally be written as a NOT EXISTS:
SELECT distinct tab1.col1
FROM tab1
WHERE tab1.col2 = 'A'
AND NOT EXISTS(
SELECT 1
FROM tab2
WHERE tab2.col2 = 'B'
AND tab2.col1 = tab1.col1 )
However, you could also write it as an outer join with GROUP BY (using Oracle's (+) syntax):
SELECT tab1.col1, count(tab2.col1)
FROM (SELECT * FROM tab1 WHERE col2 = 'A') tab1,
(SELECT * FROM tab2 WHERE col2 = 'B') tab2
WHERE tab1.col1 = tab2.col1(+)
GROUP BY tab1.col1
HAVING count(tab2.col1) = 0

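Try some of these.
If col1 is declared NOT NULL, the first two queries get the same execution plan (an anti-join). If tab2.col1 can be NULL, NOT IN behaves differently: a single NULL returned by the subquery makes the query return no rows at all. The second alternative is the one I'd recommend, since it matches your requirement most directly.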
-- Non-correlated subquery
select distinct col1
from tab1
where col2 = 'A'
and col1 not in(select col1
from tab2
where col2 = 'B');
-- Correlated subquery
select distinct col1
from tab1
where col2 = 'A'
and not exists(select 'x'
from tab2
where tab2.col2 = 'B'
and tab2.col1 = tab1.col1);
-- Using join
select distinct tab1.col1
from tab1
left join tab2 on(tab2.col2 = 'B' and tab2.col1 = tab1.col1)
where tab1.col2 = 'A'
and tab2.col1 is null;
-- Using aggregation
select tab1.col1
from tab1
left join tab2 on(tab2.col2 = 'B' and tab2.col1 = tab1.col1)
where tab1.col2 = 'A'
group by tab1.col1
having count(tab2.col2) = 0;
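
The difference between these alternatives only shows up when tab2.col1 is nullable. The sketch below is not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the sample rows are invented for illustration. It runs the NOT IN, NOT EXISTS and LEFT JOIN variants against a tab2 that contains a NULL in col1.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here; the rows are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE tab2 (col1 INTEGER, col2 TEXT);
    INSERT INTO tab1 VALUES (1, 'A'), (2, 'A'), (3, 'A'), (4, 'X');
    -- Note the NULL key among the 'B' rows of tab2.
    INSERT INTO tab2 VALUES (1, 'B'), (2, 'C'), (NULL, 'B');
""")

def run(sql):
    """Return the sorted col1 values produced by a query."""
    return sorted(row[0] for row in conn.execute(sql))

not_in = run("""
    SELECT DISTINCT col1 FROM tab1
    WHERE col2 = 'A'
      AND col1 NOT IN (SELECT col1 FROM tab2 WHERE col2 = 'B')
""")
not_exists = run("""
    SELECT DISTINCT col1 FROM tab1
    WHERE col2 = 'A'
      AND NOT EXISTS (SELECT 'x' FROM tab2
                      WHERE tab2.col2 = 'B' AND tab2.col1 = tab1.col1)
""")
left_join = run("""
    SELECT DISTINCT tab1.col1 FROM tab1
    LEFT JOIN tab2 ON tab2.col2 = 'B' AND tab2.col1 = tab1.col1
    WHERE tab1.col2 = 'A' AND tab2.col1 IS NULL
""")

print("NOT IN:    ", not_in)      # empty: the NULL makes every comparison unknown
print("NOT EXISTS:", not_exists)  # keys 2 and 3 have no 'B' row in tab2
print("LEFT JOIN: ", left_join)
```

With the NULL row removed, or with tab2.col1 declared NOT NULL, all three variants return the same keys.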

Related

Possible way to rewrite multiple NOT EXISTS clauses in a SQL query?

I have this SQL statement with many NOT EXISTS clauses. Is there a way to rewrite the conditions and avoid scanning the same table repeatedly?
select col1,
col2,....,colN
from tab1
join <some join conditions> tab3
where not exists (select null
from tab2 p
where <some conditions eg: name = 'ABC'>
and tab1.some_col = p.some_col)
and not exists (select null
from tab2 p
where <some conditions eg: last_name = 'XYZ'>
and tab1.some_col = p.some_col)
and not exists (select null
from tab2 p
where <some conditions eg: country = 'PQR'>
and tab1.some_col = p.some_col)
and not exists (select null
from tab2 p
where <similar conditions>
and tab1.some_col = p.some_col)
and not exists (select null
from tab2 p
where <similar conditions>
and tab1.some_col = p.some_col);
The real query has more NOT EXISTS clauses of the same form. Since every one of them checks the same table, is there a way to combine them into a single subquery?

You may OR together the various conditions:
SELECT col1, col2, ..., colN
FROM tab1 t1
INNER JOIN tab3 t3
<join conditions>
WHERE NOT EXISTS (SELECT 1
FROM tab2 p
WHERE
(name = 'ABC' OR
last_name = 'XYZ' OR
country = 'PQR') AND
t1.some_col = p.some_col);
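
The rewrite works because "no row matches condition 1, and no row matches condition 2, ..." is the same as "no row matches condition 1 OR condition 2 OR ...". A minimal sketch to check this, using Python's sqlite3 module rather than the asker's database; the tab3 join is left out and the sample rows are assumptions made up for the demo.

```python
import sqlite3

# Invented sample data; only the shape (tab1.some_col vs tab2.some_col) follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (some_col INTEGER);
    CREATE TABLE tab2 (some_col INTEGER, name TEXT, last_name TEXT, country TEXT);
    INSERT INTO tab1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO tab2 VALUES
        (1, 'ABC', 'x',   'y'),
        (2, 'n',   'XYZ', 'y'),
        (3, 'n',   'x',   'PQR'),
        (4, 'n',   'x',   'y');
""")

def run(sql):
    """Return the sorted some_col values produced by a query."""
    return sorted(row[0] for row in conn.execute(sql))

# One NOT EXISTS per condition, as in the question.
separate = run("""
    SELECT t1.some_col FROM tab1 t1
    WHERE NOT EXISTS (SELECT 1 FROM tab2 p
                      WHERE p.name = 'ABC' AND t1.some_col = p.some_col)
      AND NOT EXISTS (SELECT 1 FROM tab2 p
                      WHERE p.last_name = 'XYZ' AND t1.some_col = p.some_col)
      AND NOT EXISTS (SELECT 1 FROM tab2 p
                      WHERE p.country = 'PQR' AND t1.some_col = p.some_col)
""")

# A single NOT EXISTS with the conditions ORed together.
combined = run("""
    SELECT t1.some_col FROM tab1 t1
    WHERE NOT EXISTS (SELECT 1 FROM tab2 p
                      WHERE (p.name = 'ABC' OR
                             p.last_name = 'XYZ' OR
                             p.country = 'PQR')
                        AND t1.some_col = p.some_col)
""")

print(separate, combined)
```

Key 4 survives because its tab2 row matches none of the conditions, and key 5 survives because it has no tab2 row at all.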

If the conditions all test the same column, you can use IN ('ABC', 'XYZ', 'PQR', ...) as below:
select col1,
col2,....,colN
from tab1
join <some join conditions>
where not exists (select null
from tab2 p
where <some conditions eg: name in ('ABC','XYZ','PQR')>
and tab1.some_col = p.some_col
)

SQL code performance

I have a SQL query which takes a lot of time to execute.
It goes like this
select
columns
from
tab1
where
tab1.id in (select col from tab2 where conditions) --32000 rows
or
tab1.id in (select col from tab3 where conditions) ---14000 rows
or
tab1.id in (select col from tab4 where conditions) --6000 rows
Is there any way I can increase the performance here?
I've tried using EXISTS() too but that did not help.

Oracle is usually good at optimizing queries that use IN with a subquery. Your best bet is adding indexes. However, the query as posted is not detailed enough to suggest particular indexes; you need to show the actual WHERE clauses.

option1:
select
columns
from
tab1
where
tab1.id in (select col from tab2 where conditions --32000 rows
union all
select col from tab3 where conditions ---14000 rows
union all
select col from tab4 where conditions --6000 rows
);
option2:
select
columns
from
tab1
inner join (select distinct col
from (select col from tab2 where conditions --32000 rows
union all
select col from tab3 where conditions ---14000 rows
union all
select col from tab4 where conditions --6000 rows
)
) x
on tab1.id = x.col;
option3:
select
columns
from
tab1
where
exists (select col from tab2 where conditions --??? rows
and col = tab1.id
union all
select col from tab3 where conditions ---??? rows
and col = tab1.id
union all
select col from tab4 where conditions --??? rows
and col = tab1.id
);
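
Before comparing timings, it is worth confirming that the rewrites return the same rows as the original. A small sketch using Python's sqlite3 module as a stand-in for Oracle; the tables are tiny, the elided "conditions" are simply omitted, and the sample ids are invented.

```python
import sqlite3

# Invented data: tab4 deliberately contains a duplicate value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER);
    CREATE TABLE tab2 (col INTEGER);
    CREATE TABLE tab3 (col INTEGER);
    CREATE TABLE tab4 (col INTEGER);
    INSERT INTO tab1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO tab2 VALUES (1), (2);
    INSERT INTO tab3 VALUES (2), (3);
    INSERT INTO tab4 VALUES (3), (3);
""")

def ids(sql):
    """Return the sorted ids produced by a query (duplicates preserved)."""
    return sorted(row[0] for row in conn.execute(sql))

original = ids("""
    SELECT id FROM tab1
    WHERE id IN (SELECT col FROM tab2)
       OR id IN (SELECT col FROM tab3)
       OR id IN (SELECT col FROM tab4)
""")
option1 = ids("""
    SELECT id FROM tab1
    WHERE id IN (SELECT col FROM tab2
                 UNION ALL SELECT col FROM tab3
                 UNION ALL SELECT col FROM tab4)
""")
# The DISTINCT keeps the join from multiplying tab1 rows.
option2 = ids("""
    SELECT tab1.id FROM tab1
    INNER JOIN (SELECT DISTINCT col
                FROM (SELECT col FROM tab2
                      UNION ALL SELECT col FROM tab3
                      UNION ALL SELECT col FROM tab4)) x
            ON tab1.id = x.col
""")
option3 = ids("""
    SELECT id FROM tab1
    WHERE EXISTS (SELECT col FROM tab2 WHERE col = tab1.id
                  UNION ALL SELECT col FROM tab3 WHERE col = tab1.id
                  UNION ALL SELECT col FROM tab4 WHERE col = tab1.id)
""")

print(original, option1, option2, option3)
```

Which form is fastest depends on the optimizer, the indexes and the real conditions, so this only checks equivalence, not performance.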

Joining with max date from table

SELECT COL1,
COL2,
COL3
FROM TABLE1,
TABLE2,
TABLE3,
TABLE4
WHERE TABLE1.KEY1 = TABLE2.KEY1
AND TABLE2.KEY = TABLE3.KEY
AND TABLE2.FILTER = 'Y'
AND TABLE3.FILTER = 'Y'
AND TABLE2.KEY = TABLE3.KEY
AND TABLE3.KEY = TABLE4.KEY
I have a query like the one above and need to modify it. TABLE3 has a date column, and I need to join only to the row with the latest date. For example, if 4 rows from TABLE3 satisfy the join, I need to pick the one with the highest date, join on that row, and then show the result.
I hope the question is clear. The database is Oracle 10g.

Try something like this query.
SELECT
COL1,
COL2,
COL3,
T33.*
FROM TABLE1
JOIN TABLE2 ON TABLE1.KEY1 = TABLE2.KEY1
JOIN TABLE4 ON TABLE2.KEY = TABLE4.KEY
JOIN
(
SELECT MAX(T.Day) as DT, T.KEY
FROM TABLE3 T
WHERE T.FILTER = 'Y'
GROUP BY T.KEY
) T3 on TABLE4.KEY = T3.KEY
JOIN TABLE3 T33 ON T3.KEY = T33.KEY AND T3.DT = T33.Day AND T33.FILTER = 'Y'
WHERE
TABLE2.FILTER = 'Y'
The main idea is that instead of joining to TABLE3 directly, you first build this:
SELECT MAX(T.Day) as DT, T.KEY
FROM TABLE3 T
WHERE T.FILTER = 'Y'
GROUP BY T.KEY
Give that derived table a name (T3 above) and join to it instead. Then join again to the original TABLE3 (see T33) to pull in the other columns you need from TABLE3 that are not present in T3.
You can work out the remaining details from there.

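To minimally modify your current query, you can add a condition to your WHERE clause. The subquery has to be correlated on the join key; otherwise it returns the latest date in the whole table rather than the latest date for each key. (DATE stands in for your date column here; DATE itself is a reserved word in Oracle.)
AND TABLE3.DATE = (SELECT MAX(T.DATE) FROM TABLE3 T WHERE T.FILTER = 'Y' AND T.KEY = TABLE3.KEY)
In future, though, I recommend using explicit JOINs:
SELECT COL1,
COL2,
COL3
FROM TABLE1
INNER JOIN TABLE2 ON TABLE1.KEY1 = TABLE2.KEY1
INNER JOIN TABLE3 ON TABLE2.KEY = TABLE3.KEY
INNER JOIN TABLE4 ON TABLE3.KEY = TABLE4.KEY
WHERE
TABLE2.FILTER = 'Y'
AND TABLE3.FILTER = 'Y'
AND TABLE3.DATE = (SELECT MAX(T.DATE) FROM TABLE3 T WHERE T.FILTER = 'Y' AND T.KEY = TABLE3.KEY)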
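
Both approaches (a derived table holding the latest date per key, and a subquery that compares against the latest date) can be checked on a toy data set. The sketch below uses Python's sqlite3 module instead of Oracle 10g. The table and column names (table2, table3, k, flag, day, val) and the rows are hypothetical, chosen only to avoid reserved words. It also shows what happens when the subquery is not correlated on the key.

```python
import sqlite3

# Hypothetical miniature of TABLE2/TABLE3; 'flag' plays the role of FILTER.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (k INTEGER, flag TEXT);
    CREATE TABLE table3 (k INTEGER, flag TEXT, day TEXT, val TEXT);
    INSERT INTO table2 VALUES (1, 'Y'), (2, 'Y');
    INSERT INTO table3 VALUES
        (1, 'Y', '2020-01-01', 'a'),
        (1, 'Y', '2020-03-01', 'b'),
        (1, 'N', '2020-05-01', 'c'),
        (2, 'Y', '2020-02-01', 'd'),
        (2, 'Y', '2020-01-15', 'e');
""")

# Derived table with the latest day per key, joined back for the other columns.
derived = conn.execute("""
    SELECT t2.k, t33.day, t33.val
    FROM table2 t2
    JOIN (SELECT k, MAX(day) AS dt FROM table3
          WHERE flag = 'Y' GROUP BY k) t3 ON t3.k = t2.k
    JOIN table3 t33 ON t33.k = t3.k AND t33.day = t3.dt AND t33.flag = 'Y'
    WHERE t2.flag = 'Y'
    ORDER BY t2.k
""").fetchall()

# Subquery correlated on the key: latest day for each key.
correlated = conn.execute("""
    SELECT t2.k, t3.day, t3.val
    FROM table2 t2
    JOIN table3 t3 ON t3.k = t2.k
    WHERE t2.flag = 'Y' AND t3.flag = 'Y'
      AND t3.day = (SELECT MAX(t.day) FROM table3 t
                    WHERE t.flag = 'Y' AND t.k = t3.k)
    ORDER BY t2.k
""").fetchall()

# Uncorrelated subquery: latest day in the whole table, so other keys drop out.
uncorrelated = conn.execute("""
    SELECT t2.k, t3.day, t3.val
    FROM table2 t2
    JOIN table3 t3 ON t3.k = t2.k
    WHERE t2.flag = 'Y' AND t3.flag = 'Y'
      AND t3.day = (SELECT MAX(day) FROM table3 WHERE flag = 'Y')
    ORDER BY t2.k
""").fetchall()

print(derived)
print(correlated)
print(uncorrelated)
```

The first two queries return one row per key. The uncorrelated version keeps only key 1, because key 2 has no row on the overall latest date.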

Get rows from the first tables even if they don't exist in the last table in a join without left joining

I'm working with an Oracle database, and I have a query that has to join 7 different tables.
My problem is that I need the rows that satisfy all the join conditions (obviously), but I also need the rows from the first 6 tables even when they don't match the conditions of the last join.
I can't do a left outer join, so what alternatives do I have?
The code looks something like this:
with
tmp as (select col1, col2, col3, col4, row_number() over (partition by col1 order by col2 desc) rn
from
(select /*+ MATERIALIZE */
col1, col2, col3, col4
from
table1
where
col3 in ('A','R','F') and
somedate >= sysdate-720 and
col5 is null
and col1<> '0000000000'))
select /*+ use_hash(a,b,c,d,e,f,g,h) */
b.col5,
a.col1,
d.col6,
e.col7,
c.col8 ,
(CASE when f.col9= 'B' then 'Foo' else 'Bar' END) as "col9",
a.col2,
a.col3,
h.col10
from tmp a
join table2 b on
a.col1= b.col1 and
a.col4=b.col4 and
b.col11='P' and
(b.otherDate>= sysdate OR b.otherDate is null) and
b.col5 is null
join table3 c on
b.col12 = c.col12 and
(c.otherDate is null or b.otherDate >= sysdate) and
c.col5 is null
join table4 d on
a.col1= d.col1 and
d.col13 in ('R','A','F') and
d.col5 is null
join table5 e on
e.col1=b.col1 and
e.col14=d.col14 and
d.col6=e.col6 and
d.col15 = e.col15 and
e.col5 is null
join table6 f on
f.col4= a.col4 and
f.col5 is null
join table7 g on
g.col16= case when f.col15 is null then null else f.col15 end
and g.col5 is null
and (g.otherDate is null or g.otherDate >= sysdate)
join table8 h on
h.col17= g.col17
and (h.otherDate >= sysdate or h.otherDate is null)
and h.col5 is null
and a.rn=1;

I'm not going to attempt to work with your actual query, but in principle you could change:
select tab1.col1, tab2.col2, tab3.col3
from tab1
join tab2 on tab2.fk = tab1.pk
join tab3 on tab3.fk = tab2.pk
into:
select tab1.col1, tab2.col2, tab3.col3
from tab1
join tab2 on tab2.fk = tab1.pk
left join tab3 on tab3.fk = tab2.pk
which you could replace (in your outer-joins-not-allowed world) with:
with tmp as (
select tab1.col1, tab2.col2, tab2.pk
from tab1
join tab2 on tab2.fk = tab1.pk
)
select tmp.col1, tmp.col2, tab3.col3
from tmp
join tab3 on tab3.fk = tmp.pk
union all
select tmp.col1, tmp.col2, null as col3
from tmp
where not exists (
select null from tab3
where tab3.fk = tmp.pk
)
This is quite ugly. I've minimised the repetition with a CTE, but it's still not nice, and it is likely to perform worse than the outer join would.
Of course, without knowing why you can't use an outer join, I don't know if there are other restrictions that would make this approach unacceptable as well...
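
To see that the UNION ALL plus NOT EXISTS form returns the same rows as the outer join, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle. The column layout (pk and fk) follows the simplified example above, the CTE carries tab2.pk so that the outer query can join tab3 on it, and the rows are invented.

```python
import sqlite3

# Invented rows: tab2 row 10 has two tab3 children, tab2 row 20 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (pk INTEGER, col1 TEXT);
    CREATE TABLE tab2 (pk INTEGER, fk INTEGER, col2 TEXT);
    CREATE TABLE tab3 (fk INTEGER, col3 TEXT);
    INSERT INTO tab1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO tab2 VALUES (10, 1, 'x'), (20, 2, 'y');
    INSERT INTO tab3 VALUES (10, 'p'), (10, 'q');
""")

def rows(sql):
    """Return all rows in a stable order (repr copes with NULL/None values)."""
    return sorted(conn.execute(sql).fetchall(), key=repr)

outer_join = rows("""
    SELECT tab1.col1, tab2.col2, tab3.col3
    FROM tab1
    JOIN tab2 ON tab2.fk = tab1.pk
    LEFT JOIN tab3 ON tab3.fk = tab2.pk
""")

emulated = rows("""
    WITH tmp AS (
        SELECT tab1.col1, tab2.col2, tab2.pk
        FROM tab1
        JOIN tab2 ON tab2.fk = tab1.pk
    )
    -- rows that do have a match in tab3
    SELECT tmp.col1, tmp.col2, tab3.col3
    FROM tmp
    JOIN tab3 ON tab3.fk = tmp.pk
    UNION ALL
    -- rows without a match, padded with NULL
    SELECT tmp.col1, tmp.col2, NULL AS col3
    FROM tmp
    WHERE NOT EXISTS (SELECT NULL FROM tab3 WHERE tab3.fk = tmp.pk)
""")

print(outer_join)
print(emulated)
```

UNION ALL is safe here because the two branches can never produce the same tmp row: one requires a match in tab3 and the other requires that there is none.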

Records difference in three tables - MS Access

I have three tables, Tab1, Tab2 and Tab3, with almost the same structure (in MS Access). Tab2 and Tab3 have a few more columns than Tab1.
Tab2 and Tab3 have exactly the same structure. The joining keys are:
col1
col2
col3
Basically, the records in Tab1 should tally with those in Tab2 and Tab3 together.
What would be an efficient way to find the records that are missing from Tab2 and Tab3 when compared to Tab1?
I'd appreciate your response.

If you only care about the keys, here is a good approach:
select col, sum(isTab1) as numTab1, sum(isTab2) as numTab2, sum(isTab3) as numTab3
from ((select col1 as col, 1 as isTab1, 0 as isTab2, 0 as isTab3 from tab1
) union all
(select col2, 0 as isTab1, 1 as isTab2, 0 as isTab3 from tab2
) union all
(select col3, 0 as isTab1, 0 as isTab2, 1 as isTab3 from tab3
)
) t
group by col
having sum(isTab1)*sum(isTab2)*sum(isTab3) <> 1
This returns each of the key values and tells you which tables they are in and not in, for keys that are not in all three tables. As a bonus, this will also tell you if any of the tables have duplicate keys.
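
A small runnable sketch of this presence-flag technique, using Python's sqlite3 module rather than MS Access. For simplicity it assumes a single key column named k in every table (a hypothetical name), and the rows are invented so that each case appears once.

```python
import sqlite3

# Invented keys: 1 is in all three tables exactly once, 2 is duplicated in tab2
# and missing from tab3, 3 is only in tab1, 4 is only in tab3, and 5 is in all
# three but duplicated in tab2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (k INTEGER);
    CREATE TABLE tab2 (k INTEGER);
    CREATE TABLE tab3 (k INTEGER);
    INSERT INTO tab1 VALUES (1), (2), (3), (5);
    INSERT INTO tab2 VALUES (1), (2), (2), (5), (5);
    INSERT INTO tab3 VALUES (1), (4), (5);
""")

report = conn.execute("""
    SELECT k,
           SUM(isTab1) AS numTab1,
           SUM(isTab2) AS numTab2,
           SUM(isTab3) AS numTab3
    FROM (SELECT k, 1 AS isTab1, 0 AS isTab2, 0 AS isTab3 FROM tab1
          UNION ALL
          SELECT k, 0, 1, 0 FROM tab2
          UNION ALL
          SELECT k, 0, 0, 1 FROM tab3) t
    GROUP BY k
    -- keep every key that is not present exactly once in each table
    HAVING SUM(isTab1) * SUM(isTab2) * SUM(isTab3) <> 1
    ORDER BY k
""").fetchall()

for row in report:
    print(row)
```

Key 1 is absent from the report because it appears exactly once in every table. Key 5 is reported even though it exists everywhere, which is the duplicate detection mentioned above.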

Usually you would SELECT FROM tab1, LEFT JOINed to tab2 and tab3 (which are themselves LEFT JOINed together). That way you get ALL records from tab1. Where records are missing in tab2 and tab3, their columns come back as nulls, which you can check for in the WHERE clause.
So the query would look similar to this one (note the brackets, which MS Access requires):
SELECT * FROM
tab1 LEFT JOIN (tab2 LEFT JOIN tab3 ON tab2.col1 = tab3.col1 AND tab2.col2 = tab3.col2)
ON tab1.col1 = tab2.col1 AND tab1.col2 = tab2.col2
WHERE tab2.col1 Is Null;
