Microsoft SQL Server performance: OR in ON clause [duplicate]

I need to join two tables, matching the field 'cdi' from the 2nd table against either cdi or cd_cliente from the 1st table. That is, b.cdi might match a.cdi, or it might match a.cd_cliente.
My original query was
select
a.cd_cliente, a.cdi as cdi_cli,b.*
from
clientes a
left join
rightTable b on a.cdi = b.cdi or a.cd_cliente = b.cdi
But since it took too much time, I changed it to:
Select a.cd_cliente, a.cdi, b.*
from clientes a
left join
(select
a.cd_cliente, a.cdi as cdi_cli, b.*
from
clientes a
inner join
rightTable b on a.cdi = b.cdi
union
select
a.cd_cliente, a.cdi as cdi_cli, b.*
from
clientes a
inner join
rightTable b on a.cd_cliente = b.cdi) b
on a.cd_cliente=b.cd_cliente
This took less time, but I'm not sure the results are the same. If they are, why does the 2nd query take considerably less time?

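I'm not sure if the results would be the same. Not necessarily.
Consider a row in clientes that matches a row in rightTable on cdi but matches no row on cd_cliente. The first query returns one row for the match. In the second query the derived table also contains exactly one row for that client (from the first inner join), so the outer left join returns it once, as long as cd_cliente is unique and non-null in clientes. If cd_cliente repeats in clientes, the final join on a.cd_cliente = b.cd_cliente multiplies rows; if it is null, the match is lost and the rightTable columns come back as nulls.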
Also, if the first query returns any legitimate duplicates those will be removed by the union operator in the second query.
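One way to check whether the two forms agree is to run both on a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, with made-up rows and a hypothetical payload column val standing in for b.*. It only illustrates the duplicate-removal effect of UNION and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clientes (cd_cliente INTEGER, cdi INTEGER);
    CREATE TABLE rightTable (cdi INTEGER, val TEXT);  -- val is a hypothetical column
    -- client 1 matches on cdi, client 2 matches on cd_cliente, client 3 matches nothing
    INSERT INTO clientes VALUES (1, 100), (2, 200), (3, 300);
    -- the row (100, 'x') is stored twice on purpose
    INSERT INTO rightTable VALUES (100, 'x'), (100, 'x'), (2, 'y');
""")

# Original form: a single left join with OR in the ON clause.
or_rows = conn.execute("""
    SELECT a.cd_cliente, a.cdi AS cdi_cli, b.cdi, b.val
    FROM clientes a
    LEFT JOIN rightTable b ON a.cdi = b.cdi OR a.cd_cliente = b.cdi
""").fetchall()

# Rewritten form: left join to a UNION of two inner joins.
union_rows = conn.execute("""
    SELECT a.cd_cliente, a.cdi AS cdi_cli, b.b_cdi, b.val
    FROM clientes a
    LEFT JOIN (
        SELECT a.cd_cliente AS cd_cliente, b.cdi AS b_cdi, b.val AS val
        FROM clientes a INNER JOIN rightTable b ON a.cdi = b.cdi
        UNION
        SELECT a.cd_cliente, b.cdi, b.val
        FROM clientes a INNER JOIN rightTable b ON a.cd_cliente = b.cdi
    ) b ON a.cd_cliente = b.cd_cliente
""").fetchall()

# Same distinct rows, but the duplicated rightTable row survives only in the OR form.
print(len(or_rows), len(union_rows))
print(set(or_rows) == set(union_rows))
```

If the two counts differ on your real data, the rewritten query is not a drop-in replacement, and UNION ALL plus an explicit rule for double matches may be closer to what you want.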

SQL Server isn't good with OR and indexes. Not sure why. Your second query is getting around that by (most likely) seeking via indexes twice and then merging them somehow.
There are simpler queries you could try, such as this one:
SELECT
a.cd_cliente,
cdi_cli = a.cdi,
b.*
FROM
dbo.clientes a
OUTER APPLY (
SELECT *
FROM dbo.rightTable b
WHERE a.cdi = b.cdi
UNION
SELECT *
FROM dbo.rightTable b
WHERE a.cd_cliente = b.cdi
) b
;
And here's a weird one that could actually work, though I'm not sure:
SELECT
a.cd_cliente,
cdi_cli = a.cdi,
b.*
FROM
dbo.clientes a
OUTER APPLY (
SELECT *
FROM dbo.rightTable b
WHERE EXISTS (
SELECT 1 WHERE a.cdi = b.cdi
UNION
SELECT 1 WHERE a.cd_cliente = b.cdi
)
) b
;
Told you it was weird! And here's an even weirder (and probably inadvisable) one.
SELECT
a.cd_cliente,
cdi_cli = a.cdi,
BColumn1 = Max(BColumn1),
BColumn2 = Max(BColumn2),
BColumn3 = Max(BColumn3),
BColumn4 = Max(BColumn4)
-- all columns of B
FROM
dbo.clientes a
CROSS APPLY (VALUES
(a.cdi),
(a.cd_cliente)
) c (cdi)
LEFT JOIN dbo.rightTable b
ON c.cdi = b.cdi
GROUP BY
a.cd_cliente,
a.cdi
-- plus any other columns of A used in the select list
;
Given some time to play with your data and indexes and work with execution plans, I'm sure we could come up with something that would really sizzle.

This is your original query:
select a.cd_cliente, a.cdi as cdi_cli,b.*
from clientes a left join
rightTable b
on a.cdi = b.cdi or a.cd_cliente = b.cdi;
The performance problem is due to the or in the on condition. This generally interferes with using indexes.
If you only cared about one column from b, you could do:
select a.cd_cliente, a.cdi as cdi_cli, coalesce(b1.col, b2.col)
from clientes a left join
rightTable b1
on a.cdi = b1.cdi left join
rightTable b2
on a.cd_cliente = b2.cdi;
This easily generalizes to a small handful of columns, but is cumbersome if b is wide.
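As a minimal sketch of the two-left-joins-plus-coalesce idea, the snippet below runs it in Python's built-in sqlite3 module (not SQL Server) on made-up rows. The column name col and the sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clientes (cd_cliente INTEGER, cdi INTEGER);
    CREATE TABLE rightTable (cdi INTEGER, col TEXT);  -- col is a hypothetical column
    INSERT INTO clientes VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO rightTable VALUES (100, 'x'), (2, 'y');
""")

# b1 is probed by cdi, b2 by cd_cliente; COALESCE prefers the cdi match.
rows = conn.execute("""
    SELECT a.cd_cliente, a.cdi AS cdi_cli, COALESCE(b1.col, b2.col) AS col
    FROM clientes a
    LEFT JOIN rightTable b1 ON a.cdi = b1.cdi
    LEFT JOIN rightTable b2 ON a.cd_cliente = b2.cdi
    ORDER BY a.cd_cliente
""").fetchall()

for row in rows:
    print(row)
```

Each join has a single equality predicate, which is the property that lets an index on rightTable(cdi) be used for both probes. Note that if a client matches on both keys, this form returns the combination of both matches with the cdi value winning in COALESCE, which is not the same as the OR join.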
Another way of writing the query would be much more cumbersome. It would start with the b table, double left join to a and then union in the remaining values from a:
select coalesce(a1.cd_cliente, a2.cd_cliente) as cd_cliente,
coalesce(a1.cdi, a2.cdi) as cdi_cli,
b.*
from rightTable b left join
clientes a1
on a1.cdi = b.cdi left join
clientes a2
on a2.cd_cliente = b.cdi
where a1.cdi is not null or a2.cd_cliente is not null
union all
select a.cd_cliente, a.cdi, b.*
from clientes a left join
righttable b
on 1 = 0
where not exists (select 1 from righttable b where a.cdi = b.cdi) and
not exists (select 1 from righttable b where a.cd_cliente = b.cdi)
The first part of the query gets all the matching rows, whichever of the two conditions matched. The second adds in the unmatched rows. Note the strange use of left join with a condition that always evaluates to FALSE. That makes it easy to bring in the columns of b as NULLs.
Although this looks complicated, the joins and not exists subqueries can all take advantage of appropriate indexes on the tables. That means that it should have more reasonable performance.

Related

Most efficient way to join two tables on multiple fields?

I'm working with an Oracle SQL DB and attempting to join 2 tables together. My issue is that there are 3 different dimensions (4 total fields) upon which the two tables may be joined and I'm looking to identify all records where any one of those methods delivers a match and then pull in a certain field from that 2nd table in those instances.
My current plan is as follows:
SELECT a.*,
CASE
WHEN b.field_1 IS NOT NULL THEN b.field_5
WHEN c.field_2 IS NOT NULL THEN c.field_5
WHEN d.field_3 IS NOT NULL THEN c.field_5
END AS match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field_3 AND a.field_4 = d.field_4
I believe this will give me the results I'm looking for, but I imagine this isn't the most efficient way to accomplish that. Any thoughts on a better approach?
[TL;DR] Your query is fine.
You need to use JOINs to correlate the relationships between the four tables.
If you want to be able to include rows from the driving table when there are no rows in the related tables then the join wants to be an OUTER JOIN.
If you put the driving table first then it will be a LEFT OUTER JOIN (or just LEFT JOIN)
You do not have much option on this.
If you want to get the field_5 values then you either want:
SELECT a.*,
b.field_5 AS b_match,
c.field_5 AS c_match,
d.field_5 AS d_match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field_3 AND a.field_4 = d.field_4
If you want all the matches.
Or, you want to use your query:
SELECT a.*,
CASE
WHEN b.field_1 IS NOT NULL THEN b.field_5
WHEN c.field_2 IS NOT NULL THEN c.field_5
WHEN d.field_3 IS NOT NULL THEN c.field_5 -- Should this be d.field_5?
END AS match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field_3 AND a.field_4 = d.field_4
If you want to get a single match in preference order of tables b, c and then d.
If you are using Oracle 12 or later, a third alternative could be to use UNION ALL in a LATERAL join:
SELECT a.*, l.field_5
FROM table_1 a
LEFT OUTER JOIN LATERAL (
SELECT 1 AS priority, b.field_5
FROM table_2 b
WHERE a.field_1 = b.field_1
UNION ALL
SELECT 2 AS priority, c.field_5
FROM table_3 c
WHERE a.field_2 = c.field_2
UNION ALL
SELECT 3 AS priority, d.field_5
FROM table_4 d
WHERE a.field_3 = d.field_3
AND a.field_4 = d.field_4
ORDER BY priority ASC
FETCH FIRST ROW WITH TIES
) l
ON (1 = 1)
This may reduce the number of duplicate rows, since it avoids the multiple JOINs (whose extra rows you are potentially ignoring with your CASE expression), but you should test whether it returns your desired results and whether it is more or less performant.
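To see the preference order of the CASE version in action, here is a small sketch using Python's built-in sqlite3 module instead of Oracle. The id column, the sample rows and the alias matched_value are all made up, and the third branch is assumed to be meant as d.field_5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, field_1 TEXT, field_2 TEXT, field_3 TEXT, field_4 TEXT);
    CREATE TABLE table_2 (field_1 TEXT, field_5 TEXT);
    CREATE TABLE table_3 (field_2 TEXT, field_5 TEXT);
    CREATE TABLE table_4 (field_3 TEXT, field_4 TEXT, field_5 TEXT);
    INSERT INTO table_1 VALUES
        (1, 'a1', 'b9', 'c9', 'd9'),   -- matches table_2 only
        (2, 'a9', 'b2', 'c9', 'd9'),   -- matches table_3 only
        (3, 'a9', 'b9', 'c3', 'd3'),   -- matches table_4 only
        (4, 'a9', 'b9', 'c9', 'd9'),   -- matches nothing
        (5, 'a1', 'b2', 'c3', 'd3');   -- matches all three
    INSERT INTO table_2 VALUES ('a1', 'from_b');
    INSERT INTO table_3 VALUES ('b2', 'from_c');
    INSERT INTO table_4 VALUES ('c3', 'd3', 'from_d');
""")

# The first non-null join key decides which table supplies field_5.
rows = conn.execute("""
    SELECT a.id,
           CASE
               WHEN b.field_1 IS NOT NULL THEN b.field_5
               WHEN c.field_2 IS NOT NULL THEN c.field_5
               WHEN d.field_3 IS NOT NULL THEN d.field_5
           END AS matched_value
    FROM table_1 a
    LEFT JOIN table_2 b ON a.field_1 = b.field_1
    LEFT JOIN table_3 c ON a.field_2 = c.field_2
    LEFT JOIN table_4 d ON a.field_3 = d.field_3 AND a.field_4 = d.field_4
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Row 5 matches every table but reports only the table_2 value, which is the "single match in preference order" behaviour described above. With several matching rows per key in any of the joined tables, the output would contain repeated rows, which is the duplication the LATERAL variant may avoid.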

How to write an optimized SQL query?

How can I write an optimized SQL query that returns the rows in table 1 that are not in table 2 or table 3, the rows in table 2 that are not in table 1 or table 3, and the rows in table 3 that are not in table 1 or table 2?
I'm trying to improve the performance of the query I'm currently working on:
TABLE A
LEFT JOIN B
LEFT JOIN C
WHERE B is NULL and C is NULL
UNION
TABLE B
LEFT JOIN A
LEFT JOIN C
WHERE A is NULL and C is NULL
UNION
TABLE C
LEFT JOIN A
LEFT JOIN B
WHERE A is NULL and B is NULL
Is there any way where we can avoid reading the table 3 times?
Try using a full outer join.
select a.*, b.*, c.*
from a full outer join b on
a.A_keys = b.B_keys
full outer join c on
a.A_keys = c.C_keys AND
b.B_keys = c.C_keys
As others have indicated, because you did not supply sample data or a join condition, it is impossible to tell whether the second join (to c) requires conditions on both A and B, or on just A, or on just B. You will need to work that bit out for yourself.
For reference, a FULL OUTER JOIN is like doing a LEFT and RIGHT OUTER JOIN in one go. That is, it will return:
all records that match on both sides of the join (the INNER join), plus
all records that are in the LEFT table but not in the right (the LEFT outer join). Right table records will be NULL
all records that are in the RIGHT table but not in the left (the RIGHT outer join). Left table records will be NULL
I will leave it to you to add the "NULL filters" to get the records you are interested in.
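If the goal is only "keys that appear in exactly one of the three tables", a different technique from the full outer join above is to stack the keys with UNION ALL and group them, which reads each table once. The sketch below is that swapped-in approach, run in Python's built-in sqlite3 module; the single key column k and the sample values are assumptions, since the question gives no schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER);
    CREATE TABLE b (k INTEGER);
    CREATE TABLE c (k INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
    INSERT INTO c VALUES (3), (5);
""")

# Tag every key with its source table, then keep keys seen in only one source.
rows = conn.execute("""
    SELECT k, MIN(src) AS src
    FROM (
        SELECT k, 'A' AS src FROM a
        UNION ALL
        SELECT k, 'B' FROM b
        UNION ALL
        SELECT k, 'C' FROM c
    ) AS u
    GROUP BY k
    HAVING COUNT(DISTINCT src) = 1
    ORDER BY k
""").fetchall()

for row in rows:
    print(row)
```

Key 2 (in A and B) and key 3 (in all three) are dropped. Whether this beats the three left joins depends on table sizes and indexes, so it is worth comparing execution plans on real data.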
So, unique from A, unique from B, unique from C ? As there are no other rows from other tables, I think you just want
SELECT * FROM A
UNION
SELECT * FROM B
UNION
SELECT * FROM C

Query Optimization : using (Union instead of OR) and (exists instead of null)

I have a query optimisation issue.
For context, this query has always run instantly, but today it took far longer (3h+), so I tried to fix it.
The query looks like this:
Select someCols from A
inner join B left join C
Where A.date = Today
And (A.col In ( Select Z.colseekedinA from tab Z) -- A.col is the same column
-- as in the condition below
OR
A.col In ( Select X.colseekedinA from tab X)
)
-- PART 1 ---
Select someCols from A
inner join B left join C -- takes 1 second 150 lines
Where A.date = Today
-- Part 2 ---
Select Z.colseekedinA from tab Z
OR -- Union -- takes 1 seconds 180 lines
Select X.colseekedinA from tab X
When I combine the two parts with the IN, the query becomes incredibly slow.
So I optimized it using UNION instead of OR and EXISTS instead of IN, but it still takes 3 minutes.
I want to get it back down to 5 seconds.
Do you see an issue with the query?
Thank you.
Using Union and Exists
Select someCols
from A
inner join B on a.col = b.col
left join C on b.col = c.col
Where A.date = Today
and exists(
Select Z.colseekedinA from tab Z where Z.colseekedinA = A.col
Union
Select X.colseekedinA from tab X where x.colseekedinA = A.col )
Also, if possible change below join to Left join.
inner join B on a.col = b.col
The exists approach may give spurious results as you will get rows that do not match either condition just if 1 row does match. This might be avoided by using exists within a correlated subquery but it isn't something I have experimented with enough to recommend.
For speed I'd go for a cross apply and specify the parent table within the cross apply expression (correlated subquery to create a derived table). That way the join condition is specified before the data is returned, if the columns in question have indexes on them (i.e. they are primary keys) then the optimiser can work out an efficient plan for this.
Union all is used within the cross apply expression as this prevents a distinct sort within the derived table which is generally heavier in terms of cost than bringing the data itself back (union has to identify all rows anyway including duplications).
Finally if this is still slow then potentially you might want to add an index to the date column in table a. This overcomes the lack of sargability inherent in a date column and means the optimiser can leverage the index rather than scanning all of the rows in the result set and testing whether or not the date equals today.
Select someCols from A
inner join B left join C
cross apply (Select Z.colseekedinA from tab Z where a.col=z.colseekedinA
union all
Select X.colseekedinA from tab X where a.col=x.colseekedina) d
Where A.date = Today
Your code is confusing, but for the first part you could try a UNION for the inner subqueries (the ones combined with OR) and replace the IN clause with an inner join:
Select someCols from A
inner join B
left join C
INNER JOIN (
Select Z.colseekedinA from tab Z
UNION
Select X.colseekedinA from tab X
) t on A.col = t.colseekedinA
Where A.date = Today
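The rewrites suggested in these answers are easy to compare on a toy data set. The sketch below uses Python's built-in sqlite3 module, so it shows only result equivalence, not SQL Server performance. The table names tab_z and tab_x, the columns id and dt, and all rows are hypothetical stand-ins for the elided schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, col TEXT, dt TEXT);
    CREATE TABLE tab_z (colseekedinA TEXT);
    CREATE TABLE tab_x (colseekedinA TEXT);
    INSERT INTO A VALUES (1, 'p', '2024-01-01'), (2, 'q', '2024-01-01'),
                         (3, 'r', '2024-01-01'), (4, 'p', '2023-12-31');
    INSERT INTO tab_z VALUES ('p'), ('q');
    INSERT INTO tab_x VALUES ('q'), ('s');   -- 'q' is present in both lookup tables
""")
today = "2024-01-01"  # stand-in for "Today"

def ids(sql):
    """Run a query with the date parameter and return the list of A.id values."""
    return [r[0] for r in conn.execute(sql, (today,))]

in_or = ids("""
    SELECT id FROM A
    WHERE dt = ? AND (col IN (SELECT colseekedinA FROM tab_z)
                   OR col IN (SELECT colseekedinA FROM tab_x))
    ORDER BY id""")

exists_union = ids("""
    SELECT id FROM A
    WHERE dt = ? AND EXISTS (
        SELECT 1 FROM tab_z z WHERE z.colseekedinA = A.col
        UNION ALL
        SELECT 1 FROM tab_x x WHERE x.colseekedinA = A.col)
    ORDER BY id""")

join_sql = """
    SELECT A.id FROM A
    JOIN (SELECT colseekedinA FROM tab_z {op} SELECT colseekedinA FROM tab_x) t
      ON A.col = t.colseekedinA
    WHERE A.dt = ? ORDER BY A.id"""
join_union = ids(join_sql.format(op="UNION"))
join_union_all = ids(join_sql.format(op="UNION ALL"))

print(in_or, exists_union, join_union, join_union_all)
```

Inside EXISTS the choice between UNION and UNION ALL does not change the result, so the cheaper UNION ALL is safe there. In a join or an apply, UNION ALL repeats a row of A whenever its value is found in both lookup tables, so that form needs either UNION or a later de-duplication step.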

FULL OUTER JOIN with an OR condition

I have 2 tables with 2 different ids. I want to join on the 2 ids and a few other parameters, but the problem is that the ids don't always match. Sometimes id number 1 will have matches, sometimes id number 2 will not match anything, and sometimes both will match.
Using full outer join with an OR condition in the JOIN clause really slows down my query. Is there a more efficient way of doing this?
I know you can use unions instead in case of inner joins but am not sure how to optimize using outer joins.
SELECT A.*, B.*
FROM A
FULL OUTER JOIN B
ON (A.id_1 = B.id_1 or A.id_2 = B.id_2)
AND A.pay_month = B.pay_month
AND A.plan = B.plan
Hmmm . . . This might be sufficient:
select A.*, B.*
from A full outer join
B
on A.id_1 = B.id_1 and A.pay_month = B.pay_month and A.plan = B.plan
union -- intentionally to remove duplicates
select A.*, B.*
from A full outer join
B
on A.id_2 = B.id_2 and A.id_1 <> B.id_1 and A.pay_month = B.pay_month and A.plan = B.plan;
This is not 100% equivalent -- for instance, this removes duplicates even within a table. Also, the union adds the overhead of removing duplicates. But the results may be good for your purposes.
Also, is a full outer join really necessary? I rarely use it in my code.

SQL Server double left join counts are different

Code:
Select a.x,
a.y,
b.p,
c.i
from table1 a left join table2 b on a.z=b.z
left join table3 c on a.z=c.z;
When I am using the above code I am not getting the correct counts:
Table1 has 30 records.
After first left join I get 30 records but after 2nd left join I am getting 33 records.
I am having hard time figuring out why I am getting different counts. According to my understanding I should be getting 30 counts even after the 2nd left join.
Can anyone help me understand this difference?
I am using sql server 2012
There are multiple rows in table3 with the same z value.
You can find them by doing:
select z, count(*)
from table3
group by z
having count(*) >= 2
order by count(*) desc;
If you want at most one match, then outer apply can be useful:
Select a.x, a.y, b.p, c.i
from table1 a outer apply
(select top 1 b.*
from table2 b
where a.z = b.z
) b outer apply
(select top 1 c.*
from table3 c
where a.z = c.z
) c;
Of course, top 1 should be used with order by, but I don't know which row you want. And, this is probably a stop-gap; you should figure out why there are duplicates.
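The row-count growth is easy to reproduce on a few rows. The sketch below uses Python's built-in sqlite3 module with made-up data. SQLite has no OUTER APPLY, so the "at most one match" version is written with correlated scalar subqueries and LIMIT 1 as a stand-in, and ordering by p and i is an arbitrary assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y TEXT, z INTEGER);
    CREATE TABLE table2 (z INTEGER, p TEXT);
    CREATE TABLE table3 (z INTEGER, i TEXT);
    INSERT INTO table1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
    INSERT INTO table2 VALUES (10, 'p1'), (20, 'p2');
    INSERT INTO table3 VALUES (10, 'i1'), (10, 'i2'), (30, 'i3');  -- z = 10 twice
""")

one_join = conn.execute("""
    SELECT COUNT(*) FROM table1 a LEFT JOIN table2 b ON a.z = b.z
""").fetchone()[0]

two_joins = conn.execute("""
    SELECT COUNT(*) FROM table1 a
    LEFT JOIN table2 b ON a.z = b.z
    LEFT JOIN table3 c ON a.z = c.z
""").fetchone()[0]

# Diagnostic from the answer: which z values repeat in table3?
dups = conn.execute("""
    SELECT z, COUNT(*) FROM table3 GROUP BY z HAVING COUNT(*) >= 2
""").fetchall()

# At most one match per table1 row (SQLite substitute for OUTER APPLY ... TOP 1).
capped = conn.execute("""
    SELECT a.x, a.y,
           (SELECT b.p FROM table2 b WHERE b.z = a.z ORDER BY b.p LIMIT 1) AS p,
           (SELECT c.i FROM table3 c WHERE c.z = a.z ORDER BY c.i LIMIT 1) AS i
    FROM table1 a
    ORDER BY a.x
""").fetchall()

print(one_join, two_joins, dups)
print(capped)
```

The extra row after the second join comes entirely from the repeated z value in table3, which mirrors the 30 versus 33 difference in the question.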
Your table3 contains more than one row for some rows in table1. Check which z value occurs more than once in the joined tables.
You can use GROUP BY with MAX to reduce the result to one row per table1 row.