Oracle SQL: Is it more efficient to use a WHERE clause in a subquery or after the join?

I wanted to know which would be more efficient and why:
example 1:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRANSACTION_AMT
FROM CUSTOMER_TABLE a
LEFT JOIN TRANSACTION_TABLE b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY
WHERE b.DATE_TRANSACTION > 20150101 AND a.CUSTOMER_ACTIVE_FLAG = 'Y';
or example 2:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRANSACTION_AMT
FROM
(SELECT *
FROM CUSTOMER_TABLE
WHERE CUSTOMER_ACTIVE_FLAG = 'Y') a
LEFT JOIN
(SELECT *
FROM TRANSACTION_TABLE
WHERE DATE_TRANSACTION > 20150101) b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY
For instance would option 2 be better optimized because it would filter out the records not satisfying the where clause first?
(NOTE: the query is meant to join customer information with transaction information based on customer key. The customer key is unique in the customer table. Both queries produce equivalent output.)

The correct equivalent query without subqueries is:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRANSACTION_AMT
FROM CUSTOMER_TABLE a LEFT JOIN
TRANSACTION_TABLE b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY AND b.DATE_TRANSACTION > 20150101
WHERE a.CUSTOMER_ACTIVE_FLAG = 'Y';
The condition on the second table goes in the ON clause.
The best way to know is to look at the execution plans and run-times for the two queries. I would expect the equivalent versions to have the same execution plan. Oracle has a smart optimizer and should optimize away the subqueries. However, it might miss a particular case or two, which is why you should check on your own queries.
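For example, a minimal sketch of how you could compare plans in Oracle, using the corrected query above (output formatting varies by client and version):
EXPLAIN PLAN FOR
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRANSACTION_AMT
FROM CUSTOMER_TABLE a
LEFT JOIN TRANSACTION_TABLE b
  ON a.CUSTOMER_KEY = b.CUSTOMER_KEY
 AND b.DATE_TRANSACTION > 20150101
WHERE a.CUSTOMER_ACTIVE_FLAG = 'Y';

-- Show the plan Oracle generated for the statement above.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Run the same two steps for the subquery version and compare. If the plans are identical, the optimizer has merged the subqueries as expected.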

Related

Where clause applied to only one column in join

I'm having some trouble writing a certain SQL query. I have a wallet table and a balances table which I join. The query currently looks like this:
SELECT
`balances`.`id` AS `id`
FROM
`wallet`
LEFT JOIN `balances` ON
( `wallet`.`currency` = `balances`.`currency` )
WHERE
`balances`.`user_id` = '181'
Because of the where clause, the query returns just the matching records. I want to get all records from the wallet table and only those from balances which match the where clause... hope I explained it well enough!
Cheers!
Use a subquery:
SELECT w.*,t.*
FROM
wallet w
LEFT JOIN ( select * from balances where user_id = 181
) t ON w.currency = t.currency
The issue is that you are filtering the left-joined table (balances) in the WHERE clause, which effectively turns the LEFT JOIN into an INNER JOIN.
Use the query below:
SELECT
`b`.`id` AS `id`
FROM
`wallet`
LEFT JOIN (SELECT * FROM `balances` WHERE `user_id` = '181') `b` ON
( `wallet`.`currency` = `b`.`currency` );
The question is not fully clear, but you almost definitely need an extra join condition on some sort of ID. Right now there is no way to match a wallet with its balance(s). Assuming that balances has, e.g., a wallet_id, you'll want something like:
SELECT
`balances`.`id` AS `id`
FROM
`wallet`
LEFT JOIN `balances` ON
(`wallet`.`id` = `balances`.`wallet_id` )
WHERE
`balances`.`user_id` = '181'
Move the condition to the ON clause. Don't use subqueries!
SELECT w.*, b.id
FROM wallet w LEFT JOIN
balances b
ON w.currency = b.currency AND
b.user_id = 181;
Notes:
The subquery in the FROM can impede the optimizer.
If you are using a LEFT JOIN, you should be selecting columns from the first table.
I am guessing that user_id is a number, so I removed the quotes around the comparison value.
Table aliases make the query easier to write and to read.
Backticks make the query harder to write and harder to read.

SQL Help- in rewriting a query

How can we reduce the execution time of the query below?
I need help rewriting this SQL query in a more efficient way.
SELECT A.*, C.*, F.*, D.*
FROM TABLE1 A INNER JOIN
TABLE2 C
ON A.CODE = C.CODE INNER JOIN
TABLE3 D
ON A.CODE = D.CODE INNER JOIN
TABLE4 F
ON A.CODE = F.CODE
WHERE D.IND1 = 'N' AND
D.IND2 = 'N' AND
D.EFF_DATE = (SELECT MAX(X.EFF_DATE)
FROM TABLE3 X
WHERE X.CODE = D.CODE AND X.EFF_DATE <= A.EFFECTIVE_DATE
) AND
F.EFF_DATE = (SELECT MAX(Z.EFF_DATE)
FROM TABLE4 Z
WHERE Z.DETAIL_CODE = F.DETAIL_CODE AND Z.EFF_DATE <= A.EFFECTIVE_DATE
)
For performance, I would start with indexes on:
TABLE3(IND1, IND2, CODE, EFF_DATE)
TABLE3(CODE, EFF_DATE)
TABLE1(CODE, EFFECTIVE_DATE)
TABLE2(CODE)
TABLE4(CODE)
TABLE4(DETAIL_CODE, EFF_DATE)
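In DDL form these would look roughly like the following (the index names are made up; adapt them to your naming convention):
CREATE INDEX TABLE3_IND_CODE_DATE ON TABLE3 (IND1, IND2, CODE, EFF_DATE);
CREATE INDEX TABLE3_CODE_DATE ON TABLE3 (CODE, EFF_DATE);
CREATE INDEX TABLE1_CODE_DATE ON TABLE1 (CODE, EFFECTIVE_DATE);
CREATE INDEX TABLE2_CODE ON TABLE2 (CODE);
CREATE INDEX TABLE4_CODE ON TABLE4 (CODE);
CREATE INDEX TABLE4_DETAIL_DATE ON TABLE4 (DETAIL_CODE, EFF_DATE);
Whether the optimizer actually uses them depends on your data distribution, so verify with the execution plan.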
If you have performance issues, though, I suspect your code may be generating unexpected Cartesian products. Debugging that requires much more information. I might suggest asking another question.
If you can find out where the bottlenecks in your query are -- i.e. sub-queries, joins -- that will give you a better idea of what to look at. In the absence of that, take a look at:
modify your column projections (i.e. A.*, C.*, F.*, D.*) to only return the columns you need
look at table partitioning for the queries accessing rows based on DATE values (TABLE3.EFF_DATE, TABLE4.EFF_DATE) (http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56partition-090450.html)
look at adding materialized view(s) either on the entire query OR the sub-queries (https://oracle-base.com/articles/misc/materialized-views)
look at statistic generation if the query plan is not optimal (https://docs.oracle.com/cd/A97630_01/server.920/a96533/stats.htm#26713)
If you can provide an EXPLAIN plan (or Oracle's equivalent), that would be helpful.
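On the statistics point, a minimal sketch of refreshing optimizer statistics with DBMS_STATS (the schema name MYSCHEMA is a placeholder):
-- Gather fresh optimizer statistics so the planner has accurate row counts.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MYSCHEMA', tabname => 'TABLE3');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MYSCHEMA', tabname => 'TABLE4');
END;
/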
Note that because of the conditions on the two sub-queries, all the records in your result will have D.EFF_DATE <= A.EFFECTIVE_DATE and F.EFF_DATE <= A.EFFECTIVE_DATE, so I would suggest putting those conditions in the JOIN clauses.
Secondly, analytic functions may give better performance than subqueries:
SELECT *
FROM (
SELECT A.*,C.*,F.*,D.*,
RANK() OVER (PARTITION BY D.CODE
ORDER BY D.EFF_DATE DESC) AS D_RANK,
RANK() OVER (PARTITION BY F.DETAIL_CODE
ORDER BY F.EFF_DATE DESC) AS F_RANK
FROM TABLE1 A
INNER JOIN TABLE2 C
ON A.CODE = C.CODE
INNER JOIN TABLE3 D
ON A.CODE = D.CODE
AND D.EFF_DATE <= A.EFFECTIVE_DATE
INNER JOIN TABLE4 F
ON A.CODE = F.CODE
AND F.EFF_DATE <= A.EFFECTIVE_DATE
WHERE D.IND1 = 'N'
AND D.IND2 = 'N'
)
WHERE D_RANK = 1 AND F_RANK = 1
Evidently you need to have the right indexes to optimise the execution plan.
Another thing to consider is the total number of columns your query returns; you seem to be selecting all the columns from 4 tables.
We found that our complex queries ran in under a second when selecting only a few columns but took orders of magnitude longer when selecting many columns.
Question why you need so many columns in your result set.

Restricting inner query with outer query attribute

I currently have a large SQL query (not mine) which I need to modify. I have a transaction table and a valuation table. A transaction has a one-to-many relationship with valuations. The two tables are joined via a foreign key.
I've been asked to prevent any transactions (along with their subsequent valuations) from being returned if no valuations for a transaction exist past a certain date. The way I thought I would achieve this would be to use an inner query, but I need to make the inner query aware of the outer query and the transaction. So something like:
SELECT * FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V ON T.VAL_FK = V.ID
WHERE (SELECT COUNT(*) FROM V WHERE V.DATE > <GIVEN DATE>) > 1
Obviously the above wouldn't work, as the inner query is separate and I can't reference the outer query's V alias from the inner one. How would I go about doing this, or is there a simpler way?
It wouldn't just be a case of putting WHERE V.DATE > <GIVEN DATE> in the outer query: if ANY valuation for a given transaction exceeds the specified date, I want all of that transaction's valuations returned, not just the ones that do.
Many thanks for any help you can offer.
You may be looking for this:
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V1.ID
WHERE (SELECT COUNT(*)
FROM VALUATION_TABLE V2
WHERE V2.ID = V1.ID AND V2.DATE > <GIVEN DATE>) > 1
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V1.ID
WHERE V1.ID IN ( SELECT ID
FROM VALUATION_TABLE
WHERE DATE > <GIVEN DATE>
)
If execution time is important, you may want to test the various solutions on your actual data and see which works best in your situation.

Difference between Two Queries - Join vs IN

I have the following two queries. Query1 is returning 1000 as the row count, whereas Query2 is returning 4000. Can someone please explain the difference between the two queries? I was hoping both would return the same count.
Query1:
SELECT COUNT(*)
FROM TableA A
WHERE A.VIN IN (
SELECT VIN
FROM TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN
)
Query2:
SELECT COUNT(*)
FROM TABLEA A, TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN AND A.VIN = C.VIN
In many cases, they will return the same answer, but not necessarily. The first counts the number of rows in A that match the conditions -- each row is counted only once, regardless of the number of matches. The second does a join, which can multiply the number of rows.
The second query would be equivalent in results if it used count(distinct A.id), where id is unique or a primary key.
That said, although they are similar in functionality, how they are executed can be quite different. Different SQL engines might do a better job of optimizing one version or the other.
By the way, you should avoid the archaic join syntax that you are using. Since 1992, explicit joins have been part of SQL syntax.
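For instance, Query2 rewritten with explicit joins, counting each row of TableA only once (this sketch assumes VIN is the unique key of TableA, analogous to the id mentioned above):
SELECT COUNT(DISTINCT A.VIN)
FROM TableA A
INNER JOIN TableC C ON A.VIN = C.VIN
INNER JOIN TableB B ON B.VIN_NBR = C.VIN
WHERE B.MODEL_YEAR = '2014';
With a unique A.VIN this should match Query1's count of 1000, because the DISTINCT collapses the row multiplication introduced by the join.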

SQL: Speed Improvement - Left Join on cond1 or cond2

SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
)
Two tables that are basically the same
I don't have access to the table structure or data input (thus no cleaning up primary keys)
Sometimes the user_id is populated in one and not the other
Sometimes names are equal, sometimes they are not
I've found that I can get the most of the data by matching on user_id or the first/last names. I'm using the ' ' between the names to avoid cases where one user has the same first name as another's last name and both are missing the other field (unlikely, but plausible).
This query runs in 33000 ms, whereas run individually each join condition takes about 200 ms.
I've been up late and can't think straight right now
I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is the user_id, if a user_id doesn't exist then I want to join by the name)
Here are some free points to anyone who wants to help.
Please don't ask for the execution plan.
Looks like you can easily avoid the string concatenation:
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
Change it to:
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
Rather than concatenating first and last name and comparing them, try comparing them individually instead. Assuming you have indexes on the first name and last name columns (and you should create them if you don't), this should improve your chances of actually using them; an example of creating such indexes follows the query below.
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR (a.f_name = b.f_name and a.l_name = b.l_name)
)
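A rough sketch of the supporting indexes (hypothetical index names; MySQL syntax assumed):
-- One index per join path; the composite index covers the name-based join.
CREATE INDEX idx_import_user_id ON import_tbl (user_id);
CREATE INDEX idx_import_name ON import_tbl (f_name, l_name);
CREATE INDEX idx_current_user_id ON current_tbl (user_id);
CREATE INDEX idx_current_name ON current_tbl (f_name, l_name);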
If people's suggestions don't provide a major speed increase, there is a possibility that your real problem is that the best query plan for your two possible join conditions is different. For that situation you would want to do two queries and merge results in some way. This is likely to make your query much, much uglier.
One obscure trick that I have used for that kind of situation is to do a GROUP BY off of a UNION ALL query. The idea looks like this:
SELECT a_field1, a_field2, ...
MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
FROM (
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
UNION ALL
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.f_name = b.f_name AND a.l_name = b.l_name
) u -- give the derived table an alias (required by MySQL)
GROUP BY a_field1, a_field2, ...
And now the database can do each of the two joins using the most efficient plan.
(Warning of a drawback in this approach. If a row in current_tbl joins to multiple rows in import_tbl, then you'll wind up merging data in a very odd way.)
Incidental random performance tip. Unless you have reason to believe that there are potential duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
I don't really understand why you're concatenating those strings. Seems like that's where your slowdown would be. Does this work instead?
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
)
Here is Yet Another Ugly Way To Do It.
SELECT a.*
, CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
, CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
LEFT JOIN import_tbl c
ON a.f_name = c.f_name AND a.l_name = c.l_name;
This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
Try using JOIN hints:
http://msdn.microsoft.com/en-us/library/ms173815.aspx
We were encountering the same type of behavior with one of our queries. As a last resort we added the LOOP hint, and the query ran much much faster.
It's important to note that Microsoft says this about JOIN hints:
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including <join_hint>, be used only as a last resort by experienced developers and database administrators.
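For illustration only (SQL Server syntax, matching the link above; the table names reuse the ones from the question), a LOOP hint can be applied inline on the join or as a query-level option:
-- Inline hint: force a nested-loops join for this specific join.
SELECT a.*, b.*
FROM current_tbl AS a
INNER LOOP JOIN import_tbl AS b
    ON a.user_id = b.user_id;

-- Query-level option: applies to every join in the statement.
SELECT a.*, b.*
FROM current_tbl AS a
INNER JOIN import_tbl AS b
    ON a.user_id = b.user_id
OPTION (LOOP JOIN);
Whether either form actually helps on your data is something only testing will tell.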
my boss at my last job.. I swear.. he thought that using UNIONS was ALWAYS FASTER THAN OR.
For example.. instead of writing
Select * from employees Where Employee_id = 12 or employee_id = 47
he would write (and have me write)
Select * from employees Where employee_id = 12
UNION
Select * from employees Where employee_id = 47
The SQL Server optimizer said that this was the right thing to do in SOME situations.. I have a friend who works on the SQL Server team at Microsoft; I emailed him about this and he told me that my stats were out of date or something along those lines.
I never really got a good answer on WHY the unions are faster, it seems REALLY counter-intuitive.
I'm not recommending you DO this, but in some situations it can help.
Also two more things -- GET RID OF THE DISTINCT CLAUSE unless you absolutely need it..
and more importantly, you can easily get rid of the concatenation in your join, like this for example (pardon my lack of mySQL knowledge)
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name and a.l_name = b.l_name)
)
I've run some tests at work in a similar situation that showed a 10x performance improvement from getting rid of the simple concatenation in the join.