Which one is faster? - sql

Which one is faster?
SELECT FROM A INNER JOIN B ON A.ID = B.ID
...or:
SELECT FROM A , B WHERE A.ID = B.ID

I don't think one is faster than the other, but one is BETTER to use than the other:
SELECT (fields)
FROM A
INNER JOIN B ON A.ID = B.ID
is definitely the preferred way of expressing this (it is the explicit JOIN syntax introduced in the ANSI SQL-92 standard). It's clearer, and it's more obvious to the reader exactly what is happening here.
Always use this syntax over the other - it's just easier and clearer!
PS: SQL guru Aaron Bertrand seems to agree :-) See his post "Bad habits to kick: using old-style JOINs".

Measure, don't guess.
Some DBMSs will run explicit joins faster than implicit ones, but it depends entirely on the DBMS itself (the one I use is smart enough to run both at full speed).
There's a reason why we have DBAs. They're meant to monitor and tune the performance of the database based on reality, not on some (mis)conception of how things might perform.
That's because the performance changes based on the data in the tables.
So you should not worry about how fast those two queries perform until there's a performance problem. Use your best guess (with indexes and such), but keep an eye on the actual performance in production and adjust for that.

They are equivalent, neither should be faster than the other. The best way to know is to use EXPLAIN.
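For SQLite specifically, the optimizer documentation says the ON clause of an inner join is converted into ordinary WHERE terms before planning, so both spellings should get the same plan. A minimal sketch using Python's built-in sqlite3 module (tables A and B and their rows are made up for illustration; other databases have their own EXPLAIN variants and output formats):

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    # Each plan row has four columns; the last one is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (1, 'b1'), (3, 'b3');
""")

explicit = "SELECT A.ID, A.name, B.note FROM A INNER JOIN B ON A.ID = B.ID"
implicit = "SELECT A.ID, A.name, B.note FROM A, B WHERE A.ID = B.ID"

explicit_plan = plan(conn, explicit)
implicit_plan = plan(conn, implicit)
print("explicit:", explicit_plan)
print("implicit:", implicit_plan)
print("same plan:", explicit_plan == implicit_plan)
```

In MySQL the counterpart is EXPLAIN SELECT ...; in Oracle it is EXPLAIN PLAN FOR ... followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY).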

Measure the time needed for execution. It depends on too many parameters to be answered definitively.
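If you do time them, a rough harness like this one at least discards warm-up runs, so caching affects both candidates alike, and reports a median, so one noisy run doesn't decide the result. It is a sketch against an in-memory SQLite database with synthetic data; `time_query` is a hypothetical helper, not a library function, and numbers from a toy database say little about your production data.

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, repeats=7, warmup=2):
    """Return the median wall-clock time (seconds) of running sql to completion.

    Warm-up runs are discarded so both candidates see similarly warm caches;
    the median damps one-off noise from other load on the machine.
    """
    for _ in range(warmup):
        conn.execute(sql).fetchall()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetchall() forces the whole result to be produced
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ID INTEGER PRIMARY KEY, v INTEGER)")
conn.execute("CREATE TABLE B (ID INTEGER PRIMARY KEY, w INTEGER)")
conn.executemany("INSERT INTO A VALUES (?, ?)", ((i, i * 2) for i in range(20000)))
conn.executemany("INSERT INTO B VALUES (?, ?)", ((i, i * 3) for i in range(0, 20000, 2)))

results = {
    "explicit": time_query(conn, "SELECT A.v, B.w FROM A INNER JOIN B ON A.ID = B.ID"),
    "implicit": time_query(conn, "SELECT A.v, B.w FROM A, B WHERE A.ID = B.ID"),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```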

Check the execution plan for both queries, draw your conclusions from that.

Actually, to be specific, neither of them is faster: they will not run at all.
You need to specify at least a column, a constant, or * in the SELECT clause.
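A quick way to confirm this, sketched with Python's sqlite3 module (the `runs` helper and the two tables are made up for the check; other engines word the syntax error differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY);
    CREATE TABLE B (ID INTEGER PRIMARY KEY);
""")

def runs(sql):
    """Return True if the statement parses and executes, False on an error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:  # SQLite reports syntax errors this way
        return False

print(runs("SELECT FROM A INNER JOIN B ON A.ID = B.ID"))    # no select list
print(runs("SELECT * FROM A INNER JOIN B ON A.ID = B.ID"))  # * is enough
print(runs("SELECT A.ID FROM A, B WHERE A.ID = B.ID"))      # so is a column
```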

Aren't they both equally fast? They basically translate to the same thing...
If you want to know for sure you could probably set up a simple test case and execute it.

Related

How to optimize a long SQL statement in PL/SQL?

I tend to write very long SQL statements, but I'm concerned about maintaining them later.
Is it better to divide one SQL statement into several statements?
For example:
select a.a1, a.a2, b.b3, sum(c.c4), b.b4...b.bn
from A a
inner join B b on a.a1=b.b1
left join C c on a.a2=c.c2
group by a.a1, a.a2, b.b3, b.b4,...,b.bn
I divide it into:
create table temp_table as select a.a1, a.a2, sum(c.c4) as sum_c4
from A a
left join C c on a.a2=c.c2
group by a.a1, a.a2
select temp.*, b.b3, b.b4,...b.bn
from temp_table temp
inner join B b on temp.a1=b.b1
But this requires creating a table in PL/SQL. Is there a better way?
Can several SQL statements execute faster thanks to Oracle's CHOOSE (soft parse)?
Thanks for sharing your experience.
I am a fan of writing SQL as a single statement. I find that approach is better for a variety of reasons:
A single statement is easier to maintain.
I don't have to name and remember intermediate table names.
I might make a mistake and not re-build an intermediate result when the logic changes.
The optimizer has a good chance of getting the right execution plan.
That said, the optimizer is not always right. Oracle has a good optimizer and one that makes use of statistics. On occasion, dividing a complex query into pieces can improve performance, under some circumstances:
The optimizer is not able to do a good job of estimating the size of the intermediate result, whereas a table "knows" exactly how many rows it has.
You add indexes to the intermediate table.
You want to re-use results, say for inter-query optimization.
Although these might be beneficial, I myself shy away from this approach because of the added complexity and maintenance burden. However, it can sometimes be faster.
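A small sketch of both approaches from the question, run against SQLite through Python's sqlite3 module. The tables mirror the question's A, B and C, with the elided b4...bn columns trimmed down to b3 and b4 and made-up rows. It assumes B.b1 is unique: if B could hold duplicate rows for the same key, the grouped single statement and the staged version can return different results. Note also that Oracle's CREATE TABLE ... AS SELECT needs a column alias (here sum_c4) for the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a1 INTEGER, a2 INTEGER);
    CREATE TABLE B (b1 INTEGER PRIMARY KEY, b3 TEXT, b4 INTEGER);  -- assumption: b1 unique
    CREATE TABLE C (c2 INTEGER, c4 INTEGER);
    INSERT INTO A VALUES (1, 10), (2, 20), (3, 30), (4, 10);
    INSERT INTO B VALUES (1, 'x', 100), (2, 'y', 200), (3, 'z', 300);
    INSERT INTO C VALUES (10, 5), (10, 7), (20, 1);
""")

# Version 1: everything in a single statement.
single_rows = conn.execute("""
    SELECT a.a1, a.a2, b.b3, SUM(c.c4) AS sum_c4, b.b4
    FROM A a
    INNER JOIN B b ON a.a1 = b.b1
    LEFT JOIN C c ON a.a2 = c.c2
    GROUP BY a.a1, a.a2, b.b3, b.b4
    ORDER BY a.a1
""").fetchall()

# Version 2: materialize the aggregate, index it, then join B.
conn.executescript("""
    CREATE TABLE temp_table AS
        SELECT a.a1, a.a2, SUM(c.c4) AS sum_c4
        FROM A a
        LEFT JOIN C c ON a.a2 = c.c2
        GROUP BY a.a1, a.a2;
    CREATE INDEX temp_table_a1 ON temp_table (a1);
""")
staged_rows = conn.execute("""
    SELECT t.a1, t.a2, b.b3, t.sum_c4, b.b4
    FROM temp_table t
    INNER JOIN B b ON t.a1 = b.b1
    ORDER BY t.a1
""").fetchall()

print(single_rows)
print(staged_rows)
```

Both versions return the same three rows here; the staged one only adds a place to put an index, which is one of the circumstances listed above, at the cost of an extra object to create, name and keep in sync.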
It's rarely faster. You're hiding your intent from the optimizer. Generally, give it one query with no user-defined functions for optimum performance.
It won't necessarily be faster, as both run on the Oracle server, and your PL/SQL will be compiled anyway.
If you do everything in one single SQL statement, you leave the query optimization to Oracle, while if you write your own PL/SQL, you may have more control over how the queries are executed. But if you write bad PL/SQL, it will certainly perform worse.
However, I am not sure breaking the code up really improves maintainability. Unless you can reuse the broken-out pieces in other places, which improves code reuse, I think making it one single statement is more logical. You can always add comments explaining as much detail as needed to make it clear to whoever reads it in the future.

Optimizing INNER JOIN query performance

I'm using a database that requires optimized queries, and I'm wondering which of these queries is the optimized one. I used a timer, but the results are too close, so I have no clue which one to use.
QUERY 1:
SELECT A.MIG_ID_ACTEUR, A.FL_FACTURE_FSL , B.VAL_NOM,
B.VAL_PRENOM, C.VAL_CDPOSTAL, C.VAL_NOM_COMMUNE, D.CCB_ID_ACTEUR
FROM MIG_FACTURE A
INNER JOIN MIG_ACTEUR B
ON A.MIG_ID_ACTEUR= B.MIG_ID_ACTEUR
INNER JOIN MIG_ADRESSE C
ON C.MIG_ID_ADRESSE = B.MIG_ID_ADRESSE
INNER JOIN MIG_CORR_REF_ACTEUR D
ON A.MIG_ID_ACTEUR= D.MIG_ID_ACTEUR;
QUERY 2:
SELECT A.MIG_ID_ACTEUR, A.FL_FACTURE_FSL , B.VAL_NOM, B.VAL_PRENOM,
C.VAL_CDPOSTAL, C.VAL_NOM_COMMUNE, D.CCB_ID_ACTEUR
FROM MIG_FACTURE A , MIG_ACTEUR B, MIG_ADRESSE C, MIG_CORR_REF_ACTEUR D
WHERE A.MIG_ID_ACTEUR= B.MIG_ID_ACTEUR
AND C.MIG_ID_ADRESSE = B.MIG_ID_ADRESSE
AND A.MIG_ID_ACTEUR= D.MIG_ID_ACTEUR;
If you are asking whether it is more efficient to use the SQL 99 join syntax (a inner join b) or whether it is more efficient to use the older join syntax of listing the join predicates in the WHERE clause, it shouldn't matter. I'd expect that the query plans for the two queries would be identical. If the query plans are identical, performance will be identical. If the plans are not identical, that would generally imply that you had encountered a bug in the database's query parsing engine.
Personally, I'd use the SQL 99 syntax (query 1) both because it is more portable when you want to do an outer join and because it generally makes the query more readable and decreases the probability that you'll accidentally leave out a join condition. That's solely a readability and maintainability consideration, though, not a performance consideration.
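To illustrate the "accidentally leave out a join condition" risk with the comma syntax, here is a sketch in SQLite via Python (three made-up tables with ten rows each): dropping one predicate from the WHERE list silently turns part of the join into a Cartesian product. With explicit JOINs the missing ON clause sits right next to its table, and many databases (though not SQLite or MySQL) reject an INNER JOIN that has no ON clause at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, A_ID INTEGER);
    CREATE TABLE C (ID INTEGER PRIMARY KEY, B_ID INTEGER);
""")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(i, i) for i in range(1, 11)])
conn.executemany("INSERT INTO C VALUES (?, ?)", [(i, i) for i in range(1, 11)])

complete = "SELECT COUNT(*) FROM A, B, C WHERE A.ID = B.A_ID AND B.ID = C.B_ID"
# Same query with the B-to-C predicate accidentally dropped: no error, just more rows.
forgotten = "SELECT COUNT(*) FROM A, B, C WHERE A.ID = B.A_ID"

complete_count = conn.execute(complete).fetchone()[0]
forgotten_count = conn.execute(forgotten).fetchone()[0]
print(complete_count, forgotten_count)
```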
First things first:
"I used a timer but the result are too close" -- This is actually not a good way to test performance. Databases have caches. The results you get back won't be comparable with a stopwatch. You have system load to contend with, caching, and a million other things that make that particular comparison worthless. Instead of that, try using EXPLAIN to figure out the execution plan. Use SHOW PROFILES and SHOW STATUS to see where and how the queries are spending time. Check last_query_cost. But don't check your stopwatch. That won't tell you anything.
Second: this question can't be answered with the info you provided. In point of fact, the queries are identical (verify that with EXPLAIN) and simply boil down to implicit vs. explicit joins. That doesn't make either one of them optimized, though. Again, you need to dig into the join itself and see whether it's making use of indexes, for example, or whether it's creating a lot of temporary tables or filesorts.
Optimizing the query is a good thing... but these two are the same. A stopwatch won't help you. Use EXPLAIN, SHOW PROFILES, SHOW STATUS... not a stopwatch :-)

joins in mysql5

I've seen people writing the following query
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
written like this
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name2.column_name =table_name1.column_name
Does it actually make any difference?
There is no difference.
table_name1.column_name=table_name2.column_name
and
table_name2.column_name=table_name1.column_name
have the same meaning.
The 'left' part of the join refers to the tables, not to the comparison. I guess that was your implicit question?
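A quick check in SQLite through Python's sqlite3 module, using the question's placeholder table and column names with invented rows: swapping the operands of = in the ON clause returns the same rows, while swapping the tables around LEFT JOIN changes which side is preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name1 (column_name INTEGER, label TEXT);
    CREATE TABLE table_name2 (column_name INTEGER, extra TEXT);
    INSERT INTO table_name1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table_name2 VALUES (1, 'x'), (3, 'y');
""")

def fetch(sql):
    return conn.execute(sql).fetchall()

# Operands of the ON comparison in either order: identical results.
forward = fetch("""
    SELECT table_name1.label, table_name2.extra
    FROM table_name1
    LEFT JOIN table_name2
    ON table_name1.column_name = table_name2.column_name
    ORDER BY table_name1.column_name
""")
swapped = fetch("""
    SELECT table_name1.label, table_name2.extra
    FROM table_name1
    LEFT JOIN table_name2
    ON table_name2.column_name = table_name1.column_name
    ORDER BY table_name1.column_name
""")
# Swapping the TABLES around LEFT JOIN changes which side keeps all its rows.
tables_swapped = fetch("""
    SELECT table_name1.label, table_name2.extra
    FROM table_name2
    LEFT JOIN table_name1
    ON table_name1.column_name = table_name2.column_name
    ORDER BY table_name2.column_name
""")
print(forward, swapped, tables_swapped, sep="\n")
```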
Not under normal circumstances, no. It's imaginable that there could be some sufficiently complex query, much more complex than your examples, where the join optimizer would be so overwhelmed that this could matter, but if that's even possible you'd have to work pretty hard.
You can write both of those as USING (column_name), by the way. :)
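For example, sketched in SQLite (same placeholder names as the question, invented rows): USING (column_name) returns the same rows as the ON version, and with SELECT * it keeps a single copy of the shared column, so the result is one column narrower. USING only works when the column has the same name in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name1 (column_name INTEGER, label TEXT);
    CREATE TABLE table_name2 (column_name INTEGER, extra TEXT);
    INSERT INTO table_name1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table_name2 VALUES (1, 'x'), (3, 'y');
""")

on_rows = conn.execute("""
    SELECT table_name1.column_name, label, extra
    FROM table_name1
    LEFT JOIN table_name2 ON table_name1.column_name = table_name2.column_name
    ORDER BY table_name1.column_name
""").fetchall()
using_rows = conn.execute("""
    SELECT column_name, label, extra
    FROM table_name1
    LEFT JOIN table_name2 USING (column_name)
    ORDER BY column_name
""").fetchall()

# With SELECT *, ON keeps both copies of column_name; USING keeps one.
on_width = len(conn.execute(
    "SELECT * FROM table_name1 LEFT JOIN table_name2 "
    "ON table_name1.column_name = table_name2.column_name").description)
using_width = len(conn.execute(
    "SELECT * FROM table_name1 LEFT JOIN table_name2 USING (column_name)").description)
print(on_rows, using_rows, on_width, using_width)
```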
I'd also recommend, for purposes of DIY analysis of issues like this, that you familiarize yourself with EXPLAIN.
It's likely to be a matter of personal taste; I happen to write joins like your second example but couldn't say I'm 100% consistent in doing it.
Also, back in the Bad Old Days when the query optimizers weren't as smart as they are now, you could find that re-ordering the operands in a join expression or WHERE clause could make a big difference; i.e., the optimizer would use an index if the operands were ordered one way but not when they were reversed.
I think I'm having an Oracle 7 flashback. Gotta lie down...

Oracle outer joins - performance

EDIT 9-3-10: I found this blog entry recently that was very enlightening. http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html
There are times when one or the other join syntax may in fact perform better. I have also noticed a slight performance increase (only noticeable in VLDBs) when choosing the Oracle join syntax over the ANSI one. Probably not enough to get fussy over, but for those serious about mastering the Oracle DB, it may be helpful to review the article.
I am aware of two outer join syntaxes for Oracle:
select a, b
from table1
left outer join table2
on table2.foo = table1.foo
OR
select a, b
from table1, table2
where table2.foo(+) = table1.foo
(assuming I got the syntax of the second sample right.)
Is there a performance difference between these? At first I thought it must just be a style preference on the part of the developer, but then I read something that made me think maybe there would be a reason to use one style instead of the other.
"maybe there would be a reason to use one style instead of the other."
There are reasons, but not performance related ones. The ANSI style outer joins, as well as being standard, offer FULL OUTER JOINs and outer joins to multiple tables.
Oracle didn't support ANSI syntax prior to version 9i.
Since that version, these queries do the same and yield the same plan.
Correct pre-9i syntax is this:
SELECT a, b
FROM table1, table2
WHERE table2.foo(+) = table1.foo
There is no performance difference. You can also check the execution plans of both queries to compare.
Theoretically, the second query performs the Cartesian product of the two tables and then selects those meeting the join condition. In practice, though, the database engine will optimize it exactly the same as the first.
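The "Cartesian product, then filter" reading is the logical definition, not an execution strategy. A plain-Python sketch of the idea (the row tuples are invented, and a hash join is just one strategy an engine might pick, not necessarily what Oracle chooses): both functions return the same pairs, but the literal reading inspects every combination while the hash join only visits matching rows.

```python
from collections import defaultdict
from itertools import product

def cartesian_then_filter(left, right):
    """Literal reading of 'FROM t1, t2 WHERE t1.foo = t2.foo':
    build every (left, right) pair, then keep those whose keys match."""
    return [(lrow, rrow) for lrow, rrow in product(left, right) if lrow[0] == rrow[0]]

def hash_join(left, right):
    """One physical strategy an engine may use instead: bucket the right
    side by key, then probe the buckets once per left row."""
    buckets = defaultdict(list)
    for rrow in right:
        buckets[rrow[0]].append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in buckets.get(lrow[0], [])]

# Rows are (foo, payload) tuples; the join key is position 0.
table1 = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
table2 = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

via_product = cartesian_then_filter(table1, table2)
via_hash = hash_join(table1, table2)
print(len(table1) * len(table2), "pairs examined by the literal reading")
print(sorted(via_product) == sorted(via_hash))
```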
I found some additional information in answer to my own question. It looks like the old style is very limiting, according to this post from 3 years ago:
http://www.freelists.org/post/oracle-l/should-one-use-ANSI-join-syntax-when-writing-an-Oracle-application,2
I think perhaps it would only make sense to use the old style if for some reason the queries might be run on an outdated version of Oracle.
The stuff I see at work is almost all in the old style, but it's probably just because the consultants have been working in Oracle since before 9i and they likely didn't see a reason to go update all the old stuff.
Thanks all!
It's not the same. In the first case, you're forcing the tables to be joined in that order.
In the second case, the Oracle planner can choose the best option for executing the query.
In this trivial case the result will probably be the same in every execution, but if you use that syntax in more complex cases, the difference will show.

subselect vs outer join

Consider the following 2 queries:
select tblA.a,tblA.b,tblA.c,tblA.d
from tblA
where tblA.a not in (select tblB.a from tblB)
select tblA.a,tblA.b,tblA.c,tblA.d
from tblA left outer join tblB
on tblA.a = tblB.a where tblB.a is null
Which will perform better? My assumption is that in general the join will be better except in cases where the subselect returns a very small result set.
RDBMSs "rewrite" queries to optimize them, so it depends on system you're using, and I would guess they end up giving the same performance on most "good" databases.
I suggest picking the one that is clearer and easier to maintain; for my money, that's the first one. It's much easier to debug the subquery, as it can be run independently to check for sanity.
Non-correlated subqueries are fine. You should go with what describes the data you want. As has been noted, this likely gets rewritten into the same plan, but that isn't guaranteed! What's more, if tables A and B are not 1:1 you will get duplicate tuples from the join query (as the IN clause performs an implicit DISTINCT sort), so it's always best to code what you want and actually think about the outcome.
Well, it depends on the datasets. In my experience, if you have a small dataset, go for NOT IN; if it's large, go for a LEFT JOIN. The NOT IN clause seems to be very slow on large datasets.
One other thing I might add is that explain plans can be misleading. I've seen several queries where the EXPLAIN cost was sky-high and the query ran in under 1 s. On the other hand, I've seen queries with an excellent explain plan that ran for hours.
So, all in all, test on your own data and see for yourself.
I second Tom's answer that you should pick the one that is easier to understand and maintain.
The query plan of any query in any database cannot be predicted because you haven't given us indexes or data distributions. The only way to predict which is faster is to run them against your database.
As a rule of thumb, I tend to use sub-selects when I do not need to include any columns from tblB in my select clause. I would definitely go for a sub-select when I want to use the IN predicate (and usually for the NOT IN that you included in the question), for the simple reason that these are easier to understand when you or someone else comes back to change them.
The first query will be faster in SQL Server, which I think is slightly counterintuitive: subqueries seem like they should be slower. In some cases (as data volumes increase), EXISTS may be faster than IN.
It should be noted that these queries will produce different results if TblB.a contains NULLs.
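A minimal SQLite sketch (invented rows, with the question's tables trimmed to the key column in the output) of where these forms can actually diverge: repeated tblB.a values do not multiply rows in this anti-join, because the LEFT JOIN keeps only tblA rows with no match at all. A single NULL in tblB.a, however, makes NOT IN return nothing, because x NOT IN (..., NULL) is never true. NOT EXISTS, mentioned above, is included for comparison.

```python
import sqlite3

# The three anti-join spellings, selecting only tblA.a for readability.
QUERIES = {
    "not_in": """SELECT tblA.a FROM tblA
                 WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)
                 ORDER BY tblA.a""",
    "left_join": """SELECT tblA.a FROM tblA
                    LEFT OUTER JOIN tblB ON tblA.a = tblB.a
                    WHERE tblB.a IS NULL
                    ORDER BY tblA.a""",
    "not_exists": """SELECT tblA.a FROM tblA
                     WHERE NOT EXISTS (SELECT 1 FROM tblB WHERE tblB.a = tblA.a)
                     ORDER BY tblA.a""",
}

def run_all(b_values):
    """Load tblA.a = 1..4 plus the given tblB.a values and run every form."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (a INTEGER, b TEXT, c TEXT, d TEXT)")
    conn.execute("CREATE TABLE tblB (a INTEGER)")
    conn.executemany("INSERT INTO tblA (a) VALUES (?)", [(i,) for i in range(1, 5)])
    conn.executemany("INSERT INTO tblB (a) VALUES (?)", [(v,) for v in b_values])
    return {name: [row[0] for row in conn.execute(sql)] for name, sql in QUERIES.items()}

with_duplicates = run_all([1, 1, 3])  # tblB.a repeats the value 1
with_null = run_all([1, 3, None])     # tblB.a contains a NULL
print(with_duplicates)
print(with_null)
```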
From my observations, SQL Server produces the same query plan for these queries.
I created a simple query similar to the ones in the question on SQL Server 2005, and the execution plans were different. The first query appears to be faster. I am not a SQL expert, but the estimated execution plan assigned 37% of the cost to query 1 and 63% to query 2. It appears that the biggest cost for query 2 is the join. Both queries had two table scans.