joins in mysql5 - sql

I've seen people writing the following query
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
written like this
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name2.column_name=table_name1.column_name
Does it actually make any difference?

There is no difference.
table_name1.column_name=table_name2.column_name
and
table_name2.column_name=table_name1.column_name
have the same meaning.
The 'left' part of the join refers to the tables, not to the comparison -- I guess that was your implicit question?
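If you want to check this yourself, you can run both forms against the same data and compare the rows. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen so the example is self-contained), and the tables `customers` and `orders` and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (1, 'book'), (1, 'pen'), (3, 'lamp');
""")

# The same LEFT JOIN, with the operands of "=" written in both orders.
left_first = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders
    ON customers.customer_id = orders.customer_id
    ORDER BY customers.name, orders.item
""").fetchall()

right_first = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    LEFT JOIN orders
    ON orders.customer_id = customers.customer_id
    ORDER BY customers.name, orders.item
""").fetchall()

# Both queries keep every customer, including the one with no orders.
print(left_first == right_first)
```

The table on the left of LEFT JOIN (`customers`) is the one whose rows are always kept; swapping the two sides of the `=` does not change that.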

Not under normal circumstances, no. It's imaginable that there could be some sufficiently complex query, much more complex than your examples, where the join optimizer would be so overwhelmed that this could matter, but if that's even possible you'd have to work pretty hard.
You can write both of those as USING (column_name), by the way. :)
I'd also recommend, for purposes of DIY analysis of issues like this, that you familiarize yourself with EXPLAIN.
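As a rough illustration of both suggestions, here is a sketch that compares the ON and USING forms and prints the plan for each. It assumes SQLite (via Python's sqlite3 module) instead of MySQL, so it uses `EXPLAIN QUERY PLAN`, SQLite's counterpart of MySQL's `EXPLAIN`; the extra columns `a` and `b` and the index name are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); columns a/b are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name1 (column_name INTEGER, a TEXT);
    CREATE TABLE table_name2 (column_name INTEGER, b TEXT);
    CREATE INDEX table_name2_idx ON table_name2 (column_name);
    INSERT INTO table_name1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table_name2 VALUES (1, 'p');
""")

on_sql = """
    SELECT table_name1.a, table_name2.b
    FROM table_name1
    LEFT JOIN table_name2
    ON table_name1.column_name = table_name2.column_name
    ORDER BY table_name1.a
"""
using_sql = """
    SELECT table_name1.a, table_name2.b
    FROM table_name1
    LEFT JOIN table_name2 USING (column_name)
    ORDER BY table_name1.a
"""

on_rows = conn.execute(on_sql).fetchall()
using_rows = conn.execute(using_sql).fetchall()

# Print the plan the engine chose for each form so they can be compared by eye.
for sql in (on_sql, using_sql):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(row)
```

In MySQL you would prefix the query with `EXPLAIN` instead and read the `key` and `rows` columns of its output.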

It's likely to be a matter of personal taste; I happen to write joins like your second example but couldn't say I'm 100% consistent in doing it.
Also, back in the Bad Old Days when the query optimizers weren't as smart as they are now, you could find that re-ordering the operands in a join expression or WHERE clause could make a big difference; i.e., the optimizer would use an index if the operands were ordered one way but not when they were reversed.
I think I'm having an Oracle 7 flashback. Gotta lie down...

Related

How optimize longer SQL statement in pl/sql?

I always write very long SQL statements, but I worry about later maintenance.
Is it better to divide one SQL statement into several statements?
example:
select a.a1, a.a2, b.b3, sum(c.c4), b.b4...b.bn
from A a
inner join B b on a.a1=b.b1
left join C c on a.a2=c.c2
group by a.a1, a.a2, b.b3, b.b4,...,b.bn
I divide into
create table temp_table as select a.a1, a.a2, sum(c.c4) as sum_c4
from A a
left join C c on a.a2=c.c2
group by a.a1, a.a2
select temp.*, b.b3, b.b4,...b.bn
from temp_table temp
inner join B b on temp.a1=b.b1
But this needs to create a table in PL/SQL. Is there a better way?
Can many SQL statements execute faster with Oracle's CHOOSE (soft parse)?
Thanks for sharing your experience.
I am a fan of writing SQL as a single statement. I find that approach is better for a variety of reasons:
A single statement is easier to maintain.
I don't have to name and remember intermediate table names.
I might make a mistake and not re-build an intermediate result when the logic changes.
The optimizer has a good chance of getting the right execution plan.
That said, the optimizer is not always right. Oracle has a good optimizer and one that makes use of statistics. On occasion, dividing a complex query into pieces can improve performance, under some circumstances:
The optimizer is not able to do a good job of estimating the size of the intermediate result. A table "knows" exactly how many rows it has.
You add indexes to the intermediate table.
You want to re-use results, say for inter-query optimization.
Although these might be beneficial, I myself shy away because of the complexity and maintainability. However, it can sometimes be faster.
It's rarely faster. You're hiding your intent from the optimizer. Generally give it one query with no user functions for optimum performance.
It won't necessarily be faster, as both are run on the Oracle server, and your PL/SQL will be compiled anyway.
If you do everything in one single SQL statement, you leave the query optimization to Oracle, while if you write your own PL/SQL, you might have more control over how the queries are executed. But of course, if you write bad PL/SQL, it will definitely perform worse.
However, I am not sure breaking the code up really improves maintainability. Unless you mean you can reuse the broken-out pieces in other places, which improves code reuse, I would think a single statement is more logical. You can always add more comments to explain as much detail as needed to make it clear to whoever reads it in the future.
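One middle ground between the two approaches in this thread is to keep a single statement but name the intermediate result with a WITH clause (a common table expression; Oracle calls this subquery factoring). The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module rather than Oracle (an assumption), trims the tables to a few hypothetical columns, and checks that the one-statement form returns the same rows as the temp-table form.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); tables are cut down to a few columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a1 INTEGER, a2 INTEGER);
    CREATE TABLE B (b1 INTEGER, b3 TEXT);
    CREATE TABLE C (c2 INTEGER, c4 INTEGER);
    INSERT INTO A VALUES (1, 10), (2, 20);
    INSERT INTO B VALUES (1, 'one'), (2, 'two');
    INSERT INTO C VALUES (10, 5), (10, 7);
""")

# Two-step version: materialize the aggregate, then join it to B.
conn.executescript("""
    CREATE TEMP TABLE temp_table AS
        SELECT a.a1, a.a2, SUM(c.c4) AS total
        FROM A a
        LEFT JOIN C c ON a.a2 = c.c2
        GROUP BY a.a1, a.a2;
""")
two_step = conn.execute("""
    SELECT t.a1, t.a2, t.total, b.b3
    FROM temp_table t
    INNER JOIN B b ON t.a1 = b.b1
    ORDER BY t.a1
""").fetchall()

# One-statement version: the same aggregate as a named subquery (CTE).
one_step = conn.execute("""
    WITH agg AS (
        SELECT a.a1, a.a2, SUM(c.c4) AS total
        FROM A a
        LEFT JOIN C c ON a.a2 = c.c2
        GROUP BY a.a1, a.a2
    )
    SELECT agg.a1, agg.a2, agg.total, b.b3
    FROM agg
    INNER JOIN B b ON agg.a1 = b.b1
    ORDER BY agg.a1
""").fetchall()

print(two_step == one_step)
```

This keeps the readability benefit of a named intermediate step without creating a table, and it leaves the optimizer free to plan the whole query at once. Whether it is faster than the temp-table version on a given Oracle system is something to measure, not assume.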

SQL Server - lack of NATURAL JOIN / x JOIN y USING(field)

I've just been reading up on NATURAL JOIN / USING - SQL92 features which are (sadly?) missing from SQL Server's current repertoire.
Has anyone come from a DBMS that supported these to SQL Server (or another non-supporting DBMS) - were they as useful as they sound, or a can of worms (which also sounds possible!)?
I never use NATURAL JOIN because I don't like the possibility that the join could do something I don't intend just because some column name exists in both tables.
I do use the USING join syntax occasionally, but just as often it turns out that I need a more complex join condition than USING can support, so I convert it to the equivalent ON syntax after all.
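To make the NATURAL JOIN worry concrete, here is a small sketch of the failure mode. It assumes SQLite through Python's sqlite3 module, and the `customer` and `purchase` tables are hypothetical: both happen to have a `name` column in addition to the intended join key.

```python
import sqlite3

# Hypothetical schema: both tables share customer_id AND, by accident, name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT);
    CREATE TABLE purchase (purchase_id INTEGER, customer_id INTEGER, name TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES (100, 1, 'book'), (101, 2, 'lamp');
""")

# NATURAL JOIN pairs every column name the two tables have in common,
# so it silently requires customer.name = purchase.name as well.
natural_rows = conn.execute("""
    SELECT purchase_id
    FROM customer NATURAL JOIN purchase
    ORDER BY purchase_id
""").fetchall()

# USING names the join column explicitly, so the stray name column is ignored.
using_rows = conn.execute("""
    SELECT purchase_id
    FROM customer JOIN purchase USING (customer_id)
    ORDER BY purchase_id
""").fetchall()

print(natural_rows)
print(using_rows)
```

The NATURAL JOIN returns no rows here because no customer name equals a product name, while the USING version returns both purchases. The same thing can happen later if someone adds a like-named column to either table.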
Would you consider a DBMS that was truly relational?:
"in Tutorial D [a truly relational language], the only “join” operator is called JOIN, and it means “natural join”... There should be no other kind of join... Few people have had the experience of using a proper relational language. Of those who have, I strongly suspect that none of them ever complained about some perceived inconvenience in pairing columns according to their names"
Source: "The Importance of Column Names" by Hugh Darwen
It's a matter of convenience. Not indispensable, but it should have its place, for example in interactive querying (every keystroke brings us closer to RSI, anyway), or some simple cases of hand written SQL even in production code (yes, I wrote that. And even seen JOIN USING in serious code, written by wise programmers other than myself. But, I'm digressing).
I found this question when looking for confirmation that SQL Server is missing this feature, and I got it. I am only bewildered by the amount of hate against this syntax, which I attribute to Sour Grapes Syndrome. I am amused when I'm lectured in a patronising tone: "Sweets (read: syntactic sugar) are bad for your health. You don't need them anyway."
What is nice in the JOIN USING syntax, is that it works not just on column names, but also on column aliases, for example:
-- foreign key "order".customerId references (customer.id)
SELECT c.*, c.id as customerId, o.* from customer c
join "order" o using (customerId);
I don't agree with "Join using would be better, if only (...)". Or the argument, that you may need more complex conditions. From a different point of view, why use JOIN ON? Why not be pure, and move all conditions to the WHERE clause?
SELECT t1.*, t2.* from t1, t2 where t2.t1_id = t1.id;
I could now go mad and argue, how this is the cleanest way to express a join, and you can immediately start adding more conditions in the where clause, which you usually need anyway, blah blah blah...
So you shouldn't miss this particular syntax too dearly, but there's nothing to be happy about for not having it ("Phew, that was close. So good not to have JOIN USING. I was spared a lot of pain").
So, while I personally use JOIN ON 99% of the time, I feel no Schadenfreude when there is no JOIN USING or NATURAL JOIN.
I don't see the value of the USING or NATURAL syntax - as you've encountered, only ON is consistently implemented so it's best from a portability standpoint.
Being explicit is also better for maintenance, besides that the alternatives can be too limited to deal with situations. I'd also prefer my codebase be consistent.

Which one is faster?

Which one is faster?
SELECT FROM A INNER JOIN B ON A.ID = B.ID
...or:
SELECT FROM A , B WHERE A.ID = B.ID
I don't think one is faster than the other, but one is BETTER to use than the other:
SELECT (fields)
FROM A
INNER JOIN B ON A.ID = B.ID
is definitely the preferred way of expressing this (and conforms to the ANSI SQL standard for join syntax). It's clearer, it's more obvious to the observer what exactly is happening here.
Always use this syntax over the other - it's just easier and clearer!
PS: SQL guru Aaron Bertrand seems to agree :-) See his post "Bad habits to kick: using old-style JOINs".
Measure, don't guess.
Some DBMSs will run explicit joins faster than implicit ones, but it depends entirely upon the DBMS itself (the one I use is smart enough to do both at full speed).
There's a reason why we have DBAs. They're meant to monitor and tune the performance of the database based on reality, not some (mis-)conception as to how things might perform.
That's because the performance changes based on the data in the tables.
So, you should not be worrying about how fast those two queries perform, until there's a performance problem. Use your best guess (with indexes and such) but keep an eye on what the actual performance is, in production, and adjust for that.
See also here.
They are equivalent, neither should be faster than the other. The best way to know is to use EXPLAIN.
Measure the time needed for the execution. It depends on too many parameters to be answered with certainty.
Check the execution plan for both queries, draw your conclusions from that.
Actually, to be specific, neither of them is faster. They will not run at all.
You need to at least specify a column or constant or even * in the SELECT clause.
Aren't they both equally fast? Because they basically translate to the same thing....
If you want to know for sure you could probably set up a simple test case and execute it.
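A simple test case along those lines might look like the sketch below. It assumes SQLite through Python's sqlite3 module (your own DBMS may behave differently, which is the point of measuring), adds a hypothetical `val` column so the SELECT list is not empty, and uses `EXPLAIN QUERY PLAN`, which is SQLite's version of EXPLAIN.

```python
import sqlite3

# SQLite is an assumption; the val column is added so there is something to select.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

explicit_sql = """
    SELECT A.ID, A.val, B.val
    FROM A INNER JOIN B ON A.ID = B.ID
    ORDER BY A.ID
"""
implicit_sql = """
    SELECT A.ID, A.val, B.val
    FROM A, B
    WHERE A.ID = B.ID
    ORDER BY A.ID
"""

explicit_rows = conn.execute(explicit_sql).fetchall()
implicit_rows = conn.execute(implicit_sql).fetchall()


def plan(sql):
    """Return the readable step descriptions from EXPLAIN QUERY PLAN."""
    # The last column of each plan row holds the description text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Compare the two plans by eye; identical plans mean identical work.
print(plan(explicit_sql))
print(plan(implicit_sql))
```

Both queries return the same rows, and comparing the printed plans shows whether the engine treats them differently. For real timing numbers you would run this against production-sized data in the DBMS you actually use.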

Oracle outer joins - performance

EDIT 9-3-10: I found this blog entry recently that was very enlightening. http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html
There are times when one or the other join syntax may in fact perform better. I have also found times when I have noticed a slight performance increase (only noticeable in VLDBs) when choosing the Oracle join syntax over the ANSI one. Probably not enough to get fussy over, but for those serious about mastering the Oracle DB, it may be helpful to review the article.
I am aware of two outer join syntaxes for Oracle:
select a, b
from table1
left outer join table2
on table2.foo = table1.foo
OR
select a, b
from table1, table2
where table2.foo(+) = table1.foo
(assuming I got the syntax of the second sample right.)
Is there a performance difference between these? At first I thought it must just be a style preference on the part of the developer, but then I read something that made me think maybe there would be a reason to use one style instead of the other.
"maybe there would be a reason to use one style instead of the other."
There are reasons, but not performance related ones. The ANSI style outer joins, as well as being standard, offer FULL OUTER JOINs and outer joins to multiple tables.
Oracle didn't support ANSI syntax prior to version 9i.
Since that version, these queries do the same and yield the same plan.
Correct pre-9i syntax is this:
SELECT a, b
FROM table1, table2
WHERE table2.foo(+) = table1.foo
There is no performance difference. You can also check the execution plans of both queries to compare.
Theoretically, the second query performs the Cartesian product of the two tables and then selects those meeting the join condition. In practice, though, the database engine will optimize it exactly the same as the first.
I found some additional information in answer to my own question. Looks like the old style is very limiting, as of this doc from 3 years ago.
http://www.freelists.org/post/oracle-l/should-one-use-ANSI-join-syntax-when-writing-an-Oracle-application,2
I think perhaps it would only make sense to use the old style if for some reason the queries might be run on an outdated version of Oracle.
The stuff I see at work is almost all in the old style, but it's probably just because the consultants have been working in Oracle since before 9i and they likely didn't see a reason to go update all the old stuff.
Thanks all!
It's not the same. In the first case you're forcing the tables to be joined in that order.
In the second case Oracle Planner can choose the best option to execute the query.
In this trivial case the result probably will be the same in all the executions, but if you use that syntax in more complex cases the difference will be shown.

Do you use the OUTER keyword when writing left/right JOINs in SQL?

I often see people who write SQL like this:
SELECT * from TableA LEFT OUTER JOIN TableB ON (ID1=ID2)
I myself write simply:
SELECT * from TableA LEFT JOIN TableB ON (ID1=ID2)
To me the "OUTER" keyword is like line noise - it adds no additional information, just clutters the SQL. It's even optional in most RDBMS that I know. So... why do people still write it? Is it a habit? Portability? (Are your SQL's really portable anyway?) Something else that I'm not aware of?
OUTER really is superfluous, as you write, since all OUTER joins are LEFT, RIGHT, or FULL, and conversely all LEFT, RIGHT, or FULL joins are OUTER. So syntactically it's mostly noise, as you put it. It is optional even in ISO SQL. As for why people use it, I suppose some feel the need to insist on the join being OUTER, even if the left-or-right keyword already says so. For that matter, INNER also is superfluous!
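A quick check that the keyword changes nothing, sketched with Python's sqlite3 module (SQLite is an assumption here, and the column names `ID1` and `ID2` and the sample rows are hypothetical):

```python
import sqlite3

# Hypothetical two-table setup; SQLite is used only so the example is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID1 INTEGER);
    CREATE TABLE TableB (ID2 INTEGER);
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableB VALUES (2), (3);
""")

with_outer = conn.execute("""
    SELECT * FROM TableA LEFT OUTER JOIN TableB ON (ID1 = ID2)
    ORDER BY ID1
""").fetchall()

without_outer = conn.execute("""
    SELECT * FROM TableA LEFT JOIN TableB ON (ID1 = ID2)
    ORDER BY ID1
""").fetchall()

# Both spellings keep the unmatched TableA row and pad it with NULL.
print(with_outer == without_outer)
```

So the choice between the two spellings is purely about readability for whoever maintains the query.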
YES
It just makes things clearer in my opinion - the clearer and more obvious you state your intent, the better (especially for someone else trying to read and understand your code later on).
But that's just my opinion - it's not technically needed, so you can use it - or leave it.
No. I use
JOIN
LEFT JOIN
RIGHT JOIN
FULL OUTER JOIN
CROSS JOIN
There is no ambiguity for me.
One thing that several months on Stack Overflow have shown me is how much SQL is written and / or maintained by people with no previous exposure to SQL or relational databases at all.
For that reason, I think that the more explicit you can be the better off the next programmer is going to be when looking at your code.
It is simply a matter of taste, I guess that people use it because they find that it leads to more readable code. For example, I prefer to use the also optional AS keyword since SELECT ... FROM table AS t looks more readable than SELECT ... FROM table t for me.
I'm using 'inner join', 'left join', 'right join', and 'full outer join'. 'join' without 'inner' makes it somewhat ambiguous to me; 'left' and 'right' are self-descriptive and 'full' is such kind of a beast that it deserves special syntax :)
I use the OUTER keyword myself. I agree it is merely a matter of taste, but omitting it strikes me as being a little sloppy, though not as bad as omitting the INNER keyword (sloppy) or writing SQL keywords in lower case (very sloppy).
I think there is no such thing as portable SQL in the year 2009 anyway... At some point, you need to write DBMS-specific statements (like retrieving top N rows).
I personally find the JOIN syntax redundant and instead I comma-separate table names.