Formatting clear and readable SQL queries

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subqueries and on the result sets they produce.
We're not using views, so that's out of the question.
After writing it I'm looking at it and scratching my head, wondering what it's even doing, because I can't follow it.
What kind of formatting do you use to clean up such a mess? Indents perhaps?

With large queries I tend to rely a lot on named result sets using WITH. This lets you define a result set up front and keeps the main query simpler. Named result sets may also make the query plan more efficient; PostgreSQL, for example, stores the result set in a temporary table.
Example:
WITH
cubed_data AS (
    SELECT
        dimension1_id,
        dimension2_id,
        dimension3_id,
        measure_id,
        SUM(value) value
    FROM
        source_data
    GROUP BY
        CUBE(dimension1_id, dimension2_id, dimension3_id),
        measure_id
),
dimension1_label AS (
    SELECT
        dimension1_id,
        dimension1_label
    FROM
        labels
    WHERE
        object = 'dimension1'
), ...
SELECT
    *
FROM
    cubed_data
    JOIN dimension1_label USING (dimension1_id)
    JOIN dimension2_label USING (dimension2_id)
    JOIN dimension3_label USING (dimension3_id)
    JOIN measure_label USING (measure_id)
The example is a bit contrived, but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when preparing data for OLAP use. They are also a must if you want to write recursive queries.
WITH works on at least the current versions of PostgreSQL, Oracle, and SQL Server.
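As a hedged sketch of the recursive case (PostgreSQL syntax with an invented employee table; Oracle and SQL Server use the same idea but omit the RECURSIVE keyword):
WITH RECURSIVE org_chart AS (
    SELECT employee_id, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL            -- anchor: start at the top of the hierarchy
    UNION ALL
    SELECT e.employee_id, e.manager_id, o.depth + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.employee_id   -- recursive step: walk down one level
)
SELECT * FROM org_chart;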

Boy, is this a loaded question. :) There are as many ways to do it right as there are smart people on this site. That said, here is how I keep myself sane when building complex SQL statements:
select
     c.customer_id
    ,c.customer_name
    ,o.order_id
    ,o.order_date
    ,o.amount_taxable
    ,od.order_detail_id
    ,p.product_name
    ,pt.product_type_name
from
    customer c
    inner join
        order o
            on c.customer_id = o.customer_id
    inner join
        order_detail od
            on o.order_id = od.order_id
    inner join
        product p
            on od.product_id = p.product_id
    inner join
        product_type pt
            on p.product_type_id = pt.product_type_id
where
    o.order_date between '1/1/2011' and '1/5/2011'
    and
    (
        pt.product_type_name = 'toys'
        or
        pt.product_type_name like '%kids%'
    )
order by
     o.order_date
    ,pt.product_type_name
    ,p.product_name
If you're interested, I can post/send layouts for inserts, updates and deletes as well as correlated subqueries and complex join predicates.
Does this answer your question?

Generally, people break lines on reserved words, and indent any sub-queries:
SELECT *
FROM tablename
WHERE value in
    (SELECT *
     FROM tablename2
     WHERE condition)
ORDER BY column

In general, I follow a simple hierarchical set of formatting rules. Basically, keywords such as SELECT, FROM, ORDER BY all go on their own line. Each field goes on its own line (in a recursive fashion)
SELECT
    F.FIELD1,
    F.FIELD2,
    F.FIELD3
FROM
    FOO F
WHERE
    F.FIELD4 IN
    (
        SELECT
            B.BAR
        FROM
            BAR B
        WHERE
            B.TYPE = 4
            AND B.OTHER = 7
    )

Table aliases and simple consistency will get you a long, long way.
What looks decent is breaking lines on the main keywords SELECT, FROM, WHERE (etc.).
Joins can be trickier; indenting the ON part of a join brings its important part to the front.
Breaking complicated logical expressions (join and WHERE conditions both) at the same level also helps.
Indent logically equivalent levels of a statement the same way (subqueries, opening brackets, etc.).
Capitalize all keywords and standard functions.
Really complex SQL should not shy away from comments - although you typically find these in SQL scripts, not dynamic SQL.
EDIT example:
SELECT a.name, SUM(b.tax)
  FROM db_prefix_registered_users a
       INNER JOIN db_prefix_transactions b
               ON a.id = b.user_id
       LEFT JOIN db_countries c
              ON b.paid_from_country_id = c.id
 WHERE a.type IN (1, 2, 7) AND
       b.date < (SELECT MAX(date)
                   FROM audit) AND
       c.country = 'CH'
So, to sum it up: consistency matters the most.

I like to use something like:
SELECT col1,
       col2,
       ...
FROM
    MyTable as T1
    INNER JOIN
        MyOtherTable as T2
        ON t1.col1 = t2.col1
        AND t1.col2 = t2.col2
    LEFT JOIN
    (
        SELECT 1, 2, 3
        FROM Someothertable
        WHERE somestuff = someotherstuff
    ) as T3
        ON t1.field = t3.field

The only true and right way to format SQL is:
SELECT t.mycolumn AS column1
      ,t.othercolumn AS column2
      ,SUM(t.tweedledum) AS column3
  FROM table1 t
      ,(SELECT u.anothercol
              ,u.memaw /*this is a comment*/
          FROM table2 u
              ,anothertable x
         WHERE u.bla = :b1 /*the bla value*/
           AND x.uniquecol = :b2 /*the widget id*/
       ) v
 WHERE t.tweedledee = v.anothercol
   AND t.hohum = v.memaw
 GROUP BY t.mycolumn
         ,t.othercolumn
HAVING COUNT(*) > 1
;
;)
Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.

Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.
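For instance (view and column names invented here), wrapping the messy part in a throwaway view lets the query designer draw it for you:
CREATE VIEW v_order_totals_scratch AS
SELECT c.customer_id,
       c.customer_name,
       SUM(o.amount_taxable) AS total_taxable
FROM customer c
INNER JOIN order_header o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;
-- Inspect it visually, grab the screenshot, then DROP VIEW v_order_totals_scratch;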

Indenting, certainly, but you can also split the subqueries up with comments, make your alias names really meaningful, and specify which subquery they refer to, e.g. innerCustomer, outerCustomer.
Common Table Expressions can really help in some cases to break up a query into meaningful sections.

An age-old question with a thousand opinions and no one right answer, and one of my favorites. Here's my two cents.
With regards to subqueries, lately I've found it easier to follow what's going on with "extreme" indenting and adding comments like so:
SELECT mt.Col1, mt.Col2, subQ.Dollars
from MyTable1 mt
     inner join (-- Get the dollar total for each SubCol
                 select SubCol, sum(Dollars) Dollars
                 from MyTable2
                 group by SubCol) subQ
        on subQ.SubCol = mt.Col1
order by mt.Col2
As for the other cent, I only use upper case on the first word. With pages of run-on queries, it makes it a bit easier to pick out when a new one starts.
Your mileage will, of course, vary.

Wow, a lot of responses here, but one thing I haven't seen in many of them is COMMENTS! I tend to add a lot of comments throughout, especially with large SQL statements. Formatting is important, but well-placed and meaningful comments are extremely important too, not just for you but for the poor soul who needs to maintain the code ;)
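For example (a hypothetical fragment, table names invented), a block comment per section plus short inline notes go a long way:
/* Orders for the current campaign, one row per order line.
   NOTE: status 'X' means the order was cancelled and is excluded on purpose. */
SELECT o.order_id,
       c.customer_name,
       r.rep_name            -- rep assigned at the time the order was placed
FROM order_header o
INNER JOIN customer  c ON c.customer_id = o.customer_id
INNER JOIN sales_rep r ON r.rep_id      = o.rep_id
WHERE o.status <> 'X'
  AND o.order_date >= '2011-01-01';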

Related

Is there a better way to optimize this SQL query?

Assume my table has millions of records.
The following columns have indexes:
type
date
unique_id
Here is my query:
SELECT TOP (1000) T.TIME, T.TYPE, F.NAME,
       B.NAME, T.MESSAGE
FROM MY_TABLE T
LEFT OUTER JOIN FOO F ON F.ID = T.FID
LEFT OUTER JOIN BAR B ON B.ID = T.BID
WHERE T.TYPE IN ('success', 'failure')
  AND T.DATE BETWEEN 1592585183437 AND 1594232320525
  AND T.UNIQUE_ID = 'my unique ID'
ORDER BY T.DATE DESC
My question is am I causing myself any trouble with this query if I have tons of records in my table? Can this be optimized further?
My question is am I causing myself any trouble with this query if I have tons of records
in my table?
Wrong question. As long as you are only asking for data you need, you NEED it. Any trouble means you STILL need it.
The query looks as good as it gets. I somewhat doubt the TOP (1000), though - that is a LOT of data unless you compress it somehow.
The question is more whether you have the proper indices; without the proper indices, no query will perform well. I would also not use strings like this for TYPE, but then the question is about the query, not possible failures in table design.
Check indices.
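For example (SQL Server syntax; the index name and the choice of key/included columns are just one plausible option, not something from the question), a composite index matching the WHERE clause and the ORDER BY might look like:
CREATE NONCLUSTERED INDEX IX_MY_TABLE_UniqueId_Type_Date
    ON MY_TABLE (UNIQUE_ID, TYPE, DATE DESC)
    INCLUDE (TIME, MESSAGE, FID, BID);
With UNIQUE_ID first, the equality filter narrows the range immediately, and the INCLUDE columns let the query avoid extra lookups into the base table.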

How to do self join

I want to do a self-join. I have written the code below, but it's throwing an error, and it seems there is some issue with an alias.
Apart from this, it would also be helpful if someone could point me to a good site where I can learn to write queries in MS Access. Everywhere I searched only shows how to do things through the UI, but I want to learn to write the queries myself.
(SELECT distinct itemname,vendorname,price,count(*)
from vendor_Details1
group by itemname,vendorname,price
order by vendorname) A
inner join
(SELECT distinct itemname,vendorname,price,count(*)
from vendor_Details1
group by itemname,vendorname,price
order by vendorname) B
on A.vendorname=B.vendorname
It is entirely unclear what you are trying to do. However, "join" is an operator in the FROM clause that operates on two tables, views, or subqueries.
The structure of a self-join looks like:
select . . . -- list of columns here
from t as t1 inner join -- you need aliases for the table so you can distinguish the references
t as t2
on t1.? = t2.? -- the join condition goes here
Your query doesn't even have a select.
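For what it's worth, here is a hedged sketch of a working self-join in Access SQL using the columns from your query (what you actually want to compare is an assumption); it pairs up different vendors that sell the same item:
SELECT a.itemname,
       a.vendorname,
       a.price,
       b.vendorname AS other_vendor,
       b.price      AS other_price
FROM vendor_Details1 AS a
INNER JOIN vendor_Details1 AS b
        ON a.itemname = b.itemname
WHERE a.vendorname <> b.vendorname;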

I Need some sort of Conditional Join

Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
    ON jt.someCol = someCol
WHERE jt.someCol = conditions
   OR @neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
    (SELECT TOP(1) * FROM joinedTable jt
     WHERE jt.someCol = someCol
       AND @neededJoin is null)
(this is even more expensive than always left joining)
SELECT @sql = @sql + ' INNER JOIN joinedTable jt ' +
              ' ON jt.someCol = someCol ' +
              ' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regards to structuring my query. Given that I have three optional conditions, and I need the results from all of them my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense to do? since all of the code from the "SELECT from db" statement is the same? It appears that the union condition is more efficient, but my query is VERY long because of it....
Thanks!
LEFT JOIN
    joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND @neededJoin ...
...
OR
LEFT JOIN
    (
        SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededJoin ...
    ) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
    (SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededJoin ...)
SELECT
    ...
LEFT JOIN
    jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement, it's evaluated... so keep it out of the SQL statement by using dynamic SQL or IF ELSE.
The dynamic SQL solution is usually the best for these situations, but if you really need to get away from that, a series of IF statements in a stored proc will do the job. It's a pain and you have to write much more code, but it will be faster than trying to make joins conditional in the statement itself.
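A minimal sketch of how the question's dynamic fragment could be finished and executed (the @sql and @neededJoin names follow the question; baseTable, someFilter and the @filter parameter are invented placeholders assumed to come from the surrounding stored procedure):
DECLARE @sql nvarchar(max) = N'SELECT t.* FROM baseTable t';

IF @neededJoin = 1   -- only pay for the expensive join when it is actually required
    SET @sql += N' INNER JOIN joinedTable jt ON jt.someCol = t.someCol';

SET @sql += N' WHERE t.someFilter = @filter';

EXEC sp_executesql @sql, N'@filter int', @filter = @filterValue;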
I would go for a simple and straightforward approach like this:
DECLARE @ret TABLE(...) ;
IF <condition one> BEGIN ;
    INSERT INTO @ret(...) SELECT ...
END ;
IF <condition two> BEGIN ;
    INSERT INTO @ret(...) SELECT ...
END ;
IF <condition three> BEGIN ;
    INSERT INTO @ret(...) SELECT ...
END ;
SELECT DISTINCT ... FROM @ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we cannot guesstimate performance; we must benchmark to determine it. Still, simpler code chunks are better for readability and maintainability.
Try this:
LEFT JOIN joinedTable jt
    ON jt.someCol = someCol
    AND jt.someCol = conditions
    AND @neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
Update
If this doesn't give the performance I claimed, then perhaps that's because the last time I did this I was using joins to a mapping table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
    V.*,
    LookedUpValue =
        CASE M.TableCode
            WHEN 'A' THEN A.Value
            WHEN 'B' THEN B.Value
            WHEN 'C' THEN C.Value
        END
FROM
    ValueMaster V
    INNER JOIN JoinMap M ON V.Col1 = M.Col1 AND V.Col2 = M.Col2
    LEFT JOIN TableA A ON M.TableCode = 'A'
    LEFT JOIN TableB B ON M.TableCode = 'B'
    LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them tables with dozens or hundreds of millions of rows).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.
The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxilliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT RelevantColumn1, RelevantColumn2
FROM AuxilliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
Good luck...
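Translating those steps into T-SQL dynamic SQL (purely a sketch: the core tables come from the example above, while the @needsAuxiliary1 flag, the selected columns, and the filter values are invented and assumed to be parameters of the surrounding procedure), the FROM and WHERE pieces are accumulated separately and only stitched together at the end:
DECLARE @from  nvarchar(max) = N'FROM CoreTable1 AS C1
    JOIN CoreTable2 AS C2 ON C1.JoinColumn = C2.JoinColumn
    JOIN CoreTable3 AS M  ON M.PrimaryKey  = C1.ForeignKey';
DECLARE @where nvarchar(max) = N'WHERE 1 = 1';

IF @needsAuxiliary1 = 1
BEGIN
    -- Table AuxilliaryTable1 is referenced: extend both the FROM and the WHERE pieces
    SET @from  += N' JOIN AuxilliaryTable1 AS A ON M.ForeignKey1 = A.PrimaryKey';
    SET @where += N' AND A.Column1 BETWEEN 1 AND 50';
END;

DECLARE @sql nvarchar(max) =
    N'SELECT C1.SomeColumn, C2.OtherColumn ' + @from + N' ' + @where +
    N' ORDER BY C1.SomeColumn';

EXEC sp_executesql @sql;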
Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num

optimize SQL query

What more can I do to optimize this query?
SELECT * FROM
(SELECT `item`.itemID, COUNT(`votes`.itemID) AS `votes`,
`item`.title, `item`.itemTypeID, `item`.
submitDate, `item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin' ,
`myItems`.userID as userIDFav, `myItems`.deleted as myDeleted
FROM (votes `votes` RIGHT OUTER JOIN item `item`
ON (`votes`.itemID = `item`.itemID))
INNER JOIN
users `users`
ON (`users`.userID = `item`.userID)
LEFT OUTER JOIN
myItems `myItems`
ON (`myItems`.itemID = `item`.itemID)
WHERE (`item`.deleted = 0)
GROUP BY `item`.itemID,
`votes`.itemID,
`item`.title,
`item`.itemTypeID,
`item`.submitDate,
`item`.deleted,
`item`.ItemCat,
`item`.counter,
`item`.userID,
`users`.name,
`myItems`.deleted,
`myItems`.userID
ORDER BY `item`.itemID DESC) as myTable
where myTable.userIDFav = 3 or myTable.userIDFav is null
limit 0, 20
I'm using MySQL
Thanks
What does the analyzer say for this query? Without knowing how many rows there are in the tables, you can't tell what to optimize. So run the analyzer and you'll see which parts cost what.
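In MySQL that simply means prefixing the statement with EXPLAIN; for example, on a trimmed-down version of the query (illustrative only):
EXPLAIN
SELECT `item`.itemID, COUNT(`votes`.itemID) AS votes
FROM item `item`
LEFT JOIN votes `votes` ON `votes`.itemID = `item`.itemID
WHERE `item`.deleted = 0
GROUP BY `item`.itemID;
The key and rows columns of the EXPLAIN output show which indexes are actually used and how many rows each join step touches.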
Of course, as @theomega said, look at the execution plan.
But I'd also suggest trying to "clean up" your statement. (I don't know which version is faster - that depends on your table sizes.) Usually, I'd start with a clean statement and optimize from there; typically, a clean statement makes it easier for the optimizer to come up with a good execution plan.
So here are some observations about your statement that might make things slow:
a couple of outer joins (makes it hard for the optimizer to figure out an index to use)
a group by
a lot of columns to group by
As far as I understand your SQL, this statement should do most of what yours is doing:
SELECT `item`.itemID, `item`.title, `item`.itemTypeID, `item`.submitDate,
       `item`.deleted, `item`.ItemCat, `item`.counter, `item`.userID, `users`.name,
       TIMESTAMPDIFF(minute, `submitDate`, NOW()) AS 'timeMin'
FROM item `item`
INNER JOIN users `users` ON (`users`.userID = `item`.userID)
WHERE `item`.deleted = 0
Of course, this misses the info from the tables you outer joined, I'd suggest to try to add the required columns via a subselect:
SELECT `item`.itemID,
       (SELECT COUNT(itemID)
        FROM votes v
        WHERE v.itemID = `item`.itemID) AS 'votes', <etc.>
This way, you can get rid of one outer join and the group by. The outer join is replaced by the subselect, so there is a trade-off which may be bad for the "cleaner" statement.
Depending on the cardinality between item and myItems, you can do the same or you'd have to stick with the outer join (but no need to reintroduce the group by).
Hope this helps.
Some quick semi-random thoughts:
Are your itemID and userID columns indexed?
What happens if you add "EXPLAIN " to the start of the query and run it? Does it use indexes? Are they sensible?
Do you need to run the whole inner query and filter on it, or could you move the where myTable.userIDFav = 3 or myTable.userIDFav is null part into the inner query?
You do seem to have too many fields in the GROUP BY list; since one of them is itemID, I suspect you could use an inner SELECT to perform the grouping and an outer SELECT to return the set of fields desired.
Can't you add the where clause myTable.userIDFav = 3 or myTable.userIDFav is null to WHERE (item.deleted = 0)?
Regards
Lieven
Look at the way your query is built. You join a lot of stuff, then limit the output to 20 rows. You should outer join items and myItems first, since your conditions only apply to those two tables, limit that to the first 20 rows, and only then join and aggregate. As written, you are performing a lot of work that is then discarded.
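A rough, untested sketch of that ordering (columns abbreviated; it assumes the userIDFav filter and the 20-row limit can legitimately be applied in a derived table before the remaining joins and the aggregation):
SELECT i.itemID, i.title, u.name, COUNT(v.itemID) AS votes
FROM (
        SELECT `item`.*
        FROM item `item`
        LEFT JOIN myItems mi ON mi.itemID = `item`.itemID
        WHERE `item`.deleted = 0
          AND (mi.userID = 3 OR mi.userID IS NULL)
        ORDER BY `item`.itemID DESC
        LIMIT 0, 20
     ) AS i
INNER JOIN users u ON u.userID = i.userID
LEFT JOIN votes v  ON v.itemID = i.itemID
GROUP BY i.itemID, i.title, u.name;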

Is there something wrong with joins that don't use the JOIN keyword in SQL or MySQL?

When I started writing database queries I didn't know the JOIN keyword yet and naturally I just extended what I already knew and wrote queries like this:
SELECT a.someRow, b.someRow
FROM tableA AS a, tableB AS b
WHERE a.ID=b.ID AND b.ID= $someVar
Now that I know that this is the same as an INNER JOIN I find all these queries in my code and ask myself if I should rewrite them. Is there something smelly about them or are they just fine?
My answer summary: There is nothing wrong with this query BUT using the keywords will most probably make the code more readable/maintainable.
My conclusion: I will not change my old queries but I will correct my writing style and use the keywords in the future.
Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:
SELECT * FROM people p, companies c
WHERE p.companyID = c.id AND p.firstName = 'Daniel'
Most databases will execute this query quite literally, first taking the Cartesian product of the people and companies tables and then filtering by those which have matching companyID and id fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.
A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:
SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'
It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.
I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.
The more verbose INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN are from the ANSI SQL/92 syntax for joining. For me, this verbosity makes the join more clear to the developer/DBA of what the intent is with the join.
In SQL Server there are always query plans to check, a text output can be made as follows:
SET SHOWPLAN_ALL ON
GO
DECLARE #TABLE_A TABLE
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL
)
INSERT INTO #TABLE_A
SELECT 'ABC' UNION
SELECT 'DEF' UNION
SELECT 'GHI' UNION
SELECT 'JKL'
DECLARE #TABLE_B TABLE
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL
)
INSERT INTO #TABLE_B
SELECT 'ABC' UNION
SELECT 'DEF' UNION
SELECT 'GHI' UNION
SELECT 'JKL'
SELECT A.Data, B.Data
FROM
#TABLE_A AS A, #TABLE_B AS B
WHERE
A.ID = B.ID
SELECT A.Data, B.Data
FROM
#TABLE_A AS A
INNER JOIN #TABLE_B AS B ON A.ID = B.ID
Now I'll omit the plan for the table variable creates, the plan for both queries is identical though:
SELECT A.Data, B.Data FROM #TABLE_A AS A, #TABLE_B AS B WHERE A.ID = B.ID
  |--Nested Loops(Inner Join, OUTER REFERENCES:([A].[ID]))
       |--Clustered Index Scan(OBJECT:(#TABLE_A AS [A]))
       |--Clustered Index Seek(OBJECT:(#TABLE_B AS [B]), SEEK:([B].[ID]=#TABLE_A.[ID] as [A].[ID]) ORDERED FORWARD)

SELECT A.Data, B.Data FROM #TABLE_A AS A INNER JOIN #TABLE_B AS B ON A.ID = B.ID
  |--Nested Loops(Inner Join, OUTER REFERENCES:([A].[ID]))
       |--Clustered Index Scan(OBJECT:(#TABLE_A AS [A]))
       |--Clustered Index Seek(OBJECT:(#TABLE_B AS [B]), SEEK:([B].[ID]=#TABLE_A.[ID] as [A].[ID]) ORDERED FORWARD)
So, short answer - No need to rewrite, unless you spend a long time trying to read them each time you maintain them?
It's more of a syntax choice. I prefer grouping my join conditions with my joins, hence I use the INNER JOIN syntax
SELECT a.someRow, b.someRow
FROM tableA AS a
INNER JOIN tableB AS b
ON a.ID = b.ID
WHERE b.ID = ?
(? being a placeholder)
Nothing is wrong with the syntax in your example. The 'INNER JOIN' syntax is generally termed 'ANSI' syntax, and came after the style illustrated in your example. It exists to clarify the type/direction/constituents of the join, but is not generally functionally different than what you have.
Support for 'ANSI' joins is per-database platform, but it's more or less universal these days.
As a side note, one addition with the 'ANSI' syntax was the 'FULL OUTER JOIN' or 'FULL JOIN'.
Hope this helps.
In general:
Use the JOIN keyword to link (ie. "join") primary keys and foreign keys.
Use the WHERE clause to limit your result set to only the records you are interested in.
The one problem that can arise is when you try to mix the old "comma-style" join with SQL-92 joins in the same query, for example if you need one inner join and another outer join.
SELECT *
FROM table1 AS a, table2 AS b
LEFT OUTER JOIN table3 AS c ON a.column1 = c.column1
WHERE a.column2 = b.column2;
The problem is that recent SQL standards say that the JOIN is evaluated before the comma-join. So the reference to "a" in the ON clause gives an error, because the correlation name hasn't been defined yet as that ON clause is being evaluated. This is a very confusing error to get.
The solution is to not mix the two styles of joins. You can continue to use comma-style in your old code, but if you write a new query, convert all the joins to SQL-92 style.
SELECT *
FROM table1 AS a
INNER JOIN table2 AS b ON a.column2 = b.column2
LEFT OUTER JOIN table3 AS c ON a.column1 = c.column1;
Another thing to consider with the old join syntax is that it is very easy to get an accidental Cartesian join, since there is no ON clause. If the DISTINCT keyword is in such a query and it uses the old-style joins, convert it to an ANSI-standard join and see if you still need the DISTINCT. If you are fixing accidental Cartesian joins this way, you can improve performance tremendously by rewriting to specify the join and the join fields.
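For illustration (table and column names invented), the first query silently produces a Cartesian product because the relating condition was forgotten, while the ANSI form forces you to spell it out:
-- Accidental Cartesian product: nothing relates o to c, DISTINCT only hides the damage
SELECT DISTINCT c.customer_name, o.order_id
FROM customers c, orders o
WHERE o.order_date >= '2011-01-01';

-- ANSI join: the ON clause makes the relationship explicit
SELECT c.customer_name, o.order_id
FROM customers c
INNER JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2011-01-01';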
I avoid implicit joins; when the query is really large, they make the code hard to decipher
With explicit joins, and good formatting, the code is more readable and understandable without need for comments.
It also depends on whether you are just doing inner joins this way or outer joins as well. For instance, the MS SQL Server syntax for outer joins in the WHERE clause (=* and *=) can give different results than the OUTER JOIN syntax and is no longer supported (http://msdn.microsoft.com/en-us/library/ms178653(SQL.90).aspx) in SQL Server 2005.
And what about performance?
As a matter of fact, performance is a very important concern in an RDBMS.
So the question is: which is more performant, using JOIN or joining the tables in the WHERE clause?
Because the optimizer (or planner, as they say in PG...) ordinarily does a good job, the two execution plans are the same, so the performance when executing the query will be the same...
But the devil is hidden in the details...
All optimizers have a limited time or a limited amount of work to find the best plan... and when the limit is reached, the result is the best plan among all the computed plans, not the best of all possible plans!
Now the question is: do I lose anything when I use the WHERE clause instead of JOINs for joining tables?
And the answer is YES!
YES, because the relational engine uses relational algebra, which knows only the JOIN operator, not pseudo-joins made in the WHERE clause. So the first thing the optimizer does (in fact the parser, or the algebrizer) is rewrite the query... and this loses some of the chances of finding the best of all plans!
I have seen this problem twice in my long RDBMS career (40 years...).