This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Explicit vs implicit SQL joins
Stmt1: SELECT ... FROM ((a JOIN b ON <cond1>) JOIN c ON <cond2>)
Stmt2: SELECT ... FROM a, b, c WHERE <cond1> AND <cond2>
I'm not sure whether the second statement can give a smaller result set. If several rows in b match one row in a, do we get all of these matches with the second statement?
As a final result, yes.
Regarding the execution: the query optimizer might end up creating the same query execution plan for both queries.
This will be the case if, according to its approximate statistics (approximate equi-depth histograms, for instance, which are not always up to date), the optimizer determines that the first join is more selective than the second one and therefore executes it first.
Stmt1 lets you spell out the order of the joins and, if you know exactly what the tables contain, this might be the better solution. Keep in mind that most optimizers remain free to reorder inner joins anyway.
Semantically the queries are going to be identical. However, trying to rely on the plans to prove this is not a good idea.
It would also be possible to drop in arbitrary <cond1> and <cond2> such that the query is valid in the second form but not legal in the first one (for example, if <cond1> references c, which is not yet in scope in the first ON clause).
In that sense the second one is more general, but as long as the first one is valid, the second one is equivalent.
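To make the "do we get all matches" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables a, b, c, their columns and their data are invented for illustration, and the join conditions stand in for <cond1> and <cond2>; other engines may differ in plan but should agree on the result.

```python
import sqlite3

# Hypothetical schema and data, invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, id INTEGER);
    CREATE TABLE c (b_id INTEGER, label TEXT);
    INSERT INTO a VALUES (1), (2);
    -- two rows in b match a.id = 1
    INSERT INTO b VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO c VALUES (10, 'x'), (11, 'y'), (20, 'z');
""")

# Stmt1 style: explicit JOIN ... ON
explicit_rows = conn.execute("""
    SELECT a.id, b.id, c.label
    FROM a JOIN b ON a.id = b.a_id
           JOIN c ON b.id = c.b_id
    ORDER BY a.id, b.id
""").fetchall()

# Stmt2 style: comma-separated tables, conditions in WHERE
implicit_rows = conn.execute("""
    SELECT a.id, b.id, c.label
    FROM a, b, c
    WHERE a.id = b.a_id AND b.id = c.b_id
    ORDER BY a.id, b.id
""").fetchall()

print(explicit_rows)                   # [(1, 10, 'x'), (1, 11, 'y'), (2, 20, 'z')]
print(explicit_rows == implicit_rows)  # True
```

Both forms return every match, including both b rows for a.id = 1.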
I have another programmer who wrote a bunch of delete statements that look like this:
DELETE dbo.Test WHERE TestId IN (SELECT TestId FROM #Tests )
(This one is simple, but there are others with nested sub- and sub-sub-IN statements like this.)
I always write those kinds of statements as a join. It seems to me that this is like having an in-line function that will be called over and over.
However, I know the optimizer is capable of some serious magic, and new things are added all the time. I have not researched the difference between Join vs In for a while and I thought I would ask if it is still something that should be a join.
Does it matter if you use "join" or "in"?
Most modern SQL optimizers will figure out a join from a clause like this, but it's not guaranteed, and the more complex the query gets, the less likely the optimizer will choose the proper action.
As a general rule, using IN in this sort of scenario is not a good practice (personal opinion warning). It's really not meant to be used that way.
A good rule of thumb (again, this is debatable but not wrong) is to reserve IN for finite literal lists. For example:
SELECT DISTINCT * FROM foo WHERE id IN (1, 2, 3, ...);
When going against another table, one of these is preferable:
SELECT DISTINCT f.* FROM foo AS f
INNER JOIN bar AS b ON b.foo_id = f.id;
SELECT DISTINCT * FROM foo AS f
WHERE EXISTS (SELECT NULL FROM bar AS b WHERE b.foo_id = f.id);
Depending on what you are doing, and the nature of your data, your mileage will vary with these.
Note that in this simple example, the IN, the JOIN, and the EXISTS will very likely produce exactly the same query plan. When you start getting into some serious business logic against multiple tables, however, you may find the query plans significantly diverge.
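As a rough check of the three forms above, the following sketch runs them against an in-memory SQLite database from Python. The foo and bar rows are made up, and SQLite is only a stand-in for whatever engine you use; it shows results, not query plans.

```python
import sqlite3

# Hypothetical data: foo 1 is referenced twice in bar, foo 2 not at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (foo_id INTEGER);
    INSERT INTO foo VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO bar VALUES (1), (1), (3);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

in_ids = ids(
    "SELECT f.id FROM foo AS f WHERE f.id IN (SELECT foo_id FROM bar)")
exists_ids = ids(
    "SELECT f.id FROM foo AS f "
    "WHERE EXISTS (SELECT NULL FROM bar AS b WHERE b.foo_id = f.id)")
join_ids = ids(
    "SELECT f.id FROM foo AS f INNER JOIN bar AS b ON b.foo_id = f.id")
join_distinct_ids = ids(
    "SELECT DISTINCT f.id FROM foo AS f "
    "INNER JOIN bar AS b ON b.foo_id = f.id")

print(in_ids)             # [1, 3]
print(exists_ids)         # [1, 3]
print(join_ids)           # [1, 1, 3]  (the join repeats foo 1)
print(join_distinct_ids)  # [1, 3]
```

This is why the JOIN variant carries a DISTINCT: without it, each extra matching row in bar multiplies the foo row.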
There are three ways we can look at code. Does it functionally work? Does it provide good maintainability and readability? And does it perform well?
Functionally speaking, there is no difference between writing the IN clause and using the join, if both perform the same operation.
From a maintenance and readability aspect, one could argue that in the simple cases the join syntax would be straightforward. However, if the sub-query used within the IN clause was a complex multi-join operation, then that may be more descriptive and easier to debug at a later time (put yourself in the shoes of the person who has to look at the code with limited context).
Finally, from a performance perspective, which one performs better depends on the number of rows in the tables, the indexes available (including their statistics), and how the cost-based optimizer handles the query (which may vary depending on the SQL version).
So as with most decisions in the IT field, the real answer is … it depends.
The most effective route will be
Delete t1
From table1 t1
Inner Join table2 t2 on t1.col1=t2.col2
Here table2 can be the temp table (#Tests), which should be much faster.
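Whichever form you pick, it is worth confirming that it deletes the same rows. The sketch below does that in SQLite from Python. Note the assumptions: SQLite has no T-SQL style DELETE ... FROM ... JOIN, so a correlated EXISTS stands in for the join form, and the temp table is named Tests instead of #Tests. The data is invented.

```python
import sqlite3

def remaining_after(delete_sql):
    """Build a fresh hypothetical database, run one DELETE, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Test (TestId INTEGER PRIMARY KEY);
        CREATE TEMP TABLE Tests (TestId INTEGER);   -- stands in for #Tests
        INSERT INTO Test VALUES (1), (2), (3), (4);
        INSERT INTO Tests VALUES (2), (4), (4);     -- duplicate on purpose
    """)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT TestId FROM Test ORDER BY TestId").fetchall()
    conn.close()
    return [r[0] for r in rows]

left_after_in = remaining_after(
    "DELETE FROM Test WHERE TestId IN (SELECT TestId FROM Tests)")

left_after_exists = remaining_after(
    "DELETE FROM Test "
    "WHERE EXISTS (SELECT 1 FROM Tests AS t WHERE t.TestId = Test.TestId)")

print(left_after_in)      # [1, 3]
print(left_after_exists)  # [1, 3]
```

Which of the forms is fastest on SQL Server is something only the actual execution plans can tell you.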
These two SQL syntaxes produce the same result. Which one is better to use, and why?
1st:
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId
FROM dbo.ClassSet c,dbo.StudentSet s WHERE c.Id=s.ClassId
2nd:
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId
FROM dbo.ClassSet c JOIN dbo.StudentSet s ON c.Id=s.ClassId
The 2nd one is better.
The way you're joining in the first query is considered outdated. Avoid the comma syntax and use JOIN.
"In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b."
"Using , notation is similar to processing the WHERE and ON conditions at the same time"
Found the details about it here, MySQL - SELECT, JOIN
Read more about SQL standards
http://en.wikipedia.org/wiki/SQL-92
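The quoted LEFT JOIN ... WHERE b.id IS NULL pattern can be tried out with a small sketch in Python's sqlite3. The tables a and b and their rows are invented; the second query is only there to show that moving the IS NULL test from WHERE into ON changes the meaning.

```python
import sqlite3

# Hypothetical tables: a.id = 2 has no partner in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (3);
""")

# ON is applied while joining, WHERE filters the joined rows afterwards,
# so this keeps only the rows of a that found no match in b.
unmatched = conn.execute("""
    SELECT a.id
    FROM a LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
""").fetchall()

# With the test inside ON, no b row can ever satisfy the join condition,
# so every row of a comes back padded with NULL.
test_in_on = conn.execute("""
    SELECT a.id, b.id
    FROM a LEFT JOIN b ON a.id = b.id AND b.id IS NULL
    ORDER BY a.id
""").fetchall()

print(unmatched)   # [(2,)]
print(test_in_on)  # [(1, None), (2, None), (3, None)]
```

For inner joins the placement makes no difference to the result; it only matters once outer joins are involved.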
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId FROM dbo.ClassSet c JOIN dbo.StudentSet s ON c.Id=s.ClassId
Without any doubt the above one is better compared to your first one. In the precedence table, ON sits second and WHERE fourth.
For a simple query like yours there is no need to rack your brain over it, but at project level JOIN is always recommended.
Check this link Is a JOIN faster than a WHERE?
Answer by #MehrdadAfshari
Theoretically, no, it shouldn't be any faster. The query optimizer
should be able to generate an identical execution plan. However, some
DB engines can produce better execution plans for one of them (not
likely to happen for such a simple query but for complex enough ones).
You should test both and see (on your DB engine).
The second because it's more readable. That is all.
I have the SQL query below in a procedure. Can it be optimized further for best results?
SELECT DISTINCT
[PUBLICATION_ID] as n
,[URL] as u
FROM [LINK_INFO]
WHERE Component_Template_Priority > 0
AND PUBLICATION_ID NOT IN (232,481)
ORDER BY URL
Please suggest whether using NOT EXISTS is a better way here.
Thanks
It is possible to use NOT EXISTS. Just going from the code above you probably shouldn't, but it's technically possible. As a general rule, a very small, quickly resolved set (two literals would definitely apply) will perform better as a NOT IN than as a NOT EXISTS. NOT EXISTS wins when NOT IN has to do enough comparisons against each row that the correlated subquery for NOT EXISTS (which stops at the first match) resolves more quickly.
This assumes that the comparison set cannot include NULL. Otherwise NOT IN and NOT EXISTS do not return the same results: NOT IN (NULL, ...) can never evaluate to TRUE (it yields FALSE for a match and UNKNOWN otherwise) and therefore returns no rows, whereas NOT EXISTS only excludes rows for which it finds a match, and a NULL never generates a match, so it does not exclude the row.
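The NULL caveat is easy to reproduce. Here is a small sketch in Python's sqlite3; the link_info rows and the excluded table are hypothetical, standing in for a comparison set that comes from a subquery and happens to contain a NULL.

```python
import sqlite3

# Hypothetical data: the exclusion set contains one real id and one NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link_info (publication_id INTEGER);
    CREATE TABLE excluded (publication_id INTEGER);
    INSERT INTO link_info VALUES (100), (232), (481);
    INSERT INTO excluded VALUES (232), (NULL);
""")

not_in_rows = conn.execute("""
    SELECT l.publication_id
    FROM link_info AS l
    WHERE l.publication_id NOT IN (SELECT e.publication_id FROM excluded AS e)
    ORDER BY l.publication_id
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT l.publication_id
    FROM link_info AS l
    WHERE NOT EXISTS (SELECT 1 FROM excluded AS e
                      WHERE e.publication_id = l.publication_id)
    ORDER BY l.publication_id
""").fetchall()

print(not_in_rows)      # []  (the NULL makes every non-match UNKNOWN)
print(not_exists_rows)  # [(100,), (481,)]
```

With two literals such as (232, 481) there is no NULL in the set, so the difference does not arise in the query as posted.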
A third way to compare two sets for mismatches is with an OUTER JOIN. I don't see a reason to go into that from what we've got so far, so I'll let that one go for now.
A definitive answer would depend on a lot of variables (hence the comments on your question)...
What is the cardinality (number of different values) of the publication_id column?
Is there an index on the column?
How many rows are in the table?
Where did you get the values in your NOT IN clause?
Will they always be literals or are they going to come from parameters or a subquery?
... just to name a few. Of course, the best way to find out is by writing the query different ways and looking at execution times and query plans.
EDIT Another is with set operators like EXCEPT. Again, probably overkill to go into that.
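For completeness, the OUTER JOIN and EXCEPT variants mentioned above could look roughly like this. It is a sketch in Python's sqlite3 with an invented excluded table and invented rows. One thing it shows is that EXCEPT is a set operator and removes duplicates, which the outer join does not.

```python
import sqlite3

# Hypothetical data: publication 100 appears twice in link_info.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link_info (publication_id INTEGER);
    CREATE TABLE excluded (publication_id INTEGER);
    INSERT INTO link_info VALUES (100), (232), (481), (100);
    INSERT INTO excluded VALUES (232), (481);
""")

# Anti-join: keep the rows that found no partner in excluded.
outer_join_rows = conn.execute("""
    SELECT l.publication_id
    FROM link_info AS l
    LEFT JOIN excluded AS e ON e.publication_id = l.publication_id
    WHERE e.publication_id IS NULL
""").fetchall()

# Set difference: duplicates are removed as a side effect.
except_rows = conn.execute("""
    SELECT publication_id FROM link_info
    EXCEPT
    SELECT publication_id FROM excluded
""").fetchall()

print(outer_join_rows)  # [(100,), (100,)]
print(except_rows)      # [(100,)]
```

Since the original query already uses SELECT DISTINCT, the duplicate handling would not change its output, but it matters in general.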
select *
from ContactInformation c
where exists (select * from Department d where d.Id = c.DepartmentId )
select *
from ContactInformation c
inner join Department d on c.DepartmentId = d.Id
Both queries give the same output. Which performs better: the join, or the correlated subquery with the EXISTS clause?
Edit: is there an alternative to joins that would increase performance?
In the above two queries I want info from both the Department and ContactInformation tables.
Generally, the EXISTS clause, because you may need DISTINCT for a JOIN to give the expected output, for example if you have multiple Department rows for a ContactInformation row.
In your example above, the SELECT *:
means different output too, so the two queries are not actually equivalent
makes it less likely that an index will be used, because you are pulling all columns out
That said, even with a limited column list, they will give the same plan until you need DISTINCT, which is why I say "EXISTS".
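Both effects (extra columns and multiplied rows) can be seen in a small sketch run through Python's sqlite3. The column lists and rows are assumptions, and Department.Id is deliberately not unique here so that one contact matches two Department rows.

```python
import sqlite3

# Hypothetical schema; contact 10 matches two Department rows, contact 11 none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Id INTEGER, Name TEXT);
    CREATE TABLE ContactInformation (Id INTEGER, DepartmentId INTEGER);
    INSERT INTO Department VALUES (1, 'Sales'), (1, 'Sales EU');
    INSERT INTO ContactInformation VALUES (10, 1), (11, 2);
""")

exists_cur = conn.execute("""
    select *
    from ContactInformation c
    where exists (select * from Department d where d.Id = c.DepartmentId)
""")
exists_cols = [col[0] for col in exists_cur.description]
exists_rows = exists_cur.fetchall()

join_cur = conn.execute("""
    select *
    from ContactInformation c
    inner join Department d on c.DepartmentId = d.Id
""")
join_cols = [col[0] for col in join_cur.description]
join_rows = join_cur.fetchall()

print(len(exists_cols), len(exists_rows))  # 2 1
print(len(join_cols), len(join_rows))      # 4 2
```

If you really need columns from both tables, as the edit to the question says, the join is the form that returns them; EXISTS only filters.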
You need to measure and compare - there's no golden rule which one will be better - it depends on too many variables and things in your system.
In SQL Server Management Studio, you could put both queries in a window, choose Include actual execution plan from the Query menu, and then run them together.
You should get a comparison of both their execution plans and a percentage of how much of the time was spent on one or the other query. Most likely, both will be close to 50% in this case. If not - then you know which of the two queries performs better.
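If you don't have SSMS at hand, the same "look at both plans side by side" idea can be sketched with SQLite's EXPLAIN QUERY PLAN from Python. This is a stand-in and not SQL Server's plan output; the schema is hypothetical, and the plan text depends on the SQLite version, so it is printed here but not predicted.

```python
import sqlite3

# Hypothetical schema, just enough to plan the two queries against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ContactInformation (Id INTEGER PRIMARY KEY, DepartmentId INTEGER);
""")

def plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description

exists_plan = plan("""
    SELECT c.Id FROM ContactInformation c
    WHERE EXISTS (SELECT 1 FROM Department d WHERE d.Id = c.DepartmentId)
""")
join_plan = plan("""
    SELECT c.Id FROM ContactInformation c
    INNER JOIN Department d ON c.DepartmentId = d.Id
""")

for name, steps in (("EXISTS", exists_plan), ("JOIN", join_plan)):
    print(name)
    for step in steps:
        print("   ", step)
```

Plans only tell you what the engine intends to do; timing both queries on realistic data is still the final word.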
You can learn more about SQL Server execution plans (and even download a free e-book) from Simple-Talk - highly recommended.
I assume that you either meant to add the DISTINCT keyword to the SELECT clause in your second query or (less likely) that a Department has only one Contact.
First, always start with 'logical' considerations. The EXISTS construct is arguably more intuitive so, all things 'physical' being equal, I'd go with that.
Second, there will be a day when you need to port this code, not necessarily to a different SQL product but, say, to the same product with a different optimizer. A decent optimizer should recognise that both are equivalent and come up with the same ideal plan. Consider that, in theory, the EXISTS construct has slightly more potential to short circuit.
Third, test it using a reasonably large data set. If performance isn't acceptable, start looking at the 'physical' considerations (but I suggest you always keep your 'logically-pure' code in comments for the forthcoming day when the perfect optimizer arrives :)
Your second query (the join) outputs Department columns as well, while the first one (EXISTS) does not.
If you're only interested in ContactInformation, these queries are equivalent. You could run them both and examine the query execution plans to see which one runs faster. For example, on MySQL, WHERE EXISTS is more efficient with nullable columns, while INNER JOIN performs better if neither column is nullable.
SELECT a, b FROM products WHERE (a = 1 OR b = 2)
or...
SELECT a, b FROM products WHERE NOT (a != 1 AND b != 2)
Both statements should achieve the same results. However, the second one avoids the infamously slow "OR" operator in SQL. Does that make the 2nd statement faster?
Traditionally the latter was easier for the optimiser to deal with, in that it could easily resolve an AND to a sarg (search argument), which (loosely speaking) is a predicate that can be resolved using an index.
Historically, query optimisers could not resolve OR statements to s-args and queries using OR predicates could not make effective use of indexes. Thus, the recommendation was to avoid it and re-cast the query in terms like the latter example. More recent optimisers are better at recognising OR statements that are amenable to this transform, but complex OR statements may still confuse them, resulting in unnecessary table scans.
This is the origin of the 'OR is slow' meme. The performance is nothing to do with the efficiency of processing the expression but rather the ability of the optimiser to recognise opportunities to make use of indexes.
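Whether a given engine can use indexes for an OR is something you can simply ask it. The sketch below does so for SQLite via Python, purely as an example engine; the products schema and index names are made up, and the plan text varies by engine and version, so it is printed and not predicted.

```python
import sqlite3

# Hypothetical table with one single-column index per predicate column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (a INTEGER, b INTEGER);
    CREATE INDEX idx_products_a ON products (a);
    CREATE INDEX idx_products_b ON products (b);
""")

def plan(sql):
    """Return the description column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

or_plan = plan("SELECT a, b FROM products WHERE (a = 1 OR b = 2)")
not_and_plan = plan("SELECT a, b FROM products WHERE NOT (a != 1 AND b != 2)")

print("OR form:     ", or_plan)
print("NOT AND form:", not_and_plan)
```

If one form shows index searches and the other a full table scan, that is the optimiser's recognition ability at work, not the cost of evaluating OR itself.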
No, NOT (a != 1 AND b != 2) is identical to a = 1 OR b = 2.
The query optimizer will produce the same query plan for both, at least in any marginally sophisticated implementation of SQL.
There are no inherently slow or fast operators in SQL. When you issue a query, you describe the results you want. If two semantically identical queries (especially simple ones like this) yield very different run times, your SQL implementation is not very clever.
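The equivalence is De Morgan's law, and it also holds under SQL's three-valued logic when a or b is NULL. A quick sanity check in Python's sqlite3, with made-up rows that include NULLs:

```python
import sqlite3

# Hypothetical rows covering true, false and NULL combinations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 2), (1, 5), (3, 2), (3, 4),
     (None, 2), (1, None), (None, None), (3, None)],
)

or_rows = conn.execute(
    "SELECT a, b FROM products WHERE (a = 1 OR b = 2) ORDER BY rowid"
).fetchall()

not_and_rows = conn.execute(
    "SELECT a, b FROM products WHERE NOT (a != 1 AND b != 2) ORDER BY rowid"
).fetchall()

print(or_rows)
# [(1, 2), (1, 5), (3, 2), (None, 2), (1, None)]
print(or_rows == not_and_rows)  # True
```

Rows such as (NULL, NULL) and (3, NULL) evaluate to UNKNOWN in both forms, so both queries leave them out.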
SQL Server rewrites all queries before optimizing, and most likely both queries will be the same after rewriting.
You can examine their execution plans in SSMS: just hit Ctrl+L. Most likely they will be the same.
Also run the following:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
and rerun your queries - you should see identical real execution costs.
Ideally OR should be faster in this case because, for each of the n rows, once a = 1 is found to be true the second condition need not be tested. There is also no inverse operator (NOT) involved.
For AND to be true, however, SQL has to test both conditions, so n rows mean 2n condition evaluations, whereas with OR the number of conditions evaluated is never more than 2n. The AND form also has an additional operator to evaluate.
However, if either a or b is indexed, the query execution plan may differ, because comparisons on indexed columns involve intersect and union operations over the individual comparison result sets.
It would also be wrong to consider OR a slow operator in general. In complex queries with joins over multiple tables, OR can be a big problem, as mentioned by another contributor to this question, but for smaller queries OR should be fine. In fact every query has its own challenges: performance depends not only on what is documented in the help file but also on how your data is distributed, its repetition and its variance.