This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
SQL Server - Query Short-Circuiting?
Is the SQL WHERE clause short-circuit evaluated?
I have a question regarding performance of logical OR operators in T-SQL (SQL Server 2005).
I have searched around a little but I couldn't find anything on the subject.
If you have the following query:
SELECT * FROM Table WHERE (randomboolean OR HeavyToEvaluateCondition)
Wouldn't the procedure interpreter go as far as the randomboolean and skip evaluation of the heavy condition in order to save performance given that the first condition is true?
Since one of the values in an OR statement is true it would be unnecessary to evaluate the second condition since we already know that the first condition is met!
I know it works like this in C# but I want to know if I can count on it in T-SQL too.
You can't count on short circuit evaluation in TSQL.
The optimiser is free to evaluate the conditions in which ever order it sees fit and may in some circumstances evaluate both parts of an expression even when the second evaluation cannot change the result of the expression (Example).
That is not to say it never does short circuit evaluation however. You may well get a start up predicate on the expensive condition so it is only executed when required.
Additionally the presence of the OR in your query can convert a sargable search condition into an unsargable one meaning that indexes are not used optimally. Especially in SQL Server 2005 (In 2008 OPTION (RECOMPILE) can help here).
For example compare the plans for the following. The version with OR ends up doing a full index scan rather than an index seek to the specific values.
DECLARE #number INT;
SET number = 0;
SELECT COUNT(*)
FROM master..spt_values
WHERE #number IS NULL OR number = 0
SELECT COUNT(*)
FROM master..spt_values
WHERE number = 0
Its called short-circuiting. And yes SQL Server does do it in certain cases. In what order depends on many factors and forms part of the execution plan optimisation.
However, there are details online that this is limitted to JOIN conditions, CASE statements, etc.
See this SO post... SQL Server - Query Short-Circuiting?
Firstly where condition is executed than OR operator is executed when control goes to first condition.if first condition is true than it is not check the second condition .if you are given 100 condition and in this scenario first condition is false then it check next condition.
Related
My colleague wrote this SQL statement and I had a hard time understanding it. What exactly is the purpose of using a colon in the where clause?
WHERE MGM_YYMM like :AS_YYMM
Full Query:
SELECT A.MGM_YYMM,
A.MGM_DATE,
A.MGM_GB,
A.INDATE,
A.SUDATE,
A.EMPNUM,
FROM SE_MAGAM A(NOLOCK)
WHERE MGM_YYMM like :AS_YYMM
ORDER BY MGM_YYMM DESC
It is a bind variable.
The program (or whatever else is issuing the query) will assign a value to :AS_YYMM, in this case the pattern to match against the MGM_YYMM column.
These kind of parameterized queries are useful because they can be prepared/parsed/compiled/analyzed once and then be run multiple times for varying inputs with reduced overhead (compared to a new query each time). Also helps against SQL injection (compared to building a dynamic SQL statement from user input).
I am working on a join condition between 2 tables where one of the columns to match on is a concatentation of values. I need to join columnA from tableA to the first 2 characters of columnB from tableB.
I have developed 2 different statements to handle this and I have tried to analyze the performance of each method.
Method 1:
ON tB.columnB like tA.columnA || '%'
Method 2:
ON substr(tB.columnB,1,2) = tA.columnA
The query execution plan has a lot less steps using Method 1 compared to Method 2, however, it looks like Method 2 executes much faster. Also, the execution plan shows a recommended index for Method 2 that could improve its performance.
I am running this on an IBM iSeries, though would be interested in answers in a general sense to learn more about sql query optimization.
Does it make sense that Method 2 would execute faster?
This SO question is similar, but it looks like no one provided any concrete answers to the performance difference of these approaches: T-SQL speed comparison between LEFT() vs. LIKE operator.
PS: The table design that requires this type of join is not something that I can get changed at this time. I realize having the fields separated which hold different types of data would be preferrable.
I ran the following in the SQL Advisor in IBM Data Studio on one of the tables in my DB2 LUW 10.1 database:
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'
and
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND DB30_REL_TABLE_NM LIKE 'ZZZZ%'
They both had the exact same access path utilizing the same index, the same estimated IO cost and the same estimated cardinality, the only difference being the estimated total CPU cost for the LIKE was 178,343.75 while the SUBSTR was 197,518.48 (~10% difference).
The cumulative total cost for both were the same though, so this difference is negligible as per the advisor.
Yes, Method 2 would be faster. LIKE is not as efficient a function.
To compare performance of various techniques, try using Visual Explain. You will find it buried in System i Navigator. Under your system connection, expand databases, then click onyour RDB name. In the lower right pane you can then click on the option to Run an SQL Script. Enter in your SELECT statement, and choose the menu option for Visual Explain or Run and Explain. Visual explain will break down the execution plan for your statement and show you the cost for each part as estimated on your tables with the indexes available.
You can actually run with real examples in your database.
LIKE is always better at my run.
select count(*) from u_log where log_text like 'AUT%';
1 row(s) returned : 90ms taken
select count(*) from u_log where substr(log_text,1,3)='AUT';
1 row(s) returned : 493ms taken
I found this reference in an IBM redbook related to SQL performance. It sounds like the SUBSTR scalar function can be handled in an optimized manner by an iSeries.
If you search for the first character and want to use the SQE instead
of the CQE, you can use the scalar function substring on the left sign
of the equal sign. If you have to search for additional characters in
the string, you can additionally use the scalar function POSSTR. By
splitting the LIKE predicate into several scalar function, you can
affect the query optimizer to use the SQE.
http://publib-b.boulder.ibm.com/abstracts/sg246654.html?Open
I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.
Are boolean expressions in SQL WHERE clauses short-circuit evaluated
?
For example:
SELECT *
FROM Table t
WHERE #key IS NULL OR (#key IS NOT NULL AND #key = t.Key)
If #key IS NULL evaluates to true, is #key IS NOT NULL AND #key = t.Key evaluated?
If no, why not?
If yes, is it guaranteed? Is it part of ANSI SQL or is it database specific?
If database specific, SQLServer? Oracle? MySQL?
ANSI SQL Draft 2003 5WD-01-Framework-2003-09.pdf
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
From the above, short circuiting is not really available.
If you need it, I suggest a Case statement:
Where Case when Expr1 then Expr2 else Expr3 end = desiredResult
Expr1is always evaluated, but only one of Expr2 and Expr3 will be evaluated per row.
I think this is one of the cases where I'd write it as if it didn't short-circuit, for three reasons.
Because for MSSQL, it's not resolved by looking at BOL in the obvious place, so for me, that makes it canonically ambiguous.
because at least then I know my code will work. And more importantly, so will those who come after me, so I'm not setting them up to worry through the same question over and over again.
I write often enough for several DBMS products, and I don't want to have to remember the differences if I can work around them easily.
I don't believe that short circuiting in SQL Server (2005) is guaranteed. SQL Server runs your query through its optimization algorithm that takes into account a lot of things (indexes, statistics, table size, resources, etc) to come up with an effective execution plan. After this evaluation, you can't say for sure that your short circuit logic is guaranteed.
I ran into the same question myself sometime ago and my research really did not give me a definitive answer. You may write a small query to give you a sense of proof that it works but can you be sure that as the load on your database increases, the tables grow to be bigger, and things get optimized and changed in the database, that conclusion will hold. I could not and therefore erred on the side of caution and used CASE in WHERE clause to ensure short circuit.
You have to keep in mind how databases work. Given a parameterized query the db builds an execution plan based on that query without the values for the parameters. This query is used every time the query is run regardless of what the actual supplied values are. Whether the query short-circuits with certain values will not matter to the execution plan.
I typically use this for optional parameters. Is this the same as short circuiting?
SELECT [blah]
FROM Emp
WHERE ((#EmpID = -1) OR (#EmpID = EmpID))
This gives me the option to pass in -1 or whatever to account for optional checking of an attribute. Sometimes this involves joining on multiple tables, or preferably a view.
Very handy, not entirely sure of the extra work that it gives to the db engine.
Just stumbled over this question, and had already found this blog-entry: http://rusanu.com/2009/09/13/on-sql-server-boolean-operator-short-circuit/
The SQL server is free to optimize a query anywhere she sees fit, so in the example given in the blog post, you cannot rely on short-circuiting.
However, a CASE is apparently documented to evaluate in the written order - check the comments of that blog post.
For SQL Server, I think it depends on the version but my experience with SQL Server 2000 is that it still evaluates #key = t.Key even when #key is null. In other words, it does not do efficient short circuiting when evaluating the WHERE clause.
I've seen people recommending a structure like your example as a way of doing a flexible query where the user can enter or not enter various criteria. My observation is that Key is still involved in the query plan when #key is null and if Key is indexed then it does not use the index efficiently.
This sort of flexible query with varying criteria is probably one case where dynamically created SQL is really the best way to go. If #key is null then you simply don't include it in the query at all.
Main characteristic of short circuit evaluation is that it stops evaluating the expression as soon as the result can be determined. That means that rest of expression can be ignored because result will be same regardless it is evaluated or not.
Binary boolean operators are comutative, meaning:
a AND b == b AND a
a OR b == b OR a
a XOR b == b XOR a
so there is no guarantee on order of evaluation. Order of evaluation will be determined by query optimizer.
In languages with objects there can be situations where you can write boolean expressions that can be evaluated only with short circuit evaluation. Your sample code construction is often used in such languages (C#, Delphi, VB). For example:
if(someString == null | someString.Length == 0 )
printf("no text in someString");
This C# example will cause exception if someString == null because it will be fully evaluated. In short circuit evaluation, it will work every time.
SQL operates only on scalar variables (no objects) that cannot be uninitialized, so there is no way to write boolean expression that cannot be evaluated. If you have some NULL value, any comparison will return false.
That means that in SQL you cannot write expression that is differently evaluated depending on using short circuit or full evaluation.
If SQL implementation uses short circuit evaluation, it can only hopefully speed up query execution.
i don't know about short circuting, but i'd write it as an if-else statement
if (#key is null)
begin
SELECT *
FROM Table t
end
else
begin
SELECT *
FROM Table t
WHERE t.Key=#key
end
also, variables should always be on the right side of the equation. this makes it sargable.
http://en.wikipedia.org/wiki/Sargable
Below a quick and dirty test on SQL Server 2008 R2:
SELECT *
FROM table
WHERE 1=0
AND (function call to complex operation)
This returns immediately with no records. Kind of short circuit behavior was present.
Then tried this:
SELECT *
FROM table
WHERE (a field from table) < 0
AND (function call to complex operation)
knowing no record would satisfy this condition:
(a field from table) < 0
This took several seconds, indicating the short circuit behavior was not there any more and the complex operation was being evaluated for every record.
Hope this helps guys.
Here is a demo to prove that MySQL does perform WHERE clause short-circuiting:
http://rextester.com/GVE4880
This runs the following queries:
SELECT myint FROM mytable WHERE myint >= 3 OR myslowfunction('query #1', myint) = 1;
SELECT myint FROM mytable WHERE myslowfunction('query #2', myint) = 1 OR myint >= 3;
The only difference between these is the order of operands in the OR condition.
myslowfunction deliberately sleeps for a second and has the side effect of adding an entry to a log table each time it is run. Here are the results of what is logged when running the above two queries:
myslowfunction called for query #1 with value 1
myslowfunction called for query #1 with value 2
myslowfunction called for query #2 with value 1
myslowfunction called for query #2 with value 2
myslowfunction called for query #2 with value 3
myslowfunction called for query #2 with value 4
The above shows that a slow function is executed more times when it appears on the left side of an OR condition when the other operand isn't always true (due to short-circuiting).
This takes an extra 4 seconds in query analyzer, so from what I can see IF is not even shorted...
SET #ADate = NULL
IF (#ADate IS NOT NULL)
BEGIN
INSERT INTO #ABla VALUES (1)
(SELECT bla from a huge view)
END
It would be nice to have a guaranteed way!
The quick answer is: The "short-circuit" behavior is undocumented implementation.
Here's an excellent article that explains this very topic.
Understanding T-SQL Expression Short-Circuiting
It is but obvious that MS Sql server supports Short circuit theory, to improve the performance by avoiding unnecessary checking,
Supporting Example:
SELECT 'TEST'
WHERE 1 = 'A'
SELECT 'TEST'
WHERE 1 = 1 OR 1 = 'A'
Here, the first example would result into error 'Conversion failed when converting the varchar value 'A' to data type int.'
While the second runs easily as the condition 1 = 1 evaluated to TRUE and thus the second condition doesn't ran at all.
Further more
SELECT 'TEST'
WHERE 1 = 0 OR 1 = 'A'
here the first condition would evaluate to false and hence the DBMS would go for the second condition and again you will get the error of conversion as in above example.
NOTE: I WROTE THE ERRONEOUS CONDITION JUST TO REALIZE WEATHER THE CONDITION IS EXECUTED OR SHORT-CIRCUITED
IF QUERY RESULTS IN ERROR MEANS THE CONDITION EXECUTED, SHORT-CIRCUITED OTHERWISE.
SIMPLE EXPLANATION
Consider,
WHERE 1 = 1 OR 2 = 2
as the first condition is getting evaluated to TRUE, its meaningless to evaluate the second condition because its evaluation in whatever value
would not affect the result at all, so its good opportunity for Sql Server to save Query Execution time by skipping unnecessary condition checking or evaluation.
in case of "OR" if first condition is evaluated to TRUE the entire chain connected by "OR" would considered as evaluated to true without evaluating others.
condition1 OR condition2 OR ..... OR conditionN
if the condition1 is evaluated to true, rest all of the conditions till conditionN would be skipped.
In generalized words at determination of first TRUE, all other conditions linked by OR would be skipped.
Consider the second condition
WHERE 1 = 0 AND 1 = 1
as the first condition is getting evalutated to FALSE its meaningless to evaluate the second condition because its evaluation in whatever value
would not affect the result at all, so again its good opportunity for Sql Server to save Query Execution time by skipping unnecessary condition checking or evaluation.
in case of "AND" if first condition is evaluated to FALSE the entire chain connected with the "AND" would considered as evaluated to FALSE without evaluating others.
condition1 AND condition2 AND ..... conditionN
if the condition1 is evaluated to FALSE, rest all of the conditions till conditionN would be skipped.
In generalized words at determination of first FALSE, all other conditions linked by AND would be skipped.
THEREFOR, A WISE PROGRAMMER SHOULD ALWAYS PROGRAM THE CHAIN OF CONDITIONS IN SUCH A WAY THAT, LESS EXPENSIVE OR MOST ELIMINATING CONDITION GETS EVALUATED FIRST,
OR ARRANGE THE CONDITION IN SUCH A WAY THAT CAN TAKE MAXIMUM BENEFIT OF SHORT CIRCUIT
This question already has answers here:
Subquery using Exists 1 or Exists *
(6 answers)
Closed 7 years ago.
I've seen some people use EXISTS (SELECT 1 FROM ...) rather than EXISTS (SELECT id FROM ...) as an optimization--rather than looking up and returning a value, SQL Server can simply return the literal it was given.
Is SELECT(1) always faster? Would Selecting a value from the table require work that Selecting a literal would avoid?
In SQL Server, it does not make a difference whether you use SELECT 1 or SELECT * within EXISTS. You are not actually returning the contents of the rows, but that rather the set determined by the WHERE clause is not-empty. Try running the query side-by-side with SET STATISTICS IO ON and you can prove that the approaches are equivalent. Personally I prefer SELECT * within EXISTS.
For google's sake, I'll update this question with the same answer as this one (Subquery using Exists 1 or Exists *) since (currently) an incorrect answer is marked as accepted. Note the SQL standard actually says that EXISTS via * is identical to a constant.
No. This has been covered a bazillion times. SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.
Quoth Microsoft:
http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
The select list of a subquery
introduced by EXISTS almost always
consists of an asterisk (*). There is
no reason to list column names because
you are just testing whether rows that
meet the conditions specified in the
subquery exist.
Also, don't believe me? Try running the following:
SELECT whatever
FROM yourtable
WHERE EXISTS( SELECT 1/0
FROM someothertable
WHERE a_valid_clause )
If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.
EDIT: Note, the SQL Standard actually talks about this.
ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.
When you use SELECT 1, you clearly show (to whoever is reading your code later) that you are testing whether the record exists. Even if there is no performance gain (which is to be discussed), there is gain in code readability and maintainability.
Yes, because when you select a literal it does not need to read from disk (or even from cache).
doesn't matter what you select in an exists clause. most people do select *, then sql server automatically picks the best index
As someone pointed out sql server ignores the column selection list in EXISTS so it doesn't matter. I personally tend to use "SELECT null ..." to indicate that the value is not used at all.
If you look at the execution plan for
select COUNT(1) from master..spt_values
and look at the stream aggregate you will see that it calculates
Scalar Operator(Count(*))
So the 1 actually gets converted to *
However I have read somewhere in the "Inside SQL Server" series of books that * might incur a very slight overhead for checking column permissions. Unfortunately the book didn't go into any more detail than that as I recall.
Select 1 should be better to use in your example. Select * gets all the meta-data assoicated with the objects before runtime which adss overhead during the compliation of the query. Though you may not see differences when running both types of queries in your execution plan.