How does Oracle perform OR condition validation? - sql

In Java, a logical OR condition behaves such that if the first condition is true then it does not evaluate the second condition.
For example:
int a = 10;
if (a == 10 || a == 0) {
// Logic
}
Java does not evaluate the second test (a == 0) because the first condition (a == 10) is true.
If we have an Oracle SQL statement like this:
select * from student where city = :city and
(:age is null or age > :age)
How are (age > :age or :age is null) evaluated? If the parameter :age is NULL, then does it evaluate the second condition as well?

The database cost optimizer will consider many factors in structuring the execution of a query. Probably the most important will be the existence of indexes on the columns in question. It will decide the order based on the selectivity of the test and could perform them in different order at different times. Since SQL is a declarative and not procedural language, you cannot generally control the way in which these conditions are evaluated.
There may be some "hints" you can provide to suggest a specific execution order, but you risk adversely affecting performance.

PL/SQL
In PL/SQL, Oracle OR is another example of short circuit evaluation. Oracle PL/SQL Language Fundamentals says (in part)
Short-Circuit Evaluation
When evaluating a logical expression, PL/SQL uses short-circuit evaluation. That is, PL/SQL stops evaluating the expression as soon as it can determine the result. Therefore, you can write expressions that might otherwise cause errors.
SQL
However, in regular SQL, the operands of OR might be evaluated in either order. As @JonHeller pointed out in his comment, the expressions in this question are safe; more caution would be required when dealing with, for example, potential division by 0.

Let Oracle decide for you. It will most of the time make a much better decision. In this case, there is even a construct that combines test for null with testing a value.
Replace
:age is null or age > :age
With
age > nvl(:age, age - 1)
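One caveat may be worth checking before swapping the two forms: they only agree when the age column itself is never NULL. The following is a minimal sketch, not Oracle code. It uses Python's sqlite3 module, with COALESCE standing in for Oracle's NVL (an assumption), and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ann", "Oslo", 20), ("Bob", "Oslo", None)],  # hypothetical rows
)

params = {"city": "Oslo", "age": None}  # the optional :age filter is not supplied

# Original form: the OR keeps every row of the city when :age is NULL.
or_form = conn.execute(
    "SELECT name FROM student WHERE city = :city "
    "AND (:age IS NULL OR age > :age) ORDER BY name",
    params,
).fetchall()

# Rewritten form: COALESCE plays the role of NVL in this sketch.
nvl_form = conn.execute(
    "SELECT name FROM student WHERE city = :city "
    "AND age > COALESCE(:age, age - 1) ORDER BY name",
    params,
).fetchall()

print(or_form)   # includes the student whose age is NULL
print(nvl_form)  # drops that student, because NULL > NULL is not true
```

If age is declared NOT NULL, the two predicates select the same rows and the rewrite is safe in that respect.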


Performance of OR? [duplicate]

Possible Duplicate:
SQL Server - Query Short-Circuiting?
Is the SQL WHERE clause short-circuit evaluated?
I have a question regarding performance of logical OR operators in T-SQL (SQL Server 2005).
I have searched around a little but I couldn't find anything on the subject.
If you have the following query:
SELECT * FROM Table WHERE (randomboolean OR HeavyToEvaluateCondition)
Wouldn't the procedure interpreter go as far as the randomboolean and skip evaluation of the heavy condition in order to save performance given that the first condition is true?
Since one of the values in an OR statement is true it would be unnecessary to evaluate the second condition since we already know that the first condition is met!
I know it works like this in C# but I want to know if I can count on it in T-SQL too.
You can't count on short circuit evaluation in TSQL.
The optimiser is free to evaluate the conditions in whichever order it sees fit, and in some circumstances it may evaluate both parts of an expression even when the second evaluation cannot change the result of the expression.
That is not to say it never does short circuit evaluation however. You may well get a start up predicate on the expensive condition so it is only executed when required.
Additionally, the presence of the OR in your query can convert a sargable search condition into an unsargable one, meaning that indexes are not used optimally. This is especially true in SQL Server 2005 (in 2008, OPTION (RECOMPILE) can help here).
For example compare the plans for the following. The version with OR ends up doing a full index scan rather than an index seek to the specific values.
DECLARE @number INT;
SET @number = 0;
SELECT COUNT(*)
FROM master..spt_values
WHERE @number IS NULL OR number = 0
SELECT COUNT(*)
FROM master..spt_values
WHERE number = 0
It's called short-circuiting, and yes, SQL Server does do it in certain cases. In what order depends on many factors and forms part of the execution plan optimisation.
However, there are details online suggesting that this is limited to JOIN conditions, CASE statements, etc.
See this SO post: SQL Server - Query Short-Circuiting?
First the WHERE condition is evaluated, then the OR operator: when control reaches the first condition and it is true, the second condition is not checked. If you supply 100 conditions and the first one is false, the next condition is checked.

Is there any reason why you cannot select a statement as a bit in SQL Server?

I am wondering why the following fails:
SELECT price<500 as PriceIsCheap
and forces you to do the following:
SELECT CASE WHEN (price<500) THEN 1 ELSE 0 END as PriceIsCheap
When, as per the answer to this related question, the conversion table says that an implicit conversion should occur.
There is no boolean data type in SQL, BIT is kind of a hack, but the main problem is that due to the SQL concept of NULL true boolean logic is impossible (for example, what would your query return if price was NULL?)
Note that I'm not saying that there are not possible ways to implement boolean logic that "mostly" work (for example, you could say that TRUE OR NULL is NULL or whatever) just that the people who designed the SQL standard couldn't decide on The One True Representation for boolean logic (for example, you could also argue that TRUE OR NULL is TRUE, since TRUE OR <anything> is TRUE).
The boolean expressions (=, <=, >=, etc) are only valid in certain places (notably, WHERE clauses and CASE labels) and not in any other place.
You'll also find that if you have a bit column called IsCheap, you can't do SELECT * FROM STUFF WHERE IsCheap; you have to do WHERE IsCheap=1.
The reason is simple, the data type is a bit, not a bool. True, it's basically the only use you'll put it to and it's implicitly converted by almost any data access framework, but it's still technically a bit with 0 or 1 rather than a bool with true or false. There's an obvious connection we can all see, but SQL wasn't written with this assumption in it so we have to provide the logic to convert true/false to 1/0.
The expression price < 500 returns a logical value: TRUE, FALSE or UNKNOWN. It is not a data value, which is why you need to use a CASE expression to return a corresponding data value.
FWIW the Microsoft Access Database Engine does indeed treat the results of expressions as data values e.g. you can ask all kinds of wacky questions such as:
SELECT 1 = 1, 1 = NULL, 1 <> NULL, 1 IN (NULL)
FROM Foo;
...and it will happily provide answers but of course this merely proves that Access does not implement the SQL language!
I am not an MSSQL person, but I ran into the same problem with Oracle. The trivial answer is: because Boolean is not a valid column type in those databases. Now, why they decided you don't need Booleans as values is anybody's guess.
@paxdiablo, that's so missing the point... The OP's example is just a minimal example. Here is a still simplistic but real-world example: consider a People table containing names and ages. You want to get all the people, but also want to know if they are underage. In both MySQL and PostgreSQL, you can write
SELECT name, age < 18 AS minor FROM people
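As a rough illustration of the difference between a comparison used as a value and the CASE workaround, here is a sketch using Python's sqlite3 module. SQLite is not one of the engines discussed above; it is assumed here only because it also accepts a comparison in the select list. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 12), ("Bob", 40), ("Cy", None)],  # hypothetical rows
)

rows = conn.execute(
    "SELECT name, "
    "       age < 18 AS minor, "                                  # 1, 0 or NULL
    "       CASE WHEN age < 18 THEN 1 ELSE 0 END AS minor_flag "  # 1 or 0 only
    "FROM people ORDER BY name"
).fetchall()

for row in rows:
    print(row)
```

The row with a NULL age is where the two columns differ: the bare comparison yields NULL (unknown), while the CASE expression forces a definite 0. This is the "what would your query return if price was NULL?" question from the earlier answer.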

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something. When no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in "WHERE".
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection since you know it's only four parameters is use a stored procedure where you pass values for the fields or NULL. I am not sure of mySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
We did something similar not too long ago, and there are a few things that we observed:
Setting up indexes on the columns we were (possibly) filtering on improved performance.
The WHERE 1 part can be left out completely if the filters are not used (not sure if that applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten.
Also, if you 'only' have 4 filters, you could build a stored procedure, pass in null values and check for them (just like n8wrl suggested in the meantime).
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, I didn't know this snippet for getting rid of the 'is it the first filter?' question.
Though you should be ashamed of your code ( ^^ ), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
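To make the optional-parameter pattern concrete, here is a small sketch in Python with sqlite3. The table t, its column col and the find helper are hypothetical names, and SQLite is only a stand-in, so nothing here says how Oracle or SQL Server would plan the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [("a",), ("b",), ("a",)])

# One fixed statement text serves both the filtered and the unfiltered case.
QUERY = "SELECT id FROM t WHERE (:param IS NULL OR col = :param) ORDER BY id"


def find(param=None):
    """Return matching ids; param=None means 'do not filter on col'."""
    return [row[0] for row in conn.execute(QUERY, {"param": param})]


all_ids = find()   # no filter supplied
a_ids = find("a")  # filter on col = 'a'
print(all_ids, a_ids)
```

The statement text never changes, which is the attraction of the pattern. Whether the engine can still use an index on col for the filtered case is exactly the engine-specific question raised above.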
I have usually done something like this:
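for (int i = 0; i < numConditions; i++) {
    // The first condition opens the WHERE clause; later ones are chained with AND.
    // The leading spaces keep the keywords from running into the preceding text.
    sql += (i == 0 ? " WHERE " : " AND ");
    // Values must already be escaped or quoted before they are appended.
    sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}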
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the where clause in an array and then join the pieces together:
my @wherefields;
foreach my $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.
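The same collect-then-join idea can be sketched in Python with bound parameters. This is only an illustration: the column whitelist, the build_query helper and the sample rows are assumptions, while my_table and ordering_fld come from the question.

```python
import sqlite3

# Hypothetical whitelist: column names cannot be bound as parameters,
# so they are checked against a fixed set instead.
ALLOWED_COLUMNS = {"name", "city", "status"}


def build_query(filters):
    """Return (sql, params) with one 'column = ?' term per non-empty filter."""
    where_terms = []
    params = []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected filter column: {column}")
        if value is None or value == "":
            continue  # the user left this text box empty
        where_terms.append(f"{column} = ?")
        params.append(value)

    sql = "SELECT * FROM my_table"
    if where_terms:
        sql += " WHERE " + " AND ".join(where_terms)
    sql += " ORDER BY ordering_fld"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (name TEXT, city TEXT, status TEXT, ordering_fld INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [("Ann", "Oslo", "new", 2), ("Bob", "Rome", "new", 1)],  # hypothetical rows
)

sql, params = build_query({"name": "", "city": "Oslo", "status": "new"})
print(sql)
print(conn.execute(sql, params).fetchall())
```

Because the values travel as bound parameters, the user input never becomes part of the SQL text, and no WHERE 1 placeholder is needed when every filter is empty.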

Is the SQL WHERE clause short-circuit evaluated?

Are boolean expressions in SQL WHERE clauses short-circuit evaluated?
For example:
SELECT *
FROM Table t
WHERE @key IS NULL OR (@key IS NOT NULL AND @key = t.Key)
If @key IS NULL evaluates to true, is @key IS NOT NULL AND @key = t.Key evaluated?
If no, why not?
If yes, is it guaranteed? Is it part of ANSI SQL or is it database specific?
If database specific, SQLServer? Oracle? MySQL?
ANSI SQL Draft 2003 5WD-01-Framework-2003-09.pdf
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
From the above, short circuiting is not really available.
If you need it, I suggest a Case statement:
Where Case when Expr1 then Expr2 else Expr3 end = desiredResult
Expr1 is always evaluated, but only one of Expr2 and Expr3 will be evaluated per row.
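A rough way to observe the CASE approach is to count how often an expensive predicate actually runs. The sketch below uses Python's sqlite3 module with a user-defined function named slow; the function, the table and its values are hypothetical, and the behaviour shown is SQLite's, not a guarantee for other engines.

```python
import sqlite3

calls = []


def slow(value):
    """Stand-in for an expensive predicate; records each call."""
    calls.append(value)
    return 0


conn = sqlite3.connect(":memory:")
conn.create_function("slow", 1, slow)
conn.execute("CREATE TABLE mytable (myint INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,), (4,)])

# The cheap test is the WHEN condition; slow() sits in the ELSE branch,
# so it is only reached for rows that fail the cheap test.
rows = conn.execute(
    "SELECT myint FROM mytable "
    "WHERE CASE WHEN myint >= 3 THEN 1 ELSE slow(myint) END = 1 "
    "ORDER BY myint"
).fetchall()

print(rows)
print(sorted(calls))
```

In this run the function is invoked only for the rows that fail myint >= 3, which matches the "only one of Expr2 and Expr3 will be evaluated per row" description.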
I think this is one of the cases where I'd write it as if it didn't short-circuit, for three reasons.
Because for MSSQL, it's not resolved by looking at BOL in the obvious place, so for me, that makes it canonically ambiguous.
because at least then I know my code will work. And more importantly, so will those who come after me, so I'm not setting them up to worry through the same question over and over again.
I write often enough for several DBMS products, and I don't want to have to remember the differences if I can work around them easily.
I don't believe that short circuiting in SQL Server (2005) is guaranteed. SQL Server runs your query through its optimization algorithm that takes into account a lot of things (indexes, statistics, table size, resources, etc) to come up with an effective execution plan. After this evaluation, you can't say for sure that your short circuit logic is guaranteed.
I ran into the same question myself some time ago, and my research did not give me a definitive answer. You may write a small query to give you a sense of proof that it works, but can you be sure that the conclusion will hold as the load on your database increases, the tables grow bigger, and things get optimized and changed in the database? I could not, and therefore erred on the side of caution and used CASE in the WHERE clause to ensure short-circuiting.
You have to keep in mind how databases work. Given a parameterized query the db builds an execution plan based on that query without the values for the parameters. This query is used every time the query is run regardless of what the actual supplied values are. Whether the query short-circuits with certain values will not matter to the execution plan.
I typically use this for optional parameters. Is this the same as short circuiting?
SELECT [blah]
FROM Emp
WHERE ((@EmpID = -1) OR (@EmpID = EmpID))
This gives me the option to pass in -1 or whatever to account for optional checking of an attribute. Sometimes this involves joining on multiple tables, or preferably a view.
Very handy, not entirely sure of the extra work that it gives to the db engine.
Just stumbled over this question, and had already found this blog-entry: http://rusanu.com/2009/09/13/on-sql-server-boolean-operator-short-circuit/
The SQL server is free to optimize a query anywhere she sees fit, so in the example given in the blog post, you cannot rely on short-circuiting.
However, a CASE is apparently documented to evaluate in the written order - check the comments of that blog post.
For SQL Server, I think it depends on the version, but my experience with SQL Server 2000 is that it still evaluates @key = t.Key even when @key is null. In other words, it does not do efficient short circuiting when evaluating the WHERE clause.
I've seen people recommending a structure like your example as a way of doing a flexible query where the user can enter or not enter various criteria. My observation is that Key is still involved in the query plan when @key is null, and if Key is indexed then it does not use the index efficiently.
This sort of flexible query with varying criteria is probably one case where dynamically created SQL is really the best way to go. If @key is null then you simply don't include it in the query at all.
The main characteristic of short-circuit evaluation is that it stops evaluating the expression as soon as the result can be determined. That means the rest of the expression can be ignored, because the result will be the same whether or not it is evaluated.
Binary boolean operators are commutative, meaning:
a AND b == b AND a
a OR b == b OR a
a XOR b == b XOR a
so there is no guarantee on order of evaluation. Order of evaluation will be determined by query optimizer.
In languages with objects there can be situations where you can write boolean expressions that can be evaluated only with short circuit evaluation. Your sample code construction is often used in such languages (C#, Delphi, VB). For example:
if(someString == null | someString.Length == 0 )
printf("no text in someString");
This C# example will throw an exception if someString == null, because the non-short-circuiting | operator evaluates both operands. With short-circuit evaluation (||), it works every time.
SQL operates only on scalar values (no objects), and these cannot be uninitialized, so there is no way to write a boolean expression that cannot be evaluated. If a NULL value is involved, a comparison yields UNKNOWN, which a WHERE clause treats like false.
That means that in SQL you cannot write an expression whose result differs depending on whether short-circuit or full evaluation is used.
If an SQL implementation uses short-circuit evaluation, it can at best speed up query execution.
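How NULL behaves in boolean expressions can be checked directly. This is a small sketch using Python's sqlite3 module; SQLite is assumed only as a convenient engine, where UNKNOWN surfaces as NULL (None in Python) and true/false as 1/0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three-valued logic: the result is known whenever one operand decides it.
logic = conn.execute(
    "SELECT 1 OR NULL, 0 OR NULL, 1 AND NULL, 0 AND NULL, NULL = NULL, 1 = NULL"
).fetchone()
print(logic)

# A WHERE clause keeps a row only when the predicate is true,
# so a row whose x is NULL fails both 'x = 1' and 'x <> 1'.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
kept = conn.execute("SELECT count(*) FROM t WHERE x = 1 OR x <> 1").fetchone()[0]
print(kept)
```

Note that 1 OR NULL and 0 AND NULL come out the same whichever operand is inspected first, which is why evaluation order does not change the result for side-effect-free operands like these.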
I don't know about short-circuiting, but I'd write it as an if-else statement:
if (@key is null)
begin
SELECT *
FROM Table t
end
else
begin
SELECT *
FROM Table t
WHERE t.Key=@key
end
Also, variables should always be on the right side of the equation. This makes it sargable.
http://en.wikipedia.org/wiki/Sargable
Below a quick and dirty test on SQL Server 2008 R2:
SELECT *
FROM table
WHERE 1=0
AND (function call to complex operation)
This returns immediately with no records, so some kind of short-circuit behavior was present.
Then tried this:
SELECT *
FROM table
WHERE (a field from table) < 0
AND (function call to complex operation)
knowing no record would satisfy this condition:
(a field from table) < 0
This took several seconds, indicating the short circuit behavior was not there any more and the complex operation was being evaluated for every record.
Hope this helps guys.
Here is a demo to prove that MySQL does perform WHERE clause short-circuiting:
http://rextester.com/GVE4880
This runs the following queries:
SELECT myint FROM mytable WHERE myint >= 3 OR myslowfunction('query #1', myint) = 1;
SELECT myint FROM mytable WHERE myslowfunction('query #2', myint) = 1 OR myint >= 3;
The only difference between these is the order of operands in the OR condition.
myslowfunction deliberately sleeps for a second and has the side effect of adding an entry to a log table each time it is run. Here are the results of what is logged when running the above two queries:
myslowfunction called for query #1 with value 1
myslowfunction called for query #1 with value 2
myslowfunction called for query #2 with value 1
myslowfunction called for query #2 with value 2
myslowfunction called for query #2 with value 3
myslowfunction called for query #2 with value 4
The above shows that a slow function is executed more times when it appears on the left side of an OR condition when the other operand isn't always true (due to short-circuiting).
This takes an extra 4 seconds in query analyzer, so from what I can see IF is not even shorted...
SET @ADate = NULL
IF (@ADate IS NOT NULL)
BEGIN
INSERT INTO #ABla VALUES (1)
(SELECT bla from a huge view)
END
It would be nice to have a guaranteed way!
The quick answer is: the "short-circuit" behavior is an undocumented implementation detail.
Here's an excellent article that explains this very topic.
Understanding T-SQL Expression Short-Circuiting
It seems clear that MS SQL Server supports short-circuit evaluation, which improves performance by avoiding unnecessary checks.
Supporting Example:
SELECT 'TEST'
WHERE 1 = 'A'
SELECT 'TEST'
WHERE 1 = 1 OR 1 = 'A'
Here, the first example results in the error 'Conversion failed when converting the varchar value 'A' to data type int.'
The second runs without error, because the condition 1 = 1 evaluated to TRUE and thus the second condition was not evaluated at all.
Furthermore,
SELECT 'TEST'
WHERE 1 = 0 OR 1 = 'A'
Here the first condition evaluates to FALSE, so the DBMS goes on to the second condition, and again you get the conversion error from the example above.
NOTE: I wrote the erroneous condition just to find out whether the condition is executed or short-circuited.
If the query results in an error, the condition was executed; otherwise it was short-circuited.
SIMPLE EXPLANATION
Consider,
WHERE 1 = 1 OR 2 = 2
Because the first condition evaluates to TRUE, it's meaningless to evaluate the second condition: whatever its value,
it would not affect the result at all, so it's a good opportunity for SQL Server to save query execution time by skipping unnecessary condition checking or evaluation.
In the case of "OR", if the first condition evaluates to TRUE, the entire chain connected by "OR" is considered TRUE without evaluating the others.
condition1 OR condition2 OR ..... OR conditionN
if the condition1 is evaluated to true, rest all of the conditions till conditionN would be skipped.
In generalized words at determination of first TRUE, all other conditions linked by OR would be skipped.
Consider the second condition
WHERE 1 = 0 AND 1 = 1
Because the first condition evaluates to FALSE, it's meaningless to evaluate the second condition: whatever its value,
it would not affect the result at all, so again it's a good opportunity for SQL Server to save query execution time by skipping unnecessary condition checking or evaluation.
In the case of "AND", if the first condition evaluates to FALSE, the entire chain connected by "AND" is considered FALSE without evaluating the others.
condition1 AND condition2 AND ..... conditionN
if the condition1 is evaluated to FALSE, rest all of the conditions till conditionN would be skipped.
In generalized words at determination of first FALSE, all other conditions linked by AND would be skipped.
Therefore, a wise programmer should always arrange the chain of conditions so that the least expensive or most eliminating condition gets evaluated first,
or arrange the conditions in a way that takes maximum benefit of short-circuiting.

How can I select rows that are null using bound queries in Perl's DBI?

I want to be able to pass something into an SQL query to determine if I want to select only the ones where a certain column is null. If I was just building a query string instead of using bound variables, I'd do something like:
if ($search_undeleted_only)
{
$sqlString .= " AND deleted_on IS NULL";
}
but I want to use bound queries. Would this be the best way?
my $stmt = $dbh->prepare(...
"AND (? = 0 OR deleted_on IS NULL) ");
$stmt->execute($search_undeleted_only);
Yes; a related trick is if you have X potential filters, some of them optional, is to have the template say " AND ( ?=-1 OR some_field = ? ) ", and create a special function that wraps the execute call and binds all the second ?s. (in this case, -1 is a special value meaning 'ignore this filter').
Update from Paul Tomblin: I edited the answer to include a suggestion from the comments.
So you're relying on short-circuiting semantics of boolean expressions to invoke your IS NULL condition? That seems to work.
One interesting point is that a constant expression like 1 = 0 that did not have parameters should be factored out by the query optimizer. In this case, since the optimizer doesn't know if the expression is a constant true or false until execute time, that means it can't factor it out. It must evaluate the expression for every row.
So one can assume this adds a minor cost to the query, relative to what it would cost if you had used a non-parameterized constant expression.
Then combining with OR with the IS NULL expression may also have implications for the optimizer. It might decide it can't benefit from an index on deleted_on, whereas in a simpler expression it would have. This depends on the RDBMS implementation you're using, and the distribution of values in your database.
I think that's a reasonable approach. It follows the normal filter pattern nicely and should give good performance.