I’ve read that logical operator AND has higher order of precedence than logical operator IN, but that doesn’t make sense since if that was true, then wouldn’t in the following statement the AND condition got evaluated before the IN condition ( thus before IN operator would be able to check whether Released field equals to any of the values specified within parentheses ?
SELECT Song, Released, Rating
FROM Songs
WHERE
Released IN (1967, 1977, 1987)
AND
SongName = ’WTTJ’
thanx
EDIT:
Egrunin and ig0774, I’ve checked it and unless I totally misunderstood your posts, it seems that
WHERE x > 0 AND x < 10 OR special_case = 1
is indeed the the same as
WHERE (x > 0 AND x < 10) OR special_case = 1
Namely, I did the the following three queries
SELECT *
FROM Songs
WHERE AvailableOnCD='N' AND Released > 2000 OR Released = 1989
SELECT *
FROM Songs
WHERE (AvailableOnCD='N' AND Released > 2000) OR Released = 1989
SELECT *
FROM Songs
WHERE AvailableOnCD='N' AND (Released > 2000 OR Released = 1989)
and as it turns out the following two queries produce the same result:
SELECT *
FROM Songs
WHERE AvailableOnCD='N' AND Released > 2000 OR Released = 1989
SELECT *
FROM Songs
WHERE (AvailableOnCD='N' AND Released > 2000) OR Released = 1989
while
SELECT *
FROM Songs
WHERE AvailableOnCD='N' AND (Released > 2000 OR Released = 1989)
gives a different result
I'm going to assume you're using SQL Server, as in SQL Server AND has a higher order of precedence than IN. So, yes, the AND is evaluated first, but the rule for evaluating AND, is to check the expression on the left (in your sample, the IN part) and, if that is true, the expression on the right. In short, the AND clause is evaluated first, but the IN clause is evaluated as part of the AND evaluation.
It may be simpler to understand the order of precedence here as referring to how the statement is parsed, rather than how it is executed (even if MS's documentation equivocates on this).
Edit in response to comment from the OP:
I'm not all together certain that IN being classified as a logical operator is not specific to SQL Server. I've never read the ISO standard, but I would note that the MySQL and Oracle docs define IN as a comparison operator, Postgres as a subquery expression, and Sybase itself as a "list operator". In my view, Sybase is the nearest to the mark here since the expression a IN (...) asks whether the value of attribute a is an element of the list of items between the parentheses.
That said, I might imagine the reason that SQL Server chose to classify IN as a logical operator is two-fold:
IN and the like do not have the type restrictions of the SQL Server comparison operators (=, !=, etc. cannot apply to text, ntext or image types; IN and other subset operators can be used against any type, except, in strict ISO SQL, NULL)
The result of an IN, etc. operation is a boolean value just like the other "logical operators"
Again, to my mind, this is not a sensible classification, but it is what Microsoft chose. Maybe someone else has further insight into why they may have so decided?
Call me a n00b, but I always use parentheses in nontrivial compound conditions.
SELECT Song, Released, Rating
FROM Songs
WHERE
(Released IN (1967, 1977, 1987))
AND
SongName = ’WTTJ’
Edited (Corrected, the point remains the same.)
Just yesterday I got caught by this. Started with working code:
WHERE x < 0 or x > 10
Changed it in haste:
WHERE x < 0 or x > 10 AND special_case = 1
Broke, because this is what I wanted:
WHERE (x < 0 or x > 10) AND special_case = 1
But this is what I got:
WHERE x < 0 or (x > 10 AND special_case = 1)
In Mysql at least, it has a lower precedence. See http://dev.mysql.com/doc/refman/5.0/en/operator-precedence.html
I think of IN is a comparison operator whereas AND is a logical operator. So it's a bit apples and oranges, since the comparison operator must be evaluated first to see if the condition is true, then the logical operator is used to evaluate the conditions.
Related
I am working on a small project for an online databases course and i was wondering if you could help me out with a problem I am having.
I have a web page that is searching a movie database and retrieving specific columns using a movie initial input field, a number input field, and a code field. These will all be converted to strings and used as user input for the query.
Below is what i tried before:
select A.CD, A.INIT, A.NBR, A.STN, A.ST, A.CRET_ID, A.CMNT, A.DT
from MOVIE_ONE A
where A.INIT = :init
AND A.CD = :cd
AND A.NBR = :num
The way the page must search is in three different cases:
(initial and number)
(code)
(initial and number and code)
The cases have to be independent so if certain field are empty, but fulfill a certain case, the search goes through. It also must be in one query. I am stuck on how to implement the cases.
The parameters in the query are taken from the Java parameters in the method found in an SQLJ file.
If you could possibly provide some aid on how i can go about this problem, I'd greatly appreciate it!
Consider wrapping the equality expressions in NVL (synonymous to COALESCE) so if parameter inputs are blank, corresponding column is checked against itself. Also, be sure to kick the a-b-c table aliasing habit.
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE m.INIT = NVL(:init, m.INIT)
AND m.CD = NVL(:cd, m.CD)
AND m.NBR = COALESCE(:num, m.NBR)
To demonstrate, consider below DB2 fiddles where each case can be checked by adjusting value CTE parameters all running on same exact data.
Case 1
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(NULL)),
n(num) AS (VALUES(53)),
cte AS
...
Case 2
WITH
i(init) AS (VALUES(NULL)),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(NULL)),
cte AS
...
Case 3
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(53)),
cte AS
...
However, do be aware the fiddle runs a different SQL due to nature of data (i.e., double and dates). But query does reflect same concept with NVL matching expressions on both sides.
SELECT *
FROM cte, i, c, n
WHERE cte.mytype = NVL(i.init, cte.mytype)
AND YEAR(CAST(cte.mydate AS date)) = NVL(c.cd, YEAR(CAST(cte.mydate AS date)))
AND ROUND(cte.mynum, 0) = NVL(n.num, ROUND(cte.mynum, 0));
A colleague of mine who is generally well-versed in SQL told me that the order of operands in a > or = expression could determine whether or not the expression was sargable. In particular, with a query whose case statement included:
CASE
when (select count(i.id)
from inventory i
inner join orders o on o.idinventory = i.idInventory
where o.idOrder = #order) > 1 THEN 2
ELSE 1
and was told to reverse the order of the operands to the equivalent
CASE
when 1 < (select count(i.id)
from inventory i
inner join orders o on o.idinventory = i.idInventory
where o.idOrder = #order) THEN 2
ELSE 1
for sargability concerns. I found no difference in query plans, though ultimately I made the change for the sake of sticking to team coding standards. Is what my co-worker said true in some cases? Does the order of operands in an expression have potential impact on its execution time? This doesn't mesh with how I understand sargability to work.
For Postgres, the answer is definitely: "No." (sql-server was added later.)
The query planner can flip around left and right operands of an operator as long as a COMMUTATOR is defined, which is the case for all instance of < and >. (Operators are actually defined by the operator itself and their accepted operands.) And the query planner will do so to make an expression "sargable". Related answer with detailed explanation:
Can PostgreSQL index array columns?
It's different for other operators without COMMUTATOR. Example for ~~ (LIKE):
LATERAL JOIN not using trigram index
If you're talking about the most popular modern databases like Microsoft SQL, Oracle, Postgres, MySql, Teradata, the answer is definitely NO.
What is a SARGable query?
A SARGable query is the one that strive to narrow the number of rows a database has to process in order to return you the expected result. What I mean, for example:
Consider this query:
select * from table where column1 <> 'some_value';
Obviously, using an index in this case is useless, because a database most certainly would have to look through all rows in a table to give you expected rows.
But what if we change the operator?
select * from table where column1 = 'some_value';
In this case an index can give good performance and return expected rows almost in a flash.
SARGable operators are: =, <, >, <= ,>=, LIKE (without %), BETWEEN
Non-SARGable operators are: <>, IN, OR
Now, back to your case.
Your problem is simple. You have X and you have Y. X > Y or Y < X - in both cases you have to determine the values of both variables, so this switching gives you nothing.
P.S. Of course, I concede, there could be databases with very poor optimizers where this kind of swithing could play role. But, as I said before, in modern databases you should not worry about it.
I have a spotfire question. I am attempting to create Shapes by Column Values in a Map Chart.
I would like to create these Shapes by a Custom Expression. The custom expression is below, I have simplified it so it is easier to read.
All I am saying is:
if((Current months oil rate - 12MonthAgo Oilrate)/12MonthAgo Oilrate)>0,"UP","Down")
When I run this calculation though it only gives me one value, (there are both positives and negatives so it should give two).
I am not sure what I have done wrong? Any help is appreciated.
<If(((((Sum(If(CurrentMonth),[OILRATECD],null))
-
Sum(If(12MonthsAgo),[OILRATECD],null)))))
/
Sum(If(12MonthsAgo),[OILRATECD],null)))>0,"UP","DOWN")>
ORIGINAL EQUATION:
<If(((((Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))))
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))>0,"UP","DOWN")>
First, you have a lot of unnecessary parentheses but that shouldn't hurt anything.
If(
(
( --this open parentheses is unneeded
( --this open parentheses is unneeded
(
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
) --this closed parentheses is unneeded
) --this closed parentheses is unneeded
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
>0,"UP","DOWN")
The reason you are not getting UP and DOWN returned is because the condition isn't met. We need a sample data set with expected output to verify this.
However, here's a reason why you could be getting unexpected results, regarding NULL in your SUM(IF(...,[OILRATECD],NULL)) expression.
TL;DR
If your condtion in your IF() statement isn't ever evaluated to true
then NULL is returned to SUM(), and SUM(NULL,NULL,NULL) = NULL
and NULL is not > nor is it < 0, thus your outer IF()
statement would return NULL instead of "UP" or "DOWN"
LONG VERSION
Spotfire ignored NULL when evaluating SUM() which is what you want. For example, Sum(4,3,NULL) = 7 and this is the behavior we want. However...
Spotfire doesn't ignore NULL for addition, subtraction, division, and other comparison operators like >. So, 4 - NULL = NULL and NULL / 18 = NULL and so on. This means if either of your two SUM() methods return NULL then your entire expression will be NULL because...
NULL isn't > nor is it < and certainly not = 0. NULL is the absence of a value, and thus can't be compared or equated to anything. For example, If(NULL > 1,"YES","NO") doesn't return YES or NO... it returns NULL, the lack of a value. Also, If(NULL=NULL,"YES","NO") will also return NULL
HOW TO GET AROUND THIS
Use IS NULL and IS NOT NULL in a IF() statement to set it to a default value, or use 0 in place of NULL in your current expression.
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],0))
BIG SIDE NOTE
You said your equation's pseudo code is:
If((Current months oil rate - 12MonthAgo Oilrate)/12MonthAgo Oilrate)>0,"UP","Down")
This doesn't seem to be what you are evaluating. Instead, I read:
x = (-var) + (-2) thus var < -2 (i.e. -3....-6...-55)
if((sum([2 months ago oil rate]) - sum([x months ago oil rate])) > 0, "UP","DOWN")
So you aren't ever looking at the current month's oil rate, but instead are looking at current - 2. Also, you are looking at the SUM() over that entire month... this may be what you want by you may be actually looking for Max() or Min()
OVER FUNCTION
You can handle this a lot easier most likely with a few calculated columns to keep your data / expressions clean and legible
DatePart("yy",[Date]) as [Year]
DatePart("mm",[Date]) as [Month]
Max([OILRATECD]) OVER (Intersect([Year],[Month])) as [YearMonthRate] or use SUM() if that's what you really want.
Check to see the difference with Rank()
I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
What exactly is the issue? Does it have to do with how indexes work or something?
My grounding in SQL is pretty weak.
I don't know SQL Server so I can't speak to that.
Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:
Expression Evaluation Rules
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so
null is null or null < 10
is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:
SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE short-circuits buts neither AND nor OR short-circuit so you only need to use a CASE when you need to protect against side effects.
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.
I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Nulls can be confusing. The " ... WHERE 1 = CASE ... " is useful if you are trying to pass a Null OR a Value as a parameter ex. "WHERE the_column = #parameter. This post may be helpful Passing Null using OLEDB .
Another example where CASE is useful is when using date functions on the varchar columns. adding ISDATE before using say convert(colA,datetime) might not work, and when colA has non-date data the query can error out.
As seen below the two queries, we find that they both work well. Then I am confused why should we ever use BETWEEN because I have found that BETWEEN behaves differently in different databases as found in w3school
SELECT *
FROM employees
WHERE salary BETWEEN 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000
AND salary <= 15000;
BETWEEN can help to avoid unnecessary reevaluation of the expression:
SELECT AVG(RAND(20091225) BETWEEN 0.2 AND 0.4)
FROM t_source;
---
0.1998
SELECT AVG(RAND(20091225) >= 0.2 AND RAND(20091225) <= 0.4)
FROM t_source;
---
0.3199
t_source is just a dummy table with 1,000,000 records.
Of course this can be worked around using a subquery, but in MySQL it's less efficient.
And of course, BETWEEN is more readable. It takes 3 times to use it in a query to remember the syntax forever.
In SQL Server and MySQL, LIKE against a constant with non-leading '%' is also a shorthand for a pair of >= and <:
SET SHOWPLAN_TEXT ON
GO
SELECT *
FROM master
WHERE name LIKE 'string%'
GO
SET SHOWPLAN_TEXT OFF
GO
|--Index Seek(OBJECT:([test].[dbo].[master].[ix_name_desc]), SEEK:([test].[dbo].[master].[name] < 'strinH' AND [test].[dbo].[master].[name] >= 'string'), WHERE:([test].[dbo].[master].[name] like 'string%') ORDERED FORWARD)
However, LIKE syntax is more legible.
Using BETWEEN has extra merits when the expression that is compared is a complex calculation rather than just a simple column; it saves writing out that complex expression twice.
BETWEEN in T-SQL supports NOT operator, so you can use constructions like
WHERE salary not between 5000 AND 15000;
In my opinion it's more clear for a human then
WHERE salary < 5000 OR salary > 15000;
And finally if you type column name just one time it gives you less chances to make a mistake
The version with "between" is easier to read. If I were to use the second version I'd probably write it as
5000 <= salary and salary <= 15000
for the same reason.
I vote #Quassnoi - correctness is a big win.
I usually find literals more useful than the syntax symbols like <, <=, >, >=, != etc. Yes, we need (better, accurate) results. And at least I get rid of probabilities of mis-interpreting and reverting meanings of the symbols visually. If you use <= and sense logically incorrect output coming from your select query, you may wander some time and only arrive to the conclusion that you did write <= in place of >= [visual mis-interpretation?]. Hope I am clear.
And aren't we shortening the code (along with making it more higher-level-looking), which means more concise and easy to maintain?
SELECT *
FROM emplyees
WHERE salary between 5000 AND 15000;
SELECT *
FROM emplyees
WHERE salary >= 5000 AND salary <= 15000;
First query uses only 10 words and second uses 12!
Personally, I wouldn't use BETWEEN, simply because there seems no clear definition of whether it should include, or exclude, the values which serve to bound the condition, in your given example:
SELECT *
FROM emplyees
WHERE salary between 5000 AND 15000;
The range could include the 5000 and 15000, or it could exclude them.
Syntactically I think it should exclude them, since the values themselves are not between the given numbers. But my opinion is precisely that, whereas using operators such as >= is very specific. And less likely to change between databases, or between incremements/versions of the same.
Edited in response to Pavel and Jonathan's comments.
As noted by Pavel, ANSI SQL (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) as far back as 1992, mandates the end-points should be considered within the returned date and equivalent to X >= lower_bound AND X <= upper_bound:
8.3
Function
Specify a range comparison.
Format
<between predicate> ::=
<row value constructor> [ NOT ] BETWEEN
<row value constructor> AND <row value constructor>
Syntax Rules
1) The three <row value constructor>s shall be of the same degree.
2) Let respective values be values with the same ordinal position
in the two <row value constructor>s.
3) The data types of the respective values of the three <row value
constructor>s shall be comparable.
4) Let X, Y, and Z be the first, second, and third <row value con-
structor>s, respectively.
5) "X NOT BETWEEN Y AND Z" is equivalent to "NOT ( X BETWEEN Y AND
Z )".
6) "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z".
If the endpoints are inclusive, then BETWEEN is the preferred syntax.
Less references to a column means less spots to update when things change. It's the engineering principle, that less things means less stuff can break.
It also means less possibility of someone putting the wrong bracket for things like including an OR. IE:
WHERE salary BETWEEN 5000 AND (15000
OR ...)
...you'll get an error if you put the bracket around the AND part of a BETWEEN statement. Versus:
WHERE salary >= 5000
AND (salary <= 15000
OR ...)
...you'd only know there's a problem when someone reviews the data returned from the query.
Semantically, the two expressions have the same result.
However, BETWEEN is a single predicate, instead of two comparison predicates combined with AND. Depending on the optimizer provided by your RDBMS, a single predicate may be easier to optimize than two predicates.
Although I expect most modern RDBMS implementations should optimize the two expressions identically.
worse if it's
SELECT id FROM entries
WHERE
(SELECT COUNT(id) FROM anothertable WHERE something LEFT JOIN something ON...)
BETWEEN entries.max AND entries.min;
Rewrite this one with your syntax without using temporary storage.
I'd better use the 2nd one, as you always know if it's <= or <
In SQL, I agree that BETWEEN is mostly unnecessary, and can be emulated syntactically with 5000 <= salary AND salary <= 15000. It is also limited; I often want to apply an inclusive lower bound and an exclusive upper bound: #start <= when AND when < #end, which you can't do with BETWEEN.
OTOH, BETWEEN is convenient if the value being tested is the result of a complex expression.
It would be nice if SQL and other languages would follows Python's lead in using proper mathematical notation: 5000 <= salary <= 15000.
One small tip that wil make your code more readable: use < and <= in preference to > and >=.