SQL plan compilation and truth tables - sql

If I have NOT ( 1 <> 1 AND NULL <> 1 )
In the execution plan XML I can see SQL Server turning this into: ( 1 = 1 OR NULL = 1)
If you would literally evaluate the former expression, the True AND Null would be Null and would eliminate the row. However, the compiled expression can return a row due to the OR.
Can I assume that this type of compilation is guaranteed to always happen? SQL Server would never attempt to bring the convoluted logic forward into the compiled plan? Is there some documentation on this?
This article was pretty helpful, but I am just missing a piece of the puzzle:
https://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/
Here is a SQL example
SELECT 1
FROM T T
LEFT JOIN T2 T2 --t2 has zero rows
ON T.id = t2.t_id
WHERE NOT ( T.id <> 99 AND T2.id <> 99 )
From my experience with SQL, I know that under normal circumstances (without short circuit evaluation) T2.id <> 99 effectively turns the left join into an inner join. That was the behavior I was initially expecting. I was surprised when this filter actually worked.

TL;DR The "compiled result" is not a helpful concept. What matters is the "specified result"--specified by the language definition. A DBMS must make the statement act the way you wrote it.
The truth [sic] table for AND in your link is wrong. AND with False is always False and OR with True is always True in SQL.
Comparisons in SQL return True, False or Unknown. Unknown can arise from a comparison to NULL or a 3VL logic connective (AND/OR/NOT etc) on Unknown. "NULL" is not a literal. True, False & Unknown are values with (assorted) literals in the SQL standard, but not in most DBMSs. (And Unknown can be returned as NULL.) IS is not a comparison; IS NULL and IS NOT NULL are unary 3VL logic connectives and so are the similar ones named with TRUE, FALSE & UNKNOWN. They always return True or False.
"True AND Null would be Null and would eliminate the row. However, the compiled expression can return a row due to the OR."
No. The truth [sic] table for AND in your link is wrong. AND with False is always False and OR with True is always True in SQL. So your AND is always False because 1 <> 1 is False, the NOT of that False is always True, and your OR is always True because 1 = 1 is True, no matter what the other comparisons return (True, False or Unknown). If you work through these two expressions using the (correct) SQL truth tables, they both always give the same result, True.
One has to be very careful about rewriting conditions in SQL. One can interchange NOT (E1 *comparison* E2) with E1 *NOT-comparison* E2, or NOT (E IS ?) with E IS NOT ?. One can safely rewrite an expression using standard logic identities/rules if no value ever IS NULL. One can also safely apply rewrite rules to
(E1 *comparison* E2)
AND E1 IS NOT NULL AND E2 IS NOT NULL
Also beware that you must properly use an Unknown final result, which includes not matching for a WHERE but not failing for a constraint.
SELECT 1
FROM T T
LEFT JOIN T2 T2 --t2 has zero rows
ON T.id = t2.t_id
WHERE NOT ( T.id <> 99 AND T2.id <> 99 )
LEFT JOIN returns the rows of INNER JOIN plus unmatched rows of T extended by T2 columns NULL. (With T2 empty, the INNER JOIN is empty and all rows of T are unmatched.) All the extended rows have T2.id <> 99 Unknown since T2.id is NULL. For T.id = 99 the AND is False and the NOT is True; the WHERE returns those rows. For T.id any other integer or NULL, the AND will be Unknown, the NOT will be Unknown; the WHERE returns no rows.
(There is no "short circuit" evaluation of conditions in SQL. Every argument of a connective must be defined.)
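The LEFT JOIN walk-through above can be reproduced as a minimal sketch. It assumes SQLite (through Python's sqlite3 module) rather than SQL Server, and the three rows in T are made up for illustration. The predicate is put in the select list so that its value is visible for every row; SQLite reports True as 1, False as 0 and Unknown as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T  (id INTEGER);
    CREATE TABLE T2 (id INTEGER, t_id INTEGER);  -- T2 stays empty
    INSERT INTO T (id) VALUES (1), (99), (NULL); -- hypothetical sample rows
""")

# Evaluate the WHERE predicate as a column instead of filtering on it.
rows = conn.execute("""
    SELECT T.id,
           NOT (T.id <> 99 AND T2.id <> 99) AS keep
    FROM T
    LEFT JOIN T2 ON T.id = T2.t_id
    ORDER BY T.id
""").fetchall()

predicate_by_id = dict(rows)                               # None means Unknown
kept_ids = [row_id for row_id, keep in rows if keep == 1]  # what WHERE would keep
print(predicate_by_id)
print(kept_ids)
```

Only the row with T.id = 99 has a True predicate; for T.id = 1 and T.id NULL the predicate is Unknown, so a WHERE on it would drop those rows.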

If you would literally evaluate the former expression, the True AND Null would be Null and would eliminate the row.
No. You are evaluating the expression. NOT ( 1 <> 1 AND NULL <> 1 ) is NOT (FALSE AND UNKNOWN) is NOT FALSE is TRUE.
( 1 = 1 OR NULL = 1) is TRUE OR UNKNOWN is TRUE. They are both equivalent.
NOT ( 1 <> 1 AND NULL <> 1 ) can be rewritten as NOT ((NOT (1=1)) AND (NOT (NULL = 1))). In regular two value logic, by De Morgan's Laws that can be rewritten as NOT (NOT ((1 = 1) OR (NULL = 1))) and then (1=1) OR (NULL = 1). As it turns out De Morgan's Laws also hold in the three value logic of SQL. This can be demonstrated by creating exhaustive truth tables for the two laws.
The truth table showing that one of De Morgan's Laws, (NOT A) OR (NOT B) is equivalent to NOT (A AND B), holds in SQL's three value logic:
A B | NOT A | NOT B | (NOT A) OR (NOT B) | A AND B | NOT (A AND B) | equiv?
=============================================================================
T T |   F   |   F   |         F          |    T    |       F       |   T
T F |   F   |   T   |         T          |    F    |       T       |   T
T U |   F   |   U   |         U          |    U    |       U       |   T
-----------------------------------------------------------------------------
F T |   T   |   F   |         T          |    F    |       T       |   T
F F |   T   |   T   |         T          |    F    |       T       |   T
F U |   T   |   U   |         T          |    F    |       T       |   T
-----------------------------------------------------------------------------
U T |   U   |   F   |         U          |    U    |       U       |   T
U F |   U   |   T   |         T          |    F    |       T       |   T
U U |   U   |   U   |         U          |    U    |       U       |   T
The other law, (NOT A) AND (NOT B) is equivalent to NOT (A OR B) can similarly be demonstrated.
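Both laws can also be checked exhaustively in a few lines. This is only a sketch: it models SQL's three truth values in plain Python, with None standing in for Unknown, and the helper names not3, and3 and or3 are made up for this example.

```python
from itertools import product

# True, False and None stand in for SQL's True, False and Unknown.
VALUES = (True, False, None)

def not3(a):
    return None if a is None else (not a)

def and3(a, b):
    if a is False or b is False:
        return False              # False dominates AND
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True               # True dominates OR
    if a is None or b is None:
        return None
    return False

pairs = list(product(VALUES, repeat=2))  # all 9 combinations of A and B
law1 = all(or3(not3(a), not3(b)) == not3(and3(a, b)) for a, b in pairs)
law2 = all(and3(not3(a), not3(b)) == not3(or3(a, b)) for a, b in pairs)

# The expressions from the question: 1 <> 1 is False, NULL <> 1 is Unknown,
# 1 = 1 is True, NULL = 1 is Unknown.
original = not3(and3(False, None))
rewritten = or3(True, None)
print(law1, law2, original, rewritten)
```

Under these definitions both laws hold for all nine combinations, and the original and rewritten expressions both evaluate to True.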
Can I assume that this type of compilation is guaranteed to always happen?
No, specific compilations are never (hardly ever) guaranteed. Barring bugs in SQL Server, the query plans chosen, the transformations applied, will return the results specified by a query.
Edited to add: Let T.id be 99 and T2.id be NULL. Then:
WHERE NOT ( T.id <> 99 AND T2.id <> 99 )
WHERE NOT (99 <> 99 AND NULL <> 99)
WHERE NOT (FALSE AND UNKNOWN)
WHERE NOT (FALSE)
WHERE TRUE

Related

How can I make IF without ELSE on SQL WHERE condition?

I'm trying to write a query that selects users, and if the user type equals 1 I need to select only those with an age. My table:
id (int 11) | type (int 11) | email (varchar 25) | age (int 11)
My query:
SELECT * FROM users WHERE IF(type = 1, age <> 0)
The problem is that IF needs an ELSE condition, but I don't need one in this case. How can I make an IF inside WHERE without an else condition?
Thanks
You can do it with CASE:
SELECT * FROM users
WHERE age = CASE WHEN type <> 1 THEN age ELSE 0 END
Q: How do I make IF without ELSE on SQL WHERE condition ?
A: It's not possible; there is always an ELSE. MySQL IF() function has three arguments. It doesn't matter where the IF() function is used, whether it's part of an expression in a WHERE clause, or an expression in the SELECT list.
As an alternative to the MySQL IF() function, we can use a more portable, more ANSI-standard compliant CASE expression. But that doesn't get away from the crux of the question, about avoiding an ELSE. There is always an ELSE with the CASE expression as well. If we omit the ELSE clause, it's the same as if we had specified ELSE NULL.
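The point about the omitted ELSE can be checked directly. A minimal sketch, assuming SQLite through Python's sqlite3 module instead of MySQL; the 'matched' literal is just a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Neither CASE has a matching WHEN branch, so each one falls through to its ELSE.
without_else, with_else_null = conn.execute("""
    SELECT CASE WHEN 1 = 2 THEN 'matched' END,
           CASE WHEN 1 = 2 THEN 'matched' ELSE NULL END
""").fetchone()
print(without_else, with_else_null)
```

Both columns come back as NULL (None in Python), so leaving out the ELSE behaves like writing ELSE NULL.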
As an aside (unrelated to the question that was asked), I don't think we should be storing age as an attribute; typically age is the difference between the current date and a date in the past (date of birth, registration date, etc.)
I'm thinking we don't need an IF function in the WHERE clause. (That's specific to MySQL, so this answer assumes that the target DBMS is MySQL, and not some other RDBMS).
We can use a combination of conditions, combined with NOT, AND, OR and parens to specify an order of operations.
Sample data and example output goes a long way to explaining the spec.
id type age email
-- ---- ---- ----------
1 0 0 1#one
2 1 0 2#two
3 0 1 3#three
4 1 1 4#four
5 0 NULL 5#five
6 1 NULL 6#six
7 NULL NULL 7#seven
8 NULL 0 8#eight
9 NULL 1 9#nine
Which of these rows should be returned, and which rows should be excluded?
Here is an example query (MySQL specific syntax) that returns all rows except row id=2 (type=1, age=0)
SELECT u.id
, u.type
, u.age
, u.email
FROM user u
WHERE NOT ( u.type <=> 1 )
OR NOT ( u.age <=> 0 )
If there's a requirement to incorporate IF functions, we can do that, and return an equivalent result:
SELECT u.id
, u.type
, u.age
, u.email
FROM user u
WHERE NOT ( IF( u.type <=> 1 ,1,0) )
OR NOT ( IF( u.age <=> 0 ,1,0) )
^^^ ^^^^^
In the WHERE clause, an expression will be evaluated as a boolean value. A numeric value of 0 is FALSE, a non-zero value is TRUE, and NULL value is (as always) just NULL.
For a row to be returned, we need the expression in the WHERE clause to evaluate to a non-zero value (to evaluate to TRUE).
The third argument of the IF() function is the "else" value; for that value, we can return TRUE, FALSE or NULL. To exclude rows that do not satisfy the type=1 condition, we return either zero or NULL:
WHERE IF(type = 1, age <> 0 ,0 )
^^
or equivalently:
WHERE IF(type = 1, age <> 0 ,NULL )
^^^^^
If we want rows that don't satisfy type=1 condition to be returned, we can return any non-zero value:
WHERE IF(type = 1, age <> 0 ,42 )
^^^
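The effect of the three "else" values can be compared side by side. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, so a CASE expression stands in for the MySQL IF() function, and it reuses the nine sample rows above without the email column. The helper ids_for is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, type INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 1, 0), (3, 0, 1), (4, 1, 1), (5, 0, None),
     (6, 1, None), (7, None, None), (8, None, 0), (9, None, 1)],
)

def ids_for(else_sql):
    # else_sql is spliced in as literal SQL text: fine for a fixed demo,
    # never do this with user input.
    query = f"""
        SELECT id FROM users
        WHERE CASE WHEN type = 1 THEN age <> 0 ELSE {else_sql} END
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]

else_zero = ids_for("0")        # rows failing type = 1 are excluded
else_null = ids_for("NULL")     # same effect as 0 in a WHERE clause
else_nonzero = ids_for("42")    # rows failing type = 1 are returned
print(else_zero, else_null, else_nonzero)
```

With 0 or NULL only row 4 (type=1, age=1) is returned; with 42 every row is returned except rows 2 and 6, where type=1 and age is 0 or NULL.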
RECAP:
Addressing the question that was asked:
Q: How do I make IF without ELSE on SQL WHERE condition ?
A: There is always an ELSE value with the MySQL IF() function; in the context of the WHERE clause, the value will be evaluated as a boolean: TRUE, FALSE or NULL.
I think you want:
SELECT *
FROM users
WHERE type <> 1 OR age <> 0;
I was in a similar situation and ended up with the following solution:
SELECT * FROM users WHERE IF(type = 1, age <> 0, 1=0)
The else part here is 1 = 0 which is never true, so you don't select anything in that case.

What is the difference between NOT and != operators in SQL?

What is the difference between NOT and != operators in SQL? I can't understand the difference. I guess they are same.
NOT negates the following condition so it can be used with various operators. != is the non-standard alternative for the <> operator which means "not equal".
e.g.
NOT (a LIKE 'foo%')
NOT ( (a,b) OVERLAPS (x,y) )
NOT (a BETWEEN x AND y)
NOT (a IS NULL)
Except for the OVERLAPS operator, the above could also be written as:
a NOT LIKE 'foo%'
a NOT BETWEEN x AND y
a IS NOT NULL
In some situations it might be easier to understand if you negate a complete expression rather than rewriting it to mean the opposite.
NOT can however be used with <> - but that wouldn't make much sense though: NOT (a <> b) is the same as a = b. Similarly you could use NOT to negate the equality operator NOT (a = b) is the same as a <> b
This question actually makes a lot more sense than people give it credit for.
Firstly, original SQL not-equal operator was <>, and only later on the C-style != was added as far as I know. I personally always use <> as != looks strange to me, but I'm old school.
Secondly, of course the original asker didn't mean to compare NOT with !=, but rather the difference between NOT a = b vs. a != b. And intuitively there should be a difference, but for all I know there isn't.
To make this all clear, here is an example session run on PostgreSQL (in Oracle you need more weird stuff such as SELECT ... FROM DUAL UNION ..., etc., which I avoid for the sake of brevity):
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select * from tst;
a | b
---+---
1 | 2
2 | 3
4 |
(3 rows)
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select * from tst where b = 2;
a | b
---+---
1 | 2
(1 row)
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select * from tst where b != 2;
a | b
---+---
2 | 3
(1 row)
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select * from tst where not b = 2;
a | b
---+---
2 | 3
(1 row)
Here we may think that this last query should also have returned the row (4, NULL). But it didn't. In PostgreSQL I can actually inspect this further, as follows:
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select *, b = 2 as beq2 from tst;
a | b | beq2
---+---+------
1 | 2 | t
2 | 3 | f
4 | |
(3 rows)
You see that the Boolean expression b = 2 is NULL for the case where b is NULL. However, when a Boolean expression is NULL it is treated as false, or rather not true. And when you negate it with NOT, the Boolean value of the expression stays NULL and therefore is still not true.
Unfortunately I know of no other way than to handle NULL cases explicitly, so I have to write:
db=# with tst(a, b) as ( values (1,2), (2,3), (4, null) ) select * from tst where b is null or b = 2;
a | b
---+---
1 | 2
4 |
(2 rows)
So, instead of writing NOT <Boolean expression> you always have to write a IS NULL OR b IS NULL OR ... OR z IS NULL OR f(a, b, ..., z) where a, b, ..., z are variables in the given Boolean expression f(...).
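The three filters can be compared on the same data. A minimal sketch, assuming SQLite through Python's sqlite3 module instead of PostgreSQL; the helper rows_where is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def rows_where(condition):
    # condition is fixed demo SQL spliced in as text; not for user input.
    query = f"""
        WITH tst(a, b) AS (VALUES (1, 2), (2, 3), (4, NULL))
        SELECT a, b FROM tst WHERE {condition} ORDER BY a
    """
    return conn.execute(query).fetchall()

not_equal = rows_where("b != 2")                 # Unknown for the NULL row
negated_equal = rows_where("NOT b = 2")          # NOT of Unknown is still Unknown
null_aware = rows_where("b IS NULL OR b <> 2")   # NULL handled explicitly
print(not_equal, negated_equal, null_aware)
```

The first two filters return only (2, 3); the explicit IS NULL test is what brings (4, NULL) back. PostgreSQL also has b IS DISTINCT FROM 2, which treats NULL as an ordinary comparable value and may be worth a look for this case.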
It would be so much easier if instead of just NOT there were the Boolean operators MAYBE and CANNOT. So you could write WHERE MAYBE b = 2 or WHERE CANNOT b = 2 instead of this complicated OR combination of a bunch of IS NULL tests before your actual condition.
!= is a binary operator that returns true if its two arguments are not equal to each other.
NOT is a unary operator, which reverses its argument, a Boolean expression.
For example, this expression: a < 10 is true when a is any value less than 10. This condition can be negated: NOT a < 10. Negating this condition makes it true in the opposite cases, i.e. when a is not less than 10. It's the same as a >= 10.
The expression a != 10 is true when a is any value less than 10 or any value greater than 10. This is a completely different case from a condition negated with NOT.
The NOT operator and != serve almost the same purpose. Both are used in the WHERE clause of an SQL query.
NOT operator shows records when a particular condition is not true.
Example:
SELECT * FROM Employees
WHERE NOT Country='Germany'
will get you records with all employees with countries other than Germany.
The != operator similarly checks if the values of two operands are equal or not, if values are not equal then condition becomes true.
Example:
SELECT * FROM Employees
WHERE Country!='Germany'
will get you all rows with country column having country other than Germany.

SQL: Most efficient way to select sequences of rows from a table

I have a tagged textual corpus stored in an SQL table like the following:
id tag1 tag2 token sentence_id
0 a e five 1
1 b f score 1
2 c g years 1
3 d h ago 1
My task is to search the table for sequences of tokens that meet certain criteria, sometimes with gaps between each token.
For example:
I want to be able to search for a sequence similar to the following:
the first token has the value a in the tag1 column, and
the second token is one to two rows away from the first, and has the value g in tag2 or b in tag1, and
the third token should be at least three rows away, and has ago in the token column.
In SQL, this would be something like the following:
SELECT * FROM my_table t1
JOIN my_table t2 ON t1.sentence_id = t2.sentence_id
JOIN my_table t3 ON t3.sentence_id = t1.sentence_id
WHERE t1.tag1 = 'a' AND (t2.id = t1.id + 1 OR t2.id = t1.id + 2)
AND (t2.tag2 = 'g' OR t2.tag1 = 'b')
AND t3.id >= t1.id + 3 AND t3.token = 'ago'
So far I have only been able to achieve this by joining the table by itself each time I specify a new token in the sequence (e.g. JOIN my_table t4), but with millions of rows this gets quite slow. Is there a more efficient way to do this?
You could try this staged approach:
1. Apply each condition (other than the various distance conditions) as a subquery.
2. Calculate the distances between the tokens which meet the conditions.
3. Apply all the distance conditions separately.
This might improve things, if you have indexes on the tag1, tag2 and token columns:
SELECT DISTINCT sentence_id FROM
(
-- 2. Here we calculate the distances
SELECT cond1.sentence_id,
(cond2.id - cond1.id) as cond2_distance,
(cond3.id - cond1.id) as cond3_distance
FROM
-- 1. These are all the non-distance conditions
(
SELECT * FROM my_table WHERE tag1 = 'a'
) cond1
INNER JOIN
(
SELECT * FROM my_table WHERE
(tag1 = 'b' OR tag2 = 'g')
) cond2
ON cond1.sentence_id = cond2.sentence_id
INNER JOIN
(
SELECT * FROM my_table WHERE token = 'ago'
) cond3
ON cond1.sentence_id = cond3.sentence_id
) conditions
-- 3. Now apply the distance conditions
WHERE cond2_distance BETWEEN 1 AND 2 -- "one to two rows away", so 0 (the same row) is excluded
AND cond3_distance >= 3
ORDER BY sentence_id;
If you apply this query to this SQL fiddle you get:
| sentence_id |
|-------------|
| 1 |
| 4 |
Which is what you want. Now whether it's any faster or not, only you (with your million-row database) can really tell, but from the perspective of having to actually write these queries, you'll find they're much easier to read, understand and maintain.
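For a quick local check without the fiddle, the staged idea can be run against the four sample rows from the question. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module, the second sentence is invented so that there is something to filter out, and the distance range is the question's "one to two rows away".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, tag1 TEXT, tag2 TEXT, token TEXT, sentence_id INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (0, "a", "e", "five", 1), (1, "b", "f", "score", 1),
        (2, "c", "g", "years", 1), (3, "d", "h", "ago", 1),
        # Hypothetical second sentence: no row has tag1 = 'b' or tag2 = 'g'.
        (4, "a", "x", "long", 2), (5, "c", "y", "time", 2), (6, "d", "z", "ago", 2),
    ],
)

matches = [row[0] for row in conn.execute("""
    SELECT DISTINCT c1.sentence_id
    FROM (SELECT id, sentence_id FROM my_table WHERE tag1 = 'a') AS c1
    JOIN (SELECT id, sentence_id FROM my_table WHERE tag1 = 'b' OR tag2 = 'g') AS c2
      ON c2.sentence_id = c1.sentence_id
    JOIN (SELECT id, sentence_id FROM my_table WHERE token = 'ago') AS c3
      ON c3.sentence_id = c1.sentence_id
    WHERE c2.id - c1.id BETWEEN 1 AND 2   -- second token one to two rows after the first
      AND c3.id - c1.id >= 3              -- third token at least three rows after the first
    ORDER BY c1.sentence_id
""")]
print(matches)
```

Only sentence 1 matches here. Timing on a table this small says nothing about the million-row case, so this only checks that the logic does what the question describes.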
You need to edit your question and give more details on how these sequences of tokens work (for instance, what does "each time I specify a new token in the sequence" mean in practice?).
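In PostgreSQL you can solve this class of queries with a window function. Following your exact specification above:
SELECT *
FROM (
-- Compute lead() over all rows first; a WHERE at this level would remove
-- rows before the window function could look at them.
SELECT *,
CASE
WHEN lead(tag2, 2) OVER w = 'g' THEN lead(token, 2) OVER w
WHEN lead(tag1) OVER w = 'b' THEN lead(token) OVER w
ELSE NULL::text
END AS next_token
FROM my_table
WINDOW w AS (PARTITION BY sentence_id ORDER BY id)
) s
-- A column alias such as next_token can only be filtered on in an outer query.
WHERE tag1 = 'a'
AND next_token IS NOT NULL;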
The lead() function looks ahead a number of rows (default is 1, when not specified) from the current row in the window partition, in this case all rows with the same sentence_id as specified in the partition of the window definition. So, lead(tag2, 2) looks at the value of tag2 two rows ahead to compare against your condition, and lead(token, 2) returns the token from two rows ahead as column next_token in the current row and having the same sentence_id. If the first CASE condition fails, the second is evaluated; if that fails NULL is returned. Note that the order of the conditions in the CASE clause is significant: different ordering gives different results.
Obviously, if you keep on adding conditions for subsequent tokens the query becomes very complex and you may have to put individual search conditions in separate stored procedures and then call these depending on your requirements.

unusual sql server query result

Let's say I have a table called nameAge:
ID Name Age
1 X 12
2 Y 12
3 null null
4 Z 12
and when I run a query like:
select * from nameAge where Age <> 12
it returns an empty result set, even though I have a row with id 3 where the age is null and so different from 12?
Using Sql Server 2008 R2.
Any ideas?
Edit: This may partly duplicate the suggested answer, but that answer does not cover my case: it shows how to compare against null values, whereas my question is about getting a result set that includes the null values.
This is the intended behavior. You cannot compare NULL values using = or <>. You have to use IS NULL or IS NOT NULL.
If you want NULL values only use IS NULL:
select * from nameAge where age IS NULL
If you want NULL values with age <> 12 values, use:
select * from nameAge where age <> 12 OR age IS NULL
The expression
WHERE NULL <> 12
does not return TRUE or FALSE, but actually returns UNKNOWN. This means that the third record in your table will not be returned by your query.
As @ughai mentioned, you should use IS NULL instead to query that record:
SELECT * FROM nameAge WHERE age IS NULL
Have a look at the Microsoft SQL Server documentation for more information.
When you are dealing with NULLs you should always be careful because of the three-valued logic used in SQL Server (a predicate can evaluate to TRUE, FALSE or UNKNOWN). Here is a classic select statement where many newcomers make a mistake, assuming that the statement would return all rows where Age <> 12 including NULLs.
But once you know the simple fact that comparing NULL to any value, even to NULL itself, evaluates to UNKNOWN, it becomes clearer what is going on. The WHERE clause returns ONLY those rows where the predicate evaluates to TRUE. Rows where the predicate evaluates to FALSE or UNKNOWN are filtered out of the result set.
Now let's see what is going on behind the scene. You have 4 rows:
ID Name Age
1 X 12
2 Y 12
3 null null
4 Z 12
and the predicate is:
where Age <> 12
When you evaluate this predicate for each row you get:
ID Name Age Evaluation result
1 X 12 FALSE --(because 12 <> 12 is FALSE)
2 Y 12 FALSE --(because 12 <> 12 is FALSE)
3 null null UNKNOWN --(because NULL <> 12 is UNKNOWN)
4 Z 12 FALSE --(because 12 <> 12 is FALSE)
Now remember that WHERE clause will return only rows where predicate evaluates to TRUE and it is clear that you will not get any result because no row evaluates to TRUE.
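The per-row evaluation can be reproduced with a small sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server 2008 R2; SQLite shows TRUE as 1, FALSE as 0 and UNKNOWN as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nameAge (ID INTEGER, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO nameAge VALUES (?, ?, ?)",
    [(1, "X", 12), (2, "Y", 12), (3, None, None), (4, "Z", 12)],
)

# The predicate as a column: 0 is FALSE, None is UNKNOWN.
per_row = conn.execute("SELECT ID, Age <> 12 FROM nameAge ORDER BY ID").fetchall()
# The same predicate used as a filter keeps only TRUE rows.
plain = conn.execute("SELECT ID FROM nameAge WHERE Age <> 12").fetchall()
# Handling NULL explicitly brings row 3 back.
with_nulls = conn.execute(
    "SELECT ID FROM nameAge WHERE Age <> 12 OR Age IS NULL"
).fetchall()
print(per_row, plain, with_nulls)
```

No row evaluates to TRUE, so the plain filter returns nothing, and adding OR Age IS NULL returns row 3.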

Single SQL conditions in brackets

I am refactoring some old Oracle SQL statements containing plenty of conditions. Some are single conditions put into brackets. Now, do the brackets matter for single conditions? Is there a difference between the two examples below?
example 1
WHERE
(
A = B
AND B = C
)
AND ( A > 5 )
AND ( B <> 0 )
example 2
WHERE
(
A = B
AND B = C
)
AND A > 5
AND B <> 0
As far as I know there ain't any semantic differences.
In my experience usually this is either
a relic of some old condition (maybe an OR was in that bracket somewhere in the past) or
just the style of the dev working in this.
There is no difference between the two examples you have posted. It is sometimes claimed that Oracle evaluates the WHERE clause from the end, i.e. that the last condition is filtered first (here b<>0, then A>5, and so on), but the optimizer chooses the evaluation order, so don't rely on it.
It's good practice to use brackets when using AND and OR operators together, as without brackets the logic is sometimes unclear; with only the AND operator, brackets don't make any difference.
As for the logic: you do need brackets if you have some OR logic, but in this case (only AND) they have no meaning. You can remove all brackets in your query.
And if we go deeper: look at the query's explain/analyze output, and you can see that the interpreter adds brackets automatically even if you left them out.
All are the same as long as you have only AND. But if you have any other operator, then the question of precedence comes up.
example 1
WHERE
(
A = B
AND B = C
)
AND ( A > 5 )
AND ( B <> 0 )
example 2
WHERE
(
A = B
AND B = C
)
AND A > 5
AND B <> 0
example 3
WHERE
A = B
AND B = C
AND A > 5
AND B <> 0
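The precedence point is easy to see once an OR is involved. A minimal sketch, assuming SQLite through Python's sqlite3 module instead of Oracle, with 1 and 0 standing in for true and false conditions; AND binds tighter than OR, as in standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

unbracketed, or_first, and_first = conn.execute("""
    SELECT 1 OR 0 AND 0,      -- no brackets: AND is applied before OR
           (1 OR 0) AND 0,    -- brackets force the OR first
           1 OR (0 AND 0)     -- brackets that only restate the default
""").fetchone()
print(unbracketed, or_first, and_first)
```

The unbracketed form gives the same result as 1 OR (0 AND 0) and a different one from (1 OR 0) AND 0, which is why brackets matter with mixed operators but not in an all-AND condition.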