Execute a WHERE clause before another one - sql

I have the following statement
SELECT * FROM foo
WHERE LEN(bar) = 4 AND CONVERT(Int,bar) >= 5000
The values in bar with a length of exactly 4 characters are integers. The other values are not integers, so a conversion exception is thrown when trying to convert one of them to an integer.
I thought it would be enough to put LEN(bar) = 4 before CONVERT(Int,bar) >= 5000. But it isn't.
How can I kind of prioritize a specific where clause? In my example I obviously want to select all values with a length of 4, before converting and comparing them.

6 answers and 5 of them don't work (for SQL Server)...
SELECT *
FROM foo
WHERE CASE WHEN LEN(bar) = 4 THEN
CASE WHEN CONVERT(Int,bar) >= 5000 THEN 1 ELSE 0 END
END = 1;
The WHERE/INNER JOIN conditions can be executed in any order that the query optimizer determines is best. There is no short-circuit boolean evaluation.
For your specific question: since you know that every 4-character value is a number, you can use a direct lexicographical (text) comparison instead. Yes, it works, because digit strings of equal length sort the same way as the numbers they represent:
SELECT *
FROM foo
WHERE LEN(bar) = 4 AND bar >= '5000';
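The claim that a text comparison is safe for fixed-width digit strings can be checked exhaustively. Below is a minimal sketch in Python, used here only as a neutral checker. It assumes that every 4-character value consists solely of digits and that the database collation orders '0' to '9' the same way ASCII does; the helper names are made up for illustration.

```python
def text_ge(value: str, bound: str) -> bool:
    """Lexicographic (text) comparison, as used for plain digit strings."""
    return value >= bound


def agrees_with_numeric(bound: int = 5000) -> bool:
    """Compare every 4-digit string '0000'..'9999' both as text and as a number."""
    bound_text = f"{bound:04d}"
    return all(
        text_ge(f"{n:04d}", bound_text) == (n >= bound)
        for n in range(10000)
    )


# Equal widths: text and numeric comparison always agree.
print(agrees_with_numeric())

# Different widths: they disagree, which is why the LEN(bar) = 4 filter matters.
print(text_ge("900", "5000"), 900 >= 5000)
```

The second print shows why the trick only holds once the length is pinned down: "900" sorts after "5000" as text even though 900 is the smaller number.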

Try this:
SELECT bar FROM
(
SELECT CASE
WHEN LEN(bar) = 4 THEN CAST(bar AS int)
ELSE CAST(-1 AS int) END bar
FROM Foo
) Foo
WHERE bar >= 5000

Related

remove last n characters from a varchar in SQL

I am trying to remove the last n characters from a string. I tried this:
replace( str, right(str, 3), '' )
But it fails when the pattern occurs more than once in str. For example, with 888106106 I get 888 instead of 888106.
Now I am using
left (str, length(str)-3)
Is there a more efficient way of achieving this?
If you fancy a regex based solution:
regexp_replace(str,'...$','')
It will leave strings with < 3 characters unchanged
So checking that LEFT and/or SUBSTR work equally (I assume LEFT is faster):
select
column1
,left(column1, length(column1) -3) as r1
,substr(column1, 0, length(column1) -3) as r2
from values
('abc123')
,('ab123')
,('a123')
,('123')
,('12')
,('1')
,('')
,(null);
gives:
COLUMN1  R1    R2
abc123   abc   abc
ab123    ab    ab
a123     a     a
123      null  null
12       null  null
1        null  null
''       null  null
null     null  null
So no length checks are needed, which is nice to know.
If you do some performance testing:
create database test;
create schema test.test;
create or replace table test.test.many_string as
select seq8()::text as a
from table(generator(ROWCOUNT => 10000000));
ALTER SESSION SET USE_CACHED_RESULT = false;
select sum(length(left(a, length(a) -3))) from test.test.many_string;
select sum(length(substr(a, 0, length(a) -3))) from test.test.many_string;
After running them both a couple of times on my x-small warehouse, I get results in the order of 300ms for each, so they perform equally.
So it seems you already have a solution that is fast and easy to read.
In SQL Server, that's the most effective way.
Note that you should do one of:
assert that all input strings will be length>2 [a bit lazy]
handle the error where 1 of the rows has a length<3 and the query terminates early [a bit shoddy]
use a case statement to handle the case where length < 3 [the preferred approach]
CASE
WHEN LENGTH(str) > 2 THEN LEFT(str, LENGTH(str) - 3)
ELSE str
END
In other flavours of SQL you may have to work without the CASE expression.
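To see the replace() pitfall from the question next to the guarded positional trim, here is a small sketch using Python's built-in sqlite3 module. SQLite is a stand-in chosen only because it runs anywhere: it has no RIGHT() or LEFT(), so substr(s, -3) and substr(s, 1, n) are used instead, and other engines may treat out-of-range lengths differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# replace() removes EVERY occurrence of the last three characters.
REPLACE_SQL = "SELECT replace(:s, substr(:s, -3), '')"

# Positional trim, guarded so strings shorter than 3 characters pass through unchanged.
TRIM_SQL = """
SELECT CASE
         WHEN length(:s) > 2 THEN substr(:s, 1, length(:s) - 3)
         ELSE :s
       END
"""


def run(sql: str, s: str) -> str:
    """Evaluate a single-value query with the string bound as :s."""
    return conn.execute(sql, {"s": s}).fetchone()[0]


for s in ["888106106", "abc123", "12"]:
    print(repr(s), "replace:", repr(run(REPLACE_SQL, s)), "trim:", repr(run(TRIM_SQL, s)))
```

For 888106106 the replace() variant strips both occurrences of 106, while the positional trim removes only the final three characters.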

Operator does not exist: integer = integer[] in a query with ANY

I have frequently used the integer = ANY(integer[]) syntax, but now the ANY operator doesn't work. This is the first time I have used it to compare a scalar with an integer array returned from a CTE, but I thought this shouldn't cause problems.
My query:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a FROM foo WHERE b = ANY ( SELECT bs FROM bar);
When I run it, it throws the following error:
ERROR: operator does not exist: integer = integer[]: WITH bar AS (
SELECT array_agg(b) AS bs FROM foo WHERE c < 3 ) SELECT a FROM foo
WHERE b = ANY ( SELECT bs FROM bar)
Details in this SQL Fiddle.
So what am I doing wrong?
Based on the error message portion operator does not exist: integer = integer[], it appears that the bs column needs to be unnested, in order to get the right hand side back to an integer so the comparison operator can be found:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a
FROM foo
WHERE b = ANY ( SELECT unnest(bs) FROM bar);
This results in the output:
A
2
3
Given the doc for the ANY function:
The right-hand side is a parenthesized subquery, which must return
exactly one column. The left-hand expression is evaluated and compared
to each row of the subquery result using the given operator, which
must yield a Boolean result. The result of ANY is "true" if any true
result is obtained. The result is "false" if no true result is found
(including the case where the subquery returns no rows).
... the error makes sense: the left-hand expression is an integer (column b), while each row returned by the right-hand subquery is an array of integers (integer[]). The comparison therefore takes the form integer = integer[], for which no operator exists, hence the error.
Unnesting the integer[] value makes both the left- and right-hand expressions integers, so the comparison can proceed.
Modified SQL Fiddle.
Note that the same behavior is seen when using IN instead of = ANY.
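The difference between "one row holding an array" and "one row per element" can be illustrated with a toy model. The Python sketch below is not how PostgreSQL evaluates ANY; any_equals is a hypothetical helper that only mimics the type check, and the sample values are made up.

```python
def any_equals(left, rows):
    """Toy model of `left = ANY (subquery)`: compare left with each row's single column."""
    for value in rows:
        if type(value) is not type(left):
            # Mirrors the spirit of "operator does not exist: integer = integer[]".
            raise TypeError(
                f"operator does not exist: {type(left).__name__} = {type(value).__name__}"
            )
        if left == value:
            return True
    return False


bs = [2, 3]               # hypothetical result of array_agg(b): ONE array value
subquery_rows = [bs]      # SELECT bs FROM bar         -> one row holding an array
unnested_rows = list(bs)  # SELECT unnest(bs) FROM bar -> one row per element

print(any_equals(2, unnested_rows))  # element-wise comparison succeeds

try:
    any_equals(2, subquery_rows)     # an integer is compared with a whole array
except TypeError as exc:
    print(exc)
```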
Without unnest:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a FROM foo WHERE ( SELECT b = ANY (bs) FROM bar);
FYI, for me,
SELECT ... WHERE "id" IN (SELECT unnest(ids) FROM tablewithids)
was incomparably faster than
SELECT ... WHERE "id" = ANY((SELECT ids FROM tablewithids)::INT[])
Didn't do any research into why that was though.
The column needs to be unnested:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a
FROM foo
WHERE b = ANY ( SELECT unnest(bs) FROM bar);

SQL query: convert

I'm trying to read a column from a database using a SQL query. The column consists of empty string or numbers as strings, such as
"7500" "4460" "" "2900" "2640" "1850" "" "2570" "9050" "8000" "9600"
I'm trying to find the right SQL query to extract all the numbers (as integers) and remove the empty ones, but I'm stuck. So far I've got
SELECT *
FROM base
WHERE CONVERT(INT, code) IS NOT NULL
This is done in R (package sqldf).
If all non-empty values are valid integers, you could use:
select * , cast(code as int) IntCode
from base
where code <> ''
To prevent cases when field code is not a valid number, use:
select *, cast(codeN as int) IntCode
from base
cross apply (select case when code <> '' and not code like '%[^0-9]%' then code else NULL end) N(codeN)
where codeN is not null
SQL Fiddle
UPDATE
To find rows where code is not a valid number, use
select * from base where code like '%[^0-9]%'
select *
from base
where col like '[1-9]%'
Example: http://sqlfiddle.com/#!6/f7626/2/0
If you don't need to check that the whole value is a valid number (i.e. to reject a string such as '909XY2'), then this may run marginally faster, depending on the size of the table.
Is this what you want?
SELECT (case when code not like '%[^0-9]%' then cast(code as int) end)
FROM base
WHERE code <> '' and code not like '%[^0-9]%';
The conditions are repeated in the where and case on purpose. SQL Server does not guarantee that where filters are applied before logic in the select, so you can get an error with conversions. More recent versions of SQL Server have try_convert() to fix this problem.
Using sqldf with the default sqlite database and this test data:
DF <- data.frame(a = c("7500", "4460", "", "2900", "2640", "1850", "", "2570",
"9050", "8000", "9600"), stringsAsFactors = FALSE)
Try this:
library(sqldf)
sqldf("select cast(a as integer) as aint from DF where length(a) > 0")
giving:
aint
1 7500
2 4460
3 2900
4 2640
5 1850
6 2570
7 9050
8 8000
9 9600
Note: in plain R one could write:
transform(subset(DF, nchar(a) > 0), a = as.integer(a))
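Since sqldf runs against SQLite by default, the same query can be tried outside R as well. The sketch below uses Python's built-in sqlite3 module with the sample values from the question; the table name DF and column name a simply mirror the data frame above, and ORDER BY rowid is added only to make the output order deterministic.

```python
import sqlite3

codes = ["7500", "4460", "", "2900", "2640", "1850", "", "2570", "9050", "8000", "9600"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DF (a TEXT)")
conn.executemany("INSERT INTO DF (a) VALUES (?)", [(c,) for c in codes])

# Drop the empty strings first, then cast the remaining values to integers.
rows = conn.execute(
    "SELECT CAST(a AS INTEGER) AS aint FROM DF WHERE length(a) > 0 ORDER BY rowid"
).fetchall()

values = [row[0] for row in rows]
print(values)
```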

Specify order of (T)SQL execution

I have seen similar questions asked elsewhere on this site, but more in the context of optimization.
I am having an issue with the order of execution of the conditions in a WHERE clause. I have a field which stores codes, most of which are numeric but some of which contain non-numeric characters. I need to do some operations on the numeric codes which will cause errors if attempted on non-numeric strings. I am trying to do something like
WHERE isnumeric(code) = 1
AND CAST(code AS integer) % 2 = 1
Is there any way to make sure that the isnumeric() executes first? If it doesn't, I get an error...
Thanks in advance!
The only place where order of evaluation is guaranteed is CASE:
WHERE
CASE WHEN isnumeric(code) = 1
THEN CAST(code AS integer) % 2
END = 1
Also just because it passes the isnumeric test doesn't guarantee that it will successfully cast to an integer.
SELECT ISNUMERIC('$') /*Returns 1*/
SELECT CAST('$' AS INTEGER) /*Fails*/
Depending upon your needs you may find these alternatives preferable.
Why not simply do it using LIKE?:
Where Code Not Like '%[^0-9]%'
By the way, whether you use my solution or IsNumeric, there are edge cases that might lead you to write a UDF instead. For example, IsNumeric returns 1 for 1,234,567, but Cast will throw an exception.
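The digit-only filter can be tried on the awkward inputs mentioned in this thread. The following is a sketch using Python's built-in sqlite3 module, not SQL Server: SQLite has no ISNUMERIC and its CAST does not raise on bad input, so this only illustrates the filter itself. GLOB '*[^0-9]*' is used as SQLite's analogue of the T-SQL pattern LIKE '%[^0-9]%'; the table name codes and the sample values are hypothetical.

```python
import sqlite3

samples = ["1234", "1,234,567", "$", "909XY2", "", "7"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (code TEXT)")
conn.executemany("INSERT INTO codes (code) VALUES (?)", [(s,) for s in samples])

# The CAST sits inside the CASE branch, so it is only reached for digit-only codes.
SQL = """
SELECT code,
       CASE WHEN code <> '' AND code NOT GLOB '*[^0-9]*'
            THEN CAST(code AS INTEGER) % 2
       END AS parity
FROM codes
ORDER BY rowid
"""

result = conn.execute(SQL).fetchall()
odd_codes = [code for code, parity in result if parity == 1]

for code, parity in result:
    print(repr(code), parity)
print("odd codes:", odd_codes)
```

Rows that contain anything other than digits (or nothing at all) get a NULL parity and therefore drop out of any comparison with 1.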
Why not use a CASE expression, something like:
WHERE
CASE WHEN isnumeric(code) = 1
THEN CAST(code AS int) % 2
ELSE 0 /* or whatever else applies if not numeric */ END = 1
You could do it in a CASE expression in the select clause, then filter on the value in an outer select:
select * from (
select
code,
case when isNum = 1 then CAST(code AS integer) % 2 else 0 end as castVal
from (
select
code,
case when isnumeric(code) = 1 then 1 else 0 end as isNum
from table) t
) t2
where castVal = 1

How do I map true/false/unknown to -1/0/null without repetition?

I am currently working on a tool to help my users port their SQL code to SQL Server 2005. For this purpose, I parse the SQL into a syntax tree, analyze it for constructs which need attention, modify it and transform it back into T-SQL.
One thing that I want to support is the "bools are values too" semantics of other RDBMS. For example, MS Access allows me to write select A.x and A.y as r from A, which is impossible in T-SQL because:
Columns can't have boolean type (column values can't be and'ed)
Logical predicates can not be used where expressions are expected.
Therefore, my transformation routine converts the above statement into this:
select case
when (A.x<>0) and (A.y<>0)
then -1
when not((A.x<>0) and (A.y<>0))
then 0
else
null
end as r
from A;
This works, but is annoying, because I have to duplicate the logical expression (which can be very complex or contain subqueries etc.) in order to distinguish between true, false and unknown, where the latter should map to null. So I wonder if the T-SQL pros here know a better way to achieve this?
UPDATE:
I would like to point out that solutions which try to keep the operands in the integer domain have to take into account that some operands may be logical expressions in the first place. This means that an efficient solution to convert a bool to a value is still required. For example:
select A.x and exists (select * from B where B.y=A.y) from A;
I don't think there's a good answer, really it's a limitation of TSQL.
You could create a UDF for each boolean expression you need:
CREATE FUNCTION dbo.AndIntInt
(
@x as int, @y as int
)
RETURNS int
AS
BEGIN
if (@x<>0) and (@y<>0)
return -1
if not((@x<>0) and (@y<>0))
return 0
return null
END
used via
select dbo.AndIntInt(A.x,A.y) as r from A
Boolean handling
Access seems to use the logic that given 2 booleans
Both have to be true to return true
Either being false returns false (regardless of nulls)
Otherwise return null
I'm not sure if this is how other DBMS (Oracle, DB2, PostgreSQL) deal with bool+null, but this answer is based on the Access determination (MySQL and SQLite agree). The table of outcomes is presented below.
X Y A.X AND B.Y
0 0 0
0 -1 0
0 (null) 0
-1 0 0
-1 -1 -1
-1 (null) (null)
(null) 0 0
(null) -1 (null)
(null) (null) (null)
SQL Server helper 1: function for boolean from any "single value"
In SQL Server in general, this function will fill the gap for the missing any value as boolean functionality. It returns a ternary result, either 1/0/null - 1 and 0 being the SQL Server equivalent of true/false (without actually being boolean).
drop function dbo.BoolFromAny
GO
create function dbo.BoolFromAny(@v varchar(max)) returns bit as
begin
return (case
when @v is null then null
when isnumeric(@v) = 1 and @v like '[0-9]%' and (@v * 1.0 = 0) then 0
else 1 end)
end
GO
Note: taking Access as a starting point, only the numeric value 0 evaluates to FALSE.
This uses some SQL Server tricks:
Everything is convertible to varchar, so only one function taking varchar input is required.
isnumeric is not comprehensive: '.' returns 1 for isnumeric but will fail at @v * 1.0, so an explicit test for LIKE '[0-9]%' is required to "fix" isnumeric.
@v * 1.0 is required to overcome some arithmetic issues. If you pass the string "1" into the function without * 1.0, it will bomb.
Now we can test the function.
select dbo.BoolFromAny('abc')
select dbo.BoolFromAny(1)
select dbo.BoolFromAny(0) -- the only false
select dbo.BoolFromAny(0.1)
select dbo.BoolFromAny(-1)
select dbo.BoolFromAny('')
select dbo.BoolFromAny('.')
select dbo.BoolFromAny(null) -- the only null
You can now safely use it in a query against ANY SINGLE COLUMN, such as
SELECT ... WHERE dbo.BoolFromAny(X) = 1
SQL Server helper 2: function to return result of BOOL AND BOOL
Now the next part is creating the same truth table in SQL Server. This query shows you how two bit columns interact and the simple CASE statement to produce the same table as Access and your more complicated one.
select a.a, b.a,
case
when a.a = 0 or b.a = 0 then 0
when a.a = b.a then 1
end
from
(select 1 A union all select 0 union all select null) a,
(select 1 A union all select 0 union all select null) b
order by a.a, b.a
This is easily expressed as a function
create function dbo.BoolFromBits(@a bit, @b bit) returns bit as
begin
return case
when @a = 0 or @b = 0 then 0
when @a = @b then 1
end
end
SQL Server conversion of other expressions (not of a single value)
Due to lack of support for bit-from-boolean conversion, expressions that are already [true/false/null] in SQL Server require repetition in a CASE statement.
An example is a "true boolean" in SQL Server, which cannot be the result for a column.
select A > B -- A > B resolves to one of true/false/null
from C
Needs to be expressed as
select case when A is null or B is null then null when A > B then 1 else 0 end
from C
But if A is not a scalar value but a subquery like (select sum(x)...), then as you can see A will appear twice and be evaluated twice in the CASE statement (repeated).
FINAL TEST
Now we put all the conversion rules to use in this long expression
SELECT X AND Y=Z AND C FROM ..
( assume X is numeric 5, and C is varchar "H" )
( note C contributes either TRUE or NULL in Access )
This translates to SQL Server (chaining the two functions and using CASE)
SELECT dbo.BoolFromBits(
dbo.BoolFromBits(dbo.BoolFromAny(X), CASE WHEN Y=Z then 1 else 0 end),
dbo.BoolFromAny(C))
FROM ...
Access Bool or bool
For completeness, here is the truth table for Access bool OR bool. Essentially, it is the opposite of AND, so
Both have to be false to return false
Either being true returns true (regardless of nulls)
Otherwise return null
The SQL SERVER case statement would therefore be
case
when a.a = 1 or b.a = 1 then 1
when a.a = b.a then 0
end
(the omission of an ELSE clause is intentional as the result is NULL when omitted)
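Since the answer notes that SQLite follows the same three-valued rules as Access, the two CASE expressions can be checked against SQLite's native AND and OR. Below is a minimal sketch using Python's built-in sqlite3 module, with 1/0/NULL standing in for true/false/unknown as in the bit-based examples above; the parameter names :a and :b are just placeholders.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# Native three-valued AND/OR next to the CASE expressions from the answer.
SQL = """
SELECT (:a AND :b) AS native_and,
       CASE WHEN :a = 0 OR :b = 0 THEN 0 WHEN :a = :b THEN 1 END AS case_and,
       (:a OR :b) AS native_or,
       CASE WHEN :a = 1 OR :b = 1 THEN 1 WHEN :a = :b THEN 0 END AS case_or
"""


def truth_row(a, b):
    """Return (native_and, case_and, native_or, case_or) for one input pair."""
    return conn.execute(SQL, {"a": a, "b": b}).fetchone()


# None is bound as SQL NULL, giving all nine combinations of 1/0/NULL.
table = {(a, b): truth_row(a, b) for a, b in product([1, 0, None], repeat=2)}

for (a, b), row in table.items():
    print(a, b, row)
```

In each printed row the first two values (AND) and the last two values (OR) should match if the CASE expressions reproduce the native behaviour.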
EDIT: Based on additional information added to the Question and comments made on one of the suggested Answers, I am reformulating this answer:
If you are porting to SQL Server then I would expect that you are also transforming the data to match SQL Server types. If you have a boolean field then True, False, and Unknown map to 1, 0, and NULL as a NULLable BIT field.
With this in mind, you only need to worry about transforming pure boolean values. Expressions such as:
exists (select * from B where B.y=A.y)
and:
A.x in (1,2,3)
are already in a workable form. Meaning, statements like:
IF (EXISTS(SELECT 1 FROM Table))
and:
IF (value IN (list))
are already correct. So you just need to worry about the fact that "1" for "True" is not by itself enough. Hence, you can convert "1" values to boolean expressions by testing if they are in fact equal to "1". For example:
IF (value = 1)
is the equivalent of what you previously had as:
IF (value)
Putting all of this together, you should be able to simply translate all instances of pure boolean values of the old code into boolean expressions in the form of "value = 1" since a 1 will produce a True, 0 will produce False, and NULL will give you False.
HOWEVER, the real complexity is that SELECTing the value vs. testing with it via a WHERE condition is different. Boolean expressions evaluate correctly in WHERE conditions but have no direct representation to SELECT (especially since NULL / Unknown isn't really boolean). So, you can use the "Value = 1" translation in WHERE conditions but you will still need a CASE statement if you want to SELECT it as a result.
As mentioned briefly a moment ago, since NULL / Unknown isn't truly boolean, it is meaningless trying to convert "NULL AND NULL" to NULL for the purposes of a WHERE condition. In effect, NULL is truly FALSE since it cannot be determined to be TRUE. Again, this might be different for your purposes in a SELECT statement which is again why the CASE statement is your only choice there.