SQL query: convert

I'm trying to read a column from a database using a SQL query. The column contains empty strings or numbers stored as strings, such as
"7500" "4460" "" "2900" "2640" "1850" "" "2570" "9050" "8000" "9600"
I'm trying to find the right SQL query to extract all the numbers (as integers) and remove the empty ones, but I'm stuck. So far I've got:
SELECT *
FROM base
WHERE CONVERT(INT, code) IS NOT NULL
I'm running this from R (package sqldf).

If all values are valid integers, you could use:
select * , cast(code as int) IntCode
from base
where code <> ''
To handle cases where the code field is not a valid number, use:
select *, cast(codeN as int) IntCode
from base
cross apply (select case when code <> '' and not code like '%[^0-9]%' then code else NULL end) N(codeN)
where codeN is not null
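The filtering idea above (drop empty codes, keep only all-digit codes, then cast) can be tried with Python's built-in sqlite3 module. This is a sketch under one assumption: SQLite's LIKE has no [^0-9] character class, so its GLOB operator, which does support one, stands in for SQL Server's NOT LIKE '%[^0-9]%'. The table base and column code come from the question; the value '12AB' is an invalid code added for illustration.

```python
import sqlite3

# In-memory stand-in for the question's "base" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (code TEXT)")
codes = ["7500", "4460", "", "2900", "2640", "1850", "", "2570",
         "9050", "8000", "9600", "12AB"]  # "12AB" added as an invalid example
conn.executemany("INSERT INTO base (code) VALUES (?)", [(c,) for c in codes])

# Keep non-empty codes made only of digits, then cast them to integers.
# NOT GLOB '*[^0-9]*' is the SQLite spelling of "contains no non-digit".
rows = conn.execute(
    """
    SELECT CAST(code AS INTEGER) AS IntCode
    FROM base
    WHERE code <> '' AND code NOT GLOB '*[^0-9]*'
    ORDER BY rowid
    """
).fetchall()
int_codes = [row[0] for row in rows]
print(int_codes)
```

The empty-string test is still needed: an empty string contains no non-digit character, so the GLOB check alone would let it through.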
Update: to find rows where code is not a valid number, use:
select * from base where code like '%[^0-9]%'

select *
from base
where code like '[1-9]%'
Example: http://sqlfiddle.com/#!6/f7626/2/0
If you don't need to check that the whole string is a valid number (i.e. reject a string such as '909XY2'), this may run marginally faster, depending on the size of the table.
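Because the prefix pattern only inspects the first character, it keeps '909XY2' and drops values that start with 0. A sketch using Python's sqlite3, where GLOB '[1-9]*' is assumed to be the SQLite equivalent of LIKE '[1-9]%'; the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (code TEXT)")
conn.executemany("INSERT INTO base (code) VALUES (?)",
                 [("7500",), ("",), ("909XY2",), ("0",), ("0420",)])

# Only the first character is checked.
prefix_only = [r[0] for r in conn.execute(
    "SELECT code FROM base WHERE code GLOB '[1-9]*' ORDER BY rowid")]

# Adding a whole-string digit check rejects values such as '909XY2'.
strict = [r[0] for r in conn.execute(
    "SELECT code FROM base "
    "WHERE code GLOB '[1-9]*' AND code NOT GLOB '*[^0-9]*' ORDER BY rowid")]

print(prefix_only)  # ['7500', '909XY2']
print(strict)       # ['7500']
```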

Is this what you want?
SELECT (case when code not like '%[^0-9]%' then cast(code as int) end)
FROM base
WHERE code <> '' and code not like '%[^0-9]%';
The conditions are repeated in the WHERE clause and in the CASE on purpose: SQL Server does not guarantee that WHERE filters are applied before the expressions in the SELECT list, so the conversion can raise an error. SQL Server 2012 and later have try_convert(), which returns NULL instead of failing and so avoids this problem.
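The try_convert() idea (a conversion that yields NULL instead of an error) can be imitated in SQLite by registering a Python function. This is an analogy only: try_int is a hypothetical helper, and Python's int() rules differ from SQL Server's (for example, it accepts surrounding whitespace and underscores).

```python
import sqlite3

def try_int(value):
    """Hypothetical helper: int(value) if it converts, otherwise None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

conn = sqlite3.connect(":memory:")
conn.create_function("try_int", 1, try_int)
conn.execute("CREATE TABLE base (code TEXT)")
conn.executemany("INSERT INTO base (code) VALUES (?)",
                 [("7500",), ("",), ("12AB",), ("2900",)])

# try_int never raises, so the order in which WHERE and SELECT are
# evaluated no longer matters.
rows = conn.execute(
    "SELECT try_int(code) FROM base WHERE try_int(code) IS NOT NULL ORDER BY rowid"
).fetchall()
values = [r[0] for r in rows]
print(values)  # [7500, 2900]
```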

Using sqldf with the default sqlite database and this test data:
DF <- data.frame(a = c("7500", "4460", "", "2900", "2640", "1850", "", "2570",
"9050", "8000", "9600"), stringsAsFactors = FALSE)
try this:
library(sqldf)
sqldf("select cast(a as aint) as aint from DF where length(a) > 0")
giving:
aint
1 7500
2 4460
3 2900
4 2640
5 1850
6 2570
7 9050
8 8000
9 9600
Note: in plain R one could write:
transform(subset(DF, nchar(a) > 0), a = as.integer(a))
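sqldf runs queries on SQLite by default, so the same statement can be checked from Python's sqlite3 module. The sketch also suggests why the unusual type name aint works: under SQLite's type-affinity rules, any declared type containing "INT" gets INTEGER affinity, so the CAST produces integers. The DF table is rebuilt by hand here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DF (a TEXT)")
values = ["7500", "4460", "", "2900", "2640", "1850", "", "2570",
          "9050", "8000", "9600"]
conn.executemany("INSERT INTO DF (a) VALUES (?)", [(v,) for v in values])

# Same query as the sqldf call, plus typeof() to show the resulting storage class.
rows = conn.execute(
    "SELECT cast(a AS aint) AS aint, typeof(cast(a AS aint)) "
    "FROM DF WHERE length(a) > 0 ORDER BY rowid"
).fetchall()
print(rows[0])  # (7500, 'integer')
```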

Related

SQL IsNumeric function

SELECT IsNumeric('472369326D4')
returns 1, even though there is clearly an alphabetic character, D, in the string. Why?
'472369326D4' is a valid float value: D works like E as an exponent marker, so D4 multiplies the value before the D by 10^4 (10000), which amounts to appending four zeros.
Example SQL:
SELECT cast('472369326D4' as float)
SELECT cast('472369326D3' as float)
SELECT cast('472369326D2' as float)
Output:
4723693260000
472369326000
47236932600
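The arithmetic above can be reproduced in Python, whose float() only recognises E as the exponent marker. The hypothetical helper below swaps D for E first; it illustrates the notation and is not how SQL Server implements the conversion.

```python
def parse_d_exponent(text):
    """Hypothetical helper: read 'D' as an exponent marker, like 'E'."""
    return float(text.upper().replace("D", "E"))

for s in ["472369326D4", "472369326D3", "472369326D2"]:
    print(s, "->", parse_d_exponent(s))
```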
You probably want logic like this:
(case when str not like '%[^0-9]%' then 1 else 0 end) as isNumeric
Or, if you want to allow decimals and negative signs, the logic is a little more cumbersome:
(case when str not like '%.%.%' and
(str not like '%[^.0-9]%' or
(str like '-%' and str not like '-%[^.0-9]%'))
then 1 else 0
end)
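A rough Python port of the intended rule (at most one dot, only digits and dots, optionally a single leading minus sign), using the re module in place of LIKE patterns. It is a sketch for reasoning about edge cases, not a replacement for the SQL.

```python
import re

def looks_numeric(s):
    """Return 1 if s has at most one dot and only digits/dots,
    optionally after a single leading '-', else 0."""
    if s.count(".") > 1:               # more than one dot
        return 0
    if re.fullmatch(r"[.0-9]*", s):    # only digits and dots
        return 1
    if re.fullmatch(r"-[.0-9]*", s):   # leading minus, then digits and dots
        return 1
    return 0

for s in ["12.5", "-3.0", "1.2.3", "12a", "--1", "1e5", "472369326D4"]:
    print(s, looks_numeric(s))
```

Like the LIKE-based version, this still accepts '', '.' and '-', so a separate check is needed if those must be rejected.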
I strongly recommend not using isnumeric(), because it produces strange results: in addition to 'd', it also accepts 'e'.
In SQL Server 2012+, you can try the following:
select x, isnumeric(x), try_convert(float, x)
from (values ('$'), ('$.'), ('-'), ('.')) v(x)
All of these return 1 for isnumeric, but NULL for the conversion, meaning that you cannot convert the value to a float. (You can run the code in SQL Server 2008 with convert() instead and watch the code fail.)
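The same four inputs can be fed to Python's float() through a small try/except wrapper, a rough analogue of try_convert(float, x). Python's parser is not SQL Server's (for example, it accepts 'inf' and 'nan'), so this only illustrates the idea.

```python
def try_float(x):
    """Return float(x), or None if Python cannot parse it."""
    try:
        return float(x)
    except ValueError:
        return None

for x in ["$", "$.", "-", "."]:
    print(repr(x), try_float(x))  # each prints None
```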

Invalid argument for function integer IBM DB2

I need to filter the rows of a table down to those where the numer_lini column contains a number between 100 and 999. The code below works fine when I comment out the line where I cast marsnr to integer. However, when I use that line I get the error: Invalid character found in a character string argument of the function "INTEGER". Looking at the list, replace and translate seem to filter out the numbers just fine, and the select contains only legitimate numbers (the list of unique values is short, so it's easy to scan by eye). So why does the cast fail? I also tried integer(marsnr), but it produces the same error. I need the cast because I need a numeric range; otherwise I get results like 7, 80 and so on. As I mentioned, I'm using an IBM DB2 database.
select numer_lini, war_trasy, id_prz1, id_prz2
from alaska.trasa
where numer_lini in (
select marsnr
from (
select
distinct numer_lini marsnr
from alaska.trasa
where case
when replace(translate(numer_lini, '0','123456789','0'),'0','') = ''
then numer_lini
else 'no'
end <> 'no'
)
where cast(marsnr as integer) between 100 and 999
)
fetch first 300 rows only
If you look at the optimized SQL from the Db2 explain, you will see that Db2 has collapsed your code into a single select.
SELECT DISTINCT Q2.NUMER_LINI AS "NUMER_LINI",
Q2.WAR_TRASY AS "WAR_TRASY",
Q2.ID_PRZ1 AS "ID_PRZ1",
Q2.ID_PRZ2 AS "ID_PRZ2",
Q1.NUMER_LINI
FROM ALASKA.TRASA AS Q1,
ALASKA.TRASA AS Q2
WHERE (Q2.NUMER_LINI = Q1.NUMER_LINI)
AND (100 <= INTEGER(Q1.NUMER_LINI))
AND (INTEGER(Q1.NUMER_LINI) <= 999)
AND (CASE WHEN (REPLACE(TRANSLATE(Q1.NUMER_LINI,
'0',
'123456789',
'0'),
'0',
'') = '') THEN Q1.NUMER_LINI
ELSE 'no' END <> 'no')
Use a CASE to force Db2 to do the "is integer" check first. Also, you don't check for the empty string.
For example, with this table and data:
create TABLE alaska.trasa (numer_lini VARCHAR(10), war_trasy INT, id_prz1 INT, id_prz2 INT);
insert into alaska.trasa values ('',1,1,1),('99',1,1,1),('500',1,1,1),('3000',1,1,1),('00300',1,1,1),('AXS',1,1,1);
this SQL works:
select numer_lini, war_trasy, id_prz1, id_prz2
from alaska.trasa
where case when translate(numer_lini, '','0123456789') = ''
and numer_lini <> ''
then integer(numer_lini) else 0 end
between 100 and 999
That does fail, though, if the input contains an embedded space, e.g. '30 0'. To cater for that, a regular expression is probably preferable, e.g.:
select numer_lini, war_trasy, id_prz1, id_prz2
from alaska.trasa
where case when regexp_like(numer_lini, '^\s*[+-]?\s*((\d+\.?\d*)|(\d*\.?\d+))\s*$')
then integer(numer_lini) else 0 end
between 100 and 999
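The difference between the two checks can be sketched in Python. The first function mimics the Db2 translate test, assuming Db2's behaviour of turning each digit into a blank and ignoring trailing blanks when comparing with ''. The second uses a simplified, integer-only variant of the answer's regular expression. The sample values are the ones inserted above plus '30 0'.

```python
import re

values = ["", "99", "500", "3000", "00300", "AXS", "30 0"]

def translate_check(s):
    """Mimic translate(s, '', '0123456789') = '' and s <> ''."""
    blanked = s.translate(str.maketrans("0123456789", " " * 10))
    return blanked.rstrip(" ") == "" and s != ""

# '30 0' passes the translate check, so converting it afterwards would fail.
print([v for v in values if translate_check(v)])

# Simplified integer-only pattern: optional spaces and sign, then digits.
INTEGER_RE = re.compile(r"\s*[+-]?[0-9]+\s*")

def in_range(s):
    return bool(INTEGER_RE.fullmatch(s)) and 100 <= int(s) <= 999

print([v for v in values if in_range(v)])  # ['500', '00300']
```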

Determine if zip code contains numbers only

I have a field called zip, type char(5), which contains zip codes like
12345
54321
ABCDE
I'd like to check with an SQL statement whether a zip code contains digits only.
The following isn't working:
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE '%'
It can't work, because even '12345' is an "array" of characters (so it matches '%', right?).
I found out that the following is working:
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE ' %'
It has a space before %. Why is this working?
If you use SQL Server 2012 or later, the following script should work:
DECLARE @t TABLE (Zip VARCHAR(10))
INSERT INTO @t VALUES ('12345')
INSERT INTO @t VALUES ('54321')
INSERT INTO @t VALUES ('ABCDE')
SELECT *
FROM @t AS t
WHERE TRY_CAST(Zip AS NUMERIC) IS NOT NULL
Using an answer from here to check whether all characters are digits:
SELECT col1,col2
FROM
(
SELECT col1,col2,
CASE
WHEN LENGTH(RTRIM(TRANSLATE(ZIP , '*', ' 0123456789'))) = 0
THEN 0 ELSE 1
END as IsAllDigit
FROM S1234.PERSON
) AS Z
WHERE IsAllDigit=0
Older versions of DB2 do not have a regular expression facility like MySQL's REGEXP.
Use the ISNUMERIC function (available in SQL Server).
ISNUMERIC returns 1 if the parameter can be converted to a numeric type and 0 if it cannot.
Example:
SELECT * FROM S1234.PERSON
WHERE ISNUMERIC(ZIP) = 1
Your statement doesn't check for digits at all; it returns every row whose ZIP does not start with a space.
Let's suppose your ZIP code is a US zip code, composed of 5 digits.
db2 "with val as (
select *
from S1234.PERSON t
where xmlcast(xmlquery('fn:matches(\$ZIP,''^\d{5}$'')') as integer) = 1
)
select * from val"
For more information about the XQuery function fn:matches, see: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrfnmat.html
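Why the question's NOT LIKE ' %' query appears to work, and what a real five-digit test looks like, can be sketched with Python's sqlite3. Assumptions: a simplified person table stands in for S1234.PERSON, the value ' 1234' is added for illustration, and SQLite's GLOB character classes play the role of the ^\d{5}$ pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (zip TEXT)")
conn.executemany("INSERT INTO person (zip) VALUES (?)",
                 [("12345",), ("54321",), ("ABCDE",), (" 1234",)])

# NOT LIKE ' %' only rejects values whose first character is a space.
no_leading_space = [r[0] for r in conn.execute(
    "SELECT zip FROM person WHERE zip NOT LIKE ' %' ORDER BY rowid")]

# A digit test has to look at every position.
five_digits = [r[0] for r in conn.execute(
    "SELECT zip FROM person "
    "WHERE zip GLOB '[0-9][0-9][0-9][0-9][0-9]' ORDER BY rowid")]

print(no_leading_space)  # ['12345', '54321', 'ABCDE']
print(five_digits)       # ['12345', '54321']
```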
MySQL does not have a native isNumeric() function. This would be pretty straightforward in Excel with the ISNUMBER() function, or in T-SQL with ISNUMERIC(), but neither works in MySQL, so after a little searching around I came across this solution...
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'
Effectively we're applying a regular expression to the contents of the ZIP field. It may seem like using a sledgehammer to crack a nut, and I've no idea how performance compares with a simpler approach, but it works, and I guess that's the point.
I have made a less error-prone version based on the solution https://stackoverflow.com/a/36211270/565525, adding the intermediate result and some examples:
select
test_str
, TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789'))
, case when TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789')) = 'yyyyy' then '5-digit-zip' else 'not 5d-zip' end is_zip
from (VALUES
(' 123 ' )
,(' abc ' )
,(' a12 ' )
,(' 12 3 ')
,(' 99435 ')
,('99323' )
) AS X(test_str)
;
The result for this example set is:
TEST_STR 2 IS_ZIP
-------- -------- -----------
123 yyy not 5d-zip
abc abc not 5d-zip
a12 ayy not 5d-zip
12 3 yyxy not 5d-zip
99435 yyyyy 5-digit-zip
99323 yyyyy 5-digit-zip
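The intermediate column can be reproduced in Python to see what each test accepts. This port assumes Db2's TRIM removes only blanks, hence strip(" "). It also shows that checking only the length of the translated string accepts five letters such as 'abcde', whereas comparing it with 'yyyyy' does not.

```python
def translate_to_y(s):
    """Port of TRIM(TRANSLATE(REPLACE(TRIM(s), ' ', 'x'), 'y...', '0123456789'))."""
    inner = s.strip(" ").replace(" ", "x")
    return inner.translate(str.maketrans("0123456789", "y" * 10)).strip(" ")

def is_zip_by_length(s):
    return len(translate_to_y(s)) == 5

def is_zip_by_pattern(s):
    return translate_to_y(s) == "yyyyy"

for s in [" 12 3 ", " 99435 ", "abcde"]:
    print(repr(s), translate_to_y(s), is_zip_by_length(s), is_zip_by_pattern(s))
```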
Try checking whether the upper-case and lower-case versions differ. Digits and special characters look the same in both:
SELECT *
FROM S1234.PERSON
WHERE UPPER(ZIP COLLATE Latin1_General_CS_AI ) = LOWER(ZIP COLLATE Latin1_General_CS_AI)
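The same idea in Python, as an analogy to the case-sensitive collation (Python's case mapping is not SQL Server's). It makes the limitation visible: spaces and punctuation also pass, so this test detects "no cased letters" rather than "digits only".

```python
def same_in_both_cases(s):
    # True when upper- and lower-casing give the same string.
    return s.upper() == s.lower()

for z in ["12345", "ABCDE", "12-45", "1 345"]:
    print(repr(z), same_in_both_cases(z))
```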
Here's a working example for the case where you want to check zip codes in a range. You could use this code as inspiration for a simple single postcode check, if you want:
if local_test_environment?
# SQLite supports GLOB which is similar to LIKE (which it only has limited support for), for matching in strings.
where("(zip_code NOT GLOB '*[^0-9]*' AND zip_code <> '') AND (CAST(zip_code AS int) >= :range_start AND CAST(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
else
# SQLServer supports LIKE with more advanced matching in strings than what SQLite supports.
# SQLServer supports TRY_PARSE which is non-standard SQL, but fixes the error SQLServer gives with CAST, namely: Conversion failed when converting the nvarchar value 'US-19803' to data type int.
where("(zip_code NOT LIKE '%[^0-9]%' AND zip_code <> '') AND (TRY_PARSE(zip_code AS int) >= :range_start AND TRY_PARSE(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
end
Use a regular expression:
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'
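REGEXP matches anywhere in the string unless the pattern is anchored. SQLite ships no REGEXP implementation, so this sketch registers one from Python (SQLite calls regexp(pattern, value) for "value REGEXP pattern") as a stand-in for MySQL's operator; the sample zip values are illustrative.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, then value.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE person (zip TEXT)")
conn.executemany("INSERT INTO person (zip) VALUES (?)",
                 [("12345",), ("54321",), ("ABCDE",), ("A1BCD",)])

unanchored = [r[0] for r in conn.execute(
    "SELECT zip FROM person WHERE zip REGEXP '[0-9]' ORDER BY rowid")]
anchored = [r[0] for r in conn.execute(
    "SELECT zip FROM person WHERE zip REGEXP '^[0-9]+$' ORDER BY rowid")]

print(unanchored)  # ['12345', '54321', 'A1BCD']
print(anchored)    # ['12345', '54321']
```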

How to quickly compare many strings?

In SQL Server, I have a string column that contains numbers. Each entry I need is only one number, so no parsing is needed. I need some way to find all rows that contain numbers from 400 to 450. Instead of doing:
...where stringcolumn like '%400%' or stringcolumn like '%401%' or stringcolumn like '%402%' or ...
is there a better way that saves some typing?
There are also other values in these rows, such as '5335154', 'test4559#me.com' and '555-555-5555'. Filtering those out needs to be taken into account.
...where stringcolumn like '4[0-4][0-9]' OR stringcolumn = '450'
You don't need the wildcard if you want to restrict to 3 digits.
Use pattern matching to accomplish this:
...where stringcolumn like '4[0-4][0-9]' OR stringcolumn like '450'
One way:
WHERE Column LIKE '%4[0-4][0-9]%'
OR Column LIKE '%450%'
Keep in mind that this will pick up anything that contains the number, so 4500 will be returned as well.
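The difference between a whole-value pattern and the same pattern wrapped in wildcards can be checked with Python's sqlite3. Assumptions: SQLite's GLOB with [0-4] and [0-9] classes stands in for SQL Server's LIKE ranges, the table t is hypothetical, and the sample values mix numbers from the question with a few made-up ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringcolumn TEXT)")
samples = ["400", "425", "449", "450", "451", "4500",
           "5335154", "test4559#me.com", "555-555-5555", "x410y"]
conn.executemany("INSERT INTO t (stringcolumn) VALUES (?)", [(s,) for s in samples])

# Whole-value match: exactly 400-449, or exactly 450.
exact = [r[0] for r in conn.execute(
    "SELECT stringcolumn FROM t "
    "WHERE stringcolumn GLOB '4[0-4][0-9]' OR stringcolumn = '450' ORDER BY rowid")]

# Substring match: the same patterns wrapped in wildcards.
anywhere = [r[0] for r in conn.execute(
    "SELECT stringcolumn FROM t "
    "WHERE stringcolumn GLOB '*4[0-4][0-9]*' OR stringcolumn GLOB '*450*' "
    "ORDER BY rowid")]

print(exact)     # ['400', '425', '449', '450']
print(anywhere)  # ['400', '425', '449', '450', '4500', 'x410y']
```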
I would do the following:
select t.*
from (select t.*,
(case when charindex('4', col) > 0
then substring(col, charindex('4', col), 3)
end) as col4xx
from t
) t
where (case when isnumeric(col4xx) = 1
then (case when cast(col4xx as int) between 400 and 450 then 'true'
end)
end) = 'true'
I'm not a fan of CASE expressions in WHERE clauses. However, one is needed here to make sure the conversion only runs on numeric values (alternatively, the conversion could become a column in another subquery). Note that the following is not equivalent:
where col4xx between '400' and '450'
since the string '44A' would match.
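The col4xx idea (take three characters starting at the first '4', check they are digits, then compare numerically) ports to Python roughly as follows. Python's find() is 0-based where charindex is 1-based, and the final assertion-style check shows why comparing strings lets '44A' through.

```python
import re

def col4xx(col):
    # Three characters starting at the first '4', or None if there is no '4'.
    i = col.find("4")
    return col[i:i + 3] if i >= 0 else None

def matches_400_to_450(col):
    part = col4xx(col)
    return (part is not None
            and re.fullmatch(r"[0-9]+", part) is not None
            and 400 <= int(part) <= 450)

for value in ["425", "450", "451", "5335154", "test4559#me.com", "44A"]:
    print(value, matches_400_to_450(value))

# String comparison accepts '44A' because it sorts between '400' and '450'.
print("400" <= "44A" <= "450")  # True
```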

Specify order of (T)SQL execution

I have seen similar questions asked elsewhere on this site, but more in the context of optimization.
I am having an issue with the order of execution of the conditions in a WHERE clause. I have a field which stores codes, most of which are numeric but some of which contain non-numeric characters. I need to do some operations on the numeric codes which will cause errors if attempted on non-numeric strings. I am trying to do something like
WHERE isnumeric(code) = 1
AND CAST(code AS integer) % 2 = 1
Is there any way to make sure that the isnumeric() executes first? If it doesn't, I get an error...
Thanks in advance!
The only place where the order of evaluation is guaranteed is a CASE expression:
WHERE
CASE WHEN isnumeric(code) = 1
THEN CAST(code AS integer) % 2
END = 1
Also, just because a value passes the isnumeric test doesn't guarantee that it will successfully cast to an integer:
SELECT ISNUMERIC('$') /*Returns 1*/
SELECT CAST('$' AS INTEGER) /*Fails*/
Depending upon your needs you may find these alternatives preferable.
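The CASE guard can be demonstrated with Python's sqlite3. SQLite's own CAST never fails, so this sketch registers a hypothetical strict_int function that raises on bad input, the way a failed CAST does in SQL Server. The table and values are made up; the point is that the unguarded query fails while the CASE-guarded one only converts all-digit codes.

```python
import sqlite3

def strict_int(value):
    """Hypothetical stand-in for CAST(... AS integer): raises on bad input."""
    return int(value)  # ValueError for 'A12'

conn = sqlite3.connect(":memory:")
conn.create_function("strict_int", 1, strict_int)
conn.execute("CREATE TABLE codes (code TEXT)")
conn.executemany("INSERT INTO codes (code) VALUES (?)",
                 [("11",), ("A12",), ("13",), ("14",)])

# Unguarded: the conversion reaches 'A12' and the whole query fails.
try:
    conn.execute("SELECT code FROM codes WHERE strict_int(code) % 2 = 1").fetchall()
    unguarded_failed = False
except sqlite3.Error:
    unguarded_failed = True

# Guarded: the THEN branch is only evaluated when the digit test succeeds.
odd = [r[0] for r in conn.execute(
    "SELECT code FROM codes "
    "WHERE CASE WHEN code <> '' AND code NOT GLOB '*[^0-9]*' "
    "THEN strict_int(code) % 2 END = 1 ORDER BY rowid")]

print(unguarded_failed, odd)  # True ['11', '13']
```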
Why not simply use LIKE?
Where Code Not Like '%[^0-9]%'
By the way, with either my solution or IsNumeric there are edge cases that might lead you to use a UDF instead. For example, IsNumeric returns 1 for '1,234,567', but the CAST throws an exception.
Why not use a CASE expression, something like:
WHERE
CASE WHEN isnumeric(code) = 1
THEN CAST(code AS int) % 2
ELSE NULL /* whatever you want for non-numeric codes */
END = 1
You could do it with a CASE expression in the SELECT clause, then filter on the value in an outer select:
select * from (
select
case when isNum = 1 then CAST(code AS integer) % 2 else 0 end as castVal
from (
select
code,
Case when isnumeric(code) = 1 then 1 else 0 end as isNum
from table) t
) t2
where castval = 1