How do I filter out non-numeric values in a text field in Teradata?

I have a Teradata table with about 10 million records that stores a numeric ID field as a VARCHAR. I need to transfer the values in this field to a BIGINT column in another table, but I can't simply say cast(id_field as bigint) because I get an invalid character error. Looking through the values, I find that there could be a character at any position in the string, so, assuming the string is varchar(18), I could filter out invalid rows like so:
where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)
etc, etc...
Then the cast would work, but this is not feasible in the long run: it's slow, and with 18 possible character positions it makes the query unreadable. How can I filter out rows whose value in this field will not cast as a BIGINT, without checking each character individually against an array of non-numeric characters?
Example values:
123abc464
a2.3v65
a_356087
........
000000000
BOB KNIGHT
1235468099
The values follow no specific pattern; I simply need to filter out the ones that contain ANY non-numeric data.
123456789 is okay, but 123.abc_c3865 is not.

Starting with Teradata 14 (TD14), some new functions were added, so there are now multiple ways, e.g.:
WHERE RTRIM(col, '0123456789') = ''
But the easiest way is TO_NUMBER, which returns NULL for bad data:
TO_NUMBER(col)
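To see what the RTRIM test keeps, here is a minimal sketch that runs it against the sample values from the question. It uses SQLite (through Python's sqlite3 module) as a stand-in for Teradata, because SQLite's two-argument rtrim(X, Y) likewise strips any trailing characters that appear in Y; the table name ids is hypothetical. In this sketch an empty string would also pass the test, so it is excluded explicitly.

```python
import sqlite3

# Sample values from the question. SQLite stands in for Teradata here;
# the table name "ids" is hypothetical.
SAMPLES = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099"]

def numeric_only(values):
    """Return the values that consist solely of the digits 0-9."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ids (id_field TEXT)")
    conn.executemany("INSERT INTO ids VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT id_field
        FROM ids
        WHERE rtrim(id_field, '0123456789') = ''  -- nothing left once trailing digits are removed
          AND id_field <> ''                      -- an empty string would otherwise pass
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(numeric_only(SAMPLES))  # ['000000000', '1235468099']
```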

The best that I've ever managed is this:
where char2hexint(upper(id_field)) = char2hexint(lower(id_field))
Since upper case characters give a different hex value from lower case ones, this ensures that you have no alphabetical characters, but it will still leave you with underscores, colons and so forth. If this doesn't meet your requirements, you may need to write a UDF.
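The effect of the upper/lower comparison, including the gap it leaves, can be sketched with SQLite from Python. SQLite's = operator is case-sensitive by default, so a plain upper(x) = lower(x) comparison is enough there; the char2hexint wrapper in the Teradata version presumably serves to force the same byte-wise comparison, since Teradata comparisons can be case-insensitive depending on the session mode. The table name ids is hypothetical. Note that '........' passes, exactly as warned above.

```python
import sqlite3

# Sample values from the question; SQLite stands in for Teradata,
# and the table name "ids" is hypothetical.
SAMPLES = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099"]

def no_letters(values):
    """Return the values whose upper- and lower-case forms are identical (no ASCII letters)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ids (id_field TEXT)")
    conn.executemany("INSERT INTO ids VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        "SELECT id_field FROM ids "
        "WHERE upper(id_field) = lower(id_field) "  # case-sensitive (BINARY) comparison in SQLite
        "ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(no_letters(SAMPLES))  # ['........', '000000000', '1235468099']
```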

Could we also try dividing the values in the field by some integer? If the division succeeds, the value must be a number; if it throws an error, it must contain some character. I'd guess this would be a lot faster, since only arithmetic is involved.

I've faced the same issue trying to exclude alpha characters from street address house numbers. The following works if you don't mind all the remaining digits being concatenated together.
For each character position, it checks whether the upper case of the character equals its lower case; if so, the character is kept (it is not a letter), otherwise it is replaced with an empty string. (Using NULL instead of '' would make the whole concatenation NULL.)
select cast(case when upper(substring('12E' from 1 for 1)) = lower(substring('12E' from 1 for 1)) then substring('12E' from 1 for 1) else '' end ||
case when upper(substring('12E' from 2 for 1)) = lower(substring('12E' from 2 for 1)) then substring('12E' from 2 for 1) else '' end ||
case when upper(substring('12E' from 3 for 1)) = lower(substring('12E' from 3 for 1)) then substring('12E' from 3 for 1) else '' end ||
case when upper(substring('12E' from 4 for 1)) = lower(substring('12E' from 4 for 1)) then substring('12E' from 4 for 1) else '' end ||
case when upper(substring('12E' from 5 for 1)) = lower(substring('12E' from 5 for 1)) then substring('12E' from 5 for 1) else '' end ||
case when upper(substring('12E' from 6 for 1)) = lower(substring('12E' from 6 for 1)) then substring('12E' from 6 for 1) else '' end
as integer)
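The intent of the per-character CASE chain above is to keep every character whose upper- and lower-case forms are equal and drop the rest. A minimal Python sketch of the same rule, not limited to six positions, follows; the function names are hypothetical. The final conversion only accepts results made up of the digits 0-9, since the upper/lower rule also lets punctuation such as '.' through.

```python
DIGITS = set("0123456789")

def keep_caseless_chars(value: str) -> str:
    """Keep each character whose upper and lower forms match (digits, punctuation, spaces)."""
    return "".join(ch for ch in value if ch.upper() == ch.lower())

def to_int_or_none(value: str):
    """Concatenate the surviving characters and convert them, or return None if that fails."""
    kept = keep_caseless_chars(value)
    if kept and set(kept) <= DIGITS:
        return int(kept)
    return None  # nothing left, or punctuation survived the upper/lower test

print(to_int_or_none("12E"))    # 12
print(to_int_or_none("1A2b3"))  # 123
print(to_int_or_none("1.2"))    # None
```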

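Try using this code segment (note that the [^0-9] character class is SQL Server LIKE syntax; Teradata's LIKE only supports the % and _ wildcards, so it will not work as intended there):
WHERE id_Field NOT LIKE '%[^0-9]%'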
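The bracket pattern can be tried out in SQLite, whose GLOB operator supports a similar negated character class ([^0-9]) with * as the multi-character wildcard in place of %. The sketch below runs it from Python against a hypothetical table ids. An empty string contains no non-digit and would pass, so it is excluded explicitly.

```python
import sqlite3

# Sample values from the question; the table name "ids" is hypothetical.
SAMPLES = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099"]

def digits_only_glob(values):
    """Return values containing no character outside 0-9, using SQLite GLOB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ids (id_field TEXT)")
    conn.executemany("INSERT INTO ids VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT id_field
        FROM ids
        WHERE id_field NOT GLOB '*[^0-9]*'  -- no character outside 0-9 anywhere
          AND id_field <> ''                -- exclude the empty string
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(digits_only_glob(SAMPLES))  # ['000000000', '1235468099']
```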

I found lins314159's answer very helpful with a similar issue. This may be an old thread, but for what it's worth, I used:
char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9')
to successfully cast the remaining VARCHAR results to INT.

SELECT customer_id
FROM t
WHERE UPPER(customer_id)(CASESPECIFIC) <>
LOWER(customer_id)(CASESPECIFIC);
This returns the values in the field that contain letters, i.e. that are non-numeric (it will not catch punctuation such as '.' or '_', though).

SELECT id_field
FROM t
WHERE OTRANSLATE(id_field, '0123456789', '') <> '';
This works well for me: it returns every id_field that contains a non-numeric character.
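The OTRANSLATE idea (delete every digit and see whether anything is left) maps directly onto Python's str.translate. The sketch below is only an illustration of that logic, not Teradata code; the function name is hypothetical. As with the SQL version, an empty string is not flagged.

```python
DIGITS = "0123456789"
DELETE_DIGITS = str.maketrans("", "", DIGITS)  # translation table that deletes 0-9

def has_non_digit(value: str) -> bool:
    """True when something remains after deleting all digits, like OTRANSLATE(...) <> ''."""
    return value.translate(DELETE_DIGITS) != ""

SAMPLES = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099"]
print([v for v in SAMPLES if has_non_digit(v)])
# ['123abc464', 'a2.3v65', 'a_356087', '........', 'BOB KNIGHT']
```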


Check whether a specific digit is even

I want to know whether the 4th digit in the ID is even or odd.
If the 4th digit is even (i.e. 0, 2, 4, 6 or 8), I want to put the ID into a new column named 'Even'.
If the 4th digit is odd, the column should be named 'Odd'.
select ID as 'Female'
from Users2
where ID LIKE '%[02468]'
This shows if any of the numbers are even. I want to specify the 4th number
Try this:
select *, OddOrEven = iif(substring(ID,4,1) in ('0','2','4','6','8') , 'Even', 'Odd') from Users2
This will tell you whether the 4th character is Odd or Even.
This of course assumes that the 4th character of the ID column is numeric.
To make it a permanent part of the table, you can add a computed column as shown below.
alter table Users2
add OddOrEven as iif(substring(ID,4,1) in ('0','2','4','6','8'), 'Even', 'Odd')
1. Take the substring for the character you are interested in.
2. Convert it to an int.
3. Check whether modulus 2 returns 0 (i.e. even).
select id
, case when convert(int,substring(id, 4, 1)) % 2 = 0 then 'Even' else 'Odd' end
from Users;
Example:
select id
, case when convert(int,substring(id, 4, 1)) % 2 = 0 then 'Even' else 'Odd' end
from (values ('4545-4400'), ('4546-4400')) X (id);
Returns:
id
4545-4400   Odd
4546-4400   Even
That's assuming there is always a 4th character. If there isn't, you would need to check for it.
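One way to add that check, sketched in SQLite from Python with a hypothetical table ids: instead of converting the 4th character to an int, match it against the even and odd digit sets, so a missing or non-digit 4th character yields NULL (None in Python) rather than a conversion error. SQLite's GLOB '[02468]' plays the role of SQL Server's LIKE '[02468]' here.

```python
import sqlite3

def classify_fourth_digit(ids):
    """Label each ID by the parity of its 4th character; NULL if it is missing or not a digit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ids (id TEXT)")  # hypothetical table name
    conn.executemany("INSERT INTO ids VALUES (?)", [(v,) for v in ids])
    rows = conn.execute(
        """
        SELECT id,
               CASE
                 WHEN substr(id, 4, 1) GLOB '[02468]' THEN 'Even'
                 WHEN substr(id, 4, 1) GLOB '[13579]' THEN 'Odd'
               END AS OddOrEven  -- no ELSE, so anything else is NULL
        FROM ids
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows

print(classify_fourth_digit(["4545-4400", "4546-4400", "454", "454x-1"]))
# [('4545-4400', 'Odd'), ('4546-4400', 'Even'), ('454', None), ('454x-1', None)]
```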
You were close, but only need to check a single character against a set of characters:
where Substring( Id, 4, 1 ) like '[02468]'
Note that there is no wildcard (%) in the pattern.
It can be used in an expression like:
case when Substring( Id, 4, 1 ) like '[02468]' then 'Even' else 'Odd' end as Oddity

Miscellaneous star-looking character in SQL

I have a query in which display values are returned. Some of these values have an undesired apostrophe before the number value, and some have a weird-looking character before or after the number value. The display value is a varchar of varying length. Here is an example of what I am talking about for the special character:
What is this character and how can I strip it out using the replace() function?
I tried the following and got the following result:
substring(b.dsply_val, 5, 1) gives the character in question
UNICODE(SUBSTRING(b.dsply_val, 5, 1)) gives a result of 164
UPDATE
I am thinking of doing something like this, since results are inserted into a variable temp table:
, CASE WHEN unicode(substring(b.dsply_val, 1, 1)) = 164 THEN 1 ELSE 0 END AS POS_1
, CASE WHEN unicode(substring(b.dsply_val, 2, 1)) = 164 THEN 1 ELSE 0 END AS POS_2
, CASE WHEN unicode(substring(b.dsply_val, 3, 1)) = 164 THEN 1 ELSE 0 END AS POS_3
, CASE WHEN unicode(substring(b.dsply_val, 4, 1)) = 164 THEN 1 ELSE 0 END AS POS_4
, CASE WHEN unicode(substring(b.dsply_val, 5, 1)) = 164 THEN 1 ELSE 0 END AS POS_5
So if POS_1 = 1, I would filter that row out with the WHERE clause of the final result set.
Use ASCII() on each and every character in the string, then update the question with the results so we can see the pattern.
Use the value returned by ASCII in the CHAR() function for replacement, i.e.
ASCII(bad character) returns x
REPLACE(string,CHAR(x),' ') - replaces that character with a space
or
REPLACE(string,CHAR(x),'*') - replaces that character with an asterisk
Also, contact whoever fills that field; see what they think they're putting in and how it should be displayed!
I'm not sure this will help, but you might be able to get the Unicode code for that character and then replace it.
-- Test variable
DECLARE @test VARCHAR(10) = '12345¶'
-- Get the code
SELECT UNICODE(SUBSTRING(@test, 6, 1))
-- Replace
SELECT REPLACE(@test, CHAR(182), 'Z')
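The code point 164 that UNICODE() reported in the question is U+00A4, the currency sign ¤. The sketch below uses SQLite from Python as a stand-in: SQLite's unicode(), char(), instr() and replace() behave much like T-SQL's UNICODE(), NCHAR(), CHARINDEX() and REPLACE(), which would presumably be the functions to use on SQL Server (note that CHARINDEX takes its two arguments in the opposite order to instr). It flags rows that contain the character at any position, instead of one POS_n column per position, and strips both it and the stray apostrophe. The table name vals is hypothetical.

```python
import sqlite3

def clean_display_values(values):
    """Flag values containing U+00A4 and return them with that sign and apostrophes removed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vals (dsply_val TEXT)")  # hypothetical table name
    conn.executemany("INSERT INTO vals VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT dsply_val,
               instr(dsply_val, char(164)) > 0 AS has_sign,  -- 1 if U+00A4 occurs anywhere
               replace(replace(dsply_val, char(164), ''), char(39), '') AS cleaned  -- drop the sign and apostrophes
        FROM vals
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows

print(clean_display_values(["1234" + chr(164), "'5678", "90"]))
# [('1234¤', 1, '1234'), ("'5678", 0, '5678'), ('90', 0, '90')]
```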

SQL - Changing data type of an alphanumeric column

I'm on Teradata. I have an ID column that looks like this:
23
34
W7
007
021
90
GS8
I want to convert the numbers to numeric, so 007 should become 7 and 021 should become 21. When a number is stored as a string, I usually do column * 1 to convert it to numeric, but in this case it gives me a bad character error since there are letters in there.
How would I do this in a select statement within a query?
Assuming that numeric values always start with a digit, something like this should work:
update t
set col = (case when substr(col, 1, 1) between '0' and '9'
then cast(cast(col as int) as varchar(255))
else col
end);
Or, you can forget the conversion and do:
update t
set col = trim(leading '0' from col);
Note: both of these assume that if the first character is a digit, then the whole string consists of digits. The second also assumes that no value consists entirely of zeros (trimming such a value returns the empty string).
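The all-zeros caveat is easy to see with a quick sketch. It uses SQLite from Python, where ltrim(X, '0') strips leading zeros much like TRIM(LEADING '0' FROM col); the rows are the question's values plus a hypothetical all-zero value '000' added to show the caveat, and the sketch only selects rather than updating.

```python
import sqlite3

def strip_leading_zeros(values):
    """Return (original, trimmed) pairs using SQLite's ltrim(col, '0')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    rows = conn.execute("SELECT col, ltrim(col, '0') FROM t ORDER BY rowid").fetchall()
    conn.close()
    return rows

print(strip_leading_zeros(["007", "021", "W7", "000"]))
# [('007', '7'), ('021', '21'), ('W7', 'W7'), ('000', '')]
```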
Simply use TO_NUMBER(col) which returns NULL when the cast fails.
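SQLite has no TO_NUMBER, but its NULL-on-bad-data behaviour can be approximated for plain digit strings by checking the pattern before casting. This is a sketch under that assumption, run from Python against a hypothetical table t holding the question's values; unlike TO_NUMBER, it does not accept signs, decimal points or other numeric formats.

```python
import sqlite3

QUESTION_IDS = ["23", "34", "W7", "007", "021", "90", "GS8"]

def digits_to_int_or_null(values):
    """Cast digit-only strings to INTEGER and return NULL (None) for everything else."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col TEXT)")  # hypothetical table name
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT col,
               CASE WHEN col <> '' AND col NOT GLOB '*[^0-9]*'
                    THEN CAST(col AS INTEGER)
               END AS num  -- NULL when the value is not all digits
        FROM t
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows

print(digits_to_int_or_null(QUESTION_IDS))
# [('23', 23), ('34', 34), ('W7', None), ('007', 7), ('021', 21), ('90', 90), ('GS8', None)]
```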

How do I identify the first character of a string as numeric or character in SQL?

I need to identify the first character in my data as numeric or character in SQL Server. I am relatively new to this and I don't know where to begin on this one. But here is what I have done to this point. I had data that looked like this:
TypeDep
Transfer From 4Z2
Transfer From BZZ
Transfer From 123
Transfer From abc
I used the RIGHT function to remove the 'Transfer From' prefix and isolate the data I need to check.
UPDATE #decode
SET firstPartType = Right(z.TypeDep,17)
FROM #decode z
where z.TypeDep like 'TRANSFER FROM%'
firstPartType
4Z2
BZZ
123
abc
Now I need to add a column identifying the type of the first character in the string, producing the results below.
firstPartType SecondPartType
4Z2 Numeric
BZZ Alpha
123 Numeric
abc Alpha
Use LEFT and ISNUMERIC(); however, be aware that ISNUMERIC considers some additional characters, such as '.', to be numeric.
UPDATE #decode
SET SecondPartType =
CASE WHEN ISNUMERIC(LEFT(firstPartType, 1)) = 1 THEN 'Numeric'
ELSE 'Alpha'
END
FROM #decode;
A more robust approach is to use the limited pattern matching of SQL Server's LIKE (character ranges such as [0-9]). ISNUMERIC returns false positives for single characters such as '.' and '$', to name a few.
SELECT
CASE WHEN LEFT(firstPartType, 1) LIKE '[0-9]' THEN 'Numeric'
ELSE 'Alpha'
END AS SecondPartType
FROM #decode;
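The range check can be sketched in SQLite from Python, where GLOB '[0-9]' stands in for SQL Server's LIKE '[0-9]'; the table name decode is hypothetical and '.5' is an extra value added for illustration. Note that the ELSE branch labels any non-digit first character, including punctuation such as '.', as 'Alpha', so a third branch may be needed if that distinction matters.

```python
import sqlite3

def first_char_type(values):
    """Label each value 'Numeric' if its first character is 0-9, otherwise 'Alpha'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE decode (firstPartType TEXT)")  # hypothetical table name
    conn.executemany("INSERT INTO decode VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT firstPartType,
               CASE WHEN substr(firstPartType, 1, 1) GLOB '[0-9]' THEN 'Numeric'
                    ELSE 'Alpha'
               END AS SecondPartType
        FROM decode
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows

print(first_char_type(["4Z2", "BZZ", "123", "abc", ".5"]))
# [('4Z2', 'Numeric'), ('BZZ', 'Alpha'), ('123', 'Numeric'), ('abc', 'Alpha'), ('.5', 'Alpha')]
```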
I think this should work:
SELECT
CASE WHEN ISNUMERIC(SUBSTRING(firstPartType, 1, 1)) = 1
THEN 'Numeric'
ELSE 'Alpha'
END AS 'SecondPartType'
FROM #decode;
You can use this expression:
ISNUMERIC(LEFT(firstPartType, 1))
It returns 1 if the first character is a number and 0 if it isn't.
I think that's all you need.
You could try:
UPDATE #decode
SET SecondPartType =
CASE
WHEN LEFT(firstPartType, 1) IN ('0','1','2','3','4','5','6','7','8','9')
THEN 'Numeric'
ELSE 'Alpha'
END
FROM #decode;
SELECT ISNUMERIC(LEFT('4ello world', 1)) returns 1 if the first character is a number.

Checking for invalid values in SQL

I have a stored procedure with a condition in the WHERE clause that checks whether a rating code is 1, 2 or 3. Something like this:
WHERE
CONVERT(INT, LEFT(RatingCode, 1)) IN (1,2,3) AND
At times, when there are bad values in the RatingCode column, the line above throws an error. Hence I came up with the solution below:
WHERE
CASE WHEN ISNUMERIC(LEFT(RatingCode, 1)) = 1
THEN CASE WHEN CONVERT(INT, LEFT(RatingCode, 1)) IN (1,2,3)
THEN 1
ELSE 0
END
ELSE 0
END = 1 AND
Here, if there is an invalid (non-numeric) value in the RatingCode column, I want to ignore that record. Is my solution above a good one, or is there a better solution?
In that specific case, you could also just use
WHERE
LEFT(RatingCode, 1) IN ('1','2','3') AND
Besides that, T-SQL also allows range comparisons on strings:
WHERE
LEFT(RatingCode, 1) BETWEEN '1' AND '3' AND
Neither version throws an error for non-numeric characters, since no conversion to INT takes place.
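A minimal sketch of the string-range filter, run in SQLite from Python against a hypothetical table ratings with made-up bad values, shows that rows with non-numeric, empty or NULL codes are simply skipped rather than raising a conversion error. SQLite compares these strings with its default binary collation; on SQL Server the BETWEEN range is evaluated under the column's collation, so the IN ('1','2','3') form may be the stricter of the two.

```python
import sqlite3

def valid_ratings(codes):
    """Return the codes whose first character is '1', '2' or '3', without any INT conversion."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (RatingCode TEXT)")  # hypothetical table name
    conn.executemany("INSERT INTO ratings VALUES (?)", [(c,) for c in codes])
    rows = conn.execute(
        """
        SELECT RatingCode
        FROM ratings
        WHERE substr(RatingCode, 1, 1) BETWEEN '1' AND '3'  -- NULL and '' simply fail the test
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(valid_ratings(["1A", "2", "3-B", "4", "X1", "", None, "."]))
# ['1A', '2', '3-B']
```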