Finding rows that don't contain numeric data in Oracle - sql

I am trying to locate some problematic records in a very large Oracle table. The column should contain all numeric data even though it is a varchar2 column. I need to find the records which don't contain numeric data (The to_number(col_name) function throws an error when I try to call it on this column).

I was thinking you could use a regexp_like condition and use the regular expression to find any non-numerics. I hope this might help?!
SELECT * FROM table_with_column_to_search WHERE REGEXP_LIKE(varchar_col_with_non_numerics, '[^0-9]+');

To get an indicator:
DECODE( TRANSLATE(your_number,' 0123456789',' ')
e.g.
SQL> select DECODE( TRANSLATE('12345zzz_not_numberee',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"contains char"
and
SQL> select DECODE( TRANSLATE('12345',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
and
SQL> select DECODE( TRANSLATE('123405',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
Oracle 11g has regular expressions so you could use this to get the actual number:
SQL> SELECT colA
2 FROM t1
3 WHERE REGEXP_LIKE(colA, '[[:digit:]]');
COL1
----------
47845
48543
12
...
If there is a non-numeric value like '23g' it will just be ignored.

In contrast to SGB's answer, I prefer doing the regexp defining the actual format of my data and negating that. This allows me to define values like $DDD,DDD,DDD.DD
In the OPs simple scenario, it would look like
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^[0-9]+$');
which finds all non-positive integers. If you wau accept negatiuve integers also, it's an easy change, just add an optional leading minus.
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+$');
accepting floating points...
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+(\.[0-9]+)?$');
Same goes further with any format. Basically, you will generally already have the formats to validate input data, so when you will desire to find data that does not match that format ... it's simpler to negate that format than come up with another one; which in case of SGB's approach would be a bit tricky to do if you want more than just positive integers.

Use this
SELECT *
FROM TableToSearch
WHERE NOT REGEXP_LIKE(ColumnToSearch, '^-?[0-9]+(\.[0-9]+)?$');

After doing some testing, i came up with this solution, let me know in case it helps.
Add this below 2 conditions in your query and it will find the records which don't contain numeric data
and REGEXP_LIKE(<column_name>, '\D') -- this selects non numeric data
and not REGEXP_LIKE(column_name,'^[-]{1}\d{1}') -- this filters out negative(-) values

Starting with Oracle 12.2 the function to_number has an option ON CONVERSION ERROR clause, that can catch the exception and provide default value.
This can be used for the test of number values. Simple set NULL when the conversion fails and filer all not NULL values.
Example
with num as (
select '123' vc_col from dual union all
select '1,23' from dual union all
select 'RV12P2000' from dual union all
select null from dual)
select
vc_col
from num
where /* filter numbers */
vc_col is not null and
to_number(vc_col DEFAULT NULL ON CONVERSION ERROR) is not null
;
VC_COL
---------
123
1,23

From http://www.dba-oracle.com/t_isnumeric.htm
LENGTH(TRIM(TRANSLATE(, ' +-.0123456789', ' '))) is null
If there is anything left in the string after the TRIM it must be non-numeric characters.

I've found this useful:
select translate('your string','_0123456789','_') from dual
If the result is NULL, it's numeric (ignoring floating point numbers.)
However, I'm a bit baffled why the underscore is needed. Without it the following also returns null:
select translate('s123','0123456789', '') from dual
There is also one of my favorite tricks - not perfect if the string contains stuff like "*" or "#":
SELECT 'is a number' FROM dual WHERE UPPER('123') = LOWER('123')

After doing some testing, building upon the suggestions in the previous answers, there seem to be two usable solutions.
Method 1 is fastest, but less powerful in terms of matching more complex patterns.
Method 2 is more flexible, but slower.
Method 1 - fastest
I've tested this method on a table with 1 million rows.
It seems to be 3.8 times faster than the regex solutions.
The 0-replacement solves the issue that 0 is mapped to a space, and does not seem to slow down the query.
SELECT *
FROM <table>
WHERE TRANSLATE(replace(<char_column>,'0',''),'0123456789',' ') IS NOT NULL;
Method 2 - slower, but more flexible
I've compared the speed of putting the negation inside or outside the regex statement. Both are equally slower than the translate-solution. As a result, #ciuly's approach seems most sensible when using regex.
SELECT *
FROM <table>
WHERE NOT REGEXP_LIKE(<char_column>, '^[0-9]+$');

You can use this one check:
create or replace function to_n(c varchar2) return number is
begin return to_number(c);
exception when others then return -123456;
end;
select id, n from t where to_n(n) = -123456;

I tray order by with problematic column and i find rows with column.
SELECT
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND,
D.COL1 AS COL1
FROM
VW_DATA_ALL_GC D
WHERE
(D.PERIOADA IN (:pPERIOADA)) AND
(D.FORM = 62)
AND D.COL1 IS NOT NULL
-- AND REGEXP_LIKE (D.COL1, '\[\[:alpha:\]\]')
-- AND REGEXP_LIKE(D.COL1, '\[\[:digit:\]\]')
--AND REGEXP_LIKE(TO_CHAR(D.COL1), '\[^0-9\]+')
GROUP BY
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND ,
D.COL1
ORDER BY
D.COL1

Related

Equivalent of SELECT 0 as something but for strings. Is it SELECT '' as something?

What is the equivalent of:
SELECT 0 as foo;
but for strings, is it:
SELECT '' as bar;
?
For more context, this is for a UNION ALL query
[edit]
SELECT NULL is what I was looking for.
In the top query I was doing: SELECT COUNT(DISTINCT(bar)),
But empty strings counted as a distinct char, NULL doesn't.
If you want an "equivalent" of SELECT 0 in string form, then it would be SELECT '0'::int.
If you are looking for a non-null string to represent 'no data,' SELECT '' AS something would suffice
When you call UNION ALL, the expectation is that the columns would align and the data sets would be concatenated along their column orders. Since you want to UNION ALL an integer and char, you will get an error:
edb=# select 0 as something union all select '' as something;
ERROR: invalid input syntax for type integer: ""
LINE 1: select 0 as something union all select '' as something;
^
Therefore, in order to UNION ALL, you'll want to cast your 0 as a char:
edb=# select 0::char as something union all select '' as something;
something
-----------
0
(2 rows)
Caveat
I'm not sure if this is really what you want to do, but if you're looking to UNION ALL the two sets, that's how you'd do it. However, there's a chance this would open up a can of worms -- is '' going to be considered equivalent to 0? And what will you do with non-zero values? Will you still cast those integers into strings? I think you'll need to think through those implications and try to find a way to sanitize your data.
In general, it might be better to work with NULL values if your design allows for it

How to replace characters at specific position in several words using REGEX_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)
You don't need regexp_replace(), using substr() string operator with concat() function (or concatenation operators ||) is enough :
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
to_number() conversion is redundant, since Oracle implicitly considers a string as a number which is completely composed of digits for an arithmetic operation.
You can do use string operations:
with c as (
<your query here>
)
select
from year_code yc
where to_number(substr(yc.code, 1, 4)) = to_number(substr(c.code)) - 1 and
substr(yc.code, 5) = substr(c.code, 5)

PL SQL replace conditionally suggestion

I need to replace the entire word with 0 if the word has any non-digit character. For example, if digital_word='22B4' then replace with 0, else if digital_word='224' then do not replace.
SELECT replace_funtion(digital_word,'has non numeric character pattern',0,digital_word)
FROM dual;
I tried decode, regexp_instr, regexp_replace but could not come up with the right solution.
Please advise.
Thank you.
the idea is simple - you need check if the value is numeric or not
script:
with nums as
(
select '123' as num from dual union all
select '456' as num from dual union all
select '7A9' as num from dual union all
select '098' as num from dual
)
select n.*
,nvl2(LENGTH(TRIM(TRANSLATE(num, ' +-.0123456789', ' '))),'0',num)
from nums n
result
1 123 123
2 456 456
3 7A9 0
4 098 098
see more articles below to see which way is better to you
How can I determine if a string is numeric in SQL?
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:15321803936685
How to tell if a value is not numeric in Oracle?
You might try the following:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '\D') THEN '0' ELSE digital_word END
FROM dual;
The regular expression class \D matches any non-digit character. You could also use [^0-9] to the same effect:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '\D') THEN '0' ELSE digital_word END
FROM dual;
Alternately you could see if the value of digital_word is made up of nothing but digits:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '^\d+$') THEN digital_word ELSE '0' END
FROM dual;
Hope this helps.
The fastest way is to replace all digits with null (to simply delete them) and see if anything is left. You don't need regular expressions (slow!) for this, you just need the standard string function TRANSLATE().
Unfortunately, Oracle has to work around their own inconsistent treatment of NULL - sometimes as empty string, sometimes not. In the case of the TRANSLATE() function, you can't simply translate every digit to nothing; you must also translate a non-digit character to itself, so that the third argument is not an empty string (which is treated as a real NULL, as in relational theory). See the Oracle documentation for the TRANSLATE() function. https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions216.htm#SQLRF06145
Then, the result can be obtained with a CASE expression (or various forms of NULL handling functions; I prefer CASE, which is SQL Standard):
with
nums ( num ) as (
select '123' from dual union all
select '-56' from dual union all
select '7A9' from dual union all
select '0.9' from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your own table and column names.
select num,
case when translate(num, 'z0123456789', 'z') is null
then num
else '0'
end as result
from nums
;
NUM RESULT
--- ------
123 123
-56 0
7A9 0
0.9 0
Note: everything here is in varchar2 data type (or some other kind of string data type). If the results should be converted to number, wrap the entire case expression within TO_NUMBER(). Note also that the strings '-56' and '0.9' are not all-digits (they contain non-digits), so the result is '0' for both. If this is not what you needed, you must correct the problem statement in the original post.
Something like the following update query will help you:
update [table] set [col] = '0'
where REGEXP_LIKE([col], '.*\D.*', 'i')

to_number from char sql

I have to select only the IDs which have only even digits (an ID looks like: p19 ,p20 etc). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought to use substr to get each number from the IDs and then see if it's even .
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;
IF you need all rows consist of 'p' in beginning and even digits on tail It should look like:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');
with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is to replace 0 with itself and DELETE all odd digits from the input string. (The '0' is included due to a strange but documented behavior of translate() - the third argument can't be empty). If the length of the input string doesn't change after the translation, that means the input string didn't have any odd digits.
where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0

Oracle "Select Level from dual" does not work as expected with to_number result

Why does
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select to_number(trim(alphanumeric_column)) as nr from my_table
where NOT regexp_like (trim(alphanumeric_column),'[^[:digit:]]')) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
throw a ORA-01722 invalid number (reason is the last row, n.VAL), whereas the similar version with numeric columns im my_table works fine:
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select numeric_column as nr from my_table) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
given that numeric_column is of type number and alphanumeric_column of type nvarchar_2. Note that the upper example works fine without the numerical comparison (n.VAL >= 100).
Does anybody know?
This problem was driving me crazy. I narrowed the problem to a simpler query
SELECT *
FROM (SELECT TO_NUMBER(TRIM (alphanumeric_column)) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
With alphanumeric_colum values of ('100','200','XXXX'); Running the above statement gave the "invalid number" error. I then made a slight change to the query to use the CAST function instead of TO_NUMBER:
SELECT *
FROM (SELECT CAST (TRIM (alphanumeric_column) AS NUMBER) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
And this correctly returned - 100, 200. I would think that those functions would be similar in behavior. It almost appears as though oracle is trying to evaluate the d.nr > 1 constraint before the view is constructed, which makes no sense. If anyone can shed light on why this is happening, I would be grateful. See SQLFiddle example
UPDATE: I did some more digging, because I don't like not knowing why something just works. I ran EXPLAIN PLAN on both queries and got some interesting results.
For the query that failed, the predicate information looks like this:
1 - filter(TO_NUMBER(TRIM("ALPHANUMERIC_COLUMN"))>1 AND NOT
REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]]'))
You will notice that the TO_NUMBER function is called first in the AND condition, then
the regexp to exclude alpha values. I am thinking oracle maybe does a short-circuit evaluation with the AND condition, and since it is executing TO_NUMBER first, it fails.
However, when we use the CAST function, the evaluation order is swapped, and the
regexp exclusion is evaluated first. Since for the alpha values, it is false, then the
second part of the AND clause is not evaluated, and the query works.
1 - filter( NOT REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]
]') AND CAST(TRIM("ALPHANUMERIC_COLUMN") AS NUMBER)>1)
Oracle can be strange sometimes.
I believe when it comes to the Predicate (where) clause, Oracle can/will reorder the entire plan as it sees fit. So with regard to the predicate, it will short-circuit (as OldProgrammer noted) the evaluation however it wants, and you wont be able to guarantee the exact order it occurs.
In your current SQL, you are depending on the predicate to remove non numbers. One option would be to not use "WHERE NOT regexp_like ..." and instead use regexp_substr with coalesce. For example:
create table t_tab2
(
col varchar2(10)
);
create index t_tab2_idx on t_tab2(col);
insert into t_tab2
select level from dual
connect by level <= 100;
insert into t_tab2 values ('123ABC456');
commit;
-- select values > 95 (96->100 exclude non numbers)
select d.* from
(
select COALESCE(TO_NUMBER(REGEXP_SUBSTR(trim(col), '^\d+$')), 0) as nr
from t_tab2
) d
where d.nr > 95;
This should run without throwing invalid number error. Note that the coalesce will return the number 0 for any non numbers coming from the data, you may want to change that based on your needs and data.