Oracle "Select Level from dual" does not work as expected with to_number result - sql

Why does
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select to_number(trim(alphanumeric_column)) as nr from my_table
where NOT regexp_like (trim(alphanumeric_column),'[^[:digit:]]')) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
throw a ORA-01722 invalid number (reason is the last row, n.VAL), whereas the similar version with numeric columns im my_table works fine:
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select numeric_column as nr from my_table) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
given that numeric_column is of type number and alphanumeric_column of type nvarchar_2. Note that the upper example works fine without the numerical comparison (n.VAL >= 100).
Does anybody know?

This problem was driving me crazy. I narrowed the problem to a simpler query
SELECT *
FROM (SELECT TO_NUMBER(TRIM (alphanumeric_column)) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
With alphanumeric_colum values of ('100','200','XXXX'); Running the above statement gave the "invalid number" error. I then made a slight change to the query to use the CAST function instead of TO_NUMBER:
SELECT *
FROM (SELECT CAST (TRIM (alphanumeric_column) AS NUMBER) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
And this correctly returned - 100, 200. I would think that those functions would be similar in behavior. It almost appears as though oracle is trying to evaluate the d.nr > 1 constraint before the view is constructed, which makes no sense. If anyone can shed light on why this is happening, I would be grateful. See SQLFiddle example
UPDATE: I did some more digging, because I don't like not knowing why something just works. I ran EXPLAIN PLAN on both queries and got some interesting results.
For the query that failed, the predicate information looks like this:
1 - filter(TO_NUMBER(TRIM("ALPHANUMERIC_COLUMN"))>1 AND NOT
REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]]'))
You will notice that the TO_NUMBER function is called first in the AND condition, then
the regexp to exclude alpha values. I am thinking oracle maybe does a short-circuit evaluation with the AND condition, and since it is executing TO_NUMBER first, it fails.
However, when we use the CAST function, the evaluation order is swapped, and the
regexp exclusion is evaluated first. Since for the alpha values, it is false, then the
second part of the AND clause is not evaluated, and the query works.
1 - filter( NOT REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]
]') AND CAST(TRIM("ALPHANUMERIC_COLUMN") AS NUMBER)>1)
Oracle can be strange sometimes.

I believe when it comes to the Predicate (where) clause, Oracle can/will reorder the entire plan as it sees fit. So with regard to the predicate, it will short-circuit (as OldProgrammer noted) the evaluation however it wants, and you wont be able to guarantee the exact order it occurs.
In your current SQL, you are depending on the predicate to remove non numbers. One option would be to not use "WHERE NOT regexp_like ..." and instead use regexp_substr with coalesce. For example:
create table t_tab2
(
col varchar2(10)
);
create index t_tab2_idx on t_tab2(col);
insert into t_tab2
select level from dual
connect by level <= 100;
insert into t_tab2 values ('123ABC456');
commit;
-- select values > 95 (96->100 exclude non numbers)
select d.* from
(
select COALESCE(TO_NUMBER(REGEXP_SUBSTR(trim(col), '^\d+$')), 0) as nr
from t_tab2
) d
where d.nr > 95;
This should run without throwing invalid number error. Note that the coalesce will return the number 0 for any non numbers coming from the data, you may want to change that based on your needs and data.

Related

Can we sort the characters of a string in SQL oracle? [duplicate]

I'm looking for a function that would sort chars in varchar2 alphabetically.
Is there something built-in into oracle that I can use or I need to create custom in PL/SQL ?
From an answer at http://forums.oracle.com/forums/thread.jspa?messageID=1791550 this might work, but don't have 10g to test on...
SELECT MIN(permutations)
FROM (SELECT REPLACE (SYS_CONNECT_BY_PATH (n, ','), ',') permutations
FROM (SELECT LEVEL l, SUBSTR ('&col', LEVEL, 1) n
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('&col')) yourtable
CONNECT BY NOCYCLE l != PRIOR l)
WHERE LENGTH (permutations) = LENGTH ('&col')
In the example col is defined in SQL*Plus, but if you make this a function you can pass it in, or could rework it to take a table column directly I suppose.
I'd take that as a start point rather than a solution; the original question was about anagrams so it's designed to find all permutations, so something similar but simplified might be possible. I suspect this doesn't scale very well for large values.
So eventually I went PL/SQL route, because after searching for some time I realized that there is no build-in function that I can use.
Here is what I came up with. Its based on the future of associative array which is that Oracle keeps the keys in sorted order.
create or replace function sort_chars(p_string in varchar2) return varchar deterministic
as
rv varchar2(4000);
ch varchar2(1);
type vcArray is table of varchar(4000) index by varchar2(1);
sorted vcArray;
key varchar2(1);
begin
for i in 1 .. length(p_string)
loop
ch := substr(p_string, i, 1);
if (sorted.exists(ch))
then
sorted(ch) := sorted(ch) || ch;
else
sorted(ch) := ch;
end if;
end loop;
rv := '';
key := sorted.FIRST;
WHILE key IS NOT NULL LOOP
rv := rv || sorted(key);
key := sorted.NEXT(key);
END LOOP;
return rv;
end;
Simple performance test:
set timing on;
create table test_sort_fn as
select t1.object_name || rownum as test from user_objects t1, user_objects t2;
select count(distinct test) from test_sort_fn;
select count (*) from (select sort_chars(test) from test_sort_fn);
Table created.
Elapsed: 00:00:01.32
COUNT(DISTINCTTEST)
-------------------
384400
1 row selected.
Elapsed: 00:00:00.57
COUNT(*)
----------
384400
1 row selected.
Elapsed: 00:00:00.06
You could use the following query:
select listagg(letter)
within group (order by UPPER(letter), ASCII(letter) DESC)
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);
The subquery splits the string into records (single character each) using regexp_substr, and the outer query merges the records into one string using listagg, after sorting them.
Here you should be careful, because alphabetical sorting depends on your database configuration, as Cine pointed.
In the above example the letters are sorted ascending "alphabetically" and descending by ascii code, which - in my case - results in "aAbBcCdDeEfFgG".
The result in your case may be different.
You may also sort the letters using nlssort - it would give you better control of the sorting order, as you would get independent of your database configuration.
select listagg(letter)
within group (order by nlssort(letter, 'nls_sort=german')
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);
The query above would give you also "aAbBcCdDeEfFgG", but if you changed "german" to "spanish", you would get "AaBbCcDdEeFfGg" instead.
You should remember that there is no common agreement what "alphabetically" means. It all depends on which country it is, and who is looking at your data and what context it is in.
For instance in DK, there are a large number of different sortings of a,aa,b,c,æ,ø,å
per the alphabet: a,aa,b,c,æ,ø,å
for some dictionary: a,aa,å,b,c,æ,ø
for other dictionaries: a,b,c,æ,ø,aa,å
per Microsoft standard: a,b,c,æ,ø,aa,å
check out http://www.siao2.com/2006/04/27/584439.aspx for more info. Which also happens to be a great blog for issues as these.
Assuming you don't mind having the characters returned 1 per row:
select substr(str, r, 1) X from (
select 'CAB' str,
rownum r
from dual connect by level <= 4000
) where r <= length(str) order by X;
X
=
A
B
C
For people using Oracle 10g, select listagg within group won't work. The accepted answer does work, but it generates every possible permutation of the input string, which results in terrible performance - my Oracle database struggles with an input string only 10 characters long.
Here is another alternative working for Oracle 10g. It's similar to Jeffrey Kemp's answer, only the result isn't splitted into rows:
select replace(wm_concat(ch), ',', '') from (
select substr('CAB', level, 1) ch from dual
connect by level <= length('CAB')
order by ch
);
-- output: 'ABC'
wm_concat simply concatenates records from different rows into a single string, using commas as separator (that's why we are also doing a replace later).
Please note that, if your input string had commas, they will be lost. Also, wm_concat is an undocumented feature, and according to this answer it has been removed in Oracle 12c. Use it only if you're stuck with 10g and don't have a better option (such as listagg, if you can use 11g instead).
From Oracle 12, you can use:
SELECT *
FROM table_name t
CROSS JOIN LATERAL (
SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
ORDER BY SUBSTR(t.value, LEVEL, 1)
) AS ordered_value
FROM DUAL
CONNECT BY LEVEL <= LENGTH(t.value)
)
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'ZYX' FROM DUAL UNION ALL
SELECT 'HELLO world' FROM DUAL UNION ALL
SELECT 'aæøåbcæøåaæøå' FROM DUAL;
Outputs:
VALUE
ORDERED_VALUE
ZYX
XYZ
HELLO world
EHLLOdlorw
aæøåbcæøåaæøå
aabcåååæææøøø
If you want to change how the letters are sorted then use NLSSORT:
SELECT *
FROM table_name t
CROSS JOIN LATERAL (
SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
ORDER BY NLSSORT(SUBSTR(t.value, LEVEL, 1), 'NLS_SORT = BINARY_AI')
) AS ordered_value
FROM DUAL
CONNECT BY LEVEL <= LENGTH(t.value)
)
Outputs:
VALUE
ORDERED_VALUE
ZYX
XYZ
HELLO world
dEHLLlOorw
aæøåbcæøåaæøå
aaåååbcøøøæææ
db<>fiddle here

Suggestion for the use of to_number in Oracle

Below query is giving error.
select sob.set_of_books_id, orginfo.org_information1
from
gl_sets_of_books sob,
hr_organization_information orginfo
where
sob.set_of_books_id = to_number(orginfo.org_information1);
The reason is set_of_books_id is number column and org_information1 is varchar which is containing string and also numeric string. so it is type mismatch. we have to pick only those values which has numeric string in org_information1
to overcome this we used regexp_like which will pick only record which are numeric.
select sob.set_of_books_id, orginfo.org_information1
from hr_organization_information orginfo, gl_sets_of_books sob
where sob.set_of_books_id = to_number(orginfo.org_information1)
and REGEXP_LIKE(orginfo.org_information1, '^[[:digit:]]+$');
We just added this line and REGEXP_LIKE(orginfo.org_information1, '^[[:digit:]]+$'); in previous query and it is working properly.
My question is even in the last query, we are using where clause and joining same condition which was failing in first query .and where will definetly run before AND. so why it is not failing? will it fail for some records? or the query is proper?
is there any better way to use the second query.
if I am not using existing where condition then it is giving error. i dont know what is the issue with below query.
We can not use to_char on set_of_books_id because that column contains index.and we cant modify to create func based index.We have to use index in our query
where will definitely run before and
No!
The optimizer is free to rearrange the conditions in your query. There's no guarantee it processes these top-to-bottom. This can lead to surprising effects if the plan changes (e.g. because you added/removes indexes).
Assuming you're on 12.2 or higher, instead of a regex you can use the on conversion error clause to map all the non-numeric values to null:
with rws as (
select level x,
case mod ( level, 3 )
when 0 then chr ( level+64 )
else to_char ( level )
end y
from dual
connect by level <= 10
)
select y from rws
where x = to_number ( y default null on conversion error );
Y
1
2
4
5
7
8
10
Use TO_CHAR on the number column rather than TO_NUMBER on the string column:
select sob.set_of_books_id,
orginfo.org_information1
from hr_organization_information orginfo
INNER JOIN gl_sets_of_books sob
ON ( TO_CHAR(sob.set_of_books_id) = orginfo.org_information1 );
Or, from Oracle 12.2, you can use TO_NUMBER(value DEFAULT NULL ON CONVERSION ERROR):
select sob.set_of_books_id,
orginfo.org_information1
from hr_organization_information orginfo
INNER JOIN gl_sets_of_books sob
ON ( sob.set_of_books_id
= TO_NUMBER(orginfo.org_information1 DEFAULT NULL ON CONVERSION ERROR)
);

Why is searching for the empty string so fast?

I noticed that running
select * from cdr_data where rownum < 10 and customer_tag = ''
is significantly faster than
select * from cdr_data where rownum < 10 and customer_tag = 'a'
What could be the cause?
Edit: Using Oracle 11g. There is no entry where customer_tag has data 'a'. Just to be clear. I am using a company tool to make these queries.
Here is a quote from the Oracle documentation (https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements005.htm):
Note: Oracle Database currently treats a character value with a length
of zero as null. However, this may not continue to be true in future
releases, and Oracle recommends that you do not treat empty strings
the same as nulls.
So your query of:
select * from cdr_data where rownum < 10 and customer_tag = ''
is equivalent to:
select * from cdr_data where rownum < 10 and customer_tag is null
One possible explanation: Perhaps Oracle can do an is null comparison faster than comparing two strings.

to_number from char sql

I have to select only the IDs which have only even digits (an ID looks like: p19 ,p20 etc). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought to use substr to get each number from the IDs and then see if it's even .
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;
IF you need all rows consist of 'p' in beginning and even digits on tail It should look like:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');
with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is to replace 0 with itself and DELETE all odd digits from the input string. (The '0' is included due to a strange but documented behavior of translate() - the third argument can't be empty). If the length of the input string doesn't change after the translation, that means the input string didn't have any odd digits.
where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0

Finding rows that don't contain numeric data in Oracle

I am trying to locate some problematic records in a very large Oracle table. The column should contain all numeric data even though it is a varchar2 column. I need to find the records which don't contain numeric data (The to_number(col_name) function throws an error when I try to call it on this column).
I was thinking you could use a regexp_like condition and use the regular expression to find any non-numerics. I hope this might help?!
SELECT * FROM table_with_column_to_search WHERE REGEXP_LIKE(varchar_col_with_non_numerics, '[^0-9]+');
To get an indicator:
DECODE( TRANSLATE(your_number,' 0123456789',' ')
e.g.
SQL> select DECODE( TRANSLATE('12345zzz_not_numberee',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"contains char"
and
SQL> select DECODE( TRANSLATE('12345',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
and
SQL> select DECODE( TRANSLATE('123405',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
Oracle 11g has regular expressions so you could use this to get the actual number:
SQL> SELECT colA
2 FROM t1
3 WHERE REGEXP_LIKE(colA, '[[:digit:]]');
COL1
----------
47845
48543
12
...
If there is a non-numeric value like '23g' it will just be ignored.
In contrast to SGB's answer, I prefer doing the regexp defining the actual format of my data and negating that. This allows me to define values like $DDD,DDD,DDD.DD
In the OPs simple scenario, it would look like
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^[0-9]+$');
which finds all non-positive integers. If you wau accept negatiuve integers also, it's an easy change, just add an optional leading minus.
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+$');
accepting floating points...
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+(\.[0-9]+)?$');
Same goes further with any format. Basically, you will generally already have the formats to validate input data, so when you will desire to find data that does not match that format ... it's simpler to negate that format than come up with another one; which in case of SGB's approach would be a bit tricky to do if you want more than just positive integers.
Use this
SELECT *
FROM TableToSearch
WHERE NOT REGEXP_LIKE(ColumnToSearch, '^-?[0-9]+(\.[0-9]+)?$');
After doing some testing, i came up with this solution, let me know in case it helps.
Add this below 2 conditions in your query and it will find the records which don't contain numeric data
and REGEXP_LIKE(<column_name>, '\D') -- this selects non numeric data
and not REGEXP_LIKE(column_name,'^[-]{1}\d{1}') -- this filters out negative(-) values
Starting with Oracle 12.2 the function to_number has an option ON CONVERSION ERROR clause, that can catch the exception and provide default value.
This can be used for the test of number values. Simple set NULL when the conversion fails and filer all not NULL values.
Example
with num as (
select '123' vc_col from dual union all
select '1,23' from dual union all
select 'RV12P2000' from dual union all
select null from dual)
select
vc_col
from num
where /* filter numbers */
vc_col is not null and
to_number(vc_col DEFAULT NULL ON CONVERSION ERROR) is not null
;
VC_COL
---------
123
1,23
From http://www.dba-oracle.com/t_isnumeric.htm
LENGTH(TRIM(TRANSLATE(, ' +-.0123456789', ' '))) is null
If there is anything left in the string after the TRIM it must be non-numeric characters.
I've found this useful:
select translate('your string','_0123456789','_') from dual
If the result is NULL, it's numeric (ignoring floating point numbers.)
However, I'm a bit baffled why the underscore is needed. Without it the following also returns null:
select translate('s123','0123456789', '') from dual
There is also one of my favorite tricks - not perfect if the string contains stuff like "*" or "#":
SELECT 'is a number' FROM dual WHERE UPPER('123') = LOWER('123')
After doing some testing, building upon the suggestions in the previous answers, there seem to be two usable solutions.
Method 1 is fastest, but less powerful in terms of matching more complex patterns.
Method 2 is more flexible, but slower.
Method 1 - fastest
I've tested this method on a table with 1 million rows.
It seems to be 3.8 times faster than the regex solutions.
The 0-replacement solves the issue that 0 is mapped to a space, and does not seem to slow down the query.
SELECT *
FROM <table>
WHERE TRANSLATE(replace(<char_column>,'0',''),'0123456789',' ') IS NOT NULL;
Method 2 - slower, but more flexible
I've compared the speed of putting the negation inside or outside the regex statement. Both are equally slower than the translate-solution. As a result, #ciuly's approach seems most sensible when using regex.
SELECT *
FROM <table>
WHERE NOT REGEXP_LIKE(<char_column>, '^[0-9]+$');
You can use this one check:
create or replace function to_n(c varchar2) return number is
begin return to_number(c);
exception when others then return -123456;
end;
select id, n from t where to_n(n) = -123456;
I tray order by with problematic column and i find rows with column.
SELECT
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND,
D.COL1 AS COL1
FROM
VW_DATA_ALL_GC D
WHERE
(D.PERIOADA IN (:pPERIOADA)) AND
(D.FORM = 62)
AND D.COL1 IS NOT NULL
-- AND REGEXP_LIKE (D.COL1, '\[\[:alpha:\]\]')
-- AND REGEXP_LIKE(D.COL1, '\[\[:digit:\]\]')
--AND REGEXP_LIKE(TO_CHAR(D.COL1), '\[^0-9\]+')
GROUP BY
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND ,
D.COL1
ORDER BY
D.COL1