ORACLE - Select Count on a Subquery - sql

I've got an Oracle table that holds a set of ranges (RangeA and RangeB). These columns are varchar as they can hold both numeric and alphanumeric values, like the following example:
ID|RangeA|RangeB
1 | 10 | 20
2 | 21 | 30
3 | AB50 | AB70
4 | AB80 | AB90
I need to to do a query that returns only the records that have numeric values, and perform a Count on that query. So far I've tried doing this with two different queries without any luck:
Query 1:
SELECT COUNT(*) FROM (
SELECT RangeA, RangeB FROM table R
WHERE upper(R.RangeA) = lower(R.RangeA)
) A
WHERE TO_NUMBER(A.RangeA) <= 10
Query 2:
WITH A(RangeA,RangeB) AS(
SELECT RangeA, RangeB FROM table
WHERE upper(RangeA) = lower(RangeA)
)
SELECT COUNT(*) FROM A WHERE TO_NUMBER(A.RangeA) <= 10
The subquery is working fine as I'm getting the two records that have only numeric values, but the COUNT part of the query is failing. I should be getting only 1 on the count, but instead I'm getting the following error:
ORA-01722: invalid number
01722. 00000 - "invalid number"
What am I doing wrong? Any help is much appreciated.

You can test each column with a regular expression to determine if it is a valid number:
SELECT COUNT(1)
FROM table_of_ranges
WHERE CASE WHEN REGEXP_LIKE( RangeA, '^-?\d+(\.\d*)?$' )
THEN TO_NUMBER( RangeA )
ELSE NULL END
< 10
AND REGEXP_LIKE( RangeB, '^-?\d+(\.\d*)?$' );
Another alternative is to use a user-defined function:
CREATE OR REPLACE FUNCTION test_Number (
str VARCHAR2
) RETURN NUMBER DETERMINISTIC
AS
invalid_number EXCEPTION;
PRAGMA EXCEPTION_INIT(invalid_number, -6502);
BEGIN
RETURN TO_NUMBER( str );
EXCEPTION
WHEN invalid_number THEN
RETURN NULL;
END test_Number;
/
Then you can do:
SELECT COUNT(*)
FROM table_of_ranges
WHERE test_number( RangeA ) <= 10
AND test_number( RangeB ) IS NOT NULL;

Try this query:
SELECT COUNT(*)
FROM table R
WHERE translate(R.RangeA, 'x0123456789', 'x') = 'x' and
translate(R.RangeB, 'x0123456789', 'x') = 'x'
First, you don't need the subquery for this purpose. Second, using to_number() or upper()/lower() are prone to other problems. The function translate() replaces each character in the second argument with values from the third argument. In this case, it removes numbers. If nothing is left over, then the original value was an integer.
You can do more sophisticated checks for negative values and floating point numbers, but the example in the question seemed to be about positive integer values.

Coming to this question almost four years later (obviously, pointed here from a much newer thread). The other answers show how to achieve the desired output, but do not answer the OP's question, which was "what am I doing wrong?"
You are not doing anything wrong. Oracle is doing something wrong. It is "pushing" the predicate (the WHERE condition) from the outer query into the inner query. Pushing predicates is one of the most basic ways in which the Optimizer makes queries more efficient, but in some cases (and the question you ask is a PERFECT illustration) the result is not, in fact, logically equivalent to the original query.
There are ways to prevent the Optimizer from pushing predicates; or you can write the query in a better way (as shown in the other answers). But if you wanted to know why you saw what you saw, this is why.

Related

Prevent ORA-01722: invalid number in Oracle

I have this query
SELECT text
FROM book
WHERE lyrics IS NULL
AND MOD(TO_NUMBER(SUBSTR(text,18,16)),5) = 1
sometimes the string is something like this $OK$OK$OK$OK$OK$OK$OK, sometimes something like #P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0
if I would like to know if it is possible to prevent ORA-01722: invalid number, because is some causes the char in that position is not a number.
I run this query inside a procedure a process all the rows in a cursor, if 1 row is not a number I can't process any row
You could use VALIDATE_CONVERSION if it's Oracle 12c Release 2 (12.2),
WITH book(text) AS
(SELECT '#P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0'
FROM DUAL
UNION ALL SELECT '$OK$OK$OK$OK$OK$OK$OK'
FROM DUAL
UNION ALL SELECT '12I45678912B456781234567812345671'
FROM DUAL)
SELECT *
FROM book
WHERE CASE
WHEN VALIDATE_CONVERSION(SUBSTR(text,18,16) AS NUMBER) = 1
THEN MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
ELSE 0
END = 1 ;
Output
TEXT
12I45678912B456781234567812345671
Assuming the condition should be true if and only if the 16-character substring starting at position 18 is made up of 16 digits, and the number is equal to 1 modulo 5, then you could write it like this:
...
where .....
and case when translate(substr(text, 18, 16), 'z0123456789', 'z') is null
and substr(text, 33, 1) in ('1', '6')
then 1 end
= 1
This will check that the substring is made up of all-digits: the translate() function will replace every occurrence of z in the string with itself, and every occurrence of 0, 1, ..., 9 with nothing (it will simply remove them). The odd-looking z is needed due to Oracle's odd implementation of NULL and empty strings (you can use any other character instead of z, but you need some character so no argument to translate() is NULL). Then - the substring is made up of all-digits if and only if the result of this translation is null (an empty string). And you still check to see if the last character is 1 or 6.
Note that I didn't use any regular expressions; this is important if you have a large amount of data, since standard string functions like translate() are much faster than regular expression functions. Also, everything is based on character data type - no math functions like mod(). (Same as in Thorsten's answer, which was only missing the first part of what I suggested here - checking to see that the entire substring is made up of digits.)
SELECT text
FROM book
WHERE lyrics IS NULL
AND case when regexp_like(SUBSTR(text,18,16),'^[^a-zA-Z]*$') then MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
else null
end = 1;

Oracle SQL: Filtering rows with non-numeric characters

My question is very similar to this one: removing all the rows from a table with columns A and B, where some records include non-numeric characters (looking like '1234#5' or '1bbbb'). However, the solutions I read around don't seem to work for me. For example,
SELECT count(*) FROM tbl
--962060;
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[^0-9]') OR REGEXP_like(B,'[^0-9]') ) ;
--17
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[0-9]') and REGEXP_like(B,'[0-9]') )
;
--962060
From the 3rd query, I'd expect to see (962060-17)=962043. Why is it still 962060? An alternative query like this also gives the same answer:
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[[:digit:]]')and REGEXP_like(B,'[[:digit:]]') )
;
--962060
Of course, I could bypass the problem by doing query1 minus query2, but I'd like to learn how to do that using regular expressions.
If you use regexp you should take in account that any part of string may be matched as regexp. According your example you should specify that whole string should cntain only numbers ^ - is the beginig of string $ - is the end. And you may use \d- is digits
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^[0-9]+$') and REGEXP_like(B,'^[0-9]+$') )
or
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^\d+$') and REGEXP_like(B,'^\d+$') )
I know you specifically asked for a regex solution, but translate can solve these kind of questions as well (and usually faster because regexes use more processing power):
select count(1)
from tbl
where translate(a, 'x0123456789', 'x') is null
and translate(b, 'x0123456789', 'x') is null;
What this does: translate the characters 0123456789 to null, and if the result is null, then the input must have been all digits. The 'x' is just there because the third argument to translate can not be null.
Thought I should add this here, might be helpful to other readers.

SQL pattern matching

I have a question related to SQL.
I want to match two fields for similarities and return a percentage on how similar it is.
For example if I have a field called doc, which contains the following
This is my first assignment in SQL
and in another field I have something like
My first assignment in SQL
I want to know how I can check the similarities between the two and return by how much percent.
I did some research and wanted a second opinion plus I never asked for source code. Ive looked at Soundex(), Difference(), Fuzzy string matching using Levenshtein distance algorithm.
You didn't say what version of Oracle you are using. This example is based on 11g version.
You can use edit_distance function of utl_match package to determine how many characters you need to change in order to turn one string to another. greatest function returns the greatest value in the list of passed in parameters. Here is an example:
-- sample of data
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
-- the query
select trunc(((greatest(length(col1), length(col2)) -
(utl_match.edit_distance(col2, col1))) * 100) /
greatest(length(col1), length(col2)), 2) as "%"
from t1
result:
%
----------
70.58
Addendum
As #jonearles correctly pointed out, it is much simpler to use edit_distance_similarity function of utl_match package.
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
select utl_match.edit_distance_similarity(col1, col2) as "%"
from t1
;
Result:
%
----------
71

Oracle where condition priority

I've a table with a varchar column (A) and another integer column(B) indicating the type of data present in A. If B is 0, then A will always contain numeric digits.
So when I form an sql like this
SELECT COUNT(*) FROM TAB WHERE B = 0 AND TO_NUMBER(A) = 123;
I get an exception invalid number.
I expect B = 0 to be evaluated first, and then TO_NUMBER(A) second, but from the above scenario I suspect TO_NUMBER(A) is evaluated first. Is my guess correct?
In contrast to programming languages like C, C#, Java etc., SQL doesn't have so called conditional logical operators. For conditional logical operators, the right operand is only evaluated if it can influence the result. So the evaluation of && stops if the left operand returns false. For || it stops if the left operand returns true.
In SQL, both operands are always evaluated. And it's up to the query optimizer to choose which one is evaluated first.
I propose you create the following function, which is useful in many cases:
FUNCTION IS_NUMBER(P_NUMBER VARCHAR2)
RETURN NUMBER DETERMINISTIC
IS
X NUMBER;
BEGIN
X := TO_NUMBER(P_NUMBER);
RETURN X;
EXCEPTION
WHEN OTHERS THEN RETURN NULL;
END IS_NUMBER;
Then you can rewrite your query as:
SELECT COUNT(*) FROM TAB WHERE B = 0 AND IS_NUMBER(A) = 123;
You can also use the function to check whether a string is a number.
Here's a simple way to force the check on B to occur first.
SELECT COUNT(*) FROM TAB
WHERE 123 = DECODE(B, 0, TO_NUMBER(A), NULL);
you can use subquery to be confident in the correctness of the result
select /*+NO_MERGE(T)*/ count(*)
from (
select *
from TAB
where B = 0
) T
where TO_NUMBER(A) = 123
in your particular example, you can compare varchars like this:
SELECT COUNT(*) FROM TAB WHERE B = 0 AND A = '123';
or if you trust oracle to do the implicit conversions, this should almost always work (i don't know in what cases it won't work, but it would be hard to debug if something went wrong)
SELECT COUNT(*) FROM TAB WHERE B = 0 AND A = 123;
It should test B = 0 first.
I not sure your guess is correct or not though without seeing sample data.
SELECT *
FROM DUAL
WHERE 1=0 AND 1/0=0
You can try add 1/0 = 0 as the last statement (like this query)
to your query to know if Oracle short circuits the logical operator.
sqlfiddle here.

how can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50))) it will attempt to compare a value like 'daffodil' to a number like 3245. Even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I try to compare a string to a string but blows up if I try to compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it will still fail. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain any non numeric characters SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali comes along with its TRY_CONVERT function the way around this is to wrap the usage of the column in a case expression that repeats the same logic as the filter.
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
isn`t work well. Example:
DECLARE #Int nvarchar(20)= ' 454 54'
SELECT CASE WHEN #INT LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check if it is real int value , you should use ISNUMERIC function. Let`s check this:
DECLARE #Int nvarchar(20)= ' 454 54'
SELECT ISNUMERIC(#int) Is_Int
Result:0
Result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try to change this to
ISNUMERIC(subMeas.DataPointValue)=0
UPDATE
How check if value is integer?
First here:
WHERE ISNUMERIC(str) AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(#Value VarChar(18))
Returns Bit
As
Begin
Return IsNull(
(Select Case When CharIndex('.', #Value) > 0
Then Case When Convert(int, ParseName(#Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(#Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE