Oracle SQL: Filtering rows with non-numeric characters - sql

My question is very similar to this one: removing all the rows from a table with columns A and B, where some records include non-numeric characters (looking like '1234#5' or '1bbbb'). However, the solutions I read around don't seem to work for me. For example,
SELECT count(*) FROM tbl
--962060;
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[^0-9]') OR REGEXP_like(B,'[^0-9]') ) ;
--17
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[0-9]') and REGEXP_like(B,'[0-9]') )
;
--962060
From the 3rd query, I'd expect to see (962060-17)=962043. Why is it still 962060? An alternative query like this also gives the same answer:
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[[:digit:]]')and REGEXP_like(B,'[[:digit:]]') )
;
--962060
Of course, I could bypass the problem by doing query1 minus query2, but I'd like to learn how to do that using regular expressions.

If you use regexp you should take in account that any part of string may be matched as regexp. According your example you should specify that whole string should cntain only numbers ^ - is the beginig of string $ - is the end. And you may use \d- is digits
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^[0-9]+$') and REGEXP_like(B,'^[0-9]+$') )
or
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^\d+$') and REGEXP_like(B,'^\d+$') )

I know you specifically asked for a regex solution, but translate can solve these kind of questions as well (and usually faster because regexes use more processing power):
select count(1)
from tbl
where translate(a, 'x0123456789', 'x') is null
and translate(b, 'x0123456789', 'x') is null;
What this does: translate the characters 0123456789 to null, and if the result is null, then the input must have been all digits. The 'x' is just there because the third argument to translate can not be null.
Thought I should add this here, might be helpful to other readers.

Related

How to select rows that have numbers as a value?

I have got a table with a column that is type of VARCHAR2(255 BYTE). I would like to select only these rows that have numbers as a value, so I discard any other values as for example "lala","1z". I just want to have pure numbers from 1 to ..... 999999999 (just digital numbers in other words) :P
Could you tell me how to make it?
if you're using Oracle 12c r2 or later then use the built-in validate_conversion() function:
select *
from your_table
where validate_conversion(cast(your_column as number)) = 0
validate_conversion() returns 0 when the proposed conversion would succeed and 1 when it wouldn't. It also supports date and timestamp conversions. Find out more.
Something like this is the usual option. You could use regexp, but it's usually a bit slower.
select column1
from tableA
where translate(column1, '1234567890', '') is null;
Here's the regexp version kfinity referred to. The regex matches a line consisting of 1 or more digits.
select column1
from tableA
where regexp_like(column1, '^\d+$');
You don't want zero to start a number. So it seems like regular expressions are the way to go:
where regexp_like(column1, '^[1-9][0-9]*$');

Find duplicates in case-sensitive query in MS Access

I have a table containing Japanese text, in which I believe that there are some duplicate rows. I want to write a SELECT query that returns all duplicate rows. So I tried running the following query based on an answer from this site (I wasn't able to relocate the source):
SELECT [KeywordID], [Keyword]
FROM Keyword
WHERE [Keyword] IN (SELECT [Keyword]
FROM [Keyword] GROUP BY [Keyword] HAVING COUNT(*) > 1);
The problem is that Access' equality operator treats the two Japanese writing systems - hiragana and katakana - as the same thing, where they should be treated as distinct. Both writing systems have the same phonetic value, although the written characters used to represent the sound are different - e.g. あ (hiragana) and ア (katakana) both represent the sound 'a'.
When I run the above query, however, both of these characters will appear, as according to Access, they're the same character and therefore a duplicate. Essentially it's a case-insensitive search where I need a case-sensitive one.
I got around this issue when doing a simple SELECT to find a Keyword using StrComp to perform a binary comparison, because this method correctly treats hiragana and katakana as distinct. I don't know how I can adapt the query above to use StrComp, though, because it's not directly evaluating one string against another as in the linked question.
Basically what I'm asking is: how can I do a query that will return all duplicates in a table, case-sensitive?
You can use exists instead:
SELECT [KeywordID], [Keyword]
FROM Keyword as k
WHERE EXISTS (SELECT 1
FROM Keyword as k2
WHERE STRCOMP(k2.Keyword, k.KeyWord, 0) = 0 AND
k.KeywordID <> k2.KeywordID
);
Try with a self join:
SELECT k1.[KeywordID], k1.[Keyword], k2.[KeywordID], k2.[Keyword]
FROM Keyword AS k1 INNER JOIN Keyword AS k2
ON k1.[KeywordID] < k2.[KeywordID] AND STRCOMP(k1.[Keyword], k2.[Keyword], 0) = 0

SQL Like condition fails to run

I've been tasked to develop a query that behaves essentially like the following one:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*textToSearch*'
The textToSearch is a string which contains information about the condition in which a given device is tested (Voltage, Current, Frequency, etc) in the following format as an example:
[V:127][PF:1][F:50][I:65]
The objective is to recover a list of any and all tests performed at a voltage of 127 Volts, so the SQL developed would look like the folllowing:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*V:127*'
This works as intended but there is a problem due to an inproper introduction of data, there are cases in which the _textToSearch string looks like the following examples:
[V.127][PF:1][F:50][I:65]
[V.230][PF:1][F:50][I:65]
As you can see, my previous SQL transaction does not work as it does not meet the conditions.
If I try to do the following transaction with the objective of ignoring improper data format:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*V*127*'
The transaction is not succesful and returns an error.
What am I doing wrong for this transaction not to work? I am approaching this problem wrong?
I see a pair of problems although with this transaction, if there were a group of test conditions like the following:
[V.127][PF:1][F:50][I:127]
[V.230][PF:1][F:50][I:127]
Would it return the values of both points given that both meet the condition of the transaction stated above?
In conclusion, my questions are:
What is wrong with the LIKE '*V*127*' condition for it not to work?
What implications has working with this condition? Can it return more information than desired if I am not careful?
I hope it is clear what I am asking for, if it isn't, please point out what is not clear and I will try to clarify it
One choice is to look for any character between the "V" and the "127":
WHERE TestConditions LIKE '%V_127%'
Note that % is the wildcard for a string of any length and _ is the wildcard for a single character.
You can also use regular expressions:
WHERE regexp_like(TestConditions, 'V[.:]127')
Note that regular expressions match anywhere in the string, so wildcards at the beginning and end are not needed.
You could check for both cases (although this will decrease performance)
SELECT *
FROM tblTestData
WHERE (TestConditions LIKE '%V:127%' OR TestConditions LIKE '%V.127%')
It is better to clean the data in your database if only old records have this problem.
Using regular expressions is recommended by Oracle for this kind of conditions. You could build a regular expression for your case:
WITH your_table AS (
SELECT '[V.127][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V.230][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V:127][PF:1][F:50][I:65]' text_to_search FROM dual
)
SELECT *
FROM your_table
WHERE REGEXP_LIKE(text_to_search,'\[V(.|:)127\]','i')
Or you could use the good old LIKE operator. In this case, you need to know that:
% matches zero or more characters
_ matches only one character
So you should use an underscore to match the : or the .
WITH your_table AS (
SELECT '[V.127][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V.230][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V:127][PF:1][F:50][I:65]' text_to_search FROM dual
)
SELECT *
FROM your_table
WHERE text_to_search LIKE '%V_127%';

ORACLE - Select Count on a Subquery

I've got an Oracle table that holds a set of ranges (RangeA and RangeB). These columns are varchar as they can hold both numeric and alphanumeric values, like the following example:
ID|RangeA|RangeB
1 | 10 | 20
2 | 21 | 30
3 | AB50 | AB70
4 | AB80 | AB90
I need to to do a query that returns only the records that have numeric values, and perform a Count on that query. So far I've tried doing this with two different queries without any luck:
Query 1:
SELECT COUNT(*) FROM (
SELECT RangeA, RangeB FROM table R
WHERE upper(R.RangeA) = lower(R.RangeA)
) A
WHERE TO_NUMBER(A.RangeA) <= 10
Query 2:
WITH A(RangeA,RangeB) AS(
SELECT RangeA, RangeB FROM table
WHERE upper(RangeA) = lower(RangeA)
)
SELECT COUNT(*) FROM A WHERE TO_NUMBER(A.RangeA) <= 10
The subquery is working fine as I'm getting the two records that have only numeric values, but the COUNT part of the query is failing. I should be getting only 1 on the count, but instead I'm getting the following error:
ORA-01722: invalid number
01722. 00000 - "invalid number"
What am I doing wrong? Any help is much appreciated.
You can test each column with a regular expression to determine if it is a valid number:
SELECT COUNT(1)
FROM table_of_ranges
WHERE CASE WHEN REGEXP_LIKE( RangeA, '^-?\d+(\.\d*)?$' )
THEN TO_NUMBER( RangeA )
ELSE NULL END
< 10
AND REGEXP_LIKE( RangeB, '^-?\d+(\.\d*)?$' );
Another alternative is to use a user-defined function:
CREATE OR REPLACE FUNCTION test_Number (
str VARCHAR2
) RETURN NUMBER DETERMINISTIC
AS
invalid_number EXCEPTION;
PRAGMA EXCEPTION_INIT(invalid_number, -6502);
BEGIN
RETURN TO_NUMBER( str );
EXCEPTION
WHEN invalid_number THEN
RETURN NULL;
END test_Number;
/
Then you can do:
SELECT COUNT(*)
FROM table_of_ranges
WHERE test_number( RangeA ) <= 10
AND test_number( RangeB ) IS NOT NULL;
Try this query:
SELECT COUNT(*)
FROM table R
WHERE translate(R.RangeA, 'x0123456789', 'x') = 'x' and
translate(R.RangeB, 'x0123456789', 'x') = 'x'
First, you don't need the subquery for this purpose. Second, using to_number() or upper()/lower() are prone to other problems. The function translate() replaces each character in the second argument with values from the third argument. In this case, it removes numbers. If nothing is left over, then the original value was an integer.
You can do more sophisticated checks for negative values and floating point numbers, but the example in the question seemed to be about positive integer values.
Coming to this question almost four years later (obviously, pointed here from a much newer thread). The other answers show how to achieve the desired output, but do not answer the OP's question, which was "what am I doing wrong?"
You are not doing anything wrong. Oracle is doing something wrong. It is "pushing" the predicate (the WHERE condition) from the outer query into the inner query. Pushing predicates is one of the most basic ways in which the Optimizer makes queries more efficient, but in some cases (and the question you ask is a PERFECT illustration) the result is not, in fact, logically equivalent to the original query.
There are ways to prevent the Optimizer from pushing predicates; or you can write the query in a better way (as shown in the other answers). But if you wanted to know why you saw what you saw, this is why.

How can I SELECT DISTINCT on the last, non-numerical part of a mixed alphanumeric field?

I have a data set that looks something like this:
A6177PE
A85506
A51SAIO
A7918F
A810004
A11483ON
A5579B
A89903
A104F
A9982
A8574
A8700F
And I need to find all the ENDings where they are non-numeric. In this example, that means PE, AIO, F, ON, B and F.
In pseudocode, I'm imagining I need something like
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,[SOME_CLEVER_LOGIC]) AS X FROM TABLE);
Any ideas? Can I solve this without learning regexp?
EDIT: To clarify, my data set is a lot larger than this example. Also, I'm only interested in the part of the string AFTER the numeric part. If the string is "A6177PE" I want "PE".
Disclaimer: I don't know Oracle SQL. But, I think something like this should work:
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,REGEXP_INSTR(COL, "[[:ALPHA:]]+$")) AS X FROM TABLE);
REGEXP_INSTR(COL, "[[:ALPHA:]]+$") should return the position of the first of the characters at the end of the field.
For readability, I'd recommend using the REGEXP_SUBSTR function (If there are no performance issues of course, as this is definitely slower than the accepted solution).
...also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself
SELECT DISTINCT SUBSTR(MY_COLUMN,REGEXP_SUBSTR("[a-zA-Z]+$")) FROM MY_TABLE;
(:alpha: is supported also, as #Audun wrote )
Also useful: Oracle Regexp Support (beginning page)
For example
SELECT SUBSTR(col,INSTR(TRANSLATE(col,'A0123456789','A..........'),'.',-1)+1)
FROM table;