Find value that is not a number or a predefined string - sql

I have to test a column of a sql table for invalid values and for NULL.
Valid values are: Any number and the string 'n.v.' (with and without the dots and in every possible combination as listed in my sql command)
So far, I've tried this:
select count(*)
from table1
where column1 is null
or not REGEXP_LIKE(column1, '^[0-9,nv,Nv,nV,NV,n.v,N.v,n.V,N.V]+$');
The regular expression also matches the single character values 'n','N','v','V' (with and without a following dot). This shouldn't be the case, because I only want the exact character combinations as written in the sql command to be matched. I guess the problem has to do with using REGEXP_LIKE. Any ideas?

I guess this regexp will work:
NOT REGEXP_LIKE(column1, '^([0-9]+|n\.?v\.?)$', 'i')
Note that , is not a separator, . means any character, \. means the dot character itself and 'i' flag could be used to ignore case instead of hard coding all combinations of upper and lower case characters.

No need to use regexp (performance will increase by large data) - plain old TRANSLATE is good enough for your validation.
Note that the first translate(column1,'x0123456789','x') remove all numeric charcters from the string, so if you end with nullthe string is OK.
The second translate(lower(column1),'x.','x') removes all dots from the lowered string so you expect the result nv.
To avoid cases as n.....v.... you also limit the string length.
select
column1,
case when
translate(column1,'x0123456789','x') is null or /* numeric string */
translate(lower(column1),'x.','x') = 'nv' and length(column1) <= 4 then 'OK'
end as status
from table1
COLUMN1 STATUS
--------- ------
1010101 OK
1012828n
1012828nv
n.....v....
n.V OK
Test data
create table table1 as
select '1010101' column1 from dual union all -- OK numbers
select '1012828n' from dual union all -- invalid
select '1012828nv' from dual union all -- invalid
select 'n.....v....' from dual union all -- invalid
select 'n.V' from dual; -- OK nv

You can use:
select count(*)
from table1
WHERE TRANSLATE(column1, ' 0123456789', ' ') IS NULL
OR LOWER(column1) IN ('nv', 'n.v', 'nv.', 'n.v.');
Which, for the sample data:
CREATE TABLE table1 (column1) AS
SELECT '12345' FROM DUAL UNION ALL
SELECT 'nv' FROM DUAL UNION ALL
SELECT 'NV' FROM DUAL UNION ALL
SELECT 'nV' FROM DUAL UNION ALL
SELECT 'n.V.' FROM DUAL UNION ALL
SELECT '...................n.V.....................' FROM DUAL UNION ALL
SELECT '..nV' FROM DUAL UNION ALL
SELECT 'n..V' FROM DUAL UNION ALL
SELECT 'nV..' FROM DUAL UNION ALL
SELECT 'xyz' FROM DUAL UNION ALL
SELECT '123nv' FROM DUAL;
Outputs:
COUNT(*)
5
or, if you want any quantity of . then:
select count(*)
from table1
WHERE TRANSLATE(column1, ' 0123456789', ' ') IS NULL
OR REPLACE(LOWER(column1), '.') = 'nv';
Which outputs:
COUNT(*)
9
db<>fiddle here

Related

Regexp pattern for special characters

I have the data in the format like
Input:
Code_1
FAB
?
USP BEN,
.
-
,
Output:
Code_1
FAB
IP BEN,
I need to exclude only the value which have length as 1 and and are special characters
I am using (regexp_like(code_1,'^[^<>{}"/|;:.,~!?##$%^=&*\]\\()\[¿§«»ω⊙¤°℃℉€¥£¢¡®©0-9_+]')) AND LENGTH(CODE_1)>=1
I have also tried REGEXP_LIKE(CODE_1,'[A-Za-z0-9]')
Based on your requirements which I understand are you want data that is not single character AND non-alpha numeric (at the same time), this should do it for you.
The 'WITH' clause just sets up test data in this case and can be thought of like a temp table here. It is a great way to help people help you by setting up test data. Always include data you don't expect!
The actual query starts below and selects data that uses grouping to get the data that is NOT a group of non-alpha numeric with a length of one. It uses a POSIX shortcut of [:alnum:] to indicate [A-Za-z0-9].
Note your requirements will allow multiple non-alnum characters to be selected as is indicated by the test data.
WITH tbl(DATA) AS (
SELECT 'FAB' FROM dual UNION ALL
SELECT '?' FROM dual UNION ALL
SELECT 'USP BEN,' FROM dual UNION ALL
SELECT '.' FROM dual UNION ALL
SELECT '-' FROM dual UNION ALL
SELECT '----' FROM dual UNION ALL
SELECT ',' FROM dual UNION ALL
SELECT 'A' FROM dual UNION ALL
SELECT 'b' FROM dual UNION ALL
SELECT '5' FROM dual
)
SELECT DATA
FROM tbl
WHERE NOT (REGEXP_LIKE(DATA, '[^[:alnum:]]')
AND LENGTH(DATA) = 1);
DATA
----------
FAB
USP BEN,
----
A
b
5
6 rows selected.

How to check in SQL which character occurs first in a string

I have a column with string values which I need to parse based on which of the 2 characters occurred first - # and /.
Can you please help me with a SQL select query that will check the string? The possible scenarios are:
Both # and / are present in the string, if so which one comes first
Only 1 of the character occurs in the string
You could use a combination of charindex and outer apply & values, from which you can select the first-occuring character:
select t.*, FirstChar
from t
outer apply(
select top (1) FirstChar
from (values
(CharIndex('#',string),'#'),
(CharIndex('/',string),'/')
)v(i,FirstChar)
where i > 0
order by i
)x;
Demo Fiddle
I wrote this in Oracle before I knew it was MS SQL you needed, maybe it can be of some help anyway?
with data (text) as(
select '#' from dual union all
select '/' from dual union all
select '#/' from dual union all
select '/#' from dual union all
select '/#/' from dual union all
select '//#' from dual union all
select 'aaa#aaa/aaa' from dual union all
select 'aaa/aaa#aaa' from dual union all
select 'aaaaaaaaa' from dual
)
select text ,
case when INSTR(text,'/')=0 and INSTR(text,'#')>0 then '#'
when INSTR(text,'#')=0 and INSTR(text,'/')>0 then '/'
when INSTR(text,'#')>INSTR(text,'/') then '/'
when INSTR(text,'#')<INSTR(text,'/') then '#'
else ' '
end first from data;

How to cut everything after a specific character, but in case string doesn't contain it do nothing?

Let's say i have following data:
fjflka, kdjf
ssssllkjf fkdsjl
skfjjsld, kjl
jdkfjlj, ksd
lkjlkj hjk
I want to cut out everything after ',' but in case the string doesn't contain this character, it wont do anything, if i use substr and cut everything after ',' the string which doesn't contain this character shows as null. How do i achieve this? Im using oracle 11g.
This should work. Simply use regexp_substr
with t_view as (
select 'fjflka, kdjf' as text from dual union
select 'ssssllkjf fkdsjl' from dual union
select 'skfjjsld, kjl' from dual union
select 'jdkfjlj, ksd' from dual union
select 'lkjlkj hjk' from dual
)
select text,regexp_substr(text,'[^,]+',1,1) from t_view;
Assuming your table :
SQL> desc mytable
s varchar2(100)
you may use:
select decode(instr(s,','),0,s,substr(s,1,instr(s,',')-1)) from mytable;
demo
Well the below query works as per your requirement.
with mytable as
(select 'aaasfasf wqwe' s from dual
union all
select 'aaasfasf, wqwe' s from dual)
select s,substr(s||',',1,instr(s||',',',')-1) from mytable;

how to select exact 7 or 10 world in oracle using regular expression

I am working on below query, I am expected to select exact 7 or 10 digit values columns using regular expression, I have used express in regexp_like() function of oracle, but its not working, please help
Query :
select * from
(select '1234567CELL' "a" from dual
union
select '123CaLLAsasd12' "a" from dual
union
select 'as9960488188CELLas12' "a" from dual
union
select '1234567' "a" from dual
union
select '9960488188' "a" from dual
union
select 'asdCELLqw' "a" from dual) b
where b."a" like '%CELL%' and regexp_like(b."a",'^(\d{7}|\d{10})$');
Expected output
"1234567"
"9960488188"
as above two rows, please check
^ and $ match the start and end of a string and the value cannot contain the string CELL and be solely a 7- or 10-digit number. Instead you could use the regular expression (^|\D)(\d{7}|\d{10})($|\D) which will match either the start of the string or a not digit character (^|\D) then either 7- or 10- digits and then either the end of the string or a non digit character ($|\D).
Like this:
WITH data ( a ) AS (
select '1234567CELL' from dual union
select '123CaLLAsasd12' from dual union
select 'as9960488188CELLas12' from dual union
select '1234567' from dual union
select '9960488188' from dual union
select 'asdCELLqw' from dual
)
SELECT a,
REGEXP_SUBSTR( a, '(^|\D)(\d{7}|\d{10})($|\D)', 1, 1, NULL, 2 ) AS val
FROM data
WHERE a LIKE '%CELL%'
AND REGEXP_LIKE( a, '(^|\D)(\d{7}|\d{10})($|\D)');
Output:
A VAL
-------------------- ----------
1234567CELL 1234567
as9960488188CELLas12 9960488188
You may just use
where regexp_like(b."a",'^([[:digit:]]{7}|[[:digit:]]{10})$')
Since the pattern is anchored (^ matches the start of the string and $ matches the end of the string) there can't be CELL inside the entries you fetch, and you can remove where b."a" like '%CELL%' from the query.

Filter the rows with number only data in a column SQL

I am trying to SELECT rows in a table, by applying a filter condition of identifying number only columns. It is a report only query, so we least bother the performance, as we dont have the privilege to compile a PL/SQL am unable to check by TO_NUMBER() and return if it is numeric or not.
I have to achieve it in SQL. Also the column is having the values like this, which have to be treated as Numbers.
-1.0
-0.1
-.1
+1,2034.89
+00000
1023
After ground breaking research, I wrote this.(Hard time)
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.00' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT '1+1' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT dummy_data.*,
REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (TRANSLATE(TRIM(txt),'+,-.','0000'),'^[-+]*[[:digit:]]');
I got this.
TXT REGEXP_COUNT(TXT,'.')
------------- ---------------------
-1.0 4
+0.1 4
-.1 3
+1,2034.89.00 13 /* Should not be returned */
+1,2034.89 10
1+1 3 /* Should not be returned */
1023 4
7 rows selected.
Now terribly confused with 2 Questions.
1) I get +1,2034.89.00 too in result, I should eliminate it. (means, two decimal points) Not just decimal point, double in every other special character (-+,) should be eliminated)
2) To make it uglier, planned to do a REGEXP_COUNT('.') <= 1. But it is not returning my expectation, while selecting it, I see strange values returned.
Can someone help me to frame the REGEXP for the avoiding the double occurences of ('.','+','-')
The following expression works for everything, except the commas:
'^[-+]*[0-9,]*[.]*[0-9]+$'
You can check for bad comma placement with additional checks like:
not regexp_like(txt, '[-+]*,$') and not regexp_like(txt, [',,'])
First you remove plus and minus with translate and then you wonder why their position is not considered? :-)
This should work:
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+12034.89.00' FROM dual -- invalid: duplicate decimal separator
UNION ALL
SELECT '+1,2034.89' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT 'Deva +21' FROM dual -- invalid: letters
UNION ALL
SELECT '1+1' FROM dual -- invalid: plus sign placement
UNION ALL
SELECT '1023' FROM dual
UNION ALL
SELECT '1.023,88' FROM dual -- invalid: decimal/thousand separators mixed up
UNION ALL
SELECT '1,234' FROM dual
UNION ALL
SELECT '+1,234.56' FROM dual
UNION ALL
SELECT '-123' FROM dual
UNION ALL
SELECT '+123,0000' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT '+234.' FROM dual -- invalid: decimal separator not followed by digits
UNION ALL
SELECT '12345,678' FROM dual -- invalid: missing thousand separator
UNION ALL
SELECT '+' FROM dual -- invalid: digits missing
UNION ALL
SELECT '.' FROM dual -- invalid: digits missing
)
select * from dummy_data
where regexp_like(txt, '[[:digit:]]') and
(
regexp_like(txt, '^[-+]{0,1}([[:digit:]]){0,3}(\,([[:digit:]]){0,3})*(\.[[:digit:]]+){0,1}$')
or
regexp_like(txt, '^[-+]{0,1}[[:digit:]]*(\.[[:digit:]]+){0,1}$')
);
You see, you need three regular expressions; one to guarantee that there is at least one digit in the string, one for numbers with thousand separators, and one for numbers without.
With thousand separators: txt may start with one plus or minus sign, then there may be up to three digits. These may be followed by a thousand separator plus three digits several times. Then there may be a decimal separator with at least one following number.
Without thousand separators: txt may start with one plus or minus sign, then there may be digits. Then there may be a decimal separator with at least one following number.
I hope I haven't overlooked anything.
I just tried to correct the mistakes of you and made the SQL simple as possible. But not neat!
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+.0' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.0' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT 'DeVA 234 Deva' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT to_number(REPLACE(txt,',')),
REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (txt,'^[-+]*')
AND NOT REGEXP_LIKE (TRANSLATE(txt,'+,-.','0000'),'[^[:digit:]]')
AND REGEXP_COUNT(txt,',') <= 1
AND REGEXP_COUNT(txt,'\+') <= 1
AND REGEXP_COUNT(txt,'\-') <= 1
AND REGEXP_COUNT(txt,'\.') <= 1;