Update ID value to format XXXXXXXX-X using oracle SQL - sql

Table name: TEST
Column name: ID [VARCHAR(200)]
The format of ID is ‘XXXXXXXX-X’, where ‘X’ is a number from 0 to 9.
Additional operations in case above format is not satisfied:
if the ID consists of 9 digits and there is a double dash between eighth and ninth digit , the extra dash is removed (e.g. 08452142--6 -> 08452142-6)
if the ID consists of 9 digits and there is/are space(s) between eighth and ninth digit and/or non-digits and/or non-letter symbol(s) then replace them to dash (e.g. 08452142 - . 3 -> 08452142-3)
if the ID consists 9 digits and starts/ends with non-digits and/or non-letter symbol(s) then delete that symbol(s) up to digit (e.g. 08452142-2.. -> 08452142-2)
if the ID contains only 9 digits then put a dash before the last digit (e.g. 123456789 -> 12345678-9)
I have achieved the necessary format by using the below snippet.
UPDATE TEST
SET ID = (SELECT REGEXP_REPLACE(ID,'^\d{8}-\d{1}$','') AS "ID"
from TEST
WHERE PK = 11;
)
What are the possible ways to add transformations as mentioned in points[1-4] above in a single query?
Using REGEXP_REPLACE, I can achieve ID in above format. But in case format is incorrect, and ID needs to be transformed[like removing extra dash, or adding dash in case 9 digits are received] to achieve satisfactory format, how can that be achieved in a single UPDATE query?

In any case, you need to extract 9 digits from your string in the first step. And then
add a hyphen before the last character. For both steps use regexp_replace() function
with test(id) as
(
select '08452142--6' from dual union all
select '08452142 - . 3' from dual union all
select '08452142-2..' from dual union all
select '123456789' from dual union all
select '1234567890' from dual
)
select case when length(regexp_replace(id,'(\D)'))=9 then
regexp_replace(regexp_replace(id,'(\D)'),
'(^[[:digit:]]{8})(.*)([[:digit:]]{1}$)','\1-\3')
end as id
from test;
ID
----------
08452142-6
08452142-3
08452142-2
12345678-9
<null>
Demo

You can use the following I think:
UPDATE TEST
SET ID = REGEXP_REPLACE(ID,'^\D*(\d{8})\D*(\d)\D*$','\1-\2')
WHERE REGEXP_LIKE(ID,'^\D*(\d{8})\D*(\d)\D*$')
This way you ignore all non-digit charcters and search for a 8-digit number and then an 1-digit number. Take these 2 numbers and put a single '-' in between.
This is a little more generous as you might need but should work with all your provided examples.

I think you want the first 8 digits, then a hyphen, then the 9th digit:
select ( substr(regexp_replace(id, '[^0-9]', ''), 1, 8) ||
'-' ||
substr(regexp_replace(id, '[^0-9]', ''), 9, 1)
)

I tried an approach based on the suggestion by #BarbarosÖzhan:
with source as (
select '02426467--6' id from dual union all
select '02426467-6' id from dual union all
select '02597718 -- . 3' id from dual union all
select '02597718 --dF5 . 3' id from dual union all
select '00120792-2..' id from dual union all
select '..00120792-2..' id from dual union all
select '123456789' id from dual union all
select '1234567890' id from dual
)
select
case
when regexp_like(id, '\d{8}-\d{1}')
then id
else
case
when regexp_like(id, '\d{8}-\d{1}')
then id
else
case
when regexp_count(id, '\d') = 9
then
case
when
regexp_like(
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
, '\d{8}-\d{1}')
then
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
else id
end
else id
end
end id_tr
from source
However, in cases 3 and 4, I cannot get rid of the space, dot and alphabets. I think something wrong with the logic in case length is more than 9. I end with "id" as it is so the result is the same without any modifications.
Any suggestions to impprove this?

Related

Regexp pattern for special characters

I have the data in the format like
Input:
Code_1
FAB
?
USP BEN,
.
-
,
Output:
Code_1
FAB
IP BEN,
I need to exclude only the value which have length as 1 and and are special characters
I am using (regexp_like(code_1,'^[^<>{}"/|;:.,~!?##$%^=&*\]\\()\[¿§«»ω⊙¤°℃℉€¥£¢¡®©0-9_+]')) AND LENGTH(CODE_1)>=1
I have also tried REGEXP_LIKE(CODE_1,'[A-Za-z0-9]')
Based on your requirements which I understand are you want data that is not single character AND non-alpha numeric (at the same time), this should do it for you.
The 'WITH' clause just sets up test data in this case and can be thought of like a temp table here. It is a great way to help people help you by setting up test data. Always include data you don't expect!
The actual query starts below and selects data that uses grouping to get the data that is NOT a group of non-alpha numeric with a length of one. It uses a POSIX shortcut of [:alnum:] to indicate [A-Za-z0-9].
Note your requirements will allow multiple non-alnum characters to be selected as is indicated by the test data.
WITH tbl(DATA) AS (
SELECT 'FAB' FROM dual UNION ALL
SELECT '?' FROM dual UNION ALL
SELECT 'USP BEN,' FROM dual UNION ALL
SELECT '.' FROM dual UNION ALL
SELECT '-' FROM dual UNION ALL
SELECT '----' FROM dual UNION ALL
SELECT ',' FROM dual UNION ALL
SELECT 'A' FROM dual UNION ALL
SELECT 'b' FROM dual UNION ALL
SELECT '5' FROM dual
)
SELECT DATA
FROM tbl
WHERE NOT (REGEXP_LIKE(DATA, '[^[:alnum:]]')
AND LENGTH(DATA) = 1);
DATA
----------
FAB
USP BEN,
----
A
b
5
6 rows selected.

find invalid characters in string

I need a select statement that will show any invalid characters in Customer number field.
A vaild customer number starts with the captial letter N then 10 digits, can be zero to 9.
Something like,
SELECT (CustomerField, 'N[0-9](10)') <> ''
FROM CustomerTable;
Use regexp_like.
select customerfield
from CustomerTable
where not regexp_like(CustomerField, '^N[0-9]{10}$')
This will show the customerfield's that don't follow the pattern specified.
If you really need to find the invalid characters in the string (and not to just simply find the strings that are invalid) perhaps this more complex query will help. You didn't state in what format you may need the output, so I made up my own. I also created several strings for testing (in particular, it is always important to check that the NULL input is treated correctly).
The column len shows the length of the input, if it's not 11. The length of the empty string (null in Oracle) is shown as 0. The first-nondigit columns refer to characters starting at the SECOND position in the string (ignoring the first character, for which the rules are different and which is checked for validity separately).
with
inputs ( str ) as (
select 'N0123456789' from dual union all
select '' from dual union all
select '02324434323' from dual union all
select 'N02345678' from dual union all
select 'A2140480080' from dual union all
select 'N93049c4995' from dual union all
select 'N4448883333' from dual union all
select 'PAR3993949Z' from dual union all
select 'AN39E' from dual
)
-- end of test data; query begins below this line
select str,
case when regexp_like(str, '^N\d{10}$') then 'valid'
else 'invalid' end as classif,
case when length(str) != 11 then length(str)
when str is null then 0 end as len,
case when substr(str, 1, 1) != 'N'
then substr(str, 1, 1) end as first_char,
regexp_substr(str, '[^0-9]', 2) as first_nondigit,
nullif(regexp_instr( str, '[^0-9]', 2), 0) as first_nondigit_pos
from inputs
;
OUTPUT
STR CLASSIF LEN FIRST_CHAR FIRST_NONDIG FIRST_NONDIGIT_POS
----------- ------- ----- ---------- ------------ ------------------
N0123456789 valid
invalid 0
02324434323 invalid 0
N02345678 invalid 9
A2140480080 invalid A
N93049c4995 invalid c 7
N4448883333 valid
PAR3993949Z invalid P A 2
AN39E invalid 5 A N 2
9 rows selected.
\d stands for digit
Perl-influenced Extensions in Oracle Regular Expressions
The rest if the regular expression elements can be found here
Regular Expression Operator Multilingual Enhancements
select *
from CustomerTable
where not regexp_like (CustomerField,'^N\d{10}$')

Fetching value from Pipe-delimited String using Regex (Oracle)

I have a sample source string like below, which was in pipe delimited format in that the value obr can be at anywhere. I need to get the second value of the pipe from the first occurrence of obr. So for the below source strings the expected would be,
Source string:
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
Expected output:
value1
value2
value3
I have tried the below regexp in oracle sql, but it is not working fine properly.
with t as (
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
)
select text,to_char(regexp_replace(text,'*obr\|([^|]*\|)([^|]*).*$', '\2')) output from t;
It is working fine when the string starts with OBR, but when OBR is in the middle like the above samples it is not working fine.
Any help would be appreciated.
Not sure of how Oracle handles regular expressions, but starting with an asterisk usually implies that you're looking for zero or more null characters.
Have you tried '^.*obr\|([^|]*\|)([^|]*).*$' ?
This handles null elements and is wrapped in a NVL() call which supplies a value if 'obr' is not found or occurs too far toward the end of a record so a value 2 away is not possible:
SQL> with t(id, text) as (
select 1, 'asd|dfg|obr|1|value1|end' from dual
union
select 2, 'a|brx|123|obr|2|value2|end' from dual
union
select 3, 'hfv|obr|3|value3|345|pre|end' from dual
union
select 4, 'hfv|obr||value4|345|pre|end' from dual
union
select 5, 'a|brx|123|obriem|2|value5|end' from dual
union
select 6, 'a|brx|123|obriem|2|value6|obr' from dual
)
select
id,
nvl(regexp_substr(text, '\|obr\|[^|]*\|([^|]*)(\||$)', 1, 1, null, 1), 'value not found') value
from t;
ID VALUE
---------- -----------------------------
1 value1
2 value2
3 value3
4 value4
5 value not found
6 value not found
6 rows selected.
SQL>
The regex basically can be read as "look for a pattern of a pipe, followed by 'obr', followed by a pipe, followed by zero or more characters that are not a pipe, followed by a pipe, followed by zero or more characters that are not a pipe (remembered in a captured group), followed by a pipe or the end of the line". The regexp_substr() call then returns the 1st captured group which is the set of characters between the pipes 2 fields from the 'obr'.

substring, after last occurrence of character?

I need help with this problem:
I have a column named phone_number and I wanted to query this column to get the the string right of the last occurrence of '.' for all kinds of numbers in one single sql query.
example #:
515.123.1277
011.44.1345.629268
I need to get 1277 and 629268 respectively.
I have this so far:
select phone_number,
case when length(phone_number) <= 12
then
substr(phone_number,-4)
else
substr (phone_number, -6) end
from employees;
This works for this example, but I want it for all kinds of # formats.
Would be great to get some input.
Thanks
It should be as easy as this regex:
SELECT phone_number, REGEXP_SUBSTR(phone_number, '[^.]*$')
FROM employees;
With the end anchor $ it should get everything that is not a . character after the final .. If the last character is . then it will return NULL.
Search for a pattern including the period, [.] with digits, \d, followed by the end of the string, $.
Associate the digits with a character group by placing the pattern, \d, in parenthesis (see below). This is referenced with the subexpr parameter, 1 (last parameter).
Here is the solution:
SCOTT#dev> list
1 WITH t AS
2 ( SELECT '414.352.3100' p_number FROM dual
3 UNION ALL
4 SELECT '515.123.1277' FROM dual
5 UNION ALL
6 SELECT '011.44.1345.629268' FROM dual
7 )
8* SELECT regexp_substr(t.p_number, '[.](\d+)$', 1, 1, NULL, 1) end_num FROM t
SCOTT#dev> /
END_NUM
========================================================================
3100
1277
629268
You can do something like this in oracle:
select regexp_substr(num,'[^\.]+',1,regexp_count(num,'\.')+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
Previous to 11gR2 you can use regexp_replace instead regexp_count:
select regexp_substr(num,'[^\.]+',1,length(regexp_replace (num , '[^\.]+'))+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );

How to add a delimiter at a particular position in Oracle

Hi I have a string like this
ABCDEFGH I want the output to be ABCDEF.GH
If it's a number like 1234567 then i want the output to be 12345.67
Basically i want the delimeter (.) before last 2 characters.
You can use regular expressions for this:
with v_data(val) as (
select '123456' from dual union all
select 'abcdef' from dual union all
select '678' from dual
)
select
val,
regexp_replace(val, '(\d+)(\d{2})', '\1.\2')
from v_data
This matches
one or more digits (\d+) (capturing them in group #1)
followed by exactly two digits (\d{2}) (capturing them in group #2)
and replaces this with the contents of group #1 followed by a . followed by the contents of group #2: \1.\2