SQL remove unwanted special characters from a string - sql

Hi, I am new to SQL and am writing a CASE statement for a column of grade values.
The values have a length of 3, like A02, B04, A10, A09, D03. The first character is a letter and the next 2 are digits.
If a user enters 'A02, I want to change it to A02. Basically, remove any special characters if they are present.
CASE
WHEN Grade like '[^0-9A-z]%' THEN ''
else Grade end as Grade
So far I have this, but it only searches for the character; I am not sure how to use regex to remove it.

Unless you really want to do a CASE for the fun of it, in Oracle I'd do it like this, which removes punctuation characters and spaces when you select it. Note this does not verify the format, so a grade of Z1234 would get returned.
WITH tbl(ID, grade) AS (
SELECT 1, 'A01' FROM dual UNION ALL
SELECT 1, '''B02' FROM dual UNION ALL
SELECT 2, '$ C01&' FROM dual
)
SELECT ID, grade, REGEXP_REPLACE(grade, '([[:punct:]]| )') AS grade_scrubbed
from tbl;
        ID GRADE     GRADE_SCRUBBED
---------- --------- --------------
         1 A01       A01
         1 'B02      B02
         2 $ C01&    C01

3 rows selected.
HOWEVER, that said, since you seem to want to verify the format and use regex, you could do it this way although it's a little fugly. See comments.
WITH tbl(ID, grade) AS (
-- Test data. Include every crazy combo you'd never expect to see,
-- because you WILL see it, it's just a matter of time :-)
SELECT 1, 'A01' FROM dual UNION ALL
SELECT 1, '''B02' FROM dual UNION ALL
SELECT 2, '$ C01&' FROM dual UNION ALL
SELECT 3, 'DDD' FROM dual UNION ALL
SELECT 4, 'A'||CHR(10)||'DEF' FROM dual UNION ALL
SELECT 5, 'Z1234' FROM dual UNION ALL
SELECT 6, NULL FROM dual
)
SELECT ID, grade,
CASE
-- Correct format of A99.
WHEN REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
THEN grade
-- if not A99, see if stripping out punctuation and spaces make it match A99.
-- If so, return with punctuation and spaces stripped out.
WHEN NOT REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
AND REGEXP_LIKE(REGEXP_REPLACE(grade, '([[:punct:]]| )'), '^[A-Z]\d{2}$')
THEN REGEXP_REPLACE(grade, '([[:punct:]]| )')
-- if not A99, and stripping out punctuation and spaces didn't make it match A99,
-- then the grade is in the wrong format.
WHEN NOT REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
AND NOT REGEXP_LIKE(REGEXP_REPLACE(grade, '([[:punct:]]| )'), '^[A-Z]\d{2}$')
THEN 'Invalid grade format'
-- Something fell through all cases we tested for. Always expect the unexpected!
ELSE 'No case matched!'
END AS grade_scrubbed
from tbl;
        ID GRADE                GRADE_SCRUBBED
---------- -------------------- --------------------
         1 A01                  A01
         1 'B02                 B02
         2 $ C01&               C01
         3 DDD                  Invalid grade format
         4 A
DEF                             Invalid grade format
         5 Z1234                Invalid grade format
         6                      No case matched!

7 rows selected.
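The three WHEN branches above can probably be collapsed if the value is cleaned once in an inline view and then validated. What follows is only a sketch under that assumption: the alias cleaned is made up for illustration, and [^[:alnum:]] strips every non-alphanumeric character (line breaks included), which is slightly broader than the punctuation-and-space pattern used above.

```sql
WITH tbl(ID, grade) AS (
  SELECT 1, 'A01'    FROM dual UNION ALL
  SELECT 2, '''B02'  FROM dual UNION ALL
  SELECT 3, '$ C01&' FROM dual UNION ALL
  SELECT 4, 'Z1234'  FROM dual
)
SELECT ID, grade,
       CASE
         -- Accept the cleaned value only if it is one letter plus two digits.
         WHEN REGEXP_LIKE(cleaned, '^[A-Z][0-9]{2}$') THEN cleaned
         ELSE 'Invalid grade format'
       END AS grade_scrubbed
FROM (
  -- Clean once; "cleaned" is a hypothetical alias.
  SELECT ID, grade,
         REGEXP_REPLACE(grade, '[^[:alnum:]]') AS cleaned
  FROM tbl
);
```

With this sample data, A01, 'B02 and $ C01& come back as A01, B02 and C01, while Z1234 is reported as invalid. A NULL grade would also land in the ELSE branch rather than in a separate catch-all.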

Related

How can I get a natural numeric sort order in Oracle?

I have a column with a letter followed by either numbers or letters:
ID_Col
------
S001
S1001
S090
SV911
SV800
Sfoofo
Szap
Sbart
How can I order it naturally with the numbers first (ASC) then the letters alphabetically? If it starts with S and the remaining characters are numbers, sort by the numbers. Else, sort by the letter. So SV911 should be sorted at the end with the letters since it also contains a V. E.g.
ID_Col
------
S001
S090
S1001
Sbart
Sfoofo
SV800
SV911
Szap
I see this solution uses regex combined with the TO_NUMBER function, but since I also have entries with no numbers this doesn't seem to work for me. I tried the expression:
ORDER BY
TO_NUMBER(REGEXP_SUBSTR(ID_Col, '^S\d+$')),
ID_Col
/* gives ORA-01722: invalid number */
Would this help?
with test (col) as
  (select 'S001' from dual union all
   select 'S1001' from dual union all
   select 'S090' from dual union all
   select 'SV911' from dual union all
   select 'SV800' from dual union all
   select 'Sfoofo' from dual union all
   select 'Szap' from dual union all
   select 'Sbart' from dual
  )
select col
from test
order by substr(col, 1, 1),
         case when regexp_like(col, '^[[:alpha:]]\d') then to_number(regexp_substr(col, '\d+$')) end,
         substr(col, 2);

COL
------
S001
S090
S1001
Sbart
Sfoofo
SV800
SV911
Szap

8 rows selected.
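For what it's worth, the ORA-01722 in the question comes from REGEXP_SUBSTR(ID_Col, '^S\d+$') returning the whole match, including the leading S, which TO_NUMBER cannot convert. A hedged alternative sketch: capture only the digits with the subexpression argument (the sixth parameter of REGEXP_SUBSTR, available from 11g), so non-matching rows yield NULL and sort last. Using UPPER(col) as the tie-breaker is my assumption about what "alphabetically" should mean here; with a plain binary sort, SV800 would come before Sbart.

```sql
WITH test (col) AS (
  SELECT 'S001'   FROM dual UNION ALL
  SELECT 'S1001'  FROM dual UNION ALL
  SELECT 'S090'   FROM dual UNION ALL
  SELECT 'SV911'  FROM dual UNION ALL
  SELECT 'SV800'  FROM dual UNION ALL
  SELECT 'Sfoofo' FROM dual UNION ALL
  SELECT 'Szap'   FROM dual UNION ALL
  SELECT 'Sbart'  FROM dual
)
SELECT col
FROM test
ORDER BY
  -- Digits only when the whole value is S followed by digits, else NULL.
  TO_NUMBER(REGEXP_SUBSTR(col, '^S(\d+)$', 1, 1, NULL, 1)) NULLS LAST,
  -- Case-insensitive tie-breaker for the remaining values.
  UPPER(col);
```

The numeric rows sort as S001, S090, S1001, and the rest follow as Sbart, Sfoofo, SV800, SV911, Szap.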

Update ID value to format XXXXXXXX-X using oracle SQL

Table name: TEST
Column name: ID [VARCHAR(200)]
The format of ID is ‘XXXXXXXX-X’, where ‘X’ is a number from 0 to 9.
Additional operations in case above format is not satisfied:
1. If the ID consists of 9 digits and there is a double dash between the eighth and ninth digits, the extra dash is removed (e.g. 08452142--6 -> 08452142-6).
2. If the ID consists of 9 digits and there are spaces and/or non-digit, non-letter symbols between the eighth and ninth digits, replace them with a dash (e.g. 08452142 - . 3 -> 08452142-3).
3. If the ID consists of 9 digits and starts or ends with non-digit, non-letter symbols, delete those symbols up to the digits (e.g. 08452142-2.. -> 08452142-2).
4. If the ID contains only 9 digits, put a dash before the last digit (e.g. 123456789 -> 12345678-9).
I have achieved the necessary format by using the below snippet.
UPDATE TEST
SET ID = (SELECT REGEXP_REPLACE(ID,'^\d{8}-\d{1}$','') AS "ID"
from TEST
WHERE PK = 11);
What are the possible ways to add the transformations listed in points 1-4 above in a single query?
Using REGEXP_REPLACE, I can match an ID in the above format. But when the format is incorrect and the ID needs to be transformed (like removing an extra dash, or adding a dash when 9 digits are received) to reach the required format, how can that be achieved in a single UPDATE query?
In any case, the first step is to extract the 9 digits from your string, and the second is to add a hyphen before the last digit. Use the regexp_replace() function for both steps:
with test(id) as
(
select '08452142--6' from dual union all
select '08452142 - . 3' from dual union all
select '08452142-2..' from dual union all
select '123456789' from dual union all
select '1234567890' from dual
)
select case when length(regexp_replace(id,'(\D)'))=9 then
regexp_replace(regexp_replace(id,'(\D)'),
'(^[[:digit:]]{8})(.*)([[:digit:]]{1}$)','\1-\3')
end as id
from test;
ID
----------
08452142-6
08452142-3
08452142-2
12345678-9
<null>
You can use the following I think:
UPDATE TEST
SET ID = REGEXP_REPLACE(ID,'^\D*(\d{8})\D*(\d)\D*$','\1-\2')
WHERE REGEXP_LIKE(ID,'^\D*(\d{8})\D*(\d)\D*$')
This way you ignore all non-digit characters and search for an 8-digit number followed by a 1-digit number. Take these 2 numbers and put a single '-' in between.
This is a little more generous than you might need, but it should work with all your provided examples.
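Before running the UPDATE, it may be worth previewing what the pattern would do. A minimal sketch, reusing the sample values from the question in a CTE (the aliases new_id and will_update are only illustrative):

```sql
WITH test(id) AS (
  SELECT '08452142--6'    FROM dual UNION ALL
  SELECT '08452142 - . 3' FROM dual UNION ALL
  SELECT '08452142-2..'   FROM dual UNION ALL
  SELECT '123456789'      FROM dual UNION ALL
  SELECT '1234567890'     FROM dual
)
SELECT id,
       -- What the UPDATE would write for this row.
       REGEXP_REPLACE(id, '^\D*(\d{8})\D*(\d)\D*$', '\1-\2') AS new_id,
       -- Whether the WHERE clause of the UPDATE would pick the row up.
       CASE
         WHEN REGEXP_LIKE(id, '^\D*(\d{8})\D*(\d)\D*$') THEN 'Y'
         ELSE 'N'
       END AS will_update
FROM test;
```

The 10-digit value does not match the pattern, so REGEXP_REPLACE returns it unchanged and the UPDATE would skip it.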
I think you want the first 8 digits, then a hyphen, then the 9th digit:
select ( substr(regexp_replace(id, '[^0-9]', ''), 1, 8) ||
'-' ||
substr(regexp_replace(id, '[^0-9]', ''), 9, 1)
) as id
from test;
I tried an approach based on the suggestion by @BarbarosÖzhan:
with source as (
select '02426467--6' id from dual union all
select '02426467-6' id from dual union all
select '02597718 -- . 3' id from dual union all
select '02597718 --dF5 . 3' id from dual union all
select '00120792-2..' id from dual union all
select '..00120792-2..' id from dual union all
select '123456789' id from dual union all
select '1234567890' id from dual
)
select
case
when regexp_like(id, '\d{8}-\d{1}')
then id
else
case
when regexp_count(id, '\d') = 9
then
case
when
regexp_like(
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
, '\d{8}-\d{1}')
then
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
else id
end
else id
end
end id_tr
from source
However, in cases 3 and 4, I cannot get rid of the spaces, dots and letters. I think something is wrong with the logic when the length is more than 9: I end up with id as it is, so the result is the same as the input.
Any suggestions to improve this?

How to trim out letter in the column

I don't know an effective way to trim the letters in a name. For example, the f_name column has Jenny, Johnny, Doe, Ken, Smith.
I want to trim these names so each consists of only its first 2 letters, like Je, Jo, Do, Ke, Sm, as the output for a new column.
But the names don't all have the same number of letters; for example, Johnny has 6 letters and John has 4.
Is there an effective way to trim names of uneven length without checking the length of every f_name and writing a condition for each one, like the one below?
CASE WHEN LENGTH(f_name) > 4 THEN LTRIM(f_name, 2)
For Oracle use substr():
with data (f_name) as (
select 'Jenny' from dual union all
select 'Johnny' from dual union all
select 'Doe' from dual union all
select 'Ken' from dual union all
select 'Smith' from dual
)
select substr(f_name, 1, 2)
from data
Returns:
SUBSTR(F_NAME,1,2)
------------------
Je
Jo
Do
Ke
Sm
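Use SUBSTR. No length check is needed, because SUBSTR returns at most the requested number of characters (a CASE with only a LENGTH(f_name) > 4 branch would return NULL for Doe and Ken):
SUBSTR(f_name, 1, 2)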
If you want to get the least acronym across all names, you may write something like:
with s as (select level as lvl from dual connect by level <(select max(LENGTH(f_name)) from your_table ))
select f_name,
max(sub_f_name) keep (dense_rank FIRST order by cnt, lvl desc) as least_acronym
from (
select t.f_name
, substr(t.f_name,-s.lvl) as sub_f_name
, s.lvl
, count(*) over (partition by substr(t.f_name,-s.lvl)) as cnt
from your_table t
, s)
group by f_name
NB: just an idea, not tested yet.

Fetching value from Pipe-delimited String using Regex (Oracle)

I have sample source strings like the ones below, in pipe-delimited format, in which the value obr can be anywhere. I need to get the second value after the first occurrence of obr. So for the source strings below, the expected output is as follows.
Source string:
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
Expected output:
value1
value2
value3
I have tried the regexp below in Oracle SQL, but it is not working properly.
with t as (
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
)
select text,to_char(regexp_replace(text,'*obr\|([^|]*\|)([^|]*).*$', '\2')) output from t;
It works when the string starts with obr, but when obr is in the middle, as in the samples above, it does not.
Any help would be appreciated.
Not sure of how Oracle handles regular expressions, but starting with an asterisk usually implies that you're looking for zero or more null characters.
Have you tried '^.*obr\|([^|]*\|)([^|]*).*$' ?
This handles null elements and is wrapped in a NVL() call which supplies a value if 'obr' is not found or occurs too far toward the end of a record so a value 2 away is not possible:
with t(id, text) as (
  select 1, 'asd|dfg|obr|1|value1|end' from dual
  union
  select 2, 'a|brx|123|obr|2|value2|end' from dual
  union
  select 3, 'hfv|obr|3|value3|345|pre|end' from dual
  union
  select 4, 'hfv|obr||value4|345|pre|end' from dual
  union
  select 5, 'a|brx|123|obriem|2|value5|end' from dual
  union
  select 6, 'a|brx|123|obriem|2|value6|obr' from dual
)
select
  id,
  nvl(regexp_substr(text, '\|obr\|[^|]*\|([^|]*)(\||$)', 1, 1, null, 1), 'value not found') value
from t;

        ID VALUE
---------- -----------------------------
         1 value1
         2 value2
         3 value3
         4 value4
         5 value not found
         6 value not found

6 rows selected.
The regex basically can be read as "look for a pattern of a pipe, followed by 'obr', followed by a pipe, followed by zero or more characters that are not a pipe, followed by a pipe, followed by zero or more characters that are not a pipe (remembered in a captured group), followed by a pipe or the end of the line". The regexp_substr() call then returns the 1st captured group which is the set of characters between the pipes 2 fields from the 'obr'.
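Two details may be worth checking against your data. A pattern that begins with a greedy ^.* (as in the regexp_replace suggestion) backtracks from the end of the string, so it picks up the last obr rather than the first; and a pattern that begins with \| cannot match when obr is the very first field. The sketch below is one way to cover both, assuming 11g or later for the subexpression argument; the sample strings and column aliases are invented for illustration.

```sql
WITH t (id, text) AS (
  SELECT 1, 'x|obr|1|first|obr|2|second|end' FROM dual UNION ALL
  SELECT 2, 'obr|1|leading|end'              FROM dual
)
SELECT id,
       -- (^|\|) lets obr sit at the start of the string or after a pipe,
       -- so the wanted field is the 2nd subexpression.
       REGEXP_SUBSTR(text, '(^|\|)obr\|[^|]*\|([^|]*)', 1, 1, NULL, 2) AS first_obr_value,
       -- Greedy variant for comparison: ^.* runs up to the last obr.
       REGEXP_REPLACE(text, '^.*obr\|([^|]*\|)([^|]*).*$', '\2') AS greedy_value
FROM t;
```

For id 1 the first expression should return first while the greedy one returns second; for id 2 both should return leading.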

substring, after last occurrence of character?

I need help with this problem:
I have a column named phone_number and I want to query this column to get the string to the right of the last occurrence of '.' for all kinds of numbers in one single SQL query.
example #:
515.123.1277
011.44.1345.629268
I need to get 1277 and 629268 respectively.
I have this so far:
select phone_number,
case when length(phone_number) <= 12
then
substr(phone_number,-4)
else
substr (phone_number, -6) end
from employees;
This works for this example, but I want it for all kinds of # formats.
Would be great to get some input.
Thanks
It should be as easy as this regex:
SELECT phone_number, REGEXP_SUBSTR(phone_number, '[^.]*$')
FROM employees;
With the end anchor $, it gets everything after the final '.' (the longest run of non-'.' characters at the end of the string). If the last character is '.', it returns NULL.
Search for a pattern consisting of a period, [.], followed by digits, \d+, followed by the end of the string, $.
Make the digits a capture group by placing the pattern, \d+, in parentheses (see below). The group is then referenced with the subexpr parameter, 1 (the last parameter).
Here is the solution:
WITH t AS
( SELECT '414.352.3100' p_number FROM dual
UNION ALL
SELECT '515.123.1277' FROM dual
UNION ALL
SELECT '011.44.1345.629268' FROM dual
)
SELECT regexp_substr(t.p_number, '[.](\d+)$', 1, 1, NULL, 1) end_num FROM t;
END_NUM
========================================================================
3100
1277
629268
You can do something like this in Oracle:
select regexp_substr(num,'[^\.]+',1,regexp_count(num,'\.')+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
On versions that predate regexp_count (it was added in 11g), you can use regexp_replace instead of regexp_count:
select regexp_substr(num,'[^\.]+',1,length(regexp_replace (num , '[^\.]+'))+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
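If regular expressions are not required, plain INSTR and SUBSTR can do the same job. A sketch, assuming the same employees.phone_number column as in the question (the alias last_part is made up): INSTR with a negative start position searches backwards from the end of the string, so it finds the last '.'.

```sql
SELECT phone_number,
       -- INSTR(..., '.', -1) is the position of the last '.', or 0 if there is none.
       SUBSTR(phone_number, INSTR(phone_number, '.', -1) + 1) AS last_part
FROM employees;
```

If the value contains no '.', INSTR returns 0 and the whole string comes back; if it ends with '.', the result is NULL.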