How to remove any special characters from a string even with dot and comma and spaces - sql

INPUT STRING:'HI every one. I want to (2-21-2022) remove the comma-dot and other any special character from string(123)'.
OUTPUT STRING:'HI every one I want to 2-21-2022 remove the #comma dot and other any special #character from string 123'
Thanks IN Advance.

If what you said in title:
remove any special characters from a string even with dot and comma and spaces
means that you'd want to keep only digits and letters, then such a regular expression might do:
SQL> with test (col) as
2 (select 'HI every one. I want to (2-21-2022) remove the comma-dot and other any special character from string(123)' from dual)
3 select regexp_replace(col, '[^[:alnum:]]') result
4 from test;
RESULT
---------------------------------------------------------------------------------
HIeveryoneIwantto2212022removethecommadotandotheranyspecialcharacterfromstring123
SQL>
On the other hand, that's not what example you posted represents (as already commented).

Related

How do I insert a character every 2 spaces in a string?

I have a table with a 50 CHAR column, with a content like
AABB
AA
CCXXDD
It's a string used like an array of 25 elements CHAR 2.
I need to insert a comma every 2 characters
AA,BB
AA
CC,XX,DD
It there a system function or I need to create one?
We can do a regex replacement here:
SELECT col, RTRIM(REGEXP_REPLACE(col, '(..)', '\1,'), ',') AS col_out
FROM yourTable;
The above logic inserts a comma after every two characters. For inputs having an even number of characters, this leaves an unwanted dangling comma on the right, which we remove using RTRIM().

How do I extract data between two strings based on a pattern in Oracle SQL

I want to extract the data from a column which is of type CLOB in oracle SQL based on a specific pattern. I tried different things with regex nothing worked so far.
PFB the example on how the data would look like and the expected output.
Sample Data:
I should extract CLOB column preceding the word LIST until one word before the .(dot)
PS: CLOB can have CR LF / Carriage return within the pattern.
Expected Output:
Here is how I would do this. Note a couple of things:
The output preserves newlines that existed in the input. You didn't
say anything about removing them; however, your output doesn't show
them. In any case - they can be removed, if needed, but that is an
unrelated process.
You say "word" but obviously you are using that in a sense different
from the common usage in regular expressions. In regexp, "word
characters" are only letters, digits and underscore; yet your
"words" include brackets, equal sign, and who knows what else. I interpreted the term "word" to mean any
sequence of consecutive non-whitespace characters.
Here is how we can recreate your data. When you ask a question here, this is how you should provide sample data - not as an image that we can't copy and paste in an SQL editor.
CREATE TABLE sample_data( col_a varchar2(20), col_b CLOB );
INSERT INTO sample_data VALUES
('12345', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]'));
INSERT INTO sample_data VALUES
('12346', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]'));
commit;
Then here is the query and the output. Note that, depending on your interface (in my case: SQL Developer, which uses a SQL*Plus-like interface), you may need to change some settings so that the output is not truncated. In particular, in SQL*Plus, CLOB columns are truncated to 80 characters by default; I had to
set long 100
So - query and output:
select col_a, col_b,
regexp_substr(col_b, '(\s|^)(LIST:[^.]*?)\s+\S+\.', 1, 1, null, 2)
as result
from sample_data
;
COL_A COL_B RESULT
----- ------------------------------ ------------------------------
12345 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]
12346 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]
The regular expression matches a single whitespace character or the beginning of the string ((\s|^)), then the characters LIST: followed by as few consecutive, non-period characters (this will match spaces and newline characters, in particular) as needed to allow a match - which continues with one or more whitespace characters, followed by a single word (string of 1 or more non-whitespace characters) and a literal period (\.).
The expression we must return is enclosed in parentheses, so that we can return it from regexp_substr. Such an expression is called a "capture group". The regexp includes another capture group, (\s|^), out of necessity (alternation), so the capture group we must return is the second in the regexp. This is what the last argument to regexp_substr does: it instructs the function to return the second capture group.
Note a peculiar thing about the period (related to the much more general concept of escaping within bracket expressions): the period must be escaped to represent a literal period, rather than "any character", at the end of the regular expression; however, within the (negated) bracket expression [^.]*?, the period - representing a literal period, not "any character" - is not escaped. Oracle follows the ERE (extended regular expressions) dialect of the POSIX standard, and that standard says that escape sequences are invalid within bracket expressions. This is different from other regular expression dialect, and confuses a lot of users.
One option would be using REPLACE() in order to remove line feed (CHR(10)) and carriage return (CHR(13)), then REGEXP_REPLACE() functions recursively in order to extract the substring after LIST: upto the dot such as
SELECT col_a,
'LIST:'||REGEXP_REPLACE(REPLACE(REPLACE(col_b,CHR(10)),CHR(13)),'(.*LIST:)(\S+)(\..*)','\2') AS result
FROM t;
col_a result
------ -------
12345 LIST:[ABC][DEF][GHI][LMNO][PQRST][Location=BLAH]
12346 LIST:[ABC][DEF][GHI][LMNO][PQRST][SOC]
Demo
There may be more efficient ways to do this, but the following seems to work:
First I replace newline characters with spaces using TRANSLATE, then using regex find anything between LIST: and .. Then I remove the final "word" using SUBSTR and INSTR. I've used a subquery to prevent having to repeat the first steps.
SELECT
SubQuery.COL_A,
SUBSTR(SubQuery.WithWordAndDot, 1, INSTR(SubQuery.WithWordAndDot,' ',-1)-1) AS Result
FROM
(
SELECT
COL_A,
REGEXP_SUBSTR(TRANSLATE(COL_B, CHR(10)||CHR(13), ' '),'LIST:[^\.]+\.') as WithWordAndDot
FROM MyTable
) SubQuery
;

Replacing multiple special characters using Oracle SQL functions

#All - Thanks for your help
ID Email
1 karthik.sanu#gmail.com. , example#gmail.com#
2 karthik?sanu#gmail.com
In the above example, if you see the 1st row, the email address is invalid because of dot at the end of 1st email address and # at the end of 2nd email address.
In 2nd row, there is a ? in between email address.
Just wanted to know is there any way to handle there characters and remove those from email address using SQL function and update the same in table.
Thanks in advance.
you can also check a translate function
translate('my ,string#with .special chars','#,?. ', ' ')
You could nest multiple invokations of replace(), but this quickly becomes convoluted.
On the other hand, regexp_replace() comes handy for this:
regexp_replace(column_name, '#|,|\?|\.', ' ')
The pipe character (|) stands for or. The dot (.) and the question mark (?) are meaningful characters in regexes so they need to escaped with a backslash (\).
Something like this will "remove" everything but digits, letters and spaces (if that's what you wanted).
SQL> with test (col) as
2 (select 'This) is a se#nten$ce with. everything "but/ only 123 numbers, and ABC lett%ers' from dual)
3 select regexp_replace(col, '[^a-zA-Z0-9 ]') result
4 from test;
RESULT
-----------------------------------------------------------------------
This is a sentence with everything but only 123 numbers and ABC letters
SQL>

Get the FIrst character from the Firstname if more than one firstname is present in oracle

I have a requirement where the persons can have more than one first name and i need to convert the name into first characters from the 2 or more names with capital letters.
Examples:
Srinivas Kalyan ,Sai Kishore
if the above is one having 4 first names then i need to display as below
S K S K
The comma is also be replaced and take all the first characters
2. Srinivas-Kalyan Sai Kishore
For the above name the value should be as
S S K
since Srinivas-Kalyan is considered as one name
Also the name can be in small letters
srinivas kalyan sai kishore
For this
S K S K
This has to be in oracle
Tried the below regex_replace which is working fine in sql developer but in the application it is changing into space
replace(trim(regexp_replace(to_char(regexp_replace(initcap(regexp_replace(regexp_replace
(FIRST_NAME, '[0-9]', ''), '( *[[:punct:]])', '')), '([[:lower:]]| )')), '(.)', '\1 ')),',',null)
The missing letter in your output is caused by this regex for character removal:
( *[[:punct:]])
This will turn Kalyan ,Sai into KalyanSai, which will be treated as one word by the rest of the process, and so you will not have the S of Sai in your output.
I would suggest this shorter expression:
trim(upper(regexp_replace(
regexp_replace(first_name, '([[:alpha:]])(-|[[:alpha:]])+', '\1'),
'.*?([[:alpha:]]|$)', ' \1'
)))
Explanation
([[:alpha:]])(-|[[:alpha:]])+ -> \1
This replaces any sequence of letters (or hyphen) with the first of those. So it reduces words to their initial letter.
.*?([[:alpha:]]|$) -> (space)\1
This replaces anything that precedes the next letter, with a space. As no letter will be skipped (because .*? is non-greedy) this effectively replaces all non-letter sequences with a space. To also replace the non-letters at the very end (which do not precede a letter), the special case $ (end-of-string) is added as alternative.
After these two steps there just remains to:
Upper case everything
Remove blanks at the start and end of the result with trim
I find the advantage of this method is that it does not use any other class than [[:alpha]]. Digits, punctuation, lowercase -- or whatever else -- does not need to be identified explicitly, as it is just the negation of [[:alpha]]. The only exception that has to be made, is for the hyphen.
As some names might include some other non-letters, like a quote, you might want to add such characters as well in the first regular expression.

REGEXP to insert special characters, not remove

How would i put double quotes around the two fields that are missing it? Would i be able to use like a INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","**606.32**","2017-01-26","2017-01-27","","","**77910467**","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.
This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.
This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string and remember that. The answers is then a " followed by one of the fields followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is a any character that is not a quote and * is repeated zero or more times. This is complicated by the fact the not-a-quote character could also be a pair of quotes so we need an OR construct (|) i.e. "([^"]|"")*". However we must remember just the field inside the quotes so add brackets so we can later back reference just that i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted then a extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However regexp processing is probably done by a standard library routine so it may be my interpretation of the regexp rules. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to #Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).
This method makes 2 passes on the string. First look for a grouping of a double-quote followed by a comma, followed by a character that is not a double-quote. Replace them by referring to them with the shorthand of their group, the first group, '\1', the missing double-quote, the second group '\2'. Then do it again, but the other way around. Sure you could nest the regex_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017
-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA
","NE","68144"'
from dual
),
rpl_first(str) as (
select regexp_replace(str, '(",)([^"])', '\1"\2')
from orig
)
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2
017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OM
AHA","NE","68144"'
from dual union
select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
select '"ES26653","ABCBEVERAGES",861526999728' from dual union
select '1S26653,"ABCBEVERAGES",861526999728' from dual union
select '"ES26653",,861526999728' from dual
),
rpl_empty(str) as (
select regexp_replace(str, ',,', ',"",')
from orig
),
rpl_first(str) as (
select regexp_replace(str, '(",|^)([^"])', '\1"\2')
from rpl_empty
)
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"
SQL>