Oracle replace a character not followed by another character - sql

I am attempting to replace all of the &'s in a string with &amp unless the & is followed by lt, apos, gt or quot.
Running this statement
select
regexp_replace('&lt &apos &gt &quot &','&(^lt|^gt|^quot|^apos)','&amp')
from dual;
however, results in no changes to the string.
The output I would be looking for is
'&lt &apos &gt &quot &amp'

A direct and efficient solution (but difficult to write, read and maintain) is:
set define off
(in case you are using a front-end that uses & to mark substitution variables)
then
with
inputs ( inp_str ) as (
select '&lt &apos &gt &quot &' from dual union all
select 'Hello, World!' from dual union all
select '' from dual union all
select '7 &lt 10 &and &&quot' from dual
)
select inp_str,
regexp_replace(inp_str,
'&($|[^lagq]|(g|l)([^t]|$)|a($|[^p]|p($|[^o]|o($|[^s])))|q($|[^u]|u($|[^o]|o($|[^t]))))',
'&amp\1') as new_str
from inputs;
Explanation: (partial...) This will replace every & with &amp, with a few exceptions. The & will be replaced if:
It is followed by the end of the string ($), or
It is followed by any character other than l, a, g or q; or
it is followed by g or l, which is then followed by a character other than t, or by the end of string ($); or
It is followed by a, followed by the end of string, by any letter other than p, or by the letter p followed by the end of string, or .........
Output (from my inputs):
INP_STR                      NEW_STR
---------------------------- ----------------------------
&lt &apos &gt &quot &        &lt &apos &gt &quot &amp
Hello, World!                Hello, World!

7 &lt 10 &and &&quot         7 &lt 10 &ampand &amp&quot
4 rows selected.
(Note: I always include an empty string and a string with no ampersands among the inputs, to verify that the query works correctly on them too.)

These codes look much like HTML entity names, but the ending semicolons are missing, which makes it less clear where a name ends.
In the following solution I assume that these entities cannot be followed immediately by a letter, a digit or an underscore.
When a & is followed by a letter, digit or underscore, it is considered the start of an entity and is not touched. Only the other & characters are replaced.
select regexp_replace('&lt &apos &gt &quot &', '&(\W|$)', '&amp\1') from dual;
The \W|$ matches either a character that is not a letter, digit or underscore, or the end of the string.
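As a rough check of this approach, the query below runs the same pattern over two of the sample strings from the first answer. Reusing those inputs is my own choice, not part of this answer, and the sketch assumes substitution variables are disabled (set define off) in front ends that treat & specially.

```sql
-- Sketch: apply the '&(\W|$)' pattern to two sample strings.
-- The inputs are borrowed from the first answer for illustration only.
with inputs (inp_str) as (
  select '&lt &apos &gt &quot &' from dual union all
  select '7 &lt 10 &and &&quot'  from dual
)
select inp_str,
       regexp_replace(inp_str, '&(\W|$)', '&amp\1') as new_str
from inputs;
```

Unlike the first answer's pattern, this one leaves &and untouched, because the & is followed by a letter. Also note that the character matched by \W is consumed by the match, so a second & directly after a replaced one is never examined on its own.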


How to get first string after character Oracle SQL

I'm trying to get first string after a character.
Example is like
ABCDEF||GHJ||WERT
I need only
GHJ
I tried to use REGEXP but I couldn't do it.
Can anyone help me, please?
Thank you
Somewhat simpler:
SQL> select regexp_substr('ABCDEF||GHJ||WERT', '\w+', 1, 2) result from dual;
                                                         ^
RES                                                      |
---                                           give me the 2nd "word"
GHJ
SQL>
which reads as: give me the 2nd word out of that string. Won't work properly if GHJ consists of several words (but that's not what your example suggests).
This is how I interpret it, splitting on the separator, which in this case is || (or |). The example is for an Oracle database.
-- pattern --> [^|] matches any character other than |, and + means one or more such characters ([^||] is the same set as [^|])
-- 3rd parameter --> starting position
-- 4th parameter --> nth occurrence
WITH tbl(str) AS
(SELECT 'ABCDEF||GHJ||WERT' str FROM dual)
SELECT regexp_substr(str
,'[^||]+'
,1
,2) output
FROM tbl;
I think the most general solution is:
WITH tbl(str) AS (
SELECT 'ABCDEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABC|DEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABClDEF||GHJ||WERT' str FROM dual
)
SELECT regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1')
FROM tbl;
Note that this works even if the individual elements contain punctuation or a single vertical bar -- which the other solutions do not. Here is a comparison.
Presumably, the double vertical bar is being used for maximum flexibility.
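A comparison along those lines could be sketched as follows. It reuses two of the sample rows above; the column aliases (second_word, second_non_bar, between_double_bars) are names I made up for illustration.

```sql
-- Sketch: compare three of the approaches on an element that contains a single |.
WITH tbl(str) AS (
  SELECT 'ABCDEF||GHJ||WERT'  FROM dual UNION ALL
  SELECT 'ABC|DEF||GHJ||WERT' FROM dual
)
SELECT str,
       regexp_substr(str, '\w+', 1, 2)                 AS second_word,
       regexp_substr(str, '[^|]+', 1, 2)               AS second_non_bar,
       regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1')  AS between_double_bars
FROM tbl;
```

For the first row all three columns should give GHJ. For the second row the first two expressions pick up DEF, because they treat the single | as a separator, while the regexp_replace version still gives GHJ.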
You should use the regexp_substr function:
select regexp_substr('ABCDEF||GHJ||WERT ', '\|{2}([^|]+)', 1, 1, 'i', 1) str
from dual;
STR
---
GHJ

regexp no space

Aren't the regexps [[:blank:]] and \s the same?
The queries below give 2 different results.
select regexp_replace('Greg94/Eric99Chandler/Faulkner','/','')
from dual
where regexp_like(trim('Greg94/Eric99Chandler/Faulkner'),'[^[[:blank:]]]');
The above query returns no rows, whereas when I replace the pattern with [^\s] it returns the row.
The problem is that you are using [^[[:blank:]]] instead of [^[:blank:]].
The regular expression [^[[:blank:]]] is evaluated as:
[^[[:blank:]] : any single character that is not in the list "[, [:blank:]"
] : followed by a literal ] as the last character to be matched.
Either remove the last character ']', which is what prevents any rows from being returned, or correct the expression to:
[^[:blank:]]
[^\s] is correct.
That would be
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[^[:blank:]]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE NOT REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[[:blank:]]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[^\s]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
Pick the one you like the most. Besides, if you found something that works OK, why not simply use it (and forget about the one that doesn't work)? (I guess I know: because you want to know WHY.)
Perhaps a clearer test would be to generate some strings containing various whitespace characters and then use case expressions to see whether they match different regexes.
with demo (str) as
( select ':' from dual union all
select 'a' from dual union all
select 'b' from dual union all
select 'c' from dual union all
select 'contains'||chr(9)||'tabs' from dual union all
select 'contains'||chr(10)||chr(13)||'linebreaks' from dual union all
select 'contains some spaces' from dual
)
select str
, case when regexp_like(str,'[:blank:]') then 'Y' end as "[:blank:]"
, case when regexp_like(str,'[[:blank:]]') then 'Y' end as "[[:blank:]]"
, case when regexp_like(str,'[[:space:]]') then 'Y' end as "[[:space:]]"
, case when regexp_like(str,'\s') then 'Y' end as "\s"
from demo
order by 1;
STR                  [:blank:] [[:blank:]] [[:space:]] \s
-------------------- --------- ----------- ----------- --
:                    Y
a                    Y
b                    Y
c
contains tabs        Y                     Y           Y
contains             Y                     Y           Y
linebreaks
contains some spaces Y         Y           Y           Y
(I manually edited the result for the row with tabs to align the results, otherwise the tab messes it up and makes it harder to read.)
[:blank:] matches any of :, b, l, a, n, k, because a character class is only valid within a [] bracket expression.
[[:blank:]] only matches spaces.
[[:space:]] matches tab, newline, carriage return and space.
\s is the same as [[:space:]]
As for your example, it is not behaving as you expected in two different ways.
Firstly, [^[[:blank:]]] should be [^[:blank:]] - that is, the character class [:blank:] within a bracketed expression.
Secondly, the corrected syntax still returns a match when there are no blanks because it looks for any character that is not a space, for example the first character G is not a space so it matches the expression:
regexp_like('Greg94/Eric99Chandler/Faulkner','[^ ]');
To identify strings that do not contain any whitespace character, you should use:
not regexp_like(str,'\s')
or
not regexp_like(str, '[[:space:]]')
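As a small sketch of that last point, the query below filters a few sample strings down to those containing no whitespace at all. The demo rows are picked for illustration: the original string from the question plus two of the whitespace samples above.

```sql
-- Sketch: keep only the strings that contain no whitespace character.
with demo (str) as (
  select 'Greg94/Eric99Chandler/Faulkner' from dual union all
  select 'contains some spaces'           from dual union all
  select 'contains'||chr(9)||'tabs'       from dual
)
select str
from demo
where not regexp_like(str, '[[:space:]]');
```

Only the first row should come back, since the other two contain a space or a tab.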

How to use regexp_substr() with group of delimiter characters?

I have a string something like this: 'SERO02~~~NA_#ERO5'. I need to split it on the delimiter ~~~, so that I get SERO02 and NA_#ERO5 as the result.
I created a regex expression like this:
select regexp_substr('SERO02~~~NA_#ERO5' ,'[^~~~]+',1,2) from dual;
It worked fine and returned: NA_#ERO5
But if I change the string to ERO02~NA_#ERO5 the result is still the same.
I expect the expression to return nothing, since the delimiter ~~~ is not found in that string. Can someone help me create the correct expression?
[^~~~] matches a single character that is not one of the characters following the caret in the square brackets. Since all those characters are identical then [^~~~] is the same as [^~].
You can match it using:
SELECT REGEXP_SUBSTR(
'SERO02~~~NA_#ERO5',
'~~~(.*?)(~~~|$)',
1,
1,
NULL,
1
)
FROM DUAL;
Which will match ~~~ then store zero-or-more characters in a capture group (the round brackets () indicates a capture group) until it finds either ~~~ or the end-of-string. It will then return the first capture group.
You can do it without regular expressions, with a bit of logic:
with test(text) as ( select 'SERO02~~~NA_#ERO5' from dual)
select case
when instr(text, '~~~') != 0 then
substr(text, instr(text, '~~~') + 3)
else
null
end
from test
This will give the part of the string after '~~~', if it exists, null otherwise.
You can edit the ELSE part to get what you need when the input string does not contain '~~~'.
Even when using a regexp, to match the string '~~~' you need to write it exactly, without []; the [] is used to list a set of characters, so [aaaaa] is exactly the same as [a], while [abc] means 'a' OR 'b' OR 'c'.
With regexp, even if not necessary, one way could be the following:
substr(regexp_substr(text, '~~~.*'), 4)
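To see how these approaches behave on both strings from the question, a side-by-side query could look like the sketch below. The column aliases via_regexp and via_instr are made up for illustration.

```sql
-- Sketch: compare the capture-group regexp with the INSTR/SUBSTR approach.
with test (text) as (
  select 'SERO02~~~NA_#ERO5' from dual union all
  select 'ERO02~NA_#ERO5'    from dual
)
select text,
       regexp_substr(text, '~~~(.*?)(~~~|$)', 1, 1, null, 1) as via_regexp,
       case
         when instr(text, '~~~') > 0 then substr(text, instr(text, '~~~') + 3)
       end as via_instr
from test;
```

Both columns should give NA_#ERO5 for the first row and NULL for the second, which has no ~~~ delimiter. The two differ when there are several delimiters: the regexp stops at the next ~~~, while the INSTR version returns everything after the first one.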
In case you want all elements. Handles NULL elements too:
SQL> with tbl(str) as (
select 'SERO02~~~NA_#ERO5' from dual
)
select regexp_substr(str, '(.*?)(~~~|$)', 1, level, null, 1) element
from tbl
connect by level <= regexp_count(str, '~~~') + 1;
ELEMENT
-----------------
SERO02
NA_#ERO5
SQL>

Insert character between string Oracle SQL

I need to insert a character after each character of a string in Oracle SQL.
Example:
ABC will be A,B,C
DEFG will be D,E,F,G
This question only covers inserting a single character into a string:
Oracle insert character into a string
Edit: As some people have pointed out, Oracle does not accept the regex from my previous solution (shown further down). So my approach is to match every character with a regex, add a comma after each one, and then remove the last comma.
WITH regex AS (SELECT REGEXP_REPLACE('ABC', '(.)', '\1,') as reg FROM dual) SELECT SUBSTR(reg, 1, length(reg)-1) FROM regex;
Note that the rtrim solution can give wrong results if the string you want to parse ends with a comma that you do not want to remove.
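The difference can be illustrated with a small sketch. The input 'AB,' is a made-up example of a string that ends with a comma, and the aliases via_rtrim and via_substr are hypothetical names.

```sql
-- Sketch: rtrim strips every trailing comma, substr removes exactly one character.
with sample_data (str) as (
  select 'ABC' from dual union all
  select 'AB,' from dual
)
select str,
       rtrim(reg, ',')                as via_rtrim,
       substr(reg, 1, length(reg) - 1) as via_substr
from (
  select str, regexp_replace(str, '(.)', '\1,') as reg
  from sample_data
);
```

For 'ABC' both columns should give A,B,C. For 'AB,' the intermediate value is A,B,,, so rtrim yields A,B and loses the original comma, while substr yields A,B,, and keeps it.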
Previous solution: (Not working on Oracle)
Check if this does the trick:
SELECT REGEXP_REPLACE('ABC', '(.)(?!$)', '\1,') FROM dual;
It replaces every character except the last one with the same character followed by a comma.
To see how regexp_replace works I recommend you: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
SELECT rtrim(REGEXP_REPLACE('ABC', '(.)', '\1,'),',') "REGEXP_REPLACE" FROM dual;
You could do it using:
REGEXP_REPLACE
RTRIM
For example,
SQL> WITH sample_data AS(
2 SELECT 'ABC' str FROM dual UNION ALL
3 SELECT 'DEFG' str FROM dual UNION ALL
4 SELECT 'XYZ' str FROM dual
5 )
6 -- end of sample_data mimicking a real table
7 SELECT str,
8 rtrim(regexp_replace(str, '(\w?)', '\1,'),',') new_str
9 FROM sample_data;
STR  NEW_STR
---- ----------
ABC  A,B,C
DEFG D,E,F,G
XYZ  X,Y,Z
Since there is no way to negate the end of string in an Oracle regex (that does not support lookarounds), you may use
SELECT REGEXP_REPLACE(
REGEXP_REPLACE('ABC', '([^,])([^,])','\1,\2'),
'([^,])([^,])',
'\1,\2')
AS Result from dual
The point here is to use REGEXP_REPLACE with the ([^,])([^,]) pattern twice to cater for consecutive matches: matches cannot overlap, so a single pass only separates every other pair of adjacent characters.
The ([^,])([^,]) pattern matches any non-comma char into Group 1 (\1) and then any non-comma char into Group 2 (\2), and inserts a comma in between them.

Delete certain character based on the preceding or succeeding character - ORACLE

I have used the REPLACE function to delete email addresses from hundreds of records. However, the semicolon is the separator between one email address and another, so there are a lot of stray semicolons left behind.
For example: the field:
123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com
Let's say that after I deleted two email addresses, the field content became like:
;456#yahoo.com;789#gmail.com;
I need to clean these fields from these extra undesired semicolons to be like
456#yahoo.com;789#gmail.com
For double semicolons I have used REPLACE as well by replacing each ;; with ;
Is there any way to delete any semicolon that is not preceded or followed by any character?
If you only need to replace semicolons at the start or end of the string, using a regular expression with the anchor '^' (beginning of string) / '$' (end of string) should achieve what you want:
with v_data as (
select '123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com' value
from dual union all
select ';456#yahoo.com;789#gmail.com;' value from dual
)
select
value,
regexp_replace(regexp_replace(value, '^;', ''), ';$', '') as normalized_value
from v_data
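If you also need to remove stray semicolons from the middle of the string, this is not enough. In other regex flavors you would probably reach for lookahead/lookbehind, but Oracle regular expressions do not support lookarounds, so a different pattern is needed.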
You remove leading and trailing characters with TRIM:
select trim(both ';' from ';456#yahoo.com;;;789#gmail.com;') from dual;
To replace multiple characters with only one occurrence use REGEXP_REPLACE:
select regexp_replace(';456#yahoo.com;;;789#gmail.com;', ';+', ';') from dual;
Both methods combined:
select regexp_replace( trim(both ';' from ';456#yahoo.com;;;789#gmail.com;'), ';+', ';' ) from dual;
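Applied to a set of rows rather than a literal, the combined method might look like the sketch below. The CTE v_data and the column name email_list are placeholders standing in for a real table and column.

```sql
-- Sketch: trim leading/trailing semicolons, then collapse runs of semicolons.
with v_data (email_list) as (
  select '123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com' from dual union all
  select ';456#yahoo.com;789#gmail.com;'                          from dual union all
  select ';456#yahoo.com;;;789#gmail.com;'                        from dual
)
select email_list,
       regexp_replace(trim(both ';' from email_list), ';+', ';') as normalized_value
from v_data;
```

The first row should come back unchanged, and the other two should both normalize to 456#yahoo.com;789#gmail.com.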
A regular expression replace can help:
select regexp_replace('123#hotmail.com;456#yahoo.com;;456#yahoo.com;;789#gmail.com',
'456#yahoo.com(;)+') as result from dual;
Output:
| RESULT |
|-------------------------------|
| 123#hotmail.com;789#gmail.com |