Remove the & substitution prompt from a query - sql

We have an Oracle query running in an Informatica SQ transformation. The query is given below:
SELECT
CAST(T.COLUMN_VALUE.EXTRACT('//text()') AS VARCHAR2(200))
FROM
(SELECT regexp_replace('assaley&lee#direct.wvhin.org','(!##$%^\&*()_+=)*','') as RECIPIENTS FROM DUAL) T1,
TABLE( xmlsequence( XMLTYPE( '<x><x>' || REPLACE(t1.RECIPIENTS, ',', '</x><x>') || '</x></x>' ).EXTRACT('//x/*'))) t
where length(T1.RECIPIENTS) =28 -- and length(T1.RECIPIENTS) > 25
When I run the above query, the client prompts for user input because it treats the '&' in the string as a symbolic (substitution) reference. I need to turn off this prompt.
Could anyone help me with this?
Note: 'assaley&lee#direct.wvhin.org' is a hard-coded value.
Thanks
Pandia

One method is to use the UNISTR function and hard-code & as the Unicode escape \0026:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT unistr( 'assaley\0026lee#direct.wvhin.org' ) x
from dual;
X
----------------------------
assaley&lee#direct.wvhin.org
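Another common workaround, sketched here as an assumption rather than something the answer above proposes, is to build the ampersand with CHR(38) so that the character never appears in the statement text. This assumes an ASCII-based database character set, where code 38 is the ampersand.

```sql
-- Assumes an ASCII-based character set, where CHR(38) is the ampersand.
-- The character never appears in the statement text, so a client that
-- scans for substitution variables has nothing to prompt for.
SELECT 'assaley' || CHR(38) || 'lee#direct.wvhin.org' AS x
FROM dual;
```

This should return the same address as the UNISTR version. In SQL*Plus or SQL Developer, SET DEFINE OFF also suppresses the prompt for the session; whether an equivalent setting exists in the client used here would need to be checked.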

Regardless of the issue you are currently experiencing, the regular expression itself should be corrected.
Within the regular expression you should use square brackets instead of round brackets.
Round brackets are intended for capturing subexpressions.
Square brackets are intended for character sets.
SELECT regexp_replace('assaley&lee#direct.wvhin.org','[!##$%^\&*()_+=]+','') as RECIPIENTS
FROM DUAL
;
Or -
remove anything except for letters and digits:
SELECT regexp_replace('assaley&lee#direct.wvhin.org','[^[:alpha:][:digit:]]','') as RECIPIENTS
FROM DUAL
;
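Note that the letters-and-digits pattern also strips the dots from the address. A small sketch comparing it with a hypothetical variant that keeps dots (the aliases addr, letters_digits and keep_dots are made up for illustration; the input is built with UNISTR, as in the earlier answer, so the statement contains no ampersand):

```sql
-- UNISTR('\0026') stands in for the ampersand so the statement text
-- contains none. All aliases here are hypothetical.
WITH t AS (
  SELECT UNISTR('assaley\0026lee#direct.wvhin.org') AS addr FROM dual
)
SELECT REGEXP_REPLACE(addr, '[^[:alpha:][:digit:]]', '') AS letters_digits,
       -- adding a literal dot to the negated set keeps dots in the result
       REGEXP_REPLACE(addr, '[^[:alnum:].]', '')         AS keep_dots
FROM t;
```

The first column should come back as assaleyleedirectwvhinorg and the second as assaleyleedirect.wvhin.org.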

Related

How do I dynamically extract substring from string?

I’m trying to dynamically extract a substring from a very long URL. For example, I may have the following URLs:
https://www.google.com/ABCDEF Version=“0.0.00.0” GHIJK
https://www.google.com/ABCDEFGH Version=“0.0.0.0” IJKLM
https://www.google.com/ABC Version=“0.0.0.00” 12345
I am trying to extract the version code only (0.0.0.0).
This is what I have so far:
SELECT SUBSTR(col, INSTR(col, 'Version="')+9)
FROM table
This query returns the following result:
0.0.00.0” GHIJK … (url continues on)
So, I attempt to find “Version” in the link, so I can start from the same position in each row. This works fine, however I’m having a hard time dynamically locating the ending quote (“). I tried using INSTR in the third parameter of my SUBSTR function, like so:
SELECT SUBSTR(col, INSTR(col, 'Version="')+9, INSTR(col, '"'))
FROM table
I figured that this would find the position of the ending quote, and then use that number for the length, but it returns a strange output. I’ve also used POSITION, CHARINDEX, LENGTH, and LOCATE. None of these functions work in Oracle.
I think maybe when I put +9 after the first INSTR function, it’s setting the query to a fixed position instead of a dynamic one, but I’m not sure how else to remove ‘Version=“‘.
Here's one option (which, actually, selects what's between double quotes - that's version in your example; if there were some other similar substring, you'd get a wrong result).
with test (col) as
(select 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' from dual union all
select 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' from dual union all
select 'https://www.google.com/ABC Version="0.0.0.00" 12345' from dual
)
select col,
replace(regexp_substr(col, '".+"'), '"') version
from test;
which results in
https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK 0.0.00.0
https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM 0.0.0.0
https://www.google.com/ABC Version="0.0.0.00" 12345 0.0.0.00
You can still use INSTR to locate the second " in the string, then subtract the position of the first " to get the length you need. Below is an example query:
SELECT col,
SUBSTR (col, INSTR (col, '"') + 1, INSTR (col, '"', 1, 2) - INSTR (col, '"') - 1) version
FROM test;
You can use REGEXP_SUBSTR() with the pattern Version="(\d.*\d)" to extract the piece between Version=" and the closing " (your quotes are presumed to be regular double quotes " "):
SELECT REGEXP_SUBSTR(url,'Version="(\d.*\d)"',1,1,null,1) AS version
FROM t
where
the third argument (1) is the start position,
the fourth argument (1) is the occurrence, and, most importantly, the last argument (1) selects the first capture group.
In fact, the shorter pattern '"(\d.*\d)"' is enough for the
current data set.
Or use
REGEXP_REPLACE() with capture group \2:
SELECT REGEXP_REPLACE(url,'^(.*Version=")([^"]*).*','\2') AS version
FROM t
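A slightly stricter variant, offered as a sketch rather than as part of the answers above: anchoring on Version=" and capturing with [^"]* stops at the first closing quote, so it should not be thrown off if the URL contains another quoted substring later on. The test CTE repeats the sample rows from this thread.

```sql
-- test (col) mirrors the sample rows used earlier in this thread.
WITH test (col) AS (
  SELECT 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' FROM dual UNION ALL
  SELECT 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' FROM dual UNION ALL
  SELECT 'https://www.google.com/ABC Version="0.0.0.00" 12345' FROM dual
)
SELECT col,
       -- [^"]* cannot cross a double quote, so the capture ends at the
       -- first closing quote after Version="
       REGEXP_SUBSTR(col, 'Version="([^"]*)"', 1, 1, NULL, 1) AS version
FROM test;
```

For the three sample rows this should return 0.0.00.0, 0.0.0.0 and 0.0.0.00.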

Regex: how to get the text between a few colons?

So, i have a lot of strings like the ones below in my database:
product1:1stparty:single_aduls:android:
product2:3rdparty:married_adults:ios:
product3:3rdparty:other_adults:android:
I need a regex to get only the text after the product name and before the device category. So, in the first line I'd get 1stparty:single_aduls, in the second 3rdparty:married_adults and in the third 3rdparty:other_adults. I'm stuck and can't find a way to solve that. Could anyone help me please?
As a regular expression, you can use:
select regexp_extract('product1:1stparty:single_aduls:android:', '^[^:]*:(.*):[^:]*:$')
This returns everything after the first colon and before the penultimate colon.
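A sketch of the same pattern applied to all three sample strings, assuming BigQuery Standard SQL (which the REGEXP_EXTRACT call above suggests); your_table and txt are placeholder names:

```sql
-- BigQuery Standard SQL assumed; your_table and txt are placeholders.
WITH your_table AS (
  SELECT 'product1:1stparty:single_aduls:android:' AS txt UNION ALL
  SELECT 'product2:3rdparty:married_adults:ios:' UNION ALL
  SELECT 'product3:3rdparty:other_adults:android:'
)
SELECT txt,
       -- skip the first field, capture the middle, drop the last field
       REGEXP_EXTRACT(txt, r'^[^:]*:(.*):[^:]*:$') AS result
FROM your_table;
```

The result column should hold 1stparty:single_aduls, 3rdparty:married_adults and 3rdparty:other_adults.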
We can try using REGEXP_REPLACE here:
SELECT REGEXP_REPLACE(val, r"^.*?:|:[^:]+:$", "") AS output
FROM yourTable;
This approach removes either the leading ...: or the trailing :...: from the column, leaving behind the content you want.
You can also use the standard split function and access the resulting array elements by index, which is quite clear to read and understand.
with a as (
select split('product1:1stparty:single_aduls:android:', ':') as splitted
)
select splitted[ordinal(2)] || ':' || splitted[ordinal (3)] as subs
from a
Consider the example below:
with your_table as (
select 'product1:1stparty:single_aduls:android:' txt union all
select 'product2:3rdparty:married_adults:ios:' union all
select 'product3:3rdparty:other_adults:android:'
)
select *,
(
select string_agg(part, ':' order by offset)
from unnest(split(txt, ':')) part with offset
where offset in (1, 2)
) result
from your_table

Using REGEXP_SUBSTR with Strings Qualifier

Taking examples from similar Stack Overflow threads,
Remove all characters after a specific character in PL/SQL
and
How to Select a substring in Oracle SQL up to a specific character?
I want to retrieve only the characters that come before the occurrence of a given string.
Example:
STRING_EXAMPLE
TREE_OF_APPLES
The resulting data set should show only STRING_EXAM and TREE_OF_AP, because PLE is my delimiter.
Whenever I use the REGEXP_SUBSTR below, it returns only STRING_, because REGEXP_SUBSTR treats PLE as separate characters (P, L and E), not as a single expression (PLE).
SELECT REGEXP_SUBSTR('STRING_EXAMPLE','[^PLE]+',1,1) from dual;
How can I do this without using numerous INSTRs and SUBSTRs?
Thank you.
The problem with your query is that [^PLE] matches any single character other than P, L or E. You are looking for the consecutive sequence PLE. So, use
select REGEXP_SUBSTR(colname,'(.+)PLE',1,1,null,1)
from tablename
This returns the substring up to the last occurrence of PLE in the string.
If the string contains multiple instances of PLE and only the substring up to the first occurrence needs to be extracted, use
select REGEXP_SUBSTR(colname,'(.+?)PLE',1,1,null,1)
from tablename
Why use regular expressions for this?
select substr(colname, 1, instr(colname, 'PLE')-1) from...
would be more efficient.
with
inputs( colname ) as (
select 'FIRST_EXAMPLE' from dual union all
select 'IMPLEMENTATION' from dual union all
select 'PARIS' from dual union all
select 'PLEONASM' from dual
)
select colname, substr(colname, 1, instr(colname, 'PLE')-1) as result
from inputs
;
COLNAME RESULT
-------------- ----------
FIRST_EXAMPLE FIRST_EXAM
IMPLEMENTATION IM
PARIS
PLEONASM
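As the PARIS row shows, the SUBSTR/INSTR form returns NULL when the delimiter is absent. If the whole value should be kept in that case (an assumption; the question does not say), a CASE guard is one way to sketch it:

```sql
WITH inputs (colname) AS (
  SELECT 'FIRST_EXAMPLE' FROM dual UNION ALL
  SELECT 'PARIS'         FROM dual
)
SELECT colname,
       CASE
         WHEN INSTR(colname, 'PLE') > 0
           THEN SUBSTR(colname, 1, INSTR(colname, 'PLE') - 1)
         ELSE colname   -- assumption: no delimiter means keep the whole value
       END AS result
FROM inputs;
```

This should give FIRST_EXAM for the first row and PARIS, unchanged, for the second.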

Delete certain character based on the preceding or succeeding character - ORACLE

I have used the REPLACE function to delete email addresses from hundreds of records. However, the semicolon is the separator between each email address and the next, so the problem is that a lot of stray semicolons are left behind.
For example: the field:
123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com
Let's say that after I deleted two email addresses, the field content became like:
;456#yahoo.com;789#gmail.com;
I need to clean these fields from these extra undesired semicolons to be like
456#yahoo.com;789#gmail.com
For double semicolons I have used REPLACE as well, replacing each ;; with ;
Is there any way to delete every semicolon that is not both preceded and followed by another character?
If you only need to replace semicolons at the start or end of the string, using a regular expression with the anchor '^' (beginning of string) / '$' (end of string) should achieve what you want:
with v_data as (
select '123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com' value
from dual union all
select ';456#yahoo.com;789#gmail.com;' value from dual
)
select
value,
regexp_replace(regexp_replace(value, '^;', ''), ';$', '') as normalized_value
from v_data
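If you also need to remove stray semicolons from the middle of the string, anchors alone will not do it. Oracle's regular expression functions do not support lookahead/lookbehind, so repeated semicolons have to be collapsed with a pattern such as ';+' instead.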
You remove leading and trailing characters with TRIM:
select trim(both ';' from ';456#yahoo.com;;;789#gmail.com;') from dual;
To replace multiple characters with only one occurrence use REGEXP_REPLACE:
select regexp_replace(';456#yahoo.com;;;789#gmail.com;', ';+', ';') from dual;
Both methods combined:
select regexp_replace( trim(both ';' from ';456#yahoo.com;;;789#gmail.com;'), ';+', ';' ) from dual;
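The combined expression can be applied to a column in the same way. A minimal sketch with made-up sample rows (v_data, val and normalized_value are illustrative names):

```sql
-- v_data and its rows are hypothetical sample data.
WITH v_data AS (
  SELECT ';456#yahoo.com;;;789#gmail.com;' AS val FROM dual UNION ALL
  SELECT '123#hotmail.com;456#yahoo.com'   AS val FROM dual
)
SELECT val,
       -- TRIM drops leading/trailing semicolons, then ';+' collapses
       -- any remaining run of semicolons into a single separator
       REGEXP_REPLACE(TRIM(BOTH ';' FROM val), ';+', ';') AS normalized_value
FROM v_data;
```

The first row should come back as 456#yahoo.com;789#gmail.com, and the second should be returned unchanged.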
A regular expression replace can help:
select regexp_replace('123#hotmail.com;456#yahoo.com;;456#yahoo.com;;789#gmail.com',
'456#yahoo.com(;)+') as result from dual;
Output:
| RESULT |
|-------------------------------|
| 123#hotmail.com;789#gmail.com |

How can I count the number of instances of this character at the end of a string in SQL or PL/SQL?

I've got a string that ends with a certain number of '=' characters. It's basically a base64 string.
How can I get this count of '=' characters at the end? A built-in SQL function or regex would be preferred.
I know about the instr function, but it doesn't seem like it could be applied here. I'm not sure if a regex would apply here either.
Use length and replace:
select length(some_column) - length(replace(some_column, '=', ''))
from your_table
SELECT REGEXP_COUNT('hello world==', '=') cnt FROM dual
/
SELECT count(distinct(Instr('hello world==','=', LEVEL))) cnt
FROM dual
CONNECT BY LEVEL <= Length('hello world==')
ORDER BY 1
/
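The queries above count every '=' in the string, which gives the right answer for base64 because padding only occurs at the end. If only the trailing run should be counted in a general string, the following is one possible sketch (samples and s are hypothetical test data):

```sql
-- samples (s) is hypothetical test data.
WITH samples (s) AS (
  SELECT 'aGVsbG8gd29ybGQ=' FROM dual UNION ALL
  SELECT 'aGVsbG8gd29ybA==' FROM dual UNION ALL
  SELECT 'a=b'              FROM dual
)
SELECT s,
       -- RTRIM strips only trailing '='; the length difference is the count.
       -- NVL covers the case where the string is nothing but '='.
       LENGTH(s) - NVL(LENGTH(RTRIM(s, '=')), 0) AS trailing_eq,
       -- '=+$' matches the run of '=' anchored at the end, if there is one.
       NVL(LENGTH(REGEXP_SUBSTR(s, '=+$')), 0)   AS trailing_eq_regex
FROM samples;
```

Both columns should return 1, 2 and 0 for the three rows; the '=' in the middle of 'a=b' is not counted.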