Regular Expression - Retrieve specific asterisk separated value string - sql

I need to retrieve a specific part of a string which has values separated by asterisks.
In the example below I need to retrieve the string Client Contact Centre Seniors2, which sits between the 6th and 7th asterisks.
I am fairly new to regular expressions and have only managed to select a value between two asterisks using *[\w]+*
Is there a way to specify which number of asterisk to look at using regular expression, or is there a better way for me to retrieve the string I am after?
String:
2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*
Note: I will be using this regular expression in Oracle SQL using REGEXP_LIKE(string, regex).

* is a regex operator and needs to be escaped, unless it is used inside brackets that hold a character list. You can use this simplified pattern to extract the seventh field:
regexp_substr(Audits.audit_log,'[^*]+',1,7)
SQL Fiddle
Query 1:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'([^*]+)\*',1,7,null,1)
from x
Results:
| REGEXP_SUBSTR(Y,'([^*]+)\*',1,7,NULL,1) |
|-----------------------------------------|
| Client Contact Centre Seniors2 |
Query 2:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'[^*]+',1,7)
from x
Results:
| REGEXP_SUBSTR(Y,'[^*]+',1,7) |
|--------------------------------|
| Client Contact Centre Seniors2 |
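For readers who want to check the counting outside Oracle, here is a minimal Python sketch (Python's re module, whose syntax for these two patterns is the same as Oracle's) that mimics regexp_substr by taking the n-th non-overlapping match. The helper nth_match is hypothetical, not an Oracle or library function.

```python
import re

# The sample string from the question.
S = ("2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*"
     "Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*")


def nth_match(text, pattern, n, group=0):
    """Return the n-th (1-based) non-overlapping match of pattern, or None.

    Roughly mirrors regexp_substr(text, pattern, 1, n, null, group).
    """
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return m.group(group)
    return None  # fewer than n matches, like Oracle returning NULL


print(nth_match(S, r"[^*]+", 7))         # Client Contact Centre Seniors2
print(nth_match(S, r"([^*]+)\*", 7, 1))  # Client Contact Centre Seniors2
```

Note that [^*]+ needs at least one character, so empty fields (two consecutive asterisks) are skipped and the count shifts. The sample string has no empty fields, so both queries return the same value.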

You could also use INSTR and SUBSTR for this. It is simple and fast, but not as concise as REGEXP_SUBSTR.
with t as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*' testvalue
from dual
)
select substr(testvalue, instr(testvalue, '*', 1, 6)+1, instr(testvalue, '*', 1, 7) - instr(testvalue, '*', 1, 6) - 1)
from t;
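A minimal Python sketch of the same INSTR/SUBSTR arithmetic, assuming 1-based positions as in Oracle. The helpers instr, substr and nth_field are hypothetical stand-ins written for this illustration, and None stands in for Oracle's NULL (SUBSTR returns NULL for a length below 1, which also covers an empty field or a missing separator).

```python
# Sample string from the question.
S = ("2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*"
     "Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*")


def instr(text, sub, occurrence):
    """1-based position of the given occurrence of sub, or 0 if absent (like INSTR)."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(sub, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substr(text, start, length):
    """1-based substring (like SUBSTR); None stands in for NULL when length < 1."""
    if length < 1:
        return None
    return text[start - 1:start - 1 + length]


def nth_field(text, n, sep="*"):
    """Field n (1-based): the text between separator n-1 and separator n."""
    left = instr(text, sep, n - 1) if n > 1 else 0
    right = instr(text, sep, n)
    return substr(text, left + 1, right - left - 1)


print(nth_field(S, 7))        # Client Contact Centre Seniors2
print(nth_field("a**b*", 3))  # b
```

Unlike the [^*]+ pattern, counting asterisks keeps empty fields in place, so field 3 of 'a**b*' is still 'b'.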

Related

Select specific data after defined decimal

I have a column that contains version information in a format like:
20.6.4.4200
10.28.30.2678
22.8.34.1200
I want to select only the values after the last decimal point, like:
4200
2678
1200
What is the best way to do this? Is a regex needed?
In SQL Server, which has poor regex support, you can use string functions as follows:
right(val, charindex('.', reverse(val)) - 1)
The idea is to get the position of the last dot in the string, counting from the end of the string, then extract the relevant part of the string with right().
Demo on DB Fiddle:
select val, right(val, charindex('.', reverse(val)) - 1) new_val
from (values('20.6.4.4200'), ('10.28.30.2678'), ('22.8.34.1200')) t(val)
GO
val | new_val
:------------ | :------
20.6.4.4200 | 4200
10.28.30.2678 | 2678
22.8.34.1200 | 1200
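The same reverse-and-charindex arithmetic, sketched in Python to show the off-by-one handling (charindex is 1-based). after_last_dot is a hypothetical name, and the sketch assumes every value contains a dot, as in the sample data.

```python
def after_last_dot(val):
    """Mirror of SQL Server's right(val, charindex('.', reverse(val)) - 1).

    Assumes val contains at least one dot, as in the sample data.
    """
    # charindex is 1-based, so add 1 to Python's 0-based find().
    dot_pos_from_end = val[::-1].find(".") + 1
    keep = dot_pos_from_end - 1       # number of characters after the last dot
    return val[len(val) - keep:]      # right(val, keep)


for v in ["20.6.4.4200", "10.28.30.2678", "22.8.34.1200"]:
    print(v, after_last_dot(v))
```

In Python itself you would just write val.rsplit('.', 1)[-1]; the longer form is only there to mirror the SQL Server expression step by step.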
A regular expression is probably the simplest method. Databases that support regular expressions usually also provide a function that extracts the matching substring. The syntax would look like:
select regexp_substr(col, '[0-9]+$')
Of course, the function might have a different name.
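A Python sketch of the same anchored pattern; re.search plays the role of regexp_substr here, and trailing_digits is a hypothetical helper that returns None (like NULL) when the value does not end in digits.

```python
import re


def trailing_digits(val):
    """Like regexp_substr(col, '[0-9]+$'): the run of digits at the end, or None."""
    m = re.search(r"[0-9]+$", val)
    return m.group(0) if m else None


for v in ["20.6.4.4200", "10.28.30.2678", "22.8.34.1200"]:
    print(trailing_digits(v))
```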

How to replace characters at a specific position in several words using REGEXP_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)
You don't need regexp_replace(); the substr() string function combined with concat() (or the concatenation operator ||) is enough:
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
A to_number() conversion is redundant, since Oracle implicitly converts a string composed entirely of digits to a number in an arithmetic operation.
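A Python sketch of the same substring arithmetic, assuming (as substr(year_code, -3) does) a four-digit year followed by a three-character suffix. Python needs the explicit int() that Oracle performs implicitly; previous_year_code is a hypothetical name.

```python
def previous_year_code(code):
    """Mirror of concat(substr(code, 1, 4) - 1, substr(code, -3)).

    Assumes a 4-digit year followed by a 3-character suffix, like 2018FOO.
    """
    year = int(code[:4])           # explicit here; Oracle converts implicitly
    return f"{year - 1}{code[-3:]}"


for c in ["2018FOO", "2019BAR"]:
    print(previous_year_code(c))
```

Using code[4:] instead of code[-3:] would keep a suffix of any length, just as substr(year_code, 5) would in the SQL.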
You can use string operations:
with c as (
<your query here>
)
select yc.year_code
from year_codes yc, c
where to_number(substr(yc.year_code, 1, 4)) = to_number(substr(c.year_code, 1, 4)) - 1 and
substr(yc.year_code, 5) = substr(c.year_code, 5)

How to extract a string with several lines between brackets in oracle sql query

I am trying to extract a value between the brackets from a string.
Here (How to extract a string between brackets in oracle sql query) it is explained how to do this.
But in my case the string has two lines, and with that approach I get only NULL.
SELECT REGEXP_SUBSTR('Gupta, Abha (01792)', '\((.+)\)', 1, 1, NULL, 1) FROM dual --01792
SELECT REGEXP_SUBSTR('Gupta, Abha (01
792)', '\((.+)\)', 1, 1, NULL, 1) FROM dual -- NULL
I know that I can remove the line break and then use regexp_substr, but I need to keep the line break.
I would address this with the following regex:
\(([^)]*)\)
This makes use of a custom character class, [^)], which means: any character except a closing parenthesis. This way, you do not have to worry about line breaks (since, obviously, a line break is not a closing parenthesis), or any other special character:
Demo on DB Fiddle:
SELECT REGEXP_SUBSTR('Gupta, Abha (01
792)', '\(([^)]*)\)') res FROM dual
| RES |
| :------------------- |
| (01 |
| 792) |
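The difference between the two patterns can be reproduced with Python's re module, which, like Oracle by default, does not let . match a newline. This is a sketch only: in Oracle the corresponding switch for . is the 'n' match parameter, and the inner text without the parentheses comes from passing the subexpression arguments (1, 1, NULL, 1) as in the question.

```python
import re

S = "Gupta, Abha (01\n792)"

# '.' does not match a newline by default, so '.+' cannot span the line break
# and there is no match at all (the question's NULL).
print(re.search(r"\((.+)\)", S))

# [^)] means "any character except ')'", and that includes the newline.
m = re.search(r"\(([^)]*)\)", S)
print(repr(m.group(0)))  # whole match, parentheses included (as in the demo)
print(repr(m.group(1)))  # group 1 only: the text between the parentheses

# re.DOTALL lets '.' match a newline too, comparable to Oracle's 'n' flag.
print(repr(re.search(r"\((.+)\)", S, re.DOTALL).group(1)))
```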

PLSQL show digits from end of the string

I have the following problem.
There is a String:
There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235
I need to show only the last date from this string: 2015.06.07.
I tried regexp_substr with instr, but it doesn't work.
This is just a test; if I can solve it, I want to use the same solution in a query on a CLOB that contains multiple dates, of which I need only the last one. I know there is regexp_count, and it would help to solve this, but the database I use is Oracle 10g, so it won't work.
Can somebody help me?
The key to solving this problem is the idea of reversing the order of the words in the string, as presented in this answer.
Here is the possible solution:
WITH words AS
(
SELECT regexp_substr(str, '[^[:space:]]+', 1, LEVEL) word,
rownum rn
FROM (SELECT 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 2015.06.08 2015.06.17. 2015.07.01. 12345678999 12125235' str
FROM dual) tab
CONNECT BY LEVEL <= LENGTH(str) - LENGTH(REPLACE(str, ' ')) + 1
)
, words_reversed AS
(
SELECT *
FROM words
ORDER BY rn DESC
)
SELECT regexp_substr(word, '\d{4}\.\d{2}\.\d{2}', 1, 1)
FROM words_reversed
WHERE regexp_like(word, '\d{4}\.\d{2}\.\d{2}')
AND rownum = 1;
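The same idea, sketched in Python so the steps are easy to follow: split the string into whitespace-separated words, walk them from the end, and return the date from the first word that contains one. last_date_by_words is a hypothetical helper; the date pattern is the one used in the query.

```python
import re

DATE = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def last_date_by_words(text):
    """Return the date in the last word that contains one, or None."""
    words = text.split()              # like splitting on [^[:space:]]+ with CONNECT BY
    for word in reversed(words):      # like ORDER BY rn DESC
        m = DATE.search(word)         # like regexp_like + regexp_substr
        if m:
            return m.group(0)         # like rownum = 1
    return None


sample = ("There is something 2015.06.06. in the air 1234567 242424 2015.06.07. "
          "2015.06.08 2015.06.17. 2015.07.01. 12345678999 12125235")
print(last_date_by_words(sample))
```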
From the documentation on regexp_substr, I see one problem immediately:
The . (period) matches any character. You need to escape it with a backslash (\.) to match only a literal period.
For reference, I am linking this post which appears to be the approach you are taking with substr and instr.
Relevant documentation from Oracle:
INSTR(string , substring [, position [, occurrence]])
When position is negative, then INSTR counts and searches backward from the end of string. The default value of position is 1, which means that the function begins searching at the beginning of string.
The problem here is that your regular expression only returns a single value, as explained here, so in the case of multiple dates you will not be giving the instr function the appropriate match.
Because of this limitation, I recommend the approach proposed in this question: reverse the entire string (and your regular expression, i.e. \d{2}\.\d{2}\.\d{4}) so that the first match is the 'last match'. Then reverse the match again to get the original date format.
Maybe this isn't the best solution, but it should work.
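A short Python sketch of the reversal trick described in this answer; last_date_by_reversal is a hypothetical helper. Reversing 2015.06.07 gives 70.60.5102, which is why the reversed pattern is \d{2}\.\d{2}\.\d{4}.

```python
import re


def last_date_by_reversal(text):
    """Reverse the string, take the first reversed date, reverse it back."""
    reversed_text = text[::-1]
    # YYYY.MM.DD read backwards has the shape DD.MM.YYYY (digits reversed too).
    m = re.search(r"\d{2}\.\d{2}\.\d{4}", reversed_text)
    return m.group(0)[::-1] if m else None


print(last_date_by_reversal(
    "There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235"))
```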
There are three different PL/SQL functions that will get you there:
1. The INSTR function will identify where the first "period" in the date string appears.
2. SUBSTR, applied to the entire string, uses the value from (1) to compute the start point.
3. TO_DATE with a specific date mask, YYYY.MM.DD, will convert the result from (2) into an Oracle date type.
To make this work in procedural code, the standard block structure applies:
DECLARE
v_position pls_integer;
-- ... other variables
BEGIN
-- sql code and function calls go here
NULL;
END;
/
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE finddate
(column1 varchar2(11), column2 varchar2(39))
;
INSERT ALL
INTO finddate (column1, column2)
VALUES ('row1', '1234567 242424 2015.06.07. 12125235')
INTO finddate (column1, column2)
VALUES ('string2', '1234567 242424 2015.06.07. 12125235')
SELECT * FROM dual
;
Query 1:
select instr(column2,'.',1) from finddate
where column1 = 'string2';
select substr(column2,(20-4),10) from finddate;
select to_date('2015.06.07','YYYY.MM.DD') from finddate;
Results:
| TO_DATE('2015.06.07','YYYY.MM.DD') |
|------------------------------------|
| June, 07 2015 00:00:00 |
| June, 07 2015 00:00:00 |
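The three steps can be traced in Python as a sketch. It assumes, like the fiddle, that the first period in the text belongs to a YYYY.MM.DD date preceded by its four-digit year; first_date is a hypothetical helper, and it finds the first date rather than the last one the question asks about.

```python
from datetime import datetime

ROW = "1234567 242424 2015.06.07. 12125235"


def first_date(text):
    """INSTR -> SUBSTR -> TO_DATE, assuming the first '.' belongs to a YYYY.MM.DD date."""
    dot = text.find(".") + 1                      # instr(text, '.', 1): 1-based (20 here)
    start = dot - 4                               # the 4-digit year precedes the dot
    piece = text[start - 1:start - 1 + 10]        # substr(text, start, 10)
    return datetime.strptime(piece, "%Y.%m.%d")   # to_date(piece, 'YYYY.MM.DD')


print(first_date(ROW))
```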
Here's a way using regexp_replace() that should work with 10g, assuming the format of the lines will be the same:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235'
from dual
)
select regexp_replace(col_string, '^.*(\d{4}\.\d{2}\.\d{2})\. \d*$', '\1')
from tbl;
The regex can be read as:
^ - Match the start of the line
. - followed by any character
* - followed by 0 or more of the previous character (which is any character)
( - Start a remembered group
\d{4}\.\d{2}\.\d{2} - 4 digits followed by a literal period followed by 2 digits, etc
) - End the first remembered group
\. - followed by a literal period
(space) - followed by a space
\d* - followed by any number of digits
$ - followed by the end of the line
regexp_replace then replaces all that with the first remembered group (\1).
Basically, describe the whole line as a regular expression and put a group around the part you want to return. You will most likely need to tweak the regex for the end of the line if it could contain characters other than digits, but this should give you an idea.
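The greedy behaviour can be checked with Python's re.sub, whose syntax for this pattern is the same; this is a sketch, not Oracle itself. Because ^.* grabs as much as it can before backtracking, group 1 ends up being the last date that is still followed by a period, a space and trailing digits.

```python
import re

S = "There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235"

# Same pattern as the SQL: greedy ^.* pushes group 1 as far right as possible.
PATTERN = r"^.*(\d{4}\.\d{2}\.\d{2})\. \d*$"

print(re.sub(PATTERN, r"\1", S))
```

If the pattern does not match, re.sub (like regexp_replace) returns the input unchanged, so a line in an unexpected format comes back whole rather than as NULL.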
For the sake of argument, this also works, but ONLY IF there are exactly 2 occurrences of the date pattern:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235' from dual
)
select regexp_substr(col_string, '\d{4}\.\d{2}\.\d{2}', 1, 2)
from tbl;
This returns the second occurrence of the pattern. I expect the regexp_replace above describes the solution more accurately.

SQL change date formats inside a string

I would like to convert the dates inside a string in a SQL SELECT from an Oracle 11g database.
Original string (CLOB) example:
"1.12.2011 - event 1
2.2.2012 - event 2
13.3.2012 - event 44"
Desired output:
"20111201 - event 1
20120202 - event 2
20120313 - event 44"
Is there a better (faster) way than using 4 separate replacements?
regexp_replace(regexp_replace(regexp_replace(regexp_replace(my_string,
'(\d\d)\.(\d\d)\.(20\d\d)', '\3\2\1'),
'(\d\d)\.(\d)\.(20\d\d)', '\30\2\1'),
'(\d)\.(\d\d)\.(20\d\d)', '\3\20\1'),
'(\d)\.(\d)\.(20\d\d)', '\30\20\1')
Especially if you're using CLOBs, you have to be careful unless you're certain of the data in there.
However, if your CLOB only looks like that, then you need only three regexp_replace calls for this to work; it'll also be much more dynamic. Just explicitly specify digits using [[:digit:]], then specify a minimum and maximum number of times these digits can occur using {1,2}.
Then the following would work:
select regexp_replace(
regexp_replace(
regexp_replace( my_string
, '([[:digit:]]{1,2})\.([[:digit:]]{1,2})\.(20[[:digit:]]{2})'
, '\3-0\2-0\1')
, '-0([[:digit:]]{2})'
, '-\1' )
, '(20[[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2})'
, '\1\2\3')
from dual
This means:
match ( group 1 ) 1 or 2 digits
match a full stop
match ( group 2 ) 1 or 2 digits
match a full stop
match ( group 3 ) 20 + 2 digits.
Then take out only groups 1, 2 and 3, i.e. ignoring the full stops, and return them in the order 3, 2, 1, separated by hyphens, with a 0 in front of the month and the day (1.12.2011 becomes 2011-012-01).
Then remove that extra 0 wherever it is followed by two digits, i.e. wherever the month or day already had two digits (2011-012-01 becomes 2011-12-01).
Lastly, remove the hyphens between the date parts only, so that other hyphens in the text (such as the one in " - event 1") are kept.
Separately from that I agree with tbone. It would make a lot more sense to store this data in a separate table (event_id number, event_date date). Any string transformations are easy with no chance of getting it wrong, unlike in this situation, and the data is easy to query and compare.
There are no better options (both correct and readable) with better performance, or if there are, no one cares.
I prefer a two-level regexp_replace for the date part:
select regexp_replace(
regexp_replace( my_string,
'([[:digit:]]{1,2})\.([[:digit:]]{1,2})\.(20[[:digit:]]{2})',
'\3-0\2-0\1' ),
'(20[[:digit:]]{2})-0?([[:digit:]]{2})-0?([[:digit:]]{2})',
'\1\2\3' )
from dual;
Demo
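A Python sketch of this two-step pad-and-trim idea, with [0-9] standing in for [[:digit:]] (Python's re does not support POSIX character classes). The groups in the second replacement are ordered \1\2\3, which yields the YYYYMMDD output asked for in the question; to_yyyymmdd is a hypothetical helper.

```python
import re

S = "1.12.2011 - event 1\n2.2.2012 - event 2\n13.3.2012 - event 44"


def to_yyyymmdd(text):
    # Step 1: reorder to year-month-day and prefix month and day with a 0,
    # e.g. 1.12.2011 -> 2011-012-01 and 13.3.2012 -> 2012-03-013.
    step1 = re.sub(r"([0-9]{1,2})\.([0-9]{1,2})\.(20[0-9]{2})", r"\3-0\2-0\1", text)
    # Step 2: the optional 0? absorbs the extra zero, so exactly two digits
    # remain for month and day; the hyphens are dropped at the same time.
    return re.sub(r"(20[0-9]{2})-0?([0-9]{2})-0?([0-9]{2})", r"\1\2\3", step1)


print(to_yyyymmdd(S))
```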
Maybe try doing:
select to_char(to_date('13.3.2011', 'DD.MM.YYYY'),'YYYYMMDD') from dual;
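to_date/to_char converts one literal at a time. To apply the same parse-and-reformat to every date inside the string, a per-match replacement is needed; the sketch below does that in Python with re.sub, which accepts a function as the replacement. convert_dates is a hypothetical helper, and the date pattern is an assumption based on the sample data.

```python
import re
from datetime import datetime

# Assumed shape of a date in the text: D.M.YYYY or DD.MM.YYYY.
DATE_RE = re.compile(r"\b([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})\b")


def convert_dates(text):
    """Parse each D.M.YYYY date and rewrite it as YYYYMMDD."""
    def repl(m):
        d = datetime.strptime(m.group(0), "%d.%m.%Y")  # like to_date(..., 'DD.MM.YYYY')
        return d.strftime("%Y%m%d")                     # like to_char(..., 'YYYYMMDD')
    return DATE_RE.sub(repl, text)


S = "1.12.2011 - event 1\n2.2.2012 - event 2\n13.3.2012 - event 44"
print(convert_dates(S))
```

Because every match goes through strptime, an impossible date such as 31.2.2012 raises ValueError instead of being silently reordered.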