I need help converting the following string into a required format. I will have several values like the ones below. Is there an easy way to do this using REGEXP or something better?
Current format coming from column A
Region[Envionment Lead|||OTC|||06340|||List Program|||TX|||Z3452|||Souther Region 05|||M7894|||California Divison|||Beginning]
Region[Coding Analyst|||BA|||04561|||Water Bridge|||CA|||M8459|||West Region 09|||K04956|||East Division|||Supreme]
Required Format of column A
Region[actingname=Envionment Lead,commonid=OTC,insturmentid=06340,commonname=List Program]
Region[actingname=Coding Analyst,commonid=BA,insturmentid=04561,commonname=Water Bridge]
revised data
**Column data**
Region[Coding Analyst|||BA|||reg pro|||04561|||08/16/2011|||Board member|||AZ|||06340|||Whiter Bridge|||CA|||M0673|||West Region 09|||K04956|||East Division|||Supreme]
**required Data**
{actingname=06340, actingid=M0673, insturmentid=BA, insturmentname=Coding Analyst, commonname=West Region 09, stdate=08/16/2011, linnumber=04561, linstate=CA, linname=Supreme}
The issue is fetching the 10th, 11th, 12th and 15th positions of the string. I can get anything below the 10th position, but not position 10 or higher. Can you please tell me what I am missing here?
'{actingname=\8,actingid=\11,insturmentid=\2,insturmentname=\1,commonname=\12, stdate=\5,linnumber=4,linstate=10,linname=15}' -- here positions 10, 11, 12 and 15 are not being fetched
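One likely cause, offered as background rather than a confirmed diagnosis: Oracle's REGEXP_REPLACE only recognizes backreferences \1 through \9, so \11 is read as \1 followed by a literal 1. A minimal sketch of a workaround is to pick fields by occurrence with REGEXP_SUBSTR instead. It assumes every field is non-empty (an empty field would shift the occurrence numbers), and the CTE names t and body are made up for illustration.

```sql
WITH t (str) AS (
  SELECT 'Region[Coding Analyst|||BA|||reg pro|||04561|||08/16/2011|||Board member|||AZ|||06340|||Whiter Bridge|||CA|||M0673|||West Region 09|||K04956|||East Division|||Supreme]'
  FROM dual
),
body (s) AS (
  -- drop the leading 'Region[' and the trailing ']' so only the fields remain
  SELECT RTRIM(SUBSTR(str, INSTR(str, '[') + 1), ']') FROM t
)
-- '[^|]+' matches one field; the 4th argument selects the Nth field
SELECT '{actingname='      || REGEXP_SUBSTR(s, '[^|]+', 1, 8)
    || ', actingid='       || REGEXP_SUBSTR(s, '[^|]+', 1, 11)
    || ', insturmentid='   || REGEXP_SUBSTR(s, '[^|]+', 1, 2)
    || ', insturmentname=' || REGEXP_SUBSTR(s, '[^|]+', 1, 1)
    || ', commonname='     || REGEXP_SUBSTR(s, '[^|]+', 1, 12)
    || ', stdate='         || REGEXP_SUBSTR(s, '[^|]+', 1, 5)
    || ', linnumber='      || REGEXP_SUBSTR(s, '[^|]+', 1, 4)
    || ', linstate='       || REGEXP_SUBSTR(s, '[^|]+', 1, 10)
    || ', linname='        || REGEXP_SUBSTR(s, '[^|]+', 1, 15)
    || '}' AS replaced
FROM body;
```

For the sample row this should return the string shown under the required data. Because each field is addressed by its occurrence number, there is no limit of nine positions.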
I used REGEXP_REPLACE
SELECT REGEXP_REPLACE(
'Region[Envionment Lead|||OTC|||06340|||List Program|||TX|||Z3452|||Souther Region 05|||M7894|||California Divison|||Beginning]',
'^Region\[([[:alpha:][:space:][:digit:]]*)\|\|\|([[:alpha:]]*)\|\|\|([[:digit:]]*)\|\|\|([[:alpha:][:space:][:digit:]]*).*',
'Region[actingname=\1,commonid=\2,instrumentid=\3,commonname=\4]') as replaced
FROM dual
Or, as an update, it would be:
UPDATE table1
SET col1 = REGEXP_REPLACE(
col1,
'^Region\[([[:alpha:][:space:][:digit:]]*)\|\|\|([[:alpha:]]*)\|\|\|([[:digit:]]*)\|\|\|([[:alpha:][:space:][:digit:]]*).*',
'Region[actingname=\1,commonid=\2,instrumentid=\3,commonname=\4]')
You can use regexp_substr and listagg consecutively
with t1(str1) as
(
select 'Region[Coding Analyst|||BA|||04561|||Water Bridge]' from dual
), t2(str2) as
(
select 'actingname,commonid,insturmentid,commonname' from dual
), t3 as
(
select regexp_substr(str1, '[^|||]+', 1, level) str1,
regexp_substr(str2, '[^,]+', 1, level)||'=' str2,
level as lvl
from t1
cross join t2
connect by level <= regexp_count(str1, '[^|||]+')
), t4 as
(
select case when lvl = 1 then
replace(str1,'[','['||str2)
else
str2||str1
end as str, lvl
from t3
)
select listagg(str,',') within group (order by lvl) as "Result String" from t4;
Result String
----------------------------------------------------------------------------------------
Region[actingname=Coding Analyst,commonid=BA,insturmentid=04561,commonname=Water Bridge]
P.S. I used the second row as the sample and took only the first 4 substrings separated by triple pipes, because there are 4 labels (each ending with an equals sign).
this will work:
select substr(regexp_replace(regexp_replace(regexp_replace
(regexp_replace(regexp_replace("col1",'\[','[actingname='),
'\|\|\|',',commonid=',1,1,'i'),
'\|\|\|',',insturmentid=',1,1,'i'),
'\|\|\|',',commonname=',1,1,'i'),
'\|',']',1,1,'i'),
1,regexp_instr(regexp_replace(regexp_replace(regexp_replace
(regexp_replace(regexp_replace("col1",'\[','[actingname='),
'\|\|\|',',commonid=',1,1,'i'),
'\|\|\|',',insturmentid=',1,1,'i'),
'\|\|\|',',commonname=',1,1,'i'),
'\|',']',1,1,'i'),'\]')-1 )||']'
from Table1;
check:
http://sqlfiddle.com/#!4/3ddfa0/11
thanks!!!!!!
Let's say the full string is
The following example examines the string, looking for the first substring bounded by comas
and the substring is
substing bounded
Is there any way, using SQL, to check whether the full string contains a substring that matches at least 90%, like substing bounded and substring bounded in my example?
The substring could consist of several words, so I can't simply split the full string into words.
First transform your text into a table of words. You'll find many posts on this topic on SO.
You'll have to adjust the list of delimiter characters to extract the words only.
This is a sample query
with t1 as (select 1 rn, 'The following example examines the string, looking for the first substring bounded by comas' col from dual ),
t2 as (select rownum colnum from dual connect by level < 16 /* (max) number of words */),
t3 as (select t1.rn, t2.colnum, rtrim(ltrim(regexp_substr(t1.col,'[^ ,]+', 1, t2.colnum))) col from t1, t2
where regexp_substr(t1.col, '[^ ,]+', 1, t2.colnum) is not null)
select * from t3;
COL
----------
The
following
example
examines
...
In the next step, use the Levenshtein distance to find the closest word.
with t1 as (select 1 rn, 'The following example examines the string, looking for the first substring bounded by comas' col from dual ),
t2 as (select rownum colnum from dual connect by level < 16 /* (max) number of words */),
t3 as (select t1.rn, t2.colnum, rtrim(ltrim(regexp_substr(t1.col,'[^ ,]+', 1, t2.colnum))) col from t1, t2
where regexp_substr(t1.col, '[^ ,]+', 1, t2.colnum) is not null)
select col, str, UTL_MATCH.EDIT_DISTANCE(col, str) distance
from t3
cross join (select 'commas' str from dual)
order by 3;
COL STR DISTANCE
---------- ------ ----------
comas commas 1
for commas 5
examines commas 6
...
Check the definition of the Levenshtein Distance and define a threshold on the distance to get your candidate words.
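The threshold can also be expressed as a percentage with UTL_MATCH.EDIT_DISTANCE_SIMILARITY, which normalizes the edit distance to a score from 0 (no match) to 100 (identical). A minimal sketch, reusing the same word split; the cutoff of 80 is an arbitrary assumption to tune for your data, and the CTE name words is made up:

```sql
with t1 as (select 'The following example examines the string, looking for the first substring bounded by comas' col from dual),
words as (
  -- one row per word; space and comma are treated as delimiters
  select regexp_substr(col, '[^ ,]+', 1, level) word
  from t1
  connect by level <= regexp_count(col, '[^ ,]+')
)
select word,
       UTL_MATCH.EDIT_DISTANCE_SIMILARITY(word, 'commas') similarity
from words
where UTL_MATCH.EDIT_DISTANCE_SIMILARITY(word, 'commas') >= 80 -- assumed cutoff
order by similarity desc;
```

Keep in mind that a single edit costs many points on a short word, so a strict cutoff of 90 may be too demanding for words of five or six letters.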
To match independently of word boundaries, simply scan through your input and take every substring whose length is that of your match string, adjusted for the expected difference, e.g. by adding some 10%.
You may limit the candidates by keeping only those substrings that start on a word boundary. The rest is the same distance calculation.
with txt as (select 'The following example examines the string, looking for the first substring bounded by comas' txt from dual),
str as (select 'substing bounded' str from dual),
t1 as (select substr(txt, rownum, (select length(str) * 1.1 from str)) substr, /* add 10% length for the match */
(select str from str) str
from txt connect by level < (select length(txt) from txt) - (select length(str) from str))
select SUBSTR, STR,
UTL_MATCH.EDIT_DISTANCE(SUBSTR, STR) distance
from t1
order by 3;
SUBSTR STR DISTANCE
-------------------- ---------------- ----------
substring bounded substing bounded 1
ubstring bounded substing bounded 3
substring bounde substing bounded 3
t substring bound substing bounded 5
...
Experiment with the SOUNDEX function.
I haven't tested this but this might help you on your way:
WITH strings AS (
-- '[^ ]+' matches a run of non-space characters, i.e. one word per level
select regexp_substr('The following example examines the string, looking for the first substring bounded by comas','[^ ]+', 1, level) ss
from dual
connect by regexp_substr('The following example examines the string, looking for the first substring bounded by comas', '[^ ]+', 1, level) is not null
)
SELECT ss
FROM strings
WHERE SOUNDEX(ss) = SOUNDEX( 'commas' ) ;
The REGEXP_SUBSTR with CONNECT BY splits the long string into words (by space); amend the delimiter as required to include punctuation marks etc.
Here we are relying on the built-in SOUNDEX matching our expectations.
One of the columns in my Oracle table has the value below:
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want these values in ascending order using a select query, so the output would look like:
output
==================
25,50,100,1000,5000
I tried this link, but it looks like it has some restriction on the number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on the sample data you posted, a relatively simple query would work (you need lines 3 - 7). If the data doesn't really look like that, the query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
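If the table holds more than one row, the CONNECT BY above would need adjusting, since it is written for a single input row. One possible sketch, assuming Oracle 12c or later (for CROSS JOIN LATERAL) and a hypothetical id column that identifies each row:

```sql
with my_table (id, csv_val) as (
  select 1, '50,100,25,5000,1000' from dual union all
  select 2, '25464,89453,15686'   from dual
)
select t.id,
       listagg(x.token, ',') within group (order by to_number(x.token)) result
from my_table t
cross join lateral (
  -- split only the current row's list; one token per level
  select regexp_substr(t.csv_val, '[^,]+', 1, level) token
  from dual
  connect by level <= regexp_count(t.csv_val, ',') + 1
) x
group by t.id;
```

The lateral subquery runs once per row of my_table, so the tokens of different rows are never mixed before LISTAGG sorts them numerically.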
sample column data:
Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:
Error in CREATE_ACC_STATEMENT() [line 16]
I am looking for a way to extract only the uppercase words (table names) separated by underscores. I want the whole table name, the maximum is 3 underscores and the minimum is 1 underscore. I would like to ignore any capital letters that are initcap.
You can just use regexp_substr():
select regexp_substr(str, '[A-Z_]{3,}', 1, 1, 'c')
from (select 'Failure on table TOLL_USR_TRXN_HISTORY' as str from dual) x;
The pattern says to find substrings with capital letters or underscores, at least 3 characters long. The 1, 1 means start from the first position and return the first match. The 'c' makes the search case-sensitive.
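If the rule of 1 to 3 underscores should be enforced by the pattern itself, a stricter variant is sketched below. It assumes table names consist only of uppercase letters and underscores (no digits); the test rows are copied from the sample data, and the CTE name msgs is made up:

```sql
with msgs (str) as (
  select 'Failure on table TOLL_USR_TRXN_HISTORY:' from dual union all
  select 'Failure on table DOCUMENT_IMAGES:' from dual union all
  select 'Error in CREATE_ACC_STATEMENT() [line 16]' from dual
)
-- [A-Z]+          the first uppercase word
-- (_[A-Z]+){1,3}  then 1 to 3 further words, each introduced by an underscore
select str,
       regexp_substr(str, '[A-Z]+(_[A-Z]+){1,3}', 1, 1, 'c') as table_name
from msgs;
```

The initcap words (Failure, Error) are skipped because a lone capital letter is not followed by an underscore. Note that a name with more than 3 underscores would still match, but only up to its fourth word.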
You may use a SQL SELECT statement like the following for each individual line of your text
(Failure on table TOLL_USR_TRXN_HISTORY in the case below):
select regexp_replace(q.word, '[^a-zA-Z0-9_]+', '') as word
from
(
select substr(str,nvl(lag(spc) over (order by lvl),1)+1*sign(lvl-1),
abs(decode(spc,0,length(str),spc)-nvl(lag(spc) over (order by lvl),1))) word,
nvl(lag(spc) over (order by lvl),1) lg
from
(
with tab as
( select 'Failure on table TOLL_USR_TRXN_HISTORY' str from dual )
select instr(str,' ',1,level) spc, str, level lvl
from tab
connect by level <= 10
)
) q
where lg > 0
and upper(regexp_replace(q.word, '[^a-zA-Z0-9_]+', ''))
= regexp_replace(q.word, '[^a-zA-Z0-9_]+', '')
and ( nvl(length(regexp_substr(q.word,'_',1,1)),0)
+ nvl(length(regexp_substr(q.word,'_',1,2)),0)
+ nvl(length(regexp_substr(q.word,'_',1,3)),0)) > 0
and nvl(length(regexp_substr(q.word,'_',1,4)),0) = 0;
An alternative way to get only the table name from the error message below. This query works only if the table name is at the end of the message, in the form shown:
with t as( select 'Failure on table TOLL_USR_TRXN_HISTORY:' as data from dual)
SELECT RTRIM(substr(data,instr(data,' ',-1)+1),':') from t
New query for all messages:
select replace (replace ( 'Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:' , 'Failure on table', ' ' ),':',' ') from dual
I have a scenario like when a column value exceeds the length of 10 characters, I need to take a sub-string for only 10 characters (left most) but if it is shorter than that it should be left padded with zeroes. I tried the following:
with data1 as (select '1234567890123' as dummy1 from dual)
select CASE when (length(dummy1)>10) then substr(dummy1,1,10) else lpad(dummy1,10,'0') end from data1;
But this seems like quite a long way to do it. Is there a shorter way to achieve this, maybe an Oracle function?
I tried to Google this but could not find any relevant result.
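LPAD is enough to do the job. When the input is longer than the requested length, LPAD returns only the leftmost characters that fit: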
SELECT LPAD( '1234567890123', 10, '0' ) AS formatted
FROM dual;
Just use SUBSTR and LPAD together:
WITH data ( value ) AS (
SELECT '1234567890123' FROM DUAL UNION ALL
SELECT '1' FROM DUAL
)
SELECT LPAD( SUBSTR( value, 1, 10 ), 10, '0' ) AS formatted
FROM data;
Output:
FORMATTED
----------
1234567890
0000000001
I have a string with groups of numbers, and I'd like to turn it into a string whose groups have a constant length. Currently I use two regexp_replace calls: the first prepends 10 zeros to each group, and the second cuts each group down to its last 10 digits:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace(
regexp_replace(txt, '(\d+)','0000000000\1')
,'\d+(\d{10})','\1') from s ;
But I'd like to use only one regex, something like
regexp_replace(txt, '(\d+)',lpad('\1',10,'0'))
But it doesn't work, because lpad is executed before the regexp. Do you have any ideas?
With a slightly different approach, you can try the following:
with s(id, txt) as
(
select rownum, txt
from (
select '1030123:12031:1341' as txt from dual union all
select '1234:0123456789:1341' from dual
)
)
SELECT listagg(lpad(regexp_substr(s.txt, '[^:]+', 1, lines.column_value), 10, '0'), ':') within group (order by column_value) txt
FROM s,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual CONNECT BY instr(s.txt, ':', 1, LEVEL - 1) > 0
) AS sys.odciNumberList )) lines
group by id
TXT
-----------------------------------
0001030123:0000012031:0000001341
0000001234:0123456789:0000001341
This uses CONNECT BY to split every string on the separator ':', then uses LPAD to pad each piece to 10 characters, and finally aggregates the pieces to rebuild one row per input string containing the concatenation of the padded values.
This version pads each non-empty group and leaves empty groups untouched (e.g. 123::456 becomes 0000000123::0000000456):
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace (regexp_replace (txt,'(\d+)',lpad('0',10,'0') || '\1'),'0*(\d{10})','\1')
from s
;