Comma-delimited fields in a csv file in plsql - sql

I have
WHILE INSTR (l_buffer, ',', 1, l_col_no) != 0
which checks whether l_buffer contains at least l_col_no commas and, if so, enters the loop.
Now I have a file with values
CandidateNumber,rnumber,title,OrganizationCode,OrganizationName,JobCode,JobName
10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03,ORGA03 HR & Admin
In this file, "Org Land, Inc." is treated as two values because of the comma in between. Is there a way to treat it as one value by using INSTR or anything else?

Horrible idea. If you are forced to use character-delimited strings, the least you should be able to require is that the delimiter be a character that is all but guaranteed not to appear in regular field values.
The problem you raised can be solved. I show below a solution - probably not close to the most efficient, but at least it shouldn't be difficult to follow the logic. I intentionally chose an example (the fifth string) to demonstrate how it can fail. I assumed any commas between a pair of double-quotes (an opening one and a closing one) should become "invisible" - treated as if they were not delimiters, but part of the field value. That breaks if a double-quote is used in a way different from the "usual" - see my sample string #5. It will also break on any other "natural" uses of comma (where they are not meant as a delimiter) - for example, what if you have a field with a value of $1,000.00? Now you need to "escape" that comma too. One could probably come up with at least ten more similar situations - are you going to code around all of them?
Now, for my own learning and practice, I pretended the ONLY way a comma may need to be "escaped" (to become invisible to the tokenization process) is if it is enclosed between an opening and a closing double-quote (determined simply by ordering: a double-quote with an odd count from the beginning of the string is an opening one, and a double-quote with an even count is a closing one). Here is the solution; test strings at the top, including a few to test proper treatment of nulls, and the output following immediately after.
Good luck!
with test_strings (r, s) as (
select 1, 'abdc, ronfn 0003, "ABC, Inc.", 9939' from dual union all
select 2, 'New Delhi' from dual union all
select 3, null from dual union all
select 4, ',' from dual union all
select 5, 'If needed, use double quote("), OK?' from dual
),
t (r, s) as (
select r, ',' || s || ',' from test_strings
),
ct (r, nc, nq) as (
select r, regexp_count(s, ','), regexp_count(s, '"') from t
),
c (r, pos) as (
select t.r, instr(t.s, ',', 1, level) from t join ct on t.r = ct.r
connect by level <= ct.nc and t.r = prior t.r and prior sys_guid() is not null
),
q (r, pos) as (
select t.r, instr(t.s, '"', 1, level) from t join ct on t.r = ct.r
connect by level <= ct.nq and t.r = prior t.r and prior sys_guid() is not null
),
p (r, pos_from, pos_to, rn) as (
select r, pos, lead(pos) over (partition by r order by pos),
row_number() over (partition by r order by pos) from c
where mod((select count(1) from q where q.r = c.r and q.pos != 0
and q.pos < c.pos), 2) = 0
)
select p.r as string_number, p.rn as token_number,
substr(t.s, p.pos_from + 1, p.pos_to - p.pos_from - 1) as token
from t join p on t.r = p.r
where p.pos_to is not null
order by string_number, token_number
;
Results:
STRING_NUMBER TOKEN_NUMBER TOKEN
------------- ------------ --------------------
1 1 abdc
1 2 ronfn 0003
1 3 "ABC, Inc."
1 4 9939
2 1 New Delhi
3 1
4 1
4 2
5 1 If needed
9 rows selected.
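The question itself walks l_buffer in a PL/SQL loop, so the same quote rule used above (a comma is a delimiter only when an even number of double quotes precedes it) can also be sketched procedurally, one character at a time. This is a minimal sketch, not a drop-in replacement for the original loop: it assumes no escaped quotes ("") inside quoted fields, a line that fits in VARCHAR2(4000), and a hard-coded sample line; l_buffer and l_col_no follow the question's names, while l_ch, l_field and l_in_quote are hypothetical. The double quotes themselves are not copied into the output.

```sql
-- SQL*Plus / SQLcl settings: show DBMS_OUTPUT, and keep "&" in the sample
-- data from being treated as a substitution variable
set serveroutput on
set define off

declare
  -- sample line from the question; in real code this would come from the file
  l_buffer   varchar2(4000) :=
    '10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03,ORGA03 HR & Admin';
  l_ch       varchar2(1 char);
  l_field    varchar2(4000);
  l_col_no   pls_integer := 0;
  l_in_quote boolean := false;  -- true between an opening and a closing double quote
begin
  for i in 1 .. length(l_buffer) loop
    l_ch := substr(l_buffer, i, 1);
    if l_ch = '"' then
      l_in_quote := not l_in_quote;        -- toggle state; the quote is not copied
    elsif l_ch = ',' and not l_in_quote then
      l_col_no := l_col_no + 1;            -- a real delimiter ends the current field
      dbms_output.put_line(l_col_no || ': ' || l_field);
      l_field := null;
    else
      l_field := l_field || l_ch;          -- ordinary character, or a quoted comma
    end if;
  end loop;
  -- the last field is not followed by a comma
  l_col_no := l_col_no + 1;
  dbms_output.put_line(l_col_no || ': ' || l_field);
end;
/
```

For the sample line this prints seven fields, with field 5 being Org Land, Inc. as a single value. Like the SQL answer, it will still misbehave on stray or escaped double quotes.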

Use Notepad++ and change all commas to ';'. Before that, use a regular expression to change every comma between double quotes to some other character, let's say '#'. Then press Ctrl+H and replace ',' with ';', and then '#' with ','.

Related

SQL: using regexp_substr or regexp_extract, looking for the regex pattern that will only return the string between one character and a space

The row I am trying to parse from is a series of string values separated only by spaces. Sample below:
TX:123 SP:XapZNsyeS INST:456123
I need to use either regexp_substr or regexp_extract to return only values for the string that appears after "TX:" or "SP:", etc. So essentially an expression that only captures the string after a string (e.g. "TX:") and before a space (" ").
Here's one way to split on two delimiters. This works on Oracle 12c (you included the Oracle regexp-substr tag). Using a WITH clause, first set up the original data, then split on a space or the end of the line, then break each element into name-value pairs.
WITH tbl_original_data(ID, str) AS (
SELECT 1, 'TX:123 SP:XapZNsyeS INST:456123' FROM dual UNION ALL
SELECT 2, 'MI:321 SP:MfeKLgkrJ INST:654321' FROM dual
),
tbl_split_on_space(ID, ELEMENT) AS (
SELECT ID,
REGEXP_SUBSTR(str, '(.*?)( |$)', 1, LEVEL, NULL, 1)
FROM tbl_original_data
CONNECT BY REGEXP_SUBSTR(str, '(.*?)( |$)', 1, LEVEL) IS NOT NULL
AND PRIOR ID = ID
AND PRIOR SYS_GUID() IS NOT NULL
)
--SELECT * FROM tbl_split_on_space;
SELECT ID,
REGEXP_REPLACE(ELEMENT, '^(.*):.*', '\1') NAME,
REGEXP_REPLACE(ELEMENT, '.*:(.*)$', '\1') VALUE
FROM tbl_split_on_space;
ID NAME VALUE
---------- ---------- ----------
1 TX 123
1 SP XapZNsyeS
1 INST 456123
2 MI 321
2 SP MfeKLgkrJ
2 INST 654321
6 rows selected.
EDIT: Realizing this answer is a little more than was asked for, here's a simplified answer that returns one element. Don't forget to allow for either a space or the end of the line after the value, in case your element is at the end of the line.
WITH tbl_original_data(ID, str) AS (
SELECT 1, 'TX:123 SP:XapZNsyeS INST:456123' FROM dual
)
SELECT REGEXP_SUBSTR(str, '.*?TX:(.*)( |$)', 1, 1, NULL, 1) TX_VALUE
FROM tbl_original_data;
TX_VALUE
--------
123
1 row selected.
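Depending on how the regex engine resolves a mix of non-greedy and greedy quantifiers, the greedy (.*) in the simplified query could capture everything up to the end of the line instead of just 123. A sketch that does not depend on that interplay uses a negated character class: [^ ]+ can never run past the next space, so the end of the line needs no special handling. The sample row is the one from the question; the column aliases are illustrative, and it assumes a key such as SP: never appears as the tail of a longer key.

```sql
with tbl_original_data (id, str) as (
  select 1, 'TX:123 SP:XapZNsyeS INST:456123' from dual
)
-- subexpression 1 (last argument) returns only the part after the key
select regexp_substr(str, 'TX:([^ ]+)',   1, 1, null, 1) as tx_value,    -- 123
       regexp_substr(str, 'SP:([^ ]+)',   1, 1, null, 1) as sp_value,    -- XapZNsyeS
       regexp_substr(str, 'INST:([^ ]+)', 1, 1, null, 1) as inst_value   -- 456123
from tbl_original_data;
```

If a key is missing from the row, the corresponding column is simply NULL.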

Match a pattern using some condition and replace it with some string using regexp_replace in Oracle

I have a requirement to identify a particular string in a text using the conditions below:
1. Any string with whitespace before and after it, OR
2. Any string with a . (dot) as prefix and whitespace as suffix, OR
3. Any string with whitespace as prefix and , (comma) as suffix
Once found, I need to replace it with another string without replacing the prefix and suffix mentioned above. This needs to be done in PL/SQL code in Oracle (preferably using the regexp_replace function).
Example:
Text : 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,'
Replace string: 'replaced'
Output: 'This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced'
I know this is a weird example, but the actual requirement is even more weird than this. Please guide me how to achieve this.
Thank you in advance.
Rather than trying to write one giant regex, it may be easier to split the string into rows, one per token. You can do this by adapting any of the common CSV-to-rows methods, e.g.:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-Za-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select * from vals;
STR L
This, 1
is 2
a 3
sample_text, 4
which_needs_. 5
to_be_replaced 6
as 7
per, 8
the 9
matching. 10
criteria. 11
defined 12
above, 13
Now replace each of these according to your rules. You're only dealing with one token at a time, so it's easy to see which you're replacing correctly. This makes the regex easier to write and debug:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-Za-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'^[A-Za-z_]+',
'replaced'
)
end replaced, l
from vals;
REPLACED L
This, 1
replaced 2
replaced 3
replaced, 4
which_needs_. 5
replaced 6
replaced 7
replaced, 8
replaced 9
matching. 10
criteria. 11
replaced 12
replaced, 13
Then you listagg the values back together to get the final string:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-Za-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
), replaces as (
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'^[A-Za-z_]+',
'replaced'
)
end replaced, l
from vals
)
select listagg ( replaced )
within group ( order by l ) s
from replaces;
S
This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced,
Ensure you test thoroughly! In my experience, you find more exceptions/refinements when you have complex rules like this. So it's likely you'll have to tinker with the replacement rules in the case expression.

Apply order by in comma separated string in oracle

I have one of the column in oracle table which has below value :
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want these values in ascending order using a select query; the output would look like:
output
==================
25,50,100,1000,5000
I tried the approach from this link, but it looks like it has some restriction on the number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on the sample data you posted, a relatively simple query would work (you need lines 3 - 7). If the data doesn't really look like that, the query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
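One adjustment the simple query needs as soon as the table holds more than one row: a bare CONNECT BY LEVEL links every row's levels to every other row's, which multiplies the tokens. A common fix, the same idiom used in other answers here, is to keep the hierarchy within one row with PRIOR id = id, add PRIOR SYS_GUID() IS NOT NULL so Oracle does not report a CONNECT BY loop, and then group by the row's key. The id column below is a hypothetical key; the data is the question's sample plus one extra row.

```sql
with my_table (id, csv_val) as (
  select 1, '50,100,25,5000,1000' from dual union all
  select 2, '25464,89453,15686'   from dual
)
select id,
       -- sort numerically, not as strings
       listagg(token, ',') within group (order by to_number(token)) as result
from (
  select id, regexp_substr(csv_val, '[^,]+', 1, level) as token
  from my_table
  connect by level <= regexp_count(csv_val, ',') + 1
         and prior id = id                 -- stay within the same row
         and prior sys_guid() is not null  -- avoid the CONNECT BY loop error
)
group by id
order by id;
```

This should return 25,50,100,1000,5000 for id 1 and 15686,25464,89453 for id 2. The 4000-character LISTAGG limit mentioned in the other answer still applies.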

Using Oracle REGEXP_SUBSTR to extract uppercase data separated by underscores

sample column data:
Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:
Error in CREATE_ACC_STATEMENT() [line 16]
I am looking for a way to extract only the uppercase words (table names) separated by underscores. I want the whole table name; a name has at most 3 underscores and at least 1. I would like to ignore capitalized (initcap) words.
You can just use regexp_substr():
select regexp_substr(str, '[A-Z_]{3,}', 1, 1, 'c')
from (select 'Failure on table TOLL_USR_TRXN_HISTORY' as str from dual) x;
The pattern says to find substrings with capital letters or underscores, at least 3 characters long. The 1, 1 means start from the first position and return the first match. The 'c' makes the search case-sensitive.
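The [A-Z_]{3,} pattern does not check the underscore count the question asks for; it would also return a run such as ABC with no underscore at all. A sketch of a stricter pattern: uppercase letters, followed by one to three groups of an underscore plus more uppercase letters. It assumes table names contain only A-Z and underscores (no digits), and a name with four or more underscores would come back cut off after the third group rather than being rejected. The three sample lines are the ones from the question.

```sql
with t (str) as (
  select 'Failure on table TOLL_USR_TRXN_HISTORY:'   from dual union all
  select 'Failure on table DOCUMENT_IMAGES:'         from dual union all
  select 'Error in CREATE_ACC_STATEMENT() [line 16]' from dual
)
-- 'c' forces a case-sensitive match, so initcap words such as Failure are skipped
select str,
       regexp_substr(str, '[A-Z]+(_[A-Z]+){1,3}', 1, 1, 'c') as table_name
from t;
```

This should return TOLL_USR_TRXN_HISTORY, DOCUMENT_IMAGES and CREATE_ACC_STATEMENT respectively.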
You can use a SELECT statement like the following for each individual line of your text (Failure on table TOLL_USR_TRXN_HISTORY in the case below):
select regexp_replace(q.word, '[^a-zA-Z0-9_]+', '') as word
from
(
select substr(str,nvl(lag(spc) over (order by lvl),1)+1*sign(lvl-1),
abs(decode(spc,0,length(str),spc)-nvl(lag(spc) over (order by lvl),1))) word,
nvl(lag(spc) over (order by lvl),1) lg
from
(
with tab as
( select 'Failure on table TOLL_USR_TRXN_HISTORY' str from dual )
select instr(str,' ',1,level) spc, str, level lvl
from tab
connect by level <= 10
)
) q
where lg > 0
and upper(regexp_replace(q.word, '[^a-zA-Z0-9_]+', ''))
= regexp_replace(q.word, '[^a-zA-Z0-9_]+', '')
and ( nvl(length(regexp_substr(q.word,'_',1,1)),0)
+ nvl(length(regexp_substr(q.word,'_',1,2)),0)
+ nvl(length(regexp_substr(q.word,'_',1,3)),0)) > 0
and nvl(length(regexp_substr(q.word,'_',1,4)),0) = 0;
An alternate way to get only the table name from the error message below; this query works only if the table name is at the end of the message, as in the example:
with t as( select 'Failure on table TOLL_USR_TRXN_HISTORY:' as data from dual)
SELECT RTRIM(substr(data,instr(data,' ',-1)+1),':') from t
New query for all messages:
select replace (replace ( 'Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:' , 'Failure on table', ' ' ),':',' ') from dual

How to remove duplicates from space separated list by Oracle regexp_replace? [duplicate]

This question already has answers here:
How to remove duplicates from comma separated list by regexp_replace in Oracle?
I have a list 'A B A A C D'. My expected result is 'A B C D'. So far, I have found this expression on the web:
regexp_replace(l_user ,'([^,]+)(,[ ]*\1)+', '\1');
But it is for a comma-separated list. What modification is needed to make it work for a space-separated list? The order of the result does not matter.
If I understand correctly, you don't simply need to replace ',' with a space; you also need to remove duplicates in a smarter way.
If I modify that expression to work with space instead of ',', I get
select regexp_replace('A B A A C D' ,'([^ ]+)( [ ]*\1)+', '\1') from dual
which gives 'A B A C D', not what you need.
A way to get your needed result could be the following, a bit more complicated:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(case when rn = 1 then str end, ' ') within group (order by lev)
from (
select str, row_number() over (partition by str order by lev) rn, lev
from (
SELECT trim(regexp_substr(s, '[^ ]+', 1, level)) str,
level as lev
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
)
My main problem here is that I'm not able to build a regexp that checks for non-adjacent duplicates, so I need to split the string, check for duplicates and then aggregate the non-duplicated values again, keeping the order.
If you don't mind the order of the tokens in the result string, this can be simplified:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(str, ' ') within group (order by 1)
from (
SELECT distinct trim(regexp_substr(s, '[^ ]+', 1, level)) as str
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
Assuming you want to keep the component strings in the order of their first occurrence (and not, say, reorder them alphabetically - your example is poorly chosen in this regard, because both lead to the same result), the problem is more complicated, because you must keep track of order too. Then for each letter you must keep just the first occurrence - here is where row_number() helps.
with
inputs ( str ) as ( select 'A B A A C D' from dual)
-- end test data; solution begins below this line
select listagg(token, ' ') within group (order by id) as new_str
from (
select level as id, regexp_substr(str, '[^ ]+', 1, level) as token,
row_number() over (
partition by regexp_substr(str, '[^ ]+', 1, level)
order by level ) as rn
from inputs
connect by regexp_substr(str, '[^ ]+', 1, level) is not null
)
where rn = 1
;
Alternatively, with XQuery:
select xmlquery('string-join(distinct-values(ora:tokenize(.," ")), " ")' passing 'A B A A C D' returning content) result from dual