Convert a sequence of 0s and 1s to a print-style page list - sql

I need to convert a string of 0s and 1s into a sequence of integers representing the 1s, similar to a page selection sequence in a print dialog.
e.g. '0011001110101' -> '3-4,7-9,11,13'
Is it possible to do this in a single SQL select (in Oracle 11g)?
I can get an individual list of the page numbers with the following:
with data as (
select 'K1' KEY, '0011001110101' VAL from dual
union select 'K2', '0101000110' from dual
union select 'K3', '011100011010' from dual
)
select
KEY,
listagg(ords.column_value, ',') within group (
order by ords.column_value
) PAGES
from
data
cross join (
table(cast(multiset(
select level
from dual
connect by level <= length(VAL)
) as sys.OdciNumberList)) ords
)
where
substr(VAL, ords.column_value, 1) = '1'
group by
KEY
But that doesn't do the grouping (e.g. returns "3,4,7,8,9,11,13" for the first value).
If I could assign a group number every time the value changes then I could use analytic functions to get the min and max for each group. I.e. if I could generate the following then I'd be set:
Key Page Val Group
K1 1 0 1
K1 2 0 1
K1 3 1 2
K1 4 1 2
K1 5 0 3
K1 6 0 3
K1 7 1 4
K1 8 1 4
K1 9 1 4
K1 10 0 5
K1 11 1 6
K1 12 0 7
K1 13 1 8
But I'm stuck on that.
Anyone have any ideas, or another approach to get this?
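The grouping idea described above (start a new group every time the value changes, then take the first and last page of each group of 1s) can be checked outside the database. The following Python is only a sketch of that logic for verifying expected results, not an Oracle solution; the function name `page_list` is made up for this illustration.

```python
from itertools import groupby


def page_list(bits: str) -> str:
    """Turn a string of 0s and 1s into a print-style page list (1-based)."""
    parts = []
    page = 0  # number of characters consumed so far
    # groupby starts a new group every time the character changes,
    # which corresponds to the "group number" column in the question.
    for value, run in groupby(bits):
        length = len(list(run))
        first, last = page + 1, page + length
        page = last
        if value == "1":
            parts.append(str(first) if first == last else f"{first}-{last}")
    return ",".join(parts)


print(page_list("0011001110101"))  # 3-4,7-9,11,13
```

Each group of 1s contributes either a single page or a `first-last` range, which is the same min/max-per-group step the analytic-function approach would perform.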

First of all, generate one row per run of 1s using LEVEL:
select regexp_instr('0011001110101', '1+', 1, LEVEL) istr,
regexp_substr('0011001110101', '1+', 1, LEVEL) strlen
FROM dual
CONNECT BY regexp_substr('0011001110101', '1+', 1, LEVEL) is not null
Then the rest is easy with listagg:
with data as
(
select 'K1' KEY, '0011001110101' VAL from dual
union select 'K2', '0101000110' from dual
union select 'K3', '011100011010' from dual
)
SELECT key,
(SELECT listagg(CASE
WHEN length(regexp_substr(val, '1+', 1, LEVEL)) = 1 THEN
to_char(regexp_instr(val, '1+', 1, LEVEL))
ELSE
regexp_instr(val, '1+', 1, LEVEL) || '-' ||
to_char(regexp_instr(val, '1+', 1, LEVEL) +
length(regexp_substr(val, '1+', 1, LEVEL)) - 1)
END,
',') within GROUP(ORDER BY regexp_instr(val, '1+', 1, LEVEL))
from dual
CONNECT BY regexp_substr(data.val, '1+', 1, LEVEL) IS NOT NULL) val
FROM data

Using a recursive sub-query factoring clause without regular expressions:
Oracle Setup:
CREATE TABLE data ( key, val ) AS
SELECT 'K1', '0011001110101' FROM DUAL UNION ALL
SELECT 'K2', '0101000110' FROM DUAL UNION ALL
SELECT 'K3', '011100011010' FROM DUAL UNION ALL
SELECT 'K4', '000000000000' FROM DUAL UNION ALL
SELECT 'K5', '000000000001' FROM DUAL;
Query:
WITH ranges ( key, val, pos, rng ) AS (
SELECT key,
val,
INSTR( val, '1', 1 ), -- Position of the first 1
NULL
FROM data
UNION ALL
SELECT key,
val,
INSTR( val, '1', INSTR( val, '0', pos ) ), -- Position of the next 1
rng || ',' || CASE
WHEN pos = LENGTH( val ) -- Single 1 at end-of-string
OR pos = INSTR( val, '0', pos ) - 1 -- 1 immediately followed by 0
THEN TO_CHAR( pos )
WHEN INSTR( val, '0', pos ) = 0 -- Multiple 1s until end-of-string
THEN pos || '-' || LENGTH( val )
ELSE pos || '-' || ( INSTR( val, '0', pos ) - 1 ) -- Normal range
END
FROM ranges
WHERE pos > 0
)
SELECT KEY,
VAL,
SUBSTR( rng, 2 ) AS rng -- Strip the leading comma
FROM ranges
WHERE pos = 0 OR val IS NULL
ORDER BY KEY;
Output
KEY VAL RNG
--- ------------- -------------
K1 0011001110101 3-4,7-9,11,13
K2 0101000110 2,4,8-9
K3 011100011010 2-4,8-9,11
K4 000000000000
K5 000000000001 12
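To follow the scanning logic of the recursive query step by step, here is a rough Python analogue. It is an illustration only: `str.find` is 0-based and returns -1 when nothing is found, whereas Oracle's INSTR is 1-based and returns 0, so the arithmetic differs slightly from the SQL. The function name `page_list_by_scanning` is hypothetical.

```python
def page_list_by_scanning(val: str) -> str:
    """Jump from each run of 1s to the next, like the INSTR-based recursion."""
    parts = []
    pos = val.find("1")  # 0-based index of the first 1, or -1 if there is none
    while pos != -1:
        zero = val.find("0", pos)  # first 0 after the current run, or -1
        end = len(val) if zero == -1 else zero  # run covers pages pos+1 .. end
        first = pos + 1
        parts.append(str(first) if first == end else f"{first}-{end}")
        if zero == -1:
            break  # the run reached the end of the string
        pos = val.find("1", zero)  # start of the next run, if any
    return ",".join(parts)


for key, val in [("K1", "0011001110101"), ("K4", "000000000000"), ("K5", "000000000001")]:
    print(key, page_list_by_scanning(val))
```

Each loop iteration plays the role of one recursive step: it emits the current run and moves `pos` to the next 1, stopping when no further 1 exists.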

Here is a slightly more efficient version of Isalamon's solution (using a hierarchical query). It is slightly more efficient because I use a single hierarchical query instead of multiple ones (in correlated subqueries), and I calculate the length of each sequence of 1's just once, in the inner query. (In fact it is calculated only once anyway, but the function call itself has some overhead.)
This version also treats inputs like '00000' and NULL correctly. Isalamon's solution doesn't, and MT0's solution does not return a row when the input value is NULL. It is not clear if NULL is even possible in the input data, and if it is, what the desired result is; I assumed a row should be returned, with the page_list NULL as well.
Optimizer cost for this version is 17, vs. 18 for Isalamon's solution and 33 for MT0's. However, optimizer cost doesn't take into account the significantly slower processing of regular expressions compared to standard string functions; if speed of execution is important, MT0's solution should definitely be tried since it may prove faster.
with data ( key, val ) as (
select 'K1', '0011001110101' from dual union all
select 'K2', '0101000110' from dual union all
select 'K3', '011100011010' from dual union all
select 'K4', '000000000000' from dual union all
select 'K5', '000000000001' from dual union all
select 'K6', null from dual union all
select 'K7', '1111111' from dual union all
select 'K8', '1' from dual
)
-- End of test data (not part of the solution); SQL query begins below this line.
select key, val,
listagg(case when len = 1 then to_char(s_pos)
when len > 1 then to_char(s_pos) || '-' || to_char(s_pos + len - 1)
end, ',') within group (order by lvl) as page_list
from ( select key, level as lvl, val,
regexp_instr(val, '1+', 1, level) as s_pos,
length(regexp_substr(val, '1+', 1, level)) as len
from data
connect by regexp_substr(val, '1+', 1, level) is not null
and prior key = key
and prior sys_guid() is not null
)
group by key, val
order by key
;
Output:
KEY VAL PAGE_LIST
--- ------------- -------------
K1 0011001110101 3-4,7-9,11,13
K2 0101000110 2,4,8-9
K3 011100011010 2-4,8-9,11
K4 000000000000
K5 000000000001 12
K6
K7 1111111 1-7
K8 1 1

Related

Convert a series of Number values in Text in Oracle SQL Query

In an Oracle database, I have string values (VARCHAR2) such as 1,4,7,8. Each number stands for a vehicle type: 1=car, 2=bus, 3=BB, 4=SB, 5=Ba, 6=PA, 7=HB, and 8=G.
I want to convert the example above to "car,SB,HB,G" in my query results.
I tried to use DECODE but it does not work. Please advise how to make it work.
Thanks
Initially, I have used the following query:
Select Clientid as C#, vehicletypeExclusions as vehicle from
clients
The sample of outcomes are:
C# Vehicle
20 1,19,20,23,24,7,5
22 1,19,20,23,24,7,5
I also tried the following, which returns NULL for the vehicle column:
Select Clientid as C#, Decode (VEHICLETYPEEXCLUSIONS, '1', 'car',
'3','bus', '5','ba' ,'7','HB', '8','G'
, '9','LED1102', '10','LED1104', '13','LED8-2',
'14','Flip4-12', '17','StAT1003', '19','Taxi-Min', '20','Tax_Sed',
'21','Sup-veh' , '22','T-DATS', '23','T-Mini',
'24','T-WAM') as vehicle_Ex from clients
Here's one option. Read comments within code. Sample data in lines #1 - 13; query begins at line #14.
SQL> with
2 expl (id, name) as
3 (select 1, 'car' from dual union all
4 select 2, 'bus' from dual union all
5 select 3, 'BB' from dual union all
6 select 4, 'SB' from dual union all
7 select 5, 'Ba' from dual union all
8 select 6, 'PA' from dual union all
9 select 7, 'HB' from dual union all
10 select 8, 'G' from dual
11 ),
12 temp (col) as
13 (select '1,4,7,8' from dual),
14 -- split COL to rows
15 spl as
16 (select regexp_substr(col, '[^,]+', 1, level) val,
17 level lvl
18 from temp
19 connect by level <= regexp_count(col, ',') + 1
20 )
21 -- join SPL with EXPL; aggregate the result
22 select listagg(e.name, ',') within group (order by s.lvl) result
23 from expl e join spl s on s.val = e.id;
RESULT
--------------------------------------------------------------------------------
car,SB,HB,G
SQL>
Using the function f_subst from https://stackoverflow.com/a/68537479/429100 :
create or replace
function f_subst(str varchar2, template varchar2, subst sys.odcivarchar2list) return varchar2
as
res varchar2(32767):=str;
begin
for i in 1..subst.count loop
res:=replace(res, replace(template,'%d',i), subst(i));
end loop;
return res;
end;
/
I've replaced ora_name_list_t (nested table) with sys.odcivarchar2list (varray) to make this example easier, but I would suggest creating your own collection, for example: create type varchar2_table as table of varchar2(4000);
Example:
select
f_subst(
'1,4,7,8'
,'%d'
,sys.odcivarchar2list('car','bus','BB','SB','Ba','PA','HB','G')
) s
from dual;
S
----------------------------------------
car,SB,HB,G
Assume you have a lookup table (associating the numeric codes with descriptions) and a table of input strings, which I called sample_inputs in my tests, as shown below:
create table lookup (code, descr) as
select 1, 'car' from dual union all
select 2, 'bus' from dual union all
select 3, 'BB' from dual union all
select 4, 'SB' from dual union all
select 5, 'Ba' from dual union all
select 6, 'PA' from dual union all
select 7, 'HB' from dual union all
select 8, 'G' from dual
;
create table sample_inputs (str) as
select '1,4,7,8' from dual union all
select null from dual union all
select '3' from dual union all
select '5,5,5' from dual union all
select '6,2,8' from dual
;
One strategy for solving your problem is to split the input - slightly modified to make it a JSON array, so that we can use json_table to split it - then join to the lookup table and re-aggregate.
select s.str, l.descr_list
from sample_inputs s cross join lateral
( select listagg(descr, ',') within group (order by ord) as descr_list
from json_table( '[' || str || ']', '$[*]'
columns code number path '$', ord for ordinality)
join lookup l using (code)
) l
;
STR DESCR_LIST
------- ------------------------------
1,4,7,8 car,SB,HB,G
3 BB
5,5,5 Ba,Ba,Ba
6,2,8 PA,bus,G
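The split, join, and re-aggregate strategy can be sketched in a few lines of Python to check what each input should produce. This is an illustration only; `LOOKUP` and `describe` are hypothetical names, and unknown codes are dropped here on the assumption that the join behaves like an inner join.

```python
from typing import Optional

# Same code-to-description pairs as the lookup table above.
LOOKUP = {1: "car", 2: "bus", 3: "BB", 4: "SB", 5: "Ba", 6: "PA", 7: "HB", 8: "G"}


def describe(codes: Optional[str]) -> Optional[str]:
    """Split a comma-separated code list, look up each code, re-join in order."""
    if codes is None:
        return None
    names = []
    for token in codes.split(","):
        name = LOOKUP.get(int(token))
        if name is not None:  # codes without a lookup entry are skipped
            names.append(name)
    return ",".join(names)


print(describe("1,4,7,8"))  # car,SB,HB,G
print(describe("6,2,8"))    # PA,bus,G
```

Keeping the original position of each token is what the `for ordinality` column and the `order by ord` clause achieve in the SQL version.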

Update based on the comma separated value

I have the table below with two columns; both columns are VARCHAR2(100).
PARAM_NAME PARAM_VALUE
PlanName,DemandMonth EUMOCP,01-2022
PlanName,DemandMonth EUMOCP,02-2022
PlanName,DemandMonth EUMOCP,03-2022
PlanName,DemandMonth EUMOCP,04-2021
How can we write an update on the table so that it only updates the corresponding value?
For example:
Update DemandMonth from 01-2022 to 04-2022.
The value to update should be located by its position in the first column.
For instance,
Column A Column B
1,2 3,4
Based on 1 we can update 3, as both come before the ','; similarly, based on 2 we can update 4.
What we want to achieve is to first identify where 'DemandMonth' is and then update the second column accordingly. Also, if possible, can this be written for 4 or 5 comma-separated values?
Don't store values in delimited strings.
Change your table so the values are:
CREATE TABLE params ( id, param_name, param_value ) AS
SELECT 1, 'PlanName', 'EUMOCP' FROM DUAL UNION ALL
SELECT 1, 'DemandMonth', '01-2022' FROM DUAL UNION ALL
SELECT 2, 'PlanName', 'EUMOCP' FROM DUAL UNION ALL
SELECT 2, 'DemandMonth', '02-2022' FROM DUAL UNION ALL
SELECT 3, 'PlanName', 'EUMOCP' FROM DUAL UNION ALL
SELECT 3, 'DemandMonth', '03-2022' FROM DUAL UNION ALL
SELECT 4, 'PlanName', 'EUMOCP' FROM DUAL UNION ALL
SELECT 4, 'DemandMonth', '04-2021' FROM DUAL;
Then all you need to do to update the value is:
UPDATE params
SET param_value = '04-2022'
WHERE param_name = 'DemandMonth'
AND param_value = '01-2022';
There is no worrying about where in the delimited string the value is and it is all simple.
You should not do this and should refactor your table to not use delimited strings... however, you can use:
MERGE INTO params dst
USING (
WITH items ( rid, param_names, param_values, name, value, lvl, max_lvl ) AS (
SELECT ROWID,
param_name,
param_value,
REGEXP_SUBSTR( param_name, '[^,]+', 1, 1 ),
REGEXP_SUBSTR( param_value, '[^,]+', 1, 1 ),
1,
REGEXP_COUNT( param_value, '[^,]+' )
FROM params
UNION ALL
SELECT rid,
param_names,
param_values,
REGEXP_SUBSTR( param_names, '[^,]+', 1, lvl + 1 ),
REGEXP_SUBSTR( param_values, '[^,]+', 1, lvl + 1 ),
lvl + 1,
max_lvl
FROM items
WHERE lvl < max_lvl
)
SELECT rid,
LISTAGG(
CASE
WHEN name = 'DemandMonth' AND value = '01-2022'
THEN '04-2022'
ELSE value
END,
','
) WITHIN GROUP ( ORDER BY lvl ) AS param_value
FROM items
GROUP BY rid
HAVING COUNT(
CASE
WHEN name = 'DemandMonth' AND value = '01-2022'
THEN 1
END
) > 0
) src
ON ( dst.ROWID = src.rid )
WHEN MATCHED THEN
UPDATE SET param_value = src.param_value;
Which, for the sample data:
CREATE TABLE params ( param_name, param_value ) AS
SELECT 'PlanName,DemandMonth', 'EUMOCP,01-2022' FROM DUAL UNION ALL
SELECT 'PlanName,DemandMonth', 'EUMOCP,02-2022' FROM DUAL UNION ALL
SELECT 'PlanName,DemandMonth', 'EUMOCP,03-2022' FROM DUAL UNION ALL
SELECT 'PlanName,DemandMonth', 'EUMOCP,04-2021' FROM DUAL;
Then:
SELECT * FROM params;
Outputs:
PARAM_NAME PARAM_VALUE
PlanName,DemandMonth EUMOCP,04-2022
PlanName,DemandMonth EUMOCP,02-2022
PlanName,DemandMonth EUMOCP,03-2022
PlanName,DemandMonth EUMOCP,04-2021
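The positional pairing that the MERGE relies on (the n-th name belongs to the n-th value) can be sketched in Python as follows. This is only an illustration, assuming both strings contain the same number of comma-separated items; `update_param` is a hypothetical helper name.

```python
def update_param(names: str, values: str, target: str, old: str, new: str) -> str:
    """Replace `old` with `new` only at the position where the name is `target`."""
    name_list = names.split(",")
    value_list = values.split(",")
    # zip pairs the n-th name with the n-th value (assumes equal lengths)
    updated = [
        new if name == target and value == old else value
        for name, value in zip(name_list, value_list)
    ]
    return ",".join(updated)


print(update_param("PlanName,DemandMonth", "EUMOCP,01-2022",
                   "DemandMonth", "01-2022", "04-2022"))  # EUMOCP,04-2022
```

Because the pairing is purely positional, the same function works for 4 or 5 comma-separated items without change.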

Oracle : replace string of options based on data set - is this possible?

I have column in table looking like this:
PATTERN
{([option1]+[option2])*([option3]+[option4])}
{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}
{[option1]+[option6]}
{([option1]+[option2])*([option8]+[option9])}
{([option1]+[option2])*[option4]}
{[option10]}
Every option has a numeric value.
There is a table - let's call it option_set and records look like
OPTION VALUE
option1 3653265
option2 26452
option3 73552
option3 100
option4 1235
option5 42565
option6 2330
option7 544
option9 2150
I want to replace each option name in the first table with its number if it exists; if it does not exist, it should be replaced with 0.
I have done this in PLSQL (get the pattern, go through every option, and if exists - regexp_replace),
but I am wondering if this could be done in SQL??
My goal is to replace values for all patterns for current OPTION_SET and get only records, where all equations would be greater than 0. Of course - I couldn't run this equation in SQL, so I think of something like
for rec in
(
SELECT...
)
loop
execute immediate '...';
if above_equation > 0 then ..
end loop;
Any ideas would be appreciated
You can write a loop-like query in SQL with a recursive CTE that replaces one token on each iteration, which lets you replace all the tokens.
The only way I know to execute a dynamic query inside a SQL statement in Oracle is the DBMS_XMLGEN package, so you can evaluate the expression and filter by the result without PL/SQL. But all this is only viable for low-cardinality tables of patterns and options.
Here's the code:
with a as (
select 1 as id, '{([option1]+[option2])*([option3]+[option4])}' as pattern from dual union all
select 2 as id, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' as pattern from dual union all
select 3 as id, '{[option1]+[option6]}' as pattern from dual union all
select 4 as id, '{([option1]+[option2])*([option8]+[option9])}' as pattern from dual union all
select 5 as id, '{([option1]+[option2])*[option4]}' as pattern from dual union all
select 6 as id, '{[option10]}' as pattern from dual
)
, opt as (
select 'option1' as opt, 3653265 as val from dual union all
select 'option2' as opt, 26452 as val from dual union all
select 'option3' as opt, 73552 as val from dual union all
select 'option3' as opt, 100 as val from dual union all
select 'option4' as opt, 1235 as val from dual union all
select 'option5' as opt, 42565 as val from dual union all
select 'option6' as opt, 2330 as val from dual union all
select 'option7' as opt, 544 as val from dual union all
select 'option9' as opt, 2150 as val from dual
)
, opt_ordered as (
/*Order options to iterate over*/
select opt.*, row_number() over(order by 1) as rn
from opt
)
, rec (id, pattern, repl_pattern, lvl) as (
select
id,
pattern,
pattern as repl_pattern,
0 as lvl
from a
union all
select
r.id,
r.pattern,
/*Replace each part at new step*/
replace(r.repl_pattern, '[' || o.opt || ']', o.val),
r.lvl + 1
from rec r
join opt_ordered o
on r.lvl + 1 = o.rn
)
, out_prepared as (
select
rec.*,
case
when instr(repl_pattern, '[') = 0
/*When there's no more not parsed expressions, then we can try to evaluate them*/
then dbms_xmlgen.getxmltype(
'select ' || replace(replace(repl_pattern, '{', ''), '}', '')
|| ' as v from dual'
)
/*Otherwise SQL statement will fail*/
end as parsed_expr
from rec
/*Retrieve the last step*/
where lvl = (select max(rn) from opt_ordered)
)
select
id,
pattern,
repl_pattern,
extractvalue(parsed_expr, '/ROWSET/ROW/V') as calculated_value
from out_prepared o
where extractvalue(parsed_expr, '/ROWSET/ROW/V') > 0
ID | PATTERN | REPL_PATTERN | CALCULATED_VALUE
-: | :------------------------------------------------------------------ | :---------------------------------------- | :---------------
1 | {([option1]+[option2])*([option3]+[option4])} | {(3653265+26452)*(73552+1235)} | 275194995279
2 | {([option1]+[option2])*([option3]+[option4])*([option6]+[option7])} | {(3653265+26452)*(73552+1235)*(2330+544)} | 790910416431846
3 | {[option1]+[option6]} | {3653265+2330} | 3655595
5 | {([option1]+[option2])*[option4]} | {(3653265+26452)*1235} | 4544450495
Here is one way to do this. There's a lot to unpack, so hang on tight.
I include the test data in the with clause. Of course, you won't need that; simply remove the two "tables" and use your actual table and column names in the query.
From Oracle 12.1 on, we can define PL/SQL functions directly in the with clause, right at the top; if we do so, the query must be terminated with a slash (/) instead of the usual semicolon (;). If your version is earlier than 12.1, you can define the function separately. The function I use takes an "arithmetic expression" (a string representing a compound arithmetic operation) and returns its value as a number. It uses native dynamic SQL (the "execute immediate" statement), which will cause the query to be relatively slow, as a different cursor is parsed for each row. If speed becomes an issue, this can be changed, to use a bind variable (so that the cursor is parsed only once).
The recursive query in the with clause replaces each placeholder with the corresponding value from the "options" table. I use 0 either if a "placeholder" doesn't have a corresponding option in the table, or if it does but the corresponding value is null. (Note that your sample data shows option3 twice; that makes no sense, and I removed one occurrence from my sample data.)
Instead of replacing one placeholder at a time, I took the opposite approach; assuming the patterns may be long, but the number of "options" is small, this should be more efficient. Namely: at each step, I replace ALL occurrences of '[optionN]' (for a given N) in a single pass. Outside the recursive query, I replace all the placeholders for "non-existent" options with 0.
Note that recursive with clause requires Oracle 11.2. If your version is even earlier than that (although it shouldn't be), there are other ways; you would likely need to do that in PL/SQL also.
So, here it is - a single SELECT query for the whole thing:
with
function expr_eval(pattern varchar2) return number as
x number;
begin
execute immediate 'select ' || pattern || ' from dual' into x;
return x;
end;
p (id, pattern) as (
select 1, '{([option1]+[option2])*([option3]+[option4])}' from dual union all
select 2, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' from dual union all
select 3, '{[option1]+[option6]}' from dual union all
select 4, '{([option1]+[option2])*([option8]+[option9])}' from dual union all
select 5, '{([option1]+[option2])*[option4]}' from dual union all
select 6, '{[option10]}' from dual union all
select 7, '{[option2]/([option3]+[option8])-(300-[option2])/(0.1 *[option3])}' from dual
)
, o (opt, val) as (
select 'option1', 3653265 from dual union all
select 'option2', 26452 from dual union all
select 'option3', 100 from dual union all
select 'option4', 1235 from dual union all
select 'option5', 42565 from dual union all
select 'option6', 2330 from dual union all
select 'option7', 544 from dual union all
select 'option9', 2150 from dual
)
, n (opt, val, rn, ct) as (
select opt, val, rownum, count(*) over ()
from o
)
, r (id, pattern, rn, ct) as (
select id, substr(pattern, 2, length(pattern) - 2), 1, null
from p
union all
select r.id, replace(r.pattern, '[' || n.opt || ']', nvl(to_char(n.val), 0)),
r.rn + 1, n.ct
from r join n on r.rn = n.rn
)
, ae (id, pattern) as (
select id, regexp_replace(pattern, '\[[^]]*]', '0')
from r
where rn = ct + 1
)
select id, expr_eval(pattern) as result
from ae
order by id
/
Output:
ID RESULT
---- ---------------
1 4912422195
2 14118301388430
3 3655595
4 7911391550
5 4544450495
6 0
7 2879.72
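To cross-check the integer results above without a database, the substitution and evaluation steps can be sketched in Python. This is an illustration under stated assumptions, not a translation of the query: `substitute` and `evaluate` are hypothetical names, and instead of dynamic SQL the expression is evaluated with a small whitelist-based walker over Python's `ast` module that accepts only numbers, parentheses, and the operators + - * /.

```python
import ast
import operator
import re

OPTIONS = {"option1": 3653265, "option2": 26452, "option3": 100, "option4": 1235,
           "option5": 42565, "option6": 2330, "option7": 544, "option9": 2150}

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def substitute(pattern: str, options: dict) -> str:
    """Strip the outer braces and replace every [name] with its value, or 0."""
    def repl(match):
        value = options.get(match.group(1))
        return str(0 if value is None else value)  # missing or null -> 0
    return re.sub(r"\[([^\]]*)\]", repl, pattern.strip("{}"))


def evaluate(expr: str):
    """Evaluate plain arithmetic only; anything else raises ValueError."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


expr = substitute("{([option1]+[option2])*([option3]+[option4])}", OPTIONS)
print(expr, "=", evaluate(expr))  # (3653265+26452)*(100+1235) = 4912422195
```

Restricting evaluation to a fixed set of node types is a deliberate choice here: the patterns are data, so the sketch avoids executing arbitrary text, a concern that applies equally to building dynamic SQL from stored strings.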

Finding out the highest number in a comma separated string using Oracle SQL

I have a table with two columns:
OLD_REVISIONS |NEW_REVISIONS
-----------------------------------
1,25,26,24 |1,26,24,25
1,56,55,54 |1,55,54
1 |1
1,2 |1
1,96,95,94 |1,96,94,95
1 |1
1 |1
1 |1
1 |1
1,2 |1,2
1 |1
1 |1
1 |1
1 |1
For each row there will be a list of revisions for a document (comma separated)
The comma separated list might be the same in both columns but the order/sort might be different - e.g.
2,1 |1,2
I would like to find all the instances where the highest revision in the NEW_REVISIONS column is lower than the highest revision in OLD_REVISIONS
The following would fit that criteria
OLD_REVISIONS |NEW_REVISIONS
-----------------------------------
1,2 |1
1,56,55,54 |1,55,54
I tried a solution using the MINUS option (joining the table to itself) but it returns differences even for when the list is the same but in the wrong order
I tried the function GREATEST (i.e where greatest(new_Revisions) < greatest(old_revisions)) but i am not sure why greatest(OLD_REVISIONS) always just returns the comma separated value. It does not return the max value. I suspect it is comparing strings because the columns are VARCHAR.
Also, MAX function expects a single number.
Is there another way i can achieve the above? I am looking for a pure SQL option so i can print out the results (or a PL/SQL option that can print out the results)
Edit
Apologies for not mentioning this but for the NEW_REVISIONS i do actually have the data in a table where each revision is in a separate row:
"DOCNUMBER" "REVISIONNUMBER"
67 1
67 24
67 25
67 26
75 1
75 54
75 55
75 56
78 1
79 1
79 2
83 1
83 96
83 94
Just to give some context, a few weeks ago I suspected that revisions were disappearing.
To investigate this, i decided to take a count of all revisions for all documents and take a snapshot to compare later to see if revisions are indeed missing.
The snapshot that i took contained the following columns:
docnumber, count, revisions
The revisions were stored in a comma separated list using the listagg function.
The trouble I have now is that new revisions have been added to the live table, so when I compare the main table and the snapshot using a MINUS I get a difference because of the new revisions in the main table.
Even though in the actual table the revisions are individual rows, in the snapshot table I don't have the individual rows.
I think the only way is to recreate the snapshot in the same format and compare the two to find out whether the maximum revision in the main table is lower than the maximum revision in the snapshot table (which is why I'm trying to find the max in a comma-separated string).
Enjoy.
select xmlcast(xmlquery(('max((' || OLD_REVISIONS || '))') RETURNING CONTENT) as int) as OLD_REVISIONS_max
,xmlcast(xmlquery(('max((' || NEW_REVISIONS || '))') RETURNING CONTENT) as int) as NEW_REVISIONS_max
from t
;
Assuming your base table has an id column (versions of what?) - here is a solution based on splitting the rows.
Edit: If you like this solution, check out vkp's solution, which is better than mine. I explain why his solution is better in a Comment to his Answer.
with
t ( id, old_revisions, new_revisions ) as (
select 101, '1,25,26,24', '1,26,24,25' from dual union all
select 102, '1,56,55,54', '1,55,54' from dual union all
select 103, '1' , '1' from dual union all
select 104, '1,2' , '1' from dual union all
select 105, '1,96,95,94', '1,96,94,95' from dual union all
select 106, '1' , '1' from dual union all
select 107, '1' , '1' from dual union all
select 108, '1' , '1' from dual union all
select 109, '1' , '1' from dual union all
select 110, '1,2' , '1,2' from dual union all
select 111, '1' , '1' from dual union all
select 112, '1' , '1' from dual union all
select 113, '1' , '1' from dual union all
select 114, '1' , '1' from dual
)
-- END of TEST DATA; the actual solution (SQL query) begins below.
select id, old_revisions, new_revisions
from (
select id, old_revisions, new_revisions, 'old' as flag,
to_number(regexp_substr(old_revisions, '\d+', 1, level)) as rev_no
from t
connect by level <= regexp_count(old_revisions, ',') + 1
and prior id = id
and prior sys_guid() is not null
union all
select id, old_revisions, new_revisions, 'new' as flag,
to_number(regexp_substr(new_revisions, '\d+', 1, level)) as rev_no
from t
connect by level <= regexp_count(new_revisions, ',') + 1
and prior id = id
and prior sys_guid() is not null
)
group by id, old_revisions, new_revisions
having max(case when flag = 'old' then rev_no end) !=
max(case when flag = 'new' then rev_no end)
order by id -- ORDER BY is optional
;
ID OLD_REVISION NEW_REVISION
--- ------------ ------------
102 1,56,55,54 1,55,54
104 1,2 1
You can compare every value by putting together the revisions in the same order using listagg function.
SELECT listagg(o,',') WITHIN GROUP (ORDER BY o) old_revisions,
listagg(n,',') WITHIN GROUP (ORDER BY n) new_revisions
FROM (
SELECT DISTINCT rowid r,
regexp_substr(old_revisions, '[^,]+', 1, LEVEL) o,
regexp_substr(new_revisions, '[^,]+', 1, LEVEL) n
FROM table
WHERE regexp_substr(old_revisions, '[^,]+', 1, LEVEL) IS NOT NULL
CONNECT BY LEVEL<=(SELECT greatest(MAX(regexp_count(old_revisions,',')),MAX(regexp_count(new_revisions,',')))+1 c FROM table)
)
GROUP BY r
HAVING listagg(o,',') WITHIN GROUP (ORDER BY o)<>listagg(n,',') WITHIN GROUP (ORDER BY n);
This could be a way:
select
OLD_REVISIONS,
NEW_REVISIONS
from
REVISIONS t,
table(cast(multiset(
select level
from dual
connect by level <= length (regexp_replace(t.OLD_REVISIONS, '[^,]+')) + 1
) as sys.OdciNumberList
)
) levels_old,
table(cast(multiset(
select level
from dual
connect by level <= length (regexp_replace(t.NEW_REVISIONS, '[^,]+')) + 1
)as sys.OdciNumberList
)
) levels_new
group by t.ROWID,
OLD_REVISIONS,
NEW_REVISIONS
having max(to_number(trim(regexp_substr(t.OLD_REVISIONS, '[^,]+', 1, levels_old.column_value)))) >
max(to_number(trim(regexp_substr(t.new_REVISIONS, '[^,]+', 1, levels_new.column_value))))
This uses a double string split to pick the values from every field, and then simply finds the rows where the max values among the two collections match your requirement.
You should edit this by adding some unique key in the GROUP BY clause, or a rowid if you don't have any unique key on your table.
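The comparison itself (split both lists, convert to numbers, compare the maxima) is easy to check in Python against a few of the sample rows. This is a sketch for verifying expected results only; `max_revision` is a hypothetical name. It also shows why comparing the raw strings goes wrong.

```python
def max_revision(revisions: str) -> int:
    """Highest revision number in a comma-separated list."""
    return max(int(token) for token in revisions.split(","))


rows = [
    ("1,25,26,24", "1,26,24,25"),
    ("1,56,55,54", "1,55,54"),
    ("1", "1"),
    ("1,2", "1"),
    ("1,96,95,94", "1,96,94,95"),
]

# Keep the rows whose old maximum is higher than the new maximum.
flagged = [(old, new) for old, new in rows if max_revision(old) > max_revision(new)]
print(flagged)  # [('1,56,55,54', '1,55,54'), ('1,2', '1')]

# String comparison is character by character, so "9" sorts after "10".
print(max(["9", "10"]))  # 9
```

The last line is the reason the tokens must be converted to numbers before taking the maximum, in SQL as well as here.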
One way to do this is to split the columns on commas using regexp_substr and check whether the maximum values are different.
with rownums as (select t.*,row_number() over(order by old_revisions) rn from t)
select old_revisions,new_revisions
from rownums
where rn in (select rn
from rownums
group by rn
connect by regexp_substr(old_revisions, '[^,]+', 1, level) is not null
or regexp_substr(new_revisions, '[^,]+', 1, level) is not null
having max(cast(regexp_substr(old_revisions,'[^,]+', 1, level) as int))
<> max(cast(regexp_substr(new_revisions,'[^,]+', 1, level) as int))
)
Comments say normalise data. I agree but also I understand it may be not possible. I would try something like query below:
-- to_number makes max() compare numerically rather than as strings ('9' > '10')
select greatest(val1, val2), t1.r from (
select max(val) val1, r from (
select to_number(regexp_substr(v1,'[^,]+', 1, level)) val, rowid r from tab1
connect by regexp_substr(v1, '[^,]+', 1, level) is not null
) group by r) t1
inner join (
select max(val) val2, r from (
select to_number(regexp_substr(v2,'[^,]+', 1, level)) val, rowid r from tab1
connect by regexp_substr(v2, '[^,]+', 1, level) is not null
) group by r) t2
on (t1.r = t2.r);
Tested on:
create table tab1 (v1 varchar2(100), v2 varchar2(100));
insert into tab1 values ('1,3,5','1,4,7');
insert into tab1 values ('1,3,5','1,2,9');
insert into tab1 values ('1,3,5','1,3,5');
insert into tab1 values ('1,3,5','1,4');
and seems to work fine. I left rowid for reference. I guess you have some id in table.
After your edit I would change query to:
select greatest(val1, val2), t1.r from (
select max(val) val1, r from (
select to_number(regexp_substr(v1,'[^,]+', 1, level)) val, DOCNUMBER r from tab1
connect by regexp_substr(v1, '[^,]+', 1, level) is not null
) group by r) t1
inner join (
-- highest revision per document from the row-per-revision table
select max(REVISIONNUMBER) val2, DOCNUMBER r from NEW_REVISIONS
group by DOCNUMBER) t2
on (t1.r = t2.r);
You may write a PL/SQL function parsing the string and returning the maximal number
select max_num( '1,26,24,25') max_num from dual;
MAX_NUM
----------
26
The query is then very simple:
select OLD_REVISIONS, NEW_REVISIONS
from revs
where max_num(OLD_REVISIONS) > max_num(NEW_REVISIONS);
A prototype function without validation and error handling:
create or replace function max_num(str_in VARCHAR2) return NUMBER as
x varchar2(1);
n number := 0;
max_n number := 0;
pow number := 0;
begin
-- scan the string from right to left, building each number digit by digit
for i in 0.. length(str_in)-1 loop
x := substr(str_in,length(str_in)-i,1);
if x = ',' then
-- check max number
if n > max_n then
max_n := n;
end if;
-- reset
n := 0;
pow := 0;
else
n := n + to_number(x)*power(10,pow);
pow := pow +1;
end if;
end loop;
-- the leftmost number has no comma before it, so check it after the loop
if n > max_n then
max_n := n;
end if;
return(max_n);
end;
/

Remove duplicate values from comma separated string in Oracle

I need your help with the regexp_replace function. I have a table which has a column for concatenated string values which contain duplicates. How do I eliminate them?
Example:
Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha
I need the output to be
Ian,Beatty,Larry,Neesha
The duplicates are random and not in any particular order.
Update--
Here's how my table looks
ID Name1 Name2 Name3
1 a b c
1 c d a
2 d e a
2 c d b
I need one row per ID having distinct name1,name2,name3 in one row as a comma separated string.
ID Name
1 a,c,b,d,c
2 d,c,e,a,b
I have tried using listagg with distinct but I'm not able to remove the duplicates.
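The easiest option I would go with:
-- UNION (not UNION ALL) removes duplicate ID/name pairs before aggregation
SELECT ID, LISTAGG(NAME_LIST, ',') WITHIN GROUP (ORDER BY NAME_LIST) AS NAMES
FROM (SELECT ID, NAME1 NAME_LIST FROM DATA UNION
SELECT ID, NAME2 FROM DATA UNION
SELECT ID, NAME3 FROM DATA
)
GROUP BY ID;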
So, try this out...
([^,]+),(?=.*[A-Za-z],[] ]*\1)
I don't think you can do it just with regexp_replace if the repeated values are not next to each other. One approach is to split the values up, eliminate the duplicates, and then put them back together.
The common method to tokenize a delimited string is with regexp_substr and a connect by clause. Using a bind variable with your string to make the code a bit clearer:
var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';
select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;
VALUE
------------------------------
Ian
Beatty
Larry
Neesha
Beatty
Neesha
Ian
Neesha
You can use that as a subquery (or CTE), get the distinct values from it, then reassemble it with listagg:
select listagg(value, ',') within group (order by value) as value
from (
select distinct value from (
select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null
)
);
VALUE
------------------------------
Beatty,Ian,Larry,Neesha
It's a bit more complicated if you're looking at multiple rows in a table, as that confuses the connect-by syntax, but you can use a non-deterministic reference to avoid loops:
with t42 (id, value) as (
select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
select distinct id, value from (
select id, regexp_substr(value, '[^,]+', 1, level) as value
from t42
connect by regexp_substr(value, '[^,]+', 1, level) is not null
and id = prior id
and prior dbms_random.value is not null
)
)
group by id;
ID VALUE
---------- ------------------------------
1 Beatty,Ian,Larry,Neesha
2 Frank,Joe,Mary
Of course this wouldn't be necessary if you were storing relational data properly; having a delimited string in a column is not a good idea.
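Note that the listagg queries above return the names in alphabetical order, because they aggregate with `order by value`, whereas the question's expected output keeps the order of first appearance. The difference is easy to see in a short Python sketch; this is an illustration only, and `dedupe` is a hypothetical name.

```python
def dedupe(value: str, sep: str = ",") -> str:
    """Remove repeated items, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order
    return sep.join(dict.fromkeys(value.split(sep)))


names = "Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"
print(dedupe(names))                          # Ian,Beatty,Larry,Neesha
print(",".join(sorted(set(names.split(",")))))  # Beatty,Ian,Larry,Neesha
```

To get first-appearance order in SQL, one would presumably need to carry the position of each token's first occurrence through to the aggregation and order by it instead of by the value.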
There is a way to find duplicates in this case, but removing them is a problem if more than one name is duplicated within a string per id. Here is code that can deal with one duplicate per id.
Sample data:
WITH
tbl AS
(
Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual
),
lists AS
(
Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual
),
Creating a CTE that compares your LISTAGG string with the original data to find duplicate values:
grid AS
(
Select DISTINCT l.ID, l.NAME,
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END "NAME_1",
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END "NAME_2",
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END "NAME_3"
From
lists l
Inner Join
tbl t ON(t.ID = l.ID)
)
ID NAME NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
2 d,c,e,a,b
1 a,c,b,d,c c
1 a,c,b,d,c c
The main SQL, using UNION ALL, builds a new string (removing the second appearance) where a duplicate was found, and then keeps that new string if it is shorter than the old one.
SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
lists l
LEFT JOIN
(
SELECT ID, CASE WHEN NAME_1 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
UNION ALL
SELECT ID, CASE WHEN NAME_2 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
UNION ALL
SELECT ID, CASE WHEN NAME_3 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))
Result:
ID NAME
---------- -------------
2 d,c,e,a,b
1 a,c,b,d
For multiple occurrences within a string, or for several different duplicated names, some recursion or further nesting would be needed.