In an Oracle database, I have string values (VARCHAR2) like 1,4,7,8. The numbers stand for 1=car, 2=bus, 3=BB, 4=SB, 5=Ba, 6=PA, 7=HB, and 8=G,
and I want to convert the example above to "car,SB,HB,G" in my query results.
I tried to use DECODE but it does not work. Please advise how to make it work.
Thanks.
Initially, I used the following query:
Select Clientid as C#, vehicletypeExclusions as vehicle from
clients
A sample of the results:
C# Vehicle
20 1,19,20,23,24,7,5
22 1,19,20,23,24,7,5
I also tried the following, which returns NULL for the vehicle column:
Select Clientid as C#, Decode (VEHICLETYPEEXCLUSIONS, '1', 'car',
'3','bus', '5','ba' ,'7','HB', '8','G'
, '9','LED1102', '10','LED1104', '13','LED8-2',
'14','Flip4-12', '17','StAT1003', '19','Taxi-Min', '20','Tax_Sed',
'21','Sup-veh' , '22','T-DATS', '23','T-Mini',
'24','T-WAM') as vehicle_Ex from clients
Here's one option. Read comments within code. Sample data in lines #1 - 13; query begins at line #14.
SQL> with
2 expl (id, name) as
3 (select 1, 'car' from dual union all
4 select 2, 'bus' from dual union all
5 select 3, 'BB' from dual union all
6 select 4, 'SB' from dual union all
7 select 5, 'Ba' from dual union all
8 select 6, 'PA' from dual union all
9 select 7, 'HB' from dual union all
10 select 8, 'G' from dual
11 ),
12 temp (col) as
13 (select '1,4,7,8' from dual),
14 -- split COL to rows
15 spl as
16 (select regexp_substr(col, '[^,]+', 1, level) val,
17 level lvl
18 from temp
19 connect by level <= regexp_count(col, ',') + 1
20 )
21 -- join SPL with EXPL; aggregate the result
22 select listagg(e.name, ',') within group (order by s.lvl) result
23 from expl e join spl s on s.val = e.id;
RESULT
--------------------------------------------------------------------------------
car,SB,HB,G
SQL>
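To run the same idea against every row of a table rather than a single string, the split step has to keep each row's hierarchy separate. Below is a minimal sketch, assuming a cut-down clients table with the column names from the question and made-up sample rows; the two PRIOR conditions are the usual trick for multi-row CONNECT BY splits, and the lookup is shortened to the codes actually used.

```sql
with
  expl (id, name) as (                       -- shortened lookup (assumption)
    select 1, 'car' from dual union all
    select 4, 'SB'  from dual union all
    select 7, 'HB'  from dual union all
    select 8, 'G'   from dual
  ),
  clients (clientid, vehicletypeexclusions) as (   -- made-up sample rows
    select 20, '1,4,7,8' from dual union all
    select 22, '8,1'     from dual
  ),
  spl as (
    -- split each row's list into one row per code
    select clientid,
           to_number(regexp_substr(vehicletypeexclusions, '[^,]+', 1, level)) as val,
           level as lvl
    from clients
    connect by level <= regexp_count(vehicletypeexclusions, ',') + 1
           and prior clientid = clientid          -- stay within the same row
           and prior sys_guid() is not null       -- prevents a cycle error
  )
-- join to the lookup and re-aggregate in the original order
select s.clientid,
       listagg(e.name, ',') within group (order by s.lvl) as vehicle
from spl s
join expl e on e.id = s.val
group by s.clientid
order by s.clientid;
```

With these sample rows, client 20 should come back as car,SB,HB,G and client 22 as G,car. Codes missing from the lookup are silently dropped by the inner join; an outer join would keep them as gaps instead.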
Using the function f_subst from https://stackoverflow.com/a/68537479/429100 :
create or replace
function f_subst(str varchar2, template varchar2, subst sys.odcivarchar2list) return varchar2
as
res varchar2(32767):=str;
begin
for i in 1..subst.count loop
res:=replace(res, replace(template,'%d',i), subst(i));
end loop;
return res;
end;
/
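I've replaced ora_name_list_t (a nested table) with sys.odcivarchar2list (a varray) to keep this example simple, but I would suggest creating your own collection type, for example: create type varchar2_table as table of varchar2(4000); Note that the function does plain text replacement, so it is only safe while no code is a substring of another code or of a replacement value (single-digit codes, as in the example below, are fine; with codes such as 19 the '1' would be replaced first).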
Example:
select
f_subst(
'1,4,7,8'
,'%d'
,sys.odcivarchar2list('car','bus','BB','SB','Ba','PA','HB','G')
) s
from dual;
S
----------------------------------------
car,SB,HB,G
Assume you have a lookup table (associating the numeric codes with descriptions) and a table of input strings, which I called sample_inputs in my tests, as shown below:
create table lookup (code, descr) as
select 1, 'car' from dual union all
select 2, 'bus' from dual union all
select 3, 'BB' from dual union all
select 4, 'SB' from dual union all
select 5, 'Ba' from dual union all
select 6, 'PA' from dual union all
select 7, 'HB' from dual union all
select 8, 'G' from dual
;
create table sample_inputs (str) as
select '1,4,7,8' from dual union all
select null from dual union all
select '3' from dual union all
select '5,5,5' from dual union all
select '6,2,8' from dual
;
One strategy for solving your problem is to split the input - slightly modified to make it a JSON array, so that we can use json_table to split it - then join to the lookup table and re-aggregate.
select s.str, l.descr_list
from sample_inputs s cross join lateral
( select listagg(descr, ',') within group (order by ord) as descr_list
from json_table( '[' || str || ']', '$[*]'
columns code number path '$', ord for ordinality)
join lookup l using (code)
) l
;
STR DESCR_LIST
------- ------------------------------
1,4,7,8 car,SB,HB,G
3 BB
5,5,5 Ba,Ba,Ba
6,2,8 PA,bus,G
I have a column in a table that looks like this:
PATTERN
{([option1]+[option2])*([option3]+[option4])}
{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}
{[option1]+[option6]}
{([option1]+[option2])*([option8]+[option9])}
{([option1]+[option2])*[option4]}
{[option10]}
Every option has a numeric value.
There is a table, let's call it option_set, whose records look like this:
OPTION VALUE
option1 3653265
option2 26452
option3 73552
option3 100
option4 1235
option5 42565
option6 2330
option7 544
option9 2150
I want to replace each option name in the first table with its number if it exists; if it does not exist, it should be replaced with 0.
I have done this in PL/SQL (get the pattern, go through every option, and regexp_replace it if it exists),
but I am wondering whether this could be done in SQL.
My goal is to substitute the values into all patterns for the current OPTION_SET and return only the records where the resulting equation evaluates to more than 0. Of course, I couldn't evaluate the equation in plain SQL, so I am thinking of something like
for rec in
(
SELECT...
)
loop
execute immediate '...';
if above_equation > 0 then ..
end loop;
Any ideas would be appreciated
You can write a loop-like query in SQL with a recursive CTE that replaces one more token on each iteration, so that eventually all the tokens are replaced.
The only way I know to execute a dynamic query inside a SQL statement in Oracle is the DBMS_XMLGEN package, which lets you evaluate the expression and filter by the resulting value without PL/SQL. But all this is viable only for low-cardinality tables of patterns and options.
Here's the code:
with a as (
select 1 as id, '{([option1]+[option2])*([option3]+[option4])}' as pattern from dual union all
select 2 as id, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' as pattern from dual union all
select 3 as id, '{[option1]+[option6]}' as pattern from dual union all
select 4 as id, '{([option1]+[option2])*([option8]+[option9])}' as pattern from dual union all
select 5 as id, '{([option1]+[option2])*[option4]}' as pattern from dual union all
select 6 as id, '{[option10]}' as pattern from dual
)
, opt as (
select 'option1' as opt, 3653265 as val from dual union all
select 'option2' as opt, 26452 as val from dual union all
select 'option3' as opt, 73552 as val from dual union all
select 'option3' as opt, 100 as val from dual union all
select 'option4' as opt, 1235 as val from dual union all
select 'option5' as opt, 42565 as val from dual union all
select 'option6' as opt, 2330 as val from dual union all
select 'option7' as opt, 544 as val from dual union all
select 'option9' as opt, 2150 as val from dual
)
, opt_ordered as (
/*Order options to iterate over*/
select opt.*, row_number() over(order by 1) as rn
from opt
)
, rec (id, pattern, repl_pattern, lvl) as (
select
id,
pattern,
pattern as repl_pattern,
0 as lvl
from a
union all
select
r.id,
r.pattern,
/*Replace each part at new step*/
replace(r.repl_pattern, '[' || o.opt || ']', o.val),
r.lvl + 1
from rec r
join opt_ordered o
on r.lvl + 1 = o.rn
)
, out_prepared as (
select
rec.*,
case
when instr(repl_pattern, '[') = 0
/*When no unreplaced placeholders remain, we can try to evaluate the expression*/
then dbms_xmlgen.getxmltype(
'select ' || replace(replace(repl_pattern, '{', ''), '}', '')
|| ' as v from dual'
)
/*Otherwise SQL statement will fail*/
end as parsed_expr
from rec
/*Retrieve the last step*/
where lvl = (select max(rn) from opt_ordered)
)
select
id,
pattern,
repl_pattern,
extractvalue(parsed_expr, '/ROWSET/ROW/V') as calculated_value
from out_prepared o
where extractvalue(parsed_expr, '/ROWSET/ROW/V') > 0
ID | PATTERN | REPL_PATTERN | CALCULATED_VALUE
-: | :------------------------------------------------------------------ | :---------------------------------------- | :---------------
1 | {([option1]+[option2])*([option3]+[option4])} | {(3653265+26452)*(73552+1235)} | 275194995279
2 | {([option1]+[option2])*([option3]+[option4])*([option6]+[option7])} | {(3653265+26452)*(73552+1235)*(2330+544)} | 790910416431846
3 | {[option1]+[option6]} | {3653265+2330} | 3655595
5 | {([option1]+[option2])*[option4]} | {(3653265+26452)*1235} | 4544450495
Here is one way to do this. There's a lot to unpack, so hang on tight.
I include the test data in the with clause. Of course, you won't need that; simply remove the two "tables" and use your actual table and column names in the query.
From Oracle 12.1 on, we can define PL/SQL functions directly in the with clause, right at the top; if we do so, the query must be terminated with a slash (/) instead of the usual semicolon (;). If your version is earlier than 12.1, you can define the function separately. The function I use takes an "arithmetic expression" (a string representing a compound arithmetic operation) and returns its value as a number. It uses native dynamic SQL (the "execute immediate" statement), which will cause the query to be relatively slow, as a different cursor is parsed for each row. If speed becomes an issue, this can be changed, to use a bind variable (so that the cursor is parsed only once).
The recursive query in the with clause replaces each placeholder with the corresponding value from the "options" table. I use 0 either if a "placeholder" doesn't have a corresponding option in the table, or if it does but the corresponding value is null. (Note that your sample data shows option3 twice; that makes no sense, and I removed one occurrence from my sample data.)
Instead of replacing one placeholder at a time, I took the opposite approach; assuming the patterns may be long, but the number of "options" is small, this should be more efficient. Namely: at each step, I replace ALL occurrences of '[optionN]' (for a given N) in a single pass. Outside the recursive query, I replace all the placeholders for "non-existent" options with 0.
Note that the recursive with clause requires Oracle 11.2. If your version is even earlier than that (although it shouldn't be), there are other ways; you would likely need to do that in PL/SQL also.
So, here it is - a single SELECT query for the whole thing:
with
function expr_eval(pattern varchar2) return number as
x number;
begin
execute immediate 'select ' || pattern || ' from dual' into x;
return x;
end;
p (id, pattern) as (
select 1, '{([option1]+[option2])*([option3]+[option4])}' from dual union all
select 2, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' from dual union all
select 3, '{[option1]+[option6]}' from dual union all
select 4, '{([option1]+[option2])*([option8]+[option9])}' from dual union all
select 5, '{([option1]+[option2])*[option4]}' from dual union all
select 6, '{[option10]}' from dual union all
select 7, '{[option2]/([option3]+[option8])-(300-[option2])/(0.1 *[option3])}' from dual
)
, o (opt, val) as (
select 'option1', 3653265 from dual union all
select 'option2', 26452 from dual union all
select 'option3', 100 from dual union all
select 'option4', 1235 from dual union all
select 'option5', 42565 from dual union all
select 'option6', 2330 from dual union all
select 'option7', 544 from dual union all
select 'option9', 2150 from dual
)
, n (opt, val, rn, ct) as (
select opt, val, rownum, count(*) over ()
from o
)
, r (id, pattern, rn, ct) as (
select id, substr(pattern, 2, length(pattern) - 2), 1, null
from p
union all
select r.id, replace(r.pattern, '[' || n.opt || ']', nvl(to_char(n.val), 0)),
r.rn + 1, n.ct
from r join n on r.rn = n.rn
)
, ae (id, pattern) as (
select id, regexp_replace(pattern, '\[[^]]*]', '0')
from r
where rn = ct + 1
)
select id, expr_eval(pattern) as result
from ae
order by id
/
Output:
ID RESULT
---- ---------------
1 4912422195
2 14118301388430
3 3655595
4 7911391550
5 4544450495
6 0
7 2879.72
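The final clean-up step described above (turning any placeholder that is still left into 0) can be checked in isolation. A small sketch, using a made-up, partially substituted expression as input:

```sql
-- '\[[^]]*]' matches a literal '[', then any run of characters other than ']',
-- then the closing ']'; every such leftover placeholder becomes 0.
select regexp_replace('(3653265+26452)*([option8]+2150)', '\[[^]]*]', '0') as expr
from dual;
-- expr: (3653265+26452)*(0+2150)
```

Because the recursive step has already replaced every option that exists in the table, anything this pattern still finds is by construction an unknown option.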
I need to convert a string of 0s and 1s into a sequence of integers representing the 1s, similar to a page selection sequence in a print dialog.
e.g. '0011001110101' -> '3-4,7-9,11,13'
Is it possible to do this in a single SQL select (in Oracle 11g)?
I can get an individual list of the page numbers with the following:
with data as (
select 'K1' KEY, '0011001110101' VAL from dual
union select 'K2', '0101000110' from dual
union select 'K3', '011100011010' from dual
)
select
KEY,
listagg(ords.column_value, ',') within group (
order by ords.column_value
) PAGES
from
data
cross join (
table(cast(multiset(
select level
from dual
connect by level <= length(VAL)
) as sys.OdciNumberList)) ords
)
where
substr(VAL, ords.column_value, 1) = '1'
group by
KEY
But that doesn't do the grouping (e.g. returns "3,4,7,8,9,11,13" for the first value).
If I could assign a group number every time the value changes then I could use analytic functions to get the min and max for each group. I.e. if I could generate the following then I'd be set:
Key Page Val Group
K1 1 0 1
K1 2 0 1
K1 3 1 2
K1 4 1 2
K1 5 0 3
K1 6 0 3
K1 7 1 4
K1 8 1 4
K1 9 1 4
K1 10 0 5
K1 11 1 6
K1 12 0 7
K1 13 1 8
But I'm stuck on that.
Anyone have any ideas, or another approach to get this?
First of all, let's split it into runs of 1s, one row per run:
select regexp_instr('0011001110101', '1+', 1, LEVEL) istr,
regexp_substr('0011001110101', '1+', 1, LEVEL) strlen
FROM dual
CONNECT BY regexp_substr('0011001110101', '1+', 1, LEVEL) is not null
Then the rest is easy with listagg:
with data as
(
select 'K1' KEY, '0011001110101' VAL from dual
union select 'K2', '0101000110' from dual
union select 'K3', '011100011010' from dual
)
SELECT key,
(SELECT listagg(CASE
WHEN length(regexp_substr(val, '1+', 1, LEVEL)) = 1 THEN
to_char(regexp_instr(val, '1+', 1, LEVEL))
ELSE
regexp_instr(val, '1+', 1, LEVEL) || '-' ||
to_char(regexp_instr(val, '1+', 1, LEVEL) +
length(regexp_substr(val, '1+', 1, LEVEL)) - 1)
END,
',') within GROUP(ORDER BY regexp_instr(val, '1+', 1, LEVEL))
from dual
CONNECT BY regexp_substr(data.val, '1+', 1, LEVEL) IS NOT NULL) val
FROM data
Using a recursive sub-query factoring clause without regular expressions:
Oracle Setup:
CREATE TABLE data ( key, val ) AS
SELECT 'K1', '0011001110101' FROM DUAL UNION ALL
SELECT 'K2', '0101000110' FROM DUAL UNION ALL
SELECT 'K3', '011100011010' FROM DUAL UNION ALL
SELECT 'K4', '000000000000' FROM DUAL UNION ALL
SELECT 'K5', '000000000001' FROM DUAL;
Query:
WITH ranges ( key, val, pos, rng ) AS (
SELECT key,
val,
INSTR( val, '1', 1 ), -- Position of the first 1
NULL
FROM data
UNION ALL
SELECT key,
val,
INSTR( val, '1', INSTR( val, '0', pos ) ), -- Position of the next 1
rng || ',' || CASE
WHEN pos = LENGTH( val ) -- Single 1 at end-of-string
OR pos = INSTR( val, '0', pos ) - 1 -- 1 immediately followed by 0
THEN TO_CHAR( pos )
WHEN INSTR( val, '0', pos ) = 0 -- Multiple 1s until end-of-string
THEN pos || '-' || LENGTH( val )
ELSE pos || '-' || ( INSTR( val, '0', pos ) - 1 ) -- Normal range
END
FROM ranges
WHERE pos > 0
)
SELECT KEY,
VAL,
SUBSTR( rng, 2 ) AS rng -- Strip the leading comma
FROM ranges
WHERE pos = 0 OR val IS NULL
ORDER BY KEY;
Output
KEY VAL RNG
--- ------------- -------------
K1 0011001110101 3-4,7-9,11,13
K2 0101000110 2,4,8-9
K3 011100011010 2-4,8-9,11
K4 000000000000
K5 000000000001 12
Here is a slightly more efficient version of Isalamon's solution (using a hierarchical query). It is slightly more efficient because I use a single hierarchical query instead of multiple ones (in correlated subqueries), and I calculate the length of each sequence of 1's just once, in the inner query. (In fact it is calculated only once anyway, but the function call itself has some overhead.)
This version also treats inputs like '00000' and NULL correctly. Isalamon's solution doesn't, and MT0's solution does not return a row when the input value is NULL. It is not clear if NULL is even possible in the input data, and if it is, what the desired result is; I assumed a row should be returned, with the page_list NULL as well.
Optimizer cost for this version is 17, vs. 18 for Isalamon's solution and 33 for MT0's. However, optimizer cost doesn't take into account the significantly slower processing of regular expressions compared to standard string functions; if speed of execution is important, MT0's solution should definitely be tried since it may prove faster.
with data ( key, val ) as (
select 'K1', '0011001110101' from dual union all
select 'K2', '0101000110' from dual union all
select 'K3', '011100011010' from dual union all
select 'K4', '000000000000' from dual union all
select 'K5', '000000000001' from dual union all
select 'K6', null from dual union all
select 'K7', '1111111' from dual union all
select 'K8', '1' from dual
)
-- End of test data (not part of the solution); SQL query begins below this line.
select key, val,
listagg(case when len = 1 then to_char(s_pos)
when len > 1 then to_char(s_pos) || '-' || to_char(s_pos + len - 1)
end, ',') within group (order by lvl) as page_list
from ( select key, level as lvl, val,
regexp_instr(val, '1+', 1, level) as s_pos,
length(regexp_substr(val, '1+', 1, level)) as len
from data
connect by regexp_substr(val, '1+', 1, level) is not null
and prior key = key
and prior sys_guid() is not null
)
group by key, val
order by key
;
Output:
KEY VAL PAGE_LIST
--- ------------- -------------
K1 0011001110101 3-4,7-9,11,13
K2 0101000110 2,4,8-9
K3 011100011010 2-4,8-9,11
K4 000000000000
K5 000000000001 12
K6
K7 1111111 1-7
K8 1 1
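The grouping idea from the question can also be done with analytic functions alone. Here is a sketch, assuming strings of at most 4000 characters and using two of the sample rows: subtracting a running row number from the position of each 1 gives a value that stays constant within a run of consecutive 1s, so it can serve as the group key.

```sql
with data (key, val) as (
  select 'K1', '0011001110101' from dual union all
  select 'K2', '0101000110'    from dual
),
chars as (
  -- one row per character position of each string
  select d.key, p.pos, substr(d.val, p.pos, 1) as bit
  from data d
  join (select level as pos from dual connect by level <= 4000) p
    on p.pos <= length(d.val)
),
runs as (
  -- pos minus its rank among the 1s is constant within a run of consecutive 1s
  select key, pos,
         pos - row_number() over (partition by key order by pos) as grp
  from chars
  where bit = '1'
),
ranges as (
  -- first and last page of each run
  select key, min(pos) as lo, max(pos) as hi
  from runs
  group by key, grp
)
select key,
       listagg(case when lo = hi then to_char(lo)
                    else lo || '-' || hi
               end, ',') within group (order by lo) as pages
from ranges
group by key
order by key;
```

For the two sample rows this should give 3-4,7-9,11,13 and 2,4,8-9. As written, keys whose string contains no 1 at all (or is NULL) produce no row; an outer join back to data would be needed to keep them.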
I need your help with the regexp_replace function. I have a table which has a column for concatenated string values which contain duplicates. How do I eliminate them?
Example:
Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha
I need the output to be
Ian,Beatty,Larry,Neesha
The duplicates are random and not in any particular order.
Update--
Here's how my table looks
ID Name1 Name2 Name3
1 a b c
1 c d a
2 d e a
2 c d b
I need one row per ID having distinct name1,name2,name3 in one row as a comma separated string.
ID Name
1 a,c,b,d,c
2 d,c,e,a,b
I have tried using listagg with distinct but I'm not able to remove the duplicates.
The easiest option I would go with:
SELECT ID, LISTAGG(NAME_LIST, ',') WITHIN GROUP (ORDER BY NAME_LIST)
FROM (SELECT ID, NAME1 NAME_LIST FROM DATA UNION
SELECT ID, NAME2 FROM DATA UNION
SELECT ID, NAME3 FROM DATA
)
GROUP BY ID;
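On newer versions the same result can be had without the UNION. A minimal sketch, assuming Oracle 19c or later (where LISTAGG accepts DISTINCT) and the sample rows from the question; the CTE name data and the alias src are placeholders:

```sql
with data (id, name1, name2, name3) as (
  select 1, 'a', 'b', 'c' from dual union all
  select 1, 'c', 'd', 'a' from dual union all
  select 2, 'd', 'e', 'a' from dual union all
  select 2, 'c', 'd', 'b' from dual
)
-- UNPIVOT turns the three name columns into rows;
-- LISTAGG(DISTINCT ...) then removes the duplicates while aggregating.
select id,
       listagg(distinct name, ',') within group (order by name) as names
from data
unpivot (name for src in (name1, name2, name3))
group by id
order by id;
-- id 1: a,b,c,d
-- id 2: a,b,c,d,e
```

On versions before 19c the UNION subquery above does the de-duplication instead.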
So, try this out...
([^,]+),(?=.*[A-Za-z],[] ]*\1)
I don't think you can do it just with regexp_replace if the repeated values are not next to each other. One approach is to split the values up, eliminate the duplicates, and then put them back together.
The common method to tokenize a delimited string is with regexp_substr and a connect by clause. Using a bind variable with your string to make the code a bit clearer:
var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';
select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;
VALUE
------------------------------
Ian
Beatty
Larry
Neesha
Beatty
Neesha
Ian
Neesha
You can use that as a subquery (or CTE), get the distinct values from it, then reassemble it with listagg:
select listagg(value, ',') within group (order by value) as value
from (
select distinct value from (
select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null
)
);
VALUE
------------------------------
Beatty,Ian,Larry,Neesha
It's a bit more complicated if you're looking at multiple rows in a table, as that confuses the connect-by syntax, but you can use a non-deterministic reference to avoid loops:
with t42 (id, value) as (
select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
select distinct id, value from (
select id, regexp_substr(value, '[^,]+', 1, level) as value
from t42
connect by regexp_substr(value, '[^,]+', 1, level) is not null
and id = prior id
and prior dbms_random.value is not null
)
)
group by id;
ID VALUE
---------- ------------------------------
1 Beatty,Ian,Larry,Neesha
2 Frank,Joe,Mary
Of course this wouldn't be necessary if you were storing relational data properly; having a delimited string in a column is not a good idea.
There is a way to find duplicates in this case, but removing them is a problem if there is more than one duplicated name within a string per id. Here is code that can deal with one duplicate per id.
Sample data:
WITH
tbl AS
(
Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual
),
lists AS
(
Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual
),
Creating a CTE that compares your LISTAGG string with the original data to find duplicate values:
grid AS
(
Select DISTINCT l.ID, l.NAME,
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END "NAME_1",
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END "NAME_2",
CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END "NAME_3"
From
lists l
Inner Join
tbl t ON(t.ID = l.ID)
)
ID NAME NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
2 d,c,e,a,b
1 a,c,b,d,c c
1 a,c,b,d,c c
The main SQL, using UNION ALL, builds a new string (with the second appearance removed) wherever a duplicate was found, and then returns that new string in place of the old one when it is shorter.
SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
lists l
LEFT JOIN
(
SELECT ID, CASE WHEN NAME_1 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
UNION ALL
SELECT ID, CASE WHEN NAME_2 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
UNION ALL
SELECT ID, CASE WHEN NAME_3 Is Not Null
THEN REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ','), NULL ) )
END "NAME"
FROM grid
WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))
Result:
ID NAME
---------- -------------
2 d,c,e,a,b
1 a,c,b,d
For multiple occurrences within a string, or for several different duplicated names, some recursion or repeated nesting would be needed to get it done.
I need to check if a partial name matches full name. For example:
Partial_Name | Full_Name
--------------------------------------
John,Smith | Smith William John
Eglid,Timothy | Timothy M Eglid
I have no clue how to approach this type of matching.
Another thing is that name and last name may come in the wrong order, making it harder.
I could do something like this, but this only works if names are in the same order and 100% match
decode(LOWER(REGEXP_REPLACE(Partial_Name,'[^a-zA-Z'']','')), LOWER(REGEXP_REPLACE(Full_Name,'[^a-zA-Z'']','')), 'Same', 'Different')
You could use this pattern on the text provided; it works for most engines:
([^ ,]+),([^ ,]+)(?=.*\b\1\b)(?=.*\b\2\b)
WITH
/*
tab AS
(
SELECT 'Smith William John' Full_Name, 'John,Smith' Partial_Name FROM dual
UNION ALL SELECT 'Timothy M Eglid', 'Eglid,timothy' FROM dual
UNION ALL SELECT 'Tim M Egli', 'Egli,Tim,M2' FROM dual
UNION ALL SELECT 'Timot M Eg', 'Eg' FROM dual
),
*/
tmp AS (
SELECT Full_Name, Partial_Name,
trim(CASE WHEN instr(Partial_Name, ',') = 0 THEN Partial_Name
ELSE regexp_substr(Partial_Name, '[^,]+', 1, lvl+1)
END) token
FROM tab t CROSS JOIN (SELECT lvl FROM (SELECT LEVEL-1 lvl FROM dual
CONNECT BY LEVEL <= (SELECT MAX(LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')))+1 FROM tab)))
WHERE LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')) >= lvl
)
SELECT Full_Name, Partial_Name
FROM tmp
GROUP BY Full_Name, Partial_Name
HAVING count(DISTINCT token)
= count(DISTINCT CASE WHEN REGEXP_LIKE(Full_Name, token, 'i')
THEN token ELSE NULL END);
In tmp, each partial_name is split into tokens (separated by commas).
The resulting query retrieves only those rows whose full_name matches all the corresponding tokens.
This query works with any number of commas in partial_name. If there can be only zero or one comma, the query is much simpler:
SELECT * FROM tab
WHERE instr(Partial_Name, ',') > 0
AND REGEXP_LIKE(full_name, substr(Partial_Name, 1, instr(Partial_Name, ',')-1), 'ix')
AND REGEXP_LIKE(full_name, substr(Partial_Name,instr(Partial_Name, ',')+1), 'ix')
OR instr(Partial_Name, ',') = 0
AND REGEXP_LIKE(full_name, Partial_Name, 'ix');
This is what I ended up doing. I'm not sure if this is the best approach.
I split the partial name by comma and check whether the first name is present in the full name and the last name is present in the full name. If both are present, it is a match.
CASE
WHEN
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 1)))) > 0
AND
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 2)))) > 0
THEN 'Y'
ELSE 'N'
END AS MATCHING_NAMES
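One thing to watch with plain instr is that it also matches inside a longer name (for example 'Eg' inside 'Eglid'). A possible refinement is sketched below, assuming the parts of Full_Name are separated by single spaces and using made-up sample rows in a CTE called tab: padding both sides with spaces makes the check match whole words only.

```sql
with tab (partial_name, full_name) as (
  select 'John,Smith',    'Smith William John' from dual union all
  select 'Eglid,Timothy', 'Timothy M Eglid'    from dual union all
  select 'Eg,Timothy',    'Timothy M Eglid'    from dual
)
select partial_name, full_name,
       case
         -- ' name ' must occur inside ' full name ', i.e. as a whole word
         when instr(' ' || lower(full_name) || ' ',
                    ' ' || lower(trim(regexp_substr(partial_name, '[^,]+', 1, 1))) || ' ') > 0
          and instr(' ' || lower(full_name) || ' ',
                    ' ' || lower(trim(regexp_substr(partial_name, '[^,]+', 1, 2))) || ' ') > 0
         then 'Y'
         else 'N'
       end as matching_names
from tab;
```

The first two rows should come back as 'Y' and the third as 'N', because 'Eg' is only a prefix of 'Eglid'.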