Word Match Search with PL/SQL - sql

what would be the best way to do a word match search in PL/SQL?
E.g. for the string "BROUGHTONS OF CHELTENHAM LIMITED"
"BROUGHTONS LIMITED" is a match
"OF LIMITED" is a match
"CHELTENHAM BROUGHTONS" is a match
"BROUG" is a non-match

Here's a rather crude approach, but should do what you are asking. As Xophmeister noted, you probably need to tokenize each string and then search the tokens (since you want to match out of order, doing a simple "like %tokenA%tokenB%tokenC%" won't work).
Also, this doesn't even touch all the issues around phonetics, soundex, etc. But again, not what you asked. This also doesn't touch performance or scaling issues, and would probably only be acceptable for a small set of data.
So, first we need a split function:
create or replace
function fn_split(i_string in varchar2, i_delimiter in varchar2 default ',', b_dedup_tokens in number default 0)
return sys.dbms_debug_vc2coll
as
l_tab sys.dbms_debug_vc2coll;
begin
select regexp_substr(i_string,'[^' || i_delimiter || ']+', 1, level)
bulk collect into l_tab
from dual
connect by regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, level) is not null
order by level;
if (b_dedup_tokens > 0) then
return l_tab multiset union distinct l_tab;
end if;
return l_tab;
end;
Now we can use it to inspect strings for specific tokens. Here I'm searching for 3 tokens (John Q Public) from a sample set of data
with test_data as (
select 1 as id, 'John Q Public' as full_name from dual
union
select 2 as id, 'John John Smith' as full_name from dual
union
select 3 as id,'Sally Smith' from dual
union
select 4 as id, 'Mr John B B Q Public' from dual
union
select 5 as id, 'A Public John' from dual
)
select d.id, d.full_name, count(1) as hits
from test_data d, table(fn_split(full_name, ' ', 1))
-- should have at least 1 of these tokens
where column_value in ('John', 'Q', 'Public')
group by d.id, d.full_name
-- can also restrict results to those with at least x token hits
having count(1) >= 2
-- most hits at top of results
order by count(1) desc, id asc
Output:
"ID" "FULL_NAME" "HITS"
1 "John Q Public" 3
4 "Mr John B B Q Public" 3
5 "A Public John" 2
You can also add "upper" to make case insensitive, etc.

Use an Oracle Text index. This will allow you to issue powerful CONTAINS queries.
http://docs.oracle.com/cd/B28359_01/text.111/b28303/quicktour.htm

Related

Oracle SQL : Check if specified words are present in comma separated string

I have an SQL function that returns me a string of comma separated country codes.
I have configured some specific codes in another table and I may remove or add more later.
I want to check if the comma separated string is only the combination of those specific country codes or not. That said, if that string is having even a single country code other than the specified ones, it should return true.
Suppose I configured two rows in the static data table GB and CH. Then I need below results:
String from function
result
GB
false
CH
false
GB,CH
false
CH,GB
false
GB,FR
true
FR,ES
true
ES,CH
true
CH,GB,ES
true
I am on Oracle 19c and can use only the functions available for this version. Plus I want it to be optimised. Like I can check the number of values in string and then count for each specific code. If not matching then obviously some other codes are present. But I don't want to use loops.
Can someone please suggest me a better option.
Assuming that all country codes in the static table, as well as all tokens in the comma-separated strings, are always exactly two-letter strings, you could do something like this:
with
static_data(country_code) as (
select 'GB' from dual union all
select 'CH' from dual
)
, sample_inputs(string_from_function) as (
select 'GB' from dual union all
select 'CH' from dual union all
select 'GB,CH' from dual union all
select 'CH,GB' from dual union all
select 'GB,FR' from dual union all
select 'FR,ES' from dual union all
select 'ES,CH' from dual union all
select 'CH,GB,ES' from dual
)
select string_from_function,
case when regexp_replace(string_from_function,
',| |' || (select listagg(country_code, '|')
within group (order by null)
from static_data))
is null then 'false' else 'true' end as result
from sample_inputs
;
Output:
STRING_FROM_FUNCTION RESULT
---------------------- --------
GB false
CH false
GB,CH false
CH,GB false
GB,FR true
FR,ES true
ES,CH true
CH,GB,ES true
The regular expression replaces comma, space, and every two-letter country code from the static data table with null. If the result of the whole thing is null, then all coded in the csv are in the static table; that's what you need to test for.
The assumptions guarantee that a token like GBCH (for a country like "Great Barrier Country Heat") would not be mistakenly considered OK because GB and CH are OK separately.
You can convert a csv column to a table and use EXISTS. For example
with tbl(id,str) as
(
SELECT 1,'GB,CH' FROM DUAL UNION ALL
SELECT 2,'GB,CH,FR' FROM DUAL UNION ALL
SELECT 3,'GB' FROM DUAL
),
countries (code) as
(SELECT 'GB' FROM DUAL UNION ALL
SELECT 'CH' FROM DUAL
)
select t.* ,
case when exists (
select 1
from xmltable(('"' || REPLACE(str, ',', '","') || '"')) s
where trim(s.column_value) not in (select code from countries)
)
then 'true' else 'false' end flag
from tbl t
One option is to match the country codes one by one, and then determine whether there exists an extra non-matched country from the provided literal as parameter.
The following one with FULL JOIN would help by considering the logic above
WITH
FUNCTION with_function(i_countries VARCHAR2) RETURN VARCHAR2 IS
o_val VARCHAR2(10);
BEGIN
SELECT CASE WHEN SUM(NVL2(t.country_code,0,1))=0 THEN 'false'
ELSE 'true'
END
INTO o_val
FROM (SELECT DISTINCT REGEXP_SUBSTR(i_countries,'[^ ,]+',1,level) AS country
FROM dual
CONNECT BY level <= REGEXP_COUNT(i_countries,',')+1) tt
FULL JOIN t
ON tt.country = t.country_code;
RETURN o_val;
END;
SELECT with_function(<comma-seperated-parameter-list>) AS result
FROM dual
Demo
Here is one solution
with cte as
(select distinct
s,regexp_substr(s, '[^,]+',1, level) code from strings
connect by regexp_substr(s, '[^,]+', 1, level) is not null
)
select
s string,min(case when exists
(select * from countries
where cod = code) then 'yes'
else 'no'end) all_found
from cte
group by s
order by s;
STRING | ALL_FOUND
:----- | :--------
CH | yes
CH,GB | yes
ES | no
ES,CH | no
FR | no
GB | yes
GB,CH | yes
GB,ES | no
db<>fiddle here
If you have a small number of values in the static table then the simplest method may not be to split the values from the function but to generate all combinations of values from the static table using:
SELECT SUBSTR(SYS_CONNECT_BY_PATH(value, ','), 2) AS combination
FROM static_table
CONNECT BY NOCYCLE PRIOR value != value;
Which, for the sample data:
CREATE TABLE static_table(value) AS
SELECT 'GB' FROM DUAL UNION ALL
SELECT 'CH' FROM DUAL;
Outputs:
COMBINATION
GB
GB,CH
CH
CH,GB
Then you can use a simple CASE expression to your string output to the combinations:
SELECT function_value,
CASE
WHEN function_value IN (SELECT SUBSTR(SYS_CONNECT_BY_PATH(value, ','), 2)
FROM static_table
CONNECT BY NOCYCLE PRIOR value != value)
THEN 'false'
ELSE 'true'
END AS not_matched
FROM string_from_function;
Which, for the sample data:
CREATE TABLE string_from_function(function_value) AS
SELECT 'GB' FROM DUAL UNION ALL
SELECT 'CH' FROM DUAL UNION ALL
SELECT 'GB,CH' FROM DUAL UNION ALL
SELECT 'CH,GB' FROM DUAL UNION ALL
SELECT 'GB,FR' FROM DUAL UNION ALL
SELECT 'FR,ES' FROM DUAL UNION ALL
SELECT 'ES,CH' FROM DUAL UNION ALL
SELECT 'CH,GB,ES' FROM DUAL;
Outputs:
FUNCTION_VALUE
NOT_MATCHED
GB
false
CH
false
GB,CH
false
CH,GB
false
GB,FR
true
FR,ES
true
ES,CH
true
CH,GB,ES
true
db<>fiddle here

Oracle : replace string of options based on data set - is this possible?

I have column in table looking like this:
PATTERN
{([option1]+[option2])*([option3]+[option4])}
{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}
{[option1]+[option6]}
{([option1]+[option2])*([option8]+[option9])}
{([option1]+[option2])*[option4]}
{[option10]}
Every option has a number of value.
There is a table - let's call it option_set and records look like
OPTION VALUE
option1 3653265
option2 26452
option3 73552
option3 100
option4 1235
option5 42565
option6 2330
option7 544
option9 2150
I want to replace option name to number in 1st table, if exists of course, if not exists then =0.
I have done this in PLSQL (get the pattern, go through every option, and if exists - regexp_replace),
but I am wondering if this could be done in SQL??
My goal is to replace values for all patterns for current OPTION_SET and get only records, where all equations would be greater than 0. Of course - I couldn't run this equation in SQL, so I think of something like
for rec in
(
SELECT...
)
loop
execute immediate '...';
if above_equation > 0 then ..
end loop;
Any ideas would be appreciated
You can do a loop-like query in SQL with the recursive CTE, replacing new token on each iteration, so this will let you to replace all the tokens.
The only way I know to execute a dynamic query inside SQL statement in Oracle is DBMS_XMLGEN package, so you can evaluate the expression and filter by the result value without PL/SQL. But all this is viable for low cardinality tables with patterns and options.
Here's the code:
with a as (
select 1 as id, '{([option1]+[option2])*([option3]+[option4])}' as pattern from dual union all
select 2 as id, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' as pattern from dual union all
select 3 as id, '{[option1]+[option6]}' as pattern from dual union all
select 4 as id, '{([option1]+[option2])*([option8]+[option9])}' as pattern from dual union all
select 5 as id, '{([option1]+[option2])*[option4]}' as pattern from dual union all
select 6 as id, '{[option10]}]' as pattern from dual
)
, opt as (
select 'option1' as opt, 3653265 as val from dual union all
select 'option2' as opt, 26452 as val from dual union all
select 'option3' as opt, 73552 as val from dual union all
select 'option3' as opt, 100 as val from dual union all
select 'option4' as opt, 1235 as val from dual union all
select 'option5' as opt, 42565 as val from dual union all
select 'option6' as opt, 2330 as val from dual union all
select 'option7' as opt, 544 as val from dual union all
select 'option9' as opt, 2150 as val from dual
)
, opt_ordered as (
/*Order options to iterate over*/
select opt.*, row_number() over(order by 1) as rn
from opt
)
, rec (id, pattern, repl_pattern, lvl) as (
select
id,
pattern,
pattern as repl_pattern,
0 as lvl
from a
union all
select
r.id,
r.pattern,
/*Replace each part at new step*/
replace(r.repl_pattern, '[' || o.opt || ']', o.val),
r.lvl + 1
from rec r
join opt_ordered o
on r.lvl + 1 = o.rn
)
, out_prepared as (
select
rec.*,
case
when instr(repl_pattern, '[') = 0
/*When there's no more not parsed expressions, then we can try to evaluate them*/
then dbms_xmlgen.getxmltype(
'select ' || replace(replace(repl_pattern, '{', ''), '}', '')
|| ' as v from dual'
)
/*Otherwise SQL statement will fail*/
end as parsed_expr
from rec
/*Retrieve the last step*/
where lvl = (select max(rn) from opt_ordered)
)
select
id,
pattern,
repl_pattern,
extractvalue(parsed_expr, '/ROWSET/ROW/V') as calculated_value
from out_prepared o
where extractvalue(parsed_expr, '/ROWSET/ROW/V') > 0
ID | PATTERN | REPL_PATTERN | CALCULATED_VALUE
-: | :------------------------------------------------------------------ | :---------------------------------------- | :---------------
1 | {([option1]+[option2])*([option3]+[option4])} | {(3653265+26452)*(73552+1235)} | 275194995279
2 | {([option1]+[option2])*([option3]+[option4])*([option6]+[option7])} | {(3653265+26452)*(73552+1235)*(2330+544)} | 790910416431846
3 | {[option1]+[option6]} | {3653265+2330} | 3655595
5 | {([option1]+[option2])*[option4]} | {(3653265+26452)*1235} | 4544450495
db<>fiddle here
Here is one way to do this. There's a lot to unpack, so hang on tight.
I include the test data in the with clause. Of course, you won't need that; simply remove the two "tables" and use your actual table and column names in the query.
From Oracle 12.1 on, we can define PL/SQL functions directly in the with clause, right at the top; if we do so, the query must be terminated with a slash (/) instead of the usual semicolon (;). If your version is earlier than 12.1, you can define the function separately. The function I use takes an "arithmetic expression" (a string representing a compound arithmetic operation) and returns its value as a number. It uses native dynamic SQL (the "execute immediate" statement), which will cause the query to be relatively slow, as a different cursor is parsed for each row. If speed becomes an issue, this can be changed, to use a bind variable (so that the cursor is parsed only once).
The recursive query in the with clause replaces each placeholder with the corresponding value for the "options" table. I use 0 either if a "placeholder" doesn't have a corresponding option in the table, or if it does but the corresponding value is null. (Note that your sample data shows option3 twice; that makes no sense, and I removed one occurrence from my sample data.)
Instead of replacing one placeholder at a time, I took the opposite approach; assuming the patterns may be long, but the number of "options" is small, this should be more efficient. Namely: at each step, I replace ALL occurrences of '[optionN]' (for a given N) in a single pass. Outside the recursive query, I replace all the placeholders for "non-existent" options with 0.
Note that recursive with clause requires Oracle 11.2. If your version is even earlier than that (although it shouldn't be), there are other ways; you would likely need to do that in PL/SQL also.
So, here it is - a single SELECT query for the whole thing:
with
function expr_eval(pattern varchar2) return number as
x number;
begin
execute immediate 'select ' || pattern || ' from dual' into x;
return x;
end;
p (id, pattern) as (
select 1, '{([option1]+[option2])*([option3]+[option4])}' from dual union all
select 2, '{([option1]+[option2])*([option3]+[option4])*([option6]+[option7])}' from dual union all
select 3, '{[option1]+[option6]}' from dual union all
select 4, '{([option1]+[option2])*([option8]+[option9])}' from dual union all
select 5, '{([option1]+[option2])*[option4]}' from dual union all
select 6, '{[option10]}' from dual union all
select 7, '{[option2]/([option3]+[option8])-(300-[option2])/(0.1 *[option3])}' from dual
)
, o (opt, val) as (
select 'option1', 3653265 from dual union all
select 'option2', 26452 from dual union all
select 'option3', 100 from dual union all
select 'option4', 1235 from dual union all
select 'option5', 42565 from dual union all
select 'option6', 2330 from dual union all
select 'option7', 544 from dual union all
select 'option9', 2150 from dual
)
, n (opt, val, rn, ct) as (
select opt, val, rownum, count(*) over ()
from o
)
, r (id, pattern, rn, ct) as (
select id, substr(pattern, 2, length(pattern) - 2), 1, null
from p
union all
select r.id, replace(r.pattern, '[' || n.opt || ']', nvl(to_char(n.val), 0)),
r.rn + 1, n.ct
from r join n on r.rn = n.rn
)
, ae (id, pattern) as (
select id, regexp_replace(pattern, '\[[^]]*]', '0')
from r
where rn = ct + 1
)
select id, expr_eval(pattern) as result
from ae
order by id
/
Output:
ID RESULT
---- ---------------
1 4912422195
2 14118301388430
3 3655595
4 7911391550
5 4544450495
6 0
7 2879.72

Search comma separated value in oracle 12

I have a Table - Product In Oracle, wherein p_spc_cat_id is stored as comma separated values.
p_id p_name p_desc p_spc_cat_id
1 AA AAAA 26,119,27,15,18
2 BB BBBB 0,0,27,56,57,4
3 BB CCCC 26,0,0,15,3,8
4 CC DDDD 26,0,27,7,14,10
5 CC EEEE 26,119,0,48,75
Now I want to search p_name which have p_spc_cat_id in '26,119,7' And this search value are not fixed it will some time '7,27,8'. The search text combination change every time
my query is:
select p_id,p_name from product where p_spc_cat_id in('26,119,7');
when i execute this query that time i can't find any result
I am little late in answering however i hope that i understood the question correctly.
Read further if: you have a table storing records like
1. 10,20,30,40
2. 50,40,20,70
3. 80,60,30,40
And a search string like '10,60', in which cases it should return rows 1 & 3.
Please try below, it worked for my small table & data.
create table Temp_Table_Name (some_id number(6), Ab varchar2(100))
insert into Temp_Table_Name values (1,'112,120')
insert into Temp_Table_Name values (2,'7,8,100,26')
Firstly lets breakdown the logic:
The table contains comma separated data in one of the columns[Column AB].
We have a comma separated string which we need to search individually in that string column. ['26,119,7,18'-X_STRING]
ID column is primary key in the table.
1.) Lets multiple each record in the table x times where x is the count of comma separated values in the search string [X_STRING]. We can use below query to create the cartesian join sub-query table.
Select Rownum Sequencer,'26,119,7,18' X_STRING
from dual
CONNECT BY ROWNUM <= (LENGTH( '26,119,7,18') - LENGTH(REPLACE( '26,119,7,18',',',''))) + 1
Small note: Calculating count of comma separated values =
Length of string - length of string without ',' + 1 [add one for last value]
2.) Create a function PARSING_STRING such that PARSING_STRING(string,position). So If i pass:
PARSING_STRING('26,119,7,18',3) it should return 7.
CREATE OR REPLACE Function PARSING_STRING
(String_Inside IN Varchar2, Position_No IN Number)
Return Varchar2 Is
OurEnd Number; Beginn Number;
Begin
If Position_No < 1 Then
Return Null;
End If;
OurEnd := Instr(String_Inside, ',', 1, Position_No);
If OurEnd = 0 Then
OurEnd := Length(String_Inside) + 1;
End If;
If Position_No = 1 Then
Beginn := 1;
Else
Beginn := Instr(String_Inside, ',', 1, Position_No-1) + 1;
End If;
Return Substr(String_Inside, Beginn, OurEnd-Beginn);
End;
/
3.) Main query, with the join to multiply records.:
select t1.*,PARSING_STRING(X_STRING,Sequencer)
from Temp_Table_Name t1,
(Select Rownum Sequencer,'26,119,7,18' X_STRING from dual
CONNECT BY ROWNUM <= (Select (LENGTH( '26,119,7,18') - LENGTH(REPLACE(
'26,119,7,18',',',''))) + 1 from dual)) t2
Please note that with each multiplied record we are getting 1 particular position value from the comma separated string.
4.) Finalizing the where condition:
Where
/* For when the value is in the middle of the strint [,value,] */
AB like '%,'||PARSING_STRING(X_STRING,Sequencer)||',%'
OR
/* For when the value is in the start of the string [value,]
parsing the first position comma separated value to match*/
PARSING_STRING(AB,1) = PARSING_STRING(X_STRING,Sequencer)
OR
/* For when the value is in the end of the string [,value]
parsing the last position comma separated value to match*/
PARSING_STRING(AB,(LENGTH(AB) - LENGTH(REPLACE(AB,',',''))) + 1) =
PARSING_STRING(X_STRING,Sequencer)
5.) Using distinct in the query to get unique ID's
[Final Query:Combination of all logic stated above: 1 Query to find them all]
select distinct Some_ID
from Temp_Table_Name t1,
(Select Rownum Sequencer,'26,119,7,18' X_STRING from dual
CONNECT BY ROWNUM <= (Select (LENGTH( '26,119,7,18') - LENGTH(REPLACE( '26,119,7,18',',',''))) + 1 from dual)) t2
Where
AB like '%,'||PARSING_STRING(X_STRING,Sequencer)||',%'
OR
PARSING_STRING(AB,1) = PARSING_STRING(X_STRING,Sequencer)
OR
PARSING_STRING(AB,(LENGTH(AB) - LENGTH(REPLACE(AB,',',''))) + 1) = PARSING_STRING(X_STRING,Sequencer)
You can use like to find it:
select p_id,p_name from product where p_spc_cat_id like '%26,119%'
or p_spc_cat_id like '%119,26%' or p_spc_cat_id like '%119,%,26%' or p_spc_cat_id like '%26,%,119%';
Use the Oracle function instr() to achieve what you want. In your case that would be:
SELECT p_name
FROM product
WHERE instr(p_spc_cat_id, '26,119') <> 0;
Oracle Doc for INSTR
If the string which you are searching will always have 3 values (i.e. 2 commas present) then you can use below approach.
where p_spc_cat_id like regexp_substr('your_search_string, '[^,]+', 1, 1)
or p_spc_cat_id like regexp_substr('your_search_string', '[^,]+', 1, 2)
or p_spc_cat_id like regexp_substr('your_search_string', '[^,]+', 1, 3)
If you cant predict how many values will be there in your search string
(rather how many commas) in that case you may need to generate dynamic query.
Unfortunately sql fiddle is not working currently so could not test this code.
SELECT p_id,p_name
FROM product
WHERE p_spc_cat_id
LIKE '%'||'&i_str'||'%'`
where i_str is 26,119,7 or 7,27,8
This solution uses CTE's. "product" builds the main table. "product_split" turns products into rows so each element in p_spc_cat_id is in it's own row. Lastly, product_split is searched for each value in the string '26,119,7' which is turned into rows by the connect by.
with product(p_id, p_name, p_desc, p_spc_cat_id) as (
select 1, 'AA', 'AAAA', '26,119,27,15,18' from dual union all
select 2, 'BB', 'BBBB', '0,0,27,56,57,4' from dual union all
select 3, 'BB', 'CCCC', '26,0,0,15,3,8' from dual union all
select 4, 'CC', 'DDDD', '26,0,27,7,14,10' from dual union all
select 5, 'CC', 'EEEE', '26,119,0,48,75' from dual
),
product_split(p_id, p_name, p_spc_cat_id) as (
select p_id, p_name,
regexp_substr(p_spc_cat_id, '(.*?)(,|$)', 1, level, NULL, 1)
from product
connect by level <= regexp_count(p_spc_cat_id, ',')+1
and prior p_id = p_id
and prior sys_guid() is not null
)
-- select * from product_split;
select distinct p_id, p_name
from product_split
where p_spc_cat_id in(
select regexp_substr('26,119,7', '(.*?)(,|$)', 1, level, NULL, 1) from dual
connect by level <= regexp_count('26,119,7', ',') + 1
)
order by p_id;
P_ID P_
---------- --
1 AA
3 BB
4 CC
5 CC
SQL>

substring, after last occurrence of character?

I need help with this problem:
I have a column named phone_number and I wanted to query this column to get the the string right of the last occurrence of '.' for all kinds of numbers in one single sql query.
example #:
515.123.1277
011.44.1345.629268
I need to get 1277 and 629268 respectively.
I have this so far:
select phone_number,
case when length(phone_number) <= 12
then
substr(phone_number,-4)
else
substr (phone_number, -6) end
from employees;
This works for this example, but I want it for all kinds of # formats.
Would be great to get some input.
Thanks
It should be as easy as this regex:
SELECT phone_number, REGEXP_SUBSTR(phone_number, '[^.]*$')
FROM employees;
With the end anchor $ it should get everything that is not a . character after the final .. If the last character is . then it will return NULL.
Search for a pattern including the period, [.] with digits, \d, followed by the end of the string, $.
Associate the digits with a character group by placing the pattern, \d, in parenthesis (see below). This is referenced with the subexpr parameter, 1 (last parameter).
Here is the solution:
SCOTT#dev> list
1 WITH t AS
2 ( SELECT '414.352.3100' p_number FROM dual
3 UNION ALL
4 SELECT '515.123.1277' FROM dual
5 UNION ALL
6 SELECT '011.44.1345.629268' FROM dual
7 )
8* SELECT regexp_substr(t.p_number, '[.](\d+)$', 1, 1, NULL, 1) end_num FROM t
SCOTT#dev> /
END_NUM
========================================================================
3100
1277
629268
You can do something like this in oracle:
select regexp_substr(num,'[^\.]+',1,regexp_count(num,'\.')+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
Previous to 11gR2 you can use regexp_replace instead regexp_count:
select regexp_substr(num,'[^\.]+',1,length(regexp_replace (num , '[^\.]+'))+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );

Check if string variations exists in another string

I need to check if a partial name matches full name. For example:
Partial_Name | Full_Name
--------------------------------------
John,Smith | Smith William John
Eglid,Timothy | Timothy M Eglid
I have no clue how to approach this type of matching.
Another thing is that name and last name may come in the wrong order, making it harder.
I could do something like this, but this only works if names are in the same order and 100% match
decode(LOWER(REGEXP_REPLACE(Partial_Name,'[^a-zA-Z'']','')), LOWER(REGEXP_REPLACE(Full_Name,'[^a-zA-Z'']','')), 'Same', 'Different')
you could use this pattern on the text provided - works for most engines
([^ ,]+),([^ ,]+)(?=.*\b\1\b)(?=.*\b\2\b)
Demo
WITH
/*
tab AS
(
SELECT 'Smith William John' Full_Name, 'John,Smith' Partial_Name FROM dual
UNION ALL SELECT 'Timothy M Eglid', 'Eglid,timothy' FROM dual
UNION ALL SELECT 'Tim M Egli', 'Egli,Tim,M2' FROM dual
UNION ALL SELECT 'Timot M Eg', 'Eg' FROM dual
),
*/
tmp AS (
SELECT Full_Name, Partial_Name,
trim(CASE WHEN instr(Partial_Name, ',') = 0 THEN Partial_Name
ELSE regexp_substr(Partial_Name, '[^,]+', 1, lvl+1)
END) token
FROM tab t CROSS JOIN (SELECT lvl FROM (SELECT LEVEL-1 lvl FROM dual
CONNECT BY LEVEL <= (SELECT MAX(LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')))+1 FROM tab)))
WHERE LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')) >= lvl
)
SELECT Full_Name, Partial_Name
FROM tmp
GROUP BY Full_Name, Partial_Name
HAVING count(DISTINCT token)
= count(DISTINCT CASE WHEN REGEXP_LIKE(Full_Name, token, 'i')
THEN token ELSE NULL END);
In the tmp each partial_name is splitted on tokens (separated by comma)
The resulting query retrieves only those rows which full_name matches all the corresponding tokens.
This query works with the dynamic number of commas in partial_name. If there can be only zero or one commas then the query will be much easier:
SELECT * FROM tab
WHERE instr(Partial_Name, ',') > 0
AND REGEXP_LIKE(full_name, substr(Partial_Name, 1, instr(Partial_Name, ',')-1), 'ix')
AND REGEXP_LIKE(full_name, substr(Partial_Name,instr(Partial_Name, ',')+1), 'ix')
OR instr(Partial_Name, ',') = 0
AND REGEXP_LIKE(full_name, Partial_Name, 'ix');
This is what I ended up doing... Not sure if this is the best approach.
I split partials by comma and check if first name present in full name and last name present in full name. If both are present then match.
CASE
WHEN
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 1)))) > 0
AND
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 2)))) > 0
THEN 'Y'
ELSE 'N'
END AS MATCHING_NAMES