Oracle: Fuzzy lookup - sql

I'm loading a table by looking up an employee table. However, sometimes the names in the source files and the employee table do not match exactly.
**Employee table:**
Employee Name
Paul Jaymes
**Source File**
Paul James
I want these to match. What could be the solution?

Use the UTL_MATCH package or the SOUNDEX function:
Oracle 11g R2 Schema Setup:
CREATE TABLE Employees ( Name ) AS
SELECT 'Paul Jaymes' FROM DUAL;
Query 1:
UTL_MATCH.EDIT_DISTANCE:
Calculates the number of changes required to transform string-1 into string-2
SELECT *
FROM Employees
WHERE UTL_MATCH.EDIT_DISTANCE( Name, 'Paul James' ) < 2
Query 2:
UTL_MATCH.EDIT_DISTANCE_SIMILARITY:
Calculates the number of changes required to transform string-1 into string-2, returning a value between 0 (no match) and 100 (perfect match)
SELECT *
FROM Employees
WHERE UTL_MATCH.EDIT_DISTANCE_SIMILARITY( Name, 'Paul James' ) > 90
Query 3:
UTL_MATCH.JARO_WINKLER:
Calculates the measure of agreement between string-1 and string-2
SELECT *
FROM Employees
WHERE UTL_MATCH.JARO_WINKLER( Name, 'Paul James' ) > 0.9
Query 4:
UTL_MATCH.JARO_WINKLER_SIMILARITY:
Calculates the measure of agreement between string-1 and string-2, returning a value between 0 (no match) and 100 (perfect match)
SELECT *
FROM Employees
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY( Name, 'Paul James' ) > 95
Query 5:
SOUNDEX:
Returns a character string containing the phonetic representation of char. This function lets you compare words that are spelled differently but sound alike in English.
SELECT *
FROM Employees
WHERE SOUNDEX( Name ) = SOUNDEX( 'Paul James' )
Results:
All give the output:
| NAME |
|-------------|
| Paul Jaymes |

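Use the UTL_MATCH.EDIT_DISTANCE_SIMILARITY function in Oracle.
I would recommend loading the output of a query like the one below into a temporary table and checking whether the matches are as expected. Names scoring above roughly 90-93 are usually the same name with a typo introduced in one of the systems. The score for a single-character difference depends on the length of the name: it lands in the low 90s for names of typical length (about 91 for the 11-character 'Paul Jaymes') and rises for longer names.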
select s.employee_name,
utl_match.edit_distance_similarity(initcap(s.employee_name),e.employee_name) as score
from source_table s cross join employee_table e
where utl_match.edit_distance_similarity(initcap(s.employee_name),e.employee_name) >=90 ;
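For intuition about where to put the threshold, here is a rough Python sketch of a normalized edit-distance score. The normalization used (100 × (1 − distance / length of the longer string), rounded) is my assumption about how EDIT_DISTANCE_SIMILARITY behaves, not something taken from Oracle's implementation, and the function names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> int:
    """Assumed normalization: 0 (no match) to 100 (identical strings)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(a, b) / longest))


print(edit_distance("Paul Jaymes", "Paul James"))  # 1
print(similarity("Paul Jaymes", "Paul James"))     # 91
```

Because the distance is divided by the length of the longer name, the same one-letter typo costs more points in a short name than in a long one, which is worth keeping in mind when choosing a single cut-off.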

Get union and intersection of jsonb array in Postgresql

I have a DB of people with a jsonb column interests. In my application a user can search for people by providing their hobbies, which are a set of predefined values. I want to offer them the best match, and to do so I would like to score each match as intersection/union of interests. This way the top results won't simply be the people who have plenty of hobbies in my DB.
Example:
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["knitting"]
user input in app:
["reading", "swimming", "knitting", "cars"]
my script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
Now I'm using
SELECT name
FROM people
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
but this gives me even records with many interests and no way to order it.
Is there any way I can achieve it in a reasonable time - let's say up to 5 seconds in DB with around 400K records?
EDIT:
I added another example to clarify my calculations. My calculation needs to filter people with many hobbies. Therefore match should be calculated as Intersection(input, db_record)/Union(input, db_record).
Example:
input = ["reading"]
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["reading"]
Match for Mary would be calculated as LENGTH(["reading"]) / LENGTH(["swimming","reading","jogging"]), which is 0.3333,
and for Carl it would be LENGTH(["reading"]) / LENGTH(["reading"]), which is 1.
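The intersection/union score described here is the Jaccard index. As a sanity check on the expected numbers from the first example, here is a small Python sketch, independent of the database; the `people` dictionary and the `jaccard` helper are illustrative names only.

```python
def jaccard(a, b) -> float:
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Sample rows from the first example in the question
people = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}
query = ["reading", "swimming", "knitting", "cars"]

scores = {name: jaccard(query, interests) for name, interests in people.items()}

# Best match first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, round(score, 5))
```

Since the union equals len(input) + len(interests) − overlap, only the overlap count and the array length are needed per row, which is the same arithmetic the SQL has to reproduce.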
UPDATE: I managed to do it with
SELECT result.id, result.name, result.overlap_count/(jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal as score
FROM (SELECT t1.name as name, t1.id, COUNT(t1.name) as overlap_count
FROM (SELECT name, id, jsonb_array_elements(interests)
FROM persons) as t1
JOIN (SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb as elements) as t2 ON t1.jsonb_array_elements = t2.elements
GROUP BY t1.name, t1.id) as result
JOIN persons ON result.id = persons.id ORDER BY score desc
Here's my fiddle https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
However it's not fast enough and I would appreciate any improvements.
One option is to unnest the parameter and use the ? operator to check each of its elements against the jsonb array:
select
t.name,
x.match_ratio
from mytable t
cross join lateral (
select avg( (t.interests ? a.val)::int ) match_ratio
from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
It is not very clear what the rules are behind the result that you are showing. This query gives you a ratio that represents the fraction of values in the parameter array that can be found in the interests of each person (so Mary gets 0.5, since she has two interests in common with the search parameter, and all other names get 0.25).
One option would be using jsonb_array_elements() to unnest the jsonb column :
SELECT name, count / SUM(count) over () AS ratio
FROM(
SELECT name, COUNT(name) AS count
FROM people
JOIN jsonb_array_elements(interests) AS j(elm) ON TRUE
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
GROUP BY name ) q

Select field based on other column max value in oracle pl/sql

I am calculating a field called "Degree Level" in a view. This is a field in the table "Degrees", and the table shows degrees for each faculty member. A faculty member can have more than one degree.
The field "degree level" is also in the table "Crosswalk_Table". I want to choose Degree level for a faculty member based on the max value in the column "Degree_Hierarchy" in the Crosswalk_table.
The code below displays the "Master" instead of the "Doctor" for degree level (which has the higher hierarchy value). ANY help is much appreciated thank you.
CAST (
(SELECT DEGREE_LEVEL
FROM Degrees D, Crosswalk_Table E
WHERE
E.DEGREE_HIERARCHY =
(SELECT MAX (DEGREE_HIERARCHY)
FROM Crosswalk_Table
WHERE DEGREE_CODE = D.FACULTY_DEGREE_CODE)
AND D.FACULTY_DEGREE_CODE = E.DEGREE_CODE
AND D.PERSON_SKEY = SRC.PERSON_SKEY
AND ROWNUM <=1
ORDER BY DEGREE_HIERARCHY DESC)
AS VARCHAR2 (50))
Sample Data:
Degree table:
Person_skey Degree_Code
-------------------------
123456 MA
123456 JD
Crosswalk_Table:
degree_level degree_code degree_hierarchy
---------------------------------------------
master MA 30
doctor JD 40
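Your query returns "Master" because Oracle assigns ROWNUM before the ORDER BY is applied, so ROWNUM <= 1 keeps an arbitrary row rather than the one with the highest hierarchy. If you are using Oracle 12 or higher, you can use a subquery like this instead (with ORDER BY and a limit of one row):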
SELECT c.DEGREE_LEVEL
FROM Degrees d
JOIN CROSSWALK_TABLE c
ON c.Degree_Code = d.Degree_Code
WHERE d.Person_skey = 123456
ORDER BY c.DEGREE_HIERARCHY DESC
FETCH FIRST ROW ONLY
Please take a look at this simple demo:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=c8d41924c593f4f361de59a611a363cc

How to store before and after decimal value in 2 different column

Name Gender Amount
Ram male 20.56
Bhavna female 78.2
darshan male 12.02
Avni female 50.366
I want to split the Amount column into two parts: one column holding the value before the decimal point (e.g. 20.56 gives 20) and a second column holding the value after the decimal point (e.g. 20.56 gives 56).
-- check this query
select amount, decode (pos,0,amount,substr(amount,1,pos-1)) as before_decimal ,
decode(pos,0,0,substr(amount,pos+1,length(amount))) as after_decimal
from (
select instr((substr(amount,1,length(amount))),'.') as pos,amount
from table_name
)
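To make the intent of the INSTR/SUBSTR logic above easier to follow, here is the same idea as a rough Python sketch: find the decimal point in the text form of the amount and take what lies on either side. The function name `split_amount` is made up, and treating a missing decimal point as "0" mirrors the pos = 0 branch of the DECODE.

```python
def split_amount(amount: str) -> tuple[str, str]:
    """Split the text form of a number at its decimal point."""
    whole, dot, fraction = amount.partition(".")
    # No decimal point at all: treat the fractional part as "0"
    return whole, (fraction if dot else "0")


for value in ["20.56", "78.2", "12.02", "50.366", "7"]:
    print(value, split_amount(value))
```

Working on the text keeps leading zeros ("02") and any number of decimal places ("366"), which an approach that multiplies by 100 would not.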
In MySQL you can format numbers using FORMAT:
FORMAT(your_number, D) -- D is the number of decimal places you want to keep
Usage: FORMAT(N, D) rounds N to D decimal places and returns the result as a string.
You can see how to use it here: https://www.w3resource.com/mysql/string-functions/mysql-format-function.php
You can use the following queries to get your expected output, taking the amount 20.56 as an example.
To get 20 as the output:
SELECT FLOOR(20.56) FROM TABLE_NAME
To get exactly 56 as the output:
SELECT FLOOR((20.56 - FLOOR(20.56))*100) FROM TABLE_NAME
If you want them in separate columns, you can use arithmetic functions:
select t.*, floor(val) as col_left, floor(val * 100) % 100 as col_right
from (select 20.56 as val) t

Any built-in function in Oracle to round down numbers and distribute the remaining values randomly

I have a table say STAFF that stores the staff names and their salaries.
Below are some sample data:
STAFF | SALARY
===========================
ALEX | 100.4
JESSICA | 100.4
PETER | 99.2
The total of salaries is always a whole number and I want to round down all staff's salaries and then randomly put the remaining value to one of them.
For example, the output would be like below if JESSICA is selected to receive the remaining value.
STAFF | SALARY
===========================
ALEX | 100
JESSICA | 101
PETER | 99
Does Oracle provide any built-in function to perform the described operation.
The quantity SALARY - TRUNC(SALARY) gives the decimal portion of each salary, for each record. You can sum this over the entire table, and then increment a chosen user's salary by this amount. Try something like this:
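-- Round every salary down; only the chosen staff member receives the summed remainder
UPDATE yourTable
SET SALARY = TRUNC(SALARY)
    + CASE WHEN STAFF = 'JESSICA'
           THEN (SELECT SUM(SALARY - TRUNC(SALARY)) FROM yourTable)
           ELSE 0 END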
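The arithmetic behind this, with the recipient picked at random rather than hard-coded, can be sketched in Python as follows. This is only an illustration of the calculation, not an Oracle feature; `round_down_and_assign` is a made-up name, the salaries are assumed to be non-negative, and Decimal is used so the fractional parts add up exactly.

```python
import random
from decimal import Decimal


def round_down_and_assign(salaries, rng=random):
    """Truncate every salary and give the summed remainder to one random person."""
    # int() on a Decimal truncates toward zero, like TRUNC for non-negative values
    floors = {name: int(value) for name, value in salaries.items()}
    remainder = sum(salaries.values()) - sum(floors.values())
    lucky = rng.choice(list(salaries))  # randomly selected recipient
    floors[lucky] += int(remainder)
    return floors


staff = {
    "ALEX": Decimal("100.4"),
    "JESSICA": Decimal("100.4"),
    "PETER": Decimal("99.2"),
}
result = round_down_and_assign(staff)
print(result)
```

The total is preserved because the remainder is exactly what the truncation removed, so it does not matter who receives it.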
Here is something I tried that picks the recipient randomly, based on a generated random number.
with mine(STAFF,salary,status) as
(
select 'ALEX',100.4,'Y' from dual union all
select 'JESSICA',100.4,'Y' from dual union all
select 'PETER',99.2,'Y' from dual union all
select 'randomno',floor(dbms_random.value(1,4)),'N' vno from dual
)
select STAFF,decode(rndno,rno,csalary,rsalary) salary,decode(rndno,rno,'selected to receive the remaining value',null) selected from(
select rownum rno,STAFF,salary,round(salary) rsalary,ceil(salary) csalary,
(select salary from mine where status='N') rndno
from mine where status='Y'
);
On every run of the query, a new staff member with a fractional salary may be selected.
In the query above I added one extra row that supplies the random number, and it is compared against the row numbers of the actual result rows.

SQL server - How to find the highest number in '<> ' in a text column?

Let's say I have the following data in the Employee table (nothing more):
ID FirstName LastName x
-------------------------------------------------------------------
20 John Mackenzie <A>te</A><b>wq</b><a>342</a><d>rt21</d>
21 Ted Green <A>re</A><b>es</b><1>t34w</1><4>65z</4>
22 Marcy Nate <A>ds</A><b>tf</b><3>fv 34</3><6>65aa</6>
I need to search the x column and get the highest number that appears inside the <> brackets.
What sort of SELECT statement can get me, for example, the number 6 as in <6> in the x column?
This type of query generally works by finding patterns. I assume that the digit, such as the 6 in <6>, sits at a fixed offset from the end of the string (position LEN(X)-9).
Please note that if the pattern changes, the query below will not work.
SELECT A.* FROM YOURTABLE A INNER JOIN
(SELECT TOP 1 ID,Firstname,Lastname,SUBSTRING(X,LEN(X)-9,1) AS [ORDER]
FROM YOURTABLE
WHERE ISNUMERIC(SUBSTRING(X,LEN(X)-9,1))=1
ORDER BY SUBSTRING(X,LEN(X)-9,1) DESC)B
ON
A.ID=B.ID AND
A.FIRSTNAME=B.FIRSTNAME AND
A.LASTNAME=B.LASTNAME
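Since the fixed-position approach depends on the exact layout of the string, a pattern-based extraction is less fragile. Here is a rough sketch of that idea in Python, outside SQL Server, assuming that only tags of the form <digits> count as numbers; `highest_tag_number` is a made-up helper name.

```python
import re


def highest_tag_number(values):
    """Return the largest number found in an opening tag like <6>, or None."""
    numbers = [int(n) for v in values for n in re.findall(r"<(\d+)>", v)]
    return max(numbers) if numbers else None


# The x column from the sample data in the question
rows = [
    "<A>te</A><b>wq</b><a>342</a><d>rt21</d>",
    "<A>re</A><b>es</b><1>t34w</1><4>65z</4>",
    "<A>ds</A><b>tf</b><3>fv 34</3><6>65aa</6>",
]
print(highest_tag_number(rows))  # 6
```

Closing tags such as </6> are not matched because of the slash, so each number is counted once.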