SQL Server - grab part of string after a value sequence

I have a table called Note with a column named Notes.
Notes
------
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}
\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are waiting to hear from the claimant's attorney
It has font info at the beginning, which I don't need. I've created a new column named final_notes and would like to grab everything after the "fs" plus two characters. The final result would be:
final_notes
-----------
called insurance company they are waiting to hear from the claimant's attorney

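We use PATINDEX to find the first occurrence of fs followed by two digits.
NULLIF turns a result of 0 (pattern not found) into NULL, so rows without the marker return NULL rather than a misleading substring.
Adding 4 skips the four characters of fsNN, and LTRIM removes the space that follows them. The column in the question is called Notes, so that is the name used here.
LTRIM(SUBSTRING(Notes, NULLIF(PATINDEX('%fs[0-9][0-9]%', Notes), 0) + 4, LEN(Notes)))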
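As a rough sketch of how such an expression might be used to populate final_notes: the table variable @Note and the varchar(max) column types are assumptions standing in for the real table, and the second row is made up to show the not-found case. LTRIM is included because the character right after fs20 in the sample is a space.

```sql
-- Table variable standing in for the Note table (column types are assumed).
DECLARE @Note TABLE (Notes varchar(max), final_notes varchar(max));

INSERT INTO @Note (Notes) VALUES
('{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}' + CHAR(13) + CHAR(10) +
 '\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are waiting to hear from the claimant''s attorney'),
('a note with no font size marker');  -- hypothetical row without fsNN

UPDATE @Note
SET final_notes = LTRIM(SUBSTRING(Notes,
        -- position of "fs" + two digits; 0 (not found) becomes NULL
        NULLIF(PATINDEX('%fs[0-9][0-9]%', Notes), 0) + 4,
        LEN(Notes)));

-- The first row should hold the text after fs20; the second should be NULL.
SELECT final_notes FROM @Note;
```

Rows where the marker is missing end up NULL, which makes them easy to find afterwards with WHERE final_notes IS NULL.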

Related

Using LIKE clause when formats are different

I was given a patient list with names and I am trying to match with a list already in our database and am having troubles given the format of the name field in the patient list. This list is taken from a web form so people can input names however they want so it does not match up well.
WEBFORM_NAME     PATIENT_NAME
JOHN SMITH       SMITH,JOHN L
SHANNON BROWN    BROWN,SHANNON MARIE
Is there a way to use a LIKE clause in an instance like this? All I really need is the LIKE clause to find the first name because I have joined on phone number and email address already. My issue is when households have the same phone number and email address (spouses for example) I just want to return the right person in the household.
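Not sure if all you need is the first name; if so, here are expressions that extract it from each column. Appending a space before searching avoids an error when a name contains no space, and the search for the space in PATIENT_NAME starts after the comma so that the result is just the first name without a trailing blank.
SELECT LEFT(WEBFORM_NAME, CHARINDEX(' ', WEBFORM_NAME + ' ') - 1) AS FirstName1,
SUBSTRING(PATIENT_NAME, CHARINDEX(',', PATIENT_NAME) + 1,
CHARINDEX(' ', PATIENT_NAME + ' ', CHARINDEX(',', PATIENT_NAME) + 1) - CHARINDEX(',', PATIENT_NAME) - 1) AS FirstName2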
FROM yourTable
The assumption here seems to be that the webform (where the user types the name in manually) uses the format <First Name> [<optional middle Name(s)>] <Last Name>, whereas the data stored in the table is of the form <Last Name>,<First Name> [<optional middle Name(s)>]. It's not an exact science, but since other criteria (email, phone, etc.) have already been matched, matching on first and last name is a reasonable best effort:
select *
from webform w, patient p
where
-- extract just the last name and match that
regexp_like(p.name,
'^' ||
regexp_extract(w.name,
'([^[:space:],][[:space:],])*([^[:space:],]+)', 1, 2))
and -- extract the first name and match it
regexp_like(p.name,
',[[:space:]]*' ||
regexp_extract(w.name, '(^[^[:space:],]+)'))
Since the webform is free-form user input, it's hard to handle abbreviated middle names and other variations. The query above matches on first name and last name only, which, in addition to the matching you are already doing, should help.

How to get the specified string in a sentence

I have a section as Note which contains the Patient Name and Patient Number I want to fetch Patient Name only. I tried using the CHARINDEX function but was able to fetch the Name along with the Phone written at the end. How can I try to remove the last 5 chars or optimize to fetch only the name from the column?
Input:
Note
Patient Name: John Mathews Phone Number: 1234567890
Required Output: John Mathews
Currently, I am using the following SQL query to get the output as:
SELECT SUBSTRING(Note, CHARINDEX(':', Note)
, CHARINDEX('Phone',Note) - CHARINDEX('Name', Note))
The output I have received is :
Output: John Mathews Phone
I want to remove the Phone part, I tried using various methods but was unable to find a proper solution for the same.
Can someone help me where should I make changes in the same function without using another substring to find the length and removing it from the end?
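Based on your sample data and description, the patient name always begins at position 15 (right after "Patient Name: "). That makes this rather simple: take everything from position 15 up to the character before 'Phone Number:'.
select substring(v.note, 15, charindex('Phone Number:', v.note) - 16)
from yourtable v -- yourtable is a placeholder for the table that holds Note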
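If the label is not guaranteed to sit at the very start of the column, the start position can be computed rather than hard-coded. The following is only a sketch: the table variable @t and the alias s are made up for illustration, and it assumes 'Patient Name:' always appears before 'Phone Number:' in the same value.

```sql
-- Table variable standing in for the real table (assumed shape).
DECLARE @t TABLE (Note varchar(200));
INSERT INTO @t VALUES ('Patient Name: John Mathews Phone Number: 1234567890');

SELECT LTRIM(RTRIM(
           SUBSTRING(t.Note,
                     s.pos,
                     -- length = distance from the end of the first label to the second label
                     NULLIF(CHARINDEX('Phone Number:', t.Note), 0) - s.pos)
       )) AS PatientName
FROM @t t
CROSS APPLY (
    -- first character after the 'Patient Name:' label; NULL if the label is missing
    SELECT NULLIF(CHARINDEX('Patient Name:', t.Note), 0) + LEN('Patient Name:') AS pos
) s;
```

When either label is missing, the NULLIF calls make the result NULL instead of returning an arbitrary slice of the note.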

CASE ReGex with substring

I'm writing a SQL query where I am taking the substring of 2 names (First name/last name) to create an initials column, the data is unstructured to a certain extent (Can't show for GDPR reasons) but where there is a company name it is just in the surname column.
I'm trying to use Regex to say when the already present initials column is 1 letter (I.e not an initial) and if it is not an initial run a command that I wrote which successfully works.
CAST(CASE
WHEN [DATA_TABLE].[INITIALS] = '\d' THEN (CONCAT(substring([DATA_TABLE].[FIRSTNAMES],1,1),substring([DATA_TABLE].[SURNAME],1,1)) AS char) AS INITIALS
ELSE [DATA_TABLE].[INITIALS]
end as char) as INITIALS,
An example of the data format:
First name last name initials
John smith JS
Electrical company E
Sam Craig SC
I want the names that are just in the surname (Company names) to just remain as they are with no change (I.e The \d regex). Ones which don't will become the substring of their first name as (1,1) and a substring of their last name to also be (1,1).
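One possible reading of this requirement, sketched under a loudly stated assumption: a company row is one whose FIRSTNAMES is NULL or blank (the question says company names sit only in the surname column). Comparing with = '\d' only matches the literal two-character string, and LIKE has no \d token either, so the sketch tests FIRSTNAMES directly instead of pattern matching on INITIALS. The table variable and its rows are illustrative, modelled on the sample above.

```sql
-- Table variable standing in for DATA_TABLE; rows modelled on the sample (assumed layout).
DECLARE @DATA_TABLE TABLE (FIRSTNAMES varchar(50), SURNAME varchar(50), INITIALS varchar(10));
INSERT INTO @DATA_TABLE VALUES
('John', 'smith', 'JS'),
(NULL, 'Electrical company', 'E'),   -- assumption: company rows have no first name
('Sam', 'Craig', 'SC');

SELECT d.FIRSTNAMES,
       d.SURNAME,
       CASE
           -- company: nothing in FIRSTNAMES, so keep the stored value unchanged
           WHEN NULLIF(LTRIM(d.FIRSTNAMES), '') IS NULL THEN d.INITIALS
           -- person: first letter of first name + first letter of surname
           ELSE UPPER(LEFT(LTRIM(d.FIRSTNAMES), 1) + LEFT(LTRIM(d.SURNAME), 1))
       END AS INITIALS
FROM @DATA_TABLE d;
```

If the real rule is instead based on the length of the existing INITIALS value, the first WHEN could be swapped for a LEN(d.INITIALS) = 1 test.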

Get following letter from given

I have a table with company names. Some companies have different locations and different legal names but they should be reported under the same Group Code. The Code is made up using the first five letters.
Company GroupCode
DEEZER FRANCE DEEZE
DEEZER SPAIN DEEZE
DEEZER ALGERIA DEEZE
So far so good. Now I’m adding a different company which starts with the same letters but should get a new Group Code.
A new Code should be assigned if the company name does not contain a word which is part of a company name already having a GroupCode. In this Case DEEZER is the key word which determines association with GroupCode DEEZE
Rule is that the code should then use the first four letters + the fifth letter next in the alphabet. If this code also exists then use the first four letters + the fifth letter next but one in the alphabet. The required result would look like:
Company GroupCode Status
DEEZER FRANCE DEEZE EXISTING
DEEZER SPAIN DEEZE EXISTING
DEEZER ALGERIA DEEZE EXISTING
DEEZEMBER DEEZF CREATED
DEEZEMAL DEEZG CREATED
So what I need to figure out is the next „unused“ letter. How can I achieve this with SQL Server 2008 R2?
Try this:
;with cte as
(select max(groupcode) maxcode
from yourtable
where left(groupcode,4) = left(@companyname,4))
insert into yourtable (company, groupcode, [status])
select @companyname,
-- no code with this four-letter prefix yet: use the first five letters; otherwise bump the last letter of the highest existing code
case when maxcode is null then left(@companyname,5) else left(maxcode,4) + char(ascii(right(maxcode,1))+1) end,
'CREATED'
from cte
Assumption: Your input is taking the company name as a parameter from somewhere, presumably the front end.
The idea is to use ascii function to get the ASCII code of the last letter, increment it by 1 and go back to the corresponding character using char function.
Be warned, however, that this is definitely not the best solution. For instance, I have not implemented bounds checking to ensure range between A and Z. In fact, I would suggest that you handle this in application code rather than at DB level.
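For what it's worth, the missing bounds check could be sketched with a recursive CTE that walks the alphabet from the company's own fifth letter up to Z and takes the first letter whose code is not yet used. This is an illustration only: @yourtable is a table variable standing in for the real table, the sample rows come from the question, and it assumes upper-case names of at least five letters. It does not implement the keyword rule that decides whether a company belongs to an existing group.

```sql
DECLARE @companyname varchar(100) = 'DEEZEMBER';

-- Table variable standing in for the real table (assumed shape).
DECLARE @yourtable TABLE (company varchar(100), groupcode char(5), [status] varchar(10));
INSERT INTO @yourtable VALUES
('DEEZER FRANCE',  'DEEZE', 'EXISTING'),
('DEEZER SPAIN',   'DEEZE', 'EXISTING'),
('DEEZER ALGERIA', 'DEEZE', 'EXISTING');

;WITH letters AS (
    -- start at the company's own fifth letter ...
    SELECT ASCII(SUBSTRING(@companyname, 5, 1)) AS code
    UNION ALL
    -- ... and walk forward one letter at a time, never going past Z
    SELECT code + 1 FROM letters WHERE code < ASCII('Z')
)
INSERT INTO @yourtable (company, groupcode, [status])
SELECT TOP (1) @companyname, LEFT(@companyname, 4) + CHAR(l.code), 'CREATED'
FROM letters l
WHERE NOT EXISTS (SELECT 1
                  FROM @yourtable t
                  WHERE t.groupcode = LEFT(@companyname, 4) + CHAR(l.code))
ORDER BY l.code;   -- lowest unused letter wins

SELECT company, groupcode, [status] FROM @yourtable;
```

If every letter up to Z is already taken, the SELECT returns no row and nothing is inserted, which is a safer failure mode than generating a code with a character beyond Z.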

SQL for Querying MSSQL with Double Metaphone for First/Last Name Combos

I am using Double-Metaphone for fuzzy searching within my database. I have a table of names, and both the first and last names have double metaphone entries already created (and updated, via a Trigger). In my application, I am allowing the user to search by Lastname and/or Firstname.
What is the best way to query the database, to get the best results from the Double-Metaphone indexes when dealing with both last AND first names ? Querying just based on lastname is easy - generate the DM tags and query the database. It's when querying by both first and last that I'd like to get some fine tuning.
The database layout is similar to the following:
tblName
FirstName
LastName
MetaPhoneFN1
MetaPhoneFN2
MetaPhoneLN1
MetaPhoneLN2
Application: [Lastname] [FirstName]
User inputs just a lastname, or a combination of lastname + [First initial, first name, part of first name].
Lastname: SMITH
FirstName: J or Jo or John or Johnathan
If I pass in "J" as the firstname - I'd like all name entries matching "J%".
If I pass in "JO" as the firstname - I'd like all name entries matching "JO%".
If I pass in "JOHN" or "JOHNATHAN" as the firstname - I'd like to use DM
or maybe also "JOHN%" ?
I'm really open to suggestions here, for the firstname. I want the results to be as good as possible and return what the user wants.
What is the best way to query the database for last + any of those combinations of first name ? Here's a sample of what I've gotten so far.. and I'm not completely thrilled with the results:
SELECT *
FROM tblName
WHERE
--There will always be a last name
(MetaPhoneLN1 = @paramMetaPhoneLN1
OR (CASE WHEN @paramMetaPhoneLN2 IS NOT NULL AND MetaPhoneLN2 = @paramMetaPhoneLN2 THEN 1
WHEN @paramMetaPhoneLN2 IS NULL THEN 0
END) = 1)
-- Match Firstname 1
AND (CASE WHEN @paramMetaPhoneFN1 IS NULL THEN 1
WHEN @paramMetaPhoneFN1 IS NOT NULL AND MetaPhoneFN1 = @paramMetaPhoneFN1 THEN 1
WHEN LEN(@paramMetaPhoneFN1) > 1 AND LEN(@paramMetaPhoneFN1) < 4 AND MetaPhoneFN1 LIKE @paramMetaPhoneFN1 + '%' THEN 1
WHEN LEN(@paramMetaPhoneFN1) = 1 THEN 1
END) = 1
-- Match Firstname 2
AND (CASE WHEN @paramMetaPhoneFN2 IS NULL THEN 1
WHEN @paramMetaPhoneFN2 IS NOT NULL AND MetaPhoneFN2 = @paramMetaPhoneFN2 THEN 1
WHEN LEN(@paramMetaPhoneFN2) > 1 AND LEN(@paramMetaPhoneFN2) < 4 AND MetaPhoneFN2 LIKE @paramMetaPhoneFN2 + '%' THEN 1
WHEN LEN(@paramMetaPhoneFN2) = 1 THEN 1
--ELSE 0
END) = 1
AND (CASE WHEN @paramFirstName IS NULL THEN 1
WHEN FirstName LIKE @paramFirstName + '%' THEN 1
--WHEN LEN(@paramMetaPhoneFN1) = 1 AND @paramFirstName IS NOT NULL AND LEN(@paramFirstName) > 1 AND FirstName LIKE @paramFirstName + '%' THEN 1
--ELSE 1
END) = 1
What I've tried to do is account for the different variations for firstname. My results however, aren't exactly what I would want.
I've been able to find lots of implementations of Double Metaphone in SQL/C#, etc. for /generating/ the Double-Metaphone values, but nothing on how to actually query the database effectively once you have those values.
SUMMARY:
When I search by both lastname and firstname -- I'd like to query the database for the Double Metaphone match only on Lastname, but I'd like a lot of flexibility when a firstname is also passed in.. first initial ? sounds like ? etc. I am open to suggestions and SQL examples!
UPDATE 1:
When I say that I'm not thrilled with the results.. what I'm saying is that I'm not sure how to formulate the Firstname part of the query, to maximize results. If I search for "WILL" - what results should be returned ? WILLIAM, WILL, WILBERT .. but not WALKER - though with what I have here, WALKER would be returned because WILL -> FL and WALKER IS [FLKR] but WILLIAM IS [FLM]. If I do only DM = DM then I wouldn't get WILLIAM even returned, which is why I'm doing a LIKE in the first place, if the DM length is < 4.
Basically, I'd like to know if anyone else has run into this issue, and see what solutions others have come up with.
First initial only - should show all firstnames starting with that initial
- Here's where I'm uncertain:
Partial name - should show all firstnames starting with the partial ? [how do you know if it's just a partial name ?!]
Full name - should use DM ?
It's up to you to decide your business rules on what to return, and what to consider using LIKE vs. DM (or both) on.
One thing you seem not to be considering, though, is the length of the DM value.
If I search for "WILL" - what results should be returned ? WILLIAM,
WILL, WILBERT .. but not WALKER - though with what I have here, WALKER
would be returned because WILL -> FL and WALKER IS [FLKR] but WILLIAM
IS [FLM]. If I do only DM = DM then I wouldn't get WILLIAM even
returned, which is why I'm doing a LIKE in the first place, if the DM
length is < 4.
So, for this case:
WILL -> FL and WALKER IS [FLKR] but WILLIAM IS [FLM]
Assuming you are OK with returning multiple matches with the best match at the top, you would order the results by the length of the stored matching DM value, ascending. So WILLIAM (FLM, three characters) would be suggested before WALKER (FLKR, four characters).
For the first names, again assuming you are OK with returning multiple possible matches, you could return exact string matches first (non-DM), followed by exact DM matches, followed by partial DM and LIKE matches ordered by the shortest DM matches first, and then LIKE matches and then the rest of the longer DM matches. This is often easiest done with a bunch of UNIONed queries.
You could also choose to rank the LIKE matches by how much the returned string length differs from the input string length (smaller difference = better match).
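A rough sketch of such a ranking follows. It uses a single CASE expression in place of separate UNIONed queries, so that every row gets exactly one rank and no de-duplication is needed; the tiers themselves are one possible choice rather than a fixed rule. The table variable, the sample rows and the last-name code 'SM0' are illustrative assumptions; the first-name DM values are the ones quoted in the question.

```sql
DECLARE @paramFirstName     varchar(50) = 'WILL';
DECLARE @paramMetaPhoneFN1  varchar(4)  = 'FL';   -- value quoted in the question
DECLARE @paramMetaPhoneLN1  varchar(4)  = 'SM0';  -- illustrative code for SMITH

-- Table variable standing in for tblName (only the columns the sketch needs).
DECLARE @tblName TABLE (FirstName varchar(50), LastName varchar(50),
                        MetaPhoneFN1 varchar(4), MetaPhoneLN1 varchar(4));
INSERT INTO @tblName VALUES
('WALKER',  'SMITH', 'FLKR', 'SM0'),
('WILLIAM', 'SMITH', 'FLM',  'SM0'),
('WILL',    'SMITH', 'FL',   'SM0');

SELECT t.FirstName, t.LastName, r.match_rank
FROM @tblName t
CROSS APPLY (
    SELECT CASE
               WHEN t.FirstName = @paramFirstName                 THEN 1 -- exact string match
               WHEN t.MetaPhoneFN1 = @paramMetaPhoneFN1           THEN 2 -- exact DM match
               WHEN t.FirstName LIKE @paramFirstName + '%'        THEN 3 -- name starts with the input
               WHEN t.MetaPhoneFN1 LIKE @paramMetaPhoneFN1 + '%'  THEN 4 -- partial DM match
           END AS match_rank
) r
WHERE t.MetaPhoneLN1 = @paramMetaPhoneLN1   -- last name is matched on DM only
  AND r.match_rank IS NOT NULL              -- drop rows that match on no tier
ORDER BY r.match_rank,
         LEN(t.MetaPhoneFN1),                              -- shorter DM value first
         ABS(LEN(t.FirstName) - LEN(@paramFirstName));     -- closer name length first
```

With these sample rows the ordering should come out as WILL, then WILLIAM, then WALKER, so the phonetic near-miss is still returned but ranked last.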
The difficulty you are facing is that you are combining searching abbreviated names with phonetically similar names. Those two aims are sometimes opposing each other.
Just to throw you another complication, ;-), Bill is also an abbreviation of William.
My thoughts on this subject are that it's probably best to treat names that could be abbreviated or are abbreviations as a separate issue from the phonetic matching. Once you come up with a solution for the abbreviations, then feed the results through metaphone.