How to replace only the next word after a search string? - sql

How can I find and replace only the next word after a search string with a SELECT statement?
For example:
"The user Mr Smith helped me a lot" --> Output: "The user Mr X helped me a lot"
The search string is "Mr" and there are many different last names (they must be masked for data protection reasons).
Thank you :)

I'm not sure you are asking the right question.
With your "replace the next word" approach you will run into issues.
By the looks of your example, this is free-text input from an end user (assuming VARCHAR(MAX)), and it is hard to predict what variations end users could type.
You could therefore have search items such as:
Mr Smith
Mr. Smith
Mister Smith
Mistar Smith (spelling error intentional for example)
Mrs Smith
Miss Smith
Mr & Mrs Smith (2 people)
Mr. + Mrs & Miss Smith (Family of 3 with the end users using/not using punctuation and using different AND symbols)
Mrs&MrSmith (End user didn't want Spaces for the entity)
Dr. Smith (or Doc or Doc. or Doct or Doct. or Doctor or misspelled Docter)
Prof. Smith (Pro. or Professor or Proffessor or Profeser or Pr)
Father Smith (Ftr.)
Rev Smith (rev
Earl Smith
Sir Smith
Dame Smith
Lady Smith
Chancellor Smith
etc. (Note: these are just English titles, what if there is a German Herr, a French Monsieur or a title from any other language in the text?)
You also have the issue of "the next word": what about double-barrelled names, names that are broken by hyphens, or even those containing apostrophes (e.g. Mr Smith Carroll / Mr Smith-Carroll / Mr O'Carroll)?
At what point do you want the next word to finish? The next space? The next non-surname? Do you have a list of all surnames to check this against?
To be 100% sure that no name slips through unreplaced, you really need to encrypt the database.
Going forward, make it protocol not to allow actual names in free-text boxes (i.e. have end users type "Mr X" in the text box from now on), but encryption seems to be your best and safest option.
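To make the pitfalls above concrete, here is a minimal Python sketch (not SQL, and not part of the original answer) of the naive rule "replace one run of non-space characters after Mr". The function name `mask_next_word` and the sample sentences are hypothetical; the point is only to show which variations such a rule silently misses.

```python
import re

# Naive rule: the literal title "Mr", whitespace, then one run of non-space characters.
NAIVE = re.compile(r"\bMr\s+\S+")

def mask_next_word(text: str) -> str:
    """Replace the single word that follows 'Mr' with 'X'."""
    return NAIVE.sub("Mr X", text)

samples = [
    "The user Mr Smith helped me a lot",          # handled
    "The user Mr. Smith helped me a lot",         # missed: punctuation after the title
    "The user Mr Smith Carroll helped me a lot",  # partly masked: second surname word survives
    "The user Mrs Smith helped me a lot",         # missed: different title
]
for sample in samples:
    print(mask_next_word(sample))
```

Only the first sample is fully masked. Every additional title or spelling would need its own rule, which is why pattern-based masking cannot guarantee that no name is left behind.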

Use the STUFF() function of MSSQL.

Below is a query that extracts the required result.
Note: I am using 'addressstreet' as the column and 'BOX' as the search string; replace them with your own column and with 'Mr'.
WITH CTE AS (
    SELECT SUBSTRING(addressstreet, 1, CHARINDEX(' BOX ', addressstreet + ' ') - 1) AS TEST0,
           REPLACE(SUBSTRING(addressstreet, CHARINDEX('BOX ', addressstreet), LEN(addressstreet)), 'BOX', '') AS TEST,
           addressstreet
    FROM [dbo].[Vendor]
    WHERE addressstreet LIKE '% BOX %'
),
CTE2 AS (
    SELECT TEST0, LTRIM(STUFF(TEST, 1, CHARINDEX(' ', TEST), '')) AS TEST1, addressstreet
    FROM CTE
)
SELECT TEST0 + ' Mr X ' + LTRIM(STUFF(TEST1, 1, CHARINDEX(' ', TEST1), '')) AS RequiredOutput, addressstreet
FROM CTE2
This query is not optimal for performance, but it achieves the objective.
Output:
RequiredOutput            AddressStreet
P.O. Mr X Street Plaza    P.O. BOX 32109 Street Plaza
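The steps the query performs (keep the text before the search string, drop the word that follows it, glue the replacement in between) can be sketched in Python as follows. This is an illustration of the logic only, not a translation of the T-SQL; the function name `replace_next_word` is hypothetical, and like the `LIKE '% BOX %'` filter it only finds the search string when it is surrounded by spaces.

```python
def replace_next_word(text: str, search: str, replacement: str) -> str:
    """Replace `search` and the single word after it with `replacement`."""
    marker = f" {search} "
    pos = text.find(marker)
    if pos == -1:
        return text  # search string not present as a space-delimited word
    prefix = text[:pos]
    rest = text[pos + len(marker):].lstrip()
    # Drop the first word of the remainder and keep whatever follows it.
    _, _, tail = rest.partition(" ")
    return f"{prefix} {replacement} {tail.lstrip()}".rstrip()

print(replace_next_word("P.O. BOX 32109 Street Plaza", "BOX", "Mr X"))
print(replace_next_word("The user Mr Smith helped me a lot", "Mr", "Mr X"))
```

The first call reproduces the output row shown above; the second applies the same steps to the sentence from the question.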

Related

Alteryx Designer - How to retrieve only first and last name from field excluding middle initials?

I need help in writing SQL code in Alteryx Designer.
My table employees contains a column Name with values shown below. However, I need the expected output as shown below.
Please help.
Name:
Smith, Mary K
Koch, J B
Batoon Rene, Anne S
Vaughan-tre Doctor, Maria S
Expected output:
Smith, Mary
Koch, J
Batoon Rene, Anne
Vaughan-tre, Maria
The middle initials and the word “Doctor” are removed.
Not sure why you need to use SQL if you have the data in Alteryx?
So, you need to remove the rightmost two characters (the space and the initial) and the word 'Doctor' from each record?
You could use the Formula tool, though I suspect there are numerous other ways:
Replace(TrimRight([Name], ' ' + Right([Name], 1)), ' Doctor', '')
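For readers who want to check the logic outside Alteryx, here is a rough Python sketch of the same two steps: drop a trailing single-letter initial, then remove the word "Doctor" together with the space in front of it. The function name `clean_name` is hypothetical, and the sketch assumes the initial is always the last space-separated token.

```python
import re

def clean_name(name: str, title: str = "Doctor") -> str:
    """Drop a trailing one-letter initial and remove `title` as a whole word."""
    head, sep, last = name.rstrip().rpartition(" ")
    if sep and len(last) == 1:
        name = head  # "Smith, Mary K" -> "Smith, Mary"
    # Remove the title plus any whitespace before it, so no stray space is left.
    return re.sub(r"\s*\b" + re.escape(title) + r"\b", "", name).strip()

for raw in ["Smith, Mary K", "Koch, J B", "Batoon Rene, Anne S", "Vaughan-tre Doctor, Maria S"]:
    print(clean_name(raw))
```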

Remove middle name from full name in SAS

I need a way to remove the middle names from my full name in SAS.
Example:
Name= MARY ANN SMITH
Name= JERRY J SMITH
Output wanted:
Name2= MARY SMITH
Name2= JERRY SMITH
Any ideas how I can do this?
If you have actual names of real people then the problem is much harder than you implied. Some people have first or last (or both) names that are more than one word. What about people that only have one name?
Anyway SCAN() can do what you want.
name2=catx(' ',scan(name,1,' '),scan(name,-1,' '));
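The idea behind the SCAN() call, taking the first and the last space-delimited word, can be sketched in Python like this. The function name `first_and_last` is hypothetical, and unlike the SAS expression this sketch returns a one-word name unchanged instead of repeating it.

```python
def first_and_last(name: str) -> str:
    """Keep only the first and last space-delimited words of a name."""
    words = name.split()
    if len(words) < 2:
        return name.strip()  # single-word name: nothing to remove
    return f"{words[0]} {words[-1]}"

print(first_and_last("MARY ANN SMITH"))
print(first_and_last("JERRY J SMITH"))
```

As the answer warns, this is wrong for people whose first or last name consists of more than one word.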

Logic and query for getting first name from full name

I have a large database with a full name field. The full name can be in any format and can also include title. For example, all of the following are possible:
John Smith
Smith, John
Mr. John Smith
Dr. John Smith
Mrs. Jane Smith
Ms. Jane Smith
Jane Smith, Esq.
Jane Smith, MD
I want to preserve the full name field, but also add a predicted first name field from a separate table (that contains name, gender).
I think the proper logic for this is to match the first name values + a space to the full name table via LIKE. The space is so that "David Johnson" doesn't match to "John."
I think the way to accomplish this is an update statement with a subquery in it. Here's what I have so far:
UPDATE "employees"
SET "employees".FirstName = (SELECT firstname
FROM genders
WHERE fullname LIKE '%"employees".FirstName %')
What you really want to do is use Postgres's full text search capabilities. You can create a stopwords list containing titles to exclude (Mr, Ms, etc.). Then, set up a search configuration to use your stopwords.
Once you've set up your search configuration correctly, your query will look something like this (this is the SELECT variant: Changing to UPDATE will be trivial):
SELECT employees.full_name, genders.first_name
FROM employees
LEFT JOIN genders ON
    TO_TSVECTOR('english_titles', employees.full_name)
    @@ TO_TSQUERY('english_titles', genders.first_name)
This will give you the following results:
full_name             first_name
"John Smith"          "John"
"Smith, John"         "John"
"Mr. John Smith"      "John"
"Dr. John Smith"      "John"
"Mrs. Jane Smith"     "Jane"
"Ms. Jane Smith"      "Jane"
"Jane Smith, Esq."    "Jane"
"Jane Smith, MD"      "Jane"
"David Johnson"       NULL
In order for this to work, you'll need to take the following steps:
1. Create a stopwords file containing job titles, and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for job titles. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in step 2 (see the link above).
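The effect of the stopword-based configuration can be approximated in a few lines of Python. This is only a sketch of the idea (split into whole words, discard titles, then compare against known first names), not of how Postgres tokenizes text; `TITLES`, `FIRST_NAMES` and `predict_first_name` are hypothetical stand-ins for the stopwords file, the genders table and the join.

```python
import re

TITLES = {"mr", "mrs", "ms", "dr", "esq", "md"}  # hypothetical stopword list
FIRST_NAMES = {"john", "jane"}                   # stand-in for the genders table

def predict_first_name(full_name: str):
    """Return the first whole word that is a known first name, else None."""
    tokens = re.findall(r"[a-z]+", full_name.lower())
    for token in tokens:
        if token not in TITLES and token in FIRST_NAMES:
            return token.capitalize()
    return None

for name in ["Mr. John Smith", "Smith, John", "Jane Smith, MD", "David Johnson"]:
    print(name, "->", predict_first_name(name))
```

Because whole words are compared, "Johnson" does not match "John", which is the behaviour the space-plus-LIKE idea in the question was trying to achieve.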
Now, with all that said, you need to think carefully about a couple of things:
How do you expect to handle people whose last name matches a first name in your Genders table? For example, you have someone called John Stuart, and both John and Stuart are in your genders table. How do you expect to handle that?
How do you expect to handle people with nicknames, or who have only one name? I would strongly encourage you to read Falsehoods Programmers Believe About Names to make sure you're not making any ill-founded assumptions.
Why is the table containing first names called genders? Do you expect to match people to a first name by gender? If so, that's a dangerous road to go down: there are names that can be used for either sex.

SQL: How to find exact matches of words within a text

Please bear with me, I'm new to Access and SQL.
What I'm trying to do is to write a SQL query to filter through two tables - one contains words that are split into two columns and the other contains text. Essentially, what I want is a new table that gives me all of the exact matches of the two columns of words with the column of text.
Here's an analogous database to simulate what I want as a result:
Table A:
FirstName: LastName:
John Doe
Jane Doe
Josh Smith
James Jones
David Johnson
Table B:
FullName:
Jake Davidson
Mike Peters
Jason James
John Michael Smith
Query Result:
FirstName: LastName: FullName:
John Doe John Michael Smith
Josh Smith John Michael Smith
James Jones Jason James
(notice that the David - Davidson match didn't come up. i.e. I'd like exact matches only)
So help me fill in the blanks:
SELECT TableA.FirstName,TableA.LastName, TableB.FullName
FROM TableA,TableB
WHERE TableB.FullName LIKE (has an exact match with TableA.FirstName--not sure what to put )
UNION
SELECT TableA.FirstName,TableA.LastName, TableB.FullName
FROM TableA,TableB
WHERE TableB.FullName LIKE (has an exact match with TableA.LastName--not sure what to put)
;
This will depend on what you want to do with FullNames that have more than two names, like "John Jacob Smith", but, assuming you want to ignore the middle word(s), try:
SELECT f.FirstName, l.LastName, b.FullName
FROM (TableB AS b
INNER JOIN TableA AS f
ON f.FirstName = Mid(b.FullName, 1, InStr(b.FullName, " ") - 1))
INNER JOIN TableA AS l
ON l.LastName = Mid(b.FullName, InStrRev(b.FullName, " ") + 1)
Here is an approach that compares each FullName to both Firstname and LastName:
select a.Firstname, a.LastName, b.FullName
from tableA as a inner join
tableB as b
on instr(' '&b.FullName&' ', ' '&a.FirstName&' ') > 0 and
instr(' '&b.FullName&' ', ' '&a.Lastname&' ') > 0
It assumes that the delimiter for names is a space (as in your example). The comparison attaches a space to the beginning and end of FullName and then looks for a space-padded first name and last name.
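The space-padding trick is easy to verify outside Access. Below is a minimal Python sketch using the sample data from the question; `contains_word` is a hypothetical helper. Note that the sketch combines the two checks with `or`, mirroring the UNION in the question, whereas the query above requires both names to match.

```python
def contains_word(full_name: str, word: str) -> bool:
    """True if `word` occurs in `full_name` as a whole space-delimited word."""
    return f" {word} " in f" {full_name} "

table_a = [("John", "Doe"), ("Jane", "Doe"), ("Josh", "Smith"),
           ("James", "Jones"), ("David", "Johnson")]
table_b = ["Jake Davidson", "Mike Peters", "Jason James", "John Michael Smith"]

# Cross join, keeping rows where the first OR the last name matches exactly.
matches = [(first, last, full)
           for first, last in table_a
           for full in table_b
           if contains_word(full, first) or contains_word(full, last)]
for row in matches:
    print(row)
```

"David" does not match "Davidson" because the padded search term " David " is not a substring of " Jake Davidson ".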

Oracle 10G regexp for Name

I am trying to write a regexp_replace to create a "Friendly" name for some employees. They are currently stored as FIRST <POSSIBLE MIDDLE INITIAL> LAST <POSSIBLE SUFFIX> <MULTIPLE WHITESPACE> SITE_ID
For example,
JOHN SMITH ABC
JOHN Q SMITH ABC
JOHN Q SMITH III ABC
I am trying to write a regex so that I will end up with:
Smith, John
Smith, John Q
Smith III, John Q
The ABC "Site ID" doesn't need to be included in my output.
This is what I tried with little success:
regexp_replace(
employee_name,
'^(\S+)\s(\S+)\s(\S+)',
'\3, \1 \2'
)
Also, I am using Oracle 10G. Any help would be greatly appreciated!
If your names don't show the problems ruakh points out, i.e., there are no single-letter first names or surnames and no Hispanic names, you can try this regexp:
^(\S+)(\s\S)?\s(\S+)(\s\S+)?\s\s+\S+$
The replacement should be:
\3\4, \1\2
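A quick way to sanity-check a pattern of this shape is to run it in Python's `re` module, which understands the same `\s` and `\S` classes. This is only a sketch: Oracle's regex engine may differ in details, the function name `friendly_name` is hypothetical, and the sample strings assume at least two spaces before the site ID. The sketch captures the optional middle initial together with its leading space, so that `\1\2` keeps a space between the first name and the initial.

```python
import re

# Groups: 1 = first name, 2 = optional " Q" initial, 3 = last name, 4 = optional " III" suffix.
PATTERN = re.compile(r"^(\S+)(\s\S)?\s(\S+)(\s\S+)?\s\s+\S+$")

def friendly_name(raw: str) -> str:
    """Reorder 'FIRST [M] LAST [SUFFIX]   SITE' to 'LAST [SUFFIX], FIRST [M]'."""
    return PATTERN.sub(r"\3\4, \1\2", raw)

for raw in ["JOHN SMITH   ABC", "JOHN Q SMITH   ABC", "JOHN Q SMITH III   ABC"]:
    print(friendly_name(raw))
```

The sketch leaves the names in upper case; converting them to "Smith, John" would be a separate step (for example with INITCAP in Oracle).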