Logic and query for getting first name from full name - sql

I have a large database with a full name field. The full name can be in any format and can also include title. For example, all of the following are possible:
John Smith
Smith, John
Mr. John Smith
Dr. John Smith
Mrs. Jane Smith
Ms. Jane Smith
Jane Smith, Esq.
Jane Smith, MD
I want to preserve the full name field, but also add a predicted first name field from a separate table (that contains name, gender).
I think the proper logic for this is to match the first name values + a space to the full name table via LIKE. The space is so that "David Johnson" doesn't match to "John."
I think the way to accomplish this is an update statement with a subquery in it. Here's what I have so far:
UPDATE "employees"
SET "employees".FirstName = (SELECT firstname
FROM genders
WHERE fullname LIKE '%"employees".FirstName %')

What you really want to do is use Postgres's full text search capabilities. You can create a stopwords list containing titles to exclude (Mr, Ms, etc.). Then, set up a search configuration to use your stopwords.
Once you've set up your search configuration correctly, your query will look something like this (this is the SELECT variant; changing it to an UPDATE is straightforward):
SELECT employees.full_name, genders.first_name
FROM employees
LEFT JOIN genders ON
TO_TSVECTOR('english_titles', employees.full_name)
@@ TO_TSQUERY('english_titles', genders.first_name)
This will give you the following results:
full_name              first_name
"John Smith"           "John"
"Smith, John"          "John"
"Mr. John Smith"       "John"
"Dr. John Smith"       "John"
"Mrs. Jane Smith"      "Jane"
"Ms. Jane Smith"       "Jane"
"Jane Smith, Esq."     "Jane"
"Jane Smith, MD"       "Jane"
"David Johnson"        NULL
In order for this to work, you'll need to take the following steps:
1. Create a stopwords file containing job titles, and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for job titles. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in Step 2 (cf. the link above).
Now, with all that said, you need to think carefully about a few things:
How do you expect to handle people whose last name matches a first name in your genders table? For example, you have someone called John Stuart, and both John and Stuart are in your genders table. Which one should win?
How do you expect to handle people who go by a nickname, or who have only one name? I would strongly encourage you to read Falsehoods Programmers Believe About Names to make sure you're not making any ill-founded assumptions.
Why is the table containing first names called genders? Do you expect to match people to a first name by gender? If so, that's a dangerous road to go down: there are names that can be used for either sex.
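To make the matching logic and the "John Stuart" caveat concrete, here is a rough sketch outside the database. It is not the Postgres setup itself, just the same idea in Python: drop title tokens, then match whole words against the known first names. The TITLES and FIRST_NAMES sets are hypothetical stand-ins for the stopwords file and the genders table.

```python
import re

# Hypothetical stand-ins for the stopwords file and the genders table.
TITLES = {"mr", "mrs", "ms", "dr", "esq", "md"}
FIRST_NAMES = {"john", "jane", "stuart"}


def candidate_first_names(full_name):
    """Return every known first name that appears as a whole word."""
    # Split into runs of letters, so "Smith," and "Mr." lose their punctuation.
    tokens = re.findall(r"[^\W\d_]+", full_name.lower())
    return [t.capitalize() for t in tokens
            if t not in TITLES and t in FIRST_NAMES]


print(candidate_first_names("Mr. John Smith"))  # ['John']
print(candidate_first_names("David Johnson"))   # []  ("johnson" is not "john")
print(candidate_first_names("John Stuart"))     # ['John', 'Stuart']
```

The last call shows the ambiguity: whole-word matching alone cannot tell you which of the two tokens is the first name, so you still need a rule for that case.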

Related

Alteryx Designer - How to retrieve only first and last name from field excluding middle initials?

I need help in writing SQL code in Alteryx Designer.
My table employees contains a column Name with values shown below. However, I need the expected output as shown below.
Please help.
Name:
Smith, Mary K
Koch, J B
Batoon Rene, Anne S
Vaughan-tre Doctor, Maria S
Expected output:
Smith, Mary
Koch, J
Batoon Rene, Anne
Vaughan-tre, Maria
The middle initials and the word “Doctor” are removed.
Not sure why you need to use SQL if you have the data in Alteryx?
So, you need to remove the last two characters (the space and the initial) and the word 'Doctor' from each record?
You could use the Formula tool, though I suspect there are numerous other ways:
Replace(TrimRight([Name], ' ' + Right([Name], 1)), ' Doctor', '')
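For anyone who wants to check the intended result outside Alteryx, here is a small Python sketch of the same transformation. It is an illustration only, not Alteryx syntax, and the function name is made up. Unlike the formula, it drops the last word only when it is a single letter, and it removes " Doctor" together with its leading space so no stray space is left before the comma.

```python
def strip_initial_and_doctor(name):
    """Drop a trailing one-letter initial and the word 'Doctor'."""
    parts = name.split()
    # Remove the last word only when it is a single-letter initial.
    if len(parts) > 1 and len(parts[-1]) == 1:
        parts = parts[:-1]
    # Remove ' Doctor' including the space in front of it.
    return " ".join(parts).replace(" Doctor", "")


for raw in ["Smith, Mary K", "Koch, J B",
            "Batoon Rene, Anne S", "Vaughan-tre Doctor, Maria S"]:
    print(strip_initial_and_doctor(raw))
```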

Remove middle name from full name in SAS

I need a way to remove the middle names from my full name in SAS.
Example:
Name= MARY ANN SMITH
Name= JERRY J SMITH
Output wanted:
Name2= MARY SMITH
Name2= JERRY SMITH
Any ideas how I can do this?
If you have actual names of real people then the problem is much harder than you implied. Some people have first or last (or both) names that are more than one word. What about people that only have one name?
Anyway SCAN() can do what you want.
name2=catx(' ',scan(name,1,' '),scan(name,-1,' '));
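The first-word/last-word idea is easy to try outside SAS too. Below is a rough Python equivalent (the function name is made up for illustration). Returning a single-word name unchanged is a choice made in this sketch, not something the answer above specifies, and multi-word first or last names will still be cut, as already noted.

```python
def first_and_last(name):
    """Keep only the first and last whitespace-separated words."""
    words = name.split()
    if not words:
        return ""
    if len(words) == 1:
        # Assumption of this sketch: a single name is returned as-is.
        return words[0]
    return f"{words[0]} {words[-1]}"


print(first_and_last("MARY ANN SMITH"))  # MARY SMITH
print(first_and_last("JERRY J SMITH"))   # JERRY SMITH
```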

SQL WHERE column values into capital letters

Let's say I have the following entries in my database:
Id   Name
12   John Doe
13   Mary anne
13   little joe
14   John doe
In my program I have a string variable that is always capitalized, for example:
myCapString = "JOHN DOE"
Is there a way to retrieve the rows in the table by using a WHERE on the name column with the values capitalized and then matching myCapString?
In this case the query would return two entries, one with id=12, and one with id=14
A solution is NOT to change the actual values in the table.
A general solution in Postgres would be to uppercase the Name column in the query and compare it against an all-caps string literal, e.g.
SELECT *
FROM yourTable
WHERE UPPER(Name) = 'JOHN DOE';
If you need to implement this in Knex, you will need to figure out how to uppercase a column. This might require using a raw query.
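If you want to try the comparison without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The table name people is made up, and SQLite's UPPER only folds ASCII letters, so treat this as a demonstration of the idea rather than of Postgres behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Id INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(12, "John Doe"), (13, "Mary anne"), (13, "little joe"), (14, "John doe")],
)

my_cap_string = "JOHN DOE"
# Uppercase the column only inside the query; the stored values are untouched.
rows = conn.execute(
    "SELECT Id, Name FROM people WHERE UPPER(Name) = ? ORDER BY Id",
    (my_cap_string,),
).fetchall()
print(rows)  # [(12, 'John Doe'), (14, 'John doe')]
```

Passing the string as a bound parameter (the ? placeholder) also avoids building the SQL by string concatenation.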

MSSQL query where to include city name Abbreviations

I have a search on my site where the user needs to enter a city name.
Even though the city input has autocomplete backed by my database, users can still type any name.
The db table has city names such as:
Saint Paul
St. Louis
A user might enter "Saint Louis" or "St Louis",
and I still need the query to return a result from the table even though it is not an exact match.
The same goes for Mt/Mt./Mount and so on.
How can this be achieved?
Thanks

How do you query only part of the data in the row of a column - Microsoft SQL Server

I have a column called NAME with 2000 rows containing people's full names, e.g. ANN SMITH. How do I write a query that lists all the people whose first name is ANN? There are about 20 different people whose first name is ANN but whose surnames differ.
I tried
and (NAME = 'ANN')
but it returned zero results.
I have to enter the full name, e.g. and (NAME = 'ANN SMITH'), to even get a result.
I just want to list all the people with their first name as ANN.
Try this in your WHERE clause:
Where Name like 'ANN %'
Should work, mate.
ANN% matches every value that starts with ANN, which also includes names such as ANNA; ANN % (with the space) matches only values where ANN is the whole first word.
%ANN% finds the three letters ANN anywhere in that row's field.
Hope it helps.
Also, names are usually split into separate first-name and last-name columns.
That saves having to use wildcards in your SQL and gives you more normalized data.
SELECT NAME
FROM NAMES
WHERE NAME LIKE 'ANN %'
This should select every name that begins with 'ANN' followed by a space.
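To see how the three patterns differ, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows are invented for illustration, and note that SQLite's LIKE is case-insensitive for ASCII by default, which may not match your SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NAMES (NAME TEXT)")
conn.executemany(
    "INSERT INTO NAMES VALUES (?)",
    [("ANN SMITH",), ("ANN JONES",), ("ANNA BROWN",),
     ("JOANN SMITH",), ("MARY ANN LEE",)],
)


def names_like(pattern):
    """Return all names matching the LIKE pattern, sorted."""
    cur = conn.execute(
        "SELECT NAME FROM NAMES WHERE NAME LIKE ? ORDER BY NAME", (pattern,)
    )
    return [row[0] for row in cur]


print(names_like("ANN %"))  # ['ANN JONES', 'ANN SMITH']
print(names_like("ANN%"))   # also returns 'ANNA BROWN'
print(names_like("%ANN%"))  # returns all five rows
```

Only the pattern with the trailing space restricts the match to people whose first word is exactly ANN.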