I need a way to remove the middle names from my full name in SAS.
Example:
Name= MARY ANN SMITH
Name= JERRY J SMITH
Output wanted:
Name2= MARY SMITH
Name2= JERRY SMITH
Any ideas how I can do this?
If you have actual names of real people then the problem is much harder than you implied. Some people have first or last (or both) names that are more than one word. What about people that only have one name?
Anyway SCAN() can do what you want.
name2=catx(' ',scan(name,1,' '),scan(name,-1,' '));
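For anyone who wants to check the first-word/last-word logic outside SAS, here is a minimal Python sketch of the same idea. The function name `drop_middle_names` is invented for illustration. It shares the limitation mentioned above (multi-word first or last names are not handled), and unlike the SAS one-liner it returns a single-word name unchanged instead of repeating it.

```python
def drop_middle_names(name: str) -> str:
    """Keep only the first and last whitespace-separated words of a name."""
    words = name.split()
    if len(words) <= 1:
        # Empty or single-word names have nothing to drop.
        return " ".join(words)
    return f"{words[0]} {words[-1]}"


if __name__ == "__main__":
    for full in ["MARY ANN SMITH", "JERRY J SMITH"]:
        print(drop_middle_names(full))
```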
I need help in writing SQL code in Alteryx Designer.
My table employees contains a column Name with values shown below. However, I need the expected output as shown below.
Please help.
Name:
Smith, Mary K
Koch, J B
Batoon Rene, Anne S
Vaughan-tre Doctor, Maria S
Expected output:
Smith, Mary
Koch, J
Batoon Rene, Anne
Vaughan-tre, Maria
The middle initials and the word “Doctor” are removed.
Not sure why you need to use SQL if you have the data in Alteryx?
So, you need to remove the right hand 2 characters and the word 'Doctor' from each record?
You could use the Formula tool, though I suspect there are numerous other ways:
Replace(TrimRight([Name], ' ' + Right([Name], 1)), ' Doctor', '')
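As a cross-check of the intended result, the same clean-up can be sketched in Python with two regular expressions. This is only an illustration, not Alteryx syntax: the function name `clean_name` is made up, and it assumes the middle initial is always a single trailing letter and that "Doctor" appears as a standalone word.

```python
import re


def clean_name(name: str) -> str:
    """Remove one trailing middle initial and the standalone word 'Doctor'."""
    # Drop a trailing initial: whitespace followed by a single letter at the end.
    without_initial = re.sub(r"\s+[A-Za-z]$", "", name.strip())
    # Drop the word "Doctor" together with the whitespace in front of it.
    return re.sub(r"\s+Doctor\b", "", without_initial)


if __name__ == "__main__":
    for raw in ["Smith, Mary K", "Koch, J B", "Batoon Rene, Anne S",
                "Vaughan-tre Doctor, Maria S"]:
        print(clean_name(raw))
```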
I have some data which looks something like this:
FirstName  LastName
John       Doe
Jane       Roe
Bob        Smith
Sue        Jones
And I want to display all these people one after the other, like so, for a report where I'm reporting on some other table and want these people to be displayed inline:
FirstName1  LastName1  FirstName2  LastName2  FirstName3  LastName3  FirstName4  LastName4
John        Doe        Jane        Roe        Bob         Smith      Sue         Jones
I don't know exactly how many people there will be, so this won't work. How can I do this without knowing the number of people? I suppose it would be OK to leave out the numeric suffixes, even if that does make the data a bit less clear. Let's say the person names are coming from a Person table, even though the data I'm dealing with is more than just person names (I just simplified it to make it more understandable, minimal reproducible example and all).
Thanks!
How can I find and replace only the next word after a search string with a SELECT statement?
For example:
"The user Mr Smith helped me a lot" --> Output: "The user Mr X helped me a lot"
The search string is "Mr" and there are many different last names (data protection reasons).
Thank you :)
I'm not sure you are asking the correct question.
With your "replace next word" approach you will run into issues.
By the looks of your example, this is free-text input from an end user (assuming VARCHAR(MAX)), and it is hard to predict what variations end users could type.
You could therefore have search items such as:
Mr Smith
Mr. Smith
Mister Smith
Mistar Smith (spelling error intentional for example)
Mrs Smith
Miss Smith
Mr & Mrs Smith (2 people)
Mr. + Mrs & Miss Smith (Family of 3 with the end users using/not using punctuation and using different AND symbols)
Mrs&MrSmith (End user didn't want Spaces for the entity)
Dr. Smith (or Doc or Doc. or Doct or Doct. or Doctor or misspelled Docter)
Prof. Smith (Pro. or Professor or Proffessor or Profeser or Pr)
Father Smith (Ftr.)
Rev Smith (rev
Earl Smith
Sir Smith
Dame Smith
Lady Smith
Chancellor Smith
etc. (Note: these are just English titles, what if there is a German Herr, a French Monsieur or a title from any other language in the text?)
There is also the issue of "the next word": what about double-barrelled names, names joined by hyphens, or names containing apostrophes (e.g. Mr Smith Carroll / Mr Smith-Carroll / Mr O'Carroll)?
At what point do you want the next word to finish? The next space? The next non-surname? Do you have a list of all surnames to check this against?
You really need to encrypt the db to be 100% sure that no name accidentally goes unreplaced.
Make it protocol going forward not to allow the use of actual names in free text boxes, i.e. have the end users type "Mr X" in the text box from now on, but encryption seems to be your best/safest option.
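To make the pitfalls above concrete, here is a small Python sketch of a naive "replace the next word after Mr" rule. It is not SQL and not a recommended solution. The pattern and the function name `mask_surname` are assumptions made for illustration: the pattern accepts "Mr" or "Mr." followed by one word that may contain hyphens or apostrophes.

```python
import re

# Assumed pattern: the title "Mr" (optional dot), whitespace, then one word
# that may contain letters, digits, underscores, hyphens or apostrophes.
PATTERN = re.compile(r"\b(Mr\.?)\s+[\w'-]+")


def mask_surname(text: str) -> str:
    """Replace the single word following 'Mr' / 'Mr.' with 'X'."""
    return PATTERN.sub(r"\1 X", text)


if __name__ == "__main__":
    samples = [
        "The user Mr Smith helped me a lot",
        "Mr. Smith-Carroll called",
        "Mr Smith Carroll called",   # second surname word is left in place
        "Mrs Smith called",          # other titles are not matched at all
    ]
    for sample in samples:
        print(mask_surname(sample))
```

The last two samples show how easily a name slips through: the rule only knows one title and only one following word.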
Use the STUFF() function of MSSQL.
Below is a query which would help you to extract the required result.
Note: I am using 'addressstreet' as the column, which you can replace with your own column, and 'BOX' as the search string, which you can replace with 'Mr'.
WITH CTE AS (
    SELECT SUBSTRING(addressstreet, 1, CHARINDEX(' BOX ', addressstreet + ' ') - 1) AS TEST0,
           REPLACE(SUBSTRING(addressstreet, CHARINDEX('BOX ', addressstreet), LEN(addressstreet)), 'BOX', '') AS TEST,
           addressstreet
    FROM [dbo].[Vendor]
    WHERE addressstreet LIKE '% BOX %'
),
CTE2 AS (
    SELECT TEST0, LTRIM(STUFF(TEST, 1, CHARINDEX(' ', TEST), '')) AS TEST1, addressstreet
    FROM CTE
)
SELECT TEST0 + ' Mr X ' + LTRIM(STUFF(TEST1, 1, CHARINDEX(' ', TEST1), '')) AS RequiredOutput, addressstreet
FROM CTE2
This SQL statement is not optimized for performance, but it achieves the objective.
Output:
RequiredOutput            AddressStreet
P.O. Mr X Street Plaza    P.O. BOX 32109 Street Plaza
I have a large database with a full name field. The full name can be in any format and can also include title. For example, all of the following are possible:
John Smith
Smith, John
Mr. John Smith
Dr. John Smith
Mrs. Jane Smith
Ms. Jane Smith
Jane Smith, Esq.
Jane Smith, MD
I want to preserve the full name field, but also add a predicted first name field from a separate table (that contains name, gender).
I think the proper logic for this is to match the first name values + a space to the full name table via LIKE. The space is so that "David Johnson" doesn't match to "John."
I think the way to accomplish this is an update statement with a subquery in it. Here's what I have so far:
UPDATE "employees"
SET "employees".FirstName = (SELECT firstname
FROM genders
WHERE fullname LIKE '%"employees".FirstName %')
What you really want to do is use Postgres's full text search capabilities. You can create a stopwords list containing titles to exclude (Mr, Ms, etc.). Then, set up a search configuration to use your stopwords.
Once you've set up your search configuration correctly, your query will look something like this (this is the SELECT variant; changing it to an UPDATE will be trivial):
SELECT employees.full_name, genders.first_name
FROM employees
LEFT JOIN genders ON
TO_TSVECTOR('english_titles', employees.full_name)
@@ TO_TSQUERY('english_titles', genders.first_name)
This will give you the following results:
full_name             first_name
"John Smith"          "John"
"Smith, John"         "John"
"Mr. John Smith"      "John"
"Dr. John Smith"      "John"
"Mrs. Jane Smith"     "Jane"
"Ms. Jane Smith"      "Jane"
"Jane Smith, Esq."    "Jane"
"Jane Smith, MD"      "Jane"
"David Johnson"       NULL
In order for this to work, you'll need to take the following steps:
1. Create a stopwords file containing job titles, and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for job titles. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in Step 2 (cf. the link above).
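The effect of that configuration (split into words, lowercase, drop titles, then match whole words) can be imitated in a few lines of Python to see why "David Johnson" does not match "John". This is only a sketch of the idea, not how Postgres implements it. The stopword set, the `KNOWN_FIRST_NAMES` lookup standing in for the genders table, and the function name `predict_first_name` are all assumptions.

```python
import re
from typing import Optional

# Assumed title stopwords; a real list would be longer.
TITLE_STOPWORDS = {"mr", "mrs", "ms", "dr", "esq", "md"}

# Hypothetical stand-in for the first names held in the genders table.
KNOWN_FIRST_NAMES = {"john": "John", "jane": "Jane"}


def predict_first_name(full_name: str) -> Optional[str]:
    """Return the first known first name that appears as a whole word."""
    # Tokenise on anything that is not a letter or apostrophe, lowercased.
    tokens = re.findall(r"[a-z']+", full_name.lower())
    for token in tokens:
        if token in TITLE_STOPWORDS:
            continue  # skip titles such as "Mr" or "Esq"
        if token in KNOWN_FIRST_NAMES:
            return KNOWN_FIRST_NAMES[token]
    return None


if __name__ == "__main__":
    for name in ["Mr. John Smith", "Smith, John", "Jane Smith, MD", "David Johnson"]:
        print(name, "->", predict_first_name(name))
```

Because matching is done on whole tokens, "johnson" never equals "john", which is the behaviour the trailing-space LIKE trick was trying to approximate.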
Now, with all that said, you need to think carefully about a couple of things:
How do you expect to handle people whose last name matches a first name in your Genders table? For example, you have someone called John Stuart, and both John and Stuart are in your genders table. How do you expect to handle that?
How do you expect to handle people who go by nicknames, or who only have one name? I would strongly encourage you to read Falsehoods Programmers Believe About Names to make sure you're not making any ill-founded assumptions.
Why is your table containing first names called genders? Do you expect to match people to a first name by gender? If so, that's a dangerous road to go down: there are names that can be used for either sex.
I have a spreadsheet for payroll that is populated from a separate spreadsheet. Occasionally, one of our workers will get a promotion. That promotion shows on the timesheets: e.g. Smith, Adam Position becomes Smith, Adam Promotion.
This data is then populated into a pivot table where Smith, Adam Position and Smith, Adam Promotion show in separate cells. Currently, we are manually adding the two data sets so that payroll gets a single number instead of multiple. I would like to simplify this task. I am using Excel 2003, so some more advanced functions don't work.
Any suggestions and help would be greatly appreciated. Thanks in advance.
Ideally, you'd use a different field (a unique identifier) to identify Smith, Adam (e.g., an employee ID number), but if that's not available, then you could take the following approach:
(Suppose that "Smith, Adam Position" is in A1.)
You could add an additional column that extracts the last name, the comma, and then whatever the next word is. For example, from
Smith, Adam Analyst
you would get Smith, Adam. Unfortunately, this means that if you have something like
Jones, Mary Ellen Consultant
you would end up with Jones, Mary. If you think you can live with that, this solution could work. The way you would extract that would be with the following formula:
=SUBSTITUTE(LEFT(SUBSTITUTE(A1,", ",",",1),FIND(" ",SUBSTITUTE(A1,", ",",",1))-1),",",", ",1)
And then build your pivot table on that field.
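The same "last name, comma, next word" key can be sketched in Python to check what the grouping would produce before building the pivot table. This is an illustration outside Excel. The function names `payroll_key` and `total_hours` and the sample hours are made up, and labels are assumed to follow the "Last, First Position" pattern.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def payroll_key(label: str) -> str:
    """Reduce 'Last, First Position' to 'Last, First'."""
    last, sep, rest = label.partition(", ")
    if not sep:
        # No ", " in the label: leave it unchanged.
        return label
    # Keep only the first word after the comma (so 'Mary Ellen' becomes 'Mary').
    first = rest.split(" ", 1)[0]
    return f"{last}, {first}"


def total_hours(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum hours per reduced key, merging Position and Promotion rows."""
    totals: Dict[str, float] = defaultdict(float)
    for label, hours in rows:
        totals[payroll_key(label)] += hours
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical timesheet rows.
    rows = [("Smith, Adam Position", 30.0), ("Smith, Adam Promotion", 10.0)]
    print(total_hours(rows))
```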