Oracle SQL to extract human names, like NLTK - sql

Is there a SQL/regex approach or some advanced function that can extract human names from a column that has around 2 million rows? Something like NLTK.
Below is my sample. I want to extract only the human names, i.e. filter out the companies (marked with **). I have 2 million rows like these, mixing real companies and human names.
KAREN STRAUSS
KASEY NEMELKA
KATHLEEN MCMAHON
KATHRYN HOCKADAY
KATHRYN HOLAHAN
KATIE NELSON
**KATHERINE KACENA CONSULTING**
KATHY ATKINS
KATRINA GRANT
KATY DYER
KATY G TACKES
**KAUFFMAN S TRANSPORT LLC**
KATHERINE MAGPANTAY
KATHERINE VENTURA
KATHRYN RUANO
JORGE DANIEL MUSCIA
JOSE MANUEL ROSALES SANTEROS
JOSE MANUEL VILAS CARR
JOSEPH H WILNER

This is too long for a comment. Human names are too variable. After all, is "John Deere" the name of a company, the name of a person, or both?
You can construct special-purpose logic for your data. It will take time to develop, but a condition like this would flag likely companies:
regexp_like(lower(name), '\s(consult|llc)')
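To try out a keyword list before running it against 2 million rows, the same idea can be sketched outside the database. This is only an illustration, not a real name recognizer: the function names are hypothetical, and the keyword list contains just the two keywords from the pattern above, so it would need to grow as you inspect your own data.

```python
import re

# Keywords taken from the pattern above; extend this (hypothetical) list
# after looking at the company rows in your own data.
COMPANY_KEYWORDS = ("consult", "llc")

# Whitespace followed by one of the keywords, case-insensitive.
COMPANY_PATTERN = re.compile(r"\s(" + "|".join(COMPANY_KEYWORDS) + ")", re.IGNORECASE)


def looks_like_company(name: str) -> bool:
    """Return True when the name contains one of the company keywords."""
    return COMPANY_PATTERN.search(name) is not None


def filter_human_names(names):
    """Keep only the rows that do not look like companies."""
    return [n for n in names if not looks_like_company(n)]


sample = [
    "KATIE NELSON",
    "KATHERINE KACENA CONSULTING",
    "KATHY ATKINS",
    "KAUFFMAN S TRANSPORT LLC",
    "JOSE MANUEL VILAS CARR",
]
print(filter_human_names(sample))
```

A keyword filter like this will always have false positives and false negatives (the "John Deere" problem), so it is worth reviewing a sample of what it rejects.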

Related

How to merge crosstab info down in Access?

Not sure if this is possible but I'm hoping it is. I am using MS Access for Estate Planning for work. I've gotten to the point where I've got the data to look like this:
File_Name      Executor_1     Executor_2      Beneficiary_1   Beneficiary_2
Hill, Hank     Peggy Hill                     Peggy Hill
Hill, Hank                    Bobby Hill                      Bobby Hill
Gribble, Dale  Nancy Gribble
Gribble, Dale                 Joseph Gribble  Joseph Gribble
Gribble, Dale                                                 John Redcorn
But I need it to look like this:
File_Name      Executor_1     Executor_2      Beneficiary_1   Beneficiary_2
Hill, Hank     Peggy Hill     Bobby Hill      Peggy Hill      Bobby Hill
Gribble, Dale  Nancy Gribble  Joseph Gribble  Joseph Gribble  John Redcorn
I need it in the latter format so I can use Mail Merge in Word to create the will. Can anyone provide any guidance? We don't currently use any software for estate planning, so anything beats having to go into Word and retype everything manually. Please let me know if more information is needed.
Edit:
This is what the SQL looks like:
TRANSFORM Last(File_Roles.File_Name) AS LastOfFile_Name
SELECT File_Roles.Executor_1,
       File_Roles.Executor_2,
       File_Roles.Beneficiary_1,
       File_Roles.Beneficiary_2,
       File_Roles.Trustee_1,
       File_Roles.Trustee_2,
       File_Roles.Guardian_1,
       File_Roles.Guardian_2,
       File_Roles.ATTY_IF_1,
       File_Roles.ATTY_IF_2,
       File_Roles.HCATTY_IF_1,
       File_Roles.HCATTY_IF_2
FROM File_Roles
GROUP BY File_Roles.Executor_1,
         File_Roles.Executor_2,
         File_Roles.Beneficiary_1,
         File_Roles.Beneficiary_2,
         File_Roles.Trustee_1,
         File_Roles.Trustee_2,
         File_Roles.Guardian_1,
         File_Roles.Guardian_2,
         File_Roles.ATTY_IF_1,
         File_Roles.ATTY_IF_2,
         File_Roles.HCATTY_IF_1,
         File_Roles.HCATTY_IF_2
PIVOT File_Roles.File_Name;
You can use GROUP BY and MAX():
SELECT
    t.File_Name,
    MAX(t.Executor_1) AS Executor_1,
    MAX(t.Executor_2) AS Executor_2,
    MAX(t.Beneficiary_1) AS Beneficiary_1,
    MAX(t.Beneficiary_2) AS Beneficiary_2
FROM table_or_query t
GROUP BY t.File_Name;
But maybe you can fix your original crosstab query to do this right away. Probably the grouping is wrong: you must group by File_Name in the crosstab query and apply Max in the Total row of the value field (though it is difficult to say without seeing the query).
GROUP BY File_Name means that one row is created for each distinct value of File_Name.
Since this merges several rows into one, you must specify an aggregate function for every column in the SELECT list that is not listed in the GROUP BY clause, e.g. SUM(), AVG(), MIN() or MAX(). See SQL Aggregate Functions for a complete list. Aggregate functions ignore Null values, so MAX() returns the non-Null value from the merged rows.
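To see the GROUP BY and MAX() merge in action without Access, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table name roles is hypothetical, and the placement of the empty (Null) cells in the sample rows is an assumption inferred from the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE roles (
           File_Name TEXT, Executor_1 TEXT, Executor_2 TEXT,
           Beneficiary_1 TEXT, Beneficiary_2 TEXT)"""
)

# One sparse row per role assignment; None becomes SQL NULL.
# The positions of the NULLs are assumed, not taken from the question.
rows = [
    ("Hill, Hank", "Peggy Hill", None, "Peggy Hill", None),
    ("Hill, Hank", None, "Bobby Hill", None, "Bobby Hill"),
    ("Gribble, Dale", "Nancy Gribble", None, None, None),
    ("Gribble, Dale", None, "Joseph Gribble", "Joseph Gribble", None),
    ("Gribble, Dale", None, None, None, "John Redcorn"),
]
conn.executemany("INSERT INTO roles VALUES (?, ?, ?, ?, ?)", rows)

# MAX() skips NULLs, so each group keeps its single non-NULL value per column.
merged = conn.execute(
    """SELECT File_Name,
              MAX(Executor_1), MAX(Executor_2),
              MAX(Beneficiary_1), MAX(Beneficiary_2)
       FROM roles
       GROUP BY File_Name
       ORDER BY File_Name"""
).fetchall()

for row in merged:
    print(row)
```

If a column ever holds two different non-Null values for the same File_Name, MAX() silently keeps only the larger one, so this approach relies on at most one value per column per file.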

Remove middle name from full name in SAS

I need a way to remove the middle names from my full name in SAS.
Example:
Name= MARY ANN SMITH
Name= JERRY J SMITH
Output wanted:
Name2= MARY SMITH
Name2= JERRY SMITH
Any ideas how I can do this?
If you have actual names of real people then the problem is much harder than you implied. Some people have first or last (or both) names that are more than one word. What about people that only have one name?
Anyway SCAN() can do what you want.
name2=catx(' ',scan(name,1,' '),scan(name,-1,' '));
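For readers who want to check the first-word/last-word logic outside SAS, the same idea can be sketched in Python. The function name is hypothetical, and returning a one-word name unchanged is a choice made for this sketch, not something the SAS line above does.

```python
def drop_middle_names(name: str) -> str:
    """Keep only the first and last whitespace-separated words of a name."""
    words = name.split()
    if len(words) <= 1:
        # One-word (or empty) names are returned as-is in this sketch.
        return name.strip()
    return f"{words[0]} {words[-1]}"


for full_name in ["MARY ANN SMITH", "JERRY J SMITH", "JOSE MANUEL ROSALES SANTEROS"]:
    print(full_name, "->", drop_middle_names(full_name))
```

The last example shows the caveat mentioned above: for a person with a two-word surname, taking only the final word drops part of the last name.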

SQL Group By with Text Transformation

I'm trying to do some transformations on a large data set that I'm working on and was hoping for a bit of assistance on a particular grouping. I have a series of records that follow a pattern similar to below:
Language    Full Name    Customer ID
--------------------------------------
English     John Smith   12222
French      John Smith   12222
Spanish     John Smith   12222
English     Karen Wong   55999
Cantonese   Karen Wong   55999
I need the Full Name and Customer ID not to be repeated, so I'm simply using DISTINCT for that. However, one oddity in the requirement is that all the different languages need to be preserved and squashed into the output, so the resulting data needs to look like this:
Languages Spoken           Full Name    Customer ID
----------------------------------------------------
English, French, Spanish   John Smith   12222
English, Cantonese         Karen Wong   55999
Sounded like a simple thing but I guess I'm not a big SQL guru and keep getting funny results. Any help would be much appreciated :)
If you're using SQL Server 2017 or Azure SQL, then you can just use STRING_AGG:
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
For everything else (covers solutions from SQL Server 2005 and on):
Simulating group_concat MySQL function in Microsoft SQL Server 2005?
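As a rough illustration of what string aggregation per group does, here is a sketch using Python's built-in sqlite3 module, whose GROUP_CONCAT plays a similar role to STRING_AGG. This is SQLite rather than SQL Server, the table and column names are hypothetical, and SQLite does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_languages (language TEXT, full_name TEXT, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_languages VALUES (?, ?, ?)",
    [
        ("English", "John Smith", 12222),
        ("French", "John Smith", 12222),
        ("Spanish", "John Smith", 12222),
        ("English", "Karen Wong", 55999),
        ("Cantonese", "Karen Wong", 55999),
    ],
)

# One output row per customer; the languages are joined into one string.
result = conn.execute(
    """SELECT GROUP_CONCAT(language, ', ') AS languages_spoken,
              full_name,
              customer_id
       FROM customer_languages
       GROUP BY full_name, customer_id
       ORDER BY customer_id"""
).fetchall()

for row in result:
    print(row)
```

In SQL Server 2017 the equivalent would use STRING_AGG in the SELECT list with the same GROUP BY; see the linked documentation for its exact syntax and ordering options.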

ms word index only subentry

I have a Word document where I've marked various entries for the Index. The entries are like this:
Inland Empire
David Shaver
John Jameson
JM Granny
Justin Flatterer
Mary Martinson
Palouse Poppies
Sara Talk
Eddie Haskell
I've marked each Organization as a Main Entry and each Person as a SubEntry.
I need TWO indices.
1) List of companies with all their people (similar to as shown, above).
2) List of ONLY the people. No companies.
How can I generate an index that shows only the Subentries?

VBA Access SQL - field within LIKE operator

Can I use a table column within a Like operator? I've created an example,
TableA
Names                   Location
Albert Smith Senior     Aberdeen
John Lee                London
Michael Rogers Junior   Newcastle
Mary Roberts            Edinburgh
TableB
Names
Albert Smith
John Lee
Michael Rogers
I want to do a query such as:
SELECT TableA.Location
into NewTable
FROM TableA
WHERE TableA.Names Like '*[TableB.Names]*';
In this case, there would be no match for Mary Roberts, Edinburgh but the first three locations would be returned.
Is it possible to put a column into a like statement?
If not does anyone have any ideas how I could do this?
Hope you can help
PS I can't use an actual asterisk since this is removed and the text italicised, also I have read about using % instead but this has not worked for me.
You can join the two tables and use LIKE within the JOIN clause:
SELECT TableA.Location
into NewTable
FROM TableA
INNER JOIN TableB ON TableA.Names LIKE TableB.Names & '*';
Honestly, I had no idea that you can do this in Access before I tried it just now :-)
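The join-on-LIKE idea can be tried outside Access with Python's built-in sqlite3 module. This is a sketch in SQLite, not Access: SQLite uses || for concatenation and % as the wildcard where Access uses & and *, and the INTO NewTable step is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Names TEXT, Location TEXT)")
conn.execute("CREATE TABLE TableB (Names TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        ("Albert Smith Senior", "Aberdeen"),
        ("John Lee", "London"),
        ("Michael Rogers Junior", "Newcastle"),
        ("Mary Roberts", "Edinburgh"),
    ],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?)",
    [("Albert Smith",), ("John Lee",), ("Michael Rogers",)],
)

# Join each TableA row to any TableB name that it starts with.
locations = [
    row[0]
    for row in conn.execute(
        """SELECT TableA.Location
           FROM TableA
           INNER JOIN TableB ON TableA.Names LIKE TableB.Names || '%'
           ORDER BY TableA.Location"""
    )
]
print(locations)
```

Because the pattern only has a trailing wildcard, it matches names that begin with a TableB value; adding a leading wildcard as well would match the value anywhere in the name, at the cost of more accidental matches.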