How can I force spaCy to recognise "Mr. Smith" and "Mrs. Smith" as separate entities? - spacy

How can I use spaCy NER to find people in text and treat Mr. Smith and Mrs. Smith as different people/named entities?
For example, this identifies Smith and Smith as the same person:
import spacy
from spacy import displacy
text = "Mr. Smith walked along the sea front. Mrs. Smith stayed at home."
basenlp = spacy.load("en_core_web_sm")
doc = basenlp(text)
displacy.render(doc, style="ent")
I have tried to merge the tokens:
from spacy.tokens import Span
def compounds(doc):
    with doc.retokenize() as rt:
        for t in doc:
            if t.dep_ == "compound":
                newt = Span(doc, t.i, t.head.i + 1)
                rt.merge(newt)
    return doc
basenlp.add_pipe(compounds, "compounds", before="parser")
The result is the same: Smith and Smith.
Then I tried:
basenlp.add_pipe(compounds, "compounds", before="ner")
Now it does not find any entities.

OK I found it in the documentation under "expanding named entities":
https://spacy.io/usage/rule-based-matching
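
The idea behind that section of the documentation is to widen each PERSON span by one token when the token before it is a title. Below is a minimal, library-independent sketch of that logic, using plain lists in place of spaCy's Doc and Span objects. The function name, the title set and the sample entity offsets are assumptions made for illustration, not spaCy output.

```python
# Tokens are plain strings; each entity is a (start, end, label) tuple with an
# exclusive end, mirroring token offsets. The title list is an assumption.
TITLES = {"Mr", "Mr.", "Mrs", "Mrs.", "Ms", "Ms.", "Dr", "Dr."}


def expand_person_entities(tokens, ents):
    """Widen each PERSON entity by one token when a title precedes it."""
    expanded = []
    for start, end, label in ents:
        if label == "PERSON" and start > 0 and tokens[start - 1] in TITLES:
            start -= 1  # pull the title into the entity span
        expanded.append((start, end, label))
    return expanded


tokens = ["Mr.", "Smith", "walked", "along", "the", "sea", "front", ".",
          "Mrs.", "Smith", "stayed", "at", "home", "."]
# Hypothetical model output: both "Smith" tokens tagged as PERSON.
ents = [(1, 2, "PERSON"), (9, 10, "PERSON")]

for start, end, label in expand_person_entities(tokens, ents):
    print(" ".join(tokens[start:end]), label)
# Mr. Smith PERSON
# Mrs. Smith PERSON
```

In spaCy itself the same loop would live in a custom pipeline component that runs after "ner", builds new Span objects and assigns them back to doc.ents.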

I want to fetch all the details of wrestlers from the tables

I have this link: www.cagematch.net/?id=8&nr=1&page=15
The page shows a table of wrestlers. If you click on a wrestler's name, you can see that wrestler's details. I want to fetch all the wrestlers with their details in a simple way. I was thinking of something like this:
urls = [
    link1, link2, link3, link4
]
for u in urls:
    ..... do the scrape
But there are 275 wrestlers, and I don't want to enter all the links by hand. Is there an easy way to do this?
To collect all the links into a list and then fetch the info for each wrestler, you can use this example:
import requests
from bs4 import BeautifulSoup
url = "http://www.cagematch.net/?id=8&nr=1&page=15"
headers = {"Accept-Encoding": "deflate"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
links = [
    "https://www.cagematch.net/" + a["href"] for a in soup.select(".TCol a")
]
for u in links:
    soup = BeautifulSoup(
        requests.get(u, headers=headers).content, "html.parser"
    )
    print(soup.h1.text)
    for info in soup.select(".InformationBoxRow"):
        print(
            info.select_one(".InformationBoxTitle").text.strip(),
            info.select_one(".InformationBoxContents").text.strip(),
        )
    # get other info here
    # ...
    print("-" * 80)
Prints:
Adam Pearce
Current gimmick: Adam Pearce
Age: 44 years
Promotion: World Wrestling Entertainment
Active Roles: Road Agent, Trainer, On-Air Official, Backstage Helper
Birthplace: Lake Forest, Illinois, USA
Gender: male
Height: 6' 2" (188 cm)
Weight: 238 lbs (108 kg)
WWW: http://twitter.com/ScrapDaddyAP https://www.facebook.com/OfficialAdamPearce https://www.youtube.com/watch?v=us91bK1ScL4
Alter egos: Adam O'BrienAdam Pearce    a.k.a.  US Marshall Adam J. PearceMasked Spymaster #2Tommy Lee Ridgeway
Roles: Singles Wrestler (1996 - 2014)Road Agent (2015 - today)Booker (2008 - 2010)Trainer (2013 - today)On-Air Official (2020 - today)Backstage Helper (2015 - today)
Beginning of in-ring career: 16.05.1996
End of in-ring career: 21.12.2014
In-ring experience: 18 years
Wrestling style: Allrounder
Trainer: Randy Ricci & Sonny Rogers
Nicknames: "Scrap Iron"
Signature moves: PiledriverFlying Body SplashRackbomb II
--------------------------------------------------------------------------------
AJ Styles
Current gimmick: AJ Styles
Age: 45 years
Promotion: World Wrestling Entertainment
Brand: RAW
Active Roles: Singles Wrestler
Birthplace: Jacksonville, North Carolina, USA
Gender: male
Height: 5' 11" (180 cm)
Weight: 218 lbs (99 kg)
Background in sports: Ringen, Football, Basketball, Baseball
WWW: http://AJStyles.org https://www.facebook.com/AJStylesOrg-110336188978264/ https://twitter.com/AJStylesOrg https://www.instagram.com/ajstylesp1/ https://www.twitch.tv/Stylesclash
Alter egos: AJ Styles    a.k.a.  Air StylesMr. Olympia
Roles: Singles Wrestler (1999 - today)Tag Team Wrestler (2001 - 2021)
Beginning of in-ring career: 15.02.1999
In-ring experience: 23 years
Wrestling style: Techniker, High Flyer
Trainer: Rick Michaels
Nicknames: "The Phenomenal""The Prince Of Phenomenal"
Signature moves: Styles ClashPelé KickCalf Killer/Calf CrusherStylin' DDTCliffhangerSpiral TapPhenomenal Forearm450 Splash
--------------------------------------------------------------------------------
...and so on.
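
The answer builds absolute URLs by string concatenation. As a rough alternative sketch, the link-collection step can also be done with the standard library alone, using urljoin to resolve relative hrefs. The sample HTML and its href values below are made up for illustration, and the LinkCollector class is a hypothetical helper, not part of the answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.cagematch.net/"


class LinkCollector(HTMLParser):
    """Collect every <a href=...> as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves a relative href against the base URL
                self.links.append(urljoin(self.base_url, href))


# Made-up sample markup standing in for the downloaded page.
sample_html = (
    '<table>'
    '<tr><td class="TCol"><a href="?id=2&amp;nr=1">Wrestler One</a></td></tr>'
    '<tr><td class="TCol"><a href="?id=2&amp;nr=2">Wrestler Two</a></td></tr>'
    '</table>'
)

collector = LinkCollector(BASE_URL)
collector.feed(sample_html)
print(collector.links)
```

Unlike the CSS selector in the answer, this sketch collects every anchor on the page, so a real page would need extra filtering.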

Why does this not work? (school project)

import sys,time,random
typing_speed = 80 #wpm
def slow_type(t):
    for l in t:
        sys.stdout.write(l)
        sys.stdout.flush()
        time.sleep(random.random()*10.0/typing_speed)
slow_type("Hello which person do you want info for ")
inputs = input(
    "Type 1 For Malcom X, type 2 for Kareem Abdul-Jabbar ")
if inputs == ('1'):
    inputs = input(
        "what info do you want. 1. overall life 2. accomplishments and obstacles. 3. His legacy "
    )
    if inputs == ('1'):
        slow_type(
            "born in may 19 1925 in Omaha Nebraska his parents both died when he was a young child and there wasn't anyone who really could take care of him so he spent much of his time bouncing around different foater homes, in 1952 he joined the nation of islam and became a preacher, he left the NOI to make a new group because he embraced a different type of Islam, sunni islam, he died in febuary 21 on 1965 by assasins who were part of the NOI."
        )
    elif inputs == ('2'):
        slow_type(
            "Some of his major accomplishments include preaching islam and the message that the oppressed ahould fight back. "
        )
if inputs == ('2'):
    inputs = input(
        "what info do you want. 1. Birth and age 2. Early Life. 3. Nba life 4. Later Life 5. Accomplishments and Accolades"
    )
    if inputs == ('1', '2', '3', '4', '5'):
        if inputs == ('1'):
            slow_type(
                "Kareem was born in New York during 1947 on the day of April 16th with the birth name of Lew Alcindor Jr. the son of Fernando Lewis Alcindor., New York policeman and Cora Alcindor. Later in his life Lew Alcindor changed his name to Kareem Abdul-Jabbar, meaning noble servant of the powerful One. Kareem is still alive today and is 74 years of age"
            )
        if inputs == ('2'):
            slow_type(
                "Kareem/ Lew Alcindor was always the tallest person in his class. When Kareem turned 9 he was already 5’8”. When he hit eighth grade he was 6’8”. Lew was playing basketball since he was young. At power memorial academy, Lew had a high-school career that nobody could match. Lew brought his team to 71 straight wins and 3 straight city titles."
            )
        if inputs == ('3'):
            slow_type(
                "In 1969 the Milwaukee Bucks selected Lew Alcindor with the first overall pick in the NBA draft. Lew quickly became a star being second in the league in scoring and third in rebounding, Lew was named the NBA Rookie of The Year. In the following season Lew became better and better and the bucks added future Oscar Robertson to the roster, making the Bucks the best team in the league with a 66-16 record. The bucks won the ring that year and Lew won MVP. Later that Summer Lew converted to Islam and Changed his name to Kareem Abdul-jabbar. Kareem and the bucks got to the NBA finals that year but lost to the Celtics. Even with al the success with the bucks Kareem struggled to be happy. Later that off season demanded a trade to either The Lakers or the Nicks. The bucks complied and traded Kareem to the Los Angelos Lakers where he was paired with Magic Johnson, making the lakers by far the best team in the league. During the rest of Kareems career he dominated the NBA winning 5 more titles and wining 5 more MVPs."
            )
        if inputs == ('4'):
            slow_type("o")
To be specific, the info doesn't print for some reason. Please help.
It doesn't work because of your logic.
if inputs == ('1', '2', '3', '4', '5'): will always evaluate to False, because input() returns a string and a string never equals that tuple. You are also overwriting the inputs variable; I would consider giving the two answers distinct names.
I made a few changes in there. Take a look and compare it to your code. This code is working just fine (relative to what you provided).
import sys,time,random
typing_speed = 80 #wpm
def slow_type(t):
    print('\n')
    for l in t:
        sys.stdout.write(l)
        sys.stdout.flush()
        time.sleep(random.random()*10.0/typing_speed)
slow_type("Hello which person do you want info for?")
inputs_alpha = input(
    "Type 1 For Malcom X, type 2 for Kareem Abdul-Jabbar\n--> ")
if inputs_alpha == '1':
    inputs = input(
        "what info do you want?\n1. overall life\n2. accomplishments and obstacles.\n3. His legacy\n--> "
    )
    if inputs == '1':
        slow_type(
            "born in may 19 1925 in Omaha Nebraska his parents both died when he was a young child and there wasn't anyone who really could take care of him so he spent much of his time bouncing around different foater homes, in 1952 he joined the nation of islam and became a preacher, he left the NOI to make a new group because he embraced a different type of Islam, sunni islam, he died in febuary 21 on 1965 by assasins who were part of the NOI."
        )
    elif inputs == '2':
        slow_type(
            "Some of his major accomplishments include preaching islam and the message that the oppressed ahould fight back. "
        )
if inputs_alpha == '2':
    inputs = input(
        "what info do you want?\n1. Birth and age\n2. Early Life.\n3. Nba life\n4. Later Life\n5. Accomplishments and Accolades\n--> "
    )
    if inputs in ['1', '2', '3', '4', '5']:
        if inputs == '1':
            slow_type(
                "Kareem was born in New York during 1947 on the day of April 16th with the birth name of Lew Alcindor Jr. the son of Fernando Lewis Alcindor., New York policeman and Cora Alcindor. Later in his life Lew Alcindor changed his name to Kareem Abdul-Jabbar, meaning noble servant of the powerful One. Kareem is still alive today and is 74 years of age"
            )
        if inputs == '2':
            slow_type(
                "Kareem/ Lew Alcindor was always the tallest person in his class. When Kareem turned 9 he was already 5’8”. When he hit eighth grade he was 6’8”. Lew was playing basketball since he was young. At power memorial academy, Lew had a high-school career that nobody could match. Lew brought his team to 71 straight wins and 3 straight city titles."
            )
        if inputs == '3':
            slow_type(
                "In 1969 the Milwaukee Bucks selected Lew Alcindor with the first overall pick in the NBA draft. Lew quickly became a star being second in the league in scoring and third in rebounding, Lew was named the NBA Rookie of The Year. In the following season Lew became better and better and the bucks added future Oscar Robertson to the roster, making the Bucks the best team in the league with a 66-16 record. The bucks won the ring that year and Lew won MVP. Later that Summer Lew converted to Islam and Changed his name to Kareem Abdul-jabbar. Kareem and the bucks got to the NBA finals that year but lost to the Celtics. Even with al the success with the bucks Kareem struggled to be happy. Later that off season demanded a trade to either The Lakers or the Nicks. The bucks complied and traded Kareem to the Los Angelos Lakers where he was paired with Magic Johnson, making the lakers by far the best team in the league. During the rest of Kareems career he dominated the NBA winning 5 more titles and wining 5 more MVPs."
            )
        if inputs == '4':
            slow_type("o")
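
The difference between == and in can be checked in isolation. This small sketch uses a hard-coded string in place of input(); the variable names are chosen for illustration only.

```python
choice = "3"  # stands in for what input() returns: always a string
valid = ("1", "2", "3", "4", "5")

# ('1') is not a tuple, just the string '1' in parentheses,
# which is why the first-level comparisons in the question did work.
parenthesised_is_str = ("1") == "1"

# A string is never equal to a tuple, so this branch can never run.
equals_tuple = choice == valid

# A membership test is what the check was meant to be.
in_tuple = choice in valid

print(parenthesised_is_str, equals_tuple, in_tuple)  # True False True
```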

Delete abbreviations (combination of Letter+dot) from Pandas column

I'd like to delete specific parts of strings in a pandas column, such as any letter followed by a dot. For example, having a column with names:
John W. Man
Betty J. Rule
C.S. Stuart
What should remain is
John Man
Betty Rule
Stuart
So, any letter followed by a dot, which represents an abbreviation, should go.
I can't think of a way with str.replace or anything like that.
Use Series.str.replace with a regex that matches one letter followed by a . and any whitespace after it, if present:
df['col'] = df['col'].str.replace(r'([a-zA-Z]{1}\.\s*)', '', regex=True)
print(df)
          col
0    John Man
1  Betty Rule
2      Stuart
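
The pattern can be tried outside pandas with the standard re module. The sketch below first applies the answer's pattern to the three sample names, then shows an assumed refinement (a leading \b word boundary, not part of the original answer) that avoids eating the last letter of a word that simply ends a sentence.

```python
import re

names = ["John W. Man", "Betty J. Rule", "C.S. Stuart"]

# Same pattern as the answer: one ASCII letter, a literal dot, optional whitespace.
pattern = re.compile(r"[a-zA-Z]\.\s*")
cleaned = [pattern.sub("", n) for n in names]
print(cleaned)  # ['John Man', 'Betty Rule', 'Stuart']

# Assumed refinement: \b requires the letter to start a word, so the "n."
# at the end of "Man." is left alone.
strict = re.compile(r"\b[a-zA-Z]\.\s*")
loose_result = pattern.sub("", "John W. Man.")
strict_result = strict.sub("", "John W. Man.")
print(loose_result)   # John Ma
print(strict_result)  # John Man.
```

If the stricter behaviour is wanted in pandas, the same pattern string can be passed to Series.str.replace with regex=True.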

In this example using the reduce() internal iterator function, why won't the accumulator acknowledge this digit?

I'm kind of baffled here. I'm simply trying to use the reduce function to create a String representing the elements of a list in numbered order. Here is the code:
val names = listOf("John", "Billy", "Tom", "Joe", "Eric", "Jerry")
val result = names.reduce { accum, name -> "$accum ${names.indexOf(name) + 1}. $name" }
println(result) // John 2. Billy 3. Tom 4. Joe 5. Eric 6. Jerry
// ^ missing 1.
I'm expecting the value of the accumulator to accumulate as such after each iteration:
"1. John"
"1. John 2. Billy"
"1. John 2. Billy 3. Tom"
"1. John 2. Billy 3. Tom 4. Joe"
"1. John 2. Billy 3. Tom 4. Joe 5. Eric"
"1. John 2. Billy 3. Tom 4. Joe 5. Eric 6. Jerry"
When I run the code, it prints: John 2. Billy 3. Tom 4. Joe 5. Eric 6. Jerry
I don't understand why the "1." is missing.
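The reduce function uses the first element as the starting accumulator.
So the operation is only applied to the pairs (((1 -> 2) -> 3) -> 4), and the first name never passes through your lambda.
You can achieve the expected behaviour with the fold function, which takes an initial value (trim() removes the leading space that the empty initial value leaves behind):
val result = names.fold("") { accum, name -> "$accum ${names.indexOf(name) + 1}. $name" }.trim()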
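
The same behaviour can be reproduced with Python's functools.reduce, which also seeds the accumulator with the first element when no initial value is given. This is an analogy to the Kotlin code rather than a translation of it; the step function name is chosen for illustration.

```python
from functools import reduce

names = ["John", "Billy", "Tom", "Joe", "Eric", "Jerry"]


def step(accum, name):
    """Append the numbered name to the accumulated string."""
    return f"{accum} {names.index(name) + 1}. {name}"


# No initial value: "John" becomes the accumulator and never goes through step().
without_initial = reduce(step, names)
print(without_initial)  # John 2. Billy 3. Tom 4. Joe 5. Eric 6. Jerry

# With an empty initial value (the fold equivalent), every name is numbered.
with_initial = reduce(step, names, "").strip()
print(with_initial)  # 1. John 2. Billy 3. Tom 4. Joe 5. Eric 6. Jerry
```

Looking up the index by value is kept here to mirror the question; it would give wrong numbers if the list contained duplicate names.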

Vlookup to Make a list?

This site has been super helpful, thank you to everyone who has answered my questions. Here is the next one I am working on. Not sure if I should use vlookup, hlookup, a combination of both or something else.
So I have a list of teams with lineups
Team  Player
A     Sam
A     Chris
A     Tom
A     Scott
B     Mark
B     Dan
B     Greg
B     Ben
C     Sara
C     Beth
C     Luara
C     Britt
On a separate page I am trying to fill in a lineup if a team is selected.
For reference this is the current formula I have been trying:
=IFERROR(INDEX('Team LineUps'!$B:$B,Match(0,COUNTIF($C$16,IF('Team LineUps'!$A:$A=$C$16,'Team LineUps'!$B:$B,$C$16)),0)),"")
This gets me the first player on the list for a team. If I change the 0 to a 1, it gets me the last player on the team. Can I get the entire list, players 1-4? Or does it only work as a "TRUE" or "FALSE"?
Answer:
Use a QUERY.
Formula (B4 is the cell holding the selected team):
=QUERY('Team LineUps'!A2:B13, "SELECT B WHERE A='"&B4&"'")
Example Usage: