Related
I'm using beautiful soup and I'm getting the error, "AttributeError: 'NoneType' object has no attribute 'get_text'" and also "TypeError: 'NoneType' object is not subscriptable".
I know my code works when I use it to search for a single restaurant. However when I try to make a loop for all restaurants, then I get an error.
Here is my screen recording showing the problem. https://streamable.com/pok13
The rest of the code can be found here: https://pastebin.com/wsv1kfNm
# AttributeError: 'NoneType' object has no attribute 'get_text'
restaurant_address = yelp_containers[yelp_container].find("address", {
"class": 'lemon--address__373c0__2sPac'
}).get_text()
print("restaurant_address: ", restaurant_address)
# TypeError: 'NoneType' object is not subscriptable
restaurant_starCount = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc i-stars__373c0__30xVZ i-stars--regular-4__373c0__2R5IO border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I"
})['aria-label']
print("restaurant_starCount: ", restaurant_starCount)
# AttributeError: 'NoneType' object has no attribute 'text'
restaurant_district = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc display--inline-block__373c0__25zhW border-color--default__373c0__2xHhl"
}).text
print("restaurant_district: ", restaurant_district)
You are getting the error because your selectors are too specific, and you don't check if the tag was found or not. One solution is loosen the selectors (the lemon--div-XXX... selectors will probably change in the near future anyway):
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv
import re
my_url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San%20Francisco%2C%20CA'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
bs = soup(page_html, "html.parser")
yelp_containers = bs.select('li:contains("All Results") ~ li:contains("read more")')
for idx, item in enumerate(yelp_containers, 1):
print("--- Restaurant number #", idx)
restaurant_title = item.h3.get_text(strip=True)
restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
restaurant_numReview = item.select_one('[class*="reviewCount"]').get_text(strip=True)
restaurant_numReview = re.sub(r'[^\d.]', '', restaurant_numReview)
restaurant_starCount = item.select_one('[class*="stars"][aria-label]')['aria-label']
restaurant_starCount = re.sub(r'[^\d.]', '', restaurant_starCount)
pr = item.select_one('[class*="priceRange"]')
restaurant_price = pr.get_text(strip=True) if pr else '-'
restaurant_category = [a.get_text(strip=True) for a in item.select('[class*="priceRange"] ~ span a')]
restaurant_district = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[-1]
print(restaurant_title)
print(restaurant_address)
print(restaurant_numReview)
print(restaurant_price)
print(restaurant_category)
print(restaurant_district)
print('-' * 80)
Prints:
--- Restaurant number # 1
Fog Harbor Fish House
Pier 39
5487
$$
['Seafood', 'Bars']
Fisherman's Wharf
--------------------------------------------------------------------------------
--- Restaurant number # 2
The House
1230 Grant Ave
4637
$$$
['Asian Fusion']
North Beach/Telegraph Hill
--------------------------------------------------------------------------------
...and so on.
I am relatively new to python, and as such I don't always understand why I get errors. I keep getting this error:
Traceback (most recent call last):
File "python", line 43, in <module>
ValueError: invalid literal for int() with base 10: 'O'
This is the line it's referring to:
np.insert(arr, [i,num], "O")
I'm trying to change a value in a numpy array.
Some code around this line for context:
hOne = [one,two,three]
hTwo = [four,five,six]
hThree = [seven, eight, nine]
arr = np.array([hOne, hTwo, hThree])
test = "O"
while a != Answer :
Answer = input("Please Enter Ready to Start")
if a == Answer:
while win == 0:
for lists in arr:
print(lists)
place = int(input("Choose a number(Use arabic numerals 1,5 etc.)"))
for i in range(0,len(arr)):
for num in range(0, len(arr[i])):
print(arr[i,num], "test")
print(arr)
if place == arr[i,num]:
if arr[i,num]:
np.delete(arr, [i,num])
np.insert(arr, [i,num], "O")
aiTurn = 1
else:
print(space_taken)
The number variables in the lists just hold the int version of themselves, so one = 1, two = 2 three = 3, etc
I've also tried holding "O" as a variable and changing it that way as well.
Can anyone tell me why I'm getting this error?
My task is to clean the data and present it in next format: DataFrame( [ ["Michigan", "Ann Arbor"], ["Michigan", "Yipsilanti"] ], columns=["State", "RegionName"] ).
I'm getting a mistake. How can I fix it?
"UnboundLocalError: local variable 'state' referenced before assignment"
The code is following:
with open ('university_towns.txt') as file:
data=[]
for line in file:
data.append(line[:-1])
state_town = []
for line in data:
if line[:-6]=='[edit]':
state=line[:-6]
elif '(' in line:
town = line[:line.index('(')-1]
state_town.append([state,town])
else:
town=line
state_town.append([state,town])
state_college_df=pd.DataFrame(state_town, columns=['State','RegionName'])
return state_college_df
I am attempting to prevent the input of anything other than an integer for an average, but I keep receiving a Traceback for a TypeError. Below is my program, and the program output when attempting to input anything other than an int for an average:
grades_file = open('grades.txt', 'w')
def get_averages():
student = 1
for i in range(3):
name, average = get_name_average()
student += 1
grades_file = open('grades.txt', 'a')
grades_file.write("Student Name: " + name + '\n' + "Student Average: " + str(average) + '\n\n')
grades_file.close()
print(("Added %s's average of %i to the file 'grades.txt. You are now entering information for student %i of 12.") % (name, average, student))
def get_name_average():
student_name = input("Please enter the student's name: ")
try:
student_average = int(input(("Please enter the average for %s: ") % student_name))
verified_average = check_grade_input(student_average)
return student_name, verified_average
except ValueError:
print("ERROR!! Please enter grade value as an integer!")
except TypeError:
print('Type error too!')
def check_grade_input(average):
legal_input = False
while not legal_input:
if (average < 0):
print("Nah bro, invalid number...")
average = int(input("Please enter another average that is above 0: "))
elif (average > 100):
print("Nah bro, invalid number...")
average = int(input("Please enter another average that is below 100: "))
else:
return average
def show_grades_file():
grades_file = open('grades.txt', 'r')
grade_contents = grades_file.read()
grades_file.close()
print("\nThe information you entered for into file 'grades.txt' is:\n\n" + grade_contents)
def main():
get_averages()
show_grades_file()
main()
Traceback and input:
Please enter the student's name: Aaron
Please enter the average for Aaron: as
ERROR!! Please enter grade value as an integer!
Traceback (most recent call last):
File "Documents/ProgrammingFundamentals/Lab6/aaron_blakey_Lab6b.py", line 52, in <module>
main()
File "Documents/ProgrammingFundamentals/Lab6/aaron_blakey_Lab6b.py", line 49, in main
get_averages()
File "Documents/ProgrammingFundamentals/Lab6/aaron_blakey_Lab6b.py", line 13, in get_averages
name, average = get_name_average()
TypeError: 'NoneType' object is not iterable
In function get_name_average you have printed message in Exception part print("ERROR!! Please enter grade value as an integer!") and you did not return any value.
By default system will return none value & required two values name, average.
You need to follow below code.
def get_name_average():
student_name = input("Please enter the student's name: ")
try:
student_average = int(input(("Please enter the average for %s: ") % student_name))
verified_average = check_grade_input(student_average)
return student_name, verified_average
except ValueError:
print("ERROR!! Please enter grade value as an integer!")
return False,False
except TypeError:
print('Type error too!')
return False,False
After that if you got exception then name,average=False,False.
In function get_averages you need add one condition if name is False then system should not write in the file.
This may help you.
Problem
I am trying to implement an error tolerant parser using Python Lex-Yacc (PLY), but I have trouble using error recovery rules at the end of my input string.
How can I recover from an unexpected end of input?
Example
This example grammar produces strings of the form A END A END A END A END ...
Statement : Expressions
Expressions : Expression Expressions
|
Expression : A END
I want to perform an error recovery if the END Token was omitted, so stings like A A A END or A A A will be recognized by the parser.
My approach
I added an error recovery rule, which allows me to accept input like A A A END
Expression : A END
| A error
Which allows me to accept the following input:
A A A END
But if the last END token is omitted (A A A), I still get a syntax error and cannot recover.
Sample PLY code
from __future__ import print_function
# Tokens
tokens = ('A', 'END')
t_A = r'A'
t_END = r'END'
t_ignore = " "
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex()
# Rules
def p_statement_expr(p):
'''statement : expressions'''
print("parsed:", p[1])
def p_expressions(p):
'''expressions : expression expressions'''
p[0] = [p[1]] + p[2]
def p_expressions_empty(p):
'''expressions : '''
p[0] = list()
def p_expression_pharse(p):
'''expression : A END
| A error'''
p[0] = 'A'
def p_error(p):
if p:
print("Syntax error at '%s'" % p.value)
else:
print("Syntax error at EOI")
import ply.yacc as yacc
yacc.yacc()
while 1:
try:
s = raw_input('query > ') # use input() on Python 3
except EOFError:
break
yacc.parse(s)
I add it as a new answer (and do know it is too late for the bounty :-( ) because it is a very different approach. If we used flex, it would be much easier, since it has the notion of the <<EOF>> token that matches only at end of file. After thinking about that, I realized that it was very simple to add that functionality to PLY without any change to the original module by using a proxy around the lexer. And Python allows easy implementation of proxies thanks the the __getattr__ special method.
I just add
a new token EOF that will be send at end of file
a proxy around the token method of the lexer that on end of file returns the special EOF token on first pass and then the normal None
the eof token to end statement rule
And still reverse the rule expressions : expressions expression instead of expressions : expression expressions to allow immediate reduce
The code becomes :
from __future__ import print_function
# Tokens
tokens = ('A', 'END', 'EOF')
t_A = r'A'
t_END = r'END'
t_ignore = " "
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
orig_lexer = lex.lex()
class ProxyLexer(object):
def __init__(self, lexer, eoftoken):
self.end = False
self.lexer = lexer
self.eof = eoftoken
def token(self):
tok = self.lexer.token()
if tok is None:
if self.end :
self.end = False
else:
self.end = True
tok = lex.LexToken()
tok.type = self.eof
tok.value = None
tok.lexpos = self.lexer.lexpos
tok.lineno = self.lexer.lineno
# print ('custom', tok)
return tok
def __getattr__(self, name):
return getattr(self.lexer, name)
lexer = ProxyLexer(orig_lexer, 'EOF')
# Rules
def p_statement_expr(p):
'''statement : expressions EOF'''
print("parsed:", p[1])
def p_expressions(p):
'''expressions : expressions expression'''
p[0] = p[1] + [p[2]]
def p_expressions_empty(p):
'''expressions : '''
p[0] = list()
def p_expression_pharse(p):
'''expression : A END
| A error'''
p[0] = 'A'
def p_error(p):
if p:
print("Syntax error at '%s'" % p.value)
else:
print("Syntax error at EOI")
import ply.yacc as yacc
parser = yacc.yacc()
while 1:
try:
s = raw_input('query > ') # use input() on Python 3
except EOFError:
break
parser.parse(s, lexer = lexer)
That way :
the original grammar is unchanged
the error recovery method remains stupidly simple and has no dependance on the remaining of the grammar
it can be easily extended to complex parsers
As you want to accept all elements, you can explicitely declare a rule for a A not followed by a END and use the fact that yacc and PLY friendly deal with ambiguous rules.
You can simply have a normal rule :
Expression : A END
and below a lower priority rule (as it comes later) that will issue a warning
Expression : A
That way, all A will be accepted, there won't be any syntax error, and the warning will be issued for any A not followed by a END including one at the end of the flow. In order to more easily find the offending A, I have added in the warning the position of the symbol in the flow.
Edit:
The script is modified to correctly deal with other syntax error (such as AENDENDAEND), and also to immediately reduce expressions by replacing expressions : expression expressions with expressions : expressions expression
Here is the modified script (tested in python 3.4 simply replacing raw_input with input):
from __future__ import print_function
# Tokens
tokens = ('A', 'END')
t_A = r'A'
t_END = r'END'
t_ignore = " "
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex()
# Rules
def p_statement_expr(p):
'''statement : expressions'''
print("parsed:", p[1])
def p_expressions(p):
'''expressions : expressions expression'''
p[0] = p[1] + [p[2]]
def p_expressions_err(p):
'''expressions : expressions error'''
p[0] = p[1]
def p_expressions_empty(p):
'''expressions : '''
p[0] = list()
def p_expression_pharse(p):
'''expression : A END'''
p[0] = 'A'
# add a separate rule BELOW previous one to display a warning
def p_expression_pharse_warn(p):
'''expression : A'''
print("Warning at absolute position %d (line %d)" % (p.lexpos(1), p.lineno(1)))
p[0] = 'A'
def p_error(p):
if p:
print("Syntax error at '%s'" % p.value)
else:
print("Syntax error at EOI")
import ply.yacc as yacc
yacc.yacc()
while 1:
try:
s = raw_input('query > ') # use input() on Python 3
except EOFError:
break
yacc.parse(s)
Edit : the following is an incorrect attempt to avoid an additional rule : it is more complex and less efficient than the above version. Please see my conclusion below
Edit per comment :
I understand your point that you do not want to multiply grammar rules. It is possible to be fault tolerant, except for last token. If your last token is in error, it will not be followed by anything and will never be caught in rule expression : A error.
But here is a fault tolerant parser that keeps everything except last token if case of error on that one :
from __future__ import print_function
# Tokens
tokens = ('A', 'END')
t_A = r'A'
t_END = r'END'
t_ignore = " "
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex()
# Rules
def p_statement_expr(p):
'''statement : expressions'''
# print("parsed:", p[1])
def p_expressions(p):
'''expressions : expressions expression'''
p[0] = p[1] + [p[2]]
result.append(p[2])
def p_expressions_empty(p):
'''expressions : '''
p[0] = list()
def p_expression_pharse(p):
'''expression : A END
| A error'''
p[0] = 'A'
def p_error(p):
if p:
global lasterr
print("Syntax error at '%s' (%d)" % (p.value, p.lexpos))
else:
print("Syntax error at EOI")
import ply.yacc as yacc
yacc.yacc()
while 1:
try:
s = input('query > ') # use input() on Python 3
except EOFError:
break
result = []
yacc.parse(s)
print('Result', result)
The princip is to collate by expressions : expressions expression instead of expressions : expression expressions, and to keep all in a global variable.
With an input of A END A A END A A A END it gives
Result ['A', 'A', 'A', 'A', 'A', 'A']
and with : A END A A END A A A END , it gives
Result ['A', 'A', 'A', 'A', 'A']
(all tokens but the last)
With a true flex - bison solution, it would be possible to make use of the special <<EOF>> token that matches at end of input, to always have another token after the last one. Unfortunately, it is not implemented in PLY, and the only real solution is to introduce a rule that accepts alone A token. For a real parser, it also guarantees that you are actually processing the correct token : I used
def p_expression_pharse(p):
'''expression : A END'''
p[0] = 1 + p.lexpos(1)
# add a separate rule BELOW previous one to display a warning
def p_expression_pharse_warn(p):
'''expression : A'''
print("Warning at absolute position %d (line %d)" % (p.lexpos(1), p.lineno(1)))
p[0] = -1 - p.lexpos(1)
to uniquely identify tokens in resul string, and I get correct positions.
And ... the error processing is very simple ...
Discussion TL/DR :
I admit I missed the point of last token error recovery. It is because in all parsers I've seen in real use cases, the error recovery consisted in rejecting the part that was syntactically incorrect (and thus not directly useable) and re-synchonizing the parser on next correct group of token. In all what I have seen, if a partial sentence can be used, it must not be processed by the error recovery mechanizme but by a grammar rule, in which it is easy to describe the appropriate action.
If you just want to keep the offending input for later processing, I think it is not a problem of action depending of a syntax, and I would simply note the position of offending token, or at most note the position of last correctly analysed token (the end of a complete element), the begin of first error recovery token and say that what is between is incorrect.
But it would be much different than what is asked here ...
This works for all examples I could imagine
from __future__ import print_function
# Tokens
tokens = ('A', 'END')
t_A = r'A'
t_END = r'END'
t_ignore = " "
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex()
# Rules
def p_statement_expr(p):
'''statement : expressions'''
#
print("parsed:", p[1])
def p_expressions(p):
'''expressions : expression expressions'''
p[0] = p[1] + p[2]
def p_expressions_empty(p):
'''expressions : '''
p[0] = list()
def p_expression_pharse(p):
'''expression : A END'''
p[0] = ['A']
def p_expression_error(p):
'''expression : A error'''
p[0] = ['A']
if p[2] is not None:
p[0] += p[2]
def p_error(p):
if p is None:
print("Syntax error at EOI")
e = yacc.YaccSymbol()
e.type = 'error'
e.value = None
yacc.errok()
return e
elif p.type == 'error':
yacc.errok()
return
elif hasattr(p, 'value'):
print("Syntax error at '%s'" % p.value)
e = yacc.YaccSymbol()
e.type = 'error'
e.value = p.value
yacc.errok()
return e
import ply.yacc as yacc
yacc.yacc()
while 1:
try:
s = raw_input('query > ') # use input() on Python 3
except EOFError:
break
yacc.parse(s)