Python Lex-Yacc (PLY) Error recovery at the end of input - yacc

Problem
I am trying to implement an error tolerant parser using Python Lex-Yacc (PLY), but I have trouble using error recovery rules at the end of my input string.
How can I recover from an unexpected end of input?
Example
This example grammar produces strings of the form A END A END A END A END ...
Statement : Expressions
Expressions : Expression Expressions
            |
Expression : A END
I want to perform an error recovery if the END token was omitted, so strings like A A A END or A A A will be recognized by the parser.
My approach
I added an error recovery rule, which allows me to accept input like A A A END
Expression : A END
           | A error
Which allows me to accept the following input:
A A A END
But if the last END token is omitted (A A A), I still get a syntax error and cannot recover.
Sample PLY code
from __future__ import print_function

# Tokens
tokens = ('A', 'END')

t_A = r'A'
t_END = r'END'
t_ignore = " "

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()

# Rules
def p_statement_expr(p):
    '''statement : expressions'''
    print("parsed:", p[1])

def p_expressions(p):
    '''expressions : expression expressions'''
    p[0] = [p[1]] + p[2]

def p_expressions_empty(p):
    '''expressions : '''
    p[0] = list()

def p_expression_pharse(p):
    '''expression : A END
                  | A error'''
    p[0] = 'A'

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOI")

import ply.yacc as yacc
yacc.yacc()

while 1:
    try:
        s = raw_input('query > ')  # use input() on Python 3
    except EOFError:
        break
    yacc.parse(s)

I add it as a new answer (and I do know it is too late for the bounty :-( ) because it is a very different approach. If we used flex, it would be much easier, since it has the notion of the <<EOF>> token that matches only at end of file. After thinking about that, I realized that it was very simple to add that functionality to PLY without any change to the original module, by using a proxy around the lexer. And Python allows easy implementation of proxies thanks to the __getattr__ special method.
I just add:
a new EOF token that will be sent at end of file
a proxy around the token method of the lexer that, at end of file, returns the special EOF token on the first pass and then the normal None
the EOF token to the end of the statement rule
And I still reverse the rule to expressions : expressions expression instead of expressions : expression expressions to allow an immediate reduce.
The code becomes:
from __future__ import print_function

# Tokens
tokens = ('A', 'END', 'EOF')

t_A = r'A'
t_END = r'END'
t_ignore = " "

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
orig_lexer = lex.lex()

class ProxyLexer(object):
    def __init__(self, lexer, eoftoken):
        self.end = False
        self.lexer = lexer
        self.eof = eoftoken

    def token(self):
        tok = self.lexer.token()
        if tok is None:
            if self.end:
                self.end = False
            else:
                self.end = True
                tok = lex.LexToken()
                tok.type = self.eof
                tok.value = None
                tok.lexpos = self.lexer.lexpos
                tok.lineno = self.lexer.lineno
        # print('custom', tok)
        return tok

    def __getattr__(self, name):
        return getattr(self.lexer, name)

lexer = ProxyLexer(orig_lexer, 'EOF')

# Rules
def p_statement_expr(p):
    '''statement : expressions EOF'''
    print("parsed:", p[1])

def p_expressions(p):
    '''expressions : expressions expression'''
    p[0] = p[1] + [p[2]]

def p_expressions_empty(p):
    '''expressions : '''
    p[0] = list()

def p_expression_pharse(p):
    '''expression : A END
                  | A error'''
    p[0] = 'A'

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOI")

import ply.yacc as yacc
parser = yacc.yacc()

while 1:
    try:
        s = raw_input('query > ')  # use input() on Python 3
    except EOFError:
        break
    parser.parse(s, lexer = lexer)
That way:
the original grammar is unchanged
the error recovery method remains stupidly simple and has no dependence on the rest of the grammar
it can easily be extended to complex parsers
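To see the proxy by itself, a quick check along these lines (an illustrative sketch only, assuming the definitions above) should show exactly one extra EOF token after the real tokens:
lexer.input("A A")
toks = []
while True:
    tok = lexer.token()
    if tok is None:
        break
    toks.append(tok.type)
print(toks)  # expected: ['A', 'A', 'EOF']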

As you want to accept all elements, you can explicitly declare a rule for an A not followed by an END and use the fact that yacc and PLY deal gracefully with ambiguous rules.
You can simply have a normal rule:
Expression : A END
and, below it, a lower-priority rule (as it comes later) that will issue a warning:
Expression : A
That way, every A will be accepted, there won't be any syntax error, and the warning will be issued for any A not followed by an END, including one at the end of the input. To make the offending A easier to find, I have added the position of the symbol to the warning.
Edit:
The script is modified to correctly deal with other syntax errors (such as AENDENDAEND), and also to immediately reduce expressions by replacing expressions : expression expressions with expressions : expressions expression.
Here is the modified script (tested in Python 3.4, simply replacing raw_input with input):
from __future__ import print_function

# Tokens
tokens = ('A', 'END')

t_A = r'A'
t_END = r'END'
t_ignore = " "

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()

# Rules
def p_statement_expr(p):
    '''statement : expressions'''
    print("parsed:", p[1])

def p_expressions(p):
    '''expressions : expressions expression'''
    p[0] = p[1] + [p[2]]

def p_expressions_err(p):
    '''expressions : expressions error'''
    p[0] = p[1]

def p_expressions_empty(p):
    '''expressions : '''
    p[0] = list()

def p_expression_pharse(p):
    '''expression : A END'''
    p[0] = 'A'

# add a separate rule BELOW previous one to display a warning
def p_expression_pharse_warn(p):
    '''expression : A'''
    print("Warning at absolute position %d (line %d)" % (p.lexpos(1), p.lineno(1)))
    p[0] = 'A'

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOI")

import ply.yacc as yacc
yacc.yacc()

while 1:
    try:
        s = raw_input('query > ')  # use input() on Python 3
    except EOFError:
        break
    yacc.parse(s)
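With these rules a session should look roughly like the following (the positions are the lexpos values of each A that is missing its END; treat the exact output as illustrative):
query > A A A END
Warning at absolute position 0 (line 1)
Warning at absolute position 2 (line 1)
parsed: ['A', 'A', 'A']
query > A A A
Warning at absolute position 0 (line 1)
Warning at absolute position 2 (line 1)
Warning at absolute position 4 (line 1)
parsed: ['A', 'A', 'A']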
Edit: the following is an incorrect attempt to avoid an additional rule: it is more complex and less efficient than the above version. Please see my conclusion below.
Edit per comment:
I understand your point that you do not want to multiply grammar rules. It is possible to be fault tolerant, except for the last token. If your last token is in error, it will not be followed by anything and will never be caught by the rule expression : A error.
But here is a fault-tolerant parser that keeps everything except the last token in case of an error on that one:
from __future__ import print_function

# Tokens
tokens = ('A', 'END')

t_A = r'A'
t_END = r'END'
t_ignore = " "

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()

# Rules
def p_statement_expr(p):
    '''statement : expressions'''
    # print("parsed:", p[1])

def p_expressions(p):
    '''expressions : expressions expression'''
    p[0] = p[1] + [p[2]]
    result.append(p[2])

def p_expressions_empty(p):
    '''expressions : '''
    p[0] = list()

def p_expression_pharse(p):
    '''expression : A END
                  | A error'''
    p[0] = 'A'

def p_error(p):
    if p:
        global lasterr
        print("Syntax error at '%s' (%d)" % (p.value, p.lexpos))
    else:
        print("Syntax error at EOI")

import ply.yacc as yacc
yacc.yacc()

while 1:
    try:
        s = input('query > ')  # use input() on Python 3
    except EOFError:
        break
    result = []
    yacc.parse(s)
    print('Result', result)
The principle is to collect with expressions : expressions expression instead of expressions : expression expressions, and to keep everything in a global variable.
With an input of A END A A END A A A END it gives
Result ['A', 'A', 'A', 'A', 'A', 'A']
and with A END A A END A A A (final END omitted), it gives
Result ['A', 'A', 'A', 'A', 'A']
(all tokens but the last)
With a true flex-bison solution, it would be possible to make use of the special <<EOF>> token that matches at end of input, so that there is always another token after the last one. Unfortunately, it is not implemented in PLY, and the only real solution is to introduce a rule that accepts a lone A token. For a real parser, it also guarantees that you are actually processing the correct token: I used
def p_expression_pharse(p):
    '''expression : A END'''
    p[0] = 1 + p.lexpos(1)

# add a separate rule BELOW previous one to display a warning
def p_expression_pharse_warn(p):
    '''expression : A'''
    print("Warning at absolute position %d (line %d)" % (p.lexpos(1), p.lineno(1)))
    p[0] = -1 - p.lexpos(1)
to uniquely identify tokens in the result, and I get correct positions.
And the error processing stays very simple.
Discussion TL;DR:
I admit I missed the point of last-token error recovery. That is because in all parsers I have seen in real use cases, error recovery consisted of rejecting the part that was syntactically incorrect (and thus not directly usable) and re-synchronizing the parser on the next correct group of tokens. In everything I have seen, if a partial sentence can be used, it must not be processed by the error recovery mechanism but by a grammar rule, in which it is easy to describe the appropriate action.
If you just want to keep the offending input for later processing, I think it is not a problem of actions depending on syntax, and I would simply note the position of the offending token, or at most note the position of the last correctly analysed token (the end of a complete element) and the beginning of the first error-recovery token, and say that what lies between is incorrect.
But that would be quite different from what is asked here...
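For completeness, a minimal sketch of that last idea, recording positions in p_error instead of trying to repair the parse (the error_positions list and the message wording are illustrative choices, not from the original code):
error_positions = []  # lexpos of each offending token, kept for later processing

def p_error(p):
    if p:
        error_positions.append(p.lexpos)
        print("Syntax error at '%s' (position %d)" % (p.value, p.lexpos))
    else:
        print("Syntax error at EOI")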

This works for all examples I could imagine
from __future__ import print_function

# Tokens
tokens = ('A', 'END')

t_A = r'A'
t_END = r'END'
t_ignore = " "

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()

# Rules
def p_statement_expr(p):
    '''statement : expressions'''
    print("parsed:", p[1])

def p_expressions(p):
    '''expressions : expression expressions'''
    p[0] = p[1] + p[2]

def p_expressions_empty(p):
    '''expressions : '''
    p[0] = list()

def p_expression_pharse(p):
    '''expression : A END'''
    p[0] = ['A']

def p_expression_error(p):
    '''expression : A error'''
    p[0] = ['A']
    if p[2] is not None:
        p[0] += p[2]

def p_error(p):
    if p is None:
        print("Syntax error at EOI")
        e = yacc.YaccSymbol()
        e.type = 'error'
        e.value = None
        yacc.errok()
        return e
    elif p.type == 'error':
        yacc.errok()
        return
    elif hasattr(p, 'value'):
        print("Syntax error at '%s'" % p.value)
        e = yacc.YaccSymbol()
        e.type = 'error'
        e.value = p.value
        yacc.errok()
        return e

import ply.yacc as yacc
yacc.yacc()

while 1:
    try:
        s = raw_input('query > ')  # use input() on Python 3
    except EOFError:
        break
    yacc.parse(s)

Related

Leetcode 126: Word Ladder 2 in Python code optimization

I have the solution for the Word Ladder 2 (Leetcode problem 126: Word Ladder 2 ) in Python 3.6, and I notice that one of the very last testcases times out for me on the platform. Funnily, the test passes when run on PyCharm or as an individual test case on the site, but it takes about 5 seconds for it to complete. My solution uses BFS with some optimizations, but can someone tell me if there is a way to make it faster. Thank you! (P.S: Apologies for the additional test cases included in the commented out section!)
import math
import queue
from typing import List


class WordLadder2(object):
    @staticmethod
    def is_one_hop_away(s1: str, s2: str) -> int:
        """
        Uses the distance between strings to return True if string s2 is one character away from s1
        :param s1: Base string
        :param s2: Comparison string
        :return: True if the difference between the strings is one character
        """
        matrix = [[0] * (len(s1) + 1) for i in range(len(s1) + 1)]
        for r, row in enumerate(matrix):
            for c, entry in enumerate(row):
                if not r:
                    matrix[r][c] = c
                elif not c:
                    matrix[r][c] = r
                else:
                    if s1[r - 1] == s2[c - 1]:
                        matrix[r][c] = matrix[r - 1][c - 1]
                    else:
                        matrix[r][c] = 1 + min(matrix[r - 1][c - 1], matrix[r - 1][c], matrix[r][c - 1])
        if matrix[-1][-1] == 1:
            return True
        else:
            return False

    def get_next_words(self, s1: str, wordList: List[str]) -> List[str]:
        """
        For a given string in the list, return a set of strings that are one hop away
        :param s1: String whose neighbors one hop away are needed
        :param wordList: Array of words to choose from
        :return: List of words that are one character away from given string s1
        """
        words = []
        for word in wordList:
            if self.is_one_hop_away(s1, word):
                words.append(word)
        return words

    def find_ladders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
        """
        Main method to determine shortest paths between a beginning word and an ending word, in a given list of words
        :param beginWord: Word to begin the ladder
        :param endWord: Word to end the ladder
        :param wordList: List of words to choose from
        :return: List of list of word ladders, if they are found. Empty list, if endWord not in wordList or path not
            found from beginWord to endWord
        """
        q = queue.Queue()
        paths = list()
        current = [beginWord]
        q.put((beginWord, current))
        # Set to track words we have already processed
        visited = set()
        # Dictionary to keep track of the shortest path lengths to each word from beginWord
        shortest_paths = {beginWord: 1}
        min_length = math.inf
        # Use BFS to find the shortest path in the graph
        while q.qsize():
            word, path = q.get()
            # If endWord is found, add the current path to the list of paths and compute minimum path
            # length found so far
            if word == endWord:
                paths.append(path)
                min_length = min(min_length, len(path))
                continue
            for hop in self.get_next_words(word, wordList):
                # If the hop is already processed or in the queue for processing, skip
                if hop in visited or hop in q.queue:
                    continue
                # If the shortest path to the hop has not been determined or the current path length is lesser
                # than or equal to the known shortest path to the hop, add it to the queue and update the shortest
                # path to the hop.
                if (hop not in shortest_paths) or (hop in shortest_paths and len(path + [hop]) <= shortest_paths[hop]):
                    q.put((hop, path + [hop]))
                    shortest_paths[hop] = len(path + [hop])
            visited.add(word)
        return [s for s in paths if len(s) == min_length]
if __name__ == "__main__":
# beginword = 'qa'
# endword = 'sq'
# wordlist = ["si","go","se","cm","so","ph","mt","db","mb","sb","kr","ln","tm","le","av","sm","ar","ci","ca","br","ti","ba","to","ra","fa","yo","ow","sn","ya","cr","po","fe","ho","ma","re","or","rn","au","ur","rh","sr","tc","lt","lo","as","fr","nb","yb","if","pb","ge","th","pm","rb","sh","co","ga","li","ha","hz","no","bi","di","hi","qa","pi","os","uh","wm","an","me","mo","na","la","st","er","sc","ne","mn","mi","am","ex","pt","io","be","fm","ta","tb","ni","mr","pa","he","lr","sq","ye"]
# beginword = 'hit'
# endword = 'cog'
# wordlist = ['hot', 'dot', 'dog', 'lot', 'log', 'cog']
# beginword = 'red'
# endword = 'tax'
# wordlist = ['ted', 'tex', 'red', 'tax', 'tad', 'den', 'rex', 'pee']
beginword = 'cet'
endword = 'ism'
wordlist = ["kid","tag","pup","ail","tun","woo","erg","luz","brr","gay","sip","kay","per","val","mes","ohs","now","boa","cet","pal","bar","die","war","hay","eco","pub","lob","rue","fry","lit","rex","jan","cot","bid","ali","pay","col","gum","ger","row","won","dan","rum","fad","tut","sag","yip","sui","ark","has","zip","fez","own","ump","dis","ads","max","jaw","out","btu","ana","gap","cry","led","abe","box","ore","pig","fie","toy","fat","cal","lie","noh","sew","ono","tam","flu","mgm","ply","awe","pry","tit","tie","yet","too","tax","jim","san","pan","map","ski","ova","wed","non","wac","nut","why","bye","lye","oct","old","fin","feb","chi","sap","owl","log","tod","dot","bow","fob","for","joe","ivy","fan","age","fax","hip","jib","mel","hus","sob","ifs","tab","ara","dab","jag","jar","arm","lot","tom","sax","tex","yum","pei","wen","wry","ire","irk","far","mew","wit","doe","gas","rte","ian","pot","ask","wag","hag","amy","nag","ron","soy","gin","don","tug","fay","vic","boo","nam","ave","buy","sop","but","orb","fen","paw","his","sub","bob","yea","oft","inn","rod","yam","pew","web","hod","hun","gyp","wei","wis","rob","gad","pie","mon","dog","bib","rub","ere","dig","era","cat","fox","bee","mod","day","apr","vie","nev","jam","pam","new","aye","ani","and","ibm","yap","can","pyx","tar","kin","fog","hum","pip","cup","dye","lyx","jog","nun","par","wan","fey","bus","oak","bad","ats","set","qom","vat","eat","pus","rev","axe","ion","six","ila","lao","mom","mas","pro","few","opt","poe","art","ash","oar","cap","lop","may","shy","rid","bat","sum","rim","fee","bmw","sky","maj","hue","thy","ava","rap","den","fla","auk","cox","ibo","hey","saw","vim","sec","ltd","you","its","tat","dew","eva","tog","ram","let","see","zit","maw","nix","ate","gig","rep","owe","ind","hog","eve","sam","zoo","any","dow","cod","bed","vet","ham","sis","hex","via","fir","nod","mao","aug","mum","hoe","bah","hal","keg","hew","zed","tow","gog","ass","dem","who","bet","gos","son","ear","spy","kit","boy","due","sen","oaf","mix","hep","fur","ada","bin","nil","mia","ewe","hit","fix","sad","rib","eye","hop","haw","wax","mid","tad","ken","wad","rye","pap","bog","gut","ito","woe","our","ado","sin","mad","ray","hon","roy","dip","hen","iva","lug","asp","hui","yak","bay","poi","yep","bun","try","lad","elm","nat","wyo","gym","dug","toe","dee","wig","sly","rip","geo","cog","pas","zen","odd","nan","lay","pod","fit","hem","joy","bum","rio","yon","dec","leg","put","sue","dim","pet","yaw","nub","bit","bur","sid","sun","oil","red","doc","moe","caw","eel","dix","cub","end","gem","off","yew","hug","pop","tub","sgt","lid","pun","ton","sol","din","yup","jab","pea","bug","gag","mil","jig","hub","low","did","tin","get","gte","sox","lei","mig","fig","lon","use","ban","flo","nov","jut","bag","mir","sty","lap","two","ins","con","ant","net","tux","ode","stu","mug","cad","nap","gun","fop","tot","sow","sal","sic","ted","wot","del","imp","cob","way","ann","tan","mci","job","wet","ism","err","him","all","pad","hah","hie","aim"]
wl = WordLadder2()
# beginword = 'hot'
# endword = 'dog'
# wordlist = ['hot', 'dog', 'dot']
print(wl.find_ladders(beginword, endword, wordlist))
The part that slows down your solution is is_one_hop_away, which is a costly function. It is called repeatedly during the actual BFS. Instead you should aim to first create a graph structure -- an adjacency list -- so that the complexity of calculating which words are neighbors is dealt with before actually performing the BFS search.
Here is one way to do it:
from collections import defaultdict
from typing import List


class Solution:
    def findLadders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
        def createAdjacencyList(wordList):
            # bucket words by wildcard patterns like "h*t" to find neighbors cheaply
            adj = defaultdict(set)
            d = defaultdict(set)
            for word in wordList:
                for i in range(len(word)):
                    derived = word[:i] + "*" + word[i+1:]
                    for neighbor in d[derived]:
                        adj[word].add(neighbor)
                        adj[neighbor].add(word)
                    d[derived].add(word)
            return adj

        def edgesOnShortestPaths(adj, beginWord, endWord):
            frontier = [beginWord]
            edges = defaultdict(list)
            edges[beginWord] = []
            while endWord not in frontier:
                nextfrontier = set(neighbor
                                   for word in frontier
                                   for neighbor in adj[word]
                                   if neighbor not in edges
                                   )
                if not nextfrontier:  # endNode is not reachable
                    return
                for word in frontier:
                    for neighbor in adj[word]:
                        if neighbor in nextfrontier:
                            edges[neighbor].append(word)
                frontier = nextfrontier
            return edges

        def generatePaths(edges, word):
            if not edges[word]:
                yield [word]
            else:
                for neighbor in edges[word]:
                    for path in generatePaths(edges, neighbor):
                        yield path + [word]

        if endWord not in wordList:  # shortcut exit
            return []
        adj = createAdjacencyList([beginWord] + wordList)
        edges = edgesOnShortestPaths(adj, beginWord, endWord)
        if not edges:  # endNode is not reachable
            return []
        return list(generatePaths(edges, endWord))
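A quick sanity check with the classic LeetCode example (assuming the Solution class above; the order of the two ladders may vary):
words = ["hot", "dot", "dog", "lot", "log", "cog"]
print(Solution().findLadders("hit", "cog", words))
# [['hit', 'hot', 'dot', 'dog', 'cog'], ['hit', 'hot', 'lot', 'log', 'cog']]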

PySpark : AttributeError: 'DataFrame' object has no attribute 'values'

I'm a newbie in PySpark and I want to translate the following script, written for pandas, into PySpark:
api_param_df = pd.DataFrame([[row[0][0], np.nan] if row[0][1] == '' else row[0] for row in http_path.values], columns=["api", "param"])
df = pd.concat([df['raw'], api_param_df], axis=1)
but I face the following error, whose traceback is below:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-df055fb7d6a1> in <module>()
21 # Notice we also make \? and the second capture group optional so that when there are no query parameters in http path, it returns NaN.
22
---> 23 api_param_df = pd.DataFrame([[row[0][0], np.nan] if row[0][1] == '' else row[0] for row in http_path.values], columns=["api", "param"])
24 df = pd.concat([df['raw'], api_param_df], axis=1)
25
/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py in __getattr__(self, name)
1642 if name not in self.columns:
1643 raise AttributeError(
-> 1644 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
1645 jc = self._jdf.apply(name)
1646 return Column(jc)
AttributeError: 'DataFrame' object has no attribute 'values'
The full script is as follows; the comments explain how the regex is applied to the http_path column of df to parse out api and param and concat them back onto df.
#Extract features from http_path ["API URL", "URL parameters"]
regex = r'([^\?]+)\?*(.*)'
http_path = df.filter(df['http_path'].rlike(regex))
# http_path
#0 https://example.org/path/to/file?param=42#frag...
#1 https://example.org/path/to/file
# api param
#0 https://example.org/path/to/file param=42#fragment
#1 https://example.org/path/to/file NaN
#where in regex pattern:
#- (?:https?://[^/]+/)? optionally matches domain but doesn't capture it
#- (?P<api>[^?]+) matches everything up to ?
#- \? matches ? literally
#- (?P<param>.+) matches everything after ?
# Notice we also make \? and the second capture group optional so that when there are no query parameters in http_path, it returns NaN.
api_param_df = pd.DataFrame([[row[0][0], np.nan] if row[0][1] == '' else row[0] for row in http_path.values], columns=["api", "param"])
df = pd.concat([df['raw'], api_param_df], axis=1)
df
Any help will be appreciated.
The syntax is valid with pandas DataFrames, but that attribute doesn't exist for DataFrames created by PySpark. You can check out this link for the documentation.
Usually, the collect() method or the .rdd attribute would help you with these tasks.
You can use the following snippet to produce the desired result:
http_path = sdf.rdd.map(lambda row: row['http_path'].split('?'))
api_param_df = pd.DataFrame([[row[0], np.nan] if len(row) == 1 else row for row in http_path.collect()], columns=["api", "param"])
sdf = pd.concat([sdf.toPandas()['raw'], api_param_df], axis=1)
Note that I removed the comments to make it more readable and I've also substituted the regex with a simple split.
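If you would rather keep the whole transformation in Spark instead of collecting to pandas, a sketch along these lines should also work (it assumes sdf has raw and http_path columns, and relies on getItem returning null when the array has no second element):
from pyspark.sql import functions as F

parts = F.split(F.col("http_path"), r"\?")  # Java regex, so the '?' must be escaped
sdf_out = sdf.select(
    "raw",
    parts.getItem(0).alias("api"),
    parts.getItem(1).alias("param"),  # null when the path has no query string
)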

Error while validating User Input in Python

I am having a problem validating the user input (I am asking the user if they wish to continue with the program for calculating factorials). The code is as follows (user input validation is towards the end of the main function, and I have not included the factorial function):
def main():
    valid_inp = False
    usr_continue = True
    while usr_continue:
        while valid_inp == False:
            usr_inp = int(input('Please ENTER a number: '))
            if usr_inp < 0:
                print('ERROR, INVALID INPUT')
            else:
                valid_inp = True
                continue
        result = factorial(usr_inp)
        print(str(result) + '\n')
        con_inp = str(input('Would you like to continue ? '))
        if con_inp == 'Y' or con_inp == 'y':
            usr_continue
        elif con_inp == 'N' or con_inp == 'n':
            print('Goodbye...')
            break

main()
Make a function that only returns on valid input. Use an exception handler to deal with bad integer input, then validate that the integer is in the range you want:
from math import factorial

def get_nonnegative_integer(prompt):
    while True:
        try:
            val = int(input(prompt))  # bad input for int such as "abc" will raise ValueError
            if val >= 0:              # good input will be range-checked
                return val
            else:
                print('enter a number >= 0')
        except ValueError:
            print('invalid input for integer')

def main():
    while True:
        usr_inp = get_nonnegative_integer('Please enter a number: ')
        result = factorial(usr_inp)
        print(result)
        con_inp = input('Would you like to continue(Y/n)? ').upper()  # default Yes
        if con_inp.startswith('N'):
            print('Goodbye...')
            break

main()

Can I restrict objects in Python3 so that only attributes that I make a setter for are allowed?

I have something called a Node. Both Definition and Theorem are a type of node, but only Definitions should be allowed to have a plural attribute:
class Definition(Node):
    def __init__(self, dic):
        self.type = "definition"
        super(Definition, self).__init__(dic)
        self.plural = move_attribute(dic, {'plural', 'pl'}, strict=False)

    @property
    def plural(self):
        return self._plural

    @plural.setter
    def plural(self, new_plural):
        if new_plural is None:
            self._plural = None
        else:
            clean_plural = check_type_and_clean(new_plural, str)
            assert dunderscore_count(clean_plural) >= 2
            self._plural = clean_plural


class Theorem(Node):
    def __init__(self, dic):
        self.type = "theorem"
        super().__init__(dic)
        self.proofs = move_attribute(dic, {'proofs', 'proof'}, strict=False)

    # theorems CANNOT have plurals:
    # if 'plural' in self:
    #     raise KeyError('Theorems cannot have plurals.')
As you can see, Definitions have a plural.setter, but theorems do not. However, the code
theorem = Theorem(some input)
theorem.plural = "some plural"
runs just fine and raises no errors. But I want it to raise an error. As you can see, I tried to check for plurals manually at the bottom of my code shown, but this would only be a patch. I would like to block the setting of ANY attribute that is not expressly defined. What is the best practice for this sort of thing?
I am looking for an answer that satisfies the "chicken" requirement:
I do not think this solves my issue. In both of your solutions, I can
append the code t.chicken = 'hi'; print(t.chicken), and it prints hi
without error. I do not want users to be able to make up new
attributes like chicken.
The short answer is "Yes, you can."
The follow-up question is "Why?" One of the strengths of Python is the remarkable dynamism, and by restricting that ability you are actually making your class less useful (but see edit at bottom).
However, there are good reasons to be restrictive, and if you do choose to go down that route you will need to modify your __setattr__ method:
def __setattr__(self, name, value):
    if name not in ('my', 'attribute', 'names',):
        raise AttributeError('attribute %s not allowed' % name)
    else:
        super().__setattr__(name, value)
There is no need to mess with __getattr__ nor __getattribute__ since they will not return an attribute that doesn't exist.
Here is your code, slightly modified -- I added the __setattr__ method to Node, and added an _allowed_attributes to Definition and Theorem.
class Node:
    def __setattr__(self, name, value):
        if name not in self._allowed_attributes:
            raise AttributeError('attribute %s does not and cannot exist' % name)
        super().__setattr__(name, value)


class Definition(Node):
    _allowed_attributes = '_plural', 'type'

    def __init__(self, dic):
        self.type = "definition"
        super().__init__(dic)
        self.plural = move_attribute(dic, {'plural', 'pl'}, strict=False)

    @property
    def plural(self):
        return self._plural

    @plural.setter
    def plural(self, new_plural):
        if new_plural is None:
            self._plural = None
        else:
            clean_plural = check_type_and_clean(new_plural, str)
            assert dunderscore_count(clean_plural) >= 2
            self._plural = clean_plural


class Theorem(Node):
    _allowed_attributes = 'type', 'proofs'

    def __init__(self, dic):
        self.type = "theorem"
        super().__init__(dic)
        self.proofs = move_attribute(dic, {'proofs', 'proof'}, strict=False)
In use it looks like this:
>>> theorem = Theorem(...)
>>> theorem.plural = 3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in __setattr__
AttributeError: attribute plural does not and cannot exist
edit
Having thought about this some more, I think a good compromise for what you want, and to actually answer the part of your question about restricting allowed changes to setters only, would be to:
use a metaclass to inspect the class at creation time and dynamically build the _allowed_attributes tuple
modify the __setattr__ of Node to always allow modification/creation of attributes with at least one leading _
This gives you some protection against both misspellings and creation of attributes you don't want, while still allowing programmers to work around or enhance the classes for their own needs.
Okay, the new metaclass looks like:
class NodeMeta(type):
    def __new__(metacls, cls, bases, classdict):
        node_cls = super().__new__(metacls, cls, bases, classdict)
        allowed_attributes = []
        for base in (node_cls, ) + bases:
            for name, obj in base.__dict__.items():
                # a property counts as settable only if it actually defines a setter
                if isinstance(obj, property) and obj.fset is not None:
                    allowed_attributes.append(name)
        node_cls._allowed_attributes = tuple(allowed_attributes)
        return node_cls
The Node class has two adjustments: include the NodeMeta metaclass and adjust __setattr__ to only block non-underscore leading attributes:
class Node(metaclass=NodeMeta):
    def __init__(self, dic):
        self._dic = dic

    def __setattr__(self, name, value):
        if not name[0] == '_' and name not in self._allowed_attributes:
            raise AttributeError('attribute %s does not and cannot exist' % name)
        super().__setattr__(name, value)
Finally, the Node subclasses Theorem and Definition have the type attribute moved into the class namespace so there is no issue with setting them -- and as a side note, type is a bad name as it is also a built-in function -- maybe node_type instead?
class Definition(Node):
    type = "definition"
    ...

class Theorem(Node):
    type = "theorem"
    ...
As a final note: even this method is not immune to somebody actually adding or changing attributes, as object.__setattr__(theorum_instance, 'an_attr', 99) can still be used -- or (even simpler) the _allowed_attributes can be modified; however, if somebody is going to all that work they hopefully know what they are doing... and if not, they own all the pieces. ;)
You can check for the attribute every time you access it.
class Theorem(Node):
    ...
    def __getattribute__(self, name):
        if name not in ["allowed", "attribute", "names"]:
            raise MyException("attribute " + name + " not allowed")
        else:
            # delegate to the default machinery to avoid recursing into __getattribute__
            return super().__getattribute__(name)

    def __setattr__(self, name, value):
        if name not in ["allowed", "attribute", "names"]:
            raise MyException("attribute " + name + " not allowed")
        else:
            super().__setattr__(name, value)
You can build the allowed method list dynamically as a side effect of a decorator:
allowed_attrs = []

def allowed(f):
    allowed_attrs.append(f.__name__)
    return f
You would also need to add non-method attributes manually.
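A sketch of how that decorator could be wired into the __setattr__ check (the class and attribute names here are illustrative, reusing Node and Definition only for familiarity):
allowed_attrs = []

def allowed(f):
    allowed_attrs.append(f.__name__)
    return f

class Node:
    def __setattr__(self, name, value):
        if name not in allowed_attrs:
            raise AttributeError('attribute %s not allowed' % name)
        super().__setattr__(name, value)

class Definition(Node):
    @allowed
    def plural(self, value):  # decorating registers 'plural' as a settable name
        ...

allowed_attrs.append('type')  # non-method attributes still added by hand

d = Definition()
d.type = "definition"   # allowed
d.plural = "plurals"    # allowed
# d.chicken = "hi"      # would raise AttributeError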
If you really want to prevent all other dynamic attributes, I assume there is a well-defined time window in which you want to allow adding attributes.
Below I allow it until object initialisation is finished (you can control it with the allow_dynamic_attribute variable).
class A:
    def __init__(self):
        self.allow_dynamic_attribute = True
        self.abc = "hello"
        self._plural = None  # need to give default value
        # A.__setattr__ = types.MethodType(__setattr__, A)
        self.allow_dynamic_attribute = False

    def __setattr__(self, name, value):
        if hasattr(self, 'allow_dynamic_attribute'):
            if not self.allow_dynamic_attribute:
                if not hasattr(self, name):
                    raise Exception
        super().__setattr__(name, value)

    @property
    def plural(self):
        return self._plural

    @plural.setter
    def plural(self, new_plural):
        self._plural = new_plural

a = A()
print(a.abc)      # fine
a.plural = "yes"  # fine
print(a.plural)   # fine
a.dkk = "bed"     # raise exception
Or it can be more compact this way; I couldn't figure out how MethodType and super can get along together.
import types

def __setattr__(self, name, value):
    if not hasattr(self, name):
        raise Exception
    else:
        super().__setattr__(name, value)  # this doesn't work for reason I don't know

class A:
    def __init__(self):
        self.foo = "hello"
        # after this point, there's no more setattr for you
        A.__setattr__ = types.MethodType(__setattr__, A)

a = A()
print(a.foo)   # fine
a.bar = "bed"  # raise exception
Yes, you can create private members that cannot be modified from outside the class. The variable name should start with two underscores:
class Test(object):
    def __init__(self, t):
        self.__t = t

    def __str__(self):
        return str(self.__t)

t = Test(2)
print(t)  # prints 2
t.__t = 3
print(t)  # prints 2
That said, trying to access such a variable as we do in t.__t = 3 will not raise an exception.
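The reason is name mangling; continuing the snippet above:
# Inside the class body the name __t is mangled to _Test__t, so  t.__t = 3
# creates a brand-new attribute instead of touching the "private" one.
print(t._Test__t)  # 2 -- the attribute that __str__ actually reads
print(t.__t)       # 3 -- the unrelated attribute created by the assignment above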
A different approach which you can take to achieve the desired behavior is using functions. This approach will require "accessing attributes" using functional notation, but if that doesn't bother you, you can get exactly what you want. The following demo "hardcodes" the values, but obviously you can have Theorem() accept an argument and use it to set values for the attributes dynamically.
Demo:
# -*- coding: utf-8 -*-

def Theorem():
    def f(attrib):
        def proofs():
            return ''

        def plural():
            return '◊◊◊◊◊◊◊◊'

        if attrib == 'proofs':
            return proofs()
        elif attrib == 'plural':
            return plural()
        else:
            raise ValueError("Attribute [{}] doesn't exist".format(attrib))

    return f

t = Theorem()
print(t('proofs'))
print(t('plural'))
print(t('wait_for_error'))
OUTPUT

◊◊◊◊◊◊◊◊
Traceback (most recent call last):
File "/Users/alfasi/Desktop/1.py", line 40, in <module>
print(t('wait_for_error'))
File "/Users/alfasi/Desktop/1.py", line 32, in f
raise ValueError("Attribute [{}] doesn't exist".format(attrib))
ValueError: Attribute [wait_for_error] doesn't exist

How can I create the grammar definition to correctly parse an input

Lex file
import ply.lex as lex

# List of token names.
tokens = (
    "SYMBOL",
    "COUNT"
)

t_SYMBOL = (r"Cl|Ca|Co|Os|C|H|O")

def t_COUNT(t):
    r"\d+"
    t.value = int(t.value)
    return t

def t_error(t):
    raise TypeError("Unknown text '%s'" % (t.value,))

atomLexer = lex.lex()

data1 = "CH3Cl"
data = "OClOsOH3C"

def testItOut():
    # Give the lexer some input
    atomLexer.input(data1)
    # Tokenize
    tok = atomLexer.token()
    while tok:
        print(tok)
        tok = atomLexer.token()
Parse file
import ply.yacc as yacc

# Get the token map from the lexer.
from atomLex import tokens

def p_expression_symbol(p):
    'molecule : SYMBOL'
    p[0] = p[1]

def p_error(p):
    raise TypeError("unknown text at %r" % (p.value,))

atomParser = yacc.yacc()

def testItOut():
    # Give the parser some input
    s = input('Type a chemical name > ')
    # Parse it
    result = atomParser.parse(s)
    print('The atom is: ' + result)

while(True):
    testItOut()
Currently I would like to be able to enter CH3Cl, but within my parse file I am not entirely sure how to create the grammar definitions that I have been given:
chemical : chemical molecule
chemical : molecule
molecule : SYMBOL COUNT
molecule : SYMBOL
What would the grammar definitions for these be within the parse file? Thank you.
There is a nice set of documentation for PLY with examples, which can be used to answer this question: http://www.dabeaz.com/ply/ply.html
Section 6.2 is particularly helpful. I suggest you change this code:
def p_expression_symbol(p):
    'molecule : SYMBOL'
    p[0] = p[1]
To include the new rules. The name p_expression_symbol is also inappropriate. I guess you copied that from one of the examples. We now have:
def p_chemical_formula(p):
    '''chemical : chemical molecule
       chemical : molecule
       molecule : SYMBOL COUNT
       molecule : SYMBOL'''
    p[0] = p[1]
There are also other useful examples in the documentation that can be applied to your exercise.
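If the single combined function feels too terse, here is a sketch of the same grammar split into one function per production, with actions that actually build a result (the function names and the tuple representation are my own choices, not taken from the documentation):
def p_chemical_multi(p):
    'chemical : chemical molecule'
    p[0] = p[1] + [p[2]]

def p_chemical_single(p):
    'chemical : molecule'
    p[0] = [p[1]]

def p_molecule_count(p):
    'molecule : SYMBOL COUNT'
    p[0] = (p[1], p[2])

def p_molecule_symbol(p):
    'molecule : SYMBOL'
    p[0] = (p[1], 1)
Because the chemical rules come first, chemical becomes the start symbol, and parsing CH3Cl should then yield something like [('C', 1), ('H', 3), ('Cl', 1)], so the print('The atom is: ' + result) line in testItOut would need to be adapted to a list result.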