Following along http://blog.ptsecurity.com/2016/06/theory-and-practice-of-source-code.html#java--and-java8-grammars, I am trying to reduce left-recursion in my fairly complex grammar. From what I understand, the non-primitive form of recursion can lead to performance problems both in terms of memory and process time.
So I am trying to refactor these rules in my grammar to use only "primitive" recursion. That blog post is the only place I have seen the phrase "primitive" recursion used with regard to ANTLR, so I am just guessing at its meaning/intent. It seems to mean a rule that refers to itself on the left-hand side in at most a single alternative. Correct?
At the moment I have an expression rule like:
expression
: expression DOUBLE_PIPE expression # ConcatenationExpression
| expression PLUS expression # AdditionExpression
| expression MINUS expression # SubtractionExpression
| expression ASTERISK expression # MultiplicationExpression
| expression SLASH expression # DivisionExpression
| expression PERCENT expression # ModuloExpression
...
;
The ... includes quite a few sub-rules that also refer back to expression. But these are the only ones with direct recursion.
If I understand correctly, refactoring these to be "primitive" recursion would look something like:
expression
: binaryOpExpression # BinaryOpExpression
...
;
binaryOpExpression
: expression DOUBLE_PIPE expression # ConcatenationExpression
| expression PLUS expression # AdditionExpression
| expression MINUS expression # SubtractionExpression
| expression ASTERISK expression # MultiplicationExpression
| expression SLASH expression # DivisionExpression
| expression PERCENT expression # ModuloExpression
;
First, is that the correct refactoring?
Secondly, will that really help performance? At the end of the day it is still the same decisions, so I'm not really understanding how this helps performance (aside from maybe producing fewer ATNConfig objects).
Thanks
I have not heard "primitive recursion" before in this context; the author probably only means to name a specific form of recursion in ANTLR4.
In fact, there are three relevant forms of recursion in ANTLR4:
Direct left recursion: recursion from the first rule reference in a rule (to the same rule). For example: a: a b | c;
Indirect left recursion: left recursion not directly from the same rule. For example: a: b | c; b: c | d; c: a | e; (not allowed in ANTLR4)
Right recursion: any other recursion in a rule. For example: a: b a | c;. The name "right recursion" is, however, only correct for binary expressions, but it is often used to distinguish these cases from left recursion in general.
Having said that, it becomes clear that your rewrite is wrong, as it would create indirect left recursion, which ANTLR4 does not support. Direct left recursion is usually not a problem (from a memory or performance standpoint) because ANTLR4 converts it to non-recursive ATN rule graphs.
What can become a problem is right recursion, because it is implemented by code recursion (recursive function calls in the runtime), which may quickly exhaust the CPU stack. I have seen cases with big expressions which could not be parsed in a separate thread, because I couldn't set the thread stack size to a larger value (the main thread stack size usually can be adjusted via linker settings).
The only solution for the latter case, which I have found useful, is to lower the number of parser rules in the grammar that call each other. Of course it's a matter of structure, readability etc. to put certain expression elements in different rules (for example andExpression, orExpression, bitExpression etc.), but that may lead to pretty deep invocation stacks, which may exhaust the CPU stack and/or require a lot of time to process them.
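To make the stack argument concrete, here is a minimal Python sketch. It is not ANTLR's generated code, and both function names are made up for illustration. A left-associative operator chain can be consumed by a loop, which is roughly the shape a rewritten direct-left-recursive rule takes, while a right-recursive rule costs one call frame per operand:

```python
def parse_left_iterative(tokens):
    """Evaluate NUM (('+' | '-') NUM)* with a loop: constant call depth."""
    value = int(tokens[0])
    pos = 1
    while pos < len(tokens):
        op, rhs = tokens[pos], int(tokens[pos + 1])
        value = value + rhs if op == "+" else value - rhs
        pos += 2
    return value


def parse_right_recursive(tokens, pos=0):
    """Evaluate NUM ('+' expr)? by recursion: one call frame per operand."""
    value = int(tokens[pos])
    if pos + 1 < len(tokens):
        return value + parse_right_recursive(tokens, pos + 2)
    return value


# Left associativity falls out of the loop: (10 - 4) - 3
small = parse_left_iterative("10 - 4 - 3".split())

# A long chain is no problem for the loop...
long_chain = ["1"] + ["+", "1"] * 50_000
total = parse_left_iterative(long_chain)

# ...but the recursive version runs out of call depth
# (assuming CPython's default recursion limit).
try:
    parse_right_recursive(long_chain)
    overflowed = False
except RecursionError:
    overflowed = True

print(small, total, overflowed)
```

Python reports the problem as a RecursionError; a native runtime would instead overflow the thread's stack, which is the failure described above.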
I've seen := used in several code samples, but never with an accompanying explanation. It's not exactly possible to google its use without knowing the proper name for it.
What does it do?
http://en.wikipedia.org/wiki/Equals_sign#In_computer_programming
In computer programming languages, the equals sign typically denotes either a boolean operator to test equality of values (e.g. as in Pascal or Eiffel), which is consistent with the symbol's usage in mathematics, or an assignment operator (e.g. as in C-like languages). Languages making the former choice often use a colon-equals (:=) or ≔ to denote their assignment operator. Languages making the latter choice often use a double equals sign (==) to denote their boolean equality operator.
Note: I found this by searching for colon equals operator
It's the assignment operator in Pascal and is often used in proofs and pseudo-code. It's the same thing as = in C-dialect languages.
Historically, computer science papers used = for equality comparisons and ← for assignments. Pascal used := to stand in for the hard-to-type left arrow. C went a different direction and instead decided on the = and == operators.
In the statically typed language Go, := is declaration and initialization in one step. It is done to allow for interpreted-like creation of variables in a compiled language.
// Creates and assigns
answer := 42
// Creates and assigns
var answer = 42
Another interpretation from outside the world of programming languages comes from Wolfram Mathworld, et al:
If A and B are equal by definition (i.e., A is defined as B), then this is written symbolically as A=B, A:=B, or sometimes A≜B.
■ http://mathworld.wolfram.com/Defined.html
■ https://math.stackexchange.com/questions/182101/appropriate-notation-equiv-versus
Some languages use := as the assignment operator.
In a lot of CS books, it's used as the assignment operator, to differentiate from the equality operator =. In a lot of high level languages, though, assignment is = and equality is ==.
This is old (Pascal) syntax for the assignment operator. It would be used like so:
a := 45;
It may be in other languages as well, probably in a similar use.
A number of programming languages, most notably Pascal and Ada, use a colon immediately followed by an equals sign (:=) as the assignment operator, to distinguish it from a single equals which is an equality test (C instead used a single equals as assignment, and a double equals as the equality test).
Reference: Colon (punctuation).
In Python:
Named Expressions (NAME := expr) were introduced in Python 3.8. They allow the assignment of variables within an expression that is currently being evaluated. The colon equals operator := is sometimes called the walrus operator because, well, it looks like a walrus emoticon.
For example:
if any((comment := line).startswith('#') for line in lines):
    print(f"First comment: {comment}")
else:
    print("There are no comments")
This would be invalid if you swapped the := for =. Note the additional parentheses surrounding the named expression. Another example:
# Compute partial sums in a list comprehension
total = 0
values = [1, 2, 3, 4, 5]
partial_sums = [total := total + v for v in values]
# [1, 3, 6, 10, 15]
print(f"Total: {total}") # Total: 15
Note that the variable total is not local to the comprehension (so too is comment from the first example). The NAME in a named expression cannot be a local variable within an expression, so, for example, [i := 0 for i, j in stuff] would be invalid, because i is local to the list comprehension.
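Two more patterns where named expressions tend to be handy are sketched below: reading until a call returns something falsy, and binding a regex match in the same condition that tests it. The sample data is made up for illustration, and Python 3.8 or later is assumed.

```python
import io
import re

# Stand-in for a file or socket; the contents are made up.
stream = io.StringIO("alpha\nbeta\n\ngamma\n")

non_empty = []
# readline() returns "" at end of input, which ends the loop.
while (line := stream.readline()):
    # A blank line is "\n" (truthy), so it stays in the loop but is skipped here.
    if (stripped := line.strip()):
        non_empty.append(stripped)

# Bind the match object and test it in one step.
if (match := re.search(r"\d+", "order 66 confirmed")):
    number = int(match.group())
else:
    number = None

print(non_empty, number)
```

Without := each of these would need a separate assignment statement before the condition (and, for the loop, a repeated one at the end of the body).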
I've taken examples from the PEP 572 document - it's a good read! I for one am looking forward to using Named Expressions, once my company upgrades from Python 3.6. Hope this was helpful!
Sources: Towards Data Science Article and PEP 572.
It's like an arrow without using a less-than symbol (<=), so, as everybody has already said, it is the "assignment" operator. It makes clear what is being set to what, as opposed to the logical operator of equivalence.
In mathematics it is like equals, but A := B means A is defined as B. A triple-bar equals (≡) can be used to say that two things are equivalent or equal by definition, but it does not always mean the same thing.
Anyway, I point to these other references, which were probably in the minds of those who invented it, but it's really just that plain equals and less-than-or-equals were taken (or potentially easily confused with =<), so something new was needed to denote assignment, and := made the most sense.
Historical references: I first saw this in Smalltalk, the original object language, of which SJ of Apple copied only the windowing part, and which BG of Microsoft watered down further (single-threaded). Eventually SJ, at NeXT, took the second, more important lesson from Xerox PARC, which became Objective-C.
Well, anyway, they just took the colon-equals assignment operator from ALGOL 58 (1958), which was later popularized by Pascal.
https://en.wikipedia.org/wiki/PARC_(company)
https://en.wikipedia.org/wiki/Assignment_(computer_science)
Assignments typically allow a variable to hold different values at
different times during its life-span and scope. However, some
languages (primarily strictly functional) do not allow that kind of
"destructive" reassignment, as it might imply changes of non-local
state.
The purpose is to enforce referential transparency, i.e. functions
that do not depend on the state of some variable(s), but produce the
same results for a given set of parametric inputs at any point in
time.
https://en.wikipedia.org/wiki/Referential_transparency
For VB.net,
a constructor (for this case, Me = this in Java):
Public Sub New(A As Integer, B As Integer, C As Integer)
    Me.A = A
    Me.B = B
    Me.C = C
End Sub
when you create that object:
New ABC(C:=1, A:=2, B:=3)
Then, regardless of the order of the parameters, that ABC object has A=2, B=3, C=1
So yes, named arguments are good practice because they help others read your code.
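For comparison, Python has the same feature under the name keyword arguments, spelled with = rather than :=. A minimal sketch, with a hypothetical class mirroring the one above:

```python
class ABC:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


# Arguments are matched by name, so their order at the call site is irrelevant.
obj = ABC(c=1, a=2, b=3)
print(obj.a, obj.b, obj.c)  # 2 3 1
```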
Colon-equals was used in Algol and its descendants such as Pascal and Ada because it is as close as ASCII gets to a left-arrow symbol.
The strange convention of using equals for assignment and double-equals for comparison was popularized by the C language.
In Prolog, there is no distinction between assignment and the equality test.
I am writing a parser for SPARQL (Semantic Web query language) using DCG. I want to replace SPARQL variable names with Prolog variables. How would I go about this?
I can generate new variables using length([NewVar], 1), but I cannot keep track of existing assignments by simply using a list of name-variable pairs. A member/2 operation on the list will return a new variable, not the one stored in the list.
Is there an easy way for naming variables in Prolog, e.g., '$VAR(Name)'?
member/2 will do what you want. Here is an example:
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.25)
Copyright (c) 1990-2016 University of Amsterdam, VU Amsterdam
?- L=[a-X,b-Y,c-Z], member(b-V,L).
L = [a-X, b-V, c-Z],
Y = V
But you might get problems if you interleave write/1 with member/2,
since a variable might change its identity, i.e. the write symbol in the following circumstances:
because of garbage collection, if a variable is written as _G<memloc>
because of aliasing, in the above example the memloc of V might be shown
instead of the memloc of Y
Same problem with (@<)/2. One way out is to use attribute variables, which at least puts an end to aliasing, since attribute variables are usually unified last,
so in the above example if Y is an attribute variable and V is an ordinary variable you would never see the memloc of V after
calling member/2.
Further you can also mitigate the problem by using ISO core standard variable_names/1 write option, to write out a variablified term. The variable_names/1 write option is immune to garbage collection or aliasing.
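The bookkeeping asked about in the question (one variable per SPARQL name, reused on every later occurrence) is not specific to Prolog. As a rough analogue only, here is a Python sketch in which object identity plays the role of variable identity. Var and variablify are hypothetical names, no SPARQL library is involved, and the token list is a simplified stand-in for real parser output.

```python
class Var:
    """Placeholder for a query variable; identity, not name, tells variables apart."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Var({self.name!r})"


def variablify(tokens, bindings=None):
    """Replace every '?name' token with the one Var object recorded for that name."""
    bindings = {} if bindings is None else bindings
    result = []
    for token in tokens:
        if token.startswith("?"):
            name = token[1:]
            if name not in bindings:
                bindings[name] = Var(name)  # first occurrence: create and remember
            result.append(bindings[name])   # later occurrences: reuse the same object
        else:
            result.append(token)
    return result, bindings


tokens = "?s foaf:knows ?o . ?o foaf:name ?n".split()
pattern, bindings = variablify(tokens)
print(pattern)
print(pattern[2] is pattern[4])  # both occurrences of ?o share one object
```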
Bye
So, I'm writing a language using flex/bison and I'm having difficulty with implementing identifiers, specifically when it comes to knowing when you're looking at an assignment or a reference,
for example:
1) A = 1+2
2) B + C (where B and C have already been assigned values)
Example one I can work out by returning an ID token from flex to bison, and just following a grammar that recognizes that 1+2 is an integer expression, putting A into the symbol table, and setting its value.
Example two is more difficult for me because, after going through my lexer, what's being returned to bison is "ID PLUS ID". I have a grammar that recognizes arithmetic expressions for numerical values, like INT PLUS INT (which would produce an INT), or DOUBLE MINUS INT (which would produce a DOUBLE). If I have "ID PLUS ID", how do I know what type the return value is?
Here's the best idea that I've come up with so far: When tokenizing, every time an ID comes up, I search for its value and type in the symbol table and switch out the ID token with its respective information; for example: while tokenizing, I come across B, which has a regex that matches it as being an ID. I look in my symbol table and see that it has a value of 51.2 and is a DOUBLE. So instead of returning ID, with a value of B to bison, I'm returning DOUBLE with a value of 51.2
I have two different solutions that contradict each other. Here's why: if I want to assign a value to an ID, I would say to my compiler A = 5. In this situation, if I'm using my previously described solution, What I'm going to get after everything is tokenized might be, INT ASGN INT, or STRING ASGN INT, etc... So, in this case, I would use the former solution, as opposed to the latter.
My question would be: what kind of logical device do I use to help my compiler know which solution to use?
NOTE: I didn't think it necessary to post source code to describe my conundrum, but I will if anyone could use it effectively as a reference to help me understand their input on this topic.
Thank you.
The usual way is to have a yacc/bison rule like:
expr: ID { $$ = lookupId($1); }
where the lookupId function looks up a symbol in the symbol table and returns its type and value (or type and storage location if you're writing a compiler rather than a strict interpreter). Then, your other expr rules don't need to care whether their operands come from constants or symbols or other expressions:
expr: expr '+' expr { $$ = DoAddition($1, $3); }
The function DoAddition takes the types and values (or locations) for its two operands and either adds them, producing a result, or produces code to do the addition at run time.
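What the two helper functions might do in a simple interpreter can be sketched in Python. This is only an illustration: the names mirror lookupId and DoAddition from the rules above, but the (type, value) tuple representation, the promotion rule, and the assign helper are assumptions, not part of any yacc/bison API.

```python
# Symbol table: identifier -> (type, value). The contents are example data.
symbols = {"B": ("DOUBLE", 51.2), "C": ("INT", 3)}


def lookup_id(name):
    """Return the (type, value) pair stored for an identifier."""
    try:
        return symbols[name]
    except KeyError:
        raise NameError(f"undefined identifier: {name}") from None


def do_addition(left, right):
    """Add two typed operands; the result is DOUBLE if either operand is."""
    ltype, lval = left
    rtype, rval = right
    result_type = "DOUBLE" if "DOUBLE" in (ltype, rtype) else "INT"
    return (result_type, lval + rval)


def assign(name, typed_value):
    """Handle 'ID = expr': store the already-evaluated right-hand side."""
    symbols[name] = typed_value
    return typed_value


# B + C: the operands come from the symbol table, not from the lexer.
print(do_addition(lookup_id("B"), lookup_id("C")))  # ('DOUBLE', 54.2)

# A = 1 + 2: the left-hand ID is only a name, so it is never looked up.
print(assign("A", do_addition(("INT", 1), ("INT", 2))))  # ('INT', 3)
```

The lexer always returns ID. Whether the identifier is read or written is decided by the grammar rule it appears in, so the two "solutions" in the question never have to compete.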
If possible, redesign your language so that the situation is unambiguous. This is why even JavaScript has var.
Otherwise you're going to need to disambiguate via semantic rules, for example that the first use of an identifier is its declaration. I don't see what the problem is with your case (2): just generate the appropriate code. If B and C haven't been used yet, a value-reading use like this should be illegal, but that involves you in control flow analysis if taken to the Nth degree of accuracy, so you might prefer to assume initial values of zero.
In any case you can see that it's fundamentally a language design problem rather than a coding problem.
Task:
I am planning to parse a formula string in NSPredicate and to replace variables in the string by their numeric values. The variables are names for properties of existing object instances in my data model, for instance I have a class "company" with an instance "Apple Corp."
Set-up:
My formula would look like this: "Profitability_2011_in% = [Profit 2011] / [Revenue 2011]"
The instance "Apple Corp" would have the following properties:
Revenue 2009 = 10, Revenue 2010 = 20, Revenue 2011 = 30,
Profit 2009 = 5, Profit 2010 = 10, Profit 2011 = 20.
Hence, the formula would yield 20 / 30 = 67%.
Variables are usually two-dimensional, for instance defined by "profit" as the financial statement item and "year" (for instance 2011).
The variables are enclosed in [ ] and the dimensions are separated by " " (whitespace).
How I would do it
My implementation would begin with NSRegularExpression's matchesInString:options:range: to get an array of all variables in the formula (Profit 2011, Revenue 2011) and then construct an NSDictionary (key = variable name) out of this array by querying my data model.
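The extraction step described here can be sketched in Python, with the re module standing in for NSRegularExpression. The company dictionary is a hypothetical stand-in for the data model, keyed by (item, year) to reflect the two dimensions, and variable_values is a made-up helper name.

```python
import re

formula = "Profitability_2011_in% = [Profit 2011] / [Revenue 2011]"

# Hypothetical stand-in for the data model of one company instance.
company = {("Profit", 2011): 20, ("Revenue", 2011): 30}


def variable_values(formula, model):
    """Map each bracketed variable name in the formula to its value in the model."""
    values = {}
    for name in re.findall(r"\[([^\]]+)\]", formula):
        # The last whitespace-separated part is the year, the rest is the item.
        item, year = name.rsplit(" ", 1)
        values[name] = model[(item, int(year))]
    return values


values = variable_values(formula, company)
print(values)  # {'Profit 2011': 20, 'Revenue 2011': 30}
```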
What do you think?
Is there a better way to do it in your view?
In the formula, how would you replace the variables by their values?
How would you parse the formula?
Thank you!!
Yes, you can do this. This falls under the category of "Using NSPredicate for things for which it was not intended", but will work just fine.
You'll need to replace your variables with a single word that start with a $, since that's how NSPredicate denotes variables:
NSPredicate *p = [NSPredicate predicateWithFormat:@"foo = $bar"];
However you want to do that, great. NSRegularExpression is a fine way to do that.
Once you do that, you'll have something like this:
@"$profitability2011 = $profit2011 / $revenue2011"
You can then pop this through +predicateWithFormat:. You'll get back an NSComparisonPredicate. The -leftExpression will be of type NSVariableExpressionType, and the -rightExpression will be of type NSFunctionExpressionType.
This is where things start to get hairy. If you were to -evaluateWithObject:substitutionVariables:, you'd simply get back a YES or NO value, since a predicate is simply a statement that evaluates to true or false. I haven't explored how you could just evaluate one side (in this case, the -rightExpression), but it's possible that -[NSExpression expressionValueWithObject:context:] might help you. I don't know, because I'm not sure what that "context" parameter is for. It doesn't seem like it's a substitution dictionary, but I could be wrong.
So if that doesn't work (and I have no idea if it will or not), you could use my parser: DDMathParser. It has a parser, similar to NSPredicate's parser, but is specifically tuned for parsing and evaluating mathematical expressions. In your case, you'd do:
#import "DDMathParser.h"
NSString *s = @"$profit2011 / $revenue2011";
NSDictionary *values = ...; // the values of the variables
NSNumber *profitability = [s numberByEvaluatingStringWithSubstitutions:values];
The documentation for DDMathParser is quite extensive, and it can do quite a bit.
Edit: Dynamic variable resolution
I just pushed a change that allows DDMathParser to resolve functions dynamically. It's important to understand that a function is different from a variable. A function is evaluated, whereas a variable is simply substituted. However, the change only does dynamic resolution for functions, not variables. That's ok, because DDMathParser has this neat thing called argumentless functions.
An argumentless function is a function name that's not followed by an opening parenthesis. For convenience, it's inserted for you. This means that @"pi" is correctly parsed as @"pi()" (since the constant for π is implemented as a function).
In your case, you can do this:
Instead of regexing your string to make variables, simply use the names of the terms:
@"profit_2011 / revenue_2011";
This will be parsed as if you had entered:
@"divide(profit_2011(), revenue_2011())"
You can then set up your DDMathEvaluator object with a function resolver. There are two examples of this in the DDMathParser repository:
This example shows how to use the resolver function to look up the "missing" function in a substitution dictionary (this would be most like what you want)
This example shows how to interpret any missing function as if it evaluated to 42.
Once you implement a resolver function, you can forego having to package all your variables up into a dictionary.
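The resolver idea itself is independent of DDMathParser. As a language-neutral illustration only (this is not DDMathParser's API), here is a Python sketch that parses a formula with the standard ast module and hands every bare name to a caller-supplied resolver function. The evaluate function and the figures dictionary are hypothetical.

```python
import ast
import operator

# Only the four basic arithmetic operators are supported in this sketch.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, resolver):
    """Evaluate an arithmetic formula, asking `resolver` for each bare name."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return resolver(node.id)  # resolved on demand, no dictionary required
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))


# Resolver backed by a dictionary of known figures (example data).
figures = {"profit_2011": 20, "revenue_2011": 30}
ratio = evaluate("profit_2011 / revenue_2011", figures.__getitem__)

# Resolver that treats every unknown name as 42.
fallback = evaluate("x + 1", lambda name: 42)

print(ratio, fallback)
```

Because the resolver is just a function, it could equally query a data model at evaluation time instead of reading from a prepared dictionary.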
Is there a better way to do it in your view?
Yes - using Flex & Bison.
Possibly you could achieve what you want with a regex - but for many expression grammars, a regex isn't powerful enough to parse the grammar. Also, regex things like this get large, unreadable, and unwieldy.
You can use Flex (a lexer) and Bison (a parser) to create a grammar definition for your expressions, and generate C code (which, as I'm sure you know, works perfectly with Objective-C since Objective-C is C) which you can use to parse your expressions.
In the formula, how would you replace the variables by their values?
As you parse through it with Bison you should have a hash table with variable names and their current values. When you generate the syntax tree, add references to the variables to your syntax tree nodes.
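To illustrate the shape of that design, here is a small Python sketch, not Flex/Bison output. The node classes are hypothetical, and the tree is built by hand where a generated parser would build it. Variable nodes store only a name, and the value is fetched from the hash table at evaluation time.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Var:
    name: str  # a reference into the variable table, not a value


@dataclass
class Div:
    left: object
    right: object


def evaluate(node, table):
    """Walk the syntax tree, resolving Var nodes against the table."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return table[node.name]
    if isinstance(node, Div):
        return evaluate(node.left, table) / evaluate(node.right, table)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Tree for "[Profit 2011] / [Revenue 2011]".
tree = Div(Var("Profit 2011"), Var("Revenue 2011"))
table = {"Profit 2011": 20, "Revenue 2011": 30}

first = evaluate(tree, table)

# Changing the table changes the result without re-parsing the formula.
table["Profit 2011"] = 15
second = evaluate(tree, table)

print(first, second)
```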
How would you parse the formula?
Again - Flex & Bison are specifically meant to do this kind of thing - and they excel at it.