Logic suggestions for interpreter AST building - OOP

I'm in the process of making an interpreter, and I started with the AST classes.
The code is object-oriented C++.
Each action is a "statement"; a block is itself a statement and contains a list of statements.
Blocks also define scopes, keeping track of which variables have been allocated within the scope and deleting them at the end of it.
Execution consists of the main block calling "execute" on every statement it contains, where each statement can be another block, a simple instruction, or a function call.
With this structure I could work out how to implement most constructs (while, if-else), but I don't see how I could implement a goto. Even though gotos are rarely used anyway, I'd need goto as a starting point to implement break and continue for loops.
Does anyone have a conceptual suggestion? (Conceptually speaking; no actual code needed.)
NOTE: the code is not executed line by line while being parsed; it is first parsed entirely into an AST and then executed.

If you're gotoing to line numbers, just add an attribute to your AST node class indicating which source line it was declared on. When executing a goto node, find the highest node in the tree with that line number and link it as your goto target.
For labels, you have a few options. You could turn the label into an AST node itself, or add an attribute to the node class that is only set if the previous statement is a label. When processing your goto, do the same thing you would do with line numbers: find the top-most node that has the label attribute set.
This assumes that your targets appear in the source before the goto. Otherwise, you'll have to do a fixup pass to resolve the jump targets, much as you would for function calls.
There are probably better ways; this is just what comes to mind.
Essentially, your goto is a jmp, and you need a way to resolve the target address. I'd parse the source, then do a second pass to fix up any gotos, giving each a pointer to the first AST node after its label.
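For what it's worth, here is a rough C++ sketch of what that fixup pass could look like. The class names (Statement, LabelStatement, GotoStatement) are made up for illustration, not taken from the asker's code:

#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node types; the real class hierarchy will differ.
struct Statement {
    virtual ~Statement() = default;
};

struct LabelStatement : Statement {
    std::string name;                 // label identifier from the source
};

struct GotoStatement : Statement {
    std::string target;               // label name written after "goto"
    Statement* resolved = nullptr;    // filled in by the fixup pass
};

// Second pass over a flat list of statements: record where each label is,
// then point every goto at the first statement after its label.
void fixupGotos(std::vector<std::unique_ptr<Statement>>& stmts) {
    std::map<std::string, Statement*> labelTargets;
    for (std::size_t i = 0; i < stmts.size(); ++i) {
        if (auto* lbl = dynamic_cast<LabelStatement*>(stmts[i].get())) {
            Statement* next = (i + 1 < stmts.size()) ? stmts[i + 1].get() : nullptr;
            labelTargets[lbl->name] = next;
        }
    }
    for (auto& s : stmts) {
        if (auto* g = dynamic_cast<GotoStatement*>(s.get())) {
            auto it = labelTargets.find(g->target);
            if (it != labelTargets.end())
                g->resolved = it->second;   // an unresolved label would be reported here
        }
    }
}

At execution time the goto node would simply continue from resolved, and break and continue could then be treated as gotos to compiler-generated labels placed just past the end (or at the start) of the enclosing loop.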


Invalid Iterator Fix

So, looking for advice on how to fix a situation or maybe a better way to program it.
I'm using iteration to build a complicated string from key:value pairs in an unordered_map. To make this work, I'm iterating through the map to find specific items, then sending a search term to an outside function to create the string. The outside function uses its own iterator to search the same unordered_map for the passed search term, creates the string, then erases the entries that it referenced. The problem, I believe, is that although the outside function's iterator is still valid (because it is the one that called erase), the iterators in the main function are now invalidated and throwing an out-of-range error. Is there a way to reset the iterators, or send them to the next valid key:value pair when they become invalidated, in order to avoid the error?
The code is a mess (mostly because I'm still discovering C++) and it might be possible to use recursion to accomplish this, but I wasn't able to get recursion to work correctly.
I can post the code, but without understanding the inputs and required outputs, it's likely not going to help explain anything, so for now, I'll just leave the question as-is: is there a way to "re-validate" invalidated iterators?
I was able to resolve the issue by redefining each of the iterators once control returned to them. For the last iterator (in the outside function), which deleted individual key:value pairs from the unordered_map, I used:
if (it != map.end()) it = map.erase(it);
This forces the iterator to move to the next valid key:value pair after the erasure.
That worked for the end of the line, but didn't work once control was returned to each of the previous iterators. In those cases, the iterators were invalidated when the outside function erased a key:value pair. So as control returned to an iterator, I included the following line before it looped back to increment:
if (it != map.end()) it = map.begin();
It seems to have resolved all of the issues, though I'm sure there's a better way to handle it.
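For reference, the standard pattern for erasing while iterating relies on the iterator that erase() returns. A small self-contained C++ sketch, with made-up data and a made-up "drop" condition:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // Illustrative contents; the real map holds the key:value pairs being searched.
    std::unordered_map<std::string, std::string> items = {
        {"a", "keep"}, {"b", "drop"}, {"c", "drop"}, {"d", "keep"}};

    // Erase matching entries without invalidating the loop's own iterator:
    // erase() returns an iterator to the element after the erased one.
    for (auto it = items.begin(); it != items.end(); ) {
        if (it->second == "drop")
            it = items.erase(it);
        else
            ++it;
    }

    for (const auto& kv : items)
        std::cout << kv.first << " -> " << kv.second << '\n';
}

Note that unordered_map::erase only invalidates iterators to the erased elements; iterators to other elements stay valid. So the out-of-range error suggests the outer iterators were pointing at exactly the entries the helper function removed, in which case the options are either not to hold such iterators across the call or to restart iteration, which is effectively what the begin() reset above does.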

Moving messages with VBA Outlook

What is the difference between these two lines?
Set MyMsg = MyMsg.Move(MyFolder2)
MyMsg.Move(MyFolder2)
The first one works just fine.
The second one usually gives an "Outlook is not responding" error.
The MailItem.Move method returns the MailItem that has been moved. Usually, properties return values and methods don't return anything. But for several methods, the designers decided it would be handy to have a return value, so they made them return a value (or object).
When you assign a method's result to a variable, the arguments must be in parentheses or you'll get a syntax error. If you call a method without assigning its result to a variable (because you don't care what the method returns, or it's one of the methods that doesn't return a value), then the arguments must not be in parentheses (kind of).
Parentheses, when used in places where the compiler does not require them, are the equivalent of saying "evaluate this before doing anything else". It's like how you use parentheses in order of operations to say "evaluate this addition before you do this multiplication, even though that's not the normal order".
The (kind of) remark above is because most of the time when you "incorrectly" put parentheses around something, it doesn't matter.
Application.CreateItem 0
and
Application.CreateItem (0)
are the same. The second one evaluates the argument before it passes it to CreateItem, but evaluating a single integer takes no time and has no ill effects. The parentheses aren't necessary because we're not assigning the results to a variable, but they're not really hurting anything either.
In your second example, you're telling the compiler to evaluate the folder, then send it to the Move method. I don't know what evaluating a folder means, but I gather it's not good. It probably does something like create an array of all the objects in that folder, or something equally intensive. When Outlook is not responding, it means you gave it such a big job that it hasn't checked back in with the operating system in a timely fashion.
So: use parentheses around the arguments when the call is on the right-hand side of an equals sign. Don't use them when it's not. There are a few exceptions to that rule, but you may never need to know them.
There is no difference between the two (you just ignore the function result) unless you actually use the MyMsg variable afterwards; after the message is moved, you cannot access it anymore.
Use the first version.

Variable Encapsulation in Case Statement

While modifying an existing program's CASE statement, I had to add a second block where some logic is repeated to set NetWeaver portal settings. This is done by setting values in a local variable, then assigning that variable to a CHANGING parameter. I copied over the code and did a Pretty Print, expecting the compiler to complain about the unknown variable. To my surprise, however, this code actually compiles just fine:
CASE i_actionid.
  WHEN 'DOMIGO'.
    DATA: ls_portal_actions TYPE powl_follow_up_sty.
    CLEAR ls_portal_actions.
    ls_portal_actions-bo_system = 'SAP_ECC_Common'.
    " [...]
    c_portal_actions = ls_portal_actions.
  WHEN 'EBELN'.
    ls_portal_actions-bo_system = 'SAP_ECC_Common'.
    " [...]
    C_PORTAL_ACTIONS = ls_portal_actions.
ENDCASE.
In every other programming language I have seen, the DATA: declaration in the first WHEN block should be encapsulated and available only inside that block. Does SAP ignore this encapsulation and make the variable available in the entire CASE statement? Is this documented anywhere?
Note that this code compiles just fine and double-clicking the local variable in the second switch takes me to the data declaration in the first. I have however not been able to test that this code executes properly as our testing environment is down.
In short, you cannot do this. You have the following scopes in an ABAP program within which to declare variables (from local to global):
Form routine: all variables between FORM and ENDFORM
Method: all variables between METHOD and ENDMETHOD
Class: all variables between CLASS and ENDCLASS, but only in the CLASS DEFINITION section
Function module: all variables between FUNCTION and ENDFUNCTION
Program/global: anything not in one of the above is global in the current program, including variables in PBO and PAI modules
Having the ability to define variables locally in a for loop or if is really useful but unfortunately not possible in ABAP. The closest you will come to publicly available documentation on this is on help.sap.com: Local Data in the Subroutine
As for the compile process: do not assume that ABAP will optimize out any variables you do not use (it won't); use the Code Inspector to find and remove them yourself. Since ABAP works the way it does, I personally define all my variables at the start of a modularization unit rather than inline with other code, and I have gone so far as to modify the pretty printer to move any inline definitions to the top of the current scope.
Your assumption that a CASE statement defines its own scope of variables in ABAP is simply wrong (and would be wrong for a number of other programming languages as well). It's a bad idea to litter your code with variable declarations because that makes it awfully hard to read and to maintain, but it is possible. The DATA statements - as well as many other declarative statements - are only evaluated at compile time and are completely ignored at runtime. You can find more information about the scopes in the online documentation.
Inline variable declarations are now possible with the newest version of SAP NetWeaver. Here is the link to the documentation: DATA - inline declaration. There are also some guidelines on good and bad usage of this new feature.
Here is a quote from this site:
A declaration expression with the declaration operator DATA declares a variable var used as an operand in the current writer position. The declared variable is visible statically in the program from DATA(var) and is valid in the current context. The declaration is made when the program is compiled, regardless of whether the statement is actually executed.
Personally, I have not had time to check it out yet, for lack of access to such a system.

Which one is better for me to use: "defer-panic-recover" or checking "if err != nil { //dosomething}" in golang?

I've made a large program that, among other things, opens and closes files and databases and performs writes and reads on them. Since there is no such thing as exception handling in Go, and since I didn't really know about the "defer" statement and the "recover()" function, I applied error checking after every file open, read/write, database entry, etc. E.g.
_, insert_err := stmt.Run(query)
if insert_err != nil {
    mylogs.Error(insert_err.Error())
    return db_updation_status
}
For this, I define db_updation_status at the beginning as "false" and do not make it "true" until everything in the program goes right.
I've done this in every function, after every operation which I believe could go wrong.
Do you think there's a better way to do this using defer-panic-recover? I read about these here http://golang.org/doc/articles/defer_panic_recover.html, but can't clearly get how to use them. Do these constructs offer something similar to exception-handling? Am I better off without these constructs?
I would really appreciate if someone could explain this to me in a simple language, and/or provide a use case for these constructs and compare them to the type of error handling I've used above.
It's handier to return error values; they can carry more information (an advantage to the client/user) than a two-valued bool.
As for panic/recover: there are scenarios where their use is completely sane. For example, in a hand-written recursive descent parser, it's quite a pain to "bubble" an error condition up through all the invocation levels. In that case, it's a welcome simplification if there's a deferred recover at the topmost (API) level and one can report any kind of error at any invocation level using, for example,
panic(fmt.Errorf("Cannot %v in %v", foo, bar))
If an operation can fail and returns an error, then checking that error immediately and handling it properly is idiomatic in Go; it is simple and makes it easy to verify that everything is handled properly.
Don't use defer/recover for such things: the needed cleanup actions are hard to code, especially if things get nested.
The usual way to report an error to a caller is to return an error as an extra return value. The canonical Read method is a well-known instance; it returns a byte count and an error.
But what if the error is unrecoverable? Sometimes the program simply cannot continue.
For this purpose, there is a built-in function panic that in effect creates a run-time error that will stop the program (but see the next section). The function takes a single argument of arbitrary type—often a string—to be printed as the program dies. It's also a way to indicate that something impossible has happened, such as exiting an infinite loop.
http://golang.org/doc/effective_go.html#errors

Write a compiler for a language that looks ahead and uses multiple files?

In my language I can use a class variable in my method even when the definition appears below the method. It can also call methods defined below my method, and so on. There are no 'headers'. Take this C# example.
class A
{
    public void callMethods() { print(); B b; b.notYetSeen(); }
    public void print() { Console.Write("v = {0}", v); }
    int v = 9;
}
class B
{
    public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}
How should I compile that? What I was thinking is:
pass 1: convert everything to an AST
pass 2: go through all classes and build a list of defined classes/variables/etc.
pass 3: go through the code, check for errors such as undefined variables or wrong usage, and create my output
But it seems that for this to work I have to do passes 1 and 2 for ALL files before doing pass 3. It also feels like a lot of work to do before I can report an error (other than the obvious ones that can be caught at parse time, such as forgetting to close a brace or writing 0xLETTERS instead of a hex value). My gut says there is some other way.
Note: I am using bison/flex to generate my compiler.
My understanding of languages that handle forward references is that they typically just use the first pass to build a list of valid names. Something along the lines of just putting an entry in a table (without filling out the definition) so you have something to point to later when you do your real pass to generate the definitions.
If you tried to actually build full definitions as you went, you would end up having to rescan repeatedly, each time saving any references to undefined things until the next pass. Even that would fail if there are circular references.
I would go through on pass one and collect all of your class/method/field names and types, ignoring the method bodies. Then in pass two check the method bodies only.
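As a rough illustration (in C++, with made-up structures rather than anything from the question's compiler), those two passes might look like this:

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified AST: a class has a name, some member
// names, and method bodies reduced to lists of the names they reference.
struct ClassDecl {
    std::string name;
    std::vector<std::string> members;                 // fields and methods
    std::vector<std::vector<std::string>> bodies;     // names used inside each method
};

// Pass one: record every declared class and member, ignoring method bodies.
std::unordered_set<std::string> collectDeclarations(const std::vector<ClassDecl>& classes) {
    std::unordered_set<std::string> symbols;
    for (const auto& c : classes) {
        symbols.insert(c.name);
        for (const auto& m : c.members)
            symbols.insert(c.name + "." + m);
    }
    return symbols;
}

// Pass two: with every declaration known, check the bodies. Forward references
// are no problem because B.notYetSeen was already recorded in pass one.
void checkBodies(const std::vector<ClassDecl>& classes,
                 const std::unordered_set<std::string>& symbols) {
    for (const auto& c : classes)
        for (const auto& body : c.bodies)
            for (const auto& ref : body)
                if (symbols.count(ref) == 0)
                    std::cerr << "undefined reference: " << ref << '\n';
}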
I don't know that there can be any other way than traversing all the files in the source.
I think you can get it down to two passes: on the first pass, build the AST and, whenever you find a variable name, add it to a list that contains that block's symbols (it would probably be useful to attach that list to the corresponding scope in the tree). Step two is to linearly traverse the tree and make sure that each symbol used references a symbol in that scope or a scope above it.
My description is oversimplified but the basic answer is -- lookahead requires at least two passes.
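A small C++ sketch of that scope-chain check, with hypothetical structures (the actual AST will of course look different):

#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical scope node attached to each block in the AST: it owns the
// symbols declared in that block and points at the enclosing scope.
struct Scope {
    const Scope* parent = nullptr;
    std::unordered_set<std::string> symbols;

    // A use is valid if the name is found in this scope or any scope above it.
    bool isVisible(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent)
            if (s->symbols.count(name))
                return true;
        return false;
    }
};

int main() {
    Scope global;                    // filled during pass one
    global.symbols.insert("A");

    Scope methodBody;                // scope for one method's block
    methodBody.parent = &global;
    methodBody.symbols.insert("b");

    // Pass two: check each use against the scope chain.
    std::cout << methodBody.isVisible("A") << '\n';   // 1: found in the global scope
    std::cout << methodBody.isVisible("x") << '\n';   // 0: never declared
}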
The usual approach is to save B as "unknown". It's probably some kind of type (because of the place where you encountered it). So you can just reserve the memory (a pointer) for it even though you have no idea what it really is.
For the method call, you can't do much. In a dynamic language, you'd just save the name of the method somewhere and check whether it exists at runtime. In a static language, you can save it under "unknown methods" somewhere in your compiler, along with the unknown type B. Since method calls eventually translate to a memory address, you can again reserve the memory.
Then, when you encounter B and the method, you can clear up your unknowns. Since you know a bit about them, you can say whether they behave like they should or if the first usage is now a syntax error.
So you don't have to read all the files twice, though doing so certainly makes things simpler.
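A tiny C++ sketch of that "save it as unknown, resolve it later" bookkeeping, with invented names purely for illustration:

#include <iostream>
#include <string>
#include <unordered_map>

// Placeholder entry for a name that has been used but not yet defined.
struct PendingSymbol {
    std::string firstUse;     // where it was first referenced, for error messages
    bool defined = false;     // flipped once the real declaration shows up
};

struct SymbolTable {
    std::unordered_map<std::string, PendingSymbol> symbols;

    // Called when a name is used before we know what it is.
    void use(const std::string& name, const std::string& location) {
        symbols.try_emplace(name, PendingSymbol{location, false});
    }

    // Called when the declaration is finally encountered.
    void define(const std::string& name) {
        symbols[name].defined = true;
    }

    // After the whole source has been read, anything still undefined is an error.
    void reportUnresolved() const {
        for (const auto& [name, entry] : symbols)
            if (!entry.defined)
                std::cerr << "error: '" << name << "' used in " << entry.firstUse
                          << " but never defined\n";
    }
};

int main() {
    SymbolTable table;
    table.use("B", "A.callMethods");   // b.notYetSeen() is seen before class B
    table.define("B");                 // class B encountered later, unknown cleared
    table.use("C", "A.callMethods");   // never defined anywhere
    table.reportUnresolved();          // reports only 'C'
}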
Alternatively, you can generate header-like summary files as you process the sources and save them somewhere you can find them again. That way, you can speed up compilation (since you won't have to reconsider unchanged files in the next compilation run).
Lastly, if you're writing a new language, you shouldn't use bison and flex anymore; there are much better tools by now. ANTLR, for example, can produce a parser that recovers after an error, so you can still parse the whole file. Or check this Wikipedia article for more options.