AST tree in ANTLR: can I add a state object to it? - antlr

Right now, we have code that translates the ANTLR AST into a second, custom tree. I would prefer to get rid of this second tree (it is hard enough to understand one tree when you join a project, let alone two; trees are complex as it is). One of the reasons the second tree exists is that information was looked up during the tree walk.
Instead, I am wondering whether I could somehow attach state holding that lookup information to CommonTree during the tree walk.
What I would like to see is something like this:
CommonTree tree = (CommonTree) parent.getChild(0);
tree.setState(myMapOrObjectState); // the method I wish existed
This would let me attach state to the nodes that need it before I pass the tree to plugins. Right now, I have this ugly code that recreates the whole tree and wraps the AST :(.
thanks,
Dean
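If changing `CommonTree` itself is not an option, one way to get the effect of the wished-for `tree.setState(...)` is a side table keyed by node identity, filled in during the tree walk and handed to the plugins together with the tree. The sketch below is self-contained and does not use the ANTLR runtime: the type parameter `N` stands in for `CommonTree`, and `NodeStateTable` is a hypothetical name. (If the project is on ANTLR 3, another route is to subclass `CommonTree` with a state field and have the parser build those nodes by overriding `CommonTreeAdaptor.create(Token)` and passing that adaptor to the generated parser's `setTreeAdaptor`; check this against the runtime version in use.)

```java
import java.util.IdentityHashMap;
import java.util.Map;

/**
 * Side table that attaches state to tree nodes without copying or wrapping the tree.
 * N stands in for the AST node type (e.g. CommonTree); S is whatever state is needed.
 */
final class NodeStateTable<N, S> {
    // Keys are compared by identity (==), never by equals().
    private final Map<N, S> states = new IdentityHashMap<>();

    /** Records state for a node, typically during the tree walk. */
    void setState(N node, S state) {
        states.put(node, state);
    }

    /** Returns the recorded state, or null if nothing was attached to this node. */
    S getState(N node) {
        return states.get(node);
    }

    /** True if some state (possibly null) was recorded for this exact node object. */
    boolean hasState(N node) {
        return states.containsKey(node);
    }
}
```

An `IdentityHashMap` is used so that a lookup depends on which node object it is, not on any `equals` a tree class might define; plugins would then call something like `table.getState(node)` instead of a method on the node.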

Related

Store tree in SQL with fast transitive mean

So, my problem is that I need to store a tree structure in an SQL database.
There are 2 types of nodes: NODES and LEAVES. Nodes store no data. Leaves store a single number.
Sometimes new nodes and leaves may be inserted (it is okay to insert them in the middle of some hierarchy); other times they may be deleted (both leaves and nodes); they may also be updated (for example, a node may switch to another parent, or a leaf may get new data).
My primary goal is to be able to tell, for each node, the mean value of its leaves, transitively included (i.e. leaves of child nodes, leaves of children of child nodes, etc.).
So far I have come up with multiple ideas, but I do not find them really efficient and maintainable in terms of inserting and deleting data:
Using PostgreSQL's ltree module. It allows quickly checking whether a leaf belongs to some node, including transitively. Using that, we can select the leaves that are descendants of the current node and then calculate a mean value. However, it seems to me there may be issues with updating. For example, when a node switches its parent, we will need to update every descendant node and leaf (including transitive ones), so that we do not match them when, for example, we search for leaves belonging to a node above the previous parent of the switching node. (Hope that doesn't sound too complicated.)
The second approach I've considered is storing, in each node and leaf, an array of all its ancestors (parent, grandparent, great-grandparent, etc.). This actually shares the same problem when switching parents. Moreover, I have doubts about the performance of searching in arrays, since the hierarchy may become really deep.
Basic approaches, like storing a direct parent link or an array of children, do not seem performant enough to me, since I would have to execute n-1 queries, where n is the depth of the tree.
So now I'm wondering whether there are any other ideas I have not thought of. I believe there must be a better approach!
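One idea not covered in the question is to store, per node, the sum and the count of all leaves below it rather than the mean: two means cannot be merged without their counts, but sums and counts can simply be added. Reading a node's transitive mean is then a single division, and each insert, delete, value change or reparent only touches the ancestors of the changed node (O(depth)), the old and the new ancestor chain in the case of a move. The sketch below is an in-memory model under that assumption (`AggNode` is a hypothetical name); in SQL the same idea would mean sum and count columns on each node row, updated for every ancestor (for example with a recursive CTE) in the same transaction as the change.

```java
/**
 * In-memory model of the question's tree: each node caches the sum and count of
 * every leaf below it, so the transitive mean is available without a tree walk.
 */
final class AggNode {
    private final boolean isLeaf;
    private double value;   // the leaf's own number (unused for inner nodes)
    private AggNode parent;
    private double sum;     // sum of all leaf values in this subtree
    private long count;     // number of leaves in this subtree

    private AggNode(boolean isLeaf, double value) {
        this.isLeaf = isLeaf;
        this.value = value;
        if (isLeaf) {
            this.sum = value;
            this.count = 1;
        }
    }

    static AggNode node() { return new AggNode(false, 0.0); }

    static AggNode leaf(double value) { return new AggNode(true, value); }

    /** Transitive mean of the leaves below this node; NaN if there are none. */
    double mean() { return count == 0 ? Double.NaN : sum / count; }

    long leafCount() { return count; }

    /** Applies a change to 'start' and every ancestor above it: O(depth). */
    private static void propagate(AggNode start, double dSum, long dCount) {
        for (AggNode a = start; a != null; a = a.parent) {
            a.sum += dSum;
            a.count += dCount;
        }
    }

    /** Updates a leaf's number and fixes the cached sums of its ancestors. */
    void setValue(double newValue) {
        if (!isLeaf) throw new IllegalStateException("only leaves carry data");
        double delta = newValue - value;
        value = newValue;
        propagate(this, delta, 0);
    }

    /**
     * Attaches this subtree under newParent (null detaches it, i.e. a delete).
     * The old ancestor chain loses the subtree's totals, the new one gains them.
     */
    void moveTo(AggNode newParent) {
        if (newParent != null) {
            if (newParent.isLeaf) throw new IllegalArgumentException("leaves have no children");
            for (AggNode a = newParent; a != null; a = a.parent) {
                if (a == this) throw new IllegalArgumentException("move would create a cycle");
            }
        }
        if (parent != null) propagate(parent, -sum, -count);
        parent = newParent;
        if (newParent != null) propagate(newParent, sum, count);
    }
}
```

Inserting a node in the middle of a hierarchy is then a `moveTo` of the new node followed by a `moveTo` for each child it adopts; the trade-off is that writes pay for the ancestor walk so that reads do not.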

General strategy for designing a flexible language application using ANTLR4

Requirement:
I am trying to develop a language application using ANTLR4. The language in question is not important. The important thing is that the grammar is very large (easily more than 2000 rules!). I want to perform a number of operations:
Extract a bunch of information, such as call graphs, variable names, constant expressions, etc.
Any number of transformations:
If a loop can be expanded, we go ahead and expand it.
If we can eliminate dead code, we might choose to do that.
We might choose to rename all variables to conform to some naming norms.
Each of these operations can be applied independently of the others. After applying these steps, I want to rewrite the input as close as possible to the original input.
For example, we might want to eliminate loops and rename the variables, and then output the result in the original language's format.
Questions:
I see a need to build a custom tree (read: AST) for this, so that I can modify the tree with each of the transformations. However, when I want to generate the output, I lose the nice abilities of the TokenStreamRewriter: I have to specify how to write each of the nodes of the tree, and I lose the original input formatting in the places where I didn't do any transformations. Does ANTLR4 provide a good way to get around this problem?
Is an AST the best way to go? Or should I build my own object representation? If so, how do I create it efficiently? Creating an object representation is a very big pain for such a vast language, but it may be better in the long run. Again, how do I get back the original formatting?
Is it possible to work just on the parse tree?
Are there similar language applications which do the same thing? If so what strategy do they use?
Any input is welcome.
Thanks in advance.
In general, what you want is called a Program Transformation System (PTS).
PTSs generally have parsers, build ASTs, can prettyprint the ASTs to recover compilable source text. More importantly, they have standard ways to navigate/inspect/modify the ASTs so that you can change them programmatically.
Many offer these capabilities in the form of pattern-matching code fragments written in the surface syntax of the language being transformed; this avoids forever having to know excruciatingly fine details about which nodes are in your AST and how they relate to their children. This is incredibly useful when you have big, complex grammars, as most of our modern (and legacy) languages seem to have.
More sophisticated PTSs (very few) provide additional facilities for teasing out the semantics of the source code. It is pretty hard to analyze/transform most code without knowing what scopes individual symbols belong to, or their type, and many other details such as data flow. Full disclosure: I build one of these.
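On the formatting question: the reason `TokenStreamRewriter` keeps the original layout is that edits are recorded against positions in the original token stream, and everything that is not edited is copied through as it was. The hypothetical `MiniRewriter` below is not ANTLR's implementation, just a self-contained sketch of that idea, with a naive rename pass that matches on spelling only (a real one would need the scope information mentioned in the answer). With ANTLR 4 itself, this only preserves whitespace and comments if the lexer sends them to a hidden channel (`-> channel(HIDDEN)`) instead of skipping them (`-> skip`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical mini rewriter: edits are recorded against token indexes of the
 * original stream; untouched tokens (whitespace and comments included) are
 * copied through verbatim, so the original formatting survives.
 */
final class MiniRewriter {
    private final List<String> tokens;                               // full original token list
    private final Map<Integer, String> replacements = new HashMap<>(); // index -> new text

    MiniRewriter(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Records a replacement; the original token list is never modified. */
    void replace(int tokenIndex, String text) {
        if (tokenIndex < 0 || tokenIndex >= tokens.size()) {
            throw new IndexOutOfBoundsException("token " + tokenIndex);
        }
        replacements.put(tokenIndex, text);
    }

    /** Naive rename pass: matches on spelling only, with no notion of scope. */
    void renameAll(String from, String to) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(from)) replace(i, to);
        }
    }

    /** Renders the original tokens, substituting any recorded replacements. */
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(replacements.getOrDefault(i, tokens.get(i)));
        }
        return out.toString();
    }
}
```

Because each pass only records edits, independent transformations (a rename, a constant change) can be applied one after another to the same rewriter without regenerating text for the rest of the program.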

Using file handling with an AVL tree

I've been given a college project that uses trees (AVL trees, to be specific) and file handling (which I'm not very conversant with).
But I'm not able to relate the two. I only know that files can be used to store data. In what way can trees and file handling be connected?
I know how to implement trees, but how do I store one as such in a file?
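If you know the inorder traversal together with either the preorder or the postorder traversal, you can reconstruct the tree (preorder and postorder alone do not, in general, determine a binary tree uniquely). Since an AVL tree is a BST, its inorder traversal is already known: it is simply the keys in sorted order. So store either the preorder or the postorder traversal in the file; from that, you can reconstruct the whole tree.
See tree traversal.
And also: how to construct a tree from its inorder and preorder traversals.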
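A minimal sketch of that idea, assuming integer keys with no duplicates: write the preorder traversal to a text file, one key per line, and rebuild from it. Because the BST ordering bounds where each subtree's keys can lie, the preorder listing alone is enough to recover the exact same shape, so an AVL tree comes back with the same balance. The `Node` class here is a plain BST node (the AVL rotations are omitted); if your nodes cache heights, recompute them bottom-up after loading. `BstFile`, `save` and `load` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class BstFile {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Preorder: root, then left subtree, then right subtree. */
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.key);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    /** Writes the keys in preorder, one per line. */
    static void save(Node root, Path file) throws IOException {
        List<Integer> keys = new ArrayList<>();
        preorder(root, keys);
        List<String> lines = new ArrayList<>();
        for (int k : keys) lines.add(Integer.toString(k));
        Files.write(file, lines);
    }

    /** Rebuilds the exact same BST shape from a preorder listing of distinct keys. */
    static Node load(Path file) throws IOException {
        List<Integer> keys = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (!line.isBlank()) keys.add(Integer.parseInt(line.trim()));
        }
        int[] pos = {0}; // shared cursor into the preorder list
        return build(keys, pos, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    /** Consumes the next keys lying strictly between lo and hi; the first is the subtree root. */
    private static Node build(List<Integer> keys, int[] pos, long lo, long hi) {
        if (pos[0] == keys.size()) return null;
        int k = keys.get(pos[0]);
        if (k <= lo || k >= hi) return null;
        pos[0]++;
        Node n = new Node(k);
        n.left = build(keys, pos, lo, k);
        n.right = build(keys, pos, k, hi);
        return n;
    }
}
```

Each key is read once, so rebuilding is linear in the number of keys, which is cheaper than re-inserting every key into a fresh AVL tree.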

A tree, where each node could have multiple parents

Here's a theoretical/pedantic question: imagine properties where each one could be owned by multiple others. Furthermore, from one iteration of ownership to the next, two neighboring owners could decide to partly combine ownership. For example:
territory 1, t=0: a,b,c,d
territory 2, t=0: e,f,g,h
territory 1, t=1: a,b,g,h
territory 2, t=1: g,h
That is to say, c and d no longer own property; and g and h became fat cats, so to speak.
I'm currently representing this data structure as a tree where each child could have multiple parents. My goal is to cram this into the Composite design pattern; but I'm having issues getting a conceptual footing on how the client might go back and update previous ownership without mucking up the whole structure.
My question is twofold.
Easy: What is a convenient name for this data structure such that I can google it myself?
Hard: What am I doing wrong? When I code I try to keep the mantra, "Keep it simple, Stupid," in my head, and I feel I am breaking this credo.
My question is twofold: Easy: What is a convenient name for this data structure such that I can google it myself?
What you have here is not a tree; it is a graph. A multimap will help you here.
But any adjacency list or adjacency matrix will give you a good start.
Here is a video on adjacency matrices and lists: YouTube on adjacency matrix and list
Hard: What am I doing wrong?
This is really hard to tell. Perhaps you did not model the relationship in a proper way. It is not that hard, given a good data structure to start with.
And, as you asked about design patterns (but you probably found out yourself), the Composite pattern will let you model such a setting with ease.
You have a many-to-many relationship between your owners and your territories (properties). I'm not sure what language you're working in, but this sort of thing can be easily represented and tracked in a relational database. (You'd probably want a table for each entity, and the relationship would probably require a third "junction" table. If it's necessary to be able to query "back in time", this could have some sort of "time index" column as well.)
If you are working in an object-oriented language, you might create two classes, Territory and Owner, where the Territory class has a property/member/field which is a collection of references/pointers to Owners and the Owner class has a similar collection of Territories. (One of these two collections may need to contain "weak" references depending on the language.)
In this case, some difficulty may arise if you want to be able to go back and look at the network state at some particular point earlier in time. (If this is what you need, say so and I (or someone else) can post a solution that works for that.)
I'm not sure what level of simplicity you are striving for, but in neither of these cases is updating the ownership relationships really that "hard". Maybe if you posted some code it might be easier to give you more concrete advice.
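A small sketch of the "junction table with a time index" idea from the answer above, using the territories from the question. Each ownership fact is a separate (territory, owner, time) row, so recording time step 1 does not touch the rows for time step 0, and both directions (owners of a territory, territories of an owner) are simple filters. `Ownership` and `OwnershipHistory` are hypothetical names, and a linear scan stands in for what a database index would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** One row of a hypothetical junction table: 'owner' holds 'territory' at time step 'time'. */
final class Ownership {
    final String territory;
    final String owner;
    final int time;

    Ownership(String territory, String owner, int time) {
        this.territory = territory;
        this.owner = owner;
        this.time = time;
    }
}

/** Many-to-many, time-indexed ownership; record() only appends rows. */
final class OwnershipHistory {
    private final List<Ownership> rows = new ArrayList<>();

    void record(String territory, int time, String... owners) {
        for (String owner : owners) rows.add(new Ownership(territory, owner, time));
    }

    /** Owners of a territory at one time step, sorted for stable output. */
    Set<String> ownersOf(String territory, int time) {
        Set<String> result = new TreeSet<>();
        for (Ownership r : rows) {
            if (r.territory.equals(territory) && r.time == time) result.add(r.owner);
        }
        return result;
    }

    /** The reverse direction: every territory an owner holds at one time step. */
    Set<String> territoriesOf(String owner, int time) {
        Set<String> result = new TreeSet<>();
        for (Ownership r : rows) {
            if (r.owner.equals(owner) && r.time == time) result.add(r.territory);
        }
        return result;
    }
}
```

With the question's data, `territoriesOf("g", 1)` yields both territories while `territoriesOf("c", 1)` is empty, and the time step 0 answers are unchanged.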
Hard to tell without more information regarding the business rules. Though I've plenty of experience designing graphs where each node could potentially have numerous parents.
A common structure is the Directed Acyclic Graph (DAG). The essential rule here is that no path through the graph can cycle back onto itself. For example, take the path "A/B/C/B": this would not be valid, as B appears twice.
Valid:- "A/B/C", "D/E/C", node C has two parents E and B.
Invalid:- "A/B/C/B", node B repeats in the same path causing a cycle.
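A sketch of the DAG rule described above, assuming string node names: each node keeps a set of children and a set of parents, and a new edge parent -> child is rejected if the parent is already reachable from the child, since adding it would close a cycle such as "A/B/C/B". `Dag` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Directed acyclic graph in which a node may have several parents. */
final class Dag {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    /** Adds parent -> child unless that would create a cycle; returns false if rejected. */
    boolean addEdge(String parent, String child) {
        if (parent.equals(child) || reachable(child, parent)) return false;
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
        return true;
    }

    /** Read-only view of a node's parents (empty if it has none). */
    Set<String> parentsOf(String node) {
        return Collections.unmodifiableSet(parents.getOrDefault(node, Set.of()));
    }

    /** Iterative depth-first search: is 'target' reachable from 'start' via child edges? */
    private boolean reachable(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (n.equals(target)) return true;
            if (seen.add(n)) {
                for (String c : children.getOrDefault(n, Set.of())) stack.push(c);
            }
        }
        return false;
    }
}
```

Building the valid example gives C the two parents B and E, while trying to add C -> B afterwards is refused.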

ANTLR: SQL engine - specifics beyond the AST

I have always wanted to get to grips a little with compilers and DSLs, and have been dabbling with a SQL-like engine specifically for log files.
I realize that there are many of these out there already, but please remember that part (well, most) of the exercise is an excuse to learn this stuff.
I feel that I've hit a mental block, though, and was hoping people could help me get past it.
Lots of the texts that I have read focus on grammar construction, which is fine - but I'm confused about the leap from having/constructing an AST to actually making it do something useful.
I have been reading the chapter in this book on interpreted languages - the one about the 'pie' language - as this seems to have the most meat about this specific part of building a language.
If I had to code something like:
select x,y from "c:\temp\foo.txt" where x=1 delimited by {Commas}
Assuming I have loaded the contents of the file into an ArrayList to make things easy, would I then build an external tree walker to traverse my AST and shuffle elements into intermediate storage (if they matched x=1), finally printing out the last buffer, which would be the result set?
Looking forward to any guidance on offer.
Cheers, Ace
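The approach described in the question can be sketched as an external walker that visits the query's AST, filters rows with the WHERE subtree and projects the selected columns into a result buffer. The sketch assumes the parser has already produced the AST for the example query, that the file has been read into a list of lines, and that the first line is a header naming the columns (an assumption, not something the question states); `SelectNode`, `EqNode`, `AndNode` and `Interpreter` are hypothetical names, and values are compared as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical AST for: select x,y from "<file>" where x=1 delimited by {Commas} */
final class SelectNode {
    final List<String> columns;  // projection list, e.g. [x, y]
    final ExprNode where;        // WHERE subtree, or null when there is no filter
    final String delimiter;      // literal separator, e.g. "," for {Commas}

    SelectNode(List<String> columns, ExprNode where, String delimiter) {
        this.columns = columns;
        this.where = where;
        this.delimiter = delimiter;
    }
}

interface ExprNode {}

/** column = literal (compared as strings) */
final class EqNode implements ExprNode {
    final String column;
    final String literal;
    EqNode(String column, String literal) { this.column = column; this.literal = literal; }
}

/** left AND right */
final class AndNode implements ExprNode {
    final ExprNode left;
    final ExprNode right;
    AndNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
}

/** External tree walker: the AST classes hold data only; all behaviour lives here. */
final class Interpreter {
    /** Assumes line 0 is a header and every line has as many fields as the header. */
    static List<List<String>> execute(SelectNode select, List<String> lines) {
        String sep = Pattern.quote(select.delimiter);
        List<String> header = Arrays.asList(lines.get(0).split(sep, -1));
        List<List<String>> resultSet = new ArrayList<>();   // the final buffer to print
        for (String line : lines.subList(1, lines.size())) {
            String[] row = line.split(sep, -1);
            if (select.where != null && !eval(select.where, header, row)) continue;
            List<String> projected = new ArrayList<>();
            for (String column : select.columns) projected.add(row[indexOf(header, column)]);
            resultSet.add(projected);
        }
        return resultSet;
    }

    /** Evaluates a WHERE subtree against one row, recursing into child nodes. */
    private static boolean eval(ExprNode e, List<String> header, String[] row) {
        if (e instanceof EqNode) {
            EqNode eq = (EqNode) e;
            return row[indexOf(header, eq.column)].equals(eq.literal);
        }
        if (e instanceof AndNode) {
            AndNode and = (AndNode) e;
            return eval(and.left, header, row) && eval(and.right, header, row);
        }
        throw new IllegalArgumentException("unsupported node: " + e);
    }

    private static int indexOf(List<String> header, String column) {
        int i = header.indexOf(column);
        if (i < 0) throw new IllegalArgumentException("unknown column: " + column);
        return i;
    }
}
```

Keeping evaluation in a separate walker rather than in the node classes means further passes (type checking, query planning, pretty-printing) can be added as new walkers over the same AST.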