how to insert tokens or contexts in my listener class in antlr4 - antlr

I have an antlr4 grammar file that parses a BASIC language. Is there a way to insert more code in my extended baseListener class?
For example, if I am parsing this code:
10 print "hello world"
%include "moreCode.bas"
print "after include"
moreCode.bas could be something like:
for t% = 1% to 10%
print t%
next t%
I need to detect the include command and include the contents into the file being walked and continue walking it as a whole.
So I was thinking that in my enterIncludeCommand method in my listener class I would start a new parser for moreCode.bas and then somehow insert the tokens/contexts into my current one.
What is the correct way of doing this?

There is no one right pattern. That said, one effective way is to have your main initiate the parser by always calling through a constructor that takes a state object and a source path as parameters
public class BasicParser {
public static void main(String[] args) {
...
StateModel state = new StateModel()
RecurseParser rp = new RecurseParser(state, pathname);
...
}
}
public class RecurseParser {
public RecurseParser(StateModel state, String pathname) {
this.state = state;
this.pathname = pathname; // source text to parse
...
}
public StateModel getResults() {
return this.state
}
In your enterIncludeStatement method, create and run a new RecurseParser instance directly. In your exitIncludeStatement, retrieve the new current state and, as appropriate, validate/check for errors.
Since the state model encapsulates your symbol table, etc., you maintain continuity as you walk through the forest -- recursion is effectively free.
Should mention that, relative to the symbol table, treat executing an include essentially the same as calling a subroutine.
Related: Symbol Table

I have two solutions for this and I took the last one I am going to mention. Also GRosenBerg has a great idea too.
1) use the TokenStreamRewriter and in the enterIncludeStatement use the rewriter insertBefore, insertAfter and/or replace methods. At the end of the walk of that particular listener object, call the rewriter getText() and that will give you the combined string. You will have to reparse that text to go the next listener pass.
2) In the enterIncludeStatement method of the listener class, get the include file name, run the lexer/parser on it and then take the first StatementContext(in my case) and inject it into the current tree using the IncludeContext.AddChile(myStatement). Looping for each statement line in that include file. The tricky part is to include the statements in the correct place but you will end up with a complete tree that you can walk with the next listener class phase.
I used option 2 and its working for me so far however I'm not sure using the addChild method is the best way since I am really inserting siblings not children. Given this siblings/childrens issue then maybe grosenberg's recursive idea would be the best.

Related

Structural Search to match method call with generic parameter

Let's say that I have a class class Foo : Base and I want to perform a certain method call with signature
public void someStuf(Iterable<? extends Base> param)
For the search template I just take as starting point the pre-existing one
$Instance$.$MethodCall$($Parameter$)
Is it possible to match a parameter of any kind Iterable of a specific Base subclass (Foo in this example)??
List<Foo> fooList = new ArrayList<>();
fooList.add(new Foo("A"));
List<Bar> barList = new ArrayList<>();
barList.add(new Bar(1));
someStuff(fooList); // find this!
someStuff(barList); // don't find this one
someStuff(Collections.singletonList(new Foo("B"))); // also match this one
I've tried several combinations without any luck, is it even possible to do?
This was previously not possible without resorting to a hack. See further below for that. Currently, the code can be found using a Type modifier.
search template
$Instance$.$MethodCall$($Parameter$)
variables
$Instance$ Count=[0,1]
$MethodCall Text=someStuff
$Parameter$ Type=Iterable<Foo>, within type hierarchy
hack
The hack previously needed used a Script modifier and a simple Type modifier on the $Parameter$ variable:
$Parameter$
Script=__context__.type.parameters[0].presentableText.contains("Foo")
Type=Iterable
The related bug report is fixed since IntelliJ IDEA 2017.3.

How to define a set of input parameters in Pex?

Say I have MyClass with 100s of fields.
If I use an object of MyClass as an input param, Pex would simply choke trying to generate all possible combinations (mine runs into 1000s of paths even on a simple test)
[PexMethod]
void MytestMethod(MyClass param){...}
How can I tell Pex to use only a set of predefined objects of MyClass rather than having it trying to be smart and generate all possible combinations to test?
In other words I want to manually specify a list of possible states for param in the code above and tell Pex to use it
Cheers
If you find that Pex is generating large amounts of irrelevant, redundant, or otherwise unhelpful inputs, you can shape the values that it generates for your parametrized unit tests' input using PexAssume, which will ensure that all generated inputs meet a set of criteria that you provide.
If you were wanting to ensure that arguments came from a predefined collection of values, for instance, you could do something like this:
public void TestSomething(Object a) {
PexAssume.IsTrue(someCollection.Contains(a));
}
PexAssume has other helper methods as well for more general input pruning, such as IsNotNull, AreNotEqual, etc. What little documentation is out there suggests that there is some collection-specific functionality as well, though if those methods exist, I'm not familiar with them.
Check out the Pex manual for a bit more information.
Pex will not try to generate every possible combination of values. Instead, it analyses your code and tries to cover every branch. So if you have
if (MyObject.Property1 == "something")
{
...
}
then it will try to create an object that has Property1 == "something". So limiting the tests to some predefined objects is rather against the 'Pex philosophy'. That said, you may find the following information interesting.
You can provide a Pex factory class. See, for instance, this blog post or this one.
[PexFactoryClass]
public partial class EmployeeFactory
{
[PexFactoryMethod(typeof(Employee))]
public static Employee Create(
int i0,
string s0,
string s1,
DateTime dt0,
DateTime dt1,
uint ui0,
Contract c0
)
{
Employee e0 = new Employee();
e0.EmployeeID = i0;
e0.FirstName = s0;
e0.LastName = s1;
e0.BirthDate = dt0;
e0.StartDateContract = dt1;
e0.Salary = ui0;
e0.TypeContract = c0;
return e0;
}
}
Pex will then call this factory class (instead of a default factory) using appropriate values it discovers from exploring your code. The factory method allows you to limit the possible parameters and values.
You can also use PexArguments attribute to suggest values, but this will not prevent Pex from trying to generate other values to cover any branches in your code. It just tries the ones you provide first.
[PexArguments(1, "foo")] // try this first
void MyTest(int i, string s)
{
...
}
See here for more information on PexArguments and also search for 'seed values' in the PDF documentation on Parameterized Test Patterns.

Where should the responsibility for parsing the input stream be in this scenario?

Say if I am parsing readings from a handheld device of some sort via an input stream. There are readings of different types, and each need parsing differently.
Currently I have a class "handheld" that handles all parsing and creates reading objects of the appropriate type as required. It parses the reading and populates each reading via their "set" methods.
I'm wondering though if the readings themselves should know how to parse the input stream. For instance, when the next reading comes along, should I instantiate the appropriate reading object and call a "parse" method on it, passing it in the input stream?
The main thing I don't like about this is the parsing code is all over the place rather than kept neatly in one place. It does however get rid of the need for all those set methods and the reading can just apply itself to the server/database/whatever when required via the "apply" method I have.
So which would be considered the "nicer" (or more OO) way?
I would go by creating a Factory design pattern.
Create a base class to represent GeneralParser and make a child class for each parser and if there was something common in the parsing method, let it be in the base GeneralParser's Parse method and call base.parse method in child.parse method.
I am sure you have a way to determine which parser to use, and I think currently you're using control statements (if, switch...) and do the parsing. Well now instead of that let the specialized (child) parser class handle it for you.
Pseudo class diagram:
GeneralParser
|
|
->XMLParser
->JsonParser
Here is some implementation in C#.Net
public static class ParserFactory
{
public static GeneralParser CreateXMLParser()
{
return new XMLParser();
}
public static GeneralParser CreateJsonParser()
{
return new JSONParser();
}
}
In your program code, you may write something like this (pseudo-code) because it depends on the way that you're deciding which parser to use.
// ...
GeneralParser parser;
if( to_be_parsed_as_xml)
{
parser = ParserFactory.CreateXMLParser();
parser.Parse(stream);
}
else if( to_be_parsed_as_json )
{
parser = ParserFactory.CreateJsonParser();
parser.Parse(stream);
}
// ...
You can create a parser on the fly (without keeping its reference) if you only need parsers to parse and nothing more.

Should private functions modify field variable, or use a return value?

I'm often running into the same trail of thought when I'm creating private methods, which application is to modify (usually initialize) an existing variable in scope of the class.
I can't decide which of the following two methods I prefer.
Lets say we have a class Test with a field variable x. Let it be an integer. How do you usually modify / initialize x ?
a) Modifying the field directly
private void initX(){
// Do something to determine x. Here its very simple.
x = 60;
}
b) Using a return value
private int initX(){
// Do something to determine x. Here its very simple.
return 60;
}
And in the constructor:
public Test(){
// a)
initX();
// b)
x = initX();
}
I like that its clear in b) which variable we are dealing with. But on the other hand, a) seems sufficient most of the time - the function name implies perfectly well what we are doing!
Which one do you prefer and why?
Thank for your answers guys! I'll make this a community wiki as I realize that there is no correct answer to this.
I usually prefer b), only I pick a different name, like computeX() in this case. A few reasons for why:
if I declare computeX() as protected, there is a simple way for a subclass to influent how it works, yet x itself can remain a private field;
I like to declare fields final if that's what they are; in this case a) is not an option since initialization has to happen in compiler (this is Java-specific, but your examples all look Java as well).
That said, I don't have a strong preference between the two methods. For instance, if I need to initialize several related fields at once, I will usually pick option a). That, though, only if I cannot or don't want for some reason, to initialize directly in constructor.
For initialization I prefer constructor initialization if it's possible,
public Test():x(val){...}, or write initialization code in the constructor body. Constructor is the best place to initialize all the fields (actually, it is the purpose of constructor). I'd use private initX() approach only if initialization code for X is too long (just for readability) and call this function from constructor. private int initX() in my opinion has nothing to do with initialization(unless you implement lazy initialization,but in this case it should return &int or const &int) , it is an accessor.
I would prefer option b), because you can make it a const function in languages that support it.
With option a), there is a temptation for new, lazy or just time-stressed developers to start adding little extra tasks into the initX method, instead of creating a new one.
Also, in b), you can remove initX() from the class definition, so consumers of the object don't even have to know it's there. For example, in C++.
In the header:
class Test {
private: int X;
public: Test();
...
}
In the CPP file:
static int initX() { return 60; }
Test::Test() {
X = initX();
}
Removing the init functions from the header file simplifies the class for the people that have to use it.
Neither?
I prefer to initialize in the constructor and only extract out an initialization method if I need a lot of fields initialized and/or need the ability to re-initialize at another point in the life time of an instance (without going through a destruct/construct).
More importantly, what does 60 mean?
If it is a meaningful value, make it a const with a meaningful name: NUMBER_OF_XXXXX, MINUTES_PER_HOUR, FIVE_DOZEN_APPLES, SPEED_LIMIT, ... regardless of how and where you subsequently use it (constructor, init method or getter function).
Making it a named constant makes the value re-useable in and of itself. And using a const is much more "findable", especially for more ubiquitous values (like 1 or -1) then using the actual value.
Only when you want to tie this const value to a specific class would it make sense to me to create a class const or var, or - it the language does not support those - a getter class function.
Another reason to make it a (virtual) getter function would be if descendant classes need the ability to start with a different initial value.
Edit (in response to comments):
For initializations that involve complex calculations I would also extract out a method to do the calculation. The choice of making that method a procedure that directly modifies the field value (a) or a function that returns the value it should be given (b), would be driven by the question whether or not the calculation would be needed at other times than "just the constructor".
If only needed at initialization in the constructor, I would prefer method (a).
If the calculation needs to be done at other times as well, I would opt for method (b) as it also makes it possible to assign the outcome to some other field or local variable and so can be used by descendants or other users of the class without affecting the inner state of the instance.
Actually only a) method behaves as expected (by analyzing method name). Method b) should be named 'return60' in your example or 'getXValue' in some more complicated one.
Both options are correct in my opinion. It all depeneds what was your intention when certain design was choosen. If your method has to do initialization only I would prefer a) beacuse it is simplier. In case x value is also used for something else somewhere in logic using b) option might lead to more consistent code.
You should also always write method names clearly and make those names corresponding with actual logic. (in this case method b) has confusing name).
#Frederik, if you use option b) and you have a LOT of field variables, the constructor will become a quite unwieldy block of code. Sometimes you just can't help but have lots and lots of member variables in a class (example: it's a domain object and it's data comes straight from a very wide table in the database). The most pragmatic approach would be to modularize the code as you need to.

Is there any disadvantage of writing a long constructor?

Does it affect the time in loading the application?
or any other issues in doing so?
The question is vague on what "long" means. Here are some possible interpretations:
Interpretation #1: The constructor has many parameters
Constructors with many parameters can lead to poor readability, and better alternatives exist.
Here's a quote from Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters:
Traditionally, programmers have used the telescoping constructor pattern, in which you provide a constructor with only the required parameters, another with a single optional parameters, a third with two optional parameters, and so on...
The telescoping constructor pattern is essentially something like this:
public class Telescope {
final String name;
final int levels;
final boolean isAdjustable;
public Telescope(String name) {
this(name, 5);
}
public Telescope(String name, int levels) {
this(name, levels, false);
}
public Telescope(String name, int levels, boolean isAdjustable) {
this.name = name;
this.levels = levels;
this.isAdjustable = isAdjustable;
}
}
And now you can do any of the following:
new Telescope("X/1999");
new Telescope("X/1999", 13);
new Telescope("X/1999", 13, true);
You can't, however, currently set only the name and isAdjustable, and leaving levels at default. You can provide more constructor overloads, but obviously the number would explode as the number of parameters grow, and you may even have multiple boolean and int arguments, which would really make a mess out of things.
As you can see, this isn't a pleasant pattern to write, and even less pleasant to use (What does "true" mean here? What's 13?).
Bloch recommends using a builder pattern, which would allow you to write something like this instead:
Telescope telly = new Telescope.Builder("X/1999").setAdjustable(true).build();
Note that now the parameters are named, and you can set them in any order you want, and you can skip the ones that you want to keep at default values. This is certainly much better than telescoping constructors, especially when there's a huge number of parameters that belong to many of the same types.
See also
Wikipedia/Builder pattern
Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters (excerpt online)
Related questions
When would you use the Builder Pattern?
Is this a well known design pattern? What is its name?
Interpretation #2: The constructor does a lot of work that costs time
If the work must be done at construction time, then doing it in the constructor or in a helper method doesn't really make too much of a difference. When a constructor delegates work to a helper method, however, make sure that it's not overridable, because that could lead to a lot of problems.
Here's some quote from Effective Java 2nd Edition, Item 17: Design and document for inheritance, or else prohibit it:
There are a few more restrictions that a class must obey to allow inheritance. Constructors must not invoke overridable methods, directly or indirectly. If you violate this rule, program failure will result. The superclass constructor runs before the subclass constructor, so the overriding method in the subclass will be invoked before the subclass constructor has run. If the overriding method depends on any initialization performed by the subclass constructor, the method will not behave as expected.
Here's an example to illustrate:
public class ConstructorCallsOverride {
public static void main(String[] args) {
abstract class Base {
Base() { overrideMe(); }
abstract void overrideMe();
}
class Child extends Base {
final int x;
Child(int x) { this.x = x; }
#Override void overrideMe() {
System.out.println(x);
}
}
new Child(42); // prints "0"
}
}
Here, when Base constructor calls overrideMe, Child has not finished initializing the final int x, and the method gets the wrong value. This will almost certainly lead to bugs and errors.
Interpretation #3: The constructor does a lot of work that can be deferred
The construction of an object can be made faster when some work is deferred to when it's actually needed; this is called lazy initialization. As an example, when a String is constructed, it does not actually compute its hash code. It only does it when the hash code is first required, and then it will cache it (since strings are immutable, this value will not change).
However, consider Effective Java 2nd Edition, Item 71: Use lazy initialization judiciously. Lazy initialization can lead to subtle bugs, and don't always yield improved performance that justifies the added complexity. Do not prematurely optimize.
Constructors are a little special in that an unhandled exception in a constructor may have weird side effects. Without seeing your code I would assume that a long constructor increases the risk of exceptions. I would make the constructor as simple as needed and utilize other methods to do the rest in order to provide better error handling.
The biggest disadvantage is probably the same as writing any other long function -- that it can get complex and difficult to understand.
The rest is going to vary. First of all, length and execution time don't necessarily correlate -- you could have a single line (e.g., function call) that took several seconds to complete (e.g., connect to a server) or lots of code that executed entirely within the CPU and finished quickly.
Startup time would (obviously) only be affected by constructors that were/are invoked during startup. I haven't had an issue with this in any code I've written (at all recently anyway), but I've seen code that did. On some types of embedded systems (for one example) you really want to avoid creating and destroying objects during normal use, so you create almost everything statically during bootup. Once it's running, you can devote all the processor time to getting the real work done.
Constructor is yet another function. You need very long functions called many times to make the program work slow. So if it's only called once it usually won't matter how much code is inside.
It affects the time it takes to construct that object, naturally, but no more than having an empty constructor and calling methods to do that work instead. It has no effect on the application load time
In case of copy constructor if we use donot use reference in that case
it will create an object and call the copy constructor and passing the
value to the copy constructor and each time a new object is created and
each time it will call the copy constructor it goes to infinite and
fill the memory then it display the error message .
if we pass the reference it will not create the new object for storing
the value. and no recursion will take place
I would avoid doing anything in your constructor that isn't absolutely necessary. Initialize your variables in there, and try not to do much else. Additional functionality should reside in separate functions that you call only if you need to.