Generating tree nodes on the fly - antlr

I have an ANTLRv3 grammar to transform an AST (generated and consumed by other grammars).
One part of it is to rewrite ranges defined like M:N (i.e. 1:5) into their actual list representation — M, M+1, ..., N (i.e. 1, 2, 3, 4, 5).
So, a node ^(RANGE s=INT e=INT) is transformed into a list of INT tokens.
What I currently do now is the following:
range
: ^(RANGE s=INT e=INT) -> { ToSequence(int.Parse($s.text),int.Parse($e.text)) }
;
and the ToSequence method looks like
private ITree ToSequence(int start, int end)
{
var tree = new CommonTree();
for (int i = start; i <= end; i++)
{
tree.AddChild(new CommonTree(new CommonToken(INT, i.ToString())));
}
return tree;
}
It actually works fine, transforming tree node (TIME 1 2 (RANGE 5 10) 3 4) to (TIME 1 2 5 6 7 8 9 10 3 4), but I somewhat unsure whether it is the correct and idiomatic way to do such a transformation.
So, is there any more civilized way to perform this task?

I don’t think that performing such transformations inside the parser is a good idea at all. Your code lacks any kind of validity check. What if the first number is bigger than the second one? And what if errors are detected at the next stage of processing? At that point you have already lost the knowledge about the true structure of your input so you can’t give useful error messages.
I strongly recommend building a tree reflecting the true structure of your input first and transforming the tree afterwards. The transformed node might keep references to their source nodes then.

Related

How to convert two lists into a list of pairs based on some predicate

I have a data class that describes a chef by their name and their skill level and two lists of chefs with various skill levels.
data class Chef(val name: String, val level: Int)
val listOfChefsOne = listOf(
Chef("Remy", 9),
Chef("Linguini", 7))
val listOfChefsTwo = listOf(
Chef("Mark", 6),
Chef("Maria", 8))
I'm to write a function that takes these two lists and creates a list of pairs
so that the two chefs in a pair skill level's add up to 15. The challenge is to do this using only built in list functions and not for/while loops.
println(pairChefs(listOfChefsOne, listOfChefsTwo))
######################################
[(Chef(name=Remy, level=9), Chef(name=Mark, level=6)),
(Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]
As I mentioned previously I'm not to use any for or while loops in my implementation for the function. I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
I think the clue is in the question here!
I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
There's a filter function that looks perfect for this…
To keep things clear, I'll split out a function for generating all possible pairs.  (This is my own, but bears a reassuring resemblance to part of this answer!  In any case, you said you'd already solved this bit.)
fun <A, B> Iterable<A>.product(other: Iterable<B>)
= flatMap{ a -> other.map{ b -> a to b }}
The result can then be:
val result = listOfChefsOne.product(listOfChefsTwo)
.filter{ (chef1, chef2) -> chef1.level + chef2.level == 15 }
Note that although this is probably the simplest and most readable way, it's not the most efficient for large lists.  (It takes time and memory proportional to the product of the sizes of the two lists.)  You could improve large-scale performance by using streams (which would take the same time but constant memory). But for this particular case, it might be even better to group one of the lists by level, then for each element of the other list, you could directly look up a Chef with 15 - its level.  (That would time proportional to the sum of the sizes of the two lists, and space proportional to the size of the first list.)
Here is the pretty simple naive solution:
val result = listOfChefsOne.flatMap { chef1 ->
listOfChefsTwo.mapNotNull { chef2 ->
if (chef1.level + chef2.level == 15) {
chef1 to chef2
} else {
null
}
}
}
println(result) // prints [(Chef(name=Remy, level=9), Chef(name=Mark, level=6)), (Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]

Generating unique random number

I have an application that I want to generate a list of 5 unique numbers on init three times after reading a json. So basically I want to get something like [31, 59, 62, 72, 2, 16, 2, 38, 94, 15, 55, 46, 83, 2, 10]. My challenge is that I am new to functional programming and elm and I am a bit lost. So I understand that Random.generate takes a msg and a generator and returns a cmd message and it is mostly used in the update function but this is not where I need as it is a helper function and doesn't need to communicate with the client. I think it can be used in init but I don't know how. Again my function is recursive and I don't know how to apply this logic with Random.generate recursively.
I understand my code will not work and I have tried it because Random.int does not generate a random number but a type of generate but still I don't know how to apply this to get what I want.
recursion : Int -> List a -> List number
recursion a b =
if List.length b > 5
then b
else
let
rand = Random.int 0 a
in
if(List.member rand b)
then recursion a b
else
recursion a (rand :: b)
can be called with:
recursion 50 []
I want to generate a list/array of 5 unique random 3 times.
Great question. There are two parts here:
- generating random numbers, and
- wiring this all up
You will need the Random library for the former, and a little inspection will lead you to something like
get15Randoms = Random.generate OnRandomNumbers <| Random.list 5 (int 0 100)
This has type Cmd Msg - it's an async operation that will return on a Msg.
Wiring it up will be a series of stages. You refer to doing it in 'init' but that's not how Elm is going to work for you here. In the init function you can start off your json request. Then you'll need something like
init = ( initModel
, Http.get {url =..., expect = Http.expectJson OnJson yourDecoder}
)
update msg model =
case msg of
OnJson (Ok data) ->
-- attach json
( {model | data = data }, get15Randoms )
OnRandomNumbers ints ->
( { model | ints = ints }, Cmd.none )
In other words, once the json comes back you can attach that, and use that iteration of the update to launch your random number request, and then catch the result in a subsequent iteration
Random number generation is a side-effect because it's not predictable by definition, meaning its output isn't purely determined by its inputs. In Elm all side-effects go through the update function because if side-effects were allowed anywhere there would be no guarantee that any part of your code is pure and predictable. Things might start behaving differently at random, and it would be very hard figure out why since random input can occur anywhere.
That said, init is one place where it might make sense to allow side-effects, since there isn't any state yet. But since most side-effects are not immediate and you would most likely want to present some UI to indicate that your app is loading, I assume the API just hasn't been complicated to allow for a use case so rare. Especially since there are a couple workarounds you can use:
Workaround 1 - an empty representation
Since you're using a list to contain the random numbers, you could probably just use an empty list to represent that you've not received the numbers yet. Otherwise use a Maybe, or a custom type. This might be a bit cumbersome since you have to handle the empty case every time you use it, but depending on your use case it might be acceptable.
Workaround 2 - flags
Elm allows for sending data into your program from the outside on initializtion, and will pass this to your init function. This mechanism is called flags. You can use this to generate the numbers in JavaScript before sending it in as a parameter.
In your index.html you would put:
var app = Elm.Main.init({
node: document.getElementById('elm'),
flags: Array.from({length: 15}, () => Math.floor(Math.random() * 50))
});
And in init you accept the numbers as an ordinary argument:
init : List number -> Model
init numbers =
{ myNumbers = numbes
, ...
}
I want to respond with what I ended up doing motivated by #Simon-h's answer. I had a Msg type RndGen Int and since update is a function I decided to call RndGen recursively with help from the update function and flag it off when I got the number of random numbers I needed.
update msg model =
case msg of
NoOp ->
(model, get15Randoms)
RndGen rndGen ->
if List.length (model.questions) < 15
then
if List.member rndGen model.questions
then
(model, get15Randoms)
else
({model | questions = rndGen :: model.questions }, get15Randoms)
else
(model, Cmd.none)
and
get15Randoms =
Random.generate RndGen (Random.int 0 100)
on init
init questions =
(Model (getQuestions questions) 1 True [] False "", get15Randoms)
I will like to know if my thinking aligns with what the elm community will expect.

Kotlin: Why is Sequence more performant in this example?

Currently, I am looking into Kotlin and have a question about Sequences vs. Collections.
I read a blog post about this topic and there you can find this code snippets:
List implementation:
val list = generateSequence(1) { it + 1 }
.take(50_000_000)
.toList()
measure {
list
.filter { it % 3 == 0 }
.average()
}
// 8644 ms
Sequence implementation:
val sequence = generateSequence(1) { it + 1 }
.take(50_000_000)
measure {
sequence
.filter { it % 3 == 0 }
.average()
}
// 822 ms
The point here is that the Sequence implementation is about 10x faster.
However, I do not really understand WHY that is. I know that with a Sequence, you do "lazy evaluation", but I cannot find any reason why that helps reducing the processing in this example.
However, here I know why a Sequence is generally faster:
val result = sequenceOf("a", "b", "c")
.map {
println("map: $it")
it.toUpperCase()
}
.any {
println("any: $it")
it.startsWith("B")
}
Because with a Sequence you process the data "vertically", when the first element starts with "B", you don't have to map for the rest of the elements. It makes sense here.
So, why is it also faster in the first example?
Let's look at what those two implementations are actually doing:
The List implementation first creates a List in memory with 50 million elements.  This will take a bare minimum of 200MB, since an integer takes 4 bytes.
(In fact, it's probably far more than that.  As Alexey Romanov pointed out, since it's a generic List implementation and not an IntList, it won't be storing the integers directly, but will be ‘boxing’ them — storing references to Int objects.  On the JVM, each reference could be 8 or 16 bytes, and each Int could take 16, giving 1–2GB.  Also, depending how the List gets created, it might start with a small array and keep creating larger and larger ones as the list grows, copying all the values across each time, using more memory still.)
Then it has to read all the values back from the list, filter them, and create another list in memory.
Finally, it has to read all those values back in again, to calculate the average.
The Sequence implementation, on the other hand, doesn't have to store anything!  It simply generates the values in order, and as it does each one it checks whether it's divisible by 3 and if so includes it in the average.
(That's pretty much how you'd do it if you were implementing it ‘by hand’.)
You can see that in addition to the divisibility checking and average calculation, the List implementation is doing a massive amount of memory access, which will take a lot of time.  That's the main reason it's far slower than the Sequence version, which doesn't!
Seeing this, you might ask why we don't use Sequences everywhere…  But this is a fairly extreme example.  Setting up and then iterating the Sequence has some overhead of its own, and for smallish lists that can outweigh the memory overhead.  So Sequences only have a clear advantage in cases when the lists are very large, are processed strictly in order, there are several intermediate steps, and/or many items are filtered out along the way (especially if the Sequence is infinite!).
In my experience, those conditions don't occur very often.  But this question shows how important it is to recognise them when they do!
Leveraging lazy-evaluation allows avoiding the creation of intermediate objects that are irrelevant from the point of the end goal.
Also, the benchmarking method used in the mentioned article is not super accurate. Try to repeat the experiment with JMH.
Initial code produces a list containing 50_000_000 objects:
val list = generateSequence(1) { it + 1 }
.take(50_000_000)
.toList()
then iterates through it and creates another list containing a subset of its elements:
.filter { it % 3 == 0 }
... and then proceeds with calculating the average:
.average()
Using sequences allows you to avoid doing all those intermediate steps. The below code doesn't produce 50_000_000 elements, it's just a representation of that 1...50_000_000 sequence:
val sequence = generateSequence(1) { it + 1 }
.take(50_000_000)
adding a filtering to it doesn't trigger the calculation itself as well but derives a new sequence from the existing one (3, 6, 9...):
.filter { it % 3 == 0 }
and eventually, a terminal operation is called that triggers the evaluation of the sequence and the actual calculation:
.average()
Some relevant reading:
Kotlin: Beware of Java Stream API Habits
Kotlin Collections API Performance Antipatterns

Kotlin stdlib operatios vs for loops

I wrote the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
for (i in src)
{
if (i % 2 == 0) dest.add(Math.sqrt(i.toDouble()))
}
IntellJ (in my case AndroidStudio) is asking me if I want to replace the for loop with operations from stdlib. This results in the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.filter { it % 2 == 0 }
.mapTo(dest) { Math.sqrt(it.toDouble()) }
Now I must say, I like the changed code. I find it easier to write than for loops when I come up with similar situations. However upon reading what filter function does, I realized that this is a lot slower code compared to the for loop. filter function creates a new list containing only the elements from src that match the predicate. So there is one more list created and one more loop in the stdlib version of the code. Ofc for small lists it might not be important, but in general this does not sound like a good alternative. Especially if one should chain more methods like this, you can get a lot of additional loops that could be avoided by writing a for loop.
My question is what is considered good practice in Kotlin. Should I stick to for loops or am I missing something and it does not work as I think it works.
If you are concerned about performance, what you need is Sequence. For example, your above code will be
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.asSequence()
.filter { it % 2 == 0 }
.mapTo(dest) { Math.sqrt(it.toDouble()) }
In the above code, filter returns another Sequence, which represents an intermediate step. Nothing is really created yet, no object or array creation (except a new Sequence wrapper). Only when mapTo, a terminal operator, is called does the resulting collection is created.
If you have learned java 8 stream, you may found the above explaination somewhat familiar. Actually, Sequence is roughly the kotlin equivalent of java 8 Stream. They share similiar purpose and performance characteristic. The only difference is Sequence isn't designed to work with ForkJoinPool, thus a lot easier to implement.
When there is multiple steps involved or the collection may be large, it's suggested to use Sequence instead of plain .filter {...}.mapTo{...}. I also suggest you to use the Sequence form instead of your imperative form because it's easier to understand. Imperative form may become complex, thus hard to understand, when there are 5 or more steps involved in the data processing. If there is just one step, you don't need a Sequence, because it just creates garbage and gives you nothing useful.
You're missing something. :-)
In this particular case, you can use an IntProgression:
val progression = 0 until 1_000_000 step 2
You can then create your desired list of squares in various ways:
// may make the list larger than necessary
// its internal array is copied each time the list grows beyond its capacity
// code is very straight forward
progression.map { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// no copies are made
// code is more complicated
progression.mapTo(ArrayList(progression.last / 2 + 1)) { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// a single intermediate list is made
// code is minimal and makes sense
progression.toList().map { Math.sqrt(it.toDouble()) }
My advice would be to choose whichever coding style you prefer. Kotlin is both object-oriented and functional language, meaning both of your propositions are correct.
Usually, functional constructs favor readability over performance; however, in some cases, procedural code will also be more readable. You should try to stick with one style as much as possible, but don't be afraid to switch some code if you feel like it's better suited to your constraints, either readability, performance, or both.
The converted code does not need the manual creation of the destination list, and can be simplified to:
val src = (0 until 1000000).toList()
val dest = src.filter { it % 2 == 0 }
.map { Math.sqrt(it.toDouble()) }
And as mentioned in the excellent answer by #glee8e you can use a sequence to do a lazy evaluation. The simplified code for using a sequence:
val src = (0 until 1000000).toList()
val dest = src.asSequence() // change to lazy
.filter { it % 2 == 0 }
.map { Math.sqrt(it.toDouble()) }
.toList() // create the final list
Note the addition of the toList() at the end is to change from a sequence back to a final list which is the one copy made during the processing. You can omit that step to remain as a sequence.
It is important to highlight the comments by #hotkey saying that you should not always assume that another iteration or a copy of a list causes worse performance than lazy evaluation. #hotkey says:
Sometimes several loops. even if they copy the whole collection, show good performance because of good locality of reference. See: Kotlin's Iterable and Sequence look exactly same. Why are two types required?
And excerpted from that link:
... in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.
#glee8e says that there are similarities between Kotlin sequences and Java 8 streams, for detailed comparisons see: What Java 8 Stream.collect equivalents are available in the standard Kotlin library?

Dynamically rewriting in ANTLR 3 n number of nodes from a list

I am using ANTLR 3 to parse and rewrite Answer-Set Programs (ASP). What I want to do is parse an ASP program and output an AST with some rewriting. I can easily add and remove nodes to/from the AST but what I need to do is add nodes to the root dynamically (effectively, adding new rules to the ASP program). Which nodes to add and how many is based on the input ASP program.
Below I have an example from my lexer and parser which outputs an AST. r_rule returns a LinkedHashMap that is filled based on what it matches. For each member of the LinkedHashMap, in the rewrite for r_program I want to add a new node to the root node PROGRAM. However, I cannot seem to find a way to iterate through the LinkedHashMap and add new nodes.
#members {
int rID = 0;
}
r_program
: (a=r_rule)* -> ^(PROGRAM r_rule*);
r_rule returns [LinkedHashMap<String, String> somehm]
#init {
$somehm = new LinkedHashMap<String, String>();
String strrID = Integer.toString(++rID);
}
: (head = r_head) ':-'
body=r_body[strrID] {$vartypes.putAll($body.vartypes); } -> ^(LIMPL $head ^(EXTENSION ^(NUMBER[strrID] $head)) $body);
I can use a semantic predicate but only to check a property of the LinkedHashMap. I can arbitrarily loop through the HashMap with inserted code, but I can't then, for each iteration, add child nodes or trigger a rewrite. The code generated is in fact put in the wrong place to even do this in an ugly way using Java (I can't access the root node PARENT).
What can I do about this? A completely different approach is also welcome. Many thanks!
Update 1
An example input is:
head_pred(X, Y, Z) :- body_1(X), body_1(Y), body_1(Z).
An example AST is, apologies for the drawing (n.b. strictly an example used for readability, in reality many more nodes are used in the rewrite)...
PROGRAM
|
|____:-
| |____head_pred(X, Y, X)
| |____body_1(X)
| |____body_1(Y)
| |____body_1(Z)
| |____X == Y
|
|____:-
| |____head_pred(X, Y, X)
| |____body_1(X)
| |____body_1(Y)
| |____body_1(Z)
| |____X == Z
I could go on, the idea is that each rule binds the variables differently, if they can be bound. Different inputs change the number and content of the children of PROGRAM.