Code generation for a compiler based on ANTLR

I am working on a compiler for a language and I have run into some problems with operator precedence.
The language is pretty simple, nothing complicated. It has functions, statements, expressions, etc. It's based on the YouTube video series "Let's build a compiler".
There is already a code generator for the Java platform (JVM) and everything works fine there.
However, I wanted to be able to generate code for the Intel platform as well, so I decided to add a code generator for the 80x86.
I'm still at the very beginning, but I have a functional toolchain: I can generate assembly code, have the assembler (masm32) translate it to object files, and then the linker creates an exe file out of it.
So far so good. However, I have problems generating code for divisions and multiplications. In the grammar, divisions are matched before multiplications.
The parser uses a visitor; to be exact, there is one for the JVM platform and one for the 80x86 platform (Windows).
The main problem, as it appears to me, is that the parser builds a stack through the visit methods, which suits the JVM platform very well, since it is also stack based.
However, the 80x86 platform works with registers, so I have to load the operands into registers.
For example, an expression like "8 * 4 / 2" should evaluate to 16, and the relevant assembly code should look like this:
mov eax, 8
mov ebx, 4
imul eax, ebx
mov ebx, 2
cdq          ; sign-extend eax into edx:eax (idiv divides edx:eax by its operand)
idiv ebx
However, the code actually generated is:
mov eax, 8
mov eax, 4
mov ebx, 2
idiv ebx
imul eax, ebx
Due to the stack-based approach there are two consecutive mov statements targeting eax, which is of course completely wrong, since the first value loaded into eax is overwritten by the second.
So my question is: how do I transform the stack-based approach into a more "linear" one? (One common fix is sketched right after the grammar below.)
I hope that the wording of my question is comprehensible and not too complicated.
Thanks a lot for reading and answering!
Here is the relevant part of the grammar:
expression:
left=expression DIV right=expression #Div
| left=expression MUL right=expression #Mul
| left=expression MINUS right=expression #Minus
| left=expression PLUS right=expression #Plus
| '(' expression ')' #Parens
| left=expression operator=(LT | LE | GT | GE | EQUAL | NOTEQUAL) right=expression #Relational
| left=expression AND right=expression #And
| left=expression OR right=expression #Or
| number=NUMBER #Number
| text=STRING #String
| varName=IDENTIFIER #Variable
| functionCall #FuncCallExpression
;
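
One common way to "linearize" the traversal for a register machine, without abandoning the stack-based visit order, is to spill intermediate results to the machine stack: evaluate the left operand, push eax, evaluate the right operand, move it to ebx, then pop the left operand back into eax. Here is a minimal sketch of what such a visitor could look like; the class, context, and helper names (X86Visitor, LangParser, emit) are my assumptions, not the actual project code:

public class X86Visitor extends LangBaseVisitor<Void> {
    private final StringBuilder asm = new StringBuilder();

    private void emit(String line) {
        asm.append("    ").append(line).append('\n');
    }

    @Override
    public Void visitNumber(LangParser.NumberContext ctx) {
        emit("mov eax, " + ctx.number.getText()); // every expression leaves its value in eax
        return null;
    }

    @Override
    public Void visitMul(LangParser.MulContext ctx) {
        visit(ctx.left);          // left operand -> eax
        emit("push eax");         // save it before eax is clobbered
        visit(ctx.right);         // right operand -> eax
        emit("mov ebx, eax");     // right operand -> ebx
        emit("pop eax");          // left operand back -> eax
        emit("imul eax, ebx");    // eax = left * right
        return null;
    }

    @Override
    public Void visitDiv(LangParser.DivContext ctx) {
        visit(ctx.left);
        emit("push eax");
        visit(ctx.right);
        emit("mov ebx, eax");
        emit("pop eax");
        emit("cdq");              // sign-extend eax into edx:eax for idiv
        emit("idiv ebx");         // eax = left / right
        return null;
    }

    public String assembly() {
        return asm.toString();
    }
}

With this scheme, "8 * 4 / 2" emits mov eax, 8 / push eax / mov eax, 4 / push eax / mov eax, 2 / mov ebx, eax / pop eax / cdq / idiv ebx / mov ebx, eax / pop eax / imul eax, ebx: every value in eax is saved before it can be overwritten. Proper register allocation can come later; push/pop is the direct translation of a stack-based traversal to a register machine.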

Related

Grammars: How to add a level of precedence

So let's say I have the following context-free grammar for a simple calculator language:
S->TS'
S'->OP1 TS'|e
T->FT'
T'->OP2 FT'|e
F->id|(S)
OP1->+|-
OP2->*|/
As one can see, * and / have higher precedence than + and -.
However, how can I add another level of precedence? An example would be exponents, ^ (e.g., 3^2 = 9), or something else. Please explain your procedure and reasoning on how you got there so I can do it for other operators.
Here's a more readable grammar:
expr: sum
sum : sum add_op term
| term
term: term mul_op factor
| factor
factor: ID
| '(' expr ')'
add_op: '+' | '-'
mul_op: '*' | '/'
This can be easily extended using the same pattern:
expr: bool
bool: bool or_op conj
| conj
conj: conj and_op comp
| comp
/* This one doesn't allow associativity. No a < b < c in this language */
comp: sum comp_op sum
sum : sum add_op term
| term
term: term mul_op factor
| factor
/* Here we'll add an even higher precedence operators */
/* Unlike the other operators, though, this one is right associative */
factor: atom exp_op factor
| atom
atom: ID
| '(' expr ')'
/* I left out the operator definitions. I hope they are obvious. If not,
* let me know and I'll put them back in
*/
I hope the pattern is more or less obvious there.
Those grammars won't work in a recursive descent parser, because recursive descent parsers choke on left recursion. The grammar you have has been run through a left-recursion elimination algorithm, and you could do that to the grammar above as well. But note that eliminating left recursion more or less erases the difference between left- and right-recursion, so after you identify the parse with a recursive descent grammar, you need to fix it according to your knowledge about the associativity of the operator, because associativity is no longer inherent in the grammar.
For these simple productions, eliminating left-recursion is really simple, in two steps. We start with some non-terminal:
foo: foo foo_op bar
| bar
and we flip it around so that it is right associative:
foo: bar foo_op foo
| bar
(If the operator was originally right associative, as with exponentiation above, then this step isn't needed.)
Then we need to left-factor, because LL parsing requires that every alternative for a non-terminal has a unique prefix:
foo : bar foo'
foo': foo_op foo
| ε
Doing that to every recursive production above (that is, all of them except for expr, comp and atom) will yield a grammar which looks like the one you started with, only with more operators.
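To make the associativity fix concrete, here is a tiny recursive-descent parser in Java (entirely my own illustration; strings stand in for AST nodes). The loops in parseSum and parseTerm fold results to the left, restoring left associativity after left-recursion elimination, while parseFactor recurses on the right so that ^ stays right associative:

class ExprParser {
    private final String src;
    private int pos = 0;

    ExprParser(String src) { this.src = src; }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
    private char next() { return src.charAt(pos++); }

    // sum : term (add_op term)* -- the loop builds ((a-b)-c), not (a-(b-c))
    String parseSum() {
        String node = parseTerm();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            node = "(" + node + op + parseTerm() + ")";
        }
        return node;
    }

    // term : factor (mul_op factor)*
    String parseTerm() {
        String node = parseFactor();
        while (peek() == '*' || peek() == '/') {
            char op = next();
            node = "(" + node + op + parseFactor() + ")";
        }
        return node;
    }

    // factor : atom ('^' factor)? -- right recursion keeps ^ right associative
    String parseFactor() {
        String node = parseAtom();
        if (peek() == '^') {
            next();
            node = "(" + node + "^" + parseFactor() + ")";
        }
        return node;
    }

    // atom : ID | '(' expr ')' -- single letters serve as IDs here
    String parseAtom() {
        if (peek() == '(') {
            next();
            String inner = parseSum();
            next(); // consume ')'
            return inner;
        }
        return String.valueOf(next());
    }

    public static void main(String[] args) {
        System.out.println(new ExprParser("a-b-c").parseSum()); // ((a-b)-c)
        System.out.println(new ExprParser("a^b^c").parseSum()); // (a^(b^c))
    }
}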
In passing, I emphasize that there is no mysterious magical force at work here. When the grammar says, for example:
term: term mul_op factor
| factor
what it's saying is that a term (or product, if you prefer) cannot be the right-hand argument of a multiplication, but it can be the left-hand argument. It's also saying that if you're at a point in which a product would be valid, you don't actually need something with a multiplication operator; you can use a factor instead. But obviously you cannot use a sum, since factor doesn't parse expressions with a sum operator. (It does parse anything inside parentheses. But those are things inside parentheses.)
That's the sense in which both associativity and precedence are implicit in the grammar.

mov command with vs. without parenthesis?

I don't understand the difference between these two statements
mov [var] , 10
and
mov var,10
in assembly?
For a variable like this:
var: db 0
The instruction mov var,10 would not be allowed by NASM, because in NASM syntax writing var like that (without square brackets) means that you want the address of var as an immediate. And there's no variant of mov that takes an immediate, immediate operand pair.
Adding the square brackets makes it a reference to an address in memory. So mov [var], 10 means store the value 10 at var. Actually you'd have to specify the size of the value to store as well, e.g. mov byte [var], 10. Otherwise NASM doesn't know if you want to store a byte, a word, or a dword, because the immediate 10 could be represented in any of those sizes.
Note that in MASM/TASM syntax, mov var, 10 and mov [var], 10 mean the same thing in this case (both have the same meaning as mov [var], 10 in NASM syntax).

x86 64 AT&T , moving part of register into another register

I'd like to move one byte from register rdx to register rbx, like this:
mov %rdx , (%rbx,%r15,1)
where
rdx contains 0x33, r15 is the index, and rbx contains 0 at the start.
I have tried using this method in many ways, always ending with a SIGSEGV error.
In the end I want rbx to hold an array of successive rdx values.
You can shift the bytes in one at a time, like this:
; Calculate first dl
...
mov %dl,%bl
; Calculate next dl
...
shlq $8,%rbx
mov %dl,%bl
; Calculate next dl
...
shlq $8,%rbx
mov %dl,%bl
etc. This assumes that you want the first byte in the msb and the last byte in the lsb. The reverse order is a bit more complicated, but not much.
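
The same shift-and-insert pattern, expressed in Java for clarity (my own sketch, not part of the original answer): shift the accumulator left by 8, then drop the next byte into the low 8 bits.

public class PackBytes {
    public static void main(String[] args) {
        byte[] bytes = {0x33, 0x44, 0x55};       // successive "dl" values
        long acc = 0;                            // plays the role of rbx
        for (byte b : bytes) {
            acc = (acc << 8) | (b & 0xFFL);      // shlq $8, then mov %dl,%bl
        }
        System.out.printf("0x%X%n", acc);        // prints 0x334455
    }
}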

Want to understand load imputed by floating point instructions

At the outset, this may be a part-discussion, part-solving kind of question. No intent to offend anyone.
I have written, in 64-bit assembly, a Mersenne Twister (MT) based random number generator for 64 bits. This generator function needs to be called 8 billion times to populate an array of size 2048x2048x2048, generating a random number between 1..small_value (usually 32).
Now I had two possibilities for the next step:
(a) Keep generating numbers, compare them with the limits [1..32], and discard those that don't fall within. The run time for this logic is 181,817 ms, measured by calling the clock() function.
(b) Take the 64-bit random number output in RAX and scale it using the FPU to lie within [0..1], then scale it up to the desired range [1..32]. The code sequence for this is below:
mov word ptr initialize_random_number_scaling,dx
fnclex ; clears status flag
call generate_fp_random_number ; returns a random number in ST(0) between [0..1]
fimul word ptr initialize_random_number_scaling ; Mults ST(0) & stores back in ST(0)
mov word ptr initialize_random_number_base,ax ; Saves base to a memory
fiadd word ptr initialize_random_number_base ; adds the base to the scaled fp number
frndint ; rounds off the ST(0)
fist word ptr initialize_random_number_result ; and stores this number to result.
ffree st(0) ; releases ST(0)
fincstp ; Logically pops the FPU
mov ax, word ptr initialize_random_number_result ; and saves it to AX
And the instructions in generate_fp_random_number are as below :
shl rax,1 ; RAX gets the original 64 bit random number using MT prime algorithm
shr rax,1 ; Clear top bit
mov qword ptr random_number_generator_act_number,rax ; Save the number in memory as we cannot move to ST(0) a number from register
fild qword ptr random_number_generator_max_number ; Load 0x7FFFFFFFFFFFFFFFH
fild qword ptr random_number_generator_act_number ; Load our number
fdiv st(0),st(1) ; We return the value through ST(0) itself, divide our random number with max possible number
fabs
ffree st(1) ; release the st(1)
fld1 ; push to top of stack a 1.0
fcomip st(0), st(1) ; compares our number in ST(1) with ST(0) and sets CF.
jc generate_fp_random_get_next_no ; if ST(0) (=1.0) < ST(1) (our no), we need a new no
fldz ; push to top of stack a 0.0
fcomip st(0),st(1) ; if ST(0) (=0.0) >ST(1) (our no) clears CF
jnc generate_fp_random_get_next_no ; so if the number is above zero the CF will be set
fclex
The problem is, just by adding these instructions, the run time jumps to a whopping 5,633,963 ms! I have also written the above using xmm registers as an alternative, and the difference is absolutely marginal (5,633,703 ms).
Would anyone kindly guide me on how much load these additional instructions add to the total run time? Is the FPU really this slow? Or am I missing a trick? As always, all ideas are welcome and I am grateful for your time and effort.
Env: Windows 7 64-bit on an Intel 2700K CPU overclocked to 4.4 GHz, 16 GB RAM, debugged in the VS 2012 Express environment.
"mov word ptr initialize_random_number_base,ax ; Saves base to a memory"
If you want maximum speed, you must find out how to separate the instructions and the writable data into different sections of memory.
Writing data into the same area of the cache that holds your code creates a "self-modifying code" situation.
Your compiler may do this for you, or it may not.
You need to know this, because unoptimised assembly code can run 10 to 50 times slower.
"All modern processors cache code and data memory for efficiency. Performance of assembly-language code can be seriously impaired if data is written to the same block of memory as that in which the code is executing, because it may cause the CPU repeatedly to reload the instruction cache (this is to ensure self-modifying-code works correctly). To avoid this, you should ensure that code and (writable) data do not occupy the same 2 Kbyte block of memory. "
http://www.bbcbasic.co.uk/bbcwin/manual/bbcwina.html#cache
There's a ton of stuff in your code that I can see no reason for. If there was a reason, feel free to correct me, but otherwise here are my alternatives:
For generate_fp_random_number
shl rax, 1
shr rax, 1
mov qword ptr act_number, rax
fild qword ptr max_number
fild qword ptr act_number
fdivrp ; divide actual by max and pop
; and that's it. It's already within bounds.
; It can't be outside [0, 1] by construction.
; It can't be < 0 because we just divided two positive numbers,
; and it can't be > 1 because we divided by the max it could be
For the other thing:
mov word ptr scaling, dx
mov word ptr base, ax
call generate_fp_random_number
fimul word ptr scaling
fiadd word ptr base
fistp word ptr result ; just save that thing
mov ax, word ptr result
; the default rounding mode is round to nearest,
; so the slow frndint is unnecessary
Also note the complete lack of ffree's etc. By making the right instruction pop, it all just worked out. It usually does.
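
For reference, here is what the trimmed sequence computes, written out in Java (a sketch with made-up names): multiply the [0..1] value by the scaling factor, add the base, and round to nearest, which is exactly what fistp does under the x87's default round-to-nearest-even mode.

public class ScaleRandom {
    // Equivalent of: fimul scaling / fiadd base / fistp result
    static int scaleRandom(double rand01, int scaling, int base) {
        return (int) Math.rint(rand01 * scaling + base); // round to nearest, ties to even
    }

    public static void main(String[] args) {
        System.out.println(scaleRandom(0.5, 32, 1)); // prints 17
    }
}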

Is there a hardware unit called "2's complement"?

I understand that in order to do a subtraction, you apply a 2's complement transformation to the second number.
Is there dedicated hardware that checks the MSB and, if it is found to be 1, performs the transformation?
Also, is this system used for subtraction of floating-point numbers?
The Two's Complement operation is implemented in most languages with the unary - operator. It is only used with signed integer types. It can be implemented in an ALU as either a distinct negation (e.g. NEG) instruction or rolled into another operation, for example when you use a subtract (e.g. SUB) instruction instead of an add (e.g. ADD) instruction.
Your first question is unclear because "the last bit" could refer to either the most-significant bit (MSB) or least significant bit (LSB). In a signed integer, the MSB indicates sign; checking for a negative is usually implemented as the N bit in the condition code register, which is updated from the result of the last instruction executed (though several instructions do not change the condition code register). Computing the two's complement only if the original number is negative is the absolute value (e.g. ABS) operation. Checking the LSB just tells you if the integer is even or odd.
Floating-point numbers use a separate sign bit, so 0 and -0 are distinct values. Two's complement does not work with floating-point values; a different approach must be used.
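To make both points concrete, here is a small Java check (my own illustration, not from the answer): integer negation is exactly invert-and-add-one, while floating point uses a sign-magnitude layout in which 0.0 and -0.0 differ only in the sign bit.

public class TwosComplementDemo {
    public static void main(String[] args) {
        int x = 42;
        System.out.println(-x == ~x + 1);             // true: negation = invert bits, add 1

        long pos = Double.doubleToRawLongBits(0.0);   // 0x0000000000000000
        long neg = Double.doubleToRawLongBits(-0.0);  // 0x8000000000000000
        System.out.printf("%016X %016X%n", pos, neg); // only the sign bit differs
    }
}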
EDIT: An example. Consider the following C code:
#include <stdlib.h>

int do_math(int a, int b)
{
    return a - b;
}

int main(int argc, char* const argv[])
{
    if (argc < 3)
        return 0;
    return do_math(atoi(argv[1]), atoi(argv[2]));
}
This can be run with:
$ gcc -O0 foo.c -o foo
$ ./foo 20 10; echo $?
10
On x86_64, the function do_math() contains the following code:
_do_math:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
subl %edx, %eax
leave
ret
The first two lines are the preamble, setting up the stack for the function. The next four lines fetch the input parameters from the stack (since optimization was disabled, parameters weren't passed in registers).
Then the key instruction: subl, which takes the second function argument, b (in %edx, the x86's Extended DX register, 32 bits in size), and subtracts it from the first argument, a (in %eax, the x86's Extended AX register, also 32 bits in size), storing the result back into %eax. In the ALU, the subl instruction takes the first operand as-is and adds the two's complement of the second operand. It calculates the two's complement by inverting the second operand's bits (similar to the ~ operator in C) and then using a dedicated adder to add 1. This step could be pipelined, it could be optimized so that both it and the final addition complete in one cycle, or the hardware could go a step further and roll the two's complement logic into the ALU's adder chain.
The last two lines clean up the stack and return. (The x86 calling conventions store the result in %eax.)
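The identity the ALU exploits is easy to check directly (again my own illustration, not from the answer): a - b equals a plus the bitwise complement of b plus one.

public class SubViaComplement {
    public static void main(String[] args) {
        int a = 20, b = 10;
        System.out.println(a - b == a + ~b + 1); // true: subtraction as complement-and-add
    }
}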
EDIT 2: Use the -S option to gcc to generate an assembly file (same name as input file except .c suffix is replaced with .s). For example: gcc -O0 foo.c -S (Had I not turned off the optimizer with -O0, the entire do_math() function could have been inlined into main(), making it much harder to see.)
Look, you don't have to check the number, ever. If a number is negative, it is already stored in two's-complement form in memory, and when you use that number, the CPU handles it in your calculations automatically. You don't need to check anything; you just perform the operations.
No and no.
The transformation is done by code running on the CPU.