Why does the Rust compiler perform a copy when moving an immutable value?

My intuition must be wrong about moves and copies. I would expect the Rust compiler to optimize away the move of an immutable value as a no-op: since the value is immutable, we can safely reuse it after the move. But Rust 1.65.0 on Godbolt compiles it to assembly that copies the value to a new position in memory. The Rust code that I am studying:
pub fn f_int() {
    let x = 3;
    let y = x;
    println!("{}, {}", x, y);
}
The resulting assembly with -C opt-level=3:
; pub fn f_int() {
sub rsp, 88
; let x = 3;
mov dword ptr [rsp], 3
; let y = x;
mov dword ptr [rsp + 4], 3
mov rax, rsp
...
Why does let y = x; result in mov dword ptr [rsp + 4], 3 and mov rax, rsp? Why doesn't the compiler treat y as the same variable as x in the assembly?
(This question looks similar, but it is about strings, which are not Copy. My question is about integers, which are Copy. It looks like what I am describing is not a missed optimization opportunity but a fundamental mistake in my understanding.)

I would not call it a fundamental mistake in your understanding, but there are some interesting observations here.
First, println!() (and the formatting machinery in particular) is surprisingly hard to optimize, due to its design. So the fact that the copy was not optimized away with println!() involved is not surprising.
Second, it is generally not obvious that it is OK to perform this optimization, because it observably makes the addresses equivalent, and println!() takes the addresses of the printed values (and passes them to an opaque function). In fact, Copy types are harder to justify than non-Copy types in this regard, because with Copy types the original variable may still be used after a move, while with non-Copy types it cannot be.

If you change your example like this:
pub fn f_int() -> i32 {
    let x = 3;
    let y = x;
    // println!("{}, {}", x, y);
    x + y
}
the optimisation takes place
example::f_int:
mov eax, 6
ret
The println!() macro (as well as write!() and friends) takes references to its parameters and hands those references to the formatting machinery.
Presumably, the compiler deduces that passing references to functions that are not inlined requires the data to be stored somewhere in memory so that it has an address.
Because the type is Copy, the semantics imply two distinct storage locations; otherwise, sharing the storage would have been an optimisation of a move operation (not a copy).
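The same point can be reproduced in C, for what it is worth: once a variable's address escapes to a function the compiler cannot see into, each variable needs real, distinct storage. A minimal sketch (observe() is a hypothetical stand-in for the formatting machinery, assumed to be defined in another translation unit):

void observe(const int *p); /* opaque: defined elsewhere, not inlined */

int f_int(void)
{
    int x = 3;
    int y = x;
    observe(&x); /* x must now live at a real address          */
    observe(&y); /* y must live at an address distinct from &x */
    return x + y;
}

Because the two addresses are observable and must compare unequal, the compiler cannot fold x and y into one storage location, for exactly the reason described above.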

Related

obj-c pass struct pointer to arm assembly function - corruption [duplicate]

This question already has an answer here: return floats to objective-c from arm assembly function.
I have a structure in Obj-C. I pass a pointer to this structure to an ARM assembly function that I've written. When I step into the code I see the pointer get successfully passed in, and I can access and modify the values of the structure elements from within my asm code. Life is good - until I return from the asm function. After returning to the calling Obj-C code the structure values are all hosed. I can't figure out why. Below are the relevant pieces of my code.
struct myValues {   // define my structure
    int ptr2A;      // pointer to first float
    float A;
    float B;
    float C;
    float D;
    float E;
    float F;
} myValues;
struct myValues my_asm(int ptr2a, float A, float B, float C, float D, float E, float F); // Prototype for the ASM function
…code here to set values of A-F...
float* ptr2A = &myValues.A; //get the memory address where A is stored
myValues.ptr2A = ptr2A; //put that address into myValues.ptr2A and pass to the ASM function
// now call the ASM code
myValues = my_asm(myValues.ptr2A, myValues.A, myValues.B, myValues.C, myValues.D, myValues.E, myValues.F);
Here is relevant part of my asm code:
mov r5, r1 // r1 has pointer to the first float A
vdiv.f32 s3, s0, s0 //this line puts 1.0 in s3 for ease in debugging
vstr s3, [r5] // poke the 1.0 into the mem location of A
bx lr
When I step through the code everything works as expected and I end up with a 1.0 in the memory location for A. But once I execute the return (bx lr) and return to the calling Obj-C code, the values in my structure become garbage. I've dug through the ABI and AAPCS (as successfully as a novice probably can) but can't get this figured out. What is happening after that "bx lr" to whack the structure?
Below is "Rev 1" of my asm code. I removed everything except these lines:
_my_asm:
vdiv.f32 s3, s0, s0 // s3 = 1.0
vstr s3, [r1]
bx lr
OK, this was the solution for me. Below is "Rev 2" of the relevant pieces of my Obj-C code. I was conflating passing a pointer with passing a copy of the structure - totally hosed. This code just passes a pointer to the first float in my struct... which my asm code picks up from general register r0. Man, I'm hard-headed. ;-)
void my_asm2(int myptr); // this is my prototype.
This is where I call the asm2 code from my Obj-C code:
my_asm2(&myValues.A);
My asm2 code looks like this:
_my_asm2: ; #simple_asm_function
// r0 has pointer to the first float of my myValues structure
// Add prolog code here to play nice
vdiv.f32 s3, s0, s0 //result S3 = 1.0
vstr s3, [r0] // poking a 1.0 back into the myValues.A value
// Add Epilog code here to play nice
bx lr
So, in summary, I can now pass a pointer to my structure myValues to my ASM code, and inside my ASM code I can poke new values back into those memory locations. When I return to my calling Obj-C code everything is as expected. Thanks to those who helped me fumble along with this hobby. :-)
The solution here is to simply pass a pointer (that points to the memory location of the first float variable in the structure) to the assembly function. Then, any changes the assembly function makes to those memory locations will be intact upon returning to the calling function. Note that this applies to the situation when you are calling assembly code and want that code to operate on an existing data structure (myValues in this case).
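For reference, the likely root cause of the original corruption: under the ARM AAPCS, a function that returns a struct too large for registers receives a hidden pointer to the caller's return slot in r0, shifting every visible argument over by one register. A hedged sketch of what the original prototype effectively compiles to (my_asm_as_compiled is an illustrative name, not code from the question):

struct myValues {  /* same layout as in the question */
    int ptr2A;
    float A, B, C, D, E, F;
};

/* The hidden return-slot pointer arrives in r0, so ptr2a lands in r1,
   which is why the asm could poke A through r1. But bx lr returned
   without ever filling the return slot. */
void my_asm_as_compiled(struct myValues *ret_slot, /* hidden, in r0 */
                        int ptr2a,                 /* in r1         */
                        float A, float B, float C,
                        float D, float E, float F);

The assignment myValues = my_asm(...) then copies that never-written return slot over the original struct, which matches the garbage the questioner saw. The Rev 2 fix sidesteps the issue entirely by passing the pointer explicitly and returning nothing.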

Inline ASM - Use 16 or 32 bit C Variable (GCC ARM, Thumb Mode)

I'm currently using the following inline ASM for the Cortex-M3 to branch to a specific address in flash.
__asm("LDR R0, =0x8000"); // Load the branch address
__asm("LDR R1, [R0]"); // Get the branch address
__asm("ORR R1, #1"); // Make sure the Thumb State bit is set.
__asm("BX R1"); // Branch execution
However, I want to replace the hard-coded value 0x8000 with a C variable that will be computed based on some other conditions.
The largest possible value this variable can take is 0x20000, so I'd planned on using a uint32_t to store it.
The compiler being used is arm-none-eabi-gcc v4.9.3
I attempted to modify my inline ASM as follows:
uint32_t destination_address = 0x8000;
__asm( "LDR R0, =%[dest]" : : [dest]"r"(destination_address) );
However, this generates the compiler error:
undefined reference to `r3'
I am fairly new to inline ASM in general. I've tried researching this issue for two days or so, but I've been confused by conflicting answers owing to the diversity of compilers out there and the fact I am using Thumb instructions for the Cortex-M3.
I think my problem is that I need to find the correct constraint for the variable destination_address (range 0x0 - 0x20000), but I'm not sure.
Why are you using inline assembly?
extern void HOP ( unsigned int );
...
unsigned int some_address;
..
some_address = some_math;
HOP(some_address);
plus a few lines of real asm, which you can assemble into an object file and link in (you can even run it through the C compiler if you really feel you have to):
.globl HOP
HOP:
    bx r0
The added benefit is that the call is a branch-and-link, so the code you jump to can return to the caller if you want it to.
It sounds like the compiler has already computed the address, so you "simply" need to get it into a register and bx to it. Inline assembly is extremely compiler-specific, so you need to start by stating which compiler, assembler, version, etc. you are using.
Another thing you can do: if you have this
unsigned int some_address;
..
some_address = some_math;
you can use this assembly somewhere in the project.
ldr r0, =some_address
ldr r0, [r0]
bx r0
and the linker will resolve the address of the C variable. So you can use real assembly or inline assembly for something like that (if your inline assembler doesn't support something like mov %0, some_address; bx %0 and do the work for you).
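For completeness, a hedged sketch of how the inline route can work with GCC extended asm, letting the compiler pick the registers instead of naming them inside the template (untested; based on the documented constraint syntax). The original undefined reference to `r3' most likely happened because LDR R0, =%[dest] substitutes a register name where the literal-pool syntax expects a constant, so the assembler treated r3 as an undefined symbol.

#include <stdint.h>

static void jump_via(uint32_t destination_address)
{
    uint32_t target;
    __asm__ volatile(
        "ldr %0, [%1]\n\t"    /* fetch the branch target stored there */
        "orr %0, %0, #1\n\t"  /* make sure the Thumb state bit is set */
        "bx  %0"              /* branch; control does not return      */
        : "=&r"(target)       /* early-clobber scratch register       */
        : "r"(destination_address));
}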

Using LLVM to optimize programs that use large structs

I made a toy Brainfuck compiler. It works, but given the known initial state, the output is far less optimized than I hoped.
I have this state structure:
struct state {
    unsigned char mem[0x1000];
    unsigned long ip;
    unsigned index;
};
The state structure (which looks like type { [4096 x i8], i64, i32 } in LLVM IR) is allocated with an alloca instruction, and then zeroed with a memset call (the intrinsic version).
And my operations are implemented as you would expect:
< as state.index--
> as state.index++
- as state.mem[state.index]--
+ as state.mem[state.index]++
. as putchar(state.mem[state.index])
, as state.mem[state.index] = getchar()
[ as the beginning of a while (state.mem[state.index] != 0) { loop
] as the end of a loop
For each operation, I emit the simplest matching LLVM IR I can think of. For instance, + is implemented as:
; %index = &state.index
%index = getelementptr inbounds %"state", %"state"* %state, i64 0, i32 1
; %0 = *%index
%0 = load i64, i64* %index, align 8
; %arrayidx = &state.mem[%0]
%arrayidx = getelementptr inbounds %"state", %"state"* %state, i64 0, i32 0, i64 %0
; %1 = *%arrayidx
%1 = load i8, i8* %arrayidx, align 1
; %inc = %1 + 1
%inc = add i8 %1, 1
; *arrayidx = %inc
store i8 %inc, i8* %arrayidx, align 1
I thought that this would be enough information to let LLVM optimize programs so hard that there would barely be anything left. The initial state is known, no pointer to it is shared, and sequential increments are easy to detect. Obviously, loops are harder to optimize, but I could understand that.
Much to my disappointment, however, the resulting code is still an ugly mess of getelementptr, load and store. None of these were elided in favor of something simpler.
I wasn't sure if I was just doing something wrong, so I took a hello-world program and converted it to C by basically replacing each Brainfuck character with its matching C code as shown above, compiled it with Clang at -O3, and dumped the resulting IR, and found it to be essentially equivalent. It appears that Clang isn't any better able to cope with this than my poor toy compiler.
However, if I take index off the struct and make it a local, Clang is able to optimize most of its uses into IR registers. So what's the deal here? Why is LLVM not able to optimize patterns of access to a struct? Is there a way I can tell LLVM that this memory is 100% private and that it can optimize its uses any way it wants?
If this makes an important difference: I'm on LLVM 3.7 svn, up to date as of sometime last week.
Most probably you're not providing a data layout string. Without one, many optimizers are unable to produce decent results; they cannot know the size of a pointer, etc.
See http://llvm.org/docs/LangRef.html#data-layout for more information. I would suggest grabbing the data layout string as generated by Clang on your platform and pasting it into your .ll as a first step.
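As a hedged sketch of the fix through the LLVM-C API (the data layout and triple strings below are what Clang emits for x86_64 Linux and are only illustrative; substitute your platform's values):

#include <llvm-c/Core.h>

int main(void)
{
    LLVMModuleRef mod = LLVMModuleCreateWithName("bf");

    /* Strings copied from `clang -S -emit-llvm` output on x86_64 Linux;
       grab the ones for your own platform the same way. */
    LLVMSetDataLayout(mod, "e-m:e-i64:64-f80:128-n8:16:32:64-S128");
    LLVMSetTarget(mod, "x86_64-unknown-linux-gnu");

    LLVMDumpModule(mod);  /* the module header now carries the layout */
    LLVMDisposeModule(mod);
    return 0;
}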

Favorability of alloca for array allocation vs simple [] array declaration

Reading some Apple code, I stumbled upon the following C chunk
alloca(sizeof(CMTimeRange) * 3)
is this the same thing as allocating stack memory via
CMTimeRange *p = CMTimeRange[3] ?
Are there any implications for performance? The need to free the memory?
If you really only want to allocate 3 elements of something on the stack the use of alloca makes no sense at all. It only makes sense if you have a variable length that depends on some dynamic parameter at runtime, or if you do an unknown number of such allocations in the same function.
alloca is not a standard function and differs from platform to platform. The C standard has preferred to introduce VLAs (variable-length arrays) as a replacement.
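A minimal sketch of the two variants, assuming the size really is only known at runtime (alloca typically lives in <alloca.h> on Unix-like platforms; the function names are illustrative):

#include <alloca.h>  /* non-standard; header location varies */
#include <stddef.h>

void with_alloca(size_t n)
{
    /* released only when the function returns */
    double *scratch = alloca(n * sizeof *scratch);
    scratch[0] = 1.0;
}

void with_vla(size_t n)
{
    /* C99 VLA: released as soon as the enclosing block exits */
    double scratch[n];
    scratch[0] = 1.0;
}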
is this the same thing as allocation stack memory via...
I would think not quite. Declaring a local variable causes the memory to be reserved when the stack frame is entered (by subtracting the size of variable from the stack pointer and adjusting for alignment).
It looks like alloca(3) works by adjusting the stack pointer at the moment it is encountered. Note the "Bugs" section of the man page.
alloca() is machine and compiler dependent; its use is discouraged.
alloca() is slightly unsafe because it cannot ensure that the pointer returned points to a valid and usable block of memory. The allocation made may exceed the bounds of the stack, or even go further into other objects in memory, and alloca() cannot determine such an error. Avoid alloca() with large unbounded allocations.
These two points together add up to the following in my opinion:
DO NOT USE ALLOCA
Assuming as Joachim points out you mean CMTimeRange someVariableName[3]...
Both will allocate memory on the stack.
I'm guessing alloca() will have to add extra code after your function prologue to do the allocation... the function prologue is code that the compiler automatically generates for you to create room on the stack. The upshot is that your function may be slightly larger once compiled, but not by much... a few extra instructions to modify the stack pointer and possibly the stack frame. I guess a compiler could optimize the call away if it wasn't in a conditional branch, or even lift it outside of a conditional branch?
I experimented on my MQX compiler with no optimisations... it's not Objective-C, just C, and also a different platform, but hopefully it's a good enough approximation and does show a difference in the emitted code. I used two simple functions with a large array on the stack to make sure stack space had to be used (the variables couldn't exist solely in registers).
Obviously it is not advisable to put large arrays on the stack... this is just for demo purposes.
unsigned int TEST1(unsigned int stuff)
{
    unsigned int a1[100]; // Make sure it must go on stack
    unsigned int a2[100]; // Make sure it must go on stack
    a1[0] = 0xdead;
    a2[0] = stuff + 10;
    return a2[0];
}

unsigned int TEST2(unsigned int stuff)
{
    unsigned int a1[100]; // Make sure it must go on stack
    unsigned int *a2 = alloca(sizeof(unsigned int)*100);
    a1[0] = 0xdead;
    a2[0] = stuff + 10;
    return a2[0];
}
The following assembler was generated:
TEST1:
Both arrays a1 and a2 are put on the stack in the function prologue...
0: 1cfcb6c8 push %fp
4: 230a3700 mov %fp,%sp
8: 24993901 sub3 %sp,%sp,100 # Both arrays put on stack
c: 7108 mov_s %r1,%r0
e: 1b38bf98 0000dead st 0xdead,[%fp,0xffff_fce0] ; 0xdead
16: e00a add_s %r0,%r0,10
18: 1b9cb018 st %r0,[%fp,0xffff_fe70]
1c: 240a36c0 mov %sp,%fp
20: 1404341b pop %fp
24: 7ee0 j_s [%blink]
TEST2:
Only array a1 is put on the stack in the prologue... extra lines of code have to be generated to deal with the alloca.
0: 1cfcb6c8 push %fp
4: 230a3700 mov %fp,%sp
8: 24593c9c sub3 %sp,%sp,50 # Only one array put on stack
c: 240a07c0 mov %r4,%blink
10: 220a0000 mov %r2,%r0
14: 218a0406 mov %r1,0x190 # Extra for alloca()
18: 2402305c sub %sp,%sp,%r1 # Extra for alloca()
1c: 08020000r bl _stkchk # Extra for alloca()
20: 738b mov_s %r3,%sp # Extra, r3 to access write via pointer
22: 1b9cbf98 0000dead st 0xdead,[%fp,0xffff_fe70] ; 0xdead
2a: 22400280 add %r0,%r2,10
2e: a300 st_s %r0,[%r3] # r3 to access write via pointer
30: 270a3100 mov %blink,%r4
34: 240a36c0 mov %sp,%fp
38: 1404341b pop %fp
3c: 7ee0 j_s [%blink]
Also, your alloca() memory will be accessed through pointers (unless there are clever compiler optimisations for this... I don't know), so it causes actual memory accesses. Automatic variables might be optimized down to just register accesses, which is better... the compiler can figure out, using register colouring, which automatic variables are best left in registers and whether they ever need to be on the stack.
I had a quick search through the C99 standard (C11 is about... my reference is a little out of date). I could not see a reference to alloca, so it is probably not a standard-defined function. A possible disadvantage?

Function pointers in embedded systems, are they useful?

In an interview I was asked whether using function pointers would be beneficial (in terms of speed) when writing code for embedded systems. I had no experience with embedded systems, so I could not answer the question beyond a cloudy, vague answer.
So what are the real benefits? Speed, readability, maintenance, cost?
I think perhaps Viren Shakya's answer misses the point that the interviewer was trying to elicit. In some constructs the use of a function pointer may speed up execution. For example, if you have an index, using that to index an array of function pointers may be faster than a large switch.
If however you are comparing a static function call with a call through a pointer then Viren is right in pointing out that there is an additional operation to load the pointer variable. But no one reasonably tries to use a function pointer in that way (just as an alternative to calling directly).
Calling a function through a pointer is not an alternative to a direct call. So, the question of "advantage" is flawed; they are used in different circumstances, often to simplify other code logic and control flow and not to merely avoid a static function call. Their usefulness is in that the determination of the function to be called is performed dynamically at run-time by your code rather than statically by the linker. In that sense they are of course useful in embedded systems but not for any reason related to embedded systems specifically.
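To make the indexed-table idea concrete, here is a minimal hedged sketch of replacing a switch with a table of function pointers (all names are illustrative):

typedef unsigned int (*op_t)(unsigned int, unsigned int);

static unsigned int op_add(unsigned int a, unsigned int b) { return a + b; }
static unsigned int op_sub(unsigned int a, unsigned int b) { return a - b; }
static unsigned int op_and(unsigned int a, unsigned int b) { return a & b; }

static const op_t ops[] = { op_add, op_sub, op_and };

unsigned int dispatch(unsigned int which, unsigned int y, unsigned int z)
{
    /* one bounds check, one table load, and one indirect call,
       no matter how many cases there are */
    if (which < sizeof ops / sizeof ops[0])
        return ops[which](y, z);
    return 0;
}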
There are many uses.
The single-most important use of function pointers in embedded systems is to create vector tables. Many MCU architectures use a table of addresses located in NVM, where each address points to an ISR (interrupt service routine). Such a vector table can be written in C as an array of function pointers.
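A hedged sketch of what such a table can look like in C (the section name, handler names, and initial stack value are illustrative and depend on the MCU and linker script):

typedef void (*isr_t)(void);

void Reset_Handler(void);
void Default_Handler(void);

/* On Cortex-M parts the first entry is the initial stack pointer,
   so it is wedged in with a cast; the rest are ISR addresses. */
__attribute__((section(".isr_vector"), used))
static const isr_t vector_table[] = {
    (isr_t)0x20002000,  /* initial stack pointer (illustrative) */
    Reset_Handler,      /* reset vector                         */
    Default_Handler,    /* NMI                                  */
    Default_Handler,    /* HardFault                            */
};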
Function pointers are also useful for callback functions. As an example from the real world, the other day I was writing a driver for an on-chip realtime clock. There was only one clock on the chip, but I needed many timers. This was solved by saving a counter for each software timer, which was increased by the realtime clock interrupt. The data type looked something like this:
typedef struct
{
    uint16_t counter;
    void (*callback)(void);
} Timer_t;
When the hardware counter matched a software timer's counter, the callback function specified by the user was called through the function pointer stored together with the counter. Something like the above is quite a common construct in embedded systems.
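A hypothetical sketch of how the interrupt side of such a driver might fire those callbacks (N_TIMERS, rtc_isr, and the expiry policy are illustrative, not the actual driver described above):

#include <stdint.h>

typedef struct
{
    uint16_t counter;        /* ticks until expiry; 0 = disarmed */
    void (*callback)(void);  /* user-supplied function           */
} Timer_t;                   /* Timer_t as defined above         */

#define N_TIMERS 4
static Timer_t timers[N_TIMERS];

void rtc_isr(void)
{
    for (int i = 0; i < N_TIMERS; i++) {
        Timer_t *t = &timers[i];
        if (t->callback != 0 && t->counter != 0 && --t->counter == 0)
            t->callback();   /* dispatch through the stored pointer */
    }
}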
Function pointers are also useful when creating bootloaders and the like, where you write code into NVM at runtime and then call it. You can do this through a function pointer, but never through a linked function, as the code isn't actually there at link time.
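A minimal sketch of that pattern (the address handling is illustrative; on Cortex-M parts the Thumb bit also has to be set, as in the inline-assembly question above):

#include <stdint.h>

/* Jump to code that was programmed into NVM at runtime. A linked call
   is impossible here because the symbol did not exist at link time. */
void start_app(uint32_t nvm_address)
{
    void (*entry)(void) = (void (*)(void))(nvm_address | 1u); /* Thumb bit */
    entry();
}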
Function pointers are of course, as already mentioned, useful for many optimizations, like optimizing away a switch statement where each "case" is an adjacent number.
Another thing to consider is that this question would be a good opportunity to demonstrate how you go about making design decisions during the development process. One response I could imagine giving would be to turn the question around and consider the implementation alternatives. Taking a page from Casey's and Lundin's answers, I've found callback functions very useful in isolating my modules from each other and making code changes easier, because my code is in a perpetual prototyping stage and things change quickly and often. My current concern is ease of development, not so much speed.
In my case my code generally involves having multiple modules which need to signal each other to synchronize the order of operations. Previously I had implemented this as a whole slew of flags and data structures with extern linkage. With this implementation, two issues generally sucked up my time:
Since any module can touch the extern variables, a lot of my time is spent policing each module to make sure those variables are being used as intended.
If another developer introduced a new flag, I found myself diving through multiple modules looking for the original declaration and (hopefully) a usage description in the comments.
With callback functions that problem goes away because the function becomes the signalling mechanism and you take advantage of these benefits:
Module interactions are enforced by function interfaces and you can test for pre/post-conditions.
Less need for globally shared data structures as the callback serves as that interface to outside modules.
Reduced coupling means I can swap out code relatively easily.
At the moment I'll take the performance hit as my device still performs adequately even with all the extra function calls. I'll consider my alternatives when that performance begins to become a bigger issue.
Going back to the interview question, even though you may not be as technically proficient in the nuts and bolts of function pointers, I would think you'd still be a valuable candidate knowing you're cognizant of the tradeoffs made during the design process.
You gain on speed but lose some on readability and maintenance. Instead of evaluating an if-then-else tree every time (if a then fun_a(), else if b then fun_b(), else if c then fun_c(), else fun_default()), you do the selection once (if a then fun = fun_a, else if b then fun = fun_b, and so on), and from then on you just call fun(). Much faster. As pointed out, you cannot inline a call through a pointer, and inlining is another speed trick, but inlining the if-then-else tree doesn't necessarily make it faster than not inlining, and it is generally not as fast as the function pointer.
You lose a little readability and maintenance because you have to figure out where fun() is set and how often it changes, if ever, and ensure you don't call it before it is set up, but it is still a single searchable name you can use to find and maintain all the places where it is used.
It is basically a speed trick to avoid evaluating an if-then-else tree every time you want to perform a function. If performance is not critical, fun() could instead be a static function containing the if-then-else tree.
EDIT: Adding some examples to explain what I was talking about.
extern unsigned int fun1 ( unsigned int a, unsigned int b );

unsigned int (*funptr)(unsigned int, unsigned int);

void have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    funptr=fun1;
    j=fun1(z,5);
    j=funptr(y,6);
}
Compiling gives this:
have_fun:
stmfd sp!, {r3, r4, r5, lr}
.save {r3, r4, r5, lr}
ldr r4, .L2
mov r5, r1
mov r0, r2
mov r1, #5
ldr r2, .L2+4
str r2, [r4, #0]
bl fun1
ldr r3, [r4, #0]
mov r0, r5
mov r1, #6
blx r3
ldmfd sp!, {r3, r4, r5, pc}
What I assume Clifford was talking about is that a direct call, if near enough (depending on the architecture), is one instruction:
bl fun1
whereas a function pointer is going to cost you at least two:
ldr r3, [r4, #0]
blx r3
I had also mentioned that the difference between direct and indirect is the extra load you incur.
Before moving on, it is worth mentioning the pros and cons of inlining. In the case of ARM, which is what these examples are using, the calling convention uses r0-r3 for incoming parameters to a function and r0 for the return value. So on entry to have_fun() with three parameters, r0-r2 have content. With ARM it is also assumed that a function can destroy r0-r3, so have_fun() needs to preserve the inputs and then place the two inputs to fun1() in r0 and r1, so a bit of a register dance happens:
mov r5, r1
mov r0, r2
mov r1, #5
ldr r2, .L2+4
str r2, [r4, #0]
bl fun1
The compiler was smart enough to see that we never needed the first input to have_fun(), so r0 was discarded and allowed to be changed right away. The compiler was also smart enough to know that we would never need the third parameter, z (r2), after passing it to fun1() in the first call, so it didn't need to be saved in a higher register. r1, though, the second parameter to have_fun(), does need to be preserved, so it is put in a register that won't get destroyed by fun1(). You can see the same kind of thing happen for the second function call.
Assuming fun1() is this simple function:
inline unsigned int fun1 ( unsigned int a, unsigned int b )
{
    return(a+b);
}
When you inline fun1() you get something like this:
stmfd sp!, {r4, lr}
mov r0, r1
mov r1, #6
add r4, r2, #5
The compiler does not need to shuffle the lower registers about to prepare for a call. Likewise, you may have noticed that r4 and lr are preserved on the stack when we enter have_fun(). With this ARM calling convention a function can destroy r0-r3 but must preserve all the other registers; since have_fun() in this case needed more than four registers to do its thing, it saved the contents of r4 on the stack so that it could use it. Likewise, this function as I compiled it did call another function; the bl/blx instruction uses/destroys the lr register (r14), so in order for have_fun() to return we also have to preserve lr on the stack. The simplified example for fun1() did not show this, but another saving you get from inlining is that on entry the called function does not have to set up a stack frame and preserve registers; it really is as if you took the code from the function and shoved it inline into the calling function.
Why wouldn't you inline all the time? Well, first, it can and will use more registers, and that can lead to more stack use, and stack is slow relative to registers. Most important, though, is that it increases the size of your binary: if fun1() were a good-sized function and you called it 20 times in have_fun(), your binary would be considerably larger. For modern computers with gigabytes of RAM, a few hundred or a few dozen thousand bytes is no big deal, but for embedded systems with limited resources this can make or break you. On a modern gigahertz multicore desktop, how often do you need to shave an instruction or five anyway? Sometimes, yes, but not all the time for every function. So just because you can probably get away with it on a desktop does not mean you should.
Back to function pointers. The point I was trying to make with my answer is: in what situations would you likely want to use a function pointer anyway, what are the use cases, and in those use cases how much does it help or hurt?
The kinds of cases I was thinking of are plugins, code specific to a calling parameter, or generic code reacting to the specific hardware detected. For example, a hypothetical tar program may want to output to a tape drive, a file system, or something else, and you may choose to write the code with generic functions called through function pointers. Upon entry to the program the command-line parameters indicate the output device, and at that point you set the function pointers to the device-specific functions:
if(outdev==OUTDEV_TAPE) data_out=data_out_tape;
else if(outdev==OUTDEV_FILE)
{
    //open the file, etc
    data_out=data_out_file;
}
...
Or perhaps you don't know if you are running on a processor with an FPU, or which FPU type you have, but you know that a floating-point divide you want to do can run much faster using the FPU:
if(fputype==FPU_FPA) fdivide=fdivide_fpa;
else if(fputype==FPU_VFP) fdivide=fdivide_vfp;
else fdivide=fdivide_soft;
And absolutely you can use a case statement instead of an if-then-else tree; there are pros and cons to each, and some compilers turn a case statement into an if-then-else tree anyway, so it doesn't always matter. The point I was trying to make is that if you do this one time:
if(fputype==FPU_FPA) fdivide=fdivide_fpa;
else if(fputype==FPU_VFP) fdivide=fdivide_vfp;
else fdivide=fdivide_soft;
And do this everywhere else in the program:
a=fdivide(b,c);
Compared to a non-function-pointer alternative where you do this everywhere you want to divide:
if(fputype==FPU_FPA) a=fdivide_fpa(b,c);
else if(fputype==FPU_VFP) a=fdivide_vfp(b,c);
else a=fdivide_soft(b,c);
The function pointer approach, even though it costs you an extra ldr on each call, is a lot cheaper than the many instructions required for the if-then-else tree. You pay a little up front to set up the fdivide pointer one time, and then pay an extra ldr at each call site, but overall it is faster than this:
unsigned int fun1 ( unsigned int a, unsigned int b );
unsigned int fun2 ( unsigned int a, unsigned int b );
unsigned int fun3 ( unsigned int a, unsigned int b );

unsigned int (*funptr)(unsigned int, unsigned int);

unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    switch(x)
    {
        default:
        case 1: j=fun1(y,z); break;
        case 2: j=fun2(y,z); break;
        case 3: j=fun3(y,z); break;
    }
    return(j);
}

unsigned int more_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    j=funptr(y,z);
    return(j);
}
gives us this:
cmp r0, #2
beq .L3
cmp r0, #3
beq .L4
mov r0, r1
mov r1, r2
b fun1
.L3:
mov r0, r1
mov r1, r2
b fun2
.L4:
mov r0, r1
mov r1, r2
b fun3
instead of this:
mov r0, r1
ldr r3, .L7
mov r1, r2
blx r3
For the default case the if-then-else tree burns two compares and two beqs before calling the function directly. Basically, sometimes the if-then-else tree will be faster and sometimes the function pointer is faster.
Another comment I made: what if you used inlining to make that if-then-else tree faster, instead of a function pointer? Inlining is always faster, right?
unsigned int fun1 ( unsigned int a, unsigned int b )
{
    return(a+b);
}

unsigned int fun2 ( unsigned int a, unsigned int b )
{
    return(a-b);
}

unsigned int fun3 ( unsigned int a, unsigned int b )
{
    return(a&b);
}

unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    switch(x)
    {
        default:
        case 1: j=fun1(y,z); break;
        case 2: j=fun2(y,z); break;
        case 3: j=fun3(y,z); break;
    }
    return(j);
}
gives:
have_fun:
cmp r0, #2
rsbeq r0, r2, r1
bxeq lr
cmp r0, #3
addne r0, r2, r1
andeq r0, r2, r1
bx lr
LOL, ARM got me on that one. That is nice. You can imagine, though, that for a generic processor you would get something like:
cmp r0, #2
beq .L3
cmp r0, #3
beq .L4
and r0,r1,r2
bx lr
.L3:
sub r0,r1,r2
bx lr
.L4:
add r0,r1,r2
bx lr
You still burn the compares, and the more cases you have, the longer the if-then-else tree gets. It doesn't take much for the average case to take longer than the function-pointer solution:
mov r0, r1
ldr r1, .L7
ldr r3,[r1]
mov r1, r2
blx r3
Then I also mentioned readability and maintenance: using the function pointer approach, you always need to be aware of whether or not the function pointer has been assigned before using it. You cannot always just grep for that function name and find what you are looking for in someone else's code; ideally you find the one place where the pointer is assigned, and then you can grep for the real function names.
Yes, there are many other use cases for function pointers, and the ones I have described can be solved in many other ways, efficient or not. I was trying to give the poster some ideas on how to think through different scenarios.
I think the most important thing about this interview question is not that there is a right or wrong answer, because I think there is not, but what it reveals about what the interviewee knows about what compilers do or don't do, the kinds of things I described above. The interview question, to me, is really several questions: do you understand what the compiler actually does and what instructions it generates? Do you understand that fewer or more instructions is not necessarily faster? Do you understand these differences across different processors, or do you at least have a working knowledge of at least one processor? Then it goes on to readability and maintenance. That is another stream of questions, one that has to do with your experience reading other people's code and then maintaining your own code or other people's code. It is a cleverly designed question, in my opinion.
I would have said that they are beneficial (in terms of speed) in any environment, not just embedded. The idea is that once the pointer has been pointed at the correct function, there is no further decision logic required in order to call that function.
Yes, they are useful. I'm not sure what the interviewer was getting at. Basically it is irrelevant if the system is embedded or not. Unless you have a severely limited stack.
Speed: No. The fastest system would be a single function that uses only global variables and gotos scattered throughout. Good luck with that.
Readability: Yes, it might confuse some people, but overall certain code is more readable with function pointers. They also allow you to increase the separation of concerns between the various aspects of the source code.
Maintainability: Yes, with function pointers you will have fewer conditionals, less duplicated code, increased separation of code, and generally more orthogonal software.
One negative aspect of function pointers is that they will never be inlined at the call sites. This may or may not matter, depending on whether you are compiling for speed or for size; if the latter, they should be no different from normal function calls.
Another disadvantage of function pointers (with respect to virtual functions, since at the core level those are nothing but function pointers):
Making a function both inline and virtual forces the compiler to create an out-of-line copy of the same function. This will increase the size of the final binary (assuming heavy use is made of it).
Rule of thumb: don't make virtual calls inline.
That was a trick question. There are industries where pointers are forbidden.
Let's see...
Speed (say we are on ARM): then (theoretically):
(normal function call instruction size) < (function pointer call-setup instruction(s) size)
Since there is an additional level of indirection in setting up a function pointer call, it will involve at least one additional ARM instruction.
PS: A normal function call is a function call that is set up with BL.
PPS: I don't know the actual sizes for them, but it should be easy to verify.