Favorability of alloca for array allocation vs simple [] array declaration - objective-c

Reading some Apple code, I stumbled upon the following C chunk
alloca(sizeof(CMTimeRange) * 3)
is this the same thing as allocating stack memory via
CMTimeRange *p = CMTimeRange[3] ?
Are there any implications for performance? A need to free the memory?

If you really only want to allocate 3 elements of something on the stack, using alloca makes no sense at all. It only makes sense if you have a length that depends on some dynamic parameter at runtime, or if you do an unknown number of such allocations in the same function.
alloca is not a standard function and differs from platform to platform. The C standard has instead preferred to introduce VLAs, variable length arrays, as a replacement.

is this the same thing as allocating stack memory via...
I would think not quite. Declaring a local variable causes the memory to be reserved when the stack frame is entered (by subtracting the size of the variable from the stack pointer and adjusting for alignment).
It looks like alloca(3) works by adjusting the stack pointer at the moment it is encountered. Note the "Bugs" section of the man page.
alloca() is machine and compiler dependent; its use is discouraged.
alloca() is slightly unsafe because it cannot ensure that the pointer returned points to a valid and usable block of memory. The allocation made may exceed the bounds of the stack, or even go further into other objects in memory, and alloca() cannot determine such an error. Avoid alloca() with large unbounded allocations.
These two points together add up to the following in my opinion:
DO NOT USE ALLOCA

Assuming, as Joachim points out, you mean CMTimeRange someVariableName[3]...
Both will allocate memory on the stack.
I'm guessing alloca() has to add extra code after your function prologue to do the allocation. The function prologue is code that the compiler automatically generates for you to create room on the stack. The upshot is that your function may be slightly larger once compiled, but not by much: a few extra instructions to modify the stack pointer and possibly the stack frame. I suppose a compiler could optimize the call away, or even hoist it out of a conditional branch, though.
I experimented on my MQX compiler with no optimisations... it's not Objective-C, just C, and also a different platform, but hopefully that's a good enough approximation, and it does show a difference in emitted code. I used two simple functions with a large array on the stack to make sure stack space had to be used (the variables couldn't live solely in registers).
Obviously it is not advisable to put large arrays on the stack... this is just for demo purposes.
unsigned int TEST1(unsigned int stuff)
{
    unsigned int a1[100]; // Make sure it must go on stack
    unsigned int a2[100]; // Make sure it must go on stack
    a1[0] = 0xdead;
    a2[0] = stuff + 10;
    return a2[0];
}

unsigned int TEST2(unsigned int stuff)
{
    unsigned int a1[100]; // Make sure it must go on stack
    unsigned int *a2 = alloca(sizeof(unsigned int)*100);
    a1[0] = 0xdead;
    a2[0] = stuff + 10;
    return a2[0];
}
The following assembler was generated:
TEST1:
Both arrays a1 and a2 are put on the stack in the function prologue...
0: 1cfcb6c8 push %fp
4: 230a3700 mov %fp,%sp
8: 24993901 sub3 %sp,%sp,100 # Both arrays put on stack
c: 7108 mov_s %r1,%r0
e: 1b38bf98 0000dead st 0xdead,[%fp,0xffff_fce0] ; 0xdead
16: e00a add_s %r0,%r0,10
18: 1b9cb018 st %r0,[%fp,0xffff_fe70]
1c: 240a36c0 mov %sp,%fp
20: 1404341b pop %fp
24: 7ee0 j_s [%blink]
TEST2:
Only array a1 is put on the stack in the prologue... Extra lines of code have to be generated to deal with the alloca.
0: 1cfcb6c8 push %fp
4: 230a3700 mov %fp,%sp
8: 24593c9c sub3 %sp,%sp,50 # Only one array put on stack
c: 240a07c0 mov %r4,%blink
10: 220a0000 mov %r2,%r0
14: 218a0406 mov %r1,0x190 # Extra for alloca()
18: 2402305c sub %sp,%sp,%r1 # Extra for alloca()
1c: 08020000r bl _stkchk # Extra for alloca()
20: 738b mov_s %r3,%sp # Extra, r3 to access write via pointer
22: 1b9cbf98 0000dead st 0xdead,[%fp,0xffff_fe70] ; 0xdead
2a: 22400280 add %r0,%r2,10
2e: a300 st_s %r0,[%r3] # r3 to access write via pointer
30: 270a3100 mov %blink,%r4
34: 240a36c0 mov %sp,%fp
38: 1404341b pop %fp
3c: 7ee0 j_s [%blink]
Also, your alloca() memory will be accessed through pointers (unless there are clever compiler optimisations for this... I don't know), so it causes actual memory accesses. Automatic variables might be optimized into pure register accesses, which is better; the compiler can figure out, using register colouring, which automatic variables are best left in registers and whether they ever need to be on the stack.
I had a quick search through the C99 standard (C11 is out... my reference is a little out of date). I could not see a reference to alloca, so it is probably not a standard-defined function. A possible disadvantage?


Representing objects of properties and methods in memory

Representing objects of properties and methods in memory: does anyone have a picture or drawing to explain how the computer deals with this and stores properties in memory?
Computers do not really store abstract information of that sort at the basic level. There, you essentially have numbers--in binary, but that is not important--and it is generally up to software to interpret these numbers.
In the von Neumann model, which close to every system is based on, you have one big address space. You can index into it, so your CPU can, for example, fetch the number that sits at a given address, or write a new number to an address, and that is mostly all there is to storing data. Usually, but not always, the addresses pick out individual bytes of your memory, but your computer could address larger or smaller word sizes; for example, you might have a computer that addresses 32-bit words instead of 8-bit words. It doesn't matter for the overall model, though. You just have a big block of memory and you can get the data at individual addresses.
How you interpret this data is up to the program. Well, almost. Imagine a stretch of memory holding some data, say the zero-terminated string "Hello, World\n"; it is that string only if we interpret it as ASCII-encoded text. If we interpreted it as an array of integers instead, then that is what it would be. The hardware doesn't care how you interpret the data.
What makes a computer a von Neumann machine is that both data and program are represented in the same memory. Not only can we get to any data via its address, but we can get to the code we want to run as well. There isn't any difference between the two. A program, or a function, or a method, is just an address where you have a sequence of numbers, and the CPU can interpret these numbers as executable code. You could, in theory, point at "Hello, World\n" and tell the CPU to run it as a program. (I don't recommend it.)
When it comes to executable code, there is the slight difference that the CPU does the interpretation. In your own program, you can mostly choose how to represent data (although there might be penalties if you want representations different from what the raw hardware gives you), but the CPU will interpret certain numbers as specific instructions and execute them as such. At least, that is how it works if you run native code; if you have a virtual machine, then the virtual machine is a program that interprets your code, and its interpretation of the data can be quite different from the CPU's. The virtual machine, though, will typically run native code, so you are still relying on the CPU's interpretation of numbers, although indirectly.
I should also mention that modern hardware and operating systems do not usually stick with the simple von Neumann model. If you treat program and data as interchangeable, you get some massive security holes. In practice, you have some form of permissions set on different memory blocks, and your code has to sit in a block that you are allowed to execute, while your data (typically) does not. You can switch the permissions, though, if you want to generate native executable code at runtime, and virtual machines often do this.
Anyway, for simplicity, let's just say that we have a simple von Neumann model. Then both program and data are just chunks of memory that we either interpret as program (which will be executed by the CPU when we tell it to run the code at a given address) or as data (where our software is responsible for interpreting the numbers in memory as some higher-level data structure).
There aren't any differences between object, properties, or other higher-level concepts at this level. Those are entirely dealt with at the level(s) above the hardware. They are simply interpretations of the raw numbers that sit in memory.
Update: a few more details...
Storing objects
The hardware doesn't know anything about objects. It has addresses, and there are numbers (or bit-patterns, if you prefer) at those addresses. Most data types span more than one address. If, for example, we can address bytes, but integers take up four bytes (i.e. they are 32-bit integers), then naturally we need four bytes, at four addresses, to represent an integer. They will be represented as four contiguous bytes, and depending on the architecture you might have the most-significant byte first or last (this is known as endianness). So, the number 10 (which fits in a single byte, but is still a four-byte integer) might be represented as 0x00 0x00 0x00 0x0a or 0x0a 0x00 0x00 0x00. The 0x0a byte is 10, and it might be first or last.
What then about structures, which are the closest thing to what we think of as objects? They are larger blocks of attributes/properties/entries/whatever, and they are represented the same way. Blocks of memory are all we have.
If you have an object that contains two integers, say a representation of a rectangle, then the object sits somewhere in memory and will contain the representation of those two integers.
rect:
h, w: int
I’ve intentionally made up the syntax for this, since it isn’t language specific, and different languages and runtime systems have different variations on how they do this, but they all do something similar.
Here, one representation could be a block of 8 bytes, two 4-byte integers, where the first is h and the second is w. There might be padding between elements, so the objects are aligned the way the hardware prefers, but I will ignore that here.
If the object sits at address 0xaf0de4, that means that h also sits there (assuming that there is no extra information stored in the object), and that means that w sits four bytes later, if integers take up four bytes of space. Again, the details will differ, but this is generally how it is done if you know the layout of objects at compile time. (If you don't know them until runtime, you will instead have a table of attributes, and the object contains the table instead).
Now, what happens if an object contains other objects? Say, what if the rectangle is represented by two points instead, and the points are objects
point:
x, y: int
rect:
p1, p2: point
In the simplest version, nothing changes. The rect object contains two points, so the points are embedded in the memory that represents the rect.
This doesn’t always work, though. If you have polymorphic types, you might not know the concrete type of a contained object, so you cannot allocate memory. In that case, instead of containing the other object, you will have a reference to it, a pointer. The rect object would hold the addresses of the two points, and the points would sit elsewhere in memory. This is also what you have to do if you want to build non-trivial data structures, so it isn’t specific to object orientation or objects.
In an OOP context, there might be a bit more work to it, but we will get to that. First, let’s consider functions (and let’s go back to a rectangle that just holds h and w).
Representation of functions
Code is just blocks of memory as well, but where the numbers represent instructions to the CPU. Let’s say we want to multiply two numbers, then we might have an instruction that looks like
mul a, b, c
that says that the CPU should take the numbers in registers a and b, multiply them, and put the result in register c. You usually have instructions that take the input from memory or as constants or such as well, but let’s just consider a single simple instruction: multiply two numbers you have in registers and put the result in a third register.
The mul instruction has a number. Completely arbitrarily we can say that it is the byte 0xef. The three arguments specify registers, and if they are a byte each we can have up to 256 registers. The full instruction would contain four bytes, the mul instruction 0xef and the three arguments. If we want to multiply register r1 with register r2 and put the result in register r0, the instruction would be
mul r1, r2, r0
0xef 0x01 0x02 0x00
so what the computer sees is the program 0xef 0x01 0x02 0x00.
For functions, we need two things more: a way to return, and a way to handle input and output.
The return bit is easy. There will be a ret instruction that returns to where the function was called, handling stack registers and such in the process. We can pretend that ret has code 0xab.
Input and output is specified by a calling convention, and it isn’t tied to the hardware as such. You need an agreed upon way to pass arguments to functions and you need to know where the result is when the function returns, but that is all there is to it. On our imaginary architecture, we could say that input one and two will be in registers r1 and r2 and that the output should be in r0 when we return. That way, we can make a simple multiplication function
fun mult(a, b): return a * b
with the instructions
mul r1, r2, r0 ; 0xef 0x01 0x02 0x00
ret ; 0xab
and the computer will store it as the numbers 0xef 0x01 0x02 0x00 0xab. If you know where this code/data sits in memory, e.g. 0x00beef, you can call the function with some other instruction, call (which also has a number, say 0x10), followed by the address. (An address is typically 8 bytes on a desktop, i.e. 64 bits, so the three bytes in 0x00beef would be padded with zeros before or after, depending on endianness. I will pretend that we have three-byte addresses to make this more readable.)
To call the function, you first need to get the arguments into the correct registers, so if you want to get the area of our rect object, you want to get h and w into registers r1 and r2.
What you want to do is call
area = mult(rect.h, rect.w)
so how do you get rect.h and rect.w into registers? You need instructions for that. Let’s say that we have a mov instruction (0x12) that looks like this:
mov adr, reg
where adr is an address (3 bytes on this imaginary architecture) and reg is a register (1 byte). The full instruction is 5 bytes (the 0x12 instruction, the 3 byte address and the 1 byte register). If your rect object sits at 0xaf0de4, then we have rect.h at 0xaf0de4 as well, and we have rect.w four bytes later, at 0xaf0de8. Calling mult(rect.h, rect.w) involves these instructions
mov 0xaf0de4, r1 ; rect.h -> r1
mov 0xaf0de8, r2 ; rect.w -> r2
call 0x00beef ; mult(rect.h, rect.w)
; now rect.h * rect.w is in r0
The actual data stored on the computer is the codes for this:
; mov 0xaf0de4, r1
0x12 0xaf 0x0d 0xe4 0x01
; mov 0xaf0de8, r2
0x12 0xaf 0x0d 0xe8 0x02
; call 0x00beef
0x10 0x00 0xbe 0xef
Everything is still just numbers that we can access through addresses.
Here, of course, the addresses we have used are hardwired into the program, and that doesn’t work in real life. You don’t know where all the objects will be when you compile your program. Some addresses you do know, once you fire up your executable. The location of functions, for example, will be known, and the linker can insert the correct addresses where you need them. Locations of objects, typically not. But there will be instructions like mov that takes the address from a register instead of from the program. We could, for example, have an instruction
mov a[offset], b
that moves data from the address stored in register a + offset into register b. It might have another number, say 0x13 instead of 0x12, but in assembly you typically write the same mnemonic, so you don't see the difference there.
You would also have an instruction for putting a constant into a register, and I wouldn’t be surprised if that is also called mov and would have the form
mov a, b
where a is now a constant, i.e. some number, and you put that number in register b. The assembly looks the same, but the instruction might have number 0x14.
Anyway, we could use that to call mult(rect.h, rect.w) instead. Then the code would be
mov 0xaf0de4, r3 ; put the address of rect in r3
; 0x14 0xaf 0x0d 0xe4 0x03
mov r3[0], r1 ; put the value at r3+0 into r1
; 0x13 0x03 0x00 0x01
mov r3[4], r2 ; put the value at r3+4 into r2
; 0x13 0x03 0x04 0x02
call 0x00beef
; 0x10 0x00 0xbe 0xef
If we have these instructions, we could also modify our function mult(a,b) to one that takes a rectangle as input and returns the area
fun area(rect): return rect.h * rect.w
The function can get the address of the object as its single argument, where it would go in register r1, and from there it could load rect.h and rect.w to multiply them.
; area(rect) -- address of rect in r1
mov r1[0], r2 ; rect.h -> r2
mov r1[4], r3 ; rect.w -> r3
mul r2, r3, r0 ; rect.h * rect.w -> r0
ret ; return rect.h * rect.w
It gets more complicated than this, but you should have the idea now. Our functions are sequences of such instructions, and the arguments to them, and the result value, are passed back and forth, usually through registers, by some calling convention. If you want to pass a value to a function, you need to put it in the right register (or on the stack, depending on the calling convention), and then the function will operate on it. What it does with the object is entirely software; the hardware doesn't care that much.
Classes and polymorphism
What then if we want polymorphic methods? If we have a class hierarchy of geometric objects and rect is just one of them, and all of them should have an area method that, when called, is dispatched based on the objects’ class?
When you have polymorphic methods, what you really have is a bunch of different functions. If you call x.area() on an object x that happens to be a circle, then you are really calling circle_area(x), while if x is a rect you are calling rect_area(x). The only thing you need to make this work is having a mechanism for dispatching to the right function call.
Here, again, the details differ (a lot), but a simple solution is to put pointers to the correct function in the objects. If you call x.area() maybe you know that the first element in the memory of x is a pointer to its specific area function. So, instead of calling a function directly, you fetch the address of the function from x and then you call it.
x.area() == (x.area_func)(x)
All objects you can call area() on should have this function, and they should have it at the same offset from the address of the object, and then it can be as simple as that.
This can, of course, be wasteful in memory if your classes have lots of methods. You are storing a pointer to each method in each object (and you also have to spend time on initialising this, so there is additional overhead there as well).
Then another solution can be to add a level of indirection. If the methods are the same for all objects of a class (which they often are, but not for all languages) then you can put the table of methods in a class object and have a single pointer to the class in each object. When you need to get the right function you first get the class and then you get the function from it.
x.area() == (x.class.area_func)(x)
With single inheritance, the tables in the different classes can have different sizes, and it doesn’t get more complicated because of that. With multiple inheritance, it does get more complicated, but that is handled very differently in different languages so it is hard to say anything general about that.

Strange default value for ints [duplicate]

If in C I write:
int num;
Before I assign anything to num, is the value of num indeterminate?
Static variables (file scope and function static) are initialized to zero:
int x; // zero
int y = 0; // also zero
void foo() {
    static int x; // also zero
}
Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.
void foo() {
    int x;
    printf("%d", x); // the compiler is free to crash here
}
In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.
As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.
Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.
0 if static or global, indeterminate if storage class is auto
C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.
This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.
This was the case in C89, see section 6.5.7 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
This was the case in C99, see section 6.7.8 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then: — if it has pointer type, it is initialized to a null pointer; — if it has arithmetic type, it is initialized to (positive or unsigned) zero; — if it is an aggregate, every member is initialized (recursively) according to these rules; — if it is a union, the first named member is initialized (recursively) according to these rules.
As to what exactly indeterminate means, I'm not sure for C89, C99 says:
3.17.2 indeterminate value: either an unspecified value or a trap representation
But regardless of what the standards say, in real life each stack page actually does start off as zero; however, when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays, you will eventually see them start neatly with zeroes.
You might wonder, why is it this way? A different SO answer deals with that question, see: https://stackoverflow.com/a/2091505/140740
It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.
As for automatic (local) variables, an uninitialized variable has indeterminate value. Indeterminate value, among other things, means that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code
int num;
int a = num;
int b = num;
does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept, this readily happens in practice as consequence of optimization.
So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. Uninitialized variable's behavior is different from that of a variable initialized with garbage.
Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example
Enough standards, let's look at an implementation :-)
Local variable
Standards: undefined behavior.
Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.
#include <stdio.h>

int main() {
    int i;
    printf("%d\n", i);
}
compile with:
gcc -O0 -std=c99 a.c
outputs:
0
and decompiles with:
objdump -dr a.out
to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 10 sub $0x10,%rsp
40053e: 8b 45 fc mov -0x4(%rbp),%eax
400541: 89 c6 mov %eax,%esi
400543: bf e4 05 40 00 mov $0x4005e4,%edi
400548: b8 00 00 00 00 mov $0x0,%eax
40054d: e8 be fe ff ff callq 400410 <printf@plt>
400552: b8 00 00 00 00 mov $0x0,%eax
400557: c9 leaveq
400558: c3 retq
From our knowledge of x86-64 calling conventions:
%rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4
%rsi is the second printf argument, thus i.
It comes from -0x4(%rbp), which is the first 4-byte local variable.
At this point, rbp is in the first page of the stack, which has been allocated by the kernel, so to understand that value we would need to look into the kernel code and find out what it sets that memory to.
TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?
We can then also play with our own stack modifications and write fun things like:
#include <assert.h>
int f() {
int i = 13;
return i;
}
int g() {
int i;
return i;
}
int main() {
f();
assert(g() == 13);
}
Note that GCC 11 seems to produce a different assembly output, and the above code stops "working", it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?
Local variable in -O3
Implementation analysis at: What does <value optimized out> mean in gdb?
Global variables
Standards: 0
Implementation: .bss section.
#include <stdio.h>

int i;

int main() {
    printf("%d\n", i);
}
gcc -O0 -std=c99 a.c
compiles to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 8b 05 04 0b 20 00 mov 0x200b04(%rip),%eax # 601044 <i>
400540: 89 c6 mov %eax,%esi
400542: bf e4 05 40 00 mov $0x4005e4,%edi
400547: b8 00 00 00 00 mov $0x0,%eax
40054c: e8 bf fe ff ff callq 400410 <printf@plt>
400551: b8 00 00 00 00 mov $0x0,%eax
400556: 5d pop %rbp
400557: c3 retq
400558: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40055f: 00
# 601044 <i> says that i is at address 0x601044 and:
readelf -SW a.out
contains:
[25] .bss NOBITS 0000000000601040 001040 000008 00 WA 0 0 4
which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.
The ELF standard then guarantees that the section named .bss is completely filled with zeros:
.bss This section holds uninitialized data that contribute to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
Furthermore, the section type SHT_NOBITS is efficient and occupies no space in the executable file:
sh_size This member gives the section's size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.
Then it is up to the Linux kernel to zero out that memory region when it loads the program into memory at startup.
That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.
The basic answer is: yes, it is undefined.
If you are seeing odd behavior because of this, it may depend on where the variable is declared. If it is within a function, on the stack, then the contents will more than likely be different every time the function gets called. If it has static or module scope, it is zero-initialized and will not change.
Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.
As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:
uint16_t hey(uint32_t x, uint32_t mode)
{
    uint16_t q;
    if (mode==1) q=2;
    if (mode==3) q=4;
    return q;
}

uint32_t wow(uint32_t mode) {
    return hey(1234567, mode);
}
might very well result in wow() storing the values 1234567 and mode into registers 0 and 1, respectively, and calling hey(). Since x isn't needed within hey(), and since functions are supposed to put their return value into register 0, the compiler may allocate register 0 to q. If mode is 1 or 3, register 0 will be loaded with 2 or 4, respectively, but if it is some other value, the function may return whatever was in register 0 (i.e. the value 1234567) even though that value is not within the range of uint16_t. To avoid requiring compilers to do extra work to ensure that uninitialized variables never seem to hold values outside their domain, and to avoid needing to specify indeterminate behaviors in excessive detail, the Standard says that use of uninitialized automatic variables is Undefined Behavior. In some cases, the consequences of this may be even more surprising than a value being outside the range of its type. For example, given:
void moo(int mode)
{
    if (mode < 5)
        launch_nukes();
    hey(0, mode);
}
a compiler could infer that because invoking moo() with a mode greater than 3 will inevitably lead to the program invoking Undefined Behavior, it may omit any code which would only be relevant if mode is 4 or greater, such as the code which would normally prevent the launch of nukes in such cases. Note that neither the Standard, nor modern compiler philosophy, would care about the fact that the return value from hey() is ignored; the act of trying to return it gives a compiler unlimited license to generate arbitrary code.
If the storage class is static or global, then during loading the variable's memory location is initialized to 0 (this is the job of the BSS section), unless the variable was explicitly given an initial value. For local uninitialized variables, the memory location may hold a trap representation, so reading it may crash the program.
But some compilers may have mechanisms to avoid such a problem.
I was working with the NEC V850 series when I realised there are trap representations, i.e. bit patterns that represent undefined values, for data types other than char. When I read an uninitialized char, I got a zero default value. This might be useful for anyone using the NEC V850ES.
As far as I have seen, it mostly depends on the compiler, but in most cases the value is assumed to be 0 by the compiler.
I got a garbage value with VC++ while Turbo C gave 0.
I printed it like below:
int i;
printf("%d", i);

Inline ASM - Use 16 or 32 bit C Variable (GCC ARM, Thumb Mode)

I'm currently using the following inline ASM for the Cortex-M3 to branch to a specific address in flash.
__asm("LDR R0, =0x8000"); // Load the branch address
__asm("LDR R1, [R0]"); // Get the branch address
__asm("ORR R1, #1"); // Make sure the Thumb State bit is set.
__asm("BX R1"); // Branch execution
However, I want to replace the hard-coded value 0x8014 with a C variable that will be computed based on some other conditions.
The largest possible value this variable can take is 0x20000, so I'd planned on using a uint32_t to store it.
The compiler being used is arm-none-eabi-gcc v4.9.3
I attempted to modify my inline ASM as follows:
uint32_t destination_address = 0x8000;
__asm( "LDR R0, =%[dest]" : : [dest]"r"(destination_address) );
However, this generates the compiler error:
undefined reference to `r3'
I am fairly new to inline ASM in general. I've tried researching this issue for two days or so, but I've been confused by conflicting answers owing to the diversity of compilers out there and the fact I am using Thumb instructions for the Cortex-M3.
I think my problem is that I need to find the correct constraint for the variable destination_address (range 0x0 - 0x20000), but I'm not sure.
Why are you using inline assembly?
extern void HOP ( unsigned int );
...
unsigned int some_address;
..
some_address = some_math;
HOP(some_address);
and a few lines of real asm, which you can assemble (even via the C compiler, if you really feel you have to) into an object file to link in.
.globl HOP
HOP:
bx r0
The added benefit is that it is basically a branch-and-link, if you want it to be.
The compiler has already computed the address, it sounds like, so you "simply" need to get it into a register and bx to it. Inline assembly is extremely compiler-specific, so you need to start by stating which assembler, version, etc. you are using.
another thing you can do is if you have this
unsigned int some_address;
..
some_address = some_math;
you can use this assembly somewhere in the project.
ldr r0,=some_address
ldr r0,[r0]
bx r0
and the linker will resolve the address to the C variable. So you can use real assembler or inline for something like that (if the inline doesn't support something like mov %0,some_address; bx %0 and do the work for you).
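For reference, the likely cause of the "undefined reference to `r3'" error above: with the "r" constraint, GCC substitutes a register name (r3, say) for %[dest], so "LDR R0, =%[dest]" expands to LDR R0,=r3, and the toolchain then treats r3 as an undefined symbol. One common way out is to let the constraint do the load and just branch through the register. A hedged sketch only (cross-compiles with arm-none-eabi-gcc, not runnable on a host; jump_to is a made-up name):

```c
#include <stdint.h>

void jump_to(uint32_t destination_address)
{
    /* GCC places destination_address in a register for us; read the
       vector it points to, set the Thumb bit, and branch, mirroring
       the LDR/ORR/BX sequence from the question */
    uint32_t target = *(volatile uint32_t *)destination_address | 1u;
    __asm volatile ("BX %0" : : "r" (target));
}
```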

How do I get started with ARM on iOS?

Just curious as to how to get started understanding ARM under iOS. Any help would be super nice.
In my opinion, the best way to get started is to
Write small snippets of C code (later Objective-C)
Look at the corresponding assembly code
Find out enough to understand the assembly code
Repeat!
To do this you can use Xcode:
Create a new iOS project (a Single View Application is fine)
Add a C file scratchpad.c
In the Project Build Settings, set "Generate Debug Symbols" to "No"
Make sure the target is iOS Device, not Simulator
Open up scratchpad.c and open the assistant editor
Set the assistant editor to Assembly and choose "Release"
Example 1
Add the following function to scratchpad.c:
void do_nothing(void)
{
    return;
}
If you now refresh the Assembly in the assistant editor, you should see lots of lines starting with dots (directives), followed by
_do_nothing:
# BB#0:
bx lr
Let's ignore the directives for now and look at these three lines. With a bit of searching on the internet, you'll find out that these lines are:
A label (the name of the function prefixed with an underscore).
Just a comment emitted by the compiler.
The return statement. The b means branch, ignore the x for now (it has something to do with switching between instruction sets), and lr is the link register, where callers store the return address.
Example 2
Let's beef it up a bit and change the code to:
extern void do_nothing(void);
void do_nothing_twice(void)
{
    do_nothing();
    do_nothing();
}
After saving and refreshing the assembly, you get the following code:
_do_nothing_twice:
# BB#0:
push {r7, lr}
mov r7, sp
blx _do_nothing
pop.w {r7, lr}
b.w _do_nothing
Again, with a bit of searching on the internet, you'll find out the meaning of each line. Some more work needs to be done because we make two calls: the first call needs to return to us, so we need to change lr. That is done by the blx instruction, which not only branches to _do_nothing, but also stores the address of the next instruction (the return address) in lr.
Because we change the return address, we have to store it somewhere, so it is pushed on the stack. The second jump has a .w suffixed to it, but let's ignore that for now. Why doesn't the function look like this?
_do_nothing_twice:
# BB#0:
push {lr}
blx _do_nothing
pop.w {lr}
b.w _do_nothing
That would work as well, but in iOS, the convention is to store the frame pointer in r7. The frame pointer points to the place in the stack where we store the previous frame pointer and the previous return address.
So what the code does is: first, it pushes r7 and lr to the stack; then it sets r7 to point to the new stack frame (which is at the top of the stack, and sp points to the top of the stack); then it branches for the first time; then it restores r7 and lr; finally it branches for the second time. A bx lr at the end is not needed, because the called function will return to lr, which points to our caller.
Example 3
Let's have a look at a last example:
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}
The assembly code is:
_swap:
# BB#0:
ldr r2, [r0]
ldr r3, [r1]
str r3, [r0]
str r2, [r1]
bx lr
With a bit of searching, you will learn that arguments and return values are stored in registers r0-r3, and that we may use those freely for our calculations. What the code does is straightforward: it loads the values that r0 and r1 point to into r2 and r3, stores them back in swapped order, and then branches back.
And So On
That's it: Write small snippets, get enough info to roughly understand what's going on in each line, repeat. Hope that helps!

Function pointers in embedded systems, are they useful?

In an interview they asked me if using function pointers would be beneficial (in terms of speed) when writing code for embedded systems. I had no idea about embedded systems, so I could not answer the question with anything more than a cloudy, vague answer.
So what are the real benefits? Speed, readability, maintenance, cost?
I think perhaps Viren Shakya's answer misses the point that the interviewer was trying to elicit. In some constructs the use of a function pointer may speed up execution. For example, if you have an index, using that to index an array of function pointers may be faster than a large switch.
If however you are comparing a static function call with a call through a pointer then Viren is right in pointing out that there is an additional operation to load the pointer variable. But no one reasonably tries to use a function pointer in that way (just as an alternative to calling directly).
Calling a function through a pointer is not an alternative to a direct call. So, the question of "advantage" is flawed; they are used in different circumstances, often to simplify other code logic and control flow and not to merely avoid a static function call. Their usefulness is in that the determination of the function to be called is performed dynamically at run-time by your code rather than statically by the linker. In that sense they are of course useful in embedded systems but not for any reason related to embedded systems specifically.
There are many uses.
The single-most important use of function pointers in embedded systems is to create vector tables. Many MCU architectures use a table of addresses located in NVM, where each address points to an ISR (interrupt service routine). Such a vector table can be written in C as an array of function pointers.
Function pointers are also useful for callback functions. As an example from the real world, the other day I was writing a driver for an on-chip realtime clock. There was only one clock on the chip, but I needed many timers. This was solved by saving a counter for each software timer, which was increased by the realtime clock interrupt. The data type looked something like this:
typedef struct
{
    uint16_t counter;
    void (*callback)(void);
} Timer_t;
When the hardware timer's count matched a software timer's counter, the callback function specified by the user was called through the function pointer stored together with the counter. Something like the above is quite a common construct in embedded systems.
Function pointers are also useful when creating bootloaders etc., where you will be writing code into NVM at runtime and then calling it. You can do this through a function pointer, but never through a linked function, as the code isn't actually there at link time.
Function pointers are of course, as already mentioned, useful for many optimizations, like optimizing away a switch statement where the cases are adjacent numbers.
Another thing to consider is that this question would be a good opportunity to demonstrate how you go about making design decisions during the development process. One response I could imagine giving would be turning around and considering what your implementation alternatives are. Taking a page from Casey's and Lundin's answers, I've found callback functions very useful in isolating my modules from each other and making code changes easier because my code is in a perpetual prototyping stage and things change quickly and often. What my current concerns are is ease of development, not so much speed.
In my case my code generally involves having multiple modules which need to signal each other to synchronize the order of operations. Previously I had implemented this as a whole slew of flags and data structures with extern linkage. With this implementation, two issues generally sucked up my time:
Since any module can touch the extern variables a lot of my time is spent policing each module to make sure those variables are being used as intended.
If another developer introduced a new flag, I found myself diving through multiple modules looking for the original declaration and (hopefully) a usage description in the comments.
With callback functions that problem goes away because the function becomes the signalling mechanism and you take advantage of these benefits:
Module interactions are enforced by function interfaces and you can test for pre/post-conditions.
Less need for globally shared data structures as the callback serves as that interface to outside modules.
Reduced coupling means I can swap out code relatively easier.
At the moment I'll take the performance hit as my device still performs adequately even with all the extra function calls. I'll consider my alternatives when that performance begins to become a bigger issue.
Going back to the interview question, even though you may not be as technically proficient in the nuts and bolts of function pointers, I would think you'd still be a valuable candidate knowing you're cognizant of the tradeoffs made during the design process.
You gain on speed but lose some on readability and maintenance. Instead of an if-then-else tree (if a then fun_a(), else if b then fun_b(), else if c then fun_c(), else fun_default()) that has to be walked every time, you decide once: if a then fun = fun_a, else if b then fun = fun_b, and so on; from then on you just call fun(). Much faster. As pointed out, the call cannot be inlined, which is another speed trick, but inlining the if-then-else tree doesn't necessarily make it faster than not inlining it, and it is generally not as fast as the function pointer.
You lose a little readability and maintenance because you have to figure out where fun() is set and how often it changes, if ever, and ensure you don't call it before it is set up; but it is still a single searchable name you can use to find and maintain all the places it is used.
It is basically a speed trick to avoid walking an if-then-else tree every time you want to perform a function. If performance is not critical, fun() could just be a static function with the if-then-else tree inside it.
EDIT Adding some examples to explain what I was talking about.
extern unsigned int fun1 ( unsigned int a, unsigned int b );
unsigned int (*funptr)(unsigned int, unsigned int);
void have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    funptr=fun1;
    j=fun1(z,5);
    j=funptr(y,6);
}
Compiling gives this:
have_fun:
stmfd sp!, {r3, r4, r5, lr}
.save {r3, r4, r5, lr}
ldr r4, .L2
mov r5, r1
mov r0, r2
mov r1, #5
ldr r2, .L2+4
str r2, [r4, #0]
bl fun1
ldr r3, [r4, #0]
mov r0, r5
mov r1, #6
blx r3
ldmfd sp!, {r3, r4, r5, pc}
What I assume Clifford was talking about is that a direct call, if near
enough (depending on the architecture), is one instruction
bl fun1
Where a function pointer, is going to cost you at least two
ldr r3, [r4, #0]
blx r3
I had also mentioned the difference between direct and indirect was
the extra load you incur.
Before moving on it is worth mentioning the pros and cons of inlining.
In the case of ARM which is what these examples are using, the calling
convention uses r0-r3 for incoming parameters to a function and r0
to return. So entry into have_fun() with three parameters means r0-r3
have content. With ARM it is also assumed that a function can destroy
r0-r3, so have_fun() needs to preserve the inputs and then place the
two inputs to fun1() in r0 and r1, so a bit of a register dance happens.
mov r5, r1
mov r0, r2
mov r1, #5
ldr r2, .L2+4
str r2, [r4, #0]
bl fun1
The compiler was smart enough to see that we never needed the first
input to the have_fun() function, so r0 was discarded and allowed to
be changed right away. Also the compiler was smart enough to know
that we would never need the third parameter, z (r2), after sending
it to fun1() on the first call, so it didn't need to save it in a high
register. r1 though, the second parameter to have_fun(), does need to
be preserved, so it is put in a register that won't get destroyed by fun1().
You can see the same kind of thing happen for the second function call.
Assuming fun1() is this simple function:
inline unsigned int fun1 ( unsigned int a, unsigned int b )
{
    return(a+b);
}
When you inline fun1() you get something like this:
stmfd sp!, {r4, lr}
mov r0, r1
mov r1, #6
add r4, r2, #5
The compiler does not need to shuffle the lower registers about to
prepare for a call. Likewise, what you may have noticed is that r4 and
lr are preserved on the stack when we enter have_fun(). With this
ARM calling convention a function can destroy r0-r3 but must preserve
all the other registers, since have_fun() in this case needed more
than four registers to do its thing it saved the contents of r4 on the
stack so that it could use it. Likewise, this function as I compiled
it calls another function, and the bl/blx instruction uses/destroys the
lr register (r14), so in order for have_fun() to return we also have
to preserve lr on the stack. The simplified example for fun1() did
not show this but another savings you get from inlining is that on entry
the function called does not have to set up a stack frame and preserve
registers, it really is as if you took the code from the function and
shoved it inline with the calling function.
Why wouldn't you inline all the time? Well, first, it can and will use
more registers, and that can lead to more stack use, and stack is slow
relative to registers. Most important, though, is that it increases the
size of your binary; if fun1() were a good-sized function and you called
it 20 times in have_fun(), your binary would be considerably larger. For
modern computers with gigabytes of RAM, a few hundred or even a few
thousand bytes is no big deal, but for embedded with limited resources
this can make or break you. On a modern gigahertz multicore desktop, how
often do you need to shave an instruction or five anyway? Sometimes, yes,
but not for every function. So just because you can probably get away
with it on a desktop does not mean you should.
Back to function pointers. So the point I was trying to make with my
answer is, what situations would you likely want to use a function pointer
anyway, what are the use cases and in those use cases how much does
it help or hurt?
The kinds of cases I was thinking of are plugins, or code specific to
a calling parameter or generic code reacting to specific hardware
detected. For example, a hypothetical tar program may want to output
to a tape drive, file system, or other and you may choose to write the
code with generic functions called using function pointers. Upon entry
to the program the command line parameters indicate the output and at
that point you set the function pointers to the device specific
functions.
if(outdev==OUTDEV_TAPE) data_out=data_out_tape;
else if(outdev==OUTDEV_FILE)
{
    //open the file, etc
    data_out=data_out_file;
}
...
Or perhaps you don't know if you are running on a processor with an
fpu, or which fpu type you have, but you know that a floating point divide
you want to do can run much faster using the fpu:
if(fputype==FPU_FPA) fdivide=fdivide_fpa;
else if(fputype==FPU_VFP) fdivide=fdivide_vfp;
else fdivide=fdivide_soft;
And absolutely you can use a case statement instead of an if-then-else
tree, pros and cons to each; some compilers turn a case statement into
an if-then-else tree anyway, so it doesn't always matter. The point I
was trying to make is if you do this one time:
if(fputype==FPU_FPA) fdivide=fdivide_fpa;
else if(fputype==FPU_VFP) fdivide=fdivide_vfp;
else fdivide=fdivide_soft;
And do this everywhere else in the program:
a=fdivide(b,c);
Compared to a non-function-pointer alternative where you do this
everywhere you want to divide:
if(fputype==FPU_FPA) a=fdivide_fpa(b,c);
else if(fputype==FPU_VFP) a=fdivide_vfp(b,c);
else a=fdivide_soft(b,c);
The function pointer approach, even though it costs you an extra ldr
on each call, is a lot cheaper than the many instructions required for
the if-then-else tree. You pay a little up front to set up the fdivide
pointer one time, then pay an extra ldr on each instance, but overall
it is faster than this:
unsigned int fun1 ( unsigned int a, unsigned int b );
unsigned int fun2 ( unsigned int a, unsigned int b );
unsigned int fun3 ( unsigned int a, unsigned int b );
unsigned int (*funptr)(unsigned int, unsigned int);
unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    switch(x)
    {
        default:
        case 1: j=fun1(y,z); break;
        case 2: j=fun2(y,z); break;
        case 3: j=fun3(y,z); break;
    }
    return(j);
}
unsigned int more_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    j=funptr(y,z);
    return(j);
}
gives us this:
cmp r0, #2
beq .L3
cmp r0, #3
beq .L4
mov r0, r1
mov r1, r2
b fun1
.L3:
mov r0, r1
mov r1, r2
b fun2
.L4:
mov r0, r1
mov r1, r2
b fun3
instead of this
mov r0, r1
ldr r3, .L7
mov r1, r2
blx r3
For the default case the if-then-else tree burns two compares and two
beq's before calling the function directly. Basically sometimes the
if-then-else tree will be faster and sometimes the function pointer
is faster.
Another comment I made: what if you used inlining to make that
if-then-else tree faster instead of using a function pointer? Inlining
is always faster, right?
unsigned int fun1 ( unsigned int a, unsigned int b )
{
    return(a+b);
}
unsigned int fun2 ( unsigned int a, unsigned int b )
{
    return(a-b);
}
unsigned int fun3 ( unsigned int a, unsigned int b )
{
    return(a&b);
}
unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z )
{
    unsigned int j;
    switch(x)
    {
        default:
        case 1: j=fun1(y,z); break;
        case 2: j=fun2(y,z); break;
        case 3: j=fun3(y,z); break;
    }
    return(j);
}
gives
have_fun:
cmp r0, #2
rsbeq r0, r2, r1
bxeq lr
cmp r0, #3
addne r0, r2, r1
andeq r0, r2, r1
bx lr
LOL, ARM got me on that one. That is nice. You can imagine though
for a generic processor you would get something like
cmp r0, #2
beq .L3
cmp r0, #3
beq .L4
and r0,r1,r2
bx lr
.L3:
sub r0,r1,r2
bx lr
.L4:
add r0,r1,r2
bx lr
You still burn the compares; the more cases you have, the longer the
if-then-else tree. It doesn't take much for the average case to take
longer than a function pointer solution.
mov r0, r1
ldr r1, .L7
ldr r3,[r1]
mov r1, r2
blx r3
Then I also mentioned readability and maintenance: using the function
pointer approach, you need to always be aware of whether or not
the function pointer has been assigned before using it. You cannot
always just grep for that function name and find what you are looking
for in someone else's code; ideally you find the one place where that
pointer is assigned, then you can grep for the real function names.
Yes there are many other use cases for function pointers, and the
ones I have described can be solved in many other ways, efficient
or not. I was trying to give the poster some ideas on how to think
through different scenarios.
I think the most important thing about this interview question is not
that there is a right or wrong answer, because I think there is not,
but to see what the interviewee knows about what compilers do or don't
do, the kinds of things I described above. The interview question
to me is really several questions: do you understand what the compiler
actually does, what instructions it generates? Do you understand that
fewer or more instructions is not necessarily faster? Do you understand
these differences across different processors, or do you at least have
a working knowledge of at least one processor? Then it goes on to
readability and maintenance. That is another stream of questions that
has to do with your experience in reading other people's code and
then maintaining your own code or other people's code. It is a cleverly
designed question in my opinion.
I would have said that they are beneficial (in terms of speed) in any environment, not just embedded. The idea being that once the pointer has been pointed at the correct function, there is no further decision logic required in order to call that function.
Yes, they are useful. I'm not sure what the interviewer was getting at. Basically it is irrelevant if the system is embedded or not. Unless you have a severely limited stack.
Speed: No, the fastest system would be a single function that only used global variables and gotos scattered throughout. Good luck with that.
Readability: Yes, it might confuse some people, but overall certain code is more readable with function pointers. It will also allow you to increase the separation of concerns between the various aspects of the source code.
Maintainability: Yes, with function pointers you will have fewer conditionals, less duplicated code, increased separation of code, and generally more orthogonal software.
One negative aspect of function pointers is that they will never be inlined at the call sites. This may or may not matter, depending on whether you are compiling for speed or size. If the latter, they should be no different from normal function calls.
Another disadvantage of function pointers (with respect to virtual functions, since at the core level those are nothing but function pointers): making a function both inline and virtual forces the compiler to create an out-of-line copy of the same function, which increases the size of the final binary (assuming heavy use of it).
Rule of thumb: don't make virtual calls inline.
That was a trick question. There are industries where pointers are forbidden.
Let's see...
Speed (say we are on ARM): then (theoretically):
(Normal Function Call ARM instruction size) < (Function Pointer Call-setup instruction(s) size)
Since there is an additional level of indirection to set up a function pointer call, it will involve at least one additional ARM instruction.
PS: A normal function call: a function call that is set up with BL.
PPS: I don't know the actual sizes off-hand, but it should be easy to verify.