What is the most efficient way to pass a local variable address to extended inline ASM for ARM-v7?

Consider this code snippet:
__asm volatile (
" MOVW R0, #0x0000 \n\t"
" MOVT R0, #0x3004 \n\t"
" LDRSH R1, [R0, #12] \n\t"
⋮
)
This is the hard-coded way of loading the halfword stored at address 0x30040000 + 12 into R1.
I understand I can use a list of input and output variables and a clobber list. So if I use something like this:
int16_t localVar = 10;
__asm volatile (
" <some code with %[lv]> \n\t"
" LDRSH R1, ??? \n\t"
⋮
: [lv] "r" (localVar)
⋮
)
What do I need to replace <some code with %[lv]> and/or the question marks with so that I end up with value 10 in register R1? I generally don't know how to do this. It would be great to learn the most efficient way for ARM-v7 in terms of execution time needed.
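One possible approach, sketched here under assumptions and not taken from the original post: pass the variable's address as an "r" input operand so the compiler picks the base register, use a named output operand instead of hard-coding R1, and add a dummy "m" input so the compiler knows the asm reads the pointed-to memory. The helper name load_s16_via_address is hypothetical, and the non-ARM branch is only a plain C fallback so the snippet also builds on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: loads a sign-extended halfword through its address. */
static inline int32_t load_s16_via_address(const int16_t *p)
{
    int32_t out;

    assert(p != NULL);
#if defined(__arm__) && defined(__GNUC__)
    __asm volatile (
        "ldrsh %[out], [%[addr]] \n\t"   /* out = *(int16_t *)addr */
        : [out] "=r" (out)               /* compiler chooses the result register */
        : [addr] "r" (p),                /* compiler chooses the base register */
          "m" (*p)                       /* dummy operand: the asm reads *p */
    );
#else
    out = *p;                            /* host fallback, no inline asm */
#endif
    return out;
}
```

Letting the compiler allocate both registers usually avoids extra moves around the asm statement. If the value really has to end up in R1, a GCC local register variable bound to r1 is one way to request that.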

Related

Variable exporting between asm files

In asm file 1 I try to export a variable so that I can use it in another file.
I have looked through manuals and tutorials, but without success.
So, how can I share a global variable between asm files?
// File 1
// Here is saved value of a register (r10) to a variable.
.section .data
.global r10_save
r10_save_addr: .word r10_save
.section .text
ldr r13, =r10_save_addr // Load address for the global variable to some reg (r13)
str r13, [r10] // Save r13 to global variable
// File 2
// Here the intention is to use the variable (that should have r10 value stored).
.section .data
str_r10:
.asciz "r10 = 0x"
strlen_r10 = .-str_r10
.section .text
/* Here I want to use the reference of a variable
which has got its value in other file.
*/
mov r0, $1 //
ldr r1, =str_r10 // address of text string
ldr r2, =strlen_r10 // number of bytes to write
mov r7, $4 //
swi 0
You can use extern to get the value of global variables:
// File 2
// Here the intention is to use the variable (that should have r10 value stored).
.section .data
str_r10:
.asciz "r10 = 0x"
strlen_r10 = .-str_r10
.section .text
/* Here I want to use the reference of a variable
which has got its value in other file.
*/
.extern r10_save // or EXTERN DATA(r10_save)
mov r0, $1 //
ldr r1, =str_r10 // address of text string
ldr r2, =strlen_r10 // number of bytes to write
mov r7, $4 //
swi 0
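Then you can access r10_save in the second file too. Note that file 1 must actually define the r10_save label (the listing above only declares it with .global), and that GNU as treats every undefined symbol as external, so .extern there serves as documentation rather than as a requirement.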

How to create inline assembly functions in C?

I am using an MSP432 and have to create assembly functions in C. I tried creating the functions using __asm void PendSV_Handler{}, but that does not work and says "Expected an Identifier".
Also, I am trying to run the assembler command cpsid i, but it says CPSID is undefined, while CPSIE i works. I am a bit confused at this point. I am fairly new to this and still learning.
Below is the code I am trying to turn into assembly. I tried making the function assembly by writing __asm void PendSV_handler.
I was not sure whether it would be easier to just create a .asm file with these instructions.
OSThread *volatile OS_curr;
OSThread *volatile OS_next;
void PendSV_Handler(void){
__asm__("cpsid i"
//if(OS_curr != (OSThread *)0)
"ldr r0, =OS_curr"
"ldr r0, [r0]"
"cbz r0, PendSV_restore");
// Push r4 - r11
__asm__("PUSH {r4-r11}"
"ldr r1, =OS_curr"
"ldr r1, [r1]"
// OS_curr -> sp = sp;
"str sp, [r1]");
PendSV_restore
// sp=OS_next -> sp;
__asm__("ldr r0, =OS_next;"
"ldr r0, [r0]"
"ldr r0, [r0]"
"str sp, [r13]");
// OS_curr = OS_next;
__asm__("ldr r0, =OS_next"
"ldr r1, [pc, 0xc]"
"ldr r0, =OS_curr"
"str r0, [r1]"
//Pop r4-r11
"POP {r4-r11}"
// __enable_interrupts();
"cpsie i"
//return to next thread
"bx r14");
}
Inline assembler syntax is not defined by the C programming language; its support and syntax are compiler specific. In GCC:
void PendSV_Handler(void)
{
__asm__("cpsid i");
//if(OS_curr != (OSThread *)0)
__asm__("ldr r0, =OS_curr");
__asm__("ldr r0, [r0]");
__asm__("cbz r0, PendSV_restore");
// Push r4 - r11
__asm__("PUSH {r4-r11}");
__asm__("ldr r1, =OS_curr");
__asm__("ldr r1, [r1]");
// OS_curr -> sp = sp;
__asm__("str sp, [r1]");
__asm__("PendSV_restore:"); // assembler label, so that the cbz above can resolve it
// sp=OS_next -> sp;
__asm__("ldr r0, =OS_next;");
__asm__("ldr r0, [r0]");
__asm__("ldr r0, [r0]");
__asm__("str sp, [r13]");
// OS_curr = OS_next;
__asm__("ldr r0, =OS_next");
__asm__("ldr r1, [pc, 0xc]");
__asm__("ldr r0, =OS_curr");
__asm__("str r0, [r1]");
//Pop r4-r11
__asm__("POP {r4-r11}");
// __enable_interrupts();
__asm__("cpsie i");
//return to next thread
__asm__("bx r14");
}
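A likely reason the multi-line statements in the question do not assemble, shown as a small host-side sketch: adjacent C string literals are concatenated with nothing between them, so the instructions run together unless each literal ends in a newline. Using one statement per instruction, as in the answer above, sidesteps this. The array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the compiler sees for the question's first asm statement. */
static const char run_together[] = "cpsid i"
                                   "ldr r0, =OS_curr";

/* Ending each literal with "\n\t" keeps one instruction per line. */
static const char one_per_line[] = "cpsid i\n\t"
                                   "ldr r0, =OS_curr\n\t";

/* Returns 1 when the text contains no line break between instructions. */
static int instructions_fused(const char *text)
{
    assert(text != NULL);
    return strchr(text, '\n') == NULL;
}
```

The first array is the single line "cpsid ildr r0, =OS_curr", which the assembler cannot parse as two instructions.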
Referencing
Inline assembler syntax is not defined by the C programming language
I did this in the past for a university project:
inline void function(void)
{
asm volatile ("nop"); // placeholder: put the first instruction here
asm volatile ("nop"); // placeholder: put the second instruction here
}
But if you are working on ARM, look at the instruction set to see which instructions are available.
As an example:
If you want to write timing-critical code whose memory accesses must happen in the written order, you could do something like this:
inline void COLOUR_GLCD_write_address_data(uint8_t address, uint16_t data)
{
asm volatile ("DMB");
*(volatile uint16_t*)LCD_CMD_ADDRESS = address;
asm volatile ("DMB");
*(volatile uint16_t*)LCD_DATA_ADDRESS = data;
}
This was for sending data to an external lcd via the BUS Interface on an Atmel SAME70.
Hope that helped =)

Assembly variables are deleted (16 bit x86 assembly)

I'm still playing with retro programming in Turbo C for MS-DOS, and I ran into some trouble using variables.
If I define some variables at the start of the assembly code (in BSS or DATA), and try to use them inside the assembly function, most of the time these variables are deleted, or end up containing random data.
I learned a bit of assembly for the game boy :) and variables always worked well and never were deleted or modified, I guess x86 asm is different.
Then I tried this using inline assembly and it was a bit better, there is just one variable (width) not working.
void draw_map_column(MAP map, TILE *t){
word *tiledata = &t->data;
int *mapdata = map.data;
int width = map.width<<1;
word tile_offset = 0;
word map_offset = 0;
word screen_offset = 0;
asm{
push ds
push di
push si
mov dx,12 //column
lds bx,[tiledata]
lds si,ds:[bx] //ds:si data address
mov [tile_offset],ds
mov [tile_offset+2],si
les bx,[mapdata]
mov ax,es:[bx]
mov cl,8
shl ax,cl
add si,ax
mov di,screen_offset //es:di screen address
}
loop_tile:
asm{
mov ax,0A000h
mov es,ax
mov ax,16
}
copy_tile:
asm{
mov cx,8
rep movsw
add di,320-16
dec ax
jnz copy_tile
mov ds,[tile_offset]
mov si,[tile_offset+2]
mov ax,map_offset
add ax,[width] //"width" does never contain the value stored at the start
mov map_offset,ax
les bx,[mapdata]
add bx,ax
mov ax,es:[bx]
mov cl,8
shl ax,cl
add si,ax
dec dx
jnz loop_tile
pop si
pop di
pop ds
}
}
Note the "width" variable, which is not working at all: if I replace it with a number (40), the code works as expected (it draws a column of tiles using a map array and some tiles stored in RAM).
I guess it has something to do with the push/pop and so on, and something is not set up as it should be.
Also, what happens in pure assembly? There, none of the variables worked. I defined them with DW and also added:
push bp
mov bp,sp
;function
mov sp,bp
pop bp
Thanks.
Well once again thanks a lot, next time I'll be more patient before asking.
Just in case this is useful for someone, I had defined a variable using the wrong size.
There are other things that can be improved, but that's another question.
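The variable "tile_offset" holds a 32-bit address (segment and offset), so it must be a "dword", not a "word". With a "word", the store to tile_offset+2 lands on whatever local sits next to it on the stack (apparently "width" here). The function should then look like this: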
void draw_map_column(MAP map, TILE *t){
word *tiledata = &t->data;
int *mapdata = map.data;
int width = map.width<<1;
dword tile_offset = 0; //changed to dword to store 32 bit address
word map_offset = 0;
word screen_offset = 0;
asm{
push ds
push di
push si
mov dx,12 //column
lds bx,[tiledata]
lds si,ds:[bx] //ds:si data address
mov word ptr[tile_offset],ds //store a word
mov word ptr[tile_offset+2],si
les bx,[mapdata]
mov ax,es:[bx]
mov cl,8
shl ax,cl
add si,ax
mov di,screen_offset //es:di screen address
}
loop_tile:
asm{
mov ax,0A000h
mov es,ax
mov ax,16
}
copy_tile:
asm{
mov cx,8
rep movsw
add di,320-16
dec ax
jnz copy_tile
mov ds,word ptr[tile_offset] //read a word to the register
mov si,word ptr[tile_offset+2]
mov ax,map_offset
add ax,[width]
mov map_offset,ax
les bx,[mapdata]
add bx,ax
mov ax,es:[bx]
mov cl,8
shl ax,cl
add si,ax
dec dx
jnz loop_tile
pop si
pop di
pop ds
}
}
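To illustrate why the undersized variable corrupted its neighbour, here is a host-side sketch in portable C. It is only a model: the two structs are hypothetical stand-ins for how the locals might sit next to each other, and the real Turbo C stack layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 16-bit tile_offset directly followed by width. */
struct narrow_frame { uint16_t tile_offset; uint16_t width; };

/* Same locals with tile_offset widened to 32 bits, as in the fix. */
struct wide_frame { uint32_t tile_offset; uint16_t width; };

/* Mimics the asm: segment at tile_offset, offset at tile_offset+2. */
static void store_far(unsigned char *tile_offset, uint16_t seg, uint16_t off)
{
    memcpy(tile_offset, &seg, sizeof seg);
    memcpy(tile_offset + 2, &off, sizeof off);
}

static uint16_t width_after_store_narrow(uint16_t seg, uint16_t off, uint16_t width)
{
    struct narrow_frame f = { 0, width };

    /* The model relies on width sitting two bytes after tile_offset. */
    assert(offsetof(struct narrow_frame, width) == 2);
    store_far((unsigned char *)&f, seg, off);
    return f.width;   /* now holds off: the second store overwrote it */
}

static uint16_t width_after_store_wide(uint16_t seg, uint16_t off, uint16_t width)
{
    struct wide_frame f = { 0, width };

    store_far((unsigned char *)&f, seg, off);
    return f.width;   /* untouched: both words fit inside tile_offset */
}
```

In the narrow model the second store replaces width with the offset word, which matches the symptom of "width" never holding its initial value.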

How to unwind the ARM Cortex-M3 stack

The ARM Cortex STM32's HardFault_Handler can only get a few register values (r0, r1, r2, r3, lr, pc, xPSR) when a crash happens. But there is no FP and SP in the stack. Thus I could not unwind the stack.
Is there any solution for this? Thanks a lot.
[update]
I followed instructions from the web to make ARMCC (Keil uVision IDE) generate a frame pointer by adding the compiler option "--use_frame_pointer", but I could not find the FP in the stack. I am a real newbie here. Below is my demo code:
int test2(int i, int j)
{
return i/j;
}
int main()
{
SCB->CCR |= 0x10;
int a = 10;
int b = 0;
int c;
c = test2(a,b);
}
enum { r0 = 0, r1, r2, r3, r11, r12, lr, pc, psr};
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
uint32_t r0_val = faultStackAddress[r0];
uint32_t r1_val = faultStackAddress[r1];
uint32_t r2_val = faultStackAddress[r2];
uint32_t r3_val = faultStackAddress[r3];
uint32_t r12_val = faultStackAddress[r12];
uint32_t r11_val = faultStackAddress[r11];
uint32_t lr_val = faultStackAddress[lr];
uint32_t pc_val = faultStackAddress[pc];
uint32_t psr_val = faultStackAddress[psr];
}
I have two questions here:
1. I am not sure at which index FP (r11) sits in the stack, or whether it is pushed onto the stack at all. I assumed it comes before r12, because I compared the assembly output before and after adding the option "--use_frame_pointer". I also compared the values read in Hard_Fault_Handler, and it seems r11 is not in the stack, because the r11 value I read points to a place that is not my code.
[update] I have confirmed that FP is pushed into the stack. The second question still needs to be answered.
See below snippet code:
Without the option "--use_frame_pointer"
test2 PROC
MOVS r0,#3
BX lr
ENDP
main PROC
PUSH {lr}
MOVS r0,#0
BL test2
MOVS r0,#0
POP {pc}
ENDP
with the option "--use_frame_pointer"
test2 PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#3
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
main PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#0
BL test2
MOVS r0,#0
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
2. It seems FP is not part of the input parameter faultStackAddress of Hard_Fault_Handler(). Where can I get the caller's FP to unwind the stack?
[update again]
Now I understand that the last FP (r11) is not stored in the stack. All I need to do is read the value of the r11 register; then I can unwind the whole stack.
So my final question is how to read it using the inline assembler in C. I tried the code below, following the reference at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472f/Cihfhjhg.html, but it failed to read the correct value from r11:
volatile int top_fp;
__asm
{
mov top_fp, r11
}
r11's value is 0x20009DCC
top_fp's value is 0x00000004
[update 3] Below is my whole code.
int test5(int i, int j, int k)
{
char a[128] = {0} ;
a[0] = 'a';
return i/j;
}
int test2(int i, int j)
{
char a[18] = {0} ;
a[0] = 'a';
return test5(i, j, 0);
}
int main()
{
SCB->CCR |= 0x10;
int a = 10;
int b = 0;
int c;
c = test2(a,b); //create a divide by zero crash
}
/* The fault handler implementation calls a function called Hard_Fault_Handler(). */
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
volatile int top_fp;
__asm
{
mov top_fp, r11
}
//TODO: use top_fp to unwind the whole stack.
}
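[update 4] Finally, I figured it out. My solution:
Note: To access r11 we have to use the embedded assembler (see here), because the armcc inline assembler treats register names such as r11 as ordinary variables rather than the physical registers. This took me a long time to figure out.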
//we have to use embedded assembler.
__asm int getRegisterR11()
{
mov r0,r11
BX LR
}
//call it from Hard_Fault_Handler function.
/*
Function call stack frame:
FP1(r11) -> | lr |(High Address)
| FP2|(prev FP)
| ...|
Current FP(r11) ->| lr |
| FP1|(prev FP)
| ...|(Low Address)
With FP, we can access lr(link register) which is the address to return when the current functions returns(where you were).
Then (current FP - 1) points to prev FP.
Thus we can unwind the stack.
*/
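// Assumed helper definitions; the original post does not show them.
#ifndef BACK_TRACE_DEPTH
#define BACK_TRACE_DEPTH 5
#endif
#define LOW_16_BITS(x) ((uint16_t)((x) & 0xFFFFu))
#define HIGH_16_BITS(x) ((uint16_t)(((x) >> 16) & 0xFFFFu))
void unwindBacktrace(uint32_t topFp, uint16_t* backtrace)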
{
uint32_t nextFp = topFp;
int j = 0;
//#define BACK_TRACE_DEPTH 5
//loop backtrace using FP(r11), save lr into an uint16_t array.
for(int i = 0; i < BACK_TRACE_DEPTH; i++)
{
uint32_t lr = *((uint32_t*)nextFp);
if ((lr >= 0x08000000) && (lr <= 0x08FFFFFF))
{
backtrace[j*2] = LOW_16_BITS(lr);
backtrace[j*2 + 1] = HIGH_16_BITS(lr);
j += 1;
}
nextFp = *((uint32_t*)nextFp - 1);
if (nextFp == 0)
{
break;
}
}
}
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
//get back trace
int topFp = getRegisterR11();
unwindBacktrace(topFp, persistentData.faultStack.back_trace);
}
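The frame-pointer walk above can be tried on a host machine with a simulated RAM image, which makes it easier to check the [fp] / [fp - 4] layout before running it on the target. This is only a sketch: unwind_simulated, sim_read and SIM_TRACE_DEPTH are hypothetical names, and the alignment and range checks on fp are an addition that the original loop does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_TRACE_DEPTH 5   /* hypothetical, mirrors BACK_TRACE_DEPTH */

/* Reads one word from a simulated RAM image; base is the address of ram[0]. */
static uint32_t sim_read(const uint32_t *ram, uint32_t base, uint32_t addr)
{
    return ram[(addr - base) / 4u];
}

/* Walks the FP chain: [fp] holds the saved LR, [fp - 4] the previous FP.
   lr_out must have room for SIM_TRACE_DEPTH entries. */
static size_t unwind_simulated(const uint32_t *ram, size_t words, uint32_t base,
                               uint32_t top_fp, uint32_t *lr_out)
{
    size_t found = 0;
    uint32_t fp = top_fp;

    assert(ram != NULL && lr_out != NULL);
    for (int i = 0; i < SIM_TRACE_DEPTH; i++) {
        /* Stop on a frame pointer that is misaligned or outside the image. */
        if ((fp & 3u) != 0u || fp < base + 4u ||
            fp >= base + 4u * (uint32_t)words)
            break;
        uint32_t lr = sim_read(ram, base, fp);
        if (lr >= 0x08000000u && lr <= 0x08FFFFFFu)   /* flash range from the post */
            lr_out[found++] = lr;
        fp = sim_read(ram, base, fp - 4u);
        if (fp == 0u)
            break;
    }
    return found;
}
```

On the real target the same checks could use the actual RAM bounds, so that a corrupted frame pointer does not trigger a second fault inside the fault handler.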
A very primitive way to unwind the stack in this case is to read all stack memory above the SP seen at the time of HardFault_Handler and process it with arm-none-eabi-addr2line. Every link register value saved on the stack is translated into a source line (remember that the actual call is on the line just before the one LR points to). Note that if functions in between were called with a plain branch instruction (b) instead of branch and link (bl), you will not see them with this method.
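A minimal sketch of that scan, assuming the stack has already been copied into an array and that code lives in a known address range (the function and parameter names are hypothetical). It keeps only odd values inside the code range, since return addresses saved by bl on a Cortex-M have the Thumb bit set, and clears that bit before the address is handed to addr2line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collects stack words that look like saved return addresses. */
static size_t collect_lr_candidates(const uint32_t *stack_words, size_t count,
                                    uint32_t code_start, uint32_t code_end,
                                    uint32_t *out, size_t out_cap)
{
    size_t found = 0;

    assert(stack_words != NULL && out != NULL);
    for (size_t i = 0; i < count && found < out_cap; i++) {
        uint32_t w = stack_words[i];
        /* Thumb return addresses are odd and point into the code region. */
        if ((w & 1u) != 0u && w >= code_start && w <= code_end)
            out[found++] = w & ~1u;   /* drop the Thumb bit */
    }
    return found;
}
```

This is a heuristic: ordinary data that happens to look like a code address will show up as a false positive.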
(I don't have enough reputation points to write comments, so I'm editing my answer):
UPDATE for question 2:
Why do you expect Hard_Fault_Handler to have any arguments? Hard_Fault_Handler is usually a function whose address is stored in the vector (exception) table. When the processor exception happens, Hard_Fault_Handler is executed. No argument passing is involved in this. Still, all registers at the time of the fault are preserved. Specifically, if you compiled without omit-frame-pointer, you can just read the value of R11 (or R7 in Thumb-2 mode). However, to be sure that Hard_Fault_Handler in your code is the real hard fault handler, look into the startup.s code and check whether Hard_Fault_Handler is the HardFault entry of the vector table (exception number 3, the word after the initial SP, Reset, and NMI entries). If another function is there, Hard_Fault_Handler is just called explicitly from that function. See this article for details. You can also read my blog :) There is a chapter about the stack that is based on an Android example, but a lot of things are the same in general.
Also note that faultStackAddress most probably holds a stack pointer, not a frame pointer.
UPDATE 2
OK, let's clarify some things. Firstly, please paste the code from which you call Hard_Fault_Handler. Secondly, I guess you call it from within the real HardFault exception handler. In that case you cannot expect R11 to be at faultStackAddress[r11]. You already mentioned this in the first sentence of your question: only r0-r3, r12, lr, pc and psr are there.
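For reference, the eight words stacked on exception entry can be sketched as follows. The struct and function names are hypothetical; the order (r0-r3, r12, lr, pc, psr) is the basic Cortex-M frame without floating-point state, and the bit-2 test mirrors the TST lr, #4 in the handler from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words the core pushes on exception entry, lowest address first. */
struct fault_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
};

/* Copies the stacked registers out of the frame at sp (MSP or PSP). */
static struct fault_frame decode_fault_frame(const uint32_t *sp)
{
    struct fault_frame f;

    assert(sp != NULL);
    f.r0  = sp[0];
    f.r1  = sp[1];
    f.r2  = sp[2];
    f.r3  = sp[3];
    f.r12 = sp[4];   /* there is no r11 slot in the frame */
    f.lr  = sp[5];
    f.pc  = sp[6];   /* address of the faulting instruction or its successor */
    f.psr = sp[7];
    return f;
}

/* Bit 2 of EXC_RETURN (the LR value inside the handler) selects PSP over MSP. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & 4u) != 0u;
}
```

Compared with the enum in the question, r11 is simply absent, so every index from r12 onward is one lower.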
You've also written:
But there is no FP and SP in the stack. Thus I could not unwind the
stack. Is there any solution for this?
The SP is not "in the stack" because you already have it in one of the stack pointer registers (msp or psp). See again THIS ARTICLE. Also, FP is not crucial for unwinding the stack, because you can do it without it (by "navigating" through the saved link registers). Another point: if you dump the stack memory starting at your SP, you can expect FP to be right next to the saved LR if you really need it.
Answering your last question: I don't know how you are verifying this code or how you are calling it (you need to paste the full code). You can look at the assembly of that function and see what is happening under the hood. Another thing you can do is follow this post as a template.

Trying to understand ARM assembly of iOS app

Consider the following Objective C interface definition
#import <Foundation/Foundation.h>
@interface MyClass : NSObject
@property NSObject* myprop;
@end
The assembly generated for ARMv7 by Xcode 5 for [MyClass myprop] looks like
.code 16 # #"\01-[MyClass myprop]"
.thumb_func "-[MyClass myprop]"
"-[MyClass myprop]":
Lfunc_begin0:
.cfi_startproc
# BB#0:
sub sp, #8
#DEBUG_VALUE: -[MyClass myprop]:self <- undef
#DEBUG_VALUE: -[MyClass myprop]:_cmd <- undef
str r0, [sp, #4]
str r1, [sp]
ldr r0, [sp, #4]
movw r2, :lower16:(_OBJC_IVAR_$_MyClass.myprop-(LPC0_0+4))
movt r2, :upper16:(_OBJC_IVAR_$_MyClass.myprop-(LPC0_0+4))
LPC0_0:
add r2, pc
ldr r2, [r2]
movs r3, #1
add sp, #8
b.w _objc_getProperty
Ltmp0:
Lfunc_end0:
.cfi_endproc
I want to understand the resulting assembly and have the following questions about it:
1) The last instruction (b.w _objc_getProperty) sets the PC to the address of the label _objc_getProperty. But how does this procedure know where to jump back? Does it assume that the method is invoked with bl, and therefore the link register contains the target address?
2) What do the 3 lines below the second #DEBUG_VALUE do?
If I understand it correctly, the content of r0 is stored at stack offset 4, r1 is stored at the current stack pointer (offset 0), and r0 is then loaded from stack offset 4. But why does the last instruction change anything? Doesn't it just mean that r0 is filled with what it already contained? What are the values used for? _objc_getProperty?
3) Why is r3 set to 1 at the end? (movs r3, #1)?
In C code the function would probably look like this:
resulttype Lfunc_begin0(type1 arg1, type2 arg2)
{
return _objc_getProperty(arg1, arg2, Myclass_myprop, 1);
}
First let's look at the following example:
int func(void)
{
a();
b();
return c();
}
It would now be possible to do the function call to "c()" the following way:
save_lr
bl a
bl b
bl c
restore_original_lr
bx lr
Alternatively it would be possible to perform the following code:
save_lr
bl a
bl b
restore_original_lr
b c
In this case the "bx lr" instruction at the end of "c()" jumps directly to the function that called us, so we do not need a "bx lr" of our own.
Because some function calls in the C code may destroy the contents of R0 and R1, unoptimized C code saves these registers to the stack just in case their values are needed later. Well-optimized C code would detect that this is unnecessary and remove the three instructions after the "#DEBUG" lines. Even the "add SP," and "sub SP," lines could be optimized away!
The _objc_getProperty function evidently takes four arguments. R0 and R1 are simply passed through (as shown in the C code above), while R2 and R3 are the additional arguments for that function.
What the 4th argument (R3=1) really means cannot be seen from the assembler code shown here; in the Objective-C runtime's declaration of objc_getProperty it is the BOOL atomic flag.