I'm running some code under Valgrind, compiled with gcc 7.5 targeting an aarch64 (ARM 64 bits) architecture, with optimizations enabled.
I get the following error:
==3580== Invalid write of size 8
==3580== at 0x38865C: ??? (in ...)
==3580== Address 0x1ffeffdb70 is on thread 1's stack
==3580== 16 bytes below stack pointer
This is the assembly dump in the vicinity of the offending code:
388640: a9bd7bfd stp x29, x30, [sp, #-48]!
388644: f9000bfc str x28, [sp, #16]
388648: a9024ff4 stp x20, x19, [sp, #32]
38864c: 910003fd mov x29, sp
388650: d1400bff sub sp, sp, #0x2, lsl #12
388654: 90fff3f4 adrp x20, 204000 <_IO_stdin_used-0x4f0>
388658: 3dc2a280 ldr q0, [x20, #2688]
38865c: 3c9f0fe0 str q0, [sp, #-16]!
I'm trying to ascertain whether this is a possible bug in my code (note that I've thoroughly reviewed my code and I'm fairly confident it's correct), or whether Valgrind will blindly report any writes below the stack pointer as an error.
Assuming the latter, it looks like a Valgrind bug since the offending instruction at 0x38865c uses the pre-decrement addressing mode, so it's not actually writing below the stack pointer.
Furthermore, at address 0x388640 a similar access (again with pre-decrement addressing mode) is performed, yet this isn't reported by Valgrind; the main difference is the use of an x register at address 0x388640 versus a q register at address 0x38865c.
I'd also like to draw attention to the large stack pointer subtraction at 0x388650, which may or may not have anything to do with the issue (note this subtraction makes sense, given that the offending C code declares a large array on the stack).
So, will anyone help me make sense of this, and whether I should worry about my code?
The code looks fine, and the write is certainly not below the stack pointer. The message seems to be a valgrind bug, possibly #432552, which is marked as fixed. OP confirms that the message is not produced after upgrading valgrind to 3.17.0.
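To see why the write is not below the stack pointer, here is a minimal sketch in C of how a pre-indexed store such as str q0, [sp, #-16]! computes its address. It is a simplified model, not an emulator; the function name is made up for illustration, and the starting sp value used in the usage note is inferred from Valgrind's report (reported address plus 16).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an AArch64 pre-indexed store: the base register
 * is updated first (write-back), and the store targets the updated value. */
uint64_t pre_index_store_address(uint64_t *base, int64_t imm)
{
    assert(base != 0);
    *base += (uint64_t)imm;  /* sp = sp + imm (imm is negative here) */
    return *base;            /* the store lands at the new sp, not below it */
}
```

Starting from sp = 0x1ffeffdb80 with an immediate of -16, the store address is 0x1ffeffdb70, which equals the updated sp. That matches the address in the report, so the "16 bytes below stack pointer" message reflects the sp value from before the write-back.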
code declares a large array on the stack
should [I] worry about my code?
I think it depends upon your desire for your code to be more portable.
Take this bit of code that I believe represents at least one important thing you mentioned in your post:
#include <stdio.h>
#include <stdlib.h>
long long foo (long long sz, long long v) {
long long arr[sz]; // allocating a variable on the stack
arr[sz-1] = v;
return arr[sz-1];
}
int main (int argc, char *argv[]) {
long long n = atoll(argv[1]);
long long v = foo(n, n);
printf("v = %lld\n", v);
}
$ uname -mprsv
Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64 i386
$ gcc test.c
$ a.out 1047934
v = 1047934
$ a.out 1047935
Segmentation fault: 11
$ uname -snrvmp
Linux localhost.localdomain 3.19.8-100.fc20.x86_64 #1 SMP Tue May 12 17:08:50 UTC 2015 x86_64 x86_64
$ gcc test.c
$ ./a.out 2147483647
v = 2147483647
$ ./a.out 2147483648
v = 2147483648
There are at least some minor portability concerns with this code. The amount of allocatable stack memory for these two environments differs significantly. And that's only for two platforms. Haven't tried it on my Windows 10 vm but I don't think I need to because I got bit by this one a long time ago.
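If the stack limit is the concern, one common alternative is to take the array from the heap, where a failed allocation can be detected instead of crashing. The following is only a sketch: foo_heap is a hypothetical variant of foo above, and its error convention (0 on success, -1 on failure) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-based variant of foo(): stores the result in *out and
 * returns 0, or returns -1 if sz is invalid or the allocation fails. */
int foo_heap(long long sz, long long v, long long *out)
{
    assert(out != NULL);
    if (sz <= 0 || (unsigned long long)sz > SIZE_MAX / sizeof(long long))
        return -1;                       /* size would not fit in size_t */
    long long *arr = malloc((size_t)sz * sizeof *arr);
    if (arr == NULL)
        return -1;                       /* reported to the caller, no crash */
    arr[sz - 1] = v;
    *out = arr[sz - 1];
    free(arr);
    return 0;
}
```

How large an allocation succeeds still depends on the platform, but the failure mode becomes a return value the caller can check rather than a segmentation fault.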
Beyond the OP's issue, which was due to a Valgrind bug, the title of this question is bound to attract more people (like me) who are getting "invalid write at X bytes below stack pointer" as a legitimate error.
My piece of advice: check that the address you're writing to is not a local variable of another function (not present in the call stack)!
I stumbled upon this issue while attempting to write into the address returned by yyget_lloc(yyscanner) while outside of function yyparse (the former returns the address of a local variable in the latter).
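A general way to avoid this class of bug is to let the caller own the storage and pass its address in, so nothing ever points into a stack frame that may no longer exist. The sketch below uses made-up names (struct location, fill_location); it is not the flex/bison API, just an illustration of the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location record, for illustration only. */
struct location {
    int first_line;
    int last_line;
};

/* Pattern to avoid: returning or keeping the address of a local variable,
 * then writing through it once the owning function is no longer running.
 *
 * Safer pattern: the caller provides the storage, so its lifetime is
 * controlled by the caller and outlives this call. */
void fill_location(struct location *out, int first_line, int last_line)
{
    assert(out != NULL);
    out->first_line = first_line;
    out->last_line = last_line;
}
```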
I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine.
Here's the code:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds ; ds = cs
pop es ; es = 0xB800
jmp start
; input = di (position*2), ax (character and attributes)
putchar:
stosw
ret
; input = si (NUL-terminated string)
print:
cli
cld
.nextChar:
lodsb ; mov al, [ds:si] ; si += 1
test al, al
jz .finish
call putchar
jmp .nextChar
.finish:
sti
ret
start:
mov ah, 0x0E
mov di, 8
; should print P
mov al, byte [msg]
call putchar
; should print A
mov al, byte [msg + 1]
call putchar
; should print O
mov al, byte [msg + 2]
call putchar
; should print !
mov al, byte [msg + 3]
call putchar
; should print X
mov al, 'X'
call putchar
; should print Y
mov al, 'Y'
call putchar
cli
hlt
msg: db 'PAO!', 0
; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0
; Header
db 0x55
db 0xAA
The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.
It prints something like:
The ORG Directive
When you specify an ORG directive like ORG 0x0000 at the top of your assembler program and use BITS 16, you are telling NASM that the absolute offsets it generates when resolving code and data labels will be based on the starting offset given in ORG (in 16-bit code an offset is limited to a WORD, i.e. 2 bytes).
If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of VirtualBox, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it also sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00.
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
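The segment:offset arithmetic is easy to check on a host machine. Here is a small C sketch of the equation above; the function name phys_addr is made up for the example, and A20 wrap-around is deliberately ignored.

```c
#include <stdint.h>

/* Real-mode address translation: (segment << 4) + offset.
 * The sum can slightly exceed 20 bits; this sketch does not model
 * A20 wrap-around. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both 0x0000:0x7c00 and 0x07c0:0x0000 come out as 0x07c00, which is why a BIOS can legitimately use either pair to reach the same boot sector.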
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple of ways to tackle this issue. The easiest (and preferred) method is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment (DS) = 0x07c0. A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00, which is where our bootloader is located. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24, and 0x07c0:0x24 translates into a physical address of (0x07c0<<4)+0x24 = 0x07c24. This is what we want, since our bootloader was physically placed into memory by the BIOS starting at 0x07c00, and so it should reference our msg variable correctly.
Moral of the story? Whatever you use for ORG, there should be an applicable value in the DS register when we start our program. We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section, there was a reason the first 2 characters wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?
Take my very first example, change ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would provide this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Uh-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:
ORG 0x7c00
start:
push cs
pop ds ; DS=CS
to:
ORG 0x7c00
start:
xor ax, ax ; ax=0x0000
mov ds, ax ; DS=0x0000
Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.
Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00
In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:
When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.
We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.
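The relationship between ORG and the DS value that goes with it can be written down as a one-line calculation. This is only a sketch in C for checking the numbers; ds_for_org and BOOT_LOAD_ADDR are names invented for the example, and it assumes an ORG no higher than 0x7c00 that differs from it by a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_LOAD_ADDR 0x7c00u  /* physical address where the BIOS loads the boot sector */

/* DS value for which DS:label (label offsets based on org) resolves to
 * the physical address where the label really is. */
uint16_t ds_for_org(uint16_t org)
{
    assert(org <= BOOT_LOAD_ADDR);
    assert(((BOOT_LOAD_ADDR - org) & 0xFu) == 0);  /* whole paragraphs only */
    return (uint16_t)((BOOT_LOAD_ADDR - org) >> 4);
}
```

For ORG 0x0000 this gives DS = 0x07c0 and for ORG 0x7c00 it gives DS = 0x0000, the two pairings used in this answer.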
Summary
Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.
Your code could be fixed to keep ORG 0x0000 as you originally had it, provided we set DS to 0x07c0 as previously discussed. That could look like:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds ; DS=0x07c0 since we use ORG 0x0000
pop es
Alternatively we could have used ORG 0x7c00 like this:
ORG 0x7c00
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds ; DS=0x0000 since we use ORG 0x7c00
pop es
Assume I have a variable called Block_Size that is declared without initialization.
Would
Block_Size db ?
mov DS:Block_Size, 1
be equal to
Block_Size db 1
No, Block_Size db ? has to go in the BSS or data section, not mixed in with your code.
If you wrote
my_function:
Block_Size db ?
mov DS:Block_Size, 1
...
ret
your code would crash. ? isn't really uninitialized, it's actually zeroed. So when the CPU decodes the instructions starting at my_function (e.g. after some other code ran call my_function), it would actually decode the 0 as code. (IIRC, opcode 0 is add, and then the opcode of the mov instruction would be decoded as the operand byte of add (ModR/M).)
Try assembling it, and then use a disassembler to show you how it would decode, along with the hex dump of the machine code.
db assembles a byte into the output file at the current position, just like add eax, 2 assembles 83 c0 02 into the output file.
You can't use db the way you declare a variable in C:
void foo() {
unsigned char Block_size = 1;
}
A non-optimizing compiler would reserve space on the stack for Block_size. Look at compiler asm output if you're curious. (But it will be more readable if you enable optimization. You can use volatile to force the compiler to actually store to memory so you can see that part of the asm in optimized code.)
Maybe related: Assembly - .data, .code, and registers...?
If you wrote
.data
Block_size db ?
.code
set_blocksize:
mov [Block_size], 1
ret
it would be somewhat like this C:
unsigned char Block_size;
void set_blocksize(void) {
Block_size = 1;
}
If you don't need something to live in memory, don't use db or dd for it. Keep it in registers. Or use Block_size equ 1 to define a constant, so you can do stuff like mov eax, Block_size + 4 instead of mov eax, 5.
Variables are a high-level concept that assembly doesn't really have. In asm, data you're working with can be in a register or in memory somewhere. Reserving static storage for it is usually unnecessary, especially for small programs. Use comments to keep track of what you put in which register.
db literally stands for "define byte", so it puts the byte right there in the output, whereas the mov instruction places a particular value in a register (or memory location) at run time, overwriting whatever was there.
I'm new to controller coding. Could anyone help me understand the points below?
1. How does code execute in the controller?
2. If we flash the code to the controller, it is saved in flash memory. After reset, how is the code fetched from that memory?
3. Which processes are executed in the controller?
4. I've read that at run time the code is copied to RAM and executed from RAM. Is this statement correct? If so, when is the code moved from flash to RAM?
5. If the code is copied from flash to RAM, it will use RAM space. With that many bytes of RAM occupied, do the stack and heap have to be placed after this memory?
I'm really confused about how it works.
You say controller; do you mean microcontroller?
Microcontrollers are designed to be systems on a chip, and this includes the non-volatile storage where the program lives, namely flash or some other form of rom. Just like on your x86 desktop/laptop/server, there is some rom/flash in the address space of the processor at the address that the processor uses to boot. You have not specified a microcontroller, so the specific address and those details depend on which microcontroller you are talking about, but that doesn't matter; in general they all tend to be designed to work the same way.
So there is some flash (to use a general term) mapped into the address space of the processor; your reset/interrupt vector tables or start address or whatever the architecture requires, PLUS your program/application, are in flash in the address space. Likewise some amount of ram is there. Generally you do NOT run your programs from ram like you would with your laptop/desktop/server; the rams tend to be relatively small and the flash is there for your program to live in. There are exceptions, for example performance: sometimes the flash operates with wait states, and often the sram can run at the cpu rate, so you might want to copy some execution-time-sensitive routines to ram to be run. Generally not though.
There are other exceptions of course. Sometimes there is a rom (occasionally semi-secret) with a bootloader in the chip, and your program is loaded from outside the chip into ram and then run. Sometimes you may wish to design your application that way for some reason, and having bootloaders is not uncommon: a number of microcontrollers have a chip-vendor-supplied bootloader in a separate flash space that you may or may not be able to replace; these allow you to do development or in-circuit programming of the flash.
A microcontroller contains a processor just like your desktop/laptop/server or phone or anything else like that. It is a system on a chip rather than spread across a board, so you have the processor itself, you have some non-volatile storage as mentioned above and you have ram and the peripherals all on the same chip. So just like any other processor there are logic/design defined rules for how it boots and runs (uses a vector table of addresses or uses well known entry point addresses) but beyond that it is just machine code instructions that are executed. Nothing special. What all processes are run are the ones you write and tell it to run, it runs the software you write which at the end of the day is just machine code. Processes, functions, threads, tasks, procedures, etc these are all human terms to try to manage software development, you pick the language (although the vast majority are programmed in C with a little assembly) and the software design so long as it fits within the constraints of the system.
EDIT
So let's say I had an arm microcontroller with flash starting at address 0x00000000 and ram starting at address 0x20000000. Assume an older arm like the ARM7TDMI, which was used in microcontrollers (some of which can still be purchased). The way that processor boots is that there are known addresses at which execution starts for reset, for interrupts, for undefined exceptions and things like that. The reset address is 0x00000000, so after reset the processor starts execution at address 0x00000000: it reads that instruction first and runs it. The next exception handler starts execution at address 0x00000004, and so on for several possible exceptions, so as you will see, the first thing we have to do is branch out of this exception table.
here is an example program that would run but doesnt do anything interesting, just demonstrates a few things.
vectors.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
mov sp,#0x20000000
orr sp,sp,#0x8000
bl one
hang: b hang
one.c
unsigned int hello;
unsigned int world;
extern unsigned int two ( unsigned int );
unsigned int one ( void )
{
hello=5;
world=6;
world+=two(hello);
return(hello+world);
}
two.c
extern unsigned int hello;
extern unsigned int world;
unsigned int two ( unsigned int temp )
{
hello++;
world+=2;
return(hello+world+temp);
}
memmap (the linker script)
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x10000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
and then I build it
arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
arm-none-eabi-ld vectors.o one.o two.o -T memmap -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
before we look at the linked output we can look at the individual parts
arm-none-eabi-objdump -D vectors.o
vectors.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: ebfffffe bl 0 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
That is what is in the object file. An object file is not just machine code or data; it also includes various other things: how much data there is, how much program there is, and, as in this case, possibly label names to make debugging easier. The labels "hang" and "reset" and others are not in the machine code; they are there to make programming easier for the human, and the machine code has no notion of labels. How much of this goes in the file depends on the format (there are many: elf, coff, etc), the tool, its defaults and the command line options.
Notice since we have not "linked" the program the branch to the function one() is actually incomplete as you will see in the final linked binary. The one label (function name) is not defined in this code so it cannot yet be resolved, the linker has to do it.
same story with the one function
arm-none-eabi-objdump -D one.o
one.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <one>:
0: e3a03005 mov r3, #5
4: e3a02006 mov r2, #6
8: e92d4070 push {r4, r5, r6, lr}
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
20: ebfffffe bl 0 <two>
24: e5943000 ldr r3, [r4]
28: e5952000 ldr r2, [r5]
2c: e0800003 add r0, r0, r3
30: e5840000 str r0, [r4]
34: e0800002 add r0, r0, r2
38: e8bd4070 pop {r4, r5, r6, lr}
3c: e12fff1e bx lr
...
that is the machine code and a disassembly that makes up the one function, the function two is not resolved in this code so it also has a placeholder as well as the global variables hello and world.
these two are getting the address of hello and world from locations
that have to be filled in by the linker
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
and these two perform the initial write of values to hello and world as the code shows
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
hello=5;
world=6;
Notice all the addresses are zero based, they have not been linked.
two is similar if you look at it yourself.
The linker script tells the linker that we want .text the program, the machine code to live at 0x00000000 and .bss to be at 0x20000000. bss is global things that are not initialized like
unsigned int this;
.data which I dont deal with here are things like
unsigned int this=5;
global things that are initialized. .bss is assumed by programmers to be zero, but I cheated here and did not zero out the .bss memory space, as you will see; instead I initialized the variables in the program rather than pre-initializing them and having to do the extra work that requires.
reset:
mov sp,#0x20000000
orr sp,sp,#0x8000
bl one
hang: b hang
normally a bootstrap like above would need to deal with the stack as needed (certainly in the case of baremetal microcontroller code like this) as well as zero .bss and copy .data to ram. It takes more linker and compiler magic to put the initialized variables
unsigned int like_this=7;
in flash, as we need to remember that that variable boots with the value 7, and ram is volatile: it doesn't survive a power outage. So to support .data you have to tell the linker that it wants to live at 0x2000xxxx, but to put it in flash somewhere, and I will copy it over. I didn't demonstrate that here.
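For reference, the two bootstrap jobs that were skipped here (copy .data from flash to ram, zero .bss) amount to two small loops. The sketch below is in C and written so it can be tried on a host: on real hardware the pointers and sizes would come from symbols defined in the linker script, which this example does not define, so they are passed in as plain parameters. The name init_ram is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the bootstrap work done before calling into C code:
 * copy the initial values of .data from flash to ram, then zero .bss. */
void init_ram(const uint32_t *data_load,   /* initial values stored in flash */
              uint32_t *data_start,        /* where .data lives in ram */
              size_t data_words,
              uint32_t *bss_start,         /* where .bss lives in ram */
              size_t bss_words)
{
    assert(data_words == 0 || (data_load != NULL && data_start != NULL));
    assert(bss_words == 0 || bss_start != NULL);

    for (size_t i = 0; i < data_words; i++)
        data_start[i] = data_load[i];      /* e.g. like_this gets its 7 */

    for (size_t i = 0; i < bss_words; i++)
        bss_start[i] = 0;                  /* uninitialized globals start at zero */
}
```

In a real project this usually runs from the reset handler right after the stack pointer is set up, before the first C function that relies on global variables.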
from the so.list output of commands above, fully linked program.
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: eb000000 bl 30 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
00000030 <one>:
30: e3a03005 mov r3, #5
34: e3a02006 mov r2, #6
38: e92d4070 push {r4, r5, r6, lr}
3c: e59f402c ldr r4, [pc, #44] ; 70 <one+0x40>
40: e59f502c ldr r5, [pc, #44] ; 74 <one+0x44>
44: e1a00003 mov r0, r3
48: e5853000 str r3, [r5]
4c: e5842000 str r2, [r4]
50: eb000008 bl 78 <two>
54: e5943000 ldr r3, [r4]
58: e5952000 ldr r2, [r5]
5c: e0800003 add r0, r0, r3
60: e5840000 str r0, [r4]
64: e0800002 add r0, r0, r2
68: e8bd4070 pop {r4, r5, r6, lr}
6c: e12fff1e bx lr
70: 20000004 andcs r0, r0, r4
74: 20000000 andcs r0, r0, r0
00000078 <two>:
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
88: e2822001 add r2, r2, #1
8c: e2833002 add r3, r3, #2
90: e52de004 push {lr} ; (str lr, [sp, #-4]!)
94: e082e003 add lr, r2, r3
98: e08e0000 add r0, lr, r0
9c: e58c2000 str r2, [r12]
a0: e5813000 str r3, [r1]
a4: e49de004 pop {lr} ; (ldr lr, [sp], #4)
a8: e12fff1e bx lr
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
Disassembly of section .bss:
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
At address 0x00000000, where this architecture executes its first instruction after reset, there is a branch to address 0x20; then we do more stuff and call the one() function. main() is to some extent arbitrary, and in this case I can make up whatever function names I want; I don't need main() specifically, so I didn't feel like using it. After reset the bootstrap calls one(), one() calls two(), and then both return.
We can see that not only did the linker put all of my program in the 0x00000000 address space, it patched up the addresses to branch to the nested functions.
28: eb000000 bl 30 <one>
50: eb000008 bl 78 <two>
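What the linker patched can be checked by decoding the branch instructions by hand. The following C sketch decodes the target of an ARM (not Thumb) B/BL instruction: the low 24 bits are a signed word offset relative to the instruction address plus 8. The function name is invented for the example and it does not verify that the word really is a branch.

```c
#include <stdint.h>

/* Target of an ARM-mode B/BL at address addr:
 * target = addr + 8 + (sign-extended imm24 * 4). */
uint32_t arm_branch_target(uint32_t addr, uint32_t instr)
{
    int32_t imm24 = (int32_t)(instr & 0x00ffffffu);
    if (imm24 & 0x00800000)
        imm24 -= 0x01000000;               /* sign-extend the 24-bit field */
    /* unsigned arithmetic so negative offsets wrap as intended */
    return addr + 8u + (uint32_t)imm24 * 4u;
}
```

With the linked values, eb000000 at 0x28 decodes to 0x30 (one) and eb000008 at 0x50 decodes to 0x78 (two). The unlinked placeholder ebfffffe decodes to a branch to itself, which is why it had to be fixed up.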
It also defined the addresses for hello and world in ram
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
in the address space we asked for and patched up the functions so they could access these global variables
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
I used the disassembler, but the word at 0xAC, for example, is not an andcs instruction; it is the address 0x20000000, where the variable hello is stored. This disassembler tries to disassemble everything, instructions or data; we know that is not an instruction, so just ignore the disassembly.
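The pc-relative loads that fetch those addresses can be decoded the same way. Here is a C sketch for an ARM-mode ldr rX, [pc, #imm]: the low 12 bits are the offset, bit 23 says whether it is added or subtracted, and pc reads as the instruction address plus 8. The function name is made up, and the assert only accepts this one addressing form.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by an ARM-mode "ldr rX, [pc, #+/-imm12]" located at addr. */
uint32_t arm_ldr_literal_address(uint32_t addr, uint32_t instr)
{
    /* word load, immediate offset, base register pc, no write-back */
    assert((instr & 0x0f7f0000u) == 0x051f0000u);

    uint32_t imm12 = instr & 0x00000fffu;
    uint32_t pc = addr + 8u;               /* pc reads two instructions ahead */
    return (instr & 0x00800000u) ? pc + imm12 : pc - imm12;
}
```

For e59fc02c at 0x78 this gives 0xac, the word that holds 0x20000000, matching the comment the disassembler printed.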
Now this elf file format is not the exact bytes you put in the flash when programming, some tools you use to program a flash might accept this file format and then extract from it the actual bytes that go in the flash, ignoring the rest of the file (or using it to find those bytes).
arm-none-eabi-objcopy so.elf -O binary so.bin
would create a file that represents just the data that would go in flash.
arm-none-eabi-objcopy so.elf -O binary so.bin
calvin so # hexdump so.bin
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
0000010 0005 ea00 0004 ea00 0003 ea00 0002 ea00
0000020 d202 e3a0 d902 e38d 0000 eb00 fffe eaff
0000030 3005 e3a0 2006 e3a0 4070 e92d 402c e59f
0000040 502c e59f 0003 e1a0 3000 e585 2000 e584
0000050 0008 eb00 3000 e594 2000 e595 0003 e080
0000060 0000 e584 0002 e080 4070 e8bd ff1e e12f
0000070 0004 2000 0000 2000 c02c e59f 102c e59f
0000080 2000 e59c 3000 e591 2001 e282 3002 e283
0000090 e004 e52d e003 e082 0000 e08e 2000 e58c
00000a0 3000 e581 e004 e49d ff1e e12f 0000 2000
00000b0 0004 2000
00000b4
This is dumping little-endian halfwords (16 bit), but you can still see that the machine code from above is in there, and that is all that is in there.
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
...
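To line the hexdump up with the disassembly by hand, each pair of 16-bit values has to be glued back into one 32-bit word, low halfword first. A minimal C sketch of that step follows; the function name halfwords_to_words is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild little-endian 32-bit words from the 16-bit values that
 * hexdump prints. Returns the number of words written.
 * (hypothetical helper for reading the dump above) */
size_t halfwords_to_words(const uint16_t *half, size_t n_half, uint32_t *words)
{
    assert(half != NULL && words != NULL);

    size_t n_words = n_half / 2;      /* a trailing odd halfword is ignored */
    for (size_t i = 0; i < n_words; i++) {
        /* The first halfword of each pair is the low 16 bits. */
        words[i] = (uint32_t)half[2 * i] | ((uint32_t)half[2 * i + 1] << 16);
    }
    return n_words;
}
```

Feeding it the first four values of the dump (0x0006, 0xea00, 0x0008, 0xea00) produces 0xea000006 and 0xea000008, the first two branch instructions at _start.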
If/when you dump the flash back out you only have the machine code and maybe some .data, depending on how you build your project. The microcontroller can, as mentioned above, execute this code directly from flash; that is the primary use case, and it is generally fast enough for the type of work microcontrollers are used for. Sometimes you can speed up the microcontroller's clock, but the flash generally has a lower speed limit, so the part may have to add wait states so that it doesn't push the flash too fast and cause corruption. And yes, with some work you can copy some or all of your program to RAM and run it there, if you have enough resources (RAM) and are that pushed for performance (and have exhausted other avenues, such as examining what the compiler is producing and whether you can affect that with command-line options or by adjusting or cleaning up your code).
Code executes on the microcontroller much as it does on any other microprocessor, though code is often organized separately from data (google "Harvard Architecture"). The program counter starts at the reset vector (see next answer) and advances with every instruction, changing when branch instructions occur.
Typically your compiler will insert into your code a number of "vectors". These vectors usually include a "reset vector" that points at the place where your microcontroller expects the first instruction. It might be at memory location zero, or it might be elsewhere. From there, it operates on the code similar to any other computer. Every microprocessor and microcontroller expects code to start at a certain memory location upon reset, though it varies among different parts. For more information on vectors, here's a handy reference: http://www.avrbeginners.net/architecture/int/int.html. Note the second sentence which talks about the reset vector and its address at 0x0000.
Microcontrollers are often coded in assembly language or C, so that programmers can control to the byte what code is running. Those exact processes are what will run.
This might vary from chip to chip, but with the chips I'm expert in, code is not copied to RAM to execute. Again, it's the Harvard architecture at work. Small microcontrollers might have as little as zero RAM and as much as a few Kbytes, but typically the instructions are read directly from flash. Proper programming in these environments means the heap is tiny, the stack is carefully controlled, and RAM is used very sparingly.
I recommend you pick a processor line -- I'm expert at the Atmel ATtiny and ATmega controllers -- and read their datasheets to understand in detail how they work. Atmel documentation is thorough and they also publish many application notes for specific applications, often with useful code examples. There are also internet forums dedicated to discussion and learning on the Atmel AVR line.
How code executes in the controller?
If you mean, "how does the code start executing", the answer is that once the MCU has determined that the supply voltage and clocks are ok, it will automatically start executing at the boot address. But, now we're getting into the gory details. I am mostly into MMU-less controllers such as ARM Cortex-M, 8051, PIC, AVR etc., so my answer might not apply fully to your questions.
The boot address is typically the first address in the flash for most small MCUs, but in some MCUs the flash is expected to contain a vector at a specific location, which in turn points to the actual start address. Other MCUs, such as ARM, allow the electronic designer to select whether the MCU shall start executing from internal flash, external flash, system boot ROM (if such exists), enter some kind of bootloader mode etc., by setting certain pins high or low.
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
See the above answer.
what all the process will be execute in the controller?
I don't understand the question. Can you please rephrase it?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct?
This depends on the design of the firmware. If you really need to, you would copy the code from Flash to RAM and execute from RAM, but if the internal flash is large enough and you don't need to squeeze every clock of the MCU, you would simply execute from flash. It's so much easier. And safer, too, since it's harder for a bug to accidentally overwrite the code-space.
But, in case you need a lot of code, your MCU might not have enough flash to fit everything. In that case, you would need to store the code in an external flash. Depending on how price-sensitive you are, you will possibly choose an SPI flash. Since it is impossible to execute from those flashes, you must copy the code to RAM and execute from RAM.
if so when flash code move to RAM?
This would normally be implemented in a boot-loader, or very early in the main() function. If your RAM is smaller than the flash, you will need to implement some kind of page-swap algorithm, dynamically copying code from flash as you need it. This is basically similar to how any Linux-based MCU works, but you might need to carefully design the memory layout.
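The copy itself is usually just a word-by-word loop run before anything located in RAM is called. A minimal sketch in C, assuming the start and end of the flash image and the RAM destination are known; on a real target these would typically come from symbols defined in the linker script, and the names used here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copy a block of 32-bit words from its load address (flash) to its
 * run address (RAM). The parameter names are hypothetical; a real
 * project would pass in addresses exported by its linker script. */
void copy_to_ram(uint32_t *ram_start,
                 const uint32_t *flash_start,
                 const uint32_t *flash_end)
{
    assert(flash_start <= flash_end);

    /* Plain word copy; flash_end points one past the last word. */
    while (flash_start < flash_end)
        *ram_start++ = *flash_start++;
}
```

Note that copying alone is not enough: the code being copied must also have been linked to run at its RAM address, which is where the linker and compiler switches come in.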
If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
Yes. You will certainly need to adjust the memory map, using compile-time switches to the linker and compiler.
I would be grateful if someone could explain what's happening in the following example, which uses printf and is built with nasm and gcc.
Why is only "sud" printed on the screen? I also don't understand why "sudobor" is printed when I replace "push 'sud'" with "push 'sudo'".
Can someone also explain why I need to push esp? Is it a null that is required at the end of the string for printf?
Thank you in advance.
This is the string.s file:
section .data
section .text
global start
extern printf
start:
push ebp
mov ebp, esp
push 'bor'
push 'sud'
push esp
call printf
mov esp, ebp
pop dword ebp
ret
This is the C file:
#include <stdio.h>
#include <stdlib.h>
extern void start();
int main(void) {
start();
}
First off, thanks for blowing my mind. When I first looked at your code, I didn't believe it would work at all. Then I tried it and reproduced your results. Now it makes perfect sense to me, albeit in a twisted way. :-) I'll try to explain it.
First, let's look at the more sane way to achieve this. Define a string in the data portion of the ASM file:
section .data
string: db "Hey, is this thing on?", 0
Then push the address of that string on the stack before calling printf:
push string
call printf
So, that first parameter to printf (last parameter pushed on the stack before the call) is the pointer to the format string. What your code did was push the string on the stack, followed by the stack pointer which then pointed to the string.
Next, I'm going to replace your strings so that they are easier to track in disassembly:
push '567'
push '123'
push esp
call printf
Assemble with nasm, and then disassemble with objdump:
nasm string.s -f elf32 -o string.o
objdump -d -Mintel string.o
When you push, e.g., '123', the assembler converts it to a 32-bit value, 0x333231 in this case, with the first character in the lowest byte. Note that the full 32 bits are 0x00333231, so the top byte is zero.
3: 68 35 36 37 00 push 0x373635
8: 68 31 32 33 00 push 0x333231
d: 54 push esp
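The packing seen in this disassembly can be reproduced with a few lines of C. This is only a sketch of the byte order shown above (first character in the lowest byte, unused high bytes left as zero); the function name pack_chars is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack up to four characters into a 32-bit value, first character in
 * the lowest byte, as in the disassembly above.
 * (hypothetical helper, not an assembler API) */
uint32_t pack_chars(const char *s)
{
    size_t len = strlen(s);
    assert(len <= 4);                 /* only 4 bytes fit in 32 bits */

    uint32_t value = 0;
    for (size_t i = 0; i < len; i++)
        value |= (uint32_t)(unsigned char)s[i] << (8 * i);
    return value;                     /* unused high bytes stay zero */
}
```

pack_chars("123") returns 0x00333231 and pack_chars("567") returns 0x00373635, matching the two push instructions; for the original strings, pack_chars("sud") returns 0x00647573.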
Pushing onto the stack decrements the stack pointer. Assuming an initial stack pointer of 0x70 (contrived for simplicity), this is the state of the stack before calling printf:
64:          68:          6c:          70:
68 00 00 00  31 32 33 00  35 36 37 00  ...
So, when printf is called, it uses the first parameter (the value 0x68 stored at address 0x64) as the string pointer and prints characters starting at 0x68 until it sees a NUL byte (0x00).
That's why this example only prints "123" ("sud" in your original).
So let's push "1234" instead of "123". This means we are pushing the value 0x34333231. When calling printf the stack now looks like:
64:          68:          6c:          70:
68 00 00 00  31 32 33 34  35 36 37 00  ...
Now there is no NULL gap between those 2 strings on the stack and this example will print "1234567" (or "sudobor" in your original).
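Both cases can be simulated on any machine by writing the two pushed values into a byte buffer in little-endian order and reading the buffer back as a C string. This is a sketch, not the original program: the names store_le32 and stack_string are hypothetical, and the extra zero byte after the second value is an assumption, since what really follows on the stack is unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order, as x86 push does. */
static void store_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Lay out two pushed dwords the way they sit in memory (the value
 * pushed last is at the lowest address), then copy out the string
 * printf would see. Returns its length. (hypothetical helper) */
size_t stack_string(uint32_t pushed_first, uint32_t pushed_last, char out[9])
{
    unsigned char mem[9];

    store_le32(mem, pushed_last);       /* e.g. '123' at the lower address */
    store_le32(mem + 4, pushed_first);  /* e.g. '567' just above it */
    mem[8] = 0;   /* assumption: real stack contents here are unknown */

    size_t len = strlen((const char *)mem);
    memcpy(out, mem, len + 1);
    return len;
}
```

Calling stack_string(0x00373635, 0x00333231, out) leaves "123" in out, while stack_string(0x00373635, 0x34333231, out) leaves "1234567", because the zero byte that used to separate the two values is gone.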
Implications: Try pushing "5678" instead of "567". You will probably get a segmentation fault because printf will just keep reading characters to print until it tries to read memory it doesn't have permission to read. Also, try pushing a string that is longer than 4 characters (e.g., "push '12345'"). The assembler won't let you because it can't convert that to a 32-bit number.