How embedded code executes after reset

How embedded code executes after reset - embedded

I'm new to controller coding. Please anyone help me to understand the below points.
How code executes in the controller?
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
what all the process will be execute in the controller?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct? if so when flash code move to RAM?
5.If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
I'm really confused how it works.

You say controller do you mean microcontroller?
Microcontrollers are designed to be systems on a chip, this includes the non-volatile storage where the program lives. Namely flash or some other form of rom. Just like on your x86 desktop/laptop/server there is some rom/flash in the address space of the processor at the address that the processor uses to boot. You have not specified a microcontroller so it depends on which microcontroller you are talking about as to the specific address and those details, but that doesnt matter in general they all tend to be designed to work the same way.
So there is some flash to use as a general term mapped into the address space of the processor, your reset/interrupt vector tables or start address or whatever the architecture requires PLUS your program/application are in flash in the address space. Likewise some amount of ram is there, generally you do NOT run your programs from ram like you would with your laptop/desktop/server, the rams tend to be relatively small and the flash is there for your program to live. There are exceptions, for example performance, sometimes the flash operates with wait states, and often the sram can run at the cpu rate so you might want to copy some execution time sensitive routines to ram to be run. Generally not though.
There are exceptions of course, these would include situations where the logic ideally but sometimes there is a semi-secret rom with a bootloader in the chip, but your program is loaded from outside the chip into ram then run. Sometimes you may wish to design your application that way for some reason, and having bootloaders is not uncommon, a number of microcontrollers have a chip vendor supplied bootloader in a separate flash space that you may or may not be able to replace, these allow you to do development or in circuit programming of the flash.
A microcontroller contains a processor just like your desktop/laptop/server or phone or anything else like that. It is a system on a chip rather than spread across a board, so you have the processor itself, you have some non-volatile storage as mentioned above and you have ram and the peripherals all on the same chip. So just like any other processor there are logic/design defined rules for how it boots and runs (uses a vector table of addresses or uses well known entry point addresses) but beyond that it is just machine code instructions that are executed. Nothing special. What all processes are run are the ones you write and tell it to run, it runs the software you write which at the end of the day is just machine code. Processes, functions, threads, tasks, procedures, etc these are all human terms to try to manage software development, you pick the language (although the vast majority are programmed in C with a little assembly) and the software design so long as it fits within the constraints of the system.
EDIT
So lets say I had an arm microcontroller with flash starting at address 0x00000000 and ram starting at address 0x20000000. Assume an older arm like the ARM7TDMI which was used in microcontrollers (some of which can still be purchased). So the way that processor boots is there are known addresses that execution starts for reset and for interrupts and undefined exceptions and things like that. The reset address is 0x00000000 so after reset the processor starts execution at address 0x00000000 it reads that instruction first and runs it. The next exception handler starts execution at address 0x00000004 and so on for several possible exceptions, so as you will see we have to branch out of this exception table. as the first thing we do.
here is an example program that would run but doesnt do anything interesting, just demonstrates a few things.
vectors.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
mov sp,#0x20000000
orr sp,sp,0x8000
bl one
hang: b hang
one.c
unsigned int hello;
unsigned int world;
extern unsigned int two ( unsigned int );
unsigned int one ( void )
{
hello=5;
world=6;
world+=two(hello);
return(hello+world);
}
two.c
extern unsigned int hello;
extern unsigned int world;
unsigned int two ( unsigned int temp )
{
hello++;
world+=2;
return(hello+world+temp);
}
memmap (the linker script)
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x10000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
and then I build it
arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
arm-none-eabi-ld vectors.o one.o two.o -T memmap -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
before we look at the linked output we can look at the individual parts
arm-none-eabi-objdump -D vectors.o
vectors.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: ebfffffe bl 0 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
That is what is in the object file, an object file is not just machine code or data, it also includes various other things, how much data there is how much program there is, it might as in this case contain label names to make debugging easier, the label "hang" and "reset" and others are not in the machine code, these are for the human to make programming easier the machine code has no notion of labels. But the object file depending on the format (there are many, elf, coff, etc) and depending on the tool and default and command line options determine how much stuff goes in this file.
Notice since we have not "linked" the program the branch to the function one() is actually incomplete as you will see in the final linked binary. The one label (function name) is not defined in this code so it cannot yet be resolved, the linker has to do it.
same story with the one function
arm-none-eabi-objdump -D one.o
one.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <one>:
0: e3a03005 mov r3, #5
4: e3a02006 mov r2, #6
8: e92d4070 push {r4, r5, r6, lr}
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
20: ebfffffe bl 0 <two>
24: e5943000 ldr r3, [r4]
28: e5952000 ldr r2, [r5]
2c: e0800003 add r0, r0, r3
30: e5840000 str r0, [r4]
34: e0800002 add r0, r0, r2
38: e8bd4070 pop {r4, r5, r6, lr}
3c: e12fff1e bx lr
...
that is the machine code and a disassembly that makes up the one function, the function two is not resolved in this code so it also has a placeholder as well as the global variables hello and world.
these two are getting the address of hello and world from locations
that have to be filled in by the linker
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
and these two perform the initial write of values to hello and world as the code shows
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
hello=5;
world=6;
Notice all the addresses are zero based, they have not been linked.
two is similar if you look at it yourself.
The linker script tells the linker that we want .text the program, the machine code to live at 0x00000000 and .bss to be at 0x20000000. bss is global things that are not initialized like
unsigned int this:
.data which I dont deal with here are things like
unsigned int this=5;
global things that are initialized, .bss is assumed by programmers to be zero, but I cheated here and did not zero out the .bss memory space which you will see, instead I initialized the variables in the program rather than pre-initialized them and had to do different work.
reset:
mov sp,#0x20000000
orr sp,sp,#0x8000
bl one
hang: b hang
normally a bootstrap like above would need to deal with the stack as needed (certainly in the case of baremetal microcontroller code like this) as well as zero .bss and copy .data to ram. It takes more linker and compiler magic to put the initalized variables
unsigned int like_this=7;
in flash, as we need to remember that that variable boots with the value 7 and ram is volatile, doesnt survive a power outage. so to support .data you have to tell the linker it wants to live in 0x2000xxxx but put it in flash somewhere and I will copy it over. I didnt demonstrate that here.
from the so.list output of commands above, fully linked program.
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: eb000000 bl 30 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
00000030 <one>:
30: e3a03005 mov r3, #5
34: e3a02006 mov r2, #6
38: e92d4070 push {r4, r5, r6, lr}
3c: e59f402c ldr r4, [pc, #44] ; 70 <one+0x40>
40: e59f502c ldr r5, [pc, #44] ; 74 <one+0x44>
44: e1a00003 mov r0, r3
48: e5853000 str r3, [r5]
4c: e5842000 str r2, [r4]
50: eb000008 bl 78 <two>
54: e5943000 ldr r3, [r4]
58: e5952000 ldr r2, [r5]
5c: e0800003 add r0, r0, r3
60: e5840000 str r0, [r4]
64: e0800002 add r0, r0, r2
68: e8bd4070 pop {r4, r5, r6, lr}
6c: e12fff1e bx lr
70: 20000004 andcs r0, r0, r4
74: 20000000 andcs r0, r0, r0
00000078 <two>:
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
88: e2822001 add r2, r2, #1
8c: e2833002 add r3, r3, #2
90: e52de004 push {lr} ; (str lr, [sp, #-4]!)
94: e082e003 add lr, r2, r3
98: e08e0000 add r0, lr, r0
9c: e58c2000 str r2, [r12]
a0: e5813000 str r3, [r1]
a4: e49de004 pop {lr} ; (ldr lr, [sp], #4)
a8: e12fff1e bx lr
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
Disassembly of section .bss:
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
at address 0x00000000 the address that the first instruction executes after reset for this architecture is a branch to address 0x20 and then we do more stuff and call the one() function. main() is to some extent arbitrary and in this case I can make whatever function names I want I dont need main() specifically so didnt feel like using it after reset the bootstrap calls one() and one() calls two() and then both return back.
We can see that not only did the linker put all of my program in the 0x00000000 address space, it patched up the addresses to branch to the nested functions.
28: eb000000 bl 30 <one>
50: eb000008 bl 78 <two>
It also defined the addresses for hello and there in ram
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
in the address space we asked for and patched up the functions so they could access these global variables
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
I used the disassembler, the word at 0xAC for example is not an andcs instruction it is the address 0x20000000 where we have the variable hello stored. This disassembler tries to disassemble everything, instructions or data so we know that is not instructions so just ignore the disassembly.
Now this elf file format is not the exact bytes you put in the flash when programming, some tools you use to program a flash might accept this file format and then extract from it the actual bytes that go in the flash, ignoring the rest of the file (or using it to find those bytes).
arm-none-eaby-objcopy so.elf -O binary so.bin
would create a file that represents just the data that would go in flash.
arm-none-eabi-objcopy so.elf -O binary so.bin
calvin so # hexdump so.bin
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
0000010 0005 ea00 0004 ea00 0003 ea00 0002 ea00
0000020 d202 e3a0 d902 e38d 0000 eb00 fffe eaff
0000030 3005 e3a0 2006 e3a0 4070 e92d 402c e59f
0000040 502c e59f 0003 e1a0 3000 e585 2000 e584
0000050 0008 eb00 3000 e594 2000 e595 0003 e080
0000060 0000 e584 0002 e080 4070 e8bd ff1e e12f
0000070 0004 2000 0000 2000 c02c e59f 102c e59f
0000080 2000 e59c 3000 e591 2001 e282 3002 e283
0000090 e004 e52d e003 e082 0000 e08e 2000 e58c
00000a0 3000 e581 e004 e49d ff1e e12f 0000 2000
00000b0 0004 2000
00000b4
this is dumping little endian halfwords (16 bit) but you can still see
that the machine code from above is in there and that is all that is
in there.
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
...
If/when you dump the flash back out you only have the machine code and maybe some .data depending on how you build your project. The microcontroller can as mentioned above execute this code directly from flash and that is the primary use case, and generally it is fast enough for the type of work microcontrollers are used for. Sometimes you can speed up the microcontroller, but the flash generally has a speed limit that might be slower and they might have to add wait states so that it doesnt push the flash too fast and cause corruption. And yes with some work you can copy some or all of your program to ram and run it there if you have enough resources (ram) and are that pushed for performance (and have exhausted other avenues like examining what the compiler is producing and if you can affect that with command line options or by adjusting or cleaning up your code).

Code executes on the microcontroller similar to any other microprocessor, though code if often organized separate from data (google "Harvard Architecture"). The program counter starts at the reset vector (see next answer) and advances every instruction, changing when branching instructions occur.
Typically your compiler will insert into your code a number of "vectors". These vectors usually include a "reset vector" that points at the place where your microcontroller expects the first instruction. It might be at memory location zero, or it might be elsewhere. From there, it operates on the code similar to any other computer. Every microprocessor and microcontroller expects code to start at a certain memory location upon reset, though it varies among different parts. For more information on vectors, [here's a handy reference(http://www.avrbeginners.net/architecture/int/int.html). Note the second sentence which talks about the reset vector and its address at 0x0000.
Microcontrollers are often coded in assembly language or C, so that programmers can control to the byte what code is running. Those exact processes are what will run.
This might vary from chip to chip, but with the chips I'm expert in, code is not copied to RAM to execute. Again, it's the Harvard architecture at work. Small microcontrollers might have as little as zero RAM and as much as a few Kbytes, but typically the instructions are read directly from flash. Proper programming in these environments means the heap is tiny, the stack is carefully controlled, and RAM is used very sparingly.
I recommend you pick a processor line -- I'm expert at the Atmel ATtiny and ATmega controllers -- and read their datasheets to understand in detail how they work. Atmel documentation is thorough and they also publish many application notes for specific applications, often with useful code examples. There are also internet forums dedicated to discussion and learning on the Atmel AVR line.

How code executes in the controller?
If you mean, "how does the code start executing", the answer is that once the MCU has determined that the supply voltage and clocks are ok, it will automatically start executing at the boot address. But, now we're getting into the gory details. I am mostly into MMU-less controllers such as ARM Cortex-M, 8051, PIC, AVR etc., so my answer might not apply fully to your questions.
The boot address is typically the first address in the flash for most small MCUs, but in some MCUs, the flash is expected to contain a vector at a specific location, which in turns points to the first start address. Other MCUs, such as ARM, allows the electronic designer to select if the MCU shall start executing from internal flash, external flash, system boot ROM (if such exists), enter some kind of bootloader mode etc., by setting certain pins high or low.
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
See the above answer.
what all the process will be execute in the controller?
I don't understand the question. Can you please rephrase it?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct?
This depends on the design of the firmware. If you really need to, you would copy the code from Flash to RAM and execute from RAM, but if the internal flash is large enough and you don't need to squeeze every clock of the MCU, you would simply execute from flash. It's so much easier. And safer, too, since it's harder for a bug to accidentally overwrite the code-space.
But, in case you need a lot of code, your MCU might not have enough flash to fit everything. In that case, you would need to store the code in an external flash. Depending on how price-sensitive you are, you will possibly choose an SPI-flash. Since it is impossible to execute from those flash:es, you must copy the code to RAM and execute from RAM.
if so when flash code move to RAM?
This would normally be implemented in a boot-loader, or very early in the main() function. If your RAM is smaller than the flash, you will need to implement some kind of page-swap algorithm, dynamically copying code from flash as you need it. This is basically similar to how any Linux-based MCU works, but you might need to carefully design the memory layout.
If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
Yes. You will certainly need to adjust the memory map, using compile-time switches to the linker and compiler.

Related

How do I define and use variables in the bootsector with NASM? [duplicate]

I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine.
Here's the code:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds ; ds = cs
pop es ; es = 0xB800
jmp start
; input = di (position*2), ax (character and attributes)
putchar:
stosw
ret
; input = si (NUL-terminated string)
print:
cli
cld
.nextChar:
lodsb ; mov al, [ds:si] ; si += 1
test al, al
jz .finish
call putchar
jmp .nextChar
.finish:
sti
ret
start:
mov ah, 0x0E
mov di, 8
; should print P
mov al, byte [msg]
call putchar
; should print A
mov al, byte [msg + 1]
call putchar
; should print O
mov al, byte [msg + 2]
call putchar
; should print !
mov al, byte [msg + 3]
call putchar
; should print X
mov al, 'X'
call putchar
; should print Y
mov al, 'Y'
call putchar
cli
hlt
msg: db 'PAO!', 0
; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0
; Header
db 0x55
db 0xAA
The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.
It prints something like:

The ORG Directive
When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .
If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.
Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?
Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Ut-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:
ORG 0x7c00
start:
push cs
pop ds ; DS=CS
to:
ORG 0x7c00
start:
xor ax, ax ; ax=0x0000
mov ds, ax ; DS=0x0000
Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.
Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00
In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:
When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.
We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.
Summary
Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.
Your code could be fixed to use either ORG 0x0000 as you originally had it, if we set DS appropriately to 0x07c0 as previously discussed. That could look like:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds ; DS=0x07c0 since we use ORG 0x0000
pop es
Alternatively we could have used ORG 0x7c00 like this:
ORG 0x7c00
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds ; DS=0x0000 since we use ORG 0x7c00
pop es

How do I start up a minimal project with STM32CubeMX? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm attempting to learn embedded development, and I'm currently playing around with a STM32F407G board.
So far, I've been able to toggle the LEDs based on the User button press using the high level driver APIs provided by the CubeMX.
However, I now want to recreate the same functionality without any API help. Instead, using the base addresses and registers provided in the reference manual, I want to basically recreate the APIs.
Thus far, I've disabled all the peripherals using the GUI:
But I feel like there's a better way to do this. I'm not entirely sure what peripherals I definitely need to even debug the code on the board.
Essentially, I want enough start-up code so that I'm able to load (flash?) the code into the microcontroller, and debug main(). Everything else (such as toggling the LEDs, detecting the User button interrupt, etc) will be something I want to take care of.

You do not need to recreate the APIs. Jest program using the registers. I do it all the time in almost all of my projects (unless using some kind of HAL is my client requirement)
You need to have:
startup code with the vector table.
CMSIS headers for the convenience.
How to archive it using the CubeMx. It is actually quite easy.
Create the project
Import to your favorite IDE.
In the project options delete the USE_HAL_DRIVER definition
Exclude from the build (or delete) all the files from the /Driver/STMxxxxx_HAL_Driver
Delete everything from the main.c file
Add:
#include "stm32f4xx.h" // CMSIS headers
int main(void)
{
}
and enjoy :)

flash.s
.cpu cortex-m4
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build (dont need the none-eabi, with this code you can use the arm-linux-gnueabi or whichever arm-whatever-gcc/as/ld you want (within reason))
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m4 -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
And from notes I took a while back you can use dfu-util to write your binary
dfu-util -d 0483:df11 -c 1 -i 0 -a 0 -s 0x08000000 -D myprogram.bin
The above does nothing much, but is a framework for you to add the enable of the gpio block, make the led pin an output, then turn it off and on with some code to kill time in a loop. and/or poll one gpio pin and drive another to match (use the button to light or turn off an led).
openocd connects up fine to this board/family with stlink...
you wont need interrupts or clock configs at first (if ever).
Ahh, right...After building no matter what path you take examine the binary and make sure it has a chance of working.
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000041 stmdaeq r0, {r0, r6}
8000008: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800000c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000010: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000014: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000018: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800001c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000020: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000024: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000028: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800002c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000030: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000034: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000038: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800003c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
08000040 <reset>:
8000040: f000 f808 bl 8000054 <notmain>
8000044: e7ff b.n 8000046 <hang>
08000046 <hang>:
8000046: e7fe b.n 8000046 <hang>
vector table up front with the stack pointer and thats fine, dont have to use it, ignore the stmdaeq disassembly it is just trying to disassemble the vector entries because I used objdump to examine the binary. the vector table needs odd numbers the address of the entry ORRED with one. Technically if small enough you can use 0x00000000 on these parts as that is really where it will be mapped when it boots this code, but because it is also mapped at 0x08000000, full amount of flash for these ST parts you will typically see it done like this. if you switch to another cortex-m based part from another family (NXP, Atmel/Microchip, etc) then you may need to use another address be it 0x00000000 or some other that that part family uses.
if you dont see the beginning of the binary looking like this with the stack pointer init value and the vector table, then you are not likely to have much luck with booting...no matter what library/software path you take.
Note the correct answer was given in a comment to this question. It is primarily opinion based. there are multiple library solutions and over time vendors will keep changing them for various (generally non-technical) reaasons. Professionally you should be able to go down either path, truly bare metal or some sandbox or somewhere in the middle. If you dont write all of the code and use someone elses you are still responsible for that project, so you should spend the time to dig into it and check the quality and accuracy of that code. You should be surprised by what you find, you own it you should fix it if you dont like it and/or replace it.
Neither path is automatically simpler nor faster nor more reliable. There is no right answer to this so you need to be flexible.
Both documents and library code are buggy, expect this, expect to deal with this.

Setup / Errors with Floating Point on TI AM3517 Cortex-A8

I'm getting an undefined instruction exception when executing:
0xED2D8B0E VPUSH {D8-D14}
(Note: The statement was generated by the compiler as part of C language function entry protocol.)
Initialization code:
;; Initialize VFP (if needed).
;; BL __iar_init_vfp HJ REMOVED AND REPLACED WITH BELOW
MRC p15, #0, r1, c1, c0, #2 ; r1 = Access Control Register
ORR r1, r1, #(0xf << 20) ; enable full access for p10,11
MCR p15, #0, r1, c1, c0, #2 ; Access Control Register = r1
MOV r1, #0
MCR p15, #0, r1, c7, c5, #4 ; flush prefetch buffer because of FMXR below
; and CP 10 & 11 were only just enabled
; Enable VFP itself
MOV r0,#0x40000000
FMXR FPEXC, r0 ; FPEXC = r0
I get the undefined exception when the target FPU is set up as VFPv3 or VFPV3 + NEON.
The initialization code is placed in the "cstartup.c" file, at the __iar_program_start and ?cstartup code, following this code snippet:
MRC p15,0,R1,C1,C0,0
LDR R0,=CP_DIS_MASK ;; 0xFFFFEFFA
AND R1,R1,R0
ORR R1,R1,#(1<<12)
MCR p15,0,R1,C1,C0,0
Registers (before VPUSH):
CPSR: 0x80000113
APSR: 0x80000000
SPSR: 0x000001D3
Tools:
IAR Embedded Workbench IDE & Compiler - 7.40
I-Jet debugging probe
Zoom AM3517 eval board
TI AM35X Cortex-A8 processor
Questions:
In the initialization code above, which statements are required for
NEON and which for VFP?
Are there any initialization instructions I'm missing for NEON and
VFP initialization?
Are there statements I need to place in the macro file for the debug
probe?

The code presented in the question correctly initializes the floating point processor on a Cortex-A8 processor.
The issue of getting undefined instruction exception (which led up to this question), was caused by the O.S. writing an invalid value to the FPEXC register, causing the Floating Point Processor to be disabled.

Why do I have to enable peripheral clocks one at a time?

In a minimal STM32 application I've written that writes characters to USART1, the USART doesn't seem to work when I try to enable all the clocks I need at once:
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA
| RCC_APB2Periph_AFIO
| RCC_APB2Periph_USART1, ENABLE);
But when I enable the clocks one at a time, it works:
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE);
RCC_APB2PeriphClockCmd(RCC_APB2Periph_AFIO, ENABLE);
RCC_APB2PeriphClockCmd(RCC_APB2Periph_USART1, ENABLE);
Why is this? Is there a specific order these clocks have to be enabled in? (If so, where is this documented?)
(I've left out all the code following this that initializes the GPIO pins, sets up the USART, and starts sending content, as it's the same in each application. If it's relevant, let me know and I'll include it.)
The device I'm using is the STM32F103VET6.
Since there's some interest in the assembly involved, here it is. For all three clocks at once:
00000000 <main>:
0: b590 push {r4, r7, lr}
2: b089 sub sp, #36 ; 0x24
4: af00 add r7, sp, #0
6: f244 0014 movw r0, #16389 ; 0x4005
a: 2101 movs r1, #1
c: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
And for one clock at a time:
00000000 <main>:
0: b590 push {r4, r7, lr}
2: b089 sub sp, #36 ; 0x24
4: af00 add r7, sp, #0
6: 2004 movs r0, #4
8: 2101 movs r1, #1
a: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
e: 2001 movs r0, #1
10: 2101 movs r1, #1
12: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
16: f44f 4080 mov.w r0, #16384 ; 0x4000
1a: 2101 movs r1, #1
1c: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
...
And here's RCC_APB2PeriphClockCmd:
00000000 <RCC_APB2PeriphClockCmd>:
0: 4b04 ldr r3, [pc, #16] ; (14 <RCC_APB2PeriphClockCmd+0x14>)
2: 699a ldr r2, [r3, #24]
4: b109 cbz r1, a <RCC_APB2PeriphClockCmd+0xa>
6: 4310 orrs r0, r2
8: e001 b.n e <RCC_APB2PeriphClockCmd+0xe>
a: ea22 0000 bic.w r0, r2, r0
e: 6198 str r0, [r3, #24]
10: 4770 bx lr
12: bf00 nop
14: 40021000 .word 0x40021000
0x40021000 is the base address of the RCC peripheral; the #24 offset points to the RCC_APB2ENR register, which has a bit for each clock that's being enabled. (See page 109 of RM0008 for details.)

Well, I think I figured it out, and it turned out to not be a hardware problem at all... there were a number of problems with my toolchain configuration:
I was setting -nostdlib. This was causing some global initialization code to not be generated. I'm not sure how important that was, but other issues included:
I was not passing -mthumb and other CPU options to the linker. This was causing some of the generated startup code to be garbage.
My startup file didn't contain a call to __libc_init_array. This was causing some more initialization code to be dropped at link time.
I'm still not sure why splitting up the peripheral clock initializations managed to work around this. Perhaps the change in the amount of code was bumping something to just the right alignment? Anyways, solving the underlying issues seems to have patched things up so far (although I'm still kind of suspicious of some of the remaining startup code).

You might want to let us know exactly which device you're using and/or look at the errata for that device. For example, the errata for the STM32L100x6/8/B-A (and other) devices has the following (http://www.st.com/web/en/resource/technical/document/errata_sheet/DM00097022.pdf):
2.6.1 Delay after an RCC peripheral clock enabling
Description
A delay between an RCC peripheral clock enable and the effective
peripheral enabling should be taken into account in order to manage
the peripheral read/write to registers.
This delay depends on the peripheral's mapping:
If the peripheral is mapped on AHB: the delay should be equal to 2 AHB cycles.
If the peripheral is mapped on APB: the delay should be equal to 1 + (AHB/APB prescaler) cycles.
Workarounds
Use the DSB instruction to stall the Cortex-M CPU pipeline until the instruction is completed.
Insert "n" NOPs between the RCC enable bit write and the peripheral register writes (n = 2 for AHB peripherals, n = 1 + AHB/APB
prescaler in case of APB peripherals).
This doesn't really sound like your problem but it might be related (maybe the one-at-a-time enabling introduces a delay that turns out to be necessary).

enter low power mode within u-boot, wake up on interrupt

I try to implement a low power "deep sleep" functionality into uboot on button press. Button press is handled by linux and a magic code is set to make u-boot aware of the stay asleep do not reboot"
printf ("\nDisable interrupts to restore them later\n");
rupts = disable_interrupts();
printf ("\nEnable interrupts to enable magic wakeup later\n");
enable_interrupts();
printf ("\nSuspending. Press button to restart\n");
while(probe_button()/*gpio probe*/){
#if 1
//FIXME recheck if that one actually needs an unmasked interrupt or any is ok
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
#else
udelay (10000);
#endif
}
if (rupts) {
printf ("\nRe-Enabling interrupts\n");
enable_interrupts();
}
Unfortunatly the power dissipation does not change at all (got power dissipation measurment tied to the chip), no matter if hotspinning is used or not. Beyond that, if I use the Wait-For-Interrupt CP15 instruction, it never wakes up. The button is attached to one of the GPIOs. The plattform is Marvell Kirkwood ARM9EJ-S based.
I enabled some CONFIG_IRQ_* manually, and create implementation for arch_init_irq() aswell as do_irq(), I think there is my issue.
According to the CP15 instruction docs it should be just enough that a interrupt gets triggered (no matter if masked or not!).
Can anyone tell me what I am doing wrong or what needs to be done beyond the code above?
Thanks a lot in advance!

I'm not sure if it is the only reason your aproach isn't working on power saving but your inline assembly isn't correct. According to this article you need to execute:
MOV R0, #0
MCR p15, 0, r0, c7, c0, 4
but your inline assembly
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
produces
0: ee073f90 mcr 15, 0, r3, cr7, cr0, {4}
4: e1a03003 mov r3, r3
8: e12fff1e bx lr
I am not sure what's your intent but mov r3, r3 doesn'ẗ have any effect. So you are making coprocessor call with a random value. You also need to set r3 (ARM source register for mcr) before mcr call. Btw when you put 'memory' in clobber list it means
... will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.
Try this line,
asm("MOV R0, #0\n MCR p15, 0, r0, c7, c0, 4" : : : "r0");
it produces
c: e3a00000 mov r0, #0 ; 0x0
10: ee070f90 mcr 15, 0, r0, cr7, cr0, {4}
For power saving in general, I would recommend this article at ARM's web site.
Bonus section:
A small answer to your claim on backward compability of this coprocessor supplied WFI:
ARMv7 processors (including Cortex-A8, Cortex-A9, Cortex-R4 and Cortex-M3) all implement the WFI instruction to enter "wait for interrupt" mode. On these processors, the coprocessor write used on earlier processors will always execute as a NOP. It is therefore possible to write code that will work across ARMv6K, ARMv6T2 and all profiles of ARMv7 by executing both the MCR and WFI instruction, though on ARM11MPCore this will cause "wait for interrupt" mode to be entered twice. To write fully portable code that enters "wait for interrupt" mode, the CPUID register must be read at runtime to determine whether "wait for interrupt" is available and the instruction needed to enter it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas