All flags and registers set to execute interrupt and doesn´t happen - interrupt

I have written a code for AVR Tiny 44 to execute ainteruption for timer compare routine
All registers are set, all flags ok in the AVR studio simulator but the interuption never happens
what am I missing
SEI
LDI R16,0B00000001
OUT TCCR0B,R16 ; SET THE TIMER GO
CYCLE:
NOP
NOP
NOP
RJMP CYCLE

conf_initial:
ldi r16,0b11110000 ;both compares in set when compare
out TCCR0A,r16
clr r16
out TCNT0,r16
ldi r16,0x0f
out OCR0A,r16 ;TEMP COMPARE TO 0F B
out OCR0B,r16 ;TIME COMPATE TO 0F A
LDI R16,0B00000111
OUT TIMSK0,R16
SBI DDRA,7
SBI DDRB,2
RET

Related

How do I define and use variables in the bootsector with NASM? [duplicate]

I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine.
Here's the code:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds ; ds = cs
pop es ; es = 0xB800
jmp start
; input = di (position*2), ax (character and attributes)
putchar:
stosw
ret
; input = si (NUL-terminated string)
print:
cli
cld
.nextChar:
lodsb ; mov al, [ds:si] ; si += 1
test al, al
jz .finish
call putchar
jmp .nextChar
.finish:
sti
ret
start:
mov ah, 0x0E
mov di, 8
; should print P
mov al, byte [msg]
call putchar
; should print A
mov al, byte [msg + 1]
call putchar
; should print O
mov al, byte [msg + 2]
call putchar
; should print !
mov al, byte [msg + 3]
call putchar
; should print X
mov al, 'X'
call putchar
; should print Y
mov al, 'Y'
call putchar
cli
hlt
msg: db 'PAO!', 0
; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0
; Header
db 0x55
db 0xAA
The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.
It prints something like:
The ORG Directive
When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .
If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.
Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?
Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Ut-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:
ORG 0x7c00
start:
push cs
pop ds ; DS=CS
to:
ORG 0x7c00
start:
xor ax, ax ; ax=0x0000
mov ds, ax ; DS=0x0000
Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.
Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00
In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:
When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.
We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.
Summary
Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.
Your code could be fixed to use either ORG 0x0000 as you originally had it, if we set DS appropriately to 0x07c0 as previously discussed. That could look like:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds ; DS=0x07c0 since we use ORG 0x0000
pop es
Alternatively we could have used ORG 0x7c00 like this:
ORG 0x7c00
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds ; DS=0x0000 since we use ORG 0x7c00
pop es

Defining variables in asm

Is the following a valid way to define variables in asm?
.globl main
.globl a, b, c, d
a: .byte 4
b: .value 7
c: .long 0x0C # 11
d: .quad 9
main:
mov $0, %eax
add a(%rip), %eax
add b(%rip), %eax
add c(%rip), %eax
add d(%rip), %eax
ret
For example, is it required/suggested to have a .TEXT section? How exactly does a(%rip) resolve to the value of $4? That seems almost like magic to me.
The default section is .text; lines before any section directive assemble into the .text section. So you do have one, and in fact you put everything in it, including your data. (Or read-only constants.) Normally you should put static constants in .rodata (or .rdata on Windows), not in .text, for performance reasons. (Mixing code and data wastes space in the I-cache and D-cache, and in TLBs.)
It doesn't resolve to an immediate $4 at assemble time, it resolves to an address. In this case, using a RIP-relative addressing mode. See what does "mov offset(%rip), %rax" do? / How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more about the fact that it means "address of symbol a with respect to RIP", not actually RIP + absolute address of the symbol.
In other cases, the symbol a does generally resolve to its (32-bit) absolute address when used as add $a, %rdi or something.
It's only at runtime that the CPU loads data (which you put there with directives like .long) from that static storage. If you changed what was in memory (e.g. with a debugger, or by running other instructions) before add c(%rip), %eax executed, it would load a different value.
You put your constant data in .text, along with the code, which is generally not what you want for performance reasons. But it means the assembler can resolve the RIP-relative addressing at assemble time instead of only using a relocation that the linker has to fill in. Although it seems GAS chooses not to resolve the references and still leaves it for the linker:
$ gcc -c foo.s
$ objdump -drwC -Matt foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 04 .byte 0x4
0000000000000001 <b>:
1: 07 (bad)
...
0000000000000003 <c>:
3: 0c 00 or $0x0,%al
...
0000000000000007 <d>:
7: 09 00 or %eax,(%rax)
9: 00 00 add %al,(%rax)
b: 00 00 add %al,(%rax)
...
000000000000000f <main>:
f: b8 00 00 00 00 mov $0x0,%eax
14: 03 05 00 00 00 00 add 0x0(%rip),%eax # 1a <main+0xb> 16: R_X86_64_PC32 a-0x4
1a: 03 05 00 00 00 00 add 0x0(%rip),%eax # 20 <main+0x11> 1c: R_X86_64_PC32 b-0x4
20: 03 05 00 00 00 00 add 0x0(%rip),%eax # 26 <main+0x17> 22: R_X86_64_PC32 c-0x4
26: 03 05 00 00 00 00 add 0x0(%rip),%eax # 2c <main+0x1d> 28: R_X86_64_PC32 d-0x4
2c: c3 retq
(Attempted disassembly of your data as instructions happens because you put them in .text. objdump -d only disassembles .text, and non-immediate constants are normally placed in .rodata.)
Linking it into a executable resolves those symbol references:
$ gcc -nostdlib -static foo.s # not a working executable, just link it without extra stuff
$ objdump -drwC -Matt a.out
... (bogus data omitted)
000000000040100f <main>:
40100f: b8 00 00 00 00 mov $0x0,%eax
401014: 03 05 e6 ff ff ff add -0x1a(%rip),%eax # 401000 <a>
40101a: 03 05 e1 ff ff ff add -0x1f(%rip),%eax # 401001 <b>
401020: 03 05 dd ff ff ff add -0x23(%rip),%eax # 401003 <c>
401026: 03 05 db ff ff ff add -0x25(%rip),%eax # 401007 <d>
40102c: c3 retq
Note the 32-bit little-endian 2's complement encoding of the relative offsets in the RIP+rel32 addressing modes. (And the comment with the absolute address, added by objdump for convenience in this disassembly output.)
BTW, most assemblers including GAS have macro facilities, so you could have used a = 4 or .equ a, 4 to define it as an assemble-time constant, instead of emitting data into the output there. Then you'd use it as add $a, %eax, which would assemble to an add $sign_extended_imm8, r/m32 opcode.
Also, all your loads are dword sized (determined by the register operand), so only 1 of them matches the size of the data directives you used. Single-step through your code and look at the high bits of EAX.
Assembly language doesn't really have variables. It has tools you can use to implement the high-level-language concept of variables, including variables with static storage class. (A label and some space in .data or .bss. Or .rodata for const "variables".)
But if you use the tools differently, you can do things like load 4 bytes that span the .byte, the .value (16-bit), and the first byte of the .long. So after the first instruction, you'll have EAX += 0x0c000704 (because x86 is little-endian). This is totally legal to write in assembler, and nothing is checking to enforce the concept of a variable as ending before the next label.
(Unless you use MASM, which does have variables; in that case you'd have had to write add eax, dword ptr [a]; without the size override MASM would complain about the mismatch between a dword register and a byte variable. Other flavours of asm syntax, like NASM and AT&T, assume you know what you're doing and don't try to be "helpful".)

How embedded code executes after reset

I'm new to controller coding. Please anyone help me to understand the below points.
How code executes in the controller?
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
what all the process will be execute in the controller?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct? if so when flash code move to RAM?
5.If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
I'm really confused how it works.
You say controller do you mean microcontroller?
Microcontrollers are designed to be systems on a chip, this includes the non-volatile storage where the program lives. Namely flash or some other form of rom. Just like on your x86 desktop/laptop/server there is some rom/flash in the address space of the processor at the address that the processor uses to boot. You have not specified a microcontroller so it depends on which microcontroller you are talking about as to the specific address and those details, but that doesnt matter in general they all tend to be designed to work the same way.
So there is some flash to use as a general term mapped into the address space of the processor, your reset/interrupt vector tables or start address or whatever the architecture requires PLUS your program/application are in flash in the address space. Likewise some amount of ram is there, generally you do NOT run your programs from ram like you would with your laptop/desktop/server, the rams tend to be relatively small and the flash is there for your program to live. There are exceptions, for example performance, sometimes the flash operates with wait states, and often the sram can run at the cpu rate so you might want to copy some execution time sensitive routines to ram to be run. Generally not though.
There are exceptions of course, these would include situations where the logic ideally but sometimes there is a semi-secret rom with a bootloader in the chip, but your program is loaded from outside the chip into ram then run. Sometimes you may wish to design your application that way for some reason, and having bootloaders is not uncommon, a number of microcontrollers have a chip vendor supplied bootloader in a separate flash space that you may or may not be able to replace, these allow you to do development or in circuit programming of the flash.
A microcontroller contains a processor just like your desktop/laptop/server or phone or anything else like that. It is a system on a chip rather than spread across a board, so you have the processor itself, you have some non-volatile storage as mentioned above and you have ram and the peripherals all on the same chip. So just like any other processor there are logic/design defined rules for how it boots and runs (uses a vector table of addresses or uses well known entry point addresses) but beyond that it is just machine code instructions that are executed. Nothing special. What all processes are run are the ones you write and tell it to run, it runs the software you write which at the end of the day is just machine code. Processes, functions, threads, tasks, procedures, etc these are all human terms to try to manage software development, you pick the language (although the vast majority are programmed in C with a little assembly) and the software design so long as it fits within the constraints of the system.
EDIT
So lets say I had an arm microcontroller with flash starting at address 0x00000000 and ram starting at address 0x20000000. Assume an older arm like the ARM7TDMI which was used in microcontrollers (some of which can still be purchased). So the way that processor boots is there are known addresses that execution starts for reset and for interrupts and undefined exceptions and things like that. The reset address is 0x00000000 so after reset the processor starts execution at address 0x00000000 it reads that instruction first and runs it. The next exception handler starts execution at address 0x00000004 and so on for several possible exceptions, so as you will see we have to branch out of this exception table. as the first thing we do.
here is an example program that would run but doesnt do anything interesting, just demonstrates a few things.
vectors.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
mov sp,#0x20000000
orr sp,sp,0x8000
bl one
hang: b hang
one.c
unsigned int hello;
unsigned int world;
extern unsigned int two ( unsigned int );
unsigned int one ( void )
{
hello=5;
world=6;
world+=two(hello);
return(hello+world);
}
two.c
extern unsigned int hello;
extern unsigned int world;
unsigned int two ( unsigned int temp )
{
hello++;
world+=2;
return(hello+world+temp);
}
memmap (the linker script)
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x10000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
and then I build it
arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
arm-none-eabi-ld vectors.o one.o two.o -T memmap -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
before we look at the linked output we can look at the individual parts
arm-none-eabi-objdump -D vectors.o
vectors.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: ebfffffe bl 0 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
That is what is in the object file, an object file is not just machine code or data, it also includes various other things, how much data there is how much program there is, it might as in this case contain label names to make debugging easier, the label "hang" and "reset" and others are not in the machine code, these are for the human to make programming easier the machine code has no notion of labels. But the object file depending on the format (there are many, elf, coff, etc) and depending on the tool and default and command line options determine how much stuff goes in this file.
Notice since we have not "linked" the program the branch to the function one() is actually incomplete as you will see in the final linked binary. The one label (function name) is not defined in this code so it cannot yet be resolved, the linker has to do it.
same story with the one function
arm-none-eabi-objdump -D one.o
one.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <one>:
0: e3a03005 mov r3, #5
4: e3a02006 mov r2, #6
8: e92d4070 push {r4, r5, r6, lr}
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
20: ebfffffe bl 0 <two>
24: e5943000 ldr r3, [r4]
28: e5952000 ldr r2, [r5]
2c: e0800003 add r0, r0, r3
30: e5840000 str r0, [r4]
34: e0800002 add r0, r0, r2
38: e8bd4070 pop {r4, r5, r6, lr}
3c: e12fff1e bx lr
...
that is the machine code and a disassembly that makes up the one function, the function two is not resolved in this code so it also has a placeholder as well as the global variables hello and world.
these two are getting the address of hello and world from locations
that have to be filled in by the linker
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
and these two perform the initial write of values to hello and world as the code shows
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
hello=5;
world=6;
Notice all the addresses are zero based, they have not been linked.
two is similar if you look at it yourself.
The linker script tells the linker that we want .text the program, the machine code to live at 0x00000000 and .bss to be at 0x20000000. bss is global things that are not initialized like
unsigned int this:
.data which I dont deal with here are things like
unsigned int this=5;
global things that are initialized, .bss is assumed by programmers to be zero, but I cheated here and did not zero out the .bss memory space which you will see, instead I initialized the variables in the program rather than pre-initialized them and had to do different work.
reset:
mov sp,#0x20000000
orr sp,sp,#0x8000
bl one
hang: b hang
normally a bootstrap like above would need to deal with the stack as needed (certainly in the case of baremetal microcontroller code like this) as well as zero .bss and copy .data to ram. It takes more linker and compiler magic to put the initalized variables
unsigned int like_this=7;
in flash, as we need to remember that that variable boots with the value 7 and ram is volatile, doesnt survive a power outage. so to support .data you have to tell the linker it wants to live in 0x2000xxxx but put it in flash somewhere and I will copy it over. I didnt demonstrate that here.
from the so.list output of commands above, fully linked program.
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: eb000000 bl 30 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
00000030 <one>:
30: e3a03005 mov r3, #5
34: e3a02006 mov r2, #6
38: e92d4070 push {r4, r5, r6, lr}
3c: e59f402c ldr r4, [pc, #44] ; 70 <one+0x40>
40: e59f502c ldr r5, [pc, #44] ; 74 <one+0x44>
44: e1a00003 mov r0, r3
48: e5853000 str r3, [r5]
4c: e5842000 str r2, [r4]
50: eb000008 bl 78 <two>
54: e5943000 ldr r3, [r4]
58: e5952000 ldr r2, [r5]
5c: e0800003 add r0, r0, r3
60: e5840000 str r0, [r4]
64: e0800002 add r0, r0, r2
68: e8bd4070 pop {r4, r5, r6, lr}
6c: e12fff1e bx lr
70: 20000004 andcs r0, r0, r4
74: 20000000 andcs r0, r0, r0
00000078 <two>:
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
88: e2822001 add r2, r2, #1
8c: e2833002 add r3, r3, #2
90: e52de004 push {lr} ; (str lr, [sp, #-4]!)
94: e082e003 add lr, r2, r3
98: e08e0000 add r0, lr, r0
9c: e58c2000 str r2, [r12]
a0: e5813000 str r3, [r1]
a4: e49de004 pop {lr} ; (ldr lr, [sp], #4)
a8: e12fff1e bx lr
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
Disassembly of section .bss:
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
at address 0x00000000 the address that the first instruction executes after reset for this architecture is a branch to address 0x20 and then we do more stuff and call the one() function. main() is to some extent arbitrary and in this case I can make whatever function names I want I dont need main() specifically so didnt feel like using it after reset the bootstrap calls one() and one() calls two() and then both return back.
We can see that not only did the linker put all of my program in the 0x00000000 address space, it patched up the addresses to branch to the nested functions.
28: eb000000 bl 30 <one>
50: eb000008 bl 78 <two>
It also defined the addresses for hello and there in ram
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
in the address space we asked for and patched up the functions so they could access these global variables
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
I used the disassembler, the word at 0xAC for example is not an andcs instruction it is the address 0x20000000 where we have the variable hello stored. This disassembler tries to disassemble everything, instructions or data so we know that is not instructions so just ignore the disassembly.
Now this elf file format is not the exact bytes you put in the flash when programming, some tools you use to program a flash might accept this file format and then extract from it the actual bytes that go in the flash, ignoring the rest of the file (or using it to find those bytes).
arm-none-eaby-objcopy so.elf -O binary so.bin
would create a file that represents just the data that would go in flash.
arm-none-eabi-objcopy so.elf -O binary so.bin
calvin so # hexdump so.bin
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
0000010 0005 ea00 0004 ea00 0003 ea00 0002 ea00
0000020 d202 e3a0 d902 e38d 0000 eb00 fffe eaff
0000030 3005 e3a0 2006 e3a0 4070 e92d 402c e59f
0000040 502c e59f 0003 e1a0 3000 e585 2000 e584
0000050 0008 eb00 3000 e594 2000 e595 0003 e080
0000060 0000 e584 0002 e080 4070 e8bd ff1e e12f
0000070 0004 2000 0000 2000 c02c e59f 102c e59f
0000080 2000 e59c 3000 e591 2001 e282 3002 e283
0000090 e004 e52d e003 e082 0000 e08e 2000 e58c
00000a0 3000 e581 e004 e49d ff1e e12f 0000 2000
00000b0 0004 2000
00000b4
this is dumping little endian halfwords (16 bit) but you can still see
that the machine code from above is in there and that is all that is
in there.
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
...
If/when you dump the flash back out you only have the machine code and maybe some .data depending on how you build your project. The microcontroller can as mentioned above execute this code directly from flash and that is the primary use case, and generally it is fast enough for the type of work microcontrollers are used for. Sometimes you can speed up the microcontroller, but the flash generally has a speed limit that might be slower and they might have to add wait states so that it doesnt push the flash too fast and cause corruption. And yes with some work you can copy some or all of your program to ram and run it there if you have enough resources (ram) and are that pushed for performance (and have exhausted other avenues like examining what the compiler is producing and if you can affect that with command line options or by adjusting or cleaning up your code).
Code executes on the microcontroller similar to any other microprocessor, though code if often organized separate from data (google "Harvard Architecture"). The program counter starts at the reset vector (see next answer) and advances every instruction, changing when branching instructions occur.
Typically your compiler will insert into your code a number of "vectors". These vectors usually include a "reset vector" that points at the place where your microcontroller expects the first instruction. It might be at memory location zero, or it might be elsewhere. From there, it operates on the code similar to any other computer. Every microprocessor and microcontroller expects code to start at a certain memory location upon reset, though it varies among different parts. For more information on vectors, [here's a handy reference(http://www.avrbeginners.net/architecture/int/int.html). Note the second sentence which talks about the reset vector and its address at 0x0000.
Microcontrollers are often coded in assembly language or C, so that programmers can control to the byte what code is running. Those exact processes are what will run.
This might vary from chip to chip, but with the chips I'm expert in, code is not copied to RAM to execute. Again, it's the Harvard architecture at work. Small microcontrollers might have as little as zero RAM and as much as a few Kbytes, but typically the instructions are read directly from flash. Proper programming in these environments means the heap is tiny, the stack is carefully controlled, and RAM is used very sparingly.
I recommend you pick a processor line -- I'm expert at the Atmel ATtiny and ATmega controllers -- and read their datasheets to understand in detail how they work. Atmel documentation is thorough and they also publish many application notes for specific applications, often with useful code examples. There are also internet forums dedicated to discussion and learning on the Atmel AVR line.
How code executes in the controller?
If you mean, "how does the code start executing", the answer is that once the MCU has determined that the supply voltage and clocks are ok, it will automatically start executing at the boot address. But, now we're getting into the gory details. I am mostly into MMU-less controllers such as ARM Cortex-M, 8051, PIC, AVR etc., so my answer might not apply fully to your questions.
The boot address is typically the first address in the flash for most small MCUs, but in some MCUs, the flash is expected to contain a vector at a specific location, which in turns points to the first start address. Other MCUs, such as ARM, allows the electronic designer to select if the MCU shall start executing from internal flash, external flash, system boot ROM (if such exists), enter some kind of bootloader mode etc., by setting certain pins high or low.
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
See the above answer.
what all the process will be execute in the controller?
I don't understand the question. Can you please rephrase it?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct?
This depends on the design of the firmware. If you really need to, you would copy the code from Flash to RAM and execute from RAM, but if the internal flash is large enough and you don't need to squeeze every clock of the MCU, you would simply execute from flash. It's so much easier. And safer, too, since it's harder for a bug to accidentally overwrite the code-space.
But, in case you need a lot of code, your MCU might not have enough flash to fit everything. In that case, you would need to store the code in an external flash. Depending on how price-sensitive you are, you will possibly choose an SPI-flash. Since it is impossible to execute from those flash:es, you must copy the code to RAM and execute from RAM.
if so when flash code move to RAM?
This would normally be implemented in a boot-loader, or very early in the main() function. If your RAM is smaller than the flash, you will need to implement some kind of page-swap algorithm, dynamically copying code from flash as you need it. This is basically similar to how any Linux-based MCU works, but you might need to carefully design the memory layout.
If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
Yes. You will certainly need to adjust the memory map, using compile-time switches to the linker and compiler.

enter low power mode within u-boot, wake up on interrupt

I try to implement a low power "deep sleep" functionality into uboot on button press. Button press is handled by linux and a magic code is set to make u-boot aware of the stay asleep do not reboot"
printf ("\nDisable interrupts to restore them later\n");
rupts = disable_interrupts();
printf ("\nEnable interrupts to enable magic wakeup later\n");
enable_interrupts();
printf ("\nSuspending. Press button to restart\n");
while(probe_button()/*gpio probe*/){
#if 1
//FIXME recheck if that one actually needs an unmasked interrupt or any is ok
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
#else
udelay (10000);
#endif
}
if (rupts) {
printf ("\nRe-Enabling interrupts\n");
enable_interrupts();
}
Unfortunatly the power dissipation does not change at all (got power dissipation measurment tied to the chip), no matter if hotspinning is used or not. Beyond that, if I use the Wait-For-Interrupt CP15 instruction, it never wakes up. The button is attached to one of the GPIOs. The plattform is Marvell Kirkwood ARM9EJ-S based.
I enabled some CONFIG_IRQ_* manually, and create implementation for arch_init_irq() aswell as do_irq(), I think there is my issue.
According to the CP15 instruction docs it should be just enough that a interrupt gets triggered (no matter if masked or not!).
Can anyone tell me what I am doing wrong or what needs to be done beyond the code above?
Thanks a lot in advance!
I'm not sure if it is the only reason your aproach isn't working on power saving but your inline assembly isn't correct. According to this article you need to execute:
MOV R0, #0
MCR p15, 0, r0, c7, c0, 4
but your inline assembly
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
produces
0: ee073f90 mcr 15, 0, r3, cr7, cr0, {4}
4: e1a03003 mov r3, r3
8: e12fff1e bx lr
I am not sure what's your intent but mov r3, r3 doesn'ẗ have any effect. So you are making coprocessor call with a random value. You also need to set r3 (ARM source register for mcr) before mcr call. Btw when you put 'memory' in clobber list it means
... will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.
Try this line,
asm("MOV R0, #0\n MCR p15, 0, r0, c7, c0, 4" : : : "r0");
it produces
c: e3a00000 mov r0, #0 ; 0x0
10: ee070f90 mcr 15, 0, r0, cr7, cr0, {4}
For power saving in general, I would recommend this article at ARM's web site.
Bonus section:
A small answer to your claim on backward compability of this coprocessor supplied WFI:
ARMv7 processors (including Cortex-A8, Cortex-A9, Cortex-R4 and Cortex-M3) all implement the WFI instruction to enter "wait for interrupt" mode. On these processors, the coprocessor write used on earlier processors will always execute as a NOP. It is therefore possible to write code that will work across ARMv6K, ARMv6T2 and all profiles of ARMv7 by executing both the MCR and WFI instruction, though on ARM11MPCore this will cause "wait for interrupt" mode to be entered twice. To write fully portable code that enters "wait for interrupt" mode, the CPUID register must be read at runtime to determine whether "wait for interrupt" is available and the instruction needed to enter it.

Add two 32-bit integers in Assembler for use in VB6

I would like to come up with the byte code in assembler (assembly?) for Windows machines to add two 32-bit longs and throw away the carry bit. I realize the "Windows machines" part is a little vague, but I'm assuming that the bytes for ADD are pretty much the same in all modern Intel instruction sets.
I'm just trying to abuse VB a little and make some things faster. So as an example of running direct assembly in VB, the hex string "8A4C240833C0F6C1E075068B442404D3E0C20800" is the assembly code for SHL that can be "injected" into a VB6 program for a fast SHL operation expecting two Long parameters (we're ignoring here that 32-bit longs in VB6 are signed, just pretend they are unsigned).
Along those same lines, what is the hex string of bytes representing assembler instructions that will do the same thing to return the sum of two 32-bit unsigned integers?
The hex code above for SHL is, according to the author:
mov eax, [esp+4]
mov cl, [esp+8]
shl eax, cl
ret 8
I spit those bytes into a file and tried unassembling them in a windows command prompt using the old debug utility, but I figured out it's not working with the newer instruction set because it didn't like EAX when I tried assembling something but it was happy with AX.
I know from comments in the source code that SHL EAX, CL is D3E0, but I don't have any reference to know what the bytes are for instruction ADD EAX, CL or I'd try it. (Though I know now that the operands have to be the same size.)
I tried flat assembler and am not getting anything I can figure out how to use. I used it to assemble the original SHL code and got a very different result, not the same bytes. Help?
I disassembled the bytes you provided and got the following code:
(__TEXT,__text) section
f:
00000000 movb 0x08(%esp),%cl
00000004 xorl %eax,%eax
00000006 testb $0xe0,%cl
00000009 jne 0x00000011
0000000b movl 0x04(%esp),%eax
0000000f shll %cl,%eax
00000011 retl $0x0008
Which is definitely more complicated than the source code the author provided. It checks that the second operand isn't too large, for example, which isn't in the code you showed at all (see Edit 2, below, for a more complete analysis). Here's a simple stdcall function that adds two arguments together and returns the result:
mov 4(%esp), %eax
add 8(%esp), %eax
ret $8
Assembling that gives me this output:
(__TEXT,__text) section
00000000 8b 44 24 04 03 44 24 08 c2 08 00
I hope those bytes do what you want them to!
Edit: Perhaps more usefully, I just did the same in C:
__attribute__((__stdcall__))
int f(int a, int b)
{
return a + b;
}
Compiled with -Oz and -fomit-frame-pointer it generates exactly the same code (well, functionally equivalent, anyway):
$ gcc -arch i386 -fomit-frame-pointer -Oz -c -o example.o example.c
$ otool -tv example.o
example.o:
(__TEXT,__text) section
_f:
00000000 movl 0x08(%esp),%eax
00000004 addl 0x04(%esp),%eax
00000008 retl $0x0008
The machine code output:
$ otool -t example.o
example.o:
(__TEXT,__text) section
00000000 8b 44 24 08 03 44 24 04 c2 08 00
Sure beats hand-writing assembly code!
Edit 2:
#ErikE asked in the comments below what would happen if a shift of 32 bits or greater was attempted. The disassembled code at the top of this answer (for the bytes provided in the original question) can be represented by the following higher-level code:
unsigned int shift_left(unsigned int a, unsigned char b)
{
if (b > 32)
return 0;
else
return a << b;
}
From this logic it's pretty easy to see that if you pass a value greater than 32 as the second parameter to the shift function, you'll just get 0 back.