DWC OTG Controller - 0x1d Offset

DWC OTG Controller - 0x1d Offset - raspberry-pi2

If the DWC OTG Controller has base address of 0x3f980000 on the Raspberry Pi2 running Raspbian then it appears that merely loading from said address with offset 0x1d, via virtual address of 0x76ff8000 created by mmap, causes it to lock up/freeze and a power cycle is required to restore.
...
17 .baseaddr: .word 0x3f980000 // base address
...
23 // set up file and virtual map to it............ */
24 bl open_file
25 str r0, [sp, #0] // store file handler on stack
26 bl map_file
27 str r0, [sp, #8] // store virt GPIO mem address on stack
28
32 ldr r3, [sp, #8] // virt base address
34 ldr r2, [r3, #0x1d] // get contents
Once the load at line 34 takes place, the device is still detectable on LAN but wlan0 and eth0 are unresponsive and port scanning returns null.
Does anyone has any idea of what monster is crouched behind door 0x1d that's scaring the bits out of the DWC OTG controller?

Given that 0x1d creates and odd address, I would guess the system takes an address exception at that point (load 32 from odd address). You might try offset ox1c or 0x20 for better results.

Related

How do I start up a minimal project with STM32CubeMX? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm attempting to learn embedded development, and I'm currently playing around with a STM32F407G board.
So far, I've been able to toggle the LEDs based on the User button press using the high level driver APIs provided by the CubeMX.
However, I now want to recreate the same functionality without any API help. Instead, using the base addresses and registers provided in the reference manual, I want to basically recreate the APIs.
Thus far, I've disabled all the peripherals using the GUI:
But I feel like there's a better way to do this. I'm not entirely sure what peripherals I definitely need to even debug the code on the board.
Essentially, I want enough start-up code so that I'm able to load (flash?) the code into the microcontroller, and debug main(). Everything else (such as toggling the LEDs, detecting the User button interrupt, etc) will be something I want to take care of.

You do not need to recreate the APIs. Jest program using the registers. I do it all the time in almost all of my projects (unless using some kind of HAL is my client requirement)
You need to have:
startup code with the vector table.
CMSIS headers for the convenience.
How to archive it using the CubeMx. It is actually quite easy.
Create the project
Import to your favorite IDE.
In the project options delete the USE_HAL_DRIVER definition
Exclude from the build (or delete) all the files from the /Driver/STMxxxxx_HAL_Driver
Delete everything from the main.c file
Add:
#include "stm32f4xx.h" // CMSIS headers
int main(void)
{
}
and enjoy :)

flash.s
.cpu cortex-m4
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build (dont need the none-eabi, with this code you can use the arm-linux-gnueabi or whichever arm-whatever-gcc/as/ld you want (within reason))
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m4 -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
And from notes I took a while back you can use dfu-util to write your binary
dfu-util -d 0483:df11 -c 1 -i 0 -a 0 -s 0x08000000 -D myprogram.bin
The above does nothing much, but is a framework for you to add the enable of the gpio block, make the led pin an output, then turn it off and on with some code to kill time in a loop. and/or poll one gpio pin and drive another to match (use the button to light or turn off an led).
openocd connects up fine to this board/family with stlink...
you wont need interrupts or clock configs at first (if ever).
Ahh, right...After building no matter what path you take examine the binary and make sure it has a chance of working.
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000041 stmdaeq r0, {r0, r6}
8000008: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800000c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000010: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000014: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000018: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800001c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000020: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000024: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000028: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800002c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000030: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000034: 08000047 stmdaeq r0, {r0, r1, r2, r6}
8000038: 08000047 stmdaeq r0, {r0, r1, r2, r6}
800003c: 08000047 stmdaeq r0, {r0, r1, r2, r6}
08000040 <reset>:
8000040: f000 f808 bl 8000054 <notmain>
8000044: e7ff b.n 8000046 <hang>
08000046 <hang>:
8000046: e7fe b.n 8000046 <hang>
vector table up front with the stack pointer and thats fine, dont have to use it, ignore the stmdaeq disassembly it is just trying to disassemble the vector entries because I used objdump to examine the binary. the vector table needs odd numbers the address of the entry ORRED with one. Technically if small enough you can use 0x00000000 on these parts as that is really where it will be mapped when it boots this code, but because it is also mapped at 0x08000000, full amount of flash for these ST parts you will typically see it done like this. if you switch to another cortex-m based part from another family (NXP, Atmel/Microchip, etc) then you may need to use another address be it 0x00000000 or some other that that part family uses.
if you dont see the beginning of the binary looking like this with the stack pointer init value and the vector table, then you are not likely to have much luck with booting...no matter what library/software path you take.
Note the correct answer was given in a comment to this question. It is primarily opinion based. there are multiple library solutions and over time vendors will keep changing them for various (generally non-technical) reaasons. Professionally you should be able to go down either path, truly bare metal or some sandbox or somewhere in the middle. If you dont write all of the code and use someone elses you are still responsible for that project, so you should spend the time to dig into it and check the quality and accuracy of that code. You should be surprised by what you find, you own it you should fix it if you dont like it and/or replace it.
Neither path is automatically simpler nor faster nor more reliable. There is no right answer to this so you need to be flexible.
Both documents and library code are buggy, expect this, expect to deal with this.

How embedded code executes after reset

I'm new to controller coding. Please anyone help me to understand the below points.
How code executes in the controller?
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
what all the process will be execute in the controller?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct? if so when flash code move to RAM?
5.If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
I'm really confused how it works.

You say controller do you mean microcontroller?
Microcontrollers are designed to be systems on a chip, this includes the non-volatile storage where the program lives. Namely flash or some other form of rom. Just like on your x86 desktop/laptop/server there is some rom/flash in the address space of the processor at the address that the processor uses to boot. You have not specified a microcontroller so it depends on which microcontroller you are talking about as to the specific address and those details, but that doesnt matter in general they all tend to be designed to work the same way.
So there is some flash to use as a general term mapped into the address space of the processor, your reset/interrupt vector tables or start address or whatever the architecture requires PLUS your program/application are in flash in the address space. Likewise some amount of ram is there, generally you do NOT run your programs from ram like you would with your laptop/desktop/server, the rams tend to be relatively small and the flash is there for your program to live. There are exceptions, for example performance, sometimes the flash operates with wait states, and often the sram can run at the cpu rate so you might want to copy some execution time sensitive routines to ram to be run. Generally not though.
There are exceptions of course, these would include situations where the logic ideally but sometimes there is a semi-secret rom with a bootloader in the chip, but your program is loaded from outside the chip into ram then run. Sometimes you may wish to design your application that way for some reason, and having bootloaders is not uncommon, a number of microcontrollers have a chip vendor supplied bootloader in a separate flash space that you may or may not be able to replace, these allow you to do development or in circuit programming of the flash.
A microcontroller contains a processor just like your desktop/laptop/server or phone or anything else like that. It is a system on a chip rather than spread across a board, so you have the processor itself, you have some non-volatile storage as mentioned above and you have ram and the peripherals all on the same chip. So just like any other processor there are logic/design defined rules for how it boots and runs (uses a vector table of addresses or uses well known entry point addresses) but beyond that it is just machine code instructions that are executed. Nothing special. What all processes are run are the ones you write and tell it to run, it runs the software you write which at the end of the day is just machine code. Processes, functions, threads, tasks, procedures, etc these are all human terms to try to manage software development, you pick the language (although the vast majority are programmed in C with a little assembly) and the software design so long as it fits within the constraints of the system.
EDIT
So lets say I had an arm microcontroller with flash starting at address 0x00000000 and ram starting at address 0x20000000. Assume an older arm like the ARM7TDMI which was used in microcontrollers (some of which can still be purchased). So the way that processor boots is there are known addresses that execution starts for reset and for interrupts and undefined exceptions and things like that. The reset address is 0x00000000 so after reset the processor starts execution at address 0x00000000 it reads that instruction first and runs it. The next exception handler starts execution at address 0x00000004 and so on for several possible exceptions, so as you will see we have to branch out of this exception table. as the first thing we do.
here is an example program that would run but doesnt do anything interesting, just demonstrates a few things.
vectors.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
mov sp,#0x20000000
orr sp,sp,0x8000
bl one
hang: b hang
one.c
unsigned int hello;
unsigned int world;
extern unsigned int two ( unsigned int );
unsigned int one ( void )
{
hello=5;
world=6;
world+=two(hello);
return(hello+world);
}
two.c
extern unsigned int hello;
extern unsigned int world;
unsigned int two ( unsigned int temp )
{
hello++;
world+=2;
return(hello+world+temp);
}
memmap (the linker script)
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x10000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
and then I build it
arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c one.c -o one.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c two.c -o two.o
arm-none-eabi-ld vectors.o one.o two.o -T memmap -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
before we look at the linked output we can look at the individual parts
arm-none-eabi-objdump -D vectors.o
vectors.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: ebfffffe bl 0 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
That is what is in the object file, an object file is not just machine code or data, it also includes various other things, how much data there is how much program there is, it might as in this case contain label names to make debugging easier, the label "hang" and "reset" and others are not in the machine code, these are for the human to make programming easier the machine code has no notion of labels. But the object file depending on the format (there are many, elf, coff, etc) and depending on the tool and default and command line options determine how much stuff goes in this file.
Notice since we have not "linked" the program the branch to the function one() is actually incomplete as you will see in the final linked binary. The one label (function name) is not defined in this code so it cannot yet be resolved, the linker has to do it.
same story with the one function
arm-none-eabi-objdump -D one.o
one.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <one>:
0: e3a03005 mov r3, #5
4: e3a02006 mov r2, #6
8: e92d4070 push {r4, r5, r6, lr}
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
14: e1a00003 mov r0, r3
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
20: ebfffffe bl 0 <two>
24: e5943000 ldr r3, [r4]
28: e5952000 ldr r2, [r5]
2c: e0800003 add r0, r0, r3
30: e5840000 str r0, [r4]
34: e0800002 add r0, r0, r2
38: e8bd4070 pop {r4, r5, r6, lr}
3c: e12fff1e bx lr
...
that is the machine code and a disassembly that makes up the one function, the function two is not resolved in this code so it also has a placeholder as well as the global variables hello and world.
these two are getting the address of hello and world from locations
that have to be filled in by the linker
c: e59f402c ldr r4, [pc, #44] ; 40 <one+0x40>
10: e59f502c ldr r5, [pc, #44] ; 44 <one+0x44>
and these two perform the initial write of values to hello and world as the code shows
18: e5853000 str r3, [r5]
1c: e5842000 str r2, [r4]
hello=5;
world=6;
Notice all the addresses are zero based, they have not been linked.
two is similar if you look at it yourself.
The linker script tells the linker that we want .text the program, the machine code to live at 0x00000000 and .bss to be at 0x20000000. bss is global things that are not initialized like
unsigned int this:
.data which I dont deal with here are things like
unsigned int this=5;
global things that are initialized, .bss is assumed by programmers to be zero, but I cheated here and did not zero out the .bss memory space which you will see, instead I initialized the variables in the program rather than pre-initialized them and had to do different work.
reset:
mov sp,#0x20000000
orr sp,sp,#0x8000
bl one
hang: b hang
normally a bootstrap like above would need to deal with the stack as needed (certainly in the case of baremetal microcontroller code like this) as well as zero .bss and copy .data to ram. It takes more linker and compiler magic to put the initalized variables
unsigned int like_this=7;
in flash, as we need to remember that that variable boots with the value 7 and ram is volatile, doesnt survive a power outage. so to support .data you have to tell the linker it wants to live in 0x2000xxxx but put it in flash somewhere and I will copy it over. I didnt demonstrate that here.
from the so.list output of commands above, fully linked program.
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e3a0d202 mov sp, #536870912 ; 0x20000000
24: e38dd902 orr sp, sp, #32768 ; 0x8000
28: eb000000 bl 30 <one>
0000002c <hang>:
2c: eafffffe b 2c <hang>
00000030 <one>:
30: e3a03005 mov r3, #5
34: e3a02006 mov r2, #6
38: e92d4070 push {r4, r5, r6, lr}
3c: e59f402c ldr r4, [pc, #44] ; 70 <one+0x40>
40: e59f502c ldr r5, [pc, #44] ; 74 <one+0x44>
44: e1a00003 mov r0, r3
48: e5853000 str r3, [r5]
4c: e5842000 str r2, [r4]
50: eb000008 bl 78 <two>
54: e5943000 ldr r3, [r4]
58: e5952000 ldr r2, [r5]
5c: e0800003 add r0, r0, r3
60: e5840000 str r0, [r4]
64: e0800002 add r0, r0, r2
68: e8bd4070 pop {r4, r5, r6, lr}
6c: e12fff1e bx lr
70: 20000004 andcs r0, r0, r4
74: 20000000 andcs r0, r0, r0
00000078 <two>:
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
88: e2822001 add r2, r2, #1
8c: e2833002 add r3, r3, #2
90: e52de004 push {lr} ; (str lr, [sp, #-4]!)
94: e082e003 add lr, r2, r3
98: e08e0000 add r0, lr, r0
9c: e58c2000 str r2, [r12]
a0: e5813000 str r3, [r1]
a4: e49de004 pop {lr} ; (ldr lr, [sp], #4)
a8: e12fff1e bx lr
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
Disassembly of section .bss:
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
at address 0x00000000 the address that the first instruction executes after reset for this architecture is a branch to address 0x20 and then we do more stuff and call the one() function. main() is to some extent arbitrary and in this case I can make whatever function names I want I dont need main() specifically so didnt feel like using it after reset the bootstrap calls one() and one() calls two() and then both return back.
We can see that not only did the linker put all of my program in the 0x00000000 address space, it patched up the addresses to branch to the nested functions.
28: eb000000 bl 30 <one>
50: eb000008 bl 78 <two>
It also defined the addresses for hello and there in ram
20000000 <hello>:
20000000: 00000000 andeq r0, r0, r0
20000004 <world>:
20000004: 00000000 andeq r0, r0, r0
in the address space we asked for and patched up the functions so they could access these global variables
78: e59fc02c ldr r12, [pc, #44] ; ac <two+0x34>
7c: e59f102c ldr r1, [pc, #44] ; b0 <two+0x38>
80: e59c2000 ldr r2, [r12]
84: e5913000 ldr r3, [r1]
ac: 20000000 andcs r0, r0, r0
b0: 20000004 andcs r0, r0, r4
I used the disassembler, the word at 0xAC for example is not an andcs instruction it is the address 0x20000000 where we have the variable hello stored. This disassembler tries to disassemble everything, instructions or data so we know that is not instructions so just ignore the disassembly.
Now this elf file format is not the exact bytes you put in the flash when programming, some tools you use to program a flash might accept this file format and then extract from it the actual bytes that go in the flash, ignoring the rest of the file (or using it to find those bytes).
arm-none-eaby-objcopy so.elf -O binary so.bin
would create a file that represents just the data that would go in flash.
arm-none-eabi-objcopy so.elf -O binary so.bin
calvin so # hexdump so.bin
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
0000010 0005 ea00 0004 ea00 0003 ea00 0002 ea00
0000020 d202 e3a0 d902 e38d 0000 eb00 fffe eaff
0000030 3005 e3a0 2006 e3a0 4070 e92d 402c e59f
0000040 502c e59f 0003 e1a0 3000 e585 2000 e584
0000050 0008 eb00 3000 e594 2000 e595 0003 e080
0000060 0000 e584 0002 e080 4070 e8bd ff1e e12f
0000070 0004 2000 0000 2000 c02c e59f 102c e59f
0000080 2000 e59c 3000 e591 2001 e282 3002 e283
0000090 e004 e52d e003 e082 0000 e08e 2000 e58c
00000a0 3000 e581 e004 e49d ff1e e12f 0000 2000
00000b0 0004 2000
00000b4
this is dumping little endian halfwords (16 bit) but you can still see
that the machine code from above is in there and that is all that is
in there.
0000000 0006 ea00 0008 ea00 0007 ea00 0006 ea00
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
...
If/when you dump the flash back out you only have the machine code and maybe some .data depending on how you build your project. The microcontroller can as mentioned above execute this code directly from flash and that is the primary use case, and generally it is fast enough for the type of work microcontrollers are used for. Sometimes you can speed up the microcontroller, but the flash generally has a speed limit that might be slower and they might have to add wait states so that it doesnt push the flash too fast and cause corruption. And yes with some work you can copy some or all of your program to ram and run it there if you have enough resources (ram) and are that pushed for performance (and have exhausted other avenues like examining what the compiler is producing and if you can affect that with command line options or by adjusting or cleaning up your code).

Code executes on the microcontroller similar to any other microprocessor, though code if often organized separate from data (google "Harvard Architecture"). The program counter starts at the reset vector (see next answer) and advances every instruction, changing when branching instructions occur.
Typically your compiler will insert into your code a number of "vectors". These vectors usually include a "reset vector" that points at the place where your microcontroller expects the first instruction. It might be at memory location zero, or it might be elsewhere. From there, it operates on the code similar to any other computer. Every microprocessor and microcontroller expects code to start at a certain memory location upon reset, though it varies among different parts. For more information on vectors, [here's a handy reference(http://www.avrbeginners.net/architecture/int/int.html). Note the second sentence which talks about the reset vector and its address at 0x0000.
Microcontrollers are often coded in assembly language or C, so that programmers can control to the byte what code is running. Those exact processes are what will run.
This might vary from chip to chip, but with the chips I'm expert in, code is not copied to RAM to execute. Again, it's the Harvard architecture at work. Small microcontrollers might have as little as zero RAM and as much as a few Kbytes, but typically the instructions are read directly from flash. Proper programming in these environments means the heap is tiny, the stack is carefully controlled, and RAM is used very sparingly.
I recommend you pick a processor line -- I'm expert at the Atmel ATtiny and ATmega controllers -- and read their datasheets to understand in detail how they work. Atmel documentation is thorough and they also publish many application notes for specific applications, often with useful code examples. There are also internet forums dedicated to discussion and learning on the Atmel AVR line.

How code executes in the controller?
If you mean, "how does the code start executing", the answer is that once the MCU has determined that the supply voltage and clocks are ok, it will automatically start executing at the boot address. But, now we're getting into the gory details. I am mostly into MMU-less controllers such as ARM Cortex-M, 8051, PIC, AVR etc., so my answer might not apply fully to your questions.
The boot address is typically the first address in the flash for most small MCUs, but in some MCUs, the flash is expected to contain a vector at a specific location, which in turns points to the first start address. Other MCUs, such as ARM, allows the electronic designer to select if the MCU shall start executing from internal flash, external flash, system boot ROM (if such exists), enter some kind of bootloader mode etc., by setting certain pins high or low.
If we dump the code to the controller it will save it in the Flash memory. after reset how the code will fetch from the memory?
See the above answer.
what all the process will be execute in the controller?
I don't understand the question. Can you please rephrase it?
I came to know that at the run time code will be copied to RAM memory(?) and executes from the RAM. is this statement is correct?
This depends on the design of the firmware. If you really need to, you would copy the code from Flash to RAM and execute from RAM, but if the internal flash is large enough and you don't need to squeeze every clock of the MCU, you would simply execute from flash. It's so much easier. And safer, too, since it's harder for a bug to accidentally overwrite the code-space.
But, in case you need a lot of code, your MCU might not have enough flash to fit everything. In that case, you would need to store the code in an external flash. Depending on how price-sensitive you are, you will possibly choose an SPI-flash. Since it is impossible to execute from those flash:es, you must copy the code to RAM and execute from RAM.
if so when flash code move to RAM?
This would normally be implemented in a boot-loader, or very early in the main() function. If your RAM is smaller than the flash, you will need to implement some kind of page-swap algorithm, dynamically copying code from flash as you need it. This is basically similar to how any Linux-based MCU works, but you might need to carefully design the memory layout.
If code will copy from flash to RAM, then it will use the RAM space. then that much of RAM bytes is occupied, so Stack and heap need to be used after this memory?
Yes. You will certainly need to adjust the memory map, using compile-time switches to the linker and compiler.

Setup / Errors with Floating Point on TI AM3517 Cortex-A8

I'm getting an undefined instruction exception when executing:
0xED2D8B0E VPUSH {D8-D14}
(Note: The statement was generated by the compiler as part of C language function entry protocol.)
Initialization code:
;; Initialize VFP (if needed).
;; BL __iar_init_vfp HJ REMOVED AND REPLACED WITH BELOW
MRC p15, #0, r1, c1, c0, #2 ; r1 = Access Control Register
ORR r1, r1, #(0xf << 20) ; enable full access for p10,11
MCR p15, #0, r1, c1, c0, #2 ; Access Control Register = r1
MOV r1, #0
MCR p15, #0, r1, c7, c5, #4 ; flush prefetch buffer because of FMXR below
; and CP 10 & 11 were only just enabled
; Enable VFP itself
MOV r0,#0x40000000
FMXR FPEXC, r0 ; FPEXC = r0
I get the undefined exception when the target FPU is set up as VFPv3 or VFPV3 + NEON.
The initialization code is placed in the "cstartup.c" file, at the __iar_program_start and ?cstartup code, following this code snippet:
MRC p15,0,R1,C1,C0,0
LDR R0,=CP_DIS_MASK ;; 0xFFFFEFFA
AND R1,R1,R0
ORR R1,R1,#(1<<12)
MCR p15,0,R1,C1,C0,0
Registers (before VPUSH):
CPSR: 0x80000113
APSR: 0x80000000
SPSR: 0x000001D3
Tools:
IAR Embedded Workbench IDE & Compiler - 7.40
I-Jet debugging probe
Zoom AM3517 eval board
TI AM35X Cortex-A8 processor
Questions:
In the initialization code above, which statements are required for
NEON and which for VFP?
Are there any initialization instructions I'm missing for NEON and
VFP initialization?
Are there statements I need to place in the macro file for the debug
probe?

The code presented in the question correctly initializes the floating point processor on a Cortex-A8 processor.
The issue of getting undefined instruction exception (which led up to this question), was caused by the O.S. writing an invalid value to the FPEXC register, causing the Floating Point Processor to be disabled.

Why do I have to enable peripheral clocks one at a time?

In a minimal STM32 application I've written that writes characters to USART1, the USART doesn't seem to work when I try to enable all the clocks I need at once:
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA
| RCC_APB2Periph_AFIO
| RCC_APB2Periph_USART1, ENABLE);
But when I enable the clocks one at a time, it works:
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE);
RCC_APB2PeriphClockCmd(RCC_APB2Periph_AFIO, ENABLE);
RCC_APB2PeriphClockCmd(RCC_APB2Periph_USART1, ENABLE);
Why is this? Is there a specific order these clocks have to be enabled in? (If so, where is this documented?)
(I've left out all the code following this that initializes the GPIO pins, sets up the USART, and starts sending content, as it's the same in each application. If it's relevant, let me know and I'll include it.)
The device I'm using is the STM32F103VET6.
Since there's some interest in the assembly involved, here it is. For all three clocks at once:
00000000 <main>:
0: b590 push {r4, r7, lr}
2: b089 sub sp, #36 ; 0x24
4: af00 add r7, sp, #0
6: f244 0014 movw r0, #16389 ; 0x4005
a: 2101 movs r1, #1
c: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
And for one clock at a time:
00000000 <main>:
0: b590 push {r4, r7, lr}
2: b089 sub sp, #36 ; 0x24
4: af00 add r7, sp, #0
6: 2004 movs r0, #4
8: 2101 movs r1, #1
a: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
e: 2001 movs r0, #1
10: 2101 movs r1, #1
12: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
16: f44f 4080 mov.w r0, #16384 ; 0x4000
1a: 2101 movs r1, #1
1c: f7ff fffe bl 0 <RCC_APB2PeriphClockCmd>
...
And here's RCC_APB2PeriphClockCmd:
00000000 <RCC_APB2PeriphClockCmd>:
0: 4b04 ldr r3, [pc, #16] ; (14 <RCC_APB2PeriphClockCmd+0x14>)
2: 699a ldr r2, [r3, #24]
4: b109 cbz r1, a <RCC_APB2PeriphClockCmd+0xa>
6: 4310 orrs r0, r2
8: e001 b.n e <RCC_APB2PeriphClockCmd+0xe>
a: ea22 0000 bic.w r0, r2, r0
e: 6198 str r0, [r3, #24]
10: 4770 bx lr
12: bf00 nop
14: 40021000 .word 0x40021000
0x40021000 is the base address of the RCC peripheral; the #24 offset points to the RCC_APB2ENR register, which has a bit for each clock that's being enabled. (See page 109 of RM0008 for details.)

Well, I think I figured it out, and it turned out to not be a hardware problem at all... there were a number of problems with my toolchain configuration:
I was setting -nostdlib. This was causing some global initialization code to not be generated. I'm not sure how important that was, but other issues included:
I was not passing -mthumb and other CPU options to the linker. This was causing some of the generated startup code to be garbage.
My startup file didn't contain a call to __libc_init_array. This was causing some more initialization code to be dropped at link time.
I'm still not sure why splitting up the peripheral clock initializations managed to work around this. Perhaps the change in the amount of code was bumping something to just the right alignment? Anyways, solving the underlying issues seems to have patched things up so far (although I'm still kind of suspicious of some of the remaining startup code).

You might want to let us know exactly which device you're using and/or look at the errata for that device. For example, the errata for the STM32L100x6/8/B-A (and other) devices has the following (http://www.st.com/web/en/resource/technical/document/errata_sheet/DM00097022.pdf):
2.6.1 Delay after an RCC peripheral clock enabling
Description
A delay between an RCC peripheral clock enable and the effective
peripheral enabling should be taken into account in order to manage
the peripheral read/write to registers.
This delay depends on the peripheral's mapping:
If the peripheral is mapped on AHB: the delay should be equal to 2 AHB cycles.
If the peripheral is mapped on APB: the delay should be equal to 1 + (AHB/APB prescaler) cycles.
Workarounds
Use the DSB instruction to stall the Cortex-M CPU pipeline until the instruction is completed.
Insert "n" NOPs between the RCC enable bit write and the peripheral register writes (n = 2 for AHB peripherals, n = 1 + AHB/APB
prescaler in case of APB peripherals).
This doesn't really sound like your problem but it might be related (maybe the one-at-a-time enabling introduces a delay that turns out to be necessary).

enter low power mode within u-boot, wake up on interrupt

I try to implement a low power "deep sleep" functionality into uboot on button press. Button press is handled by linux and a magic code is set to make u-boot aware of the stay asleep do not reboot"
printf ("\nDisable interrupts to restore them later\n");
rupts = disable_interrupts();
printf ("\nEnable interrupts to enable magic wakeup later\n");
enable_interrupts();
printf ("\nSuspending. Press button to restart\n");
while(probe_button()/*gpio probe*/){
#if 1
//FIXME recheck if that one actually needs an unmasked interrupt or any is ok
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
#else
udelay (10000);
#endif
}
if (rupts) {
printf ("\nRe-Enabling interrupts\n");
enable_interrupts();
}
Unfortunatly the power dissipation does not change at all (got power dissipation measurment tied to the chip), no matter if hotspinning is used or not. Beyond that, if I use the Wait-For-Interrupt CP15 instruction, it never wakes up. The button is attached to one of the GPIOs. The plattform is Marvell Kirkwood ARM9EJ-S based.
I enabled some CONFIG_IRQ_* manually, and create implementation for arch_init_irq() aswell as do_irq(), I think there is my issue.
According to the CP15 instruction docs it should be just enough that a interrupt gets triggered (no matter if masked or not!).
Can anyone tell me what I am doing wrong or what needs to be done beyond the code above?
Thanks a lot in advance!

I'm not sure if it is the only reason your aproach isn't working on power saving but your inline assembly isn't correct. According to this article you need to execute:
MOV R0, #0
MCR p15, 0, r0, c7, c0, 4
but your inline assembly
__asm__ __volatile__(
"mcr p15, 0, %0, c7, c0, 4\n" /* read cp15 */
"mov %0, %0"
: "=r" (tmp)
:
: "memory"
);
produces
0: ee073f90 mcr 15, 0, r3, cr7, cr0, {4}
4: e1a03003 mov r3, r3
8: e12fff1e bx lr
I am not sure what's your intent but mov r3, r3 doesn'ẗ have any effect. So you are making coprocessor call with a random value. You also need to set r3 (ARM source register for mcr) before mcr call. Btw when you put 'memory' in clobber list it means
... will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.
Try this line,
asm("MOV R0, #0\n MCR p15, 0, r0, c7, c0, 4" : : : "r0");
it produces
c: e3a00000 mov r0, #0 ; 0x0
10: ee070f90 mcr 15, 0, r0, cr7, cr0, {4}
For power saving in general, I would recommend this article at ARM's web site.
Bonus section:
A small answer to your claim on backward compability of this coprocessor supplied WFI:
ARMv7 processors (including Cortex-A8, Cortex-A9, Cortex-R4 and Cortex-M3) all implement the WFI instruction to enter "wait for interrupt" mode. On these processors, the coprocessor write used on earlier processors will always execute as a NOP. It is therefore possible to write code that will work across ARMv6K, ARMv6T2 and all profiles of ARMv7 by executing both the MCR and WFI instruction, though on ARM11MPCore this will cause "wait for interrupt" mode to be entered twice. To write fully portable code that enters "wait for interrupt" mode, the CPUID register must be read at runtime to determine whether "wait for interrupt" is available and the instruction needed to enter it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas