Should the system clock be set before copying the sections from ROM to RAM?

I have doubts about when the system clock should be set up in a microcontroller. My question is: should the system clock be set up properly before invoking the functions that copy the .data and .bss sections from ROM to RAM?
My rationale for setting up the system clock before the copying is to reduce the copy's execution time. Does that make sense? Without the clock setup, I expect the copying to be slower because of the lower clock speed. Can anyone corroborate this?
If the clock is not set up, the part runs at some default value (lower than the intended operating value). The reason for setting up the clock is to get quick startup timing before the actual application runs.
Generally, should one copy the required sections from ROM to RAM before the clock is set up properly (i.e. should the application take care of setting up the clock)? Please clarify and point to relevant resources.
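To make the question concrete, a reset handler with the ordering I am proposing might look like this (just a sketch; clock_init() and the linker symbol names are placeholders that vary by vendor and toolchain, and on many parts the flash wait states must also be programmed as part of raising the clock):

#include <stdint.h>

extern uint32_t _etext[], _sdata[], _edata[], _sbss[], _ebss[];  /* from the linker script */

extern void clock_init(void);   /* hypothetical: PLL + flash wait-state setup */
extern int  main(void);

void Reset_Handler(void)
{
    clock_init();                          /* raise the core clock first...       */

    const uint32_t *src = _etext;          /* .data initial values stored in ROM  */
    for (uint32_t *dst = _sdata; dst < _edata; )
        *dst++ = *src++;                   /* ...so this copy runs at full speed  */

    for (uint32_t *dst = _sbss; dst < _ebss; )
        *dst++ = 0;                        /* zero .bss                           */

    main();                                /* enter the application               */
}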
Thank You.

Related

Can't erase the first 256KB of NOR flash (M29W256GL) using BDI2000

I'm porting u-boot-2013.10 to an MPC8306-based board. Previously, I could erase the first several sectors of the NOR flash using the BDI2000. But after some time, when the porting task was nearly done (I mean that I could use gdb to trace the code execution and see that the u-boot code reaches the command-line main loop, though there was no serial output at the time due to a misconfigured serial port), the first 256KB of the NOR flash can't be erased, even after a power-off reset. Other sectors can be erased normally.
The NOR flash is a Micron M29W256GL, with a block size of 128KB. I'm sure the WP# pin has been pulled high, so there is no hardware protection on the first block.
When I configure the jumper on the board to change the PowerPC Config Word so that the MPC8306 does not fetch boot code at power-up, the problem remains.
I used to run u-boot-1.1.6 on this board and erased that version of u-boot many times without hitting the problem described above. I guess u-boot-2013.10 takes some new approach to flash manipulation, for example non-volatile protection on the first 256KB of the flash.
Can someone help me solve this problem? I would very much appreciate your help.

u-boot : Relocation

This is a basic question about u-boot.
Why does the u-boot code relocate itself?
OK, it makes sense if u-boot is executing from NOR flash or boot ROM space, but if it is already running from SDRAM, why does it have to relocate itself once again?
This question comes up frequently. Good answers sometimes too.
I agree it is handy to load the build into SDRAM during development. That works for me; I do it all the time. I have some special boot code in flash which does not enable the MMU/cache. For my u-boot builds I switch CONFIG_SYS_TEXT_BASE between flash and RAM builds, and I run my development builds that way routinely.
As a practical matter, handling re-initialization of the MMU/cache would be nontrivial, and U-Boot benefits, IMO, from the simplicity that comes from leaving such things out.
The tech lead at Denx has expressed his opinion. IIRC his other posts are more strongly worded than that one. I get the impression that he does not like to repeat himself.
Update: why relocate? Memory access is faster from RAM than from ROM, which matters particularly if the target has no instruction cache. Executing from RAM allows flash reprogramming; it also (more minor) allows software breakpoints with "trap" instructions; and it is closer to the target's normal mode of operation, so if, e.g., burst reads from RAM are iffy, the failure will be seen at early boot.
U-boot has to reserve three regions in memory that store: 1) u-boot itself, 2) the uImage (compressed kernel), and 3) the uncompressed kernel. These three regions must be placed carefully to prevent conflicts.
However, the previous-stage boot loader (BL1 or BL2) that brings u-boot into DRAM doesn't know u-boot's plan for these three regions. So it can only load u-boot to a lower address in DRAM and jump to it. Then, after u-boot performs some basic initialization and detects that the current PC is not at the planned location, it calls a relocation function that moves u-boot to the planned location and jumps to it.
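For illustration, the detect-and-relocate step might look roughly like this sketch (this is not u-boot's actual code; PLANNED_BASE, the linker symbols, and the position-independence assumption are all mine, and real relocation also fixes up pointers and flushes caches):

#include <stdint.h>
#include <string.h>

#define PLANNED_BASE 0x9FF00000u   /* assumed near-top-of-DRAM address */

extern char _start[], _end[];      /* linker symbols for the running image */

void relocate(void (*continue_at)(void))
{
    uintptr_t here = (uintptr_t)_start;
    if (here == PLANNED_BASE) {    /* already at the planned location */
        continue_at();
        return;
    }

    memcpy((void *)PLANNED_BASE, _start, (size_t)(_end - _start));

    /* Re-enter the continuation at its relocated address; this assumes
     * position-independent code so the copy runs correctly there. */
    uintptr_t off = PLANNED_BASE - here;
    ((void (*)(void))((uintptr_t)continue_at + off))();
}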
The code in NOR flash must first initialize the SDRAM and then copy the code from NOR flash to SDRAM. The process copies itself because, once you enable the MMU, virtual address mapping starts.

Bootloader Working

I am working on the U-Boot bootloader. I have some basic questions about the functionality of a bootloader and the application it is going to handle:
Q1: As far as I know, a bootloader is used to download the application into memory. On the internet I also found that the bootloader copies the application to RAM and then the application runs from RAM. I am confused about how the bootloader works: when the application is provided to the bootloader through serial or TFTP, what happens next? Does the bootloader copy it to RAM first, or does it write directly to flash?
Q2: Why is there a need for the bootloader to copy the application to RAM and then run the application from RAM? What difficulties will we face if our application runs from flash?
Q3: What is the meaning of the statement "My application is running from RAM/FLASH"? Does it mean that our application's .text (or .code) segment is in RAM/flash? And are we not concerned about the .bss section because it is designed to be in RAM?
Thanks
Phogat
When any hardware system is designed, the designer must consider where the executable code will be located. The answer depends on the microcontroller, the included memory types, and the system requirements. So the answer varies from system to system. Some systems execute code located in RAM. Other systems execute code located in flash. You didn't tell us enough about your system to know what it is designed to do.
A system might be designed to execute code from RAM because RAM access times are faster than flash so code can execute faster. A system might be designed to execute code from flash because flash is plentiful and RAM may not be. A system might be designed to execute code from flash so that it boots more quickly. These are just some examples and there are other considerations as well.
RAM is volatile so it does not retain code through a power cycle. If the system executes code located in RAM then a bootloader is required to obtain and write the code to RAM at powerup. Flash is non-volatile so execution can start right away at powerup and a bootloader is not necessary (but may still be useful).
Regarding Q3, the answer is yes. If the system is running from RAM then .text will be located in RAM (but not until after the bootloader has copied it there). If the system is running from flash then the .text section will be located in flash. The .bss section is variables and will be in RAM regardless of where the .text section is.
Yes, in general a bootloader boots the system, but it might also provide a mechanism for interrupting the default boot path and allow alternate firmware to be downloaded and run instead, as well as other features (like flashing).
Traditional ROM had a traditional RAM-like interface: address, data, chip select, read/write, etc. You can still buy ROM that way, but it is cheaper, from a pin real-estate perspective, to use something SPI- or I2C-based, which is slower: not desirable to run from, but tolerable to read once and then run from RAM. Newer flash technologies can have (and have had) problems with read-disturb: if your code is in a tight loop reading the same instructions, or for any other reason the flash is being read too fast, the charge can drop such that a read returns the wrong data, potentially causing the program to change course or crash. Also, your PC and other Linux platforms are used to copying the kernel from non-volatile storage (hard disk) to RAM and then running from there, so copy-from-flash-to-RAM-and-run-from-RAM has a comfort level, and is often faster than flash. So there are many potential reasons not to use flash, but depending on the system it may be possible to run from flash just fine (on some systems the flash in question is not directly accessible and not executable; of course SOME ROM in such a system needs to be executable/bootable).
It simplifies the coding challenges if you program the flash with something that is already in RAM. You can create and debug, one time, the code that reads from RAM and writes to flash, and that reads from flash and writes to RAM. Done. Now you can work on separate code that receives data from serial into RAM, or sends it from RAM to serial. Done. Then work on code that does the same over Ethernet or USB or whatever. Done. You don't have to invent a protocol or solve the timing problem: flash writing is very slow, and even xmodem at a moderate speed can be way too fast, so you have to buffer the data in RAM anyway. You might as well make the tasks completely separate: instead of an xmodem (or any other serial-based) flash loader with a big RAM-based FIFO, just move the data to RAM, then separately go from RAM to flash. The same goes for other interfaces. It is technically possible to buffer the data and give the illusion of going from the download interface straight to flash, and depending on the protocol it is technically possible to hold off the sender so that as little as one flash page of RAM is required before programming the flash.
With the older parallel flashes you could do something pretty cool which I don't think most people figured out. When you stop writing to a flash page for some known period of time, the flash automatically starts to program that page, and you then have to wait 10 ms or so before it is done. Folks assumed you had to program sequential addresses and had to get the new data for the next address within that period of time, which would demand high serial-port speeds, etc. In reality you can pound the same address over and over again with the same data and the flash won't start to program the page, so the download interface can be infinitely slow. Serial flashes work differently and either don't need tricks or have different tricks.
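A minimal sketch of that separation, with hypothetical board-specific helpers (flash_program_page() is not a real API, just a placeholder for the debugged-once low-level routine):

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 256u

/* Hypothetical low-level helper supplied by the board code. */
extern void flash_program_page(uint32_t addr, const uint8_t *src, size_t len);

/* Step 1 (done elsewhere): serial/TFTP/USB code fills ram_buf at its own pace. */
/* Step 2: walk the RAM buffer and program the flash page by page.             */
void flash_write_image(uint32_t dest, const uint8_t *ram_buf, size_t len)
{
    for (size_t off = 0; off < len; off += PAGE_SIZE) {
        size_t chunk = (len - off < PAGE_SIZE) ? len - off : PAGE_SIZE;
        flash_program_page(dest + off, ram_buf + off, chunk);
    }
}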
RAM/FLASH is not some industry term. It likely means that .text is in ROM (flash) and .data and .bss are in RAM. A copy of the initial state of .data will probably be in flash as well, and copied to RAM before main() is called; likewise .bss will be zeroed before main() is called. Look at crt0.S for most platforms in the gnu sources (glibc, or is it gcc, I don't know) to get the gist of how the bootstrap works in a generic fashion.
A bootloader is not required to run Linux or other operating systems; you don't NEED u-boot, but it is quite useful. Linux is pretty easy: you copy the kernel and root file system into place, either set some registers or some tags in memory or both, then branch to the entry point in the kernel, and Linux takes over from there. Because Linux is so complicated, it is desirable to have a complicated bootloader that can capitalize on high-speed interfaces like Ethernet (rather than being limited to serial or slower).
I would add something regarding your question Q2.
Q2: Why is there a need for the bootloader to copy the application to RAM and then run the application from RAM? What difficulties will we face if our application runs from flash?
It is not only about having SPI or similar serial external code memory (which is not that common anyway).
Even external ROM/FLASH/EPROM connected to the usual high-speed parallel bus will prevent a system from running at its maximum clock (with zero wait states), even on relatively slow MCUs, due to the external memory access time. You would need a 10 ns flash access time for a 100 MHz clock, which is not easy to get (if economically possible at all). And you would agree that 100 MHz is not such a brain-spinning speed any more :-)
That is why many MCU/CPU architectures do tricks such as reading multiple instructions at once, keeping an internal cache, or whatever else is needed to compensate for slow external code memory. Mostly only the older 8-bit architectures can execute code directly from the flash memory ('in place') at full speed.
Even if your only code memory is the internal flash, something needs to be done to speed it up. Take a look, for example, at this article:
http://www.iqmagazineonline.com/magazine/pdf/v_3_2_pdf/pg14-15-18-19-9Q6Phillips-Z.pdf
It describes how the ARM7 incorporated something called the MAM (Memory Accelerator Module). It is a good read, and you will find some measures there to speed up code-memory access for that specific ARM7 architecture (most of which apply to others too):
Limit maximum clock frequency (from 80 MHz to about 20 MHz for the example in the article)
Insert wait-cycles during flash accesses
Use an instruction cache
Copy the program code from flash to RAM
Obviously, if the instruction cache is not an option (too small, or the clock too high), you are really left only with execution from RAM, after relocating the code there at startup.
There is also an option to run only a specific section of code from RAM, which can be specified to the linker, as in the sketch below. For DSP (Digital Signal Processing) systems, there was really no option to run from EPROM/flash even in the old days, when clocks were only a few tens of MHz, let alone now.
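Here is a sketch of that linker-section technique with GCC-style attributes. The section name .ramfunc is an assumption; the linker script must give it a RAM run address with a flash load address (e.g. .ramfunc : { *(.ramfunc) } > RAM AT > FLASH), and startup code must copy it over before use:

#include <stdint.h>

/* Place only this hot routine in RAM; everything else stays in flash. */
__attribute__((section(".ramfunc"), noinline))
void fir_inner_loop(int16_t *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = (int16_t)((buf[i] * 3) >> 2);   /* placeholder DSP work */
}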
Another issue is debugging: the options for debugging code placed in ROM, or even flash, are very limited (on most systems you have to move a section of the code to RAM to be able to set a breakpoint).
Regarding Q2, one of the difficulties you may face when executing from flash is a later code update. If you are executing from the same block of flash you are trying to update, the system will crash. This depends on your system architecture (how your application and bootloader are organized in flash) but may be particularly hard to avoid if you are trying to update the bootloader itself.

Inter Processor Interrupt usage

An educational principle is: there is no such thing as a stupid question. The basic idea behind this is that people learn by asking.
I was asked: "Can you show and explain, at a programming level, what bad thing will happen if every task could execute all instructions?"
I gave this code:
int main(void) {
    __asm__("cli");    /* privileged: disable interrupts */
    while (1);
}
and explained it (the system freezes for good, on a uniprocessor).
Then I was asked: "Is it possible to give an example where the system does not freeze, even though this clearing of interrupts is done?"
I modified the previous example:
int i;
int main(void) {
    __asm__("cli");
    i = i / 0;         /* faults even with interrupts disabled */
    while (1);
}
and explained it.
Trivially: if we have demand paging, i = i / 0 first causes a page fault (the data page is not present), another task can be scheduled to run with interrupts enabled during the disk read, and later on the divide by zero will throw this task away for good.
But those answers were based on UP. What about SMP? I must admit the answers are incomplete.
It is still easy enough to construct:
#include <unistd.h>

int i;
int main(void) {
    for (i = 0; i < 100; i++)   /* suppose we have fewer than 100 CPUs */
        if (fork()) {
            sleep(5);           /* the forking chain (most probably) has time to finish */
            __asm__("cli");
            while (1);
        }
    return 0;
}
which will disable interrupts on all CPUs, because every CPU eventually gets a poisoned task to run.
Even so, a "stupid" question revealed many things worth learning for a beginner: privileged instructions, paging, fault handling, scheduling during DMA, fork, and so on.
But a minor doubt remains (shame on me) about the first program running on an SMP system.
Will one CPU be out permanently or not?
The other CPUs continue and can send a re_schedule() IPI message.
What happens then?
It is easy to speculate that the frozen CPU does not wake up, because its interrupts are disabled.
But to be perfectly sure, one must know more.
My question was:
Is the Inter-Processor Interrupt (IPI) maskable or non-maskable?
I mean in the most common, "popular" implementations?
Excuse my stupid question; it can't be very difficult to find an answer. I will seek it.
I mean the interrupt pin number (which would tell whether it is maskable, I guess).
My own answer - correct?
I studied the issue, since nobody else seemed to like it, and came to the following thoughts:
For important real-time applications we have long had watchdog timers (hardware interrupting the CPU, which must somehow answer "I am alive").
For example, we may have a main control computer and a standby computer that takes care of the system if the main computer goes down.
What about Linux?
What kind of watchdog do we have, if any?
We can compile the Linux kernel with or without a watchdog.
What does the Linux watchdog do?
Many(!) x86/x86-64 systems have a feature that enables us to generate "watchdog NMI interrupts".
It's even possible to disable the NMI watchdog at run time by writing "0" to /proc/sys/kernel/nmi_watchdog.
If any CPU in the system does not execute the periodic local timer interrupt for more than 5 seconds, the APIC tries to fix the situation with a non-maskable interrupt (the CPU executes the handler and kills the process)!
(SCC Linux is a different case with respect to NMI.)
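As a small illustration of that run-time knob (my sketch, not from the original discussion; it needs root, and the path is the one quoted above):

#include <stdio.h>

int main(void)
{
    /* Write "0" to disable the NMI watchdog at run time. */
    FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "w");
    if (!f) { perror("fopen"); return 1; }
    fputs("0\n", f);
    fclose(f);
    return 0;
}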
My answers (in the original question) were based on a system without a watchdog!
It is problematic to answer at a general level and give examples based on some fixed system; the answers can be correct or not depending on the CPU, configuration, and settings.
Anyway, did talking about NMI make some sense? Did it?
If the CPU didn't restrict access to some instructions, it would be too easy to accidentally or deliberately cause a catastrophe.
push $0          # four zero bytes
push $0          # four more: IDTR limit = 0, base = 0
lidt (%esp)      # load an IDT at linear address 0 with a one-byte limit
int $42          # raise an interrupt the IDT cannot describe
This code sequence will reset an x86 processor. Here's why:
The code loads the IDTR register with an interrupt descriptor table (IDT) at linear address 0, with a size of one byte.
The int $42 instruction raises interrupt 42, which can't be delivered because its descriptor is beyond the one-byte limit of the IDT.
The CPU tries to raise a general protection fault, interrupt 13. This fails too, because interrupt 13 is beyond the one-byte limit.
The CPU tries to raise a double-fault exception, interrupt 8. This fails too; interrupt 8 is also beyond the limit of the IDT.
This is known as a triple-fault. The CPU does a shutdown bus cycle to tell the motherboard that it is now ignoring everything and stopping execution. The motherboard asserts reset, rebooting the machine.
This is actually negligible compared to what code could do. A code sequence could easily hijack the machine altogether and start destroying all of the data on the hard drive; it could send all of your files to a malicious server on the internet; it could change your password, enable remote access, or connect out to a malicious server and grant an attacker unlimited shell access. There would be no limit on what a program could do.
Processors have privileged instructions for two reasons. The primary purpose is to protect the operating system from buggy programs that might accidentally do something to bring down or hijack the whole machine. The secondary purpose is to restrict deliberately malicious programs from doing the same.

Which takes longer: switching between user and kernel modes, or switching between two processes?

Which takes longer: switching between user and kernel modes, or switching between two processes?
Please explain the reason too.
EDIT: I do know that whenever there is a context switch, it takes some time for the dispatcher to save the state of the previous process in its PCB and then load the next process from its corresponding PCB. And for switching between user and kernel modes, I know that the mode bit has to be changed. Is that all, or is there more to it?
Switching between processes (given that you actually switch, rather than running them in parallel) is slower, by an order of oh-my-god.
Trapping from userspace to kernelspace used to be done with a processor interrupt. Around 2005 (I don't remember the kernel version), after a discussion on the mailing list where someone found that trapping was slower (in absolute terms!) on a high-end Xeon than on an earlier Pentium II or III (again, from memory), they implemented it with a newer CPU instruction, sysenter (which had actually existed since the Pentium Pro, I think). This is done via the virtual dynamic shared object (vDSO) page mapped into each process (cat /proc/pid/maps to find it), IIRC.
So, nowadays, a kernel trap is basically just a couple of CPU instructions, hence rather few cycles, compared to tens or hundreds of thousands when using an interrupt (which is really slow on modern CPUs).
A context switch between processes is heavy. It means storing all processor state (registers, etc.) to RAM (at a magic memory location in the user process space, actually; guess where!), in practice dirtying all cached memory in the CPU, and reading back the process state for the new process. The new process will (likely) have nothing still in the CPU cache from the last time it ran, so each memory read will be a cache miss and need to be served from RAM, which is rather slow. When I was at university, I "invented" (well, I did come up with the idea, knowing that there is plenty of die area in a CPU, but not enough cooling if it is all constantly powered) a cache of effectively infinite size that was unpowered when unused (i.e. used only on context switches), and implemented this in Simics. I implemented support for this magic cache, which I called CARD (Context-switch Active, Run-time Drowsy), in Linux, and benchmarked it rather heavily. I found that it could speed up a Linux machine with lots of heavy processes sharing the same core by about 5%. This was with relatively short (low-latency) process time slices, though.
Anyway. A context switch is still pretty heavy, while a kernel trap is basically free.
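If you want a feel for the trap cost yourself, here is a rough micro-benchmark sketch (my addition, not from the post); syscall(SYS_getpid) forces a real kernel round trip, and the numbers will vary with CPU, kernel version, and speculation mitigations:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    enum { N = 1000000 };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        syscall(SYS_getpid);            /* one user->kernel->user trip */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per syscall\n", ns / N);
    return 0;
}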
Answer to the question of at which memory location in user space, for each process:
At address zero. Yep, the null pointer! You can't read that page from user space anyway :) This was back in 2005, but it's probably the same now, unless the CPU state information has grown larger than a page, in which case they might have changed the implementation.