I'm using a Cortex-M4 with FreeRTOS and I receive the following error:
Err: -110595: Hardfault occurred!
I have no idea what to check.
The meaning is CRIT_ERR_HARD_FAULT, but how do I trace it back?
Hard faults are always tricky, and there is no universal checklist that guarantees a quick diagnosis.
That said, the values of a few registers should give you enough information to proceed. To read them, you need to know what happens on exception entry, especially what the core does with its registers and the stack: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/Babefdjc.html
If you can attach a debugger to the running target, put a breakpoint in the hard fault handler and check the following:
stacked PC - as described in the article above, on exception entry the ARM core automatically pushes r0, r1, r2, r3, r12, lr, pc and psr onto the stack. Look up the stacked PC to see where the program was executing when the fault was taken
current LR - to verify whether you came from Thread mode (normal program execution) or from another interrupt, compare the current LR (the EXC_RETURN value) with the table in the article
ISR_NUMBER in IPSR, which is part of the current PSR - to verify whether a hard fault exception really occurred or whether your hard fault handler is used as a sink for all types of faults
CFSR and the other fault-related registers in the SCB - they should tell you more about what exactly caused the problem. Since the SCB is a peripheral block, most IDEs don't show it by default. Install a peripheral plugin or simply inspect the addresses in a memory window: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/Cihcfefj.html
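The LR and IPSR checks in the list above can be sketched as plain C that decodes values you read off the debugger. This is only an illustration under the ARMv7-M bit layout (EXC_RETURN bit 2 selects the stack, bit 3 selects Thread mode, bit 4 is clear when an FPU frame was stacked); the type and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the EXC_RETURN value in LR says about the interrupted context. */
typedef struct {
    bool valid;        /* LR really holds an EXC_RETURN pattern */
    bool thread_mode;  /* true: fault came from Thread mode, false: from a handler */
    bool used_psp;     /* true: frame is on the process stack, false: on the main stack */
    bool fpu_frame;    /* true: extended frame including FPU registers was stacked */
} exc_return_info_t;

/* Decode the LR value seen on entry to the fault handler (hypothetical helper). */
exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info = { false, false, false, false };

    /* All bits above bit 4 are ones in an EXC_RETURN value. */
    if ((lr & 0xFFFFFFE0u) != 0xFFFFFFE0u) {
        return info;
    }
    info.valid = true;
    info.thread_mode = (lr & (1u << 3)) != 0u;
    info.used_psp = (lr & (1u << 2)) != 0u;
    info.fpu_frame = (lr & (1u << 4)) == 0u;
    return info;
}

/* ISR_NUMBER is the low 9 bits of xPSR: 3 = HardFault, 4 = MemManage,
   5 = BusFault, 6 = UsageFault. */
uint32_t active_exception_number(uint32_t psr)
{
    return psr & 0x1FFu;
}
```

For example, an LR of 0xFFFFFFFD decodes as Thread mode with the frame on PSP, which is the usual case for a fault inside an RTOS task.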
If you can't debug live, you'll need some facility in the firmware that dumps these registers, for example to a log or a serial port.
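A rough sketch of such a dump, assuming the handler has already obtained a pointer to the stacked frame (on a real target that takes a small assembly shim that picks MSP or PSP, which is not shown here). The struct layout follows the basic exception frame; `stacked_frame_t` and `format_fault_frame` are hypothetical names, and where the text ends up (UART, RAM buffer) is left open.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Registers in the order the core stacks them on exception entry (basic frame). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} stacked_frame_t;

/* Format the stacked registers as one line of text.
   Returns the snprintf result: the number of characters the full line needs. */
int format_fault_frame(const stacked_frame_t *f, char *buf, size_t len)
{
    return snprintf(buf, len,
        "R0=%08" PRIX32 " R1=%08" PRIX32 " R2=%08" PRIX32 " R3=%08" PRIX32
        " R12=%08" PRIX32 " LR=%08" PRIX32 " PC=%08" PRIX32 " PSR=%08" PRIX32,
        f->r0, f->r1, f->r2, f->r3, f->r12, f->lr, f->pc, f->psr);
}
```

The stacked PC in that line can then be looked up in the map file or with the toolchain's address-to-line tool.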
Good luck!
Are your HardFault handlers strongly defined? If you can connect a debugger and the fault is reproducible, you can set a breakpoint in your fault handler and check the stack trace for possible areas of interest.
Alternatively, this guide provides a highly portable and useful method of diagnosing hard faults and gathering information post fault for the ARM processor.
I am trying to manually set the ISER0 and STIR registers to invoke interrupt number 3, which is the RTC Wakeup interrupt, for educational purposes. Here is my code:
I stepped through the register contents, but somehow the code was not able to write to the ISER0 and STIR registers, as shown below where I am trying to clear the ISER0 register.
Can someone please explain what I am doing wrong here?
ISER has one bit for every external interrupt (IRQ); STIR takes the IRQ number, that is, the exception number minus 16.
In the second image you write 0 to the ISER register. That has no effect, as described in both the ARM architecture reference manual and the STM32 M4 Programming Manual.
I would add volatile to both register pointer declarations, but in this case I don't think that's the issue.
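To make the bit arithmetic concrete, here is a small sketch that computes which ISER word and bit belong to an IRQ, and what goes into STIR. The base addresses are the architectural NVIC addresses; the helper names are invented for the example, and the actual register writes are only shown in comments because they need the target.

```c
#include <stdint.h>

#define NVIC_ISER_BASE 0xE000E100u  /* ISER0; ISERn sits at base + 4 * n */
#define NVIC_ICER_BASE 0xE000E180u  /* ICER0; write 1 bits here to disable */
#define NVIC_STIR_ADDR 0xE000EF00u  /* software trigger interrupt register */

/* Exception numbers 0..15 are system exceptions; IRQ 0 is exception 16. */
uint32_t irq_from_exception(uint32_t exception_number)
{
    return exception_number - 16u;
}

/* Each ISER/ICER word covers 32 IRQs. */
uint32_t nvic_iser_address(uint32_t irq)
{
    return NVIC_ISER_BASE + 4u * (irq >> 5);
}

uint32_t nvic_bit_mask(uint32_t irq)
{
    return 1u << (irq & 31u);
}

/* On the target, enabling and then software-triggering IRQ n would look like:
 *   *(volatile uint32_t *)nvic_iser_address(n) = nvic_bit_mask(n);
 *   *(volatile uint32_t *)NVIC_STIR_ADDR = n;
 * Writing 0 bits to ISER changes nothing; to disable, write the same mask
 * to the matching ICER word instead.
 */
```

For IRQ 3 this gives a mask of 0x8 in ISER0 and a STIR value of 3.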
The issue is solved. Somehow, debugging over ST-Link with OpenOCD and semihosting enabled was causing the problem, for which I have no explanation for now. When I changed the debug configuration to ST-Link with the SWV viewer, it worked OK...
I'm not sure if I'm missing something here, but it doesn't appear to be possible to debug exceptions in JetBrains Rider.
I have an incredibly simple piece of code that throws an exception (invalid file name) and there is no way I can find to
a) stop on the exception line in my code raising the exception, and
b) view the value of any variables in my source code that may have contributed to the exception.
I've recorded a sample video here that shows the debug attempt, and why it seems illogically impossible.
Has anyone found a way of debugging this stuff? Is Rider actually broken?
Sample video showing (attempted) debug session
For anyone experiencing the same situation, enable "Any Exception" and disable "Only break on exceptions thrown from user code" in Breakpoint Options.
You can also (as @mu88 mentions) disable debugging of external sources, but that simply reduces the clutter in stack frames.
How do you debug an RTOS application? I am using KEIL µVision and when I hit debug, the program steps through the main function until the function that initializes the RTOS kernel and then you can't step any further. The code itself works though. It is not mine btw, but I have to work on it. Is this normal behavior with RTOS applications or is this related to the program?
Yes, this is normal. You need to set breakpoints in the source code for the tasks that were created in main(). The only purpose of main() in a FreeRTOS application is to:
initialize the hardware,
create the resources (timers, semaphores...) and tasks your application will need,
start the scheduler
The application should never return from vTaskStartScheduler() if there were enough resources available.
Put breakpoints at the entry point of each task you need to debug. When you step over the scheduler start (or simply run), the debugger will halt at the first task that runs. When that task blocks, some other task will be selected to run according to the scheduling rules.
Generally, when you reach a blocking call while debugging, step over it. Other tasks may run, and the debugger will stop at the next line only when the task becomes ready again (depending on the nature of the blocking call). Often you will want to predict which task will run as a result of the call and put a breakpoint in that task. For example, if you send a message, you might place a breakpoint after the message receive call in the receiving task.
The point is you cannot "step-through" a context switch unless you have the RTOS source or do it at the assembler level, which is seldom useful or productive, and will not work for preemption.
You get a somewhat better RTOS debug experience and tool support in Keil if you use Keil's own RTX5 RTOS rather than FreeRTOS, but all of the above remains true.
Yes, this is expected behaviour. The best way to debug an RTOS application is to place breakpoints in all tasks and at key function entry points, and then step from there.
The debugger supports various methods of single-stepping through an application, as described at the link below.
http://www.keil.com/products/uvision/db_exe_step.asp
Typical challenges in debugging RTOS application can be dealing with interrupt handling, synchronization issues and register/memory corruption.
Keil µVision's System Analyzer lets you view the program's execution over time and the status of each thread. It can also show interrupts and exceptions if trace is enabled.
Sorry - this is long! I'm sure I'll get some TL;DRs. :/
I'm not at all new to the world of Cortex M3/4; I have encountered plenty of hard fault errors in the past and, without exception, they have been due to stack overflow on FreeRTOS. However, in this case, I'm really struggling to track down a hard fault on a system that has someone else's old code that I have slightly modified.
I have a system with an LCD and touch screen. We have new hardware, which is almost identical to the old hardware other than it changing from an LPC1788 to a drop-in equivalent LPC4088 and the touch screen being I2C rather than SPI.
I'm using Keil µVision (which is new to me) with an NXP LPC4088, which has an M4 core, and the Keil RL-ARM RTOS (also new to me). The code is a C/C++ hybrid, and C++ is also not something I have much experience with. On top of this, there is Segger emWin (which I've never used), closed-source code in which it always seems to be crashing. It will render a few screens, read the touch screen buttons etc. and then fall over. Sometimes it falls over immediately, though.
I have followed this:
http://www.keil.com/appnotes/files/apnt209.pdf
I have attached a picture of the debugger/IDE when it crashes below.
When it crashes, the highlighted green task in the OS is, without exception, ApplicationTask (which I have not modified).
If I am reading the info correctly the Keil uvision debugger tells me that the stack being used was the MSP stack which is at address 0x20003238. There's a memory dump below:
If I understand correctly, this means that R0, 2, 3 and 12 are 0, the program counter is at 0 as are LR and PSR. However, this goes against what's in the Call Stack + Locals window in the first picture. If I right click on the 0x00005A50 underneath ApplicationTask:4 and choose caller code, it tells me it is
BL.W GUI_ALLOC_UnlockH
Which is in the emWin binary blob I think.
However, if I look at 0x20001B60 (which is the PSP stack value) as below:
That seems to tally up much better with what the Call Stack + Local Window tells me. It also seems to tell me that it's crashing in emWin and extensive Googling shows that Segger always completely wash their hands of any possibility their closed source code could be at fault. To be fair, it's unlikely as it's been working OK until I modified the code to use an I2C touch screen interface rather than SPI. However, where it's crashing (or seems to be) is nothing to do with the code I have modified.
Also, this window below:
Gives the BFAR address as 0xF00B4DDA and the memory manager fault address as 0xF00B4DDA. I don't know whether I should be interpreting this as being the issue.
I have found a few other posts around the web, including one staggeringly similar one to this here on Stack Overflow (but all have no solution associated with them) where people have the same issue.
So, my questions are:
Am I reading this data correctly and understanding the Keil document I linked to? I really feel I must be missing something with this MSP/PSP issue.
Am I using the caller code function with uvision correctly? The bit where I right click on Call Stack + Locals' address below ApplicationTask:4 and it always seems to take me to some Segger code I can't examine and surely isn't what's at fault.
Should I really be reading the issue as a bus fault address with it trying to read from or write to 0xF00B4DDA which is reserved space?
I tried implementing a piece of code such as this:
https://blog.frankvh.com/2011/12/07/cortex-m3-m4-hard-fault-handler/
But that just stops the whole system running properly and ends up at a BKPT instruction in some init code. On top of this, I am not convinced this kind of thing would tell me any more than uvision does, other than showing it to me slightly faster and with zero effort. Am I right in this latter assumption?
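On the BFAR/MMFAR question: the architecture only guarantees those address registers to be meaningful while the matching valid bit is set in CFSR, so it may be worth checking the flags before trusting 0xF00B4DDA. Below is a minimal sketch of that check, using the ARMv7-M CFSR bit positions; the function names are made up, and the CFSR value would be copied by hand from the debugger's fault window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected CFSR bits (CFSR lives at 0xE000ED28). */
#define CFSR_MMARVALID   (1u << 7)   /* MMFAR holds a valid address */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds a valid address */

/* CFSR packs three status fields into one word. */
uint32_t cfsr_mmfsr(uint32_t cfsr) { return cfsr & 0xFFu; }
uint32_t cfsr_bfsr(uint32_t cfsr)  { return (cfsr >> 8) & 0xFFu; }
uint32_t cfsr_ufsr(uint32_t cfsr)  { return cfsr >> 16; }

/* Only trust BFAR / MMFAR when these return true. */
bool bfar_is_valid(uint32_t cfsr)  { return (cfsr & CFSR_BFARVALID) != 0u; }
bool mmfar_is_valid(uint32_t cfsr) { return (cfsr & CFSR_MMARVALID) != 0u; }

/* An imprecise bus error means the stacked PC may be past the faulting store. */
bool bus_fault_is_imprecise(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}
```

If neither valid bit is set, the address shown is most likely stale and the stacked PC is the better lead.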
I am evaluating different multiprocessing libraries for a fault tolerant application. I basically need any process to be allowed to crash without stopping the whole application.
I can do it using the fork() system call. The limitation is that the process can only be created on the same machine.
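For reference, a minimal sketch of the fork() approach on a POSIX system: the parent runs a piece of work in a child, waits for it, and can tell a crash (death by signal) from a normal exit, so it keeps running either way. `run_isolated` and the two workers are hypothetical names for the example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*worker_fn)(void);

/* Run `work` in a child process.
   Returns 0 if the child exited cleanly, 1 if it was killed by a signal
   (i.e. crashed), 2 if it exited with a non-zero status, -1 on fork/wait errors. */
int run_isolated(worker_fn work)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        work();      /* child: do the risky work */
        _exit(0);    /* leave without running the parent's atexit handlers */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return 1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return 0;
    }
    return 2;
}

/* Example workers: one dies from SIGABRT, one finishes normally. */
void crashing_worker(void) { abort(); }
void healthy_worker(void)  { }
```

A supervisor loop could call `run_isolated` again whenever it returns 1, which is the "create a new process" part of the question.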
Can I do the same with MPI? If a process created with MPI crashes, can the parent process keep running and eventually create a new process?
Is there any alternative (possibly multiplatform and open source) library to get the same result?
As reported here, MPI 4.0 will have support for fault tolerance.
If you want collectives, you're going to have to wait for MPI-3.something (as High Performance Mark and Hristo Illev suggest)
If you can live with point-to-point, and you are a patient person willing to raise a bunch of bug reports against your MPI implementation, you can try the following:
disable the default MPI error handler
carefully check every single return code from your MPI programs
keep track in your application of which ranks are up and which are down. Oh, and when they go down they can never come back. But you're unable to use collectives anyway (see my opening statement), so that's not a huge deal, right?
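The bookkeeping in the last step might look roughly like the sketch below. It is plain C with no MPI calls so that it stays self-contained; in a real program you would first call MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN) and pass `rc == MPI_SUCCESS` from each point-to-point call as `call_succeeded`. The table type, the function names and the MAX_RANKS limit are all invented for the example.

```c
#include <stdbool.h>

#define MAX_RANKS 64  /* arbitrary limit for this sketch */

typedef struct {
    int  size;              /* number of ranks in the communicator */
    bool alive[MAX_RANKS];  /* alive[r] is false once rank r has failed */
} rank_table_t;

/* Start with every rank marked alive. Returns false if size is out of range. */
bool rank_table_init(rank_table_t *t, int size)
{
    if (size < 1 || size > MAX_RANKS) {
        return false;
    }
    t->size = size;
    for (int i = 0; i < size; i++) {
        t->alive[i] = true;
    }
    return true;
}

/* Record the outcome of one call that involved `rank`.
   A failure is permanent: a later success never revives the rank. */
void rank_record_result(rank_table_t *t, int rank, bool call_succeeded)
{
    if (rank < 0 || rank >= t->size) {
        return;
    }
    if (!call_succeeded) {
        t->alive[rank] = false;
    }
}

int rank_alive_count(const rank_table_t *t)
{
    int n = 0;
    for (int i = 0; i < t->size; i++) {
        if (t->alive[i]) {
            n++;
        }
    }
    return n;
}
```

The application would consult `alive[]` before every send or receive and skip ranks that are marked down.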
Here's an old paper (back when Bill still worked at Argonne. I think it's from 2003):
http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf . It lays out the kinds of fault tolerant things one can do in MPI. Perhaps such a "constrained MPI" might still work for your needs.
If you're willing to go for something research quality, there are two implementations of a potential fault tolerance chapter for a future version of MPI (MPI-4?). The proposal is called User Level Failure Mitigation. There's an experimental version in MPICH 3.2a2 and a branch of Open MPI that also provides the interfaces. Both are far from production quality, but you're welcome to try them out. Just know that since this isn't in the MPI Standard, the function prefixes are not MPI_*. For MPICH, they're MPIX_*; for the Open MPI branch, they're OMPI_* (though I believe they'll be changing theirs to MPIX_* soon as well).
As Rob Latham mentioned, there will be lots of work you'll need to do within your app to handle failures, though you don't necessarily have to check all of your return codes. You can/should use MPI error handlers as a callback function to simplify things. There's information/examples in the spec available along with the Open MPI branch.