How to debug a freeRTOS application? - embedded

How do you debug an RTOS application? I am using KEIL µVision and when I hit debug, the program steps through the main function until the function that initializes the RTOS kernel and then you can't step any further. The code itself works though. It is not mine btw, but I have to work on it. Is this normal behavior with RTOS applications or is this related to the program?

Yes, this is normal. You need to set breakpoints in the source code for the tasks that were created in main(): the only purpose of main() in a FreeRTOS application is to :
initialize the hardware,
create the resources (timers, semaphores...) and tasks your application will need,
start the scheduler
The application should never return from vTaskStartScheduler() if they were enough resources available.

Put break-points at the entry point of each task you need to debug. When you step the over the scheduler start (or simply run) the debugger will halts at the first task that runs. When that task blocks, some other task will be selected to run according to the scheduling rules.
Generally when debugging and you reach a blocking call, step-over it, other tasks may run and the debugger will stop at the next line only when the task becomes ready (depending on the nature of the blocking call). Often you will want to predict what task will run as a result of the call, and put a breakpoint in that task. For example if you issue a message send, you might place a breakpoint after the message receive call of the receiving task.
The point is you cannot "step-through" a context switch unless you have the RTOS source or do it at the assembler level, which is seldom useful or productive, and will not work for preemption.
You get a somewhat better RTOS debug experience and tool support in Keil if you use Keil's own RTX5 RTOS rather then FreeRTOS, but all of the above remains true.

Yes, this is an expected behaviour. The best way to debug a RTOS application is to place breakpoints at all tasks, key function entry points and step debug.
The debugger supports various methods of single-stepping through an application as in below link.
http://www.keil.com/products/uvision/db_exe_step.asp
Typical challenges in debugging RTOS application can be dealing with interrupt handling, synchronization issues and register/memory corruption.
Keil µVision's System Analyzer enables one to view the program execution time frame, status of each thread. It shall also help in viewing interrupts, exceptions if tracer is enabled.

Related

C++ | Adding workload to a existing thread from a injected DLL

in my project i injected a DLL(64-bit Windows 10) in to a external process with Manual-map & Thread-hijacking and i do some stuff in there.
In current state i use "RtlCreateUserThread" to create a new thread and do some extra workload in there to distribute it for better performance.
My question is now... Is it possible to access other threads from the current process (hijack it) and add your own workload/code there. Without creating a new thread?
I didn't found anything helpful yet in the internet and the code i used and modified for Thread-hijacking seems to only work for a DLL file. Because i am pretty new to C++ i am still learning i am already thankful for any help.
(If you want to see the source for injector Google GHInjector your find the library on github.)
It is possible, but so complicated and may not work in all cases.
You need to splice existing thread's machine codes, so you will need write access to code page memory.
Logic:
find thread id and thread handle, then suspend thread with SuspendThread WINAPI call
suspended thread can be in wait state or in system DLL call now, so you need to analyze current execution stack, backtrace it and find execution address from application space. You need API functions StackWalk, and PDB files in some cases. Also it depends on running architecture (x86, amd64, ...). Walk through stack until your EIP/RIP will not be in application memory address space
decode machine instruction (it will be 'call') and splice next instructions to your function call. You need to use __declspec(naked) declared function or ASM implemented one for execute your code and replaced instructions.
ResumeThread
This method may work only once because no guarantees that application code is executed in loop.

What happens if an MPI process crashes?

I am evaluating different multiprocessing libraries for a fault tolerant application. I basically need any process to be allowed to crash without stopping the whole application.
I can do it using the fork() system call. The limit here is that the process can be created on the same machine, only.
Can I do the same with MPI? If a process created with MPI crashes, can the parent process keep running and eventually create a new process?
Is there any alternative (possibly multiplatform and open source) library to get the same result?
As reported here, MPI 4.0 will have support for fault tolerance.
If you want collectives, you're going to have to wait for MPI-3.something (as High Performance Mark and Hristo Illev suggest)
If you can live with point-to-point, and you are a patient person willing to raise a bunch of bug reports against your MPI implementation, you can try the following:
disable the default MPI error handler
carefully check every single return code from your MPI programs
keep track in your application which ranks are up and which are down. Oh, and when they go down they can never get back. but you're unable to use collectives anyway (see my opening statement), so that's not a huge deal, right?
Here's an old paper (back when Bill still worked at Argonne. I think it's from 2003):
http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf . It lays out the kinds of fault tolerant things one can do in MPI. Perhaps such a "constrained MPI" might still work for your needs.
If you're willing to go for something research quality, there's two implementations of a potential fault tolerance chapter for a future version of MPI (MPI-4?). The proposal is called User Level Failure Mitigation. There's an experimental version in MPICH 3.2a2 and a branch of Open MPI that also provides the interfaces. Both are far from production quality, but you're welcome to try them out. Just know that since this isn't in the MPI Standard, the function prefixes are not MPI_*. For MPICH, they're MPIX_*, for the Open MPI branch, they're OMPI_* (though I believe they'll be changing theirs to be MPIX_* soon as well.
As Rob Latham mentioned, there will be lots of work you'll need to do within your app to handle failures, though you don't necessarily have to check all of your return codes. You can/should use MPI error handlers as a callback function to simplify things. There's information/examples in the spec available along with the Open MPI branch.

Is there a way to ensure that a kernel module runs in a specific process context?

Basically, how can I make sure that in my module, a specific process is current. I've looked at kick_process, but I'm not sure how to have my module execute in the context of that process once kicking it into kernel mode.
I found this related question, but it has no replies. I believe an answer to my question could help that asker as well.
Note: I am aware that if I want the task_struct of a process, I can look it up. I'm interested in running in a specific context since I want to call functions that reference current.
Best way i have found to do anything in the context of a particular process in the kernel, is to sleep in process context(wait_* family of functions) and wake up that thread and do whatever needs to be done in that context. This would ofcourse mean you would have to have the application call into the kernel via IOCTL or something and sleep on that thread and wake it up whenever you need to do something. This seems to be a very widely used and popular mechanism.

What is the 'COM context' referred to in the 'ContextSwitchDeadlock' MDA message?

While running unit tests, I'm getting the MDA shown below.
In the error message, what is the hexadecimal value refered to as a 'COM context'?
Can I determine this value for a given STA thread? If so, how?
Managed Debugging Assistant
'ContextSwitchDeadlock' has detected a
problem in 'C:\Program Files\Microsoft
Visual Studio
9.0\Common7\IDE\vstesthost.exe'. Additional Information: The CLR has
been unable to transition from COM
context 0x14cff0 to COM context
0x14d218 for 60 seconds. The thread
that owns the destination
context/apartment is most likely
either doing a non pumping wait or
processing a very long running
operation without pumping Windows
messages. This situation generally has
a negative performance impact and may
even lead to the application becoming
non responsive or memory usage
accumulating continually over time. To
avoid this problem, all single
threaded apartment (STA) threads
should use pumping wait primitives
(such as CoWaitForMultipleHandles) and
routinely pump messages during long
running operations.
From what I can see (looking at mscorwks disassembly) it is a IObjContext*, returned from CoGetContextToken().
Basically it looks like a call is queued up using IContextCallback::ContextCallback() from mscorwks!CtxEntry::EnterContextOle32BugAware(), which in turn calls mscorwks!CtxEntry::EnterContextCallback() once the object context (apartment) processes the message. They use a CLREvent to signal callback completion. For STA threads, not pumping messages would cause the event wait to timeout, which triggers the ContextSwitchDeadlock MDA.
Note: I'm not running this under a debugger so I can't confirm the behavior, but this is probably reasonably accurate.
I haven't seen that before, I suspect it's just an internal pointer. Neither thread IDs nor thread handles are usually that large.
There's been no way of getting the apartment type from the current thread, and I've never seen an apartment ID (other than a GUID representing the source/target apartment when marshalling) in native code.
The unit test is most likely running in MTA mode and you have code in it that displays a UI.
One COM context is visual studio, the other is the UI inside your unit test. You can either not display a UI or turn off the MDA.
It looks like the STA COM application does not spin the message loop. This would cause STA COM to just die. I got no idea what the COM context is but it strikes me that you should be able to reproduce your target app to just hang for long periods of time.
Sounds like there is a function that takes more than 60 seconds to run. Have you been able to isolate it?
EDIT This is the guy that wrote COM interop for .NET.
http://blogs.msdn.com/cbrumme/
Have a look at his blog or check out the boards that he frequents. He's written lots of stuff about why COM interop the way it is. It might help to ask this question on one of the Microsoft boards.

How would I go about taking a snapshot of a process to preserve its state for future investigation? Is this possible?

Whether this is possible I don't know, but it would mighty useful!
I have a process that fails periodically (running in Windows 2000). I then have just one chance to react to it before having to restart it and painfully wait for it to fail again. I didn't write the process so don't have the source to debug. The failure is seemingly random.
With a snapshot of the process I could repeatedly and quickly test reactions to the failure.
I had thought of running inside a VM but this isn't possible in this instance.
EDIT:
#Jon Cage asked:
When you say a snapshot, you mean capturing a process when it's about to fail (including memory, program state etc. etc.) ...and then replaying it's final few seconds repeatedly to see what effect it has on some other component?
This is exactly what I mean!
I think minidump is what you are looking for.
You can also used Userdump:
The User Mode Process Dumper
(userdump) dumps any running Win32
processes memory image (including
system processes such as csrss.exe,
winlogon.exe, services.exe, etc) on
the fly, without attaching a debugger,
or terminating target processes.
Generated dump file can be analyzed or
debugged by using the standard
debugging tools.
This article shows you how to use it.
My best bet is to start the process in a debugger (OllyDbg being my preferred tool).
The process will pause on an exception, and you can try to figure out what happened shortly before that.
This needs some understanding of assembler and does not allow to create a snapshot of the process for later analysis. You would need to write your own debugger for that - it should be theoretically possible.