Is there a way to force a Colab to halt after the current cell is done? - google-colaboratory

After I've used the Run All button or similar to start multiple cells running in sequence, is it possible to force the runtime to halt after the current cell? Hitting Runtime -> Interrupt Execution only halts the runtime if I manage to hit Interrupt Execution while it's working; if the runtime environment is between cells (which it is the vast majority of the time since my individual cells run very quickly), hitting Interrupt Execution does nothing.

Related

Exceptions vs interrupts in RISC-V?

I read that exceptions are raised by instructions and interrupts are raised by external events.
But I couldn't understand a few things:
In which type do we stop immediately, and in which do we let the current line of code finish?
In both cases, after doing 1, we run the next program, is that right?
The terminology is used differently by different authors.
The hardware exception mechanism is used both by software-caused exceptions and by external interrupts.
In which type do we stop immediately, and in which do we let the current line of code finish?
The processor can do what its designers want up to a point.
If there is an external interrupt, the processor may choose to finish some of the work in progress before taking the interrupt.
If there is a software-caused exception, the processor simply cannot execute that instruction (or any instruction after it); it must abort execution and take the exception without completing the offending instruction.
In both cases, after doing 1, we run the next program, is that right?
In the case of an external interrupt, the most efficient thing is to resume the program that was interrupted, as the caches and page tables are still hot for that process. However, an external interrupt may change the runnable status of another, higher-priority process, or may end the timeslice of the interrupted process, indicating that another process should now get CPU time instead of the one interrupted.
In the case of a software-caused exception, it could be a non-resumable situation (e.g. a null pointer dereference), in which case, yes, another process gets the CPU afterwards. It could also be a resumable situation, like an I/O request or a page fault. If servicing the situation requires interfacing with peripherals, that may block the interrupted process, so another process would get some CPU time.
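To make the distinction concrete, here is a minimal sketch (my illustration, not part of the answer above) of a bare-metal RISC-V trap handler in C. It reads the mcause CSR: the top bit set means an interrupt from an external event, clear means a synchronous exception raised by an instruction. The handle_interrupt() and handle_exception() helpers are hypothetical stand-ins.

#include <stdint.h>

/* RV64: the most significant bit of mcause distinguishes interrupts from exceptions. */
#define MCAUSE_INTERRUPT_BIT (1ULL << 63)

static void handle_interrupt(uint64_t cause) { (void)cause; /* hypothetical: dispatch to device/timer code */ }
static void handle_exception(uint64_t cause, uint64_t epc) { (void)cause; (void)epc; /* hypothetical: fix, skip, or kill */ }

void trap_handler(void)
{
    uint64_t mcause, mepc;
    __asm__ volatile ("csrr %0, mcause" : "=r"(mcause));
    __asm__ volatile ("csrr %0, mepc"   : "=r"(mepc));

    if (mcause & MCAUSE_INTERRUPT_BIT) {
        /* External event: mepc holds the interrupted instruction, which has not
         * executed yet; returning simply resumes the program where it left off. */
        handle_interrupt(mcause & ~MCAUSE_INTERRUPT_BIT);
    } else {
        /* Synchronous exception: mepc points at the offending instruction, which
         * was aborted; the handler must deal with it before anything resumes. */
        handle_exception(mcause, mepc);
    }
}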

Minimum callgrind command for callgraph generation and profiling

I want to use callgrind to profile my program, but it slows it down too much. What I want to do is generate a callgraph using kcachegrind where every node shows what percentage of its time the program spent in each function. Can you tell me which features I can safely disable for better performance while still generating this information?
Thanks a lot!
Quick Overview
Callgrind is essentially a cache profiler (both instruction and data) that works at function-level granularity in order to reproduce the call graph. The profiler observes actions that trigger events during program execution and updates various aggregate counters maintained by the simulator.
However, this fine-grained simulation of cache events comes at a heavy cost in program runtime. You should know that even with all profiling turned off and no useful data being collected, Callgrind still imposes a minimum hit of about 2-4x in runtime. When actively collecting data, it averages 10-20x slower.
Is this theoretical minimum acceptable for your requirement? If not, you should consider other profiling options - discussed here. But if, with some careful control, speeding up large, uninteresting chunks of your program to only a 2-4x slowdown sounds reasonable, read on!
Available Hooks
Callgrind offers 2 forms of control over the collection of profiling data. It's important to understand their inter-dependencies in order to make an informed choice:
Instrumentation state - When disabled, no program actions are observed and thus no events are triggered or collected. The simulator basically switches to an 'idle' state; this is what helps you achieve the theoretical 2-4x minimum I mentioned above (see Nulgrind).
But be warned, this should be used carefully! While it offers attractive benefits, this can have non-trivial effects on accuracy. From the documentation:
However, this only should be used with care and in a coarse fashion: every mode change resets the simulator state (i.e. whether a memory block is cached or not) and flushes Valgrind's internal cache of instrumented code blocks, resulting in a latency penalty at switching time.
Collection state - When disabled, the aggregate counters are not updated with triggered events. This provides a way to streamline collected data to only the interesting parts of your call stack.
However, intuitively, this does not offer any noticeable speedup in execution time. And of course, instrumentation needs to be switched on for collection to be enabled.
Commands
valgrind --tool=callgrind
--instr-atstart=<yes|no> ;; default = yes
--collect-atstart=<yes|no> ;; default = yes
--toggle-collect=<function> ;; Toggle collection at entry/exit of specific function
<PROGRAM> <PROGRAM_OPTIONS>
Instrumentation - Turning this off at the start means you have to turn it back on again at the appropriate time. There are 2 alternative ways to do this:
During program execution, use the following command from the shell at the appropriate time.
callgrind_control -i <on|off>
This would require visibility into your program execution as well as some tolerance in accuracy due to the latency of deploying the command. You could use a few shell tricks to help, of course.
Insert the following macros into your program code and recompile your binary.
CALLGRIND_START_INSTRUMENTATION;
CALLGRIND_STOP_INSTRUMENTATION;
Collection - Similarly, if disabled at the start, collection needs to be toggled around the interesting parts of the code. There are 2 alternative ways to do this:
Use the --toggle-collect=<function> flag during launch. By definition, this would be inclusive of all the sub-calls within this function. If you can thus identify a particular parent function as your bottleneck, this can be a useful method to isolate relevant data and keep the generated call graph minimal.
Tip: Wildcards are supported in the function name!
Use the following macro before and after the relevant portion of your program code and recompile your binary. This can give you more fine-grained control within functions.
CALLGRIND_TOGGLE_COLLECT;
Summary
To combine all the ideas above, a good approach would be:
#include <valgrind/callgrind.h>
// Uninteresting program chunk
CALLGRIND_START_INSTRUMENTATION;
// A few extra lines to allow cache warm-up
CALLGRIND_TOGGLE_COLLECT;
// Portion to profile
CALLGRIND_TOGGLE_COLLECT;
CALLGRIND_DUMP_STATS;
CALLGRIND_STOP_INSTRUMENTATION;
// Rest of the program
Recompile, and launch Callgrind with:
valgrind --tool=callgrind --instr-atstart=no --collect-atstart=no <PROGRAM> <PROGRAM_OPTIONS>
Note that there will be 2 Callgrind output files generated by this method - the first created by the DUMP_STATS macro, and the second at program exit. DUMP_STATS zeroes all counters after use, which means the second log will report 0 events.
Within the active instrumentation block, you could also toggle collection multiple times and dump collected stats for each chunk.
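As a hedged sketch of that idea (assuming the program is launched with --instr-atstart=no --collect-atstart=no; phase_one() and phase_two() are hypothetical stand-ins for the interesting parts of your program):

#include <valgrind/callgrind.h>

static void phase_one(void) { /* ...first interesting chunk... */ }
static void phase_two(void) { /* ...second interesting chunk... */ }

int main(void)
{
    CALLGRIND_START_INSTRUMENTATION;        /* simulator active, counters still idle */

    CALLGRIND_TOGGLE_COLLECT;               /* collection on */
    phase_one();
    CALLGRIND_TOGGLE_COLLECT;               /* collection off */
    CALLGRIND_DUMP_STATS_AT("phase_one");   /* write a dump for this chunk, zero the counters */

    CALLGRIND_TOGGLE_COLLECT;
    phase_two();
    CALLGRIND_TOGGLE_COLLECT;
    CALLGRIND_DUMP_STATS_AT("phase_two");

    CALLGRIND_STOP_INSTRUMENTATION;
    return 0;
}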

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process is in the running state while it is executing, and it is in the waiting state if it is waiting for the completion of an I/O request. I don't understand what an I/O request is; can you please give an example to elaborate?
Another doubt: let's say a process is executing and suddenly an interrupt occurs, so the process stops its execution and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components: CPU, hard drive, network card, sound card, GPU, etc. All of those work in parallel and independently of each other. They are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point in the OS sitting stuck waiting for the result, since the time it takes for that operation to complete is an eternity from the CPU's point of view.
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go whenever it happens, then resume whatever process it decides is appropriate once the interrupt handling is done. In that case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think of an interrupt's lifetime as just another task for the CPU (from when the interrupt starts to the point the OS considers its handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as a notification for the OS to start another task (the interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.
I/O request generally just means a request to do either input, output, or both. The exact meaning varies depending on your context: HTTP, networking, console operations, or maybe some process in the CPU.
A process is waiting for I/O: say, for example, you were writing a program in C to accept the user's name on the command line and then print 'Hello User' back. Your code will go into a waiting state until the user enters their name and hits Enter. This is a higher-level example, but even a very low-level process executing in your computer's processor works on the same basic principle.
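A minimal sketch of that scenario (illustrative only; fgets() is just one way of making a blocking read):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[64];

    printf("Enter your name: ");
    fflush(stdout);

    /* This read is the I/O request: the process blocks here, in the waiting
     * state, until the user types a name and presses Enter; meanwhile the
     * CPU is free to run other processes. */
    if (fgets(name, sizeof name, stdin) != NULL) {
        name[strcspn(name, "\n")] = '\0';   /* strip the trailing newline */
        printf("Hello %s\n", name);
    }
    return 0;
}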
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You'd better hope it does; that's what scheduling algorithms and stacks are for. However, the real answer depends on what architecture you are on, whether it supports parallel or serial processing, etc.

Kernel - Scheduler : what happens when switching between process

Context:
I don't really understand how the kernel saves the state of running code when it exceeds its time slice.
I can't really visualize what actually happens.
Question:
1) Where is the currently running code (and its stack) stored?
2) When the kernel "sees" the code again, will it just follow an offset and keep going as if nothing happened?
It is not clear to me.
Thanks
The current instruction pointer and stack pointer are saved into the outgoing process's task_struct->ip and task_struct->sp (for x86), and the new process's task_struct->ip and task_struct->sp are loaded back into the ip and sp registers when switch_to() is called in the Linux kernel.
The kernel's switch_to() does many things while switching to the new process, such as re-setting up the EIP, stack, FPU state, segment descriptors, and debug registers.
The kernel's switch_mm() switches the virtual memory mappings from the last process to the new process.
It depends on the OS but as a general rule there is a block of storage which holds information about each process (usually called the Process Control Block or PCB). This information includes a pointer to the current line of code that is being executed and the contents of registers etc, so the process can start again where it stopped last time.
This block of information is owned by the OS itself not the process so it lives beyond the suspension of the process.
The program code itself is not stored in the PCB - it simply exists in memory or on disk. It can even be shared between processes, for example several processes may be running the same program, each at a different point in the code at any given time and each with their own set of 'variables' or data unique to that process's run of the program. All the OS needs is the variables and the line number or pointer to know where a particular process was in the code when it was suspended, and it can start from that point again.
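As an illustration only (this is a toy structure, not the actual Linux task_struct; the field names are made up), a PCB might look roughly like this:

#include <stdint.h>

enum proc_state { PROC_RUNNING, PROC_READY, PROC_WAITING };

struct pcb {
    int             pid;           /* process identifier                      */
    enum proc_state state;         /* running / ready / waiting               */
    uintptr_t       instr_ptr;     /* saved program counter (where to resume) */
    uintptr_t       stack_ptr;     /* saved stack pointer                     */
    uintptr_t       gp_regs[16];   /* saved general-purpose registers         */
    void           *page_table;    /* the process's address-space mappings    */
    /* plus accounting info, open files, scheduling priority, ... */
};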
It is worth noting that any RAM the process was using may or may not be still there when it restarts. In general an OS will try to leave recently used or frequently used RAM chunks (or 'pages') in memory if possible. If it needs to free up space, however, it may swap the 'page' out to disk, but disk access is much, much slower, hence the desire to avoid swapping out memory which is likely to be used again if possible.
In the worst-case situation, an OS may find that it swaps out a process and then, very soon, the new process needs to use some memory which has to be retrieved from disk. It is suspended while this happens, as the retrieval takes a long time in CPU terms. It may then happen that the next process also very soon finds itself in the same situation. The OS is now spending a lot of its time swapping processes and memory in and out, and much less of its time doing real work; this is commonly called 'thrashing'.

Watchdog timer triggers during MSP430F5529 initialization

I am coding a simple game and trying to test it on an MSP430F5529 microcontroller. The problem I have encountered is relating to the watchdog timer.
The code I have written causes a device reset, which is an indication of a watchdog timer issue. I assume I need to stop it even before the first line of my main code, in some sort of pre-initialisation code. Am I on the right track in saying that, or may the problem lie in other sections of the code as well?
To make it more clear, my main code is as follows (in simple form):
Stop the watchdog timer.
Initialize the board (GPIO pins).
Set up the Vcore voltage for CPU.
Set up the reference crystal (XTAL).
Set up the system clock.
Enable interrupts (globally).
Set up real time clock (RTC).
Set up LCD display.
Initialize the buttons.
Wait in an appropriate LPM mode for the user input.
As far as I can tell, this sequence should be right.
Here are some thoughts. You must explicitly disable the watchdog if you do not plan on feeding it. You shouldn't have to do this in the pre-init code (unless you personally modified the pre-init code and made it longer to execute); doing it at the beginning of main should be OK, barring the following case.
There's the possibility that having static arrays may force them to be initialized to zero in the pre-init code. If they are large, that could take some time, maybe enough for the watchdog to trigger before getting out of the pre-init code.
Also, on at least some MSP430s, you must unlock the clock registers with a password before writing to them. If you don't, the chip will reset.
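A minimal sketch of that advice (the usual MSP430 idiom, assuming TI's msp430.h device header; the low-power mode chosen here is just an example):

#include <msp430.h>

int main(void)
{
    WDTCTL = WDTPW | WDTHOLD;            /* password + hold bit: stop the watchdog first */

    /* ...board init, Vcore setup, clocks, RTC, LCD, buttons... */

    __bis_SR_register(LPM3_bits | GIE);  /* sleep in LPM3 with interrupts enabled, wait for input */
    return 0;
}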
Here is a link discussing watchdog in pre-init code if you haven't seen it already:
http://e2e.ti.com/support/microcontrollers/msp430/f/166/t/267695.aspx