Understanding branches in gcov files

Understanding branches in gcov files - objective-c

I'm trying to understand the output of the gcov tool. Running it with no options makes sense, but I'm wanting to try and understand the branch coverage options. Unfortunately it's hard to make sense of what the branches do and why they aren't taken. Below is the output for a method (compile using the latest LLVM/Clang build).
function -[TestCoverageAppDelegate loopThroughArray:] called 5 returned 100% blocks executed 88%
5: 30:- (NSInteger)loopThroughArray:(NSArray *)array {
5: 31: NSInteger i = 0;
22: 32: for (NSString *string in array) {
branch 0 taken 0
branch 1 taken 7
-: 33:
22: 34: }
branch 0 taken 4
branch 1 taken 3
branch 2 taken 0
branch 3 taken 3
5: 35: return i;
-: 36:}
I've run 5 test through this, passing in nil, an empty array, an array with 1 object, and array with 2 objects and an array with 4 objects. I can guess that in the first case, branch 1 means "go into the loop" but I haven't a clue what branch 0 is. In the second case branch 0 seems to be loop through again, branch 1 seems to be end the loop and branch 3 is continue/exit the loop, but I have no idea what branch 2 is or why/when it would be executed.
If anyone knows how to decipher the branch info, or knows of any detailed documentation on what it all means, I'd appreciate the help.

Gcov works by instrumenting (while compiling) every basic block of machine commands (you can think about assembler). Basic block means a linear section of code, which have no branches inside it and no lables inside it. So, If and only if you start running a basic block, you will reach end of basic block. Basic blocks are organized in CFG (Control flow graph, think about it as directed graph), which shows relations between basicblocks (edge from V1 to V2 is V1 calls V2; and V2 is called by V1). So, profile-arcs mode of compiler and gcov want to get execution count for every line and do this via counting basic block executions. Some of edges in CFG are instrumented and some are not, because there are algebraic relations between basic blocks in graph.
Your ObjC construction (for..in) is lowered (converted in early compilation) to several basic blocks. So, gcov sees 4 branches, because it sees only lowered BBs. It knows nothing about this lowering, but it knows which line corresponds to every assembler instruction (this is debug info). So, branches are edges of CFG.
If you want to see basic blocks, you should do an assembler dump of compiled program or disassemble a binary or dump CFG from compiler. You can do this both for profile-arcs and non-profile-arcs modes and compare them.
profile-arcs mode will have a lot calls and increments of something like "__llvm_gcov_ctr" or "__llvm_gcda_edge" - it is an actual instrumentation of basic blocks.

Related

How to build a decompiler for Lua 5.1?

I am building a decompiler for Lua 5.1. (for study purposes only)
Here is the generated code:
main <test.lua:0,0> (12 instructions, 48 bytes at 008D0520)
0+ params, 2 slots, 0 upvalues, 0 locals, 6 constants, 0 functions
1 [1] LOADK 0 -2 ; 2
2 [1] SETGLOBAL 0 -1 ; plz_help_me
3 [2] LOADK 0 -4 ; 24
4 [2] SETGLOBAL 0 -3 ; oh_no
5 [3] GETGLOBAL 0 -1 ; plz_help_me
6 [3] GETGLOBAL 1 -3 ; oh_no
7 [3] ADD 0 0 1
8 [3] SETGLOBAL 0 -5 ; plz_work
9 [4] GETGLOBAL 0 -6 ; print
10 [4] GETGLOBAL 1 -5 ; plz_work
11 [4] CALL 0 2 1
12 [4] RETURN 0 1
constants (6) for 008D0520:
1 "plz_help_me"
2 2
3 "oh_no"
4 24
5 "plz_work"
6 "print"
locals (0) for 008D0520:
upvalues (0) for 008D0520:
Original Code:
plz_help_me = 2
oh_no = 24
plz_work = plz_help_me + oh_no
print(plz_work)
How to build a decompiler efficiently to generate this code? Should I use AST trees to map the behavior of the code? (opcodes in this case)

Lua VM is a register machine with a nearly unlimited supply of registers, which means you don't have to deal with the consequences of register allocation. It makes the whole thing much more bearable than decompiling, say, x86.
A very convenient intermediate representation for going up the abstraction level would have been an SSA. A trivial transform treating registers as local variable pointers and leaving memory loads as is, followed by an SSA-transform [1], will give you a code suitable for further analysis. Next step will be loop detection (done purely on CFG level), and, helped by SSA, detection of the loop variables and loop invariants. Once done, you'll see that only a handful of common patterns exist, that can be directly translated to higher level loops. Detecting if and other linear control flow sequences is even easier, once you're in an SSA.
One nice property of an SSA is that you can easily construct high level AST expressions out of it. You have a use count for every SSA variable, so you can simply substitute all the single-use variables (that were not produced by side-effect instructions) in place of their use (and side-effect ones too if you maintain their order). Only multi-use variables will remain.
Of course, you'll never get meaningful local variable names out of this procedure. Globals are preserved.
[1] https://pfalcon.github.io/ssabook/latest/

It looks like there is a lua decompiler for lua 5.1 written in C that can be used for study; https://github.com/viruscamp/luadec https://www.lua.org/wshop05/Muhammad.pdf describes how it works.
And on the extremely simple example you give, it comes back with the exactly same thing:
$ ./luadec ../tmp/luac.out
-- Decompiled using luadec 2.2 rev: 895d923 for Lua 5.1 from https://github.com/viruscamp/luadec
-- Command line: ../tmp/luac.out
-- params : ...
-- function num : 0
plz_help_me = 2
oh_no = 24
plz_work = plz_help_me + oh_no
print(plz_work)
There is a debug flag -d to show a little bit about how it works:
-- Decompiled using luadec 2.2 rev: 895d923 for Lua 5.1 from https://github.com/viruscamp/luadec
-- Command line: -d ../tmp/luac.out
LoopTree of function 0
FUNCTION_STMT=0x2e2f5ec0 prep=-1 start=0 body=0 end=11 out=12 block=0x2e2f5f10
----------------------------------------------
1 LOADK 0 1
SET_SIZE(tpend) = 0
next bool: 0
locals(0):
vpend(0):
tpend(1): 0{2}
Note: The Lua VM in version in 5.1 (and other versions up to the latest 5.4) seems to be stack oriented. This means that in order to compute plz_hep_me + oh_no, both values need to be pushed onto a stack and then when the ADD bytecode instruction is seen two entries are popped off of the stack and the result is pushed onto the stack. From the decompiler's standpoint, a tree fragment or "AST" fragment of the addition with its two operands is created.
In the output above, a constant is push onto stack tpend to which we see 0{2}. The stuff in brackets seems to hold the output representation. Later on, just before the ADD opcode you will see two entries on the stack. Entries are separate simply with a space in the debug output.
----------------------------------------------
2 SETGLOBAL 0 0
next bool: 0
locals(0):
vpend(1): -1{plz_help_me=2}
tpend(0):
----------------------------------------------
Seems to set variable plz_help_me from the top of the stack and push the assignment statement instead.
3 LOADK 0 3
SET_SIZE(tpend) = 0
next bool: 0
locals(0):
vpend(0):
tpend(1): 0{24}
...
At this point I guess the decompiler sees that the stack is 0 SET_SIZE(tpend)=0 or the first statement then is done.
Moving onto the next statement, the constant 24 is then loaded onto tpend.
In vpend and tpend, we see the output as it gets built up.
Looking at the C code, It looks like it loops over functions and within that loops over what it thinks are statements, and within that is driven by the a loop over bytecode instructions but "walking" them based on instruction type and information it saves.
It looks it does build an AST on the fly in a somewhat ad-hoc manner. (By "ad hoc" I simply mean programmer code builds the tree).
Although for compilation especially of optimized code Single Static Assignment (SSA) is a win, in the decompilation direction of high-level bytecode such as is found here, usually this is neither needed nor wanted. SSA maps to finite reusable names to a potentially infinite set.
In a decompiler we want to go the other direction from infinite registers (or even finite registers) back to names the programmer used and here this mapping is already given, so just use it. This is simpler too.
If the same name is used for two conceptually different computations, tracking that is probably wanted. Correcting or improving the style of the source code in the way that the decompiler writer feels might be nice, but usually isn't appreciated. (In fact, in my experience many programmers using a decompiler like this will call it a bug, when technically, it isn't.)
The luadec decompiler breaks things into functions and walks over disassembled instructions in that in an ad hoc way based on specific instructions for various opcodes. After this tree or list of tree nodes is built, an AST if you will (and that is the term this decompiler uses), there seem to be custom print routines for each AST node type.
Personally, I think there is a better way, and one that is slightly different from the one suggested in another answer.
I wrote about decompilation from this novel perspective here. I believe it applies to Lua as well. And older, longer more research oriented paper here.
As with many decompilers, logic instructions introduce control flow, and these strategies which are instruction based and that go down a path early on from the instructions, have a hard time shifting these around. So a pattern matching/parsing/term-rewrite approach where control flow is marked in the instructions such as with addition of pseudo Phi-function instructions (or its equivalent). Dominator regions from a control-flow graph also greatly help here.
Edit:
There is also unluac, a lua decompiler written in Java that can be studied. Although there are very few comments in the code describing how it works, the code it self seems well organized and the functions are pretty small and clear. It seems similar in operation in that there is a part that is bytecode operation based, and a part that is more expression-tree based. It seems to be currently worked on and is the only decompiler that supports lua versions from 5.0 up to 5.4.

How to break the gem5 executable in GDB at a the nth instruction?

Using --debug-flags ExecAll tracing, I found that there is a bug at the Nth instruction, which happens at the Nth line of the log.
Is there an easy way to break specifically at that instruction to debug it in GDB and view gem5's internal state?

The simplest approach is to use --debug-break as shown at: schedBreak(<tick>) gdb debugging function not working
That makes gem5 raise a signal at a given simulation, which GDB stops at by default. You can determine what simulation time corresponds to your instruction by looking at an --debug-flags ExecAll trace beforehand.
You will want to break on the tick much more often than on the Nth instructions, in particular since gem5 simulates the instruction pipeline, and therefor there can be multiple instructions in flight at the same time.
Alternatively, from GDB your point of interest sees the ExecutionContext object, which if often called xc, you can just add a conditional breakpoint like:
b MyClass::myFunction if xc->numInsts.data()->value() == <n> - 2
The -2 is needed because this index is zero based, and because the tick increments after instruction execution.
You can also find the tick time rather than instruction count with:
p xc->cpu->tick
or from the other commonly available ThreadContext object with:
p tc->baseCpu->tick
You generally want to do this from the ::tick() function of your CPU model of interest.
For AtomicSimpleCPU::tick() you could also break just before the second instruction with:
b AtomicSimpleCPU::tick if (*threadInfo[curThread]).numInst == 1
Or to break at a given tick, say 1000 (500 is the one before it):
b AtomicSimpleCPU::tick if tick == 500
Two other important break locations are at the main event loop when an event is executed:
b EventQueue::serviceOne() if head->when() == 1000
and the event scheduling target point:
b EventQueue::schedule if when == <target-time>
b EventQueue::reschedule if when == <target-time>
or for the time of schedule itself:
b EventQueue::schedule if _curTick == 1000
b EventQueue::reschedule if _curTick == 1000
Together with reverse debugging and:
--debug-flags Event
these event breakpoints will actually allow you to understand what gem5 is doing.
Note however that conditional breakpoints significantly slow down simulation unfortunately... arghh.
Another useful technique to have in mind is that you can do a run that stops shortly after the point of interest with:
-m <tick>
and then reverse debug back to the exact point of interest, possibly conditionally since now you will be close the the point of interest, so the performance loss will not be a huge problem. You can then just continue going back to the root cause.
Tested in gem5 9f247403e558977738b5911a45e5776afff87b1a.

Bitcoin Blockchain - Verification process

As the title states, basically my question is about the blockchain verification. I know whats a block chain and basically understood how mining is working, except once simple thing.
Let's say we have 2 guys, Bob and Adam.
Blockchain:
|1|-|2|-|3|-{4} - Bob Chain
|1|-|2|-|3|-{4} - Adam Chain
Assume both Bob and Adam found a new block, but it will not be verified until someone finds a next block. So my questions is whats is happening in a situation if Adam will find a block |5| first. Will Bob get his reward for finding a block? Or it means if Adam found one block, he has to find the next one which is extremely difficult without a huge network of computational resources in order to verify his previous block |4| and get reward for block 4 of 12.5 bitcoins, because the nodes will only accept the longest blockchain? I hope I clearly illustrated the picture. I've tried to find the answer in different videos and materials but somehow this aspect was put aside. If my assumption is true, it means there is no way how can a single person to earn anything from mining without a huge network ?

First of all, in Bitcoin when someone creates a block, he broadcasta it to the rest ot the network. As you said, if there are two people that create the block at the same time, they will broadcast it. So, you will get two blocks at the same time. Although you save both blocks, you will try to mine one of them. After some time, one of the two branches will be longer, so you will delete the second one.
The miners of the Blockchain will create some blocks and a branch will be longer after some time.
In Blockchain, a block is considered well when it has 100 blocks (I d0n't know exactly how many) above of it. So, the reward is taken after 100 blocks, not before.

Who out of Adam or Bob gets the reward depends on whose block remains as part of the 'Best chain' eventually. This in turn partly depends on consensus rules and partly on how the things transpired. This is explained as follows
Adam and Bob claimed that they found the block first by broadcasting to peers almost at same time.
Let's presume that a peer named 'ITWala' saw these 2 blocks which would be at same height. Assuming Adam's block reached ITWala's node first. So this would lead to something known as 'forking' in block chain terminology and is quite normal to occur.
**Chain status on ITWala's node leading to forking **
Block1 --> Block 2 --> Block 3 --> Block 4 (Adam's Block)
|
|--- Block 4 (Bob's Block)
One of following can happen:
Case 1 - For sake of simplicity, assume that Bob is the only one making claim for block 5. Now 'ITWala' receives block 5. He tries to make the chain longer by trying to fit at one end of the fork created from Block 4 from Adam. It does not fit because the hash of the previous block would not match.
Result
Fork at the end of Adam's block is discarded. Fork with Bob's block becomes active chain and thus Bob becomes the winner of 4 as well as 5 for award.
Case 2:
Block 5 is created by 'ITWala' or some peer of ITWala who synced up with copy on ITWala's node.
Result: In this case, ITWala would use Adam's block to activate the best chain as it arrived first making him the winner for block 4. Block 5 is awarded to ITWala
There can be more combinations. However point here is that the block that remains in best chain wins the award.

PLC Object Oriented Programming - Using methods

I'm writing a program for a Schneider PLC using structured text, and I'm trying to do it using object oriented programming.
Being a newbie in PLC programming, I wrote a simple test program such a this:
okFlag:=myObject.aMethod();
IF okFlag THEN
// it's ok, go on
ELSE
// error handling
END_IF
aMethod must perform some operations, wait for the result (there is a "time-out" check to avoid deadlocks) and return TRUE or FALSE
This is what I expected during program execution
1) when the okFlag:=myObject.aMethod(); is reached, the code inside aMethod is executed until a result is returned. When I say "executed" I mean that in the next scan cycle the execution of aMethodcontinues from the point it had reached before.
2) the result of method calling is checked and the main flow of the program is executed
and this is what happens:
1) aMethod is executed but the program flow continues. That is, when it reaches the end of aMethod a value it's returned, even if the events that aMethod should wait for are still executing.
2) on the next cycle, aMethod is called again and restarts from the beginning
This is the first solution I found:
VAR_STATIC
imBusy: BOOL
END_VAR
METHOD aMethod: INT;
IF NOT(imBusy) THEN
imBusy:=FALSE;
aMethod:=-1; // result of method while in progress
ELSE
aMethod:=-1;
<rest of code. If everything is ok, the result is 0, otherwise is 1>
END_IF
imBusy:=aMethod<0;
and the main program:
CASE (myObject.aMethod()) OF
0: // it's ok, go on
1: // error handling
ELSE
// still executing...
END_CASE
and this seems to work, but I don't know if it's the right approach.
There are some libraries from Schneider which use methods that return boolean and seem to work as I expected in my program. That is: when the cycle reaches the call to method for the first time the program flow is "deviated" somehow so that in the next cycle it enters again the method until it's finished. It's there a way to have this behaviour ?

generally OOP isn't the approach that people would take when using IEC61131 languages. Your best bet is probably to implement your code as a state machine. I've used this approach in the past as a way of simplifying a complex sequence so that it is easier for plant maintainers to interpret.
Typically what I would recommend if you are going to take this approach is to try to segregate your state machine itself from your working code; you can implement a state machine of X steps, and then have your working code reference the statemachine step.
A simple example might look like:
stepNo := 0;
IF (start AND stepNo = 0) THEN
StepNo = 1;
END_IF;
(* there's a shortcut unity operation for resetting this array to zeroes which is faster, but I can't remember it off the top of my head... *)
ActiveStepArray := BlankStepArray;
IF stepNo > 0 THEN
IF StepComplete[stepNo] THEN
stepNo := stepNo +1;
END_IF;
ActiveStepArray[stepNo] := true;
END_IF;
Then in other code sections you can put...
IF ActiveStep[1] THEN
(* Do something *)
StepComplete[1] := true;
END_IF;
IF ActiveStep[2] THEN
(* Do Something *)
StepComplete[2] := true;
END_IF;
(* etc *)
The nice thing about this approach is that you can actually put all of the state machine code (including jumps, resets etc) into a DFB, test it and then shelve it, and then just use the active step, step complete, and any other inputs you require.
Your code is still always going to execute an entire section of logic, but if you really want to avoid that then you'll have to use a lot of IF statements, which will impede readability.
Hope that helps.

Why not use SFC it makes your live easier in many cases, since it is state machine language itself. Do subprogram, wait condition do another .. rince and repeat. :)
Don't hang just for ST, the other IEC languages are better in some other tasks and keep thing as clear as possible. There should be not so much "this is my cake" mentality on the industrial PLC programming circles as it is on the many other programming fields, since application timeline can be 40 years and you left the firm 20 years ago to better job and programs are almost always location/customer or atleast hardware specific.
http://www.automation.com/pdf_articles/IEC_Programming_Thayer_L.pdf

How does reverse debugging work?

GDB has a new version out that supports reverse debug (see http://www.gnu.org/software/gdb/news/reversible.html). I got to wondering how that works.
To get reverse debug to work it seems to me that you need to store the entire machine state including memory for each step. This would make performance incredibly slow, not to mention using a lot of memory. How are these problems solved?

I'm a gdb maintainer and one of the authors of the new reverse debugging. I'd be happy to talk about how it works. As several people have speculated, you need to save enough machine state that you can restore later. There are a number of schemes, one of which is to simply save the registers or memory locations that are modified by each machine instruction. Then, to "undo" that instruction, you just revert the data in those registers or memory locations.
Yes, it is expensive, but modern cpus are so fast that when you are interactive anyway (doing stepping or breakpoints), you don't really notice it that much.

Note that you must not forget the use of simulators, virtual machines, and hardware recorders to implement reverse execution.
Another solution to implement it is to trace execution on physical hardware, such as is done by GreenHills and Lauterbach in their hardware-based debuggers. Based on this fixed trace of the action of each instruction, you can then move to any point in the trace by removing the effects of each instruction in turn. Note that this assumes that you can trace all things that affect the state visible in the debugger.
Another way is to use a checkpoint + re-execution method, which is used by VmWare Workstation 6.5 and Virtutech Simics 3.0 (and later), and which seems to be coming with Visual Studio 2010. Here, you use a virtual machine or a simulator to get a level of indirection on the execution of a system. You regularly dump the entire state to disk or memory, and then rely on the simulator being able to deterministically re-execute the exact same program path.
Simplified, it works like this: say that you are at time T in the execution of a system. To go to time T-1, you pick up some checkpoint from point t < T, and then execute (T-t-1) cycles to end up one cycle before where you were. This can be made to work very well, and apply even for workloads that do disk IO, consist of kernel-level code, and performs device driver work. The key is to have a simulator that contains the entire target system, with all its processors, devices, memories, and IOs. See the gdb mailinglist and the discussion following that on the gdb mailing list for more details. I use this approach myself quite regularly to debug tricky code, especially in device drivers and early OS boots.
Another source of information is a Virtutech white paper on checkpointing (which I wrote, in full disclosure).

During an EclipseCon session we also asked how they do this with the Chronon Debugger for Java. That one does not allow you to actually step back, but can play back a recorded program execution in such a way that it feels like reverse debugging. (The main difference is that you cannot change the running program in the Chronon debugger, while you can do that in most other Java debuggers.)
If I understood it correctly, it manipulates the byte code of the running program, such that every change of an internal state of the program is recorded. External states don't need to be recorded additionally. If they influence your program in some way, then you must have an internal variable matching that external state (and therefore that internal variable is enough).
During playback time they can then basically recreate every state of the running program from the recorded state changes.
Interestingly the state changes are much smaller than one would expect on first look. So if you have a conditional "if" statement, you would think that you need at least one bit to record whether the program took the then- or the else-statement. In many cases you can avoid even that, like in the case that those different branches contain a return value. Then it is enough to record only the return value (which would be needed anyway) and to recalculate the decision about the executed branch from the return value itself.

Although this question is old, most of the answers are too, and as reverse-debugging remains an interesting topic, I'm posting a 2015 answer. Chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, covers some of the historical approaches to reverse debugging (especially focused on the snapshot-(or checkpoint)-and-replay approach), and explains the difference between it and omniscient debugging:
The computer, having forward-executed the program up to some point, should really be able to provide us with information about it. Such an improvement is possible, and is found in what are called omniscient debuggers. They are usually classified as reverse debuggers, although they might more accurately be described as "history logging" debuggers, as they merely record information during execution to view or query later, rather than allow the programmer to actually step backwards in time in an executing program. "Omniscient" comes from the fact that the entire state history of the program, having been recorded, is available to the debugger after execution. There is then no need to rerun the program, and no need for manual code instrumentation.
Software-based omniscient debugging started with the 1969 EXDAMS system where it was called "debug-time history-playback". The GNU debugger, GDB, has supported omniscient debugging since 2009, with its 'process record and replay' feature. TotalView, UndoDB and Chronon appear to be the best omniscient debuggers currently available, but are commercial systems. TOD, for Java, appears to be the best open-source alternative, which makes use of partial deterministic replay, as well as partial trace capturing and a distributed database to enable the recording of the large volumes of information involved.
Debuggers that do not merely allow navigation of a recording, but are actually able to step backwards in execution time, also exist. They can more accurately be described as back-in-time, time-travel, bidirectional or reverse debuggers.
The first such system was the 1981 COPE prototype ...

mozilla rr is a more robust alternative to GDB reverse debugging
https://github.com/mozilla/rr
GDB's built-in record and replay has severe limitations, e.g. no support for AVX instructions: gdb reverse debugging fails with "Process record does not support instruction 0xf0d at address"
Upsides of rr:
much more reliable currently. I have tested it relatively long runs of several complex software.
also offers a GDB interface with gdbserver protocol, making it a great replacement
small performance drop for most programs, I haven't noticed it myself without doing measurements
the generated traces are small on disk because only very few non-deterministic events are recorded, I've never had to worry about their size so far
rr achieves this by first running the program in a way that records what happened on every single non-deterministic event such as a thread switch.
Then during the second replay run, it uses that trace file, which is surprisingly small, to reconstruct exactly what happened on the original non-deterministic run but in a deterministic way, either forwards or backwards.
rr was originally developed by Mozilla to help them reproduce timing bugs that showed up on their nightly testing the following day. But the reverse debugging aspect is also fundamental for when you have a bug that only happens hours inside execution, since you often want to step back to examine what previous state led to the later failure.
The following example showcases some of its features, notably the reverse-next, reverse-step and reverse-continue commands.
Install on Ubuntu 18.04:
sudo apt-get install rr linux-tools-common linux-tools-generic linux-cloud-tools-generic
sudo cpupower frequency-set -g performance
# Overcome "rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 3."
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
Test program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int f() {
int i;
i = 0;
i = 1;
i = 2;
return i;
}
int main(void) {
int i;
i = 0;
i = 1;
i = 2;
/* Local call. */
f();
printf("i = %d\n", i);
/* Is randomness completely removed?
* Recently fixed: https://github.com/mozilla/rr/issues/2088 */
i = time(NULL);
printf("time(NULL) = %d\n", i);
return EXIT_SUCCESS;
}
compile and run:
gcc -O0 -ggdb3 -o reverse.out -std=c89 -Wextra reverse.c
rr record ./reverse.out
rr replay
Now you are left inside a GDB session, and you can properly reverse debug:
(rr) break main
Breakpoint 1 at 0x55da250e96b0: file a.c, line 16.
(rr) continue
Continuing.
Breakpoint 1, main () at a.c:16
16 i = 0;
(rr) next
17 i = 1;
(rr) print i
$1 = 0
(rr) next
18 i = 2;
(rr) print i
$2 = 1
(rr) reverse-next
17 i = 1;
(rr) print i
$3 = 0
(rr) next
18 i = 2;
(rr) print i
$4 = 1
(rr) next
21 f();
(rr) step
f () at a.c:7
7 i = 0;
(rr) reverse-step
main () at a.c:21
21 f();
(rr) next
23 printf("i = %d\n", i);
(rr) next
i = 2
27 i = time(NULL);
(rr) reverse-next
23 printf("i = %d\n", i);
(rr) next
i = 2
27 i = time(NULL);
(rr) next
28 printf("time(NULL) = %d\n", i);
(rr) print i
$5 = 1509245372
(rr) reverse-next
27 i = time(NULL);
(rr) next
28 printf("time(NULL) = %d\n", i);
(rr) print i
$6 = 1509245372
(rr) reverse-continue
Continuing.
Breakpoint 1, main () at a.c:16
16 i = 0;
When debugging complex software, you will likely run up to a crash point, and then fall inside a deep frame. In that case, don't forget that to reverse-next on higher frames, you must first:
reverse-finish
up to that frame, just doing the usual up is not enough.
The most serious limitations of rr in my opinion are:
https://github.com/mozilla/rr/issues/2089 you have to do a second replay from scratch, which can be costly if the crash you are trying to debug happens, say, hours into execution
https://github.com/mozilla/rr/issues/1373 x86 only
UndoDB is a commercial alternative to rr: https://undo.io Both are trace / replay based, but I'm not sure how they compare in terms of features and performance.

Nathan Fellman wrote:
But does reverse debugging only allow you to roll back next and step commands that you typed, or does it allow you to undo any number of instructions?
You can undo any number of instructions. You're not restricted to, for instance,
only stopping at the points where you stopped when you were going forward. You can
set a new breakpoint and run backwards to it.
For instance, if I set a breakpoint on an instruction and let it run until then, can I then roll back to the previous instruction, even though I skipped over it?
Yes. So long as you turned on recording mode before you ran to the breakpoint.

Here is how another reverse-debugger called ODB works. Extract:
Omniscient Debugging is the idea of
collecting "time stamps" at each
"point of interest" (setting a value,
making a method call,
throwing/catching an exception) in a
program and then allowing the
programmer to use those time stamps to
explore the history of that program
run.
The ODB ... inserts
code into the program's classes as
they are loaded and when the program
runs, the events are recorded.
I'm guessing the gdb one works in the same kind of way.

Reverse debugging means you can run the program backwards, which is very useful to track down the cause of a problem.
You don't need to store the complete machine state for each step, only the changes. It is probably still quite expensive.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas