Valgrind not showing source code for the dynamic library - valgrind

I'm trying to debug my program using Valgrind. I compiled with -g3 -O0 -ggdb. However, I am unable to see the source code corresponding to the point where Valgrind finds a problem. The output just shows the name of the (binary) library.

These addresses are of no interest. They belong to the runtime support code that runs after main and calls destructors of global objects and atexit routines. They do not have any source (that you wrote) associated with them.
You can tell that from their placement between exit and __cxa_finalize in the call stack. No user code could possibly belong there.

Silly question, but do you have the source for that library? If not, and that library wasn't compiled with debugging symbols, valgrind isn't going to decompile the binary and show you source.

Valgrind is complaining about a double free on exit. The line:
Address 0x5980ec0 is 0 bytes inside a block of size 29 free'd
Is pointing you to where this memory block was previously freed. Since this also happens during exit, I can think of two possible reasons:
Some global or static variables are being freed twice (in C++ I've seen this problem when directly assigning one global object to another, where both contain pointers and rely on the default copy constructor. Both pointers then refer to the same memory address, which is freed twice on exit).
libslm.so has been loaded using dlopen; on exit it is closed, which can also cause problems with memory that is still being managed.
I'm assuming that libslm.so is yours, so in both scenarios it is important to know something about the lines you marked. Have you checked that the path in the log is the same as where you keep your libraries with debug information? Is AddrScram linked against these libraries (with the exact same path)?

How Can I Detect Register Spilling via an Objdump File?

How can I be aware of register spilling by looking at an objdump file?
My thought is that it can be done by tracking the stack pointer: moving sp outside the function prologue and epilogue indicates register spilling.
I want to know which lines of code are doing register spilling. Also, where are the spilled registers restored from: global variables or the stack?
Register spilling doesn't require moving the stack pointer. A local variable may be spilled to the stack and constantly used directly from there while still in the current frame; the compiler just addresses it by its offset in the stack frame instead of using a register.
Your best bet is just looking for memory addresses that are constantly being read and/or written. This may even happen when registers are available, because of compiler deficiencies or an inability to prove that no other thread or code unit accesses a local variable through its address (for example, if the variable's address is copied somewhere out of scope). In such cases, keeping that variable in memory is necessary.

loading shared library into shared memory

Is there any way I can load a shared library into shared memory in a process so that some other process can simply map that shared memory (to the same address) and simply invoke functions? I understand that the externals in the shared library need to have an additional jump into process-specific memory locations to call into appropriate functions (like the ELF PLT). But is such a thing viable with today's tools?
But is such a thing viable with today's tools?
Not with today's tools, nor ever.
Sure, if your shared library has completely self-contained functions, then it will work. But the moment your library references external data or functions, you will crash and burn.
I understand that the externals in the shared library need to have an additional jump into process-specific memory locations to call into appropriate functions
I don't think you understand. Let's consider an example:
void *foo() { return malloc(1); }
When this is built into a shared library on Linux, the result is:
0x00000000000006d0 <+0>: mov $0x1,%edi
0x00000000000006d5 <+5>: jmpq 0x5c0 <malloc@plt>
and
Dump of assembler code for function malloc@plt:
0x00000000000005c0 <+0>: jmpq *0x200a5a(%rip) # 0x201020 <malloc@got.plt>
0x00000000000005c6 <+6>: pushq $0x1
0x00000000000005cb <+11>: jmpq 0x5a0
So the question is: where will jmpq *0x200a5a(%rip) go in the second process? Answer: one of two places.
If the first process has already called malloc (very likely), then the jmpq will go to the address of malloc in the first process, which is exceedingly unlikely to be the address of malloc in the second process, and more likely to be unmapped, or to be in the middle of some data. Either way, you crash.
If the first process has not yet called malloc, then the jmpq in the second process will jump to the address of the runtime loader's resolver function (ld-linux.so.2 or similar on Linux, ld.so on Solaris). Again, that address is very unlikely to also be the address of the resolver in the second process, and if it's not, you crash.
But it gets worse from here. If by some improbable magic you ended up actually calling malloc in the second process, that malloc is itself very likely to crash, because it will try to use data structures it has set up previously, using memory obtained from sbrk or mmap. These data structures are present in the first process, but not in the second, and so you crash again.

Is it possible to save an objective-c block to a file and later read it from there to use it?

I would like to save an objective-c block to a file (or any other storage e.g. FTP server) and later load it from there and execute it.
From the Blocks Programming Guide > Using Blocks > Copying Blocks, I know that blocks can be stored in the heap. Because anything stored there can be modified, I think that it is possible to read and write arbitrary content from/to the heap and treat the data as a block.
My problem is, how do you save a block to a file? I don't even know what its structure is/how many bytes it covers. I highly doubt that doing a sizeof() and then reading/writing as many bytes is sufficient. Please help me in finding a start to read and write blocks to/from memory and to understand how they are composed.
Let's start from this code:
void (^myBlock)(void) = ^{ printf("Hello, I'm a Block\n"); };
printf("block size: %zu\n", sizeof(myBlock));
myBlock();
Output:
block size: 4
Hello, I'm a Block
As you can imagine, if this works, a long list of fascinating concepts could be implemented in iOS. Just to name a few:
Downloading executable code (as a block) from the web on the fly, storing it in the heap, and executing it, thus making dynamically linked libraries possible in iOS. From this idea, many more possibilities spawn which are simply too many to write in here.
Compiling code in-app and executing it immediately, thus enabling any kind of natively executed scripting language in iOS apps.
Manipulating code at runtime on the machine level in iOS. This is an important topic for AI and evolutionary/random algorithms.
A block object can be stored in the heap. But a block object itself, like other objects, does not contain executable code -- it only contains captured variables, some metadata, and a pointer to the underlying function that is executed. Even if you could hypothetically serialize block objects, you could only unserialize them on a system that has implemented the same block, i.e. has the same executable code.
To make an analogy, what you are saying applies equally with a normal Objective-C object -- Objective-C objects exist on the heap, you can serialize many Objective-C objects, and Objective-C objects contain executable "methods" that you can call on them. Does that mean you can "download executable code (as an object) from the web on the fly, storing it in the heap, and call methods on it, thus making dynamically linked libraries possible in iOS."? Of course not. You can only potentially unserialize objects on a system that has the same class.
It is not possible:
When you copy the block to the heap, you are copying the address of the block itself, not the code of the block.
Moreover, running code that was not compiled and signed with the app goes against the concept of the sandbox, and it would open the possibility of running malicious code in your app, breaking its security.
You could implement a custom language interpreter in your app to run interpreted code, but it would be against Apple's policy and would be rejected during the review process.

Garbage value undetected in debug mode

I've recently discovered the following in my code:
for (NSInteger i; i<x; i++){
...
}
Now, clearly, i should have been initialised. What I find strange is that in the "debug" profile (Xcode), this error goes undetected and the for loop executes without issue. When the application is built using the "release" profile, a crash occurs.
What flags are responsible for letting this kind of mistake execute in "debug" profile?
Thanks in advance.
This could be considered a Heisenbug. A declaration without an initialization will typically allocate some space in the stack frame for the variable and if you read the variable you will get whatever happened to be at that location in memory. When compiled for the debug profile the storage for variables can shift around compared to release. It just happens that whatever is in that location in memory for debug mode does not cause a crash (probably a positive number) but when in release mode it is some value that causes a crash (probably a negative number).
The clang static analyser should detect this. I have the analyse when building option switched on always.
In the C language, using an uninitialized variable isn't an error but undefined behavior.
Undefined behavior exists because C is designed to be a very efficient low-level language. Using an uninitialized variable is undefined behavior because it allows the compiler to optimize the variable allocation, as no default value is required.
But the compiler is licensed to do whatever it wants when undefined behavior occurs. The C Standard FAQ says:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
So any implementation of an undefined behavior is valid (even if it produces code that formats your hard drive).
Xcode uses different optimizations for Debug and Release configurations. The Debug configuration has no optimization (-O0 flag), so the compiled executable stays close to your code, allowing you to debug it more easily. On the other hand, the Release configuration produces strongly optimized executables (-Os flag) because you want your application to run fast.
Due to that difference, undefined behavior may (or may not) produce different results in Release and Debug configurations.
Though the LLVM compiler is quite verbose, it does not emit warnings by default for undefined behavior. You may however run the static analyzer, which can detect this kind of issue.
More information about undefined behaviors and how they are handled by compilers in What Every Programmer Should Know About Undefined Behavior.
I doubt it is so much flags as the compiler optimizing out the "unused" variable i. Release mode includes far more optimizations than debug mode.
Different compiler optimizations may or may not use a different memory location or register for your uninitialized variable. Different garbage (perhaps from previously used variables, computations or addresses used by your app) will be left in these different locations before you start using the variable.
The "responsibility" goes to not initializing the variable, as what garbage is left in what locations may not be visible to the compiler, especially in debug mode with most optimizations off (e.g. you got "lucky" with the debug build).
i has not been initialized. You are only declaring the variable i, not initializing it.
Writing just NSInteger i; declares a variable but does not initialize it.
You can initialize the variable with the code below.
for (NSInteger i=1; i<x; i++){
...
}

How does it know where my value is in memory?

When I write a program and tell it int c=5, it puts the value 5 into a little bit of its memory, but how does it remember which one? The only way I could think of would be to have another bit of memory to tell it, but then it would have to remember where it kept that as well, so how does it remember where everything is?
Your code gets compiled before execution; at that step your variable is replaced by the actual reference to the space where the value will be stored.
This at least is the general principle. In reality it is far more complicated, but it is still the same basic idea.
There are lots of good answers here, but they all seem to miss one important point that I think was the main thrust of the OP's question, so here goes. I'm talking about compiled languages like C++, interpreted ones are much more complex.
When compiling your program, the compiler examines your code to find all the variables. Some variables are going to be global (or static), and some are going to be local. For the static variables, it assigns them fixed memory addresses. These addresses are likely to be sequential, and they start at some specific value. Due to the segmentation of memory on most architectures (and the virtual memory mechanisms), every application can (potentially) use the same memory addresses. Thus, if we assume the memory space programs are allowed to use starts at 0 for our example, every program you compile will put the first global variable at location 0. If that variable was 4 bytes, the next one would be at location 4, etc. These won't conflict with other programs running on your system because they're actually being mapped to an arbitrary sequential section of memory at run time. This is why it can assign a fixed address at compile time without worrying about hitting other programs.
For local variables, instead of being assigned a fixed address, they're assigned a fixed address relative to the stack pointer (which is usually a register). When a function is called that allocates variables on the stack, the stack pointer is simply moved by the required number of bytes, creating a gap in the used bytes on the stack. All the local variables are assigned fixed offsets to the stack pointer that put them into that gap. Every time a local variable is used, the real memory address is calculated by adding the stack pointer and the offset (neglecting caching values in registers). When the function returns, the stack pointer is reset to the way it was before the function was called, thus the entire stack frame including local variables is free to be overwritten by the next function call.
Read Variable (programming) - Memory allocation:
http://en.wikipedia.org/wiki/Variable_(programming)#Memory_allocation
Here is the text from the link (in case you don't want to go there, though you will miss all the links within the text):
The specifics of variable allocation and the representation of their values vary widely, both among programming languages and among implementations of a given language. Many language implementations allocate space for local variables, whose extent lasts for a single function call on the call stack, and whose memory is automatically reclaimed when the function returns. More generally, in name binding, the name of a variable is bound to the address of some particular block (contiguous sequence) of bytes in memory, and operations on the variable manipulate that block. Referencing is more common for variables whose values have large or unknown sizes when the code is compiled. Such variables reference the location of the value instead of storing the value itself, which is allocated from a pool of memory called the heap.
Bound variables have values. A value, however, is an abstraction, an idea; in implementation, a value is represented by some data object, which is stored somewhere in computer memory. The program, or the runtime environment, must set aside memory for each data object and, since memory is finite, ensure that this memory is yielded for reuse when the object is no longer needed to represent some variable's value.
Objects allocated from the heap must be reclaimed, especially when the objects are no longer needed. In a garbage-collected language (such as C#, Java, and Lisp), the runtime environment automatically reclaims objects when extant variables can no longer refer to them. In non-garbage-collected languages, such as C, the program (and the programmer) must explicitly allocate memory, and then later free it, to reclaim its memory. Failure to do so leads to memory leaks, in which the heap is depleted as the program runs, risking eventual failure from exhausting available memory.
When a variable refers to a data structure created dynamically, some of its components may be only indirectly accessed through the variable. In such circumstances, garbage collectors (or analogous program features in languages that lack garbage collectors) must deal with a case where only a portion of the memory reachable from the variable needs to be reclaimed.
There's a multi-step dance that turns c = 5 into machine instructions to update a location in memory.
The compiler generates code in two parts. There's the instruction part (load a register with the address of C; load a register with the literal 5; store). And there's a data allocation part (leave 4 bytes of room at offset 0 for a variable known as "C").
A "linking loader" has to put this stuff into memory in a way that the OS will be able to run it. The loader requests memory and the OS allocates some blocks of virtual memory. The OS also maps the virtual memory to physical memory through an unrelated set of management mechanisms.
The loader puts the data page into one place and instruction part into another place. Notice that the instructions use relative addresses (an offset of 0 into the data page). The loader provides the actual location of the data page so that the instructions can resolve the real address.
When the actual "store" instruction is executed, the OS has to see if the referenced data page is actually in physical memory. It may be in the swap file and have to get loaded into physical memory. The virtual address being used is translated to a physical address of memory locations.
It's built into the program.
Basically, when a program is compiled into machine language, it becomes a series of instructions. Some instructions have memory addresses built into them, and this is the "end of the chain", so to speak. The compiler decides where each variable will be and burns this information into the executable file. (Remember the compiler is a DIFFERENT program to the program you are writing; just concentrate on how your own program works for the moment.)
For example,
ADD [1A56], 15
might add 15 to the value at location 1A56. (This instruction would be encoded using some code that the processor understands, but I won't explain that.)
Now, other instructions let you use a "variable" memory address - a memory address that was itself loaded from some location. This is the basis of pointers in C. You certainly can't have an infinite chain of these, otherwise you would run out of memory.
I hope that clears things up.
I'm going to phrase my response in very basic terminology. Please don't be insulted, I'm just not sure how proficient you already are and want to provide an answer acceptable to someone who could be a total beginner.
You aren't actually that far off in your assumption. The program you run your code through, usually called a compiler (or interpreter, depending on the language), keeps track of all the variables you use. You can think of your variables as a series of bins, and the individual pieces of data are kept inside these bins. The bins have labels on them, and when you build your source code into a program you can run, all of the labels are carried forward. The compiler takes care of this for you, so when you run the program, the proper things are fetched from their respective bin.
The variables you use are just another layer of labels. This makes things easier for you to keep track of. The way the variables are stored internally may have very complex or cryptic labels on them, but all you need to worry about is how you are referring to them in your code. Stay consistent, use good variable names, and keep track of what you're doing with your variables and the compiler/interpreter takes care of handling the low level tasks associated with that. This is a very simple, basic case of variable usage with memory.
You should study pointers.
http://home.netcom.com/~tjensen/ptr/ch1x.htm
Reduced to the bare metal, a variable lookup either reduces to an address that is some statically known offset to a base pointer held in a register (the stack pointer), or it is a constant address (global variable).
In an interpreted language, one register is often reserved to hold a pointer to a data structure (the "environment") that associates variable names with their current values.
Computers ultimately only understand on and off, which we conveniently abstract to binary. This language is the basest level and is called machine language. I'm not sure if this is folklore, but some programmers used to (or maybe still do) program directly in machine language. Typing or reading in binary would be very cumbersome, which is why hexadecimal is often used to abbreviate the actual binary.
Because most of us are not savants, machine language is abstracted into assembly language. Assembly is a very primitive language that directly controls memory. There are a very limited number of commands (push/pop/add/goto), but these ultimately accomplish everything that is programmed. Different machine architectures have different versions of assembly, but the gist is that there are a small number of key registers (physically in the CPU); in an x86 architecture they are EAX, EBX, ECX, EDX, ... These contain data or pointers that the CPU uses to figure out what to do next. The CPU can only do one thing at a time, and it uses these registers to figure out what to do next. Computers seem to be able to do lots of things simultaneously because the CPU can process these instructions very quickly (millions or billions of instructions per second). Of course, multi-core processors complicate things, but let's not go there...
Because most of us are not smart or accurate enough to program in assembly, where you can easily crash the system, assembly is further abstracted into a 3rd generation language (3GL): this is your C/C++/C#/Java etc. When you tell one of these languages to put the integer value 5 in a variable, your instructions are stored as text; the compiler translates your text into machine code (an executable); when the program is executed, the program and its instructions are queued for the CPU, and when it is show time for that specific line of code, it gets read into the CPU registers and processed.
The 'not smart enough' comments about the languages are a bit tongue-in-cheek. Theoretically, the further you get away from zeros and ones to plain human language, the more quickly and efficiently you should be able to produce code.
There is an important mistake that a few people make here, which is assuming that all variables are stored in memory. Unless you count the CPU registers as memory, this isn't completely right. Some compilers optimize the generated code, and if they can keep a variable in a register, they will.
Then, of course, there's the complex matter of heap and stack memory. Local variables can be located in both! The preferred location would be in the stack, which is accessed way more often than the heap. This is the case for almost all local variables. Global variables are often part of the data segment of the final executable and tend to become part of the heap, although you can't release these global memory areas. But the heap is often used for on-the-fly allocations of new memory blocks, by allocating memory for them.
But with Global variables, the code will know exactly where they are and thus write their exact location in the code. (Well, their location from the beginning of the data segment anyways.) Register variables are located in the CPU and the compiler knows exactly which register, which is also just told to the code. Stack variables are located at an offset from the current stack pointer. This stack pointer will increase and decrease all the time, depending on the number of levels of procedures calling other procedures.
Only heap values are complex. When the application needs to store data on the heap, it needs a second variable to store its address, otherwise it could lose track of it. This second variable is called a pointer and is located in global data or on the stack. (Or, on rare occasions, in the CPU registers.)
Oh, it's even a bit more complex than this, but already I can see some eyes rolling due to this information overkill. :-)
Think of memory as a drawer that you divide up according to your needs as they arise.
When you declare a variable of type integer or any other type, the compiler or interpreter (whichever) allocates a memory address in its Data Segment (DS register in assembler) and reserves a certain number of following addresses depending on your type's length in bits.
As per your question, an integer is 32 bits long, so, from one given address, let's say D003F8AC, the 32 bits following this address will be reserved for your declared integer.
At compile time, wherever you reference your variable, the generated assembler code will replace it with its DS address. So, when you get the value of your variable C, the processor queries the address D003F8AC and retrieves it.
Hope this helps, since you already have many answers. :-)