Static ELF alternative to dlsym - elf

Is it possible to lookup the location of a function using ELF? Similar to what
void *f = dlopen(NULL,..);
void *func = dlsym(f, "myfunc");
does, but without requiring -rdynamic during compilation?
Using nm, I can see that the names of the items are still present in the compiled binary:
0000000000400716 T lookup
0000000000400759 T main
Can I use this information to locate the items once the program is loaded into memory?

Can I use this information to locate the items once the program is loaded into memory?
You sure can: iterate over all symbols in the a.out until you find the matching one. Example code to iterate over symbols is here. Or use libelf.
If you need to perform multiple symbol lookups, iterate once (slow) over all symbols, build a map from symbol name to its address, and perform lookups using that map.
Update:
The example you point to seems incomplete? It uses data and elf, where are they coming from?
Yes, you need to apply a little bit of elbow grease to that example.
The data is the location in memory where the a.out is read into or (better) mmapped.
You can either mmap the a.out yourself, or find the existing mapping via e.g. getauxval(AT_PHDR) rounded down to page size.
The ehdr is (ElfW(Ehdr) *)data (that is, data cast to Elf32_Ehdr * or Elf64_Ehdr * as appropriate).
If this is not clear, then you probably should just use libelf, which takes care of the details for you.
Also, does ELF only allow me to find the name of the symbol, or can it actually give me the pointer to the in memory location of the symbol?
It can give you both: str + sym[i].st_name is the name, sym[i].st_value is the pointer (the value displayed by nm).
(presumably the e.g. 0000000000400716 is some relative base address, not the actual in memory location, right?)
No, actually (for this binary) it's the absolute address.
Position-independent binaries do use relative addresses (so you'll need something like the getauxval call mentioned above to find the base location of such an executable), but this particular binary looks like ET_EXEC (use readelf -h a.out to verify this). The address 0x400000 is the typical load address for non-PIE executables on Linux x86_64 (which is probably what your system is).
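To make the steps above concrete, here is a minimal sketch for Linux with glibc. It maps /proc/self/exe, walks .symtab, and adds the load base for PIE binaries. The names lookup_self and myfunc are hypothetical, ELF validation is omitted, and the base calculation assumes the program headers live in the first page of the mapping (the "AT_PHDR rounded down to page size" approach mentioned above).

```c
#include <elf.h>
#include <fcntl.h>
#include <link.h>      /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval() */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical target symbol; non-static so it stays a global symbol. */
int myfunc(void) { return 42; }

/* Returns the in-memory address of `name` in the running executable,
 * or 0 if it is not found in .symtab (e.g. the binary was stripped). */
uintptr_t lookup_self(const char *name)
{
    int fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return 0;
    }
    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED)
        return 0;

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *)data;
    const ElfW(Shdr) *shdr = (const ElfW(Shdr) *)(data + ehdr->e_shoff);

    /* ET_EXEC: st_value is already absolute. ET_DYN (PIE): add the load
     * base, assumed here to be AT_PHDR rounded down to a page boundary. */
    uintptr_t base = 0;
    if (ehdr->e_type == ET_DYN) {
        uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
        base = (uintptr_t)getauxval(AT_PHDR) & ~(page - 1);
    }

    uintptr_t addr = 0;
    for (size_t i = 0; i < ehdr->e_shnum && addr == 0; i++) {
        if (shdr[i].sh_type != SHT_SYMTAB)
            continue;
        const ElfW(Sym) *sym = (const ElfW(Sym) *)(data + shdr[i].sh_offset);
        /* sh_link of a symbol table names its string table section. */
        const char *str = (const char *)(data + shdr[shdr[i].sh_link].sh_offset);
        size_t count = shdr[i].sh_size / sizeof(ElfW(Sym));
        for (size_t j = 0; j < count; j++) {
            if (sym[j].st_shndx != SHN_UNDEF &&
                strcmp(str + sym[j].st_name, name) == 0) {
                addr = base + sym[j].st_value;
                break;
            }
        }
    }
    munmap(data, (size_t)st.st_size);
    return addr;
}
```

With this in place, ((int (*)(void))lookup_self("myfunc"))() would call myfunc without -rdynamic, provided the binary has not been stripped.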


Developing a simple bootloader for an Embedded system

I've been tasked with developing a simple bootloader for an embedded system. We are not running any OS or RTOS, so I want it to be really simple.
This code will be stored in a ROM and the processor will begin execution at power on.
My goal is to have a first part written in ASM which would take care of the following operations:
Initialize the processor
Copy the .data segment from ROM to RAM
Clear the .bss segment in RAM
Call main
Main would be obviously written in C and perform higher level operations like self-test etc...
Now what I really don't know how to do is combine these two programs into a single one. I found a crappy tool that basically uses objcopy to gather the .text and .data sections from executables and appends some asm in front, but this seems to be a really ugly way to do it, and I was wondering if someone could point me in the right direction.
You can (in principle) link the object file generated from the assembler code like you would link any object from your program.
The catch is that you need to lay out the generated executable so that your startup code is in the beginning. If you use GNU ld, the way to do that is a linker script.
Primitive setup (not checked for syntax errors):
MEMORY
{
    FLASH (RX) : ORIGIN = 0, LENGTH = 256K
    RAM (RWX) : ORIGIN = 0x40000000, LENGTH = 4M
}
SECTIONS
{
    .bootloader 0 : AT(0) { bootloader.o(.text) } >FLASH AT>FLASH
    .text : { _stext = .; *(.text .text.* .rodata .rodata.*); _etext = .; } >FLASH AT>FLASH
    .data : { _sdata = .; *(.data .data.*); _edata = .; _sdata_load = LOADADDR(.data); } >RAM AT>FLASH
    .bss (NOLOAD) : { _sbss = .; *(.bss .bss.*); _ebss = .; } >RAM
}
The basic idea is to give the linker a rough idea of the memory map, then assign sections from the input files to sections in the final program.
The linker keeps the distinction between the "virtual" and the "load" address for every output section, so you can tell it to generate a binary where the code is relocated for the final addresses, but the layout in the executable is different (here, I tell it to place the .data section in RAM, but append it to the .text section in flash).
Your bootloader can then use the symbols provided (_sdata, _edata, _sdata_load) to find the data section both in RAM and in flash, and copy it.
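As a rough sketch of that copy step in C (the question plans to do it in assembly, so this is only an illustration of the logic): the helper name copy_data_clear_bss is hypothetical, and it assumes the sections are word-aligned and a multiple of four bytes long, which the linker script would have to guarantee.

```c
#include <stdint.h>

/* Copies the initialized data image from its load address (flash) to its
 * run address (RAM), then zeroes .bss. All ranges are half-open. */
void copy_data_clear_bss(const uint32_t *data_load,
                         uint32_t *data_start, uint32_t *data_end,
                         uint32_t *bss_start, uint32_t *bss_end)
{
    const uint32_t *src = data_load;
    for (uint32_t *dst = data_start; dst < data_end; dst++)
        *dst = *src++;
    for (uint32_t *dst = bss_start; dst < bss_end; dst++)
        *dst = 0;
}

/* On target, the linker-script symbols would be passed in like this:
 *     extern uint32_t _sdata[], _edata[], _sdata_load[], _sbss[], _ebss[];
 *     copy_data_clear_bss(_sdata_load, _sdata, _edata, _sbss, _ebss);
 */
```

Taking the bounds as parameters keeps the routine testable on a host machine, where the linker symbols do not exist.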
Final caveat: if your program uses static constructors, you also need a constructor table, and the bootloader needs to call the static constructors.
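A constructor table is typically just an array of function pointers that the startup code walks before calling main. The sketch below assumes that layout; run_constructors and the two demo constructors are hypothetical names, and the __init_array_start / __init_array_end symbols in the comment are a common GNU toolchain convention that the linker script would have to define.

```c
typedef void (*ctor_fn)(void);

/* Calls every entry in [begin, end) in table order. */
void run_constructors(ctor_fn *begin, ctor_fn *end)
{
    for (ctor_fn *p = begin; p < end; p++)
        (*p)();
}

/* Demo constructors (illustration only): they record the call order. */
static int ctor_calls;
static void demo_ctor_a(void) { ctor_calls = ctor_calls * 10 + 1; }
static void demo_ctor_b(void) { ctor_calls = ctor_calls * 10 + 2; }

/* On target the call might look like:
 *     extern ctor_fn __init_array_start[], __init_array_end[];
 *     run_constructors(__init_array_start, __init_array_end);
 */
```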
Simon is right on. There are simpler linker scripts than that which will work just fine for what you are doing, but the bottom line is that the linker takes the objects and makes the binary. Whichever linker you use, you have to understand how to tell it what to do. Unfortunately I don't think there is an industry standard for this; you have to go linker by linker and understand each one. And certainly with GNU ld there are many very complicated linker scripts out there; some folks live to solve things in the linker.

clang compiler produces different object files from same sources

I have a simple hello world objective-c lib:
hello.m:
#import <Foundation/Foundation.h>
#import "hello.h"
void sayHello()
{
#ifdef FRENCH
NSString *helloWorld = @"Hello World!\n";
#else
NSString *helloWorld = @"Bonjour Monde!\n";
#endif
NSFileHandle *stdout = [NSFileHandle fileHandleWithStandardOutput];
NSData *strData = [helloWorld dataUsingEncoding: NSASCIIStringEncoding];
[stdout writeData: strData];
}
the hello.h file looks like this:
int main (int argc, const char * argv[]);
int sum(int a, int b);
void sayHello();
This compiles just fine on osx and linux using clang and gcc.
Now my question:
When running a clean compile of hello.m multiple times with clang on Ubuntu, the generated hello.o can differ. This does not seem to be related to a timestamp, because even after a second or more the generated .o file can have the same checksum. From my naive point of view, this looks like completely random/unpredictable behaviour.
I ran the compilation with -S to inspect the generated assembler code. The assembler code also differs (as expected). The diff of the assembler code can be found here: http://pastebin.com/uY1LERGX
From a first look it just looks like the sorting is different in the assembler code.
This does not happen when compiling it with gcc.
Is there a way to tell clang to generate exactly the same .o file like gcc does?
clang --version:
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
The property of a compiler always producing the same output from the same input is called reproducible builds, or deterministic compilation.
One possible source of instability in compiler output is ASLR (address space layout randomization). Sometimes the compiler, or a library it uses, reads object addresses and uses them, for example as keys of hashes or maps, or when sorting objects by address. When the compiler iterates over such a hash, it visits objects in an order that depends on their addresses, and ASLR places objects at different addresses on each run. The effect may look like your reordered symbols (the .quads in your diffs).
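The failure mode can be illustrated with a small C sketch (not LLVM code; struct symbol and sort_symbols are hypothetical names). Sorting by address gives an order that depends on where the allocator and ASLR happened to place each object, while sorting by content gives the same order on every run.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct symbol { const char *name; };

/* Unstable across runs: the order depends on where each object was placed. */
static int by_address(const void *a, const void *b)
{
    uintptr_t pa = (uintptr_t)*(struct symbol *const *)a;
    uintptr_t pb = (uintptr_t)*(struct symbol *const *)b;
    return (pa > pb) - (pa < pb);
}

/* Stable across runs: the order depends only on the content. */
static int by_name(const void *a, const void *b)
{
    const struct symbol *sa = *(struct symbol *const *)a;
    const struct symbol *sb = *(struct symbol *const *)b;
    return strcmp(sa->name, sb->name);
}

/* Sorts an array of symbol pointers; pass deterministic != 0 to get an
 * order that does not depend on addresses. */
void sort_symbols(struct symbol **syms, size_t n, int deterministic)
{
    qsort(syms, n, sizeof *syms, deterministic ? by_name : by_address);
}
```

A compiler that emits symbols in by_address order can produce differently ordered output from identical input, which matches the kind of diff shown in the question.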
You can disable Linux ASLR globally with echo 0 | sudo tee /proc/sys/kernel/randomize_va_space. A local way of disabling ASLR on Linux is
setarch `uname -m` -R /bin/bash
The setarch man page says: -R, "--addr-no-randomize" Disables randomization of the virtual address space (turns on ADDR_NO_RANDOMIZE).
On OS X 10.6 there is the DYLD_NO_PIE environment variable (see man dyld; possible usage in bash: export DYLD_NO_PIE=1). On 10.7 and newer you can build LLVM itself with the --no_pie flag, set _POSIX_SPAWN_DISABLE_ASLR via posix_spawnattr_setflags before starting LLVM, or use the script http://src.chromium.org/viewvc/chrome/trunk/src/build/mac/change_mach_o_flags.py with the --no-pie option to clear the PIE flag from the LLVM binaries (thanks to the ASan people).
There were some bugs in clang and LLVM which prevent (or prevented) them from being completely deterministic, for example:
[cfe-dev] clang: not deterministic anymore? - Nov 3 2009, indeterminism was detected on code from LLVM bug 5355. Author says that indeterminism was present only with -g option enabled
[LLVMdev] Deterministic code generation and llvm::Iterators (2010)
[llvm-commits] Fix some TableGen non-deterministic behavior. (Sep 2012)
r196520 - Fix non-deterministic behavior. - SLPVectorizer was fixed into deterministic only at Dec 5, 2013 (replaced SmallSet with VectorSet)
190793 - TableGen: give asm match classes deterministic order. "TableGen was sorting the entries in some of its internal data structures by pointer." - Sep 16, 2013
LLVM bug 14901 is the case when order of compiler warnings was Non-deterministic (Jan 2013).
The patch from 14901 contains comments about non-deterministic iterating over llvm::DenseMap:
- typedef llvm::DenseMap<const VarDecl *, std::pair<UsesVec*, bool> > UsesMap;
+ typedef std::pair<UsesVec*, bool> MappedType;
+ // Prefer using MapVector to DenseMap, so that iteration order will be
+ // the same as insertion order. This is needed to obtain a deterministic
+ // order of diagnostics when calling flushDiagnostics().
+ typedef llvm::MapVector<const VarDecl *, MappedType> UsesMap;
...
- // FIXME: This iteration order, and thus the resulting diagnostic order,
- // is nondeterministic.
Documentation of LLVM says that there are non-deterministic and deterministic variants of several internal containers, like Map vs MapVector: trunk/docs/ProgrammersManual.rst:
The difference between SetVector and other sets is that the order of iteration
is guaranteed to match the order of insertion into the SetVector. This property
is really important for things like sets of pointers. Because pointer values
are non-deterministic (e.g. vary across runs of the program on different
machines), iterating over the pointers in the set will not be in a well-defined
order.

The drawback of SetVector is that it requires twice as much space as a normal
set and has the sum of constant factors from the set-like container and the
sequential container that it uses. Use it **only** if you need to iterate over
the elements in a deterministic order.
...
StringMap iteration order, however, is not guaranteed to be deterministic, so
any uses which require that should instead use a std::map.
...
``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
main difference is that the iteration order is guaranteed to be the insertion
order, making it an easy (but somewhat expensive) solution for non-deterministic
iteration over maps of pointers.
It is possible that some LLVM authors thought there was no need to preserve a deterministic iteration order in their code. For example, there are comments in ARMTargetStreamer about the use of MapVector for ConstantPools (ARMTargetStreamer.cpp - class AssemblerConstantPools). But how can we be sure that no use of a non-deterministic container like DenseMap affects the compiler's output? There are dozens of loops iterating over DenseMap: see the "DenseMap.*const_iterator" regex in codesearch.debian.net
Your version of LLVM and clang (3.0, from 2011-11-30) is clearly too old to have all the determinism improvements from 2012 and 2013 (some are listed in my answer). You should update LLVM and Clang, then recheck your program for deterministic compilation. If the problem persists, reduce it to a shorter, easier-to-reproduce example (e.g. save the bitcode, bc, from intermediate stages), and then file a bug in the LLVM Bugzilla.
Try the -S option for clang and gcc when compiling your source. This generates a .s file in which you can see the assembler code, which could give you an idea of what the differences are at a lower level. Maybe you will find that the output is the same, in which case the problem shifts from the compiler further down to the linker.
You should report this as a bug; a compiler certainly should be deterministic.
Your guess about the sort order is quite probably correct, in my experience. Most likely the compiler makes an arbitrary decision when two items compare equal (according to whatever measure is significant; they don't have to be actually the same), and that can vary depending on environmental factors, somehow. I've seen this before, in GCC, in which the same compiler compiled for different host OS produced different results; in that case it turned out that the Windows qsort function operated slightly differently to the Linux (glibc) implementation.
That said, it could be something else; compilers aren't supposed to make random decisions, but there are plenty of opportunities for arbitrary decisions that might turn out to be unstable, somehow (address space randomization, perhaps?)

Does anyone know how to triangulate location using Arduino and SIM900?

I have a SIM900 and an Arduino Leonardo. Using the SIM900.h library I have it all working and receiving text messages, etc. However, I'm wondering how I can use it either to grab all the local tower information, or to grab that information and triangulate the latitude, longitude, etc. from it.
You can get information about the local tower (and a few neighboring towers) with the AT+CENG=2 command. This includes things like the tower ID and signal level. You'll need to know the geographic location of these towers and do the triangulation yourself.
I suggest you take a look at this project: http://www.open-electronics.org/mini-gsm-localizer-without-gps/. It has an open-source firmware that you may find useful.
Here’s the sequence of AT commands needed to get the location of the module:
AT+SAPBR=3,1,"CONTYPE","GPRS" // set bearer parameter
OK
AT+SAPBR=3,1,"APN","internet" // set apn
OK
AT+SAPBR=1,1 // activate bearer context
OK
AT+SAPBR=2,1 // get context ip address
+SAPBR: 1,1,"10.151.43.104"
OK
AT+CIPGSMLOC=1,1 // triangulate
+CIPGSMLOC: 0,19.667806,49.978185,2014/03/20,14:12:27
OK
The location is not accurate, though: the first test gave me coordinates around 4 kilometers away from my place. Usually it’s not that bad, and it is good enough for simple applications.
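On the microcontroller side, the +CIPGSMLOC reply shown above can be split into fields with sscanf. This is a sketch only: struct gsm_loc and parse_cipgsmloc are hypothetical names, and the field order (status code, longitude, latitude, date, time) is an assumption based on the sample response, so check it against your module's AT command manual.

```c
#include <stdio.h>

struct gsm_loc {
    int code;        /* 0 means success in the sample above */
    double lon, lat; /* assumed order: longitude first, then latitude */
    char date[11];   /* "YYYY/MM/DD" plus terminator */
    char time[9];    /* "HH:MM:SS" plus terminator */
};

/* Parses one "+CIPGSMLOC: ..." line. Returns 0 on success, -1 otherwise. */
int parse_cipgsmloc(const char *line, struct gsm_loc *out)
{
    int n = sscanf(line, "+CIPGSMLOC: %d,%lf,%lf,%10[^,],%8s",
                   &out->code, &out->lon, &out->lat, out->date, out->time);
    return (n == 5 && out->code == 0) ? 0 : -1;
}
```

A reply that carries only an error code (no coordinates) makes sscanf match fewer than five fields, so the function reports failure instead of returning stale data.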
You can use the AT+COPS? command to get the location of the tower. The 4-digit number expresses the location; to decode it, you should use the LAC (location area code).
e.g. +CGREG: 1, A9F0, 200D6E
(the second term, A9F0, is the location number of the tower)
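The hexadecimal fields in a line like the one above can be decoded with sscanf and %x. In this sketch, parse_cgreg is a hypothetical name, and reading the third field as the cell ID is an assumption; real modules may also wrap these values in double quotes, which this format string does not handle.

```c
#include <stdio.h>

/* Parses "+CGREG: <stat>, <lac>, <ci>" with lac and ci given in hex.
 * Returns 0 on success, -1 if the line does not match. */
int parse_cgreg(const char *line, int *stat, unsigned *lac, unsigned *ci)
{
    return sscanf(line, "+CGREG: %d , %x , %x", stat, lac, ci) == 3 ? 0 : -1;
}
```

For the sample line, the location field A9F0 decodes to 43504 in decimal.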

Undefined reference on libdc1394

I'm using libdc1394-2.2 for camera Bumblebee2.
However, when I try to release bandwidth with the code below:
if (dc1394_iso_release_bandwidth(camera, val)==DC1394_SUCCESS)
printf( "Successfully released %d bytes of bandwidth\n", val);
it throws the following error:
undefined reference to `dc1394_iso_release_bandwidth'
However, the function 'dc1394_iso_release_bandwidth' is declared in 'iso.h', and this header is included in the main program.
Does anyone know how to solve the problem?
You're correct, that function is indeed listed in the dc1394-2 stream iso.h header file and with no complex conditional compilation which may cause it to not appear in your translation unit.
One thing that may be an issue is the rather common name iso.h - I'd modify your g++ compilation statement to include a -H flag, which should list the headers being loaded up. It's possible that the iso.h header file you're loading is not actually the dc1394 one.
A long shot, I know, but worth checking if only to discount the possibility.

How do I access an integer array within a struct/class from in-line assembly (blackfin dialect) using gcc?

Not very familiar with in-line assembly to begin with, and much less with that of the blackfin processor. I am in the process of migrating a legacy C application over to C++, and ran into a problem this morning regarding the following routine:
//
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
:: "a" ( buffer ), "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
I have a class that contains an array of shorts that is used for audio processing:
class AudProc
{
enum { buffer_size = 512 };
short M_samples[ buffer_size * 2 ];
// remaining part of class omitted for brevity
};
Within the AudProc class I have a method that calls clear_buffer, passing it the samples array:
clear_buffer ( M_samples, sizeof ( M_samples ) / 2 );
This generates a "Bus Error" and aborts the application.
I have tried making the array public, and that produces the same result. I have also tried making it static; that allows the call to go through without error, but no longer allows for multiple instances of my class as each needs its own buffer to work with. Now, my first thought is, it has something to do with where the buffer is in memory, or from where it is being accessed. Does something need to be changed in the in-line assembly to make this work, or in the way it is being called?
Thought that this was similar to what I was trying to accomplish, but it is using a different dialect of asm, and I can't figure out if it is the same problem I am experiencing or not:
GCC extended asm, struct element offset encoding
Anyone know why this is occurring and how to correct it?
Does anyone know where there is helpful documentation regarding the blackfin asm instruction set? I've tried looking on the ADSP site, but to no avail.
I would suspect that you could define your clear_buffer as
inline void clear_buffer (short * buffer, int len) {
memset (buffer, 0, sizeof(short)*len);
}
and probably GCC is able to optimize (when invoked with -O2 or -O3) that cleverly (because GCC knows about memset).
To understand assembly code, I suggest running gcc -S -O -fverbose-asm on some small C file, then to look inside the produced .s file.
I have to take a guess, because I don't know Blackfin assembler:
That LC0 sounds like "loop counter", and LSETUP looks like a macro/insn which sets up a loop between two labels with a certain loop counter.
The "%0" operand is apparently the address to write to, and we can safely guess it's incremented in the loop. In other words, it's both an input and an output operand and should be described as such.
Thus, I suggest describing it as an input-output operand, using the "+" constraint modifier, as follows:
void clear_buffer ( short * buffer, int len ) {
__asm__ (
"/* clear_buffer */\n\t"
"LSETUP (1f, 1f) LC0=%1;\n"
"1:\n\t"
"W [%0++] = %2;"
: "+a" ( buffer )
: "a" ( len ), "d" ( 0 )
: "memory", "LC0", "LT0", "LB0"
);
}
This is, of course, just a hypothesis, but you could disassemble the code and check if by any chance GCC allocated the same register for "%0" and "%2".
PS. Actually, only "+a" should be enough, early-clobber is irrelevant.
For anyone else who runs into a similar circumstance, the problem here was not with the in-line assembly, nor with the way it was being called: it was with the classes / structs in the program. The class that I believed to be the offender was not the problem - there was another class that held an instance of it, and due to other members of that outer class, the inner one was not aligned on a word boundary. This was causing the "Bus Error" that I was experiencing. I had not come across this before because the classes were not declared with __attribute__((packed)) in other code, but they are in my implementation.
Giving "Type Attributes - Using the GNU Compiler Collection (GCC)" a read was what actually sparked the answer for me. Two particular attributes that affect memory alignment (and, thus, in-line assembly such as I am using) are packed and aligned.
As taken from the aforementioned link:
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two that is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler's ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.
The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
packed
This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.
In the following example struct my_packed_struct's members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.
struct my_unpacked_struct
{
char c;
int i;
};
struct __attribute__ ((__packed__)) my_packed_struct
{
char c;
int i;
struct my_unpacked_struct s;
};
You may only specify this attribute on the definition of an enum, struct or union, not on a typedef that does not also define the enumerated type, structure or union.
The problem which I was experiencing was specifically due to the use of packed. I attempted to simply add the aligned attribute to the structs and classes, but the error persisted. Only removing the packed attribute resolved the problem. For now, I am leaving the aligned attribute on them and testing to see if I find any improvements in the efficiency of the code as mentioned above, simply due to their being aligned on word boundaries. The application makes use of arrays of these structures, so perhaps there will be better performance, but only profiling the code will say for certain.
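The effect of packed on a nested member can be checked with offsetof. The sketch below uses hypothetical struct names (it is not the original AudProc code) and assumes a GCC-compatible compiler on a target where short is 2 bytes with 2-byte alignment; is_aligned is a hypothetical helper for checking a buffer pointer at run time before handing it to code that requires aligned accesses.

```c
#include <stddef.h>
#include <stdint.h>

struct inner { short samples[4]; };

/* Packed: no padding after `flag`, so `in` starts at an odd offset. */
struct outer_packed { char flag; struct inner in; } __attribute__((packed));

/* Default layout: the compiler pads after `flag` so `in` is aligned. */
struct outer_plain { char flag; struct inner in; };

/* Returns nonzero if p is a multiple of `align` bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

Under those assumptions, offsetof(struct outer_packed, in) is 1 while offsetof(struct outer_plain, in) is 2, so a 16-bit store into the packed variant's samples can land on an odd address.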