Requirement of symbol table in the final executable? - symbol-table

I read that the Windows Portable Executable (PE) format contains a symbol table. I understand why a symbol table would be required during the semantic analysis phase of compilation and also during code generation. But I don't understand why the final executable itself should contain a symbol table, since the addresses have already been mapped into the generated code by this stage. What am I missing?

I can't really speak specifically for PE, but I'd imagine it's similar to the situation for ELF, where there are two different symbol tables to speak of:
The "ordinary" symbol table (the one that one would normally refer to as "the symbol table") is optional in the final executable. If it's present, it's used by debuggers and other programs that inspect a program with symbolic information. It is normally generated by the linker, but can be, and often is, stripped away afterwards to reduce the file size.
The dynamic symbol table is used for linking against DSOs at runtime, and as such needs to be present for executables that use dynamic linking. It only lists the external symbols that the executable needs (or wants to publicize, which is also possible), however; not every symbol that was present inside it during linking.
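As a rough illustration of the two tables in ELF terms, the Python sketch below scans an ELF image's section headers and reports which of the two symbol tables are present. The function name symbol_table_kinds is made up for this example; the header offsets and the section type values (SHT_SYMTAB = 2, SHT_DYNSYM = 11) come from the ELF specification. It only looks at section headers and ignores corner cases such as extended section numbering, so treat it as a sketch rather than a full ELF reader.

```python
import struct

SHT_SYMTAB = 2   # the "ordinary" symbol table (.symtab); optional, can be stripped
SHT_DYNSYM = 11  # the dynamic symbol table (.dynsym); used for runtime linking


def symbol_table_kinds(image: bytes) -> set:
    """Return a subset of {"symtab", "dynsym"} found in an ELF image.

    Hypothetical helper for illustration only.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64bit = image[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if image[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big

    # Location, entry size and count of the section header table.
    if is_64bit:
        (shoff,) = struct.unpack_from(endian + "Q", image, 40)
        shentsize, shnum = struct.unpack_from(endian + "HH", image, 58)
    else:
        (shoff,) = struct.unpack_from(endian + "I", image, 32)
        shentsize, shnum = struct.unpack_from(endian + "HH", image, 46)

    kinds = set()
    for index in range(shnum):
        # sh_type is the second 32-bit field of every section header.
        (sh_type,) = struct.unpack_from(
            endian + "I", image, shoff + index * shentsize + 4
        )
        if sh_type == SHT_SYMTAB:
            kinds.add("symtab")
        elif sh_type == SHT_DYNSYM:
            kinds.add("dynsym")
    return kinds
```

Calling symbol_table_kinds(open(path, "rb").read()) on a dynamically linked executable that has been run through strip would be expected to report only "dynsym", which matches the description above.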

Nested library namespace in CMake

When I see CMake libraries with namespaces, they are always in the form Parent::Component.
If I have a sufficiently large library, there may be subsections of that library that have components. I am wondering if it is possible/appropriate to do something like ParentProject::Subgrouping::SpecificComponent or, for a more real-world example, Raytracing::Math::Utils.
In short, can I use multiple namespaces in a CMake library name? If it is possible, is it a good idea?
In short, can I use multiple namespaces in a CMake library name?
Yes. A colon (:) is just like any other character in a CMake target name. However, the target_link_libraries command will interpret any argument containing :: in its name as a proper CMake target, rather than as a potential system library. So if you mis-type a target name or it otherwise doesn't exist, you'll get a useful error at configure time, rather than a broken build.
Having multiple instances of :: in the name behaves the same as having just one.
If it is possible, is it a good idea?
It's about as good an idea as nested namespaces are in C++. If it makes sense, do it. The only minor difference is that CMake has no using namespace equivalent, so they're slightly less convenient to type.
In several of my projects, I use a namespace like Project::Tools:: to hold any build-time tools (like custom code generators) that need to be built separately for the sake of cross-compilation (when CMAKE_CROSSCOMPILING_EMULATOR is not an option).

How to reuse Fortran modules without copying source or creating libraries

I'm having trouble understanding if/how to share code among several Fortran projects without building libraries or duplicating source code.
I am using Eclipse/Photran with the Intel compiler (ifort) on a Linux system, but I believe I'm having a bigger conceptual problem with modules than with the specific tools.
Here's a simple example: In ~/workspace/cow I have a source directory (src) containing cow.f90 (the PROGRAM) and two modules m_graze and m_moo in m_graze.f90 and m_moo.f90, respectively. This project builds and links properly to create the executable 'cow'. The executable and modules (m_graze.mod and m_moo.mod) are stored in ~/workspace/cow/Debug and object files are stored under ~/workspace/cow/Debug/src
Later, I create ~/workspace/sheep and have src/sheep.f90 as the program and src/m_baa.f90 as the module m_baa. I want to 'use m_graze, only: ruminate' in sheep.f90 to get access to the ruminate() subroutine. I could just copy m_graze.f90, but that could lead to code getting out of sync and doesn't take into account any dependencies m_graze might have. For these reasons, I'd rather leave m_graze in the cow project and compile and link sheep.f90 against it.
If I try to compile the sheep project, I'll get an error like:
error #7002: Error in opening the compiled module file. Check INCLUDE paths. [M_GRAZE]
Under Properties:Project References for sheep, I can select the cow project. Under Properties:Fortran Build:Settings:Intel Compiler:Preprocessor I can add ~/workspace/cow/Debug (location of the module files) to the list of include directories so the compiler now finds the cow modules and compiles sheep.f90. However the linker dies with something like:
Building target: sheep
Invoking: Intel(R) Fortran Linker
ifort -L/home/me/workspace/cow/Debug -o "sheep" ./src/sheep.o
./src/sheep.o: In function `sheep':
/home/me/workspace/sheep/src/sheep.f90:11: undefined reference to `m_graze_mp_ruminate_'
This would normally be solved by adding libraries and library paths to the linker settings except there are no appropriate libraries to link to (this is Fortran, not C.)
The cow project was perfectly capable of compiling and linking together cow.f90, m_graze.f90 and m_moo.f90 into an executable. Yet while the sheep project can compile sheep.f90 and m_baa.f90 and can find the module m_graze.mod, it can't seem to find the symbols for m_graze even though all the requisite information is present on the system for it to do so.
It would seem to be an easy matter of configuration to get the linker portion of ifort to find the missing pieces and put them together but I have no idea what magic words need to be entered where in the Photran UI to make this happen.
I confess an utter lack of interest and competence in C and the C build process and I'd rather avoid the diversion of creating libraries (.a or .so) unless that's the only way to make this work.
Ultimately, I'm looking for a pure Fortran solution to this problem so I can keep a single copy of the source code and don't have to manually maintain a pile of custom Makefiles.
So can this be done?
Apologies if this has already been documented somewhere; Google is only showing me simple build examples, how to create modules, and how to link with existing libraries. There don't seem to be (m)any examples of code reuse with modules that don't involve duplicating source code.
Edit
As respondents have pointed out, the .mod files are necessary but not sufficient; either object code (in the form of m_graze.o) or static or shared libraries must be specified during the linking phase. The .mod files describe the interface to the object code/library but both are necessary to build the final executable.
For an oversimplified toy problem such as this, that's sufficient to answer the question as posed.
In a larger project with more complex dependencies (in my case, 80+KLOC of F90 linking to the MKL version of LAPACK95), the IDE or toolchain may lack sufficient automatic or user-interface facilities to make sharing a single canonical set of source files a viable strategy. The choice seems to be between risking duplicate source files getting out of sync, giving up many of the benefits of an IDE (i.e. avoiding manual creation of make/CMake/SCons files), or, in all likelihood, both. While a revision control system and good code organization can help, it's clear that sharing a single canonical set of source files among projects is far from easy given the current state of Eclipse.
Some background which I suspect you already know: Typically (including ifort) compiling the source code for a Fortran module results in two outputs - a "mod" file that contains a description of the Fortran entities that the module defines that the compiler needs to find whenever it sees a USE statement for the module, and object code for the linker that implements the procedures and variable storage, etc., that the module defines.
Your first error (the one you solved) is because the compiler couldn't find the mod file.
The second error is because the linker hasn't been told about the object code that implements the stuff that was in the source file with the module. I'm not an Eclipse user by any means, but a brute force way of specifying that is just to add the object file (xxxxx/Debug/m_graze.o) as an additional linker option (Fortran Build > Settings, under Intel Fortran Linker > Command Line). (Other tool chains have explicit "additional object file" properties for their link stage - there may well be a better way of doing this for the Intel chain.)
For more involved examples you would typically create a library out of the shared code. That's not really C-specific; the only Fortran aspect is that the library's archive of object code needs to be provided alongside the mod files that the Fortran compiler generates.
Yes, the object code must be provided. E.g., when you install libnetcdf-dev in Debian (apt-get install libnetcdf-dev), there is a /usr/include/netcdf.mod file that is included.
You can now use all netcdf routines in your Fortran code. E.g.,
program main
use netcdf
...
end
but you'll have to link against the netcdf shared (or static) library, i.e.,
gfortran -I/usr/include/ main.f90 -lnetcdff
However, as user MSB mentioned the mod file can only be used by gfortran that comes with the distribution (apt-get install gfortran). If you want to use any other compiler (even a different version that you may have installed yourself) then you'll have to build netcdf yourself using that particular compiler.
So creating a library is not a bad solution.

static versus shared libraries in small embedded systems using C without OS (assuming XIP)

Do small embedded systems without an RTOS/OS use dynamic/shared libraries? My understanding is that they are very tough to use there and would not be productive.
If we call an API that is present in a static library multiple times, will the API code be placed at every call location, like a macro expansion, or will the code/text be common to all calls? I think the code/text will be common.
If I have made a static library from a .c file which has multiple APIs, and I statically link it with the main file, and the main file calls only one API, is the whole library included in the final .bin or only that particular API's code?
From the above questions you can assume that I am missing the fundamentals themselves, so can anyone please provide related links to brush up on these?
Regards
[edit]
I have tried following things
addition.c module
int addition(int a,int b)
{
int result;
result = a + b;
return result;
}
size addition.o
23 0 0 23 17 addition.o
multiplication.c module
int multiplication(int a, int b)
{
int result;
result = a * b;
return result;
}
size multiplication.o
21 0 0 21 15 multiplication.o
created object files of both and put them in an archive
ar cr libarith.a addition.o multiplication.o
then statically linked to my main application
example.c module
#include "header.h"
#include <stdio.h>
1:int main()
2:{
3:int result;
4:result = addition(1,2);
5:printf("addition result is : %d\n",result);
6:result = multiplication(3,2);
7:printf("multiplication result is : %d\n",result);
8:return 0;
9:}
gcc -static example.c -L. -larith -o example
size of example
511141 1928 7052 520121 7efb9 example
commented out line number 6 of example.c
and linked again
gcc -static example.c -L. -larith -o example
size of example
511109 1928 7052 520089 7ef99 example
32 bytes of difference between the above two
that means multiplication.o is not included in example
merged both modules addition.c and multiplication.c into addmult.c as below
int addition(int a,int b)
{
int result;
result = a + b;
return result;
}
int multiplication(int a, int b)
{
int result;
result = a * b;
return result;
}
created object file and put it in an archive
before doing that I deleted the previous archive
ar cr libarith.a addmult.o
now commented out line number 6 of example.c
gcc -static example.c -L. -larith -o example
size example
511093 1928 7052 520073 7ef89 example
uncommented line number 6 of example.c
size example
511141 1928 7052 520121 7efb9 example
My question is: in both cases, if both functions are called, the final text size is the same, but if only one function is called there is a difference of 16.
But the multiplication.o size is 21, so it has definitely not been included; how do we account for the 16?
Am I missing some fundamental here?
To dynamically load and link a library at runtime requires code to perform the load/link operation. That capability is normally part of an operating system. Moreover in a system without mass-storage of some kind, dynamic linking would not have any benefits since the dynamically linked code would have to exist in memory in any case so may as well have been statically linked.
To answer the second part of your question, a static library is simply a collection of object files in an archive. The linker will only extract and link the object code necessary to resolve symbols referenced in the executable as a whole. Some smart linkers can discard unused functions from within an object file, but you should not rely on that.
So by linking a static library you are not including all the unused code in the library. You can probably tell that by comparing the size of all your library files with the size of the executable binary - you will probably see that your executable is far smaller than the sum of the sizes of the libraries linked. Also your linker will have an option to create a map file which will tell you exactly what code has been included, and if it has a cross-reference output facility, what code references or is referenced by what.
If you are building your own static libraries, or even your own non-library code, it will pay to ensure good granularity at the object file level. For example if an object file contains two functions, one used and one unused, most linkers will have no choice but to include both, whereas if the functions are defined in separate compilation units (source files), then they will be in separate object files (even when collated into a library) and can be separately linked.
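The member-by-member behaviour described above can be modelled with a small Python sketch. This is a toy model, not how any particular linker is implemented: the link function and the tuple layout used for archive members are invented for illustration, and it assumes a linker that always takes whole object files and never discards unused functions inside them.

```python
def link(needed, archive):
    """Toy model of linking against a static library.

    needed  -- symbols the main program references but does not define
    archive -- list of (member_name, defined_symbols, referenced_symbols)
    Returns (members pulled in, all library symbols that end up in the image).
    """
    undefined = set(needed)
    linked_symbols = set()
    pulled = []
    changed = True
    while changed:  # rescan until no member resolves anything new
        changed = False
        for name, defines, references in archive:
            if name in pulled or not (defines & undefined):
                continue
            # A member is all-or-nothing: every symbol it defines comes along.
            pulled.append(name)
            linked_symbols |= defines
            undefined = (undefined | references) - linked_symbols
            changed = True
    return pulled, linked_symbols


# One function per object file, as with addition.o and multiplication.o.
separate = [
    ("addition.o", {"addition"}, set()),
    ("multiplication.o", {"multiplication"}, set()),
]
# Both functions in a single object file, as with addmult.o.
merged = [("addmult.o", {"addition", "multiplication"}, set())]

print(link({"addition"}, separate))
print(link({"addition"}, merged))
```

With the separate archive only addition.o is pulled in; with the merged archive the unused multiplication function comes along because it lives in the same member. A map file from the real linker remains the authoritative way to check what your own toolchain does.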
If you really have an embedded system without any operating system, then your hardware essentially has fixed software, which you can change only by physical means (e.g. a soldering iron, or plugging something in, etc.). In that case, that software runs on the "bare iron" and does what an OS would otherwise provide (it manages the physical resources and interacts directly with the I/O ports through appropriate machine instructions).
In particular, an embedded system without any OS cannot have any kind of dynamic libraries, because by definition these libraries need to be inside some files (on the embedded processor), and to have files you need an operating system.
The exact definition of an operating system is debatable and fuzzy; I believe that providing a file system is one of the roles of most current OSes.
Since shared libraries (or static libraries) are libraries sitting inside some files, you cannot have them without an OS. Something which provides files is by definition an operating system.
Perhaps you are using a cross-development chain to develop your embedded software. If you want to get something which runs on the bare metal, your chain has to ultimately give a single binary image which you can flash into a ROM, then solder or plug that ROM -or transfer somehow physically- in your embedded hardware (some tools enable you to flash an entire self contained processor).
I believe you might be confused, and you should read more about operating systems, kernels, the Linux kernel, file systems, syscalls, RTOS, linkers & loaders, cross-compilers, microcontrollers, shared libraries, dynamic linkers ...
As Clifford suggested in comments, you could have an embedded system with some file system and some dynamic linker; in my view that would make an embryonic operating system, but it is a debatable matter of definition.
Notice that making a dynamic linker might not be an easy task (you'll need to do relocation); you could either make a generic ELF dynamic loader, or you could restrict the form of the dynamically loaded modules, and perhaps use your specific ld script to generate them.
You already have all the fundamentals you need. Without an operating system, mass storage (disc, filesystem, etc.) and multiple/many different programs that can take advantage of the shared library, it doesn't make any sense. You don't save anything, and it probably costs you a little more if you were to fake it enough to use a shared library in a fixed bare metal environment.
You mentioned having codesourcery; how do you learn these things? You disassemble your binaries and see what the compiler did. Does it link the entire gcc library because you used one divide? Does it link the entire C library because you used one function? (Does it even work to try to link a C library function? Many have system calls to an operating system which you have to resolve.) Start by using a simple divide in a very simple function (it needs to be generic):
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a/b);
}
DO NOT call that function with fixed constants and do not call it from the same .c file; the best thing would be to simply add that function as is and do nothing else with it, just have it sit there. You may hit problems even trying to compile it; once it builds, disassemble and see what the compiler did with it, and see if the entire gcc library was added or just the code for that one function.
You can't trust any old web page or resource, as it may not cover the same tools you are using and may be outdated; the compiler you are using right now is the one that matters, right now, no other. And the answers are all right there in front of you.
No, they don't use dynamic libraries; the functions needed are linked in as needed. The optimizer may choose to inline some code, but in general the code for each function is in one place and each call to it is a call; it is not like a macro, in general. Again, the optimizer may choose otherwise for performance reasons (functions small enough that inlining doesn't consume too much memory, and small enough that the code required to make a function call is excessive compared to the function itself). Also, that function needs to be in the same optimization space; for gcc this is the same .c file, for llvm this could be any code in the project.
I have some examples, cortex-m and others, bare metal, at http://github.com/dwelch67; you may find some that help answer your questions. Examine, for example, that the compiler will implement a public function like the one above AND inline it when used. If you declare the function as static, then the optimizer, if it inlines, doesn't need to implement the function in the binary. If you make a call to a function like that in the same .c file, for example
c = fun(10,5);
there is a good chance that the optimizer, if used, will replace that code with
c = 2;
and not perform the divide at all.

Library symbols and user symbols in ELF

My question is related to symbols in an ELF file. As we know, an ELF symbol table holds the information needed to locate and relocate a program’s symbolic definitions and references.
My question is: can we differentiate between a library symbol and a user-defined symbol (if both are global)? Consider the scenario in which no source code is available and you have only the ELF file.
A static library is just an archive of unlinked object files (.o) (with an index to speed up the linker's search for symbols in it). When you link against such a library, the linker takes each unresolved symbol and tries to find it there. If it finds it, it extracts the corresponding object and adds it to the collection to link. So no, you can't tell whether a symbol comes from a static library.
If you have another instance of the library that is sufficiently close to what the executable was linked against, you could look at which symbols it defines and then assume that all those symbols, plus any symbols those depend on, come from the library.
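That heuristic can be written down as a short Python sketch. Everything here is an assumption made for illustration: the function name, the sample symbol names, and the idea that you have already extracted the executable's defined symbols, the reference library's defined symbols, and a symbol-to-symbol dependency map by some other means (for example from nm output and a disassembly). The result is a guess, not proof of origin.

```python
def likely_library_symbols(executable_defs, library_defs, depends_on):
    """Guess which symbols in an executable came from a static library.

    executable_defs -- symbols defined in the executable
    library_defs    -- symbols defined by a similar copy of the library
    depends_on      -- dict: symbol -> symbols it references (hypothetical input)
    """
    result = set()
    # Start from symbols that both the executable and the library define.
    pending = list(executable_defs & library_defs)
    while pending:
        symbol = pending.pop()
        if symbol in result:
            continue
        result.add(symbol)
        # Anything a library symbol depends on is assumed to be library code too.
        for dep in depends_on.get(symbol, ()):
            if dep in executable_defs:
                pending.append(dep)
    return result


# Made-up sample data, purely for demonstration.
exe = {"main", "parse_args", "printf", "vfprintf", "write"}
lib = {"printf", "puts"}
deps = {
    "main": {"parse_args", "printf"},
    "printf": {"vfprintf"},
    "vfprintf": {"write"},
}
print(sorted(likely_library_symbols(exe, lib, deps)))
# prints ['printf', 'vfprintf', 'write']
```

Note the false-positive risk: a user-defined helper that a library function happens to call (a callback, say) would be swept into the result as well.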
It is of course possible to tell which symbols are defined in a shared library, because that remains a separate file.
But there is another point: it is most likely illegal to provide a Linux binary statically linked against libc without sources. That is, it is definitely illegal if that libc is the GNU libc, because that is distributed under the terms of the LGPL, and the LGPL requires providing (on request) sources of all derived code except code that is linked to it dynamically. If it uses a different libc, like Sourceware newlib or Bionic libc (Android) (I can't find any others), I am not sure how well such code would work on a GNU libc-based system.

How can I isolate third-parties' C/C++/ObjC libraries' symbols from each other?

I have a project that needs to incorporate two third-party libraries, libA and libB. I have little, if any, influence over the third-party libraries. The problem is that both libA and libB include different versions of a common library, ASIHTTPRequest. As a result, I'm getting errors like:
-[ASIFormDataRequest setNumberOfTimesToRetryOnTimeout:]: unrecognized selector sent to instance 0x3b4170
, which I can only assume are because libA is referring to libB's implementation of ASIHTTPRequest (or the other way around).
I've tried playing around with strip -s <symbol file> -u <library> to isolate the libraries' symbols from each other, but that results in Xcode's linker spitting out thousands of warnings and doesn't actually fix the main problem outlined above.
ld: warning: can't add line info to anonymous symbol anon-func-0x0 from ...
In general, how can/should one isolate libraries from each other?
There is absolutely no way to do so. One Objective-C application can have only one meaning for one symbol at a time. If you load two different versions of one library the last one will overwrite the first one.
Two workarounds:
convince the developer to use a recent version
run both libraries in separate processes
If they used the same linker symbol name for different routines, the only way out (short of hacking their object files) is to link them into different executables somehow.
On platforms that support dynamic linking (e.g. DLLs) you could build one or both into a separate DLL. If they aren't part of the exported interface, the symbols shouldn't clash then.
Otherwise you would be stuck putting them into entirely separate processes and using IPC to pass data between them.