Do dynamic linked libraries(.dll,.so etc) have an entry point? - dll

Today I was in a discussion involving that libraries dont have an entry point.Generally the executable loads the libraries and the entry point is the main in the executable itself.
Are there execeptions wherein the libraries themselves can have an entry point ??
Update:
#sgr91 explained that DllMain is the entry point in Windows! What about linux ? Or is it just a feature of Windows ?

Yes, dynamic libraries do have entry-points.
It may be named differently (may or may not be exposed for usage), based on compiler and OS.
For Linux:
void __attribute__ ((constructor)) my_init(void);
void __attribute__ ((destructor)) my_fini(void);
The _init and _fini sections are now obsolete.
Read more

Related

How do I link multiple libraries in a Firebreath plugin?

Does anyone know where I can find a Firebreath sample (either Mac OS X or Windows) that illustrates how to create a plugin that includes 1 or more other libraries (.DLLs or .SOs) that each rely on other sub-projects built as static libraries (LIBs)?
For example, let's say that the Firebreath plugin is called PluginA, and that PluginA calls methods from DLL_B and DLL_C. DLL_B and DLL_C are C++ projects. DLL_B calls methods from another project called LIB_D, and DLL_C calls methods from a project called DLL_E.
Therefore, the final package should contain the following files:
PluginA.dll
DLL_B.dll (which also incorporates LIB_D)
DLL_C.dll
DLL_E.dll
I am currently forced to dump all source files in the pluginA solution, but this is just a bottleneck (for example I cannot call libraries written in other languages, such as Objective-C on Mac OS X).
I tried following the samples on Firebreath, but couldn't get them to work, and I found no samples from other users that claimed they were able to get it to work. I tried using CMAKE, and also running the solutions directly from X-Code, but the end result was the same (received linking errors, after deployment DLL_C couldn't find DLL_E etc.)
Any help would be appreciated - thank you,
Mihnea
You're way overthinking this.
On windows:
DLLs don't depend on a static library because if they did it would have been compiled in when they were built.
DLLs that depend on another DLL generally just need that other DLL to be present in the same location or otherwise in the DLL search path.
Those two things taken into consideration, all you need to do is locate the .lib file that either is the static library or goes with the .dll and add a target_link_library call for each one. There is a page on firebreath.org that explains how to do this.
On linux it's about the same but using the normal rules for finding .so files.

what does objdump mean by 'unsafe'?

objdump, run on a relatively modern 64-bit linux system, complains as follows about one of our shared libs:
use of unsafe function-scope static in ‘lib64/libwhatever.so’.
What does that mean?
The man page doesn't mention 'unsafe' or 'function-scope' anywhere I can see.
"function-scope" doesn't appear in the binutils source tree, from what I can see. So maybe this comes from a vendor patch; in which case you ought to ask your vendor.

What is the difference in byte code like Java bytecode and files and machine code executables like ELF?

What are the differences between the byte code binary executables such as Java class files, Parrot bytecode files or CLR files and machine code executables such as ELF, Mach-O and PE.
what are the distinctive differences between the two?
such as the .text area in the ELF structure is equal to what part of the class file?
or they all have headers but the ELF and PE headers contain Architecture but the Class file does not
Java Class File
Elf file
PE File
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation, byte code is architecture independent: The runtime (CLR for .net or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+, OS X/iOS: Mach-O, Linux/Android/etc: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM, most else: Intel 32-bit (i386) or 64-bit). These are all very similar, but still require sections (or, in Mach-O parlance "Load Commands") to set up the memory structure of the executable as it becomes a process (Old DOS supported the ".com" format which was a raw memory image). In all the above, you can say , roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols - which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: Shared Objects, OS X/iOS: .dylibs or frameworks) are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table) which enables the compiler to simply put in stubs to the functions you call (printf, open, etc), that the linker can connect when the executable loads.
Import table (in Windows parlance.. In ELF this is a DYNAMIC section, in OS X this is a LC_LOAD_LIBRARY command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
Export table (for libraries/dylibs/etc) are the symbols which the library (or in Windows, even an .exe) can export so as to have others link with.
Constants are usually in what you see as the ".rodata".
Hope this helps. Really, your question was vague..
TG
Byte code is a 'halfway' step. So the Java compiler (javac) will turn the source code into byte code. Machine code is the next step, where the computer takes the byte code, turns it into machine code (which can be read by the computer) and then executes your program by reading the machine code. Computers cannot read source code directly, likewise compilers cannot translate immediately into machine code. You need a halfway step to make programs work.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds a path name to a loader program that's executed instead of the actual binary. This one then is responsible for loading the actual program, loading and linking libraries, etc. This is the way how eg. ld.so comes in.
Theoretically one could create an ELF binary that holds java bytecode (or a complete jar). This just needs some appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
Not sure whether this actually has been done before, but certainly possible.
The same can be done w/ quite any non-native code.
It also could serve for direct multiarch support via some VM like qemu:
Let the target platform (libc+linker scripts) put the arch name into the interpreter program name (eg. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (eg. x86_64), the one with native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.

static versus shared libraries in small embedded systems using C without OS (assuming XIP)

Does small embedded system without RTOS/OS uses dynamic/shared libraries. my understanding is that its very tough to use it and will be not productive.
If we are calling an API multiple times which is present in a static library. Does API code will be placed at every call location like macro expansion or code/text will be common for all calls. I think code/text will be common.
If I have made a static library for a .c files which has multiple API's and I am statically linking it with main file and in main file only one API has been called so my question is does whole library is included in final .bin or only particular API code.
from above questions you can assume that I am missing fundamentals itself so can anyone please provide the related links to brush up these.
Regards
[edit]
I have tried following things
addition.c module
`int addition(int a,int b)`
`{`
`int result;`
`result = a + b;`
`return result;`
`}`
`size addition.o`
23 0 0 23 17 addition.o
multiplication.c module
`int multiplication(int a, int b)`
`{`
`int result;`
`result = a * b;`
`return result;`
`}`
`size multiplication.o`
21 0 0 21 15 multiplication.o
created object file of both and put in archieve
ar cr libarith.a addition.o multiplication.o
then statically linked to my main application
example.c module
`#include "header.h"`
`#include <stdio.h>`
`1:int main()`
`2:{`
`3:int result;`
`4:result = addition(1,2);`
`5:printf("addition result is : %d\n",result);`
`6:result = multiplication(3,2);`
`7:printf("multiplication result is : %d\n",result);`
`8:return 0;`
`9:}`
gcc -static example.c -L. -larith -o example
size of example
511141 1928 7052 520121 7efb9 example
commented line number 6 of example.c
and again linked
gcc -static example.c -L. -larith -o example
size of example
511109 1928 7052 520089 7ef99 example
32 bytes of difference between above two
thats mean addition.o is not included in example
merged both modules addition.c and multiplication.c as addmult.c as below
int addition(int a,int b)
{
int result;
result = a + b;
return result;
}
int multiplication(int a, int b)
{
int result;
result = a * b;
return result;
}
created object file and put in archieve
before doing that i have deleted previous archieve
ar cr libarith.a addmult.o
now commented line number 6 of example.c
gcc -static example.c -L. -larith -o example
size example
511093 1928 7052 520073 7ef89 example
uncommented line nmber 6 of example.c
size example
511141 1928 7052 520121 7efb9 example
My question is in both cases if both functions are called final text size is same but if only one function is called then there is difference of 16
but multiplication.o size is 23 so definitly it has been not included but how we will justify 16.
If i am missing some fundamental itself ?
To dynamically load and link a library at runtime requires code to perform the load/link operation. That capability is normally part of an operating system. Moreover in a system without mass-storage of some kind, dynamic linking would not have any benefits since the dynamically linked code would have to exist in memory in any case so may as well have been statically linked.
To answer the second part of your question, a static library is simply a collection of object files in an archive. The linker will only extract and link the object code necessary to resolve symbols referenced in the executable as a whole. Some smart linkers can discard unused functions from within an object file, but you should not rely on that.
So by linking a static library you are not including all the unused code in the library. You can probably tell that by comparing the size of all your library files with the size of the executable binary - you will probably see that your executable is far smaller than the sum of the sizes of the libraries linked. Also your linker will have an option to create a map file which will tell you exactly what code has been included, and if it has a cross-reference output facility, what code references or is referenced by what.
If you are building your own static libraries, or even your own non-library code, it will pay to ensure good granularity at the object file level. For example if an object file contains two functions, one used and one unused, most linkers will have no choice but to include both, whereas if the functions are defined in separate compilation units (source files), then they will be in separate object files (even when collated into a library) and can be separately linked.
If you really have a embedded system without any operating system, then your hardware has essentially a fixed software, which you can change only by physical means (e.g. a soldering iron, or plugging something, etc...). In that case, that software runs on the "bare iron" and is doing somehow what an OS is providing (it is managing the physical resources and interacts directly with the I/O ports by appropriate machine instruction).
In particular, an embedded system without any OS cannot have any kind of dynamic libraries, because by definition these libraries need to be inside some files (on the embedded processor), and to have files you need an operating system.
The exact definition of what exactly is an operating system is debatable and fuzzy; I believe that providing a file system is one of the roles of most current OSes
Since shared libraries (or static libraries) are libraries sitting inside some files, you cannot have them without an OS. Something which provide files is by definition an operating system.
Perhaps you are using a cross-development chain to develop your embedded software. If you want to get something which runs on the bare metal, your chain has to ultimately give a single binary image which you can flash into a ROM, then solder or plug that ROM -or transfer somehow physically- in your embedded hardware (some tools enable you to flash an entire self contained processor).
I believe you might be confused, and you should read more about operating systems, kernels, the linux kernel, file systems, syscalls, RTOS, linkers & loaders, cross-compilers, microcontrollers, shared libraries, dynamic linkers ....
As Clifford suggested in comments, you could have an embedded system with some file system and some dynamic linker; in my view that would make an embryonic operating system, but it is a debatable matter of definition.
Notice that making a dynamic linker might not be an easy task (you'll need to do relocation); you could either make a generic ELF dynamic loader, or you could restrict the form of the dynamically loaded modules, and perhaps use your specific ld script to generate them.
You already have all the fundamentals you need. Without an operating system, mass storage (disc, filesystem, etc) and mulitple/many different programs that can take advantage of the shared library it doesnt make any sense. You dont save anything and it probably costs you a little more if you were to fake it enough to use a shared library in a fixed bare metal environment.
You mentioned having codesourcery, how do you learn these things? You disassemble your binaries and see what the compiler did. Does it link the entire gcc library because you used one divide? Does it link the entire C library because you used one function (does it even work to try to link a C library function, many have system calls to an operating system which you have to resolve). Start by using a simple divide in a very simple function (needs to be generic)
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a/b);
}
DO NOT call that function with fixed constants and do not call it from the same .c file, the best thing would be to simply add that function as is, and do nothing else with it just have it sit there. You may hit problems even trying to compile it, once you do, disassemble and see what the compiler did with it, see if the entire gcc library was added or just the code for that one function.
You cant trust any old web page or resource as it may not be the same tools you are using and may be out dated, the compiler you are using right now is the one that matters, right now, no other. And the answers are all right there in front of you.
No, they dont use dynamic libraries, the functions needed are linked in as needed. The optimizer may choose to inline some code, but in general the code for each function is in one place and each call to it is a call, it is not like a macro, in general. Again the optimizer may choose otherwise for performance reasons (small enough functions that dont consume too much memory and are small enough that the code required to make a function call is excessive compared to the function itself. Also that function needs to be in the same optimization space, for gcc this is the same .c file, for llvm this could be any code in the project.
I have some examples, cortex-m and others, bare metal. http://github.com/dwelch67 you may find some that may help answer your questions, examine for example that the compiler will implement a public function like the one above AND inline it when used. If you declare the function as static, then the optimizer, if it inlines, doesnt need to implement the function in the binary. if you make a call to a function like that in the same .c file, for example
c = fun(10,5);
there is a good chance that the optimizer if used, will replace that code with
c = 2;
and not perform the divide at all.

What is proper naming convention for MSVC dlls, static libraries and import libraries

What is standard or "most-popular" naming convention for MSVC library builds.
For example, for following platforms library foo has these conventions:
Linux/gcc:
shared: libfoo.so
import: ---
static: libfoo.a
Cygwin/gcc:
shared: cygfoo.dll
import: libfoo.dll.a
static: libfoo.a
Windows/MinGW:
shared: libfoo.dll
import: libfoo.dll.a
static: libfoo.a
What should be used for MSVC buidls? As far as I know, usually names are foo.dll and foo.lib, but how do you usually distinguish between import library and static one?
Note: I ask because CMake creates quite unpleasant collision between them naming both import and static library as foo.lib. See bug report. The answer would
help me to convince the developers to fix this bug.
You distinguish between a library and a .dll by the extension. But you distinguish between a import library and a static library by the filename, not the extension.
There will be no case where an import library exists for a set of code that was built to be a static library, or where a static library exists for a dll. These are two different things.
There is no single MSVC standard filename convention. As a rule, a library name that ends in "D" is often a debug build of library code, msvcrtd.dll vs msvcrt.dll but other than that, there are no standards.
As mentioned by others, there are no standards, but there are popular conventions. I'm unsure how to unambiguously judge what is the most popular convention. In addition the nomenclature for static vs. import libraries, which you asked about, there is also an analogous distinction between the naming of Release libraries vs. Debug libraries, especially on Windows.
Both cases (i.e. static vs. import, and debug vs. release) can be handled in one of two ways: different names, or different directory locations. I usually choose to use different names, because I feel it minimizes the chance of mistaking the library type later, especially after installation or other file moving activities.
I usually use foo.dll and foo.lib for the shared library on Windows, and foo_static.lib for the static library, when I wish to have both shared and static versions. I have seen others use this convention, so it might be the "most popular".
So I would recommend the following addition to your table:
Windows/MSVC:
shared: foo.dll
import: foo.lib
static: foo_static.lib
Then in cmake, you could either
add_library(foo_static STATIC foo.cpp)
or
add_library(FooStatic STATIC foo.cpp)
set_target_properties(FooStatic PROPERTIES OUTPUT_NAME "foo_static")
if for some reason you don't wish to use "foo_static" as the symbolic library name.
There is no standard naming convention for libraries. Traditional library names are prefixed with lib. Many linkers have options to prepend lib to a library name on the command line.
The static and dynamic libraries are usually identified by their file extension; although this is not required. So libmath.a would be a static library whereas libmath.so or libmath.dll would be a dynamic library.
A common naming convention is to append the category of the library to the name. For example, a debug static math library would be 'libmathd.a' or in Windows, 'lib_math_debug'. Some shops also add Unicode as a filename attribute.
If you want, you can append _msvc to the library name to indicate the library requires or was created by MSVC (to differentiate from GCC and other tools). A popular convention when working with multiple platforms, is to place the objects and libraries in platform specific folders. For example a ./linux/ folder would contain objects and libraries for Linux and similarly ./msw/ for Microsoft Windows platform.
This is a style issue. Style issues are often treated like religious issues: none of them are wrong, there is no universal style, and they are an individual preference. What ever system you choose, just be consistent.
As far as I know, there's no real 'standard', at least no standard most software would conform to.
My convention is to name my dynamic and static .lib equally, but place them in different directories if a project happens to support both static and dynamic linkage. For example:
foo-static
foo.lib
foo
foo.lib
foo.dll
The library to link against depends on the choice of the library directories, so it's almost totally decoupled from the rest of the build process (it won't appear in-source if you use MSVC's #pragma comment(lib,"foo.lib") facility, and it doesn't appear in the list of import libraries for the linker).
I've seen this quite a few times. Also, I think that MSVC/Windows based projects tend to stick more often with a single, official linkage type - either static, or dynamic. But that's just my personal observation.
In short:
Windows/MSVC
shared: foo.dll
import: foo.lib
static: foo.lib
You should be able to use this directory-based pattern with CMAKE (never used it). Also, I don't think it's a 'bug'. It's merely lack of standardization. CMAKE does (imho) the right thing not to establish a pseudo-standard if everyone likes it differently.
As the others have said, there is no single standard to file naming on windows.
For our complete product base which covers 100's of exes, dlls, and static libs we have used the following successfully for many years now and it has saved a lot of confusion. Its basically a mixing of several methods I've seen used throughout the years.
In a nutshell all our files of both a prefix and suffix (not including the extension itself). They all start with "om" (based on our company name), and then have a 1 or 2 character combination that roughly identifies the area of code.
The suffix explains what type of built-file they are and includes up to three letters used in combination depending on the build which includes Unicode, Static, Debug (Dll builds are the default and have no explicit suffix identifier). When we started this system Unicode was not so prevalent and we had to support both Unicode and Non-unicode builds (pre Windows 2000 os), now everything is exclusively built unicode but we still use the same nomenclature.
So a typical .lib "set" of files might look like
omfThreadud.lib (Unicode/Debug/Dll)
omfThreadusd.lib (Unicode/Static/Debug)
omfThreadu.lib (Unicode/Release/Dll)
omfThreadus.lib (Unicode/static)
All files are built-in into a common bin folder, which eliminates a lot of dll-hell issues for developers and also makes it simpler to adjust compiler/linker settings - they all point to the same location using relative paths and there is never any need for manual (or automatic) copying of the libraries a project needs. Having these suffixes also eliminates any confusion as to what type of file you may have, and guarantees you can't have a mixed scenario where you put down the debug dll on a release kit or vice-versa. All exes also use a similar suffix (Unicode/Debug) and build into the same bin folder.
There is likewise one single "include" folder, each library has one header file in the include folder that matches the name of the library/dll (for example omfthread.h) That file itself #includes all the other items that are exposed by that library. This keeps its simpler if you want functionality that is in foo.dll you just #include "foo.h"; our libraries are highly segmented by areas of functionality - effectively we don't have any "swiss-army knife" dlls so including the libraries entire functionality makes sense. (Each of these headers also include other prerequisite headers whether they be our internal libraries or other vendor SDKs)
Each of these include files internally uses macros that use #pramga's to add the appropriate library name to the linker line so individual projects don't need to be concerned with that. Most of of our libraries can be built statically or as a DLL and #define OM_LINK_STATIC (if defined) is used to determine which the individual project wants (we usually use the DLLs but in some cases static libraries built-in into the .exe make more sense for deployment or other reasons)
#if defined(OM_LINK_STATIC)
#pragma comment (lib, OMLIBNAMESTATIC("OMFTHREAD"))
#else
#pragma comment (lib, OMLIBNAME("OMFTHREAD"))
#endif
These macros (OMLIBNAMESTATIC & OMLIBNAME) use _DEBUG determine what type of build it is and generate the proper library name to add to the linker line.
We use a common define in the static & dll versions of a library to control proper exporting of the class/functions in dll builds. Each class or function exported from the library is decorated with this macro (the name of which matches the base name for the library, though that is largely unimportant)
class OMUTHREAD_DECLARE CThread : public CThreadBase
In the DLL version of the project settings we define OMFTHREAD_DECLARE=__declspec(dllexport), in the static library version of the library we define OMFTHREAD_DECLARE as empty.
In the libraries header file we define it based on how the client is trying to link to it
#if defined(OM_LINK_STATIC)
#define OMFTHREAD_DECLARE
#else
#define OMFTHREAD_DECLARE __declspec(dllimport)
#endif
A typical project that wants to use one of our internal libraries would just add the appropriate include to their stdafx.h (typically) and it just works, if they need to link against the static version they just add OM_LINK_STATIC to their compiler settings (or define it in the stdafx.h) and it again it just works.
As far as I know there still aren't any conventions with regards to this. Here's an example of how I do it:
{Project}{SubModule}{Platform}{Architecture}{CompilerRuntime}_{BuildType}.lib/dll
The full filename shall be lowercase only and shall only contain alphanumerics with predesignated underscores. The submodule field, including its leading underscore, is optional.
Project: holds project name/identifier. Preferably as short as possible. ie "dna"
SubModule: optional. holds module name. Preferably as short as possible. ie "dna_audio"
Platform: identifies the platform the binary is compiled for. ie "win32" (Windows), "winrt", "xbox", "android".
Architecture: describes the architecture the binary is compiled for. ie "x86", "x64", "arm". There where architecture names are equal for various bitnesses use its name followed by the bitness. ie. "name16", "name32", "name64"
CompilerRuntime: optional. Not all binaries link to a compiler runtime, but if they do, it's included here. ie "vc90" (Visual Studio 2008), "gcc". Where applicable apartment can be included ie "vc90mt"
BuildType: optional. This can hold letters (in any order desired), each which tell something about the build-specifics. d=debug (omitted if release) t=static (omitted if dynamic) a=ansi (omitted if unicode)
Examples (assuming a project named "DNA"):
dna_win32_x86_vc90.lib/dll
dna_win32_x64_vc90_d.lib/dll
dna_win32_x86_vc90_sd.lib
dna_audio_win32_x64_vc90.lib/dll
dna_audio_winrt_x64_vc110.lib/dll