I am under the impression that DLLs are mapped into the address space of the calling process. If that is the case, how can DLLs be shared among other processes?
Regards,
Arun
There are two ways:
1) Common library code is put in DLL files (DLL stands for Dynamic-Link Library), so that common portions of code use less space on the HDD (and are easier to update).
2) In-memory sharing (a copy-on-write scheme, in which pages that are common to several processes are mapped to the same physical page).
Excerpt from Microsoft's "What is a .dll?":
"By using a DLL, a program can be modularized into separate
components. For example, an accounting program may be sold by module.
Each module can be loaded into the main program at run time if that
module is installed. Because the modules are separate, the load time
of the program is faster, and a module is only loaded when that
functionality is requested. Additionally, updates are easier to apply
to each module without affecting other parts of the program. For
example, you may have a payroll program, and the tax rates change each
year. When these changes are isolated to a DLL, you can apply an
update without needing to build or install the whole program again."
Ref: http://support.microsoft.com/kb/815065
DLLs are:
loaded at runtime
able to be "dynamically loaded" (by multiple programs at the same time)
- which allows saving of resources
- lowers disk space requirements
But why do they promote "modularizing" programs? What would happen if there weren't .dll files? Could someone provide/expand on the example?
Modular programs provide a way of making a particular functionality available to many programs without having to include the same code in all of them. Also, they allow greater compatibility between programs since they would essentially use the same methods in common DLLs to obtain the same results.
One would write a program in a modular fashion such that different parts of the program could be maintained separately. Say you had some clever way of reading and writing your own data format to files. Say you make improvements to that technique. If the code for reading and writing the files lived in a DLL, you would only need to update the DLL. The program itself would remain unchanged.
If you have one monolithic EXE, you have to
pay for all the extra time relinking it, even if only one source file changed (this is painful if it's > 80 MB, as is the case in large projects),
ship the entire EXE, when you could ship just a single DLL that is a fraction of the size (for patches/updates).
Breaking it up into DLLs you
have pluggability: The EXE is the host application and others can write DLLs that "plug into" the host via a well-defined interface. DLLs can be interchanged as long as they conform to the interface.
can share code across other DLLs and EXEs.
can have some DLLs be optionally loaded on demand, only if they're used, and unloaded when they're not needed
similar to above, have optional functionality. With a single EXE you have to download everything, even if some components are rarely used. With DLLs, you could have a system that downloads and installs features as needed.
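The pluggability and load-on-demand points above can be sketched in a few lines. This is only an analogy in Python, not how a Windows host loads DLLs: the interface (a module must expose `dumps` and `loads`) and the helper names `load_plugin` and `round_trip` are made up for illustration, and the standard `json` and `pickle` modules stand in for interchangeable plug-ins.

```python
import importlib

# Hypothetical interface: a serializer plug-in must expose these callables.
REQUIRED = ("dumps", "loads")

_loaded = {}  # cache so each plug-in is loaded at most once


def load_plugin(name):
    """Load a module by name on first use and check that it fits the interface."""
    if name not in _loaded:
        module = importlib.import_module(name)  # loaded only when requested
        missing = [f for f in REQUIRED if not callable(getattr(module, f, None))]
        if missing:
            raise TypeError(f"{name} lacks required functions: {missing}")
        _loaded[name] = module
    return _loaded[name]


def round_trip(plugin_name, value):
    """Serialize and deserialize a value with whichever plug-in is chosen."""
    plugin = load_plugin(plugin_name)
    return plugin.loads(plugin.dumps(value))
```

The host code in `round_trip` never names a concrete module; any module that conforms to the interface can be swapped in, and nothing is loaded until it is first asked for.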
The biggest advantage of DLLs is probably during development of the original program. Without DLLs (or some other form of precompiled library) you wouldn't be able to integrate with existing libraries without including the original source code. By including an existing library as a DLL you don't need the source, since it's all encapsulated in the DLL. It would be a nightmare to develop in frameworks like .NET without DLLs, since you constantly include other libraries...
The alternative to breaking your program down in n > 1 pieces is to keep it in n == 1 piece. Why is this bad? Well it isn't always bad (maybe the BIOS is a good example?). But for user programs it usually is. Why? First we need to define what a program is.
What is a program?
A simple "program", roughly speaking, consists of an entry point (i.e. offset to the main function), functions and global variables. A function consists of instructions and information about what local variables are needed to run the function. To be executed a program must be loaded in primary memory/RAM (the aforementioned information). Because our program has functions (and not just jump statements), that implies the existence of a stack, which implies the existence of a containing environment managing the stack. (I suppose you could have a program that manages its own stack but I'd argue then your program is not a program anymore but an environment.) This environment contains the program, starts in the entry point and executes each instruction, be it "go to this part of the RAM and add it to whatever is in this register" or "If this register is all 0 then jump ahead this many instructions and resume execution there" indefinitely or until the program gives control back to its environment. (This is somewhat simplified - context switches in multi-process environments, illegal memory access, illegal instructions, etc. can also cause control to be taken from the program.)
Anyway, so we have two options: either load the entire program at once or have it stored and loaded in pieces.
n == 1
There are some advantages to doing it all at once:
Once the program is in memory no disk access is required to execute further (unless the program explicitly asks there to be).
Since the program is compiled/linked before execution begins you can do everything without any sort of string names/comparisons - go directly to the address (or an offset).
Functions are never out of sync with one another.
n > 1
There are some disadvantages to loading everything at once, though, which mirror the advantages and argue for splitting the program up:
Most programs don't execute all code paths most of the time. I believe there are studies showing that most programs spend most of their execution time in a small fraction of their instructions. In other words, something like 20% of the program is executed 80% of the time (I just made that particular figure up, but you get the idea). If we divide our program up enough and only load groups of instructions (i.e. functions) as they are needed, then we won't waste time loading the 80% we'll never use during this execution of the program. Along these lines, we can ultimately fit more concurrently executing programs in RAM if we only load the fraction of each program that we need.
Most programs share similar functions (i.e. storing data/trees/hashes/sorting/etc., reading input, writing output, etc.) and if each program has its own local copy then you can't reuse instruction code.
Many programs depend on the existence of others and are maintained by separate companies/groups/individuals. By releasing versioned modules we don't have to synchronize releases all the time.
Conclusion
These aren't the only points to consider, just the first ones that came to my mind. I'd recommend reading about compilers, linkers and operating systems. That will answer this question more thoroughly than I can, along with other questions I'm sure this has brought up. To recap, DLLs aren't the "best" way of packaging executable programs in all situations and circumstances; they have a particular use, with advantages and disadvantages.
I have a question about how DLLs are loaded and handled in memory. DLLs are normally shared libraries, so loading a DLL once should be enough. If one process loads a DLL (e.g. advapi32.dll) into memory, how does another process then refer to that same advapi32.dll? How can a common location be shared by each process?
I'm not entirely sure what your question is, but yes, if multiple processes import the same DLL, then the read-only sections of that DLL are typically mapped into all of those processes. On the other hand, sections that can change, like the data and BSS (variable) segments, get a copy in each process so that the changes one process makes are invisible to other processes. If you want certain changes to be shared between processes for your own DLL, you can mark a data section in the DLL as shared. Exactly how you do this depends on the development tools you're using.
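The private-copy behaviour for writable data can be illustrated with a small sketch. It assumes Python's `mmap` module with `ACCESS_COPY` (a copy-on-write mapping) and uses a temporary file as a stand-in for the DLL image; a real loader maps image sections rather than a whole file, so treat this as an analogy only. The function name `demo_copy_on_write` is made up for the example.

```python
import mmap
import os
import tempfile


def demo_copy_on_write():
    # A temporary file stands in for the DLL image on disk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"ORIGINAL-DATA...")
        path = f.name
    try:
        with open(path, "r+b") as a, open(path, "r+b") as b:
            # Two independent copy-on-write views, like two processes
            # mapping the same writable section.
            view_a = mmap.mmap(a.fileno(), 0, access=mmap.ACCESS_COPY)
            view_b = mmap.mmap(b.fileno(), 0, access=mmap.ACCESS_COPY)
            view_a[0:8] = b"MODIFIED"  # the write goes to a private copy
            seen = (bytes(view_a[0:8]), bytes(view_b[0:8]))
            view_a.close()
            view_b.close()
        with open(path, "rb") as f:
            on_disk = f.read(8)  # the underlying file is untouched
        return seen + (on_disk,)
    finally:
        os.remove(path)
```

The first view sees its own change, while the second view and the file on disk still hold the original bytes, which is the behaviour described for writable sections.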
I know a DLL contains one or more exported functions that are compiled, linked, and stored separately.
My question is not about how to create one, but about the form in which it is stored. Is it stored as 0's & 1's, or as assembly commands like ADD, MUL, DIV, MOV, CALL, RETURN, etc.?
Also, what makes it processor dependent (like x86, x87, IBM 700 instruction set)?
Can someone please explain this briefly?
First of all, everything in a computer is in the form of "0's & 1's". The fact that the computer can display some of these as text, pictures, sounds, 3D models, etc. is just a matter of how you interpret them. But down there, at the metal, it's all just "0's & 1's" (also known as bits). Note though that they are always grouped together in groups of 8, and these are called "bytes". It's really for the sake of efficiency, because operating on every bit individually would be too tedious. Actually, today's computers don't even operate on single bytes anymore (or rather, they do it very rarely). Mostly you operate on 4 or 8 bytes at a time, depending on whether you have a 32-bit or 64-bit CPU (that's in layman's terms; it's actually a bit more complicated than that).
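A short sketch of the "it's all in how you interpret them" point, using Python's `struct` module. The four byte values and the function name `interpretations` are arbitrary choices for the example.

```python
import struct

raw = bytes([0x41, 0x42, 0x43, 0x44])  # the same four bytes throughout


def interpretations(data):
    """Read one 4-byte sequence in several different ways."""
    return {
        "text": data.decode("ascii"),               # as characters
        "uint32_le": struct.unpack("<I", data)[0],  # little-endian integer
        "uint32_be": struct.unpack(">I", data)[0],  # big-endian integer
        "bits": " ".join(f"{b:08b}" for b in data), # the raw 0's and 1's
    }
```

Calling `interpretations(raw)` yields the text `ABCD` and two different integers from exactly the same bits; nothing in the bytes themselves says which reading is the "right" one.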
As for a .DLL file: like an .EXE file, it contains bytes that describe instructions that a CPU can execute. The CPU executes these bytes essentially as they appear in the .DLL/.EXE (the loader may adjust some addresses when it maps the file into memory, but the instructions themselves are not translated). That's why these files are CPU-specific. In different CPU architectures the same combination of bytes means different things, so a .DLL/.EXE will run correctly only on the CPU for which it was designed. On other CPUs these bytes will mean some other instructions, and when run, the program will most likely do some utter nonsense and crash immediately.
The assembly commands you mentioned also deserve an explanation. "Assembler" is not a language that a CPU can understand. It's a language a human can understand. It was created because writing directly in machine code (the bytes that the CPU actually understands) is very difficult. What you get is utter gibberish on the screen (try opening some .EXE file in Notepad!) but every bit has to be precisely set for it to work.
So assembly language is basically the same thing, except these instructions are written in text that humans can read. For every machine code instruction that a CPU can understand, there is an instruction with a human-friendly name. An assembler simply reads these instructions and replaces them with the bytes that represent the actual instructions for the CPU to execute. It's a 1:1 operation. Every command in assembly language matches a single machine instruction (again, in layman's terms).
So you see, there isn't even a single assembly language. Every CPU architecture has its own assembly language, because they each have different instructions.
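To make the 1:1 mapping concrete, here is a toy "assembler" that does nothing more than a table lookup. It is a deliberately tiny sketch: it only knows a few operand-free instructions, the table and function names are made up, and real assemblers also handle operands, labels and much more. The byte encodings shown are the standard ones for x86 and 64-bit ARM.

```python
# Mnemonic -> machine code bytes, one table per CPU architecture.
X86 = {
    "NOP": bytes([0x90]),
    "RET": bytes([0xC3]),
    "INT3": bytes([0xCC]),
}
ARM64 = {
    "NOP": bytes.fromhex("1f2003d5"),
    "RET": bytes.fromhex("c0035fd6"),
}


def assemble(lines, table):
    """Translate each mnemonic into its bytes using the given CPU's table."""
    out = bytearray()
    for line in lines:
        mnemonic = line.strip().upper()
        if mnemonic not in table:
            raise ValueError(f"unknown instruction for this CPU: {mnemonic}")
        out += table[mnemonic]
    return bytes(out)
```

The same two-line program, `nop` then `ret`, becomes 2 bytes for x86 and 8 entirely different bytes for ARM64, which is why a native binary built for one cannot run on the other.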
Note though that all this applies to native .DLL/.EXE files. .NET files are different - they don't contain machine code, but rather instructions for an abstract, nonexistent CPU. It's like Java bytecodes. When a .NET .DLL/.EXE is run, the .NET runtime translates it from the abstract instructions to the instructions that the specific CPU can understand. They use a lot of tricks to make this very fast, so these files run almost as fast as simple .DLL/.EXE files.
Does this clear things up? :)
Native DLLs (not .NET assemblies) usually contain machine code that can only be run on a certain platform. The machine code is a sequence of bytes that the processor treats as instructions (ADD, MOV, etc.).
In Windows, DLLs are stored in the PE format, which is basically a collection of sections plus the information about how to map them into memory. Some sections contain the program's code (which is of course processor dependent), others contain the program's data, others the exported and imported functions, and so on.
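As a rough illustration of where the processor dependence is recorded, the sketch below reads the target machine and section count from a PE header. The field positions (the PE offset stored at 0x3C, the "PE\0\0" signature, then the 16-bit Machine and NumberOfSections fields) follow the published PE layout; the helper names and the `fake_image` builder, which produces just enough header bytes to try the parser without a real DLL, are made up for the example.

```python
import struct

# A few well-known values of the Machine field in the PE/COFF header.
MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}


def pe_machine(data):
    """Return (machine name, number of sections) from the start of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # offset of PE header
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    machine, section_count = struct.unpack_from("<HH", data, pe_offset + 4)
    return MACHINES.get(machine, hex(machine)), section_count


def fake_image(machine, sections):
    """Build a minimal header-only byte string for demonstration purposes."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)  # PE header follows directly
    return bytes(header) + b"PE\0\0" + struct.pack("<HH", machine, sections)
```

To inspect a real file you would pass in the first few hundred bytes read from it, for example `pe_machine(open(path, "rb").read(1024))`, assuming its PE header starts within that range.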
Managed code is compiled to an intermediate language that is JITed by the runtime as it is executed. Therefore, your DLL won't contain any processor-dependent code, and you'll be able to execute your program on any platform with the relevant runtime.
It depends on your DLL. Generally, a DLL contains executable code, just as an EXE file does. Those code DLLs are processor dependent, since the code can only be executed on a specific platform. The code is stored using the same "format" as an EXE file (binary machine code).
However, a DLL can sometimes contain only data: these are called "resource DLLs" and are not processor dependent at all. They act as containers for data files used by applications.
Note that many DLLs are hybrids: they contain both code and resources. For example, most DLLs that comprise the user part of the Windows operating system are hybrid: you can open them using Visual Studio or a Resource Explorer to see the resources (the data segments) they contain, or open them with Dependency Walker or dumpbin to see the functions (the code segments) they contain.
(Of course this answer is really Windows specific; I don't know about .so files, which are the Linux equivalent of a DLL.)
Both a DLL and an EXE contain executable code.
A DLL, however, doesn't have the necessary parts to be directly executable. It must be called from another piece of executable code. One DLL can call another, but all must ultimately be called from an EXE.
So the rules about what's compatible with what processor that apply to EXEs also apply to DLLs.
I have a C++ static library that supports both x32 and x64 platforms.
My question is: should I name the .lib file different depending on which platform?
i.e. MyLib32.lib vs MyLib64.lib
Intel Math library and TBB handle this using folder name to differentiate between the 2 libraries instead.
i.e. x32\Math.lib vs x64\Math.lib
Is there a better way compared to the other?
I think explicitly naming the lib to correspond to the intended platform should be better? That way we don't depend on the folder name, and the lib is self-documenting.
Be nice to your users and add 32 or 64 to the end like you propose. It's absolutely 100% clear what it means at first glance and you'll never mix them up.
I've been doing a lot of 32 and 64 bit work lately and I definitely prefer different names.
No, I don't think that one approach is superior to the other, and I think you've properly enumerated the two differences in each.
From my experience, however, many libraries have the same name but are either kept in separate folders or distributed in separate zip files.
LPSolve on sourceforge, for example, has their binaries named the same, regardless of platform.
We have developed a number of custom dll's which are called by third-party Windows applications. These dlls are loaded / unloaded as required.
Most of the dlls call web services and these need to have urls, timeouts, etc configured.
Because the dll is not permanently in memory, it has to read the configuration every time it is invoked. This seems sub-optimal to me.
Is there a better way to handle this?
Note: The configurable information is in an xml file so that the IT department can alter as required. They would not accept registry edits.
Note: These DLLs cater for a number of third-party applications. Each one essentially implements an external EDMS interface. The vendors would not accept passing the required parameters.
Note: It’s a .NET application and the DLL is written in C#. Essentially, there are both thick (Windows application) and thin clients that access this DLL when they need to perform some kind of EDMS operation. The EDMS interface is defined as a set of calls that have to be implemented in the DLL, and the DLL decides how to implement the EDMS functions, e.g. for some clients, “Register Document” would update a DB and for others the same call would utilise a third-party EDMS system. There are no ASP clients.
My understanding is that the dll is loaded when the client wants to access an EDMS operation and is then unloaded when the call is finished. The client may not need to do another EDMS operation for a while (in some cases over an hour).
Use the registry to store your configuration information, it's definitely fast enough.
I think you need to provide more information. There are so many approaches at persisting configuration information. We don't even know the development platform. .Net?
I wouldn't rely on the registry unless I was sure it would always be available. You might get away with that on client machines, but you've already mentioned webservices.
XML file in the current directory seems to be very popular now for server side third-party dlls. But those configurations are optional.
If this is ASP, your trust level will be very important in choosing a configuration persistence method.
You may be able to use your application server's "Application Scope", which gets loaded once per lifetime of the application. Your DLL can invalidate that data if it detects it needs to.
I've used text files, XML files, database, various IPC like shared memory segments, application scope, to persist configuration information. It depends a lot on the specifics of your project.
Care to elaborate further?
EDIT. Considering your clarifications, I'd go with an XML file. This custom XML file would be loaded using a search path that has been predefined and documented. If this is ASP.Net you can use Server.MapPath() for example to check various folders like App_Data. The DLL would check the current directory for the configuration file first though. You can then use a "manager" thread that holds the configuration data and passes it to any child threads that require it. The sharing can use IPC like a shared memory segment.
This seems like a hassle, but you have to store the information in some scope, whether on disk or in memory (application scope, session scope, DLL global scope, another process/IPC, etc.).
ASP.Net also gives you the ability to add custom configuration sections to standard configuration files like web.config. You can access those sections at will and they will not depend on when your DLL was loaded.
Why do you believe your DLL is being removed from memory?
Why don't you let the calling application fill out a data-structure with the stuff you need? Can be done as part of an init-call or so.
How often is the DLL getting unloaded? COM DLLs can control when they are unloaded via the DllCanUnloadNow function. If these are COM components, you could look at implementing some kind of timeout there to prevent frequent loads and unloads. Unless the DLL is reloading the configuration very frequently, it is unlikely to be a real performance bottleneck.
Knowing that the dll will reload its configuration at certain points is a useful feature, since it prevents the users wondering if they have to restart the host process, reboot the machine, etc for the configuration to take effect. You could even watch the file for changes to keep it up to date.
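One cheap way to get both behaviours (no repeated parsing, yet changes are picked up without a restart) is to re-read the file only when its timestamp or size changes. The sketch below shows the idea in Python rather than C#; the class name `CachedConfig`, the element names in the sample XML and the `demo` helper are all assumptions made for the example, not anything from the original system.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class CachedConfig:
    """Parse an XML config file lazily and re-parse only when it changes."""

    def __init__(self, path):
        self.path = path
        self._stamp = None   # (mtime, size) of the version we parsed
        self._values = {}
        self.parse_count = 0  # exposed so the caching can be observed

    def get(self, key):
        st = os.stat(self.path)  # a cheap check compared with parsing
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            root = ET.parse(self.path).getroot()
            self._values = {e.tag: (e.text or "").strip() for e in root}
            self._stamp = stamp
            self.parse_count += 1
        return self._values[key]


def demo():
    """Write a config, read it twice, change it, and read it again."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("<config><timeout>30</timeout></config>")
        cfg = CachedConfig(path)
        first = (cfg.get("timeout"), cfg.get("timeout"), cfg.parse_count)
        with open(path, "w") as f:  # IT edits the file
            f.write("<config><timeout>120</timeout></config>")
        second = (cfg.get("timeout"), cfg.parse_count)
        return first, second
    finally:
        os.remove(path)
```

Note that this only helps while the module stays loaded; if the DLL really is unloaded between calls, the cache is lost with it and the file is parsed once per load.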
I think the best way for a DLL to get configuration information is via the application that is using it, either via explicit "Init" calls, like Nils suggested, or via the application's configuration files.
DLLs shouldn't usually "configure themselves", as they can never be sure in which context they are used. Different users (as in applications) may have different configuration settings to make.
Since you said that the application is written in .NET, you should probably simply require them to put the necessary configuration for your DLL's functions in their configuration file ("whatever.exe.config") and access it from your DLL via AppSettings or even better via a custom configuration section.
Additionally, you may want to provide sensible default values for settings where that is possible (probably not for network addresses though).
If the DLLs are loaded and unloaded from memory only once every hour or so, the inefficiency due to small initializations (reading a file or the registry) will be negligible.
However, if this happens more frequently, the bigger inefficiency would be the physical action of loading and unloading the DLLs. This could cost more than the small initializations.
It might therefore be better to keep them pinned in memory. That way the initialization performed at load time does not get repeated, and you also avoid the cost of loading and unloading. You solve two issues this way.
I could tell you how to do this in C++, but I'm not sure how you would do it in C#. In C++ I would call GetModuleHandle and then make an extra LoadLibrary call on that module.
One way to do it is to have an interface in the DLL that specifies the required settings.
Then it's up to the "application project" to have a class that implements this interface and pass it to the DLL at initialization. This leaves you free to change the implementation depending on the project: one might read from web.config while another reads from a DB.
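A minimal sketch of that arrangement, written in Python rather than C# and with every name (`EdmsSettings`, `FixedSettings`, `EdmsClient`, the example URL) invented for illustration. The library side declares which settings it needs; the host application decides where they come from and hands them over at initialization.

```python
from dataclasses import dataclass
from typing import Protocol


class EdmsSettings(Protocol):
    """What the library needs to know; declared by the library."""

    def service_url(self) -> str: ...
    def timeout_seconds(self) -> float: ...


@dataclass
class FixedSettings:
    """One possible host-side implementation: values supplied directly.
    Another host could read the same values from a file or a database."""

    url: str
    timeout: float

    def service_url(self) -> str:
        return self.url

    def timeout_seconds(self) -> float:
        return self.timeout


class EdmsClient:
    """Library-side class; it never reads configuration by itself."""

    def __init__(self, settings: EdmsSettings):
        self._settings = settings

    def describe_call(self, operation: str) -> str:
        # Stand-in for a real web service call, to keep the sketch offline.
        return (f"{operation} -> {self._settings.service_url()} "
                f"(timeout {self._settings.timeout_seconds()}s)")
```

Because the library depends only on the interface, each application can back it with whatever storage its IT department accepts, without any change to the library code.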