Let's say I have a C# Windows Class library in my solution and I build it in my VS2010 IDE.
The output here in my bin directory is X.dll
1) X.dll does not contain MSIL at this stage but "compressed byte code".
Is this true?
2) This "compressed byte code" is converted to MSIL somehow.
When does this occur?
3) When X.dll is accessed, the JIT compiler of the CLR takes the portion of MSIL that it needs and converts it into machine code.
Am I good on this final part?
Can anybody help in filling in the gaps in my understanding here?
X.dll contains MSIL bytecode after you build it with Visual Studio. You can prove this by disassembling it with ildasm.
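For example (assuming X.dll is in your bin directory and you run this from a Visual Studio command prompt, where ildasm is on the path):

    ildasm X.dll /out=X.il

X.il will be a plain-text dump of the MSIL, the metadata and the manifest.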
At some time between assembly loading and the actual execution of the code, the MSIL is translated to native code. I am not familiar with where exactly this is done, but I would suspect at assembly load.
To build upon Dark Falcon's answer, on question no. 2: the IL code (not "compressed byte code") is converted to native code by the JIT compiler, method by method, at each method's first call. So a shorter answer for no. 2 is "when the method is first called". Fields are not subject to this; properties are disguised methods themselves.
As for the third question, no, the IL is not JITed when it is loaded. See above.
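As a rough sketch of what "JITed at first call" means (PrepareMethod is a real CLR API, but in normal code you never do this; you just call the method and let the JIT run on the first invocation):

    using System;
    using System.Reflection;
    using System.Runtime.CompilerServices;

    class JitDemo
    {
        static void Hello()   // at startup this exists only as IL in the assembly
        {
            Console.WriteLine("Hello");
        }

        static void Main()
        {
            // Force the JIT to compile Hello() before its first call.
            MethodInfo hello = typeof(JitDemo).GetMethod(
                "Hello", BindingFlags.NonPublic | BindingFlags.Static);
            RuntimeHelpers.PrepareMethod(hello.MethodHandle);

            Hello();  // already native code by the time this call happens
        }
    }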
I have an executable that looks for a particular DLL. I have changed the source for the DLL and recompiled it (written and recompiled in VB6). Once I replace the DLL, the executable hits a runtime error when it gets to using that particular DLL. It works OK when I recompile the executable.
So my question is: with the same DLL path, the same name, and virtually identical DLLs, why does the executable need to be recompiled?
This is driving me bananas so any thoughts would be appreciated. Thanks, Callum.
A VB6 (or any COM) DLL has unique IDs for itself and its public interfaces. If you recompile, these IDs can change, and any existing code bound to the old IDs fails.
TL;DR: tick "Binary Compatibility" in the DLL's project options, select the old working DLL as the thing to maintain compatibility with, and recompile.
Detailed explanation: I keep hearing about DLL hell - what is this?
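The same principle shows up if you ever expose a .NET class to COM: pinning the GUIDs by hand (a hedged C# sketch below, with made-up GUIDs; VB6's Binary Compatibility option does the equivalent bookkeeping for you) guarantees that a recompile cannot silently change the IDs existing clients are bound to:

    using System;
    using System.Runtime.InteropServices;

    // With explicit [Guid] attributes, recompiling the library cannot
    // change the interface/class IDs that COM clients bound to.
    [ComVisible(true)]
    [Guid("11111111-2222-3333-4444-555555555555")]   // hypothetical IID
    public interface ICalc
    {
        int Add(int a, int b);
    }

    [ComVisible(true)]
    [Guid("66666666-7777-8888-9999-aaaaaaaaaaaa")]   // hypothetical CLSID
    public class Calc : ICalc
    {
        public int Add(int a, int b) { return a + b; }
    }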
I am new to Java.
I wanted to know this:
what is the need to create the .class file in Java?
Can't we just pass the source code to every machine so that each machine can compile it according to the OS and the hardware?
I believe it's mostly for efficiency reasons.
From wikipedia http://en.wikipedia.org/wiki/Bytecode:
Bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow *much better performance* than direct interpretation of source code.
(my emphasis)
And, as others have mentioned, it provides (weak) obfuscation of the source code.
The main reason for the compilation is that the virtual machines which host Java classes and run them only understand bytecode.
And since compiling a class into the language the virtual machine understands every single time would be expensive, the source code is compiled into bytecode once, up front.
There are also compilers which compile source code directly into machine code, but that's a different story which I don't know much about.
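You can watch this tradeoff yourself with the standard JDK tools (assuming a trivial Hello.java on disk):

    javac Hello.java     (parse + analyze once, producing Hello.class bytecode)
    javap -c Hello       (disassemble the bytecode to see the compact instructions)

Every JVM that receives Hello.class skips the parsing and analysis step entirely; it only has to execute (or JIT) the bytecode.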
This is what I think I know:
A TypeLib only describes "what should be out there" (= how to locate and call the actual class).
If a .tlb file (= TypeLib) is imported into C++ code at compile time, the compiled code already contains all the information about "what should be out there".
An interop assembly (.NET) is pretty much the same as a .tlb file. It is just in assembly form so that .NET code can be compiled against it, but it likewise contains info only about "what should be out there".
If the interop is embedded, its TypeLib info is included in the calling assembly and there is no need to deploy a separate interop DLL when the application is deployed.
If the interop is not embedded, the interop DLL must be deployed to the same folder as the calling assembly so that the info can be found by the caller.
Of course the actual classes that the TypeLib points to must be registered, so that they can be found in the depths of the machine using the TypeLib info.
Please correct me, if something above is wrong.
My question is, based on the above "knowledge": why do I sometimes see articles/questions/tutorials about REGISTERING TYPELIBs? It just doesn't make any sense to me (yet), since registering them shouldn't provide any extra information for the caller.
Thank you.
I thought it would be Common Intermediate Language, but in Notepad it does not look like that at all. Does it just look uglier in reality than in tutorials? Or is it some bytecode form that is further compiled from CIL?
It is CIL. CIL is the name of the binary format, not of the textual "assembler" you're thinking of.
Can you possibly imagine that .NET assemblies would be text files?
A .NET executable is a binary file that has a PE header (same as a native executable, but with slightly different values). The PE header tells the OS to load the CLR, which in turn loads the assembly.
The content beyond the header is a binary representation of the CIL code, plus some metadata and other stuff. The text you see in tutorials is the text representation of CIL, in much the same way that the assembly language code you see in a tutorial about assembly language programming is just the text representation of the binary machine code.
See http://www.yetanotherchris.me/home/2010/7/12/inside-net-assemblies-part-1.html (among many others) for more information.
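If you want to poke at the header yourself, here is a minimal sketch using the PEReader class from the System.Reflection.Metadata package (an assumption: you are on a newer .NET or have the NuGet package, not the stock VS2010-era framework):

    using System;
    using System.IO;
    using System.Reflection.PortableExecutable;

    class PeCheck
    {
        static void Main(string[] args)
        {
            using (var stream = File.OpenRead(args[0]))
            using (var pe = new PEReader(stream))
            {
                // A managed assembly has a CLI (COR) header pointing at its
                // metadata and CIL; a plain native executable does not.
                bool isManaged = pe.PEHeaders.CorHeader != null;
                Console.WriteLine(isManaged
                    ? "Managed assembly (CIL + metadata)"
                    : "Native PE (machine code only)");
            }
        }
    }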
A .NET executable is usually not written directly; it is compiled from another language such as C#, F# or VB.NET.
The contents of a .Net executable can be viewed with the ILDASM tool.
The contents are, first, a manifest, which is used for reflection, signatures and other metadata purposes.
Secondly there are the MSIL instructions themselves. These are in a kind of bytecode format, but ILDASM will show you what the instructions are.
And there are sometimes resources such as imagery, sounds or other content packed into the executable.
The executable is compiled to native code either during installation (this is what NGen does; I think it is uncommon) or just-in-time as a precursor to execution. The resulting native code can be stored for reuse. (This is what I was told during PDC 2001, might be "out of date".)
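As a quicker alternative to ILDASM for the manifest-level information, plain reflection works too; a small sketch:

    using System;
    using System.Reflection;

    class ManifestDump
    {
        static void Main(string[] args)
        {
            Assembly asm = Assembly.LoadFrom(args[0]);

            // Identity from the manifest: name, version, culture, key token.
            Console.WriteLine(asm.FullName);

            // Embedded resources (imagery, sounds, ...) listed in the manifest.
            foreach (string res in asm.GetManifestResourceNames())
                Console.WriteLine("resource: " + res);

            // Types whose MSIL bodies the JIT compiles on first call.
            foreach (Type t in asm.GetExportedTypes())
                Console.WriteLine("type: " + t.FullName);
        }
    }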
I know a DLL contains one or more exported functions that are compiled, linked, and stored separately.
My question is not about how to create it, but about the form in which it is stored: is it going to be in the form of 0's & 1's, or in assembly commands like ADD, MUL, DIV, MOV, CALL, RETURN, etc.?
Also, what makes it processor-dependent (like the x86, x87, or IBM 700 instruction sets)?
Can someone please explain it briefly?
First of all, everything in a computer is in the form of "0's & 1's". The fact that the computer can display some of these as text, pictures, sounds, 3D models, etc. is just a matter of how you interpret them. But down there, at the metal, it's all just "0's & 1's" (also known as bits). Note though that they are always grouped together in groups of 8, and these are called "bytes". It's really for the sake of efficiency, because operating with every bit individually would be too tedious. Actually, today's computers don't even operate on single bytes anymore (or rather - they do it very rarely). Mostly you operate with 4 or 8 bytes at a time, depending on whether you have a 32-bit or 64-bit CPU (that's in layman's terms, it's actually a bit more complicated than that).
As for a .DLL file - like an .EXE file, it contains bytes that describe instructions that a CPU can execute. The CPU takes these bytes directly from the .DLL/.EXE and executes them without any further modifications. That's why these files are CPU-specific. In different CPU architectures the same combination of bytes means different things, so a .DLL/.EXE will run correctly only on the CPU for which it was designed. On other CPUs these bytes will mean some other instructions, and when run, the program will most likely do some utter nonsense and crash immediately.
The assembly commands you mentioned also deserve an explanation. "Assembler" is not a language that a CPU can understand. It's a language a human can understand. It was created because writing directly in machine code (the bytes that the CPU actually understands) is very difficult. What you get is utter gibberish on the screen (try opening some .EXE file in Notepad!) but every bit has to be precisely set for it to work.
So assembly language is basically the same thing, except these instructions are written in text that humans can read. For every machine code that a CPU can understand, there is an instruction with a human-friendly name. An assembler simply reads these instructions and replaces them with the bytes that represent the actual instructions for the CPU to execute. It's a 1:1 operation. Every command in assembly language matches a single machine instruction (again, in layman's terms).
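For a concrete x86 example of that 1:1 mapping:

    B8 2A 00 00 00    mov eax, 42    (load the constant 42 into register EAX)
    C3                ret            (return to the caller)

The left column is what actually sits in the .DLL/.EXE; the right column is the human-readable assembly spelling of exactly the same instructions.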
So you see, there isn't even a single assembly language. Every CPU architecture has its own assembly language, because they each have different instructions.
Note though that all this applies to native .DLL/.EXE files. .NET files are different - they don't contain machine code, but rather instructions for an abstract, nonexistent CPU. It's like Java bytecodes. When a .NET .DLL/.EXE is run, the .NET runtime translates it from the abstract instructions to the instructions that the specific CPU can understand. They use a lot of tricks to make this very fast, so these files run almost as fast as simple .DLL/.EXE files.
Does this clear things up? :)
Native DLLs (not .NET assemblies) usually contain machine code that can only be run on a certain platform. The machine code is a sequence of bytes that the processor treats as instructions (ADD, MOV, etc.).
In Windows, DLLs are stored in the PE format, which is basically a collection of sections holding the information about how to map the file into memory. Some sections contain the program's code (which is of course processor-dependent), others contain the program's data, others the exported and imported functions, and so on.
Managed code is compiled to an intermediate language that is JITed by the runtime as it is executed. Therefore, your DLL won't contain any processor-dependent code, and you'll be able to execute your program on any platform with the relevant runtime.
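If you are curious about those sections, they are easy to dump with the PEReader class from the System.Reflection.Metadata package (an assumption: available on your runtime or via NuGet); a short sketch:

    using System;
    using System.IO;
    using System.Reflection.PortableExecutable;

    class SectionDump
    {
        static void Main(string[] args)
        {
            using (var stream = File.OpenRead(args[0]))
            using (var pe = new PEReader(stream))
            {
                // Typical names: .text (code), .data (writable data),
                // .rdata (read-only data), .rsrc (resources), .reloc (fixups).
                foreach (var section in pe.PEHeaders.SectionHeaders)
                    Console.WriteLine("{0,-8} {1} bytes", section.Name, section.VirtualSize);
            }
        }
    }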
It depends on your DLL. Generally, a DLL contains executable code, just as an EXE file does. Those code DLLs are processor-dependent, since the code can only be executed on a specific platform. The code is stored using the same "format" as in an EXE file (binary machine code).
However, a DLL can sometimes contain only data: it is then called a "resource DLL" and is not processor-dependent at all. It acts as a container for data files used by applications.
Note that many DLLs are hybrids: they contain both code and resources. For example, most of the DLLs which comprise the user part of the Windows operating system are hybrids: you can open them using Visual Studio or a resource explorer to see the resources (the data segments) they contain, or open them with Dependency Walker or dumpbin to see the functions (the code segments) they contain.
(Of course, this answer is really Windows-specific; I don't know about .so files, which are the Linux equivalent of a DLL.)
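One consequence of the resource-only case: such a DLL can be loaded purely as data, so the loader never treats any of it as code. A hedged P/Invoke sketch (LOAD_LIBRARY_AS_DATAFILE is the documented Win32 flag for this; the imageres.dll path is just an example of a resource-heavy system DLL):

    using System;
    using System.Runtime.InteropServices;

    class ResourceDllDemo
    {
        const uint LOAD_LIBRARY_AS_DATAFILE = 0x00000002;

        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern IntPtr LoadLibraryEx(string fileName, IntPtr reserved, uint flags);

        static void Main()
        {
            // Maps the DLL into memory as plain data; none of it can execute,
            // so even a DLL built for another CPU would load fine this way.
            IntPtr handle = LoadLibraryEx(@"C:\Windows\System32\imageres.dll",
                                          IntPtr.Zero, LOAD_LIBRARY_AS_DATAFILE);
            Console.WriteLine(handle == IntPtr.Zero ? "load failed" : "loaded as data");
        }
    }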
Both a DLL and an EXE contain executable code.
In the case of a DLL, it doesn't have the necessary parts to be directly executable. It must be called from another piece of executable code. One DLL can call another, but all must ultimately be called from an EXE.
So the rules about what's compatible with what processor that apply to EXEs also apply to DLLs.