What are these extra bytes? - executable

I'm studying the PE (Portable Executable) format, and I noticed a difference between C++ programs compiled with MinGW and with MSVC:
The MSVC build has some extra bytes after 'This program cannot be run in DOS mode' and before the 'PE' magic signature.
Does anyone know what this is and why the word 'Rich' is there?

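This is the "Rich header". It is written by Microsoft's link.exe (notice the text "Rich" at the end of the block), which is why the MinGW build does not have it. It is an undocumented structure that sits between the DOS stub and the PE header (the NT headers). It records which Microsoft tools and libraries went into the link: each entry holds a product ID, a build number and a usage count, and the entries are XOR-masked with a key stored right after the "Rich" marker.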
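As a rough illustration, here is a Python sketch that locates and decodes the block. The layout it assumes (a "DanS" marker, three padding dwords, then 8-byte entries, all XORed with the key that follows "Rich") comes from community reverse-engineering write-ups, not from official documentation, and the function name parse_rich_header is made up for this example.

```python
import struct

def parse_rich_header(data: bytes):
    """Return (xor_key, [(product_id, build, count), ...]) or None if absent.

    Assumed layout (undocumented, from community write-ups):
    "DanS"^key, 3 padding dwords, N * (comp_id^key, count^key), "Rich", key.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]  # offset of "PE\0\0"
    rich = data.find(b"Rich", 0x40, e_lfanew)
    if rich == -1 or rich + 8 > len(data):
        return None
    key = struct.unpack_from("<I", data, rich + 4)[0]

    # Walk backwards in dword steps until the unmasked value reads "DanS".
    dans = struct.unpack("<I", b"DanS")[0]
    start = rich - 4
    while start >= 0x40:
        if struct.unpack_from("<I", data, start)[0] ^ key == dans:
            break
        start -= 4
    else:
        return None  # no start marker found

    entries = []
    # Skip "DanS" plus three padding dwords, then read 8-byte entries.
    for off in range(start + 16, rich, 8):
        comp_id, count = struct.unpack_from("<II", data, off)
        comp_id ^= key
        count ^= key
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return key, entries
```

Calling it on the raw bytes of an MSVC-built .exe should give a list of tool entries; a MinGW-built file would be expected to return None.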
Further reading:
The Undocumented Microsoft "Rich" Header
Microsoft's Rich Signature (undocumented)
Rich Header

use program or section headers to load an ELF

I'm writing an EFI application that loads an ELF into memory and jumps to it, but I don't know what header I should analyse first (program or section header). I have a function that reads the program headers to load the ELF into memory (which works) and a function that reads the section headers to load the ELF into memory (which also works).
The program loader should look at the program header only. The section headers are for tools such as debuggers. I don't think this is spelled out explicitly in the original ELF specification or the System V ABI specification, but it is very much implied:
System V Application Binary Interface
Even today, when new features are defined that are used by the dynamic linker, references are added to the dynamic section, even though in theory the information could also be obtained from the section headers (but there are probably some exceptions for certain architectures).
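To make the "program headers only" approach concrete, here is a minimal Python sketch (an EFI loader would do the same thing in C). It assumes a little-endian ELF64 image; the function name loadable_segments is invented for this example. It never touches the section headers: it reads the ELF header, walks the program header table and collects the PT_LOAD entries.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes

def loadable_segments(image: bytes):
    """Return (entry_point, [(vaddr, file_bytes, memsz), ...]) for PT_LOAD segments."""
    (ident, _type, _machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = EHDR.unpack_from(image, 0)
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")

    segments = []
    for i in range(phnum):
        (p_type, _p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = PHDR.unpack_from(image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            # The loader copies p_filesz bytes to p_vaddr and zero-fills
            # the remainder up to p_memsz (this is where .bss ends up).
            segments.append((p_vaddr, image[p_offset:p_offset + p_filesz], p_memsz))
    return entry, segments
```

A loader would copy each segment's bytes to its address, zero the tail, and jump to the returned entry point.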

Elf representation in HEX

I am working on understanding some basic concepts in embedded systems. My question is similar to "understand hexedit of an elf".
In order to burn compiler output to ROM, the .out file is converted to HEX (say Intel HEX). I wonder how the following information is preserved in the HEX format:
1) Section header
2) Symbol tables, debug symbols, linker symbols etc.
3) ELF header.
4) If these are preserved in the HEX file, how can they be read from it?
5) A slightly off-topic question: how does the microcontroller know on boot where .data, .bss etc. are in the HEX file and that they must be copied to RAM?
None of that is preserved. A HEX file only contains the raw program and data. https://en.wikipedia.org/wiki/Intel_HEX
The microcontroller does not know where .data and .bss are located - it doesn't even know that they exist. The start-up code which is executed before main() is called contains the start addresses of those sections - the addresses are hard-coded into the program. This start-up code will be in the HEX file like everything else.
The elements in points 1 to 3 are not included in the raw binary since they serve no purpose in the application; rather they are used by the linker and the debugger on the development host, and are unnecessary for program execution where all you need is the byte values and the address to write them to, which is more or less all the hex file contains (may also contain a start address record).
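To show how little a HEX file carries, here is a small Python sketch of an Intel HEX reader (parse_ihex is a name made up for this example). Each record is just a byte count, a 16-bit address, a record type, the data bytes and a checksum; the sketch handles data, end-of-file and the two extended-address record types and ignores the rest.

```python
def parse_ihex(text: str) -> dict:
    """Map absolute address -> byte value for an Intel HEX file's contents."""
    memory = {}
    base = 0  # upper address bits set by extended-address records
    for line in text.split():
        if not line.startswith(":"):
            raise ValueError("record does not start with ':'")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:
            raise ValueError("bad checksum")  # all bytes must sum to 0 mod 256
        count = raw[0]
        addr = int.from_bytes(raw[1:3], "big")
        rtype = raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            for i, value in enumerate(data):
                memory[base + addr + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address
            base = int.from_bytes(data, "big") << 16
        # start-address records (types 03 and 05) are ignored in this sketch
    return memory
```

The result is nothing more than addresses and byte values: there is no place in this format for section names, symbols or an ELF header.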
Systems that have dynamic linking or self-hosted debug capabilities (VxWorks, for example) use the object file itself.
With respect to point 5, the microcontroller does not need to know; the linker uses that information when resolving absolute and relative addresses in the object code. Once fully resolved (linked), the addresses are embedded in the code directly. Again, where dynamic loading/linking is used, the object file metadata is required, and such systems do not normally load a raw hex file or binary.

What is the difference in byte code like Java bytecode and files and machine code executables like ELF?

What are the differences between byte code binary executables such as Java class files, Parrot bytecode files or CLR files and machine code executables such as ELF, Mach-O and PE?
What are the distinctive differences between the two?
For example, which part of the class file corresponds to the .text area in the ELF structure?
Or: they all have headers, but the ELF and PE headers contain the architecture while the class file does not.
Java Class File
Elf file
PE File
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture independent: the runtime (CLR for .NET or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+, OS X/iOS: Mach-O, Linux/Android/etc: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM, most others: Intel 32-bit (i386) or 64-bit). These formats are all very similar, but still require sections (or, in Mach-O parlance, "Load Commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). For all of the above you can say, roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols, which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: shared objects, OS X/iOS: .dylibs or frameworks), are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table), which lets the compiler simply put in stubs for the functions you call (printf, open, etc.) that the dynamic linker connects when the executable loads.
Import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X this is an LC_LOAD_DYLIB load command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
Export table (for libraries/dylibs/etc.) lists the symbols which the library (or in Windows, even an .exe) can export so that others can link with it.
Constants are usually in what you see as the ".rodata".
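On the header question specifically, a short Python sketch can show the difference. It reads the target-architecture field from an ELF or PE header and, for a class file, only the class-file version, because that header has no architecture field at all. The function name describe is invented here, and the sketch does not try to tell a class file apart from a Mach-O universal binary, which starts with the same 0xCAFEBABE magic.

```python
import struct

def describe(data: bytes):
    """Return (format, value): the machine ID for ELF/PE, the major version for a class file."""
    if data[:4] == b"\x7fELF":
        endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
        machine = struct.unpack_from(endian + "H", data, 18)[0]  # e_machine
        return ("ELF", machine)
    if data[:2] == b"MZ":
        pe = struct.unpack_from("<I", data, 0x3C)[0]             # e_lfanew
        if data[pe:pe + 4] == b"PE\0\0":
            machine = struct.unpack_from("<H", data, pe + 4)[0]  # COFF Machine
            return ("PE", machine)
    if data[:4] == b"\xca\xfe\xba\xbe":
        # Class file header: magic, minor_version, major_version (big-endian).
        _minor, major = struct.unpack_from(">HH", data, 4)
        return ("class", major)
    return ("unknown", None)
```

For an x86-64 Linux binary the ELF machine value would be 0x3E (EM_X86_64), and for a 64-bit Windows binary the PE machine value would be 0x8664; a class file compiled for Java 8 reports major version 52 regardless of the CPU it was built on.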
Hope this helps. Really, your question was vague..
TG
Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the JVM on the target computer takes the byte code, turns it into machine code (which the processor can execute) and then runs your program. Computers cannot execute source code directly, and javac does not translate straight into machine code, so the byte code is the halfway step between the two.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" entry (the PT_INTERP program header): it holds the path name of a loader program that is started in place of the actual binary. The loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how e.g. ld.so comes in.
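As a sketch of where that path lives, the following Python snippet reads it out of the PT_INTERP program header. It assumes a little-endian ELF64 file, and the function name elf_interpreter is made up for this example.

```python
import struct

PT_INTERP = 3

def elf_interpreter(image: bytes):
    """Return the interpreter path stored in PT_INTERP, or None if there is none."""
    if image[:6] != b"\x7fELF\x02\x01":
        raise ValueError("not a little-endian ELF64 image")
    phoff = struct.unpack_from("<Q", image, 0x20)[0]           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", image, 0x36)  # e_phentsize, e_phnum
    for i in range(phnum):
        base = phoff + i * phentsize
        p_type = struct.unpack_from("<I", image, base)[0]
        if p_type == PT_INTERP:
            p_offset = struct.unpack_from("<Q", image, base + 8)[0]
            p_filesz = struct.unpack_from("<Q", image, base + 32)[0]
            # The segment holds a NUL-terminated path string.
            return image[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # statically linked binaries usually have no PT_INTERP
```

On a typical dynamically linked x86-64 Linux binary this would be expected to return the path of the system's ld.so.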
Theoretically one could create an ELF binary that holds java bytecode (or a complete jar). This just needs some appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
I'm not sure whether this has actually been done before, but it is certainly possible.
The same can be done with almost any non-native code.
It also could serve for direct multiarch support via some VM like qemu:
Let the target platform (libc+linker scripts) put the arch name into the interpreter program name (eg. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (eg. x86_64), the one with native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.

what language is dotnet executable written in?

I thought it would be Common Intermediate Language, but in notepad it does not look like that at all. Does it just look uglier in reality than in tutorials? Or is it some bytecode form that is further compiled from CIL?
It is CIL. CIL is the name of the binary format, not of the "assembler" you're thinking of.
Can you possibly imagine that .NET assemblies would be text files?
A .NET executable is a binary file that has a PE header (same as a native executable, but with slightly different values). The PE header tells the OS to load the CLR, which in turn loads the assembly.
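One of those "slightly different values" can be checked with a few lines of Python. In the PE optional header, data directory entry 14 (the CLR runtime header, also called the COM descriptor) is non-zero for managed assemblies and empty for ordinary native executables. This is only a sketch; the function name is_dotnet_assembly is invented for this example.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # CLR runtime header / COM descriptor entry

def is_dotnet_assembly(data: bytes) -> bool:
    """True if the PE image has a non-empty CLR runtime header directory."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe:pe + 4] != b"PE\0\0":
        return False
    opt = pe + 24  # 4-byte signature + 20-byte COFF file header
    magic = struct.unpack_from("<H", data, opt)[0]
    if magic == 0x10B:      # PE32
        dirs = opt + 96
    elif magic == 0x20B:    # PE32+
        dirs = opt + 112
    else:
        return False
    # NumberOfRvaAndSizes sits immediately before the directory array.
    count = struct.unpack_from("<I", data, dirs - 4)[0]
    if count <= CLR_DIRECTORY_INDEX:
        return False
    rva, size = struct.unpack_from("<II", data, dirs + CLR_DIRECTORY_INDEX * 8)
    return rva != 0 and size != 0
```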
The content beyond the header is a binary representation of the CIL code, plus some metadata and other stuff. The text you see in tutorials is the text representation of CIL, in much the same way that the assembly language code you see in a tutorial about assembly language programming is just the text representation of the binary machine code.
See http://www.yetanotherchris.me/home/2010/7/12/inside-net-assemblies-part-1.html (among many others) for more information.
A .Net executable is usually not written, it is compiled from another language such as C#, F# or VB.Net.
The contents of a .Net executable can be viewed with the ILDASM tool.
The contents are first a manifest which is used for reflection, signatures or other meta-code purposes.
Secondly there are the MSIL instructions themselves. These are in a kind of bytecode format, but ILDASM will show you what the instructions are.
And there are sometimes resources such as imagery, sounds or other content packed into the executable.
The executable is just-in-time compiled to native code either during installation (I think this is uncommon), or as a precursor to execution. The resulting native code can be stored for reuse. (This is what I was told during PDC 2001, might be "out of date".)

Pex Generated Tests Encoded UCS-2 Little Endian, Why, how to change?

Hi there,
I noticed that when I generate a Pex test solution, the default encoding of the files is UCS-2 Little Endian. This is not really cool, because all the rest of the files are normally encoded with Windows ANSI
(I'm getting this info from Notepad++), and it's confirmed by my CI breaking.
Does anyone know
1) why is it using this encoding?
2) how to change it so by default it uses Windows ANSI like the rest of the files
NOTE: I know this is the issue because I saved the file with Windows ANSI encoding and it all works.
I know I probably shouldn't, but I went and posted this same question on the Pex forum:
link to the question
and this was the answer from Peli (he is heavily involved in the Pex project AFAIK):
Copy of the Answer
1) why is it using this encoding?
There is no particular reason for this, besides that we decided to use this particular encoding. We will switch to Windows-1252 (ANSI) encoding in the future for source files. XML files will still be encoded as UTF-8.
2) how to change it so by default it uses Windows ANSI like the rest of the files
Unfortunately, this is hard-coded in Pex and you cannot change this. The next release of Pex (0.93) will use ANSI.
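Until then, the manual re-save described in the question could be scripted as a post-generation step. The Python sketch below is only an assumption about how one might do that and is not part of Pex: it treats a file as UCS-2/UTF-16 LE when it starts with the FF FE byte order mark and re-encodes it as Windows-1252, leaving everything else untouched. The function name utf16le_to_ansi is made up for this example.

```python
import codecs

def utf16le_to_ansi(raw: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-16 LE content as Windows-1252; pass other content through."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        # Raises UnicodeEncodeError if a character has no Windows-1252 equivalent.
        return text.encode("cp1252")
    return raw
```

A build step would read each generated file in binary mode, pass the bytes through this function and write the result back.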