Building a COM object vtable in x86 assembly

I am building a COM object in x86 assembly using NASM. I understand COM quite well and I understand x86 assembly pretty well, but getting the two to mesh is getting me hung up... (by the way, if you're thinking of attempting to dissuade me from using x86 assembly, please refrain, I have very particular reasons why I'm building this in x86 assembly!)
I am trying to build a vtable to use in my COM object, but I keep getting strange pointers, rather than actual pointers to my functions. (I'm thinking that I'm getting relative offsets or that NASM is embedding temporary values in there and they're not being replaced with the real values during linking)
The current interface I'm trying to build is the IClassFactory interface, with code as follows:
%define S_OK          0x00000000
%define E_NOINTERFACE 0x80004002
section .text
; All of these have very simple shells rather than implementations, but that is just until I can get the vtable worked out
ClassFactory_QueryInterface:            ; stdcall: this, riid, ppvObject
    mov eax, E_NOINTERFACE
    retn 12                             ; callee pops 3 dword arguments
ClassFactory_AddRef:                    ; stdcall: this
    mov eax, 1
    retn 4
ClassFactory_Release:                   ; stdcall: this
    mov eax, 1
    retn 4
ClassFactory_CreateInstance:            ; stdcall: this, pUnkOuter, riid, ppvObject
    mov eax, E_NOINTERFACE
    retn 16
ClassFactory_LockServer:                ; stdcall: this, fLock
    mov eax, S_OK
    retn 8
global ClassFactory_vtable
ClassFactory_vtable:                    ; slot order must match IClassFactory
    dd ClassFactory_QueryInterface
    dd ClassFactory_AddRef
    dd ClassFactory_Release
    dd ClassFactory_CreateInstance
    dd ClassFactory_LockServer
global ClassFactory_object
ClassFactory_object:
    dd ClassFactory_vtable              ; first dword of a COM object points at its vtable
Note: This is not all of the code, I have DllGetClassObject, DllMain, etc. in a different file.
But when I assemble (using NASM: nasm -f win32 comobject.asm) and link (using MS Link: link /dll /subsystem:windows /out:comobject.dll comobject.obj), and examine the executable using OllyDbg, the vtable comes out with strange values. For example, in my last build, the actual addresses for the functions are as follows:
QueryInterface - 0x00381012
AddRef - 0x0038101A
Release - 0x00381020
CreateInstance - 0x00381026
LockServer - 0x0038102E
But the vtable came out with these values:
QueryInterface - 0x00F51012
AddRef - 0x00F5101A
Release - 0x00F51020
CreateInstance - 0x00F51026
LockServer - 0x00F5102E
These values look awfully suspicious... almost like the relocation didn't take. Also, the pointer to the vtable comes out as 0x00F5104A. All of these are inaccessible memory addresses. (For what it's worth, these values come out different every time.)
I tried doing the same thing in C++ using Visual Studio 2010 Express and everything comes out fine. So I'm assuming that it's just something that I'm missing in my assembly...
Can anyone point out to me why these values aren't coming out properly?

I must apologize, the problem turned out to be my own fault... In all of the scuffle building the thing, I had removed the /dll from the linker invocation, causing it to be built as an EXE, not a DLL...
Let me explain this a little better for the next person who runs across this.
All Windows executables have a base address which is assumed to be the virtual address that the executable will be loaded into. Executables that are loaded into a running process in most cases will not be loaded at the "preferred" base address, because another DLL (or the application itself) is probably already occupying the address. For this reason, Windows PE executables use what is called a Relocation Table. The Relocation Table tells Windows which locations in the executable need to be rewritten in case of a relocation to a new base address.
However, with the advent of virtual memory, most linkers omit the relocation table from EXEs as an optimization, because the executable will always be loaded at its base address (unless it conflicts with the reserved kernel addresses, in which case it will fail to load altogether). So because I stopped linking as a DLL, my executable was not being given a relocation table and, as a result, would not load properly into a running process's address space.
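To make the relocation step concrete, here is a minimal Python sketch of what a loader does with a relocation table. It is only an illustration: the vtable is modelled as a bytearray, the table as a plain list of offsets, and the function name apply_base_relocations is hypothetical. The two base addresses are assumptions inferred from the values in the question (stored pointers start at 0x00F5..., real functions at 0x0038...).

```python
import struct

def apply_base_relocations(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit absolute address the table lists."""
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return image

PREFERRED_BASE = 0x00F50000  # assumption: base the stored pointers were built for
ACTUAL_BASE = 0x00380000     # assumption: base the module was really loaded at

# Function RVAs taken from the addresses quoted in the question.
rvas = [0x1012, 0x101A, 0x1020, 0x1026, 0x102E]

# The vtable as the linker wrote it: five absolute 32-bit pointers.
vtable = bytearray(struct.pack("<5I", *(PREFERRED_BASE + r for r in rvas)))
stored = list(struct.unpack("<5I", vtable))

# One relocation entry per dword of the vtable.
apply_base_relocations(vtable, range(0, 20, 4), PREFERRED_BASE, ACTUAL_BASE)
patched = list(struct.unpack("<5I", vtable))
```

With an empty relocation list the dwords would keep their link-time values, which matches the symptom in the question: every stored pointer is off from the real one by the same constant, 0x00BD0000.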
Update:
By default, MSVC only includes relocation tables in DLL projects, as described on MSDN:
By default, /FIXED:NO is the default when building a DLL, and /FIXED is the default for any other project type.
This behavior can be changed by supplying the /FIXED:NO switch to the linker. The default for non-DLL projects is /FIXED which tells the linker that the target has a fixed base address and does not require a relocation table.
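One way to check which of the two cases a given binary falls into is to look at the Characteristics word in its COFF file header. The Python sketch below is a rough illustration, not a full PE parser; the helper names (pe_characteristics, describe, _fake_image) are made up for this example, and the demonstration uses a tiny synthetic header rather than a real file.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # relocation info was removed from the file
IMAGE_FILE_DLL = 0x2000              # the image is a DLL

def pe_characteristics(data):
    """Return the Characteristics word from the COFF file header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature; Characteristics is at offset 18.
    (flags,) = struct.unpack_from("<H", data, e_lfanew + 4 + 18)
    return flags

def describe(data):
    flags = pe_characteristics(data)
    return {
        "is_dll": bool(flags & IMAGE_FILE_DLL),
        "relocs_stripped": bool(flags & IMAGE_FILE_RELOCS_STRIPPED),
    }

def _fake_image(flags):
    """Build a minimal synthetic header (not a loadable file) for the demo."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)   # e_lfanew: PE header right after
    coff = bytearray(20)
    struct.pack_into("<H", coff, 18, flags)
    return bytes(dos) + b"PE\0\0" + bytes(coff)

exe_like = describe(_fake_image(0x0103))  # relocations stripped, not a DLL
dll_like = describe(_fake_image(0x2102))  # DLL, relocations kept
```

Against a real build this would be called as describe(open("comobject.dll", "rb").read()); a result with relocs_stripped set would point at the problem described above.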

Have you tried to build a stub COM interface in C and disassemble the result? That should give you a clue what is going wrong in your implementation.

Have you tried to also declare your globals as export? I haven't done x86 for a long time. But reading nasm's doc seems to imply you need to both global and export for the DLL relocation fixup to work.

Related

About the necessity of GOT & PLT structures in ELF header

In some architectures (like x86_64) where it's possible to reference data (mov e.g.) and code (jmp, call) using PC(RIP)-relative addressing mode, is there really a technical reason justifying the need of such structures (got, plt) ?
I mean, if I want to mov a global data (for instance) to a register, I could make the following instruction (standard PIE) :
mov rax,QWORD PTR [rip+0x2009db]
mov eax,DWORD PTR [rax]
(where 0x2009db is the offset between rip and the right entry in the got containing the symbol address)
And why couldn't we do something like that :
mov rax, rip+0xYYYYYY
mov eax,DWORD PTR [rax]
(0xYYYYYY being the direct delta between the RIP value and the symbol, e.g. a global variable)
I'm not used to writing assembly, so my example may be wrong. But my idea is: why not simply compute the absolute address of the symbol from RIP, put it in RAX, and then access its content? If the instruction set allows relative addressing for everything, why use such structures (GOT, PLT)?
The same question would apply for call/jmp instructions.
Is it because the instruction set does not allow it?
Is it because the offset value cannot cover the entire address space? But does that matter, given that the layout of the sections is maintained once they are mapped into the virtual address space of a process (e.g. the .data section followed by .got or something similar)? I mean, why would the offset be bigger when referring directly to the symbol address instead of to its entry in the GOT?
Other reason ?
Thanks !

Basically, the reason for these structures is exactly having an extra level of indirection.
In this way, you can interpose symbols in dynamic libraries with LD_PRELOAD. And even without it, the dynamic binding rules are such that a symbol defined in an executable overrides one defined in a shared library even for calls from that library (see this).
Also, consider these points.
The address at which a shared library holding the implementation of the called function gets loaded is not known beforehand (this is deliberate: with ASLR, libraries are mapped at varying addresses), so the dynamic loader needs to apply relocations to at least all call sites that are executed at run time.
If not for the PLT, this would kill the advantage of sharing code segments of libraries mapped to memory by different process images, because in different processes the same library might have a different address, leading to different patched code. With the PLT, only a relatively small piece of data has to be private to each process. See this post.
The PLT allows functions to be bound lazily, upon first invocation. PLT slots initially hold the address of the resolver. After the resolution is done, the result is cached in the PLT slot.
Relocation mechanism for GOT/PLT is covered here. All in all, there's enough information on the internet on how (and why) PLT and GOT work.
Also, check out GCC's -fno-plt option. This is an optimization, but note that GOT is still needed and that lazy binding is not supported for functions without PLT entries.
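The two ideas above (lazy binding through a slot, and interposition through search order) can be imitated in a few lines of Python. This is a toy model only: LazySlot and make_resolver are hypothetical names, dictionaries stand in for loaded libraries, and nothing here touches real ELF structures.

```python
class LazySlot:
    """Toy model of one lazily bound call slot."""

    def __init__(self, name, resolver):
        self.name = name
        self.resolver = resolver
        self.target = None   # unresolved until the first call
        self.lookups = 0     # how many times the resolver ran

    def __call__(self, *args):
        if self.target is None:
            # First call: ask the resolver, then cache the result in the slot.
            self.lookups += 1
            self.target = self.resolver(self.name)
        return self.target(*args)

def make_resolver(*scopes):
    """Search scopes in order, so an earlier scope interposes on later ones."""
    def resolve(name):
        for scope in scopes:
            if name in scope:
                return scope[name]
        raise LookupError(name)
    return resolve

libc = {"strlen": len}                 # stand-in for the real library
preload = {"strlen": lambda s: -1}     # stand-in for an LD_PRELOAD library

plain = LazySlot("strlen", make_resolver(libc))
hooked = LazySlot("strlen", make_resolver(preload, libc))

results = (plain("abc"), plain("abcd"), plain.lookups, hooked("abc"))
```

The caller's code is the same in both cases; only the content of the slot differs, which is the point of the extra level of indirection.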

Why can't you statically link dynamic libraries?

When using external libraries, you often have to decide whether you use the static or the dynamic version of the library. Typically, you cannot exchange them: if the library is built as a dynamic library, you cannot link statically against it.
Why is this the case?
Example: I am building a C++ program on windows and use a library that provides a small .lib file for the linker and a large .dll file that must be present when running my executable. If the library code in the .dll can be resolved at runtime, why can't it be resolved at compile time and directly put into my executable?

Why is this the case?
Most linkers (AIX linker is a notable exception) discard information in the process of linking.
For example, suppose you have foo.o with foo in it, and bar.o with bar in it. Suppose foo calls bar.
After you link foo.o and bar.o together into a shared library, the linker merges code and data sections, and resolves references. The call from foo to bar becomes CALL $relative_offset. After this operation, you can no longer tell where the boundary between code that came from foo.o and code that came from bar.o was, nor the name that CALL $relative_offset used in foo.o -- the relocation entry has been discarded.
Suppose now you want to link foobar.so with your main.o statically, and suppose main.o already defines its own bar.
If you had libfoobar.a, that would be trivial: the linker would pull foo.o from the archive, would not use bar.o from the archive, and resolve the call from foo.o to bar from main.o.
But it should be clear that none of the above is possible with foobar.so -- the call has already been resolved to the other bar, and you can't discard code that came from bar.o because you don't know where that code is.
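The archive case can be sketched with a toy linker in Python. This is a simplified model under stated assumptions: objects are plain dictionaries listing the symbols they define and need, link_static is a hypothetical name, and real linkers have many more rules (archive ordering, weak symbols, and so on).

```python
def link_static(main_obj, archive):
    """Toy archive linking: pull a member only if it defines a symbol
    that is still undefined; symbols defined earlier win."""
    defined = {sym: main_obj["name"] for sym in main_obj["defines"]}
    undefined = set(main_obj["needs"]) - set(defined)
    pulled = []
    progress = True
    while undefined and progress:
        progress = False
        for member in archive:
            if member["name"] in pulled:
                continue
            if not undefined & set(member["defines"]):
                continue  # nothing we need: leave this member in the archive
            pulled.append(member["name"])
            for sym in member["defines"]:
                defined.setdefault(sym, member["name"])
            undefined |= set(member["needs"])
            undefined -= set(defined)
            progress = True
    return pulled, defined

main_o = {"name": "main.o", "defines": ["main", "bar"], "needs": ["foo"]}
libfoobar_a = [
    {"name": "foo.o", "defines": ["foo"], "needs": ["bar"]},
    {"name": "bar.o", "defines": ["bar"], "needs": []},
]

pulled, defined = link_static(main_o, libfoobar_a)
```

Here only foo.o is pulled, and foo's reference to bar binds to the definition in main.o. With foobar.so there are no separate members left to choose from, so this selection step has nothing to work with.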
On AIX it's possible (or at least it used to be possible 10 years ago) to "unlink" a shared library and turn it back into an archive, which could then be linked statically into a different shared library or a main executable.
If foo.o and bar.o are linked into a foobar.so, wouldn't it make sense that the call from foo to bar is always resolved to the one in bar.o?
This is one place where UNIX shared libraries work very differently from Windows DLLs. On UNIX (under common conditions), the call from foo to bar will resolve to the bar in main executable.
This allows one to e.g. implement malloc and free in the main a.out, and have all calls to malloc use that one heap implementation consistently. On Windows you would have to always keep track of "which heap implementation did this memory come from".
The UNIX model is not without disadvantages though, as the shared library is not a self-contained mostly hermetic unit (unlike a Windows DLL).
Why would you want to resolve it to another bar from main.o?
If you don't resolve the call to main.o, you end up with a totally different program, compared to linking against libfoobar.a.

Discovering registered COM components

Is there a way to determine if a registered COM component is creatable as a stand-alone component simply by parsing the information available in the registry? In particular, by the information found in HKCR/ClsId?
My system has over 12,000 entries in this key, and I am already excluding any items that do not have an InProcServer32 or LocalServer32 key, but this only eliminates about half of the items. I believe there are still another couple thousand that are not creatable objects. I really don't want to have to attempt to do a CreateObject() on every one of them to distinguish the ones that can be created from the ones that cannot. Is there a more efficient way?

Oleview
I used Oleview for this purpose (back in the day :))
Manual/programmatic
If I remember correctly (no Windows PC nearby):
the class should link to a typelibrary
the typelib will point to a binary (dll, ocx, exe)
this binary contains the physical typelibrary, which you should parse
the midl compiler can do that (generate stubs/C headers)
oleview can do that (extract IDL)
tlbimp can do that
you can do it with Win32 API
any creatable objects should be marked coclass (not interface or source; there were also global modules, which I suppose are creatable too, I'm just not sure whether they are defined as coclasses)
Show me the code
It is possible to read the information within a type library with the ITypeLib and ITypeInfo interfaces. Type libraries can be created with the ICreateTypeLib and ICreateTypeInfo interfaces. However, the Microsoft IDL compiler (MIDL) is probably the only application to ever use ICreateTypeLib and ICreateTypeInfo.
A quick google turned up this useful page: Reading Type Libraries with C++.
It contains just the code to get started. Just to see whether it was worth anything, I fired up a cloud Windows instance, grabbed all the sources and compiled it.
In contrast with the options mentioned on the site, I simply compiled on windows with
cl.exe *.cpp /EHs ole32.lib oleaut32.lib
Just for fun, I compiled the stuff on Linux (64 bit) using MinGW:
i586-mingw32msvc-g++ *.cpp -loleaut32 -lole32 -o Typelib.exe
To save you the work I have put a zip-file up for download containing:
win32_Unicode.cpp - sources by René Nyffenegger
win32_Unicode.h
TestTypelib.cpp
Typelib.cpp
Typelib.h
VariantHelper.cpp
VariantHelper.h
TestTypelib.exe - binary compiled on windows
A test run:
# linux: ./a.exe ~/.wine/drive_c/windows/system32/msxml6.dll
C:\Games\Stacko>TestTypelib.exe c:\Windows\System32\msxml6.dll
MSXML2: Microsoft XML, v6.0
Nof Type Infos: 149
IXMLDOMImplementation
----------------------------
Interface: Dispatch
functions: 8
variables: 0
Function : QueryInterface
returns : VT_VOID
flags :
invoke kind: function
params : 2
params opt : 0
Parameter : riid type = VT_PTR (VT_USERDEFINED (GUID)) in
Parameter : ppvObj type = VT_PTR (VT_PTR) out
Function : AddRef
returns : VT_UI4
flags :
invoke kind: function
params : 0
params opt : 0
(snip) and 15499 lines more
Concluding
I hope this gives you a good starting point in scanning your system for installed, creatable COM components.

Depends what you mean by "createable". If it has a LocalServer32 or InprocServer32 key it should be locally creatable. It may also be creatable remotely if it has an AppID and the AppID has either LocalService or RemoteServer keys.
However consulting the registry will only answer the question "does it look like it ought to be creatable".
You might still not be able to create it:
The registration might be broken, or consist of "fossil" registry entries left behind by uninstalled components.
The component might be an internal Windows component of some sort that you have no idea how to use since it is intentionally not documented.
The component might be an internal component of an installed application which has additional requirements not documented.
You might not have permission.
There may be other components you could create:
There might be registration-free COM components, such as WSC scriptlets.
There might be registration-free COM DLLs. There is no law saying you have to be registered to be a COM component. Registration is an optional service that most people opt into.
So I guess the answer is you should be able to get a mostly complete list using the registry, but what is the list for?
Without knowing what you want the list for, it is impossible to know if the list is good enough.
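As a rough sketch of the "does it look like it ought to be creatable" test, the Python below separates the filtering logic from the registry access. The function names are hypothetical, the sample CLSIDs are placeholders, and the AppID-based remote case mentioned above is not modelled. read_clsid_table uses the standard winreg module and only works on Windows; it is not exercised here.

```python
SERVER_KEYS = {"inprocserver32", "localserver32"}

def looks_creatable(subkeys):
    """True if a CLSID entry has a server subkey (registry names are case-insensitive)."""
    return bool({name.lower() for name in subkeys} & SERVER_KEYS)

def classify(clsid_table):
    """Split a {clsid: [subkey names]} mapping into candidates and rejects."""
    candidates, rejects = [], []
    for clsid, subkeys in sorted(clsid_table.items()):
        (candidates if looks_creatable(subkeys) else rejects).append(clsid)
    return candidates, rejects

def read_clsid_table():
    """Windows only: build the mapping from HKEY_CLASSES_ROOT\\CLSID."""
    import winreg
    table = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "CLSID") as root:
        for i in range(winreg.QueryInfoKey(root)[0]):
            clsid = winreg.EnumKey(root, i)
            try:
                with winreg.OpenKey(root, clsid) as key:
                    count = winreg.QueryInfoKey(key)[0]
                    table[clsid] = [winreg.EnumKey(key, j) for j in range(count)]
            except OSError:
                continue  # e.g. access denied on this entry
    return table

# Placeholder data standing in for real registry content.
sample = {
    "{00000000-0000-0000-0000-000000000001}": ["InprocServer32", "ProgID"],
    "{00000000-0000-0000-0000-000000000002}": ["TypeLib"],
    "{00000000-0000-0000-0000-000000000003}": ["LocalServer32"],
}
candidates, rejects = classify(sample)
```

Passing this filter only means the entry looks creatable; the caveats listed above (broken registration, undocumented components, permissions) still apply.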

What's the principle of LOADDLL.EXE?

It can be used to run an arbitrary Dynamic Link Library in Windows.
How can it possibly know the entry point of an arbitrary DLL?

The answer depends on how much detail you need. Basically, it comes down to this:
A DLL can optionally specify an entry-point function. If present, the system calls the entry-point function whenever a process or thread loads or unloads the DLL.
[...] If you are providing your own entry-point, see the DllMain function. The name DllMain is a placeholder for a user-defined function. You must specify the actual name you use when you build your DLL.
(Taken from the MSDN article Dynamic-Link Library Entry-Point Function.)
So basically, the entry point can be specified inside the DLL, and the operating system's DLL loader knows how to look this up.

The IMAGE_OPTIONAL_HEADER (part of the Portable Executable header on Windows machines) contains the AddressOfEntryPoint field: the RVA of the entry point, which is what anything looking for an entry point to call (e.g., the loader) reads.
More information on the IMAGE_OPTIONAL_HEADER can be found here. And this paper is good for just general PE knowledge.

What do you mean by "run a DLL"? DLLs aren't normal programs, they are just a collection of functions. The entry point itself usually doesn't do much apart from initializing stuff required by other functions in the DLL. The entry point is automatically called when the DLL is loaded (you can use LoadLibrary to do this).
If you want to call a specific function after loading the DLL, you can use GetProcAddress to get a pointer to the function you want.

Is it OK to use boost::shared_ptr in DLL interface?

Is it valid to develop a DLL in C++ that returns boost shared pointers and uses them as parameters?
So, is it ok to export functions like this?
1.) boost::shared_ptr<Connection> startConnection();
2.) void sendToConnection(boost::shared_ptr<Connection> conn, byte* data, int len);
In particular: does the reference count work across DLL boundaries, or would the requirement be that exe and dll use the same runtime?
The intention is to overcome the problems with object ownership. So the object gets deleted when both dll and exe don't reference it any more.

According to Scott Meyers in Effective C++ (3rd Edition), shared_ptrs are safe across DLL boundaries. The shared_ptr object keeps a pointer to the deleter from the DLL that created it.
In his book in Item 18 he states, "An especially nice feature of tr1::shared_ptr is that it automatically uses its per-pointer deleter to eliminate another potential client error, the "cross-DLL problem." This problem crops up when an object is created using new in one dynamically linked library (DLL) but is deleted in a different DLL. On many platforms, such cross-DLL new/delete pairs lead to runtime errors. tr1::shared_ptr avoid the problem, because its default deleter uses delete from the same DLL where the tr1::shared_ptr is created."
Tim Lesher has an interesting gotcha to watch for, though, that he mentions here. You need to make sure that the DLL that created the shared_ptr isn't unloaded before the shared_ptr finally goes out of scope. I would say that in most cases this isn't something you have to watch for, but if you're creating dlls that will be loosely coupled then I would recommend against using a shared_ptr.
Another potential downside is making sure both sides are created with compatible versions of the boost library. Boost's shared_ptr has been stable for a long while. At least since 1.34 it's been tr1 compatible.

In my opinion, if it's not in the standard and it's not an object/mechanism provided by your library, then it shouldn't be part of the interface to the library. You can create your own object to do the reference counting, and perhaps use boost underneath, but it shouldn't be explicitly exposed in the interface.

DLLs do not normally own resources - the resources are owned by the processes that use the DLL. You are probably better off returning a plain pointer, which you then store in a shared pointer on the calling side. But without more info it's hard to be 100% certain about this.

Something to look out for if you expose raw pointers from a DLL interface: it forces you to use the shared DLL CRT, because memory allocated in one CRT cannot be deallocated in a different CRT. If you use the shared DLL CRT in all your modules (DLLs and EXEs) then you are fine, since they all share the same heap; if you don't, you will be crossing CRTs and the world will melt down.
Aside from that issue, I agree with the accepted answer. The creation factory probably shouldn't define ownership & lifecycle management for the client code.

No, it is not.
The layout of boost::shared_ptr<T> might not be the same on both sides of the DLL boundary. (Layout is influenced by compiler version, packing pragmas, and other compiler options, as well as the actual version of the Boost source code.)
Only "standard layout" (a new concept in C++11, related to the old "POD = plain old data" concept) types can safely be passed between separately-built modules.
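The effect of packing on layout can be shown with a small analogy in Python's struct module. This is not the layout of boost::shared_ptr; it is just the same pair of fields (one char followed by one 64-bit integer, chosen arbitrarily) sized once with native alignment and once with no padding, as two differently configured builds might see it.

```python
import struct

FIELDS = "cq"  # one char followed by one 64-bit integer

# "@" uses the platform's native alignment, "=" uses no padding at all.
native_size = struct.calcsize("@" + FIELDS)
packed_size = struct.calcsize("=" + FIELDS)

# "0q" adds only the padding needed to align a following 64-bit field,
# so these give the offset at which that field would start.
native_offset = struct.calcsize("@c0q")
packed_offset = struct.calcsize("=c0q")

layouts_agree = (native_size, native_offset) == (packed_size, packed_offset)
```

If one module writes the integer at the native offset and the other reads it at the packed offset, they are looking at different bytes of the same object, which is the kind of mismatch described above.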
