Deciphering undocumented COM interfaces - com

I have a pointer to a COM object that implements an undocumented interface. I would really, really like to be able to use said interface. All I have is the IID though. Master software analyst Geoff Chappell has documented a host of these undocumented COM interfaces on his site; see IListView for example. Somehow he even managed to get the function names and signatures. How is something like that even possible? Are they guesses?
Can someone point me in the right direction as to how I would go about something like this? I know the risks of using anything undocumented.
To elaborate, the object I'm interested in is ExplorerFrame.dll's notoriously undocumented ItemsView. By setting an API hook on CoCreateInstance, I can see that the object is created with a certain undocumented IID as its main interface. I'm assuming this is the interface through which the control is manipulated, hence my interest in figuring out its members.

You know, you could write to me and ask! There was a time when I would write explicitly that the names and prototypes come from Microsoft's public symbol files, but I long ago abandoned that as verbiage. What sort of reverse engineer would I be if I was always explaining how I got my information! I'd be insulting those of my readers who are reverse engineers and I'd risk boring those who just want the information (which, let's face it, is typically not riveting).
If you don't have the public symbol files, then typelibs are the next best thing. But, of course, not all interfaces appear in the typelibs - not even all that implement IDispatch.
Given that you have an executable and its public symbol file, getting the IID and listing the methods is very nearly the simplest reverse engineering. It's maybe just a bit too complex for reliable automation - though I'd love to be proved wrong on that.
You likely know of the interface because you have a virtual function table for an implementation. Most likely, you found this because you're reverse engineering a class, in which case you find the virtual function tables for all its interfaces by working from the constructor or destructor. The virtual function table is an array of pointers to functions. The public symbol files give you the decorated names of these functions. A competent reverse engineer can undecorate these symbols by sight, mostly, and Visual C++ provides an UNDNAME tool (and your debugger or disassembler may anyway do the work for you). Finding the IID typically requires inspection of the QueryInterface method, matching against the known offset of the interface's virtual function table from the start of the class.
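To make that concrete, here is a purely illustrative C++ sketch of what you end up writing once the IID and the vtable layout have been recovered. The interface name, the GUID and the two methods below are made-up placeholders, not the real ItemsView interface.

```cpp
#include <unknwn.h>

// Placeholder GUID standing in for whatever IID QueryInterface was matched against.
static const IID IID_IUndocumented =
    { 0x12345678, 0x1234, 0x1234, { 0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x12, 0x34 } };

// Methods listed in vtable order, with prototypes undecorated from the PDB symbols.
struct IUndocumented : public IUnknown {
    virtual HRESULT STDMETHODCALLTYPE GetItemCount(int* count) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetItem(int index, IUnknown** item) = 0;
};

// Usage: ask the object you already hold for the undocumented interface.
// IUndocumented* undoc = nullptr;
// HRESULT hr = pSomeObject->QueryInterface(IID_IUndocumented,
//                                          reinterpret_cast<void**>(&undoc));
```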
For a simple interface of, say, half a dozen methods, the whole exercise of writing up just a basic listing of IID, offsets and prototypes takes maybe 10 minutes on a good day, and no more than 30 if you're being lazy. Of course, with a lot of these undocumented interfaces, you may then want to check that the implementation and IID are the same in multiple versions - which can quickly turn a good day into a bad one.
By the way, if I guess something or hypothesise, I try to be sure of saying so. For instance, near the end of the documentation you cite of the otherwise undocumented IListView interface, I speak of a window message: you can know the name I give is made up by me because I say "perhaps named something like".

The definitive interpreter of PDB files is MSPDBxx.DLL. The primary tool for interpreting PDB files is the debugger, and by extension nowadays also the Microsoft Visual C++ linker in its guise as the DUMPBIN disassembler. These do not show everything from the PDB files, but they do all the basic stuff, such as listing all the symbols, labelling code and data, and summarising from whatever type information is in the file (which is typically none in public symbol files).
As usual, a competent - well, accomplished - reverse engineer can read these files by sight for information not shown by the standard tools. The most notable example is the section contribution information, which is as close as the public symbol files come to matching code to source files.
How you point the debuggers at symbol files is well documented. My practice for making a listing with DUMPBIN is just to copy both the binary and the corresponding PDB file to the current directory. As long as the filename of the PDB file matches the filename in the binary's debug directory, DUMPBIN works with the PDB file automatically. It really couldn't be easier.
I imagine that non-Microsoft disassemblers and decompilers are at least as capable of using whatever PDB file happens to be available for the target binary.

If your pointer implements IDispatch (which is quite likely) you can QueryInterface for that and then call GetIDsOfNames. You'll likely end up guessing which interfaces and method names it might support and calling QueryInterface just to see what works :)
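As a rough C++ sketch of that probing approach (error handling trimmed; the method name is obviously just a guess, and you would link with oleaut32.lib and uuid.lib):

```cpp
#include <windows.h>
#include <oaidl.h>

// Ask an unknown object whether it speaks IDispatch, then ask it about a name we suspect.
void Probe(IUnknown* unknown) {
    IDispatch* disp = nullptr;
    if (SUCCEEDED(unknown->QueryInterface(IID_IDispatch,
                                          reinterpret_cast<void**>(&disp)))) {
        wchar_t methodName[] = L"SomeSuspectedMethod";   // hypothetical name to probe for
        LPOLESTR names[] = { methodName };
        DISPID dispid = 0;
        HRESULT hr = disp->GetIDsOfNames(IID_NULL, names, 1,
                                         LOCALE_USER_DEFAULT, &dispid);
        // S_OK means the object knows that name; DISP_E_UNKNOWNNAME means it doesn't.
        disp->Release();
    }
}
```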

Related

Where can I get a copy of Dwrite_1.dll?

On Windows 10, I am trying to write VBA code to list the Unicode code point ranges in a font.
I was able to use GetFontUnicodeRanges in gdi32.dll, but it does not handle code points beyond U+FFFF.
Further research uncovered GetUnicodeRanges (part of DirectWrite) in Dwrite_1.dll, which should handle all Unicode code points, but that DLL is not on my system (although I have Dwrite.dll, which does NOT contain GetFontUnicodeRanges).
I searched SO, the internet at large and Microsoft's web sites but could not find that dll.
Question: Does anyone know how/where I can get a copy of Dwrite_1.dll?
There is no DWrite_1.dll. You're mixing up the DLL with the header file: the GetUnicodeRanges method belongs to the IDWriteFont1 and IDWriteFontFace1 interfaces, which are declared in the DWrite_1.h header file; the implementation lives in DWrite.dll.
DWrite makes use of COM. You start by calling the DWriteCreateFactory function to get a factory interface, that is, an object that implements the requested factory interface. DWrite has multiple factory interfaces corresponding to different versions: IDWriteFactory (v1), IDWriteFactory1 (v2), and so on, each adding new functionality.
IIRC, VBA makes use of COM, but I've never tried to call into DWrite.dll from VBA. I'd search for discussions on VBA calling into COM interfaces.
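For what it's worth, here is a rough sketch of that COM flow in C++ rather than VBA, assuming a Windows version where dwrite_1.h is available; error handling is trimmed and COM objects are not released.

```cpp
// Link with dwrite.lib, gdi32.lib and user32.lib.
#include <windows.h>
#include <dwrite_1.h>
#include <stdio.h>

int main() {
    IDWriteFactory* factory = nullptr;
    DWriteCreateFactory(DWRITE_FACTORY_TYPE_SHARED, __uuidof(IDWriteFactory),
                        reinterpret_cast<IUnknown**>(&factory));

    // Get a DirectWrite font face for whatever font is selected into a GDI DC.
    HDC hdc = GetDC(nullptr);
    SelectObject(hdc, CreateFontW(0, 0, 0, 0, FW_NORMAL, 0, 0, 0, DEFAULT_CHARSET,
                                  0, 0, 0, 0, L"Segoe UI"));
    IDWriteGdiInterop* interop = nullptr;
    factory->GetGdiInterop(&interop);
    IDWriteFontFace* face = nullptr;
    interop->CreateFontFaceFromHdc(hdc, &face);

    // GetUnicodeRanges lives on the v1 interface, so QueryInterface for it.
    IDWriteFontFace1* face1 = nullptr;
    face->QueryInterface(__uuidof(IDWriteFontFace1), reinterpret_cast<void**>(&face1));

    UINT32 count = 0;
    face1->GetUnicodeRanges(0, nullptr, &count);        // first call: how many ranges?
    DWRITE_UNICODE_RANGE* ranges = new DWRITE_UNICODE_RANGE[count];
    face1->GetUnicodeRanges(count, ranges, &count);     // second call: fetch them

    for (UINT32 i = 0; i < count; ++i)
        wprintf(L"U+%04X..U+%04X\n", ranges[i].first, ranges[i].last);

    delete[] ranges;
    return 0;
}
```

The same two-call pattern (ask for the count, then fetch the ranges) is what any VBA declaration of the interface would have to reproduce.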
Do you really need to do this programmatically? There are tools you can use to inspect fonts. I've long used SIL ViewGlyph; also check out BabelMap.

Environment Variable To Register Libraries From Custom Location (OCX, DLL)

I've searched far and wide for this specific problem, but I only find separate solutions for each problem individually. I basically want to know what the name of the environment variable should be. My assumption is that the name of the variable should be the name of the component and that it should be a User variable and not a System variable, for example:
name -> "mydll.dll"
path -> "c:\myCustomPath\mydll.dll"
I want to do this for two reasons. First, I often run my custom-made tools either directly from the source code in a VM (which is sort of a pain), or I compile them and run them on W10. However, I just cannot do that with more complex apps that have dependencies, because then I would have to register tons of DLLs into the system root, and I know that I would lose track of them easily. The second reason is this reply, where the author says it's not recommended to use the system root for private libraries and suggests using an environment variable, which sounded like a good solution to my problem.
The reason I have not tested this myself through trial and error is that I'm afraid of leaving my only computer unusable if I put something wrong in the variable. Also, all the libraries and exe files that I'm using are written and compiled in VB6, so I have no easy way around it: I already tried merging multiple projects into one on a rather small project, and I ended up rewriting almost the whole thing because VB6 doesn't like public Types, Enums, etc. in private object classes.
Finally, I am not sure if my question should be here since it doesn't involve programming, but I just felt it would be better understood here.
If I understand your question correctly, you are asking where you can place COM DLLs so that you can register them on your computer.
The answer is - fundamentally - that it does not matter where they are located because registration has a "global" effect. (Simplifying a little).
Now of course there are standards or conventions for where system-wide registered DLLs should go - e.g., Windows\SysWOW64 folder. But the point is that if you register the wrong thing, or leave out dependencies, or remove a registered DLL without unregistering it - etc. etc. - you will cause problems.
I am not aware of any environment variable that has anything to do with this basic function of COM DLLs. (I may be ignorant of something).
If you are actually using an application manifest (as maybe implied in the question) then you don't need to and should not register any DLL which is manifested.

Get importlib directives from type library

How can one programmatically determine which type libraries (GUID and version) a given native, VB6-generated DLL/OCX depends on?
For background: The VB6 IDE chokes when opening a project where one of the referenced type libraries can't load one of its dependencies, but it's not so helpful as to say which dependency can't be met--or even which reference has the dependency that can't be met. This is a common occurrence at my company, so I'm trying to supplement the VB6 IDE's poor troubleshooting information.
Relevant details/attempts:
I do have the VB source code. That tells me the GUIDs and versions as of a particular revision in the repo, but when analyzing a DLL/OCX/TLB file I don't know which version of the repo (if any--could be from a branch or might never have been committed to a branch) a given DLL/OCX corresponds to.
I've tried using tlbinf32.dll, but it doesn't appear to be able to list imports.
I don't know much about PE, but I popped open one of the DLLs in a PE viewer and it only shows MSVBVM60.dll in the imports section. This appears to be a special quirk of VB6-produced type libraries: they link only to MSVBVM60 but have some sort of delay-loading mechanism for the rest of the dependencies.
Even most of the existing tools I've tried don't give the information--e.g., depends.exe only finds MSVBVM60.dll.
However: OLEView, a utility that used to ship with Visual Studio, somehow produces an IDL file, which includes the importlib directives. Given that VB doesn't use IDL files, it's clearly generating the information somehow. So it's possible--I just have no idea how.
Really, if OLEView didn't do it I'd have given it up by now as impossible. Any thoughts on how to accomplish this?
It turns out that I was conflating basic DLL functionality and COM. (Not all DLLs are COM DLLs.)
For basic DLLs, the Portable Executable format includes a section describing its imports. The Optional Header's directory 1 is about the DLL's imports. Its structure is given by IMAGE_IMPORT_DESCRIPTOR. This is a starting point for learning about that.
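As a small illustration of that, here is a hedged C++ sketch that walks data directory 1 of a module loaded into the current process and prints the names of the DLLs it imports; for a VB6 component this typically shows only MSVBVM60.DLL, as noted in the question.

```cpp
#include <windows.h>
#include <stdio.h>

// Walk the import table (data directory 1) of a module loaded in this process.
// Because the module is mapped as an image, RVAs can be used directly as offsets
// from the base. Assumes the viewer process matches the DLL's bitness (32-bit for VB6).
void ListImports(HMODULE module) {
    BYTE* base = reinterpret_cast<BYTE*>(module);
    IMAGE_DOS_HEADER* dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    IMAGE_NT_HEADERS* nt = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

    IMAGE_DATA_DIRECTORY dir =
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    if (dir.VirtualAddress == 0) return;   // no imports at all

    IMAGE_IMPORT_DESCRIPTOR* imp =
        reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(base + dir.VirtualAddress);
    for (; imp->Name != 0; ++imp)          // the table ends with an all-zero descriptor
        printf("imports %s\n", reinterpret_cast<char*>(base + imp->Name));
}

// Usage (the file name is hypothetical):
// ListImports(LoadLibraryExW(L"MyVb6Control.ocx", nullptr, DONT_RESOLVE_DLL_REFERENCES));
```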
COM DLLs don't seem to have an equivalent as such, but you can discover which other COM components its public interface needs: for each exposed interface, list out the types of its properties and its method arguments, and then use the Registry to look up where those types come from. tlbinf32.dll provides some of the basic functionality for listing members, etc. Here's an intro to that.
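And for the type library side, a hedged C++ sketch of how a tool like OLEView can recover the importlib information: load the type library, then for each type ask ITypeInfo for its referenced types and the type library that contains them. This only chases implemented/inherited interfaces; a complete tool would also walk the TYPEDESCs of every property and method parameter. Internal references report the library itself, so you would filter those out.

```cpp
// Link with ole32.lib and oleaut32.lib.
#include <windows.h>
#include <oleauto.h>
#include <stdio.h>

void DumpContainingLib(ITypeInfo* ref) {
    ITypeLib* lib = nullptr;
    UINT index = 0;
    if (SUCCEEDED(ref->GetContainingTypeLib(&lib, &index))) {
        TLIBATTR* attr = nullptr;
        if (SUCCEEDED(lib->GetLibAttr(&attr))) {
            wchar_t guid[64];
            StringFromGUID2(attr->guid, guid, 64);
            wprintf(L"importlib: %ls  v%u.%u\n", guid, attr->wMajorVerNum, attr->wMinorVerNum);
            lib->ReleaseTLibAttr(attr);
        }
        lib->Release();
    }
}

int wmain(int argc, wchar_t** argv) {
    if (argc < 2) return 1;
    CoInitialize(nullptr);
    ITypeLib* tlb = nullptr;
    if (FAILED(LoadTypeLibEx(argv[1], REGKIND_NONE, &tlb))) return 1;

    UINT count = tlb->GetTypeInfoCount();
    for (UINT i = 0; i < count; ++i) {
        ITypeInfo* ti = nullptr;
        if (FAILED(tlb->GetTypeInfo(i, &ti))) continue;
        TYPEATTR* ta = nullptr;
        if (SUCCEEDED(ti->GetTypeAttr(&ta))) {
            // Chase every implemented/inherited interface to its containing type library.
            for (UINT j = 0; j < ta->cImplTypes; ++j) {
                HREFTYPE href;
                ITypeInfo* ref = nullptr;
                if (SUCCEEDED(ti->GetRefTypeOfImplType(j, &href)) &&
                    SUCCEEDED(ti->GetRefTypeInfo(href, &ref))) {
                    DumpContainingLib(ref);
                    ref->Release();
                }
            }
            ti->ReleaseTypeAttr(ta);
        }
        ti->Release();
    }
    tlb->Release();
    CoUninitialize();
    return 0;
}
```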

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far
So I've run 'size' on it to determine how big the hex file is.
Then 'size' again to see how big each of the object files are that link to create the hex files. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck, there's some functions which I don't call directly (e.g. _vfprintf) and I can't find what calls it so I can remove the call (as I think I don't need it).
So what are the next steps?
Response to answers:
As I can see, there are functions being called which take up a lot of memory. I cannot, however, find what is calling them.
I want to omit those functions (if possible) but I can't find what's calling them! They could be called from any number of library functions, I guess.
The linker is working as desired, I think; it only includes the relevant library files. But how do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC
General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
In the original question was a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if only 1 is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called then you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.
Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependency graph are objects and functions.
On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of just using "printf()" I use a very simple custom "printf()", this function can only print numbers in hexadecimal or decimal format not more. Most data structures are preallocated at compile time.
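For illustration, a sketch of the kind of fixed-format helper meant here (uart_putc is an assumed board-specific output routine, not a standard function):

```cpp
// Tiny output helpers: decimal and hex only, no heap, no locale, no float support.
extern void uart_putc(char c);   // assumed to exist on the target

static void print_hex(unsigned value) {
    static const char digits[] = "0123456789ABCDEF";
    for (int shift = 28; shift >= 0; shift -= 4)
        uart_putc(digits[(value >> shift) & 0xF]);
}

static void print_dec(unsigned value) {
    char buf[10];
    int n = 0;
    do { buf[n++] = '0' + value % 10; value /= 10; } while (value);
    while (n--) uart_putc(buf[n]);   // digits were collected least-significant first
}
```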
Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information it removes, an sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.
Just to double-check and document for future reference: do you use Thumb instructions? They're 16-bit versions of the normal instructions. Sometimes you might need two 16-bit instructions in place of one 32-bit one, so it won't save 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler & linker settings to package functions for individual linking.
Ok, so in the end I just reduced the project to its simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' output. Then, when I had the file, I commented everything out and slowly added things back in until the function popped up again. So in the end I found out what called it and removed all those calls... Now it works as desired... sweet!
Must be a better way to do it though.
To answer this specific need:
• I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine the library dependency tree. It allows you to easily browse up and down the calling tree, among other things.
They provide a limited time evaluation, then you must purchase a license.
You could look at something like executable compression.

Refactoring disassembled code

You write a function and, looking at the resulting assembly, you see it can be improved.
You would like to keep the function you wrote, for readability, but you would like to substitute your own assembly for the compiler's. Is there any way to establish a relationship between your high-level language function and the new assembly?
If you are looking at the assembly, then it's fair to assume that you have a good understanding of how code gets compiled down. If you have this knowledge, then it's sometimes possible to 'reverse engineer' the changes back up into the original language, but it's often better not to bother.
The optimisations that you make are likely to be very small in comparison to the time and effort required to make these changes in the first place. I would suggest that you leave this kind of work to the compiler and go have a cup of tea. If the changes are significant, and the performance is critical (as, say, in the embedded world), then you might want to mix the normal code with the assembler in some fashion; however, on most computers and chips the performance is usually sufficient to avoid this headache.
If you really need more performance, then optimise the code not the assembly.
None, I suppose. You've rejected the compiler's work in favor of your own. You might as well throw out the function you wrote in the compiled language, because now all you have is your assembler on that platform.
I would highly advise against engaging in this kind of optimization unless you're sure, via profiling and analysis, that you truly are making a difference.
It depends on the language you wrote your function in. Some languages like C are very low-level, translating each function call or statement to specific assembly statements. If you did use C, you can replace your function with inline assembly to improve performance.
Other high-level languages may convert each statement into macro routines or other more complex calls on the assembly side. Certain optimizations (like tail recursion, loop unrolling, etc) can be implemented easily on the source side, but others (like making more efficient use of the register file) may be impossible (again, depending on the language and the compiler you're using).
It's tough to say there is any relationship between the modified assembly and the source which generated the unmodified version. It will certainly confuse debugging tools: register contents will no longer match the source variables they were supposed to correspond to.
There are a number of places in packet processing code where I've examined the generated assembly and gone back to change the original source code in order to improve the result. Re-arranging source can reduce the number of branches, __attribute__ and compiler arguments can align branch points and functions to reduce I$ misses. In desperate cases a little inline assembly can be used, so that the binary can still be compiled from source.
Something you could try is to separate your original function into its own file, and provide a make rule to build the assembler from there. Then update the assembler file with your improved version, and provide a make rule to build an object file from the assembler file. Then change your link rules to include that object file.
If you only ever change the assembler file, that will keep on being used. If you ever change the original higher-level language file, the assembler file will be rebuilt and the object file built from the new (unimproved) version.
This gives you a relationship between the two; you probably want to add a warning comment at the top of the higher-level language file to warn about the behaviour. Using some form of VCS will give you the ability to recover the improved assembler file if you make a mistake here.
If you're writing a native compiled app in Visual C++, there are two methods:
Use the __asm { } block and write your assembler in there.
Write your functions in MASM assembler, assemble to .obj, and link it as an static library. In your C/C++ code, declare the function with an extern "C" declaration.
Other C/C++ compilers have similar approaches.
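A minimal sketch of both approaches under 32-bit MSVC (the x64 compiler has no inline assembler, so only the separate-MASM route applies there); the function names are just placeholders:

```cpp
// Approach 1: an __asm block inside an ordinary C++ function.
int add_inline(int a, int b) {
    int result;
    __asm {
        mov eax, a      // the inline assembler can reference C/C++ variables by name
        add eax, b
        mov result, eax
    }
    return result;
}

// Approach 2: the function lives in a separate MASM-assembled .obj or static library;
// declare it so the C++ compiler knows its name and calling convention.
extern "C" int __cdecl add_masm(int a, int b);
```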
In this situation, you generally have two options: optimize the code or rewrite the compiler. I can't see where breaking the link between source and opcodes is ever going to be the correct solution.