Thread safe hooking of DirectX Device - com

I successfully hooked BeginScene/EndScene methods of DirectX9's DeviceEx, in order to override regions on the screen of a graphics application. I did it by overriding the first 'line' of the function pointed by the appropriate vtable entry (42 for EndScene) with an x86 jump command.
The problem is that when I would like to call the original EndScene method, I have to write the original code overriden by the jump. This operation is not thread safe, and the application has two devices used by two threads.
I tried overriding the vtable entry or copying it and override the COM interface pointer to the vtable, neither ways worked. I guess the original function pointer is cached somewhere or was optimized in the compilation.
I thought about copying the whole original method body to another memory block, but two problems I'm afraid of: (1) (the easy one I think) I don't know how to discover the length of the method and (2) I don't know if the function body stores offsets which are relative to the location where the function is in memory.
I'm trying to hook WPF's device, if it can help somehow.
Do anyone know a thread safe way for such hooking?

Answering my own question: It seems that for my purpose (performing another method before or instead of the original one within my own process), 'trampoline' is the answer. Generally it means I need to make another code segment that makes exactly what the overriden assembly commands did.
Because it is not an easy task, using an external library is recommended.
A discussion about this topic:
How to create a trampoline function for hook

Related

Wait for a thread inside a C++ static object

I have a static object that needs to initialize an imaging API. The allocated resources of this imaging API need to be released by the same thread.
So I'm starting a thread in my static object that initializes everything and then waits for a counter to reach zero. When this happens the thread cleans all up and finishes.
This is an unmanaged class inside a managed library, so I can't use System::Threading::Thread (needs a managed static member function) or std::thread (compiler error, not supported with /clr).
So I have to start my thread like:
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&Initialize, this, 0, 0);
All works fine, the init is done and the API functions work. But when I close the application I see that the usage counter of my static object reaches zero but the clean up function is never called by the thread, as if the thread was killed. Is there a way to make sure the thread will continue to exist and execute until its end?
After turning this around in all possible ways and adding events etc I guess this is not possible so I'll have to change the structure of my code and encapsulate the non managed class inside a managed class, and add the thread to the managed class.
I think you could proceed in one of two ways:
Wrap the resources in RAII-style classes, and refactor to have the objects' lifetimes be on the stack of your created thread, ensuring their destructors get called when the thread loop exits without having to call any additional cleanup. If there is no issue with the thread returning correctly when your counter reaches 0, this should be the simplest and cleanest way of addressing this.
I'm thinking you could intercept the WM_CLOSE message using window procedures, process necessary cleanup and then pass the message on, effectively "stalling" it until you are ready to close. Note that even though you are in a DLL you can still set up a window procedure and message pump system, you don't need a GUI to do that. I am however not 100% sure on whether you'll receive the WM_CLOSE message that concerns the application that "owns" your DLL, it's not something I've tried out yet.
You will have to implement some form of messaging through events within your thread's loop however, as the WindowProc will be called on a different thread, so you know when to call the cleanup procedure.
I also am not very familiar with CLR, so there might be a simpler way of interacting with those APIs than with raw C++ calls and handles.

cocoa - Subtle difference between -removeObserver:forKeyPath: and -removeObserver:forKeyPath:context:?

Short version:
What use is -removeObserver:forKeyPath:?
Why not always use -removeObserver:forKeyPath:context:?
Long version
While working on a Cocoa program, I discovered that using -removeObserver:forKeyPath: could (but would not always) lead to an error like:
Cannot remove an observer <ObservedClass 0x1001301d0> for the key path "exampleKeyPath" from <__NSCFConstantString 0x100009138> because it is not registered as an observer.
while using -removeObserver:forKeyPath:context: instead would work just fine.
Since it is required that a context be specified when setting up observation (with -observeValueForKeyPath:ofObject:change:context:), I'm puzzled at why the context:-less removal method exists.
Based on my reading of the NSKeyValueObserving Protocol, I supposed that the removal might apply to the specified observer and specified key path in all contexts, but the failure of -removeObserver:forKeyPath: (with no context) to work as a replacement for -removeObserver:forKeyPath:context: (with a context of NULL) seems to shoot down that idea.
So: why might I have that error? What does -removeObserver:forKeyPath: do with contexts? How's it differ from its context:-equipped younger sibling?
Code example
Problematic code:
-(void) invalidate {
[(id)observedObject removeObserver:self
forKeyPath:#"exampleKeyPath"];
}
Non-Problematic code:
-(void) invalidate {
[(id)observedObject removeObserver:self
forKeyPath:#"exampleKeyPath"
context:NULL];
}
Short version: -removeObserver:forKeyPath:context: was only introduced in 10.7, hence both.
Long version: Why might you have the error? Looks like a bug, either in your code or the system (I've never seen the error and use the shorter version a lot). The descriptions of the two methods do not suggest there should be any difference. If nobody else comes up with an explanation, and you can't find anything in your code, then report a bug to Apple.
The documentation has an excellent discussion as to the use of the new method:
Examining the value in context you are able to determine precisely which addObserver:forKeyPath:options:context: invocation was used to create the observation relationship. When the same observer is registered for the same key-path multiple times, but with different context pointers, an application can determine specifically which object to stop observing
It's just a way to be more specific about just which binding you want to remove from the object. For example, I might bind to a keypath twice, but with the memory locations of different static variables, a little like how dispatch_once() works. The context-free subscription method was the only way of binding to an object until 10.7 rolled around and filled in that gap.
As for your KVO troubles, the problem can occur in many different cases. The most common being that you've subscribed on one thread, then very soon after, removed a subscription from a different thread. Occasionally it can occur because the object you tried to observe was just about to deallocate, meaning you would subscribe to some bogus memory location that happened to fill what you needed to, then removing the subscription from this garbage pointer would be impossible. Either way, make sure to monitor the methods you're utilizing bindings in, as they can be a little unstable if used in the wrong way.

Strange COM behaviour called from .net

i am working on an application which calls the COM component of a partner's application. Ours is .Net, theirs isn't. I don't know much about COM; I know that the component we're calling is late-bound i.e.
obj As Object = CreateObject("THIRDPARTY.ThirdPartyObject")
We then call a method on this COM object (Option Strict Off in the head of the VB file):
obj.AMethod(ByVal Arg1 As Integer, ByVal Arg2 As Integer, ByVal Arg3 as Boolean)
I am a bit nonplussed that even though this call works, this overload doesn't exist in the COM interop .dll that is created if I instead add a reference to the COM server using Add Reference. The only available call to this method that it says is available is AMethod().
However, this in itself is not what bothers me. What bothers me is that this call works for a while, THEN throws a TargetParameterCountException after a few dozen calls have executed successfully.
I ask thee thus, StackOverflow:
What. The. Hell.
The only thing I can guess at is that the documentation for the COM component states that this method is executed synchronously - so therefore maybe whatever's responsible for throwing that exception is being blocked until some indeterminate point in time? Other than that, I'm completely stumped at this bizarre, and more importantly inconsistent behaviour.
edit #1:
More significant information that I've just remembered - from time to time the call throws an ExecutionEngineException instead. It only took one glance at the documentation to realise that this is VERY BAD. Doing a little bit of digging suggests to me that the late-binding call is causing stack corruption, crashing the entire CLR. Presumably this means that the runtime is shooting down bad calls (with TargetParameterCountException) some of the time and missing them (ExecutionEngineException) others.
edit #2:
Answering David Lively's questions:
The call with zero arguments that's currently in the code has been there for a long time. I haven't been able to get hold of a manual for the third party's COM implementation past two major revisions ago, so it's possible that they've withdrawn that signature from service
There is only one location that this method is called from
This is one desktop app calling another, on the same machine. Nothing fancy
The object is persisted throughout the scope of the user's interaction with the application, so there's never a new one created.
Unfortunately, it seems likely that there is indeed a bug in the implementation, as you suggest. The trouble with this vendor is that, when we report a bug, their response tends to follow the general form: i) deny there's a problem; ii) deny it's their problem; iii) refuse to fix it. These three steps tend to span a frustratingly long period of time.
No, it can't cause stack corruption. IDispatch::Invoke() is used to call the method, the arguments are packaged in an array. The stock implementation of IDispatch certainly would detect the argument mismatch, it uses the type library info to check. But it is conceivable that the COM server author implemented it himself. Imperfectly. It is something a C++ hacker might do, the stock implementation is dreadfully slow. The GC heap getting corrupted is the kind of thing that happens when imperfect code executes.
I haven't played with calling COM objects from VB in quite a while, but I'll take a wild guess:
I would expect an exception to be thrown if you're calling the object with too few or too many arguments, but it appears that's not the case. What is the real signature of the method you're calling?
In some languages and some situations, when you call a method, arguments are placed on the stack. If you place too many arguments, it's possible for the extraneous ones to remain on the stack after the method completes. This should cause lots of other problems, though.
Some possibilities/considerations:
The object is throwing this exception internally. This should be taken up with the author.
You're calling with too many parameters. If, as you said, the overload you're trying to call isn't published in the object's type library, you may actually be calling a different published method with a different signature. I'd REALLY expect a compiler error if this is the case.
Are your later calls taking place in the same part of your code, or is there a different execution branch that might be doing things a bit differently, and causing the error?
Are you running this from a desktop app/script, or a website? If a website, are you receiving a valid, expected response, or does the request hang as if an internal long-running process doesn't complete?
The object may be allocating and not releasing resources, which could cause undefined behavior when those resources are exhausted.
Are you releasing the object between calls, or is it recreated every time?
Also, re: your comments about late binding: the .CreateObject() method of instantiating a COM object is the normal, accepted way to do this. That shouldn't have anything to do with the issue. Based on the exceptions you listed, I'm strongly inclined to believe that there is an internal issue with the object.
Good luck.
OK, basically - false alarm. I've done it wrong - I've copied some code over from somewhere improperly and the thing I'm calling was never supposed to support that overload. What I find interesting is that the component didn't reject that late-bound call out of hand, but did everything it was supposed to do, at least initially.

OOP question about functions that struck me all of a sudden

May be my question is stupid. But i would like to get it cleared. We know that functions are loaded in memory only once and when you create new objects, only instance variables gets created, functions are never created. My question is, say suppose there is server and all clients access a method named createCustomer(). Say suppose all clients do something which fired createCustomer on server. So, if the method is in middle of execution and new client fires it. Will the new request be put on wait? or new request also will start executing the method? How does it all get managed when there is only one copy of function in memory? No book mentions answers to this type of questions. So i am posting here where i am bound to get answers :).
Functions are code which is then executed in a memory context. The code can be run many times in parallel (literally in parallel on a multi-processor machine), but each of those calls will execute in a different memory context (from the point of view of local variables and such). At a low level this works because the functions will reference local variables as offsets into memory on something called a "stack" which is pointed to by a processor register called the "stack pointer" (or in some interpreted languages, an analog of that register at a higher level), and the value of this register will be different for different calls to the function. So the x local variable in one call to function foo is in a different location in memory than the x local variable in another call to foo, regardless of whether those calls happen simultaneously.
Instance variables are different, they're referenced via a reference (pointer) to the memory allocated to the instance of an object. Two running copies of the same function might access the same instance variable at exactly the same time; similarly, two different functions might do so. This is why we get into "threading" or concurrency issues, synchronization, locks, race conditions, etc. But it's also one reason things can be highly efficient.
It's called "multi-threading". If each request has its own thread, and the object contains mutable data, each client will have the opportunity to modify the state of the object as they see fit. If the person who wrote the object isn't mindful of thread safety you could end up with an object that's in an inconsistent state between requests.
This is a basic threading issue, you can look it up at http://en.wikipedia.org/wiki/Thread_(computer_science).
Instead of thinking in terms of code that is executed, try to think of memory context of a thread that is changed. It does not matter where and what the actual code happens to be, and if it is the same code or a duplicate or something else.
Basically, it can happen that the function is called while it was already called earlier. The two calls are independent and may even happen to run in parallel (on a multicore machine). The way to achieve this independence is by using different stacks and virtual address spaces for each thread.
There are ways to synchronize calls, so additional callers have to wait until the first call finishes. This is also explained in the above link.

Low-level details of the implementation of performSelectorOnMainThread:

Was wondering if anyone knows, or has pointers to good documentation that discusses, the low-level implementation details of Cocoa's 'performSelectorOnMainThread:' method.
My best guess, and one I think is probably pretty close, is that it uses mach ports or an abstraction on top of them to provide intra-thread communication, passing selector information along as part of the mach message.
Right? Wrong? Thanks!
Update 09:39AMPST
Thank you Evan DiBiase and Mecki for the answers, but to clarify: I understand what happens in the run loop, but what I'm looking for an answer to is; "where is the method getting queued? how is the selector information getting passed into the queue?" Looking for more than Apple's doc info: I've read 'em
Update 14:21PST
Chris Hanson brings up a good point in a comment: my objective here is not to learn the underlying mechanisms in order to take advantage of them in my own code. Rather, I'm just interested in a better conceptual understanding of the process of signaling another thread to execute code. As I said, my own research leads me to believe that it's takes advantage of mach messaging for IPC to pass selector information between threads, but I'm specifically looking for concrete information on what is happening, so I can be sure I'm understanding things correctly. Thanks!
Update 03/06/09
I've opened a bounty on this question because I'd really like to see it answered, but if you are trying to collect please make sure you read everything, including all currently posed answers, comments to both these answers and to my original question, and the update text I posted above. I'm look for the lowest-level detail of the mechanism used by performSelectorOnMainThread: and the like, and as I mentioned earlier, I suspect it has something to do with Mach ports but I'd really like to know for sure. The bounty will not be awarded unless I can confirm the answer given is correct. Thanks everyone!
Yes, it does use Mach ports. What happens is this:
A block of data encapsulating the perform info (the target object, the selector, the optional object argument to the selector, etc.) is enqueued in the thread's run loop info. This is done using #synchronized, which ultimately uses pthread_mutex_lock.
CFRunLoopSourceSignal is called to signal that the source is ready to fire.
CFRunLoopWakeUp is called to let the main thread's run loop know it's time to wake up. This is done using mach_msg.
From the Apple docs:
Version 1 sources are managed by the run loop and kernel. These sources use Mach ports to signal when the sources are ready to fire. A source is automatically signaled by the kernel when a message arrives on the source’s Mach port. The contents of the message are given to the source to process when the source is fired. The run loop sources for CFMachPort and CFMessagePort are currently implemented as version 1 sources.
I'm looking at a stack trace right now, and this is what it shows:
0 mach_msg
1 CFRunLoopWakeUp
2 -[NSThread _nq:]
3 -[NSObject(NSThreadPerformAdditions) performSelector:onThread:withObject:waitUntilDone:modes:]
4 -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:]
Set a breakpoint on mach_msg and you'll be able to confirm it.
One More Edit:
To answer the question of the comment:
what IPC mechanism is being used to
pass info between threads? Shared
memory? Sockets? Mach messaging?
NSThread stores internally a reference to the main thread and via that reference you can get a reference to the NSRunloop of that thread. A NSRunloop internally is a linked list and by adding a NSTimer object to the runloop, a new linked list element is created and added to the list. So you could say it's shared memory, the linked list, that actually belongs to the main thread, is simply modified from within a different thread. There are mutexes/locks (possibly even NSLock objects) that will make sure editing the linked list is thread-safe.
Pseudo code:
// Main Thread
for (;;) {
lock(runloop->runloopLock);
task = NULL;
do {
task = getNextTask(runloop);
if (!task) {
// function below unlocks the lock and
// atomically sends thread to sleep.
// If thread is woken up again, it will
// get the lock again before continuing
// running. See "man pthread_cond_wait"
// as an example function that works
// this way
wait_for_notification(runloop->newTasks, runloop->runloopLock);
}
} while (!task);
unlock(runloop->runloopLock);
processTask(task);
}
// Other thread, perform selector on main thread
// selector is char *, containing the selector
// object is void *, reference to object
timer = createTimerInPast(selector, object);
runloop = getRunloopOfMainThread();
lock(runloop->runloopLock);
addTask(runloop, timer);
wake_all_sleeping(runloop->newTasks);
unlock(runloop->runloopLock);
Of course this is oversimplified, most details are hidden between functions here. E.g. getNextTask will only return a timer, if the timer should have fired already. If the fire date for every timer is still in the future and there is no other event to process (like a keyboard, mouse event from UI or a sent notification), it would return NULL.
I'm still not sure what the question is. A selector is nothing more than a C string containing the name of a method being called. Every method is a normal C function and there exists a string table, containing the method names as strings and function pointers. That are the very basics how Objective-C actually works.
As I wrote below, a NSTimer object is created that gets a pointer to the target object and a pointer to a C string containing the method name and when the timer fires, it finds the right C method to call by using the string table (hence it needs the string name of the method) of the target object (hence it needs a reference to it).
Not exactly the implementation, but pretty close to it:
Every thread in Cocoa has a NSRunLoop (it's always there, you never need to create on for a thread). PerformSelectorOnMainThread creates a NSTimer object like this, one that fires only once and where the time to fire is already located in the past (so it needs firing immediately), then gets the NSRunLoop of the main thread and adds the timer object there. As soon as the main thread goes idle, it searches for the next event in its Runloop to process (or goes to sleep if there is nothing to process and being woken up again as soon as an event is added) and performs it. Either the main thread is busy when you schedule the call, in which case it will process the timer event as soon as it has finished its current task or it is sleeping at the moment, in which case it will be woken up by adding the event and processes it immediately.
A good source to look up how Apple is most likely doing it (nobody can say for sure, as after all its closed source) is GNUStep. Since the GCC can handle Objective-C (it's not just an extension only Apple ships, even the standard GCC can handle it), however, having Obj-C without all the basic classes Apple ships is rather useless, the GNU community tried to re-implement the most common Obj-C classes you use on Mac and their implementation is OpenSource.
Here you can download a recent source package.
Unpack that and have a look at the implementation of NSThread, NSObject and NSTimer for details. I guess Apple is not doing it much different, I could probably prove it using gdb, but why would they do it much different than that approach? It's a clever approach that works very well :)
The documentation for NSObject's performSelectorOnMainThread:withObject:waitUntilDone: method says:
This method queues the message on the run loop of the main thread using the default run loop modes—that is, the modes associated with the NSRunLoopCommonModes constant. As part of its normal run loop processing, the main thread dequeues the message (assuming it is running in one of the default run loop modes) and invokes the desired method.
As Mecki said, a more general mechanism that could be used to implement -performSelectorOn… is NSTimer.
NSTimer is toll-free bridged to CFRunLoopTimer. An implementation of CFRunLoopTimer – although not necessarily the one actually used for normal processes in OS X – can be found in CFLite (open-source subset of CoreFoundation; package CF-476.14 in the Darwin 9.4 source code. (CF-476.15, corresponding to OS X 10.5.5, is not yet available.)