Where is JVM PC stored during a call? - jvm

I am currently reading the last specification of the JVM. It is clear that each thread has its own call stack and its own program counter that keeps track of the (next) instruction to execute.
My question is maybe dump, but from the description, I cannot find an answer.
Where is the current program counter stored when a new or a method is invoked?
In other terms, how does the thread now where to continue after the invokation of a method?

The answer is implementation-dependent as different hardware architectures and even different JVMs may implement this behavior in different ways. In the standard Oracle JVM, most of your bytecode will be compiled to native code by JIT (Just in Time compiler) and method calls will be executed as for native code (give or take some extra code which may be added to handle checkpointing etc.). On a PC this means that current register values, including the instruction pointer / program counter will be saved on the stack before a method call. When returning from the call, the processor pops these values from the stack, among them the return address.

Related

JVM Memory Segments and JIT Compiler

I know this is JVM dependent and every virtual machine would choose to implement it a little bit different yet I want to understand the overall concept.
It has been said that for the Memory Segments that the JVM uses to execute a Java program
Java Stacks
Heap
Method Area
PC Registers
Native Method Stacks
are not necessarily implemented with contiguous memory and may be all actually allocated on some heap memory provided from the OS, this leads me to my question.
JVM's that fully use the JIT mechanism and compiles bytecode methods
into native machinecode methods store these methods somewhere, where
would that be? the execution engine ( that is usually written in C /
C++ ) would have to invoke these JIT compiled functions, yet the kernel shouldn't allow a program to execute code saved on the stack / heap / static memory segment, how could the JVM overcome this?
Another question I have is regarding the Java stacks, when a method ( after JIT compilation ) is executed within the processor it's local variables should be saved within the Java stacks, yet again the Java stacks may be implemented with a non-contiguous memory and perhaps even just some stack data structure allocated on the heap acting as a stack, how and where do the local variables of a method being executed get saved? the kernel shouldn't allow a program to treat a heap allocated memory as a process stack, how does JVM overcome this difficuly as well?
Again, I want to emphasis that I'm asking for an overall concept, I know each JVM would choose to implement this a little bit different...
JVM's that fully use the JIT mechanism and compiles bytecode methods into native machinecode methods store these methods somewhere, where would that be?
It is stored in the "Perm Gen" in Java <= 7 and "meta space" in Java 8. This is another native memory region.
the execution engine ( that is usually written in C / C++ ) would have to invoke these JIT compiled functions, yet the kernel shouldn't allow a program to execute code saved on the stack / heap / static memory segment, how could the JVM overcome this?
The memory region is both writable and executable, though I don't know exactly which system call is required to implement this.
Another question I have is regarding the Java stacks, when a method ( after JIT compilation )
Initially the code is not compiled but it uses the stack in the same way.
is executed within the processor it's local variables should be saved within the Java stacks, yet again the Java stacks may be implemented with a non-contiguous memory
There is a stack per thread which is continuous.
and perhaps even just some stack data structure allocated on the heap acting as a stack, how and where do the local variables of a method being executed get saved?
On the thread stack.
the kernel shouldn't allow a program to treat a heap allocated memory as a process stack, how does JVM overcome this difficuly as well?
It doesn't do this.

Thread safe hooking of DirectX Device

I successfully hooked BeginScene/EndScene methods of DirectX9's DeviceEx, in order to override regions on the screen of a graphics application. I did it by overriding the first 'line' of the function pointed by the appropriate vtable entry (42 for EndScene) with an x86 jump command.
The problem is that when I would like to call the original EndScene method, I have to write the original code overriden by the jump. This operation is not thread safe, and the application has two devices used by two threads.
I tried overriding the vtable entry or copying it and override the COM interface pointer to the vtable, neither ways worked. I guess the original function pointer is cached somewhere or was optimized in the compilation.
I thought about copying the whole original method body to another memory block, but two problems I'm afraid of: (1) (the easy one I think) I don't know how to discover the length of the method and (2) I don't know if the function body stores offsets which are relative to the location where the function is in memory.
I'm trying to hook WPF's device, if it can help somehow.
Do anyone know a thread safe way for such hooking?
Answering my own question: It seems that for my purpose (performing another method before or instead of the original one within my own process), 'trampoline' is the answer. Generally it means I need to make another code segment that makes exactly what the overriden assembly commands did.
Because it is not an easy task, using an external library is recommended.
A discussion about this topic:
How to create a trampoline function for hook

Strange COM behaviour called from .net

i am working on an application which calls the COM component of a partner's application. Ours is .Net, theirs isn't. I don't know much about COM; I know that the component we're calling is late-bound i.e.
obj As Object = CreateObject("THIRDPARTY.ThirdPartyObject")
We then call a method on this COM object (Option Strict Off in the head of the VB file):
obj.AMethod(ByVal Arg1 As Integer, ByVal Arg2 As Integer, ByVal Arg3 as Boolean)
I am a bit nonplussed that even though this call works, this overload doesn't exist in the COM interop .dll that is created if I instead add a reference to the COM server using Add Reference. The only available call to this method that it says is available is AMethod().
However, this in itself is not what bothers me. What bothers me is that this call works for a while, THEN throws a TargetParameterCountException after a few dozen calls have executed successfully.
I ask thee thus, StackOverflow:
What. The. Hell.
The only thing I can guess at is that the documentation for the COM component states that this method is executed synchronously - so therefore maybe whatever's responsible for throwing that exception is being blocked until some indeterminate point in time? Other than that, I'm completely stumped at this bizarre, and more importantly inconsistent behaviour.
edit #1:
More significant information that I've just remembered - from time to time the call throws an ExecutionEngineException instead. It only took one glance at the documentation to realise that this is VERY BAD. Doing a little bit of digging suggests to me that the late-binding call is causing stack corruption, crashing the entire CLR. Presumably this means that the runtime is shooting down bad calls (with TargetParameterCountException) some of the time and missing them (ExecutionEngineException) others.
edit #2:
Answering David Lively's questions:
The call with zero arguments that's currently in the code has been there for a long time. I haven't been able to get hold of a manual for the third party's COM implementation past two major revisions ago, so it's possible that they've withdrawn that signature from service
There is only one location that this method is called from
This is one desktop app calling another, on the same machine. Nothing fancy
The object is persisted throughout the scope of the user's interaction with the application, so there's never a new one created.
Unfortunately, it seems likely that there is indeed a bug in the implementation, as you suggest. The trouble with this vendor is that, when we report a bug, their response tends to follow the general form: i) deny there's a problem; ii) deny it's their problem; iii) refuse to fix it. These three steps tend to span a frustratingly long period of time.
No, it can't cause stack corruption. IDispatch::Invoke() is used to call the method, the arguments are packaged in an array. The stock implementation of IDispatch certainly would detect the argument mismatch, it uses the type library info to check. But it is conceivable that the COM server author implemented it himself. Imperfectly. It is something a C++ hacker might do, the stock implementation is dreadfully slow. The GC heap getting corrupted is the kind of thing that happens when imperfect code executes.
I haven't played with calling COM objects from VB in quite a while, but I'll take a wild guess:
I would expect an exception to be thrown if you're calling the object with too few or too many arguments, but it appears that's not the case. What is the real signature of the method you're calling?
In some languages and some situations, when you call a method, arguments are placed on the stack. If you place too many arguments, it's possible for the extraneous ones to remain on the stack after the method completes. This should cause lots of other problems, though.
Some possibilities/considerations:
The object is throwing this exception internally. This should be taken up with the author.
You're calling with too many parameters. If, as you said, the overload you're trying to call isn't published in the object's type library, you may actually be calling a different published method with a different signature. I'd REALLY expect a compiler error if this is the case.
Are your later calls taking place in the same part of your code, or is there a different execution branch that might be doing things a bit differently, and causing the error?
Are you running this from a desktop app/script, or a website? If a website, are you receiving a valid, expected response, or does the request hang as if an internal long-running process doesn't complete?
The object may be allocating and not releasing resources, which could cause undefined behavior when those resources are exhausted.
Are you releasing the object between calls, or is it recreated every time?
Also, re: your comments about late binding: the .CreateObject() method of instantiating a COM object is the normal, accepted way to do this. That shouldn't have anything to do with the issue. Based on the exceptions you listed, I'm strongly inclined to believe that there is an internal issue with the object.
Good luck.
OK, basically - false alarm. I've done it wrong - I've copied some code over from somewhere improperly and the thing I'm calling was never supposed to support that overload. What I find interesting is that the component didn't reject that late-bound call out of hand, but did everything it was supposed to do, at least initially.

OOP question about functions that struck me all of a sudden

May be my question is stupid. But i would like to get it cleared. We know that functions are loaded in memory only once and when you create new objects, only instance variables gets created, functions are never created. My question is, say suppose there is server and all clients access a method named createCustomer(). Say suppose all clients do something which fired createCustomer on server. So, if the method is in middle of execution and new client fires it. Will the new request be put on wait? or new request also will start executing the method? How does it all get managed when there is only one copy of function in memory? No book mentions answers to this type of questions. So i am posting here where i am bound to get answers :).
Functions are code which is then executed in a memory context. The code can be run many times in parallel (literally in parallel on a multi-processor machine), but each of those calls will execute in a different memory context (from the point of view of local variables and such). At a low level this works because the functions will reference local variables as offsets into memory on something called a "stack" which is pointed to by a processor register called the "stack pointer" (or in some interpreted languages, an analog of that register at a higher level), and the value of this register will be different for different calls to the function. So the x local variable in one call to function foo is in a different location in memory than the x local variable in another call to foo, regardless of whether those calls happen simultaneously.
Instance variables are different, they're referenced via a reference (pointer) to the memory allocated to the instance of an object. Two running copies of the same function might access the same instance variable at exactly the same time; similarly, two different functions might do so. This is why we get into "threading" or concurrency issues, synchronization, locks, race conditions, etc. But it's also one reason things can be highly efficient.
It's called "multi-threading". If each request has its own thread, and the object contains mutable data, each client will have the opportunity to modify the state of the object as they see fit. If the person who wrote the object isn't mindful of thread safety you could end up with an object that's in an inconsistent state between requests.
This is a basic threading issue, you can look it up at http://en.wikipedia.org/wiki/Thread_(computer_science).
Instead of thinking in terms of code that is executed, try to think of memory context of a thread that is changed. It does not matter where and what the actual code happens to be, and if it is the same code or a duplicate or something else.
Basically, it can happen that the function is called while it was already called earlier. The two calls are independent and may even happen to run in parallel (on a multicore machine). The way to achieve this independence is by using different stacks and virtual address spaces for each thread.
There are ways to synchronize calls, so additional callers have to wait until the first call finishes. This is also explained in the above link.

Low-level details of the implementation of performSelectorOnMainThread:

Was wondering if anyone knows, or has pointers to good documentation that discusses, the low-level implementation details of Cocoa's 'performSelectorOnMainThread:' method.
My best guess, and one I think is probably pretty close, is that it uses mach ports or an abstraction on top of them to provide intra-thread communication, passing selector information along as part of the mach message.
Right? Wrong? Thanks!
Update 09:39AMPST
Thank you Evan DiBiase and Mecki for the answers, but to clarify: I understand what happens in the run loop, but what I'm looking for an answer to is; "where is the method getting queued? how is the selector information getting passed into the queue?" Looking for more than Apple's doc info: I've read 'em
Update 14:21PST
Chris Hanson brings up a good point in a comment: my objective here is not to learn the underlying mechanisms in order to take advantage of them in my own code. Rather, I'm just interested in a better conceptual understanding of the process of signaling another thread to execute code. As I said, my own research leads me to believe that it's takes advantage of mach messaging for IPC to pass selector information between threads, but I'm specifically looking for concrete information on what is happening, so I can be sure I'm understanding things correctly. Thanks!
Update 03/06/09
I've opened a bounty on this question because I'd really like to see it answered, but if you are trying to collect please make sure you read everything, including all currently posed answers, comments to both these answers and to my original question, and the update text I posted above. I'm look for the lowest-level detail of the mechanism used by performSelectorOnMainThread: and the like, and as I mentioned earlier, I suspect it has something to do with Mach ports but I'd really like to know for sure. The bounty will not be awarded unless I can confirm the answer given is correct. Thanks everyone!
Yes, it does use Mach ports. What happens is this:
A block of data encapsulating the perform info (the target object, the selector, the optional object argument to the selector, etc.) is enqueued in the thread's run loop info. This is done using #synchronized, which ultimately uses pthread_mutex_lock.
CFRunLoopSourceSignal is called to signal that the source is ready to fire.
CFRunLoopWakeUp is called to let the main thread's run loop know it's time to wake up. This is done using mach_msg.
From the Apple docs:
Version 1 sources are managed by the run loop and kernel. These sources use Mach ports to signal when the sources are ready to fire. A source is automatically signaled by the kernel when a message arrives on the source’s Mach port. The contents of the message are given to the source to process when the source is fired. The run loop sources for CFMachPort and CFMessagePort are currently implemented as version 1 sources.
I'm looking at a stack trace right now, and this is what it shows:
0 mach_msg
1 CFRunLoopWakeUp
2 -[NSThread _nq:]
3 -[NSObject(NSThreadPerformAdditions) performSelector:onThread:withObject:waitUntilDone:modes:]
4 -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:]
Set a breakpoint on mach_msg and you'll be able to confirm it.
One More Edit:
To answer the question of the comment:
what IPC mechanism is being used to
pass info between threads? Shared
memory? Sockets? Mach messaging?
NSThread stores internally a reference to the main thread and via that reference you can get a reference to the NSRunloop of that thread. A NSRunloop internally is a linked list and by adding a NSTimer object to the runloop, a new linked list element is created and added to the list. So you could say it's shared memory, the linked list, that actually belongs to the main thread, is simply modified from within a different thread. There are mutexes/locks (possibly even NSLock objects) that will make sure editing the linked list is thread-safe.
Pseudo code:
// Main Thread
for (;;) {
lock(runloop->runloopLock);
task = NULL;
do {
task = getNextTask(runloop);
if (!task) {
// function below unlocks the lock and
// atomically sends thread to sleep.
// If thread is woken up again, it will
// get the lock again before continuing
// running. See "man pthread_cond_wait"
// as an example function that works
// this way
wait_for_notification(runloop->newTasks, runloop->runloopLock);
}
} while (!task);
unlock(runloop->runloopLock);
processTask(task);
}
// Other thread, perform selector on main thread
// selector is char *, containing the selector
// object is void *, reference to object
timer = createTimerInPast(selector, object);
runloop = getRunloopOfMainThread();
lock(runloop->runloopLock);
addTask(runloop, timer);
wake_all_sleeping(runloop->newTasks);
unlock(runloop->runloopLock);
Of course this is oversimplified, most details are hidden between functions here. E.g. getNextTask will only return a timer, if the timer should have fired already. If the fire date for every timer is still in the future and there is no other event to process (like a keyboard, mouse event from UI or a sent notification), it would return NULL.
I'm still not sure what the question is. A selector is nothing more than a C string containing the name of a method being called. Every method is a normal C function and there exists a string table, containing the method names as strings and function pointers. That are the very basics how Objective-C actually works.
As I wrote below, a NSTimer object is created that gets a pointer to the target object and a pointer to a C string containing the method name and when the timer fires, it finds the right C method to call by using the string table (hence it needs the string name of the method) of the target object (hence it needs a reference to it).
Not exactly the implementation, but pretty close to it:
Every thread in Cocoa has a NSRunLoop (it's always there, you never need to create on for a thread). PerformSelectorOnMainThread creates a NSTimer object like this, one that fires only once and where the time to fire is already located in the past (so it needs firing immediately), then gets the NSRunLoop of the main thread and adds the timer object there. As soon as the main thread goes idle, it searches for the next event in its Runloop to process (or goes to sleep if there is nothing to process and being woken up again as soon as an event is added) and performs it. Either the main thread is busy when you schedule the call, in which case it will process the timer event as soon as it has finished its current task or it is sleeping at the moment, in which case it will be woken up by adding the event and processes it immediately.
A good source to look up how Apple is most likely doing it (nobody can say for sure, as after all its closed source) is GNUStep. Since the GCC can handle Objective-C (it's not just an extension only Apple ships, even the standard GCC can handle it), however, having Obj-C without all the basic classes Apple ships is rather useless, the GNU community tried to re-implement the most common Obj-C classes you use on Mac and their implementation is OpenSource.
Here you can download a recent source package.
Unpack that and have a look at the implementation of NSThread, NSObject and NSTimer for details. I guess Apple is not doing it much different, I could probably prove it using gdb, but why would they do it much different than that approach? It's a clever approach that works very well :)
The documentation for NSObject's performSelectorOnMainThread:withObject:waitUntilDone: method says:
This method queues the message on the run loop of the main thread using the default run loop modes—that is, the modes associated with the NSRunLoopCommonModes constant. As part of its normal run loop processing, the main thread dequeues the message (assuming it is running in one of the default run loop modes) and invokes the desired method.
As Mecki said, a more general mechanism that could be used to implement -performSelectorOn… is NSTimer.
NSTimer is toll-free bridged to CFRunLoopTimer. An implementation of CFRunLoopTimer – although not necessarily the one actually used for normal processes in OS X – can be found in CFLite (open-source subset of CoreFoundation; package CF-476.14 in the Darwin 9.4 source code. (CF-476.15, corresponding to OS X 10.5.5, is not yet available.)