How to force finalization (for testing) in Cuis/Squeak/Pharo?

I've implemented a few ExternalStructures (as part of an "FFI effort"), and for some of them I want to implement finalization to reclaim the external memory.
I'm trying to write some tests for that, and it seems that no matter how many times I force a garbage collection with Smalltalk garbageCollect, finalization is (apparently) never triggered.
To be sure of that I placed an external breakpoint (using gdb) in the function I'm calling from the finalizer, but the breakpoint is never hit.
I've also inspected all references to the object (after running many GCs) and the only reference is the WeakRegistry (I got to the object using allInstances).
Is there a way to force finalization (in Cuis/Pharo/Squeak)?

Smalltalk garbageCollect should always trigger finalization. If the object is new, then Smalltalk garbageCollectMost might trigger it, too (depending on the VM).
Note that finalization is implemented by a process in the image. If that process is not running for some reason, no finalization will happen.
Also, there are historically different ways for the VM to signal to the image which objects to finalize. Initially, the finalization process had to scan all weak objects, which was robust but inefficient. Nowadays the VM and image have to agree on a method for finding the objects to finalize. If these are mismatched, finalization might not be reliable.

I think that #testFinalization in the OpenCL.pck.st package could be of help, at least as a working experiment to start with.

Thread safe hooking of DirectX Device

I successfully hooked the BeginScene/EndScene methods of DirectX 9's DeviceEx, in order to override regions of the screen of a graphics application. I did it by overwriting the first few instructions of the function pointed to by the appropriate vtable entry (42 for EndScene) with an x86 jump instruction.
The problem is that when I want to call the original EndScene method, I have to restore the original code that the jump overwrote. This operation is not thread safe, and the application has two devices used by two threads.
I tried overwriting the vtable entry, and also copying the vtable and overwriting the COM interface's pointer to it; neither way worked. I guess the original function pointer is cached somewhere or was optimized away at compile time.
I thought about copying the whole original method body to another memory block, but there are two problems I'm afraid of: (1) (the easy one, I think) I don't know how to determine the length of the method, and (2) I don't know whether the function body contains offsets that are relative to its location in memory.
I'm trying to hook WPF's device, if it can help somehow.
Does anyone know a thread-safe way to do such hooking?
Answering my own question: it seems that for my purpose (performing another method before or instead of the original one within my own process), a 'trampoline' is the answer. Generally it means making a separate code block that does exactly what the overwritten assembly instructions did, then jumps back into the original function past the patch.
Because this is not an easy task, using an external library (such as Microsoft Detours) is recommended.
A discussion about this topic:
How to create a trampoline function for hook
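To make the idea concrete, here is a heavily simplified sketch in plain C (32-bit x86, Windows, using only documented Win32 calls; InstallDetour, WriteJump, and gTrampoline are names made up for this sketch, not from any library). It assumes prologueLen covers whole instructions, is at least 5 bytes, and contains no relative operands; verifying that is exactly the hard part a hooking library does for you. The patch is installed once, before other threads can be inside those bytes, and never removed; the original is always called through the trampoline, so there is no unpatch/repatch race:
#include <windows.h>
#include <stdint.h>
#include <string.h>

static unsigned char gTrampoline[32]; // copied prologue + a jump back into the target

// Write an x86 "JMP rel32" at 'at' that lands on 'to'.
static void WriteJump(unsigned char *at, void *to)
{
    at[0] = 0xE9;
    *(int32_t *)(at + 1) = (int32_t)((unsigned char *)to - (at + 5));
}

// Patches 'target' to jump to 'hook' and returns a pointer that behaves
// like the original function (see the assumptions about prologueLen above).
void *InstallDetour(void *target, void *hook, size_t prologueLen)
{
    DWORD oldProtect;
    unsigned char *t = (unsigned char *)target;

    // Build the trampoline: the saved prologue, then a jump to the
    // instruction in the target right after the bytes we are about to patch.
    memcpy(gTrampoline, t, prologueLen);
    WriteJump(gTrampoline + prologueLen, t + prologueLen);
    VirtualProtect(gTrampoline, sizeof gTrampoline, PAGE_EXECUTE_READWRITE, &oldProtect);

    // Patch the target once, up front; the patch is never removed, so no
    // thread ever sees a half-restored function.
    VirtualProtect(t, prologueLen, PAGE_EXECUTE_READWRITE, &oldProtect);
    WriteJump(t, hook);
    VirtualProtect(t, prologueLen, oldProtect, &oldProtect);
    FlushInstructionCache(GetCurrentProcess(), t, prologueLen);

    return gTrampoline; // the hook calls this to invoke the original EndScene
}
The hook then calls the returned pointer whenever it wants the original behavior, which is what makes restoring the overwritten bytes unnecessary.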

objective-c: relying on dealloc

Is it a legitimate practice to rely on a deterministic dealloc (ex: for clean-up)?
Since ARC, and even manual reference counting, is inherently deterministic, I was wondering what other people thought about relying on dealloc getting called immediately (relatively speaking, allowing for autorelease pools).
In other modern programming languages, like C#, a dispose-like pattern is employed when you need deterministic clean-up. And I would imagine Obj-C with garbage collection encourages this behavior, as well.
So, with that said, an example would be a UIViewController which cancels outstanding operations in dealloc, rather than trying to program around the sometimes frustrating semantics of viewDidDisappear.
Another example would be a stream object that implicitly opens and closes in init and dealloc, respectively, rather than requiring open or close to be called.
Since Apple has deprecated GC, I would imagine that these sorts of patterns won't be broken anytime soon, and they are incredibly handy, though I can't find any documentation on whether this should be encouraged.
You are absolutely correct: you can rely on dealloc being called relatively soon after the last reference is released (manually or through ARC; it does not matter). Unlike GC, where the finalizer is called when the system has some free time, or in some cases is never called, dealloc gets called very reliably. Apple allows and even encourages this pattern by suggesting that we perform all of our resource clean-up tasks inside dealloc.
This does not mean, however, that you should rely on dealloc exclusively. For example, take a look at the NSStream class: it offers you an explicit close method, letting you force closing the stream at will, without waiting for the call to dealloc to happen. This is a very good pattern to follow in case the resource is very expensive (file handles, semaphores, etc.): the primary mechanism for releasing these resources should be a separate close method. The dealloc method should release the resource as well, but it should also issue a warning, informing you that you missed a call to close.
Regardless of your memory management system, tying expensive resources (e.g. files & sockets, images, views, large memory allocations, etc) to object lifetimes is risky. Even if you're manually retaining & releasing, you might unwittingly retain an object somewhere and forget about it (or otherwise delay its release unnecessarily). ARC makes it even more likely that these things will happen, since it's much less apparent where the retains come from, and when the corresponding releases take effect. And of course GC makes it all completely indefinite.
Generally for expensive and/or limited resources you should try to follow a sole-ownership pattern. Yes, you can still retain/release as normal to prevent premature deallocation, but the dominant owner should be well defined and responsible for clearing out the object when it's done - e.g. calling invalidate, or close, or whatever is appropriate. This makes perfect sense in a lot of the common cases - you generally know when you're done with a file or a socket, for example, so even if they happen to be encapsulated inside some wrapper class, you should just close them explicitly.
In some cases this can also help find bugs that would otherwise remain hidden, or be hard to track down. For example, if your file wrapper class raises an exception when read or write are called after the file is closed, you'll soon catch those cases. Whereas if you didn't close the file when you thought you were done with it, the reads and writes would just happen as usual and you might not notice that your file has unexpected data in it.
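Pulling these recommendations together, here is a minimal non-ARC sketch (the FileWrapper class and its details are illustrative, not from any Apple API): close is the primary clean-up path, dealloc is a backstop that warns, and reads after close raise rather than failing silently.
#import <Foundation/Foundation.h>
#include <stdio.h>

// Hypothetical wrapper around an expensive resource (a FILE handle).
@interface FileWrapper : NSObject
{
    FILE *file;
}
- (id)initWithPath:(NSString *)path;
- (NSData *)readBytes:(size_t)count;
- (void)close;
@end

@implementation FileWrapper

- (id)initWithPath:(NSString *)path
{
    if ((self = [super init]))
    {
        file = fopen([path fileSystemRepresentation], "rb");
        if (!file) { [self release]; return nil; }
    }
    return self;
}

// Reads after -close are a bug in the caller; fail loudly instead of silently.
- (NSData *)readBytes:(size_t)count
{
    if (!file)
        [NSException raise:NSInternalInconsistencyException
                    format:@"-readBytes: called after -close"];
    NSMutableData *data = [NSMutableData dataWithLength:count];
    [data setLength:fread([data mutableBytes], 1, count, file)];
    return data;
}

// The primary clean-up path: the owner calls this as soon as it is done.
- (void)close
{
    if (file) { fclose(file); file = NULL; }
}

// The backstop: still releases the resource, but warns about the missed -close.
- (void)dealloc
{
    if (file)
    {
        NSLog(@"FileWrapper: file still open in -dealloc; someone forgot -close");
        fclose(file);
    }
    [super dealloc]; // manual reference counting; omit under ARC
}

@end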
You can also use the same principles to break retain cycles.

OOP question about functions that struck me all of a sudden

Maybe my question is stupid, but I would like to get it cleared up. We know that functions are loaded in memory only once, and when you create new objects, only instance variables get created; functions are never duplicated. My question is: suppose there is a server, and all clients access a method named createCustomer(). Suppose clients do something that fires createCustomer on the server. If the method is in the middle of execution and a new client fires it, will the new request be made to wait? Or will the new request also start executing the method? How does it all get managed when there is only one copy of the function in memory? No book mentions answers to this type of question, so I am posting here, where I am bound to get answers :).
Functions are code which is then executed in a memory context. The code can be run many times in parallel (literally in parallel on a multi-processor machine), but each of those calls will execute in a different memory context (from the point of view of local variables and such). At a low level this works because the functions will reference local variables as offsets into memory on something called a "stack" which is pointed to by a processor register called the "stack pointer" (or in some interpreted languages, an analog of that register at a higher level), and the value of this register will be different for different calls to the function. So the x local variable in one call to function foo is in a different location in memory than the x local variable in another call to foo, regardless of whether those calls happen simultaneously.
Instance variables are different, they're referenced via a reference (pointer) to the memory allocated to the instance of an object. Two running copies of the same function might access the same instance variable at exactly the same time; similarly, two different functions might do so. This is why we get into "threading" or concurrency issues, synchronization, locks, race conditions, etc. But it's also one reason things can be highly efficient.
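As a concrete illustration, here is a minimal sketch in Objective-C (the language used elsewhere on this page; the Demo class is made up): two threads enter the same method at the same time, each call's x lives at a different stack address, and the hits instance variable is the one piece of state the calls share.
#import <Foundation/Foundation.h>

@interface Demo : NSObject
{
    NSInteger hits; // instance variable: one copy per object, shared by all calls
}
- (void)run:(id)ignored;
@end

@implementation Demo
- (void)run:(id)ignored
{
    NSInteger x = 0; // local variable: lives on this call's own stack
    for (NSInteger i = 0; i < 100000; i++)
        x++;
    hits += x; // unsynchronized access to shared state: a race condition
    NSLog(@"x lives at %p, x = %ld, hits = %ld", (void *)&x, (long)x, (long)hits);
}
@end

int main(void)
{
    @autoreleasepool
    {
        Demo *d = [[Demo alloc] init];
        [NSThread detachNewThreadSelector:@selector(run:) toTarget:d withObject:nil];
        [NSThread detachNewThreadSelector:@selector(run:) toTarget:d withObject:nil];
        [NSThread sleepForTimeInterval:1.0]; // crude wait, just for the demo
    }
    return 0;
}
Running this prints two different addresses for x (one per call, one per stack), while hits is the same storage touched by both threads.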
It's called "multi-threading". If each request has its own thread, and the object contains mutable data, each client will have the opportunity to modify the state of the object as they see fit. If the person who wrote the object isn't mindful of thread safety you could end up with an object that's in an inconsistent state between requests.
This is a basic threading issue, you can look it up at http://en.wikipedia.org/wiki/Thread_(computer_science).
Instead of thinking in terms of the code being executed, think in terms of each thread's memory context. It does not matter where the actual code lives, or whether it is the same code, a duplicate, or something else.
Basically, the function can be called again while an earlier call is still executing. The two calls are independent and may even run in parallel (on a multicore machine). This independence is achieved by giving each thread its own stack; the threads share the rest of the process's address space.
There are ways to synchronize calls, so additional callers have to wait until the first call finishes. This is also explained in the above link.
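If you do want additional callers to wait, Objective-C (to stay with the language of the sketch above) makes this a one-line change: wrap the body of run: in a @synchronized block, and a second thread entering the method blocks until the first one leaves the block.
- (void)run:(id)ignored
{
    @synchronized (self) // later callers block here until the current one exits
    {
        NSInteger x = 0;                 // still per-call, on this thread's stack
        for (NSInteger i = 0; i < 100000; i++)
            x++;
        hits += x;                       // now serialized: no race on the shared ivar
    }
}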

Track all Objective-C's alloc/allocWithZone/dealloc

Sorry for the long description; the questions aren't so easy...
My project is written without GC. Recently I found a memory leak that I can't track down. I used the new Xcode Analyzer, without result. I read my code line by line and verified every alloc/release/copy/autorelease/mutableCopy/retain and the pools... still nothing.
Preamble: the standard Instruments and the Omni Leak Checker don't work for me for some reason (the Omni tool rejects my app; Instruments.app (Leaks) eats too much memory and CPU, so I have no chance to use it).
So I want to write and use my own code to hook and track "all" alloc/allocWithZone:/dealloc messages and gather statistics, as a simple home-grown leak-checking library (the main goal is only to flag the class names of objects that may be leaking).
The main hooking technique that I use:
typedef id (*t_impAZOriginal)(id, SEL, NSZone *); // signature of allocWithZone:
static t_impAZOriginal imp_azo = NULL;            // the saved original implementation

Method originalAllocWithZone = class_getClassMethod([NSObject class], @selector(allocWithZone:));
if (originalAllocWithZone)
{
    imp_azo = (t_impAZOriginal)method_getImplementation(originalAllocWithZone);
    if (imp_azo)
    {
        Method hookedAllocWithZone = class_getClassMethod([NSObject class], @selector(hookedAllocWithZone:));
        if (hookedAllocWithZone)
        {
            // Swap in the hook; imp_azo keeps the original for later calls.
            method_setImplementation(originalAllocWithZone, method_getImplementation(hookedAllocWithZone));
            fprintf(stderr, "Leaks Hook: allocWithZone: ; Installed\n");
        }
    }
}
I use code like this to hook the alloc method as well, and dealloc is hooked as an NSObject category method.
I save the IMP of the previous method implementation, then record every alloc/allocWithZone: call as an increment (+1) of a per-class NSInteger in a statistics array, and every dealloc call as a decrement (-1).
At the end I call the previous implementation and return its value.
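For completeness, the hooked method itself can be as small as this sketch (the category name is arbitrary; RecordAllocForClass is a hypothetical stand-in for the +1 bookkeeping described above; imp_azo is the saved IMP from the snippet above):
@implementation NSObject (DILeakHook)

+ (id)hookedAllocWithZone:(NSZone *)zone
{
    // Hypothetical stats call: +1 in the per-class statistics array.
    RecordAllocForClass(self);
    // Call the saved original implementation and return its result.
    return imp_azo(self, @selector(allocWithZone:), zone);
}

@end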
In concept it all works just fine.
If needed, I can even detect when a class is part of a class cluster (like NSString/NSPathStore2 or NSDate/__NSCFDate)... via some normalizing function (but that doesn't matter for the issues described below).
However, this technique has some issues:
1. Not all allocations are caught. For example, [NSDate date] never shows up in alloc/allocWithZone: at all, even though I can see an alloc call in GDB.
2. Since I'm trying to use an automatic singleton-detection technique (based on reading retainCount) to exclude some objects from the final statistics, NSLocale creation freezes at the pre-init stage when a full Cocoa application starts (even a simple Objective-C command-line utility linking the Foundation framework performs some initialization before main()); according to GDB there are allocWithZone: calls one after the other, ....
Full Concept-Project draft sources uploaded here: http://unclemif.com/external/DILeak.zip (3.5 Kb)
Run make from Terminal.app to compile it, run ./concept to show it in action.
The 1st Question: Why can't I catch all object allocations by hooking the alloc and allocWithZone: methods?
The 2nd Question: Why does the hooked allocWithZone: freeze in CFGetRetainCount (or [inst retainCount]) for some classes?
Holy re-inventing the wheel, Batman!
You are making this way harder than it needs to be. There is absolutely no need whatsoever to roll your own object tracking tools (though it is an interesting mental exercise).
Because you are using GC, the tools for tracking allocations and identifying leaks are all very mature.
Under GC, a leak will take one of two forms: either there will be a strong reference to the object that should long ago have been destroyed, or the object has been CFRetain'd without a balancing CFRelease.
The collector is quite adept at figuring out why any given object is remaining beyond its welcome.
Thus, you need to find some set of objects that are sticking around too long. Any object will do. Once you have the address of said object, you can use the Object Graph instrument in Instruments to figure out why it is sticking around; figure out what is still referring to it or where it was retained.
Or, from gdb, use info gc-roots 0xaddr to find all of the various things that are rooting the object. If you turn on malloc history (see the malloc man page), you can get the allocation histories of the objects that are holding the reference.
Oh, without GC, huh...
You are still left with a plethora of tools and no need to re-invent the wheel.
The leaks command-line tool will often give you some good clues. Turn on the MallocStackLoggingNoCompact environment variable (see the malloc man page) so you can use malloc_history (another command-line tool) to look up the allocation backtrace of any address in your process.
Or use the ObjectAlloc instrument.
In any case, you need to identify an object or two that is being leaked. With that, you can figure out what is hanging on to it. In non-GC, that is entirely a case of figuring out why there is a retain not balanced by a release.
Even without the Leaks instrument, Instruments can still help you.
Start with the Leaks template, then delete the Leaks instrument from it (since you say it uses too much memory). ObjectAlloc alone will tell you all of your objects' allocations and deallocations, and (with an option turned on, which it is by default in the Leaks template) all of their retentions and releases as well.
You can set the ObjectAlloc instrument to only show you objects that still exist; if you bring the application to the point where no objects (or no objects of a certain class) should exist, and such objects do still exist, then you have a leak. You can then drill down to find the cause of the leak.
This video may help.
Start from the Xcode templates. Don't try to roll your own main() routine for a cocoa app until you know what you're doing.

Does the .dispose() method do anything at all?

I was experimenting with ways to get rid of some memory leaks within my application the other day when I realized that I know virtually nothing about cleaning up my resources. I did some research and hoped that just calling .dispose() would solve all of my problems. We have a table in our database that contains about 65,000 records. Obviously, when I fill my dataset from the data adapter, the memory usage can get pretty high. When I called the dispose method on the dataset, I was surprised to find that NONE of the memory got released. Why did this happen? Clearing the dataset doesn't help either.
IDisposable, and thus Dispose, is not used to reduce memory pressure (although in some cases it might) but rather for deterministic cleanup.
Consider this, you construct an object that maintains an active and open connection to your database server. This connection uses resources, both on your machine, and the server.
You could of course just leave the object be when you're done with it, and eventually it'll get picked up by the garbage collector; but suppose you want to make sure that at least the resources get freed, and thus the connection closed, when you're done with it. This is where IDisposable.Dispose comes into play.
It is used to clean up resources managed by the object.
It will, however, not free the managed memory allocated to the object. This is still left to the garbage collector, that will kick in at some later time to do that.
Do you actually have a memory problem, or do you just look at the memory usage in Task Manager or similar and go "that's a bit high."?
If the latter, then you should just leave it be for now. .NET will run garbage collection more often if you have less memory available, so unless you're in, or suspect you will soon be in, an out-of-memory situation, you're probably not going to have any problems.
Let me explain what I mean by the GC running more or less often.
If you have 8GB of memory in your machine, and only have Windows and Notepad running, most of that memory will be available. When you now run your program, even if it loads minor data blocks into memory, you can keep doing that for a long time, and memory usage will steadily grow. Exactly when the GC will kick in and try to reduce your memory footprint I don't know, but I can almost guarantee you that you will wonder why it gets so high.
Let's just for the sake of the argument say that your program will eventually use 2GB of memory.
Now, if you run your program on a machine that has less memory available, GC will occur more often, and will kick in on a lower limit, which might keep the memory usage below 500MB or possibly even less.
The important part to note here is that in order to get an accurate picture of how much memory your application actually requires, you can't rely on Task Manager or similar ways to measure it; you need something more targeted.
Calling Dispose() will only release unmanaged resources, such as file handles, database connections, unmanaged memory, etc. It will not release garbage collected memory.
Garbage-collected memory will only get released at the next collection, usually when the application domain's memory is deemed full.
I'm going to point out something here that hasn't been explicitly mentioned: calling Dispose() will only clean up (free) unmanaged resources if the developer of the component has written the code to do so.
What I mean is this: if you suspect you have a memory leak, calling Dispose() is not going to fix it if the original developer has done a lousy job and not correctly freed up unmanaged resources. For a bit more info, check this blog post. Take note of the statement The behaviour of Dispose is defined by the developer.
Some objects will ask one or more other entities to do something on its behalf until further notice, to the detriment of other entities. If an object which did so were to disappear without informing the former entities that their services were no longer needed, those entities would continue to uselessly act on behalf of an object that no longer needed them, to the continuing detriment of other entities that would want to use them.
In many cases, for an object "George" to tell an outside entity "Joe" that its services were no longer needed, George would have to know that its services were no longer needed. There are two normal means via which that can happen in .NET: finalization and IDisposable.
If an object overrides a method called Finalize, then when the object is created the .NET garbage collector will add it to a list of objects with registered finalizers. If the GC discovers that there exists no rooted reference to the object other than that list, the GC will remove the object from that list and add it to a strongly-rooted queue of objects which should have their Finalize method called as soon as possible. Such an object can then use its Finalize method to inform other entities that their services are no longer required.
Although finalization-based cleanup can sometimes work, there's no guarantee of timeliness. At one point during the design of .NET, Microsoft may have intended that finalization be the primary cleanup method, but for a variety of reasons it cannot safely be relied upon.
The other cleanup approach, which should be the focus of one's efforts, is IDisposable. Basically, the idea behind IDisposable is simple: for every object that implements IDisposable, there should be one entity (generally either an object or a nested execution scope) which is responsible for ensuring that that object's IDisposable.Dispose method will get called sometime within the lifetime of the universe (which would imply sometime while a reference to the object still exists), and preferably as soon as code can tell that the object's services will no longer be required.
Note that IDisposable.Dispose generally promises that any outside entities which had been asked to do something on an object's behalf will be told that they no longer need to do so, but such a promise does not imply that the number of entities is non-zero. If an object hasn't asked any outside entities to do anything on its behalf, then delivering a message to "all" such entities doesn't require doing anything at all. On the other hand, the fact that a Dispose method may do nothing in some cases doesn't mean that it's guaranteed never to do anything in any case, nor that failure to call it in those cases where it would do something won't have detrimental effects.